Abstract Novel antibody‐drug conjugates highlight the benefits for breast cancer patients with low human epidermal growth factor receptor 2 (HER2) expression. This study aims to develop and validate a Vision Transformer (ViT) model based on dynamic contrast‐enhanced MRI (DCE‐MRI) to classify HER2‐zero, ‐low, and ‐positive breast cancer patients and to explore its interpretability. The model is trained and validated on early enhancement MRI images from 708 patients in the FUSCC cohort and tested on 80 and 101 patients in the GFPH cohort and FHCMU cohort, respectively. The ViT model achieves AUCs of 0.80, 0.73, and 0.71 in distinguishing HER2‐zero from HER2‐low/positive tumors across the validation set of the FUSCC cohort and the two external cohorts. Furthermore, the model effectively classifies HER2‐low and HER2‐positive cases, with AUCs of 0.86, 0.80, and 0.79. Transcriptomics analysis identifies significant biological differences between HER2‐low and HER2‐positive patients, particularly in immune‐related pathways, suggesting potential therapeutic targets. Additionally, Cox regression analysis demonstrates that the prediction score is an independent prognostic factor for overall survival (HR, 2.52; p = 0.007). These findings provide a non‐invasive approach for accurately predicting HER2 expression, enabling more precise patient stratification to guide personalized treatment strategies. Further prospective studies are warranted to validate its clinical utility. Keywords: breast cancer, deep learning, medical imaging, transcriptomics analysis __________________________________________________________________ This study develops a Vision Transformer‐based DCE‐MRI model for the non‐invasive classification of HER2 expression in breast cancer, demonstrating robust performance across multicenter cohorts. By integrating transcriptomic analysis, the model reveals immune‐related pathway differences among distinct HER2 expression levels. This approach offers insights into treatment stratification and advances precision oncology. graphic file with name ADVS-12-e03925-g006.jpg 1. Introduction Breast cancer is the most common malignant tumor among women worldwide and one of the leading causes of cancer‐related deaths.^[ [50]^1 ^] Human epidermal growth factor receptor 2 (HER2) plays a critical role in the molecular subtyping and treatment of breast cancer, with its expression status directly affecting patient prognosis and therapeutic strategies.^[ [51]^2 ^] With the emergence of novel antibody‐drug conjugates (ADCs), the subtype of HER2‐low expression (immunohistochemistry [IHC] 1+ or IHC 2+ with in situ hybridization [ISH] ‐) has gradually gained attention. Precisely distinguishing between HER2‐zero, HER2‐low, and HER2‐positive is crucial for identifying potential beneficiaries of ADCs therapies.^[ [52]^3 , [53]^4 ^] Recent clinical trials have demonstrated that patients with HER2‐low breast cancer significantly benefit from novel HER2‐targeted ADCs.^[ [54]^3 , [55]^5 ^] Unlike HER2‐positive patients who traditionally respond to HER2‐targeted therapies, HER2‐low patients were previously classified as HER2‐negative and lacked effective targeted treatment options. Accurate differentiation among HER2‐zero, HER2‐low, and HER2‐positive tumors is therefore essential, as misclassification may result in missed opportunities for appropriate ADCs therapies and suboptimal clinical outcomes. Traditional pathological techniques rely on invasive tissue sampling and may not fully capture the tumor heterogeneity. Therefore, developing a non‐invasive and reliable method to preoperatively assess HER2 expression status in breast cancer is essential for optimizing treatment strategies. Magnetic resonance imaging (MRI) plays a significant role in the diagnosis, staging, and therapeutic monitoring of breast cancer.^[ [56]^6 , [57]^7 , [58]^8 , [59]^9 ^] MRI‐based radiomics can extract high‐throughput quantitative features from medical images, reflecting tumor heterogeneity and microenvironment characteristics.^[ [60]^8 , [61]^9 ^] However, traditional radiomics methods require manual feature extraction and have limited interpretability, making it difficult to fully elucidate the biological mechanisms behind model decisions. Vision Transformer (ViT), a deep learning architecture that utilizes self‐attention mechanisms to capture both global and local information in images,^[ [62]^10 , [63]^11 ^] has achieved significant results in medical image analysis. However, the black‐box nature of deep learning models limits insight into their decision‐making processes, posing challenges for clinical adoption. Radiogenomics, by integrating imaging features with multi‐omics data, provides biological interpretability for the model.^[ [64]^12 ^] Studies have shown that the predictive results of deep learning models are correlated with tumor gene expression patterns, pathway activities, and tumor microenvironment features.^[ [65]^13 , [66]^14 , [67]^15 , [68]^16 ^] By conducting bioinformatics analysis on the model's predictions, it is possible to explore the relationship between model decisions and tumor biological characteristics, revealing the biological interpretability of the models and promoting their application in clinical practice. In this study, we developed a ViT deep learning model using DCE‐MRI data to non‐invasively distinguish HER2 expression in breast cancer. Next, we evaluated the interpretability of the model's decisions using attention maps and transcriptomics correlation analysis. Furthermore, given the prognostic value of HER2 status, we hypothesized that the features the model focuses on during decision‐making may correlate with tumor biological characteristics and prognosis and the prediction score is a significant independent predictor of overall survival (OS). 2. Experimental Section 2.1. Ethics Statement The multicenter retrospective study was conducted in accordance with the Declaration of Helsinki. Ethics approval was obtained from the Institutional Review Board of Fudan University Shanghai Cancer Center ([69]NCT04461990), as well as from the ethics boards of Guangzhou First People's Hospital and the First Affiliated Hospital of China Medical University. All participants provided written consent after being informed. 2.2. Study Patients We retrospectively collected three independent patient cohorts. The Fudan University Shanghai Cancer Center (FUSCC) cohort included 879 breast cancer patients treated at Fudan University Shanghai Cancer Center between November 2011 and January 2016. The Guangzhou First People's Hospital (GFPH) cohort included 100 patients treated at Guangzhou First People's Hospital between November 2017 and December 2022. The First Hospital of China Medical University (FHCMU) cohort included 140 patients treated at the First Hospital of China Medical University between October 2018 and December 2021. The inclusion criteria were as follows: a) pathologically confirmed primary breast cancer; b) availability of pretreatment MRI images; and c) HER2 expression status determined by the IHC and/or ISH of postsurgical specimens. The exclusion criteria included a) incomplete DCE‐MRI; b) severe motion artifacts; and c) multiple lesions with different pathological HER2 expression status. Finally, 708 samples were retained in the FUSCC cohort, 80 in the GFPH cohort, and 101 in the FHCMU cohort. The clinical characteristics of all three cohorts are summarized in Table [70]1 . In addition, the FUSCC cohort included follow‐up data of each patient (n = 708), among whom 367 had matched RNA sequencing data obtained from surgically resected tumor specimens, allowing for integrated analysis of imaging and transcriptomic profiles. Table 1. Summary of Demographic and Clinical Data from Three Study Cohorts. Variable FUSCC cohort GFPH cohort FHCMU cohort (n = 708) (n = 80) (n = 101) Age (y)[71] ^a) 52.6 (10.5) 54.4 (9.3) 49.2 (10.4) Menopause YES 403 (56.9) 56 (70.0) 44 (43.6) NO 301 (42.5) 24 (30.0) 57 (56.4) NA 4 (5.6) 0 0 Histology Invasive 648 (91.5) 73 (91.3) 93 (92.1) Others 60 (8.5) 7 (8.7) 8 (7.9) pT stage T1 303 (42.8) 15 (18.8) 6 (5.9) T2 379 (53.5) 15 (18.8) 64 (63.4) T3 19 (2.7) 1 (1.3) 21 (20.8) T4 0 0 10 (9.9) NA 7 (1.0) 49 (61.1) 0 pN stage N0 365 (51.6) 23 (28.6) 17 (16.8) N1 182 (25.7) 6 (7.5) 57 (56.4) N2 89 (12.6) 1 (1.3) 15 (14.9) N3 66 (9.3) 1 (1.3) 12 (11.9) NA 6 (0.8) 49 (61.1) 0 Grade G1 5 (0.7) 8 (10.0) 0 G2 272 (38.4) 35 (43.7) 73 (72.3) G3 369 (52.1) 12 (15.0) 27 (26.7) NA 62 (8.8) 25 (31.3) 1 (1.0) ER status Positive 401 (56.6) 57 (71.2) 64 (63.4) Negative 307 (43.4) 23 (28.8) 37 (36.6) PR status Positive 343 (48.4) 51 (61.7) 66 (65.3) Negative 365 (51.6) 29 (36.3) 35 (34.7) HER2 status Positive 226 (31.9) 38 (47.5) 22 (21.8) Low 344 (48.6) 32 (40.0) 53 (52.5) Zero 138 (19.5) 10 (12.5) 26 (25.7) Recurrence free survival (m)[72] ^b) 78.1 (37.6‐90.6) NA NA Overall survival (m)[73] ^b) 80.4 (52.4–91.3) NA NA [74]Open in a new tab Note—Except where indicated, data are numbers of women with percentages in parentheses. HER2 = human epidermal growth factor receptor 2; ER = estrogen receptor; PR = progesterone receptor; NA = not available; ^^a) Data are means ± SDs; ^^b) Data are medians, with IQRs in parentheses. Second, a diagnostic framework comprising two classification tasks was designed. Task 1 aimed to distinguish HER2‐zero from HER2‐low/positive breast cancers, given that patients with HER2‐zero tumors have not demonstrated clinical benefit from currently approved HER2‐targeted ADCs. Task 2 aimed to distinguish HER2‐low from HER2‐positive breast cancers, with the goal of refining treatment decision‐making among potential candidates for novel ADCs therapies. For both tasks, patients from the FUSCC cohort were randomly allocated into training and validation sets at a ratio of 8:2, and patients from the GFPH cohort and FHCMU cohort were used as external test sets. The study flowchart is shown in Figure [75]1 . Detailed patient distributions for training, validation, and test sets are provided in [76]Supporting Information. Figure 1. Figure 1 [77]Open in a new tab Data curation flowchart. DCE‐MRI = dynamic contrast‐enhanced magnetic resonance imaging. HER2 = human epidermal growth factor receptor 2. 2.3. HER2 Expression Status Classification HER2 expression status was determined by postsurgical specimens following the American Society of Clinical Oncology/College of American Pathologists guidelines.^[ [78]^17 ^] HER2‐zero was defined as an IHC score of 0, HER2‐low as an IHC score of 1+ or 2+ with a negative ISH result, and HER2‐positive as an IHC score of 3+ or 2+ with a positive ISH result. 2.4. Clinical Outcomes For prognosis prediction, the primary endpoint was relapse‐free survival (RFS), defined as the time from diagnosis to the first recurrence of locoregional recurrence, distant metastasis, a diagnosis of contralateral breast cancer, or death from any cause. The secondary endpoint was overall survival (OS), defined as the time from diagnosis to death from any cause. Patients without events were censored at the time of the last follow‐up. 2.5. Breast DCE‐MRI Acquisition and Evaluation All preoperative examinations were performed on three MRI scanners. The DCE‐MRI data acquisition parameters and MRI protocols for each cohort are presented in Table [79]S1 (Supporting Information) and [80]Supporting Information. Tumor regions of interest (ROIs) were delineated semiautomatically on the early enhanced phase of DCE‐MRI by 3D Slicer software (version 4.10; [81]www.slicer.org),^[ [82]^18 ^] with the largest cross‐sectional slice selected as the model input. ROI segmentation was performed by a radiologist (C.Y., with 9 years of experience in breast MRI), blinded to the histopathological results. To ensure reproducibility and the stability of segmentation outcomes, intraclass correlation coefficients (ICCs) were obtained from repeatability experiments, which involved tumor outlining and feature extraction. Sixty randomly selected samples from the FUSCC cohort were assessed for intra‐ and inter‐observer agreement. For intra‐observer evaluation, the same radiologist (C.Y.) repeated the ROI delineation for these cases after a two‐week interval. For inter‐observer evaluation, another radiologist (Y.Y.S., with 4 years of experience in breast MRI) independently performed ROI segmentation on the same cases, blinded to clinical and histopathological information. An ICC value greater than 0.75 was considered indicative of good agreement, and greater than 0.9 as excellent according to established guidelines. The concordance analysis revealed high consistency, with intra‐observer ICCs exceeding 0.9 and inter‐observer ICCs above 0.8, indicating reliable segmentation. Based on these results, the more experienced radiologist completed the entire ROI segmentation for each MRI scan layer. For the GFPH cohort and FHCMU cohort, tumor ROIs were manually delineated following the same protocol. The section containing the largest tumor region was chosen as the model's input (Figure [83]2A). Figure 2. Figure 2 [84]Open in a new tab A) Overview of the workflow of this study. First, three independent breast cancer cohorts (FUSCC cohort: n = 708; GFPH cohort: n = 80; FHCMU cohort: n = 101) were integrated, and tumor regions of interest were manually segmented. Second, Vision Transformer (ViT) models were trained to perform two classification tasks: differentiating human epidermal growth factor receptor 2 (HER2)‐low/positive from HER2‐zero tumors, and then distinguishing HER2‐positive from HER2‐low tumors. Third, model performance was quantified using receiver operating characteristic (ROC) curves and confusion matrices. Fourth, for task 2 (distinguishing HER2‐positive from HER2‐low), interpretability was investigated using attention maps and transcriptomics analysis. Finally, Kaplan–Meier survival analysis and Cox proportional hazards regression were utilized for risk stratification to assess prognostic value. DCE = dynamic contrast‐enhanced. B) Detailed architecture of the Vision Transformer (ViT_tiny) model comprising 12 transformer layers, each with 3 parallel attention heads. Input images are processed with non‐overlapping 16 × 16 pixel patches, and the model operates with a hidden state dimension of 192. MLP = multilayer perceptron. 2.6. Model Development As shown in Figure [85]2, two ViT deep learning models were trained: 1) to distinguish HER2‐zero from HER2‐low and ‐positive breast cancer, and 2) to distinguish HER2‐low from HER2‐positive. Given the increasing clinical importance of HER2‐low in the context of novel ADCs, this study focused on distinguishing HER2‐low from HER2‐positive breast cancers to guide treatment strategies, following the initial exclusion of HER2‐zero tumors for whom current HER2‐targeted ADCs have not demonstrated clinical benefit. ViT leverages multi‐head self‐attention mechanisms to capture global relationships within images. The models were implemented using the ViT_tiny architecture from the timm library ([86]https://github.com/huggingface/pytorch‐image‐models),^[ [87]^19 ^] with pretrained weights initialized from ImageNet. The models were trained on early‐phase enhanced tumor images from the DCE‐MRI training set and validated on the validation set and independent test sets. To further explore the interpretability of the model in distinguishing HER2‐low from HER2‐positive breast cancer, attention maps were generated to highlight regions critical for decision‐making,^[ [88]^10 ^] while transcriptomics analysis (including differential expression analysis and functional enrichment analysis^[ [89]^20 , [90]^21 ^]) was conducted to investigate underlying biological mechanisms. Additional details on the ViT architecture, training parameters, and evaluation processes are provided in [91]Supporting Information. 2.7. Transcriptomic Data Generation and Analysis Sample processing for total RNA extraction, RNA sequencing procedures, and bioinformatic operations and analysis are presented in the [92]Supporting Information. 2.8. Statistical Analysis All statistical analyses were conducted using Python (version 3.8.0, [93]https://www.python.org) and R software (version 4.1.2, [94]https://www.r‐project.org). Patient characteristics were analyzed using the student's t‐test for continuous variables and the χ^2 test or Fisher's exact test for categorical variables. The performance of the ViT models was evaluated using metrics including accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC). The optimal classification thresholds were determined using the Youden index. Bootstrapped 95% confidence intervals (CI) were calculated using 1000 resamples. Hub genes were identified using the STRING^[ [95]^22 ^] ([96]https://string‐db.org) database and Cytoscape^[ [97]^23 ^] (version 3.10.2, [98]https://cytoscape.org). Survival curves for OS and RFS were estimated using the Kaplan‒Meier analysis, and the log‐rank test was used to compare survival between different groups. Hazard ratios (HR) with 95% CI were calculated using univariate and multivariate Cox proportional hazards regression models to identify prognostic factors associated with OS. All statistical tests were two‐sided, and a p‐value <05 was considered statistically significant. 3. Results 3.1. Clinical Characteristics The clinical characteristics of patients in the FUSCC cohort (n = 708), GFPH cohort (n = 80), and FHCMU cohort (n = 101) are summarized in Table [99]1. The mean age was 52.6 years (SD, 10.5) for the FUSCC cohort, 54.4 years (SD, 9.3) for the GFPH cohort, and 49.2 years (SD, 10.4) for the FHCMU cohort. In the FUSCC cohort, the median RFS was 78.1 months (IQR, 37.6–90.6), and the median OS was 80.4 months (IQR, 52.4–91.3). For task 1 and task 2, no statistically significant differences in baseline characteristics were observed between the training and validation set (Tables [100]S2 and [101]S3, Supporting Information). 3.2. Performance of the Model in Distinguishing HER2‐Zero versus HER2‐Low and HER2‐Positive Status The performance of the model in distinguishing HER2‐zero from HER2‐low/positive across the training, validation, and test datasets is illustrated in Figure [102]3A. The AUC was 0.85 for the training set, 0.80 for the validation set, 0.73 for the test set 1, and 0.71 for the test set 2. Figure 3. Figure 3 [103]Open in a new tab Performance of the Vision Transformer (ViT) model in distinguishing different status of human epidermal growth factor receptor 2 (HER2) expression. A) Receiver operating characteristic curves for distinguishing HER2‐low/positive from HER2‐zero, shown for the training, validation, and two independent test sets. B,C) Confusion matrices for the two independent test sets for the HER2‐low/positive versus HER2‐zero classification. D) Receiver operating characteristic curves for distinguishing HER2‐low from HER2‐positive, shown for the training, validation, and two independent test sets. E,F) Confusion matrices for the two independent test sets for the HER2‐low versus HER2‐positive classification. AUC = area under the receiver operating characteristic curve. CI = confidence intervals. The model achieved an accuracy, sensitivity, specificity, and F1 score of 0.73, 0.70, 0.83, and 0.81; in the training set; 0.74, 0.76, 0.67, and 0.82 in the validation set; 0.65, 0.64, 0.70 and 0.76 in the test set 1; and 0.71, 0.89, 0.19 and 0.82 in the test set 2, respectively (Table [104]2 , Figure [105]3B,C). Table 2. Performance Metrics of the ViT Model for HER2 Classification Tasks. Tasks and datasets AUC [95% CI] Accuracy [95% CI] Sensitivity [95% CI] Specificity [95% CI] F1 score [95% CI] Task 1 Training set 0.85 (0.82, 0.89) 0.73 (0.69, 0.76) 0.70 (0.66, 0.74) 0.83 (0.75, 0.90) 0.81 (0.78, 0.83) Validation set 0.80 (0.70, 0.88) 0.74 (0.66, 0.81) 0.76 (0.67, 0.83) 0.67 (0.50, 0.83) 0.82 (0.78, 0.87) Test set 1 0.73 (0.57, 0.86) 0.65 (0.55, 0.75) 0.64 (0.53, 0.75) 0.70 (0.38, 1.00) 0.76 (0.71, 0.82) Test set 2 0.71 (0.60, 0.82) 0.71 (0.62, 0.79) 0.89 (0.82, 0.96) 0.19 (0.05, 0.35) 0.82 (0.77, 0.87) Task 2 Training set 0.95 (0.93, 0.97) 0.91 (0.88, 0.94) 0.92 (0.89, 0.95) 0.88 (0.83, 0.93) 0.93 (0.91, 0.95) Validation set 0.86 (0.80, 0.92) 0.77 (0.69, 0.84) 0.63 (0.51, 0.77) 0.89 (0.81, 0.95) 0.72 (0.65, 0.79) Test set 1 0.80 (0.69, 0.90) 0.74 (0.63, 0.84) 0.53 (0.36, 0.71) 0.92 (0.82, 1.00) 0.65 (0.56, 0.75) Test set 2 0.79 (0.68, 0.88) 0.67 (0.57, 0.76) 0.55 (0.42, 0.67) 0.95 (0.84, 1.00) 0.70 (0.64, 0.76) [106]Open in a new tab Note.—Task 1 refers to the classification of HER2‐zero versus HER2‐low/positive breast cancer cases. Task 2 refers to the classification of HER2‐low versus HER2‐positive breast cancer cases. AUC = area under the receiver operating characteristic curve. CI = confidence interval. 3.3. Performance of the Model in Distinguishing HER2‐Low versus HER2‐Positive Status The performance of the model in distinguishing HER2‐low from HER2‐positive across the training, validation, and test datasets is illustrated in Figure [107]3D. The AUC was 0.95 for the training set, 0.86 for the validation set, 0.80 for the test set 1, and 0.79 for the test set 2. The model achieved an accuracy, sensitivity, specificity, and F1 score of 0.91, 0.92, 0.88, and 0.93 in the training set; 0.77, 0.63, 0.89, and 0.72 in the validation set; 0.74, 0.53, 0.92, and 0.65 in the test set 1; and 0.67, 0.55, 0.95, and 0.70 in the test set 2, respectively (Table [108]2, Figure [109]3E,F). 3.4. Attention Map Analysis of the Model for Distinguishing HER2‐Low and HER2‐Positive To improve the interpretability of our deep learning model in distinguishing HER2‐positive from HER2‐low breast cancer patients, attention maps were used to visualize pixel importance through color gradients, highlighting the most significant regions contributing to the model's decision. By visualizing pixel importance through color gradients, the maps highlighted distinct features in the imaging data. In the HER2‐low group, the attention maps exhibited broader and more intense activation regions reflecting the heterogeneous characteristics associated with HER2‐low tumors, while fewer and more focused regions were activated for the HER2‐positive group (Figure [110]4A,B). Figure 4. Figure 4 [111]Open in a new tab Interpretability of the Model for Distinguishing HER2‐Low and HER2‐Positive. Diagram presents attention maps for six representative patients, with three diagnosed as HER2‐Low A) and three diagnosed as HER2‐Positive B). In each panel, the first column shows the original MRI images with the deep learning (DL) score for each patient, the second column displays magnified regions of interest, and the third column overlays attention maps, highlighting areas that the model identified as being of high importance for its classification decision. C) Volcano plot showing the differentially expressed genes (DEGs) based on the predicted HER2‐Positive and HER2‐Low groups. Functional enrichment analysis of DEGs: D) GO enrichment analysis; E) KEGG enrichment analysis. F) Protein‐protein interaction (PPI) network of DEGs. Top 10 hub genes were identified using the Maximal Clique Centrality (MCC) method. 3.5. Biological Relevance of the Model for Distinguishing HER2‐Low and HER2‐Positive To further investigate the biological relevance underlying our model's predictions in distinguishing HER2‐low from HER2‐positive breast cancer, we performed a differential gene expression analysis based on the model's predicted classifications. In total, 1173 differentially expressed genes (DEGs) were identified between the model‐predicted HER2‐low and HER2‐positive groups (Table [112]S4, Supporting Information). Among these, ERBB2, GRB7, and S100A9 were the most upregulated in HER2‐positive tumors, while SCUBE2 was the most downregulated in HER2‐positive tumors. These genes are critical in tumor progression and HER2 signaling pathways. Next, GO and KEGG analyses were performed to further elucidate the biological functions of the captured model‐based genes (Figure [113]4E,D; Figures [114]S4 and [115]S5; Table [116]S5–S10, Supporting Information). The GO analysis revealed significant enrichment in immune‐related processes such as immune receptor activity (p.adjust <0.001) and cytokine receptor activity (p.adjust <0.001), which are central to immune modulation and tumor‐immune interactions. Additionally, pathways related to T cell receptor binding and MHC protein complex binding were also enriched (p.adjust <0.001), highlighting the involvement of antigen presentation and immune surveillance in shaping tumor‐immune dynamics (Figure [117]4E). The KEGG analysis supported these findings, with the top enriched pathways including cytokine‐cytokine receptor interaction (p.adjust <0.001) and chemokine signaling pathway (p.adjust <0.001). These pathways are central to immune system regulation and cell communication, further indicating the importance of immune signaling in the tumor microenvironment. Moreover, cell adhesion molecules (p.adjust <0.001) and antigen processing and presentation pathways were also enriched, suggesting that the model's predictions may capture biological differences in how immune cells engage with and recognize tumor cells (Figure [118]4D). We then constructed a protein‐protein interaction (PPI) network using the DEGs and identified the top 10 hub genes through the Maximal Clique Centrality (MCC) method. These hub genes, including CTLA4, CD4, CD8A, CCR7, CD27, GZMB, PRF1, IL7R, FCGR3A, and CD69, are primarily involved in immune regulation and T‐cell activation, reflecting the model's focus on immune‐related mechanisms in distinguishing HER2‐positive from HER2‐low tumors. (Figure [119]4F). 3.6. Prognostic Value of the Model for Distinguishing HER2‐Low and HER2‐Positive Moreover, to explore the prognostic value of the model, Kaplan‐Meier analysis was employed to evaluate RFS and OS based on both the actual HER2 status and the predicted risk groups (cutoff = 0.5, derived from model predictions) in the FUSCC cohort. Patients in the predicted high‐risk group had significantly poorer OS (p = 0.02, Figure [120]5A), whereas no significant difference in OS was observed between groups defined by actual HER2 status (p = 0.069, Figure [121]S1, Supporting Information). RFS showed no significant differences for either the predicted risk groups or the actual HER2 status group (both p > 0.05, Figures [122]S2 and [123]S3, Supporting Information). Univariable Cox proportional hazards analysis was conducted to assess the prognostic significance of the predicted risk group, tumor size, positive lymph nodes, age, menopausal status, and lymphovascular invasion status. Multivariable Cox analysis demonstrated the predicted high‐risk group as an independent OS prognostic factor (HR, 2.52; 95% CI, 1.29–4.93; p = 0.007; Figure [124]5B, Table [125]3 ). Figure 5. Figure 5 [126]Open in a new tab A) Kaplan–Meier analysis for overall survival (OS) based on high‐ and low‐risk groups (cutoff = 0.5) predicted by the Vision Transformer (ViT) model in distinguishing HER2‐low from HER2‐positive patients (p = 0.02 by log‐rank test) in FUSCC cohort. B) Forest plot of the multivariable Cox proportional hazards model based on OS in FUSCC cohort. LN = lymph node. Table 3. Uni‐ and Multivariable Cox Regression Analyses for OS in FUSCC cohort. Variable Univariable HR p value Multivariable HR p value Group (high risk vs low risk) 2.10 (1.11, 3.99) 0.023 2.52 (1.29, 4.93) 0.007 Tumor size 1.33 (1.08, 1.62) 0.006 1.29 (1.03, 1.63) 0.027 Positive LN 1.12 (1.09, 1.14) <0.001 1.11 (1.08, 1.14) <0.001 Age 1.06 (1.02, 1.09) <0.001 1.09 (1.04, 1.14) <0.001 Menopause (yes vs no) 2.27 (1.10, 4.68) 0.026 0.64 (0.23, 1.79) 0.393 LVI (yes vs no) 1.92 (0.99, 3.72) 0.052 [127]Open in a new tab Note.—Data in parentheses are 95% CIs. Group risk stratification was based on model predictions (cutoff = 0.5). LVI = lymphovascular invasion status. LN = lymph node. 4. Discussion This study developed a ViT‐based deep learning model using DCE‐MRI to non‐invasively assess HER2 expression in breast cancer, identifying HER2‐low/positive cases and further distinguishing between HER2‐low and HER2‐positive subtypes. The model demonstrated robust performance across multicenter datasets, achieving AUC values of 0.73 and 0.71 in external test sets for distinguishing HER2‐zero from HER2‐low/positive cancers, and 0.80 and 0.79 for distinguishing HER2‐low from HER2‐positive cancers. Transcriptomic analysis revealed immune microenvironment features specific to HER2‐low tumors, supporting the model's biological interpretability. Moreover, survival analysis showed the model's prediction scores, derived from distinguishing HER2‐low from HER2‐positive patients, as independent predictors of overall survival (OS), highlighting the clinical value of this model. Accurate distinguishing HER2‐low breast cancers is particularly critical, as HER2‐low patients have now emerged as a distinct therapeutic subgroup eligible for novel ADCs treatments.^[ [128]^24 ^] HER2‐zero patients generally do not benefit from HER2‐targeted therapies. Our ViT model addresses this clinical need, supporting more precise therapeutic decision‐making and potentially expanding the pool of patients who can benefit from targeted therapies. Previous studies have utilized multiparametric MRI, incorporating sequences such as DWI, T2WI, and DCE‐MRI to explore MRI‐based radiomics for HER2 characterization.^[ [129]^25 , [130]^26 , [131]^27 , [132]^28 , [133]^29 ^] Ramtohul et al. developed MRI radiomics models using multiparametric MRI sequences to differentiate HER2‐zero from HER2‐low/positive cancers, with an AUC of 0.80 in the external test set.^[ [134]^25 ^] Similarly, Chen et al. employed habitat imaging based on multiparametric MRI to quantify intratumoral heterogeneity, demonstrating remarkable performance in differentiating HER2‐positive, HER2‐low, and HER2‐zero breast cancers.^[ [135]^28 ^] Building on these findings, our study utilized DCE‐MRI‐based ViT models and achieved robust and substantial performance, especially in distinguishing HER2‐low from HER2‐positive breast cancers, likely due to the ViT's advanced feature extraction capabilities, which leverage self‐attention mechanisms to capture both global and local imaging patterns. Compared with convolutional neural networks (CNNs), which rely on local receptive fields and hierarchical feature aggregation, ViT models employ self‐attention mechanisms that enable direct modeling of long‐range dependencies and global contextual relationships within medical images. This architecture allows ViT to better preserve spatial coherence and capture complex feature interactions, particularly important in cases like HER2‐low breast cancer where subtle heterogeneity can influence clinical decision‐making. However, relying on a single sequence may limit the model's ability to capture the additional discriminative information provided by other MRI sequences. Meanwhile, the biological similarities between HER2‐zero and HER2‐low tumors, both traditionally classified as HER2‐negative, lead to overlapping imaging features that further complicate differentiation. Together, these factors likely contribute to the slightly lower AUCs observed when distinguishing HER2‐zero from HER2‐low/positive cancers. Additionally, the modeling approaches employed in previous studies have predominantly been based on traditional radiomics, which involves manual feature extraction and selection, potentially missing intricate spatial relationships within imaging data, while studies leveraging deep learning remain relatively limited.^[ [136]^30 , [137]^31 ^] Guo et al further advanced this field by integrating radiomics with deep learning features to develop a deep learning radiomics (DLR) model, which outperformed traditional radiomics models in distinguishing HER2 status.^[ [138]^30 ^] This approach highlighted the benefits of combining manual radiomic features with automatically learned deep features to enhance model performance. In contrast, our ViT‐based deep learning model learns and integrates both global and local imaging features through self‐attention mechanisms in an end‐to‐end manner. This not only enhances the model's ability to detect nuanced differences in HER2 expression but also improves robustness across multicenter datasets. By eliminating the need for manual feature extraction and leveraging the powerful feature representation capabilities of ViT, our model effectively captures the subtle imaging patterns associated with different HER2 statuses. Given the advantages the model, it has the potential to bring potential value to clinical workflows. On the one hand, it enables non‐invasive HER2 status evaluation based on DCE‐MRI, potentially serving as a “virtual biopsy” to reduce the invasiveness and associated risks of traditional tissue biopsies. On the other hand, it facilitates dynamic monitoring of HER2 status during treatment, allowing clinicians to adjust therapeutic strategies in a timely manner. To elucidate the biological relevance of our ViT model, we integrated attention map analysis with gene expression profiling to investigate the molecular underpinnings of HER2 status differentiation. Our findings demonstrated that the broader regions of activation identified in HER2‐low tumors align with the known heterogeneity of this subtype.^[ [139]^32 , [140]^33 ^] The differential expression of key genes further underscores the biological divergence between HER2‐low and HER2‐positive tumors, particularly involving tumor proliferation, HER2 signaling activation, and immune regulation. ERBB2, encoding the HER2 receptor, was markedly upregulated in HER2‐positive tumors, supporting its central role in oncogenic pathway activation and tumor proliferation.^[ [141]^34 ^] GRB7, frequently co‐amplified with ERBB2, enhances tumor invasion and metastatic potential, contributing to the more aggressive phenotype observed in HER2‐positive cancers.^[ [142]^35 ^] S100A9, a calcium‐binding protein, is upregulated in HER2‐positive tumors and modulates the tumor immune microenvironment by promoting metabolic reprogramming and reducing lymphocyte infiltration.^[ [143]^36 , [144]^37 ^] Although direct comparisons between HER2‐low and HER2‐positive tumors are limited, the known functions of S100A9 suggest its involvement in shaping distinct immune dynamics associated with HER2 status. In contrast, SCUBE2, a protein with context‐dependent tumor suppressor functions, was found upregulated in HER2‐positive tumors in our analysis. This observation is consistent with previous studies indicating that SCUBE2 expression is associated with chemotherapy resistance and may influence clinical outcomes in breast cancer^[ [145]^38 , [146]^39 ^] Pathway enrichment analysis further revealed that these gene expression differences were associated with immune‐related biological processes, including immune response regulation, antigen presentation, and cell adhesion. These pathways are critical for modulating tumor‐immune dynamics and suggest unique biological interactions within HER2‐low tumors. This interpretability bridges the gap between imaging‐based predictions and underlying tumor biology, offering a deeper understanding of the mechanisms driving HER2‐low status. Moreover, the model's ability to noninvasively characterize molecular features highlights its clinical potential for stratifying patients who may benefit from HER2‐targeted therapies, including antibody‐drug conjugates (ADCs), which have shown increasing efficacy in HER2‐low cases, emphasizing the need for precise subgroup identification to optimize treatment outcomes. Despite its strengths, our study acknowledges several limitations in the imaging‐based prediction of HER2‐low status. First, the biological similarity between HER2‐low and HER2‐zero tumors complicates precise differentiation based solely on imaging features. Second, while attention maps and gene expression correlations have enhanced the interpretability of our deep learning model, the inherent black‐box nature of such models limits a comprehensive understanding of how specific imaging features drive predictions. Future research could address this by integrating deep learning with habitat imaging, enabling more granular exploration of tumor microenvironment heterogeneity and improving model interpretability. Finally, the retrospective design and variability in imaging protocols across centers may limit the model's generalizability, underscoring the need for prospective studies with standardized protocols and larger, diverse cohorts to validate its clinical applicability and enhance predictive accuracy. In conclusion, our ViT‐based model offers a novel and effective approach for non‐invasive HER2 status assessment in breast cancer, bridging imaging‐derived features with molecular biology to provide clinically relevant insights. By capturing tumor heterogeneity and its associated molecular pathways, such as immune response and antigen presentation, the model not only enhances biological interpretability but also supports the identification of patients who may benefit from emerging HER2‐targeted therapies, including antibody‐drug conjugates. Furthermore, the observed correlation between model predictions and survival outcomes highlights its potential utility in prognostic evaluation and treatment stratification. Compared to traditional methods like IHC and ISH, this MRI‐based model represents a comprehensive and non‐invasive alternative, addressing challenges related to sampling bias and procedural invasiveness. These findings underscore the model's promise in advancing precision oncology and improving outcomes for breast cancer patients through tailored therapeutic strategies. Conflict of Interest The authors declare no conflict of interest. Author Contributions X.Z., Y.Y.S., G.H.S., and Y.G. contributed equally to this work and shared the co‐first authors. Y.J.G. and C.Y. conceptualized the study; X.Z. and Y.Y.S. developed the methodology. Y.Y.S., R.C.Z., S.Y.D., S.Y.C., Z.M.S., and Y.G. curated the data. G.H.S. and Y.X. conducted the investigation. G.H.S. performed the formal analysis. X.Z. provided statistical guidance. X.Z. and Y.Y.S. drafted the original manuscript. Y.J.G. and C.Y. reviewed and edited the manuscript. Y.J.G., L.N.Z., Y.Z.J., H.W., and C.Y. supervised the project. All authors approved the final manuscript, and the corresponding author, C.Y., affirmed that all listed authors meet the authorship criteria. Supporting information Supporting Information [147]ADVS-12-e03925-s001.docx^ (656.4KB, docx) Supplemental Tables [148]ADVS-12-e03925-s002.xlsx^ (2.4MB, xlsx) Acknowledgements