Abstract

   Breast cancer is now the most frequently diagnosed malignancy, and
   metastasis remains the leading cause of death in breast cancer.
   However, little is known about the dynamic changes during the
   evolvement of dissemination. In this study, 65 968 cells from
   four patients with breast cancer and paired metastatic axillary lymph
   nodes are profiled using single‐cell RNA sequencing (scRNA‐seq) and
   spatial transcriptomics. A disseminated cancer cell cluster with high
   levels of oxidative phosphorylation (OXPHOS), including the
   upregulation of cytochrome C oxidase subunit 6C and
   dehydrogenase/reductase 2, is identified. The transition between
   glycolysis and OXPHOS when dissemination initiates is noticed.
   Furthermore, this distinct cell cluster is distributed along the
   tumor's leading edge. The findings here are verified in three different
   cohorts of breast cancer patients and an external scRNA‐seq dataset,
   which includes eight patients with breast cancer and paired metastatic
   axillary lymph nodes. This work describes the dynamic metabolic
   evolvement of early disseminated breast cancer and reveals a switch
   between glycolysis and OXPHOS in breast cancer cells as the early event
   during lymph node metastasis.

   Keywords: breast cancer, early dissemination, metabolism, single‐cell
   RNA sequencing, spatial transcriptomics
     __________________________________________________________________

   By single‐cell RNA sequencing and spatial transcriptomics, the early
   early‐disseminated breast cancer cells are found to travel from the
   border of primary tumor to axillary lymph nodes. During this
   metastasis, a switch between glycolysis and oxidative phosphorylation
   occurs in early disseminated breast cancer cells, indicating an
   interesting dynamic metabolic evolvement.

   graphic file with name ADVS-10-2205395-g005.jpg

1. Introduction

   Breast cancer has become the most common malignancy worldwide,
   accounting for nearly 30% of female cancers.^[ [46]^1 ^] Owing to the
   rapid development of therapeutic approaches, most patients with
   non‐metastatic breast cancer are curable. However, advanced breast
   cancers with distant metastasis are still considered to be incurable.
   Axillary lymph nodes are the most common site for metastasis. The
   status of axillary lymph nodes is a crucial clinical factor during the
   determination of therapeutic strategies and is also of critical
   significance in risk estimation and prognosis evaluation.^[ [47]^2 ^]

   Single‐cell RNA sequencing (scRNA‐seq) is a powerful tool that can
   provide expression profiling of human cancer at the resolution of
   individual cells, which allows the identification and characterization
   of specific subclusters that bear unique biological effects.^[ [48]^3
   ^] Such studies have been widely performed in breast cancer research to
   investigate the tumor microenvironment (TME) and the evolvement of
   tumor cells.^[ [49]^4 , [50]^5 , [51]^6 , [52]^7 , [53]^8 , [54]^9 ,
   [55]^10 , [56]^11 ^] It has been gaining significant interest that the
   spatial arrangement and cellular interactions can affect the state of
   cells to a massive extent; however, information on the location was not
   preserved when using scRNA‐seq to measure the state of cells.^[ [57]^12
   ^] Spatial transcriptomics (ST) is a novel biotechnology that allows
   visualization and quantitative analysis of the transcriptome with
   spatial resolution in tissue sections, compensating for the lack of
   spatial information in scRNA‐seq.^[ [58]^13 ^] Combining ST and
   scRNA‐seq helps to overcome the limitations of both scRNA‐seq—which
   lacks spatial information—and ST—which is not at individual cell
   resolution.^[ [59]^14 ^] Multimodal intersection analysis (MIA) is one
   of the useful approaches which provides meaningful biological insight
   into research by combining ST and scRNA‐seq.^[ [60]^15 ^] Regional
   lymph nodes are often the earliest sites of metastasis when breast
   cancer cells leave the primary tumor to the whole body.^[ [61]^16 ,
   [62]^17 ^] Therefore, we used patients with breast cancer and paired
   metastatic axillary lymph nodes to mimic the early dissemination
   events. By integrating these two complementary technologies, we
   intended to provide an insight into the process during early
   dissemination.

   In this study, we applied scRNA‐seq and discovered a cluster of breast
   cancer cells that disseminated. This distinct cell cluster exhibited
   upregulated oxidative phosphorylation (OXPHOS) pathway. We found that
   this cluster experienced a reverting transition between OXPHOS and
   glycolysis using pseudo‐time analysis. CellChat analysis was also
   conducted to map its cellular interaction with immune cells.
   Furthermore, we also performed ST analysis to identify that this
   disseminated cell cluster is located along the tumor's leading edge. We
   tested these findings via cytological experiments and
   immunohistochemical staining and verified the results in both bulk and
   single‐cell RNA‐seq cohorts.

2. Results

2.1. scRNA‐Seq Identified Early‐Disseminated Cancer Cell Clusters in Paired
Primary Breast Tumors and Metastatic Lymph Nodes

   To comprehensively identify the cellular composition and architecture
   in primary tumors and metastatic lymph nodes, we performed scRNA‐seq
   (10x Genomics) on four pairs of primary invasive breast tumors and
   metastatic axillary lymph nodes collected from four patients in Fudan
   University Shanghai Cancer Center (FUSCC) cohort during surgical
   resection (Figure [63] 1A and Data [64]S1, Supporting Information). Two
   pathologists confirmed all lymph nodes to ensure metastasis. After
   filtering the scRNA‐seq data to exclude damaged or dead cells and
   putative cell doublets, we constructed an atlas consisting of
   65 968 single cells with a median of 1437 expressed genes that passed
   the stringent quality filtering (see Experimental Section).

Figure 1.

   Figure 1
   [65]Open in a new tab

   Expression profiling of 65 968 single cells in paired primary breast
   tumors and metastatic lymph nodes. A) Graphic overview of this study
   design. Tumor and paired lymph node tissue from four breast cancer
   patients were processed into single‐cell suspension and unsorted cells
   were used for scRNA‐seq with 10x Genomics. Tumor slides were processed
   to obtain by 10x Genomics Visium. The following integrated analysis of
   cytological experiments and IHC staining is described in squares. B)
   t‐SNE plots of 65 968 cells from tumor and paired lymph node tissue of
   four breast cancer patients, showing 15 clusters in each plot. Each
   cluster is shown in a different color. C) t‐SNE plot of 65 968 cells
   profiled in the present study colored by major cell types. D) Bubble
   plots of the marker genes expressed in the major cell types. Dot color
   reflects expression level and dot size represents the percent of cells
   expressing marker genes in different cell types. IHC,
   immunohistochemistry. HE, hematoxylin and eosin. scRNA‐seq, single‐cell
   RNA sequencing. t‐SNE, t‐distributed stochastic neighbor embedding.

   After integrating the transcriptional data from all acquired cells, we
   first applied low‐resolution t‐distributed stochastic neighbor
   embedding (t‐SNE) clustering and generated 2D graphs with 15 clusters
   in eight samples (Figure [66]1B). To identify the main cell types of
   this atlas, we annotated each cluster with their marker gene expression
   (Figure [67]1C and Figure [68]S1A,B, Supporting Information). All cells
   were divided into eight cell types, including epithelial cells (ECAM1,
   KRT19, PROM1), plasma cells (IGHG1, MZB1, SDC1), myeloid cells (CD68,
   CD163, CSF1R, MRC1, TPSAB1, MS4A2, CD1C), natural killer (NK) cells
   (NCR1, GLNY, NKG7), B cells (CD79A, CD79B, MS4A1), T cells (CD3D, CD3E,
   CD8A, CD4), fibroblasts (COL1A1, COL1A2, DCN), and endothelial cells
   (PCAM1, VWF) (Figure [69]1D and Figure [70]S1C and Data [71]S2,
   Supporting Information).

   Given that breast cancer cells originate from epithelial cells, we then
   performed high‐resolution t‐SNE analysis and re‐cluster the epithelial
   cells into 11 clusters (Figure [72] 2A and Figure [73]S2A,B, Supporting
   Information). Copy number variation (CNV) analysis has been widely used
   in scRNA‐seq to investigate disease evolvement and development.^[
   [74]^18 , [75]^19 ^] To distinguish malignant from non‐malignant
   clusters, we evaluated the CNV level of all epithelial cells and each
   epithelial cell cluster (Figure [76]2B and Figure [77]S2C,D, Supporting
   Information). As expected, the CNV levels of epithelial cells in lymph
   nodes were significantly higher than those in primary tumors (Figure
   [78]S2E, Supporting Information). Among all epithelial cell clusters,
   C3 exhibited remarkably lower CNV levels than other clusters
   (Figure [79]2B), suggesting C3 is a group of normal mammary ductal
   epithelial cells, and other clusters are malignant epithelial cells.
   Based on the transformation of CNV along the metastasis of breast
   cancer cells, we constructed four evolutionary trees of epithelial
   cells for each patient (Figure [80]S3A, Supporting Information). All
   evolutionary trees showed a similar evolutionary trajectory from the
   primary tumor to lymph node, indicating the common features of tumor
   evolution during the early dissemination of breast cancer. To dissect
   the evolutionary dynamics of breast epithelial lineages, we also
   constructed a pseudo‐time cell trajectory analysis of the 11 epithelial
   cell clusters and generated a four‐branch trajectory depicting the
   development from non‐malignant, malignant, to metastatic cells
   (Figure [81]2C). Due to its low CNV level, C3 was located at the
   bottom‐right corner of the trajectory curve, suggesting the clear
   starting point of this evolving trajectory curve. After confirmation of
   this starting point, developmental routes were determined to begin with
   the initial state and then bifurcating into either Cell fate 1 or Cell
   fate 2 branches with metastasis‐rich endpoints (Figure [82]2C and
   Figure [83]S3B, Supporting Information). Hence, C2, 4, and 9 were
   identified as early‐disseminated cancer cell (EDC) clusters since they
   were at the end of the trajectory curve with relatively high CNV levels
   (Figure [84]2B,C). More importantly, these three clusters appeared in
   both primary tumors and lymph nodes (Figure [85]S2B, Supporting
   Information). Although C8 had a CNV level close to the normal
   epithelial cell C3 and was part of the initial state, it was still
   identified as an EDC cluster due to its existence in both primary
   tumors and lymph nodes (Figure [86]S2B, Supporting Information). In a
   word, we identified that C3 represented the normal mammary ductal
   epithelial cells, while C2, 4, 8, and 9 were the EDC clusters of breast
   cancer.

Figure 2.

   Figure 2
   [87]Open in a new tab

   Metastatic epithelial cell characteristics identified by scRNA‐seq. A)
   t‐SNE plot of 5739 epithelial cells showing 11 clusters. Each cluster
   is shown in a different color. B) Violin plots of the CNV levels in
   11 epithelial cell clusters. C) Potential trajectory of all epithelial
   cells identified two distinct cell fates colored by cluster. The arrow
   shows the potential evolutionary direction in the trajectory. D) Bubble
   plot of hallmarks for DEGs between EDC clusters and other epithelial
   cell clusters. The intensity represents the adjusted p‐value of each
   hallmark. Dot size shows gene count for each hallmark. Wilcoxon
   signed‐rank test was used to assess the difference. E) Heatmap showing
   selected DEGs (rows) between EDC clusters and other epithelial cell
   clusters along the pseudo‐time (columns), which was clustered into
   three profiles. Color key differentially coding from blue to red
   indicated the relative expression levels from low to high. F) Dot plots
   of dynamic expression of key genes in glycolysis and OXPHOS and the two
   pathways themselves based on KEGG database along two cell fates. CNV,
   copy number variation. DEG, differentially expressed genes. EDC,
   early‐disseminated cancer cell. KEGG, Kyoto Encyclopedia of Genes and
   Genomes. OXPHOS, oxidative phosphorylation. Pre‐branch,
   premalignant‐status branch.

2.2. Metabolic Switch between OXPHOS and Glycolysis Acts as a Potential Key
Regulator in EDC Clusters

   To investigate the changes in the gene expression of EDC clusters (C2,
   4, 8, and 9), we performed enrichment analysis based on hallmark gene
   sets of the Molecular Signatures Database (MsigDB)^[ [88]^20 ^] to
   identify the altering pathways in cancer cells during the disseminating
   process (Data [89]S3, Supporting Information). The top enriched
   pathways included pathways that are associated with cell proliferation
   (mTORC1 signaling, MYC targets V1) and therapy response of breast
   cancer (estrogen response early, estrogen response late)
   (Figure [90]2D), which are compatible with the patients’
   clinicopathological features and more malignant traits of EDC clusters.
   Interestingly, the EDC clusters displayed significant enrichment of the
   OXPHOS pathway, suggesting the novel metabolic traits of disseminated
   cells (Figure [91]2D and Data [92]S3, Supporting Information). However,
   when analyzing each EDC cluster separately, the OXPHOS pathway was not
   significantly enriched in C4. Even so, the adjusted p‐value was close
   to 0.05 (Data [93]S3, Supporting Information).

   The heatmap showed how the expression of genes changed from the initial
   state to Cell fate 1 or 2 and identified three different transformation
   patterns (Figure [94]2E and Figure [95]S3C–E, Supporting Information).
   The importance of the OXPHOS pathway in breast cancer metastasis was
   emphasized in the previous work of Davis et al., where switching from
   glycolysis to OXPHOS promotes metastasis specifically in the tumor
   seeding step.^[ [96]^21 ^] Consistent with their discoveries, this
   metabolic switch in our study through pseudo‐time analysis was also
   observed in our study. Marker genes in OXPHOS (cytochrome C oxidase
   subunit 6C [COX6C], dehydrogenase/reductase 2 [DHRS2], ATP5MC2, and
   NDUFB4) exhibited an up‐then‐down expression pattern, while marker
   genes in glycolysis pathways (GAPDH, LDHA, PKM, and PGK1) exhibited a
   down‐then‐up expression pattern during early dissemination of malignant
   epithelial cells. Meanwhile, the expression of genes linked to cell
   adhesion and migration (CLDN3, F11R) also exhibited a down‐then‐up
   pattern, which indicated the dynamic change of cellular behavior during
   the process of dissemination (Figure [97]2F and Figure [98]S3F,G,
   Supporting Information). Herein we raised a hypothesis that this
   metabolic switch of breast cancer cells resembles changes in
   epithelial–mesenchymal transition (EMT) during the metastatic process,
   that is, after detachment from the primary tumor, metabolic profiles of
   EDCs switched from glycolysis to OXPHOS. Once the metastatic cells
   seed, the metabolic preference then reverts to promoting cell
   proliferation. Taken together, our findings revealed the transition
   between OXPHOS and glycolysis in the evolvement during early
   dissemination of breast cancer, suggesting OXPHOS might play a vital
   role in the metastatic process.

2.3. Intercellular Interaction between EDCs and Immune Cells in the TME

   To determine the potential interactions between EDCs and immune cells
   in the TME, we performed CellChat analysis to acquire cell–cell
   signaling links.^[ [99]^22 ^] Although numbers of interactions between
   major cell types were higher in the primary tumor, the strength of
   interactions between the two groups was very close (Figure [100] 3A).
   EDC clusters showed more interaction with immune cells in the primary
   tumor, while showing stronger interaction with immune cells in lymph
   nodes (Figure [101]3B–E).

Figure 3.

   Figure 3
   [102]Open in a new tab

   Intercellular ligand–receptor prediction among EDCs and immune cells
   revealed by CellChat. A) Bar plot showing the number and strength of
   intercellular interactions in both lymph node and tumor. B,C) Heatmaps
   of differential number (B) and strength (C) of intercellular
   interactions between lymph node and tumor. D,E) An overview of
   cell–cell interactions. Arrow and edge color indicate direction. Circle
   size is proportional to the number of cells in each cell group. Edge
   thickness indicates the number (D) and the strength (E) of interaction
   between populations. The loops indicate cell types. F) Bubble plots of
   the significant differentially expressed ligand–receptor pairs in the
   lymph node versus tumor. Dot color reflects communication
   probabilities, and dot size represents computed p‐values. Each cell
   group is shown in different color. Empty space means the communication
   probability is zero. p‐values are computed from a two‐sided permutation
   test. G,H) Chord diagrams of the inferred MIF signaling networks in
   lymph node (G) and tumor (H). Arc length represents the number of cells
   in each cell group and edge width represents the communication
   probability. I) Bar plots of ranked significant signaling pathways
   based on differences in the overall information flow within the
   inferred networks between lymph node and tumor. The top signaling
   pathways colored red are enriched in lymph node, and these colored blue
   pathways were enriched in the tumor. EC, epithelial cell. MIF,
   macrophage migration inhibitory factor.

   We investigated the specific ligand–receptor interactions among
   different cell clusters. The ligand–receptor pairs among immune cells
   (CD22‐PTPRC, PTPRC‐CD22, and ADGRE5‐CD55) were significantly
   upregulated in lymph nodes versus primary tumors, suggesting that these
   pathways are essential for the immune response against tumors
   (Figure [103]3F and Data [104]S4, Supporting Information). Macrophage
   migration inhibitory factor (MIF) pathway was active between EDCs and
   immune cells in both lymph nodes and primary tumors, indicating it is
   essential in both sites and likely to contribute to breast cancer
   dissemination (Figure [105]3F–H). We also calculated the information
   flows across lymph nodes and primary tumors, which is defined as the
   overall communication probability among all the pairs of cell groups in
   the communication network. Intriguingly, 4 out of 20 pathways are
   highly active both in lymph nodes and primary tumors, albeit at
   different levels (Figure [106]3I). These are likely the major signaling
   pathways between epithelial cells and immune cells independent of
   different TMEs. We also found 5 pathways that were specifically active
   in lymph nodes, while the other 11 pathways involved in apoptosis, cell
   migration, and proliferation were explicitly active in primary tumors,
   such as JAM, APP, CDH, NECTIN, and MK (Figure [107]3I). Altogether,
   CellChat analysis demonstrated that the intercellular interactions in
   TME were shaped in early disseminated breast cancer.

2.4. Spatial Transcriptome Combined with scRNA‐Seq Revealed Spatial Features
of EDC Clusters

   To further assess the spatial organization of epithelial cells, we
   generated breast cancer tissue cryosections originating from the fresh
   tumor samples of four patients (see [108]Experimental Section). After
   hematoxylin and eosin (H&E) staining and brightfield imaging, we
   annotated the slide into three main regions, including the tumor
   region, ductal epithelium, and the stromal region (Figure [109] 4A,
   left). We then conducted ST analysis on collected samples.
   Transcriptomes from 11 137 spots across four sections were obtained at
   a median depth of 8373 unique molecular identifiers (UMIs)/spot and
   3167 genes/spot. To depict the spatial features of main cell types in
   the primary tumors, we mapped the scRNA data to the ST slides by
   SPOTlight.^[ [110]^23 ^] As expected, the stroma region was enriched
   with endothelial cells, while the tumor region was enriched with
   epithelial cells (Figure [111]4A, left, and Figure [112]S4A and Data
   [113]S5, Supporting Information). Interestingly, T cells were obviously
   gathered along the boundary of the stroma region and tumor region
   (Figure [114]S4B, Supporting Information), suggesting the unique
   spatial features of this leading‐edge zone. In a word, we delineated
   the landscape of TME in breast primary tumors.

Figure 4.

   Figure 4
   [115]Open in a new tab

   Leading edge heterogeneity revealed by ST. A) H&E staining of tissue
   sections (left) and mapped with unbiased clustering of ST spots in
   4 tumor samples (right). Each region is surrounded by dotted lines in
   different color. Scale bar = 500 µm. B) t‐SNE plot of 11 137 ST spots
   from four primary breast cancer ST data. Each cluster is shown in
   different color. C) MIA map of overlap between all scRNA‐seq‐identified
   epithelial cell clusters and ST‐identified spot clusters. Each element
   in the matrix is computed for all pairs of epithelial cell clusters and
   spot clusters using the same 728 background genes. Red indicates
   enrichment (significantly high overlap); blue indicates depletion
   (significantly low overlap). D) All ST spots of BMS score and the tumor
   leading edge ST spots of BMS and EMT score in T3 and T4 tissue
   sections. The intensity represents score of each ST spot. Scale
   bar = 500 µm. E) Violin spots of BMS score of the tumor inner region
   and the tumor leading edge in T3, T4, and four samples together. F)
   Unbiased clustering of ST spots mapped to T3 and T4 tissue sections,
   respectively (left, middle). BMS score feature plots from T3 and
   T4 with data generated using the Visium ST platform. The intensity
   represents max expression of each gene. Scale bar = 500 µm. (right). G)
   Violin plots of BMS score by ST spots subpopulation in ST data. H) Bar
   plots of GSVA analysis comparing between leading edge‐associated ST
   plots and non‐leading edge‐associated ST plots in four primary breast
   tumor samples. All p‐values were determined using an unpaired two‐sided
   Wilcoxon rank‐sum test. ns, p ≥ 0.05; *, p < 0.05; **, p < 0.01; ***,
   p < 0.001; ****, p < 0.0001. BMS, breast metastatic signature. COX6C,
   cytochrome C Oxidase Subunit 6C. DHRS2, dehydrogenase/Reductase 2. EMT,
   epithelial–mesenchymal transition. GSVA, Gene set variation analysis.
   H&E, Hematoxylin and eosin. MIA, multimodal intersection analysis. ST
   spatial transcriptomics

   To explore the spatial features of EDC clusters in detail, ST data was
   processed using t‐SNE analysis, and all spots were divided into
   15 clusters (Figure [116]4B). By projecting all ST clusters onto
   cryosections, we observed a noticeable distribution pattern, which was
   consistent with the histological annotations under H&E staining
   (Figure [117]4A, right). For each sample, tumor regions were made up of
   different clusters, including CII from T1, CX, and CXIV from T2, CIX,
   and CXIII from T3, CVIII, and CXV from T4. These results implied the
   inter‐tumor heterogeneity among four patients. To locate the epithelial
   cells in the samples precisely, MIA was applied to generate the
   corresponding relationship between the epithelial clusters and ST
   clusters.^[ [118]^15 ^] We spotted that the EDC clusters (C2, 4, 8, and
   9) were enriched in the tumor region (Figures [119]4A, right,
   and [120]4C). Interestingly, we noticed that EDC clusters C2 and
   C8 were mainly located at the leading edge of the tumor regions in
   T3 and T4, while the other malignant cell (C6) was surrounded by C8 in
   T3. Together with the results from pseudo‐time analysis, we speculated
   that the evolvement from the malignant stage (C6) to the dissemination
   stage (C8) is in parallel with the route from the inside of the tumor
   to the leading edge and finally to metastatic sites.

   To verify this hypothesis, we manually selected the spots on the tumor
   leading edge of each sample. Then we developed a breast metastatic
   signature (BMS) score based on the differential gene expression
   analysis between EDC clusters and other epithelial clusters
   (see [121]Experimental Section). Among the top 65 genes, 26 were
   upregulated in the metastatic cell group (Data [122]S6, Supporting
   Information). As expected, these genes are related to cell migration
   and cell population proliferation (e.g., HSPB1, CRIP1, and XBP1),
   indicating the more malignant biological feature of the EDCs cluster.
   Notably, the expression of multiple genes in OXPHOS pathways was also
   significantly altered in EDC clusters, such as COX6C, DHRS2, and
   NDUFC2. We also calculated the OXPHOS score and EMT score of the tumor
   leading edge of each sample,^[ [123]^24 ^] which is a known pathway
   associated with tumor dissemination. Although three scores varied in
   four samples, the tumor leading edge showed significantly higher BMS
   score, OXPHOS score, and EMT score than the tumor inner region in each
   sample (Figure [124]4D,E and Figure [125]S3C–H, Supporting
   Information). These results suggested that the distribution of EDC
   clusters follows the tumor leading edge, which is consistent with the
   result produced by MIA and the metabolic traits of EDC clusters
   identified by scRNA‐seq.

   To further investigate EDCs at the tumor leading edge, we performed
   graph‐based t‐SNE analysis on ST data of T3 and T4, re‐clustering ST
   spots from each sample into nine clusters and five clusters,
   respectively (Figure [126]4F, left). The two clusters (c2 in T3, c4 in
   T4) with the highest levels of BMS score were distributed along the
   tumor leading edge in both samples (Figures [127]4F, middle, [128]4F,
   right, and [129]4G). According to the results from scRNA‐seq, COX6C and
   DHRS2 are among the top 25 differential genes between EDC clusters and
   other epithelial clusters (Data [130]S6, Supporting Information), and
   both participate in the OXPHOS pathway. As expected, the two clusters
   with the highest levels of COX6C and DHRS2 were distributed along the
   tumor leading edge in both samples, which is consistent with the result
   produced by MIA (Figure [131]S3I, Supporting Information)

   Gene set variation analysis (GSVA) was also performed to investigate
   differential pathways between spots along the tumor leading edge and
   other spots in the tumor region. The results of GSVA showed that the
   OXPHOS pathway was significantly upregulated in spots along the tumor
   leading edge. Furthermore, other enriched pathways such as DNA repair,
   MYC targets V1, and mTORC1 signaling were also significantly
   upregulated in EDC clusters (Figure [132]4H and Data [133]S7,
   Supporting Information).

   Altogether, we identified EDC clusters tended to distribute along the
   tumor leading edge and exhibited the metabolic signature of upregulated
   OXPHOS. OXPHOS might play an essential role in the metabolic transition
   of EDC clusters.

2.5. Knocking Down COX6C and DHRS2 Inhibited Breast Cancer Cells
Proliferation, Migration, and EMT

   To evaluate the role of COX6C and DHRS2 in breast cancer cells, we
   utilized short hairpin RNA (shRNA) to generate T‐47D, MDA‐MB‐231, and
   MCF‐7 cell lines with low COX6C or DHRS2 expression. Downregulation of
   COX6C and DHRS2 notably inhibited the proliferation of breast cancer
   cells (Figure [134] 5A and Figure [135]S5A, Supporting Information). In
   the transwell migration assay, migrated cells transfected with
   shIL4I1 were remarkably decreased by 60% or more compared with control
   (Figure [136]5B and Figure [137]S5B, Supporting Information),
   indicating high levels of COX6C and DHRS2 could facilitate breast
   cancer invasion. We next performed immunohistochemistry (IHC) staining
   on lymph nodes and primary tumors with positive or negative lymph node
   metastasis. We observed that the representative protein expression of
   COX6C and DHRS2 was positive in samples with lymph node metastasis
   (Figure [138]5C). Fisher's exact test showed the expression of both
   COX6C and DHRS2 are significantly related to the presence of lymph node
   metastasis (Tables [139]S1 and [140]S2, Supporting Information).
   Interestingly, the expression of E‐cadherin was upregulated while Snail
   was downregulated in COX6C knockdown (shCOX6C) and DHRS2 knockdown
   (shDHRS2) cells (Figure [141]5D and Figure [142]S5C, Supporting
   Information), which suggested that COX6C and DHRS2 may promote cell
   migration through upregulating EMT‐related pathways. Moreover, lower
   levels of β‐Catenin in shCOX6C and shDHRS2 cells than in control
   indicated the downregulated WNT signaling pathway. Meanwhile, the
   analysis results based on scRNA‐seq dataset also displayed the positive
   correlation among COX6C, DHRS2, and the EMT pathway (Figure [143]5E,F).
   Taken together, these data demonstrated that the downregulation of
   COX6C and DHRS2 might inhabit cell proliferation, migration, and EMT in
   breast cancer.

Figure 5.

   Figure 5
   [144]Open in a new tab

   Downregulation of COX6C and DHRS2 inhibited breast cancer cells
   proliferation, migration, and EMT. A) Line plots showing significantly
   lower cell proliferation rates in T‐47D and MDA‐MB‐231 cells after
   knocking down COX6C and DHRS2. B) Bar plots showing downregulation of
   COX6C and DHRS2 significantly inhibited cell migration ability of T‐47D
   and MDA‐MB‐231 cells in trans‐well assay (right). Representative images
   randomly selected from T‐47D and MDA‐MB‐231 cells are shown (left).
   Scale bars = 1 mm. C) IHC images of COX6C and DHRS2 expressions in
   primary tumor with or without lymph node metastasis. Three independent
   experiments were performed and generated similar results. Scale
   bar = 100 µm. D) Western blot images showing the EMT signaling pathway
   was inactivated in shCOX6C and shDHRS2 group, compared with control of
   MDA‐MB‐231 and 4T1 cells. E) Scatter plots with a significant positive
   spearman correlation for the expression of COX6C and those of DHRS2. F)
   Violin spots of EMT score of the high expression and the low expression
   of COX6C and DHRS2 in epithelial cells. All p‐values in (A) and (B)
   were determined using an unpaired two‐sided Student's t‐test. All
   p‐values in (F) were determined using an unpaired two‐sided Wilcoxon
   rank‐sum test. Data presented as the mean ± SD. of n = 3. ns, p ≥ 0.05;
   *, p < 0.05; **, p < 0.01; ***, p < 0.001; ****, p < 0.0001. EMT,
   epithelial–mesenchymal transition. shCOX6C, COX6C knockdown. shDHRS2,
   DHRS2 knockdown.

2.6. Upregulation of OXPHOS Exhibited Associations with Lymph Node Status and
Prognosis in Breast Cancer Patients

   To validate the association between OXPHOS pathway and breast cancer
   early dissemination, we applied single sample gene set enrichment
   analysis (ssGSEA) analysis in our FUSCC cohort and GSVA analysis in
   some patients from TCGA and METABRIC datasets.^[ [145]^25 , [146]^26 ,
   [147]^27 , [148]^28 ^] Compared with patients without lymph nodes
   metastasis, those with lymph nodes metastasis demonstrated
   significantly higher scores in the OXPHOS pathway and lower scores in
   the glycolysis pathway (Figure [149] 6A–C), suggesting that the
   upregulation of OXPHOS might be related to lymph node metastasis and
   the metabolic switch from glycolysis to OXPHOS might take place.

Figure 6.

   Figure 6
   [150]Open in a new tab

   Clinical significance of OXPHOS in early disseminated breast cancer.
   A–C) Dot plots showing significantly higher level of OXPHOS and lower
   level of glycolysis in N status positive (N1‐3) breast cancer patients
   than those with N status negative (N0). Patients sequence data were
   from FUSCC (A), TCGA (B), and METABRIC (C). All p‐values were
   determined using an unpaired two‐sided Wilcoxon rank‐sum test. D) UMAP
   plot of 6350 cancer epithelial cells with tissue source. Each tissue
   source is shown in different color. E) Bubble plot of enriched KEGG
   pathways of cancer epithelial cell from lymph nodes compared to those
   from primary tumors. The intensity represents adjusted p‐value of each
   KEGG pathway. Dot size shows gene count for each KEGG pathway. Wilcoxon
   signed‐rank test was used to assess the difference. F) Kaplan–Meier
   curve illustrating higher COX6C and DHRS2 accompanied by poor OS in
   basal‐like breast cancer patients from TCGA dataset. G) Kaplan–Meier
   curve illustrating higher COX6C accompanied by poor OS and DMFS in
   HER2‐positive breast cancer while higher DHRS2 upregulation was
   associated with poor OS and DMFS in basal‐like breast cancer. Log‐rank
   test was used to assess the difference. ns, p ≥ 0.05; *, p < 0.05; **,
   p < 0.01; ***, p < 0.001; **** p < 0.0001. DMFS, distant
   metastasis‐free survival. GOBP, Gene Ontology Biological Process. KEGG,
   Kyoto Encyclopedia of Genes and Genomes. OS, overall survival. OXPHOS,
   oxidative phosphorylation. UMAP, Uniform Manifold Approximation and
   Projection.

   To validate our findings in an external dataset, we selected the
   [151]GSE167036 dataset as an external validation set,^[ [152]^11 ^]
   which includes the scRNA‐seq data of eight pairs of primary breast
   tumors and metastatic axillary lymph nodes. We performed low‐resolution
   Uniform Manifold Approximation and Projection (UMAP) clustering and
   identified cancer epithelial cells in eight patients. After cell
   annotation based on marker genes (Figure [153]6D), we conducted Kyoto
   Encyclopedia of Genes and Genomes (KEGG) analysis to identify pathways
   enriched in cancer epithelial cells from lymph nodes. As expected,
   OXPHOS, estrogen signaling, and tight junction pathways were enriched
   in the EDC cells (Figure [154]6E and Data [155]S8, Supporting
   Information), which is consistent with our results (Figure [156]2D).
   Furthermore, COX6C upregulation was associated with poor overall
   survival (OS) in basal‐like breast cancer, while poor OS and distant
   metastasis‐free survival (DMFS) in HER2‐positive breast cancer.
   DHRS2 upregulation was associated with poor OS and DMFS in basal‐like
   breast cancer (Figure [157]6F,G). Collectively, these data validated
   that the OXPHOS pathway exhibited great value in predicting the risk of
   lymphatic metastasis and prognosis in breast cancer.

3. Discussion

   In this study, we defined a specific cell cluster disseminated from the
   primary tumor site by performing scRNA‐seq in the primary breast cancer
   samples and paired metastatic lymph nodes. By integrating scRNA‐seq and
   ST, we characterized this EDC cluster from both spatial and temporal
   levels. During the evolvement of the early dissemination of breast
   cancer, the EDC cluster undergoes a transition between glycolysis and
   OXPHOS and tends to distribute along the tumor border. The previous
   study investigated the energy anabolism shift using scRNA‐seq in breast
   cancer patient‐derived xenografts models,^[ [158]^21 ^] our study has
   brought this finding more in‐depth by elaborating the bioenergetic
   state once the disseminated cells are seeded. Furthermore, our findings
   have significant clinical relevance since the sequencing is performed
   on paired clinical samples of primary breast cancer and metastatic
   lymph nodes.

   Tumor cells go through a series of evolving events to become
   disseminated malignant cells. The evolvement occurs at multiple levels,
   including the genetic, epigenetic, and transcriptomic alterations,
   endowing the malignant cells with a more invasive phenotype.^[ [159]^29
   ^] The malignant cells share characteristics such as genomic
   instability, cell death resistance, and altered metabolism. Based on
   these cancer hallmarks, they further acquired motility and invasion,
   plasticity, and experienced microenvironment modulation and EMT to
   become disseminated cells and eventually colonize at distant sites.^[
   [160]^30 ^] The fast‐developing sequencing technology now allows us to
   look deeper at the transitions during this process. This study focused
   on the metabolism evolvement in the early disseminated breast cancer
   cells. The metabolic change in cancer cells plays an important role in
   regulating tumor metastasis,^[ [161]^31 ^] and the role of OXPHOS and
   glycolysis in tumor metastasis remains debatable. Increased OXPHOS was
   utilized in circulating cancer cells in breast cancer mouse model,
   where peroxisome proliferator‐activated receptor gamma, coactivator
   1 alpha (PGC‐1α) promoted distant metastases via enhancing this
   pathway.^[ [162]^32 ^] The OXPHOS pathway promoted cellular invasion,
   possibly due to the increased mitochondrial superoxide from overactive
   mitochondrial metabolism.^[ [163]^33 ^] However, another study showed
   that OXPHOS negatively correlates with EMT, and the downregulation of
   genes in this pathway was noticed in patients with high survival
   rates.^[ [164]^34 ^] Inhibition of LDHA, the key enzyme of glycolysis,
   suppressed tumor migration in multiple cancer types.^[ [165]^35 ,
   [166]^36 , [167]^37 ^] These controversial results are possibly due to
   the dynamic metabolic changes during metastasis; the fluctuating levels
   of OXPHOS and glycolysis might interfere with the consistency of the
   results of studies. Our study suggested that the upregulation of the
   OXPHOS is a temporal phase in the process of tumor dissemination, and
   once colonized, the disseminated group exhibits a more malignant
   property with upregulated glycolysis.

   Our study localized EDC clusters at the tumor leading edge, in other
   words, early disseminated tumor cells accumulate along the tumor‐host
   interface. The tumor‐host microenvironment allows extensive cellular
   communication between tumor cells and mesenchymal stromal components.^[
   [168]^38 ^] In squamous cell carcinoma, a tumor‐specific cluster of
   cells localized on the leading edge of tumor, where they interact with
   multiple cell types and exhibited invasive phenotype.^[ [169]^39 ^]
   Cellular interactions are also crucial in breast cancer. Neighboring
   cells such as fibroblasts and immune cells facilitate dissemination.^[
   [170]^40 ^] In this study, we applied CellChat analysis on scRNA‐seq
   and identified that the MIF pathway is highly activated between
   disseminated tumor cells and immune cells. MIF is a cytokine that binds
   and activates receptors CD74/CD44, CXCR2, CXCR4, and CXCR7, which has
   been reported to play an essential role in malignant diseases and as a
   novel therapeutic approach against triple‐negative breast cancer.^[
   [171]^41 , [172]^42 ^] Further studies are still needed to clarify the
   relationship between MIF and metastasis.

   This study still has several limitations. First is the high spatial
   heterogeneity among samples, which makes it hard to obtain shared
   characteristics. The second is the limited sample numbers. We performed
   scRNA‐seq and ST in four patients, limiting our conclusion's
   generalization. However, Liu et al. have characterized the
   microenvironment of primary tumors and paired metastatic lymph nodes by
   scRNA‐seq.^[ [173]^11 ^] In this study, we set their dataset as an
   external validation set to further confirm the metabolic traits of the
   metastatic cancer cells.

   We used two ways to combine scRNA‐seq and ST to make the result more
   convincing. Last but not least, early dissemination could occur either
   in a hematogenous way or via lymph way. However, early detection of
   hematogenous metastasis is difficult and unstable. Here we only focus
   on dissemination in the lymph ways, and further research on early
   hematogenous dissemination is needed.

   Our results uncovered the metabolism alterations in disseminated cells
   and revealed the dynamic evolvement of breast cancer cells during early
   dissemination. Our results warrant further investigation of the breast
   cancer early dissemination mechanism and have potential prognostic and
   therapeutic value.

4. Experimental Section

Ethical Statement

   This study was reviewed and approved by the Institutional Review Board
   of FUSCC. Each patient provided written informed consent before sample
   collection.

Human Specimens

   This study included four patients who underwent surgery at FUSCC. In
   total, eight fresh surgical specimens (four primary tumors and
   four paired metastatic lymph nodes) were sequenced and incorporated in
   further analyses. The metastasis of lymph nodes was confirmed by
   pathologists through both cytological detections during the surgery and
   the paraffin section after surgery. Clinical information, including
   demographics and tumor clinicopathologic characteristics, are
   summarized in Data [174]S1, Supporting Information (the staging was
   performed according to the American Joint Committee on Cancer system
   8 at the time of diagnosis).

Cell Lines and Cell Culture

   The human breast cancer cell line MDA‐MB‐231 and murine 4T1 were
   purchased from ATCC. MDA‐MB‐231 and T‐47D breast cancer cells were
   cultured in Roswell Park Memorial Institute (RPMI) 1640 medium
   supplemented with 5% fetal bovine serum (FBS, Gibco) and 1% pen–strep
   antibiotic (Beyotime). MCF‐7 breast cancer cells were cultured in
   Eagle's Minimal Essential Medium supplemented with 10%FBS and 1%
   pen–strep antibiotic. HEK293T cells were cultured in Dulbecco's
   Modified Eagle Medium (Thermo Fisher) supplemented with 10% FBS and 1%
   pen–strep antibiotic. All of the cell lines were tested and
   authenticated. All cells were cultured at 37 °C in a humidified
   atmosphere containing 5% CO[2]. The cell lines were Mycoplasma‐free and
   authenticated by PCR analysis monthly.

Tumor Sample Handling and Dissociation to a Single‐Cell Suspension for
scRNA‐Seq

   Tumor and lymph node samples were resected by an experienced physician
   to ensure that the tumor samples contained neoplastic, stromal, and
   para‐cancerous tissue and the lymph node samples consisted of
   neoplastic and lymphoid tissue with complete cross section. Fresh
   samples were washed two times with precooling 0.04% bovine serum
   albumin (BSA, MACS) in RPMI 1640 medium and evenly divided into two
   parts for scRNA‐seq and ST. Each sample for scRNA‐seq was cut into
   ≈1 mm^3 pieces and enzymatically digested with 10 mL digestion medium
   containing 0.2% collagenase type IV (Gibco), 0.05% hyaluronidase type
   I‐S (Sigma‐Aldrich), and 0.002% DNase I (Applichem). After 45 min of
   digestion at 37 °C in a shaking water bath, the enzymatic hydrolysate
   was filtered through a 40‐µm pore size nylon mesh and centrifuged at
   1500 rpm for 5 min to obtain a single‐cell suspension. Next,
   erythrocytes were lysed using a red blood cell lysis solution (MACS)
   for 5 min. The cell suspension was centrifuged at 1500 rpm for 5 min.
   After removing the supernatant, the cell pellet was washed twice with
   RPMI 1640 with 0.04% BSA and then re‐suspended in the sorting buffer
   (PBS supplemented with 0.04% FBS). Cell concentration and viability
   were assessed by Luna cell counter.

scRNA‐Seq and Reads Processing

   The cell concentration of fresh cell suspension for each sample was
   adjusted to 700–1200 cells/µL. Then the cell suspension was subjected
   to Chromium Next GEM Single Cell 3ʹ Reagent Kits v3.1 (10x Genomics,
   Pleasanton, CA) for library preparation according to the standard
   protocols. The single cell libraries were sequenced on Illumina NovaSeq
   6000 Systems using paired‐end sequencing (150 bp). The Cell Ranger
   software pipeline (version 3.1.0) provided by 10x Genomics was used to
   demultiplex cellular barcodes, map reads to the genome and
   transcriptome using the STAR aligner, and down‐sample reads as required
   to generate normalized aggregate data across samples, producing a
   matrix of gene counts versus cells.

Quality Control and Batch Effect Correction of scRNA‐Seq Data

   The UMI count matrix was processed using the R package Seurat (version
   3.1.1).^[ [175]^15 ^] To remove low‐quality cells and likely multiplet
   captures, which was a major concern in microdroplet‐based experiments,
   the following criteria were applied to filter out cells with UMI/gene
   numbers out of the limit of mean value +/− twofolds of standard
   deviations assuming a Gaussian distribution of each cells' UMI/gene
   numbers. Following visual inspection of the distribution of cells by
   the fraction of mitochondrial genes expressed, low‐quality cells where
   >10% of the counts belonged to mitochondrial genes were discarded .
   After applying these quality control criteria, 65 968 single cells were
   included in downstream analyses. Library size normalization was
   performed with NormalizeData() function in Seurat to obtain the
   normalized count. Specifically, the global‐scaling normalization method
   “LogNormalize” normalized the gene expression measurements for each
   cell by the total expression, multiplied by a scaling factor (10 000 by
   default), and the results were log‐transformed. After that, the
   normalized expression profiles of all samples were merged together
   using the merge() function in R v3.6.3. To remove the batch effects in
   single cell RNA sequencing data, the mutual nearest neighbors presented
   by Haghverdi et al. was performed with the R package batchelor.^[
   [176]^43 ^] The top 5000 highly variable genes (HVGs) of the merged
   dataset identified by the FindVariableFeatures() function were utilized
   as input for batch effect correction. Finally, the scaled and batch
   effect‐corrected expression profiles of all samples were obtained for
   downstream analyses.

Unsupervised Clustering and Dimensional Reduction

   The top principal components (PCs) were computed based on the gene
   expression profiles of the top 5000 HVGs after batch effect correction.
   The PCElbowPlot() function in Seurat was utilized to select the optimal
   number of PCs for further analysis as recommended by Seurat. The
   FindNeighbors() and FindClusters() functions in Seurat were both
   applied for cell clustering. The RunTSNE() and RunUMAP() function were
   both performed for visualization when appropriate. In the first round
   of “low‐resolution” clustering, 15 clusters were found from
   65 968 single cells. To resolve the high heterogeneity in tumor tissue,
   the second round of “high‐resolution” clustering was conducted to
   identify the finer clusters within in epithelial cell. Procedures of
   the second round of clustering were identical to the first one, both
   starting from the computation of PCs and then clustering cells with the
   optimal resolution obtained from the “clustering tree” method to seek
   the identity for each cluster.

Identification of Marker Genes for Cell Clusters and Cell Type Annotation

   The marker genes in each cluster were identified through the
   FindAllMarkers() function in Seurat. Significant levels of these
   signature genes were determined using the bimod test. For a given
   cluster, FindAllMarkers identified positive markers compared with all
   other cells. Then, the R package SingleR,^[ [177]^44 ^] with the
   reference transcriptomic datasets “Human Primary Cell Atlas”^[ [178]^45
   ^] was used to infer the cell of origin of each of the single cells
   independently and identify cell types. In the first round of
   “low‐resolution” clustering, epithelial cells (EPCAM, CDH1, KRT8,
   KRT18), endothelial cells (PECAM1, CDH5, CD34, VWF), B cells (CD79A,
   CD79B, CD19, IGHD), T/natural killer (NK) cells (CD3D, CD3G, FCGR3A,
   NKG7), and fibroblasts (COL1A1, COL1A2, COL3A1, ACTA2) were identified.

Single‐Cell Copy‐Number Variation Evaluation

   The CNV evaluation of each cell for each region on the chromosome was
   conducted by infercnv R package (version 1.0.4)
   ([179]https://github.com/broadinstitute/inferCNV)^[ [180]^18 ^] based
   on the amount of gene expression in the single‐cell transcriptome data.
   The CNV levels of epithelial cells were calculated and other cells were
   applied as the reference. The genes were sorted by chromosomal
   location, and the average gene expression was calculated using
   101 genes as the sliding window size, and then normal cell expression
   was used as a control to generate the final CNV result file after
   denoising. The parameters of inferCNV analysis included “denoise,”
   default hidden Markov model settings, and a value of 0.1 for ″cutoff."
   To reduce the false‐positive CNV calls, the default Bayesian latent
   mixture model was implemented to identify the posterior probabilities
   of the CNV alterations in each cell, with the default value of 0.5 as
   the threshold. The phylogenetic tree of tumor evolution was draft using
   Uphyloplot2^[ [181]^46 ^] software based on the CNV data, and the
   phylogenetic tree of each sample was classified according to tumor
   location and mutation status.

Identification of Signature Genes among Cell Clusters

   Differentially expressed genes (DEGs) were identified using the
   FindMarkers() function of Seurat package. Significance levels of these
   signature genes were determined using the Wilcoxon rank‐sum test along
   with Bonferroni correction. p‐value <0.05 and |log2foldchange| >0.5 was
   set as the threshold for significantly differential expression.

Inference of Cell Fate by Trajectory Analysis

   The trajectory analysis was performed using the Monocle2 package
   (version 2.9.0)^[ [182]^47 ^] to reveal the cell‐state transitions. The
   ordering genes in the trajectory analysis were determined according to
   each gene's expression (mean expression > 1) and variance level
   (dispersion_empirical > 2 × dispersion_fit) as recommended by Monocle2.
   The importCDS() function converted the Seurat object to a CellDataSet
   object. Then the differentialGeneTest() function was applied to filter
   out the genes used to sort cells (ordering gene, qval < 0.01). The
   reduceDimension() function in Monocle2 was applied to reduce the
   dimensions. All functions were set with default settings. As the normal
   ductal epithelial cells were separated and clearly defined in this
   study, these cells were set as the root state and performed the
   “orderCell” function in Monocle2. The DEGs changed along with the
   pseudotime were identified using the differentialGeneTest() function in
   Monocle2.

Tissue Processing and Visium Data Generation

   Breast cancer tissues were gently washed with cold 1× PBS (Gibco) and
   evenly divided for preparing a single‐cell suspension (see above) and
   ST. The fresh frozen tumor tissues for ST were stored at −80 °C and
   embedded in OCT (Sakura, Alphen aan den Rijn, Netherlands) before
   cryosectioning for RNA extraction. Each tumor cryosection was cut with
   thicknesses of 10 µm in the cryostat (Leica CM1950, Germany) and
   mounted onto ST microarrays. For processing, the tissue was dehydrated
   with isopropanol for 1 min and staining with H&E.

   Bright‐field images were obtained by a whole slide scanner (Pannoramic
   MIDI FL, 3DHISTECH) at 20× resolution. H&E‐stained sections of each
   sample were carefully reviewed by two experienced pathologists to
   confirm the pathology, and were then manually annotated by a trained
   pathologist to identify tumor, stromal, and normal ductal region.

ST Barcoded Microarray Slide Information

   Library preparation slides used were purchased from the ST team
   ([183]https://www.spatialtranscriptomics.com). Each of the spots
   printed onto the array was 55 µm in diameter and 100 µm from center to
   center, covering an area of 6.5 × 6.5 mm^2. Each slide included
   four capture zones. Each capture zone had ≈5000 unique gene expression
   spots.

Tissue Permeabilization, cDNA Synthesis, Library Construction, and Sequencing

   Tissue sections were subjected to Visium Spatial Gene Expression
   Reagent Kits (10× Genomics, Pleasanton, CA) for library preparation
   according to the standard protocols. First, fixed and stained tissue
   sections were permeabilized with 70 µL permeabilization enzyme at 37 °C
   for 18 min. Then each well was washed with 100 µL SSC and 75 µL reverse
   transcription Master Mix containing reverse transcription reagents were
   added for cDNA Synthesis. After incubation with reagents, full‐length
   cDNAs with spatial barcodes generated from polyadenylated mRNAs were
   obtained . At the end of first‐strand synthesis, RT Master Mix was
   removed from the wells with the subsequent addition of 75 µL 0.08 m KOH
   and incubated for 5 min at room temperature. KOH was then removed from
   wells, and the residuals were washed with 100 µL EB buffer. In all,
   75 µL second strand mix was added to each well for second‐strand
   synthesis. Finally, spatial barcodes were encoded and the full‐length
   cDNA was amplified by PCR to obtain a sufficient amount for library
   construction. Libraries were prepared according to the Visium Spatial
   Gene Expression User Guide ([184]CG000239, Rev B, 10x Genomics). The
   constructed library was sequenced on Illumina NovaSeq 6000 Systems with
   a sequencing depth of at least 100 000 reads per spot with a pair‐end
   150 bp reading strategy.

ST Library Sequence Alignment, Annotation, and Data Analysis

   The Space Ranger Pipeline (version 1.0.0, 10X Genomics, Pleasanton,
   CA)^[ [185]^48 ^] was used to conduct data quality statistical analyses
   of the ST sequencing data. Reads were mapped to the human GRCh38 genome
   assembly and aligned using STAR. Spots with more than 10% mitochondrial
   genes or fewer than 200 detected gene counts were discarded. The UMI
   count matrix was processed using the R package Seurat (version 3.1.1).
   First, the data were normalized with sctransform^[ [186]^49 ^] in order
   to account for variance in sequencing depth across data points, detect
   high‐variance features, and store the data in the SCT assay.

   Top variable genes across single cells were identified using the method
   described by Macosko et al.^[ [187]^50 ^] Briefly, the average
   expression and dispersion were calculated for each gene, genes were
   subsequently placed into x bins based on the expression. Principal
   component analysis was performed to reduce the dimensionality on the
   log‐transformed gene‐barcode matrices of top variable genes. Cells were
   clustered based on a graph‐based clustering approach, and were
   visualized in 2D using tSNE. Likelihood ratio test that simultaneously
   tests for changes in mean expression and in the percentage of expressed
   cells was used to identify significantly DEGs between clusters.

   To identify molecular features that correlate with spatial location
   within a tissue, Seurat was used to perform differential expression
   based on pre‐annotated anatomical regions within the tissue, which may
   be determined either from unsupervised clustering or prior knowledge.
   The BMS score was calculated for each ST spot using the
   AddModuleScore() function with default parameters in Seurat with the
   top 25–100 significantly DEGs between EDCs and other epithelial cells
   in four tumor samples. To deconvolute cell types within scRNA‐seq,
   first the cell number of each cell type was down‐sampled to 100 in the
   corresponding scRNA‐seq data, and then SPOTlight^[ [188]^23 ^] was
   applied to deconvolute and seven cell types were mapped to ST section.
   The probability was set lower than 0.2 of each cell type as noise, then
   chose the highest probability out of all cell type as the identified
   cell type.

Determination of Cell Type Enrichment/Depletion by MIA

   MIA was an analytical method that integrated single‐cell expression
   profiles with spatial transcriptome data.^[ [189]^15 ^] This analysis
   was proceeded by first delineating sets of cell type‐specific and
   tissue region‐specific genes and then determining whether their overlap
   was higher (enrichment) or lower (depletion) than expected by chance.
   After obtaining two kinds of gene sets, the significance of the overlap
   between ST genes and cell type marker genes was queried using the
   hypergeometric cumulative distribution, with all genes as the
   background to compute the p‐value. In parallel, test was done for cell
   type depletion by computing −log10(1 − P).

Cell–Cell Communication Analysis

   To infer the potential intercellular communication network between
   clusters, CellChat R package (version 1.1.3)^[ [190]^22 ^] was used to
   quantitatively measure networks through the law of mass action based on
   the average expression values of a ligand by one cell group and that of
   a receptor by another cell group, as well as their cofactors. The
   normalized expression matrix was imported and a cellchat object was
   created with the createCellChat() function. After preprocessing data
   with dentifyOverExpressedGenes(), identifyOverExpressedInteraction(),
   and projectData(), the computeCommunProb, filterCommunication
   (min.cells = 10), and computeCommunProbPathway functions were used to
   calculate potential ligands–receptor interactions. Finally, using the
   “aggregateNet” function in CellChat, the aggregated cell–cell
   communication network was calculated.

KEGG Enrichment Analysis, Hallmark Pathway Enrichment Analysis, GSVA, and
ssGSEA of DEGs

   To determine the biological and molecular functional processes of the
   prioritized gene list as well as their significant enriched pathways,
   the KEGG enrichment analysis and Hallmark pathway enrichment analysis
   were performed by clusterProfiler (Version 3.6.0) with p <0.05. To find
   which functional pathways work in specific cell clusters, GSVA package
   (version 1.30.0)^[ [191]^51 ^] was used to perform gene set enrichment
   analyses based on H dataset (version 7.2) from MsigDB
   ([192]http://www.gsea‐msigdb.org/gsea/msigdb/) and calculated scores
   for pathway activity values in each cell. Finally, LIMMA package
   (version 3.38.3) was used to calculate the difference in signaling
   pathway activity between different groups. Pathways with adjusted
   p‐value >0.05 were not considered for further analysis. The upregulated
   and downregulated pathways were visualized by bar plot. Also, ssGSEA
   was used that define an enrichment score to represent the degree of
   absolute enrichment of a gene set in each sample, including KEGG, Gene
   Ontology Biological Process, and Reactome pathways. Normalized
   enrichment scores could be calculated for OXPHOS or glycolysis pathway.
   The ssGSEA analysis was performed in R package GSVA. p‐value <0.05 was
   considered as statistical significance.

Plasmid/shRNA Construction and Virus Infection

   Human COX6C and DHRS2 were amplified with the reverse‐transcribed cDNA
   from MDA‐MB‐468, T‐47D, and MCF‐7 cell lines and cloned into the
   lentiviral vector pSIN‐Flag (puromycin‐resistant; Addgene), and the
   authenticity was verified by sequencing, respectively. shRNA sequences
   of COX6C and DHRS2 were purchased from Sigma‐Aldrich and cloned into
   the lentiviral vector pLKO.1 (Addgene). A highly efficient lentiviral
   system was used to generate the viruses. The cell lines were infected
   with the lentiviruses, and stable cell lines were established. The
   lentiviral transfection efficiency was more than 90% in all cell lines.

Western Blot Analysis

   Total protein was extracted using RIPA buffer (Thermo Fisher
   Scientific, Waltham, MA, USA) and detected utilizing BCA assay (Thermo
   Fisher Scientific, Waltham, MA, USA). 30 µg of protein per sample was
   separated by SDS‐PAGE, and then transferred onto PVDF membrane (Gene
   Molecular Biotech, Inc., Shanghai, China). After obturated with 5% milk
   for 2 h at room temperature, the membrane was incubated overnight at
   4 °C with primary antibodies as follow: COX6C (2 µg mL^−1, Abcam,
   ab110267), DHRS2 (0.1 µg mL^−1, Abcam, ab220483), β‐Catenin (1:5000,
   Abcam, ab32572), Vimentin (1:1000, Abcam, ab92547), E‐cadherin
   (1:10 000, Abcam, ab40772), Snail (1:1000, Abcam, ab216347), and
   β‐tubulin (1:1000, Abcam, ab179511). Afterward, the membrane was
   incubated with HRP‐conjugated goat anti‐rabbit IgG secondary antibodies
   (1:7500, Abcam, ab205718) or HRP‐conjugated goat anti‐mouse IgG
   secondary antibodies (1:7500, Abcam, ab205719) for 1 h at room
   temperature, the expression level was measured with an ECL kit (Roche
   Diagnostics, Basel, Switzerland) by Western blot imaging system.

Cell Proliferation Assays

   1 × 10^3 cells were incubated in 96‐well plates. Then, 10 µL Cell
   Counting Kit‐8 (Dojindo, Rockville, MD, USA) solution was added to each
   well and incubated for 2 h to evaluate cell proliferation. The
   absorbance of each well was measured at OD450 with a Tecan Infinite
   M1000 PRO (Tecan, Switzerland) from days 1 to 7.

Transwell Assays

   The transwell migration and invasion assays were conducted with Corning
   Transwell Inserts (8.0 µm). For the transwell migration assay,
   1.5 × 10^4 transfected cells suspended in 50 µL serum‐free medium were
   placed in the upper chamber and 600 µL medium (10% FBS) was filled in
   the lower compartment. The cells were incubated at 37 °C for 24 h. The
   successfully translocated cells were fixed with 4% paraformaldehyde and
   stained with 0.1% crystal violet, and counted in four randomly chosen
   fields (200×) under a microscope. For the transwell invasion assay,
   1.2 × 10^5 cells were seeded on transwell coated with 50 µL Matrigel
   (dilution of 1:4 with 0.2% BSA). It is worth noting that Matrigel was
   used to coat membranes for 12 h at 37 °C before invasion assays. The
   culture condition was the same as the transwell migration assay. The
   cells on the lower surface were fixed, stained, and photographed
   microscopically after 48 h.

Immunohistochemical Staining of Formalin‐Fixed, Paraffin‐Embedded Tissue

   Formalin‐fixed, paraffin‐embedded tumor tissues of two additional cases
   were separately cut into 5‐µm sections and mounted on glass slides. The
   slides were baked at 65 °C overnight. After deparaffinization and
   hydration, these slides were boiled in citrate buffer at 100 °C for
   15 min. Subsequently, a 3% H2O2 solution was used to block endogenous
   peroxidase activities for 20 min. To prevent nonspecific antibody
   binding, the slides were incubated with 5% normal goat serum for 1 h at
   room temperature. Then these slides were incubated at 4 °C overnight
   with anti‐COX6C and anti‐DHRS2 primary antibody (Abcam, ab110267 and
   ab220483), 1:100, which was validated for IHC by the manufacturer on
   HeLa cells and RT‐4 cells, respectively. Following washes with TBST for
   three times, the slides were then incubated with HRP‐conjugated goat
   anti‐rabbit/mouse secondary antibody (GeneTech, GK500705) for 1 h at
   room temperature. Sections were stained by DAB and then counterstained
   with hematoxylin according to the manufacturer's instructions.

Quantification and Statistical Analysis

   All the statistical analyses were performed using R (version 3.6.1) and
   GraphPad Prism 8.0. The assay was repeated at least three times and the
   data were presented as mean ± standard deviation. Student's t‐test,
   bimod test, Fisher's exact test, Wilcoxon rank‐sum test, likelihood
   ratio test, spearman correlation analysis, and log‐rank test were
   utilized in this study. p‐values of less than 0.05 were considered
   statistically significant (ns, p ≥ 0.05; *, p < 0.05; **, p < 0.01;
   ***, p < 0.001; ****, p < 0.0001).

Conflict of Interest

   The authors declare no conflict of interest.

Author Contributions

   Y.‐M.L., J.‐Y.G., and Y.‐F.C. contributed equally to this work.
   Conceptualization: K.‐D.Y. and Z.‐M.S. Methodology: Y.‐M.L., J.‐Y.G.,
   L.C., and Y.‐W.C. Investigation: D.M., Y.‐Y.C., and C.‐C.L.
   Visualization: Y.‐M.L. and J.‐Y.G. Supervision: K.‐D.Y., Z.‐M.S.,
   Y.‐Y.X., and C.‐C.L. Writing—original draft: Y.‐M.L. and Y.‐F.C.
   Writing—review & editing: K.‐D.Y. and Z.‐M.S. All authors read and
   approved the final manuscript.

Supporting information

   Supporting Information
   [193]Click here for additional data file.^ (2.6MB, pdf)

   Supplemental Dataset 1
   [194]Click here for additional data file.^ (10.4KB, xlsx)

   Supplemental Dataset 2
   [195]Click here for additional data file.^ (2.2MB, xlsx)

   Supplemental Dataset 3
   [196]Click here for additional data file.^ (37.8KB, xlsx)

   Supplemental Dataset 4
   [197]Click here for additional data file.^ (17.2KB, xlsx)

   Supplemental Dataset 5
   [198]Click here for additional data file.^ (854.8KB, xlsx)

   Supplemental Dataset 6
   [199]Click here for additional data file.^ (17.2KB, xlsx)

   Supplemental Dataset 7
   [200]Click here for additional data file.^ (14.1KB, xlsx)

   Supplemental Dataset 8
   [201]Click here for additional data file.^ (23.3KB, xlsx)

Acknowledgements