Abstract Metastasis remains a leading cause of cancer-related mortality, irrespective of the primary tumour origin. However, the core gene regulatory program governing distinct stages of metastasis across cancers remains poorly understood. We investigate this through single-cell transcriptome analysis encompassing over two hundred patients with metastatic and non-metastatic tumours across six cancer types. Our analysis revealed a prognostic core gene signature that provides insights into the intricate cellular dynamics and gene regulatory networks driving metastasis progression at the pan-cancer and single-cell level. Notably, the dissection of transcription factor networks active across different stages of metastasis, combined with functional perturbation, identified SP1 and KLF5 as key regulators, acting as drivers and suppressors of metastasis, respectively, at critical steps of this transition across multiple cancer types. Through in vivo and in vitro loss of function of SP1 in cancer cells, we revealed its role in driving cancer cell survival, invasive growth, and metastatic colonisation. Furthermore, tumour cells and the microenvironment increasingly engage in communication through WNT signalling as metastasis progresses, driven by SP1. Further validating these observations, a drug repurposing analysis identified distinct FDA-approved drugs with anti-metastasis properties, including inhibitors of WNT signalling across various cancers. Supplementary Information The online version contains supplementary material available at 10.1186/s12943-024-02182-w. Keywords: Transcription Factors, Gene regulation, Single-cell heterogeneity, Metastasis, Cancer Introduction Cancer metastasis dramatically reduces survival and is the greatest cause of death for these patients [[40]1, [41]2]. Metastasis involves cancer cells leaving the primary tumour and colonising distant organs [[42]3, [43]4]. During the early stages of metastasis, cancer cells acquire the epithelial–mesenchymal transition (EMT) program to become mobile and invasive [[44]5, [45]6]. Furthermore, the metastatic progression is influenced by complex interactions of cancer cells with the microenvironment at the primary and secondary sites, manifesting different cell fates [[46]3, [47]4, [48]7]. Despite over 200 drugs approved in the last six decades targeting various aspects of this process, the overall survival in metastatic disease remains poor [[49]8]. Moreover, while all cancer types share hallmarks of metastasis, whether a treatment could target this process irrespective of the tissue origin is unclear [[50]9, [51]10]. Although combination therapy or neoadjuvant therapy can have a therapeutic benefit in several cancers, their drawbacks include triggering of the metastatic cascade and drug resistance [[52]11–[53]13]. Elucidating the potential of when a patient is likely to metastasize and targeting it timely and effectively is the biggest unmet need in clinical practice. Previous studies have focused on investigating features of metastatic potential in specific cancers with several signatures showing pre-clinical utility in their corresponding cancer types [[54]14, [55]15]. For example, in breast cancer, many studies have identified distinct gene expression signatures that can predict metastatic progression [[56]16–[57]18]. Similar advances have been made for lung, colorectal and prostate cancers [[58]19–[59]21]. However, these signatures only apply to the cancer type which they were identified in and thus show no versatile utility for other cancer types [[60]22]. This has further prevented identifying common therapeutic targets for preventing metastasis. In recent years, there has been a shift in the examination of the molecular mechanisms of metastasis at the pan-cancer level, which has uncovered subtypes of metastasis that transcend primary tumours and helped inform precision medicine [[61]23–[62]25]. For example, a recent paper investigating the genomic landscape of primary and metastatic tumours at the pan-cancer level found metastatic lesions to be less heterogeneous than reported for primary tumours, implying that shared transcriptional programs across metastatic tumours might exist [[63]26]. However, these studies applied bulk sequencing techniques, which mask the heterogeneity within the tumour and tumour microenvironment [[64]27]. This is a key limitation, as metastasis is a multi-step process, involving continuous communication within subpopulations of the tumour and with the microenvironment. This information is masked when averaging information in bulk sequencing approaches [[65]27]. These approaches have consequently not been able to help current clinical practices detect early disseminating cells, resulting in underestimation of a patient’s current metastatic state and risk. To address this unmet need and overcome the limitation, we conducted a pan-cancer single-cell transcriptome analysis, involving over 200 patients with metastatic and non-metastatic tumours across six cancer types. Our research has identified a core gene signature that effectively detects disseminating cancer cells and elucidates the cellular dynamics and gene regulatory networks driving stepwise metastasis progression at both the pan-cancer and single-cell levels. Notably, we dissected transcription factor networks active across various stages of metastasis. Through functional perturbation, we identified SP1 and KLF5 as crucial regulators, acting as drivers and suppressors of metastasis, respectively, across multiple cancer types. Additionally, we found that tumour cells and the microenvironment increasingly communicate via WNT signalling as metastasis begins, driven by SP1. Furthermore, a drug repurposing analysis followed by in vitro validation identified FDA-approved drugs with anti-metastasis properties, including inhibitors of WNT signalling, across various cancers. Results A core gene signature provides insights into metastatic potential across multiple cancer types Given the known shared cellular hallmarks of metastasis across cancers, we hypothesised that a conserved metastatic signature may exist across multiple cancer types irrespective of tissue origin. To test this hypothesis, we curated single-cell transcriptome (scRNA-seq) data from 17 studies that encompassing 222 patients across six different cancer types (colorectal, gastric, lung, nasopharyngeal (NPC), ovarian, pancreatic ductal adenocarcinoma (PDAC), breast) (Supplementary Table 1) all of which are known to have a high risk of metastasis. After removing samples with fewer than 100 malignant cells, based on the results of applying inferCNV on each, we integrated each sample using Seurat, resulting in a pan-cancer dataset comprised of expression profiles of more than 1.2 million (1,237,224) cancer cells from 266 tumour samples (Fig. [66]1A-B). Fig. 1. [67]Fig. 1 [68]Open in a new tab Defining the core transcriptional landscape driving pan-cancer metastasis. A Graphical overview of the study, highlighting the cancer types examined, the multi-omics data utilised, the in-silico analysis methods employed, and the validation approaches for in silico findings. B UMAP projection of pan-cancer single-cell RNA-seq (scRNA-seq) data, annotated by cancer types and cell types. C The top panel shows the number of programs associated with the expression of metastatic gene lists, while the bottom panel presents the clustering analysis of genes frequently associated with 25 or more programs across all samples, ranked by their association with the number of archetypes. D The top panel illustrates the aim to define a refined epithelial cell type–specific signature from 286 genes, and the bottom panel displays the cell type specificity scores of each metastatic gene across different cell types, with clusters annotated to highlight cluster-specific expression of the signature. E Metastatic scoring of each TCGA pan-cancer patient, stratifying them into high and low metastatic potential groups. F Kaplan–Meier survival plot of patients stratified by metastatic potential genes in the TCGA pan-cancer cohort Our primary objective was to identify specific groups of cells, regardless of their cell type, with a high potential to metastasize in patients across different types of cancer in our pan-cancer scRNA-seq. To do this, we started by examining the Human Cancer Metastasis Database in which they have a list of 2,183 genes available to download that have been linked to metastasis in various types of cancer [[69]28]. From this database, we split the downloaded gene list into two lists, in which 1,426 genes were linked to metastasis based on one publication [[70]28], and 753 genes were referred by multiple publications [[71]28]. To create a core set of genes that can effectively identify metastatic cells, we performed a detailed analysis on each patient's sample utilizing multiresolution archetypal analysis from the ACTIONet R package [[72]29]. This enabled us to identify common cells across different types of cancer and different patients based on gene expression patterns related to each metastatic gene list. Furthermore, this approach would enable us to select relevant genes, in the context of scRNA-seq, from a list of genes obtained from various omics-methods and cancer types as detailed in their publication [[73]28]. To identify archetype programs with high metastatic potential, we applied our metastatic gene sets to score each archetype, using UCell, across the scRNA-seq data from patients. Specifically, we evaluated the expression levels of genes from each metastatic gene set within each archetype program. Archetypes exhibiting high expression of multiple genes from both sets were deemed to have increased metastatic potential. Supplementary Fig. 1A illustrates representative results for lung and breast cancer patients, where archetypes with high scores (indicated in red) were selected for further analysis. These high-scoring archetype programs, enriched with numerous metastasis-associated genes, likely represent cell subpopulations with a greater propensity for metastasis, as indicated by their shared transcriptomic signatures related to metastatic activity. This approach allowed us to narrow down specific archetype programs potentially linked to metastatic processes. Next, we ranked the archetypes based on their UCell scores and concentrated on those with the highest UCell scores, as these are most likely associated with metastasis. We extracted genes present in 25 or more archetypes across all patients from these top-scoring archetypes. This threshold was determined by the inflection point of our curve analysis (Fig. [74]1C), ensuring that we selected genes consistently expressed in a substantial number of archetypes. We then performed linear regression analysis based on the number of archetypes in which each gene was expressed. We identified the top-ranking genes that define these archetype programs by clustering the genes according to this metric. This analysis culminated in a core metastatic signature of 286 genes (Fig. [75]1C), representing a key set of genes implicated in metastatic progression. Interestingly, these signature genes were associated with important processes related to metastasis, such as cell adhesion, regulation of cell proliferation, and epithelial cell differentiation (Supplementary Fig. 1B) [[76]30]. Next, we scored each patient in our pan-cancer scRNA-seq data for a shared high expression of these 286 genes to identify those with high metastatic potential and to ensure that a specific cancer type was not driving the identified gene list. Interestingly, the outcome showed that this signature could rank each patient from high or low metastatic potential with no clear patterns in tumour type driving this scoring (Supplementary Fig. 1C). We extracted available metadata on tumour stages for each patient to further investigate the relationship between our ranking and clinical parameters. Tumours were categorized as either above or below stage III, and we performed a correlation analysis between the tumour stage and the ranking scores derived from the expression of the 286 signature genes. The analysis revealed a weak positive correlation (r = 0.134) between the ranking scores and tumour stage (Supplementary Fig. 1D). This modest association aligns with our expectations, given that the ranking was based on the entire gene expression profile rather than a subset specifically optimized for predicting metastatic potential. While higher ranking scores tend to be associated with more advanced tumour stages, this ranking is not fully predictive. Next, we aimed to validate the robustness of our gene signature using bulk RNA-seq data, accompanied by patient relapse-free survival information. However, bulk RNA-seq data can be influenced by a mixture of cell types, potentially skewing the results. We hypothesized that a gene signature specific to cancer epithelial cells, the main drivers of tumour behaviour, would provide more accurate prognostic information. To address this, we calculated a cell-type specificity score by evaluating the expression of each of the 286 metastasis-associated genes across all cell types. We then averaged these scores across tumours and visualized them in a heatmap, revealing nine distinct clusters based on gene expression patterns (Fig. [77]1D). Since many genes were not specific to epithelial cells, we focused on clusters 4 and 5, which exhibited high expression in epithelial cells. This refinement yielded 177 genes with high epithelial specificity and minimal expression in other cell types (Fig. [78]1D). This subset provided a more targeted gene signature relevant to cancer epithelial cells. Gene ontology (GO) analysis of these 177 genes revealed their involvement in migratory processes and B cell activation (Supplementary Fig. 1E), both of which are critical in cancer progression and metastasis. We also conducted GO analysis on the remaining 109 genes from the original 286 that were not epithelial-specific. These genes were enriched in pathways related to extracellular matrix organization, angiogenesis, and blood vessel development, highlighting their significant roles in supporting the tumour microenvironment and metastatic processes (Supplementary Fig. 1F). To investigate the prognostic capabilities of our 177 gene signature, we first measured the average expression of all 177 genes for each patient across all cancer types in the TCGA data [[79]31] and showed statistically significantly higher expression in tumour versus normal tissue across 14 out 22 cancer types, in particular, cancers of epithelial cell origin (Supplementary Fig. 1G). Hence, our refined 177 gene metastatic signature can be detected in bulk RNA-seq datasets and holds discriminatory power for tumour versus normal samples. Next, we calculated a metastatic score across all patients in the TCGA pan-cancer cohort and stratified patients into high or low metastatic potential using the median score within each cancer type as a cut-off (Fig. [80]1E). Subsequently, we modelled the relapse free survival (RFS) with high and low metastatic scores for a range of cancers. Here, we evaluated data from various established resources, including the Breast Invasive Carcinoma Collection (BRCA), head and neck squamous cell carcinoma (HNSC), Kidney Renal Clear Cell Carcinoma (KIRC), Rectum Adenocarcinoma (READ), Prostate Adenocarcinoma (PRAD) and Lung Adenocarcinoma (LUAD). In all cases, we found a high metastatic score associated with a reduced RFS (cox hazard ratio: 2.65; p = 7.18 × 10 − 5) (Fig. [81]1F). The significant classification of patients into high and low RFS across multiple cancer types supports the prognostic utility of our 177 gene signature beyond specificity to a single of these cancer types. Identification of molecularly distinct early disseminating cells through metastatic scoring To delineate the potential core mechanisms of metastasis shared across cancers, we examined our scRNA-seq data to identify cells with low and high metastatic potential. Towards this, we used the refined gene signature to score each cell from low to high metastatic potential [[82]32]. Interestingly, most cells displayed metastatic scores in the intermediate range rather than high or low scores (Fig. [83]2A). Subsequently, we binarized the cell scoring into 16 distinct bins. We categorized the cells with scores at the top 20% as high, the bottom 20% as low, and the remaining 60% as mid. Using these data, we investigated the differences in the transcriptional landscape between low and high metastatic cells using pseudobulk differential gene expression analysis (Fig. [84]2B). Here, the genes that were significantly higher expressed in metastatic cells included many genes previously known to be involved in this process in separate cancers, including LCN2 [[85]33, [86]34] and AGR2 [[87]35] (Fig. [88]2B). These observations further validate our approach that the 177-core gene signature can identify cells with a high metastatic potential and further reveals a wider gene regulatory network driving this progression. Interestingly, cells with a high metastatic score showed enrichment for metastasis associated biological processes, including cell motility, locomotion, and immune activation (Fig. [89]2C). Fig. 2. [90]Fig. 2 [91]Open in a new tab A refined metastatic signature uncovers the pan-cancer molecular landscape of metastatic cells. A UMAP projection of cells scored for metastatic potential from low to high using UCell. B Differential gene expression (DEG) analysis comparing cells with low versus high metastatic scores. C Gene Ontology (GO) terms associated with genes exhibiting higher expression in high metastatic potential cells. D Spatial transcriptomics plots illustrating tumour regions (indicated by black dashed lines) with metastatic scoring based on a 177-gene signature using UCell. (E) Expression patterns of the metastatic signature across stromal, tumour body, and invasive edge regions in spatial transcriptomics data. F GO terms enriched in genes with significantly higher expression in invasive edge clusters compared to the tumour body Having identified cells in scRNA-seq data, we next sought to explore the utility of our signature in spatially locating metastatic cells within a tumour using spatial transcriptomics data. This approach aimed to reveal insights into potential cell–cell interactions within the tumour microenvironment and validate that our signature was selecting cells with a high metastatic potential due to their spatial locations. To this end, we obtained and analyzed the spatial transcriptomics data of four breast cancer patients containing the primary tumour, the invasive edge and the surrounding stroma [[92]36]. By implementing UCell scoring [[93]32] of our 177 gene metastatic signature, we successfully identified groups of cells with a high metastatic potential on each patient slide (Fig. [94]2D). Of note, our signature genes were most highly expressed along the invasive edge, followed by the tumour body as compared to the surrounding stroma (Fig. [95]2E). Furthermore, cells with a high metastatic score on the invasive edge were significantly enriched for GO terms associated with metastasis and cell migration (Fig. [96]2F). We further validated these observations by analyzing additional spatial transcriptomic datasets from patients with invasive breast cancer [[97]36] and prostate cancer [[98]37]. Specifically, we applied UCell scoring to our refined 177-gene metastatic signature and overlaid these scores onto the spatial transcriptomics plots of the respective tissues. Visually, the regions annotated as invasive carcinoma exhibited higher UCell scores, indicating elevated expression of our gene signature (Supplementary Fig. 2A-B). To quantify this observation, we extracted the metadata from both datasets and performed a Wilcoxon rank-sum test to compare the UCell scores between invasive carcinoma regions and all other tissue regions. The statistical analysis revealed that invasive carcinoma regions had significantly higher scores than non-carcinoma regions (Supplementary Fig. 2C-D). This result reinforces the specificity and relevance of our 177-gene signature in identifying regions of invasive carcinoma within spatial transcriptomic data. Thus, our 177 gene signature can capture cells with metastatic phenotypes in both scRNA-seq and spatial transcriptomic data from different cancers. Furthermore, using our 177 gene signature, we are classifying cells that are potentially on the verge of dissemination, highlighting the validity and relevance of our signature to the pan-cancer phenomenon. Shared metastatic fate driven by correlated gene expression program across distinct cell types Subsequently, to enhance the visualisation of scored cells in the scRNA-seq data, we employed force-directed graph (FDG), which depicts cellular relationships based on their mutual interactions and similarities in gene expression profiles. This analysis revealed a compelling observation in which cells with elevated metastatic scores clustered together (Fig. [99]3A). This spatial organisation of metastatic scored cells suggested a coherent progression or trajectory associated with metastatic scoring (Fig. [100]3A). To resolve the metastatic trajectories at the pan-cancer level, we used a CellRank computed KNN graph as well as the CytoTRACE pseudotime [[101]38] (Supplementary Fig. 3A). Additionally, we calculated the directed transition matrix and transition streams on the FDG embedding to determine cellular differentiation kinetics. Fig. 3. [102]Fig. 3 [103]Open in a new tab Simulated cell fate mapping reveals metastatic cellular dynamics from low to high metastatic potential across cancers. (A) Force-directed graph of pan-cancer single-cell RNA-seq (scRNA-seq) data, with cells coloured based on their metastatic scores. (B) Metastatic transition matrix illustrating cellular dynamics transitioning from low to high metastatic potential. (C) Identification of genes driving cell fate progression in epithelial (blue) and fibroblast (yellow) lineages during the transition from low to high metastatic potential. (D) Expression levels of CTHRC1 and ANO3 in primary breast cancer compared to metastatic sites, as measured by RNA-sequencing (RNA-seq) Interestingly, we identified a collective metastatic trajectory from low to high metastatic potential irrespective of cancer and cell types (Fig. [104]3B). Importantly, CellRank revealed that cell type-specific genes drive metastasis, implying that distinct genes can drive the same metastatic fate in different cell types (Fig. [105]3C). Of note, a set of these genes have previously been implicated in similar processes. For example, we identified CTHRC1 as a key gene in driving metastatic progression, and it has been previously implicated in several metastatic cancers [[106]39, [107]40]. In particular, CTHRC1 was shown to be secreted from cancer-associated fibroblasts in breast cancer and promote invasion, EMT processes and activation of the Wnt/β-catenin pathway [[108]41]. In epithelial cells, ANO3 and FGP4 were identified as key drivers of metastasis and may represent novel therapeutic targets across cancers. Next, we explored the expression of our cell type-specific metastatic genes in RNA-seq data from primary breast cancer and metastatic sites [[109]42]. CTHRC1 and ANO3 were shown to be highly expressed in metastatic tumours compared to primary tumours (Fig. [110]3D-E). To gain further insights into the underlying transcriptional landscape driving each distinct cell type of metastatic progression, we extracted the top genes driving this progression as calculated by CellRank for epithelial and fibroblast cells and performed gene ontology analysis on each. This showed enrichment for many cancer and metastatic related KEGG terms associated with driving this progression (Supplementary Fig. 3B-C). Of note, WNT signalling was found to be the top pathway driving epithelial metastatic progression as well as in fibroblasts, hinting that many cell types, whilst having distinct gene expression profiles, may share common signalling pathways to drive a metastatic progression. These results highlight that correlated expression of distinct genes in specific cell types and common signalling pathways may contribute to metastatic progression across different cancers, opening the opportunity to identify novel pan-cancer biomarkers. Transcriptional networks underlying metastatic continuum across diverse cancer types Our previous analysis was based solely on primary tumour data and metastatic scoring. Next, we aimed to leverage our 177-gene metastatic signature with pseudotime analysis to identify cells within the primary tumour that have a high potential to metastasize. Additionally, we aimed to identify cells within the secondary site that may have originated from the primary tumour. To achieve this, we obtained scRNA-seq of paired primary PDAC and secondary liver tumours in a PDX model [[111]43]. Following pre-processing, we focused our analysis on epithelial cells only. Next, using our 177 gene metastatic signature, we scored each cell using an unbiased cellular fate trajectory inference using Monocle 2 and observed a continuum. In the mouse data from the primary tumour to the liver metastatic site, the highest-scoring cells appear to align with the low and intermediate pseudotime range, while in human primary breast cancer to lymph node metastasis, they align with the intermediate and later range of pseudotime cells in a separate branch (Fig. [112]4A-B). Identical results were obtained using datasets from paired samples of primary PDAC to lung metastatic site in a PDX model. (Supplementary Fig. 4A&B). Furthermore, similar analyses were obtained using scRNA-seq from primary breast cancer and paired lymph node and lung metastases [[113]44] where our 177 gene metastatic score identified cells on a continuum with the highest scoring cells spanning the intermediate and later pseudotime ranges (Fig. [114]4B, Supplementary Fig. 4C). These observations suggest that our 177-gene signature can identify cells undergoing dissemination and those recently metastasized to the secondary site. Fig. 4. [115]Fig. 4 [116]Open in a new tab Refined metastatic signature can recapitulate the cascade of tumour migration. A Monocle 2 trajectory analysis of paired patient-derived xenograft (PDX) pancreatic ductal adenocarcinoma (PDAC) and liver metastasis cells, with cells coloured by site, pseudotime, and metastatic score using UCell. B Monocle 2 trajectory analysis of paired breast cancer and lymph node metastasis cells, similarly, coloured by site, pseudotime, and metastatic score using UCell. C Differential gene expression along pseudotime for the PDX PDAC and liver metastasis pair, quality of fitting is calculated using McFadden's Pseudo R^2 D Differential gene expression along pseudotime for the breast cancer and lymph node metastasis pair, quality of fitting is calculated using McFadden's Pseudo R^2 (E) Spatial transcriptomic profiling of a breast cancer patient, scored for metastatic potential using UCell. F Trajectory analysis of the breast cancer spatial transcriptomics data, highlighting genes that drive the observed cellular trajectories We next attempted to identify the order in which genes are either switched on or off along the discovered metastasis trajectory and therefore exported our Monocle 2 objects as inputs into the R package GeneSwitches that uses a statistical framework based on logistic regression for this purpose [[117]45]. Interestingly, this analysis in PDX primary-to-liver and human breast cancer-to-lymph node trajectory identified several distinct and shared genes that are dynamically switched on or off as cells moved towards a highly metastatic fate (Fig. [118]4C-D). For example, the transcription factor SP1, a known driver of metastasis, was identified to gain in activity in the later stages in both cases. In PDX primary to liver metastasis trajectory, we also found a dramatic switch between Tet2 and Dnmt1 at the earlier time points (Fig. [119]4C), two enzymes with opposing functions in DNA methylation, which is known to be aberrant in cancer [[120]46]. Utilising human breast cancer spatial-transcriptomics data, we next sought to explore the trajectory from non-invasive ductal carcinoma in situ (DCIS) to invasive carcinoma and identify drivers common with our metastatic gene list. We began by using stlearn [[121]47] to reconstruct cell type evolution in spatial transcriptomics data [[122]47]. After scoring each region with our 177 gene metastatic signature, we aimed to reconstruct the spatial trajectory between a DCIS cluster with a low metastatic score and an IDC cluster with a high metastatic score (Fig. [123]4E-F). Interestingly, this analysis revealed that many genes driving this trajectory were within our pan-cancer 177 gene metastatic signature (Fig. [124]4F), including COL1A2, whose high expression is known to promote the tumour cell proliferation and metastasis in oesophageal cancer [[125]48], highlighting the utility of our pan-cancer 177 gene signature in distinct cancer types. Altogether, our pan-cancer 177 gene metastatic signature can accurately identify and arrange cells along a metastatic continuum across multiple cancer types. Furthermore, these results further highlight that many of the genes within our signature potentially contribute to the dissemination of tumour cells. Emergence of intercellular WNT signalling from the microenvironment to tumour epithelial cells during metastatic progression Communication between cancer cells and the surrounding stromal cells impacts tumour proliferation, metastasis and treatment failure [[126]49, [127]50]; and its disruption holds the potential to combat metastatic progression [[128]27]. Since our 177 gene signature showed high expression in cell types beyond epithelial cells, and each distinct subtype shared similar signalling pathways (Fig. [129]1D), we next sought to investigate the cell-to-cell communication networks driving metastatic progression at the pan-cancer level. In the first instance, we used CellChat [[130]51] to uncover signalling interactions between low, mid and high scored metastatic cells. We calculated the cell–cell communication network for all cell types in high metastasis scored cells, where the width of the edges represents the strength of the communication and revealed a wide range of communication networks between all cell types (Fig. [131]5A). Finally, we compared all the signalling pathways in cells scored as high to low (Supplementary Fig. 5A) and mid (Fig. [132]5B, Supplementary Fig. 5B). Fig. 5. [133]Fig. 5 [134]Open in a new tab Identification of WNT signalling as a key driver of communication networks in metastatic cells. A Total cell–cell interactions across different cell types in samples with high metastatic potential. B Comparison of the number of interactions between metastatic high and metastatic medium scored cells. C WNT signalling interactions among cell types in metastatic medium scored cells. D WNT signalling interactions among cell types in metastatic high scored cells. E Communication networks mediated by WNT signalling in metastatic medium scored cells. F Communication networks mediated by WNT signalling in metastatic high scored cells. G Top ligand interactions driving WNT signalling in metastatic high cells. (H) UCell scoring of WNT target genes across metastatic timepoints, highlighting gradual increases in expression and the implication of WNT signalling in metastatic cells Interestingly, we found that WNT signalling was most active in high metastatic cells, followed by mid and inactive in low metastatic cells across all cancers. WNT signalling has previously been implicated in metastatic progression, and it appeared among one of the top pathways driving epithelial and fibroblast metastatic cell fate (Supplementary Fig. 3A-B). However, the cell types involved in the communication networks were previously unknown [[135]52]. In mid scored cells, endothelial cells were the dominant cell type receiving, sending, mediating (controlling signalling flow), and influencing (a hybrid measure of controlling communication) WNT signalling. In high scored cells, multiple cell types received WNT signalling, with epithelial cells emerging as the dominant receiver along with the endothelial cells (Fig. [136]5 C-F). Notably, we found that this communication was primarily driven by the WNT3A ligand, the FZD8 receptor and the LRP5 co-receptor (Fig. [137]5G). In line with these observations, LRP5 has been shown to correlate significantly with tumour metastasis [[138]53]. The robustness of these observations was further confirmed using another highly cited tool, LIgand-receptor ANalysis framework (LIANA [[139]54]) where we see a higher activity of WNT5A and FZD4 ligand and receptors only in metastatic cells (Supplementary Fig. 5E). Interestingly, genes encoding some of these ligand-receptor pairs were found to be bound by SP1, suggesting its potential involvement in regulating their expression cells undergoing metastasis (Supplementary Fig. 5F). To corroborate these findings, we performed cell–cell communication analysis on spatial transcriptomics data of invasive breast cancer using stlearn [[140]47]. We scored cells using our 177 gene signature and calculated the communication networks between highly scored (i.e. “metastasis high”) cells and the surrounding stromal cells (Fig. [141]4E-F). To further substantiate these findings, we next investigated the dynamics in the expression of WNT signalling target genes during metastatic progression. Towards this, we scored each cell using UCell for WNT target gene expression and plotted their scores across each metastasis timepoint (Fig. [142]5H). Notably, this analysis revealed that the WNT target genes had increasing expression across metastasis time points, with high scored cells having the highest expression. This aims to validate our findings and supports previous observations that WNT signalling plays a key role in the maintenance and proliferation of tumour cells [[143]39, [144]55]. Altogether, these observations establish that emergent WNT signalling from the stroma to tumour epithelial cells potentially plays a key role in the metastatic transition. Drug repurposing analysis reveals distinct FDA-approved drugs targeting pan-cancer metastasis Drug repurposing has recently become highly attractive as it permits new uses of a drug outside the scope of its original medical approval or investigation, accelerating patient support [[145]56]. Given the strong potency of our signature in identifying metastatic cells across cancers, we implemented the drug repurposing recommendation tool ASGARD [[146]58] on our scRNA-seq data to identify any existing drugs with the potential to target metastasis. Here, we first divided our merged pan-cancer scRNA-seq dataset into low metastatic cells (bottom 20% score) and high metastatic cells (top 20% score) (Fig. [147]6A). We then used limma [[148]59] to identify differentially expressed genes between cell types classified as metastasis low vs metastasis high. These consistently differentially expressed genes were then used as inputs to identify drugs that can significantly (FDR < 0.05) reverse their expression levels in the L1000 drug response dataset, which comprises 591,697 drug/compound treatments. We next applied ASGARD and predicted 15 drugs (FDR < 0.05 and overall drug score > 0.99) for the treatment of metastasis from breast, colorectal, lung, NPC, ovarian and/or PDAC cancers (Fig. [149]6B). Fig. 6. [150]Fig. 6 [151]Open in a new tab In silico drug repurposing analysis reveals FDA-approved drugs targeting metastatic cells. A UMAP projection of low and high metastatic cohorts, coloured by distinct cell types. B Drug repurposing strategy aimed at targeting metastatic cells. C Gene Ontology (GO) term analysis of cell type–specific genes targeted by the top three FDA-approved drugs. D Schematic illustrating the targeting of WNT signalling to disrupt cell–cell communication networks that drive metastatic progression The top candidate in our repurposing analysis, Vorinostat, was shown to have a high drug score across five out of 6 cancer types. It is a histone deacetylase (HDAC) inhibitor approved for the treatment of cutaneous T-cell lymphoma and has been used to treat metastatic tumours in several cancer types and is the focus of many clinical trials, including for breast cancer treatment [[152]60–[153]63]. Furthermore, Vorinostat has been shown to have a potential role in modulating cell proliferation via WNT signalling and the cell cycle through degradation of β-catenin, resulting in an inhibition of cell proliferation, with a cell cycle arrest occurring in G1/G0. Additionally, Vorinostat treatment has been shown to impede cell migration [[154]64, [155]65]. We performed a GO analysis to identify the target genes and pathways of the top three candidates (Vorinostat, thioridazine, sirolimus) across the top four cell type clusters based on cell type proportion in each condition (Fig. [156]6C). This analysis uncovered WNT signalling as a target, particularly in epithelial, endothelial, and T cells (Fig. [157]6C). Altogether, these findings suggest that FDA-approved drugs that disrupt WNT signalling could be repurposed to overcome or prevent metastatic progression across multiple cancer types. Conserved gene regulatory networks driving metastatic cascade Next, we investigated how transcription factor (TF) networks contribute to the stage-specific metastasis programs we identified from scRNA-seq data. Utilising CellOracle [[158]66] we defined gene regulatory networks, in which CellOracle [[159]66] constructed GRN models between a TF and its target genes for each metastatic timepoint. We then utilised CellOracle [[160]66] again to assess the contribution of each TF using centrality metrics, resulting in a final list of TFs for each metastatic score category (Fig. [161]7A). Here, degree centrality scoring in the high metastatic cluster gene regulatory network configuration successfully recognised key TFs associated with metastatic disease, such as SP1 and E2F4 (Fig. [162]7A). In contrast, the low and mid metastatic score clusters identified TFs that are known to be associated with reducing or inhibiting metastatic progression, such as STAT1 and KLF5 (Fig. [163]7A) [[164]67, [165]68]. Fig. 7. [166]Fig. 7 [167]Open in a new tab Reconstructing low to high metastatic regulatory networks conserved across cancers. A Transcription factors (TFs) within the gene regulatory network (GRN) associated with different metastatic stages. B Network dynamics of TFs across metastatic stages, coloured by MET_Score stage. C Expression levels of KLF5 and SP1 in normal, tumour, and metastatic samples, as measured by RNA sequencing (RNA-seq). D Genome browser tracks of SP1 ChIP-seq data, highlighting WNT target genes bound by SP1. D Schematic overview of the SP1 ChIP-seq data analysis. The SP1 ChIP-seq data were derived from ENCODE database for metastatic-like HCT116 and non-metastatic-like MCF7 cells and analysed using standard ChIP-seq analysis pipeline. E The barplot shows number of peaks detected for SP1 bound regions in HCT116 and MCF7 cells. F The barplot shows annotation of SP1 bound genes at promoter and non-promoter regions of the genome. G The dotplot represents top enriched pathways of SP1 bound genes in HCT116 and MCF7 cells. The top enriched pathways were derived using hallmark gene signatures from Molecular Signatures Database (MSigDB). H) The browser tracks show SP1 binding signal at hallmark WNT pathway genes As we had details on each regulon across each metastatic stage, we analysed how network connectivity changes during metastasis to gain an insight into the contribution of each GRN along our metastatic continuum. We were particularly interested in two TFs, SP1 and KLF5, given their previous implications in cancer metastasis and opposite kinetics in our data during metastasis progression (Fig. [168]7B). Interestingly, the network scores for SP1 recapitulate the metastatic progression scores, with reduced activity in the low-to-mid scored cells and a sharp increase in the high scored cells (Fig. [169]7B). In contrast, KLF5 had a high network score in the mid scored cells and a sharp decrease in cells with a high metastatic score (Fig. [170]7B) [[171]67]. To further validate these observations, we investigate the expression dynamics of these TFs in normal tissues, primary and metastatic tumours. We found that KLF5 had lower expression in metastatic tumours versus primary tumours and normal tissue, hence potentially acting as a tumour/metastatic suppressor, whereas SP1 displayed the opposite pattern, functioning likely as a tumour/metastatic promoter (Fig. [172]7C). These expression patterns clearly highlight that our observed continuum correlates with metastatic cancer progression and is also in line with previous observations where high expression of SP1 was found to be associated with an unfavourable prognosis across multiple cancer types, which directly correlates with TNM staging [[173]69]. We next sought to explore the possibility of SP1 in directly regulating WNT-related genes in metastatic cells. Towards this, we processed SP1 ChIP-seq from a metastatic (Colon: HCT116) and a non-metastatic (Breast: MCF7) cancer cell line from the ENCODE database and further analysed to identify SP1-drive transcriptional circuitry in these cell models (Fig. [174]7D). Interestingly, we found a dramatically larger number of regions bound by SP1 in metastatic cells compared to non-metastatic cells, suggesting a more active gene regulatory role of SP1 in metastatic cells (Fig. [175]7E). Notably, SP1 binding occurs more in the distal regulatory regions within metastatic cells compared to non-metastatic cells, where SP1 occupancy is at the proximal regulatory sites (Fig. [176]7F). Given the established critical role of distal regulatory elements in defining cell identity, SP1 potentially functions as a dominant driver of metastatic cell features in these cells. Notably, these data showed a clear occupancy of SP1 at regulatory elements of several WNT target genes, such as WNT7B and JUN genes in metastatic cells, but did not target these loci in non-metastatic like cells [[177]70–[178]73] (Fig. [179]7G-H). These results suggest that SP1 is an upstream inducer of WNT signalling genes during metastatic progression. Overall, our data uncovered key TFs that function at different stages of metastatic progression to drive essential driver pathways. SP1 controls cell survival, invasive growth and metastatic colonisation As a proof-of-principle for prioritising TFs as novel therapeutic targets, we examined our breast cancer scRNA-seq and perturbed SP1 in silico (Supplementary Fig. 6A). We scored each cell using our 177 gene signature on an FDG layout of the breast cancer subset and again found that metastatic cells appeared to cluster together (Supplementary Fig. 6B). We first calculated the pseudotime and development flow using CellOracle [[180]66] and found it followed the sample progression as our metastatic score (Fig. [181]8A). Next, we used CellOracle-based simulation of SP1 perturbation to recapitulate the progression from low to high metastatic cell fate. Using the 16 metastatic gene regulatory network configurations inferred by CellOracle, we simulated SP1 knockout signal propagation (expression set to 0 across all cells), enabling the prediction of future gene expression, and hence the direction of cell identity transitions, at single-cell resolution. This simulation predicts a visual shift of high metastatic cell identity toward a low metastatic signature following SP1 knockout (Fig. [182]8B). Fig. 8. [183]Fig. 8 [184]Open in a new tab SP1 and KLF5 have opposing roles in the metastasis program. A Pseudotime calculation of cells, showing overlap with metastatic scoring and the transition from low to high metastatic potential cells. B In silico perturbation of SP1 alters the metastatic transition trajectory. C UMAP projections of MB231 (high metastatic) and HCC1806 (low metastatic) cells, coloured by transcription factor knockdown (TF KD) and non-targeting control (siNTC) cohorts. D UMAP projection with cells scored from low to high metastatic potential using UCell. E Representative immunoblotting images of SP1 knockdown and non-targeting control (NT) MDA-MB-231 cells (n = 4). F Representative Crystal Violet assay images for viability in SP1 knockdown and non-targeting control (NT) MDA-MB-231 cells (n = 4), with quantification of absorbance at 595 nm (right). G Representative images and quantitation of knockdown and control MDA-MB-231 spheroid growth embedded in Collagen-I at day 0 (D0) and day 1 (D1), scale bar = 50 μm (n = 3). H Schematic of lung colonisation assay using vital dye-stained SP1 knockdown (red) and control MDA-MB-231 cells (green) co-injected into the tail vein of NXG mice. Lungs were imaged 24 h post-injection, displaying representative images with a heatmap analysis using QuPath pixel mapping. Cell nuclei are stained with Hoechst 33,342. Quantitation of the area occupied by fluorescent cells in the lungs (%) for siNT and siSP1 MDA-MB-231 cells is shown (right). Scale bar = 100 μm. The same experiment was repeated with inverted vital dye colours (SP1 knockdown in green and NT controls in red) (n = 2). Violin plots display median (blue) with interquartile ranges. p-values were calculated using unpaired t-tests. All n numbers indicate independent experiments unless otherwise stated We repeated this analysis in a mouse scRNA-seq dataset containing primary and matched lung metastases to validate these findings. We scored each cell using our 177 gene signature and calculated the gene regulatory networks at each metastatic stage, which revealed Sp1 as a key regulator in highly metastatic cells (Supplementary Fig. 6C-D). Interestingly, another transcription factor, Klf2, was found to have the highest activity in mid scored cells(Supplementary Fig. 6D) and is a well-known repressor of metastasis in multiple cancer types [[185]74, [186]75]. Calculating the pseudotime and development flow using CellOracle followed the sample progression as our metastatic score (Supplementary Fig. 6E). Notably, the in-silico perturbation of SP1 predicted a clear shift from high to low metastatic potential (Supplementary Fig. 6F). Targeting SP1 TFs has been previously shown to disrupt metastatic cancer in vitro [[187]76]. Collectively, these findings suggest that SP1 inhibitors could potentially prevent or reverse metastatic progression across multiple cancer types. Following the initial in silico analysis, we sought to validate and gain deeper insights into the transcriptional and cell-fate changes resulting from SP1 and KLF5 knockdown in vitro. For this purpose, we identified the MDA-MB-231 cell line as having a high metastatic potential and the HCC1806 cell line as having a low metastatic potential based on our 177-gene signature (Supplementary Fig. 7A-B). Subsequently, we conducted SP1 depletion in the highly metastatic MDA-MB-231 cell line and KLF5 knockdown in the low metastatic potential HCC1806 cell line and performed single-cell transcriptome (scRNA-seq) analysis in biological replicates. Further analysis revealed that the loss of SP1 made the highly metastatic cells cluster together with the low metastatic potential control cells (Fig. [188]8C). Moreover, SP1 knockdown cells exhibited a significant reversal in metastatic scoring of the 177 genes, confirming the inhibitory effects of SP1 knockdown (Fig. [189]8D). Furthermore, gene ontology analysis revealed a remarkable decrease in the expression of genes associated with metastasis (Supplementary Fig. 7C-D). For example, key pathways related to GTPase signalling were prominently downregulated, indicating a potential suppression of the metastatic phenotype [[190]77, [191]78]. Additionally, genes associated with cell migration exhibited reduced expression levels, further supporting the role of SP1 in governing the metastatic gene expression program. Overall, these results highlight a significant role of SP1 in driving the gene expression program underlying metastatic progression. Conversely, KLF5 knockdown in HCC1806 cells showed a significant increase in metastatic scoring, resulting in KLF5 knockdown cells clustering with the high metastatic potential control cells and separating from the control, poorly metastatic HCC1806 cells, indicating an increase in metastatic potential following KLF5 knockdown (Fig. [192]8C-D). In direct contrast to the SP1 depletion, the gene ontologies for KLF5 loss were enriched for pro-metastatic biological processes (Supplementary 7Fig E–F). For example, GTPase signalling pathways and genes associated with cell migration were significantly upregulated, confirming an enhancement in metastatic potential (Supplementary Fig. 8E). To further confirm our in-silico findings, we performed SP1 knockdown in MDA-MB-231 cells and subjected them to a set of assays that measure features of metastatic cells. First, we observed reduced cell viability compared to control cells (Fig. [193]8E-F). Next, we measured the invasive growth of SP1 knockdown cells using the 3D spheroid assay on collagen [[194]79]. We observed reduced invasive growth in SP1 knockdown cells compared to control (Fig. [195]8F). Finally, we performed in vivo lung colonisation assays that measure cancer cell survival in the lung parenchyma. Briefly, siRNA-transfected cells stained with vital dyes were co-injected at a 1:1 ratio into the tail-vein of immunocompromised mice, and the percentage of cells retained in the lungs was quantified [[196]80–[197]82]. We observed that SP1 knockdown cells were less competent to grow in the lung parenchyma than control cells (Fig. [198]8G). Overall, this data suggests that SP1 controls several features of metastatic cancer cells in vitro and in vivo. SP1-driven WNT Pathway activity is essential for metastatic features To explore a direct role for SP1 in driving WNT pathway activity to promote metastatic features, we further analyzed the SP1 ChIP-seq datasets from metastatic (HCT116) and non-metastatic (MCF7) cancer cells (shown in Fig. [199]7D-H). By overlapping SP1-bound genes with those misregulated upon SP1 knockdown, we identified a subset of targets, including key WNT pathway genes (WNT7B, DVL1, JUN, PPP2R5B, and NFATC2), that were downregulated in the absence of SP1 (Fig. [200]9A). Fig. 9. [201]Fig. 9 [202]Open in a new tab Induction of WNT pathway genes by SP1 drives metastatic features. A Venn diagram shows overlap of SP1 bound genes with downregulated genes upon siSP1. The five genes shown inside the venn diagram are WNT pathway genes. B The bar graph shows relative qPCR fold change for the expression of GAPDH, SP1, WNT7B, DVL1, JUNC and NFATC2 upon SP1 knockdown in MDA-MB-231 cells. Each bar indicates the mean of replicate values. Error bar indicates SEM. For statistical analysis, student ‘s t-test is performed (* p < 0.05, *** p < 0.001). Gene expression was normalized to the corresponding expression of each gene upon siNTC control knockdown in MDA-MB-231. C Immunoblotting for SP1 knockdown in MDA-MB-231 (left plot) cells against SP1, a-Tubulin and DVL1. A-Tubulin served as a loading control. siNTC: siRNA for non-targeting control. The right plot showing Immunoblotting for GFP-SP1 overexpression in HCC1806 cells against GFP, a-Tubulin and DVL1. A-Tubulin served as a loading control. siNTC: siRNA for non-targeting control. D The browser tracks show SP1 binding signal at hallmark WNT pathway genes WNT7B and DVL1 in HCT116 and MCF7 cells. E ChIP-qPCR relative fold enrichment for SP1 binding at the promoter sites of WNT7B and DVL1. Each bar indicates the mean of replicate values. Error bar indicates SEM. For statistical analysis, student ‘s ttest is performed (* p < 0.05). IgG is used as a negative antibody control for ChIP. Non-metastatic: HCC1806 cells; metastatic: MDA-MB-231 cells. F The boxplot showing expression of WNT7B and DVL1 in normal, primary and metastatic tumors from colon and breast cancer patients. The expression levels were derived from TNMplot database. G Representative images of Immunofluorescent staining for non-metastatic and metastatic CRC human tissue section. DVL1 in green, WNT7B in red and DAPI in blue. Scale bar indicates 20 microns. Region of interest is marked in dashed yellow rectangle. H) Quantification of immunofluorescent signal intensity for WNT 7B and DVL1. Each bar represents the mean intensity. Error bars indicate SEM. Each dot represents the quantification value for each individual region. For statistical analysis, student ‘s t test was performed (* p < 0.05, ** p < 0.01). I Representative bright field images of incucyte experiment for SP1 overexpressing HCC180 and highly metastatic HCT116 cancer cells with and without treatment with the five selected drugs (vorinostat, Thiaridazine, Niclosamide, Salinomycin and Foxy 5 on Day 0, 2 and 3. J Quantification for the incucyte experiment coupled with all drug treatments shown in (I). Vor: vorinostat, Thio: thioridazine, Salino: salionomycin, Niclo: niclosamide, Foxy: Foxy-5 To further validate SP1's regulation of WNT pathway genes, we performed SP1 knockdown in two metastatic cancer cells MDA-MB-231 and HCT116. The results showed a marked reduction in cell migration upon SP1 knockdown compared to controls (Supplementary Figure S8C, D), consistent with the previous assays (Fig. [203]8E, F) that demonstrated reduced metastatic potential. Importantly, WNT7B and DVL1 were significantly downregulated following SP1 knockdown (Fig. [204]9B-C), reinforcing the idea that the WNT pathway is a downstream target of SP1 in regulating metastasis. We also investigated whether SP1 overexpression alone in non-metastatic cells is sufficient to induce metastatic behavior. Interestingly, such ectopic overexpression of SP1 in HCC1806 cells resulted in a dramatically faster migration, as confirmed by scratch assays and Incucyte Live-Cell imaging (Supplementary Figure S8E-F; Fig. [205]9I, J). Furthermore, this accompanied an induction of WNT pathway genes at the RNA (Supplementary S8A) and protein level (Fig. [206]9C). ChIP-seq analysis further showed that SP1 binds at the regulatory elements of these genes only in metastatic cancer cells and not in non-metastatic cells (Fig. [207]9D). To independently confirm the occupancy of SP1 at WNT pathway genes, we performed chromatin immunoprecipitation assay followed by qPCR (ChIP-qPCR) for WNT7B and DVL1 loci. The results show a clear occupancy of SP1 at the regulatory elements of these genes in independent metastatic cancer cells (MDA-MB-231) but not in non-metastatic (HCC1806) cancer cells (Fig. [208]9E). We extended our analysis by investigating WNT pathway activity in metastatic tumors. First, we injected metastatic breast cancer cells (MDA-MB-231) into mice and collected tumor samples post-metastasis. Staining for WNT7B and DVL1 in these tumors revealed their high expression levels and a strong co-localization (Supplementary S8B). Furthermore, RNA-seq analysis in a large cohort revealed that the expression of SP1-target WNT pathway genes were significantly elevated in metastatic tumors compared to the controls (Fig. [209]9F). These observations were further validated by immunohistochemistry in colorectal cancer patient samples, where DVL1 and WNT7B exhibited significantly higher and more homogeneous expression in metastatic tissues compared to non-metastatic ones (Fig. [210]9G, H). Lastly, we aimed at testing whether such SP1-driven metastatic behaviour can be disrupted by pharmacological approaches for therapeutic purposes. In our earlier analysis, we had identified vorinostat and thioridazine as the top drugs targeting metastatic cells in majority of the cancer types (5 of 6 cancer types) in ASGARD analysis which ranks FDA-approved drugs against cell populations using scRNA-seq datasets (Fig. [211]6B). Furthermore, given the activation of WNT signaling in metastatic cells, we also shortlisted three WNT pathway inhibitors: Niclosamide and Salinomycin (both FDA-approved), and Foxy-5 (currently in Phase 2 clinical trials). Interestingly, application of these drugs in two independent highly metastatic cancer cell lines, MDA-MB-231 (breast) and HCT116 (colon), severely impaired their migratory capacity to varying degrees, with some cell-type specific exceptions (Fig. [212]9I, J; Supplementary Figure S8 G-I). Furthermore, similar effects were observed for SP1-overexpression induced migratory behaviour in normally non-metastatic cancer cells (HCC1806) (Fig. [213]9I, J). These findings underscore the essential role of the SP1/WNT axis in metastasis and highlight the potential for therapeutic targeting of this pathway to inhibit cancer progression. Discussion Our study presents a comprehensive analysis of a 177-gene signature that offers significant insights into the metastatic progression of various cancer types. This pan-cancer approach underscores the utility of a unified genetic framework to understand the complex biology underlying metastasis across diverse tumour types and microenvironments. The 177-gene signature highlights key molecular pathways involved in metastatic dissemination, including those related to cell adhesion, migration, and extracellular matrix remodelling. Notably, several genes within this signature have previously been implicated in metastatic processes in specific cancers, suggesting a broader applicability of these molecular mechanisms across multiple cancer types. The results demonstrate that this gene signature can serve as a predictive tool for metastatic potential, providing a valuable resource for both basic and translational cancer research. The ability to predict metastasis using a common genetic signature could facilitate earlier intervention and personalised treatment strategies, potentially improving patient outcomes. Moreover, the integration of this gene signature with existing clinical parameters could enhance the accuracy of prognostic models. Future studies should aim to validate these findings in larger, independent cohorts and explore the therapeutic implications of targeting these key pathways. Our analysis reveals that the metastatic scoring across samples follows a continuous distribution, highlighting the nuanced spectrum of metastatic potential within individual cells. Utilising cellular dynamics modelling on single cells, we observed that cells with lower metastatic scores tend to progress towards a higher metastatic state. This trend is consistent across various cell and cancer types, suggesting a universal trajectory towards metastasis, which presents novel opportunities for therapeutic intervention aimed at halting metastatic progression universally across cancers. Interestingly, while this shared metastatic fate is common among different cell and cancer types, each exhibits a unique underlying transcriptional program. Notably, genes such as ANO3 and CTHRC1 emerged as novel contributors to pan-cancer metastasis, with their expression levels showing a direct correlation. This finding underscores the potential for these genes to serve as biomarkers or therapeutic targets, warranting further investigation. Moreover, our exploration into the transcriptional programs driving cellular progression towards metastatic high scored cells identified WNT signalling as the primary pathway influencing metastatic progression. This pathway's significant role across different cell types underscores its potential as a target for broad-spectrum anti-metastatic therapies. WNT signalling has been reported to be involved in metastatic progression [[214]55, [215]83]. Our findings reveal specific WNT signaling networks involved in cell-to-cell communication at the single cell level during metastatic progression. This insight opens the opportunity to target WNT signalling to disrupt these communication networks and prevent metastatic progression. We further show that a higher WNT activity is primarily governed by the SP1 transcription factor in high metastatic cells. High levels of SP1 protein have been shown to correlate with tumour cell migration and metastasis in a number of tumour models and patient samples, including gastric and breast cancers [[216]84–[217]88] and WNT signalling activity [[218]89, [219]90]. However, a role for SP1-WNT signalling axis in metastatic progression had remained poorly understood, thereby highlighting the importance of our study which has clearly filled this knowledge gap that applies at the pan-cancer level. Furthermore, our in-silico knockout in breast cancer and paired primary and metastatic site mouse scRNA-seq datasets showed that a loss of SP1 in high metastatic cells could reverse the fates to a low metastatic state. Through our detailed mechanistic investigation in vitro and in vivo validation, we confirmed that SP1 knockdown in metastatic cells MDA-MB-231 and HCT116 resulted in significant suppression of metastatic features which accompanied a downregulation of WNT pathway genes such as WNT7B, DVL1, and JUN. Noticeably, these genes were strongly bound by SP1 in metastatic cells but not in non-metastatic cells. These observations were further confirmed in human tissue sections as well as RNA-seq of low and high metastatic tumours, hence demonstrating the clinical relevance of these findings. Conversely, overexpression of SP1 in lowly metastatic cancer cells was sufficient to confer a migratory behaviour which accompanied upregulation of these markers both at the RNA and protein levels. Furthermore, clustering analysis confirmed the inhibitory effects of SP1 knockdown, while KLF5 knockdown in low metastatic HCC1806 cells resulted in the opposite effect, enhancing metastatic gene ontologies and upregulating GTPase signalling. GTPase signalling plays a crucial role in cancer metastasis. GTPases, such as Ras and Rho, act as molecular switches that regulate cellular processes, including migration [[220]91]. Activation of GTPases promotes dynamic changes in the cytoskeleton, leading to cell protrusions and motility, and Rho GTPases control actomyosin contractility [[221]92]. Dysregulation of GTPase signalling contributes to enhanced cancer cell migration and invasion, promoting metastatic dissemination [[222]93]. Understanding the intricate interplay between GTPase signalling pathways and cell migration mechanisms is vital for developing targeted therapies to impede cancer metastasis and improve patient outcomes. These intriguing findings shed light on the complex and context-dependent roles of SP1 and KLF5 in cancer metastasis. Of note, there have previously been links between KLF5 expression and SP1 in breast and prostate cancers, linking elevated KLF expression with tumour suppressive functions [[223]94]. Further experiments showed that SP1 is critical for cancer cell survival, invasive growth and metastatic colonisation, highlighting its critical importance in metastasis at the pan-cancer level. In conclusion, our integrated analysis of scRNA-seq and metastatic scoring data reveals the opposing roles of SP1 and KLF5 in regulating metastasis by governing the underlying gene expression program in cancer cells. Ultimately, these findings contribute to a better understanding of the metastatic process and offer potential targets for precision medicine approaches in cancer treatment. Future studies aimed at unravelling the downstream targets and crosstalk between SP1 and KLF5 could provide valuable insights into novel therapeutic strategies for managing metastatic breast cancer. Lastly, we set out to explore whether our findings can be used for pharmacological targeting to block metastatic activity across cancers. Our ASGARD analysis revealed FDA-approved vorinostat and thiaridazine as the most effective drugs in targeting metastatic cells in the majority of cancer types. Additionally, given our findings on the activation of WNT signaling in metastatic cells, we included three WNT pathway inhibitors: Niclosamide and Salinomycin (both FDA-approved), and Foxy-5 (in Phase 2 clinical trials). Notably, applying these drugs to naturally metastatic cancer cells or cancer cells forced to migrate with SP1 overexpression markedly reduced their migratory abilities. Vorinostat is a histone deacetylase (HDAC) inhibitor, FDA-approved for the treatment of cutaneous T cell lymphoma [[224]95, [225]96]. In addition, several preclinical and clinical investigations have its promising potential in inhibiting metastasis in cancer [[226]97–[227]100]. Thiaridazine is a low-potency typical antipsychotic but has been shown to have anticancer effects in the brain [[228]101, [229]102] and aggressive breast cancer [[230]103, [231]104]. Similarly, WNT inhibitors salinomycin, Niclosamide and Foxy-5 have previously been known to have anti-tumour activity across cancers [[232]105–[233]109]. These findings demonstrate the therapeutic potential of these drugs in targeting and preventing metastasis in aggressive cancers. Conclusions In this study, we investigated pan-cancer core mechanisms of metastasis by performing the largest single-cell transcriptome analysis involving over 200 patients with both metastatic and non-metastatic tumours across six cancer types. Our findings uncovered a prognostic core gene signature that sheds light on the complex cellular dynamics and gene regulatory networks that govern metastasis. Specifically, the examination of transcription factor networks active at different stages of metastasis, along with functional perturbation experiments, identified SP1 and KLF5 as key regulators at critical transition steps of this process. SP1 acts as a driver of metastasis, while KLF5 functions as a suppressor across multiple cancer types at defined time points of the metastatic transition. In vivo and in vitro loss-of-function studies of SP1 in cancer cells demonstrated its critical role in promoting cancer cell survival, invasive growth, and metastatic colonisation. Moreover, our results showed that as metastasis advances, tumour cells and their microenvironment increasingly communicate through WNT signalling, a process that is driven by SP1. Supporting these findings, a drug repurposing analysis identified several FDA-approved drugs with potential anti-metastasis properties, including inhibitors of WNT signalling, effective across various cancer types. These findings mark a groundbreaking advancement in cancer research by unveiling the core gene regulatory circuitry driving metastasis conserved across various cancers and discovering novel therapeutic avenues. Methods Processing and annotation of scRNA-seq Single-cell RNA sequencing data obtained through droplet-based 10 × Genomics technology was selectively utilised for meaningful comparisons in the analysis. Raw count matrices consisting of Unique Molecular Identifiers (UMIs) and corresponding cell metadata were aggregated from diverse sources (Supplementary Table 1). Quality control Preliminary quality control procedures were conducted for each individual dataset using the Seurat R package (v4.0). Cells selected for subsequent analysis were those with more than 200 detected genes and genes identified in a minimum of 3 cells. To identify and eliminate potential cell doublets, scDblFinder v1.6.0 was applied with default parameters. Cells manifesting a mitochondrial transcript content exceeding 20% were also excluded to mitigate confounding effects. To identify samples with fewer than 100 malignant cells we utilised inferCNV with default parameters. Normalisation and transformation Normalisation was achieved by scaling UMI counts with respect to library size, followed by a log transformation to stabilise variance. Subsequently, the dataset underwent Principal Components Analysis (PCA) to reduce dimensionality. Clustering and visualisation Utilising the first 30 principal components, a nearest neighbour graph was constructed, facilitating subsequent clustering via the Louvain algorithm at a lower resolution (FindClusters, resolution = 0.2). For illustrative purposes, uniform manifold approximation and projection (UMAP) embeddings were produced based on 30 principal components, enabling effective visualisation. Cell Identity annotation SingleR v1.6.1 in conjunction with the Human Primary Cell Atlas from the celldex R package (v1.2.0) was employed to accurately annotate individual cell identities within each sample. Importantly, the references for annotation encompassed expected cell types within the