Abstract

   Metastasis remains a leading cause of cancer-related mortality,
   irrespective of the primary tumour origin. However, the core gene
   regulatory program governing distinct stages of metastasis across
   cancers remains poorly understood. We investigate this through
   single-cell transcriptome analysis encompassing over two hundred
   patients with metastatic and non-metastatic tumours across six cancer
   types. Our analysis revealed a prognostic core gene signature that
   provides insights into the intricate cellular dynamics and gene
   regulatory networks driving metastasis progression at the pan-cancer
   and single-cell level. Notably, the dissection of transcription factor
   networks active across different stages of metastasis, combined with
   functional perturbation, identified SP1 and KLF5 as key regulators,
   acting as drivers and suppressors of metastasis, respectively, at
   critical steps of this transition across multiple cancer types. Through
   in vivo and in vitro loss of function of SP1 in cancer cells, we
   revealed its role in driving cancer cell survival, invasive growth, and
   metastatic colonisation. Furthermore, tumour cells and the
   microenvironment increasingly engage in communication through WNT
   signalling as metastasis progresses, driven by SP1. Further validating
   these observations, a drug repurposing analysis identified distinct
   FDA-approved drugs with anti-metastasis properties, including
   inhibitors of WNT signalling across various cancers.

Supplementary Information

   The online version contains supplementary material available at
   10.1186/s12943-024-02182-w.

   Keywords: Transcription Factors, Gene regulation, Single-cell
   heterogeneity, Metastasis, Cancer

Introduction

   Cancer metastasis dramatically reduces survival and is the greatest
   cause of death for these patients [[40]1, [41]2]. Metastasis involves
   cancer cells leaving the primary tumour and colonising distant organs
   [[42]3, [43]4]. During the early stages of metastasis, cancer cells
   acquire the epithelial–mesenchymal transition (EMT) program to become
   mobile and invasive [[44]5, [45]6]. Furthermore, the metastatic
   progression is influenced by complex interactions of cancer cells with
   the microenvironment at the primary and secondary sites, manifesting
   different cell fates [[46]3, [47]4, [48]7]. Despite over 200 drugs
   approved in the last six decades targeting various aspects of this
   process, the overall survival in metastatic disease remains poor
   [[49]8]. Moreover, while all cancer types share hallmarks of
   metastasis, whether a treatment could target this process irrespective
   of the tissue origin is unclear [[50]9, [51]10]. Although combination
   therapy or neoadjuvant therapy can have a therapeutic benefit in
   several cancers, their drawbacks include triggering of the metastatic
   cascade and drug resistance [[52]11–[53]13]. Elucidating the potential
   of when a patient is likely to metastasize and targeting it timely and
   effectively is the biggest unmet need in clinical practice.

   Previous studies have focused on investigating features of metastatic
   potential in specific cancers with several signatures showing
   pre-clinical utility in their corresponding cancer types [[54]14,
   [55]15]. For example, in breast cancer, many studies have identified
   distinct gene expression signatures that can predict metastatic
   progression [[56]16–[57]18]. Similar advances have been made for lung,
   colorectal and prostate cancers [[58]19–[59]21]. However, these
   signatures only apply to the cancer type which they were identified in
   and thus show no versatile utility for other cancer types [[60]22].
   This has further prevented identifying common therapeutic targets for
   preventing metastasis.

   In recent years, there has been a shift in the examination of the
   molecular mechanisms of metastasis at the pan-cancer level, which has
   uncovered subtypes of metastasis that transcend primary tumours and
   helped inform precision medicine [[61]23–[62]25]. For example, a recent
   paper investigating the genomic landscape of primary and metastatic
   tumours at the pan-cancer level found metastatic lesions to be less
   heterogeneous than reported for primary tumours, implying that shared
   transcriptional programs across metastatic tumours might exist
   [[63]26]. However, these studies applied bulk sequencing techniques,
   which mask the heterogeneity within the tumour and tumour
   microenvironment [[64]27]. This is a key limitation, as metastasis is a
   multi-step process, involving continuous communication within
   subpopulations of the tumour and with the microenvironment. This
   information is masked when averaging information in bulk sequencing
   approaches [[65]27]. These approaches have consequently not been able
   to help current clinical practices detect early disseminating cells,
   resulting in underestimation of a patient’s current metastatic state
   and risk.

   To address this unmet need and overcome the limitation, we conducted a
   pan-cancer single-cell transcriptome analysis, involving over 200
   patients with metastatic and non-metastatic tumours across six cancer
   types. Our research has identified a core gene signature that
   effectively detects disseminating cancer cells and elucidates the
   cellular dynamics and gene regulatory networks driving stepwise
   metastasis progression at both the pan-cancer and single-cell levels.
   Notably, we dissected transcription factor networks active across
   various stages of metastasis. Through functional perturbation, we
   identified SP1 and KLF5 as crucial regulators, acting as drivers and
   suppressors of metastasis, respectively, across multiple cancer types.
   Additionally, we found that tumour cells and the microenvironment
   increasingly communicate via WNT signalling as metastasis begins,
   driven by SP1. Furthermore, a drug repurposing analysis followed by in
   vitro validation identified FDA-approved drugs with anti-metastasis
   properties, including inhibitors of WNT signalling, across various
   cancers.

Results

A core gene signature provides insights into metastatic potential across
multiple cancer types

   Given the known shared cellular hallmarks of metastasis across cancers,
   we hypothesised that a conserved metastatic signature may exist across
   multiple cancer types irrespective of tissue origin. To test this
   hypothesis, we curated single-cell transcriptome (scRNA-seq) data from
   17 studies that encompassing 222 patients across six different cancer
   types (colorectal, gastric, lung, nasopharyngeal (NPC), ovarian,
   pancreatic ductal adenocarcinoma (PDAC), breast) (Supplementary
   Table 1) all of which are known to have a high risk of metastasis.
   After removing samples with fewer than 100 malignant cells, based on
   the results of applying inferCNV on each, we integrated each sample
   using Seurat, resulting in a pan-cancer dataset comprised of expression
   profiles of more than 1.2 million (1,237,224) cancer cells from 266
   tumour samples (Fig. [66]1A-B).

Fig. 1.

   [67]Fig. 1
   [68]Open in a new tab

   Defining the core transcriptional landscape driving pan-cancer
   metastasis. A Graphical overview of the study, highlighting the cancer
   types examined, the multi-omics data utilised, the in-silico analysis
   methods employed, and the validation approaches for in silico findings.
   B UMAP projection of pan-cancer single-cell RNA-seq (scRNA-seq) data,
   annotated by cancer types and cell types. C The top panel shows the
   number of programs associated with the expression of metastatic gene
   lists, while the bottom panel presents the clustering analysis of genes
   frequently associated with 25 or more programs across all samples,
   ranked by their association with the number of archetypes. D The top
   panel illustrates the aim to define a refined epithelial cell
   type–specific signature from 286 genes, and the bottom panel displays
   the cell type specificity scores of each metastatic gene across
   different cell types, with clusters annotated to highlight
   cluster-specific expression of the signature. E Metastatic scoring of
   each TCGA pan-cancer patient, stratifying them into high and low
   metastatic potential groups. F Kaplan–Meier survival plot of patients
   stratified by metastatic potential genes in the TCGA pan-cancer cohort

   Our primary objective was to identify specific groups of cells,
   regardless of their cell type, with a high potential to metastasize in
   patients across different types of cancer in our pan-cancer scRNA-seq.
   To do this, we started by examining the Human Cancer Metastasis
   Database in which they have a list of 2,183 genes available to download
   that have been linked to metastasis in various types of cancer
   [[69]28]. From this database, we split the downloaded gene list into
   two lists, in which 1,426 genes were linked to metastasis based on one
   publication [[70]28], and 753 genes were referred by multiple
   publications [[71]28].

   To create a core set of genes that can effectively identify metastatic
   cells, we performed a detailed analysis on each patient's sample
   utilizing multiresolution archetypal analysis from the ACTIONet R
   package [[72]29]. This enabled us to identify common cells across
   different types of cancer and different patients based on gene
   expression patterns related to each metastatic gene list. Furthermore,
   this approach would enable us to select relevant genes, in the context
   of scRNA-seq, from a list of genes obtained from various omics-methods
   and cancer types as detailed in their publication [[73]28].

   To identify archetype programs with high metastatic potential, we
   applied our metastatic gene sets to score each archetype, using UCell,
   across the scRNA-seq data from patients. Specifically, we evaluated the
   expression levels of genes from each metastatic gene set within each
   archetype program. Archetypes exhibiting high expression of multiple
   genes from both sets were deemed to have increased metastatic
   potential. Supplementary Fig. 1A illustrates representative results for
   lung and breast cancer patients, where archetypes with high scores
   (indicated in red) were selected for further analysis. These
   high-scoring archetype programs, enriched with numerous
   metastasis-associated genes, likely represent cell subpopulations with
   a greater propensity for metastasis, as indicated by their shared
   transcriptomic signatures related to metastatic activity. This approach
   allowed us to narrow down specific archetype programs potentially
   linked to metastatic processes.

   Next, we ranked the archetypes based on their UCell scores and
   concentrated on those with the highest UCell scores, as these are most
   likely associated with metastasis. We extracted genes present in 25 or
   more archetypes across all patients from these top-scoring archetypes.
   This threshold was determined by the inflection point of our curve
   analysis (Fig. [74]1C), ensuring that we selected genes consistently
   expressed in a substantial number of archetypes. We then performed
   linear regression analysis based on the number of archetypes in which
   each gene was expressed. We identified the top-ranking genes that
   define these archetype programs by clustering the genes according to
   this metric. This analysis culminated in a core metastatic signature of
   286 genes (Fig. [75]1C), representing a key set of genes implicated in
   metastatic progression. Interestingly, these signature genes were
   associated with important processes related to metastasis, such as cell
   adhesion, regulation of cell proliferation, and epithelial cell
   differentiation (Supplementary Fig. 1B) [[76]30].

   Next, we scored each patient in our pan-cancer scRNA-seq data for a
   shared high expression of these 286 genes to identify those with high
   metastatic potential and to ensure that a specific cancer type was not
   driving the identified gene list. Interestingly, the outcome showed
   that this signature could rank each patient from high or low metastatic
   potential with no clear patterns in tumour type driving this scoring
   (Supplementary Fig. 1C).

   We extracted available metadata on tumour stages for each patient to
   further investigate the relationship between our ranking and clinical
   parameters. Tumours were categorized as either above or below stage
   III, and we performed a correlation analysis between the tumour stage
   and the ranking scores derived from the expression of the 286 signature
   genes. The analysis revealed a weak positive correlation (r = 0.134)
   between the ranking scores and tumour stage (Supplementary Fig. 1D).
   This modest association aligns with our expectations, given that the
   ranking was based on the entire gene expression profile rather than a
   subset specifically optimized for predicting metastatic potential.
   While higher ranking scores tend to be associated with more advanced
   tumour stages, this ranking is not fully predictive.

   Next, we aimed to validate the robustness of our gene signature using
   bulk RNA-seq data, accompanied by patient relapse-free survival
   information. However, bulk RNA-seq data can be influenced by a mixture
   of cell types, potentially skewing the results. We hypothesized that a
   gene signature specific to cancer epithelial cells, the main drivers of
   tumour behaviour, would provide more accurate prognostic information.
   To address this, we calculated a cell-type specificity score by
   evaluating the expression of each of the 286 metastasis-associated
   genes across all cell types. We then averaged these scores across
   tumours and visualized them in a heatmap, revealing nine distinct
   clusters based on gene expression patterns (Fig. [77]1D).

   Since many genes were not specific to epithelial cells, we focused on
   clusters 4 and 5, which exhibited high expression in epithelial cells.
   This refinement yielded 177 genes with high epithelial specificity and
   minimal expression in other cell types (Fig. [78]1D). This subset
   provided a more targeted gene signature relevant to cancer epithelial
   cells. Gene ontology (GO) analysis of these 177 genes revealed their
   involvement in migratory processes and B cell activation (Supplementary
   Fig. 1E), both of which are critical in cancer progression and
   metastasis. We also conducted GO analysis on the remaining 109 genes
   from the original 286 that were not epithelial-specific. These genes
   were enriched in pathways related to extracellular matrix organization,
   angiogenesis, and blood vessel development, highlighting their
   significant roles in supporting the tumour microenvironment and
   metastatic processes (Supplementary Fig. 1F).

   To investigate the prognostic capabilities of our 177 gene signature,
   we first measured the average expression of all 177 genes for each
   patient across all cancer types in the TCGA data [[79]31] and showed
   statistically significantly higher expression in tumour versus normal
   tissue across 14 out 22 cancer types, in particular, cancers of
   epithelial cell origin (Supplementary Fig. 1G). Hence, our refined 177
   gene metastatic signature can be detected in bulk RNA-seq datasets and
   holds discriminatory power for tumour versus normal samples.

   Next, we calculated a metastatic score across all patients in the TCGA
   pan-cancer cohort and stratified patients into high or low metastatic
   potential using the median score within each cancer type as a cut-off
   (Fig. [80]1E). Subsequently, we modelled the relapse free survival
   (RFS) with high and low metastatic scores for a range of cancers. Here,
   we evaluated data from various established resources, including the
   Breast Invasive Carcinoma Collection (BRCA), head and neck squamous
   cell carcinoma (HNSC), Kidney Renal Clear Cell Carcinoma (KIRC), Rectum
   Adenocarcinoma (READ), Prostate Adenocarcinoma (PRAD) and Lung
   Adenocarcinoma (LUAD). In all cases, we found a high metastatic score
   associated with a reduced RFS (cox hazard ratio: 2.65;
   p = 7.18 × 10 − 5) (Fig. [81]1F). The significant classification of
   patients into high and low RFS across multiple cancer types supports
   the prognostic utility of our 177 gene signature beyond specificity to
   a single of these cancer types.

Identification of molecularly distinct early disseminating cells through
metastatic scoring

   To delineate the potential core mechanisms of metastasis shared across
   cancers, we examined our scRNA-seq data to identify cells with low and
   high metastatic potential. Towards this, we used the refined gene
   signature to score each cell from low to high metastatic potential
   [[82]32]. Interestingly, most cells displayed metastatic scores in the
   intermediate range rather than high or low scores (Fig. [83]2A).
   Subsequently, we binarized the cell scoring into 16 distinct bins. We
   categorized the cells with scores at the top 20% as high, the bottom
   20% as low, and the remaining 60% as mid. Using these data, we
   investigated the differences in the transcriptional landscape between
   low and high metastatic cells using pseudobulk differential gene
   expression analysis (Fig. [84]2B). Here, the genes that were
   significantly higher expressed in metastatic cells included many genes
   previously known to be involved in this process in separate cancers,
   including LCN2 [[85]33, [86]34] and AGR2 [[87]35] (Fig. [88]2B). These
   observations further validate our approach that the 177-core gene
   signature can identify cells with a high metastatic potential and
   further reveals a wider gene regulatory network driving this
   progression. Interestingly, cells with a high metastatic score showed
   enrichment for metastasis associated biological processes, including
   cell motility, locomotion, and immune activation (Fig. [89]2C).

Fig. 2.

   [90]Fig. 2
   [91]Open in a new tab

   A refined metastatic signature uncovers the pan-cancer molecular
   landscape of metastatic cells. A UMAP projection of cells scored for
   metastatic potential from low to high using UCell. B Differential gene
   expression (DEG) analysis comparing cells with low versus high
   metastatic scores. C Gene Ontology (GO) terms associated with genes
   exhibiting higher expression in high metastatic potential cells.
   D Spatial transcriptomics plots illustrating tumour regions (indicated
   by black dashed lines) with metastatic scoring based on a 177-gene
   signature using UCell. (E) Expression patterns of the metastatic
   signature across stromal, tumour body, and invasive edge regions in
   spatial transcriptomics data. F GO terms enriched in genes with
   significantly higher expression in invasive edge clusters compared to
   the tumour body

   Having identified cells in scRNA-seq data, we next sought to explore
   the utility of our signature in spatially locating metastatic cells
   within a tumour using spatial transcriptomics data. This approach aimed
   to reveal insights into potential cell–cell interactions within the
   tumour microenvironment and validate that our signature was selecting
   cells with a high metastatic potential due to their spatial locations.
   To this end, we obtained and analyzed the spatial transcriptomics data
   of four breast cancer patients containing the primary tumour, the
   invasive edge and the surrounding stroma [[92]36]. By implementing
   UCell scoring [[93]32] of our 177 gene metastatic signature, we
   successfully identified groups of cells with a high metastatic
   potential on each patient slide (Fig. [94]2D). Of note, our signature
   genes were most highly expressed along the invasive edge, followed by
   the tumour body as compared to the surrounding stroma (Fig. [95]2E).
   Furthermore, cells with a high metastatic score on the invasive edge
   were significantly enriched for GO terms associated with metastasis and
   cell migration (Fig. [96]2F). We further validated these observations
   by analyzing additional spatial transcriptomic datasets from patients
   with invasive breast cancer [[97]36] and prostate cancer [[98]37].
   Specifically, we applied UCell scoring to our refined 177-gene
   metastatic signature and overlaid these scores onto the spatial
   transcriptomics plots of the respective tissues. Visually, the regions
   annotated as invasive carcinoma exhibited higher UCell scores,
   indicating elevated expression of our gene signature (Supplementary
   Fig. 2A-B).

   To quantify this observation, we extracted the metadata from both
   datasets and performed a Wilcoxon rank-sum test to compare the UCell
   scores between invasive carcinoma regions and all other tissue regions.
   The statistical analysis revealed that invasive carcinoma regions had
   significantly higher scores than non-carcinoma regions (Supplementary
   Fig. 2C-D). This result reinforces the specificity and relevance of our
   177-gene signature in identifying regions of invasive carcinoma within
   spatial transcriptomic data. Thus, our 177 gene signature can capture
   cells with metastatic phenotypes in both scRNA-seq and spatial
   transcriptomic data from different cancers. Furthermore, using our 177
   gene signature, we are classifying cells that are potentially on the
   verge of dissemination, highlighting the validity and relevance of our
   signature to the pan-cancer phenomenon.

Shared metastatic fate driven by correlated gene expression program across
distinct cell types

   Subsequently, to enhance the visualisation of scored cells in the
   scRNA-seq data, we employed force-directed graph (FDG), which depicts
   cellular relationships based on their mutual interactions and
   similarities in gene expression profiles. This analysis revealed a
   compelling observation in which cells with elevated metastatic scores
   clustered together (Fig. [99]3A). This spatial organisation of
   metastatic scored cells suggested a coherent progression or trajectory
   associated with metastatic scoring (Fig. [100]3A). To resolve the
   metastatic trajectories at the pan-cancer level, we used a CellRank
   computed KNN graph as well as the CytoTRACE pseudotime [[101]38]
   (Supplementary Fig. 3A). Additionally, we calculated the directed
   transition matrix and transition streams on the FDG embedding to
   determine cellular differentiation kinetics.

Fig. 3.

   [102]Fig. 3
   [103]Open in a new tab

   Simulated cell fate mapping reveals metastatic cellular dynamics from
   low to high metastatic potential across cancers. (A) Force-directed
   graph of pan-cancer single-cell RNA-seq (scRNA-seq) data, with cells
   coloured based on their metastatic scores. (B) Metastatic transition
   matrix illustrating cellular dynamics transitioning from low to high
   metastatic potential. (C) Identification of genes driving cell fate
   progression in epithelial (blue) and fibroblast (yellow) lineages
   during the transition from low to high metastatic potential. (D)
   Expression levels of CTHRC1 and ANO3 in primary breast cancer compared
   to metastatic sites, as measured by RNA-sequencing (RNA-seq)

   Interestingly, we identified a collective metastatic trajectory from
   low to high metastatic potential irrespective of cancer and cell types
   (Fig. [104]3B). Importantly, CellRank revealed that cell type-specific
   genes drive metastasis, implying that distinct genes can drive the same
   metastatic fate in different cell types (Fig. [105]3C). Of note, a set
   of these genes have previously been implicated in similar processes.
   For example, we identified CTHRC1 as a key gene in driving metastatic
   progression, and it has been previously implicated in several
   metastatic cancers [[106]39, [107]40]. In particular, CTHRC1 was shown
   to be secreted from cancer-associated fibroblasts in breast cancer and
   promote invasion, EMT processes and activation of the Wnt/β-catenin
   pathway [[108]41]. In epithelial cells, ANO3 and FGP4 were identified
   as key drivers of metastasis and may represent novel therapeutic
   targets across cancers. Next, we explored the expression of our cell
   type-specific metastatic genes in RNA-seq data from primary breast
   cancer and metastatic sites [[109]42]. CTHRC1 and ANO3 were shown to be
   highly expressed in metastatic tumours compared to primary tumours
   (Fig. [110]3D-E). To gain further insights into the underlying
   transcriptional landscape driving each distinct cell type of metastatic
   progression, we extracted the top genes driving this progression as
   calculated by CellRank for epithelial and fibroblast cells and
   performed gene ontology analysis on each. This showed enrichment for
   many cancer and metastatic related KEGG terms associated with driving
   this progression (Supplementary Fig. 3B-C). Of note, WNT signalling was
   found to be the top pathway driving epithelial metastatic progression
   as well as in fibroblasts, hinting that many cell types, whilst having
   distinct gene expression profiles, may share common signalling pathways
   to drive a metastatic progression.

   These results highlight that correlated expression of distinct genes in
   specific cell types and common signalling pathways may contribute to
   metastatic progression across different cancers, opening the
   opportunity to identify novel pan-cancer biomarkers.

Transcriptional networks underlying metastatic continuum across diverse
cancer types

   Our previous analysis was based solely on primary tumour data and
   metastatic scoring. Next, we aimed to leverage our 177-gene metastatic
   signature with pseudotime analysis to identify cells within the primary
   tumour that have a high potential to metastasize. Additionally, we
   aimed to identify cells within the secondary site that may have
   originated from the primary tumour. To achieve this, we obtained
   scRNA-seq of paired primary PDAC and secondary liver tumours in a PDX
   model [[111]43]. Following pre-processing, we focused our analysis on
   epithelial cells only. Next, using our 177 gene metastatic signature,
   we scored each cell using an unbiased cellular fate trajectory
   inference using Monocle 2 and observed a continuum. In the mouse data
   from the primary tumour to the liver metastatic site, the
   highest-scoring cells appear to align with the low and intermediate
   pseudotime range, while in human primary breast cancer to lymph node
   metastasis, they align with the intermediate and later range of
   pseudotime cells in a separate branch (Fig. [112]4A-B). Identical
   results were obtained using datasets from paired samples of primary
   PDAC to lung metastatic site in a PDX model. (Supplementary Fig. 4A&B).
   Furthermore, similar analyses were obtained using scRNA-seq from
   primary breast cancer and paired lymph node and lung metastases
   [[113]44] where our 177 gene metastatic score identified cells on a
   continuum with the highest scoring cells spanning the intermediate and
   later pseudotime ranges (Fig. [114]4B, Supplementary Fig. 4C). These
   observations suggest that our 177-gene signature can identify cells
   undergoing dissemination and those recently metastasized to the
   secondary site.

Fig. 4.

   [115]Fig. 4
   [116]Open in a new tab

   Refined metastatic signature can recapitulate the cascade of tumour
   migration. A Monocle 2 trajectory analysis of paired patient-derived
   xenograft (PDX) pancreatic ductal adenocarcinoma (PDAC) and liver
   metastasis cells, with cells coloured by site, pseudotime, and
   metastatic score using UCell. B Monocle 2 trajectory analysis of paired
   breast cancer and lymph node metastasis cells, similarly, coloured by
   site, pseudotime, and metastatic score using UCell. C Differential gene
   expression along pseudotime for the PDX PDAC and liver metastasis pair,
   quality of fitting is calculated using McFadden's Pseudo
   R^2 D Differential gene expression along pseudotime for the breast
   cancer and lymph node metastasis pair, quality of fitting is calculated
   using McFadden's Pseudo R^2 (E) Spatial transcriptomic profiling of a
   breast cancer patient, scored for metastatic potential using UCell.
   F Trajectory analysis of the breast cancer spatial transcriptomics
   data, highlighting genes that drive the observed cellular trajectories

   We next attempted to identify the order in which genes are either
   switched on or off along the discovered metastasis trajectory and
   therefore exported our Monocle 2 objects as inputs into the R package
   GeneSwitches that uses a statistical framework based on logistic
   regression for this purpose [[117]45]. Interestingly, this analysis in
   PDX primary-to-liver and human breast cancer-to-lymph node trajectory
   identified several distinct and shared genes that are dynamically
   switched on or off as cells moved towards a highly metastatic fate
   (Fig. [118]4C-D). For example, the transcription factor SP1, a known
   driver of metastasis, was identified to gain in activity in the later
   stages in both cases. In PDX primary to liver metastasis trajectory, we
   also found a dramatic switch between Tet2 and Dnmt1 at the earlier time
   points (Fig. [119]4C), two enzymes with opposing functions in DNA
   methylation, which is known to be aberrant in cancer [[120]46].

   Utilising human breast cancer spatial-transcriptomics data, we next
   sought to explore the trajectory from non-invasive ductal carcinoma in
   situ (DCIS) to invasive carcinoma and identify drivers common with our
   metastatic gene list. We began by using stlearn [[121]47] to
   reconstruct cell type evolution in spatial transcriptomics data
   [[122]47]. After scoring each region with our 177 gene metastatic
   signature, we aimed to reconstruct the spatial trajectory between a
   DCIS cluster with a low metastatic score and an IDC cluster with a high
   metastatic score (Fig. [123]4E-F). Interestingly, this analysis
   revealed that many genes driving this trajectory were within our
   pan-cancer 177 gene metastatic signature (Fig. [124]4F), including
   COL1A2, whose high expression is known to promote the tumour cell
   proliferation and metastasis in oesophageal cancer [[125]48],
   highlighting the utility of our pan-cancer 177 gene signature in
   distinct cancer types. Altogether, our pan-cancer 177 gene metastatic
   signature can accurately identify and arrange cells along a metastatic
   continuum across multiple cancer types. Furthermore, these results
   further highlight that many of the genes within our signature
   potentially contribute to the dissemination of tumour cells.

Emergence of intercellular WNT signalling from the microenvironment to tumour
epithelial cells during metastatic progression

   Communication between cancer cells and the surrounding stromal cells
   impacts tumour proliferation, metastasis and treatment failure
   [[126]49, [127]50]; and its disruption holds the potential to combat
   metastatic progression [[128]27]. Since our 177 gene signature showed
   high expression in cell types beyond epithelial cells, and each
   distinct subtype shared similar signalling pathways (Fig. [129]1D), we
   next sought to investigate the cell-to-cell communication networks
   driving metastatic progression at the pan-cancer level. In the first
   instance, we used CellChat [[130]51] to uncover signalling interactions
   between low, mid and high scored metastatic cells. We calculated the
   cell–cell communication network for all cell types in high metastasis
   scored cells, where the width of the edges represents the strength of
   the communication and revealed a wide range of communication networks
   between all cell types (Fig. [131]5A). Finally, we compared all the
   signalling pathways in cells scored as high to low (Supplementary
   Fig. 5A) and mid (Fig. [132]5B, Supplementary Fig. 5B).

Fig. 5.

   [133]Fig. 5
   [134]Open in a new tab

   Identification of WNT signalling as a key driver of communication
   networks in metastatic cells. A Total cell–cell interactions across
   different cell types in samples with high metastatic potential.
   B Comparison of the number of interactions between metastatic high and
   metastatic medium scored cells. C WNT signalling interactions among
   cell types in metastatic medium scored cells. D WNT signalling
   interactions among cell types in metastatic high scored cells.
   E Communication networks mediated by WNT signalling in metastatic
   medium scored cells. F Communication networks mediated by WNT
   signalling in metastatic high scored cells. G Top ligand interactions
   driving WNT signalling in metastatic high cells. (H) UCell scoring of
   WNT target genes across metastatic timepoints, highlighting gradual
   increases in expression and the implication of WNT signalling in
   metastatic cells

   Interestingly, we found that WNT signalling was most active in high
   metastatic cells, followed by mid and inactive in low metastatic cells
   across all cancers. WNT signalling has previously been implicated in
   metastatic progression, and it appeared among one of the top pathways
   driving epithelial and fibroblast metastatic cell fate (Supplementary
   Fig. 3A-B). However, the cell types involved in the communication
   networks were previously unknown [[135]52]. In mid scored cells,
   endothelial cells were the dominant cell type receiving, sending,
   mediating (controlling signalling flow), and influencing (a hybrid
   measure of controlling communication) WNT signalling. In high scored
   cells, multiple cell types received WNT signalling, with epithelial
   cells emerging as the dominant receiver along with the endothelial
   cells (Fig. [136]5 C-F). Notably, we found that this communication was
   primarily driven by the WNT3A ligand, the FZD8 receptor and the LRP5
   co-receptor (Fig. [137]5G). In line with these observations, LRP5 has
   been shown to correlate significantly with tumour metastasis [[138]53].
   The robustness of these observations was further confirmed using
   another highly cited tool, LIgand-receptor ANalysis framework (LIANA
   [[139]54]) where we see a higher activity of WNT5A and FZD4 ligand and
   receptors only in metastatic cells (Supplementary Fig. 5E).
   Interestingly, genes encoding some of these ligand-receptor pairs were
   found to be bound by SP1, suggesting its potential involvement in
   regulating their expression cells undergoing metastasis (Supplementary
   Fig. 5F).

   To corroborate these findings, we performed cell–cell communication
   analysis on spatial transcriptomics data of invasive breast cancer
   using stlearn [[140]47]. We scored cells using our 177 gene signature
   and calculated the communication networks between highly scored (i.e.
   “metastasis high”) cells and the surrounding stromal cells
   (Fig. [141]4E-F). To further substantiate these findings, we next
   investigated the dynamics in the expression of WNT signalling target
   genes during metastatic progression. Towards this, we scored each cell
   using UCell for WNT target gene expression and plotted their scores
   across each metastasis timepoint (Fig. [142]5H). Notably, this analysis
   revealed that the WNT target genes had increasing expression across
   metastasis time points, with high scored cells having the highest
   expression. This aims to validate our findings and supports previous
   observations that WNT signalling plays a key role in the maintenance
   and proliferation of tumour cells [[143]39, [144]55]. Altogether, these
   observations establish that emergent WNT signalling from the stroma to
   tumour epithelial cells potentially plays a key role in the metastatic
   transition.

Drug repurposing analysis reveals distinct FDA-approved drugs targeting
pan-cancer metastasis

   Drug repurposing has recently become highly attractive as it permits
   new uses of a drug outside the scope of its original medical approval
   or investigation, accelerating patient support [[145]56]. Given the
   strong potency of our signature in identifying metastatic cells across
   cancers, we implemented the drug repurposing recommendation tool ASGARD
   [[146]58] on our scRNA-seq data to identify any existing drugs with the
   potential to target metastasis. Here, we first divided our merged
   pan-cancer scRNA-seq dataset into low metastatic cells (bottom 20%
   score) and high metastatic cells (top 20% score) (Fig. [147]6A). We
   then used limma [[148]59] to identify differentially expressed genes
   between cell types classified as metastasis low vs metastasis high.
   These consistently differentially expressed genes were then used as
   inputs to identify drugs that can significantly (FDR < 0.05) reverse
   their expression levels in the L1000 drug response dataset, which
   comprises 591,697 drug/compound treatments. We next applied ASGARD and
   predicted 15 drugs (FDR < 0.05 and overall drug score > 0.99) for the
   treatment of metastasis from breast, colorectal, lung, NPC, ovarian
   and/or PDAC cancers (Fig. [149]6B).

Fig. 6.

   [150]Fig. 6
   [151]Open in a new tab

   In silico drug repurposing analysis reveals FDA-approved drugs
   targeting metastatic cells. A UMAP projection of low and high
   metastatic cohorts, coloured by distinct cell types. B Drug repurposing
   strategy aimed at targeting metastatic cells. C Gene Ontology (GO) term
   analysis of cell type–specific genes targeted by the top three
   FDA-approved drugs. D Schematic illustrating the targeting of WNT
   signalling to disrupt cell–cell communication networks that drive
   metastatic progression

   The top candidate in our repurposing analysis, Vorinostat, was shown to
   have a high drug score across five out of 6 cancer types. It is a
   histone deacetylase (HDAC) inhibitor approved for the treatment of
   cutaneous T-cell lymphoma and has been used to treat metastatic tumours
   in several cancer types and is the focus of many clinical trials,
   including for breast cancer treatment [[152]60–[153]63]. Furthermore,
   Vorinostat has been shown to have a potential role in modulating cell
   proliferation via WNT signalling and the cell cycle through degradation
   of β-catenin, resulting in an inhibition of cell proliferation, with a
   cell cycle arrest occurring in G1/G0. Additionally, Vorinostat
   treatment has been shown to impede cell migration [[154]64, [155]65].
   We performed a GO analysis to identify the target genes and pathways of
   the top three candidates (Vorinostat, thioridazine, sirolimus) across
   the top four cell type clusters based on cell type proportion in each
   condition (Fig. [156]6C). This analysis uncovered WNT signalling as a
   target, particularly in epithelial, endothelial, and T cells
   (Fig. [157]6C). Altogether, these findings suggest that FDA-approved
   drugs that disrupt WNT signalling could be repurposed to overcome or
   prevent metastatic progression across multiple cancer types.

Conserved gene regulatory networks driving metastatic cascade

   Next, we investigated how transcription factor (TF) networks contribute
   to the stage-specific metastasis programs we identified from scRNA-seq
   data. Utilising CellOracle [[158]66] we defined gene regulatory
   networks, in which CellOracle [[159]66] constructed GRN models between
   a TF and its target genes for each metastatic timepoint. We then
   utilised CellOracle [[160]66] again to assess the contribution of each
   TF using centrality metrics, resulting in a final list of TFs for each
   metastatic score category (Fig. [161]7A). Here, degree centrality
   scoring in the high metastatic cluster gene regulatory network
   configuration successfully recognised key TFs associated with
   metastatic disease, such as SP1 and E2F4 (Fig. [162]7A). In contrast,
   the low and mid metastatic score clusters identified TFs that are known
   to be associated with reducing or inhibiting metastatic progression,
   such as STAT1 and KLF5 (Fig. [163]7A) [[164]67, [165]68].

Fig. 7.

   [166]Fig. 7
   [167]Open in a new tab

   Reconstructing low to high metastatic regulatory networks conserved
   across cancers. A Transcription factors (TFs) within the gene
   regulatory network (GRN) associated with different metastatic stages.
   B Network dynamics of TFs across metastatic stages, coloured by
   MET_Score stage. C Expression levels of KLF5 and SP1 in normal, tumour,
   and metastatic samples, as measured by RNA sequencing (RNA-seq).
   D Genome browser tracks of SP1 ChIP-seq data, highlighting WNT target
   genes bound by SP1. D Schematic overview of the SP1 ChIP-seq data
   analysis. The SP1 ChIP-seq data were derived from ENCODE database for
   metastatic-like HCT116 and non-metastatic-like MCF7 cells and analysed
   using standard ChIP-seq analysis pipeline. E The barplot shows number
   of peaks detected for SP1 bound regions in HCT116 and MCF7 cells. F The
   barplot shows annotation of SP1 bound genes at promoter and
   non-promoter regions of the genome. G The dotplot represents top
   enriched pathways of SP1 bound genes in HCT116 and MCF7 cells. The top
   enriched pathways were derived using hallmark gene signatures from
   Molecular Signatures Database (MSigDB). H) The browser tracks show SP1
   binding signal at hallmark WNT pathway genes

   As we had details on each regulon across each metastatic stage, we
   analysed how network connectivity changes during metastasis to gain an
   insight into the contribution of each GRN along our metastatic
   continuum. We were particularly interested in two TFs, SP1 and KLF5,
   given their previous implications in cancer metastasis and opposite
   kinetics in our data during metastasis progression (Fig. [168]7B).
   Interestingly, the network scores for SP1 recapitulate the metastatic
   progression scores, with reduced activity in the low-to-mid scored
   cells and a sharp increase in the high scored cells (Fig. [169]7B). In
   contrast, KLF5 had a high network score in the mid scored cells and a
   sharp decrease in cells with a high metastatic score (Fig. [170]7B)
   [[171]67].

   To further validate these observations, we investigate the expression
   dynamics of these TFs in normal tissues, primary and metastatic
   tumours. We found that KLF5 had lower expression in metastatic tumours
   versus primary tumours and normal tissue, hence potentially acting as a
   tumour/metastatic suppressor, whereas SP1 displayed the opposite
   pattern, functioning likely as a tumour/metastatic promoter
   (Fig. [172]7C). These expression patterns clearly highlight that our
   observed continuum correlates with metastatic cancer progression and is
   also in line with previous observations where high expression of SP1
   was found to be associated with an unfavourable prognosis across
   multiple cancer types, which directly correlates with TNM staging
   [[173]69].

   We next sought to explore the possibility of SP1 in directly regulating
   WNT-related genes in metastatic cells. Towards this, we processed SP1
   ChIP-seq from a metastatic (Colon: HCT116) and a non-metastatic
   (Breast: MCF7) cancer cell line from the ENCODE database and further
   analysed to identify SP1-drive transcriptional circuitry in these cell
   models (Fig. [174]7D). Interestingly, we found a dramatically larger
   number of regions bound by SP1 in metastatic cells compared to
   non-metastatic cells, suggesting a more active gene regulatory role of
   SP1 in metastatic cells (Fig. [175]7E). Notably, SP1 binding occurs
   more in the distal regulatory regions within metastatic cells compared
   to non-metastatic cells, where SP1 occupancy is at the proximal
   regulatory sites (Fig. [176]7F). Given the established critical role of
   distal regulatory elements in defining cell identity, SP1 potentially
   functions as a dominant driver of metastatic cell features in these
   cells. Notably, these data showed a clear occupancy of SP1 at
   regulatory elements of several WNT target genes, such as WNT7B and JUN
   genes in metastatic cells, but did not target these loci in
   non-metastatic like cells [[177]70–[178]73] (Fig. [179]7G-H). These
   results suggest that SP1 is an upstream inducer of WNT signalling genes
   during metastatic progression. Overall, our data uncovered key TFs that
   function at different stages of metastatic progression to drive
   essential driver pathways.

SP1 controls cell survival, invasive growth and metastatic colonisation

   As a proof-of-principle for prioritising TFs as novel therapeutic
   targets, we examined our breast cancer scRNA-seq and perturbed SP1 in
   silico (Supplementary Fig. 6A). We scored each cell using our 177 gene
   signature on an FDG layout of the breast cancer subset and again found
   that metastatic cells appeared to cluster together (Supplementary
   Fig. 6B). We first calculated the pseudotime and development flow using
   CellOracle [[180]66] and found it followed the sample progression as
   our metastatic score (Fig. [181]8A). Next, we used CellOracle-based
   simulation of SP1 perturbation to recapitulate the progression from low
   to high metastatic cell fate. Using the 16 metastatic gene regulatory
   network configurations inferred by CellOracle, we simulated SP1
   knockout signal propagation (expression set to 0 across all cells),
   enabling the prediction of future gene expression, and hence the
   direction of cell identity transitions, at single-cell resolution. This
   simulation predicts a visual shift of high metastatic cell identity
   toward a low metastatic signature following SP1 knockout
   (Fig. [182]8B).

Fig. 8.

   [183]Fig. 8
   [184]Open in a new tab

   SP1 and KLF5 have opposing roles in the metastasis program.
   A Pseudotime calculation of cells, showing overlap with metastatic
   scoring and the transition from low to high metastatic potential cells.
   B In silico perturbation of SP1 alters the metastatic transition
   trajectory. C UMAP projections of MB231 (high metastatic) and HCC1806
   (low metastatic) cells, coloured by transcription factor knockdown (TF
   KD) and non-targeting control (siNTC) cohorts. D UMAP projection with
   cells scored from low to high metastatic potential using UCell.
   E Representative immunoblotting images of SP1 knockdown and
   non-targeting control (NT) MDA-MB-231 cells (n = 4). F Representative
   Crystal Violet assay images for viability in SP1 knockdown and
   non-targeting control (NT) MDA-MB-231 cells (n = 4), with
   quantification of absorbance at 595 nm (right). G Representative images
   and quantitation of knockdown and control MDA-MB-231 spheroid growth
   embedded in Collagen-I at day 0 (D0) and day 1 (D1), scale bar = 50 μm
   (n = 3). H Schematic of lung colonisation assay using vital dye-stained
   SP1 knockdown (red) and control MDA-MB-231 cells (green) co-injected
   into the tail vein of NXG mice. Lungs were imaged 24 h post-injection,
   displaying representative images with a heatmap analysis using QuPath
   pixel mapping. Cell nuclei are stained with Hoechst 33,342.
   Quantitation of the area occupied by fluorescent cells in the lungs (%)
   for siNT and siSP1 MDA-MB-231 cells is shown (right). Scale
   bar = 100 μm. The same experiment was repeated with inverted vital dye
   colours (SP1 knockdown in green and NT controls in red) (n = 2). Violin
   plots display median (blue) with interquartile ranges. p-values were
   calculated using unpaired t-tests. All n numbers indicate independent
   experiments unless otherwise stated

   We repeated this analysis in a mouse scRNA-seq dataset containing
   primary and matched lung metastases to validate these findings. We
   scored each cell using our 177 gene signature and calculated the gene
   regulatory networks at each metastatic stage, which revealed Sp1 as a
   key regulator in highly metastatic cells (Supplementary Fig. 6C-D).
   Interestingly, another transcription factor, Klf2, was found to have
   the highest activity in mid scored cells(Supplementary Fig. 6D) and is
   a well-known repressor of metastasis in multiple cancer types [[185]74,
   [186]75]. Calculating the pseudotime and development flow using
   CellOracle followed the sample progression as our metastatic score
   (Supplementary Fig. 6E). Notably, the in-silico perturbation of SP1
   predicted a clear shift from high to low metastatic potential
   (Supplementary Fig. 6F). Targeting SP1 TFs has been previously shown to
   disrupt metastatic cancer in vitro [[187]76]. Collectively, these
   findings suggest that SP1 inhibitors could potentially prevent or
   reverse metastatic progression across multiple cancer types.

   Following the initial in silico analysis, we sought to validate and
   gain deeper insights into the transcriptional and cell-fate changes
   resulting from SP1 and KLF5 knockdown in vitro. For this purpose, we
   identified the MDA-MB-231 cell line as having a high metastatic
   potential and the HCC1806 cell line as having a low metastatic
   potential based on our 177-gene signature (Supplementary Fig. 7A-B).
   Subsequently, we conducted SP1 depletion in the highly metastatic
   MDA-MB-231 cell line and KLF5 knockdown in the low metastatic potential
   HCC1806 cell line and performed single-cell transcriptome (scRNA-seq)
   analysis in biological replicates. Further analysis revealed that the
   loss of SP1 made the highly metastatic cells cluster together with the
   low metastatic potential control cells (Fig. [188]8C). Moreover, SP1
   knockdown cells exhibited a significant reversal in metastatic scoring
   of the 177 genes, confirming the inhibitory effects of SP1 knockdown
   (Fig. [189]8D). Furthermore, gene ontology analysis revealed a
   remarkable decrease in the expression of genes associated with
   metastasis (Supplementary Fig. 7C-D). For example, key pathways related
   to GTPase signalling were prominently downregulated, indicating a
   potential suppression of the metastatic phenotype [[190]77, [191]78].
   Additionally, genes associated with cell migration exhibited reduced
   expression levels, further supporting the role of SP1 in governing the
   metastatic gene expression program. Overall, these results highlight a
   significant role of SP1 in driving the gene expression program
   underlying metastatic progression.

   Conversely, KLF5 knockdown in HCC1806 cells showed a significant
   increase in metastatic scoring, resulting in KLF5 knockdown cells
   clustering with the high metastatic potential control cells and
   separating from the control, poorly metastatic HCC1806 cells,
   indicating an increase in metastatic potential following KLF5 knockdown
   (Fig. [192]8C-D). In direct contrast to the SP1 depletion, the gene
   ontologies for KLF5 loss were enriched for pro-metastatic biological
   processes (Supplementary 7Fig E–F). For example, GTPase signalling
   pathways and genes associated with cell migration were significantly
   upregulated, confirming an enhancement in metastatic potential
   (Supplementary Fig. 8E).

   To further confirm our in-silico findings, we performed SP1 knockdown
   in MDA-MB-231 cells and subjected them to a set of assays that measure
   features of metastatic cells. First, we observed reduced cell viability
   compared to control cells (Fig. [193]8E-F). Next, we measured the
   invasive growth of SP1 knockdown cells using the 3D spheroid assay on
   collagen [[194]79]. We observed reduced invasive growth in SP1
   knockdown cells compared to control (Fig. [195]8F). Finally, we
   performed in vivo lung colonisation assays that measure cancer cell
   survival in the lung parenchyma. Briefly, siRNA-transfected cells
   stained with vital dyes were co-injected at a 1:1 ratio into the
   tail-vein of immunocompromised mice, and the percentage of cells
   retained in the lungs was quantified [[196]80–[197]82]. We observed
   that SP1 knockdown cells were less competent to grow in the lung
   parenchyma than control cells (Fig. [198]8G). Overall, this data
   suggests that SP1 controls several features of metastatic cancer cells
   in vitro and in vivo.

SP1-driven WNT Pathway activity is essential for metastatic features

   To explore a direct role for SP1 in driving WNT pathway activity to
   promote metastatic features, we further analyzed the SP1 ChIP-seq
   datasets from metastatic (HCT116) and non-metastatic (MCF7) cancer
   cells (shown in Fig. [199]7D-H). By overlapping SP1-bound genes with
   those misregulated upon SP1 knockdown, we identified a subset of
   targets, including key WNT pathway genes (WNT7B, DVL1, JUN, PPP2R5B,
   and NFATC2), that were downregulated in the absence of SP1
   (Fig. [200]9A).

Fig. 9.

   [201]Fig. 9
   [202]Open in a new tab

   Induction of WNT pathway genes by SP1 drives metastatic features. A
   Venn diagram shows overlap of SP1 bound genes with downregulated genes
   upon siSP1. The five genes shown inside the venn diagram are WNT
   pathway genes. B The bar graph shows relative qPCR fold change for the
   expression of GAPDH, SP1, WNT7B, DVL1, JUNC and NFATC2 upon SP1
   knockdown in MDA-MB-231 cells. Each bar indicates the mean of replicate
   values. Error bar indicates SEM. For statistical analysis, student ‘s
   t-test is performed (* p < 0.05, *** p < 0.001). Gene expression was
   normalized to the corresponding expression of each gene upon siNTC
   control knockdown in MDA-MB-231. C Immunoblotting for SP1 knockdown in
   MDA-MB-231 (left plot) cells against SP1, a-Tubulin and DVL1. A-Tubulin
   served as a loading control. siNTC: siRNA for non-targeting control.
   The right plot showing Immunoblotting for GFP-SP1 overexpression in
   HCC1806 cells against GFP, a-Tubulin and DVL1. A-Tubulin served as a
   loading control. siNTC: siRNA for non-targeting control. D The browser
   tracks show SP1 binding signal at hallmark WNT pathway genes WNT7B and
   DVL1 in HCT116 and MCF7 cells. E ChIP-qPCR relative fold enrichment for
   SP1 binding at the promoter sites of WNT7B and DVL1. Each bar indicates
   the mean of replicate values. Error bar indicates SEM. For statistical
   analysis, student ‘s ttest is performed (* p < 0.05). IgG is used as a
   negative antibody control for ChIP. Non-metastatic: HCC1806 cells;
   metastatic: MDA-MB-231 cells. F The boxplot showing expression of WNT7B
   and DVL1 in normal, primary and metastatic tumors from colon and breast
   cancer patients. The expression levels were derived from TNMplot
   database. G Representative images of Immunofluorescent staining for
   non-metastatic and metastatic CRC human tissue section. DVL1 in green,
   WNT7B in red and DAPI in blue. Scale bar indicates 20 microns. Region
   of interest is marked in dashed yellow rectangle. H) Quantification of
   immunofluorescent signal intensity for WNT 7B and DVL1. Each bar
   represents the mean intensity. Error bars indicate SEM. Each dot
   represents the quantification value for each individual region. For
   statistical analysis, student ‘s t test was performed (* p < 0.05, **
   p < 0.01). I Representative bright field images of incucyte experiment
   for SP1 overexpressing HCC180 and highly metastatic HCT116 cancer cells
   with and without treatment with the five selected drugs (vorinostat,
   Thiaridazine, Niclosamide, Salinomycin and Foxy 5 on Day 0, 2 and 3. J
   Quantification for the incucyte experiment coupled with all drug
   treatments shown in (I). Vor: vorinostat, Thio: thioridazine, Salino:
   salionomycin, Niclo: niclosamide, Foxy: Foxy-5

   To further validate SP1's regulation of WNT pathway genes, we performed
   SP1 knockdown in two metastatic cancer cells MDA-MB-231 and HCT116. The
   results showed a marked reduction in cell migration upon SP1 knockdown
   compared to controls (Supplementary Figure S8C, D), consistent with the
   previous assays (Fig. [203]8E, F) that demonstrated reduced metastatic
   potential. Importantly, WNT7B and DVL1 were significantly downregulated
   following SP1 knockdown (Fig. [204]9B-C), reinforcing the idea that the
   WNT pathway is a downstream target of SP1 in regulating metastasis. We
   also investigated whether SP1 overexpression alone in non-metastatic
   cells is sufficient to induce metastatic behavior. Interestingly, such
   ectopic overexpression of SP1 in HCC1806 cells resulted in a
   dramatically faster migration, as confirmed by scratch assays and
   Incucyte Live-Cell imaging (Supplementary Figure S8E-F; Fig. [205]9I,
   J). Furthermore, this accompanied an induction of WNT pathway genes at
   the RNA (Supplementary S8A) and protein level (Fig. [206]9C).

   ChIP-seq analysis further showed that SP1 binds at the regulatory
   elements of these genes only in metastatic cancer cells and not in
   non-metastatic cells (Fig. [207]9D). To independently confirm the
   occupancy of SP1 at WNT pathway genes, we performed chromatin
   immunoprecipitation assay followed by qPCR (ChIP-qPCR) for WNT7B and
   DVL1 loci. The results show a clear occupancy of SP1 at the regulatory
   elements of these genes in independent metastatic cancer cells
   (MDA-MB-231) but not in non-metastatic (HCC1806) cancer cells
   (Fig. [208]9E).

   We extended our analysis by investigating WNT pathway activity in
   metastatic tumors. First, we injected metastatic breast cancer cells
   (MDA-MB-231) into mice and collected tumor samples post-metastasis.
   Staining for WNT7B and DVL1 in these tumors revealed their high
   expression levels and a strong co-localization (Supplementary S8B).
   Furthermore, RNA-seq analysis in a large cohort revealed that the
   expression of SP1-target WNT pathway genes were significantly elevated
   in metastatic tumors compared to the controls (Fig. [209]9F). These
   observations were further validated by immunohistochemistry in
   colorectal cancer patient samples, where DVL1 and WNT7B exhibited
   significantly higher and more homogeneous expression in metastatic
   tissues compared to non-metastatic ones (Fig. [210]9G, H).

   Lastly, we aimed at testing whether such SP1-driven metastatic
   behaviour can be disrupted by pharmacological approaches for
   therapeutic purposes. In our earlier analysis, we had identified
   vorinostat and thioridazine as the top drugs targeting metastatic cells
   in majority of the cancer types (5 of 6 cancer types) in ASGARD
   analysis which ranks FDA-approved drugs against cell populations using
   scRNA-seq datasets (Fig. [211]6B). Furthermore, given the activation of
   WNT signaling in metastatic cells, we also shortlisted three WNT
   pathway inhibitors: Niclosamide and Salinomycin (both FDA-approved),
   and Foxy-5 (currently in Phase 2 clinical trials). Interestingly,
   application of these drugs in two independent highly metastatic cancer
   cell lines, MDA-MB-231 (breast) and HCT116 (colon), severely impaired
   their migratory capacity to varying degrees, with some cell-type
   specific exceptions (Fig. [212]9I, J; Supplementary Figure S8 G-I).
   Furthermore, similar effects were observed for SP1-overexpression
   induced migratory behaviour in normally non-metastatic cancer cells
   (HCC1806) (Fig. [213]9I, J). These findings underscore the essential
   role of the SP1/WNT axis in metastasis and highlight the potential for
   therapeutic targeting of this pathway to inhibit cancer progression.

Discussion

   Our study presents a comprehensive analysis of a 177-gene signature
   that offers significant insights into the metastatic progression of
   various cancer types. This pan-cancer approach underscores the utility
   of a unified genetic framework to understand the complex biology
   underlying metastasis across diverse tumour types and
   microenvironments. The 177-gene signature highlights key molecular
   pathways involved in metastatic dissemination, including those related
   to cell adhesion, migration, and extracellular matrix remodelling.
   Notably, several genes within this signature have previously been
   implicated in metastatic processes in specific cancers, suggesting a
   broader applicability of these molecular mechanisms across multiple
   cancer types.

   The results demonstrate that this gene signature can serve as a
   predictive tool for metastatic potential, providing a valuable resource
   for both basic and translational cancer research. The ability to
   predict metastasis using a common genetic signature could facilitate
   earlier intervention and personalised treatment strategies, potentially
   improving patient outcomes. Moreover, the integration of this gene
   signature with existing clinical parameters could enhance the accuracy
   of prognostic models. Future studies should aim to validate these
   findings in larger, independent cohorts and explore the therapeutic
   implications of targeting these key pathways. Our analysis reveals that
   the metastatic scoring across samples follows a continuous
   distribution, highlighting the nuanced spectrum of metastatic potential
   within individual cells. Utilising cellular dynamics modelling on
   single cells, we observed that cells with lower metastatic scores tend
   to progress towards a higher metastatic state. This trend is consistent
   across various cell and cancer types, suggesting a universal trajectory
   towards metastasis, which presents novel opportunities for therapeutic
   intervention aimed at halting metastatic progression universally across
   cancers. Interestingly, while this shared metastatic fate is common
   among different cell and cancer types, each exhibits a unique
   underlying transcriptional program. Notably, genes such as ANO3 and
   CTHRC1 emerged as novel contributors to pan-cancer metastasis, with
   their expression levels showing a direct correlation. This finding
   underscores the potential for these genes to serve as biomarkers or
   therapeutic targets, warranting further investigation.

   Moreover, our exploration into the transcriptional programs driving
   cellular progression towards metastatic high scored cells identified
   WNT signalling as the primary pathway influencing metastatic
   progression. This pathway's significant role across different cell
   types underscores its potential as a target for broad-spectrum
   anti-metastatic therapies. WNT signalling has been reported to be
   involved in metastatic progression [[214]55, [215]83]. Our findings
   reveal specific WNT signaling networks involved in cell-to-cell
   communication at the single cell level during metastatic progression.
   This insight opens the opportunity to target WNT signalling to disrupt
   these communication networks and prevent metastatic progression.

   We further show that a higher WNT activity is primarily governed by the
   SP1 transcription factor in high metastatic cells. High levels of SP1
   protein have been shown to correlate with tumour cell migration and
   metastasis in a number of tumour models and patient samples, including
   gastric and breast cancers [[216]84–[217]88] and WNT signalling
   activity [[218]89, [219]90]. However, a role for SP1-WNT signalling
   axis in metastatic progression had remained poorly understood, thereby
   highlighting the importance of our study which has clearly filled this
   knowledge gap that applies at the pan-cancer level. Furthermore, our
   in-silico knockout in breast cancer and paired primary and metastatic
   site mouse scRNA-seq datasets showed that a loss of SP1 in high
   metastatic cells could reverse the fates to a low metastatic state.

   Through our detailed mechanistic investigation in vitro and in vivo
   validation, we confirmed that SP1 knockdown in metastatic cells
   MDA-MB-231 and HCT116 resulted in significant suppression of metastatic
   features which accompanied a downregulation of WNT pathway genes such
   as WNT7B, DVL1, and JUN. Noticeably, these genes were strongly bound by
   SP1 in metastatic cells but not in non-metastatic cells. These
   observations were further confirmed in human tissue sections as well as
   RNA-seq of low and high metastatic tumours, hence demonstrating the
   clinical relevance of these findings. Conversely, overexpression of SP1
   in lowly metastatic cancer cells was sufficient to confer a migratory
   behaviour which accompanied upregulation of these markers both at the
   RNA and protein levels.

   Furthermore, clustering analysis confirmed the inhibitory effects of
   SP1 knockdown, while KLF5 knockdown in low metastatic HCC1806 cells
   resulted in the opposite effect, enhancing metastatic gene ontologies
   and upregulating GTPase signalling. GTPase signalling plays a crucial
   role in cancer metastasis. GTPases, such as Ras and Rho, act as
   molecular switches that regulate cellular processes, including
   migration [[220]91]. Activation of GTPases promotes dynamic changes in
   the cytoskeleton, leading to cell protrusions and motility, and Rho
   GTPases control actomyosin contractility [[221]92]. Dysregulation of
   GTPase signalling contributes to enhanced cancer cell migration and
   invasion, promoting metastatic dissemination [[222]93]. Understanding
   the intricate interplay between GTPase signalling pathways and cell
   migration mechanisms is vital for developing targeted therapies to
   impede cancer metastasis and improve patient outcomes. These intriguing
   findings shed light on the complex and context-dependent roles of SP1
   and KLF5 in cancer metastasis. Of note, there have previously been
   links between KLF5 expression and SP1 in breast and prostate cancers,
   linking elevated KLF expression with tumour suppressive functions
   [[223]94]. Further experiments showed that SP1 is critical for cancer
   cell survival, invasive growth and metastatic colonisation,
   highlighting its critical importance in metastasis at the pan-cancer
   level. In conclusion, our integrated analysis of scRNA-seq and
   metastatic scoring data reveals the opposing roles of SP1 and KLF5 in
   regulating metastasis by governing the underlying gene expression
   program in cancer cells. Ultimately, these findings contribute to a
   better understanding of the metastatic process and offer potential
   targets for precision medicine approaches in cancer treatment. Future
   studies aimed at unravelling the downstream targets and crosstalk
   between SP1 and KLF5 could provide valuable insights into novel
   therapeutic strategies for managing metastatic breast cancer.

   Lastly, we set out to explore whether our findings can be used for
   pharmacological targeting to block metastatic activity across cancers.
   Our ASGARD analysis revealed FDA-approved vorinostat and thiaridazine
   as the most effective drugs in targeting metastatic cells in the
   majority of cancer types. Additionally, given our findings on the
   activation of WNT signaling in metastatic cells, we included three WNT
   pathway inhibitors: Niclosamide and Salinomycin (both FDA-approved),
   and Foxy-5 (in Phase 2 clinical trials). Notably, applying these drugs
   to naturally metastatic cancer cells or cancer cells forced to migrate
   with SP1 overexpression markedly reduced their migratory abilities.
   Vorinostat is a histone deacetylase (HDAC) inhibitor, FDA-approved for
   the treatment of cutaneous T cell lymphoma [[224]95, [225]96]. In
   addition, several preclinical and clinical investigations have its
   promising potential in inhibiting metastasis in cancer
   [[226]97–[227]100]. Thiaridazine is a low-potency typical antipsychotic
   but has been shown to have anticancer effects in the brain [[228]101,
   [229]102] and aggressive breast cancer [[230]103, [231]104]. Similarly,
   WNT inhibitors salinomycin, Niclosamide and Foxy-5 have previously been
   known to have anti-tumour activity across cancers [[232]105–[233]109].
   These findings demonstrate the therapeutic potential of these drugs in
   targeting and preventing metastasis in aggressive cancers.

Conclusions

   In this study, we investigated pan-cancer core mechanisms of metastasis
   by performing the largest single-cell transcriptome analysis involving
   over 200 patients with both metastatic and non-metastatic tumours
   across six cancer types. Our findings uncovered a prognostic core gene
   signature that sheds light on the complex cellular dynamics and gene
   regulatory networks that govern metastasis. Specifically, the
   examination of transcription factor networks active at different stages
   of metastasis, along with functional perturbation experiments,
   identified SP1 and KLF5 as key regulators at critical transition steps
   of this process. SP1 acts as a driver of metastasis, while KLF5
   functions as a suppressor across multiple cancer types at defined time
   points of the metastatic transition. In vivo and in vitro
   loss-of-function studies of SP1 in cancer cells demonstrated its
   critical role in promoting cancer cell survival, invasive growth, and
   metastatic colonisation. Moreover, our results showed that as
   metastasis advances, tumour cells and their microenvironment
   increasingly communicate through WNT signalling, a process that is
   driven by SP1. Supporting these findings, a drug repurposing analysis
   identified several FDA-approved drugs with potential anti-metastasis
   properties, including inhibitors of WNT signalling, effective across
   various cancer types. These findings mark a groundbreaking advancement
   in cancer research by unveiling the core gene regulatory circuitry
   driving metastasis conserved across various cancers and discovering
   novel therapeutic avenues.

Methods

Processing and annotation of scRNA-seq

   Single-cell RNA sequencing data obtained through droplet-based
   10 × Genomics technology was selectively utilised for meaningful
   comparisons in the analysis. Raw count matrices consisting of Unique
   Molecular Identifiers (UMIs) and corresponding cell metadata were
   aggregated from diverse sources (Supplementary Table 1).

Quality control

   Preliminary quality control procedures were conducted for each
   individual dataset using the Seurat R package (v4.0). Cells selected
   for subsequent analysis were those with more than 200 detected genes
   and genes identified in a minimum of 3 cells. To identify and eliminate
   potential cell doublets, scDblFinder v1.6.0 was applied with default
   parameters. Cells manifesting a mitochondrial transcript content
   exceeding 20% were also excluded to mitigate confounding effects. To
   identify samples with fewer than 100 malignant cells we utilised
   inferCNV with default parameters.

Normalisation and transformation

   Normalisation was achieved by scaling UMI counts with respect to
   library size, followed by a log transformation to stabilise variance.
   Subsequently, the dataset underwent Principal Components Analysis (PCA)
   to reduce dimensionality.

Clustering and visualisation

   Utilising the first 30 principal components, a nearest neighbour graph
   was constructed, facilitating subsequent clustering via the Louvain
   algorithm at a lower resolution (FindClusters, resolution = 0.2). For
   illustrative purposes, uniform manifold approximation and projection
   (UMAP) embeddings were produced based on 30 principal components,
   enabling effective visualisation.

Cell Identity annotation

   SingleR v1.6.1 in conjunction with the Human Primary Cell Atlas from
   the celldex R package (v1.2.0) was employed to accurately annotate
   individual cell identities within each sample. Importantly, the
   references for annotation encompassed expected cell types within the