Abstract Rectal neuroendocrine neoplasms (rNENs) are among the most frequent gastrointestinal neuroendocrine neoplasms and pose a serious challenge for clinical management. The size of the primary neoplasm is considered to be the most important predictor of disease progression, but the genetic alterations that occur during the progression of rNENs remain unknown. Here, we performed a comprehensive whole-exome sequencing study on 54 tumor-normal paired, formalin-fixed paraffin-embedded specimens from patients locally diagnosed with rNENs. Of these, 81.5% (n = 44) were classified as small-sized (≤2 cm) rNENs, while the remainder (18.5%, n = 10) were classified as large-sized (>2 cm) rNEN samples. Comparative analysis revealed marked disparities in the mutational landscape between small- and large-sized rNEN samples, and between large-sized rNEN samples with or without lymph node metastases. The high-confidence driver genes RHPN2, MUC16, and MUC4 were significantly mutated in both small- and large-sized rNEN specimens, whereas mutations in MAN2A1, and BAG2 were only identified in large-sized specimens diagnosed with lymph node metastases. Correspondingly, we observed that the mTOR and MAPK pathways were preferentially enriched in the large-sized rNEN specimens. Signature-based analysis revealed that mutational processes associated with defective DNA base excision repair (SBS30) significantly accumulated in large-sized rNEN samples with lymph node metastases, highlighting the important role of this mutagenic process in promoting rNEN progression. We further found that most rNEN subjects, regardless of tumor size, harbored at least one alteration with targeted therapeutic implications. Taken together, these results elucidate the genetic features associated with tumor size and lymphatic metastasis in rNEN patients, which will deepen our understanding of the genetic changes during rNEN progression and potentially directing improvements in rNEN treatment strategies. Subject terms: Rectal cancer, Genetics research Introduction Neuroendocrine neoplasms (NENs) are a spectrum of heterogeneous malignancies characterized by the loss of epithelial tubular gland structures and extensive expression of neuroendocrine markers, including chromogranin A, CD56, synaptophysin and INSM1 [[56]1–[57]4]. NENs can arise from virtually any organ or system in the body but predominantly affect the lung, bronchi, and gastroenteropancreatic system [[58]5, [59]6]. According to data from the Surveillance, Epidemiology, and End Results (SEER) registry, the distribution of primary tumors differs by ethnicity, with the lung being the most common site in White individuals, and the rectum being the most common in Asian patients [[60]7]. Rectal NENs (rNENs), which account for 12%–27% of all NENs, have exhibited the greatest increase in incidence among all gastroenteropancreatic NENs in recent decades, probably owing to the widespread use of colonoscopy as a screening tool [[61]8]. rNENs are usually characterized as well-differentiated tumors exhibiting a slow growth associated with a good prognosis. However, regional lymph node metastases and distant metastases are present in 16–40% of patients at the moment of diagnosis. In these cases, the mean survival decreases to a minimum of 5 months [[62]8, [63]9]. Tumor size is considered to be the most important predictor of disease progression in patients with rNENs [[64]10]. Matsuhashi et al. revealed that rNENs sized <10 mm and without lymphovascular invasion could be considered benign and curatively treated by endoscopic resection. Tumors sized between 1 and 2 cm in diameter have a 10%–15% probability of lymph node and distant metastases. However, rNENs sized >2 cm carry a risk of metastasis equivalent to that of rectal adenocarcinomas, with a 64% probability of lymph node metastases and a 45% probability of distant metastases [[65]11]. Likewise, a single-center retrospective study reported that 292 cases of rNENs (diameter ≤2 cm) with endoscopic resection had no recurrence or metastasis after 0.5–5 years’ postoperative follow-up. However, among the remaining 20 cases, which had diameters >2 cm, 8 patients were diagnosed with intrahepatic metastasis, and 4 of these died due to extensive liver metastasis within the next two years [[66]12]. Several whole-genome sequencing (WGS) and whole-exome sequencing (WES) studies have been performed to elucidate the genetic features of NENs [[67]13–[68]16]. However, these studies mostly focused on pancreatic or gastrointestinal NENs, with very limited sample sizes of rNENs in the study cohorts, leading to inadequate understanding of the biology of rNENs. Moreover, in all of the above studies, the colonic NEN and rNEN samples were grouped together, although it is becoming clear that rNENs differ from colonic NENs in malignancy, differentiation, and disease phenotype [[69]8, [70]17]. To investigate the mutational landscape during the progression of rNENs, we herein performed WES of tumor-normal paired, formalin-fixed paraffin-embedded (FFPE) specimens from 54 patients locally diagnosed with rNENs, 81.5% (44/54) of which were classified as small-sized rNEN samples (≤2 cm), while the rest (18.5%, 10/54) were classified as large-sized rNEN samples (>2 cm). Among them, 5.6% (n = 3) of the small-sized rNEN patients and 70% (n = 7) of the large-sized rNEN patients had lymph node or distant metastases. By applying a series of well-established bioinformatic algorithms, we were able to profile the genetic alterations, mutational signatures, and copy number aberrations between small- and large-sized rNEN samples, and between large-sized rNEN samples in the absence or presence of lymph node metastases, and further explore their potential clinical significance. Collectively, our findings will deepen our understanding of the genetic features associated with tumor size and lymphatic metastasis during the progression of rNENs and provide a valuable information source for identification of promising biomarkers, therapeutic targets, and prognostic determinants for this disease. Materials and methods Sample collection FFPE samples were obtained from 54 enrolled rNEN patients who underwent endoscopic surgery at Shenzhen Hospital, Southern Medical University from December 2015 to April 2023. No statistical methods were used to predetermine sample size. For sample enrollment, the patients that were diagnosed as rNEN with histological confirmed to be rNEN were enrolled, following the inclusion criteria [[71]1]: between 18 and 80 years old [[72]2]; received resection of the primary tumor in Shenzhen Hospital, Southern Medical University [[73]3]; enough tumor sample for sequencing [[74]4]; no prior chemotherapy, radiotherapy, immunotherapy or other anti-tumor treatment at sampling point [[75]5]; no history of other malignancy. Patients diagnosed with colonic NEN were excluded from this study. rNEN was firstly diagnosed by two independent experienced endoscopists. The hematoxylin and eosin-stained tumor sections from each sample were then examined by two independent pathologists to confirm the diagnosis and histopathological features, including tumor size, mitotic count, tumor grade, depth of invasion, lymphatic invasion, and status of the resection margin. According to the diameters of primary tumors, these samples were classified into rNEN-S and rNEN-L groups. FFPE tumor samples were used for whole-exome sequencing with corresponding adjacent normal tissues used as control. Tumor samples with more than 10% of tumor content were eligible for sequencing. Matched adjacent noncancerous samples were defined as tissues with normally histological morphology and containing no tumor cells. All protocols were approved by the Ethic Committee of Southern Medical University (NYSZYYEC20190013), and written informed consent were obtained from all study participants for the use of tissue samples. DNA extraction and library construction For FFPE samples, genomic DNA was extracted from ten tumor or adjacent noncancerous sections of 5 μm using the QIAamp DNA FFPE Tissue Kit (Qiagen, Hilden, Germany) according to the manufacture’s instruction. DNA quantity was assessed by Nanodrop 2000 spectrophotometer and Qubit 2.0 Fluorometer with Quanti-IT dsDNA HS Assay Kit (Thermo Fisher Scientific, MA, USA). At least 10 ng DNA was prepared for sequencing libraries. Library construction was then performed using a custom 53 M length capturing probe (Integrated DNA Technologies, Coralville, IA, USA). Samples with a DNA library concentration ≥20 ng/uL were then subjected to sequencing. Captured libraries were then pair-end sequenced with Geneplus-2000 sequencing platform (Geneplus, Beijing, China), based on MGI DNBSEQ-G400 sequencer ([76]https://en.mgi-tech.com/products/) which utilized DNA nano-ball (DNB) preparation technology and fluorescent signal detection for base calling. Data quality control The original fluorescent image files obtained from T7 platform are transformed to short reads (Raw data) by base calling and these short reads are recorded in FASTQ format, which contains sequence information and corresponding sequencing quality information. Quality control is applied to exclude sequence artifacts, including reads containing adapter contamination, low-quality nucleotides and unrecognizable nucleotides (N), thus, to guarantee the meaningful downstream analysis. The steps of data processing were as follows: (a) Discard a paired reads if either one read contains adapter contamination (>10 nucleotides aligned to the adapter, allowing ≤10% mismatches); (b) Discard a paired reads if more than 5 bases are uncertain in either one read; (c) Discard a paired reads if the proportion of low quality (Phred quality < 5) bases is over 40% in either one read. Alignment and somatic mutation calling for genomes For each patient, valid sequencing data including tumor and matched normal samples were mapped to the reference human genome (GRCh37/hg19) by Burrows-Wheeler Aligner (BWA, version: 0.7.17) software [[77]18] to get the original mapping results stored in BAM format. If one or one paired read(s) were mapped to multiple positions, the strategy adopted by BWA was to choose the most likely placement. If two or more most likely placements presented, BWA picked one randomly. Then, SAMtools [[78]19] (version: 1.9) and GATK ([79]https://github.com/broadinstitute/gatk, version: 4.1.2) were used to sort BAM files and do duplicate marking, local realignment, and base quality recalibration to generate final BAM file for computation of the sequence coverage and depth. Duplicate reads were uninformative and shouldn’t be considered as evidence for variants. BAM files from both tumor and matched normal samples were used to detect somatic SNV and Indel by GATK Mutect 2 (version: 4.1.2), and GATK AllelicCNV (version: 4.1.2) was used to detect somatic CNV. VEP (version: 93.7) is performed to do annotation for VCF (Variant Call Format) file obtained in the previous step. The variant position, variant type, conservative prediction, and other information are obtained at this step through a variety of databases, such as dbSNP, 1000 Genome, esp6500, GnomAD, CADD, HGMD and COSMIC, and so on. Since we are interested in exonic variants, gene transcript annotation databases, such as Consensus CDS, RefSeq, Ensemble and UCSC, are also applied for annotation to determine amino acid alternation. In this step, the databases mainly include GO, KEGG, Reactome, Biocarta, PID, and so on. Somatic mutation filtering Germline variants were filtered from the Genome Aggregation Database (gnomAD). A stringent downstream filter comprised of the following criteria was used to obtain high quality somatic variants: a minimum of 8× coverage; Variant Allele Fraction (VAF) ≥ 5% and at least 5 variant supporting reads in the tumor sample, and VAF < 1% in the non-tumor sample; strand bias % 0.95; After that, mutations in the non-coding regions (30UTR, 50UTR, Intron, gene intergenic etc.) were removed. HC-SMG identification To detect recurrent mutated gene, a multi-tool approach was implemented to provide a comprehensive analysis. We used four computational tools: dNdScv [[80]20], MutSigCV [[81]21], OncodriveCLUST [[82]22] and OncodriveFML [[83]23]. The rational of each tool was described as follow: The dNdScv R package is designed to compare the rate of non-synonymous (dN) to synonymous (dS) mutations to identify genes under positive selection in cancer, and is particularly useful for datasets ranging from a few samples to thousands, taking into account background mutation rates and various genomic features to increase accuracy. MutSigCV is a classical and well-known frequency method that has been cited thousands of times in cancer genomics. It is a statistical method that identifies significantly mutated genes by considering the mutational landscape of a cancer genome, which is effective for whole genome or whole exome sequencing data and is particularly powerful for identifying cancer driver genes in large-scale studies. OncodriveCLUSTL is a sequence-based clustering algorithm to detect significant clustering signals across genomic regions, which is designed to exploit the feature that mutations in cancer genes, especially oncogenes, often cluster in particular positions of the protein. OncodriveFML is designed to analyze the pattern of somatic mutations across tumors in both coding and non-coding genomic regions to identify signals of positive selection. It is capable of identifying driver mutations in protein-coding genes, promoters, untranslated regions, intronic splice regions, and lncRNAs. These tools were run with the default parameters. SMGs identified by ≥2 tools were considered HC-SMGs. Putative driver genes were searched by mapping these HC-SMGs to the COSMIC Cancer Gene Census [[84]24] and DriverDBv4 [[85]25] databases. Tumor mutational burden (TMB) and microsatellite instability (MSI) analysis TMB was defined as the number of non-synonymous somatic mutations (including base substitutions and indels) in the coding region. In order to calculate the TMB, the total number of mutations counted was divided by the size of the coding sequence region of the Agilent SureSelect Human All Exon V6. Msisensor (v0.5) ([86]https://github.com/ding-lab/msisensor) is used for microsatellite instability analysis which counted the number of known microsatellite loci that were altered by somatic insertion or deletion. Recurrently mutated pathway analysis ClusterProfiler (version 3.12.0) [[87]26] was used to analyze the enrichment of mutated SMGs. Mutated genes were compared with the Kyoto Encyclopedia of Genes and Genomes (KEGG) ([88]http://www.genome.ad.jp/kegg/) database to determine the altered pathways. The p values of KEGG pathway enrichment were calculated based on hypergeometric distribution with false discovery rate (FDR) correction using the Benjamini and Hochberg method. Representative key signaling pathways with an FDR-corrected p value < 0.05 were exhibited for these rNEN samples. Mutational signature analysis Mutation signatures were jointly inferred with the software of MutationalPatterns ([89]https://bioconductor.org/packages/release/bioc/html/MutationalPatt erns.html, version: 1.10.0). The 96 mutational vectors (or contexts) generated by somatic SNVs based on six base substitutions (C > A, C > G, C > T, T > A, T > C, and T > G) within 16 possible combinations of neighboring bases for each substitution were used as input data, then rectified by the FFPEsig algorithm to exclude the formalin-induced artefacts [[90]27]. Somatic copy number alteration (SCNA) analysis For each tumor, arm-level and focal CNVs were detected and analyzed by GATK (version 4.0) and GISTIC (version 2.0). Genes encompassed by focal CNVs were also inferred using GISTIC (version 2.0). Significant somatic CNVs were analyzed in a group-wise fashion using GISTIC (version 2.0). Somatic CNV events of each sample were also obtained using GISTIC (version 2.0) to reveal recurrent CNV events, where P value was calculated by Fisher’s exact test. The sample-specific CNV data were also used to calculate CNV burden of each sample. A burden score was given to each CNV event based on the amplitude of the log2 copy number ratio of the varied region. Analysis of putative clinically relevant alterations Somatic SNV/Indel and CNVs were analyzed by Precision Heuristics for Interpreting the Alteration Landscape (PHIAL) software [[91]28] (version 1.0.R) ([92]https://github.com/ vanallenlab/phial) with default parameters and database. Statistical analysis Statistical analysis and data visualization were performed using R software (v 4.1.2) or Graphpad Prism (version 9.5.0, GraphPad Software, La Jolla, CA, USA). Two-tailed Wilcoxon rank-sum test was used to analyze the mutational frequencies of HC-SMGs and the contributions of mutational signatures between rNEN-S and rNEN-L samples, and between rNEN-L-M and rNEN-L-N samples. The statistical significance of KEGG presented pathways in rNEN samples using the SMGs list were also calculated by two-tailed Wilcoxon rank-sum test. P values presented in Figs. [93]S1 and [94]S2 were calculated by Student’s t-test. Statistical significance was defined as a two-sided p < 0.05. Results Overview of the clinicopathological characteristics of the included rNEN patients The study included a total of 54 patients histopathologically diagnosed as having a rNEN, comprising 28 (51.9%) male and 26 (48.1%) female patients with median ages of 47 (quartile [Q][1]–Q[3]: 39–54) and 50 (Q[1]–Q[3]: 45–60) at diagnosis, respectively. Among the subjects, 81.5% (n = 44) had small- or moderate-sized tumors (diameter < 1 cm or 1–2 cm), denoted as rNEN-S, whereas the remainder (18.5%, n = 10) were diagnosed with larger sized neoplasms (diameter >2 cm), denoted as rNEN-L. The tumor sizes of the rNEN-L patients were significantly larger than those of their small-sized counterparts (Figure [95]S1A). Pathologically, the majority of the rNEN-S specimens (98.1%, n = 43) were diagnosed as being at the G1 stage according to the latest World Health Organization (WHO) classification [[96]5]. In contrast, most of the rNEN-L specimens (90%, n = 9) were classified as being at the G2 or G3 stages. Neuroendocrine carcinoma samples were not included in this study. Consistent with previous reports that tumor size was an important predictor of disease progression [[97]29, [98]30], we found that only 5.6% (n = 3) of the rNEN-S patients had lymph node metastases, whereas 70% (n = 7) of the rNEN-L subjects had lymph node or distant metastases, at the time of biopsy. The clinicopathological characteristics of the rNEN subjects were summarized in Table [99]S1. Figure [100]S1B showed representative colonoscopy and hematoxylin and eosin (H&E) images of rNEN samples. Identification of high-confidence Significantly Mutated Genes (HC-SMGs) and their frequencies of loss of heterozygosity (LOH) To investigate the mutational differences between the rNEN-S and rNEN-L populations, we extracted DNA from all 54 FFPE rNEN samples and their corresponding nonneoplastic tissues for WES using paired-end procedures, to a median mean read coverage of 262× (Q[1]–Q[3]: 150–350×) and 90× (Q[1]–Q[3]: 73–119×), respectively (Table [101]S2). After strict data processing (see filter criteria in the “Method” section), we obtained a total of 32,959 single-nucleotide variants (SNVs; median: 83; Q[1]–Q[3]: 24–222), 5279 small insertions and deletions (InDels; median: 30; Q[1]–Q[3]: 9–114), and 114,216 structural variants (SVs; median: 1,130; Q[1]–Q[3]: 619.5–3150) in the rNEN-S samples. Meanwhile, from the rNEN-L population, we obtained 7,093 SNVs (median: 97.5; Q[1]–Q[3]: 67–369), 883 InDels (median: 33.5; Q[1]–Q[3]: 26–74), and 114,216 SVs (median: 580; Q[1]–Q[3]: 421–794). Similar to the moderate accumulation of somatic mutations in other NEN cohorts [[102]14], we detected an overall low tumor mutational burden (TMB) in the rNEN-S (median: 0.33 somatic mutations per megabase [Mb], Q[1]–Q[3]: 0.165–1.44) and rNEN-L (median: 0.495 somatic mutations per Mb, Q[1]–Q[3]: 0.18–1.86) specimens (Fig. [103]1A). No significant difference in TMB was observed between these two cohorts (p = 0.918, Wilcoxon rank sum test) (Fig. [104]S2A). MSIsensor was applied to determine the status of microsatellite instability (MSI) in these 54 subjects [[105]31], which revealed that 61.4% of the rNEN-S samples were MSI-high (MSI-H), whereas 40% of the rNEN-L samples were MSI-H (Fig. [106]S2B). As anticipated, we detected significantly higher TMB in the MSI-H samples than in the microsatellite stable (MSS) samples in both the rNEN-S (p = 0.04, Wilcoxon rank sum test) and rNEN-L (p = 0.014, Wilcoxon rank sum test) populations (Fig. [107]S2C). Somatic mutations in DNA mismatch repair genes were identified in 4.5% and 20% of the rNEN-S and rNEN-L specimens, respectively (Fig. [108]1A). Whole-genome duplication (WGD) is a macroprogressionary genetic alteration that promotes tumorigenesis by inducing genomic instability [[109]32]. We therefore calculated the frequencies of WGD in these two populations and found that 52.3% (23/44) of the rNEN-S and 60% (6/10) of the rNEN-L samples harbored WGD (Fig. [110]1A), however, no statistically significant difference was detected between the two groups (p = 0.736, Wilcoxon rank sum test). Moreover, we found no significant difference in TMB between the WGD^+ and WGD^- samples in the rNEN-S (p = 0.776, Wilcoxon rank sum test) patients or in the rNEN-L (p = 0.286, Wilcoxon rank sum test) subjects (Fig. [111]S2D). Fig. 1. Identification of HC-SMGs in rNENs. [112]Fig. 1 [113]Open in a new tab A SMGs identified by ≥2 algorithms were shown. Driver genes significantly mutated in both rNEN-S and rNEN-L samples were labeled as red color, and driver genes only mutated in rNEN-L specimens were labeled as green color. Top: bars represent the number of altered genes in each rNEN sample. Left: Algorithms that were used to identify the SMGs. Right: Mutation status of the SMGs ranked by mutation frequency in each rNEN sample. P values of the SMGs were calculated by the MutSigCV. Bottom: The clinicopathological data, MSI scores, TMB, somatic MMR gene mutations, and WGD events are displayed. B Mutation frequencies of 40 top HC-SMGs in rNEN-S and rNEN-L samples. P values were calculated by Wilcoxon rank sum test. *p < 0.05; **p < 0.01. C Lollipop plot showed the distribution of somatic mutations in MUC16 gene in rNEN-S and rNEN-L samples, respectively. The Ven diagram showed the overlapped hotspots in MUC16 between rNEN-S and rNEN-L specimens. To identify genomic alterations that are likely to be linked to tumorigenesis, we applied an integrative screening approach to unbiasedly identify significantly mutated genes (SMGs) using four algorithms, namely MutSigCV (a frequency-based method of estimating highly recurrent mutations within a gene), OncodriveCLUSTL (a sequence-based clustering algorithm for detecting known clusters and bona-fide cancer drivers across genomic regions) [[114]22], OncodriveFML (a method designed to identify signals of positive selection) [[115]23], and dNdScv (a statistical model for predicting variable mutation rates across the genome) [[116]20] (Fig. [117]S3A, Table [118]S3). The number of SMGs identified by each tool was shown in Fig. [119]S3B. Only candidate genes identified by two or more algorithms were considered HC-SMGs. In this way, we obtained 49 HC-SMGs in the 54 rNEN samples (Fig. [120]1A), among which RHPN2 (17%), MUC19 (13%), MUC4 (11%), MUC16 (11%), ZNF721 (7%), ZNF493 (7%), MCFD2 (6%), STAG1 (6%), ARHGEF26 (6%), MAD2L1BP (6%), ZNF695 (6%), HLA-DRB5 (6%), ZNF430 (6%), NBPF14 (6%), STC1 (6%), ANKRD18B (6%), OR2T8 (6%), SMC4 (6%), and MAN2A1 (6%) exhibited moderate to high mutation prevalence (>5%). Next, we searched for putative driver genes by mapping these HC-SMGs to the COSMIC Cancer Gene Census [[121]24] and DriverDBv4 [[122]25] databases and found that a total of 15 previously known tumor suppressors and oncogenes were enriched in somatic aberrations in these rNEN patients (Fig. [123]1A). Among them, MUC16 has been previously reported as a driver gene in pancreatic and midgut-derived NENs [[124]14]. In addition, Duan et al. recently showed that MUC16 was recurrently mutated in 55.3% of their cohort of rectal NEN patients [[125]33]. The previously unreported putative drivers comprised RHPN2 (17%), MUC4 (11%), ZNF493 (7%), STAG1 (6%), ARHGEF26 (6%), ZNF695 (6%), HLA-DDRB5 (6%), ZNF430 (6%), ANKRD18B (6%), SMC4 (6%), MAN2A1 (6%), PJA1 (4%), KLHL5 (4%), and BAG2 (2%). We also detected several genes enriched in non-synonymous mutations that were previously proposed as driver genes in other NEN cohorts, such as MEN1, KMT2C, BPTF, ATRX, and ARID1A [[126]13, [127]14, [128]16]. However, each of these genes was only identified by one algorithm, so they were not included in our analysis. Next, we compared the mutational frequencies of these driver genes in rNEN-S and rNEN-L populations and found that RHPN2, MUC16, and MUC4, which were all among the most prevalent in the list of HC-SMGs, were highly mutated in both the rNEN-S (RHPN2: 13.64%; MUC16: 6.82%, MUC4: 11.36%) and rNEN-L (RHPN2: 30%; MUC16: 30%, MUC4: 10%) populations, with no statistically significant difference in their mutation rates between the two populations (RHPN2: p = 0.06; MUC16: p = 0.07; MUC4: p = 0.107; Wilcoxon rank sum test) (Fig. [129]1B). RHPN2 (rhophilin Rho GTPase binding protein 2), encoding a member of the rhophilin family of Ras-homologous (Rho)-GTPase binding proteins, is frequently mutated in several cancer types, such as lung cancer [[130]34], hepatocellular carcinoma [[131]35], and ovarian cancer [[132]36]. MUC16 (mucin 16, cell surface-associated), previously known as carbohydrate antigen 125 (CA125), is a mucin marker that is among the most frequently mutated genes in human cancer tissues [[133]37, [134]38]. MUC4 (mucin 4, cell surface-associated) is a transmembrane glycoprotein that is implicated in the pathogenesis of various human cancers [[135]39]. Importantly, we found that several driver genes, including SMC4, MAN2A1, and ZNF493, were significantly mutated in the rNEN-L samples (SMC4: 30%, p = 0.005; MAN2A1: 30%, p = 0.005; ZNF493: 30%, p = 0.017; Wilcoxon rank sum test) than in the rNEN-S samples (SMC4: 0%; MAN2A1: 0%; ZNF493: 2.27%) (Fig. [136]1C), suggesting that these genes were associated with the tumorigenesis or progression of rNEN-L. SMC4 (structural maintenance of chromosomes 4) is closely related to the occurrence, progression, immune infiltration, and prognosis of various tumor types [[137]40, [138]41]. MAN2A1 (mannosidase alpha class 2A member 1) encodes lysosomal alpha-d-mannosidase and is associated with the recurrence of multiple cancers [[139]42]. ZNF493 (zinc finger protein 493) was reported to be frequently mutated in colorectal cancer [[140]43]. As mentioned above, we detected lymph node or distant metastases in 70% (n = 7) of the rNEN-L patients (referred to as rNEN-L-M), while the remainder (30%, n = 3) were diagnosed as having no metastatic lymph nodes (referred to as rNEN-L-N) (Fig. [141]1A, Table [142]S1). To further investigate the mutational discrepancy in these rNEN-L patients with or without lymphatic metastases, we compared the mutation frequencies of these HC-SMGs in rNEN-L-M and rNEN-L-N samples (Fig. [143]S3C), and revealed that HC-SMGs such as MUC19 (57.1%, p = 0.29) and MAN2A1 (42.9%, p = 0.475) were exclusively mutated in rNEN-L-M samples, whereas HC-SMGs, including ANKRD18B (33.3%, p = 0.3), ZNF430 (33.3%, p = 0.3), OR2T8 (33.3%, p = 0.3), HLA-DRB5 (33.3%, p = 0.3), and ZNF695 (33.3%, p = 0.3) were only mutated in rNEN-L-N specimens. In contrast, we observed that HC-SMGs, including MUC16 (rNEN-L-M: 14.3%, rNEN-L-N: 66.7%, p = 0.18), SMC4 (rNEN-L-M: 28.6%, rNEN-L-N: 33.3%, p = 0.33), RHPN2 (rNEN-L-M: 28.6%, rNEN-L-N: 33.3%, p = 0.33), ZNF493 (rNEN-L-M: 28.6%, rNEN-L-N: 33.3%, p = 0.33), and ZNF721 (rNEN-L-M: 14.3%, rNEN-L-N: 33.3%, p = 0.57) were highly mutated in both rNEN-L-M and rNEN-L-N subjects, with no statistical difference in their mutation rates between the two populations. We then calculated the frequency of LOH for each HC-SMG. As expected, RHPN2, MUC4, and MUC16, which were among the most highly mutated genes in the total cohort of rNEN samples, had high frequencies of mutation combined with LOH in both the rNEN-S and rNEN-L patients, whereas SMC4, MAN2A1, and ZNF493 had high frequencies of LOH only in the rNEN-L subjects (Table [144]S4). Mutational hotspots can help to identify cancer drivers and druggable targets [[145]44]. We therefore explored recurrent point mutations in these rNEN samples using an algorithm developed by Chen et al. [[146]45], by which we identified 46 substitution hotspots in six genes in the rNEN-S specimens and 23 hotspots in four genes in the rNEN-L specimens. Interestingly, we detected a cluster of hotspot mutations distributed in the MUC16 gene in both the rNEN-S (n = 19) and rNEN-L (n = 7) samples, while none of these mutations were simultaneously detected in both the populations (Table [147]S5). The loci of individually divergent mutations located in different functional domains of MUC16, MUC4, RHPN2, SMC4, and MAN2A1 in the rNEN-S and rNEN-L populations are presented in Fig. [148]1C and Fig. [149]S3D. Key signaling pathways affected by somatic mutations Somatic mutations are known to target common cancer signaling pathways. To compare the mutational landscape between the rNEN-S and rNEN-L samples in a pathway-centric manner, we annotated the SMGs identified in these two populations with cancer hallmark pathways. We found that several pathways, including focal adhesion (p = 0.006) and spliceosome pathways (p = 0.0003), were significantly mutated in 43.18% (19/44) and 36.36% (16/44), respectively, of the rNEN-S samples, while pathways including mTOR (p = 0.004), Jak-STAT (p = 0.04), MAPK signaling (p = 0.01), and N-glycan biosynthesis (p = 0.03) were preferentially enriched in 40% (4/10), 40% (4/10), 50% (5/10), and 40% (4/10), respectively, of the rNEN-L specimens (Fig. [150]2). Fig. 2. Recurrently mutated key signaling pathways in either rNEN-S or rNEN-L samples. [151]Fig. 2 [152]Open in a new tab A, B Focal adhesion (hsa04510) and spliceosome (hsa03040) pathways were recurrently mutated in 43.18% and 36.36% of rNEN-S samples, respectively. C–F mTOR (hsa04150), N-Glycan biosynthesis (hsa00510), Jak-STAT (hsa04630), and MAPK (hsa04010) signaling pathways were recurrently mutated in 40%, 40%, 40% and 50% of rNEN-L specimens, respectively. Boxes with different colors show the fractions of rNEN-S (Left) and rNEN-L (Right) samples with alterations in these genes. We further mapped SMGs identified in rNEN-L-M and rNEN-L-N populations to cell signaling pathways, and revealed that several key biological processes affected by somatic mutations, including citrate cycle (p = 0.009), mTOR (p = 0.009), N-Glycan biosynthesis (p = 0.02), and insulin signaling pathway (p = 0.0009) were selectively altered in 42.9% (3/7), 57.1% (4/7), 57.1% (4/7), and 85.7% (6/7), respectively, of the rNEN-L-M samples (Fig. [153]S4). Mutational signatures corrected by the FFPEsig algorithm To better understand the mutational processes operative in the rNEN-S and rNEN-L patients, we performed de novo mutational signature analysis using the SigProfilerExtractor tool to extract the proportions of the 96 possible trinucleotides previously described in Alexandrov’s mutation classification (COSMIC v3) [[154]46]. We further rectified the formalin-induced artefacts in these FFPE-derived DNA samples using the FFPEsig algorithm [[155]27] (Fig. [156]S5A). Finally, we obtained five base-substitution signatures in the rNEN-S and rNEN-L subjects that may contribute to rNEN mutagenesis, denoted as Signature 1 to Signature 5 (Figure [157]S5B). All of these signatures were strongly correlated with previously known mutational signatures (Fig. [158]3A). In the rNEN-S samples, the main etiology of mutagenesis was Signature 5 (an ultraviolet light exposure-related signature, SBS7d), presenting in 54.5% of the specimens. Signature 4 (associated with possible sequencing artefact, SBS46), Signature 2 (associated with defective DNA base excision repair due to NTHL1 mutations, SBS30), Signature 3 (associated with spontaneous deamination of 5−methylcytosine, SBS1), and Signature 1 (associated with possible sequencing artefact, SBS54) were present in 45.5%, 52.3%, 38.6%, and 31.8% of the patients, respectively (Fig. [159]3B). In contrast, in the rNEN-L patients, we detected Signatures 3–5 (associated with SBS5, a flat signature related to the age of individuals) [[160]47], Signature 2 (associated with spontaneous deamination of 5-methylcytosine, SBS1), and Signature 1 (associated with defective DNA base excision repair due to NTHL1 mutations, SBS30) in 70%, 40%, and 20% of the patients, respectively (Fig. [161]3B). Fig. 3. Corrected mutational signatures in rNENs. [162]Fig. 3 [163]Open in a new tab A Mutational signatures were extracted by decomposing matrix of base substitutions, and then rectified by the FFPEsig algorithm. The most similar validated signatures in rNEN-S and rNEN-L samples were shown and cosine-similarities were calculated to identify best match. B Bar plots showed the identified mutational signatures and their relative contributions to somatic mutations detected in either rNEN-S or rNEN-L samples. C Contribution of SBS30, SBS5, SBS1, and SBS7d in rNEN-S and rNEN-L samples were shown. P values were calculated by Wilcoxon rank sum test. *p < 0.05; **p < 0.01. SBS1, caused by the deamination of 5methylcytosine, was simultaneously detected in the rNEN-S (contributing 5.69% of total mutations) and rNEN-L (contributing 12.9% of total mutations) samples, while there was no statistically significant difference in its occurrences between the two populations (p = 0.405, Wilcoxon rank sum test) (Fig. [164]3C). However, we identified that SBS30, one of the dominant signatures, contributed 2.34% and 75.4% of mutations in the rNEN-S and rNEN-L populations, respectively. Significantly higher SBS30 activity in the rNEN-L samples were detected than in the rNEN-S samples (p = 0.041, Wilcoxon rank sum test). Likewise, a higher presence of SBS5 (contributing 25.7% of total mutations) in the rNEN-L patients was observed than in the rNEN-S patients (contributing 1.66% of total mutations, p = 0.01, Wilcoxon rank sum test). Thus, these results suggest that defective DNA mismatch repair and aging were involved in mutagenesis in the rNEN-L patients. We further extracted mutational signatures from rNEN-L-M and rNEN-L-N samples, and corrected them using the FFPEsig algorithm (Fig. [165]S6A). As a result, five base-substitution signatures designated as Signature 1 to Signature 5 were profiled in rNEN-L-M and rNEN-L-N specimens, respectively (Fig. [166]S6B). In rNEN-L-M samples, we detected Signature 1 (associated with SBS30), Signature 2 (associated with SBS1), and Signature 3-5 (associated with SBS5) in 14.3%, and 28.6%, and 57.1% of the subjects, respectively. In contrast, Signature 1 (associated with SBS1), Signature 2 (associated with defective DNA mismatch repair, SBS6), Signature 3 (associated with defective DNA mismatch repair, SBS15), and Signature 4&5 (associated with thiopurine chemotherapy, SBS87) were identified in 33.3%, 33.3%, 33.3%, and 33.3% of the rNEN-L-N patients, respectively (Fig. [167]S6C, D). We also examined the contributions of several predominant signatures in rNEN-L-M and rNEN-L-N populations, and revealed increased SBS30 (p = 0.66, Wilcoxon rank sum test) and SBS1 (p = 0.73, Wilcoxon rank sum test) activities in rNEN-L-M samples compared to their rNEN-L-N counterparts. On the contrary, higher SBS15 (p = 0.09, Wilcoxon rank sum test) and SBS87 (p = 0.06, Wilcoxon rank sum test) activities were preferentially enriched in rNEN-L-N subjects (Fig. [168]S6E). Somatic copy number aberration (SCNAs) Next, we applied the GISTIC2.0 tool to identify the recurrent SCNAs that might contribute to tumorigenesis in the rNEN-S and rNEN-L populations (Fig. [169]4A). In this way, we found a total of 31 focal events, including 6101.6 Mb copy number gains and 3255.8 Mb copy number losses involving 177 amplified and 1492 deleted genes, across the rNEN-S cohort. In contrast, seven focal amplifications, including 3734.4 Mb gains involving 107 amplified genes, were identified in the rNEN-L patients. We did not detect any significant focal deletions in the rNEN-L samples. Five focal amplifications, namely 1q21.1, 1q21.2, 9q12, 15q11.2, and 17q12, were simultaneously captured in both the cohorts. We further pinpointed the driver SCNAs by mapping genes with copy number aberrations to the DriverDBv4 database [[170]25] and identified three genes, namely LRRC37A, ARL17B, and RASA4 with copy number gains were driver SCNAs (Fig. [171]4B). These genes were exclusively detected in the rNEN-S subjects. ZNF658 (Zinc finger protein 658), with a copy number gain, annotated as a potential driver SCNA, was detected in both the rNEN-S and rNEN-L populations. Fig. 4. Landscape of significantly altered somatic CNVs in rNENs. [172]Fig. 4 [173]Open in a new tab A Genomic regions with significantly recurrent somatic CNVs in either rNEN-S or rNEN-L samples. B Overlap of focal amplification regions detected in rNEN-S and rNEN-L subjects. C Pathway enrichment analysis showed the top 10 pathways using genes with a copy number gain or lost in rNEN-S and rNEN-L samples. To further understand the biological function of the genes with copy number aberrations, we searched for over-represented pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database using the list of genes with recurrent SCNAs in rNEN-S or rNEN-L subjects. The top 10 altered signaling pathways (q < 0.05, Wilcoxon rank sum test) are shown in Fig. [174]4C. Pathways related to nucleotide excision repair and Toll-like receptor were selectively enriched using genes with a copy number gain in the rNEN-S cohort. Likewise, we observed that ABC transporters, chemokine signaling and glycerolipid metabolism-related pathways were specifically presented using genes with a copy number loss in the rNEN-S subjects. Functional enrichment of genes with a copy number gain revealed that pathways related to linoleic acid metabolism, VEGF signaling, and ether lipid metabolism were markedly enriched in the rNEN-L cohort. Furthermore, we showed that Ras and chemokine-associated pathways were simultaneously presented in rNEN-S and rNEN-L samples using the amplificated gene shared by these two populations. We then analyzed recurrent SCNV events in rNEN-L-M and rNEN-L-N populations and identified 6 focal amplifications involving 73 genes in rNEN-L-M subjects and 2 focal amplifications involving 12 genes in rNEN-L-N samples (Fig. [175]S7A). The 8p23.1 focal amplification event was identified simultaneously in both populations (Fig. [176]S7B). Functional enrichment analysis of genes with copy number gain showed that several pathways, including olfactory transduction and nitrogen metabolism, were specifically enriched in rNEN-L-M samples. In contrast, Toll-like receptor- and chemokine-related pathways were selectively enriched in rNEN-L-N samples (Fig. [177]S7C). Thus, these results showed that SCNVs in rNEN-L-M subjects had similarities and differences with those in rNEN-L-N patients. Therapeutic implications of somatic mutations We next used the Precision Heuristics for Interpreting the Alteration Landscape (PHIAL) database to assess the therapeutic implications of the somatic alterations profiled in these rNEN subjects [[178]48] and found that a total of 65.9% (29/44) of rNEN-S patients and 60% (6/10) of rNEN-L patients had at least one clinically relevant genetic alteration (Fig. [179]5A). Putatively actionable alterations that could be targeted, together with possible therapies, are shown in Fig. [180]5B. The number of potentially actionable changes varied between the rNEN-S (median: 9, Q[1]–Q[3]: 2–17) and rNEN-L (median: 4.5, Q[1]–Q[3]: 1–10) patients (Fig. [181]5C). The therapeutic implications also differed between the two populations. PDE4DIP was identified as the most common putatively actionable alteration in both the rNEN-S and rNEN-L populations, followed by ARAF, BCL11A, MSH2, WAS, and IL17R (Fig. [182]5D). Compared with the rNEN-S patients, we detected a higher frequency of actionable alterations of CLTCL1 in the rNEN-L patients. Correspondingly, we inferred that rNEN-L patients may benefit from or be resistant to PI3K/AKT/mTOR inhibitors, gefitinib, dasatinib, and CDK4/6 inhibitors, whereas therapeutic implications of sorafenib, PARP, JAK, and HDAC inhibitors were identified in both rNEN-S and rNEN-L patients (Fig. [183]5E). Fig. 5. Clinically relevant somatic alterations in rNENs. [184]Fig. 5 [185]Open in a new tab A Proportions of patients harboring clinically relevant somatic alterations in either rNEN-S or rNEN-L samples. ActMut+ indicates patients with somatic actionable alterations. B Landscape of somatic altered genes and their corresponding putative therapeutic implications in either rNEN-S or rNEN-L samples. Colors of the circles indicate disease groups; sizes of the circles stand for the frequencies of clinically relevant somatic alterations in these genes or their corresponding putative therapeutic implications. C Numbers of clinically relevant somatic alterations in either rNEN-S or rNEN-L samples. D Frequencies of clinically relevant somatic alterations in commonly altered genes in either rNEN-S or rNEN-L samples. E Proportions of patients that might benefit from or resist to specific therapies in either rNEN-S or rNEN-L subjects. Similarly, 71.4% (5/7) of rNEN-L-M patients and 33.3% (1/3) of rNEN-L-N patients were found to have at least one clinically relevant genetic alteration (Fig. [186]S8A). Putative actionable alterations with their indicated therapies are shown in Fig. [187]S8B. The number of actionable alterations varied between rNEN-L-M (median: 5, Q1–Q3: 3–11) and rNEN-L (median: 1) patients (Fig. [188]S8C). We also identified different therapeutic implications in these two populations (Fig. [189]S8D, E). Compared to rNEN-L-N patients, we identified a higher frequency of actionable alterations in PDE4DIP, CLTCL1, BCL11A, and TET2 in rNEN-L-M patients, with corresponding therapeutic implications for RET, PI3K/AKT/MTOR and JAK inhibitors, and so on. In contrast, CRKL was identified as having a higher frequency of putative actionable alterations in rNEN-L-N patients, with therapeutic implications for EGFR inhibitors in these subjects. Discussion rNENs are the most common digestive NENs in several countries and ethnic groups, and their incidence has increased substantially over recent decades [[190]6]. However, the incidence and prevalence of this disease are still underestimated, due to the frequent failure to recognize it by endoscopy. Several predictors of disease progression, metastatic spread, and poor prognosis of rNENs have been proposed, among which the diameter of the primary neoplasm is the most important [[191]10]. However, the genetic differences between small-sized, benign rNENs and their large-sized counterparts remain poorly understood. In this context, we herein performed a comprehensive and in-depth WES study to dissect the genetic diversity between small- and large-sized rNEN samples, and between large-sized rNEN samples in the absence or presence of lymph node metastases. Comparative analysis revealed marked disparities in their mutational landscapes during the progression of this neoplasm. In this study, we adopted an integrative screening approach, mainly referenced from a recent publication on colorectal cancer genomics [[192]49], to identify HC-SMGs that are likely to be involved in the tumorigenesis of rNENs. HC-SMGs, i.e., SMGs identified by multiple bioinformatics algorithms rather than a single algorithm, can be prioritized for further investigation. In this way, 49 HC-SMGs in these rNEN samples were detected by two or more algorithms, including some, such as MUC16, that were also discovered by previous NEN studies [[193]14, [194]33]. Notably, several SMGs profiled in our cohort, such as RHPN2 and MUC4, had never been previously reported, likely due to the large sample size of rNEN patients in this study compared with previously studied NEN cohorts. This discovery implies that the genomic alteration signatures in rNENs are markedly distinct from those in other types of NEN. RHPN2, which was mutated in 13.64% and 30% of our rNEN-S and rNEN-L subjects, respectively, is required for the proliferation and invasion of multiple cancer cells [[195]34] and can be used as a prognostic biomarker for patients with surgically resected colorectal cancer [[196]50]. MUC4, mutated in 11.36% and 10% of our rNEN-S and rNEN-L subjects, has also been revealed to be aberrantly expressed in a variety of epithelial carcinomas and can serve as a clinical tool for cancer diagnosis and prognosis [[197]51]. We further identified that putative driver genes enriched in non-synonymous mutations, including ZNF493, SMC4, and MAN2A1, were substantially mutated in the rNEN-L specimens, but not in the rNEN specimens. Moreover, HC-SMGs such as MUC19, MAN2A1 and HNRNPCL1 were exclusively mutated in rNEN-L patients with lymph node metastases, suggesting their involvement in the increasingly aggressive behavior of this disease. However, further in vitro and in vivo investigation of their functions in rNENs is necessary. To reveal the underlying exogenous or endogenous mutagens driving rNEN tumorigenesis, we characterized and compared the landscapes of mutational processes in small- and large-sized rNEN samples, and large-sized rNEN samples with or without lymph node metastases. These deconvoluted mutational signatures were further rectified using the FFPEsig algorithm to remove artefactual FFPE mutations [[198]27]. We showed that endogenous mutational processes, including defective DNA base excision repair (SBS30) and spontaneous deamination of 5-methylcytosine (SBS1), were enriched in the majority of samples in both the rNEN-S and rNEN-L populations. However, unlike SBS1, which contributed similarly in both the populations, SBS30 was significantly more frequent in the rNEN-L specimens, particularly in rNEN-L samples in the presence of metastatic lymph nodes, suggesting an important role of this mutagenic process in promoting rNEN progression. Likewise, we also found that SBS5, a clock-like signature mutation that increases in number with age, was significantly more frequent in the rNEN-L samples than in the rNEN-S samples, implying that age is a risk factor for rNEN progression. Indeed, the age at diagnosis was higher in the rNEN-L cohort than in the rNEN-S cohort, although this difference was not statistically significant (p = 0.07, unpaired Student’s t test), likely due to the relatively small sample size of the rNEN-L cohort. At the level of SCNV events, we observed several focal amplifications, including 1q21.1, 1q21.2, 9q12, 15q11.2, and 17q12 in both small- and large-sized rNEN samples, whereas the focal amplifications 8q21.2 and 8q23.1 were detected exclusively in rNEN-L patients. By annotating genes with copy number aberrations in the DriverDBv4 database, we were able to identify several genes with high frequency and important SCNVs, including LRRC37A, ARL17B and RASA4, in rNEN-S patients. In contrast, amplification of ZNF658 on 9q12, a potential driver SCNA, was detected in both rNEN-S and lymph node metastatic rNEN-L subjects. Interestingly, pathways related to viral infection were significantly enriched in rNEN-S and rNEN-L patients with or without lymph node metastases, which is reminiscent of numerous publications suggesting the involvement of viral infection in the development of neuroendocrine neoplasms [[199]52–[200]54]. Currently, the choice of therapeutic strategy for rNENs is largely dependent on whether the tumors are resectable or not. For intramucosal rNENs with a diameter <1 cm, endoscopic procedures are considered sufficient and usually associated with desirable long-term results. However, for locally advanced, unresectable neoplasms, pharmacological treatment, including biotherapy and chemotherapy, prevails. Several clinical trials have shown that somatostatin analogues, mTOR inhibitors (everolimus), and VEGF pathway inhibitors (sunitinib) have efficacy in patients with GI or lung NENs [[201]55, [202]56]. In this study, using the PHIAL database, we revealed that most of the rNEN subjects, regardless of their tumor size and lymph node metastases, harbored at least one alteration with targeted therapeutic implications. In line with the results of previous clinical trials, we also showed that both rNEN-S and rNEN-L patients may benefit from VEGF inhibitor therapy (sorafenib) and the feasibility of EGFR inhibitors (gefitinib) and CDK4/6 inhibitors in the treatment of rNEN-L patients. In particular, our findings suggest that rNEN-L patients diagnosed with lymph node metastases may be more sensitive to RET inhibitors (sorafenib, vandetinib) as well as PI3K/AKT/MTOR and JAK inhibitors. However, their effectiveness should be further assessed in future clinical investigations. In conclusion, this study has identified several genetic features associated with tumor size and lymphatic metastasis in rNEN patients, which will deepen our understanding of the genetic alterations that occur during rNEN progression and potentially directing improvements in rNEN treatment strategies. Supplementary information [203]Supplemental materials^ (2MB, pdf) [204]Table S1^ (11.6KB, xlsx) [205]Table S2^ (16.4KB, xlsx) [206]Table S3^ (12.3KB, xlsx) [207]Table S4^ (14.6KB, xlsx) [208]Table S5^ (11.4KB, xlsx) Acknowledgements