Abstract

Purpose

   Small cell lung cancer (SCLC) is an aggressive and rapidly progressive
   malignant tumor characterized by a poor prognosis. Chemotherapy remains
   the primary treatment in clinical practice; however, reliable
   biomarkers for predicting chemotherapy outcomes are scarce.

Methods

   In this study, 78 SCLC patients were stratified into “good” or “poor”
   prognosis cohorts based on their overall survival (OS) following
   surgery and chemotherapeutic treatment. Next-generation sequencing was
   employed to analyze the mutation status of 315 tumorigenesis-associated
   genes in tumor tissues obtained from the patients. The random forest
   (RF) method, validated by the support vector machine (SVM), was
   utilized to identify single nucleotide variants (SNVs) with predictive
   power. To verify the prognostic effect of the SNVs, samples from the
   cBioPortal database were utilized.

Results

   The SVM and RF methods confirmed that 20 genes positively contributed
   to prognosis prediction, with an area under the receiver operating
   characteristic curve (AUC) of 0.89. In the corresponding OS analysis,
   all patients with SDH, STAT3 and PDCD1LG2 mutations were in the poor
   prognosis cohort (15/15, 100%). In a public database, patients with SDH
   mutations also showed shorter OS, although the difference was not
   statistically significant.

Conclusion

   Our results provide a potential stratification of chemotherapy
   prognosis in SCLC patients and may help guide subsequent precision
   targeted therapy.

Supplementary Information

   The online version contains supplementary material available at
   10.1007/s12672-023-00685-4.

   Keywords: SCLC, SDH mutations, Chemotherapy prognostic, Random forest,
   Next generation sequencing

Introduction

   Small cell lung cancer (SCLC) is a common yet aggressive carcinoma with
   an extremely high proliferation rate, accounting for about 15% of cases
   of the most lethal carcinoma, lung cancer [1]. Due to its
   extraordinary invasiveness, SCLC is prone to early metastases, leading
   to generally poor curative outcomes [2]. In contrast to the
   increasing benefits of precision treatment for non-small cell lung
   cancer (NSCLC) patients in recent years, the current clinical treatment
   for SCLC remains inadequate. Concurrent chemoradiotherapy (CRT) is the
   primary clinical treatment for limited-stage (LS) SCLC, while
   chemotherapy alone is employed for extensive-stage (ES) SCLC [3].
   However, most patients’ responses to CRT or chemotherapy are transient,
   followed by rapid recurrence and dismal survival rates, with a median
   survival time of less than 2 years for early-stage patients and about
   one year for patients with metastases [2]. A small number of SCLC
   patients exhibit initial resistance to the first-line standard
   etoposide combined with carboplatin or cisplatin (EC or EP) regimen,
   resulting in more rapid progression and poor prognosis after treatment.
   Accurate identification of primary drug resistance, early acquired drug
   resistance, and patients with poor prognosis remains elusive.
   Consequently, the majority of inoperable advanced SCLCs are
   indiscriminately treated with EP or EC regimens, highlighting the
   urgent clinical need for biomarkers that predict the outcome of
   first-line chemotherapy.

   With advancements in technologies such as next-generation sequencing
   (NGS), molecular profiling of SCLC has made unprecedented progress
   [4, 5]. These DNA-level studies have provided insights into the
   genetic variation profile and genetic nature of SCLC [6]. The tumor
   suppressor genes TP53 and RB1 are universally inactivated in SCLC,
   which was once considered a molecularly homogeneous tumor [7]. In
   recent years, the importance of MYC family, KMT2D, PIK3CA, and other
   gene mutations has been confirmed in patients, xenograft tumor models,
   mouse models, and cell lines. SCLC is characterized by complex and
   variable gene mutations, high intra-tumoral heterogeneity, and
   plasticity [3, 8]. With the progress of molecular precision diagnosis
   and treatment in melanoma, NSCLC, and other tumors, subtyping SCLC by
   molecular gene status to predict therapeutic response and achieve
   first-line precision treatment has attracted considerable attention
   from researchers [9, 10].
   Different stages of SCLC, surgical intervention, and first-line
   treatment regimens are crucial prognostic indicators for patients.
   However, significant differences exist in the efficacy of surgery and
   standard chemotherapy in patients with the same stage [11].
   Previous studies have explored biomarkers in tissue protein expression,
   blood tumor markers, blood immune cell counts, and biochemical
   indicators, but their clinical application is limited due to small
   experimental samples and varying detection platform standards [2].
   Currently, clinical practice requires feasible biomarker research to
   predict the efficacy and clinical outcomes of first-line EP or EC
   chemotherapy in advanced SCLC to guide precise treatment in clinical
   practice.

   Machine learning methods can process massive, high-dimensional,
   high-throughput data, identify gene changes with high contributions,
   and use a small number of gene markers to achieve accurate prognosis
   prediction [12–14]. In this study, leave-one-out
   cross-validation (LOOCV) and random forest (RF) methods are employed
   for data processing [15, 16]. These methods use the available data
   efficiently and limit overfitting of subsequent models, making them
   suitable for studies with precious sample sources, such as SCLC
   [9].

   This study aims to identify reliable biomarkers for predicting the
   efficacy and clinical outcomes of first-line chemotherapy treatments in
   advanced SCLC patients. We employed machine learning methods, such as
   random forest (RF) and support vector machine (SVM) algorithms, to
   process complex genetic data and accurately predict patient prognosis
   based on molecular gene status. Our findings have the potential to
   guide precise treatment decisions in clinical practice, ultimately
   improving survival outcomes for SCLC patients.

Patients and methods

Patient selection

   This research was designed as a multi-center retrospective study.
   Patients diagnosed with SCLC at Ruijin Hospital (Shanghai, China) from
   January 2013 to December 2020 and Changhai Hospital (Shanghai, China)
   from January 2018 to December 2020, who met the inclusion criteria,
   were enrolled in this study (Supplementary methods). Data regarding
   baseline characteristics, including age at diagnosis, gender, Eastern
   Cooperative Oncology Group Performance Status (ECOG PS) score, tumor
   stage, smoking habits, pathologic type, metastases, clinical treatment,
   and outcomes, were carefully collected for each participant. Tumor
   evaluation was performed according to the revised Response Evaluation
   Criteria in Solid Tumors (RECIST) guidelines (version 1.1) [17].
   All patients were classified into a two-stage system (limited-stage and
   extensive-stage) according to the Veterans Administration Lung Cancer
   Group (VALG) staging method [2, 18]. Overall survival (OS) was
   defined as the time from diagnosis to death from any cause. Based on
   phase III clinical studies and meta-analysis data [19–21],
   patients were assigned to either the “good prognosis” (OS ≥ 10 months)
   or “poor prognosis” (OS < 10 months) cohort.

DNA extraction and library construction

   Paraffin-embedded tumor tissue specimens were obtained from each patient.
   A pathologist micro-dissected all samples selected for sequencing to
   confirm regions with > 70% tumor content. Tissue DNA extraction and
   purification were performed using the human tissue DNA extraction kit
   (Yunying Medicine, Ltd., Zhejiang, China). DNA concentration and purity
   were assessed using NanoDrop2000 (Thermo Fisher Scientific, Waltham,
   MA, USA). Prepared samples were stored at −20 °C until use. Targeted
   sequencing strategies were employed in this study, and the library was
   prepared using the VAHTS Universal DNA library prep kit (Illumina,
   Carlsbad, CA, USA). Target enrichment was performed using optimized
   probes (Yunying Medicine, Ltd.) designed to capture exons and some
   introns, targeting mature transcripts of 315 cancer-related genes.
   Sequencing was performed using the NextSeq500 platform (Illumina,
   Carlsbad, CA, USA), with each experimental step strictly following the
   manufacturer's protocol.

Next-generation sequencing (NGS)-based assay

   FASTQ files were screened using FastQC software (version 0.11.2) and a
   custom Python script to remove adapter sequences and sequences with a Q
   score below 30. The Burrows-Wheeler Aligner (BWA, version 0.7.7) was
   employed to map clean reads to the reference human genome GRCh37/hg19.
   The resulting BAM files were realigned and recalibrated using GATK3.5,
   which was also used to detect mutations. To reduce potential polymerase
   chain reaction bias, duplicate sequences were removed using Picard
   MarkDuplicates (version 1.35). Single nucleotide variants (SNVs) were
   identified using VarScan (version 2.3.2), and SNVs meeting the
   following criteria were retained: allele frequency ≥ 10%, total
   reads ≥ 100, and variant-supporting reads ≥ 50. Germline SNVs were
   filtered using the Yunying internal germline SNV database (Yunying
   Medicine, Ltd., Zhejiang, China), which was built from the sequencing
   results of 2,588 samples; only SNVs with a frequency of more than 10%
   in these individuals were considered [22]. The resulting somatic SNV
   calls were verified using the Integrative Genomics Viewer (IGV, version
   2.4.1) to further remove unreliable candidate sites.
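
   The read-level thresholds above can be sketched in Python as follows.
   This is a minimal illustration, not the authors' pipeline: the
   `SNVCall` record, the `passes_filters` helper, and the example calls
   are hypothetical, and the third threshold is assumed to count reads
   supporting the altered allele.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SNVCall:
    """Hypothetical record for one candidate SNV at one site."""
    gene: str
    total_reads: int    # sequencing depth at the site
    variant_reads: int  # reads supporting the altered allele (assumed meaning)

    @property
    def allele_frequency(self) -> float:
        return self.variant_reads / self.total_reads if self.total_reads else 0.0


def passes_filters(call, min_af=0.10, min_depth=100, min_variant_reads=50):
    """Apply the three thresholds stated in the text; all must hold."""
    return (
        call.allele_frequency >= min_af
        and call.total_reads >= min_depth
        and call.variant_reads >= min_variant_reads
    )


# Illustrative calls only; these are not data from the study.
calls = [
    SNVCall("TP53", total_reads=400, variant_reads=120),  # meets all thresholds
    SNVCall("RB1", total_reads=80, variant_reads=60),     # depth below 100
    SNVCall("SDHB", total_reads=1000, variant_reads=60),  # allele frequency 6%
    SNVCall("STAT3", total_reads=300, variant_reads=40),  # fewer than 50 variant reads
]
retained = [c.gene for c in calls if passes_filters(c)]
```

   Germline filtering and manual IGV review are separate steps and are
   not modeled here.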

SNV selection using random forest (RF) algorithm

   SNV clustering was conducted on patient samples and genes
   using CIMminer software (http://disconver.nci.nih.gov/cimminer).
   The “randomForest” R package [23] was employed with the ‘importance’
   option set to TRUE. The RF algorithm was then used to analyze SNV data
   from all 78 patients in classification mode. Only SNVs
   with positive accuracy contributions were selected and subjected to
   further screening with the RF classifier. Progressive screening
   continued until only SNVs that improved classification accuracy
   remained and the out-of-bag (OOB) error stopped declining. In the RF
   classification algorithm, approximately two-thirds of the samples were
   used to build each decision tree, with the remaining (out-of-bag)
   samples reserved for validation. The OOB error, an evaluation metric
   derived from this validation, was used to monitor the accuracy of the
   classifier, with a decreasing OOB error implying improved
   classification accuracy. In the last stages of progressive screening,
   when only a limited number of SNVs were being removed and the OOB error
   had stabilized, SNVs that lowered classification accuracy in certain
   random cases were considered for further modeling.
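
   The "two-thirds in, one-third out" split is a property of bootstrap
   sampling rather than a tuned setting. Assuming each tree is grown on a
   bootstrap sample of size n drawn with replacement, a given patient is
   left out of that sample with probability (1 - 1/n)^n, which is about
   0.37 for n = 78. The short simulation below is only a sketch of that
   idea; the function name, seed, and number of trees are illustrative
   choices, not values from the study.

```python
import random


def oob_fraction(n_samples: int, rng: random.Random) -> float:
    """Draw one bootstrap sample and return the share of samples left out-of-bag."""
    in_bag = {rng.randrange(n_samples) for _ in range(n_samples)}
    return 1 - len(in_bag) / n_samples


rng = random.Random(0)   # fixed seed so the sketch is reproducible
n_patients = 78          # cohort size reported in the study
n_trees = 500            # illustrative number of trees

# Average out-of-bag share across many simulated trees.
mean_oob = sum(oob_fraction(n_patients, rng) for _ in range(n_trees)) / n_trees

# Theoretical probability that one patient is absent from a bootstrap sample.
expected_oob = (1 - 1 / n_patients) ** n_patients
```

   Because every patient is out-of-bag for roughly a third of the trees,
   the OOB error acts as a built-in validation estimate without a
   separate hold-out set, which suits a cohort of this size.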

Predictive modeling using support vector machines (SVMs)

   We conducted a four-fold cross-validation experiment. The samples were
   randomly divided into four groups, each preserving the overall class
   distribution. Each group was then used as the testing set in turn,
   with the other groups serving as the training set. The overall
   performance was reported as the mean and standard deviation of the
   results from the four SVM models generated through cross-validation. To
   assess the impact of SNVs with uncertain effects on SVM performance, we
   repeated the four-fold cross-validation process 1000 times with
   different random group selections. The subset with the highest mean ROC
   curve AUC was selected as the final set for further investigation.
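
   The stratified four-fold split described above might look like the
   following sketch. It only shows how the folds could be formed and
   rotated; the helper names and the round-robin assignment are
   assumptions, and the SVM fitting itself is omitted. The class sizes
   (60 good, 18 poor) are the cohort sizes reported in the Results.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=4, seed=0):
    """Split sample indices into k folds that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal the shuffled members of each class round-robin across folds.
        for position, idx in enumerate(members):
            folds[position % k].append(idx)
    return folds


def train_test_splits(folds):
    """Yield (train_indices, test_indices), using each fold once as the test set."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


labels = ["good"] * 60 + ["poor"] * 18
folds = stratified_folds(labels, k=4, seed=1)
poor_per_fold = [sum(labels[i] == "poor" for i in fold) for fold in folds]
splits = list(train_test_splits(folds))
```

   Repeating the procedure with a different `seed` each time gives the
   repeated random group selections mentioned in the text.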

Public database verification

   cBioPortal (https://www.cbioportal.org/) was used to verify the
   prognostic ability of the SNVs. Briefly, detailed clinical baseline,
   gene sequencing, and treatment information for 239 patients with SCLC
   was downloaded, and duplicate samples were removed.

Bioinformatic analysis

   All genes harboring mutations were recorded, and pathway mapping and
   enrichment analysis were performed using Gene Ontology (GO) and Kyoto
   Encyclopedia of Genes and Genomes (KEGG) via the R package
   “clusterProfiler” (v.3.14.3) [24]. Agilent Literature Search
   (v.3.1.1) [25] in Cytoscape [26] was used to search and
   generate the gene-regulatory network.
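
   Over-representation analysis of this kind generally rests on a
   hypergeometric test: given a background of N genes, K of which belong
   to a pathway, it asks how likely it is to see at least k pathway genes
   among n selected genes. The sketch below illustrates that calculation
   only. The background size, pathway size, and hit count are
   hypothetical, and the source does not state which background gene set
   was used.

```python
from math import comb


def enrichment_p_value(background: int, in_pathway: int, selected: int, hits: int) -> float:
    """Upper-tail hypergeometric probability P(X >= hits)."""
    upper = min(in_pathway, selected)
    favourable = sum(
        comb(in_pathway, i) * comb(background - in_pathway, selected - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(background, selected)


# Hypothetical numbers for illustration: a 315-gene background, a pathway
# containing 10 of those genes, and 3 pathway genes among 20 selected genes.
p_value = enrichment_p_value(background=315, in_pathway=10, selected=20, hits=3)
```

   In practice the resulting P values are adjusted for the number of
   pathways tested before any term is reported as enriched.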

Statistical analysis

   One-way analysis of variance was used to assess the effects of
   mutations, followed by Fisher’s exact test to independently compare
   variations between groups, and a Mann–Whitney U test to determine
   differences in total variations between the two cohorts. Variation
   burden was analyzed using the Poisson test, and survival analysis was
   based on the Kaplan–Meier method. All statistical analyses were
   performed, and odds ratios (ORs) were generated, with SPSS (v.22.0; IBM
   Corp., Armonk, NY, USA), with P < 0.05 considered statistically
   significant. Correlations (r) were calculated using Kendall’s tau-b or
   tetrachoric correlation methods. The R package “survival” was used for
   Kaplan–Meier survival analysis, and other statistical charts were
   generated with the R package “ggplot2” [27].
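
   For readers unfamiliar with the Kaplan–Meier method, the estimator
   multiplies, at each time with at least one death, the fraction of
   at-risk patients who survive that time. The following is a minimal
   sketch with hypothetical follow-up data; it is not the “survival”
   package and does not compute confidence intervals or log-rank tests.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with at least one death.

    times  : follow-up time per patient (for example, months)
    events : True if the patient died at that time, False if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, died in zip(times, events) if ti == t and died)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached


# Hypothetical follow-up data in months, for illustration only.
times = [3, 5, 5, 8, 12, 12, 16, 20]
events = [True, True, False, True, False, True, True, False]
curve = kaplan_meier(times, events)
median = median_survival(curve)
```

   Censored patients leave the at-risk set without lowering the curve,
   which is why the estimate differs from a simple proportion of deaths.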

Results

Demographic and clinical characteristics of included patients

   From 2013 to 2020, 78 patients newly diagnosed with SCLC and who
   received standard first-line etoposide plus carboplatin or cisplatin
   therapy were enrolled in the study (Fig. S1), including 60 samples from
   Ruijin Hospital and 18 samples from Changhai Hospital. Slightly more
   than half of the patients (44, 56.4%) were younger than 65 years old,
   and the male/female ratio was 73/5. More than three-quarters of
   patients (59, 75.6%) had been or were smokers. Clinical staging
   statistics revealed that 51 patients (65.4%) had extensive-stage
   disease at the time of diagnosis, of whom 35 (44.9% of all patients)
   had extrapulmonary distant metastasis, and most patients (73/78) were
   in good general condition (ECOG score 0–1).

   In terms of prognosis, 66 progression events after first-line
   treatment (progression-free survival, PFS; event maturity 84.6%) and 46
   deaths (maturity 59.0%) were observed in the study. The
   median PFS of enrolled patients was 7.15 months (interquartile range,
   IQR: 4.13–11.02 months), and the median overall survival (mOS) was
   16.52 months (IQR: 10.03–25.32 months). Referring to previous
   meta-analyses and phase III clinical trials [20–22], patients
   were divided by a cut-off value of 10 months into a good prognosis
   cohort (OS ≥ 10 months) and a poor prognosis cohort (OS < 10 months).
   By this means, 60 patients (76.9%) had a good prognosis and 18 patients
   (23.1%) had a poor prognosis. The clinical characteristics (age,
   gender, ECOG PS status, distant metastasis status, smoking status;
   shown in Table 1 and Table S1) of the two prognosis cohorts were
   similar, but staging differed: the poor prognosis cohort contained a
   higher proportion of extensive-stage patients than the good prognosis
   cohort (P = 0.023). In addition, although the difference was not
   statistically significant, a higher proportion of patients with good
   prognosis received chest radiotherapy (61.7% vs 33.3%, P = 0.057).

Table 1.

   Clinical characteristics of SCLC patients receiving first-line
   platinum-containing doublet chemotherapy

   Characteristic                  Total patients  Good prognosis  Poor prognosis  P value
                                                   (OS ≥ 10 m)     (OS < 10 m)
                                   N (%)           N (%)           N (%)
   Age
    < 65 years old                 44 (56.4)       33 (55.0)       11 (61.1)       0.788
    ≥ 65 years old                 34 (43.6)       27 (45.0)       7 (38.9)
   Gender
    Male                           73 (93.6)       56 (93.3)       17 (94.4)       1.000
    Female                         5 (6.4)         4 (6.7)         1 (5.6)
   Staging
    Limited-stage                  27 (34.6)       25 (41.7)       2 (11.1)        0.023^a
    Extensive-stage                51 (65.4)       35 (58.3)       16 (88.9)
   Distant metastasis
    No                             43 (55.1)       35 (58.3)       8 (44.4)        0.418
    Yes                            35 (44.9)       25 (41.7)       10 (55.6)
   ECOG PS
    0–1                            73 (93.6)       58 (96.7)       15 (83.3)       0.078
    2–3                            5 (6.4)         2 (3.3)         3 (16.7)
   Smoking
    Never                          19 (24.4)       17 (28.3)       2 (11.1)        0.211
    Present/past                   59 (75.6)       43 (71.7)       16 (88.9)
   Chest radiotherapy
    No                             35 (44.9)       23 (38.3)       12 (66.7)       0.057
    Yes                            43 (55.1)       37 (61.7)       6 (33.3)
   Posterior line immunotherapy
    No                             71 (91.0)       53 (88.3)       18 (100)        0.192
    Yes                            7 (9.0)         7 (11.7)        0 (0)
   Total number of patients        78 (100)        60 (76.9)       18 (23.1)

   ECOG PS Eastern Cooperative Oncology Group Performance Status; m month

   ^a Statistically significant at the level of P < 0.050
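
   The staging comparison in Table 1 can be checked from its four counts.
   The sketch below assumes a two-sided Fisher's exact test was used for
   this row (the source does not state which test produced each P value
   in the table) and implements the usual definition: sum the
   probabilities of all tables with the same margins that are no more
   likely than the observed one. The function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are included.
    return sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Counts from Table 1: rows are limited-stage and extensive-stage,
# columns are good prognosis and poor prognosis.
p_staging = fisher_exact_two_sided(25, 2, 35, 16)  # about 0.023
```

   Under that assumption the result agrees with the P value of 0.023
   reported for staging.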

SCLC gene mutation map and random forest screening

   The landscape of patient genotypes was explored using next-generation
   sequencing (NGS) to investigate the molecular characteristics of SCLC.
   A panel of 315 genes related to tumorigenesis was selected to identify
   differences in patient genotypes. Bioinformatic analyses of this panel
   revealed that the ten most frequently mutated genes were TP53, RB1,
   KMT2C, KMT2D, LRP1B, KMT2A, FAT1, PRKDC, BRCA2, and EP300. Gene-level
   mutations in the 78 patients were determined and clustered, and the
   clustering showed no significant correlation with the efficacy
   and prognosis of first-line standard chemotherapy (Fig. 1). Across
   the 315 genes, there was no significant difference in the total number
   of mutations per patient between patients with good prognosis and those
   with poor prognosis (P = 0.578, Fig. 2A).

Fig. 1.

   Mutation landscape of SCLC patients receiving platinum-based standard
   chemotherapy

Fig. 2.

   Prognostic-related mutated genes selected by random forest (RF) and
   support vector machines (SVM). A Total number of mutations per person
   for the two groups of 315 genes, B the top 50 genes and RF values for
   the random forest screening, C the total number of mutations per person
   for the two groups of the top 20 genes, D ROC curve of a prognostic
   model built by SVM-LOOCV

   SNVs contributing to prognosis were screened by random forest (RF)
   analysis, and the top 50 genes obtained are shown in Fig. 2B. The
   top 10 included SDHC, C11orf30, SDHB, PDCD1LG2, TOP2A, GRM3, SDHD,
   PRKDC, STAT3, and AURKA. The top 20 genes contributing to clinical
   survival outcomes were included to construct an SVM model. When only
   the top 20 determinant SNVs screened by RF were considered, the total
   number of mutations per patient in the poor prognosis cohort was
   significantly higher than that in the good prognosis cohort
   (P < 0.001, Fig. 2C).

SVM classification model and evaluation

   Leave-one-out cross-validation of the SVM (SVM-LOOCV) was applied to
   confirm the positive contribution of the top 20 genes selected by RF.
   As shown in Fig. 2D, the area under the ROC curve (AUC) of SVM-LOOCV
   for prognosis was 0.89, while the AUCs for staging and distant
   metastasis were 0.65 and 0.57, respectively, suggesting that the SVM
   model has good prognostic discrimination. We further conducted
   univariate and multivariate survival analyses on the predictive
   ability of the model (Tables S2 and S3) and found that the model
   grouping was an independent predictor of PFS and OS (PFS: HR = 2.820,
   95% CI 1.371–5.801, P = 0.005; OS: HR = 2.512, 95% CI 1.107–5.701,
   P = 0.028). Therefore, taking the SVM-LOOCV cross-validation results
   together with the survival analysis of the model grouping, the
   top 20 gene model shows some ability to predict clinical prognosis.
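
   As a reminder of what an AUC of 0.89 measures: it is the probability
   that a randomly chosen patient from one class receives a higher model
   score than a randomly chosen patient from the other class. The sketch
   below computes it by direct pairwise comparison. The scores and labels
   are hypothetical; in a leave-one-out setting, each score would come
   from a model trained without that patient.

```python
def roc_auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute an AUC")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical held-out scores (1 = poor prognosis, 0 = good prognosis).
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1]
auc = roc_auc(scores, labels)  # 10 of 12 pairs ranked correctly
```

   An AUC of 0.5 corresponds to random ranking, which puts the reported
   values for staging (0.65) and distant metastasis (0.57) in context.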

   The details of the 20 gene mutations are shown in Fig. 3 in the
   form of a heatmap. Additionally, we observed that despite the low
   mutation frequency (3.85%) of SDHC (3/78), SDHB (3/78), and SDHD
   (3/78), all patients with SDH family gene mutations belonged to the
   poor prognosis cohort (8/8, 100%). This suggests that SDH family
   mutations may be correlated with the prognosis of SCLC patients.

Fig. 3.

   Heatmap of SNV profiles for the top 20 genes selected by random forest
   (RF), shown with the patients' clinical characteristics

Survival analysis of key mutant genes

   To verify the predictive capability of the top 20 genes selected by
   random forest analysis, we analyzed the distribution of these genes in
   the two cohorts with good and poor prognosis. The results revealed
   statistically significant differences in mutation frequencies for 9 genes
   (PIK3R1, TOP2A, PMS2, STAT3, AURKA, SDHB, SDHC, SDHD, PDCD1LG2) between
   the two cohorts (Table S4), with patients carrying these gene mutations
   tending to have worse prognosis. Interestingly, mutations in the MYC
   family, KMT2D, and PIK3CA, which are thought to have significant
   effects on SCLC, were not selected as prognostic features by the random
   forest model in our cohort. Upon further examination of the
   distribution of these genes in our cohort, we found that their
   mutations did not influence patients' chemotherapy outcomes (Table S5).

   To further explore the 9 genes whose mutation frequencies differed
   significantly (Poisson test), we conducted a Kaplan–Meier survival
   analysis according to mutation status. Log-rank analysis demonstrated
   that the OS of patients with SDHB/C/D mutations was shorter than that
   of non-carriers after first-line standard EC or EP chemotherapy (mOS
   of non-carriers vs carriers: 23.0 m vs 9.3 m, P < 0.0001, Fig. 4A).
   Additionally, the prognosis of patients with PDCD1LG2 or STAT3 gene
   mutations was worse than that of non-carriers (PDCD1LG2 mOS: 22.6 m vs
   9.7 m, P = 0.00043, Fig. 4B; STAT3 mOS: 23.0 m vs 10.0 m, P = 0.00017,
   Fig. 4C). Patients with mutations in AURKA, PMS2, or PIK3R1, but not
   TOP2A, tended to have worse prognosis, although the difference was not
   statistically significant (Fig. 4D–G).

Fig. 4.

   Kaplan–Meier curves for OS according to mutation status of A SDH family,
   B PDCD1LG2, C STAT3, D AURKA, E PMS2, F PIK3R1, and G TOP2A

   Importantly, to investigate whether patients’ staging would influence
   the predictive potential of these mutations on chemotherapy prognosis,
   we further divided the patient cohort into limited-stage (LS) and
   extensive-stage (ES) groups. The results demonstrated that in the LS
   group, which typically exhibits better chemotherapy outcomes, no
   patients carrying SDHB/C/D or PDCD1LG2 mutations were observed (Fig.
   5A and C). Conversely, in the ES group, patients with SDHB/C/D or
   PDCD1LG2 mutations had significantly lower OS than non-carriers (Fig.
   5B and D). Additionally, irrespective of the LS or ES group, STAT3
   gene mutation carriers exhibited significantly shorter OS than
   non-carriers (Fig. 5E and F), and LS patients with AURKA or PMS2
   mutations had poorer chemotherapy outcomes compared to non-carriers
   (Fig. S2A and S2C). No other statistically significant prognostic
   stratification differences were observed in either the LS or ES groups
   (Fig. S2), suggesting that the predictive value of these gene mutations
   for SCLC patient chemotherapy prognosis is fundamentally independent of
   tumor staging.

Fig. 5.

   Kaplan–Meier curves for overall survival (OS) based on mutation status
   and tumor stage. A Impact of SDH family mutations on the prognosis of
   limited-stage (LS) patients, B impact of SDH family mutations on the
   prognosis of extensive-stage (ES) patients, C influence of PDCD1LG2
   mutations in LS cohort, D influence of PDCD1LG2 mutations in ES cohort,
   E effect of STAT3 mutations on LS cohort, and F effect of STAT3
   mutations on ES cohort

Verification of the prognostic ability of SNVs

   To verify our results, we utilized the cBioPortal, an open-source
   platform that enables interactive analysis of complex cancer genomics
   data sets. Our verification cohort comprised 239 SCLC patients. Upon
   analysis, we found that 6 of these patients, from 3 study IDs,
   carried 7 SDH family mutations (6 SDHA mutations and 1 SDHB mutation),
   4 patients carried 4 STAT3 mutations, and no patient carried a PDCD1LG2
   mutation. Furthermore, SCLC patients with SDH family mutations
   frequently carried both TP53 and RB1 mutations (Table 2). In further
   analysis of the association between these mutations and OS,
   individuals with SDH family mutations exhibited a relatively shorter
   median survival time, although the difference was not statistically
   significant (Fig. S3A). Similarly, no significant association was
   found between STAT3 mutations and OS among patients (Fig. S3B).

Table 2.

   Clinical and genomic characteristics of SCLC patients harboring SDH or
   STAT3 mutations in the cBioPortal database

   Study ID | Patient ID | Diagnosis age | Sex | UICC tumor stage | Overall survival (months) | Overall survival status | SDH or STAT3 mutation | TP53 mutation | RB1 mutation
   sclc_ucologne_2015 | sclc_ucologne_2015_S00938 | 58 | Female | IV | 15 | 1:DECEASED | SDHA A466V, SDHA E473V | P151S | L477Mfs*17, D479Vfs*14
   sclc_ucologne_2015 | sclc_ucologne_2015_S01170 | 71 | Male | Ia | 20 | 1:DECEASED | SDHB G182W | G266E | –
   sclc_ucologne_2015 | sclc_ucologne_2015_S02279 | 58 | Male | IIb | 1 | 1:DECEASED | SDHA T298= | V272L | K791*
   sclc_ucologne_2015 | sclc_ucologne_2015_S02273 | 58 | Female | IV | 24 | 1:DECEASED | SDHA A172P | E68Rfs*55 | E817*
   Small Cell Lung Cancer (Johns Hopkins, Nat Genet 2012) | 134398 | NA | NA | NA | NA | NA | SDHA A391S | P47Rfs*76 | W75*
   Small Cell Lung Cancer (CLCGP, Nat Genet 2012) | S00501 | 55 | Female | NA | NA | NA | SDHA T656P, STAT3 A35V | R158G | Y454*
   Small Cell Lung Cancer (U Cologne, Nature 2015) | sclc_ucologne_2015_S01563 | 53 | Male | NA | NA | NA | STAT3 V463Dfs*79 | R196P | X738_splice
   Small Cell Lung Cancer (U Cologne, Nature 2015) | sclc_ucologne_2015_S02287 | 62 | Male | Ia | 23 | 1:DECEASED | STAT3 K363I | H179L | –
   Small Cell Lung Cancer (Johns Hopkins, Nat Genet 2012) | 585260 | NA | NA | NA | NA | NA | STAT3 M200I | – | –

   * Termination mutation

   = Synonymous mutation

Molecular characteristics of prognostic model grouping

   To determine the possible pathways involving the genes included in the
   model, we conducted Gene Ontology (GO) functional annotation analysis
   and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment
   analysis for the 20 genes included in the model. The molecular function
   annotation revealed that the model included genes involved in the
   tricarboxylic acid cycle, mitochondrial oxidative phosphorylation, and
   other functions (Fig. 6A). Correspondingly, KEGG analysis
   demonstrated that the genes were enriched in various
   tumor-related and metabolism-related pathways, including the
   tricarboxylic acid cycle, carbon metabolism, oxidative phosphorylation,
   platinum drug resistance, and PD-L1 expression (Fig. 6B).
   In addition, the KEGG and GO results indicated that the target
   genes also participate in RNA post-transcriptional regulation and DNA
   damage repair.

Fig. 6.

   Bioinformatics analysis of prognostic-related mutated genes selected by
   random forest (RF) and support vector machines (SVM). A Gene ontology
   (GO) function enrichment of the top 20 genes selected in the prognostic
   model, B Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway
   enrichment of the top 20 genes selected in the prognostic model, C the
   signaling regulatory networks of the top 20 genes selected in the
   prognostic model

   To investigate the functional effects of the selected genes, we
   analyzed the gene-regulatory network of these genes in cancer,
   particularly in lung tumors. The results showed that among the 20
   selected genes that contributed to the prognosis prediction of SCLC
   chemotherapy, 9 were located in the same signal regulatory network and
   mostly at key nodes, indicating that CDK8-CD274-AKT1-TP53 may be an
   essential signal regulatory axis determining the prognosis of SCLC
   chemotherapy (Fig. 6C). Additionally, the regulatory network
   suggested that mutated genes were associated with cytokine regulation,
   such as STAT3-IL-6, indicating the potential correlation of the
   included genes in the model with the shaping of the SCLC immune
   microenvironment.

Discussion

   In this study, we explored 315 genes related to tumor cell cycle,
   angiogenesis, and DNA damage repair in 78 patients with unresectable
   LS-SCLC and ES-SCLC who received platinum-based dual-agent
   chemotherapy. The overall genes with high mutation frequency in the
   samples were consistent with previous literature reports, and the
   higher mutation levels observed in genes encoding histone-lysine
   N-methyltransferase 2 (KMT2) family proteins may be related to the
   race, disease stage of patients, and sample sources (surgery vs needle
   biopsy) [6, 28].

   According to the random forest algorithm results, we selected the top
   20 prognosis-related genes, including SDHC, C11orf30, SDHB, PDCD1LG2,
   TOP2A, GRM3, SDHD, PRKDC, STAT3, AURKA, etc. We used the support vector
   machine algorithm for final screening and model construction. The
   model’s area under the curve was 0.89, higher than the diagnostic
   ability based on disease stage and distant metastasis status. There
   were significant differences in PFS and OS between the two groups under
   the model. To our knowledge, this is the first study focusing on SCLC patients
   treated with standard first-line EC or EP and using machine learning
   algorithms to construct a prognostic model for SCLC.

   The signaling regulatory networks revealed that the selected genes
   were enriched in pathways including the tricarboxylic acid cycle,
   carbon metabolism, oxidative phosphorylation, platinum resistance, and
   PD-L1 expression (Fig. 6C), which corresponded with the molecular
   function annotation. CDK8-CD274-AKT1-TP53 may be an important signaling
   axis determining the prognosis of SCLC chemotherapy, participating in
   the cytokine regulatory network, including IL-6. Furthermore, the KM
   curve analysis suggested that patients with SDH family, PDCD1LG2, or
   STAT3 gene mutations had worse prognoses after chemotherapy. Taken
   together, these results suggest that the significantly shorter
   survival in the poor prognosis group in this study may be related to
   SDH complex dysfunction.

   SDH is composed of four subunits (SDHA, SDHB, SDHC, and SDHD), and
   deleterious mutations in any of them result in functional
   destabilization of the entire complex [29]. SDH family mutations are
   considered to be associated with neuroendocrine tumors such as
   paraganglioma and pheochromocytoma [30, 31]. In the study of
   tumor-associated macrophages, the SDH complex participates in the
   metabolic reprogramming of melanoma by regulating the generation of
   mitochondrial reactive oxygen species, and the use of SDH inhibitors
   can inhibit the growth of melanoma [32]. Other tumor types, including
   gastrointestinal stromal tumors (GISTs), renal and thyroid cancers,
   melanoma, sarcoma, colon cancer, neuroblastoma [30, 33], pancreatic
   neuroendocrine tumors, Carney triad, and ganglioneuroma [34], have
   been shown to depend on SDH family mutations in some cases. The role of
   succinate accumulation in these tumors as an initiator of neoplasm
   invasion and metastasis has been well documented [35], suggesting a
   potential mechanism for, and the importance of, SDH in the highly
   invasive SCLC. To our knowledge, this is the first study to suggest
   that SDH complex mutations may be related to the clinical outcome,
   metabolic mechanism, and tumor development of SCLC. In the future, the
   mechanism by which these gene mutations affect tumor metabolism,
   occurrence, and development can be explored in basic research, and
   their clinical value for predicting the efficacy of chemotherapy can
   be explored prospectively in clinical research.

   In addition, PD-L2, the protein encoded by PDCD1LG2, is one of the
   important components of the PD-1/PD-L1/PD-L2 axis, which is
   involved in tumor immune escape [36]. Moreover, studies have shown
   that the expression level of PD-L2 is related to the efficacy of
   immunotherapy [37]. Based on these results, the prognosis model
   constructed in this study may be conducive to further screening of
   cohorts who benefit from immunotherapy combined with chemotherapy. This
   assumption requires follow-up retrospective and prospective clinical
   research.

   Nevertheless, this study has several limitations. First, it is a
   multicenter retrospective study, and due to the relatively low
   prevalence of SCLC among lung cancer patients, the final sample size
   included is limited. Second, the study did not perform a detailed
   analysis of the impact of subsequent lines of treatment on patients'
   overall survival. Finally, the distribution of patients in the good
   prognosis cohort and the poor prognosis cohort within this study is
   uneven, which may introduce statistical bias. Therefore, further
   prospective multicenter studies involving a larger cohort should be
   undertaken to expand the understanding of our findings' ability to
   distinguish between SCLC patients with good and poor prognoses in
   response to chemotherapy. Additionally, the functional and molecular
   biological implications of these gene mutations warrant careful
   investigation in future research.

   In summary, our study used machine learning algorithms to construct a
   prognostic model for SCLC patients treated with standard first-line EC
   or EP chemotherapy. The model showed potential in predicting clinical
   prognosis and identified several genes that may be involved in the
   development and progression of SCLC. Further exploration of the
   mechanisms by which these gene mutations contribute to tumor
   metabolism, occurrence, and development could provide valuable insights
   into the clinical value of predicting chemotherapy efficacy and
   identifying patients who may benefit from immunotherapy combined with
   chemotherapy.

Supplementary Information

   Additional file 1 (DOCX, 141 KB)
   Additional file 2 (DOCX, 434 KB)
   Additional file 3 (DOCX, 224 KB)
   Additional file 4 (DOCX, 17 KB)
   Additional file 5 (DOCX, 32 KB)
   Additional file 6 (DOCX, 18 KB)
   Additional file 7 (DOCX, 19 KB)
   Additional file 8 (DOCX, 19 KB)
   Additional file 9 (DOCX, 17 KB)

Acknowledgements