Abstract

   Emerging evidence emphasizes the critical role of alternative
   polyadenylation (APA) in posttranscriptional regulation of genes, and
   APA‐associated genetic variants (apaQTLs) show particular relevance for
   multiple disease. However, genetic regulation of APA and its role in
   non‐small cell lung cancer (NSCLC) risk have not been thoroughly
   studied. Here, by leveraging genotype and APA data from The Cancer
   Genome Atlas, the association between genetic variation and APA is
   determined in NSCLC samples. The identified apaQTLs are distinct from
   eQTLs and are preferentially enriched in functionally relevant
   characteristics, including poly(A) motifs, APA‐associated RBP binding
   sites, functional elements, and known NSCLC risk loci. Moreover, genes
   associated with apaQTLs are broadly involved in cancer‐related
   biological process. Of note, integration of apaQTL variants with
   traditional GWAS‐derived PRS is proved as a potential screening tool
   for NSCLC. By integrating large‐scale population and biological
   experiments, a functional apaQTL variant rs9606 in LYRM4 is identified.
   Mechanistically, rs9606 induces aberrant APA process of LYRM4 via
   allele‐specific interacting with NUDT21, which lead to increased
   expression of oncogene LYRM4 and thus contribute to NSCLC risk. This
   study demonstrates the distinct contribution of APA‐associated genetic
   variants in NSCLC risk, providing critical clues and potential targets
   for NSCLC etiology and clinical intervention.

   Keywords: alternative polyadenylation, apaQTLs, functional PRS, LYRM4,
   non‐small cell lung cancer
     __________________________________________________________________

   Compared with the non‐risk G allele of rs9606, the risk T allele of
   rs9606 decreases the binding affinity of NUDT21 for LYRM4, triggering
   3'UTR shortening that stabilizes LYRM4 mRNA and elevates its
   expression. Increased LYRM4 expression promoted malignant phenotypes of
   non‐small cell lung cancer (NSCLC) cells through modulating
   ferroptosis, supporting population findings linking rs9606[T] to
   increased higher NSCLC risk.

   graphic file with name ADVS-12-2502008-g001.jpg

1. Introduction

   Lung cancer remains one of the most common cancers and the leading
   cause of cancer‐related mortality worldwide, poses a serious threat to
   human health.^[ [46]^1 ^] Non‐small cell lung cancer (NSCLC) is the
   commonest form and accounts for the 85% of total lung cancers.^[ [47]^2
   ^] It is well known that, in addition to environmental risk factors
   (such as tobacco smoking), genetic variants also contribute to risk for
   NSCLC.^[ [48]^3 ^] Genome‐wide association studies (GWAS) have
   identified numerous genetic variants associated with NSCLC risk, but
   identifying causal variants and elucidating their functional mechanism
   remains an ongoing challenge, as GWAS data cannot directly reveal why a
   genetic variant is pathogenic.^[ [49]^4 ^] To date, most of studies
   concerning revealing mechanism of genetic variants impacting messenger
   RNA abundance (expression quantitative trait loci, eQTLs), other
   regulatory mechanisms beyond the genetic control of mRNA expression are
   underexplored.^[ [50]^5 ^]

   One such underexplored mechanism is alternative polyadenylation (APA).
   It is estimated that nearly 70% of human genes have multiple
   polyadenylation sites undergo APA. By employing different poly(A)
   sites, APA yields transcripts with varying 3′‐untranslated region (UTR)
   lengths.^[ [51]^6 ^] Since 3′UTR harbors regulatory elements for
   RNA‐binding proteins (RBPs) and miRNAs, APA can regulate gene
   expression by altering mRNA stability, translational efficiency, or
   cellular localization of proteins.^[ [52]^7 ^] It is well‐established
   that APA is strikingly dysregulated in cancer.^[ [53]^8 ^] Aberrant APA
   leading to 3′UTR shortening is a common feature of multiple types of
   cancer and result in dysregulation of oncogenes and tumor‐suppressor
   genes.^[ [54]^9 ^] Moreover, altered expression of key APA regulators
   (such as NUDT21, PCF11) often shifts APA globally and disrupts
   biological processes involved in cancer, including cell cycle,
   apoptosis, and proliferation.^[ [55]^10 ^] Despite these observations,
   our knowledge of the underlying mechanisms that induce aberrant APA is
   still limited.

   Noticeably, growing studies have established functional links between
   genetic variants and APA regulation.^[ [56]^11 ^] Genetic variants that
   impact APA process have been implicated in diseases including cancer.^[
   [57]^12 ^] Examples include a genetic variation rs78378222 changes the
   polyadenylation signal of TP53 thus leading to impaired 3′end
   processing of TP53 mRNA and further promoting oncogenesis.^[ [58]^13 ^]
   Similarly, rs1020670 at DNM1L 3′UTR locus can promote aberrant APA and
   shorten the 3′UTR of DNM1L, which stabilize DNM1L mRNA to upregulate
   its expression and further lead to increased colorectal cancer risk.^[
   [59]^11b ^] A recent work identified ≈0.4 million genetic variants
   associated with APA, which contribute substantially to the molecular
   mechanisms underlying human diseases.^[ [60]^11a ^] These findings
   highlight the importance of genetic regulation of APA as a molecular
   mechanism for disease risk. However, to date, large‐scale apaQTL
   studies conducted in NSCLC have been sparse, the mechanisms of genetic
   regulation of APA and their associations with NSCLC risk have not been
   systematically studied.

   To fill this gap, by integrating genotype and APA phenotype data from
   TCGA, we systematically investigate the genetic regulation of APA in
   NSCLC tissues. We identified 117 161 significant genetic variants that
   are associated with APA (apaQTL variants) and 2037 apaQTL‐associated
   genes. Then, we comprehensively characterized the functional properties
   of these apaQTLs and the corresponding target genes. Furthermore, we
   used whole‐genome sequencing genotype data derived from 13 835 NSCLC
   patients and 13 835 controls to assess the effect of apaQTLs on NSCLC
   risk. We generated PRS score by using apaQTL variants and known NSCLC
   risk variants, and evaluated its utility and effectiveness in
   predicting NSCLC risk in multiple cohorts. By integrating large‐scale
   population and biological experiments, we confidently identified a
   functional apaQTL variant rs9606 in LYRM4 contributes to NSCLC risk
   through genetic regulation of APA. Collectively, our findings advance
   understanding of molecular mechanism linking genetic variants to NSCLC
   risk and represents a valuable resource for identification of novel
   therapeutic targets.

2. Results

2.1. Identification and Characterization of apaQTLs in Nonsmall Cell Lung
Cancer

   As shown in Figure [61]1A, we performed a genome‐wide apaQTL analysis
   in NSCLC, using percentage of distal poly(A) site usage index (PDUI) of
   APA events, genotype and clinical characteristics data of 1010 NSCLC
   patients from TCGA. After quality control filtering (Experimental
   Section; and Figure [62]S1, Supporting Information), we included
   genetic data of 4022558 SNPs and 4243 APA events for further analysis.
   Adjusting for known and inferred covariates, we totally identified
   117 161 significant apaQTL SNPs in TCGA NSCLC tissues (FDR < 0.05,
   Figure [63]1B). To characterize genomic features of identified apaQTL
   SNPs, we generated a set of control SNPs from the genome with MAF and
   LD scores matched with apaQTL SNPs. We functionally annotated apaQTL
   SNPs and control SNPs according to categories defined by SnpEff (Figure
   [64]S2, Supporting Information). The results showed that compared with
   control SNPs, apaQTL SNPs were significantly enriched in 3′UTR [OR (95%
   CI), 3.53 (3.30–3.78); P < 0.0001] and underrepresented among
   intergenic regions [OR (95% CI), 0.377 (0.371–0.383); P < 0.0001;
   Figure [65]1C]. Consistent with previous studies,^[ [66]^11a ^] when we
   analyzed the relative position of apaQTL SNPs across their
   corresponding genes (aGenes), we found that apaQTL SNPs clustered
   around transcription end sites (Figure [67]1D).

Figure 1.

   Figure 1
   [68]Open in a new tab

   Identification and functional characterization of apaQTLs in NSCLC. A)
   An overview of study design and apaQTL analysis workflow. Integration
   APA phenotype and genotype data of 1010 NSCLC samples from TCGA, a
   genome‐wide apaQTL analysis was performed to identify APA‐associated
   variants in NSCLC. We then performed functional annotation of apaQTLs
   and the its corresponding target genes (aGenes). The biological
   function of NSCLC‐associated apaQTL variants rs9606 was further
   investigated by and population validation and experimental approach. B)
   Manhattan plots displaying apaQTL analysis result in NSCLC. The
   ‐log[10](FDR) for each variant is plotted according to chromosome
   position. FDR < 0.05 was considered statistically significant. C)
   Enrichment of apaQTL SNPs in each genomic region. P‐values were
   calculated by two‐tailed Fisher's exact test. Error Bar indicate 95%
   confidence intervals (CIs). D) Distance of apaQTLs from their target
   genes (aGenes). SNPs were plotted according to its distance to the
   aGene and statistical significance for association with APA event. TES,
   transcription end site. E) Overview of PAS motifs altered by apaQTLs in
   NSCLC. F) Enrichment of apaQTLs in PAS motifs, compared with
   non‐apaQTLs. P‐values were calculated by two‐tailed Fisher's exact
   test. Error bar, 95% CI. G) Enrichment of apaQTLs in binding sites for
   individual RBPs. Blue dashed line: P < 0.05. H) Enrichment analyses of
   apaQTLs in histone modification markers annotated by ENCODE, compared
   with non‐apaQTLs. P‐values were calculated by two‐tailed Fisher's exact
   test. Error bars, 95% CIs. I) Enrichment of apaQTLs in TF‐binding sites
   of individual TFs. Blue dashed line: P < 0.05. J) KEGG pathway
   enrichments for aGenes. K) Somatic copy‐number alteration (SCNA)
   pattern in NSCLC tumor samples. The top 20 most frequently mutated
   genes are shown in each sample, and the mutation subtypes are indicated
   by color. L) Significant aGene‐drug pairs identified in Genomics of
   Drug Sensitivity in Cancer (GDSC) dataset. Red and blue bars denote
   positive and negative associations, respectively.

2.2. apaQTLs are Enriched in Poly(A) Motifs, APA‐Associated RBP Binding
Sites, and Functional Elements

   Next, we explored the potential mechanisms through which apaQTLs
   regulate APA events. Since previous studies have shown that apaQTLs may
   alter the polyadenylation signal (PAS),^[ [69]^14 ^] we examined the
   prevalence of PAS‐altering apaQTLs in NSCLC and extracted significant
   apaQTL variants within 50 bp upstream of annotated poly(A) sites from
   Poly(A) database.^[ [70]^15 ^] By conducting motif searches based on 15
   common PAS motif, we identified 94 apaQTL variants that may either
   alter canonical PAS or change other PAS motifs in NSCLC
   (Figure [71]1E). As expected, we also found apaQTLs were significant
   enriched among poly(A) motifs when compared with control SNPs [OR (95%
   CI), 1.21 (1.12–1.31); P < 0.0001; Figure [72]1F]. Taken together,
   these findings suggest that a fraction of apaQTLs affect APA events by
   altering poly(A) motifs.

   The alteration of poly(A) motifs only explained a fraction of apaQTLs,
   implying that other mechanisms may also play a role. Given the critical
   role of RNA binding proteins (RBPs) in regulating APA process, we
   reasoned that if apaQTLs affect APA by impacting RBP binding sites,
   which was important for transcript 3′‐end processing. To test this
   hypothesis, we further performed enrichment analysis of apaQTLs in RBP
   binding sites by analyzing crosslinking and immunoprecipitation
   followed by sequencing (CLIP‐seq) data from the ENCODE.^[ [73]^16 ^] We
   tested apaQTLs enrichment for each RBP, and found that 55 RBPs
   exhibited significant enrichment among the 74 examined RBPs
   (Figure [74]1G). Among them, several APA regulators and splicing
   factors were identified, and a master APA mediator NUDT21 was also
   identified.^[ [75]^10a ^]

   Recent studies suggest that transcription can influence gene APA
   process.^[ [76]^17 ^] We performed enrichment analyses by using
   ChIP‐seq data from ENCODE portal. We found that apaQTLs were
   significantly enriched in histone marks of enhancers (H3K27ac,
   H3K4me1), promoters (H3K4me3), and active transcription (H3K36me3,
   H4K20me1, H3K9ac, H3K4me2), whereas they were underrepresented among
   repressive epigenetic marker (H3K27me3) (Figure [77]1H). We further
   performed enrichment analysis of apaQTLs within 59 TF binding sites.
   The result showed that 18 TFs significantly enriched with apaQTLs
   (Figure [78]1I). Interestingly, the apaQTL enrichment with the most
   significant P values correspond to TFs involved in APA regulation
   process (CTCF gene).^[ [79]^18 ^] Overall, the data suggest that
   apaQTLs may have a unique epigenomic regulation mechanism and that
   transcription and polyadenylation are tightly linked.

2.3. apaQTLs Targeted Genes Play Crucial Roles in Cancer Development and
Therapy

   According to above apaQTL analysis, we identified a total of 2037 genes
   and 2109 associated APA events. To further investigate the functional
   roles of apaQTLs targeted genes (aGenes) in cancer development and
   therapy, we first performed pathway enrichment analysis of aGenes.
   Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis showed that
   aGenes were enriched in cancer‐related processes, including P53
   pathway, oxidative phosphorylation, cell apoptosis, and DNA repair
   pathway (Figure [80]1J). As somatic copy‐number alteration (SCNA) may
   disrupt expression levels of cancer genes and influence drug
   sensitivity, we next assessed SCNA in NSCLC based on the data from
   TCGA. By integrating aGenes that ranked top 20 among somatic mutation
   genes for analysis, we observed that aGenes were subject to deleterious
   missense mutations in majority of NSCLC patients (79.04%)
   (Figure [81]1K). Considering the emerging role of immune cell in
   modulating cancer development, we further assessed their association
   with aGene expression, in which immune cell infiltration data were
   quantified using the TIMER and EPIC algorithms. The results showed that
   several aGenes are associated with specific immune cell subsets,
   including CD8^+ T cell, CD4^+ T cell, and macrophage cell (Figures
   [82]S3 and [83]S4, Supporting Information).

   To understand the effects of aGenes on drug response, we performed a
   systematic pharmacogenomic analyses by using Genomics of Drug
   Sensitivity in Cancer (GDSC) data,^[ [84]^19 ^] and identified a total
   of 34 902 significant associated aGene‐drug pairs (Figure [85]1L). We
   focused on drugs highly associated with aGenes and their pharmaceutical
   target. Noticeably, the top drug category is that drugs targeting
   ERK/MAPK signaling pathway including Trametinib and Dabrafenib (Figure
   [86]S5, Supporting Information), which are approved by FDA for use in
   combination for the treatment of NSCLC.^[ [87]^20 ^] Collectively, the
   results of analyses above suggest the unique role of apaQTLs targeted
   genes in cancer development and treatment.

2.4. Distinct Contribution of apaQTLs to NSCLC Risk

   As genetic variation may contribute to disease risk through APA, we
   next tested whether the identified apaQTL SNPs contribute to NSCLC
   risk. By integrating GWAS summary statistics from previous work
   (Methods, Supporting Information), we evaluated the enrichment of
   apaQTLs within NSCLC GWAS loci. Of note, apaQTL SNPs were significantly
   enriched in NSCLC GWAS loci compared with control SNPs [OR (95% CI),
   2.63 (2.57–2.70); P < 0.0001; Figure [88]2A]. Additionally, by
   conducting colocalization analysis, we totally found 2058 apaQTLs
   colocalizing with 12 NSCLC GWAS signals. As an example, apaQTL SNP
   rs8042489 was identified to colocalize with a NSCLC GWAS signal in the
   15q25.1 loci (Figure [89]S6A, Supporting Information), which was
   significantly associated with 3′UTR usage in TBC1D2B (Figure [90]S6B,
   Supporting Information). Recent studies demonstrated that TBC1D2B
   functions as a Rab22‐binding protein and promotes lung cancer
   oncogenesis.^[ [91]^21 ^] Moreover, rs8042489 conferred a significant
   genetic predisposition to NSCLC (Figure [92]S6C, Supporting
   Information). These results suggest that apaQTL variant rs8042489 might
   affect TBC1D2B APA process, thus contributing to NSCLC risk. To further
   understand the contribution of apaQTLs to NSCLC risk, eQTLs, a
   fundamental mechanism of gene regulation, were included for comparison.
   To conduct comparison between apaQTLs and eQTLs, we downloaded NSCLC
   eQTL data from PancanQTL database
   ([93]http://bioinfo.life.hust.edu.cn/PancanQTL/).^[ [94]^22 ^] Then, we
   calculated the overlap proportion of each two of them. The observed
   overlap between apaQTLs and eQTLs was moderate, with only 20.75% of
   apaQTL SNPs overlapping with eQTL SNPs (Figure [95]2B). These results
   are consistent with observations from other studies,^[ [96]^11 ,
   [97]^23 ^] suggesting that apaQTLs exert their functional roles
   independent of regulation of gene expression. Furthermore, both apaQTLs
   and eQTLs showed an excess of low P values in NSCLC GWASs, and apaQTLs
   appeared to have larger magnitude effects than eQTLs (Figure [98]2C).
   Consistently, partitioned heritability estimation using linkage
   disequilibrium (LD) score regression revealed higher enrichment of
   NSCLC heritability in apaQTLs than in eQTLs (Figure [99]2D). We also
   validate these results in the Finnish biobank (FinnGen) and found that
   apaQTLs account for 5.23% of cancer heritability, which is higher than
   that of eQTLs (4.15%) (Figure [100]S7, Supporting Information).
   Additionally, we removed the apaQTLs that overlapped with eQTLs and
   subsequently conducted QQ plot analysis and heritability estimation.
   The results indicate that apaQTLs remain a significant ability in
   cancer heritability estimation (Figure [101]S8, Supporting
   Information). Collectively, our findings suggest that most apaQTLs are
   distinct from eQTLs, demonstrating the distinct contribution of apaQTLs
   to NSCLC risk.

Figure 2.

   Figure 2
   [102]Open in a new tab

   Distinct role of apaQTLs in interpreting NSCLC GWAS variants and risk
   prediction. A) Enrichment of apaQTLs among GWAS loci by the two‐tailed
   Fisher's exact test comparing apaQTLs to control non‐apaQTLs. Error
   bar, 95% CI. B) Overlap between apaQTLs and eQTLs in NSCLC. C)
   Quantile–quantile (QQ) plot showing the P‐values of NSCLC GWAS SNPs,
   which were binary annotated by apaQTLs and eQTLs with FDR < 0.05. D)
   Proportion of GWAS trait heritability explained by apaQTLs and eQTLs.
   The bars indicate standard errors. E) Overview of PRS models
   development, assessment, and testing strategy. F–H) Odds ratios (ORs)
   for NSCLC risk of each PRS group by comparing with those in bottom 20%
   under three PRS models in UKB case‐control set F), OncoArray Consortium
   case‐control set G), and PLCO case‐control set H). PRS[apaQTL]
   indicates the PRS derived from independent apaQTLs, PRS[GWAS] indicates
   the PRS derived from known NSCLC GWAS variants, PRS[Combined] indicates
   the PRS derived from combination of apaQTLs and NSCLC GWAS variants.
   Error bars, 95% CIs. I) Cumulative incidences of top 33% and bottom 33%
   of PRS[apaQTL] in UK Biobank cohort. Hazard ratios and the 95%
   confidence intervals derived from Cox proportional hazards regression
   model. J) Cumulative incidences of top 33% and bottom 33% of PRS[GWAS]
   in UK Biobank cohort. Hazard ratios and the 95% confidence intervals
   derived from Cox proportional hazards regression model. K) Cumulative
   incidences of top 33% and bottom 33% of PRS[Combined] in UK Biobank
   cohort. Hazard ratios and the 95% confidence intervals derived from Cox
   proportional hazards regression model.

2.5. Polygenic Risk Scores from apaQTLs Improve Risk Prediction for NSCLC

   Previous studies have shown that polygenic risk score (PRS) derived
   from functional genetic variants can serve as efficient tool for
   disease risk assessment. To investigate the clinical utility of NSCLC
   apaQTLs in predicting NSCLC risk, we constructed PRS models by using
   genetic variants from three different sets, including 366 independent
   apaQTL variants after LD clumping (r ^2>0.2, PRS[apaQTL]), 120 known
   NSCLC risk variants (PRS[GWAS]), and 486 independent variants
   (PRS[Combined]) from a combination of apaQTL and GWAS variants
   (Figure [103]2E). We then compared the predictive performance of each
   of these PRSs in three independent NSCLC study groups from UK biobank,
   OncoArray Consortium Lung Cancer Studies, and PLCO. Initially, we
   performed the NSCLC risk stratification in UK biobank dataset,
   including 3622 cases and 14 488 controls. We found that apaQTLs‐derived
   PRS exhibited a moderate capability of NSCLC risk prediction across
   each grade of genetic risk, with high genetic risk (in the top 20% PRS)
   had a 1.20‐fold increased NSCLC risk of compared to those with low
   genetic risk (in the bottom 10% PRS; P [trend] = 5.49×10^−3,
   Figure [104]2F). Importantly, PRS[combined] outperformed the other two
   PRS models by the values of OR (OR = 1.47, 95%CI = 1.32–1.63; P [trend]
   = 5.64×10^−12, Figure [105]2F), indicating that apaQTL variants can be
   used to improve NSCLC risk stratification. A similar trend was observed
   in two external validation sets from OncoArray (OR [95%CI] =
   1.35[1.25–1.45], P [trend] = 1.13×10^−7, Figure [106]2G) and PLCO (OR
   [95%CI] = 1.54[1.32–1.81], P [trend] = 3.14×10^−8, Figure [107]2H).

   We further evaluated its effectiveness in risk stratification in a
   prospective UK biobank cohort consisting of 418 989 individuals, of
   whom 3777 developed NSCLC during follow‐up. The apparently separate
   predictive cumulative incidence curves were observed in individuals at
   low (bottom 33% of PRS) and high genetic risk (top 33% of PRS;
   Figure [108]2I–K). Consistent with the findings in case‐control
   dataset, individuals with high apaQTLs‐derived PRS had higher risk of
   incident NSCLC than individuals with low genetic risk (HR = 1.22, 95%
   CI = 1.13–1.33, P = 9.97×10^−7, Figure [109]2I). Of note, PRS[Combined]
   still showed the optimal discriminatory ability to stratify individual
   NSCLC risk (Figure [110]2K). These results highlight the potential
   clinical utility of PRS[apaQTL] to improve NSCLC risk prediction by
   combining with traditional GWAS‐derived PRS. Taken together, our
   findings suggest that integration of apaQTL variants with traditional
   GWAS‐derived PRS improves genetic risk prediction for NSCLC.

2.6. apaQTL Variant rs9606 is Significantly Associated with NSCLC Risk

   To identify the casual apaQTL variants that contribute to NSCLC risk by
   modulating APA events, we integrated 117 161 apaQTLs (FDR < 0.05) with
   NSCLC GWAS data from OncoArray Consortium lung cancer studies (13 835
   cases and 13 835 controls, Figure [111]3A). The demographic
   characteristics of the GWAS dataset were summarized in Table [112]S1
   (Supporting Information). To identify more functional apaQTL variants,
   we applied a significance level of P < 1×10^−6 which is widely used in
   previous studies.^[ [113]^24 ^] A total of 1494 apaQTLs were identified
   to be associated with NSCLC risk (Figure [114]3B). Among these risk
   apaQTL variants, rs9609 showed the most significant association with
   APA events (Figure [115]3C). rs9606 was located in the 3′UTR of LYRM4
   and showed strong association with alternative 3′UTR of LYRM4 mRNA,
   implying its potential role in APA regulation. T allele of rs9606 was
   also revealed to be associated with increased risk of NSCLC in
   OncoArray NSCLC samples (OR = 1.09; 95%CI, 1.06–1.13; P = 8.77×10^−8;
   Table [116]1 ). Thus, we focused on apaQTL rs9606 for further
   investigation.

Figure 3.

   Figure 3
   [117]Open in a new tab

   The LYRM4 rs9606 variant as the putative functional apaQTL link
   alternative polyadenylation to NSCLC risk. A) Flowchart of
   identification and validation of functional risk apaQTLs in NSCLC. B)
   Manhattan plot shows ‐log[10](P value) for associations of apaQTLs with
   NSCLC risk estimated by OncoArray GWAS data consisting of 13 835 cases
   and 13 835 controls. The red line denotes the threshold of significance
   (P < 1×10^−6). C) Manhattan plot depicted the ‐log[10](FDR value) of
   1494 apaQTLs that associated with NSCLC risk. The most significant
   apaQTL LYRM4 rs9606 was annotated with red arrow. D) Box plot showing
   that apaQTL rs9606 was significantly associated with LYRM4 3′UTR usage
   for each genotype in TCGA NSCLC samples. P value was calculated by
   One‐Way ANOVA. E) Schematic depicts LYRM4 with short and long 3′UTR. F)
   3′‐RACE were performed to determine the effect of LYRM4 rs9606
   genotypes on LYRM4 3′UTR usage in A549 and H1299 cells. G) Sanger
   sequencing confirmed the precise polyadenylation sites within LYRM4
   transcript. H) qRT‐PCR validated the effect of LYRM4 rs9606 genotypes
   on LYRM4 3′UTR usage in A549 and H1299 cells. I,J) The LYRM4 mRNA I)
   and protein expression J) levels were detected by qRT‐PCR and western
   blotting in NSCLC cells overexpressing control vector, LYRM4[T], and
   LYRM4[G] vector. K) The effect of LYRM4 rs9606 genotypes on LYRM4 mRNA
   stability were determined by qRT‐PCR in A549 and H1299 cells after
   treating with actinomycin D (5 µg mL^−1) at indicated time points. For
   panels (H–K), the data were shown as mean ± S.D. from three independent
   experiments. P values were calculated using Student's t‐test. **, P
   < 0.01; ***, P < 0.001.

Table 1.

   The association of rs9606 with NSCLC risk in European and Chinese
   case‐control set (N = 68 358).
   OncoArray Consortium set [N = 27 670] UK biobank set [N = 18 110] PLCO
   set [N = 8560] Combined set [N = 54 340]
   Genotypes/ Case/Control OR (95%CI)[118] ^a) P[119] ^b) Case/Control OR
   (95%CI)[120] ^a) P[121] ^b) Case/Control OR (95%CI)[122] ^a) P[123] ^b)
   Case/Control OR (95%CI)[124] ^a) P[125] ^b) MAF
   rs9606
   GG 3670/3937 1.00(Ref) 922/4157 1.00(Ref) 650/2795 1.00(Ref) 5242/10889
   1.00(Ref) 0.465
   GT 6431/6558 1.05(0.99–1.11) 8.60×10^−2 1767/7091 1.12(1.03–1.23)
   1.12×10^−2 771/3250 1.02(0.91–1.15) 7.22×10^−1 8969/16899
   1.10(1.05–1.15) 1.19×10^−5
   TT 3734/3340 1.20(1.12–1.28) 6.95×10^−8 933/3240 1.30(1.17–1.44)
   6.11×10^−7 291/803 1.56(1.33–1.83) 3.55×10^−8 4958/7383 1.37(1.30–1.43)
   3.32×10^−35
   Dominant 1.10(1.04–1.16) 4.58×10^−4 1.18(1.08–1.28) 1.28×10^−4
   1.13(1.01–1.26) 2.97×10^−2 1.18(1.13–1.23) 1.80×10^−16
   Recessive 1.16(1.10–1.22) 9.75×10^−8 1.20(1.11–1.31) 1.50×10^−5
   1.55(1.34–1.79) 4.98×10^−9 1.29(1.24–1.34) 1.32×10^−32
   Additive 1.09(1.06–1.13) 8.77×10^−8 1.14(1.08–1.20) 6.65×10^−7
   1.20(1.11–1.29) 7.19×10^−6 1.17(1.14–1.19) 6.09×10^−34
   Genotypes/ rs9606 Independent I set (N = 3400) Independent II set (N =
   5618) Independent III set (N = 5000) Combined set (N = 14018)
   Case/Control OR (95%CI)[126] ^a) P[127] ^c) Case/Control OR (95%CI)^a)
   P[128] ^c) Case/Control OR(95%CI)^a) P[129] ^c) Case/Control
   OR(95%CI)^a) P[130] ^c) MAF
   GG 1024/1053 1.00(Ref) 1675/1752 1.00(Ref) 1481/1575 1.00(Ref)
   4180/4380 1.00(Ref) 0.205
   GT 630/617 1.37(1.12–1.67) 2.03×10^−3 1068/1006 1.28(1.14–1.44)
   3.93×10^−5 960/884 1.20(1.06–1.37) 4.89×10^−3 2658/2507 1.18(1.10–1.27)
   3.79×10^−6
   TT 46/30 1.64(1.01–2.67) 4.52×10^−2 66/51 1.50(1.03–2.18) 3.66×10^−2
   59/41 1.62(1.07–2.46) 2.29×10^−2 171/122 1.46(1.15–1.86) 1.79×10^−3
   Dominant 1.39(1.14–1.69) 9.59×10^−4 1.29(1.15–1.45) 1.46×10^−5
   1.22(1.07–1.38) 2.24×10^−3 1.20(1.12–1.28) 5.27×10^−7
   Recessive 1.40(0.87–2.25) 1.66×10^−1 1.36(0.93–1.97) 1.12×10^−1
   1.47(0.98–2.22) 6.51×10^−2 1.02(0.96–1.09) 5.03×10^−1
   Additive 1.33(1.13–1.58) 8.14×10^−4 1.25(1.14–1.41) 1.16×10^−5
   1.22(1.09–1.40) 8.65×10^−4 1.19(1.12–1.27) 1.21×10^−7
   [131]Open in a new tab
   ^^a)

   ORs and 95% CIs calculation were conducted under assumption that
   variant alleles were risk alleles;
   ^^b)

   All P values were calculated by unconditional logistic regression model
   after adjusting for gender and age group;
   ^^c)

   All P values were calculated by unconditional logistic regression model
   after adjusting for gender, age group, smoking status and drinking
   status; Abbreviations: Ref, reference; OR, odds ratio; CI, confidence
   interval.

   We adopted NSCLC genotype data from the UK biobank and PLCO cohort to
   strengthen the population epidemiological findings in OncoArray NSCLC
   cohorts. The demographic characteristics were listed in Tables [132]S2
   and [133]S3 (Supporting Information). After combined analysis, the
   result showed that the association between apaQTL and NSCLC risk
   remained significant, with the T allele having an OR for NSCLC risk of
   1.17 (95% CI, 1.14–1.19; P = 6.09×10^−34) in the addictive model
   (Table [134]1). Furthermore, we validate the association between rs9606
   and NSCLC risk in three independent Chinese cohorts. A total of 1700
   NSCLC cases and 1700 healthy controls were included in phase I, 2809
   cases and 2809 controls were included in phase II, and 2500 cases and
   2500 controls were included in phase III. The demographic
   characteristics of the study subjects were summarized in Table [135]S4
   (Supporting Information). Significant associations were verified in
   both phases. Compared with rs9606 GG genotype, individuals with TT
   genotype had higher risk of NSCLC. Combined analysis of two phases
   showed that rs9606 TT genotype was still associated with increased risk
   of NSCLC, with odds ratio of 1.19 (95% CI, 1.12–1.27; P = 1.21×10^−7)
   under the addictive model (Table [136]1).

2.7. Risk T allele of rs9606 Facilitates LYRM4 Expression by Regulating APA
Process

   To examine the functional significance of apaQTL variant rs9606
   contributing to APA process in NSCLC, we first analyzed the association
   between rs9606 genotypes and the alternative polyadenylation of LYRM4.
   Our analysis revealed that apaQTL variant rs9606 was significantly
   associated with alternative 3′UTR of LYRM4 mRNA, implying its potential
   role in APA regulation (Figure [137]3D). To validate the effect of
   rs9606 on APA, we performed qualitative 3′Rapid Amplification of cDNA
   Ends (3′RACE) and quantitative polymerase chain reaction (qPCR)
   analyses of changes of LYRM4 3′UTR length in NSCLC cell lines
   (Figure [138]3E). In both A549 and H1299 cell lines, we found that
   overexpression of LYRM4 containing risk T allele of rs9606 (LYRM4[T])
   increased level of LYRM4 short 3′UTR isoform, compared with that of
   LYRM4 containing the nonrisk G allele of rs9606 (LYRM4[G]) and control
   vector (Figure [139]3F–H). These results suggest that risk T allele of
   rs9606 causes LYRM4 3′UTR shortening by APA.

   As alterations of 3′UTR length by APA may deregulate expression of
   oncogenes, we assessed if risk T allele of rs9606 can modulate LYRM4
   expression. As expected, we found that overexpression of rs9606‐T
   allele resulted in increased mRNA and protein expression levels of
   LYRM4, compared with that of rs9606‐G allele and control vector
   (Figure [140]3I,J). To further explore the mechanism of rs9606 modulate
   LYRM4 expression, we first assessed LYRM4 mRNA decay rate at multiple
   indicated timepoints following Actinomycin D treatment. qRT‐PCR of
   LYRM4 mRNA found that overexpression of rs9606‐T allele significantly
   increased the stability of LYRM4 mRNA (Figure [141]3K). Moreover, we
   determined the direct effect of changes of LYRM4 3′UTR length on its
   mRNA stability. The result showed that overexpression of the LYRM4 with
   short 3′UTR exhibited a significantly slower LYRM4 mRNA decay rate than
   that overexpression of LYRM4 with long 3′UTR (Figure [142]S9,
   Supporting Information). Then, we performed dual luciferase reporter
   assays using reporter plasmids with short or long 3′UTR of LYRM4. We
   mutated the proximal poly(A) signal sequence of the plasmid containing
   long LYRM4 3′UTR to ensure the only dPAS is used (Figure [143]S10A,
   Supporting Information). Notably, overexpression of plasmids with short
   3′UTR showed significantly higher luciferase activity than those with
   long 3′UTR, suggesting 3′UTR shortening affect translation process
   (Figure [144]S10B, Supporting Information). To further investigate
   whether the rs9606 can influence the LYRM4 expression by directly
   regulating gene transcription, in addition to the APA process, we
   constructed luciferase vectors containing either the rs9606‐G or
   rs9606‐T alleles and transfected them into A549 and H1299 cell lines.
   Notably, we found that transfection of the vector containing the
   rs9606‐T allele did not affect luciferase activities in the cells
   compared to the vector containing the rs9606‐G allele (Figure [145]S11,
   Supporting Information). Taken together, these findings demonstrate
   that apaQTL rs9606 function as a causal variant modulating alternative
   polyadenylation of LYRM4, leading to dysregulation of LYRM4 by
   affecting mRNA stability and translation activity.

2.8. RNA Binding Protein NUDT21 Preferentially Bind to Nonrisk G allele of
rs9606 at LYRM4

   Having demonstrated apaQTL rs9606 regulates LYRM4 expression through
   driving aberrant APA process, we next sought to investigate the
   underlying mechanism. Given that apaQTL variants are thought to
   modulate APA process by altering RBP binding, we examined whether
   rs9606 directly alters RBP binding. RNA pull‐down plus mass
   spectrometry (MS) assays revealed several candidate RBPs involved in
   APA process (Figure [146]4A), in which NUDT21 had been reported as a
   critical APA regulator.^[ [147]^10 , [148]^25 ^] Further motif analysis
   using online CLIP‐Seq database (ENCORI platform) showed that rs9606
   overlaps with predicted binding motif of NUDT21, and NUDT21
   preferentially bind to nonrisk G allele of rs9606 (Figure [149]4B).
   Moreover, NUDT21 expression was lower in NSCLC tissues compared to the
   adjacent normal tissues, and NUDT21 expression was negatively
   correlated with LYRM4 in our own NSCLC tissues (Figure [150]4C,D).
   Furthermore, we investigated the effect of NUDT21 on phenotypes of
   NSCLC cell and found that knockdown of NUDT21 promotes NSCLC cell
   proliferation and migration (Figure [151]S12, Supporting Information).
   Therefore, we selected NUDT21 as candidate RBP for further
   investigation.

Figure 4.

   Figure 4
   [152]Open in a new tab

   RBP NUDT21 mediates the process of LYRM4 rs9606 variant regulating
   LYRM4 APA process. A) Mass spectrometry screening for RNA binding
   proteins (RBPs) that differentially bound to rs9606‐G and rs9606‐T
   probes. Horizontal line indicates the P < 0.05 cutoff. Vertical lines
   mark the fold change > 1.1 or < 0.9. B) Predicted preferential binding
   of NUDT21 to nonrisk allele G of rs9606. C) The expression of NUDT21 in
   our own NSCLC tissues and paired adjacent normal tissues. P‐value was
   calculated by Wilcoxon rank‐sum test. D) The correlation between NUDT21
   expression and LYRM4 expression in our own NSCLC tissues. P‐value and
   r‐value were calculated by Spearman correlation analysis. E) The
   candidate RBP NUDT21 identified from proteomic screening was confirmed
   by RNA pull down followed by western blot analysis. F) RIP assay
   followed by qRT‐PCR analysis suggested the preferential binding of
   NUDT21 to the LYRM4 rs9606[G] allele. Results were normalized to input
   from three independent experiments. IgG used as negative control here.
   G) NSCLC cells were seeded on plates after transfection with siRNAs
   targeting NUDT21 or the siControl. Then the cells were transfected with
   LYRM4[G], LYRM4[T], or the control vector. The difference of LYRM4
   3′UTR usage between LYRM4 rs9606 genotypes upon NUDT21 knockdown were
   detected by 3′‐RACE assays. H) Quantification of long and short LYRM4
   3′UTRs by qRT‐PCR in NSCLC cells. Data were presented as the mean ± SD
   from three independent experiments. **P < 0.01 were calculated using
   Student's t‐test. I,J) The effects of NUDT21 knockdown on LYRM4
   expression were detected by qRT‐PCR I) and western blot assays J) in
   cells transfected with LYRM4[G], LYRM4[T], or the control vector. Data
   were presented as the mean ± SD from three independent experiments.
   ***P < 0.001 were calculated using Student's t‐test. K) The effects of
   NUDT21 knockdown on LYRM4 mRNA stability were detected by qRT‐PCR in
   NSCLC cells after actinomycin D (5 µg mL^−1) treatment. Data were
   presented as the mean ± SD from three independent experiments. ***P
   < 0.001 were calculated using Student's t‐test.

   To experimentally verify that rs9606 influences binding affinity of
   NUDT21 for LYRM4, we performed RNA pull down assays using RNA probes
   encompassing risk T allele or nonrisk G allele of rs9606. We found
   NUDT21 was preferentially bound to nonrisk G allele of rs9606 at LYRM4
   (LYRM4[G]), which was further validated by RNA immunoprecipitation
   (RIP) assay (Figure [153]4E,F). We next examined the regulatory effects
   of NUDT21 on LYRM4 APA process through 3′RACE assay. After
   siRNA‐mediated knockdown of NUDT21, we observed significant 3′UTR
   shortening of LYRM4 transcript and the differences in percentage of
   LYRM4 short 3′UTR isoform between LYRM4[T] and LYRM4[G] groups were
   attenuated (Figure [154]4G,H). Furthermore, the differences in LYRM4
   mRNA and protein expression levels between LYRM4[T] and LYRM4[G] were
   significantly attenuated upon NUDT21 knockdown (Figure [155]4I,J). In
   parallel, we measured LYRM4 mRNA stability, finding that knockdown of
   NUDT21 stabilized LYRM4 mRNA and diminished differences of half‐lives
   between LYRM4[T] and LYRM4[G] groups (Figure [156]4K). Collectively,
   these results suggested that NUDT21 preferentially bind to nonrisk G
   allele of rs9606 to regulate APA process of LYRM4, leading to 3′UTR
   lengthening and decreased LYRM4 expression by destabilized its mRNA.

2.9. Risk T allele at LYRM4 Promotes Malignant Phenotypes of NSCLC Cells
through Suppressing Ferroptosis

   To further explore biological function of LYRM4 in NSCLC, we first
   compared the mRNA expression of LYRM4 in our own NSCLC and adjacent
   normal tissues, and observed that LYRM4 was significantly highly
   expressed in NSCLC tissues (Figure [157]5A). Then, we overexpressed
   LYRM4 with different rs9606 alleles and explore its effect on tumor
   phenotypes in vitro and in vivo. We found that overexpression of LYRM4
   promoted cell proliferation, migration, and colony‐forming abilities of
   NSCLC cells (Figure [158]5B–F), suggesting the oncogenic role of LYRM4
   in NSCLC. Notably, compared with LYRM4 containing nonrisk G allele of
   rs9606 (LYRM4[G]), LYRM4 containing risk T allele of rs9606(LYRM4[T])
   showed stronger effects on promoting cell proliferation, migration, and
   colony‐forming abilities (Figure [159]5B–F). Consistent with our in
   vitro findings, we found that LYRM4[T] overexpression facilitated tumor
   formation in vivo, compared with LYRM4[G] overexpression and control
   group (Figure [160]5G,H). Taken together, these in vitro and in vivo
   results suggest that LYRM4 may acts as an oncogene in NSCLC
   development. Overexpression of LYRM4[T] more strongly promotes the
   malignant phenotypes of NSCLC cancer cells compared to LYRM4[G].

Figure 5.

   Figure 5
   [161]Open in a new tab

   Risk T allele at LYRM4 drives NSCLC cell proliferation and progression
   in vitro and in vivo. A) The expression of LYRM4 in our own NSCLC
   tissues and paired adjacent normal tissues. P‐value was calculated by
   Wilcoxon rank‐sum test. B) The effect of LYRM4[T] and LYRM4[G]
   overexpression on in vitro cell proliferation determined by CCK‐8
   assays. Results were shown as the mean ± SD from three independent
   experiments. C,D) The effect of LYRM4[T] and LYRM4[G] overexpression on
   colony formation ability of NSCLC cells in vitro. Representative
   pictures C) and quantification D) of colony formation ability were
   shown. Results were shown as the mean ± SD from three independent
   experiments. E,F) The effect of LYRM4[T] and LYRM4[G] overexpression on
   migration ability of NSCLC cells in vitro. Representative images E) and
   quantification F) of cell migration ability were shown. Results were
   shown as the mean ± SD from three independent experiments. G,H)
   Representative images, growth curves, and tumor weight of xenografts
   injected with NSCLC cells transfected with LYRM4[G], LYRM4[T], or the
   control vector. Results were shown as mean ± SD from 5 mice of each
   group. **P < 0.01 and ***P < 0.001were calculated using Student's
   t‐test.

   Previous studies have demonstrated that LYRM4 (also known as ISD11) is
   crucial for the function of iron–sulfur (Fe–S) cluster assembly
   complex.^[ [162]^26 ^] Since Fe–S cluster assembly complex plays a key
   role in Fe–S cluster biogenesis and ferroptosis, we wondered whether
   LYRM4 promotes cell proliferation through modulating ferroptosis. To
   test this hypothesis, we performed immunoprecipitation experiments
   followed by mass spectrometry (MS) analysis. We identified NFS1 as a
   potential interactor of LYRM4, and observed an increased binding of
   NFS1 in LYRM4[T] overexpression group compared to LYRM4[G]
   overexpression group (Figure [163]6A,B). Coimmunoprecipitation followed
   by western blotting analysis further confirmed the direct interaction
   between LYRM4 and NFS1, and showed LYRM4[T] significantly increased the
   interaction compared to LYRM4[G] (Figure [164]6C). As the essential
   role of NFS1‐LYRM4 complex in the process of Fe–S cluster biogenesis
   has been reported,^[ [165]^27 ^] it is plausible to postulate that
   LYRM4 is functionally linked to ferroptotic activity. We then
   overexpressed LYRM4 with different rs9606 alleles and found that LYRM4
   overexpression significantly inhibited Erastin‐induced ferroptosis of
   NSCLC cells, which was confirmed by increased cell viability, decreased
   accumulation of ferroptosis biomarkers Fe^2+, MDA, cellular ROS, lipid
   ROS, and increased GSH level (Figure [166]6D–I; and Figure [167]S13A–F,
   Supporting Information). Compared with LYRM4[G] overexpression and
   control group, LYRM4[T] overexpression group was more resistant to
   Erastin‐induced ferroptosis (Figure [168]6D–I; and Figure [169]S13A–F,
   Supporting Information). Morphological change in cells and organelles
   is another indicator of ferroptosis. Consistently, transmission
   electron microscopy revealed significant shrunken mitochondria and
   increased membrane density in NSCLC cells after erastin treatment
   (Figure [170]6J; and Figure [171]S13G, Supporting Information).
   Overexpression of LYRM4[T] partly alleviated these erastin‐induced
   morphological changes in NSCLC cells, compared with control and
   LYRM4[G] overexpression group (Figure [172]6J; and Figure [173]S13G,
   Supporting Information). In summary, these findings indicated that
   LYRM4[T] overexpression confers NSCLC cells with stronger resistance to
   ferroptosis compared to LYRM4[G] and LYRM4 act as an oncogene promoting
   the malignant phenotypes of NSCLC cells through suppressing
   ferroptosis.

Figure 6.

   Figure 6
   [174]Open in a new tab

   Risk T allele at LYRM4 suppressing ferroptosis activity to promote
   malignant phenotypes of NSCLC cells. A) Schematic of LYRM4 interactor
   identification through proteomic screening by mass spectrometry in A549
   cells. This figure was created with BioRender ([175]www.biorender.com).
   B) Immunoprecipitation followed by proteomic screening showed increased
   binding of NFS1 in LYRM4[T] overexpression group compared to LYRM4[G]
   overexpression group. Horizontal line indicates the P < 0.05 cutoff.
   Vertical lines mark the fold change > 1.1 or < 0.9. C)
   Coimmunoprecipitation followed by western blotting analysis confirmed
   the interaction between NFS1 and LYRM4 isoforms. D–G) Cell viability
   D), Fe^2+ levels E), MDA levels F), and GSH levels G) were measured.
   A549 cells were transfected LYRM4[T], LYRM4[G], and control vector.
   After transfection, cells were treated with DMSO or erastin (10 µm) for
   24 h. Results were shown as the mean ± SD from three independent
   experiments. H,I) Intracellular ROS and lipid ROS levels were detected
   using flow cytometry with DCFH‐DA and BODIPY staining, respectively.
   A549 cells were transfected with indicated DNA constructs. After
   transfection, cells were treated with DMSO or erastin (10 µm) for 24 h.
   Representative images H) and quantitative static results I) were shown.
   Results were presented as the mean ± SD from three independent
   experiments. J) Morphological changes of mitochondria were detected by
   transmission electron microscopy. A549 cells were transfected with
   indicated DNA constructs, followed by treatment with DMSO or erastin
   (10 µm) for 24 h. P values were calculated using Student's t‐test. *, P
   < 0.05; **, P < 0.01; ***, P < 0.001.

3. Discussion

   In the present study, we performed a large‐scale apaQTL analysis in
   1010 NSCLC patients from TCGA. By analyzing genotype and APA events
   data from TCGA, we identified a total of 117 161 apaQTL SNPs and 2109
   corresponding targeted genes in NSCLC. Further integrated analysis of
   apaQTLs with GWAS suggested the role of APA as an important link from
   genetic variation to NSCLC risk. Of note, we proved that integration of
   apaQTL variants with traditional GWAS‐derived PRS improves risk
   prediction for NSCLC in multiple independent cohorts. Finally, through
   large‐scale population study and biological experiments, we identified
   a causal apaQTL variant rs9606 that was significantly associated with
   NSCLC risk and LYRM4 APA process. Mechanistically, risk T allele of
   rs9606 facilitate alternative polyadenylation of LYRM4 and increase
   usage of LYRM4 proximal poly(A) sites mediated by NUDT21, leading to
   shortening of the LYRM4 3′UTR and increased expression of LYRM4. The
   increased expression of LYRM4 promoted malignant phenotypes of NSCLC
   cells via ferroptosis inhibition, which could explain why rs9606‐T
   allele contribute to increased risk of NSCLC. Collectively, this study
   advances our knowledge of the relationships between genetic variants
   and alternative polyadenylation, therefore providing new insights into
   regulatory mechanism underlying NSCLC etiology (Figure [176]7 ).

Figure 7.

   Figure 7
   [177]Open in a new tab

   Graphical representation summarizing the mechanism of apaQTL variant
   rs9606 contributes to NSCLC risk. Compared with the nonrisk G allele of
   rs9606, the risk T allele of rs9606 decreases the binding affinity of
   NUDT21 for LYRM4, leading to 3′UTR shortening of LYRM4 mRNA. Shortening
   of the LYRM4 3′UTR leads to increased expression of LYRM4 through
   stabilization its mRNA. The increased expression of LYRM4 promoted
   malignant phenotypes of NSCLC cells through suppressing ferroptosis,
   which provided functional evidence supporting our population findings
   that apaQTL variant rs9606[T] allele contribute to increased risk of
   NSCLC. The figure was created by Figdraw ([178]www.figdraw.com).

   Alternative polyadenylation plays an essential role in many
   physiological processes and is often dysregulated in cancer.^[ [179]^9
   , [180]^28 ^] With the emerging evidence indicate that genetic variants
   that affect APA events contribute to multiple complex disease risk,
   there is an urgent need to characterize genetic basis and mechanism of
   APA regulation. In this study, through integrative apaQTL analysis in
   NSCLC tissues, we identified a set of apaQTLs in NSCLC. Consistent with
   previous apaQTL studies,^[ [181]^11a ^] we found that these identified
   apaQTLs were significantly enriched in 3′UTR region. Interestingly, the
   identified apaQTLs were enriched in multiple active regulatory elements
   (epigenetic marks and TF binding sites), implying the potential role in
   gene regulation.^[ [182]^29 ^] We also observed the significant
   enrichment of apaQTLs within binding sites for various RBPs involved in
   APA regulation. In particular, we experimentally validated a causal
   apaQTL variant rs9606 alters the binding site for RBP NUDT21, leading
   to a shift to an LYRM4 isoform with shorter 3′UTR through APA. These
   findings are consistent with previous studies suggest genetic
   regulation of APA events as an important molecular mechanism underlying
   human diseases.^[ [183]^30 ^]

   Molecular quantitative trait loci (xQTL) analysis provides a powerful
   strategy to interpret functional effects of GWAS variants. xQTLs
   identified by previous population‐scale studies, such as expression
   QTLs(eQTLs),^[ [184]^31 ^] splicing QTLs(sQTLs),^[ [185]^32 ^] and
   chromatin QTLs(cQTLs),^[ [186]^33 ^] have furthered our understanding
   of the molecular mechanisms linking genetic variants to cancer
   susceptibility. In our study, we found that several apaQTLs overlapped
   with eQTLs. This overlap suggests that these apaQTLs may influence gene
   expression levels either by modulating the APA process alone or by
   simultaneously regulating gene transcription, similar to classical
   3′UTR eQTL SNPs.^[ [187]^34 ^] However, the majority of apaQTLs and
   eQTLs appear to be largely independent. We speculate that
   eQTL‐independent apaQTLs regulate translation or the cellular
   localization of target genes without affecting the regulation of gene
   expression.^[ [188]^35 ^] Consistently, our integrated analysis of
   apaQTLs and NSCLC GWAS data also revealed the important role of apaQTLs
   in NSCLC risk. Colocalization analyses showed significant enrichment of
   apaQTLs among NSCLC GWAS loci. Further analysis indicated that apaQTLs
   can explain some of the missing heritability in NSCLC and are largely
   distinct from eQTLs. Furthermore, our investigations revealed
   significant association between an apaQTL variant rs9606 and NSCLC
   risk, which was further validated in multiple independent cohorts.
   Mechanistically, rs9606 affects the choice of poly(A) site by NUDT21,
   resulting in altered mRNA stability of LYRM4 which ultimately
   contributes to NSCLC cell proliferation. Thus, large‐scale apaQTL
   studies in NSCLC and other tissues can discover more functional genetic
   variants to advance our understanding of genetic regulation of APA and
   facilitate the translation of GWAS findings to mechanistic
   understanding.

   PRS are playing increasing important roles in research studies to
   identify individuals at high risk of diseases, such as cardiovascular
   disease^[ [189]^36 ^] and lung cancer.^[ [190]^37 ^] However, due to
   the modest predictive ability for diseases, the clinical application of
   PRS in disease risk assessment remains limited. In this study, we
   constructed three PRS models and evaluated the its predictive
   performance of NSCLC risk. Of note, integration of apaQTL variants with
   traditional GWAS‐derived PRS improves NSCLC risk stratification across
   multiple independent cohorts. This finding is consistent with recent
   studies, which suggested that incorporating functional variants
   discovered in molecular quantitative trait loci studies improve risk
   prediction ability of diseases.^[ [191]^38 ^] Remarkably, the combined
   PRS not only optimized the risk stratification for NSCLC in
   case‐control sets, but also outperformed in separating populations into
   high risk and low risk of NSCLC respectively in a perspective cohort
   set. These findings provided additional evidence that apaQTL variants
   could be used for NSCLC risk assessment, implying the its potential
   translational utility for NSCLC screening and individualized
   prevention.

   Identifying and annotation of target genes from genetic data will
   enable mechanistic understanding of disease and development of
   innovative therapies.^[ [192]^39 ^] In this study, we found that target
   genes affected by apaQTLs were enriched in processes related to
   carcinogenesis, including oncogenic pathways, immune infiltration, and
   drug responses. Among these targeted genes, we experimentally validated
   that rs9606 target gene LYRM4 directly impacted malignant phenotypes of
   NSCLC cells in vitro and in vivo. LYRM4 (also known as ISD11) has an
   essential role in both iron–sulfur (Fe–S) cluster biogenesis and
   maintenance of cellular iron homeostasis.^[ [193]^27 , [194]^40 ^]
   Depletion of LYRM4 causes deficiency in Fe–S cluster biogenesis, which
   leads to ferroptotic cell death in cancer cells.^[ [195]^41 ^] Our
   study demonstrated that overexpression of LYRM4 promotes malignant
   phenotype of NSCLC cells through regulating ferroptotic activity, which
   is consistent with previous reports that overexpression of LYRM4
   protects lung cancer cells from undergoing ferroptosis.^[ [196]^42 ^]
   Interestingly, ferroptosis plays critical role in tumor metastasis and
   immune response. Induction of ferroptosis not only suppresses tumor
   progression but also augments immunotherapy responses.^[ [197]^43 ^]
   However, our understanding of the interaction between LYRM4‐mediated
   ferroptosis regulation and tumor immunity remains limited. Therefore,
   future studies should be warranted to investigate potential role of
   LYRM4 in metastasis, tumor microenvironment, and immune response. It
   will also be interesting to investigate the intriguing potential of the
   association between SNPs and APA events of noncoding RNAs in the
   future.

   This study has several limitations. First, similar to eQTLs and
   sQTLs,^[ [198]^44 ^] apaQTLs may also be cell type‐specific. Our
   current study only used bulk RNA‐seq data for analysis, potentially
   overlooking genetic effect that are cell type‐specific and in turn
   limiting their functional interpretation. Therefore, further
   investigations should combine GWAS with single‐cell genomic atlases to
   provide a more comprehensive understanding of genetic landscape and
   regulatory mechanism underlie NSCLC. Second, we have identified
   numerous genetic variants associated with alternative polyadenylation
   and NSCLC risk, but the functional consequences of these variants are
   mostly unknown. More functional experiments are required to elucidate
   the molecular mechanisms underlie genetic control of APA and the
   following cascades of events that lead to NSCLC. For instance, RNA
   sequencing (RNA‐Seq) is planned to provide a more comprehensive view of
   the global impact of rs9606 on gene expression, including noncoding
   RNAs and other regulatory pathways. Third, apaQTLs identified in this
   study only explain a small fraction of NSCLC GWAS loci, probably due to
   genetic regulatory mechanisms beyond alternative polyadenylation. Thus,
   investigation combined effects of genetic variant associated with other
   molecular phenotypes (e.g., alternative splicing, N^6‐methyladenosine,
   and histone modification) with apaQTL variants is essential to unveil
   molecular basis of polygenic diseases and traits. Fourth, subsequent
   research could aim to develop transcriptome‐wide association studies
   (TWAS) based on QTL results to highlight the role of APA in nominating
   susceptibility genes. Finally, various environmental and lifestyle
   factors, including diet, obesity, and physical activity, have been
   shown to be associated with the NSCLC risk.^[ [199]^45 ^] These factors
   should be adjusted to enhance the accuracy of risk assessment.

4. Conclusion

   In summary, as a comprehensive apaQTL study in NSCLC thus far, our
   recent work provides a catalog of apaQTL variants that will be a
   valuable resource for future studies. We further characterized
   functional properties of these apaQTL variants, demonstrating its
   distinct role in NSCLC risk. Functional annotation of apaQTLs and
   enrichment of their target genes in cancer‐related pathways will inform
   the interpretation of NSCLC GWAS loci, providing valuable functional
   hints for potential leading molecular pathways. In particular, we
   confidently identified an apaQTL variant rs9606 as a causal variant for
   NSCLC. rs9606 promoted aberrant alternative polyadenylation process of
   LYRM4, which lead to increased expression of oncogene LYRM4 and thus
   contribute to NSCLC risk. These findings may help advance our
   understanding of the molecular mechanisms linking genetic variation to
   NSCLC susceptibility and will ultimately improve precise prevention and
   treatment strategies of NSCLC.

5. Experimental Section

Integrative apaQTL Analyses in NSCLC

   To quantify dynamic APA events in NSCLC, a well‐established DaPars was
   utilized for identification APA event indicated by Percentage of Distal
   polyA site Usage Index (PDUI) for each transcript. The PDUI data
   measure derived from DaPars were mined from TC3A
   ([200]http://tc3a.org/). To generate a reliable set of APA events, APA
   events were filtered according to the following criteria: i) PDUI
   values with a missing rate ≥ 0.1; ii) PDUI values with a standard
   deviation ≤ 10%. Ultimately, an average of 4243 APA events in NSCLC
   were included for further analysis.

   Genotype data of NSCLC samples detected by the Affymetrix SNP 6.0 Array
   were obtained from TCGA portal ([201]https://portal.gdc.cancer.gov/).
   To fill the missing genotypes of these NSCLC samples, prephase and
   imputation was conducted by IMPUTE2 (version 2.3.2),^[ [202]^46 ^] with
   1000 Genomes Project Phase III database as the reference. After
   imputation, further variants were excluded with imputation confidence
   score < 0.4, minor allele frequency (MAF) < 5%, SNP missing rate ≥ 5%
   for best‐guessed genotypes at posterior probability ≥ 0.9, or departure
   from Hardy–Weinberg Equilibrium (P < 1×10^−6). After imputation and
   quality control, 4 022 558 SNPs per cancer type were remained for
   downstream analyses.

   After PDUI values and genotype data were obtained, an integrative
   apaQTL analysis was performed to identify apaQTL SNPs in NSCLC. To
   account for potential known and other hidden covariates in apaQTL
   analyses, first principal component analysis (PCA) was performed by
   using PLINK to correct sample genotype for population structure.
   Further PEER method was used to adjust for the first 15 PEER factors of
   PDUI values in each tissue. Other common confounders include age, sex,
   and tumor stage were also included as known covariates for apaQTL
   analysis. Then a linear regression framework in FastQTL^[ [203]^47 ^]
   was applied to test associations between SNPs and APA events within a
   distance of 1 Mb around a candidate SNP as previously described.^[
   [204]^11b ^] apaQTL SNPs were subsequently identified with an FDR below
   0.05.

Functional Annotation Comparisons Between apaQTLs and Non‐apaQTLs

   To generate the control set of non‐apaQTL SNPs, first SNPs were
   extracted which were matched to apaQTL SNPs in terms of MAF, LD, and
   variant type by using a web tool vSampler
   ([205]http://mulinlab.org/vsampler/). Then, non‐apaQTL SNPs were
   selected which are not associated with APA events.^[ [206]^48 ^] Next,
   apaQTL and non‐apaQTL SNPs were functionally annotated using a SNP
   annotation tool SnpEff.^[ [207]^49 ^] According to the functional
   annotation defined by SnpEff, SNPs were classified into following
   categories: 3′‐untranslated region (UTR), 5′UTR, downstream, upstream,
   intronic, intergenic, and exonic. The difference in enrichment of
   apaQTLs and non‐apaQTLs among each functional category was evaluated by
   the two‐tailed Fisher's exact test with 2×2 table (columns; apaQTL and
   non‐apaQTL SNPs, rows; SNPs “assigned” and “not assigned” to the
   particular functional class).

Functional Enrichment of apaQTLs within Poly(A) Motifs, Functional Elements,
and RBP Binding Sites

   Putative poly(A) sites of genes undergoing APA were obtained from
   PolyA^[ [208]^50 ^] and UCSC databases. Subsequently, NSCLC apaQTLs
   which located within the 3′UTR of the associated APA genes were
   intersected with 50 bp upstream of each poly(A) site. Sequence located
   within 6 bp upstream or downstream of poly(A)‐overlapping apaQTLs were
   extracted to identify poly(A) motif‐altering apaQTLs. Enrichment
   analyses of apaQTL SNPs among poly(A) motifs were conducted by
   two‐tailed Fisher's exact test with the 2×2 table (columns; apaQTL and
   non‐apaQTL SNPs, rows; SNPs within and not within poly(A) motifs).

   To assess whether the identified apaQTL SNPs were enriched among
   functional regulatory elements in the genome, functional enrichment
   analysis of apaQTL SNPs was performed. In brief, annotation files for
   epigenetic histone marks of promoters and enhancers, and transcription
   factor‐binding sites (TFBs) were obtained from ENCODE.^[ [209]^16b ^]
   Then, overlap of apaQTL SNPs with each regulatory element peaks using
   BEDTools was explored.^[ [210]^51 ^] For enrichment analysis,
   two‐tailed Fisher's exact test with 2×2 table (columns; apaQTL and
   non‐apaQTL SNPs, rows; SNPs within and not within regulatory element)
   was used.

   To predict potential APA regulators that function through binding to
   apaQTLs, enrichment analysis of apaQTLs among RBP binding sites was
   performed. Enhanced crosslinking and immunoprecipitation followed by
   sequencing (eCLIP‐seq) data were downloaded from ENCODE data portal
   ([211]https://www.encodeproject.org). Then, whether each apaQTL SNP
   falls within RBP binding peaks using BEDtools was tested. The
   statistical significance of the enrichment analysis was evaluated using
   two‐tailed Fisher's exact test with 2×2 table (columns; apaQTL and
   non‐apaQTL SNPs, rows; SNPs within and not within the RBP binding
   site).

Enrichment of apaQTLs within NSCLC GWAS loci

   To test for enrichment of apaQTL SNPs among NSCLC GWAS loci, GWAS SNPs
   were downloaded associated with NSCLC risk from the GWAS catalog
   ([212]https://www.ebi.ac.uk/gwas/). SNPs with genome‐wide significant
   association P < 5×10^−8 were included in the analyses (Table [213]S7,
   Supporting Information). SNPs that are in LD with an index SNP at r ^2
   ≥ 0.2 were defined as an associated genomic locus. Subsequently, the
   BEDtools were used to test whether an SNP was located within NSCLC GWAS
   loci. The enrichment of apaQTL SNPs among NSCLC loci was evaluated
   using two‐tailed Fisher's exact test with 2×2 table (columns; apaQTL
   and non‐apaQTL SNPs, rows; SNPs within and not within the GWAS loci).

Partitioned Heritability Analysis

   To estimate the heritability of apaQTLs for NSCLC, GWAS results were
   first downloaded from the OncoArray Consortium lung cancer studies
   (dbGaP accession number: phs001273.v3). Next, GWAS SNPs were extracted
   that also belong to apaQTLs and plotted the quantile–quantile plot of
   the GWAS P‐values for those SNPs. Similarly, GWAS SNPs were plotted
   that also belong to eQTLs for comparison. Genome wide P‐values were
   plotted as a control. The heritability enrichment of apaQTLs was
   estimated by stratified LD Score Regression using (S‐LDSC, v1.0.1,
   [214]https://github.com/bulik/ldsc). The enrichments were calculated as
   the proportion of heritability over the proportion of SNPs, and then
   the standard errors were estimated and used for P‐value calculation.^[
   [215]^52 ^]

Characterization of aGenes

   To explore the potential role of genes affected by apaQTLs (aGenes) in
   tumorigenesis, further these aGenes in terms of biological function,
   somatic copy number alterations (SCNAs), and aGenes associated immune
   cell landscape and drug response were characterized. To examined
   functional enrichment of aGenes in biological pathways, Kyoto
   Encyclopedia of Genes and Genomes (KEGG) analysis was applied.^[
   [216]^53 ^] P < 0.05 was set as the threshold to identify significantly
   enriched pathways. Somatic copy‐number alteration (SCNA) data for NSCLC
   analyzed by GISTC2 were downloaded from the TCGA data portal, to reveal
   relevant genetic alterations of aGenes. Immune cell infiltration data
   of NSCLC was obtained from Tumor Immune Estimation Resource (TIMER2,
   [217]http://timer.cistrome.org/). The associations between immune
   infiltrates and aGene expression were evaluated by Spearman's rank
   correlation analysis. To explore associations between expression of
   aGenes and drug response, gene expression, and drug IC50 data were
   downloaded from the Genomics of Drug Sensitivity in Cancer (GDSC,
   released June 2020, [218]https://www.cancerrxgene.org/). Then, the
   associations were evaluated by Spearman's rank correlation analysis and
   FDR < 0.05 were defined as significant.

Study Subjects in Association Analysis between apaQTLs and NSCLC Risk

   To screen for potential functional apaQTL variants that associated with
   NSCLC risk, a multicenter study in multiple cohorts was performed. In
   the discovery stage, a GWAS array analysis using data from the
   OncoArray Consortium lung cancer studies (dbGaP accession number:
   phs001273.v3.p2) was conducted. After excluding 1815 small cell lung
   cancer samples, a total of 16 665 NSCLC cases and 14 101 healthy
   controls were remained. Furthermore, 1 eligible control from subjects
   without invasive NSCLC was selected by nearest neighbor matching in R
   package MatchIt, with enrollment gender and age group as matching
   criteria. Finally, 13 835 NSCLC cases and 13 835 matched controls were
   included for further analysis. The demographic characteristics of
   samples were listed at Table [219]S1 (Supporting Information).

   In the replication stage, genotyping analysis of the own NSCLC samples
   was performed by TaqMan Assays and ABI Prism 7900HT system. Quality
   control was implemented as described in previous study.^[ [220]^54 ^]
   Three independent population cohorts from China were used for
   evaluating the association between candidate apaQTL variant rs9606 and
   NSCLC risk. In the cohort I, a total of 1700 cases and 1700 controls
   were recruited from the First Affiliated Hospital of Zhengzhou
   University (Zhengzhou, China). In the cohort II, a total of 2809 cases
   and 2809 controls were recruited from Tongji Hospital of Huazhong
   University of Science and Technology (HUST, Wuhan, China). In cohort
   III, a total of 2500 cases and 2500 controls were recruited from Peking
   University Cancer Hospital and Institute. The details of demographic
   characteristics of all samples have been described in Table [221]S4
   (Supporting Information). This study was approved by the Institutional
   Review Board of Tongji hospital (TJ‐IRB20230707), and each participant
   provided written informed consent at recruitment.

   In addition, genotype data from the UK Biobank (UKB) and Prostate,
   Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO) datasets
   were adopted to strengthen the finding in the own samples. In UKB
   dataset, NSCLC cases were defined as subjects with primary invasive
   NSCLC diagnosed or NSCLC deaths according to ICD‐10 (C33, C34, and
   C39.9) codes. For each case, 4 eligible controls were selected from
   subjects without invasive NSCLC by nearest neighbor matching in R
   package MatchIt, with enrollment age, race/ethnicity, and sex as
   matching criteria. Finally, 3622 NSCLC cases and 14 488 matched
   controls were included (Table [222]S2, Supporting Information). In PLCO
   dataset (accession number: phs000346.v2. p2, phs001554.v1. p1,
   phs001286.v2. p2, and phs001524.v1. p1), 1712 NSCLC cases and 6848
   healthy controls were included to evaluate the association between
   rs9606 and NSCLC risk. Demographic characteristics of all subjects were
   displayed in Table [223]S3 (Supporting Information).

Generation of Polygenetic Risk Scores to Predict NSCLC Risk

   For the PRS development, the summary statistics was included from an
   inverse variance‐weighted fixed‐effects meta‐analysis performed with
   METAL based on the GWAS from EAS (Biobank Japan NSCLC GWAS; 4444 NSCLC
   cases and 174282 healthy controls) and EUR populations (FinnGen NSCLC
   GWAS; 5842 NSCLC cases and 287137 controls). The study employed three
   distinct PRS models using the selected sets of variants as follows: 1)
   Standard PRS: This model incorporated 120 variants previously
   identified in numerous studies to be significantly associated with
   NSCLC risk at the genome‐wide level. 2) apaQTL‐associated PRS: This
   model focused on 366 independent NSCLC apaQTL variants with MAF>0.05 by
   clumping from 117 161 apaQTL variants. 3) Combined PRS: This model
   integrated all 486 independent variants from both standard PRS and
   apaQTL‐associated PRS. To ensure the efficiency of PRS models,
   LD‐clumping procedure in PLINK v.1.90 was performed to remove highly
   correlated variants. The detail clumping parameters of PRS were as
   follow: –clump‐p 5×10^−8, –clump‐r ^2 0.10 –clump‐kb 250. In brief,
   summary statistics (effect alleles, ORs, and P values for each variant)
   from the NSCLC GWAS meta‐analyses were coordinated with the genotype
   data of case‐control population or cohort population. Then the PRS was
   constructed by utilizing clumping and thresholding (C+T) method, which
   is the most commonly used approach for PRS calculation. Individuals
   were additively scored in a weighted pattern based on the number of
   risk alleles they carry for each variant in the PRS.

   Subsequently, predictive performance of three PRS models described
   above in large‐scale case‐control sets and a perspective cohort set was
   assessed. In brief, summary statistics (effect alleles, ORs, and p
   values for each variant) was used from all GWASs, including UK biobank,
   OncoArray Consortium lung cancer studies, and PLCO. The association
   between PRSs and NSCLC risk among case‐control population were then
   calculated using logistic regression analysis with low‐PRS (bottom 20%)
   as a reference, adjusting for age and gender. Subjects were classified
   into 5 groups according to the distribution of PRS, the dose‐response
   relationship (P [trend]) and risk of NSCLC (ORs) for each group were
   calculated using logistic regression. For the perspective UK biobank
   cohort study, Cox proportional hazards regression model was used to
   obtain hazard ratios (HRs) and 95% CI by comparing the incident risk of
   participants who had a high (top 33%) genetic risk with the incident
   risk of those who had a low (bottom 33%) genetic risk.

Datasets for PRS Application

   Predictive performance of PRS models was assessed for NSCLC risk
   stratification among large‐scale case‐control sets and cohort set. For
   case‐control sets, it was included from UK biobank dataset (3622 NSCLC
   cases and 14488 matched controls), OncoArray Consortium lung cancer
   studies (13835 NSCLC cases and 13835 healthy controls), and PLCO
   studies (1712 NSCLC cases and 6848 matched healthy controls). In terms
   of cohort set, the performance of these PRSs for NSCLC incidence in UK
   Biobank dataset was tested. UK biobank is a large‐scale database and
   cohort containing genetic and health information with follow‐up from
   more than 500 000 UK participants between April 2006 and December 2010.
   Participants with missing genotype data or covariates (sex, age) at
   baseline or a history of other cancers according to the first cancer
   diagnosis were excluded from the dataset. The follow‐up time was
   calculated from baseline assessment to the first diagnosis of NSCLC,
   loss to follow‐up, and death or last follow‐up (October 1, 2022).
   Finally, 3777 incident NSCLC cases and 415 212 cancer‐free normal
   individuals during follow‐up were included for further evaluation.

Cell Culture

   The human NSCLC cell lines A549 and H1299 were obtained from the Cell
   Bank of Chinese Academy of Sciences (Shanghai, China). Cells were
   cultured in RPMI 1640 (HyClone, USA) containing 10% fetal bovine serum
   (Gibco, USA) and 1% penicillin–streptomycin (HyClone, USA) and
   incubated at 37 °C with 5% CO[2] and frozen in commercial cell freezing
   solution (Abkkine, China) at ‐80 °C. All cell lines used were
   authenticated by short tandem repeat (STR) profiling (Applied
   Biosystems, USA), and mycoplasma contamination was not detected
   (MycoAlert, USA).

Plasmids, siRNAs, and Lentiviruses

   The plasmids expressing full‐length LYRM4 cDNA containing the rs9606[T]
   (LYRM4[T]) or rs9606[G] allele (LYRM4[G]) and the plasmids expressing
   LYRM4 cDNA with short 3′UTR or long 3′UTR were constructed using
   pcDNA3.1(+) vector by Tsingke (Beijing, China). The NUDT21‐targeting
   siRNA and nontargeting siRNA control were commercially synthesized by
   AuGCT Biotech Co., Ltd. (Beijing, China). The siRNA sequences are
   listed in Table [224]S4 (Supporting Information). For plasmid
   transfection and cotransfection with siRNA, Lipofectamine 3000
   (Invitrogen, USA) was used as per manufacturer's instructions.

   Lentivirus for full‐length LYRM4 cDNA containing the rs9606[T]
   (LYRM4[T]) or rs9606[G] allele (LYRM4[G]) were purchased from OBIO
   technology and infected with A549 and H1299 cells for in vivo
   experiments.

3′‐Rapid Amplification of cDNA Ends (3′RACE) Assay

   The 3′RACE assay was performed with the HiScript‐TS 5′/3′ RACE Kit
   (Cat#: RA101‐01, Vazyme) according to the manufacturer's instruction.
   In brief, after transfecting with LYRM4[T], LYRM4[G], and empty vector
   plasmids into A549 and H1299 cells for 48 h, total RNA was extracted
   using TRIzol reagent (Invitrogen, USA) and reverse transcribed to
   first‐strand complementary DNA. The gene‐specific primer (GSP) of LYRM4
   and universal primer mix (UPM) were used to perform the first‐round
   PCR, and nested GSP of LYRM4 and Nested Primer were used to perform the
   nested PCR. The amplified products of 3′RACE were separated with 1%
   agarose gels, and then the target bands were excised and recovered
   using FastPure Gel DNA Extraction Mini Kit (Cat#: DC301‐01, Vazyme).
   Next, the purified products were cloned into pCE2 TA/Blunt‐Zero Vector
   using 5 min TA/Blunt‐Zero Cloning Kit (Cat#: C601‐01, Vazyme) and
   identified by Sanger sequencing. The UPM and Nested Primer were
   provided by the HiScript‐TS 5′/3′ RACE Kit. The sequences of GSP and
   nested GSP designed for LYRM4 were listed in Table [225]S4 (Supporting
   Information).

Quantitative Reverse Transcription PCR (qRT‐PCR)

   Total RNA was isolated from NSCLC cell lines, NSCLC tissues, or
   adjacent normal tissues using TRIzol reagent (Invitrogen, USA) and
   reverse transcribed to first‐strand complementary DNA using the
   HiScript III RT SuperMix (Vazyme, China). Quantitative real‐time PCR
   (qPCR) was conducted with SYBR Green qPCR Master Mix (Selleck, USA).
   The mRNA expression was normalized to GAPDH. The primer sequences used
   in qPCR were listed in Table [226]S5 (Supporting Information).

Western Blotting

   NSCLC cells were lysed using RIPA buffer with a mixture of protease
   inhibitors PMSF (Beyotime, China) and cocktail (MedChem Express, USA).
   The lysates were sonicated and centrifuged, supernatant was quantified
   using a BCA assay kit (Beyotime, China). The proteins were separated by
   SDS‐PAGE and transferred to PVDF membrane, then incubated with
   antibodies against LYRM4 (Cat# PA5‐56448, Invitrogen), NUDT21 (Cat#
   10322‐1‐AP, Proteintech), GAPDH (Cat# 10494‐1‐AP, Proteintech), β‐actin
   (Cat# 20536‐1‐AP, Proteintech) and following incubated with antimouse
   or antirabbit secondary antibodies, then detected with ECL reagent (NCM
   Biotech, China).

RNA Stability Assay

   Cells were seeded and transfected with different plasmids, in 6‐well
   plates. After 36 h of transfection, cells were treated with actinomycin
   D (5 µg mL^−1, Cat# S8964, Selleck) and the total RNA was exacted at
   indicated time points. The half‐life time of mRNA was estimated as
   previously described.^[ [227]^55 ^] The decay curves of mRNA were
   plotted using the nonlinear fitting with the equation: N [t]/N [0] =
   e^− ^k ^t. The transcription inhibition time is t, the degradation rate
   is k, and the quantities of RNA at time t and time 0 are N [t] and N
   [0]. The mRNA half‐life was determined using the following equation:
   T[1/2] = ln 2/K.

RNA Pull‐Down Assay

   Biotinylated RNA probes spanning the rs9606 site of 50 bp were designed
   and commercially synthesized. RNA pull‐down experiments were performed
   using RNA pull‐down Kit (BersinBio, cat. no. Bes5102, China). According
   to the manufacturer's instructions, the cellular protein extracts from
   at least 1×10^7 NSCLC cells were incubated with 50 pmol biotinylated
   RNA probes and 50 µL streptavidin magnetic beads with rotation. After
   incubation period, the beads were washed with wash buffer and captured
   with magnetic rack. The retained proteins in the beads were eluted with
   biotin elution buffer, followed by mass spectrometry identification and
   western blot analysis.

Liquid Chromatography Tandem Mass Spectrometry (LC‐MS) Analysis

   The bead samples from RNA pull‐down or immunoprecipitation were
   collected and washed with cold PBS, followed by digestion with
   sequencing‐grade trypsin (Promega). Afterward, the MS data were
   collected using a LC‐MS system that combines the Q Exactive HF mass
   spectrometer with the UltiMate 3000 RSLCnano liquid phase. MS data were
   analyzed using MaxQuant software (V1.6.6) with the Andromeda database
   search algorithm, and the database for search was the UniProt human
   proteome reference database.

Human NSCLC Samples Collection

   A total of 50 NSCLC and adjacent normal lung tissues from patients who
   underwent scheduled surgery in Wuhan Tongji hospital were obtained.
   Patients who donated samples did not receive chemotherapy, radiation
   therapy, targeted therapy, or immunology therapy before surgical
   resection. The clinical information of all samples was provided in
   Table [228]S6 (Supporting Information). The sample collection was
   conducted in accordance with the ethical principles in Declaration of
   Helsinki, and approved by the Ethics Committee of Tongji Medical
   College, Huazhong University of Science and Technology.

RNA Immunoprecipitation (RIP) Assay

   For RIP assays, A549 and H1299 cells were transfected with LYRM4[T] or
   LYRM4[G] plasmids. Afterward, cells were lysed with precold RIP lysis
   buffer on ice. The lysates were centrifuged, part of the supernatant
   was served as the input control, the remaining supernatant was
   incubated with anti‐NUDT21 (Cat# 10322‐1‐AP, Proteintech) or isotype
   control (IgG) and protein A/G beads and rotated at 4 °C overnight.
   Next, the beads were washed with precold RIP wash buffer and collected
   using magnetic rack, and RNA was extracted from the beads and input
   using Trizol. The RNA abundance of LYRM4 mRNA was further analyzed
   through qRT‐PCR method and normalized to input.

Cell Counting Kit‐8 (CCK‐8) Assay

   Cell proliferation and viability were measured by CCK‐8 kit. The
   pretransfected cells were seeded in 96‐well plates, with each well
   containing 100 µL medium of 3000 cells. After cultured for 24, 48, 72,
   and 96 h, cell proliferation were determined using CCK8 reagent
   (MedChem Express, USA) by measuring the absorbance at 450 nm wavelength
   with a microplate reader.

Colony Formation Assay

   For the colony formation assay, the pretransfected cells (200 cells per
   well) were seeded in 6‐well plates. Following a period of 10–14 days,
   the cell colonies were fixed with 4% paraformaldehyde and stained with
   0.1% crystal violet. After washing with water, the visible colonies
   were imaged and counted by ImageJ as previously described.^[ [229]^56
   ^]

Migration Assay

   The migration assay was performed by employing Transwell plates
   (Corning, USA; 8 µm pore size). A total of 0.2 mL (2 × 10^4 cells) of
   cell suspension with serum‐free medium was added into the upper
   chambers of Transwell insert, and 0.6 mL medium containing 10% FBS was
   placed to the lower chamber. After 24 h of incubation, migrated cells
   were fixed with 4% paraformaldehyde, and subsequently stained with 0.1%
   crystal violet.

Dual‐Luciferase Reporter Assay

   The sequences of LYRM4 short 3′UTR and long 3′UTR were synthesized and
   subcloned into pmirGLO reporter vector (Promega, USA). To generate
   vector that expression of only long 3′UTR of LYRM4, the proximal PAS
   sequence was mutated. The sequences of LYRM4 3'UTR containing the
   rs9606 T allele or G allele were cloned into pmirGLO reporter vector.
   A549 and H1299 cells were transfected with above reporter plasmid. 48 h
   after transfection, the renilla and firefly luciferase activities were
   measured with a Dual‐Luciferase Reporter Assay Kit (Cat# DL101‐01,
   Vazyme) according to the manufacturer's protocol. The activity values
   of firefly luciferase were normalized against that of Renilla
   luciferase.

Coimmunoprecipitation (CO‐IP) Assay

   Cells transfected with indicated plasmids in 10 cm plate were washed
   with precold PBS and lysed with IP lysis buffer, and then the
   supernatant was collected after centrifugation. Cell supernatant was
   incubated with anti‐FLAG (Cat# F1804, Sigma‐Aldrich) and protein A/G
   magnetic beads overnight at 4 °C with rotation. After washing the
   magnetic beads with IP wash buffer, the beads were prepared for mass
   spectrometry or western blot analysis.

Cellular Iron Detection

   Intracellular ferrous iron (Fe^2+) levels were detected with the Cell
   Ferrous Iron Colorimetric Assay Kit (E‐BC‐K881‐M, Elabscience, China)
   and the absorbance at 593 nm was calculated as the intracellular Fe^2+
   levels. The iron concentration was determined using a standard
   calibration curve based on the measured absorbance at 593 nm.

Malondialdehyde (MDA) and Glutathione (GSH) Detection

   The relative intracellular MDA and GSH levels were detected using the
   commercial MDA assay kit (BC0020, Solarbio, China) and GSH assay kit
   (BC1170, Solarbio, China) respectively, according to the manufacturer's
   protocol. The MDA and GSH contents were calculated by measuring the
   absorbance at 532 and 412 nm, respectively.

Intracellular ROS and Lipid Peroxidation Detection

   The lipid ROS level was used to evaluate lipid peroxidation.
   Intracellular ROS levels and lipid ROS levels were detected using
   DCFH–DA (Beyotime, China) and C11 BODIPY 581/591(Invitrogen, USA)
   respectively. Briefly, after treatment, cells were incubated with
   DCFH‐DA (10 µM) or C11‐BODIPY (10 µm) for 30 min at 37 °C in the dark.
   After thrice washing with serum‐free medium, the cells were observed
   and photographed using a fluorescence microscopy, or analyzed by flow
   cytometry to measure the fluorescence intensity.

Transmission Electron Microscopy (TEM)

   After the indicated treatment, cells were fixed with 2.5%
   glutaraldehyde at 4 °C overnight. Then, the samples were dehydrated,
   embedded in epoxy resin and cut into ultrathin sections. After staining
   with uranyl acetate and lead citrate, the morphology of mitochondria
   was observed by TEM (JEOL, Japan).

Animal Studies

   Five weeks old male BALB/c nude mice were purchased from Nanjing
   Gempharmatech Co., Ltd, China and raised in specific pathogen‐free
   (SPF) conditions. A549 or H1299 cells (5 ×10^6 in 100 µL PBS) with
   stable overexpression of LYRM4[T] and LYRM4[G] were subcutaneously
   injected into the armpit of each mouse. Tumor size was measured using a
   caliper and the tumor volume was calculated according to the formula:
   volume = (length × width^2)/2. All animal experiments were approved by
   the Animal Ethics Committee of Tongji Hospital (202 311 040).

Statistical Analysis

   Statistical comparisons were assessed using Student's t‐test or
   Pearson's χ ^2 test for continuous and categorical variables,
   respectively. For candidate variant association analysis, unconditional
   multivariable logistic regression was performed to estimate odds ratios
   (ORs) and 95% confidence intervals (CIs), with adjustment for gender,
   age, smoking status, and drinking status. Multiple genetic models that
   include allelic, dominant, recessive, and additive genetic models were
   used to assess the significance associations between rs9606 genotypes
   and NSCLC risk. Other methods used for calculating utility and
   effectiveness of PRS can be found in relevant method part. Statistical
   analyses for gene expression in NSCLC and adjacent normal tissues were
   calculated using Wilcoxon rank‐sum test. Detailed statistical methods
   used in each functional experiment were illustrated in the relevant
   figure legends. Asterisks in the figures define the significance level
   (*P < 0.05; **P < 0.01; and ***P < 0.001). All statistical analyses
   were performed using SPSS software (21.0) or R (3.30).

Conflict of Interest

   The authors declare no conflict of interest.

Supporting information

   Supporting Information
   [230]ADVS-12-2502008-s001.docx^ (5.9MB, docx)

   Supplemental Table 1
   [231]ADVS-12-2502008-s002.xlsx^ (72.1KB, xlsx)

Acknowledgements