Abstract
The subtypes of duodenal cancer (DC) are complicated and the
carcinogenesis process is not well characterized. We present
comprehensive characterization of 438 samples from 156 DC patients,
covering 2 major and 5 rare subtypes. Proteogenomics reveals LYN
amplification at the chromosome 8q gain functioned in the transmit from
intraepithelial neoplasia phase to infiltration tumor phase via MAPK
signaling, and illustrates the DST mutation improves mTOR signaling in
the duodenal adenocarcinoma stage. Proteome-based analysis elucidates
stage-specific molecular characterizations and carcinogenesis tracks,
and defines the cancer-driving waves of the adenocarcinoma and
Brunner’s gland subtypes. The drug-targetable alanyl-tRNA synthetase
(AARS1) in the high tumor mutation burden/immune infiltration is
significantly enhanced in DC progression, and catalyzes the
lysine-alanylation of poly-ADP-ribose polymerases (PARP1), which
decreases the apoptosis of cancer cells, eventually promoting cell
proliferation and tumorigenesis. We assess the proteogenomic landscape
of early DC, and provide insights into the molecular features
corresponding therapeutic targets.
Subject terms: Proteomics, Intestinal diseases, Cancer genomics,
Gastrointestinal cancer
__________________________________________________________________
Duodenal cancer (DC) has complex subtypes and undergoes complicated
morphological changes throughout progression, so understanding the
molecular basis is crucial. Here, the authors perform a proteogenomics
analysis of 156 DCs, revealing molecular subtypes as well as the roles
of smoking, AARS1 and PARP1.
Introduction
Small intestine cancer (SIC) is a malignantly rare cancer, of which the
incidence rate was approximately 0.6%, and the mortality was 0.3% of
cancer-related death worldwide
([68]https://seer.cancer.gov/statfacts/html/smint.html)^[69]1. DC, the
most common type of SIC (ranging from 30 to 50%)^[70]2,[71]3, mainly
consists of four types, adenocarcinoma, neuroendocrine tumor, stromal
tumor, and lymphoma^[72]4. Recent advances in imaging have resulted in
dramatic improvements in diagnostic capability and concluded typical
subtypes of duodenal lesions, including two major subtypes
(adenocarcinoma (named D subtype in this study) and Brunner’s gland
(named G subtype in this study)) and other subtypes, including
heterotopic pancreas (named P subtype in this study), intermuscular
gland (named N subtype in this study), lymphangioma (named L subtype in
this study), cystic dystrophy (named C subtype in this study),
well-differentiated neuroendocrine tumor (named NET subtype in this
study)^[73]5. Whereas, the specific characterizations and associations
of the complex subtypes have not been revealed. Interestingly, the D
subtype usually co-occurred with the G subtype in one patient, while
occasionally with rare NET/L/P/N/C subtypes. However, the origins and
the driven events of different DC subtypes remain largely unknown.
Tobacco usage has important implications regarding the gut environment
and various small intestine disorders, including functional
dyspepsia^[74]6, irritable bowel syndrome^[75]7, and duodenal ulcer
disease^[76]8. Nevertheless, the contribution of tobacco usage in DC
needs to be further explored.
The complexity, tumor heterogeneity, and low incidence of DC are
challenging to collect the DC sample. Several genomic datasets have
illustrated the genomic aberrations of small-bowel adenocarcinoma, and
NET presented the frequent mutations, and proposed the effects of mTOR
signaling in tumor carcinogenesis^[77]9. However, the impacts of the
genomic aberrations on the proteomic alterations and phosphoproteomic
actions remain unclear yet. In addition, the findings are focused on
the advanced stages, the first occurrences of the mutations/key events,
and the related effects during the carcinogenesis of DC are poor
understood.
In DC progression, the morphological characteristics are complicated,
especially for the two major subtypes (D and G subtypes). Due to its
sporadic nature and locations on the posterior or lateral wall of the
second part of the duodenum, the D subtype follows the transition
pattern of colorectal cancer (CRC), from adenoma (T1 stage) to
adenocarcinoma, including minimally invasion adenocarcinoma (named MIA
stage in this study) and advanced-stage adenocarcinoma (e.g., T2–T4
stages, named duodenal adenocarcinoma (DAC) stage in this
study)^[78]10. According to the differentiation degree, the epithelial
adenoma can be divided into low-grade intraepithelial neoplasia stage
(named LGIN in this study), and high-grade intraepithelial neoplasia
stage (named HGIN stage in this study)^[79]11. The tubular adenoma and
tubulovillous adenoma are usually present in the LGIN and
HGIN^[80]12,[81]13. The G subtype showed the features of high-grade
dysplasia and transformed from normal (named N-G in this study) to
hyperplasia (named H–G in this study) and to adenoma (named A–G in this
study)^[82]14–[83]16. The risks of patients at the different
(sub)stages in DC progression are quite diverse. For the patients at
the LGIN stage, the 5-year cumulative probability of eventual
progression to the HGIN or to adenocarcinoma has been reported as no
more than 50%^[84]17,[85]18. Whereas, the HGIN is usually progressed to
adenocarcinoma in pathology (>90%), and should be considered for
survelliance^[86]19. The different stages during the carcinogenesis of
DC are correlated with the lesion size and invasion to the
corresponding layers^[87]20 and require distinctive operation
strategies. For example, for the lesions at the LGIN and HGIN stages
(<20 mm in size), endoscopic resection (e.g., ESD and EMR) is suggested
as an effective treatment approach^[88]21. The lesions invaded into the
submucosal layer (≥20 mm in size) are assessed as carcinoma and are
required for surgical resection^[89]22. For the lesions at the MIA
stage, the clinic management is controversial, further implying the
complexity and undefined characteristics of the (sub)stages in DC
progression. Thus, it is vital for the revelation of the key events
during the carcinogenesis process of DC to explore the pathological
mechanism and the corresponding targeted therapy in DC progression.
However, the molecular characterizations of the (sub)stages of the
lesions in DC progression are yet unclear.
Generally, DC is clinically approached to CRC in a similar manner, such
as panitumumab to anti-EGFR^[90]23,[91]24. However, compared with CRC,
DC patients have a worse outcome of 5-year survival. The increasing
incidence and ambiguous features, and unclear malignancy potentials of
the different subtypes require specific management of the DC clinic
strategy. In addition, there is still a lack of targeted drugs for DC.
Thus, precise strategies and specific druggable candidates for DC are
in urgent need.
Here, we show comprehensive proteogenomic characterization of 438
samples dissected from early DC (EDC) of different stages in 156
patients. The sample pool covers nearly all stages of DC, including 2
major (D/G subtypes) and 5 rare (NET/L/P/N/C subtypes) pathological
subtypes with 23 specific substages from early to advanced stages of
DC. The comprehensive multi-omics analysis reveals the stage-specific
molecular characterization and elucidates the key events in the
transmit of the phases, and identifies four immune clusters.
Furthermore, we propose the potential functions of AARS1 in DC
progression, facilitating future basic and translational research of
this rare digestive tract cancer.
Results
The proteogenomic landscape of early duodenal cancer cohort
To characterize the proteogenomic landscape of DC, whole-exome
sequencing (WES), proteomic, and phosphoproteomic data, 438 samples
were collected from 156 DC patients based on stringent criteria
(Methods), who had not experienced prior chemotherapy or radiotherapy,
covering 2 major (D/G subtypes) and 5 rare subtypes (NET/L/P/N/C
subtypes) (Supplementary Fig. [92]1a). The clinicopathological
characteristics of patients with different subtypes were summarized in
Supplementary Data [93]1a (Table [94]1). To ensure the quality of the
samples in our cohort, 438 samples spanning from 2012 to 2019, were
separately dissected from the formalin-fixed, paraffin-embedded (FFPE)
slides and were well preserved and systematically evaluated to confirm
the histopathologic diagnosis and variant histology according to the
World Health Organization (WHO) classification^[95]11 by more than two
expert gastrointestinal pathologists (Methods). Laser-capture
microdissection (LCM)^[96]25 was applied to dissect the sections of
substage/subtype samples precisely (Supplementary Fig. [97]1b;
Methods), and the tumor purity of all the tumor samples was over 95%,
indicating the high quality of all samples in our cohort (Supplementary
Fig. [98]1c, d; Supplementary Data [99]1b). As for the normal duodenal
tissue, the proportion of normal cells is 100%, while the tumor cell is
0% in our cohort.
Table 1.
Clinical and histopathologic characteristics of the Fudan DC cohort
(n = 156)
Characteristic n (%) Parameter n (%)
Age Survival status
<50 yr 38 (24.4) Alive 156 (100)
50–70 yr 101 (64.7) Death 0 (0)
>70 yr 17 (10.9) Histopathologic (438 samples) 325 (74.2)
Gender Epithelial components
Male 55 (35.3) Yes 325 (74.2)
Female 101 (64.7) No 113 (25.8)
Smoking Histology subtype
Yes 37 (23.7) Adenocarcinoma (D) 325 (74.2)
No 92 (59.0) Brunner’s gland (G) 97 (22.1)
NA 27 (17.3) Heterotopic pancreas (P) 1 (0.2)
Drinking Intermuscular gland (N) 9 (2.1)
Yes 30 (19.2) Lymphangioma (L) 2 (0.5)
No 93 (59.6) Cystic dystrophy (C) 3 (0.7)
NA 33 (21.2) Neuroendocrine tumor (NET) 1 (0.2)
FAP
Yes 2 (1.3)
No 116 (74.4)
NA 38 (24.4)
HNPCC
Yes 4 (2.6)
No 83 (53.2)
NA 69 (44.2)
[100]Open in a new tab
Briefly, five stages (non-tumor (NT), LGIN, HGIN, MIA, and DAC stages
from the D subtype, three stages (N–G, H–G, and A–G Brunner’s gland
stages) from the G subtype, were collected for multi-omics analysis
(Tables [101]2 and [102]3). In total, 438 samples were collected for
proteomic profiling, in which 120 samples (47 DC cases) and 111 samples
(49 DC cases) were conducted on whole-exome sequencing (WES) (15.5 G
raw data, 135× depth-coverage) and phosphoproteomic profiling,
respectively (Fig. [103]1a and Supplementary Fig. [104]2a;
Supplementary Data [105]1c and [106]2a). The stage-based classification
with high tumor purity in our cohort, represented the advantages of our
samples in a pathological region-resolved mode, providing the chance to
portray molecular profiles of DC in a time-resolved mode.
Table 2.
Subclassification information and the number of substage samples in the
D subtype
Stages Substages (n = sample number)
Non-tumor stage (NT stage) 1 (n = 150), 1_1 (n = 1), 1_2 (n = 3), 1_3
(n = 7), 1_4 (n = 4), 1_5 (n = 1), 1_6 (n = 1)
Low-grade intraepithelial neoplasia stage (LGIN stage) 2_1 (n = 22),
2_2 (n = 11),
2_3 (n = 21), 2_4 (n = 8)
High-grade intraepithelial neoplasia stage (HGIN stage) 3_1 (n = 13),
3_2 (n = 9),
3_3 (n = 24), 3_4 (n = 25)
Minimally invasion adenocarcinoma stage (MIA stage) 4 (n = 5)
Duodenal adenocarcinoma stage (DAC stage) 5 (n = 20)
[107]Open in a new tab
Table 3.
Subclassification information and the number of substage samples in the
G subtype
Stages Substages (n = sample number)
Normal stage (N–G stage) G_1 (n = 47)
Hyperplasia Brunner’s stage (H–G stage) G_2 (n = 19), G_2_1 (n = 7)
Adenoma Brunner’s stage (A–G stage) G_3 (n = 22), G_3_1 (n = 1), G_3_2
(n = 1)
[108]Open in a new tab
Fig. 1. Multi-omics landscape in DC progression.
[109]Fig. 1
[110]Open in a new tab
a Overview of the experimental design and the number of samples for the
genomic, proteomic, and phosphoproteomic analyses. The genomic profiles
in the D subtype (b) and the G subtype (c) progression. Top: the
mutation numbers and types of all the samples during carcinogenesis of
DC. DC: duodenal cancer. Bottom: the percentages of somatic SNVs of all
samples. The mutation frequencies are shown by a bar plot on the right
panel. The square directs to a subset of patient samples used for WES
(n = 120). WES whole-exome sequencing, SNVs single nucleotide variants.
d The mutation loads in the diverse phases (red) of D subtypes and
other subtypes in DC. The cumulative number of protein identifications
(e) and phosphosite identifications (f) in DC progression. A total of
438 and 111 samples for proteomic profiling and phosphoproteomics
profiling, respectively, are used in the analysis. ****p < 1.0E−4,
***p < 1.0E−3, **p < 1.0E−2, * p < 0.05, ns. >0.05. Source data are
provided as a Source data file.
At the genomic level, a total of 11,504 mutations were identified in
DC, of which 10,454 and 2810 mutations were detected in the D and G
subtypes, respectively (Supplementary Data [111]2b). The top-ranked
mutations in the carcinogenesis of the D subtype were DST, APC, MUC16,
BIRC6, SYNE1, etc. (Fig. [112]1b). In the G subtype, KMT2B, MAP7D3, and
NRXN2 were highly mutated (Fig. [113]1c). According to the pathological
features and lesion invasion, the D subtype was divided into the NT
phase (including NT stage), the intraepithelial neoplasia (IEN) phase
(including LGIN and HGIN stages), and the infiltration tumor (IFT)
phase (including MIA and DAC stages), and the G subtype was separated
into normal Brunner’s gland (N–G phase) and abnormal Brunner’s gland
(AN–G phase) (Supplementary Fig. [114]2b). Low mutation loads were
observed in the IEN phase of D subtype and G subtype, as well as the N
and C subtypes (Fig. [115]1d), indicating lowly mutation loads in the
earlier phase in DC progression. During the carcinogenesis of DC,
neo-mutations were found in different stages. Specifically, the number
of the neo-mutations was increasingly increased in the D subtype, while
the number peaked at the H–G stage in the G subtype (Supplementary
Fig. [116]2b).
At the protein level, protein abundance was calculated by
intensity-based absolute quantification (iBAQ)^[117]26,[118]27 and then
normalized as a fraction of the total (FOT), allowing for comparisons
between experiments. Whole-cell extract of HEK293T cells was used as
the quality control (QC), which showed the MS was robust and consistent
based on the large Spearman’s correlation coefficients (mean = 0.93)
between the QC samples (Supplementary Fig. [119]2c; Supplementary
Data[120] 3a). Label-free quantification measurement of 438 samples
resulted in a total of 14,426 protein groups with a 1% false discovery
rate (FDR) at the protein and peptide levels (Fig. [121]1e). The
ascending numbers of protein identifications were also shown in the G
subtype (Supplementary Fig. [122]2d; Supplementary Data [123]3b).
Integration of the FFPE procedures, high tumor quality, and
systematical evaluation, ensured the stability of the DC
tissues/samples spanning from 2012 to 2019, evidenced by the no impacts
of the different storage times on the protein identification of the
samples at the same pathological stage (Supplementary Fig. [124]2e).
During the carcinogenesis of the D and G subtypes, gradually decreased
(Spearman’s) correlation coefficient of the histopathological stages
reflected the higher tumor heterogeneity during the carcinogenesis of
DC (Supplementary Fig. [125]2f), highlighting the importance of
exploring molecular characteristics in DC progression. At the
phosphoprotein level, 38,696 phosphosites, corresponding to 9,716
proteins, were quantified in the DC progression (Fig. [126]1f and
Supplementary Fig. [127]2g; Supplementary Data [128]3c). Taken
together, we established a comprehensive landscape of DC progression at
the gene, protein, and phosphoprotein levels, portraying two major (D/G
subtypes) and five rare subtypes (NET/L/P/N/C subtypes) of DC.
Enrichment of tobacco signature and DNA repair panel in DC
To investigate the specific etiological factors that may contribute to
the mutagenesis of DC, we used non-negative matrix factorization
(NMF)^[129]28, and then performed cosine similarity analysis against
mutational signatures in human cancer according to the Catalog of
Somatic Mutations in Cancer (COSMIC)
([130]https://cancer.sanger.ac.uk/cosmic/signatures)^[131]29,[132]30
(Supplementary Fig. [133]3a). As well as in the small intestinal
NET^[134]9, cytosine to thymine (C > T) substitutions were the
predominant SNV type in the D and G subtypes (Supplementary
Fig. [135]3b). The SBS29 signature, also known as Tobacco signature,
exhibited transcriptional strand bias for C > A substitutions, and is
associated with tobacco chewing^[136]31. In our cohort, we found that
the SBS29 signature (Tobacco signature) was overrepresented in the D
subtype (20 of 68 samples), and was detected in the NT stage last till
to DAC stage of the D subtype (Fig. [137]2a and Supplementary
Fig. [138]3c; Supplementary Data [139]4a). In addition, integration of
the clinical information of 156 DC patients revealed that more patients
(60% vs. 18.8%, Fisher’s exact test, p = 1.4E−3) with smoking habit
were observed in the Tobacco signature group with higher SBS29 (Tobacco
signature) score (Wilcoxon signed-rank test, p = 0.024) (Fig. [140]2b
and Supplementary Fig. [141]3d). These findings reflected that tobacco
might be a risk factor for DC, especially for the D subtype.
Fig. 2. The impacts of the Tobacco signature and DNA repair panel in DC.
[142]Fig. 2
[143]Open in a new tab
a Heatmap showing the clinic information (top), correlated mutations
(middle), and the SBS29 (Tobacco) signature contribution (bottom) in
the D and G subtypes. The square is directed to a subset of patient
samples of the D and G subtypes used for WES (n = 120). b Histogram
showing the proportion of the samples with the habit of smoking in the
Tobacco signature and DNA repair panel (two-sided Fisher’s exact test).
c The dominant pathways in the Tobacco signature (top) and DNA repair
panel (bottom). d Scatterplot showing the relationship between CYPs and
aminoglycan catabolism at the protein level (two-sided Pearson’s
correlation test). CYPs cytochromes P450 families. e Boxplot showing
the TMB in the Tobacco signature and DNA repair panel (two-sided
Wilcoxon signed-ranked test). Boxplot shows median (central line),
upper and lower quartiles (box limits), and 1.5× interquartile range
(whiskers). f Boxplot presenting the MSI score in the Tobacco signature
and DNA repair panel (two-sided Wilcoxon signed-ranked test). MSI
microsatellite instability. Boxplot shows median (central line), upper
and lower quartiles (box limits), and 1.5× interquartile range
(whiskers). g Heatmap depicting the represented proteins of the
predominant pathways in the DNA repair panel. The square directs to a
subset of patient samples in the D subtype used for WES. h A brief
summary of the impacts of the Tobacco signature and DNA repair panel in
DC. ****p < 1.0E−4, ***p < 1.0E−3, **p < 1.0E−2, * p < 0.05, ns. >0.05.
Source data are provided as a Source data file.
Generally, cytochromes P450 families (CYPs) play a key role in the
metabolic process of nicotine, major content in tobacco, and it is
nitrosated carcinogenic derivative
4-(methylnitrosamino)−1-(3-pyridyl)−1-butanone (NNK)^[144]32,[145]33.
In our cohort, overrepresentation of CYPs (e.g., CYP2W1, CYP7B1, etc.)
was observed in the Tobacco signature (Supplementary Fig. [146]3e),
indicating the formation of NNK. Glycosaminoglycans (GAGs) enhanced the
metabolic process from NNK to NNK-O-glucuronic acid, which can be
released in small intestinal and kidney^[147]34–[148]36. To further
investigate the impacts of the Tobacco signature, we performed GO/KEGG
enrichment analysis, and found protein O-linked glycosylation and
aminoglycan catabolism were overrepresented in the Tobacco signature
(Fig. [149]2c). In addition, positive association (Pearson’s R = 0.41,
p = 2.2E−3) between CYPs and aminoglycan catabolism was notably
observed (Fig. [150]2d), implying the activation and cumulation of
GAGs. Taken together, the Tobacco signature was prominent in the D
subtype, which might induce NNK-O-glucuronic acid secretion via the
overrepresentation of GAG (Fig. [151]2h).
Other signatures, such as signatures 6, 15, and 88, correlated with DNA
repair^[152]31, were detected in the D, G, P, and N subtypes, and named
as ‘DNA repair panel’ in DC (Supplementary Fig. [153]3c).
Microsatellite instability (MSI) occurs alterations in short-repetitive
DNA sequences^[154]37, as a consequence of DNA mismatch repair
deficiency^[155]38, and shows a positive association with the presence
of hypermutation^[156]39,[157]40. In our cohort, a significantly higher
tumor mutation burden (TMB) was detected (Wilcoxon signed-rank test,
p < 1.0E−3) in the DNA repair panel (Fig. [158]2e and Supplementary
Fig. [159]3f), suggesting MSI might exist in the DNA repair panel of
DC. In order to validate the assumptions, we performed MSI analysis on
the basis of MSI score, which was then found to be higher in the DNA
repair panel (Wilcoxon signed-rank test, p = 0.016), evidenced by the
overrepresented MSI signature-related proteins in the DNA repair panel
(Fig. [160]2f, g, and Supplementary Fig. [161]3g).
Furthermore, DNA repair-related proteins, including PARPs (e.g., PARP1
and PARP9) (Wilcoxon signed-rank test, adjust. p < 0.05), were
prominent in the DNA repair panel (Fig. [162]2g). Aminoacyl-tRNA
synthetases (ARS) are essential enzymes for protein synthesis with
evolutionarily conserved enzymatic mechanisms^[163]41, and promote the
genes expression of PARPs, which act in the DNA damage-induced
ADP-ribosylation^[164]42. In our cohort, the aminoacyl-tRNA
biosynthesis was predominant in the DNA repair panel, evidenced by the
overrepresentation of ARS families (Supplementary Fig. [165]3h),
indicating the translational functions in the DNA repair panel. We then
integrated the ARS families and performed a correlation analysis
between the ARS families and PARPs. As a result, we found AARS1 was
positively correlated with the significant makers (PARP1, PARP9, and
PARP14) in DNA repair signaling (Fig. [166]2g). Moreover, AARS1 showed
a significantly positive correlation with MSI-related proteins^[167]40
(Supplementary Fig. [168]3i). Collectively, DNA repair panel was
featured with high TMB and MSI, in which ARS-PARPs signaling was
overactivated (Fig. [169]2h).
Involvement of the chr8q gain in the transmit process from the IEN to IFT
phase
Somatic copy number alterations (SCNAs) affected proteins and
phosphoproteins in either “cis” or “trans” modes, corresponding to the
diagonal and vertical patterns. To explore the impacts of SCNAs, we
performed whole exome-based SCNAs analyses and examined the regulatory
effects of 23,109 SCNAs on protein expressions of 120 samples in DC
progression (Supplementary Data [170]2c). To this end, we found the
gain of chromosome 8q (chr8q) was notably observed in the IFT phase but
not in the IEN phase (Supplementary Fig. [171]4a). To explore the
biological functions of the gain of chr8q, we performed cis effects of
the genes with CNA regions at the protein level, and found 29 genes at
chr8q gain perturbation profiles had significantly positive cis effects
on their cognate proteins (Fig. [172]3a, b, and Supplementary
Fig. [173]4b). Interestingly, these cis-effect genes at chr8q gain were
involved in MAPK signaling, which were also detected in other
gastrointestinal cancers^[174]43, such as esophageal adenocarcinoma
(ESCA), gastric cancer (GC), and CRC (Fig. [175]3c and Supplementary
Fig. [176]4c). To assess the effects of these amplified genes, we
annotated outliers for the degree of which short hairpin RNA
(RNAi)-mediated depletion reduced DC cell lines (AZ521 and HUTU80 cell
lines)^[177]44. As a result, we found the amplifications of PTK2,
PUF60, RAB2B, and LYN, had negative effects on the proliferation of DC
cells, in which LYN was recorded in cancer-associated genes
(CAGs)^[178]45,[179]46 (Fig. [180]3b; Supplementary Data [181]4b).
Fig. 3. The impacts of chr8q gain in the D subtype.
[182]Fig. 3
[183]Open in a new tab
a Volcano plot showing the cis- correlation of the SCNA (x-axis) and
the associated −log[10] (p-value) (y-axis) on the genes at chr8q gain
(two-sided Spearman’s correlation test). SCNA somatic copy number
alterations. b The cis SCNA-protein regulations of significantly
correlated genes (top) on their corresponding proteins expression
(bottom). The genes are related to MAPK signaling. The right represents
relative survival averaged across all available DC cell lines after
depletion of the indicated genes by RNAi (Dependency map-supported
([184]https://depmap.org) panels) and shows the correlation between the
genes and CAGs. CAGs cancer-associated genes. The square directs to a
subset of patient samples in the D subtype used for WES (n = 120). c
The impacts of the MAPK signaling associated SCNAs on their
corresponding protein expression in gastrointestinal cancers (two-sided
Spearman’s correlation test). d Venn diagram depicting the
amplification of LYN positive associated proteins (top), and the
represented signaling pathways (bottom). e Heatmap showing the
represented proteins in MAPK signaling at the protein (top) and
phosphoprotein (bottom) levels (two-sided Pearson’s correlation test).
f The represented kinases in MAPK signaling were overrepresented in the
LYN amplification group (two-sided Wilcoxon signed-ranked test,
BH-adjusted p < 0.05). The red frame shows the significant correlation
between LYN and the represented kinases (two-sided Pearson’s
correlation test). g Kinase-substrate regulation network showing the
substrates of LYN and MAPK1. The asterisk shows the overrepresented
expression of the substrates in the LYN amplification group (two-sided
Wilcoxon signed-ranked test, BH-adjusted p < 0.05). h Scatterplot
showing the relationship between LYN and cell proliferation at the
protein level (two-sided Pearson’s correlation test). i A brief summary
of the impacts of the chr8q gain in the IFT phase of the D subtype.
****p < 1.0E−4, ***p < 1.0E−3, **p < 1.0E−2, *p < 0.05, ns. >0.05.
Source data are provided as a Source data file.
LYN, a member of Src kinases, regulates the activation of ERK1/2
signaling^[185]47 and promotes cell proliferation and survival in many
diseases, including CRCs^[186]48 and chronic myelogenous
leukemia^[187]49. In our cohort, the LYN amplification had positive
impacts on the proteome-levels of MAPK signaling and cell proliferation
(Fig. [188]3d), evidenced by the adapters (e.g., CRK and CRKL), and
LYN-protein complex (e.g., CBL and PTPN6) (Fig. [189]3e), which
activated the MAPK signaling^[190]48. In addition, LYN was positively
associated with the adapters and LYN-protein complex at the protein
level. Furthermore, the phosphoprotein-levels of MAPK signaling (e.g.,
MAP2K2 S26 and MAPK1 Y187) were overrepresented in the IFT phase
(Fig. [191]3e). These findings indicated the impacts of LYN
amplification on MAPK signaling based on phosphorylation in DC
progression.
To further investigate how LYN amplification affected the activation of
MAPK signaling in DC, we performed kinase-substrate enrichment analysis
(KSEA) of phosphoproteome on the LYN amplification group. As a result,
MAPK signaling (e.g., MAPK1/3/7 and RPS6KA3) related kinases were
overrepresented in the LYN amplification group, in which MAPK1 showed a
positive association with LYN (Pearson’s R = 0.28, p = 3.2E−7)
(Fig. [192]3f and Supplementary Fig. [193]4d). Based on
kinase-substrate regulation network analysis, among of those associated
substrates, we found that MAPK Y187 was the only substrate of LYN
identified and supported by our data, and was overrepresented in the
LYN amplification group (Wilcoxon signed-ranked test, adjust.
p = 3.3E−3, LYN Amp vs. WT ratio = 2.52) (Fig. [194]3g and
Supplementary Fig. [195]4e). In addition, the enhanced phosphorylation
of MAPK1 (Y187) could activate the substrates of RPS6KA3 (S369, S375,
S715, and T365) and TOP2A (S1213) (Supplementary Fig. [196]4f), which
is one of the markers in cell proliferation^[197]50. These results
implied the LYN amplification had positive impacts on cell
proliferation in DC progression. In addition, LYN showed a positive
association with cell proliferation-related markers (e.g., MCM4 and
MCM6), and was significantly correlated with the cell proliferation
pathway (Pearson’s correlation, R = 0.35, p = 2.7E−10) (Fig. [198]3h
and Supplementary Fig. [199]4g). Together, the chr8q gain functioned in
the transmit process from the IEN phase to the IFT phase, in which the
amplification of LYN enhanced the protein-level of LYN, which activated
MAPK signaling and promoted the DC cell proliferation, hinting the
potential medical action of saracatinib^[200]51 in the DC clinic
(Fig. [201]3i).
The diverse characteristic origins and carcinogenesis tracks of the D and G
subtypes
To explore the association between pathological subtypes and proteomic
subtyping, we performed consensus clustering analysis on 438 samples of
DC, and identified two clusters (Supplementary Fig. [202]5a).
Specifically, cluster 1 (CCP1) mainly consisted of the non-surface
epithelial components, including G, NET, L, P, N, and C subtypes
(Supplementary Fig. [203]5b). The most of the samples in the surface
epithelial component (D subtype) were observed in the cluster 2 (CCP2).
These results indicated that the proteomic methods could be applied to
the distinction of the pathological layers.
To investigate the molecular characterizations of the two clusters
(CCP1 and CCP2), we integrated the differential expressed proteins
(DEPs) between CCP1 and CCP2 (Wilcoxon signed-rank test, adjust.
p < 0.05, CCP2 vs. CCP1 ratio ≥2 or ≤0.5), and found 1110 and 1992
proteins were overrepresented in the CCP1 and CCP2, respectively.
Specifically, the CCP1 proteins participated in ECM signaling (e.g.,
COL1A1 and THBS3) and complement cascade (e.g., CFD and C5). In CCP2,
the dominant pathways were cell cycle (e.g., CDK4 and RB1), apoptosis
(e.g., BAX and CASP3), and glycolysis (e.g., ENO1 and HK2)
(Supplementary Fig. [204]5c).
The two major subtypes (D and G subtypes) account for nearly 90% of
DC^[205]5, and were prominent in the CCP2 and CCP1, respectively.
However, the divergency of their origins remains yet unknown. In our
cohort, 60 patients (38.5%) had the D subtype only, 32 patients (20.5%)
had the G subtype only, 54 patients (34.6%) had both D and G subtypes,
and 10 patients (6.4%) had other subtypes. To explore the difference
between D and G subtypes, we divided 156 DC cases into 4 panels shown
in the Sankey diagram: Panel 1 (P1, patients with D subtype only),
Panel 2 (P2, patients with G subtype only), Panel 3 (P3, patients with
both the D and G subtypes), and Panel 4 (P4, patients with other
subtypes) (Fig. [206]4a). Notably, the P1 and P2 were prominent in the
CCP2 and CCP1, respectively (Fisher’s exact test, p < 2.2 E−16)
(Supplementary Fig. [207]5e). In addition, in the P3, the proportions
of CCP1 and CCP2 were comparable, and while the P4 was overrepresented
in the CCP1. The findings indicated that the molecular characterization
of subtype-based panels might be relevant to that of the proteomic
clusters. In addition, the proteomic cluster presented that the
featured protein patterns and dominant pathways were detected in the
subtypes and stages in DC.
Fig. 4. The diverse characteristic origins and the staging waves in the
progression of the D and G subtypes.
[208]Fig. 4
[209]Open in a new tab
a Sankey diagram analysis of all the 156 cases classified into 4
panels. b Venn diagram depicting the specific driven pathways in the
normal (left) and tumor (right) tissues of the P1 and P2. The green
highlights the normal tissue. The blue and orange indicate the D and G
subtypes, respectively. c The morphological feature (H&E staining)
(left) and represented pathways (right) of normal tissue (stage 1)
among the P1–P3. d PCA of the D and G subtypes samples. e Venn diagram
(left) showing the phase/stage associated mutations in the D and G
subtypes (two-sided Fisher’s exact test) and heatmap (right) displaying
the impacts of DST mutation on the protein-levels of DST and PRKCD
(two-sided Wilcoxon signed-rank test). The square directs to a subset
of patient samples in the D subtype used for WES (n = 120). f The
represented pathways in the DST Mut group at the protein level. g KSEA
of the kinase activity in the DST mutation group (n = 20) and WT group
(n = 48). KSEA kinase-substrate enrichment analysis. h Kinase-substrate
interaction network in mTOR signaling. i Scatterplot showing the
relationship between log[10] PRKCD and log[10] PRKDC at the protein
level (two-sided Pearson’s correlation test). j A brief summary of the
impacts of DST mutation in the DAC stage of the D subtype. DAC duodenal
adenocarcinoma. ****p < 1.0E−4, ***p < 1.0E−3, **p < 1.0E−2, *p < 0.05,
ns. >0.05. Source data are provided as a Source data file.
To further explore the molecular features of the subtype-based panels,
we analyzed the DEPs of the panels. As a result, the P1 (D subtype
panel) proteins were enriched in the cell cycle (e.g., CDK1/2/4/6 and
CDKN2A), apoptosis (e.g., CASP3/8 and PARP1/9), and glycolysis (e.g.,
HK2, PGK1, etc.), while the P2 (G subtype panel) proteins participated
in ECM signaling (e.g., COL1A1, LAMA1, etc.) and complement cascade
(e.g., CFH and C4A) (Fig. [210]4b). The P3, patients had both the D and
G subtypes, were characterized by the cell cycle, ECM signaling, etc.,
which were observed in the P1 and P2, respectively (Supplementary
Fig. [211]5f). The links of the findings in the proteomic clusters and
subtype-based panels indicated the correlations between the proteomic
clusters and the pathological stages/subtypes of DC. In addition, that
consistency suggested the advances and applications of the proteome in
stratifying patients and understanding cancer heterogeneity and
phenotypes. Based on the proteomic methods and classification in the
pathological layers, diverse subtypes/stages were collected, promoting
us to explore the origins and carcinogenesis of DC subtypes.
Interestingly, we found that the morphological features of the normal
tissues were diverse among the P1–P3 (Fig. [212]4c), implying the
different characterization of the normal tissues. We further analyzed
the different molecular characterization of the P1–P3. As a result, the
differences in the normal tissues were consistent with the
corresponding tumor tissues of the P1–P3 at the protein level. For
example, the highly expressed proteins of normal tissues in the P1 were
enriched in the cell cycle, apoptosis, and glycolysis. In the P2,
complement cascade and ECM signaling were dominant in the normal
tissues (Fig. [213]4c). In addition, the P3 consisted of patients with
both D and G subtypes, which shared the common normal tissues. Compared
to the P1 (D subtype panel), ECM signaling (e.g., COL1A1 and LAMA1) was
highlighted in the normal tissues of the P3 (Fig. [214]4c). While
compared to the P2, cell cycle (e.g., CDK1 and CDK2) was observed in
the normal tissues of the P3. Together, the different pathological
features and molecular characterizations suggested the diversity in the
D and G subtypes was associated with their origin in normal tissues.
The staging and driver pathway waves in the progression of the D and G
subtypes
In our cohort, stage-based classification during the D and G subtypes
progression provided a chance to portray molecular profiles of D and G
subtypes in a time-resolved mode at the multi-omics level. Principal
component analysis (PCA) presented a clear boundary between the D and G
subtypes, and among the diverse stages (Fig. [215]4d), further
authenticating the specificity of the two major subtypes, and
indicating dramatic alterations in the proteome patterns along with the
carcinogenesis progression in both the D and G subtypes.
To further depict staging molecular models that drove carcinogenesis
from early to progressive DC, we performed stage-based supervised
clustering analysis and described the proposed driving pathways and
characteristic molecules of different stages of the D and G subtypes.
As a result, the D subtype was divided into five sequential waves
(D1–D5): digestion pathway—AMPK signaling/ insulin resistance—cell
cycle/DNA repair—apoptosis/aminoacyl-tRNA biosynthesis—mTOR
signaling/EGFR signaling. In the D1 (normal tissue, NT stage), ACOX1
and APOA4 were generally overrepresented, indicating the remaining
primary physiological functions of the duodenum in the early
carcinogenesis stages. Enhanced proteome-levels of AMPK signaling
(e.g., PRKAA1 and PPP2R5E) and insulin resistance (e.g., CAB39L and
RAB13) were detected in the D2 (LGIN stage), and elevated cell cycle
(e.g., CDK1 and RB1) and DNA repair (e.g., RFC1 and MSH3) in the D3
(HGIN stage), suggesting a response to exogenous stresses. Apoptosis
(e.g., BID, CASP6, etc.) and aminoacyl-tRNA biosynthesis (e.g., CARS2
and AARS2) were predominant in the D4 (HGIN stage), suggesting cancer
malignancy. In the D5 (DAC stage), mTOR signaling (e.g., AKT2 and
RPTOR) and EGFR signaling (e.g., EGFR and GSK3B) were overrepresented
(Supplementary Fig. [216]5g), implying the progression of cancer and
tumor growth.
In the G subtype, we observed a different kinetic lineage with three
sequential waves (M1–M3) in carcinogenetic progression. In the M1
(normal tissue, NT stage), the digestion pathway (e.g., LDLR and ACOX1)
was dominant, which was similar to the D1 of the D subtype. ECM
signaling (e.g., COL4A1 and LAMA1) and focal adhesion (e.g., ITGB1 and
VWF) were prominent in the M2 (G1 and G2 stages). The M3 (G3 stage) was
featured with complement cascade (e.g., C3, CFB, etc.) and platelet
activation (e.g., ITPR1 and PLCB2) (Supplementary Fig. [217]5g),
indicating the immune response was a dominant pathway during the
progression of the G subtype.
Among the mutations in DC progression, we found that DST mutation,
detected in the IFT phase (Fisher’s exact test, p = 1.3E−3) and
overrepresented in the DAC stage (Fisher’s exact test, p = 1.4E−3),
enhanced the protein-level of DST (Wilcoxon signed-ranked test, adjust.
p = 0.019, DST Mut vs. WT ratio = 3.19), and had negative impacts on
the proliferation of DC cell lines (Fig. [218]4e). Generally, DST plays
a key role in calcium (Ca^2+) signal and increased Ca^2+ level^[219]52,
which is the activator in the transmit process from the phospholipid
and diacylglycerol (DAG) to protein kinase C (PKC)^[220]53. In our
study, we found that PRKCD, one of PKC^[221]53, was overrepresented in
the DST mutation group (Wilcoxon signed-ranked test, adjust. p = 0.023,
DST Mut vs. WT ratio = 1.96), which exhibited a positive association
with DST at the protein level (Pearson’s R = 0.60, p = 2.0E−6)
(Fig. [222]4e and Supplementary Fig. [223]5h).
To explore the impacts of DST mutation on DC progression, we performed
GO/KEGG enrichment analysis on the overrepresented proteins in the DST
mutation group, and found those proteins participated in mTOR signaling
(e.g., MTOR, RPS6KA1/3) (Fig. [224]4f and Supplementary Fig. [225]5i).
At the phosphoprotein level, 1608 phosphosites mapped to 885
phosphoproteins showed greater changes than corresponding protein
abundance (Wilcoxon signed-ranked test, adjust. p < 0.05, DST Mut vs.
WT ratio ≥2), and were significantly enriched in mTOR signaling (e.g.,
MTOR S1262 and DEPTOR S265) (Supplementary Fig. [226]5j, k). These
findings indicated that DST mutation had positive impacts of mTOR
signaling based on phosphorylation.
To investigate how DST mutation elevated phosphorylation and enhanced
mTOR signaling, we performed KSEA of the phosphoproteome in the DST
mutation group and WT group. As a result, the dominant kinases were
found to be activated in the DST mutation group, including PDK1, PRKG1,
MAP2K7, PRKCI, PRKDC, and ABL1, in which the protein-levels of PDK1 and
PRKDC were overrepresented in the DST mutation group (Wilcoxon
signed-ranked test, adjust. p < 0.05) (Fig. [227]4g and Supplementary
Fig. [228]5l).
PDK1, a kinase upstream of mTOR, binds to inositol triphosphate (IP3)
and activated PKB/AKT/mTOR signaling^[229]54. The kinase-substrate
regulation network analysis revealed that the elevated substrates of
PDK1 in our cohort, were involved in mTOR signaling, such as AKT1 S124,
PRS6KA3 S415, MTOR S1261, etc. (Fig. [230]4h). These findings further
implied the enhancement of mTOR signaling in the DST mutation group.
PRKDC, one of the catalytic subunits of DNA-dependent protein kinase,
of which the Sh2 domain can bind to the ser/thr sites and thus
activates ser/thr kinases^[231]55,[232]56. In our study, PRKDC showed
significant association with PRKCD (Pearson’s R = 0.70, p = 6.1E−12),
and activated the substrate AKT1 (S124) (Fig. [233]4h, i). The
integrated findings of the kinase-substrate analysis (PDK1 and PRKDC)
suggested the overrepresentation of the mTOR signaling in the DST
mutation group. Taken together, these results indicated that the
overrepresented DST protein in the DST mutation group, enhanced the
protein levels of the kinases (PDK1 and PRKDC), and thus activated mTOR
signaling in the DAC stage (Fig. [234]4j), providing a reference
database for the corresponding personalized medicine of DAC.
Characterization of immune-based clusters in the DC subtypes
To evaluate the immune infiltration of the DC subtypes, we applied
xCell ([235]https://xcell.ucsf.edu)^[236]57 to the 438 samples of the
DC cohort, and deconvoluted immune, stromal, and microenvironmental
cell signatures of DC. Consensus clustering identified four
immune-based clusters as Hot tumor cluster, Cold tumor cluster,
Epithelial cluster, and B cells cluster (Fig. [237]5a and Supplementary
Fig. [238]6a; Supplementary Data [239]5a). Comparing with proteomic
panels, we observed the P1 (D subtype panel) and P2 (G subtype panel)
(Fisher’s exact test, p = 3.5E−6) were prominent in the Hot
tumor/Epithelial clusters and Cold tumor/B cells immune desert
clusters, respectively (Supplementary Fig. [240]6b). Observation of the
immune infiltration from xCell showed that higher stromal scores
(Kruskal–Wallis test, adjust. p = 9.6E−5) in the non-epithelial
components of DC, including G, N, and C subtypes (Supplementary
Fig. [241]6c, d). During the carcinogenesis, stromal score was notably
descended in the D subtype progression (Kruskal–Wallis test, adjust.
p = 2.0E−5), which was not observed in the G subtype progression
(Supplementary Fig. [242]6e). Of note, P1 was predominant in the Hot
tumor cluster, and the P2 was overrepresented in the Cold tumor cluster
and the B cells cluster. Consistently, pathway enrichment analysis
revealed that the D subtype prominent pathways were dominant in the Hot
tumor cluster, such as cell cycle, apoptosis, DNA repair, and so on
(Fig. [243]5c, d). Moreover, the G subtype dominant pathways, such as
focal adhesion, cell-cell adhesion, were overrepresented in the Cold
tumor cluster. These results further illustrated the diverse molecular
characterization of the immune infiltration of DC was correlated with
the subtypes.
Fig. 5. Characterization of immune-based clusters in DC subtypes.
[244]Fig. 5
[245]Open in a new tab
a Heatmap illustrating cell type compositions and activities of the
functional genes/proteins across the four immune clusters
(Kruskal–Wallis test, BH-adjusted p < 0.05). The square directs to a
subset of patient samples used for proteomic profiling (n = 438). b
Boxplot showing the HLA-I score (left) and HLA-II score (right) in the
four immune clusters (two-sided Wilcoxon signed-ranked test). n (Hot
immune cluster) = 105, n (Cold tumor) = 75, n (Epithelial
cluster) = 129, n (B cells cluster) = 129 biologically independent
samples examined. Boxplot shows median (central line), upper and lower
quartiles (box limits), 1.5× interquartile range (whiskers). c
Proteome-based deconvolution of the immune-related cell
signatures/proteins and dominant pathways in the four immune clusters
(Kruskal–Wallis test, BH-adjusted p < 0.05). The GSVA score based on
the global proteomic data for biological pathways overrepresented in
different immune clusters. The square directs to a subset of patient
samples used for proteomic profiling (n = 438), as well as in Fig. 5a.
GSVA gene set variation analysis. d Density contours of immune and
microenvironment scores of four immune clusters. e Heatmap showing the
correlation between AARS1 and TMB, immune infiltration, and HLA-I
molecules (two-sided Pearson’s correlation test). The square directs to
a subset of patient samples used for WES (n = 120). TMB tumor mutation
burden. f The (Pearson’s) correlation between AARS1 and the predominant
pathways in the Hot tumor cluster (two-sided Pearson’s correlation
test). g The correlation between AARS1 and makers of the predominant
pathways in the Hot tumor cluster (two-sided Pearson’s correlation
test). h A brief summary of the impacts of AARS1 in the DC.
****p < 1.0E−4, ***p < 1.0E−3, **p < 1.0E−2, *p < 0.05, ns. >0.05.
Source data are provided as a Source data file.
In addition, the Cold tumor cluster, was characterized by the presence
of dendritic cells (DCs), fibroblasts, and endothelial cells
(Fig. [246]5c, d). CD248, tumor endothelial marker 1^[247]58 and a
modulator of fibrosis which featured by ECM deposition^[248]59, was
highly expressed in the Cold tumor cluster (Fig. [249]5c), implying the
potential functions of anti-CD248 in the Cold tumor cluster.
Comparably, the Hot tumor cluster with the highest immune score and
microenvironment score, exhibited positive association with T cell
activation, evidenced by the highest degree of CD4^+ T cells, CD4^+
Tcm, CD8^+ T cells, and CD8^+ Tcm (Fig. [250]5c). Besides, the highest
HLA-I score and HLA-II score were detected in the Hot tumor cluster,
evidenced by the expressions of the corresponding MHC molecules (e.g.,
HLA-B/C/E) (Fig. [251]5b, c). CD276 (also known as B7-H3), a member of
B7 family^[252]60, is broadly overexpressed by multiple tumor types on
both cancer cells and tumor-infiltrating blood vessels^[253]61, and was
overrepresented in the Hot tumor cluster in the DC cohort. Recently,
CD276 is identified as a compelling target and CAR-T targeting CD276
mediates significant anti-tumor activity in solid tumors^[254]62. The
hyper-activation of T cell infiltration and overrepresentation of CD276
implicated the potential clinical strategies of CD276 and other CAR-T
applications for the Hot tumor cluster. Taken together, we disclosed
the different immune characterization in two major and five rare DC
subtypes, and revealed the unique profiles of four immune clusters,
providing the potential managements of CD248 in the Cold tumor cluster
and CD276/CAR-T in the Hot tumor cluster.
Likely as the immune infiltration, the highest TMB was observed in the
Hot tumor cluster (Kruskal-Wallis test, p = 0.037) (Supplementary
Fig. [255]6f). Findings in many cancer types (e.g., bladder cancer and
breast cancer) suggest the TMB may predict clinical response to immune
checkpoint inhibitors^[256]63. In our study, the TMB exhibited positive
association with the DNA repair signature (Pearson’s R = 0.44,
p = 1.7E−4), indicating MSI in the Hot tumor cluster (Supplementary
Fig. [257]6g). The referred significant proteins in the DNA repair
panel, especially AARS1 and TARS2, were significantly and highly
expressed in the Hot tumor cluster, and were overrepresented in the
tumor tissue both in the D and G subtypes, implying their potential
value in the clinic strategy (Supplementary Fig. [258]6h). Furthermore,
AARS1 was growingly enhanced in the D subtype progression
(Kruskal–Wallis test, adjust. p = 2.6E−5) (Supplementary Fig. [259]6i),
and showed significantly positive correlation with TMB (Pearson’s
R = 0.45, p = 5.4E−5) (Supplementary Fig. [260]6j), indicating the
involvement of AARS1 in the immune infiltration of DC. To further
investigate the functions of AARS1, we then performed correlation
analysis between AARS1 and immune infiltration. As well as immune
score, stroma score, and microenvironment score, we found that AARS1
represented positive correlation with HLA-I score and HLA-II score,
which was correlated with TMB (Fig. [261]5e and Supplementary
Fig. [262]6k, l). In addition, AARS1 showed significantly positive
association with the dominant pathways of the Hot tumor cluster,
including apoptosis, base excision repair, mismatched repair, VEGF
signaling, ERBB signaling, etc. (Fig. [263]5f, g). These findings
reflected that AARS1 was responsible for cancer-associated signaling
pathways (Fig. [264]5h), implying the necessary of exploiting the
utility of AARS1 in DC therapy.
AARS1 promotes cancer cell proliferation and invasion through non-canonical
function
DNA damage was proved to be the driving factor in DC progression, in
which AARS1 displayed a key role, and could be perspective for DC
(Fig. [265]5h). In addition, we found the expression level of AARS1
increased significantly in tumor tissues, and growingly enhanced during
DC progression (Fig. [266]6a).
Fig. 6. AARS1 promotes cancer cell proliferation through non-canonical
function.
[267]Fig. 6
[268]Open in a new tab
a IHC staining of the expression level of AARS1 from normal to advanced
stage tissue. b Overrepresentation of AARS1 and PARP1 in tumor tissue
is detected by western blot analysis (n = 12 repeats per group)
(two-sided Student’s t-test, mean ± SD). N = tumor-adjacent tissue,
T = tumor tissue. The left panel shows western blot analysis of AARS1,
TARS1, SARS1, PARP1, PARylation, actin, and lysine alanylation of PARP1
in tumor tissues and tumor-adjacent normal tissues. The bottom panel
shows the quantified western blot results of 12 pairs of samples. The
results of the other 6 pairs of samples are provided in Extended Data
Fig. [269]7a. c The impacts of AARS1 overexpression (OE) on Hutu80
cells and WDC-1 cells’ proliferation (n = 5 repeats per group)
(two-sided Student’s t-test, mean ± SD): vector (control), OE-AARS1,
and OE-AARS1-alaninol. The value of OD450 and days are shown on the y-
and x-axes, respectively. d The impacts of AARS1 knock-down (KO) on
Hutu80 cells and WDC-1 cells’ proliferation (n = 5 repeats per group)
(two-sided Student’s t-test, mean ± SD): scrambled siRNA and AARS1 KO.
e The effects of AARS1 OE (left)/KO (right) Hutu80 cells and WDC-1
cells invasion (n = 5 repeats per group) (two-sided Student’s t-test,
mean ± SD). f Gene Ontology enrichment analysis of AARS1-interacting
proteins in biological processes. These proteins are identified by a
proteomics survey using tandem affinity purification. g
Co-immunoprecipitation assay showing that exogenous AARS1 and exogenous
PARP1 interacted in cultured Hutu80 cells and WDC-1 cells. h The
impacts of AARS1 on cellular PARylation level. i Activities of PARP1
immunoprecipitated from cells transfected with different aminoacyl-tRNA
synthetases (n = 5 repeats per group) (two-sided Student’s t-test,
mean ± SD). ****p < 1.0E−4, ***p < 1.0E−3, **p < 1.0E−2, *p < 0.05, ns.
>0.05. Source data are provided as a Source data file.
Using western blot and immunohistochemistry (IHC), we confirmed that
AARS1 protein levels were enhanced in tumor tissues during DC
carcinogenesis (t-test, p = 6.7E−8) (Fig. [270]6b and Supplementary
Fig. [271]7a; Supplementary Data [272]6a). As controls, the overall
expression levels of TARS1 and SARS1 were not altered (Fig. [273]6b and
Supplementary Fig. [274]7a). Overexpressing AARS1 promoted, while
knocking down AARS1 inhibited cell proliferation (Fig. [275]6c, d;
Supplementary Data [276]6b, c) and cell invasion capacities
(Fig. [277]6e; Supplementary Data [278]6d) in both Hutu80 cells and
WDC-1 cells. However, overexpression of other aminoacyl-tRNA
synthetases, including SARS1 and TARS1, did not affect cell
proliferation and invasion (Supplementary Fig. [279]7b, c;
Supplementary Data [280]6e, f). These results suggested that the
oncogenic role of AARS1 was not due to protein translation. To
investigate the potential non-translational role of AARS1, we used
tandem affinity purification (TAP) to identify AARS1-interacting
proteins in Hutu80 cells. As a result, a total of 652 differential
proteins were detected in cells, which were highly expressed in the D
subtype (p < 0.05) (Supplementary Fig. [281]7d). Consistently, GO/KEGG
enrichment analysis revealed that those 652 proteins participated in
DNA damage repair-related pathways, which were the dominant pathways in
the D subtype (Fig. [282]6f and Supplementary Fig. [283]7e), further
indicating the roles of AARS1 in DC progression.
Among the 652 proteins, PARP1, one of poly-ADP-ribosyltransferases that
mediates poly-ADP-ribosylation of proteins and plays a key role in DNA
repair^[284]64, was the prior preferential target that was interacted
with AARS1 (Supplementary Fig. [285]7f, g). Accordingly, the
interaction between AARS1 and PARP1 was confirmed by the results of
co-immunoprecipitation assays using either exogenous AARS1 and PARP1 in
cultured Hutu80 cells and WDC-1 cells (Fig. [286]6g) or endogenous
AARS1 and PARP1 in DC tissues (Supplementary Fig. [287]7h). Besides, we
validated that AARS1 was localized in both the cytosol and nucleus,
supporting our observation that AARS1 interacted with PARP1
(Supplementary Fig. [288]7i). Furthermore, overexpression of AARS1 led
to notably decreased cellular poly ADP ribosylation (PARylation)
(Fig. [289]6h), and decreased PARP1 specific activity, which was
measured using a PARP1 enzymatic activity assay (Fig. [290]6i;
Supplementary Data [291]6g). In addition, the overexpression of other
ARS families (e.g., TARS1 and SARS1) would not decrease PARP1 activity
(Fig. [292]6i). Therefore, among of ARS families, PARP1 interacted with
AARS1, reducing the reduction of PARP1 activity. Moreover, we performed
PARP1 activity colorimetric assay in vitro, by adding purified AARS1
into the reaction mix. As a result, we found that overexpression of
AARS1 significantly decreased the activity of PARP1 (t-test,
p = 2.6E−3) (Supplementary Fig. [293]7j; Supplementary Data [294]6h).
These results indicated that increased AARS1 correlated with decreased
PARP1 activity in cells.
AARS1 inhibits PARP1 and cell apoptosis and blocking K-Ala activated PARP1
and suppressed tumor growth
It has been well established that PARP1 plays an important role in the
cell response to DNA damage^[295]65 and the inductive effect on cell
apoptosis^[296]66. We confirmed that cell apoptosis decreased
significantly in AARS1-overexpressing cells (Fig. [297]7a;
Supplementary Data [298]6i). The genotoxic reagent camptothecins (CPT)
was able to induce DNA damage and increase cell apoptosis^[299]67.
Overexpression of AARS1 in CPT-treated cells led to a reduced cell
apoptosis rate (Fig. [300]7a; Supplementary Data [301]6i), although the
DNA damage level increased, as evidenced by the levels of γ-H2AX
detected using immunofluorescence staining (Fig. [302]7b) and western
blot and based on DNA damage detected using a comet assay
(Supplementary Fig. [303]8a, b; Supplementary Data [304]6j). Combined
with the results that AARS1 expression correlated with decreased PARP1
activity, these results indicated that AARS1 overexpression may
contribute to promoting cell proliferation and invasion through
reducing cell apoptosis.
Fig. 7. Lysine alanylation of PARP1 decreases cell apoptosis by inhibiting
PARP1.
[305]Fig. 7
[306]Open in a new tab
a The impacts of AARS overexpressing (OE) in Hutu80 cells and WDC-1
cells apoptosis (n = 5 repeats per group) (two-sided Student’s t-test,
mean ± SD): OE-AARS1, OE-AARS1-alaninol, CPT, OE-AARS1-CPT, and
OE-AARS1-CPT-alaninol. CPT camptothecins. b Immunofluorescence staining
of γH2AX with various treatments: OE-AARS1 and OE-AARS1-CPT. The scale
bar indicates 20 µm. c The effects of AARS1 and PARP1 on K-Ala levels.
d The capacity of SIRT6 to dealanylate the synthetic PARP1 K621-Ala
peptide. e K-Ala levels in cells with different treatments. f DNA
binding affinities of PARP1 and PARP1 mutants (n = 5 repeats per group,
two-sided Student’s t-test, mean ± SD). K-Ala: lysine-alanylation. g
PARP1 activities in PARP1 WT and PARP1 mutants. h The impacts of
alaninol on cellular K-Ala and PARylation levels. i Tumor sizes of cell
xenografts with different treatments in normal nude mice or
alaninol-fed nude mice. n = 6 mice per group. ****p < 1.0E−4,
***p < 1.0E−3, **p < 1.0E−2, * p < 0.05, ns. >0.05. Source data are
provided as a Source data file.
We surveyed how AARS1 inhibited PARP1. In addition to mediating protein
translation, AARS1 can catalyze the alanyl-modification in protein
lysine residues and transferred its signal by altering the functions of
modified proteins^[307]68. Elevated AARS1 expression or alanine levels
led to dose-dependent increases in lysine-alanylation (K-Ala) of total
protein (Supplementary Fig. [308]8c) and K-Ala of PARP1 (Supplementary
Fig. [309]8d), as evaluated through western blotting performed by a
custom anti-K-Ala antibody (Supplementary Fig. [310]8e). Knocking down
AARS1 abrogated the effect of alanine in forming K-Ala on PARP1
(Fig. [311]7c). In tumor and adjacent normal tissues, we confirmed that
AARS1-overexpressing tumors exhibited K-Ala modification on PARP1
(Fig. [312]6b).
We screened for K-Ala modification sites in PARP1 protein, employing
liquid chromatography resolution followed by tandem mass spectrometry
analysis. Four lysine sites were identified as being alanylated in
PARP1 (Supplementary Fig. [313]8f, g), supporting the possibility that
K-Ala affects PARP1 activity. Using an in vitro assay, we validated
that AARS1 catalyzed the K-Ala modification in synthetic PARP1 K621
peptides (Supplementary Fig. [314]8h). However, an aminoacyl-AMP
formation-defective AARS1^R751G mutant failed to form K-Ala in
synthetic peptides (Supplementary Fig. [315]8h). To identify the
possible enzyme responsible for K-Ala removal, we screened the
K-Ala-removing capacities of nuclear localized NAD^+-dependent
sirtuins, including SIRT1, 2, 6, and 7, in cultured Hutu80 cells, as
these deacetylases might have deaminoacylase capacities. We found that
only overexpression of SIRT6, but not other sirtuins, significantly
increased the cellular PARylation in cells, suggesting that SIRT6 could
potentially remove the K-Ala on PARP1 (Supplementary Fig. [316]8i). In
vitro, we validated that SIRT6 could catalyze the de-alanylation of
synthetic K-Ala PARP1 peptides (Fig. [317]7d). Moreover, we confirmed
that although overexpression of AARS1 led to increased K-Ala levels of
PARP1, exogenous expression of SIRT6 in cells could decrease the K-Ala
levels of PARP1 (Fig. [318]7e). These results demonstrated that PARP1
is subjected to AARS1-mediated K-Ala modification.
To further investigate the function of K-Ala modification on PARP1, we
constructed an alanylation memetic lysine (23, 414, 418,
621)-to-alanine switch mutant PARP1 (KA mutant) and a non-alanylation
memetic lysine (23, 414, 418, 621)-to-arginine switch mutant PARP1 (KR
mutant). We found that the PARP1 KA mutant exhibited decreased binding
affinity to DNA, while the PARP1 KR mutant had similar binding affinity
to DNA compared to the wild-type PARP1 (Fig. [319]7f; Supplementary
Data [320]6k). Through an in vitro assay, we observed that the KA
mutant had decreased PARylation activity, while the KR mutant did not
influence the PARylation activity compared to wild-type PARP1
(Fig. [321]7g). To inhibit the AARS1-induced oncogenic effect on cells,
we identified the alanine structural analog alaninol as an AARS1
inhibitor. Alaninol inhibited cellular protein K-Ala and increased the
PARylation level (Fig. [322]7h). When cultured media were supplemented
with alaninol, the effects of AARS1 overexpression, including decreased
cell apoptosis, enhanced cell proliferation and invasion were reversed.
In addition, we noted that increased AARS1 expression promoted the
xenograft growth of tumor cells, especially in HU-treated cell
xenografts in nude mice, whereas inhibition of K-Ala with alaninol
delayed the xenograft growth of tumor cells (Fig. [323]7i).
Discussion
Duodenum is a relatively privileged organ that rarely develops
malignant tumors. The rarity and complicated subtypes of DC cause
unique diagnostic and therapeutic challenges. The landscape on the
molecular level, such as the genome, proteome, and phosphoproteome,
could serve as the basis for understanding the mechanism of the
carcinogenesis and as a resource for seeking potential diagnostic and
therapeutic targets. Generally, the (sub)stage of lesions during the
carcinogenesis of DC have complicated histopathological and
morphological characteristics, including the benign polyp (NT stage),
adenoma (e.g., LGIN and HGIN stage) to adenocarcinoma (e.g., MIA and
DAC stage)^[324]10. Considering the different operation strategies to
meet the risks of the patients at the different (sub)stages in the
clinic, the revelation of the dominant mutations and key events may be
conductive to exploring the molecular mechanism during DC
carcinogenesis and providing precise medication in the clinic of DC.
However, the pathological mechanism of the (sub)stage of the lesions in
DC progression are yet unclear. In this study, we employed 438 samples
from 156 DC cases covering 2 major and 5 rare subtypes, in which the D
and G subtypes (2 major subtypes) were further classified into 17 and 6
substages, respectively, providing a chance for exploring the origins
and proteogenomic characterization of DC subtypes and (sub)stages, and
tracking the key events during DC carcinogenesis in a time resolved
mode.
To achieve deep proteome coverage of such a tiny FFPE tissue volume,
pressure cycling technique (PCT) would be an available method and
effectively minimize sample loss^[325]69, while the lowest operable
weight (about 300 µg) is still higher than that in our study (under
100 µg). On the basis of the method of tris-2-carboxyethylphosphine
(TCEP) extraction protein, with high internal standard recovery
(>90%)^[326]70, we have achieved deep proteome and phosphoproteome
coverage of the trace samples in early-stage gastrointestinal
cancer^[327]71, and thus the method of TCEP extraction protein was
applied to explore the molecular characterization in DC progression.
Irrespective to that DC followed the same transition pattern of CRC,
from adenoma to adenocarcinoma^[328]10, the increasing incidence and
complex subtypes of DC impose needs for specific strategy urgently,
instead of approaching the similar manner of CRC in the clinic.
Relatively, the D and NET are malignant subtypes^[329]72,[330]73, and
comparably, the G subtype is benign^[331]74. Observation of the
globally akin among the D/NET/L subtypes, and among the G/N/C subtypes
(Supplementary Fig. [332]5d), indicated that the D/NET/L/P subtypes
were relatively malignant, and the G/N/C subtypes were comparably
benign in DC, providing a guidance in clinical intervention. Banck et
al. found the high activity of mTOR signaling in NET^[333]9. The
WES-based analyses disclosed that the DAC stage mutation of DST,
displayed positive impacts on the proteome-levels of DST and PKC, and
thus activated the mTOR signaling. Thus, the integration indicated that
the anti-mTOR therapy (e.g., everolimus and rapamycin)^[334]75,[335]76
might be medicative for the malignant subtypes, such as the D/NET/L
subtypes.
Several published studies pointed that smoking is a known risk factor
in a number of gastrointestinal (GI) diseases, including duodenal
ulcer^[336]77 and GI cancer (e.g., ESCA, CRC, and GC)^[337]78. In
addition, tobacco usage has important implications regarding gut
environment and various small intestine disorders, including functional
dyspepsia^[338]6, irritable bowel syndrome^[339]7 (Gastroenterology,
2017, PMID: 27725146), coeliac disease^[340]79, and duodenal ulcer
disease^[341]8. However, there is no literature directly suggest
tobacco usage increased the risk of duodenal adenocarcinoma. In our
cohort, we found more patients (60% vs 18.8%), especially the D subtype
patients, with smoking habits were observed in the Tobacco signature
group. Integration with the findings in our cohort and the impacts of
smoking on GI (e.g., ESCA, CRC, and GC), indicated that smoking or
tobacco usage might be a risk for DC, especially for the D subtype.
Furthermore, CYPs were overrepresented in the Tobacco signature group,
suggesting the potential mediation of CYPs for the patients with
smoking habit.
Familial adenomatous polyposis (FAP) and hereditary nonpolyposis
colorectal cancer (HNPCC), two rare inherited disorders^[342]80, are
infrequent in duodenum. In our cohort, the samples of FAP and HNPCC
were also collected and mainly included in the P3, patients had both D
and G subtypes. That implied that the FAP and HNPCC had the similar
traits of P3, which was characterized by cell cycle, ECM signaling,
etc. Whereas the limited samples of the FAP and HNPCC exhibited a
barrier for presenting comprehensive characterization, more samples are
needed for further analysis in the future.
The immune based investigation delineated the immune characteristics of
DC, and disclosed the potential actions of anti-CD248 in the Cold tumor
cluster, and anti-CD276 or CAR-T/CD276 in the Hot tumor cluster,
providing a clinic direction for DC. ARS families have been implicated
in the etiology of various disorders^[343]41, and can catalyze the
posttranslational addition of amino acids to lysine residues in
proteins^[344]68. Interestingly, AARS1, one of ARS families and
functioned in the DNA repair panel, showed positive correlation with
the TMB and the immune infiltration. In addition, AARS1 was positively
associated with the Hot tumor cluster dominant pathways, such as DNA
repair, cell cycle, apoptosis, etc. PARP1 inhibitory is observed in
homologous recombination cancer^[345]81,[346]82. In this study, we
built a connection between AARS1 and PARP1 in DC development, and
discovered that overexpressed AARS1 catalyzed K-Ala modification on
PARP1, resulting in an inhibitory effect on cell apoptosis. Thus, we
speculated AARS1, growingly increased during the tumor progression of
DC and along with enhanced DNA damage, was a potential target in the
clinic strategy of DC.
However, limits are still represented in our study. The trace amounts
of samples restricted the coverage at multi-omics level, which is also
a challenge for transcriptomic analysis. The possible bias was the
limits for proteome, and showed association with TF/plasma membrane-
related proteins, which might be the challenges for understanding the
characteristics of DC. The advances of MS with more in-depth coverage
of identifies in proteome needed to dissect the bias in proteome,
providing a more comprehensive proteomic landscape. In addition, to
explore the correlation between tobacco usage and DC, more evidences
are needed from other cohorts or epidemiologic studies with more DC
patients for further analysis.
In summary, our study presented a comprehensive multi-omics landscape
of DC. We discovered the characteristic origins and diverse
carcinogenesis tracks, and illustrated the kinetic waves of the
dominant cancer pathways of the D and G subtypes via integrative
proteogenomic analysis. Proteogenomics elucidated the positive impacts
of LYN amplification at the chr8q gain on MAPK signaling in the IFT
phase, and revealed the functions of DST mutation on mTOR signaling in
the DAC stage (Supplementary Fig. [347]9). We illuminated the
drug-targetable AARS1 in the TMB/immune infiltration in DC, which was
demonstrated the value of this multi-omics mapping strategy to suppress
tumor growth. We believe this study provides insights into
understanding DC and enables advances in understanding its mechanism
and diagnostics, delivering a useful resource for potential therapeutic
approaches and personalized medicine for DC.
Methods
Patient samples of early DC cohort
Construction of the DC cohort
Two hundred early-stage DC (EDC) patients underwent ESD therapy from
2012 to 2019 at Zhongshan hospital, Fudan University, and received no
prior anti-cancer treatments. Among of the EDC patients, 23 were
precluded duo to the unavailability of their normal tissue samples, and
41 patients failed to pass the pathological quality check. Thus, 136
EDC patients were eligible for the establishment of the intended study
cohort. In addition, 20 advanced-stage DC patients, without prior
anticancer treatments, were randomly enrolled for the first visit from
2012 to 2019 at Zhongshan hospital, Fudan University. Therefore, a
total of 156 DC patients were selected to construct DC cohort in this
study. The present study complied with the ethical standards of
Helsinki Declaration II and approved by the Research Ethics Committee
of Fudan University Zhongshan Hospital (B2019-200R). Written informed
consent was provided by all patients before any study-specific
investigation. Clinical information of individual patient is summarized
in Supplementary Data [348]1 (Table [349]1).
All the samples, collected in our cohort, were up to the following
criteria. Firstly, all the samples were well preserved and
systematically evaluated to confirm the histopathologic diagnosis and
any variant histology according to the WHO classification^[350]11 by
more than two expert gastrointestinal pathologists, who determined the
acceptable tissues segment based on the tumor content (>95%), the
presence and extent of tumor necrosis (<5%), and signs of invasion into
the muscularis propria. Secondly, according to the clinical sample
collection procedures in CPTAC and other published studies^[351]83, the
following criteria were used, (1) successful extraction in DNA and
protein; (2) no tumor cells in the normal tissue.
All cases were classified according to the 4th and 5th edition WHO
Classification of Digestive system. All substages and subtypes in the
cohort were assessed by more than two experienced gastrointestinal
pathologists on the basis of hematoxylin and eosin (H&E)-stained
sections, who would mark the tumor purity according to the proportion
of tumor cells. In our cohort, the tissues were classified into surface
epithelial (D subtype) and non-surface epithelial components
(G/NET/L/P/N/C subtypes). The D subtype was then classified into 5
stages, including NT stage, LGIN stage, HGIN stage; MIA stage, and DAC
stage. The NT stage included normal duodenal epithelial, mild
hyperplasia (1_1), heterotopic gastric fundus gland (1_2), gastric
antrum gland (1_3), duodenal epithelial hyperplasia (1_4), P-J polyp
(1_5), hyperplastic polyp (1_6). The LGIN stage included tubular
adenoma with mild atypia (2_1), tubular adenoma with mild-middle atypia
(2_2), tubulovillous adenoma with mild atypia (2_3), and tubulovillous
adenoma with mild-middle atypia (2_4). The HGIN stage included tubular
adenoma with middle-severe atypia (3_1), tubular adenoma with severe
atypia (3_2), tubulovillous adenoma with middle-severe atypia (3_3),
and tubulovillous adenoma with severe atypia (3_4) (Table [352]2). The
non-surface epithelial components were divided into G subtype (n = 97),
P subtype, N subtype), L subtype, C subtype, and NET subtype. The G
subtype, one subtype of non-surface epithelial components, comprised
normal (N–G), hyperplasia (H–G), and adenoma (A–G) (Table [353]3).
Finally, a total of 438 samples were collected and prepared for the
early and progressive duodenal cancer cohort.
Considering that sometimes with only 1 or 2 cases for each substage,
such as mild hyperplasia and hyperplastic polyp, we integrated the
substages into stages, and used the stage-based classification for the
proteogenomic analysis in this study (e.g., Figs. [354]2–[355]4),
including the NT, LGIN, HGIN, MIA, DAC stages in the D subtype, and the
N–G, H–G, and A–G stages in the G subtype.
FFPE sampling and processing
All the FFPE specimens were prepared and provided by Zhongshan
Hospital, Fudan University. For clinical sample preparation, slides
(10 μm thick) from FFPE blocks were macro-dissected, deparaffinized
with xylene and washed with ethanol. One 3 μm-thick slides from FFPE
blocks were sectioned for H&E staining. All the selected specimens were
scraped according to the substages, and were stored at −80 °C until
further processing.
Laser-capture microdissection (LCM)
In our cohort, we applied LCM^[356]25 to dissect the sections of
substage/subtype samples precisely. All FFPE specimens were
deparaffinized with xylene and rehydrated through graded alcohols and
water. The H&E Sections were stained with Mayer’s haematoxylin (Sigma)
and dehydrated through graded alcohols and xylene. Before
microdissection, FFPE specimens were sectioned with a microtome (10 μm
thick) and mounted from FFPE blocks were micro-dissected with a Leica
LMD 6500 laser microdissection system. According to the WHO
classification, all samples were systematically evaluated to confirm
the histopathologic diagnosis and variant histology by more than two
expert gastrointestinal pathologists, and then an area of
1–5 × 10^6 μm^2 was collected (derived from dissected area × slide
thickness/average mammalian cell volume of 2000 μm^3, BNID 100,434)
(Supplementary Fig. [357]1b). To avoid cross-tissues contamination
among the substages or subtypes, we collected the tissues at the core
area in each substage. The substage/subtype samples were collected in
1.5-ml tubes and kept in storage at −80 °C until further processing.
The methods are also applied in other published proteogenomic studies,
such as in ovarian cancer^[358]84 and glioblastoma^[359]85.
Whole-exome sequencing (WES)
WES were all performed by Novogene Co., Ltd. DNA from FFPE tumor tissue
samples were collected and used for WES, and matched germline DNA was
obtained from non-tumor tissue samples. One hundred and 20 samples from
47 cases were analyzed by WES including the D/G/P/N/C subtypes.
Paired-end sequencing (PE150) was performed on an Illumina HiSeq with a
135× target depth (mean) and 15.5 G volume (mean) of 120 raw data
(Supplementary Data [360]2a). The resulting sequence libraries (the
paired-end sequence and insert DNA between two ends) were quantified
with a Qubit 2.0 (Thermo Fisher) and insert size was determined using
an Agilent 2100 Bioanalyzer. Base calling was used to obtain the raw
data (sequenced reads) from the primary image data.
DNA extraction and DNA qualification
One hundred and twenty samples from 47 DC patients were conducted for
WES in our cohort. All the samples were firstly dewaxing with
dimethylbenzene, and then DNA degradation and contamination were
monitored on 1% agarose gels. DNA concentration was measured byQubit®
DNA Assay in Qubit® 2.0 Flurometer (Invitrogen, USA). A total amount of
at least 0.6 μg genomic DNA per sample was used as input for DNA sample
preparation.
Library preparation
A total amount of 0.6 μg genomic DNA per sample was used as input for
DNA sample preparation. Sequencing libraries were generated using
Agilent SureSelect Human All Exon kit (Agilent Technologies, CA, USA)
following manufacturer’s recommendations and index codes were added to
each sample.
Fragmentation was carried out by hydrodynamic shearing system (Covaris,
Massachusetts, USA) to randomly generate 180–280 bp fragments.
Remaining overhangs were converted into blunt ends via
exonuclease/polymerase activities. After adenylation of 3′ ends of DNA
fragments, adapter oligonucleotides were ligated. DNA fragments with
ligated adapter molecules on both ends were selectively enriched in a
PCR reaction. Then, libraries hybridize with liquid phase with biotin
labeled probe, then use magnetic beads with streptomycin to capture the
exons of genes. Captured libraries were enriched in a PCR reaction to
add index tags to prepare for sequencing. Products were purified using
AMPure XP system (Beckman Coulter, Beverly, USA) and quantified using
the Agilent high sensitivity DNA assay on the Agilent Bioanalyzer 2100
system.
The clustering of the index-coded samples was performed on a cBot
Cluster Generation System using Hiseq PE Cluster Kit (Illumina)
according to the manufacturer’s instructions. After cluster generation,
the DNA libraries were sequenced on Illumina Hiseq platform and 150 bp
paired-end reads were generated.
Quality control of data processing and analysis
Paired-end sequencing (PE150) was performed on an Illumina HiSeq
(Illumina Novaseq 6000). The resulting sequence libraries (the
paired-end sequence and insert DNA between two ends) were quantified
with a Qubit 2.0 (Thermo Fisher) and insert size was determined using
an Agilent 2100 Bioanalyzer. The original fluorescence image files
obtained from Hiseq platform are transformed to short reads (raw data)
by base calling and these short reads are recorded in FASTQ format,
which contains sequence information and corresponding sequencing
quality information. Base calling was used to obtain the raw data
(sequenced reads) from the primary image data.
Quality control:
1. Discard a paired read if either one read contains adapter
contamination (>10 nucleotides aligned to the adapter, allowing ≤
10% mismatches;
2. Discard a paired read if more than 10% of bases are uncertain in
either one read;
3. Discard a paired read if the proportion of low quality (Phred
quality < 5) bases is over 50% in either one read.
All the downstream bioinformatic analyses were based on the
high-quality clean data, which were retained after these steps. At the
same time, QC statistics including total reads number, raw data, raw
depth, sequencing error rate, percentage of reads with Q30 (the percent
of bases with phred-scaled quality score greater than 30) and QC
content distribution were calculated and summarized.
Reads mapping to reference sequence
Valid sequencing data was mapped to the reference human genome (UCSC
hg19) by Burrows–Wheeler Aligner (BWA) software^[361]86 to get the
original mapping results stored in BAM format. If one or one paired
read(s) were mapped to multiple positions, the strategy adopted by BWA
was to choose the most likely placement. If two or more most likely
placements presented, BWA picked one randomly. Then, SAMtools^[362]87
and Picard ([363]http://broadinstitute.github.io/picard/) were used to
sort BAM files and do duplicate marking, local realignment, and base
quality recalibration to generate final BAM file for computation of the
sequence coverage and depth. Mapping step was very difficult due to
mismatches, including true mutation and sequencing error, and
duplicates resulted from PCR amplification. These duplicate reads were
uninformative and shouldn’t be considered as evidence for variants. We
used Picard to mark these duplicates for follow-up analysis.
Detecting and callings of somatic mutations
BWA and Samblaster were used to genome alignment, and muTect
Software^[364]88 was used for targeting Somatic SNV sites, and
Strelka^[365]89 was used to test Somatic INDEL information. Statistics
used in the manuscript includes moderated t-statistics, and Fisher’s
exact test. The manuscript statistics were used to moderated t-test and
Fisher’s exact test.
Gain of neo-mutations
The mutation frequency was estimated by the ratio of the number of
mutated samples vs. the number of total samples^[366]90. In our study,
the neo-mutations represented the mutations appearing at a certain
stage, but was not identified in earlier stages. For example, DST
mutation was detected in the HGIN stage, but was not in the LGIN stage,
indicating DST mutation was the neo-mutation of HGIN stage.
Analysis of mutation spectra and mutational signature in DC cohort
The features of single nucleotide variants (SNVs) in diverse subtypes
of DC were disclosed. In our study, we used NMF algorithm to estimate
the minimal components explaining maximum variance among DC samples.
Each component was compared to mutation patterns of 30 validated cancer
signatures reported from the COSMIC database
([367]https://cancer.sanger.ac.uk/cosmic/signatures_v2.tt) to identify
cancer-related mutational signatures. Cosine similarity
analysis^[368]91,[369]92 was used to measure the similarity between
component and signatures, which ranged from 0 to 1, indicating maximal
dissimilarity to maximal similarity.
Analysis of somatic SNVs signatures (tobacco signature and DNA repair panel)
To identify SBS and portray the contribution across the whole genome
based on WES data, we applied the analysis procedure as an R/CRAN
package sigminer (Version 2.0.1)
([370]https://cran.r-project.org/web/packages/sigminer/), to extract,
analyze, and visualize mutational signatures for genomic variations,
providing insights into cancer study. The most common criteria to
estimate signature number is the cophenetic correlation coefficient
(Supplementary Fig. [371]3a). Sigminer package (Version 2.0.1) can
provide both relative and absolute exposures of cancer signatures. In
our study, we performed NMF analysis with 200 runs to extract
signatures, and then got the contribution proportion of samples from
the relative exposure of each signature. Finally, the samples with the
highest SBS29 (Tobacco signature) contribution proportion among all
signatures were grouped into SBS29 signature enriched group, or DNA
repair enriched group (DNA repair panel).
Impacts of the detected mutations on the protein and phosphoprotein levels
Analysis of SCNAs and the impacts on protein expressions
SCNAs analysis used WES-derived BAM files that were processed in the
somatic mutation detection pipeline. To investigate the cis-/trans-
effects of SCNAs at the chr8q, we focused on the genes were detected
both at the SCNA and protein levels, and then the spearman correlation
coefficients were calculated between SCNAs and their corresponding
proteins (FDR < 0.05).
Impacts of DST mutation
To investigate the impacts of the mutation of DST on the protein and
phosphoprotein levels, fold change (FC, Mut vs. WT) of the
proteins/phosphoproteins abundance of the two groups was compared, and
the (adjust)p-value (Wilcoxon signed-rank test) less than 0.05 was
considered. Then, we used gene sets of molecular pathways from
KEGG^[372]93/Reactome^[373]94 databases to compute the pathway.
Defining Cancer-associated Genes (CAGs)
CAGs were compiled from genes defined by Bailey et al. ^[374]45 and
CAGs were listed in Mertins et al.^[375]46 and adapted from Vogelstein
et al. ^[376]95. The list of genes is provided in Supplementary
Data [377]4b.
TCEP-based protein extraction and trypsin digestion
All samples in our cohort were dissected with microdissection and then
collected in 1.5 mL EP tubes, and then stored in −80 °C refrigerator.
The size of a single FFPE punch of early DC is no more than
0.1 × 0.1 × 3 mm (10 µm thickness), and the mass was under 100 µg
including wax. PCT would be an available method that effectively
minimize sample loss^[378]69, while the lowest operable weight (about
300 µg, 0.5 × 0.5 × 3 mm, 5 µm thickness) is still higher than that in
this study (under 100 µg). TCEP, a priorly reducing agent, leads to
high internal standard recovery (>95%) combined with iodoacetamid
(IAA)^[379]70, and are applied in other published studies, including
cholangiocarcinoma^[380]96, and urothelial carcinoma of the
bladder^[381]97. In addition, we previously have achieved deep proteome
and phosphoproteome coverage of the definite volume samples (nearly
less than 50 µg) in early-stage gastrointestinal cancer^[382]71 on the
basis of the method of TCEP extraction protein. Therefore, the method
of TCEP extraction protein was applied in this study.
Detailly, 50 μL TCEP buffer (2% deoxycholic acid sodium salt, 40 mM
2-chloroacetamide, 100 mM tris-phosphine hydrochloride, 10 mM
(2-carboxyl)-phosphine hydrochloride, 1 mM phenylmethylsulfonyl
fluoride mixed with MS water, PH 8.5) was added into 1.5 mL EP tubes
with prepared samples, and then heated in a 99 °C metal bath for
30 min. After cooling to room temperature, 3 μg trypsin (REF: V528A,
PROMEGA) was added into each tube and digested for 18 hours in a 37 °C
incubator. Then, 13 μL 10% formic acid was added into each tube and
made vortex for 3 min, and then centrifuged for 5 min (12,000g). Later,
a new 1.5 mL tube with 350 μL buffer (0.1% formic acid in 50%
acetonitrile) is needed for collected the supernatant for extraction
(vortex for 3 min, and then 12,000g centrifuged for 5 min). Then, the
supernatant was transferred into a new tube for drying in 60 °C vacuum
drier. After drying, 100 μL 0.1% formic acid was needed for dissolving
the peptides and vortex for 3 min, and then centrifuged for 3 min
(12,000g). The supernatant was picked into new tube and then
desalinated. Before desalination, the activation of pillars with 2
slides of 3 M C18 disk is required, and the lipid is as follows: 90 μL
100% acetonitrile twice, 90 μL 50 and 80% acetonitriler once in turn,
and then 90 μL 50% acetonitrile once. After pillar balance with 90 μL
0.1% formic acid twice, the supernatant of the tubes was loading into
the pillar twice, and decontamination with 90 μL 0.1% formic acid
twice. Lastly, 90 μL elution buffer (0.1% formic acid in 50%
acetonitrile) was added into the pillar for elution twice and only the
effluent was collected for MS. And then the collect of lipid was put in
60 °C vacuum drier for drying.
Proteome analysis in LC–MS/MS analysis
For the proteomic profiling of samples, peptides were analyzed on a
Q-Exactive HFX Hybrid Quadrupole-Orbitrap Mass Spectrometer (Thermo
Fisher Scientific, Rockford, IL, USA) coupled with a high-performance
liquid chromatography system (EASY nLC 1200, Thermo Fisher). Dried
peptide samples re-dissolved in Solvent A (0.1% FA in water) were
loaded to a 2-cm self-packed trap column (100-μm inner diameter, 3 μm
ReproSil-Pur C18-AQ beads, Dr. Maisch GmbH) using Solvent A and
separated on a 150-μm-inner-diameter column with a length of 15 cm (1.9
μm ReproSil-Pur C18-AQ beads, Dr. Maisch GmbH) over a 150 min gradient
(Solvent A: 0.1% FA in water; Solvent B: 0.1% FA in 80% ACN) at a
constant flow rate of 600 nL/min. The eluted peptides were ionized
under 2.0 kV and introduced into mass spectrometer). MS was performed
under a data-dependent acquisition mode. For the MS1 Spectra full scan,
ions with m/z ranging from 300 to 1,400 were acquired by Orbitrap mass
analyzer at a high resolution of 120,000. The automatic gain control
(AGC) target value was set as 3E6. The maximal ion injection time was
80 ms. MS2 Spectra acquisition was performed in the ion trap mode at a
rapid speed. Precursor ions were selected and fragmented with higher
energy collision dissociation (HCD) with a normalized collision energy
of 27%. Fragment ions were analyzed by the ion trap mass analyzer with
the AGC target at 5E4. The maximal ion injection time of MS2 was 20 ms.
Peptides that triggered MS/MS scans were dynamically excluded from
further MS/MS scans for 12 s.
In our study, 438 samples were randomly assigned number and then
performed proteomic analysis. The HEK293T cell (National Infrastructure
Cell Line Resource) lysates were measured as the standard every 15
samples injection to assess the stability of the performance of mass
spectrometry.
Phosphopeptide enrichment and analysis
All qualified profiling data were processed at firmiana platform
against the human RefSeq protein database (updated on 04-07-2013) in
the National Center for Biotechnology Information (NCBI). Owing to the
definite volume of the samples, only 111 samples (from 49 EDC patients)
were found to be adequate.
The phosphoproteome samples were prepared by Fe-NTA Phosphopeptide
Enrichment Kit (Thermo, Catalog: A32992) according to the
manufacturer’s instruction. Briefly, 2 mg peptides were resuspended in
200 μL binding/wash buffer and loaded to the equilibrated spin column.
The resin was mixed with the sample by gently tapping. The mixture was
incubated for 30 min and centrifuged at 1000g for 30 s to discard the
flowthrough. The column was then washed by 200 μL of binding/wash
buffer and centrifuged at 1000g for 30 s for 3 times and washed by
200 μL of LC–MS grade water for one time. The phosphopeptide was eluted
by adding 100 μL of elution buffer and centrifuged at 1000g for 30 s 2
times. Phosphopeptides were dried down for LC–MS/MS analysis.
Quantification of global proteomic data and phosphoproteomic data
In our study, all MS raw files were processed at firmiana
platform^[383]98 (a one-stop proteomic cloud platform:
[384]http://www.firmiana.org). Briefly, all MS raw files were searched
against the NCBI human RefSeq protein database (updated on 04-07-2013,
32,015 entries) in Mascot search engine (version 2.3, Matrix Science
Inc). Trypsin was used as the proteolytic enzyme allowing up to two
missed cleavages. Carbamidomethyl (C) was considered as a fixed
modification. For the proteome profiling data, variable modifications
were oxidation (M) and acetylation (Protein N-term). For the
phosphoproteome data, variable modifications were oxidation (M),
acetylation (Protein N-term) and phospho (S/T/Y). All the
identification peptides were quantified at firmiana platform with peaks
area derived from their MS1 intensity. The mass tolerances were 20 ppm
for precursor and 50 mmu for the product collected by Q-Exactive HFX,
which has been applied in previously published studies^[385]97.
Precursor ion score charges were limited to +2, +3, and +4. The FDRs of
the peptide-spectrum matches (PSMs) and proteins were set at a maximum
of 1%. The same cutoff strategies of FDR have been widely used in
recently published researches^[386]99,[387]100. Label-free protein
quantifications were calculated in our cohort, that so called iBAQ
algorithm^[388]26,[389]27, which divided the protein abundance (derived
from identification peptides’ intensities) by the number of
theoretically observable peptides. Then the FOT, defined as a protein’s
iBAQ divided by the total iBAQ of all identified proteins within one
sample, was used to represent the normalized abundance of a particular
protein across samples.
A total of 438 samples from 156 DC patients for proteomic profiling,
and all the proteome data were processed as follows: EDC1 (14,426 GPs):
all 14,426 gene products (GPs) identified in 438 samples (156 EDC
cases) on the basis of the match between runs algorithm
(MBRs)^[390]101; EDC2 (11,904 GPs): all the proteins were required to
have at least 2 unique strict peptide, and we excluded keratins
proteins of which the maximum FOT in all 438 samples were less than
1.0E−5 in FOT; EDC3 (7400 GPs): GPs were identified in more than 20%
samples of each stage, which was also applied in other published
studies^[391]102,[392]103.
At the phosphoprotein level, a total of 111 samples from 49 DC patients
were conducted, and 38,696 phosphosites, corresponding to 9716
phosphoproteins, were identified. The phosphosites (n = 12,851)
identified in more than 20% samples of every stage were used in the
analysis of our cohort. In addition, the phosphoprotein/phosphopeptide
abundance was adjusted by the total protein counterpart abundance,
which has been applied in previous published studies^[393]104.
Data imputation
For the missing value in our study, we applied MBRs
algorithm^[394]101,[395]105 in this study, which has been proved to be
an effective technique to fill the missing values, which was widely
used in other proteomic studies^[396]106. Detailly, we built a dynamic
regression function based on common identification peptides in samples.
According to correlation value R^2, the function chooses linear or
quadratic function for regression to calculate retention time (RT) of
corresponding hidden peptides, and check the existence of the extracted
ion chromatogram (XIC) based on the m/z and calculated RT. The function
evaluated the peak area values of those existed XICs. These peak area
values are considered as parts of corresponding proteins. This strategy
has been applied in other published proteomic
studies^[397]102,[398]107.
Principal component analysis (PCA)
A total of 7400 proteins (FOT) were used for PCA in R (version 3.5.1).
We performed PCA to investigated the three principal components and to
visualize separation of all the stages of the D and G subtypes.
The hierarchical clustering for 7 subtypes of DC
To compare the correlation and distance of 2 major and 5 rare subtypes
of duodenal cancer, hierarchical clustering was performed by the
fastcluster package (version 1.1.25) in R (version 3.5.1). The
expressed proteins of 2 major and 5 rare subtypes of duodenal cancer
were used to reduce the individuality of different patients. Average
was picked for method in the fast cluster. The distance between two
clusters, A and B, is the average between the points in the two
clusters:
[MATH: d(A,B)=1
/∣A∣∣B∣Σ_(a∈A,b∈B)d(a,b) :MATH]
1
Differential proteomics analysis and pathways enrichment
In comparison of the features of the D and G subtypes, the DEPs of the
DAC stage in the P1 and the A-G stage in the P2 were identified.
Meantime, the DEPs of the normal tissues in the D and G subtypes were
integrated to investigate the origins of the D and G subtypes. Then, we
used gene sets of molecular pathways from KEGG^[399]93 and
Reactome^[400]94 databases to compute single sample gene set enrichment
scores for each sample. Statistical significance was considered when
FDR (adjust) was less than 0.05.
When comparing the DEPs of different stages of the D or G subtypes, we
focused on protein abundance (average) at each stage, which were
enriched by KEGG/GO database^[401]93. We then annotated the signaling
pathways (FDR < 0.05) and manually checked the pathway associated
proteins, which were then estimated whether significantly associated
with the stages of the D and G subtypes (Kruskal–Wallis test, adjust.
p < 0.05).
In differential analysis of proteins in DC progression (gradually
decreased or increased) at the protein level, the highly expressed
proteins of each stage/panel were screened, in which foldchange (FC)
and adjust. p-value (Kruskal–Wallis test) were considered. Statistical
analysis was performed in R (version 3.5.1).
To assess the impacts of the four immune clusters, we applied gene set
variation analysis (GSVA)^[402]108 for pathways enrichment analysis, in
which molecular Signatures Database (MSigDB) of KEGG gene sets were
used. In addition, to assess the correlation between mutation
burden/proteins (e.g., LYN and AARS1) and the enrichment pathways, the
enrichment value was used. For example, as for the correlation between
AARS1 and the represented pathways in Fig. [403]5, the R values and P
values were based on the intensity of AARS1 and the enrichment values
of the represented pathways among the samples in the Fudan cohort.
Correlation analysis
In the analysis of the correlation among proteins, we excluded the
outliers and missing values, and two-sided Spearman’s correlation test
was applied to explore the correlation among the HEK293T cells’
samples, and the stage’s samples of DC. In addition, two-sided
Pearson’s correlation test was applied to investigate the correlation
(1) between two proteins/phosphoproteins, (2) between proteins and
pathways, and (3) between proteins and TMB in the study.
Analysis of the specific proteins and features of the D and G subtypes
To explore the carcinogenesis lineages of the D and G subtypes, the
DEPs (Kruskal-Wallis test, adjust. p < 0.05) in P1 and P2 were used
into analysis, which were then applied to the pathways enrichment, and
consequentially resulted into the characterized features of the D and G
subtypes carcinogenesis.
Consensus clustering analysis of immune infiltration in DC
Unsupervised clustering of the patient samples at the different
molecule levels were performed with the ConsensusClusterPlus R package
(ConsensusclusterPlus, v1.46.0)^[404]109. Samples were clustered using
Euclidean distance as the distance measure. A total of 7400 proteins
were used for k-means clustering with up to 10 clusters. The consensus
matrix of k = 4 showed clear separation among clusters and was
appropriate in consideration of the percentage of the samples with
stages pathologically. Taken together, proteome clusters were defined
using k-means consensus clustering with k = 4.
Analysis of the immune signature of the D and G subtypes
Proteomic data is often applied to investigate the immune filtration
and the response to immunotherapy. For example, Keren et.al elucidated
a structure tumor-immune microenvironment in triple negative breast
cancer^[405]110. The landscapes of response to immunotherapy in
melanomas was based on the proteomic data^[406]106. As well as RNA
data, proteomic data, used for xCell enrichment analysis, is applied to
estimate immune characterization^[407]97,[408]111,[409]112. Thus, in
our cohort, we performed the xCell enrichment analysis based on the
proteomic data to investigate the immune filtration of DC. The
abundance of 64 different cell signatures in 438 samples (156 DC cases)
were computed via xCell ([410]https://xcell.ucsf.edu)^[411]57 and then,
consensus clustering was performed to identify subtypes of the samples.
As a result, 4 immune clusters were profiled, including Hot tumor
cluster, Cold tumor cluster, Epithelial cluster, and the B cells
cluster. Each cell signature and immune/stromal/environmental score
were normalized to Z-score. In addition, we performed the differential
analysis of the identified CD/MHC molecules based on global proteomic
data.
To further evaluate the features of the D and G subtypes, we compared
the expression of immune cell markers, which were CellMarker web
([412]http://biocc.hrbmu.edu.cn/CellMarker/). Log[2] FC (P1 vs. P2
ratio) of markers expression from the DNA repair panel was taken into
consideration of the features and specific carcinogenesis lineages of
diverse subtypes of duodenal cancer (D and G subtypes).
Kinase activity prediction and phosphopeptide analysis
The ratios of identified phosphorylation sites of all samples were used
to estimate the kinase activities by Kinase-Substrate Enrichment
Analysis (KSEA) algorithm^[413]113. The information of kinase-substrate
relationships was obtained from publicly available databases, including
PhosphoSite^[414]114, Phospho.ELM^[415]115 and PhosphoPOINT^[416]116.
The information of substrate motifs was obtained either from the
literature^[417]117 or from an analysis of the KSEA dataset with Motif
(sP)^[418]118. The kinase-substrate-motif network analysis was
referenced from PhosphoSitePlus (PSP,
[419]https://www.phosphosite.org/homeAction)^[420]119 and NetworKIN
3.0^[421]120. Statistical analysis was performed in R (version 3.5.1)
with Kruskal–Wallis test.
IHC analysis
AARS1 in the tissue by IHC staining, 3-μm-thick sections from FFPE
tissue block were de-waxed with xylene and rehydrated through a graded
series of ethanol, prepared by Zhongshan Hospital, Fudan University.
Total AARS1 was performed on representative samples from normal to
progressive DC (Fig. [422]6a). Polyclonal antibody for AARS1 (dilution
1:100, Proteintech, catalog No:17394-1-AP) was used for IHC. Antigen
was retrieved by autoclaving for 10 min at 121 °C in Novocastra Epitope
retrieval Solution pH 6 (Leica Biosystems). The UltraVision Quanto
Detection System HRP DAB (Thermo Fisher Scientific) was used according
to the manufacturer’s instruction. Finally, the sections were
counterstained with hematoxylin and then mounted.
Cell lines and cell culture
Human HEK293T (ATCC, Catalog: CRL-11268; RRID: CVCL_QW54) cells were
cultured in high-glucose Dulbecco’s modified Eagle’s medium (DMEM;
HyClone) supplemented with 10% fetal bovine serum (FBS; Invitrogen),
100 units/mL penicillin (Invitrogen), and 100 μg/mL streptomycin
(Invitrogen). Human Hutu80 (ATCC, Catalog: CRL-7928; RRID: CVCL_1301)
and WDC-1 (Cobioer, Catalog: CBP61181, RRID: CVCL_R803) cells were
cultured in RPMI 1640 medium (Invitrogen) supplemented with 10% fetal
bovine serum (Invitrogen, Carlsbad, CA, USA), 100 units/mL penicillin
(Invitrogen), and 100 μg/mL streptomycin (Invitrogen). The cells were
incubated in 5% CO[2] at 37 °C. Cell transfection was performed using
Lipofectamine 3000 (Invitrogen). All the cell lines were authenticated
with short tandem repeat (STR) profiling method.
Plasmid constructs and transfection
Whole length human AARS1, PARP1, SIRT1, SIRT2, SIRT6, and SIRT7 were
amplified from HEK293T cDNA and cloned into the Xho I and EcoR I
restriction sites of the pcDNA3.1-Flag/HA vector using CloneExpress
MultiS One Step Cloning Kit (Vazyme Biotech Co., Ltd., Nanjing, China,
Catalog: C113-02). The PARP1 and AARS1 mutants were generated by
site-directed mutagenesis using the Mut Express MultiS Fast Mutagenesis
kit (Vazyme Biotech Co., Ltd., Nanjing, China, Catalog: C215-01)
according to the manufacturer’s instructions. Each plasmid was
transfected using Lipofectamine 3000 (Invitrogen) according to the
manufacturer’s instructions.
The primers used were as follows:
AARS1: forward, 5′-AACGGGCCCTCTAGACTCGAGATGGACTCTACTCTAACA-3′
reverse, 5′-TAGTCCAGTGTGGTGGAATTCGTTCTTTACATCCCCGAGG-3′
PARP1: forward, 5′-AACGGGCCCTCTAGACTCGAGATGGCGGAGTCTTCGGATA-3′
reverse, 5′-TAGTCCAGTGTGGTGGAATTCCCACAGGGAGGTCTTAAAA-3′
SIRT1: forward, 5′-AACGGGCCCTCTAGACTCGAGATGGCGGACGAGGCGGCC-3’
reverse, 5′-TAGTCCAGTGTGGTGGAATTCTGATTTGTTTGATGGATAG-3′
SIRT2: forward, 5′-AACGGGCCCTCTAGACTCGAGATGGACTTCCTGCGGAACT-3′
reverse, 5′-TAGTCCAGTGTGGTGGAATTCCTGGGGTTTCTCCCTCTCT-3′
SIRT6: forward, 5′-AACGGGCCCTCTAGACTCGAGATGTCGGTGAATTACGCGG-3′
reverse, 5′-TAGTCCAGTGTGGTGGAATTCGCTGGGGACCGCCTTGGCC-3′
SIRT7: forward, 5’-AACGGGCCCTCTAGACTCGAGATGGCAGCCGGGGGTCTG-3′
reverse, 5’-TAGTCCAGTGTGGTGGAATTCCGTCACTTTCTTCCTTTTTGT-3′
AARS1 R751G: forward, 5′-GGGTATCCGGGGGATTGTGGCTGTCACAGGT-3′
reverse, 5′-GCCACAATCCCCCGGATACCCTTGGCAATGG-3′
PARP1 K23A: forward, 5′-TCTTGCAAGGCATGCAGCGAGAGCATCCCCA-3′
reverse, 5′-CTCGCTGCATGCCTTGCAAGAGGCGCGCCCG-3′
PARP1 K23R: forward, 5′-TCTTGCAAGAGATGCAGCGAGAGCATCCCCA-3′
reverse, 5′-CTCGCTGCATCTCTTGCAAGAGGCGCGCCCG-3′
PARP1 K414/418A: forward, 5′-GAGGCACTCGGGGGGGCGTTGACGGGGACGG-3′
reverse, 5′-CAACGCCCCCCCGAGTGCCTCAATCATGGCC-3′
PARP1 K414/418R: forward, 5′-GAGAGACTCGGGGGGAGGTTGACGGGGACGG-3′
reverse, 5′-CAACCTCCCCCCGAGTCTCTCAATCATGGCC-3′
PARP1 K621A: forward, 5′-TATGAAGAAGCAACCGGGAACGCTTGGCACT-3′
reverse, 5′-GTTCCCGGTTGCTTCTTCATATAATTTCATG-3′
PARP1 K621R: forward, 5′-TATGAAGAAAGAACCGGGAACGCTTGGCAC-3′
reverse, 5′-GTTCCCGGTTCTTTCTTCATATAATTTCATG-3′
RNA interference. Synthetic oligos were used for siRNA-mediated
silencing of AARS1, and scramble siRNA was used as a control. Cells
were transfected with siRNAs using Lipofectamine 3000 according to the
manufacturer’s protocol. Knockdown efficiency was verified by western
blotting. SiRNA sequences was as follows: AARS1: 5′- CAG CAA GTG AAA
TCC GGC AGC GATT-3′.
Stable gene overexpression
For stable AARS1 overexpression, HEK293T cells (ATCC, Catalog:
CRL-11268, RRID: CVCL_QW54) were co-transfected with pCMV-VSV-G,
pCMV-Gag-Pol, and pBABE-AARS1. Transfected cells were cultured in
Dulbecco’s Modified Eagle’s Medium containing 10% FBS for 6 h.
Twenty-four hours after transfection, culture medium supernatant was
collected and used for retrovirus preparation to infect Hutu80 cells at
10% confluency in 90-mm-diameter dishes. Cells were re-infected 24 h
after the first infection and selected using 5 μg/mL puromycin
(Amresco, Solon, OH, USA). AARS1 was subcloned into the BamH I and EcoR
I restriction sites of the pBABE vector using the ClonExpress MultiS
One-Step Cloning Kit (Vazyme Biotech Co., Ltd, Nanjing, China, Catalog:
C113-02). The sequences of primers used were as follows: pBABE-AARS1:
forward, 5′-ctc tag gcg ccg gcc gga tcc ATG GAC TCT ACT CTA ACA-3′, and
reverse, 5′-ggt ctt ctc gtc cat gaa ttc GTT CTT TAC ATC CCC GAG-3′.
Western blot analyses
Cultured cells or cells extracted from duodenal cancer tissue and
patient-matched normal tissues were homogenized with 0.5% NP-40 buffer
containing 50 mM Tris-HCl (pH 7.5), 150 mM NaCl, 0.5% Nonidet P-40, and
a mixture of protease inhibitors (Sigma-Aldrich). After centrifugated
at 12,000 rpm and 4 °C for 15 min, the supernatants of the lysates were
collected for western blotting according to standard procedures.
Anti-AARS1 (dilution 1:100, Proteintech, catalog No:17394-1-AP),
Anti-Poly (ADP-Ribose) polymer (dilution 1:100, Abcam, catalog
No:14459), Anti-Histone H2A.X (dilution 1:100, Cell Signaling
Technology, catalog No: 7631), Anti-Phospho-Histone H2A.X(Ser139)
(dilution 1:200, Cell Signaling Technology, catalog No: 9718),
Anti-SARS1 (dilution 1:500, Abclonal, catalog No: A13350), Anti–TARS1
(dilution 1:500, Abclonal, catalog No: A6993), Anti-Flag (dilution
1:5000, Proteintech, catalog No: 20543-1-AP), Anti-HA (dilution 1:5000,
Proteintech, catalog No: 51064-2-AP), Anti-PARP1 (dilution 1:500,
Proteintech, catalog No: 13371-1-AP), Anti-TARS1 (dilution 1:500,
Proteintech, catalog No: 14773-1-AP), Anti-GARS1 (dilution 1:500,
Proteintech, catalog No.: 15831-1-AP), Anti-Actin (dilution 1:1000,
Proteintech, catalog No.: 66009-1-Ig), Anti-GAPDH (dilution 1:5000,
Proteintech, catalog No.: 60004-1-Ig). Detection was performed by
measuring chemiluminescence on a Typhoon FLA 9500 (GE Healthcare,
Little Chalfont, UK).
Sample preparation for LC–MS/MS analysis
For detecting AARS1-interacting proteins or lysine-alanylation sites on
PARP1, Hutu80 cells were transfected with AARS1-Flag or PARP1-Flag
vector. Cells were harvested and lysed in 0.5% NP-40 buffer, and then
supernatants were immunoprecipitated with anti-FLAG M2 magnetic beads
(dilution 1:5000, Proteintech, catalog No: 20543-1-AP) for 3 h in 4 °C.
The precipitates were washed twice with 0.1% NP-40 buffer, twice with
ddH[2]O and thrice with 50 mM NH[4]HCO[3], after which on-bead tryptic
digestion was performed at 37 °C overnight. The peptides in the
supernatants were collected through centrifugation and dried in a speed
vacuum (Eppendorf). Obtained peptides were stored at −80 °C until
LC–MS/MS analysis.
K-Ala site identification
Raw MS files were analyzed using MaxQuant version 1.4.1.2. MS/MS
spectra were searched using the Andromeda search engine against the
SwissProt-human database (Release 2014-04-10) containing forward and
reverse sequences. In the main Andromeda search precursor, mass and
fragment mass had an initial mass tolerance of 5 ppm and 0.05 Da. The
search included alanylation of lysine and oxidation of methionine.
Minimal peptide length was set to seven amino acids, and a maximum of
four miscleavages was allowed. The FDR was set to 0.01 for peptide and
protein identifications.
PARP1 activity assay in vitro
PARP1 self-PARylaton assay in vitro was modified from a reported
method^[423]121. In brief, experiments were performed in a 40 μl
reaction mix contains 50 mM HEPES (pH = 8.0), 4 mM DTT, 20 mM MgCl[2],
25 μl NAD^+, 200 nM DNA and 2 μg PARP1 protein. The reaction was
allowed for 30 min at 30 °C. The reaction mix was then collected for
western blotting to detect poly (ADP-ribose) polymer levels. For
measuring PARP1 activity in cell extracts and in vitro reactions, a
PARP1 colorimetric assay kit (BPS Bioscience, USA, Cat# 80580) was
employed according to manufacturer’s instruction. To evaluate the
reduction of PARP1 activity by interacting with AARS1, purified AARS1
was added into the reaction mix and incubated with PARP1 for 30 min at
4 °C at the sample preparation step of the kit.
DNA-binding capacity assay
To test the DNA-binding capacity of PARP1 in vitro, Hutu80 cells were
transfected with wildtype and mutant PARP1 plasmid. After cells were
harvested, a DNA–Protein Binding Assay kit (Abcam, Catalog: 117139) was
used to detect the DNA-binding capacity of PARP1 according to the
manufacturer’s instructions.
Apoptosis and cell proliferation assay through flow cytometry
Sample preparation: A Beckman Coulter flow cytometer (Beckman Coulter,
Brea, CA, USA) was used to detect apoptotic cells. Data were collected
on a Beckman Coulter flow cytometer (Beckman Coulter, Brea, CA, USA).
An Annexin V-FITC Apoptosis Detection Kit (BD Biosciences) was used to
detect apoptotic cells according to manufacturers’ instruction. In
addition, apoptosis of cells was gated using Annexin V and PI following
doublet exclusion using FSCHxW and SSC-HxW. Cell proliferation was
assessed by the Cell Counting Kit-8 (Dojindo Laboratories, Kumamoto,
Japan, Catalog: CK04). In brief, Hutu80 cells were seeded in a 96-well
plate with 4 × 10^3 cells per well and allowed to adhere. Cell Counting
Kit-8 solution (10 μl) was added to each well, and the cells were
cultured in 5% CO[2] at 37 °C for 2 h. Purity of isolated samples was
obtained by antibody stain and FACS. Sample purity was greater than
95%. Cell proliferation was determined by measuring the absorbance at
450 nm. Instrument: Beckman Coulter flow cytometer (Beckman Coulter,
Brea, CA, USA). Software: Flowjo version 10.7.1 (Becton Dickinson Life
Science).
Comet assay
A Comet Assay Kit (Trevigen, Catalog: 4250-050-K) was used to detect
single- and double-stranded DNA breaks in cultured cells and tissues.
Slides were examined (under 425–500-nm excitation) using a Leica DMI
4000B epifluorescence microscope. Comet slides were used for each
condition. In the assay, the fluorescence in normal cells is mostly
confined to the nucleus because intact DNA cannot migrate. In
DNA-damaged cells, DNA is denatured with an alkaline or neutral
solution for detecting single-/double- stranded breaks, respectively;
negatively charged DNA fragments are released from the nucleus and
migrate toward the anode.
Mouse xenograft studies in vivo
Four- to six-week-old male BALB/c nude mice were obtained (Shanghai
SLAC Laboratory Animal Co., Ltd., Shanghai, China) for in vivo
xenografts (n = 6 mice per group). Mice were housed in polycarbonate
cages, and provided free access to food and water with a 12-h light:
dark cycle. Control cells and cells overexpressing AARS1 from Hutu80
cell lines were subcutaneously heterotransplanted into the left and
right flanks of each mouse. For HU and alaninol treatment, 100 mg/kg HU
or 100 mg/kg alaninol was intraperitoneally injected into the abdominal
cavities of the animals twice a week. At the end of the experiment,
following euthanasia, tumors were excised, weighed and imaged. All
procedures were performed with approval from the Animal Care Committee
at Fudan University. The maximal tumor burden permitted by the
committee is 2000 mm^3.
Alanylation and de-alanylation in vitro
In vitro alanylation and de-alanylation reactions were carried out as
described before^[424]68. In vitro alanylation was carried out in a
30-μl reaction mix containing 50 mM HEPES (pH 7.5), 25 mM KCl, 2 mM
MgCl[2], 5 mM alanine, 4 mM ATP, 10 nM AARS1, and 0.05 mg/ml synthetic
substrate peptide. The final pH of each reaction mix was adjusted to
7.5 before adding AARS1. The reaction was allowed for 3 h at 37 °C. The
peptide was desalted by passing through a C18 ZipTip (Millipore) and
was subjected to analyzation by a MALDI-TOF/TOF mass spectrometer
(SCIEX-5800).
In vitro de-alanylation reactions were carried out in a 30-μl reaction
mix containing 50 mM HEPES (pH 7.5), 6 mM MgCl[2], 1 mM DTT, 1 mM NAD+,
0.05 mg/ml synthetic peptide, 1 mg/ml SIRT6 and 1 mM PMSF. The reaction
was allowed for 4 h at 37 °C. The peptide was desalted by passing
through a C18 ZipTip (Millipore) and was subjected to analyzation by a
MALDI-TOF/TOF mass spectrometer (SCIEX-5800).
Quantification and statistical analysis
Statistical details of experiments and analyses can be observed in the
(Supplementary data)/figure legends in this study. All the analyses
were performed in R (version 3.5.1) and GraphPad Prism (version 7.0).
Standard statistical tests were used to analyze the clinical data,
including but not limited to Wilcoxon signed-rank test, Fisher’s exact
test, Kruskal–Wallis test. The statistical significance of differences
between two groups was calculated with the Wilcoxon rank-sum test; for
more than two group comparisons, Kruskal–Wallis test was used. Fisher’s
exact test was used for categorical variables and Wilcoxon rank-sum
test was used for continuous variables, when testing association of
different groups with clinical variables. As for the correlation
analysis between two proteins/phosphoproteins, Pearson’s correlation
(two-sided) of correlation coefficients were used. All statistical
tests were two-sided except special explanations. For validation
experiments, each was repeated at least three times independently,
representative photos were shown. To account for multiple-testing, the
p values were adjusted using the Benjamini-Hochberg FDR correction.
Data in the boxplot were presented median (central line), upper and
lower quartiles (box limits), 1.5× interquartile range (whiskers).
Statistical significance was considered when p-value < 0.05. For
validation experiments, each was repeated at least three times
independently, representative photos were shown.
Reporting summary
Further information on research design is available in the [425]Nature
Portfolio Reporting Summary linked to this article.
Supplementary information
[426]Supplementary Information^ (11.3MB, pdf)
[427]Peer review file^ (15.2MB, pdf)
[428]41467_2023_37221_MOESM3_ESM.pdf^ (143.2KB, pdf)
Description to Additional Supplementary Information
[429]Dataset 1^ (46.3KB, xlsx)
[430]Dataset 2^ (30.6MB, xlsx)
[431]Dataset 3^ (36.1MB, xlsx)
[432]Dataset 4^ (20.8KB, xlsx)
[433]Dataset 5^ (189.6KB, xlsx)
[434]Dataset 6^ (41.9KB, xlsx)
[435]Reporting Summary^ (5.2MB, pdf)
Acknowledgements