Abstract

Simple Summary

   Prostate cancer (PCa) is one of the most common cancers. Due to the
   limited and invasive approaches for PCa diagnosis, it is crucial to
   identify more accurate and non-invasive biomarkers for its detection.
   The aim of our study was to non-invasively uncover new protein targets
   for detecting PCa using a proteomics and proteogenomics approach. This
   work identified several dysregulated mutant protein isoforms in urine
   from PCa patients, some of them predicted to have a protective or an
   adverse role in these patients. These results are promising given
   urine’s non-invasive nature and offer an auspicious opportunity for
   the research and development of PCa biomarkers.

Abstract

   To identify new protein targets for PCa detection, first, a shotgun
   discovery experiment was performed to characterize the urinary proteome
   of PCa patients. This revealed 18 differentially abundant urinary
   proteins in PCa patients. Second, selected targets were clinically
   tested by immunoblot, and the soluble E-cadherin fragment was detected
   for the first time in the urine of PCa patients. Third, the
   proteogenome landscape of these PCa patients was characterized,
   revealing 1665 mutant protein isoforms. Statistical analysis revealed 6
   differentially abundant mutant protein isoforms in PCa patients.
   Analysis of the likely effects of mutations on protein function and
   PPIs involving the dysregulated mutant protein isoforms suggests a
   protective role of mutations HSPG2*Q1062H and VASN*R161Q and an adverse
   role of AMBP*A286G and CD55*S162L in PCa patients. This work provides
   an original characterization of the urinary proteome of PCa patients,
   focusing on their proteogenome profile, which is usually overlooked in
   analyses of PCa and body fluids. Mass spectrometry data were analyzed
   with two different software packages, combined for the first time in
   the context of PCa, which increased the robustness of the data
   analysis. The application of proteogenomics to urine proteomic
   analysis can be particularly informative in mutation-related diseases
   such as cancer.

   Keywords: prostate cancer, urine, human, biomarker, proteome,
   proteogenome, label-free quantitation, immunoblot

1. Introduction

   Prostate cancer (PCa) is one of the most prevalent cancers among men
   and the fifth leading cause of cancer-related death [1]. When
   detected at early stages, PCa can be treated. However, PCa diagnosis is
   challenging, largely due to the low specificity of prostate-specific
   antigen (PSA) tests, particularly in the diagnostic window of
   4–10 ng/mL [2], which underscores the need to identify new and more
   accurate biomarkers.

   An ideal biomarker for PCa should be measurable non-invasively,
   inexpensive, highly sensitive, and specific [3]. For anatomical
   reasons, urine is enriched in prostatic secretions and reflects the
   molecular changes associated with the prostate better than blood,
   which contains markers and confounding factors from the whole body.
   Urine can be collected serially, requires minimal processing, and
   presents a simpler and more stable matrix than blood [4].

   The phenotypic role of proteins, combined with the variety of
   techniques available for proteome analysis, makes the search for
   protein markers in cancer a very attractive strategy [5]. Some
   promising single-protein biomarkers have been reported, such as AMBP
   [6] and zinc-alpha-2-glycoprotein (AZGP1) [7,8]. AMBP discriminated
   PCa from benign prostatic hyperplasia (BPH) patients with higher
   accuracy than that estimated for PSA [9], using 2D-DIGE MALDI-TOF/TOF
   and immunoturbidimetry as discovery and validation approaches,
   respectively. AZGP1 significantly improved the prediction of PCa in a
   cohort of candidates for prostate biopsy, using isobaric stable
   isotope labeling and 2D-LC-MS/MS as the discovery method and Western
   blot as the validation approach. Multi-marker panels have been shown
   to improve performance because they better reflect cancer complexity
   and heterogeneity, addressing the limitations of single biomarkers.
   Although promising, no urine protein panel is available for clinical
   practice, partly because of failures in clinical validation, which
   reflects the need to discover new biomarkers and/or new combinations
   of biomarkers [7,8]. To the best of our knowledge, only one assay
   (Promark®) that quantifies a protein panel in prostate tissue by mass
   spectrometry (MS) is commercially available [10], and, to date, only
   four mRNA-based urine tests have been commercialized: PCA3 [11],
   SelectMDX [12], ExoDx Prostate (IntelliScore) [13], and
   MyProstateScore [14].

   Cancer is driven by accumulated mutations and other genomic alterations
   [15]. Mutations in proteins can affect their structure, function,
   and stability, which may increase their susceptibility to
   degradation [16]. As in other types of cancer, PCa shows a weak
   correlation between RNA and protein expression; therefore, the effect
   of mutations should also be investigated directly at the protein
   level [17]. To address this inference problem, genome and proteome
   data have been integrated (proteogenome analysis) to identify mutant
   protein isoforms. Integrated proteogenome analysis can provide new
   insights into PCa pathophysiology and unveil powerful, clinically
   applicable biomarkers. A shotgun proteomics approach combined with a
   mutation database has been used to detect mutated peptides related to
   various types of cancer, such as breast [18], colon [19], and rectal
   cancer [20]. In PCa, however, this approach remains largely
   unexplored. In 2018, Kwon et al. first applied a proteogenome approach
   to identify six mutated peptides, related to androgen-independent PCa,
   in the conditioned media of human PCa cell lines; these peptides were
   reported as specific markers for PCa and for metastasis sites [21].
   More recently, the same team identified seventy mutant peptides in PCa
   cell lines, of which seven were differentially expressed in PCa
   compared with normal tissues [22].

   To identify a panel of putative protein markers to be evaluated in a
   non-invasively collected body fluid for PCa screening, the urine
   proteome and proteogenome of PCa patients were characterized by an
   MS-based approach. The integration of results was used to select
   candidate targets for small-scale clinical testing. MS is widely used
   to discover urinary protein biomarkers for cancer, including PCa
   [23]. Usually, biomarker discovery relies on a shotgun proteomics
   approach, followed by a validation phase using antibody-based
   techniques or targeted MS. Considering the complex mixture of proteins
   in urine, separation methodologies are important to increase
   sensitivity. Thus, a combination of gel-based and gel-free methods,
   such as GeLC-MS/MS, appears to be a robust and reproducible method for
   proteome analysis [24], warranting its application in the present
   work.

   This work aims to improve the diagnosis of PCa by investigating the
   effect of new mutations in proteins that can be detected in urine, a
   non-invasively collected fluid. Additionally, it overcomes the
   limitations of prior studies by combining two software packages for
   MS data analysis, a proteogenome approach, and a detailed review and
   integration of other exploratory proteome analyses to select protein
   targets.

2. Materials and Methods

2.1. Urine Proteome Profile of PCa Patients and Cancer-Free Subjects

2.1.1. Patients and Sample Collection

   Urine samples were collected, without prior prostate massage, from
   patients diagnosed with PCa at the Portuguese Oncology Institute of
   Porto (IPO Porto, Porto, Portugal), before surgery or therapy. Patients
   with other types of cancer, obesity, or autoimmune diseases were
   excluded, and cancer-free subjects had no clinically apparent prostatic
   disease. All available clinical data of the subjects enrolled in this
   study (discovery (d) and testing cohorts) are presented in Tables S1
   and S2. The discovery cohort comprised five PCa patients and five
   cancer-free subjects (controls). The testing cohort comprised thirty
   PCa patients and thirty cancer-free subjects; patients with benign
   prostatic diseases, such as BPH, were not included because samples
   were unavailable.

2.1.2. Urine Sample Preparation

   Urine samples were kept at 4 °C and centrifuged at 4000× g for 20 min
   at 4 °C. The supernatant (4.5 mL per sample) was collected and stored
   at −80 °C until laboratory analysis. Each urine sample was concentrated
   using a filter device (10 kDa cut-off, Vivaspin 500, Sartorius Biotech)
   by sequential centrifugations at 10,000× g for 10 min at 10 °C.
   Afterward, the retentate was resuspended in 0.5 M Tris pH 6.8 with 4%
   SDS, and protein concentration was assessed with the DC™ Protein Assay
   kit (Bio-Rad, Hercules, CA, USA).

2.1.3. SDS-PAGE

   The volume equivalent to 50 µg of protein was precipitated overnight
   with cold acetone (−20 °C) and centrifuged at 14,000× g for 30 min at
   4 °C. The precipitated protein was then mixed 1:1 with Laemmli sample
   loading buffer (0.5 M Tris-HCl pH 6.8, 15% glycerol, 4% SDS, 20%
   2-mercaptoethanol, bromophenol blue), heated to 100 °C for 5 min, and
   separated on 12% Tris-Glycine gels. Following electrophoretic
   separation, gels were fixed in methanol:acetic acid:water (4:1:5; for
   30 min) and stained with Colloidal Coomassie Blue G250 (overnight).
   Gels were destained with 20% methanol until optimal contrast was
   achieved.

2.1.4. Liquid Chromatography Tandem-Mass-Spectrometry (LC-MS/MS)

   Tryptic digestion was performed according to Shevchenko et al. [25],
   with a few modifications. All protein bands were manually excised
   from the gels and sliced into ten sections. The gel pieces were
   washed with 25 mM ammonium bicarbonate (NH₄HCO₃) and acetonitrile
   (ACN). Proteins were reduced with dithiothreitol (10 mM, 30 min,
   60 °C) and alkylated in the dark with iodoacetamide (55 mM, 30 min,
   25 °C). The gel pieces were washed with 100 mM NH₄HCO₃ and then with
   ACN. Gel pieces were vacuum-dried (SpeedVac, Thermo Savant), and
   proteins were digested with trypsin (Pierce™ Trypsin Protease, MS
   Grade; Thermo Scientific™, Waltham, MA, USA) in 50 mM NH₄HCO₃ at a
   final protease:protein ratio of 1:25 (w/w). After 30 min on ice,
   50 μL of 50 mM NH₄HCO₃ was added, and the samples were incubated for
   16 h at 37 °C. Tryptic peptides were extracted by serial addition of
   10% formic acid (FA), 10% FA:ACN (1:1) twice, and 90% ACN. Tryptic
   peptides were lyophilized and resuspended in 1% FA prior to HPLC
   injection. The samples were analyzed with an Orbitrap Q Exactive
   (Thermo Fisher Scientific, Bremen, Germany) through the EASY-Spray
   nano-ESI source (Thermo Fisher Scientific, Bremen, Germany) coupled to
   an UltiMate 3000 (Dionex, Sunnyvale, CA, USA) HPLC system. The trap
   (5 mm × 300 µm inner diameter) and the EASY-Spray analytical
   (150 mm × 75 µm) columns were C18 PepMap100 (Dionex, LC Packings,
   Sunnyvale, CA, USA), with a particle size of 3 µm. One analytical
   replicate was performed for each sample, and blank runs were acquired
   between samples. For quality control of nano-LC system performance, a
   cytochrome c digest (1 pmol/μL; lyophilized cytochrome c digest, P/N
   161089, Thermo Scientific) was routinely acquired. Peptides were
   trapped at 30 μL/min in 96% solvent A (0.1% FA). Elution was achieved
   with solvent B (0.1% FA/80% ACN, v/v) at 300 nL/min. The 92 min
   gradient, followed by column washing and re-equilibration (120 min in
   total), was as follows: 0–3 min, 4% solvent B; 3–70 min, 4–25% solvent
   B; 70–90 min, 25–40% solvent B; 90–92 min, 40–90% solvent B;
   92–100 min, 90% solvent B; 100–101 min, 90–4% solvent B; 101–120 min,
   4% solvent B. The mass spectrometer was operated at 2.2 kV in
   data-dependent acquisition mode. An MS2 method was used with an FT
   survey scan from 400 to 1600 m/z (resolution 70,000; automatic gain
   control (AGC) target 1 × 10⁶). The 10 most intense peaks were
   subjected to higher-energy collisional dissociation (HCD)
   fragmentation (resolution 17,500; AGC target 5 × 10⁴; normalized
   collision energy 28%; maximum injection time 100 ms; dynamic exclusion
   35 s).
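   To make the elution program easier to check, the sketch below encodes
   the listed time points as (time, %B) breakpoints and linearly
   interpolates between them. Linear ramps between breakpoints, and
   reading the first segment as 4% solvent B, are assumptions; the
   function name `percent_b` is hypothetical and not part of any
   instrument software.

```python
# Hypothetical sketch: the LC elution program as (time in min, % solvent B)
# breakpoints taken from the text; linear ramps between breakpoints are assumed.
GRADIENT = [
    (0.0, 4.0), (3.0, 4.0),   # 0-3 min: hold at 4% B
    (70.0, 25.0),             # 3-70 min: 4-25% B
    (90.0, 40.0),             # 70-90 min: 25-40% B
    (92.0, 90.0),             # 90-92 min: 40-90% B (end of the 92 min gradient)
    (100.0, 90.0),            # 92-100 min: wash at 90% B
    (101.0, 4.0),             # 100-101 min: return to 4% B
    (120.0, 4.0),             # 101-120 min: re-equilibration at 4% B
]


def percent_b(t_min: float, program=GRADIENT) -> float:
    """Programmed % solvent B at time t_min, by linear interpolation."""
    start, end = program[0][0], program[-1][0]
    if not start <= t_min <= end:
        raise ValueError(f"{t_min} min is outside the {start}-{end} min program")
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise RuntimeError("unreachable: breakpoints must be sorted by time")


if __name__ == "__main__":
    for t in (0, 36.5, 80, 91, 95, 110):
        print(f"{t:>6} min -> {percent_b(t):5.1f} % B")
```

   Encoding the program as data rather than prose makes it simple to
   confirm, for example, that the organic content reaches 90% B exactly
   at the end of the 92 min gradient.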

2.1.5. Protein Identification and Quantification

   MaxQuant (version 1.6.5.0, Max Planck Institute of Biochemistry) and
   Proteome Discoverer (version 2.2, Thermo Fisher Scientific) were used
   for peptide identification and label-free quantification. MS/MS
   spectra were searched with the Andromeda search engine (in MaxQuant)
   and with the MS Amanda and Sequest HT search engines (in Proteome
   Discoverer) against the UniProt (TrEMBL and Swiss-Prot) Homo sapiens
   protein sequence database (version December 2018). In both packages,
   the search parameters were as follows: methionine oxidation, protein
   N-terminal acetylation, and phosphorylation as variable modifications,
   and cysteine carbamidomethylation as a fixed modification. The
   precursor mass tolerance was 20 ppm for MaxQuant and 10 ppm for
   Proteome Discoverer, and the fragment ion mass tolerance was 0.15 Da
   (MaxQuant) and 0.02 Da (Proteome Discoverer). The minimum peptide
   length was set to 7 amino acids, and at most 2 missed cleavages were
   allowed in both packages. The false discovery rate (FDR) for
   identification was set to 1% at the peptide and protein levels. Only
   the top-ranking protein of each group (master protein) identified with
   at least two peptides was considered. Contaminants were excluded based
   on the MaxQuant contaminant list and the common Repository of
   Adventitious Proteins (cRAP) from The GPM
   (https://www.thegpm.org/crap/, accessed on 2 April 2019).
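   A 1% FDR threshold such as the one above is typically enforced with a
   target-decoy strategy: spectra are also searched against reversed
   (decoy) sequences, and the score cutoff is chosen so that decoy hits
   make up at most 1% of the accepted target hits. The sketch below
   illustrates that generic idea at the peptide-spectrum match (PSM)
   level. It is not the MaxQuant or Proteome Discoverer implementation
   (either package may apply additional rescoring), and the `PSM` class
   and `filter_at_fdr` function are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PSM:
    """A peptide-spectrum match: search score (higher is better) and decoy flag."""
    score: float
    is_decoy: bool


def filter_at_fdr(psms: list[PSM], max_fdr: float = 0.01) -> list[PSM]:
    """Return target PSMs accepted at the given target-decoy FDR.

    PSMs are ranked by score; at each rank the FDR is estimated as
    decoys / targets among PSMs scoring at or above that rank. The deepest
    rank whose estimate is <= max_fdr sets the cutoff, which is equivalent
    to accepting PSMs with q-value <= max_fdr. Score ties are not handled.
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = 0
    cutoff = 0  # number of top-ranked PSMs to accept
    for rank, psm in enumerate(ranked, start=1):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = rank
    return [p for p in ranked[:cutoff] if not p.is_decoy]


if __name__ == "__main__":
    # Toy, made-up scores: 100 target PSMs and two decoys.
    toy = [PSM(float(100 - i), False) for i in range(100)]
    toy += [PSM(50.5, True), PSM(0.5, True)]
    print(len(filter_at_fdr(toy, 0.01)), len(filter_at_fdr(toy, 0.005)))
```

   In the toy example, a single decoy ranked midway is tolerated once 100
   targets have accumulated above the cutoff at 1% FDR, but not at 0.5%,
   which is why the stricter threshold keeps only the 50 targets ranked
   above the first decoy.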

   The MS proteome data have been deposited to the ProteomeXchange
   Consortium via the PRIDE [26] partner repository with the data set
   identifier PXD017902.

2.1.6. Exploratory Analysis of Urine Proteome Data

   Protein abundances from Proteome Discoverer (normalized to the
   respective median) and normalized LFQ intensities from MaxQuant were
   log2-transformed. In an exploratory analysis of the proteome data,
   proteins identified in all individuals were used as variables for
   Principal Component Analysis (PCA) and Heatmap analyses, both
   performed in MetaboAnalyst 5.0 [27]. To identify dysregulated proteins
   in PCa patients, the fold change in protein abundance between PCa
   patients and cancer-free subjects was calculated from the difference
   between the group averages of log2 protein intensities, and the
   statistical significance of this difference was assessed with
   Student’s t-test.
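   The fold-change and significance calculation described above can be
   sketched with the Python standard library alone, assuming the inputs
   are already log2-transformed intensities of one protein in each group.
   The fold change is taken as 2 raised to the difference of the group
   means of the log2 values, and the two-sided p-value of the
   pooled-variance Student’s t-test is computed from the closed-form
   series for the Student t distribution with integer degrees of freedom
   (as tabulated in Abramowitz and Stegun, chapter 26). The function names
   are hypothetical; in practice, MetaboAnalyst or `scipy.stats.ttest_ind`
   would be the usual tools.

```python
import math
from statistics import mean, variance


def log2_fold_change(pca_log2, ctrl_log2):
    """Return (mean log2 difference, linear fold change), PCa relative to controls."""
    diff = mean(pca_log2) - mean(ctrl_log2)
    return diff, 2.0 ** diff


def _two_sided_p(t, df):
    """Two-sided p-value of Student's t for integer df via the closed-form series."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 0:
        # P(|T| < |t|) = sin(theta) * [1 + c^2/2 + (1*3)/(2*4) c^4 + ...]
        term = total = 1.0
        for k in range(1, df // 2):
            term *= c * c * (2 * k - 1) / (2 * k)
            total += term
        inside = s * total
    else:
        # P(|T| < |t|) = (2/pi) * [theta + s*c*(1 + (2/3) c^2 + ...)]; no sum for df = 1
        total = 0.0
        if df > 1:
            term = total = 1.0
            for k in range(1, (df - 1) // 2):
                term *= c * c * (2 * k) / (2 * k + 1)
                total += term
        inside = (2.0 / math.pi) * (theta + s * c * total)
    return max(0.0, 1.0 - inside)


def student_t_test(a, b):
    """Pooled-variance two-sample Student's t-test; returns (t, df, two-sided p)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df, _two_sided_p(t, df)


if __name__ == "__main__":
    # Made-up log2 intensities for one protein (five subjects per group).
    pca = [24.1, 24.8, 25.0, 24.6, 25.3]
    ctrl = [23.2, 23.9, 23.5, 24.0, 23.4]
    diff, fc = log2_fold_change(pca, ctrl)
    t, df, p = student_t_test(pca, ctrl)
    print(f"log2 diff = {diff:.2f}, fold change = {fc:.2f}, t = {t:.2f}, df = {df}, p = {p:.4f}")
```

   Working on log2 values makes the fold change symmetric: a protein with
   half the abundance in PCa gives a log2 difference of −1 (fold change
   0.5), mirroring a doubling at +1.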

2.1.7. Comparison with a Previous Bioinformatic Analysis of Putative Urinary
Markers of PCa and Selection of Candidate Protein Targets for the Testing
Phase

   Dysregulated proteins were compared with the results of a bioinformatic
   analysis that compared and mined the proteome profiles of prostate
   tumor tissue and urine from PCa patients reported by several MS
   studies [28]. That analysis reported 2641 and 616 dysregulated
   proteins in prostate tumor tissue and in urine from PCa patients,
   respectively. To position the urine proteome as a reflection of
   events taking place in prostate tissue and to identify specific
   urinary protein targets for PCa, the dysregulated proteins identified
   in tumor tissue and in urine were compared, resulting in 339
   overlapping proteins. Accordingly, the selection criterion for
   candidate proteins to be tested was that a protein found dysregulated
   by MS in the present work was also among the 2641 proteins
   dysregulated in prostate tumor tissue or among the 339 urinary
   proteins with prostate expression. The selected proteins were then
   compared with the normal human urinary proteome [29].
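   The selection rule above amounts to a few set operations. The sketch
   below applies it to toy identifiers; the function name and all example
   sets are hypothetical placeholders, not the study’s protein lists.

```python
def select_candidates(dysregulated_here, tissue_dysregulated,
                      urine_with_prostate_expression, normal_urine_proteome):
    """Apply the candidate-selection rule described in the text.

    A protein dysregulated in this study is kept if it is also among the
    proteins dysregulated in prostate tumor tissue OR among the urinary
    proteins with prostate expression. Returns the candidates and the
    subset of candidates also present in the normal human urinary proteome.
    """
    here = set(dysregulated_here)
    candidates = here & (set(tissue_dysregulated) | set(urine_with_prostate_expression))
    return candidates, candidates & set(normal_urine_proteome)


if __name__ == "__main__":
    # Toy placeholder identifiers only.
    here = {"P1", "P2", "P3", "P4"}
    tissue = {"P1", "P2", "P9"}
    urine_prostate = {"P1"}  # tissue-urine overlap, hence a subset of `tissue`
    normal_urine = {"P1", "P2", "P7"}
    print(select_candidates(here, tissue, urine_prostate, normal_urine))
```

   If the 339 urinary proteins are exactly the overlap between the tissue
   and urine lists, they form a subset of the 2641 tissue proteins, so the
   "or" condition reduces to membership in the tissue list; the overlap
   set then mainly flags which candidates also have urinary evidence.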

2.1.8. Measurement of Candidate Protein Targets in Urine Using Immunoblot

   The selected protein targets from the discovery phase were tested by
   slot blot or Western blot immunoassays. For slot blot analysis,
   performed according to Caseiro et al. [30], the concentrated urine
   protein fraction was diluted in TBS to a final protein concentration
   of 0.01 μg/μL and slot-blotted onto a nitrocellulose membrane
   (Amersham Protran NC 0.45; Amersham Pharmacia Biotech,
   Buckinghamshire, UK). Antibody specificity, selectivity, and
   sensitivity had previously been assessed by Western blot, with bands
   appearing at the expected molecular weights and no evidence of
   non-specific antibody binding. The blocking and incubation conditions
   were optimized as follows: EFEMP1 (GTX111657: 1:1000, 1 h; GE
   Amersham-NA934: HRP-linked donkey anti-rabbit 1:10,000); AMBP
   (sc-81948: 1:1000, 1 h; GE Amersham-NA931: HRP-linked sheep anti-mouse
   1:5000); LMAN2 (sc-130026, 1 h; GE Amersham-NA931: HRP-linked sheep
   anti-mouse 1:5000). For Western blot, 20 µg of protein from each
   sample was separated on a 12% SDS-PAGE gel and transferred onto
   nitrocellulose membranes. In both immunoblot experiments, Ponceau S
   staining was used to normalize the antibody signal to total protein
   levels. In both cases, membranes were washed with TBS-T (25 mM
   Tris-HCl, pH 7.4, 150 mM NaCl, 0.1% Tween 20) and imaged in a
   ChemiDoc™ Touch imaging system (Bio-Rad) using enhanced
   chemiluminescence (ECL Select Western Blotting Detection Reagent,
   RPN2235, Amersham). Optical density was assessed with Image Lab
   software (Bio-Rad) and normalized to a loading control sample. Western
   blot conditions were: CDH1 (GTX629691: 1:1000, 1 h; GE Amersham-NA931:
   HRP-linked sheep anti-mouse 1:5000); TTR (GTX100577: 1:500, 1 h; GE
   Amersham-NA934: HRP-linked donkey anti-rabbit 1:10,000).
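   The text states that band signals were normalized to Ponceau S
   total-protein staining and to a loading control sample, without giving
   the formula. One plausible way to combine the two corrections, assumed
   here rather than taken from the source, is a ratio of ratios: the
   target band’s optical density over its lane’s Ponceau S signal,
   divided by the same ratio for the loading control sample on that
   membrane. The function name is hypothetical.

```python
def normalized_band_signal(band_od, ponceau_od, ref_band_od, ref_ponceau_od):
    """Relative band intensity after total-protein and loading-control correction.

    band_od / ponceau_od          -> signal per unit of total protein in the lane
    ref_band_od / ref_ponceau_od  -> the same quantity for the loading control sample
    Their ratio expresses each sample relative to the control (control = 1).
    """
    if min(ponceau_od, ref_band_od, ref_ponceau_od) <= 0:
        raise ValueError("denominators must be positive optical densities")
    return (band_od / ponceau_od) / (ref_band_od / ref_ponceau_od)


if __name__ == "__main__":
    # Made-up densitometry values (arbitrary units), for illustration only.
    print(normalized_band_signal(3.0, 2.0, 1.0, 2.0))  # 1.5 / 0.5 = 3.0
```

   Expressing every lane relative to the same control sample is what
   allows signals from different membranes to be compared.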

2.1.9. Measurement of Urinary PSA Levels

   Urinary PSA levels were determined using the same method (Elecsys total
   PSA, 08791732500) used to determine serum PSA levels. This
   electrochemiluminescence assay is used in the clinical routine of IPO
   Porto. It quantifies total PSA (free + complexed PSA) using a Cobas e
   801 module, a member of Roche Cobas 8000 Modular Analyzer (Roche,
   Woerden, The Netherlands).

2.2. Urine Proteogenome Profile of PCa Patients and Cancer-Free Subjects

2.2.1. Identification of Cancer-Associated Mutations

   Considering the high impact of mutations on cancer progression, the
   proteogenome profile of urine from PCa patients was explored. For this,
   mass spectra resulting from the MS analysis were searched against a
   database built into the Pinnacle software
   (https://rimuhc.ca/-/protein-quantification-software-pinnacle?redirect=%2Fproteomics-software,
   accessed on 5 January 2022). This analysis aimed to investigate
   whether cancer-associated mutations were translated into proteins
   present in the urine of PCa patients. To select high-confidence
   urinary proteins with a very likely origin in the prostate, only
   mutations in proteins present in all samples and with known prostate
   expression were considered. The prostate proteome was retrieved from
   the Human Protein Atlas (HPA) database and from the above-mentioned
   bioinformatic analysis [28]. The HPA prostate proteome consisted of
   proteins with evidence at the protein level and was last accessed on
   8 November 2021.
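   Database-driven proteogenome searches rely on the fact that a
   single-residue substitution changes the tryptic peptides that cover
   it. The sketch below parses the GENE*RefPosAlt notation used in this
   work (for example, HSPG2*Q1062H), applies the substitution to a protein
   sequence, and lists the tryptic peptides produced only from the mutant
   sequence. It is a simplified illustration under stated assumptions
   (trypsin cleaves after K or R unless the next residue is P; up to 2
   missed cleavages and a 7-residue minimum, mirroring the search settings
   in Section 2.1.5). It is not how Pinnacle builds its database, and the
   toy sequence and all function names are hypothetical.

```python
import re

# GENE*RefPosAlt, e.g. "HSPG2*Q1062H" (position is 1-based in the protein).
MUTATION_RE = re.compile(r"^(?P<gene>[A-Za-z0-9-]+)\*(?P<ref>[A-Z])(?P<pos>\d+)(?P<alt>[A-Z])$")


def parse_mutation(label):
    """Split a label such as 'HSPG2*Q1062H' into (gene, ref, position, alt)."""
    m = MUTATION_RE.match(label)
    if m is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    return m["gene"], m["ref"], int(m["pos"]), m["alt"]


def apply_substitution(seq, ref, pos, alt):
    """Return seq with residue `pos` (1-based) changed from ref to alt."""
    if seq[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + alt + seq[pos:]


def tryptic_peptides(seq, missed_cleavages=2, min_len=7):
    """Simplified in-silico trypsin digest: cleave after K/R unless followed by P."""
    sites = [i + 1 for i in range(len(seq) - 1) if seq[i] in "KR" and seq[i + 1] != "P"]
    bounds = [0] + sites + [len(seq)]
    peptides = set()
    for i in range(len(bounds) - 1):
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            peptide = seq[bounds[i]:bounds[j]]
            if len(peptide) >= min_len:
                peptides.add(peptide)
    return peptides


def variant_peptides(seq, ref, pos, alt, **digest_options):
    """Tryptic peptides produced from the mutant sequence but not from the reference."""
    mutant = apply_substitution(seq, ref, pos, alt)
    return tryptic_peptides(mutant, **digest_options) - tryptic_peptides(seq, **digest_options)


if __name__ == "__main__":
    toy_seq = "AAAAAAAKGGGQGGGGRCCCCCCC"  # hypothetical 24-residue sequence
    print(parse_mutation("HSPG2*Q1062H"))
    print(sorted(variant_peptides(toy_seq, "Q", 12, "H")))
```

   Allowing missed cleavages yields several overlapping variant peptides
   for one substitution; with no missed cleavages, the toy example leaves
   only the single fully cleaved peptide that carries the new residue.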

2.2.2. Exploratory Analysis of Urine Proteogenome Data

   The abundances of proteins with known prostate expression in Pinnacle
   were log2-transformed. In an exploratory analysis of proteogenome
   data, the levels of mutant protein isoforms identified in all
   individuals were used as variables to perform Principal Component
   Analysis (PCA) and Heatmap analyses, both performed in MetaboAnalyst
   5.0 [27]. To identify dysregulated mutant proteins in PCa patients,
   the fold change in protein abundance between PCa patients and
   cancer-free subjects was calculated from the difference between the
   group averages of log2 protein intensities, and the statistical
   significance of this difference was assessed with Student’s t-test.

2.2.3. Integration with the Cancer Genome Atlas (TCGA), DisGeNET and
Literature Data

   To investigate whether mutations identified in proteins with known
   prostate expression had already been described in PCa, TCGA,
   DisGeNET (v7.0), and literature data were searched.

   TCGA is a cancer genomics consortium whose data
   (https://www.cancer.gov/tcga, accessed on 12 January 2022) encompass
   the profiling of over 20,000 primary tumors and matched non-tumoral
   samples from various human cancers, including PCa. The
   characterization of PCa samples disclosed 20,237 mutated genes and
   33,334 mutations. DisGeNET is one of the largest repositories of
   gene–disease associations (GDAs) and variant–disease associations
   (VDAs) [31]. The latest version of DisGeNET contains 1,134,942 GDAs
   and 369,554 VDAs. In the present work, variants associated with PCa
   were extracted from the DisGeNET entry Prostate carcinoma (C0600139)
   in January 2022.

2.2.4. Comparison of the Levels of Native and Mutant Forms of Proteins in the
Urine from PCa Patients

   To investigate the influence of mutations on the abundance of proteins
   with known expression in the prostate, the levels of their native and
   mutant forms were compared.

2.2.5. Prediction of the Likely Impact of Single-Residue Substitutions in
Proteins

   The PolyPhen-2 (Polymorphism Phenotyping v2) web tool was used to
   predict the likely impact of each amino acid substitution on the
   structure and function of the proteins with known prostate expression
   [32]. Each mutation is assigned a score, the probability of the
   substitution being damaging, together with sensitivity and
   specificity values that indicate the confidence of the prediction.
   According to the PolyPhen-2 tool, single-residue substitutions in the
   protein sequence can be classified as benign (score: 0–0.4), possibly
   damaging (score: 0.4–0.9), or probably damaging (score: 0.9–1) [33].
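   The score bands quoted above can be expressed as a small lookup. Which
   class owns the exact boundary values 0.4 and 0.9 is not specified in
   the text and is an assumption here; the PolyPhen-2 server also reports
   its own qualitative prediction using classifier-specific cutoffs,
   which should take precedence over this simplified mapping. The
   function name is hypothetical.

```python
def polyphen_category(score: float) -> str:
    """Map a PolyPhen-2 probability score to the bands quoted in the text.

    Assumed boundaries: [0, 0.4) benign, [0.4, 0.9) possibly damaging,
    [0.9, 1] probably damaging.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("PolyPhen-2 scores are probabilities in [0, 1]")
    if score < 0.4:
        return "benign"
    if score < 0.9:
        return "possibly damaging"
    return "probably damaging"


if __name__ == "__main__":
    for s in (0.05, 0.62, 0.998):
        print(s, polyphen_category(s))
```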

2.2.6. Protein–Protein Interaction Analysis

   Given the pivotal role of protein–protein interactions (PPIs) in
   cancer and the possible effect of mutations on their dynamics, the
   interactions between proteins in which point mutations had been
   identified were explored. For this, the STRING database (v11.5) was
   queried on 12 January 2022, and only protein interactions with a
   confidence score ≥ 0.4 were considered [34]. However, we must be
   cautious when extrapolating the significance of these PPIs to
   biological fluids such as urine, as most PPIs are identified or
   predicted from studies in cells and tissues.
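   The confidence filter above can be sketched as follows for interaction
   records already retrieved from STRING. The record layout (protein A,
   protein B, combined score on a 0–1 scale) and the function name are
   assumptions; STRING’s downloadable link files store the combined score
   multiplied by 1000, so the 0.4 threshold corresponds to 400 there.
   Whether one or both partners must carry a mutation is left as a
   parameter, defaulting to both.

```python
from typing import Iterable, List, Set, Tuple

Interaction = Tuple[str, str, float]  # (protein_a, protein_b, combined_score in [0, 1])


def confident_interactions(edges: Iterable[Interaction], mutated: Set[str],
                           min_score: float = 0.4,
                           require_both: bool = True) -> List[Interaction]:
    """Keep interactions with combined score >= min_score whose partners are
    mutated proteins (both partners if require_both, otherwise at least one)."""
    kept = []
    for a, b, score in edges:
        hits = (a in mutated) + (b in mutated)  # booleans add as 0/1
        if score >= min_score and hits >= (2 if require_both else 1):
            kept.append((a, b, score))
    return kept


if __name__ == "__main__":
    # Placeholder identifiers and scores, not real STRING output.
    toy_edges = [("P1", "P2", 0.91), ("P1", "P3", 0.35), ("P2", "X9", 0.70)]
    print(confident_interactions(toy_edges, {"P1", "P2", "P3"}))
    print(confident_interactions(toy_edges, {"P1", "P2", "P3"}, require_both=False))
```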

2.2.7. Prediction of the Likely Impact of Single-Residue Substitutions in
Protein–Protein Affinity

   Considering the impact of mutations on PPIs, the SAAMBE-SEQ web server
   was used to predict the effect of the point mutations detected in this
   work on protein binding affinity [35].

2.3. Statistical Data Analysis

   Statistical analyses were carried out in R software for Windows version
   3.6.2 and GraphPad Prism version 6.0 (GraphPad Software, Inc.; San
   Diego, CA, USA). The Shapiro–Wilk normality test and visual inspection
   of the histograms were used to assess the data distribution. To
   evaluate the effect size of the dysregulated proteins when comparing
   the tested groups, Cohen’s d was determined. Differences were
   considered statistically significant at p ≤ 0.05. The clinical
   parameters and protein levels are expressed as mean ± standard
   deviation (SD).
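   For reference, Cohen’s d for two independent groups is the difference
   of the means divided by the pooled standard deviation. The sketch below
   assumes the pooled SD is computed from the sample (n − 1) variances,
   the common textbook form; the text does not state which variant, or
   which method for the 95% confidence intervals reported in Tables 1 and
   2, was used, so no confidence interval is computed here. The magnitude
   labels follow Cohen’s conventional thresholds (0.2, 0.5, 0.8), using
   strict inequalities as in the text’s |d| > 0.8. Function names are
   hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d: (mean_a - mean_b) / pooled SD, with pooled SD from n-1 variances."""
    n1, n2 = len(group_a), len(group_b)
    pooled_var = ((n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)) / (n1 + n2 - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def effect_size_label(d):
    """Cohen's conventional magnitude labels applied to |d| (strict inequalities)."""
    size = abs(d)
    if size > 0.8:
        return "large"
    if size > 0.5:
        return "medium"
    if size > 0.2:
        return "small"
    return "negligible"


if __name__ == "__main__":
    d = cohens_d([1, 2, 3], [4, 5, 6])
    print(d, effect_size_label(d))
```

   The sign of d carries the direction of change, consistent with the
   negative values reported for proteins downregulated in PCa patients.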

3. Results

3.1. Urine Proteome Profile of PCa Patients and Cancer-Free Subjects

   To identify potential protein targets for PCa prediction, shotgun
   proteomics was performed on urine collected from PCa patients and
   cancer-free subjects. To increase the robustness of MS data analysis,
   two software packages, MaxQuant and Proteome Discoverer, were
   combined, using three search engines in total (Andromeda, MS Amanda,
   and Sequest HT).

   Considering only the top-ranking protein of each group identified with
   at least two peptides, and filtering out identifications from reversed
   sequences and contaminants, 605 and 592 urinary proteins were
   identified by MaxQuant and Proteome Discoverer, respectively. In total,
   732 unique proteins were identified, with proteins identified by both
   packages counted only once.
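   The counts above are related by inclusion–exclusion. Writing M and P
   (notation introduced here) for the sets of proteins identified by
   MaxQuant and Proteome Discoverer, the number of proteins identified by
   both packages follows directly, although it is not stated explicitly:

```latex
|M \cup P| = |M| + |P| - |M \cap P|
\;\Longrightarrow\;
|M \cap P| = 605 + 592 - 732 = 465
```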

3.1.1. Exploratory Analysis of Urine Proteome Data

   Aiming to select proteins of interest for PCa monitoring, only
   proteins present in all samples analyzed by MaxQuant (82 proteins) and
   by Proteome Discoverer (84 proteins) were considered for further
   analysis. These high-confidence proteins were used separately for
   Principal Component Analysis (PCA) (Figure 1A and Figure 2A) and
   Heatmap analyses (Figure 1B and Figure 2B). With data from either
   package, PCA showed no separation of the groups. However, the heatmap
   of proteins identified by MaxQuant suggests a discrimination between
   PCa patients and non-cancer subjects based on two protein clusters:
   AZGP1 (zinc-alpha-2-glycoprotein)-SPP1 (osteopontin) and CD14
   (monocyte differentiation antigen CD14)-MASP2 (mannan-binding lectin
   serine protease 2) (Figure 1B). Proteins in the first cluster are
   mostly upregulated in PCa patients compared with non-cancer subjects,
   whereas proteins in the second cluster are predominantly
   downregulated in PCa patients.

Figure 1.


   Exploratory analysis of proteome data from MaxQuant. (A) Principal
   Component Analysis of the urine proteome of the two groups. (B) The
   heatmap of proteins identified in all individuals. Samples are
   represented in columns and proteins in rows. Proteins whose gene name
   is not available are indicated by their UniProt accession number. The
   dashed line on the heatmap indicates the two clusters of proteins.

Figure 2.


   Exploratory analysis of proteome data from Proteome Discoverer. (A)
   Principal Component Analysis of the urine proteome of the two groups.
   (B) The heatmap of proteins identified in all individuals. Samples are
   represented in columns and proteins in rows.

   Differential protein analysis then revealed 18 dysregulated proteins
   in PCa (p-value ≤ 0.05): 4 proteins identified only by Proteome
   Discoverer, 9 proteins only by MaxQuant, and 5 proteins (Cadherin-1
   (CDH1), EGF-containing fibulin-like extracellular matrix protein 1
   (EFEMP1), Prostate-specific antigen (PSA) (KLK3), Secreted and
   transmembrane protein 1 (SECTM1), and Transthyretin (TTR)) by both
   packages. Altogether, 11 proteins were significantly downregulated
   (fold change less than 1) and 7 proteins were significantly
   upregulated (fold change greater than 1) in PCa patients (Table 1 and
   Table 2). Reassuringly, PSA, the most widely used biomarker for PCa
   diagnosis, was one of the dysregulated proteins common to the
   analyses by both software packages. All proteins showing significant
   differences (p-value ≤ 0.05) between the tested groups also showed a
   “large” effect size (|Cohen’s d| > 0.8) (Table 1 and Table 2). Besides
   a large effect size, the dysregulated proteins identified by both
   packages presented a consistent direction of dysregulation. Notably,
   eleven proteins responsible for the separation of groups in the
   heatmap of MaxQuant data (TTR, KLK3, SECTM1, CDH13, AMY2A, EFEMP1,
   ITIH4, HSPG2, PTGDS, CDH1, and LMAN2) were also found dysregulated in
   PCa patients. The urine proteome of PCa patients was characterized by
   decreased levels of SECTM1, CDH13, AMY2A, EFEMP1, ITIH4, HSPG2, PTGDS,
   CDH1, and LMAN2 and increased levels of TTR and KLK3.

Table 1.

   Dysregulated proteins between PCa patients and cancer-free subjects
   (Proteome Discoverer).
   UniProt ID | Protein Name | Gene Name | p-Value | Cohen’s d (95% CI: lower; upper)
   P07288 | Prostate-specific antigen | KLK3 | 0.00 | 4.21 (3.50; 4.91)
   Q8WVN6 | Secreted and transmembrane protein 1 | SECTM1 | 0.01 | −2.16 (−2.39; −1.93)
   P12830 | Cadherin-1 | CDH1 | 0.03 | −1.73 (−2.05; −1.41)
   P0DOX5 | Immunoglobulin gamma-1 heavy chain | N/A | 0.03 | 1.73 (1.39; 2.07)
   Q12805 | EGF-containing fibulin-like extracellular matrix protein 1 | EFEMP1 | 0.03 | −1.68 (−2.25; −1.12)
   P02766 | Transthyretin | TTR | 0.03 | 1.66 (0.86; 2.46)
   P01861 | Immunoglobulin heavy constant gamma 4 | IGHG4 | 0.04 | 1.52 (0.90; 2.15)
   P01034 | Cystatin-C | CST3 | 0.05 | 1.50 (0.91; 2.08)
   Q01459 | Di-N-acetylchitobiase | CTBS | 0.05 | −1.44 (−1.86; −1.02)

   The protein identification and label-free quantification performed by
   the Proteome Discoverer software revealed nine dysregulated proteins
   (p-value ≤ 0.05) between the tested groups. These proteins are shown in
   this table along with their p-value and effect size. The Cohen’s d for
   individual proteins is presented together with the lower and upper 95%
   confidence interval (CI). Abbreviation: Confidence interval (CI).

Table 2.

   Dysregulated proteins between PCa patients and cancer-free subjects
   (MaxQuant).
   UniProt ID | Protein Name | Gene Name | p-Value | Cohen’s d (95% CI: lower; upper)
   Q8WVN6 | Secreted and transmembrane protein 1 | SECTM1 | 0.01 | −2.10 (−2.48; −1.73)
   P07288 | Prostate-specific antigen | KLK3 | 0.01 | 2.01 (1.08; 2.95)
   P41222 | Prostaglandin-H2 D-isomerase | PTGDS | 0.01 | −1.97 (−2.44; −1.49)
   Q14624 | Inter-alpha-trypsin inhibitor heavy chain H4 | ITIH4 | 0.01 | −1.96 (−2.32; −1.60)
   Q12805 | EGF-containing fibulin-like extracellular matrix protein 1 | EFEMP1 | 0.01 | −1.84 (−2.33; −1.35)
   P55290 | Cadherin-13 | CDH13 | 0.02 | −1.75 (−2.11; −1.40)
   P98160 | Basement membrane-specific heparan sulfate proteoglycan core protein | HSPG2 | 0.03 | −1.63 (−2.07; −1.19)
   P04746 | Pancreatic alpha-amylase | AMY2A | 0.03 | −1.57 (−1.95; −1.19)
   P01876 | Immunoglobulin heavy constant alpha 1 | IGHA1 | 0.04 | 1.55 (1.32; 1.78)
   P02760 | Protein AMBP | AMBP | 0.04 | −1.51 (−1.88; −1.13)
   P12830 | Cadherin-1 | CDH1 | 0.05 | −1.48 (−1.90; −1.07)
   Q12907 | Vesicular integral-membrane protein VIP36 | LMAN2 | 0.05 | −1.46 (−2.10; −0.83)
   Q9NPP6 | Immunoglobulin heavy chain variant | N/A | 0.04 | 1.58 (1.22; 1.93)
   P02766 | Transthyretin | TTR | 0.05 | 1.42 (0.97; 1.87)

   The protein identification and label-free quantification performed by
   the MaxQuant software revealed fourteen dysregulated proteins (p-value
   ≤ 0.05) between the tested groups. These proteins are shown in this
   table along with their p-value and effect size. The Cohen’s d for
   individual proteins is presented together with the lower and upper 95%
   confidence interval (CI). Abbreviation: Confidence interval (CI).

3.1.2. Comparison with a Previous Bioinformatic Analysis of Putative Urinary
Markers of PCa and Selection of Candidate Protein Targets for the Testing
Phase

   To select the most promising proteins for further analysis,
   dysregulated proteins revealed by MS analysis were compared with
   proteins resulting from a bioinformatic analysis integrating urine and
   tumor tissue proteomes of PCa from several MS studies [28]. From
   this comparison, some common proteins emerged, such as AMBP, CDH1,
   EFEMP1, KLK3, SECTM1, LMAN2, and TTR.

   In our group’s previous study, AMBP, KLK3, LMAN2, and TTR were found
   dysregulated in both urine and tumor tissue from PCa patients, whereas
   SECTM1 was found only in urine and CDH1 and EFEMP1 only in PCa
   tissue.

   Taking these results together, and considering that candidate targets
   should be urinary proteins expressed in the prostate, we selected AMBP,
   CDH1, EFEMP1, KLK3, LMAN2, and TTR for testing in an independent
   cohort. Their presence in urine was expected, as they belong to the
   normal human urine proteome [29].

3.1.3. Measurement of Candidate Protein Targets in Urine

   Five protein targets (AMBP, CDH1, EFEMP1, LMAN2, and TTR) were tested by
   immunoblot in a larger, independent cohort (testing group). None of the
   MS findings could be reproduced (Table S3, Figure S1). Likewise,
   urinary PSA levels measured in the testing cohort did not agree with
   the MS findings (p = 0.29, Mann–Whitney test). The results are shown in
   Figure 3.

Figure 3.


   Urinary protein levels of the candidate targets for PCa in the
   discovery group (using MS) and in the testing group (using immunoblot
   and immunoassay). MS: mass spectrometry.

3.2. Urine Proteogenome Profile of PCa Patients and Cancer-Free Subjects

3.2.1. Identification of Cancer-Associated Mutations

   To characterize the proteogenome landscape of urine from PCa patients,
   MS/MS spectra were searched against a repository compiling somatic
   mutation data from a wide variety of databases. This search identified
   6418 mutated peptides corresponding to 1665 mutant protein isoforms. Of
   these, 609 mutated peptides, corresponding to 417 mutant protein
   isoforms, were associated with cancer. Only mutant protein isoforms
   present in all urine samples (322 proteins) were selected for further
   analysis. Immunoglobulins and highly abundant urinary proteins (serum
   albumin, uromodulin, serotransferrin) were then excluded because of
   their high abundance in biological samples and lack of specificity for
   cancer. This left 170 proteins, which corresponded to 122 proteins
   after duplicates were removed. Because our focus was on high-confidence
   proteins carrying mutations most likely of prostatic origin, these data
   were cross-referenced with the prostate proteome in the HPA database
   and in a bioinformatic analysis [28]. This yielded 86 proteins with
   known expression in the prostate (Table S4). Among these proteins are
   some of known relevance to PCa, namely Acid ceramidase (ASAH1),
   Extracellular superoxide dismutase [Cu-Zn] (SOD3), Glutathione
   S-transferase P (GSTP1), Osteopontin (SPP1), Prostatic acid phosphatase
   (PAP), and Zinc-alpha-2-glycoprotein (ZAG).
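The filtering cascade above can be sketched as a sequence of set operations. It keeps cancer-associated isoforms present in every urine sample, excludes immunoglobulins and highly abundant urinary proteins, collapses duplicates, and restricts the result to prostate-expressed proteins. This is a minimal illustration, not the authors' pipeline. The record layout (`MutantPeptideHit`), the gene-prefix rule for immunoglobulins, and the reading of "duplicates" as several mutations mapped to the same protein are all hypothetical assumptions.

```python
from dataclasses import dataclass

# Highly abundant urinary proteins excluded in the text:
# serum albumin (ALB), uromodulin (UMOD), serotransferrin (TF)
ABUNDANT_GENES = {"ALB", "UMOD", "TF"}
# Assumed heuristic: immunoglobulin gene names start with IGH, IGK or IGL
IMMUNOGLOBULIN_PREFIXES = ("IGH", "IGK", "IGL")


@dataclass(frozen=True)
class MutantPeptideHit:
    """Hypothetical record: one mutated peptide identified in one sample."""
    gene: str                # e.g. "KLK3"
    mutation: str            # e.g. "C209Y"
    sample_id: str           # urine sample identifier
    cancer_associated: bool  # annotated as a cancer-associated mutation


def filter_mutant_isoforms(hits, all_samples, prostate_genes):
    """Return {gene: set of mutations} surviving all filtering steps."""
    # Steps 1-2: cancer-associated isoforms seen in every urine sample
    samples_per_isoform = {}
    for hit in hits:
        if hit.cancer_associated:
            key = (hit.gene, hit.mutation)
            samples_per_isoform.setdefault(key, set()).add(hit.sample_id)
    required = set(all_samples)
    ubiquitous = [iso for iso, seen in samples_per_isoform.items() if seen >= required]

    # Step 3: drop immunoglobulins and highly abundant urinary proteins
    kept = [
        (gene, mut)
        for gene, mut in ubiquitous
        if gene not in ABUNDANT_GENES and not gene.startswith(IMMUNOGLOBULIN_PREFIXES)
    ]

    # Step 4: collapse duplicates into one entry per protein, keeping its mutations
    by_protein = {}
    for gene, mut in kept:
        by_protein.setdefault(gene, set()).add(mut)

    # Step 5: keep proteins with known prostate expression
    return {gene: muts for gene, muts in by_protein.items() if gene in prostate_genes}
```

The function takes the per-sample hits, the list of sample IDs, and a set of prostate-expressed gene names (for instance, compiled from the HPA). It returns one entry per protein, which mirrors how a protein such as KLK3 can carry several mutations.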

3.2.2. Exploratory Analysis of Urine Proteogenome Data

   Principal component analysis (PCA; Figure 4A) and heatmap analysis
   (Figure 4B) were performed on the levels of the mutant protein
   isoforms. The PCA of the proteogenome profiles did not separate PCa
   patients from non-cancer subjects. However, the heatmap indicates a
   discrimination between PCa patients and non-cancer subjects based on
   two protein clusters. The first runs from ITIH4*G893S
   (Inter-alpha-trypsin inhibitor heavy chain H4) to LMAN2*D222N
   (Vesicular integral-membrane protein VIP36). The second runs from
   KLK3*C209Y (PSA) to MVB12B*T198M (Multivesicular body subunit 12B)
   (Figure 4B). In the first cluster, mutant forms of proteins are mostly
   downregulated in PCa patients compared with non-cancer subjects,
   whereas in the second cluster they are predominantly upregulated in PCa
   patients.

Figure 4.


   Exploratory analysis of proteogenome data from Pinnacle. (A) Principal
   Component Analysis of the urine proteogenome of the two groups. (B) The
   heatmap of mutant proteins identified in all individuals. Samples are
   represented in columns and proteins in rows; each protein is labeled
   with its gene name and the identified mutation. The dashed line on the
   heatmap marks the two protein clusters.

3.2.3. Integration with the Cancer Genome Atlas (TCGA), DisGeNET and
Literature Data

   According to TCGA, DisGeNET, and the literature, only three of the
   mutations identified in the 86 proteins with known prostate expression
   have been described previously. These mutations (rs17632542, rs1695,
   rs7041) map to KLK3 (PSA) [36], GSTP1 (Glutathione S-transferase P)
   [37,38], and GC (Vitamin D-binding protein) [39], respectively. To the
   best of our knowledge, none of the remaining mutant protein isoforms
   has been associated with PCa. Especially notable are SPP1, VASN, ASAH1,
   RBP4, and ASS1, for which no PCa-related mutation had been described in
   the literature until now.

3.2.4. Comparison of the Levels of Native and Mutant Forms of Proteins in the
Urine from PCa Patients

   The analysis of proteogenome data revealed 6 differentially abundant
   mutant protein isoforms in PCa patients compared with cancer-free
   individuals, namely Protein AMBP (AMBP*A286G), Sodium/hydrogen
   exchanger 9B1 (SLC9B1*N70S), Basement membrane-specific heparan sulfate
   proteoglycan core protein (HSPG2*Q1062H), Zinc finger protein 624
   (ZNF624*S207F), Vasorin (VASN*R161Q), and Complement decay-accelerating
   factor (CD55*S162L) (Table S4, Figure S2). The mutant AMBP isoform was
   upregulated in PCa patients, whereas the remaining 5 differentially
   abundant mutant protein isoforms were downregulated.

   Comparing the proteome profiles obtained with MaxQuant and Proteome
   Discoverer with the proteogenome profile of PCa patients yielded 30 and
   31 common proteins, respectively. Among these, AMBP, CDH1, EFEMP1,
   HSPG2, ITIH4, KLK3, LMAN2, PTGDS, VASN, and CD55 stood out. The native
   forms of AMBP, CDH1, EFEMP1, HSPG2, ITIH4, KLK3, LMAN2, and PTGDS were
   dysregulated in urine from PCa patients, but among their mutant
   isoforms only AMBP*A286G and HSPG2*Q1062H were dysregulated (Figure
   S2). For the remaining common proteins, the mutations did not affect
   urinary abundance. Conversely, the native forms of VASN and CD55 were
   not dysregulated in urine from PCa patients, whereas their mutant
   isoforms (VASN*R161Q; CD55*S162L) were.
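The three patterns described above follow from simple set operations on the gene lists given in this section. In some proteins both forms are dysregulated, in others only the native form, and in others only the mutant isoform. The short sketch below makes that bookkeeping explicit. It uses only the proteins named in the text, not the full 30- and 31-protein overlaps, and the variable names are illustrative.

```python
# Proteins highlighted in the text as common to the proteome and proteogenome profiles
common = {"AMBP", "CDH1", "EFEMP1", "HSPG2", "ITIH4", "KLK3", "LMAN2", "PTGDS", "VASN", "CD55"}
# Native forms reported as dysregulated in PCa urine
native_dysregulated = {"AMBP", "CDH1", "EFEMP1", "HSPG2", "ITIH4", "KLK3", "LMAN2", "PTGDS"}
# Genes carrying the six differentially abundant mutant isoforms
mutant_dysregulated = {"AMBP", "SLC9B1", "HSPG2", "ZNF624", "VASN", "CD55"}

# Both the native form and the mutant isoform are dysregulated
both = common & native_dysregulated & mutant_dysregulated
# Only the native form is dysregulated
native_only = (common & native_dysregulated) - mutant_dysregulated
# Only the mutant isoform is dysregulated
mutant_only = (common & mutant_dysregulated) - native_dysregulated

print(sorted(both))         # ['AMBP', 'HSPG2']
print(sorted(native_only))  # ['CDH1', 'EFEMP1', 'ITIH4', 'KLK3', 'LMAN2', 'PTGDS']
print(sorted(mutant_only))  # ['CD55', 'VASN']
```

SLC9B1 and ZNF624 drop out of every category because they are not among the common proteins listed here.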

   The mutations identified in these proteins and in those with recognized
   relevance to PCa are summarized in Table 3.

Table 3.

   List of mutations mapped on some proteins and respective mutant
   peptides identified in urine from PCa patients.
   UniProt ID | Protein Name | Gene Name | Mutation Description | Mutation Type | Protein Role in PCa or Other Types of Cancer

   P02760 | Protein AMBP | AMBP | G238S; E192G; V69M; A286G; P197S; R185Q; G338S; G341A; I198T; V313I; G186R | missense |
   AMBP is an inflammation-regulating protein associated with human
   cancers [40,41], including PCa [42,43]. Increased urinary levels
   [6,42,44,45] but diminished levels in tumor prostate tissue have been
   reported in PCa patients [46,47,48].

   P12830 | Cadherin-1 | CDH1 | H233R; A408E | missense |
   CDH1 is implicated in cell adhesion, migration, and
   epithelial-mesenchymal transition [49,50], and its downregulation is
   correlated with poor prognosis in PCa patients [51].

   Q12805 | EGF-containing fibulin-like extracellular matrix protein 1 | EFEMP1 | V463M | missense |
   EFEMP1 plays a role in cell adhesion and migration, acting as a tumor
   suppressor in PCa. Diminished EFEMP1 mRNA and protein levels [52] and
   EFEMP1 promoter hypermethylation have been observed in PCa patients
   [53,54].

   P98160 | Basement membrane-specific heparan sulfate proteoglycan core protein | HSPG2 | V4332I; A1503V; S970F; M638V; Q1062H | missense |
   HSPG2, found predominantly in the ECM and bone marrow, modulates tumor
   angiogenesis, proliferation, and differentiation. It is overexpressed
   in PCa tissues compared with non-malignant tissues, correlating with
   high GS and with PCa cell proliferation and viability [55,56,57].

   Q14624 | Inter-alpha-trypsin inhibitor heavy chain H4 | ITIH4 | R866C; G893S | missense |
   ITIH4 is an acute-phase response protein whose function remains
   unclear [58]. Research points to tumor suppressor activity of ITIH4 in
   human cancers and to its dysregulation in PCa [43,59].

   P07288 | Prostate-specific antigen (PSA) | KLK3 | C209Y; V55M; G156V; AVCG (47–50); S117P; G87R; L124F; A154T; I179T | missense; inframe_insertion |
   PSA is widely used as a serum biomarker for PCa. It was approved by
   the US Food and Drug Administration (FDA) in 1994 [60].

   Q12907 | Vesicular integral-membrane protein VIP36 | LMAN2 | G250S; D229N | missense |
   LMAN2 is involved in endoplasmic reticulum-to-Golgi trafficking of
   some glycoproteins [61]. Dysregulation of the LMAN2 gene has been
   reported in some cancers [62,63,64], while its role in PCa remains
   obscure. However, increased urinary LMAN2 levels have been detected in
   PCa patients [44].

   P41222 | Prostaglandin-H2 D-isomerase | PTGDS | L130M | missense |
   PTGDS is involved in prostaglandin metabolism and lipid transport. The
   PTGDS gene is downregulated in malignant compared with non-malignant
   prostate tissues and is part of a signature that predicts relapse
   after prostatectomy. In vitro, its overexpression increased death and
   suppressed the growth of PCa cells [65,66].

   Q13510 | Acid ceramidase | ASAH1 | V246A | missense |
   ASAH1 hydrolyzes ceramide to sphingosine and fatty acid [67], and its
   protein levels are elevated in tumor prostate tissue [68]. Because its
   increased levels have been correlated with metastasis establishment and
   resistance to chemotherapy, ASAH1 has been suggested as a therapeutic
   target in PCa [69,70].

   P08294 | Extracellular superoxide dismutase [Cu-Zn] | SOD3 | A58T | missense |
   SOD3 is a known tumor suppressor gene in PCa. It encodes an
   antioxidant enzyme that catalyzes the dismutation of the superoxide
   radical anion [71]. Reduced SOD3 levels have been reported in PCa
   patients, and its overexpression in PCa cells prevented cell
   proliferation, migration, and invasion, suggesting a role as a
   therapeutic target and predictive marker [72,73].

   P09211 | Glutathione S-transferase P | GSTP1 | I105V | missense |
   GSTP1 is a known tumor suppressor gene in PCa and is responsible for
   cellular detoxification through glutathione conjugation [74]. PCa is
   characterized by loss of GSTP1 function, mostly due to hypermethylation
   of its regulatory CpG island [75], an event thought to occur early in
   prostatic carcinogenesis [76,77].

   P10451 | Osteopontin | SPP1 | A22G | missense |
   SPP1 is a bone matrix protein involved in bone remodeling, modulation
   of inflammation, cell adhesion and migration, and angiogenesis [78]. In
   PCa, SPP1 is associated with metastasis and proliferation [79], lower
   overall and biochemical relapse-free survival, and high GS [80]. Higher
   SPP1 levels have been reported in PCa patients [80,81,82].

   P15309 | Prostatic acid phosphatase | ACP3 | G68D | missense |
   PAP is one of the main proteins secreted by prostate cells and was the
   first serum screening marker for PCa. PAP was later replaced by PSA
   [83,84].

   P25311 | Zinc-alpha-2-glycoprotein | AZGP1 | P187L; A46T | missense |
   ZAG promotes adipocyte lipolysis, resulting in cancer cachexia [85].
   Elevated levels of this protein have been proposed as a serum marker
   for PCa [86,87], and urinary ZAG showed significant predictive ability
   [8].

   Q4ZJI4 | Sodium/hydrogen exchanger 9B1 | SLC9B1 | N70S | missense |
   SLC9B1 is a Na⁺/H⁺ transporter responsible for preserving cellular
   homeostasis [88], but it has not yet been associated with any type of
   cancer.

   Q9P2J8 | Zinc finger protein 624 | ZNF624 | S207F | missense |
   ZNF624 remains poorly studied. In breast cancer, it was one of the
   target genes of a microRNA found to be significantly and independently
   correlated with patient prognosis [89].

   Q6EMK4 | Vasorin | VASN | R161Q | missense |
   VASN, an inhibitor of TGF-β signaling, is upregulated in PCa tissues
   and stimulates PCa cell proliferation [90].

   P08174 | Complement decay-accelerating factor | CD55 | S162L | missense |
   CD55 inhibits the complement system [91]. In PCa, CD55 mediates tumor
   cell survival and growth [92].

   This table shows the UniProt IDs, protein and gene names, mutation
   site/description and type, and the role of proteins in PCa.

3.2.5. Prediction of the Likely Impact of Single-Residue Substitutions in
Proteins

   The PolyPhen-2 tool was used to predict the potential impact of point
   mutations on protein function. Notably, the AMBP*A286G and CD55*S162L
   mutant protein isoforms were predicted to be probably damaging, whereas
   SLC9B1*N70S, ZNF624*S207F, VASN*R161Q, and HSPG2*Q1062H were predicted
   to be benign. Most point mutations were predicted to be possibly or
   probably damaging. The results are presented in Table 4 and Table S5.

Table 4.

   PolyPhen-2 scores and predictions for the mapped mutations.
   Gene Name Mutation    Prediction     Score Sensitivity Specificity
     AMBP     G238S   Probably damaging 1.000    0.00        1.00
     AMBP     E192G   Probably damaging 0.981    0.75        0.96
     AMBP      V69M   Possibly damaging 0.758    0.85        0.92
     AMBP     A286G   Probably damaging 1.000    0.00        1.00
     AMBP     P197S        Benign       0.051    0.94        0.83
     AMBP     G338S   Probably damaging 0.994    0.69        0.97
     AMBP     G341A   Probably damaging 0.958    0.78        0.95
     AMBP     V313I        Benign       0.025    0.95        0.81
     AMBP     G186R   Probably damaging 1.000    0.00        1.00
     AMBP     R185Q   Probably damaging 0.992    0.70        0.97
     CDH1     H233R   Possibly damaging 0.831    0.84        0.93
     CDH1     A408E   Possibly damaging 0.798    0.84        0.93
    EFEMP1    V463M   Probably damaging 0.999    0.14        0.99
     HSPG2    V4332I       Benign       0.001    0.99        0.15
     HSPG2    A1503V  Probably damaging 1.00     0.00        1.00
     HSPG2    S970F   Possibly damaging 0.498    0.88        0.90
     HSPG2    M638V        Benign       0.00     1.00        0.00
     HSPG2    Q1062H       Benign       0.00     1.00        0.00
     ITIH4    R866C   Probably damaging   1      0.00        1.00
     ITIH4    G893S        Benign       0.00     1.00        0.00
     KLK3     C209Y   Probably damaging 1.000    0.00        1.00
     KLK3     G156V   Probably damaging 1.000    0.00        1.00
     KLK3      V55M   Probably damaging 0.972    0.77        0.96
     KLK3     S117P   Possibly damaging 0.621    0.87        0.91
     KLK3      G87R        Benign       0.128    0.93        0.86
     KLK3     L124F   Probably damaging 1.000    0.00        1.00
     KLK3     A154T   Possibly damaging 0.657    0.86        0.91
     KLK3     I179T   Possibly damaging 0.800    0.84        0.93
     LMAN2    G250S   Probably damaging 1.00     0.00        1.00
     LMAN2    D229N   Probably damaging 0.983    0.74        0.96
     PTGDS    L130M   Probably damaging 1.00     0.00        1.00
     ASAH1    V246A        Benign       0.00     1.00        0.00
     SOD3      A58T        Benign       0.188    0.92        0.87
     GSTP1    I105V        Benign       0.00     1.00        0.00
     SPP1      A22G   Possibly damaging 0.611    0.87        0.91
     ACP3      G68D   Probably damaging 1.00     0.00        1.00
     AZGP1    P187L   Probably damaging 0.94     0.69        0.97
     AZGP1     A46T        Benign       0.002    0.99        0.30
    SLC9B1     N70S        Benign       0.036    0.94        0.82
    ZNF624    S207F        Benign       0.214    0.92        0.88
     VASN     R161Q        Benign       0.019    0.95        0.80
     CD55     S162L   Probably damaging 0.990    0.72        0.97
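Table 4 pairs each PolyPhen-2 score with a qualitative label. In PolyPhen-2 these labels correspond to score cut-offs that depend on the classifier model, and this section does not state which model was used. The sketch below assumes the commonly cited cut-offs. For HumDiv, probably damaging is ≥ 0.957 and possibly damaging is ≥ 0.453; for HumVar, the corresponding cut-offs are ≥ 0.909 and ≥ 0.447. These values should be checked against the PolyPhen-2 release actually used, and the function name `classify_polyphen2` is hypothetical.

```python
# Commonly cited PolyPhen-2 cut-offs (assumption: verify against the
# PolyPhen-2 release and classifier model actually used).
# Each entry: (minimum score for "Probably damaging",
#              minimum score for "Possibly damaging")
POLYPHEN2_CUTOFFS = {
    "HumDiv": (0.957, 0.453),
    "HumVar": (0.909, 0.447),
}


def classify_polyphen2(score, model="HumDiv"):
    """Map a PolyPhen-2 probabilistic score in [0, 1] to a qualitative label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("PolyPhen-2 scores lie between 0 and 1")
    probably_min, possibly_min = POLYPHEN2_CUTOFFS[model]
    if score >= probably_min:
        return "Probably damaging"
    if score >= possibly_min:
        return "Possibly damaging"
    return "Benign"
```

For example, the V69M score of 0.758 in Table 4 falls in the possibly damaging band under either set of cut-offs.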

3.2.6. Protein–Protein Interaction Analysis

   Besides affecting protein function, mutations can also alter
   interactions between proteins and, consequently, important biological
   processes and signaling pathways. The STRING search tool was used to
   predict interactions between the proteins carrying point mutations.
   The resulting network (Figure 5) comprised 86 proteins (nodes) linked by
   214 edges with different confidence levels. The protein–protein
   interaction enrichment p-value was < 1.0 × 10⁻¹⁶. Reactome enrichment
   analysis identified 12 pathways enriched in this network (Table S6).
   Regulation of Insulin-like Growth Factor (IGF) transport and uptake by
   Insulin-like Growth Factor Binding Proteins (IGFBPs) ranked as the third
   most important pathway in this network, and Extracellular matrix (ECM)
   organization as the tenth. The network shows predicted interactions
   between most of the proteins.

Figure 5.


   PPI network of 86 mutated proteins with known expression in the
   prostate.

3.2.7. Prediction of the Likely Impact of Single-Residue Substitutions in
Protein–Protein Affinity

   The SAAMBE-SEQ tool was used to predict the impact of point mutations
   on PPIs. The likely effects of the AMBP*A286G, HSPG2*Q1062H, VASN*R161Q,
   and CD55*S162L point mutations on protein–protein interactions were
   assessed. Point mutations detected in SLC9B1 and ZNF624 were not
   examined, as these proteins do not interact with any other protein in
   the network. The impact of point mutations on proteins involved in the
   IGF pathway was also explored. This analysis predicted that these point
   mutations are likely to destabilize the corresponding PPIs (Table S7).

4. Discussion

   The limitations and invasive nature of serum-based PCa screening have
   driven the search for new candidate urinary biomarkers, especially
   protein markers. However, none has yet translated into a clinically
   useful tool, underscoring the need for novel biomarkers and/or new
   combinations of biomarkers. This study therefore combined a
   non-invasively collected biofluid, urine, with a high-throughput
   approach, proteomics, to identify new protein targets for predicting
   the risk of developing PCa. The work comprised three stages:
   characterization of the urine proteome profile and selection of
   protein targets; testing of shortlisted protein targets in a larger,
   independent cohort; and characterization of the urine proteogenome
   profile. The urine proteome profiles of PCa patients and cancer-free
   subjects were analyzed with two software packages, revealing 18
   dysregulated proteins, 5 of which (TTR, EFEMP1, CDH1, SECTM1, KLK3)
   were common to both packages. Integrating the urine proteome profile of
   PCa patients with proteome data from other studies reviewed by us [28]
   supported the selection of potentially discriminatory protein targets.
   As a result, AMBP, CDH1, EFEMP1, LMAN2, and TTR stood out as potential
   targets and were tested in an independent cohort of patients. In this
   testing phase, immunoblotting with an anti-E-cadherin antibody did not
   yield the band of about 120 kDa expected for the full-length protein,
   but rather a band of about 80 kDa. This 80 kDa fragment corresponds to
   soluble E-cadherin (sE-cadherin). Using antibody-based techniques,
   sE-cadherin has previously been identified in tissue and serum from
   PCa patients [93,94] and in urine from patients with other cancers
   [95,96]. To our knowledge, this is the first report of sE-cadherin
   detection in the urine of PCa patients. Kuefer et al. [93] suggested
   that the 80 kDa fragment originates from the extracellular domain of
   full-length E-cadherin. Increased levels of sE-cadherin have been
   reported in serum and tumor prostate tissue from PCa patients and
   correlate with disease stage [94,97,98]. When the differential
   abundances of the MS-detected proteins were tested by immunoblot in
   the independent cohort, the MS trends were not reproduced. Urinary PSA
   levels were also assessed in this cohort but did not distinguish PCa
   patients from controls, in agreement with other studies [99].

   The proteogenome landscape of urine from PCa patients was then
   characterized, revealing 1665 mutant protein isoforms, 417 of which
   carried cancer-associated mutations. After restricting the analysis to
   mutations present in all urine samples and to proteins with known
   prostate expression, 86 mutant protein isoforms remained. Among these
   proteins are some of known relevance to PCa, namely Acid ceramidase
   (ASAH1), Extracellular superoxide dismutase [Cu-Zn] (SOD3), Glutathione
   S-transferase P (GSTP1), Osteopontin (SPP1), Prostatic acid phosphatase
   (PAP), and Zinc-alpha-2-glycoprotein (ZAG). PAP is attracting renewed
   interest because it predicts cause-specific survival and GS better than
   serum PSA in men with high-risk PCa [100,101]. Notably, a form of PAP
   (PLPAcP) was recently suggested to be associated with early PCa [102].
   This study identified a new PAP mutation in a non-invasively collected
   biofluid, and that mutation was predicted to be probably damaging.
   Together, these findings reinforce the renewed interest in studying
   PAP in PCa. Mutations found in the 86 proteins were searched for in
   databases and the literature. To the best of our knowledge, only the
   rs17632542 [36,103,104,105], rs1695 [37,38,106,107], and rs7041 [39]
   variants, mapped on PSA, GSTP1, and GC, respectively, have been
   described in the PCa context. The detection of these known
   PCa-associated variants supports the validity of the proteogenome
   analysis performed in the present study.

   The analysis of the urine proteogenome profile of PCa patients revealed
   6 differentially abundant mutant protein isoforms, namely AMBP*A286G,
   SLC9B1*N70S, HSPG2*Q1062H, ZNF624*S207F, VASN*R161Q, and CD55*S162L.
   Comparing the proteome and proteogenome profiles of PCa patients
   highlighted AMBP, CDH1, EFEMP1, HSPG2, KLK3, and LMAN2. Their native
   forms were dysregulated in urine from PCa patients, whereas their mutant
   forms were not, with the exception of AMBP*A286G and HSPG2*Q1062H.
   These results may explain the discrepancies between MS and immunoblot
   data: the antibodies may not recognize the mutated peptides, or may
   not distinguish them from the native sequences.

   PPIs play a pivotal role in most biological processes. Dysregulation of
   these interactions may result in pathological conditions such as
   cancer, contributing to tumor progression, invasion, and metastasis
   [108,109]. PPIs have therefore been proposed as promising therapeutic
   targets for numerous types of cancer, including PCa, for which 28 small
   molecules and 14 peptides have been proposed to disrupt PPIs relevant
   to disease progression [110]. The STRING tool was used to explore PPIs
   between proteins with known prostate expression and the pathways in
   which these interactions are involved. In this analysis, IGF transport
   and uptake by IGFBPs emerged as the third most important pathway in the
   network. The IGF axis comprises ligands (IGF1, IGF2, insulin),
   receptors (IGF1R, IGF2R, INSR) that mediate the activity of IGFs, and
   IGF-binding proteins (IGFBPs) [111]. IGFs are oncogenic regulators that
   promote prostate tumor growth, survival, and proliferation, and the
   role of the IGF axis in PCa is well documented. For instance, IGFBP-2
   enhanced the proliferation of androgen-independent PCa cells [112].
   Likewise, IGF-I levels were elevated in serum and prostate tissue from
   PCa patients and predicted the risk of this cancer [113,114].
   Accordingly, IGF1R and INSR act as oncogenes in PCa, enhancing tumor
   growth, proliferation, invasion, and angiogenesis [115]. Given the
   relevance of the IGF pathway in PCa, the impact of mutations on the
   interactions of proteins involved in this pathway was predicted.
   According to SAAMBE-SEQ, the mutations were predicted to destabilize
   all PPIs in the IGF pathway, which could affect this pathway and,
   consequently, PCa progression.

   To investigate the likely impact of each amino acid substitution on
   protein function and on PPIs involving the dysregulated mutant protein
   isoforms (AMBP*A286G, SLC9B1*N70S, HSPG2*Q1062H, ZNF624*S207F,
   VASN*R161Q, and CD55*S162L), the PolyPhen-2 and SAAMBE-SEQ prediction
   tools were used. The roles of SLC9B1 and ZNF624 in cancer are largely
   unknown. Consequently, neither the downregulation of their mutant
   protein isoforms nor the prediction of a benign impact allows any
   conclusions to be drawn.

   HSPG2, in its intact form, is a well-described pro-angiogenic molecule
   that correlates with GS and with increased cell proliferation and
   viability [55,56,116]. The intact protein was found increased in tumor
   prostate tissue, whereas sera from PCa patients showed raised levels of
   HSPG2-derived fragments resulting from matrix metalloproteinase 7
   (MMP7) degradation. These fragments originated mostly from domain IV and
   were absent from sera of non-cancer subjects, suggesting that HSPG2
   cleavage occurs during metastasis and before the protein enters the
   bloodstream. Using an in silico analysis, Grindel et al. predicted that
   domains III and V of HSPG2 are the most prone to cleavage by MMP7,
   generating new peptides for other extracellular proteases to digest
   [55]. Notably, the mutated peptide identified here in the mutant HSPG2
   isoform lies in domain III. The cleavage of HSPG2 and other basement
   membrane components occurs during PCa cell invasion and is orchestrated
   by proteases such as MMPs, cathepsin L, and BMP1/Tolloid-like
   proteases. Cathepsin L and BMP1/Tolloid-like proteases both cleave
   HSPG2 in domain V, generating endorepellin [117] and the LG3 peptide
   [118], respectively. Unlike the intact protein, endorepellin and LG3
   act as potent anti-angiogenic factors and have been proposed as
   potential therapeutic targets in cancer [118]. Indeed, administration
   of endorepellin to mice with squamous cell and lung carcinomas reduced
   tumor growth, angiogenesis, and metabolism and promoted tumor hypoxia
   [119]. Accordingly, reduced LG3 levels were observed in breast cancer
   cells and in plasma from breast cancer patients [120]. Only the LG3
   peptide has been detected in urine [121,122]. In PCa, both the
   existence and the role of these peptides are unknown, and the only
   recognized HSPG2 protease is MMP7. A complex network between HSPG2 and
   other basement membrane components, such as collagens, laminin, and
   nidogen, is responsible for ECM integrity. When this integrity is
   disturbed, the metastatic process is compromised [123]. In the present
   work, mutations were identified in HSPG2, collagens, nidogen, and other
   proteins involved in ECM organization. All of these mutations were
   predicted to destabilize the corresponding PPIs, which could affect
   ECM dynamics and tumor progression. The HSPG2*Q1062H point mutation
   was also predicted to be benign, and the mutant peptide was
   downregulated in PCa patients. Together, these results suggest that
   this mutant peptide may have beneficial effects in patients with PCa
   and warrant its further study in the context of PCa treatment.

   The AMBP precursor is cleaved into three chains: alpha-1-microglobulin,
   bikunin, and trypstatin. The function of AMBP in cancer remains
   unclear. However, the AMBP-derived product bikunin has been reported to
   be underexpressed in oral squamous cell carcinoma and to play an
   antitumor role [40]. In line with this, bikunin significantly prevented
   tumor invasion and metastasis in Lewis lung carcinoma and ovarian
   carcinoma cells [124,125]. Notably, the mutant peptide identified here
   in the AMBP isoform lies within the bikunin chain. The AMBP mutation
   was predicted to be probably damaging and to destabilize all PPIs
   involving AMBP, and the mutant AMBP isoform was upregulated in PCa
   patients. This may suggest a detrimental role of this mutation in PCa
   patients.

   CD55 blocks the complement response by accelerating the decay of C3 and
   C5 convertases [126] and is involved in PCa cell survival and
   metastasis [92]. This interplay between CD55 and C3 is reflected in
   their interaction in the STRING network. The CD55 mutation was predicted
   to be probably damaging and to destabilize the CD55–C3 interaction.
   These findings raise the possibility that this mutation also plays a
   detrimental role in PCa patients.

   VASN is a known inhibitor of TGF-β signaling [127]. The TGF-β pathway
   has a dual role in cancer. It inhibits cell proliferation in early
   stages, whereas in advanced stages it stimulates proliferation,
   epithelial-to-mesenchymal transition (EMT), and evasion of immune
   surveillance and attenuates apoptosis [128]. The mechanism underlying
   the inhibitory action of VASN on TGF-β was elucidated in breast cancer
   cell lines. In those cells, a soluble form of VASN, released by
   shedding of its extracellular domain by the metalloprotease ADAM17,
   controls the TGF-β pathway [129]. In PCa, the role of VASN, including
   its interplay with the TGF-β pathway, is largely unexplored. However,
   VASN overexpression in prostate tumor tissue and in serum from PCa
   patients, and the resulting promotion of cell proliferation and PCa
   progression, have already been reported, in agreement with findings in
   other types of cancer [90]. Interestingly, the mutated peptide
   identified here in VASN lies in the extracellular domain of the
   protein, the domain cleaved by ADAM17. The VASN mutation was
   associated with downregulation of the mutant isoform in PCa patients
   and was predicted to be benign, which may suggest a protective role of
   this mutation in PCa patients.

   These findings indicate that, in mutation-driven diseases such as
   cancer and in biofluids with high proteolytic activity such as urine,
   proteogenomic analysis at the peptide level can be very informative,
   because point mutations that go unnoticed at the protein level can be
   detected at the peptide level. This may sharpen or renew interest in
   underexplored targets, as observed in this work. We hope to address
   some of these questions in future work. Furthermore, it would be
   interesting to validate these mutant peptides by a targeted MS
   approach such as multiple reaction monitoring (MRM), but this is beyond
   the scope of this work. The novelty of this work lies in the
   proteogenomic characterization of urine from PCa patients and in the
   combined analysis of MS data using two different software packages,
   which increases confidence in the identification of urinary proteins
   modulated by PCa.
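   To illustrate why a single amino acid substitution can go unnoticed at
   the protein level yet be detectable at the peptide level, the Python
   sketch below performs an in-silico tryptic digest of a hypothetical
   protein sequence before and after a point substitution. It assumes a
   simplified trypsin rule (cleavage after K or R unless followed by P, no
   missed cleavages). The sequence, the substitution R10Q and the helper
   names (tryptic_digest, apply_substitution, compare_peptides) are
   hypothetical and are not taken from the study's data or search
   pipeline. An R-to-Q change was chosen only because it is the same type
   of substitution as VASN*R161Q and, in this toy case, removes a
   cleavage site.

```python
import re
from typing import List, Set, Tuple


def tryptic_digest(seq: str) -> List[str]:
    """Split a protein sequence into tryptic peptides.

    Simplified rule (an assumption for illustration): cleave after K or R
    unless the next residue is P; missed cleavages are not considered.
    """
    # Zero-width split (Python 3.7+): lookbehind for K/R, negative lookahead for P.
    peptides = re.split(r"(?<=[KR])(?!P)", seq)
    # A C-terminal K/R produces an empty trailing string; drop it.
    return [p for p in peptides if p]


def apply_substitution(seq: str, pos: int, ref: str, alt: str) -> str:
    """Return seq with residue `ref` at 1-based position `pos` replaced by `alt`."""
    if seq[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos - 1]}")
    return seq[: pos - 1] + alt + seq[pos:]


def compare_peptides(native: str, mutant: str) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (shared, native-only, mutant-only) sets of tryptic peptides."""
    n, m = set(tryptic_digest(native)), set(tryptic_digest(mutant))
    return n & m, n - m, m - n


if __name__ == "__main__":
    # Hypothetical 21-residue sequence (not VASN or any real protein).
    native = "GASPKLLEVRDNQTWRAPSYK"
    mutant = apply_substitution(native, 10, "R", "Q")  # hypothetical R10Q
    shared, native_only, mutant_only = compare_peptides(native, mutant)
    print("shared:", sorted(shared))
    print("native only:", sorted(native_only))
    print("mutant only:", sorted(mutant_only))
```

   In this toy case the flanking peptides GASPK and APSYK are shared by
   both forms, the native peptides LLEVR and DNQTWR disappear, and a
   single longer variant peptide, LLEVQDNQTWR, appears. That peptide can
   only be matched if the search database contains the mutant sequence,
   which is the rationale for a proteogenomic search; a protein inferred
   mainly from shared peptides would otherwise look unchanged.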

5. Conclusions

   Most of the mutations identified in this work have not previously been
   associated with PCa, and some are predicted to be damaging, which
   offers a promising opportunity for research and development of PCa
   biomarkers, especially in the HSPG2 context. Additionally, the
   discovery of cancer-associated mutations in PCa-related proteins in
   urine is promising given this biofluid’s non-invasive and dynamic
   nature.

Supplementary Materials

   The following supporting information can be downloaded at:
   https://www.mdpi.com/article/10.3390/cancers14082001/s1: Table S1,
   Clinical data of subjects included in the discovery cohort group; Table
   S2, Clinical data of subjects included in the testing cohort group;
   Table S3, Summary of statistical analysis results of shortlisted
   proteins evaluated in the testing group; Table S4, List of mutant
   protein isoforms identified in the 86 proteins with known prostate
   expression; Table S5, Prediction of likely impact of point mutations on
   protein function using PolyPhen-2 tool; Table S6, Reactome pathway
   enrichment analysis of the network; Table S7, Prediction of likely
   impact of point mutations on protein–protein interactions using
   SAAMBE-SEQ tool; Figure S1, Original Western blot figures; Figure S2,
   Levels of AMBP*A286G, SLC9B1*N70S, HSPG2*Q1062H, ZNF624*S207F,
   VASN*R161Q, and CD55*S162L mutant protein isoforms and respective
   levels of native form (when applicable) in the urine from PCa patients.

Author Contributions

   Conceptualization, T.L., R.H., R.V. and M.F.; methodology, T.L., R.H.,
   R.V. and M.F.; validation, T.L., R.H., R.V. and M.F.; formal analysis,
   T.L., A.S.B., F.T., R.F., C.J., A.L.-M., R.H., R.V. and M.F.;
   investigation, T.L.; resources, D.B.-S., C.J., L.A., A.L.-M., R.H.,
   R.V. and M.F.; data curation, A.S.B., F.T. and R.F.; writing—original
   draft preparation, T.L.; writing—review and editing, T.L., A.S.B.,
   F.T., R.F., A.L.-M., D.B.-S., L.A., C.J., R.H., R.V. and M.F.;
   visualization, T.L., R.H., R.V. and M.F.; supervision, R.H., R.V. and
   M.F.; project administration, T.L.; funding acquisition, T.L., R.H.,
   R.V. and M.F. All authors have read and agreed to the published version
   of the manuscript.

Funding

   This work was supported by the Portuguese Foundation for Science and
   Technology (FCT), European Union, QREN, FEDER, and COMPETE for the
   Unidade de Investigação Cardiovascular (UIDB/IC/00051/2020 and
   UIDP/00051/2020), Institute of Biomedicine (iBiMED) (UIDB/04501/2020,
   POCI-01-0145-FEDER-007628), QOPNA (FCT UID/QUI/00062/2019) and
   LAQV/REQUIMTE (UIDB/50006/2020) research units, and DOCnet
   (NORTE-01-0145-FEDER-000003), and by the Norte Portugal Regional
   Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership
   Agreement, through the European Regional Development Fund (ERDF). T.L.
   is supported by an individual scholarship (SFRH/BD/136904/2018), F.T. by
   a post-doctoral research grant from UnIC (UIDP/00051/2020), and R.V. by
   an individual fellowship grant (IF/00286/2015).

Institutional Review Board Statement

   The study was conducted according to the guidelines of the Declaration
   of Helsinki and approved by the IPO-Porto Ethics Committee (Comissão de
   Ética para a Saúde, Reference 282R/2017).

Informed Consent Statement

   Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

   The mass spectrometry data generated in this study are available from
   the ProteomeXchange Consortium via the PRIDE partner repository with
   the dataset identifier PXD017902.

Conflicts of Interest

   The authors declare no conflict of interest.

Footnotes

   Publisher’s Note: MDPI stays neutral with regard to jurisdictional
   claims in published maps and institutional affiliations.

References