Abstract

Simple Summary

   The detection of early-stage colorectal cancer increases the chance to
   prevent tumor progression and death by the disease. Colonoscopy is one
   sensitive screening test to detect malignant or potentially malignant
   lesions in the intestines. However, it has some disadvantages,
   including sedation requirements, increased risk of colon perforation,
   and bleeding. Circulating microRNAs (miRNAs) in plasma or serum from
   cancer patients have been investigated and described as potential
   diagnostic or prognostic markers. We conducted an miRNAs screening test
   in plasma samples from colorectal cancer patients and subjects without
   cancer, aiming to identify markers for the early detection of the
   disease. We identified and validated four miRNAs capable of
   distinguishing cancer from non-cancer cases. Our non-invasive
   diagnostic biomarkers presented high performance and are easily
   applicable to clinical practice.

Abstract

   Colorectal cancer (CRC) is a disease with high incidence and mortality.
   Colonoscopy is a gold standard among tests used for CRC traceability.
   However, serious complications, such as colon perforation, may occur.
   Non-invasive diagnostic procedures are an unmet need. We aimed to
   identify a plasma microRNA (miRNA) signature for CRC detection. Plasma
   samples were obtained from subjects (n = 109) at different stages of
   colorectal carcinogenesis. The patients were stratified into a
   non-cancer (27 healthy volunteers, 17 patients with hyperplastic
   polyps, 24 with adenomas), and a cancer group (20 CRC and 21 metastatic
   CRC). miRNAs (381) were screened by TaqMan Low-Density Array. A
   classifier based on four differentially expressed miRNAs (miR-28-3p,
   let-7e-5p, miR-106a-5p, and miR-542-5p) was able to discriminate cancer
   versus non-cancer cases. The overexpression of these miRNAs was
   confirmed by RT-qPCR, and a cross-study validation step was implemented
   using eight data series retrieved from Gene Expression Omnibus (GEO).
   In addition, another external data validation using CRC surgical
   specimens from The Cancer Genome Atlas (TCGA) was carried out. The
   predictive model’s performance in the validation set was 76.5%
   accuracy, 59.4% sensitivity, and 86.8% specificity (area under the
   curve, AUC = 0.716). The employment of our model in the independent
   publicly available datasets confirmed a good discrimination performance
   in five of eight datasets (median AUC = 0.823). Applying this algorithm
   to the TCGA cohort, we found 99.5% accuracy, 99.7% sensitivity, and
   90.9% specificity (AUC = 0.998) when the model was applied to solid
   colorectal tissues. Overall, we suggest a novel signature of four
   circulating miRNAs, i.e., miR-28-3p, let-7e-5p, miR-106a-5p, and
   miR-542-5p, as a predictive tool for the detection of CRC.

   Keywords: colorectal cancer, blood, microRNA, diagnosis

1. Introduction

   Colorectal cancer (CRC) is the third most frequent type of cancer
   worldwide [[60]1]. CRC is frequently diagnosed at an advanced stage,
   and distant metastases contribute to its high mortality rate [[61]2].
   It is a complex disease that involves interactions between genetic and
   environmental factors in the intestinal epithelium [[62]3]. The normal
   epithelium accumulates changes over 20–40 years, progressing from
   different dysplasia grades to the establishment of local and distant
   metastasis [[63]4,[64]5]. Remarkably, the long-term tumor development
   opens the perspective for early disease traceability [[65]6].

   Colonoscopy is a gold-standard screening method that significantly
   reduces the mortality rate since it allows the detection of
   precancerous polyps and early-stage CRC. Despite being an outstanding
   screening tool, several limitations have been described, such as colon
   perforation, bleeding, sedation requirements, cost, and invasiveness of
   the procedure [[66]7]. The development of rapid, less invasive, and
   low-risk procedures complementary to colonoscopy is highly welcome. The
   fecal occult blood test (FOBT)-based screening, a non-invasive
   technique, can discover the existence of polyps, adenomas, and tumors
   in the intestine. However, the FOBT is limited by its reduced positive
   predictive value [[67]8]. Non-invasive protocols clinically useful in
   CRC screening programs showing high performance are required.

   Blood biomarkers are a promising alternative diagnostic approach to
   detect CRC [[68]9,[69]10]. The carcinoembryonic antigen (CEA) has been
   widely used as a blood-based molecular marker for detecting tumor
   recurrence [[70]11]. Other circulating biomarkers have also been
   considered for post-operative CRC surveillance, such as cancer antigen
   19-9 (CA19-9), cancer antigen 125 (CA125), and Septin 9 methylated DNA
   [[71]9,[72]12]. Despite being useful for patient monitoring, these
   markers are also associated with non-neoplastic conditions, such as
   inflammatory bowel disease and endometriosis, and other types of
   cancer, including ovarian, gastric, pancreatic, and lung cancer
   [[73]9]. Notwithstanding, no circulating biomarker is currently
   available for the early detection of CRC. Circulating microRNAs
   (miRNAs) have been considered new promising biological markers for
   cancer detection [[74]10].

   miRNAs are small non-coding RNAs of 21–25 nucleotides, which repress
   target messenger RNAs (mRNAs). miRNAs play a critical role in cell
   signaling networks, and their expression is associated with several
   tumor types, including colorectal, breast, gastric, lung cancers and
   sarcomas [[75]13,[76]14,[77]15]. Pathological conditions can impact the
   miRNA profile, and this is the basis for obtaining liquid biopsies. A
   liquid biopsy can be obtained at all stages of cancer diagnosis and
   treatment, allowing non-invasive and real-time monitoring of disease
   development [[78]16,[79]17].

   Although many studies propose detecting miRNA signatures for early CRC
   diagnosis [[80]18,[81]19,[82]20,[83]21,[84]22,[85]23,[86]24], several
   limitations are observed, such as reduced coverage for miRNA screening
   [[87]19,[88]22], variable signatures for different disease stages
   [[89]24], absence of non-cancer groups [[90]20], or lack of data
   validation. Additionally, some of these studies analyzed only one miRNA
   as a biomarker [[91]25], miRNA allied with inflammatory mediators
   [[92]23], or miRNA expression without testing the signature’s power
   [[93]18,[94]21,[95]26,[96]27,[97]28]. Indeed, these strategies have
   reduced validity due to tumor heterogeneity and potential lack of
   specificity. A broader miRNA panel for identifying a suitable signature
   to be tested in all tumor development stages and validated with gene
   databases seems an appropriate approach to overcome such limitations
   and increase diagnostic accuracy, specificity, and sensitivity.

   In the present study, plasma samples of healthy subjects and patients
   at different stages of colorectal carcinogenesis were assessed for
   identifying an miRNA signature capable of distinguishing cancer
   patients from those without the disease.

2. Results

   The clinical–demographic characteristics of subjects included in the
   study are shown in [98]Table 1. [99]Figure 1 depicts the study workflow
   designed to identify a diagnostic miRNA signature from plasma samples
   of patients at different CRC development stages. One hundred and nine
   subjects (61 female and 48 male) were recruited and stratified into two
   groups: a non-cancer group (n = 68) composed of healthy controls,
   individuals with hyperplastic polyps, or adenoma and a cancer group (n
   = 41), which included CRC patients and subjects with metastatic CRC
   ([100]Table 1 and [101]Figure 1). Most of the patients never smoked
   (66.06%) and used no medications at the recruitment time (54.12%).
   Antihypertensives, proton-pump inhibitors, and analgesics were drugs
   reported by the subjects with current medication use.

Table 1.

   Clinical–demographic characteristics.
   Variables      Groups
                  Total     Non-Cancer  Cancer
                  n  %      n  %        n  %
   Gender
   Female         61 55.96% 40 65.57%   21 34.43%
   Male           48 44.04% 28 58.34%   20 41.66%
   Age
   Up to 60 years 59 54.12% 40 67.79%   19 32.21%
   >60 years      50 45.88% 28 56%      22 44%
   Smoking status
   Never smoker   72 66.06% 49 68.05%   23 31.94%
   Former smoker  25 22.93% 13 52%      12 48%
   Current smoker 12 11.01% 6  50%      6  50%
   Medication
   No             59 54.12% 40 68%      19 32%
   Yes            50 45.88% 28 56%      22 44%
   [102]Open in a new tab

Figure 1.

   [103]Figure 1
   [104]Open in a new tab

   Study workflow to identify a diagnostic miRNA signature from plasma
   samples of patients at different stages of CRC development. CRC,
   colorectal cancer.

2.1. Circulating miRNA Profile in the Plasma of CRC and Non-CRC Subjects

   A total of 292 out of 381 potential markers was included in the
   screening phase of our study. Eighty-nine candidates showing constant
   deficient expression (a null cycle quantification after 40 cycles of
   amplification) in cancer and non-cancer samples were excluded
   ([105]Figure 1 and [106]Table S1).

   Unsupervised hierarchical clustering analysis did not demonstrate a
   clear stratification between cancer and non-cancer samples ([107]Figure
   2A). These groups were statistically compared, unveiling nine miRNAs
   with differential expression in plasma ([108]Figure 2B and [109]Table
   S2). Increased expression levels of miR-542-5p, miR-28-3p, miR-106a-5p,
   let-7e-5p, miR-454-3p, and miR-203a and decreased expression levels of
   miR-190a-5p, miR-383-5p, and miR-519a-3p were detected in CRC patients
   compared with the non-cancer group ([110]Figure 2B). A supervised
   hierarchical clustering analysis, including these nine miRNAs, revealed
   a group of six miRNAs associated with cancer- ([111]Figure 2C). One
   cluster enriched of cancer patients (black) and another (gray) composed
   exclusively of non-cancer individuals (healthy volunteers and patients
   with hyperplastic polyps and adenomas) were also observed (top of
   [112]Figure 2A,C).

Figure 2.

   [113]Figure 2
   [114]Open in a new tab

   Hierarchical clustering analysis and plots representing plasma miRNAs.
   (A) Unsupervised hierarchical clustering analysis of 292 circulating
   miRNAs (RT-qPCR-TaqMan Low-Density Array (TLDA) assay). (B)
   Differential expression of nine miRNAs in the plasma of cancer versus
   non-cancer cases. The boxplot displays the first quartile, median, and
   third quartiles (interquartile range) and the minimum and maximum
   values excluding outliers of the log[2]-normalized relative
   quantification of the miRNAs in plasma (RT-qPCR-TLDA assay). (C)
   Supervised hierarchical clustering analysis comprising the nine
   differentially expressed miRNAs. The dendrogram demonstrates a
   stratification of samples into two clusters (black and gray) associated
   with the cancer status. The lines in the heatmaps represent individual
   miRNAs, and the columns represent each sample; * p < 0.05; ** p < 0.01
   (t-test).

2.2. Circulating miRNA-Based Model to Predict Colorectal Malignancy

   A potential diagnostic tool was designed based exclusively on the
   over-represented circulating miRNAs (n = 6) in CRC patients from the
   discovery set. Six models were tested, including one to six miRNAs
   (support vector machine (SVM) method with recursive elimination), where
   a combination of four to six miRNAs achieved the best overall accuracy
   (87.5%) ([115]Figure 3A). Accordingly, we carried on with the
   four-miRNAs combination that required fewer assays. The application of
   the four-miR-based classifier in the screening phase (score = let-7e-5p
   × 1.037 + miR-106a-5p × 0.9 + miR-28-3p × 0.247 + miR-542-5p × 0.903;
   cancer prediction threshold >1.024) yielded an 88.9% sensitivity and
   86.7% specificity (55.6% and 80.0% in the leave-one-out
   cross-validation (LOOCV), respectively) ([116]Figure 3B and [117]Table
   2).

Figure 3.

   [118]Figure 3
   [119]Open in a new tab

   Training and validation of the circulating miRNA-based diagnostic
   classifier. (A) Cancer prediction models including one to six miRNAs
   (selected by the recursive elimination method) previously detected at
   higher levels in the blood samples of CRC patients. Representative
   graphs of overall yield accuracy and LOOCV estimative. (B) Application
   of the four-miR-based classifier (miR-106a + let-7e + miR-28 + miR-542)
   in the screening phase (evaluated by the TLDA assay). (C) The
   four-miR-based classifier applied to a subset of cases of the discovery
   set (screening phase) using individual RT-qPCR assays. (D) Application
   of the four-miR-based classifier to a group of samples independent of
   the screening phase (validation set) using individual RT-qPCR assays.
   The dotted line indicates the threshold above which a malignant status
   would be predicted. SVM: support vector machine; LOOCV: leave-one-out
   cross-validation; AUC: area under the ROC curve; CI[95%]: 95%
   confidence interval.

Table 2.

   Classification performance of the four-miR-based classifier used to
   distinguish colorectal cancer from non-cancer individuals.
     Metric        TLDA Assay                  Single Assays
                 Screening Phase      Discovery Set      Validation Set
               Estimate (CI[95%])  Estimate (CI[95%])  Estimate (CI[95%])
   Sensitivity  88.9 (50.7–99.4)    57.1 (20.2–88.2)    59.4 (40.8–75.8)
   Specificity  86.7 (58.4–97.7)    80.0 (51.4–94.7)     86.8 (74–94.1)
       PPV      80.0 (44.2–96.5)    57.1 (20.2–88.2)    73.1 (51.9–87.6)
       NPV      92.9 (64.2–99.6)     80 (51.4–94.7)     78.0 (64.9–87.3)
       AUC     0.867 (0.710–1.000) 0.743 (0.501–0.985) 0.716 (0.600–0.832)
   [120]Open in a new tab

   CI[95%]: 95% confidence interval; PPV = positive predictive value; NPV
   = negative predictive value.

2.3. Validation of the Circulating miRNAs as a Diagnostic Model

   Four putative plasma markers (let-7e-5p, miR-106a-5p, miR-28-3p, and
   miR-542-5p) and two carefully selected endogenous reference miRNAs
   (mir-423-5p and mir-361-5p) were further tested using RT-qPCR assays.
   The sequences of the endogenous and target miRNAs are described in
   [121]Table S3. This step was carried out in a subset of samples
   previously analyzed in the screening phase (discovery set; n = 22) and
   validation set (n = 85). The same mathematical model previously
   designed was adopted to support the predictive model performance,
   adjusting the threshold to achieve the best overall accuracy (cancer
   prediction threshold >2.442). The method demonstrated a similar
   classification performance in the discovery (72.7% accuracy, 57.1%
   sensitivity, 80% specificity, AUC = 0.743) and validation sets (76.5%
   accuracy, 59.4% sensitivity, 86.8% specificity, AUC = 0.716)
   ([122]Figure 3C,D and [123]Table 2). The combined four-miR-based
   classifier had a higher AUC than any single miRNAs marker in the
   discovery and validation sets ([124]Figure S1).

2.4. Performance of the Diagnostic Model in External Datasets of Liquid
Biopsies and Solid Tissues

   To confirm the performance of our circulating miRNA model, we
   investigated publicly available databases comprising small non-coding
   RNAs analysis of liquid biopsy samples from CRC and controls in the
   Gene Expression Omnibus (GEO). Sixteen data series were found, and
   seven were included after employing the inclusion/exclusion criteria
   and curation of the published articles ([125]Figure 4A and [126]Table
   S4).

Figure 4.

   [127]Figure 4
   [128]Open in a new tab

   Performance of the four-miR classifier tested in the Gene Expression
   Omnibus (GEO) dataset. (A) Database searching, inclusion and exclusion
   criteria. (B) Among 16 studies found in the GEO datasets, 7 fulfilled
   the criteria of number of samples (≥20 samples of both CRC and
   controls), 5 used serum samples and validated our four-miR classifier
   model, and 3 datasets (exosome, serum, and plasma samples) showed no
   significant association.

   This cross-study validation step included five studies assessing small
   non-coding RNAs from serum ([129]GSE106817, [130]GSE113740,
   [131]GSE112264, [132]GSE124158, [133]GSE113486, and [134]GSE59856), one
   from plasma ([135]GSE25609), and one from plasma-derived extracellular
   vesicles ([136]GSE71008). The available processed values (microarray
   and high-throughput sequencing) were used to generate the four-miRNA
   score, and the ROC curve for all studies was assessed. The AUCs varied
   largely, ranging from 0.068 to 0.896 in different studies, with a
   median of 0.823 ([137]Figure 4B).

   Since the source of the circulating miRNAs included in our liquid
   biopsy method may have its origin from colorectal cancer, we sought to
   investigate the four-miRNA model performance in predicting malignancy
   directly in tumors. Colorectal tumors (n = 615) and non-neoplastic
   samples (n = 11) from The Cancer Genome Atlas (TCGA) database were used
   in this approach. A high classification efficiency was obtained (99.5%
   accuracy, 99.7% sensitivity, 90.9% specificity, AUC = 0.998) by
   adapting the model threshold for RNA sequencing quantification (cancer
   prediction threshold score >19.14) and applying the same weight for
   each marker ([138]Figure 5).

Figure 5.

   [139]Figure 5
   [140]Open in a new tab

   Performance of the four-miR classifier tested in TCGA colorectal
   primary tumors and adjacent non-cancer tissues. The classifier designed
   to be a liquid biopsy method also demonstrated high power in
   discriminating cancer and non-cancer colorectal tissues of the TCGA
   dataset. The dotted line indicates the threshold above which a
   malignant status would be predicted. TCGA: The Cancer Genome Atlas;
   COAD: colon cancer cohort from TCGA; READ: rectal cancer cohort from
   TCGA.

   After being tested on CRC samples from TCGA, the SVM model was also
   applied to 14 other tumor types from the Pan-Cancer cohort. Despite a
   relatively high discrimination power observed for urothelial bladder
   carcinoma (AUC = 0.878), the model was found to be CRC-specific (AUC =
   0.998) ([141]Figure S2).

2.5. Putative mRNA Targets and Pathways Regulated by the Selected miRNAs

   MicroRNAs regulate numerous target mRNAs that are involved in critical
   signaling pathways. Based on predicted interactions (miRWalk, miRanda,
   RNAhybrid, and Targetscan), miR-106a-5p, let-7e-5p, miR-28-3p, and
   miR-542 were estimated to regulate 2239, 1020, 637, and 203 mRNA
   targets, respectively ([142]Table S6). The biological pathways enriched
   with the miRNA targets (performed separately for each miRNA) were
   mainly cancer-related ([143]Tables S5 and S6 and Figure S3). The
   colorectal cancer pathway was among the most significant pathways for
   three of four tested miRNAs (miR-106a-5p, let-7e-5p, and miR-28-3p)
   ([144]Figure 6).

Figure 6.

   [145]Figure 6
   [146]Open in a new tab

   Biological pathways enriched with the mRNAs predicted to be targets of
   miR-106a-5p, let-7e-5p, miR-28-3p, and miR-542-5p. The colorectal
   cancer pathway (red star) is among the most significant pathways for
   three out of four tested miRNAs (miR-106a-5p, let-7e-5p, and
   miR-28-3p). p-Value expressed as −log[10].

3. Discussion

   CRC screening methods include stool-based tests for occult blood search
   and endoscopic or radiologic imaging [[147]29]. According to the
   updated National Comprehensive Cancer Network Clinical Practice
   Guidelines in Oncology, colonoscopy remains an effective and sensitive
   procedure for the detection of CRC compared with other screening
   modalities [[148]29]. However, the limiting access to care, lack of
   adequate bowel preparation, bleeding, and colon perforation are among
   its complication risks [[149]30]. New protocols have been described to
   overcome these limitations. The circulating Septin 9 methylated DNA
   demonstrated 73.3% sensitivity for CRC detection, comparable with that
   of the fecal immunochemical test (68.0%) [[150]31], and is FDA-approved
   as an emerging, more accessible blood-based test option [[151]29]. The
   performance of these tests is still far from ideal, and novel and
   sensitive blood biomarkers remain demanded.

   In the present study, the combination of four circulating overexpressed
   miRNAs (let-7e-5p, miR-106a-5p, miR-28-3p, and miR-542-5p)
   distinguished patients with CRC from healthy subjects and individuals
   with precursor lesions, particularly, hyperplastic polyps and adenomas.
   These findings indicate the potential of this circulating miRNA
   signature in predicting tumors in the colon and rectum at early stages.
   We used the Recursive Feature Elimination method to test multiple
   marker combinations and LOOCV to estimate the performance to avoid a
   marker selection bias. The combinations tested in our study were
   systematically defined, including only statistically significant
   individual markers overrepresented in the plasma of the cancer
   patients, using the recursive elimination method before training the
   classifier (SVM method). The recursive feature elimination method is
   based on removing the weakest features until a specific number of
   features is reached, avoiding collinearity and dependencies inside the
   model [[152]32].

   The four-miRNA-based signature discovered in our screening phase was
   tested in subjects at different colorectal carcinogenesis stages and
   validated in the cohort of colorectal samples from the TCGA database.
   The classifier designed to be a plasma miRNA signature also
   demonstrated high performance in differentiating cancer from non-cancer
   colorectal tissues, which infers the method’s accuracy. Strategies
   using biomarker signatures increase a method’s significance by boosting
   its diagnostic efficiency. Eslamizadeh et al. analyzed a panel of eight
   miRNAs to compare the plasma of CRC patients with that of healthy
   controls. Among the miRNAs investigated, four miRNAs distinguished
   these groups, but the diagnostic perspective was reduced by the
   independent analyses of each miRNA [[153]21]. Other studies proposed
   plasma miRNA panels with potential clinical value for early CRC
   detection, demonstrating an AUC = 0.8356–0.866, with 78–91% sensitivity
   and 79–88% specificity, but not performing external validation of the
   miRNA panels in tumor tissues [[154]33,[155]34].

   Considering that changes in miRNAs expression are expected during tumor
   development [[156]35], a tumor signature essentially must be validated
   as a whole and bear the diagnostic power of their units combined. In
   line with that, Zanutto and colleagues proposed a plasma miRNA-based
   test associated with the fecal immunochemical test to identify patients
   that could benefit from subsequent colonoscopy [[157]24]. The authors
   categorized miRNA signatures as specific for low-grade adenoma,
   high-grade adenoma, or cancerous lesions. Interestingly, they found
   increased expression of some miRNAs in high-grade adenomas but reduced
   expression of the same in low-grade adenomas and cancerous lesions
   [[158]24]. Such an approach diverges from ours, since we propose
   identifying miRNAs that are progressively expressed along with the
   carcinogenic process. Our strategy also contrasts with other studies
   that grouped advanced adenoma and CRC and found plasma- or
   serum-derived miRNA signatures differentially expressed with respect to
   control individuals [[159]22,[160]36]. Grouping non-neoplastic lesions
   with neoplastic tumors limits the identification of markers that could
   differentiate these groups of lesions.

   Another advantage of our proposed plasma miRNA signature compared to
   others previously reported is the superior classification performance
   when applied to tissue specimens [[161]37,[162]38]. In CRC samples
   compared with normal tissues, Zhu et al. (2017) reported a three-miRNA
   panel with good accuracy in predicting tumor samples (AUC = 0.830)
   [[163]38]. Notably, our four-miR classifier presented a higher
   diagnostic efficiency (AUC = 0.998). Using stringent criteria for the
   selection of key miRNAs as described in our study increases the
   diagnostic potential of a given signature. The use of miRNA profiles in
   liquid biopsies of cancer patients has received special attention in
   recent years. However, the main message from studies in this area is
   the difficulty in generating reproducible data. The method and source
   (serum, total plasma, purified extracellular vesicles, for instance)
   used to isolate microRNAs can result in variations in the miRNA profile
   [[164]39]. Remarkably, we tested the performance of the four-miR
   classifier in the GEO dataset. Among the studies evaluated, five of
   them validated our classifier model, despite being serum-based
   analyses. Therefore, the four-miR classifier proposed in our study is
   suitable to be used despite the blood collection method (serum or
   plasma). It is important to note that if the signature is effective
   both in plasma and in serum, samples included in the same study must be
   collected with the same protocol.

   Among the miRNAs herein detected, let-7e-5p is broadly described in
   several cancers, including head and neck and rectum [[165]25,[166]40].
   Interestingly, let-7e-5p was suggested as a prognostic marker for
   inducing metastatic capacity in rectal carcinomas [[167]25]. In
   addition, let-7e-5p-inducing cell migration was further confirmed in
   the colon carcinoma-derived Caco-2 cell line transfected with
   hsa-let-7e-5p-carrying plasmids. The underlying metastatic mechanism is
   unclear but seems to involve the modulation of MYC pathways
   [[168]25,[169]41]. The second key miRNA identified in our diagnostic
   classifier was miR-106a-5p, which showed a high discriminative power
   between the groups. miR-106a-5p overexpression contributes to cell
   invasion and is associated with 5-fluorouracil resistance in colorectal
   cancer patients [[170]42]. The tumorigenic mechanism might involve the
   inhibition of apoptotic pathways, as demonstrated in breast cancer
   cells [[171]43]. A translational approach demonstrated that miR-106a-5p
   is overexpressed in colorectal cancer and associated with tumor stage,
   vascular invasion, and lymph node metastasis, reducing disease-free
   survival [[172]44]. Similarly, miR-28-3p expression was also related to
   colon and rectum malignancies. miR-28 is described to induce tumor
   metastases in CRC animal models and increase the migration and
   invasiveness capacity of the colorectal cancer cell line HCT-116
   [[173]45].

   We also found miR-542-5p overexpression in the plasma but with lower
   discriminating capacity in subjects with CRC than other miRNAs,
   including miR-28-3p and miR-106a-5p. One possible explanation might
   involve the reduced interaction between the signaling pathways that
   these miRNAs regulate. miR-542-5p is found to induce mitochondrial
   dysfunction and activation of SMAD2/3 phosphorylation [[174]46], a
   signaling molecule downstream of transforming growth factor-β (TGF-β).
   TGF-β is a critical player in epithelial–mesenchymal transition,
   favoring tumor cell survival and dissemination [[175]47]. Additionally,
   it allows tumor microenvironment remodeling to support cancer
   progression [[176]47]. Together with TGF-β, dysfunctional mitochondria
   can trigger gene expression changes, altering cell morphology and
   function and resulting in a pro-tumorigenic phenotype [[177]48].

   Despite the experimental and clinical relevance, the specificity of
   single miRNAs as a diagnostic tool is limited since most miRNAs are
   expressed by other tumor types and inflammatory conditions, generating
   false-positive or false-negative results [[178]49]. The use of a
   diagnostic classifier must then overcome an miRNA biological function.
   Consistently, the high efficiency of our four-miR-based diagnostic tool
   (99.5% accuracy, 99.7% sensitivity, 90.9% specificity for tumor
   specimens) suggests its technical reliability.

   However, our study has some limitations, including a low number of
   patients. In this setting, the signature proposed was not sensitive
   enough to discriminate subsets of patients either in the non-cancer or
   in the cancer groups. In addition to the possibility that some
   interesting miRNAs might be lost due to the limited number of cases
   used in our screening phase, a larger number of miRNAs were not tested.
   Current information on the human miRNome estimates about 2300 human
   mature miRNAs, only 50% (1115 miRNAs) of which are annotated in miRBase
   V22 [[179]50]. Then, technical limitations must be considered regarding
   the number of miRNAs analyzed (381 miRNAs) in our study. Second, the
   TCGA database involves some restrictions, since the data are not
   curated, and several comorbidities might influence population
   variability. The validity of the proposed signature might also consider
   the simultaneous presence of other morbidities. The prognostic
   applicability also merits further investigation in prospective studies.
   In our protocol, blood samples were collected before either the
   endoscopic procedure or chemotherapy. This strategy was essential to
   prevent any bias associated with anesthetics and chemotherapy
   administration. After the initiation of chemotherapy, tumor biology
   changes as well as the miRNA profile [[180]51]. The type of
   chemotherapeutic regimens (drugs, dose intensity, time, etc.)
   administered is also patient-specific, which increases the number of
   variables to control. Furthermore, miRNAs expression can be altered as
   a consequence of the treatment [[181]52] and have a role in cancer drug
   resistance. Based on these statements, our study was designed to
   identify a signature useful as a diagnostic tool.

   Therefore, the combination of let-7e-5p, miR-106a-5p, mir-28-3p, and
   miR-542-5p as a proposed signature does not intend to replace the
   current gold standards in diagnostic measures but as a complementary
   tool to improve cancer-screening methods. The combined analysis of our
   four-miRNAs has potential clinical applicability and overwhelms the
   shortcomings of some circulating miRNA signatures that independently
   evaluate each of its composing miRNAs.

4. Materials and Methods

4.1. Patients and Study Design

   This observational, analytical, cross-sectional study was approved by
   the Ethics Committee of the involved institutions (Cancer Institute of
   Ceará, Haroldo Juaçaba Hospital; Walter Cantídio University Hospital,
   Federal University of Ceará—HUWC/UFC; and Dr. César Cals General
   Hospital-HGCC) (# 3.047.394, CAAE: 32361714.0.1001.5528). All subjects
   provided written informed consent, and the study was performed
   following the Declaration of Helsinki.

   The number of colorectal cancer patients admitted at the Oncology
   Department per year in the Cancer Institute of Ceará, Brazil is 200
   individuals (N). The mean cycle quantification (
   [MATH: <mrow><mrow><mover><mi
   mathvariant="normal">X</mi><mo>¯</mo></mover></mrow></mrow> :MATH]
   ) ± standard deviation(s) for the cancer patients recruited in the
   screening phase is 17.78 ± 1.92. Sample size calculation with a 95%
   confidence interval († = 2.306, considering t distribution having 8
   degrees of freedom) and sampling error (e, 5% of the
   [MATH: <mrow><mrow><mover><mi
   mathvariant="normal">X</mi><mo>¯</mo></mover></mrow></mrow> :MATH]
   ) was based on the following formula: n = [N × s^2 × †^2]/[(N − 1)×e^2
   + s^2 × †^2], indicating a sample size (n) of 16 per group (screening +
   discovery + validation sets) [[182]53].

   A total of 109 subjects were stratified into non-cancer (27 healthy
   controls, 17 individuals with hyperplastic polyps, and 24 with adenoma)
   and cancer patients (20 CRC and 21 metastatic CRC) ([183]Table 1). All
   individuals were submitted to a routine colonoscopy; the suspected
   lesions, when identified, were processed for histopathological
   evaluation to confirm the diagnosis. Eligibility criteria for patient
   classification into groups also comprised: (1) healthy volunteers
   (absence of colorectal lesions), (2) hyperplastic polyps (larger than
   10 mm, removed by a colonoscopic procedure—polypectomy), (3) adenomas
   (larger than 10 mm or with a high degree of dysplasia or at least 20%
   of the villous component removed with a colonoscopic procedure), (4)
   patients with advanced non-metastatic CRC (tumor lesions >1.0 cm) and
   (5) metastatic CRC (at advanced stages and distant CRC metastasis or
   post-surgical resection with active disease). Considering that
   chemotherapy alters the miRNA profile in tumor and te blood [[184]51],
   the patients were included in this study before exposure to
   chemotherapy. All patients enrolled in this study were older than 18
   years.

   The following exclusion criteria were adopted: clinical diagnosis of
   familial adenomatous polyposis or Lynch syndrome; the presence of more
   than 10 colorectal adenomas, inflammatory bowel disease, or diabetes;
   other primary tumors at the time of recruitment; chemotherapy or
   radiotherapy before blood collection; incomplete colonoscopy;
   inadequate preparation for colonoscopy; and the presence of any degree
   of hemolysis (determined by visual inspection) in plasma samples. The
   absence of hemolysis in samples that proceeded for analysis was later
   confirmed by the delta cycle quantification (Cq) (miR-23a and miR-451),
   positive if >7 [[185]54].

   The blood samples (8 mL) were collected in EDTA-K2 vials (BD
   Vacutainer^®, Becton Dickinson, São Paulo, Brazil) by venous puncture.
   Plasma was obtained by sample centrifugation at 1000× g (4 °C for 10
   min), transferred to cryotubes, and then stored at −80 °C until use.

4.2. RNA Extraction and cDNA Synthesis

   RNA was isolated from 1 mL of plasma samples using the TRIzol reagent
   (Life Technologies, Carlsbad, CA, USA) according to the manufacturer’s
   instructions, followed by purification with the columns from the
   miRNeasy Mini kit (Qiagen, Valencia, CA, USA). Samples triplicates were
   used for column saturation. RNA was eluted in nuclease-free water and
   treated to eliminate genomic DNA contamination with a DNA-free kit
   (Life Technologies, Carlsbad, CA, USA). Due to its low abundance, the
   RNA was quantified with the Bioanalyzer small RNA Analysis kit (Agilent
   Technologies, Santa Clara, CA, USA) and the Agilent 2100 Bioanalyzer
   (Agilent Technologies, Santa Clara, CA, USA). RNA (75 ng) and 4.5 μL of
   the Megaplex™ RT Primers, Human Pool A v2.1 (Thermo Fisher Scientific,
   Pleasanton, CA, USA) were used to obtain a final volume of 7.5 µL of
   reaction per sample for the synthesis of cDNA, according to the
   manufacturer’s recommendations. Once converted to cDNAs, these miRNAs
   were subjected to a pre-amplification step using the Megaplex^TM PreAmp
   Primers Human Pool A Kit and TaqMan^TM Master Mix (Applied Biosystems,
   Foster City, CA, USA). Then, the resulting pre-amplified product was
   used to detect miRNA expression by TLDA.

4.3. MicroRNA Relative Quantification by TaqMan Low-Density Array

   The miRNA expression analysis of 24 cases (discovery set) was performed
   using the TaqMan^TM Array Human MicroRNA A Cards v2.0 (TLDA) (Applied
   Biosystems, Foster City, CA, USA), composed of 377 miRNAs, three
   small-nucleolar RNAs, and one negative control (exogenous miRNA). The
   reactions were performed in the Biosystems Prism 7900HT Fast Real-Time
   PCR sequence detection System (Applied Biosystems, Foster City, CA,
   USA). Sequences with constant deficient expression were removed (n =
   85), based on null cycle quantification (Cq) in more than 5% of the
   samples (after 40 cycles) of the cancer and non-cancer groups
   ([186]Table S1). The miRNAs detected in at least 95% of the samples in
   any of the biological groups were further evaluated. RT-qPCR
   normalization was carried out following the Livak and Schmittgen method
   (2001) [[187]55]. The arithmetic mean of the Ct/Cq values from the
   control samples for each miRNA was used as a calibrator value in the
   normalization. The geometric mean of the ΔCt from all filtered miRNAs
   (n = 292) was used as a reference quantification (normalization factor)
   and integrated with the 2^−ΔΔCt model to obtain the relative
   quantification of the target miRNAs. The log[2] transformed values were
   further quantile-normalized using the program BRB ArrayTools v. 4.4.0
   (Biometric Research Branch, National Cancer Institute) to avoid
   inter-sample variation. Hierarchical clustering analysis was
   implemented with one minus correlation distance and complete linkage
   (BRB ArrayTools).

4.4. Circulating miRNA-Based Diagnostic Model

   Non-cancer (5 healthy volunteers, 5 hyperplastic polyps, and 5
   adenomas) and cancer samples (5 CRC and 4 metastatic CRC) were used to
   identify miRNAs differentially expressed in the screening phase. The
   screening phase was performed with a reduced number of samples per
   group; however, the number of samples was further enlarged in the
   validation set to confirm the importance of the selected miRNAs. The
   groups were statistically compared using a random variance t-test,
   adopting a p-value < 0.05 and FC (fold change) ≥|1.5|. The classifiers
   were trained by Support Vector Machine (BRB Array Tools), considering
   the higher circulating miRNAs in the cancer group. The RT-qPCR values
   used in the statistical analysis, illustrations, and introduced into
   the mathematical model were the log[2]-transformed after the
   normalization using the Livak method [[188]55].

4.5. Validation of Selected microRNAs

   Since the use of hundreds of miRNAs to obtain a reference
   quantification would not be feasible for the normalization in the
   individual TaqMan assays (Discovery and Validation Sets), we used the
   best pair of endogenous control candidates based on the TLDA results.
   From the 292 filtered miRNAs in TLDA, 96 candidates detected at a high
   frequency in the plasma of healthy individuals and patients (mean Cq <
   20) were evaluated by the Genorm software [[189]56]. According to this
   analysis, miR-423-5p and miR-361-5p were the most stable miRNAs,
   presenting the average expression stability of 0.23 (after serial
   exclusion of the most variable candidates) and pairwise variation of
   0.065. Therefore, these two miRNAs were subsequently used as endogenous
   references to normalize the target miRNAs in the individual TaqMan