Abstract
Objectives:
   Colorectal cancer (CRC) is a prevalent disease characterized by
   significant dysregulation of gene expression. Non-invasive tests that
   utilize microRNAs (miRNAs) have shown promise for early CRC detection.
   This study aims to determine the association between miRNAs and key
   genes in CRC.
Methods:
   Two datasets ([35]GSE106817 and [36]GSE23878) were extracted from the
   NCBI Gene Expression Omnibus database. Penalized logistic regression
   (PLR) and artificial neural networks (ANN) were used to identify
   relevant miRNAs and evaluate the classification accuracy of the
   selected miRNAs. The findings were validated through bipartite
   miRNA-mRNA interactions.
Results:
   Our analysis identified 3 miRNAs: miR-1228, miR-6765-5p, and
   miR-6787-5p, achieving a total accuracy of over 90%. Based on the
   results of the mRNA-miRNA interaction network, CDK1 and MAD2L1 were
   identified as target genes of miR-6787-5p.
Conclusions:
   Our results suggest that the identified miRNAs and target genes could
   serve as non-invasive biomarkers for diagnosing colorectal cancer,
   pending laboratory confirmation.
   Keywords: Colorectal neoplasms, microRNA, smoothly clipped absolute
   deviation, least absolute shrinkage and selection operator, the minimax
   concave penalty, artificial neural networks
Introduction
   Colorectal cancer (CRC) is the third most prevalent cancer globally and
   the second leading cause of cancer-related mortality. In 2020 alone,
   there were approximately 1.93 million new CRC cases and 935 000 deaths.
   Age is a major risk factor for CRC, with most cases occurring in
   individuals aged 50 or older. Other risk factors include a family
   history of CRC, inflammatory bowel disease, genetic mutations, poor
   dietary choices, obesity, and lack of physical activity.^[37]1 -[38]3
   In recent decades, developing countries have experienced an
   epidemiological shift in CRC, marked by a concerning rise in its
   incidence.^ [39]4 CRC has become a major contributor to cancer-related
   mortality worldwide.^ [40]5 The early detection of CRC through
   screening plays a crucial role in enhancing treatment outcomes and
   improving patient survival rates. This is primarily because early-stage
   CRC typically presents no noticeable symptoms. Consequently,
   individuals with early-stage CRC are often diagnosed at later stages,
   when the cancer is more advanced and treatment is more challenging.^
   [41]6 Overall survival is closely linked to the stage of the cancer at
   diagnosis, which is a robust predictor of survival.^[42]7,[43]8
   Early diagnosis has the potential to significantly impact the
   trajectory of treatment.^ [44]9 Traditional screening methods for CRC,
   such as fecal immunochemical testing (FIT) and guaiac-based fecal
   occult blood test (gFOBT), have become routine practices. However,
   these methods have inherent drawbacks, including low sensitivity and
   the inability to detect CRC in a timely manner. These limitations have
   spurred efforts to develop new screening methods that offer improved
   sensitivity and timely detection.^[45]6,[46]10
   Biomarkers, as molecular signatures, hold the potential to serve as
   more effective tools for cancer screening compared to traditional
   methods.^ [47]6 The dysregulation of genes, both coding and non-coding,
   along with perturbed signaling pathways, plays a substantial role in
   cancer development. Recent research has highlighted the significance of
   leveraging these genes and signaling pathways for early cancer
   detection.^ [48]11 miRNAs have emerged as highly recognized biological
   molecules and genes that intricately regulate the pathways involved in
   the formation of cancer cells, specifically in CRC. These miRNAs engage
   in interactions with proteins and other non-coding RNAs, thereby
   contributing to the pathogenesis of CRC.^ [49]12 Extracellular miRNAs
   have been identified in serum and plasma, rendering them non-invasive
   biomarkers with potential applications in various disease
   conditions.^[50]12,[51]13 Circulating miRNAs in the blood exhibit
   remarkable stability and reproducibility, rendering them a promising
   biomarker for CRC. Biological processes can influence the expression of
   miRNAs, and epigenetic changes can further contribute to alterations in
   miRNA expression specifically in CRC.^[52]14 -[53]16 In recent years,
   the study of Differentially Expressed miRNAs (DEmiRs) has gained
   traction in cancer research. DEmiRs are miRNAs whose expression levels
   significantly differ between normal and disease conditions, such as in
   cancerous vs. healthy tissues.
   One significant challenge in identifying biomarkers associated with
   different clinical outcomes, such as distinguishing normal from
   cancerous tissue samples, is the high-dimensional nature of the data.
   The number of miRNAs often exceeds the sample size, requiring
   specialized methods to address this issue. Penalized regression models,
   including Penalized Logistic Regression (PLR), have garnered
   considerable attention for analyzing this type of data. These models
   enable simultaneous variable selection and coefficient estimation. As a
   result, the coefficients of non-informative miRNAs are shrunk to zero or
   close to zero, while the miRNAs that remain in the model are associated
   with the outcome and can be used to detect CRC.
   In this study, we employed PLR with 3 different penalties: Smoothly
   Clipped Absolute Deviation (SCAD), Least Absolute Shrinkage and
   Selection Operator (LASSO), and the Minimax Concave Penalty (MCP), to
   identify miRNAs related to CRC. The primary objective of this article
   was to identify miRNAs capable of detecting CRC at an early stage. By
   leveraging systems biology and data mining techniques, we aimed to
   determine non-invasive biomarkers with high accuracy, facilitating
   timely treatment through early diagnosis of CRC.
Material and Methods
   The bioinformatics strategy presented in [54]Figure 1 involved the
   utilization of serum microarray datasets to identify miRNAs and key
   genes associated with CRC through systems biology methods. Initially,
   miRNAs were extracted from each sample’s profile and subjected to
   evaluation using PLR. Subsequently, an ANN was developed to assess the
   accuracy of the selected miRNAs. The analysis resulted in the
   identification of Differentially Expressed miRNAs (DEmiRs) and their
   respective target genes. To validate the findings, common genes were
   identified between the target genes and Differentially Expressed Genes
   (DEGs) using bipartite miRNA-mRNA interactions.
Figure 1.
   Flow chart of bioinformatics analysis.
   Notably, factors such as age, health status, and patient risk factors
   were not accounted for in this study.
miRNA expression profile dataset
   Two CRC datasets, one of miRNA expression and one of gene expression,
   were acquired from the Gene Expression Omnibus (GEO) repository:
   [57]GSE106817 and
   [58]GSE23878. [59]GSE106817 was generated using the “3D-Human miRNA
   V21_1.0.0” platform ([60]GPL21263) and comprised 4043 samples including
   various disease conditions and healthy individuals. Among these, 115
   samples were from CRC patients, while 2759 samples were from healthy
   individuals. In order to maintain balance, 115 healthy samples were
   randomly selected using R software. The expression levels of 2566
   miRNAs were measured in each sample without any initial screening,
   providing data for subsequent analysis and modeling. Additionally,
   [61]GSE23878, generated with the “Illumina HumanHT-12 V3.0” platform
   ([62]GPL6947), consisted of 59 tissue specimens, including 35 CRC
   samples and 24 normal tissue samples. This dataset was used as a
   validation set to assess key genes identified in the study.
Statistical Analysis
miRNA selection through penalized model
   PLR techniques are a class of statistical learning methods that can be
   used for variable selection. These techniques attach a penalty to the
   objective function of the PLR, which shrinks the estimates of the
   regression coefficients toward zero. In this way, penalized regression
   techniques can simultaneously perform variable selection and
   coefficient estimation. In this study, we used PLR models with (1)
   Smoothly Clipped Absolute Deviation (SCAD), (2) Least Absolute
   Shrinkage and Selection Operator (LASSO), and (3) the Minimax Concave
   Penalty (MCP) to identify important miRNAs. Briefly, PLR is a shrinkage
   regression model that adds a penalty term on the regression
   coefficients to the log-likelihood function. The LASSO penalty is λ
   times the absolute value of each regression coefficient. The SCAD
   penalty is defined as follows:
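   [MATH:
   p_\lambda(t) =
   \begin{cases}
     \lambda |t|, & \text{if } |t| \le \lambda \\
     -\dfrac{|t|^2 - 2a\lambda|t| + \lambda^2}{2(a-1)}, & \text{if } \lambda < |t| \le a\lambda \\
     \dfrac{(a+1)\lambda^2}{2}, & \text{if } |t| > a\lambda
   \end{cases}
   :MATH]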
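   where t is the regression coefficient, λ is the tuning parameter, and
   a > 2 is a constant that controls the concavity of the penalty. The MCP
   is a concave penalty function used in penalized regression for
   variable selection and coefficient estimation. It is defined through
   its derivative with respect to |βj|, where βj is the j-th regression
   coefficient, as follows: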
   [MATH:
   p'_\lambda(|\beta_j|) = \left(\lambda - \frac{|\beta_j|}{a}\right) I\left(|\beta_j| \le a\lambda\right).
   :MATH]
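   The three penalties can be compared numerically. The following is a
   minimal Python sketch, not the code used in the study (which relied on
   R). The function names and the default values of a (3.7 for SCAD, 3 for
   MCP) are illustrative assumptions. The MCP itself is obtained here by
   integrating the derivative given above from 0 to |t|.

```python
def lasso_penalty(t, lam):
    """LASSO penalty: lam * |t|."""
    return lam * abs(t)


def scad_penalty(t, lam, a=3.7):
    """SCAD penalty, piecewise in |t| (requires a > 2). Default a is an assumption."""
    x = abs(t)
    if x <= lam:
        return lam * x
    if x <= a * lam:
        return -(x * x - 2 * a * lam * x + lam * lam) / (2 * (a - 1))
    return (a + 1) * lam * lam / 2


def mcp_derivative(t, lam, a=3.0):
    """Derivative of the MCP in |t|: (lam - |t|/a) for |t| <= a*lam, else 0."""
    x = abs(t)
    return lam - x / a if x <= a * lam else 0.0


def mcp_penalty(t, lam, a=3.0):
    """MCP, the integral of mcp_derivative from 0 to |t| (requires a > 1)."""
    x = abs(t)
    if x <= a * lam:
        return lam * x - x * x / (2 * a)
    return a * lam * lam / 2


if __name__ == "__main__":
    # For small |t| all three behave like LASSO; for large |t| SCAD and MCP
    # level off, so large coefficients are not shrunk further.
    for t in (0.5, 2.0, 10.0):
        print(t, lasso_penalty(t, 1.0), scad_penalty(t, 1.0), mcp_penalty(t, 1.0))
```

   Unlike LASSO, whose penalty grows without bound, both SCAD and MCP
   become constant once |t| exceeds aλ, which is why they penalize large
   coefficients less heavily.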
   We used a 10-fold cross-validation strategy to select the optimal value
   of λ. The value of λ that minimized the Bayesian Information Criterion
   was chosen as the optimal value. The PLR models with the 3 types of
   penalties were each fitted 1000 times, and the miRNAs that were selected
   by at least 2 penalties were considered miRNA biomarkers. The “grpreg”
   package was used for gene selection in R software version 4.0.2.^[63]17
   -[64]19 The source code used for the analysis is available on GitHub at
   [65]https://github.com/ARGHAREBAGHI.
Artificial neural networks
   An ANN was trained in R software version 4.0.2. To prepare the data
   for training, expression values were normalized using their maximum
   and minimum values (min-max normalization). Subsequently, an ANN model
   was designed in R, incorporating the important variables. The
   model parameters were adjusted to construct a disease prediction model,
   taking into account the weight information derived from the expression
   of miRNAs. In this model, the pathogenicity score was computed by
   summing the expression values of the significant miRNAs, each
   multiplied by its weight. For gene selection, the “neuralnet” package
   (version 19) within R software version 4.0.2 was employed. To optimize
   the performance of the model, a 10-fold cross-validation strategy was
   employed, allowing for the fine-tuning of hyper-parameters.^[66]20
   -[67]23
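   The preprocessing described above can be sketched as follows. This is
   a minimal Python illustration rather than the R code used in the study.
   The expression values are hypothetical, and the interleaved fold
   assignment is an assumption, since the text does not state how the 10
   folds were formed.

```python
def min_max_scale(values):
    """Rescale a list of numbers to [0, 1] using its own minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def k_fold_indices(n_samples, k=10):
    """Split sample indices 0..n_samples-1 into k interleaved folds."""
    return [list(range(fold, n_samples, k)) for fold in range(k)]


# Hypothetical expression values for one miRNA across four samples.
expression = [4.2, 6.8, 5.5, 9.1]
scaled = min_max_scale(expression)

# 115 CRC + 115 healthy samples, as in the balanced GSE106817 subset.
folds = k_fold_indices(230, k=10)

if __name__ == "__main__":
    print(scaled)
    print([len(f) for f in folds])
```

   Each fold serves once as the held-out set while the remaining nine are
   used for training.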
miRNA target prediction
   The miRWalk 3.0 online database, available at
   [68]http://mirwalk.uni-hd.de/, is a user-friendly and easily accessible
   resource that provides predictive data obtained through a machine
   learning algorithm. The database prioritizes accuracy, simplicity, and
   up-to-date information to facilitate efficient miRNA research. In the
   context mentioned, miRWalk was utilized as a tool to search for
   predicted target genes of miRNAs.^ [69]24
Protein-protein interaction (PPI) network analysis
   In this study, an interactive network of proteins was employed to
   investigate gene interactions and identify hub genes. The
   protein-protein interaction (PPI) network for the selected genes was
   constructed using the STRING online tool, with an interaction score
   threshold of 0.4. To visualize and analyze the constructed network,
   Cytoscape software version 3.8.2 was utilized. The CytoHubba plugin
   version 1.6 within Cytoscape was employed to evaluate various network
   measures, including Maximum Neighborhood Component (MNC), Edge
   Percolated Component (EPC), and DEGREE, to identify the hub genes within
   the network. Furthermore, a Venn diagram was utilized to identify the
   common genes and select the hub genes that appeared consistently across
   the different measures.^[70]25,[71]26
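   The hub-gene rule described above (take the top-ranked genes under each
   measure, then keep those common to all measures) can be sketched in
   Python. The gene names and scores below are hypothetical placeholders,
   and k = 2 is used only to keep the toy example small.

```python
def top_k(scores, k=10):
    """Return the k genes with the highest score under one network measure."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def consensus_hubs(rankings, k=10):
    """Intersect the top-k gene sets across all measures (the Venn-diagram center)."""
    return set.intersection(*(top_k(scores, k) for scores in rankings.values()))


# Hypothetical scores for three placeholder genes under the three measures.
rankings = {
    "MNC": {"G1": 9, "G2": 8, "G3": 1},
    "EPC": {"G1": 0.9, "G2": 0.2, "G3": 0.8},
    "DEGREE": {"G1": 12, "G2": 3, "G3": 10},
}
hubs = consensus_hubs(rankings, k=2)

if __name__ == "__main__":
    print(hubs)
```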
DEGs’ enrichment analyses
   In this study, the function of DEGs was explored through Kyoto
   Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO)
   enrichment analyses. The GO classification system, encompassing
   molecular function (MF), cellular component (CC), and biological
   processes (BP), was utilized to gain insights into the functional
   characteristics of the DEGs. To conduct the functional enrichment
   analysis of the gene list, the Database for Annotation, Visualization,
   and Integrated Discovery (DAVID) program, accessible at
   [72]https://david.ncifcrf.gov, was employed. The analysis involved
   determining significant enrichment of gene functions using an adjusted
   P-value cutoff threshold of <.05.^[73]27,[74]28
Potential miRNA-mRNA interactions
   In this study, DEmiRs were identified between CRC samples and normal
   tissues, considering an adjusted P-value < .05 and |logFC| > 1 as the
   criteria for differential expression. Subsequently, the target genes of
   the DEmiRs were determined using the miRWalk database. To understand
   the miRNA-mRNA regulatory interactions comprehensively, a bipartite
   miRNA-mRNA correlation network was constructed and analyzed using
   Cytoscape version 3.8.2 software. The interaction score threshold of
   0.4 was employed to filter out weak interactions in the network. The
   choice of a bipartite network is appropriate for this study since mRNAs
   and miRNAs do not directly interact with each other. This network
   structure allows mRNAs and miRNAs to be connected solely through their
   interactions with target genes.
Hub gene validation by GEPIA
   The Gene Expression Profiling Interactive Analysis (GEPIA) database
   ([75]http://gepia.cancer-pku.cn/) is a web-based tool designed for fast
   and customizable analyses using data from The
   Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx)
   projects. In this study, GEPIA was used to validate the expression of
   key hub genes by comparing cancerous and normal tissue samples,
   specifically focusing on colorectal cancer. Differential gene
   expression was analyzed using ANOVA, with statistical significance set
   at P-value < .05 and a fold change greater than 2.
Results
Differential expression analysis
   The miRNA expression data series ([76]GSE106817) was utilized to
   identify miRNAs that were DEmiRs, as well as DEGs. In order to validate
   the findings, a total of 3763 DEGs were identified by applying the
   criteria of an adjusted P-value < .05 and |logFC| > 1. It was observed
   that these genes overlapped with the DEGs identified in the primary
   data series ([77]GSE23878), which was utilized for comparison.
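   The differential-expression criteria (adjusted P-value < .05 and
   |logFC| > 1) amount to a simple filter. The sketch below is in Python
   with hypothetical gene names and values. It assumes the adjusted
   P-values and log fold changes have already been computed by an upstream
   tool.

```python
def is_differentially_expressed(adj_p, log_fc, p_cut=0.05, lfc_cut=1.0):
    """Apply both thresholds: adjusted P-value < p_cut and |logFC| > lfc_cut."""
    return adj_p < p_cut and abs(log_fc) > lfc_cut


# (name, adjusted P-value, logFC); all entries are hypothetical.
records = [
    ("geneA", 0.001, 2.3),   # passes both thresholds
    ("geneB", 0.20, 3.0),    # fails the P-value threshold
    ("geneC", 0.01, -1.4),   # passes; downregulated
    ("geneD", 0.03, 0.6),    # fails the fold-change threshold
]
selected = [name for name, p, lfc in records if is_differentially_expressed(p, lfc)]

if __name__ == "__main__":
    print(selected)
```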
Identification of differentially expressed miRNAs
   The miRNA expression data was utilized to train the PLR model, as
   outlined in the Methods section, with the aim of identifying DEmiRs
   associated with CRC diagnosis. The PLR model used the binary outcome
   variable, where 1 represented CRC and 0 denoted healthy controls. In
   [78]Table 1, we present the names of the 14 selected DEmiR profiles and
   their respective frequencies, determined over 1000 repetitions using
   LASSO, SCAD, and MCP methods. LASSO selected 11 miRNA profiles, while
   SCAD and MCP identified 5 and 2 miRNA profiles, respectively. Notably,
   3 miRNAs (miR-6765-5p, miR-6787-5p, and miR-1228) were confirmed as
   significant in at least 2 PLR methods.
Table 1.
   Frequencies of the selected miRNA over 1000 repetitions using penalized
   logistic regression by SCAD, MCP, and LASSO penalties.
   miRNA              SCAD MCP  LASSO Total accuracy
   MIMAT0005582       1    1000 1000  .966
   MIMAT0019776       1000            .983
   MIMAT0027430       1         1000  .966
   MIMAT0027436       961             .966
   MIMAT0027474       1         1000  .966
   MIMAT0015079            305        .759
   MIMAT0003320                 1000  .845
   MIMAT0004970                 1000  .966
   MIMAT0005922                 1000  .931
   MIMAT0015075                 389   .879
   MIMAT0018949                 1000  .931
   MIMAT0022259                 1000  .966
   MIMAT0019776                 1000  .931
   MIMAT0027392                 1000  .931
   No. selected miRNA 5    2    11
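   The consensus rule (keep a miRNA if at least 2 of the 3 penalties
   selected it) can be checked against the frequencies in Table 1. The
   Python sketch below makes two assumptions: blank cells are entered as
   0, with the column of each value inferred from the table layout, and
   any nonzero frequency counts as "selected". The two rows labeled
   MIMAT0019776 are kept separate, as printed.

```python
# One tuple per row of Table 1: (miRNA ID, SCAD, MCP, LASSO).
TABLE_1 = [
    ("MIMAT0005582", 1, 1000, 1000),
    ("MIMAT0019776", 1000, 0, 0),
    ("MIMAT0027430", 1, 0, 1000),
    ("MIMAT0027436", 961, 0, 0),
    ("MIMAT0027474", 1, 0, 1000),
    ("MIMAT0015079", 0, 305, 0),
    ("MIMAT0003320", 0, 0, 1000),
    ("MIMAT0004970", 0, 0, 1000),
    ("MIMAT0005922", 0, 0, 1000),
    ("MIMAT0015075", 0, 0, 389),
    ("MIMAT0018949", 0, 0, 1000),
    ("MIMAT0022259", 0, 0, 1000),
    ("MIMAT0019776", 0, 0, 1000),
    ("MIMAT0027392", 0, 0, 1000),
]


def consensus(rows, min_penalties=2):
    """Keep rows whose miRNA was selected (frequency > 0) by enough penalties."""
    return [mirna for mirna, *freqs in rows
            if sum(f > 0 for f in freqs) >= min_penalties]


selected = consensus(TABLE_1)
# Number of rows selected under each penalty, in the order SCAD, MCP, LASSO.
per_penalty = [sum(row[i] > 0 for row in TABLE_1) for i in (1, 2, 3)]

if __name__ == "__main__":
    print(selected)
    print(per_penalty)
```

   Under these assumptions the rule returns three rows, in line with the
   three miRNAs reported, and the per-penalty counts reproduce the last
   row of Table 1 (5, 2, and 11).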
   The results of the univariate PLR analysis for the selected miRNAs are
   presented in [80]Table 2, which includes the regression coefficient,
   standard error of the coefficient, odds ratio (OR), and corresponding
   P-values. Notably, the results demonstrate that all 13 miRNAs exhibited
   statistically significant associations with the diagnosis of CRC.
Table 2.
   Results of fitting univariate logistic regression for the miRNAs
   selected using penalized logistic regression with the SCAD, MCP, and
   LASSO penalties.
   miRNA         SCAD                           MCP                            LASSO
                 β (S.E)       OR      P-value  β (S.E)       OR      P-value  β (S.E)       OR      P-value
   MIMAT0005582  10.95 (2.80)  56954   <.0001   10.95 (2.80)  56954   <.0001   10.95 (2.80)  56954   <.0001
   MIMAT0019776  −3.04 (.56)   .048    <.0001   -             -       -        -             -       -
   MIMAT0027430  12.23 (2.46)  204843  <.0001   -             -       -        12.23 (2.46)  204843  <.0001
   MIMAT0027436  1.81 (.34)    6.11    <.0001   -             -       -        -             -       -
   MIMAT0027474  −5.84 (1.24)  .003    <.0001   -             -       -        −5.84 (1.24)  .003    <.0001
   MIMAT0015079  -             -       -        −1.44 (.28)   .237    <.0001   -             -       -
   MIMAT0003320  -             -       -        -             -       -        −2.01 (.26)   .134    <.0001
   MIMAT0004970  -             -       -        -             -       -        −3.36 (.59)   .035    <.0001
   MIMAT0005922  -             -       -        -             -       -        9.99 (1.60)   21807   <.0001
   MIMAT0015075  -             -       -        -             -       -        −1.74 (.21)   .176    <.0001
   MIMAT0018949  -             -       -        -             -       -        −4.69 (.59)   .009    <.0001
   MIMAT0022259  -             -       -        -             -       -        −2.27 (.40)   .103    <.0001
   MIMAT0019776  -             -       -        -             -       -        −3.04 (.56)   .048    <.0001
   MIMAT0027392  -             -       -        -             -       -        −6.57 (1.38)  .001    <.0001
   [82]Table 2 presents the outcomes of unpenalized logistic regression
   for estimating the regression coefficients of the selected miRNAs. The
   table reveals that certain miRNAs exhibited a positive association with
   CRC, whereas others displayed a negative association with CRC.
     * Positively associated miRNAs: miR-1228, miR-6765-5p, miR-6768,
       and miR-1268. Higher expression of these miRNAs is associated
       with higher odds of CRC.
     * Negatively associated miRNAs: miR-1343, miR-6787-5p, miR-650,
       miR-920, miR-3190, miR-4433, miR-5100, miR-1343, and miR-6746.
       Lower expression of these miRNAs is associated with higher odds
       of CRC.
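   The odds ratios in Table 2 follow from the coefficients through
   OR = exp(β): a positive β gives OR > 1 and a negative β gives OR < 1.
   The short Python check below reproduces three of the reported values
   from their coefficients. The function name is illustrative.

```python
import math


def odds_ratio(beta):
    """Odds ratio for a one-unit increase in the predictor: exp(beta)."""
    return math.exp(beta)


# (coefficient, odds ratio as printed in Table 2)
reported = [(10.95, 56954), (1.81, 6.11), (-3.04, 0.048)]

if __name__ == "__main__":
    for beta, table_or in reported:
        print(f"beta={beta:+.2f}  exp(beta)={odds_ratio(beta):.3f}  Table 2: {table_or}")
```

   The very large odds ratios for some miRNAs reflect large coefficients
   on the expression scale used, not necessarily a proportionally large
   clinical effect.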
   The miRNAs identified through PLR were employed as inputs for an ANN
   model to develop classifiers capable of diagnosing patients. The ANN
   model was designed with a 1:1:1 architecture, comprising a single input
   layer, 1 hidden layer, and 1 output layer. The activation functions
   used in the model were sigmoid for the input layer, hyperbolic tangent
   for the hidden layer, and linear for the output layer.
   The input variables for the ANN model were the miRNA expression values
   that were chosen in the preceding step. The model’s output was a binary
   value, either 0 or 1, enabling the classification of patients as
   non-cancerous or cancerous, respectively. This classification holds the
   potential for early cancer detection, offering valuable diagnostic
   capabilities.
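   A forward pass through a network of this shape can be sketched as
   follows. This Python example only illustrates the stated activation
   functions (sigmoid on the inputs, hyperbolic tangent in the hidden
   layer, linear output). The weights, the number of hidden units, and the
   0.5 decision threshold are hypothetical assumptions, not the fitted
   model from the study.

```python
import math


def sigmoid(x):
    """Logistic function, applied to each (normalized) input value."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Input activations -> tanh hidden layer -> linear output score."""
    a_in = [sigmoid(v) for v in x]
    hidden = [
        math.tanh(sum(w * a for w, a in zip(row, a_in)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def classify(score, threshold=0.5):
    """Map the linear output to 1 (CRC) or 0 (healthy); threshold is assumed."""
    return 1 if score >= threshold else 0


# Hypothetical normalized expression of three miRNAs and made-up weights.
x = [0.9, 0.1, 0.8]
w_hidden = [[1.5, -1.0, 2.0], [-0.5, 0.8, 0.3]]
b_hidden = [0.0, 0.1]
w_out = [1.2, -0.7]
b_out = 0.05

if __name__ == "__main__":
    score = forward(x, w_hidden, b_hidden, w_out, b_out)
    print(score, classify(score))
```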
   The outcomes of the ANN model are displayed in the final column of
   [83]Table 1. Notably, a majority of the miRNAs exhibit a total accuracy
   greater than 90%, underscoring their significant potential for cancer
   detection.
Identification of key genes using PPI network analysis
   In this study, an analysis was conducted using the PPI (Protein-Protein
   Interaction) network to explore the 3763 DEGs. The resulting PPI
   network consisted of 443 nodes and 8314 edges, as depicted in
   [84]Figure 4. Additionally, the Venn diagram analysis of the 10 top
   genes, using the 3 methods, resulted in the identification of 7 hub
   genes: CDC20, MAD2L1, UBE2C, CDK1, AURKB, CCNA2, and TOP2A. These
   findings are illustrated in [85]Figure 2.
Figure 4.
   Bipartite mRNA-miRNA subnetwork for CRC. Blue diamonds represent hub
   genes differentially expressed between CRC and normal tissues. Green
   diamonds represent the 2 hub genes targeted by miR-6787. Cytoscape
   v.3.8.2 was used to visualize the network.
Figure 2.
   Venn diagram of the overlap among the top 10 predicted target genes
   ranked by MNC, EPC, and DEGREE. The number 7 at the center of the image
   is the number of genes common to all 3 rankings.
Functional and pathway enrichment analysis
   In the GO analysis, the following biological process (BP), cellular
   component (CC), and molecular function (MF) terms were significantly
   enriched:
     * Top 10 BP terms: rRNA processing, cell division, translation,
       mitochondrial translation, mitotic spindle organization, protein
       folding, cytoplasmic translation, ribosomal large subunit
       biogenesis, proteasomal ubiquitin-independent protein catabolic
       process, mitotic sister chromatid segregation.
     * Top 10 CC terms: nucleoplasm, cytosol, membrane, extracellular
       exosome, cytoplasm, nucleus, endoplasmic reticulum, mitochondrion,
       chromosome, ribosome.
     * Top 10 MF terms: protein binding, RNA binding, identical protein
       binding, structural constituent of ribosome, cadherin binding,
       enzyme binding, chaperone binding, ATPase activity, snoRNA binding,
       unfolded protein binding.
   In addition, KEGG pathway analysis indicated that the following
   pathways were involved: Nucleocytoplasmic transport, Proteasome, DNA
   replication, Spliceosome, Glutathione metabolism, Ribosome, Protein
   processing in endoplasmic reticulum, and p53 signaling pathway
   ([89]Figure 3).
Figure 3.
   Gene Ontology (GO) and KEGG pathway enrichment analyses were performed
   for the module genes. The top 10 GO terms in Biological Process (BP),
   Molecular Function (MF), and Cellular Component (CC), along with
   significant KEGG pathways, are presented.
Bipartite miRNA-mRNA network analysis
   mRNA-miRNA network analysis is a valuable computational approach for
   understanding the mechanisms that contribute to CRC pathogenesis. In
   this study, the miRWalk database was used to identify target genes of
   the DEmiRs. By assessing the overlap between the predicted miRNA
   targets and the validated DEGs, key hub genes such as CDK1 and MAD2L1
   were identified as both targets of miR-6787 and pivotal players in
   CRC. Notably, the expression of miR-6787-5p was significantly
   downregulated in cancer tissue samples compared to normal tissue
   samples. These findings highlight the intricate regulatory network
   involving miRNAs and their target genes in CRC
   ([92]Figure 4).
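   The overlap step can be sketched as a set intersection that yields the
   edges of the bipartite subnetwork. In the Python example below, the hub
   gene list is the one reported in the PPI analysis, and CDK1 and MAD2L1
   are the reported targets. GENE_X and GENE_Y are hypothetical
   placeholders standing in for other predicted targets that are not hub
   genes.

```python
# Hub genes reported from the PPI network analysis.
hub_genes = {"CDC20", "MAD2L1", "UBE2C", "CDK1", "AURKB", "CCNA2", "TOP2A"}

# Predicted targets per miRNA. GENE_X and GENE_Y are hypothetical placeholders.
predicted_targets = {
    "miR-6787-5p": {"CDK1", "MAD2L1", "GENE_X", "GENE_Y"},
}


def bipartite_edges(targets_by_mirna, genes_of_interest):
    """Return sorted (miRNA, gene) edges where a predicted target is a gene of interest."""
    return sorted(
        (mirna, gene)
        for mirna, targets in targets_by_mirna.items()
        for gene in targets & genes_of_interest
    )


edges = bipartite_edges(predicted_targets, hub_genes)

if __name__ == "__main__":
    for mirna, gene in edges:
        print(mirna, "->", gene)
```

   Every edge joins a miRNA node to an mRNA node, with no miRNA-miRNA or
   mRNA-mRNA edges, which is what makes the network bipartite.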
Gene expression analysis of the central hub genes
   We used the GEPIA database to analyze the expression of 2 candidate
   genes in cancer tissues and normal samples from the TCGA-COAD dataset.
   The results revealed that CDK1 and MAD2L1 were both significantly
   upregulated in tumors compared with normal tissues, as shown in
   [93]Figure 5.
Figure 5.
   Validation of hub genes in colorectal cancer using TCGA-COAD. Two hub
   genes, CDK1 and MAD2L1, were significantly upregulated in CRC tissues
   compared to normal tissues in TCGA-COAD data.
Discussion
   CRC is a leading cause of global mortality, making early detection
   vital for improved treatment response and reduced mortality rates.
   Biomarkers play a critical role in CRC diagnosis and treatment, and
   bioinformatics tools facilitate the identification of CRC-related
   biomarkers and molecular interactions.^[96]29 -[97]31 In this study, a
   bioinformatics approach was employed, utilizing 2 datasets,
   [98]GSE106817 and [99]GSE23878, to identify DEmiRs and hub genes
   associated with the progression of CRC. The analysis of these datasets
   enabled the identification of specific miRNAs and genes that play a
   crucial role in CRC progression. By investigating the expression
   patterns and interactions of these DEmiRs and hub genes, valuable
   insights into the molecular mechanisms underlying CRC development and
   progression can be gained. Three miRNAs, miR-6765-5p, miR-6787-5p, and
   miR-1228, were selected because each was chosen by at least 2 of the
   LASSO, MCP, and SCAD penalties. The overall accuracy of these 3 miRNAs
   exceeded 95%, underscoring their potential as promising biomarkers for
   stable plasma level determination in CRC patients. The study also
   demonstrated the utility of PLR with 3 different penalty functions,
   followed by an ANN, for identifying miRNAs significantly associated
   with CRC.
   miRNAs have emerged as key regulators in cancer biology, functioning as
   both tumor suppressors and oncogenes depending on their expression
   patterns and the cancer type. These small non-coding RNAs play a
   pivotal role in a range of cancer-related processes, including
   initiation, malignant transformation, progression, and metastasis.
   Recent research has demonstrated that certain cancers have unique miRNA
   signatures, making them valuable diagnostic and prognostic markers as
   well as potential therapeutic targets. Advances in techniques such as
   microarray analysis, RT-PCR, and next-generation sequencing have
   facilitated the profiling of miRNAs in various cancer types, even from
   archived tumor tissues. Emerging detection methods, such as
   nanoparticle-based and hybridization chain reaction (HCR)
   amplification, aim to enhance miRNA detection sensitivity. miRNAs are
   also stable in body fluids, making them promising candidates for
   non-invasive cancer diagnostics. Their dysregulation in cancer cells,
   influenced by both genetic and epigenetic factors, highlights their
   role in tumorigenesis, and disruptions in the miRNA biogenesis process
   could significantly contribute to cancer development.^[100]32,[101]33
   In CRC, miR-1228 is often downregulated. This downregulation is
   associated with poor prognosis. The exact role of miR-1228 in CRC is
   not fully understood, but it is thought to play a role in tumor growth
   and progression. miR-1228 targets a number of genes that are involved
   in cell proliferation, angiogenesis, and apoptosis. By targeting these
   genes, miR-1228 helps prevent cancer cells from growing and spreading.
   Numerous studies have shown that miR-1228 plays an essential role in
   the proliferation of cancer cells and can be used for early detection
   of cancer.^[102]34,[103]35 miR-1228 regulates stress-induced cellular
   apoptosis by targeting the MOAP1 protein.^ [104]36 In another report,
   the findings showed that miR-1228 has a role in metabolism, maintaining
   cell survival, regulating apoptosis, and stimulus response. Some
   studies have investigated the target genes of miR-1228 in
   CRC.^[105]37,[106]38 LRP1 is a target gene of miR-1228 and is located
   on chromosome 12.^[107]39,[108]40 This gene mainly plays a role in
   basic metabolism and cell structure, which is a key component of
   maintaining cell survival. In past research, the expression level of
   miR-1228-3p has been checked in drug resistance of breast cancer,
   chronic heart failure, endometrial carcinoma, prostate cancer, CRC, and
   cancer secretions. The expression level of miR-1228-3p is stable in
   blood circulation and can be used as a biomarker.^ [109]41 In a study
   by Yang et al,^ [110]37 it was revealed that miR-1228 remained
   unaffected by surgical treatment, indicating its suitability as an
   optimal reference gene for treatment studies. Additionally, the
   circulating level of miR-1228 was found to be independent of tumor
   stage.
   In CRC, miR-6787-5p is often downregulated. This downregulation is
   associated with poor prognosis. The exact role of miR-6787-5p in CRC is
   not fully understood, but it is thought to play a role in tumor growth
   and progression. miR-6787-5p targets a number of genes that are
   involved in cell proliferation, angiogenesis, and apoptosis. By
   targeting these genes, miR-6787-5p helps prevent cancer cells from
   growing and spreading.^ [111]42 The exact role of miR-6765-5p in CRC is
   not fully understood.
   Hub genes were then ranked using the MNC, EPC, and DEGREE methods in
   Cytoscape software. The functional and biological interactions between
   the DEGs were investigated using Gene Ontology (GO) and Kyoto
   Encyclopedia of Genes and Genomes (KEGG) analyses. In the present
   study, nucleoplasm was identified as one of the significantly enriched
   GO terms for the DEGs in CRC. Network analysis demonstrated that 4 of
   the DEGs are annotated to this term. These
   findings suggest that the DEGs are involved in a number of biological
   processes that are important for the pathogenesis of CRC. Further
   research is needed to confirm these findings and to identify new
   diagnostic targets for CRC.^ [112]43 Therapeutic modulation of cell
   membrane lipid composition and organization is an emerging field with
   potential applications in the treatment of a variety of diseases,
   including cancer.^ [113]44 It has been shown that GO terms such as
   rRNA processing,^ [114]45 translation,^ [115]46 mitochondrial
   translation,^ [116]47 mitotic spindle organization,^ [117]48
   extracellular exosome,^ [118]49 and protein binding^ [119]50 are
   associated with CRC.
   By using miRNA-mRNA expression profiling, CDK1 and MAD2L1 were
   identified as the most important genes playing an important role in
   CRC. The CDK1 gene encodes a protein known as cyclin-dependent kinase
   1, which belongs to a family of enzymes involved in the regulation of
   the cell cycle. The cell cycle is a fundamental process responsible for
   cell growth, division, and the generation of new cells. In CRC, the
   CDK1 gene can undergo mutations, resulting in abnormal functioning.
   These mutations can lead to excessive production of the
   cyclin-dependent kinase 1 protein. Scientific investigations have
   demonstrated that dysregulation of CDK1 accelerates tumor growth and
   uncontrolled proliferation of cancer cells.^[120]51,[121]52 Zhang et
   al^[122]53 revealed that CDK1, in addition to being overexpressed and
   sensitive to apoptosis in CRC cells, plays a crucial role in
   controlling the cell cycle and contributes to the development of
   colorectal tumors through an iron-regulated signaling axis. Previous
   studies have established a link between CDK1 overexpression and the
   development of colorectal, liver, and lung cancers, ultimately
   impacting patient survival.^ [123]54
   MAD2L1 plays a crucial role as a tumor suppressor gene in regulating
   the cell cycle. Mutations in the MAD2L1 gene can disrupt the normal
   control of cell growth and division, which can contribute to the
   development of cancer. Deletion of the MAD2L1 gene has been found to
   impede the growth of CRC cells.^[124]55,[125]56 Venugopal et al^
   [126]57 revealed that there is a higher expression of MAD2L1 in CRC
   cell lines and tissues, and this overexpression has been associated
   with poor prognosis. Li et al^ [127]55 reported that MAD2L1 has
   potential as a biomarker for colorectal cancer.
   The present study introduced a novel set of gene expression profiles
   that are predictive of CRC patients using a miRNA-mRNA model. This
   model provides a different perspective than the traditional
   proportional point of view.
Conclusions
   This study identified 3 novel miRNAs (miR-1228, miR-6765-5p, and
   miR-6787-5p) that are potentially associated with CRC and could serve
   as biomarkers. Additionally, the target genes related to these miRNAs,
   namely CDK1 and MAD2L1, were found to be upregulated in CRC compared to
   normal tissues. The miRNAs associated with the hub genes in the
   mRNA-miRNA bipartite network played a pivotal role in CRC. However,
   further molecular studies are warranted to validate the role of these
   genes in CRC tumorigenesis.
List of Abbreviations
   Abbreviation Definition
   miRNAs       microRNAs
   CRC          Colorectal cancer
   ANN          Artificial neural networks
   PLR          Penalized logistic regression
   SCAD         Smoothly clipped absolute deviation
   LASSO        Least absolute shrinkage and selection operator
   MCP          The minimax concave penalty
   GEO          Gene Expression Omnibus
   DEmiRs       Differentially Expressed miRNAs
   PPI          Protein-protein interaction
   DEGs         Differentially expressed genes
Acknowledgments