Abstract

   SARS-CoV-2, the virus that causes COVID-19, is a current concern for
   people worldwide. The virus has spread globally and is out of control
   in several countries, pushing the outbreak into an alarming phase.
   Machine learning combined with transcriptome analysis has advanced in
   recent years, and its strong performance in several fields makes it a
   promising way to investigate how SARS-CoV-2 is related to other
   diseases. Idiopathic pulmonary fibrosis (IPF) is caused by long-term
   lung injury and is a risk factor for SARS-CoV-2 infection. In this
   article, we used a combination of statistical approaches, machine
   learning, and bioinformatics tools to investigate how SARS-CoV-2
   affects patients with IPF. For this study, we employed two RNA-seq
   datasets. The main contributions are the identification of common
   genes to reveal shared pathways and drug targets, a protein–protein
   interaction (PPI) network to identify hub-genes and basic modules, and
   the interactions of transcription factor (TF) genes and TF–miRNAs with
   the differentially expressed genes common to both datasets.
   Furthermore, we used gene ontology and molecular pathway analysis for
   functional analysis and found that IPF patients share certain common
   connections with SARS-CoV-2 infection. A detailed investigation was
   carried out to recommend therapeutic compounds for IPF patients
   affected by SARS-CoV-2.

   Keywords: SARS-CoV-2, COVID-19, machine learning, idiopathic pulmonary
   fibrosis, gene ontology, differentially expressed genes

Introduction

   Coronaviruses have various variants that can infect humans and animals
   [1]. These viruses are responsible for diseases ranging from common
   fever, cold, and cough to more serious illnesses such as Severe Acute
   Respiratory Syndrome (SARS) and Middle East Respiratory Syndrome
   (MERS) [2]. Severe Acute Respiratory Syndrome Coronavirus 2
   (SARS-CoV-2) is a new type of coronavirus that attracted wide
   attention at the end of 2019 because it had never been observed in
   humans previously. Coronavirus Disease 2019 (COVID-19) is the name of
   the disease caused by the new coronavirus, which was first discovered
   in Wuhan, China, in December 2019 [3]. Chinese officials reported 44
   instances of pneumonia with unknown causes to the World Health
   Organization (WHO) between 31 December 2019 and 3 January 2020 [4].
   The first COVID-19 fatality occurred in Wuhan on 9 January 2020, while
   the first death outside of China occurred in the Philippines on 1
   February 2020. Within a few days, the disease had spread worldwide and
   was out of control in many nations [4]. On 30 January 2020, the WHO
   declared the outbreak a Public Health Emergency of International
   Concern (PHEIC) [5]. The same organization declared it a pandemic on
   11 March 2020, after a total of 4500 deaths had been reported in 30
   countries and territories throughout the world [5]. On 19 March 2020,
   Italy surpassed China as the country with the highest number of
   reported deaths [4]. On 26 March 2020, the USA surpassed both China
   and Italy as the country with the highest number of confirmed cases
   [4]. Globally, the deadliest week was 13–19 April 2020, when nearly
   7460 deaths per day were officially reported. The pandemic’s
   epicenter migrated to Latin America and the Caribbean in June 2020.
   Between 15 July 2020 and 15 August 2020, the region had an average of
   almost 2500 deaths per day. With over 78 000 cases on 30 August 2020,
   India surpassed the US record for the highest number of cases in a
   single day, and a second wave hit India on 9 April 2021. There were
   281 808 270 confirmed cases from December 2020 to December 2021, with
   5 411 75 deaths attributed to this virus [6]. On 26 November 2021,
   WHO designated the B.1.1.529 variant of SARS-CoV-2, first reported
   from South Africa, as a variant of concern and named it Omicron. The
   first COVID-19 case associated with the Omicron variant in the USA
   was reported on 1 December 2021, and at least one Omicron case had
   been detected in 22 states as of 8 December 2021. This new variant
   has since spread worldwide and is out of control in several
   countries, pushing the outbreak into an alarming phase.

   SARS-CoV-2 is a positive-sense single-stranded RNA virus. The spike
   (S), envelope (E), membrane (M), and nucleocapsid (N) proteins are
   its four structural proteins. Spike proteins are responsible for
   attaching to the host cell’s membrane. Idiopathic pulmonary fibrosis
   (IPF) is a chronic disease marked by the thickening and stiffening of
   lung tissue associated with scar tissue formation [7]. In this
   condition, the spongy tissue of the lung becomes scarred, or
   fibrotic. It is a slowly progressing, highly fatal disease, proving
   fatal for roughly 80% of patients within 3–5 years of diagnosis [7].
   Pulmonary fibrosis affects people in different ways, and various
   common, easily curable diseases can cause similar symptoms. Shortness
   of breath and a persistent dry, hacking cough are the most common
   signs and symptoms of IPF. Many affected people also notice a
   decrease in appetite and weight loss over time. Due to a lack of
   oxygen, some people with IPF develop enlarged, rounded tips on their
   fingers and toes (clubbing) [8]. The cause of IPF is not understood,
   but several risk factors are common. Almost all patients with IPF are
   over 50 years old. Up to 20% of patients with IPF have another family
   member who suffers from the condition, which points to a genetic
   component. Approximately 75% of people with IPF are current or former
   smokers. Gastroesophageal reflux or heartburn affects about 75% of
   people with IPF. Male patients account for roughly 65% of IPF
   patients [9]. Radiation treatment to the chest and the use of certain
   chemotherapy medications have been shown to increase the risk of
   pulmonary fibrosis [10, 11]. SARS-CoV-2 carries a spike protein that
   binds strongly to ACE2, and IPF patients express high levels of this
   enzyme, which supports IPF as a risk factor for COVID-19 [12, 13].
   These investigations have revealed several links between IPF and
   COVID-19, which raises concerns.

Contributions

   In this article, we used a combination of statistical approaches,
   machine learning algorithms, and bioinformatics tools to investigate
   how SARS-CoV-2 affects patients with IPF. The following are the main
   contributions of this article:
     * the experiments were conducted on real-world RNA-seq datasets. We
       identified common genes using machine learning algorithms and
       various bioinformatics analyses in order to find shared pathways
       and drug targets;
     * the Protein-Protein Interaction (PPI) network was examined to
       discover hub-genes and modules. The interactions of transcription
       factor (TF) genes and TF–miRNAs with common differentially
       expressed genes (DEGs) were also identified. Furthermore, we used
       gene ontology (GO) and molecular pathway analyses for functional
       analysis and found that IPF patients share certain common
       connections with SARS-CoV-2 infection;
     * a comprehensive analysis was conducted to suggest drug molecules
       for IPF patients with SARS-CoV-2 infection, drawing on
       molecular-level knowledge and several pathway-based analyses that
       illustrate the utility of the biological system for both
       SARS-CoV-2 and IPF; and
     * finally, the current challenges and future research directions of
       the integration and interplay between machine learning and
       bioinformatics are discussed.

   The remainder of this study is organized as follows. The ‘Materials
   and methods’ section gives a full description of the datasets and
   their preprocessing, followed by an overview of the selected
   methodology. The ‘Result analysis’ section evaluates and interprets
   the experimental outcomes of these methodologies. The ‘Discussion’
   section provides a detailed explanation of the findings and discusses
   some application areas for the scientific community. Finally, the
   ‘Conclusions’ section summarizes the findings and outlines possible
   future directions.

Materials and methods

   This section describes the analysis workflow in detail, including the
   dataset transformation process and the various transcriptome
   analyses.

Overview of approach

   We applied machine learning and transcriptomic analysis to the
   selected datasets to identify shared associations between SARS-CoV-2
   and IPF, as shown in the block diagram in Fig. 1. Machine learning
   approaches were used to identify the common DEGs of the selected
   datasets. These shared DEGs were then used to construct gene–disease
   association networks; to identify GO terms, pathways, the PPI
   network, hub-genes, transcription factor (TF)–gene interactions, and
   TF–miRNA interactions; and to identify candidate drugs.

Figure 1:

   The complete workflow for the current investigation. Two types of
   samples (control cells and affected cells) were collected from
   SARS-CoV-2-infected lung epithelial cells, and both are included in
   the GSE147507 dataset. The GSE52463 dataset contains IPF-affected
   lung samples. Common DEGs were identified from both datasets using a
   machine learning technique. GO identification, pathway analysis, PPI
   network construction, TF–gene analysis, TF–miRNA analysis, and
   hub-gene identification were performed on the common DEGs, and drug
   molecule identification was based on those analyses.

Dataset analysis

   This section describes the operations performed on the datasets,
   none of which changes their underlying properties, and gives an
   overview of the selected datasets.

Dataset description

   We identified common genetic interrelationships between SARS-CoV-2
   and IPF using Ribonucleic Acid Sequencing (RNA-Seq) datasets from the
   Gene Expression Omnibus (GEO) repository of the National Center for
   Biotechnology Information (NCBI) [14, 15]. The SARS-CoV-2 dataset
   (GEO accession ID GSE147507, GEO platform ID GPL18573) contains the
   transcriptional responses to SARS-CoV-2 infection. The IPF dataset
   (GEO accession ID GSE52463, GEO platform ID GPL11154) comes from a
   transcriptome analysis that revealed differential splicing events in
   IPF lung tissue [16]. The GSE147507 dataset contains two types of
   samples (control and SARS-CoV-2-affected cells) taken from lung
   epithelial cells (LECs), while the GSE52463 dataset contains two
   types of lung tissue samples (control and IPF-affected). Metadata and
   count data are included in both datasets. The GSE147507 data were
   generated by high-throughput sequencing on the Illumina NextSeq 500
   (Homo sapiens) platform [17]. The IPF dataset comprises mRNA
   sequencing of eight IPF-affected lung tissue samples and seven
   control lung tissue samples, all of which were sequenced on the
   Illumina HiSeq 2000 (Homo sapiens) platform using high-throughput
   sequencing technology [18]. Table 1 lists the datasets used in this
   study together with their GEO features and sequencing methods.

Table 1:

   Contents of the datasets

   Properties             SARS-CoV-2                                           IPF
   GEO accession          GSE147507                                            GSE52463
   GEO platform           GPL18573                                             GPL11154
   Organism               Homo sapiens                                         Homo sapiens
   Assay type             RNA-Seq                                              RNA-Seq
   Type of the dataset    Transcriptional response to SARS-CoV-2 infection     Transcriptome analysis indicates distinct splicing events in IPF lung tissue
   Instrument             Illumina NextSeq 500                                 Illumina HiSeq 2000
   Total GEO samples      110                                                  15
   Experiment type        High-throughput sequencing for expression profiling  High-throughput sequencing for expression profiling

Data preparation

   To achieve optimal performance, it is necessary to clean and prepare
   the dataset before applying machine learning methods. Data preparation
   generally involves removing unnecessary features, checking the
   variation of independent features, converting non-numerical features,
   removing outliers, and replacing missing values if they exist. The
   data preparation process consists of two fundamental steps: data
   preprocessing and data transformation.

Data preprocessing

   The data originate from multiple heterogeneous sources. Because of
   their large size, the datasets are highly susceptible to missing and
   noisy values. This section discusses the essential steps in data
   preprocessing: data cleaning and data integration.
     * Data Cleaning: First, we applied several techniques to remove
       noise and resolve inconsistencies in the metadata and count data
       of both datasets, for example Rosner’s test for outlier detection
       and the predictive mean matching method for imputing missing
       values. Then, to apply machine learning techniques, we converted
       the qualitative values into quantitative values using R packages
       (e.g. Biobase (version 2.30.0), GEOquery (version 2.40.0), and
       limma (version 3.26.8)) from Bioconductor, a free, open-source,
       and open-development software project for the analysis and
       comprehension of genomic data.
     * Data Integration: To improve accuracy, data integration helped us
       reduce and avoid redundancies in the resulting dataset. Because
       the data come from multiple heterogeneous sources, it is
       essential to check both datasets for redundancy by means of
       correlation analysis, which measures how strongly one feature
       implies another. Figure 2a and b shows the correlation between
       different features for the two datasets, GSE147507 and GSE52463,
       respectively. We evaluated the correlation between all features
       using the following Pearson’s product-moment coefficient
       equation.

   \[
   r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
            {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^{2}}\,
             \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^{2}}}
   \]

Figure 2:

   The correlation analysis between different features for the two
   datasets (a) [88]GSE147507 and (b) [89]GSE52463. This analysis has
   measured how strongly one feature implies the other.

   In the Pearson correlation equation, $\bar{x}$ is the mean of the x
   variable, $\bar{y}$ is the mean of the y variable, and $x_i$ and
   $y_i$ are the values of x and y in tuple i.
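
   The Pearson coefficient defined above can be computed directly from
   its definition. The following is a minimal Python sketch for
   illustration only; the function name pearson_r and the two example
   feature vectors are hypothetical and are not taken from the study's
   datasets or code.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Deviations of each value from its feature mean
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant feature")
    return numerator / denominator


# Hypothetical values of two features across five samples (illustration only)
feature_a = [1.0, 2.0, 3.0, 4.0, 5.0]
feature_b = [2.0, 4.0, 6.0, 8.0, 10.0]
r_example = pearson_r(feature_a, feature_b)
```

   Values of r close to +1 or -1 flag pairs of features that are largely
   redundant, which is the purpose of the redundancy check described in
   the data integration step.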

Data transformation

   We applied this processing step to make downstream processing more
   efficient and the patterns in the data easier to interpret. Some
   selected features have much larger values than others, which can
   distort the results. We therefore used minimum–maximum normalization
   to scale the selected feature values into the range [0.0, 1.0]
   without changing the characteristics of the data:

   \[ N = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \]

   where N is the normalized output value, X is an original value, and
   $X_{\max}$ and $X_{\min}$ are the maximum and minimum values of the
   feature, respectively.

   We also evaluated the density plot for both datasets. The density
   plot shows the smooth distribution of the points along the numeric
   axis, and its peaks are at the locations with the highest
   concentration of points. Figure 3a and b shows density plots for the
   two datasets, GSE147507 and GSE52463, respectively.
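
   The minimum–maximum normalization described in this section can be
   sketched in a few lines of Python. This is an illustrative sketch
   only: the function name min_max_normalize and the example values are
   hypothetical, and mapping a constant feature to 0.0 is an assumption
   made here to avoid division by zero, not something the study
   specifies.

```python
def min_max_normalize(values):
    """Scale a feature into [0.0, 1.0] using N = (X - Xmin) / (Xmax - Xmin)."""
    x_min = min(values)
    x_max = max(values)
    if x_max == x_min:
        # Assumption: a constant feature carries no information, map it to 0.0
        return [0.0 for _ in values]
    return [(x - x_min) / (x_max - x_min) for x in values]


# Hypothetical feature values (illustration only)
raw_values = [10.0, 20.0, 15.0, 30.0]
normalized = min_max_normalize(raw_values)
```

   Because the transformation is linear, the ordering of the values and
   the shape of their distribution are preserved; only the scale
   changes.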

Figure 3:

   The density plots of the two datasets (a) [95]GSE147507 and (b)
   [96]GSE52463. The density plot shows the smooth distribution of the
   points along the numeric axis. The peaks of the density plot are at the
   locations where there is the highest concentration of points.

Spotting DEGs and shared DEGs between SARS-CoV-2 and IPF

   A gene is considered differentially expressed when its expression
   level differs with statistical significance between experimental
   conditions [19]. The major purpose of this study is to identify DEGs
   that are shared between the GSE147507 and GSE52463 datasets. The
   DESeq2 and limma packages of the R programming language were used to
   process the expression data. DEGs from both datasets were identified
   using a machine learning method. Listing 1 shows the procedure
   applied to identify DEGs from both datasets. Across all datasets,
   significant DEGs were identified using cutoff criteria (P-value <
   0.05 and |log2 FC| ≥ 1.00). The shared DEGs of the GSE147507 and
   GSE52463 datasets were found using the online Venn analysis tool
   JVENN [20].

   Listing 1. The procedure of the machine learning algorithm to identify
   DEGs from both datasets.

   # Input:  RNA-Seq count data and sample metadata for each dataset
   # Output: differentially expressed genes (DEGs), appended to result.csv
   library(DESeq2)
   library(dplyr)

   outputFileName <- "result.csv"
   # GEO series numbers of the two datasets (GSE147507 and GSE52463)
   name <- c("147507", "52463")

   for (i in seq_along(name)) {
     # Extract count data (first column: gene ID) and round the counts;
     # DESeq2 expects non-negative integer counts
     countData <- read.csv(paste0("GSE", name[i], "filtered_countdata.csv"))
     countDataFrame <- data.frame(countData)
     countDataFrameRound <- mutate(countDataFrame,
                                   across(where(is.numeric), ~ round(.x)))

     # Extract metadata (one row per sample, with a 'treatment' column)
     metaData <- read.csv(paste0("GSE", name[i], "filtered_metadata.csv"))
     metaDataFrame <- data.frame(metaData)

     # Analyze count data using DESeq2
     applyDESeq <- DESeqDataSetFromMatrix(countData = countDataFrameRound,
                                          colData = metaDataFrame,
                                          design = ~treatment, tidy = TRUE)
     applyDESeq <- DESeq(applyDESeq)
     # tidy = TRUE returns a data frame whose first column holds the gene IDs
     result <- results(applyDESeq, tidy = TRUE)

     # Result analysis: omit rows with missing values
     resultOmitNa <- na.omit(result)

     # Up-regulated genes
     resultOmitNaFilterUp <- filter(resultOmitNa,
                                    log2FoldChange > 1 & padj < 0.05)
     # Down-regulated genes
     resultOmitNaFilterDown <- filter(resultOmitNa,
                                      log2FoldChange < -1 & padj < 0.05)
     # Absolute log2 fold change and cutoff criterion for the adjusted P-value
     resultFinal <- filter(resultOmitNa,
                           abs(log2FoldChange) > 1 & padj < 0.05)

     # Append the DEGs to the output file; write the header only once
     firstWrite <- !file.exists(outputFileName)
     write.table(resultFinal, outputFileName, sep = ",", row.names = FALSE,
                 append = !firstWrite, col.names = firstWrite)
   }

Identification of GO terms and molecular pathways

   Gene set enrichment analysis is a technique for identifying DEGs
   linked to a biological process or molecular function [21]. GO is a
   classification system that divides gene annotations into biological
   processes, molecular functions, and cellular components [22]. The
   purpose of analyzing GO terms is to understand the molecular
   activity, the cellular structure, and the position in the cell where
   genes fulfill their functions [23]. We used four databases to find
   common molecular pathways in IPF and COVID-19: Kyoto Encyclopedia of
   Genes and Genomes (KEGG) [24], WikiPathways [24], Reactome [25], and
   BioCarta [25]. KEGG contains a wide range of gene annotations and is
   commonly used to characterize metabolic pathways. The web-based
   platform Enrichr was used to obtain the GO terms and molecular
   pathways for the common genes mentioned earlier [26, 27]. To derive
   GO terms and molecular pathways, we used the 20 selected common
   genes.
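
   The statistical idea behind this kind of enrichment analysis is an
   over-representation test: given a gene list, how surprising is its
   overlap with the genes annotated to a GO term or pathway? The
   following Python sketch computes a one-sided hypergeometric P-value
   for that overlap. It illustrates the general principle only and is
   not Enrichr's implementation; the function name and all numbers in
   the example (background size, term size, overlap) are hypothetical.

```python
from math import comb


def enrichment_p_value(background_size, term_size, query_size, overlap):
    """One-sided hypergeometric P-value, P(X >= overlap).

    background_size: number of genes in the background set
    term_size:       number of background genes annotated to the term
    query_size:      number of genes in the query list
    overlap:         number of query genes annotated to the term
    """
    upper = min(term_size, query_size)
    total = comb(background_size, query_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 20 query genes, a term with 50 annotated genes in a
# background of 20 000 genes, and 3 query genes annotated to the term
p_example = enrichment_p_value(20000, 50, 20, 3)
```

   A small P-value indicates that the overlap is larger than expected
   by chance. Enrichment tools typically report such a P-value alongside
   other ranking scores and correct it for multiple testing.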

Analysis of PPI network

   PPIs are a major focus of research in cellular biology and a
   prerequisite for systems biology [28]. Proteins interact with other
   proteins to carry out their activities inside cells, and the
   information captured by a PPI network helps to infer protein function
   [29]. We built the PPI network of the proteins encoded by the common
   DEGs using the STRING resource to capture the functional and physical
   associations shared by IPF and COVID-19 [30]. STRING reports
   experimental and predicted interactions together with 3D structures,
   accessory data, and confidence scores [31]. STRING groups confidence
   scores into categories (low, medium, and high); we built the PPI
   network with a medium confidence score (0.400). We used the network
   type “full STRING network” (the edges indicate both functional and
   physical protein associations) and limited the number of interactors
   to 10. We then imported the PPI network into Cytoscape (version
   3.7.1) for visualization and further analysis, including the
   identification of hub-genes. Cytoscape is an open-source network
   visualization framework that integrates several datasets and
   supports various interaction types, such as protein–protein
   interactions, genetic interactions, and protein–DNA interactions
   [32, 33].

Identification of hub-genes and module analysis

   A PPI network consists of nodes and the edges that connect them, and
   hub-genes are its most highly connected nodes. Hub-genes mark dense
   areas that are regarded as important parts of the PPI network. We
   identified the hub-genes of the PPI network with CytoHubba, a
   Cytoscape plugin [34]. CytoHubba is the most popular Cytoscape
   plugin for hub-gene identification because of its user-friendly
   interface. CytoHubba provides 11 methods for topological analysis
   (e.g. MCC, Degree, DMNC, MNC, EPC, Bottleneck, etc.). The degree
   method was employed to find the hub-genes for this study because it
   facilitates analysis by suggesting large, closely compacted modules
   in the PPI network [35]. The Molecular Complex Detection (MCODE)
   plugin in Cytoscape was used to locate the most densely connected
   modules in the PPI network [36]. The MCODE method is based on a
   graph-theoretic clustering algorithm that detects densely connected
   regions in large protein–protein interaction networks that may
   represent molecular complexes [36]. The method has the advantage
   over other graph clustering methods of having a directed mode that
   allows fine-tuning of clusters of interest without considering the
   rest of the network and allows examination of cluster
   inter-connectivity, which is relevant for protein networks.
   Furthermore, the method is not affected by the known high rate of
   false positives in data from high-throughput interaction techniques
   [37]. Moreover, the method is relatively easy to implement and,
   since it is based on local density, offers both a directed mode and
   a complex connectivity mode. We therefore applied MCODE to the PPI
   network to locate the highly interconnected regions that may
   correspond to molecular complexes.
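
   The degree method ranks each node by the number of interaction
   partners it has in the network. The sketch below illustrates that
   idea in plain Python on a scored edge list, keeping only edges at or
   above the medium confidence threshold of 0.4 used for the STRING
   network. It is not the CytoHubba implementation; the function name,
   the placeholder gene names, and the confidence scores are all
   hypothetical.

```python
from collections import Counter


def hub_genes_by_degree(edges, min_score=0.4, top_n=3):
    """Rank genes by degree, using only edges with score >= min_score.

    edges: iterable of (gene_a, gene_b, confidence_score) tuples
    """
    degree = Counter()
    seen = set()
    for gene_a, gene_b, score in edges:
        pair = frozenset((gene_a, gene_b))
        # Skip low-confidence edges, self-loops, and duplicate edges
        if score < min_score or gene_a == gene_b or pair in seen:
            continue
        seen.add(pair)
        degree[gene_a] += 1
        degree[gene_b] += 1
    # Sort by descending degree, breaking ties alphabetically
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical scored interactions between placeholder genes A-E
example_edges = [
    ("A", "B", 0.90),
    ("A", "C", 0.70),
    ("A", "D", 0.45),
    ("B", "C", 0.50),
    ("C", "D", 0.20),  # below the 0.4 threshold, ignored
    ("D", "E", 0.41),
]
top_hubs = hub_genes_by_degree(example_edges)
```

   In this toy network, gene A has the most interaction partners and is
   therefore ranked first. Other CytoHubba methods rank nodes by
   different topological properties and can produce a different order.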

TF–gene analysis

   TFs bind to specific genes and regulate their levels of expression,
   which makes them essential for molecular recognition [38]. In all
   species, TFs control gene expression and play a critical role in
   transcription. TFs are also important in a variety of biological
   processes, including cell cycle regulation and development. TF–gene
   interactions with the top 12 of the 90 common DEGs were used to
   investigate the effects of TFs on functional pathways and gene
   expression levels. We used the NetworkAnalyst tool to identify
   topologically relevant TFs from the ENCODE database and to build the
   TF–gene interaction network with the previously established common
   genes [39–41]. NetworkAnalyst is a web-based tool for transcriptome
   analysis and meta-analysis in various species, including humans
   [42, 43]. The TF–gene interaction network is made up of 190 nodes
   and 301 edges. The network has 12 DEGs and 178 TF–genes: HSPB6 is
   regulated by 85 TF–genes, EPAS1 by 68 TF–genes, and FCGR2A by 37
   TF–genes, according to their degree values. Several of these 178
   TF–genes regulate more than one common DEG, which indicates a high
   level of interaction between the TF–genes and the common DEGs.

TF–miRNA interaction with the common DEGs

   The miRNAs are short non-coding RNAs that are transcribed by RNA
   polymerase II and then processed step by step through a shared
   biogenesis pathway. Using a combination of experimental and
   computational techniques, miRNAs have been discovered in a variety
   of species. By binding to the 3′-untranslated region of target
   mRNAs, miRNAs regulate gene expression at the post-transcriptional
   stage. The RegNetwork database was used to collect TF–miRNA
   coregulatory interactions, which helps to identify the miRNAs and
   regulatory TF–genes that regulate DEGs of interest at the
   transcriptional and post-transcriptional stages [43]. We found
   miRNAs that interact with common DEGs and then used the
   NetworkAnalyst tool to analyze how they interact. With this
   platform, researchers can explore complex datasets and determine
   biological traits and functions [44]. The network of miRNA–gene
   interactions was examined using Cytoscape, which ranks the top
   miRNAs and thereby helps researchers determine biological roles and
   features. The TF–miRNA coregulatory network has 191 nodes and 216
   edges. In this network, the DEGs interact with 87 miRNAs and 93
   TF–genes.

Candidate drugs identification

   Predicting protein–drug interactions (PDI), or recognizing candidate
   drug molecules, is an important part of this research. We identified
   candidate therapeutic molecules based on the common DEGs of
   SARS-CoV-2 and IPF using the Enrichr tool and the Drug Signatures
   Database (DSigDB), which contains 22 527 gene sets. The Enrichr
   platform was used to access DSigDB [45, 46]. Enrichr is a well-known
   web portal with many gene-set libraries that can be used to examine
   gene-set enrichment on a genome-wide scale [26].

Result analysis

   This section presents the overall results of the analysis. It begins
   with the identification of DEGs and mutual DEGs and proceeds to the
   candidate drug identification procedure.

DEGs and mutual DEGs identification

   We investigated the interrelationships and implications of the
   dysregulated genes associated with COVID-19 and IPF using human
   RNA-seq datasets from the NCBI. The GSE147507 dataset (GEO platform
   identifier GPL18573) was used to determine the DEGs for SARS-CoV-2.
   There are 926 upregulated and 799 downregulated genes in the
   GSE147507 dataset, giving 1725 DEGs. In the GSE52463 dataset (GEO
   platform identifier GPL11154), we found a total of 1008 DEGs, with
   669 upregulated and 339 downregulated genes. The quantitative
   measurements of the selected datasets are shown in Table 2. After
   cross-comparative analysis using JVENN, a trustworthy web platform
   for Venn analysis, we found 90 common DEGs between the GSE147507 and
   GSE52463 datasets. Twenty of these 90 common DEGs were chosen for
   further study based on the P-value (MDK, HP, HSPB6, CHIT1, TNFAIP6,
   EPAS1, MMP1, CCL18, CXCL6, CCL11, IL1RN, LAMP3, CD207, ARRB1,
   RNASE2, LILRA1, FCGR2A, STAT4, CD69, and SAMSN1), and the subsequent
   analyses were conducted on these 20 common DEGs. Figure 4 depicts
   the common DEGs as a Venn diagram, with 90 genes shared between the
   GSE147507 and GSE52463 datasets.
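
   The Venn analysis amounts to a set intersection of the two DEG
   lists. The Python sketch below shows the operation on two short
   hypothetical gene lists and then reproduces the counts implied by
   the reported totals (1725 and 1008 DEGs with 90 in common). The
   function names and the toy lists are illustrative assumptions; the
   toy lists are not the study's actual DEG lists.

```python
def shared_degs(degs_a, degs_b):
    """Split two DEG lists into common and dataset-specific gene sets."""
    set_a, set_b = set(degs_a), set(degs_b)
    return {
        "common": set_a & set_b,
        "only_a": set_a - set_b,
        "only_b": set_b - set_a,
    }


def venn_summary(total_a, total_b, n_common):
    """Derive dataset-specific counts, union size, and % of shared genes."""
    only_a = total_a - n_common
    only_b = total_b - n_common
    union = only_a + only_b + n_common
    return only_a, only_b, union, 100.0 * n_common / union


# Toy gene lists (hypothetical, for illustration only)
toy = shared_degs(["MDK", "HP", "GENE_X"], ["HP", "GENE_Y", "MDK"])

# Counts reported in the text: 1725 and 1008 DEGs, 90 of them common
only_cov, only_ipf, distinct, pct_common = venn_summary(1725, 1008, 90)
```

   With the reported totals, 1635 DEGs are specific to SARS-CoV-2 and
   918 to IPF, the union contains 1725 + 1008 - 90 = 2643 distinct
   DEGs, and the 90 common DEGs make up about 3.4% of that union.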

Table 2:

   Quantitative measurements of the datasets used in this analysis

   Properties                 GSE147507                      GSE52463
   Common gene analysis       DESeq2 and the limma package   DESeq2 and the limma package
   Cutoff criteria            P < 0.05 and |log2 FC| ≥ 1.0   P < 0.05 and |log2 FC| ≥ 1.0
   Total DEGs count           1725 genes                     1008 genes
   Upregulated DEGs count     926 genes                      669 genes
   Downregulated DEGs count   799 genes                      339 genes

Figure 4:

   Venn diagram of the common DEGs. Ninety genes were common to the 1725
   DEGs of SARS-CoV-2 infection and the 1008 DEGs of IPF patients,
   leaving 1635 and 918 dataset-specific DEGs, respectively. The common
   DEGs make up 3.4% of the 2643 distinct DEGs.

GO and molecular pathway analysis

   Gene set enrichment analysis is a technique for identifying DEGs
   linked to a biological process or molecular function. For this study,
   we analyzed the common DEGs. GO terms are divided into three
   categories: biological process, cellular component, and molecular
   function. [148]Table 3 lists the biological process GO terms
   identified based on the combined score, [149]Table 4 lists the
   molecular function GO terms, and [150]Table 5 lists the cellular
   component GO terms. KEGG, Wiki Pathways, Reactome, and BioCarta were
   used to find the most impactful pathways of the shared DEGs between
   IPF and SARS-CoV-2. [151]Tables 6, [152]7, [153]8, and [154]9 show the
   essential pathways discovered in the datasets. Graphical views of the
   GO term and pathway analyses are shown in [155]Figs. 5 and [156]6.
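
   Over-representation of a GO term or pathway in a gene list is
   commonly scored with a hypergeometric (one-sided Fisher exact) test.
   The sketch below is a minimal standard-library illustration of that
   test, not the pipeline used in this study; the term size, overlap,
   and background size are hypothetical values. The combined_score
   helper assumes the definition given in the Enrichr documentation,
   c = ln(p) * z, where z is a rank-based z-score that Enrichr computes
   internally and that is simply passed in here.

```python
from math import comb, log


def enrichment_p_value(overlap, query_size, term_size, background):
    """Right-tail hypergeometric probability P(X >= overlap).

    X is the number of query genes that fall in a term of `term_size`
    genes when `query_size` genes are drawn from `background` genes.
    """
    max_overlap = min(query_size, term_size)
    favourable = sum(
        comb(term_size, k) * comb(background - term_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return favourable / comb(background, query_size)


def combined_score(p_value, z_score):
    """Combined score as assumed here: ln(p) multiplied by a z-score."""
    return log(p_value) * z_score


# Hypothetical example: 3 of 20 query genes hit a 60-gene term,
# against an assumed background of 20000 genes.
p = enrichment_p_value(overlap=3, query_size=20, term_size=60, background=20000)
print(f"p = {p:.3e}, combined score = {combined_score(p, z_score=-2.0):.2f}")
```

   Because only the overlap, the two set sizes, and the background
   enter the test, terms supported by a single gene can still reach
   small P-values when the term itself is small.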

Table 3:

   The combined score was used to identify biological process-related GO
   keywords

   Group: GO biological process
   GO ID       GO term                                                       P-value   Genes
   GO:0006032  Chitin catabolic process                                      6.98E-03  CHIT1
   GO:0090240  Positive regulation of histone H4 acetylation                 6.98E-03  ARRB1
   GO:0006030  Chitin metabolic process                                      6.98E-03  CHIT1
   GO:0072677  Eosinophil migration                                          2.59E-04  CCL11; CCL18
   GO:0048245  Eosinophil chemotaxis                                         2.59E-04  CCL11; CCL18
   GO:0070098  Chemokine-mediated signaling pathway                          1.83E-05  CXCL6; CCL11; CCL18
   GO:0030593  Neutrophil chemotaxis                                         1.94E-05  CXCL6; CCL11; CCL18
   GO:0002029  Desensitization of G-protein coupled receptor protein signal  7.97E-03  ARRB1
   GO:0038114  Interleukin-21-mediated signaling pathway                     7.97E-03  STAT4
   GO:0098757  Cellular response to interleukin-21                           7.97E-03  STAT4

Table 4:

   The combined score was used to identify GO keywords linked to molecular
   functions

   Group: GO molecular function
   GO ID       GO term                                          P-value   Genes
   GO:0019966  Interleukin-1 binding                            5.98E-03  IL1RN
   GO:0008009  Chemokine activity                               1.26E-05  CXCL6; CCL11; CCL18
   GO:0004568  Chitinase activity                               6.98E-03  CHIT1
   GO:0042379  Chemokine receptor binding                       1.53E-05  CXCL6; CCL11; CCL18
   GO:0005537  Mannose binding                                  1.09E-02  CD207
   GO:0048020  CCR chemokine receptor binding                   6.54E-04  CCL11; CCL18
   GO:0005041  Low-density lipoprotein receptor                 1.29E-02  TNFAIP6
   GO:0005125  Cytokine activity                                1.53E-05  CXCL6; IL1RN; CCL11;
   GO:0005149  Interleukin-1 receptor binding                   1.49E-02  IL1RN
   GO:0005159  Binding of insulin-like growth factor receptors  1.49E-02  ARRB1

Table 5:

   The combined score was used to identify cellular component-related GO
   keywords

   Group: GO cellular component
   GO ID       GO term                                     P-value   Genes
   GO:1904724  Tertiary granule lumen                      1.37E-03  CHIT1; TNFAIP6
   GO:0030669  Clathrin-coated endocytic vesicle membrane  3.25E-02  CD207
   GO:0045334  Clathrin-coated endocytic vesicle           4.88E-02  CD207
   GO:0070820  Tertiary granule                            1.15E-02  CHIT1; TNFAIP6
   GO:0030659  Cytoplasmic vesicle membrane                5.27E-02  ARRB1
   GO:0035580  Specific granule lumen                      6.02E-02  CHIT1
   GO:0031410  Cytoplasmic vesicle                         1.92E-02  CD207; ARRB1
   GO:0005769  Early endosome                              2.04E-02  LAMP3; CD207
   GO:0031901  Early endosome membrane                     7.05E-02  CD207
   GO:0030665  Clathrin-coated vesicle membrane            7.79E-02  CD207

Table 6:

   Pathways identified through KEGG based on the combined score

   Database: KEGG
   Pathway                                      P-value   Genes
   IL-17 signaling pathway                      1.05E-04  CXCL6; CCL11; MMP1
   Chemokine signaling pathway                  3.39E-05  CXCL6; CCL11; ARRB1; CCL18
   Cytokine–cytokine receptor interaction       1.84E-04  CXCL6; IL1RN; CCL11; CCL18
   Rheumatoid arthritis                         3.69E-03  CXCL6; MMP1
   Asthma                                       3.06E-02  CCL11
   Osteoclast differentiation                   7.05E-03  FCGR2A; LILRA1
   Relaxin signaling pathway                    7.37E-03  MMP1; ARRB1
   Bladder cancer                               4.02E-02  MMP1
   Hedgehog signaling pathway                   4.59E-02  ARRB1
   Amino sugar and nucleotide sugar metabolism  4.69E-02  CHIT1

Figure 5:

   GO terms identified according to the combined score for (a)
   biological process, (b) molecular function, and (c) cellular
   component. The higher the enrichment score, the greater the number of
   genes involved in a given ontology term.

Figure 6:

   Pathway analysis results identified using (a) KEGG, (b) Wiki
   Pathways, (c) Reactome, and (d) BioCarta. The pathway terms were
   identified based on the combined score.

Table 7:

   Pathways identified through Wiki Pathways based on the combined score

   Database: Wiki Pathways
   Pathway                                                P-value   Genes
   Thymic Stromal Lymphopoietin Signaling Pathway         1.00E-03  CCL11; STAT4
   Amplification and Expansion of Oncogenic Pathways as
     Metastatic Traits                                    1.69E-02  EPAS1
   Matrix Metalloproteinases                              2.95E-02  MMP1
   Signal transduction through IL1R                       3.25E-02  IL1RN
   Type 2 papillary renal cell carcinoma                  3.34E-02  EPAS1
   Photodynamic therapy-induced NF-kB survival signaling  3.44E-02  MMP1
   Bladder Cancer                                         3.92E-02  MMP1
   Integrated Cancer Pathway                              4.31E-02  MMP1
   Hedgehog Signaling Pathway                             4.31E-02  ARRB1
   Hepatitis C and Hepatocellular Carcinoma               4.79E-02  MMP1

Table 8:

   Pathways identified through Reactome based on the combined score

   Database: Reactome
   Pathway                                                                   P-value   Genes
   PTK6 Expression                                                           4.99E-03  EPAS1
   Regulation of gene expression by Hypoxia-inducible Factor                 9.96E-03  EPAS1
   Chemokine receptors bind chemokines                                       1.42E-03  CXCL6; CCL11
   Oxygen-dependent proline hydroxylation of Hypoxia-inducible Factor Alpha  1.78E-02  EPAS1
   Activation of SMO                                                         1.78E-02  ARRB1
   Regulation of Insulin-like Growth Factor transport and uptake by
     Insulin-like Growth Factor Binding Proteins                             2.08E-02  MMP1
   NOTCH2 Activation and Transmission of Signal to the Nucleus               2.08E-02  MDK
   Basigin interactions                                                      2.47E-02  MMP1
   Regulation of hypoxia-inducible Factor by oxygen                          2.56E-02  EPAS1
   Cellular response to hypoxia                                              2.57E-02  EPAS1

Table 9:

   Pathways identified through BioCarta based on the combined score

   Database: BioCarta
   Pathway                                                         P-value   Genes
   Beta-arrestins in GPCR Desensitization Pathway                  3.54E-04  CCL11; ARRB1
   NO2-dependent IL12 Pathway in NK cells Pathway                  8.96E-03  STAT4
   Role of Beta-arrestins in the activation and targeting of
     MAP kinases Pathway                                           4.06E-04  CCL11; ARRB1
   G-Protein Signaling Through Tubby Proteins Pathway              9.95E-03  CCL11
   Roles of Beta-arrestins-dependent Recruitment of Src Kinases
     in GPCR Signaling Pathway                                     5.23E-04  CCL11; ARRB1
   Activation of PKC through G-protein coupled receptors Pathway   1.09E-02  CCL11
   Visual Signal Transduction Pathway                              1.29E-02  ARRB1
   Attenuation of GPCR Signaling Pathway                           1.29E-02  ARRB1
   IL12- and Stat4-dependent Signaling Pathway in Th1 Development  1.49E-02  STAT4
   Cystic fibrosis transmembrane conductance regulator (CFTR)
     and beta 2 adrenergic receptor (b2AR)                         1.98E-02  CCL11

Analysis of PPI network for the identification of hub-genes

   The PPI network analysis is the central element of this study,
   because hub-gene recognition, module analysis, and drug identification
   all build on it. The common DEGs were provided as input to STRING, and
   the resulting analysis file was imported into the Cytoscape software
   for visualization. Because the PPI network results feed into the
   therapeutic compound suggestions, the PPI analysis is the focus of the
   research. [168]Figure 7 shows the PPI network with 60 nodes and 308
   edges. The network was built for SARS-CoV-2 and IPF to discover
   hub-genes and medicinal compounds.

Figure 7:

   PPI network of the common DEGs in the two illnesses (SARS-CoV-2 and
   IPF). The orange nodes denote common DEGs, whereas the edges denote
   the interactions between two genes. The network has 60 nodes and 308
   edges.

Identification of hub-genes for therapeutic solutions and module analysis

   CytoHubba, a Cytoscape plugin, was used to identify the hub-genes in
   the PPI network. Genes were ranked by degree, which represents the
   number of interactions a gene has with other genes in the PPI network.
   Hub-genes are the most highly interconnected nodes in a PPI network.
   The topological analysis identified the top five genes (AKT1, IL1B,
   CCL5, MMP9, and ARRB1), which were classified as hub-genes based on
   their degree value. [171]Table 10 shows the results of the topological
   analysis. These hub-genes could be exploited as biomarkers, which may
   lead to new therapeutic approaches for the studied diseases. The
   hub-gene network has 50 nodes and 283 edges and was laid out using a
   degree-sorted circle layout. It is depicted in [172]Fig. 8, with the
   top five hub-genes AKT1, IL1B, CCL5, MMP9, and ARRB1.
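
   Ranking nodes by degree amounts to counting the distinct interaction
   partners of each node. The following is a minimal sketch with a
   made-up edge list; the node names and interactions are hypothetical
   placeholders rather than edges from the STRING network, and the code
   is not the CytoHubba implementation.

```python
from collections import Counter


def rank_by_degree(edges, top_n=5):
    """Return the top_n nodes of an undirected network, ranked by degree.

    Duplicate edges (in either orientation) and self-loops are ignored,
    so each interaction partner is counted once.
    """
    unique_edges = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degree = Counter(node for edge in unique_edges for node in edge)
    # Sort by descending degree, then by name for a stable order on ties.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:top_n]


# Hypothetical toy network (placeholder node names).
toy_edges = [
    ("geneA", "geneB"), ("geneA", "geneC"), ("geneA", "geneD"),
    ("geneB", "geneC"), ("geneB", "geneD"), ("geneC", "geneE"),
    ("geneB", "geneA"),  # same interaction as the first edge, reversed
]

for name, deg in rank_by_degree(toy_edges, top_n=3):
    print(name, deg)
```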

Table 10:

   Exploration of topological results for the top five hub-genes

   Hub gene  Degree  Stress  Closeness  Betweenness  Bottleneck  Clustering coefficient  Eccentricity  Radiality
   AKT1      27      3322    42.25000   637.30186    26          0.25356                 0.25000       4.47458
   IL1B      26      2172    42.33333   475.08574    3           0.34154                 0.33333       4.52542
   CCL5      22      1216    38.25000   238.70899    14          0.35931                 0.25000       4.23729
   MMP9      22      1808    39.16667   322.49125    7           0.35498                 0.33333       4.33898
   ARRB1     19      1630    37.55000   291.37776    6           0.43865                 0.25000       4.25424

Figure 8:

   The PPI network was used to find hub-genes. The hub-gene network has
   50 nodes and 283 edges. According to topological analysis, AKT1 and
   IL1B have degrees of 27 and 26, respectively, and CCL5, MMP9, and
   ARRB1 have degrees of 22, 22, and 19, respectively.

TF–gene analysis

   The NetworkAnalyst platform was used to investigate TF–gene
   interactions for the common DEGs. The resulting TF–gene network has
   190 nodes and 301 edges. It contains 12 DEGs and 178 TF–genes;
   according to degree value, 85 TF–genes regulate HSPB6, 68 TF–genes
   regulate EPAS1, and 37 TF–genes regulate FCGR2A. These 178 TF–genes
   regulate several common DEGs, indicating a high level of interaction
   between the TF–genes and the common DEGs. The TF–gene network is shown
   in [176]Fig. 9.
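
   A TF–gene network of this kind is bipartite: every edge links one
   TF–gene to one target DEG, so the node count is the number of
   distinct TF–genes plus the number of distinct DEGs (here
   12 + 178 = 190), and the degree of a DEG is its number of regulators.
   The sketch below illustrates this bookkeeping on a made-up edge list;
   the function name and all node names are hypothetical.

```python
from collections import defaultdict


def summarize_tf_gene_network(edges):
    """Summarize a bipartite TF -> target network given (tf, target) pairs."""
    regulators = defaultdict(set)
    for tf, target in edges:
        regulators[target].add(tf)
    tfs = {tf for tf, _ in edges}
    targets = set(regulators)
    return {
        "nodes": len(tfs | targets),
        "edges": len(set(edges)),
        "regulators_per_target": {t: len(s) for t, s in regulators.items()},
    }


# Hypothetical toy network (placeholder names, not study data).
toy_tf_edges = [
    ("TF1", "DEG_A"), ("TF2", "DEG_A"), ("TF3", "DEG_A"),
    ("TF2", "DEG_B"), ("TF3", "DEG_B"),
]
summary = summarize_tf_gene_network(toy_tf_edges)
print(summary)
```

   In the toy network TF2 and TF3 each regulate two targets, which is
   why the per-target regulator counts can add up to more than the
   number of distinct TF–genes.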

Figure 9:

   Network representation of the interactions between TF–genes and the
   common DEGs. The common genes are shown as highlighted yellow nodes,
   while TF–genes are represented by the other nodes. There are 190
   nodes and 301 edges in the network.

TF–miRNA analysis

   The TF–miRNA coregulatory network was built using the NetworkAnalyst
   tool. Analysis of this coregulatory network revealed the connections
   of miRNAs and TFs with the common DEGs. The network has 191 nodes and
   216 edges, and the DEGs interact with 87 miRNAs and 93 TF–genes.
   [179]Figure 10 shows the TF–miRNA coregulatory network.

Figure 10:

   There are 93 TF–genes, 87 miRNAs, and 11 DEGs in the TF–miRNA network.
   There are 191 nodes and 216 edges in the network. DEGs are represented
   by blue nodes, miRNAs by green nodes, and TF–genes by the other nodes.

Candidate drugs identification and validation

   Drug compounds for the common DEGs were identified using the Enrichr
   platform. Using the DSigDB database, we identified 10 candidate
   medicinal compounds. The top 10 chemical compounds were extracted
   based on the combined score of P-value and adjusted P-value. According
   to the data, Nickel Sulfate CTD 00001417, Clonidine HL60 UP, and
   Thymolphthalein CTD 00006891 are the three drug compounds that most
   genes interact with. Because these signature compounds were identified
   from the common DEGs, they are candidate compounds for both COVID-19
   and IPF. [182]Table 11 lists the top-ranked candidate compounds for
   the common DEGs from the DSigDB database.
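
   An adjusted P-value corrects the raw P-value for the number of
   compounds tested. The sketch below shows the Benjamini-Hochberg
   procedure, which is assumed here to be the correction behind the
   adjusted P-value column (Enrichr documents its adjusted P-value as a
   Benjamini-Hochberg correction). The input values are illustrative
   only; the values in Table 11 cannot be reproduced this way, because
   the correction runs over every signature in the library rather than
   over the ten listed rows.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Illustrative raw P-values (not taken from the study).
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```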

Table 11:

   The top 10 drug compounds suggested for common DEGs

   Drug name                     P-value   Adjusted P-value  Genes
   Nickel Sulfate CTD 00001417   1.37E-12  8.81E-10          CXCL6; IL1RN; CCL11; TNFAIP6; EPAS1; MMP1;
                                                             LAMP3; CD207; STAT4; CD69; SAMSN1
   Clonidine HL60 UP             1.04E-06  3.36E-04          IL1RN; FCGR2A; RNASE2; SAMSN1
   Thymolphthalein CTD 00006891  3.80E-04  1.01E-02          EPAS1; ARRB1
   Peptidoglycan CTD 00006490    4.34E-04  1.07E-02          TNFAIP6; MMP1
   Lithocholic acid HL60 UP      4.63E-04  1.10E-02          CD69; SAMSN1
   Beclomethasone CTD 00005468   3.93E-05  3.21E-03          IL1RN; CCL11; RNASE2
   Salmeterol CTD 00002421       4.92E-04  1.13E-02          CCL11; RNASE2
   Mephentermine HL60 UP         4.48E-05  3.21E-03          IL1RN; EPAS1; CD69
   Colchicine HL60 UP            8.09E-06  1.04E-03          IL1RN; FCGR2A; EPAS1; SAMSN1
   Bromocriptine HL60 UP         6.94E-05  4.07E-03          FCGR2A; TNFAIP6; SAMSN1

   Computationally predicted results usually need experimental
   verification, which is difficult and limited in practice. Zhang et al.
   [[184]47] proposed a validation process for suggested drug compounds
   based on the Receiver Operating Characteristic (ROC) curve, and we
   followed a similar approach to validate our suggested drug compounds.
   [185]Figure 11 compares the validation performance of the top five
   suggested drug compounds using the ROC curve. Among these five
   compounds, Nickel Sulfate has a higher validation accuracy than the
   others, according to the ROC curve. Other suggested drug compounds, as
   shown in [186]Fig. 11, were also validated using the same procedure,
   which should make the suggestions more useful to the medical
   community.
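
   A ROC curve is obtained by sweeping a decision threshold over a set
   of scores and recording the true-positive and false-positive rates;
   the area under the curve (AUC) summarizes it in a single number. The
   text does not state which labels and scores were used to build the
   curves, so the sketch below uses invented labels and scores and
   hypothetical function names purely to illustrate the computation.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points from sweeping a threshold over scores.

    labels are 1 (positive) or 0 (negative); tied scores are consumed
    together so that each threshold yields a single point.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under a piecewise-linear curve (trapezoidal rule)."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Invented example data (not from the study).
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(auc(roc_points(toy_labels, toy_scores)), 3))  # prints: 0.889
```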

Figure 11:

   Performance comparison of the top five suggested drug compounds based
   on the ROC curve. We considered the top five suggested drug compounds,
   where Nickel Sulfate has a higher validation accuracy than the others,
   according to the ROC curve.

Discussion

   COVID-19 is more common in people who have lung disease. This study
   contributes a bioinformatics and machine learning model for
   identifying the genetic effects in SARS-CoV-2- and IPF-affected
   patients. Shortness of breath, cough, and chest pain are the most
   typical symptoms of these two diseases. Using bioinformatics
   techniques, 1725 and 1008 DEGs were found in [188]GSE147507 and
   [189]GSE52463, respectively, and 90 DEGs common to the
   [190]GSE147507 and [191]GSE52463 datasets were identified. From these
   90 common DEGs, 20 were chosen for further study based on the P-value
   (MDK, HP, HSPB6, CHIT1, TNFAIP6, EPAS1, MMP1, CCL18, CXCL6, CCL11,
   IL1RN, LAMP3, CD207, ARRB1, RNASE2, LILRA1, FCGR2A, STAT4, CD69, and
   SAMSN1). These DEGs were then carried through GO analysis, KEGG, Wiki
   Pathways, Reactome, and BioCarta pathway analysis, PPI network
   analysis, TF–gene and TF–miRNA coregulatory network analysis, and
   candidate drug detection.

   The common DEGs were used to find GO terms, which were identified
   using the combined score. GO analysis comprises three categories:
   biological process, molecular function, and cellular component
   [[192]48]. KEGG, Wiki Pathways, Reactome, and BioCarta were used for
   pathway analysis of the common DEGs. KEGG is a database that aids
   researchers in understanding the high-level functions and utility of
   biological systems. Because hub-gene recognition, module analysis, and
   drug identification all depend strongly on the PPI network, it is the
   most significant part of the research. The common DEGs were therefore
   subjected to PPI analysis, and hub-genes were identified in the
   resulting network. Five genes (AKT1, IL1B, CCL5, MMP9, and ARRB1)
   were classified as hub-genes based on their degree value. The aim of
   concentrating on this small set of genes is to suggest more effective
   medicinal compounds.

   The interactions of TF–genes and miRNAs were investigated to identify
   transcriptional and post-transcriptional regulators of the common
   DEGs. The common DEGs were used to investigate TF–gene interactions.
   TF–genes act as regulators of gene expression, which can contribute to
   cancer cell formation. In the TF–gene network, which contains 12 DEGs
   and 178 TF–genes, 85 TF–genes regulate HSPB6, 68 TF–genes regulate
   EPAS1, and 37 TF–genes regulate FCGR2A according to their degree
   value. The TF–miRNA coregulatory network depicts the interactions
   between miRNAs and TF–genes tested for their ability to influence
   common DEGs. There were 87 miRNAs and 93 TF–genes discovered. Several
   studies have found evidence of altered miRNA expression in IPF
   samples, and members of the miR-200 family play a significant role in
   IPF sample management [[193]49]. Taz et al. [[194]50] investigated
   only 69 samples, whereas we analyzed 110 SARS-CoV-2 samples. As a
   result, this research will ideally integrate COVID-19 with IPF risk
   factor treatment. Chemical testing can be used to verify the drugs'
   efficacy.

   Our research also has several application areas for the scientific
   community. First, researchers can use the same approach to investigate
   the impact of SARS-CoV-2 on other diseases. Second, if a new virus
   appears, our research can serve as a useful starting point for further
   investigation. Third, our research suggests several candidate drugs
   that may, with further research, help scientists find a treatment for
   SARS-CoV-2. Finally, our research is an example of a virus's genetic
   relationship with a certain type of patient, so researchers can use
   this methodology to investigate the genetic relationships between
   other viruses and patient groups.

Conclusions

   COVID-19 infection has been associated with a high risk for IPF
   patients. Shortness of breath, cough, and chest pain are the most
   typical symptoms of these two diseases. We used machine learning and
   bioinformatics analysis to summarize the relationships between the
   genes involved in these two diseases. We identified DEGs in two
   selected datasets, determined the genes they share, and characterized
   the infection responses of SARS-CoV-2- and IPF-affected lung cells. As
   a result, we discovered 90 genes that are shared across these
   datasets. The PPI network built from these shared genes identified the
   five most important hub-genes. In addition, we looked at SARS-CoV-2
   and IPF to see if they might predict the outcomes of identifying
   infections of other diseases. The therapeutic targets are well
   grounded because they derive from the discovered hub-genes, and they
   could serve as a starting point for evaluating already licensed
   medications. We believe that the biomarkers, pathways, and molecular
   markers we discovered will be valuable in developing pharmacological
   therapies.

Declarations

Ethical Approval

   Not applicable. The study involves no human-related data, so ethical
   approval from an external committee was not sought.

Consent to Participate

   Not applicable. The study involves no human-related data, so consent
   from participants was not required.

Consent to Publish

   Not applicable. The study involves no human-related data, so consent
   to publish from participants was not required.

Funding

   This work was supported by Researchers Supporting Project number
   (RSP-2021/100), King Saud University, Riyadh, Saudi Arabia. This work
   was supported in part by funding from the Natural Sciences and
   Engineering Research Council of Canada (NSERC).

Acknowledgement