Abstract
Vitiligo is a chronic asymptomatic disorder affecting melanocytes from
the basal layer of the epidermis which leads to a patchy loss of skin
color. Even though it is one of the neglected disease conditions,
people suffering from vitiligo are more prone to psychological
disorders. As of now, various studies have been done in order to
project auto-immune implications as the root cause. To understand the
complexity of vitiligo, we propose the Vitiligo Information Resource
(VIRdb) that integrates both the drug-target and systems approach to
produce a comprehensive repository entirely devoted to vitiligo, along
with curated information at both protein level and gene level along
with potential therapeutics leads. These 25,041 natural compounds are
curated from Natural Product Activity and Species Source Database.
VIRdb is an attempt to accelerate the drug discovery process and
laboratory trials for vitiligo through the computationally derived
potential drugs. It is an exhaustive resource consisting of 129
differentially expressed genes, which are validated through gene
ontology and pathway enrichment analysis. We also report 22 genes
through enrichment analysis which are involved in the regulation of
epithelial cell differentiation. At the protein level, 40 curated
protein target molecules along with their natural hits that are derived
through virtual screening. We also demonstrate the utility of the VIRdb
by exploring the Protein–Protein Interaction Network and Gene–Gene
Interaction Network of the target proteins and differentially expressed
genes. For maintaining the quality and standard of the data in the
VIRdb, the gold standard in bioinformatics toolkits like Cytoscape,
Schrödinger’s GLIDE, along with the server installation of MATLAB, are
used for generating results. VIRdb can be accessed through
“[34]http://www.vitiligoinfores.com/”.
Keywords: Vitiligo, Database, Differential genes, Protein targets,
Molecular docking, Natural compounds
Introduction
Vitiligo is a chronic asymptomatic disorder affecting melanocytes from
the basal layer of the epidermis which leads to a patchy loss of skin
color ([35]Rodrigues et al., 2017). It affects 0.5–2% of the world’s
population ([36]Ezzedine & Silverberg, 2016). The lesions can appear in
different shapes and sizes on any visible part of the body ([37]Tarlé
et al., 2014). Even though it is one of the neglected diseases
condition people suffering from vitiligo are more prone to
psychological disorders and the risk of developing hearing problems
([38]Ezzedine et al., 2015). The auto-immune inflammation theory
remains to be the most acceptable, accounting for its development via
an adaptive immune response ([39]Boniface et al., 2017). According to
biochemical theory, accumulation of Reactive Oxygen species (ROS) in
the melanocytes remains the pioneer, for activation of CD8+ T cells
against melanocytes ([40]Wang, Li & Li, 2019; [41]Deng et al., 2019).
Previous studies have also reported that HIF-1-alpha, F2RL-1 and PIK3CB
over-expression in CD8+ cells ([42]Deng et al., 2019). HIF-1-alpha
plays a crucial part in ROS accumulation response, whereas the F2RL-1
and PIK3CB are responsible for an immune response with CD8+ T cells
([43]Deng et al., 2019). Due to loss of melanocytes in individuals
affected by vitiligo, one can also expect an increased incidence of
non-melanoma cancer and actinic keratose ([44]Zhang et al., 2016).
Arresting the progression of lesions and repigmentation of formed
lesions remains to be a crucial strategy for treatment ([45]Bishnoi &
Parsad, 2018). AK-STAT inhibitors have demonstrated promising outcomes
in the treatment of vitiligo, including repigmentation in topical
applications. JAK inhibitor, tofacitinib and JAK-1,2 inhibitor,
ruxolitinib, have been tested positive in causing repigmentation
([46]Bishnoi & Parsad, 2018).
To understand the complexity of the vitiligo, we envisage a
comprehensive online repository focusing on the systematically derived
and manually curated potential protein targets along with their
prospective hits derived from the natural compounds databases. To fully
understand the genetic complexity of the vitiligo we also aim to
present a genetic interaction network of the differentially expressed
genes, in Lesional Vitiligo, Peri-Lesional Vitiligo, and Non-Lesional
Vitiligo against healthy controls. The predicted differentially
expressed genes across samples were then validated using gene
enrichment analysis with Kyoto Encyclopedia of Gene and Genome (KEGG)
and Gene Ontology (GO) ([47]Kanehisa et al., 2020; [48]The Gene
Ontology Consortium, 2007). The Vitiligo Information Resource (VIRdb)
systematically harbors all the information relating to vitiligo in
terms of Potential protein targets and differentially expressed genes
in a user-friendly database. As more than 50% of the approved drugs in
DrugBank are natural compounds, therefore, we took the liberty of
virtually screening all the natural compounds present in Natural
Product Activity and Species Source Database (NPASS) against the
potential protein targets ([49]Wishart et al., 2017; [50]Zeng et al.,
2017). For maintaining, the quality and standard of the data in the
VIRdb, the gold standard in bioinformatics toolkits like Cytoscape,
Schrödinger’s GLIDE, along with the server installation of MATLAB are
used ([51]Shannon, 2003; [52]Schrödinger, 2020; [53]MATLAB, 2019). The
screened library can be used to carry wet-lab experiments based on
potential hits which would thereby lead to considerably less
consumption of time and other resources to carry out the protocol for
drug development. We envision that VIRdb will be pertinent for the
researchers and clinicians engaged in drug development for vitiligo.
Methods
Microarray gene expression analysis
Gene Expression Omnibus was manually queried using “(“vitiligo” (MeSH
Terms) OR vitiligo (All Fields)) AND (“humans” (MeSH Terms) OR “Homo
sapiens” (Organism) OR homo sapiens (All Fields))”. The returned
results were manually curated for microarray experiments having healthy
controls against affected individuals. We selected [54]GSE65127 as it
comprises of 10 healthy controls against 10 conditions (Lesional,
Peri-lesional, Non-lesional). The CEL files of 40 samples having
expression data of Lesional, Peri-lesional, Non-lesional against
healthy control on Affymetrix [55]GPL570 platform were isolated
([56]Clough & Barrett, 2016; [57]Regazzetti et al., 2015) data.
Combinations of {Healthy vs (Lesional, Peri-lesional, Non-lesional)}
were processed using the server installation of [58]MATLAB (2019) by
applying the GC Robust Multichip Averaging (GCRMA) method with
Empirical Bayes ([59]MATLAB, 2019; [60]Wu et al., 2004). GCRMA
empirical Bayes is the posterior mean estimate which is calculated
through S, that represents a quantity proportional to the concentration
of transcripts in the hybridization mixture ([61]Eq. 1). After the raw
reads from CEL files are converted to expression data through GCRMA,
the probe sets with un-annotated gene symbols were dropped
programmatically. A t-test is performed with each gene expression value
to study the relevant changes in the expression values. False Discovery
Rate (FDR) for each test situation was computed using Storey–Tibshirani
procedure ([62]Storey & Tibshirani, 2003). To compute the biological
expression significance, −log10 of p-values was calculated and plotted
against log fold change (2 folds).
[MATH:
s=E[s|S>0,<
mrow>PM,MM],withs=log(S)<
/mrow> :MATH]
(1)
Gene–gene interaction network construction
For designing the gene–gene interaction network for differentially
expressed genes, GeneMania plugin from Cytoscape was used ([63]Shannon,
2003; [64]Mostafavi et al., 2020). Initially, 3 distinct networks for
differentially expressed genes (Expressed in lesional, Expressed in
non-lesional, and Expressed in peri-lesional) were made. After that,
all the networks are merged, with common interacting gene nodes in the
Cytoscape environment ([65]Shannon, 2003). The tangential nodes were
manually deleted and only the highly connected nodes were taken for
further analysis. The predicted co-expressed genes from GeneMania
interaction network were also taken for enrichment analysis
([66]Mostafavi et al., 2020).
Gene Ontology (GO) and pathway analysis
ShinyGO server was used for the pathway enrichment analysis of the
single expression conditions (Lesional, Peri-Lesional and Non-lesional,
each distinctly) and unique differentially expressed genes (significant
expression at least once across the conditions) and predicted GGI genes
from GeneMania ([67]Mostafavi et al., 2020; [68]Ge, Jung & Yao, 2019).
The differentially expressed genes were enriched in Kyoto Encyclopedia
of Gene and Genome database (KEGG) ([69]Kanehisa et al., 2020). The
false discovery rate of 0.01 was used as the cut off to elute out the
most significant pathways. For GO enrichment analysis the GO database
was recruited from the ShinyGO server and GO enrichment analysis was
functioned with the false discovery rate of <0.01 ([70]The Gene
Ontology Consortium, 2007; [71]Mostafavi et al., 2020).
Biological data collection
The Pubmed Engine was queried using “((Vitiligo) AND (Proteins)) AND
(Protein–Protein Interaction)” for retrieval of literature relating to
the protein targets ([72]Motschall & Falck-Ytter, 2005). The query
returned 4 abstracts, out of which study that includes the systemic
approach for the construction of Protein–Protein Interaction Network
(PPI) was taken into consideration ([73]Malhotra et al., 2017). PPI
network of 215 proteins was revitalized using Cytoscape 3.7 and String
Database ([74]Shannon, 2003; [75]Mering, 2003; [76]Malhotra et al.,
2017). A total of 50 protein targets were eluted out of the set of
reported 64 vitiligo core proteins in the study ([77]Malhotra et al.,
2017). The sub-network of 50 proteins was created manually based on
coherence of centrality among 64 nodes ([78]Malhotra et al., 2017).
Finally considering the on the availability of X-ray crystallographic
structures in Protein Data Bank 40 protein crystal structures were
taken for molecular docking out of 50 nodes ([79]Appendix I)
([80]Berman, 2000; [81]Malhotra et al., 2017). All the potential
protein targets were annotated manually with their respective UniProt
ID, PDB ID, STRING ID, KEGG ID, encoding Gene Symbol and target active
amino acids from the scientific literature ([82]Kanehisa et al., 2020;
[83]Mering, 2003; [84]Berman, 2000; [85]The UniProt Consortium, 2016).
Highest-resolution structures were manually curated and confiscated for
Molecular docking studies for the protein structures that comprises
multiple deposits in the Protein Data Bank ([86]Wlodawer et al., 2013).
For identification of the active amino acids, each PDB profile was
manually curated in reference to its literature ([87]Appendix I).
Virtual screening against NPASS
As more than 50% of the approved drugs in DrugBank are natural
compounds, therefore, we took the liberty of virtually screening all
the natural compounds present in Natural Product Activity and Species
Source Database (NPASS) against the potential protein targets
([88]Wishart et al., 2017; [89]Zeng et al., 2017). The NPASS database
consists of 25,041 natural compounds ([90]Zeng et al., 2017). Every
structure was neutralized at the physiological pH and a library of
36,229 potential ligands was created including the stereoisomers and
tautomers ([91]Srivastava, Pal & Misra, 2018). Target protein
structures files were prepared using Schrödinger Protein Preparation
Wizard by removing excess waters and were minimized using OPLS 2005
force field ([92]Clark et al., 2019; [93]Sweere & Fraaije, 2017).
Receptor grid of dimensions of 15 × 15 × 15 angstroms was generated
around the active amino acids. Glide HTVS pipeline was used to carry
out virtual screening of the natural compound library to each target
protein. Ligands were sorted and ranked based on the Glide score as it
is empirical of various computed scores and is also quite accurate in
wet-lab experimentation ([94]Greenwood et al., 2010).
Architecture and design
Vitiligo Information Resource was formulated using Apache HTTP server
on Linux Platform. MySQL was used to store the information as it offers
fast and easy query processing because of its RDBMS architecture
([95]Mathur et al., 2016). HTML 5 and Cascading Style Sheets were used
to design the front-end interface of the VIRdb. PHP 7 was used for
maintaining the dynamic connectivity between the front-end and
back-end. Each entry in the database was connected to their parent
database via html anchors. The complete architectural schema of
vitiligo information resource db is depicted in [96]Fig. 1.
Figure 1. Database architecture.
[97]Figure 1
[98]Open in a new tab
Architecture of VIR database.
Results
Differentially expressed genes
A total of 123 differential genes for lesional vitiligo against healthy
controls. Out of 123 genes, 52 genes were predicted to be as
under-expressed and 70 genes were predicted to be over-expressed in
accordance with the LogFc filter ([99]Appendix II; [100]Fig. 2A). In
the case of Non-Lesional vs healthy control, a total of 27 upregulated
differential genes were identified ([101]Appendix III; [102]Fig. 2B).
For Peri-Lesional vitiligo, a total of 32 upregulated genes were
curated ([103]Appendix IV; [104]Fig. 2C). Upon, a union of all the sets
(Differential Genes, Lesional, Non-Lesional, and Peri-Lesional) as a
total of 129 differential genes have been curated. Out of which 52 were
under-expressed and 76 were over-expressed ([105]Appendix V).
Figure 2. Heatmap for lesional, non-lesional and peri-lesional conditions.
[106]Figure 2
[107]Open in a new tab
Heatmap of differentially expressed gene expression values with black
bars representing the Healthy/control samples and red being the
case/diseased samples; (A) heatmap of differentially expressed gene
expression values in lesional vitiligo against Healthy controls; (B)
heatmap of differentially expressed gene expression values in
non-lesional vitiligo against healthy controls; (C) heatmap of
differentially expressed gene expression values in peri-lesional
vitiligo against healthy controls.
GO and pathway enrichment analysis
The DCT, EDNRB, TYR, TYRP1, and WNT6 that were from the set of 129
unique differentially expressed genes across the samples were enriched
in Melanogenesis pathway with an FDR of 0.0038 in KEGG ([108]Kanehisa
et al., 2020) ([109]Appendix VI). Upon GO enrichment 6 genes (GPR143,
OCA2, TYR, DCT, TYRP1, SLC45A2) were enriched in Melanosome membrane
(Cellular Compartment), 7 genes (TYR, DCT, TYRP1, OCA2, SLC45A2, PMEL,
CDH3) were enriched in Melanin metabolic process (Biological Process)
([110]The Gene Ontology Consortium, 2007) ([111]Appendix VII).
Surprisingly, 2 genes were enriched with monophenol monooxygenase
activity (molecular function) and 2 genes were enriched in
oxidoreductase activity (molecular function), which could account for
the oxidative stress-induced melanocyte degradation ([112]Wang, Li &
Li, 2019; [113]Deng et al., 2019) ([114]Appendix VII) (For enrichment
analysis of the single expression conditions (Lesional, Peri-Lesional
and Non-lesional, each distinctly) see [115]Supplements 1–[116]3).
Gene–gene interaction network
We upvote 22 genes that were predicted while construction of common
network (ACYP1, APLP2, ARNTL, CEBPG, CPEB2, DIP2A, HNRNPA3, HNRNPA3P1,
HTATSF1, JKAMP, MAFG, MFN2, MKL1, ORC2, PAGR1, PDPK1, PEX5, PRKX, PRKY,
TLE4, TMEM70, USP47) with GeneMania ([117]Shannon, 2003, [118]Fig. 3;
[119]Appendix VIII). Upon gene enrichment analysis of these 22 genes
with GO db MAFG, PRKX, ARNTL were enriched in Regulation of epithelial
cell differentiation as the biological process ([120]The Gene Ontology
Consortium, 2007). These genes were never reported in the literature
for these types of vitiligo therefore, might be relevant in
understanding the molecular aspects of vitiligo.
Figure 3. GGI Network.
[121]Figure 3
[122]Open in a new tab
Gene–gene Interaction network of the predicted set of differential
genes in lesional, non-lesional, and peri-lesional condition through
the set of 129 unique genes; co-expressed genes are shown in gray
nodes; constructed using Cytoscape 3.7 ([123]Shannon, 2003).
Data browsing and web interface
The main browsing window of the VIRdb holds the links to Target
proteins, Potential hits and differential genes ([124]Fig. 4A). The
end-user can either browse any of the sections or can directly download
the data in CSV file format. The network files are only available for
download and can be downloaded in SIF of the Cytoscape 3.7. The target
protein toggle directs the user to the comprehensive list of proteins
that were used as targets for the molecular docking studies
([125]Shannon, 2003). The other two toggles of potential hits and
differential genes direct the user towards the dynamic tables where the
user can select from the types of vitiligo and target specific drugs.
The target proteins have individual profiles as shown in [126]Fig. 4B,
which comprises their active site residue references along with the