Abstract

   Vitiligo is a chronic asymptomatic disorder affecting melanocytes from
   the basal layer of the epidermis which leads to a patchy loss of skin
   color. Even though it is one of the neglected disease conditions,
   people suffering from vitiligo are more prone to psychological
   disorders. As of now, various studies have been done in order to
   project auto-immune implications as the root cause. To understand the
   complexity of vitiligo, we propose the Vitiligo Information Resource
   (VIRdb) that integrates both the drug-target and systems approach to
   produce a comprehensive repository entirely devoted to vitiligo, along
   with curated information at both protein level and gene level along
   with potential therapeutics leads. These 25,041 natural compounds are
   curated from Natural Product Activity and Species Source Database.
   VIRdb is an attempt to accelerate the drug discovery process and
   laboratory trials for vitiligo through the computationally derived
   potential drugs. It is an exhaustive resource consisting of 129
   differentially expressed genes, which are validated through gene
   ontology and pathway enrichment analysis. We also report 22 genes
   through enrichment analysis which are involved in the regulation of
   epithelial cell differentiation. At the protein level, 40 curated
   protein target molecules along with their natural hits that are derived
   through virtual screening. We also demonstrate the utility of the VIRdb
   by exploring the Protein–Protein Interaction Network and Gene–Gene
   Interaction Network of the target proteins and differentially expressed
   genes. For maintaining the quality and standard of the data in the
   VIRdb, the gold standard in bioinformatics toolkits like Cytoscape,
   Schrödinger’s GLIDE, along with the server installation of MATLAB, are
   used for generating results. VIRdb can be accessed through
   “[34]http://www.vitiligoinfores.com/”.

   Keywords: Vitiligo, Database, Differential genes, Protein targets,
   Molecular docking, Natural compounds

Introduction

   Vitiligo is a chronic asymptomatic disorder affecting melanocytes from
   the basal layer of the epidermis which leads to a patchy loss of skin
   color ([35]Rodrigues et al., 2017). It affects 0.5–2% of the world’s
   population ([36]Ezzedine & Silverberg, 2016). The lesions can appear in
   different shapes and sizes on any visible part of the body ([37]Tarlé
   et al., 2014). Even though it is one of the neglected diseases
   condition people suffering from vitiligo are more prone to
   psychological disorders and the risk of developing hearing problems
   ([38]Ezzedine et al., 2015). The auto-immune inflammation theory
   remains to be the most acceptable, accounting for its development via
   an adaptive immune response ([39]Boniface et al., 2017). According to
   biochemical theory, accumulation of Reactive Oxygen species (ROS) in
   the melanocytes remains the pioneer, for activation of CD8+ T cells
   against melanocytes ([40]Wang, Li & Li, 2019; [41]Deng et al., 2019).
   Previous studies have also reported that HIF-1-alpha, F2RL-1 and PIK3CB
   over-expression in CD8+ cells ([42]Deng et al., 2019). HIF-1-alpha
   plays a crucial part in ROS accumulation response, whereas the F2RL-1
   and PIK3CB are responsible for an immune response with CD8+ T cells
   ([43]Deng et al., 2019). Due to loss of melanocytes in individuals
   affected by vitiligo, one can also expect an increased incidence of
   non-melanoma cancer and actinic keratose ([44]Zhang et al., 2016).
   Arresting the progression of lesions and repigmentation of formed
   lesions remains to be a crucial strategy for treatment ([45]Bishnoi &
   Parsad, 2018). AK-STAT inhibitors have demonstrated promising outcomes
   in the treatment of vitiligo, including repigmentation in topical
   applications. JAK inhibitor, tofacitinib and JAK-1,2 inhibitor,
   ruxolitinib, have been tested positive in causing repigmentation
   ([46]Bishnoi & Parsad, 2018).

   To understand the complexity of the vitiligo, we envisage a
   comprehensive online repository focusing on the systematically derived
   and manually curated potential protein targets along with their
   prospective hits derived from the natural compounds databases. To fully
   understand the genetic complexity of the vitiligo we also aim to
   present a genetic interaction network of the differentially expressed
   genes, in Lesional Vitiligo, Peri-Lesional Vitiligo, and Non-Lesional
   Vitiligo against healthy controls. The predicted differentially
   expressed genes across samples were then validated using gene
   enrichment analysis with Kyoto Encyclopedia of Gene and Genome (KEGG)
   and Gene Ontology (GO) ([47]Kanehisa et al., 2020; [48]The Gene
   Ontology Consortium, 2007). The Vitiligo Information Resource (VIRdb)
   systematically harbors all the information relating to vitiligo in
   terms of Potential protein targets and differentially expressed genes
   in a user-friendly database. As more than 50% of the approved drugs in
   DrugBank are natural compounds, therefore, we took the liberty of
   virtually screening all the natural compounds present in Natural
   Product Activity and Species Source Database (NPASS) against the
   potential protein targets ([49]Wishart et al., 2017; [50]Zeng et al.,
   2017). For maintaining, the quality and standard of the data in the
   VIRdb, the gold standard in bioinformatics toolkits like Cytoscape,
   Schrödinger’s GLIDE, along with the server installation of MATLAB are
   used ([51]Shannon, 2003; [52]Schrödinger, 2020; [53]MATLAB, 2019). The
   screened library can be used to carry wet-lab experiments based on
   potential hits which would thereby lead to considerably less
   consumption of time and other resources to carry out the protocol for
   drug development. We envision that VIRdb will be pertinent for the
   researchers and clinicians engaged in drug development for vitiligo.

Methods

Microarray gene expression analysis

   Gene Expression Omnibus was manually queried using “(“vitiligo” (MeSH
   Terms) OR vitiligo (All Fields)) AND (“humans” (MeSH Terms) OR “Homo
   sapiens” (Organism) OR homo sapiens (All Fields))”. The returned
   results were manually curated for microarray experiments having healthy
   controls against affected individuals. We selected [54]GSE65127 as it
   comprises of 10 healthy controls against 10 conditions (Lesional,
   Peri-lesional, Non-lesional). The CEL files of 40 samples having
   expression data of Lesional, Peri-lesional, Non-lesional against
   healthy control on Affymetrix [55]GPL570 platform were isolated
   ([56]Clough & Barrett, 2016; [57]Regazzetti et al., 2015) data.
   Combinations of {Healthy vs (Lesional, Peri-lesional, Non-lesional)}
   were processed using the server installation of [58]MATLAB (2019) by
   applying the GC Robust Multichip Averaging (GCRMA) method with
   Empirical Bayes ([59]MATLAB, 2019; [60]Wu et al., 2004). GCRMA
   empirical Bayes is the posterior mean estimate which is calculated
   through S, that represents a quantity proportional to the concentration
   of transcripts in the hybridization mixture ([61]Eq. 1). After the raw
   reads from CEL files are converted to expression data through GCRMA,
   the probe sets with un-annotated gene symbols were dropped
   programmatically. A t-test is performed with each gene expression value
   to study the relevant changes in the expression values. False Discovery
   Rate (FDR) for each test situation was computed using Storey–Tibshirani
   procedure ([62]Storey & Tibshirani, 2003). To compute the biological
   expression significance, −log10 of p-values was calculated and plotted
   against log fold change (2 folds).
   [MATH:
   <mi>s</mi><mo>=</mo><mi>E</mi><mrow><mo>[</mo><mrow><mi>s</mi><mrow><mo
   stretchy="false">|</mo></mrow><mi>S</mi><mo>></mo><mn>0</mn><mo>,</mo><
   mrow><mi mathvariant="normal">P</mi><mi
   mathvariant="normal">M</mi></mrow><mo>,</mo><mrow><mi
   mathvariant="normal">M</mi><mi
   mathvariant="normal">M</mi></mrow></mrow><mo>]</mo></mrow><mo>,</mo><ms
   pace width="1em"></mspace><mspace
   width="thickmathspace"></mspace><mrow><mi
   mathvariant="normal">w</mi><mi mathvariant="normal">i</mi><mi
   mathvariant="normal">t</mi><mi
   mathvariant="normal">h</mi></mrow><mspace
   width="0.25em"></mspace><mi>s</mi><mo>=</mo><mrow><mi
   mathvariant="normal">l</mi><mi mathvariant="normal">o</mi><mi
   mathvariant="normal">g</mi></mrow><mrow><mo>(</mo><mi>S</mi><mo>)</mo><
   /mrow> :MATH]
   (1)

Gene–gene interaction network construction

   For designing the gene–gene interaction network for differentially
   expressed genes, GeneMania plugin from Cytoscape was used ([63]Shannon,
   2003; [64]Mostafavi et al., 2020). Initially, 3 distinct networks for
   differentially expressed genes (Expressed in lesional, Expressed in
   non-lesional, and Expressed in peri-lesional) were made. After that,
   all the networks are merged, with common interacting gene nodes in the
   Cytoscape environment ([65]Shannon, 2003). The tangential nodes were
   manually deleted and only the highly connected nodes were taken for
   further analysis. The predicted co-expressed genes from GeneMania
   interaction network were also taken for enrichment analysis
   ([66]Mostafavi et al., 2020).

Gene Ontology (GO) and pathway analysis

   ShinyGO server was used for the pathway enrichment analysis of the
   single expression conditions (Lesional, Peri-Lesional and Non-lesional,
   each distinctly) and unique differentially expressed genes (significant
   expression at least once across the conditions) and predicted GGI genes
   from GeneMania ([67]Mostafavi et al., 2020; [68]Ge, Jung & Yao, 2019).
   The differentially expressed genes were enriched in Kyoto Encyclopedia
   of Gene and Genome database (KEGG) ([69]Kanehisa et al., 2020). The
   false discovery rate of 0.01 was used as the cut off to elute out the
   most significant pathways. For GO enrichment analysis the GO database
   was recruited from the ShinyGO server and GO enrichment analysis was
   functioned with the false discovery rate of <0.01 ([70]The Gene
   Ontology Consortium, 2007; [71]Mostafavi et al., 2020).

Biological data collection

   The Pubmed Engine was queried using “((Vitiligo) AND (Proteins)) AND
   (Protein–Protein Interaction)” for retrieval of literature relating to
   the protein targets ([72]Motschall & Falck-Ytter, 2005). The query
   returned 4 abstracts, out of which study that includes the systemic
   approach for the construction of Protein–Protein Interaction Network
   (PPI) was taken into consideration ([73]Malhotra et al., 2017). PPI
   network of 215 proteins was revitalized using Cytoscape 3.7 and String
   Database ([74]Shannon, 2003; [75]Mering, 2003; [76]Malhotra et al.,
   2017). A total of 50 protein targets were eluted out of the set of
   reported 64 vitiligo core proteins in the study ([77]Malhotra et al.,
   2017). The sub-network of 50 proteins was created manually based on
   coherence of centrality among 64 nodes ([78]Malhotra et al., 2017).
   Finally considering the on the availability of X-ray crystallographic
   structures in Protein Data Bank 40 protein crystal structures were
   taken for molecular docking out of 50 nodes ([79]Appendix I)
   ([80]Berman, 2000; [81]Malhotra et al., 2017). All the potential
   protein targets were annotated manually with their respective UniProt
   ID, PDB ID, STRING ID, KEGG ID, encoding Gene Symbol and target active
   amino acids from the scientific literature ([82]Kanehisa et al., 2020;
   [83]Mering, 2003; [84]Berman, 2000; [85]The UniProt Consortium, 2016).
   Highest-resolution structures were manually curated and confiscated for
   Molecular docking studies for the protein structures that comprises
   multiple deposits in the Protein Data Bank ([86]Wlodawer et al., 2013).
   For identification of the active amino acids, each PDB profile was
   manually curated in reference to its literature ([87]Appendix I).

Virtual screening against NPASS

   As more than 50% of the approved drugs in DrugBank are natural
   compounds, therefore, we took the liberty of virtually screening all
   the natural compounds present in Natural Product Activity and Species
   Source Database (NPASS) against the potential protein targets
   ([88]Wishart et al., 2017; [89]Zeng et al., 2017). The NPASS database
   consists of 25,041 natural compounds ([90]Zeng et al., 2017). Every
   structure was neutralized at the physiological pH and a library of
   36,229 potential ligands was created including the stereoisomers and
   tautomers ([91]Srivastava, Pal & Misra, 2018). Target protein
   structures files were prepared using Schrödinger Protein Preparation
   Wizard by removing excess waters and were minimized using OPLS 2005
   force field ([92]Clark et al., 2019; [93]Sweere & Fraaije, 2017).
   Receptor grid of dimensions of 15 × 15 × 15 angstroms was generated
   around the active amino acids. Glide HTVS pipeline was used to carry
   out virtual screening of the natural compound library to each target
   protein. Ligands were sorted and ranked based on the Glide score as it
   is empirical of various computed scores and is also quite accurate in
   wet-lab experimentation ([94]Greenwood et al., 2010).

Architecture and design

   Vitiligo Information Resource was formulated using Apache HTTP server
   on Linux Platform. MySQL was used to store the information as it offers
   fast and easy query processing because of its RDBMS architecture
   ([95]Mathur et al., 2016). HTML 5 and Cascading Style Sheets were used
   to design the front-end interface of the VIRdb. PHP 7 was used for
   maintaining the dynamic connectivity between the front-end and
   back-end. Each entry in the database was connected to their parent
   database via html anchors. The complete architectural schema of
   vitiligo information resource db is depicted in [96]Fig. 1.

Figure 1. Database architecture.

   [97]Figure 1
   [98]Open in a new tab

   Architecture of VIR database.

Results

Differentially expressed genes

   A total of 123 differential genes for lesional vitiligo against healthy
   controls. Out of 123 genes, 52 genes were predicted to be as
   under-expressed and 70 genes were predicted to be over-expressed in
   accordance with the LogFc filter ([99]Appendix II; [100]Fig. 2A). In
   the case of Non-Lesional vs healthy control, a total of 27 upregulated
   differential genes were identified ([101]Appendix III; [102]Fig. 2B).
   For Peri-Lesional vitiligo, a total of 32 upregulated genes were
   curated ([103]Appendix IV; [104]Fig. 2C). Upon, a union of all the sets
   (Differential Genes, Lesional, Non-Lesional, and Peri-Lesional) as a
   total of 129 differential genes have been curated. Out of which 52 were
   under-expressed and 76 were over-expressed ([105]Appendix V).

Figure 2. Heatmap for lesional, non-lesional and peri-lesional conditions.

   [106]Figure 2
   [107]Open in a new tab

   Heatmap of differentially expressed gene expression values with black
   bars representing the Healthy/control samples and red being the
   case/diseased samples; (A) heatmap of differentially expressed gene
   expression values in lesional vitiligo against Healthy controls; (B)
   heatmap of differentially expressed gene expression values in
   non-lesional vitiligo against healthy controls; (C) heatmap of
   differentially expressed gene expression values in peri-lesional
   vitiligo against healthy controls.

GO and pathway enrichment analysis

   The DCT, EDNRB, TYR, TYRP1, and WNT6 that were from the set of 129
   unique differentially expressed genes across the samples were enriched
   in Melanogenesis pathway with an FDR of 0.0038 in KEGG ([108]Kanehisa
   et al., 2020) ([109]Appendix VI). Upon GO enrichment 6 genes (GPR143,
   OCA2, TYR, DCT, TYRP1, SLC45A2) were enriched in Melanosome membrane
   (Cellular Compartment), 7 genes (TYR, DCT, TYRP1, OCA2, SLC45A2, PMEL,
   CDH3) were enriched in Melanin metabolic process (Biological Process)
   ([110]The Gene Ontology Consortium, 2007) ([111]Appendix VII).
   Surprisingly, 2 genes were enriched with monophenol monooxygenase
   activity (molecular function) and 2 genes were enriched in
   oxidoreductase activity (molecular function), which could account for
   the oxidative stress-induced melanocyte degradation ([112]Wang, Li &
   Li, 2019; [113]Deng et al., 2019) ([114]Appendix VII) (For enrichment
   analysis of the single expression conditions (Lesional, Peri-Lesional
   and Non-lesional, each distinctly) see [115]Supplements 1–[116]3).

Gene–gene interaction network

   We upvote 22 genes that were predicted while construction of common
   network (ACYP1, APLP2, ARNTL, CEBPG, CPEB2, DIP2A, HNRNPA3, HNRNPA3P1,
   HTATSF1, JKAMP, MAFG, MFN2, MKL1, ORC2, PAGR1, PDPK1, PEX5, PRKX, PRKY,
   TLE4, TMEM70, USP47) with GeneMania ([117]Shannon, 2003, [118]Fig. 3;
   [119]Appendix VIII). Upon gene enrichment analysis of these 22 genes
   with GO db MAFG, PRKX, ARNTL were enriched in Regulation of epithelial
   cell differentiation as the biological process ([120]The Gene Ontology
   Consortium, 2007). These genes were never reported in the literature
   for these types of vitiligo therefore, might be relevant in
   understanding the molecular aspects of vitiligo.

Figure 3. GGI Network.

   [121]Figure 3
   [122]Open in a new tab

   Gene–gene Interaction network of the predicted set of differential
   genes in lesional, non-lesional, and peri-lesional condition through
   the set of 129 unique genes; co-expressed genes are shown in gray
   nodes; constructed using Cytoscape 3.7 ([123]Shannon, 2003).

Data browsing and web interface

   The main browsing window of the VIRdb holds the links to Target
   proteins, Potential hits and differential genes ([124]Fig. 4A). The
   end-user can either browse any of the sections or can directly download
   the data in CSV file format. The network files are only available for
   download and can be downloaded in SIF of the Cytoscape 3.7. The target
   protein toggle directs the user to the comprehensive list of proteins
   that were used as targets for the molecular docking studies
   ([125]Shannon, 2003). The other two toggles of potential hits and
   differential genes direct the user towards the dynamic tables where the
   user can select from the types of vitiligo and target specific drugs.
   The target proteins have individual profiles as shown in [126]Fig. 4B,
   which comprises their active site residue references along with the