Abstract

   The cardiomyopathies are a group of heart muscle diseases which can be
   inherited (familial). Identifying potential disease-related proteins is
   important to understand mechanisms of cardiomyopathies. Experimental
   identification of cardiomyophthies is costly and labour-intensive. In
   contrast, bioinformatics approach has a competitive advantage over
   experimental method. Based on “guilt by association” analysis, we
   prioritized candidate proteins involving in human cardiomyopathies. We
   first built weighted human cardiomyopathy-specific protein-protein
   interaction networks for three subtypes of cardiomyopathies using the
   known disease proteins from Online Mendelian Inheritance in Man as
   seeds. We then developed a method in prioritizing disease candidate
   proteins to rank candidate proteins in the network based on “guilt by
   association” analysis. It was found that most candidate proteins with
   high scores shared disease-related pathways with disease seed proteins.
   These top ranked candidate proteins were related with the corresponding
   disease subtypes, and were potential disease-related proteins.
   Cross-validation and comparison with other methods indicated that our
   approach could be used for the identification of potentially novel
   disease proteins, which may provide insights into
   cardiomyopathy-related mechanisms in a more comprehensive and
   integrated way.

Introduction

   The cardiomyopathies are the myocardial disorders in which the heart
   muscle (or myocardium) is structurally and functionally abnormal, but
   there are not coronary artery disease, hypertension, valvular disease
   and congenital heart disease [45][1]. The cardiomyopathies can be
   classified into five subtypes: (i) hypertrophic cardiomyopathy (HCM),
   in which a portion of the myocardium is hypertrophied (thickened), and
   the heart has to work hard to pump blood [46][2]; (ii) arrhythmogenic
   right ventricular cardiomyopathy (ARVC), characterized by a predominant
   right ventricular replacement of the myocardium by partial or total
   adipose or fibroadipose tissue and ventricular arrhythmias [47][3];
   (iii) dilated cardiomyopathy (DCM), in which the heart becomes larger
   (dilated), and is unable to pump blood efficiently [48][4]; (iv)
   restrictive cardiomyopathy (RCM), a rare form, in which the heart
   involves impaired diastolic filling with blood [49][5]; and (v)
   unclassified [50][6], [51][7]. Most cardiomyopathies are autosomal
   dominantly inherited. X-linked, autosomal recessive, and mitochondrial
   inheritance have also been reported [52][8]. Some environmental factors
   have been shown to cause cardiomyopathies, such as dietary salt
   exacerbates [53][9], abuse of alcohol, cocaine or antidepressant
   medications [54][10]. Since cardiomyopathies are major causes of
   morbidity and mortality and proteins are impacted by most
   disease-related mutations and conduct functions finally, the
   identification of disease-related proteins is very important for
   understanding mechanisms of cardiomyopathies development.

   Genome-wide linkage and association studies have identified chromosomal
   regions which contain hundreds of candidate genes associated with these
   genetic diseases [55][11]. It still remains a big challenge to identify
   the potential proteins associated with genetic diseases using
   experimental methods with up-to-date technologies. Thus, computational
   predictions or candidate prioritizations of candidate proteins become
   attractive and draw much attention to researchers since they are cheap
   and effortless [56][12]. In recent years, identifying candidate genes
   of complex diseases was mainly based on biochemical networks such as
   metabolic networks [57][13], transcriptional regulatory networks
   [58][14], and protein-protein interaction (PPI) networks (PPINs)
   [59][15], which can be obtained at a large scale via high-throughput
   screening [60][16]. Several algorithms have been developed to utilize
   PPINs for mining or prioritizing potential disease candidate genes to
   understand genetic diseases [61][17]–[62][28] since the candidate genes
   related to specific (or similar) disease phenotypes tend to be located
   in a “local neighborhood” in the PPIN [63][29]–[64][31]. For example,
   Chen et al. developed a computational method to rank candidate genes
   for Alzheimer Disease (AD) based on an initial list of AD-related genes
   and public human PPI data [65][32]. DADA was built up as a suite to
   prioritize disease candidate genes accounting for the degree
   distribution of known disease and candidate genes, using a PPI network
   [66][33]. ToppGene and ToppNet were online candidate gene
   prioritization tools with high reliability based on functional
   similarity or network analysis in PPIN [67][23]. These algorithms take
   a set of seed proteins (genes known to be associated with the disease
   of interest), candidate proteins (genes in linkage intervals for the
   disease, genomic regions that has been associated with the disease, of
   interest or neighbors of seed proteins in PPINs), and a human PPIN as
   input. They use PPIs to infer the relationship between seed and
   candidate proteins, followed by ranking the candidate proteins
   according to the inferred relationships. Since the proteins with direct
   interactions tend to have the same or similar functions [68][34],
   called “guilt by association” [69][35], disease-related PPIN and
   functional similarities of protein pairs can be used to predict
   disease-related proteins more accurately.

   In this study, we proposed a method in prioritizing disease candidate
   proteins to rank each protein in the network based on “guilt by
   association” analysis. At first, we obtained the seed proteins of DCM,
   HCM and ARVC from Online Mendelian Inheritance in Man (OMIM,
   [70]http://www.ncbi.nlm.nih.gov/omim) [71][36] (other two subtypes, RCM
   and unclassified, were neglected since their disease genes in OMIM were
   very rare). We then built cardiomyopathy (DCM, HCM or ARVC)-specific
   PPINs composed of seed proteins and their direct neighbors (candidate
   proteins) from human PPI data in the STRING database [72][37].
   Secondly, we combined the functional similarity of Gene Ontology (GO,
   [73]http://www.geneontology.org/) [74][38] with protein interaction
   confidence to weigh each interacting protein pair in
   cardiomyopathy-specific PPINs. Subsequently, we measured the disease
   relevance score for each protein by adding interaction confidence and
   functional similarity of its neighbors and subtracting the likely
   effect of its interacting proteins. Finally, we took the proteins
   ranked at top of each candidate list in descending order of disease
   relevance score as potential disease-related proteins followed by
   leave-one-out cross-validation (LOOCV) and comparison with Chen’s
   protein ranking method, DADA, ToppGene and ToppNet.

Materials and Methods

   We presented a method in prioritizing disease candidate proteins to
   rank candidate cardiomyopathies proteins based on “guilt by
   association” analysis ([75]Figure 1). Pathway enrichment analysis was
   then conducted to examine the relevance between the proteins at the top
   of each ranked list and cardiomyopathies. At last, the proposed method
   was compared with other methods to test its performance.

Figure 1. The workflow of our method in prioritizing disease candidate
proteins.

   [76]Figure 1
   [77]Open in a new tab

   First, cardiomyopathy (DCM, HCM or ARVC)-specific PPINs were
   constructed, which were composed of seed proteins and their direct
   neighbors (candidate proteins) from human PPIN. Secondly, two weights
   (interaction confidence scores and functional similarities) were used
   to measure each protein interaction. The disease relevance score of
   each protein was measured by using these weights. Finally, the proteins
   ranked at top of each candidate list in descending order of disease
   relevance score were taken as potential disease-related proteins.

Screening of Seed Proteins of Cardiomyopathies

   The disease-related genes of three subtypes of cardiomyopathy were
   obtained from OMIM. As a result, 33 DCM, 24 HCM, and 9 ARVC genes were
   selected as seed genes, respectively. These genes were further
   converted into their corresponding standard symbols by using the HUGO
   Gene Nomenclature Committee (HGNC) database
   ([78]http://www.genenames.org) [79][39] ([80]Table 1) and seed proteins
   were generated as referred to the proteins corresponding to these seed
   genes.

Table 1. Official symbols of seed genes of DCM, HCM and ARVC.

               DCM                     HCM                   ARVC
   Seed genes/ NEXN ([81]Q0ZGT2[82]^*) NEXN ([83]Q0ZGT2)     RYR2 ([84]Q92736)
   proteins    LMNA ([85]P02545)       TNNT2 ([86]P45379)    TMEM43 ([87]Q9BTV4)
               TNNT2 ([88]P45379)      TTN ([89]Q8WZ42)      RPSA ([90]P08865)
               PSEN2 ([91]P49810)      CAV3 ([92]P56539)     DSP ([93]P15924)
               ACTN2 ([94]P35609)      MYL3 ([95]P08590)     PKP2 ([96]Q99959)
               TTN ([97]Q8WZ42)        TNNC1 ([98]P63316)    TGFB3 ([99]P10600)
               DES ([100]P17661)       MYOZ2 ([101]Q9NPC6)   JUP ([102]P14923)
               SCN5A ([103]Q14524)     SLC25A4 ([104]P12235) DSC2 ([105]Q02487)
               TNNC1 ([106]P63316)     MYO6 ([107]Q9UM54)    DSG2 ([108]Q14126)
               SDHA ([109]P31040)      PLN ([110]P26678)
               SGCD ([111]Q92629)      PRKAG2 ([112]Q9UGJ0)
               DSP ([113]P15924)       VCL ([114]P18206)
               PLN ([115]P26678)       COX15 ([116]Q7KZN9)
               EYA4 ([117]O95677)      CSRP3 ([118]P50461)
               GATAD1 ([119]Q8WUU5)    MYBPC3 ([120]Q14896)
               FKTN ([121]O75072)      MYL2 ([122]P10916)
               VCL ([123]P18206)       MYH6 ([124]P13533
               LDB3 ([125]O75112)      MYH7 ([126]P12883)
               RBM20 ([127]Q5T481)     ACTC1 ([128]P68032)
               BAG3 ([129]O95817)      TPM1 ([130]P09493)
               CSRP3 ([131]P50461)     CALR3 ([132]Q96L12)
               MYBPC3 ([133]Q14896)    TNNI3 ([134]P19429)
               ABCC9 ([135]O60706)     MYLK2 ([136]Q9H1R3)
               TMPO ([137]P42166)      JPH2 ([138]Q9BR39)
               MYH6 ([139]P13533)
               MYH7 ([140]P12883)
               PSEN1 ([141]P49768)
               ACTC1 ([142]P68032)
               TPM1 ([143]P09493)
               TCAP ([144]O15273)
               DSG2 ([145]Q14126)
               TNNI3 ([146]P19429)
               DMD ([147]P11532)
   [148]Open in a new tab
   ^*

   Accession number of the corresponding protein.

Construction of Weighted Cardiomyopathy-specific PPINs

   The method of the nearest-neighbor expansion was applied to obtain the
   direct neighbors of seed proteins of DCM, HCM and ARVC from human PPI
   data in the STRING database [149][37] (version 8.3). To be more
   comprehensive, all the interaction relationships of seed proteins were
   kept as original ones from STRING. DCM, HCM and ARVC-specific PPINs
   were built. As a result, 5624, 3869, 2173 nodes and 14569, 8972, 3003
   edges were generated, respectively. Direct neighbors of seed proteins
   in these cardiomyopathy-specific PPINs were considered as the candidate
   proteins of three cardiomyopathy subtypes.

   Two weights were used to measure each protein interaction (i.e. each
   edge of the network). The first weight is the confidence score Inline
   graphic from STRING [150][40], and the second one is the functional
   similarity Inline graphic by combining functional enrichment analysis
   of GO. The functional similarity Inline graphic was computed by
   employing an R package GOSim [151][41], which ranged from 0 to 1
   according to GO annotations. Finally, the edge-weighted cardiomyopathy
   (DCM, HCM or ARVC)-specific PPINs were constructed.

Calculation of Disease Relevance Score Based on “Guilt by Association”
Analysis

   We presented a method in prioritizing disease candidate proteins to
   measure the relevance of each candidate protein to a disease in each
   cardiomyopathy-specific PPIN. In this method, the relevance between a
   protein and seed proteins in its neighborhood was estimated by “guilt
   by association” effects of seed proteins to candidate proteins, i.e.
   connectivity, interaction confidences and functional similarities of
   each protein in cardiomyopathy-specific PPINs. Briefly, the disease
   relevance score of one protein was measured by adding interaction
   confidence and functional similarity of its neighbors and by
   subtracting the effect of promiscuous connections between this protein
   and its interacting proteins. A disease relevance score Inline graphic
   for each protein Inline graphic in each cardiomyopathy-specific PPIN
   was calculated as follows:
   graphic file with name pone.0071191.e006.jpg

   where Inline graphic and Inline graphic represent two proteins; Inline
   graphic and Inline graphic are empirical constants ( Inline graphic and
   Inline graphic were set up after screening them from all combinations
   of Inline graphic and Inline graphic from 1 to 10, respectively. LOOCV
   was used to identify constants and to reduce overfit bias); Inline
   graphic is the set of proteins interacting with Inline graphic in the
   cardiomyopathy-specific PPIN; Inline graphic is the interaction
   confidence score of protein pair Inline graphic and j; Inline graphic
   is the functional similarity value of protein pair Inline graphic and
   j; and Inline graphic is 1 if protein j belongs to the
   cardiomyopathy-specific PPIN (or 0 otherwise). The score Inline graphic
   ranks higher in the situations where there are more interacting
   proteins, higher confidence interactions and functional similarities
   with seed proteins among its neighbors.

   Each candidate protein in the network was ranked by descending order of
   Inline graphic , and the performance of these prioritizations was
   assessed.

Pathway Analysis of Top Ranked Candidate Proteins

   To further examine the functional relevance between the candidate
   proteins that were ranked at the top of each ranked list and
   cardiomyopathies, KEGG pathway enrichment analysis was applied for top
   50 candidate proteins using the Functional Annotation Tool in DAVID
   Bioinformatics Resources 6.7 ([152]http://david.abcc.ncifcrf.gov/)
   [153][42], [154][43]. P value less than 0.05 was considered as
   significance.

Assessment of the Developed Method Prioritizing Disease Candidate Proteins

   LOOCV was applied to assess the developed method. For all seed
   proteins, one protein was removed as a test protein at each time, and
   was added to candidate proteins. Cardiomyopathy-specific PPINs were
   then reconstructed using the newly generated sets of seed proteins and
   their direct neighbors. All the candidate proteins were ranked by the
   developed method to determine the rank of the test protein. This
   procedure was repeated until all the seed proteins were used up as test
   proteins. In the end, the result generated by our method was compared
   with those of Chen’s protein ranking method, DADA, ToppGene and ToppNet
   using the same seed and candidate proteins as our method did.

   To compare these methods, receiver operating characteristic (ROC)
   curves were plotted by sensitivity and specificity values of
   prioritizations. Sensitivity refers to the percentage of the removed
   seed proteins which were ranked over a particular threshold.
   Specificity refers to the proportion of non-test proteins which were
   ranked below the threshold [155][44]. The area under curve (AUC) is a
   standard measure of performances of these methods.

   At last, we analyzed the top 50 candidate proteins obtained by our
   method and compared them to ones obtained from Chen’s protein ranking
   method, which had the best performance among the methods used to
   compare with our method, to further assess the performance of our
   method by literature review from the PubMed database.

Results

Prioritizations of Cardiomyopathies’ Candidate Proteins Based on “Guilt by
Association” Analysis

   We used our developed method to calculate disease relevance scores of
   all candidate proteins and rank them in each of cardiomyopathy-specific
   PPINs. To examine the effectiveness of our disease relevance scores in
   ranking disease proteins, scores of seed proteins were also calculated,
   and compared with those of candidate proteins. It was found that scores
   of all seed proteins were larger than those of candidate proteins. We
   then focused on the relevance between candidate proteins ranked at the
   top of the ranked list and the corresponding disease. Top 50 DCM
   candidate proteins were identified and listed in [156]Table 2.

Table 2. Top 50 candidate proteins from DCM-specific PPIN.

   Protein[157]^* Accession number Rank Disease relevance score Relevance
   Literature
   MYL2 [158]P10916 1 107629670.500 cardiomyopathy [159][54], [160][55]
   MYL3 [161]P08590 2 73161485.690 cardiomyopathy [162][54], [163][55]
   TNNI1 [164]P19237 3 52203517.710
   MYH14 [165]Q7Z406 4 45310956.140
   NEB [166]P20929 5 44470968.060
   TNNI2 [167]P48788 6 29830367.350
   GJA1 [168]P17302 7 24199871.590 cardiac arrhythmias [169][56]
   ACTA1 [170]P68133 8 23568069.020 DCM [171][45]
   MYL1 [172]P05976 9 19532022.970
   TNNC2 [173]P02585 10 18200557.600
   TPM2 [174]P07951 11 17743459.410 cardiac dysfunction [175][57]
   VIM [176]P08670 12 15069691.590
   TNNT3 [177]P45378 13 12691030.660
   TNNT1 [178]P13805 14 10598443.490
   GJA5 [179]P36382 15 9497687.715 cardiac arrhythmias [180][56]
   SP4 [181]Q02446 16 7947705.183
   MYL4 [182]P12829 17 7586555.815
   TPM3 [183]P06753 18 7337310.387
   TMOD1 [184]P28289 19 7014582.495 DCM [185][46]
   MYOT [186]Q9UBF9 20 6610783.862
   TPM4 [187]P67936 21 6600584.576
   MYH3 [188]P11055 22 6368705.641
   MYBPC1 [189]Q00872 23 6317707.232
   MYBPC2 [190]Q14324 24 4820328.071
   CAV3 [191]P56539 25 3746694.232 DCM [192][47], [193][48]
   MYOD1 [194]P15172 26 2449472.494 cardiomyopathy [195][78]
   CALM1 [196]P62158 27 2291170.029 DCM [197][45]
   ACTB [198]P60709 28 2039574.700
   MYOG [199]P15173 29 1304602.403
   DNAH8 [200]Q96JB1 30 1247401.222
   CKM [201]P06732 31 1139350.347 DCM [202][49]
   CAPN3 [203]P20807 32 1108281.072
   PRKAG2 [204]Q9UGJ0 33 1068188.085 cardiomyopathy [205][79]
   ZMPSTE24 [206]O75844 34 1050441.814 DCM [207][50]
   AMY1A [208]P04745 35 1028615.159
   AMY1B [209]P04745 36 983506.614
   HRAS [210]P01112 37 931783.319 cardiomyopathy [211][80]
   HLA-DR4 [212]P13760 38 926276.147 DCM [213][51], [214][52]
   DNM2 [215]P50570 39 896713.915
   NKX2-5 [216]P52952 40 791481.623 cardiomyopathy [217][81]
   FXN [218]Q16595 41 591821.092 cardiomyopathy [219][82]
   DYSF [220]O75923 42 591430.047 DCM [221][53]
   AMY2A [222]P04746 43 547354.552
   ACTG1 [223]P63261 44 532313.791
   C1QBP [224]Q07021 45 491410.248 cardiac cell damage [225][58]
   CALD1 [226]Q05682 46 485082.005
   AMY2B [227]P19961 47 476690.079
   DAG1 [228]Q14118 48 475398.858 DCM [229][59], [230][60]
   AMY1C [231]P04745 49 455785.472
   PRKCA [232]P17252 50 442473.436
   [233]Open in a new tab
   ^*

   Proteins are represented in their corresponding gene symbols.

   In these candidate proteins, 9 out of 50 have been reported to be
   DCM-related proteins (as shown in [234]Table 2) in literature using the
   PubMed database. For example, TMPP could be a promising drug for
   prevention and treatment of DCM since it reduces expression of ACTA1
   and CALM1 in the DCM heart [235][45]. TMOD1 was shown to be
   over-expressed and associated with DCM in juvenile mice [236][46]. CAV3
   was found to be mutated in two patients with DCM [237][47], [238][48].
   Protein levels of CKM activity were found to decrease in DCM patients
   [239][49]. Histopathological analysis of the mutant mice with
   disruption of the gene ZMPSTE24 revealed DCM [240][50]. Statistically
   elevated frequency of HLA-DR4 allele was found in patients with DCM
   compared with ones in controls [241][51], [242][52]. DYSF generally
   resulted in mild cardiac abnormalities to severe DCM [243][53]. The
   rest of our potential disease proteins (41) were not directly
   associated to DCM in literature review. However, 7 of them may be
   related to the processes associated with DCM since these candidate
   proteins were directly associated to other types of cardiomyopathies,
   such as HCM (e.g. MYL2 and MYL3 [244][54], [245][55]). Four proteins
   (GJA1, TPM2, GJA5 and C1QBP) were related with cardiac arrhythmias,
   cardiac dysfunction and cardiac cell damage, and might also be
   responsible for DCM [246][56]–[247][58]. More experiments are needed to
   study their associations with DCM.

   For HCM-specific PPIN and ARVC-specific PPIN, top 50 candidate proteins
   were identified and their relevance with cardiomyopathies was listed in
   [248]Table S1 and [249]S2, respectively. It demonstrated that the most
   of top 50 candidate proteins identified in our method were associated
   with cardiomyopathies ([250]Table 2, [251]Table S1, and [252]S2).

   We searched cardiomyopathy seed proteins and top 50 candidate proteins
   in their corresponding disease pathways in KEGG. Four candidate
   proteins were found in DCM pathway ([253]Figure 2), one of which (DAG1)
   have been validated to be a DCM-related protein in an investigation of
   glycosylation pathways in biopsied heart tissue due to autosomal
   recessive mutations [254][59], [255][60].

Figure 2. DCM pathway.

   [256]Figure 2
   [257]Open in a new tab

   DCM seed proteins are colored in cyan. Red nodes are proteins which
   were verified to be DCM-related proteins, and yellow nodes represent
   proteins which are potential DCM-related proteins.

   Although MYL2 and MYL3 have not been validated to be directly
   associated to DCM, their relationships with other cardiomyopathies have
   been mentioned in literature. They encode sarcomere proteins that cause
   adult-onset cardiomyopathies when mutated [258][54], [259][55], and may
   cause DCM eventually.

   ACTG1 was shown to link to the seed gene DMD and sarcomere, two
   important factors of DCM in the DCM pathway [260][61], [261][62]. There
   is no evidence about the relationship between ACTG1 and
   cardiomyopathies and more studies are needed.

   Moreover, five and seven candidate proteins in HCM and ARVC pathways
   were found, respectively. One from each pathway has been validated to
   be disease proteins, respectively ([262]Figure S1 and [263]S2). These
   results demonstrated that our method in prioritizing disease candidate
   proteins could provide a new alternative for researchers to predict
   novel disease proteins, i.e. the top ranked candidate proteins without
   literature review.

Pathway Analysis of the Top Ranked Candidate Proteins

   KEGG pathway enrichment analysis (p<0.05) was performed for the top 50
   candidate proteins to illustrate the relationships between disease
   pathways of three subtypes of cardiomyopathies and other pathways
   ([264]Figure 3, [265]Figure S3, and [266]S4). It was shown that DCM
   disease pathway was related to both HCM and ARVC pathways. DCM-related
   pathways that DCM seed genes enriched in were in the inner space
   ([267]Figure 3).

Figure 3. DCM pathway and its relevant pathways.

   [268]Figure 3
   [269]Open in a new tab

   DCM pathway is colored in yellow. Purple nodes are DCM-related
   pathways, and green nodes are other pathways. Black edges connect
   pathways which are directly connected to the DCM pathway.

   We surveyed relationships between the DCM pathway and other pathways
   using literature from the PubMed database. It was found that reduced
   focal adhesion kinase-S910 phosphorylation might contribute to
   sarcomere disorganization in DCM [270][63], suggesting direct
   regulation of focal adhesion to DCM. STC1, which was up-regulated in
   DCM, effectively blocked down-regulation of endothelial tight junction
   proteins at both mRNA and protein levels [271][64]. Moreover,
   deregulation of proteins of carbohydrate metabolism, the actin
   cytoskeleton, and extracellular matrix remodeling were observed in DCM
   patients [272][65]. The association between Escherichia coli infection
   and DCM was established in a patient who was diagnosed as DCM after the
   onset of hemolytic uremic syndrome caused by pathogenic Escherichia
   coli infection [273][66]. In addition, leukocyte transendothelial
   migration was found to be pivotal to the inflammatory response
   [274][67], which could lead to direct injury or severe host disease,
   such as DCM [275][68].

Assessment of the Developed Method

   To assess the performance of our method, we compared our method with
   Chen’s protein ranking method, DADA, ToppGene and ToppNet using LOOCV.
   ROC curves were then plotted to demonstrate performances of these
   methods for DCM. It was found that our method reached a higher AUC
   score (0.963) than Chen’s protein ranking method (0.956), DADA (0.854),
   ToppGene (0.884) and ToppNet (0.741), indicating that our method was
   more sensitive and specific in ranking the test proteins. The
   performances of these five methods were also compared for HCM and ARVC,
   and similar results were obtained ([276]Table 3), except that DADA was
   better only for HCM. These results demonstrated that our method had a
   better overall performance.

Table 3. AUC for three subtypes of cardiomyopathies obtained using five
different methods.

   Our developed method Chen’s protein ranking method DADA ToppGene
   ToppNet
   DCM 0.963 0.956 0.854 0.884 0.741
   HCM 0.919 0.916 0.979 0.911 0.716
   ARVC 0.995 0.934 0.770 0.946 0.756
   [277]Open in a new tab

   To further test the performance of our method, top 50 candidate
   proteins from our method were compared with those from Chen’s protein
   ranking method by literature review. The result for DCM was shown in
   [278]Figure 4. We found 36 proteins in both protein sets, in which 8
   were confirmed to be DCM-related ([279]Table 2). Among the remaining
   proteins, 8 in our potential protein list ([280]Table 2) and 5 in
   Chen’s protein ranking list [281][69]–[282][73] were found to be
   related to the processes associated with DCM.

Figure 4. The number of proteins related with DCM.

   [283]Figure 4
   [284]Open in a new tab

   50 potential disease proteins identified either by our developed method
   (the top left circle), or by Chen’s protein ranking method (the top
   right circle), and the number of proteins which have been confirmed to
   be related with DCM in literature were plotted.

   Results for HCM and ARVC were shown in [285]Figure S5 and [286]S6. In
   general, top 50 candidate proteins from our method contained more
   cardiomyopathy-related proteins than those from Chen’s protein ranking
   method.

Crosstalk of Three Subtypes of Cardiomyopathies

   To illustrate relationship among three subtypes of cardiomyopathies, we
   compared seed proteins of DCM, HCM and ARVC. Thirteen seed proteins
   were found in both DCM and HCM, suggesting that there might be similar
   mechanisms in these diseases [287][74]. For example, most of these
   proteins comprised the sarcomeric proteins ([288]Figure 2 and
   [289]Figure S1). However, there were no common seed genes identified to
   be shared between ARVC and DCM or HCM.

   To further explore relationship among three subtypes of
   cardiomyopathies, we compared the top 50 candidate proteins of each
   cardiomyopathy. Only 2 proteins were found in the protein lists of all
   the three cardiomyopathies and 29 proteins were found in those of both
   DCM and HCM. Since all the pathways of these subtypes of
   cardiomyopathies involve integrins, dystroglycan complex and sarcomere
   ([290]Figure 2, [291]Figure S1, and [292]S2), which are important in
   cardiac myocyte and cardiac muscle contraction [293][75], [294][76], it
   suggested that there was a crosstalk of these three subtypes of
   cardiomyopathies, especially between DCM and HCM.

   Investigation of disease pathways and their related pathways
   ([295]Figure 3, [296]Figure S3, and [297]S4) showed that these three
   subtypes of cardiomyopathies were also closely related. Four pathways
   were found to share among the three diseases: Tight junction, Focal
   adhesion, Regulation of actin cytoskeleton and Leukocyte
   transendothelial migration. These results further implied a crosstalk
   of three subtypes of cardiomyopathies.

Discussion

   In our study, OMIM was applied to obtain seed proteins of three
   subtypes of cardiomyopathies, DCM, HCM and ARVC. With these seed
   proteins, we built cardiomyopathy (DCM, HCM or ARVC)-specific PPINs
   through the nearest-neighbor expansion method and weighted each protein
   pair in the network. We then developed a method to prioritize disease
   candidate proteins by calculating disease relevance score and ranking
   each protein in the network. As a result, it was shown that proteins
   ranked at the top of candidate proteins were potential
   cardiomyopathy-related proteins, which opened a new door for the
   further research of cardiomyopathy pathogenesis. By analyzing top 50
   candidate proteins, it was found that most of them and cardiomyopathy
   seed proteins shared common disease-related KEGG pathways. The
   performance of our method was evaluated based on the following
   criteria: 1). literature review. By the literature review from the
   PubMed database, it was found that top 50 candidate proteins were
   closely correlated with cardiomyopathies; 2). KEGG pathway enrichment
   analysis. Our results revealed that the most of seed and top 50
   candidate proteins were enriched in disease-related functional classes
   and pathways; 3). leave-one-out cross-validation. In this validation,
   candidate proteins were ranked by using our method in prioritizing
   disease candidate proteins, Chen’s method, DADA, ToppGene and ToppNet.
   It was found that the overall performance of our method was better than
   all the others.

   The reliability and precision of our method were improved based on the
   following factors. Firstly, we measured the relevance of each protein
   with disease based on “guilt by association” analysis by adding
   interaction confidence and functional similarity of its neighbors and
   subtracting the effect of promiscuously connections between the protein
   and its interacting proteins. Secondly, the relationships of
   comprehensive human disease protein interaction were obtained using
   protein interaction data from the STRING database. Thirdly, the
   establishment of cardiomyopathy-specific PPINs using disease-related
   seed proteins enhanced specific connections between disease proteins
   and their interacting proteins, reduced promiscuous connections, and
   had higher fidelity interaction confidence. Finally, identified top 50
   candidate proteins in weighted cardiomyopathy-specific PPINs by using
   our method in prioritizing disease candidate proteins were more
   strongly associated with cardiomyopathies, and the crosstalk of three
   subtypes of cardiomyopathies could be obtained through common proteins
   in three ranked lists. As a result, we obtained more proteins which
   were closely associated with cardiomyopathies in literature and some
   new proteins with the unknown roles involving in cardiomyopathies
   ranked at the top of candidate proteins. Further investigation is
   required to prove their disease relevance.

   To further examine the performance of our method in prioritizing
   disease candidate proteins, we compared it with GeneMANIA [298][77] and
   ToppGenet in ToppGene. GeneMANIA and ToppGenet prioritize neighboring
   proteins of seeds in their background networks, which were different
   from our method. It was shown that our method obtained a higher AUC
   score than other two methods ([299]Table 4) for each subtype of
   cardiomyopathies. It is worth noting that better AUC scores were
   obtained using ToppGenet with distance 3 or 4 from seeds, which implied
   that proteins not in the direct neighborhood of seed proteins might
   also be disease-related. Besides, since our method depends on PPI,
   disease proteins with unknown PPI failed to be identified or ranked. We
   will develop a more comprehensive approach taking proteins not in the
   direct neighborhood of seed proteins and other information, such as
   expression, into account in the future to predict disease associated
   proteins.

Table 4. AUC for three subtypes of cardiomyopathies obtained using GeneMANIA
and ToppGenet.

   Our developed method GeneMANIA ToppGenet
   Distance to seeds
   1 2 3 4
   DCM 0.963 0.466 Network based 0.373 0.660 0.728 0.725
   Functional annotation based 0.485 0.724 0.774 0.809
   HCM 0.919 0.588 Network based 0.291 0.670 0.767 0.776
   Functional annotation based 0.369 0.834 0.905 0.904
   ARVC 0.995 0.569 Network based 0.519 0.630 0.665 0.669
   Functional annotation based 0.801 0.873 0.894 0.894
   [300]Open in a new tab

   In conclusion, the method in prioritizing disease candidate proteins
   based on “guilt by association” analysis has proven its ability to more
   precisely identify potential disease-related proteins. This study not
   only provided a new methodology for studying human cardiomyopathy
   disease, but also shed light on process and mechanisms of human
   cardiomyopathy and other complex diseases.

   Full names of all gene/protein symbols used in the main text are listed
   below:

   ACTA1: actin, alpha 1, skeletal muscle

   CALM1: calmodulin 1

   TMOD1: tropomodulin 1

   CAV3: caveolin 3

   CKM: creatine kinase, muscle

   ZMPSTE24: zinc metallopeptidase STE24

   HLA-DR4: major histocompatibility complex, class II, DR beta 1

   DYSF: dysferlin, limb girdle muscular dystrophy 2B

   MYL2: myosin, light chain 2, regulatory, cardiac, slow

   MYL3: myosin, light chain 3, alkali; ventricular, skeletal, slow

   DAG1: dystroglycan 1

   ACTG1: actin, gamma 1

   DMD: dystrophin

   STC1: stanniocalcin 1

Supporting Information

   Figure S1

   HCM pathway. HCM seed proteins are colored in cyan. Red nodes are
   proteins which were verified to be HCM-related proteins, and yellow
   nodes represent proteins which are potential HCM-related proteins.

   (DOC)
   [301]Click here for additional data file.^ (209KB, doc)
   Figure S2

   ARVC pathway. ARVC seed proteins are colored in cyan. Red nodes are
   proteins which were verified to be ARVC-related proteins, and yellow
   nodes represent proteins which are potential ARVC-related proteins.

   (DOC)
   [302]Click here for additional data file.^ (249.5KB, doc)
   Figure S3

   HCM pathway and its relevant pathways. The HCM pathway is colored in
   yellow. Purple nodes are HCM-related pathways, and green nodes are
   other pathways. Black edges connect pathways which are directly
   connected to the HCM pathway.

   (DOC)
   [303]Click here for additional data file.^ (167KB, doc)
   Figure S4

   ARVC pathway and its relevant pathways. The ARVC pathway is colored in
   yellow. Green nodes are other pathways. Black edges connect pathways
   which are directly connected to the ARVC pathway.

   (DOC)
   [304]Click here for additional data file.^ (225.5KB, doc)
   Figure S5

   The number of proteins related with HCM. 50 potential disease proteins
   identified either by our developed method (the top left circle) or by
   Chen’s protein ranking method (the top right circle), and the number of
   proteins that have been confirmed to be related with HCM in literature
   were plotted.

   (DOC)
   [305]Click here for additional data file.^ (72.5KB, doc)
   Figure S6

   The number of proteins related with ARVC. 50 potential disease proteins
   identified either by our developed method (the top left circle) or by
   Chen’s protein ranking method (the top right circle), and the number of
   proteins which have been confirmed to be related with ARVC in
   literature were plotted.

   (DOC)
   [306]Click here for additional data file.^ (72.5KB, doc)
   Table S1

   Top 50 candidate proteins from HCM-specific PPIN.

   (DOC)
   [307]Click here for additional data file.^ (87.5KB, doc)
   Table S2

   Top 50 candidate proteins from ARVC-specific PPIN.

   (DOC)
   [308]Click here for additional data file.^ (97KB, doc)

Funding Statement

   This work was supported by the National Natural Science Foundation of
   China (Grant NO. 61272388), the National Natural Science Foundation of
   Heilongjiang (Grant NO. F201237), the Science & Technology Research
   Project of the Heilongjiang Ministry of Education (Grant No. 12511271)
   and the Student Innovation Funds of Heilongjiang Province (Grant No.
   2010-016HMU and 2012-011HLJ). The funders had no role in study design,
   data collection and analysis, decision to publish, or preparation of
   the manuscript.

References