Abstract

Background

   Feature extraction (FE) is difficult, particularly if there are more
   features than samples, as small sample numbers often result in biased
   outcomes or overfitting. Furthermore, multiple sample classes often
   complicate FE because evaluating performance, which is usual in
   supervised FE, is generally harder than the two-class problem.
   Developing sample classification independent unsupervised methods would
   solve many of these problems.

Results

   Two principal component analysis (PCA)-based FE, specifically,
   variational Bayes PCA (VBPCA) was extended to perform unsupervised FE,
   and together with conventional PCA (CPCA)-based unsupervised FE, were
   tested as sample classification independent unsupervised FE methods.
   VBPCA- and CPCA-based unsupervised FE both performed well when applied
   to simulated data, and a posttraumatic stress disorder (PTSD)-mediated
   heart disease data set that had multiple categorical class observations
   in mRNA/microRNA expression of stressed mouse heart. A critical set of
   PTSD miRNAs/mRNAs were identified that show aberrant expression between
   treatment and control samples, and significant, negative correlation
   with one another. Moreover, greater stability and biological
   feasibility than conventional supervised FE was also demonstrated.
   Based on the results obtained, in silico drug discovery was performed
   as translational validation of the methods.

Conclusions

   Our two proposed unsupervised FE methods (CPCA- and VBPCA-based) worked
   well on simulated data, and outperformed two conventional supervised FE
   methods on a real data set. Thus, these two methods have suggested
   equivalence for FE on categorical multiclass data sets, with potential
   translational utility for in silico drug discovery.

Electronic supplementary material

   The online version of this article (doi:10.1186/s12859-015-0574-4)
   contains supplementary material, which is available to authorized
   users.

   Keywords: Unsupervised feature extraction, Principal component
   analysis, Variational Bayes, Posttraumatic stress disorder, Heart
   disease, In silico drug discovery, chooseLD, FAMS

Background

   Feature extraction (FE) is an important task in bioinformatic analyses,
   as there are often more features than samples. The number of bases
   spanning linear space is at most equivalent to the number of
   independent vectors. Accordingly, more features than samples inevitably
   leads to redundancy. Although dimensional reduction is often used to
   eliminate redundancy, it is far from true redundancy elimination as
   reconstructed bases are usually the linear combination of all features,
   which is not always necessary for spanning entire linear space.

   Instead of dimensional reduction, FE can be used to eliminate
   redundancy, and is often performed to maximize performance of targeted
   tasks (supervised FE), e.g., discrimination between samples or
   regression analysis, although fewer samples than features often creates
   difficulties due to overfitting and/or bias. Multiple class samples
   commonly provide additional problems when supervised FE is used,
   complicating performance evaluations compared with two-class samples.
   Although pairwise evaluations (e.g., one versus one or one versus
   others) are possible, they are time-consuming. FE using a set of
   pairwise evaluations is often even more difficult, because it is hard
   to maximize performance simultaneously for all pairwise evaluations
   with commonly selected features for all pairs. In addition, supervised
   FE is often heavily sample-dependent, and alternative sample sets often
   provide alternative optimal FE. Moreover, with categorical classes the
   problems are greater, since frequently used FE with regression
   analyses, e.g., lasso [[29]1], cannot be directly applied to
   categorical multiclass data sets.

   In order to avoid these difficulties, unsupervised FE is useful as it
   is assumed to be more robust and stable, although this has not been
   extensively studied because of implementation difficulties, e.g.,
   supervision is not based upon performance, therefore FE has nothing
   suitable for optimization. Variational Bayesian approaches eliminate
   redundancy in an unsupervised manner, and have recently been proposed
   as promising unsupervised FE methods. In this paper, we used
   variational Bayes (VB) principal component analysis (PCA) [[30]2,[31]3]
   to perform unsupervised FE. In addition, a simpler and more
   conventional PCA (CPCA)-based unsupervised FE method (CPCAFE) that
   worked well with a simulated data set was proposed as an alternative to
   VBPCA. As VBPCA is more computationally challenging, CPCAFE is an even
   more promising alternative unsupervised FE candidate than the VB
   approach.

   To demonstrate applicability to translational research, we used
   CPCA-based, and partially, VBPCA-based unsupervised FE (VBPCAFE) to
   examine heart disease associated with posttraumatic stress disorder
   (PTSD). PTSD [[32]4] is caused by exposure to high-magnitude,
   life-threatening stressors, i.e., traumatic events. PTSD patients
   typically develop acute stress responses that include symptoms of
   arousal, anxiety, sadness, grief, agitation, irritability, and sleep
   disturbances. Heart disease is associated with PTSD, with a
   meta-analysis showing association between coronary heart disease (CHD)
   and PTSD [[33]5]. Moreover, hospitalization due to cardiovascular
   disease is associated with PTSD caused by the September 11, 2001, World
   Trade Center disaster [[34]6]. PTSD was also associated with CHD using
   a prospective twin study design [[35]7]. However, the underlying
   genetic background of the association is not well known.

   By applying CPCAFE and VBPCAFE to publically available mRNA and
   microRNA (miRNA) expression data [[36]8] from stressed mouse hearts, we
   identified aberrantly expressed miRNAs and mRNAs. Biological
   feasibility of the identified miRNAs and mRNAs was determined (by
   negative correlation between miRNAs and mRNAs, and Kyoto Encyclopedia
   of Genes and Genomes (KEGG) [[37]9] pathway enrichment of miRNA target
   genes), showing suitability of our selections.

   Two supervised FE methods, specifically, regression analysis using
   categorical pseudo variables and backward elimination using
   Hilbert-Schmidt norm of the cross-covariance operator (BAHSIC),
   identified unstable and biologically less feasible sets of mRNAs and
   miRNAs that were not always negatively correlated, and suggested
   superiority of unsupervised FE methods.

   Among the aberrantly expressed genes identified by CPCAFE, fatty acid
   binding protein 3 (FABP3) was considered a potential drug target
   candidate by structural investigations, and inhibitory drugs were
   sought using the in silico drug discovery tool, chooseLD, a
   profile-based drug discovery program.

Results and discussion

FE methods applied to a simulated categorical multiclass data set

   We performed FE using a simulated categorical multiclass data set, to
   determine the limitations of FE and usefulness of our proposed methods.

   The data set consisted of 100 simulated ensembles of 20 samples with
   100 features, of which only 10 features were distinct between four
   classes, and with each class consisting of 5 samples (see [38]Methods).
   As sample values of each feature were obtained from an identical
   mixture of four Gaussian distributions, FE is difficult but
   discrimination between four classes varies between three cases (shown
   in Figure [39]1; a typical feature boxplot with distinct expression
   among classes). We named these three cases as easy (s=2), medium (s=1),
   and hard (s=0.5). Furthermore, we did not use order between the four
   classes, so the data could be treated as a categorical data set.

Figure 1.

   Figure 1
   [40]Open in a new tab

   Boxplot of typical features with distinct values between four classes
   (1≤i≤5) in the simulated data set; s=2 (easy), 1 (medium), and 0.5
   (hard).

   In order to demonstrate the limitations of FE, we first tested one vs
   one t test based FE (see [41]Methods). No significantly different
   features were detected between the four classes in any of the three
   cases (Figure [42]2; confusion tables are available in Additional file
   [43]1), demonstrating that FE on a categorical multiclass data set is
   difficult.

Figure 2.

   Figure 2
   [44]Open in a new tab

   Matthews correlation coefficients and F measure for various FE methods
   applied to the simulated data set.t test, one vs one t test based FE;
   CRP, categorical regression based FE (using adjusted P-values); CRR,
   categorical regression based FE (using ranked P-values); BAHSIC,
   backward elimination using Hilbert-Schmidt norm of the cross-covariance
   operator; VBPCAFE, variational Bayes principal component analysis based
   unsupervised FE; CPCAFE, conventional principal component analysis
   based unsupervised FE.

   Next, we applied a more sophisticated method, specifically, categorical
   regression based FE (using significance based on adjusted P-values, see
   [45]Methods). The number of correctly extracted features improved, with
   Matthews correlation coefficients of 0.95, 0.35, and 0.05, and F
   measures of 0.98, 0.15, and 0.005 for easy, medium, and hard cases,
   respectively (Figure [46]2; confusion tables are available in
   Additional file [47]1). However, for the hard case, both values are
   almost zero.

   In order to improve performance, we selected the top 10 ranked features
   with the smallest P-values, independent of significance (using P-value
   ranks, see [48]Methods). This resulted in Matthews correlation
   coefficients of 0.97, 0.61, and 0.22, and F measures of 0.97, 0.65, and
   0.30 for easy, medium, and hard cases, respectively (Figure [49]2;
   confusion tables are available in Additional file [50]1). Thus, FE
   accuracy is reasonable, but selecting features using non-significant
   P-values is not ideal, highlighting the problem of using FE on a
   categorical multiclass data set.

   In order to overcome this, we next tested BAHSIC [[51]10] (see
   [52]Methods), a recently proposed non-parametric FE method that has
   been used on high-dimensional microarray data sets. The obtained
   Matthews correlation coefficients were 0.84, 0.49, and 0.18, and F
   measures were 0.86, 0.54, and 0.27 for easy, medium, and hard cases,
   respectively (Figure [53]2; confusion tables are available in
   Additional file [54]1). Although performance is relatively improved
   (without using highly ranked but non-significant features), BAHSIC does
   not use P-values, and again the results are not completely
   satisfactory, especially for the easy case: a Matthews coefficient of
   0.84 is less than the 0.95 achieved by categorical regression based FE.

   Therefore, we next considered VBPCAFE. In order to perform FE with
   VBPCA, original VBPCA was extended to incorporate feature dependence
   (see [55]Methods). Prior distribution in this extension has
   feature-dependent parameters, and therefore automatic elimination of
   irrelevant features was expected. The
   [MATH:
   <msubsup><mrow><mi>C</mi></mrow><mrow><mi>B</mi></mrow><mrow><mi>i</mi>
   <mn>1</mn></mrow></msubsup> :MATH]
   histogram reflects the prior distribution parameter of the first
   principal component (PC) (Figure [56]3), with smaller values assumed to
   be irrelevant features. Originally, C [B] was introduced to eliminate
   irrelevant PCs, and therefore has only q (PC) and not i (feature)
   dependence. However, we needed to eliminate irrelevant features as
   well; therefore, in our study C [B] must also have i dependence.
   Although we did not use sample labelling information, as expected,
   irrelevant features had smaller
   [MATH:
   <msubsup><mrow><mi>C</mi></mrow><mrow><mi>B</mi></mrow><mrow><mi>i</mi>
   <mn>1</mn></mrow></msubsup> :MATH]
   . Averaged A [j1] (contribution of the jth sample to the first PC) over
   100 ensembles (Figure [57]4), demonstrates that A [j1] represents the
   distinction between the four classes. Thus, features not coincident
   with A [j1] (and with smaller
   [MATH:
   <msubsup><mrow><mi>C</mi></mrow><mrow><mi>B</mi></mrow><mrow><mi>i</mi>
   <mn>1</mn></mrow></msubsup> :MATH]
   ) were eliminated, producing Matthews correlation coefficients of 0.91,
   0.39, and 0.04, and F measures of 0.92, 0.46, and 0.14 for easy,
   medium, and hard cases, respectively (Figure [58]2; confusion tables
   are available in Additional file [59]1).

Figure 3.

   Figure 3
   [60]Open in a new tab

   Relationship between
   [MATH:
   <msubsup><mrow><mi>C</mi></mrow><mrow><mi>b</mi></mrow><mrow><mi>i</mi>
   </mrow></msubsup> :MATH]
   and features with distinct expression among four classes. Left column:
   histogram obtained from logarithmic
   [MATH:
   <msubsup><mrow><mi>C</mi></mrow><mrow><mi>b</mi></mrow><mrow><mi>i</mi>
   </mrow></msubsup> :MATH]
   for 100 independent ensembles. Red color indicates features with
   distinct expression between four classes. Right column: proportion of
   features with distinct expression between four classes in each bin.
   Top: s=2, easy; middle: s=1, medium; bottom: s=0.5, hard cases.

Figure 4.

   Figure 4
   [61]Open in a new tab

   A [j1] boxplot with distinct values between four classes in 100
   independent ensembles of the simulated data set; s=2 (easy), 1
   (medium), and 0.5 (hard).

   Although performance has been successfully improved for the easy case,
   VBPCAFE is computationally challenging and is a potential drawback to
   its use. Iteration involves updates proportional to the number of
   features, which can often be as many as several tens of thousands.
   Thus, a more computationally effective method is of value, and a
   relatively straightforward idea is to replace
   [MATH:
   <msubsup><mrow><mi>C</mi></mrow><mrow><mi>B</mi></mrow><mrow><mtext
   mathvariant="italic">iq</mtext></mrow></msubsup> :MATH]
   with B [iq], the qth PC score of ith feature, because a larger B [iq]
   is expected to result in relevant (thus larger)
   [MATH:
   <msubsup><mrow><mi>C</mi></mrow><mrow><mi>B</mi></mrow><mrow><mtext
   mathvariant="italic">iq</mtext></mrow></msubsup> :MATH]
   (for example, see eq. ([62]1) in [63]Methods). Indeed, replacing
   [MATH:
   <msubsup><mrow><mi>C</mi></mrow><mrow><mi>B</mi></mrow><mrow><mi>i</mi>
   <mn>1</mn></mrow></msubsup> :MATH]
   with B [1i], we obtained Matthews correlation coefficients of 0.88,
   0.34, and 0.02, and F measures of 0.89, 0.41, and 0.12 for easy, medium
   and hard cases, respectively (Figure [64]2, confusion tables are
   available in Additional file [65]1). These values are comparable with
   those using
   [MATH:
   <msubsup><mrow><mi>C</mi></mrow><mrow><mi>B</mi></mrow><mrow><mi>i</mi>
   <mn>1</mn></mrow></msubsup> :MATH]
   . Considering that CPCAFE requires minimal computational resources,
   CPCAFE is a promising candidate for applying to real data that often
   consists of more than several thousand features.

   In summary, there are at least four methods that achieve comparable FE
   performance on a categorical multiclass data set: categorical
   regression based FE (using ranked P-values), BAHSIC, VBPCAFE, and
   CPCAFE. Each has its own advantages and disadvantages: although
   categorical regression based FE achieved the best overall performance,
   it used features with non-significant adjusted P-values. BAHSIC does
   not have this problem, but as it does not use P-values, its performance
   using the easy case was poorest of the tested methods. VBPCAFE and
   CPCAFE were more effective using the easy case, but their performance
   was poor for the hard case. VBPCAFE is computationally challenging,
   while CPCAFE is not.

   However, these conclusions may be viewed as slightly subjective. The
   methods recommended for the hard case (i.e., categorical regression
   based FE and BAHSIC) have strict label dependency. Harder class
   discrimination often means label uncertainty (or incorrectness). Of
   course, there are many objective labels (e.g., gender or age), but
   those related to experimental conditions often include subjective
   criterion or insufficient controls for experimental conditions. Hard
   class discrimination often originates from these subjective labels, and
   consequently, robustness to mislabeling is desired, especially for hard
   cases.

   In order to test robustness of the four methods towards partial
   mislabeling, we determined performance after mislabeling (Table [66]1).
   Figure [67]5 shows performance degradation caused by little, medium,
   and heavy partial mislabeling. Little, medium, and heavy were defined
   by the distance between true and wrong labels. The number of samples
   with wrong labels included only a proportion of the data, and never
   more than 20%. Although only 20% of samples were mislabeled,
   performance degradation with BAHSIC and categorical regression based FE
   was considerable for medium and heavy mislabeling. Conversely, VBPCAFE
   and CPNAFE performance were not affected by mislabeling as these two
   methods do not require labeling information. This indicates that the
   two methods recommended for hard cases are not particularly robust,
   even against partial mislabeling, and therefore not always recommended.

Table 1.

   Artificial partial mislabeling introduced to the simulated data set to
   check robustness of FE
               (a)     (b)     (c)
   Corr. coef. 0.92    0.68    0.60
   Class       1 2 3 4 1 2 3 4 1 2 3 4
   1           4 1 0 0 4 0 1 0 4 0 0 1
   2           1 4 0 0 0 4 0 1 0 4 1 0
   3           0 0 4 1 1 0 4 0 0 1 4 0
   4           0 0 1 4 0 1 0 4 1 0 0 4
   [68]Open in a new tab

   Rows and columns represent true and modified labels. Numbers represent
   the number of samples with correct and modified labels. (a) little; (b)
   medium; and (c) heavy mislabeling. Correlation coefficients represent
   the amount of mislabeling (larger correlation coefficients correspond
   to less mislabeling).

Figure 5.

   Figure 5
   [69]Open in a new tab

   Matthews correlation coefficients and F measure for various FE methods
   applied to the simulated data set with partial mislabeling, indicated
   in Table [70]1. Correlation coefficients between mislabeled and true
   labeling were (a) 0.92, (b) 0.68, and (c) 0.60, as shown in Table
   [71]1. Other notations are the same as in Figure [72]2.

   In conclusion, CPCAFE or VBPCAFE are the best methods for FE from
   categorical multiclass data sets, as they maintain robustness and show
   relatively stable and good performance for most cases.

Translational use of CPCAFE

   In order to determine the effectiveness of CPCAFE in a real
   application, we examined PTSD associated heart disease. Our data set
   consisted of multiclass categorical samples, including several
   experimental conditions with no pre-defined rank orders (see
   [73]Methods), and is suitable for estimating CPCAFE performance.
   Two-dimensional embedding of miRNA profiles are shown (Figure [74]6).
   Two probes (corresponding to mmu-miR-302c and -370) with extremely
   large values were deemed to be erroneous signals and excluded prior to
   embedding. Since each miRNA was attributed to multiple probes, removing
   these two probes did not mean exclusion of these miRNAs from the
   analysis. In addition, probes with relatively large PC2 projections
   were excluded and the 100 top-ranked outliers along PC1 selected. This
   strategy ensured selection of PC1 enhanced probes, as outliers along
   both PC1 and PC2 were not expected to be PC1 specific (PCs other than
   PC1 or PC2 were excluded, with our rationale discussed in Additional
   file [75]2). CPCAFE assumes that if a set of probes are outliers along
   a PC, they behave in a group oriented manner. If not, they are not
   outliers and are instead located on the origin, where the majority of
   probes without biological feasibility are presumed to be located. This
   criterion requires no user input and probe biological relevance (e.g.,
   distinct expression between treatment and control samples) is unknown.
   Using these assumptions, we further investigated a selected 100 probes,
   representing 27 unique miRNAs, specifically, mmu-miR-451, -22, -133b,
   -709, -126-3p, -30c, -29a, -143, -24, -23b, -133a, -378, -30b, -29b,
   -125b-5p, -675-5p, -16, -26a, -30e, -1983, -691, -23a, -690, -207, and
   -669l, and mmu-let-7b and -7g. We investigated differential expression
   of these miRNAs between treatment and control samples using various
   statistical tests (e.g., t-test, Wilcoxon rank sum test, and
   Kolmogorov-Smirnov test), but obtained no significant results. However,
   owing to the small number of samples (four samples for treatment and
   control conditions), the significance criterion (i.e., P<0.05) was not
   satisfied by any miRNA examined, including the 27 selected miRNAs.
   Therefore, we compared the selected and other miRNAs as two separate
   groups. Boxplots of logarithmic P-values obtained by t tests between
   miRNAs in treatment and control samples are shown (Figure [76]7).

Figure 6.

   Figure 6
   [77]Open in a new tab

   Two-dimensional embedding of miRNA expression determined by PCA. Each
   dot represents a probe. Red dots indicate the 100 top-ranked probes
   with larger PC1 scores, selected as outliers based on the criterion of
   CPCAFE. A few probes with relatively large PC2 scores were excluded to
   allow selection of PC1 enhanced probes. See Additional file [78]2 for
   more detail.

Figure 7.

   Figure 7
   [79]Open in a new tab

   Boxplots of logarithmic P-values obtained from t tests comparing
   control and treatment samples.P-values < 0.5 indicate greater miRNA
   upregulation in treatment samples than controls (justification for
   using logarithmic P-values is available in Additional file [80]2). From
   left to right, the experimental conditions were 2, 5, and 10 days of
   stress and 1 day of rest; 5 days of stress and 10 days of rest; and 10
   days of stress and 42 days of rest. Right-hand boxes (“selected”) are
   the 27 miRNAs identified by PCA-based unsupervised FE, and left-hand
   boxes (“others”) the remaining miRNAs. P-values shown above each plot
   were calculated using t tests to compare logarithmic P-values between
   selected and other miRNAs. P [>]s (P [<]s) is the rejection probability
   that rejects the null hypothesis (both sets have equal means) in favor
   of the alternative hypothesis (“others” have a larger or smaller mean
   than “selected”). Based on these criteria, selected miRNAs are
   upregulated in treatment samples with 2 days of stress and 1 day of
   rest, and downregulated in treatment samples for the other conditions.

   The 27 selected miRNAs have larger or smaller P-values than the other
   miRNAs, indicating that CPCAFE successfully selected differentially
   expressed miRNAs between treatment and control samples (justification
   for using logarithmic P-values is provided in Additional file [81]2).

   Although the selected miRNAs were expressed distinctly from the other
   miRNAs, this does not demonstrate biological relevance. As our aim was
   to investigate the underlying transcriptomic background of
   PTSD-mediated heart disease, preliminary validation of our approach can
   be provided by prior involvement of the selected miRNAs with heart
   disease. To address this, we performed literature searches (Table
   [82]2). Of the 27 selected miRNAs, 23 (excluding miR-1983, -691, -690,
   and -207) were previously reported to be related to heart disease,
   suggesting our miRNA selection is biologically useful.

Table 2.

   Summary of studies with association between the 27 selected miRNAs and
   heart disease
   miRNAs Ref. Description
   miR-451 [[83]39] Upregulated in heart due to ischemia
   miR-22 [[84]40] Elevated serum levels in patients with stablechronic
   systolic heart failure
   miR-133 [[85]41] Downregulated in transverse aortic constrictionand
   isoproterenol-induced hypertrophy
   miR-709 [[86]42] Upregulated in rat heart four weeks after
   chronicdoxorubicin treatment
   miR-126 [[87]43] Association with outcome of ischemic andnonischemic
   cardiomyopathy in patients withchronic heart failure
   miR-30 [[88]44] Inversely related to CTGF in two rodent modelsof heart
   disease, and human pathological leftventricular hypertrophy
   miR-29 [[89]45] Downregulated in the heart region adjacent toan infarct
   miR-143 [[90]46] Molecular key to switching of the vascular
   smoothmuscle cell phenotype that plays a critical role incardiovascular
   disease pathogenesis
   miR-24 [[91]47] Regulates cardiac fibrosis after myocardial infarction
   miR-23 [[92]48] Upregulated during cardiac hypertrophy
   miR-378 [[93]49] Cardiac hypertrophy control
   miR-125 [[94]50] Important regulator of hESC differentiation to
   cardiacmuscle(potential therapeutic application)
   miR-675 [[95]51] Elevated in plasma of heart failure patients
   let-7 [[96]52] Aberrant expression of let-7 members incardiovascular
   disease
   miR-16 [[97]53] Circulating prognostic biomarker in critical
   limbischemia
   miR-26 [[98]54] Downregulated in a rat cardiac hypertrophy model
   miR-669 [[99]55] Prevents skeletal muscle differentiation in
   postnatalcardiac progenitors
   [100]Open in a new tab

   To further confirm biological suitability of the identified miRNAs, we
   examined KEGG pathway enrichment using miRNA target genes (see
   [101]Methods). Associations between enriched KEGG pathways and heart
   disease are summarized (Table [102]3). Next, we examined the 21
   top-ranked pathways (the complete list is available in Additional file
   [103]3), with 17 found to be related to heart disease. Overall, these
   findings validate both our selection of miRNAs, and utility of CPCAFE.

Table 3.

   Summary of studies with association between heart disease and KEGG
   pathways enriched by miRNA target genes
   KEGG pathway Enrichment P -value Ref. Description
   Axon guidance 3.61×10^−14 [[104]56] Axon guidance of sympathetic
   neurons to cardiomyocytes is a promising target for regulation of
   cardiac function in diseased hearts
   Colorectal cancer 1.92×10^−10 [[105]57] Significant association between
   colorectal neoplasm and coronary artery disease
   Chronic myeloid leukaemia 1.92×10^−10 [[106]58] Imatinib mesylate (a
   therapeutic agent for chronic myeloid leukaemia) has cardiotoxicity
   Glutamatergic synapse 3.35×10^−9 — —
   Hepatitis B 5.86×10^−9 [[107]59] Hepatitis B virus causes varied forms
   of heart disease
   Pancreatic cancer 6.60×10^−9 — —
   Acute myeloid leukaemia 2.54×10^−8 [[108]60] Cause of acute ischemic
   heart disease
   Focal adhesion 6.65×10^−8 [[109]61] Focal adhesion kinase deletion
   attenuates pressure overload-induced hypertrophy
   MAPK signaling pathway 3.50×10^−7 [[110]62] Plays an important role in
   cardiac and vascular disease pathogenesis
   Endometrial cancer 4.55×10^−7 [[111]63] Cardiovascular disease is the
   leading cause of death among endometrial cancer patients
   Chagas disease 6.33×10^−7 [[112]64] Association with heart disease
   T cell receptor signaling pathway 9.82×10^−7 — —
   ErbB signaling pathway 1.01×10^−6 [[113]65] ErbB-signaling pathway
   proteins are potential drug targets for heart failure treatment
   Prostate cancer 2.47×10^−6 [[114]66] Coronary artery disease and
   prostate cancer are both common diseases sharing many risk factors
   Neurotrophin signaling pathway 3.08×10^−6 — —
   Toxoplasmosis 3.16×10^−6 [[115]67] Toxoplasmosis is a cause of heart
   disease
   Bacterial invasion of epithelial cells 3.84×10^−6 — —
   TGF-beta signaling pathway 7.35×10^−6 [[116]68] Upregulated in
   infarcted myocardium
   Nonsmall-cell lung cancer 9.00×10^−6 — —
   VEGF signaling pathway 1.17×10^−5 [[117]69] A VEGF inhibitor has
   cardiotoxicity
   Dopaminergic synapse 2.19×10^−5 [[118]70] Dopamine agonists affect the
   cardiovascular system
   [119]Open in a new tab

   Moreover, our results suggest that the aberrantly expressed miRNAs are
   likely involved in PTSD-mediated heart disease. Hence, further
   investigation of the miRNA target genes may clarify the underlying
   molecular biology and transcriptomic background of PTSD-mediated heart
   disease.

   In this regards, we applied PCA to mRNA expression. Two-dimensional
   embedding of mRNA expression is shown (Figure [120]8). To determine
   correlation between miRNA and mRNA expression, we compared their
   expression profiles and determined the contribution of each sample to
   PC1 (Figure [121]9(a)), showing negative correlation between PC1s.
   Additionally, to determine if the observed correlation is coincident
   with experimental conditions, scatterplots of averaged values within
   each condition were examined (Figure [122]9(b)). This strengthened the
   correlations but did not change significance, indicating that the
   observed negative correlation between mRNA and miRNA expression is
   reliable. Next, we selected mRNAs using CPCAFE. To select PC1 enhanced
   probes, the 100 top-ranked outliers along PC1 were selected, excluding
   probes with relatively large projections to PC2. In total, 59 unique
   mRNAs were identified (RefSeq mRNA IDs are provided in Additional file
   [123]4). To confirm negative correlation between miRNAs and miRNA
   target mRNAs, correlation coefficients were determined (see Additional
   file [124]4). There were no targets common to the selected 59 mRNAs and
   TarBase, employed by DNA intelligent Analysis (DIANA)-mirpath [[125]11]
   and used in KEGG pathway analysis (see [126]Methods), possibly because
   of TarBase’s experiment-oriented, thus context-dependent, nature (i.e.,
   TarBase does not include PTSD). Thus, instead we used seed matching to
   identify miRNA target genes, with so called 7mer-m8 [[127]12] detecting
   exact matches to positions 2-8 of mature miRNAs (seed + position 8).
   Among the 59 mRNAs, 24 were targeted by at least one of the 27 selected
   miRNAs. In addition, 47 pairs of miRNAs and miRNA target genes were
   identified. In total, there were 45/47 negative correlation
   coefficients between miRNAs and miRNA target genes. We also examined
   correlation coefficient significance (see [128]Methods), with 26/47
   pairs (more than half) associated with significant correlations (two
   positive correlations were judged insignificant), and confirming
   negative correlation between miRNAs and miRNA target genes.

Figure 8.

   Figure 8
   [129]Open in a new tab

   Two-dimensional embedding of mRNA expression by PCA. Notations are the
   same as in Figure [130]6. See Additional file [131]2 for more detail.

Figure 9.

   Figure 9
   [132]Open in a new tab

   Comparison of sample contributions to PC1 between mRNA and miRNA.(a)
   Scatterplot comparing mRNA and miRNA expression profiles for sample
   contribution to PC1 (i.e., components of the first loading vector).
   Pearson’s correlation coefficient =−0.37 (P=0.01). (b) Averaged
   contributions within each condition. Pearson’s correlation coefficient
   =−0.69 (P=0.01). XY-Zd with X = C (control); X = T (treatment); Y,
   stress days; and Z, rest days in experimental conditions. See
   Additional file [133]2 for more detail.

   In order to determine if the selected 59 mRNAs were differentially
   expressed between control and treated samples, we examined the
   logarithmic ratio. Using t tests, the averaged logarithmic ratio of the
   59 samples was significantly negative or positive compared with that
   averaged over other mRNAs, excluding only one condition (see Table
   [134]4). Thus, our results show that the 59 selected mRNAs are mostly
   distinctly expressed between control and treated samples.

Table 4.

   P -values calculated by t tests from logarithmic ratios between treated
   and control samples
                     C2-1d     C5-1d     C10-1d     C5-10d C10-42d
   P -values         vs        vs        vs         vs     vs
                     T2-1d     T5-1d     T10-1d     T5-10d T10-42d
   Control < treated 6.9×10^−4 7.4×10^−8 1.0        0.22   0.03
   Control > treated 1.0       1.0       2.35×10^−8 0.78   0.97
   [135]Open in a new tab

   Smaller P-values indicate that the difference in mean logarithmic ratio
   is more significant in the selected 59 mRNAs than that in the other
   mRNAs. For descriptions of each experimental condition, see the caption
   for Figure [136]9. For more methodological detail, see Additional file
   [137]2.

   Although the selected mRNAs include many targeted and negatively
   regulated by miRNAs, and are differentially expressed between control
   and treated samples, published associations between the selected mRNAs
   and heart disease would provide direct biological relevance.
   Association between the identified genes and heart disease are
   summarized (Table [138]5). In total, 9 genes were both targeted by at
   least one of the 27 miRNAs and related to heart disease. In addition,
   17 genes were related to heart disease but not targeted by any of the
   27 miRNAs (see Additional file [139]5). Among the genes not associated
   with heart disease, 15 genes were targeted by at least one of the 27
   miRNAs, and only 18 genes were not targeted by any. Because
   approximately half (26 in total) of the selected 59 genes are related
   to heart disease, our gene selection is feasible from a biological
   point of view.

Table 5.

   Summary of studies with association between heart disease and genes
   selected by CPCAFE
   Gene RefSeq mRNA miRNAs Heart disease association ( P -values from the
   Gendooserver)
   Fabp3 [140]NM_010174 miR-709 Cardiomegaly (0.031)
   Cox4i1 [141]NM_009941 miR-709 Cardiac output, low (5×10^−4)
   Atp5g1 [142]NM_001161419 miR-29a/b-3p, Cardiomyopathies (1.5×10^−3)
   miR-16
   Cox6a2 [143]NM_009943 miR-23a/b-3p Heart disease (1.3×10^−3);
   Cardiomyopathies (2.3×10^−3);Heart failure (5.0×10^−3)
   Aldoa [144]NM_001177307 miR-16 Cardiomyopathy, dilated (2.0×10^−3)
   Cox5a [145]NM_007747 miR-26a-5p Cardiomyopathies (5.4×10^−3);
   Myocardial ischemia(7.4×10^−4)
   Myl2 [146]NM_010861 miR-1983 Heart defects, congenital (4.9×10^−48);
   Cardiomyopathy, dilated (5.2×10^−21); Cardiomegaly (1.1×10^−17); Heart
   failure (2.1×10^−9)
   Myl3 [147]NM_010859 miR-691 Heart defects, congenital (1.1×10^−7);
   Cardiomegaly(1.2×10^−4); Myocardial infarction (2.4×10^−4); Heart
   septal defects, ventricular (6.9×10^−4)
   Tcap [148]NM_011540 miR-207 Cardiomyopathy, dilated (1.1×10^−8);
   Cardiomyopathy, hypertrophic (1.7×10^−3); Heart defects,
   congenital(6.0×10^−3); Heart failure (1.2×10^−2)
   [149]Open in a new tab

   Finally, we compared the selected 59 genes with KEGG pathways. KEGG
   pathway analysis had been performed for the miRNAs, but as they are not
   the only mRNA regulatory mechanism, mRNA expression may be, at least
   partially, distinct from miRNA expression. We found many genes were
   concentrated to specific KEGG pathways (see Additional file [150]6).
   For example, Uqcrh, Uqcrq, Cox6a2, Cox7a1, Cox5a, Cox4i1, Cox8b, Cox6c,
   Myl2, Myl3, Myh6, Myh8, Tpm1, Tnni3, and Tnnt2 belong to the KEGG
   pathway “Cardiac muscle contraction” (mmu04260), and most are also
   components of cytochrome c oxidase, involved in mitochondrial proton
   transfer. Thus, it is biologically reasonable that heart disease is
   associated with aberrant expression of these genes. Mybpc3, Tnni3, and
   Tnnt2 belong to the KEGG pathway “Hypertrophic cardiomyopathy”
   (mmu05410), also coincident with a role in heart disease. Atp5g1,
   Atp5b, Atp5g3, Atp5h, Atp5a1, Atp5e, Atp5j2, Ndufa13, and Ndufs6 belong
   to “Oxidative phosphorylation” (mmu00190 + 11951). This KEGG pathway is
   an essential part of mitochondrial energy metabolism, and malfunction
   is likely to be directly related to muscle functionality. Mdh2 and Aco2
   belong to the “Citrate cycle” (mmu00020), and are involved in energy
   production, while Fabp3 belongs to the “peroxisome
   proliferator-activated receptor (PPAR) signaling pathway” (mmu03320),
   involved in muscle function. Thus, based on these KEGG pathway
   functions, the genes identified by CPCAFE are related to heart disease.

   In order to further confirm the validity of our methodology, i.e.,
   integrated analysis of mRNA and miRNA expression, we have also
   performed KEGG pathway analysis by the Database for Annotation,
   Visualization and Integrated Discovery (DAVID) [[151]13] restricted to
   24 genes targeted by at least one of 27 miRNAs (see [152]Methods).
   DAVID identified five KEGG pathways associated with significant
   P-values adijutsted by Benjamini and Hochberg (BH) critetion (Table
   [153]6). Two out of five were related to cradiadic diseases (“Cardiac
   muscle contraction” and “Oxidative phosphorylation”[[154]14]) and three
   were related to neurodegenerative diseases (“Parkinson’s disease”,
   “Alzheimer’s disease” and “Huntington’s disease”). Thus, these are very
   coincident with those causing PTSD mediated heart diseases. Figure
   [155]10 summarizes the analyses performed in the above.

Table 6.

   KEGG pathway enriched by 24 mRNAs targeted by 27 miRNAs, identified by
   DAVID
   KEGG pathway               Number of genes %    P -values Adjusted P -values
   Cardiac muscle contraction 7               30.4 2.30E-09  3.20E-08
   Parkinson’s disease        7               30.4 5.80E-08  4.10E-07
   Oxidative phosphorylation  6               26.1 2.30E-06  1.10E-05
   Alzheimer’s disease        6               26.1 1.20E-05  4.20E-05
   Huntington’s disease       6               26.1 1.20E-05  3.50E-05
   [156]Open in a new tab

   Number of genes are genes included in pathway, % is the ratio genes
   included in pathway among 24 genes. P-values and those adjusted by BH
   criterion were provided by DAVID.

Figure 10.

   Figure 10
   [157]Open in a new tab

   Schematic that illustrates biological validations towards 27 miRNAs and
   59 mRNAs.

   Our findings suggest that PTSD-mediated heart disease may be caused by
   malfunction in “Cardiac muscle contraction” and/or “Oxidative
   phosphorylation” of energy metabolism. These malfunctions are related
   to aberrant gene expression likely mediated by aberrant miRNA
   expression. From a therapeutic point of view, PTSD-mediated heart
   disease may be treated based upon this knowledge. To perform in silico
   drug discovery for these genes, the 26 genes associated with heart
   disease were investigated. First, tertiary structures were predicted
   (see [158]Methods), and found to be similar to predicted or
   experimentally determined tertiary structures available in the Protein
   Data Bank (PDB), indicating our tertiary structures are reliable (see
   Additional file [159]7).

   Among the proteins with tertiary structures predicted or available in
   PDB, FABP3 has a “pocket” to which inhibitors can bind, and was
   therefore selected as a candidate drug target for in silico drug
   discovery. FABP3 is an acid binding protein and its function can be
   blocked by inhibition of acid binding. Moreover, FABP3 is upregulated
   in patients with ventricular-septal defects in comparison to normal
   controls [[160]15]. FABP3 is also a member of FABPs that play critical
   roles in the PPAR signaling pathway identified above. Thus, it is
   likely that FABP3 is a key protein in PTSD-mediated heart disease.

   The 10 top-ranked drug candidate compounds obtained by in silico drug
   discovery (see [161]Methods) are listed (Table [162]7). The complete
   list of compounds ranked by FPAScores is available in Additional file
   [163]8. The compounds include promising drug candidates, for example,
   two heat shock protein 90 (HSP90) inhibitors are upregulated in dilated
   cardiomyopathy [[164]16], and two PPAR inhibitors are regarded as
   pharmacological therapeutic targets [[165]17]. Furthermore,
   overexpression of two Cyclin-dependent kinase 2 (CDK2) inhibitors
   results in smaller mononuclear cardiomyocytes [[166]18]. All of these
   candidates should be investigated further.

Table 7.

   Top ranked FABP3 inhibitor compounds identified by in silico drug
   discovery
   Rank FPAScore Drug name DrugBank/CHEMBL No. Target/Activity reported in
   DrugBankand CHEMBL
   1 907.4 Oxaprozin DB00991/CHEMBL1071 PTGS1 (Cox1) & PTGS2 (Cox2)
   inhibitor
   2 901.9 — DB08539/CHEMBL249736 Inhibits human CDK2
   3 866.2 — DB06964/CHEMBL252124 HSP90 β binding affinity
   4 864.4 — DB08702/— Metallo- β-lactamase L1*
   5 826.7 — DB08396/— Ig heavy chain V-III region CAM*
   6 809.3 — DB06908/CHEMBL209749 PPAR α/γ binding affinity
   7 790.5 — DB08483/CHEMBL47590 Agonist activity at human PPAR δ/γ/α;
   CDK2/4 inhibition
   8 773.8 Flavoxate DB01148/CHEMBL1493 CHRM1/2 antagonist; PDE 4/7/8
   inhibitor
   9 771.6 — DB07594/CHEMBL399530 Inhibits HSP90 activity
   10 761.2 Rolitetracycline DB01301/CHEMBL1237046 Inhibitor of 30S
   ribosomal protein S9 & 16S rRNA; Inhibits synthetic amyloid β-42
   fibrillization
   [167]Open in a new tab

   *pharmacological action unknown.

Stability and comparison with other methods

   We have shown biological feasibility of CPCAFE when applied to PTSD.
   However, it is also important to compare its performance against other
   methods, as CPCAFE superiority is doubtful if any other method can
   achieve comparable performance. Hence, we used a modified data set to
   ensure performance is due to the method and not the data set. If slight
   modifications of the sample data set drastically decrease performance,
   CPCAFE superiority is doubtful. In addition, VBPCAFE performance on the
   PTSD data set is unknown. If VBPCAFE derives completely different
   outcomes from CPCAFE, proposing CPCAFE as an alternative to VBPCAFE
   (with a firmer theoretical base, albeit a more time-consuming method)
   would be less convincing.

   First, we examined equivalence between CPCAFE and VBPCAFE. VBPCAFE is
   too time-consuming to be directly applied to the whole PTSD data set,
   therefore we created a data set small enough to be used directly that
   consisted of 200 features, with 100 features selected by CPCAFE and 100
   distinct features. If VBPCAFE and CPCAFE are equivalent, the 100
   features selected by CPCAFE should have larger
   [MATH:
   <msubsup><mrow><mi>C</mi></mrow><mrow><mi>B</mi></mrow><mrow><mi>i</mi>
   <mn>1</mn></mrow></msubsup> :MATH]
   s than those attributed to the other 100 features (see [168]Methods).
   Feature frequencies selected from the 100 top-ranked probes with larger
   [MATH:
   <msubsup><mrow><mi>C</mi></mrow><mrow><mi>B</mi></mrow><mrow><mi>i</mi>
   <mn>1</mn></mrow></msubsup> :MATH]
   values after investigation of 100 independent ensembles are shown
   (Figure [169]11). Scatterplots between
   [MATH:
   <msubsup><mrow><mi>C</mi></mrow><mrow><mi>B</mi></mrow><mrow><mi>i</mi>
   <mn>1</mn></mrow></msubsup> :MATH]
   and B [i1] are also shown (Figure [170]12). As expected (from eq.
   ([171]1) in [172]Methods),
   [MATH:
   <msubsup><mrow><mi>C</mi></mrow><mrow><mi>B</mi></mrow><mrow><mi>i</mi>
   <mn>1</mn></mrow></msubsup> :MATH]
   is quadratically dependent upon B [i1], definitely demonstrating
   equivalence between CPCAFE and VBPCAFE when applied to a real data set.
   Of course, there is still the possibility that VBPCAFE applied to the
   whole PTSD data set would select a completely different set of features
   from the 100 features selected by CPCAFE. However, we believe it is
   unlikely as there are almost no features not selected by CPCAFE, within
   the 100 top-ranked in any of the 100 ensembles. Thus, CPCAFE and
   VBPCAFE show equivalence when applied to not only simulated data, but
   also a real data set.

Figure 11.

   Figure 11
   [173]Open in a new tab

   Frequency of 100 probes selected by CPCAFE and also identified by
   VBPCAFE within the 100 top-ranked probes with largest
   [MATH:
   <msubsup><mrow><mi>C</mi></mrow><mrow><mi>B</mi></mrow><mrow><mi>i</mi>
   <mn>1</mn></mrow></msubsup> :MATH]
   values over 100 independent ensembles.(a) miRNA; and (b) mRNA. Red and
   black circles correspond to probes selected or not selected,
   respectively, by CPCAFE.

Figure 12.

   Figure 12
   [174]Open in a new tab

   Scatterplots between B [i1] (horizontal axis) and
   [MATH:
   <msubsup><mrow><mi>C</mi></mrow><mrow><mi>B</mi></mrow><mrow><mi>i</mi>
   <mn>1</mn></mrow></msubsup> :MATH]
   (vertical axis). Red and black open circles correspond to 100 features
   selected or not selected, respectively, by CPCAFE. Solid green lines
   indicate quadratic regression lines. (a) miRNA; and (b) mRNA.

   Next, we examined CPCAFE stability, i.e., sample independence, by
   performing a 4-fold cross-validation study (see [175]Methods). We
   specifically used a 4-fold cross-validation as there are only four
   biological replicates in each experimental condition, therefore using
   any other cross-validation would not be straightforward. The results
   from 100 independent ensembles are shown (see Additional file [176]9).
   For mRNAs, 78 probes were always selected by CPCAFE and only 10 probes
   not always selected, demonstrating high sample independence. Among the
   78 probes always selected, 53 had associated RefSeq IDs that were part
   of the 59 RefSeq IDs selected by applying CPCAFE to the whole data set.
   For miRNAs, 27 probes were always selected and no probe not always
   selected (see Additional file [177]9). Among the 27 selected mature
   miRNAs, 25 were part of the 27 mature miRNAs selected by CPCAFE
   previously. Thus, CPCAFE selected almost all features 100% of the time,
   demonstrating stability and suggesting that performance is not likely
   due to the selected data set. As CPCAFE and VBPCAFE show potential
   equivalence, it is likely that VBPCAFE also shares this stability.

   Third, we determined if other conventional supervised, and thus
   possibly sample dependent, FE methods can achieve a comparable
   performance to CPCAFE. As there are as many as 12 experimental setups
   without any pre-defined rankings or orderings, it is not
   straightforward to use a popular multiclass oriented FE with regression
   analyses, e.g., lasso [[178]1]. Not only is there no way to attribute
   labels with actual values to each experimental setup, but it is also
   not known if it is possible, in principal, to align these 12
   experimental conditions in one dimensional order. Thus, the number of
   usable supervised FE methods that can be tested on the present PTSD
   data set is limited. Nevertheless, we identified and examined two
   methods. The first was regression analysis, which attributes a 0 or 1
   to each experimental setup with a linear combination assumed to
   represent mRNA/miRNA expression (categorical regression-based FE, see
   [179]Methods). Each feature is then ranked by significance (small
   P-values are attributed to regression), and the 100 top-ranked features
   selected. Using this method, we identified 100 mRNA/miRNA probes and
   investigated biological feasibility and stability of selected ones. For
   mRNAs, among the 100 selected probes, only 23 had associated RefSeq IDs
   (see Additional file [180]4), a smaller number than CPCAFE, suggesting
   that CPCAFE more readily identifies biologically important genes (with
   the underlying assumption that biologically important genes are more
   likely to have RefSeq IDs). Disease association of the selected genes
   was also poorer than those selected by CPCAFE (Table [181]8). Only 4/23
   genes (Ttn, Ubr2, Gata5, and Hs2st1) were associated with heart
   failure-related diseases, in contrast with the 26 heart disease
   associated genes (out of 59 genes) selected by CPCAFE. Even extending
   the target species to human, only one additional gene (Cxc11) was
   identified. Altogether, these results show that CPCAFE has greater
   power for identifying biologically feasible genes. For miRNAs, among
   the 100 selected probes, 77 were identified with unique mature miRNA
   names (see Additional file [182]4), a much larger number than CPCAFE.
   Again, this demonstrates the biological feasibility of CPCAFE, as each
   miRNA is attributed to multiple probes and should therefore have been
   selected simultaneously. Thus with miRNAs, a smaller number of
   identified unique mature miRNA names reflects more plausible FE. The
   identified miRNAs were uploaded to DIANA-mirpath for KEGG pathway
   analysis (see Additional file [183]3), and 74 KEGG pathways identified
   (66 with CPCAFE), although the number of miRNAs related to each pathway
   is much smaller: using CPCAFE, on average seven miRNAs contributed to
   each pathway, while with categorical regression-based FE, only four
   miRNAs contributed, despite almost three times larger (77 vs 27) miRNAs
   uploaded (P = 1.67×10^−17, difference between mean values computed by t
   test). This suggests that categorical regression-based FE is less able
   to identify biologically important miRNAs than CPCAFE. We also
   determined if miRNA target mRNAs were negatively correlated with miRNA
   expression, and identified 180 mRNA-miRNA pairs but with only 35 pairs
   negatively correlated (see Additional file [184]4). This again
   contrasts with CPCAFE, with almost all mRNA-miRNA pairs negatively
   correlated. Correlation coefficient significance was examined, and only
   20 significantly correlated pairs were identified. Since all
   significant correlation coefficients were negative, categorical
   correlation-based FE can extract correct features, but is less able to
   confidently screen more features than CPCAFE; more than half of the
   pairs identified by CPCAFE were associated with significant negative
   correlations. This is consistent with the limited number of miRNAs
   selected by categorical regression-based FE contributing to KEGG
   pathway analysis. Stability was also compared with CPCAFE (see
   Additional file [185]9). For mRNAs, only 24 probes were always selected
   among 100 cross-validation ensembles, with 122 probes selected only
   once. For miRNAs, only eight probes were always selected (see
   Additional file [186]9). The next highest frequencies selected were 93
   and 95, corresponding to only two probes each. Conversely, 33 and 29
   probes were selected only once and twice, respectively. Thus, with
   regards to stability, CPCAFE outperforms FE with categorical regression
   analysis.

Table 8.

   Disease association of mRNAs selected by categorical regression-based
   FE and BAHSIC
   RefSeq mRNA Gene Heart disease association ( P -values: mouse, human)
   Categorical regression-based FE
   [187]NM_011652 Ttn Cardiomyopathy, dilated (5.49×10^−7, 1.80×10^−10);
   Cardiomyopathies (1.5×10^−4, 9.19×10^−5); Cardiomyopathy, hypertrophic
   (5.6×10^−3, 4.74×10^−10); Heart defects, congenital (1.90×10^−2, —);
   Heart failure (—, 1.02×10^−5); Cardiomyopathy, hypertrophic, familial
   (—, 1.02×10^−5)
   [188]NM_019494 Cxcl11 Cardiomyopathy, hypertrophic (—, 3.05×10^−2)
   [189]NM_001177374 Ubr2 Heart Defects, congenital (8.64×10^−4, —);
   Cardiomegaly (1.46×10^−3, —)
   [190]NM_008093 Gata5 Heart defects, congenital (1.40×10^−7, —); Heart
   septal defects (4.12×10^−3, —); Hypertrophy, right ventricular
   (1.36×10^−4, —); Cardiovascular abnormalities (1.49×10^−3, —);
   Cardiomyopathy, dilated (8.14×10^−3, —); Cardiomyopathies (9.27×10^−3,
   —); Cardiomegaly (1.75×10^−2, 7.60×10^−3)
   [191]NM_011828 Hs2st1 Cardiomyopathies (3.87×10^−3, —);
   BAHSIC
   [192]NM_009722 Atp2a2 Cardiomegaly (7×10^−21, 1.33×10^−3);
   Cardiomyopathy, dilated (2.68^−12, 7.37^−9); Cardiomyopathies
   (5.87×10^−12, 9.69 × 10^−4); Heart Failure (6.50 × 10^−12, —); Cardiac
   Output, low (2.01 × 10^−7, —); Arrhythmias, cardiac (8.85 ×10^−7, —);
   Hypertrophy, left ventricular (3.03×10^−6, 3.38×10^−2); Myocardial
   reperfusion injury (2.33×10^−5, —) Myocardial stunning (3.42×10^−3, —);
   Atrial fibrillation (5.42×10^−3, —); Myocardial infarction (6.11×10^−3,
   —); Heart disease (2.38×10^−2, 7.00×10^−4); Ventricular dysfunction,
   left (2.74×10^−2, 3.4×10^−4); Cardiomyopathy, restrictive (—,
   2.12×10^−3); Myocardial ischemia (—, 2.91×10^−3); Mitral valve
   insufficiency (—, 3.68×10^−3); Cardiomyopathy, hypertrophic (—,
   3.49×10^−2)
   [193]NM_001164171 Myh6 (Already identified by CPCAFE)
   [194]NM_008725 Nppa Heart Defects, congenital (2.53×10^−55,
   9.09×10^−9); Cardiomegaly (1.11×10^−28, —); Cardiomyopathies
   (7,57×10^−17, —); Hypertrophy, left ventricular (9.16×10^−12,
   1.19×10^−13); Cardiomyopathy, dilated (6.47×10^−11, 2.41×10^−8); Heart
   disease (9,72×10^−8, 1.11×10^−4); Endocardial cushion defects
   (3.51×10^−6, —); Heart septal defects (4.63×10^−6, —); Cardiovascular
   abnormalities (6.34×10^−5, —); Tachycardia, ectopic atrial (1.97×10^−4,
   —); Arrhythmias, cardiac (4.35×10^−4, —); Cardiomyopathy, hypertrophic,
   familial (4.90×10^−3, —); Heart septal defects, ventricular
   (5.70×10^−3, —); Hypertrophy, right ventricular (1.04×10^−2, —); Heart
   failure (1.04×10^−2, 5.07×10^−29); Cardiomyopathy, hypertrophic
   (2.20×10^−2, 4.40×10^−2) Ventricular dysfunction, left (—,
   3.04×10^−10); Myocardial reperfusion injury (— 1.76×10^−7);
   Cardiovascular disease (—, 4.21×10^−6); Myocardial infarction (—,
   9.73×10^−6); Mitral valve insufficiency (—, 1.04×10^−5); Heart valve
   disease (—, 3.27×10^−5); Myocardial ischemia (—, 1.49×10^−4);
   Ventricular dysfunction (—,.198×10^−3); Cardiomyopathy, restrictive (—,
   2.68×10^−2); Ventricular dysfunction, right (—, 3.53×10^−3)
   [195]NM_008103 Gcm1 Heart defects, congenital (6×10^−3, —)
   [196]NM_009608 Actc1 Heart defects, congenital (2.54×10^−38, —);
   Cardiomyopathy, dilated (6.55×10^−9, 2.16×10^−14); Heart septal defects
   (5.15×10^−7, 6.80×10^−4); Arrhythmias, cardiac (4.90×10^−5, —);
   Cardiomyopathies (2.75×10^−4, 1.23×10^−2); Heart septal defects, atrial
   (1.33×10^−3); Cardiovascular abnormalities (3.85×10^−2, —);
   Cardiomegaly (4.45×10^−2, 1.01×10^−4); Cardiomyopathy, hypertrophic (—,
   4.91×10^−13); Cardiomyopathy, hypertrophic, familial (—, 1.97×10^−6);
   Cardiomyopathy, restrictive (—, 5.87×10^−4); Heart septal defects,
   atrial (—, 1.73×10^−3); Hypertrophy, left ventricular (—, 1.09×10^−2)
   [197]NM_013468 Ankrd1 Heart defects, congenital (3.98×10^−12,
   4.70×10^−5); Cardiomegaly (2.52×10^−6, 1.00×10^−2); Cardiomyopathies
   (9.11×10^−5, —); Heart septal defects (6.18×10^−4, —); Cardiomyopathy,
   hypertrophic, familial (9.66×10^−4, —); Cardiomyopathy, dilated
   (1.22×10^−2, 1.22×10^−2); Heart failure (—, 3.04×10^−4); Hypertrophy,
   left ventricular (—, 7.55×10^−3)
   [198]NM_011540 Tcap (Already identified by CPCAFE)
   [199]NM_177369 Myh8 (Already identified by CPCAFE)
   [200]NM_007450 Slc25a4 (Already identified by CPCAFE)
   [201]NM_010174 Fabp3 (Already identified by CPCAFE)
   [202]NM_008084 Gapdh (Already identified by CPCAFE)
   [203]NM_019494 Cxcl11 Already identified by categorical regression
   based FE
   [204]NM_013463 Gla Cardiomyopathy, hypertrophic (—, 2.46×10^−2); Heart
   disease (—, 2.64×10^−2); Cardiomyopathies (—, 3.10×10^−2)
   [205]NM_001038592 Glrx2 Myocardial reperfusion injury (—, 1.56×10^−3)
   [206]Open in a new tab

   P-values obtained from the Gendoo server for both mouse and human.
   Genes annotated as “Already identified by CPCAFE” are listed in Table
   [207]5.

   It is possible that CPCAFE outperforms FE with categorical regression
   analysis because it is too simple (naive) an alternative. Therefore to
   compare CPCAFE performance with a more sophisticated method, we used
   BAHSIC (see [208]Methods).

   For mRNAs, among the 100 selected probes, only 37 had RefSeq IDs, again
   less than CPCAFE (see Additional file [209]4). Of these 37 mRNAs, only
   16 were associated with heart failure-related disease (Table [210]8),
   even when the target species was extended to human. Thus, 43% of
   extracted RefSeq associated mRNAs are related to heart disease in the
   Gendoo server. This is similar to the 45% reported for genes selected
   by CPCAFE. Considering that the total number of selected RefSeq
   associated mRNAs is larger, this suggests that CPCAFE outperforms
   BAHSIC as well as categorical regression based FE. Interestingly, 15/37
   mRNAs (not always including the 16 disease associated genes) were also
   selected by CPCAFE, despite use of distinct algorithms by BAHSIC and
   CPCAFE. Nine genes selected by CPCAFE (in addition to the six genes
   listed in Table [211]8 or Additional file [212]5), but with missing
   disease association were Synrg, Milr1, Exosc2, Medag, 2610028H24Rik,
   pbld1, Hbb-bt, 1700047I17Rik2, and Hba-a2. Thus, mRNAs extracted by
   CPCAFE and BAHSIC overlap significantly. Furthermore, if we also
   consider PC2 outliers (see Additional file [213]2) that were excluded
   in the previous analyses, an additional 15 genes (Rhox8, Kctd14, Ttc38,
   G l r x2, Ndor1, Dcc, E l k1, G c m1, Hccs, Nppa, A c t c1, Pigr,
   Slc10a1, C x c l11, and Med16) have been identified by CPCAFE
   (Bolditalic six genes are also associated with heart failure-related
   disease, see Additional file [214]2). Consequently, there is a
   substantial increase in overlap significance between mRNAs extracted by
   CPCAFE and BAHSIC. Figure [215]13 shows the relationship between
   CPCAFE, BAHSIC, and heart failure-related disease association. The many
   genes identified by both methods and associated with disease show that
   our method (i.e., treating expression data as a categorical multiclass
   data set and not a collection of pairwise comparisons) can identify
   biologically relevant genes.

Figure 13.

   Figure 13
   [216]Open in a new tab

   Venn diagram of mRNAs extracted by CPCAFE and BAHSIC, and heart
   failure-related disease association. Numbers in parentheses correspond
   to those with outliers along PC2 (see Additional file [217]2).

   For miRNAs, among the 100 probes, 47 unique mature miRNAs were selected
   (see Additional file [218]4), a much larger number than CPCAFE (27).
   The 47 miRNAs were uploaded to DIANA-mirpath (see Additional file
   [219]4), and 65 KEGG pathways identified as significant. Although this
   number is similar to that obtained by CPCAFE (66), considering that the
   number of uploaded miRNAs was almost twice (47 vs 27), CPCAFE is still
   superior to BAHSIC. In fact, the averaged number of miRNAs with target
   mRNAs enriched in each pathway, was approximately seven, similar to
   that obtained by CPCAFE despite two-times more uploaded miRNAs. We also
   compared biological significance of the KEGG pathways obtained by
   BAHSIC and CPCAFE. In contrast to 17 reported pathways related to
   heart-failure related disease (from the top most significant 21 KEGG
   pathways) using CPCAFE (Table [220]3), only 11 identified by BASHSIC
   (of which, seven were also identified by CPCAFE; Table [221]3) were
   related to heart-failure related disease by literature searching (Table
   [222]9). Thus, from a biological point of view, CPCAFE outperforms
   BAHSIC. We also determined if targeted mRNAs negatively correlate with
   miRNA expression, and identified 164 mRNA-miRNA pairs, although only 73
   were negatively correlated (see Additional file [223]4). Conversely,
   with CPCAFE, almost all pairs are negatively correlated. Correlation
   coefficient significance was examined, with only 33 significantly
   correlated pairs identified. Since all significant correlation
   coefficients were negative, this shows that BAHSIC cannot correctly
   extract features, and is less able to confidently screen more features
   than CPCAFE. Again this coincides with the limited number of miRNAs
   selected by BAHSIC that contribute to KEGG pathway analysis.

Table 9.

   Summary of studies with association between heart disease and KEGG
   pathways enriched by BAHSIC identified miRNA target genes
   KEGG pathway Enrichment P -value Ref. Description
   Long-term depression 5.37×10^−19 [[224]71] Depression has been linked
   with additional health problems, including heart disease
   Prion diseases 3.13×10^−17 [[225]72] Prion-induced amyloid heart
   disease with high blood infectivity in transgenic mice
   Axon guidance 5.99×10^−14 (Already identified by CPCAFE)
   Fatty acid biosynthesis 2.01×10^−12 [[226]73] A relationship between
   fatty acid synthase andcardiac calcium signaling has been suggested
   Calcium signaling pathway 2.04×10^−10 (Already identified by CPCAFE)
   Gap junction 5.03×10^−10 [[227]74] Gap junction alterations in human
   cardiac disease
   Prostate cancer 6.04×10^−10 (Already identified by CPCAFE)
   Pancreatic cancer 1.63×10^−9 — —
   Glutamatergic synapse 2.02×10^−8 — —
   Endometrial cancer 2.02×10^−8 (Already identified by CPCAFE)
   Long-term potentiation 2.65×10^−8 — —
   Neurotrophin signaling pathway 2.81×10^−8 — —
   Focal adhesion 8.34×10^−8 (Already identified by CPCAFE)
   TGF-beta signaling pathway 2.00×10^−7 (Already identified by CPCAFE)
   Endocytosis 2.27×10^−7 — —
   Cholinergic synapse 9.60×10^−7 — —
   Regulation of actin cytoskeleton 1.33×10^−6 — —
   Colorectal cancer 3.52×10^−6 (Already identified by CPCAFE)
   Salmonella infection 5.65×10^−6 — —
   PI3K-Akt signaling pathway 1.01×10^−5 — —
   [228]Open in a new tab

   Pathways annotated as “Already identified as CPCAFE” are listed in
   Table [229]3.

   BAHSIC stability was compared (see Additional file [230]9). For mRNAs,
   only one probe was selected only 14 times among the 100 independent
   ensembles, and was also the most frequently selected. The second most
   frequent four probes were selected only 13 times, and 2133 probes were
   selected only once. For miRNAs, 68 probes were always selected among
   100 cross-validation ensembles (see Additional file [231]9). The second
   most frequent probes were selected 99 times, but this only included
   three probes. The third, fourth, and fifth frequent probes were
   selected 98, 97, and 96 times, but included only 2, 3 and 1 probes,
   respectively. Thus, with stability, CPCAFE definitely outperforms
   BAHSIC.

   In conclusion, CPCAFE outperformed two conventional supervised FE
   methods on both stability and biological feasibility, and performed
   equally well to VBPCAFE, which has more theoretical confidence. Thus,
   CPCAFE and VBPCAFE are the most suitable FE methods for categorical
   multiclass problems.

   We have included diagrams summarizing the discussed points (Figure
   [232]14). Figure [233]14(a) shows the number of unique miRNAs. The same
   100 probes were extracted using all methods, and miRNAs assigned to
   multiple probes. As all probes to which the same miRNAs are assigned
   should be extracted at the same time, smaller numbers of unique miRNAs
   assigned to 100 probes shows better extraction. In this context, CPCAFE
   outperformed the other two methods (Venn diagram is also available in
   Figure [234]15). Figure [235]14(b) shows the number of mRNAs with
   RefSeq IDs. As mRNAs with RefSeq IDs are more likely to have known
   biological significance, a larger number of RefSeq IDs accompanying
   mRNAs shows selection of more biologically relevant mRNAs. In this
   context, CPCAFE outperformed the other two methods. Figure [236]14(c)
   shows the number of RefSeq miRNA genes related to heart failure-related
   disease. Larger numbers suggest that FE has correctly extracted mRNAs
   related to the target disease, therefore CPCAFE outperformed the other
   two methods. Figure [237]14(d) shows the number of KEGG pathways
   identified by uploading the miRNAs shown in Figure [238]14(a) to
   DIANA-mirpath. Although categorical regression based FE outperformed
   the other two methods, it may have incorrectly extracted the most
   unique miRNAs in Figure [239]14(a), therefore increased number of KEGG
   pathways does not always reflect method superiority. Indeed, counting
   the average number of miRNAs contributing to each pathway (Figure
   [240]14(e)), indicates that CPCAFE and BAHSIC are comparable while
   categorical regression based FE performs worse. This suggests that
   categorical regression based FE cannot identify as many KEGG pathways
   as CPCAFE or BAHSIC if the same number of unique miRNAs are extracted
   (albeit experimentally). Thus, it is reasonable to assume that CPCAFE
   and BAHSIC outperformed categorical regression based FE in this
   context. Figure [241]14(f) shows the possible number of mRNA/miRNAs
   pairs, i.e., product of the unique miRNAs (Figure [242]14(a)) and
   RefSeq accompanying mRNAs (Figure [243]14(b)). Although these numbers
   are comparable among the three FE methods, the number of miRNA and
   targeted mRNA pairs is distinct (Figure [244]14(g)). Considering the
   ratio (Figure [245]14(h)) (obtained by dividing the numbers in Fig.
   [246]14(g) by those in Figure [247]14(f)), CPCAFE appears worst (i.e.,
   smallest), but this is reversed if significance is considered. Figure
   [248]14(i) shows the number of miRNA and targeted mRNA pairs with
   significant correlations. Converting these numbers to a ratio (Figure
   [249]14(j)) by dividing the number of pairs of miRNAs and targeted
   mRNAs (Figure [250]14(g)), CPCAFE outperformed the other two methods.
   The same result is obtained when considering negative correlations
   (Figures [251]14(k) and (l)), which are desirable as miRNAs suppress
   target mRNA expression. Overall, these results (from Figure
   [252]14(g)-(l)) suggest that CPCAFE extracted significantly more miRNA
   and targeted mRNA pairs than the other two methods, which had
   apparently extracted more. In conclusion, based on our summarized
   discussions, we conclude that CPCAFE outperforms two other FE methods.

Figure 14.

   Figure 14
   [253]Open in a new tab

   Barplots comparing CPCAFE, categorical regression based FE, and BAHSIC.
   See text for details about panels included, from (a) to (l).

Figure 15.

   Figure 15
   [254]Open in a new tab

   Venn diagram of miRNAs extracted by CPCAFE and BAHSIC.

Conclusions

   We have extended VBPCA for FE. Although VBPCAFE is an effective method
   when applied to simulated data, feature-dependent extension inevitably
   makes it computationally challenging. Thus, we replaced VBPCA with the
   simpler CPCA, and achieved reasonable FE performance on simulated data.
   In order to demonstrate CPCAFE effectiveness on real data, we performed
   an integrated analysis of mRNA and miRNA expression from stressed mouse
   heart, investigating the underlying molecular biology and
   transcriptomic background of PTSD-mediated heart disease. CPCAFE
   successfully identified aberrantly expressed miRNAs and their
   negatively regulated target mRNAs. Biological significance of the
   identified miRNAs and mRNAs was confirmed. Equivalence between CPCAFE
   and VBPCAFE was demonstrated by applying both methods to the PTSD data
   set. Two conventional supervised FE methods were also tested, with
   CPCAFE outperforming them both. In silico drug discovery was performed
   for a selected gene, FABP3, with the top-ranked compounds identified
   including protein inhibitors reportedly upregulated in heart disease.

Methods

Simulations based on synthetic data of categorical multiclass samples

   To simulate multiclass samples with N features and M samples,
   expression values were modeled using Gaussian distributions with μ [k]
   mean (k class). Standard deviations were fixed at 0.5. μ [k] was
   [MATH:
   <mrow><msub><mrow><mi>μ</mi></mrow><mrow><mi>k</mi></mrow></msub><mo>=<
   /mo><mfenced close=")" open="("
   separators=""><mrow><mfrac><mrow><mi>k</mi><mo>−</mo><mn>1</mn></mrow><
   mrow><mi>K</mi><mo>−</mo><mn>1</mn></mrow></mfrac><mo>−</mo><mfrac><mro
   w><mn>1</mn></mrow><mrow><mn>2</mn></mrow></mfrac></mrow></mfenced><mi>
   s</mi><mo>,</mo><mi>k</mi><mo>=</mo><mn>1</mn><mo>,</mo><mo>…</mo><mo>,
   </mo><mi>K</mi><mo>,</mo></mrow> :MATH]

   with K indicating class number. Considering x [ij] to be an expression
   of the ith feature of the jth sample, it obeys:
   [MATH: <mrow><msub><mrow><mi>x</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow></msub><mo>∈</mo><mspace
   width="0.3em"></mspace><mfenced close="" open="{"
   separators=""><mrow><mspace width="0.3em"></mspace><mtable
   class="array" columnalign="left"><mtr><mtd><mi
   mathvariant="script">N</mi><mfenced close=")" open="("
   separators=""><mrow><msub><mrow><mi>μ</mi></mrow><mrow><mi>k</mi></mrow
   ></msub><mo>,</mo><mfrac><mrow><mn>1</mn></mrow><mrow><mn>2</mn></mrow>
   </mfrac></mrow></mfenced><mo>,</mo></mtd><mtd><mn>1</mn><mo>≤</mo><mi>i
   </mi><mo>≤</mo><mfrac><mrow><msup><mrow><mi>N</mi></mrow><mrow><mo>′</m
   o></mrow></msup></mrow><mrow><mn>2</mn></mrow></mfrac><mo>,</mo></mtd><
   mtd><mo>(</mo><mi>k</mi><mo>−</mo><mn>1</mn><mo>)</mo><mfrac><mrow><mi>
   M</mi></mrow><mrow><mi>K</mi></mrow></mfrac><mo><</mo><mi>j</mi><mo>≤</
   mo><mi>k</mi><mfrac><mrow><mi>M</mi></mrow><mrow><mi>K</mi></mrow></mfr
   ac></mtd></mtr><mtr><mtd><mi mathvariant="script">N</mi><mfenced
   close=")" open="("
   separators=""><mrow><mo>−</mo><msub><mrow><mi>μ</mi></mrow><mrow><mi>k<
   /mi></mrow></msub><mo>,</mo><mfrac><mrow><mn>1</mn></mrow><mrow><mn>2</
   mn></mrow></mfrac></mrow></mfenced><mo>,</mo></mtd><mtd><mfrac><mrow><m
   sup><mrow><mi>N</mi></mrow><mrow><mo>′</mo></mrow></msup></mrow><mrow><
   mn>2</mn></mrow></mfrac><mo><</mo><mi>i</mi><mo>≤</mo><msup><mrow><mi>N
   </mi></mrow><mrow><mo>′</mo></mrow></msup><mo>,</mo></mtd><mtd><mo>(</m
   o><mi>k</mi><mo>−</mo><mn>1</mn><mo>)</mo><mfrac><mrow><mi>M</mi></mrow
   ><mrow><mi>K</mi></mrow></mfrac><mo><</mo><mi>j</mi><mo>≤</mo><mi>k</mi
   ><mfrac><mrow><mi>M</mi></mrow><mrow><mi>K</mi></mrow></mfrac></mtd></m
   tr><mtr><mtd><mi mathvariant="script">N</mi><mfenced close=")" open="("
   separators=""><mrow><mi>ε</mi><mo>,</mo><mfrac><mrow><mn>1</mn></mrow><
   mrow><mn>2</mn></mrow></mfrac></mrow></mfenced><mo>,</mo></mtd><mtd><ms
   up><mrow><mi>N</mi></mrow><mrow><mo>′</mo></mrow></msup><mo><</mo><mi>i
   </mi><mo>≤</mo><mi>N</mi></mtd></mtr></mtable></mrow></mfenced><mo>,</m
   o></mrow> :MATH]

   where
   [MATH: <mi
   mathvariant="script">N</mi><mo>(</mo><mi>μ</mi><mo>,</mo><mi>σ</mi><mo>
   )</mo> :MATH]
   is the normal distribution with a mean of μ and standard deviation of
   σ. The reason for including positive and negative μ [k] signs was to
   simulate coexistence of up/downregulated genes among multiple classes.
   The s parameter represents difficulty of distinguishing among multiple
   classes. Larger (smaller) s indicates easier (harder) samples with
   fixed K. ε is a random number satisfying the probability, P:
   [MATH:
   <mrow><mi>P</mi><mo>(</mo><mi>ε</mi><mo>=</mo><msub><mrow><mi>μ</mi></m
   row><mrow><mi>k</mi></mrow></msub><mo>)</mo><mo>=</mo><mfrac><mrow><mn>
   1</mn></mrow><mrow><mi>K</mi></mrow></mfrac></mrow> :MATH]

   In the present study, N=100,N ^′=10,K=4, and M=20 were used, and
   performances averaged over 100 independent ensembles.

One vs one t test based FE of a simulated categorical multiclass data set

   P values,
   [MATH:
   <msubsup><mrow><mi>P</mi></mrow><mrow><mi>i</mi></mrow><mrow><mi>k</mi>
   <mo>,</mo><msup><mrow><mi>k</mi></mrow><mrow><mo>′</mo></mrow></msup></
   mrow></msubsup> :MATH]
   , attributed to each i were computed using t tests to compare between
   [MATH: <mfenced close="}" open="{"
   separators=""><mrow><msub><mrow><mi>x</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow></msub><mo>;</mo><mo>(</mo><mi>k<
   /mi><mo>−</mo><mn>1</mn><mo>)</mo><mfrac><mrow><mi>M</mi></mrow><mrow><
   mi>K</mi></mrow></mfrac><mo><</mo><mi>j</mi><mo>≤</mo><mi>k</mi><mfrac>
   <mrow><mi>M</mi></mrow><mrow><mi>K</mi></mrow></mfrac></mrow></mfenced>
   :MATH]
   and
   [MATH: <mfenced close="}" open="{"
   separators=""><mrow><msub><mrow><mi>x</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow></msub><mo>;</mo><mo>(</mo><msup>
   <mrow><mi>k</mi></mrow><mrow><mo>′</mo></mrow></msup><mo>−</mo><mn>1</m
   n><mo>)</mo><mfrac><mrow><mi>M</mi></mrow><mrow><mi>K</mi></mrow></mfra
   c><mo><</mo><mi>j</mi><mo>≤</mo><msup><mrow><mi>k</mi></mrow><mrow><mo>
   ′</mo></mrow></msup><mfrac><mrow><mi>M</mi></mrow><mrow><mi>K</mi></mro
   w></mfrac></mrow></mfenced> :MATH]
   , for all K(K−1)/2 pairs of k and k ^′.
   [MATH:
   <msubsup><mrow><mi>P</mi></mrow><mrow><mi>i</mi></mrow><mrow><mi>k</mi>
   <mo>,</mo><msup><mrow><mi>k</mi></mrow><mrow><mo>′</mo></mrow></msup></
   mrow></msubsup> :MATH]
   was adjusted by BH criterion [[255]19], with each k,k ^′ pair and the
   ith gene adjusted.
   [MATH:
   <msubsup><mrow><mi>P</mi></mrow><mrow><mi>i</mi></mrow><mrow><mi>k</mi>
   <mo>,</mo><msup><mrow><mi>k</mi></mrow><mrow><mo>′</mo></mrow></msup></
   mrow></msubsup> :MATH]
   s <0.05 for all (k,k ^′) pairs were identified as distinct features
   between multiple classes.

Categorical regression-based FE

   Categorical regression-based FE was defined as follows, x [ij] reflects
   “expression” of the jth feature of the ith samples, therefore x [ij]
   can be represented as:
   [MATH: <mrow><msub><mrow><mi>x</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow></msub><mo>=</mo><msub><mrow><mi>
   C</mi></mrow><mrow><mi>i</mi><mn>0</mn></mrow></msub><mo>+</mo><munder>
   <mrow><mo
   mathsize="big">∑</mo></mrow><mrow><mi>k</mi></mrow></munder><msub><mrow
   ><mi>C</mi></mrow><mrow><mtext
   mathvariant="italic">ik</mtext></mrow></msub><msub><mrow><mi>δ</mi></mr
   ow><mrow><mtext
   mathvariant="italic">jk</mtext></mrow></msub><mo>,</mo></mrow> :MATH]

   with δ [jk] = 1 only when the jth sample belongs to the kth category,
   otherwise it is 0. Category summation was taken and C [ik]s are fitting
   parameters. Because independent variables are categorical, the above
   regression equation belongs to a category of equations often named as
   categorical regression. For each ith feature, P-values were computed
   using the lm function implemented in R[[256]20] (this can be easily
   performed if factors corresponding to experimental setups are used as
   independent variables in lm).

   FE was used for simulated categorical multiclass regression in two
   ways. The first selected features with BH criterion [[257]19] adjusted
   P-values <0.05. The second selected the top 10 features with smallest
   P-values. Finally, for the PTSD data set, the 100 top-ranked features
   with smallest P-values were selected as extracted features.

BAHSIC

   BAHSIC uses the Hilbert-Schmidt norm of the cross-covariance operator
   (HSIC), defined as follows:
   [MATH: <mtable class="eqnarray" columnalign="left center
   right"><mtr><mtd
   class="eqnarray-1"><mo>|</mo><mo>|</mo><msub><mrow><mi>C</mi></mrow><mr
   ow><mtext
   mathvariant="italic">jk</mtext></mrow></msub><mo>|</mo><msubsup><mrow><
   mo>|</mo></mrow><mrow><mtext
   mathvariant="italic">HS</mtext></mrow><mrow><mn>2</mn></mrow></msubsup>
   </mtd><mtd class="eqnarray-2"><mo>≡</mo></mtd><mtd
   class="eqnarray-3"><msub><mrow><mfenced close="〉" open="〈"
   separators=""><mrow><mfenced close=")" open="("
   separators=""><mrow><munder><mrow><mo>∑</mo></mrow><mrow><mi>i</mi></mr
   ow></munder><msub><mrow><mi>x</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow></msub><msub><mrow><mi>x</mi></mr
   ow><mrow><mi>i</mi><msup><mrow><mi>j</mi></mrow><mrow><mo>′</mo></mrow>
   </msup></mrow></msub></mrow></mfenced><msub><mrow><mi>δ</mi></mrow><mro
   w><mtext
   mathvariant="italic">jk</mtext></mrow></msub><msub><mrow><mi>δ</mi></mr
   ow><mrow><msup><mrow><mi>j</mi></mrow><mrow><mo>′</mo></mrow></msup><ms
   up><mrow><mi>k</mi></mrow><mrow><mo>′</mo></mrow></msup></mrow></msub><
   msub><mrow><mi>δ</mi></mrow><mrow><mi>k</mi><msup><mrow><mi>k</mi></mro
   w><mrow><mo>′</mo></mrow></msup></mrow></msub></mrow></mfenced></mrow><
   mrow><mi>j</mi><msup><mrow><mi>j</mi></mrow><mrow><mo>′</mo></mrow></ms
   up></mrow></msub></mtd><mtd><mtext></mtext></mtd></mtr><mtr><mtd
   class="eqnarray-1"></mtd><mtd class="eqnarray-2"><mo>+</mo></mtd><mtd
   class="eqnarray-3"><msub><mrow><mfenced close="〉" open="〈"
   separators=""><mrow><munder><mrow><mo>∑</mo></mrow><mrow><mi>i</mi></mr
   ow></munder><msub><mrow><mi>x</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow></msub><msub><mrow><mi>x</mi></mr
   ow><mrow><mi>i</mi><msup><mrow><mi>j</mi></mrow><mrow><mo>′</mo></mrow>
   </msup></mrow></msub></mrow></mfenced></mrow><mrow><mi>j</mi><msup><mro
   w><mi>j</mi></mrow><mrow><mo>′</mo></mrow></msup></mrow></msub><msub><m
   row><mfenced close="〉" open="〈"
   separators=""><mrow><msub><mrow><mi>δ</mi></mrow><mrow><mtext
   mathvariant="italic">jk</mtext></mrow></msub><msub><mrow><mi>δ</mi></mr
   ow><mrow><msup><mrow><mi>j</mi></mrow><mrow><mo>′</mo></mrow></msup><ms
   up><mrow><mi>k</mi></mrow><mrow><mo>′</mo></mrow></msup></mrow></msub><
   msub><mrow><mi>δ</mi></mrow><mrow><mi>k</mi><msup><mrow><mi>k</mi></mro
   w><mrow><mo>′</mo></mrow></msup></mrow></msub></mrow></mfenced></mrow><
   mrow><mi>j</mi><msup><mrow><mi>j</mi></mrow><mrow><mo>′</mo></mrow></ms
   up></mrow></msub></mtd><mtd><mtext></mtext></mtd></mtr><mtr><mtd
   class="eqnarray-1"></mtd><mtd class="eqnarray-2"><mo>−</mo></mtd><mtd
   class="eqnarray-3"><mn>2</mn><msub><mrow><mfenced close="〉" open="〈"
   separators=""><mrow><msub><mrow><mfenced close="〉" open="〈"
   separators=""><mrow><munder><mrow><mo>∑</mo></mrow><mrow><mi>i</mi></mr
   ow></munder><msub><mrow><mi>x</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow></msub><msub><mrow><mi>x</mi></mr
   ow><mrow><mi>i</mi><msup><mrow><mi>j</mi></mrow><mrow><mi>′′</mi></mrow
   ></msup></mrow></msub></mrow></mfenced></mrow><mrow><msup><mrow><mi>j</
   mi></mrow><mrow><mi>′′</mi></mrow></msup></mrow></msub><msub><mrow><mfe
   nced close="〉" open="〈"
   separators=""><mrow><msub><mrow><mi>δ</mi></mrow><mrow><msup><mrow><mi>
   j</mi></mrow><mrow><mo>′</mo></mrow></msup><mi>k</mi></mrow></msub><msu
   b><mrow><mi>δ</mi></mrow><mrow><msup><mrow><mi>j</mi></mrow><mrow><mi>′
   ′′</mi></mrow></msup><msup><mrow><mi>k</mi></mrow><mrow><mo>′</mo></mro
   w></msup></mrow></msub><msub><mrow><mi>δ</mi></mrow><mrow><mi>k</mi><ms
   up><mrow><mi>k</mi></mrow><mrow><mo>′</mo></mrow></msup></mrow></msub><
   /mrow></mfenced></mrow><mrow><msup><mrow><mi>j</mi></mrow><mrow><mi>′′′
   </mi></mrow></msup></mrow></msub></mrow></mfenced></mrow><mrow><mi>j</m
   i><msup><mrow><mi>j</mi></mrow><mrow><mo>′</mo></mrow></msup></mrow></m
   sub><mo>,</mo></mtd><mtd><mtext></mtext></mtd></mtr></mtable> :MATH]

   where 〈·〉[j] and
   [MATH:
   <msub><mrow><mo>〈</mo><mo>·</mo><mo>〉</mo></mrow><mrow><mi>j</mi><msup>
   <mrow><mi>j</mi></mrow><mrow><mo>′</mo></mrow></msup></mrow></msub>
   :MATH]
   are averaged over all j and j,j ^′ pairs, respectively, and
   [MATH:
   <msub><mrow><mi>δ</mi></mrow><mrow><mi>k</mi><msup><mrow><mi>k</mi></mr
   ow><mrow><mo>′</mo></mrow></msup></mrow></msub> :MATH]
   is cronecker’s delta. The reason why linear kernel was employed was
   because it was shown to achieve best performances when applied to
   microarray gene expression [[258]10]. In BAHSIC, features with smaller
   HSIC are iteratively discarded until the desired number of features
   remain. The number of features discarded at each step was 1 for the
   simulated categorical multiclass data set and 10% of remaining features
   for the PTSD data set.

   R code for this algorithm is available in Additional file [259]2.

Extended VBPCA

   Before extending VBPCA, conventional VBPCA is briefly explained.

   X={x [ij]} was modeled [[260]2,[261]3] as:
   [MATH:
   <mrow><mi>X</mi><mo>=</mo><mi>B</mi><msup><mrow><mi>A</mi></mrow><mrow>
   <mi>T</mi></mrow></msup><mo>+</mo><mi>E</mi><mo>,</mo></mrow> :MATH]

   where B and A are N×Q and M×Q matrices, respectively, and E is a N×M
   matrix obeying a Gaussian distribution with zero mean and standard
   deviation, σ [E]. The purpose of VBPCA is to obtain an optimal
   approximation using Q≪N,M.

   Thus, following conventional notation and after substituting X=V, the
   conditional probability is:
   [MATH:
   <mrow><mi>p</mi><mo>(</mo><mi>V</mi><mo>|</mo><mi>A</mi><mo>,</mo><mi>B
   </mi><mo>)</mo><mo>∝</mo><mo>exp</mo><mfenced close=")" open="("
   separators=""><mrow><mo>−</mo><mfrac><mrow><mn>1</mn></mrow><mrow><mund
   erover><mrow><mi>σ</mi></mrow><mrow><mi>E</mi></mrow><mrow><mn>2</mn></
   mrow></munderover></mrow></mfrac><mo>|</mo><mo>|</mo><mi>V</mi><mo>−</m
   o><mi>B</mi><msup><mrow><mi>A</mi></mrow><mrow><mi>T</mi></mrow></msup>
   <mo>|</mo><msubsup><mrow><mo>|</mo></mrow><mrow><mtext>Fro</mtext></mro
   w><mrow><mn>2</mn></mrow></msubsup></mrow></mfenced><mo>,</mo></mrow>
   :MATH]

   where ||·||[Fro] is the Frobenius norm matrix. In order to obtain
   optimal Q values, a prior distribution was assumed:
   [MATH: <mtable class="eqnarray" columnalign="left center
   right"><mtr><mtd
   class="eqnarray-1"><mi>P</mi><mo>(</mo><mi>A</mi><mo>)</mo></mtd><mtd
   class="eqnarray-2"><mo>∝</mo></mtd><mtd
   class="eqnarray-3"><mo>exp</mo><mfenced close="]" open="["
   separators=""><mrow><mo>−</mo><mfrac><mrow><mn>1</mn></mrow><mrow><mn>2
   </mn></mrow></mfrac><mtext>tr</mtext><mfenced close=")" open="("
   separators=""><mrow><mi>A</mi><munderover><mrow><mi>C</mi></mrow><mrow>
   <mi>A</mi></mrow><mrow><mo>−</mo><mn>1</mn></mrow></munderover><msup><m
   row><mi>A</mi></mrow><mrow><mi>T</mi></mrow></msup></mrow></mfenced></m
   row></mfenced></mtd><mtd><mtext></mtext></mtd></mtr><mtr><mtd
   class="eqnarray-1"><mi>P</mi><mo>(</mo><mi>B</mi><mo>)</mo></mtd><mtd
   class="eqnarray-2"><mo>∝</mo></mtd><mtd
   class="eqnarray-3"><mo>exp</mo><mfenced close="]" open="["
   separators=""><mrow><mo>−</mo><mfrac><mrow><mn>1</mn></mrow><mrow><mn>2
   </mn></mrow></mfrac><mtext>tr</mtext><mfenced close=")" open="("
   separators=""><mrow><mi>B</mi><munderover><mrow><mi>C</mi></mrow><mrow>
   <mi>B</mi></mrow><mrow><mo>−</mo><mn>1</mn></mrow></munderover><msup><m
   row><mi>B</mi></mrow><mrow><mi>T</mi></mrow></msup></mrow></mfenced></m
   row></mfenced><mo>,</mo></mtd><mtd><mtext></mtext></mtd></mtr></mtable>
   :MATH]

   where C [A] and C [B] are Q×Q diagonal positive matrices qth (q=1,…,Q)
   is a diagonal element of C [A] and C [B] that expresses the importance
   of the obtained qth principal component. Free energy was minimized as:
   [MATH: <mtable class="eqnarray" columnalign="left center
   right"><mtr><mtd class="eqnarray-1"><mi>F</mi></mtd><mtd
   class="eqnarray-2"><mspace width="0.3em"></mspace><mo>=</mo><mspace
   width="0.3em"></mspace></mtd><mtd
   class="eqnarray-3"><mfrac><mrow><mo>|</mo><mo>|</mo><mi>V</mi><mo>|</mo
   ><msubsup><mrow><mo>|</mo></mrow><mrow><mtext>Fro</mtext></mrow><mrow><
   mn>2</mn></mrow></msubsup></mrow><mrow><mn>2</mn><msubsup><mrow><mi>σ</
   mi></mrow><mrow><mi>E</mi></mrow><mrow><mn>2</mn></mrow></msubsup></mro
   w></mfrac><mo>+</mo><mfrac><mrow><mtext
   mathvariant="italic">NM</mtext></mrow><mrow><mn>2</mn></mrow></mfrac><m
   o>log</mo><munderover><mrow><mi>σ</mi></mrow><mrow><mi>E</mi></mrow><mr
   ow><mn>2</mn></mrow></munderover><mo>+</mo><mfrac><mrow><mi>M</mi></mro
   w><mrow><mn>2</mn></mrow></mfrac><mo>log</mo><mfrac><mrow><mo>|</mo><ms
   ub><mrow><mi>C</mi></mrow><mrow><mi>A</mi></mrow></msub><mo>|</mo></mro
   w><mrow><mo>|</mo><msub><mrow><mi>Σ</mi></mrow><mrow><mi>A</mi></mrow><
   /msub><mo>|</mo></mrow></mfrac><mo>+</mo><mfrac><mrow><mi>N</mi></mrow>
   <mrow><mn>2</mn></mrow></mfrac><mo>log</mo><mfrac><mrow><mo>|</mo><msub
   ><mrow><mi>C</mi></mrow><mrow><mi>B</mi></mrow></msub><mo>|</mo></mrow>
   <mrow><mo>|</mo><msub><mrow><mi>Σ</mi></mrow><mrow><mi>B</mi></mrow></m
   sub><mo>|</mo></mrow></mfrac></mtd><mtd><mtext></mtext></mtd></mtr><mtr
   ><mtd
   class="eqnarray-1"><msup><mrow><mo>(</mo><msubsup><mrow><mi>C</mi></mro
   w><mrow><mi>B</mi></mrow><mrow><mi>i</mi></mrow></msubsup><mo>)</mo></m
   row><mrow><mo>−</mo><mn>1</mn></mrow></msup></mtd><mtd
   class="eqnarray-2"><mo>+</mo></mtd><mtd
   class="eqnarray-3"><mfrac><mrow><mn>1</mn></mrow><mrow><mn>2</mn></mrow
   ></mfrac><mtext>tr</mtext><mfenced close="" open="{"
   separators=""><mrow><msubsup><mrow><mi>C</mi></mrow><mrow><mi>A</mi></m
   row><mrow><mo>−</mo><mn>1</mn></mrow></msubsup><mfenced close=")"
   open="("
   separators=""><mrow><msup><mrow><mi>Â</mi></mrow><mrow><mi>T</mi></mrow
   ></msup><mi>Â</mi><mo>+</mo><mi>M</mi><msub><mrow><mi>Σ</mi></mrow><mro
   w><mi>A</mi></mrow></msub></mrow></mfenced><mo>+</mo><msubsup><mrow><mi
   >C</mi></mrow><mrow><mi>B</mi></mrow><mrow><mo>−</mo><mn>1</mn></mrow><
   /msubsup><mfenced close=")" open="("
   separators=""><mrow><msup><mrow><mover
   accent="true"><mrow><mi>B</mi></mrow><mo>^</mo></mover></mrow><mrow><mi
   >T</mi></mrow></msup><mover
   accent="true"><mrow><mi>B</mi></mrow><mo>^</mo></mover><mo>+</mo><mi>N<
   /mi><msub><mrow><mi>Σ</mi></mrow><mrow><mi>B</mi></mrow></msub></mrow><
   /mfenced></mrow></mfenced></mtd><mtd><mtext></mtext></mtd></mtr><mtr><m
   td class="eqnarray-1"></mtd><mtd class="eqnarray-2"><mspace
   width="0.3em"></mspace><mo>+</mo><mspace
   width="0.3em"></mspace></mtd><mtd class="eqnarray-3"><mspace
   width="0.3em"></mspace><mfenced close="}" open=""
   separators=""><mrow><msubsup><mrow><mi>σ</mi></mrow><mrow><mi>E</mi></m
   row><mrow><mo>−</mo><mn>2</mn></mrow></msubsup><mfenced close=")"
   open="("
   separators=""><mrow><mo>−</mo><mn>2</mn><msup><mrow><mi>Â</mi></mrow><m
   row><mi>T</mi></mrow></msup><msup><mrow><mi>V</mi></mrow><mrow><mi>T</m
   i></mrow></msup><mover
   accent="true"><mrow><mi>B</mi></mrow><mo>^</mo></mover><mspace
   width="0.3em"></mspace><mo>+</mo><mfenced close=")" open="("
   separators=""><mrow><msup><mrow><mi>Â</mi></mrow><mrow><mi>T</mi></mrow
   ></msup><mi>Â</mi><mo>+</mo><mspace
   width="0.3em"></mspace><mi>M</mi><msub><mrow><mi>σ</mi></mrow><mrow><mi
   >A</mi></mrow></msub></mrow></mfenced><mfenced close=")" open="("
   separators=""><mrow><msup><mrow><mover
   accent="true"><mrow><mi>B</mi></mrow><mo>^</mo></mover></mrow><mrow><mi
   >T</mi></mrow></msup><mover
   accent="true"><mrow><mi>B</mi></mrow><mo>^</mo></mover><mspace
   width="0.3em"></mspace><mo>+</mo><mi>N</mi><msub><mrow><mi>Σ</mi></mrow
   ><mrow><mi>B</mi></mrow></msub></mrow></mfenced></mrow></mfenced></mrow
   ></mfenced></mtd><mtd><mtext></mtext></mtd></mtr><mtr><mtd
   class="eqnarray-1"></mtd><mtd class="eqnarray-2"><mspace
   width="0.3em"></mspace><mo>+</mo><mspace
   width="0.3em"></mspace></mtd><mtd
   class="eqnarray-3"><mtext>const.</mtext></mtd><mtd><mtext></mtext></mtd
   ></mtr></mtable> :MATH]

   where Σ [A] and Σ [B] are variance matrices of
   [MATH: <mi>Â</mi> :MATH]
   and
   [MATH: <mover accent="true"><mrow><mi>B</mi></mrow><mo>^</mo></mover>
   :MATH]
   , estimated from A and B by the variational method. Locally optimal
   [MATH: <mi>Â</mi><mo>,</mo><mover
   accent="true"><mrow><mi>B</mi></mrow><mo>^</mo></mover><mo>,</mo><msub>
   <mrow><mi>Σ</mi></mrow><mrow><mi>A</mi></mrow></msub><mo>,</mo><msub><m
   row><mi>Σ</mi></mrow><mrow><mi>B</mi></mrow></msub><mo>,</mo><msub><mro
   w><mi>C</mi></mrow><mrow><mi>B</mi></mrow></msub> :MATH]
   , and σ [E] were obtained by performing the following iterative updates
   in this order:
   [MATH: <mtable class="eqnarray" columnalign="left center
   right"><mtr><mtd
   class="eqnarray-1"><msub><mrow><mi>Σ</mi></mrow><mrow><mi>A</mi></mrow>
   </msub></mtd><mtd class="eqnarray-2"><mo>←</mo></mtd><mtd
   class="eqnarray-3"><msubsup><mrow><mi>σ</mi></mrow><mrow><mi>E</mi></mr
   ow><mrow><mn>2</mn></mrow></msubsup><msup><mrow><mfenced close=")"
   open="(" separators=""><mrow><msup><mrow><mover
   accent="true"><mrow><mi>B</mi></mrow><mo>^</mo></mover></mrow><mrow><mi
   >T</mi></mrow></msup><mover
   accent="true"><mrow><mi>B</mi></mrow><mo>^</mo></mover><mo>+</mo><mi>N<
   /mi><msub><mrow><mi>Σ</mi></mrow><mrow><mi>B</mi></mrow></msub><mo>+</m
   o><msubsup><mrow><mi>σ</mi></mrow><mrow><mi>E</mi></mrow><mrow><mn>2</m
   n></mrow></msubsup><msubsup><mrow><mi>C</mi></mrow><mrow><mi>A</mi></mr
   ow><mrow><mo>−</mo><mn>1</mn></mrow></msubsup></mrow></mfenced></mrow><
   mrow><mo>−</mo><mn>1</mn></mrow></msup></mtd><mtd><mtext></mtext></mtd>
   </mtr><mtr><mtd class="eqnarray-1"><mi>Â</mi></mtd><mtd
   class="eqnarray-2"><mo>←</mo></mtd><mtd
   class="eqnarray-3"><msup><mrow><mi>V</mi></mrow><mrow><mi>T</mi></mrow>
   </msup><mover
   accent="true"><mrow><mi>B</mi></mrow><mo>^</mo></mover><mfrac><mrow><ms
   ub><mrow><mi>Σ</mi></mrow><mrow><mi>A</mi></mrow></msub></mrow><mrow><m
   subsup><mrow><mi>σ</mi></mrow><mrow><mi>E</mi></mrow><mrow><mn>2</mn></
   mrow></msubsup></mrow></mfrac></mtd><mtd><mtext></mtext></mtd></mtr><mt
   r><mtd
   class="eqnarray-1"><msub><mrow><mi>Σ</mi></mrow><mrow><mi>B</mi></mrow>
   </msub></mtd><mtd class="eqnarray-2"><mo>←</mo></mtd><mtd
   class="eqnarray-3"><msubsup><mrow><mi>σ</mi></mrow><mrow><mi>E</mi></mr
   ow><mrow><mn>2</mn></mrow></msubsup><msup><mrow><mfenced close=")"
   open="("
   separators=""><mrow><msup><mrow><mi>Â</mi></mrow><mrow><mi>T</mi></mrow
   ></msup><mi>Â</mi><mo>+</mo><mi>M</mi><msub><mrow><mi>Σ</mi></mrow><mro
   w><mi>A</mi></mrow></msub><mo>+</mo><msubsup><mrow><mi>σ</mi></mrow><mr
   ow><mi>E</mi></mrow><mrow><mn>2</mn></mrow></msubsup><msubsup><mrow><mi
   >C</mi></mrow><mrow><mi>B</mi></mrow><mrow><mo>−</mo><mn>1</mn></mrow><
   /msubsup></mrow></mfenced></mrow><mrow><mo>−</mo><mn>1</mn></mrow></msu
   p><mo>,</mo></mtd><mtd><mtext></mtext></mtd></mtr><mtr><mtd
   class="eqnarray-1"><mover accent="true"><mrow><mtext
   mathvariant="italic">B</mtext></mrow><mo>^</mo></mover></mtd><mtd
   class="eqnarray-2"><mo>←</mo></mtd><mtd class="eqnarray-3"><mtext
   mathvariant="italic">V</mtext><mi>Â</mi><mfrac><mrow><msub><mrow><mi>Σ<
   /mi></mrow><mrow><mi>B</mi></mrow></msub></mrow><mrow><msubsup><mrow><m
   i>σ</mi></mrow><mrow><mi>E</mi></mrow><mrow><mn>2</mn></mrow></msubsup>
   </mrow></mfrac><mo>,</mo></mtd><mtd><mtext></mtext></mtd></mtr><mtr><mt
   d
   class="eqnarray-1"><msub><mrow><mo>(</mo><msub><mrow><mi>Σ</mi></mrow><
   mrow><mi>A</mi></mrow></msub><mo>)</mo></mrow><mrow><mtext
   mathvariant="italic">qq</mtext></mrow></msub><mo>,</mo><msubsup><mrow><
   mi>C</mi></mrow><mrow><mi>B</mi></mrow><mrow><mi>q</mi></mrow></msubsup
   ></mtd><mtd class="eqnarray-2"><mo>←</mo></mtd><mtd
   class="eqnarray-3"><mfrac><mrow><mo>|</mo><mo>|</mo><msub><mrow><mover
   accent="true"><mrow><mtext
   mathvariant="italic">B</mtext></mrow><mo>^</mo></mover></mrow><mrow><mi
   >q</mi></mrow></msub><mo>|</mo><msup><mrow><mo>|</mo></mrow><mrow><mn>2
   </mn></mrow></msup></mrow><mrow><mi>N</mi></mrow></mfrac><mo>+</mo><msu
   b><mrow><mo>(</mo><msub><mrow><mi>Σ</mi></mrow><mrow><mi>B</mi></mrow><
   /msub><mo>)</mo></mrow><mrow><mtext
   mathvariant="italic">qq</mtext></mrow></msub><mo>,</mo><mi>q</mi><mo>=<
   /mo><mn>1</mn><mo>,</mo><mo>…</mo><mo>,</mo><mi>Q</mi><mo>,</mo></mtd><
   mtd><mtext></mtext></mtd></mtr><mtr><mtd
   class="eqnarray-1"><msubsup><mrow><mi>σ</mi></mrow><mrow><mi>E</mi></mr
   ow><mrow><mn>2</mn></mrow></msubsup></mtd><mtd
   class="eqnarray-2"><mo>←</mo></mtd><mtd
   class="eqnarray-3"><mfrac><mrow><mn>1</mn></mrow><mrow><mtext
   mathvariant="italic">NM</mtext></mrow></mfrac><mfenced close=""
   open="{"
   separators=""><mrow><mo>|</mo><mo>|</mo><mi>V</mi><mo>|</mo><msubsup><m
   row><mo>|</mo></mrow><mrow><mtext>Fro</mtext></mrow><mrow><mn>2</mn></m
   row></msubsup><mo>−</mo><mtext>tr</mtext><mfenced close=")" open="("
   separators=""><mrow><mn>2</mn><msup><mrow><mi>V</mi></mrow><mrow><mi>T<
   /mi></mrow></msup><mover
   accent="true"><mrow><mi>B</mi></mrow><mo>^</mo></mover><msup><mrow><mi>
   Â</mi></mrow><mrow><mi>T</mi></mrow></msup></mrow></mfenced></mrow></mf
   enced></mtd><mtd><mtext></mtext></mtd></mtr><mtr><mtd
   class="eqnarray-1"></mtd><mtd class="eqnarray-2"><mo>+</mo></mtd><mtd
   class="eqnarray-3"><mfenced close="}" open=""
   separators=""><mrow><mtext>tr</mtext><mfenced close=")" open="("
   separators=""><mrow><mfenced close=")" open="("
   separators=""><mrow><msup><mrow><mi>Â</mi></mrow><mrow><mi>T</mi></mrow
   ></msup><mi>Â</mi><mo>+</mo><mi>M</mi><msub><mrow><mi>Σ</mi></mrow><mro
   w><mi>A</mi></mrow></msub></mrow></mfenced><mfenced close=")" open="("
   separators=""><mrow><msup><mrow><mover
   accent="true"><mrow><mi>B</mi></mrow><mo>^</mo></mover></mrow><mrow><mi
   >T</mi></mrow></msup><mover
   accent="true"><mrow><mi>B</mi></mrow><mo>^</mo></mover><mo>+</mo><mi>N<
   /mi><msub><mrow><mi>Σ</mi></mrow><mrow><mi>B</mi></mrow></msub></mrow><
   /mfenced></mrow></mfenced></mrow></mfenced><mo>,</mo></mtd><mtd><mtext>
   </mtext></mtd></mtr></mtable> :MATH]

   where
   [MATH: <msub><mrow><mover accent="true"><mrow><mtext
   mathvariant="italic">B</mtext></mrow><mo>^</mo></mover></mrow><mrow><mi
   >q</mi></mrow></msub> :MATH]
   is the qth columnar vector of matrix
   [MATH: <mover accent="true"><mrow><mi>B</mi></mrow><mo>^</mo></mover>
   :MATH]
   , and
   [MATH:
   <msubsup><mrow><mi>C</mi></mrow><mrow><mi>B</mi></mrow><mrow><mi>q</mi>
   </mrow></msubsup> :MATH]
   is the qth diagonal element of C [B]. ith row V vector. In addition to
   the above iteration, which should be repeated from top to bottom, i.e.,
   Σ [A] in the left hand side of the first equation substituted to Σ [A]
   in the right hand side of the second equation, C [A] should be I in
   order to simulate conventional PCA [[262]21]. It is also known that C
   [B] can be used to estimate the optimal number of PCs [[263]2].

   Next, we extended VBPCA so that it can extract features. In order to do
   this, C [B] was assumed to have i,(i=1,…,N) dependence, and should be
   denoted as
   [MATH:
   <msubsup><mrow><mi>C</mi></mrow><mrow><mi>b</mi></mrow><mrow><mi>i</mi>
   </mrow></msubsup> :MATH]
   . Thus, P(B) should also have i dependence as:
   [MATH:
   <mrow><mi>P</mi><mo>(</mo><msub><mrow><mi>B</mi></mrow><mrow><mtext
   mathvariant="italic">iq</mtext></mrow></msub><mo>)</mo><mo>∝</mo><mo>ex
   p</mo><mfenced close="}" open="{"
   separators=""><mrow><mo>−</mo><mfrac><mrow><msup><mrow><mo>(</mo><msub>
   <mrow><mi>B</mi></mrow><mrow><mtext
   mathvariant="italic">iq</mtext></mrow></msub><mo>)</mo></mrow><mrow><mn
   >2</mn></mrow></msup></mrow><mrow><mn>2</mn><munderover><mrow><mi>C</mi
   ></mrow><mrow><mi>B</mi></mrow><mrow><mtext
   mathvariant="italic">iq</mtext></mrow></munderover></mrow></mfrac></mro
   w></mfenced><mi>.</mi></mrow> :MATH]

   Furthermore, it is required that Σ [B] has i dependence, and should
   also be denoted as
   [MATH:
   <msubsup><mrow><mi>Σ</mi></mrow><mrow><mi>B</mi></mrow><mrow><mi>i</mi>
   </mrow></msubsup> :MATH]
   . For example, in the above equations, N C [B] and N Σ [B] are replaced
   with
   [MATH:
   <munder><mrow><mo>∑</mo></mrow><mrow><mi>i</mi></mrow></munder><msubsup
   ><mrow><mi>C</mi></mrow><mrow><mi>B</mi></mrow><mrow><mi>i</mi></mrow><
   /msubsup> :MATH]
   and
   [MATH:
   <munder><mrow><mo>∑</mo></mrow><mrow><mi>i</mi></mrow></munder><msubsup
   ><mrow><mi>Σ</mi></mrow><mrow><mi>B</mi></mrow><mrow><mi>i</mi></mrow><
   /msubsup> :MATH]
   , respectively.

   Then
   [MATH:
   <mfrac><mrow><mi>N</mi></mrow><mrow><mn>2</mn></mrow></mfrac><mo>log</m
   o><mfrac><mrow><mo>|</mo><msub><mrow><mi>C</mi></mrow><mrow><mi>B</mi><
   /mrow></msub><mo>|</mo></mrow><mrow><mo>|</mo><msub><mrow><mi>Σ</mi></m
   row><mrow><mi>B</mi></mrow></msub><mo>|</mo></mrow></mfrac> :MATH]
   in F is replaced with
   [MATH:
   <mfrac><mrow><mn>1</mn></mrow><mrow><mn>2</mn></mrow></mfrac><munder><m
   row><mo>∑</mo></mrow><mrow><mi>i</mi></mrow></munder><mo>log</mo><mfrac
   ><mrow><mo>|</mo><munderover><mrow><mi>C</mi></mrow><mrow><mi>B</mi></m
   row><mrow><mi>i</mi></mrow></munderover><mo>|</mo></mrow><mrow><mo>|</m
   o><munderover><mrow><mi>Σ</mi></mrow><mrow><mi>B</mi></mrow><mrow><mi>i
   </mi></mrow></munderover><mo>|</mo></mrow></mfrac> :MATH]
   .
   [MATH: <mtext>tr</mtext><mfenced close="}" open="{"
   separators=""><mrow><msubsup><mrow><mi>C</mi></mrow><mrow><mi>B</mi></m
   row><mrow><mo>−</mo><mn>1</mn></mrow></msubsup><mfenced close=")"
   open="(" separators=""><mrow><msup><mrow><mover
   accent="true"><mrow><mi>B</mi></mrow><mo>^</mo></mover></mrow><mrow><mi
   >T</mi></mrow></msup><mover
   accent="true"><mrow><mi>B</mi></mrow><mo>^</mo></mover></mrow></mfenced
   ></mrow></mfenced> :MATH]
   in F is replaced with
   [MATH:
   <munder><mrow><mo>∑</mo></mrow><mrow><mi>i</mi></mrow></munder><msup><m
   row><mover accent="true"><mrow><msub><mrow><mtext
   mathvariant="italic">B</mtext></mrow><mrow><mi>i</mi></mrow></msub></mr
   ow><mo>^</mo></mover></mrow><mrow><mi>T</mi></mrow></msup><msup><mrow><
   mfenced close=")" open="("
   separators=""><mrow><msubsup><mrow><mi>C</mi></mrow><mrow><mi>B</mi></m
   row><mrow><mi>i</mi></mrow></msubsup></mrow></mfenced></mrow><mrow><mo>
   −</mo><mn>1</mn></mrow></msup><mover
   accent="true"><mrow><msub><mrow><mtext
   mathvariant="italic">B</mtext></mrow><mrow><mi>i</mi></mrow></msub></mr
   ow><mo>^</mo></mover> :MATH]
   , where
   [MATH: <msub><mrow><mover accent="true"><mrow><mtext
   mathvariant="italic">B</mtext></mrow><mo>^</mo></mover></mrow><mrow><mi
   >i</mi></mrow></msub> :MATH]
   is the ith row vector of
   [MATH: <mover accent="true"><mrow><mi>B</mi></mrow><mo>^</mo></mover>
   :MATH]
   .
   [MATH:
   <msubsup><mrow><mi>C</mi></mrow><mrow><mi>B</mi></mrow><mrow><mo>−</mo>
   <mn>1</mn></mrow></msubsup><mi>N.</mi><msub><mrow><mi>Σ</mi></mrow><mro
   w><mi>B</mi></mrow></msub> :MATH]
   and N Σ [B] in F are replaced with
   [MATH:
   <munder><mrow><mo>∑</mo></mrow><mrow><mi>i</mi></mrow></munder><msup><m
   row><mfenced close=")" open="("
   separators=""><mrow><msubsup><mrow><mi>C</mi></mrow><mrow><mi>B</mi></m
   row><mrow><mi>i</mi></mrow></msubsup></mrow></mfenced></mrow><mrow><mo>
   −</mo><mn>1</mn></mrow></msup><msubsup><mrow><mi>Σ</mi></mrow><mrow><mi
   >B</mi></mrow><mrow><mi>i</mi></mrow></msubsup> :MATH]
   and
   [MATH:
   <munder><mrow><mo>∑</mo></mrow><mrow><mi>i</mi></mrow></munder><msubsup
   ><mrow><mi>Σ</mi></mrow><mrow><mi>B</mi></mrow><mrow><mi>i</mi></mrow><
   /msubsup> :MATH]
   , respectively. As a result:
   [MATH:
   <mrow><mtable><mtr><mtd><mi>F</mi></mtd><mtd><mo>=</mo><mfrac><mrow><mo
   >|</mo><mo>|</mo><mi>V</mi><mo>|</mo><msubsup><mrow><mo>|</mo></mrow><m
   row><mtext>Fro</mtext></mrow><mrow><mn>2</mn></mrow></msubsup></mrow><m
   row><mn>2</mn><msubsup><mrow><mi>σ</mi></mrow><mrow><mi>E</mi></mrow><m
   row><mn>2</mn></mrow></msubsup></mrow></mfrac><mo>+</mo><mfrac><mrow><m
   text
   mathvariant="italic">NM</mtext></mrow><mrow><mn>2</mn></mrow></mfrac><m
   o>log</mo><munderover><mrow><mi>σ</mi></mrow><mrow><mi>E</mi></mrow><mr
   ow><mn>2</mn></mrow></munderover><mo>+</mo><mfrac><mrow><mi>M</mi></mro
   w><mrow><mn>2</mn></mrow></mfrac><mo>log</mo><mfrac><mrow><mo>|</mo><ms
   ub><mrow><mi>C</mi></mrow><mrow><mi>A</mi></mrow></msub><mo>|</mo></mro
   w><mrow><mo>|</mo><msub><mrow><mi>Σ</mi></mrow><mrow><mi>A</mi></mrow><
   /msub><mo>|</mo></mrow></mfrac><mo>+</mo><mfrac><mrow><mn>1</mn></mrow>
   <mrow><mn>2</mn></mrow></mfrac><munder><mrow><mo
   mathsize="big">∑</mo></mrow><mrow><mi>i</mi></mrow></munder><mo>log</mo
   ><mfrac><mrow><mo>|</mo><munderover><mrow><mi>C</mi></mrow><mrow><mi>B<
   /mi></mrow><mrow><mi>i</mi></mrow></munderover><mo>|</mo></mrow><mrow><
   mo>|</mo><munderover><mrow><mi>Σ</mi></mrow><mrow><mi>B</mi></mrow><mro
   w><mi>i</mi></mrow></munderover><mo>|</mo></mrow></mfrac></mtd></mtr><m
   tr><mtd><mo>+</mo><mfrac><mrow><mn>1</mn></mrow><mrow><mn>2</mn></mrow>
   </mfrac><munder><mrow><mo
   mathsize="big">∑</mo></mrow><mrow><mi>i</mi></mrow></munder><msup><mrow
   ><mover accent="true"><mrow><msub><mrow><mtext
   mathvariant="italic">B</mtext></mrow><mrow><mi>i</mi></mrow></msub></mr
   ow><mo>^</mo></mover></mrow><mrow><mi>T</mi></mrow></msup><msup><mrow><
   mfenced close=")" open="("
   separators=""><mrow><msubsup><mrow><mi>C</mi></mrow><mrow><mi>B</mi></m
   row><mrow><mi>i</mi></mrow></msubsup></mrow></mfenced></mrow><mrow><mo>
   −</mo><mn>1</mn></mrow></msup><mover
   accent="true"><mrow><msub><mrow><mtext
   mathvariant="italic">B</mtext></mrow><mrow><mi>i</mi></mrow></msub></mr
   ow><mo>^</mo></mover></mtd></mtr><mtr><mtd><mo>+</mo><mfrac><mrow><mn>1
   </mn></mrow><mrow><mn>2</mn></mrow></mfrac><mtext>tr</mtext><mfenced
   close="" open="{"
   separators=""><mrow><msubsup><mrow><mi>C</mi></mrow><mrow><mi>A</mi></m
   row><mrow><mo>−</mo><mn>1</mn></mrow></msubsup><mfenced close=")"
   open="("
   separators=""><mrow><msup><mrow><mi>Â</mi></mrow><mrow><mi>T</mi></mrow
   ></msup><mi>Â</mi><mo>+</mo><mi>M</mi><msub><mrow><mi>Σ</mi></mrow><mro
   w><mi>A</mi></mrow></msub></mrow></mfenced><mo>+</mo><munder><mrow><mo
   mathsize="big">∑</mo></mrow><mrow><mi>i</mi></mrow></munder><msup><mrow
   ><mfenced close=")" open="("
   separators=""><mrow><msubsup><mrow><mi>C</mi></mrow><mrow><mi>B</mi></m
   row><mrow><mi>i</mi></mrow></msubsup></mrow></mfenced></mrow><mrow><mo>
   −</mo><mn>1</mn></mrow></msup><msubsup><mrow><mi>Σ</mi></mrow><mrow><mi
   >B</mi></mrow><mrow><mi>i</mi></mrow></msubsup></mrow></mfenced></mtd><
   /mtr><mtr><mtd><mo>+</mo><mfenced close="}" open=""
   separators=""><mrow><msubsup><mrow><mi>σ</mi></mrow><mrow><mi>E</mi></m
   row><mrow><mo>−</mo><mn>2</mn></mrow></msubsup><mfenced close=")"
   open="("
   separators=""><mrow><mo>−</mo><mn>2</mn><msup><mrow><mi>Â</mi></mrow><m
   row><mi>T</mi></mrow></msup><msup><mrow><mi>V</mi></mrow><mrow><mi>T</m
   i></mrow></msup><mover
   accent="true"><mrow><mi>B</mi></mrow><mo>^</mo></mover><mo>+</mo><mfenc
   ed close=")" open="("
   separators=""><mrow><msup><mrow><mi>Â</mi></mrow><mrow><mi>T</mi></mrow
   ></msup><mi>Â</mi><mo>+</mo><mi>M</mi><msub><mrow><mi>σ</mi></mrow><mro
   w><mi>A</mi></mrow></msub></mrow></mfenced><mfenced close=")" open="("
   separators=""><mrow><msup><mrow><mover
   accent="true"><mrow><mi>B</mi></mrow><mo>^</mo></mover></mrow><mrow><mi
   >T</mi></mrow></msup><mover
   accent="true"><mrow><mi>B</mi></mrow><mo>^</mo></mover><mo>+</mo><munde
   r><mrow><mo
   mathsize="big">∑</mo></mrow><mrow><mi>i</mi></mrow></munder><msubsup><m
   row><mi>Σ</mi></mrow><mrow><mi>B</mi></mrow><mrow><mi>i</mi></mrow></ms
   ubsup></mrow></mfenced></mrow></mfenced></mrow></mfenced></mtd></mtr><m
   tr><mtd><mo>+</mo><mtext>const.</mtext></mtd></mtr></mtable></mrow>
   :MATH]

   is obtained. Reflecting these extensions, the above iteration rules are
   modified as:
   [MATH: <mtable class="eqnarray" columnalign="left center
   right"><mtr><mtd
   class="eqnarray-1"><msub><mrow><mi>Σ</mi></mrow><mrow><mi>A</mi></mrow>
   </msub></mtd><mtd class="eqnarray-2"><mo>←</mo></mtd><mtd
   class="eqnarray-3"><msubsup><mrow><mi>σ</mi></mrow><mrow><mi>E</mi></mr
   ow><mrow><mn>2</mn></mrow></msubsup><msup><mrow><mfenced close=")"
   open="(" separators=""><mrow><msup><mrow><mover
   accent="true"><mrow><mi>B</mi></mrow><mo>^</mo></mover></mrow><mrow><mi
   >T</mi></mrow></msup><mover
   accent="true"><mrow><mi>B</mi></mrow><mo>^</mo></mover><mo>+</mo><munde
   r><mrow><mo>∑</mo></mrow><mrow><mi>i</mi></mrow></munder><msubsup><mrow
   ><mi>Σ</mi></mrow><mrow><mi>B</mi></mrow><mrow><mi>i</mi></mrow></msubs
   up><mo>+</mo><msubsup><mrow><mi>σ</mi></mrow><mrow><mi>E</mi></mrow><mr
   ow><mn>2</mn></mrow></msubsup><msubsup><mrow><mi>C</mi></mrow><mrow><mi
   >A</mi></mrow><mrow><mo>−</mo><mn>1</mn></mrow></msubsup></mrow></mfenc
   ed></mrow><mrow><mo>−</mo><mn>1</mn></mrow></msup></mtd><mtd><mtext></m
   text></mtd></mtr><mtr><mtd class="eqnarray-1"><mi>Â</mi></mtd><mtd
   class="eqnarray-2"><mo>←</mo></mtd><mtd
   class="eqnarray-3"><msup><mrow><mi>V</mi></mrow><mrow><mi>T</mi></mrow>
   </msup><mover
   accent="true"><mrow><mi>B</mi></mrow><mo>^</mo></mover><mfrac><mrow><ms
   ub><mrow><mi>Σ</mi></mrow><mrow><mi>A</mi></mrow></msub></mrow><mrow><m
   subsup><mrow><mi>σ</mi></mrow><mrow><mi>E</mi></mrow><mrow><mn>2</mn></
   mrow></msubsup></mrow></mfrac></mtd><mtd><mtext></mtext></mtd></mtr><mt
   r><mtd
   class="eqnarray-1"><msubsup><mrow><mi>Σ</mi></mrow><mrow><mi>B</mi></mr
   ow><mrow><mi>i</mi></mrow></msubsup></mtd><mtd
   class="eqnarray-2"><mo>←</mo></mtd><mtd
   class="eqnarray-3"><msubsup><mrow><mi>σ</mi></mrow><mrow><mi>E</mi></mr
   ow><mrow><mn>2</mn></mrow></msubsup><msup><mrow><mfenced close=")"
   open="("
   separators=""><mrow><msup><mrow><mi>Â</mi></mrow><mrow><mi>T</mi></mrow
   ></msup><mi>Â</mi><mo>+</mo><mi>M</mi><msub><mrow><mi>Σ</mi></mrow><mro
   w><mi>A</mi></mrow></msub><mo>+</mo><msubsup><mrow><mi>σ</mi></mrow><mr
   ow><mi>E</mi></mrow><mrow><mn>2</mn></mrow></msubsup><msup><mrow><mfenc
   ed close=")" open="("
   separators=""><mrow><msubsup><mrow><mi>C</mi></mrow><mrow><mi>B</mi></m
   row><mrow><mi>i</mi></mrow></msubsup></mrow></mfenced></mrow><mrow><mo>
   −</mo><mn>1</mn></mrow></msup></mrow></mfenced></mrow><mrow><mo>−</mo><
   mn>1</mn></mrow></msup><mo>,</mo><mi>i</mi><mo>=</mo><mn>1</mn><mo>,</m
   o><mo>…</mo><mo>,</mo><mi>N</mi></mtd><mtd><mtext></mtext></mtd></mtr><
   mtr><mtd class="eqnarray-1"><msub><mrow><mover
   accent="true"><mrow><mtext
   mathvariant="italic">B</mtext></mrow><mo>^</mo></mover></mrow><mrow><mi
   >i</mi></mrow></msub></mtd><mtd class="eqnarray-2"><mo>←</mo></mtd><mtd
   class="eqnarray-3"><msub><mrow><mtext
   mathvariant="italic">V</mtext></mrow><mrow><mi>i</mi></mrow></msub><mi>
   Â</mi><mfrac><mrow><msubsup><mrow><mi>Σ</mi></mrow><mrow><mi>B</mi></mr
   ow><mrow><mi>i</mi></mrow></msubsup></mrow><mrow><msubsup><mrow><mi>σ</
   mi></mrow><mrow><mi>E</mi></mrow><mrow><mn>2</mn></mrow></msubsup></mro
   w></mfrac><mo>,</mo><mi>i</mi><mo>=</mo><mn>1</mn><mo>,</mo><mo>…</mo><
   mo>,</mo><mi>N</mi></mtd><mtd><mtext></mtext></mtd></mtr><mtr><mtd
   class="eqnarray-1"><msubsup><mrow><mover
   accent="true"><mrow><mi>C</mi></mrow><mo>~</mo></mover></mrow><mrow><mi
   >A</mi></mrow><mrow><mi>q</mi></mrow></msubsup></mtd><mtd
   class="eqnarray-2"><mo>←</mo></mtd><mtd
   class="eqnarray-3"><mfrac><mrow><mo>|</mo><mo>|</mo><msub><mrow><mover
   accent="true"><mrow><mtext
   mathvariant="italic">A</mtext></mrow><mo>^</mo></mover></mrow><mrow><mi
   >q</mi></mrow></msub><mo>|</mo><msup><mrow><mo>|</mo></mrow><mrow><mn>2
   </mn></mrow></msup></mrow><mrow><mi>M</mi></mrow></mfrac><mo>+</mo><msu
   b><mrow><mo>(</mo><msub><mrow><mi>Σ</mi></mrow><mrow><mi>A</mi></mrow><
   /msub><mo>)</mo></mrow><mrow><mtext
   mathvariant="italic">qq</mtext></mrow></msub><mo>,</mo><mi>q</mi><mo>=<
   /mo><mn>1</mn><mo>,</mo><mo>…</mo><mo>,</mo><mi>Q</mi></mtd><mtd><mtext
   ></mtext></mtd></mtr><mtr><mtd
   class="eqnarray-1"><msubsup><mrow><mi>C</mi></mrow><mrow><mi>A</mi></mr
   ow><mrow><mi>q</mi></mrow></msubsup></mtd><mtd
   class="eqnarray-2"><mo>←</mo></mtd><mtd
   class="eqnarray-3"><mfrac><mrow><msubsup><mrow><mover
   accent="true"><mrow><mi>C</mi></mrow><mo>~</mo></mover></mrow><mrow><mi
   >A</mi></mrow><mrow><mi>q</mi></mrow></msubsup></mrow><mrow><munder><mr
   ow><mo>∑</mo></mrow><mrow><msup><mrow><mi>q</mi></mrow><mrow><mo>′</mo>
   </mrow></msup></mrow></munder><msubsup><mrow><mover
   accent="true"><mrow><mi>C</mi></mrow><mo>~</mo></mover></mrow><mrow><mi
   >A</mi></mrow><mrow><msup><mrow><mi>q</mi></mrow><mrow><mo>′</mo></mrow
   ></msup></mrow></msubsup></mrow></mfrac><mo>,</mo><mi>q</mi><mo>=</mo><
   mn>1</mn><mo>,</mo><mo>…</mo><mo>,</mo><mi>Q</mi></mtd><mtd><mtext></mt
   ext></mtd></mtr><mtr><mtd
   class="eqnarray-1"><msubsup><mrow><mi>C</mi></mrow><mrow><mi>B</mi></mr
   ow><mrow><mtext
   mathvariant="italic">iq</mtext></mrow></msubsup></mtd><mtd
   class="eqnarray-2"><mo>←</mo></mtd><mtd
   class="eqnarray-3"><msup><mrow><mo>(</mo><msub><mrow><mi>B</mi></mrow><
   mrow><mtext
   mathvariant="italic">iq</mtext></mrow></msub><mo>)</mo></mrow><mrow><mn
   >2</mn></mrow></msup><mo>+</mo><msub><mrow><mfenced close=")" open="("
   separators=""><mrow><msubsup><mrow><mi>Σ</mi></mrow><mrow><mi>B</mi></m
   row><mrow><mi>i</mi></mrow></msubsup></mrow></mfenced></mrow><mrow><mte
   xt
   mathvariant="italic">qq</mtext></mrow></msub><mo>,</mo><mi>q</mi><mo>=<
   /mo><mn>1</mn><mo>,</mo><mo>…</mo><mo>,</mo><mi>Q</mi><mo>,</mo><mi>i</
   mi><mo>=</mo><mn>1</mn><mo>,</mo><mo>…</mo><mo>,</mo><mi>N</mi></mtd><m
   td><mtext></mtext><mtext></mtext></mtd></mtr></mtable> :MATH]
   (1)
   [MATH: <mtable class="align" columnalign="left"><mtr><mtd
   class="align-1"><msubsup><mrow><mi>σ</mi></mrow><mrow><mi>E</mi></mrow>
   <mrow><mn>2</mn></mrow></msubsup></mtd><mtd
   class="align-2"><mo>←</mo><mfrac><mrow><mn>1</mn></mrow><mrow><mtext
   mathvariant="italic">NM</mtext></mrow></mfrac><mfenced close=""
   open="{"
   separators=""><mrow><mo>|</mo><mo>|</mo><mi>V</mi><mo>|</mo><msubsup><m
   row><mo>|</mo></mrow><mrow><mtext>Fro</mtext></mrow><mrow><mn>2</mn></m
   row></msubsup><mo>−</mo><mtext>tr</mtext><mfenced close=")" open="("
   separators=""><mrow><mn>2</mn><msup><mrow><mi>V</mi></mrow><mrow><mi>T<
   /mi></mrow></msup><mover
   accent="true"><mrow><mi>B</mi></mrow><mo>^</mo></mover><msup><mrow><mi>
   Â</mi></mrow><mrow><mi>T</mi></mrow></msup></mrow></mfenced></mrow></mf
   enced><mspace width="2em"></mspace></mtd><mtd><mspace
   width="2em"></mspace></mtd></mtr><mtr><mtd class="align-1"></mtd><mtd
   class="align-2"><mo>+</mo><mfenced close="}" open=""
   separators=""><mrow><mtext>tr</mtext><mfenced close=")" open="("
   separators=""><mrow><mfenced close=")" open="("
   separators=""><mrow><msup><mrow><mi>Â</mi></mrow><mrow><mi>T</mi></mrow
   ></msup><mi>Â</mi><mo>+</mo><mi>M</mi><msub><mrow><mi>Σ</mi></mrow><mro
   w><mi>A</mi></mrow></msub></mrow></mfenced><mfenced close=")" open="("
   separators=""><mrow><msup><mrow><mover
   accent="true"><mrow><mi>B</mi></mrow><mo>^</mo></mover></mrow><mrow><mi
   >T</mi></mrow></msup><mover
   accent="true"><mrow><mi>B</mi></mrow><mo>^</mo></mover><mo>+</mo><munde
   r><mrow><mo
   mathsize="big">∑</mo></mrow><mrow><mi>i</mi></mrow></munder><msubsup><m
   row><mi>Σ</mi></mrow><mrow><mi>B</mi></mrow><mrow><mi>i</mi></mrow></ms
   ubsup></mrow></mfenced></mrow></mfenced></mrow></mfenced><mo>,</mo><msp
   ace width="2em"></mspace></mtd><mtd><mspace
   width="2em"></mspace></mtd></mtr></mtable> :MATH]

   where V [i] is the ith row V vector, and
   [MATH: <msub><mrow><mover accent="true"><mrow><mtext
   mathvariant="italic">A</mtext></mrow><mo>^</mo></mover></mrow><mrow><mi
   >q</mi></mrow></msub> :MATH]
   is the qth columnar vector of matrix
   [MATH: <mi>Â</mi> :MATH]
   . Because of extension, divergence was suppressed by introducing C [A]
   normalization instead of using constant values. Therefore,
   [MATH:
   <msubsup><mrow><mi>C</mi></mrow><mrow><mi>b</mi></mrow><mrow><mi>i</mi>
   </mrow></msubsup> :MATH]
   expresses ith feature importance, and C [A] (instead of the former C
   [B]) represents importance of the qth PC, and must be non-constant. The
   order of iterations in eq. ([264]1) is arbitrary since each iteration
   is independent of one another.

   In the present study, the above iterations began by substituting
   pre-computed PCs (using the prcomp function in R[[265]20]) in A and B,
   to compensate for slow convergence. After a suitable number of
   iterations, parameters with the smallest F values were used for further
   analysis. Convergence was judged if extracted features change after
   more than 100 iterations.

   R code for this algorithm is available in Additional file [266]2.

Statistical analysis of CPCAFE and VBPCAFE applied to simulated data

   For VBPCAFE, the top-ranked features with larger
   [MATH:
   <msubsup><mrow><mi>C</mi></mrow><mrow><mi>B</mi></mrow><mrow><mi>i</mi>
   <mn>1</mn></mrow></msubsup> :MATH]
   values were extracted after convergence, judged by changes in extracted
   features after more than 100 iterations. For CPCAFE [[267]22-[268]29],
   expression data ({x [ij]}) was embedded into low dimensional space
   using PCA. After selecting the PC for FE, outliers along the PC were
   extracted i. e., the top 10 features with larger absolute projections
   to the selected PC.

Evaluation of FE performance

   In order to evaluate FE performance in a simulated categorical
   multiclass data set, two measures were used for discrimination of
   binary classes with unequal member numbers. In this computation, true
   positive (TP) represents the number of features with distinct
   expression between classes (i≤N ^′), which are identified by the
   specified feature. Conversely, true negative (TN) is the number of
   features with no distinct expression between classes (i>N ^′), and are
   not identified by the specified feature. False positive (FP) is the
   number of features with no distinct expression between classes (i>N
   ^′), but are identified by the specified feature. False negative (FN)
   is the number of features with distinct expression between classes (i≤N
   ^′), but are not identified by the specified feature. Accordingly, the
   Matthews correlation coefficient is defined as:
   [MATH: <mrow><mfrac><mrow><mtext
   mathvariant="italic">TP</mtext><mo>·</mo><mtext
   mathvariant="italic">TN</mtext><mo>−</mo><mtext
   mathvariant="italic">FP</mtext><mo>·</mo><mtext
   mathvariant="italic">FN</mtext></mrow><mrow><msqrt><mrow><mo>(</mo><mte
   xt mathvariant="italic">FP</mtext><mo>+</mo><mtext
   mathvariant="italic">FN</mtext><mo>)</mo><mo>(</mo><mtext
   mathvariant="italic">FP</mtext><mo>+</mo><mtext
   mathvariant="italic">TP</mtext><mo>)</mo><mo>(</mo><mtext
   mathvariant="italic">TN</mtext><mo>+</mo><mtext
   mathvariant="italic">TP</mtext><mo>)</mo><mo>(</mo><mtext
   mathvariant="italic">TN</mtext><mo>+</mo><mtext
   mathvariant="italic">FN</mtext><mo>)</mo></mrow></msqrt></mrow></mfrac>
   <mo>,</mo></mrow> :MATH]

   while the F measure is:
   [MATH: <mrow><mfrac><mrow><mn>2</mn><mtext
   mathvariant="italic">TP</mtext></mrow><mrow><mo>(</mo><mtext
   mathvariant="italic">TP</mtext><mo>+</mo><mtext
   mathvariant="italic">FP</mtext><mo>)</mo><mo>+</mo><mo>(</mo><mtext
   mathvariant="italic">TP</mtext><mo>+</mo><mtext
   mathvariant="italic">FN</mtext><mo>)</mo></mrow></mfrac><mo>,</mo></mro
   w> :MATH]

   where T P+F P corresponds to the number of features identified by
   specific FE, and T P+F N represents the number of features with
   distinct expression between classes (i≤N ^′), thus representing the
   harmonic mean of sensitivity
   [MATH: <mfenced close=")" open="("
   separators=""><mrow><mfrac><mrow><mtext
   mathvariant="italic">TP</mtext></mrow><mrow><mtext
   mathvariant="italic">TP</mtext><mo>+</mo><mtext
   mathvariant="italic">FN</mtext></mrow></mfrac></mrow></mfenced> :MATH]
   and precision
   [MATH: <mfenced close=")" open="("
   separators=""><mrow><mfrac><mrow><mtext
   mathvariant="italic">TP</mtext></mrow><mrow><mtext
   mathvariant="italic">TP</mtext><mo>+</mo><mtext
   mathvariant="italic">FP</mtext></mrow></mfrac></mrow></mfenced> :MATH]
   .

Translational application to PTSD associated heart disease

   The overall work flow is illustrated in Figure [269]16.

Figure 16.

   Figure 16
   [270]Open in a new tab

   Overall study work flow. The methods used for data processing and
   evaluation are indicated in red and blue, respectively. Dotted lines in
   sample descriptions (i.e., 1 or 2 day(s) stress vs 1 day rest) indicate
   missing control samples (see [271]Methods for more detail).

mRNA and miRNA expression

   mRNA and miRNA expression data were obtained from the Gene Expression
   Omnibus (GEO) using the GEO ID: [272]GSE52875 [[273]8]. Expression was
   observed in stressed mouse hearts. The length of time subservient mice
   were exposed to aggressor mice varied, with variable rest times after
   exposure. Exposure and rest times were 1, 2, 3, or 10 days of exposure
   and 1 day of rest, 5 days of exposure and 1 or 10 days of rest, and 10
   days of exposure and 42 days of rest (in total, seven distinct
   treatment conditions). Controls were housed separately from the
   aggressor mice in personal cages, and prepared for all treatment
   conditions except for 1 or 3 days of exposure and 1 day of rest. Thus,
   there were only five control conditions. For each condition there were
   four biological replicates and therefore 48 samples in total were
   available.

   mRNA expression was included in the subseries [274]GSE52866 and
   [275]GSE52871. GSE52866_RAW.tar and GSE52871_RAW.tar (provided as
   Supplementary Data in GEO) were downloaded, and the gProcessedSignal in
   each of 48 GSM files used for analysis. miRNA expression was included
   in the subseries [276]GSE52869 and [277]GSE52872. From [278]GSE52872,
   eight raw Exiqon files (provided as Supplementary Data in GEO) were
   downloaded, and gProcessedSignal and rProcessedSignal used as miRNA
   profiles for further analysis. From [279]GSE52869, 16 files (provided
   as Supplementary Data in GEO) were downloaded, and “Spot Mean Intensity
   Cyanine3” and “Spot Mean Intensity Cyanine5H” used as miRNA profiles
   for further analysis. The conditions attributed to each sample were
   provided by the file names. mRNA and miRNA expression were not
   normalized, apart from for CPCAFE, where mRNA expression was normalized
   to have zero mean and a standard deviation of 1.

KEGG pathway analysis of miRNAs using DIANA-mirpath

   For CPCAFE, the 27 identified miRNAs were uploaded to the DIANA-mirpath
   server [[280]11]. Although DIANA-mirpath requires mature miRNA names
   used in miRBase (rel. 18), the mature miRNA names in GEO ID:
   [281]GSE52875 were used in the previous release, therefore the uploaded
   mature miRNAs are not exactly the same as the 27 identified miRNAs.
   Target gene prediction was performed using Tarbase [[282]30], a
   database using experimentally validated targets that are more
   biologically plausible. Combined target genes sets were used for KEGG
   pathway enrichment analysis. False discovery rate corrected P-values
   were used to screen KEGG pathways. Direct weblinks to DIANA-mirpath
   results are provided in Additional file [283]10. The same analyses were
   performed for categorical regression-based FE and BAHSIC, excluding the
   number of uploaded miRNAs.

Disease association analysis of genes using the Gendoo server

   The Gendoo server [[284]31], a literature-based disease association
   database, was used to identify associations between selected mRNAs and
   disease. As mRNAs were identified using RefSeq mRNA IDs, these were
   interpreted to gene symbols. Obtained gene symbols were uploaded to the
   Gendoo server with mouse being the specified target species. Human was
   also tested for categorical regression-based FE and BAHSIC, to
   compensate for the small number of hits.

KEGG pathway analysis of mRNAs by DAVID

   Employed 24 genes were mRNAs that have non zero values in the column
   named as “target” of the “CPCA based” sheet in Additional file [285]4.
   Refseq IDs for these 24 genes were identified and were uploaded to
   DAVID [[286]13] server using default setting. Then KEGG pathways
   identified were extracted.

Tertiary protein structure prediction

   Tertiary protein structures were predicted using two prediction
   servers, Protein Homology/analogY Recognition Engine V2.0 (phyre2)
   [[287]32] and full automatic modeling systems (FAMS) [[288]33]. Among
   the 26 genes investigated, 14 genes were included in the PDB [[289]34].
   Tertiary protein structures of the remaining genes were predicted by
   either phyre2 or FAMS. The complete list of template proteins and amino
   acid sequences of investigated genes in fasta format is available in
   Additional file [290]11.

In silico drug discovery for FABP3

   ChooseLD [[291]35] was used for FABP3 drug discovery. Using the PDB
   structure of chain A in 1HMR as the template protein, and four ligands
   (9-OCTADECENOIC ACID in 1HMR, OLEIC ACID in 1HMS, STEARIC ACID in 1HMT,
   and PALMITIC ACID in 2HMB) as template ligands (acids not provided in
   1HMR were mapped to 1HMR prior to chooseLD execution), with four
   additional FABP3 inhibitors from ChEMBL [[292]36] as template ligands
   (see Additional file [293]2 for more detail), drug candidate compounds
   were selected from DrugBank [[294]37]. ChooseLD was applied to 1450
   compounds with a Tanimoto index >0.20 from at least one of the eight
   template ligands. The 1450 compounds were selected from 6510 compounds
   with tertiary structures computed by Babel software [[295]38], from
   among the 6583 compounds listed in DrugBank [[296]37]. The 1450
   compounds were ranked based on Finger Print Alignment Scores
   (FPAScores) averaged over three independent runs.

Generation of a test PTSD data set

   In order to test equivalence between CPCAFE and VBPCAFE on PTSD data, a
   test data set was generated. In addition to the 100 probes selected by
   CPCAFE, an additional 100 probes were chosen from those remaining,
   generating a test data set with 200 features. VBPCAFE was applied to
   the generated data set, and
   [MATH:
   <msubsup><mrow><mi>C</mi></mrow><mrow><mi>B</mi></mrow><mrow><mi>i</mi>
   <mn>1</mn></mrow></msubsup> :MATH]
   s values calculated. The top-ranked 200 features with largest
   [MATH:
   <msubsup><mrow><mi>C</mi></mrow><mrow><mi>B</mi></mrow><mrow><mi>i</mi>
   <mn>1</mn></mrow></msubsup> :MATH]
   s values were selected as extracted features. FE was performed over 100
   independent ensembles, and the frequency (number of times each feature
   was selected) counted. Frequency number = 100, indicates that the
   feature was always selected.

FE cross-validation

   In order to test FE stability on the PTSD data set, a 4-fold
   cross-validation was performed, with each experiment repeated four
   times. For each experimental setup, three out of four replicates were
   randomly selected, generating a data set with 36 samples. Since the
   total possible number of independent samplings was 4^12≃2×10^7, it is
   impossible to test all samplings. Therefore, 100 samplings were tested,
   which was large enough to demonstrate superior stability of CPCAFE
   compared with the two conventional supervised methods of categorical
   regression-based FE and BAHSIC.

Significance of correlation coefficients

   Correlation coefficient significance was investigated by transforming
   the correlation coefficient (r) to the statistical test variable (t):
   [MATH:
   <mrow><mi>t</mi><mo>=</mo><mfrac><mrow><mi>r</mi><msqrt><mrow><mi>M</mi
   ><mo>−</mo><mn>2</mn></mrow></msqrt></mrow><mrow><msqrt><mrow><mn>1</mn
   ><mo>−</mo><msup><mrow><mi>r</mi></mrow><mrow><mn>2</mn></mrow></msup><
   /mrow></msqrt></mrow></mfrac><mo>,</mo></mrow> :MATH]

   obeying the t distribution of M−2 degrees of freedom. As there were 48
   samples in our study, M= 48. Computed P-values were adjusted by BH
   criterion [[297]19]. Correlation coefficients with adjusted P-values
   <0.05 were regarded as significant.

Acknowledgements