Abstract

Background
Clonal dysfunction of bone marrow stem cells caused by somatic mutation is suspected to affect post-infarction myocardial regeneration after coronary artery bypass graft surgery (CABG).

Methods
Transcriptome and variant expression analyses were performed in patients of the phase 3 PERFECT trial (post-myocardial infarction CABG combined with CD133^+ bone marrow-derived hematopoietic stem cells), comparing myocardial regeneration Responders (R; n=14; change in left ventricular ejection fraction, ∆LVEF, +16% day 180/0) with Non-responders (NR; n=9; ∆LVEF -1.1% day 180/0). The findings were subsequently validated in an independent patient cohort (n=14) and in two preclinical mouse models of SH2B3/LNK deficiency (antisense silencing or knockout).

Findings
1. Clinical: R differed from NR in a total of 161 genes in differential expression analysis (n=23, q<0.05) and 872 genes in coexpression analysis (n=23, q<0.05). Machine learning clustering analysis revealed distinct R vs. NR preoperative gene expression signatures in peripheral blood that correlated with SH2B3 (p<0.05). Mutation analysis revealed differing numbers of group-specific variants in R vs. NR (R: 48 genes; NR: 224 genes).
2. Preclinical: SH2B3/LNK-silenced hematopoietic stem cell (HSC) clones displayed significant overgrowth of myeloid and immune cells in bone marrow, peripheral blood, and tissue at day 160 after competitive bone marrow transplantation into mice. SH2B3/LNK^−/− mice demonstrated enhanced cardiac repair through augmented kinetics of bone marrow-derived endothelial progenitor cells, increased capillary density in ischemic myocardium, and reduced left ventricular fibrosis with preserved cardiac function.
3. Validation: Evaluation in 14 additional patients revealed 85% prediction accuracy for R vs. NR (12/14 patients) for the identified biomarker signature.

Interpretation
Myocardial repair is affected by HSC gene response and somatic mutation. Machine learning can be used to identify and predict pathological HSC response.
Funding
German Ministry of Research and Education (BMBF): Reference and Translation Center for Cardiac Stem Cell Therapy - FKZ0312138A and FKZ031L0106C; German Ministry of Research and Education (BMBF): Collaborative research center - DFG:SFB738 and Center of Excellence - DFG:EC-REBIRTH; European Social Fund: ESF/IV-WM-B34-0011/08, ESF/IV-WM-B34-0030/10; Miltenyi Biotec GmbH, Bergisch-Gladbach, Germany; Japanese Ministry of Health: Health and Labour Sciences Research Grant (H14-trans-001, H17-trans-002).

Trial registration
ClinicalTrials.gov NCT00950274

Keywords: Clonal hematopoiesis of indeterminate potential, CHIP, SH2B3, Myocardial regeneration, Cardiac stem cell therapy, Angiogenesis induction, Post myocardial infarction heart failure, Coronary bypass surgery, CABG, Machine learning

Abbreviations: ANCOVA, Analysis of covariance; Ang-1, Angiopoietin 1; AUR, Arch user repository; AUC, Area under curve; BM, Bone marrow; BMMNC, Bone marrow mononuclear cell; BMSC, Bone marrow stem cells; BrdU, Bromodeoxyuridine; CABG, Coronary Artery Bypass Graft; CAP-EPC, Concentrated Ambient Particles – Endothelial Progenitor Cells; CD, Cluster of Differentiation; CEC, Circulating endothelial cells; CEC panel, CDs measured in PB; CFU, Colony-forming unit; CHIP, Clonal hematopoiesis of indeterminate potential; CI, Confidence interval; c-KIT/CD117, Stem Cell Factor Receptor c-KIT, CD117; CLARA, Clustering for Large Applications; CPC, Cardiac progenitor cell; CSC, Cardiac stem cell; DE, Differential gene expression; DNAseq, Deoxyribonucleic acid sequencing; EGF, Epidermal growth factor; EGFR, Epidermal growth factor receptor; ELISA, Enzyme-Linked Immunosorbent Assay; EPC, Endothelial Progenitor Cells; EPC panel, CDs measured in PB; EPO, Erythropoietin; EPOR, Erythropoietin receptor; FACS, Fluorescence activated cell sorter; FDR, False discovery rate; GAPDH, Glyceraldehyde 3 phosphate dehydrogenase; GATA4, Transcriptional activator that binds to the consensus sequence
5′-AGATAG-3′; GFP, Green fluorescent protein; GMP, Good Manufacturing Practice; GWAS, Genome wide association study; HR, Hazard ratio; HIF, Hypoxia-Inducible Factor, transcription factor; HSC, Hematopoietic stem cell; hu, human; ICH GCP, Tripartite Guideline for Good Clinical Practice; IGF-1, Insulin-like Growth Factor 1; IGFBP, Insulin-like growth factor binding protein; IHG, Analysis performed in accordance with ISHAGE guidelines; IL, Interleukin; InDel, Insertion or deletion mutation variant; KSL, mouse bone marrow stem cell subpopulation c-KIT+ Sca-I+ lin-; LAD, Left anterior descending coronary artery, RIVA; LAS, Longitudinal axis strain; LCRC, Loss of cardiac regeneration capacity; LNK, SH2B adapter protein 3 (lymphocyte adapter protein); LVEDV, Left Ventricular End Diastolic Volume; LVEF, Left Ventricular Ejection Fraction; LVESD, Left Ventricular End Systolic Dimension; m, mouse; MI, Myocardial infarction; ML, Machine learning; MNC, Mononuclear cells; MRI, Magnetic resonance imaging; 6MWT, 6-Minute Walk Test; NGS, Next Generation Sequencing; NR, Non-responder; PB, Peripheral blood; PBMNC, Peripheral blood mononuclear cell (mononuclear cells isolated from peripheral blood); PCR, Polymerase chain reaction; PDGF, Platelet derived growth factor; PDGFR, Platelet derived growth factor receptor; PEI, Paul-Ehrlich Institute; PI3K, Phosphoinositide-3-Kinase; PPMC, Pearson Product Moment Correlation; qPCR, Quantitative polymerase chain reaction; R, Responder; RFI, Reactome functional interaction; RNASeq, Ribonucleic acid sequencing; ROC, Receiver operating characteristics; RT-PCR, Reverse transcriptase polymerase chain reaction; RWMS, Regional wall motion score; SDF-1, Stromal Cell-derived Factor 1; SH2B3, LNK [Src homology 2-B3 (SH2B3)] belongs to a family of SH2-containing proteins with important adaptor functions; SCF, Stem Cell Factor; SNP, Single nucleotide polymorphism, variant; STEMI, ST-segment Elevation Myocardial Infarction; SUSAR, Suspected
Unexpected Serious Adverse Reaction; TiCoNE, Time course network enrichment; TNF, Tumor Necrosis Factor; t-SNE, t-distributed stochastic neighbour embedding; VCA, Virus-Capsid-Antigen; VEGF, Vascular Endothelial Growth Factor; VEGFR, Vascular Endothelial Growth Factor Receptor; WT, wild type; WGCNA, Weighted gene coexpression network analysis

Research in context

Evidence before this study
The basis for the current work is the randomized, double-blinded, placebo-controlled, multicenter phase 3 PERFECT trial, in which post-myocardial infarction (MI) patients undergoing coronary artery bypass graft (CABG) surgery were treated with intramyocardial CD133^+ bone marrow-derived hematopoietic stem cells (BM-HSC) or placebo. At that time, we identified a correlation between myocardial regeneration and a systemic bone marrow response characterized by a preoperative biomarker signature of 20 angiogenesis and stem cell related factors in peripheral blood (PB) [17]. An additional outcome prediction obtained by Machine Learning (ML) achieved an accuracy of 85% for responders (R) and 80% for non-responders (NR). Genetic dysregulation of BM-HSC was suspected and has now been followed up by gene expression and mutation analysis. LNK is an adaptor protein encoded by the gene SH2B3 that negatively regulates multiple essential signals in hematopoietic stem cells (HSC). Its regulatory role for BM-HSC in cardiovascular repair remains poorly understood and is investigated in this manuscript.
Added value of this study
In the present series of experiments, we clarified that HSC signaling adaptor gene mutations in SH2B3 contribute to a polygenic gene expression circuit switch, including the genes PLCG1, LPCAT2, GRB2, AFAP1, AP1B1, KLF8, and MARK3, that is favorable for the cardiac healing process in MI patients undergoing cardiac recovery after CABG surgery. An integrative ML analysis of preoperative PB enables highly sensitive clinical diagnosis and prediction of the cardiac regeneration response after CABG. It may be used for treatment monitoring of cardiac regeneration and may give rise to patient-specific, ML-supported therapy in the future. Our findings on R vs. NR in PERFECT and in SH2B3/LNK^−/− mice suggest that the significantly reduced ischemic myocardial damage with preserved cardiac function following MI is mainly due to enhanced angiogenesis in ischemic myocardium.

Implications of all the available evidence
This novel approach to disease genotype/phenotype analysis, combining gene expression, coexpression, and transcript variant calling in a randomized clinical trial, led to the discovery of a polygenic circuit involved in the HSC response associated with cardiac regeneration capacity. The findings were subsequently verified in animal studies and supported by correlation analysis between humans and mice. This comparison enabled new insights into adaptor proteins, proliferation signaling, and immune checkpoint regulation controlling vasculogenesis/angiogenesis and cardiac tissue repair. Recovery of cardiac function was observed through up-regulation of HSC/EPC circulation and stimulation of immune progenitor cell (PC) proliferation. Our findings show that mutational changes in gene expression transcripts have important implications for the formulation of new therapeutic strategies to diagnose and enhance cardiac repair by stem cells.

1.
Introduction
The hematopoietic system has traditionally been considered an organized, hierarchical system with multipotent, self-renewing stem cells at the top, lineage-committed progenitor cells in the middle, and lineage-restricted precursor cells, which give rise to terminally differentiated cells, at the bottom [1,2,3]. However, clonal hematopoiesis of indeterminate potential (CHIP) has been described in patients with hematological and cardiovascular disease and has been associated with congenital or somatic DNA mutations [4,5]. The question arises which congenital or somatic mutations in stem cell regulatory genes cause hematopoietic clonal advantage and impact cardiovascular pathology [6,7]. SH2B3, which codes for the LNK adaptor protein, is one of the major mutated genes associated with hematopoietic stem cell (HSC) proliferation disorders, such as myelodysplasia, erythrocytosis, or leukemia [8,9]. In genome wide association studies (GWAS) of cardiovascular patients, the SH2B3 phosphorylation-related missense variant rs3184504 was found to be associated with increased platelet count, monocyte proliferation, hypertension, peripheral/coronary artery disease, autoimmune disease, and longevity [9,10,11,12,13,14,15]. The regulation of SH2B3/LNK expression is largely unknown, but it is expected to impact cardiovascular regeneration through c-KIT/CD117-expressing hematopoietic, myeloid, lymphocytic, endothelial, and mesenchymal progenitor cells in blood [9,10]. In contrast, intracardiac SH2B3/LNK expression was found to be associated with the regulation of pressure overload cardiac hypertrophy [16].
At present, the regulatory role of SH2B3 in stem cell proliferation and the inflammatory response remains unclear in patients with coronary artery disease, especially in post-myocardial infarction repair leading either to regeneration or to inflammatory fibrosis of the myocardium [9,13]. Furthermore, it is unclear whether a monogenic switch of SH2B3 gene expression or SNP-altered LNK protein function in bone marrow stem cells is able to control cardiac regeneration by altering the bone marrow response [9]. Moreover, the frequency and type of SH2B3 clonal mutations in HSC of patients with cardiac disease are unknown and may have an impact on variable pathology. In the recent outcome analysis of the phase 3 clinical PERFECT trial, we investigated intramyocardial transplantation of c-KIT/CD117^+/CD133^+/CD34^+ bone marrow-derived hematopoietic stem cells (BM-HSC) in post-myocardial infarction (MI) coronary artery bypass graft (CABG) patients. We found striking differences in the induction of cardiac regeneration in 60% of the BM-HSC treated and placebo groups, characterized by a preoperative Machine Learning (ML) signature in peripheral blood (PB) [17]. Responders (R) and non-responders (NR) differed significantly before surgery: R were characterized by increased peripheral blood c-KIT/CD117^+/CD133^+/CD34^+ circulating stem cells (EPC) and increased thrombocytes, while NR had increased erythropoietin (EPO), vascular endothelial growth factor (VEGF), and N-terminal pro b-type natriuretic peptide (NTproBNP) in preoperative serum [17]. The induced bone marrow stem cell proliferation response in R was suspected to be due to the activity of the adaptor protein SH2B3/LNK [17]. Based on this, we first performed variant and gene expression analyses in PERFECT responders vs. non-responders and compared diagnostic R vs. NR signatures (Fig. 1A).
We then validated the effect on the R/NR signature switch in SH2B3/LNK-deficient mouse models to investigate the role of HSC dysfunction in cardiac repair. Final evaluation of the signatures was performed in an independent patient cohort and by mouse/human correlation analysis (Fig. 1B,C).

Fig. 1. Overview of the integrative analysis approach combining clinical patient data with murine preclinical models: genotype/phenotype analysis of cardiac regeneration outcome in the randomised clinical trial PERFECT and verification of regulatory genes in knockout animal disease models.

2. Methods

2.1. Study Design
The peripheral blood bone marrow response was studied by whole transcriptome analysis in the biomarker subgroup of the randomized phase 3 PERFECT trial (total n=39, Rostock n=23, Hannover n=14, Leipzig n=2) [17]. Primary analysis was performed at the Rostock trial site with available biobank, clinical (per protocol), and biomarker data (n=23). CD133^+ BMSC treated (n=13) and placebo controls (n=10) were equally distributed. In the biomarker patient cohort (n=23; Placebo/CD133^+ 9/14), we investigated the systemic bone marrow stem cell response in peripheral blood in Responders (R), classified by a difference in left ventricular ejection fraction (∆LVEF) ≥5 % after 180 d (n=14; Placebo/CD133^+ 7/7), and Non-responders (NR), classified by ∆LVEF <5 % after 180 d (n=9; Placebo/CD133^+ 5/4), in both treatment groups (intramyocardial placebo vs. CD133^+). For validation, variant expression analysis, MRI, and clinical outcome were assessed in 14 patients from the Hannover trial center with available clinical data.

2.2. Ethical approval and trial setting
RNA sequencing (RNA-Seq) analysis and mRNA RT-PCR in PB: Samples were taken from informed study patients who gave their written consent according to the Declaration of Helsinki (approval by the Ethical Committee, Rostock University Medical Center 2009; No. HV-2009-0012).
Analyses and examinations were performed before unblinding of the trial and under careful adherence to the protection of data privacy (pseudonyms).

2.3. Transcriptome Analysis of EDTA blood samples using NGS
RNA from frozen EDTA blood samples was isolated in a three-step procedure. First, the GeneJET Stabilized and Fresh Whole Blood RNA Kit (Thermo Scientific) was used following the manufacturer's instructions. Second, isolated RNA was precipitated with 2.5 volumes of ethanol under high salt conditions (10 % of 3 M sodium acetate, pH 5.2). Third, after DNase digestion (Thermo Scientific), the RNA was purified using Agencourt RNAClean XP beads (Beckman Coulter). Purified RNA was analyzed on a Bioanalyzer (Agilent) using RNA 6000 Nano Chips (Agilent). Quality-controlled RNA was used to construct sequencing libraries using the Universal Plus mRNA-Seq Technology (Nugen) according to the manufacturer's instructions. Briefly, mRNA was selected by oligo d(T) beads and reverse transcribed, and cDNA from globin messengers was removed by the Globin depletion module (Nugen). Quality-controlled and quantified libraries were sequenced on a HiSeq1500 system (Illumina) in single-end mode (100 nt read length). For RNA-Seq data analysis, adapter clipping and quality trimming were performed for data pre-processing, and the reads were aligned to the hg19 genome (patient data) or the mm10 genome (murine data) using kallisto. Differential expression analysis was performed using the likelihood ratio test of the DESeq2 package (genes with >2-fold change and a q-value < 0.05 were considered significantly differentially expressed). Gene set enrichment analysis (GSEA) and annotation, including functional annotation clustering and functional classification, were performed with Enrichr [18].

2.4.
Variant calling from transcriptomic data
The previously preprocessed human RNA-Seq datasets were realigned to the hg19 reference (Ensembl version 94) with STAR (2-pass mode). Variant calling was performed with the GATK toolkit [19] using specialized filters (e.g., variants were only considered if they were confirmed by five independent reads). A comprehensive workflow is shown in Supplementary Figure S1.

2.5. Experimental CRISPR-Cas9 induced SH2B3/LNK antisense silencing mouse model

2.5.1. Lentiviral vector production
The lentiviral vectors pRRL.U6.Lnk-sgRNA.EFS.dTomato.pre or pRRL.U6.NT-sgRNA.EFS.eBFP2.pre were packaged into viral particles by transfection of 10 µg vector, 12 µg pcDNA3.GP.4xCTE, 6 µg pRSV-Rev and 2 µg pMD2.G into HEK-293T cells in 10 cm plates using the calcium-phosphate method. The medium was changed 6-8 h later and viral supernatants were harvested 30 h and 54 h post-transfection. The lentiviral supernatants were pooled and concentrated by ultracentrifugation. Vector titers were determined on lineage-negative mouse bone marrow cells.

2.5.2. Competitive bone marrow transplantation
Lineage-negative mouse bone marrow cells were isolated by flushing femurs, tibias and pelves of GFP^+Cas9 mice (B6J.129(Cg)-Gt(ROSA)26Sortm1.1(CAG-cas9*,-EGFP)Fezh/J) (IMSR Cat# JAX:026179, RRID:IMSR_JAX:026179) followed by lineage depletion using the MojoSort Mouse Hematopoietic Progenitor Cell Isolation Kit (BioLegend). Cells were prestimulated for 24 h in StemSpan (Stem Cell Technologies) supplemented with 100 U/mL penicillin and 100 µg/mL streptomycin (PAA), 2 mM L-glutamine (Biochrom), 20 ng/mL mTPO, 20 ng/mL mIGF-2, 10 ng/mL mSCF, 10 ng/mL hFGF-1 (all cytokines: Peprotech), 20 µg/mL meropenem (Hexal) and 10 µg/mL heparin (Ratiopharm). Cells were transduced at a density of 1.5 × 10^6 cells/mL at an MOI of 30.
For competitive transplantation, equal numbers of cells transduced with the lentiviral vector pRRL.U6.Lnk-sgRNA.EFS.dTomato.pre or pRRL.U6.NT-sgRNA.EFS.eBFP2.pre were mixed, and about 5 × 10^5 cells per mouse were transplanted into irradiated (2 × 4.5 Gy) GFP^− CD45.2 B6 (C57BL/6) recipients. Cell mixtures were analyzed by flow cytometry 4-5 d after transduction to confirm equal distribution of both cell fractions. At weeks 4, 8, 12, and 18 after transplantation, blood counts were performed and peripheral blood was analyzed for donor cell engraftment and lineage distribution by flow cytometry. All experimental procedures were conducted in accordance with the German Animal Protection Law Guidelines for the Care and Use of Laboratory Animals, and the study protocol was approved by the Ethics Committee of the LAVES (Lower Saxony State Department for Consumer and Food Safety Protection), Germany.

2.5.3. Experimental SH2B3/LNK knockout model
The SH2B3/LNK^–/– mouse strain was generated as described previously [10]. C57BL/6 mice (CLEA Japan, Tokyo, Japan) were used as WT control mice. GFP transgenic mice (GFP-Tg mice; C57BL/6TgN [act EGFP] Osb Y01) were mated with WT mice or SH2B3/LNK^–/– mice to generate WT/GFP mice or SH2B3/LNK^–/–/GFP mice, respectively, for BM transplantation (BMT) studies. All experimental procedures were conducted in accordance with the Japanese Physiological Society Guidelines for the Care and Use of Laboratory Animals, and the study protocol was approved by the Ethics Committee of the RIKEN Center for Developmental Biology.

2.5.4. Statistical analysis
The results were statistically analyzed using a software package (Statview 5.0, Abacus Concepts Inc, Berkeley, CA). All values were expressed as mean±standard deviation (mean±SD). Comparisons among more than three groups were made using one-way analysis of variance (ANOVA) in Prism 4 (GraphPad Software, San Diego, CA).
Post hoc analysis was performed by Tukey's multiple comparison test, Mann-Whitney comparison test or Bonferroni post-hoc test. Differences of p<0.05 were considered to denote statistical significance.

2.5.5. Data analysis with machine learning
Key features were identified and the comprehensive patient data were classified using supervised and unsupervised Machine Learning (ML) algorithms. We preprocessed the data by removing features with low variance and high correlation for dimension reduction, following best-practice recommendations. We compared the following supervised algorithms: AdaBoost (AB), Gradient Boosting (GB), Support Vector Machines (SVM), and Random Forest (RF) [20]. We employed classifiers that are suitable for training on small data sets, compared the features given little training data, and chose the most appropriate algorithm according to accuracy and robustness towards overfitting [21]. Supervised ML models were 10-fold cross-validated with 100 repetitions. We then applied feature selection for the AB, GB, and RF classifiers to further reduce the number of features to <20. We employed principal component analysis (PCA), t-distributed Stochastic Neighbor Embedding (tSNE), and Uniform Manifold Approximation and Projection (UMAP, https://arxiv.org/abs/1802.03426) for unsupervised machine learning classification and nonlinear dimensionality reduction.

2.5.6. WGCN analysis
Weighted gene coexpression network analysis (WGCNA) was performed by applying the R package “WGCNA” to the human RNA-Seq count data. We first constructed the topological overlap matrix (TOM) of all investigated transcripts (~160,000) using the soft thresholding method. We calculated the eigenvalues of the transcripts and evaluated the adjacency based on distance. We subjected the transcripts to hierarchical clustering (average linkage) and assigned them to groups with the dynamic hybrid method.
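The soft-thresholding and topological-overlap steps described above can be illustrated with a minimal pure-Python sketch. It is not the analysis code of this study (which used the R package WGCNA); it assumes the standard unsigned adjacency a_ij = |cor(x_i, x_j)|^β and the standard TOM definition, and the power β=6 as well as the toy expression vectors are hypothetical values chosen only for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency a_ij = |cor_ij| ** beta with a zero diagonal.

    expr is a list of per-transcript expression vectors; beta is the
    soft-thresholding power (assumed here, not taken from the paper).
    """
    m = len(expr)
    adj = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            adj[i][j] = adj[j][i] = abs(pearson(expr[i], expr[j])) ** beta
    return adj

def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    m = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each transcript
    tom = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            shared = sum(adj[i][u] * adj[u][j] for u in range(m))
            value = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
            tom[i][j] = tom[j][i] = value
    return tom

# Hypothetical toy data: three transcripts measured in four samples.
toy_expr = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 3, 2, 4]]
toy_adj = soft_threshold_adjacency(toy_expr, beta=6)
toy_tom = topological_overlap(toy_adj)
```

In this toy example the first two transcripts are perfectly correlated and therefore receive a higher topological overlap with each other than with the third transcript; hierarchical clustering would then operate on the dissimilarity 1 − TOM.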
We computed the connectivity based on the interaction partners (k) and evaluated the gene significance, which represents the resulting module membership.

3. Results
In our analysis, we integrated clinical genotype and phenotype data with experimental gene knockout animal modeling, aiming to unravel and validate diagnostic associations between blood, bone marrow, and heart tissue (Fig. 1). At the phenotypic level, left ventricular function measured by magnetic resonance imaging (MRI) showed recovery, with a mean difference in the primary endpoint outcome ∆LVEF (d.180/0) of +16% in Responders (R) vs. -1.1% in Non-responders (NR) (p<0.01, t-test; Mann-Whitney Rank Sum test) (Table 1). In R, a significant difference was found for myocardial capillary perfusion measured by MRI, with increased epicardial (p=0.038, t-test; Mann-Whitney Rank Sum test) and endocardial (p=0.024, t-test; Mann-Whitney Rank Sum test) maximal upslope velocity after 180 days (Table 1).

Table 1. Left ventricular function and myocardial perfusion outcome analysis. MRI evaluation of the biomarker subgroup (n=23) for the primary endpoint (delta LVEF 180/0), myocardial function by long-axis-strain analysis, and myocardial perfusion by semiquantitative analysis (mean value of 16 segments). Responders (n=14) were classified according to primary endpoint outcome by delta LVEF >5% d.180/0, non-responders (n=9) by delta LVEF <5% d.180/0. Long-axis-strain measurement was performed according to Giesdal O et al [22]; myocardial perfusion was measured according to Mordini FE et al [23].
Parameter / group | Baseline (day 0) | SD | Primary endpoint (day 180) | SD | Delta (180/0) | P-value (t-test; Mann-Whitney Rank Sum test)
LVEF (%)
Responder (n evaluable=14) | 33.3 | 5.0 | 49.3 | 6.7 | 16.0 | P≤0.001
Non-Responder (n evaluable=9) | 33.3 | 7.5 | 32.2 | 9.1 | -1.1 | P=0.781
Responder – Non-responder | 0 | | 17.1 | | 17.1 | P≤0.001
Long axis strain (LAS global)
Responder (n evaluable=14) | -7.6 | 2.2 | -9.4 | 2.2 | -1.8 | P=0.032
Non-Responder (n evaluable=9) | -8.4 | 2.7 | -9.5 | 2.7 | -1.1 | P=0.402
Responder – Non-responder | +0.8 | | +0.1 | | -0.7 | P=0.416
Maximal upslope epicardial
Responder (n evaluable=13) | 27.0 | 10.4 | 37.4 | 17.3 | 10.4 | P=0.018
Non-Responder (n evaluable=9) | 29.6 | 11.5 | 28.7 | 9.1 | -0.9 | P=0.895
Responder – Non-responder | -2.6 | | +8.7 | | 11.3 | P=0.038
Maximal upslope endocardial
Responder (n evaluable=13) | 29.7 | 12.6 | 42.4 | 19.5 | 12.7 | P=0.014
Non-Responder (n evaluable=9) | 33.6 | 11.5 | 33.8 | 9.7 | 0.2 | P=0.967
Responder – Non-responder | -3.9 | | 8.6 | | 12.5 | P=0.024

3.1. A: Clinical phenotype and genotype of cardiac regeneration response

3.1.1. Gene expression analysis
In addition to the previously identified correlating angiogenesis biomarkers and SH2B3/LNK RT-PCR analysis of PB [17], we performed an in-depth gene expression analysis. To study the transcriptome profile patterns of the R and NR signatures, polyadenylated RNA was captured and subjected to high-throughput sequencing. The experimental procedure included depletion of cDNA derived from globin messenger transcripts to enable high-resolution RNA-Seq in preoperative PB samples from 23 patients (14 R, 9 NR). Differential gene expression analysis revealed distinct R/NR patterns consisting of 161 significant genes (q<0.05) out of ~160,000 transcripts. The highest significance was found for 122 unique genes (R/NR: q=0.02) (Supplementary Data SD1a). All clustering methods were used to examine potentially occurring patient subgroups.
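The q<0.05 cut-off used for the differential expression call can be illustrated with a short sketch. It assumes that the q-values are Benjamini-Hochberg adjusted p-values (the default adjustment in DESeq2; the text does not state the method explicitly) and combines them with the >2-fold change criterion from the Methods. The gene names, p-values, and log2 fold changes below are hypothetical and serve only to show the arithmetic.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone (each is the minimum of all later candidates).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_significant(genes, pvalues, log2_fold_changes,
                       q_cutoff=0.05, min_abs_log2fc=1.0):
    """Keep genes with q < q_cutoff and more than 2-fold change (|log2FC| > 1)."""
    qvalues = benjamini_hochberg(pvalues)
    return [
        gene
        for gene, q, lfc in zip(genes, qvalues, log2_fold_changes)
        if q < q_cutoff and abs(lfc) > min_abs_log2fc
    ]

# Hypothetical example values (not taken from the study data).
example_genes = ["geneA", "geneB", "geneC", "geneD"]
example_p = [0.01, 0.04, 0.03, 0.005]
example_lfc = [1.5, 2.0, -0.4, -1.2]
example_q = benjamini_hochberg(example_p)
example_hits = select_significant(example_genes, example_p, example_lfc)
```

With these made-up inputs all four q-values fall below 0.05, but the third gene is dropped because its fold change is smaller than 2-fold, which shows why both criteria matter for the final gene list.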
Three independent clustering analyses (PCA, tSNE, and UMAP) on all gene expression read counts showed a clear distinction between patients (Fig. 2a, Supplementary Figure 1c). All methods consistently clustered the patients into the same defined subgroups. Pathway analysis of the differing genes was subsequently conducted on each of the three clusters to investigate the specific differences in gene signaling among these subgroups (Fig. 2a, Supplementary Data SD1b). We then performed coexpression analysis by WGCNA, a so-called guilt-by-association approach, to interconnect SH2B3/LNK with similarly regulated transcripts. SH2B3/LNK was found to be coexpressed within a cluster of 872 genes (Supplementary Data SD1c). The corresponding pathways of the coexpressed genes were the c-KIT receptor signaling pathway as well as EGF, PDGF, TCR, IL6, and Interferon 1 signaling (Table 2).

Fig. 2. a: ML subgroup clusters of the cohort study (Responder, n=14, red points; Non-responder, n=9, grey points). b: Machine learning feature selection on clinical trial research data and RNA-Seq data. Accuracy comparison for the supervised prediction of patient responsiveness using only preoperative data. Results were obtained after feature selection and subsequent prediction with two independent classifiers. The graph shows the true positive prediction weights of the ML model (RF for feature selection and SVM for final prediction). Combinations and subsets of these features were subsequently used to train the final model. The importance indicates a hierarchy of the most relevant features needed for classification.

Table 2. Gene set enrichment pathway analysis performed with Enrichr for differential gene expression, coexpression, and transcriptomic variants, based on preoperative RNA-Seq data.
The 161 significantly differentially expressed transcripts identified by DESeq2 and the 872 WGCNA transcripts were submitted to Enrichr pathway enrichment analysis using the WikiPathways and BioCarta databases. The obtained pathways are significantly enriched according to the adjusted p-value < 0.05.

Pathway Term | p-value | Adjusted p-value

Differential expression, BioCarta
Ras Signaling Pathway_Homo sapiens_h_rasPathway | 0.0004127 | 0.0235251
AKT Signaling Pathway_Homo sapiens_h_aktPathway | 0.0076997 | 0.1097214
Cyclin E Destruction Pathway_Homo sapiens_h_fbw7Pathway | 0.0018925 | 0.0447290
E2F1 Destruction Pathway_Homo sapiens_h_skp2e2fPathway | 0.0023542 | 0.0447290
Control of Gene Expression by Vitamin D Receptor_Homo sapiens_h_vdrPathway | 0.0169139 | 0.1722122
Beta-arrestins in GPCR Desensitization_Homo sapiens_h_bArrestinPathway | 0.0181276 | 0.1722122

Differential expression, WikiPathways
Hematopoietic Stem Cell Differentiation_Homo sapiens_WP2849 | 0.0050742 | 0.2572584
Translation Factors_Homo sapiens_WP107 | 0.0063783 | 0.2572584
AMPK Signaling_Homo sapiens_WP1403 | 0.0145744 | 0.4408766
RalA downstream regulated genes_Homo sapiens_WP2290 | 0.0034194 | 0.2572584
EGFR1 Signaling Pathway_Mus musculus_WP572 | 0.0384020 | 0.4723207
IL-6 signaling Pathway_Mus musculus_WP387 | 0.0353539 | 0.4723207
Androgen receptor signaling pathway_Homo sapiens_WP138 | 0.0284065 | 0.4723207
Striated Muscle Contraction_Mus musculus_WP216 | 0.0369516 | 0.4723207
Striated Muscle Contraction_Homo sapiens_WP383 | 0.0321361 | 0.4723207

Coexpression, BioCarta
EGF Signaling Pathway_Homo sapiens_h_egfPathway | 0.0001867 | 0.0148901
PDGF Signaling Pathway_Homo sapiens_h_pdgfPathway | 0.0004153 | 0.0148901
Control of Gene Expression by Vitamin D Receptor_Homo sapiens_h_vdrPathway | 0.0004653 | 0.0148901
IL 6 signaling pathway_Homo sapiens_h_il6Pathway | 0.0023696 | 0.0440702
Cell to Cell Adhesion Signaling_Homo sapiens_h_cell2cellPathway | 0.0020125 | 0.0440702
Eukaryotic protein translation_Homo sapiens_h_eifPathway | 0.0027544 | 0.0440702
T Cell Receptor Signaling Pathway_Homo sapiens_h_tcrPathway | 0.0037212 | 0.0486744
Map Kinase Inactivation of SMRT Corepressor_Homo sapiens_h_egfr_smrtePathway | 0.0040712 | 0.0486744
Internal Ribosome entry pathway_Homo sapiens_h_iresPathway | 0.0045632 | 0.0486744
TPO Signaling Pathway_Homo sapiens_h_TPOPathway | 0.0080521 | 0.0515331
Inhibition of Cellular Proliferation by Gleevec_Homo sapiens_h_gleevecpathway | 0.0067889 | 0.0515331
Erk1/Erk2 Mapk Signaling pathway_Homo sapiens_h_erkPathway | 0.0067889 | 0.0515331
Sprouty regulation of tyrosine kinase signals_Homo sapiens_h_spryPathway | 0.0056252 | 0.0515331
How Progesterone Initiates the Oocyte Maturation_Homo sapiens_h_mPRPathway | 0.0074082 | 0.0515331
mTOR Signaling Pathway_Homo sapiens_h_mTORPathway | 0.0080521 | 0.0515331

Coexpression, WikiPathways
Kit Receptor Signaling Pathway_Mus musculus_WP407 | 0.0000385 | 0.0056566
mRNA processing_Mus musculus_WP310 | 0.0000874 | 0.0064241
Interferon type I signaling pathways_Homo sapiens_WP585 | 0.0002457 | 0.0101753
EGF/EGFR Signaling Pathway_Homo sapiens_WP437 | 0.0003291 | 0.0101753
EPO Receptor Signaling_Homo sapiens_WP581 | 0.0004153 | 0.0101753
EPO Receptor Signaling_Mus musculus_WP1249 | 0.0004153 | 0.0101753
mRNA Processing_Homo sapiens_WP411 | 0.0007745 | 0.0162654
PDGF Pathway_Homo sapiens_WP2526 | 0.0013840 | 0.0254306
IL-6 signaling pathway_Homo sapiens_WP364 | 0.0018385 | 0.0270260
IL-7 Signaling Pathway_Mus musculus_WP297 | 0.0018385 | 0.0270260
Translation Factors_Mus musculus_WP307 | 0.0022338 | 0.0298523
IL-3 Signaling Pathway_Homo sapiens_WP286 | 0.0026782 | 0.0325360
EGFR1 Signaling Pathway_Mus musculus_WP572 | 0.0028773 | 0.0325360

SNP-Responder, BioCarta
Calcium Signaling by HBx of Hepatitis B virus_Homo sapiens_h_HBxPathway | 0.0025401 | 0.0379514
T Cell Receptor Signaling Pathway_Homo sapiens_h_tcrPathway | 0.0027108 | 0.0379514
IL 4 signaling pathway_Homo sapiens_h_il4Pathway | 0.0025401 | 0.0379514
Repression of Pain Sensation by the Transcriptional Regulator DREAM_Homo sapiens_h_dreampathway | 0.0025401 | 0.0379514
Nuclear receptors coordinate the activities of chromatin remodeling complexes and coactivators to facilitate initiation of transcription in carcinoma cells_Homo sapiens_h_rarrxrPathway | 0.0025401 | 0.0379514

SNP-Responder, WikiPathways
mRNA Processing_Homo sapiens_WP411 | 0.0000037 | 0.0005582
Diurnally Regulated Genes with Circadian Orthologs_Homo sapiens_WP410 | 0.0001005 | 0.0037941
Diurnally Regulated Genes with Circadian Orthologs_Mus musculus_WP1268 | 0.0001005 | 0.0037941
Exercise-induced Circadian Regulation_Mus musculus_WP544 | 0.0001005 | 0.0037941
mRNA processing_Mus musculus_WP310 | 0.0001861 | 0.0056192
IL-2 Signaling Pathway_Homo sapiens_WP49 | 0.0012438 | 0.0268307
Cytoplasmic Ribosomal Proteins_Homo sapiens_WP477 | 0.0010767 | 0.0268307
IL-4 Signaling Pathway_Homo sapiens_WP395 | 0.0025723 | 0.0409333
RANKL/RANK Signaling Pathway_Homo sapiens_WP2018 | 0.0027108 | 0.0409333
Apoptosis-related network due to altered Notch3 in ovarian cancer_Homo sapiens_WP2864 | 0.0024383 | 0.0409333

SNP-Non-responder, BioCarta
Mechanism of Protein Import into the Nucleus_Homo sapiens_h_npcPathway | 0.0017807 | 0.1887530
Thrombin signaling and protease-activated receptors_Homo sapiens_h_Par1Pathway | 0.0187440 | 0.2862846
Role of MEF2D in T-cell Apoptosis_Homo sapiens_h_mef2dPathway | 0.0248418 | 0.2862846
ADP-Ribosylation Factor_Homo sapiens_h_arapPathway | 0.0227045 | 0.2862846
Spliceosomal Assembly_Homo sapiens_h_smPathway | 0.0114323 | 0.2862846
Cycling of Ran in nucleocytoplasmic transport_Homo sapiens_h_ranPathway | 0.0144950 | 0.2862846
Role of Ran in mitotic spindle regulation_Homo sapiens_h_ranMSpathway | 0.0215374 | 0.2862846
Erythropoietin mediated neuroprotection through NF-kB_Homo sapiens_h_eponfkbPathway | 0.0297088 | 0.2862846

SNP-Non-responder, WikiPathways
Proteasome Degradation_Homo sapiens_WP183 | 0.0000001 | 0.0000161
Allograft Rejection_Homo sapiens_WP2328 | 0.0000055 | 0.0007073
Proteasome Degradation_Mus musculus_WP519 | 0.0000215 | 0.0018437
G13 Signaling Pathway_Mus musculus_WP298 | 0.0007868 | 0.0505532

3.1.2.
Stratification of patients by ML-selected features and correlation analysis

ML feature selection was applied as an independent method to identify the most important features among all gene expression and PERFECT trial outcome data. A first model achieved a prediction accuracy of 90% (ROC AUC 91.2%; CI: 89.4-93.0) when the 20 most important features were selected as potential biomarkers (Fig. 2b).

An integrative correlation analysis was applied to identify interrelations between the transcriptomic and phenotypic layers as well as between known angiogenesis and immune response markers. In particular, we correlated the preoperative biomarkers for RvsNR identified in the prior PERFECT trial analysis [17], ML top-selected genes (PLCG1, LPCAT2, GRB2, KLF8, AFAP1, MARK3, AP1B1), CHIP-related genes (TET2, ASXL1, DNMT3A), the previously identified gene SH2B3, which encodes the adaptor protein LNK, and related pathway genes (EPOR, KIT, KIT-L, PROM1/CD133, NOTCH2, PDCD1/PD-1, ATXN1L, MTOR) as well as myocardial perfusion parameters (Fig. 3).

The top-listed correlations (p<0.05; Pearson correlation coefficient) were found between SH2B3 and the gene expression of NOTCH2, KLF8, NOTCH2NLC, TET2, ASXL1, PLCG1, and ATXN1L (Fig. 3). The ML top-listed genes PLCG1 and LPCAT2 were correlated with ΔLVEF response (p>0.05; Pearson correlation coefficient), and PDCD1/PD-1 with ΔLV perfusion (p<0.05). Response was also correlated with increased PROM1/CD133 RNA, AFAP1 RNA, myocardial perfusion (Δ maximal upslope gradient epicardial after 180 days), preoperative leukocyte count, CD34 count, IGFBP3 serum protein, and hemoglobin (p>0.05; Pearson correlation coefficient; n=23). Non-response (negative correlation with ΔLVEF response) correlated with preoperative LVEDV index, VEGF-B, NOTCH2NLA gene expression, and serum levels of NT-proBNP, VEGF, erythropoietin, and IP10 (p<0.05; Pearson correlation coefficient; n=23).
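To make the correlation step concrete, the following minimal Python sketch computes a Pearson correlation coefficient between two variables and assigns it to one of the three significance tiers used in Fig. 3 (p<0.01, p<0.05, p>0.05). It is an illustration under stated assumptions, not the analysis pipeline of the study: the input values are invented toy numbers rather than trial data, the names `pearson_r`, `permutation_p`, and `significance_tier` are hypothetical, and the p-value is obtained with a permutation test as a stand-in because this section does not state how the p-values were computed.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p(x, y, n_perm=5000, seed=0):
    """Two-sided permutation p-value for Pearson's r.

    Assumption: a permutation test stands in for whatever test the
    authors used; it shuffles y and counts how often |r| is at least
    as large as the observed |r|.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def significance_tier(p):
    """Map a p-value to the three tiers shown as dot sizes in Fig. 3."""
    if p < 0.01:
        return "p<0.01"
    if p < 0.05:
        return "p<0.05"
    return "p>0.05"


# Hypothetical toy values (NOT trial data): one expression-like variable
# and one outcome-like variable for eight imaginary patients.
expression = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
delta_lvef = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

r_demo = pearson_r(expression, delta_lvef)
p_demo = permutation_p(expression, delta_lvef)
print(f"r = {r_demo:.3f}, tier = {significance_tier(p_demo)}")
```

Repeating this for every pair of variables yields the kind of correlation matrix summarized in Fig. 3. With many pairs tested at once, a multiple-testing correction would normally be considered as well; whether one was applied to these correlations is not stated in this section.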
As an example, an even higher complexity of differential gene transcript correlations with different genes was demonstrated for PROM1/CD133 and NOTCH2 at the isoform level (Supplemental Fig. S2).

Fig. 3. Integration of RNA-Seq, perfusion, and clinical trial research data for Pearson correlation analysis. Comparison of peripheral blood (PB) circulating cells and biomarkers (orange), MRI myocardial perfusion parameters (green), and human PB gene expression data (RNA-Seq) (black). The ΔLVEF response (red) is highlighted for an improved visual analysis of important correlations. The color scale, ranging from 1 to -1 in the upper panel (blue to red), represents the correlation between the different factors. The size of the dots represents the significance (p<0.01, p<0.05, and p>0.05; Pearson correlation) of the respective correlation (For interpretation of the references to color in this