Abstract Generating condition-specific metabolic models by mapping gene expression data to genome-scale metabolic models (GEMs) is a routine approach to elucidate disease mechanisms from a metabolic perspective. On the other hand, integrating variants that perturb enzyme functionality from the same RNA-seq data may enhance GEM accuracy, offering insights into genome-wide metabolic pathology. Our study pioneers the extraction of both transcriptomic and genomic data from the same RNA-seq data to reconstruct personalized metabolic models. We map genes with significantly higher load of pathogenic variants in Alzheimer’s disease (AD) onto a human GEM together with the gene expression data. Comparative analysis of the resulting personalized patient metabolic models with the control models shows enhanced accuracy in detecting AD-associated metabolic pathways compared to the case where only expression data is mapped on the GEM. Besides, several otherwise would-be missed pathways are annotated in AD by considering the effect of genomic variants. Subject terms: Computational models, Data integration __________________________________________________________________ The authors used the iMAT algorithm to map the genes associated with Alzheimer’s Disease (AD) to the genome-scale metabolic models (GEMs). Introduction Alzheimer’s Disease (AD) is a complex disease without exact treatment. A systems-level perspective is required to discover the molecular mechanisms driving the multifaceted pathogenesis of the disease. Recent research on AD revealed that cellular metabolism is highly affected in the disease. The brain of AD patients shows mitochondrial dysfunction^[30]1, impairment of lipid metabolism^[31]2 and reduced overall energy metabolism^[32]3. In the light of these findings, therapeutic strategies for AD have also shifted focus to the metabolism. As an example, the Drug Repurposing for Effective Alzheimer’s Medicines (DREAM) study is a large scale study for drug repurposing on AD and related dementias that target metabolic abnormalities in AD^[33]4. Genome-scale metabolic models (GEMs) are powerful instruments to study metabolic events in the cell comprehensively. GEMs describe all known metabolic reactions and their gene associations for an organism, enabling the analysis of alterations in metabolic reactions and pathways at different conditions including disease states. Algorithms are available to integrate transcriptomic data with GEMs to generate reduced, condition-specific metabolic models^[34]5. iMAT^[35]6 is one of the most widely used integration algorithms for mammalian cells since it does not require any specific measurement constraints and any definition of biological objective in terms of metabolic reactions, which is the case for the majority of such algorithms. Recently, Baloni et al. ^[36]7 used iMAT to construct AD models and showed that bile acid and cholesterol metabolism reactions were commonly active in different brain-region specific metabolic models. Also, Moolamalla and Vinod^[37]8 used the iMAT algorithm to generate metabolic models for different neuropsychiatric diseases like schizophrenia, bipolar disorder, and major depressive disorder. Whole genome sequencing and whole exome sequencing are popular methods to detect single-nucleotide variants (SNVs) on the genome. RNAseq data can be used to identify pathogenic variants based on the sequencing of exonic regions. This makes RNAseq data unique since both gene expression levels and pathogenic variants can be identified from the same sample^[38]9. Different studies in the literature have already reported the application of variant discovery from RNAseq data in different diseases^[39]10,[40]11. However, the use of genomic and transcriptomic information content of RNAseq data together to generate condition-specific genome-scale metabolic models have remained unexplored to date. In this study, we used the iMAT algorithm to generate personalized genome-scale metabolic models for each individual in three different large AD study cohorts; The Religious Orders Study (ROS) and Rush Memory and Aging Project (MAP)^[41]12, Mayo Clinic^[42]13 and Mount Sinai Brain Bank (MSBB)^[43]14, which collectively cover a total of 643 AD individuals. For the first time in the literature, we extracted both transcriptomic and genomic information from the same RNA-seq sample to generate personalized metabolic models. We show that the consideration of genes with significantly higher load of pathogenic variants in AD state considerably improves metabolic models by capturing a number of otherwise missed AD-associated metabolic alterations at pathway level. Methods Transcriptome datasets Three most commonly used RNAseq-derived AD datasets in the literature were used in this study. ROSMAP, Mayo Clinic and MSBB raw FASTQ files were retrieved from the Synapse Platform ([44]https://www.synapse.org) with accession IDs: syn17024112, syn9738945 and syn7416949, respectively. Clinical metadata of the datasets were also retrieved from the Synapse platform. The ROSMAP dataset was derived from the dorsolateral prefrontal cortex, the Mayo Clinic dataset from the temporal cortex and the MSBB dataset from the parahippocampal gyrus. Samples were categorized as AD and control based on the CERAD score. Individuals with CERAD score 4 (No AD) were considered as the control group, and 1 (Definite AD) and 2 (Probable AD) were considered as AD. Since there was no CERAD score information in the Mayo data, sample classification was done based on Braak Stage (0-III as control, IV-VI as AD). ROSMAP, Mayo Clinic and MSBB RNAseq datasets include 404, 82 and 158 AD and 165, 78 and 26 control samples, respectively. Gene expressions from RNAseq data The FASTQ files were subjected to FastQC tool (version 0.11.9)^[45]15 for quality control, and low-quality reads, short reads and adapter sequences were trimmed by Trimmomatic (version 0.39)^[46]16. Trimmed reads were aligned to human reference genome (hg38) by the STAR algorithm (version 2.7.8a)^[47]17. Then, featureCounts tool (version 2.0.2)^[48]18 was used to obtain raw counts. Between-sample normalization was applied on the raw counts using Deseq2^[49]19 R package. The log2 values of normalized counts were calculated, and the log-transformed values were adjusted for major covariates (age, gender and post-mortem interval). To this end, the built-in R function lm was used to generate linear models and to calculate coefficients of age, gender and post-mortem interval covariates as described before^[50]20. Subsequently, the effects of the covariates were removed from the normalized and log2-transformed count data. Principal component analysis (PCA) was performed to detect outlier samples for each dataset. 1 AD (Sample ID: 500_120515) and 1 control (Sample ID: 380_120503) from the ROSMAP dataset and 3 controls from the Mayo Clinic dataset (Sample IDs: 11294, 11396, 11399) were determined to be outliers and were removed from the data. Variant identification from RNAseq data By using the trimmed FASTQ files, the STAR tool was re-run to get splice junction information for each sample. Variant calling was then performed after a second indexing and alignment, using GATK tools (version 4.2.0.0) as described in GATK Best Practices for Variant Calling in RNAseq^[51]21. Briefly, AddOrReplaceReadGroups was used to add read group information to BAM files, and it was followed by MarkDuplicates, which identifies duplicate reads resulting from the PCR step. Then, SplitNCigarReads was used to split mapped reads at the intronic regions, and BSQR (Base Quality Score Recalibration) was used to re-score base quality scores. Afterwards, variants were detected using the HaplotypeCaller tool, and a VCF file was generated for each sample. The variants were filtered and annotated by the VariantFiltration tool of GATK and ANNOVAR^[52]22 respectively. In further analyses, biallelic variants with read depth value equal to or greater than five were used. Gene-level pathogenicity score calculation The GenePy^[53]23 algorithm was employed to transform variant-level pathogenicity scores into gene-level pathogenicity scores. This algorithm operates by aggregating the impact of all variants within a gene in an additive fashion, thereby generating a cumulative pathogenicity score that considers the combined effects of numerous small to moderate effects exerted by each variant. This method closely follows the non-Mendelian inheritance patterns typically observed in complex diseases. GenePy scores of each gene for each individual were computed in the R environment by using the in-house script implemented based on the GenePy score formula and GenePy algorithm. In this study, Rare Exome Variant Ensembl Learner (REVEL) scores^[54]24 were used as the deleteriousness value. REVEL predicts variant pathogenicity by combining scores from 13 distinct in silico pathogenicity prediction tools. The REVEL score of a variant takes values between 0 and 1, with higher scores reflecting a higher likelihood of the variant being pathogenic. Structural variants (e.g., frameshift indels, stop loss/gain mutations) that lead to protein truncation/elongation are not assigned REVEL scores. Therefore, the maximum deleteriousness value 1 was assigned to all structural variants due to their profoundly damaging impact on protein function. The Genome Aggregation Database (gnomAD)^[55]25 exome allele frequencies were used as the allele frequency in the GenePy score calculation. Since the gnomAD exome database encompasses 125,748 individuals, variants lacking frequency data in gnomAD were assigned an allele frequency of 3.98 × 10^–6, indicating one allele present among 125,748 individuals. The GenePy scores were computed for each sample in each dataset. Larger genes can accumulate higher GenePy scores as they tend to harbor a greater number of variants, resulting in inflated GenePy scores. To address this, GenePy scores were divided by gene length and multiplied by the median observed gene length in the data for gene length correction^[56]23. Finally, the GenePy scores were adjusted to account for gender effect using the lm function in R, as mentioned in the previous section. To determine genes with significantly higher pathogenic variants in a personalized manner for each AD individual, the GenePy score of each gene in a given AD sample was combined with the scores of that gene in all the control samples in that dataset, and the values were ranked. If the score is higher than 95% of the scores from the control individuals, the gene was marked as a gene with higher load of pathogenic variants in AD. Personalized metabolic model reconstruction To reconstruct personalized genome-scale metabolic models for all controls and AD cases, genes in the covariate-adjusted expression data were mapped individually to the human genome-scale metabolic model (Human-GEM) using the integrative metabolic analysis tool (iMAT)^[57]6. The iMAT algorithm was run with the parameters described in our previous study^[58]26. Due to the covariate adjustment of the expression data, there are negative values in the data. iMAT algorithm available through the COBRA Toolbox^[59]27 ignores genes with negative expression values. Hence, the absolute value of the smallest negative expression value in each dataset was added to the whole data. In this way, all expression values were converted to positive values. Since the parameters used by the iMAT algorithm to determine active and inactive reactions are based on percentiles, this process does not affect the results. Human-GEM (version 1.12.0)^[60]28 was retrieved from the GitHub repository ([61]https://github.com/SysBioChalmers/Human-GEM). This comprehensive metabolic network includes 13,070 reactions associated with 3067 genes, 8369 metabolites and 143 pathways. 3055 out of 3067 genes in the Human-GEM are included in the transcriptome datasets. Simulations were performed using MATLAB R2020a with the Gurobi solver. To incorporate the effects of the variants that disrupt protein function into personalized metabolic models (Fig. [62]1A), the following steps were followed: (i) It was hypothesized that genes with significantly higher load of pathogenic variants in AD, as identified by the GenePy algorithm, would not be able to form functional proteins due to mutations on them, even if their mRNA expression levels were not low. Based on this assumption, the expression levels of pathogenic variant genes were set to zero in the covariate adjusted expression data of that AD sample. This was repeated for all AD samples. (ii) Then, sample-based iMAT models were generated by forcing the removal of the reactions controlled by the combined list of the lowly expressed genes and the genes with pathogenic variants from the metabolic model, subject to mass-balance constraints around the intracellular metabolites. iMAT restores the reactions back if they violate mass-balance constraints or if the removal decreases the consistency of reaction rates with the transcriptome data. The resulting metabolic models were referred as AD^var models in this study. Fig. 1. Summary of iMAT-based personalized metabolic model generation and incorporation of the impact of pathogenic variants to models. [63]Fig. 1 [64]Open in a new tab A RNAseq data are obtained in Fastq format and processed with two different pipelines to find gene expression values and genomic variants. The genes with higher load of pathogenic variants and the genes with high or moderate expression levels were selected as inactive genes for each AD sample. B Gene expression values are mapped to the genome-scale metabolic model and AD, control and AD^var models are created with the iMAT algorithm. To create AD^var models, unlike the classical method, the genes carrying higher load of pathogenicity in AD were additionally used. C Differentially perturbed reactions between AD and control are identified by Fisher’s Exact Test after representing each personalized metabolic model as a binary vector, where “1” means the reaction is active in that model. Then, pathways overrepresented with the differentially perturbed reactions were predicted. Statistics and reproducibility The generated iMAT models were converted to binary format to apply the statistical tests. This conversion was based on the reactions in Human-GEM. Reactions active in the iMAT models were represented by 1 and inactive reactions were represented by 0 for each sample. Then, using these binary matrices, a contingency table was created for each reaction between AD and control conditions, and Fisher’s Exact Test was performed to identify significantly altered reactions in each dataset. P value cut-off of 0.05 was used for AD-Control models while the cut-off was 0.01 for AD^var-Control models. We used a more stringent p value cut-off in comparing AD^var-Control metabolic models when identifying affected reactions. Here, we aimed to make the number of affected reactions more comparable to the case where AD models ignoring genomic variants were used. We show that we still captured many more AD related mechanisms in AD^var-Control comparisons than AD-Control comparisons although we used a more stringent p value cut-off. The reaction-subsystem assignments in Human-GEM were used to find significantly affected pathways. Fisher Exact Test based p values of each reaction were converted to z-scores using the inverse cumulative density function. Each pathway z-score was then calculated by averaging the z-scores of the reactions associated with that pathway and corrected with 1000 random permutations as described elsewhere^[65]29,[66]30. Corrected pathway z-scores were converted to pathway p values using the cumulative density function. Significantly affected pathways were identified using the p value < 0.05 cut-off. This approach enabled consideration of all pathway reactions in the calculation compared to a hypergeometric test where only reactions with significant p values are considered. Identification of AD-related genes associated with the affected metabolic reactions Using the Gene-Protein-Reaction (GPR) rules of Human-GEM, the genes controlling significantly perturbed reactions were extracted. Experimentally confirmed and curated AD-related gene lists were collected from another study^[67]31 and DisGeNET^[68]32. There are 166 AD-related metabolic genes in the final list. The hypergeometric test was used to test whether the number of AD-related genes was significantly overrepresented in the identified lists of significantly perturbed reactions from AD or AD^var models. ROSMAP proteome data analysis ROSMAP proteome data was obtained from the Synapse platform with accession ID: syn21448334. There are 400 samples and 8817 proteins in the data. One protein with NA reads in more than 50% of the samples was excluded from the data. Other NA reads were filled with half of the minimum value in the data (limit of detection/2). Quantile normalization was performed using the preprocessCore R package. Then, 1545 proteins in common with the proteins (enzymes) covered by HumanGEM v.1.12 were selected. Similar to RNAseq analysis, AD and control samples were separated according to CERAD score. 251 AD and 106 control samples were identified. Covariate adjustment was performed with respect to age, sex and PMI. Using the rank-based approach, the abundance of each protein in an AD sample was compared with the abundance of that protein in all control samples. Thus, the abundance ranks of the proteins in each AD sample compared to the control were determined on a sample-based manner. Reporting summary Further information on research design is available in the [69]Nature Portfolio Reporting Summary linked to this article. Results and discussion iMAT models of AD, AD^var and control conditions Each sample in the ROSMAP, Mayo Clinic and MSBB post-mortem brain RNAseq datasets were mapped to Human-GEM using the iMAT algorithm to reconstruct personalized genome-scale metabolic models for both AD and control samples. This led to 908 models in total spanning the three datasets. The iMAT algorithm generates metabolic models by inactivating reactions controlled by enzymes encoded by genes with low expression, while keeping reactions controlled by enzymes encoded by genes with high expression in the model, subject to mass-balance constraints around intracellular metabolites^[70]6. It also ensures maximum consistency between gene expression levels and reaction rates. On the other hand, some genomic variants are pathogenic, i.e. they lead to dysfunctional enzymes, although they do not affect expression at the mRNA level^[71]33,[72]34. In this study, first, genomic variants in the form of single-nucleotide variants and insertions/deletions were predicted from the RNAseq datasets, and, then, their in silico pathogenicity scores were used to calculate gene-level scores. The correlation between expression levels and pathogenicity scores of metabolic genes was calculated for each sample. The correlation coefficients ranged between -0.03-0.23 for all the samples from three different cohorts (Supplementary Fig. [73]1). Therefore, the pathogenicity scores of a sample do not depend on gene expression levels and provide new knowledge. To integrate this knowledge into metabolic models, genes with significantly higher pathogenicity in AD samples were marked as inactive in generating iMAT-derived metabolic models, termed AD^var models (Fig. [74]1A-B). The number of metabolic genes carrying pathogenic variants in AD samples was about 50-150 for the studied datasets (Fig. [75]2A). This means that the genes additionally included in the list of genes to be removed from the metabolic models to create AD^var models is about ~1.5%-5% of the total number of genes in Human-GEM, which is not a high fraction. In other words, our approach of incorporating genes with higher load of pathogenic variants in AD introduces minimal intervention to the original iMAT approach. We also checked the expression levels of these genes to see how many of them are expressed higher than the high-expression cut-off used by iMAT. On average, we found 37 pathogenic variant carrying genes in ROSMAP, 44 in Mayo and 31 in MSBB to have expression levels higher than 75% of all the genes in transcriptome data (Fig. [76]2B). Thus, the number of metabolic genes manipulated in the transcriptome data is considerably less than the number of genes in the model. Fig. 2. Number of metabolic genes carrying significantly higher load of pathogenic variants in AD for each dataset. [77]Fig. 2 [78]Open in a new tab A Histogram of the number of metabolic genes with higher load of pathogenic variants in each AD sample for each dataset. Frequency axis shows the number of samples having a specific number of pathogenic genes as indicated in the x-axis. B Boxplot of the number of metabolic genes carrying pathogenic variants and having expression levels higher than 75% of the genes (the iMAT high-expression cut-off) in the whole transcriptome. Our major hypothesis in this study is that genes carrying higher pathogenic load in an AD patient compared to controls will not encode a functional protein although the gene is highly expressed. To validate this hypothesis, we focused on the genes that were assumed to be inactive in the iMAT analysis because of their high pathogenic load in AD although they were highly expressed (expression higher than 75% of the averaged expressions of the genes across all samples). For the ROSMAP dataset, there were 1126 such genes that were set to inactive state in at least one sample. Of note, setting a gene with high pathogenic variant load in AD to inactive state does not necessarily mean that the corresponding reaction will be removed from the model since a high number of reactions in the Human-GEM are indeed controlled by multiple enzymes/genes. Only 24 of these genes were set to inactive state in more than 10% of the ROSMAP AD samples (Supplementary Table [79]1). Besides, these 24 genes were not highly expressed in 85% of the AD samples. These genes were examined in more detail as explained below. Firstly, we checked the expression values of these 24 genes in the Mayo Clinic and MSBB datasets, and we determined for each dataset the number of samples in which their expressions were lower than the average expression. Among these 24 genes, six of them (PGGHG, PNPLA2, LPL, PLCD3, SLC52A3, and MPST) were below the average expression in at least 40% of the MSBB samples. This shows that these genes are not always highly expressed in AD, and the behavior we see in the ROSMAP dataset can be cohort-specific or brain-region specific. Supportively, Jia et al. ^[80]35 reported that mRNA levels of TCA cycle genes ACO2, SDHB and PDK2, which are among these 24 genes, were decreased in different brain regions in AD. Another study applied a transcriptome-wide association study (TWAS) on SLC25 family genes, and found that decreased expression of the SLC25A22 gene is associated with AD^[81]36. Bis et al. ^[82]37 created a list of AD-associated genes from GWAS studies reporting also the type of mutation causing the association. When these 24 genes were compared with their list, it was found that missense mutation in PNPLA2 and POLR2E genes have been associated with AD. In addition, the ROSMAP proteome data^[83]38 (synapse accession ID: syn21448334) was analyzed. 14 of these 24 genes have measured protein abundance in the proteome data. 5 (MEPCE, RNF13, POLR2E, UBE2E2, and SMPD1) out of these 14 genes have abundances below the average protein abundance in all AD samples in the ROSMAP proteome data. This implies that these genes are translated into non-functional or weakly functional proteins. In summary, for 15 of the 24 genes, there is evidence that their expression may be low in AD or that they may form non-functional proteins. Therefore, this supports inactivation of these genes by the approach proposed in this study. Subsequently, the iMAT models of each dataset were compared to investigate whether the numbers of active reactions between the control, AD^var and AD models are different. As shown in Fig. [84]3A, all models consisted of ~7000–8000 reactions while Human-GEM includes 13070 reactions. Accordingly, it was observed that the distribution of the number of active reactions was more similar in the AD and control models compared to the AD^var models. In the AD^var models, although there was no dramatic difference in terms of the number of reactions in the created personalized models, the distribution became more spread out. For each sample, we also looked at the difference between the number of reactions between the AD and AD^var models of that sample (Fig. [85]3B) and found that there was a difference of about 50 reactions in ROSMAP and MSBB and about 200 reactions for Mayo Clinic. Differing from the other datasets, the AD^var metabolic models of the Mayo Clinic dataset had fewer reactions than the AD models. To better understand this difference, Jaccard similarity indices were calculated between the binary matrices representing AD and AD^var models for each dataset, and the similarity of the binary models was around 85% for all the datasets. These results indicate that although there are differences between the models at the individual level, in general the AD and AD^var models are similar in terms of the majority of active reactions. Fig. 3. Condition specific model generation By iMAT. [86]Fig. 3 [87]Open in a new tab A Histogram of the number of active reactions in each personalized model for each dataset. Frequency shows the number of instances with the count. B Boxplot of the difference in reaction numbers between AD and AD^var iMAT models for each sample. C Bar plot showing the total number of genes associated with the lists of significantly altered reactions, and the proportion of AD-related genes in these lists for each dataset and for each of AD and AD^var models. pval shows the hypergeometric p value obtained from the overrepresentation analysis of AD-related genes in each list. n.s: not significant, ** p val < 0.01, ***p val < 0.001. After the significantly altered reactions were identified using Fisher’s Exact test, gene compositions controlling these reactions were analyzed to determine the number of genes associated with AD for each reaction list (Fig. [88]3C). Accordingly, while the significant reaction lists were significantly enriched with the AD-related genes in all AD^var models, a significant result was obtained only in MSBB among the models constructed with only gene-expression data, and with a much higher p value compared to the AD^var counterpart. This shows that the integration of pathogenic-variant data into the metabolic models contributes to a better representation of AD status at the gene level. Differentially perturbed pathways After the personalized models were generated, Fisher’s Exact Test was applied to identify differentially affected reactions between AD-Control comparison and AD^var-Control comparison (Fig. [89]1C). Although the model sizes and pathogenic gene counts were not much different in the AD and AD^var models, the number of differentially perturbed reactions between AD-Control and AD^var-Control were considerably different (Supplementary Fig. [90]2). For all the cohorts, adding the effect of pathogenic variants to the models increased the number of differentially altered reactions (Table [91]1). Using reaction-pathway associations of Human-GEM, we also identified the number of unique pathways associated with the significantly affected reactions for each dataset (Table [92]1, Supplementary Fig. [93]3). The statistical significance of the intersection of AD and AD^var pathways was checked for each dataset using the pathways associated with the significantly affected reactions. The intersections were found to be highly significant based on Super Exact Test^[94]39, with p values being 7.76 × 10^–5, 1.30 × 10^–8 and 1.27 × 10^–8 for ROSMAP, Mayo Clinic and MSBB datasets, respectively (Supplementary Table [95]2). Likewise, Jaccard similarity indices between AD and AD^var pathways were 0.46, 0.57 and 0.45 for ROSMAP, Mayo Clinic and MSBB, respectively. A modified version, Overlap similarity index, showed very high similarity between AD and AD^var pathways (~0.90 on average), implying that AD^var pathways were almost inclusive of AD pathways (Supplementary Table [96]3). Table 1. The number of significantly affected reactions and associated pathways Datasets and Compared Conditions # of significantly affected reactions # of associated pathways # of overrepresented pathways ROSMAP AD-Control 290 55 6 ROSMAP AD^var-Control 3089 108 28 Mayo Clinic AD-Control 674 64 7 Mayo Clinic AD^var-Control 1343 98 17 MSBB AD-Control 220 40 4 MSBB AD^var-Control 793 72 17 [97]Open in a new tab We applied a statistical analysis to identify pathways significantly overrepresented among the affected reactions (Fig. [98]4, Table [99]1). In general, only few pathways were identified to be significantly perturbed based on the AD models, regardless of the dataset used. We identified a number of pathways whose perturbation was only captured with the AD^var models. Below, we provide a detailed analysis of the pathways identified to be significantly perturbed only in AD^var models. Fig. 4. Differentially perturbed pathways between AD-Control models and AD^var-Control models. [100]Fig. 4 [101]Open in a new tab Pathway enrichment analysis was applied on the differentially altered reactions. x-axis shows -log10(p value) of the pathways. Only significant pathways are shown in the figure for each dataset. The sphingolipid metabolism was significantly perturbed only in the AD^var models in all the three cohorts. Sphingolipids are a member of lipid family, and they are crucial elements of membrane biology and have roles in the regulation of cell function^[102]40. Sphingolipid metabolism was previously associated with dementia and the AD risk factor gene APOE genotype^[103]41. Also, sphingomyelin metabolism was recently proposed as a therapeutic target for AD based on multi-omics profiling of the AD brain^[104]42. Therefore, identifying sphingolipid metabolism only in AD^var models shows the increased accuracy in capturing AD-related processes in the metabolic models. Glycolysis/glucogenesis is another pathway that was significantly affected only in AD^var models in the three cohorts. Glucose metabolism is important in energy production, which is vital for all cells, and it was reported that glucose metabolism and glycolysis defects contribute to AD pathogenesis^[105]43–[106]45. In the brain, astrocyte cells prefer aerobic glycolysis to produce lactate. Lactate produced from glycolysis or glycogenolysis in astrocytes is transferred to neurons to be used in oxidative phosphorylation. However, it was observed that the lactate production capacity of astrocytes decreased in AD^[107]46. Consequently, energy deficiency in neurons and glia play an essential role in AD. Fatty acid oxidation is another pathway that was significantly affected only in AD^var models in the three cohorts. Brain is the organ where fatty acids are the most abundant in our body. It was shown that the levels of all subclasses of fatty acids are different in the cerebrospinal fluid of AD patients compared to controls^[108]47. Qi et al. reported that APOE4 genotype caused a shift in both glucose and fatty acid oxidation in astrocytes^[109]48. Overall, these results indicate that integrating pathogenic variants into the models allows capturing of some important AD-related pathways in different cohorts. In models generated with the samples obtained from different brain regions, there were also differences among significantly perturbed pathways that were captured only in the variant-aware models. Bile acid, chondroitin/heparan sulfate and glycosphingolipid biosynthesis were differentially perturbed only in the variant models (MSBB and ROSMAP cohorts). Among these three pathways, bile acid metabolism is highly integrated with cholesterol metabolism and has been associated with AD. Recently, Baloni et al. used metabolic modeling approach and reported that bile acid metabolism was altered in AD brains^[110]7. Similarly, Varma et al. showed that altered brain bile acid metabolism was associated with AD^[111]49. Heparan sulfate is another important metabolic pathway affecting amyloid-beta clearance, and high levels of heparan sulfate proteoglycans were found in post-mortem AD brains^[112]50,[113]51. Glycosphingolipids are a heterogeneous group of membrane lipids. They comprise a high amount of the brain lipid composition. Tang et al. ^[114]52 analysed the expression levels of glycosylation related genes in the AD brains and found alterations in the levels of the genes that play a role in glycosphingolipid biosynthesis in human. Similarly, omega-6 fatty acid, fatty acid beta-oxidation and nucleotide metabolisms were differentially perturbed only in the variant models of Mayo Clinic and ROSMAP. Lipid metabolism including fatty acids is one of the most studied metabolic pathways in AD^[115]53,[116]54. Lipid metabolism is known to be impaired in the early stages of AD^[117]47. On the other hand, since nucleotide metabolism is closely related to cell proliferation pathways, it is interesting that it was also captured in our results although it is a pathway generally studied in cancer^[118]55. As a result, different datasets have the capacity to capture different pathways that have been associated with AD or that may be candidates for AD. These results are also important for understanding how different brain regions are affected in AD. Additionally, propanoate, inositol phosphate, glycerophospholipid and biotin metabolic pathways were identified to be perturbed in both variant and non-variant models of the same dataset. Propanoate metabolite is known to be neuroprotective and associated with brain-gut axis^[119]56. A study showed that changes in the levels of inositol phosphate metabolic enzymes were correlated with amyloid beta peptide formation and tau hyperphosphorylation in AD^[120]57. A recent study using Drosophila model of tauopathy suggested biotin metabolism as a druggable target for neuroprotection^[121]58. Glycerophospholipids are complex species of fatty acids, and they are the most abundant class of lipids in the human brain^[122]59. These results also indicated that adding the variant effect to the models does not cause the loss of previously annotated metabolic processes. As an additional perspective, we reconstructed personalized iMAT models of AD patients by using a more stringent cut-off of gene pathogenicity. In this analysis, a gene was considered pathogenic if its pathogenicity score was higher than 99% of the controls. Significantly affected reactions and significantly enriched pathways were identified accordingly. Similar to our findings with the 95% cut-off, variant-integrated models again revealed more pathways than expression-only models. Also, the pathways known to be associated with AD such as glycolysis, fatty acid oxidation and bile acid biosynthesis were identified only in the ADvar models (Supplementary Fig. [123]4). Conclusion Genome-scale metabolic models in complex diseases such as Alzheimer’s disease, in which many different metabolisms in the cell are affected, provide a great advantage for systemic investigation of the etiology of the disease. For this purpose, the generation of disease models by integrating gene expression data into GEMs has become widespread in the literature in the last 20 years, and it is a frequently used method today. In these models, information from mRNA level is incorporated into the model as the abundance of enzyme-coding proteins, and optimization problems based on mass-balance around intracellular metabolites are solved by aiming to keep the reactions catalyzed by enzymes with high abundance in the model and excluding the reactions catalyzed by enzymes with low abundance from the model. Although these models are very convenient, possible loss of function in the mRNA-to-protein transition is neglected. In this study, for the first time in the literature, we constructed personalized metabolic models by determining both gene expression levels and pathogenic variants from the same AD and control RNAseq samples from different brain regions. When we compared these models with the models constructed with only gene-expression data, we showed that pathways such as fatty acid oxidation, bile acid metabolism, sphingolipid metabolism and glycolysis, which are known to play crucial roles in AD metabolism, were found to be significantly altered only in the variant integrated models. In addition, gene-level analysis showed that the list of significantly perturbed reactions was significantly enriched with AD genes for all the three datasets in the AD^var models, and with a much higher significance level compared to the models constructed with only gene expression data. The three data sets analyzed in this study were obtained from three different brain regions. The pathways associated with the significantly altered reactions of these datasets were compared, and it was observed that the pathways of variant-integrated models were more similar to each other across the three datasets (Jaccard similarity index >0.6) than the pathways of only expression-integrated models (Supplementary Fig. [124]5). This is expected since genetic effects should be more consistent across different brain regions than gene expression. Although the results of this study have brought a new perspective to the use of metabolic modeling and transcriptome data in AD studies, it has some limitations. Within the framework of this study, only the effect of genomic variants that are predicted to affect protein expression was considered. However, some variants in intronic regions called eQTLs are known to regulate the expression of target genes. Transcriptome-wide association studies (TWAS) aim to reveal such variant-expression-trait relationships^[125]60. In future studies, metabolic models can be generated by also adding the information of metabolic genes whose expressions are regulated by variants based on TWAS studies. The reactions in GEMs can have complex gene-protein-reaction (GPR) rules. The presence of a significantly higher pathological variant load in a gene in AD may impair gene function, affecting the reaction catalyzed by the enzymes coded by a single gene or enzyme complexes; however, reactions catalyzed by alternative enzymes may remain unaffected if the variant does not impact the alternative genes. On the other hand, for some genes with high pathological variant load in AD, it is possible that one of the known transcripts of these genes indeed does not carry identified pathogenic variants, rendering the gene functional. This possibility was ignored in this study. This limitation can be eliminated by transcript-based variant identification and data mapping on GEMs instead of gene-based strategy. In addition, post-translational modifications which could have an important role in enzyme activity were ignored in this study. For more comprehensive results, integrating the effect of post-translational modifications in the cell into the genome-scale metabolic models is another important future study. Supplementary information [126]Supplementary Information^ (587.9KB, pdf) [127]Reporting Summary^ (1.4MB, pdf) Acknowledgements