Abstract
Genome-wide association studies (GWAS) for atrial fibrillation (AF)
have uncovered numerous disease-associated variants. Their underlying
molecular mechanisms, especially consequences for mRNA and protein
expression remain largely elusive. Thus, refined multi-omics approaches
are needed for deciphering the underlying molecular networks. Here, we
integrate genomics, transcriptomics, and proteomics of human atrial
tissue in a cross-sectional study to identify widespread effects of
genetic variants on both transcript (cis-eQTL) and protein (cis-pQTL)
abundance. We further establish a novel targeted trans-QTL approach
based on polygenic risk scores to determine candidates for AF core
genes. Using this approach, we identify two trans-eQTLs and five
trans-pQTLs for AF GWAS hits, and elucidate the role of the
transcription factor NKX2-5 as a link between the GWAS SNP rs9481842
and AF. Altogether, we present an integrative multi-omics method to
uncover trans-acting networks in small datasets and provide a rich
resource of atrial tissue-specific regulatory variants for transcript
and protein levels for cardiovascular disease gene prioritization.
Subject terms: Proteomics, Gene regulatory networks, Gene expression,
Genome-wide association studies, Genomics
__________________________________________________________________
Numerous disease-associated variants have been described in GWAS for
atrial fibrillation. Here the authors integrate omics data to
investigate the consequences of genetic variants for transcript and
protein levels in the atrium of the human heart. With this multi-omics
approach, authors reveal the regulatory network underlying atrial
fibrillation and provide a resource for cardiac gene prioritization.
Introduction
Genome-wide association studies (GWAS) have discovered thousands of
disease-associated single-nucleotide polymorphisms (SNPs) and improved
our understanding of genetic and phenotypic relationships^[46]1. In
this regard, GWAS have been applied to investigate atrial fibrillation
(AF), which affects more than 30 million individuals worldwide^[47]2.
More than 100 distinct genetic loci have been identified^[48]3,[49]4
and integrated into genome-wide polygenic risk scores (PRS) for AF risk
prediction^[50]5,[51]6. While rare monogenic effects exist, additive
polygenic effects of many common variants explain a much higher
proportion of AF risk in the population^[52]7. More than 95% of these
GWAS variants are localized in non-coding regions^[53]3 most likely
acting through regulatory elements affecting expression of multiple
genes and pathways that remain largely elusive^[54]8.
A common approach to further investigate those mechanisms is to
consider tissue-specific cis-acting expression quantitative trait loci
(eQTL), where genetic variants affect the transcription of nearby
genes. However, cis-eQTLs can only explain a fraction of the identified
AF risk loci^[55]3. Therefore, trans-eQTLs, where the variant is
distant to the target gene, and more complex genetic or epigenetic
mechanisms need to be considered^[56]8–[57]10. To date, it remains
difficult to quantify the contribution of cis- and trans-variants to
the heritability of complex diseases such as AF.
Recently, the contribution of trans-effects to the genetic architecture
of complex polygenic traits was theoretically assessed by the omnigenic
model^[58]11. Based on this model, it was estimated that trans-genetic
effects explain at least 70% of the disease heritability by indirect
propagation through gene regulatory networks^[59]11. Within these
networks, multiple trans-effects can accumulate on just a few central
genes, so-called core genes, which in turn are functionally related to
a phenotype. Identifying those core genes by trans-eQTLs remains
challenging due to the small effect size of each individual
locus^[60]9,[61]12 and the associated large multiple testing burden.
Since a PRS summarizes the genetic risk information, it can act as a
proxy for the accumulation of trans-effects in one score^[62]12. By
correlating the score with transcript expression (eQTS), the
propagation of trans-effects to mRNA level^[63]12 can be evaluated.
However, not only transcript abundance, but also the abundance of
translated proteins can determine phenotypic consequences. To date,
little is known about genetic effects on protein levels (pQTLs), e.g.
through post-transcriptional regulation^[64]13–[65]16, especially in
atrial tissues. It is both challenging and important to establish
methods to identify AF core genes and to integrate data from multiple
omics levels to improve the understanding of genotype–phenotype
relationships.
In this work, we present a multi-omics analysis that uses genomics,
transcriptomics and proteomics of human atrial tissue to better
understand how genetics are related to molecular changes in AF. The
first aim was to systematically integrate omics data and identify
genome-wide cis-regulatory mechanisms on transcript as well as protein
level. We reasoned that core genes are key for a better understanding
of complex molecular pathomechanisms of AF. Therefore, the second aim
was to identify candidate core genes for AF and the trans-acting
regulatory networks that link them to AF GWAS loci. We develop an
approach combining the correlation of gene expression with a PRS for
AF^[66]6,[67]12 and pathway enrichment analysis to identify
AF-associated biological processes. Based on those processes, candidate
core genes for targeted trans-QTL analyses with AF GWAS SNPs are
selected. This approach allows the identification of putative core
genes, their molecular networks and downstream consequences in AF.
Results
Cis-QTL analysis
Microarray transcriptomics and mass spectrometry-based proteomics were
used to profile human atrial tissue samples collected during coronary
artery bypass surgery. We analyzed disease-independent effects of
genetic variants on transcript, cis-expression quantitative trait loci
(cis-eQTLs), and protein levels, cis-protein quantitative trait loci
(cis-pQTLs), of nearby genes. All cis-QTLs were calculated using
expression values for 16,306 genes and 1337 proteins (Table [68]1) with
additional PEER^[69]17,[70]18 factors, an established method which
allows to adjust for known as well as unknown confounders (see methods,
Supplementary Fig. [71]1). Comparison of PEER factors and observed
covariates showed that PEER factors successfully captured disease
status, cardiovascular risk factors and technical covariates
(Supplementary Fig. [72]2). We assessed the replication rate of our
eQTLs in GTEx^[73]19 v7 atrial appendage tissue. Effect sizes for the
best eQTL (P < 1 × 10^−5) per gene showed a correlation of 0.81
(P = 2.3 × 10^71) in GTEx, 62% replicated (GTEx P < 1 × 10^−5) and 87%
showed concordant allelic effects (see also Supplementary Note
[74]1Cis-QTL replication^[75]20, Supplementary Fig. [76]3).
Furthermore, correlations between transcript and protein abundances
were comparable to previous studies^[77]14 (Supplementary Fig. [78]4,
Supplementary Table [79]1) indicating high quality of the proteomics
measurements.
Table 1.
Summary of tested data and discovered cis-quantitative trait loci.
Results for all available transcriptomics and proteomics measurements
Tested FDR < 0.05: P < 1 × 10^−5
SNPs Pairs Genes Pairs Genes Loci Pairs Genes Loci N
eQTL 4,861,118 56,139,851 16,306 57,403 1058 1657 40,267 552 870 75
pQTL 2,323,504 4,508,654 1337 4081 91 139 2543 45 71 75
Results only for genes with both transcriptomics and proteomics
measurements
eQTL 2,249,758 4,198,168 1243 4603 124 201 3218 64 109 75
pQTL 2,249,758 4,198,168 1243 3906 87 133 2406 42 66 75
ratioQTL 2,249,758 4,198,168 1243 563 16 23 575 18 27 66
[80]Open in a new tab
Discovered cis-QTLs in human heart right atrial appendage tissue for
mRNA and protein measurements. Two-sided t-tests were evaluated and
Benjamini-Hochberg procedure to calculate FDR per omic was applied to
account for multiple comparisons. Listed are the results for
significance thresholds FDR < 0.05 and nominal P-value < 1 × 10^−5.
Loci denote the number of independent loci derived by LD clumping.
Source data are provided as a Source Data file.
QTL quantitative trait loci, FDR false discovery rate, LD linkage
disequilibrium, eQTL expression quantitative trait loci, pQTL protein
quantitative trait loci, N sample size.
Cis-regulatory patterns in atrial tissue
We first sought to functionally characterize the cis-regulatory
variants. Local genetic variation can lead to different modulations in
mRNA and protein abundance, which are commonly attributed to
transcriptional and post-transcriptional
regulation^[81]14,[82]15,[83]21. Protein abundances are suggested as
more direct determinants for phenotypic consequences of expression
QTLs^[84]14 emphasizing the need to integrate mRNA and proteomic
measurements to better understand functional genotype–phenotype
relationships. Thus, only for the following characterization of
cis-genetic variation on transcripts and proteins, we focused on genes
with both transcriptomics and proteomics measurements (1,243 genes,
Table [85]1). As observed previously^[86]14,[87]15,[88]21, significant
eQTLs and pQTLs (FDR < 0.05) differ considerably. On gene level, 32% of
the top eQTLs replicated in pQTLs and 50% vice versa, with a Pearson
correlation of 0.58 and 0.66 for corresponding effect sizes
(see [89]Supplementary Information Overlap of cis- eQTLs and pQTLs).
Considering all individual SNP-gene pairs, only 8.2% of significant
associations are shared between mRNA and proteins (Fig. [90]1a,
Supplementary Fig. [91]5, Supplementary Table [92]2). Linkage
disequilibrium (LD) clumping was performed on eQTL and pQTL summary
statistics simultaneously allowing for the partitioning of cis-regions
into independent loci for further evaluation.
Fig. 1. Significant cis-eQTLs, cis-pQTLs and their overlap.
[93]Fig. 1
[94]Open in a new tab
a Circular plot of the significant cis-eQTLs (blue) and pQTLs (purple)
at a FDR cutoff of 0.05 (dotted line, plot created using the R package
circlize^[95]85). Considering only genes with both transcriptomics and
proteomics measurements, we visualized the overlap of significant eQTLs
and pQTLs in the circle center. In total, the lead SNP-gene pair of 200
QTL clumps in 124 genes had a significant eQTL and 133 loci in 87 genes
a significant pQTL. Only 19 lead variants (13 genes) had an eQTL and
pQTL for the same gene. The numbers in brackets represent the number of
significant SNP-gene pairs. b Characterization of overlapping eQTL and
pQTL loci. All 19 LD clumps (based on eQTL and pQTL summary statistics)
where the lead SNP-gene-pair was a significant eQTL and pQTL were
classified as a shared QTL by either our residual regression approach
or colocalization analysis. c Characterization of eQTL loci without a
corresponding pQTL. Only 83 out of 181 LD clumps (based on eQTL and
pQTL summary statistics) that had a lead SNP-gene-pair with a
significant eQTL but no pQTL were classified as an independent eQTL by
either our residual regression approach or colocalization analysis. d
Characterization of pQTL loci without a corresponding eQTL. Only 42 out
of 114 LD clumps (based on eQTL and pQTL summary statistics) that had a
lead SNP-gene-pair with a significant pQTL but no eQTL were classified
as an independent pQTL by either our residual regression approach or
colocalization analysis. eQTL expression quantitative trait loci, pQTL
protein quantitative trait loci, QTL quantitative trait loci, FDR false
discovery rate, LD linkage disequilibrium, SNP single-nucleotide
polymorphism. Source data are provided as a Source Data file.
Divergence of mRNA from protein abundance can arise through diverse
molecular mechanisms which we additionally analyzed by calculating
protein-per-mRNA ratios (ratioQTLs) (see methods). Lack of overlap can
either indicate independent effects or lack of power in one of the
studies. To distinguish these scenarios, we quantified for every
variant if its effect was shared between the omic levels using a linear
regression based residual QTL analysis. A residual eQTL considers the
association between a variant and the mRNA measurements after removing
variation shared with protein measurements for the same gene. Vice
versa, a residual pQTL describes the association between the variant
and protein levels after removing variation shared with mRNA. Having
removed variation that was shared across omics levels for each gene, we
can now assess the omic-specific effect of a variant. Three simple
regulatory categories were defined by our regression approach (see
methods, Fig. [96]2, Supplementary Fig. [97]6, Supplementary
Table [98]3). We compared these categories to a formal colocalization
analysis based on approximate Bayes Factor analysis^[99]22 to estimate
posterior probabilities for a shared or independent causal variant
within each LD clump:
* (i)
For 11 genes, 14 independent loci with shared cis- eQTL/pQTL were
identified, where the lead SNP is associated with both mRNA
expression and the respective protein abundance as depicted in
Fig. [100]2a. The corresponding variants were primarily enriched in
cis-regulatory elements such as active transcription start sites
(TssA) (Supplementary Fig. [101]6a, Supplementary Fig. [102]7). For
all but one loci, colocalization analysis also confirmed a shared
causal variant (Fig. [103]1b).
* (ii)
For 37 genes, 62 independent loci with independent eQTLs were
identified, where the lead SNP is associated only with transcript
levels, but the respective protein abundance is independent of the
genotype (Fig. [104]2b). Corresponding variants were enriched in
elements regulating transcription, e.g. transcription factor
binding site (TF BS) or enhancer regions, and within splicing
regions (Supplementary Fig. [105]6b, Supplementary Fig. [106]7a).
Possible mechanisms involved in unchanged protein levels remain
largely elusive and range from adaptation of translational rate to
protein degradation and long-non-coding RNAs^[107]23,[108]24. For
43 out of 62 LD clumps, colocalization analysis also confirmed an
independent eQTL (Fig. [109]1c).
* (iii)
For 21 genes, 25 independent loci with independent pQTLs were
identified, where the lead SNP affects only protein abundance
(Fig. [110]2c). pQTL variants were enriched for exonic regions and
although not significantly, in binding sites of RNA binding
proteins (RBP) (Supplementary Fig. [111]6c, Supplementary
Fig. [112]7a), where they may influence mRNA translation resulting
in an independent pQTL association^[113]25. For 16 out of 25 LD
clumps, colocalization analysis also confirmed an independent pQTL
(Fig. [114]1d).
Fig. 2. Different genetic regulatory patterns derived by multi-omics cis-QTL
integration.
[115]Fig. 2
[116]Open in a new tab
a Shared eQTLs/pQTLs represent QTLs, where the effect of
transcriptional regulation translates into mRNA and protein abundance
exemplified by the significant SNP-gene pair rs9664184-MYOZ1. No
corresponding ratioQTL can be observed as the genetic variation is
shared across both omics levels. b Independent eQTLs depict variants
with regulation on mRNA but not on protein level displayed by the
significant SNP-transcript pair rs2070594-ATP5C1. c Independent pQTLs
represent variants that show regulation only on protein level as shown
for the SNP-protein pair rs3916-ACADS. Genetic influence is not
observable on transcript level. In the boxplots, the lower and upper
hinges correspond to the first and third quartiles (the 25th and 75th
percentiles). The median is denoted by the central line in the box. The
upper/lower whisker extends from the hinge to the largest/smallest
value no further than 1.5 × IQR from the hinge. Nominal P-values were
derived based on two-sided t-tests for N = 75 (eQTLs), N = 75 (pQTL)
and N = 66 (ratioQTL) biologically independent samples. To assess
significance, FDR correction per omic based on the Benjamini-Hochberg
procedure was applied to account for multiple comparisons. eQTL
expression quantitative trait loci, pQTL protein quantitative trait
loci, ratioQTL ratio quantitative trait loci, TssA active transcription
start site, UTR untranslated region, TF BS transcription factor binding
site, RBP RNA binding protein, SNP single-nucleotide polymorphism, IQR
interquartile range. Source data are provided as a Source Data file.
For only a small fraction of 5 out of 181 loci with an eQTL but no
pQTL, colocalization analysis suggested a shared QTL and the other way
around, 7 out of 114 loci with a pQTL but no eQTL may be actually
shared. Altogether, we confirmed that QTL variants corresponding to
different categories tended to cluster in distinct genomic regions
(Supplementary Fig. [117]7, Supplementary
Note [118]1)^[119]14,[120]18,[121]26. By integrating matched
transcriptome and proteome data, we were able to differentiate
functional regulatory mechanisms not observable by transcriptomics
only.
GWAS overlap and enrichment
In order to investigate genotype–phenotype relationships in the context
of cardiovascular disease, we used all available cis-QTL data (not only
those quantified on both omics levels, FDR < 0.05) to annotate GWAS
variants for phenotypes either related to cardiovascular measurements
or cardiovascular disease (Fig. [122]3a).
Fig. 3. Overlap of cis-QTL associations with GWAS hits annotated in the GWAS
catalog.
[123]Fig. 3
[124]Open in a new tab
a Overview of significant cis- eQTLs and pQTLs (FDR < 0.05) overlapping
with GWAS hits for different disease traits. b Independent pQTL for
GWAS hit creatine kinase levels. Shown are the non-significant cis-eQTL
and the significant cis- pQTL and ratioQTL for the SNP rs1801690 and
the gene APOH (FDR < 0.05). Statistics were derived based on two-sided
t-tests for N = 75 (eQTLs), N = 75 (pQTL) and N = 66 (ratioQTL)
biologically independent samples. A FDR < 0.05 per omic based on the
Benjamini-Hochberg procedure was applied to assess significance and to
account for multiple comparisons. In the boxplots, the lower and upper
hinges correspond to the first and third quartiles (the 25th and 75th
percentiles). The median is denoted by the central line in the box. The
upper/lower whisker extends from the hinge to the largest/smallest
value no further than 1.5 × IQR from the hinge. c For three different
trait categories (cardiovascular traits, arrhythmias and atrial
fibrillation) as well as rheumatoid arthritis as a negative control,
the enrichment of GWAS hits at significant cis-QTLs (FDR < 0.05) was
evaluated. Enrichments were calculated using Fisher̀s exact test
(two-sided). 4,815,266 (eQTL) and 2,301,873 (pQTL) SNPs were evaluated
for 7,817/4,661 (eQTL/pQTL) cardiovascular trait, 2,287/1,006
(eQTL/pQTL) arrhythmic, 691/394 (eQTL/pQTL) AF and 468/297 (eQTL/pQTL)
RA GWAS hits. Odds ratios are presented with their 95% CI. Source data
are provided as a Source Data file. QTL quantitative trait loci, GWAS
genome-wide association study, SNP single-nucleotide polymorphism, eQTL
expression quantitative trait loci, pQTL protein quantitative trait
loci, ratioQTL ratio quantitative trait loci, CI confidence interval,
IQR interquartile range. Source data are provided as a Source Data
file.
Of all the overlaps between GWAS hits and cis-QTLs (Supplementary
Table [125]4), AF-related loci were most abundant (17 eQTLs, four also
with pQTL, see Supplementary Table [126]5). Furthermore, we found an
independent pQTL overlapping with the GWAS hit for creatine kinase
levels (Fig. [127]3b). This genetic effect was not detected on mRNA
level illustrating the importance of proteomics data. In addition, we
systematically assessed whether significant QTLs are enriched at GWAS
loci in the hierarchical groups cardiovascular traits, arrhythmias and
AF (two-sided Fishers exact test). We identified a strong significant
overrepresentation (P < 1 × 10^−5) of eQTLs at GWAS hits for all three
groups, and a significant overrepresentation (P < 0.05) of pQTLs in
variants annotated with arrhythmias and AF (Fig. [128]3c).
Additionally, we evaluated rheumatoid arthritis (RA) as negative
control, a trait that should not share atrial-specific disease
mechanisms. Indeed, there were no overlaps between cis-QTLs and RA GWAS
loci. Alltogether, we presented widespread effects of cis-acting
variants on gene expression and protein abundance in atrial tissue and
a possible relation to AF.
Trans-QTL analysis
We further extended cis-regulatory analyses by investigating
trans-effects. Specifically, we addressed a key hypothesis of the
omnigenic model^[129]11, which postulates the existence of core genes.
Core genes are central genes with trans-associations to AF GWAS loci,
whose expression levels directly affect the disease phenotype. Here we
sought to identify candidate core genes for AF to understand the
contribution of trans-genetic effects in the pathology of AF. To
prioritize genes satisfying the properties predicted by the omnigenic
model, we evaluated the accumulation of trans-effects, their relevance
in gene regulatory networks, and the association with AF by the
following strategy (Fig. [130]4):
1. We evaluated the cumulated trans-effects of AF-associated variants
on expression by ranking genes based on their correlation of mRNA
and protein abundance with the PRS for AF^[131]6 (see Supplementary
Fig. [132]8), so called expression/protein quantitative trait score
(eQTS/pQTS)^[133]12. While correcting for possible cis-effects by
including the top SNP per independent cis-QTL loci, the PRS served
as a proxy for an aggregation of AF-related trans-effects across
the whole genome.
2. To identify genes sharing molecular function and representing
biological networks that propagate trans-effects to core genes,
gene set enrichment analysis (GSEA)^[134]27,[135]28 was performed
on the eQTS and pQTS rankings. Genes driving the enrichment of
multiple gene sets were selected as core gene candidates.
3. The link between the core gene candidates and AF was established
based on a significant trans-eQTL or pQTL for an AF GWAS hit and
further supported by differential protein abundance analysis.
Fig. 4. Graphical illustration of the strategy for trans-QTL analysis to
identify AF-relevant genes.
[136]Fig. 4
[137]Open in a new tab
a Overview: Based on patient-specific PRS values for AF correlated with
transcript and protein expression, we performed GSEA to preselect genes
for trans-eQTL and pQTL analyses from the leading edge of enriched
pathways. Core genes were identified as significant trans-eQTLs or
trans-pQTLs. We further assessed their functional targets to
investigate the genotype–phenotype relationship in the context of
AF. Graphical concept adapted from Liu et al.^[138]11. b Identified
core genes as trans-eQTLs (blue), trans-pQTLs (purple) (FDR < 0.2) and
functional NKX2-5 targets (light purple). PRS, genome-wide polygenic
risk score; GSEA gene set enrichment analysis, QTS quantitative trait
score, eQTL expression quantitative trait loci, pQTL protein
quantitative trait loci, FDR false discovery rate, AF atrial
fibrillation, blue, green or gray dots = core gene candidates, red
dots = core genes with trans- eQTL/pQTL, stars = functional targets of
core genes. Circular plots were created with the R package
circlize^[139]85.
Identification of candidate AF core genes
Based on this strategy, we first used the GO biological process gene
set annotations, which are not a priori disease-related, to recover
processes functionally related to AF. Using the eQTS ranking of all
measured transcripts (top hits in Supplementary Table [140]6) as
background for the gene set enrichment, 81 GO biological processes were
enriched (adjusted P-value < 0.05, Supplementary Table [141]7) mostly
related to heart muscle or energy metabolism, including the processes
GO:0006091 (generation of precursor metabolites and energy), GO:0055117
(regulation of cardiac muscle contraction), GO:0048738 (cardiac muscle
tissue development) and GO:0002027 (regulation of heart rate).
Furthermore, we identified three processes that implicate calcium
homeostasis (GO:0010880 regulation of release of sequestered calcium
ion into cytosol by sarcoplasmic reticulum, GO:0010881 regulation of
cardiac muscle contraction by regulation of the release of sequestered
calcium ion, and GO:0010882 regulation of cardiac muscle contraction by
calcium ion signaling). For pQTS rankings (top hits in Supplementary
Table [142]8) and restricting the background only to those proteins
quantified in our dataset, one GO biological process (GO:0044281 small
molecule metabolic process) connected to metabolism was enriched
(adjusted P-value < 0.05, Supplementary Table [143]9).
Our pathway enrichment approach yielded 23 transcripts (Supplementary
Table [144]10) and 152 proteins (Supplementary Table [145]11) as core
gene candidates that we used to calculate trans-QTLs with 108 AF GWAS
SNPs. On mRNA level, we identified two trans-eQTLs encoding for a
cardiac structural protein (rs11658168-TNNT2) and a transcription
factor (rs9481842-NKX2-5) (Table [146]2). Since NKX2-5 was not detected
on protein level using mass spectrometry, we performed additional
Western blot experiments to identify the respective pQTL (see
Supplementary Fig. [147]9-[148]10, β = −0.45, T = −2.9, P = 0.049,
N = 29, df = 21, two-sided t-test). On protein level, we discovered
five trans-pQTLs which are all connected to metabolism
(rs11588763-CYB5R3/NDUFB3/NDUFA9/DLAT, rs11658168-HIBADH)
(Table [149]2). Noticeably, four out of five identified genes encode
for mitochondrial enzymes (HIBADH) or enzyme subunits (NDUFA9, NDUFB3,
DLAT). More than half of the putative core genes have already been
mentioned by other studies in the context of arrhythmias and other
cardiovascular diseases^[150]29–[151]35 (detailed findings see
Supplementary Table [152]12 and differential expression
results^[153]30,[154]36 in Supplementary Table [155]13-[156]14) which
independently replicates the disease link.
Table 2.
Trans-QTL results.
Variant Gene Trans-QTL
GWAS SNP QTL SNP Chr Position Symbol Chr QTL β T value P-value FDR
rs9675122 rs11658168 chr17 7,406,134 TNNT2^a,d chr1 Transcript −0.517
−4.27 6.43 × 10^−5 0.081
rs9481842 rs9481842 chr6 118,974,798 NKX2-5^b chr5 Transcript −0.593
−4.27 6.54 × 10^−5 0.081
rs34292822 rs11588763 chr1 154,813,584 CYB5R3 chr22 Protein −0.786
−4.89 6.86 × 10^−6 0.113
rs34292822 rs11588763 chr1 154,813,584 NDUFB3^c chr2 Protein −0.916
−4.44 3.56 × 10^−5 0.133
rs9675122 rs11658168 chr17 7,406,134 HIBADH chr7 Protein −0.512 −4.43
3.66 × 10^−5 0.133
rs34292822 rs11588763 chr1 154,813,584 NDUFA9^d chr12 Protein −0.752
−4.42 3.85 × 10^−5 0.133
rs34292822 rs11588763 chr1 154,813,584 DLAT chr11 Protein −0.716 −4.40
4.05 × 10^−5 0.133
[157]Open in a new tab
Significant trans-eQTLs and pQTLs for a FDR < 0.2 (Benjamini-Hochberg
procedure to account for multiple comparisons per omic). Two-sided
t-tests were performed on 23 transcripts with 74 samples and 152
proteins with 73 samples of human heart right atrial appendage tissue
for 108 variants associated with atrial fibrillation from the GWAS
catalog (or their proxy, if the GWAS SNP was not measured).
Calculations were carried out using the SNP rs11658168 as a proxy for
the GWAS SNP rs9675122 as well as rs11588763 instead of the GWAS SNP
rs34292822.
eQTL expression quantitative trait loci, pQTL expression quantitative
trait loci, QTL quantitative trait loci, FDR false discovery rate, GWAS
genome-wide association study, SNP single-nucleotide polymorphism, Chr
Chromosome.
^aMutation known to affect cardiovascular phenotypes.
^bMutation known to affect arrhythmias.
^cDifferential expression functional impairment for cardiovascular
phenotypes.
^dDifferential expression or functional impairment for arrhythmias; For
details to disease links in literature see Supplementary Table [158]12.
Source data are provided as a Source Data file.
NKX2-5 transcription factor network
In order to get more detailed information about complex molecular
mechanisms underlying AF, we further analyzed the TF network of NKX2-5
(see Fig. [159]5a) since the TF has already been described in the
context of cardiac development^[160]37, AF^[161]31,[162]32,[163]38 and
congenital heart diseases^[164]39.
Fig. 5. NKX2-5 activity controlled by AF GWAS variant rs9481842.
[165]Fig. 5
[166]Open in a new tab
a Graphical illustration of NKX2-5 TF target gene analysis in AF. b
Strong trans-eQTL of the SNP rs9481842 with the NKX2-5 transcript for
N = 75 independent biological samples. c Validation of the NKX2-5
trans-eQTL on protein level (trans-pQTL) using western blot analysis in
remaining tissue samples (N = 29 independent biological samples). d
NKX2-5 activity estimation based on target mRNA expression stratified
by the rs9481842 genotype for N = 75 independent biological samples. e
Depicted are functional NKX2-5 targets with the number of TF binding
sites (column 1), trans-eQTL strength (columns 2–4), trans-pQTL
strength (columns 5–7) and protein level in AF (columns 8–9). The
colour scale represents median transcript or protein values per group
(=columns). Residuals corrected for fibroblast-score and RIN-score /
protein concentration with subsequent normal-quantile-normalization per
gene were used to calculate the medians per group. A quantitative
description of the qualitative results presented in the heatmap can be
found in Supplementary Table [167]16 and Table [168]3. In the boxplots,
the lower and upper hinges correspond to the first and third quartiles
(the 25th and 75th percentiles). The median is denoted by the central
line in the box. The upper/lower whisker extends from the hinge to the
largest/smallest value no further than 1.5 × IQR from the hinge. AF
atrial fibrillation, QTL quantitative trait loci, BS binding site, IQR
interquartile range *Mutation known to affect cardiovascular
phenotypes, **Mutation known to affect arrhythmias, ^+Differential
expression or functional impairment for cardiovascular phenotypes,
^++Differential expression or functional impairment for arrhythmias.
Source data are provided as a Source Data file.
To evaluate the downstream effects of the SNP rs9481842 via the TF
NKX2-5 in AF, we analyzed the influence of NKX2-5 transcript levels on
target genes by estimating NKX2-5 TF activity (TFA). We annotated
NKX2-5 binding sites in promoter regions based on published
iPSC-derived cardiomyocyte ChIP-seq and promoter-capture HiC data to
identify almost 10,000 target genes. The number of binding sites per
gene in open chromatin regions were counted for each gene and the TFA
was computed as the sum of target transcript expression weighted by the
number of binding sites. We observed a high correlation between the SNP
rs9481842 and the NKX2-5 transcript (cor = −0.43, P = 1.4 × 10^−4,
two-sided Pearson’s correlation, Fig. [169]5b, Supplementary
Fig. [170]11a) as well as for the direct molecular link between the
NKX2-5 transcript and the TF activity (cor = 0.36, P = 1.3 × 10^−4,
one-sided Pearson’s correlation, Supplementary Fig. [171]11c). In
addition, there was a weak association between SNP rs9481842 and the TF
activity (cor = −0.13, P = 0.145, one-sided Pearson’s correlation,
Fig. [172]5d, Supplementary Fig. [173]11b) most likely attributed to
the indirect link through NKX2-5. Partial correlation analysis further
supported NKX2-5 being the causal link between SNP and target
expression (Supplementary Table [174]15). Based on the western blot
data that was used to identify the NKX2-5 pQTL (Fig. [175]5c), we also
observed a high correlation (Spearman’s rank correlation ρ = 0.42,
P = 0.026) between the calculated TF activity and the actual protein
abundance (Supplementary Fig. [176]12).
Next, to elucidate the role of NKX2-5 as a link between the disease
variant and AF, we further analyzed its effect on specific targets,
which we also prioritized as putative core genes. Overall, we
identified 13 functional targets that are significantly influenced by
the SNP rs9481842 as well as NKX2-5 transcript levels on both mRNA and
protein level (see methods and Supplementary Fig. [177]13 for details,
Supplementary Table [178]16). For these 13 targets, we observed a
consistent downregulation on mRNA and protein level with respect to the
rs9481842 risk allele (Fig. [179]5e). As the core gene model predicts a
direct effect of core gene expression on the phenotype^[180]11, we
evaluated the protein abundance of the NKX2-5 target genes in patients
with AF compared to patients in sinus rhythm to assess functional
connection to the disease. For all targets, AF cases showed lower
protein levels (Fig. [181]5e). When adjusting for common risk factors
of AF, five out of 13 targets showed a nominal P-value P < 0.05
(Table [182]3). More importantly, the identified target set
collectively displayed a strong association with AF on proteomics level
(GSEA P = 7.17 × 10^−5). This serves as independent validation of the
disease link, since these genes were identified based on molecular data
in our cohort in combination with public AF annotations without using
the actual cohort phenotypes.
Table 3.
Putative core genes and functional targets with disease association.
Protein AF association
Gene Chr Type β T-value P-value FDR
TNNT2^a,d chr1 Trans-eQTL −0.0609 −1.61 0.113 1.00
NKX2-5^b chr5 Trans-eQTL
CYB5R3 chr22 Trans-pQTL −0.0212 −0.662 0.511 1.00
NDUFB3^c chr2 Trans-pQTL −0.0631 −1.35 0.182 1.00
HIBADH chr7 Trans-pQTL −0.0454 −1.24 0.218 1.00
NDUFA9^d chr12 Trans-pQTL −0.0533 −1.20 0.235 1.00
DLAT chr11 Trans-pQTL −0.0231 −0.579 0.564 1.00
PPIF chr10 NKX2-5 target −0.0342 −1.13 0.261 1.00
MYL4^b,d chr17 NKX2-5 target −0.0270 −0.664 0.509 1.00
CKM^d chr19 NKX2-5 target −0.0875 −2.78 0.00705 0.120
MYL7 chr7 NKX2-5 target −0.0421 −1.04 0.304 1.00
PGAM2^d chr7 NKX2-5 target −0.175 −3.70 0.000452 0.00813
TNNC1^a chr3 NKX2-5 target −0.0557 −1.71 0.0929 1.00
CYC1 chr8 NKX2-5 target −0.0946 −2.14 0.036 0.545
ETFB^b,d chr19 NKX2-5 target −0.0553 −1.65 0.105 1.00
PRDX5 chr11 NKX2-5 target −0.0524 −1.79 0.0789 1.00
AK1 chr9 NKX2-5 target −0.0669 −2.17 0.0341 0.545
ALDOA^d chr16 NKX2-5 target −0.0646 −2.17 0.0341 0.545
TCAP^a chr17 NKX2-5 target −0.0178 −0.282 0.779 1.00
TOM1L2 chr17 NKX2-5 target −0.0771 −1.75 0.0849 1.00
[183]Open in a new tab
Proteomics differential abundance results in human atrial appendage
tissue for prevalent AF. Two-sided t-tests were calculated as part of a
multiple linear regression model including AF-related covariates sex,
age, BMI, diabetes, systolic blood pressure, hypertension medication,
myocardial infarction and smoking status (see methods differential
protein analysis, N = 78, df = 66). The Benjamini-Hochberg procedure
was used to asses FDR and account for multiple comparisons.
AF atrial fibrillation, BMI body mass index, QTL quantitative trait
loci, FDR false discovery rate.
^aMutation known to affect cardiovascular phenotypes.
^bMutation known to affect arrhythmias.
^cDifferential expression or functional impairment for cardiovascular
phenotypes.
^dDifferential expression or functional impairment for arrhythmias.
The coordinated downregulation of those 13 targets in patients with AF
was replicated in two independent datasets (Fig. [184]6a, for details
see methods) with a GSEA P-value of 0.00593 for RNA-seq data in right
atrial appendage samples of [185]GSE128188^[186]36, a GSEA P-value of
0.0248 for RNA-seq data in left atrial appendage samples of
[187]GSE128188 (see Supplementary Table [188]13 for differential
expression results) and with a GSEA P-value of 2.43 × 10^−3 in left
atrial samples from the proteomics dataset [189]PXD006675^[190]30 (see
Supplementary Table [191]14 for differential protein abundance
results). Similarly, regulation of our identified targets by the TF
NKX2-5 was corroborated by coexpression of NKX2-5 mRNA with target
transcript levels in GTEx atrial appendage tissue and [192]GSE128188
(Fig. [193]6b).
Fig. 6. Replication of the core gene candidate AF association and NKX2-5
target coexpression in independent datasets.
[194]Fig. 6
[195]Open in a new tab
Published proteomics data (PXD006675) as well as RNA-seq data
([196]GSE128188, GTEx) generated from human atrial tissue samples were
used for replication. a Centered and scaled values of the mean mRNA or
protein expression in AF ctrls and cases, with stronger effects on
protein level. GSEA p-values quantify the negative association of
NKX2-5 targets with respect to AF. Sample sizes per column: 69
controls, 14 prevalent AF cases, 69 controls, 14 prevalent AF cases
(AFHRI, all right atrial appendage); five controls, five AF cases
([197]GSE128188, both right atrial appendage); five controls, five AF
cases ([198]GSE128188, both left atrial appendage); three controls,
three AF cases (PXD006675, both left atrium). A quantitative
description of the qualitative results presented in the heatmap can be
found in Supplementary Table [199]13-[200]14 and Table [201]3. b
Coexpression of NKX2-5 with the 13 identified NKX2-5 transcription
factor targets (Pearson’s correlation). Quantified is the correlation
between NKX2-5 and its targets on mRNA level for mRNA datasets and the
correlation between the NKX2-5 transcript expression with the target
protein concentrations for the AFHRI proteomics (NKX2-5 not quantified
in proteomics). Sample sizes used for the computation of correlations:
102 AFHRI mRNA, 96 AFHRI protein, 372 GTEx, 10 [202]GSE128188 right,
and 10 left atrial appendage samples. AF atrial fibrillation, Ctrl
control i.e. individuals in sinus rhythm, GSEA gene set enrichment
analysis *Mutation known to affect cardiovascular phenotypes,
**Mutation known to affect arrhythmias, ^+Differential expression or
functional impairment for cardiovascular phenotypes, ^++Differential
expression or functional impairment for arrhythmias. Source data are
provided as a Source Data file.
Furthermore, the majority of the identified proteins are in fact
involved in contractile function (MYL4, MYL7, TNNC1 and TCAP) or
metabolism (PPIF, CKM, AK1, PGAM2, CYC1, ETFB and ALDOA), two
mechanisms linked to processes involved in the pathophysiology of AF.
At this point, our identified putative core genes point to potential
novel targets for further experimental research to better understand
molecular consequences of genetics underlying AF.
Discussion
We present a comprehensive multi-omics analysis that integrates
genomics, transcriptomics and proteomics in human atrial tissue in a
case control cohort of AF. This unique dataset allowed us to improve
our understanding of how genetic factors are related to intermediate
molecular phenotypes in AF.
We found widespread genetic effects associated with the expression of
nearby genes on transcript and protein level. Our integrated cis-eQTL
and pQTL analysis allowed the distinction between functional regulatory
mechanisms with consequences for mRNA and protein levels. For example,
we found many genetic variants exclusively affecting mRNA or protein
abundances contributing to a modest overlap between both molecular
levels using stringent statistical criteria. In this regard, we and
others found that proteome-specific pQTLs are enriched in the coding
sequence^[203]14, where post-transcriptional regulatory elements might
be affected by sequence variants, which may at least partially explain
the divergence between eQTLs and pQTLs. Compared to other studies, a
similar extent of co-regulation between mRNA and protein levels was
previously documented by comparing cis-pQTLs in human plasma to
cis-eQTLs in GTEx tissue^[204]15 (see Supplementary Note [205]1,
Supplementary Table [206]2). We assume that the use of less stringent
significance cutoffs or multiple testing correction as well as more
sensitive measurement techniques might achieve a higher overlap as
observed by Battle and colleagues for cell-type-specific
transcriptomics and proteomics in the same lymphoblastoid cell
lines^[207]14. In line with prior studies, we observed large
differences in transcript and protein expression as well as their
regulation^[208]14,[209]15,[210]21, emphasizing the necessity and
benefit of taking multiple molecular entities into account to
investigate genotype–phenotype relationships.
To extend the cis-QTL analysis, we assessed trans-associations by
applying a candidate selection strategy based on the correlation of
gene expression with a PRS, a concept termed eQTS^[211]12. PRS
accumulate small genetic effects at many individual genome-wide loci.
In the theoretical omnigenic model^[212]11 it has been suggested that
these loci are linked to the phenotype by weak trans-effects on gene
expression, which accumulate in so called core genes. It has been shown
that this accumulation of trans-effects would lead to strong eQTS
associations for core genes^[213]12. Here we used eQTS and pQTS in
combination with gene set enrichment analysis to identify core gene
pathways and putative core genes for AF. As core genes are postulated
to have trans-associations with AF GWAS SNPs, we subsequently performed
a targeted trans-QTL analysis. This strategy allowed the investigation
of tissue-specific trans-acting genetic mechanisms in AF using a
relatively small clinical dataset by reducing the multiple testing
burden; however, it is limited to genes with functional annotations
available. Here we used gene sets from gene ontology, to avoid
introducing a bias towards already known disease gene set definitions,
as for instance contained in KEGG. The PRS-based gene set enrichment
approach revealed cardiac-specific pathways associated with the genetic
susceptibility for AF, which are similar to results identified by Wang
and colleagues^[214]40. We identified different pathways on
transcriptome and proteome level, which is probably not only caused by
biological but also technological reasons like protein coverage. On
transcriptome level, the majority of the identified pathways were
involved in contractile function and metabolism. In general, these
mechanisms have been reported by clinical and experimental studies to
play a major role in the pathology of AF^[215]41–[216]43. As expected,
also the putative core genes identified by trans-eQTLs and trans-pQTLs
were involved in those mechanisms. In addition, all identified
transcripts and some of the proteins have been described in the context
of arrhythmias or cardiovascular
disease^[217]29–[218]34,[219]44–[220]50 (Supplementary Table [221]12,
differential expression results in Supplementary
Tables [222]13-[223]14). As observed for the cis-QTL analysis, the
trans-analysis revealed similar differences between transcriptomics and
proteomics level. Interestingly, none of the trans-pQTLs had a
significant association on expression level (see Supplementary
Table [224]17). Differences between trans-eQTLs and pQTLs for the same
gene have previously not been discussed in detail in the literature.
Sun and colleagues analyzed overlapping cis-QTLs but not
trans-QTLs^[225]15. Yao and colleagues analyzed overlaps of cis- and
trans- eQTLs and pQTLs in plasma^[226]16, however no overlaps were
found in trans. Suhre and colleagues validated plasma pQTL findings in
other proteomics datasets, but did not evaluate corresponding
trans-eQTLs^[227]13. Possible reasons besides the small sample size
might be the effect of other genetic variants, post-transcriptional
regulation or environmental factors.
To investigate more complex molecular mechanisms underlying AF, we
further analyzed the TF network of NKX2-5, since the TF has been
described in the context of heart development and arrhythmias. A
loss-of-function mutation in NKX2-5 is associated with increased
susceptibility to familial AF^[228]32 demonstrating its causal role for
AF. However, this result still does not provide mechanistic insights. A
study of allele specific binding of NKX2-5 identified relevant targets
and demonstrated that variants that change NKX2-5 binding at the
promoters of target genes contribute to electrocardiographic
phenotypes^[229]38. Hence this study established a mechanistic link of
NKX2-5 to AF susceptibility at common GWAS variants, which alter
cis-regulatory elements bound by NKX2-5 at the target genes. In this
work, we link the expression of NKX2-5 to an AF-associated GWAS variant
in trans. Moreover, we investigated downstream effects by integrating
the functional data^[230]38 with our genotype and expression data. We
identified a set of target genes, where the AF-associated variant did
not alter the cis-regulatory elements but the activity of the
trans-acting TF NKX2-5. Taken together, prior studies demonstrate a
causal link of NKX2-5 with AF and suggested that AF-associated variants
alter cis-regulatory elements. Here we showed that in addition
trans-acting mechanisms are important to link AF-associated variants to
their downstream target genes. The high correlation between the
estimated NKX2-5 TF activity and its mRNA levels implies that NKX2-5
modulates target gene expression. The fact that the NKX2-5 eQTL is a
key regulatory mechanism prompted us to validate the trans-QTL on
protein level with western blot analyses. Furthermore, the TF activity
that was estimated based on genome-wide transcriptomics data as well as
independent tissue and cell-type-specific annotations, highly
correlated with actual protein intensities. While measured mRNA and
protein abundance show a stronger association with the rs9481842
genotype than the estimated TF activity, correlation between the TF
activity and protein measurements was higher compared to the
correlation between mRNA and protein. Consistent with other
studies^[231]37,[232]38, our analysis suggests that NKX2-5 acts as a
transcriptional activator for the majority of genes. However, the TF
can also function as a transcriptional repressor of genes like
ISL1^[233]37,[234]51. We were able to detect strong effects of the
NKX2-5 transcript on various target transcript and protein levels. Most
of the identified target genes are involved in contractile function or
metabolism, two mechanisms highly linked to processes involved in AF.
Our unique trans-QTL approach revealed direct disease-relevant
associations between candidate core genes, NKX2-5 target genes and AF.
These genes were identified based on molecular data in our cohort in
combination with public AF annotations without using the actual cohort
phenotypes. The collective differential protein abundance of NKX2-5
targets in AF patients compared to controls therefore serves as an
independent validation of the disease relevance, which we independently
replicated in additional RNA-seq^[235]36 and proteomics^[236]30
datasets of AF patients and controls.
Overall, we successfully integrated multi-omics data and established a
unique approach to investigate not only cis- but also trans-regulatory
effects. This provided a platform to generate hypotheses on functional
interactions underlying the genetic associations that can be further
experimentally investigated.
We acknowledge some limitations that are attributed to common
biological and technical factors. First, the use of human heart tissue
came with several challenges including restricted sample sizes and
heterogeneity of cellular composition. The small sample size affects
the statistical power of QTL analyses and does not allow for assessing
causality of molecular and physiological changes, for example by
Mendelian randomization^[237]52,[238]53. Changes in cell-type
composition and structural remodeling have been described for the
pathology of AF^[239]54. To take differences in the cellular
composition into account, we used a fibroblast-score based on a
fibroblast-specific gene signature to adjust expression levels in
eQTL/pQTL analyses^[240]55. Furthermore, although AF prevalently
originates from pulmonary vein ostia, the relevance and usefulness of
right atrial appendage tissue was demonstrated by prior studies which
identified various AF disease mechanisms and candidate
genes^[241]3,[242]56,[243]57. Therefore, we believe that the extracted
tissue samples are well suited and the best proxy for atrial impairment
for our analysis. Another strength is that tissue samples were
explanted during cardiac surgery and not post-mortem as in comparable
datasets, which can affect various pathways e.g. metabolism. Second,
expression data were generated using microarrays, however, to date,
more information can be generated by RNA-seq. Third, human cardiac
muscle tissue is dominated by mitochondrial and sarcomere
proteins^[244]58, which affects the detection of less abundant proteins
such as TFs (e.g. NKX2-5). Therefore, missing TF coverage was not due
to data quality but biological and technological restrictions. In
addition, only limited functional genomic annotations specific for
atrial tissue are currently available including TF-, miRNA- and RBP
binding sites. Therefore, we integrated multiple sources to render
functional annotations as reliable and accurate as possible.
Replication in an independent dataset was not feasible, since this was
the first study investigating pQTLs in atrial tissue. Due to this
restriction, literature research was carried out to validate the
relevance of the identified AF core genes and NKX2-5 targets in the
context of arrhythmia (Supplementary Table [245]12, differential
expression results in Supplementary Table [246]13-[247]14).
Furthermore, we validated the association of target genes with AF by
analyzing the differential expression on protein level between AF and
controls. The consistent downregulation of the 13 NKX2-5 targets in AF
cases compared to controls was replicated in two independent datasets
including RNA-seq data from left and right atrial samples as well as
proteomics from left atrial tissue. Additionally, coexpression of
NKX2-5 mRNA with transcript levels of the 13 targets was replicated in
two independent datasets. Taken together this evidence suggests the
identified genes as strong candidates for follow-up analyses. Yet, for
clinical translation, further validations and molecular
characterization are required.
In this study we suggest an integrative analysis of genomics,
transcriptomics and proteomics data of human atrial tissue to identify
genome-wide genetic effects on intermediate molecular phenotypes in the
context of AF. Our multi-omics approach permits the identification
of shared and independent effects of cis-acting variants on transcript
expression and protein abundance. Furthermore, we proposed a PRS-guided
analysis strategy to successfully investigate complex genetic networks
even with a limited sample size. By providing these unique
tissue-specific omics results as a publicly accessible database in an
interactive browser, we hope to extend the availability of valuable
resources for hypothesis generation, experimental design and target
prioritization.
Methods
Analyses were performed using [248]R 3.4.1 and 3.6.3 (r-project.org).
Genomics data in R was handled using the Bioconductor packages
[249]rtracklayer^[250]59 1.46.0 and [251]GenomicRanges^[252]60 1.38.0.
Patient cohort
Patients were consecutively enrolled in the ongoing observational
cohort study AFHRI-B (Atrial fibrillation in high risk
individuals-biopsy) independent of AF disease status. Participants were
older than 18 years of age and were scheduled to undergo open heart
coronary artery bypass surgery. Any patients with other bypass
surgeries or additional procedures, e.g. valve surgery, were excluded.
For the current analyses, N = 118 patients with multi-omics data were
available. Omics measurements were performed in batches (one batch
genotypes, two batches transcriptomics both with approximately equal
distribution of cases and controls and one batch of proteomics)
depending on the amount and quality of the material and resource
availability. Thus, the number of individuals who entered analyses
differed by omics type. Information on classical cardiovascular risk
factors and potential confounders (age, sex, body mass index, systolic
and diastolic blood pressure, hypertension, hypertension medication,
diabetes, diabetes medication, history of myocardial infarction,
smoking) was collected by questionnaire and from medical records.
Non-valvular prevalent AF was the clinical diagnosis based on patient
history and routine cardiology work-up, that was used as outcome in our
analysis. Baseline blood samples were aliquoted and stored prior to
surgery. Right atrial appendage tissue remnants were collected when the
extracorporeal circulation was started and shock frozen immediately.
Follow-up for AF and other cardiovascular disease outcomes was done by
questionnaire, telephone interview and medical chart review. The
observational cohort study was approved by the Ethikkommission
Ärztekammer Hamburg (PV3982). The study was performed in compliance
with the Declaration of Helsinki. The study enrollment and follow-up
procedures were in accordance with institutional guidelines. All
participants provided written informed consent. Sex stratification of
the results was not possible due to the inherently small number of
women in the study (Supplementary Table [253]18). Analyses were
adjusted for sex where appropriate. Analyses were performed in all
samples with respective omics data that passed appropriate quality
control as stated in the preprocessing steps. This resulted in slightly
different samples analysed in eQTL/eQTS, pQTL/pQTS and differential
protein expression analysis. Baseline characteristics of the cohort
stratified by analysis type can be found in the supplement
(Supplementary Table [254]18).
Genotypes
The genotype data were generated using the Affymetrix GeneChips
Genome-Wide Human SNP Array 6.0, with quality control (QC) at different
levels. Using the Birdseed v2 algorithm, [255]PLINK 1.9 and standard
quality control procedures^[256]61, 749,272 SNP were identified in 83
blood samples with a MAF > 0.01, HWE exact test P > 1 × 10^−6 and a
call rate > 98%. Genotypes were further imputed with IMPUTE2^[257]62
2.3.2 based on the 1000 genomes Phase 3 genotypes^[258]63,[259]64
[[260]https://www.internationalgenome.org/] (per SNP: confident
genotype calls with genotype probability > 95%, percentage of confident
genotype calls across samples > 95%) and included only variants with
HWE P > 1 × 10^−4 resulting in 5,050,128 SNPs for 83 individuals. All
samples were of central European ancestry, no close relatedness (max.
IBD = 0.04 ≪ 0.19) or population substructure could be detected.
For QTL analyses, outliers in the expression data which coincide with
rare genotypes can lead to false positive findings. For SNPs with less
than three individuals with the homozygous-minor-allele genotype, all
samples with homozygous-minor-allele genotype were therefore recoded to
heterozygous genotype.
PRS for AF
The polygenic risk score was calculated based on the LDpred omnigenic
score for AF published by Khera et al.^[261]6. To account for a
realistic representation of risk score values across the general
population, we calculated risk score values together with unrelated
1000 genomes^[262]63,[263]64 CEU individuals. Phase 3 1000 genomes
genotypes were filtered for variants in the risk score and merged with
our AFHRI genotypes, resulting in SNP data for 6,730,540 variants out
of 6,730,541 in the score. The PRS per individual was computed using
the [264]Plink 1.9 score function, imputing missing variants based on
the frequency of the risk allele. From this, percentiles across all 490
individuals (1000 genomes: 407, AFHRI cohort: 83) were further used as
PRS values for further analysis.
mRNA
The mRNA data were generated from human heart atrial appendage tissue
samples obtained during heart bypass surgery. They were frozen in
liquid nitrogen and pulverized for further analysis. RNA isolation was
performed with subsequent assessment of the RNA integrity number (RIN)
for quality determination of the samples. HuGene 2.0 ST Arrays were
used with the Affymetrix® GeneChip WT Plus Reagent Kit, the Affymetrix
GeneChip 3000 scanner and the Affymetrix Genechip Command Console
4.0.0.1567. The R Bioconductor package oligo^[265]65 1.50.0 was used to
create expression sets, perform the background correction and
quantile-normalization per sample, as well as log-transform the data.
Left atrial appendage tissues and samples with a RIN-score smaller than
6 were excluded, in case of replicates the one with the highest
RIN-score was used. The mean of multiple transcript clusters annotated
to the same gene symbol was used to derive gene level expression values
for 26,376 genes in 102 samples.
Protein
To measure the protein concentrations of 97 right atrial appendage
samples, the tissues were homogenized using a micro dismembrator
(Braun, Melsungen, Germany) at 2600 rpm for 2 min in 100 μl of 8M
urea/2M thiourea (UT). Then homogenates were resuspended in 300 μl of
UT. Nucleic acid fragmentation was gained by sonication on ice three
times for 5 s each with nine cycles at 80% energy using a Sonoplus
(Bandelin, Berlin, Germany). The homogenates were centrifuged at
16,000 × g for 1 h at 4 °C. After that, protein concentration was
determined by Bradford with BSA as standard (SE). Three micrograms
protein were reduced and alkylated and digested with LysC (1:100) for
3 h followed by tryptic digestion overnight both at 37 °C. Subsequently
peptide solutions were desalted on C18 material (μ ZipTip). Finally
mass spectrometry analysis was performed on a LC-ESI-MS/MS machine (LTQ
Orbitrap Velos). One sample was excluded due to irregularities in the
chromatographic pattern. The Rosetta Elucidator 3.3 workflow was used
to extract feature intensity and derive protein intensities by summing
of all isotope groups with the same peptide annotation for all peptides
annotated to one protein (further parameters: Uniprot_Sprot_human_rel.
2016_05: static modification: carbamidomethylation at Cys, variable
modification: oxidation at methionine, 2 missed cleavages, fully
tryptic, filtered for peptides with FDR < 0.05 corr. to Peptide Teller
probability > 0.94 and shared peptides were excluded). Intensities for
1419 proteins with one or more peptides (877 with 2 or more peptides)
were quantified for 96 samples, median-normalized and
log10-transformed. The original protein concentration was determined as
an important technical covariate and therefore used in further
analyses.
For cis-QTL computations, matched genotypes and proteomics data were
available for 1337 proteins in 75 samples including 62 missing values
(0.06% of all values) that were imputed using the KNN-method
implemented in the R bioconductor package [266]impute 1.50.1.
Protein analysis using western blot
Human atrial tissue samples (15 mg each) were pulverized in liquid
nitrogen and lyzed with M-PER Mammalian Protein Extraction Reagent
(Thermo Scientific) supplemented with protease inhibitor. Protein
concentrations were measured using a BCA assay (Thermo Scientific). The
same amount of protein for each sample was heated at 95 °C for 10 min
in 1× Laemmli. Proteins were separated on a 10% SDS-PAGE gel and
transferred to nitrocellulose membranes. Membranes were blocked with 5%
skim milk in TBS-T for 1 hour. Staining with the primary antibody was
performed overnight at 4 °C, and secondary antibody staining for 1 h at
room temperature. The following primary antibodies were used: NKX2-5
(ab205263, 1:1000), alpha-actinin (CST #3134, 1:1000), GAPDH (CST
#3683, 1:2000). The following HRP-conjugated secondary antibody was
used: goat anti-rabbit IgG (PI-1000-1, 1:10,000). The antibodies were
visualized with enhanced chemiluminescence (ECL) detection reagent
(Bio-Rad #1705060) or the SuperSignal West Pico PLUS chemiluminescent
substrate (Thermo Scientific #34579). The membranes were reprobed with
GAPDH antibody after incubation with stripping buffer (Thermo
Scientific #46430) for 4 min, washing and blocking with 5% skim milk in
TBS-T. Antibody detection was performed with a chemiluminescence
imaging system (FUSION Solo S). Blot analyses were achieved with the
Image Lab software (Bio-Rad 6.1).
Protein-per-mRNA ratios
mRNA and protein measurements were already per-sample
quantile-normalized and log-transformed. Both mRNA and protein
measurements were additionally quantile-normalized per gene and the
ratio was computed as the difference between protein and transcript
values.
Residuals
Per-sample quantile-normalized, log-transformed mRNA and protein values
were used to compute residuals. mRNA residuals were derived as the
residuals from a linear model explaining mRNA by protein levels, i.e.
mRNA ~ β[0] + β[1] ⋅ protein + ε. Protein residuals were derived as the
residuals from a linear model explaining protein by mRNA levels, i.e.
protein ~ β[0] + β[1] ⋅ mRNA + ε. Covariates were used for further
analyses but not for the calculation of residuals.
Correction for cell-type composition—fibroblast-score
Tissue samples consist of different cell-type compositions. Samples
with more fibroblasts probably contain less cardiomyocytes, one of the
functionally most relevant cell-types in primary atrial appendage
tissue. We used a fibroblast-score based on the sum of expression
values of genes upregulated in fibroblasts compared to cardiomyocytes
in rats^[267]55: ELN, FGF10, FOSB, FCRL2, SCN7A, ARHGAP20, CILP, FRAS1,
DCDC2, NRG1, AFAP1L2, ITGBL1, NOV, CLEC3B. Cardiomyocyte specific gene
signatures were avoided to prevent interfering effects due to
structural remodeling common in AF.
Genome annotations
Ensembl BioMart^[268]66 [269]GRCh37.p13 hg19 annotations were used as
genome annotations.
GWAS catalog
GWAS annotations were based on the GWAS catalog^[270]67
([[271]https://www.ebi.ac.uk/gwas/], 2019-11-26). We looked at the
traits annotated to cardiovascular measurements (EFO_0004298) and
cardiovascular disease (EFO_0000319), further referred to as
cardiovascular traits. We also distinguished the subcategories
arrhythmias, i.e. all traits annotated to atrial fibrillation, cardiac
arrhythmia, sudden cardiac arrest, supraventricular ectopy, early
cardiac repolarization measurement, heart rate, heart rate variability
measurement, P wave duration, P wave terminal force measurement, PR
interval, PR segment, QRS amplitude, QRS complex, QRS duration, QT
interval, R wave amplitude, resting heart rate, RR interval, S wave
amplitude or T wave amplitude and AF, i.e. all traits annotated to
Atrial fibrillation or QT interval based on the EFO-mapping
([272]https://www.ebi.ac.uk/gwas/api/search/downloads/trait_mappings,
2019-11-26).
VEP
Ensembl Variant Effect Predictions^[273]68 were downloaded from the
Ensembl Biomart GRCh37.p13 based on SNP rs-IDs. The label Missense was
used to summarize all possible missense consequences of the variant
(gained stop codon, a frameshift/amino-acid altering/protein-altering
variant, a lost start/stop codon, an inframe insertion/deletion).
Chromatin states
Roadmap Epigenomics ChromHMM 15 state model coremarks for human heart
right atrial appendage^[274]69 ([275]E104_15_coreMarks_dense.bed) were
used to annotate tissue-specific chromatin states.
Promoter-capture HiC
Promoter-capture HiC data from human iPSC-derived
cardiomyocytes^[276]70 [277]E-MTAB-6014
(capt-CM-replicated-interactions-1kb.bedpe) was used to annotate linked
promoter regions.
Binding sites
TF BS were based on ChIP-seq data from the ReMap TF database^[278]71
([279]ReMap2018 v1.2) filtered for highly expressed genes
(log(TPM + 1) ≥ 1) in GTEx atrial appendage tissue. Additionally,
NKX2-5 binding sites from human iPSC-derived
cardiomyocytes^[280]38 [281]GSE133833 were used. All TF BS were
filtered for a minimal overlap of 25 bp with open chromatin regions,
i.e. chromatin states 1_TssA, 2_TssAFlnk, 10_TssBiv, 6_EnhG, 7_Enh,
11_BivFlnk or 12_EnhBiv.
Fine mapping for functional NKX2-5 BS was done integrating promoter,
promoter-capture HiC, chromatin states and NKX2-5 ChIP-seq data.
Promoter regions were annotated based on [282]Gencode^[283]72 v31lift37
basic and long non-coding RNA transcript start annotations as well as
regions linked to those by promoter-capture HiC. ChIP-seq binding sites
were further overlapped with those promoter regions and filtered for
open chromatin regions (details see provided analysis code).miRNA BS
were based on [284]TargetScan 7.2 default predictions for conserved
target sites of conserved miRNA families^[285]73. RBP BS were derived
based on eCLIP data from HepG2 and K562 cell lines provided by the
ENCODE Project Consortium^[286]74,[287]75 ([288]ENCODE, Supplementary
Table [289]19). Peak calling was done using the ENCODE uniform
processing pipeline, peaks in the bed-files were further filtered for
an enrichment > log2(1), a Fisher P-value > −log10(0.05) and
overlapping peaks were then merged (details see provided analysis
code).
Cis-QTL covariates including PEER factors
Tissue-specific expression analyses remain challenging due to a large
number of confounders. PEER factors^[290]17 are widely used to account
for known and unknown factors in the context of cis-QTL
analyses^[291]18,[292]19 and were computed using the R package
[293]PEER 1.0 . One to 30 PEER factors without additional covariates,
with fibroblast-score only, with the first three genotpye principle
components, with age, sex, BMI, disease status, fibroblast-score and
with age, sex, BMI, disease status and fibroblast-score and three
genotype principle components were used as covariates in the QTL
analysis (Supplementary Fig. [294]1).
Cis-QTL computation
QTLs were calculated using the R package [295]MatrixEQTL^[296]76 2.2. A
cis-range of 1 × 10^6 bp and a linear, additive model for genotype
effect were used. Expression quantitative trait loci (eQTL), protein
quantitative trait loci (pQTL), expression residual quantitative trait
loci (res eQTL), protein residual quantitative trait loci (res pQTL)
and protein-per-mRNA ratio quantitative trait loci (ratioQTL) analyses
were performed for per-sample quantile-normalized as well as additional
per-gene quantile-normalized expression values, each for the different
sets of covariates as described above.Normalization and covariate sets
were optimized for the highest number of QTL genes (i.e. genes, with at
least one significant QTL, as previously established by Lappalainen and
colleagues^[297]18 and the GTEx consortium^[298]19), detected based on
a FDR < 0.05 (Benjamini-Hochberg procedure, Supplementary Fig. [299]1).
For the final analysis, QTLs were computed using per-sample and
per-gene quantile-normalized expression values, using only PEER factors
without additional covariates. Two-sided t-tests were performed with 12
PEER factors for eQTLs (N = 75, df = 61), 10 PEER factors for pQTLs
(N = 75, df = 63), 8 PEER factors for res eQTLs (N = 66, df = 56), 12
PEER factors for res pQTLs (N = 66, df = 52) and 9 PEER factors and the
fibroblast-score for ratioQTLs (N = 66, df = 54).
To investigate independent QTLs for the same gene, LD clumping was
performed using the [300]Plink 1.9 clump function with parameters
–clump-r2 0.5, –clump-kb 250, FDR cutoffs –clump-p1 0.05 and –clump-p2
0.8 as well as P value cutoffs –clump-p1 1 × 10^−5 and –clump-p2 0.05
for each gene and QTL type.
Definition of functional QTL categories
Shared eQTL/pQTL were defined as SNP-gene pairs with a significant eQTL
(FDR < 0.05), pQTL (FDR < 0.05) and no res eQTL (FDR < 0.05) or res
pQTL (FDR < 0.05), i.e. genetic regulation is observable on
transcriptomics and proteomics level and variation corresponding to the
SNP influence in one omic level can be explained and therefore removed
by the variation in the other omic level.
Independent eQTLs were defined as SNP-gene pairs with a significant
eQTL (FDR < 0.05) and res eQTL (FDR < 0.05) but no pQTL (FDR < 0.05)
and no res pQTL (FDR < 0.05), i.e. regulation of SNP is only affecting
transcript levels, not proteins. Also, the res eQTL disappears, if the
SNP influences protein levels too much, for example a pQTL that barely
missed the significance threshold.
Independent pQTLs were defined as SNP-gene pairs with a significant
pQTL (FDR < 0.05) and res pQTL (FDR < 0.05) but no eQTL (FDR < 0.05)
and no res eQTL (FDR < 0.05), i.e. regulation of SNP is only affecting
protein levels, not transcripts, i.e. by post-transcriptional
regulation.
Colocalization analysis
Colocalization analysis was performed using the coloc.abf() function
from the R package coloc^[301]22 3.2-1. Posterior probabilities > 0.5
were used to classify colocalization. For independent eQTLs and
independent pQTLs, we considered the sum of the posteriors for H1/H2
and H3 representing either only one independent QTL or independent eQTL
and pQTL.
Enrichment of functional elements
Similar as described by Battle and colleagues^[302]14, annotations of
the top 5 QTL hits per gene were compared to a background distribution
(100 background SNPs per QTL SNP) matched for MAF (difference ≤0.05)
and distance to TSS (difference ≤ 1000 bp). Top QTL SNPs per gene were
ranked according to the FDR of pQTLs for shared eQTLs/pQTLs, res eQTLs
for independent eQTLs and res pQTLs for independent pQTLs. Odds ratios
were computed using Fisher’s exact test (two-sided) on the
QTL-by-annotation contingency tables.
GWAS overlap and enrichments
To determine the overlap between GWAS hits and cis-QTLs, we first
annotated all GWAS hits for cardiovascular traits and RA in the GWAS
catalog^[303]67 with proxies in high linkage-disequilibrium using
SNiPA^[304]77 ([[305]https://snipa.helmholtz-muenchen.de/], EUR
population, R^2 > 0.8) as well as significant QTLs (P < 1 × 10^−5). For
each of the original GWAS SNPs, the corresponding proxy-gene pair with
the strongest QTL was selected to annotate this GWAS hit.
To assess a general enrichment of GWAS hits in QTLs, for all tested QTL
SNPs we constructed the cross tables that a SNP has significant QTL
(P < 1 × 10^−5) versus was the SNP (or R^2 > 0.8 proxy) annotated in
the GWAS catalog. These tables were evaluated for eQTLs and pQTLs for
each of the groups cardiovascular traits, arrhythmias, AF and RA using
Fisher’s exact test (two-sided).
PRS correlations/eQTS/pQTS rankings
Transcriptomics and proteomics ranking based on PRS correlations were
evaluated using linear regression models with additional covariates
age, sex, BMI, systolic blood pressure (sysBP), C-reactive protein
(CRP) and N-terminal prohormone of brain natriuretic peptide
(NT-proBNP) as well as the lead SNP for each independent cis-QTL loci
(based on LD clumps, FDR < 0.05) to correct for potential cis-effects
included in the PRS. We assume that genetic-centered analyses are less
susceptible to confounding than differential expression analyses. While
we still included the most relevant cardiovascular risk factors and
technical covariates RIN-score or protein concentration, we reduced the
number of highly correlated covariates (e.g. blood pressure,
hypertension and hypertension medication) and preferentially included
continuous covariates to avoid overfitting/overcorrecting and to
increase power. The following models were evaluated:
[MATH: mRNA~β0+β
1⋅PRS+β2⋅age+β<
mrow>3⋅sex+β4⋅BMI+β5⋅sy
sBP+β6<
/mn>⋅CRP+β7⋅NT-proBNP+<
/mo>β8⋅RIN+∑
iβiSNPi+εprotein~β0+β
1⋅PRS+β2⋅age+β<
mrow>3⋅sex+β4⋅BMI+β5⋅sy
sBP+β6<
/mn>⋅CRP+β7⋅NT-proBNP+<
/mo>β8⋅protein conc.+
∑iβ
iSNPi+ε
:MATH]
We further used summary statistics (T statistic) for β[1], equivalent
to comparing the nested models with/without the PRS and derived
corresponding two-sided P values. Degrees of freedom for genes without
significant cis-loci were 65 for mRNA (N = 74) and 64 for protein
(N = 73). Different numbers of samples compared to cis-QTL analyses are
due to missingness in the used covariates.
Pathway enrichment analysis
Gene set enrichment analysis (GSEA)^[306]27 was performed using the
Bioconductor R package [307]fgsea^[308]28 1.8.0 and MSigDB v6.1 gene
sets for Gene Ontology biological processes
([309]c5.bp.v6.1.symbols.gmt.txt^[310]27,[311]78,[312]79). To avoid
bias towards gene sets specific to human disease as e.g. KEGG pathways,
Gene Ontology gene sets, which are not linked to a disease a priori
were favoured. The GSEA method was selected to further identify the
leading edge genes, which represent the drivers of the enrichment.
Enrichments were calculated with 100,000 permutations on eQTS T-values
(considering gene sets with minimal 15 and maximal 500 transcripts) and
pQTS T-values (considering gene sets with minimal 5 and maximal 500
proteins).
SNP and gene candidate selection for trans-analyses
To reduce the multiple testing burden, trans-analyses were only
performed on AF GWAS SNPs and candidate genes derived from the gene set
enrichment analysis. We selected all SNPs with MAF ≥ 0.1 that were
annotated with atrial fibrillation in the GWAS catalog^[313]67 or the
best proxy if the annotated SNP was not measured in our dataset. We
further evaluated SNPs in high LD using SNiPA^[314]77 (R^2 > 0.5) and
took only the SNP with the highest P-value in a recent GWAS^[315]3,
resulting in 108 independent loci.
We performed power analysis for the ability to detect strong trans-eQTL
effects with our fixed sample size (N = 74 for eQTLs). The trans-eQTL
effect size was set to 21.8% of variance explained, which is the
strongest trans-eQTL found in eQTLGen^[316]80. In particular we
evaluated how many genes can be tested in a targeted trans-eQTL
analysis of all LD pruned AF loci (N = 108 SNPs) to still have a power
of at least 50% at a Bonferroni adjusted significance level of 5%,
based on power calculations for the F test. We found that 23 genes
could be tested (Supplementary Fig. [317]14). Thus we designed our
candidate selection strategy to identify the most promising 23
candidates.
Leading edge genes^[318]27 defined by GSEA on the eQTS/pQTS
associations were considered drivers of the enrichments of gene sets. A
gene set was considered significantly enriched with a FDR < 0.05
(Benjamini-Hochberg procedure). This resulted in 1261 genes for 81 gene
sets on transcript and 152 genes for one gene set on protein level.
Due to the hierarchical structure of the GO biological processes, we
favoured genes that were driving the enrichment of multiple gene sets,
i.e. also contained in smaller, more specialized child terms. For that
reason, we selected all leading edge genes as trans-QTL gene candidates
that appeared in the transcript leading edge set of 14 or more gene
sets, reducing the 1261 to 23 genes (as based on the power analysis
suggesting ≤23 genes).
Although protein candidates were much more abundant than 23 genes,
because of only one significant gene set we could not apply the same
selection strategy resulting in no further preselection.
Trans-QTL computations
Trans-QTLs were calculated using the R package MatrixEQTL^[319]76. An
additive linear model was evaluated for 108 SNPs for AF and 23
transcripts as well as 152 proteins. Additional covariates age, sex,
BMI, sysBP, CRP, NT-proBNP, the fibroblast-score and RIN-score/protein
concentration for transcripts/proteins were used similar to the QTS
analyses and resulted in two-sided t-test with 74 samples (df = 64) for
transcripts and 73 samples (df = 63) for proteins. Different numbers of
samples compared to cis-QTL analyses are due to missingness in the used
covariates. In contrast to the cis-QTL analyses, no PEER factors were
used as has been previously suggested for trans-analyses^[320]18.
NKX2-5 trans-pQTL evaluation
The rs9481842-NKX2-5 trans-pQTL was evaluated using a linear model with
additive genotype effect and covariates age, BMI, sysBP, CRP, NT-proBNP
and the fibroblast-score to explain logarithmized NKX2-5 protein
intensities normalized to alpha-actinin (N = 29, df = 21). Compared to
the original trans-QTL computations, sex was dropped as a covariate
since only one female sample was present.
NKX2-5 target definition
We were interested in investigating the link between a GWAS hit to
target genes through a trans-eQTL-regulated TF, that was not measured
on proteomics level. We therefore investigated the effect of the SNP as
well as the TF transcript on target genes on transcriptomics and
proteomics level. A graphical summary of the target definition
procedure is shown in Supplementary Fig. [321]13. To prioritize target
genes with most evidence of an association with the QTL SNP which is
mediated by the TF, we aimed to establish the following properties:
1. The target has a NKX2-5 binding site (ChIP-seq) overlapping with an
open chromatin state in the promoter or an promoter interacting
region (HiC).
To establish that the TF mediates the effect of the QTL SNP on the
target gene transcription we further need to show that:
2. The transcript expression of the target gene is associated with the
SNP genotype and
3. the association between the SNP and the target transcript
expression disappears when adjusting for TF expression levels.
Finally, the most relevant endpoint, i.e. target protein abundance,
should be mediated by the TF.
4. The protein expression of the target is significantly and
positively correlated with the transcript expression of the TF.
We do so by evaluating the following regression models (two-sided
t-tests):
1. Association of GWAS SNP with target transcript (trans-eQTL, N = 67,
df = 63): target
transcript ~ β[0] + β[1] ⋅ SNP + β[2] ⋅ fibroblast-score + β[3] ⋅ R
IN + ε
2. Independent effects of the SNP on target transcript, that are not
mediated by the TF transcript (N = 67, df = 62): target
transcript ~ β[0] + β[1] ⋅ SNP + β[2] ⋅ TF
transcript + β[3] ⋅ fibroblast-score + β[4] ⋅ RIN + ε
3. Association of target protein with TF transcript (N = 79,
df = 75): target protein ~ β[0] + β[1] ⋅ TF
transcript + β[2] ⋅ fibroblast-score + β[3] ⋅ protein conc. + ε
4. Additionally, we quantify the corresponding trans-pQTL for
Supplementary Table [322]16: Association of GWAS SNP with target
protein (trans-pQTL, (N = 66, df = 62): target
protein ~ β[0] + β[1] ⋅ SNP + β[2] ⋅ fibroblast-score + β[3] ⋅ prot
ein conc. + ε.
For direct binding a) we considered only genes with transcriptomics and
proteomics measurements and at least one functional TF BS (see methods
Binding sites: functional NKX2-5 BS).
To establish regulation by the SNP, we selected only genes with a
significant association of the SNP with the target transcript
(concordant effect to the TF expression) in regression model (1)
β[1] < 0, P(β[1]) < 0.05 to ensure (b) and additionally checked the
vanishing effect when adding the TF transcript model, i.e. for (c) we
assessed in regression model (2) P(β[1]) > 0.2. For the remaining
candidates, we performed FDR correction (Benjamini-Hochberg) on
P-values of model (3). We finally identified all proteins that satisfy
(d) by (3) FDR(P(β[1])) < 0.05 and T(β[1])) > 0 as functional NKX2-5
targets.
Partial correlations
Partial correlations were computed using the R package ppcor^[323]81
1.1.
Differential proteome analysis for AF
Differential analysis of proteins was done by comparing protein
expression of controls without AF to cases diagnosed with prevalent AF.
Cases that developed only post-operative AF after surgery were excluded
from the analysis. Since differential expression analyses with respect
to AF might be more confounded by underlying conditions, more stringent
adjustment was carried out by including covariates covs: age, sex, BMI,
diabetes, sysBP, hypertension medication, myocardial infarction,
smoking status, fibroblast-score and protein concentration. Summary
statistics for β[1](AF) in the following linear model (N = 78) were
used: protein ~ β[0] + β[1] ⋅ AF + ∑[cov ∈ covs ]β[cov] ⋅ cov + ε.
Replication in the [324]GSE128188 dataset
RNA-seq transcriptomics data of 20 atrial appendage tissue samples from
males undergoing coronary artery bypass grafting and/or atrial/mitral
valve repair or replacement was downloaded from the public repository
[325]GSE128188^[326]36. Norm factors were calculated to derive
log-transformed TMM-based RPKMs using the functions calcNormFactors()
and rpkm() from the bioconductor R package [327]edgeR^[328]82 3.28.1.
Full summary statistics for the AF differential expression analysis
were supplied by the original authors after inquiry. GSEA was performed
on the log fold changes while ranking significant genes before
non-significant ones. For left and right atrial appendage samples each,
all ten samples as well as the mean of cases and the mean of controls
were scaled and centered per gene to produce relative values comparable
across datasets.
Replication in the PXD006675 dataset
Mass spectrometry proteomics data of six left atrial tissue samples
were downloaded from the public repository [329]PXD006675^[330]30. Full
summary statistics for the AF differential expression analysis were
supplied by the original authors after inquiry. GSEA was performed on
the log fold changes while ranking significant genes before
non-significant ones. Proteomics data of triplicates for three AF cases
and three controls were median-normalized per measurement and
log-transformed. All 18 samples as well as the mean of the AF cases and
the mean of the controls were scaled and centered per gene to produce
relative values comparable across datasets.
Reporting summary
Further information on research design is available in the [331]Nature
Research Reporting Summary linked to this article.
Supplementary information
[332]Supplementary information^ (14.7MB, pdf)
[333]Peer Review File^ (4.3MB, pdf)
[334]Reporting Summary^ (337.2KB, pdf)
Acknowledgements