Abstract

   Atrial fibrillation (AF) is the most common sustained arrhythmia in
   humans, yet the molecular basis of AF remains incompletely understood.
   To determine the cell type-specific transcriptional changes underlying
   AF, we perform single-nucleus RNA-seq (snRNA-seq) on left atrial (LA)
   samples from patients with AF and controls. From more than 175,000
   nuclei we find that only cardiomyocytes (CMs) and macrophages (MΦs)
   have a significant number of differentially expressed genes in patients
   with AF. Attractin Like 1 (ATRNL1) was overexpressed in CMs among
   patients with AF and localized to the intercalated disks. Further, in
   both knockdown and overexpression experiments we identify a potent role
   for ATRNL1 in cell stress response, and in the modulation of the
   cardiac action potential. Finally, we detect an unexpected expression
   pattern for a leading AF candidate gene, KCNN3. In sum, we uncover a
   role for ATRNL1 which may serve as potential therapeutic target for
   this common arrhythmia.

   Subject terms: Gene expression, Atrial fibrillation, Transcriptomics
     __________________________________________________________________

   Characterizing atrial fibrillation (AF) at the single cell level is
   challenging. Here, the authors perform snRNA-seq on 18 patients with AF
   to investigate the cell composition, and gene expression shifts
   associated with this common arrhythmia.

Introduction

   AF is the most common arrhythmia in humans, which increases a patient’s
   risk for death, stroke, heart failure, and dementia. Clinical
   treatments for AF include stroke prevention, as well as rate and rhythm
   control, which can improve symptoms. AF has many known risk factors,
   including age, body mass index, height, hypertension, diabetes
   mellitus, sleep apnea, heart failure, smoking, alcohol use, as well as
   genetic predisposition^[52]1. While some risk factors for AF are
   modifiable, the strong genetic contribution to AF points to the need
   for personalized medicine to assess AF risk and severity more
   accurately. Moreover, the heritability also implicates several
   molecular pathways in AF onset and perpetuation.

   The onset of AF results in both structural and electrical remodeling in
   the atria. Ion channels, gap junctions, and regulators of intracellular
   Ca^2+ homeostasis are considered the principal molecular determinants
   of abnormal atrial electrical activity^[53]2,[54]3. Indeed, mutations
   in ion channel genes, and gap-junctions^[55]4,[56]5, have been found to
   alter cardiac conduction and promote AF. In addition to the molecular
   determinants of electrical remodeling, structural integrity of cardiac
   tissue also plays a major role in AF pathophysiology. Importantly,
   cardiac fibrosis presents a major impediment to physiological cardiac
   conduction and is also highly associated with AF. And while fibrosis
   and other structural remodeling-associated factors may not initiate AF,
   they are likely to contribute to the progression, persistence, and
   severity of the disease. An enhanced understanding of these molecular
   features, combined with clinical knowledge regarding modifiable risk
   factors have the potential to aid in the improvement of therapeutic
   efficacy for treating AF.

   Single-nucleus RNA sequencing (snRNA-seq) has emerged as a powerful
   tool for characterizing cell diversity in preserved human tissue
   samples^[57]6,[58]7. In addition to identifying cell types, snRNA-seq
   allows investigators to characterize tissue composition, sources of
   potential inflammation, cell status, and active gene regulatory
   networks. Recently, snRNA-seq has been applied to generate detailed
   cellular atlases of the healthy human heart^[59]8,[60]9 and to
   large-scale studies investigating a broad range of cardiovascular
   diseases including hypertrophic-, dilated-, and
   ischemic-cardiomyopathies, as well as congenital heart
   disease(s)^[61]10–[62]13. From these studies and others, cardiac
   fibroblasts (CFs) and macrophages appear to possess the potential to
   diversify during disease, particularly when compared to cardiomyocytes
   (CMs)^[63]11. However, a cell-type specific characterization of human
   AF has yet to be undertaken.

   In this study, we perform large-scale snRNA-seq on LA samples from
   patients with and without AF. We generate over 170,000 snRNA
   transcriptomes allowing us to investigate the transcriptional and
   cellular responses associated with AF. We were able to identify
   disease-associated genes in cardiomyocytes and macrophages, such as
   ATRNL1, which were significantly dysregulated in AF. Genetic
   manipulation of ATRNL1 levels in hESC-aCMs allowed us to identify a
   role for ATRNL1 in the regulation of the cell stress response and
   cardiac conduction. Our findings have implications for the development
   of therapeutic strategies for treating AF and heart failure.

Results

snRNA-seq of human LA tissue from AF cases and controls

   To characterize the cellular and molecular characteristics associated
   with AF, we performed snRNA-seq on samples from the LA of human
   patients with AF who were not in heart failure (n = 19) as well as
   non-AF controls (n = 17) (Supplementary Data [64]1, and Supplementary
   Data [65]2). We selected patients without heart failure to better
   control for the many diverse etiologies associated with this common
   arrythmia. Overall, our controls (CTRL) were 63% female, the mean age
   was 68 (SD = 7.5), 25% had a history of taking beta-blockers, and 6%
   had taken anticoagulants. The AF cases were 61% were female, the mean
   age was 66 (SD = 8.2), 56% had a history of taking beta-blockers, and
   44% were being administered anticoagulants.

   We carried out strict quality control, including sample sex check,
   unique molecular identifier (UMI) decay curve analysis, and a
   crosscheck of genotype fingerprinting with genome sequencing data (see
   Methods). A total of 2 samples failed our quality control standards and
   were removed from all subsequent analysis. Our final data set included
   18 AF cases and 16 controls (Fig. [66]1a, Supplementary Data [67]1, and
   Supplementary Data [68]2). Next, batch correction was performed with
   the single-cell variational inference framework (scVI)^[69]14.
   Low-quality nuclei, doublets, and misclassified cells were then removed
   following clustering, leaving us with 179,697 nuclei that occupied a
   total of 15 transcriptionally distinct clusters (Fig. [70]1b, and
   Supplementary Fig. [71]1). Differential expression analysis was
   performed to determine the top representative genes for each cluster
   (Supplementary Data [72]3), and then clusters were annotated based on
   the expression of canonical cell markers (Fig. [73]1c). For example,
   TBX5 was enriched in cardiomyocytes (CMs), ERG in endothelial cell
   clusters (EC-1, EC-2, and LEC), and ADIPOQ was highly expressed in
   adipocytes (Fig. [74]1d).

Fig. 1. Cellular Composition in the LA from patients with and without atrial
fibrillation.

   [75]Fig. 1
   [76]Open in a new tab

   a Schematic of experimental setup and tissue processing pipeline.
   Created partly in BioRender. Hill, M. (2021) BioRender.com/e34r263. b
   Uniform manifold approximation and projection (UMAP) representation of
   179,697 left atrial cardiac nuclei isolated from a total of 30 donors.
   c Heatmap displaying the expression of marker genes for each cluster
   (cell type). Low median z-score of normalized expression highlighted in
   green. High expression showed in purple. d Feature plots of gene
   expression. High expression shown in purple. e UMAP displaying the
   clinical designation of each transcriptome. Healthy controls denoted by
   red, and AF patients are highlighted in blue. f Embedding of the Milo
   k-nearest neighbor differential abundance testing results for all cell
   types. Neighborhoods colored by log fold changes for Control (CTRL)
   versus AF. A negative (blue) log fold change indicates enrichment in
   AF, and a positive (red) log fold change represents an enrichment in
   CTRL. Nonsignificant neighborhoods (false discovery rate (FDR) > 5%)
   are shown in white. g Milo k-nearest neighbor differential abundance
   testing results scored by individual cell type. Cell types colored as
   in Fig. 1b.

   Next, we sought to determine the changes in LA tissue composition that
   occur with AF, so we evaluated the proportions of cell-type annotated
   nuclei across all samples (Fig. [77]1e, and Supplementary Fig. [78]1b).
   Overall, control and AF samples contained similar percentages of all
   the major cardiac cell types and no clear trends were detectable. To
   formally determine which cell types were shifting in composition
   between AF and controls, we applied Milo, a statistical framework that
   avoids the resolution-associated pitfalls of clustering through
   performing differential abundance testing by assigning cells to
   neighborhoods on a k-nearest neighbor graph^[79]15. The differential
   abundance testing identified very few neighborhoods within the main
   cardiac cell types that shifted significantly between disease and
   controls (Fig. [80]1f, and Fig. [81]1g). To further evaluate changes in
   tissue composition with a different statistical approach we have
   decided to utilize a Bayesian model that we have successfully utilized
   and experimentally validated previously^[82]10,[83]13 the single-cell
   compositional data analysis (scCODA) framework (Supplementary
   Fig. [84]2c)^[85]16. No statistically significant changes in nuclei
   abundance were detected between AF patients and control donors. To
   confirm these results, we next performed trichrome stained tissue
   sections collected directly proximal to those utilized for snRNA-seq to
   evaluate tissue composition (Supplementary Fig. [86]2d). The trichome
   stained sections contained similar tissue compositions between controls
   and AF patients, consistent with our snRNA-seq analysis. Overall, these
   results suggest that the tissue composition of the LA was not overly
   different in patients with AF compared to age matched controls.

Differentially expressed genes in AF

   Next, we sought to identify individual transcripts that were associated
   with AF. We began by performing performed principal components analysis
   (PCA) across combined snRNA-seq data for each sample in a pseudo-bulk
   RNA-sequencing approach (Fig. [87]2a). Overall, we observed little to
   no transcriptional variation that could be explained by patient
   diagnosis. The greatest source of variation separating samples,
   regardless of disease status, was sex, as genes from X and Y
   chromosomes were included in our analysis. We next aimed to identify
   the differentially expressed genes between AF and control hearts for
   each individual cell type (Supplementary Data [88]4, see “Methods”).
   Very few substantial changes in transcription were identified after
   completing this comparison. However, we did observe significant
   transcriptional changes (adjusted p-value < 0.05) for 2 cell types,
   namely CMs and macrophages (Fig. [89]2e, [90]f). In CMs, we observed a
   marked upregulation of LINC01257, a lncRNA associated with pediatric
   acute myeloid leukemia^[91]17, and ATRNL1.

Fig. 2. Differentially expressed genes in atrial fibrillation.

   [92]Fig. 2
   [93]Open in a new tab

   a PCA plots for pseudobulk RNA-seq analysis of samples by diagnosis.
   Red is control (CTRL) and blue is AF cases. Female samples are shaped
   as circular and male samples by triangles. b Volcano plot displaying
   the differentially expressed genes in macrophages between controls and
   AF cases. c Volcano plot displaying the differentially expressed genes
   in CMs between controls and AF cases. Volcano plots show log fold
   change and two-sided P-value expression changes between AF and CTRL for
   each gene tested using limma–voom differential expression analysis
   (“Methods”).

ATRNL1 localizes to the intercalated disk in cardiomyocytes

   Attractin Like 1 (ATRNL1) is highly conserved transmembrane
   glycoprotein postulated to function in cell adhesion, cell signaling,
   and neuronal energy homeostasis. To our knowledge, only a single
   patient with an ATRNL1 mutation has been described, and this patient
   was heterozygous for a 325 kb microdeletion (10q25.3) that included
   ATRNL1^[94]18. The patient presented with a dysmorphic facial
   appearance, ventricular septal defect, borderline microcephaly, global
   developmental delay with autistic features and mild ataxia, suggesting
   that the observed phenotype is the result of haploinsufficiency of
   ATRNL1^[95]18. Interestingly, ATRNL1 knockout mice have no such
   phenotype^[96]19. Thus, a clear role for ATRNL1 in cardiac development
   and physiology has yet to be understood. We evaluated the localization
   of ATRNL1 with tissue immunofluorescence in fresh frozen LA samples
   obtained from patients included in our snRNA-seq analysis (Fig. [97]3a,
   and Supplementary Fig. [98]3a). We found that ATRNL1 was expressed
   primarily in α-actinin^+ CMs. Interestingly, ATRNL1 localized strongly
   to the longitudinal borders of CMs at the intercalated disc. To confirm
   this subcellular localization, we co-stained the same LA tissues with
   the intercalated disc component connexin 43 (Cx43/GJA1) (Fig. [99]3b,
   Supplementary Fig. [100]3b) and found that ATRNL1 and Cx43 both
   localize at intercalated disks in human LA tissue samples (Pearson’s
   correlation coefficient = 0.87). Further, we confirmed the expression
   and co-localization of ATRNL1 with Cx43 in hESC-aCMs (Supplementary
   Fig. [101]3c). To assess the expression of ATRNL1 across different
   anatomical regions of the human heart we investigated the Human Heart
   Atlas^[102]20. We found that ATRNL1 was expressed in both ventricular
   and atrial cardiomyocytes (Supplementary Fig. [103]4a, [104]b).
   Further, we found that ATRNL1 was expressed in pacemaker and epicardial
   cells (mesothelial). To confirm the presence of ATRNL1 in ventricular
   cardiomyocytes we performed immunofluorescence staining in non-failing
   LV samples and observed that ATRNL1 was localized at the intercalated
   disk in ventricular cardiomyocytes of the LV (Supplementary
   Fig. [105]4c).

Fig. 3. Subcellular localization of ATRNL1 in human left atrial tissue.

   [106]Fig. 3
   [107]Open in a new tab

   a Tissue immunohistochemistry for nuclei/DAPI (blue), α-actinin (red),
   and ATRNL1 (green) in CTRL (n = 3) and AF (n = 3) LA tissue. A total of
   4 images were evaluated per tissue section. b Tissue
   immunohistochemistry for nuclei/DAPI (blue), Cx43 (red), and ATRNL1
   (green) in CTRL (n = 3) and AF (n = 3) LA tissue. Scale bars = 100 μm.
   A total of 4 images were evaluated per tissue section.

Cardiomyocytes are dosage-sensitive to ATRNL1

   In AF patients we found that ATRNL1 was significantly higher expressed
   than in control LA tissues (Supplementary Fig. [108]5a). To
   comprehensively characterize the molecular role of ATRNL1 in CMs we
   elected to both knockdown its expression with siRNAs, as well as
   overexpress it with a lentiviral cDNA approach (Fig. [109]4a). While
   overexpression is more relevant for AF (Fig. [110]4d), the knockdown
   experiments were conducted to aid our analysis of the molecular
   function of ATRNL1 in CMs. First, we evaluated the expression of ATRNL1
   in human tissues with data derived from the Genotype-Tissue Expression
   (GTEx) repository and identified two predominant isoforms of ATRNL1, a
   long isoform and a short isoform (Supplementary Fig. [111]5b). The
   predominant isoform in the brain is the long isoform, while the testis
   almost exclusively express the short isoform. Interestingly, the heart
   appears to express appreciable levels of both isoforms, though with a
   bias towards the short ATRNL1 transcript. We designed an siRNA to
   target the long isoform (siRNA-1), and another to target both the long
   and the short isoforms (siRNA-2).

Fig. 4. ATRNL1 is part of the CM cell stress response.

   [112]Fig. 4
   [113]Open in a new tab

   a Experimental overview for RNA-seq. Created partly in BioRender. Hill,
   M. (2021) BioRender.com/e34r263. b Correlation heatmap of RNA-seq
   sample distances. Euclidean distance between samples based on DESeq2
   normalized gene expression (“Methods”). Individual libraries are
   annotated type of treatment and condition. We profiled 3 replicates for
   each condition (total n = 15). c Volcano plot displaying the
   differentially expressed genes in hESC-aCMs between siRNA-1 treated
   cells and Scramble siRNA control treated cells. Genes highlighted in
   yellow are significantly differentially expressed (adjusted
   p-value < 0.05, and Log2 Fold Change > 0.5). d Volcano plot displaying
   the differentially expressed genes in hESC-aCMs between siRNA-2 treated
   cells and Scramble siRNA control treated cells. Genes highlighted in
   green are significantly differentially expressed (adjusted
   p-value < 0.05, and Log2 Fold Change >  0.5). e Volcano plot displaying
   the differentially expressed genes in hESC-aCMs between ATRNL1-OE
   treated cells and control GFP lentiviral treated cells. Genes
   highlighted in orange are significantly differentially expressed
   (adjusted p-value < 0.01, and Log2 Fold Change > 1.0). f Scatterplot of
   all combined significantly differentially expressed genes compared
   across siRNA-1 treated cells and siRNA-2 treated cells. Individual
   genes are colored by their location on the Cartesian coordinate plane,
   separating them into quadrants. Representative genes are highlighted. g
   Enrichment map for gene pathway over-representation analysis colored by
   quadrant from Fig. [114]5f. The size of each dot represents the number
   of genes in that pathway category. h Scatterplot of all combined
   significantly differentially expressed genes compared across siRNA-2
   treated cells and ATRNL1-OE cells. Individual genes are colored by
   their location on the Cartesian coordinate plane, separating them into
   quadrants. Representative genes are highlighted. i Heatmap for pathway
   enrichment analysis colored by significance of enrichment (2-sided
   P-value). Quadrant are annotated and colored according to Fig. [115]5h.
   Volcano plots show log fold change and two-sided P-value expression
   changes between AF and CTRL for each gene tested using DEseq2
   differential expression analysis (“Methods”).

   For the overexpression we chose to focus on the short isoform
   (ATRNL1-OE), due to the size restrictions of lentiviral vectors. We
   manipulated ATRNL1 expression with these reagents in hESC-aCMs and then
   performed total RNA-seq (Fig. [116]4b). Next, we performed differential
   expression analysis and found that for both siRNAs we achieved a
   significant knockdown of ATRNL1 expression (adjusted p-value < 0.05,
   log2 fold change cutoff = 0.5)(Fig. [117]4c, d, and Supplementary
   Data [118]5–[119]6). Further, both siRNAs lead to decreased expression
   of similar genes including, THBS1, a stress- and injury-induced
   matricellular protein known to mediate cellular attachment^[120]21.
   Overexpression of ATRNL1 in hESC-aCMs caused the downregulation of
   proliferation-related genes like CENPU, and the upregulation of VEGFA,
   and the voltage gated calcium channel gene CACNA1I (adjusted
   p-value < 0.01, log2 fold change cutoff = 0.5)(Fig. [121]4e, and
   Supplementary Data [122]7).

   To determine if the rarer long isoform had a distinct function in
   hESC-aCMs, we examined the effect of siRNA-1 (targeting the
   long-isoform) and siRNA-2 (targeting the short and long isoforms)
   (Fig. [123]5f). We selected all genes that were significantly
   differentially expressed in at least one siRNA treatment condition, and
   then compared the directionality of expression across both experiments.
   Many genes displayed concordant directionality for both siRNAs, with
   451 genes being downregulated, and 998 being upregulated in both
   conditions (Supplementary Data [124]8). We explored the transcriptional
   characteristics of ATRNL1 deficient iPS-CMS by performing pathway
   enrichment analysis on these differentially expressed genes
   (Fig. [125]4g). The genes that are upregulated upon depletion of ATRNL1
   (quadrant I) are associated most with cardiac conduction, connexons
   trafficking, potassium channels, calcium signaling, adherens junctions,
   and glycosylation. Genes that are down regulated when ATRNL1 was
   knocked down (quadrant III) are associated with cholesterol
   biosynthesis, the unfolded protein response (UPR), endoplasmic
   reticulum (ER) stress, and protein transport. Overall, these results
   suggest that, at least in hESC-aCMs, both isoforms function similarly
   to actively reduce cell stress and regulate levels of key
   conduction-related pathways in CMs.

Fig. 5. Modulation of ATRNL1 alters the atrial action potential duration.

   [126]Fig. 5
   [127]Open in a new tab

   a Representative action potentials recorded by optical mapping of
   ArcLight fluorescence in hESC-aCMs transfected with scrambled siRNA
   (CTRL) or siRNA-2 targeting ATRNL1. b Bar plot displaying the
   mean ± SEM of action potential durations at 80% repolarization (APD80,
   ms) in CTRL and ATRNL1 siRNA-2 treated hESC-aCMs. N = 29 in CTRL and
   n = 20 in ATRNL1 siRNA-2, N is the total number of wells from three
   independent biological replicates. **p = 0.0092, CTRL vs. siRNA-2,
   two-tailed Mann-Whitney test. c Representative tracings of action
   potentials in hESC-aCMs transduced with lentivirus carrying an empty
   vector backbone (CTRL), or an ATRNL1 overexpression construct
   (ATRNL1-OE). d Bar plot displaying the mean ± SEM of APD80 in hESC-aCMs
   overexpressing the short isoform of ATRNL1 or CTRL. N = 26 in CTRL and
   N = 23 in ATRNL1-OE from two biological replicates. ****p < 0.0001,
   CTRL vs. ATRNL1-OE, two-tailed Mann-Whitney test.

   We next integrated the results of the knockdown experiments with the
   ATRNL1-OE dataset (Fig. [128]4h, and Supplementary Fig. [129]5c). We
   selected all genes that were significantly differentially expressed in
   at least one condition, the siRNA-2 and/or ATRNL1-OE experiment, and
   then compared the directionality of expression across both experiments
   (Supplementary Data [130]9). First, we wanted to identify genes that
   displayed concordance between experiments. Genes and pathways that are
   activated by ATRNL1, directly or indirectly, should be decreased in the
   knockdown and upregulated with the overexpression of ATRNL1 (quadrant
   II). We identified 450 such genes that fit this description. For
   genes/pathways negatively impacted or inhibited by ATRNL1, they should
   be upregulated after its depletion and strongly downregulated with
   ATRNL1 overexpression (quadrant IV). There were 690 genes negatively
   impacted by ATRNL1 expression. The discordantly regulated genes in
   quadrants I and III were abundant, with 775 and 339 gene members,
   respectively. Such large numbers highlight the dose sensitivity of CMs
   to ATRNL1 levels and may also be an indication of its role within an
   intricate feedback loop or represent noise within our experimental
   system. Pathway enrichment analysis was next performed on all
   concordant and discordant quadrants (Fig. [131]4i). We found that
   ATRNL1 positively regulates, directly or indirectly, pathways and gene
   sets related to MTORC1 signaling, hypoxia, cell stress, ER stress,
   glucose metabolism, protein secretion/transport, and ion transport
   (quadrant II). Conversely, ATRNL1 inhibits, directly or indirectly,
   genes and pathways related to cell cycle progression, cell junction
   organization, endocytosis, glycosylation, calcium signaling, and
   synaptic transmission (quadrant IV).

   The large number of gene expression changes observed following the
   manipulation of ATRNL1 levels prompted us to look at more diverse forms
   of gene regulation. Interestingly, a circular form of the ATRNL1
   transcript has been previously described in several cancer
   studies^[132]22–[133]25. We re-analyzed our total RNA-seq data from
   hESC-aCMs to detect circular RNAs. We identified several circular RNA
   isoforms of ATRNL1 (circATRNL1) (Supplementary Fig. [134]6a). The most
   reliably detectable circATRNL1 species spanned from exon 2 through exon
   8 (Supplementary Fig. [135]6b). This circular RNA was not detectable in
   2 out of our 3 siRNA-2 libraries, suggesting that the siRNA is also
   capable of targeting this molecule. In cancer cicrATRNL1 is thought to
   act as a microRNA (miRNA) sponge^[136]25. So, we next scanned the
   circATRNL1 transcript for putative miRNA binding sites. We restricted
   this analysis to miRNAs conserved across mammals, giving us a total of
   231 miRNAs to analyze. We filtered all hits by imposing a dissociation
   constant (log(Kd)) cutoff of -4, and were left with 20 putative miRNAs
   (Supplementary Data [137]10). Next, we sought to determine the gene
   targets of these microRNAs. For miRNA target-enrichment we incorporated
   the results from 2 well-known predictive models, TargetScan^[138]26 and
   miRTarBase (Supplementary Fig. [139]6c)^[140]27,[141]28. We then
   intersected these results with our combined differentially expressed
   genes (Fig. [142]4h) and identified a total of 98 genes (Supplementary
   Fig. [143]6d, and Supplementary Data [144]11). Next, we wanted to
   determine which other putative gene networks could be regulated by
   circATRNL1 in CMs, so we performed pathway enrichment analysis on the
   98 genes that intersected our differential expression data
   (Supplementary Fig. [145]6e). Consistent with circATRNL1s suggested
   role as a tumor suppressor, cell cycle was among the most enriched
   pathways. Further, we found genes enriched in pathways/processes
   related to MTORC1 signaling, hypoxia, cell-cell adhesion, TGFB-, and
   WNT-signaling.

ATRNL1 levels effect atrial cardiomyocyte electrophysiology

   Our transcriptomics data indicated that elevated levels of ATRNL1
   promotes the expression of many genes with roles in cardiac conduction
   and calcium signaling (Fig. [146]4). Changes in expression of such
   genes could lead to changes of the electrophysiological properties of
   CMs and promote arrhythmias like AF. Further, reduced expression of
   intercalated disc components can lead to the misexpression of others,
   which can lead to diminished cell-cell coupling and arrhythmias.
   Previously, it was found that with reduced N-cadherin levels (an
   important intercalated disc protein component) you also observe reduced
   Cx43/GJA1 expression which predisposes animals to arrhythmias^[147]29.
   Interestingly, we also observed significant changes in Cx43/GJA1
   expression in our ATRNL1 knockdown experiments (Fig. [148]4d). Thus, we
   next wanted to determine if ATRNL1 plays a functional role in CM
   electrophysiology. For these experiments we wanted to phenotype
   hESC-aCMs through measuring their action potentials. The cardiac action
   potential duration (APD) determines the refractory period of the heart
   and if it becomes too short or too long, then the result is an
   arrhythmia^[149]30. Importantly, APD is focused on so heavily because
   this parameter is essential for modeling inherited arrhythmogenic
   syndromes (short and long QT syndromes) and for the pharmacologic field
   of cardiac safety^[150]31. We transduced hESC-aCMs with a lentivirus
   carrying a voltage-sensitive fluorescent protein, ArchLight^[151]32. We
   then manipulated the levels of ATRNL1 expression in these cells
   (Supplementary Fig. [152]7a, b), recorded their action potentials with
   a high-speed fluorescence imager, and finally calculated their APD at
   80% repolarization (APD80). Knocking down ATRNL1 with siRNA-2
   significantly prolonged the APD (Fig. [153]5a, b). Conversely,
   overexpressing ATRNL1 significantly shortened the APD of hESC-aCMs
   (Fig. [154]5c, d). Overall, these results suggest that the manipulation
   of ATRNL1 in atrial CMs is arrhythmogenic.

snRNA-seq and AF GWAS uncover unexpected expression of KCNN3

   Finally, we sought to use our large scale snRNAseq dataset from the
   human LA to prioritize genes at the known AF genetic loci. To date,
   genome-wide association studies (GWAS) have identified more than 150
   genetic loci associated with AF. However, a limitation of GWAS is that
   they implicate a genetic locus rather than directly a gene. The term
   effector gene is typically used to refer to the causal gene at a GWAS
   locus, that is driving the common variant signal observed in the study.
   Finding the effector gene at each locus remains challenging. Typically,
   gene prioritization approaches are applied to identify a subset of
   candidate effector genes for follow-up. One available gene
   prioritization method is the intersection with gene expression in
   relevant tissues and cell types. Since the LA is a highly relevant
   tissue for AF, our dataset serves as a rich resource to perform such
   analyzes.

   Overall, we identified 1379 marker genes from our snRNA-seq dataset
   (Supplementary Data [155]3) with roughly a quarter of the marker genes
   (24.3%) arising from CMs. We then sought to intersect these markers
   with the genes at the genetic loci of a previously published AF GWAS
   study^[156]33 (Fig. [157]6a). At each GWAS locus (defined as 1 mega
   base (Mb) around the sentinel variant) there can be many effector gene
   candidates. And, at the 104 AF GWAS loci we found a total of 1793
   candidate genes. Importantly, some of these loci contained over 60
   candidate genes, emphasizing the pressing need for prioritization
   (Fig. [158]6b). By intersecting the GWAS candidate genes with the
   marker genes for each cell type of our dataset, we were able to
   highlight 1 gene, in most cases. The rare cases included 6 loci each
   with 2 prioritized candidates, and 3 additional loci each with 3 gene
   hits. In sum, we found at least 1 candidate effector gene using this
   approach for 52 of the 104 AF GWAS loci. The majority of the 59
   candidate effector genes are CM marker genes (N = 31), and up to 5 each
   are coming from the other cell types (Fig. [159]6c). A few genes are
   marker genes in multiple cell types, namely REEP1 (CMs and smooth
   muscle cells(SMCs)), CD96 (Adipocytes, and T-cells), RNF144B
   (endothelial cells (EC-1, EC2, and macrophages (MΦ)), AKAP6 (CMs, and
   SMCs), and MYOCD (CMs, and SMCs). We also identified several known
   candidates for AF that have been previously described, including
   TTN^[160]34, ZFHX3^[161]35, TBX5^[162]36 and KCNIP2, which supports the
   validity of our approach. Next, we evaluated the expression of all
   candidate AF loci across the cell states found in the Human Heart Cell
   Atlas (Supplementary Fig. [163]8a)^[164]20. Our data matched the Heart
   Cell Atlas cell state annotations well. The CM AF loci were expressed
   across the different subtypes of atrial and ventricular CMs, as well as
   pacemaker cells, and Purkinje cells. Some candidate genes did display
   differential expression among the CM clusters, including known
   atrial-enriched transcripts like TBX5 and MYL4^[165]37. Further, with
   the large enrichment of immune cells in the Human Heart Atlas dataset
   we found improved resolution among our immune cell-enriched candidate
   AF loci. For example, the macrophage marker gene RNF144B exhibited high
   expression in neutrophils. RNF144B is an E3 ubiquitin ligase known to
   promote lipopolysaccharide-inducible IL-1b expression and inflammasome
   priming in human macrophages. The interconnection of inflammasome
   activation and AF have been recently established^[166]38. Overall, the
   Human Heart Atlas data agrees with our cell type annotations and
   provides improved cell-type-specific expression information for the
   identified AF GWAS loci. Additionally, the gene KCNN2 emerged as a
   candidate effector gene at the genetic locus 5q22.3 (sentinel variant
   rs716845).

Fig. 6. Integration of AF snRNA-seq data with a multi-ancestry genome-wide
association study for AF.

   [167]Fig. 6
   [168]Open in a new tab

   a Manhattan plot for Roselli et al. AF GWAS main results^[169]17. The
   dotted line marks the cutoff for genome-wide significance
   (P-value < 5 × 10^-8, that accounts for multiple testing across the
   genome). Highlighted in blue are the genetic loci that reached
   genome-wide significance, which were used to select the region (1 mega
   base (MB) around sentinel variants) and identify overlapping genes for
   the integration with snRNA-seq data. The y-axis of the plot has a break
   between -log10(P-value) of 30 and 510 to for visibility. The
   significance level accounts for multiple testing of independent
   variants with using a Bonferroni correction. P values (two-sided) were
   derived from a meta-analysis using a fixed-effects model with an
   inverse-variance weighted approach. b Stacked dotplot with one column
   for each genetic locus examined. Each dot represents a gene at the
   genetic locus, showing the range of number of genes within 1 MB of
   sentinel variants. Some loci including more than 60 genes close to the
   sentinel variant. Genes that overlap with the marker genes for the
   snRNA-seq cell types are labeled in cell-type specific colors as in
   Fig. [170]1b. Gray dots indicate genes that did not overlap. The
   cell-types color scheme is shown in the legend. c Circular plot of all
   marker genes that intersected with the AF GWAS loci, sorted by percent
   of cells expressing the gene and cell-type. Cell types are labeled in
   cell-type specific colors as in Fig. [171]1b. d Dot plot displaying
   expression for each indicated cell population. Size represents the
   percent of cells expressing each marker. e RNA-ISH images of LA tissue
   displaying expression of KCNN2 (shown in pink) and the CM markers
   PPP1R3C (top, shown in blue) and DOK7 (bottom, shown in blue). Scale
   bars = 50 μm. f RNA-ISH images of LA tissue displaying expression of
   KCNN3 (shown in pink) and the CM markers PPP1R3C (top left, shown in
   blue) and DOK7 (bottom left, shown in blue). Expression of KCNN3 (shown
   in pink) and the LEC markers CCL21 (top right, shown in blue) and RELN
   (bottom right, shown in blue). Scale bars = 50 μm.

   KCNN2 and KCNN3 encode small conductance calcium-activated potassium
   channels (SK). The channels are known to regulate cardiac excitability
   and are gated by changes in intracellular Ca2+ derived from the
   sarcoplasmic reticulum^[172]39. Importantly, SK channels are known to
   contribute to arrythmias, as it has been found that both gain- and
   loss-of-function of SK channels can increase AF
   susceptibility^[173]39–[174]42. In the search for novel
   antiarrhythmics, inhibition of small conductance calcium-activated
   potassium, or SK, channels has recently been proposed as a therapeutic
   approach. SK channels are encoded by the KCNN1 (KCa2.1; SK1), KCNN2
   (KCa2.2; SK2), and KCNN3 (KCa2.3; SK3) genes^[175]43. Both KCNN2 and
   KCNN3 have been independently associated with AF in genome-wide
   association studies^[176]33,[177]44–[178]46.

   To further characterize the expression patterns of KCNN2 and KCNN3 we
   evaluated their transcript abundance across all snRNA-seq clusters in
   comparison to other cell type-specific GWAS loci mentioned above
   (Fig. [179]2d). Interestingly, while KCNN2 was expressed highly by CMs,
   KCNN3 levels were surprisingly low in CMs. Rather, KCNN3 was expressed
   by lymphatic endothelial cells (LECs), a finding that was confirmed by
   RNA in situ hybridization (RNA-ISH) of fresh frozen LA tissue as
   follows. We began by identifying markers from the snRNA-seq data that
   were: a) specific for the cell populations of interest, CMs and LECs,
   and b) expressed at levels comparable to KCNN2 and KCNN3. The genes
   DOK7 and PPP1R3C fit these criteria and were therefore used to identify
   CMs in LA tissue. RELN and CCL21 were selected as appropriate markers
   for the LEC population. Our RNA-ISH results indicated that KCNN2
   expression is widespread in atrial tissue and often observed in cells
   simultaneously expressing the CM markers DOK7 and PP1R3C
   (Fig. [180]6e). These results support our snRNA-seq data and existing
   literature that shows widespread KCNN2 expression in the human heart,
   specifically within CMs. When KCNN3 expression was similarly assessed
   by RNA-ISH, we observed sporadic groups of cells positive for KCNN3,
   and in many cases, these KCNN3^+ cells were also co-expressing the
   LEC-specific marker genes, RELN and CCL21 (Fig. [181]6f). Performing
   RNA-ISH with probes targeting KCNN3 conjointly with the CM markers DOK7
   and PPP1R3C failed to provide any evidence of co-expression; rather,
   there was clear spatial separation of cells expressing KCNN3 and these
   CM markers (Fig. [182]6f). Together, these results demonstrated that
   KCNN3 is expressed in LECs, and not in CMs as was previously expected.

Discussion

   We present a comprehensive snRNA-seq analysis of LA tissue from
   patients with and without AF. In total, our dataset includes 34
   individuals, and we were able to generate an embedding of over 170,000
   nuclei from LA tissue. We identified a gene significantly upregulated
   in AF patients, ATRNL1. We performed immunofluorescence analysis of
   human LA tissue and hESC derived cardiomyocytes, transcriptomic
   profiling of siRNA and overexpression cells, and electrophysiological
   phenotyping to characterize the role of ATRNL1 in AF and cardiomyocyte
   physiology. Finally, we integrated our large-scale transcriptomics data
   with high-quality AF GWAS results to prioritize effector genes at
   single-cell resolution.

   In our snRNA-seq analyzes, we did not identify large changes in
   cellular composition in the LA from patients with AF compared to
   control donors. Interestingly, it’s generally acknowledged that
   fibrotic replacement of the atrial myocardium is the basis of atrial
   pathology in patients with AF^[183]47. However, we did not observe any
   obvious shifts in cardiac fibroblast (CF) composition or cellular
   status. Further, there is a high correlation of AF with increased
   amounts of epicardial adipose tissue (EAT) and subsequent infiltration
   of that adipose tissue into the myocardium^[184]48,[185]49. Adipose and
   lipid accumulation in the atrial myocardium is also thought to promote
   fibrosis and immune cell infiltration. Increased inflammation and
   fibro-fatty tissue accumulation are believed to strongly contribute to
   the development and maintenance of substrates for arrhythmias. Yet,
   from our study presented here, we did not see a dramatic change in
   tissue composition after comparing between control hearts and patients
   with AF. A potential limitation of our study would that the observed
   tissue compositional variability among the patients profiled here may
   be difficult to properly account for even with modern analytical
   frameworks (e.g., Milo and scCODA) and could require larger sample
   sizes with greater statistical power to determine the distinctions more
   accurately between patient groups. Moreover, profiling more patients at
   a greater depth (nuclei per sample) could improve our statistical power
   to accurately identify differentially expressed genes across more cell
   types^[186]50. Further studies on a broader range of AF cases
   classified from mild to severe (e.g., AF with heart failure) could help
   identify subtle changes in intercellular signaling and tissue
   composition that accompany AF.

   Interestingly, a recent scRNA-seq study conducted on patients
   undergoing open heart surgery with AF and mitral valve regurgitation
   found evidence of macrophage expansion and a commensurate decrease in
   endothelial and mural cell composition in the left atrial appendage
   compared to controls^[187]51. The flow cytometry-based cellular
   enrichment approach taken in that study may be an ideal approach for
   characterizing the changes in all non-cardiomyocytes from living
   tissue, especially rare immune cells that our nuclei-based profiling
   approach may not properly account for. Importantly, here we only
   focused on the LA, an important tissue for evaluating AF. In AF the
   right atrium (RA) is also affected, however, the cellular and molecular
   changes to the RA in AF are not well understood. The pulmonary vein
   (PV) is known to play a role in the pathogenesis of AF, and catheter
   ablation of foci in the PV is an effective treatment for
   arrhythmias^[188]52,[189]53. Future studies interrogating the single
   cell characteristics of the PV-LA junction, and the RA are also
   warranted.

   Although there has been growing interest in SK channels as therapeutic
   targets for the treatment of AF, the close homology between the SK
   channels and the lack of isoform-specific tools have presented
   considerable challenges to establishing the role of SK2 and SK3 in the
   heart at steady-state and during AF. Previous studies of SK channel
   expression in the heart have relied on qRT-PCR, Western Blot, and
   immunofluorescence analysis of isolated CM or cardiac tissue
   lysates^[190]54–[191]60. Our goal was to use snRNAseq to obtain
   additional information about the expression of KCNN2 and KCNN3 at the
   level of individual cell populations to help clarify the role of these
   channels in the adult human heart. Our snRNAseq studies demonstrated
   that while KCNN2 was readily identifiable in the CMs of the heart,
   KCNN3 expression was negligible within the CM compartment. Rather,
   KCNN3 was expressed by cells falling into what we annotated as LECs.
   These results were validated using RNA-ISH where again, KCNN2 was
   easily detectable in cells bearing common CM markers, while KCNN3 was
   found in cells co-expressing markers specific to LECs. Recently it was
   discovered that in human small resistance KCNN3 colocalizes with TRPV4
   channels in endothelial caveolae^[192]61,[193]62. The interaction of
   these proteins allows Ca^2+ entry through TRPV4 channels, initiating
   the endothelium-derived hyperpolarizing factor response and contributes
   to vasodilation and regulation of blood flow and blood
   pressure^[194]63. This interaction of KCNN3 and TRPV4 was found to be
   impaired in the small resistance arteries of patients with hypertension
   and in a mouse model of hypertension^[195]63. Importantly, among
   patients with persistent AF, hypertension is present in about 60% to
   80% of individuals^[196]64. Future studies will determine if KCNN3
   plays a role in the development of AF and may help shed light on the
   well-known epidemiological association between AF and hypertension.

   A major finding of our study was the differential expression of ATRNL1
   in CMs from patients with AF. The role of ATRNL1 in cardiovascular
   biology is unknown. However, a patient heterozygous for a deletion of
   ATRNL1 has been identified with a heart murmur and a partial
   ventricular septal defect^[197]18. Interestingly, a Lebanese patient
   with a 64 kb heterozygous deletion of exons 9-13 of ATRNL1 has recently
   been identified and this patient didn’t present any abnormal cardiac
   phenotype^[198]65. This may suggest that only the short ATRNL1 isoform
   is required for proper human cardiac development, as exon 9 is the
   first exon distinguishing the long isoform from the short, which is
   only 8 exons long. When we selectively knocked down the long isoform of
   ATRNL1, we overall observe a similar transcriptional profile to when we
   knocked down both the long and the short isoforms. The long isoform is
   distinguished from the short at the protein level via the following
   additional features; a C-type lectin-like domain, several
   plexin-semaphorin-integrin (PSI) domains, a lamin-type EGF domain, a
   transmembrane helix, and a small domain predicted to be localized in
   the cytoplasm. The short isoform lacks the predicted transmembrane
   helix domain, but contains features typically associated with
   extracellular or secreted proteins like EGF domains, a CUB domain, and
   a lectin domain. Our immunofluorescence analysis indicated that ATRNL1
   localizes to the intercalated disk and may also be localized in the
   cytoplasm. The antibody used in this study recognizes an antigen only
   found in the long ATRNL1 isoform. Interestingly, in other cell types
   ATRNL1 has been found to be localized to the nucleoplasm and
   mitochondria (proteinatlas.org)^[199]66. Future studies dissecting the
   functional roles of these two isoforms, including the protein-protein
   interactions and subcellular localization will be invaluable for
   deciphering the role of ATRNL1.

   We found that knockdown of ATRNL1 led to an upregulation of genes
   involved in cardiac conduction, and a downregulation of genes
   associated with the UPR, protein glycosylation, cholesterol
   biosynthesis. While a role has been proposed for ATRNL1 in energy
   homeostasis previously^[200]19, no other clear molecular functions have
   been ascribed. Our analysis indicated a broader function for ATRNL1 and
   implicated it as a regulator and/or sensor of various pathways
   associated with the cellular response to stress. ATRNL1 appears to
   repress cell cycle associated pathways, cell junction organization,
   calcium signaling, and genes involved in synaptic transmission. Given
   the highly pleiotropic nature of the phenotype caused by a heterozygous
   ATRNL1 deletion in humans, it could be anticipated that ATRNL1 may have
   several molecular functions. ATRNL1 appears to positively impact the
   expression of Natriuretic Peptide Receptor 3 (NPR3), which is a
   clearance receptor for natriuretic peptides and has been shown to have
   an anti-apoptotic role in CMs^[201]67. The ER-associated degradation
   (ERAD) complex member HERPUD1, was similarly affected by ATRNl1, and
   HERPUD1 also appears to play a protective role in CMs as it’s been
   identified as an inhibitor of pathologic cardiac hypertrophy^[202]68.
   We also observed a strong enrichment of genes related to MTORC1
   signaling. While MTORC1 signaling is required for heart development,
   some studies suggest that pharmacologically inhibiting MTORC1 signaling
   is cardioprotective^[203]69. Further, we found that ATRNL1 also
   activates the expression of MYC. In CMs it has been found that MYC
   regulates metabolism and mitochondrial biogenesis in response to
   pathologic stress and enhances the heart’s ability to respond to
   ischemic insults^[204]70. Similarly, ATRNL1 levels had an inhibitory
   impact on the expression of the thioredoxin interacting protein TXNIP,
   and mouse studies have found that knockout of TXNIP Deletion of
   thioredoxin-interacting protein in mice impairs mitochondrial function
   but protects the heart from ischemia-reperfusion injury^[205]71.
   Together, these findings suggest a cardioprotective role for ATRNL1. A
   limitation of our study is the use of hESC-aCMs, which are not as
   mature as adult human CMs in respect to features like sarcomere
   structure, Ca2+ kinetics, and ion channel density. Recent developments
   in 3D cardiac tissue technologies and methods have led to improvements
   in CM maturation and cardiac tissue modeling, including engineered
   heart tissue (EHT)^[206]72,[207]73. Future work evaluating the
   physiological role of ATRNL1 in CMs will need to be conducted in EHTs
   or other 3D cardiac tissue models containing more mature and
   therapeutically relevant cell populations.

   Altogether, our study has identified several findings regarding the
   cellular and molecular mechanisms underlying AF. First, we found that
   there were no significant changes in cellular composition in the LA
   from patients with AF compared to control donors. This suggests that
   the development of AF may not be due to a single cell type or
   population, but rather a complex interplay between multiple cell types,
   and potentially even cardiac regions. Second, we found that the
   expression of the gene ATRNL1 was significantly upregulated in CMs from
   AF patients. Finally, we found that knockdown of ATRNL1 in hESC-aCMs
   causes an upregulation of genes involved in cardiac conduction, and a
   downregulation of cell stress response genes and pathways. These
   findings suggest that ATRNL1 may have a cardioprotective role. However,
   we also found that elevated ATRNL1 levels cause a shortening of the
   APD, and thus may be arrhythmogenic. The dosage sensitivity of CMs to
   ATRNL1 highlights its important biological role in humans. In total,
   our findings extend our understanding of the transcriptional, cellular,
   and molecular basis of arrhythmias, results that will launch new areas
   of investigation and opportunities for the therapeutic development of
   improved AF treatments.

Methods

Human tissue samples

   Written informed consent for research use of donated tissue was
   obtained from either the patient providing tissue or the next-of-kin in
   the case of deceased organ donors. Research use of tissues were
   approved by the relevant institutional review boards at the
   Gift-of-Life Donor Program, the University of Pennsylvania,
   Massachusetts General Hospital, and the Broad Institute (IRB-5999). All
   samples originated from deceased donors with non-failing hearts that
   were rejected for organ donation. Adult left atrial tissue samples were
   collected from organ donors by Myocardial Applied Genetics Network
   (MAG-Net;) as previously described^[208]10,[209]13. Control samples
   were collected from organ donors with no previous history of heart
   disease or atrial fibrillation AF samples were collected from patients
   that had a history of atrial fibrillation but no history of other heart
   disease. Briefly, hearts were arrested in situ with at least 1 L
   ice-cold crystalloid cardioplegia solution and were transported to the
   on ice until cryopreservation (always less than 4 h). Tissue from the
   left atria was collected, flash frozen and stored at -80 until used.

Statistics and Reproducibility

   No statistical methods were used to predetermine sample size. The
   experiments were not randomized, and investigators were not blinded to
   allocation during experiments and outcome assessment.

Trichrome staining of cardiac tissue

   Trichrome staining was carried out on 10 um thick fresh frozen sections
   of LA tissue following manufacturer’s protocol (Abcam ab150686).

Immunofluorescence staining of cardiac tissue

   Immunofluorescence staining was carried out on 10 um thick fresh frozen
   sections of LA tissue. In brief, samples were fixed in 4% PFA for
   15 mins and then permeabilized with 0.1% TX-100 for 5 mins. Following
   3x PBS washes samples were blocked in 7% donkey serum for 1 h (h) at
   room temperature (RT). Sections were incubated with primary antibodies
   overnight at 4 degrees C. Sections were washed 3 x in PBS and then
   incubated with secondary antibodies and hoescht for 1 h at RT. Sections
   were washed 3 x in PBS and then mounted with prolong gold mounting
   media (Invitrogen). Images were taken at 20x on a Leica SP8 microscope.
   Antibodies used in this study; ATRNL1, rabbit, Thermo fisher PA5-58234
   (1:100), Cx43, mouse, Thermofisher 14-4759-82 (1:100), α-actinin:
   mouse: abcam ab9465 (1:200). Secondary antibodies: donkey anti-rabbit
   488 thermo, A-21206 (1:200); donkey anti-mouse 568 thermo, A10037
   (1:200). Pearson’s correlation coefficient measurements were carried
   out on images co-stained for ATNRL1 and Cx43 using the Image J plug in
   Just Another Colocalization Plugin^[210]74. Images were converted to
   16 bit and ATRNL1 and Cx43 images were analyzed. The average of all
   analyzed images was reported in the results section.

LA sample processing and snRNAseq

   Nuclei were isolated from fresh frozen LA tissue as previously
   described^[211]10,[212]13. In brief, LA tissue was sectioned into
   100 um thick slices using a cryostat. Nuclei were liberated by douncing
   in NIM2 buffer (250 mM sucrose, 25 mM KCl, 0.05% IGEPAL-630, 3 mM
   MgCl2, 1 μM DTT, 10 mM Tris pH 8.0), and debris was pelleted at 400 × g
   for 2 mins at 4 degrees C. Nuclei were filtered through 40 and 10 um
   filters and then washed with PBS wash buffer (0.01% BSA, 5 mM MgCl2,
   PBS). Nuclei were pelleted by centrifugation at 550 × g for 5 min at 4
   degrees C, washed with 6 mL of PBS wash buffer and centrifuged at
   550 × g for 5 min at 4 degrees C. Nuclei were resuspended in Nuclei
   Suspension buffer (Nuclear wash buffer + 0.4 U μl − 1 murine RNAse
   inhibitor) and then counted and 6000 nuclei were loaded into the 10x
   genomics microfluidic device (10x genomics Single cell 3′ solution v
   3). Libraries were constructed using manufacturer’s protocol with a few
   modifications. First, to promote nuclei lysis, GEMS were incubated at
   4 °C for 15 min after emulsion generation. Second, the reverse
   transcription protocol was lengthened to be 42 °C for 20 min then 53 °C
   for 120 min. Libraries were QCed and quantified using BioAnalyzer and
   qPCR. Libraries were pooled and sequenced on Illumina Novaseq S1 flow
   cells, at an average of 16 samples per flow cell.

snRNA-seq data pre-processing

   The processing pipeline implemented here followed similar steps than
   previously described^[213]10. In brief, the raw files of 36 samples
   were processed on the TERRA platform ([214]https://app.terra.bio/). The
   raw base call files (BCLs) were demultiplexed into FASTQ files using
   the mkfastq workflow from the 10x Genomics CellRanger v 4.0.0 tool.
   Homopolymer repeats (A30, C30, T30 and G30, settings:
   max_error_rate = 0.1; min_overlap = 20) and the template-switch
   oligonucleotide and its reverse complement
   (AAGCAGTGGTATCAACGCAGAGTACATGGG and CCCATGTACTCTGCGTTGATACCACTGCTT,
   settings: max_error_rate = 0.07 and min_overlap = 10) were trimmed
   using cutadapt v1.18 in Python v3.7.9. Alignment of trimmed reads was
   performed with CellRanger count workflow, to the GRCh38 pre-mRNA human
   reference (v2020-A), using the parameter expect-cells = 5000 and
   otherwise default values.

   The standard quality control output metrics from the CellRanger count
   workflow were inspected regarding percent of reads in cells, percent of
   reads confidently mapped to transcriptome, percent valid barcodes, and
   Q30 for reads (Supplementary Data [215]12). To identify low quality
   samples, we performed visual inspection of the unique molecular
   identifier (UMI) curves, the automatically created graph-based
   t-distributed stochastic neighbor embedding (t-SNE) projection, and the
   Y-chromosome count distribution in comparison to the reported sex. One
   sample (LA_1512) was removed during this procedure, as it presented as
   an outlier on the metrics distribution and had an atypical UMI curve
   with insufficient ambient plateau. As an additional quality control
   step, we fingerprinted the samples against whole-genome sequencing
   (WGS) data (available for 33 of the 36 samples) to identify mismatches
   using the CrosscheckFingerprints function from the Picard tool. The
   generation and processing of the WGS data has been previously
   described^[216]10. In brief, Whole-genome sequencing was performed
   targeting 30x coverage at the Broad Institute of Harvard and the
   Massachusetts Institute of Technology using the Illumina NovaSeq
   platform. Reads were aligned to the GRCh38 reference using
   BWA-MEM^[217]75. Variants were called using GATK HaplotypeCaller
   (v3.5.054)^[218]76. We removed one sample (LA_1344) with a mismatch
   between the WGS data and the genotypes extracted from the single nuclei
   RNA-sequencing. The baseline data for the 34 samples included in the
   final analysis can be found in Supplementary Data [219]1.

   For a sensitivity analysis all samples that passed QC were processed
   with the remove-background workflow from the CellBender v0.2
   tool^[220]77. The following parameters were used: --expected-cells
   5000, --z-dim 100, --total-droplets-included 25000, --epochs 150,
   --learning_rate 1e-4, and --fpr 0.01. For the sample LA_1440 the
   parameters were adjusted to achieve training convergence:
   --learning_rate 2e-5, --expected_cells 10000, --total_droplets_included
   30000.

Construction of nuclei map

   We generated a nuclei map using scanpy 1.7.0. We log-normalized the
   count data using scanpy.pp.normalize_per_cell(1e4) and
   scanpy.pp.log1p() and calculated highly variable genes with
   scanpy.pp.highly_variable_genes. Afterwards scVI^[221]14 was applied
   for sample alignment by including the patient of origin as a batch
   indicator, and a nuclei map was constructed using
   scanpy.pp.neighbors(n_neighbors=10, n_pcs=50,
   use_rep = ‘X_scvi’,metric = ‘cosine’). Then clustering was performed
   with scanpy.tl.leiden(resolution=0.75).

   The following 7 quality control (QC) metrics were calculated for each
   nucleus. We used the tool scanpy 1.7.0 to calculate several QC metrics
   unless otherwise indicated. 1) exon_ratio: the tool scR-Invex
   ([222]https://github.com/getzlab/scrinvex) was used to assign reads to
   exons, introns, or both. The total proportion of exonic reads was then
   calculated for each droplet. A larger proportion of exonic reads could
   be indicative of cytoplasmic contamination in the droplet. 2)
   percent_mito: the percentage of mitochondrial reads out of all reads
   was calculated. 3) doublet_scores: The tool Scrublet^[223]78 was used
   to calculate a double score that can detect doublet nuclei in the data.
   4) log_n_genes: The number of unique genes per nucleus were computed
   with scanpy.pp.filter_cells(min_genes=0) and logarithmized with
   numpy.log1p(). 5) log_ncount: The number of UMI counts per nucleus was
   calculated and logarithmized with numpy.log1p(). 6) entropy: The
   entropy per nucleus based on CellRanger counts was calculated with
   Bayesian entropy estimation ([224]https://pypi.org/project/ndd/1.6.3/).
   7) loggene_entropy: The entropy value was multiplied by the
   logarithmized number of genes. We first examined each cluster for
   outliers. One cluster was identified as highly enriched with
   contaminated nuclei based on a high proportion of reads mapping to
   exons and high mitochondrial gene counts, a second cluster was
   identified as enriched for doublets though high doublet score and high
   number of genes. All nuclei (N = 28,040) from both clusters were
   removed from the dataset (Supplemental Fig. [225]1). For all remaining
   clusters, nuclei were removed if a QC value was an outlier as defined
   by the interquartile range (IQR) = 1.5*IQR. Cutoffs were determined by
   cluster to account for inherent differences in transcriptional
   complexity between cell types. Then we reran scanpy.tl.leiden
   (resolution=0.5) and performed additional nuclei QC within the cell
   type clusters obtained at this resolution. To identify low quality or
   misclassified nuclei, the 7 nuclei QC metrics listed above were
   inspected, along with a marker gene-based score for each major cell
   type, to identify misclassifications. These marker gene-based scores
   were generated by performing a differential expression test based on
   the log normalized expression matrix across the clusters using
   scanpy.tl.rank_genes_groups(method = ‘wilcoxon’). Then the area under
   the curve (AUC) per gene and cluster was calculated. The top 50
   differentially expressed genes per cluster (by highest AUC) were
   selected to generate a cell type score for each nucleus using
   scanpy.tl.score_genes(). To identify subpopulations that may represent
   misclassified or low quality nuclei that were not removed in our
   previous QC, we iteratively subclustered each cluster, stepwise
   increasing the resolution by 0.1 until no more subcluster with marker
   genes with an AUC greater than 0.6 emerged. Then the distribution of
   all the nuclei QC metrics was inspected visually (via boxplots) and
   outlier sub-clusters were identified. A total of 8059 nuclei were
   marked for exclusion during this QC step. The final nuclei map
   contained 179,697 nuclei. Uniform Manifold Approximation and Projection
   (UMAP) plots of the nuclei map were created with scanpy.tl.umap
   (min_dist=0.2).

Identification of cell-types and marker genes

   Marker genes for the top 13 most abundant cell types were calculated
   with the differential expression method limma-voom using the R-package
   limma v3.40.6. First, the expression counts were summarized by cluster
   and sample, including each observation with at least 50 nuclei per
   sample/cluster combination. For each cluster, genes were filtered using
   the algorithm implemented in filterByExpr from the R-package edgeR
   v3.26.8^[226]79 to retain genes with sufficiently large counts with
   default settings. The data was then normalized using DESeq2
   v1.24.0^[227]80 normalization. Differential expression was then
   calculated using the limma-voom pipeline^[228]81 with the model
   ‘~0 + cluster + individual’, followed by extraction of the contrast
   comparing expression in each cluster versus all other clusters. Marker
   genes were selected if the AUC was > 0.6, the log fold change > 2, the
   adjusted P-value (by Benjamini Hochberg) was < 0.01 and the percentage
   of nuclei expressing the gene in the given cell-type was > 25%. For the
   2 smallest clusters there was not enough power to calculate
   differential expression with the limma-voom pipeline. Instead, the
   marker genes were calculated using the function
   scanpy.tl.rank_genes_groups(method = “Wilcoxon”) on the log normalized
   expression matrix. The marker genes were selected using the same
   criteria as listed above but based on the Wilcoxon test results. The
   final set of marker genes for each cell-type can be found in
   Supplementary Data [229]3. The marker genes were then used to perform
   pathway enrichment analyzes to confirm the cell-type of each cluster.
   The web-based tool g:Profiler^[230]82 was used to perform gene set
   enrichment analyzes based on Gene Ontology (GO) gene sets.

Differential expression analysis

   A formal differential expression model controlling for the correlation
   amongst nuclei from the same individual was performed by summing gene
   counts across all nuclei in a cluster within an individual patient and
   treating the data as a bulk RNA sequencing experiment before running
   limma-voom in a similar manner as mentioned above. The differential
   expression analysis across cases and controls was performed with the
   method limma-voom using the R-package limma v3.40.6. First, the
   expression counts were summarized by cluster and sample, including each
   observation with at least 20 nuclei per sample/cluster combination. For
   each cluster, genes were filtered using the algorithm implemented in
   filterByExpr from the R-package edgeR v3.26.8^[231]79 to retain genes
   with sufficiently large counts with default settings. The data was then
   normalized using DESeq2 v1.24.0^[232]80 normalization. Differential
   expression was then calculated using the limma-voom pipeline^[233]81
   with the model ‘~af + sex’, followed by extraction of the contrast
   comparing expression in cases versus controls. Genes were considered
   significant in the differential expression analyzes when the adjusted
   P-value (by Benjamini Hochberg) was < 0.05. For visualization of the
   differential expression results in violin plots, the filtered and
   normalized count data was converted to the unit counts per million
   (CPM) as implemented in the cpm function from the edgeR R-package.
   Since there can be notable background contamination present in single
   nuclei RNA-sequencing experiments we generated a flag that would
   identify genes that have a high probability of coming from the
   background. We followed a procedure to calculate the flag based on
   CellRanger and also CellBender counts.

   For the differential abundance analysis with Milo, we used the python
   version, MiloPy (0.1.1)^[234]15. We selected neighbors with a k = 50
   for our differential abundance testing analysis. Compositional analysis
   with scCODA was carried out in python with the PertPy package (version
   0.7.0)^[235]16. For pseudo-bulk RNA-seq analysis the python apdbulk
   ([236]https://github.com/noamteyssier/adpbulk) module was utilized to
   generate pseudo-bulk counts, and the resulting count table was then
   analyzed with DEseq2 (1.30.1)^[237]80.

Integration of snRNA-seq with AF GWAS results

   The marker genes for each cell type were intersected with results from
   an AF genome-wide association study (GWAS)^[238]33. We intersected the
   104 genetic loci (sentinel variant +/- 500 kb) with the human gene
   reference GRCh38.p13 exported from BioMart to identify all genes at the
   genetic loci from the GWAS. That gene list was then intersected with
   the marker genes from this analysis to 1) prioritize genes at GWAS loci
   and 2) highlight the most relevant cell-types for AF.

hPSC culture and ACM differentiation

   Human embryonic stem cells (hESCs, WiCell) were cultured in feeder-free
   culture on Geltrex (Invitrogen) coated flasks in a modified Essential 8
   medium containing DMEM/F12 with HEPES (Thermo Fisher, 1133057),
   540 mg/L sodium bicarbonate (Thermo Fisher, 25080094), 1% MEM
   Non-Essential Amino Acids (Invitrogen 11140) and 1x Pen-Strep
   (ThermoFisher 15140122), 8.5 mM NaCl (Sigma, S5150), 20 μg/mL insulin
   (Thermo Fisher, 12585014), 200 μM ascorbic acid (Sigma, A8960),
   10 μg/mL holo-transferrin (Sigma, T0665), 1 μg/mL heparin (Sigma,
   H3149), 14 ng/mL sodium selenite (Sigma, S9133), 20 ng/mL bFGF (Thermo
   Fisher, 13256027), 10 ng/mL activin A (R&D Systems, 338-AC-01M), and
   10 ng/mL Neuregulin-beta 1 (R&D Systems, 396-HB-050). Cells were
   incubated at 5% O[2] and 10% CO[2] at 37 °C.

   For atrial cardiomyocyte (ACM) differentiation, dissociated single cell
   PSC suspension was incubated overnight with continuous shaking at 98
   RPM to form spheroid clusters (day -1). Spheroids were collected (day
   0) and resuspended in Basal Differentiation Media (BDM) containing
   DMEM/F12 with HEPES, 540 mg/L sodium bicarbonate, 1% non-essential
   amino acids (Thermo Fisher, 11140), 10 ug/mL holo-transferrin, 200 uM
   ascorbic acid, 63 ng/mL progesterone (Sigma, P8783), and 14 ng/mL
   sodium selenite, supplemented with 50 ng/mL activin A, 10 ng/mL BMP4
   (R&D Systems, 314-BP-050), 20 ng/mL bFGF (Shenandoah, 100-28-500UG),
   with 2 μM CHIR-99021 (Selleckchem, S2924). Spheroids were replenished
   with fresh BDM to remove CHIR-99021 on day 1 and treated with 10 uM
   XAV-939 (APExBio, A1877), 1 uM IWR1-Endo (APExBio, B2306) and 1 uM
   IWP-2 (APExBio, A3512) on day 2. To promote atrial induction, 1 nM
   TTNPB (Tocris, 0761) was applied along with 100 nM retinoic acid, 1 uM
   SAG (Selleckchem, S7779) and 10 uM XAV-939 on day 3. TTNPB and RA were
   removed on days 4–5. On day 6 and onward, spheroids were maintained in
   BDM supplemented with 4 μg/ml insulin. On day 17, spheroids were
   collected, washed and dissociated with 1 mg/ml collagenase II (Thermo
   Fisher, 117010015) for one hour at 37 °C. Dissociated cardiomyocytes
   were seeded in fibronectin-coated plates and cultured in Lactate Media
   (DMEM without glucose (Thermo Fisher, 11966025), 540 mg/L sodium
   bicarbonate, 4 mM lactate (Sigma L6402), 200 uM ascorbic acid, and
   14 ng/mL sodium selenite) for 4 days to remove non-cardiomyocytes.
   Cardiomyocytes were then kept in Cardiomyocyte Maintenance Media
   (DMEM/F12, 10% FBS and 1% Pen/Strep) until day 30.

Cellular electrophysiology phenotyping

   Beating atrial cardiomyocytes at day 30 were dissociated as described
   above and seeded onto a fibronectin-coated 96-well (Greiner Bio-One) at
   55,000 cells per well densities. Cardiomyocytes were infected with
   lentiviruses carrying a genetically encoded voltage sensor – Arclight.
   After 72 h of infection, ACMs were transfected with a scrambled siRNA
   control (Life Technologies, 4390843) or a ATRNL1 siRNA using
   Lipofectamine RNAiMAX (Invitrogen 13778) according to the
   manufacturer’s instructions. After 7 days of transfection, Arclight
   fluorescence of ACM was captured at 500 frames/sec on a Vala Sciences
   IC200-KIC Imager with electrical pacing at 1 Hz. Cardiomyocytes were
   maintained in a heating chamber at 37 °C with 10% CO[2] throughout the
   recordings. The raw data were exported as sequences of TIFF files and
   transferred to an internal cloud database for further analysis. MATLAB
   (R2019b) was used to calculate durations of action potentials at 80%
   recovery to determine APD80^[239]83.

RNA extraction and qPCR

   Total RNAs from ACMs were extracted with the QIAGEN RNeasy Mini Kit
   (74106) according to the manufacturer’s instructions. cDNA was
   synthesized with the iScript kit (Bio-Rad 1708891). qPCR was performed
   with the SsoAdvanced Universal probes supermix (Bio-Rad 1725284) on a
   Bio-rad CFX384 Real-time system. TaqMan probes (Life Technologies
   4331182) for ATRNL1 and HPRT1 (Hs02800695_m1) were used for gene
   expression assay. Analysis was performed using the delta CT method
   normalized to HPRT1.

Immunohistochemistry of hESC-aCMs

   HESC derived ACMs seeded on fibronectin-coated 96-well plate (Perkin
   Elmer, 6055300) were fixed in 4% paraformaldehyde for 15 min followed
   by 3 times washing with PBS. Cells were permeabilized with 0.2%
   Triton-X100 for 15 min and blocked with 2% BSA in PBS for 1 h at RT,
   followed by incubation with primary antibodies ATRNL1 and MYOM1 (DSHB,
   B4) overnight at 4 °C. After washing three times with PBS, cells were
   incubated with fluorescent secondary antibodies donkey anti-rabbit IgG
   alexa 488 (Invitrogen A32790), donkey anti-mouse IgG alexa 568
   (Invitrogen A10037) and DAPI (Thermo Fisher, D1306) for 1 h at RT.
   Images were captured with Perkin Elmer Opera Phenix Imaging system and
   processed with Harmony. Human left atrial tissue from normal donor was
   sectioned at 10 μm and fixed in 4% PFA for 5 min, followed by washing
   with PBS, permeabilization in 0.1% Triton-X100 for 30 min and blocking
   in 3% BSA for an hour at RT. Tissue section was stained with primary
   antibodies against ATRNL1 and MYOM1 at 4 °C overnight, followed by
   washing and incubation with fluorescent secondary antibodies in 7 %
   donkey serum as described above, and mounted with Prolong Gold Antifade
   Mountant (Life Technologies, [240]P36934). Fluorescent images were
   captured with a Leica SP8 confocal microscope. Adult human myocardial
   samples were collected and research use of tissues was approved by the
   relevant institutional review boards at the Gift-of-Life Donor Program,
   the University of Pennsylvania, Massachusetts General Hospital, and the
   Broad Institute.

RNA-seq and RNA-seq analysis

   All hESC-aCMs were transfected with ATRNL1 siRNAs as described above.
   For the overexpression of ATRNL1 we packaged the short isoform
   (ENST00000609571.5) into lentiviral particles via transfection of
   HEK293T cells according to the manufacturers protocol (Dharmacon
   lentiviral packaging kit, Horizon Discovery, TLP5913). The
   overexpression lentiviral vector was obtained from GeneCopoeia
   (EX-E1118-Lv242). All hESC-aCMs were transduced with lentiviral
   particles and selected for overexpression with puromycin. Total RNA was
   extracted from atrial iPS-CMs using the Direct-zol RNA Miniprep kit
   (Zymo Research, R2051). RNA-seq libraries were generated with a
   ribosomal depletion protocol (KAPA RNA HyperPrep with RiboErase,
   KK8560). All RNA-seq libraries were sequenced on an Illumina NovaSeq.
   Sequenced reads were aligned to the human genome (GRCh38) using STAR
   (2.7.9a)^[241]84. Differential expression analysis was performed with
   DESeq2 (1.30.1)^[242]80. Pathway enrichment analysis was carried out
   with Metascape^[243]85 and ClusterProfiler (version 4.6.2)^[244]86. For
   the detection of circular RNA, we utilized the CircRNA_Finder scripts
   with STAR (2.7.9a)^[245]87. We scanned the circATRNL1 for miRNAs with
   ScanMiR (1.5.2)^[246]88. And the miRNA target enrichment analysis was
   carried out with MIENTURNet^[247]28.

Reporting summary

   Further information on research design is available in the [248]Nature
   Portfolio Reporting Summary linked to this article.

Supplementary information

   [249]Supplementary Information^ (107.8MB, pdf)
   [250]Peer Review file^ (1.6MB, pdf)
   [251]41467_2024_54296_MOESM3_ESM.pdf^ (131.6KB, pdf)

   Description of Additional Supplementary Files
   [252]Supplementary Data 1^ (10.4KB, xlsx)
   [253]Supplementary Data 2^ (15.3KB, xlsx)
   [254]Supplementary Data 3^ (292.3KB, xlsx)
   [255]Supplementary Data 4^ (12.9KB, xlsx)
   [256]Supplementary Data 5^ (284KB, xlsx)
   [257]Supplementary Data 6^ (492.9KB, xlsx)
   [258]Supplementary Data 7^ (710KB, xlsx)
   [259]Supplementary Data 8^ (200.2KB, xlsx)
   [260]Supplementary Data 9^ (298.3KB, xlsx)
   [261]Supplementary Data 10^ (10.8KB, xlsx)
   [262]Supplementary Data 11^ (49.2KB, xlsx)
   [263]Supplementary Data 12^ (15.3KB, xlsx)
   [264]Reporting Summary^ (98.9KB, pdf)

Source data

   [265]Source Data^ (1.1MB, xlsx)

Acknowledgements