Abstract

   A detailed understanding of the gene-regulatory network in ankylosing
   spondylitis (AS) is vital for elucidating the mechanisms of AS
   pathogenesis. Assaying transposase-accessible chromatin in single cell
   sequencing (scATAC-seq) is a suitable method for revealing such
   networks. Thus, scATAC-seq was applied to define the landscape of
   active regulatory DNA in AS. As a result, there was a significant
   change in the percent of CD8+ T cells in PBMCs, and 37 differentially
   accessible transcription factor (TF) motifs were identified. T cells,
   monocytes-1 and dendritic cells were found to be crucial for the IL-17
   signaling pathway and TNF signaling pathway, since they had 73
   potential target genes regulated by 8 TF motifs with decreased
   accessibility in AS. Moreover, natural killer cells were involved in AS
   by increasing the accessibility to TF motifs TEAD1 and JUN to induce
   cytokine-cytokine receptor interactions. In addition, CD4+ T cells and
   CD8+ T cells may be vital for altering host immune functions through
   increasing the accessibility of TF motifs NR1H4 and OLIG (OLIGI and
   OLIG2), respectively. These results explain clear gene regulatory
   variation in PBMCs from AS patients, providing a foundational framework
   for the study of personal regulomes and delivering insights into
   epigenetic therapy.

   Subject terms: Genetics, Immunology, Diseases, Pathogenesis

Introduction

   Ankylosing spondylitis (AS) is a chronic, inflammatory, rheumatic
   disease that predominantly affects the axial skeleton of young
   men^[42]1. Consequently, it causes characteristic inflammatory back
   pain, leads to structural and functional impairments and decreases the
   quality of life^[43]2. Various immune cells have been reported to play
   important roles in the initiation, progression and regulation of AS.
   For example, dendritic cells (DCs) are involved in Th17 immune
   responses, which are associated with manifestations of AS^[44]3.
   Natural killer (NK) cells can recognize HLA class I through the
   expression of genes named killer immunoglobulin-like receptors (KIRs).
   Interestingly, this recognition determines NK cell functions, one of
   which is recruiting other immune cells, resulting in an excessive
   immune state in AS^[45]4. T cells are able to mediate inflammatory
   responses by secreting inflammatory cytokines in AS, and B cells are
   reported to have a positive correlation with the Bath Ankylosing
   Spondylitis Disease Activity Index (BASDAI) in AS^[46]3. However, the
   number of these immune cell subsets and their roles in the pathogenesis
   of AS are still debatable. The precise etiology of AS remains unclear,
   which includes a lack of knowledge about the cell-type-specific
   regulatory program in the pathogenesis of AS.

   Recently, assaying transposase-accessible chromatin in single cell
   sequencing (scATAC-seq) was devolved to map open chromatin regions and
   identify regulatory regions^[47]5. This method is not only simple and
   sensitive but also capable of recognizing different cell types,
   including subtle and rare cell subtypes, and it can reveal
   cell-type-specific regulatory regions and explore related transcription
   factors (TFs). Moreover, some recent studies have displayed the power
   of scATAC-seq for understanding regulatory principles. For example, the
   single-cell chromatin assay has successfully mapped the regulatory
   landscape of adult mouse tissues, characterized 85 distinct chromatin
   patterns, annotated key regulators in diverse mammalian cell types and
   identified cell types underlying common human traits and
   diseases^[48]6. Another group used scATAC-seq to study the mouse
   forebrain at eight developmental stages, and they found
   cell-type-specific transcriptional regulation and provided insight into
   forebrain development^[49]7. In addition, the application of scATAC-seq
   has identified cell-type-specific cis- and trans-regulatory elements
   without using antibodies and reconstructed trajectories of cellular
   differentiation in human blood; in addition, this method has disclosed
   regulatory networks in basal cell carcinoma^[50]8.

   Notably, chromatin accessibility plays important roles in genome
   stability and gene regulation. As changes in chromatin accessibility
   patterns may alter the accessibility of key proteins to regulatory
   regions of the genome, these patterns are emerging as essential
   component of human diseases^[51]9. Using scATAC-seq to profile
   accessible chromatin of AS is a promising avenue for unveiling
   important cell subsets involved in AS pathogenesis without bias, and
   this process can explain how the chromatin regulatory elements govern
   transcription in each cell type. Thus, we take advantage of scATAC-seq
   to assess the landscape of open chromatin regions of peripheral blood
   mononuclear cells (PBMCs) from 9 AS patients and 12 healthy controls.
   Through analysis of scATAC-seq data, we performed cell identification
   without using antibodies. Then, we investigated novel and rare cell
   populations and evaluated the changes in immune cell numbers in AS
   patients compared to healthy controls. Finally, we analyzed
   cell-type-specific regulatory patterns. These results provided
   mechanistic insights into AS-associated pathogenesis.

Results

Benchmarking analysis of scATAC-seq

   PBMCs from both AS patients (AS_PBMC) and healthy controls (NC_PBMC)
   were obtained and analyzed by scATAC-seq. In our study, patients
   treated with immune suppressants such as non-steroidal
   anti-inflammatory drugs (NSAIDs), corticosteroids (CS), methotrexate
   (MTX) and sulfasalazine (SSZ) within 3 months before sample collection
   were excluded (Table [52]1). After quality control was performed,
   217,542,359 cleaned, paired-end reads from the AS_PBMC data and
   222,111,738 paired-end reads from the NC_PBMC data were obtained. A
   total of 5,988,120 peaks of DNA accessibility from AS_PBMC library and
   5,950,637 peaks from NC_PBMC library were identified. At single-cell
   resolution, approximately fourfold enrichment of fragments proximal to
   TSSs (relative to distal regions) from the AS_PBMC data were observed,
   and a similarly assessed fivefold enrichment from the NC_PBMC data were
   observed; these data reflect a high fraction of fragments captured
   within open rather than closed chromatin. The AS_PBMC library was
   sequenced at an average depth of 26,084 raw reads per cell, generating
   chromatin accessibility profiles for 8340 cells with a median of 6190
   unique fragments per cell. Meanwhile, the NC_PBMC library contained
   26,464 raw reads per cell, producing profiles for 8393 cells with a
   median of 5344 unique fragments per cell. In the AS_PBMC library,
   19.12% of the fragments were assigned to TSSs, 13.01% were mapped to
   enhancer regions, 11.77% were linked to promoter regions, 22.36% were
   in nucleosome-free regions, 70.60% were mapped to a single nucleosome,
   and the fraction of total read pairs mapped confidently to the genome
   was 86.59%. In the NC_PBMC library, the related percentages were
   21.57%, 13.25%, 14.29%, 23.71%, 69.04% and 86.77%, respectively.

Table 1.

   Clinical features of patients with AS and NC studied for scATAC-seq
   experiments.
   Clinical characteristic (number of samples) AS (n = 9)  HC ( n = 12)
   Age (years)*                                37 ± 6      34 ± 9
   Sex, Female/Male                            7/2         6/6
   ESR(mm/hr)*                                 35 ± 38     NA
   CRP(mg/l)*                                  18 ± 21     NA
   Inflammatory back pain                      8/9         NA
   HLA-B27 positivity                          7/9         NA
   Schober test (cm)*                          1.5 ± 1.4   NA
   Hand to floor distance (cm)*                16.5 ± 16.5 NA
   Chest expansion (cm)*                       2.6 ± 2.0   NA
   [53]Open in a new tab

   AS, ankylosing spondylitis; NC, healthy controls; PBMC, Peripheral
   blood mononuclear cells; NA, not applicable; ESR, Erythrocyte
   sedimentation rate; CRP, C-reactive protein; BASDAI, Bath AS disease
   activity index.

   *Mean ± standard deviation.

Cell-type-specific clustering of PBMCs

   Clustering was performed with Cell Ranger ATAC pipeline, and 9 clusters
   were obtained for both AS_PBMC and NC_PBMC libraries after tSNE
   analysis. Following aggregating reads from all cells within a cluster
   to form ‘pseudobulk’ accessibility profiles and examining chromatin
   accessibility at known marker loci, 7 different functional cell types
   were recognized and annotated: NK cells (cluster 3), monocytes (cluster
   4), memory CD4+ T cells (cluster 5 and cluster 8), CD8+ T cells
   (cluster 6), B cells (cluster 7) and DCs (cluster 9) (Supplementary
   Fig. [54]S1). Meanwhile, none of the marker loci were significantly
   located in cluster 1 and cluster 2, and differential TF motifs between
   AS_PBMC and NC_PBMC libraries were hardly observed in these clusters.
   Since lymphocytes, monocytes and DCs were the focus of our research,
   cells from the relevant 7 annotated clusters were selected for
   reclustering and subsequent analysis, which included 4960 cells from
   the AS_PBMC library and 4094 cells from the NC_PBMC library.

   As a result, 8 clusters were obtained and identified: NK cells (cluster
   1), monocytes-1 (cluster 2), T cells (cluster 3), CD8+ T cells (cluster
   4), CD4+ T cells (cluster 5), B cells (cluster 6), DCs (cluster 7) and
   monocytes-2 (cluster 8) (Fig. [55]1a,b). More specifically, CD4+ T
   cells were identified by CD3G, CD4, IL2RA and IL7R gene promoter
   accessibility^[56]10; CD8+ T cells were identified by CD3G, CD8A and
   CD8B gene promoter accessibility^[57]11; NK cells were identified by
   CD160, GNLY, GZMB, NKG7 and KLRF1 gene promoter accessibility^[58]12; B
   cells were identified by CD19, CD79A and MS4A1 gene promoter
   accessibility^[59]10,[60]12; monocytes were identified by CD14, CD68,
   S100A12 and FCGR1A gene promoter accessibility^[61]12; and DCs were
   identified by CD83, CD1C and IL3RA gene promoter
   accessibility^[62]10,[63]12,[64]13 (Fig. [65]1c,d). Notably, monocyte-1
   and monocyte-2 were really different, because we found that the maker
   gene promoter accessibility of CX3CR1 and CD16 was different in cluster
   2 (monocyte-1) and cluster 8 (monocyte-2), which divided monocytes into
   different subgroups (Supplementary Fig. [66]S2).

Figure 1.

   [67]Figure 1
   [68]Open in a new tab

   Cell-type-specific clustering of human PBMCs according to scATAC-seq.
   (a) Schematic of cell types in AS_PBMC group; (b) Schematic of cell
   types in NC_PBMC group; (c) Open chromatin signals for each cluster at
   several marker gene loci; (d) tSNE visualization of deviations in
   accessibility at marker gene promoters across the 8 clusters; (e)
   Heatmap representation of log twofold change in the 579 variable TF
   motifs (rows) across all scATAC-seq clusters (columns). PBMCs,
   peripheral blood mononuclear cells; scATAC-seq, assaying
   transposase-accessible chromatin in single cell sequencing; AS_PBMC,
   PBMCs from patients with ankylosing spondylitis (AS); NC_PBMC, PBMCs
   from healthy controls; NK cells, natural killer cells; TF,
   transcription factor.

   Regarding the identified fragments that overlap with the list of TF
   motifs from the Cell Ranger ATAC pipeline, the most significantly
   enriched TF motifs in each cluster (Student’s t-test, p < 0.01; fold
   change (FC) > 1.2) with no significant difference between AS_PBMC and
   NC_PBMC libraries were annotated to show cell-type specificity.
   Notably, more than 30 TF motifs in monocytes (39 and 48 in monocytes-1
   and monocytes-2, respectively) were observed; thus, only the top 10
   significantly enriched TF motifs were annotated. As a result, CEBPA,
   FOS, Nfe2l2, NEUROG2 and SPI1 were critical for monocytes, FOSL2, JUND,
   CEBPD, CEBPG and FOSL1 were enriched for monocytes-1, and OLIG (OLIG1
   and OLIG2), TEAD2, TEDA1 and BHLHE22 were enriched in monocytes-2; GCM1
   was increased in T cells, especially CD4+ T cells; CEBPB and CEBPE were
   abundant in DCs; and TBX20, TBX21, TBX2 and TBR1 were specific to NK
   cells, which was consistent with data from a previous publication^[69]8
   (Fig. [70]1e, Supplementary Table [71]S1, Supplementary Spreadsheets
   [72]S1 and Supplementary Fig. [73]S3).

Epigenomic analysis of AS_PBMC and NC_PBMC libraries

   When comparing the 8 clusters in both AS_PBMC and NC_PBMC libraries
   (Table [74]2), the cell number in cluster 4 was significantly lower in
   AS patients (CD8+ T cells, p < 0.05, FDR < 0.15) than it was in healthy
   controls (Fig. [75]2a), and a total of 37 TF motifs across the genome
   were significantly different (p < 0.05), including 27 more enriched and
   10 less active motifs in the AS_PBMC data (Fig. [76]2b and
   Supplementary Spreadsheets [77]S2). In detail, 2 different TF motifs
   (OLIG) were found in CD8+ T cells; among the remaining 7 clusters
   without significant differences in the cell number ratio, the total
   number of differential TF motifs in NK cells (TEAD1 and JUN),
   monocytes-1 (REL and RELA) and CD4+ T cells (NR1H4) was 5; meanwhile,
   19 and 15 different TF motifs were found in T cells and DCs,
   respectively; neither cell numbers nor TF motifs were changed in B
   cells and monocytes-2.

Table 2.

   Comparison of 8 clusters in both AS_PBMC and NC_PBMC libraries.
   Clusters Cell types Cell number ratio p value FDR Number of
   differential TF motifs (p < 0.05)
   AS NC
   1 NK cells 28.73 21.25 0.222 1.000 2
   2 Monocytes-1 20.42 13.68 0.205 1.000 2
   3 T cells 15.16 18.93 0.479 1.000 19
   4 CD8+ T cells 6.47 17.59 0.016 0.141 2
   5 CD4+ T cells 8.83 9.57 0.857 1.000 1
   6 B cells 7.82 6.81 0.784 1.000 0
   7 Dendritic cells 6.57 8.04 0.690 1.000 15
   8 Monocytes-2 5.99 4.13 0.548 1.000 0
   [78]Open in a new tab

   AS_PBMC, peripheral blood mononuclear cells (PBMCs) from AS patients;
   NC_PBMC, PBMCs from NC; AS, ankylosing spondylitis; NC, healthy
   controls; p value, Student’s t-test; FDR, false discovery rate; TF,
   transcription factor; NK cells, natural killer cells.

Figure 2.

   [79]Figure 2
   [80]Open in a new tab

   Epigenomic analysis of human PBMCs. (a) Percentage of cells in each
   cell type for comparison of cell number ratio in the AS_PBMC and
   NC_PBMC libraries; (b) Volcano plots of 579 TF motifs in the AS_PBMC
   library compared to NC_PBMC library; (c) Heatmap representation of
   average counts in the 37 significantly differential TF motifs (rows)
   across all scATAC-seq clusters from both AS_PBMC and NC_PBMC libraries
   (columns); (d) Venn-diagram showing distribution of 37 significantly
   differential TF motifs between the AS_PBMC and NC_PBMC libraries.
   PBMCs, peripheral blood mononuclear cells; AS_PBMC, PBMCs from patients
   with ankylosing spondylitis (AS); NC_PBMC, PBMCs from healthy controls;
   NK cells, natural killer cells; TF, transcription factor.

   Notably, several TF motifs (such as OLIG) in monocytes-2 from the
   AS_PBMC group were more accessible than those of the NC_PBMC group, but
   there were no significant differences (Fig. [81]2c). Obviously, there
   were cells sharing the same TF motifs with significant differences
   (p < 0.05) between the AS_PBMC and NC_PBMC libraries (Fig. [82]2d). In
   detail, TEAD1 and JUN were more enriched in both NK cells and T cells,
   while REL and RELA were less accessible in both monocytes-1 cells and
   DCs.

Gene regulatory network analysis of the AS_PBMC library

   According to the databases ENCODE, ITFP, Marbach 2016, TRED and TRRUST,
   12,430 target genes were found to be related to the significantly
   differential TF motifs between the AS_PBMC and NC_PBMC libraries, which
   were validated by ChIP-seq. Since no related genes were ever recorded
   for TFs of Arid3b, Bach1::Mafk, Dmbx1, Klf12, Nfe2l2, Stat5a::Stat5b,
   TEAD2 and Znf423, only 29 TFs were used for subsequent Gene Ontology
   (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis. As a
   result, , 21 TFs could regulate 11,823 target genes with more
   accessible binding sites in the AS_PBMC group, while the remaining 8
   TFs participated in the expression of 8,164 target genes with less
   accessible binding regions. Based on GO analysis, it was shown that
   7868 genes involved in AS pathogenesis had a molecular function of
   protein binding (Fig. [83]3a). With further KEGG analysis, the 20 most
   significant pathways related to both more active TF motifs
   (Fig. [84]3b) and less active TF motifs (Fig. [85]3c) were revealed,
   and the 8 less enriched TF motifs from the AS_PBMC library were found
   to play an important role in AS pathogenesis through the IL-17
   signaling pathway and TNF signaling pathway (Fig. [86]3c). Of note, 4
   less accessible TF motifs, CREB1, CREB3, ELK4 and ZNF740, were from T
   cells, and the other 4 less active TF motifs, REL, RELA, T and HMBOX1,
   were from DCs. Meanwhile, REL and RELA were also less enriched in the
   monocytes-1 (Fig. [87]3d and Supplementary Spreadsheets [88]S3).

Figure 3.

   [89]Figure 3
   [90]Open in a new tab

   Functional analysis of significantly differential TF motifs between the
   AS_PBMC and NC_PBMC libraries. (a) GO analysis of 12,430 target genes
   related to the 29 recorded significantly differential TF motifs; KEGG
   analysis revealing the 20 most significant pathways involved in (b) the
   21 more accessible TF motifs and (c) the 8 less enriched ones from the
   AS_PBMC group; (d) Boxplot showing counts of the 8 less active TF
   motifs from dendritic cells, monocytes-1 and T cells; Enriched KEGG
   analysis displaying the 5 most significant pathways related to TF
   motifs (e) OLIG (OLIGI and OLIG2), (f) NR1H4, and (g) TEAD1 and JUN. *,
   p < 0.05; **, p < 0.01, ***p < 0.001, p-value was calculated with Loupe
   Cell Browser 3.1.1 through the difference analysis part, and it was
   adjusted using the Benjimini-Hochberg correction for multiple tests.
   TF, transcription factor; PBMCs, peripheral blood mononuclear cells;
   AS_PBMC, PBMCs from patients with ankylosing spondylitis (AS); NC_PBMC,
   PBMCs from healthy controls; GO, Gene Ontology; KEGG, Kyoto
   Encyclopedia of Genes and Genomes.

   As mentioned above, the expression of CD8+ T cells was decreased in the
   AS_PBMC group, while the TF motif OLIG in this cell type was
   significantly enriched (p < 0.05). Based on the databases mentioned
   above, OLIG was found to be responsible for the transcriptional
   regulation of 78 and 139 genes, respectively. After deduplication, 142
   genes were used for KEGG analysis. 14 genes including CIITA, TAP and
   HLA-D were found to be abnormally regulated by OLIG2 and they were
   critical for dysregulated functioning of CD8+ T cells through antigen
   processing and presentation pathways (Fig. [91]3e). Noteworthy, CIITA
   could also be regulated by OLIG1 to participate in the same pathway.

   For CD4+ T cells, the TF motif NR1H4 was more accessible in the AS_PBMC
   group than it was in the NC_PBMC group. With further KEGG analysis, 791
   genes could be regulated by NR1H4. Meanwhile, 35genes including ADCY6,
   ARRB1, BRAF, CCL, CCR, CRK, CXCL, ELMO1, FOXO3, GNB5, GNG, HCK, IKBKG,
   NFKB1, PF4, PPBP, PREX1, RAP1, RELA, SOS1, SRC, TIAM1 and WAS were
   identified as able to be regulated to participate in a chemokine
   signaling pathway (Fig. [92]3f).

   Apart from T cells, monocytes and DCs, NK cells were also found to take
   part in driving AS progression by regulating 1495 genes with more
   enriched TF motifs of TEAD1 and JUN. Specifically, TEAD1 was able to
   regulate 60 genes to be associated with cytokine-cytokine receptor
   interactions and natural killer cell-mediated cytotoxicity, and JUN
   could regulate 34 genes to be involved in this pathway (Fig. [93]3g).
   Notably, TEAD1 was responsible for the regulation of 1267 genes
   according to KEGG analysis, while JUN was only involved in the
   regulation of 297 genes. At the same time, 69 genes were found to be
   regulated by both TEAD1 and JUN.

Discussion

   Ankylosing spondylitis (AS) is characterized by inflammatory low back
   pain and progressive ankyloses of spine. This definition has explained
   that immune cells play a vital role in AS pathogenesis. Since the
   immune cells circulate in the body through blood, research on PBMCs
   contributes to finding out markers for diagnosis and target therapy.
   For example, a recent study on PBMC of AS patients has mentioned that
   GM-CSF could be detected in plasma from 14/46 (30%) AS patients
   compared to 3/18 (17%) HC, and GM-CSF neutralization can be a potential
   novel therapeutic approach for the treatment of AS^[94]14. In our
   study, the landscape of active regulatory DNA in PBMCs from AS patients
   was surveyed at single-cell resolution using the sensitive method of
   scATAC-seq. Of note, it’s better if the sex ratio between AS patients
   and HCs is the same^[95]15. But we used fresh sample in our experiment
   for higher data quality, and we didn’t obtain samples with the same sex
   ratio between AS patients and HCs as expected in the hospital. Besides,
   our study focus on the difference of immune cells between AS patients
   and the healthy controls instead of cell heterogeneity in AS patients.
   In addition, there is no significant difference between the result from
   scATAC-seq of 6 healthy male controls plus 6 female healthy controls
   and the one from 7 male healthy controls plus 2 female healthy
   controls. Thus, it is acceptable to have different sex ratio between AS
   patients and HCs. Based on chromatin accessibility at known marker
   loci, cells were separated into 8 clusters by type without using
   antibodies through this method, and T cells, B cells, NK cells,
   monocytes and DCs were recognized. Notably, two subtypes of monocytes
   were found. In addition, cell-type-specific TF motifs were identified,
   such as CEBPA for monocytes, GCM1 for T cells, PRDM1 for B cells, and
   TBX20 for NK cells, which could be used to identify each cluster in the
   future.

   Since scATAC-seq was able to screen out cells that were involved in the
   pathogenesis of AS, 8 clusters in both AS_PBMC and NC_PBMC libraries
   were compared. Consequently, a significant difference in the cell
   number ratio was observed in CD8+ T cells, suggesting a possible
   mechanism for AS pathogenesis in which fewer CD8+ T cells fight against
   inflammation in the AS_PBMC group. Next, all TF motifs among the 8
   clusters were analyzed, and 37 differential TF motifs were found.
   Interestingly, differential TF motifs were hardly observed in the
   cluster with a significant differential cell number ratio between the
   AS_PBMC and NC_PBMC libraries, except that TF motif OLIG was more
   enriched in CD8+ T cells from the AS_PBMC library. Meanwhile, 19 and 15
   different TF motifs were found in T cells and DCs, respectively. This
   result may reveal that most cells take unique actions to promote
   disease: either they change cell numbers, or they adjust accessibility
   of their TF motifs. Besides, the 34 (equal to 19 + 15) differential TF
   motifs may reflect that the normal epigenomic signature of homeostasis
   for both T cells and DCs was lost in AS patients. In addition, T cells
   and DCs may contribute to AS progression by altering gene expression.
   As neither cell numbers nor TF motifs were changed in B cells and
   monocytes-2, these cells may play a role in AS pathogenesis through
   other pathways, such as posttranscriptional regulation.

   Based on GO and KEGG analysis, 12,430 target genes related to the
   significantly differential TF motifs between the AS_PBMC and NC_PBMC
   libraries were found, and 7557 genes were ready for regulation through
   different TF motifs with opposite changes. This result suggested that
   these genes may be involved in multiple regulatory networks. Moreover,
   7868 genes involved in AS pathogenesis had a molecular function of
   protein binding and the 8 less enriched TF motifs from the AS_PBMC
   library were found to play an important role in AS pathogenesis through
   the IL-17 signaling pathway and TNF signaling pathway, this was in
   agreement with previous publications^[96]16.

   IL-17 is produced mainly by T cells and is a proinflammatory cytokine.
   It is involved in AS pathogenesis either by itself or through synergy
   with other cytokines, such as TNF, to trigger the release of
   inflammatory mediators and increase the number of immune cells^[97]16.
   From our results, all 8 less active TF motifs were found to be critical
   to the IL-17 signaling pathway because of their regulation of 73 target
   genes. As the serum concentration of IL-17 was higher in AS
   patients^[98]16, our results indicated that the 8 less active TF motifs
   may contribute to higher expression of the 73 target genes, which
   resulted in a significant increase in activity of the IL-17 signaling
   pathway. In fact, the 21 more accessible TF motifs were also involved
   in the IL-17 signaling pathway, but they were far less statistically
   significant in their change than the 4 less active ones (p value: 0.04
   vs 1*10^–7). Thus, T cells, monocytes-1 and DCs performed their crucial
   role in the inflammatory response in AS patients by regulating 73
   potential target genes through decreasing the accessibility of TF
   motifs.

   Concerning the TNF signaling pathway, TNF was able to recruit
   receptor-interacting serine/threonine-protein kinase 1 (RIPK1), adaptor
   protein TNFR1-associated death domain (TRADD), and TNRF-associated
   factor 2 (TRAF2) to form complex I after binding TNFR1, which could
   trigger related phosphorylation and ubiquitination processes. Finally,
   mitogen-activated kinase (MAPK) and nuclear factor kB (NF-kB) were
   activated to produce proinflammatory effects in AS patients^[99]17. As
   a result, 87 target genes were involved in this pathway, and they were
   regulated by the same 8 TFs with low accessibility that were found in
   the IL-17 signaling pathway. For the 21 TF motifs that were more
   accessible in the AS_PBMC group than they were in the NC_PBMC group,
   only 20 TFs were able to regulate related genes and take part in the
   TNF signaling pathway. The TF of OLIG1 could be more active in CD8+ T
   cells from the AS_PBMC group than in CD8+ T cells from the NC_PBMC
   group, but it was not involved in the TNF signaling pathway. This
   result may indicate that the binding of OLIG was competitive.
   Similarly, the significance of KEGG analysis for the 20 more active TF
   motifs in the AS_PBMC group was much lower than it was for the 8 less
   accessible TF motifs (p value: 0.01 vs 8*10^–7). Since macrophages,
   including monocytes and DCs, have been reported to secrete IL-23, which
   in turn stimulates T cells to produce IL-17 in AS patients^[100]4, our
   results confirmed that T cells, monocytes-1 and DCs were very active in
   AS progression and had diverse roles, such as mediating the IL-17
   signaling pathway and TNF signaling pathway to cause inflammation from
   the perspective of gene transcriptional regulation.

   In CD8+ T cells, 14 genes were found to be abnormally regulated and
   critical for dysregulated cell functioning through antigen processing
   and presentation pathways. This result was consistent with the report
   that a particular antigen-specific subset of CD8+ T cells was involved
   in AS pathogenesis^[101]18. The TF motif NR1H4 from CD4+ T cells was
   more accessible in the AS_PBMC group than it was in the NC_PBMC group,
   and NR1H4 was able to regulate 35 genes to participate in chemokine
   signaling pathway. As CD4+ T cells mediate AS progression by producing
   chemokine receptors and cytokines^[102]19, our results suggested that
   NR1H4 from CD4+ T cells may be vital for the production of cytokines to
   cause inflammation. NK cells can recruit other immune cells to produce
   an excessive immune state by sending activating signals by their
   receptors in AS^[103]20. From our results, NK cells were involved in
   driving AS progression through regulating 60 and 34 genes by more
   enriched TF motifs of TEAD1 and JUN, respectively, which were
   associated with cytokine-cytokine receptor interaction and natural
   killer cell mediated cytotoxicity. This suggests that TEAD1 and JUN may
   be involved in recruiting other immune cells to produce an excessive
   immune state. That is, higher accessibility of TEAD1 and JUN could be a
   signature of malignant NK cells.

   In conclusion, T cells, B cells, NK cells, monocytes and DCs were
   identified without using antibodies. The CD8+ T cell number in AS
   patients was significantly lower than it was in healthy controls, and
   37 differential TF motifs associated with genes responsible for several
   inflammatory pathways were found. Further, T cells, monocytes-1 and DCs
   were found to be critical for the IL-17 signaling pathway and TNF
   signaling pathway through regulating 73 potential target genes, which
   was the result of 8 TF motifs being less active than they were in
   healthy controls. Meanwhile, NK cells were able to mediate AS
   progression by enriching the TF motifs TEAD1 and JUN to regulate 1495
   related genes and induce cytokine-cytokine receptor interactions and
   natural killer cell-mediated cytotoxicity. In addition, CD4+ T cells,
   as a subset of T cells, may be vital for altering host immune functions
   by producing cytokines, as the TF motif NR1H4 responsible for the
   chemokine signaling pathway was more accessible in the AS_PBMC group
   than it was in the NC_PBMC group. However, for CD8+ T cells, the cell
   number was lower; on the other hand, TF motif OLIG, which is involved
   in the pathway of antigen processing and presentation, was more active
   in the AS_PBMC group than they were in the NC_PBMC group. As for B
   cells, neither the cell number ratio nor its TF motifs were altered,
   which suggested that other pathways rather than regulatory elements may
   play an important role in AS pathogenesis. These findings may explain
   the core transcriptional circuitry in AS from the perspective of gene
   transcription regulation by different TF motifs from specific cells,
   which provides a basic framework to study personal regulomes, and it
   reveals insights into epigenetic therapy. In the future, differentially
   expressed genes and proteins can be identified to offer a more
   theoretical framework for precision medicine. Besides, the reseachers
   can continue the study based on our results, such as using scATAC-seq
   to analyze monocyte-1 and find out the most important cells relevant to
   the disease.

Methods

Human PBMC collection

Human subjects

   Samples were obtained with informed consent in accordance with
   protocols approved by the ethics committees of both Shenzhen People’s
   Hospital and Guangzhou Institutes of Biomedicine and Health (Guangzhou,
   China) (LL-KY-2019363). All AS subjects fulfilling the modified New
   York criteria^[104]21 (n = 9, male/female = 7/2, mean age
   37 ± 6 years), and healthy controls (n = 12, male/female = 6/6, mean
   age 34 ± 9 years) were recruited from outpatient clinics or were
   medical staff in Shenzhen People’s Hospital (Shenzhen, China). No
   patient was treated with immune suppressants within 3 months of sample
   collection.

Human PBMC collection

   Eight milliliters of venous peripheral blood was withdrawn from both
   patient and control subjects and preserved in heparin tubes. PBMCs were
   separated by adding equal proportions of Ficoll-Hypaque solution, and
   then subjected to density-gradient centrifugation (2700 g, 25 min,
   25 °C). Next, red blood cell (RBC) lysis buffer was added to remove the
   remaining RBCs. Finally, PBMCs were washed with chilled PBS, quantified
   with a cell counting plate and stored on ice for subsequent scATAC-seq
   analysis.

scATAC-seq library construction and sequencing

   All protocols for performing nuclei isolation and library construction,
   have been described previously^[105]8 and are also available here:
   [106]https://support.10xgenomics.com/single-cell-atac. The important
   details are as follows:

Nuclei isolation

   Nuclei suspensions were obtained by incubating lysis buffer (10 mM
   Tris–HCl, 3 mM MgCl2, 10 mM NaCl , 0.1% Tween-20, 0.1% Nonidet P40
   Substitute, 1% BSA) with 1,000,000 cells for 5 min on ice.

Library construction

   scATAC-seq libraries were generated according to the Chromium Single
   Cell ATAC protocol (10 × GENOMICS, [107]CG000168) as described
   previously^[108]8. Briefly, nuclei suspensions were incubated in a
   transposition mix that included a transposase and adaptor sequences,
   which fragmented the DNA in open regions of chromatin; then, barcoded
   gel beads, transposed nuclei, a master mix, and partitioning oil were
   loaded on a Chromium Chip E to generate GEMs. Next, silane magnetic
   beads were used to remove leftover biochemical reagents from the post
   GEM reaction mixture, and solid-phase reversible immobilization (SPRI)
   beads were added to eliminate unused barcodes from the samples; after
   addition of a sample index (P7) and a read 2 (Read 2N) sequence, the
   final libraries were constructed via PCR with P5 and P7 primers in
   Illumina bridge amplification. As the Illumina-ready sequencing
   libraries were produced, Illumina sequencer compatibility, sample
   indices, sequencing depth and run parameters, library loading and
   pooling were analyzed.

scATAC-seq data processing

   All protocols for data quality control, genome alignment, peak
   analysis, clustering and TF motif analysis, have been described
   previously^[109]8 and are also available here:
   [110]https://support.10xgenomics.com/single-cell-atac/software/pipeline
   s/latest/algorithms/overview. The main details are as follows:

Barcode processing

   To improve data quality, the occasional sequencing error in barcodes
   obtained from the ‘I2’ index read were fixed. In detail, if barcodes
   outside the whitelist had Hamming distance less than 2 and their
   probability of being the real barcodes (based on the abundance in the
   read data and quality value of the incorrect bases) was more than 90%,
   they were corrected to whitelist barcodes^[111]8.

Genome alignment

   Reference-based analysis was performed using the Cell Ranger ATAC
   pipeline
   ([112]https://support.10xgenomics.com/single-cell-atac/software/overvie
   w/welcome). First, the adapter and primer oligo sequences were trimmed
   off. In the current chemistry, a reverse complement of the primer
   sequence may be found in the 3′ end of a read if the read length was
   greater than that of the genomic fragment. Thus, the cutadapt
   tool^[113]22 was used to identify and trim the reverse complement.
   Then, BWA-MEM^[114]23 was applied with default parameters to align the
   trimmed read pairs that were greater than 25 bp to GRCh38.

Duplicate marking

   Duplicate reads were found by identifying groups of read pairs across
   all barcodes, where the 5′ ends of both R1 and R2 have identical
   mapping positions on the reference. Finally, the unique read pair was
   reported as a fragment in the file.

Peak analysis

   As already described^[115]8, the merged position-sorted BED file was
   used for peak calling through MACS2^[116]24. In detail, the number of
   transposition events at each base pair along the genome was first
   counted. Next, a smoothed profile of these events with a 401 bp moving
   window around each base pair and fitting a ZINBA-like mixture model was
   generated. Then, a signal threshold was set to determine whether a
   region was a peak signal or noise based on an odds ratio of 1/5.
   Finally, peaks within 500 bp of each other were to produce a
   position-sorted BED file. For each barcode, the mapped high-quality
   fragments that passed all filters were recorded, and the number of
   fragments that overlapped any peak regions was used to separate the
   signal from noise. After filtering to contain only cell barcodes, the
   matrix was used in subsequent analyses, such as dimensionality
   reduction, clustering and visualization.

Clustering and t-SNE projection

   Clustering and t-SNE projection were realized by Cell Ranger ATAC
   pipeline. To cast the data into a lower-dimensional space,
   dimensionality reduction was first performed via latent semantic
   analysis (LSA)^[117]25. The data were normalized via an
   inverse-document frequency (IDF) transform to provide greater weight to
   counts in peaks that occurred in rare barcodes. Singular value
   decomposition (SVD) was performed on this normalized matrix using IRLBA
   without scaling or centering to produce the transformed matrix in
   lower-dimensional space. Prior to clustering, normalization to depth
   was carried out by scaling each barcode data point to the unit L2-norm
   in the lower dimensional space. Specific to LSA, k-medoid clustering
   that produced 9 clusters for downstream analysis was provided. In
   addition, graph-based clustering and visualization via t-SNE were
   provided^[118]26, and the data were normalized to unit norms before
   performing graph-based clustering and t-SNE projection.

TF motif identification

   Peaks were enriched for transcription factor (TF) binding sites, and
   the presence of certain motifs was indicative of transcription factor
   activity. To identify these binding sites, the position weight matrix
   (PWM) of TF motifs was obtained from the JASPAR database^[119]27, and
   each peak was scanned using MOODS
   ([120]https://github.com/jhkorhonen/MOODS) to find the match for each
   peak and TF. The threshold p-value was 1E-7, and the background
   nucleotide frequencies were set according to the peak regions in each
   GC-enriched region. The list of motif-peak matches was unified across
   these regions, thus avoiding GC bias in the scan.

TF motif enrichment analysis

   Enrichment of motifs in the peaks was analyzed. First, the reads for a
   TF motif within a barcode were counted. Then, the ratio of this number
   to the total read number for that barcode was calculated. Next, the
   value was normalized to depth. TF motif enrichment was detected by
   z-scoring the distribution over barcodes of these proportion values. To
   perform robust analysis, a modified z-score calculation using the
   median and the scaled median absolute deviation from the median (MAD)
   was used.

Differential accessibility analysis

   To find differentially accessible motifs between groups of cells, the
   fast asymptotic beta test was performed in edgeR by Cell Ranger ATAC
   pipeline. For each cluster, the algorithm was run on that cluster
   versus all other cells, yielding a list of genes that were
   differentially expressed in that cluster relative to the rest of the
   sample (p-value < 0.5). Finally, the relative library size as the total
   cut site count for each cell divided by the median number was computed.

TF motif related genes GO enrichment analysis

   Based on the database of ENCODE^[121]28, ITFP^[122]29, TRED^[123]30 and
   TRRUST^[124]31, targeted genes related to the specific TF motifs were
   identified. TF motif-related Gene Ontology (GO) enrichment analysis was
   performed. First, all peak-related genes were mapped to GO terms in the
   Gene Ontology database ([125]https://www.geneontology.org/). Then, gene
   numbers were calculated for each term, and GO terms that were found to
   be significantly enriched when comparing peak-related genes to the
   genome background were defined by a hypergeometric test. p-values were
   calculated using the following formula:
   [MATH: <mrow><mi>P</mi><mo>=</mo><mn>1</mn><mo>-</mo><munderover><mo
   movablelimits="false">∑</mo><mrow><mi>i</mi><mo>=</mo><mn>0</mn></mrow>
   <mrow><mi>m</mi><mo>-</mo><mn>1</mn></mrow></munderover><mfrac><mrow><m
   fenced close=")"
   open="("><mrow><mtable><mtr><mtd><mi>M</mi></mtd></mtr><mtr><mtd><mrow>
   <mrow></mrow><mi>i</mi></mrow></mtd></mtr></mtable></mrow></mfenced><mf
   enced close=")"
   open="("><mrow><mtable><mtr><mtd><mrow><mi>N</mi><mo>-</mo><mi>M</mi></
   mrow></mtd></mtr><mtr><mtd><mrow><mrow></mrow><mrow><mi>n</mi><mo>-</mo
   ><mi>i</mi></mrow></mrow></mtd></mtr></mtable></mrow></mfenced></mrow><
   mfenced close=")"
   open="("><mrow><mtable><mtr><mtd><mi>N</mi></mtd></mtr><mtr><mtd><mrow>
   <mrow></mrow><mi>n</mi></mrow></mtd></mtr></mtable></mrow></mfenced></m
   frac></mrow> :MATH]

   N is the number of all genes with GO annotation; n is the number of
   peak-related genes in N; M is the number of all genes that are
   annotated to certain GO terms; and m is the number of peak-related
   genes in M. The calculated p-value was corrected by FDR, and an FDR of
   less than 0.05 was used as the threshold. GO terms meeting this
   condition were significantly enriched in peak-related genes. This
   analysis recognized the main biological functions of related genes.

TF motif related genes pathway enrichment analysis

   TF motif-related gene pathway enrichment analysis was performed using
   Kyoto Encyclopedia of Genes and Genomes (KEGG), identifying
   significantly enriched metabolic pathways and signal transduction
   pathways in peak-related genes compared to the whole genome background
   (database: [126]T01001). The p-value was calculated using the same
   equation that was used in GO analysis.
   [MATH: <mrow><mi>P</mi><mo>=</mo><mn>1</mn><mo>-</mo><munderover><mo
   movablelimits="false">∑</mo><mrow><mi>i</mi><mo>=</mo><mn>0</mn></mrow>
   <mrow><mi>m</mi><mo>-</mo><mn>1</mn></mrow></munderover><mfrac><mrow><m
   fenced close=")"
   open="("><mrow><mtable><mtr><mtd><mi>M</mi></mtd></mtr><mtr><mtd><mrow>
   <mrow></mrow><mi>i</mi></mrow></mtd></mtr></mtable></mrow></mfenced><mf
   enced close=")"
   open="("><mrow><mtable><mtr><mtd><mrow><mi>N</mi><mo>-</mo><mi>M</mi></
   mrow></mtd></mtr><mtr><mtd><mrow><mrow></mrow><mrow><mi>n</mi><mo>-</mo
   ><mi>i</mi></mrow></mrow></mtd></mtr></mtable></mrow></mfenced></mrow><
   mfenced close=")"
   open="("><mrow><mtable><mtr><mtd><mi>N</mi></mtd></mtr><mtr><mtd><mrow>
   <mrow></mrow><mi>n</mi></mrow></mtd></mtr></mtable></mrow></mfenced></m
   frac></mrow> :MATH]

   N is the number of all transcripts with KEGG annotation; n is the
   number of peak-related genes in N; M is the number of all transcripts
   annotated to specific pathways; and m is the number of peak-related
   genes in M.

   Notably, Cell Ranger ATAC software was used to perform initial data
   processing and downstream analysis as described above, while Loupe Cell
   Browser interactive visualization software was used to generate
   scATAC-seq peak profiles for cell clusters. p-value in this manuscript
   was calculated with Loupe Cell Browser 3.1.1 through the difference
   analysis part, and it was adjusted using the Benjimini-Hochberg
   correction for multiple tests.

Deep sequencing data

   CBI Gene Expression Omnibus: sequencing data are available under the
   accession number [127]GSE157595.

Supplementary information

   [128]Supplementary information.^ (1.1MB, docx)
   [129]Supplementary information.^ (19.8KB, docx)
   [130]Supplementary information.^ (221.1KB, xlsx)
   [131]Supplementary information.^ (33.6KB, xlsx)
   [132]Supplementary information.^ (58.7KB, xlsx)

Acknowledgments