Abstract
Background
Primary biliary cholangitis (PBC) is a classical autoimmune disease,
which is highly influenced by genetic determinants. Many genome-wide
association studies (GWAS) have reported that numerous genetic loci
were significantly associated with PBC susceptibility. However, the
effects of genetic determinants on liver cells and its immune
microenvironment for PBC remain unclear.
Results
We constructed a powerful computational framework to integrate GWAS
summary statistics with scRNA-seq data to uncover genetics-modulated
liver cell subpopulations for PBC. Based on our multi-omics integrative
analysis, 29 risk genes including ORMDL3, GSNK2B, and DDAH2 were
significantly associated with PBC susceptibility. By combining GWAS
summary statistics with scRNA-seq data, we found that cholangiocytes
exhibited a notable enrichment by PBC-related genetic association
signals (Permuted P < 0.05). The risk gene of ORMDL3 showed the highest
expression proportion in cholangiocytes than other liver cells
(22.38%). The ORMDL3^+ cholangiocytes have prominently higher
metabolism activity score than ORMDL3^− cholangiocytes
(P = 1.38 × 10^–15). Compared with ORMDL3^− cholangiocytes, there were
77 significantly differentially expressed genes among ORMDL3^+
cholangiocytes (FDR < 0.05), and these significant genes were
associated with autoimmune diseases-related functional terms or
pathways. The ORMDL3^+ cholangiocytes exhibited relatively high
communications with macrophage and monocyte. Compared with ORMDL3^−
cholangiocytes, the VEGF signaling pathway is specific for ORMDL3^+
cholangiocytes to interact with other cell populations.
Conclusions
To the best of our knowledge, this is the first study to integrate
genetic information with single cell sequencing data for parsing
genetics-influenced liver cells for PBC risk. We identified that
ORMDL3^+ cholangiocytes with higher metabolism activity play important
immune-modulatory roles in the etiology of PBC.
Graphical Abstract
[47]graphic file with name 12951_2021_1154_Figa_HTML.jpg
Supplementary Information
The online version contains supplementary material available at
10.1186/s12951-021-01154-2.
Keywords: Single cell sequencing analysis, GWAS, Risk genes, PBC,
ORMDL3, Liver cells
Background
Primary biliary cholangitis (PBC), which is formally known as primary
biliary cirrhosis until 2016 [[48]1], is a rare chronic cholestatic
liver disease characterized by progressive autoimmune-mediated
destruction of the small intrahepatic biliary epithelial cells [[49]2,
[50]3]. PBC patients suffering from chronic cholestasis can eventually
lead to cirrhosis and hepatic failure without effective treatments
[[51]2]. Although ursodeoxycholic acid has been used as the first-line
therapeutic agent for PBC, there exist 10–20% of PBC patients resistant
to ursodeoxycholic acid and developing to advanced-stage liver disease
[[52]2]. Previous studies [[53]4] have reported that a combination of
genetic and environmental risk factors have an important influence on
the aetiology of PBC. Hence, understanding the genetic mechanisms of
PBC is becoming a great interest, which may promote the development of
individualized therapeutic strategy for PBC.
Over the past decade, a growing number of genome-wide association
studies (GWAS) and Immunochip studies based on East Asian and European
populations have been performed to uncover the genetic susceptibility
loci associated with PBC [[54]4]. To date, more than 40 genetic loci
with numerous risk genes have been reported [[55]5–[56]10], such as
SLC19A3/CCL20, IRF8/FOXF1, NFKB1/MANBA, and PDGFB/RPL3. Nevertheless,
the GWAS approach has generally focused on examining the genetic
associations of millions of single nucleotide polymorphisms (SNPs) and
only a handful of SNPs with a genome-wide significance (P ≤ 5 × 10^–8)
are reported. There exist many common SNPs with small marginal effects
were neglected [[57]11, [58]12]. Moreover, the vast majority of
reported SNPs were mapped within non-coding genomic regions [[59]12].
It is plausible to infer that these non-coding SNPs may modulate the
expression levels of corresponding risk genes rather than change the
functions of their proteins. Thus, combination of GWAS summary
statistics and other different types of data that characterize tissue-
and cell-type-specific activity, including expression quantitative
trait loci (eQTL) [[60]11], DNase I-hypersensitive sites (DHS)
[[61]13], and histone marks [[62]14], contributes to highlight
disease-related risk genes and cell types.
With the advance of single cell sequencing techniques, researchers have
an effective avenue to discover more refined and novel cell populations
for complex diseases [[63]15]. An accruing and large number of single
cell RNA sequencing (scRNA-seq) studies on autoimmune diseases,
including rheumatoid arthritis [[64]16], inflammatory bowel disease
[[65]17], and systemic lupus erythematosus [[66]18], have been reported
to parse the heterogeneity of cellular subpopulations at unprecedented
resolution. In view of no scRNA-seq study was conducted for uncovering
human liver cell types implicated in PBC, we constructed a
computational framework to identify risk genes whose genetically
expressions associated with PBC and pinpoint cell subpopulations
implicated in the etiology of PBC.
Results
The framework for integrating single-cell transcriptomes and GWAS on PBC
In the current study, we constructed a computational framework to
unveil the cell-type specific genetic influence on the etiology of PBC
based on multiple omics datasets (Fig. [67]1 and Additional file [68]2:
Table S1). It included three primary sections: (1) combination of GWAS
summary statistics on PBC with scRNA-seq data to recapitulate the
genetics-modulated single cell landscape for PBC (Fig. [69]1a), (2)
prioritization of genetics-risk genes and pathways contributing to PBC
(Fig. [70]1b), and (3) revelation of the cell-to-cell interactions and
metabolic activities of genetics-influenced subset of liver
cholangiocytes and its immune microenvironment for PBC (Fig. [71]1c).
Fig. 1.
[72]Fig. 1
[73]Open in a new tab
The workflow of current integrative genomics analysis. a Integrating
single cell RNA-sequencing data with GWAS summary statistics on PBC
based on a regression-based polygenic model; Left panel: Two
independent single cell RNA-seq datasets based on liver and peripheral
CD4 + T cells, and one large-scale GWAS summary statistics on PBC;
Middle panel: There were 8444 parenchymal and non-parenchymal cells
obtained the transcriptional profiles based on the CellRanger analysis
pipeline and clustered by using t-Distributed Stochastic Neighbor
Embedding (tSNE) method, and we also obtained 1,124,241 SNPs from the
whole genome with P values and Beta value; Right panel: We applied a
regression-based polygenic model in RolyPoly to unveil the genetic
mapping single-cell landscape for PBC. b Prioritization of PBC-risk
genes by integrating GWAS summary data with GTEx eQTL data; In this
step, multiple bioinformatics analyses were leveraged to prioritize
PBC-risk genes, including MAMGA-based gene-level association analysis,
gene-property analysis, S-MultiXcan- and S-PrediXcan-based integrative
genomics analyses, 10^5 times of in silico permutation analysis,
differential gene expression (DGE) analysis, multidimensional scaling
(MDS) analysis, PPI network-based analysis, drug-gene interaction
analysis, and functional enrichment analysis. c Genetics-influenced
liver cell subpopulations and its immune microenvironment for PBC. We
performed comprehensive single cell sequencing-based analyses to
uncover the biological functions of ORMDL3^+ cholangiocytes and its
interacted immune cells of macrophages and monocytes. ORMDL3^+
cholangiocytes have significantly elevated metabolism activity score,
and VEGF signaling pathway has a crucial role in cellular
communications of ORMDL3^+ cholangiocytes
Identification of gene-level genetic associations for PBC
To identify the aggregated effects of SNPs in a given gene on PBC, we
performed a gene-level genetic association analysis and found that 563
genes were significantly associated with PBC (FDR ≤ 0.05, Additional
file [74]1: Fig. S1, Additional file [75]2: Table S2). For example, the
top-ranked genes of HLA-DPA1 (P = 2.69 × 10^–24), IL12A
(P = 6.63 × 10^–22), and BTNL2 (P = 1.65 × 10^–17). By using a
genome-wide pathway analysis, there were 41 KEGG pathways significantly
enriched (FDR ≤ 0.05, Additional file [76]2: Table S3 and Additional
file [77]1: Fig. S2a). Based on the MDS analysis, these 41 pathways
were grouped into five clusters, including Th1 and Th2 cell
differentiation, allograft rejection, Th17 cell differentiation, cell
adhesion molecules, and cytokine-cytokine receptor interactions
(Additional file [78]1: Fig. S2b). The top-ranked pathways, such as Th1
and Th2 cell differentiation, allograft rejection, inflammatory bowel
disease, and type I diabetes mellitus, were relevant to autoimmune
diseases.
Furthermore, by performing a gene-property analysis based on mouse
liver tissue with immune cells, we found that PBC-associated genes were
significantly enriched in several immune-related cell types (Additional
file [79]1: Fig. S3 and Additional file [80]2: Table S4), including
Cst3^+ dendritic cell (P = 5.8 × 10^–3), Trbc2^+ T cell
(P = 7.1 × 10^–3), Chil3^+ macrophage (P = 0.017), and Gzma^+ T cell
(P = 0.03), which were in line with the results in previous studies
[[81]19, [82]20].
Integrative analysis of GWAS summary statistics with eQTL data for PBC
To further highlight the functional genes whose expressions are
associated with PBC, we leveraged the S-MultiXcan software [[83]21] to
meta-analyze tissue-specific associations across 49 GTEx tissues. There
were 268 risk genes whose genetically-associated expression showing
notable associations with PBC (FDR ≤ 0.05, Fig. [84]2). Among these
significant genes, there were 52 risk genes having been documented in
the GWAS Catalog database (Additional file [85]2: Table S5). Moreover,
there was a high consistency of results between MAGMA and S-MultiXcan
analysis (232/268 = 86.6%, Additional file [86]1: Fig. S4), such as
HLA-DRB1 (P = 4.95 × 10^–69), HLA-DRB5 (P = 1.17 × 10^–42), and BTNL2
(P = 3.26 × 10^–41).
Fig. 2.
[87]Fig. 2
[88]Open in a new tab
Circus plot showing the results of S-MultiXcan integrative genomics
analysis. A circular symbol in the outer ring represents a given gene.
Color represents the statistical significance of the gene, where red
color marks significant genes with FDR ≤ 1 × 10^–8, orange color marks
significant genes with FDR is between 1 × 10^–8 and 0.001, light blue
indicates significant genes with FDR ranging from 0.001 to 0.05, and
dark blue indicates non-significant genes with FDR > 0.05
To further validate these risk genes in two PBC-relevant tissues (i.e.,
liver and blood) using the S-PrediXcan method, we found that 76 and 115
genes were significantly associated with PBC in liver and blood,
respectively (FDR ≤ 0.05, Fig. [89]3a, b and Additional file [90]2:
Tables S6, S7). In total, 29 risk genes were significantly associated
with PBC across different methods including MAGMA, S-MultiXcan, and
S-PrediXcan analyses (Fig. [91]3c). Using the Pearson correlation
analysis, we observed that significant genes from S-MultiXcan analysis
showed remarkable correlations with that from MAGMA and S-PrediXcan
analyses (P ≤ 0.05, Fig. [92]3d–f). Moreover, by performing three
independent permutation analyses, we found that the number of observed
overlapped genes between S-MultiXcan and MAGMA and S-PrediXcan were
significantly higher than random events (Empirical P < 1 × 10^–5,
Fig. [93]3g–i and Additional file [94]1: Fig. S5a–c). Overall, we
identified 29 genes contribute susceptibility to PBC (Table [95]1).
Fig. 3.
[96]Fig. 3
[97]Open in a new tab
Results from integrative genomics analyses. a, b Circus plot
demonstrates the results of S-PrediXcan integrative genomics analysis
on (a) liver tissue and (b) blood tissue. c Venn plot showing the
overlapped genes between S-MultiXcan, MAGMA, and S-PrediXcan on both
liver and blood. d–f Pearson correlation of top-ranked genes identified
from S-MultiXcan analysis with that from d MAGMA analysis, e
S-PrediXcan analysis on liver, and f S-PrediXcan analysis on blood. g–i
In silico permutation analysis of 100,000 random selections for the
overlapped genes between S-MultiXcan and g MAGMA, h S-PrediXcan on
liver, and i S-PrediXcan on blood
Table 1.
Identification of 29 significant risk genes by using integrative
genomics analysis
Gene name Z score P value (S-MultiXcan) FDR (S-MultiXcan) FDR
(S-PrediXcan on liver) FDR (S-PrediXcan on blood) FDR (MAGMA analysis
of GWAS) P value (Differential gene expression analysis) GWAS_Catalog
database or PubMed database
CSNK2B 1.40 7.17 × 10^–19 8.41 × 10^–16 3.22 × 10^–11 2.24 × 10^–6
1.67 × 10^–11 4.96 × 10^–4 (Blood), 2 × 10^–2 (CD14 + T cell) Novel
gene
LY6G5B − 7.92 2.43 × 10^–16 1.93 × 10^–13 5.52 × 10^–11 1.54 × 10^–10
1.19 × 10^–11 Non-significant Novel gene
DDAH2 − 5.57 1.39 × 10^–15 9.99 × 10^–13 2.65 × 10^–9 1.56 × 10^–11
1.69 × 10^–9 2.70 × 10^–3 (CD14 + T cell) Novel gene
LY6G5C − 7.82 2.40 × 10^–15 1.62 × 10^–12 8.64 × 10^–13 9.05 × 10^–12
1.41 × 10^–11 2.80 × 10^–2 (Blood) Novel gene
IRF5 7.11 2.62 × 10^–15 1.72 × 10^–12 4.29 × 10^–13 1.12 × 10^–12
7.18 × 10^–11 7.10 × 10^–3 (Liver) Reported gene
SOCS1 − 3.26 9.33 × 10^–13 4.16 × 10^–10 5.25 × 10^–10 7.34 × 10^–10
1.06 × 10^–8 3.50 × 10^–2 (CD14 + T cell) Reported gene
SYNGR1 − 1.30 2.83 × 10^–9 8.88 × 10^–7 7.97 × 10^–4 1.30 × 10^–7
4.13 × 10^–2 1.40 × 10^–2 (CD14 + T cell) Reported gene
C6orf48 − 4.46 9.57 × 10^–9 2.66 × 10^–6 7.83 × 10^–4 6.19 × 10^–4
4.83 × 10^–11 4.16 × 10^–6 (Blood) Novel gene
HLA-DMA − 3.99 8.87 × 10^–8 1.90 × 10^–5 1.27 × 10^–3 9.16 × 10^–4
1.62 × 10^–8 2.10 × 10^–3 (Liver) Novel gene
SMC4 − 1.34 1.00 × 10^–7 2.10 × 10^–5 9.38 × 10^–5 2.25 × 10^–6
2.07 × 10^–4 4.80 × 10^–4 (Blood) Novel gene
TCF19 4.55 1.49 × 10^–7 2.99 × 10^–5 9.46 × 10^–5 2.24 × 10^–6
1.98 × 10^–4 Non-significant Novel gene
ORMDL3 − 2.92 7.07 × 10^–7 1.22 × 10^–4 2.35 × 10^–3 7.38 × 10^–5
2.33 × 10^–6 3.5 × 10^–2 (Liver) Reported gene
KPNA4 − 1.22 1.07 × 10^–6 1.82 × 10^–4 2.10 × 10^–2 3.58 × 10^–3
4.67 × 10^–4 3.40 × 10^–3 (CD14 + T cell) Novel gene
MANBA − 1.45 1.48 × 10^–6 2.35 × 10^–4 2.46 × 10^–6 2.39 × 10^–2
8.30 × 10^–8 9.67 × 10^–5 (Blood) Reported gene
MFSD6 − 0.32 1.61 × 10^–6 2.51 × 10^–4 2.55 × 10^–2 4.57 × 10^–2
2.06 × 10^–3 3.60 × 10^–2 (Liver) Novel gene
MED1 − 2.13 1.76 × 10^–6 2.70 × 10^–4 1.65 × 10^–2 1.53 × 10^–2
4.42 × 10^–5 Non-significant Novel gene
IDUA 3.95 2.37 × 10^–6 3.47 × 10^–4 3.94 × 10^–2 3.87 × 10^–2
8.95 × 10^–4 8.10 × 10^–3 (Blood) Reported gene
DGKQ 3.24 3.86 × 10^–6 5.37 × 10^–4 4.90 × 10^–4 3.36 × 10^–4
2.35 × 10^–4 8.34 × 10^–7 (Blood) Reported gene
FCRL3 − 4.40 8.14 × 10^–6 1.07 × 10^–3 2.84 × 10^–3 2.84 × 10^–3
5.81 × 10^–4 Non-significant Reported gene
ZCRB1 3.22 9.34 × 10^–6 1.20 × 10^–3 4.83 × 10^–3 5.02 × 10^–3
1.52 × 10^–3 5.90 × 10^–3 (Liver); 2.8 × 10^–3 (CD14 + T cell); Novel
gene
AGAP5 − 4.28 1.29 × 10^–5 1.59 × 10^–3 4.50 × 10^–3 3.40 × 10^–3
3.65 × 10^–2 Non-significant Novel gene
CARM1 − 2.90 2.11 × 10^–5 2.49 × 10^–3 1.14 × 10^–3 8.92 × 10^–4
1.86 × 10^–4 2.60 × 10^–2 (Blood) Novel gene
UBE2D3 − 4.18 2.83 × 10^–5 3.23 × 10^–3 2.76 × 10^–3 5.08 × 10^–3
5.81 × 10^–4 3.40 × 10^–2 (Blood); 1.6 × 10^–4 (CD14 + T cell) Novel
gene
NAAA 4.08 6.73 × 10^–5 6.88 × 10^–3 1.25 × 10^–2 6.85 × 10^–3
7.41 × 10^–4 3.20 × 10^–2 (Liver) Reported gene
SH2B3 − 1.69 1.20 × 10^–4 1.17 × 10^–2 2.35 × 10^–2 1.87 × 10^–2
2.44 × 10^–3 1.40 × 10^–2 (Blood) Reported gene
CTSH 1.75 2.05 × 10^–4 1.84 × 10^–2 8.53 × 10^–3 3.46 × 10^–2
2.19 × 10^–5 Non-significant Novel gene
TTC34 3.56 2.82 × 10^–4 2.39 × 10^–2 5.86 × 10^–3 5.02 × 10^–3
4.98 × 10^–3 9.80 × 10^–3 (Blood) Novel gene
METTL1 2.19 3.11 × 10^–4 2.59 × 10^–2 4.95 × 10^–2 3.81 × 10^–2
1.68 × 10^–2 5.86 × 10^–7 (Blood) Novel gene
TSFM 3.45 3.14 × 10^–4 2.61 × 10^–2 4.87 × 10^–2 3.87 × 10^–2
1.54 × 10^–2 2.80 × 10^–4 (Blood); 1.5 × 10^–3 (CD14 + T cell) Novel
gene
[98]Open in a new tab
Based on three independent expression profiles of liver, blood, and
peripheral CD4 + T cells, we conducted differential gene expression
analysis for these 29 risk genes between PBC and matched control group.
We found that 23 of 29 genes (79.31%) showed significantly differential
expressions in PBC patients compared with controls (Table [99]1 and
Additional file [100]1: Figures S6–S8). The co-expression patterns
among these 29 genes in the peripheral CD4 + T cells were prominently
altered according to the PBC status (Additional file [101]1: Fig. S8a).
These results further support these identified risk genes have crucial
effects on PBC.
Functional analysis of these 29 PBC-associated risk genes
Through mining the PubMed literature and GWAS Catalog database, we
found that 10 of 29 identified risk genes have been reported to be
associated with PBC in previous GWAS studies, and there were 19 novel
genes newly identified (Table [102]1 and Additional file [103]1: Fig.
S9a). Among these novel identified genes, several genes (e.g., LY6G5B
and DDAH2) were associated with autoimmune-related diseases, including
rheumatoid arthritis [[104]22] and type 1 diabetes [[105]23]. By
conducting a PPI network enrichment analysis, we observed that these
risk genes were significantly interacted with each other in a
subnetwork (Fig. [106]4). For example, TCF19 showed shared protein
domains with IRF5 [[107]24], and DDAH2 was highly co-expressed with
CSNK2B [[108]25].
Fig. 4.
[109]Fig. 4
[110]Open in a new tab
PPI network analysis of 29 PBC-risk gene. This network analysis was
performed using the GeneMANIA tool. Orange node represents risk genes
identified to be associated with PBC in our current analysis. Gray node
represents predicted genes that connected with PBC-risk genes. Purple
edge represents the co-expression interactions (account for 69.52%).
Light orange edge represents the shared protein domain interactions
(account for 30.48%)
Furthermore, we performed a phenotype-based enrichment analysis, and
found that these 29 risk genes were remarkably enriched in several
phenotypes relevant to autoimmune diseases (FDR ≤ 0.05, Additional file
[111]1: Fig. S9b and Additional file [112]2: Table S8), such as type I
diabetes mellitus, immune system diseases, and juvenile rheumatoid
arthritis. We also performed a GO-term enrichment analysis, and found
that several GO-terms were notably overrepresented (Additional file
[113]1: Figs. S10–S12), such as interferon-gamma-mediated signaling
pathway (P = 2.8 × 10^–4) and neutrophil activation involved in immune
response (P = 7.3 × 10^–4).
Based on the drug-gene interaction analysis, we identified that 20 of
29 genes (68.96%) were enriched in ten potential “druggable” gene
categories (Additional file [114]1: Fig. S13a and Additional file
[115]2: Table S9). Five genes including SOCS1, TCF19, CARM1, SH2B3, and
NAAA were directly targeted at least one known drug (Additional file
[116]1: Fig. S13b). The gene of SOCS1 was found to be targeted by
insulin and aldesleukin, of which both have been applied to treat
autoimmune diseases, including systemic lupus erythematosus [[117]26],
type 1 diabetes mellitus [[118]27], and HIV [[119]28]. SH2B3 was
targeted by ruxolitinib and candesartan. Previous studies [[120]29]
have demonstrated that the Janus kinase (JAK)-inhibitor ruxolitinib
significantly influenced dendritic cell differentiation and function
resulting in impaired T-cell activation, which could be used for the
treatment of autoimmune diseases. These results provide a good drug
repurposing resource to develop effective therapeutics for PBC.
Identification of genetically-influenced liver cell subpopulations for PBC
Using the uniform manifold approximation and projection (UMAP), there
were 22 discrete clustered for the liver scRNA-seq dataset
(Fig. [121]5a). Using well-known marker genes, these clusters were
assigned into 13 distinct cell subpopulations, including portal
endothelial cells, cholangiocytes, non-inflammatory macrophages, T
cells, γδT cells, inflammatory monocytes/macrophages, natural killer
(NK)-like cells, red blood cells (RBCs), sinusoidal endothelial cells,
mature B cells, stellate cells, plasma cells, and hepatocytes
(Fig. [122]5b). By calculating the genetic risk score of 29
PBC-associated genes using the AddModuleScore function in Seurat, we
found that these risk genes were primarily enriched in non-inflammatory
macrophage and inflammatory monocyte/macrophage (Fig. [123]5c),
suggesting that these risk genes may have crucial functions in innate
immunity for PBC. To uncover genetics-regulatory cell subpopulations
associated with PBC, we used a regression-based polygenic model to
combine GWAS summary statistics on PBC with scRNA-seq data with 13
distinct human liver cell types. We found that cholangiocytes showed a
notable enrichment by PBC-relevant genetic association signals
(Permuted P < 0.05, Fig. [124]5d and Additional file [125]2: Table
S10). These results are consistent with previous evidence that an
immune-mediated injury of cholangiocytes contributes risk to PBC
[[126]30, [127]31].
Fig. 5.
[128]Fig. 5
[129]Open in a new tab
Genetics-influenced cell populations for PBC. a UMAP dimensionality
reduction embedding of liver and immune cells among all five samples
from primary liver patients (n = 8444 cells) colored by each cluster. b
UMAP embedding of liver and immune cells colored by orthogonally
generated clusters labeled by manual cell type annotation (13 cell
types). c UMAP embedding of all cells among 13 cell populations colored
by the genetic risk score of 29 PBC-risk genes. d Bar graph showing
genetics-influenced liver cell subpopulations for PBC. Orange color
represents PBC-relevant genetic association signals showing a
significant enrichment in cholangiocytes. e Dot plot showing the
expression percentage of 29 PBC-risk genes for each cell type from
human liver tissue. The color stands for the average expression of each
gene in each cell type, and the size of circular symbol indicates the
percentage of a given gene expressed in each cell type
Using the specificity algorithm in the MAGMA method [[130]32], we
noticed that the risk gene of ORMDL3 exhibited the highest expression
in cholangiocytes than other cell types (Fig. [131]5e and Additional
file [132]2: Table S11). The majority of ORMDL3-expressing cells were
cholangiocytes with a relative high percentage of 22.38%, reminiscing
that ORMDL3 was demonstrated to be significantly associated with PBC in
earlier studies [[133]7, [134]9, [135]33]. Based on the liver
expression profiles generated from healthy mice (n = 4) and mice
suffering from cholangitis (n = 6) from the GEO database
([136]GSE179993), we performed a differential gene expression analysis
and found that the ORMDL3 gene was significantly higher expressed among
cholangitis-affected mice than that among healthy age- and sex-matched
mice (P = 0.036, Additional file [137]1: Fig. S14). By further
performing immunohistochemistry experiment, we observed that the
expression of ORMDL3 in liver tissues of PBC patients was higher than
that in the normal liver, which is in line with the results from our
genomics analysis (Additional file [138]1: Fig. S15). The top-ranked
risk SNP associated with ORMDL3 is rs9303277 (P = 2.57 × 10^–11). This
SNP showed significant cell-specific eQTL of ORMDL3 among naïve B cell
(P = 5.8 × 10^–10), TFH^+CD4^+T cell (P = 3.8 × 10^–9), CD56^dim CD16^+
NK cells (P = 4.4 × 10^–9), TH1^+ CD4^+ T cells (P = 6.0 × 10^–6), and
memory TREG^+ CD4^+ T cells (P = 1.1 × 10^–5), and exhibited notable
cell-specific promoter-interacting eQTL of ORMDL3 among naïve B cell
(P = 4.2 × 10^–13) and CD56^dim CD16^+ NK cells (P = 5.0 × 10^–12)
(Additional file [139]1: Fig. S16). Among these immune cell types, the
CC genotype of rs9303277 shows prominent association with higher
expression of ORMDL3 compared with other genotypes (Additional file
[140]1: Fig. S16a–k).
Growing attentions have concentrated on the role of ORMDL3 in the
development of inflammatory diseases including PBC [[141]7, [142]9,
[143]34]. In view of the main goal of current study was to characterize
genetics-modulated liver cell subpopulations for PBC, the majority of
our subsequent analyses focused on revealing the functions of ORMDL3^+
cholangiocytes and its cellular communications with immune cells.
Characterization of biological functions of ORMDL3^+ cholangiocytes
The subset of ORMDL3^+ cholangiocytes account for 22.38% (635/2837) of
cholangiocytes (Fig. [144]6a). By calculating the metabolism activity
score among all annotated cells, we found that the cholangiocytes
showed remarkably higher metabolism activity score than other cells
(Fig. [145]6b). Interestingly, we noticed that the metabolism activity
score of ORMDL3^+ cholangiocytes was significantly higher than that in
ORMDL3^− cholangiocytes (P = 1.383 × 10^–15, Fig. [146]6c).
Furthermore, we compared the expression profiles of ORMDL3^+
cholangiocytes with ORMDL3^− cholangiocytes. There were 77
significantly DEGs with six up-regulated DEGs and 71 down-regulated
DEGs among ORMDL3^+ cholangiocytes (FDR < 0.05, Fig. [147]6d and
Additional file [148]2: Tables S12–S13). With regard to six
up-regulated DEGs, there were seven significant pathways
overrepresented (FDR < 0.05, Additional file [149]1: Fig. S17a and
Additional file [150]2: Table S14).
Fig. 6.
[151]Fig. 6
[152]Open in a new tab
The biological functions of ORMDL3^+ cholangiocytes. a UMAP projections
of liver and immune cells colored by ORMDL3^+ and ORMDL3^−
cholangiocytes. Blue color represents ORMDL3^− cholangiocytes, red
color represents ORMDL3^+ cholangiocytes, and gray color stands for the
rest of all other cell types. b UMAP embedding of all cells among 13
cell populations colored by the metabolism activity score of all
metabolism-related pathways collected from the KEGG pathway resource. c
Violin plot showing the difference in metabolism activity score between
ORMDL3^+ and ORMDL3^− cholangiocytes. Two-side Wilcoxon test was used
for assessing significance. d Volcano plot showing the differentially
expressed genes (DEGs) between ORMDL3^+ cholangiocytes and ORMDL3^−
cholangiocytes. Green color represents 71 significantly down-regulated
DEGs, and orange color represents 6 significantly up-regulated DEGs. e
Pathway enrichment analysis of 71 down-regulated DEGs based on the KEGG
pathway resource. f Scatter plot showing the dominant senders (sources)
and receivers (targets) in a 2D space. y axis represents incoming
interaction strength, and x axis represents outgoing interaction
strength. The size of each node represents the count of cellular
interactions. g The cellular communications of ORMDL3^+ Cholangiocyte
with other cell populations in liver tissues. The width represents the
number of cellular interactions
Among the 71 down-regulated DEGs, HLA-DRA has been shown to be
associated with PBC [[153]7], and CXCL8, CCL3, CXCL1, TIMP1, SPP1, and
IRF1 were inflammatory and cytokine genes, which have been reported to
be linked with the chemotaxis of immune cells that efflux to the site
of cytokine storms in response to ongoing tissue damage [[154]35,
[155]36]. IFI16 is reported to be an innate immune sensor for
intracellular DNA [[156]37]. Pathway enrichment analysis revealed that
there were 55 significant KEGG pathways enriched by these 71
down-regulated DEGs (FDR < 0.05, Fig. [157]6e and Additional file
[158]2: Table S15), of which several such as inflammatory bowel
disease, rheumatoid arthritis, Th17 cell differentiation,
cytokine-cytokine receptor interaction, and chemokine signaling pathway
have been demonstrated to implicate in inflammatory-related diseases
[[159]11, [160]38, [161]39].
To further explore the biological functions of these 71 down-regulated
DEGs, we conducted GO-term enrichment analysis according to three
categories of biological process (BP), cellular component (CC), and
molecular function (MF). There were 19 BP-terms, 7 CC-terms, and 13
MF-terms showing notable enrichments, respectively (FDR < 0.05,
Additional file [162]1: Fig. S17b–d and Additional file [163]2: Tables
S16–S18). These functional terms were largely linked with immune and
metabolism functions, such as granulocyte activation, neutrophil
mediated immunity, and S100 protein binding. These results suggest that
ORMDL3^+ cholangiocytes have immune-modulatory effects on PBC risk.
Cellular communications of ORMDL3^+ cholangiocytes with other cells
To gain refined insights into ORMDL3^+ cholangiocytes, we performed a
cell-to-cell interaction analysis among cell populations in liver
tissue using the CellChat algorithm. By calculating the aggregated
cell–cell communication network by counting the number of links, we
found that ORMDL3^+ cholangiocytes showed the highest outgoing
(sources) interaction strength than other cell types (Fig. [164]6e and
Additional file [165]1: Fig. S18). By further summarizing the
communication probability among cellular interactions, we observed a
high connective network of ORMDL3^+ cholangiocytes with other cells
(Fig. [166]6f and Additional file [167]1: Figs. S19, S20). The ORMDL3^+
cholangiocytes showed relatively high communications with
non-inflammatory macrophage and inflammatory monocyte/macrophages
(Fig. [168]6f), recalling that these two types of immune cells have
been remarkably enriched by PBC-risk genes in our above analysis. There
were 35 significant ligand-receptor interactions (including source and
target) of ORMDL3^+ cholangiocytes were predicted, including
MIF − (CD74 + CXCR4), MIF − (CD74 + CD44), VEGFA − VEGFR2, and
NAMPT-INSR (Fig. [169]7a, b). There existed several unique
ligand-receptor pairs in ORMDL3^+ cholangiocytes compared with ORMDL3^−
cholangiocytes, including VEGFA-VEGFR2, VEGFA-VEGFR1R2 and VEGFA-VEGFR1
(Additional file [170]1: Fig. S21).
Fig. 7.
[171]Fig. 7
[172]Open in a new tab
Dot plots showing that predicted cellular interactions of ORMDL3^+
Cholangiocyte with other cell populations in liver tissues. a ORMDL3^+
Cholangiocyte as a source (outgoing) communicating with other cells, b
ORMDL^+ Cholangiocyte as a target (incoming) communicating with other
cells. The circular size of represents the statistical significance of
each ligand-receptor pair, and color represents the communication
probability
By performing a pattern recognition analysis based on the non-negative
matrix factorization to uncover the global communication patterns among
all cell populations, we observed that five communication patterns that
connect cell populations with signaling pathways in the context of both
outgoing signaling (i.e., treating cells as source, Fig. [173]8a) and
incoming signaling (i.e., treating cells as target, Additional file
[174]1: Fig. S22a), respectively. We found that a large portion of
outgoing signaling of ORMDL3^+ cholangiocytes is characterized by
pattern #1, which represents multiple signaling pathways, including
MIF, PARs, VISFATIN, COMPLEMENT, ANGPTL, PROS, CHEMERIN, AGT, VEGF, and
IGF (Fig. [175]8b). Compared with ORMDL3^− cholangiocytes, the VEGF
signaling pathway is specific for ORMDL3^+ cholangiocytes as source
cells to interact with other cells (Fig. [176]8b). On the other hand,
the communication patterns of both ORMDL3^+ and ORMDL3^− cholangiocytes
as target cells showed similar patterns (Additional file [177]1: Fig.
S22b). Network centrality analysis confirmed that VEGF signaling
pathway is not only a significant sender (i.e., source cells) but also
a prominent influencer controlling the communications for ORMDL3^+
cholangiocytes (Fig. [178]8c). Notably, among all known ligand-receptor
pairs, VEGF signaling pathway is dominated by VEGFA ligand and its
VEGFR1 receptor (Additional file [179]1: Fig. S23). Using the
expression levels of all genes in the VEGF signaling pathway to
calculate a VEGF activity score, we found that ORMDL3^+ cholangiocytes
exhibited a significantly higher activity score than that among
ORMDL3^− cholangiocytes (P = 1.28 × 10^–15, Fig. [180]8d). These
results suggest that VEGF signaling pathway play an important role in
cellular communications of ORMDL3^+ cholangiocytes.
Fig. 8.
[181]Fig. 8
[182]Open in a new tab
The inferred outgoing (source) communication patterns of cell
populations in liver tissue. a Five inferred outgoing communications
patterns of 13 cell populations, which shows the correspondence between
the inferred latent patterns and cell groups, as well as signaling
pathways, b The contribution of signaling pathways among the five
outgoing communications patterns to liver and immune cell types. The
red dashed box marked the VEGF signaling pathway. c Heatmap showing the
relative importance of each cell group based on the computed four
network centrality measures of VEGF signaling pathway network. d Violin
plot showing the difference in VEGF signaling score between ORMDL3^+
and ORMDL3^− cholangiocytes. The expression levels of all genes in VEGF
signaling pathway were used to calculate the VEGF signaling score.
Two-side Wilcoxon test was used for assessing significance
Discussion
Using an integrative genomics approach based on multiple layers of
evidence, we identified that the genetics-influenced expression of 29
risk genes were remarkably associated with PBC. Among them, 10 genes,
including IRF5 [[183]5, [184]6, [185]8, [186]9], SOCS1 [[187]5], SYNGR1
[[188]5, [189]7], ORMDL3 [[190]7, [191]9, [192]33], MANBA [[193]5],
IDUA [[194]5], DGKQ [[195]5], FCRL3 [[196]40], NAAA [[197]41], and
SH2B3 [[198]5], have been documented to be associated with PBC. There
were 19 newly identified PBC-risk genes, such as GSNK2B, LY6G5B, DDAH2,
C6orf48, and HLA-DMA. Some of these novel risk genes such as DDAH2 and
HLA-DMA have been reported to be associated with the autoimmune
diseases, including type I diabetes [[199]42], rheumatoid arthritis
[[200]43], and systemic lupus erythematosus [[201]44]. The
previously-reported gene of ORMDL3 is related to biological functions
of innate immune system and metabolism. Moreover, ORMDL3 has also been
extensively reported to be linked with other inflammatory diseases,
including childhood asthma [[202]45], inflammatory bowel diseases
[[203]38], rheumatoid arthritis [[204]39], and Crohn’s disease
[[205]46]. Functional enrichment analyses uncovered that these
genetics-risk genes were notably enriched in several biological
processes or disease-terms, which are relevant to autoimmune phenotypes
[[206]47, [207]48]. Together, these identified genes are more likely to
be genuine genes implicated in PBC risk.
Our gene-property analysis unveiled that PBC-relevant genetic
association signals were significantly enriched in several
immune-related cell populations, including dendritic cells, T cells,
and macrophage. Consistently, based on the genetic risk score, we
revealed that these 29 PBC-associated genes were prominently enriched
in non-inflammatory macrophage and inflammatory monocyte/macrophage.
Dendritic cells have been shown to be relevant to the pathogenesis of
PBC [[208]19, [209]49]. Earlier studies have demonstrated that an
intense biliary inflammatory CD8 + and CD4 + T cell response has been
used for characterizing PBC [[210]50]. Moreover, dendritic cells are
crucial for inducing antigen-specific T-cell tolerance and for
activating the self-specific T cells, which have central roles in many
aspects of the pathogenesis of autoimmune liver diseases [[211]51].
Macrophages and monocytes are key components of the innate immune
system, and have tissue-repairing roles in reducing immune responses
and enhancing tissue regeneration [[212]52]. Multiple lines of evidence
[[213]53] have demonstrated that the infiltration of macrophages and
monocytes in diseased tissues is considered to be a hallmark of several
autoimmune diseases, including PBC [[214]20, [215]54]. Recently, Dorris
et al. [[216]55] reported that the PBC susceptibility gene C5orf30
modulates macrophage-mediated immune regulations. Overall, these
results suggest that genetics-risk genes have critical
immuno-modulatory roles in the development of PBC.
To further examine the effects of liver cell populations and its immune
microenvironments on PBC, we performed a regression-based polygenic
analysis based on human liver tissues with immune cells. Cholangiocytes
were significantly enriched by PBC-related genetic association signals.
Previously, the non-parenchymal cholangiocytes have been reported to be
injury in numerous human diseases termed as cholangiopathies [[217]56],
including PBC [[218]56, [219]57]. Recently, Banales and coworkers
[[220]56] have demonstrated that cholangiocytes play pivotal roles in
innate and adaptive immune responses relevant to immune-mediated
cholangiopathies. Erice et al. [[221]58] reported that microRNA-506
induces PBC-like features in human cholangiocytes and promotes the
activating processes of immune responses. Moreover, we found that there
existed higher metabolism activity score of cholangiocytes than other
cells among liver tissues, indicating that abnormal metabolic pathways
may involve in the etiology of PBC, which is in line with previous
results [[222]59]. Interestingly, the cell subset of ORMDL3^+
cholangiocytes have strikingly higher metabolic activity than ORMDL3
negative cells.
Compared with ORMDL3^− cholangiocytes, we found 77 significant DEGs
among ORMDL3^+ cholangiocytes, which contain numerous cytokine and
chemokine genes, such as CXCL8, CCL3, and CXCL1, that may involve in
mediating the immune-regulation for PBC risk. Furthermore, based on
cellular communications analysis, we identified ORMDL3^+ cholangiocytes
exhibited high interactions with two innate immune cell types of
non-inflammatory macrophage and inflammatory monocyte/macrophages.
Several vital signaling pathways, including MIF, PARs, VEGF, and IGF,
were predicted to implicate in the cellular interactions of ORMDL3^+
cholangiocytes with other cells. The VEGF signaling pathway showed a
higher specificity for ORMDL3^+ cholangiocytes than ORMDL3^−
cholangiocytes. VEGF is a potent stimulating factor for angiogenesis
and vascular permeability, which have been reported to involve in PBC
[[223]60]. Multiple lines of evidence have demonstrated that VEGF has
an important role in pathological conditions that are associated to
autoimmune diseases, such as systemic lupus erythematosus, rheumatoid
arthritis, inflammatory bowel disease, and multiple sclerosis
[[224]61].
There exist some limitations should be cautious. Although we leveraged
integrated bioinformatics methods to highlight PBC-associated risk
genes based on multiple omics data, there were many potential risk
genes with suggestive evidence for PBC as shown in the supplemental
tables needed to be further studied. Due to the heterogeneity across
different datasets used in the present investigation, we leveraged
different statistical methods for multiple testing correction for each
dataset, such as FDR < 0.05 for MAGMA-based gene association analysis
and S-MultiXcan analysis, permuted P < 0.05 for genome-wide pathway
enrichment analysis, Bonferroni corrected P < 0.05 for gene-property
analysis, and empirical P value < 0.05 for in silico permutation
analysis. In view of current integrative genomic analysis is only based
on samples derived from European ancestries, more studies based on
other ancestries should be performed to validate the effects of these
risk genes on PBC.
Conclusions
In summary, current study provides multiple lines of evidence to
support 29 genes including 19 novel genes are remarkably associated
with PBC susceptibility. To the best of our knowledge, this is the
first study to parse genetics-influenced human liver cell
subpopulations and its immune microenvironments that contribute risk to
PBC, and found that ORMDL3^+ cholangiocytes potentially play important
immune-regulatory roles in the pathogenesis of PBC. Current study gives
several highlighted genetics-risk genes and disease-relevant cell types
for unveiling the genetic mechanisms of PBC.
Methods
Datasets
1. Single-cell transcriptomes of PBC We downloaded two independent
single cell RNA sequencing profiles (Accession number:
[225]GSE93170 and [226]GSE115469) from the GEO database. With
regard to the dataset of [227]GSE93170, there were clinically and
pathologically diagnosed six healthy controls and six PBC patients
enrolled with written informed consent. Peripheral CD4 + T cells
were used to extract total RNA. The Agilent microarray of SurePrint
G3 human GE 8 × 60 K microarray kit was leveraged to produce gene
expression profiles according to manufacturer’s protocols. The
[228]GSE115469 dataset contained five samples from primary liver
patients, which were used for scRNA sequencing based on the
10 × Genomics Chromium Single Cell Kits. A total of 8444
parenchymal and non-parenchymal cells have obtained the
transcriptional profiles based on the CellRanger analysis pipeline.
The raw digital matrix of gene expression (namely UMI counts per
gene per cell) was filtered, normalized and clustered. Cell was
omitted if it has a very high (> 0.5) mitochondrial genome
transcript ratio or a very small library size (< 1500).
2. Bulk-based expression profiles of PBC We also downloaded two
independent bulk-based expression datasets based on liver tissue
(Accession number: [229]GSE159676) and blood (Accession number:
[230]GSE119600) from the Gene Expression Omnibus (GEO) database.
The dataset of [231]GSE159676 contained six healthy controls and
three PBC cases based on fresh frozen liver tissue, which were
obtained from explanted livers or diagnostic liver biopsies. The
Affymetrix Human Gene 1.0-st array was leveraged to produce
bulk-based liver expression profiles with 17,046 probes. With
respect to the dataset of [232]GSE119600, there were 47 healthy
controls and 90 PBC patients with whole blood samples. The Illumina
HumanHT-12 V4.0 expression beadchip was leveraged to produce
bulk-based blood transcriptomes with 47,230 probes.
3. GWAS summary statistics on PBC We downloaded a GWAS summary dataset
on PBC from the IEU open GWAS project
([233]https://gwas.mrcieu.ac.uk/) [[234]5]. There were 2764 PBC
patients and 10,475 healthy controls based on European ancestry in
this dataset used for performing a meta-analysis of GWAS signals. A
standard quality control (QC) pipeline was applied to remove
low-quality SNPs. The software package of MaCH [[235]62] with the
reference of HapMap3 CEU + TSI samples was implemented to perform a
genome-wide imputation analysis. There were 1,124,241 SNPs with
minor allele frequency > 0.005 and imputation quality score
R^2 > 0.5 included in the current analyses.
Combination of GWAS summary statistics with scRNA-seq data for PBC
We leveraged a widely-used method of RolyRoly [[236]63], which was
designed to gain the effects of SNPs near protein-coding genes on cell
types contributing to complex traits, to explore genetics-influenced
liver cell types for PBC. The regression-based polygenic model was used
in the RolyRoly to incorporate GWAS summary data with scRNA-seq data
(i.e., [237]GSE115469) for identifying PBC-associated liver cell
subpopulations. Let
[MATH: g(i) :MATH]
represents a given gene relevant to SNP
[MATH: i :MATH]
,
[MATH: Sj={i:g(i)=j} :MATH]
represents a given set with multiple SNPs relevant to the gene
[MATH: j :MATH]
, and
[MATH: βSj :MATH]
represents a PBC-GWAS-derived effect-size vector of
[MATH: Sj :MATH]
with a priori assumption that
[MATH:
βSj∼MVN0,σj<
/mrow>2I|Sj|
:MATH]
. Under the assumption, RolyPoly offers a polygenic linear model for
[MATH: βSj :MATH]
:
[MATH:
σj2=γ0+∑i=1
Nγiαji :MATH]
where
[MATH: γ0 :MATH]
represents an intercept term,
[MATH: αji(i=1,2,...,N) :MATH]
represents a group of annotations, for example, cell-type-specific gene
expressions, and
[MATH: γi :MATH]
is the annotation’s coefficient for
[MATH: αji :MATH]
. To fit the observed and expected sum squared SNP effect sizes
relevant to each gene, RolyPoly applies the method-of-moments
estimators to estimate
[MATH: γi :MATH]
by the following formula:
[MATH: E∑i∈S<
mi>jβ^i2=σj2TrRSj2+|Sj|σe2n-1<
/msup> :MATH]
where
[MATH: RSj :MATH]
represents the linkage disequilibrium (LD) matrix of
[MATH: Sj :MATH]
. The 1000 Genome Project European Phase 3 panel [[238]64] was used to
calculate the LD among SNPs. The major histocompatibility complex (MHC)
region was removed due to the highly extensive LD in this region.
RolyPoly utilized the 1000 iterations of block bootstrap to assess
standard errors for calculating t-statistic and corresponding P value.
P value ≤ 0.05 was interpreted to be of significance.
Integrative genomic analysis of combining GWAS summary statistics with eQTL
data
To highlight the functional risk genes whose expressions were
significantly associated with PBC, we leveraged the S-PrediXcan tool
[[239]65] to combine GWAS summary data on PBC with eQTL data for liver
and blood tissues from the GTEx Project (version 8) [[240]66]. A
MASHR-based prediction method with two linear regression models was
used to estimate gene expression weights:
[MATH:
Y=α1+Xlβl+
mo>ε1 :MATH]
[MATH:
Y=α2+Ggγg+
mo>ε2 :MATH]
where
[MATH: ε1 :MATH]
and
[MATH: ε2 :MATH]
are independent error terms,
[MATH: α1 :MATH]
and
[MATH: α2 :MATH]
are intercepts,
[MATH: Y :MATH]
is the
[MATH: n :MATH]
dimensional vector for
[MATH: n :MATH]
samples,
[MATH: Xl :MATH]
is the allelic dosage for SNP
[MATH: l :MATH]
in n samples,
[MATH: βl :MATH]
is the effect size of SNP
[MATH: l :MATH]
,
[MATH:
Gg=∑
i∈gene(g)ωigXi :MATH]
is the predicted expression calculated by
[MATH: ωlg :MATH]
and
[MATH: Xl :MATH]
, in which
[MATH: ωlg :MATH]
is derived from the GTEx Project, and
[MATH: γg :MATH]
is the effect size of
[MATH: Gg :MATH]
. The Wald-statistic Z score for each association is transformed as:
[MATH:
Zg=γ^gse
mi>γ^g≈∑i∈ge
mi>ne(g)ωigσ^iσ^gβ^ise
mi>β^i :MATH]
where
[MATH: σ^g :MATH]
is the standard deviation of
[MATH: Gg :MATH]
and can be calculated from the 1000 Genomes Project European Phase 3
Panel [[241]64],
[MATH: β^l :MATH]
is the effect size from GWAS on PBC and
[MATH: σ^l :MATH]
is the standard deviation of
[MATH: β^l :MATH]
.
To enhance the statistical power to identify significant genes whose
expression has similar functions across different tissues, we leveraged
the S-MultiXcan tool [[242]21] to incorporate convergent evidence
across 49 different GTEx tissues. The S-MultiXcan fits a multivariate
regression model from multiple tissue models jointly:
[MATH: Y=∑j=1
pTjgj+e=Tg
mtext>+e :MATH]
where
[MATH: T~j=∑
mo>i∈gene(j)ωi
msub>Xi :MATH]
is the predicted expression of tissue
[MATH: j :MATH]
, and
[MATH: Tj :MATH]
is the standardization of
[MATH: T~j :MATH]
to
[MATH:
mean=0 :MATH]
and
[MATH:
standar
ddeviat<
/mi>ion=1 :MATH]
.
[MATH: gj :MATH]
is the effect size for the predicted gene expression in tissue
[MATH: j :MATH]
,
[MATH: e :MATH]
is an error term with variance
[MATH: σe2
:MATH]
, and
[MATH: p :MATH]
is the number of chosen tissues. There were 22,279 genes used in the
multivariate regression model. P < 6.89 × 10^–4 (FDR < 0.05) was
considered to be significant.
Gene-based genetic association analysis
We conducted a gene-level association analysis of the GWAS summary
statistics on PBC by leveraging a multi-variant converging regression
model in the Multi-marker Analysis of GenoMic Annotation (MAGMA)
[[243]32, [244]67]. The analyzed SNP set among each gene was depended
on whether the SNP mapped into the body of the gene or its extended
regions (± 20 kb downstream or upstream). The 1000 Genomes Project
Phases 3 European Panel [[245]64] was used to compute the LD between
SNPs. The P value threshold was set to 0.0016, and the method of
Benjamini–Hochberg false discovery rate (FDR) was applied for multiple
testing correction [[246]68]. Furthermore, to investigate the
biological processes involved in PBC, we performed genome-wide pathway
enrichment analysis using the build-in function of the MAGMA gene-set
method [[247]32]. The competitive P value of each pathway was computed
by leveraging the results that the incorporated effect of genes in a
given pathway are prominently larger than the incorporated effect of
all rest genes. The build-in function of 10,000 permutations in MAGMA
was used for adjusting competitive P values. We utilized the
widely-used pathway resource of KEGG with latest version [[248]69] for
the MAGMA-based pathway enrichment analysis.
Multidimensional scaling analysis
To obtain the similarity of enriched pathways identified from
MAGMA-based pathway enrichment analysis, we carried out a
multidimensional scaling (MDS) analysis to group these biological
pathways. First, we arranged a pathway.txt file included all the
significant enriched pathways, and then utilized the Jaccard distance
algorithm [[249]70] to calculate the pathway-pathway distance scores
according to overlapped genes. Using the distance scores among
pathways, we obtained the first two components of results from the MDS
analysis (i.e., MDS1 and MDS2). Subsequently, we plotted the clusters
of these identified pathways via MDS1 and MDS2 using the symbols
function in R platform [[250]71]. The most significant pathway (i.e.,
pathway has the lowest P value in each cluster) was used to mark each
cluster.
In silico permutation analysis
We referenced the method used in previous studies [[251]11, [252]67,
[253]68, [254]72] to perform an in silico permutation analysis of
100,000 times of random selections (N[Total]) for validating the
consistency of results from S-MultiXcan analysis (Geneset #1, N = 308,
FDR ≤ 0.05) with other results from three distinct analyses: MAGMA
analysis (Geneset #2: N = 563, FDR ≤ 0.05), S-PrediXcan on liver
(Geneset #3: N = 76, FDR ≤ 0.05), and S-PrediXcan on blood (Geneset #4:
N = 115, FDR ≤ 0.05). First, we separately calculated the number of
overlapped genes between Geneset #1 and Genesets #2, #3 and #4
(N[Observation]). Second, we used the total number of genes in MAGMA,
S-PrediXcan on liver, and S-PrediXcan on blood as the background genes
(N[Background] = 18,068, 12,033, and 12,070, respectively). By randomly
selecting the same number of genes as Genesets #2, #3, and #4 from the
background genes to overlap with Geneset #1, we computed the number of
overlapped genes in each time (N[Random]). The empirical P value was
calculated by using the formula: P =
[MATH: NRandom≥NObservationN
mi>Total
:MATH]
. The empirically permuted P value ≤ 0.05 is interpreted as
significance. By using the same method as above, we also performed an
in silico permutation analysis for results from S-MultiXcan, MAGMA, and
S-PrediXcan (on liver and blood) at a significant level of P
value ≤ 0.05 to further explore the consistency.
Functional enrichment analysis of disease- and GO-terms
We conducted a disease-based enrichment analysis for these identified
risk genes associated with PBC by using the WEB-based Gene SeT AnaLysis
Toolkit (WebGestalt: [255]http://www.webgestalt.org) [[256]73] based on
the GLAD4U database [[257]74]. Moreover, we used the WebGestalt tool to
perform a functional enrichment analysis for these common PBC-genetic
risk genes based on the Gene Ontology (GO) database [[258]75]. There
were three categories of GO-terms: molecular function (MF), cellular
component (CC), and biological process (BP). The number of genes in
each GO-term is between 5 and 2000. The over-representation algorithm
was adopted for evaluating the significant level for these enrichment
analyses, and the Benjamini–Hochberg FDR method was used for multiple
correction. In addition, the WebGestalt tool was used to conduct
functional enrichment analyses for these significantly differentially
expression genes (DEGs) between ORMDL3^+ and ORMDL3^− cholangiocytes.
Protein–protein interaction network analysis
To explore the functional interactions of these identified
PBC-associated genes, we conducted a protein–protein interaction (PPI)
network-based analysis by leveraging the GeneMANIA tool
([259]http://www.genemania.org/), which is a widely-used plug-in of
Cytoscape platform [[260]76–[261]78]. By using PBC-associated genes as
an input list, the GeneMANIA would predict genes with similar
biological functions and construct interacted links by incorporating
current existing genomics and proteomics information, containing
co-expression associations and shared protein domains.
MAGMA gene-property analysis
To identify PBC-associated single cell subpopulations enriched by
MAGMA-identified genes, we conducted MAGMA gene property analysis using
the web-access tool of FUMA ([262]https://fuma.ctglab.nl/) [[263]79],
which is a highly efficient and easy-to-use software, to integrate
gene-level association signals from GWAS summary data on PBC [[264]5]
with scRNA-seq data based on liver tissue from the Mouse Cell Atlas
[[265]80]. There were 20 distinct cell types of liver tissue in the
Mouse Cell Atlas. The Bonferroni method [[266]81] was used for multiple
testing correction.
Immune cell-specific expression quantitative trait loci
To find the eQTL for PBC-risk SNPs, we first extracted genetic variants
statistics around upstream and downstream 400 kb of the targeted genes
from GWAS summary data. We used the LocusZoom
([267]http://locuszoom.org/) to figure a regional association plot for
each gene. Then, we explored the cis-eQTL and promotor-interacting eQTL
(pieQTL) among the Database of Immune Cell Expression quantitative
trait loci and Epigenomics (DICE,
[268]https://dice-database.org/landing) [[269]82] for annotating
functional genetic variants.
Defining cell activity scores
We applied the cell activity score (CAS) to evaluate the immunological
degree and metabolic activity of each cell type by using a pre-defined
expression gene set [[270]83]. The CASs were obtained by using a
widely-used equation: CAS[j] (n) = average [RLE (PGS[j], n)] − average
[RLE (CGS[m], n)], where PGS[j] is a pre-curated gene set j in a given
cell n, and CGS[m] is a control gene set that was randomly selected
from aggregate expression levels bins, which gain a comparable
distribution of expression levels and over size to that of the
pre-defined gene set. RLE represents the relative expression of PGS[j]
or CGS[m]. The AddModuleScore function in the Seurat tool [[271]84] was
leveraged to calculate the CAS with default parameters. We used
pre-collected 85 metabolism-related pathways (N = 1667 genes) in KEGG
database [[272]69], such as primary bile acid biosynthesis (ID: 00120),
pyruvate metabolism (ID: 00620), and oxidative phosphorylation (ID:
00640), to define inflammatory score and metabolic activity score,
respectively.
Cellular interactions among different cell types
To unveil potential cell-to-cell communications of ORMDL3^+ and
ORMDL3^− cholangiocytes with other liver and immune cells, we leveraged
the CellChat R package [[273]85] to infer the cellular interactions
based on the normalized scRNA-seq dataset (i.e., [274]GSE115469). The
algorithm of CellChat could examine the ligand-receptor interactions
significance among different types of cells based on the expression of
soluble agonist, soluble antagonist, and stimulatory and inhibitory
membrane-bound co-receptors. By summing the probabilities of the
ligand-receptor interactions among a given signaling pathway, we could
calculate the communication probability for the pathway.
Immunohistochemistry for the expression of ORMDL3 in liver
Through searching the information retrieval system, we found four
patients diagnosed as PBC through pathology in the past 2 years in the
First Affiliated Hospital of Wenzhou Medical University, and three of
them were able to obtain liver tissue sections. Approved by Ethics
Committee in Clinical Research (ECCR) of the First Affiliated Hospital
of Wenzhou Medical University, the liver paraffin sections of the PBC
patients and paracarcinoma paraffin section of intrahepatic metastasis
of rectal stromal tumor patient as negative control were used in this
research. Immunohistochemistry was performed to assess the differential
expression of ORMDL3 in liver between PBC patient and normal control.
Briefly, 5-mm paraffin sections were deparaffinized with xylene and
then rehydrated through descending grades of alcohol. Antigen retrieval
was performed by heating heated (95 °C) in the 0.01 M sodium citrate
buffer (pH 6.0) for 15 min and incubating in 3% hydrogen peroxide for
10 min to block endogenous peroxidases. The paraffin-embedded liver
sections were incubated overnight at 4 °C with Anti-ORMDL3 antibody
(Abcam, Cambridge, MA, USA), followed by the appropriate horseradish
peroxidase-conjugated secondary antibodies (zsbio, Beijing, China).
Diaminobenzidine (DAB) was used as the chromogenic substrate and the
sections were observed under the microscopy.
Statistical analysis
Differential gene expression (DGE) analyses between controls and PBC
patients of three RNA expression datasets (i.e., [275]GSE93170,
[276]GSE159676, and [277]GSE119600) were examined by using the
Student’s T-test. P value ≤ 0.05 was of significance. We also performed
a co-expression pattern analysis in the dataset of [278]GSE93170 for
genetics-risk genes among PBC and healthy controls to evaluate whether
the co-expression patterns were changed due to the disease status. The
PLINK (v1.90) [[279]86] was used to calculate the LD values among SNPs.
The hypergeometric test was used to evaluate the significant enrichment
for the disease- and GO-term enrichment analysis. The Jaccard distance
algorithm [[280]70] was used to assess the similarities among pathways.
Supplementary Information
[281]12951_2021_1154_MOESM1_ESM.docx^ (9.6MB, docx)
Additional File 1: Figure S1. Circus plot showing the results of
gene-level genetic association analysis of GWAS summary statistics on
PBC. Figure S2. MAGMA-based functional annotation analysis for GWAS
summary data. a) MAGMA-based pathway enrichment analysis based on GWAS
summary data on PBC. b) Multidimensional scaling analysis of 41
significant pathways identified MAGMA-based pathway analysis. Figure
S4. Results of MAGMA gene-property analysis based on the web-access
tool of FUMA. Figure S2. Venn plot shows the overlap of results between
MAGMA and S-MultiXcan analysis. Figure S5. Permutation analysis of
100,000 times for results from MAGMA and S-PrediXcan (liver and blood)
compared with results from S-MultiXcan across multiple tissues. Figure
S6. Differential gene expression analysis of 29 risk genes based on
bulk RNA profiles of liver tissue (Dataset of GSE159676). Figure S7.
Differential gene expression analysis of 29 risk genes based on bulk
RNA profiles of blood tissue (Dataset of GSE119600). Figure S8.
Differential gene expression analysis of 29 risk genes based on bulk
RNA profiles in CD4+T cells (Dataset of GSE93170). Figure S9.
Functional characterization of 29 risk associated with PBC. a)
Schematic diagram showing the literature mining and GWAS catalog
searching for novel PBC-risk genes. b) Phenotype-based enrichment
analysis based on the GLAD4U database for 29 risk genes. Figure S10.
Functional enrichment analysis of GO-term of biological process for 29
PBC-risk genes. Figure S11. Functional enrichment analysis of GO-term
of cellular components for 29 PBC-risk genes. Figure S12. Functional
enrichment analysis of GO-term of molecular function for 29 PBC-risk
genes. Figure S13. Gene-drug interaction analysis for these identified
PBC-associated risk genes. a) Drug-gene interaction analysis showing 10
druggable gene categories. b) Drug-gene subnetwork. Figure S14. Violin
plot showing the difference in the expression level of Ormdl3 between
cholangitis-affected mice and healthy controls. Figure S15. The
expression of ORMDL3 examined in PBC livers and paracarcinoma of
intrahepatic metastasis by immunohistochemistry. Figure S16. Box plots
showing the effects of different genotypes (CC, CT, and TT) of
rs9303277 on the expression of ORMDL3 among different immune cell
types. Figure S17. Functional enrichment analysis of differential
expressed genes between ORMDL3+ and ORMDL3- cholangiocytes. Figure S18.
Identify signals contributing most to outgoing (sources) or incoming
(targets) signaling of cell populations in liver tissues (GSE115469).
Figure S19. The aggregated cell-tocell communication network among cell
populations in liver tissues (GSE115469). Figure S20. The cellular
communications of ORMDL3- Cholangiocyte with other cell populations in
liver tissues (GSE115469). Figure S21. Dot plots showing that predicted
cellular interactions of ORMDL3- Cholangiocyte with other cell
populations in liver tissues (GSE115469). Figure S22. The inferred
incoming (target) communication patterns of cell populations in liver
tissue (GSE115469). Figure S23. Relative contribution of each
ligand-receptor pair to that of VEGF signaling pathway.
[282]12951_2021_1154_MOESM2_ESM.docx^ (107.3KB, docx)
Additional File 2: Table S1. Summary of collected datasets in current
integrative genomics analysis. Table S2. Significant genes identified
from MAGMA-based gene association analysis (FDR < 0.05). Table S3.
MAGMAidentified significant enriched pathways based on GWAS summary
data. Table S4. MAGMA gene-property analysis identifies liver single
cells based on the Mouse Cell Atlas. Table S5. Significant
PBC-associated genes identified by the S-MultiXcan tool. Table S6.
Significant genes identified from S-PrediXcan analysis by integrating
GWAS summary data with GTEx liver eQTL data. Table S7. Significant
genes identified from S-PrediXcan analysis by integrating GWAS summary
data with GTEx blood eQTL data. Table S8. Phenotype-based enrichment
analysis of these 29 identified risk genes for PBC based on the GLAD4U
database. Table S9. PBC-associated risk genes matched in druggable gene
categories. Table S10. Results of PBC-associated liver cell types by
using Rolypolybased integrative analysis of combining GWAS summary data
with scRNA-seq data. Table S11. The percentage of expressed 27 genetic
risk genes in all 13 distinct cell types in human liver tissues. Table
S12. The significantly upregulated DEGs among ORMDL3+ cholangiocytes.
Table S13. The significantly down-regulated DEGs among ORMDL3+
cholangiocytes. Table S14. Significantly KEGG pathways enriched by
up-regulated DEGs among ORMDL3+ cholangiocytes using the
clusterProfiler tool. Table S15. Significantly KEGG pathways enriched
by down-regulated DEGs among ORMDL3+ cholangiocytes using the
clusterProfiler tool. Table S16. GO enrichment analysis according to
biological process terms for 71 down-regulated DEGs among ORMDL3+
cholangiocytes. Table S17. GO enrichment analysis according to cellular
component terms for 71 down-regulated DEGs among ORMDL3+
cholangiocytes. Table S18. GO enrichment analysis according to
molecular function terms for 71 downregulated DEGs among ORMDL3+
cholangiocytes.
Acknowledgements