Abstract
Latent tuberculosis infection (LTBI) poses a major roadblock in the
global effort to eradicate tuberculosis (TB). A deep understanding of
the host responses involved in establishment and maintenance of TB
latency is required to propel the development of sensitive methods to
detect and treat LTBI. Given that LTBI individuals are typically
asymptomatic, it is challenging to differentiate latently infected from
uninfected individuals. A major contributor to this problem is that no
clear pattern of host response is linked with LTBI, as molecular
correlates of latent infection have been hard to identify. In this
study, we have analyzed the global perturbations in host response in
LTBI individuals as compared to uninfected individuals and particularly
the heterogeneity in such response, across LTBI cohorts. For this, we
constructed individualized genome-wide host response networks informed
by blood transcriptomes for 136 LTBI cases and have used a sensitive
network mining algorithm to identify top-ranked host response
subnetworks in each case. Our analysis indicates that despite the high
heterogeneity in the gene expression profiles among LTBI samples, clear
patterns of perturbation are found in the immune response pathways,
leading to grouping LTBI samples into 4 different immune-subtypes. Our
results suggest that different subnetworks of molecular perturbations
are associated with latent tuberculosis.
Keywords: latent tuberculosis, genome-wide network analysis,
transcriptomics, heterogeneity, immune subtypes
Introduction
Mycobacterium tuberculosis (Mtb) is one of the most successful
pathogens known to humans. Despite the availability of an array of
anti-mycobacterial drugs, tuberculosis (TB) is still the leading cause
of death among infectious agents. Although it is the active form of TB
that causes morbidity, contagiousness and mortality, a majority of Mtb
infected individuals remain latently infected ([29]1). These
individuals, accounting for approximately 1.7 billion people worldwide,
harbor the dormant pathogen while remaining clinically asymptomatic for
decades and carry a 10% lifetime risk of TB reactivation and thus act
as reservoirs of the TB bacilli ([30]1, [31]2). Therefore to eradicate
TB, it is not only necessary to treat the active TB patients, but also
important to successfully diagnose and treat LTB infection (LTBI) in
asymptomatic individuals. Establishment and maintenance of latency
result from an equilibrium in the host-pathogen interactions where the
host immune response can successfully contain the spread of the
bacteria by forming granulomatous lesions but fails to completely
eradicate it ([32]3).
Multiple studies have previously shown the presence of a wide spectrum
of disease in tuberculosis, including latent, incipient, subclinical
and active TB ([33]4, [34]5). Although the last three types have
historically gathered more research focus due to the imminent threat to
the patient, it is important to gain deeper understanding into the
mechanisms that establish and maintain LTBI condition, preventing its
progression to further stages.
Multiple challenges are involved in studying LTBI. At the outset,
accurate diagnosis of LTBI remains to be problematic as the Interferon
Gamma Release Assay (IGRA), the most sensitive technique currently
available, lacks the specificity to differentiate between active and
latent TB as well as new and treated infections ([35]6). Moreover, as
LTBI individuals are clinically asymptomatic and do not routinely
require any clinical intervention, testing is often limited to primary
contacts of active TB patients. Next, there is a lack of an ideal
animal or in vitro model for LTBI, making it hard to study the
condition. Studies involving whole blood samples from LTBI subjects can
be expected to throw more light on the precise host immune response in
maintaining latency of Mtb. Systems level studies based on host
transcriptome data are increasingly being used to gain a holistic
insight into different perturbations in TB and other infectious
diseases ([36]7–[37]9). These studies have been successful in gaining
mechanistic insights into active TB ([38]10–[39]12) as well as in
identifying biomarkers to differentiate active TB from LTBI or
uninfected individuals ([40]7, [41]13–[42]17). Blood-based
transcriptomic signatures have also been successful in identifying the
prospect of reactivation of LTBI ([43]18). Recently, Burel and
co-workers used transcriptomics and protein profiling of CD4+ T cells
and identified an LTBI specific signature to separate them from
uninfected cases ([44]19). However, accurately differentiating
non-incipient and non-subclinical LTBI from uninfected individuals has
remained an open question ([45]20).
In this work, we seek to obtain an unbiased and personalized view of
the key immune responses that are either activated or repressed in
response to a latent TB infection as compared to uninfected
individuals. Towards this, we built sample-specific network models of
the immune processes by using a computational pipeline previously
developed in the laboratory that integrates transcriptomes of LTBI
individuals with a human protein-protein interaction network to make
precision networks for each subject. Whole blood transcriptomes for
LTBI subjects were publicly available generated in multiple different
studies. Our network that was recently reconstructed in the laboratory
([46]21), is knowledge-based and covers interactions of proteins coded
by the whole genome, where the interactions and functional influences
are encoded with direction information. Our analysis algorithm
sensitively mines these transcriptome-integrated networks to find the
most important perturbations in LTBI and computes top-ranked perturbed
subnetworks. We observed that although the gene expression profiles of
the LTBI individuals vary greatly from the uninfected samples, there is
a high amount of heterogeneity among the LTBI samples. Our analysis
revealed that the highly heterogeneous gene expression profiles are
related to perturbations in a limited number of pathways, belonging
mostly to innate and adaptive immune responses. It further identified
that the meta-cohort studied here could be stratified into immune
subtypes where each subtype showed a unique perturbation pattern that
arises from different combinations of these pathways, and also the
subtype which might help in better clinical characterization. Our
observations suggest that different molecular perturbations could be
associated with the maintenance of TB latency by the host system.
Methods
Selection of Publicly Available Datasets
We searched the Gene expression repository GEO ([47]22) for microarray
and high throughput sequencing based expression profiling datasets on
latently infected human samples using the keyword search
‘(Tuberculosis) AND. “Homo sapiens”[porgn:_txid9606]’. Using this
search criteria we obtained a list of 196 datasets (both microarray and
RNA-seq data) that we filtered using the following criteria: i)
presence of LTBI as well as uninfected control samples, ii) data
obtained only from whole blood and not cell type-specific, iii) absence
of any comorbidity such as HIV, iv) more than 3 samples for each
condition and v) cohort from an adult age group (over 18 years). 5
datasets matched the selection criteria. The microarray datasets, which
also included samples from active TB patients were used as discovery
datasets in this study. Active TB patients were identified with sputum
smear, chest X-ray and/or culture positivity tests, whereas the LTBI
cases were Tuberculin Skin Test (TST) or IGRA positive and sputum smear
or culture test negative. Details of the 5 transcriptomic datasets
utilized in this work are provided in [48]Table 1 . Further, 4
additional transcriptomic datasets ([49] Table 1 ) were selected which
included samples from LTBI individuals but not uninfected cases. These
datasets were also used as a second layer of validation.
Table 1.
Blood transcriptome datasets obtained from GEO satisfying the inclusion
criteria as mentioned in Methods section.
GEO ID Platform Uninfected Samples LTBI Samples Active TB Samples
Geographic Location Age Group Dataset Usage Reference
[50]GSE19439 Illumina [51]GPL6947 12 17 13 London, UK Adult Discovery
([52]7)
[53]GSE19444 Illumina [54]GPL6947 12 21 21 London, UK Adult Discovery
([55]7)
[56]GSE28623 Agilent [57]GPL4133 37 25 46 The Gambia Adult Discovery
([58]13)
[59]GSE107993 Illumina HiSeq [60]GPL20301 15 16 – Leicester, UK Adult
Validation ([61]23)
[62]GSE107994 Illumina HiSeq [63]GPL20301 50 57 – Leicester, UK Adult
Validation ([64]23)
[65]GSE37250 Illumina [66]GPL10558 – 83 97 South Africa, Malawi Adult
Validation (Set 2) ([67]14)
[68]GSE40553 Illumina [69]GPL10558 – 36 29 South Africa Adult
Validation (Set 2) ([70]24)
[71]GSE79362 Illumina Hiseq
[72]GPL11154 – 153 – South Africa, Gambia Adolescent Validation (Set 2)
([73]18)
[74]GSE101705 Illumina Nextseq [75]GPL18573 – 16 27 South India Adult
Validation (Set 2) ([76]25)
[77]Open in a new tab
Processing of Gene Expression Data and Statistical Analysis
We obtained raw data for each dataset from the GEO. The datasets were
pre-processed and normalized separately since the platform chemistry
was different for each dataset. EdgeR and Limma package in Bioconductor
in the R statistical environment was used for all gene expression
analysis ([78]26–[79]28). Overall, each dataset was subjected to
background correction, normalization and probe-set summarization. The
normalized and log2 transformed gene expression values were used for
calculation of differential expression between conditions, with
moderated t-statistics and Benjamini-Hochberg’s method to control the
false discovery rate.
In case of the datasets with only LTBI samples but not uninfected
cases, the gene expression profiles were computed in comparison to the
uninfected samples from other datasets. The tool ComBat ([80]29) was
used to merge the normalized gene expression data from individual
datasets and remove batch effect, followed by differential gene
expression calculation with the Limma package.
Generating Precision Networks for Each LTBI Subject
We used a comprehensive in-house human protein-protein interaction
network (hPPiN) from Sambarey et al., 2017 in this study ([81]21). In
summary, the network contained high confidence, experimentally
validated physical and functional interactions curated from different
databases, mainly STRING ([82]30), SignaLink 2.0 ([83]31), BioGRID
([84]32) and primary literature ([85]10, [86]33). The proteins in the
hPPiN were represented as ‘nodes’ and the interactions between the
proteins were represented as ‘edges’. The edges were given directions
based on the nature of interaction between the proteins. While the
edges signifying interactions like activation, inhibition,
phosphorylation, etc. were marked as unidirectional, the physical
binding interactions and interactions without functional annotation
were marked as bidirectional. The network comprises 17,070 nodes and
209,582 edges, of which 80% of the edges are directed, which is used as
the base network. We converted the base network to individualized
precision networks for each LTBI subject by integrating it with the
corresponding subject’s transcriptome data. For this, we used an
in-house computational approach ([87]10, [88]17, [89]21, [90]34) that
derives node and edge weights as per Equations 1, 2 and 3. In every
case LTBI samples were compared against the median gene expression
values of uninfected samples from the same dataset.
To mine the weighted LTBI networks, we use our previously developed
computational method ([91]10, [92]17, [93]21, [94]35), which involved
identification of ‘most active’ and ‘most repressed’ paths in each
network based on the path cost. Path cost was computed as the sum of
weights of edges present in the path normalized by the number of nodes
in the path (Equation 4). The least cost paths in each sample
encompassing the top 500 nodes were selected as the most active and
repressed paths. Networks obtained from most-active paths formed the
top-active network (LTBINetA) and the most-repressed paths constituted
the top-repressed network (LTBINetR). LTBIA and LTBIR were combined to
generate the top-perturbed precision network (LTBINetP) for each
individual.
Functional Enrichment Analysis
Pathway enrichment analysis of the precision networks, LTBINetP, was
carried out with Enrichr ([95]36). Enrichment of immune response
pathways was carried out in higher detail by using InnateDB, the
manually curated innate immunity pathway and interactions database
([96]37) with a hypergeometric test and the Benjamini-Hochberg’s
correction method. For both Enrichr and InnateDB enrichment, pathways
with corrected P-value ≤0.05 were considered to be significantly
enriched.
Since the pathway description in the databases includes all the genes
involved in the pathways and many pathways share multiple signal
mediators, there is a significant overlap of genes between different
pathways. Therefore, a pathway can be statistically over-represented if
some of the signal mediators are captured in the gene list, but not the
initiating genes of the pathway, i.e. the ligands or receptors. In
order to avoid the selection of such pathways from further analysis,
another curation step was applied to the enrichment analysis to select
only those pathways for which the ligand or receptor genes (as per the
source databases) were present in the LTBINet analyzed.
Computation of Binary Pathway Over-Representation Score and Clustering
Each of the target pathways was given a binary score of 1 or 0 based on
the over- representation of the pathway associated genes (corrected
p-value ≤1e-05) in the top perturbed network of individual samples. The
precise pattern of pathway perturbation was used for clustering the
samples. Unsupervised clustering of LTBI samples was performed with the
pathway perturbation scores using the ‘ConsensusClusterPlus’ package in
R ([97]38).
Results
A Meta-Analysis of Whole Blood Transcriptomes of LTBI Individuals Reveal
Heterogeneity in Gene Expression
To obtain a systems perspective of the host immune responses associated
with LTBI, we first analyzed 3 publicly available transcriptome
datasets, [98]GSE19439, [99]GSE19444 and [100]GSE28623, that contained
information on samples from LTBI, uninfected healthy controls (HC) as
well as active TB cases and subsequently validated our findings in 2
other independent LTBI transcriptome datasets ([101]GSE107993 and
[102]GSE107994). The 3 datasets together contained samples from 63 LTBI
(TST or IGRA +ve, sputum smear or culture test -ve) and 55 healthy
individuals. We found very few differentially expressed genes (DEGs)
between LTBI and HC, with a fold change ≥ ± 1.5, FDR adjusted p value
≤0.05 (2 in [103]GSE19439, 1 in [104]GSE19444 and 0 in [105]GSE28623)
([106] Figure 1A ), although a significant number of genes (1000) were
found to be differentially regulated (fold change ≥1.5, FDR adjusted p
value ≤0.05) between active TB and HC in all 3 datasets ([107] Figure
1A ). However, with less stringent statistical criteria (unadjusted p
value ≤0.05 for the same fold change cut-off of ≥ ± 1.5), there was a
significant increase in the number of DEGs in each dataset [ranging
from 993 to 6680 ([108] Figure 1C )], but there were no common DEGs
among them ([109] Figure 1B ). Analysis of the DEG lists indicated that
transcriptomes in LTBI samples do exhibit many variations as compared
to HC ([110] Figure 1A ), but the DEG lists differ greatly between
cohorts and also within each cohort, but the same genes were not
differentially expressed across all LTBI samples and the extent of
differential expression of the genes was also not similar ([111] Figure
1D ). This clearly shows that there is a high amount of heterogeneity
amongst the LTBI samples, which leads to the absence of statistically
significant DEGs when many samples are analyzed together in or across
datasets.
Figure 1.
[112]Figure 1
[113]Open in a new tab
LTBI patients have a highly heterogeneous gene expression profile. (A)
The three selected datasets [114]GSE19439, [115]GSE19444 and
[116]GSE28623 showed a significant number of genes to be differentially
regulated in active TB condition but very few DEGs in LTBI when
compared to uninfected cases with conventional thresholds of FDR ≤0.05
and fold change ≥ ± 1.5 criteria. More number of genes could be
considered as DEGs if the criteria are modified to unadjusted p value
≤0.05 and fold change ≥ ± 1.5. However each of these samples contained
thousands of genes with ≥ ± 1.5 fold change in gene expression compared
to uninfected individuals. (B) Venn diagram shows that there are no
common DEGs in LTBI condition (unadjusted p value ≤0.05, fold change ≥
± 1.5) between all three datasets. The number of common DEGs between
any two datasets are also very few. (C) Each LTBI sample contained over
1000 genes with ≥ ± 1.5 fold change, showing a significant difference
between the gene expression profile of the individual samples from
uninfected cases. (D) Heatmap shows the fold change of expression of
the genes in each individual LTBI sample that showed (≥ ± 1.5) fold
change in any sample (union of all genes from [117]Figure 1A , column
5). White signifies no significant fold change (≤ ± 1.5), red stands of
upregulation in gene expression and blue shows downregulation. It is
clear that although all the samples show significant change in the gene
expression profiles, it is not consistent across the samples,
suggesting a heterogeneous response to LTBI in humans.
Genome-Wide Response Network Analysis Identifies Most Frequently Perturbed
Pathways in LTBI Individuals
The inherent heterogeneity in the gene expression profile of LTBI
individuals makes a conventional differential gene expression analysis
miss out on several genes that may play an important role in the host
response in some LTBI individuals but not all. At a functional level, a
pathway might be perturbed as a result of differential regulation in
any of the key genes involved in the pathway, leading to a similar
end-effect at the level of phenotype. To understand if the highly
heterogeneous gene expression profile of the LTBI individuals were
indeed involved in perturbing a similar group of pathways, it became
necessary to study the effect of the gene expression in each case at a
functional level. A network approach involving genome-wide
protein-protein interaction networks integrated with condition specific
gene expression data has been shown to be more efficient in studying
underlying cross-talks between DEGs and identifying functional
alterations between the two conditions ([118]21, [119]35, [120]39). We
have previously developed methods in our laboratory to construct
condition-specific networks by incorporating transcriptome data with a
genome-wide protein-protein interaction network and further to
sensitively mine such networks to identify subnetworks containing
top-ranked perturbations in the condition of interest ([121]21,
[122]34). We adopted this approach to build LTBI specific response
networks for each of the 63 samples. Briefly the method integrates the
in-house human protein-protein interaction network (hPPiN) with gene
expression data and identifies a connected set of perturbed paths (as
compared to HC), from which the top 500 nodes in each case is taken to
constitute a top-ranked response network (LTBINets) ([123] Figure 2A ).
Each of the 63 LTBINets contained one large connected component
containing genes belonging to broadly similar functional categories in
each case. An enrichment analysis based on the Reactome Database showed
that the predominant pathways were those of the immune response, signal
transduction and cell cycle phases ([124] Supplementary File S2.A ).
This clearly indicated that, despite a lack of common DEGs across these
samples, the LTBINets contained genes of the same functional
categories, showing that they shared commonalities in the alteration
patterns at the level of pathways.
Figure 2.
[125]Figure 2
[126]Open in a new tab
Response network analysis workflow. (A) Schematic representation of the
individualized protein-protein interaction network analysis workflow
used in this work. (B) Workflow to identify the most frequently
perturbed immune response pathways in LTBI patients.
We focused on studying the most important immune response pathways and
used InnateDB database for further over-representation analysis.
However, we perceived that the ontologies that describe the pathway
associations for genes are rather broad and multiple pathways share a
large number of overlapping genes. Thus, a pathway can be statistically
enriched despite the lack of the key genes in the input that trigger
the pathway, leading to a possibility of false positives. To address
this problem, we applied a filter that checks for feasibility in
pathway perturbation by eliminating all those where the initiating
genes of the pathway were absent. Specifically, we obtained a
frequency-weighted union of all 63 LTBINets and pruned the pooled
network to retain only those nodes that were present in at least 20% of
LTBINets ([127] Figure 2B ) and studied the most enriched immune
response related pathways that satisfied our criteria of the presence
of initiating genes ([128] Supplementary File S2.C ). We further
manually curated the list of over-represented pathways to identify the
specific immune responses important in LTBI. From the list of pathways
in [129]Supplementary File S2.C only those pathways were retained that
a) represented a specific cellular signaling or immune response
pathway, and b) satisfied our criteria of the presence of pathway
initiating genes (i.e. ligands or receptors) as described in the
corresponding databases in the union LTBINet ([130] Supplementary File
S2.D ). Thereby 13 pathways, comprising 10 distinct immune response
signaling pathways were obtained from the pruned network that could be
clearly linked to LTBI in most cases and refer to these as Immune
response pathways in LTBI (IPLTB). These are the IFNγ mediated
signaling, signaling by IL2, IL4 and IL12, TNFα and TGFβ signaling
pathways, TLR2 mediated signaling and the signaling pathways by
receptor tyrosine kinases EGFR, PDGF and FGFR. All of these pathways
except TNF, IL2 and IL4 mediated signaling were found to be more active
(more frequent in top active networks) in LTBI cases, whereas IL4
signaling activity was found to be repressed (more frequent in top
repressed networks) in LTBI as compared to HC ([131] Supplementary File
S2.E ). TNFα and IL2 pathways showed higher activity in majority but
lower in some LTBI individuals than HC. Most of these pathways have
been reported to play significant roles in active TB, whereas
interferon-γ, IL2 and TNFα mediated signaling have been clearly
implicated in LTBI and host susceptibility to TB ([132] Supplementary
Table S1.A ). Further, there are reports of reactivation of LTBI in
cancer patients treated with receptor tyrosine kinase inhibitors,
providing additional support for the involvement of these pathways in
maintenance of latency ([133]40, [134]41).
Clustering of Samples Based on Perturbed Pathways Indicates Immune Response
Sub-types in LTBI Individuals
It was evident from the network analysis that not every LTBI individual
had perturbation in all the over-represented pathways. We therefore
asked if there were any distinct subtypes among LTBI individuals in
terms of their immune responses. To address this, we considered the
most frequently perturbed immune pathways (IPLTB) and represented each
sample as a binary barcode of perturbations of IPLTB ([135]
Supplementary File S2.F ). We then clustered the samples based on the
extent of similarity in the barcodes with an unsupervised clustering
tool ConsensusClusterPlus with Euclidean distance metrics. The
cumulative distribution function reached an approximate maximum at 9
clusters (C1 to C9), of which 4 clusters (C4, C5, C7 and C9) were large
consisting of 87% of the total samples, whereas the rest of the samples
formed small clusters clearly different from the rest ([136] Figures
3A, B ). The clusters were not dataset-specific, as samples from
different datasets were found in the same cluster ([137] Figure 3C ).
From here onwards, the major clusters C4, C5, C7 and C9 will be called
C-a, C-b, C-c and C-d respectively. Almost all samples showed
perturbation in the EGFR, PDGFR, TNFα and TGFβ signaling pathways,
whereas the other pathways were perturbed only in a fraction of the
LTBI samples. All samples belonging to the cluster C-a, show
perturbations in PDGF, EGF, TGFβ and TNFα mediated signaling pathways,
along with perturbed IL2 and IL4 mediated signaling in some cases.
Samples in C-b show perturbations in IL12/IFNγ axis and IL4 mediated
signaling in addition to the pattern in C-a. Samples in cluster C-c
have the maximum perturbations since all but FGFR mediated signaling
pathways are found to be perturbed in most of them. C-d significantly
differs from C-c in not having any perturbation in IL4 mediated
signaling and from C-a and C-b by showing variation in TLR2 mediated
signaling. The other 9 samples have fewer perturbed pathways than the
other clusters ([138] Figure 4 ). We earlier observed the lack of
common DEGs across all the samples from the datasets. Now we performed
a gene expression analysis for samples in each of the four major
clusters and saw that the number of DEGs (fold change ≥1.5, p value
≤0.05) was increased significantly in each cluster. C-a showed 37 DEGs,
whereas C-b, C-c and C-d showed 80, 59 and 149 DEGs respectively ([139]
Figure 4 ). This shows that the sample heterogeneity is lesser within
each subtype and the extent of perturbation also differed between the
subtypes.
Figure 3.
[140]Figure 3
[141]Open in a new tab
LTBI samples can be divided into groups based on pathway perturbation.
(A, B) The LTBI samples can be divided into 9 substantially stable
clusters based on their pathway perturbation patterns. The cumulative
distribution function (CDF) plot shows that CDF reaches an approximate
maximum as early as k=9 cluster. The clusters are significantly
different from each other. 4 of the clusters contain the majority of
the samples, whereas the few other samples show a highly varied pathway
profile. (C) The dendrogram shows the samples in each cluster. The
clustering was not biased by cohort or dataset as each of the large
clusters contains samples from different datasets.
Figure 4.
[142]Figure 4
[143]Open in a new tab
Pathway perturbation pattern in sample clusters. The pattern of the
perturbations in each sample arranged according to their clusters (as
mentioned in [144]Figure 3C ) is depicted in a binary scoring manner.
Blue signifies that the pathway is perturbed in the patient and light
yellow signifies no perturbation. Red lines demarcate the clusters.
We performed a 2 fold validation of the clustering patterns. In the
first step, we analyzed the 2 RNAseq based transcriptome datasets that
contained samples from both LTBI and uninfected cases. There were a
total of 73 LTBI samples in [145]GSE107993 and [146]GSE107994 to which
a similar network analysis pipeline followed by an enrichment analysis
was applied. For each sample, a binary score was computed for each of
the 10 immune pathways. Each of the 73 samples thus scored were added
to the binary score table of the 63 previous samples and consensus
clustering was performed. 63 of the 73 LTBI samples clustered with one
of the four major clusters (C-a, C-b, C-c and C-d) showing the patterns
to be significant for LTBI condition ([147] Figure 5A ). Similar to the
discovery datasets, C-a contained a minimum number of DEGs (fold change
≥1.5, p value ≤0.05), which is 54, whereas C-d showed the highest
number of DEGs, 325. C-b and C-c had 229 and 134 DEGs respectively
([148] Supplementary Figure 1 ). The significant difference in the
validation set samples from the previous set was that a higher
frequency of perturbations in FGFR mediated signaling pathway was
observed in all clusters and the higher frequency of perturbation in
IL-2 mediated signaling in C-b among validation samples ([149]
Supplementary Figure 1 ).
Figure 5.
[150]Figure 5
[151]Open in a new tab
Validation of clustering pattern with independent RNAseq datasets. (A)
There are 8192 possible combinations of binary scoring patterns for the
13 pathways used for clustering analysis. 14 out of 16 samples from
[152]GSE107993, 49 out of 57 samples from [153]GSE107994 and 13 out of
15 samples from [154]GSE84076 clustered into one of C-a, C-b, C-c or
C-d. This is significantly more enriched than the random possibilities,
validating the clustering pattern of pathways. (B) Similarly, in
datasets without uninfected samples, 251 out of 288 samples clustered
with one of the recognized immune subtypes. Notably, in [155]GSE79362,
153 non-progressors were analyzed and 60 of them clustered with C-d,
showing a bias for non-progressor for this subtype. (C) 44 of the 46
LTBI progressor samples from [156]GSE79362 and [157]GSE107994 clustered
one of the 4 subtypes, with a clear bias towards C-c. 5 of the 12
active TB datasets also showed immune perturbation patterns similar to
the subtype C-c.
In the second validation step, clustering patterns of the LTBI samples
from the datasets lacking uninfected samples were analyzed in a similar
manner. The gene expressions were calculated with respect to uninfected
samples from other datasets as mentioned in the Methods section. These
four datasets contained a total of 288 LTBI samples, of which 251
clustered with one of the four major clusters with significant
hypergeometric p-value ([158] Figure 5B and [159]Supplementary Figure 2
). These validation analyses clearly suggest that the host immune
system in majority of the LTBI individuals follow one of the 4
identified subtypes.
As evident from [160]Figure 4 and [161]Supplementary Figures 1 and
[162]2 , no pathway was perturbed in every sample, nor did any sample
have perturbation in all pathways. Thus each of the clusters showed
specific immune response subtypes associated with TB latency, of which
4 subtypes (C-a, C-b, C-c and C-d) are more frequently observed. This
indicates that different individuals with the same phenotype of latent
TB are associated with different genotypes for either establishing or
maintaining latency. The samples in each cluster have similar
perturbation (and similar genotypes) among them and hence form a
subtype. The analysis clearly shows multiple subtypes, indicating that
latency could be maintained by different molecular routes.
Individuals With Immune Response Like Subtype C-c Might Be at Higher Risk of
TB Reactivation
From the pattern of pathway perturbation, it could be seen that samples
in C-c showed perturbation in the highest number of pathways in the
identified IPLTB. We sought to test if it corresponded to the state of
the disease or possibility of progression into active TB. 46 LTBI
samples with information on confirmed progression into active disease
were available in two datasets, [163]GSE107994 and [164]GSE79362. The
perturbation pattern of IPLTB in these samples was analyzed using the
same network analysis method and their clustering pattern was observed.
Although the samples did not exclusively belong to any of the subtypes,
a bias towards C-c was indeed observed ([165] Figure 5C and
[166]Supplementary Figure 3 ). Interestingly, among the non-progressor
LTBI samples from [167]GSE79362, a clustering bias was observed for
C-d, which had the highest number of non-progressors (60 of the 153)
samples ([168] Figure 5B ). Although, 12 of the 16 genes from the Zak16
signature ([169]18) were present in some of our sample LTBINets, only a
few were DEGs in early time points and no specific pattern in cluster
membership of those genes was observed, indicating that the Zak16
signature is not repurposable for subtyping LTBI samples. We also
studied the membership of genes from other progression signatures,
RISK4 ([170]42) and PREDICT29 ([171]43) in IPLTB, but did not find any
correlation between the signatures and our pathway-based subtypes.
12 transcriptomic datasets ([172] Supplementary Table S1.B ) on whole
blood from active TB and uninfected cases were also analyzed. Since,
gene expression perturbations in active TB condition are significantly
homogeneous ([173] Figure 1A ), the datasets were analyzed as sample
pools instead of as individual samples. The perturbation pattern of
IPLTB in active TB condition showed that 7 of the datasets showed
similarity with the LTBI subtypes, of which 5 resembled C-c ([174]
Figure 1C and [175]Supplementary Figure 3 ). The rest of the datasets
were not similar to any of the subtypes, which can be expected since
the immune response in LTBI and active TB conditions vary greatly. This
also suggests that the samples in C-c might have higher resemblance in
their immune status with active TB state than the other subtypes. From
these two observations it can be suggested that the samples in C-c have
a higher possibility of the presence of subclinical disease or
propensity towards progression into active TB than the samples from any
other subtype.
Discussion
In asymptomatic LTBI cases, the balance between bacterial containment
and persistence is caused due to certain immune responses in the host,
which prevents the pathogen from multiplying and causing the disease.
Although whole blood transcriptomic studies have previously been widely
successful in identifying differential gene expression signatures for
active TB ([176]7, [177]13–[178]17), they have not been successful in
differentiating between the HC and asymptomatic LTBI populations
([179]7, [180]13). All of these studies, along with other gene
expression meta-analysis have clearly shown that there is no
statistically significant differential gene expression between these
two ([181]20). Our independent analysis of the datasets here
corroborates these findings.
We hypothesized that the specific molecular alterations that enable the
host to keep Mtb in a latent state are not universally the same in all
individuals, and thus the gene expression profiles are also vastly
different leading to the observation of no common DEGs across different
cohorts. It is possible however, that there are broad similarities at
the level of pathways among these individuals, which we have
investigated in this work. We adapted the network analysis method from
our previous work and have carried out a meta-analysis of the LTBI
samples at the level of networks. This allowed us to identify the
common key subnetworks and the pathways that are frequently perturbed
in LTBI individuals through different gene expression patterns.
We identified 10 pathways to be commonly perturbed in LTBI samples,
which include multiple cytokine mediated signaling pathways, such as
IL2, IL4, IFNγ, TNFα, which are known to be involved in the host immune
response to active TB. Many of these are also reported to be involved
in Mtb latency in animal models, as summarized in [182]Supplementary
Table S1.A . Our networks indicate an increased activity in the IL12
and IFNγ pathways in LTBI. This is in high agreement with previous
reports from literature indicating that an increase in the levels of
IL12 and IFNγ impart resistance against Mtb ([183]44–[184]46). Our
method also identified an increase in IL2 in many LTBI cases and a
reduction in activity in IL4 signaling, suggesting a high Th1 and a low
Th2 cell activity in LTBI condition, as compared to healthy samples.
The significance of Th1/Th2 balance and the cytokines IL2 and IL4 in
maintaining LTBI has been reported earlier ([185]47–[186]50). Our
networks also suggested that the pro-inflammatory cytokine TNFα
signaling to be perturbed in most of the LTBI samples as compared to
HC. TNFα has been described as a double-edged sword with respect to its
role in tuberculosis. While it is known to have a beneficial effect on
the host by controlling Mtb infection, it can also cause severe tissue
damage in active tuberculosis ([187]51). However, in Th1 dominant LTBI,
it is known to activate macrophages playing a host protective function
([188]52). This also explains increased LTBI reactivation in patients
treated with anti-TNF drugs like infliximab ([189]53, [190]54). IFNγ
and TNFα can also synergistically induce oxidative stress response by
macrophages, leading to an antimycobacterial effect ([191]55). Our
analysis also shows TLR2 mediated responses to be high in some LTBI
individuals (clusters 7-9). TLR2 activation can lead to an induction of
an antimicrobial peptide cathelicidin and also induce Th1 cytokine
release, which in turn helps in containing mycobacterial infection
([192]56–[193]58). Polymorphisms in TLR2 pathway genes have also been
linked to tuberculosis susceptibility ([194]59, [195]60). Since
mycobacterial cell wall components can directly induce TLR2 mediated
response ([196]57), it is possible that the extent of TLR2 responses is
dependent on the bacterial burden in the individual. These pathways are
also perturbed in active TB condition to a different extent as reported
in literature as well as observed in our analysis when active TB
samples were compared to HC and LTBI conditions following a similar
method ([197] Figure 6A ).
Figure 6.
[198]Figure 6
[199]Open in a new tab
Status and function of IPLTB in uninfected (HC), LTBI and active TB
patients. (A) The extent of activity of the IPLTB pathways is active TB
condition compared to HC and LTBI was analyzed using a similar network
analysis method. From this analysis, as well as literature reports, the
status of these pathways in the 3 conditions, HC, LTBI and active TB,
are summarized in a semiquantitative manner. IL12/IFNγ, TLR2 and EGFR
mediated signaling pathways were found to be more active in LTBI
compared to HC and further activated in active TB. IL2 and IL4 showed
an opposite trend of activity in LTBI and active TB. TGF and TNF were
more active in LTBI and active TB compared to uninfected, but the
difference between LTBI and active TB cannot be commented upon from our
analysis. PDGFR mediated pathway was observed to be more active in both
LTBI and active compared to HC, but the extent of activity was higher
in LTBI. FGFR mediated signaling was not observed to be one of the top
perturbed pathways in active TB condition. The pathways previously
reported to have an important function in the context of active and
LTBI are marked with a tick, whereas no previous report is marked with
a cross. (B) The possible effects of the IPLTB on the host immune
system are drawn as a simplified schematic. The pathways can be linked
to inflammation, macrophage activation, antimycobacterial effects,
granuloma formation and fibrosis, etc., which can finally help in
maintaining TB latency. The different clusters use some of the pathways
from IPLTB, as shown in insets, to achieve TB latency. Red and green
correspond to perturbed activities, in comparison to HC. Green denotes
the pathways perturbed in all the clusters, whereas red shows pathways
perturbed in different clusters. Dark red signifies that most of the
members of the cluster show the perturbation whereas light red
signifies about half of the members to show the perturbation.
The successful containment of Mtb in the host requires a protective
granuloma structure with a fibrotic capsule and a necrotic center that
can restrict the pathogen ([200]61–[201]63). In addition to TLR2 and
cytokine signaling, our analysis indicated TGFβ, EGFR, PDGFR and FGFR
mediated signaling responses to be highly active in LTBI. TGFβ levels
are also known to be high in active TB and are required for
intracellular survival of Mtb ([202] Figure 6A ) ([203]64, [204]65).
Drug repurposing studies have identified that Gefitinib, an EGFR
inhibitor, restricts Mtb growth, providing support for the involvement
of the EGFR signaling pathway in tuberculosis infection ([205]66). High
activities of these pathways are known to be beneficial to the pathogen
and thus are associated with active TB ([206]67). Our analysis suggests
that an increase in the activity of these pathways as compared to HC is
associated with LTBI. Further support for the involvement of the EGFR
pathway comes from an observation that Erlotinib, an EGFR inhibitor
leads to reactivation of LTBI ([207]41). All of these pathways can
cause collagen deposition and fibrosis ([208]68–[209]72). Put together,
these indicate that an increase in activity of the four pathways
compared to HC might help in increasing fibrosis of granuloma and
restricting Mtb dissemination in LTBI cases. EGFR signaling can
increase proliferation of anti-inflammatory M2 type macrophages in
tumor-like environments and adoptive transfer of M2 macrophages have
been reported to attenuate Mtb infection ([210]73, [211]74). It is
possible that a moderately enhanced activity through this pathway could
result in LTBI, while a highly enhanced activity is associated with
active TB ([212] Figure 6A ).
Although all of these pathways can be involved in controlling Mtb
infection, we observed that only a subset of these pathways are
perturbed in any LTBI individual. We could divide the total LTBI
population into 4 large subtypes with a few outliers based on different
molecular routes taken by the individual immune system to achieve
latency. It shows that although there is a similarity between the host
responses in LTBI individuals, there also exists heterogeneity in
exactly how the host system controls the infection. All of the
identified pathways can lead to inflammation regulation and
antimicrobial effect or granuloma formation ([213] Figure 6B ).
Therefore, the host immune system can achieve latency of Mtb by
modulating only a combination of these, giving rise to different
possible molecular routes to latency ([214] Figure 6B ). We also
observed that LTBI progressors and active TB samples have a higher
tendency of being clustered with C-c than the others. Although clinical
information on the progression time-course was not available for the
other LTBI samples in discovery and validation datasets, this
clustering bias of known progressors and active TB cases might indicate
that the other LTBI samples belonging to cluster C-c have a higher
propensity for subclinical TB or reactivation. Among the four subtypes,
C-c shows perturbation in the maximum number of pathways from IPLTB,
being the only subtype to show perturbations in both TLR2 mediated
signaling and IL4 mediated signaling. Also, C-c is the only subtype
that shows perturbed IL4 signaling in all the samples. Increased TLR2
mediated signaling could be caused by a comparatively higher bacterial
burden in this subtype as TLR2 can recognize Mtb antigens and it might
be related to a higher chance of reactivation of LTBI. Repression of
IL4 signaling might play a crucial role in such cases as IL4 can
promote pathogenesis, and is therefore repressed in all the progressor
LTBI samples before reactivation. Overall, the immune status of the
samples in C-c varies the most from uninfected conditions. It is
suggestive of LTBI cases with a higher chance of progression into
active disease having more perturbations in their immune system even at
time points much before TB incidence. This information can provide
insights into immune response responsible for susceptibility towards
LTBI reactivation and also aid in clinical decision making for
personalized therapy of LTBI.
To the best of our knowledge, this is the first systematic analysis
that compares the LTBI transcriptomes with those of uninfected
individuals that provides an insight into the immune response of the
LTBI condition. These identified pathways could possibly be further
explored as potential preventive targets for LTBI reactivation in
future. This knowledge can also be useful in taking a cautious approach
while treating any LTBI subject with drugs targeting these pathways for
other diseases.
Equations
For active precision network,
[MATH:
NWi=FC(i)=(Expression of gene ’i’ i
mi>n one LTBI s<
mi>ample)(Median expression of gene ’i’ in unin
mi>fected
samples) :MATH]
(1)
For repressed precision network,
[MATH:
NWi=FC(i)=(Median expression of gene ’i’ in unin
mi>fected
samples)(Expression of gene ’i’ i
mi>n one LTBI s<
mi>amples) :MATH]
(2)
Where, NW is node weight and FC is fold change.
The edge weight (EW) between nodes ‘i1’ and ‘i2’ is calculated based on
the node weights as,
[MATH:
EWi1
i2=1
mn>NWi<
mn>1×NWi2<
/mfrac> :MATH]
(3)
[MATH:
Path Cost=∑i=1nEWi1i2<
/mrow>Number of nodes <
/mtext>in path :MATH]
(4)
Nomenclature
EGFR, Epidermal growth factor receptor; FGFR, Fibroblast growth factor
receptor; IFNγ, Interferon gamma; IL, Interleukin; HC, Healthy
(Uninfected) control; LTBI, Latent tuberculosis infection; Mtb,
Mycobacterium tuberculosis; PDGFR, Platelet derived growth factor
receptor; TB, Tuberculosis; TGFβ, Transforming growth factor beta; TLR,
Toll like receptor; TNFα, Tumor necrosis factor alpha
Data Availability Statement
Publicly available datasets were analyzed in this study. This data can
be found here: [215]https://www.ncbi.nlm.nih.gov/geo/ under the IDs
[216]GSE19439, [217]GSE19444, [218]GSE28623, [219]GSE107993,
[220]GSE107994, [221]GSE84076,[222]GSE40553, [223]GSE79362,
[224]GSE37250, and [225]GSE101705. Codes used for analysis in this work
are available at
[226]https://github.com/chandralab-iisc/LTB_immune_Subtype.
Author Contributions
NC conceptualized, designed, and supervised the study. UB performed
data curation, methodology, analysis, and interpretation. PB performed
data curation and methodology. AS provided supervision and critical
insights. UB wrote the first draft of manuscript. All authors revised
and approved the submitted manuscript.
Funding
The authors thank the Department of Biotechnology, Government of India
for providing the funding for this study in the form of a DBT-IISc
partnership grant.
Conflict of Interest
NC is a co-founder of the companies qBiome Research Pvt Ltd and
HealthSeq Precision Medicine Pvt Ltd. They had no role in this
manuscript.
The remaining authors declare that the research was conducted in the
absence of any commercial or financial relationships that could be
construed as a potential conflict of interest.
Acknowledgments