Abstract

   The immune system is a highly complex and dynamic biological system. It
   operates through intracellular molecular networks and intercellular
   (cell–cell) interaction networks. Systems immunology is an emerging
   discipline that applies systems biology approaches, integrating
   high-throughput multi-omics measurements with computational network
   modeling, to better understand immunity at various scales. In this
   review, we summarize key omics technologies and computational
   approaches used for immunological studies at both population and
   single-cell levels. We highlight hidden driver analysis based on
   data-driven networks and comment on the potential of translating
   systems immunology discoveries to immunotherapy of cancer and other
   human diseases.

   Keywords: Systems immunology, scRNA-seq, gene regulatory network,
   cell–cell communication, hidden driver analysis

Introduction

   The immune system, one of the most complex and dynamic biological
   systems in mammals, comprises diverse cell types with varying
   functional states. Of the two major arms of the immune system, the
   innate immune system, composed of macrophages, dendritic cells,
   neutrophils and other cells, serves as the first line of defense by
   mounting immediate and potent immune and inflammatory responses against
   invading pathogens and other immunological insults. Cells in the
   adaptive immune system, including T and B cells, have a more
   specialized role in immune reactions and are characterized by antigen
   specificity and long-term memory. These different elements interact as
   an integrative system to give rise to proper immune responses and
   regulation, and play crucial roles in protecting host health against
   viruses, bacteria, parasites and tumors. Dysfunctions in the immune
   network may lead to autoimmune, malignant, and inflammatory diseases.
   Characterizing these diverse cell types, their unique molecular
   features, and their interactions is key to successfully manipulating
   the immune system for therapeutic applications. Advances in
   high-throughput profiling technologies, particularly the emerging
   single-cell omics platforms, enable comprehensive characterization of
   the immune components at multiple scales. However, immunity is not
   merely a sum of its components, and its behavior cannot be explained or
   predicted solely by examining individual components. Therefore, systems
   biology approaches are essential for decoding the cellular complexity,
   plasticity, and functional diversity of the immune system, leading to
   the emerging field of systems immunology to better understand how the
   immune system works as a whole in health and disease.

   Gene regulatory networks function as the crucial molecular determinants
   of cell fate and state by governing gene expression programming and
   reprogramming in immune development and homeostasis [[28]1]. Signaling
   and epigenetic factors are also crucial drivers of immunological
   functions and are likely druggable, making them promising therapeutic
   targets. However, it is often difficult to identify many of these
   drivers (hence known as “hidden drivers”), because they may not be
   genetically altered or differentially expressed at the mRNA or protein
   levels but, rather, are altered by posttranslational modifications
   (PTMs; e.g., phosphorylation) or other mechanisms. Moreover, immune
   responses are mediated by both intracellular gene networks and
   crosstalk between many types of immune cells in specific tissues and
   microenvironmental contexts, and their dysregulation can lead to
   diseases, including cancer and inflammatory disorders. Therefore,
   molecular and cellular networks, together with their drivers and
   “hidden” drivers (which cannot be easily detected by conventional
   approaches), must be systematically dissected to develop effective and
   curative immunotherapies for diseases such as cancer [[29]2].

Omics technologies for immunology research

   Technological advances in high-throughput and high-bandwidth profiling,
   phenotyping and perturbation assays have contributed to rapid advances
   in systems immunology. A variety of omics technologies at both
   population and single-cell levels have played important roles in
   improving our understanding of the immune system ([30]Figure 1). Each
   technology has its advantages and limitations, and understanding these
   factors is essential to devise effective and reliable systems
   approaches that address immunological questions at the appropriate
   resolution. In this section, we discuss these technologies by
   summarizing the essential aspects of their proper use, with example
   applications and certain limitations.

Figure 1.

   Overview of the omics profiling technologies to characterize the immune
   system of human and mouse at population and single-cell levels.

Population level

   Transcriptome profiling by microarray or RNA-based next-generation
   sequencing (RNA-seq) is the most widely used omics method in immunology
   research. Transcriptome analysis has provided instrumental insights
   into the mechanisms of immune system development and homeostasis under
   steady state, and transcriptional dynamics during the immune response
   to antigens or pathogens, including the identification of diverse
   immune cell types and functional states [[33]3]. As the cost of
   sequencing decreases, RNA-seq, particularly bulk RNA-seq, has become
   the more prevalent technology for gene expression profiling, with
   several advantages over microarray technology: high coverage and
   sensitivity (detecting low-abundance transcripts); detection of
   splicing events, gene fusions, and small RNAs; low background noise and
   batch effects; and the ability to handle low RNA input (down to 10 pg).
   The community-driven and publicly available databases of gene
   expression profiles, such as Gene Expression Omnibus (GEO), have
   enabled data mining across platforms, studies, and species. A few
   curated immune-specific databases with analysis and visualization tools
   have provided valuable resources for immunology researchers, including
   ImmGen [[34]4], ImmPort [[35]5], ImmuneSpace [[36]6], and 10K Immunomes
   [[37]7].

   The expression levels of mRNA and protein can differ substantially for
   many genes [[38]8], especially during dynamic transitional states,
   when there is a temporal delay between transcription and translation
   [[39]9]. Moreover, posttranslational modifications, such as
   phosphorylation, are crucial regulators of protein functions and
   signaling, but are poorly correlated with mRNA or total protein
   expression. With the recent advances in mass spectrometry (MS)
   analytical technologies [[40]10], in-depth proteomic profiling can now
   identify more than 10,000 proteins (whole proteomics) and 30,000
   phosphopeptides (phosphoproteomics) across multiple samples
   simultaneously [[41]11**,[42]12,[43]13]. Tandem mass tagging (TMT)
   [[44]14] and label-free quantitation (LFQ) are two common proteomic
   methods to quantify the differential abundance of expressed proteins,
   with the TMT method recently shown to have higher precision and
   coverage than the LFQ method [[45]15,[46]16]. Despite the challenge of
   covering the entire proteome and PTM landscape, current MS-based proteomic
   technologies are capable of providing comprehensive characterizations
   of proteome dynamics and biological insights into gene regulation and
   signaling circuits in immunology, such as T-cell activation
   [[47]11**,[48]17] and host-pathogen interaction [[49]18]. Proteomics by
   affinity purification-mass spectrometry (AP-MS) is also commonly used
   to identify protein-protein interactions (PPIs) [[50]19**] that help
   dissect the molecular mechanisms of crucial immunological modulators,
   for example, Mst1 signaling in regulatory T cells [[51]20]. Recently,
   advanced MS-based platforms have been developed to profile and explore
   the metabolome that may shape the functions of immune cells (e.g.,
   metabolomics [[52]21] and lipidomics [[53]22,[54]23]).

   DNA-based next-generation sequencing (NGS) has revolutionized the study
   of many fields in biology, including immunology [[55]24**].
   Whole-genome or -exome sequencing and targeted DNA sequencing are now
   routinely used to identify somatic genetic alterations associated with
   cancer and other diseases, spurring the advent of precision medicine
   [[56]25]. In basic immunology research, NGS is commonly used to dissect
   protein-DNA interactions (ChIP-seq) [[57]26], protein-RNA interactions
   (CLIP-seq) [[58]27], DNA methylation (Bisulfite-seq) [[59]28],
   chromosomal interactions (Hi-C) [[60]29], and chromatin accessibility
   (ATAC-seq) [[61]30].

   The revolutionary CRISPR/Cas-based genome engineering technologies
   enable the use of genome-wide functional perturbation screening
   [[62]31] to systematically interrogate novel players and circuits that
   regulate or modulate immune development, homeostasis and response
   [[63]32,[64]33**]. CRISPR and conventional RNAi screens perform
   comparably for identifying essential genes [[65]34]. Novel CRISPRi/a
   technologies provide a complementary but superior approach to RNAi by
   repressing or activating gene expression at the transcriptional level,
   while RNAi represses gene expression at the mRNA level [[66]35].

Single-cell level

   The immune system encompasses various cell types and functional states.
   Population- or bulk-based profiling performed by averaging results from
   thousands of cells of distinct types presents an inherent heterogeneity
   problem for data analysis and interpretation. However, the advent of
   single-cell technologies to profile the transcriptome (scRNA-seq)
   [[67]36], proteome (mass cytometry or CyTOF, NanoLC-MS)
   [[68]37,[69]38], genome [[70]39], and chromatin accessibility or
   epigenome (scATAC-seq, scChIP-seq, scBS-seq, scHi-C)
   [[71]40,[72]41,[73]42,[74]43,[75]44] has provided an unprecedented
   opportunity to overcome this challenge by simultaneously quantifying
   molecular features at the single-cell resolution. Indeed, single-cell
   technology was recognized as the breakthrough of the year for 2018. In
   immunology research, scRNA-seq [[76]45**] and mass cytometry [[77]37]
   are widely adopted.

   In the last few years, advances in cell suspension, automation, and
   microfluidics technologies and the implementation of unique molecular
   identifiers have boosted the scRNA-seq field by improving the
   throughput (the number of cells), sensitivity (the number of
   uniquely detected genes), precision (level of noise), and
   reproducibility [[78]46**]. The scRNA-seq technology has been widely
   used in immunology to reveal immune cell heterogeneity and dynamics in
   healthy and malignant conditions [[79]47*,[80]48*,[81]49*]. Significant
   efforts have been invested to profile the entire human and mouse cell
   atlas [[82]50,[83]51,[84]52]. Because of their high cell throughput,
   droplet-based scRNA-seq platforms, including 10X Genomics
   Chromium [[85]53], inDrop [[86]54], and Drop-seq [[87]55], are becoming
   more popular than FACS- or plate-based protocols for immunology studies
   [[88]56]. However, plate-based methods have no sequencing bias on the
   5’ or 3’ end of transcript tags and capture more molecules than
   droplet-based platforms [[89]57]. The combined use of both platforms
   can provide more comprehensive and in-depth information [[90]58].
   Imaging-based, single-molecule fluorescence in situ hybridization
   (smFISH) [[91]59,[92]60] is another powerful, emerging technology for
   high-throughput single-cell transcriptomics with additional spatial
   information integrated, but has yet to be applied to the immune system.

   Flow cytometry uses fluorescent antibodies to simultaneously profile
   multiple proteins per cell and has been the mainstay for
   immune-phenotyping. Mass cytometry overcomes the limitation associated
   with the spectral overlap of fluorophores in flow cytometry by using
   metal-conjugated antibodies that increase the dimension [[93]37]. It
   has enabled the identification and characterization of a variety of
   immune cell types and states in the mammalian immune system with
   emerging applications in the clinic [[94]61]. However, this technology
   is limited to a small number of pre-defined parameters (e.g., surface
   markers), and the profiling of these parameters depends on the
   availability of protein-specific antibodies. More recently, a
   multiplexed immunofluorescence method has been developed to obtain 40
   protein readouts of thousands of cells in situ [[95]62], which may also
   be adopted in immunology.

   To understand cellular behaviors in-depth, strategies to integrate
   multiple single-cell omics technologies or combine them with
   population-based profiling to simultaneously profile various dimensions
   of biological information from the same cell have emerged [[96]63]. For
   instance, recent studies have combined profiles of single-cell and bulk
   transcriptomes [[97]64]; transcriptomes and chromatin states
   [[98]65,[99]66*]; transcriptomes and protein epitopes
   [[100]67,[101]68,[102]69]; transcriptomes obtained by scRNA-seq and
   those obtained by smFISH [[103]70*]; epigenomes and protein epitopes
   [[104]71]; transcriptomes and functional genomes
   [[105]72,[106]73,[107]74]; and genome, transcriptome, and methylome
   data [[108]75]. Application of these cutting-edge integrative
   technologies to immunological questions will likely provide new insights
   into the immune system.

Computational approaches for systems immunology

   Multi-omics technologies providing population– and single-cell–level
   information give rise to remarkably rich and complex datasets with
   which to tackle immunological questions. However, interpretation and
   integration of such “big” data remain a challenge and a barrier to
   broad implementation of systems approaches in immunological studies. In
   this section, we review common computational algorithms and strategies
   for in-depth analysis and integration of multi-omics data in systems
   immunology ([109]Figure 2). We start with immune cell deconvolution
   to identify proportions of cells within heterogeneous populations, and
   then discuss various systems biology strategies to dissect the
   molecular pathways or features associated with immune cell identity,
   function and response, with an emphasis on hidden driver analysis based
   on data-driven networks to decode regulatory mechanisms of the immune
   system.

Figure 2.

   Overview of common computational analyses and algorithms in systems
   immunology.

Deconvolution of the immune cellular heterogeneity

   One of the most frequent analyses in immunology is immune-cell
   phenotyping because extensive cellular heterogeneity underpins the
   functional diversity of the immune system. For bulk microarray or
   RNA-seq gene expression profiles, linear regression-based deconvolution
   algorithms [[112]76,[113]77,[114]78] have been developed to predict the
   frequency of diverse cell subsets based on predefined signatures.
   However, these approaches rely on prior knowledge of existing immune
   cell types. In contrast, the widespread adoption of single-cell
   profiling enables unbiased identification of known and unknown subsets
   of immune cells. Several algorithms for clustering analysis, cell-type
   identification, and visualization from single-cell transcriptomics data
   have emerged. For instance, SC3 employs a consensus k-means clustering
   method with a combination of various distance metrics and initial
   conditions that improves the accuracy and robustness of clustering in
   comparison with previous approaches [[115]79]. For a more detailed
   discussion of clustering algorithms for scRNA-seq, we refer readers to
   other comprehensive reviews [[116]80**,[117]81]. However, more advanced
   and efficient algorithms for scRNA-seq analysis are needed to capture
   the nonlinear cell–cell correlations, to reduce noise from the
   “dropout” effects, and to handle datasets with millions of cells.
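
   The regression-based deconvolution idea described above can be
   sketched in a few lines. This is only an illustration under stated
   assumptions, not the implementation of any cited tool: the function
   name deconvolve, the toy signature matrix, and the simple
   projected-gradient solver for non-negative least squares are all
   hypothetical choices made for brevity.

```python
def deconvolve(signature, bulk, n_iter=5000):
    """Estimate cell-type fractions f >= 0 minimizing ||signature . f - bulk||^2.

    signature: list of gene rows; each row lists the expression of that gene
               in every reference cell type (hypothetical values below).
    bulk:      list with the bulk expression of the same genes.
    """
    n_genes, n_types = len(signature), len(signature[0])
    # Normal-equation terms: gram = S^T S and rhs = S^T y.
    gram = [[sum(signature[g][i] * signature[g][j] for g in range(n_genes))
             for j in range(n_types)] for i in range(n_types)]
    rhs = [sum(signature[g][i] * bulk[g] for g in range(n_genes))
           for i in range(n_types)]
    # 1 / trace(gram) never exceeds 1 / (largest eigenvalue), so the step is safe.
    step = 1.0 / sum(gram[i][i] for i in range(n_types))
    coef = [0.0] * n_types
    for _ in range(n_iter):
        grad = [sum(gram[i][j] * coef[j] for j in range(n_types)) - rhs[i]
                for i in range(n_types)]
        # Gradient step, then projection onto the non-negative orthant.
        coef = [max(0.0, c - step * g) for c, g in zip(coef, grad)]
    total = sum(coef)
    # Rescale so that the estimated fractions sum to one.
    return [c / total for c in coef] if total > 0 else coef


# Toy example: three marker-like genes plus one uniformly expressed gene.
signature = [
    [10.0, 1.0, 0.0],
    [0.0, 8.0, 1.0],
    [1.0, 0.0, 9.0],
    [5.0, 5.0, 5.0],
]
true_fractions = [0.5, 0.3, 0.2]
bulk = [sum(s * f for s, f in zip(row, true_fractions)) for row in signature]
estimated = deconvolve(signature, bulk)
```

   In this noise-free toy case the estimate recovers the mixing
   fractions used to build the bulk profile. The sketch also makes the
   limitation noted above concrete: a cell type absent from the
   signature matrix cannot be estimated.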

Gene signature and pathway enrichment analysis

   Genome-wide transcriptomic and proteomic profiles of immune cells
   following treatments, stimulations or genetic perturbations provide
   valuable insights into molecular signatures and pathways that define
   cell identity, gene regulation, and immune responses. Differential gene
   expression analysis is the mainstream strategy to define a gene
   signature, followed by functional or pathway enrichment by
   hypergeometric test or gene set enrichment analysis (GSEA)-type
   approaches [[118]82,[119]83,[120]84,[121]85]. However, the signature
   analysis may be limited by poor correlation between different studies,
   as signature genes derived from independent experiments may not be
   entirely consistent. Additionally, the pathway databases may lack
   context-specific information and are limited by incomplete or
   inaccurate prior knowledge. Immune cell deconvolution gives the
   proportion of heterogeneous cell types while functional enrichment
   analysis defines the molecular features in each cell type. A
   combination of these two approaches facilitates downstream analysis of
   intracellular and intercellular interactions.
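
   As a concrete illustration of the hypergeometric test mentioned
   above, the following minimal sketch computes the probability of
   observing at least a given overlap between a gene signature and a
   pathway by chance. The function name and the toy numbers are
   assumptions for illustration; real analyses would also correct for
   multiple testing across pathways.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_pathway, n_signature, n_overlap):
    """Upper-tail hypergeometric p-value, P(X >= n_overlap).

    n_universe:  number of genes measured in the experiment
    n_pathway:   number of those genes annotated to the pathway
    n_signature: number of genes in the signature (e.g., differentially expressed)
    n_overlap:   number of signature genes that belong to the pathway
    """
    upper = min(n_pathway, n_signature)
    total = comb(n_universe, n_signature)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_signature - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy example: 10 genes measured, 4 in the pathway, 3 in the signature,
# 2 of which overlap: (C(4,2)*C(6,1) + C(4,3)*C(6,0)) / C(10,3) = 40/120.
p_value = hypergeom_enrichment_p(10, 4, 3, 2)
```

   The choice of the gene universe strongly affects the result, which
   is one reason signatures from independent studies may yield
   inconsistent enrichments.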

Intracellular gene network inference

   The availability of large-scale profiling platforms enables the study
   of relationships among the molecular elements (i.e., intracellular gene
   networks) in the immune system. Most of the network reconstruction
   methods are based on gene expression profiles of perturbation
   experiments (e.g., gene silencing, deletion or overexpression), as
   previously reviewed [[122]86,[123]87]. Here, we highlight two common
   network inference strategies that use baseline transcriptomic data. One
   is co-expression network analysis by WGCNA [[124]88] based on Pearson
   or Spearman correlations. However, co-expression networks usually
   contain a large number of redundant interactions that lack biological
   relationships. To overcome this problem, ARACNe [[125]89] uses mutual
   information to capture nonlinear gene–gene relationships and applies
   data processing inequality to remove redundant edges. It has been
   widely used to infer transcription factor (TF) regulatory networks from
   gene expression data. Recently, SJARACNe [[126]90] was developed to
   scale up and extend ARACNe to infer both TF regulatory and signaling
   networks from large-input datasets, including scRNA-seq data. For
   example, SJARACNe was used to reverse-engineer the signaling
   interactome of dendritic cells (DCs), leading to novel molecular
   insights into the functions of DC subsets [[127]91**]. For scRNA-seq
   data, SCENIC utilizes TF motif databases to reconstruct regulatory
   networks, which improves clustering and reduces batch effects
   [[128]92]. Other modeling approaches, including Bayesian network,
   Boolean network, and diffusion or differential equation–based network
   approaches, are used for inference of small-sized networks [[129]93].
   For example, a Bayesian network was used to identify causal
   relationships among molecular and clinical features of Alzheimer’s
   disease [[130]94]. However, it
   remains challenging to scale up these approaches for genome-wide
   networks, because of the high complexity of parameters and limited
   samples [[131]95]. To complement and improve networks predicted in
   silico, experimental approaches are also used to directly infer
   subnetworks of proteins of interest (e.g., TF regulatory network by
   ChIP-seq [[132]96], post-transcriptional networks by CLIP-seq
   [[133]27], PPI by AP-MS [[134]19], enzyme-substrate network by
   PTM-enriched proteomics [[135]97], and metabolic networks by
   metabolomics [[136]21]). However, these networks are limited to
   selected proteins and lack generalizability.
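
   The two ingredients attributed to ARACNe above, mutual information
   and the data processing inequality (DPI), can be sketched as follows.
   This is a simplified illustration and not the ARACNe or SJARACNe
   implementation: it assumes rank-based binning with a plug-in
   mutual-information estimate, omits bootstrapping, significance
   thresholds, and the DPI tolerance, and all function names are
   hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def discretize(values, n_bins=2):
    """Rank-based, equal-frequency binning of one gene's expression values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def mutual_information(x, y):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def dpi_prune(mi):
    """Drop the weakest edge of every fully connected gene triplet (DPI)."""
    genes = sorted({gene for edge in mi for gene in edge})
    removed = set()
    for a, b, c in combinations(genes, 3):
        triangle = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if all(edge in mi for edge in triangle):
            removed.add(min(triangle, key=lambda edge: mi[edge]))
    return {edge: score for edge, score in mi.items() if edge not in removed}


def infer_network(expr, n_bins=2, min_mi=0.0):
    """expr maps gene name -> expression values across samples."""
    binned = {gene: discretize(values, n_bins) for gene, values in expr.items()}
    mi = {}
    for a, b in combinations(sorted(expr), 2):
        score = mutual_information(binned[a], binned[b])
        if score >= min_mi:
            mi[frozenset((a, b))] = score
    return dpi_prune(mi)


# Toy regulatory chain A -> B -> C: A and C are related only through B.
expr = {
    "A": [0, 0, 0, 0, 1, 1, 1, 1],
    "B": [0, 0, 0, 1, 1, 1, 1, 0],
    "C": [0, 0, 1, 1, 1, 1, 0, 0],
}
network = infer_network(expr)
```

   In the toy chain, the A-C edge carries the least information in the
   triplet and is removed, leaving only the direct A-B and B-C edges.
   This is the sense in which the DPI prunes indirect interactions.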

Network-based integrative analysis

   Integration of multi-tier omics data increases the sensitivity and
   reliability of discoveries in the immune and other complex biological
   systems by aggregating information across multiple layers to increase the
   signal-to-noise ratio [[137]93,[138]98]. This approach is particularly
   important in understanding immune system function, given the high
   complexity of cellular components and molecular circuits in the immune
   system. However, different omics platforms have distinct features and
   dimensions, making the meta-analysis challenging. The most popular
   strategy is to superimpose co-expression or regulatory networks,
   constructed from transcriptomes and/or knowledge-based network
   databases (e.g., MSigDB, PPI, TF-target, kinase-substrate) on various
   omics data to identify network modules that control immune cell
   development and response [[139]99]. For example, this strategy has been
   applied to integrate temporal transcriptome, proteome, and
   phosphoproteome data, leading to the identification of novel signaling
   circuits and bioenergetics pathways that mediate T-cell quiescence exit
   [[140]11]. Additionally, PARADIGM [[141]100] integrates genomic and
   transcriptomic alterations to identify dysregulated pathways.
   NetGestalt [[142]101] defines the hierarchical architecture in the
   network of omics data clustering. CellNet [[143]102] utilizes
   co-expression networks to determine the cell identity and master
   regulators of cell types/states. A PageRank-based approach combines
   ATAC-seq and transcriptomic datasets to identify master regulators of
   T-cell residency in non-lymphoid tissues and tumors [[144]103**]. Both
   VIPER [[145]104] and NetBID [[146]91**] use ARACNe/SJARACNe-derived
   regulatory networks to infer protein activities in individual samples
   and master regulators associated with phenotypes. While VIPER is
   focused primarily on gene expression data, NetBID uses a distinct
   activity inference algorithm and a Bayesian framework to integrate
   multiple omics data.
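
   To illustrate why aggregating evidence across omics layers raises the
   signal-to-noise ratio, the sketch below pools per-layer z-scores for
   one candidate regulator with Stouffer's method. This is a deliberately
   simpler stand-in and not the Bayesian framework used by NetBID; it
   assumes the layers provide independent evidence, and the z-scores and
   function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def combine_layers(z_scores):
    """Stouffer's method: pool per-layer z-scores for one candidate driver."""
    return sum(z_scores) / sqrt(len(z_scores))


def one_sided_p(z):
    """Upper-tail p-value of a standard normal z-score."""
    return 1.0 - NormalDist().cdf(z)


# Hypothetical z-scores for one driver in three omics layers
# (transcriptome, proteome, phosphoproteome); none is striking on its own.
layer_z = [1.5, 1.8, 1.6]
combined_z = combine_layers(layer_z)
combined_p = one_sided_p(combined_z)
```

   Consistent but individually modest signals yield a combined z-score
   larger than any single layer. Correlated layers would violate the
   independence assumption and overstate the combined evidence.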

Hidden driver analysis based on data-driven and context-specific networks

   In addition to transcription factors that are the focus of most
   network-based algorithms, signaling and epigenetic factors are also
   crucial drivers of immunological functions. However, many of these
   factors are hidden drivers, because their activities are associated
   with PTMs but not with genetic alterations or expression abundance. PTM
   proteomics-based direct measurements of protein activities are
   technically challenging. Here, we highlight NetBID [[147]91]
   ([148]Figure 3), a recently developed algorithm to identify hidden
   drivers from multi-omics data by using data-driven networks and
   Bayesian inference. In our study of DC subset functions [[149]91],
   NetBID superimposed a DC-specific signaling interactome, which was
   computationally reconstructed from a set of transcriptomic profiles of
   total DCs, onto multi-layer omics datasets (transcriptome, proteome,
   phosphoproteome) to infer activities of signaling proteins in CD8α^+
   and CD8α^− DCs, followed by a Bayesian approach to integrate
   information at all levels, leading to the identification of putative
   hidden drivers that selectively modulate functions of DC subsets. In
   particular, NetBID identified the Hippo kinase Mst1/Stk4 as a
   hidden driver selectively active in CD8α^+ DCs, a finding that was
   further validated by genetic and functional experiments. Of note, Mst1
   is not differentially expressed at the mRNA level, and Mst1 protein
   expression is even lower in CD8α^+ than in CD8α^− DCs. One advantage of
   NetBID for successfully capturing Mst1 is that the Mst1 subnetwork
   inferred in silico is enriched in its putative downstream targets as
   defined by perturbation experiments, enabling inference of its true
   functional activity. NetBID currently relies on bulk omics data. An
   improved version that handles single-cell omics data to infer
   cell-type–specific hidden drivers remains to be developed.
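
   The notion of a hidden driver, unchanged in its own expression but
   altered in network-inferred activity, can be illustrated with a toy
   calculation. This sketch is not the NetBID algorithm, whose activity
   inference is not detailed here. It assumes, for illustration only,
   that a driver's activity in each sample is the signed mean z-score of
   its predicted targets; gene names and values are hypothetical.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize one gene across samples (zeros if the gene is constant)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s > 0 else 0.0 for v in values]


def driver_activity(expr, targets):
    """Per-sample activity: signed mean z-score of the driver's targets.

    targets maps target gene -> +1 (activated) or -1 (repressed).
    """
    z = {gene: zscore(expr[gene]) for gene in targets}
    n_samples = len(next(iter(z.values())))
    return [mean(sign * z[gene][i] for gene, sign in targets.items())
            for i in range(n_samples)]


def group_difference(values, is_case):
    """Mean of case samples minus mean of control samples."""
    case = [v for v, flag in zip(values, is_case) if flag]
    control = [v for v, flag in zip(values, is_case) if not flag]
    return mean(case) - mean(control)


# Two control samples followed by two case samples.
expr = {
    "DRIVER": [5.0, 5.0, 5.0, 5.0],   # no differential expression
    "T1": [1.0, 1.0, 3.0, 3.0],       # activated target
    "T2": [2.0, 2.0, 4.0, 4.0],       # activated target
    "T3": [4.0, 4.0, 2.0, 2.0],       # repressed target
}
targets = {"T1": 1, "T2": 1, "T3": -1}
is_case = [False, False, True, True]

activity = driver_activity(expr, targets)
expression_shift = group_difference(expr["DRIVER"], is_case)
activity_shift = group_difference(activity, is_case)
```

   Here the driver's expression does not differ between groups, yet its
   inferred activity does, because its targets shift coherently. The
   quality of such an inference depends on how well the in silico
   subnetwork reflects true downstream targets, as noted above for Mst1.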

Figure 3.

   Hidden driver analysis by NetBID. (A) Overview flowchart of NetBID
   analysis to identify hidden drivers of a phenotype (case vs. control). (B)
   An illustration of an example hidden driver (HD) that has no
   differential expression but has network enrichment and activity.
   Diff-exp, differential expression.

Intercellular network inference

   Cell–cell communication fundamentally regulates how the immune system
   operates as a network to effectively respond to infection and other
   insults. Systematically decoding intercellular networks that modulate
   immunity has been a longstanding challenge. Recently, an algorithm
   developed by text mining the literature has predicted previously
   unappreciated cell–cytokine interactions [[152]105*], but the attempt
   is limited by the inherent bias of existing knowledge. Single-cell
   technologies have provided a unique opportunity to tackle this
   challenge in a more unbiased manner. Systematic inference of
   intercellular communications is still in early development, with a few
   limited examples based on scRNA-seq to date
   [[153]106,[154]107*,[155]108,[156]109**,[157]110].

Closing remarks: towards translational systems immunology

   Technological advancement is driving fundamental discoveries in
   immunology. Recent advances in single-cell technologies enable the
   study of immunological diversity and complexity at an unprecedented
   resolution. Next-generation, single-cell omics methods are able to
   simultaneously capture additional information, such as spatial
   organization [[158]111], dynamic clonality via lineage barcoding
   [[159]112], and immune receptor repertoire
   [[160]113*,[161]114,[162]115]. An equally important and complementary
   effort is to develop sophisticated computational algorithms to analyze
   and integrate high-throughput multi-omics and multi-sourced data
   [[163]116].

   The importance of translational immunology is illustrated by the
   remarkable success of cancer immunotherapies that demonstrate durable
   responses in the clinic, including CAR-T-cell therapies [[164]117] and
   checkpoint-blockade therapies [[165]118], which were recently
   recognized by the Nobel Prize [[166]119]. For example, tumor cells
   escape immune surveillance by upregulating PD-L1, which interacts with
   the PD-1 receptor on T cells to elicit the immune checkpoint response.
   Therefore, blocking the crosstalk between PD-L1 on tumor cells and PD-1
   on T cells can reactivate cytotoxic T cells to kill tumor cells.
   However, immunotherapies are efficacious for only a fraction of
   patients, and existing biomarkers based on tumor mutation burden and
   single protein expression (e.g., PD-L1) have limited predictive power.
   The emerging systems immunology approaches could be translated to
   tackle pressing issues in the clinic [[167]98] by dissecting the
   heterogeneity and interactions of tumor and immune microenvironment
   [[168]116]. For instance, integrative systems biology analysis of bulk
   omics data from over 10,000 patient samples of 33 cancer types has
   provided instrumental insights into the immune landscape of cancer
   [[169]120]. More recently, scRNA-seq and high-dimensional flow
   cytometry analyses of human tumors have revealed a unique CD8 T cell
   subset that infiltrates tumors and responds to checkpoint blockade
   immunotherapy to mediate effective tumor immunity [[170]121,[171]122],
   and this control mechanism is also observed and validated in murine
   tumor models [[172]123,[173]124]. The state-of-the-art technologies are
   enabling comprehensive molecular characterization of tumor cells and
   their microenvironment from large cohorts of patient samples at the
   single-cell resolution. The development of immune-competent and
   humanized mouse models is facilitating immune-related functional and
   mechanistic studies. We envision that network-based systems immunology
   analysis of multi-omics data, from both the human and mouse model
   systems, will enable identification of hidden drivers of resistance to
   existing cancer immunotherapies, novel predictive biomarkers to better
   stratify patients, and novel therapeutic targets and combination
   strategies to overcome drug resistance and develop more precise
   immunotherapy. These strategies may also reveal therapeutic
   opportunities for other immune-related disorders, including
   autoimmune, inflammatory and neurodegenerative diseases.

Highlights.

     * Systems immunology is emerging with omics tools at population &
       single-cell levels
     * Integrative analysis of multi-omics data has revealed novel
       insights in immunology
     * Single-cell sequencing technology is driving immunology research
     * Data-driven and context-specific networks enable capture of hidden
       drivers

Acknowledgements