Abstract

   Gene expression data is commonly used in vaccine studies to
   characterize differences between treatment groups or sampling time
   points. Group-wise comparisons of the transcriptional perturbations
   induced by vaccination have been applied extensively for investigating
   the mechanisms of action of vaccines. Such approaches, however, may not
   be sensitive enough for detecting changes occurring within a minority
   of the population under investigation or in single individuals. In this
   study, we developed a data analysis framework to characterize
   individual subject response profiles in the context of repeated measure
   experiments, which are typical of vaccine mode of action studies.
   Following the definition of the methodology, this was applied to the
   analysis of human transcriptome responses induced by vaccination with a
   subunit influenza vaccine. Results highlighted a substantial
   heterogeneity in how different subjects respond to vaccination.
   Moreover, the extent of transcriptional modulation experienced by each
   individual subject was found to be associated with the magnitude of
   vaccine-specific functional antibody response, pointing to a
   mechanistic link between genes involved in protein production and
   innate antiviral response. Overall, we propose that the improved
   characterization of the intersubject heterogeneity, enabled by our
   approach, can help driving the improvement and optimization of current
   and next-generation vaccines.

   Subject terms: Computational biology and bioinformatics, Immunology,
   Systems biology

Introduction

   Seasonal influenza is a human respiratory disease, caused by influenza
   virus infection, which severity ranges from mild symptoms to
   hospitalization and death^[36]1–[37]3. To date, annual vaccination
   represents the most effective intervention for containment of this
   disease^[38]4,[39]5.

   To foster the development of new and more effective influenza vaccines,
   several systems biology studies tried to characterize the mechanisms of
   immunogenicity of these vaccines by searching for biomarkers of both
   humoral and cellular immune responses^[40]6–[41]10. Despite the general
   success in identifying molecular signatures of increased vaccine
   immunogenicity, these studies highlighted the fact that establishing
   causality in the chain of reactions triggered by vaccination is not
   trivial. This is due, at least in part, to the pervasive heterogeneity
   that characterizes the human population, which results in distinct
   individuals to respond differently to the same vaccine^[42]11. Such
   heterogeneity is likely the result of both heritable (genetic) and
   non-heritable factors (microbiome, age, diet, etc.), all of which can
   have a remarkable impact on the immune response to vaccination^[43]12.
   A further layer of complexity is introduced by the fact that the immune
   system is a complex apparatus, with different cells, tissues and organs
   dynamically interacting with each other over time^[44]13. This further
   impairs our ability to understand which immune components should be
   targeted by a vaccine in order to achieve a broadly protective
   response.

   In contrast to the pervasive heterogeneity characterizing the immune
   response among different individuals, conventional approaches to the
   analysis of immunological data are based on group-wise comparisons
   between groups of individuals (treatment vs control, treatment “A” vs
   treatment “B”, etc.) and on the resulting identification of signals
   that are consistently found in most individuals within a study group.
   Student’s t-test and analysis of variance are two examples of
   approaches falling in this category. While these represent powerful
   tools for assessing differences at a group level, they are not suited
   for capturing subtle signals in noisy datasets or changes occurring in
   only a small fraction of the study population. For this reason, there
   have been attempts to explore alternative approaches for characterizing
   the heterogeneity of biological systems. Among these, Menche et
   al.^[45]14 developed an approach to derive personalized transcriptome
   response profiles in individuals affected by either asthma, Parkinson’s
   or Huntington’s disease. Their method allowed for the characterization
   of genes being modulated within individual subjects, based on the
   comparison with a healthy control group. Their approach proved to be
   more sensitive than conventional techniques in identifying
   transcriptional signatures associated to a particular disease and
   allowed to diagnose subjects affected by Huntington’s disease with 100%
   accuracy.

   Inspired by the work of Menche et al., we developed a new data analysis
   pipeline, for the characterization of individual subject response
   profiles, and applied it to investigate the transcriptional responses
   induced by the administration of a subunit influenza vaccine. Our
   approach identifies genes’ modulation within individual subjects based
   on the comparison with pre-immune levels. The framework also includes a
   bootstrap step for estimating the level of confidence in the calls for
   gene modulation.

   Our findings highlighted a high degree of heterogeneity in the
   transcriptional response to influenza vaccine, both in terms of total
   number of modulated genes within each individual and in the consistency
   of modulation of a same gene across multiple study subjects. Despite
   this, however, genes involved in some key immune related functions
   showed a more robust response across the study population. Finally, we
   found the extent of transcriptional perturbation experienced by
   individual subjects to be correlated with functional antibody response
   induced by vaccination, with genes involved in protein synthesis and
   antiviral response driving this association.

   Overall, the present study establishes a new approach to the analysis
   and interpretation of vaccine response data, representing a promising
   strategy for improving our understanding of the human heterogeneity in
   vaccine responsiveness.

Results

Whole blood transcriptome response induced by influenza vaccination is highly
heterogeneous

   We set out to analyze the transcriptional response to vaccination using
   publicly available data (Tsang et al. 2014) consisting in PBMC
   transcriptome responses derived from 63 healthy adults vaccinated with
   the seasonal and pandemic H1N1 subunit influenza vaccines.
   Conventional, group-wise comparisons identified a set of genes which
   were significantly upregulated at either 24 hours or 7 days following
   vaccination (Fig. [46]1a, Supplementary Fig. [47]S1). Forty-three (43)
   and 14 differentially expressed genes (DEGs; absolute log
   [MATH: <msub><mrow></mrow><mn>2</mn></msub> :MATH]
   fold-change from baseline
   [MATH: <mrow><mo>≥</mo><mn>0.5</mn></mrow> :MATH]
   , adjusted p-value
   [MATH: <mrow><mo>≤</mo><mn>0.05</mn></mrow> :MATH]
   , two-tailed Wilcoxon signed-rank test) were identified at 1- and
   7-days post-vaccination, respectively. A deeper assessment of
   transcripts abundance levels, however, revealed a substantial level of
   heterogeneity across different individuals. Despite the significant
   modulation observed at the group level, a substantial portion of the
   study population showed post-vaccination transcript abundance levels
   comparable (average
   [MATH: <mrow><mo>±</mo><mn>1</mn></mrow> :MATH]
   standard deviation) with preimmune levels (Supplementary Table [48]S1
   and [49]S2, Supplementary Figure [50]S2).

Figure 1.

   [51]Figure 1
   [52]Open in a new tab

   Analytical framework for deriving individual transcriptome response
   profiles. (a) Group-wise DEGs analysis executed on day 1 / day 0 gene
   expression values (detected DEGs are shown in red). The dashed lines
   show the cutoffs for calling a gene differentially expressed: log
   [MATH: <msub><mrow></mrow><mn>2</mn></msub> :MATH]
   fold-change
   [MATH: <mrow><mo>±</mo><mn>0.5</mn></mrow> :MATH]
   (red lines), adjusted p-value
   [MATH: <mrow><mo><</mo><mn>0.05</mn></mrow> :MATH]
   (two-tailed, paired Wilcoxon signed-rank test, blue line). (b)
   Expression levels distribution for the CXCL10 gene. While the
   group-wise comparison suggests a global up-regulation (log
   [MATH: <msub><mrow></mrow><mn>2</mn></msub> :MATH]
   fold-change = 0.96, adjusted p-value
   [MATH:
   <mrow><mo>=</mo><mn>7.5</mn><mo>×</mo><msup><mn>10</mn><mrow><mo>-</mo>
   <mn>7</mn></mrow></msup></mrow> :MATH]
   , two-tailed Wilcoxon signed-rank test), some individuals exhibit
   either absence of modulation or down-regulation. (c) Illustration of
   the proposed approach to derive individual transcriptome response
   profiles. Instead of comparing case and control groups, each subject is
   tested individually against the distribution of the gene expression
   values from the control group. Genes are tested individually and when
   the expression level is different enough from the control distribution,
   they are defined as modulated. For each subject an individual
   transcriptome response profile is generated.

   For exemplary purposes, we report the response levels for the gene
   encoding for CXCL10, a chemoattractant chemokine involved in the
   recruitment of T lymphocytes into sites of tissue inflammation^[53]15.
   Multiple vaccine studies have reported CXCL10 to be strongly
   upregulated in peripheral blood few hours after vaccination, with peak
   responses typically occurring after 24 h^[54]16. Consistently with
   that, we also found the CXCL10-encoding gene to be amongst the most
   strongly upregulated transcripts at the 24 hours’ time point
   (Fig. [55]1a; log
   [MATH: <msub><mrow></mrow><mn>2</mn></msub> :MATH]
   fold-change
   [MATH: <mrow><mo>=</mo><mn>0.96</mn></mrow> :MATH]
   , adjusted p-value
   [MATH:
   <mrow><mo>=</mo><mn>7.5</mn><mo>×</mo><msup><mn>10</mn><mrow><mo>-</mo>
   <mn>7</mn></mrow></msup></mrow> :MATH]
   , two-tailed Wilcoxon signed-rank test). Despite the significant CXCL10
   upregulation observed at the group level, this response was not
   consistently found across all vaccinated subjects. Half of the study
   subjects (30 out of 60) showed a CXCL10 transcript abundance comparable
   (average
   [MATH: <mrow><mi>μ</mi><mo>±</mo><mn>1</mn></mrow> :MATH]
   standard deviation
   [MATH: <mi>σ</mi> :MATH]
   ) to preimmune levels (Fig. [56]1b). Moreover, for 5 subjects the
   CXCL10 baseline abundance was found to exceed the average level
   measured 24 hours after vaccination, highlighting a considerable level
   of intersubject heterogeneity in the response of this transcript.
   Overall, results indicate that the peripheral blood transcriptome
   response to influenza vaccination is characterized by a pervasive
   intersubject heterogeneity.

Only 55% of day 1 differentially expresses genes are also found within
individual subject response profiles

   To quantify the intersubject variation in PBMC transcriptome response,
   we computed individual subject response profiles. Briefly, each subject
   was tested individually against the distribution of the gene expression
   values from the control group. Genes were tested individually and when
   their expression level was different enough from the control
   distribution, they were defined as modulated. A bootstrap approach was
   used to limit the effect of outlying values and to assess the
   robustness of the response (Fig. [57]1c and methods). After computing
   the individual subject response profiles, we compared them to the list
   of DEGs identified through the Wilcoxon test-based group-wise analysis.
   Despite a substantial overlap, not all DEGs were confirmed by the
   individual profiles (Supplementary Fig. [58]S3). Among the 43 DEGs
   observed at day 1, an average of 55% [interquartile range (IQR):
   37%-77%] was also captured by the individual profiles, while for day 7
   (14 DEGs) the average overlap was 71% (IQR: 42%-100%). This suggests
   that the average responses captured by group wise analyses do not
   necessarily represent the behavior of each individual subject. For the
   day -7 and day 70 time points, which are unlikely to reflect any
   substantial vaccination-specific effect, the overlap was smaller [day
   -7 (11 DEGs): 23% (IQR: 0-36%); day 70 (2 DEGs): 20% (IQR: 0-38%);
   Supplementary Fig. [59]S3]. This is suggestive of the fact that the
   overlaps observed for day 1 and day 7 are reflective of a real
   biological perturbation induced by the vaccine stimulus.

   Individual response profiles from day 1 showed the highest number of
   modulated genes, with an average of 717 genes. The first and third
   quartiles of that distribution were respectively 531 and 1114 (Fig.
   [60]2a,b), reflecting the different response magnitude experienced by
   the different study subjects. Individual profiles from all other time
   points showed less modulation, albeit the spread between the 1st and
   3rd quartiles remained approximately two-fold (IQR day 7: 334-653; IQR
   day -7: 340-664; IQR day 70: 256-571). This same trend was also
   observed with the group-wise differential gene expression analysis
   (Supplementary Fig. [61]S1).

Figure 2.

   [62]Figure 2
   [63]Open in a new tab

   Heterogeneity among individual transcriptome response profiles. (a)
   Number of genes modulated among subjects detected by our analysis. (b)
   Reverse cumulative distributions of number of modulated genes across
   subjects. (c) Jaccard similarity coefficients of pairwise subjects
   combinations calculated using individual subject transcriptome
   profiles. (d) Number of shared differentially expressed genes (DEGs)
   among pairwise subjects combinations.

   An additional information gathered from individual transcriptome
   response profiles was the proportion of individuals in which each gene
   was found to be modulated. Surprisingly, no gene was found to be
   consistently modulated across the entire study population, regardless
   of sampling time point. As we have shown, the most robust response was
   observed at day 1. At this time point the number of genes found to be
   modulated in 20 or more subjects, for example, was 587, compared to 84,
   120 and 8 measured at days -7, 7 and 70, respectively. We also tested,
   for days 0, 1 and 7, the pairwise similarity between individual
   profiles by computing the Jaccard similarity metric and the number of
   commonly modulated genes (Fig. [64]2c,d). Both approaches confirmed
   that the day 1 response was the most consistent. In line with previous
   observations, however, the generally low similarity coefficients
   highlight substantial heterogeneity across individual profiles.

   Overall, we showed that the individual transcriptome response profiles,
   generated through our approach, were able to capture
   vaccination-specific responses while also providing additional
   information about the consistency of such responses. Specifically,
   transcriptome responses were found to differ greatly across different
   subjects and no transcript was found to be consistently modulated
   across the entire study population.

Genes participating to immune functions show a more consistent modulation

   Next, we tested whether the observed heterogeneity in the transcriptome
   response is a ubiquitous characteristic or if there are specific
   biological functions showing a more consistent modulation. To this end,
   we functionally characterized each individual transcriptome response
   profile by performing a pathway enrichment analysis. Briefly, we
   collected a list of canonical pathways from the MSigDB database and
   tested, for each individual profile, whether the number of modulated
   genes within a given pathway was higher than random expectation.
   Through this approach, we could derive a list of enriched pathways from
   each subject. The level of conservation of such pathways could then be
   assessed as the proportion of subjects in which they were found to be
   enriched. A comprehensive listing of the results observed throughout
   the different time points is presented in Supplementary Tables
   [65]S3-7.

   Figure [66]3a,b lists the 10 most frequently enriched pathways across
   the study population for days 1 and 7, respectively. Consistently with
   previous observations from group-wise analyses^[67]6,[68]9,[69]10, 24
   hours transcriptomes showed enrichment for genes involved in cytokine
   signaling and interferon-mediated antiviral response. Conversely, day 7
   responses were enriched for protein transcription and cell
   proliferation functions, a pattern that is usually associated with B
   lymphocyte proliferation and differentiation into antibody-producing
   cells. Similarly to what was observed with individual genes, also the
   pathways activation was variable across subjects, with the most
   frequently enriched pathway at 24 h, cytokine signaling in immune
   system, being enriched in less than 50% of the study population (28 out
   of 60 subjects; Fig. [70]3a). Seven days post vaccination the response
   was generally less consistent and pathways were found to be enriched in
   up to 17 out of 57 subjects (less than 30% of the study population;
   Fig. [71]3b).

Figure 3.

   [72]Figure 3
   [73]Open in a new tab

   Functional characterization of individual subject response profiles.
   (a,b) Top ten immunological pathways that show enrichment in the
   highest number of subjects at days 1 and 7, respectively. X-axis:
   number of subjects in which a pathway is significantly enriched. (c,d)
   Most frequently enriched pathways at days 1 and 7 respectively. Jaccard
   similarity coefficients represent the robustness of gene modulation for
   those genes belonging to a specific pathway (red line). For comparison,
   the robustness of gene modulation of non-in-pathway, randomly selected
   genes (1000 bootstrap samplings) is also shown. Solid and dashed black
   lines represent the median and 95th percentile of the bootstrapped
   distributions. Gray shaded areas represent the 95% confidence interval.

   To test whether the observed functional enrichment was indicative of a
   true biological signal, the Jaccard pairwise similarity for modulated
   genes observed in each enriched functional pathway was compared to that
   of randomly selected gene pools. Genes belonging to the same functional
   pathway typically showed more robust modulation than randomly assorted
   genes (Fig. [74]3c-d and Supplementary Fig. [75]S5-6). Overall, these
   results are indicative of the fact that, despite the substantial
   response heterogeneity across different subjects, there is a
   convergence towards more consistent modulation of genes participating
   to biological processes involved in the response to influenza
   vaccination.

Assessment of vaccination-induced transcriptional modules shows that the
intersubject response heterogeneity increases as a function of time

   Given the relatively high number of genes within the individual
   response profiles, there is the chance that genes may appear as
   modulated across different subjects simply by chance. In the absence of
   a biological driver, it could be assumed a model of complete
   independence between subjects and expect the occurrence of
   perturbations across multiple subjects to follow a random expectation.
   Evidences collected so far, however, indicate that the vaccination
   stimulus induces transcriptional perturbations pointing towards common
   functional disruptions and that a vaccine-induced gene module may
   exist. In order to characterize such transcriptional module, we applied
   a combinatorial model to compute the random probability of a modulated
   gene to appear in multiple subjects and compared it to the available
   experimental data.

   The combinatorial model indicated that genes co-modulated in
   [MATH: <mo>≥</mo> :MATH]
   25 and
   [MATH: <mo>≥</mo> :MATH]
   19 subjects at days 1 and 7, respectively, are unlikely to appear
   randomly and could be reasonably attributed to a vaccine-related
   effect. This estimate produced a core-pool of 341 and 135 genes for day
   1 and 7 responses, respectively (Fig. [76]4, Fig. [77]S7 and
   Supplementary Data [78]1 and [79]2). From a functional perspective, the
   core pool of day 1 modulated genes included an abundance of genes
   participating in cytokine signaling, mainly related to type-I/II
   interferon response. The day 7 core pool, instead, was characterized by
   genes participating in mitotic cell division, likely reflecting B-cell
   proliferation and differentiation into plasmablasts (supplementary Data
   [80]3). Overall, the observed functional profiles are consistent with
   previously reported data^[81]6,[82]9,[83]10.

Figure 4.

   [84]Figure 4
   [85]Open in a new tab

   Individual transcriptome response profiles for the day 7
   vaccination-induced gene pool. Each row corresponds to a gene and each
   column to a subject. On the right: number of gene modulations observed
   across all subjects. On the bottom: number of modulated genes in each
   study subject.

   As a negative control, the same test was applied to samples collected
   before vaccination (day 0 and day -7). In both cases, no genes were
   found to exceed the identified random expectation thresholds. In line
   with previous observations, also within the identified vaccine-induced
   gene modules the modulation was found to be more robust at day 1
   compared to day 7. For day 1, the median number of subjects in which a
   gene was modulated was 30 (50% of the 60 subjects, IQR: 27-35), whereas
   at day 7 the median number was 25 (44% of the 57 subjects, IQR: 21-30).

The magnitude of day 7 transcriptome response is associated with functional
antibody response

   Having identified the set of genes that are modulated by vaccination,
   we tested whether the extent of transcriptional modulation of these
   genes correlates with the humoral immune response. The percentage of
   modulated genes within each individual response profile was tested for
   correlation with subjects’ ability to produce functional antibodies
   using a receiver operating characteristic (ROC) curve. Subjects were
   stratified into high and low responders based on their day 70 influenza
   microneutralization titer response.

   Overall, the modulation of vaccination-induced gene pools was 51%
   (median, IQR: 36%-72%) for day 1 and 40% (median, IQR: 25%-68%) for day
   7 (see Supplementary Figure [86]S4). The ROC curve provided an area
   under the curve (AUC) of 0.53 and 0.81 respectively for day 1 and day 7
   transcriptome response data (Fig. [87]5a), suggesting the existence of
   a correlation between the number of modulated genes at day 7 and the
   subsequent functional antibody production. Conversely, day 1
   transcriptome data did not show any appreciable correlation. These
   evidences are in agreement with previous studies, which proposed that
   day 7 blood-derived signatures reflect the establishment of the humoral
   immune response^[88]9,[89]17,[90]18.

Figure 5.

   [91]Figure 5
   [92]Open in a new tab

   The day 7 vaccination-induced gene pool is associated with the
   magnitude of seroresponse detected through day 70 MicroNeutralization
   titers. (a) Smoothed ROC curves of the immunological response
   classification predictions inferred using the modulation percentages of
   the vaccination-induced gene pools calculated from the individual
   transcriptome response profiles. (b) Percentage of modulation in the
   day 7 vaccination-induced gene pool shown in relation to the
   immunological response classification based on day 70 Influenza
   MicroNeutralization titers. (c,d) Distributions of vaccination-induced
   gene pools modulation percentages grouped by immunological response
   classification.

   The correlation between the day 7 transcriptome response and subjects’
   ability to produce neutralizing antibodies is shown in Figure [93]5b
   (see Supplementary Fig. [94]S8 for day 1 data). The median number of
   modulated genes in high responder subjects was 63% (IQR: 55%-86%),
   while in low responders this number dropped to 27% (IQR: 20%-45%, Fig.
   [95]5c-d). This difference was statistically significant (p-value =
   0.002, two-sided Kolmogorov-Smirnov tests followed by
   Benjamini-Hochberg correction for multiple testing). Further
   investigation identified 66 genes for which the modulation was
   significantly more frequent among the high responders than low
   responders (Fisher exact test, one sided, p-value
   [MATH: <mo>≤</mo> :MATH]
   0.05, Benjamini-Hochberg correction; see Supplementary Table [96]S8).
   These included several genes participating in immune response
   functions. Overall, these data are indicative of the fact that subjects
   experiencing a higher transcriptional activity at day 7 are more likely
   to produce neutralizing antibodies.

   To provide corroborative evidence in support to this finding we also
   tested whether the amount of vaccine specific IgG-secreting cells,
   measured using ELISpot assays targeted against the entire vaccine
   (seasonal or pandemic H1N1) 7 days post-vaccination, showed any
   correlation with the individual transcriptome response profiles. As
   observed with microneutralization titers, also the seasonal
   vaccine-specific B cell response showed association with day 7
   transcriptome profiles (AUC = 0.74; supplementary Fig. [97]S9).
   Conversely, the same trend was not observed with the pandemic
   H1N1-specific B cells (supplementary Fig.[98]S10).

   Finally, the same procedure was applied to evaluate the potential
   effect of subjects’ intrinsic characteristics (age, gender and
   pre-immune status) on their individual transcriptome response profile.
   None of the evaluated factors showed any appreciable effect (results
   relative to pre-existing vaccine-specific antibodies are shown in
   supplementary Fig. [99]S11).

Identification of genes associated with functional antibody response

   In order to get further insights into which genes were driving the
   observed correlation between the day 7 transcriptome response and day
   70 microneutralization titers, we applied a 5-fold cross validated
   Guided Random Forest (GRF)^[100]19. Briefly, the strategy was to
   classify subjects into high or low responders, without prior knowledge
   of their microneutralization response, using the gene modulation data
   of the day 7 vaccination-induced gene pool as input. This approach was
   able to predict subjects’ functional antibody production with an 80%
   correct classification rate (min-max: 67%-100%, min-max AUCs: 70%-100%,
   Fig. [101]6a,b).

Figure 6.

   [102]Figure 6
   [103]Open in a new tab

   Guided Random Forest results. (a) Classification accuracy of the models
   (percentage of correctly classified subjects). (b) ROC curves and AUCs
   of the models. (c) Number of occurrences across the five models of the
   most informative genes (from each model, the top 10 most informative
   features are reported). The y-axis indicates the number of models in
   which a gene was observed in the top 10. (d) Top 10 most informative
   genes across GRF models ordered by median of the mean decrease Gini
   coefficients (higher values correspond to higher feature importance).

   To understand which genes were most informative, the top 10 scoring
   genes from each of the five cross-validations were selected for further
   analysis. Three of these genes were selected by all five models, 6
   genes were present in four out of five models and 1 was found in three
   models (Fig. [104]6c). Ranking of these genes by their median decrease
   Gini across all five models (Fig. [105]6d) highlighted GLDC as the most
   informative gene. This gene encodes a mitochondrial glycine
   decarboxylase, which was described to confer survival advantage to
   T-cells in hypoxic environments such as sites of
   infection^[106]20,[107]21. Another top performing gene was HIST1H3G,
   which codes for a replication-dependent histone protein which is
   generally active during the S-phase^[108]22. We also observed ZBP1, a
   cytosolic DNA sensor which induces the DNA-mediated interferon
   response^[109]23 and APOBEC3B, a cytosine deaminase induced by viral
   infection^[110]24. Among these genes we also observed IRF4 and POU2AF1,
   both of which are involved in maturation and differentiation of T and B
   cells^[111]25–[112]27, respectively.

Discussion

   Human response to vaccination can be a heterogeneous phenomenon. While
   some of the factors contributing to this variation have been
   identified, others are still to be unraveled. Young and elderly
   individuals, for example, usually show a suboptimal response to
   vaccination due to their immune system being not fully developed or
   aged, respectively^[113]28,[114]29. Several latent factors have also
   been proposed to impact vaccine responsiveness, the mechanisms through
   which they mediate their effect, however, are still subject of
   research^[115]11. It is generally recognized that this lack of
   knowledge on how immune responses to vaccination are generated is a
   critical barrier to the development of vaccines that are effective
   across diverse populations^[116]30,[117]31.

   To cope with the molecular heterogeneity characterizing a set of
   systemic diseases, Menche and coworkers^[118]14 devised a data analysis
   strategy for deriving personalized response profiles and overcome the
   inherent limitations of group-wise analysis methodologies. In this
   work, we presented an adaptation of this methodology, aimed at
   accommodating the analysis of datasets generated through repeated
   measures design studies and to provide an estimate of the robustness of
   the observed gene perturbations.

   The individual-subject response profiles derived from healthy adults
   following influenza vaccination showed a high degree of heterogeneity,
   both in terms of total number of modulated genes within each individual
   and in the consistency of modulation of a same gene across multiple
   subjects. Day 1 individual response profiles were generally more
   consistent than those observed 7 days post-vaccination. This may be
   explained by the fact that differences among subjects add up during the
   chain of events linking the innate and adaptive arms of the immune
   response. To our surprise, however, no gene was found to be
   consistently modulated in all study subjects, regardless of the
   sampling time point.

   In principle, the high heterogeneity observed among different
   individuals could have arisen from technical variation, random noise or
   a combination of the two. In such case, functional enrichment analysis
   of the individual response profiles would have failed in identifying
   molecular pathways that are consistently enriched across multiple
   individuals. Our results, instead, showed consistent enrichment of a
   number of molecular pathways, covering immune related functions, that
   were previously described to be involved in the immune response to
   influenza vaccination. These include cytokine signaling and
   Interferon-mediated antiviral response for day 1 response profiles and
   transcripts involved in B lymphocyte proliferation and differentiation
   into antibody-producing cells for day 7 profiles^[119]6,[120]9. These
   evidences point to the conclusion that influenza vaccination
   consistently activates these biological processes, while the observed
   heterogeneity in the individual transcriptome response profiles
   represents the molecular diversity of such functional perturbations.

   One of the main areas of interest for modern vaccinology studies is
   that of identifying early cellular or molecular biomarkers that are
   able to predict the quality and magnitude of the adaptive response to
   vaccination. After deriving subjects’ individual response profiles, we
   found that the number of modulated genes within each individual subject
   7 days post-vaccination was correlated with the magnitude of
   vaccine-specific functional antibody response. Given the functional
   analysis of day 7 transcriptome responses, which highlighted an
   enrichment of genes involved in B-cell and plasmablasts response, it
   could be assumed that the ability of the influenza vaccine to elicit a
   plasmablasts response at day 7 is linked with the downstream production
   of functional antibodies. We also investigated on which genes are most
   strongly associated with a high antibody response through a machine
   learning approach. This analysis identified genes involved in protein
   production (GLDC and HIST1H3G) and antiviral response (APOBEC3B and
   ZBP1). These evidences, which recall classical notions of immunology,
   represent a corroborative evidence for the relevance of our
   methodology. Importantly, results on the identified genes, which could
   potentially serve as predictive biomarkers, should be interpreted with
   caution until further validation through independent analyses or,
   ideally, experimental testing.

   Overall, the present study allowed for a deeper characterization of the
   transcriptional response induced by influenza vaccination in humans. We
   argue that the improved characterization of the intersubject
   heterogeneity offered by this approach will help driving the
   improvement and optimization of both new and established vaccines. The
   ability to identify genes that are predictive of functional antibody
   production following vaccination, along with the fraction of subjects
   in which they are modulated, can allow for better predictions on the
   real-world effectiveness of vaccines. Furthermore, individual subject
   analysis approaches could help the improvement of vaccines targeting
   specific subpopulations, like the elderly or immunocompromised
   individuals.

   Finally, we envision that the application of this methodology to
   multiple vaccine studies and datasets will be foundational to expand
   our understanding of vaccines’ mode of action. Our methodology was
   designed to be readily generalizable to all repeated measures
   experiments and adaptable to different types of data.

Methods

Study dataset

   The present study was based on data previously reported by Tsang et al.
   2014. In that study (Clinicaltrials.gov [121]NCT01191853), 63 healthy
   adults were vaccinated with a single dose of both the 2009 FLUVIRIN
   seasonal influenza (Novartis) and H1N1 pandemic (Sanofi-Aventis)
   vaccines, both without adjuvant. Pre-processed Affymetrix microarray
   data were downloaded from the GEO database using the [122]GSE47353
   accession number. Control and non-protein coding probe sets were
   filtered out and probe sets mapping onto the same gene were collapsed
   by computing their geometric average fluorescence intensity. To limit
   the dataset dimensionality, the median absolute deviation across the
   dataset was computed for each transcript and the 5000 most variable
   ones were retained for further analysis. Antibody response data were
   collected from the original manuscript^[123]18.

Group-wise transcriptome data analysis

   Group-wise comparisons of transcript abundance values were performed
   using a two-sided, paired Wilcoxon signed-rank test. p-values were
   corrected using the Benjamini-Hochberg procedure. Genes showing an
   absolute log
   [MATH: <msub><mrow></mrow><mn>2</mn></msub> :MATH]
   fold-change from baseline
   [MATH: <mrow><mo>≥</mo><mn>0.5</mn></mrow> :MATH]
   and an adjusted p-value
   [MATH: <mrow><mo>≤</mo><mn>0.05</mn></mrow> :MATH]
   were considered significantly modulated.

Analytical framework for deriving individual transcriptome response profiles

   Transcriptome response data were cleaned from any subject-specific
   effect through a mixed-effect model. Transcript abundance values (v)
   were set as the dependent variable, the sampling time points (t) as the
   fixed effect and the subject identity (s) as the random effect [lmer
   [MATH: <mrow><mo
   stretchy="false">(</mo><mi>v</mi><mo>∼</mo><mi>t</mi><mo>+</mo><mo
   stretchy="false">(</mo><mn>1</mn><mo
   stretchy="false">|</mo><mi>s</mi><mo stretchy="false">)</mo><mo
   stretchy="false">)</mo></mrow> :MATH]
   ]. Fitted random effect components were then subtracted from the
   original data.

   Gene modulation in individual subjects was assessed by comparing the
   transcript abundance of each single gene at a specific time point to
   the respective abundance values observed in the study population before
   vaccination (reference sample). To limit the effect of outlying values
   and to assess the robustness of the response, this comparison was
   repeated 1000 times over bootstrapped reference distributions obtained
   by randomly selecting 2/3 of the study subjects. If a test gene
   abundance differed from that of the reference distribution over a
   predefined threshold (median ± 2.5 median absolute deviations), the
   gene was assumed to test positive. Genes testing positively over
   [MATH: <mrow><mo>≥</mo><mn>75</mn><mo>%</mo></mrow> :MATH]
   of bootstrap iterations were regarded as being modulated by
   vaccination. Genes were tested for both up- and down-regulation from
   baseline. This approach produced individual transcriptome response
   profiles consisting of arrays of categorical gene modulation values (
   [MATH: <mrow><mn>0</mn><mo>=</mo></mrow> :MATH]
   no modulation,
   [MATH: <mrow><mn>1</mn><mo>=</mo></mrow> :MATH]
   up-regulation,
   [MATH: <mrow><mo>-</mo><mn>1</mn><mo>=</mo></mrow> :MATH]
   down-regulation). One response profile was generated for each subject
   and time point combination.

Gene set Enrichment Analysis

   A curated list of canonical biological pathways (C2 collection, v6.1)
   was obtained from the Molecular Signatures Database
   [124]https://www.gsea-msigdb.org/gsea/msigdb/index.jsp;^[125]32–[126]34
   ). Pathways with less than 10, or more than 200 mapped genes were
   excluded from the analysis, providing a total of 437 gene sets and 2369
   mapped genes. Functional enrichment was computed using the Fisher’s
   exact test followed by the Benjamini-Hochberg p-value correction for
   multiple testing. For each individual response profile, the proportion
   of modulated genes within a given pathway was compared with the
   proportion of modulation of genes not included in that pathway.
   Pathways scoring an adjusted p-value
   [MATH: <mrow><mo>≤</mo><mn>0.05</mn></mrow> :MATH]
   were considered significantly enriched. The overall vaccination effect
   was then assessed by counting, for each pathway, in how many subjects
   it was found to be enriched.

   To test whether the observed pathway enrichments were indicative of a
   true biological signal, as opposed to random observation, we tested
   whether genes within each given pathway were more consistently
   modulated than an equally sized set of randomly selected genes. The
   Jaccard similarity index^[127]35 was computed for all pairwise subject
   combinations to assess the proportion of genes being modulated across
   multiple subjects. This metric was computed twice, once using the genes
   belonging to a given pathway, and another time using an equally sized
   set of randomly selected genes (the test on the randomly selected genes
   was bootstrapped 1000 times). If genes’ modulation within a test
   pathway was more pervasive (
   [MATH: <mrow><mo>≥</mo><mn>95</mn><mo>%</mo></mrow> :MATH]
   quantile of the control values) than that of randomly selected genes,
   that signal was assumed to be biologically meaningful.

Definition of vaccination-induced gene pools

   Starting from the individual response profiles, the pool of genes
   modulated by vaccination was determined based on the comparison with
   the random probability of a gene to be modulated across multiple
   subjects. This null probability was computed using a binomial
   distribution, according to a previously established approach^[128]14.
   Briefly, a null model is established where each subject has g modulated
   genes drawn randomly from all G genes. The probability for one gene to
   be modulated in k out of n subjects can then be calculated via the
   binomial distribution, where
   [MATH:
   <mrow><mi>p</mi><mo>=</mo><mfrac><mi>g</mi><mi>G</mi></mfrac></mrow>
   :MATH]
   .
   [MATH: <mrow><mtable><mtr><mtd
   columnalign="right"><mrow><mi>f</mi><mrow><mo
   stretchy="false">(</mo><mi>k</mi><mo>;</mo><mi>n</mi><mo>,</mo><mi>p</m
   i><mo
   stretchy="false">)</mo></mrow><mo>=</mo><mtext>Pr</mtext><mrow><mo
   stretchy="false">(</mo><mi>X</mi><mo>=</mo><mi>k</mi><mo
   stretchy="false">)</mo></mrow><mo>=</mo><mfenced close=")"
   open="("><mrow><mtable><mtr><mtd><mi>n</mi></mtd></mtr><mtr><mtd><mrow>
   <mrow></mrow><mi>k</mi></mrow></mtd></mtr></mtable></mrow></mfenced><ms
   up><mi>p</mi><mi>k</mi></msup><msup><mrow><mo
   stretchy="false">(</mo><mn>1</mn><mo>-</mo><mi>p</mi><mo
   stretchy="false">)</mo></mrow><mrow><mi>n</mi><mo>-</mo><mi>k</mi></mro
   w></msup></mrow></mtd></mtr></mtable></mrow> :MATH]
   1

   Setting g as the mean number of genes observed across the individual
   profiles at a given time point, it is possible to derive the histogram
   of the number of subjects in which a given gene is found to be
   co-modulated by random chance by multiplying
   [MATH: <mrow><mi>G</mi><mo>×</mo><mo
   stretchy="false">(</mo><mi>f</mi><mo>;</mo><mi>n</mi><mo>,</mo><mi>p</m
   i><mo stretchy="false">)</mo></mrow> :MATH]
   .

   Based on this approach, we computed 1000 random binomial distributions
   using
   [MATH: <mrow><mi>G</mi><mo>=</mo><mn>5000</mn></mrow> :MATH]
   ,
   [MATH: <mrow><mi>n</mi><mo>=</mo></mrow> :MATH]
   number of available subjects for each specific time point and
   [MATH:
   <mrow><mi>p</mi><mo>=</mo><mfrac><mi>g</mi><mi>G</mi></mfrac></mrow>
   :MATH]
   . For each iteration, the minimum number of subjects (
   [MATH: <msub><mi>X</mi><mo movablelimits="true">min</mo></msub> :MATH]
   ) in which the resulting number of commonly modulated genes was zero
   was retained, providing a pool of 1000
   [MATH: <msub><mi>X</mi><mo movablelimits="true">min</mo></msub> :MATH]
   values. We then set the threshold number of subjects (X) for inclusion
   in the vaccination-induced gene pool as the 95th percentile of the
   [MATH: <msub><mi>X</mi><mo movablelimits="true">min</mo></msub> :MATH]
   values distribution. This allowed to identify genes with a probability
   [MATH: <mo>≲</mo> :MATH]
   5% of being co-modulated across
   [MATH: <mrow><mo>≥</mo><mi>X</mi></mrow> :MATH]
   subjects. The resulting X values were 25 and 19 subjects for day 1 and
   day 7 responses, respectively. Therefore, genes found to be
   co-modulated in
   [MATH: <mo>≥</mo> :MATH]
   25 subjects for day 1 and
   [MATH: <mo>≥</mo> :MATH]
   19 subjects for day 7 were regarded as being specifically modulated by
   vaccination.

Identification of transcriptional signatures associated with antibody
response

   Subjects were classified into high and low responders based on their
   functional antibody response. The day 70 fold-increase in influenza
   MicroNeutralization titer, the maximum value across the different
   strains represented in the vaccine, was used as response measure.
   Subjects for which the response was above or below the median of the
   distribution were respectively tagged as high and low responders. The
   percentage of modulated genes within each individual response profile
   was then used as discriminant variable for high and low responder
   subjects using a receiver operating characteristic curve. The area
   under the curve was used as measure of the classification performance.

   In order to identify those genes that were contributing the most to the
   high/low responder subjects’ classification, we applied a guided random
   forest classifier (GRF)^[129]19 combined with a five-fold
   cross-validation. This task was performed using the RRF R package with
   the following parameters: ntree = 30000 and mtry = 4. The penalty score
   of the GRF^[130]19 was set to maximum (
   [MATH: <mrow><mi>γ</mi><mo>=</mo><mn>1</mn></mrow> :MATH]
   ). Variables (genes) importance was imputed based on the mean decrease
   Gini score obtained across the five cross-validated GRF models.

Supplementary information

   [131]Supplementary Information 1.^ (3.4KB, csv)
   [132]Supplementary Information 2.^ (1.4KB, csv)
   [133]Supplementary Information 3.^ (12.4KB, xlsx)
   [134]Supplementary Information 4.^ (66.6KB, pdf)
   [135]Supplementary Information 5.^ (2MB, pdf)

Author contributions

   C.D.I. executed the analyses and wrote the manuscript. E.S. conceived
   and supervised the study and wrote the manuscript. M.B., D.Maf., L.D.M,
   M.C. and D.Med. gave scientific advice. All authors reviewed the
   manuscript.

Competing interests

   Carlo De Intinis is a PhD student at the University of Turin, with a
   scholarship from GlaxoSmithKline Biologicals SA. DMa participated in a
   GSK sponsored PhD program at the time of the study. LDM participated in
   a GSK sponsored postdoctoral scholar at the time of the study. DMe was
   an employee of GSK group of companies at the time of the study. ES, MB
   and MC are all employees of the GSK group of companies. The clinical
   study was independently conducted by the National Institute of Health
   (NIH), in accordance with the Declaration of Helsinki. The study is
   registered on [136]http://clinicaltrials.gov with the reference ID
   [137]NCT01191853. Data was collected from the Gene Expression Omnibus
   public database (Accession ID: [138]GSE47353).

Footnotes

   Publisher's note

   Springer Nature remains neutral with regard to jurisdictional claims in
   published maps and institutional affiliations.

Supplementary Information

   The online version contains supplementary material available at
   10.1038/s41598-021-99870-0.

References