Abstract

Influenza, a communicable disease, affects thousands of people worldwide. Young children, the elderly, immunocompromised individuals, and pregnant women are at higher risk of being infected by the influenza virus. Our study aims to highlight differentially expressed genes in influenza disease compared to influenza vaccination, including variability due to age and sex. To accomplish our goals, we conducted a meta-analysis using publicly available microarray expression data. Our inclusion criteria included subjects with influenza, subjects who received the influenza vaccine, and healthy controls. We curated 18 microarray datasets for a total of 3,481 samples (1,277 controls, 297 influenza infection, 1,907 influenza vaccination). We pre-processed the raw microarray expression data in R using packages available for pre-processing data from the Affymetrix and Illumina microarray platforms. We applied a Box-Cox power transformation to the data prior to our downstream analysis to identify differentially expressed genes. Statistical analyses were based on a linear mixed effects model with all study factors and successive likelihood ratio tests (LRT) to identify differentially expressed genes. We filtered LRT results by disease (Bonferroni-adjusted p < 0.05) and used a two-tailed 10% quantile cutoff to identify biologically significant genes. Furthermore, we assessed age and sex effects on the disease genes by filtering for genes with a statistically significant (Bonferroni-adjusted p < 0.05) interaction between disease and age, and between disease and sex. We identified 4,889 statistically significant genes when we filtered the LRT results by the disease factor, and gene enrichment analysis (gene ontology and pathways) results included innate immune response, viral process, defense response to virus, Hematopoietic cell lineage, and the NF-kappa B signaling pathway. Our quantile-filtered gene lists comprised 978 genes each, associated with influenza infection and vaccination.
We also identified 907 and 48 genes with statistically significant (Bonferroni-adjusted p < 0.05) disease-age and disease-sex interactions, respectively. Our meta-analysis approach highlights key gene signatures and their associated pathways for both influenza infection and vaccination. We were also able to identify genes with an age and sex effect. These findings offer potential for improving current vaccines and for exploring genes that are expressed equally across ages when considering universal vaccinations for influenza.

Keywords: influenza, vaccinations, immunity, aging, meta-analysis, micro-arrays, gene-expression

1. Introduction

The influenza virus, a respiratory pathogen, is responsible for seasonal influenza (also known as the flu), influenza pandemics, and high rates of morbidity and mortality worldwide (1). The influenza virus infects the upper respiratory tract by invading the epithelial cells, releasing viral RNA, replicating, and spreading throughout the respiratory tract while also causing inflammation (2). Influenza is a highly contagious disease and spreads easily via contact with an infected person's nasal discharges and cough droplets (3). The main virulence factors are haemagglutinin (HA) and neuraminidase (NA) (2). These surface glycoproteins are also important for determining the sub-type of the influenza virus. The influenza virus can also reduce host gene expression through its viral proteins (4, 5). The viral proteins affect transcription and translation in the host, which reduces the production of host proteins and promotes immune system evasion by the virus (4, 5). The virus interferes with host gene expression to promote viral gene expression, and this affects the immune system of the host by reducing the expression of immune components such as major histocompatibility complex (MHC) molecules, antigen presentation, and interferon and cytokine signaling pathways (4, 6).
Influenza is a global health burden, and vaccinations are offered annually as a preventative measure. Vaccines are modified annually because the influenza virus strains change and mutate every season (7). The influenza vaccinations target the viral strains and sub-types that researchers predict will be most prevalent each flu season (3, 8). Furthermore, some groups in the population are considered to be at higher risk for influenza infection, including young children, the elderly, immunocompromised individuals, and pregnant women (3). For the 2017–2018 influenza season, the Centers for Disease Control and Prevention (CDC) estimated 959,000 hospitalizations and over 79,000 deaths (3). Ninety percent of the deaths during the 2017–2018 flu season were within the elderly population, while about 48,000 of the hospitalizations were in children (3). These estimates highlight that young children and especially the elderly are at higher risk for influenza and for severe infections that can lead to hospitalization or death. Additionally, the CDC has recommended varying dosages of each vaccine for different age groups due to age-dependent immune responses (3, 9). Because influenza vaccines are less effective in people aged 65 and older, this group receives different dosages than younger age groups in order to elicit a beneficial immune response (3, 9). Contrasting changes in gene expression due to immunosenescence in healthy subjects with the age-dependent immune responses to diseases such as influenza can improve our understanding of how responses to different diseases vary with age.

Because the influenza virus is constantly changing and the efficacy of the vaccine depends on one's age, researchers have started efforts to develop a universal vaccine (10–12). The goal is for such a universal vaccine to provide protection against all influenza strains (13).
One approach is to use highly conserved influenza peptides in vaccine formulations (12, 13).

Previous studies have investigated global blood gene expression to compare influenza disease with other respiratory diseases and to assess severity and pathogenesis (14). For example, influenza has been shown to induce a stronger immune response than respiratory syncytial virus by producing more respiratory cytokines (14, 15). Studies have also explored responses to vaccinations to highlight gene signatures.

In our meta-analysis, our aim was to combine publicly available influenza microarray data to identify the effects of disease state (control, influenza infection, and vaccination), age, and sex on gene expression. We explored gene expression variation in blood for 3,481 samples (1,277 controls, 297 influenza infected, 1,907 influenza vaccinated) to identify genes and their pathways in influenza (Figures 1, 2). To the best of our knowledge, this is the largest meta-analysis (18 datasets) to explore blood expression changes in influenza infection and vaccination. Our results provide gene signatures and pathways that can be targeted to improve influenza treatment and vaccinations. We also highlight disease-associated genes that interact with age and sex, which can be used to further explore improving vaccinations and to aid efforts in identifying potential gene targets for developing universal vaccinations that help reduce the burden of influenza.

Figure 1. Meta-analysis workflow to assess gene expression variation in influenza disease and vaccination. (A) Main steps. (B) Data pre-processing in R. (C) Downstream analysis. (D) Post-hoc analysis.

Figure 2. Preferred reporting items for systematic reviews and meta-analyses (PRISMA) checklist.

2. Methods

We curated 18 influenza-related microarray datasets from public database repositories (Table 1) to investigate changes in gene expression due to disease status, sex, and age. The 18 datasets were from Affymetrix and Illumina microarray platforms (Table 1). We modified and implemented the data-analysis pipeline outlined by Brooks et al. (29). After curating the datasets, we used the R programming language (30) to pre-process the raw gene expression data and to fit linear mixed effects models to determine statistically significant differentially expressed genes by factor (Figure 1). In addition, we identified genes that varied in expression due to disease status, sex, and age, and we determined which gene ontology (GO) terms and pathways were enriched in these gene sets (Figure 1).

Table 1. Demographics of curated influenza microarray datasets.
Accession number Controls Influenza disease Influenza vaccine Sex (M/F) Age range Platform References