Abstract Marek’s disease (MD) is a T cell lymphoma disease induced by Marek’s disease virus (MDV), a highly oncogenic α herpesvirus primarily affecting chickens. MD is a chronic infectious disease that threatens the poultry industry. However, the mechanisms of genetic resistance for MD are complex and not completely understood. In this study, to identify high-confidence candidate genes of MD genetic resistance, high throughput sequencing (RNA-seq) was used to obtain transcriptomic data of CD4^+ T cells isolated from MDV-infected and non-infected groups of two reciprocal crosses of individuals mating by two highly inbred chicken lines (6[3] MD-resistant and 7[2] MD-susceptible). After RNA-seq analysis with two biological replicates in each group, we identified 61 and 123 single nucleotide polymorphisms (SNPs) (false discovery rate (FDR) < 0.05) annotated in 39 and 132 genes in intercrosses 6[3] × 7[2] and 7[2] × 6[3], respectively, which exhibited allele-specific expression (ASE) in response to MDV infection. Similarly, we identified 62 and 79 SNPs annotated in 66 and 96 genes in infected and non-infected groups, respectively. We identified 534 and 1543 differentially expressed genes (DEGs) (FDR < 0.05) related to MDV infection in intercrosses 6[3] × 7[2] and 7[2] × 6[3], respectively. We also identified 328 and 20 DEGs in infected and non-infected groups, respectively. The qRT-PCR using seven DEGs further verified our results of RNA-seq analysis. The qRT-PCR of 11 important ASE genes was performed for gene functional validation in CD4^+ T cells and tumors. Combining the analyses, six genes (MCL1, SLC43A2, PDE3B, ADAM33, BLB1, and DMB2), especially MCL1, were highlighted as the candidate genes with the potential to be involved in MDV infection. Gene-set enrichment analysis revealed that many ASE genes are linked to T cell activation, T cell receptor (TCR), B cell receptor (BCR), ERK/MAPK, and PI3K/AKT-mTOR signaling pathways, which play potentially important roles in MDV infection. Our approach underlines the importance of comprehensive functional studies for gaining valuable biological insight into the genetic factors behind MD and other complex traits, and our findings provide additional insights into the mechanisms of MD and disease resistance breeding in poultry. Keywords: Marek’s disease virus, CD4^+ T cells, allele-specific expression, differential expression, genetic resistance 1. Introduction Marek’s disease (MD) is a T cell lymphoma and a strictly cell-associated disease induced by the highly oncogenic α-herpesvirus II disease virus [[38]1], which has a complex life with four main phases [[39]2,[40]3]: an early cytolytic phase at 2–7 days post infection (dpi), a latent phase around 7–10 dpi, a late cytolytic phase with the presence of tumors that is triggered between 14 and 21 dpi, and a final proliferation phase after 28 dpi. During the first cytolytic phase, Marek’s disease virus (MDV) first uses B cells as targets for its replication before targeting activated CD4^+ T cells to enable a persistent latent infection [[41]4,[42]5,[43]6]. MDV then uses T cells as the main target during the later phase, especially CD4^+ T cells [[44]7] during the cytolytic phase with tumors at 21 dpi, which were used in this study. CD4 is a co-receptor that facilitates T cell receptor (TCR) communication with an antigen-presenting cell. MD is a commercially important neoplastic disease in chickens and also the main chronic infectious disease threatening the poultry industry. Enhancing genetic resistance to MD in poultry is an important long-term goal in controlling MD. To optimally implement this control strategy through marker assisted selection (MAS) and to understand the etiology and mechanisms of MD, it is necessary to identify more specific alleles and genes with respect to MD latency, although alleles at the well-studied major histocompatibility complex (MHC) locus are already known to be involved in genetic resistance to MD. The comprehensive identification of genes underlying phenotypic variation of complex traits, especially disease resistance, remains one of the greatest challenges in biology, despite having genome sequences and powerful tools. Screening for allele-specific expression (ASE) [[45]8,[46]9,[47]10,[48]11] is an effective approach to identifying regulatory variation responsible for differences in transcript abundance in genes. ASE refers to unequal expression of multiple alleles of a gene, and the extreme case of ASE is monoallelic expression, where only one of the alleles is expressed while the other is completely inactive. The genes with single nucleotide polymorphisms (SNPs) exhibiting ASE will provide a strong foundation for investigating the genetic mechanisms of complex traits, such as MDV infection. Cheng et al. [[49]12] found that SNPs in ASE genes account for more than 83% of the additive genetic variation of genetic resistance to MD, demonstrating that most of the genes exhibiting ASE are strong candidates for studying MD resistance. MacEachern et al. [[50]13] suggested that ASE can be used to identify genes with cis-regulatory elements that respond to MDV infection. Perumbakkam et al. [[51]14] considered that the TLR and JAK/STAT signaling pathways may be responding to MDV infection through the genes exhibiting ASE. In the present study, transcriptomic data in CD4^+ T cells isolated from MDV-infected and non-infected groups of two reciprocal crosses were analyzed by high throughput sequencing (RNA-seq) to identify candidate genes of MD genetic resistance. Two F1 reciprocal crosses were used to compare the differences, not only between infected and non-infected individuals, but also between two reciprocal crosses. Key SNPs, genes, and pathways related to MD resistance or MD susceptibility were identified by ASE and differential expression (DE) analyses. We used both CD4^+ T cells and tumors for the validation of genes exhibiting ASE. 2. Materials and Methods 2.1. Ethics Statement All of the chickens were kept in a pathogen-free facility at the Avian Disease and Oncology Laboratory (ADOL, East Lansing, MI, USA). All animals were approved by the United States Department of Agriculture (USDA), Agricultural Research Service, and ADOL Animal Care and Use Committee. Their guidelines (revised April 2005, Approval Number: 6040-31320-009-00-D) were followed, along with the Guide for the Care and Use of Laboratory Animals published by Institute for Laboratory Animal Research (ILAR Guide) in 1996 ([52]http://www.nap.edu/openbook.php?record_id=5140) and University of Maryland (R-08-62). All efforts were made to minimize discomfort and suffering. 2.2. Chickens, Samples, and Experimental Design Two reciprocal crosses were obtained—the F1 progeny from intermating 6[3] and 7[2] lines (USDA-ARS ADOL, East Lansing, MI, USA) [[53]15], two highly inbred chicken lines that are MD-resistant and MD-susceptible, respectively. For each reciprocal cross, the chickens were divided into two groups, with two birds infected with MDV and two non-infected controls. A very virulent plus (vv+) strain of MDV-648A passage 50 [[54]16] was injected intra-abdominally at day 5 after hatching with a viral dosage of 500 plaque-forming units (PFU) per chick. The viral challenge experiment was conducted in the BSL-2 facility at ADOL. At 21 dpi, CD4^+ T cells were isolated from blood using a CD4^+ T cell isolation kit (Miltenyi Biotec, Auburn, CA, USA) with anti-biotin microbeads (Miltenyi Biotec, Auburn, CA, USA) and mouse anti-chicken CD4 antibody (SouthernBiotech, Birmingham, AL, USA). The number and purities of enriched cells were determined by flow cytometry using the MACSQuant^® analyzer (Miltenyi Biotec, Auburn, CA, USA) and stored in RNAlater solution (Qiagen, Valencia, CA, USA) immediately at −80 °C until RNA extraction. 2.3. Library Construction and Sequencing Total RNA was isolated from CD4^+ T cells using TRIzol (Invitrogen, Carlsbad, CA, USA) and examined using the RNeasy Mini Kit (Qiagen, Valencia, CA, USA) according to the manufacturer’s instructions. RNA concentration was assessed using a Nanodrop ND-1000 spectrophotometer (Thermo Scientific, Madison, WI, USA) and RNA quality was determined using the 2100 Bioanalyzer (Agilent, Pal Alto, CA, USA). Only samples with RIN scores over 7 could be submitted for sequencing. Then, mRNA was used to synthesize the first- and second-strand cDNA using SuperScriptTM III reverse transcriptase (Invitrogen, Carlsbad, CA, USA) and oligo (dT) 12–18 primers (Invitrogen, Carlsbad, CA, USA). After purification, the double-strand cDNA (dscDNA) was fragmented into ~300 base pairs (bp). The library was subsequently constructed as follows. End repair of the fragmented dscDNA was performed and then 3’ poly-A was added to the end-repaired dscDNA using DNA polymerase I, large (Klenow) fragment. A pair of Solexa adaptors was ligated to the repaired ends using T4 ligase, then 200–400 bp of fragments were selected using the Invitrogen^® 2% E-Gel (Carlsbad, CA, USA). Specific dscDNA fragments were amplified by PCR and the libraries were then quantified and pooled. Finally, clusters were generated and sequencing was analyzed using Illumina Hiseq 2000 (single end, 50 base read length; San Diego, CA, USA). The sequencing data were submitted to the Sequence Read Archive (SRA) of National Center for Biotechnology Information (NCBI), and are accessible through the accession number PRJNA488865. 2.4. Mapping and Assembling Chicken genome assembly (galGal4) was downloaded from UCSC Genome Browser website ([55]http://hgdownload.soe.ucsc.edu/goldenPath/galGal4/bigZips/) [[56]17]. To minimize the mapping errors, quality control was performed by FastQC [[57]18] and low quality reads were removed with the help of the FastX Toolkit [[58]19] and Trimmomatic [[59]20] with default parameters. The resulting FastQ files of mapping reads of each sample were individually aligned with the reference genome using Bowtie 2 (version 2.2.7) [[60]21,[61]22], with mainly default parameters. SAMtools (version 1.3) [[62]23] was then used to convert the alignment results (SAM format) to BAM format for calling SNPs (for ASE analysis) and counting reads number (for DE analysis). 2.5. SNP Calling, ASE Estimation, and Functional Annotation of SNPs SNP calling was performed for each chromosome on all samples simultaneously, using SAMtools and BCFtools (version 1.3) [[63]24] with basic command lines. The SNPs were then filtered using the BCFtools to exclude the following variants: (1) SNPs with a mapping quality covering reads < 20; and (2) SNPs with a total read depth < 4 or > 100. Read counts per allele were calculated by the Genome Analysis Toolkit (GATK, version 3.5) [[64]25] ASEReadCounter with default parameters. The read counts of each allele were used for further ASE SNP detection. All of the indels were subsequently removed and the variant calling format (VCF) files derived from non-infected and infected birds were merged using VCFtools (version 0.1.14) [[65]26]. An in-house R script was used for allelic imbalance determined by a chi-square test with two variables: (1) the read counts (allele A (reference) and B (alternative)) of each SNP and (2) two different groups (four comparison groups were divided into 6[3] × 7[2] infected (I) vs 6[3] × 7[2] non-infected (N), 7[2] × 6[3] I vs 7[2] × 6[3] N, 6[3] × 7[2] I vs 7[2] × 6[3] I, and 6[3] × 7[2] N vs 7[2] × 6[3] N). SNPs with a 0.05 level of significance (false discovery rate (FDR) < 0.05) were chosen for further investigation. Variant Effect Predictor (VEP) [[66]27] based on the Ensembl website ([67]http://useast.ensembl.org/Gallus_gallus/Tools/VEP) was used to functionally annotate the putative SNPs to the genes within 5000 bp. Each SNP was classified based on its position in the reference chicken genome as exonic, intronic, intergenic, 5’ untranslated region (UTR), 3’ UTR, splicing site, and upstream or downstream. 2.6. Differential Expression (DE) Analysis The alignments obtained above were counted by HTSeq count based on python [[68]28] and analyzed using edgeR [[69]29], a bioconductor package for DE analysis within R. The resulting data of HTseq from all of the individual samples were converted and merged to text files imported into R and then analyzed. A portion of the reads (10.4 and 13.8 million reads on average for intercross 6[3] × 7[2] and 7[2] × 6[3], and 11.8 and 12.4 million reads on average in infected and non-infected groups, respectively) were assigned as “no feature”. 2.7. Experimental Validation To evaluate the reliability of our data analysis, we randomly selected seven genes for quantitative real-time PCR (qRT-PCR) validation in CD4^+ T cells. Simultaneously, 11 overlapped genes between ASE and DE analysis were selected for gene functional validation in both CD4^+ T cells and tumors. The protocols followed for mRNA extraction and dsDNA synthesis were the same as those mentioned above. Gene-specific primers were designed to span over exons from the chicken genome reference Unigene sequences available in GenBank for conventional PCR amplification using Primer3 ([70]http://fokker.wi.mit.edu/primer3/input.htm) and confirmed by Oligo 6.0 ([71]Tables S1 and S2). The housekeeping gene GAPDH and β-actin were used as the endogenous control. The qRT-PCR using SYBR Green PCR Kit was performed in triplicate based on iCycler iQ PCR System (Bio-Rad, Hercules, CA, USA). The qRT-PCR reaction program was run as follows: pre-incubation (95 °C for 10 min), 40 cycles of amplification (95 °C for 10 s, 60 °C for 10 s, and 72 °C for 10 s), melting curves using a heat ramp, and cool down. Cycle threshold values (Ct values) were obtained from iCycler iQ PCR software (Bio-Rad, Hercules, CA, USA). The expressions of genes were normalized against GAPDH or β-actin cDNA in the corresponding samples. The relative fold enrichment of each treatment group was calculated by comparing the enrichment value to GAPDH or β-actin. 2.8. Gene Ontology (GO) and Pathway Enrichment Analyses Using DAVID and IPA Two methods were used for GO and pathway enrichment analysis here. The ASE significant genes (FDR < 0.05) list was firstly submitted to DAVID (version 6.7) ([72]https://david-d.ncifcrf.gov/) [[73]30]. The analysis classification stringency was set to the highest level with suitable controls. The resulting clustering was then limited to an enrichment score of >1.0 and FDR for multiple testing was performed following the Benjamin and Hochberg method. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis was also performed using DAVID. Simultaneously, gene functions were annotated using the Ingenuity Pathway Analysis (IPA) tool ([74]http://www.ingenuity.com/products/ipa). The program inquires the IPA knowledge database for information and direct relationships of genes and endogenous chemicals. This creates algorithmically generated networks and data clustered into biological functions and diseases that are overrepresented in the scrutinized data. The program determines the high representation of signaling and biological pathways. The method employs the Fisher’s exact test, determining the proportion of genes mapped to a function or pathway in the sample and then compares it to the ratio in the reference set. The result included functional networks, disease and disorders, molecular and cellular functions, physiological system development and function, top canonical pathways, upstream regulators, and top toxicity lists. 3. Results 3.1. ASE SNP Discovery and ASE Genes in Response to MDV Infection In the present study, we sequenced RNA samples from two infected and two non-infected birds for each reciprocal cross ([75]Figure 1). The average number of raw reads was approximately 22.15, 28.92, 23.37, and 27.69 million for intercross 6[3] × 7[2], 7[2] × 6[3], infected, and non-infected datasets, respectively ([76]Table 1). After quality control (QC) procedures, all of the raw data were sufficient, with no further trimming processes required. The mapping levels of the samples were adequate and similar, ranging from 73.13% to 79.61% for all individuals. These high quality alignments were appropriate for subsequent analysis, with few false positives. Figure 1. [77]Figure 1 [78]Open in a new tab Experimental and data analyses designs. (A) Experimental design for identifying allele-specific expression (ASE) and differentially expressed (DE) genes based on high throughput sequencing (RNA-seq) using reciprocal crosses. (B) The pipeline of ASE and DE gene identification. The number on an arrow indicates the remaining genes after the previous step. Table 1. Reads alignment. Sample No. of Total Reads One Time Aligned Reads Multiple Times Aligned Reads Total Aligned Reads Alignment 6[3] × 7[2] I_1 19,223,914 14,084,204 1,063,547 15,147,751 78.80% 6[3] × 7[2] I_2 18,840,597 13,938,069 1,061,498 14,999,567 79.61% 6[3] × 7[2] N_1 21,702,185 15,052,834 1,315,212 16,368,046 75.42% 6[3] × 7[2] N_2 28,813,734 18,928,310 2,143,823 21,072,133 73.13% 7[2] × 6[3] I_1 24,046,040 17,200,244 1,406,524 18,606,768 77.38% 7[2] × 6[3] I_2 31,386,432 22,780,475 1,935,946 24,716,421 78.75% 7[2] × 6[3] N_1 34,382,991 23,676,165 2,111,990 25,788,155 75.00% 7[2] × 6[3] N_2 25,854,955 17,753,865 1,747,443 19,501,308 75.43% [79]Open in a new tab All of the SNPs were called within alignments using SAMtools and BCFtools, and read counts per allele were calculated with the help of GATK ASEReadCounter (Broad Institute, Cambridge, MA, USA). The average number of SNPs finally called was in the approximate range of 500,000 to 710,000. With the criteria described above, around 150,000–260,000 SNPs were left after filtering, and about one-third of heterozygous SNPs were left ([80]Table 2). All of the VCF files were then merged and divided into four comparison groups (infection in contrast to non-infection in 6[3] × 7[2] and 7[2] × 6[3] intercross; infection comparison and non-infection comparison groups between two intercrosses) using VCFtools and consensus SNPs (the missing SNPs that were not present in all of the individuals involved in one comparison group were removed and the overlapped SNPs were retained) were ultimately selected for ASE SNP identification ([81]Table 3). Table 2. Number of raw, screened, and heterozygous single nucleotide polymorphisms (SNPs). Sample Raw SNPs No. of SNPs after Filtering Heterozygous SNPs 6[3] × 7[2] I_1 552,541 165,244 65,689 6[3] × 7[2] I_2 541,775 162,552 62,759 6[3] × 7[2] N_1 507,400 156,293 64,405 6[3] × 7[2] N_2 577,350 184,980 75,044 7[2] × 6[3] I_1 614,547 199,101 81,324 7[2] × 6[3] I_2 716,655 259,509 105,297 7[2] × 6[3] N_1 671,763 250,528 100,882 7[2] × 6[3] N_2 561,954 184,962 73,707 [82]Open in a new tab Table 3. Number of ASE SNPs. Group SNPs for Chi-Square Test No. of ASE SNPs (False Discovery Rate (FDR) < 0.05) No. of ASE Genes 6[3] × 7[2] I vs 6[3] × 7[2] N 15,105 61 39 7[2] × 6[3] I vs 7[2] × 6[3] N 22,067 123 132 6[3] × 7[2] I vs 7[2] × 6[3] I 18,229 62 66 6[3] × 7[2] N vs 7[2] × 6[3] N 20,310 79 96 [83]Open in a new tab An R script was used for each SNP determined by a chi-square test examining each allele (read counts of alleles A and B) and two different statuses in birds—infected and non-infected. A total number of 61 and 123 SNPs exhibiting ASE in response to MDV infection were identified in intercrosses 6[3] × 7[2] and 7[2] × 6[3], respectively (FDR < 0.05, [84]Table 3). The significant SNPs for both reciprocal crosses were distributed on each chromosome of the chicken genome ([85]Figure 2A). To locate these SNPs on the chicken genome with respect to genes and classify them based on function, VEP was then used to functionally annotate the putative SNPs to the genes within 5000 bp. In intercross 6[3] × 7[2], 61 significant SNPs were located in 39 genes; the majority of SNPs were classified as exonic, intronic, and intergenic. Similarly, in intercross 7[2] × 6[3], 123 SNPs were located in 132 genes; the majority of SNPs were classified as intronic, downstream, and exonic ([86]Table 4). For example, two alleles with T-A and A-G on chromosome 25 are located on the intron of MCL1. Three alleles with C-A, A-T, and T-C on chromosome 5 are located on the intron of PDE3B. Seven genes were matched between the two reciprocal crosses, including MCL1, PTBP1, SLC43A2, ST6GAL1, PDE3B, RUNX1, and ADAM33. The detailed lists of ASE SNPs and genes responding to MDV infection are provided in [87]Table S3. Figure 2. [88]Figure 2 [89]Open in a new tab Distribution of ASE SNPs in chicken chromosomes. (A) Distribution of ASE SNPs in intercrosses 6[3] × 7[2] and 7[2] × 6[3], respectively, in response to MDV infection. (B) Distribution of ASE SNPs between two reciprocal crosses in infected and non-infected groups. Table 4. Classification of ASE SNPs responding to Marek’s disease virus (MDV) infection. SNP Type 6[3] × 7[2] I_N 7[2] × 6[3] I_N SNPs Genes SNPs Genes Exonic 19 17 29 24 Intronic 16 13 52 43 Intergenic 17 - 4 - Upstream 0 0 15 14 Downstream 0 0 36 32 5’ UTR 1 1 1 1 3’ UTR 6 6 20 16 Splicing 2 2 2 2 Total 61 39 159 132 [90]Open in a new tab UTR: Untranslated region. 3.2. ASE SNP Discovery and ASE Genes between Two Chicken Reciprocal Crosses Similarly, a chi-square test was used to examine each allele and two different statuses in infected birds of intercross 6[3] × 7[2] compared with those of 7[2] × 6[3] and in non-infected birds. A total of 62 and 79 SNPs exhibiting ASE were identified in infected and non-infected birds, respectively (FDR < 0.05, [91]Table 3). The significant SNPs were distributed on each chromosome of the chicken genome ([92]Figure 2B). A total of 62 significant SNPs were located in 66 genes and 79 SNPs in 96 genes in infected and non-infected groups, respectively ([93]Table S3). The classification of ASE SNPs is detailed in [94]Table 5. Therefore, 96 ASE genes were identified before MDV infection and 66 were identified after infection, with 11 overlapped genes in total (MCL1, VAV3, PTPRC, RAB8B, ATP13A2, RAB43, CHD2, CD44, NRK, RHOH, and ENSGALG00000026782 (novel gene)), including MCL1. Interestingly, the same alleles located on the intron of MCL1 were also identified. The common genes were both identified before and after MDV infection, so they may not only be associated with MDV but also be related to reciprocal crosses in both the virus- and host-related genes. We identified 85 unique genes before infection and 55 unique genes after infection, which should be in response to MDV infection with different statuses before and after infection. Table 5. Classification of ASE SNPs between two reciprocal crosses. SNP Type Infected Non-Infected SNPs Genes SNPs Genes Exonic 12 11 20 18 Intronic 26 21 21 16 Intergenic 0 - 5 - Upstream 11 10 21 18 Downstream 19 18 23 23 5’ UTR 0 0 0 0 3’ UTR 6 5 18 17 Splicing 1 1 4 4 Total 75 66 112 96 [95]Open in a new tab 3.3. Comparison and Analysis of Differentially Expressed Genes (DEGs) The same aligned datasets were used for the DE analysis. DE was estimated by obtaining the count data from each biological sample using the Htseq script [[96]28]. Biological replicates were closed to each other ([97]Figure 3) to verify the reliability of our datasets. A total of 17,108 chicken genes from galGal4 and edgeR, based on R package, were then used to estimate DE. We also divided the samples into four comparison groups the same as for ASE analysis. A total of 534 and 1543 DEGs related to MDV infection were identified in intercrosses 6[3] × 7[2] and 7[2] × 6[3], respectively (FDR < 0.05; [98]Table 6, [99]Figure 4A,B). For intercross 6[3] × 7[2], the expression of 382 (71.5%) DEGs were higher and 152 (28.5%) were lower after infection. For intercross 7[2] × 6[3], 854 (55.3%) and 689 (44.7%) DEGs were up- and down-regulated after infection, respectively. Similarly, for 328 and 20 DEGs, 279 up-regulated and 49 down-regulated genes were identified in infected groups, and 5 up-regulated and 15 down-regulated genes were identified in non-infected groups (FDR < 0.05; [100]Table 6, [101]Figure 4C,D), respectively. The detailed lists of DEGs are provided in [102]Table S4. Compared with the ASE genes, 3 and 24 overlapped genes responding to MDV infection were identified in intercrosses 6[3] × 7[2] and 7[2] × 6[3], respectively. Similarly, 10 and 0 overlapped genes were identified in infected groups and non-infected groups, respectively ([103]Table 7). These common genes, including MCL1, SLC43A2, PDE3B, and ADAM33, which were mentioned above, could be selected as candidate MD-resistant or MD-susceptible genes. The detailed information of these genes is provided in [104]Table S5. Figure 3. [105]Figure 3 [106]Open in a new tab Cluster map of all of the samples. (A) Heatmap and hierarchical clustering tree based on the Reynolds’ distance between individuals. RS and SR represent intercrosses 6[3] × 7[2] and 7[2] × 6[3], respectively. I and N respect infected and non-infected groups, respectively. Two biological replicates are clustered together. (B) Principle component analysis (PCA) plots based on the first two principal components. The blue and red dots represent infected and non-infected birds, respectively. The infected and non-infected birds are clustered into two groups. Table 6. Differentially expressed genes (DEGs) of four comparison groups. Group No. of DEGs (FDR < 0.05) Up-/Down-Regulated 6[3] × 7[2] I vs 6[3] × 7[2] N 534 382/152 7[2] × 6[3] I vs 7[2] × 6[3] N 1543 854/689 Overlapped 223 153/70 6[3] × 7[2] I vs 7[2] × 6[3] I 328 279/49 6[3] × 7[2] N vs 7[2] × 6[3] N 20 5/15 Overlapped 0 0/0 [107]Open in a new tab Figure 4. [108]Figure 4 [109]Open in a new tab The differentially expressed genes. (A) Infected birds in contrast to non-infected birds in intercross 6[3] × 7[2]. (B) Infected birds in contrast to non-infected birds in intercross 7[2] × 6[3]. Comparison of two reciprocal crosses in (C) infected groups and (D) non-infected groups. MA plot shows the significant differentially expressed genes in red (y-axis depicts the log[2] fold change (FC) of gene expression; x-axis represents the average log reads count per million (CPM) for each gene; the blue lines denote |log[2] Fold Change| = 1). Table 7. ASE and DE genes of four comparison groups. Group No. of ASE Genes No. of DE Genes Overlapped Genes 6[3] × 7[2] I vs 6[3] × 7[2] N 39 534 3 7[2] × 6[3] I vs 7[2] × 6[3] N 132 1543 24 6[3] × 7[2] I vs 7[2] × 6[3] I 66 328 10 6[3] × 7[2] N vs 7[2] × 6[3] N 96 20 0 [110]Open in a new tab 3.4. RNA-Seq Results Verification and Gene Functional Validation by qRT-PCR To verify the RNA-seq analysis, seven genes (GZMA, GZMK, LITAF, DDX60, AVD, CYP26B1, and ENSGALG00000019325 (novel gene)) were examined by qRT-PCR using the RNA of CD4^+ T cells derived from infected and non-infected birds. The results ([111]Figure 5) showed good agreement with the DE analysis pattern, indicating the reliability of the latter. Figure 5. [112]Figure 5 [113]Open in a new tab The qRT-PCR of seven genes for validating the RNA-seq results in intercrosses (A) 6[3] × 7[2] and (B) 7[2] × 6[3]. The y-axis represents the log[2] FC of gene expression; the x-axis shows the Ensembl names of genes employed for validation. Blue and red bars depict the RNA-seq and qRT-PCR results, respectively. R^2 refers to the coefficient of determination. As described above, the overlapped genes between ASE and DE analyses may be much more important. Thus, we selected 11 overlapped genes based on their existing gene functions and the position of their ASE SNPs for the functional validation in both CD4^+ T cells and tumors using qRT-PCR to confirm whether they are critical genes in the response to MDV infection. The results ([114]Figure 6) revealed that some of these genes, including MCL1, SLC43A2, PDE3B, ADAM33, PPFIA1, CD28, B4GALT3, and CBLB, showed significant differences (|fold change (FC)| > 1.2 or |log[2] FC| > 0.27) in both CD4^+ T cells and tumors of the comparison groups, and the same trend was observed with the RNA-seq results. Figure 6. [115]Figure 6 [116]Open in a new tab Gene functional validation in CD4^+ T cells and tumors using qRT-PCR: (A) Infected in contrast to non-infected; (B) infected comparison groups between two reciprocal crosses. The y-axis represents the log[2] FC of gene expression; the x-axis shows the genes employed for validation. Blue, red, and green bars depict the RNA-Seq and qRT-PCR of genes in CD4^+ T cell and tumor results, respectively. R^2 refers to the coefficient of determination. 3.5. GO and Pathway Enrichment Analyses of ASE Genes To identify potential GO terms and pathways, a list of significant ASE genes (FDR < 0.05) was firstly submitted to DAVID (version 6.7) for biological processing and pathway enrichment. According to the ASE analysis results, 268 ASE genes related to MDV infection were finally obtained as initial input genes ([117]Table S6). Using functional annotation clustering, 38 clusters were formed at the highest classification stringency. However, only seven clusters were chosen after using an enrichment cutoff > 1.0 ([118]Table S7). GO terms and pathways analysis invoked in DAVID yielded 45 (5 terms of cellular component, 15 terms of molecular function, and 25 terms of biological processes) functional terms (P < 0.05, [119]Table S8) and 11 pathways (P < 0.05, [120]Table 8), including the immune-related pathway, T cell activation ([121]Figure S1). Table 8. DAVID analysis of ASE genes in response to MDV infection (P < 0.05). Category Term Count % P-Value KEGG_PATHWAY gga04914:Progesterone-mediated oocyte maturation 6 2.43 1.61E-02 KEGG_PATHWAY gga04210:Apoptosis 6 2.43 1.71E-02 KEGG_PATHWAY gga04810:Regulation of actin cytoskeleton 9 3.64 2.58E-02 KEGG_PATHWAY gga04672:Intestinal immune network for IgA production 4 1.62 2.83E-02 KEGG_PATHWAY gga04910:Insulin signaling pathway 7 2.83 2.86E-02 KEGG_PATHWAY gga04114:Oocyte meiosis 6 2.43 4.37E-02 KEGG_PATHWAY gga04070:Phosphatidylinositol signaling system 5 2.02 4.39E-02 PANTHER_PATHWAY [122]P00021:FGF signaling pathway 7 2.83 2.23E-03 PANTHER_PATHWAY [123]P00018:EGF receptor signaling pathway 6 2.43 8.78E-03 PANTHER_PATHWAY [124]P00048:PI3 kinase pathway 5 2.02 1.89E-02 PANTHER_PATHWAY [125]P00053:T cell activation 5 2.02 2.02E-02 [126]Open in a new tab The ASE gene set was also submitted for IPA analysis. We obtained 215 pathways (P < 0.05, [127]Table S9). The preeminent canonical pathways are comprised of T cell receptor (TCR), PI3K/AKT, B cell receptor (BCR), interleukin-4 (IL-4), and ERK/MAPK signaling pathways ([128]Figures S2–S6). The corresponding P-values and ratios are listed in [129]Table 9. The top diseases and disorder clusters, including inflammatory response, immunological disease, hematological disease, cancer, and organismal injury and abnormalities, were considered relevant to MDV infection. Additional information about the top networks, top lists, and top molecules were also obtained from the IPA analysis ([130]Table S10). Table 9. Top canonical pathways from IPA. Ingenuity Canonical Pathways P-value Ratio T Cell Receptor Signaling 1.00E-11 14/109 (12.8%) PI3K/AKT Signaling 7.40E-10 13/124 (10.5%) B Cell Receptor Signaling 1.34E-09 15/185 (8.1%) IL-4 Signaling 2.65E-09 11/89 (12.4%) ERK/MAPK Signaling 3.67E-09 15/199 (7.5%) [131]Open in a new tab 4. Discussion MD, a complicated tumor disease, has been used as a model to study human tumors [[132]31]. The genetic mechanism underlying MD resistance and MD susceptibility is likely to be complex and remains incompletely understood. Thus, understanding the genetic basis of MD resistance or MD susceptibility for poultry is important to provide crucial clues for human diseases. In the present study, based on the high throughput sequencing platform, some bioinformatics analyses, such as ASE, DE, GO, and pathway analysis, were used to identify SNPs, genes, and enriched pathways from infected and non-infected birds. Six genes exhibiting ASE and six pathways were selected as the likely candidate factors that induce MD resistance. To minimize transcriptional variations and take full advantage of identical genetic backgrounds in inbred lines, we paired birds by reciprocal crossing and MDV infection and tested the differences. We not only identified genes most likely related to MD resistance, but also revealed the effects of MDV infection on the host. ASE is a powerful technique used to measure the expression of each allele through studying SNP within an RNA sample [[133]32,[134]33,[135]34,[136]35]. The F[1] data used here are an ideal test case for our approach. In this study, we hypothesized that a gene with allelic inequality or imbalance is the key genetic factor causing MDV susceptibility. Since variation in gene expression is thought to be a major factor affecting phenotypic variation, genes with ASE SNPs provide candidates and markers that may account for the complex trait of interest. Although the ASE SNPs identified here could not be considered to explain much more additive genetic variation of the genetic resistance to MD because fewer ASE SNPs and genes were obtained from RNA-seq data, our findings are still of interest in view of the strict experimental design and analysis methods. In this experiment, we successfully aligned RNA-seq reads to the reference chicken genome and identified high quality SNPs in infected and non-infected chickens. The significant SNPs exhibiting ASE in response to MDV infection were distributed on autosomes 1 to 28 and Z and W chromosomes of the chicken genome, with the number of SNPs for each chromosome roughly proportional to the size of each chromosome in the chicken genome assembly. To locate these SNPs on the chicken genome with respect to genes and classify them based on function, VEP was used. In the 6[3] × 7[2] I_N group, the most SNPs (n = 19, 31.1%) were classified as exonic; however, the largest number of SNPs (n = 52, 32.7%) was classified as intronic in the 7[2] × 6[3] I_N group. The largest number of SNPs was classified as intronic (n = 26, 34.7%) and located downstream of a gene (n =23, 20.5%) in infected and non-infected groups, respectively. Interestingly, for both ASE and DE results, more SNPs and DEGs were identified in the 7[2] × 6[3] I_N group than in the 6[3] × 7[2] I_N group, which may be because of the different MD incidences (%) of both reciprocal crosses. Seven overlapped ASE genes were identified in intercrosses 6[3] × 7[2] and 7[2] × 6[3] in response to MDV infection. We also identified 11 overlapped ASE genes in infected and non-infected groups, and these common genes are likely induced by MD viral infection in the host. MCL1 was the only gene identified in both overlapped groups, which could be firstly selected as a key gene in the response to MDV infection. Other ASE genes, including CD5, CD28, CD44, TCF1, and IRF-4, are also important genes related to T cells, viruses, and tumors. DE is another powerful technique that could further explain the results of ASE. Many more overlapped genes were identified in DE than in ASE analysis in response to MDV infection, including MCL1. We compared the results of the DEGs to the ASE gene analysis to identify four overlapped genes: MCL1, SLC43A2, PDE3B, and ADAM33. As a tumor disease, gene functional validation was conducted not only in CD4^+ T cells but also in tumor tissues, which could provide powerful evidence of key genes responding to MDV resistance. Eight genes exhibiting ASE, including these four genes, showed significant differences in both CD4^+ T cells and tumors of the comparison groups, and were completely consistent with the RNA-seq results. Therefore, these four genes could be selected as potential candidate genes to study MD resistance or MD susceptibility, of which MCL1, as a member of the BCL2 family, plays a role in cell proliferation, differentiation, tumorigenesis, and apoptosis [[137]36,[138]37,[139]38,[140]39]. It is also associated with tumor, cancer, and CD4^+ and CD8^+ T cells [[141]40,[142]41,[143]42,[144]43]. In this study, we found an ASE SNP in this MCL1 gene, and the expression of this gene is significantly higher in infected birds and tumors. Thus, this SNP may increase the gene expression of MCL1 after MDV infection. This gene was identified both as a virus-related gene and a host-related gene, so it could be a critical gene connecting MDV to the host. Chicken MHC plays an important role in the determination of resistance to MDV [[145]1]. The GGA16 contains three loci, the B locus, Y (or Rfp-Y) locus, and nucleolar-organizing region (NOR), and only one ASE SNP was found to be associated with a gene that encodes for the class I alpha chain of the Rfp-Y loci [[146]14]. In this study, no SNPs in the related loci were identified, and this could be attributed to the incomplete sequence information available for the GGA16 loci in the chicken genome. However, in the DE analysis, two MHC class II (MHCII)-associated genes, BLB1 [[147]44,[148]45,[149]46] and DMB2 [[150]47,[151]48,[152]49], were found to be significantly different between infected and non-infected birds, which are related to T cell immune responses, and their expressions are altered in cancers and tumors. Due to the complexity of MD, it may not be a single-gene-controlled trait. Thus, we subsequently performed pathway analyses to investigate the specific gene sets involved in signaling cascades. DAVID analysis is usually a simple and useful method that is widely used for many gene set enrichment analyses. IPA is a powerful and useful commercial gene set enrichment analysis tool. In this study, based on the DAVID and IPA results, we focused on several important pathways that may be related to MDV infection. T cell activation [[153]50] was identified by DAVID from the PANTHER pathway dataset. Activation of CD4^+ T cells occurs through the simultaneous engagement of the T cell receptor and a co-stimulatory molecule on the T cell via the MHCII peptide and co-stimulatory molecules on the antigen-presenting cell (APC). This signaling pathway downstream from co-stimulatory molecules engages the PI3K/AKT molecules, which was identified in both DAVID and IPA analysis. The PI3K/AKT-mTOR signaling pathway is one of the three major signaling pathways critical in tumor progression. This classic route consists of two signaling pathways, PI3K/AKT and mTOR, which are important in regulating the cell cycle and directly related to cellular proliferation, tumors, and cancer. AKT is activated downstream of PI3K and then activates mTOR [[154]51]. In many cancers, this pathway is overactive, thus reducing apoptosis and allowing proliferation. This pathway is related to the T cell activation pathway mentioned above and the key MD-related gene MCL1, identified in this study, is also involved in this pathway. This pathway may play a role in MD resistance or MD susceptibility. The TCR signaling pathway was the top canonical pathway identified by IPA. TCR is a molecule found on the surfaces of T cells or T lymphocytes, and is responsible for recognizing fragments of antigens as peptides bound to MHC molecules. When the TCR engages with an antigenic peptide and MHC (peptide/MHC), the T lymphocyte is activated through signal transduction, which is a series of biochemical events mediated by associated enzymes, co-receptors (like CD4), specialized adaptor molecules, and activated or released transcription factors [[155]52]. MDV first uses B cells as targets for its replication before targeting activated CD4^+ T cells. Thus, the BCR signaling pathway is critical for MDV infection. BCR is a transmembrane receptor protein located on the outer surface of B cells that has two crucial functions: signal transduction, involving changes in receptor oligomerization, and mediating internalization for subsequent processing of the antigen and presentation of peptides to helper T cells. BCR functions are required for normal antibody production, so defects in BCR signal transduction may lead to immunodeficiency [[156]53]. The ERK/MAPK signaling pathway is a chain of proteins in the cell that communicates a signal from a receptor on the surface of the cell to the DNA in the nucleus of the cell. It consists of many proteins, including mitogen-activated protein kinase (MAPK), which is necessary for the development of cancers [[157]54]. This pathway is one of the most extensively studied pathways involved in tumorigenesis [[158]55] and has been found to be critical in previous MD studies by Yan et al. [[159]56] and Subramaniam et al. [[160]57,[161]58]. Overall, our analyses indicate that the genes and pathways described here are worthy of further studies on MD. 5. Conclusions In summary, we successfully identified SNPs associated with ASE and revealed genes and pathways that may be involved in genetic resistance to MD. Combining ASE, DE, and pathway analyses, and due to the complexity of MD, we ultimately found with high confidence that candidate genes (including MCL1, SLC43A2, PDE3B, ADAM33, BLB1, and DMB2, and especially MCL1) and several immune- or disease-related pathways (such as T cell activation, TCR, BCR, ERK/MAPK, and PI3K/AKT-mTOR signaling pathways) play potentially important roles in MDV infection. Overall, our approach underlines the importance of comprehensive functional studies in gaining valuable biological insight into the genetic factors behind MD and other complex traits. Our findings provide additional insights into the mechanisms of MD and disease resistance breeding in poultry. Acknowledgements