Abstract
Bovine respiratory disease (BRD), the leading disease complex in beef
cattle production systems, remains highly elusive regarding diagnostics
and disease prediction. Previous research has employed cellular and
molecular techniques to describe hematological and gene expression
variation that coincides with BRD development. Here, we utilized
weighted gene co-expression network analysis (WGCNA) to leverage total
gene expression patterns from cattle at arrival and generate
hematological and clinical trait associations to describe mechanisms
that may predict BRD development. Gene expression counts of previously
published RNA-Seq data from 23 cattle (2017; n = 11 Healthy, n = 12
BRD) were used to construct gene co-expression modules and correlation
patterns with complete blood count (CBC) and clinical datasets. Modules
were further evaluated for cross-populational preservation of
expression with RNA-Seq data from 24 cattle in an independent
population (2019; n = 12 Healthy, n = 12 BRD). Genes within
well-preserved modules were subject to functional enrichment analysis
for significant Gene Ontology terms and pathways. Genes which possessed
high module membership and association with BRD development, regardless
of module preservation (“hub genes”), were utilized for protein-protein
physical interaction network and clustering analyses. Five
well-preserved modules of co-expressed genes were identified. One
module (“steelblue”), involved in alpha-beta T-cell complexes and
Th2-type immunity, possessed significant correlation with increased
erythrocytes, platelets, and BRD development. One module (“purple”),
involved in mitochondrial metabolism and rRNA maturation, possessed
significant correlation with increased eosinophils, fecal egg count per
gram, and weight gain over time. Fifty-two interacting hub genes,
stratified into 11 clusters, may possess transient function involved in
BRD development not previously described in literature. This study
identifies co-expressed genes and coordinated mechanisms associated
with BRD, which necessitates further investigation in BRD-prediction
research.
Introduction
Despite decades of research involved in discovering novel management
tools, developing interventional systems, and advancing antimicrobial
therapeutics, bovine respiratory disease (BRD) remains the leading
cause of morbidity and mortality in beef cattle operations across North
America [[36]1–[37]3]. Due to its widespread prevalence, BRD is
considered one of the most economically devastating components of beef
cattle production systems [[38]2–[39]4]. BRD is a polymicrobial,
multifactorial disease complex, incorporating infectious agents, host
immunity, and environmental elements as predisposing factors
[[40]5–[41]7]. Previous research over the past several decades has
greatly detailed these factors and risks associated with BRD, yet there
is minimal evidence that overall rates of disease have improved [[42]5,
[43]8–[44]10]. Furthermore, diagnostic evaluation of BRD often relies
on visual signs attributed to the disease complex, which are commonly
non-specific to airway and lung disease, and lack clinical sensitivity
[[45]11, [46]12]. Therefore, data driven approaches which capture the
biological intricacies associated with clinical BRD development and
provide candidate molecular targets capable of stratifying or
predicting risk of disease and/or production loss would offer a more
precise method of managing BRD.
Clinical BRD progression and severity often presents as an acute
inflammatory disease [[47]13]. However, molecular and cellular changes
precede physiological changes in terms of disease development. As such,
identifying consistent molecular and/or cellular components that relate
to BRD development would allow for the development of rapid diagnostics
capable of being performed with cattle at the time of facility arrival.
Such a tool could facilitate precision medicine practices in stocker
and feedlot operations and improve both speed and success of targeted
therapy. Accordingly, hematological samples are ideal, as they
represent a relatively noninvasive, cost effective, and readily
obtainable source that reflects dynamic biological processes throughout
the body [[48]14, [49]15].
Previous research has investigated cellular and molecular components
that may indicate or predict clinical BRD. Richeson and colleagues,
utilizing complete blood count (CBC) variables and castration status at
facility arrival, identified significant associations with BRD in
calves with comparatively decreased numbers of eosinophils and
increased numbers of erythrocytes [[50]16]. When evaluating the
relationships between cytokine gene expression and CBC data in cattle
with concurrent BRD, Lindholm-Perry and colleagues discovered that
cattle with BRD possessed a comparative increase in numbers of
neutrophils, decrease in numbers of basophils, and increased expression
of CCL16, CXCR1, and CCR1 [[51]17]. Recent RNA sequencing studies,
performed by both our group and others, have identified mechanisms and
candidate biomarkers in whole blood associated with BRD development
[[52]18–[53]20]. However, these studies primarily sought to identify
differentially expressed genes (DEGs) between cattle that were or were
not treated for BRD based on clinical signs. Focus on identifying DEGs
meant that much of the data generated by these studies was neglected.
Therefore, we aimed to leverage global gene expression patterns across
high-risk cattle, and incorporate available cellular-level
hematological data from the same cattle, to infer mechanisms associated
with BRD development with a more holistic approach.
As gene expression operates in tandem with biological regulatory
networks and complexes, investigation of gene co-expression levels may
reveal transcriptional coordination, distinguish protein production
relationships, and measure cellular composition and function relevant
to specific disease states such as BRD [[54]21, [55]22]. This analysis
approach falls into the field of systems biology, where, in contrast to
reductionist biology, molecular components are pieced and scaled
together to better understand disease and generate novel hypotheses
[[56]23, [57]24]. In this respect, we sought to build networks of
co-expressed genes, utilizing the full structure of previously
published gene expression data [[58]20], and discover relationships
between gene expression and cellular hematological components, which
may elucidate and/or further confirm genes and mechanisms related to
BRD development or resistance.
Materials and methods
Animal enrollment
All animal use and procedures were approved by the Mississippi State
University Animal Care and Use Committee (IACUC protocol #17–120) and
carried out in accordance with relevant IACUC and agency guidelines and
regulations. This study was carried out in accordance with Animal
Research: Reporting of In Vivo Experiments (ARRIVE) guidelines
([59]https://arriveguidelines.org). This study was conducted in
accompaniment with previous work focused on differential gene
expression analysis and candidate biomarker validation [[60]20]; the
RNA-Seq data of these animals were previously deposited in the National
Center for Biotechnology Information (NCBI) Gene Expression Omnibus
(GEO) database under accession number [61]GSE161396. Briefly, 24
samples (n = 12 BRD, n = 12 Healthy) from the 2017 study were
previously selected based on randomized stratification of vaccine and
oral anthelminthic administration upon facility arrival (d0), and 24
samples from the 2019 study randomly selected with equal distribution
of clinical BRD development within 28 days of arrival [[62]20]. All
cattle within each population (year) were of proportional arrival
weight ([63]S1 Table) and age (estimated 4–6 months). All animals
enrolled in these two groups were commercial cattle, with unknown
genetic characteristics and background; this is a typical attribute of
newly received stocker cattle in commercial production systems. Of the
24 cattle from the 2017 population having RNA-Seq data, one individual
(ID: 162–2017_S24; [64]GSM4906455) was not incorporated into the
network analysis due to missing CBC data. The following clinical data
were recorded for each animal: at-arrival fecal egg counts per gram via
modified-Wisconsin procedure (FEC-d0), body weight in pounds (WT) at
arrival, Day 12, Day 26, and Day 82, average daily weight gain at each
time point (ADG), growth rate (slope of weight over days recorded; GR),
at-arrival castration status (Sex), at-arrival rectal temperature
(Temp-d0), development of clinical BRD within 28 days post-arrival
(BRD), number of clinical BRD treatments (Treat_Freq), and timing to
first BRD treatment (Risk_Days). Ages (not recorded) were estimated to
be similar upon facility arrival. Clinical data for these cattle are
found in [65]S1 Table.
Hematology analysis
Approximately 6 mL of whole blood was collected at arrival into
K[3]-EDTA glass blood tubes (BD Vacutainer; Franklin Lakes, NJ, USA)
via jugular venipuncture. Blood samples were stored at 4°C and analyzed
the same day of collection with the flow cytometry-based Advia 2120i
hematology analyzer (Siemens Healthcare Diagnostics Inc., Tarrytown,
NY, USA), testing for the following parameters: white blood cells (WBC;
K/μL), erythrocytes (RBC; M/μL), hemoglobin (HGB; g/dL), hematocrit
(HCT; %), mean corpuscular volume (MCV; fL), mean corpuscular
hemoglobin (MCH; pg), mean corpuscular hemoglobin concentration (MCHC;
g/dL), red blood cell distribution width (RDW; %), and platelets (PLT;
K/μL). Blood smear staining was performed with a Hematek 3000 Slide
Stainer (Siemens Healthcare Diagnostics Inc., Tarrytown, NY, USA) via
Wright-Giemsa stain reagents. Stained blood smears were evaluated for
leukocyte distribution via a manual 300-count white blood cell
differential by trained clinical pathology technical staff at
Mississippi State University College of Veterinary Medicine.
Neutrophil, eosinophil, basophil, monocyte, and lymphocyte percentages
were recorded, with accompanying neutrophil-to-lymphocyte ratios (NL
Ratio). Hematology data for these cattle are found in [66]S2 Table.
RNA-Seq data processing and normalization
The gene-level raw count matrix generated from our previous research
was utilized for this study [[67]20]. Briefly, RNA was isolated via
Tempus Spin RNA Isolation Kits (Thermo Fisher Scientific; Waltham, MA,
USA), following manufacturer’s protocol. TruSeq RNA Library Kit v2
(Illumina; San Diego, CA, USA) was utilized for mRNA sequencing library
preparation, following manufacturer’s protocol. Single-lane,
high-throughput RNA sequencing was performed with NovaSeq 6000 S4
reagent kit and flow cell (Illumina). Sequence read files were quality
assessed and trimmed with FastQC v0.11.9 [[68]25] and Trimmomatic v0.39
[[69]26], respectively. Reference-guided (Bos taurus; ARS-UCD1.2) read
mapping, indexing, and gene-level assembly were performed with HISAT2
v2.2.1 [[70]27, [71]28] and StringTie v2.1.2 [[72]29, [73]30],
respectively. The python program prepDE.py [[74]31] was utilized for
gene-level count matrix construction.
Raw gene counts were imported to R v4.0.4 and processed with the
filterByExpr toolkit [[75]32], removing genes with a minimum total
count of less than 200 and counts-per-million (CPM) below 1.0 across a
minimum of 12 libraries. Libraries were normalized with the trimmed
mean of M-values method (TMM) [[76]33, [77]34] and converted into
log2-counts per million values (log2CPM). A total of 12,795 genes were
identified after count processing and were utilized for weighted
network analysis.
Weighted gene co-expression network analysis (WGCNA)
Weighted network analysis was performed with the R package WGCNA
v1.70.3 [[78]35]. Clinical and hematology trait data were compiled and
aligned to each respective sample library. To remove any outlier
sample, canonical Euclidean distance-based network adjacency matrices
were estimated and used to identify outliers based on standardized
connectivity. Estimated adjacency matrices had network connectivity
standardized with the provided equation, where the z-score normalized
network connectivity (Z.k[μ]) for each sample is calculated from
mean-center scaling of the raw network adjacencies (k) [[79]36]:
[MATH:
Z.kμ=scal<
mi>e(k)μ
msub>=kμ−mean(
k)√var(k). :MATH]
Samples with a standardized connectivity < -5.0 were considered
outliers and to be removed from further analysis; no samples were
considered outliers in this study ([80]S1 Fig). An adjacency matrix was
constructed from the calculated signed Pearson coefficients between all
genes across all samples. We utilized signed networks as they better
capture gene expression trends (up- and down-regulation) and classify
co-expressed gene modules which improve the ability to identify
functional enrichment, when compared to unsigned networks [[81]24,
[82]35–[83]37]. Soft thresholding was used to calculate the power
parameter (β) required to exponentially raise the adjacency matrix, to
reach a scale-free topology fitting index (R^2) of >80%; β = 8 was
selected for this study. The relationship between each unit β and R^2
is seen in [84]S2 Fig. Co-expression modules were constructed with the
automatic, one-step blockwiseModules function within the WGCNA R
package, using the following parameters: power = 8, corType =
“pearson,” TOMType = “signed,” networkType = “signed,” maxBlockSize =
12795, minModuleSize = 30, mergeCutHeight = 0.25, and pamRespectsDendro
= FALSE; all other parameters were set to default. Constructed
co-expression modules were assigned a color by the WGCNA R package,
with any gene not assembling into a specific module placed in the
“grey” module. Module-trait associations were identified with Pearson
correlation between module eigengene (ME; first principal component of
the co-expression matrix [[85]38] and clinical and hematology data).
Modules were considered weakly or strongly correlated with each trait
having a p-value ≤ 0.10 and |R| ≥ 0.3 or p-value ≤ 0.05 and |R| ≥ 0.4,
respectively. Color scaling was performed with the Bioconductor package
viridis v0.6.1 [[86]39] to allow ease of visual interpretation for
individuals with color blindness.
Cross-population module preservation analysis
Based on our previous work, it can be inferred that host gene
expression captured at facility arrival is variable across BRD severity
cohorts [[87]20, [88]40, [89]41]. Therefore, we assessed whether the
at-arrival co-expression patterns and modules found in this study were
well preserved across an RNA-Seq data set from an independent
population of cattle. We investigated cross-populational module
preservation across the whole blood transcriptomes of cattle previously
assessed for differential gene expression ([90]GSE161396; 2019
population (n = 24)) with the modulePreservation function found within
the WGCNA R package. The gene-level raw count matrix from previous
analysis [[91]20] was utilized and processed, filtered, and normalized
in identical procedures as the 2017 RNA-Seq data set (see RNA-Seq data
processing and normalization section); a total of 12,803 genes were
identified in the 2019 data set after count processing and
normalization. Permutation testing (n = 200 permutations) was conducted
to assess the significance of module preservation across the 2017 and
2019 RNA-Seq data sets, utilizing the two composite statistical
measurements Zsummary and medianRank scores [[92]36, [93]42]. Briefly,
the identified modules within the test network are randomly permuted n
times, where, for each permuted index, the mean and standard deviation
is calculated for defining the corresponding Z statistic [[94]42,
[95]43]. Through the combination of additional preservation statistics
(average of Zdensity and Zconnectivity), the calculated Zsummary
statistic determines the level of mean connectivity among all genes
within a module (i.e., network density) across the two data sets
[[96]24, [97]42]. Higher Zsummary values indicate a stronger level of
module preservation between data sets but is dependent on the number of
genes within the module (i.e., module size) [[98]42]. To further
evaluate preservation in a module size-independent manner, medianRank
scores are calculated from the mean connectivity and density
measurements observed from each module and assigned a rank score
[[99]42]. Lower medianRank values indicate a stronger level of module
preservation between data sets. For this study, any module possessing
Zsummary ≥ 10 and medianRank ≤ 5 was considered highly preserved.
Functional enrichment analysis of preserved modules
WebGestalt 2019 [[100]44] (WEB-based Gene SeT AnaLysis Toolkit;
accessed September 13, 2021) was utilized for over-representation
analysis to identify enriched Gene Ontology (GO) biological processes,
cellular components, molecular functions, and pathways from genes found
in each module considered well preserved. Pathway enrichment analysis
was performed with the pathway database Reactome [[101]45]. Human (Homo
sapiens) gene orthologs and functional databases were utilized for GO
term and pathway enrichment analyses. Over-representation analysis
parameters within WebGestalt 2019 included between 3 and 3000 genes per
category, Benjamini-Hochberg (BH) procedure for multiple hypothesis
correction, adjusted p-value (FDR) cutoff of 0.05 for significance, and
a total of 10 expected reduced sets of the weighted set cover algorithm
for redundancy reduction.
BRD-associated hub gene identification and network analyses
Hub genes are those genes found within a module (eigengenes) that
possess high connectivity which may exhibit a greater degree of
biological significance in respect to significantly associated clinical
traits, when compared to all other eigengenes [[102]38, [103]46,
[104]47]. Here, we sought to identify hub genes found from modules
which are significantly associated with any of the clinical BRD
categories (BRD, Treat_Freq, and Risk_Days). This was performed in the
WGCNA R package with two procedures. First, Pearson correlation between
gene expression and module eigengenes was calculated, resulting in the
level of module membership (k[ME]) for each gene. Second, the Pearson
correlation between individual gene expression level and clinical trait
was calculated, resulting in the level of gene significance (GS) for
each gene. Any gene possessing k[ME] and GS values ≥ 0.7 and ≥ 0.3,
respectively, were considered hub genes for clinical traits [[105]36].
All BRD-associated hub genes were used for network construction of
known and predicted protein-protein interactions with the Search Tool
for the Retrieval of Interacting Genes (STRING) database v11.5
[[106]48], utilizing bovine (Bos taurus) annotations. STRING analysis
was performed with the physical subnetwork setting, where edges only
display protein interactions that have evidence of binding to or
forming a physical complex. Any interaction above a combined score
(confidence) of 0.200 was incorporated into the complete network prior
to network clustering; disconnected nodes were removed from the
network. The Markov Cluster (MCL) algorithm was utilized for network
clustering due to its superior performance in complex extraction
without the need of additional parameter tuning [[107]49]. Hub genes
within the interaction network were placed into distinct clusters based
on MCL clustering of the distance matrix acquired from the combined
interaction scores, using a MCL inflation parameter of 1.4.
Statistical analysis
Clinical and hematology data (described in animal enrollment and
hematology analysis) were compared between cattle treated for
naturally-acquired clinical BRD within the first 28 days following
facility arrival (BRD) and those never being diagnosed nor treated
(Healthy). Residual normality was assessed in R v4.0.4 with the
Shapiro-Wilk test [[108]50], with an a priori level of significance set
at 0.10; neutrophil percentage (Neu%), eosinophil percentage (Eos%),
basophil percentage (Baso%), lymphocyte percentage (Lymph%),
neutrophil-to-lymphocyte ratio (NL ratio), FEC-d0, MCHC, RDW, and Sex
were considered non-normally distributed. Differences in normally
distributed variables between BRD and Healthy cattle were assessed with
the Student’s t-test. Differences in non-normally distributed variables
were assessed with the Welch’s t-test; differences between the two
groups with respect to Sex was assessed with Pearson’s chi-square test
with Yates’ continuity correction. Differences between BRD and Healthy
cattle were considered significant having a p-value ≤ 0.05.
Results
Statistical analysis of clinical and hematological parameters
Descriptive statistics for the clinical and hematological data are
provided in [109]Table 1. Regarding the hematological parameters,
average values of Lymph%, RDW, and PLT were outside of the internal
reference intervals for both BRD and Healthy cattle. In this study, RBC
was considered significantly higher at arrival in BRD cattle compared
to Healthy cattle; no other parameter was considered significantly
different between the two groups. Regarding clinical data, BRD cattle
possessed significantly lower weight gain by end of study (ADG-d82;
2.273 lbs/day in BRD and 2.946 lbs/day in Healthy) and lower calculated
slopes of weight gain over time (Growth Rate; 2.370 in BRD and 2.995 in
Healthy); no other clinical parameter was considered significantly
different between the two groups. We did not include the at arrival
weight (WT-d0) as an analysis variable because there was no significant
difference between the Healthy (mean: 477.0, s.d.:24.8) and BRD (mean:
475.3, s.d.:26.9) cohorts.
Table 1. Statistical analysis of hematological and clinical traits between
BRD and healthy groups.
Variable Internal Reference BRD mean (s.d.) Healthy mean (s.d.) p-value
Neu% 37.000–80.000 35.917 (5.547) 37.213 (9.748) 0.717
Eos% 0.000–12.000 3.944 (3.237) 2.635 (1.616) 0.251
Baso% 0.000–2.500 0.193 (0.213) 0.151 (0.218) 0.658
Mono% 0.000–12.000 8.862 (4.603) 8.363 (4.507) 0.805
Lymph% 10.000–50.000 51.083 (4.756) 51.635 (11.928) 0.893
NL Ratio N/A 0.711 (0.141) 0.859 (0.660) 0.504
WBC (K/μL) 4.000–12.000 7.430 (2.722) 7.320 (1.292) 0.913
RBC (M/μL) 5.000–9.990 9.605 (0.568) 9.032 (0.676) 0.047
HGB (g/dL) 7.700–15.000 13.075 (1.071) 12.491 (0.906) 0.194
HCT (%) 25.000–45.000 36.125 (3.269) 35.000 (2.534) 0.391
MCV (fL) 36.000–55.000 37.725 (3.843) 38.845 (2.851) 0.460
MCH (pg) 12.000–22.000 13.625 (1.112) 13.855 (0.806) 0.597
MCHC (g/dL) 32.000–40.000 36.225 (1.190) 35.691 (0.977) 0.272
RDW (%) 11.600–14.800 29.258 (2.362) 27.564 (3.023) 0.171
PLT (K/μL) 200.000–900.000 1413.083 (506.885) 1149.000 (401.516) 0.203
FEC-d0 N/A 761.250 (768.795) 618.364 (408.492) 0.597
ADG-d12 N/A 0.667 (1.604) 2.167 (1.838) 0.059
ADG-d26 N/A 1.917 (1.204) 2.710 (0.948) 0.110
ADG-d82 N/A 2.273 (0.599) 2.946 (0.432) 0.008
Growth Rate N/A 2.370 (0.554) 2.995 (0.435) 0.009
Temp-d0 (F°) N/A 103.333 (0.712) 103.291 (0.667) 0.890
Sex N/A 10 bulls, 2 steers 10 bulls, 1 steer 1.000
[110]Open in a new tab
Means, standard deviations (in parentheses), and statistical
probability values of differences in hematological and clinical
parameters between BRD (n = 12) and Healthy (n = 11) cattle. Parameters
were considered significantly different with p-values ≤ 0.05.
Weighted gene co-expression network construction
The remaining filtered genes (n = 12,795) were used for WGCNA network
and module construction. The resulting network identified a total of 41
color-coded modules of co-expressed genes, excluding the grey module
which incorporates uncorrelated genes (n = 1,235) ([111]Fig 1). Across
the 41 assigned modules, the turquoise module possessed the largest
number of co-expressed genes (n = 2,503) and the lightsteelblue1 module
possessed the smallest number of co-expressed genes (n = 38); the
average size of each module was approximately 282 genes. The complete
list of genes and module assignment is found in [112]S3 Table.
Fig 1. Cluster dendrogram of 12,795 genes generated through dissimilarity
metrics (1-TOM) and hierarchical clustering.
[113]Fig 1
[114]Open in a new tab
Automated block-wise module detection of interconnected genes were
grouped into 41 unique color-coded modules, excluding the grey module
(uncorrelated genes). The x-axis corresponds to the gene-module
assignment and the y-axis (Height) depicts the calculated distance
between co-expressed genes from hierarchical average linkage
clustering.
Module-trait relationship with hematological and clinical datasets
Pearson correlation heatmaps were generated to assess the relationship
between all modules and hematological clinical datasets. Regarding
hematological data, several significant relationships of interest exist
([115]Fig 2). The tan module possessed the highest number of
significant correlations with the hematological data (8), followed by
turquoise, pink, lightgreen, and lightcyan modules (7). With respect to
RBC, considered significantly higher at arrival in BRD cattle compared
to Healthy cattle, six modules were strongly correlated: paleturquoise
(R = 0.44, p = 0.03), lightcyan (R = 0.51, p = 0.01), green (R = 0.41,
p = 0.05), steelblue (R = 0.50, p = 0.01), brown (R = -0.45, p = 0.03),
turquoise (R = 0.49, p = 0.02). Additionally, seven modules were
considered weakly correlated with RBC: magenta (R = 0.32, p = 0.10),
darkgreen (R = 0.36, p = 0.09), lightsteelblue1 (R = -0.36, p = 0.09),
blue (R = 0.36, p = 0.10), saddlebrown (R = 0.36, p = 0.09), orangered4
(R = -0.33, p = 0.10), tan (R = -0.36, p = 0.09). Regarding modules
correlating with RBC, three modules possessed significant associations
with multiple related red cell indices (HGB, HCT, MCV, MCH, MCHC, and
RDW): saddlebrown, steelblue, and lightcyan. Saddlebrown was strongly
associated with MCV (R = -0.63, p = 0.001) and MCH (R = -0.62, p =
0.001), and weakly associated with HCT (R = -0.31, p = 0.10) and MCHC
(R = 0.36, p = 0.10). Steelblue was strongly associated with RDW (R =
0.70, p = 2e-04) and weakly associated with HGB (R = 0.35, p = 0.10)
and MCHC (R = 0.40, p = 0.06). Lightcyan was strongly associated with
HGB (R = 0.47, p = 0.02) and RDW (R = 0.51, p = 0.01) and weakly
associated with HCT (R = 0.38, p = 0.08).
Fig 2. Module-trait relationships between co-expression modules and
hematological traits.
[116]Fig 2
[117]Open in a new tab
Pearson correlations between each of the unique color-coordinated
modules and hematological traits are visualized and represented as a
heatmap. Each row represents a distinct co-expression module, and each
column represents hematological traits as follows: white blood cells
(WBC; K/μL), erythrocytes (RBC; M/μL), hemoglobin (HGB; g/dL),
hematocrit (HCT; %), mean corpuscular volume (MCV; fL), mean
corpuscular hemoglobin (MCH; pg), mean corpuscular hemoglobin
concentration (MCHC; g/dL), red blood cell distribution width (RDW; %),
and platelets (PLT; K/μL). Cells are represented by how positive
(yellow/white) or negative (purple/black) the correlation is between
module and hematological trait, respectively.
The relationships between modules and clinical data are found in
[118]Fig 3. With respect to all clinical disease associations (BRD,
Treat_Freq, and Risk_Days), five modules possessed significant
correlations: steelblue, mediumpurple3, royalblue, orange, and violet.
Steelblue was strongly associated with BRD (R = 0.41, p = 0.05) and
Risk_Days (R = -0.41, p = 0.05). Mediumpurple3 was weakly associated
with Treat_Freq (R = -0.40, p = 0.06). Royalblue was weakly associated
with Treat_Freq (R = -0.40, p = 0.06). Orange was weakly associated
with BRD (R = -0.39, p = 0.07), Treat_Freq (R = -0.34, p = 0.10), and
Risk_Days (R = 0.38, p = 0.07). Violet was weakly associated with
Treat_Freq (R = -0.38, p = 0.07). Regarding production traits (ADG-d12,
ADG-d26, ADG-d82, and GR), ten modules possessed significant
correlations: darkgreen, skyblue, darkturquoise, darkmagenta, purple,
yellowgreen, orange, orangered4, darkred, and lightyellow. However, to
mitigate unexplained variation which may confound differences in
ADG-d12 and ADG-d26, coupled with the lack of significance between
disease cohorts, eight modules correlating with ADG-d82 and GR were
prioritized. Darkred was strongly associated with ADG-d82 (R = 0.50, p
= 0.02) and GR (R = 0.50, p = 0.02). Orangered4 was strongly associated
with ADG-d82 (R = 0.41, p = 0.05) and weakly associated with GR (R =
0.38, p = 0.07). Orange was strongly associated with ADG-d82 (R = 0.43,
p = 0.04) and GR (R = 0,41, p = 0.05). Yellowgreen was strongly
associated with ADG-d82 (R = 0.41, p = 0.05) and GR (R = 0.42, p =
0.05). Purple was weakly associated with GR (R = 0.32, p = 0.10).
Darkmagenta was weakly associated with ADG-d82 (R = -0.34, p = 0.10).
Skyblue was weakly associated with GR (R = -0.36, p = 0.10). Darkgreen
was weakly associated with ADG-d82 (R = -0.32, p = 0.10). Notably,
orange was the only module which possessed significant correlations
with both disease-associated and weight gain traits. However, orange
did not possess any significant correlations with hematological traits.
Fig 3. Module-trait relationships between co-expression modules and clinical
traits.
[119]Fig 3
[120]Open in a new tab
Pearson correlations between each of the unique color-coordinated
modules and clinical traits are visualized and represented as a
heatmap. Each row represents a distinct co-expression module, and each
column represents clinical traits as follows: at-arrival fecal egg
counts per gram via modified-Wisconsin procedure (FEC-d0), body weight
in pounds (WT) at arrival, Day 12, Day 26, and Day 82, calculated
average daily weight gain at each time point (ADG), growth rate (slope
of weight over days recorded; GR), at-arrival castration status (Sex),
at-arrival rectal temperature (Temp-d0), development of clinical BRD
within 28 days post-arrival (BRD), number of clinical BRD treatments
(Treat_Freq), and timing to first BRD treatment (Risk_Days). Cells are
represented by how positive (yellow/white) or negative (purple/black)
the correlation is between module and clinical trait, respectively.
Cross-populational network preservation analysis
Module preservation analysis identified five modules considered well
preserved across the 2017 and 2019 populations: black (size = 432;
Zsummary = 39.772; medianRank = 4), purple (size = 296; Zsummary =
34.773; medianRank = 2), lightgreen (size = 123; Zsummary = 23.291;
medianRank = 1), tan (size = 222; Zsummary = 17.559; medianRank = 5),
and steelblue (size = 59; Zsummary = 11.555; medianRank = 3) ([121]Fig
4). Notably, steelblue was the only well-preserved module which
possessed significant association with BRD-related clinical traits.
Fig 4. Cross-populational module preservation analysis.
[122]Fig 4
[123]Open in a new tab
The medianRank and Zsummary values across all modules are depicted
through the scatterplot x- and y-axes, respectively. Zsummary values ≥
10.0 and medianRank values ≤ 5.0, indicated by dashed lines, denote
that a module identified with the 2017 gene expression data is well
preserved across the 2019 gene expression data.
Functional enrichment analysis of well-preserved modules
To explore the functionality and biological relevance of the five well
preserved modules, we performed over-representation analysis with all
genes from each module (black, purple, lightgreen, tan, and steelblue;
[124]S4 Table). Analysis of genes from the black module revealed 47
biological process terms, 49 cellular component terms, 17 molecular
function terms, and five significantly enriched pathways. Biological
processes identified from genes within the black module were related to
neutrophil activity and degranulation, aldehyde metabolism, nitrogen
compound response and catabolism, and cellular transport. Cellular
components identified from genes within the black module involved
intracellular and extracellular vesicles, secretory granules, cellular
junctions, and lysosomes. Molecular functions identified from genes
within the black module involve cytokine, enzyme, and calcium-dependent
protein binding, aldehyde dehydrogenase (NAD) activity, and
interleukin-1 receptor activity. Enriched pathways identified from
genes within the black module involved neutrophil degranulation,
metabolic disease, and signaling via tyrosine kinase receptor.
Analysis of genes from the purple module revealed 54 biological process
terms, 46 cellular component terms, 16 molecular function terms, and 40
significantly enriched pathways. Biological processes identified from
genes within the purple module involved mitochondrial processes
(cristae formation, respiratory chain complex assembly), non-coding RNA
processing and maturation, cellular protein transport, and metabolic
processes and biosynthesis. Cellular components identified from genes
within the purple module involved cell substrate and adhesion junction,
ribosomes, cytoplasmic side of endoplasmic reticulum, mitochondrial
inner membrane and envelope, and the 48S preinitiation complex.
Molecular functions identified from genes within the purple module
involved mRNA/rRNA binding, ubiquitin ligase inhibition, ATP synthase
activity, and NADH dehydrogenase. Enriched pathways identified from
genes within the purple module involved infectious disease/viral
infection, amino acid metabolism, translation initiation/termination,
rRNA processing, and ATP synthesis and respiratory electron transport.
Analysis of genes from the lightgreen module revealed 38 biological
process terms, 49 cellular component terms, three molecular function
terms, and one significantly enriched pathway. Biological processes
identified from genes within the lightgreen module involved
leukocyte/neutrophil differentiation, activation, and degranulation,
tissue remodeling, cell secretion and exocytosis, phagocytosis and
micropinocytosis, dendritic cell activation, and interleukin-8
secretion. Cellular components identified from genes within the
lightgreen module involved lysosome, secretory/azurophil granule,
vesicular/vacuolar membrane, granule lumen, and macropinosome.
Molecular functions identified from genes within the lightgreen module
involved symporter activity, potassium-chloride symporter activity, and
phosphatidylinositol binding. The single enriched pathway identified
from genes within the lightgreen module was neutrophil degranulation.
Analysis of genes from the tan module revealed 35 biological process
terms, 32 cellular component terms, four molecular function terms, and
two significantly enriched pathways. Biological processes identified
from genes within the tan module involved B-cell activation, receptor
signaling, and regulation, immunoglobulin production, cytokine
production, positive regulation of interferon-gamma production, and
mononuclear cell proliferation. Cellular components identified from
genes within the tan module involved MHC class II protein complex,
lytic vacuole membrane, clathrin-coated endocytic vesicle, endosomal
membrane, and B-cell receptor complex. Molecular functions identified
from genes within the tan module involved MHC class II receptor
activity, MHC class II protein complex binding, and peptide antigen
binding. Enriched pathways identified from genes within the tan module
were antigen activates B-cell receptor (BCR) leading to generation of
second messengers and CD22-mediated BCR regulation.
Analysis of genes from the steelblue module revealed three biological
process terms, three cellular component terms, no molecular function
terms, and no significantly enriched pathways. Biological processes
identified from genes within the steelblue module were cell surface
receptor signaling pathway, negative regulation of fibroblast growth
factor receptor signaling pathway, and antigen receptor-mediated
signaling pathway. Cellular components identified from genes within the
steelblue module involved side of membrane, plasma membrane part, and
alpha-beta T cell receptor complex.
BRD-associated hub gene identification and in silico protein-protein
interaction and clustering analyses
Hub gene identification analysis included co-expressed genes from the
following modules: violet (54), orange (68), royalblue (100),
mediumpurple3 (41), and steelblue (59). The k[ME] and GS value cutoffs
within each module resulted in 24, 46, 30, 22, and 32 BRD-associated
hub genes from the violet, orange, royalblue, mediumpurple3, and
steelblue modules, respectively ([125]S5 Table). These resulting hub
genes were further utilized for physical subnetwork protein-protein
interactions and network clustering. After removal of all disconnected
nodes, the interaction network demonstrated significant connectivity
between 52 proteins across 11 distinct clusters with high inter-nodal
connectivity ([126]Fig 5); these gene products and their combined
interaction scores are found in [127]S6 Table. These connected gene
products demonstrate possible at-arrival biomolecular complexes
associated with BRD development and severity.
Fig 5. Protein-protein interaction network of interconnected BRD-associated
hub genes.
[128]Fig 5
[129]Open in a new tab
Interaction score analysis reveals 52 genes, with high intramodular and
BRD-trait relationship, which possess high connectivity. Interconnected
gene products (nodes) were further grouped into distinct clusters based
on their interaction scores (edges). Edge thickness represents the
level of interaction confidence between nodes.
Discussion
While at-arrival management practices are somewhat dependent upon
anticipated risk of BRD development, both inter- and intra-herd level
disease prevalence is highly variable [[130]5, [131]51]. To counter
this variability, beef production systems will often administer
antimicrobials and/or immunostimulants at arrival to reduce the risk of
clinical BRD development and associated production losses [[132]52,
[133]53]. However, immunostimulant administration alone as a
metaphylactic protocol for controlling BRD appears to have minimal
impact on rates of morbidity [[134]54–[135]56]. Metaphylactic use of
antimicrobials at arrival reduces risk of morbidity and mortality
across beef production systems, however this management practice is
variable in efficacy, in both rates of disease across cattle
populations and in pharmacological choice, and the practice is
suspected to drive expansion of antimicrobial resistance, a growing
societal concern [[136]52, [137]57, [138]58]. Given this background,
our research group and others have focused on evaluating host
transcriptomes at arrival, to better characterize host-driven
mechanisms and develop candidate mRNA biomarkers associated with
clinical BRD outcomes [[139]18, [140]19, [141]20]. These studies have
provided valuable information regarding cattle treated based on
clinical signs of BRD, but these studies heavily rely on semi-objective
evaluation of BRD cases and may miss underlying subclinical or
misdiagnosed disease. As such, the underlying host mechanisms involved
in BRD development remain disputed. Therefore, to identify at facility
arrival genes and mechanisms which may represent the variable
development of BRD cases and leverage the total expression profile of
individual cattle, we employed a systems biology approach with weighted
co-expression network analysis. This methodology allows us to identify
networks of genes exclusively co-expressed, and to evaluate said
networks in a reduced manner in order to identify molecules and
mechanisms of interest for future BRD prediction studies. Importantly,
co-expression network analysis serves as a complementary, yet distinct,
approach to identifying genes and mechanisms associated with disease
status, when compared to differential expression analyses. The network
approach performed in this study evaluates and identifies genes that
are strongly coordinated in terms of expression, and determines
correlation with overlapping metadata (clinical data), whereas
differential expression analyses typically follow a pairwise approach
to determine level of effect and probability of gene differences
between groups. Co-expression network analyses consider greater
biological context when evaluating gene expression differences,
compared to more traditional pairwise approaches. Additionally, through
utilization of hematological parameters, we could capture changes in
the cellular composition of whole blood as they may relate to cellular
and gross pathophysiology across individuals.
While we recognize that dynamic changes captured in whole blood may not
completely encompass biomolecular characteristics seen within lung
tissue, whole blood serves as a practical and easily obtainable sample
for respiratory and inflammatory disease diagnostics [[142]14,
[143]59]. After initial statistical assessment of CBC data, we
identified that both BRD and Healthy cattle possessed comparable
lymphocytosis, thrombocytosis, and erythrocytic macrocytosis; the
distribution of these values were not considered significantly
different between the two groups. Notably, mild to moderate levels of
dehydration, a common condition in newly arrived post-weaned beef
animals, may cause elevated changes in hematocrit levels and
lymphocytes [[144]60, [145]61]. Lymphocytosis and thrombocytosis may
also result from host responses to infection or inflammation.
Additionally, reticulocytosis (i.e., immature erythrocytosis) is the
most common cause of erythrocytic macrocytosis [[146]60] and was noted
as a common feature found across all blood samples submitted for
analysis. While these cattle did not possess physiological nor
hematological evidence of hemolysis or blood loss upon facility
arrival, this finding may be associated with early regenerative anemia,
systemic inflammation, or mineral deficiencies [[147]60–[148]62].
Furthermore, blood-borne pathogens were not reported from blood smear
assessment. Nevertheless, it does not rule out the possibility of
mild/subclinical intraerythrocytic pathology or asymptomatic
convalescence that may result in these increased hematological changes.
Such pathology is often caused by parasitic diseases such as
anaplasmosis, a common infectious disease of cattle across the United
States [[149]63, [150]64]. It is plausible that these findings indicate
that cattle at facility entry are undergoing similar physiological
changes as it relates to stressful and/or pathogenic events
(long-distance transportation, co-mingling, etc.) and underlying
genomic mechanisms serve to resolve or prolong deleterious
physiological conditions that result in BRD.
With respect to distributions, we found that the majority of variables
tested for module correlations were normally distributed. Of the nine
(of the 26 total) non-parametric testing variables, six demonstrated
relative linearity upon visual distribution assessment (data not shown;
Neu%, Lymph%, NL Ratio, MCHC, RDW, and FEC-d0). Moreover, the
non-parametric nature of Eos%, Baso%, and Sex is perceived to be due to
data sparsity and relative rarity of the expected cell counts
attributed to eosinophils and basophils in cattle ([151]Table 1)
[[152]65]. We elected to utilize Pearson correlation models as they can
measure discrete and continuous datasets without need for
transformation, and preserve linearity from the raw data structure when
assessing these variables with gene co-expression modules.
Additionally, calculated Pearson correlations from WGCNA can better
handle datasets with missing or censored data and is highly
computationally efficient [[153]66]. We identified that RBCs were
significantly increased in cattle that would go on to develop BRD
versus those that did not. Although this result was identified in a
relatively small number of cattle, it corresponds with the work of
Richeson and colleagues [[154]16]. As discussed within their prior
research, elevated RBCs may indicate dehydration and subsequent
predisposition with BRD development [[155]5, [156]16]. Interestingly,
we were able to identify one well-preserved co-expression module which
possessed significant correlations with RBCs, RDW, PLT, BRD, and
Risk_Days (i.e., shorter time to first treatment): steelblue. Upon
further investigation, we discovered that the genes within this module
were related to antigen receptor-mediated signaling (BLK, CD247, CD276,
CD3G, GATA3, and PLEKHA1) and negative regulation of fibroblast growth
factor receptor signaling (CREB3L1, GATA3, and WNT5A), and specifically
components of alpha-beta T-cell receptor complexes (CD247 and CD3G).
The upregulation of IL-7R and associated signaling molecules, which
include CD3G and CD247, initiate NOTCH-dependent proliferation of
T-cell precursors [[157]67]. Furthermore, elevated levels of BLK and
GATA3 tend to skew the immune response towards Th2-type immunity
[[158]68–[159]70]. In terms of RBC relationship, previous research has
demonstrated that Th2-stimulated bone marrow T-cells promote erythroid
differentiation and lead to the development of erythroblasts [[160]71].
Additionally, CXCL12, also identified within the steelblue module and
previously identified as a differentially expressed gene associated
with BRD development [[161]20], is involved in Th2-cell migration and
immune response [[162]71, [163]72]. HNRNPH3, found within the steelblue
module, has previously been identified as a key transcription factor
associated with clinical BRD [[164]18]. Lastly, several genes
identified in the steelblue module were also found in the “turquoise”
module identified by Hasankhani and colleagues [[165]24], which
enriched positive regulation of activated T-cell proliferation and
Th1/Th2-cell differentiation pathways. While this study cannot
elucidate the exact mechanistic components nor temporality of molecular
events, it suggests that promotion of Th2-mediated T-cells at arrival
shares a common mechanism with RBC elevation and risk of BRD
development. Our previous research has indicated that genes elevated at
arrival in cattle that eventually develop BRD interact, and may
enhance, TLR-4 and IL-6 responses [[166]20, [167]40, [168]73], which
may contribute to the co-expressed pattern related to Th2-mediated
T-cell development [[169]74]. Overall, this pattern of Th2-mediated
immunity is strongly associated with clinical BRD development and
timing to first treatment, and may further strengthen the depiction
that early Th2 responses indicate clinical disease development and lung
pathology [[170]75, [171]76].
While steelblue was the only well-preserved BRD-associated module
detected, four other modules were determined to be well-preserved
across populations and warranted specific functional enrichment
investigation: black, purple, lightgreen, and tan. Genes within the
black module, largely involved with neutrophil activation and
degranulation, IL-1 activity, and metabolic disease, was only
significantly associated with hemoglobin and erythrocyte parameters
(HGB, HCT, MCV, and MCH); notably, the black module did not possess any
significant associations with clinical variables. This may indicate
that neutrophilic and IL-1 activity was not indicative of BRD within
this population of cattle, and/or additional disease-associated
variables were not recorded in this study. Genes within the purple
module, associated with increased eosinophil percentage, decreased
neutrophil-lymphocyte ratio, decreased MCV and MCH, increased
at-arrival fecal parasitic egg count, and increased growth rate (weight
gain over 82 days), largely enriched for mitochondrial function and
aerobic metabolism and RNA processing. Importantly, this module
possessed positive association to weight gain independent of BRD
development. Previous research has investigated many of these ribosomal
protein-encoding genes for their potential for immune effector capacity
[[172]77] and cell regulation [[173]78], however this marks the first
time, to our knowledge, that they have been implicated in contributing
to weight gain potential in high-risk cattle. Notably, one gene (RPS26)
has been previously identified as a differentially decreased marker in
the diseased lungs of cattle experimentally challenged with
BRD-associated pathogens [[174]79, [175]80]. Similar to the black
module, genes identified within the lightgreen module were associated
with hemoglobin and erythrocyte parameters, but additionally positively
correlated with neutrophil percentage and neutrophil-lymphocyte ratio,
and negatively correlated with basophil percentage; likewise, the
lightgreen module did not possess significant associations with
clinical variables. Lastly, the tan module, possessing several
significant hematological associations, and was negatively correlated
with castration status at arrival, possessed genes which enriched for
B-cell receptor complexes and regulation and interferon-gamma
production. Unfortunately, the underlying physiological impact of
co-expressed genes identified within the black, lightgreen, and tan
modules were not captured in this study. As this study was primarily
focused on BRD development and severity, the genes within these three
modules may possess a role in other disease complexes or
immune-mediated events, such as gastrointestinal or apoptotic/necrotic
diseases.
Utilizing hub gene and interaction network analyses, we further
identified genes related to BRD development and severity. Here, we
detected and mapped 52 genes into a protein-protein interaction
network, further stratified into 11 distinct clusters based on their
combined interaction scores. This procedure helps describe the physical
relationship that multiple BRD-associated gene products possess with
one another in a more holistic approach. Here, we may infer that these
interactions possess accompanying transient functions involved in BRD
development not previously described in literature. As such, these
predicted protein-protein network interactions may infer potential
modular units which participate in BRD development or resistance
[[176]81, [177]82]. Further evidence of the associative importance
related to BRD development exists with these genes, as CXCL12
[[178]20], TLL2 [[179]20], ALOX15 [[180]18, [181]20, [182]40], and
LOC100298356 [[183]73, [184]79, [185]80, [186]83] have been previously
identified as differentially expressed when comparing cattle with and
without BRD development. Specifically, CXCL12, TLL2, and LOC100298356
are considered drivers of innate surveillance and inflammation
associated with BRD development, whereas ALOX15 encodes for an enzyme
involved in specialized proresolving mediator biosynthesis and
associated with cattle that do not develop clinical BRD in high-risk
systems [[187]18, [188]20, [189]40, [190]79, [191]83]. Collectively, we
detected these previously identified differentially expressed genes,
associated to BRD development, with an independent approach. This
overlap emphasizes the potential capability of these genes and
mechanisms to serve a predictive role for BRD. Proteomic approaches
have detailed that proteins infrequently operate as single biological
entities and, when involved in similar biological functions, interact
in dynamic, yet organized complexes [[192]84–[193]87]. As such, these
findings provide candidate protein complexes related to BRD development
and severity, which warrants further investigation for avenues of
confirmation in larger populations of cattle and novel therapeutic
target development.
Conclusions
This study was conducted to utilize systems biology methodology to
further establish genes, mechanisms, and coordinated biological
complexes associated with dynamic hematological changes and BRD
development. Utilizing our previously published RNA-Seq data and WGCNA,
we identified five well-preserved modules of highly co-expressed genes
with significant associations with hematological and clinical traits in
cattle at facility arrival. The “steelblue” module, containing genes
involved in alpha-beta T cell receptor complex and negative regulation
of fibroblast growth factor receptor signaling, possessed significant
positive correlations with erythrocyte count, platelet count, red cell
width, and BRD diagnosis, and negative correlation with days at risk
for BRD. The “purple” module, containing genes involved in
mitochondrial processes and non-coding RNA processing and maturation,
possessed significant correlation with increased eosinophil percentage,
decreased neutrophil-lymphocyte ratio, and increased growth rate
(weight gain over time). Protein-protein interaction network and
clustering analyses of BRD-related hub genes identified possible
at-arrival biological complexes strongly associated with BRD
development; many of these hub genes have been described as
differentially expressed genes in previous BRD research. Through this
holistic molecular approach, we provide genes, mechanisms, and
predicted protein complexes associated with BRD development and
performance which are warranted for future analyses targeted in
predicting BRD at facility arrival.
Supporting information
S1 Table. Clinical metadata of cattle selected for WGCNA analysis.
(XLSX)
[194]Click here for additional data file.^ (15.1KB, xlsx)
S2 Table. CBC and leukocyte distribution data of cattle selected for
WGCNA analysis.
(XLSX)
[195]Click here for additional data file.^ (12.7KB, xlsx)
S3 Table. Full gene list and weighted module assignment.
(CSV)
[196]Click here for additional data file.^ (222.4KB, csv)
S4 Table. Functional enrichment analysis of the five well-preserved
modules.
(XLSX)
[197]Click here for additional data file.^ (44.4KB, xlsx)
S5 Table. Hub gene analysis of all five BRD-associated modules.
(XLSX)
[198]Click here for additional data file.^ (34.7KB, xlsx)
S6 Table. STRING identifiers and physical interaction scores
(combined_score ≥ 0.200).
(XLSX)
[199]Click here for additional data file.^ (13.4KB, xlsx)
S1 Fig. Heatmap and hierarchical clustering of clinical and
hematological data across the 23 cattle utilized for transcriptome
network analysis.
Standardized connectivity was calculated from network adjacency
matrices and used to classify potential outliers (Z.k < -5); no animal
was identified as an outlier. The remaining rows represent the
numerical values of all clinical and hematological traits across each
animal. Colors indicate an increase (yellow/white) or decreased
(purple/black) value for each trait; Sex and BRD are both represented
as a value of 1 for bulls and Yes, and 0 for steers and No,
respectively.
(TIF)
[200]Click here for additional data file.^ (1.3MB, tif)
S2 Fig. Soft threshold (β) selection for signed weighed correlation
network construction through scale free topology (SFT) plot analysis.
A) SFT index R^2 (y-axis) at increasing soft threshold powers (β;
x-axis). The value β = 8 was selected, seen where the saturation curve
is above 0.8 (orange horizontal line). B) Increasing soft threshold
powers (β; x-axis) with respect to decreasing mean connectivity
(y-axis). The goal of selecting a value β is to maximize scale
independence (i.e., suppress low correlations) while simultaneously
minimizing loss in mean connectivity.
(TIF)
[201]Click here for additional data file.^ (542.6KB, tif)
Acknowledgments