Abstract In Populus, drought is a major problem affecting plant growth and development which can be closely reflected by corresponding transcriptomic changes. Nevertheless, how these changes in Populus are not fully understood. Here, we first used meta-analysis and machine learning methods to identify water stress-responsive genes and then performed a systematic approach to discover important gene networks. Our analysis revealed that large transcriptional variations occur during drought stress. These changes were more associated with the response to stress, cellular catabolic process, metabolic pathways, and hormone-related genes. The differential gene coexpression analysis highlighted two acetyltransferase NATA1-like and putative cytochrome P450 genes that have a special contribution in response to drought stress. In particular, the findings showed that MYBs and MAPKs have a prominent role in the drought stress response that could be considered to improve the drought tolerance of Populus. We also suggest ARF2-like and PYL4-like genes as potential markers for use in breeding programs. This study provides a better understanding of how Populus responses to drought that could be useful for improving tolerance to stress in Populus. Subject terms: Plant molecular biology, Plant stress responses, Plant sciences Introduction Climate change is a worldwide problem that has negative impacts on ecological systems, species distribution and forest growth. High temperatures resulting from climate change enhance the frequency and intensity of drought. Such extreme conditions of drought can lead to decrease forest productivity and increase tree mortality^[29]1–[30]3. Drought has significant effects on plant growth and development; these effects are mainly due to changes in the normal rates of photosynthesis, osmotic adjustment and oxidative damage^[31]4,[32]5. Response of plants to drought is a complex mechanism involving morphological, physiological, and metabolic changes^[33]6. A variety of protective genes, proteins, and pathways are involved in the protective responses against stress conditions^[34]7. Poplars (Populus sp.) are trees with high ecological and economic values, which are widely distributed around the world and are known as sensitive woody plants to water^[35]8. Populus is serves as a model system for biology research and analysis of genetics in forest trees^[36]9. Genomic information for the Populus genus provides a valuable resource for investigating genome features and the characterization of stress-related genes in woody plants. Drought stress can cause damage to cellular structures and macromolecules, decreasing the photosynthesis rate, the reduction of growth and biomass production, and sometimes even lead to tree death. Studies have shown that at the molecular level, several genes and multiple biological processes are involved in the adaptation of Populus to stress. For example, water deficit significantly changes expression profiles of numerous genes encoding transcription factors (TFs), plant phosphatases and hormones^[37]10. Additionally, several genes such as ProDH, JAZ3 and RAP2.6, LEA and ABA1 have been induced that play an important role in the response to drought stress in Populus^[38]11,[39]12. Investigating the transcriptome and discovering different drought-tolerant genes are greatly informative for understanding the mechanisms of plant stress tolerance. Different strategies have been employed to determine the molecular basis of Populus response to drought stress^[40]4,[41]13–[42]15. In recent years, high throughput gene expression technologies provide valuable information about transcriptome for research on genes and molecular mechanisms involved in stress. The detection of differentially expressed genes is one of the strategies for data analysis. A wide variety of methods is greatly performed on the dataset with small sample sizes for the screening of genes. Restricting the analysis to an individual study reduces statistical power which may lead to unreliable results. Meta-analysis is a powerful approach for integrating gene expression datasets and obtaining gene signatures more robust and accurate and through increasing sample size. Several studies have used meta-analysis for the identification of responsive genes environmental stress in plants^[43]16–[44]19. Recently, machine learning models are known as an attractive strategy to gain new biological insights. Generally, these methods with efficient dimensionality reduction of the data and feature selection methodologies can integrate transcriptome studies and as a result, find the significant features and underlying mechanisms^[45]20. Different algorithms such as support vector machine (SVM) and principal component analysis (PCA) have been employed in feature selection. These algorithms use various evaluation criteria for classifying data and scoring the input features. In the area of plant stress, the effective use of machine learning and feature selection models for selecting gene features is reported in rice^[46]21, Arabidopsis^[47]22, potato^[48]23, and maize^[49]24. However, at the transcriptome level, the machine learning algorithms for identifying key signatures related to environmental stress have not been applied in Populus. Despite the importance of differentially expressed genes identification, this strategy mostly focuses on the discovery of gene contents and suffers from exploring relationships among genes. Coexpression network analyses allow us to obtain a system-level view of gene–gene connections. Moreover, the biological role of genes with unknown functions can be predicted using methods based on coexpression^[50]25. There are different approaches including coexpression and differential coexpression analyses for constructing from gene expression data. The coexpression builds networks of genes based on the similarity between expression patterns of the gene pair across all the samples, where can be used for identifying co-activate and regulatory genes. While differential coexpression analysis discovers genes with altered coexpression partners under different states which can lead to defining gene groups affected by the change of state^[51]26. These analyses have been widely applied to better understand the molecular mechanisms in plant species^[52]16,[53]27. In this study, we employed large transcriptome data to gain a comprehensive view of drought stress response in Populus. We performed a meta-analysis in combination with machine learning techniques to identify drought-responsive genes. Additionally, through coexpression network analyses, we detected functional sets of genes associated with drought stress. Our findings can provide valuable information about the underlying mechanisms related to a drought stress response which can be used for future genetic improvement and breeding programs in tree species. Results To understand the transcriptional responses of Populus to drought stress, 13 microarray datasets (Supplementary Table [54]S1) consisting of 324 arrays in total were considered. After pre-processing and removing the batch effect, the normalized datasets were obtained for further downstream analysis. These datasets were divided into control and stress conditions in each study. From meta-analysis, we identified 3178 genes with differences in expression level between stress and control conditions (FDR < 0.05). Among DEGs, 2060 genes were up-regulated and 1118 genes were down-regulated in drought compared to normal conditions (Supplementary Table [55]S2, Fig. [56]1). The gene ontology (GO) terms showed that upregulated DEGs were most significantly enriched in response to stress, response to stimulus and cellular catabolic process while the downregulated DEGs were related to small molecule metabolic process, cellular ketone metabolic process and organic acid metabolic process (Supplementary Table [57]S3, Fig. [58]2A). The most important molecular function terms associated with up-regulated DEGs included phosphoprotein phosphatase activity and phosphoric ester hydrolase activity. Moreover, results of KEGG pathway analysis revealed that the up-regulated DEGs were mostly enriched in metabolic pathways, spliceosome and plant hormone signal transduction, while the down-regulated DEGs were significantly associated with phagosome and amino sugar and nucleotide sugar metabolism (Supplementary Table [59]S3). Figure 1. Figure 1 [60]Open in a new tab Gene expression comparison between normal and stress conditions. A volcano plot showing combined effect size (x-axis) and significance level (− log[10]-adjusted p-value; y-axis) for genes differentially expressed between normal and stress samples. The significant up and down-regulated genes are plotted as red dots. Figure 2. [61]Figure 2 [62]Open in a new tab (A) Top 10 gene ontology biological processes (GO-BP) terms of the up-regulated and the down-regulated differentially expressed genes (DEGs) (B) Distribution of transcription factor (TF) families in the differentially expressed genes (DEGs). The number of genes is shown for each transcription factor family on the y-axis. Due to the high dimensionality of microarray gene expression data, feature selection techniques as a branch of machine learning can be of great help in distinguishing genes with key biological functions. In this study, to discover the transcriptomic signature of response to drought stress in poplar species, eight feature selection models were implemented on the gene expression dataset in stress and normal groups. In total, 648 genes were identified as the most important features by at least one of the models (Supplementary Table [63]S4). Functional annotation showed that the feature genes enriched in response to abiotic stimulus. In addition, a lot of genes were related to secondary metabolic process. Interestingly, the seven methods selected Auxin response factor 2-like (PtpAffx.211941.1.S1_at) and PYL4-like (PtpAffx.31936.1.A1_at) as important features. Moreover, the meta-analysis and feature selection techniques had 232 common genes among which the genes, 6 genes are involved with the MAPK signaling pathway. Thus, our results indicate key role of MAPKs in Populus during drought stress. Here, we also identified 210 transcription factors from 35 different gene families responding to drought stress. Among them, the MYB, bZIP, NAC, ERF, and bHLH families comprise a high proportion of drought-responsive members (Fig. [64]2B). Cis-regulatory element analysis To discover the conserved motifs and consensus cis-regulatory elements (CREs) in the promoters of DEGs, we applied the MEME tool and identified 14 motifs with lengths ranging from 15 to 50aa (Table [65]1). We also compared the identified motifs with known motifs in the CIS-BP database. We found that eight of the motifs were matched to the known motifs related to various TFs, including ARID, C2H2, AHL, AP2/B3-like, BBR/BPC, MYB, and Sox (Supplementary Table [66]S5). GO term analysis for motifs revealed that motifs are involved in the regulation of transcription, transmembrane receptor protein tyrosine kinase signaling pathway and development (Table [67]1, Supplementary Table [68]S6). Interestingly, this analysis highlighted motif associated with response to salicylic acid stimulus (GO:0,009,751). Moreover, some motifs were involved in circadian rhythm and response to auxin stimulus (Supplementary Table [69]S6). Table 1 . The conserved motifs found in promoter regions of differentially expressed genes (DEGs) using the MEME analysis tool. Motif logo E-value Width Best match in CIS-BP Top significant BP terms identified by GOMO Motif 1 graphic file with name 41598_2023_27746_Figa_HTML.gif 7.10E−165 50 M06695 BP regulation of transcription Motif 2 graphic file with name 41598_2023_27746_Figb_HTML.gif 8.20E−143 29 M06719 BP regulation of transcription, DNA-dependent Motif 3 graphic file with name 41598_2023_27746_Figc_HTML.gif 1.20E−93 21 M06704, M06818, M06805, M06796 BP cytoskeleton organization BP regulation of transcription, DNA-dependent Motif 4 graphic file with name 41598_2023_27746_Figd_HTML.gif 1.00E−54 29 M01106, M06704, M01108, M06695 BP regulation of transcription Motif 5 graphic file with name 41598_2023_27746_Fige_HTML.gif 1.00E−38 37 Motif 6 graphic file with name 41598_2023_27746_Figf_HTML.gif 4.30E−31 29 Motif 7 graphic file with name 41598_2023_27746_Figg_HTML.gif 9.30E−36 29 M06704, M06805, M06695 BP regulation of transcription Motif 8 graphic file with name 41598_2023_27746_Figh_HTML.gif 2.90E−16 21 M06704, M06805, M06818 BP transmembrane receptor protein tyrosine kinase signaling pathway Motif 9 graphic file with name 41598_2023_27746_Figi_HTML.gif 1.30E−12 41 Motif 10 graphic file with name 41598_2023_27746_Figj_HTML.gif 1.60E−07 50 BP regulation of transcription Motif 11 graphic file with name 41598_2023_27746_Figk_HTML.gif 1.70E−06 15 M06719 BP leaf development BP regulation of transcription, DNA-dependent Motif 12 graphic file with name 41598_2023_27746_Figl_HTML.gif 5.00E−05 15 M06704 BP regulation of transcription Motif 13 graphic file with name 41598_2023_27746_Figm_HTML.gif 1.80E−04 20 BP regulation of transcription, DNA-dependent Motif 14 graphic file with name 41598_2023_27746_Fign_HTML.gif 2.10E−02 41 BP translation [70]Open in a new tab Identification of coexpression networks by WGCNA We constructed gene coexpression networks using WGCNA based on the gene expression data of DEGs derived from the meta-analysis and feature selection. A total of 5 modules were detected that range from a maximum of 1145 genes of the turquoise module to the 305 genes of the green module (Fig. [71]3). The biological process gene ontology (GO) analysis revealed that the turquoise module was mainly related to small molecule metabolic process and microtubule cytoskeleton organization. Moreover, the 41 genes in the turquoise module were mainly related to protein transport. The genes in the blue module were mainly enriched in RNA biosynthetic process, whereas the genes in the brown module were mainly enriched in RNA splicing and response to reactive oxygen species. The yellow and green modules were also significantly enriched with the mRNA metabolic process and regulation of RNA metabolic process, respectively. The KEGG pathway analysis showed that phagosome and fatty acid degradation were significantly associated with turquoise and blue modules, respectively. Interestingly, genes in turquoise and green modules were significantly enriched in plant hormone signal transduction (Supplementary Table [72]S7). Figure 3. [73]Figure 3 [74]Open in a new tab Weighted gene co-expression network analysis (WGCNA) of differentially expressed genes (DEGs). Cluster dendrogram showing co-expression modules identified by WGCNA. The modules are denoted in the colour bar. Identification of hub genes To discover the key genes associated with the modules, we identified genes with the most connections for each module that defended as hub genes (Supplementary Table [75]S8, [76]S9). The top 10 genes were screened from each module. For the turquoise module, the top hub was CSLC5-like (PtpAffx.214284.1.S1_at). The top hub genes in the blue module included amino acid transporter AVT6A-like (PtpAffx.7148.2.S1_a_at) and MDAR6-like (PtpAffx.27718.1.S1_s_at). In addition, GIGANTEA-like protein (PtpAffx.25624.1.A1_s_at), E3 ligase Rma1H1-like (PtpAffx.8582.3.S1_a_at) and protein phosphatase 4 (PtpAffx.935.1.S1_at) were also hub genes in the brown, green and yellow modules, respectively (Fig. [77]S1). Differential gene coexpression network analysis Generally, differential expression analysis investigates each gene separately and does not consider its relationship with other genes^[78]28. To detect changes in the patterns of expression of the genes between normal and stress conditions, we performed differential coexpression analysis and calculated Pearson's correlations of all gene pairs. We detected 28,863 gene pairs to have significant coexpression changes in the conditions at a cutoff of FDR < 0.01 (Supplementary Table [79]S10). Finally, a differential correlation network with 11,344 nodes (genes) was constructed. In differential correlations between two networks, nodes with more connected edges represent genes that have the most distinct expression patterns between the two conditions. Acetyltransferase NATA1-like (PtpAffx.18595.1.S1_at) and putative cytochrome P450 (PtpAffx.209025.1.S1_at) genes with 430 and 365 edges, respectively, were the top differentially connected genes (Fig. [80]4). Under normal conditions, 7705 of gene pairs, and under stress conditions, 8264 of gene pairs, respectively, showed a positive correlation that most of these genes related to response to abiotic stimulus and small molecule metabolic process. Additionally, 10,422 genes showed the opposite correlation direction between the two conditions. According to the GO analysis of the genes with different correlation directions, these genes are involved in response to stimulus and cellular catabolic process (Supplementary Table [81]S11). The differential coexpression analysis revealed that peptidylprolyl isomerase (PtpAffx.25142.1.A1_at) and thioredoxin-dependent peroxiredoxin (PtpAffx.3462.1.A1_s_at) had the strongest change in correlation where correlation strength decreased from 0.67 to − 0.33 after stress. Figure 4. [82]Figure 4 [83]Open in a new tab Differential co-expression network constructed using DiffCorr algorithm. Each node is a gene, violet nodes represent differentially expressed genes (DEGs) and red nodes represent genes with the most connectivity. Discussion Drought stress induces a huge range of responses that enable plants to survive in restriction of water availability. Populus is a woody plant model with sufficient genomic information, and due to the importance of this genus for biology studies, we collected comprehensive expression data for drought stress. We designed a pipeline with a systematic view for investigating transcriptional changes during drought stress to identify the genetic architecture in Populus. With meta-analysis and feature selection methods for expression data, we determined 3,594 genes that are differentially regulated by drought stress. Of these DEGs, 232 genes were identified by both methods and confirmed that many of the genes selected by feature selection methods were included among those selected by meta-analysis. The GO enrichment analysis suggests that DEGs are involved in a large range of biological functions. The majority of these GO terms were associations with metabolic process, response to stress, catabolic process. Both up-and down-regulated DEGs were enriched for the secondary metabolic process (Supplementary Table [84]S3). However, more genes that belonged to this class were primarily up-regulated. The accumulation of secondary metabolites is an important part of a plant’s protection strategy against environmental stress^[85]29. The accumulation of phenolic and flavonoid compounds has been reported in Populus under drought stress^[86]30. A survey of KEGG pathways detected 17 pathways (Supplementary Table [87]S3). The results revealed that the plant hormone signal transduction pathway was enriched. In this pathway, 17 DEGs such as STK (PTPAFFX.163630.1.S1_AT) and ABF1 (PTPAFFX.215822.1.S1_S_AT) were upregulated. Plant hormones act as central integrators of signaling cascades in adaptation to stress^[88]31. A previous study demonstrated that ABA, ethylene, brassinosteroid and jasmonate signal pathways may be involved in Populus tomentosa cold response processes^[89]32. The STK and ABF1 are associated with responses to drought stress in the plant^[90]33. The pathway analysis also demonstrated that drought stress had a significant effect on the carbon metabolism (Supplementary Table [91]S3), indicating that Populus may has a defensive response through conserving energy in stress conditions. We also found that the expression changes of six genes related to the MAPK signaling pathway occurred during drought stress, which suggested that the MAPKs have key roles in the drought response in Populus. Among DEGs, transcription factors such as bZIPs, NACs and ERFs, which are known to be involved in stresses were also detected. The bZIP-type TFs play major roles in the regulation of development and salt stress responses of Populus^[92]34. Additionally, 24 members of the MYB family had a difference in expression between conditions (Fig. [93]2B). Previous studies have demonstrated that MYBs participate in various processes and have important functions in plant stress tolerance^[94]35,[95]36. In Tamarix hispida, the overexpression of ThMYB13 enhances salt stress tolerance^[96]37. These results imply a regulatory role of MYBs in Populus for drought response. According to most feature selection models, two genes ARF2 -like (PtpAffx.211941.1.S1_at) and PYL4-like (PtpAffx.31936.1.A1_at) defined as the important features (Supplementary Table [97]S4) which are involved in several biological processes such as cellular metabolism and development regulation. ARF plays an important role in hormone regulation and plant stress response^[98]38. Recently, an ARF was discovered that is crucial for early developing xylem in Populus^[99]39. It suggests that the expression change of ARF2 may be a reaction mechanism against water stress by impressing xylem function. In addition, The PYL genes are abscisic acid receptors that have a role in regulating drought tolerance^[100]40. A previous study showed that overexpression of PYL4 in Arabidopsis resulted in drought resistance^[101]41. To obtain further insights into the mechanism of gene regulation during drought stress in Populus, motifs and transcription factor binding sites were discovered in promoter regions of DEGs. We found 14 distinct conserved motifs comprised of ARID, C2H2, AHL, AP2/B3-like, BBR/BPC, MYB, and Sox (Table [102]1, Supplementary Table [103]S5). The signal transduction, regulation of transcription and response to salicylic acid were important functional categories for these motifs (Supplementary Table [104]S6). Previous studies showed the role of MYB in various biological processes, especially responses to environmental stresses^[105]42. The MYB binding site-cis-element is necessary for the gene expression of drought-inducible genes^[106]43. In Populus, the expression changes of MYB genes are observed under different stresses. These genes may also participate in the regulation of circadian rhythms, senescence-related ABA signaling cascade and cell-fate determination^[107]44. Yang et al.^[108]45 in their study detected cis-acting elements involved in abscisic acid and stress responsiveness in upstream regions of PtrMYB053 and PtrMYB081. The AHL is a group of transcriptional regulators with a highly conserved structure in the plant kingdom and extensive information demonstrates function in plant growth and development as well as stress responses^[109]46. A previous study identified 37 AHL genes in Populus trichocarpa that expression levels of these genes were induced by drought stress^[110]47. To assess the interactions among DEGs, a WGCNA was performed that can help to better understand the mechanisms involved in drought stress. The DEGs were grouped into 5 modules (Fig. [111]3). Functional analysis results showed that the genes in the modules were mostly associated with molecule metabolic process, RNA biosynthetic process, RNA splicing, and stress response (Supplementary Table [112]S7). Some DEGs included in the turquoise module belonged to a generation of precursor metabolites and energy, and transport that reflecting their importance in response to stress. The saving of energy and maintenance of homeostasis under stress is considered an adaptive feature. During stress, the plant through adjustments of metabolism and gene expression shunt energy sources from processes of growth to adaptation^[113]48. The role of transport as a key regulatory molecule in sugar metabolism, abscisic acid signaling, stress responses and enhancing photosynthetic activity has been established^[114]49,[115]50. The turquoise module included an ABC-2 type transporter (PtpAffx.85773.1.S1_at) and TRH1 (PTP.121.1.S1_S_AT). ABC transporters are necessary for the internal detoxification of ions and act as a regulator for maintaining plant homeostasis under environmental stresses^[116]51. Previous studies have proposed that TRH1 is required for root hair elongation and participates in auxin translocation and accumulation^[117]52,[118]53. Genes in the brown module were involved in RNA splicing and response to reactive oxygen species. Environmental stresses damage DNA and affect its stability. Studies demonstrate the importance of alternative splicing in DNA repair for plants^[119]54. Among genes in response to reactive oxygen species, we identified two HSP15- like (PTPAFFX.53720.1.S1_AT) and APX3- like (PTP.4040.1.S1_S_AT). Heat shock proteins are ubiquitous that accumulate in response to abiotic stress^[120]55,[121]56. In Populus euphratica, HSPs had high transcript levels after stress^[122]57. In our results, HSP15-like (PTPAFFX.53720.1.S1_AT) significantly induced under drought stress. The yellow module genes enriched for vacuole organization. Vacuoles play a major function in plant trafficking pathways and response to environmental signaling and participate in oxidative-stress resistance^[123]58. We also identified the gene with the highest connectivity (hubs) within each module (Supplementary Table [124]S9) which are the key components of the networks. We found that the hub genes were enriched in the response to misfolded protein. The highly connected gene in the turquoise module, including CSLC5-like (PtpAffx.214284.1.S1_at) has been reported to be related to developing cell walls^[125]59. It has been shown that activation of CSLC5-like is essential for protection from adverse environmental conditions. The hub gene of the blue module (protein phosphatase 4) was involved in a variety of cellular functions^[126]60, suggesting a change in phosphorylation status is one of the most important regulatory ways to stress responses in all Populus. Interestingly, amino acid transporter AVT6A-like (PtpAffx.7148.2.S1_a_at) and PLATZ1 (PtpAffx.27718.1.S1_s_at) were the most interconnected genes in the blue module. PLATZ encodes a class of plant-specific zinc-finger transcription factor that functions in plant growth, development and abiotic stresses through ABA^[127]61. Among the hub genes in the blue module, uspA-like (Ptp.4430.1.S1_a_at) is of interest. Universal stress proteins (USP) can arrest cell growth and constitute a biological defense mechanism in stress conditions^[128]62. In particular, GIGANTEA-like protein (PtpAffx.25624.1.A1_s_at) was identified as one of the hub genes in brown module. This gene functions in developmental stage transitions and stress responses. Recent findings exhibit that GIGANTEA-like protein mediates circadian rhythm and responses to stress in poplar^[129]63. To identify significant changes in the coexpression structure and provide an overview of gene expression interactions between normal and stress conditions, we performed differential gene coexpression analysis. We identified over 28,000 gene pairs whose expression levels were significantly correlated. We obtained a network with 11,344 genes (Fig. [130]4, Supplementary Table [131]S10) which among them, 1460 genes were DEGs. Genes Acetyltransferase NATA1-like (PtpAffx.18595.1.S1_at) and putative cytochrome P450 (PtpAffx.209025.1.S1_at) had most connections in the network. NATA1 has implicated the regulatory function of polyamine acetylation and plays a function in the complex cross-talk between salicylate and jasmonate signaling^[132]64. A previous study indicated that NATA1 is also involved in ABA-mediated stomatal closure in A. thaliana^[133]65. Cytochrome P450s are also one of the largest enzyme families and play an essential role in stress response. These key enzymes have also been detected in Populus trichocarpa that participate in the volatile formation^[134]66. The results also indicated that the coexpression between peptidylprolyl isomerase (PtpAffx.25142.1.A1_at) and thioredoxin-dependent peroxiredoxin (PtpAffx.3462.1.A1_s_at) genes was extensively changed after stress. The peptidylprolyl isomerase was found associated with the folding of newly synthesized proteins in the cellular processes^[135]67. Protein folding is a fundamental process for cell survival under adverse environmental conditions^[136]68. It was suggested that peptidylprolyl isomerases have a role in abiotic stress response in plants^[137]69. In addition, thioredoxin-dependent peroxiredoxin as part of antioxidant defenses and redox signaling is essential for cellular response against oxidative stress^[138]70. Previous observation implies a possible link between mitochondrial thioredoxin system and peptidyl prolyl cis–trans isomerase activity^[139]71. Collectively, in this study, we implemented a strategy for the detection of genes and underlying mechanisms involved in response to drought stress in Populus. Based on the meta-analysis, feature selection algorithms and coexpression analysis, we highlighted genes and pathways that play key roles during drought stress. The findings revealed that transcriptional changes under drought stress can be extremely varied in Populus. These results showed that drought stress induces a complex of hormone signaling pathways. In addition, TFs such as MYBs and bZIPs are key for Populus response. In particular, our analysis suggests that ARF2-like (PtpAffx.211941.1.S1_at) and PYL4-like (PtpAffx.31936.1.A1_at) genes can be potential candidates for screening and breeding purposes in Populus. The results obtained can pave the way for understanding the molecular basis of drought response and for further investigations in Populus. Methods Data collection and preprocessing The array expression datasets were retrieved from the Gene Expression Omnibus (GEO) ([140]https://www.ncbi.nlm.nih.gov/gds) and the ArrayExpress ([141]https://www.ebi.ac.uk/arrayexpress) (Supplementary Table [142]S1). The data consisted of 13 studies that generated from the Affymetrix GeneChip Poplar Genome Array. The chip description file (CDF) and annotation file were downloaded from the Affymetrix site. The robust multichip average (RMA) algorithm was used to background corrected and normalized gene expression data^[143]72 from the Affy R package. Moreover, genes that had low mean and variation in expression values were filtered out. Finally, an empirical Bayes method was performed to correct non-biological differences and remove batch effects from gene expression datasets using ComBat function in the SVA R package^[144]73. Differential gene expression analysis To identify upregulated and downregulated differentially expressed genes (DEGs) among normal and drought samples, meta-analysis was performed using the effect size combination method in the metaMA R package^[145]74. Genes with an FDR ≤ 0.05 were defined as DEGs. Feature selection algorithms were employed to reduce the dimensionality of the expression dataset and identify the gene expression features between normal and stress conditions. We implemented various attribute weighting algorithms, including Support Vector Machine (SVM), Chi Squared, Information Gain, Information Gain Ratio, Deviation, Gini Index, Uncertainty, Relief, and PCA on corrected data to identify the most important genes^[146]75. The feature selection was performed using RapidMiner Studio software (version 7.0.001). Functional analysis To investigate the functions related to the DEGs, we conducted the Gene Ontology (GO) using the AgriGO ([147]http://systemsbiology.cau.edu.cn/agriGOv2/). The web server REVIGO was used to remove redundant enriched GO terms. The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways enrichment analysis was performed by gProfiler tool ([148]https://biit.cs.ut.ee/gprofiler/gost). GO terms and KEGG pathways with an adjusted P-value < 0.05 were defined as significant. In addition, transcription factors (TFs) families were obtained by BLASTX search against Populus trichocarpa transcription factors ([149]http://planttfdb.cbi.pku.edu) with a cut-off of E ≤ 10^−6. Cis-elements analysis To discover conserved motifs and investigate cis-elements in DEGs, the 1kbp upstream sequences of genes were extracted from Ensembl Plants ([150]http://plants.ensembl.org) and then were submitted to a search by the MEME program ([151]http://meme-suite.org/). The Tomtom v 5.4.1 tool ([152]http://meme-suite.org/tools/tomtom)^[153]76 was employed for the identification of known TF binding sites based on the CIS-BP database. The GoMo tool ([154]http://meme-suite.org/tools/gomo) was also used for the biological function of motifs. Coexpression network analysis and identification of potential hub genes The weighted gene correlation network analysis (WGCNA)^[155]77 was used to construct the co-expression network and identify gene modules. First, the gene expression similarity matrix was built based on normalized gene expression data of the DEGs by determining Pearson’s correlation coefficient (PCC) between gene pairs. Subsequently, the gene expression similarity matrix was transformed into an adjacency matrix using an appropriate soft threshold value (β) of 10. Then, the adjacency matrix was converted into a topological matrix by the topological overlap matrix (TOM) by the TOM similarity algorithm. Finally, the dynamic tree-cut algorithm method was used to identify the modules of co-expressed DEGs with the maxBlockSize of 6,000, minModuleSize of 30 and mergeCutHeight of 0.2. In addition, according to gene intramodular connectivity, genes with a high kME were considered hub genes in each module. The gProfiler tool was used to perform GO and KEGG functional enrichment analysis of the significant modules. We used the cystoscope to visualize the networks. Differential coexpression analysis To identify significant differential correlations between normal and stress conditions, a differential coexpression gene network was constructed using the DiffCorr R package^[156]78. First, the expression data were divided into normal and stress groups, and then genes with zero expression or low variation across samples were removed that resulting in 30,865 genes selected for further analysis. Pearson correlations were calculated between each pair of genes for each group and each correlation value was transformed to Z score using Fisher’s Z transformation. Next, the difference in correlation of a gene pair between normal and stress conditions was calculated by the equation: [MATH: ΔZ=ZS -ZN1nS-3+1nN-3 :MATH] where n[N] and n[S] are respectively the numbers of samples under normal and stress conditions. Finally, Fisher’s z-test was applied to detect significant differential correlations. Differential correlations with a FDR < 0.01 were considered significant^[157]79. The differential coexpression network was visualized by Cytoscape. A schematic workflow summarizing the major steps of this study is shown in Fig. [158]5. Figure 5. [159]Figure 5 [160]Open in a new tab A schematic overview of the multistep strategy for understanding aspects of the response of Populus to drought stress. Supplementary Information [161]Supplementary Information 1.^ (168.6KB, pdf) [162]Supplementary Information 2.^ (1.1MB, xlsx) Author contributions A.T. designed the research; A.T. and S.A. collected the data; A.T., S.A. and A.N. analyzed the data; and A.T., S.A. and A.N. wrote the paper. All authors read and approved the manuscript. Data availability All data generated or analysed during this study are available from the corresponding authors on reasonable request. Competing interests The authors declare no competing interests. Footnotes The original online version of this article was revised: The original version of this Article contained an error in the order of the Figures. Figures 1, 2, 3, 4 and 5 were published as Figure 5, 2, 3, 4 and 1. Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Change history 2/22/2023 A Correction to this paper has been published: 10.1038/s41598-023-30033-z Contributor Information Ahmad Tahmasebi, Email: atahmasebi1@gmail.com. Ali Niazi, Email: niazi@shirazu.ac.ir. Supplementary Information The online version contains supplementary material available at 10.1038/s41598-023-27746-6. References