Abstract Cucumber (Cucumis sativus L.) is a crucial vegetable crop, requiring significant nitrogen fertilizer inputs. However, excessive nitrogen application not only impairs growth but also poses severe environmental risks. Thus, enhancing nitrogen use efficiency (NUE) in cucumber is imperative. For the identification of genes associated with NUE in cucumber, roots of high NUE and low NUE lines were analyzed under high nitrogen conditions. Using transcriptome sequencing through WGCNA, a total of 15,180 genes were categorized into 35 co-expression modules, with 5 modules being highly correlated with NUE. Based on differential expression within the five modules and the results of GO and KEGG enrichment analyses, 25 genes were identified as potentially related to NUE. Among these, CsaV4_1G002492 (GLR22), CsaV4_2G003460 (GLR35), CsaV4_3G000307 (NRT1.1), and CsaV4_7G001709 (UPS2) were homologous to genes in Arabidopsis known to directly participate in NUE related process. These four genes were chosen as key genes for further analysis. qRT-PCR analysis revealed that CsaV4_3G000307 and CsaV4_7G001709 were more active during the early stages of the high nitrogen treatment in the high NUE line. Conversely, CsaV4_1G002492 and CsaV4_2G003460 were more active in the low NUE line. Using transcriptomic analysis, a frameshift INDEL mutation was observed in CsaV4_3G000307 in the low NUE line, which impacted the compactness of the protein structure, potentially altering its function. Analysis of protein interactions of these four key genes predicted some potential interaction networks. This research offers critical insights into the genetic factors influencing NUE in cucumber, presenting potential targets for genetic modification or breeding programs. Keywords: cucumber, NUE, transcriptomic sequencing, WGCNA, key genes 1. Introduction Cucumber is one of the top ten vegetables globally, widely cultivated across various regions, and plays a significant role as an economic crop with substantial impacts on global agriculture [[38]1]. According to the Food and Agriculture Organization (FAO) ([39]http://agris.fao.org/), the total production area of cucumbers worldwide was 2.174 million hectares with a total yield of 94.718 million tons in 2022. Cucumber is a crop with high nutrient demands, particularly with a significant requirement for nitrogen fertilizers [[40]2]. The application of nitrogen fertilizers significantly increases cucumber yield and is crucial for achieving high productivity [[41]3,[42]4]. However, excessive application of nitrogen fertilizer inhibits growth of cucumbers and increase economic costs [[43]5,[44]6,[45]7]. With increasing nitrogen fertilizer application, the NUE of cucumbers tends to decline [[46]8,[47]9]. In Chinese agricultural production, the average utilization rate of nitrogen fertilizers is only 30–35%, meaning that approximately 65–70% of the nitrogen is not effectively used by the crops [[48]10]. This results not only in substantial economic losses but also in severe environmental issues, such as water eutrophication, groundwater pollution, greenhouse gas emissions, and soil acidification [[49]11,[50]12,[51]13,[52]14]. Consequently, optimizing nitrogen fertilizer application and enhancing NUE are imperative for sustainable agricultural development. However, the processes of nitrogen response and efficiency are governed by intricate networks involving metabolic, developmental, and environmental signals. Considerations of nitrogen absorption, transport, assimilation, and extensive signaling networks are necessary for enhancing NUE. Robust modern strategies are employed for identifying key genes. Chu et al. used genomic positional mapping and pinpointed a gene critical for high nitrate utilization, NRT1.1b. A SNP mutation in this gene has been demonstrated to influence nitrogen use efficiency in rice [[53]15]. Zhou et al., through comparative transcriptomic and metabolomic analysis of maize and rice leaf tissues, pinpointed the gene OsDREB1C, which was involved in augmenting photosynthesis and NUE [[54]16]. Gao et al. demonstrated that analysis of the expression profile of CsNRT2.1 showed that CsNRT2.1 is a high affinity nitrate transporter [[55]17]. Ming et al. analyzed the expression patterns and subcellular localization of the GS1 protein under different nitrogen conditions, confirming its role in enhancing low nitrogen tolerance in cucumber [[56]18]. Transcriptomic analysis through WGCNA is considered a powerful means of mining key genes [[57]19]. WGCNA categorizes a multitude of genes into different modules based on gene expression changes, reducing analytical complexity and associating co-expression patterns within modules with phenotypic differences, thus identifying gene modules and key genes that play crucial roles during phenotypic variations [[58]20,[59]21,[60]22,[61]23]. This method has been extensively applied to unravel molecular mechanisms in various plants, including cucumbers [[62]24]. Examples include identifying key genes responsible for resistance against Podosphaera xanthii in cucumber cultivars [[63]25], pinpointing genes crucial for phenylpropanoid biosynthesis during cucumber storage [[64]26], determining central genes for salt response [[65]27], and identifying genes significantly associated with flavonoid biosynthesis in cucumbers [[66]28]. Moreover, SNPs and INDELs are genetic variations that can significantly influence phenotypic traits [[67]15,[68]29,[69]30]. The accurate identification of these mutations is crucial for genetic research and breeding programs. The presence of an SNP or INDEL may lead to phenotypic changes, playing a significant role in genetic breeding [[70]31]. For instance, SNPs and INDELs have been used to assist in selecting genes resistant to powdery mildew [[71]32], genes resistant to stem blight [[72]33], genes associated with germination [[73]34], and key genes for parthenocarpy in cucumbers [[74]35]. Previously, we have established laboratory techniques for identifying phenotypes under varying nitrogen concentrations during the seedling stage, and yielded materials with extreme genotypes. Five candidate genes associated with cucumber tolerance to low nitrogen were identified through genome-wide association studies [[75]36]. In this study, two extreme phenotypic cucumber lines under high nitrogen conditions were used. Transcriptomic sequencing of root tissues at seven time points after high nitrogen treatment was performed. Using the WGCNA method, a co-expression network of genes related to nitrogen use efficiency under high nitrogen conditions was constructed. Specific modules responding to NUE in cucumber were selected, and their functions were annotated and analyzed to mine related key genes and interaction networks. This analysis offers new insights into the regulation mechanisms of NUE in cucumber. 2. Materials and Methods 2.1. Plant Materials and High Nitrogen Treatment Two extreme phenotypic cucumber lines, L2 with high NUE and L8 with low NUE, were used as plant material, which were obtained through previous screening of 112 cucumber lines of different cucumber types at Cucumber Research Institute of Tianjin Academy of Agricultural Sciences, China. L2 and L8 both belong to the South China cucumber type. Seeds were initially soaked in warm water at 55 °C for 15 min, followed by immersion in room temperature water for 3.5 h. Subsequently, seeds were placed in a germination culture box maintained at 28 °C for 2–3 days to allow sprout development. The sprouts were affixed to sponge plugs and then were transferred to a plastic box and cultivated in a half-strength Hoagland nutrient solution (nitrate concentration of 7 mmol/L) at a pH of 6.2. The nutrient solution was refreshed every three days and aeration was conducted twice a day. Once seedlings reached the one true leaf stage, they were transferred to 700 mL black hydroponic plastic buckets containing 500 mL of half-strength Hoagland nutrient solution. When seedlings reached the two true leaf stage, they were subjected to a higher nitrate treatment with a nitrogen concentration of 20 mmol/L. The Hoagland nutrient solution recipe is available in the [76]Supplementary Table S1. Root sampling was conducted at 0 h, 2 h, 4 h, 8 h, 12 h, 24 h, and 30 h of cultivation, with at least three biological replicates per treatment. 2.2. RNA Extraction and Transcriptome Sequencing Total RNA was extracted from root tissues using the standard extraction method provided by Novogene Co. Ltd. The quality and integrity of RNA were assessed with an Agilent 2100 bioanalyzer. Libraries for transcriptomic sequencing were prepared using the standard NEB library construction kit [[77]37] and were sequenced on the Illumina platform to generate paired-end reads. Raw sequence reads were processed and aligned to the cucumber reference V4 genome ([78]http://www.cucumberdb.com/#/home, accessed on 25 September 2024) utilizing HISAT2 [[79]38]. 2.3. Weighted Gene Co-Expression Network Analysis Transcriptomic data from 14 samples were analyzed using the Novamagic cloud platform with a soft threshold set at 10. Genes were hierarchically clustered based on their Topological Overlap Matrix (TOM) dissimilarity [[80]19], and modules were identified using dynamic tree cutting techniques. Each module contained a minimum of 30 genes. Modules that exhibited highly similar expression profiles (dissimilarity less than 0.25) were merged to minimize redundancy and enhance the robustness of the findings. 2.4. Differential Expression Genes Analysis Differentially expressed genes (DEGs) between the high NUE and low NUE lines at each time point were identified using the edgeR [[81]39]. The p-values were adjusted using the Benjamini and Hochberg method. Corrected p-value of 0.05 and absolute log[2](foldchange) of 0.8 were set modules as the thresholds for significantly differential expression. A DEGs Venn diagram was created using an R 4.4.1 package. 2.5. Identification and Enrichment Analysis of the Core Modules Identification and enrichment analysis of the core modules were conducted by calculating the Pearson correlation coefficients and the significance of the p-values for the module eigengenes in relation to L2 and L8. Modules were selected as core if they exhibited a correlation coefficient (r) greater than 0.6 and a p-value less than 0.05. Differentially expressed genes (DEGs) within these modules were then analyzed using Gene Ontology (GO, [82]http://www.geneontology.org/, accessed on 11 October 2024) and the Kyoto Encyclopedia of Genes and Genomes (KEGG, [83]http://www.genome.jp/kegg/, accessed on 11 October 2024) to elucidate the biological functions and pathways predominantly represented. Bar charts were utilized to display the top 30 significant GO enrichment analysis results. Bubble charts were employed to show the top 20 KEGG pathway analysis results. 2.6. Real-Time Quantitative PCR Analysis Primers were designed using NCBI ([84]https://www.ncbi.nlm.nih.gov/, accessed on 1 October 2024) and synthesized by Sangon Biotech Co., Ltd. (Shanghai, China). Detailed primer sequence information is available in [85]Supplementary Table S2. RNA extraction was performed using the NG312 kit from Beijing LABLEAD Company, Beijing, China, and cDNA was synthesized with a First-strand cDNA Synthesis Mix kit from the same company. All procedures were carried out according to the instructions provided. The qRT-PCR was conducted using the 2x Realab Green PCR Fast mixture reagent from Beijing LABLEAD Company, Beijing, China in a 20 µL system as per the manufacturer’s instructions. The protocol included an initial denaturation at 95 °C for 30 s, followed by 40 cycles of denaturation at 95 °C for 10 s and annealing at 60 °C for 30 s. A melting curve analysis was performed from 60 to 90 °C in 10 s intervals with a ∆T of 0.5 °C. An actin gene served as the internal control for data standardization, with relative expression calculated using the 2^−∆∆Ct method. Variance analysis was conducted using R with a t-test to determine significance. 2.7. Transcriptomic SNPs and INDELs Analysis SNPs and INDELs were analyzed using the Novamagic cloud platform. Sequence logos were generated to visualize these variations using the WebLogo tool, providing a graphical representation of sequence conservation and variation at identified loci [[86]40,[87]41]. 2.8. Protein Structure and Interaction Prediction Protein structure prediction was performed using AlphaFold3, with the default settings for speed and copies parameters [[88]42]. The detailed protein sequence is in [89]Supplementary File S1. STRING was used to predict protein interaction, employing default settings ([90]https://cn.string-db.org/, accessed on 26 October 2024). 3. Results 3.1. Identification of Gene Co-Expression Modules Through WGCNA An optimal soft-threshold power was selected for the construction of the gene co-expression network by assessing the scale-free topology. Various soft-threshold powers, ranging from 1 to 20, were evaluated ([91]Figure 1). A power of 10 was determined to be optimal, corresponding to a fit index that exceeded 0.8 and ensuring stabilization and the attainment of a scale-free topology within the network. Figure 1. [92]Figure 1 [93]Open in a new tab Soft-threshold power selection for gene co-expression network construction. Left panel: red line represents scale-free topology fit index across varying soft-threshold powers. Right panel: mean connectivity changes with increasing soft-threshold power. Initial clustering based on the TOM revealed an extensive array of color blocks, reflecting a complex pattern of gene co-expressions that complicated further analysis ([94]Supplementary Figure S1). The clustering led to the identification of 35 gene co-expression modules, which varied significantly in size. The smallest module, labeled grey, contained 48 genes, while the largest, labeled turquoise, included over 4000 genes, demonstrating the diversity of gene co-expression across the sampled conditions. 3.2. Correlation Between the Modules and NUE To elucidate the roles of the 35 identified modules in the two lines, associations were explored by calculating Pearson correlation coefficients between module eigengenes and the L2 (high NUE)/ L8 (low NUE) lines. As shown in [95]Figure 2, a detailed heatmap illustrates the strength and significance of these correlations, highlighting each module’s potential contribution to NUE. Five out of the 35 modules identified were found to be significantly correlated with the extreme NUE lines. The purple (correlation coefficient 0.99), skyblue (correlation coefficient 0.95), cyan (correlation coefficient 0.72), darkorange (correlation coefficient 0.71), and lightcyan (correlation coefficient 0.68) modules, contain 241, 86, 200, 101, and 195 genes, respectively. Figure 2. [96]Figure 2 [97]Open in a new tab Correlation heatmap between gene co-expression modules and L2/L8 lines. Red indicates positive correlations and blue denotes negative correlations. Cells contain Pearson correlation coefficients and p-values in parentheses. Modules such as darkorange and skyblue, which show positive correlations, were considered to be active in the high NUE line, whereas modules like purple, cyan, and lightcyan, showing negative correlations, were considered to be active in the low NUE line, all of which were targeted for further comprehensive functional studies. 3.3. Comprehensive Analysis of the Core Modules 3.3.1. Analysis of the Purple Module The purple module, encompassing 241 genes, displayed distinct expression patterns across the seven different time points in response to high nitrogen. [98]Figure 3 illustrates that eigengene expressions differ significantly between the high NUE and low NUE cucumber lines. A peak in expression at 4 h in the high NUE line corresponded to the lowest expression in the low NUE line, highlighting significant differences in high nitrogen response. Figure 3. [99]Figure 3 [100]Open in a new tab Heatmap of the gene expression of the eigengenes in the purple module. Color ranges from green (low) to red (high). The maximum number of DEGs within this module was observed at 4 h, totaling 145 genes ([101]Supplementary Figure S2). The time points at 2 and 8 h each recorded 140 DEGs, while the lowest count of 129 DEGs occurred at 24 h. Seventy-four genes were differentially expressed at all time points, underlining their possible vital role in nitrogen metabolism. GO ([102]Supplementary Figure S3 and Table S3) and KEGG enrichment analyses ([103]Supplementary Figure S4 and Table S4), elucidated the biological functions and pathways of these DEGs. The enrichment analysis highlighted several crucial biological processes and molecular functions related to NUE. Key NUE-related functions and pathways included glycine, serine, and threonine metabolism, aminoacyl-tRNA biosynthesis, and arginine biosynthesis and metabolism. Fifteen genes were identified, with nine exhibiting differential expression at four or more time points. These nine genes were selected as candidate genes for further analysis. 3.3.2. Analysis of the Lightcyan Module The lightcyan module, encompassing 195 genes, exhibited diverse expression patterns across the seven time points ([104]Figure 4). Significant differences in expression between the high NUE and low NUE lines were observed, which was particularly notable at 4 h where the low NUE line displayed remarkable peak eigengene expressions in contrast to the lower expressions in the high NUE line. Figure 4. [105]Figure 4 [106]Open in a new tab Gene expression of the eigengenes in the lightcyan module. Color ranges from green (low) to red (high). Within this module, the highest number of DEGs was observed at 4 h, totaling 147 genes ([107]Supplementary Figure S5). This peak was followed by 98 DEGs at 24 h, with the lowest count of 29 DEGs at 2 h. While DEG counts at time points 0, 8, 12, and 30 h were recorded at 35, 63, 82, and 55, respectively. Notably, 74 genes were differentially expressed at four or more time points, with 7 genes differentially expressed across all time points. GO enrichment identified 11 genes with NUE-related functions ([108]Supplementary Figure S6 and Table S5) and KEGG pathways enrichment identified 10 genes ([109]Supplementary Figure S7 and Table S6). After excluding duplicates, the enrichment analysis highlighted 19 genes with several crucial biological processes and molecular functions related to NUE, including arginine and proline metabolism, as well as glycine, serine, and threonine metabolism. Among these, 10 genes exhibited differential expression at four or more time points, emphasizing their potential roles in nitrogen metabolism. These 10 genes were selected as candidate genes for further analysis. 3.3.3. Analysis of the Darkorange Module The darkorange module encompassed 101 genes. Notably higher eigengene expressions were observed at 0 and 2 h in the high NUE line, marking these periods as critical for gene activity within this module. In contrast, the low NUE line exhibited negative values at these times ([110]Figure 5). Figure 5. [111]Figure 5 [112]Open in a new tab Gene expression of the eigengenes in the darkorange module. Color ranges from green (low) to red (high). The number of DEGs peaked at 0 h with 64 genes, followed by 58 at 2 h ([113]Supplementary Figure S8). The lowest differential expression was noted at 4 h with 23 genes, with consistent counts of 30 genes observed at both 8 and 12 h, and varying counts of 29 and 39 genes at 24 and 30 h, respectively. A total of 35 genes were identified as differentially expressed at four or more time points, where 12 exhibited differential expression across all seven time points. GO enrichment identified 10 genes with NUE-related functions ([114]Supplementary Figure S9 and Table S7) and KEGG pathways enrichment identified 5 genes ([115]Supplementary Figure S10 and Table S8). These analyses highlighted crucial pathways such as nitrogen metabolism and arginine biosynthesis, which were essential for understanding nitrogen utilization in plants. After excluding duplicates, 14 genes critical for nitrogen utilization were identified, where 5 genes showed differential expression across four or more time points, underscoring their significant roles in nitrogen metabolism. These five genes were selected as candidate genes from this module for further analysis. 3.3.4. The Analysis of the Cyan Module The cyan module comprised 200 genes. Marked differences in eigengene expressions between the high NUE and low NUE lines were particularly prominent at 4 h, where the low NUE line showed a peak in positive values, which is in stark contrast to the negative values observed in the high NUE line ([116]Figure 6). Figure 6. [117]Figure 6 [118]Open in a new tab Gene expression of the eigengenes in the cyan module. Color ranges from green (low) to red (high). The module exhibited a peak in differential expression at 4 h with 62 genes, and a secondary peak at 24 h with 47 genes ([119]Supplementary Figure S11). The lowest differential expression was noted at 2 h with only 16 DEGs, and variable DEG counts were observed at other time points—29 at 0 h, 22 at 8 h, 35 at 12 h, and 36 at 30 h. Of these, 17 genes were identified as differentially expressed at four or more time points, with one gene expressed across all seven time points. GO ([120]Supplementary Figure S12 and Table S9) and KEGG pathways ([121]Supplementary Figure S13 and Table S10) revealed the involvement of these genes in crucial metabolic pathways, including amino acid biosynthesis and arginine and proline metabolism. These pathways played vital roles in the plant metabolic framework, indirectly influencing nitrogen assimilation and utilization. However, there were no genes differentially expressed across four or more time points and therefore no genes were selected as candidate genes. 3.3.5. The Analysis of the Skyblue Module The skyblue module comprised 86 genes. The expression heatmap revealed no distinct time-specific variations within this module ([122]Figure 7). A peak in differential expression was observed at 12 h involving 31 genes, closely followed by 29 genes at 4 h ([123]Supplementary Figure S14). The lowest number of differentially expressed genes, totaling 20, was recorded at 8 h. Counts of differentially expressed genes at 0 and 30 h included 28 genes each, with 25 and 26 genes at 2 and 24 h, respectively. Importantly, 17 genes in this module were identified as differentially expressed across four or more time points, while 6 genes differentially expressed at all time points. Figure 7. [124]Figure 7 [125]Open in a new tab Gene expression of the eigengenes in the skyblue module. Color ranges from green (low) to red (high). GO enrichment identified two genes with NUE-related functions ([126]Supplementary Figure S15 and Table S11) and KEGG pathways enrichment identified one gene ([127]Supplementary Figure S16 and Table S12). Crucial functions and pathways included nitrogen compound transport and arginine and proline metabolism. Among these genes, only one gene showed differential expression across multiple time points. This one gene was selected as a candidate gene from this module for further analysis. 3.4. Information and Functional Annotation of 25 Selected Genes [128]Table 1 provides detailed information about the candidate genes from the five modules, including gene descriptions, species affiliations, and their respective expression modules. These genes have been studied across multiple species, such as in Arabidopsis thaliana, Oryza sativa, Cucumis melo, Cucumis sativus, and so on. We further screened 25 genes by reviewing related studies to determine whether the genes or their gene family members had been previously studied for functions related to NUE. The results identified genes directly related to NUE such as CsaV4_1G002492 (GLR22) and CsaV4_2G003460 (GLR35), which are glutamate receptors potentially involved in nitrogen signal transduction; CsaV4_3G000307 (NRT1.1/NPF6.3), which is involved in nitrogen absorption and transport; and CsaV4_7G001709 (UPS2), which is involved in nitrogen recycling. These four genes were selected as key genes for further analysis. Table 1. Gene information and functional annotation of the 25 selected genes. Gene ID Gene Description Organism Species Module Related References