Abstract

   Cucumber (Cucumis sativus L.) is a crucial vegetable crop, requiring
   significant nitrogen fertilizer inputs. However, excessive nitrogen
   application not only impairs growth but also poses severe environmental
   risks. Thus, enhancing nitrogen use efficiency (NUE) in cucumber is
   imperative. For the identification of genes associated with NUE in
   cucumber, roots of high NUE and low NUE lines were analyzed under high
   nitrogen conditions. Using transcriptome sequencing through WGCNA, a
   total of 15,180 genes were categorized into 35 co-expression modules,
   with 5 modules being highly correlated with NUE. Based on differential
   expression within the five modules and the results of GO and KEGG
   enrichment analyses, 25 genes were identified as potentially related to
   NUE. Among these, CsaV4_1G002492 (GLR22), CsaV4_2G003460 (GLR35),
   CsaV4_3G000307 (NRT1.1), and CsaV4_7G001709 (UPS2) were homologous to
   genes in Arabidopsis known to directly participate in NUE related
   process. These four genes were chosen as key genes for further
   analysis. qRT-PCR analysis revealed that CsaV4_3G000307 and
   CsaV4_7G001709 were more active during the early stages of the high
   nitrogen treatment in the high NUE line. Conversely, CsaV4_1G002492 and
   CsaV4_2G003460 were more active in the low NUE line. Using
   transcriptomic analysis, a frameshift INDEL mutation was observed in
   CsaV4_3G000307 in the low NUE line, which impacted the compactness of
   the protein structure, potentially altering its function. Analysis of
   protein interactions of these four key genes predicted some potential
   interaction networks. This research offers critical insights into the
   genetic factors influencing NUE in cucumber, presenting potential
   targets for genetic modification or breeding programs.

   Keywords: cucumber, NUE, transcriptomic sequencing, WGCNA, key genes

1. Introduction

   Cucumber is one of the top ten vegetables globally, widely cultivated
   across various regions, and plays a significant role as an economic
   crop with substantial impacts on global agriculture [[38]1]. According
   to the Food and Agriculture Organization (FAO)
   ([39]http://agris.fao.org/), the total production area of cucumbers
   worldwide was 2.174 million hectares with a total yield of 94.718
   million tons in 2022. Cucumber is a crop with high nutrient demands,
   particularly with a significant requirement for nitrogen fertilizers
   [[40]2]. The application of nitrogen fertilizers significantly
   increases cucumber yield and is crucial for achieving high productivity
   [[41]3,[42]4]. However, excessive application of nitrogen fertilizer
   inhibits growth of cucumbers and increase economic costs
   [[43]5,[44]6,[45]7]. With increasing nitrogen fertilizer application,
   the NUE of cucumbers tends to decline [[46]8,[47]9]. In Chinese
   agricultural production, the average utilization rate of nitrogen
   fertilizers is only 30–35%, meaning that approximately 65–70% of the
   nitrogen is not effectively used by the crops [[48]10]. This results
   not only in substantial economic losses but also in severe
   environmental issues, such as water eutrophication, groundwater
   pollution, greenhouse gas emissions, and soil acidification
   [[49]11,[50]12,[51]13,[52]14]. Consequently, optimizing nitrogen
   fertilizer application and enhancing NUE are imperative for sustainable
   agricultural development.

   However, the processes of nitrogen response and efficiency are governed
   by intricate networks involving metabolic, developmental, and
   environmental signals. Considerations of nitrogen absorption,
   transport, assimilation, and extensive signaling networks are necessary
   for enhancing NUE. Robust modern strategies are employed for
   identifying key genes. Chu et al. used genomic positional mapping and
   pinpointed a gene critical for high nitrate utilization, NRT1.1b. A SNP
   mutation in this gene has been demonstrated to influence nitrogen use
   efficiency in rice [[53]15]. Zhou et al., through comparative
   transcriptomic and metabolomic analysis of maize and rice leaf tissues,
   pinpointed the gene OsDREB1C, which was involved in augmenting
   photosynthesis and NUE [[54]16]. Gao et al. demonstrated that analysis
   of the expression profile of CsNRT2.1 showed that CsNRT2.1 is a high
   affinity nitrate transporter [[55]17]. Ming et al. analyzed the
   expression patterns and subcellular localization of the GS1 protein
   under different nitrogen conditions, confirming its role in enhancing
   low nitrogen tolerance in cucumber [[56]18]. Transcriptomic analysis
   through WGCNA is considered a powerful means of mining key genes
   [[57]19]. WGCNA categorizes a multitude of genes into different modules
   based on gene expression changes, reducing analytical complexity and
   associating co-expression patterns within modules with phenotypic
   differences, thus identifying gene modules and key genes that play
   crucial roles during phenotypic variations
   [[58]20,[59]21,[60]22,[61]23]. This method has been extensively applied
   to unravel molecular mechanisms in various plants, including cucumbers
   [[62]24]. Examples include identifying key genes responsible for
   resistance against Podosphaera xanthii in cucumber cultivars [[63]25],
   pinpointing genes crucial for phenylpropanoid biosynthesis during
   cucumber storage [[64]26], determining central genes for salt response
   [[65]27], and identifying genes significantly associated with flavonoid
   biosynthesis in cucumbers [[66]28]. Moreover, SNPs and INDELs are
   genetic variations that can significantly influence phenotypic traits
   [[67]15,[68]29,[69]30]. The accurate identification of these mutations
   is crucial for genetic research and breeding programs. The presence of
   an SNP or INDEL may lead to phenotypic changes, playing a significant
   role in genetic breeding [[70]31]. For instance, SNPs and INDELs have
   been used to assist in selecting genes resistant to powdery mildew
   [[71]32], genes resistant to stem blight [[72]33], genes associated
   with germination [[73]34], and key genes for parthenocarpy in cucumbers
   [[74]35].

   Previously, we have established laboratory techniques for identifying
   phenotypes under varying nitrogen concentrations during the seedling
   stage, and yielded materials with extreme genotypes. Five candidate
   genes associated with cucumber tolerance to low nitrogen were
   identified through genome-wide association studies [[75]36]. In this
   study, two extreme phenotypic cucumber lines under high nitrogen
   conditions were used. Transcriptomic sequencing of root tissues at
   seven time points after high nitrogen treatment was performed. Using
   the WGCNA method, a co-expression network of genes related to nitrogen
   use efficiency under high nitrogen conditions was constructed. Specific
   modules responding to NUE in cucumber were selected, and their
   functions were annotated and analyzed to mine related key genes and
   interaction networks. This analysis offers new insights into the
   regulation mechanisms of NUE in cucumber.

2. Materials and Methods

2.1. Plant Materials and High Nitrogen Treatment

   Two extreme phenotypic cucumber lines, L2 with high NUE and L8 with low
   NUE, were used as plant material, which were obtained through previous
   screening of 112 cucumber lines of different cucumber types at Cucumber
   Research Institute of Tianjin Academy of Agricultural Sciences, China.
   L2 and L8 both belong to the South China cucumber type. Seeds were
   initially soaked in warm water at 55 °C for 15 min, followed by
   immersion in room temperature water for 3.5 h. Subsequently, seeds were
   placed in a germination culture box maintained at 28 °C for 2–3 days to
   allow sprout development. The sprouts were affixed to sponge plugs and
   then were transferred to a plastic box and cultivated in a
   half-strength Hoagland nutrient solution (nitrate concentration of 7
   mmol/L) at a pH of 6.2. The nutrient solution was refreshed every three
   days and aeration was conducted twice a day. Once seedlings reached the
   one true leaf stage, they were transferred to 700 mL black hydroponic
   plastic buckets containing 500 mL of half-strength Hoagland nutrient
   solution. When seedlings reached the two true leaf stage, they were
   subjected to a higher nitrate treatment with a nitrogen concentration
   of 20 mmol/L. The Hoagland nutrient solution recipe is available in the
   [76]Supplementary Table S1. Root sampling was conducted at 0 h, 2 h, 4
   h, 8 h, 12 h, 24 h, and 30 h of cultivation, with at least three
   biological replicates per treatment.

2.2. RNA Extraction and Transcriptome Sequencing

   Total RNA was extracted from root tissues using the standard extraction
   method provided by Novogene Co. Ltd. The quality and integrity of RNA
   were assessed with an Agilent 2100 bioanalyzer. Libraries for
   transcriptomic sequencing were prepared using the standard NEB library
   construction kit [[77]37] and were sequenced on the Illumina platform
   to generate paired-end reads. Raw sequence reads were processed and
   aligned to the cucumber reference V4 genome
   ([78]http://www.cucumberdb.com/#/home, accessed on 25 September 2024)
   utilizing HISAT2 [[79]38].

2.3. Weighted Gene Co-Expression Network Analysis

   Transcriptomic data from 14 samples were analyzed using the Novamagic
   cloud platform with a soft threshold set at 10. Genes were
   hierarchically clustered based on their Topological Overlap Matrix
   (TOM) dissimilarity [[80]19], and modules were identified using dynamic
   tree cutting techniques. Each module contained a minimum of 30 genes.
   Modules that exhibited highly similar expression profiles
   (dissimilarity less than 0.25) were merged to minimize redundancy and
   enhance the robustness of the findings.

2.4. Differential Expression Genes Analysis

   Differentially expressed genes (DEGs) between the high NUE and low NUE
   lines at each time point were identified using the edgeR [[81]39]. The
   p-values were adjusted using the Benjamini and Hochberg method.
   Corrected p-value of 0.05 and absolute log[2](foldchange) of 0.8 were
   set modules as the thresholds for significantly differential
   expression. A DEGs Venn diagram was created using an R 4.4.1 package.

2.5. Identification and Enrichment Analysis of the Core Modules

   Identification and enrichment analysis of the core modules were
   conducted by calculating the Pearson correlation coefficients and the
   significance of the p-values for the module eigengenes in relation to
   L2 and L8. Modules were selected as core if they exhibited a
   correlation coefficient (r) greater than 0.6 and a p-value less than
   0.05. Differentially expressed genes (DEGs) within these modules were
   then analyzed using Gene Ontology (GO,
   [82]http://www.geneontology.org/, accessed on 11 October 2024) and the
   Kyoto Encyclopedia of Genes and Genomes (KEGG,
   [83]http://www.genome.jp/kegg/, accessed on 11 October 2024) to
   elucidate the biological functions and pathways predominantly
   represented. Bar charts were utilized to display the top 30 significant
   GO enrichment analysis results. Bubble charts were employed to show the
   top 20 KEGG pathway analysis results.

2.6. Real-Time Quantitative PCR Analysis

   Primers were designed using NCBI ([84]https://www.ncbi.nlm.nih.gov/,
   accessed on 1 October 2024) and synthesized by Sangon Biotech Co., Ltd.
   (Shanghai, China). Detailed primer sequence information is available in
   [85]Supplementary Table S2. RNA extraction was performed using the
   NG312 kit from Beijing LABLEAD Company, Beijing, China, and cDNA was
   synthesized with a First-strand cDNA Synthesis Mix kit from the same
   company. All procedures were carried out according to the instructions
   provided. The qRT-PCR was conducted using the 2x Realab Green PCR Fast
   mixture reagent from Beijing LABLEAD Company, Beijing, China in a 20 µL
   system as per the manufacturer’s instructions. The protocol included an
   initial denaturation at 95 °C for 30 s, followed by 40 cycles of
   denaturation at 95 °C for 10 s and annealing at 60 °C for 30 s. A
   melting curve analysis was performed from 60 to 90 °C in 10 s intervals
   with a ∆T of 0.5 °C. An actin gene served as the internal control for
   data standardization, with relative expression calculated using the
   2^−∆∆Ct method. Variance analysis was conducted using R with a t-test
   to determine significance.

2.7. Transcriptomic SNPs and INDELs Analysis

   SNPs and INDELs were analyzed using the Novamagic cloud platform.
   Sequence logos were generated to visualize these variations using the
   WebLogo tool, providing a graphical representation of sequence
   conservation and variation at identified loci [[86]40,[87]41].

2.8. Protein Structure and Interaction Prediction

   Protein structure prediction was performed using AlphaFold3, with the
   default settings for speed and copies parameters [[88]42]. The detailed
   protein sequence is in [89]Supplementary File S1. STRING was used to
   predict protein interaction, employing default settings
   ([90]https://cn.string-db.org/, accessed on 26 October 2024).

3. Results

3.1. Identification of Gene Co-Expression Modules Through WGCNA

   An optimal soft-threshold power was selected for the construction of
   the gene co-expression network by assessing the scale-free topology.
   Various soft-threshold powers, ranging from 1 to 20, were evaluated
   ([91]Figure 1). A power of 10 was determined to be optimal,
   corresponding to a fit index that exceeded 0.8 and ensuring
   stabilization and the attainment of a scale-free topology within the
   network.

Figure 1.

   [92]Figure 1
   [93]Open in a new tab

   Soft-threshold power selection for gene co-expression network
   construction. Left panel: red line represents scale-free topology fit
   index across varying soft-threshold powers. Right panel: mean
   connectivity changes with increasing soft-threshold power.

   Initial clustering based on the TOM revealed an extensive array of
   color blocks, reflecting a complex pattern of gene co-expressions that
   complicated further analysis ([94]Supplementary Figure S1). The
   clustering led to the identification of 35 gene co-expression modules,
   which varied significantly in size. The smallest module, labeled grey,
   contained 48 genes, while the largest, labeled turquoise, included over
   4000 genes, demonstrating the diversity of gene co-expression across
   the sampled conditions.

3.2. Correlation Between the Modules and NUE

   To elucidate the roles of the 35 identified modules in the two lines,
   associations were explored by calculating Pearson correlation
   coefficients between module eigengenes and the L2 (high NUE)/ L8 (low
   NUE) lines. As shown in [95]Figure 2, a detailed heatmap illustrates
   the strength and significance of these correlations, highlighting each
   module’s potential contribution to NUE. Five out of the 35 modules
   identified were found to be significantly correlated with the extreme
   NUE lines. The purple (correlation coefficient 0.99), skyblue
   (correlation coefficient 0.95), cyan (correlation coefficient 0.72),
   darkorange (correlation coefficient 0.71), and lightcyan (correlation
   coefficient 0.68) modules, contain 241, 86, 200, 101, and 195 genes,
   respectively.

Figure 2.

   [96]Figure 2
   [97]Open in a new tab

   Correlation heatmap between gene co-expression modules and L2/L8 lines.
   Red indicates positive correlations and blue denotes negative
   correlations. Cells contain Pearson correlation coefficients and
   p-values in parentheses.

   Modules such as darkorange and skyblue, which show positive
   correlations, were considered to be active in the high NUE line,
   whereas modules like purple, cyan, and lightcyan, showing negative
   correlations, were considered to be active in the low NUE line, all of
   which were targeted for further comprehensive functional studies.

3.3. Comprehensive Analysis of the Core Modules

3.3.1. Analysis of the Purple Module

   The purple module, encompassing 241 genes, displayed distinct
   expression patterns across the seven different time points in response
   to high nitrogen. [98]Figure 3 illustrates that eigengene expressions
   differ significantly between the high NUE and low NUE cucumber lines. A
   peak in expression at 4 h in the high NUE line corresponded to the
   lowest expression in the low NUE line, highlighting significant
   differences in high nitrogen response.

Figure 3.

   [99]Figure 3
   [100]Open in a new tab

   Heatmap of the gene expression of the eigengenes in the purple module.
   Color ranges from green (low) to red (high).

   The maximum number of DEGs within this module was observed at 4 h,
   totaling 145 genes ([101]Supplementary Figure S2). The time points at 2
   and 8 h each recorded 140 DEGs, while the lowest count of 129 DEGs
   occurred at 24 h. Seventy-four genes were differentially expressed at
   all time points, underlining their possible vital role in nitrogen
   metabolism.

   GO ([102]Supplementary Figure S3 and Table S3) and KEGG enrichment
   analyses ([103]Supplementary Figure S4 and Table S4), elucidated the
   biological functions and pathways of these DEGs. The enrichment
   analysis highlighted several crucial biological processes and molecular
   functions related to NUE. Key NUE-related functions and pathways
   included glycine, serine, and threonine metabolism, aminoacyl-tRNA
   biosynthesis, and arginine biosynthesis and metabolism. Fifteen genes
   were identified, with nine exhibiting differential expression at four
   or more time points. These nine genes were selected as candidate genes
   for further analysis.

3.3.2. Analysis of the Lightcyan Module

   The lightcyan module, encompassing 195 genes, exhibited diverse
   expression patterns across the seven time points ([104]Figure 4).
   Significant differences in expression between the high NUE and low NUE
   lines were observed, which was particularly notable at 4 h where the
   low NUE line displayed remarkable peak eigengene expressions in
   contrast to the lower expressions in the high NUE line.

Figure 4.

   [105]Figure 4
   [106]Open in a new tab

   Gene expression of the eigengenes in the lightcyan module. Color ranges
   from green (low) to red (high).

   Within this module, the highest number of DEGs was observed at 4 h,
   totaling 147 genes ([107]Supplementary Figure S5). This peak was
   followed by 98 DEGs at 24 h, with the lowest count of 29 DEGs at 2 h.
   While DEG counts at time points 0, 8, 12, and 30 h were recorded at 35,
   63, 82, and 55, respectively. Notably, 74 genes were differentially
   expressed at four or more time points, with 7 genes differentially
   expressed across all time points.

   GO enrichment identified 11 genes with NUE-related functions
   ([108]Supplementary Figure S6 and Table S5) and KEGG pathways
   enrichment identified 10 genes ([109]Supplementary Figure S7 and Table
   S6). After excluding duplicates, the enrichment analysis highlighted 19
   genes with several crucial biological processes and molecular functions
   related to NUE, including arginine and proline metabolism, as well as
   glycine, serine, and threonine metabolism. Among these, 10 genes
   exhibited differential expression at four or more time points,
   emphasizing their potential roles in nitrogen metabolism. These 10
   genes were selected as candidate genes for further analysis.

3.3.3. Analysis of the Darkorange Module

   The darkorange module encompassed 101 genes. Notably higher eigengene
   expressions were observed at 0 and 2 h in the high NUE line, marking
   these periods as critical for gene activity within this module. In
   contrast, the low NUE line exhibited negative values at these times
   ([110]Figure 5).

Figure 5.

   [111]Figure 5
   [112]Open in a new tab

   Gene expression of the eigengenes in the darkorange module. Color
   ranges from green (low) to red (high).

   The number of DEGs peaked at 0 h with 64 genes, followed by 58 at 2 h
   ([113]Supplementary Figure S8). The lowest differential expression was
   noted at 4 h with 23 genes, with consistent counts of 30 genes observed
   at both 8 and 12 h, and varying counts of 29 and 39 genes at 24 and 30
   h, respectively. A total of 35 genes were identified as differentially
   expressed at four or more time points, where 12 exhibited differential
   expression across all seven time points.

   GO enrichment identified 10 genes with NUE-related functions
   ([114]Supplementary Figure S9 and Table S7) and KEGG pathways
   enrichment identified 5 genes ([115]Supplementary Figure S10 and Table
   S8). These analyses highlighted crucial pathways such as nitrogen
   metabolism and arginine biosynthesis, which were essential for
   understanding nitrogen utilization in plants. After excluding
   duplicates, 14 genes critical for nitrogen utilization were identified,
   where 5 genes showed differential expression across four or more time
   points, underscoring their significant roles in nitrogen metabolism.
   These five genes were selected as candidate genes from this module for
   further analysis.

3.3.4. The Analysis of the Cyan Module

   The cyan module comprised 200 genes. Marked differences in eigengene
   expressions between the high NUE and low NUE lines were particularly
   prominent at 4 h, where the low NUE line showed a peak in positive
   values, which is in stark contrast to the negative values observed in
   the high NUE line ([116]Figure 6).

Figure 6.

   [117]Figure 6
   [118]Open in a new tab

   Gene expression of the eigengenes in the cyan module. Color ranges from
   green (low) to red (high).

   The module exhibited a peak in differential expression at 4 h with 62
   genes, and a secondary peak at 24 h with 47 genes ([119]Supplementary
   Figure S11). The lowest differential expression was noted at 2 h with
   only 16 DEGs, and variable DEG counts were observed at other time
   points—29 at 0 h, 22 at 8 h, 35 at 12 h, and 36 at 30 h. Of these, 17
   genes were identified as differentially expressed at four or more time
   points, with one gene expressed across all seven time points.

   GO ([120]Supplementary Figure S12 and Table S9) and KEGG pathways
   ([121]Supplementary Figure S13 and Table S10) revealed the involvement
   of these genes in crucial metabolic pathways, including amino acid
   biosynthesis and arginine and proline metabolism. These pathways played
   vital roles in the plant metabolic framework, indirectly influencing
   nitrogen assimilation and utilization. However, there were no genes
   differentially expressed across four or more time points and therefore
   no genes were selected as candidate genes.

3.3.5. The Analysis of the Skyblue Module

   The skyblue module comprised 86 genes. The expression heatmap revealed
   no distinct time-specific variations within this module ([122]Figure
   7). A peak in differential expression was observed at 12 h involving 31
   genes, closely followed by 29 genes at 4 h ([123]Supplementary Figure
   S14). The lowest number of differentially expressed genes, totaling 20,
   was recorded at 8 h. Counts of differentially expressed genes at 0 and
   30 h included 28 genes each, with 25 and 26 genes at 2 and 24 h,
   respectively. Importantly, 17 genes in this module were identified as
   differentially expressed across four or more time points, while 6 genes
   differentially expressed at all time points.

Figure 7.

   [124]Figure 7
   [125]Open in a new tab

   Gene expression of the eigengenes in the skyblue module. Color ranges
   from green (low) to red (high).

   GO enrichment identified two genes with NUE-related functions
   ([126]Supplementary Figure S15 and Table S11) and KEGG pathways
   enrichment identified one gene ([127]Supplementary Figure S16 and Table
   S12). Crucial functions and pathways included nitrogen compound
   transport and arginine and proline metabolism. Among these genes, only
   one gene showed differential expression across multiple time points.
   This one gene was selected as a candidate gene from this module for
   further analysis.

3.4. Information and Functional Annotation of 25 Selected Genes

   [128]Table 1 provides detailed information about the candidate genes
   from the five modules, including gene descriptions, species
   affiliations, and their respective expression modules. These genes have
   been studied across multiple species, such as in Arabidopsis thaliana,
   Oryza sativa, Cucumis melo, Cucumis sativus, and so on. We further
   screened 25 genes by reviewing related studies to determine whether the
   genes or their gene family members had been previously studied for
   functions related to NUE. The results identified genes directly related
   to NUE such as CsaV4_1G002492 (GLR22) and CsaV4_2G003460 (GLR35), which
   are glutamate receptors potentially involved in nitrogen signal
   transduction; CsaV4_3G000307 (NRT1.1/NPF6.3), which is involved in
   nitrogen absorption and transport; and CsaV4_7G001709 (UPS2), which is
   involved in nitrogen recycling. These four genes were selected as key
   genes for further analysis.

Table 1.

   Gene information and functional annotation of the 25 selected genes.
   Gene ID Gene Description Organism Species Module Related
   References