Abstract

   Cotton fibers are single cells that develop from the epidermal cells in
   the outer integument of developing seeds. The processes regulating
   fiber cell development have been extensively studied; however, the
   spatiotemporal transcriptome and metabolome profiles during the early
   stages of fiber development remain largely unknown. In this study, we
   profile the dynamics of transcriptome and metabolome during the early
   stages of cotton fiber cell development using a combination of spatial
   transcriptomic, single-cell transcriptomic, and spatial metabolomic
   analyses. We identify the key genes (e.g., DOX2, KCS19.4, BEE3, and
   HOS3.7) and metabolites (e.g., linoleic acid, spermine, spermidine, and
   α-linolenic acid) that may regulate the early development of fiber
   cells. Finally, knockdown and gain-of-function analyses identify the
   crucial role of GhBEE3/Gh_A09G062900 in cotton fiber initiation. We
   also construct a publicly accessible website
   ([73]https://cotton.cricaas.com.cn/ovule/) for visualization of the
   spatiotemporal gene expression in cotton, providing a reference dataset
   for further studies on cotton fiber development.

   Subject terms: Plant physiology, Plant development, Gene expression
   analysis, Metabolomics
     __________________________________________________________________

   While cotton fiber development has been extensively studied, the
   spatiotemporal transcriptome and metabolome landscape remain largely
   unknown. Here, the authors profile early stage cotton fiber cell using
   a combination of spatial transcriptomic, single-cell transcriptomic,
   and spatial metabolomic analyses.

Introduction

   Cotton (Gossypium species) is one of the most important cash crops in
   the world. It is cultivated in about 80 countries, occupies
   approximately 2.5% of the world’s arable land, and generates roughly
   500 billion dollars of annual revenue worldwide^[74]1. Upland cotton
   (G. hirsutum L.) is the most widely cultivated allotetraploid cotton
   variety, accounting for ~ 95% of global cotton fiber
   production^[75]2,[76]3. Cotton fibers are single cells originating on
   the outer epidermal surface of developing seeds. Approximately 25% of
   the ovular epidermal cells differentiate into lint fibers at the early
   stage of seed development^[77]4. These fibers develop through four
   distinct stages, initiation (− 2 to 5 days post anthesis; DPA),
   elongation and primary wall synthesis (3 to 20 DPA), secondary cell
   wall thickening (16 to 40 DPA), and fiber maturation (40 to 50
   DPA)^[78]5,[79]6. The first two stages determine the number and length
   of the fibers produced by each ovule/seed, which determines the final
   yield of cotton^[80]7. The latter two stages, associated with cell wall
   thickening, determine fiber strength and fineness^[81]7. However, the
   regulatory factors that determine these developmental stages remain
   largely unknown. Understanding the mechanisms determining the cell fate
   and regulating the initiation and elongation of cotton fiber will
   provide an opportunity to improve the quality and yield of cotton
   fiber.

   Progress in sequencing and omics technology and functional studies have
   led to some advances in the identification of transcription factors
   (TFs) involved in regulating fiber differentiation from ovule epidermal
   cells. Genes encoding TFs like myeloblastosis-type (MYB-type such as
   GhMYB25^[82]8, GhMYB25-like^[83]9, and GhMYB2^[84]10),
   homeodomain-leucine zipper (HD-ZIP)-type homeodomain protein 1
   (GhHD1)^[85]11, defense-related protodermal factor 1 (GbPDF1)^[86]12,
   and auxin-related auxin-responsive protein 2 (ARF2)^[87]13 play crucial
   roles in fiber initiation, while the auxin transporter-encoding gene
   pin-formed 3a (GhPIN3a)^[88]14, calcium sensor calmodulin 7-like
   (GhCaM7-like)^[89]15, GhMYB109^[90]16, glycolysis-related cytosolic
   pyruvate kinase 6 (GhPK6)^[91]17, HD-ZIP homeobox 3 (GhHOX3)^[92]18,
   and basic helix-loop-helix (bHLH) TF-encoding paclobutrazol resistance
   1 (GhPRE1)^[93]19 regulate the elongation of cotton fibers. However,
   several studies have demonstrated that the expression of TF-encoding
   genes responsible for fiber differentiation from ovule epidermal cells
   is also regulated by phytohormones^[94]6. Abscisic acid (ABA), auxins,
   brassinosteroids (BRs), cytokinins, gibberellin (GA), jasmonic acid
   (JA), and ethylene regulate cotton fiber development^[95]20. More
   specifically, auxins and BRs promote fiber initiation and
   elongation^[96]21,[97]22, GAs and ethylene positively affect fiber
   elongation^[98]20, and JA and GA synergistically promote fiber
   initiation^[99]23. On the other hand, cytokinins and ABA inhibit cotton
   fiber elongation^[100]24. In addition, several secondary metabolites
   have also been correlated with cotton fiber development. The flavonoid
   naringin negatively influences fiber development^[101]25, while
   linolenic acid (C18:3) promotes cotton fiber elongation^[102]26.
   Reactive oxygen species (ROS) and calcium signaling are other crucial
   factors that regulate cotton fiber development^[103]15.

   Although some achievements have been made in the exploration of the
   regulatory mechanisms of fiber development at the molecular level,
   these are still unclear to a large extent. Single-cell RNA sequencing
   (scRNA-seq) is a technology that provides insights into the
   heterogeneity of gene expression among cells and the dynamics of cell
   types during development. This technology allows the identification of
   candidate genes associated with specific traits at the single-cell
   level^[104]27–[105]29. However, applying scRNA-seq in non-model plants,
   particularly in cotton, is challenging due to the lack of known marker
   genes for defining each cell type^[106]30,[107]31. Qin et al. recently
   analyzed the single-cell transcriptome during initiation and provided
   information on the regulation of fiber development, but at the same
   time, important spatial location information was not available
   yet^[108]32. The emerging spatial transcriptomic (ST) technology is
   widely used in animals to help overcome this problem by providing
   markers to associate gene expression with cell location and cell
   types^[109]33. Similarly, the recently developed spatial metabolomics
   (SM) based on mass spectrometry imaging (MSI) helps visualize
   quantification and identification of the distribution of specific
   metabolites in specific tissues^[110]34. SM has demonstrated potential
   in characterizing the regulatory mechanisms underlying development in
   various plants^[111]35–[112]40 but has not yet been applied to cotton.
   The mechanism of cotton fiber development is complex and cannot be
   thoroughly studied with a single omics research method. Therefore, the
   multi-omics analysis of ST, scRNA and SM will help us to further
   analyze the molecular mechanism of cotton fiber initiation and early
   development.

   In this work, we carry out transcriptome and metabolome profiling of
   ovules based on ST, scRNA-seq, and SM to systematically analyze the
   dynamic changes at the single-cell level during fiber initiation and
   early elongation in cotton (Fig. [113]1). We then aim to spatially map
   the cell types and achieve in situ visualization of gene and metabolite
   expression patterns. Further, to integrate the ST and SM datasets, we
   consistently set the spatial resolution in the study at 100 μm to
   identify the marker genes and metabolites that play key roles in
   regulating fiber development. Finally, we explore the role of a
   representative marker gene, GhBEE3, in regulating the initiation of
   fiber cells via knockdown and gain-of-function approaches. These
   findings on the spatiotemporal changes in gene expression and
   metabolite concentrations during the initiation and elongation of
   cotton fiber will facilitate further studies on cotton fiber
   development. In conclusion, we provide a strategic reference for future
   studies, as well as a methodological model for accelerating the
   application of single-cell spatial multiomics to diverse crop plants.

Fig. 1. A molecular approach towards fiber development atlas in cotton.

   [114]Fig. 1
   [115]Open in a new tab

   The study performed (a) spatial transcriptomics to characterize the
   spatiotemporal transcriptome patterns in cotton bolls from − 1.5-DPA to
   5-DPA by ST; (b) single-cell transcriptomics (scRNA-seq) to dissect
   cell heterogeneity in ovules from − 1.5-DPA to 1-DPA; (c) spatial
   metabolomics to identify key metabolites by MSI from − 1.5-DPA to
   5-DPA; (d) ST and SM integrated analysis to generate a multi-omic
   atlas. The genes and clusters (cell types) of interest were selected,
   and the heatmap of metabolite-related gene expression was generated
   according to the expression level of each cell cluster (as shown in
   Fig. [116]2d).

Results

Spatio-temporal gene expression patterns in cotton bolls

   Cotton fiber initiation occurs from − 2-DPA to 5-DPA. Cotton fiber
   cells begin to protrude at − 0.5-DPA and display numerous protrusions
   by 0-DPA, suggesting that the cell fate is determined as early as
   − 1-DPA or − 1.5-DPA^[117]32. In this study, we examined the
   spatiotemporal anatomical variations at the early stages of fiber
   development using paraffin-embedded sections of cotton bolls at
   − 1.5-DPA, − 1-DPA, 0-DPA, 1-DPA, 3-DPA, and 5-DPA (Supplementary
   Fig. [118]1). The cross-sections stained with toluidine blue showed
   that fiber cell initiation starts at 0-DPA. Subsequently, the fiber
   cells start to elongate from the outer integument at 3-DPA and further
   elongate at 5-DPA.

   We then sequenced the TM-1 cotton boll samples from the above-mentioned
   stages of development using ST technology to obtain the spatiotemporal
   gene expression dynamics and elucidate the mechanisms underlying fiber
   development. A total of eight samples at six stages of development were
   sequenced, including two cotton bolls collected from two different
   cotton plants that were maintained as biological replicates at 0-DPA
   and 5-DPA to ensure accuracy. The RNA integrity number (RIN) of cotton
   boll sections at all developmental stages was evaluated (Supplementary
   Table [119]1), and the qualified samples with RIN > 7 were stained with
   Hematoxylin-Eosin (H&E) (Fig. [120]2a), then ST sequencing was
   performed using 10 × Genomics Visium. The optimal tissue
   permeabilization conditions were determined through preliminary tissue
   optimization experiments, including incubating samples for 24 min in a
   specifically modified permeabilization enzyme cocktail (2% w/v
   Cellulase R10, 0.4% w/v Macerozyme R10, 1% w/v Pectinase, 1% w/v
   Hemicellulase, 0.4% Snailase) (Supplementary Fig. [121]2). Tissue
   permeabilization was carried out for all the samples subsequently under
   the optimized conditions, and the cDNA was synthesized from the
   captured mRNA and used for library preparation. The library had cDNA
   fragments of the desired length (Supplementary Fig. [122]3) and was
   sequenced. Then, the spatial barcodes unique to each spot were used to
   determine the spatial location of genes. After quality control, our ST
   experiment using cotton boll samples collected at different DPA
   identified ~ 50,000 genes (Supplementary Data [123]1), which is within
   the expected range. In detail, we obtained 2651–3998 high-quality spots
   per section, 5487–11,210 unique molecular identifiers (UMIs) per spot,
   3177–5556 expressed genes per spot, and 0.0000115%–0.0000218%
   mitochondrial genes per spot (Supplementary Fig. [124]4a–c and
   Supplementary Data [125]1). Then, Pearson correlation analysis was
   conducted among the eight sample sections, and a strong correlation
   among the samples was obtained (Supplementary Fig. [126]4d–j). Each
   spot captured an average of 1–10 cells (Supplementary Fig. [127]4k).

Fig. 2. Spatiotemporal gene expression during six stages of cotton fiber
development.

   [128]Fig. 2
   [129]Open in a new tab

   a Hematoxylin and Eosin (H&E) stained sections of cotton bolls at
   − 1.5-DPA, − 1-DPA, 0-DPA, 1-DPA, 3-DPA, and 5-DPA. The H&E stain
   experiment was repeated three times. Scale bar, 1 mm. b UMAP
   visualization of cotton boll cells. Each point in the figure represents
   a spot; different colors indicate different groups of spots. The cell
   types in each cluster were annotated based on the tissue information in
   (c). From cluster 1 to cluster 19, the number of spots contained in
   each cluster is 3540, 3392, 2700, 2469, 1966, 2077, 1572, 1289, 1190,
   1114, 1014, 988, 689, 462, 378, 183, 142, 123, and 119, respectively. c
   All clusters obtained by the UMAP analysis are mapped to their specific
   locations. The mapping shows the localization of 19 clusters. d Dotplot
   shows the expression of the representative marker genes expressed in
   each cell cluster. Each row represents a marker gene, and each column
   represents a cell cluster. The size of the dots represents the
   percentage of cells in the clusters that express the gene. Gene
   expression levels are indicated by a color key ranging from blue (low)
   to pink (high). e Annotation of the major clusters of cotton bolls.
   KCS19.4, HOS3.7, and bZIP43.11 are the marker genes of ‘fiber cells’,
   while SBT4.14, GH3.6.2, and GASA14.1 are the marker genes of
   'integument.' The UMAP plot of transcript accumulation for the marker
   genes and the spatial distribution of these marker genes in cotton
   bolls are shown. The color intensity indicates the relative transcript
   level of each marker gene. Source data are provided as a Source Data
   file.

   Subsequent uniform manifold approximation and projection (UMAP)
   analysis of all spots in the samples from six developmental stages
   showed 19 cell clusters (Fig. [130]2b). Next, to assess the spatial
   organization of the clusters, we mapped the spots to their original
   coordinates in the tissue sections and annotated them (Fig. [131]2c).
   The resulting clusters corresponded to specific regions in the bolls of
   six DPA. ‘Mesocarp/Diaphragm cells’ (clusters 1, 2, 6, 9, and 14)
   appeared in the boll slices at most stages, while ‘fiber cells’
   (clusters 3, 10, and 19) appeared in boll slices at 3-DPA and 5-DPA
   (Fig. [132]2b, c and Supplementary Data [133]2).

   To further characterize the clusters, we identified the differentially
   expressed genes (DEGs) between them (Supplementary Data [134]3). The
   DEGs with the highest expression levels in each cell cluster were
   identified as the key marker genes (Fig. [135]2d). For example,
   gretchen hagen 3 (GH3.6.2) (Gh_A03G188900), subtilase 4.14 (SBT4.14)
   (Gh_D11G090900), and alpha dioxygenase 2 (DOX2) (Gh_A09G193100) were
   found highly expressed in ‘integument_1’ (cluster 4) (Fig. [136]2d).
   The sugars will eventually be exported transporters 15 (SWEET15)
   (Gh_D04G160000) was highly expressed in ‘integument_1’ (cluster 4), and
   ‘nucellus’ (clusters 5 and 13) (Fig. [137]2d). The brassinosteroid
   enhanced expression 3 (BEE3) (Gh_A09G062900) was highly expressed in
   ‘placenta’ (cluster 11), and GA-stimulated 14.1 (GASA14.1)
   (Gh_A07G244400) was highly expressed in ‘integument_2’ (cluster 8)
   (Fig. [138]2d). Besides, the 3-ketoacyl-CoA synthase 19.4 (KCS19.4)
   (Gh_A10G219300) and high expression of osmotically responsive 3.7
   (HOS3.7) (Gh_D01G000600) were found to be highly expressed in ‘fiber
   cells’ (clusters 3 and 10) at two different time points during the
   early elongation of fiber cells (3-DPA and 5-DPA; Fig. [139]2e).
   Gh_D12G200300, an ortholog of the Arabidopsis thaliana marker gene
   basic leucine zipper 43.11 (bZIP43.11), showed high expression in
   ‘fiber cells_3’ (cluster 19) (Fig. [140]2e).

   Further gene ontology (GO) enrichment analysis of the DEGs in the cell
   clusters indicated that ‘fruit development’ and ‘seed development’
   terms were associated with ‘exocarp_1’ (cluster 12), and ‘nucellus_2’
   (cluster 13) (Supplementary Table [141]2). The DEGs in ‘integument_2’
   (cluster 8) were also associated with the ‘organonitrogen compound
   biosynthetic process’ (Supplementary Table [142]2). Meanwhile, the DEGs
   in ‘mesocarp’ or ‘diaphragm’ (clusters 1, 2, 6, 9, and 14) were
   associated with ‘macromolecule biosynthetic process’, ‘cellular
   metabolic process’, ‘fruit development’, ‘organonitrogen compound
   biosynthetic process’, and ‘response to stimulus’ (Supplementary
   Table [143]2), and those of ‘fiber cells’ (clusters 3, 10, and 19) and
   ‘integument_1’ (cluster 4) were associated with ‘seed trichome
   differentiation’ (Supplementary Table [144]2).

   To examine the robustness of our ST results, ST data from all cotton
   boll samples were compared with scRNA-seq data from the outer
   integument and fiber of Xu142 0-DPA sample by Qin et al.^[145]32.
   Specifically, the AddModuleScore function from Seurat was used to score
   the expression accumulation of outer integument and fiber enrichment
   gene datasets from Qin et al.^[146]32 in each spot of our ST data.
   Then, the expression accumulation scores of these gene datasets were
   plotted as a feature plot for visualization (Supplementary
   Fig. [147]5a). It was obvious that these gene datasets were mainly
   highly expressed in ‘fiber cells’ (clusters 3, and 19) and ‘integument’
   (clusters 4, and 8), which was consistent with the results of their
   research. Further, we performed a Venn diagram to show the correlation
   between the outer integument and fiber-enriched gene datasets from Qin
   et al.^[148]32 and the gene datasets of ‘fiber cells’ (clusters 3, 10,
   and 19) and ‘integument’ (clusters 4, and 8) in our ST data, and the
   results showed that there was a certain correlation between the two
   gene datasets (Supplementary Fig. [149]5b). In addition, consistent
   with previous study by Deng et al.^[150]12, our ST analysis revealed
   high expression of PDF1 (Gh_D04G040400), a known key marker gene for
   fiber initiation, in the ‘fiber cells’ and ‘integument’ in the young
   developing cotton bolls (Supplementary Fig. [151]5c).

   In order to further analyze the mechanisms underlying fiber
   development, we mapped the gene expression of cotton boll samples at
   each development stage to explore the genes specifically expressed at
   each stage (Supplementary Figs. [152]6–[153]11). The DEGs identified
   from the samples for each developmental stage of the cotton boll were
   shown in Supplementary Data [154]4–[155]9. The analysis revealed that
   the cell populations at the six stages of early fiber development
   shared similar biological processes (Supplementary Tables
   [156]3–[157]8); however, each cell population had its own set of
   uniquely expressed genes (Supplementary Figs. [158]6–[159]11), from
   which the marker genes of specific tissue or cell type were identified.
   For example, Gh_D12G199400, an ortholog of the Arabidopsis teosinte
   branched 1, cycloidia, and proliferating cell nuclear antigen factor
   5.3 (TCP5.3), was found highly expressed in ‘mesocarp’ (cluster 1) of
   the − 1.5-DPA boll (Supplementary Fig. [160]6b). Gh_D09G117000, an
   ortholog of the Arabidopsis fasciclin-like arabinogalactan-protein 11.3
   (FLA11.3), was highly expressed in ‘fiber cells_1’ (cluster 2) of the
   3-DPA boll (Supplementary Fig. [161]10b). At 5-DPA, the Arabidopsis
   homologous gene transmembrane kinase 3 (TMK3) (Gh_D09G144600), was
   highly expressed in ‘fiber cells_1’ (cluster 1) (Supplementary
   Fig. [162]11b).

ScRNA-seq analysis of gene expression in cotton ovule

   We then analyzed the gene expression patterns in cotton ovules at four
   different early developmental stages from − 1.5- to 1-DPA by scRNA-seq
   to determine the heterogeneity of cells. Four cotton ovules were
   collected at − 1.5-, − 1-, 0-, and 1-DPA for library construction and
   high-throughput sequencing. After filtering, we obtained 7395, 4535,
   7528, and 8307 high-quality cells and 52,403, 49,232, 57,608, and
   52,379 genes from − 1.5-, − 1-, 0-, and 1-DPA samples, respectively
   (Supplementary Table [163]9). The average number of UMI per cell ranged
   from 1390 to 4213, the average gene number per cell ranged from 1022 to
   2501, and the average proportion of mitochondrial genes per cell ranged
   from 0.0000068% to 0.0000115% (Supplementary Fig. [164]12a–c and
   Supplementary Table [165]9). We then identified the DEGs for all four
   early developmental stages from the scRNA-seq data (Supplementary
   Data [166]10–[167]14).

   Furthermore, to examine the robustness of the sequencing results, we
   compared the scRNA-seq data for each DPA sample with the corresponding
   ST data. All the Pearson correlation coefficients (R) were > 0.7 with
   p-value < 2.2e-16, which indicated a strong correlation between the two
   datasets (Supplementary Fig. [168]12d). We then screened the genes
   highly expressed in the corresponding cell clusters identified by ST
   and scRNA-seq (Supplementary Data [169]15). Subsequent UMAP analysis of
   data from the four single-cell components (inner integument, outer
   integument, nucellus, and diaphragm; (Fig. [170]3a) from ovules at
   − 1.5-, − 1-, 0-, and 1-DPA showed 16 cell clusters (Fig. [171]3b). The
   percentage of cells in each cluster ranged from 0.8% to 12.86%
   (Supplementary Table [172]10). We annotated the 16 cell clusters based
   on the expression levels of the marker genes and the annotation
   information of the ST data (Fig. [173]3b). We further analyzed gene
   expression and identified the DEGs in each cluster. The expression of
   the top ten DEGs was presented as a dot plot (Fig. [174]3c), and the
   expression patterns of some representative genes were characterized
   (Fig. [175]3d). Some of the identified DEGs have been reported to be
   involved in regulating hormone accumulation, such as the ortholog
   (Gh_A05G105700) of the Arabidopsis marker gene OPDA reductase 2 (OPR2),
   which regulate JA accumulation^[176]41. The cotton homolog
   (Gh_D08G075000) of the Arabidopsis gene indole-3-pyruvate monooxygenase
   10.4 (YUC10.4) has been reported to mediate auxin synthesis^[177]42. GO
   analysis of the DEGs in each cluster indicated enrichment of the ‘fatty
   acid biosynthetic process’, ‘seedling development’, and ‘post-embryonic
   development’, especially in the ‘outer integument’ and ‘nucellus’
   (clusters 1, 2, 3, 5, and 15) (Supplementary Table [178]11). Besides,
   the ‘outer integument’ (clusters 3, 7, and 10) had DEGs related to
   ‘seed trichome differentiation’, and ‘seed trichome elongation’ GO
   terms (Supplementary Table [179]11).

Fig. 3. Single-cell transcriptome analysis of cotton ovules from −1.5-DPA to
1-DPA.

   [180]Fig. 3
   [181]Open in a new tab

   a Structure of an ovule. b UMAP plots show major clusters based on
   single-cell transcriptomes of − 1.5-DPA, − 1-DPA, 0-DPA, and 1-DPA
   cotton ovules. Each dot represents a cell, and different colors
   represent different groups of cells. From cluster 1 to cluster 16, the
   number of cells contained in each cluster is 3571, 3188, 3031, 2616,
   2437, 2340, 2189, 1722, 1349, 1268, 1042, 997, 679, 596, 502, and 238,
   respectively. c Dotplot shows the expression of the representative
   marker genes in each cell cluster. Each row represents a marker gene,
   and each column represents a cell cluster. The size of the dots
   represents the percentage of cells in the clusters that express the
   gene. Expression levels are indicated by a color key ranging from blue
   (low) to pink (high). d Expression patterns of the representative
   marker genes from 16 cell clusters. The color intensity indicates the
   relative transcript level of each marker gene. Source data are provided
   as a Source Data file.

   In the same way as ST analysis, correlations were compared between the
   scRNA-seq data from our all cotton boll samples and the scRNA-seq data
   from outer integument and fiber of Xu142 0-DPA sample by Qin et
   al.^[182]32. The AddModuleScore function was used to score the
   expression accumulated of outer integument and fiber enrichment gene
   datasets from Qin et al.^[183]32 in each cell of our scRNA-seq data
   (Supplementary Fig. [184]13a). The gene datasets of Qin et al.^[185]32
   were found to be highly expressed in ‘outer integument’ (clusters 3,
   13, and 15) of our scRNA-seq data (Supplementary Fig. [186]13a), which
   was consistent with their findings. Further, we performed a Venn
   diagram to show the correlation between the outer integument and
   fiber-enriched gene datasets from Qin et al.^[187]32 and the gene
   datasets of ‘integument’ (clusters 1, 3, 4, 7, 8, 10, 12, 13, and 15)
   in our ST data, and the results showed that there was a strong
   correlation between the two gene datasets (Supplementary
   Fig. [188]13b). In addition, our scRNA-seq analysis showed that PDF1, a
   known key marker gene for fiber initiation^[189]12, was highly
   expressed in 0-DPA ‘outer integument’ (cluster 13) (Supplementary
   Fig. [190]13c).

   Further analysis using the short time-series expression miner (STEM)
   program grouped the DEGs into seven based on the expression trend from
   − 1.5-DPA to 1-DPA (Supplementary Fig. [191]14). We also analyzed the
   top ten DEGs in each cluster of ovules from these four stages, and
   identified 33 marker genes specifically expressed at 0-DPA
   (Supplementary Fig. [192]15a), such as xyloglucan
   endotransglycosylases/hydrolase 22.3 (XTH22.3) (Gh_A08G028100), a
   member of the GhXTH1 family^[193]43, expansin 2 (EXPA2) (Gh_D10G123800)
   and EXPA4.6 (Gh_A10G149600)^[194]21, auxin-inducible 22 (AUX22)
   (Gh_A05G114100)^[195]44, aminocyclopropane-1-carboxylic acid (ACC)
   synthase 2.1 (ACS2.1) (Gh_A02G179600)^[196]45, and phytosulfokine 6
   (PSK6) (Gh_A01G235100)^[197]46. GO enrichment analysis indicated that
   compared with those in − 1-DPA ovule, the up-regulated genes in 0-DPA
   ovule were mainly enriched ‘translation’ and ‘ribosomal small subunit
   assembly’ terms (Supplementary Fig. [198]15b). Besides, the DEGs
   identified in the different cell clusters included numerous TFs
   (Supplementary Fig. [199]15c) probably involved in regulating the early
   development of fiber cells, including anthocyaninless 2 (ANL2.3, an
   HD-ZIP member) (Gh_A11G092100), ERF008.1 (Gh_A12G080000), and MYB306.11
   (Gh_D12G192800) highly expressed in the 0-DPA ‘outer integument cells’
   (cluster 3) (Supplementary Fig. [200]15d).

   We then analyzed the scRNA-seq data from four stages using Monocle,
   which orders the single-cell transcriptomes in an unsupervised manner
   and adopts machine learning to reconstruct the single-cell
   developmental trajectory along pseudotime. The pseudotime trajectory
   had three branches, with one main developmental trajectory populated by
   the cells from the four samples (Fig. [201]4a). The pseudotime
   trajectory also grouped the cells into five pseudotime states
   (Fig. [202]4a). The proportion of pseudotime states in the four samples
   indicated that the pseudotime trajectory was consistent with their
   developmental stage (Fig. [203]4b). However, cells of some clusters
   showed a relatively independent distribution along the pseudotime
   trajectory (Fig. [204]4c). ‘− 1.5-DPA outer integument cells’ (cluster
   7) were concentrated in the early stage of the pseudotime trajectory,
   while ‘1-DPA outer integument cells’ (cluster 1) and ‘0-DPA outer
   integument cells’ (cluster 13) were concentrated in the late stage of
   the pseudotime trajectory (Fig. [205]4c), consistent with the actual
   developmental period of outer integument. Generally, the distribution
   characteristics of different cell types along the developmental
   trajectories suggest the relationships among these cells at different
   developmental stages. Interestingly, the ‘− 1-, 0-DPA inner integument
   cells’ (cluster 4) and the ‘− 1.5-, − 1-, 0-, 1-DPA nucellus cells’
   (cluster 5) exhibited relatively similar distribution patterns in the
   developmental trajectory, suggesting similar developmental stages
   (Fig. [206]4c).

Fig. 4. Pseudotime analysis of all cotton ovules samples from −1.5-DPA to
1-DPA.

   [207]Fig. 4
   [208]Open in a new tab

   a A pseudotime trajectory shows the distribution of cell clusters. The
   dashed arrow indicates the developmental direction of the pseudotime
   trajectory. b Proportion of pseudotime states in the four cotton ovule
   samples. c Pseudotime trajectories colored according to tissue type. d
   Heatmap clustering of key genes for developmental trajectories in all
   samples and expression kinetics of genes over pseudotime. The names of
   genes and cell types that determine the developmental trajectories are
   shown on the right side of the heatmap. Expression levels are indicated
   by the color ranging from blue (low) to red (high). Source data are
   provided as a Source Data file.

   Next, to analyze the genes that play a decisive role during specific
   stages of development (early, middle, and late), we screened the key
   genes in the developmental trajectory based on the scRNA-seq data. The
   heatmap of these genes showed that the pseudotime patterns could be
   divided into three modules (Fig. [209]4d). The first module can be
   divided into three branches. In the first branch, the expression of
   genes, mainly from ‘0-DPA inner integument cells’ (cluster 8) and
   ‘− 1.5-DPA outer integument cells’ (cluster 7), was substantially high
   at the beginning of development but decreased along the pseudotime. In
   the second branch, the expression of genes, mainly from ‘− 1.5-DPA
   outer integument cells’ (cluster 15), gradually increased to a maximum
   and then decreased along the pseudotime. In the third branch, the
   expression of genes, mainly from ‘− 1.5-, − 1-, 0-, 1-DPA nucellus
   cells’ (cluster 5), was comparatively low during all developmental
   periods along the pseudotime and was the weakest at the last stage of
   pseudotime (Fig. [210]4d). In the second module, the expression of
   genes, mainly from ‘− 1.5-, − 1-, 0-, 1-DPA diaphragm cells’ (cluster
   16), ‘0-DPA diaphragm cells’ (cluster 9), ‘− 1.5-, − 1-, 0-, 1-DPA
   nucellus cells’ (cluster 11), ‘0-DPA outer integument cells’ (cluster
   13), and ‘− 1-, 0-DPA diaphragm cells’ (cluster 14), gradually
   increased along pseudotime, indicating that these cell types were at
   mature stage of development (Fig. [211]4d). In the third module, the
   expression of genes, mainly from ‘− 1-, 1-DPA inner integument cells’
   (cluster 12), such as the heat shock protein (HSP)-encoding gene
   HSP70.20 (Gh_D13G265800), gradually increased to the maximum and then
   decreased quickly along the pseudotime (Fig. [212]4d). The HSP70 family
   has been reported to play a key role in cotton fiber initiation and
   elongation by maintaining cell homeostasis^[213]47.

Spatio-temporal metabolites expression patterns in cotton bolls

   The ST and scRNA-seq analyses revealed that many genes related to
   hormone metabolism, such as GH3.6.2, HOS3.7, and DOX2 (Fig. [214]2),
   play crucial roles in fiber development. However, the internal
   regulatory mechanisms of fiber development are complex, and
   transcriptomic techniques alone are not sufficient for their discovery.
   Therefore, in combination with a multi-stage sampling strategy, we
   carried out the highly sensitive and wide-coverage imaging, named air
   flow-assisted desorption electrospray ionization mass spectrometry
   imaging (AFADESI-MSI), of metabolites in seven developmental stages
   cotton boll at − 1.5-DPA, − 1-DPA, 0-DPA, 1-DPA, 2-DPA, 3-DPA, and
   5-DPA to explore the metabolic changes during early fiber development
   and to further understand their underlying mechanisms (Supplementary
   Fig. [215]16a). The boll samples at each stage had three biological
   replicates from different plants, and a total of 21 bolls were used for
   the study. We detected the spatiotemporal distribution of metabolites
   by combining the results of spatial shrunken centroids clustering with
   the safranin O/fast green stained sections of adjacent tissues
   (Supplementary Fig. [216]16b, c). Supplementary Fig. [217]16d shows
   images of specific metabolites in representative tissues (gland,
   diaphragm, ovule, and cavity).

   The mass spectra obtained from the whole section of cotton bolls in
   positive and negative ion modes were featured by peaks in the
   70–1000 m/z range (Supplementary Fig. [218]17a, b). The main compounds
   were organic acids, amino acids, polyamines, carbohydrates, choline,
   nucleotides, nucleosides, sesquiterpenoids, benzenoids, fatty acids
   (FAs), phosphatidylcholines (PCs), phosphatidylethanolamines (PEs),
   phosphatidylserines (PSs), phosphatidylglycerols (PGs),
   phosphatidylinositols (PIs), phosphatidic acids (PAs), etc.
   Supplementary Fig. [219]17c–h shows the mass spectra of the specific
   regions, such as the gland, ovule, and cavity in positive and negative
   ion modes. Supplementary Fig. [220]18 shows the spatial distribution of
   representative metabolites in cotton boll sections based on AFADESI-MSI
   imaging. Supplementary Figs. [221]19–[222]30 shows the imaging of
   metabolites in some key metabolic pathways, including lipid metabolism
   (linoleic acid metabolism, fatty acid degradation, glycerolipid
   metabolism, glycerophospholipid metabolism), carbohydrate metabolism
   (citrate cycle (TCA cycle), glycolysis /gluconeogenesis, pentose
   phosphate), amino acid metabolism (arginine and proline metabolism,
   glycine, serine and threonine metabolism, alanine, aspartate and
   glutamate metabolism), and nucleotide metabolism (purine metabolism,
   pyrimidine metabolism). It was worth noting that most metabolites
   showed highly heterogeneous spatial characteristics, for example, FAs,
   PAs, and PCs were specifically expressed in the ovule region, and PIs
   and carbohydrates were specifically expressed in the cavity region. The
   above results indicated that the AFADESI-MSI technique could indeed
   perform imaging analysis of metabolites in cotton bolls with high
   sensitivity, wide coverage, and strong specificity.

Screening of key metabolites for fiber development

   To identify metabolites that play a key role in early fiber
   development, we screened for metabolites that are specifically
   expressed in ovule from − 1.5-DPA to 5-DPA (Supplementary
   Data [223]16). These metabolites were mainly fatty acids and conjugates
   (such as oleic acid and dodecanedioic acid), linoleic acids and
   derivatives (such as linoleic acid and α-linolenic acid), amines (such
   as spermine and spermidine), amino acids, peptides, and analogs (such
   as L-arginine) (Supplementary Fig. [224]31). In addition, succinic
   acid, choline, PA-34:1, PA-34:2, PE-44:10, PE-36:4, PC-33:4 were also
   highly expressed in early developing ovule, while their expression was
   relatively low in other tissues. The main metabolic pathways involved
   in these specific metabolites include ‘linoleic acid metabolism’,
   ‘cutin, suberin and wax biosynthesis’, ‘α-linolenic acid metabolism’,
   ‘glycerophospholipid metabolism’, ‘β-alanine metabolism’, ‘biosynthesis
   of unsaturated fatty acids’, ‘glutathione metabolism’, ‘arginine and
   proline metabolism’ (Fig. [225]5a). We then analyzed the accumulation
   dynamics of these specific metabolites in ovule at different
   developmental stages from − 1.5-DPA to 5-DPA (Fig. [226]5b), and imaged
   the spatial distribution characteristics of some representative
   metabolites (Fig. [227]5c). The expression abundance of behenic acid,
   linoleic acid, PE-44:10, spermine, choline in ovule from − 1.5-DPA to
   5-DPA was gradually decreased, indicating that these metabolites may be
   closely related to the initiation of fibers. In contrast, the
   expression abundance of α-linolenic acid, histidinal, oleic acid, and
   PA-34:1 were gradually increased in ovule from − 1.5-DPA to 5-DPA,
   suggesting that these metabolites may be closely related to the early
   elongation of fibers.

Fig. 5. Specific expression metabolites identified in cotton ovules from
−1.5-DPA to 5-DPA under positive and negative ion mode.

   [228]Fig. 5
   [229]Open in a new tab

   a Enrichment of Top 10 metabolic pathways of ovule-specific
   metabolites. KEGG pathway enrichment analysis was performed using a
   hypergeometric test to evaluate the significance of metabolite
   enrichment in each pathway. b Heatmap shows the pattern of variation in
   the abundance of specific metabolites identified in ovule at different
   stages. Metabolite abundance is indicated by the color ranging from
   blue (low) to red (high). c Mass spectrum imaging of dodecanedioic
   acid, linoleic acid, alpha-linolenic acid, and oleic acid from Fig.
   (b). The data shown are representative images of three independent
   experiments with similar results. Metabolite abundance is indicated by
   a color key ranging from blue (low) to red (high), and the intensity of
   the color scale is a relative value. Source data are provided as a
   Source Data file.

   We then performed unsupervised principal component analysis (PCA) to
   compare the metabolite data of cotton ovule in both positive and
   negative ion modes, and the results showed that there was a significant
   clustering and grouping trend of endogenous metabolites in different
   stages of development (Supplementary Fig. [230]32). The spatiotemporal
   anatomical variations in the early stages of fiber development showed
   that fiber cells initiated from cells in the outer integument of the
   ovule at 0-DPA and then increased significantly in length at 3-DPA,
   indicating that 3-DPA was the key stage of fiber elongation
   (Supplementary Fig. [231]1). Therefore, the differentially produced
   metabolites (DPMs) that were highly abundant in the ovule of 0-DPA may
   play a key role in fiber initiation, while the DPMs that were highly
   abundant in the ovule of 3-DPA may play a key role in fiber elongation.

   To further identify key metabolites for fiber initiation, we compared
   DPMs between 0-DPA ovule and − 1.5-DPA ovule (Supplementary
   Fig. [232]33 and Supplementary Data [233]17). Compared with − 1.5-DPA
   ovule, DPMs such as glucaric acid, L-glutamic acid, L-asparagine,
   D-ribose, L-aspartic acid, D-glucose, fructose 6-phosphate,
   L-glutamine, oxoglutaric acid, L-threonine, dIMP were up-regulated in
   0-DPA ovule. These up-regulated DPMs were mainly involved in ‘alanine,
   aspartate, and glutamate metabolism,’ ‘glyoxylate and dicarboxylate
   metabolism,’ ‘glycine, serine, and threonine metabolism,’ and
   ‘histidine metabolism.’ However, compared with − 1.5-DPA ovule, the
   DPMs down-regulated in 0-DPA ovule mainly included PA-34:1,
   L-ornithine, cis-aconitic acid, behenic acid, choline, L-valine,
   L-arginine, spermine, linoleic acid, α-linolenic acid, L-histidine.
   These down-regulated DPMs were mainly involved in metabolic pathways
   including ‘linoleic acid metabolism’, ‘biosynthesis of unsaturated
   fatty acids’, ‘arginine and proline metabolism’, ‘β-alanine
   metabolism’, ‘α-linolenic acid metabolism’.

   Similarly, to further identify key metabolites for the early elongation
   of fibers, we compared DPMs between 3-DPA ovule and − 1.5-DPA ovule
   (Supplementary Fig. [234]34 and Supplementary Data [235]18). Compared
   with − 1.5-DPA ovule, the DPMs up-regulated in 3-DPA ovule mainly
   included D-glucose, glucaric acid, oleic acid, L-asparagine, L-glutamic
   acid, α-linolenic acid, D-ribose, deoxyribose, PI-32:0, glutarate,
   fructose 6-phosphate, dIMP, 9,10-dihydroxystearic acid, phosphatidate
   (34:3). These up-regulated DPMs were mainly involved in metabolic
   pathways including ‘pentose phosphate pathway’, ‘cutin, suberine and
   wax biosynthesis’, ‘alanine, aspartate and glutamate metabolism’,
   ‘starch and sucrose metabolism’, ‘biosynthesis of unsaturated fatty
   acids’. However, compared with the − 1.5-DPA ovule, the DPMs
   down-regulated in the 3-DPA ovule mainly included linoleic acid,
   L-histidine, L-aspartic acid, behenic acid, L-ornithine, cis-aconitic
   acid, choline, L-valine, L-arginine, spermine. These down-regulated
   DPMs were mainly involved in metabolic pathways including ‘arginine
   biosynthesis’, ‘linoleic acid metabolism’, ‘pantothenate and CoA
   biosynthesis’, ‘β-alanine metabolism’, ‘arginine and proline
   metabolism’, ‘glutathione metabolism’.

   In addition, to verify the accuracy of the SM results, we performed
   liquid chromatography-tandem mass spectrometry (LC-MS)/MS analysis on
   samples at the same developmental stage as those used in the SM assay
   and conducted referential target metabolites analysis on some
   representative metabolites, further proving the metabolites identified
   in the SM analysis (Supplementary Fig. [236]35). The annotation of
   representative target analyte obtained from spatial metabolomics were
   further identified by MS/MS spectrum (Supplementary
   Figs. [237]36–[238]61).

Integrated analysis of transcriptome and metabolome data

   Next, we conducted an integrated analysis of the ST and SM data from
   the early development stage of cotton, and more comprehensively
   analyzed the internal molecular regulatory mechanism of fiber
   development from the multi-omics dimension. The SM of this study
   identified numerous DPMs during early fiber development, of which
   L-arginine, oxoglutaric acid, L-aspartic acid, α-linolenic acid,
   linoleic acid, spermine, and spermidine were the significant ones
   (Fig. [239]5 and Supplementary Figs. [240]33–[241]34). Based on the
   network relationship information of metabolic pathways in the KEGG
   database, we conducted an integrated analysis of these key DPMs and ST
   data (Supplementary Data [242]19). As shown in Fig. [243]6a, we
   visualized key metabolites and regulatory genes in the entire
   α-linolenic acid metabolism pathway. SM analysis revealed that
   phosphatidylcholine, α-linolenic acid, 13(S)-HPOT, and traumatic acid,
   key metabolites in the ‘α-linolenic acid metabolism’ pathway, were
   heterotopically highly expressed in the ovule. Then, 56 genes encoding
   metabolic enzymes related to this pathway were screened out from the ST
   data (Supplementary Data [244]20). Specifically, LCAT4 or PLA2-ALPHA
   catalyzes the metabolism of phosphatidylcholine to α-linolenic acid and
   is highly expressed in ovule. The α-linolenic acid can not only be
   catalyzed by DOX2 to 2(R)-HPOT but also by LOX3.1.3 to 13(S)-HPOT.
   Then, HPL catalyzes the metabolism of 13(S)-HPOT to traumatic acid. In
   this metabolic pathway, α-linolenic acid can be converted to JA by the
   catalyzation of a series of enzymes. A heatmap of the expression levels
   of α-linolenic acid-related genes from Supplementary Data [245]20 was
   made to study differential transcription at different stages and
   tissues, and these genes were found to be highly expressed in 3-, 5-DPA
   nucellus and fiber cells (Fig. [246]6b). These results were consistent
   with the high accumulation of α-linolenic acid in 3-DPA and 5-DPA ovule
   (Fig. [247]5c), strengthening that α-linolenic acid plays an important
   role in early fiber elongation as a signature metabolite. In addition,
   we hypothesized that the genes highly expressed in the fiber cells and
   integument cells in Fig. [248]6b (e.g., DOX2) play an important role in
   early fiber development. DOX2 was highly expressed in 3-DPA fiber cells
   in ST analysis, and they were also involved in the ‘α-linolenic acid
   metabolism’ pathway.

Fig. 6. Integrated analysis of α-linolenic acid metabolic pathway and genes
related to metabolic enzymes in ST.

   [249]Fig. 6
   [250]Open in a new tab

   a MS images of key metabolites and spatial expression images of key
   genes in α-linolenic acid metabolic pathway (intensity in MS image
   color scale is relative value, intensity in gene image color scale is
   log2 transformed). b Heatmap shows the expression level of ‘α-linolenic
   acid metabolic’ pathway-related genes identified based on ST of five
   cotton tissue regions (fiber, integument, placenta, exocarp, nucellus,
   and diaphragm). Expression levels are indicated by a color key ranging
   from blue (low) to red (high).

   BRs and auxin have been shown to promote fiber initiation and
   elongation^[251]21,[252]22. We examined oxoglutaric acid, L-aspartic
   acid, linoleic acid, spermine, spermidine, BRs, and auxin, and
   conducted an integrated analysis with the transcriptome data
   (Supplementary Figs. [253]62–[254]63 and Supplementary Data [255]19).
   The results showed that oxoglutaric acid-related genes were mainly
   highly expressed in 0-, 1-DPA integument cells and 3-, 5-DPA fiber
   cells (Supplementary Fig. [256]62a), L-aspartic acid-related genes were
   mainly highly expressed in − 1.5-, − 1-, 0-, 1-DPA integument and − 1-,
   1-, 3-, 5-DPA nucellus cells (Supplementary Fig. [257]62b), linoleic
   acid-, and spermine and spermidine-related genes were mainly highly
   expressed in − 1.5-, − 1-DPA integument and 3-DPA fiber cells
   (Supplementary Fig. [258]62c, d), and BR-related genes were mainly
   highly expressed in placenta and integument (Supplementary
   Fig. [259]63a), while auxin synthesis- and transport-related genes were
   mainly highly expressed in integument and nucellus cells (Supplementary
   Fig. [260]63b, c). Thus, these observations suggested that oxoglutaric
   acid, L-aspartic acid, fatty acid metabolites (such as α-linolenic acid
   and linoleic acid), amine metabolites (such as spermine and
   spermidine), auxin, and BR-like substances play an important role in
   the early development of cotton fibers.

   In addition, quantitative strength analysis of correlation was
   performed on the key KEGG_PATHWAY gene dataset obtained from the
   integrated analysis of ST and SM in Supplementary Data [261]19.
   Specifically, we utilized the AddModuleScore function to score these
   KEGG_PATHWAY gene datasets (e.g., ‘α-linolenic acid metabolism’,
   ‘linoleic acid metabolism’, and ‘brassinosteroid biosynthesis and
   signal transduction’), and then drew feature maps for visualization to
   calculate the mean expression values of related genes for integrated
   analysis in each cell population (Supplementary Fig. [262]64). Results
   of the above analysis indicate that there was a strong correlation
   between the obtained transcriptome dataset and metabolome dataset.

Marker gene expression analysis by ISH

   Through ST analysis at the early development stage of cotton bolls,
   marker genes DOX2, SBT4.14, GH3.6.2, and SWEET15 were identified in
   integument cells (Fig. [263]2d). In order to verify the classification
   reliability of the clustered cell clusters in ST analysis, ISH was used
   to analyze the expression patterns of these four integument cells
   markers in cotton bolls at the early development stage. ISH analysis
   showed that DOX2 and SBT4.14 were highly expressed in − 1-DPA outer
   integument cells, GH3.6.2 was strongly expressed in 0-DPA outer
   integument cells, and SWEET15 was strongly expressed in 3-DPA outer
   integument cells (Fig. [264]7b), which was consistent with ST analysis
   (Fig. [265]7a). Furthermore, we used ISH to analyze the expression
   patterns of marker genes SWEET15 and DOX2 in cotton bolls at each stage
   of early development, and found that they were obviously expressed in
   integument cells at each stage (Supplementary Fig. [266]65). This
   result was consistent with the spatiotemporal expression patterns of
   DOX2 and SWEET15 in early development obtained by the ST analysis. The
   above analysis indicated that the classification of cell clusters in
   our ST analysis was reliable, and also verified that DOX2, SBT4.14,
   GH3.6.2, and SWEET15 were marker genes of integument cells.

Fig. 7. Spatiotemporal expression pattern of integument cell marker genes
revealed by RNA in situ hybridization.

   [267]Fig. 7
   [268]Open in a new tab

   a Spatiotemporal expression pattern of DOX2, SBT4.14, GH3.6.2, and
   SWEET15, the marker genes of integument cells based on spatial
   transcriptomics. The color intensity indicates the relative transcript
   level of each marker gene. b Nitro-blue tetrazolium-stained RNA ISH
   shows the spatiotemporal expression patterns of DOX2, SBT4.14, GH3.6.2,
   and SWEET15. No probe was used as a blank control. The positive cells
   are stained bluish-violet. The data shown are representative images of
   three independent experiments with similar results. OI, outer
   integument; II, inner integument; N, nucellus. Red arrows indicate
   fiber cells. Scale bars, 50 μm in − 1 DPA and 0 DPA; 100 μm in 3 DPA.

GhBEE3 promotes cotton fiber cell initiation

   BRs have been reported to promote fiber initiation and
   elongation^[269]22,[270]48. In the cotton placenta, we detected higher
   expression of GhBEE3, which encodes an important component of BR
   signaling^[271]49 (Supplementary Fig. [272]9b, c). Therefore, we
   speculated whether GhBEE3 might affect the early developmental process
   of cotton fiber through BR signaling. Therefore, using the
   Zhongmiansuo24 (ZM24) variety as the background, we generated GhBEE3
   (Gh_A09G062900) overexpression and RNAi lines and advance them into T3
   generation to determine the potential role of GhBEE3 in the early
   development of cotton fibers (Supplementary Fig. [273]66). Bolls at
   various developmental stages (− 1-DPA, 0-DPA, 1-DPA, 2-DPA, and 3-DPA)
   were collected simultaneously from similar positions of the cotton
   plant. Then, scanning electron microscopic (SEM) was used to observe
   the development of fiber cells on ovule surface. It was evident that
   very few protruding fiber cells on the ovules of ZM24 (control),
   GhBEE3-OE, and GhBEE3-RNAi plants at − 1-DPA (Fig. [274]8a). At 0 DPA,
   the number of protruding fiber cells in GhBEE3-OE ovule was
   significantly higher than ZM24, while those in GhBEE3-RNAi were lower
   (Fig. [275]8a, b). Similarly, at 1-DPA, the GhBEE3-OE ovule surface
   showed significantly more protruding fiber cells than ZM24, while the
   GhBEE3-RNAi ovule surface displayed fewer (Fig. [276]8a, c). Detailed
   analysis showed quite elongated fiber cells on the ovule surface at
   2-DPA and 3-DPA. Besides, the protruding fiber cells in GhBEE3-OE were
   denser than ZM24, while those of GhBEE3-RNAi were sparser
   (Fig. [277]8a). These results indicated that GhBEE3 promotes cotton
   fiber cell initiation.

Fig. 8. GhBEE3 positively regulates cotton fiber initiation.

   [278]Fig. 8
   [279]Open in a new tab

   a Scanning electron microscopy images show the protuberant fiber cells
   in the ovules of different cotton lines at different developmental
   stages. The data shown are representative images of three independent
   experiments with similar results. Scale bar, 100 μm. b Number of
   protuberant fiber cells per 0.01 square millimeter in 0-DPA ovules of
   different lines. Data are means ± SD from three independent biological
   replicates (n = 3). The error bar is the sample standard deviation
   relative to the mean. Statistical analysis was assessed by unpaired
   two-tailed t test, and significant differences were indicated with
   p-values at confidence intervals of 0.95. c Number of protuberant fiber
   cells per 0.01 square millimeter in 1-DPA ovules of different lines.
   Data are means ± SD from three independent biological replicates
   (n = 3). The error bar is the sample standard deviation relative to the
   mean. Statistical analysis was assessed by unpaired two-tailed t test,
   and significant differences were indicated with p-values at confidence
   intervals of 0.95. Source data are provided as a Source Data file.

Discussion

   Cotton is one of the most important sources of natural fiber
   worldwide^[280]1. Therefore, the regulatory mechanism underlying cotton
   fiber development has been a consistent focus of cotton research.
   Although researchers have made considerable progress and an array of
   regulatory genes and metabolites have been identified, our
   understanding of the regulatory mechanism underlying cotton fiber
   development remains limited. Cotton fiber develops from a single cell
   and undergoes a complex series of developmental processes, including
   the determination of cell fate, the initiation of fiber development,
   the elongation of individual fibers, and the maturation of fiber
   cells^[281]5,[282]6. Across all of these complex developmental
   processes, transcriptomic patterns change dynamically at different
   developmental stages. Specific genes involved in the regulation of
   fiber development are active at different developmental stages, making
   the identification and characterization of key regulatory genes
   challenging. Therefore, it is necessary to conduct in-depth and
   systematic analyses of the developmental patterns of cotton fiber cells
   at single-cell resolution. In this study, multi-omics techniques, such
   as single-cell transcriptomics, spatial transcriptomics, and spatial
   metabolomics, were used to investigate dynamic transcriptomic and
   metabolomic changes during the early stages of cotton fiber
   development. In addition, we also preliminarily screened and identified
   key genes and metabolites expressed at specific developmental stages
   and in specific tissues. Using these data, we created a graphical and
   user-friendly database to provide other scientists with information on
   the expression patterns and dynamics of genes involved in the
   development of cotton fiber. This information can not only save time
   during early experimental stages, but can also reduce research costs,
   accelerate research progress, and promote the genetic improvement of
   cotton with excellent fiber quality.

   Gene expression is spatially specific during the development of
   organisms; hence, studying gene expression in specific tissues will
   help us better understand the mechanisms of tissue genesis and
   development. Recently, the use of ST has allowed successful
   localization of cells and high-throughput gene detection in plants,
   such as Arabidopsis^[283]50. Unlike scRNA-seq, ST has the advantage
   that no specific marker genes are required for cell type identification
   in non-model crops, such as cotton. Yet, the application of ST in
   plants is hindered by tissue complexity and permeability challenges. To
   address these issues, we initially optimized conditions for
   permeabilizing the cotton boll samples (specifically, modification of
   permeabilization enzyme cocktail and incubation time) through
   preliminary tissue optimization experiment, and ensured complete mRNA
   release from the boll slice, a crucial step in ST (Supplementary
   Fig. [284]2). Subsequent library preparation yielded cDNAs with defined
   lengths (Supplementary Fig. [285]3), guaranteeing sequencing accuracy
   and ST precision. A recent study used a combination of ST with
   scRNA-seq and created a single-cell spatiotemporal transcriptional
   atlas of intestinal development in humans, mapping morphologic patterns
   across time, location, and cell compartments^[286]51. The combination
   of scRNA-seq and ST can enable the study of spatial gene expression to
   reach the ultra-high resolution single-cell level, which has a
   promising application prospect in the field of plant tissue cell
   development.

   Recently, Qin et al. conducted scRNA-seq on cotton ovules (− 1.5-,
   − 1-, − 0.5-, and 0-DPA)^[287]32. They elucidated the regulatory
   network underlying the initiation of cotton fiber differentiation and
   provided a genome-wide gene expression during the initiation and early
   elongation of fiber cells^[288]32. Nevertheless, an integrated approach
   based on scRNA-seq and ST will provide genes’ complete temporal and
   spatial expression data during fiber initiation and elongation. In this
   study, we analyzed cotton ovules from the same developmental stage
   (− 1.5-, − 1-, 0-, 1-, 3-, and 5-DPA) using both ST and scRNA-seq
   (Figs. [289]2, [290]3). Although we could not obtain high-quality
   protoplasts from 3- and 5-DPA cotton ovules for scRNA-seq, our
   scRNA-seq still covered samples from the early stages of cotton
   development. Furthermore, correlation analysis was conducted between ST
   and scRNA-seq datasets from us and scRNA-seq datasets from Qin et
   al.^[291]32, respectively (Supplementary Figs. [292]5, and 13). The
   results both found that there was a strong correlation between the two
   datasets, which, to some extent confirmed the reliability of our
   research results.

   The spatial information on gene expression obtained via ST combined
   with multi-stage sampling helped us identify key genes regulating fiber
   initiation and early elongation, and those highly expressed in specific
   cell clusters during early boll development. Specifically, we annotated
   the cell clusters based on spatial barcode information and identified
   the top ten DEGs for each cell cluster as the marker genes
   (Fig. [293]2). These marker genes are closely related to their cell
   clusters. The functions of a few marker genes have been studied in
   cotton and other species, but the potential functions of most of these
   marker genes have not been fully explored. For example, our ST data
   identified KCS19.4, FLA11.3, TMK3, and bZIP43.11 as marker genes of
   ‘fiber cells’ (Fig. [294]2 and Supplementary Figs. [295]10, [296]11).
   It has been reported that KCS19 is a member of the same gene family as
   KCS1 in Arabidopsis that plays a role in increasing the level of
   very-long-chain fatty acids (VLCFAs) and improving the cold tolerance
   of seedlings^[297]52. FLA11.3 acts with FLA12 to fine-tune the
   initiation of secondary cell wall development and the balance of lignin
   and cellulose in the secondary cell wall in Arabidopsis^[298]53. TMK3
   is a member of the receptor-like kinase (RLK) family and a key
   regulator of cell expansion and proliferation in Arabidopsis^[299]54.
   The bZIP43.11 plays a role in regulating bHLH109 during the induction
   of somatic embryogenesis (SE) in Arabidopsis^[300]55. SBT4.14, GH3.6.2,
   DOX2, and GASA14.1 are marker genes of ‘integument cells’
   (Fig. [301]2). It has been reported that Arabidopsis homologous genes
   GH3, SBT4.14, and Solanum lycopersicum-homologous alpha DOX2 were
   identified to play a role in auxin metabolism, stress responses, and
   lipid synthesis, respectively^[302]56–[303]58. GASA14.1 has been
   reported to play a role in regulating the accumulation of ROS in
   Arabidopsis^[304]59. In addition, we also identified SWEET15 as the
   marker gene of ‘integument’ and ‘nucellus cells’ (Fig. [305]2), and
   TCP5.3 as the marker gene of ‘mesocarp cells’ (Supplementary
   Fig. [306]6b). Two consecutive field experiments showed that the
   GhSWEET15-upregulated fibers became thinner, longer, and stronger than
   the wild-type control, suggesting that SWEET15 promotes fiber
   elongation^[307]60. In Arabidopsis, TCP5.3 plays a role in controlling
   plant thermo-morphogenesis by positively regulating PIF4
   activity^[308]61. Further, GO annotation of the DEGs indicated fatty
   acid oxidation enrichment in the outer integument cells during early
   development and the same genes with similar expression levels in fiber
   cells and the ovary wall (Supplementary Table [309]2). These results
   provide insights into the regulatory networks governing cotton fiber
   differentiation.

   In order to verify the classification reliability of the clustered cell
   clusters in ST analysis, ISH was used to analyze the expression
   patterns of DOX2, SBT4.14, GH3.6.2, and SWEET15, the markers of
   ‘integument cells’ in cotton bolls at the early development stage
   (Fig. [310]7 and Supplementary Fig. [311]65). ISH analysis showed that
   these four genes were expressed in integument cells, which was
   consistent with our ST analysis (Fig. [312]7 and Supplementary
   Fig. [313]65). In addition, the results of ST analysis revealed that
   PDF1, a key gene known for fiber initiation, was highly expressed in
   the ‘fiber cells’ and ‘integument cells’ regions of young cotton bolls
   (Supplementary Fig. [314]5c), consistent with those reported by Deng et
   al.^[315]12. Previously, GbPDF1 was preferentially expressed during
   fiber initiation and early elongation, with the highest transcript
   accumulation in 5-DPA fiber cells^[316]12. The results of these reports
   confirmed that the classification of cell clusters in ST analysis was
   reliable.

   Researchers now adopt MSI instead of gas chromatography-mass
   spectrometry (GC-MS) to gain insights into the spatial distribution of
   metabolites in complex and heterogeneous tissues^[317]62. With various
   other metabolomics techniques, MSI has been used as an SM approach for
   in situ metabolic profiling^[318]63. Several metabolites related to
   cotton fiber development, such as sugars, hormones, amino acids, and
   fatty acids, have been reported. For example, sucrose and very
   long-chain fatty acids (VLCFAs) are known to regulate cotton fiber
   elongation^[319]64,[320]65. However, the influence of small molecule
   metabolites, such as amino acids, nucleosides, and organic acids, on
   cotton fiber development remains largely unknown. Therefore, to explore
   the spatiotemporal changes of metabolites in cotton boll cells and
   their potential roles in regulating fiber development, we conducted an
   SM using AFADESI-MSI on samples collected from the same developmental
   stages of ST (Supplementary Fig. [321]16). Specifically, we identified
   several metabolites that play an important role in regulating early
   development of the ovule (Fig. [322]5, Supplementary Fig. [323]31 and
   Supplementary Data [324]16). For example, succinic acid, a four-carbon
   dihydric acid oxidized by succinate dehydrogenase (SDH) as part of the
   TCA cycle, mediates ROS metabolism in plants^[325]66. In order to
   identify key metabolites regulating fiber initiation and early
   elongation, we screened the DPMs in 0-DPA and 3-DPA ovules,
   respectively (Supplementary Figs. [326]33−34 and Supplementary
   Data [327]17−[328]18). α-linolenic acid was detected at high levels at
   the fiber elongation stage (3-DPA to 5-DPA) (Fig. [329]5), consistent
   with the reported roles of linolenic acid in promoting cotton fiber
   elongation^[330]26. JA has been reported to promote fiber
   development^[331]67, and α-linolenic acid can be converted to JA in the
   ‘α-linolenic acid metabolic’ pathway (Fig. [332]6a), which seems to
   indicate that α-linolenic acid is highly likely to promote fiber
   elongation.

   Furthermore, we analyzed the ST and SM data in an integrated manner to
   explore the gene-metabolite network and elucidate the regulatory
   pathways at the early stage of fiber development (Fig. [333]6,
   Supplementary Figs. [334]62–[335]63 and Supplementary Data
   [336]19–[337]20). In addition, the results obtained by the
   AddModuleScore function indicate a strong correlation between the
   transcriptome dataset and metabolome dataset, at least in the case of
   the pathways related to ‘α-linolenic acid metabolism,’ ‘linoleic acid
   metabolism,’ and ‘BR biosynthesis and signal transduction’
   (Supplementary Fig. [338]64).

   Analysis of the ST dataset identified a BR-related gene, GhBEE3, that
   was highly expressed in the placenta (Supplementary Fig. [339]9b, c and
   Supplementary Fig. [340]63a). A previous study based on transcriptome
   analysis revealed that BRs promote cotton fiber elongation by
   influencing ethylene and cadmium signaling, fatty acid biosynthesis,
   and cell wall- and cytoskeletal-related gene expression^[341]68;
   however, the molecular mechanisms underlying this regulation remain
   largely unknown. In another independent study, GhBEE3 has been
   identified as a positive regulator of BR signaling^[342]49, which
   suggests its role in promoting fiber initiation and elongation.
   Therefore, we produced GhBEE3-RNAi and GhBEE3-OE lines to assess their
   function (Supplementary Fig. [343]66). Detailed analysis showed that at
   0-DPA and 1-DPA, the GhBEE3-OE ovules had significantly more protruding
   fiber cells than the ZM24 control, while GhBEE3-RNAi ovules had fewer
   (Fig. [344]8). These observations indicated that GhBEE3 regulate fiber
   initiation in cotton maybe by modulating the BR signaling or other
   signaling pathways. However, the role of GhBEE3 and other DEGs in
   regulating fiber development needs to be investigated.

   In summary, based on a rational, multi-DPA sampling strategy and the
   combined application of ST, scRNA-seq, and SM technologies
   (Fig. [345]1), the present study demonstrated the spatiotemporal
   profiles of transcriptomes and metabolomes during the initiation and
   elongation of fibers in cotton. We also developed an interactive
   website to include the transcriptomes and metabolomes generated in this
   study, providing a reference for further research on cotton fiber
   development at a single-cell level (Supplementary Fig. [346]67). At the
   same time, our study also provides a feasible research strategy for
   other scientists to conduct similar research, and provides a key
   methodological model for accelerating the application of single-cell
   spatial multiomics to diverse crop species.

Methods

Plant materials and sample collection

   We used the upland cotton TM-1 (G. hirsutum L. acc. Texas Mark-1) to
   perform ST, scRNA-seq, SM, LC-MS/MS, and ISH experiments and
   Zhongmiansuo24 (ZM24) to generate the overexpression (OE) and RNA
   interference (RNAi) lines. All cotton plants were maintained in the
   experimental field at the Institute of Cotton Research in the Chinese
   Academy of Agricultural Sciences (CAAS; Anyang, China) under
   conventional field management measures. Flowers were labeled at the
   start of flowering, and bolls were sampled − 1.5-, − 1-, 0-, 1-, 2-,
   3-, and 5-DPA to obtain fibers at different stages of development. All
   samples were collected at 8:00 AM and 8:00 PM. ST experiment and SM
   experiment were performed on the same batch of cotton boll tissue
   samples.

Spatial transcriptomics analysis

   All reagents were obtained from Visium Spatial Gene Expression Reagent
   Kits, and the experiments were operated according to the user guide of
   “Visium Spatial Protocols-Tissue Preparation Guide ([347]CG000240)”,
   “Visium Spatial Gene Expression Reagent Kits-Tissue Optimization
   ([348]CG000238)”, and “Visium Spatial Gene Expression Reagent Kits
   ([349]CG000239)”.

Sample preparation

   The collected cotton fresh bolls (− 1.5-DPA, − 1-DPA, 0-DPA, 1-DPA,
   3-DPA, and 5-DPA) were embedded directly in optimum cutting temperature
   (OCT) compound (OE Biotech, Shanghai, Item No. 4583) in a histology
   plastic cassette, and cut into 10 μm serial frozen sections at − 20 °C
   on a cryostat microtome (Leica, CM 1950). Subsequently, RNA was
   extracted and assessed for quality (Supplementary Table [350]1). The
   eligible sections with RNA integrity number (RIN) greater than 7 were
   selected for Hematoxylin-Eosin (H&E) staining (Sigma, USA) and
   incubated at 37 °C for 5 min prior to imaging. Images were used to
   confirm that each section included the targeted region. The remaining
   qualified sample sections were used for subsequent experiments.

Tissue optimization

   Adjacent frozen cotton boll sections on 10 × Genomics Visium array
   slides were fixed, stained, and permeabilized at different times. The
   permeabilization enzyme cocktail has been modified to include 2% w/v
   Cellulase R10, 0.4% w/v Macerozyme R10, 1% w/v Pectinase, 1% w/v
   Hemicellulase, 0.4% Snailase. Poly adenylated mRNA from the attached
   tissue section was captured by probes on the slides. Then, add Master
   Mix containing reverse transcription (RT) reagents and fluorescently
   labeled nucleotides to the surface of the tissue section to obtain
   fluorescently labeled cDNA. Next, remove excess tissue, leaving
   fluorescently labeled cDNA covalently linked to oligonucleotides on the
   Visium array slides. Then, fluorescently labeled cDNA was visualized.
   Twenty-four minutes was chosen as the optimal permeabilization time
   because the fluorescence signal is the strongest and the signal
   diffusion is the weakest (Supplementary Fig. [351]2).

Tissue permeabilization, cDNA synthesis, library construction and sequencing

   Fixed and stained tissue sections were permeabilized for 24 min with
   the modified permeabilization enzyme. The primers on the spots capture
   the poly-adenylated mRNA released from the overlying cells. Then, cDNA
   synthesis, library construction, and sequencing were performed
   according to the User Guide of Visium Spatial Gene Expression Reagent
   Kits. During the library preparation, the cDNA lengths we obtained were
   distributed within the standard range (Supplementary Fig. [352]3).

Processing of sequencing data and quantification of gene expression

   The raw reads in FASTQ format generated via high-throughput sequencing
   were processed with the 10 × Genomics Space Ranger software (version
   1.2.0), and the obtained sequences were aligned to the cotton reference
   genome using the STAR software (version 2.7.1a). The brightfield images
   of the sections were captured. Then, the spatial barcode information
   was used to align the reads to specific spots on the tissue section
   using the H&E images. The total number of spots, the number of reads
   per spot, the number of detected genes, and the number of unique
   molecular identifiers (UMIs) were determined to evaluate the quality of
   the sequencing reads. Finally, a gene-spot matrix was generated for
   gene expression analysis.

Quality control and further processing of data

   The Seurat4 package was used to perform further quality control and
   process the data obtained via Space Ranger (version 1.2.0). Since the
   sequencing data of all samples were of good quality (Supplementary
   Fig. [353]4) and more spot expression information was expected, the
   downstream data processing and analysis were directly based on the
   matrix results generated by Space Ranger, without further filtering of
   spots. Then, the SCTransform function was used to normalize and
   stabilize the variance in the ST data using regularized negative
   binomial regression, and the data were stored in the SCT assay for
   further analysis.

Dimensionality reduction and clustering

   The batch effects in ST data were corrected with the R package
   ‘batchelor’^[354]69 (version 1.3.4), following the mutual nearest
   neighbors (MNN) approach. Then, the “FindClusters” function of the
   Seurat4 package was used to scale the gene expression and cluster the
   cells based on the MNN approach. This strategy removed batch effects in
   the ST data and enabled the detection of cell subpopulations. Seurat’s
   “RunUMAP” function, a two-dimensional uniform manifold approximation
   and projection (UMAP) algorithm, was used to visualize the cells.

Differential gene and enrichment analysis

   The “FindMarkers” function (test. use = MAST) in Seurat4 was used to
   identify the differentially expressed genes (DEGs); the genes with a
   |log[2]foldchange | > 0.58 and a p-value < 0.05 were identified as the
   DEGs. Then, GO enrichment analyses of DEGs were performed with the
   annotated gene name of DEGs [[355]https://pantherdb.org/]. The
   reference species is G. hirsutum.

Pearson correlation analysis

   Correlations were calculated using the summation method to obtain the
   expression level of each gene across different samples and cell groups.
   First, the expression counts for each gene was summed for each sample.
   Second, “1” was added to the summation value to take log10 (i.e., log10
   [sum + 1]). Third, the scatter plot was created according to this
   value, with each point representing one gene. Using the stat_cor ()
   function in the ggpubr package, the correlation value and p-value were
   calculated by Pearson analysis.

Preparation of single-cell suspensions from cotton ovules

   We optimized reagent and treatment conditions for the dissociation of
   protoplast from cotton at − 1.5-DPA, − 1-DPA, 0-DPA, and 1-DPA. The
   major improvements involved modification in enzyme solution formulation
   (20 mM MES (pH 5.7), 3% (wt/vol) Cellulase R10, 1% (wt/vol) Macerozyme
   R10, 1% (wt/vol) Snailase, 1% (wt/vol) Cutinase, 0.4 M Mannitol, and
   20 mM KCl) and treatment conditions (dark condition, 28 °C, 20 rpm
   horizontal shaking). The cotton ovules were cut into 1 mm pieces with a
   blade and soaked in the enzyme solution for 15 min under a vacuum. The
   mixture was then strained through a 40 μm cell filter to remove large
   tissue blocks and obtain a protoplast suspension, which was kept on ice
   to prevent cell death. The protoplasts were counted with a
   hemocytometer, and the activity was assayed following trypan blue
   staining (0.4%). The protoplasts with more than 90% activity rate were
   used for scRNA-seq library construction.

Construction of scRNA-seq cDNA library

   The scRNA-seq library was prepared from the protoplasts using a
   Chromium Single Cell 3’ Gel Beads-in-emulsion (GEM) Library & Gel Bead
   v3 Kit, following the manufacturer’s instructions (10 × Genomics,
   California, USA).

ScRNA-seq data preprocessing

   The raw sequencing data were analyzed using the Cell Ranger software
   pipeline (version 5.0.0) provided by 10 × Genomics. The sequences were
   aligned against the G. hirsutum TM-1 reference genome^[356]70 using the
   STAR aligner (version 2.7.1a), and the transcripts were quantified to
   obtain the unique molecular identifier (UMI) count matrix. This matrix
   was further processed using the R package Seurat4. Then, we filtered
   out cells with UMIs/gene threshold (> mean value +/− two-fold of the
   standard deviation, assuming a Gaussian distribution of each cell’s
   UMI/gene number) and doublets using the DoubletFinder package (version
   2.0.2). The data obtained from cells after quality control was
   normalized using the “NormalizeData” function in Seurat4 to get the
   normalized count.

   Principal component analysis (PCA) was performed with the “RunPCA”
   function in Seurat4 to reduce the dimensionality, and graph-based
   clustering was performed using the “FindClusters” function in Seurat4
   to cluster the cells based on their gene expression profiles. The final
   data and the cells were visualized using the “Run UMAP” function in
   Seurat4. Then, the “FindAllMarkers” function (test. use = bimod) in
   Seurat4 was used to identify the positive marker genes for each cluster
   compared with other cells, and the “FindMarkers” function (test.
   use = MAST) to determine the DEGs (|log[2]foldchange | > 0.58 and
   p-value < 0.05). Further, the GO and KEGG pathway enrichment analyses
   of DEGs were performed using R based on the hypergeometric
   distribution.

Pseudotime analysis

   The developmental pseudotime was calculated with the Monocle 2 package
   (version 2.9.0). The scRNA-seq raw count from the Seurat4 object was
   first converted into the CellDataSet object with the “importCDS”
   function, and then, the ordering genes (q-value < 0.01) were selected
   using the “differentialGeneTest” function to order cells along the
   pseudotime trajectory. The dimensionality reduction and clustering was
   performed using the “reduceDimension” function, followed by trajectory
   inference with the “orderCells” function using default parameters.
   Finally, the expression of each gene was plotted over pseudotime using
   the “plot_genes_in_pseudotime” function.

Spatial metabolome sample preparation and process

   The boll samples at each stage had three biological replicates. The
   fresh cotton bolls (− 1.5-DPA, − 1-DPA, 0-DPA, 1-DPA, 2-DPA, 3-DPA, and
   5-DPA) were embedded directly in the Leica Cryo-Gel from Leica (Leica
   Microsystem, Germany, Item No. 14020108926), and subsequently frozen
   and solidified on dry ice. The continuous sections were sliced using a
   − 20 °C freezing microtome (Leica, CM 1950, Germany) with a thickness
   of 10 μm. Using a brush, the sections were gently deposited onto
   Superfrost Plus positive charge anti-slip slides (Thermofisher
   Scientific, Waltham, MA, USA). The samples were defrosted by placing a
   finger on the back of the slide, ensuring that the samples were
   attached to the anti-slip slides. The sections were stored at − 80 °C
   for subsequent MS imaging. Prior to AFADESI-MSI, the frozen tissue
   sections were dried in a vacuum dryer for 30 mins.

AFADESI-MSI analysis

   The MSI was performed on a Q-OT-qIT hybrid mass spectrometer (Orbitrap
   Fusion Lumos; Thermo Fisher Scientific, United States) coupled with a
   lab-made AFAI ion source at a spatial resolution of 100 μm.
   Acetonitrile/water (80:20, V/V; containing 0.1% formic acid; positive
   ion mode) and acetonitrile/water (80:20; negative ion mode) were used
   as solvents at a flow rate of 5 μL/min for scanning the tissues. The
   MSI was carried out in positive and negative ion modes from a
   mass-to-charge ratio (m/z) of 70–1000, with high mass accuracy (< 5 ppm
   mass error) and a high mass resolution (70,000). Nitrogen (0.6 MPa) was
   used as spray gas, and the capillary temperature was set to 350 °C. The
   tissue was scanned continuously along the x-axis at a constant speed
   (Vx) of 0.2 mm/s, maintaining a distance of 0.1 mm between two lines
   (Dy), an X-axis scanning length of 10 mm, and a Y-axis scanning length
   of 10 mm. The Xcalibur data acquisition and processing system was used
   to collect data in a sequence based on sample size, step spacing, and
   scanning speed.

AFADESI-MSI data preprocessing

   The acquired raw data (.raw format) were converted into.imzML format
   using imzMLConverter. Then, the open-source R package Cardinal was used
   to process the data and visualize the mass spectra. A series of
   processing steps, such as peak alignment and missing value
   interpolation, were used to obtain the quantitative data. All spectral
   images were normalized using the total ion count (TIC) in each pixel.
   Then, the target regions of interest were selected for subsequent
   analysis based on the results of spatial shrunken centroids clustering
   and tissue staining of sections.

LC-MS/MS analysis

   To verify the results of the SM analysis, we conducted LC-MS/MS
   analysis on integument_nucellus and exocarp_mesocarp tissue samples
   from the same batch used for SM. The metabolomics assay was performed
   by Shanghai Luming Biotechnology Co., Ltd (Shanghai, China). Six
   biological replicates were utilized for each group of samples. Briefly,
   50 mg samples were added to 1.5 mL Eppendorf tubes containing 600 μL of
   methanol-water solution (methanol:water = 7:3, V/V), with
   L-2-chlorophenylalanine (4 μg/mL) as an internal standard, and the
   tubes were subsequently vortexed for 10 s. Whole samples were extracted
   by ultrasonication for 30 min in an ice-water bath and then centrifuged
   at 4 °C (13,201 g) for 10 min. The supernatants (150 μL) from each tube
   were filtered using 0.22 μm microfilters and transferred to LC vials.
   The vials were stored at − 80 °C until LC-MS/MS analysis. Quality
   control (QC) samples were prepared by mixing aliquots of all samples
   into a pooled sample. An ACQUITY UPLC I-Class plus (Waters Corporation,
   Milford, MA, USA) fitted with a Q-Exactive mass spectrometer equipped
   with a heated electrospray ionization (ESI) source (Thermo Fisher
   Scientific, Waltham, MA, USA) was used to analyze the metabolic
   profiles in both positive and negative ion modes. An ACQUITY UPLC HSS
   T3 column (1.8 μm, 2.1 × 100 mm) (Waters Corporation, Milford, MA, USA)
   was employed in both positive and negative modes. The binary gradient
   elution system consisted of (A) water (containing 0.1% formic acid,
   V/V) and (B) acetonitrile. Separation was achieved using the following
   gradient: 0 min, 5% B; 2 min, 5% B; 4 min, 30% B; 8 min, 50% B; 10 min,
   80% B; 14-15 min, 100% B; 15.1 min, 5% B; and 16 min, 5%B. The flow
   rate was 0.35 mL/min, and the column temperature was 45 °C. All samples
   were kept at 10 °C during the analytical procedure. The injection
   volume was 4 μL. For MS, the mass range was from 100 to 1200 m/z. The
   resolution was set at 70,000 for full MS scans and 17,500 for MS/MS
   scans. The collision energy was set at 10, 20, and 40 eV. The mass
   spectrometer was operated as follows: spray voltage, 3800 V (+) and
   3000 V (−); sheath gas flow rate, 50 arbitrary units; auxiliary gas
   flow rate, 10 arbitrary units; capillary temperature, 320 °C; Aux gas
   heater temperature, 350 °C.

Metabolite annotation and comparative analysis

   The ions detected by AFADESI-MSI were annotated following the pySM
   pipeline and an in-house SmetDB database. SmetDB contains approximately
   20,000 endogenous plant metabolites and was constructed by Lumingbio
   (Shanghai, China). For bulk metabolomic analysis, the original LC-MS
   data were processed with Progenesis QI V2.3 (Nonlinear, Dynamics,
   Newcastle, UK) for baseline filtering, peak identification, integral,
   retention time correction, peak alignment, and normalization. The
   following parameters were applied: 5 ppm precursor tolerance, 10 ppm
   product tolerance, and 5% production threshold. Compounds were
   identified using the Metlin and LuMet-plant (a mass spectral database
   containing ~ 1000 standard substances constructed by Shanghai Luming
   Biotechnology Co., Ltd.) databases based on the precise mass-to-charge
   ratio (m/z), secondary fragments, and isotopic distribution data. The
   extracted data were then further processed by removing peaks with a
   missing value (ion intensity = 0) in more than 50% of groups, by
   replacing the zero value with half of the minimum value, and by
   screening according to the qualitative results of the compound.
   Compounds with scores below 36 (out of 60) points were also deemed to
   be inaccurate and removed. A data matrix was created combining both the
   positive and negative ion data. PCA was performed for the metabolome
   data to observe the overall sample distribution and process stability.
   Orthogonal partial least squares discrimination analysis (OPLS-DA) was
   used to distinguish the differences in the metabolic profiles and
   identify the differentially abundant metabolites between groups. Then,
   the variable importance in projection (VIP) was calculated for each
   compound to assess their significance in separating the groups.
   Further, a two-tailed Student’s t test was applied to compare the
   samples and determine the differences in metabolites; metabolites with
   VIP values > 1.0 and p-value < 0.05 were identified as the differential
   ones. The structure-specific pattern ions of the target analyte
   obtained from spatial metabolomics were further identified by LC-MS/MS
   after sample homogenizing, and some representative target analytes were
   demonstrated in Supplementary Figs. [357]35–[358]61.

Standard solutions and calibration curves

   In the pursuit of targeted metabolite analysis, we employed linolenic
   acid (B21967-20 mg, Shanghai Yuanye Bio-Technology Co., Ltd), linoleic
   acid (B21421-0.2 mL, Shanghai Yuanye Bio-Technology Co., Ltd), and
   succinic acid (B20534-20 mg, Shanghai Yuanye Bio-Technology Co., Ltd)
   as the subjects of our study. The stock solutions of the standard were
   prepared by dissolving the compound at a concentration of 1 mg/mL in
   methanol. To create the primary mixed standard stock solution (MSS), we
   mixed the individual standard stock solutions and further diluted the
   resultant mixture with methanol to achieve appropriate concentrations.
   Calibration curves were generated by serially diluting the MSS with a
   methanol-2-propanol mixture (1:1, V/V) to achieve the following final
   concentrations: 2000 ng/mL, 800 ng/mL, 320 ng/mL, 128 ng/mL,
   51.2 ng/mL, 20.18 ng/mL, 8.19 ng/mL, 3.28 ng/mL, 1.31 ng/mL,
   0.52 ng/mL, 0.21 ng/mL, and 0.08 ng/mL. For quality control purposes, a
   mixed standard of 50 ng/mL was selected as STD-QC. It is imperative to
   note that all solutions and samples were stored at − 80 °C to ensure
   their integrity until required for analysis.

Sample pretreatment and metabolite extraction

   Metabolite extraction procedures were conducted according to the
   chemical characteristics of multi-targeted metabolites under
   investigation. Specifically, 600 μL of ice-cold methanol-water (7:3,
   V/V), which included succinic acid-d4 as an internal standard, was
   added into freeze-dried samples weighing (0.0500 g). Subsequently, the
   samples were placed within the confined of two steel balls and
   subjected to grinding using a grinder at a frequency of 60 Hz for
   2 min. Following this initial step, the entirety of the samples
   underwent extraction via ultrasonication for 10 min in an ice-water
   bath. The resulting mixture was then subjected to centrifugation at
   4 °C (13,201 g) for 10 min, after which 300 μL of the supernatant was
   decanted into sample vials. This supernatant was subjected to drying
   under a nitrogen stream and subsequently re-dissolved in 300 μL of
   methanol-2-propanol (1:1, V/V), containing 2-chloro-L-phenylalanine as
   an internal standard. A second round of extraction by ultrasonication
   for 5 min in an ice-water bath was conducted, followed by
   centrifugation at 4 °C (15,493 × g) for 5 min. Finally, 150 μL of the
   resulting supernatant was selected for subsequent UPLC-ESI-MS/MS
   analysis. For quality control purposes, a pooled sample was prepared by
   mixing aliquots of the individual samples.

UPLC-ESI-MS/MS analysis

   The LC procedure was performed on an AB ExionLC system (AB SCIEX,
   Framingham, MA, United States). An ACQUITY UPLC BEH C18 column
   (2.1 mm × 100 mm, 1.8 μm) (Waters Corporation, Milford, MA, USA) was
   used for analysis, with an injection volume of 5 μL. The mobile phase A
   consisted of water containing 0.1% formic acid, while the mobile phase
   B comprised methanol with 0.1% formic acid. The elution gradient was as
   follows, maintaining a flow rate of 0.30 mL/min: 0 min, A/B (95:5,
   V/V); 1 min, A/B (95:5, V/V); 4 min, A/B (20:80, V/V); 6 min, A/B
   (0:100, V/V); 7 min, A/B (0:100, V/V); 7.01 min, A/B (95:5, V/V);
   8 min, A/B (95:5, V/V). Throughout the analysis, a constant temperature
   of 4 °C was maintained for all samples, while the column temperature
   was held at a stable 40 °C. The MS procedure was performed on the AB
   SCIEX API 6500 + Qtrap System (AB SCIEX, Framingham, MA, United
   States), incorporating an electrospray ionization (ESI) source that
   operated in the negative ion mode. Nitrogen served as the collision
   gas. Key instrumental parameters were configured as follows: Curtain
   Gas: 30 Psi; IonSpray voltage: − 4500 V; Collision Cell Exit Potential:
   − 20 V; Source Temperature: 450 °C; Gas1: 50 Psi; Gas2: 50 Psi.

   Targeted metabolites were analyzed in multiple reaction monitoring
   (MRM) mode. The MRM pairs, declustering potentials (DPs), and collision
   energies (CEs) were optimized for each analyte. Subsequent data
   acquisition and further analysis were conducted using the Analyst
   software, while the quantification of all metabolites was performed
   using the SCIEX OS-MQ software.

Construction of transcriptome-metabolome association network

   Based on the known metabolic pathway information, DEGs with
   |log[2]foldchange | > 0.58 and p-value < 0.05, and DPMs with VIP > 1.0
   and p-value < 0.05 in different tissue regions of cotton boll were
   selected to construct transcriptome-metabolome association networks.
   The addmodulescore function in the Seurat4 package was scored to
   visualize the expression of genes related to metabolite synthesis in
   spatial organization.

RNA in situ hybridization

   In situ hybridization for mRNA was performed using the kit (Boster
   Biological Technology, Wuhan, China), following the manufacturer’s
   instructions. The cotton bolls were collected, embedded in paraffin,
   and cut into 5 μm thick sections. The sections were de-waxed using
   xylene and rehydrated using a series of alcohol gradients.
   Subsequently, 3% citric acid and concentrated pepsin (two drops) were
   added to the section and incubated at 37 °C for 10 min to expose the
   mRNA fragments. The cells were incubated with a DIG-labeled RNA probe
   (BOSTER) overnight at 60 °C for mRNA hybridization, washed with Saline
   Sodium Citrate (SSC), blocked with serum, and labeled with alkaline
   phosphatase-conjugated anti-digoxigenin antibody (anti-Dig-AP, BOSTER,
   no dilution required) at 37 °C for 1 h. The alkaline phosphatase
   activity in the cells was detected in the dark using a nitro-blue
   tetrazolium/5-bromo-4-chloro-3-inodyl-phosphate (NBT/BCIP) chromogenic
   substrate solution (BOSTER). The sections were analyzed, and the images
   were captured using an Aperio VERSE 8 (Leica) multifunctional tissue
   cell analyzer. A blank control, incubated with no probe, was used for
   comparison. The positive cells are stained bluish-violet. Primers or
   probes used in this assay are listed in Supplementary Data [359]21.

Construction of overexpression and RNAi vector and generation of the
transgenic materials

   The pCAMBIA-2300 plant expression vector with the constitutive CaMV35S
   promoter was cleaved using XbaI and PacI restriction enzymes and used
   to generate the GhBEE3-OE construct by homologous recombination. The
   pBI121 vector digested with SacI and BamHI restriction enzymes was used
   to create the GhBEE3-RNAi construct. The constructs were confirmed by
   sequencing and transferred to Agrobacterium tumefaciens LBA4404 cells.
   Then, GhBEE3-OE and GhBEE3-RNAi transgenic cotton strains were
   generated using agrobacterium-mediated genetic transformation.
   Specifically, cotton seeds were sterilized with 75% (v/v) ethanol 3
   times for 1 min each time, then washed with sterile water 3 times for
   1 min each time, and finally placed in an incubator at 37 °C for
   12–20 h. After the seeds began to germinate, they were transferred to
   MS medium and cultured in a light incubator (16 h light/8 h dark) at
   28 °C for 7 days. Under the aseptic condition, hypocotyls of ZM24 were
   cut into fragments of about 1 cm, and the segments were infected with
   Agrobacterium tumefaciens at the concentration of OD[600] = 0.8 for
   1 min, and then tissue culture in a culture medium. For the
   identification of positive transgenic plants, the gene-specific primer
   of GhBEE3 was used to detect the presence of GhBEE3. The putative
   transgenics were cultured to T3 generation with stable inheritance by
   adopting the screening method, which was used for further research. The
   gene-specific primers used in the experiment are listed in
   Supplementary Data [360]21.

RNA extraction and quantitative real-time PCR

   Total RNA was extracted from the leaves of cotton plants at flowering
   with FastPure Plant Total RNA Isolation Kit (Vazyme, RC401, USA) and
   reversely transcribed into cDNA using NovoScript® Plus All-in-one 1st
   Strand cDNA Synthesis SuperMix (gDNA Purge) (Novoprotein, E047, China).
   The qRT-PCR was performed using the extracted cDNA and Premix Ex Taq™
   II (Takara) on a LightCycler 480 system (Roche Diagnostics, Mannheim,
   Germany) to detect the GhBEE3 expression. The thermocycling was carried
   out as follows: 95 °C for 10 min, followed by 40 cycles of
   amplification at 95 °C for 10 s and 60 °C for 30 s. Each experiment was
   repeated at least three times. Primers used in the assay are listed in
   Supplementary Data [361]21.

Scanning electron microscopy imaging

   Bolls at various developmental stages (− 1-DPA, 0-DPA, 1-DPA, 2-DPA,
   and 3-DPA) were collected simultaneously from similar positions on the
   cotton plant. The ovules collected from these samples were rapidly
   imaged on an environmental scanning electron microscope (FEI MLA650
   ESEM; Thermo Fisher Scientific, USA) under vacuum mode without any
   treatment. Each experiment was repeated at least three times.

Statistical analysis

   Three biological replicates were maintained for each data set unless
   stated otherwise, where n indicates the number of biologically
   independent samples examined. Statistical analysis was assessed by
   unpaired two-tailed t test and significant differences were indicated
   with p-values at confidence intervals 0.95. Data analyses were carried
   out by GraphPad Prism 9.0.2.

Database website

   The Shiny program in R was used to construct an interactive webpage
   ([362]https://cotton.cricaas.com.cn/ovule/) to access the cotton scRNA
   data and ST data generated in this study. The “DimPlot”, “VlnPlot”,
   “FeaturePlot”, and “SpatialPlot” functions in the Seurat package were
   used to plot and display the output. The user interface (UI) can employ
   the Shinythemes package to create custom themes.

   The webpage has been divided into two modules (two tabs displayed at
   the top of the webpage): cotton single-cell transcriptomic data
   (scRNA-seq data) and cotton spatial transcriptomic data (Spatial
   Transcriptomics data). Once a tab is opened, the function parameter is
   displayed on the left and the result on the right. Above the result
   display area are the cluster display plots/gene violin
   plots/FeaturePlot. In the function parameter area, the user can set the
   size of a point, dimensionality reduction method (t-SNE/UMAP), grouping
   information (cluster or sample), and select the gene of interest to
   adjust and obtain the desired results. A list of the top 10 marker
   genes identified appears below the result display area.

Reporting summary

   Further information on research design is available in the [363]Nature
   Portfolio Reporting Summary linked to this article.

Supplementary information

   [364]Supplementary Information^ (12.9MB, pdf)
   [365]Peer Review file^ (13MB, pdf)
   [366]41467_2025_55869_MOESM3_ESM.pdf^ (100KB, pdf)

   Description of Additional Supplementary Files
   [367]Supplementary Data 1^ (10KB, xlsx)
   [368]Supplementary Data 2^ (13.3KB, xlsx)
   [369]Supplementary Data 3^ (3.2MB, xlsx)
   [370]Supplementary Data 4^ (1.5MB, xlsx)
   [371]Supplementary Data 5^ (1.3MB, xlsx)
   [372]Supplementary Data 6^ (1.9MB, xlsx)
   [373]Supplementary Data 7^ (1.5MB, xlsx)
   [374]Supplementary Data 8^ (1.8MB, xlsx)
   [375]Supplementary Data 9^ (1.3MB, xlsx)
   [376]Supplementary Data 10^ (695.7KB, xlsx)
   [377]Supplementary Data 11^ (353.5KB, xlsx)
   [378]Supplementary Data 12^ (238.9KB, xlsx)
   [379]Supplementary Data 13^ (1.2MB, xlsx)
   [380]Supplementary Data 14^ (316.1KB, xlsx)
   [381]Supplementary Data 15^ (126.2KB, xlsx)
   [382]Supplementary Data 16^ (13KB, xlsx)
   [383]Supplementary Data 17^ (15.8KB, xlsx)
   [384]Supplementary Data 18^ (16KB, xlsx)
   [385]Supplementary Data 19^ (17.1KB, xlsx)
   [386]Supplementary Data 20^ (11.4KB, xlsx)
   [387]Supplementary Data 21^ (10.5KB, xlsx)
   [388]Reporting Summary^ (149KB, pdf)

Source data

   [389]Source Data^ (293.7KB, xlsx)

Acknowledgements