Abstract Mulberry leaf shape is an important agronomic trait indicating yield, growth, development, and habitat variation. China was the earliest country in the world to grow mulberry for sericulture, and it is also one of the great contributions of the Chinese nation to human civilization. ATAC‐seq (Assay for Transposase Accessible Chromatin using sequencing) is a recently developed technique for genome‐wide analysis of chromatin accessibility. The samples used for ATAC sequencing in this study were divided into two groups of whole leaves (CK‐1 and CK‐2) and lobed leaves (HL‐1 and HL‐2), with two replicates in each group. The related motif analysis, differential expression motif screening, and functional annotation of mulberry leaf shape differences were performed by raw letter analysis to finally obtain the transcription factors (TFs) that lead to the production of heteromorphic leaves. These transcription factors are common in plants, especially the TCP family, shown to be associated with leaf development and growth in other woody plants and are a potential transcription factor responsible for leaf shape differences in mulberry. Dissecting the regulatory mechanisms of leaf shape of different forms of mulberry leaves by ATAC‐seq is an important way to protect mulberry germplasm resources and improve mulberry yield. It is conducive to cultivating mulberry varieties with high resistance to adversity, promoting the sustainable development of sericulture, and protecting and improving the ecological environment. Keywords: ATAC‐seq, biochemical, leaf shape, mulberry 1. INTRODUCTION Mulberry is a perennial heterogeneous, pollinated woody plant used in sericulture (Ma et al., [40]2019). Morus alba (scientific name: Morus alba Pendula) is a cultivated species of mulberry in the family Moraceae, which is distributed in tropical, subtropical, and temperate regions and has a wide distribution in China. It can also form mixed forests with other trees, which increases species diversity and ecological stability. Mulberry has a large biomass and carbon storage capacity and has various ecological functions such as soil and water conservation, wind and sand control, and air purification. As a roadside landscape tree, it has the function of beautifying the environment (Jokar et al., [41]2016). Mulberry leaves are composed of a petiole, leaf blade, and stipules, and are the main assimilation organs and harvests of mulberry trees. In nature, the same plant can display both complete and lobed leaves, and because both are derived from the same genetic background, the differences in their phenotypes are regulated by epigenetics (Itabashi et al., [42]2018; Zhao & Zhou, [43]2012). Leaf morphological differences have an important role in variety identification, sericulture production, and natural disease control. Leaf shape is an important agronomic trait in mulberry because the size and shape of mulberry leaves are closely related to economic yield. The leaf shape also reflects differences in energy metabolism, variations in tree habitats, and genetic variation (Cao et al., [44]2019; Ramesha et al., [45]2020; Samami et al., [46]2019). The study on the growth and morphogenesis of mulberry leaves and the mechanisms underlying variation in leaf shape can provide strong support for future mulberry as a model woody plant. The pendant mulberry leaf types, including M. alba Pendula can vary between unlobed and up to 6–7‐lobed. Therefore, M. alba Pendula provides a most suitable research material to study epigenetic mechanisms of leaf shape regulation in mulberry. In recent years, the formation of leaf shape has been studied extensively. Detailed morphological observations of leaf development from meristem cell initials has led to the identification of growth centers and the development of predictive models of polarized growth along proximal‐distal, medial‐lateral, and adaxial‐abaxial axes in simple and complex leaf structures (Diaz, [47]2017). Some of the genes regulating growth along the different axes and their interplay have been reported (Itakura & Hosoi, [48]2018). However, the regulatory networks underlying the formation of different leaf shapes from the same genome, such as in mulberry, has received little attention to date. Therefore, the aim of this work is to identify, study, and utilize key TFs governing the formation of different mulberry leaf morphologies. A greater understanding of the regulatory mechanisms underlying the production of different mulberry leaf morphologies is an important way to conserve mulberry germplasm resources and improve mulberry yield. It is also conducive to breeding mulberry varieties with high resistance to adversity, promoting sustainable development of sericulture and protecting and improving the environment. ATAC‐seq is a recently developed technique for genome‐wide analysis of chromatin accessibility. Compared with earlier ChIP‐seq or DNase‐seq methods for detecting chromatin accessibility, ATAC‐seq is faster and easier to perform, does not require cross‐linking, has a higher signal‐to‐noise ratio, and can be performed on a smaller number of cells. ATAC‐seq takes into account transposase‐digested DNA fragments that contain other nucleosome localization information (Lu et al., [49]2021). ATAC‐seq seq is widely used to measure chromatin accessibility and to identify open chromatin regions (OCRs). OCRs usually indicate active regulatory elements in the genome and is directly related to gene regulatory networks. Identification of differentially accessible regions (DARs) between different biological conditions is essential to determine the differential activity of regulatory elements. Differential analysis of ATAC‐seq data has many similarities to differential expression analysis of RNA‐seq data. However, the distribution of ATAC‐seq signal intensity differs from that of RNA‐seq data, and DAR identification requires higher sensitivity. Many different tools can be used to perform differential analysis of ATAC‐seq data (Gontarz et al., [50]2020). In our study, we have analyzed the motifs associated with leaf shape differences in mulberry through chromatin accessibility assay of and identified the associated TFs through differential expression motif screening, functional enrichment, and annotation, thus providing new information of molecular mechanisms involved in the generation of heteromorphic leaves in mulberry. 2. MATERIALS AND METHODS 2.1. Plant materials M. alba Pendula were obtained from the Nursery of the National Mulberry Germplasm at Jiangsu University of Science and Technology, Zhenjiang, China. Young whole and lobed leaves were sampled in early April. Furthermore, fresh, young, and actively growing tissues were removed from mulberry trees. Dust or soil from the surface of the material were rinsed with sterile water and blot it up. Large tissues were cut into small pieces or thin slices on ice and then placed into 2 ml screw‐cap lyophilization tubes of >400 mg each, labeled with the sample name; quickly submerged in liquid Nitrogen for at least 1 h of snap‐freezing and then stored in a −80°C refrigerator for subsequent experiments. 2.2. Flow chart of experiments Nuclei suspensions were incubated in a Transposition Mix that includes a Transposase. The Transposase enters the nuclei and preferentially fragments the DNA in open regions of the chromatin. Simultaneously, adapter sequences are added to the ends of the DNA fragments. The transposition reaction was incubated at 37°C for 30 min. Immediately following transposition, the products were purified using a QIAGEN mini‐elute kit, amplified as described in (Buenrostro et al., [51]2013), and sequenced using Illumina HiSeqTM 4000 by Gene Denovo Biotechnology Co. (Guangzhou, China). 2.3. Clean reads filtering Reads obtained from the sequencing machines were filtered to remove adapter sequences, reads containing more than 10% of unknown nucleotides (N) and low quality reads containing more than 50% of low quality (Q value ≤ 20) bases. 2.4. Reads alignment Bowtie2 (Langmead & Salzberg, [52]2012) (version 2.2.8) with the parameters”–X 2000″ was used to align the clean reads from each sample against the reference genome, and reads aligned to the mitochondria or chloroplasts were filtered out. For all data files duplicates were removed using Picard. 2.5. Peak scanning All reads aligning to the + strand were offset by +4 bps, and all reads aligning to the – strand were offset −5 bps. The shifted, concordantly aligned paired mates were used for peak calling by MACS (version 2.1.2) (Zhang et al., [53]2008) with the parameters “‐‐nomodel ‐‐shift ‐100 ‐‐extsize 200 ‐B ‐q 0.05.” Dynamic Poisson Distribution was used to calculated The p‐value of the specific region based on the unique mapped reads. The region was defined as a peak (accessible DNA region) when the q‐value <.05. 2.6. Peak‐related genes annotation According to the genomic location information and gene annotation information of the peak, peak‐related genes were confirmed using ChIPseeker (Yu et al., [54]2015) (version v1.16.1). The position of the peak on the gene or inter‐genic regions (such as promoter, 5′UTR, 3′UTR, exon, intron, downstream and intergenic) was recorded. 2.7. Peak‐related genes GO enrichment analysis Gene ontology (GO) is an international standardized gene functional classification system which offers a dynamic‐updated controlled vocabulary and a strictly defined concept to comprehensively describe properties of genes and their products in any organism. GO has three ontologies: molecular function, cellular component and biological process. The basic unit of GO is the GO term. Each GO term belongs to a type of ontology. GO enrichment analysis provides all GO terms that significantly enriched in peak‐related genes comparing with the genome background, and filter the peak‐related genes that correspond to biological functions. All peak‐related genes were mapped to GO terms in the Gene Ontology database ([55]http://www.geneontology.org/) and gene numbers were calculated for every term. Significantly enriched GO terms in peak‐related genes comparing with the genome background were defined by the hypergeometric test. The formula for p‐value calculation was P= [MATH: 1i= 0m1< mfrac>MiNMniNn :MATH] , where N is the number of all GO annotated genes; n is the number of peak‐related genes in N; M is the number of all genes that are annotated to the certain GO terms; m is the number of peak‐related genes in M. The calculated p‐value was FDR corrected, taking FDR ≤ .05 as the threshold. GO terms meeting this condition were defined as significantly enriched GO terms in peak‐related genes. 2.8. Peak‐related genes pathway enrichment analysis KEGG pathway (Kanehisa et al., [56]2008) enrichment analysis was used to identify significantly enriched metabolic pathways or signal transduction pathways in peak‐related genes relative to the whole genome background. The calculating formula is the same as that in GO analysis. Here, N is the number of all transcripts that with KEGG annotation, n is the number of peak‐related genes in N, M is the number of all transcripts annotated to specific pathways, and m is number of peak‐related genes in M. The calculated p‐value was FDR corrected, taking FDR ≤ .05 as a threshold. Pathways meeting this condition were defined as significantly enriched. 2.9. Irreproducible discovery rate (IDR) To measure consistency between biological replicates within an experiment, Irreproducible discovery rate (IDR) (Li et al., [57]2011) software (version 2.2) was used with parameters “‐‐input‐file‐type narrowPeak –plot.” IDR calculations using scripts provided by the ENCODE project ([58]https://www.encodeproject.org/software/idr/) were performed on all pairs of replicates using an oracle peak list called from merged replicates for each condition. Peaks showing an IDR value of .05 or less were considered relevant. 2.10. Motif analysis The MEME suit ([59]http://meme-suite.org/) was used to detect motifs in peak sequences. We used MEME‐ChIP to scan motifs with high reliability through peak regions, and nd used MEME‐AME to confirm the existences of any specific known motifs. 2.11. Differential analysis of multi‐samples The DiffBind (Stark & Brown, [60]2011) software (version 2.8.0) was used to analyze peak differences across the two groups and significant differential peaks with an FDR of <.05 were selected. Using the same method, genes associated with the different peaks were annotated, and enrichment analysis of GO functions and KEGG pathways were identified. 2.11.1. Transcription factors validation (TFs) by qRT‐PCR Total RNA was extracted from fresh leaf samples using the RNAiso Plus reagent (Takara, China), according to the manufacturer's protocol. First, the concentration and purity of total RNA were checked on a Nanodrop spectrophotometer. Then, the extracted total RNA was reverse transcribed using PrimeScript™ RT reagent Kit (Takara, China), following the manufacturer's instructions, to obtain the cDNAs. Three‐step qPCR was performed using the FastStart Universal SYBR Green Master Mix kit (Roche, USA) in ABI Quant Studio 6 Flex System (Applied Biosystems, USA), according to the kit instructions. The qPCR reaction mixture comprised 10 μl FastStart Universal® SYBR Green Master, 1 μl forward primer (10 μM), 1 μl reverse primer (10 μM), 2 μl cDNA, and the addition of ddH[2]O to make a final volume of 20 μl. Amplification was achieved as follows; PCR cycling conditions: 95°C denaturation, 1 min, 35 cycles: denaturation at 95°C, 20 s; annealing at 60°C, 20 s; extension at 72°C, 30 s. 3. RESULTS 3.1. Experimental library construction process and data quality control Cell samples were obtained and the nuclei were extracted, Tn5 transposase was added to the nuclei suspension for transposition reaction, and the DNA fragments were purified after the reaction was completed. The amplification products were then used for PCR amplification and purification to construct sequencing libraries. After the libraries were constructed, the libraries were tested for quality. Libraries that passed the quality check were used for up‐sequencing to obtain sequence information of the open chromatin region to be sequenced. Data filtering was performed on the downstream data to remove low‐quality data. The filtered data was then compared with the reference genome and after confirming the quality of the comparison, the sequences at the unique position on the comparison were extracted and processed for information analysis. The samples used for ATAC sequencing in this study were divided into two groups of whole leaves (CK‐1 and CK‐2) (Figure [61]1a) and lobed leaves (HL‐1 and HL‐2) (Figure [62]1b). The Clean reads obtained from the lower machine data after processing CK‐1, CK‐2, HL‐1, and HL‐2 were 217924734, 179669554, 198654016, and 181546126, respectively, with the percentage of low quality reads ranging from .17% to .34% (Table [63]1). Frequency and proportion plots of reads indicate that HQ_Clean_Reads_Num and Adapter account for a significant proportion (Figure [64]2). From the distribution of bases before and after data filtering, it can be seen that the quality of filtered data is higher (Figure [65]S1). FIGURE 1. FIGURE 1 [66]Open in a new tab Mulberry leaf shape. (a) Whole leaf; (b) lobed leaf TABLE 1. Reads filtered information statistics Sample Clean_Reads_Num HQ_Clean_Reads_Num (%) Read_length Adapter (%) Low_quality (%) PolyA (%) N (%) CK‐1 217924734 173083490 (79.42%) 150 + 150 44368018 (20.36%) 946452 (.22%) 0 (.0%) 0 (.0%) CK‐2 179669554 143046708 (79.62%) 150 + 150 35964666 (2.02%) 1236016 (.34%) 0 (.0%) 80344 (.02%) HL‐1 198654016 168445512 (84.79%) 150 + 150 29878974 (15.04%) 659060 (.17%) 0 (.0%) 0 (.0%) HL‐2 181546126 155636996 (85.73%) 150 + 150 25594048 (14.1%) 630164 (.17%) 0 (.0%) 0 (.0%) [67]Open in a new tab FIGURE 2. FIGURE 2 [68]Open in a new tab Data quality control. (a) Sample filtering analysis frequency graph; (b) sample filtering analysis proportion graph 3.2. Comparison analysis The data were matched to the reference genome using the matching software Bowtie2, and reads that matched to mitochondria or chloroplasts were filtered out (Table [69]S1). Reads matching to a unique position on the genome (unique matched reads) were used for the subsequent information analysis (Table [70]2). The unique matched sequences obtained after alignment were analyzed for their coverage distribution on the reference genome, and the depth information of genomic loci was counted to obtain the statistical results of sequencing depth on the genome ([71]Figure S2). All reads in the 2 kb region upstream and downstream of the transcription start site (TSS) were counted using deepTools (Ramírez et al., [72]2016) software with a window of size 50 bp, and the average depth of reads within each window was calculated. The distribution of reads relative to the transcriptional start site (TSS) was plotted in a fold line and heat map (Figure [73]3). The densities of uniquely matched, de‐duplicated reads were compared with the individual chromosomes (divided into positive and negative strands) on the genome for statistical purposes ([74]Figure S3). Because Tn5 transposase preferentially attacks the chromatin open region, in general, most of the DNA fragments produced are short and contain none or only one nucleosome, whereas there are also long fragments containing multiple nucleosomes, which will show distinct fragment distribution characteristics in terms of content distribution. The insertion fragment distribution of each sample was mapped using ATACseqQC (Ou et al., [75]2018) (Figure [76]4). TABLE 2. Statistical results of each sample compared to the reference genome Sample_ID Total_Reads UnMapped_Reads Mapped_Reads Multi_Mapped_Reads Unique_Mapped_Reads Duplicate_Reads CK‐1 173083490 42058971‐24.30% 131024519‐75.70% 81154630‐46.89% 49869889‐28.81% 94977008‐72.49% CK‐2 143046708 38573646‐26.97% 104473062‐73.03% 58335266‐40.78% 46137796‐32.25% 63862448‐61.13% HL‐1 168445512 41443871‐24.60% 127001641‐75.40% 79120355‐46.97% 47881286‐28.43% 92423246‐72.77% HL‐2 155636996 38035814‐24.44% 117601182‐75.56% 74137809‐47.64% 43463373‐27.93% 85131006‐72.39% [77]Open in a new tab FIGURE 3. FIGURE 3 [78]Open in a new tab Distribution of reads relative to TSS position and signal. (a) Distribution of CK‐1 reads relative to TSS position and signal; (b) distribution of HL‐1 reads relative to TSS position and signal; (c) distribution of CK‐2 reads relative to TSS position and signal; (d) distribution of HL‐2 reads relative to TSS position and signal distribution FIGURE 4. FIGURE 4 [79]Open in a new tab Distribution of insertion fragments. (a) Distribution of CK‐1 insertion fragments; (b) distribution of CK‐2 insertion fragments; (c) distribution of HL‐1 insertion fragments; (d) distribution of HL‐2 insertion fragments 3.3. Single sample peek detection and statistics Sequencing analysis of DNA products after Tn5 transposase digestion was performed to find open chromatin regions from the whole genome. Because Tn5 transposase brings about an adaptor shift of four bases in the positive strand and five bases in the negative strand, the position of the reader needs to be corrected for displacement by shifting the reads position of the paired to the positive strand by +4 and the paired to the negative strand by −5. A genome‐wide peak (open chromatin enriched region) scan was performed using MACS2 analysis software (peak calling) with a threshold q‐value < .05. The peak sequences were then analyzed to identify their associated genes (Table [80]3). The distribution of peaks was counted ([81]Table S2) and their genomic distribution classified as either promoter (within 2 k upstream of the gene), 5′UTR, 3′UTR, Exon, intron, downstream (within 500 bp downstream of the gene) or distal intergenic (beyond 2 k upstream of the gene or beyond 500 bp downstream) ([82]Table S3), and determine the correspondence between the peak and each functional region (Figure [83]5). TABLE 3. Number of peak statistics by sample SampleId PeakNumber TotalLength AverageLength TotalPileup AveragePileup GenomeRatio FRiP CK‐1 31612 12689327 401 1163575 36 4.29% 65.81% CK‐2 39097 16696817 427 1636508 41 5.65% 64.67% HL‐1 29915 11876626 397 1002739 33 4.02% 69.85% HL‐2 28070 10729635 382 773489 27 3.63% 68.87% [84]Open in a new tab FIGURE 5. FIGURE 5 [85]Open in a new tab Distribution of shared peaks in functional elements of genes within the group. (a, b) Pie chart of distribution of shared peaks in functional elements of genes; (c) Proportion of distribution of shared peaks in functional elements of genes; (d) proportion of distribution of shared peaks relative to TSS distance Where the location of peaks was classified as distal intergenic, their regulatory relationship with the nearest neighboring gene could not be established with any confidence. Therefore, peaks categorized as distal intergenic were eliminated from further consideration. The remaining peak‐related genes were used in GO and KO enrichment analyses ([86]Figures S4 and S5). 3.4. Analysis of TFs associated with intra‐group shared peaks TFs are proteins that bind specific DNA regulatory motifs of genes to regulate their gene transcription. TF‐motif analysis was performed using MEME‐chip motif discovery of the MEME Suite ([87]http://meme-suite.org/). The analysis included an enrichment analysis of known TF (present in the database) and predicted TF motifs. The top 400 peaks of the enrichment multiplicity were taken by default (Figure [88]6). Different colors represent different base types, and the height of the letter represents the conservativeness of the base (the higher the letter, the higher its frequency across the locus, the more conserved it is). FIGURE 6. FIGURE 6 [89]Open in a new tab MEME significant motif sequence map Currently, a large number of DNA motifs recuiting TFs have been reported in transcription factor databases (e.g., JASPAR database). Enrichment analyses of these motifs helps us to quickly target the corresponding key TFs, which provides clues for further data parsing and downstream molecular experiments. Here, we have used Analysis of Motif Enrichment (AME) of the MEME Suite to perform enrichment analysis of TF motifs ([90]Table S4). AME uses the JASPAR CORE non‐redundant database as a control for input sequence enrichment. Motifs with a significant p‐value were considered to be an enriched TF motif (Figure [91]7). The top 20 enriched transcription factors were ABF2, ABI5, ARALYDRAFT_484466, ARALYDRAFT_495258, At1g72010, AT4G18890, AT5G08330, bHLH31, Glyma19g2656.1, HY5, MYC3, OJ1058_F05.8, OsI_08196, PIF1, PIF4, PIF7, TCP19, TCP20, TCP23, and TCP7. These enriched transcription factors were verified by qPCR ([92]Table S5), and the experimental results showed consistent trends between the two experimental methods, ATAC‐seq and qPCR (Figure [93]8). FIGURE 7. FIGURE 7 [94]Open in a new tab Motif enrichment bubble diagram FIGURE 8. FIGURE 8 [95]Open in a new tab Transcription factors validation (TFs) by qRT‐PCR. CK: whole leaf; HL: lobed leaf. 3.5. Merging of peaks across treatment groups and multisample clustering DiffBind software was utilized to perform the merging of peaks from each treatment group and to calculate the abundance of each peak in each sample. To facilitate inter‐treatment comparisons, principal component analysis (PCA) was used to reduce, the high dimensional information contained in the samples (thousands of peaks enriched with information) into a composite indicator of several dimensions (principal components), while ensuring that as much information as possible was retained in the original data (Figure [96]9). The Pearson correlation coefficients calculated between every two samples are presented visually in the form of a heat map (Figure [97]10). FIGURE 9. FIGURE 9 [98]Open in a new tab Sample principal component analysis FIGURE 10. FIGURE 10 [99]Open in a new tab Sample correlation heat map 3.6. Analysis of peaks and their associated genes shared or unique in simple and lobed leaves A Venn diagram describes the common and unique peaks present in simple and lobed leaves (Figure [100]11). In particular, the peaks specific to each group were associated with the biological regulation specific to the corresponding group. For the peaks that are common or unique among the comparison groups, we further analyze the peak‐associated genes. The nearest neighboring gene of each peak were initially considered as the peak‐related gene. Considering that when peaks are far away from genes, it is difficult to determine whether there is a regulatory relationship between peaks and genes. Therefore, the peaks classified as distal intergenic were not considered further. The remaining peak‐related genes were subsequently subjected to GO ([101]Figure S8) and KO enrichment analyses ([102]Figure S9). FIGURE 11. FIGURE 11 [103]Open in a new tab CK versus HL intergroup peak Venn diagram 3.7. Analysis of TF motifs exclusively present in simple or lobed leaves For peaks specific to a particular treatment group, TF motif analysis was performed use using the MEME Suite to predict conserved motifs within the peak region (400 peaks were taken by default). MEME‐chip was used to detect new motifs, and the AME package used to detect known motifs. Longer (8–15 bp) and shorter motifs (3–8 bp) were predicted from scratch using MEME software ([104]Figure S10 A&B) and DREME software ([105]Figure S10C,D), respectively. For the enrichment analysis of known TFs‐motifs, the number of grouped TF motifs were 35 and 77 ([106]Table S6). Grouped enrichment statistics were performed ([107]Table S7) and grouped motif enrichment bubble plots were obtained ([108]Figure S11). 3.8. Quantitative analyses of the differential availability of peaks and associated genes between simple and lobed leaves The qualitative analysis of peak and associated genes presented above indicated some TF binding sites that were differentially employed in simple and lobed leaves. However, quantitative differences in the availability of TF binding sites can also reveal potential differential regulations of genes between these two morphologies. We therefore used DiffBind software to identify the differential availability of peaks between simple and lobed leaf samples, where peak motifs with differences in abundance of log2 values ≥1 and a FDR ≤.05 were selected and their peak‐associated genes were identified as described above to provide an indication of the number and types of genes showing differential susceptibility to regulation by their corresponding peak motif in the two morphological leaf types. A statistical plot of genes displaying such significantly differential susceptibilities is presented in (Figure [109]12). Genes with differential susceptibilities for regulation by differentially available peak motifs were for subsequently subjected to GO (Figure [110]13) and KO enrichment analyses (Figure [111]14). FIGURE 12. FIGURE 12 [112]Open in a new tab (a) Difference peak statistics plot; (b) CK versus HL difference comparison volcano plot. Up: difference up‐regulated peak; down: difference down‐regulated peak FIGURE 13. FIGURE 13 [113]Open in a new tab GO enrichment analysis. (a) GO enrichment circle graph. First circle: enrichment of the top 20 GO terms; outside the circle is the coordinate scale of gene numbers. Different colors represent different ontologies. Second circle: the number of GO terms in background genes and the Q value. Bar lengths reflect the number of genes and their Q values are mapped to red intensity, where smaller Q values are more intensely red. Third circle: the number of genes differing in the GO term. Fourth circle: the RichFactor value of each GO term (the number of differences in the GO term divided by the number of all). In the background grid, each cell represents a RichFactor unit of .1). (b) GO enrichment classification bar graph; (c) GO enrichment bubble graph, GO enrichment bubble graph of the top 20 GO terms with the smallest Q values. The vertical coordinate is GO term, the horizontal coordinate is the enrichment factor (the number of variances in that GO term divided by all the numbers), and the bubble size indicates how many and the darker the color the smaller the Q value). (d) GO enrichment bar graph (constructed using the first 20 GO term with the smallest Q values): The vertical coordinate is the GO term, and the horizontal coordinate is the number of that GO term as a percentage of the number of all variances. The darker the color, the smaller the Q value, and the values depicted on the bars are the number of GO term and the Q value. Different colors represent different A classes. Second circle: the number of pathways in the background genes and the Q value. The more genes the longer the bar, the smaller the Q value the redder the color. Third circle: the number of genes differing in the pathway. Fourth circle: the RichFactor value of each pathway (the number of differences in the pathway divided by the number of all), background grid lines, each cell represents .1 RichFactor unit). The vertical coordinate is the pathway, the horizontal coordinate is the enrichment factor (the number of differences in the pathway divided by the number of all), the size indicates the number of how much, and the more red the color the smaller the Q value. The darker the color the smaller the Q value, and the value on the bar is the number of that pathway and the Q value. FIGURE 14. FIGURE 14 [114]Open in a new tab KO enrichment analysis. (a) KO enrichment circle; (b) KO enrichment bubble; (c) KO enrichment bar graph 3.9. Enrichment analysis of exposed TF‐motifs in simple and lobed leaves TF motif analysis for intergroup differential peaks assisted the identification of TFs involved in high and low variation in genomic openness between simple and lobed leaves. The MEME Suite was used as described above. Up‐regulated peak motif de novo prediction, using MEME software, from scratch to predict longer motif (8–15 bp) ([115]Figure S12 A) down‐regulate the motif denovo prediction of peak. The following is the long motif (8–15 bp) predicted from scratch using MEME software ([116]Figure S12 B), using Dreme software, and the longer motif predicted from scratch Short motif (3–8 bp) ([117]Figure S12 C). The total number of motifs with an up‐regulated number of open chromatin sites is 6, whereas the number of down‐regulated sites is 24 ([118]Figure S13). 4. DISCUSSION The mulberry tree is a very important economic tree species, which is widely planted in the world. Its various organs have different economic or medicinal values, especially mulberry leaves, which are an important food source for silkworms (Ruth et al., [119]2019). In this study, we have utilized ATAC‐seq to analyze differences in chromatin accessibility between lobed and simple leaves of mulberry in order to identify regulatory mechanisms underlying the development of heteromorphic leaf shapes in M. alba individuals. ATAC‐seq, which identifies open chromatin regions, identifies nucleosome bound and unbound sites enriched in regulatory regions including TFs. The bacterial transposase, Tn5, binds to the open regions of chromatin and cleaves it, whereas the helically wound portion is unaffected. Tn5 has been widely used in library construction for second‐generation sequencing and with the unique “tagging” function of the dimeric Tn5, it can cut double‐stranded DNA and ligate the resulting DNA ends to specific ligands. The cleaved and interrupted DNA is collected together for library construction and sequencing to characterize the remodeling of chromatin accessibility that occurs between biological conditions (Gontarz et al., [120]2020). Before ATAC‐Seq technology was developed, there were several commonly used methods to study open chromatin, such as DNase‐seq, MNase‐seq, and FAIRE‐seq. DNase‐seq utilizes a restriction endonuclease (DNase I) to fragment the sample. In chromatin dense regions, DNA strands are well protected making it impossible for endonucleases to approach these regions so that only DNA in open regions are cut. Similarly, within the open regions of chromatin, DNA wound around nucleosomes is protected so that only DNA sequences between nucleosomes can be cut by DNase I (Zhong et al., [121]2016). MNase‐seq is similar to DNase‐seq in that it uses a restriction exonuclease to excises all unprotected regions, leaving only the DNA sequence wound on the nucleosome (Hu et al., [122]2017). Formaldehyde‐assisted isolation of regulatory elements sequencing (FAIRE‐seq) is a genome‐wide method for identifying chromatin accessibility and detecting DNA sequences associated with regulatory activity (Bianco et al., [123]2015). These techniques have obvious shortcomings, such as the very large number of cells required, the tedious experimental procedure, and the less coverage of data obtained by sequencing compared to ATAC‐Seq. In comparison, ATAC‐Seq starts with a low cell volume, takes less time and obtains a larger range of open regions. ATAC‐seq can capture the entire open area, but the sequencing results cannot match the results of DNase‐seq and MNase‐seq (Martins et al., [124]2018). In addition, the preparation of the library building is more tedious and the reagents are more expensive. More TCP family transcription factor peaks are indicated in the results of this investigation. Numerous research on the regulation of leaf shape development by transcription factors of the TCP family has been reported. TCP family genes encode a set of plant‐specific transcription factors involved in multiple aspects of plant development (Barkoulas et al., [125]2007; Koyama et al., [126]2010; Lan & Qin, [127]2020).TCP affects leaf size and shape in many herbaceous plants. However, whether these functions are retained in woody plants remains unknown. Genome‐wide identification of TCP genes in Populus and Populus trichocarpa identified 33 and 36 genes encoding putative TCP proteins, respectively, and phylogenetic analysis of Populus TCP together with Arabidopsis TCP showed a biased expansion of the TCP gene family through segmental duplication. In addition, the results showed correlations between different expression patterns of several poplar TCP genes and leaf shape changes, suggesting their involvement in regulating leaf shape development (Ma et al., [128]2016). In this study, as shown in Figure [129]7, there were more TCP families, and most were enriched in the CK group, suggesting the possibility of TCP may be decreased in the HL group, resulting in the change of leaf shape (from whole leaves to lobed leaves). Also, the TCP family analyzed from the down‐regulated peaks in the HL group showed similar patterns ([130]Figure S13B). This further supports the possibility that the expression of the TCP transcription factors family is reduced in cleaved leaves. The results showed that the TCP family is an important transcription factor that leads to different leaf shapes. ATAC‐Seq technology itself will be used in new applications with the continuing development of other technologies, such as sciatica‐Seq which, combines single‐cell sequencing technology with it. Using ATAC‐Seq technology, we can solve a variety of problems, such as studying the pathogenesis of major diseases, finding differences in chromatin open regions of diseases; studying differences in binding regulation of TFs; identifying novel enhancers in development; and performing nucleosome localization (Ji et al., [131]2020; Martins et al., [132]2018; Satpathy et al., [133]2018). In addition, this technique can be used in combination with other techniques, such as ChIP‐seq, which can be used to validate potential binding regions of TFs predicted by ATAC‐Seq (Lhoumaud et al., [134]2019; Newell et al., [135]2021). Alternatively, it can be combined with transcriptome sequencing to combine differentially expressed genes with Peak results to see the corresponding expression of transcripts and to study the upstream regulatory sequences, all of which are hot topics of current research (Ranzoni et al., [136]2021; Xu et al., [137]2022). 5. CONCLUSION In this study, we used ATAC‐seq to mine the TFs that lead to leaf shape differences, detected and counted single samples and intra‐group peaks respectively by bioinformatics analysis and performed TF motif analysis to provide insight the TFs related to heteromorphic leaf occurrence. This study has laid the foundation for an in‐depth study to understand the mechanisms underlying leaf shape changes in mulberry. AUTHOR CONTRIBUTIONS Weiguo Zhao, Xuzhu Gao, and Dayong Xu conceived this study. Lei Wang, Yuming Feng, and Jiangying Wang performed the experiments. Lei Wang and Michael Ackah wrote the manuscript. Xin Jin and Qiaonan Zhang revised the original manuscript. All authors read and approved the final version of the manuscript. CONFLICT OF INTEREST The Authors did not report any conflict of interest. Supporting information Table S1 Statistical results of mitochondria and chloroplasts for each sample comparison Table S2 Statistics of the number of peaks in each group Table S3 Distribution statistics of peaks on functional elements of genes Table S4 Sample TF motif number statistics Table S5 qRT‐PCR primer sequences Table S6 Number of grouped TF motif statistics. Subgroup CK‐vs‐HL.CK motif enrichment statistics Figure S1: Base distribution before and after filtering, A‐D based distribution before filtering, E‐H base distribution after filtering Figure S2: Cumulative analysis of genome sequencing depth. A. CK‐1 sequencing depth, B. CK‐2 sequencing depth, C. HL‐1 sequencing depth, D. HL‐2 sequencing depth Figure S3: Distribution of reads on chromosomes. A. Distribution of CK‐1 reads on chromosomes, B. Distribution of HL‐1 reads on chromosomes, C. Distribution of CK‐2 reads on chromosomes, D. Distribution of HL‐2 reads on chromosomes Figure S4: HL GO enrichment analysis. A. GO enrichment circle graph, B. GO enrichment bubble graph, C. GO enrichment bar graph, D. GO enrichment classification bar graph Figure S5: CK GO enrichment analysis. A. GO enrichment circle graph, B. GO enrichment bubble graph, C. GO enrichment bar graph, D. GO enrichment classification bar graph Figure S6: HL KO enrichment analysis. A. KO enrichment circle graph, B. KO enrichment bubble graph, C. KO enrichment bar graph Figure S7: CK KO enrichment analysis. A. KO enrichment circles, B. KO enrichment bubbles, C. KO enrichment bars Figure S8: GO enrichment analysis. A. GO enrichment circle graph, B. GO enrichment classification bar graph, C. GO enrichment bubble graph, D. GO enrichment bar graph Figure S9: KO enrichment analysis. A. KO enrichment circle, B. KO enrichment bubble, C. KO enrichment bar graph Figure S10: A/B MEME significant motif sequence, C/D. Dreme significant motif sequence Figure S11: Grouped motif‐enriched bubble plots Figure S12: TF‐motif denovo prediction. A. Up‐regulate peeks motif denovo prediction, B. Down‐regulate peak's motif denovo prediction, C. Down‐regulate peaks motif denovo prediction (3‐8 bp). Figure S13: Comparison group motif enrichment bubble plot. [138]Click here for additional data file.^ (6.2MB, docx) Wang, L. , Feng, Y. , Wang, J. , Jin, X. , Zhang, Q. , Ackah, M. , Wang, Y. , Xu, D. , & Zhao, W. (2022). ATAC‐seq exposes differences in chromatin accessibility leading to distinct leaf shapes in mulberry. Plant Direct, 6(12), e464. 10.1002/pld3.464 Funding information This work was supported by the Jiangsu Postgraduate Research and Practice Innovation Program in 2022 (grant no. KYCX22_3855), the China Agriculture Research System of MOF and MARA, National Key R&D Program of China, the Key Projects of International Scientific and Technological Innovation Cooperation (grant no. 2021YFE0111100), the Guangxi Innovation‐driven Development Project (grant no. AA19182012‐2), and the Zhenjiang Science and Technology Support Project (grant no. GJ2021015). Contributor Information Dayong Xu, Email: xudayong3030@sina.com. Weiguo Zhao, Email: wgzsri@126.com. REFERENCES