Abstract

   Wool fiber diameter (WFD) is the most important economic trait of wool.
   However, the genes specifically controlling WFD remain elusive. In this
   study, the expression profiles of skin from two groups of Gansu Alpine
   merino sheep with different WFD (a super-fine wool group [FD = 18.0 ±
   0.5 μm, n= 3] and a fine wool group [FD=23.0±0.5μm, n=3]) were analyzed
   using next-generation sequencing–based digital gene expression
   profiling. A total of 40 significant differentially expressed genes
   (DEGs) were detected, including 9 up-regulated genes and 31
   down-regulated genes. Further expression profile analysis of natural
   antisense transcripts (NATs) showed that more than 30% of the genes
   presented in sheep skin expression profiles had NATs. A total of 7 NATs
   with significant differential expression were detected, and all were
   down-regulated. Among of 40 DEGs, 3 DEGs (AQP8, Bos d2, and SPRR) had
   significant NATs which were all significantly down-regulated in the
   super-fine wool group. In total of DEGs and NATs were summarized as 3
   main GO categories and 38 subcategories. Among the molecular functions,
   cellular components and biological processes categories, binding, cell
   part and metabolic process were the most dominant subcategories,
   respectively. However, no significant enrichment of GO terms was found
   (corrected P-value >0.05). The pathways that were significantly
   enriched with significant DEGs and NATs were mainly the lipoic acid
   metabolism, bile secretion, salivary secretion and ribosome and
   phenylalanine metabolism pathways (P < 0.05). The results indicated
   that expression of NATs and gene transcripts were correlated,
   suggesting a role in gene regulation. The discovery of these DEGs and
   NATs could facilitate enhanced selection for super-fine wool sheep
   through gene-assisted selection or targeted gene manipulation in the
   future.

Introduction

   Fine wool sheep, also called Merino, is a world famous sheep breed that
   is known to produce high-quality fine wool. Fine wool sheep are
   distributed primarily in Australia, China, New Zealand, South Africa,
   Uruguay, Argentina and other countries[[39]1]. Wool fiber diameter(WFD)
   is the most important economic trait of merino sheep and determines 75%
   of the value of wool fibers. The WFD variation-induced profit accounted
   for 61% of the total profits of wool [[40]2, [41]3]. Wool is formed
   from keratinocytes derived from a progenitor population at the base of
   the hair follicle(HF)[[42]4]. The morphogenesis and growth of HF in
   sheep has been extensively studied since the 1950’s and the
   developmental processes at the cellular level are reasonably well
   understood [[43]5–[44]7]. Dermal papilla (DP) cells are a population of
   mesenchymal cells at the base of the HF[[45]8], and provide signals
   that contribute to specifying the size, shape and pigmentation of the
   wool[[46]9]. It is well known that WFD is significantly associated with
   size of DP and matrix in mammals [[47]4, [48]6, [49]10–[50]17], and is
   largely specified post-initiation, during the period of HFs growth and
   morphogenesis[[51]18]. To illustrate the molecular mechanisms of
   controlling WFD, the expression profiles of different stage of fetal
   and adult sheep skin have also been generated by sequencing of
   expressed sequence tags (ESTs) and cDNA microarray [[52]19, [53]20].
   However, the size of DP and matrix in mammals is also markedly
   influenced by genetic[[54]10, [55]11, [56]21], physiological[[57]13],
   nutrition[[58]22], hormones[[59]12] during the anagen phase of the hair
   cycle. Up to now, there are no studies on the molecular mechanisms of
   controlling WFD during the anagen phase, and the genes specifically
   controlling WFD remain elusive [[60]23].

   Knowledge of the genes controlling development of DP and matrix come
   from studies of the morphogenesis and cycle of HF of mice and
   human[[61]24–[62]28], It involves a series of signaling between the
   matrix and the dermal papill, such as Wnt/beta-catenin, EDA/EDAR/NF-κB,
   Noggin/Lef-1, Ctgf/Ccn2, Shh, BMP-2/4/7, Dkk1/Dkk4 and EGF[[63]4,
   [64]29]. The mutation, epigenetic modification and post-translational
   modification of any ligand or receptors in these pathways maybe affect
   WFD [[65]30, [66]31]. Therefore, in addition to the current efforts on
   protein-encoding genes, the attention should also be paid to a novel
   regulatory factor, non-coding RNA (ncRNA), such as micro RNA, long
   non-coding RNA and natural antisense transcripts (NATs)[[67]32]. Among
   these ncRNAs, NATs are not only large in quantity but also play
   important roles in gene expression regulation in organisms [[68]33].
   NATs refer to a class of non-coding RNAs that are produced inside
   organisms under natural conditions and are expressed in many species
   [[69]34–[70]36]. NATs play an important role in transcript regulation
   at the mRNA and/or protein levels and in regulating various
   physiological and pathological processes, such as organ formation, cell
   differentiation and disease [[71]37, [72]38]. However, the roles of
   NATs in controlling WFD have not been described.

   The next-generation sequencing (NGS)-based digital gene expression
   (DGE) profiling technologies developed in recent years constitute a
   revolutionary change in traditional transcriptome technology. Compared
   with ESTs and cDNA microarray, the strength of DGE profiling is that it
   is an "open system" and has better capability to discover and search
   for new information, providing a new way to identify novel genes and
   NATs that specifically control WFD [[73]39]. However, the success or
   effectiveness of the search is largely dependent on the completeness
   and stitching quality of the genome sequences of the species studied
   [[74]40]. It is exciting that the International Sheep Genomics
   Consortium (ISGC) has achieved initial success in assembling the
   reference sheep genome. The total length of the assembled genome, Sheep
   Genome v3.1, has reached 2.64 Gb, with only 6.9% gaps [[75]41].
   Needless to say, the release of a high-quality sheep genome reference
   sequence provides important resources for using NGS to study the skin
   transcriptome of sheep with different WFD and find genes and NATs that
   specifically control WFD [[76]42, [77]43].

   In this study, NGS-based DGE profiling was conducted to analyze the
   differentially expressed genes (DEGs) and NATs in the skin of Gansu
   Alpine fine wool sheep with different WFD. A total of 47 significant
   DEGs and NATs were detected, including 9 up-regulated genes, 31
   down-regulated genes and 7 down-regulated NATs. These DEGs and NATs may
   be useful in further study on molecular markers of controlling WFD in
   fine wool sheep.

Results

Sequencing and assembly

   NGS was performed on 6 individuals of the super fine wool group (sample
   nos.65505,65530,65540;n = 3) and the fine wool group (sample
   nos.5Y127,5Y212,5Y339;n = 3), and raw data greater than 5 Mb were
   obtained for all individuals ([78]Table 1), submitted to the NCBI
   BioProject database, with the accession number PRJNA274817. There was a
   3' adaptor sequence in the raw data, along with small amounts of
   low-quality sequences and various impurities. Impurity data were
   removed from the raw data, and clean tags greater than 4.7 Mb were
   obtained. The distinct tag number of each individual was greater than
   0.11 Mb ([79]Table 1).

Table 1. Summary of tag numbers based on the DGE data from Gansu Alpine fine
wool sheep skin with different WFD.

   Sample ID             Raw Data                        Clean Tag
             Total number Distinct Tag number Total number Distinct Tag number
   5Y127     5259561      355014              4917435      119886
   5Y212     5364832      350472              5033144      117880
   5Y339     5145452      336695              4857460      120525
   65505     5244848      314710              4930422      110879
   65530     5485701      343795              5184375      117642
   65540     5282704      373209              4941669      143800
   [80]Open in a new tab

   The sheep genome reference sequence database includes 19,346 gene
   sequences, including 18,940 genes with "CATG" loci and accounting for
   97.90% of the total number of genes. The total number of reference tags
   in the reference tag database is 173,207, including 169,843 unambiguous
   reference tags, accounting for 98.06% of the total number of tags. All
   clean tags were compared with reference genes and reference genomes,
   and the results indicated that 87.45%, 87.32%, 88.63%, 88.97%, 88.72%
   and 87.38% of the total numbers of clean tags of the 6 individuals from
   two groups (sample nos. 5Y127, 5Y212, 5Y339, 65505, 65530,and 65540,
   the same hereinafter), respectively, could be matched to the reference
   tags; of these, 49.05%, 50.26%, 51.86%, 52.41%, 51.48% and 50.37% of
   the clean tags could be uniquely located in the reference sequences
   (sense and antisense), and the ratios of distinct tag were 36.00%,
   36.63%, 37.64%, 36.61%, 37.33% and 33.22% ([81]Table 2), respectively;
   the numbers of unambiguous tag-mapped genes were 10,391, 10,209,
   10,666, 10,434, 10,137, 10,298,10763 and 9,965, respectively.
   ([82]Table 2) These uniquely located sequences indicated that the key
   genes that regulate the WFD may exist in genes expressed in the
   individuals with different WFD.

Table 2. Summary of unambiguous tag mapping to gene and unambiguous
tag-mapped genes (sense & anti-sense).

   Sample ID Unambiguous Tag Mapping to Gene Unambiguous Tag-mapped Genes
   Total number Total % of clean tag Distinct Tag number Distinct Tag % of
   clean tag number % of ref genes
   5Y127 2412224 49.05% 43158 36.00% 10391 53.71%
   5Y212 2529818 50.26% 43175 36.63% 10209 52.77%
   5Y339 2519260 51.86% 45369 37.64% 10666 55.13%
   65505 2583990 52.41% 40595 36.61% 10434 53.93%
   65530 2668978 51.48% 43910 37.33% 10137 52.40%
   65540 2489004 50.37% 47775 33.22% 10298 53.23%
   [83]Open in a new tab

   There were a large number of possible unknown tags that are in the gene
   expression profile libraries of these six individuals. Accounting for
   12.55% (617,198), 12.68% (638,292), 11.37% (552,279), 11.03% (543,900),
   11.28% (585,017) and 12.62% (623,705) of the total number of clean tags
   were unannotated, respectively; the proportions of distinct tag number
   of unknown tag were 18.77% (22,503),18.79% (22,152),16.17%
   (19,493),18.02% (19,975),17.25% (20,292) and 20.43% (29,382),
   respectively. These results suggested that there were many unknown
   genes in sheep skin tissue that may also play important roles in the
   regulation of HF development and wool growth.

Analysis of the expression profiling of NATs

   Sense-antisense regulation is an important method for controlling gene
   expression. If the clean tags can be matched to the antisense strand of
   the gene, then it suggests that there are also transcripts for the
   antisense strand of this gene and that this gene may be subjected to
   sense-antisense regulation [[84]44]. In the present study, we found
   that in the gene expression profiling libraries of the 6 individuals of
   the two groups, the percentages of genes with antisense transcripts
   were, respectively, 5.83%, 6.01%, 5.91%, 5.84%, 6.00% and 6.20% of the
   total numbers of the genes in the library, including 247,816 (5.04%),
   262,353 (5.21%), 249,595 (5.14%), 247,940 (5.03%), 266,926 (5.15%) and
   265,599 (5.37%) tags that could be exactly matched, respectively
   ([85]Table 3); the number of tag species accounted for, respectively,
   13.31%, 13.31%, 13.46%, 13.20%, 13.65% and 11.94% of the total numbers
   of tag species in the expression profile libraries of the 6 individuals
   from the 2 groups, including 15,192 (12.67%), 14,966 (12.70%), 15,483
   (12.85%), 13,957 (12.59%), 15,289 (13.00%) and 16,346 (11.37%) tag
   species that could be exactly matched, respectively([86]Table 3); there
   were 6472 (33.45%), 6398 (33.07%), 6585 (34.04%), 6227 (32.19%), 6346
   (32.80%) and 6758 (34.93%) genes with NATs in the gene expression
   profiles of the 6 individuals of the 2 groups ([87]Table 3). The
   sense-antisense transcript ratios of all 6 expression profile libraries
   exhibited similar trends. The ratios between the tag number, distinct
   tag number and tag-mapped genes of sense and antisense transcripts were
   10:1, 2:1 and 5:3, respectively; the spearman r between the sense and
   antisense transcripts expression were 0.533 ~ 0.557.

Table 3. Summary of unambiguous tag mapping to anti-sense genes and
unambiguous tag-mapped anti-sense genes.

   Sample ID Unambiguous tag mapping to anti-sense genes Unambiguous
   tag-mapped anti-sense genes
   Total number Total % of clean tag Distinct Tag number Distinct Tag % of
   clean tag number % of ref genes
   5Y127 247816 5.04% 15192 12.67% 6472 33.45%
   5Y212 262353 5.21% 14966 12.70% 6398 33.07%
   5Y339 249595 5.14% 15483 12.85% 6585 34.04%
   65505 247940 5.03% 13957 12.59% 6227 32.19%
   65530 266926 5.15% 15289 13.00% 6346 32.80%
   65540 265599 5.37% 16346 11.37% 6758 34.93%
   [88]Open in a new tab

Detection of DEGs, NATs and validation

   The clean tags of all six individuals were aligned with the reference
   tag database, allowing a maximum of one base mismatch. Unambiguous tags
   were annotated, and the number of raw clean tags that corresponded to
   the same gene was counted and then standardized to obtain the
   standardized expression level of each gene in the skin transcriptome of
   the 6 individuals. Noiseq software was used to select DEGs, NATs that
   exhibited different expression levels between S and F group. However,
   there were only 47 significant DEGs and NATs (log2Ratio (S/F)≥1,
   q-value≥0.8).9 genes were up-regulated, 31 genes and all 7 NATs were
   down-regulated ([89]Table 4, [90]Fig 1). Among the down-regulated
   genes, NT5C3L (Gene ID 101109197) showed the greatest expression
   difference (log2Ratio(S/F) = –10.59,q value = 0.89); among the
   up-regulated genes, CCNA2 (Gene ID 100144758) showed the greatest
   expression difference (log2Ratio(S/F) = 6.04, q value = 0.83). 38
   significant expression genes had NATs, and only 2 had no found
   antisense transcripts: CCNA2 (Gene ID: 100144758) and
   prolactin-inducible protein homolog (Gene ID: LOC101114011).3
   significant DEGs (AQP8, Gene ID: 101108013;Bos d2,Gene ID:101116281,
   and SPRR,Gene ID: 443313) had significant NATs. All 3 of these genes
   and their NATs were significantly down-regulated in the super-fine wool
   group ([91]Table 4).10 DEGs, NATs were used to validate selected
   differentially expressed transcripts identified from DGE profiling by
   Real-time PCR. The results from the real-time PCR confirmed the
   expression pattern of DGEs and NATs at two different groups in Gansu
   Alpine fine wool sheep.

Table 4. Summary of DEGs and NATs between two groups.

   Kinds of Transcriptes GeneID log2Ratio(S/F) Probability Symbol
   DEGs 101109197 -10.5912106 0.886583566 NT5C3L
   101116281 -7.40151954 0.969520529 LOC101116281
   101120858 -4.57602606 0.868500902 LOC101120858
   443313 -4.45466762 0.848327046 Small proline-rich
   101109718 -4.16402747 0.835055492 CA4
   101108013 -4.13509115 0.838991854 AQP8
   101118004 -3.99924015 0.86634137 LOC101118004
   101115395 -3.90149163 0.829820677 PGLYRP1
   101114011 -3.8813167 0.855789733 LOC101114011
   101107368 -3.8266707 0.827278443 SLC25A35
   101104026 -3.72227757 0.855311355 S100A8
   101115563 -3.4495271 0.829328632 ABCC11
   101109387 -3.38125687 0.87068777 GLYATL2
   101106121 -3.3527916 0.870715106 LOC101106121
   101115336 -3.29504885 0.839237877 DNASE1L2
   101116409 -3.16407151 0.86568531 LOC101116409
   101113168 -3.1562348 0.826690722 LOC101113168
   100137068 -2.97890634 0.857115521 LOC100137068
   101116799 -2.80442661 0.859794434 LOC101116799
   [92]NM_001009395.1 -2.67922661 0.847588978 -
   101102714 -2.61084761 0.822795364 LOC101102714
   101102540 -2.3882578 0.861489257 RPL39
   101102697 -2.14152458 0.81083593 DNAJC12
   101108654 -2.10679611 0.847930676 LOC101108654
   101116537 -1.89220493 0.803933629 LOC101116537
   101121216 -1.86897593 0.842094473 LOC101121216
   101112716 -1.85095925 0.834194413 KRT7
   101121307 -1.5666126 0.800489312 ACSM3
   101111121 -1.46170294 0.809141108 CD82
   101114256 -1.40678444 0.802238806 ACTG2
   101109430 -1.32738037 0.80115904 KRT1
   101105583 1.377957974 0.802293478 GSDMA
   101110063 1.491887466 0.816426111 HSPA2
   100135694 1.587840025 0.830968782 RPS27A
   443218 1.679295263 0.826485703 FOS
   101113964 1.819875496 0.838185446 PDCD6IP
   101119862 2.058276167 0.826212345 LIPK
   101120443 2.31282643 0.827415122 LOC101120443
   101105188 4.991770667 0.839825597 DAP
   100144758 6.03994716 0.834235416 CCNA2
   NATs 101116281 -9.15903036 0.941608145 Bos d2
   443313 -7.587465008 0.819808271 SPRR
   101108013 -7.807354922 0.844108795 AQP8
   101120353 -5.685396543 0.88975588 major allergen I polypeptide chain
   2-like
   101120550 -6.896227669 0.975067811 SMC1A
   101113086 -9.556506055 0.957344034 primary amine oxidase
   101113693 -8.661778098 0.914297923 ABP
   [93]Open in a new tab

Fig 1. The numbers of DEGs between two groups.

   [94]Fig 1
   [95]Open in a new tab

   Between two groups, there were 9 upregulated genes and 31 downregulated
   genes and 7 NATs.

   Gene ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) and
   other databases were used for the functional analysis and signaling
   pathway annotations of these DGEs and NATs. Among all of the
   significant DEGs and NATs, 44 genes were annotated, and 3 genes located
   in the genome could not be annotated effectively. In total of 47 DEGs
   and NATs were summarized as three main GO categories and 38
   subcategories ([96]Fig 2). Among the molecular functions category, the
   top three were involved in binding and catalytic activity. Regarding
   cellular components, cell part, cell, organelle, membrane, organelle
   parts were the dominant groups. Within biological processes category,
   metabolic process and cellular process were the most dominant group.
   However, no significant enrichment of GO terms was found (corrected
   P-value >0.05). The pathways that were significantly enriched with
   significant DEGs and NATs were mainly the lipoic acid metabolism, bile
   secretion, salivary secretion and ribosome and phenylalanine metabolism
   pathways (P < 0.05).

Fig 2. GO functional analysis of DEGs and NATs.

   Fig 2
   [97]Open in a new tab

   The results were summarized in three main categories: biological
   process, cellular component and molecular function. Among these groups,
   the terms binding, cell part and metabolic process were dominant in
   each of the three main categories, respectively.

Discussion

   Wool is produced via synthetic processes by HFs, which are embedded in
   the skin of sheep [[98]45]. There are two types of HFs, named primary
   HF and secondary HF, which are different in appearance and function.
   Secondary HFs of the fine wool sheep are the main hair follicle and are
   critical determinants of mean fiber diameter and other wool
   characteristics. WFD is highly correlated with the size of DP of HF
   [[99]17], whose origin can be traced to the dermal condensate, one of
   the earliest features of the developing HF[[100]19]. The charactering
   of the molecular controls of HF initiation, morphogenesis, branching
   and growth can facilitate enhanced selection for new sheep breeds with
   lower WFD [[101]20]. In order to understand the molecular mechanisms of
   controlling WFD, Adelson et al. (2004) constructed three cDNA libraries
   from fetal and adult sheep skin, obtained 2,345 noredundant EST
   sequences and identified 61 ESTs expressed in the adult HF, which
   constituted a high priority candidate gene subset for further work
   aimed at identifying genes useful as selection markers or as targets
   for genetic engineering[[102]19]. Norris et al. (2005) found 50 up- and
   82 down-regulated genes with increased fetal and inguinal expression
   relative to adult by compared skin gene expression profiles between
   fetal day 82, day 105, day 120 and adult HF anagen stage using a
   combined ovine–bovine skin cDNA microarray[[103]20]. However, comparing
   to sequencing of expressed sequence tags (ESTs) and cDNA microarray,
   DGE profiling has many unique advantages [[104]46, [105]47]. It is
   highly accurate and has a very low detection limit, giving it a very
   wide range of applications[[106]48],such as detection of new
   transcripts [[107]49],functional research of non-coding RNA [[108]50,
   [109]51]. In this study, six DGE profiling were conducted to analyze
   the DEGs and NATs in the skin of Gansu Alpine fine wool sheep with
   different WFD at the same age, gender, and nutrition level during the
   anagen. More than 4.7 Mb of clean tags were obtained for each sheep,
   with more than 0.11 Mb of distinct tags. A total of 47 significant DEGs
   and NATs were detected, including 9 up-regulated genes, 31
   down-regulated genes and 7 down-regulated NATs.

   The HF comprises several concentric epithelial structures[[110]29].
   Wool fiber is enveloped by two epithelial sheaths, known as the inner
   root sheath (IRS) and the outer root sheath (ORS). At the distal end of
   the HF, IRS cells undergo apoptosis and liberate the wool fiber
   [[111]29, [112]52]. ORS cells contain proliferating cells derived from
   stem cells in the bulge that feed into the matrix compartment of the
   bulb[[113]53]. The matrix is a proliferative zone located at the
   proximal end of the HF surrounding the DP [[114]52]. During the anagen
   phase, the DP cells are thought to direct the matrix cells to
   proliferate and differentiate into the multiple cell types that form
   the wool fiber and its channel, the IRS [[115]9]. During postnatal
   life, hair follicles show patterns of cyclic activity with periods of
   active growth and hair production (anagen), apoptosis-driven involution
   (catagen), and relative resting (telogen) [[116]53–[117]55]. However,
   Merino HFs are predominantly in anagen throughout growth, different to
   many animals such as the mouse, rabbit, and guinea-pig [[118]30]. It is
   also well known that WFD is significantly associated with sizes of DP
   and matrix in mammals [[119]4, [120]6, [121]10–[122]17],which are
   markedly influenced by genetic[[123]10, [124]11, [125]21],
   physiological[[126]13],nutrition[[127]22], hormones[[128]12] during the
   anagen phase of the hair cycle. However, the genes specifically
   controlling the size of DP and matrix remain elusive. In the present
   study, the up-regulated genes in the super fine wool group included 5
   genes associated with the cell cycle and apoptosis: GSDMA3 [[129]56],
   HSPA2 [[130]57], RPS27A [[131]58], PDCD6IP [[132]59], DAP[[133]60],
   CCNA2 [[134]61] and FOS[[135]62]; the down-regulated genes included 7
   genes associated with promoting follicle cell proliferation and
   differentiation: KRT1 [[136]63, [137]64], KRT7 [[138]65, [139]66],
   HSD11B1 [[140]67, [141]68], S100A8 [[142]41, [143]69, [144]70], NT5C3
   L[[145]71] and DNAJC12 [[146]72]. These genes may through promoting HF
   cell apoptosis, inhibiting follicle cell proliferation and
   differentiation, thereby reduce the WFD.

   White adiposities, dermal fibroblasts, smooth muscle and endothelial
   cells of the vasculature, neurons and resident immune cells also
   surround the HFs in the skin. Recent studies show that the growth of
   wool fiber is not only affected by the proliferation and
   differentiation of follicular cells but is also affected by other cells
   around the follicle [[147]73, [148]74]. The layer of intradermal
   adipocytes expands as the HF enters its anagen stage and then thins
   during telogen [[149]75, [150]76]. FATP4^-/-, Dgat1^-/-, or Dgat2^-/-
   Early B cell factor 1 (Ebf1^-/-) mice have decreased intradermal
   adipose tissue due to defects in lipid accumulation in mature
   adipocytes [[151]77–[152]79]. Interestingly, these mice also display
   abnormalities in skin structure and function, such as hair loss and
   epidermal hyperplasia. In the present study, in the super fine wool
   group, it was also determined that the genes related to fat synthesis
   (ABCC11[[153]80], GLYATL2 [[154]81], and ACSM3 [[155]82]) were
   significantly down-regulated and that LIPK, which is involved in
   lipolysis, was significantly up-regulated [[156]83].

   Genes with NATs are very common in animal and plant genomes, accounting
   for 7% to 9% of transcripts in plants and 5% to 30% of transcripts in
   animals, except for nematodes [[157]84]. The present study indicated
   that more than 30% of the genes had NATs, which is consistent with the
   findings of the above mentioned studies. In the present study,
   correlation analysis was conducted on the sense and antisense
   transcripts in the DGEs of skin, and the results showed that the
   correlation coefficients between the sense and antisense transcripts
   were between 0.533 and 0.557, indicating that the WFD maybe were
   regulated by both sense and antisense transcripts. NATs regulate sense
   transcripts through positive or negative feedback by forming a
   sense-antisense bidirectional structure [[158]85]. In this study, 3 out
   of 40 significantly differentially expressed genes had significantly
   different NATs: AQP8, Bos d2, and SPRR (type II small proline-rich
   protein). AQP8 belongs to the aquaporin subfamily of the AQP family
   [[159]86]; it is located mainly in a variety of tissues and on the
   surfaces of acinar cells in various glands and plays important roles in
   gland secretion and the transportation of water, urea, ammonia and
   hydrogen peroxide [[160]87–[161]90]. Bos d 2 belongs to the lipocalin
   family of proteins[[162]91]. In the skin sections, Bos d2 was found in
   the secretory cells of apocrine sweat glands and the basement membranes
   of the epithelium and HF[[163]92]. It is assumed that Bos d2 is a
   pheromone carrier.[[164]92]. The SPRR proteins comprise a subclass of
   specific cornified envelope precursors encoded by a multigene family
   clustered within the epidermal differentiation complex region
   [[165]93]. Two SPRR1, seven SPRR2, one SPRR3 and one SPRR4 genes are
   located within a 300-kb area of the EDC[[166]94, [167]95] and are
   expressed in the epidermis, HFs and capillaries [[168]96, [169]97].
   Several studies have suggested that the SPRRs are related to increased
   epithelial proliferation and to malignant processes[[170]98]. Other
   than functioning as structural proteins, SPRRs also regulate gene
   expression levels, inhibit cell proliferation and promote
   differentiation [[171]99]. The present study also found that AQP8, Bos
   d2, SPRR and their antisense transcripts were significantly
   down-regulated in the super fine wool group, suggesting that AQP8, Bos
   d2 and SPRR and their antisense transcripts were regulated by positive
   feedback. However, the mechanism by which AQP8, Bos d2 and SPRR
   regulate the WFD needs to be further studied.

Materials and Methods

Sheep skin sampling

   Gansu Alpine fine wool sheep were bred in the Huang Cheng District of
   Gansu Province, China, by cross breeding Mongolian or Tibetan sheep
   with Xinjiang Fine Wool sheep and then with some fine wool sheep breeds
   from the Union of Soviet Socialist Republics, such as Caucasian sheep
   and Salsk sheep. The breed was approved by the Gansu provincial
   government in 1980. Gansu Alpine fine wool sheep were obtained from a
   sheep stud farm located in Zhangye city, Gansu Province. All
   experimental and surgical procedures were approved by the Institutional
   Animal Care and Use Committee, Lanzhou Institute of Husbandry and
   Pharmaceutical Sciences, Peoples Republic of China. Six unrelated 3
   years old ewes at different WFD, and also as different DP size were
   selected and divided into super fine wool group (S)(WFD = 18.0±0.5μm;
   Diameter of secondary DP size = 3.2±0.2μm) and fine wool group (F)(WFD
   = 23.0±0.5μm; Diameter of secondary DP size = 4.1±0.2μm) ([172]Fig 3).
   A piece of midside skin (2 mm in diameter) was collected via punch skin
   biopsy under local anesthesia using 1% procaine hydrochloride
   immediately placed in liquid nitrogen and stored at -80°C for
   subsequent analysis.

Fig 3. The size of secondary DP cells between two groups.

   Fig 3
   [173]Open in a new tab

   The diameter of super fine wool sheep secondary DP cells = 3.2±0.2μm
   (A), the diameter of fine wool sheep secondary DP cells = 4.1±0.2μm
   (B).

Total RNA extraction, library construction and deep sequencing

   Total RNA was isolated from the tissues using the RNeasy Maxi Kit
   (Qiagen, Hilden, GER) according to the manufacturer instructions. RNA
   quality was verified using a 2100 Bioanalyzer RNA Nanochip (Agilent,
   Santa Clara, CA, USA), and the RNA Integrity Number (RIN) value was
   >8.5. Then, the RNA was quantified using the Nano Drop ND-2000
   Spectrophotometer (Nano-Drop, Wilmington, DE, USA). Sequence tags were
   prepared using the Illumina Digital Gene Expression Tag Profiling Kit,
   according to the manufacturer’s protocol. Sequencing libraries were
   prepared from 1μg of total RNA using reagents from the NlaIII Digital
   Gene Expression Tag Profiling kit (Illumina Inc., San Diego, CA, USA).
   mRNA was captured on magnetic oligo(dT) beads and reverse transcribed
   into double-stranded cDNA (SuperScript II, Invitrogen, Carlsbad, CA,
   USA). The cDNA was cleaved using the restriction enzyme NlaIII. An
   adapter sequence containing the recognition sequence for the
   restriction enzyme MmeI was ligated to the NlaIII cleavage sites. The
   adapter-ligated cDNA was digested with MmeI to release the cDNA from
   the magnetic bead, while leaving 17 bp of sequence in the fragment. The
   fragments were dephosphorylated and purified by phenol–chloroform. A
   second adapter was ligated at the MmeI cleavage sites. Adapter-ligated
   cDNA fragments were amplified by PCR, and after 15 cycles of linear PCR
   amplification, 95 bp fragments were purified by 6% TBE PAGE gel
   electrophoresis, denatured and the single-chain molecules were fixed
   onto the Illumina Sequencing Chip. A single-molecule cluster sequencing
   template was created through in situ amplification. Nucleotides labeled
   with different colors were used to perform sequencing by the sequencing
   by synthesis method; each tunnel can generate millions of raw reads
   with a sequencing length of 35 bp.

Determination of gene expression levels and detection of DEGs and NATs

   Sequencing-received raw image data were transformed by base culling
   into sequence data, which was called raw data. Raw sequences were
   transformed into clean tags after removed all low quality tags such as
   short tags (< 21 nt), empty reads, and singletons (tags that occurred
   only once). Clean tags were classified according to their copy number
   (as a percentage of the total number of clean tags) and the saturation
   of the library was analyzed. All clean tags were mapped to sheep
   reference sequences(version:Oarv3.1) by SOAP2(version:2.21) with
   default parameters, and allowing one mismatches. To monitor mapping
   events on both strands, both sense and complementary antisense
   sequences were included [[174]100].

   A preprocessed database of all possible 17 base-long sequences of
   Oarv3.1 located next to the NlaIII restriction site was created as the
   reference tags, and only one mismatch was allowed. Following the common
   practice, only clean reads that can be uniquely mapped back to the
   reference tags were considered ("-r 0" option), and those mapped more
   than one location were discarded. Remainder clean tags were designed as
   unambiguous clean tags. When multiple types of remainder clean tags
   were aligned to the different positions of the same gene, the gene
   expression levels were represented by the summation of all. The number
   of unambiguous clean tags for each gene was calculated and then
   normalized to TPM (number of transcripts per million clean tags) =
   clean tag number corresponding to each gene number of total clean
   labels in the sample × 1,000,000 [[175]101, [176]102].

   All TPM of 3 samples each group were integrated, and NOISeq (version:
   2.8.0) [[177]103]with default parameters, which is a novel
   nonparametric approach for the identification of DEGs and NATs, was
   used to detect the differentially expressed transcripts between super
   fine wool group (S) and fine wool group (F). To measure expression
   level changes between two groups, NOISeq takes into consideration two
   statistics: M(the log[2]-ratio of the two conditions) and D(the
   absolute value of the difference between conditions). The probability
   thresholds were P ≥ 0.8 and the TMM value in the lower expressed sample
   was ≥1. The higher the probability, the greater the change in
   expression between two groups. Using a probability threshold of 0.8
   means that the gene is 4 times more likely to be differentially
   expressed than non-differentially expressed[[178]104].

Strand-specific real-time quantitative RT-PCR

   To confirm the differentially expressed sense and antisense transcripts
   between super fine wool group and fine wool group, ten genes were
   randomly selected to verify their expression levels of gene and NATs
   transcripts in skin by strand-specific qRT-PCR according to the
   protocol described in Haddad et al. (2007)[[179]105]. Primers for
   real-time PCR were designed with Primer Express 3.0 (Applied
   Biosystems) ([180]Table 5). GAPDH was used as a reference control. Real
   time PCR was performed using SYBR Green master mix (TianGen) on the
   CFX96 Real-Time System (BioRAD, USA). The reaction was performed using
   the following conditions: denaturation at 95°C for 3 min, followed by
   40 cycles of amplification (95°C for 30s, 60°C for 30s, and 72°C for
   30s). Relative expression was calculated using the delta-delta-Ct
   method.

Table 5. Relevant information of gene and primer sequences for
strand-specific RT-PCR.

   Genes name Primer sequences (5′→3′) GenBank accession No Produce size
   (bp)
   Bos d2 ACAGTAGTGGCACAGAGAGC [181]XM_004021893.1 136
   TTGTGGTCTTCTCGCCATCA
   SPRR CAAAAATGCCCTCCTGTGCC [182]NM_001009773.1 91
   CCTGACCAGATAAAAGCTGATGC
   AQP8 TCATCCTGACGACACTGCTG [183]XM_004020851.1 146
   ATTCATGCACGCTCCAGACA
   KRT1 CCCTGGATGTGGAGATTGCC [184]XM_004006320.1 119
   ACTGATGCTGGTGTGACTGG
   KRT7 ACATCAAGCTGGCTCTGGAC [185]XM_004006333.1 108
   CCACAGAGATGTTCACGGCT
   S100A8 GCTGACGGATCTGGAGAGTG [186]XM_004002523.1 163
   GAACCAAGTGTCCGCATCCT
   GAPDH GCTGAGTACGTGGTGGAGTC [187]NM_001190390.1 136
   GGTTCACGCCCATCACAAAC
   AQP8-NAT CCTTGGGAAATCCTTGCAGC 137
   GAAAAGCCCGTTGCAGACTC
   Bos d2-NAT TTGGCGGAAGTATGTCGCAG 112
   CAGCTCAGCCAGAATCGTCA
   SPRR-NAT CCATGACCCGCTTTGAAGA 132
   CAATGCCAGCAGAAGTGCC
   [188]Open in a new tab

GO and KEGG enrichment analysis of differentially expressed transcripts

   Gene ontology (GO), an international standardized gene functional
   classification system, offers a dynamic-updated controlled vocabulary
   and strictly defined concept to comprehensively describe the properties
   of genes and their products in any organism[[189]106]. Kyoto
   Encyclopedia of Genes and Genomes (KEGG) database is a database
   resource for understanding functions and utilities of the biological
   system, such as the cell, the organism and the ecosystem from molecular
   information, especially for large-scale molecular datasets generated by
   genome sequencing and other high throughput experimental technologies
   ([190]http://www.genome.jp/kegg/) [[191]107]. all DEGs and NATs were
   mapped to GO-terms in GO database, looking for significantly enriched
   GO terms in DEGs comparing to the genome background GOseq R
   package(version:1.18.0)[[192]108],while significantly enriched
   metabolic pathways or signal transduction pathways in DEGs and NATs
   were identified via pathway enrichment analysis using KEGG, public
   pathway-related database, and comparing with the whole genome
   background by KOBAS(version: 2.0)[[193]109].

   In all tests, P-values were calculated using Benjamini-corrected
   modified Fisher’s exact test and ≤0.05 was taken as a threshold of
   significance. The calculating formula is:
   [MATH: <mi mathvariant="normal">P</mi><mspace
   width="1pt"></mspace><mo>=</mo><mspace
   width="1pt"></mspace><mn>1</mn><mo>-</mo><mrow><msubsup><mo
   stretchy="true">∑</mo><mrow><mtext>i</mtext><mspace
   width="1pt"></mspace><mo>=</mo><mspace
   width="1pt"></mspace><mn>0</mn></mrow><mrow><mtext>m</mtext><mo>-</mo><
   mn>1</mn></mrow></msubsup><mrow><mfrac><mrow><mfenced
   separators="|"><mrow><mtable><mtr><mtd><mrow><maligngroup></maligngroup
   ><mtext>M</mtext></mrow></mtd></mtr><mtr><mtd><mrow><maligngroup></mali
   gngroup><mtext>i</mtext></mrow></mtd></mtr></mtable></mrow></mfenced><m
   fenced
   separators="|"><mrow><mtable><mtr><mtd><mrow><maligngroup></maligngroup
   ><mtext>N</mtext><mo>-</mo><mtext>M</mtext></mrow></mtd></mtr><mtr><mtd
   ><mrow><maligngroup></maligngroup><mtext>n</mtext><mo>-</mo><mtext>i</m
   text></mrow></mtd></mtr></mtable></mrow></mfenced></mrow><mrow><mfenced
   separators="|"><mrow><mtable><mtr><mtd><mrow><maligngroup></maligngroup
   ><mtext>N</mtext></mrow></mtd></mtr><mtr><mtd><mrow><maligngroup></mali
   gngroup><mtext>n</mtext></mrow></mtd></mtr></mtable></mrow></mfenced></
   mrow></mfrac></mrow></mrow> :MATH]

   Where N is the number of all genes with GO or KEGG annotation; n is the
   number of DEGs or NATs in N; M is the number of all genes that are
   annotated to the certain GO terms or specific pathways; m is the number
   of DEGs or NATs in M.

Supporting Information

   S1 Table. Summary of tags mapping to gene & genome and unknown tag.

   (DOCX)
   [194]Click here for additional data file.^ (17.3KB, docx)
   S2 Table. Summary of tags mapping to anti-sense genes and tag-mapped
   anti-sense genes.

   (DOCX)
   [195]Click here for additional data file.^ (15KB, docx)

Acknowledgments