Abstract Background Transcription factor (TF) proteins are a key component of the gene regulatory networks that control cellular fates and function. TFs bind DNA regulatory elements in a sequence-specific manner and modulate target gene expression through combinatorial interactions with each other, cofactors, and chromatin-modifying proteins. Large-scale studies over the last two decades have helped shed light on the complex network of TFs that regulate development in Drosophila melanogaster. Results Here, we present a detailed characterization of expression of all known and predicted Drosophila TFs in two well-established embryonic cell lines, Kc167 and S2 cells. Using deep coverage RNA sequencing approaches we investigate the transcriptional profile of all 707 TF coding genes in both cell types. Only 103 TFs have no detectable expression in either cell line and 493 TFs have a read count of 5 or greater in at least one of the cell lines. The 493 TFs belong to 54 different DNA-binding domain families, with significant enrichment of those in the zf-C2H2 family. We identified 123 differentially expressed genes, with 57 expressed at significantly higher levels in Kc167 cells than S2 cells, and 66 expressed at significantly lower levels in Kc167 cells than S2 cells. Network mapping reveals that many of these TFs are crucial components of regulatory networks involved in cell proliferation, cell–cell signaling pathways, and eye development. Conclusions We produced a reference TF coding gene expression dataset in the extensively studied Drosophila Kc167 and S2 embryonic cell lines, and gained insight into the TF regulatory networks that control the activity of these cells. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-024-10241-1. Keywords: Drosophila, Transcription factor, Embryo, Cell lines, Kc, S2 Background Cellular identities and functions depend on differential gene expression, which occurs primarily at the transcriptional level and is controlled by complex regulatory networks of transcription factors (TFs). Specifically, TF proteins bind DNA by interacting with specific sequences in regulatory elements to activate or repress transcription. Modulated through protein interactions and signaling pathways, TFs control the spatial and temporal transcriptional programs that ultimately specify cell fates and coordinate tissue and organ formation during development. Genomic technologies have enabled detailed annotation of regulatory networks in multiple biological contexts and have increased our understanding of regulatory connections. Several valuable large-scale studies over the last 20 years have helped elucidate the complex network of TFs that regulate development in Drosophila melanogaster [[29]1–[30]8]. Further comprehension of the control of gene expression in Drosophila requires the integration of a systematic analysis of the expression profile of TFs in defined cellular systems. Two of the most widely studied cell lines from Drosophila are Kc167 [[31]9] and Schneider 2 [[32]10] cells. Both of these cell lines were isolated from embryos and have been extensively used in wide-ranging studies of biological processes [[33]11]. Kc167 (Kc) cells were derived from embryos at stage 13–15 (dorsal closure) [[34]12] [[35]13], while Schneider 2 (S2) cells originate from embryos at stage 16–17 (late embryonic) [[36]10]. Although both cell lines display evidence of a hematopoietic origin, their respective global gene expression profiles are distinct [[37]14, [38]15] and the patterns of TF coding gene expression in each cell type remains poorly characterized. In this current work, we expand on these earlier studies by utilizing genome-wide RNA sequencing approaches to systematically characterize expression for all known or predicted Drosophila TFs in Kc and S2 cells. Deep read coverage enables us to compare the transcription profile for all 707 annotated TFs in the two cell lines in detail. The results shed light on some key shared features and differences between the two embryonic cell types and contribute to our understanding of the transcriptional landscape in the cell lines. Network analysis uncovers some important components of the regulatory environment in the cells and opens up the possibility of using these cell lines to further investigate critical TFs involved in the molecular control of gene regulatory networks in embryonic development. Methods Cell culture and RNA isolation The Kc167 (Kc, RRID: CVCL_Z833) and S2-DRSC (S2, RRID: CVCL_Z992) cell lines used in this study were obtained from the Drosophila Genomics Resource Center (DGRC). Cells were thawed, passaged and frozen as previously described [[39]15]. Cells were harvested at ~ 5 × 10^6 cells/mL density from six replicate samples grown in 25cm^2 canted neck culture flasks (Corning) and RNA isolated as previously described using a RNeasy kit following the manufacturer’s protocol (Qiagen) [[40]15]. RNA sequencing and read mapping Library construction and sequencing were performed at the Beijing Genomics Institute. Briefly, 10ug of total RNA was enriched for poly(A)^+ RNA by oligo(dT) selection and used to generate a cDNA library for sequencing, as previously described [[41]15]. The libraries were sequenced on the Illumina nanoball (DNBSEQ) PE100 platform. Sequencing data was filtered to remove reads that contained adaptor sequences, reads whose N content was greater than 5%, and low-quality reads (quality score less than 15 for 20% or greater of the total bases in the given read). The generated clean read fastq files were aligned using Bowtie2 software to the Drosophila melanogaster genome (Release 6 plus ISO1 mitochondrial, RefSeq accession: GCF_000001215.4) and used to calculate quantitative RPKM, FPKM and TPM scores as previously described [[42]15]. Transcription factor gene expression analysis A detailed computational pipeline enhanced with manual curation was employed by the Berkeley Drosophila Genome Project to identify a comprehensive list of 708 genes that encode a putative DNA-binding domain (DBD) in the Drosophila melanogaster genome [[43]2]. Upon analysis of the complete list of 708 genes we discovered one duplication (gene symbols mamo and CG11071). We therefore removed CG11071 and the total number of transcription factor (TF) genes considered in our study was 707 (Table S[44]1). Of these 707 genes, 604 demonstrated detectable expression in either Kc or S2 cells, and 493 had read counts of 5 or greater in at least one of the cell lines (Table S[45]2). Classification of the individual TFs by DBD was performed using the GO TF level 2 annotation term in the phyper R package ([46]https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Hypergeom etric.html), in combination with the qvalue Bioconductor package ([47]https://bioconductor.org/packages/release/bioc/html/qvalue.html). GO enrichment analysis was initially performed in the phyper R package to calculate p-values. A q-value for each family of DBD TFs was obtained by multiple testing correction of the p-value, with a final q-value <  = 0.05 considered a significant enrichment. A TF enrichment ratio (Rich Ratio) for each DBD family was calculated by dividing the number of candidate genes identified with a specific DBD term, with the total number of genes in the genome annotated with the same DBD term. Differentially expressed genes were defined as genes with a False Discovery Rate (FDR) equal to or less than 0.001 and fold change equal to or greater than 2. The R package pheatmap was used to perform hierarchical clustering analysis on the sets of differentially expressed genes, as previously described [[48]15]. Dr. Tom ([49]http://biosys.bgi.com) was used to evaluate the protein–protein interaction (PPI) network of the TF encoding genes. All the nodes in the PPI network were TF mRNAs. KEGG pathway enrichment analysis [[50]16] was performed using the same methodology as the GO functional enrichment analysis described above. Pathways with a final q-value <  = 0.05 are defined as significantly enriched in differentially expressed genes. The KEGG pathways of the selected genes are ranked by the number of genes in the pathway and only the top 10 pathways with the largest number of genes are displayed. Data availability The datasets supporting the results of this article are available at the NCBI Sequence Read Archive (SRA) under BioProject accession number PRJNA937779. Results and discussion Transcription factor expression landscape In order to consider the expression of a complete set of DNA-binding transcription factors (TFs) in the Drosophila genome, we compiled a list containing 707 genes (Table S[51]1). The list was based on detailed prior studies that utilized a combination of computational prediction from a DNA-binding domain (DBD) database [[52]17] enhanced with manual annotation to identify all putative TF coding genes in the genome [[53]2] (see Methods for details). As previously reported, we identified a total of 73 distinct DBD families represented in this comprehensive list of 707 genes [[54]2]. Gene expression of the 707 TF genes in both cell lines is exponentially distributed, varying from undetectable to 3,932 transcripts per million (TPM) in Kc cells and undetectable to 892 TPM in S2 cells, with a majority of genes expressed at the lower end of this range (Fig. [55]1a and Table S[56]1). The median TPM expression value (log[2] scale) in Kc cells is 3.671 and in S2 cells is 3.676 (Fig. [57]1b). Expression was detectable in at least one of the two cell line transcriptomes for 604 genes, which equals 85.43% of the 707 total annotated TFs in the genome. A group of 493 of the 707 genes (69.73%) had a read count of 5 or greater in at least one of the cell lines (Table S[58]2), with median TPM expression values in Kc cells of 4.155 and in S2 cells of 4.265 for this group (Fig. [59]1c). Fig. 1. Fig. 1 [60]Open in a new tab Expression profile of all 707 Drosophila transcription factor genes. A Heatmap indicates log[2] expression level for all 707 annotated TF genes in the Drosophila genome in Kc and S2 cells. The expression level color key is shown to the right. B Violin plot indicating maximum, upper quartile, median, lower quartile and minimum TPM expression (log[2] scale) for all 707 TF genes in Kc (dark blue) and S2 (orange) cells. The median value in Kc cells is 3.671 and in S2 cells is 3.676. C Violin plot indicating maximum, upper quartile, median, lower quartile and minimum TPM expression (log[2] scale) for the 493 TF genes with a read counts of 5 or greater in at a least one cell type (as described in methods) in Kc (dark blue) and S2 (orange) cells. The median value in Kc cells is 4.155 and in S2 cells is 4.265 Comparison to existing datasets The overall expression profiles for the TF genes in both cell lines are largely consistent with earlier modENCODE studies [[61]14]. Analysis across 25 different Drosophila cell lines, including Kc and S2, revealed that 228 of the 711 identified TFs with characterized DNA-binding domains were not expressed at all in any of the cell lines examined [[62]14]. For the remaining 483 TFs, there was a wide range in levels of expression across the different cell types. We analyzed the expression of the 27 TFs shown to have the greatest variation in expression across the 25 cell lines analyzed in the modENCODE studies (Table S[63]3). Seventeen of the 27 demonstrate at least a two-fold change in expression between the Kc and S2 cell types in our study (see Methods for full details), with many mirroring the reported differences between these cell lines in the earlier study [[64]14]. Two TFs (sug and ham) have very high expression in both cell lines and two TFs (noc and HLH4C) have relatively low expression in both cell lines, in agreement with the prior studies (Table S[65]3). In contrast, we found that the remaining six TFs (twi, ac, bi, Dr, HGTX and hbn) have no detectable expression in either cell type, representing a difference between our dataset and the modENCODE results. We also analyzed the expression of 16 TF genes previously shown to have ubiquitous and high level expression in all 25 different Drosophila cell lines [[66]14]. Reassuringly, all 16 of these genes demonstrated very high levels of expression in both Kc and S2 cells in our study, with no significant differences in the levels of expression between the two cell types (Table S[67]4). As these TFs are expressed uniformly in all Drosophila cell lines, we propose that they may represent a signature for immortalization and would be the core set of candidate TFs to study the regulation of Drosophila cell replicative proliferation in future studies. 493 expressed TFs classified by DBD family If we consider just the 493 TF coding genes with a read count of 5 or greater in at least one of the cell lines (Table S[68]2), the majority are expressed at relatively low levels in both cell types (Fig. [69]2a) and consequently do not demonstrate a significant fold change in expression between the two cell types (Fig. [70]2b). However, a number of genes do show distinct expression profiles when compared between Kc and S2 cells (Fig. S[71]1). The distribution of the 493 TFs by their annotated DBD is shown in Fig. [72]3a. Of the 73 DBD families present in the comprehensive list of 707 TFs, 54 (73.97%) different families are represented in the group of 493 TFs. Of those, 13 are present in five or more TFs, with the zinc finger zf-C2H2 family as the largest group with 179 members, followed by the bHLH (30 TFs) and homeobox (23 TFs) groups (Fig. [73]3a). Twenty-four distinct DBD families are present in only one TF each. Fig. 2. [74]Fig. 2 [75]Open in a new tab Expression profile of the 493 transcription factor genes with read counts of 5 or greater in Kc and/or S2 cells. A Heatmap indicates log[2]expression level for the 493 TF genes in Kc and S2 cells. The expression level color key is shown to the right. B Heatmap indicates log[2] fold change for the 493 TF genes in Kc and S2 cells. The fold change ratio (log[2] Kc/S2) color key is shown to the right. Genes indicated in red are more highly expressed in Kc cells than S2 cells. Genes indicated in blue are more highly expressed in S2 cells than Kc cells. Fig. 3. [76]Fig. 3 [77]Open in a new tab Classification of the 493 transcription factor genes by DNA-binding domain. A Histogram of the 493 TF genes annotated DNA-binding domain (DBD) sorted by frequency. B Enrichment analysis of the TF DBD families. The color key for the calculated q-value for each class is shown to the right and the size of the data point for each DBD family is representative of the total number of genes in that class. C Boxblot indicating maximum, upper quartile, median, lower quartile and minimum TPM expression (log[2] scale) in Kc and S2 cells for TF genes in the 18 most frequent DBD families. Outlying data points are indicated with individual dots. D Boxblot indicating maximum, upper quartile, median, lower quartile and minimum fold change (log[2] scale) between Kc and S2 cells for TF genes in the 18 most frequent DBD families. Positive values represent genes more highly expressed in Kc cells than S2 cells, while negative values represent genes more highly expressed in S2 cells than Kc cells. Outlying data points are indicated with individual dots Analysis of the enrichment in the 493 TFs (see Methods for details) reveals 20 different DBD families with at least 2 different TFs and a Rich Ratio greater than 0.5 (Fig. [78]3b). Amongst those, only the zf-C2H2 family, with 179 of the 224 total TF genes in the genome, is significant (0.799 Rich Ratio, q-value = 8.88 × 10^–14). Eleven of the DBD families demonstrate a Rich Ratio of 1, indicating that every member of that particular DBD family in the genome is present on the list of 493 TFs. However, none report a significant q-value score (Fig. [79]3b), likely due to the very low number of total annotated genes in these families (ranging from 2 to 7). If we consider the overall expression level in both cell types of the 493 TFs classified by DBD family, the median TPM expression values (log[2] scale) range from 0.43 to 6.61 (Fig. [80]3c). Notably, the members of the homeobox family have the lowest median expression level amongst the 18 most frequent DBD families, although a few outlying homeobox TF genes are expressed at higher levels (Fig. [81]3c). When the difference in expression between Kc and S2 cells is analyzed for the 18 most frequent DBD families, the median fold change (log[2] scale) values all fall in the narrow range of -1.62 to 1.39 (Fig. [82]3d). However, within almost all 18 different DBD families there is extensive variation from the minimum to maximum fold change value and six different DBD families (zf-C2H2, bHLH, HMG, bZIP, ZBTB and zf-BED) contain individual TF genes with outlying fold change values (Fig. [83]3d). Differentially expressed TF genes In order to further characterize the expression profile differences amongst the 493 TFs in Kc and S2 cells, we identified 123 differentially expressed genes (DEGs) with at least a two-fold change in expression between the two cell types (see Methods for full details). Of these DEGs, 57 are expressed at significantly higher levels in Kc cells than S2 cells (Kc up, Table [84]1) and 66 are expressed at significantly lower levels in Kc cells than S2 cells (Kc down, Table [85]2). Organizing the DEGs in expression heatmaps along with their annotated DBD, reveals that the 57 Kc up TFs are contained in 13 different DBD families (Fig. S[86]2a), while the 66 Kc down TFs are in 19 different DBD families (Fig. S[87]2b). A heatmap of the fold change in expression between the two cell types demonstrates that many, but certainly not all, of the TF genes show a relatively modest fold change (Fig. [88]4). Taken together with the overall expression profile for all 707 TFs, this indicates that the transcriptional landscape for the TF coding genes in the genome may be shared to a certain degree between the two cell lines. Table 1. Upregulated DEGs ranked by log[2] fold change Gene Symbol Gene ID KC TPM S2 TPM KC Read Count S2 Read Count log2 (KC / S2) nerfin-2 41235 7.99 0 269 0 9.29 bcd 40830 4.84 0 168 0 8.57 dysf 43174 176.21 0.54 8689 26 8.32 croc 40374 5.63 0.03 181 1 7.78 Sox15 36575 16.51 0.12 835 6 7.16 zfh2 43795 1.77 0.01 253 2 7.11 toy 43833 21.78 0.19 773 6 6.83 CG17801 42144 6.59 0.06 107 1 6.69 CG8301 41160 83.34 0.81 3559 35 6.65 Blimp-1 38638 1.16 0.02 71 1 6.49 CG4328 39405 0.78 0 21 0 5.93 cad 35341 1.36 0.03 46 1 5.74 CG43689 31353 0.34 0 32 0 4.75 CG10654 39428 1.18 0.06 26 1 4.52 fd19B 33010 2.8 0.15 36 2 4.19 bab2 44254 8.01 0.44 574 32 4.16 CG31875 318996 15.58 0.97 211 13 3.98 SoxN 44275 0.77 0.05 43 3 3.91 l(3)neo38 41423 5.54 0.37 327 17 3.90 cnc 42743 152.08 13.13 10551 1051 3.51 vis 36372 8.77 0.87 213.59 22.28 3.31 CG6175 39271 15.98 1.65 870 86 3.24 CG14441 31611 1.1 0.15 91 12 2.84 klu 39228 2.85 0.41 186 24 2.80 CG6765 38971 6.3 0.99 295.9 46.86 2.64 pdm3 35813 1.45 0.25 104 16 2.50 Dlip3 53579 9.4 1.66 186 33 2.48 ci 43767 1.69 0.3 116 22 2.46 HmgD 37481 3932.31 805.31 51211 10794 2.26 CG13204 36227 15.55 3.75 396 93 2.02 cyc 40162 58.44 15.56 1283 344 1.88 cbt 33224 267.85 72.71 7564 2049 1.85 hng2 41056 41.25 11.82 662 191 1.78 CG11398 31070 4.35 1.3 90 40 1.72 Irbp18 39243 91.21 27.43 682 204 1.71 Hmg-2 37407 47.21 15.15 851 275 1.61 Usf 31384 21.31 7.16 909 309 1.55 Ets21C 33229 39.01 13.51 1423 521 1.50 topi 41199 13.77 4.82 502 177 1.49 CG6808 41394 21.09 7.46 410 146 1.47 CkIIalpha-i1 39721 12.3 4.5 263 97 1.42 Pdp1 45588 70.72 26.36 4358 1625 1.40 foxo 41709 24.84 9.28 1276 486 1.39 CG6163 39274 9.17 3.48 204 79 1.37 Eip78C 40345 11.53 4.43 594 237 1.35 p53 2768677 46.45 18.29 937 383 1.32 Atf3 43867 21.86 8.9 2136 878 1.27 jigr1 43093 39.95 16.48 893 370 1.25 sima 43580 65.16 27.2 4843 2118 1.23 CG10431 35157 1.53 0.7 61 26 1.09 CG8765 40147 35.41 16.43 1506 704 1.08 Coop 35677 26.85 12.47 539 251 1.08 brk 31665 25 11.74 1103 522 1.06 cwo 44669 52.42 24.78 3354 1632 1.05 Myc 31310 72.31 34.36 6538 3448 1.05 CG11456 40309 10.65 5.08 215 103 1.04 Elba2 33442 6.32 3.07 127 62 1.02 [89]Open in a new tab The 57 genes expressed at significantly higher levels in Kc cells when compared to S2 cells (Kc up) are listed with the read count, TPM values and log[2] expression ratio for the two cell types Table 2. Downregulated DEGs ranked by log[2] fold change Gene Symbol Gene ID KC TPM S2 TPM KC Read Count S2 Read Count log2 (KC / S2) tsh 35430 0.17 62.22 11 4129 -8.57 hng3 38090 0 2.76 0 44 -7.78 lz 31883 0.02 5.14 1 240 -7.68 pnr 44849 1.41 221.13 55 8652 -7.32 Sox102F 43844 0.08 11.03 3 506 -7.19 fuss 43835 0.06 4.84 2 194 -6.59 ey 43812 0.05 2.8 2 120 -5.80 Sox100B 45039 0.04 1.46 1 41 -5.27 bowl 33602 0.95 34.25 45 1639 -5.20 CG13894 38086 0.03 1.16 1 35 -4.94 net 45339 0.13 3.23 4 82 -4.55 dl 35047 6.44 87.07 253 3524 -3.78 Hr51 36702 0.04 0.49 1 14 -3.70 CG14050 31218 0.16 2.09 3 39 -3.67 CG31612 35427 0.8 9.55 38 448 -3.59 dve 37546 0.03 0.29 2 18 -3.52 dati 43789 0.11 1.27 5 58 -3.49 CG7963 41001 0.21 2.26 3 32 -3.40 gt 31227 0.29 2.84 7 69 -3.30 E(spl)m3-HLH 43156 0.52 3.87 9 68 -2.95 scro 3355151 0.99 6.81 26 170 -2.80 br 44505 9.3 52.6 649 4276 -2.53 CG4496 34000 0.69 3.79 22 116 -2.48 Smr 32225 13.02 68.81 2522 13390 -2.43 CG12071 43660 0.47 2.05 17 75 -2.14 apt 37734 9.14 36.4 330 1427 -2.02 E(spl)mbeta-HLH 43152 5.46 21.64 70 279 -2.01 Dif 35045 13.55 52.51 623 2311 -1.98 cic 53560 24.53 92.6 2829 10180 -1.94 Mondo 35402 25.88 97.62 1402 5326 -1.94 peb 31391 0.37 1.37 39 139 -1.91 Eip93F 44936 0.82 2.88 130 472 -1.84 usp 31165 19.12 62.07 665.53 2163.49 -1.73 Glut4EF 41217 5.13 16.62 361 1333 -1.72 CG2678 40937 7.49 24.16 192 616 -1.72 Mad 33529 14.82 47.07 553 1775 -1.69 kay 3772082 115.27 355.35 5393 16809 -1.65 sqz 42300 5.02 15.25 226 692 -1.63 CG9948 38757 11 33.02 117 375 -1.61 tx 43190 1.18 3.48 39 116 -1.59 CG8089 36679 2.42 6.78 66 186 -1.51 Snoo 5740414 5.59 15.65 479.53 1432.64 -1.51 Eip74EF 39962 3.44 9.5 249 676 -1.49 zfh1 43650 93.03 257.02 8460 23596 -1.49 Clamp 35445 40.18 107.82 1471 3995 -1.45 NK7.1 41747 17.77 45.97 747 1958 -1.40 lilli 33496 11.44 27.21 1209 2779 -1.28 Rfx 41266 2.07 4.92 128 321 -1.27 Rel 41087 28.28 66.47 1251 2981 -1.26 Kah 38072 1.95 4.57 76 179 -1.25 CG11695 32106 13.56 31.61 321 755 -1.25 schlank 50392 128.45 298.29 4025 9345 -1.24 chif 34974 12.1 28.01 1106.71 2555.57 -1.24 lola 44548 117.88 264.61 5228.28 11234.91 -1.19 CG44002 33535 4 8.95 68 153 -1.19 CG10366 35258 9.6 21.46 270 608 -1.19 tai 34242 17.93 39.46 2037 4507 -1.17 tna 39217 12.86 27.98 807 1855 -1.15 da 34413 40.23 87.24 1759 3851 -1.14 mamo 32353 14.06 30.37 1733 4468 -1.14 chm 43928 9.01 19.25 342 698 -1.12 CG4360 42413 60.22 128.6 2933 6305 -1.12 CG2712 31267 5.56 11.84 136 292 -1.12 chinmo 33343 28.82 60.55 3935 7583 -1.10 emc 38091 47.06 98.52 1315 2773 -1.09 shn 36171 15.37 31.31 2010 4139 -1.05 [90]Open in a new tab The 66 genes expressed at significantly higher levels in S2 cells when compared to Kc cells (Kc down) are listed with the read count, TPM values and log[2] expression ratio for the two cell types Fig. 4. [91]Fig. 4 [92]Open in a new tab Difference heatmaps for DEGs. Heatmaps indicate log[2] fold change for the 57 genes expressed at significantly higher levels in Kc cells when compared to S2 cells (Kc up) (A) and the 66 genes that are expressed at significantly higher levels in S2 cells than Kc cells (Kc down) (B). The annotated functional GO TF level 2 classification term for each gene is indicated to the left of the heatmap with color key to the far right. The gene name and fold change ratio (log[2] Kc/S2) color key is shown to the right Despite these overall similarities, if we analyze the TPM values of the DEGs in the two cell types there are genes with disparate expression profiles, including a number of genes with high level expression in one cell type and very low TPM values in the other cell type (Fig. [93]5a, b). The distribution of the 57 Kc up TFs and the 66 Kc down TFs by their respective annotated DBD is shown in Fig. [94]5c, d. Thirteen different DBD families are represented in the group of 57 TFs, with the zinc finger zf-C2H2 family as the largest group with 15 members, followed by the homeobox and bHLH groups (5 TFs each). Six DBD families are present in only one TF each (Fig. [95]5c). In the group of 66 Kc down TFs, 19 different DBD families are represented, with the zinc finger zf-C2H2 family as the largest group with 20 members, followed by the bHLH group (8 TFs) and ZBTB (4 TFs). Eleven DBD families are present in only one TF each (Fig. [96]5d). Analysis of the enrichment of the DEGs classified by their DBD reveals a range of values, including two DBD families (P53 and zf-C2HC) amongst the 57 Kc up TFs (Fig. [97]5e) and one DBD family (AF-4) amongst the 66 Kc down TFs (Fig. [98]5f) with a Rich Ratio of 1. However, it should be noted that all three of those DBD families are represented by a single TF encoding gene. None of the DBD families in either group of DEGs report a significant q-value score (Fig. [99]5e, f). Fig. 5. [100]Fig. 5 [101]Open in a new tab Classification of DEGs by DNA-binding domain. Scatterplots comparing the TPM value for the 56 genes (the HmgD gene, ID #37841 is not shown due to very high expression in Kc cells) expressed at significantly higher levels in Kc cells when compared to S2 cells (Kc up) (A) and the 66 genes that are expressed at significantly higher levels in S2 cells than Kc cells (Kc down) (B). Histograms of the frequency of annotated DNA-binding domains (DBDs) in the 57 Kc up genes (C) and the 66 Kc down genes (D). Enrichment analysis of the TF DBD classes in the 57 Kc up genes (E) and the 66 Kc down genes (F). The color key for the calculated q-value for each class is shown to the right and the size of the data point for each DBD class is representative of the total number of genes in that class If we consider the overall expression level in both cell types of the DEGs classified by DBD family, the median TPM expression values (log[2] scale) range from 0.21 to 4.92 in the 13 families representing the 57 Kc up TFs (Fig. [102]6a), and 0.58 to 5.36 in the 19 families representing the 66 Kc down TFs (Fig. [103]6c). When the difference in expression between Kc and S2 cells for the DBD families is analyzed, the median fold change (log[2] scale) values for the Kc up group fall in the range of 1.32 to 6.83 (Fig. [104]6b), with the Kc down group in the range of -1.15 to -7.68 (Fig. [105]6d). For all of the 14 DBD families across the two DEG groups with more than one TF member, there is extensive variation from the minimum to maximum fold change value (Fig. [106]6b, d). Fig. 6. [107]Fig. 6 [108]Open in a new tab DEG DNA-binding domain groups expression and difference profile. A Boxblot indicating maximum, upper quartile, median, lower quartile and minimum TPM expression (log[2] scale) in Kc and S2 cells for the 57 Kc up TF genes organized by DNA-binding domain (DBD) classes. Outlying data points are indicated with individual dots. B Boxblot indicating maximum, upper quartile, median, lower quartile and minimum fold change (log[2] scale) between Kc and S2 cells for the 57 Kc up TF organized by DBD classes. C Boxblot indicating maximum, upper quartile, median, lower quartile and minimum TPM expression (log[2] scale) in Kc and S2 cells for the 66 Kc down TF genes organized by DNA-binding domain (DBD) classes. Outlying data points are indicated with individual dots. D Boxblot indicating maximum, upper quartile, median, lower quartile and minimum fold change (log[2] scale) between Kc and S2 cells for the 66 Kc down TF genes organized by DBD classes. Outlying data points are indicated with individual dots Hematopoietic cell identity To investigate the hematopoietic identity of the two cell lines we analyzed expression of known TF gene markers within this lineage (Table S[109]5). In Drosophila, three distinct types of hemocytes originate from a common precursor stem-cell like population: plasmatocytes, crystal cells and lamellocytes [[110]18]. Previous studies have indicated that Kc cells have a plasmatocyte identity and that S2 cells combine some properties of plasmatocyte and crystal cells [[111]14, [112]15]. In our RNA-seq data, we detect very high levels of the GATA-like prohemocyte marker serpent (srp) in Kc and S2 cells (Table S[113]5). In addition, both cell types express the Friend of GATA homolog u-shaped (ush), which regulates the population size of crystal cells and can act as an inhibitor of crystal cell differentiation [[114]19], and scalloped (sd), which promotes crystal cell specification in a Notch/Serrate-dependent manner [[115]20]. This discrepancy could in part be due to the detection sensitivity in our current study, although the level of expression we detect for these two genes is very high. The differentiating prohemocyte markers glial cells missing (gcm) and gcm2 are not expressed in either cell type (Table S[116]5), indicating that neither cell line has the molecular profile of an intermediate prohemocyte [[117]20]. S2 cells do express a detectable level of the Runt domain TF lozenge (lz) and pebbled (pb), both of which are associated with Notch-dependent crystal cell differentiation and not with plasmatocytes [[118]20–[119]22]. Expression of these two crystal cell markers is very low in Kc cells (Table S[120]5). Intriguingly, klumpfuss (klu), which has been shown to inhibit pre-destined crystal cells from becoming plasmatocytes [[121]20], is expressed at a higher level in Kc cells than S2 cells. We also compared our data with single-cell transcriptome analysis in Drosophila embryos. Examining expression in embryos at the onset of gastrulation (stage 6) using the Drosophila Virtual Expression eXplorer (DVEX) package [[122]23] reveals that srp and ush, both of which are expressed at a high level in both Kc and S2 cells with no significant differences in the levels of expression between the two cell types (Table S[123]5), are co-expressed in previously identified t-distributed stochastic neighbor embedding (t-SNE) cell clusters [[124]23]. These two genes are predominantly co-expressed in t-SNE cluster 6 (Fig. S[125]3), which is one of the 11 spatially identified clusters grouped by transcriptome similarity in the DVEX project, and corresponds to the dorsal epidermis of the embryo [[126]23]. The crystal cell marker pb, which we detect at a higher level in S2 cells than Kc cells, is also predominantly expressed in t-SNE cluster 6 (Fig. S[127]3d). It should be noted that expression of srp, ush, and pb is also detectable in sub-populations of cells within additional t-SNE clusters (Fig. S[128]3) and expression of sd is very widespread in stage 6 embryos. In summary, our new data confirm that both Kc and S2 cells display TF expression profiles indicative of a hematopoietic origin, but that the lines are distinct from each other. The S2 cells appear to have a more prominent crystal cell identity, based on lz and pb expression, while the Kc cells exhibit many of the hallmarks of plasmatocyte identity. However, both cell types display a certain level of transcriptional plasticity not seen in any particular hemocyte cell type found in the embryo, suggesting that further analysis, including single cell transcriptomics on the cell lines, will be required to clarify the precise molecular identity of the cells. Protein–protein interaction network analysis To investigate the potential biological roles of the TF genes that are expressed in the two embryonic cell types, we performed an analysis of the molecular networks in the cells. The protein–protein interaction (PPI) network map of the 493 TFs demonstrates considerable interaction for the majority of TF genes (Fig. [129]7a). Of the 493 total genes, only 128 (25.96%) have zero node connections. Of the 365 genes (74.04%) with connections, the total number of connections for each gene ranges from 1 to 35 (Fig. [130]7a). Amongst the genes with a high level of connectivity (≥ 17 connections), some intriguing network components are revealed (Fig. [131]7a). The Myc gene (35 connections) encodes a TF homologous to the well-characterized vertebrate Myc proto-oncogene and has a critical role in cell growth and proliferation [[132]24, [133]25]. Given that the two Drosophila cell lines are immortalized it is not surprising to find this gene expressed in both Kc (72.31 TPM) and S2 (34.36 TPM) cells. Likewise, the Jra (Jun-related antigen) gene (28 connections) encodes for a homolog of the mammalian Jun proto-oncogene TF [[134]26, [135]27]. The EcR (Ecydysone receptor) gene (27 connections) encodes for a protein that forms the nuclear ecdysone receptor heterodimer with the protein product of usp (ultraspiracle) [[136]28]. The heterodimer TF modulates expression of hundreds of downstream genes through sequence-specific binding to ecdysone response elements (ECRES) in the regulatory regions of target genes [[137]29]. In addition, the functional activity of the heterodimer is known to be controlled by an array of co-activators and co-repressors, many of which, including the co-repressor Smr (Smrter, SMRT-related ecdysone receptor interacting protein) [[138]30] (25 connections), are captured on the PPI network map (Fig. [139]7a). The foxo gene (29 connections) encodes for a TF involved in the regulation of the insulin signaling pathway and plays a key role in regulation of the cell cycle, modulating cell growth and proliferation [[140]31]. Two less predictable components of the PPI network are represented by the fkh (forkhead) gene (34 connections), which encodes for a pleiotropic TF most frequently studied for its role in salivary gland formation [[141]32], and the ey (eyeless) (26 connections) and toy (twin of eyeless) (17 connections) gene pair, that encode for critical TFs involved in eye development as part of the retinal determination gene network [[142]33]. Fig. 7. [143]Fig. 7 [144]Open in a new tab Protein–protein interaction network analyses. A PPI network map for all 493 TF genes. Individual genes with high connectivity are labeled. B Interactions between the 57 TF genes that are expressed at significantly higher levels in Kc cells than S2 cells (Kc up). Individual genes are indicated by number; 1 = Myc, 2 = foxo, 3 = p53, 4 = cyc, 5 = cad, 6 = toy, 7 = Sox15, 8 = SoxN. C Interactions between the 66 TF genes that are expressed at significantly higher levels in S2 cells than Kc cells (Kc down). Individual genes are indicated by number; 9 = ey, 10 = Mad, 11 = usp, 12 = Rel, 13 = kay, 14 = da. Circles indicate individual TF coding genes (nodes) and are color coded according to the number of total connections from each node as indicated in the node connections color code key. The line color connecting the nodes indicates the relative strength of the calculated PPI value as shown in the ppi color key If we analyze just the 57 Kc up TFs, 28 (49.12%) have zero PPI connections, while 29 (50.88%) have between 1 and 5 connections (Fig. [145]7b). A key cluster in the network includes the Myc, foxo, p53, and cyc (cycle) genes, indicating that the TFs encoded by these genes play a critical role in regulating the cell cycle and proliferation in the Kc cell line. An interesting additional cluster is identified in the network containing the cad (caudal), toy, Sox15 (Sox box protein 15), and SoxN (Sox Neuro) genes (Fig. [146]7b). The CAUDAL TF has a well-studied role in early embryonic patterning [[147]34], but is also known to play a role in innate immune homeostasis [[148]35], which may fit with the previously characterized hemocyte-like identity of the Kc cells [[149]15]. The potential role of the TFs encoded by the toy, Sox15 and SoxN genes in Kc cells is more enigmatic. All three of these TFs have characterized activity in the formation of the central nervous system [[150]36–[151]38], but are also known to be important for cell proliferation [[152]39–[153]41]. Unraveling the precise functional role of the toy and Sox15 encoded TFs in future studies will be particularly appealing, given that they are expressed at significantly higher levels in Kc cells (toy 21.78 TPM and Sox15 16.51 TPM) than in S2 cells (toy 0.19 TPM and Sox15 0.12 TPM) (Table [154]1). Amongst the 66 Kc down TFs, 33 (50%) have zero PPI connections, while 33 (50%) have between 1 and 7 connections (Fig. [155]7c). Thirty-one of the 33 genes with connectivity form a single large cluster in the network, which includes the ey, Mad (Mothers against dpp), usp, Rel (Relish), kay (kayak), and da (daughterless) genes (Fig. [156]7c). The TFs encoded by all these genes are implicated in regulating cell cycle and proliferation, although not necessarily in the same organ, tissue or system. The ey, kay, and Mad gene products have characterized roles in eye development [[157]42–[158]44], the Rel gene is important for the immune deficiency pathway [[159]45], and the TF encoded by da participates in transcriptional regulation of a wide variety processes, including oogenesis, neurogenesis, myogenesis and cell proliferation [[160]46] and is critical for sex determination and dosage compensation by controlling the feminizing switch gene Sxl (Sex lethal) [[161]47]. It should be noted that for many of the genes in the single large network cluster, there is relatively high expression in both S2 and Kc cells (Table [162]2), suggesting that the biological activity of the encoded TFs may be functionally important in both cell lines. This potential shared role in the two cell lines makes these genes key candidates for further characterization in future studies. KEGG pathway analysis KEGG pathway mapping was performed to investigate the link between our TF expression data and the underlying biological pathways in the two embryonic cell types. Analysis of the top 10 pathways for the group of 493 TFs reveals that each node (representing a particular biological pathway) has at least 6 connections to individual TF encoding genes, with the node of highest degree having 12 connections (Fig. [163]8a). Seven of the 10 nodes represent signaling pathways (including Signaling pathways regulating pluripotency of stem cells) with extensive interconnectivity between the nodes, indicating that regulation of cell–cell signaling is important in Kc and S2 cells. This result reinforces observations from earlier studies which showed that many parallel signaling pathways are active and that the signaling landscape is extensively shared between the two cell types [[164]14, [165]15]. Amongst the KEGG signaling pathways identified, the fly MAPK (Mitogen activated protein kinase) pathway is well represented with 12 connections (Fig. [166]8a). Given that the protein encoded by the Myc gene is known to be one of the key TFs for the downstream regulation controlled by the MAPK signaling cascade [[167]25], the KEGG pathway data ties in with the fact that expression of Myc was detected in both cell types. The three identified KEGG nodes not annotated as cell signaling pathways are; Apoptosis, Longevity regulating pathway, and Cellular senescence. Given the high proliferative activity of the two cell lines, the appearance of these pathways is perhaps not surprising. Fig. 8. [168]Fig. 8 [169]Open in a new tab KEGG pathway analyses. A The 493 genes in the TF network were clustered using KEGG pathway analysis. KEGG pathway nodes are indicated by number; 1 = MAPK signaling pathway – fly, 2 = Apoptosis, 3 = Longevity regulating pathway, 4 = MAPK signaling pathway, 5 = PI3K-Akt signaling pathway, 6 = Cellular senescence, 7 = Hippo signaling pathway – fly, 8 = Signaling pathways regulating pluripotency of stem cells, 9 = Hippo signaling pathway, 10 = TGF-beta signaling pathway. B KEGG pathway analysis of the 57 TF genes that are expressed at significantly higher levels in Kc cells than S2 cells (Kc up). KEGG pathway nodes are indicated by number; 1 = Circadian rhythm, 2 = TGF-beta signaling pathway, 3 = Hippo signaling pathway – fly, 4 = Signaling pathways regulating pluripotency of stem cells, 5 = Thyroid hormone signaling pathway, 6 = Cellular senescence, 7 = PI3K-Akt signaling pathway, 8 = Mitophagy, 9 = Longevity regulating pathway, 10 = FoxO signaling pathway. C KEGG pathway analysis of the 66 genes that are expressed at significantly higher levels in S2 cells than Kc cells (Kc down). KEGG pathway nodes are indicated by number; 1 = Hippo signaling pathway, 2 = TGF-beta signaling pathway, 3 = Hippo signaling pathway – fly, 4 = MAPK signaling pathway – fly, 5 = Apoptosis, 6 = Toll and lmd signaling pathway, 7 = Ras signaling pathway, 8 = Sphingolipid signaling pathway, 9 = Cellular senescence, 10 = Thyroid hormone signaling pathway. Squares indicate defined KEGG pathways (nodes) and circles indicate individual TF coding genes. The KEGG pathway nodes are color ranked by the number of genes connected in the pathway, with only the top 10 pathways with the largest number of genes displayed in each panel. The different line colors connecting nodes to genes represent KEGG classification of the pathway; cellular processes (red), environmental information processing (blue) or organic systems (yellow) The KEGG mapping for the top 10 pathways amongst the 57 Kc up TFs uncovers a large cluster of 9 interconnected nodes, the majority of which represent cell–cell signaling pathways (Fig. [170]8b). The separate single node represents the Circadian rhythm pathway and supports the earlier observation that the cyc gene, which encodes a TF known to activate transcription of key downstream circadian clock genes [[171]48], is expressed (58.44 TPM) in Kc cells (Table [172]1). A similar profile for the KEGG mapping is observed for the top 10 pathways amongst the 66 Kc down TFs, with two clusters of interconnected nodes present (Fig. [173]8c). Eight of the ten nodes are explicitly annotated as cell–cell signaling pathways, with the other two nodes (Apoptosis and Cellular senescence) demonstrating connectivity to cell signaling mechanisms through shared genes (Fig. [174]8c). In an effort to explore the connection between KEGG identified pathways and the functional requirements of specific genes within those pathways, we examined the JAK/STAT pathway, as a proof of principle, in detail. Our previous studies have revealed that the core domeless (dome) receptor and unpaired 2 (upd2) and unpaired 3 (upd3) ligands in the JAK/STAT pathway are expressed in both Kc and S2 cells [[175]15]. A detailed study from the Sheffield RNAi Screening Facility (SRSF) utilized a second-generation, computationally optimized dsRNA library to perform a genome-wide RNAi screen in Kc cells to identify 42 regulators of JAK/STAT signaling [[176]49]. Of the 42 genes identified, nine encode for TFs (Table S[177]6). Amongst those nine genes, seven are ranked in the top 41 by overall expression level in Kc cells and the other two have very high levels of expression (ranked 159 and 162 overall). Furthermore, all nine TF genes also demonstrated very high levels of expression in S2 cells in our study, with six of the nine genes showing no significant differences in the levels of expression between the two cell types (Table S[178]6). Taken together, the data indicate that the TF components of the JAK/STAT pathway identified by RNAi are likely functionally critical in Kc and S2 cells, as evidenced by their uniform very high levels of expression. Overall, the confirmed widespread expression of many of the TF protein components in the cell–cell signaling network, and their presence in many different KEGG mapped signaling pathways in Kc and S2 cells, confirms that the cell lines will remain a valuable tool to study Drosophila cell–cell interactions in future studies. Conclusions Our comprehensive analysis of TF expression from the Drosophila genome in the commonly utilized Kc and S2 embryonic cell lines has revealed a complex landscape, in which the majority of the TF coding genes are actively expressed, although at varying levels. Of the 707 annotated TF genes, 493 are expressed at a level of 5 or greater TPM in at least one of the cell types and 123 are expressed at significantly different levels between the two cell lines. The protein–protein interaction network displays the collective influences of these TFs on cellular function and highlights a potentially important role for TFs involved in cell proliferation, cell–cell signaling pathways, and, surprisingly, eye development. KEGG pathway analysis further elucidates the role of these active TFs in signaling and cell cycle regulatory pathways. Understanding whether the differential expression of the 123 TFs is chiefly due to a difference in biological function between the two cell lines despite both possessing hematopoietic origins, a difference in neutral accumulated mutations in these two immortalized cell types, a difference in the molecular control of the route to immortality, or a combination of these possibilities, will require further study. This dataset sheds light on the TF milieu at play in these two cell lines and will serve as a resource for future gene regulatory studies that make use of either of these embryonic cell lines. Supplementary Information [179]Supplementary Material 1. ^(172.8KB, xlsx) [180]Supplementary Material 2. ^(133.3KB, xlsx) [181]Supplementary Material 3. ^(13.1KB, xlsx) [182]Supplementary Material 4. ^(12.8KB, xlsx) [183]Supplementary Material 5. ^(11.9KB, xlsx) [184]Supplementary Material 6. ^(12.1KB, xlsx) [185]Supplementary Material 7. ^(41.3KB, docx) [186]Supplementary Material 8. ^(1.9MB, docx) [187]Supplementary Material 9. ^(2MB, docx) Acknowledgements