Abstract Purpose Head and neck squamous cell carcinoma (HNSCC) is the sixth most prevalent cancer in the world, accounting for more than 90% of head and neck malignant tumors. However, its molecular mechanism is largely unknown. To help elucidate the potential mechanism of HNSCC tumorigenesis, we investigated the gene interaction patterns associated with tumorigenesis. Methods Weighted gene co-expression network analysis (WGCNA) can help us to predict the intrinsic relationship or correlation between gene expression. Additionally, we further explored the combination of clinical information and module construction. Results Sixteen modules were constructed, among which the key module most closely associated with clinical information was identified. By analyzing the genes in this module, we found that the latter may be related to the immune response, inflammatory response and formation of the tumor microenvironment. Sixteen hub genes were identified—ARHGAP9, SASH3, CORO1A, ITGAL, PPP1R16B, TBC1D10C, IL10RA, ITK, AKNA, PRKCB, TRAF3IP3, GIMAP4, CCR7, P2RY8, GIMAP7, and SP140. We further validated these genes at the transcriptional and translation levels. Conclusion The innovative use of a weighted network to analyze HNSCC samples provides new insights into the molecular mechanism and prognosis of HNSCC. Additionally, the hub genes we identified can be used as biomarkers and therapeutic targets of HNSCC, laying the foundation for the accurate diagnosis and treatment of HNSCC in clinical and research in the future. Keywords: head and neck squamous cell carcinoma, wgcna, co-expression, gene module, hub gene, immune response-related genes Introduction Head and neck squamous cell carcinoma (HNSCC) is a common malignant cancer type that originates from squamous cells of the upper respiratory tract[30]^1 (eg, the oral cavity, pharynx, and larynx) and upper gastrointestinal tract. HNSCC is the sixth most prevalent cancer in the world,[31]^2 accounting for more than 90% of head and neck malignancies. As a heterogeneous disease, HNSCC has different prognoses due to its diverse anatomical locations, disease progression and etiology.[32]^3 Smoking, drinking and infection with human papillomavirus (HPV) are all risk factors for the occurrence and development of HNSCC.[33]^4^,[34]^5 Despite surgery, radiotherapy and chemotherapy, approximately half of the patients die from this disease,[35]^6 explaining why exploring the molecular mechanism of its tumorigenesis and finding new targets for drug therapy are needed. In recent years, research on the molecular mechanism of HNSCC is in progress, and the discovery of important pathway alterations, such as Notch,[36]^7 Ras,[37]^8 and VEGF signaling,[38]^9 has led to the emergence of some related targeted drugs that are expected to improve the survival and prognosis of HNSCC patients.[39]^10 Therefore, it is important to study the molecular mechanism of HNSCC for early diagnosis and optimal treatment. Exploring the molecular mechanism underlying HNSCC invasion and metastasis is of great significance in addressing the issue of identifying new drug targets for future therapy. From chip technology to high-throughput sequencing technology, biological data have increased exponentially due to the development of biological detection methods.[40]^11 An integrated organism is a complex network comprising many genes, RNA and proteins. In the occurrence of various diseases, especially cancer, there are many interactions among these components. Traditional methods are not suitable to this complex interaction network. Therefore, the biological network analysis method based on high-throughput data arises at the historic moment.[41]^12 The signaling pathway network, protein interaction network, metabolic network, gene regulation network and gene co-expression network are common biological networks.[42]^13 The most representative gene co-expression network is weighted gene co-expression network analysis (WGCNA).[43]^14 The aim of this method is to identify synergistic expression gene modules and explore the relationship between gene networks and phenotypes, as well as between hub genes in the network.[44]^15 WGCNA emphasizes the change of modules rather than individual genes, greatly alleviating the problem of multiple comparisons.[45]^16 Additionally, it links phenotypes with gene expression, which is propitious for further exploration. WGCNA has provided significant results in the gene analysis of many species, such as humans and mice,[46]^17 and has become a widely used network analysis tool. In order to identify mechanistically informative features of HNSCC, in this study, weighted gene co-expression network analysis was used to analyze the gene expression profiles of 270 HNSCC samples from the [47]GSE65858 dataset. Several modules have been obtained that were combined with clinical information of patients to determine the biological significance by further analysis. We identified a significant correlation between the genes in the turquoise module and some clinical traits, leading us to further excavate the genes in this module. Finally, we validate these hub genes in The Cancer Genome Atlas (TCGA) dataset to confirm that these genes do have biological significance in the tumorigenesis of HNSCC. Our study lays the foundation for searching potential biomarkers to treat HNSCC and further experimental verification. Materials and methods Data information The HNSCC tissue samples used in this study were all from the [48]GSE65858 dataset of NCBI Gene Expression Omnibus ([49]https://www.ncbi.nlm.nih.gov/). Two hundred seventy samples were obtained from HNSCC patients. The gene expression data were obtained using the [50]GPL10558 Illumina HumanHT-12 V4.0 expression beadchip sequencing platform. The corresponding clinical information (including age, sex, smoking, alcohol consumption, tumor location, tumor grade, tumor stage, metastasis, and HPV16 infection) was also obtained from the [51]GSE65858 dataset. Because some data were missing, the use of WGCNA for subsequent analysis required the integrity of the gene expression data. Thus, the missing data needed to be processed to ensure subsequent analysis. The advantage of deleting missing data rows or columns as a processing method is that no errors are introduced. First, we manually removed 20 samples of patients with clinical information deficiency and retained 250 samples with complete information for further follow-up analysis. Eighty-four (33.6%) of the tumors occurred in the oropharynx, 82 (32.8%) occurred in the cavum oris, 48 (19.2%) occurred in the larynx, 32 (12.8%) occurred in the hypopharynx and 4 contained missing information. In 196 samples, the expression of HPV 16 DNA was negative, 19 samples were HPV 16 dna+rna-, and 35 samples were HPV 16 dna+rna+. Number assignment of samples’ clinical traits can be seen in [52]Table S1. Network module analysis is vulnerable to the influence of outlier samples; thus, removing outlier samples before constructing the network is particularly important to obtain meaningful subsequent analysis results. The goodGeneSample function in WGCNA was used to remove genes and samples with excessive missing values, and then the outlier samples were removed by the method of sample hierarchical clustering-pruning graph. The specific methods were as follows: 1) building a hierarchical clustering tree for all samples and drawing and 2) removing outliers whose leaf node height was significantly higher than other samples. Co-expression network construction We used the Affy package (using version 3.5.1 R software) to preprocess and standardize matrix files (RMA standardization). RMA is used to correct background, and impute is used to supplement missing values. According to the order of standard deviation (SD) from large to small, we selected the first five thousand genes to construct WGCNA,[53]^18 which not only reduces the size of the whole network and computational load but also maintains a scale-free topological network. WGCNA realizes network construction in the form of a “soft threshold”. The essence of the soft threshold method is to transform the correlation coefficients of similar matrices to adjacent functions in the form of function transformation and then calculate the topological overlap matrix. Next, we used pickSoftThreshold to select the appropriate soft threshold. The scale-free topology fit index, as well as the mean connectivity, serves as a function of the soft-thresholding power. By calculating the scale-free topology fit index and mean connectivity for several powers, we can choose an appropriate soft-thresholding power to build network. WGCNA uses dissimilarity to cluster. The specific algorithm used is the topological overlap dissimilarity measure (TOM)[54]^19 to calculate the degree of association between genes. Using hierarchical clustering function, the genes with a similar expression spectrum are classified into TOM-based modules. We used these 5,000 genes to visualize gene networks. Gray refers to the background gene without any module. Different modules are built from dendrograms. Module preservation across the network Two hundred forty-five samples clustered after removing outliers were divided into the training and validation sets using the createDataPartition function in R. The advantage of createDataPartition is that it can randomly extract the training set we need from the low-entropy dataset. We calculate the preservation of different modules by calculating the Z value, Z=observed−mean[permuted]/sd[permuted]. This is a permutation test. It repeatedly permutes the gene labels in the test network to estimate the mean and standard deviation under the hypothesis of null preservation. If Z is greater than 10, the module indicates strong preservation; if Z is less than 10 and greater than 2, the module is weakly preserved; if less than 2, the module is not preserved. Statistical significance is evaluated by bootstrapping (N=200 permutations).[55]^20 Identification of clinically significant modules In the analysis, we combined modules with a various nontime dependent variables. First, we calculate the eigengene (module Eigengenes, ME) of the module, which is defined as the first principal component of all gene expression level vectors in the module. Next, for an optional gene, we defined its gene significance (GS) as the correlation coefficient between its expression level and dependent variable level. Continuous dependent variables are calculated by the Pearson correlation coefficient. For a module, the modular significances (MSs) relative to a dependent variable are defined as the correlation coefficients between its characteristic genes and level of dependent variables. The module membership (MM) of any gene in a module is defined as the correlation coefficient between the gene and characteristic gene of the module. It measures the degree of subordination of a gene in the module that can be used to screen the important genes in the module. In the module analysis, if a module and a dependent variable have a high-significance MS, the gene in the module may be an important factor affecting the dependent variable Y. Under this condition, it is necessary to carry out in-module analysis to screen the important genes in the module. The specific methods are as follows: For gene i in the module, a scatter plot of the module membership degree (MM) relative to the gene significance (GS) is made, and the genes with high MM and GS are selected as the objects for further analysis. Function enrichment analysis In the postgenomic era, knowledge about gene functional pathways is increasing daily. Kyoto University and Tokyo University jointly developed the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, which integrates a large amount of knowledge about gene, protein, biological pathways and other interactive networks to facilitate the use of gene information.[56]^21 The Gene Ontology (GO) database is a platform for standardized description or interpretation of the terminology of gene and gene product characteristics. It is a platform for bioinformatics researchers to unify the induction, processing,interpretation and sharing of gene and gene product data.[57]^22 In this study, “clusterProfiler”[58]^23 and “org.Hs.eg.db” packages in R software were used to analyze GO and KEGG, annotate the genes of interest in the modules, and try to identify the enrichment pathway of the genes in the modules. We set p-valueCutoff =0.01 and qvalueCutoff =0.05. Identification of hub genes in the key module We believe that genes that are highly interconnected with other nodes in the module are functionally important. In this study, key modules were selected and visualized by cytoscape, and hub genes in key module networks were screened according to degree. First, we selected the weight value of more than 0.3 by weight ranking, and then identified the top 30 genes by degree. We believe that these genes are highly interconnected with other genes and have biological significance. Validation and exploration of hub genes TCGA is a joint project initiated by NCI and NHGRI in the United States that provides a complete atlas related to all changes in the cancer genome. CBioPortal ([59]http://www.cbioportal.org/) acts as a web page for cancer genome research that can be used to browse, visualize and analyze multidimensional cancer genome data. It provides graphical information aggregation, network analysis and survival analysis at the gene level on multiple platforms. GEPIA([60]http://GEPIA.cancer-pku.cn/) is a web server to analyze the RNA sequencing expression data of 9,736 tumors and 8,587 normal samples from the TCGA and The Genotype-Tissue Expression (GTEx) projects using a standard processing pipeline. GEPIA was used to analyze the survival rate of 30 candidate genes. According to similar literatures,[61]^24^,[62]^25 the genes with logrank p-value less than 0.05 were identified as real hub genes. Those genes can affect prognosis. Next, we used GEPIA to check the differences in the expression of hub genes between normal and tumor tissues. In this study, we used CBioPortal to verify the related genetic changes in the hub genes identified in HNSCC samples in the TCGA database and explore the correlation between the hub genes and other genes and drugs. We hope to uncover relevant drug targets and provide direction for further research. Results Expression value analysis of microarray data We downloaded the gene expression matrix files of 270 HNSCC samples from [63]GSE65858 ([64]https://www.ncbi.nlm.nih.gov/) and manually deleted 20 samples for which clinical information is missing. Finally, 250 samples were retained to carry out in subsequent analysis. The background was corrected, and the data were normalized by R software. The visualization results of normalization are shown in [65]Figure S1. Gene annotations were carried out using R package, and probes corresponding to multiple genes were removed. The average value of genes corresponding to multiple probes was taken as the expression level of the gene. After obtaining the genes, we selected the first 5,000 genes in SD descending order to construct a co-expression network. We used the hclust function of the WGCNA packages to cluster 250 samples. As shown in [66]Figure 1A, we can see the sample clustering tree. The samples are divided into two clusters. We chose 60 as the cutoff value; after 5 outliers were removed, 245 samples were retained for subsequent WGCNA analysis. Next, we reclustered the 245 samples and corresponded them with clinical traits to obtain the sample tree and characteristic heat map ([67]Figure 1B). Figure 1. [68]Figure 1 [69]Open in a new tab Construction of the co-expression network using the WGCNA package in R. Notes: (A) Sample clustering to detect outliers. Five outliers were removed after clustering—[70]GSM1607776, [71]GSM1607734, [72]GSM1607907, [73]GSM1607816 and [74]GSM1607849. cutHeight =60. (B) Recluster samples and clinical traits dendogram. The color intensity was proportional to the gender, age, packyears, alcohol consumption, tumor type, tumor site, u1cc stage, t category, n category, distant metastasis and HPV16 infection. Abbreviations: WGCNA, weighted gene co-expression network analysis; HPV, human papillomavirus. Construction of the weighted co-expression network and preservation To determine the optimal value of the soft threshold, it is necessary to search the power value that can make the adjacent function suit the scale-free condition in a certain range. As shown in [75]Figure 2A and [76]B, we chose 4 as the power value when the Scale-free topology fit index nearly equals to 0.9 and the mean connectivity is also close to 0. We constructed the co-expression matrix using the expression matrix and the estimated optimal power value. The connectivity between genes was calculated, and similarity between genes was calculated accordingly. Next, the coefficient of dissimilarity among genes was deduced, and the hierarchical clustering gene dendrogram was obtained accordingly. According to the standard of dynamicTreecut, the number of genes was set to 30 as the minimum number of each module. Here, we used different colors to represent all the modules, among which gray, by default, represented the genes that could not be classified into any module. Sixteen modules were finally obtained, and the visualization results are shown in [77]Figure 3A. There were 229 genes in the black, 645 in the blue, 456 in the brown, 54 in the cyan, 393 in the green, 138 in the magenta, 50 in the midnightblue, 165 in the pink, 124 in the purple, 319 in the red, 66 in the salmon, 75 in the tan, 1092 in the turquoise, 413 in the yellow, and 601 in the gray modules. Detailed information of these genes can be found in [78]Table S2. We verified the module preservation by randomly selecting test sets and training sets in the dataset and found that all modules had high preservation, especially the turquoise module ([79]Figure 3B). Figure 2. [80]Figure 2 [81]Open in a new tab Determination of the soft-thresholding power. Notes: (A) Analysis of the scale-free fit index for various soft-thresholding powers (β). (B) Analysis of the mean connectivity for various soft-thresholding powers. The best fit power value was 4. Figure 3. [82]Figure 3 [83]Open in a new tab (A) Cluster dendrogram of genes. Each branch in the figure represents one gene, and every color below represents one co-expression module. (B) Zsummary statistics of module preservation. The figure shows the preservation Zsummary statistic (y-axis) as a function of the module size. The blue (low) and green (high) lines are thresholds indicating the 2< Z <10 region corresponding to moderate/high preservation. Relating clinical traits to modules and correlation between modules The Pearson correlation coefficients of the module Eigengenes and corresponding variables represent the correlation between the modules and clinical traits. The results are shown in [84]Figure 4. Among the clinical information included in the study, gender, age, smoking, alcohol consumption, tumor type, site, grade, stage, metastasis and infection of HPV16 were nontime-related variables. We found that the absolute correlation coefficients of each module with gender, tumor type and metastasis were small, and the correlation was almost insignificant. Age, smoking and alcohol consumption were weakly correlated with some modules. The site, grade, stage of tumors and infection of HPV16 are strongly correlated with some modules. It was obvious that the brown module, green module and turquoise module show a significant correlation with clinical traits. As illustrated in [85]Figure 4, to further analyze genes in these modules, we found a strong correlation between the Gene Significance for the tumor site and Module Membership in the turquoise module (cor =0.58, p=3.6e−99), and the turquoise module was negatively correlated with the tumor site (cor =−0.23, p=(3e−4)) ([86]Figure 5A). There is a negative correlation between genes in the brown module and N category, as shown in [87]Figure 5B. Additionally, GS for the infection status of HPV16 was negatively correlated with MM in the green module, as shown in [88]Figure 5C. Additionally, we constructed a hierarchical clustering tree based on WGCNA module eigengenes and the heat map of adjacency between modules. The colors are from low adjacency (green) to high adjacency (red) to show the relationship between the eigengenes of the WGCNA module ([89]Figure 5D). Figure 4. [90]Figure 4 [91]Open in a new tab Heatmap of the correlation between the module eigengenes and clinical traits. Turquoise, green and brown modules are three most closely related modules to clinical traits. Figure 5. [92]Figure 5 [93]Open in a new tab (A) Scatter plot of gene significance for the tumor site and module membership in the turquoise module. (B) Scatter plot of gene significance for the n category and module membership in the brown module. (C) Scatter plot of gene significance for HPV16 infection and module membership in the green module. (D) Eigengene dendrogram and eigengene adjacency heatmap. Identification of key modules We constructed a network heatmap plot with these 5,000 genes, as shown in [94]Figure S2. The different colors of the horizontal axis and vertical axis represented different modules, and the color differences of each block could be clearly seen. Among them, the turquoise module had the darkest yellow color. At the same time, we found that the turquoise module has a significant correlation with many clinical features in [95]Figure 4. Thus, this module was identified as a key module. Next, we will further analyze the genes in this module. Function enrichment analysis in the turquoise module We performed enrichment analysis to explore the GO database and KEGG pathway in which the turquoise module genes are involved. The main results are shown in [96]Tables 1 and [97]2. The enrichment results showed that the biological process of turquoise module genes mainly involved inflammation-related and immune responses, such as T-cell activation, leukocyte cell-cell adhesion, positive regulation of T cells, and lymphocyte and leukocyte activation. The cell components associated with it included the Major Histocompatibility Complex (MHC) class II protein complex, MHC protein complex, membrane raft, and T-cell receptor complex, which are spatially related to the surface and extracellular space of immune cells. Simultaneously, the molecular function of the module was related to MHC class II receptor activity and nonmembrane spanning protein tyrosine kinase activity (see [98]Tables S3 and [99]S4 for more information). More specifically, pathway enrichment analysis of the module indicated that inflammation and the tumor microenvironment evolution mechanisms are related to these genes. Important pathways such as Th1 and Th2 cell differentiation, Th17 cell differentiation, hematopoietic cell lineage, and intestinal immune network for IgA production were enriched. Other pathways such as Staphylococcus aureus infection,viral myocarditis,Epstein-Barr virus infection,and human T-cell leukemia virus 1 infection also showed enrichment. The visualization of the enrichment results is shown in [100]Figure 6. The characteristic information may further promote our research on the pathogenesis and tumorigenesis of HNSCC. Table 1. GO analysis of genes in turquoise module ONTOLOGY ID Description p-value p. adjust q-value Count BP GO:0042110 T cell activation 5.75E-26 2.92E-22 2.52E-22 89 BP GO:0007159 Leukocyte cell-cell adhesion 2.06E-21 5.23E-18 4.52E-18 68 BP GO:0050867 Positive regulation of cell activation 4.76E-20 8.07E-17 6.97E-17 69 BP GO:0050870 Positive regulation of T cell activation 1.33E-19 1.44E-16 1.25E-16 51 BP GO:0051249 Regulation of lymphocyte activation 1.42E-19 1.44E-16 1.25E-16 78 BP GO:1903039 Positive regulation of leukocyte cell-cell adhesion 2.53E-19 2.15E-16 1.85E-16 52 BP GO:0050863 Regulation of T cell activation 3.91E-19 2.84E-16 2.46E-16 63 BP GO:0002696 Positive regulation of leukocyte activation 5.76E-19 3.66E-16 3.16E-16 66 BP GO:0070661 Leukocyte proliferation 2.41E-18 1.36E-15 1.17E-15 57 BP GO:0022409 Positive regulation of cell-cell adhesion 6.12E-18 3.11E-15 2.69E-15 54 BP GO:1903037 Regulation of leukocyte cell-cell adhesion 6.81E-18 3.15E-15 2.72E-15 59 BP GO:0022407 Regulation of cell-cell adhesion 8.12E-18 3.44E-15 2.97E-15 68 BP GO:0046651 Lymphocyte proliferation 4.99E-17 1.95E-14 1.69E-14 53 BP GO:0032943 MONONUCLEAR cell proliferation 7.07E-17 2.57E-14 2.22E-14 53 BP GO:0045785 Positive regulation of cell adhesion 2.04E-16 6.91E-14 5.97E-14 66 BP GO:0051251 Positive regulation of lymphocyte activation 2.26E-16 7.19E-14 6.21E-14 58 BP GO:0002521 Leukocyte differentiation 5.83E-16 1.74E-13 1.50E-13 74 BP GO:0030098 Lymphocyte differentiation 2.00E-15 5.66E-13 4.89E-13 58 BP GO:0002768 Immune response-regulating cell surface receptor signaling pathway 5.64E-15 1.51E-12 1.30E-12 72 BP GO:0050851 Antigen receptor-mediated signaling pathway 7.82E-15 1.99E-12 1.72E-12 52 BP GO:0042113 B cell activation 3.43E-14 8.29E-12 7.16E-12 50 BP GO:0002429 Immune response-activating cell surface receptor signaling pathway 1.93E-13 4.46E-11 3.85E-11 66 BP GO:0050670 Regulation of lymphocyte proliferation 8.15E-13 1.80E-10 1.56E-10 40 BP GO:0032944 Regulation of mononuclear cell proliferation 9.64E-13 1.96E-10 1.70E-10 40 BP GO:0001819 Positive regulation of cytokine production 9.66E-13 1.96E-10 1.70E-10 61 CC GO:0042613 MHC class II protein complex 1.51E-14 8.70E-12 7.81E-12 13 CC GO:0042611 MHC protein complex 8.87E-11 2.56E-08 2.30E-08 13 CC GO:0045121 Membrane raft 5.23E-08 8.29E-06 7.44E-06 41 CC GO:0098857 Membrane microdomain 5.74E-08 8.29E-06 7.44E-06 41 CC GO:0098589 Membrane region 1.55E-07 1.79E-05 1.60E-05 41 CC GO:0042101 T cell receptor complex 2.15E-07 2.07E-05 1.86E-05 9 CC GO:0031234 Extrinsic component of cytoplasmic side of plasma membrane 3.81E-07 3.14E-05 2.82E-05 21 CC GO:0009898 Cytoplasmic side of plasma membrane 9.07E-07 6.54E-05 5.87E-05 27 CC GO:0071556 Integral component of lumenal side of endoplasmic reticulum membrane 1.59E-06 9.20E-05 8.26E-05 10 CC GO:0098553 Lumenal side of endoplasmic reticulum membrane 1.59E-06 9.20E-05 8.26E-05 10 CC GO:0030669 Clathrin-coated endocytic vesicle membrane 1.89E-06 9.90E-05 8.89E-05 12 CC GO:0030665 Clathrin-coated vesicle membrane 2.70E-06 0.00013 0.000117 19 CC GO:0098562 Cytoplasmic side of membrane 9.80E-06 0.000435 0.00039 27 CC GO:0009897 External side of plasma membrane 1.49E-05 0.000616 0.000552 34 CC GO:0019897 Extrinsic component of plasma membrane 1.70E-05 0.000655 0.000588 24 CC GO:0015629 Actin cytoskeleton 2.15E-05 0.000743 0.000667 49 CC GO:0045334 Clathrin-coated endocytic vesicle 2.19E-05 0.000743 0.000667 13 CC GO:0031252 Cell leading edge 2.75E-05 0.000881 0.000791 41 CC GO:0030136 Clathrin-coated vesicle 3.66E-05 0.00111 0.000996 24 CC GO:0001772 Immunological synapse 4.53E-05 0.001306 0.001172 9 CC GO:0030139 Endocytic vesicle 0.000105 0.002875 0.00258 32 CC GO:0042629 Mast cell granule 0.00011 0.00289 0.002594 7 CC GO:0030662 Coated vesicle membrane 0.000141 0.003459 0.003104 22 CC GO:0043235 Receptor complex 0.000144 0.003459 0.003104 38 CC GO:0030666 Endocytic vesicle membrane 0.000163 0.003759 0.003374 21 CC GO:0005925 Focal adhesion 0.000193 0.004292 0.003852 39 MF GO:0032395 MHC class II receptor activity 3.00E-09 2.80E-06 2.64E-06 8 MF GO:0004715 Non-membrane spanning protein tyrosine kinase activity 2.28E-06 0.001066 0.001002 13 [101]Open in a new tab Abbreviations: BP, biological process; CC, cellular component; GO, Gene Ontology; MF, molecular function. Table 2. KEGG pathway analysis of turquoise module ID Description GeneRatio p-value p. adjust qvalue Count hsa04658 Th1 and Th2 cell differentiation 29/497 4.61E-13 1.33E-10 1.06E-10 29 hsa04659 Th17 cell differentiation 31/497 9.01E-13 1.33E-10 1.06E-10 31 hsa04640 Hematopoietic cell lineage 27/497 7.70E-11 6.66E-09 5.30E-09 27 hsa04672 Intestinal immune network for IgA production 19/497 8.99E-11 6.66E-09 5.30E-09 19 hsa05150 Staphylococcus aureus infection 19/497 1.31E-09 7.73E-08 6.16E-08 19 hsa05166 Human T-cell leukemia virus 1 infection 39/497 1.00E-08 4.94E-07 3.93E-07 39 hsa04662 B cell receptor signaling pathway 20/497 1.84E-08 7.79E-07 6.20E-07 20 hsa05416 Viral myocarditis 18/497 2.42E-08 8.94E-07 7.12E-07 18 hsa05310 Asthma 13/497 2.92E-08 9.60E-07 7.65E-07 13 hsa05169 Epstein-Barr virus infection 36/497 3.25E-08 9.61E-07 7.66E-07 36 hsa04064 NF-kappa B signaling pathway 23/497 3.70E-08 9.95E-07 7.92E-07 23 hsa04514 Cell adhesion molecules (CAMs) 28/497 1.96E-07 4.83E-06 3.85E-06 28 hsa05140 Leishmaniasis 19/497 2.13E-07 4.85E-06 3.86E-06 19 hsa05330 Allograft rejection 13/497 4.97E-07 1.05E-05 8.37E-06 13 hsa05332 Graft-versus-host disease 13/497 1.34E-06 2.65E-05 2.11E-05 13 hsa04612 Antigen processing and presentation 18/497 1.98E-06 3.66E-05 2.92E-05 18 hsa04062 Chemokine signaling pathway 31/497 2.18E-06 3.80E-05 3.02E-05 31 hsa04940 Type I diabetes mellitus 13/497 2.47E-06 4.06E-05 3.23E-05 13 hsa05340 Primary immunodeficiency 12/497 2.63E-06 4.10E-05 3.27E-05 12 hsa05321 Inflammatory bowel disease (IBD) 16/497 3.57E-06 5.28E-05 4.21E-05 16 hsa05145 Toxoplasmosis 22/497 3.96E-06 5.58E-05 4.45E-05 22 hsa05320 Autoimmune thyroid disease 14/497 5.98E-06 7.70E-05 6.14E-05 14 hsa05152 Tuberculosis 29/497 5.99E-06 7.70E-05 6.14E-05 29 hsa04660 T cell receptor signaling pathway 20/497 8.33E-06 0.000103 8.18E-05 20 hsa05130 Pathogenic Escherichia coli infection 14/497 9.57E-06 0.000113 9.03E-05 14 [102]Open in a new tab Abbreviation: KEGG, Kyoto Encyclopedia of Genes and Genomes. Figure 6. [103]Figure 6 [104]Open in a new tab GO and KEGG pathway enrichment analyses of turquoise module genes. (A) Dotplot of GO analysis divided into three functional groups: biological processes, cell composition and molecular function. (B) UpSet plot of KEGG pathway enrichment, visualizing the complex association between genes and different pathways. It emphasizes the genes overlapping among different pathways. (C) plotGOgraph of Biological Processes depicting tree-like relationships between different processes. (D) The terms of Biological Processes are organized as a directed acyclic graph. An insightful way to view the results of the analysis is to investigate how the significant Biological Processes are distributed over the GO graph. The goplot function shows the subgraph induced by the most significant terms. Abbreviations: GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes. Identification of hub genes in the turquoise module We used the Matthews correlation coefficient (MCC) algorithm and degree to rank the genes in the turquoise module (see [105]Tables S5 and [106]S6 for more information). The first 30 genes were selected and intersected with 18 hub genes selected by HNSCC data from the TCGA database by WGCNA.[107]^26 The results are shown in [108]Figure S3A (Detailed results are shown in [109]Table 3). [110]Figure S3B is a plot of the connectivity among the first 30 genes selected by degree sequencing (see [111]Table S7 for more information). We screened these genes and identified 16 genes that have an impact on survival in patients with HNSCC by GEPIA.[112]^27 The results are shown in [113]Figure 7, and these genes are considered as hub genes. Table 3. Common genes between different algorithms and the hub gene obtained from TCGA Categories Elements Count MCC & TCGA & degree NCKAP1L/IL10RA/TRAF3IP3/WAS/SASH3 5 MCC & degree CORO1A/SP140/GIMAP4/PPP1R16B/IRF8/PRKCB1/HCST/PARVG/CD53/DOCK2/AKNA/ITK /GIMAP7/TBC1D10C/GIMAP5/ARHGAP9/ITGAL/BTK 18 TCGA & degree SLA/APBB1IP 2 Degree only P2RY8/GPSM3/CCR7/SH2D1A/LAT2 5 MCC only ARHGEF6/RCSD1/FAM65B/GMFG/DOCK8/FAIM3/Septin 6 7 TCGA only PTPN22/FERMT3/GIMAP6/DOK2/CELF2/SELPLG/TNFRSF1B/MYO1F/WIPF1/UBASH3A/AMI CA1 11 [114]Open in a new tab Abbreviations: TCGA, The Cancer Genome Atlas; MCC, Matthews correlation coefficient. Figure 7. [115]Figure 7 [116]Open in a new tab Survival analysis of 16 hub genes from the turquoise module. They are ARHGAP9, SASH3, CORO1A, ITGAL, PPP1R16B, TBC1D10C, IL10RA, ITK, AKNA, PRKCB, TRAF3IP3, GIMAP4, CCR7, P2RY8, GIMAP7 and SP140. P<0.05 was regarded as significant. Validation and exploration of hub genes GEPIA was used to validate the changes in these 16 hub genes in normal and tumor tissues. [117]Figure 8 shows the expression difference of these genes between normal and tumor tissues in HNSCC. The expression of most genes in tumors was higher than that in normal tissues. Except for PPP1R16B, there was no significant difference between tumors and normal tissues. GIMAP7 expression in normal tissues was higher than that in tumors. CBioPortal[118]^28^,[119]^29 was used to further explore these genes, and queried genes were altered in 183 (37%) patients. The frequency of change of each hub gene in HNSCC is shown in [120]Figure 9A, and TBC1D10C is altered the most (10%). We can tell that the upregulation and amplification of these genes are the main changes in HNSCC. [121]Figure 9B shows 16 queried genes and the 50 most frequently changed neighbor genes, which also introduces the relationship between central genes and cancer drugs. ITK and ITGAL are the targets of cancer drugs. [122]Figure S4 is the relationship between the methylation and gene expression of these genes in HNSCC explored in CBioPortal. ARHGAP9, SASH3 and P2RY8 have no methylation information. The expression of other genes is negatively correlated with the degree of methylation, indicating that the upregulation of these genes may be caused by hypomethylation. Figure 8. Figure 8 [123]Open in a new tab Validation of the hub genes at the transcriptional level. Notes: Validation of the hub genes by the TCGA HNSCC dataset in GEPIA. The expression of most genes in tumors was higher than that in normal tissues. Except for PPP1R16B, there was no significant difference in expression between tumors and normal tissues. GIMAP7 expression in normal tissues was higher than that in tumors. Abbreviations: TCGA, The Cancer Genome Atlas; HNSCC, Head and neck squamous cell carcinoma. Figure 9. [124]Figure 9 [125]Open in a new tab (A) The alteration frequency of 16 hub genes in the TCGA HNSCC dataset is illustrated. (B) Gene network and tumor drugs associated with the hub genes in the TCGA HNSCC dataset. Abbreviation: TCGA, The Cancer Genome Atlas. Discussion HNSCC is a common malignant tumor caused by squamous cell abnormalities. The 5-year survival rate of patients with early HNSCC is 40–60%. However, for patients with local recurrence and distant metastasis, the median survival time after palliative chemotherapy is only 6–9 months. The survival time of advanced patients with chemotherapy tolerance has decreased to 3–6 months.[126]^30 Therefore, studying the development of HNSCC at the molecular level and formulating effective measures to prevent and suppress the metastasis of HNSCC are of great significance for the prevention and treatment of HNSCC. Weighted network analysis can be used to provide a holistic view of disease dynamics, while also enabling us to reduce complexity to a direct relationship. This method may reveal new discoveries of the disease status and mechanism of tumor evolution. It has been used to analyze the molecular mechanisms of many diseases, such as breast cancer[127]^31 and inflammatory bowel disease.[128]^32 In this study, we used this important tool to analyze HNSCC samples in the [129]GSE65858 dataset. We used this method to transform the expression data from 5,000 genes to 16 modules. Next, we combined them with various clinical information. In these modules, we further focused on gene pivots highly related to various clinical features. Functional enrichment of key module genes was analyzed by R software. Sixteen hub genes significantly related to the survival rate were identified in key modules and were further validated in TCGA to verify the reliability of the results. We also compared the hub genes obtained by WGCNA from the TCGA data of HNSCC,[130]^33 and found a high degree of overlap, indicating that this module is highly preservative, and these genes are indeed related to the pathogenesis of HNSCC. CBioPortal was used to explore hub genes and their genetic alteration. Based on the measurable changes in network analysis, more samples were analyzed. These gene changes provide new insights into the molecular explanation of the pathogenesis of HNSCC, and the hub genes may be used as biomarkers and therapeutic targets for the accurate diagnosis and treatment of HNSCC in the future, providing an early warning to clinicians about patients at risk for progressive disease development. Current advances in cancer treatment are focused on precise/personalized approaches based on detectable molecular abnormalities in patients’ tumors, which can then be utilized in a targeted manner. HNSCC evades immune responses through multiple resistance mechanisms. It is characterized by an immunosuppressive environment which includes the release of immunosuppressive factors,[131]^34 activation, expansion of immune cells with inhibitory activity and the reduction of tumor immunogenicity.[132]^35^,[133]^36 This is why HNSCC has long been considered an immunosuppressive disease. Patients often show a low absolute lymphocyte count, spontaneous apoptosis of cytotoxic T lymphocytes (CTLs), deficiencies in NK cell activity and antigen presentation dysfunction. The key module selected by WGCNA also confirms this view. The preservation of this module is very strong, indicating that the genes in it are closely related and highly correlated. Moreover, its gene function enrichment analysis is mainly focused on immune-related pathways, suggesting that the pathogenesis of HNSCC may be related to immune disorders. The survival analysis of hub genes reveals that the expression of immune-related genes may affect the prognosis of patients with HNSCC. Survival analysis in HNSCC patients showed that patients with high expression of these genes had a higher survival rate (p<0.05), with biological significance. To further explore the expression differences of these genes in normal and tumor tissues, we found that the expression of many genes in tumors was higher than that in normal tissues, indicating that these immune-related genes were activated in HNSCC and played roles such as inhibiting the process of tumorigenesis and formation of the local tumor microenvironment. These data also provide us with a new idea for the treatment of HNSCC. Ideally, the immune system can play an active immune killing role by recognizing the relevant or specific antigens of tumors. If the immunogenicity of tumor cells is weakened, they can grow and spread without the control and establish a local immunosuppressive microenvironment.[134]^37 Among all malignant tumors, head and neck squamous cell carcinoma has a high mutation rate,[135]^38 and the establishment of an immunosuppressive microenvironment is the key to tumorigenesis and development.[136]^39 In recent years, tumor immunotherapy has gradually become one of the hotspots in clinical and basic research of cancer and has made breakthroughs in the treatment of melanoma, lung cancer and other tumors.[137]^40 Immunotherapy of tumors is mainly aimed at curing tumors by improving the patient’s own immune system and inducing an anti-tumor immune response.[138]^41 Cytokines are often in a state of imbalance in the tumor microenvironment of head and neck squamous cell carcinomas—that is, immunosuppressive cytokines are more than immunostimulatory cytokines, thus promoting the immune escape of head and neck squamous cell carcinomas. Compared with healthy people, HNSCC patients have fewer absolute T lymphocyte counts in peripheral blood and more inhibited regulatory T cells.[139]^42 At the same time, the tumor and its surrounding stromal cells can secrete many cytokines such as IL10 and tumor growth factor to form an immunosuppressive tumor microenvironment, which can avoid killing by reducing the expression of tumor antigen and promoting the apoptosis of cytotoxic T lymphocytes.[140]^43 Additionally, the inhibition of natural killer cell activity and damage of the dendritic cell antigen-presenting function are also important mechanisms to avoid epidemic surveillance of head and neck tumors.[141]^44 Therefore, immunotherapy for head and neck squamous cell carcinoma has great potential. We analyzed the hub genes screened in this study. ARHGAP9 (Rho GTPase Activating Protein 9), a member of the RhoGAP family, has been identified as a RhoGAP for Cdc42 and Rac1.[142]^45 RhoGAP proteins promote the hydrolysis of GTP-bound Rho GTP enzymes, thus inactivating Rho GTP enzymes and inhibiting various cellular processes, such as gene transcription, cytoskeleton tissue, cell proliferation, migration and invasion.[143]^46 Recent studies have found that ARHGAP9 inhibits the migration and invasion of hepatocellular carcinoma cells by upregulating FOXJ2/E-cadherin.[144]^47 We speculate that ARHGAP9 may play a similar role in the genesis of HNSCC. GTPase, a family of immuno-related proteins (GIMAP), is most widely expressed in the immune system and is differentially regulated during early human Th-cell differentiation.[145]^48 The highly conserved human GIMAP gene region consists of seven functional genes and one pseudogene.[146]^49 Although they are likely to be regulated similarly, subcellular localization and structural differences indicate that each gene has a specific function.[147]^50 GIMAP4 plays a role in Th-cell secretion.[148]^51 But there is little research on GIAMP7, and the expression of GIMAP7 in normal tissues is higher than that in tumor tissues. This also indicates that there are different functions of genes in this family. We speculate that GIMAP7 may play a regulatory role in the regulation of the immune response in coordination with other molecules. It can be used as a potential molecular target for future exploration. SASH3 was also found to have associated changes in oligodendroglioma,[149]^52 and methylation of the PPP1R16B locus was reported in patients with colorectal cancer.[150]^53 CORO1A is an actin regulatory protein mainly expressed in hematopoietic cells that is essential for the development of T cells and homeostasis.[151]^54 These genes have not yet been reported to play a role in HNSCC and warrant further exploration. The role of IL10 in the regulation of the immune response during tumorigenesis has been widely described,[152]^55 which corresponds to what we found in this study that the higher expression of IL10RA in HNSCC than in normal samples. Our study also found that the expression of IL10RA affect the prognosis of HNSCC, presumably related to the different expression of IL10. TBC1D10C was previously proven to bind and inhibit Ras and Calcineurin proteins. It has been proven that TBC1D10C exhibits a physical interaction with H-Ras and CaN in T cells, inhibits the Ras/MAPK signaling pathway and is a negative feedback inhibitor of the CaN signaling pathway.[153]^56 It was hypothesized that human AKNA is encoded by a single gene located in the FRA9E region of chromosome 9q32[154]^57 that is associated with functional deletion mutations and often leads to inflammatory and neoplastic diseases.[155]^58 Recent findings that single-nucleotide polymorphisms (SNPs) in the AT-hook domain of human AKNA increase the risk of cervical cancer support this hypothesis.[156]^59 Overall, these genes play an important role in HNSCC tumorigenesis. However, the primary limitation of the present study was that these important hub genes remain to be verified by experiments; therefore, further analyses are required to determine the mechanisms underlying the growth of HNSCC cells. Further research is required to fully explore their roles in HNSCC. In general, the innovative use of a weighted network to analyze HNSCC samples provides new insights into the molecular mechanism and prognosis of HNSCC. We can combine more information in future research, such as SNPs, leading to better understanding of HNSCC. Acknowledgments