Abstract

Purpose

   Head and neck squamous cell carcinoma (HNSCC) is the sixth most
   prevalent cancer in the world, accounting for more than 90% of head and
   neck malignant tumors. However, its molecular mechanism is largely
   unknown. To help elucidate the potential mechanism of HNSCC
   tumorigenesis, we investigated the gene interaction patterns associated
   with tumorigenesis.

Methods

   Weighted gene co-expression network analysis (WGCNA) can help us to
   predict the intrinsic relationship or correlation between gene
   expression. Additionally, we further explored the combination of
   clinical information and module construction.

Results

   Sixteen modules were constructed, among which the key module most
   closely associated with clinical information was identified. By
   analyzing the genes in this module, we found that the latter may be
   related to the immune response, inflammatory response and formation of
   the tumor microenvironment. Sixteen hub genes were identified—ARHGAP9,
   SASH3, CORO1A, ITGAL, PPP1R16B, TBC1D10C, IL10RA, ITK, AKNA, PRKCB,
   TRAF3IP3, GIMAP4, CCR7, P2RY8, GIMAP7, and SP140. We further validated
   these genes at the transcriptional and translation levels.

Conclusion

   The innovative use of a weighted network to analyze HNSCC samples
   provides new insights into the molecular mechanism and prognosis of
   HNSCC. Additionally, the hub genes we identified can be used as
   biomarkers and therapeutic targets of HNSCC, laying the foundation for
   the accurate diagnosis and treatment of HNSCC in clinical and research
   in the future.

   Keywords: head and neck squamous cell carcinoma, wgcna, co-expression,
   gene module, hub gene, immune response-related genes

Introduction

   Head and neck squamous cell carcinoma (HNSCC) is a common malignant
   cancer type that originates from squamous cells of the upper
   respiratory tract[30]^1 (eg, the oral cavity, pharynx, and larynx) and
   upper gastrointestinal tract. HNSCC is the sixth most prevalent cancer
   in the world,[31]^2 accounting for more than 90% of head and neck
   malignancies. As a heterogeneous disease, HNSCC has different prognoses
   due to its diverse anatomical locations, disease progression and
   etiology.[32]^3 Smoking, drinking and infection with human
   papillomavirus (HPV) are all risk factors for the occurrence and
   development of HNSCC.[33]^4^,[34]^5 Despite surgery, radiotherapy and
   chemotherapy, approximately half of the patients die from this
   disease,[35]^6 explaining why exploring the molecular mechanism of its
   tumorigenesis and finding new targets for drug therapy are needed. In
   recent years, research on the molecular mechanism of HNSCC is in
   progress, and the discovery of important pathway alterations, such as
   Notch,[36]^7 Ras,[37]^8 and VEGF signaling,[38]^9 has led to the
   emergence of some related targeted drugs that are expected to improve
   the survival and prognosis of HNSCC patients.[39]^10 Therefore, it is
   important to study the molecular mechanism of HNSCC for early diagnosis
   and optimal treatment. Exploring the molecular mechanism underlying
   HNSCC invasion and metastasis is of great significance in addressing
   the issue of identifying new drug targets for future therapy.

   From chip technology to high-throughput sequencing technology,
   biological data have increased exponentially due to the development of
   biological detection methods.[40]^11 An integrated organism is a
   complex network comprising many genes, RNA and proteins. In the
   occurrence of various diseases, especially cancer, there are many
   interactions among these components. Traditional methods are not
   suitable to this complex interaction network. Therefore, the biological
   network analysis method based on high-throughput data arises at the
   historic moment.[41]^12 The signaling pathway network, protein
   interaction network, metabolic network, gene regulation network and
   gene co-expression network are common biological networks.[42]^13 The
   most representative gene co-expression network is weighted gene
   co-expression network analysis (WGCNA).[43]^14 The aim of this method
   is to identify synergistic expression gene modules and explore the
   relationship between gene networks and phenotypes, as well as between
   hub genes in the network.[44]^15 WGCNA emphasizes the change of modules
   rather than individual genes, greatly alleviating the problem of
   multiple comparisons.[45]^16 Additionally, it links phenotypes with
   gene expression, which is propitious for further exploration. WGCNA has
   provided significant results in the gene analysis of many species, such
   as humans and mice,[46]^17 and has become a widely used network
   analysis tool.

   In order to identify mechanistically informative features of HNSCC, in
   this study, weighted gene co-expression network analysis was used to
   analyze the gene expression profiles of 270 HNSCC samples from the
   [47]GSE65858 dataset. Several modules have been obtained that were
   combined with clinical information of patients to determine the
   biological significance by further analysis. We identified a
   significant correlation between the genes in the turquoise module and
   some clinical traits, leading us to further excavate the genes in this
   module. Finally, we validate these hub genes in The Cancer Genome Atlas
   (TCGA) dataset to confirm that these genes do have biological
   significance in the tumorigenesis of HNSCC. Our study lays the
   foundation for searching potential biomarkers to treat HNSCC and
   further experimental verification.

Materials and methods

Data information

   The HNSCC tissue samples used in this study were all from the
   [48]GSE65858 dataset of NCBI Gene Expression Omnibus
   ([49]https://www.ncbi.nlm.nih.gov/). Two hundred seventy samples were
   obtained from HNSCC patients. The gene expression data were obtained
   using the [50]GPL10558 Illumina HumanHT-12 V4.0 expression beadchip
   sequencing platform. The corresponding clinical information (including
   age, sex, smoking, alcohol consumption, tumor location, tumor grade,
   tumor stage, metastasis, and HPV16 infection) was also obtained from
   the [51]GSE65858 dataset.

   Because some data were missing, the use of WGCNA for subsequent
   analysis required the integrity of the gene expression data. Thus, the
   missing data needed to be processed to ensure subsequent analysis. The
   advantage of deleting missing data rows or columns as a processing
   method is that no errors are introduced. First, we manually removed 20
   samples of patients with clinical information deficiency and retained
   250 samples with complete information for further follow-up analysis.
   Eighty-four (33.6%) of the tumors occurred in the oropharynx, 82
   (32.8%) occurred in the cavum oris, 48 (19.2%) occurred in the larynx,
   32 (12.8%) occurred in the hypopharynx and 4 contained missing
   information. In 196 samples, the expression of HPV 16 DNA was negative,
   19 samples were HPV 16 dna+rna-, and 35 samples were HPV 16 dna+rna+.
   Number assignment of samples’ clinical traits can be seen in [52]Table
   S1.

   Network module analysis is vulnerable to the influence of outlier
   samples; thus, removing outlier samples before constructing the network
   is particularly important to obtain meaningful subsequent analysis
   results. The goodGeneSample function in WGCNA was used to remove genes
   and samples with excessive missing values, and then the outlier samples
   were removed by the method of sample hierarchical clustering-pruning
   graph. The specific methods were as follows: 1) building a hierarchical
   clustering tree for all samples and drawing and 2) removing outliers
   whose leaf node height was significantly higher than other samples.

Co-expression network construction

   We used the Affy package (using version 3.5.1 R software) to preprocess
   and standardize matrix files (RMA standardization). RMA is used to
   correct background, and impute is used to supplement missing values.
   According to the order of standard deviation (SD) from large to small,
   we selected the first five thousand genes to construct WGCNA,[53]^18
   which not only reduces the size of the whole network and computational
   load but also maintains a scale-free topological network. WGCNA
   realizes network construction in the form of a “soft threshold”. The
   essence of the soft threshold method is to transform the correlation
   coefficients of similar matrices to adjacent functions in the form of
   function transformation and then calculate the topological overlap
   matrix. Next, we used pickSoftThreshold to select the appropriate soft
   threshold. The scale-free topology fit index, as well as the mean
   connectivity, serves as a function of the soft-thresholding power. By
   calculating the scale-free topology fit index and mean connectivity for
   several powers, we can choose an appropriate soft-thresholding power to
   build network. WGCNA uses dissimilarity to cluster. The specific
   algorithm used is the topological overlap dissimilarity measure
   (TOM)[54]^19 to calculate the degree of association between genes.
   Using hierarchical clustering function, the genes with a similar
   expression spectrum are classified into TOM-based modules. We used
   these 5,000 genes to visualize gene networks. Gray refers to the
   background gene without any module. Different modules are built from
   dendrograms.

Module preservation across the network

   Two hundred forty-five samples clustered after removing outliers were
   divided into the training and validation sets using the
   createDataPartition function in R. The advantage of createDataPartition
   is that it can randomly extract the training set we need from the
   low-entropy dataset. We calculate the preservation of different modules
   by calculating the Z value, Z=observed−mean[permuted]/sd[permuted].
   This is a permutation test. It repeatedly permutes the gene labels in
   the test network to estimate the mean and standard deviation under the
   hypothesis of null preservation. If Z is greater than 10, the module
   indicates strong preservation; if Z is less than 10 and greater than 2,
   the module is weakly preserved; if less than 2, the module is not
   preserved. Statistical significance is evaluated by bootstrapping
   (N=200 permutations).[55]^20

Identification of clinically significant modules

   In the analysis, we combined modules with a various nontime dependent
   variables. First, we calculate the eigengene (module Eigengenes, ME) of
   the module, which is defined as the first principal component of all
   gene expression level vectors in the module. Next, for an optional
   gene, we defined its gene significance (GS) as the correlation
   coefficient between its expression level and dependent variable level.
   Continuous dependent variables are calculated by the Pearson
   correlation coefficient. For a module, the modular significances (MSs)
   relative to a dependent variable are defined as the correlation
   coefficients between its characteristic genes and level of dependent
   variables. The module membership (MM) of any gene in a module is
   defined as the correlation coefficient between the gene and
   characteristic gene of the module. It measures the degree of
   subordination of a gene in the module that can be used to screen the
   important genes in the module. In the module analysis, if a module and
   a dependent variable have a high-significance MS, the gene in the
   module may be an important factor affecting the dependent variable Y.
   Under this condition, it is necessary to carry out in-module analysis
   to screen the important genes in the module. The specific methods are
   as follows: For gene i in the module, a scatter plot of the module
   membership degree (MM) relative to the gene significance (GS) is made,
   and the genes with high MM and GS are selected as the objects for
   further analysis.

Function enrichment analysis

   In the postgenomic era, knowledge about gene functional pathways is
   increasing daily. Kyoto University and Tokyo University jointly
   developed the Kyoto Encyclopedia of Genes and Genomes (KEGG) database,
   which integrates a large amount of knowledge about gene, protein,
   biological pathways and other interactive networks to facilitate the
   use of gene information.[56]^21 The Gene Ontology (GO) database is a
   platform for standardized description or interpretation of the
   terminology of gene and gene product characteristics. It is a platform
   for bioinformatics researchers to unify the induction,
   processing，interpretation and sharing of gene and gene product
   data.[57]^22 In this study, “clusterProfiler”[58]^23 and “org.Hs.eg.db”
   packages in R software were used to analyze GO and KEGG, annotate the
   genes of interest in the modules, and try to identify the enrichment
   pathway of the genes in the modules. We set p-valueCutoff =0.01 and
   qvalueCutoff =0.05.

Identification of hub genes in the key module

   We believe that genes that are highly interconnected with other nodes
   in the module are functionally important. In this study, key modules
   were selected and visualized by cytoscape, and hub genes in key module
   networks were screened according to degree. First, we selected the
   weight value of more than 0.3 by weight ranking, and then identified
   the top 30 genes by degree. We believe that these genes are highly
   interconnected with other genes and have biological significance.

Validation and exploration of hub genes

   TCGA is a joint project initiated by NCI and NHGRI in the United States
   that provides a complete atlas related to all changes in the cancer
   genome. CBioPortal ([59]http://www.cbioportal.org/) acts as a web page
   for cancer genome research that can be used to browse, visualize and
   analyze multidimensional cancer genome data. It provides graphical
   information aggregation, network analysis and survival analysis at the
   gene level on multiple platforms.
   GEPIA([60]http://GEPIA.cancer-pku.cn/) is a web server to analyze the
   RNA sequencing expression data of 9,736 tumors and 8,587 normal samples
   from the TCGA and The Genotype-Tissue Expression (GTEx) projects using
   a standard processing pipeline. GEPIA was used to analyze the survival
   rate of 30 candidate genes. According to similar
   literatures,[61]^24^,[62]^25 the genes with logrank p-value less than
   0.05 were identified as real hub genes. Those genes can affect
   prognosis. Next, we used GEPIA to check the differences in the
   expression of hub genes between normal and tumor tissues. In this
   study, we used CBioPortal to verify the related genetic changes in the
   hub genes identified in HNSCC samples in the TCGA database and explore
   the correlation between the hub genes and other genes and drugs. We
   hope to uncover relevant drug targets and provide direction for further
   research.

Results

Expression value analysis of microarray data

   We downloaded the gene expression matrix files of 270 HNSCC samples
   from [63]GSE65858 ([64]https://www.ncbi.nlm.nih.gov/) and manually
   deleted 20 samples for which clinical information is missing. Finally,
   250 samples were retained to carry out in subsequent analysis. The
   background was corrected, and the data were normalized by R software.
   The visualization results of normalization are shown in [65]Figure S1.
   Gene annotations were carried out using R package, and probes
   corresponding to multiple genes were removed. The average value of
   genes corresponding to multiple probes was taken as the expression
   level of the gene. After obtaining the genes, we selected the first
   5,000 genes in SD descending order to construct a co-expression
   network. We used the hclust function of the WGCNA packages to cluster
   250 samples. As shown in [66]Figure 1A, we can see the sample
   clustering tree. The samples are divided into two clusters. We chose 60
   as the cutoff value; after 5 outliers were removed, 245 samples were
   retained for subsequent WGCNA analysis. Next, we reclustered the 245
   samples and corresponded them with clinical traits to obtain the sample
   tree and characteristic heat map ([67]Figure 1B).

Figure 1.

   [68]Figure 1
   [69]Open in a new tab

   Construction of the co-expression network using the WGCNA package in R.

   Notes: (A) Sample clustering to detect outliers. Five outliers were
   removed after clustering—[70]GSM1607776, [71]GSM1607734,
   [72]GSM1607907, [73]GSM1607816 and [74]GSM1607849. cutHeight =60. (B)
   Recluster samples and clinical traits dendogram. The color intensity
   was proportional to the gender, age, packyears, alcohol consumption,
   tumor type, tumor site, u1cc stage, t category, n category, distant
   metastasis and HPV16 infection.

   Abbreviations: WGCNA, weighted gene co-expression network analysis;
   HPV, human papillomavirus.

Construction of the weighted co-expression network and preservation

   To determine the optimal value of the soft threshold, it is necessary
   to search the power value that can make the adjacent function suit the
   scale-free condition in a certain range. As shown in [75]Figure 2A and
   [76]B, we chose 4 as the power value when the Scale-free topology fit
   index nearly equals to 0.9 and the mean connectivity is also close to
   0. We constructed the co-expression matrix using the expression matrix
   and the estimated optimal power value. The connectivity between genes
   was calculated, and similarity between genes was calculated
   accordingly. Next, the coefficient of dissimilarity among genes was
   deduced, and the hierarchical clustering gene dendrogram was obtained
   accordingly. According to the standard of dynamicTreecut, the number of
   genes was set to 30 as the minimum number of each module. Here, we used
   different colors to represent all the modules, among which gray, by
   default, represented the genes that could not be classified into any
   module. Sixteen modules were finally obtained, and the visualization
   results are shown in [77]Figure 3A. There were 229 genes in the black,
   645 in the blue, 456 in the brown, 54 in the cyan, 393 in the green,
   138 in the magenta, 50 in the midnightblue, 165 in the pink, 124 in the
   purple, 319 in the red, 66 in the salmon, 75 in the tan, 1092 in the
   turquoise, 413 in the yellow, and 601 in the gray modules. Detailed
   information of these genes can be found in [78]Table S2. We verified
   the module preservation by randomly selecting test sets and training
   sets in the dataset and found that all modules had high preservation,
   especially the turquoise module ([79]Figure 3B).

Figure 2.

   [80]Figure 2
   [81]Open in a new tab

   Determination of the soft-thresholding power.

   Notes: (A) Analysis of the scale-free fit index for various
   soft-thresholding powers (β). (B) Analysis of the mean connectivity for
   various soft-thresholding powers. The best fit power value was 4.

Figure 3.

   [82]Figure 3
   [83]Open in a new tab

   (A) Cluster dendrogram of genes. Each branch in the figure represents
   one gene, and every color below represents one co-expression module.
   (B) Zsummary statistics of module preservation. The figure shows the
   preservation Zsummary statistic (y-axis) as a function of the module
   size. The blue (low) and green (high) lines are thresholds indicating
   the 2< Z <10 region corresponding to moderate/high preservation.

Relating clinical traits to modules and correlation between modules

   The Pearson correlation coefficients of the module Eigengenes and
   corresponding variables represent the correlation between the modules
   and clinical traits. The results are shown in [84]Figure 4. Among the
   clinical information included in the study, gender, age, smoking,
   alcohol consumption, tumor type, site, grade, stage, metastasis and
   infection of HPV16 were nontime-related variables. We found that the
   absolute correlation coefficients of each module with gender, tumor
   type and metastasis were small, and the correlation was almost
   insignificant. Age, smoking and alcohol consumption were weakly
   correlated with some modules. The site, grade, stage of tumors and
   infection of HPV16 are strongly correlated with some modules. It was
   obvious that the brown module, green module and turquoise module show a
   significant correlation with clinical traits. As illustrated in
   [85]Figure 4, to further analyze genes in these modules, we found a
   strong correlation between the Gene Significance for the tumor site and
   Module Membership in the turquoise module (cor =0.58, p=3.6e−99), and
   the turquoise module was negatively correlated with the tumor site
   (cor =−0.23, p=(3e−4)) ([86]Figure 5A). There is a negative correlation
   between genes in the brown module and N category, as shown in
   [87]Figure 5B. Additionally, GS for the infection status of HPV16 was
   negatively correlated with MM in the green module, as shown in
   [88]Figure 5C. Additionally, we constructed a hierarchical clustering
   tree based on WGCNA module eigengenes and the heat map of adjacency
   between modules. The colors are from low adjacency (green) to high
   adjacency (red) to show the relationship between the eigengenes of the
   WGCNA module ([89]Figure 5D).

Figure 4.

   [90]Figure 4
   [91]Open in a new tab

   Heatmap of the correlation between the module eigengenes and clinical
   traits. Turquoise, green and brown modules are three most closely
   related modules to clinical traits.

Figure 5.

   [92]Figure 5
   [93]Open in a new tab

   (A) Scatter plot of gene significance for the tumor site and module
   membership in the turquoise module. (B) Scatter plot of gene
   significance for the n category and module membership in the brown
   module. (C) Scatter plot of gene significance for HPV16 infection and
   module membership in the green module. (D) Eigengene dendrogram and
   eigengene adjacency heatmap.

Identification of key modules

   We constructed a network heatmap plot with these 5,000 genes, as shown
   in [94]Figure S2. The different colors of the horizontal axis and
   vertical axis represented different modules, and the color differences
   of each block could be clearly seen. Among them, the turquoise module
   had the darkest yellow color. At the same time, we found that the
   turquoise module has a significant correlation with many clinical
   features in [95]Figure 4. Thus, this module was identified as a key
   module. Next, we will further analyze the genes in this module.

Function enrichment analysis in the turquoise module

   We performed enrichment analysis to explore the GO database and KEGG
   pathway in which the turquoise module genes are involved. The main
   results are shown in [96]Tables 1 and [97]2. The enrichment results
   showed that the biological process of turquoise module genes mainly
   involved inflammation-related and immune responses, such as T-cell
   activation, leukocyte cell-cell adhesion, positive regulation of T
   cells, and lymphocyte and leukocyte activation. The cell components
   associated with it included the Major Histocompatibility Complex (MHC)
   class II protein complex, MHC protein complex, membrane raft, and
   T-cell receptor complex, which are spatially related to the surface and
   extracellular space of immune cells. Simultaneously, the molecular
   function of the module was related to MHC class II receptor activity
   and nonmembrane spanning protein tyrosine kinase activity (see
   [98]Tables S3 and [99]S4 for more information). More specifically,
   pathway enrichment analysis of the module indicated that inflammation
   and the tumor microenvironment evolution mechanisms are related to
   these genes. Important pathways such as Th1 and Th2 cell
   differentiation, Th17 cell differentiation, hematopoietic cell lineage,
   and intestinal immune network for IgA production were enriched. Other
   pathways such as Staphylococcus aureus infection，viral
   myocarditis，Epstein-Barr virus infection，and human T-cell leukemia
   virus 1 infection also showed enrichment. The visualization of the
   enrichment results is shown in [100]Figure 6. The characteristic
   information may further promote our research on the pathogenesis and
   tumorigenesis of HNSCC.

Table 1.

   GO analysis of genes in turquoise module
   ONTOLOGY ID Description p-value p. adjust q-value Count
   BP GO:0042110 T cell activation 5.75E-26 2.92E-22 2.52E-22 89
   BP GO:0007159 Leukocyte cell-cell adhesion 2.06E-21 5.23E-18 4.52E-18
   68
   BP GO:0050867 Positive regulation of cell activation 4.76E-20 8.07E-17
   6.97E-17 69
   BP GO:0050870 Positive regulation of T cell activation 1.33E-19
   1.44E-16 1.25E-16 51
   BP GO:0051249 Regulation of lymphocyte activation 1.42E-19 1.44E-16
   1.25E-16 78
   BP GO:1903039 Positive regulation of leukocyte cell-cell adhesion
   2.53E-19 2.15E-16 1.85E-16 52
   BP GO:0050863 Regulation of T cell activation 3.91E-19 2.84E-16
   2.46E-16 63
   BP GO:0002696 Positive regulation of leukocyte activation 5.76E-19
   3.66E-16 3.16E-16 66
   BP GO:0070661 Leukocyte proliferation 2.41E-18 1.36E-15 1.17E-15 57
   BP GO:0022409 Positive regulation of cell-cell adhesion 6.12E-18
   3.11E-15 2.69E-15 54
   BP GO:1903037 Regulation of leukocyte cell-cell adhesion 6.81E-18
   3.15E-15 2.72E-15 59
   BP GO:0022407 Regulation of cell-cell adhesion 8.12E-18 3.44E-15
   2.97E-15 68
   BP GO:0046651 Lymphocyte proliferation 4.99E-17 1.95E-14 1.69E-14 53
   BP GO:0032943 MONONUCLEAR cell proliferation 7.07E-17 2.57E-14 2.22E-14
   53
   BP GO:0045785 Positive regulation of cell adhesion 2.04E-16 6.91E-14
   5.97E-14 66
   BP GO:0051251 Positive regulation of lymphocyte activation 2.26E-16
   7.19E-14 6.21E-14 58
   BP GO:0002521 Leukocyte differentiation 5.83E-16 1.74E-13 1.50E-13 74
   BP GO:0030098 Lymphocyte differentiation 2.00E-15 5.66E-13 4.89E-13 58
   BP GO:0002768 Immune response-regulating cell surface receptor
   signaling pathway 5.64E-15 1.51E-12 1.30E-12 72
   BP GO:0050851 Antigen receptor-mediated signaling pathway 7.82E-15
   1.99E-12 1.72E-12 52
   BP GO:0042113 B cell activation 3.43E-14 8.29E-12 7.16E-12 50
   BP GO:0002429 Immune response-activating cell surface receptor
   signaling pathway 1.93E-13 4.46E-11 3.85E-11 66
   BP GO:0050670 Regulation of lymphocyte proliferation 8.15E-13 1.80E-10
   1.56E-10 40
   BP GO:0032944 Regulation of mononuclear cell proliferation 9.64E-13
   1.96E-10 1.70E-10 40
   BP GO:0001819 Positive regulation of cytokine production 9.66E-13
   1.96E-10 1.70E-10 61
   CC GO:0042613 MHC class II protein complex 1.51E-14 8.70E-12 7.81E-12
   13
   CC GO:0042611 MHC protein complex 8.87E-11 2.56E-08 2.30E-08 13
   CC GO:0045121 Membrane raft 5.23E-08 8.29E-06 7.44E-06 41
   CC GO:0098857 Membrane microdomain 5.74E-08 8.29E-06 7.44E-06 41
   CC GO:0098589 Membrane region 1.55E-07 1.79E-05 1.60E-05 41
   CC GO:0042101 T cell receptor complex 2.15E-07 2.07E-05 1.86E-05 9
   CC GO:0031234 Extrinsic component of cytoplasmic side of plasma
   membrane 3.81E-07 3.14E-05 2.82E-05 21
   CC GO:0009898 Cytoplasmic side of plasma membrane 9.07E-07 6.54E-05
   5.87E-05 27
   CC GO:0071556 Integral component of lumenal side of endoplasmic
   reticulum membrane 1.59E-06 9.20E-05 8.26E-05 10
   CC GO:0098553 Lumenal side of endoplasmic reticulum membrane 1.59E-06
   9.20E-05 8.26E-05 10
   CC GO:0030669 Clathrin-coated endocytic vesicle membrane 1.89E-06
   9.90E-05 8.89E-05 12
   CC GO:0030665 Clathrin-coated vesicle membrane 2.70E-06 0.00013
   0.000117 19
   CC GO:0098562 Cytoplasmic side of membrane 9.80E-06 0.000435 0.00039 27
   CC GO:0009897 External side of plasma membrane 1.49E-05 0.000616
   0.000552 34
   CC GO:0019897 Extrinsic component of plasma membrane 1.70E-05 0.000655
   0.000588 24
   CC GO:0015629 Actin cytoskeleton 2.15E-05 0.000743 0.000667 49
   CC GO:0045334 Clathrin-coated endocytic vesicle 2.19E-05 0.000743
   0.000667 13
   CC GO:0031252 Cell leading edge 2.75E-05 0.000881 0.000791 41
   CC GO:0030136 Clathrin-coated vesicle 3.66E-05 0.00111 0.000996 24
   CC GO:0001772 Immunological synapse 4.53E-05 0.001306 0.001172 9
   CC GO:0030139 Endocytic vesicle 0.000105 0.002875 0.00258 32
   CC GO:0042629 Mast cell granule 0.00011 0.00289 0.002594 7
   CC GO:0030662 Coated vesicle membrane 0.000141 0.003459 0.003104 22
   CC GO:0043235 Receptor complex 0.000144 0.003459 0.003104 38
   CC GO:0030666 Endocytic vesicle membrane 0.000163 0.003759 0.003374 21
   CC GO:0005925 Focal adhesion 0.000193 0.004292 0.003852 39
   MF GO:0032395 MHC class II receptor activity 3.00E-09 2.80E-06 2.64E-06
   8
   MF GO:0004715 Non-membrane spanning protein tyrosine kinase activity
   2.28E-06 0.001066 0.001002 13
   [101]Open in a new tab

   Abbreviations: BP, biological process; CC, cellular component; GO, Gene
   Ontology; MF, molecular function.

Table 2.

   KEGG pathway analysis of turquoise module
   ID Description GeneRatio p-value p. adjust qvalue Count
   hsa04658 Th1 and Th2 cell differentiation 29/497 4.61E-13 1.33E-10
   1.06E-10 29
   hsa04659 Th17 cell differentiation 31/497 9.01E-13 1.33E-10 1.06E-10 31
   hsa04640 Hematopoietic cell lineage 27/497 7.70E-11 6.66E-09 5.30E-09
   27
   hsa04672 Intestinal immune network for IgA production 19/497 8.99E-11
   6.66E-09 5.30E-09 19
   hsa05150 Staphylococcus aureus infection 19/497 1.31E-09 7.73E-08
   6.16E-08 19
   hsa05166 Human T-cell leukemia virus 1 infection 39/497 1.00E-08
   4.94E-07 3.93E-07 39
   hsa04662 B cell receptor signaling pathway 20/497 1.84E-08 7.79E-07
   6.20E-07 20
   hsa05416 Viral myocarditis 18/497 2.42E-08 8.94E-07 7.12E-07 18
   hsa05310 Asthma 13/497 2.92E-08 9.60E-07 7.65E-07 13
   hsa05169 Epstein-Barr virus infection 36/497 3.25E-08 9.61E-07 7.66E-07
   36
   hsa04064 NF-kappa B signaling pathway 23/497 3.70E-08 9.95E-07 7.92E-07
   23
   hsa04514 Cell adhesion molecules (CAMs) 28/497 1.96E-07 4.83E-06
   3.85E-06 28
   hsa05140 Leishmaniasis 19/497 2.13E-07 4.85E-06 3.86E-06 19
   hsa05330 Allograft rejection 13/497 4.97E-07 1.05E-05 8.37E-06 13
   hsa05332 Graft-versus-host disease 13/497 1.34E-06 2.65E-05 2.11E-05 13
   hsa04612 Antigen processing and presentation 18/497 1.98E-06 3.66E-05
   2.92E-05 18
   hsa04062 Chemokine signaling pathway 31/497 2.18E-06 3.80E-05 3.02E-05
   31
   hsa04940 Type I diabetes mellitus 13/497 2.47E-06 4.06E-05 3.23E-05 13
   hsa05340 Primary immunodeficiency 12/497 2.63E-06 4.10E-05 3.27E-05 12
   hsa05321 Inflammatory bowel disease (IBD) 16/497 3.57E-06 5.28E-05
   4.21E-05 16
   hsa05145 Toxoplasmosis 22/497 3.96E-06 5.58E-05 4.45E-05 22
   hsa05320 Autoimmune thyroid disease 14/497 5.98E-06 7.70E-05 6.14E-05
   14
   hsa05152 Tuberculosis 29/497 5.99E-06 7.70E-05 6.14E-05 29
   hsa04660 T cell receptor signaling pathway 20/497 8.33E-06 0.000103
   8.18E-05 20
   hsa05130 Pathogenic Escherichia coli infection 14/497 9.57E-06 0.000113
   9.03E-05 14
   [102]Open in a new tab

   Abbreviation: KEGG, Kyoto Encyclopedia of Genes and Genomes.

Figure 6.

   [103]Figure 6
   [104]Open in a new tab

   GO and KEGG pathway enrichment analyses of turquoise module genes. (A)
   Dotplot of GO analysis divided into three functional groups: biological
   processes, cell composition and molecular function. (B) UpSet plot of
   KEGG pathway enrichment, visualizing the complex association between
   genes and different pathways. It emphasizes the genes overlapping among
   different pathways. (C) plotGOgraph of Biological Processes depicting
   tree-like relationships between different processes. (D) The terms of
   Biological Processes are organized as a directed acyclic graph. An
   insightful way to view the results of the analysis is to investigate
   how the significant Biological Processes are distributed over the GO
   graph. The goplot function shows the subgraph induced by the most
   significant terms.

   Abbreviations: GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and
   Genomes.

Identification of hub genes in the turquoise module

   We used the Matthews correlation coefficient (MCC) algorithm and degree
   to rank the genes in the turquoise module (see [105]Tables S5 and
   [106]S6 for more information). The first 30 genes were selected and
   intersected with 18 hub genes selected by HNSCC data from the TCGA
   database by WGCNA.[107]^26 The results are shown in [108]Figure S3A
   (Detailed results are shown in [109]Table 3). [110]Figure S3B is a plot
   of the connectivity among the first 30 genes selected by degree
   sequencing (see [111]Table S7 for more information). We screened these
   genes and identified 16 genes that have an impact on survival in
   patients with HNSCC by GEPIA.[112]^27 The results are shown in
   [113]Figure 7, and these genes are considered as hub genes.

Table 3.

   Common genes between different algorithms and the hub gene obtained
   from TCGA
   Categories Elements Count
   MCC & TCGA & degree NCKAP1L/IL10RA/TRAF3IP3/WAS/SASH3 5
   MCC & degree
   CORO1A/SP140/GIMAP4/PPP1R16B/IRF8/PRKCB1/HCST/PARVG/CD53/DOCK2/AKNA/ITK
   /GIMAP7/TBC1D10C/GIMAP5/ARHGAP9/ITGAL/BTK 18
   TCGA & degree SLA/APBB1IP 2
   Degree only P2RY8/GPSM3/CCR7/SH2D1A/LAT2 5
   MCC only ARHGEF6/RCSD1/FAM65B/GMFG/DOCK8/FAIM3/Septin 6 7
   TCGA only
   PTPN22/FERMT3/GIMAP6/DOK2/CELF2/SELPLG/TNFRSF1B/MYO1F/WIPF1/UBASH3A/AMI
   CA1 11
   [114]Open in a new tab

   Abbreviations: TCGA, The Cancer Genome Atlas; MCC, Matthews correlation
   coefficient.

Figure 7.

   [115]Figure 7
   [116]Open in a new tab

   Survival analysis of 16 hub genes from the turquoise module. They are
   ARHGAP9, SASH3, CORO1A, ITGAL, PPP1R16B, TBC1D10C, IL10RA, ITK, AKNA,
   PRKCB, TRAF3IP3, GIMAP4, CCR7, P2RY8, GIMAP7 and SP140. P<0.05 was
   regarded as significant.

Validation and exploration of hub genes

   GEPIA was used to validate the changes in these 16 hub genes in normal
   and tumor tissues. [117]Figure 8 shows the expression difference of
   these genes between normal and tumor tissues in HNSCC. The expression
   of most genes in tumors was higher than that in normal tissues. Except
   for PPP1R16B, there was no significant difference between tumors and
   normal tissues. GIMAP7 expression in normal tissues was higher than
   that in tumors. CBioPortal[118]^28^,[119]^29 was used to further
   explore these genes, and queried genes were altered in 183 (37%)
   patients. The frequency of change of each hub gene in HNSCC is shown in
   [120]Figure 9A, and TBC1D10C is altered the most (10%). We can tell
   that the upregulation and amplification of these genes are the main
   changes in HNSCC. [121]Figure 9B shows 16 queried genes and the 50 most
   frequently changed neighbor genes, which also introduces the
   relationship between central genes and cancer drugs. ITK and ITGAL are
   the targets of cancer drugs. [122]Figure S4 is the relationship between
   the methylation and gene expression of these genes in HNSCC explored in
   CBioPortal. ARHGAP9, SASH3 and P2RY8 have no methylation information.
   The expression of other genes is negatively correlated with the degree
   of methylation, indicating that the upregulation of these genes may be
   caused by hypomethylation.

Figure 8.

   Figure 8
   [123]Open in a new tab

   Validation of the hub genes at the transcriptional level.

   Notes: Validation of the hub genes by the TCGA HNSCC dataset in GEPIA.
   The expression of most genes in tumors was higher than that in normal
   tissues. Except for PPP1R16B, there was no significant difference in
   expression between tumors and normal tissues. GIMAP7 expression in
   normal tissues was higher than that in tumors. Abbreviations: TCGA, The
   Cancer Genome Atlas; HNSCC, Head and neck squamous cell carcinoma.

Figure 9.

   [124]Figure 9
   [125]Open in a new tab

   (A) The alteration frequency of 16 hub genes in the TCGA HNSCC dataset
   is illustrated. (B) Gene network and tumor drugs associated with the
   hub genes in the TCGA HNSCC dataset.

   Abbreviation: TCGA, The Cancer Genome Atlas.

Discussion

   HNSCC is a common malignant tumor caused by squamous cell
   abnormalities. The 5-year survival rate of patients with early HNSCC is
   40–60%. However, for patients with local recurrence and distant
   metastasis, the median survival time after palliative chemotherapy is
   only 6–9 months. The survival time of advanced patients with
   chemotherapy tolerance has decreased to 3–6 months.[126]^30 Therefore,
   studying the development of HNSCC at the molecular level and
   formulating effective measures to prevent and suppress the metastasis
   of HNSCC are of great significance for the prevention and treatment of
   HNSCC.

   Weighted network analysis can be used to provide a holistic view of
   disease dynamics, while also enabling us to reduce complexity to a
   direct relationship. This method may reveal new discoveries of the
   disease status and mechanism of tumor evolution. It has been used to
   analyze the molecular mechanisms of many diseases, such as breast
   cancer[127]^31 and inflammatory bowel disease.[128]^32 In this study,
   we used this important tool to analyze HNSCC samples in the
   [129]GSE65858 dataset. We used this method to transform the expression
   data from 5,000 genes to 16 modules. Next, we combined them with
   various clinical information. In these modules, we further focused on
   gene pivots highly related to various clinical features. Functional
   enrichment of key module genes was analyzed by R software. Sixteen hub
   genes significantly related to the survival rate were identified in key
   modules and were further validated in TCGA to verify the reliability of
   the results. We also compared the hub genes obtained by WGCNA from the
   TCGA data of HNSCC,[130]^33 and found a high degree of overlap,
   indicating that this module is highly preservative, and these genes are
   indeed related to the pathogenesis of HNSCC. CBioPortal was used to
   explore hub genes and their genetic alteration. Based on the measurable
   changes in network analysis, more samples were analyzed. These gene
   changes provide new insights into the molecular explanation of the
   pathogenesis of HNSCC, and the hub genes may be used as biomarkers and
   therapeutic targets for the accurate diagnosis and treatment of HNSCC
   in the future, providing an early warning to clinicians about patients
   at risk for progressive disease development. Current advances in cancer
   treatment are focused on precise/personalized approaches based on
   detectable molecular abnormalities in patients’ tumors, which can then
   be utilized in a targeted manner.

   HNSCC evades immune responses through multiple resistance mechanisms.
   It is characterized by an immunosuppressive environment which includes
   the release of immunosuppressive factors,[131]^34 activation, expansion
   of immune cells with inhibitory activity and the reduction of tumor
   immunogenicity.[132]^35^,[133]^36 This is why HNSCC has long been
   considered an immunosuppressive disease. Patients often show a low
   absolute lymphocyte count, spontaneous apoptosis of cytotoxic T
   lymphocytes (CTLs), deficiencies in NK cell activity and antigen
   presentation dysfunction. The key module selected by WGCNA also
   confirms this view. The preservation of this module is very strong,
   indicating that the genes in it are closely related and highly
   correlated. Moreover, its gene function enrichment analysis is mainly
   focused on immune-related pathways, suggesting that the pathogenesis of
   HNSCC may be related to immune disorders. The survival analysis of hub
   genes reveals that the expression of immune-related genes may affect
   the prognosis of patients with HNSCC. Survival analysis in HNSCC
   patients showed that patients with high expression of these genes had a
   higher survival rate (p<0.05), with biological significance. To further
   explore the expression differences of these genes in normal and tumor
   tissues, we found that the expression of many genes in tumors was
   higher than that in normal tissues, indicating that these
   immune-related genes were activated in HNSCC and played roles such as
   inhibiting the process of tumorigenesis and formation of the local
   tumor microenvironment. These data also provide us with a new idea for
   the treatment of HNSCC.

   Ideally, the immune system can play an active immune killing role by
   recognizing the relevant or specific antigens of tumors. If the
   immunogenicity of tumor cells is weakened, they can grow and spread
   without the control and establish a local immunosuppressive
   microenvironment.[134]^37 Among all malignant tumors, head and neck
   squamous cell carcinoma has a high mutation rate,[135]^38 and the
   establishment of an immunosuppressive microenvironment is the key to
   tumorigenesis and development.[136]^39 In recent years, tumor
   immunotherapy has gradually become one of the hotspots in clinical and
   basic research of cancer and has made breakthroughs in the treatment of
   melanoma, lung cancer and other tumors.[137]^40 Immunotherapy of tumors
   is mainly aimed at curing tumors by improving the patient’s own immune
   system and inducing an anti-tumor immune response.[138]^41 Cytokines
   are often in a state of imbalance in the tumor microenvironment of head
   and neck squamous cell carcinomas—that is, immunosuppressive cytokines
   are more than immunostimulatory cytokines, thus promoting the immune
   escape of head and neck squamous cell carcinomas. Compared with healthy
   people, HNSCC patients have fewer absolute T lymphocyte counts in
   peripheral blood and more inhibited regulatory T cells.[139]^42 At the
   same time, the tumor and its surrounding stromal cells can secrete many
   cytokines such as IL10 and tumor growth factor to form an
   immunosuppressive tumor microenvironment, which can avoid killing by
   reducing the expression of tumor antigen and promoting the apoptosis of
   cytotoxic T lymphocytes.[140]^43 Additionally, the inhibition of
   natural killer cell activity and damage of the dendritic cell
   antigen-presenting function are also important mechanisms to avoid
   epidemic surveillance of head and neck tumors.[141]^44 Therefore,
   immunotherapy for head and neck squamous cell carcinoma has great
   potential.

   We analyzed the hub genes screened in this study. ARHGAP9 (Rho GTPase
   Activating Protein 9), a member of the RhoGAP family, has been
   identified as a RhoGAP for Cdc42 and Rac1.[142]^45 RhoGAP proteins
   promote the hydrolysis of GTP-bound Rho GTP enzymes, thus inactivating
   Rho GTP enzymes and inhibiting various cellular processes, such as gene
   transcription, cytoskeleton tissue, cell proliferation, migration and
   invasion.[143]^46 Recent studies have found that ARHGAP9 inhibits the
   migration and invasion of hepatocellular carcinoma cells by
   upregulating FOXJ2/E-cadherin.[144]^47 We speculate that ARHGAP9 may
   play a similar role in the genesis of HNSCC. GTPase, a family of
   immuno-related proteins (GIMAP), is most widely expressed in the immune
   system and is differentially regulated during early human Th-cell
   differentiation.[145]^48 The highly conserved human GIMAP gene region
   consists of seven functional genes and one pseudogene.[146]^49 Although
   they are likely to be regulated similarly, subcellular localization and
   structural differences indicate that each gene has a specific
   function.[147]^50 GIMAP4 plays a role in Th-cell secretion.[148]^51 But
   there is little research on GIAMP7, and the expression of GIMAP7 in
   normal tissues is higher than that in tumor tissues. This also
   indicates that there are different functions of genes in this family.
   We speculate that GIMAP7 may play a regulatory role in the regulation
   of the immune response in coordination with other molecules. It can be
   used as a potential molecular target for future exploration. SASH3 was
   also found to have associated changes in oligodendroglioma,[149]^52 and
   methylation of the PPP1R16B locus was reported in patients with
   colorectal cancer.[150]^53 CORO1A is an actin regulatory protein mainly
   expressed in hematopoietic cells that is essential for the development
   of T cells and homeostasis.[151]^54 These genes have not yet been
   reported to play a role in HNSCC and warrant further exploration. The
   role of IL10 in the regulation of the immune response during
   tumorigenesis has been widely described,[152]^55 which corresponds to
   what we found in this study that the higher expression of IL10RA in
   HNSCC than in normal samples. Our study also found that the expression
   of IL10RA affect the prognosis of HNSCC, presumably related to the
   different expression of IL10. TBC1D10C was previously proven to bind
   and inhibit Ras and Calcineurin proteins. It has been proven that
   TBC1D10C exhibits a physical interaction with H-Ras and CaN in T cells,
   inhibits the Ras/MAPK signaling pathway and is a negative feedback
   inhibitor of the CaN signaling pathway.[153]^56 It was hypothesized
   that human AKNA is encoded by a single gene located in the FRA9E region
   of chromosome 9q32[154]^57 that is associated with functional deletion
   mutations and often leads to inflammatory and neoplastic
   diseases.[155]^58 Recent findings that single-nucleotide polymorphisms
   (SNPs) in the AT-hook domain of human AKNA increase the risk of
   cervical cancer support this hypothesis.[156]^59 Overall, these genes
   play an important role in HNSCC tumorigenesis. However, the primary
   limitation of the present study was that these important hub genes
   remain to be verified by experiments; therefore, further analyses are
   required to determine the mechanisms underlying the growth of HNSCC
   cells. Further research is required to fully explore their roles in
   HNSCC.

   In general, the innovative use of a weighted network to analyze HNSCC
   samples provides new insights into the molecular mechanism and
   prognosis of HNSCC. We can combine more information in future research,
   such as SNPs, leading to better understanding of HNSCC.

Acknowledgments