Abstract Fuchs’ endothelial corneal dystrophy (FECD) is a disease where progressive visual impairment occurs by the thickening of the Descemet’s membrane and the gradual degeneration and loss of corneal endothelial cells. This study aimed to investigate the key changes in gene expression associated with FECD and explore potential biomarkers and new therapeutic strategies for FECD. To explore the potential therapeutic targets of FECD, we downloaded the gene expression dataset [30]GSE171830 from the Gene Expression Omnibus (GEO) database. A total of 303 differentially expressed genes (DEGs) were identified by the limma package. The enriched Gene Ontology (GO) annotations and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways of DEGs mostly included the extracellular matrix organization, collagen-containing extracellular matrix, and the structural constituents of the extracellular matrix. Fifteen hub genes from the most significant module were ascertained by Cytoscape. Both collagen-containing extracellular matrix and extracellular matrix hit to ANXA1, VCAN, GPC3, TNC, IGFBP7, MATN3, and SPARCL1 genes in the GO cellular components. Among these genes, the expression of SPARCL1 was down-regulated in the FECD samples, whereas the expression of GPC3, MATN3, IGFBP7, TNC, VCAN, and ANXA1 was up-regulated in the FECD samples. Gene set enrichment analysis (GSEA) plots showed that among the 20,937 genes, SPARCL1 played an important role in three pathways, cytokine-cytokine receptor interaction, the TGF-beta signaling pathway, and antigen processing and presentation. The top three pathways enriched by the GPC3, MATN3, IGFBP7, TNC, VCAN, and ANXA1 genes were those for cytokine-cytokine receptor interaction, TGF-beta signaling, and RIG-I-like receptor signaling. In conclusion, the DEGs identified here might assist clinicians in understanding the pathogenesis of FECD. Furthermore, these identified biomarkers might serve as potential therapeutic targets for the treatment of FECD. 1| Introduction Fuchs’ endothelial corneal dystrophy (FECD) is characterized by a bilateral progressive loss of the corneal endothelial cells. The clinical signs of FECD include the formation of abnormal extracellular matrix material called guttata/guttae on the Descemet’s membrane [[31]1]. FECD mainly occurs in the elderly over the age of 40 and shows a gender dichotomy. The Reykjavik Eye Study revealed FECD in 11% female and 7% male residents older than 55 years in Reykjavik, Iceland [[32]2]; the Kumejima Study found a prevalence of FECD in 4.1% of the individuals who were above 40 years [[33]3]. The female-to-male ratio of the occurrence of FECD was found to be 2.5–3:1 [[34]4]. Currently, the treatment for FECD requires surgical treatments (e.g., Descemet stripping endothelial keratoplasty, Descemet membrane endothelial keratoplasty, Descemetorhexis without endothelial keratoplasty, Rho-associated kinase inhibitors, and cell-based approaches), and non-surgical and pharmacological treatments (e.g., topical application of hypertonic 5% sodium chloride eye drops or ointments). The annual preparation of donor tissue is difficult and has unacceptable risks, including loss of donor tissue, financial losses, and cancellation of surgery; therefore, these limit the acceptance of corneal transplantation in patients. The pathophysiology of Fuchs’ endothelial dystrophy currently includes the following: 1. Genetics: mutations causing the early-onset of FECD have been exclusively linked to the α2 chain of collagen 8 (COL8A2) [[35]5]. Extensive studies in recent years have led to the description of key genetic changes in the late-onset of FECD, which include genes for transcription factor 4 [[36]6], transcription factor 8 [[37]7], ATP/GTP-binding protein-like 1 [[38]8], and solute carrier family 4 member 11 [[39]9]. These findings may provide new and important insights into the pathogenesis of the disease. 2. Molecular pathomechanisms: endothelial cells in FECD generally appear to be under endoplasmic reticulum stress, showing markers of apoptosis [[40]10], oxidative stress, and premature senescence [[41]11, [42]12], and perform epithelial-mesenchymal transition as a self-defense mechanism. Loss of barrier function and/or pumping function occurs gradually, as shown by a reduction in the Naþ/Kþ ATPase expression [[43]13]. However, the pathophysiological mechanisms of FECD are not fully elucidated. Therefore, more researchers aim to find new targets for the treatment of FECD. Limma is an R and Bioconductor software package that performs large-scale analysis of gene differential expression, differential splicing, and expression profiles to obtain gene chips and high-throughput sequencing data. An original microarray dataset [44]GSE171830 was downloaded; six FECD samples and six unaffected samples were analyzed in our study. The probe linear model and the affyPLM software package were used to control the data quality, and the gcrma software package was used to analyze the completeness and comparability of the dataset. Commonly changed DEGs were filtered from the integrated data. Additionally, the GO/KEGG pathway analysis, protein-protein interaction network construction, and CentiScape analysis were also performed to analyze the data. Finally, we screened 15hub genes, among which SPARCL1, GPC3, MATN3, IGFBP7, TNC, VCAN, and ANXA1were determined to be the key genes related to FECD. 2 | Materials and methods 2.1 | Access to public data GEO ([45]http://www.ncbi.nlm.nih.gov/geo) is a public functional genomics database that contains data on gene expression, chips, and microarrays [[46]14]. The chip dataset [47]GSE171830, used for expression profiling analysis, was downloaded from the GEO ([48]GPL10558 Illumina HumanHT-12 V4.0 expression beadchip). According to the annotation information, the probes were converted into the corresponding gene symbols. The [49]GSE171830 data contained six samples related to the FECD model and six unaffected samples used as a control group. 2.2 | Intragroup data repeatability test The repeatability of the data within each group was verified by performing Pearson’s correlation test. The R programming language was used for statistical analysis and plotting graphs. We used R to plot a visual heat map of the correlations between all samples from the same dataset. Sample cluster analysis was used to test the repeatability of the intragroup data within the dataset. Principal component analysis (PCA) is a common method of sample clustering, which is usually used for gene expression, resequencing, diversity analysis, and other sample clustering based on the information from different variables. 2.3 | Identification of differentially expressed genes We used the limma software package (Ritchie et al., 2015) to screen for differentially expressed genes between the FECD and unaffected groups, and further screened for differentially expressed genes by determining the fold change (log2 (fold change) > 2); the differences were considered to be significant at p-value < 0.01. We plotted heat maps and volcanoes and visualized differentially expressed genes by using the ggplot2 and heatmap packages in R. 2.4 | Functional analysis of DEGs To determine the related pathways and functions for the regulation of the differentially expressed genes, we conducted Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses. When p-value < 0.01, the GO term or KEGG pathway was identified as being significantly enriched by the genes. The gene function enrichment analysis was performed using the clusterProfiler software package in the R software. The GOplot and ggplot2 packages in the R software were used to plot the results of the GO and KEGG analyses, respectively. 2.5 | Protein-protein interaction (PPI) network construction and the analysis and mining of hub genes The analysis of the functional interactions between proteins might provide insights into the mechanism of disease occurrence or development. We thus analyzed the protein interactions of different genes using the STRING database ([50]http://string-db.org/) [[51]15], and the maximum interaction score required was 0.9 (medium confidence) [[52]16]. Cytoscape (version 3.6.1) was used to visualize the PPI networks [[53]17]. Initially, we constructed the PPI network diagram in the Cytoscape software. Next, we determined the most important module of the network map by MCODE [[54]18], a plug-in of Cytoscape. The criteria for the MCODE analysis comprised: degree cut-off = 2, MCODE scores > 5, max depth = 100, k-score = 2, and node score cut-off = 0.2. The hub genes were excavated for degrees ≥ 13 [[55]19]. The clustering analysis of the hub genes was performed using Metascape (online analysis). 2.6 | Enrichment analysis by Gene Set Enrichment Analysis (GSEA) We used the GSEA version 2.2.1 software for gene set enrichment analysis (Subramanian et al., 2005). The genes were divided into high and low expression groups according to the expression level of the key genes. We used the MSigDB database on the GSEA website to obtain the c2.cp.kegg.v6.0.symbols.gmtdataset. Then, we performed the enrichment analysis using the default weighted enrichment method and set 1,000 times the random combination. 3 | Results 3.1 Validation of the datasets We performed Pearson’s correlation test and PCA to validate the repeatability of the intragroup data. Results of the Pearson’s correlation test showed that there was a strong correlation among the samples within the FECD group, as well as within the unaffected group for [56]GSE171830 ([57]Fig 1A). Results of the PCA showed that the repeatability of the intragroup data of the [58]GSE171830 dataset was acceptable. The distance between samples in the unaffected group, as well as in the FECD group, was very short ([59]Fig 1B). Fig 1. [60]Fig 1 [61]Open in a new tab (A) Pearson’s correlation analysis of [62]GSE171830 data set samples. The color reflects the strength of the correlation. When 0