Abstract Background Celiac disease is associated with HLA-risk haplotypes, but non-HLA genes and environmental factors are also linked to disease susceptibility. In this study, we explore the molecular pathways involved in celiac disease by analyzing the differential expression of genes in both the gut and peripheral blood across various celiac disease phenotypes. Methods Whole genome RNA sequencing was performed on 283 samples from intestinal mucosa and peripheral blood from 72 cases with either active, potential, or treated celiac disease and 73 disease controls. Enrichr pathway analysis of top differentially expressed genes was performed. Results Overall, 7565 genes in intestinal biopsies and 542 genes in blood samples were differentially expressed between cases and controls. Compared with controls, immunoglobulin heavy variable 5–51 (IGHV5-51) (p = 1.05 × 10^−14) and tissue transglutaminase (TGM2) (p = 5.29 × 10^−10), encoding for TG2, the main autoantigen in celiac disease, were two of the top up-regulated genes in intestinal biopsies from celiac cases. TGM2 was also slightly upregulated in blood cells from cases with active disease compared with controls (p = 0.05). The topmost differentially expressed genes in peripheral blood were HLA-DQB1, HLA-DQB2, and GSTM1. Among pathways identified containing transcriptionally differentiated genes were antioxidant defense systems (e.g., nuclear factor (erythroid-derived 2)-like 2 (Nrf2), glutathione, ergothioneine, and peroxisome metabolism), as well as MHC class 1 antigen presentation, amino acid transport, mTORC1, bilirubin and lipid metabolism, liver homeostasis, the complement system, and interferon signaling. Conclusions Differentially expressed genes in cases and controls indicate crosstalk between molecular pathways involved in antioxidant defense, immune regulation, and nutrient signaling in the pathogenesis of celiac disease. Supplementary Information The online version contains supplementary material available at 10.1186/s12916-025-04261-1. Keywords: Celiac disease, HLA, MHC, Autoimmunity, Oxidative stress, RNA sequencing, Small bowel, MTORC1, Mammalian target of rapamycin What is already known on this topic Highly expressed genes can distinguish damaged intestinal mucosa present in celiac disease from normal healthy mucosa. What this study adds This study presents RNA sequenced from several intestinal phenotypes such as from patients with damaged intestinal mucosa as well as from those without a damaged intestine but still an autoantibody response to tissue transglutaminase (“potential celiac disease”) or those with treated celiac disease. By combining phenotypes in a much larger number of tissue samples from both peripheral blood and mucosal biopsies, we found highly differentially expressed genes, but also identified genes involved in controlling and triggering celiac disease-associated changes. How this study might affect research, practice or policy This study identifies molecular mechanisms involved in celiac disease and new possible targets for treatment and for identification of individuals at risk. Background Genetic predisposition influences the risk for celiac disease (CD) and the most associated gene variants are located in the HLA region on chromosome six [[34]1]. Another hallmark of CD is the presence of autoantibodies against the tissue transglutaminase 2 (TG2) protein, encoded by the gene TGM2 [[35]2]. Previously, a CD-specific transcriptomic signature was found to be associated with an increase of cell proliferation, nuclear division, and cell cycle activity [[36]3]. Several studies have shown that some of the most highly expressed genes in CD mucosa are those encoding for immunoglobulin heavy variable 5–51 (IGHV5-51), C-X-C motif chemokine ligand 11 (CXCL11), interferon gamma (IFNG), and cytotoxic T-lymphocyte associated protein 4 (CTLA4) [[37]4, [38]5]. Another study using single-cell transcriptomic analyses of the immune cell compartment from the intestine indicates that IFNG signaling might play a central role in the suppression of macrophage maturation and the accumulation of proinflammatory macrophages due to upregulation of the IFI16 and GBP2/4/5 genes and involvement of viral/bacterial-related pathways [[39]6]. In the present study, these previous investigations were extended by using RNA sequencing in an unbiased analysis of all expressed genes to explore molecular changes occurring in children with different phenotypes of CD. Highly differentially expressed genes as well as perhaps the more important insights into disease mechanisms are likely to be gained from studying the duodenal mucosa. However, in this study, we extended previous in situ studies to also include samples from peripheral blood. We ranked the top differentially expressed genes in intestinal biopsies and in cells from blood using a combined analysis of different CD phenotypes. Methods Material Enrolled were children referred to four pediatric clinics in Sweden for investigation with upper endoscopy and intestinal biopsy, as described previously [[40]7]. Diagnosis of CD was based on the ESPGHAN criteria, i.e., an intestinal biopsy showing a Marsh (M) score ≥ 2 [[41]8, [42]9]. In addition to multiple intestinal biopsies collected from the proximal duodenum and separately from the bulb for histopathological examinations, two separate duodenal biopsies were obtained for RNA extraction. For the purpose of clearly separating cases from controls, two children with M2 only were excluded from the analysis, resulting in a total of 72 children included as cases for the present study. Of those who were positive for TG2 autoantibodies at diagnosis, 48 children also had intestinal villous atrophy, i.e., active CD (a-CD), 18 children had normal intestinal mucosa findings, at the same time as they were positive for TG2 autoantibodies, i.e., potential CD (p-CD) and six children were examined for mucosal healing after treatment, i.e., treated CD (t-CD) (Table [43]1). Control patients were investigated for other diseases, but had normal intestinal histopathology, and were included as disease controls (Table [44]1). The vast majority of the disease controls were diagnosed with gastroenterological reflux disorder (GERD), recurrent abdominal pain (RAP), or Helicobacter pylori (HP) gastritis. Other diagnoses among disease controls were esophagitis, inflammatory bowel syndrome (IBS), wheat, and cow’s milk allergy. Children with Crohn’s disease were excluded from the analysis because of the possibility of having inflamed small intestinal mucosa. For the present study, a total of 283 patient samples passed quality control, and of these, 70 intestinal biopsies and 69 peripheral blood samples were from CD cases and 71 intestinal biopsies, and 73 peripheral blood samples were from disease controls, respectively. Table 1. Characteristics of the patient material Biopsy A-CD Biopsy controls Biopsy P-CD Biopsy T-CD Blood A-CD Blood control Blood P-CD Blood T-CD N = 290* 52 70 18 6 47 71 18 8 mean age (SD) 7.72 (4.45) 11.22 (4.57) 7.01 (3.83) 11.74 (2.63) 7.40 (4.67) 11.16 (4.58) 7.09 (4.51) 9.06 (3.35) Male sex (%) 20 (38.5) 31 (44.3) 8 (44.4) 2 (33.3) 18 (38.3) 30 (42.3) 9 (50.0) 3 (37.5) Marsch score  M0 - 54 (77.1) 11 (61.1) 1 (16.7) - 57 (80.3) 13 (72.2) 2 (25.0)  M1 - 16 (22.9) 7 (38.9) 3 (50.0) - 14 (19.7) 5 (27.8) 4 (50.0)  M2 3 (5.8) * - - - 2 (4.3) * - - -  M3 2 (3.8) - - - 2 (4.3) - - -  M3A 13 (25.0) - - 2 (33.3)* 16 (34.0) - - 2(25.0) *  M3B 23 (44.2) - - - 15 (31.9) - - -  M3C 10 (19.2) - - - 11 (23.4) - - -  M4 1 (1.9) - - - 1 (2.1) - - - Control diagnoses  GERD - 23 (32.8) - - - 24 (33.8) - -  RAP - 11 (15.7) - - - 10 (14.1) - -  Gastritis - 7 (10.0) - - - 9 (12.7) - -  H. pylori - 6 (8.6) - - - 7 (9.9) - -  IBS - 6 (8.6) - - - 3 (4.2) - -  U. colitis - 4 (5.7) - - - 5 (7.0) - -  Esophag - 3 (4.3) - - - 1 (1.4) - -  T1D - 2 (2.9) - - - 1 (1.4) - -  CMPA - 1 (1.4) - - - 1 (1.4) - -  Malabs - 1 (1.4) - - - 1 (1.4) - -  Other - 6 (8.6) - - - 9 (12.7) - - [45]Open in a new tab ^*Seven samples excluded from analyses, resulting in a total of 283 samples Abbreviations: GERD gastroenterological reflux disorder, RAP recurrent abdominal pain, H. pylori Helicobacter pylori, U. colitis Ulcerous colitis, Esophag. esophagitis, T1D type 1 diabetes, CMPA gastrointestinal diagnosis of cow’s milk protein allergy, Malabs.,,malabsorption. “Others” included heredity of CD, skin disorders, malabsorption, and wheat allergy RNA extraction and cDNA synthesis Biopsy samples from patients undergoing small intestinal endoscopy were directly put in RNA later medium (Thermo Fisher Scientific Inc., CA, USA) and these were later processed for extraction and purification of total RNA using the Qiagen RNA kit or with the Maxwell 16 instrument (Promega, CA, USA) according to the manufacturer’s instructions. Blood samples were collected using Tempus tubes (Thermo Fisher Scientific Inc., CA, USA) and RNA from whole blood (i.e., all cells containing RNA) was extracted using the Maxwell 16 instrument (Promega, CA, USA). Whole genome RNA sequencing The quality of the libraries was evaluated using the Fragment Analyzer (Advanced Analytical Technologies, Inc.) with the DNF-471 Standard Sensitivity RNA kit. Sequencing libraries were prepared from 10 to 500 ng total RNA using the Illumina Stranded Total RNA library preparation kit with Ribo-Zero Plus treatment (cat# 20,040,525/20040529, Illumina Inc. San Diego, CA). Unique dual indexes (cat# 20,040,553/20040554, Illumina Inc.) were used. The library preparation was performed according to the manufacturers’ protocol (# 1,000,000,124,514). The adapter-ligated fragments were quantified by qPCR using the Library Quantification kit for Illumina (KAPA Biosystems) on a CFX384 Touch instrument (Bio-Rad). Cluster generation and 150-cycle paired-end sequencing of the libraries were performed on the S4 flow cell using the NovaSeq 6000 system and v1.5 sequencing chemistry (Illumina Inc. San Diego, CA) at the SciLifeLab SNP&SEQ Technology Platform (Uppsala, Sweden). Demultiplexing and conversion to FASTQ format were done using the bcl2fastq2 (2.20.0.422) software provided by Illumina. Additional statistics on sequencing quality were compiled with an in-house script from the FASTQ files, RTA, and BCL2FASTQ2 output files. The RNA-seq data was analyzed using the best practice pipeline [46]https://nf-co.re/rnaseq/3.3. In short, adapters and low-quality tails were trimmed from reads prior to read alignment. Clean sequence reads aligned to the human genome were used to assemble transcripts, estimate the abundance of these transcripts, quantify the genes, and detect differential expression among samples. For mRNA and lncRNA analyses, the reference genome build GRCh38/hg38 was chosen as the annotation references. Fragments per kilo-base million (FPKM) of