Abstract Background Despite significant advances made by genome-wide association studies (GWAS) in the genetic exploration of tumors such as thyroid cancer (TC), the precise pathogenic genes and underlying biological mechanisms of TC remain unclear. Methods We performed a transcriptome-wide association study (TWAS) to identify the susceptibility genes for TC. Three complementary methods: FUSION (Functional Summary-based Imputation), FOCUS (Fine-mapping Of CaUsal gene Sets), and Multi-marker Analysis of GenoMic Annotation (MAGMA) were used to validate key gene discoveries. Additionally, MAGMA was used to examine the functional enrichment of single nucleotide polymorphisms (SNPs) associated with TC. Conditional and joint analysis, as well as fine mapping techniques, were employed to deepen our understanding of TC’s genetic architecture. To explore causal relationships, we conducted Mendelian randomization analysis, while colocalization analysis was used to provide potential shared SNPs between key genes and TC risk. Results Through the comprehensive application of three TWAS methods, we identified three potential susceptibility genes closely associated with TC risk. Mendelian randomization analysis provided causal links between the TGFB2, SMAD3 and SDCCAG8 genes and TC. Colocalization analysis further revealed that TGFB2 (rs1764705), SMAD3 (rs17293632), and SDCCAG8 (rs2490395) may share genetic signals between GWAS and expression quantitative trait loci (eQTL), indicating common pathways in TC pathogenesis. The study highlighted differences in the expression of significant genes in normal and thyroid cancer tissues at the transcriptome level and investigated their relationship with the tumor microenvironment. Conclusion This investigation uncovered three new genes associated with increased TC risk, shedding light on the genetic underpinnings of TC and aiding in a deeper comprehension of its complex genetic architecture. Supplementary Information The online version contains supplementary material available at 10.1007/s12672-025-03194-8. Keywords: TC, TWAS, FUSION, MAGMA, FOCUS Introduction Thyroid cancer (TC) represents the most prevalent form of endocrine malignancy, and its incidence has been rising in the past few decades [[38]1, [39]2]. The histological classification of cancer includes papillary thyroid carcinoma (PTC), follicular thyroid carcinoma (FTC), poorly differentiated thyroid carcinoma (PDTC), and anaplastic thyroid carcinoma (ATC). Differentiated thyroid carcinoma (DTC) generally has a favorable outlook, with a 10-year survival rate specific to the disease reaching up to 90%, whereas PDTC has a less favorable prognosis, with a 5-year survival rate specific to the disease at 66% [[40]3, [41]4]. With a mean survival of less than 8 months, ATC is highly aggressive. PTC is the most prevalent type of DTC, representing more than 85% of thyroid cancer cases, with FTC (5–10%), PDTC (4–7%), and ATC (around 2%) following [[42]5]. The development of thyroid cancer is quite intricate. While the exact causes remain unclear, studies suggest that factors like race, radiation exposure, and gender may affect the risk of developing thyroid cancer [[43]6]. Evidence suggests that genetic mutations in essential tumor suppressor genes, passed down through families, are connected to hereditary syndromes that can promote the growth of thyroid lesions and neoplasms [[44]7]. Genome-wide association studies (GWAS) have become essential in exploring the genetic risk loci of TC. By scanning the genome for single nucleotide polymorphisms (SNPs), GWAS identify genetic variants linked to TC risk. For instance, variations in genes like SLC45A3、RAB7L1 and SLC41A1 are strongly associated with increased TC risk [[45]8, [46]9]. These discoveries not only elucidate the genetic landscape of TC but also provide targets for further functional research. Moreover, GWAS can pinpoint SNPs significantly associated with the disease but cannot elucidate the biological mechanisms underlying these associations. Therefore, future research must integrate additional genomic and functional approaches to fully understand the genetic basis and pathogenesis of TC. The study’s objective is to pinpoint genes that are vulnerable to TC by employing a transcriptome association study. Firstly, the results of candidate genes in a single tissue were verified by FUSION [[47]10], and candidate genes were further screened based on multi-marker Analysis of GenoMic Annotation (MAGMA) analysis [[48]11] and FOCUS [[49]12]. In addition, Mendelian randomization (MR) and colocalization analyses [[50]13] were conducted to infer the causal relationship between candidate genes and TC. This all-encompassing strategy not only deepens our insight into the genetic underpinnings of TC but also creates opportunities for future studies and therapeutic approaches. The goal of this research is to offer a comprehensive genetic framework for individualized treatment and precision medicine in TC. Materials and methods Data source This study employed genome-wide association data from the international public database GWAS Catalog (ID: GCST90399736) for Transcriptome-Wide Association Study (TWAS) analysis. To maintain data integrity, a rigorous quality control process was applied to the raw data prior to conducting the GWAS analysis. Specifically, the genomic coordinates of all single nucleotide polymorphisms (SNPs) were aligned with the hg38 version of the reference genome. Regarding minor allele frequency (MAF), SNPs with a MAF exceeding 1% were selected, while low-frequency variants were excluded. The expression quantitative trait loci (eQTL) data used in this study were obtained from the GTEx v8 dataset, which includes gene expression data from 838 individuals across 54 distinct tissues. This dataset systematically investigates the relationship between gene expression and genetic variants (SNPs), covering the expression of approximately 20,000 genes and incorporating samples from multiple ethnicities, predominantly European, African, and Latino ancestry. GTEx v8 uses RNA-seq for gene expression profiling and Matrix eQTL software for analyzing the link between genotype and gene expression. TWAS analysis We employed FUSION to perform a summary-based transcriptome-wide association study (TWAS) utilizing the GRCh38 genome from European populations as represented in the 1000 Genomes Project. FUSION constructs predictive models of the genetic components associated with functional or molecular phenotypes and evaluates these components for potential disease associations. FOCUS employs probabilistic fine-mapping techniques to assign weights to each gene within risk regions, prioritizing those with strong causal relationships. MAGMA reduces the dimensionality of the single nucleotide polymorphism (SNP) matrix through principal component analysis and subsequently uses these components as predictors in linear regression models to investigate the genetic mechanisms and functional pathways underlying polygenic traits. The Benjamini–Hochberg procedure was used to adjust P-values for controlling the false discovery rate (FDR), with a significance threshold of α = 0.05. A PFDR of less than 0.05 indicated statistical significance. Conditional and joint analyses (COJO) The COJO analysis, which follows the FUSION method, is intended to pinpoint independent genetic signals by factoring in linkage disequilibrium among markers. The study delivers an in-depth comprehension of the genetic structure responsible for differences in traits. Upon finishing the analysis, genes showing independent associations were regarded as jointly significant, while those no longer significant were classified as marginally significant. MAGMA analysis MAGMA found potential connections between these genes and specific phenotypes like TC. Its gene set enrichment analysis evaluated the combined impact of genes in certain pathways. A PFDR < 0.05, after FDR adjustment, was considered statistically significant. Colocalization analysis The “coloc” R package was used to determine if GWAS and eQTL signals shared causal variants through colocalization analysis. This analysis evaluates posterior probabilities for five hypotheses: H0 (no association), H1 or H2 (single trait association with specific variants), H3 (two traits with different SNPs), and H4 (two traits sharing a causal SNP). A PP.H4 value over 0.70 suggests colocalization, while a value above 0.90 indicates strong evidence of a shared causal variant. Mendelian randomization We used the ‘TwoSampleMR’ R package to conduct Mendelian Randomization (MR) analysis, aiming to identify causal links between identified genes and TC, using eQTL and TC GWAS data. Instrumental variables (IVs) were selected with an R² of less than 0.001 and 10,000 kb is the measurement for LD. With a single independent IV, the MR effect was estimated using the Wald ratio method, taking into account a significance threshold of P < 0.05. Tumor microenvironment analysis The expression matrices of thyroid cancer and normal tissue samples obtained from datasets [51]GSE165724, [52]GSE29265, [53]GSE33630, [54]GSE3467, [55]GSE35570, [56]GSE3678, [57]GSE6004, [58]GSE60542, and [59]GSE65144 were utilized to analyze the impact of identified genes on the tumor microenvironment. Employing the ESTIMATE method, tumor purity was measured and compared by evaluating stromal and immune cells in malignant tumor tissues using expression data. Results FUSION results Using FUSION for single-tissue validation, 37 genes demonstrated significant association signals in TWAS (FDR < 0.05) (Supplementary Table [60]S1) (Fig. [61]1). Fig. 1. [62]Fig. 1 [63]Open in a new tab Manhattan plot of FUSION results for TC Conditional and joint analyses We performed conditional and joint analyses on 37 genes identified by FUSION to confirm their independent association with TC. Among them, 25 genes were found to be independent markers, each conveying different genetic information (FDR < 0.05). These genes remained significantly associated with TC after conditional analysis, indicating their independent association rather than being influenced by linkage disequilibrium (Table [64]1). Certain genetically regulated genes’ expression may drive some GWAS signals (Supplementary Fig. [65]1). Table 1. Significant genes for TC in FUSION and COJO analysis ID TWAS.Z TWAS.P JOINT.BETA JOINT.BETA.SE JOINT.Z JOINT.P TGFB2 −9.6 8.3e−22 −9.6 1 −9.6 8.3e−22 SDCCAG8 4.9 9.6e−07 4.9 1 4.9 9.6e−07 RP5-968P14.2 −4 6.6e−05 −4 1 −4 6.6e−05 IMPDH1P5 −4.1 4.2e−05 −4.1 1 −4.1 4.2e−05 CAB39L −4.2 2.6e−05 −4.2 1 −4.2 2.6e−05 RP11-259K15.2 10 1.7e−23 8 1 7.6 2.6e−14 RP11-116N8.4 −9.4 5.9e−21 −7.1 1 −6.8 9.6e−12 SAMD4A 4.2 2.2e−05 4.2 1 4.2 2.2e−05 SMAD3 −7.6 4.3e−14 −7.6 1 −7.6 4.3e−14 RAD51-AS1 −4 7.6e−05 −4 1 −4 7.7e−05 CTD-2515H24.2 3.9 0.00011 3.9 1 3.9 0.00011 ATMIN 4.1 4.5e−05 4.1 1 4.1 4.5e−05 MAFTRR −4.1 4.5e−05 −4.1 1 −4.1 4.5e−05 RP11-616M22.12 3.9 9.7e−05 3.9 1 3.9 9.7e−05 VN1R85P 5 6e−07 5 1 5 6e−07 RP11-678G14.2 −4.4 1.3e−05 −4.4 1 −4.4 1.3e−05 POT1-AS1 −4.3 1.5e−05 −4.3 1 −4.3 1.5e−05 CDCA7L 4 6.8e−05 4 1 4 6.8e−05 CTA-398F10.2 −5.5 4.1e−08 −5.5 1 −5.5 4.1e−08 RP11-244N9.4 10.3 7.8e−25 13.4 1.2 10.8 2.4e−27 SAXO1 −6.4 1.5e−10 −6.4 1 −6.4 1.5e−10 TRIM14 5 4.7e−07 5 1 5 4.7e−07 PTCSC2 5 5.6e−07 −6.3 1.5 −4.3 2.1e−05 TRMO −4.6 5.1e−06 −6.7 1.3 −5.4 7.8e−08 TMOD1 −4.1 3.6e−05 −4.1 1.1 −3.9 1e−04 [66]Open in a new tab MAGMA analysis In the cross-tissue TWAS using MAGMA, 34 genes exhibited significant signals (FDR < 0.05) (Supplementary Table S2) (Fig. [67]2A). These genes were enriched in pathways such as: GOBP_CELL_CYCLE_PROCESS, GOCC_MITOTIC_SPINDLE, WP_NEOVASCULARISATION_PROCESSES, BENPORATH_CYCLING_GENES, GOBP_REGULATION_OF_INTEGRIN_BIOSYNTHETIC_PROCESS, GOBP_CELL_CYCLE, BIOCARTA_TGFB_PATHWAY. (Supplementary Table S3) (Fig. [68]2B). Fig. 2. [69]Fig. 2 [70]Open in a new tab MAGMA Analysis. A Using MAGMA in the cross-tissue TWAS, 34 genes showed significant signals (FDR < 0.05). B Pathway enrichment analysis Statistical fine mapping We conducted fine mapping of TWAS associations with the help of FOCUS software, focusing on data from a European population. Under the criteria of FDR < 0.05 and PIP (posterior inclusion probability) > 0.8, 8 positive genes were identified in thyroid tissue (Supplementary Table S4). The summary statistics and PIP of key genes, along with the results, are shown in Fig. [71]3A–D. Finally, the Venn diagram showed that 3 key genes related to TC were identified using three methods: FUSION, MAGMA and FOCUS (Fig. [72]3). Fig. 3. [73]Fig. 3 [74]Open in a new tab Fine mapping results using FOCUS software. A The most significant gene TGFB2 in chromosome 1 region (216243634–218705513); B The most significant gene SMAD3 in chromosome 15 region (67094767–69017999). C The most significant gene SDCCAG8 in chromosome 1 region (242071602–244109499); D The key genes identified by the intersection of three methods. PIP, posterior inclusion probability Colocalization of eQTL and GWAS associations To evaluate the possibility of shared signals between GWAS and eQTL, we carried out colocalization analysis, concentrating on three major susceptibility genes identified by the three TWAS techniques. TGFB2 (rs1764705) on chromosome 1 (PP4 = 0.998), SMAD3 (rs17293632) on chromosome 15 (PP4 = 0.986), and SDCCAG8 (rs2490395) on chromosome 1 (PP4 = 0.841) likely share the same GWAS and eQTL signals (Table [75]2; Fig. [76]4A–C). Table 2. Results of colocalization of eQTL and GWAS associations Gene SNP PP.H0.abf PP.H1.abf PP.H2.abf PP.H3.abf PP.H4.abf SMAD3 rs17293632 9.24505635564691e-34 2.23754112895368e-10 1.38448698168594e-26 0.0023531673160453 0.997646832460195 TGFB2 rs1764705 1.23314204338749e-21 6.62446994923668e-10 2.85065452082743e-14 0.0143281156155687 0.985671883721959 SDCCAG8 rs2490395 1.08635396052033e-13 8.96638851310928e-05 1.93809815999709e-10 0.159123107695604 0.840787228225346 [77]Open in a new tab Fig. 4. [78]Fig. 4 [79]Open in a new tab Results of colocalization of eQTL and GWAS associations. A Colocalization of TGFB2 in eQTL and TC GWAS associations. B Colocalization of SMAD3 in eQTL and TC GWAS associations. C Colocalization of SDCCAG8 in eQTL and TC GWAS associations. D Forest plot of MR results Mendelian randomization (MR) MR analysis used eQTL data for TGFB2, SMAD3 and SDCCAG8 along with TC GWAS data. All SNPs used in the MR analysis were considered strong instruments (F > 10). The findings inferred causal relationships between TGFB2, SMAD3 and SDCCAG8 genes and TC. Patients carrying TGFB2 (OR − 0.5, 95% CI − 0.65 to − 0.36) and SMAD3 (OR − 0.34, 95% CI − 0.43 to − 0.25) genes have a lower likelihood of developing TC than non-carriers. Patients carrying SDCCAG8 (OR 0.27, 95% CI 0.13 to 0.42) genes have a higher likelihood of developing TC than non-carriers (Supplementary Table S5) (Fig. [80]4D). The effects of identified genes on the tumor microenvironment According to Fig. [81]5A, TGBF2 expression levels were higher in thyroid cancer tissues than in normal tissues in the datasets [82]GSE165724, [83]GSE35570, [84]GSE3678, and [85]GSE60542. In the [86]GSE5364 dataset, samples with increased expression of TGFB2 demonstrated higher stromal scores (Fig. [87]5B). The expression levels of SMAD3 were found to be lower in thyroid cancer tissues than in normal tissues in the [88]GSE35570 and [89]GSE58454 datasets (Fig. [90]5C). Within the datasets [91]GSE33630, [92]GSE35570, [93]GSE58545, and [94]GSE60542, samples with high levels of SMAD3 expression had decreased ESTIMATE scores and enhanced tumor purity (Fig. [95]5D). The expression levels of SDCCAG8 in thyroid cancer tissues were greater than those in normal tissues in the [96]GSE33630 and [97]GSE35570 samples (Fig. [98]5E). In datasets [99]GSE29265 and [100]GSE65144, samples exhibiting elevated SDCCAG8 expression showed reduced ESTIMATE scores and increased tumor purity (Fig. [101]5F). Fig. 5. [102]Fig. 5 [103]Open in a new tab Analysis of tumor microenvironment. A Expression of TGFB2 in different data and correlation with ESTIMATE scores. Stromal scores, Immune scores and tumor purity. B Expression of SMAD3 in different data and correlation with ESTIMATE scores, Stromal scores, Immune scores and tumor purity. C Expression of SDCCAG8 in different data and correlation with ESTIMATE scores, Stromal scores, Immune scores and tumor purity Discussion There has been substantial progress in thyroid cancer research due to GWAS and TWAS in recent years. As a biased-free method, GWAS has been widely applied in genetic basis research for various complex diseases. In the study of thyroid cancer, GWAS has revealed multiple genetic variants associated with thyroid cancer risk. For example, a study conducted GWAS analysis on non-malignant thyroid cancer identified five new risk loci, which include 1q42.2, 3q26.2, 5q22.1, 10q24.33, and 15q22.33 [[104]14]. Additionally, another study found that GWAS analysis of thyroid-stimulating hormone (TSH) levels revealed that TSH-related variants are negatively correlated with thyroid cancer risk [[105]15]. TWAS, as a complement to GWAS, can better understand the complex mechanisms of genetic variation in disease regulation by integrating gene expression data. In thyroid cancer research, TWAS has been used to identify genes associated with the disease. For example, TWAS Atlas, as an integrated knowledge base, has collected a large amount of gene-phenotype association data, providing important resources for studying the genetic basis of thyroid cancer [[106]16]. Additionally, OTTERS, as a new TWAS framework, enhances TWAS’s application capabilities in broader contexts by utilizing summary-level reference data [[107]17]. The integration of GWAS and TWAS in thyroid cancer research has uncovered numerous new molecular markers and possible therapeutic targets. Research has identified several molecular changes in thyroid cancer that can be targeted with drugs, such as mutations in BRAF, RAS, and the TERT promoter, which are vital in the advancement of the disease [[108]18]. In addition, findings have demonstrated that the molecular profile of thyroid cancer can be utilized to develop individualized treatment strategies [[109]19]. These advances will help better understand the pathological mechanisms of thyroid cancer and promote innovation and optimization of related treatments. We thoroughly analyzed the genetic predictive relationships between gene expression and TC using the TC GWAS dataset. To enhance the precision of our TC risk gene findings, we utilized FUSION, conditional and joint analyses, and then annotated the genes with MAGMA and fine-mapped them using FOCUS. Among the pathways proposed by MAGMA analysis, many are closely related to the occurrence and development of thyroid cancer, such as: GOBP_CELL_CYCLE_PROCESS, GOCC_MITOTIC_SPINDLE, WP_NEOVASCULARISATION_PROCESSES, BENPORATH_CYCLING_GENES, GOBP_CELL_CYCLE and BIOCARTA_TGFB_PATHWAY. According to Liu et al., USP44 significantly inhibited the growth of thyroid cancer cells by blocking the G1/S phase transition in the cell cycle. On a mechanistic level, USP44 directly bound to p21 and removed its K-48-linked polyubiquitination chain [[110]20]. Re-expressing OVOL2 limited EMT and aggressiveness in ATC cells, which was linked to widespread irregularities in mitotic spindle structure and cytoskeletal organization [[111]21]. According to Yi et al., LIFR-AS1 acted as a sponge for miR-31-5p, leading to an increase in SIDT2 levels, which in turn suppressed the secretion of vascular endothelial growth factor in PTC cells and reduced angiogenesis in human umbilical vein endothelial cells [[112]22]. We discovered three risk genes by combining the outcomes from three genetic analysis techniques: MAGMA, FUSION, and FOCUS. Finally, Bayesian colocalization analysis and MR analysis were performed on these genes to suggest causal relationships with TC. Our results can improve understanding of the genetics and etiology of TC. By integrating three TWAS methods, we identified three risk genes (TGFB2, SMAD3, and SDCCAG8). COLOC colocalization analysis determined the shared SNPs between TGFB2 (rs1764705), SMAD3 (rs17293632), SDCCAG8 (rs2490395) and TC. We combined various datasets at the transcriptome level to analyze the expression of these crucial genes in both normal and thyroid cancer tissues, and assessed their relationship with the tumor microenvironment. TGFB2, an important component of the TGF-β superfamily, has a complex role in the advancement of solid tumors. In the tumor microenvironment (TME), TGFB2 drives immunosuppression by polarizing M2 macrophages and expanding regulatory T cells (Tregs), which dampen cytotoxic T-cell activity and promote immune evasion [[113]23–[114]25]. In pancreatic ductal adenocarcinoma (PDAC), The m6A modification mediated by METTL14 stabilizes TGFB2 after transcription, encouraging lipid accumulation, and the resulting increase in triglycerides contributes to gemcitabine resistance, as revealed by lipidomic profiling [[115]26]. Research findings suggest that DLGAP1-AS1 can control the expression of TGFB2 by acting as an e-RNA for miR-149-5p, which results in the progression and resistance to 5-FU treatment in colorectal cancer [[116]27]. SMAD3 is a critical component of the TGF-β signaling pathway and has a significant role in different malignant tumors. Studies conducted in recent years suggest that SMAD3 plays a crucial role in diagnosing cancer, predicting its outcome, and forecasting immune responses [[117]28]. Higher SMAD3 levels in pancreatic ductal adenocarcinoma are correlated with epithelial-mesenchymal transition (EMT) and suggest a poorer prognosis [[118]29]. Furthermore, the function of SMAD3 in radiotherapy has been a focal point of research; findings suggest that suppressing SMAD3 can heighten DNA damage during radiotherapy by associating with the MRE11-RAD50-NBS1 complex, thereby enhancing sensitivity of gliomas to radiotherapy [[119]30]. The level of SMAD3 expression in hepatocellular carcinoma is strongly related to poor outcomes and significantly correlates with immune cell infiltration [[120]28]. The interaction between SMAD3 and S100A4, a protein associated with metastasis, promotes the invasiveness of breast cancer cells [[121]31]. Research on SDCCAG8 in cancerous tumors has attracted considerable interest because of its complex involvement in cellular activities and its impact on cancer development. Known for its participation in ciliogenesis and Hedgehog signaling, SDCCAG8 has been identified as a key factor in the development and progression of different types of cancer. Research has shown that SDCCAG8 interacts with RAB effector proteins and is essential for Hedgehog signaling, which is crucial for cell proliferation and differentiation [[122]32]. SDCCAG8’s role in cilia formation and function may be linked to its involvement in tumorigenesis. The carboxyl-terminal part of SDCCAG8 is recognized as a crucial component necessary for the formation of cilia, the development of organs, and maintaining homeostasis, suggesting its possible influence on tumor biology [[123]33]. Moreover, SDCCAG8 has been implicated in metabolic reprogramming, a hallmark of cancer. The abnormal activation of metabolic pathways, such as aerobic glycolysis, is a common feature in malignant tumors, contributing to chemoresistance and radiotherapy resistance [[124]34]. Exploring the role of SDCCAG8 in these mechanisms could lead to new understandings of how to combat treatment resistance in individuals with cancer. The relationship between SDCCAG8 and the SLFN family, which influences immune responses and chemosensitivity, further suggests SDCCAG8 as a promising target for cancer therapy [[125]35]. Exploring the roles and interactions of SDCCAG8 may help in devising novel treatment strategies that adjust its activity to combat tumor growth and progression. By advancing our understanding of SDCCAG8’s role in cancer, we can enhance the effectiveness of existing therapies and develop new strategies to combat malignancies more effectively. A key advantage of this study is the application of various methods (FUSION, MAGMA, FOCUS, and colocalization analyses) to pinpoint and confirm TC susceptibility genes, which boosts the dependability and consistency of our results. This research has some limitations. Mainly, the GWAS data and GTExv8 eQTL reference data utilized are from European populations, which may restrict the applicability of our results to other ethnic groups. On the flip side, the absence of clinical data restricts us from performing subgroup analyses by sex, age at diagnosis, and major TC subtypes (PTC, FTC, PDTC, and ATC), as well as from conducting polygenic risk score (PRS) analysis based on the main independent genome-wide significant SNPs for TC count, the goal is to examine whether this trait correlates with differences in TC among various sexes, ages, and phenotypic subtypes. Finally, the potential mechanisms of the identified risk genes have not been experimentally validated. Conclusion To sum up, this research pinpointed three genes linked to TC risk through three TWAS approaches, with colocalization analysis verifying shared SNPs between these genes and TC. According to Mendelian randomization, TGFB2 and SMAD2 could potentially decrease TC risk, whereas SDCCAG8 might elevate it. These discoveries shed light on the genetic underpinnings of TC and underscore the necessity of examining these genes in the context of TC development. Electronic supplementary material [126]Supplementary Material 1^ (29.7KB, xlsx) [127]Supplementary Material 2^ (647.8KB, jpg) Acknowledgements