Abstract

   One of the most robust approaches to the prediction of causal driver
   genes of complex diseases is to apply reverse engineering methods to
   infer a gene regulatory network (GRN) from gene expression profiles
   (GEPs). In this work, we analysed 794 GEPs of 1117 human whole-blood
   samples from Amyotrophic Lateral Sclerosis (ALS) patients and healthy
   subjects reported in the [41]GSE112681 dataset. GRNs for ALS and
   healthy individuals were reconstructed by ARACNe-AP (Algorithm for the
   Reconstruction of Accurate Cellular Networks - Adaptive Partitioning).
   In order to examine phenotypic differences in the ALS population
   surveyed, several datasets were built by arranging GEPs according to
   sex, spinal or bulbar onset, and survival time. The designed reverse
   engineering methodology identified a significant number of potential
   ALS-promoting mechanisms and putative transcriptional biomarkers that
   were previously unknown. In particular, the characterization of ALS
   phenotypic networks by pathway enrichment analysis has identified a
   gender-specific disease signature, namely network activation related to
   the radiation damage response, reported in the networks of bulbar and
   female ALS patients. Also, focusing on a smaller interaction network,
   we selected some hub genes to investigate their inferred pathological
   and healthy subnetworks. The inferred GRNs revealed the interconnection
   of the four selected hub genes (TP53, SOD1, ALS2, VDAC3) with
   p53-mediated pathways, suggesting the potential neurovascular response
   to ALS neuroinflammation. In addition to being well consistent with
   literature data, our results provide a novel integrated view of ALS
   transcriptional regulators, expanding information on the possible
   mechanisms underlying ALS and also offering important insights for
   diagnostic purposes and for developing possible therapies for a disease
   yet incurable.

   Keywords: Amyotrophic lateral sclerosis (ALS), Algorithm for the
   reconstruction of accurate cellular networks - adaptive partitioning
   version (ARACNe-AP), Gene expression profile (GEP), Gene regulatory
   network (GRN), Reverse engineering methods

Highlights

     * •
       ARACNe-AP tool was applied to reconstruct the gene regulatory
       networks (GRNs).
     * •
       ARACNe inferred GRNs from blood transcriptomic data of ALS and
       healthy individuals.
     * •
       GRNs based on gender and ALS disease onset type outlined novel
       deregulated pathways.
     * •
       Regulatory patterns of some hub genes (TP53, SOD1, ALS2, VDAC3)
       were characterized.
     * •
       ARACNe-GRNs identified ALS phenotypic motifs and transcriptional
       gene signatures.

1. Introduction

   A major challenge in transcriptomic research involves inferring the
   relationships among transcription factors (TFs) and their targets from
   gene expression profiles (GEPs) by making use of reverse engineering
   methods [[42][1], [43][2], [44][3]]. Reverse engineering is the process
   of deconstructing and analysing a pre-existing system (natural or
   artificial) to elucidate its underlying design, functionality, and
   components, with the aim of comprehending its design principles,
   functionality, and/or structure. It is often also for the purpose of
   replicating, enhancing, or redesigning the system. In the field of
   genomics, reverse engineering refers to the analytical process of
   deducing the regulatory networks and functional relationships among
   genes, proteins, and other molecular components within a given
   biological system. This computational approach involves the integration
   of experimental data, computational pipelines, and statistical
   techniques to infer the complex interactions that govern gene
   expression, protein-protein interactions, and metabolic/developmental
   pathways. Genomics-oriented reverse engineering aims to uncover the
   underlying biological mechanisms and signalling cascades, often with
   the goal of understanding pathological states and processes,
   identifying therapeutic targets, or elucidating the dynamics of
   cellular systems. This process is fundamental to unravelling the
   complexities of biological systems and advancing precision medicine.

   Through this approach, the connectivity among naturally occurring genes
   is reconstructed in a large-scale gene regulatory network (GRN).
   Inferring a GRN from GEPs is useful not only for approximating patterns
   of gene regulation, but also for characterizing the impact that changes
   in gene expression may have on the structural and functional control of
   the GRN in response to certain cellular and environmental stimuli
   [[45][4], [46][5], [47][6]]. GRN analysis involves the reconstruction
   of the GRNs that represent interactions (the edges) between TFs and
   target genes (the nodes), enabling researchers to infer regulatory
   relationships and identify key drivers of disease processes. To date,
   GRN inference has yielded important insights into the molecular
   mechanisms of some complex and multifactorial diseases, such as cancers
   and neurodegenerative diseases [[48][7], [49][8], [50][9], [51][10],
   [52][11], [53][12]]. In particular, the use of GRNs offers particular
   advantages for the elucidation of the mechanisms of some rare diseases
   such as amyotrophic lateral sclerosis (ALS). These present severe
   problems for clinicians, such as misleading or late diagnosis, high
   mortality rates, lack of therapy, or poor data management and patient
   selection. We contend that the use of GRNs to study complex and
   multifactorial diseases such as ALS offers great opportunities,
   allowing deregulated pathways, biomarkers, and potential therapeutic
   targets to be identified [[54]13]. ALS is a heterogeneous
   neurodegenerative disorder caused by the degeneration of motoneurons
   [[55]14]. The majority of cases (90 %), have no familial correlation
   (sALS) while the remaining 10 % of ALS cases are hereditary (fALS)
   [[56]14]. One of the hallmarks of ALS is the presence of several
   inclusions formed by apparently disparate misfolded proteins, which are
   unrelated in sequence, structure, function, and localization [[57]15].
   ALS predominately affects rather than women (3:1 ratio) with survival
   rates depending on clinical manifestations and genetic defects. In
   fact, survival ranges from 2 to 5 years from onset in most patients to
   10 years or more in about 10 % of cases [[58]16]. Currently, more than
   30 candidate genes have been identified as being involved in ALS
   pathogenesis ([59]http://alsod.iop.kcl.ac.uk/). Four major genes are
   associated with the onset of ALS: SOD1, TDP-43, FUS and C9orf72
   [[60]17]. These are joined by more than 20 other minor genes (e.g.
   ALS2, SETX, VAPB, FIG4, ERBB4, MATR3, ANG, OPTN, VCP, UBQLN2, CHMP2B,
   PFN1, hNRPA1 A2/B1, TUBA4A, NEK1, UNC12A, ANXA11) [[61][18], [62][19],
   [63][20]]. Mutations in these genes are responsible for about 70 % of
   fALS and about 12 % of sALS [[64]18]. ALS can have a spinal onset
   (SALS), which is more frequent and has better survival prospects, or a
   bulbar onset (BALS), which is less frequent, and results in shorter
   survival times [[65]21]. However, the two forms are difficult to
   distinguish in the early stages of the disease. In addition to these
   issues, and genetic variability between the sporadic and familial forms
   of ALS, there is a lack of definite information on factors influencing
   risk, rate of disease progression, and survival. These problems are
   compounded by gender differences and site of disease onset (SALS and
   BALS) [[66]21]. It is likely that not all genes directly and/or
   indirectly involved in the disease have been identified to date. Also
   to be considered the limitations of preclinical animal models in
   advancing our understanding of ALS, a disease that in 90 % of cases has
   sporadic onset. Among the criticisms raised of animal models those of
   not taking into account the multifactorial nature of the diseases and
   thus not recapitulating the complexity of human neurodegenerative
   diseases [[67]22]. However, most of our mechanistic knowledge about ALS
   comes from studies in animal models with a specific and well-defined
   genetic cause. The proteins encoded by ALS genes have apparently
   different functions, which prevents drawing a single picture
   representing the pathological mechanism of ALS. Therefore, targeting
   predictive methods to find links between the known mechanisms of ALS
   may represent a great opportunity for the expansion of mechanistic
   knowledge about the disease.

   By reconstructing networks from diverse patient samples (and control
   samples), GRN analysis can identify regulatory modules that are
   consistently altered in ALS, regardless of the specific genetic
   mutations involved [[68]23]. To appreciate and assess the value of GRN
   analysis in ALS study, it is important to compare it with other
   commonly used methods, such as differential expression analysis,
   weighted gene co-expression network analysis (WGCNA), and genome-wide
   association studies (GWAS). While differential expression analysis
   identifies individual genes that are up- or down-regulated, it lacks
   insight into gene interactions and is often noisy due to ALS
   heterogeneity. WGCNA analysis reveals gene clusters but does not
   capture causal relationships, leading to potential misinterpretations.
   GWAS identifies genetic variants linked to ALS but struggles to explain
   their functional role in disease progression. In contrast, GRN analysis
   uncovers regulatory interactions between genes, providing a
   systems-level view of gene regulation. This enables the identification
   of key regulatory nodes and understanding how genetic variants
   influence gene expression and disease pathways, complementing GWAS
   findings. Hence, compared with to traditional methods like differential
   expression analysis, co-expression network analysis and GWAS, our
   approach, based on GRN analysis by ARACNe-AP, provides a more
   mechanistic and comprehensive understanding of gene regulation in ALS
   [[69]8,[70]23,[71]24].

   The aim of this research work was to elucidate some still poorly
   understood aspects of the ALS by focusing on as yet unknown gene
   interconnections. Using a reverse engineering approach, we performed a
   large-scale in silico analysis [[72]25,[73]26] that allowed us to map
   the network of interactions between TFs that control ALS gene
   expression and identify novel gene interactions in the ALS
   transcriptional program.

2. Results

2.1. Characterization of ALS phenotypic networks

   The workflow to build the GRNs and subnetworks (subGRNs) from 794 ALS
   blood GEPs reported in [74]GSE112681 dataset [[75]25,[76]26] is
   schematically represented in [77]Fig. 1a–f.

Fig. 1a–f.

   [78]Fig. 1a–f
   [79]Open in a new tab

   Research pipeline of ALS consensus network building. The algorithm
   ARACNe-AP used as input files: (a) 794 GEPs from [80]GSE112681 dataset,
   and (b) a list of regulators comprising human TFs (see [81]Table S1) to
   build: (c) GRN1-7, pruned by 95th percentile of mutual information;
   and, (d) four subnetworks of some hub genes selected among top 5 %
   (TP53) and among 30 % (SOD1, ALS2, VDAC3) of the GRN1 and GRN2. Data
   analysis and interpretation was carried out by genomic approaches of
   (e) biological process enrichments for GRN1-7 and (f) functional gene
   clustering for hub genes to investigate underlying pathophysiological
   mechanisms and their normal counterparts.

   All gene regulatory networks and subnetworks identified from ARACNe-AP
   were produced by processing data on 1635 human TFs ([82]Table S1)
   [[83]27]. The GO database was then used to compare the activity of the
   pathways associated with the different ALS phenotypic networks,
   obtained by ARACNe-AP and filtered with 95th percentile (perc) of
   mutual information (MI) (see 5. Materials and Methods). We thus
   identified seven lists of hub genes ([84]Table S2) and the following
   ALS-associated networks: the ‘pathological’ (GRN1) versus the ‘healthy’
   (GRN2), the “disease-control’ (GRN3), the ‘spinal onset – SALS” (GRN4),
   the ‘bulbar onset - BALS’ (GRN5), the ‘female’ (GRN6) and the ‘male’
   (GRN7) network. More details can be found in the hub gene lists
   ([85]Table S2) and GRN1-7 plots ([86]Figs. S1–S2).

   The GO pairwise comparison between the pathological and healthy network
   revealed protein metabolism and radiation damage response among the top
   pathways in the pathological network (GRN1) ([87]Fig. 2; [88]Table
   1a–b). In contrast, the biological processes in the healthy network
   (GRN2) ([89]Fig. 2; [90]Table 1a) were related to vascular biology
   involving several immune responses, platelet aggregation and apoptosis,
   transcription and translation, glucose and energy metabolism. These
   findings are consistent with the GO terms of another study focused on
   characterizing the normal heterogeneity of gene expression in blood
   cells [[91]28]. Furthermore, as the characteristics of the data used to
   train the ARACNe-AP model have a huge influence on the information that
   can be obtained from the gene relationships [[92]23], we separated ALS
   samples of both sexes in two groups, spinal-SALS and bulbar-BALS, to
   prepare GRN4-5 ([93]Table 1b), and then we analysed the opposite
   clustering of GEPs, assembling female and male patients of both at
   disease-onset, respectively in GRN6-7 ([94]Fig. 2; [95]Table 1c).

Fig. 2.

   [96]Fig. 2
   [97]Open in a new tab

   Representation of key findings of pathway analysis of GRN hub genes. In
   shaded boxes, it is highlighted the most significant difference
   obtained from the GO pairwise comparison (see [98]Table 1a–c)
   associated with the ‘response to light/radiation’ emerged in certain
   phenotypic cluster assembled in GRNs.

Table 1a-c.

   Pathway analysis of: a. Pathological (GRN1), healthy (GRN2) and
   disease-control network (GRN3); b. spinal-onset (GRN4) and bulbar-onset
   ALS network (GRN5); c. female (GRN6) and male ALS patients' network
   (GRN7). GO analysis of hub gene list ([99]Table S2) calculated with
   95th perc of MI of ARACNe-AP algorithm was performed using Fisher's
   exact test and Bonferroni correction for P < 0.05; Panther
   Overrepresentation test (release 20210224; Panther v.16.0 released
   2020-12-01). In bold, the first ranked GO pathways of interest.
   GO Biological Process Ref. list Our list Expected Fold Enrichment +/− P
   value
   A
   Pathological network (GRN1) response to light stimulus 317 46 19,81
   2,32 + 1,20E-02
   response to radiation 453 59 28,31 2,08 + 7,56E-03
   negative regulation of metabolic process 2952 249 184,47 1,35 +
   1,70E-02
   negative regulation of biological process 5187 405 324,14 1,25 +
   8,08E-03
   regulation of metabolic process 6586 496 411,57 1,21 + 1,46E-02
   cellular protein metabolic process 3219 268 201,16 1,33 + 1,75E-02
   cellular macromolecule metabolic process 4442 353 277,58 1,27 +
   1,43E-02
   cellular process 15074 1060 941,99 1,13 + 5,55E-10
   protein metabolic process 3807 308 237,9 1,29 + 2,20E-02
   primary metabolic process 6869 511 429,25 1,19 + 3,43E-02
   Healthy network (GRN2) regulation of transport 1732 158 107,98 1,46 +
   4,34E-02
   regulation of biological process 11528 806 718,72 1,12 + 1,52E-02
   biological regulation 12249 848 763,67 1,11 + 2,72E-02
   multicellular organismal process 6584 494 410,48 1,2 + 1,67E-02
   regulation of cellular process 11076 777 690,54 1,13 + 2,25E-02
   cellular process 15074 1054 939,79 1,12 + 3,08E-09
   Disease-control network (GRN3) response to light stimulus 317 47 20.84
   2.26 + 1.38E-02
   response to radiation 453 60 29.78 2.01 + 1.54E-02
   cellular protein metabolic process 3219 281 211.63 1.33 + 1.20E-02
   cellular macromolecule metabolic process 4442 371 292.04 1.27 +
   9.11E-03
   cellular process 15074 1122 991.03 1.13 + 3.63E-12
   macromolecule metabolic process 5743 458 377.57 1.21 + 3.53E-02
   primary metabolic process 6869 538 451.60 1.19 + 1.83E-02
   negative regulation of biological process 5187 420 341.01 1.23 +
   2.73E-02
   regulation of biological process 11528 849 757.90 1.12 + 1.16E-02
   biological regulation 12249 899 805.30 1.12 + 3.58E-03
   regulation of metabolic process 6586 521 432.99 1.20 + 1.05E-02
   regulation of cellular process 11076 820 728.18 1.13 + 1.13E-02
   B
   Spinal network (GRN4) cellular protein metabolic process 3219 274
   205,85 1,33 + 1,29E-02
   cellular macromolecule metabolic process 4442 367 284,06 1,29 +
   1,85E-03
   cellular process 15074 1087 963,95 1,13 + 8,27E-11
   macromolecule metabolic process 5743 451 367,25 1,23 + 1,02E-02
   protein metabolic process 3807 317 243,45 1,3 + 9,56E-03
   primary metabolic process 6869 530 439,26 1,21 + 3,90E-03
   negative regulation of metabolic process 2952 251 188,77 1,33 +
   4,77E-02
   regulation of metabolic process 6586 507 421,16 1,2 + 1,21E-02
   Bulbar network (GRN5) response to light stimulus 317 44 18,69 2,35 +
   1,03E-02
   response to radiation 453 55 26,7 2,06 + 2,63E-02
   response to abiotic stimulus 1125 108 66,31 1,63 + 2,39E-02
   response to organic substance 2688 219 158,45 1,38 + 1,52E-02
   negative regulation of metabolic process 2952 235 174,01 1,35 +
   3,01E-02
   negative regulation of biological process 5187 389 305,75 1,27 +
   1,91E-03
   regulation of metabolic process 6586 472 388,22 1,22 + 8,69E-03
   negative regulation of cellular process 4785 353 282,06 1,25 + 4,54E-02
   regulation of primary metabolic process 5707 412 336,41 1,22 + 3,69E-02
   cellular process 15074 996 888,56 1,12 + 1,81E-08
   C
   Female network (GRN6) response to radiation 453 60 30,42 1,97 +
   3,02E-02
   negative regulation of biological process 5187 426 348,32 1,22 +
   4,67E-02
   multicellular organismal process 6584 525 442,13 1,19 + 4,93E-02
   cellular process 15074 1143 1012,25 1,13 + 9,27E-12
   negative regulation of biological process 5187 426 348,32 1,22 +
   4,67E-02
   multicellular organismal process 6584 525 442,13 1,19 + 4,93E-02
   cellular process 15074 1143 1012,25 1,13 + 9,27E-12
   Male network (GRN7) cellular protein metabolic process 3219 263 193,5
   1,36 + 4,04E-03
   cellular macromolecule metabolic process 4442 342 267,02 1,28 +
   1,09E-02
   cellular process 15074 1010 906,12 1,11 + 1,89E-07
   protein metabolic process 3807 302 228,85 1,32 + 5,47E-03
   primary metabolic process 6869 495 412,91 1,2 + 2,24E-02
   cellular protein metabolic process 3219 263 193,5 1,36 + 4,04E-03
   cellular macromolecule metabolic process 4442 342 267,02 1,28 +
   1,09E-02
   [100]Open in a new tab

   Panther classification system: The 2nd column contains the number of
   genes in the reference list that map to this annotation data category.
   The 3rd column contains the number of genes in your uploaded list that
   map to this annotation data category. The 4th column contains the
   expected value, which is the number of genes you would expect in your
   list for this category, based on the reference list. The 5th column
   shows the Fold Enrichment of the genes observed in the uploaded list
   over the expected (number in your list divided by the expected number).
   If it is greater than 1, it indicates that the category is
   overrepresented in your experiment. Conversely, the category is
   underrepresented if it is less than 1. The 6th column has either a + or
   -. A plus sign indicates over-representation of this category in your
   experiment: you observed more genes than expected based on the
   reference list (for this category, the number of genes in your list is
   greater than the expected value). Conversely, a negative sign indicates
   under-representation. The 7th column is the raw p-value as determined
   by Fisher's exact test.

   In these four networks (GRN4-7), GEPs of ALS patients were further
   arranged for survival time (<1 year, from 1 to 3 years, from 3 to 5
   years, from 5 to 10 years), as schematically represented in [101]Figs.
   S1a–b (see 5. Materials and Methods). The stratification of ALS
   patients was aimed at identifying the most different pathways mediating
   the spinal and bulbar ALS phenotypes, as well as the otherwise
   undetectable gender response to disease onset. Another issue to
   consider when interrogating eventual gender modulation or gender
   prevalence in spinal or bulbar ALS trajectories is the possible bias of
   qualitative analysis of biochemical pathways related to sex hormones
   regulation [[102]29]. For this reason, the hub gene lists of GRN6-7
   ([103]Table S2) were cleared of genes implicated in gonadal
   differentiation [[104]29] (see 5. Materials and Methods).
   Interestingly, the pathway analysis showed no significant differences
   between female and male patients ([105]Table 1c), with the exception of
   the radiation response in the female network ([106]Fig. 2).
   Furthermore, the hub genes ([107]Table S2) were compared with the
   pattern of ALS survival-associated upregulated genes identified by
   Swindell et al. [[108]25], to identify a functional link between
   survival risk markers and our transcriptional regulators. We thus found
   in the spinal, bulbar and female networks (GRN4-6) two genes (SLC40A1,
   CLEC7A) out of the 30 upregulated genes associated with worse prognosis
   and no targets among the 31 upregulated genes associated with good
   prognosis. The male network (GRN7) was represented only by the SLC40A1
   gene. Double crossover was detected in GRN1 and GRN3, while in the
   'healthy network’ GRN2 the VRK3 gene was evaluated as associated with
   good prognosis.

   A first analysis performed by comparing the filtered hub gene lists
   ([109]Table S2) shows numerical variation between the different groups:
   GRN1 (n = 1313), GRN2 (n = 1305), GRN3 (n = 1381), GRN4 (n = 1339);
   GRN5 (n = 1239); GRN6 (n = 1415), GRN7 (n = 1259). As expected, most
   genes are shared between GRNs, with some genes specific to GRN1
   (n = 7), GRN2 (n = 251), GRN3 (n = 23), GRN4 (n = 18); GRN5 (n = 46);
   GRN6 (n = 91), GRN7 (n = 21). These differences could underlie the
   observed variations in GRN morphology ([110]Figs. S1–2). Specifically,
   the shape of the pathological gene regulatory pattern (GRN1) looks like
   an inverted 'L' with an oblong central stem and a small upward folded
   leg. Several small lateral branches of nodes are also evident,
   including a more descending branch of genes. Some large nodes (red
   circles) are scattered in the central stem. The shape of the normal
   pattern (GRN2) results from a mainly horizontal distribution, with a
   rounded central body, nodes on the inside and a flexed lateral tail.
   The disease control network (GRN3) retains the pathological shape and
   healthy branches appear to have been lost or heavily pruned ([111]Fig.
   S1).

   The spinal (GRN4) and bulbar (GRN5) networks appear similar, both
   having a symmetrical division with a cluster of hubs on each side, but
   the bottleneck is more densely connected in the spinal network than in
   the bulbar network ([112]Fig. S2). Comparing the graphical view of the
   female (GRN6) and male (GRN7) networks, a number of differences become
   apparent: (i) a dividing line and a narrower constraint point in the
   male network (GRN7); (ii) in the female network (GRN6) the branches are
   mirrored, with a more regular distribution than in the male SLA ALS
   network developed on one side only ([113]Fig. S2). The analysis of
   geometric differences in the spatial organization of GRNs may suggest
   concepts describing the pairwise relationship between the shape of
   network and their gene connections associated with ontological
   enrichments.

2.2. Analysis of TP53, SOD1, ALS2 and VDAC3 sub-networks

   Surprisingly, in the top 5 % of hub genes in networks generated from
   ALS patient data, we found no known ALS-associated genes except for the
   tumour suppressor gene TP53 (OMIM#[114]191170). Although the TP53
   function has been extensively associated to neoplastic processes, this
   gene is also known to play a role in neurodegeneration, resulting in
   neural apoptosis in response to several insults [[115]30,[116]31]. In
   addition, recent studies showed that TP53 is a central driver in ALS
   associated with mutant SOD1 or C9orf72 [[117]32,[118]33] and, more
   broadly, that it is included in the ALS transcriptional signature
   [[119]34]. Next, searching among hub genes for known ALS-associated
   genes, we found, in the top 30 % of the GRNs, the superoxide dismutase
   1 gene (SOD1; OMIM#[120]147450) and the Alsin Rho guanine nucleotide
   exchange factor ALS2 gene (ALS2; OMIM#[121]606352) [[122]35]. The plot
   of SOD1 subnetwork revealed no common targets between pathological and
   healthy conditions. In particular, subGRN1 showed 14 new putative
   disease targets, all strongly implicated in morphological development
   and cell viability, while several known SOD1-interacting partners
   involved in mitochondrial metabolism were found in the healthy subGRN2.
   In contrast, most of the ALS2-interacting genes found in both
   sub-networks have never been associated with ALS. To sum up, we
   identified 8 new genes out of 11 in subGRN1 and another 4 new genes,
   encoding zinc-finger proteins in subGRN2. Within the same cut-off point
   as the SOD1 and ALS2 genes, we also identified VDAC3
   (OMIM#[123]610029), a gene encoding an important mitochondrial protein
   also thought to be related to neurodegeneration due to its role as a
   sensor of redox state and protection of mitochondria from oxidation
   [[124]36,[125]37]. Moreover, a putative impact of VDAC3 in ALS has
   recently been highlighted by our group [[126]37]. Briefly, the
   regulatory scenario related to VDAC3 showed 12 genes in subGRN1, an
   additional 12 genes in subGRN2 and two common nodes. Considering the
   still limited knowledge about the regulation and transcriptional
   interactions of VDAC3, the inferred GRNs provide important information
   to define the role of this gene in the pathological context. Data
   produced by these analyses summarize the plot information of molecular
   interactions related to the first-neighborhood of four hub genes,
   distinguishing two types of results, red for the pathological
   sub-network (subGRN1) and blue for the healthy sub-network (subGRN2),
   further grouped as ‘known’ and ‘novel’ target genes, based on the
   available literature.

2.2.1. TP53 sub-networks

   Consistent with the TP53 gene's functions as both a transcriptional
   regulator ([127]Table S1) and effector, two types of connections were
   deduced from this hub as incoming edges, i.e. a set of putative
   upstream genes acting as TFs, and outgoing edges representing p53
   downstream targets ([128]Fig. 3a–b). The analysis of TP53-related
   subGRN1 and subGRN2 revealed the articulated transcriptional dynamics
   of the disease. Moreover, most of the p53 targets in both sub-networks
   are unique and describe poorly investigated activities of TP53 with
   respect to known mechanisms. Regarding the upstream genes identified by
   our analysis, 4/7 TFs have never been associated with ALS (PURA, ZFP90,
   ZNF410, ZNF717) ([129]Fig. 3b), nor have their 43 downstream
   interactors. In the subGRN2 sub-network, 24/26 new downstream targets
   were identified and only one gene was a common target between the two
   sub-networks; this had also not been detected previously ([130]Fig.
   3b).

Fig. 3a–b.

   [131]Fig. 3a–b
   [132]Open in a new tab

   The TP53 subnetwork. (a) The TP53 subnetwork was obtained by including
   all the genes that have P < 1.0 × 10-8 based on their pairwise MI with
   TP53. For representation purposes, only the first neighbors are shown.
   Red nodes indicate the target genes interacting with TP53 in the
   pathological network. Blue nodes represent the predicted interactions
   in the healthy network. Grey node is referred to common target in both
   networks. Red lines indicate the upstream inferred arcs in the
   pathological network (i.e. putative TP53-related TFs). The size of each
   circle is proportional to the number of the gene interactions. For hubs
   with more than 100 interactions, the exact number of first neighbors is
   shown beside the gene symbol.

   In subGRN1, TP53 shows to be modulated by seven putative TP53-related
   TFs (FLYWCH1, FOXM1, KLF13, PURA, ZFP90, ZNF410, ZNF717), and in turn,
   to modulate 43 genes (ARAF, BCLAF1, C6orf201, C9orf135, CARTPT, CSTF1,
   DAGLA, DDX19-DDX19L, ELFN2, ERLIN2, FGD4, FN1, FPGT, FYCO1, IL1F5,
   KRT33B, LRRC7, MAGEA2B, MAT2A, MRPS18A, NAGPA, NUP54, ODF4, OR13F1,
   PHYHIP, PPM1M, PPP3CA, PRPF6, PSTK, RABEPK, REP15, RGS9BP, RNASE1,
   ROM1, SEMA3E, SLC2A8, SNORA65, TCTN1, TSPYL6, TTC4, VWC2, WFDC9, XPR1).

   The subGRN2 contains 26 genes (AP2B1, C16orf56, CD86, CD99L2, CDC42SE2,
   CDCA5, CDH24, COMMD1, COQ5, DBT, DHRS1, EYA4, FRAG1, GOSR1, GTF3C2,
   LIMD1, MS4A4A, NUDT11, NUMB, PCDH9, REXO2, SAA4, SPG21, SUOX, ULK1 and
   UST). The only common target (grey node) between subGRN1 and subGRN2 is
   a non-coding RNA gene, LCAP (L1CAM-AS1). (b) Analysis of known (book
   symbol) and novel (flag symbol) interacting partners in the
   pathological and healthy subnetworks were schematically summarized.
   n/a = not available information.

   Seven TFs (FOXM, FLYWCH1, KLF13, PURA, ZFP90, ZNF410, ZNF717) whose
   transcripts have been identified in studies of ALS patients are found
   in our TP53 sub-network ([133]Fig. 3a–b). In addition to some
   zing-finger factors, we detected the PURA (Purine Rich Element Binding
   Protein A) gene, encoding a highly conserved protein with regulatory
   roles in replication and transcription, RNA transport and mRNA
   translation. PURA is also involved in RNA foci formation, as recently
   found in ALS/FTD [[134]38]. All downstream pathological targets
   observed were novel and most of their functions are associated with
   some neurodegeneration mechanisms, including ALS-related ones. For
   instance, we detected the genes encoding a-Raf, Bclaf1 and Ppm1m, known
   mediators of p53-related oncogenic signalling [[135]39]. Other
   predicted interactions, although not directly related to TP53, are
   known to play a role in ALS. One of these involves the DAGLA gene,
   encoding diacylglycerol lipase alpha, the main enzyme involved in the
   biosynthesis of endocannabinoids, important for their antioxidant
   properties and action in multiple neurotoxic pathways and also related
   to ALS [[136]40]. Another interesting finding concerns NUP54, the gene
   encoding nucleoporin 54 - a member of the nuclear pore complex, which
   mediates the shuttling of RNA cargoes, proteins, and TFs from the
   cytoplasm to the nucleus. Indications for the pathological cytoplasmic
   accumulation of RNA-binding proteins have been addressed due to defects
   in the nuclear import and Nup abnormalities in ALS and related
   neurodegenerative diseases [[137]41].

   In addition, the SEMA3E gene, belonging to the semaphorin gene family,
   encodes extracellular signalling proteins involved in the cytoskeleton
   organization and development of many tissues, especially in the nervous
   system for axonal growth. In particular, semaphorins have been
   extensively investigated in ALS pathobiology as candidates of
   axonopathy and muscle denervation [[138]42]. Another member of the TP53
   downstream group, the CSTF1 gene, encodes one of three subunits of the
   cleavage stimulation factor implicated in the polyadenylation and
   3′-end cleavage of pre-mRNAs. In this regard, upregulation of CSTF1
   linked to altered RNA metabolism was revealed in the SOD1^G86R mouse
   model [[139]43]. Recently, it has been reported that CSTF1 expression
   is significantly elevated in the peripheral blood of Alzheimer's
   disease (AD) patients compared with controls, proving its potential
   diagnostic value [[140]44]. Also of interest is that XPR1 was
   identified as a novel susceptibility locus for progressive supranuclear
   palsy, whose clinical manifestations overlap with ALS and Parkinson's
   disease (PD) [[141]45]. In addition, in a study in an ALS Drosophila
   melanogaster model with defects in FUS protein, it was shown that
   knockdown of Xrp1 in motor neurons rescues the phenotypes of ALS-mutant
   FUS flies [[142]46].

   For a clear presentation of the data, targets sharing some functional
   properties and cooperate in the same pathway were clustered in two main
   groups: the cytoskeleton and crosstalk between intracellular vesicles
   (ELFN2, ERLIN2, FGD4, FPGT, FYCO1, IL5F/IL36 R N, LRRC7, MAGEA2B,
   MRPS18A, PPP3CA, RABEPK, TCTN1, TTC4) and the endosomal-lysosomal
   pathway (FN1, NAGPA, REP15, RNASE1, VWC2, WFDC9). Both pathways,
   already associated with ALS, have recently come to attention, showing a
   synergistic connection between mechanisms and organelles (vesicular
   transport, Golgi, endolysosomal and mitochondrial systems) involved in
   the regulation of neuronal trafficking [[143]47].

   Overall, we can speculate that TP53 interaction maps are congruent with
   altered blood-specific transcriptional regulation for two main reasons:
   (i) the impairment of the endolysosomal system implies a dysfunction of
   autophagy, considered one of the key p53-mediated signalling events, as
   already reported for several neurodegenerative disorders [[144]48];
   (ii) the accumulation of extracellular vesicles as vehicles of
   misfolded proteins detectable in ALS patients’ biological fluids, such
   as in blood and cerebrospinal fluid; suggesting their use as clinical
   biomarkers [[145]49].

   As for the healthy subnetwork, although most of the TP53-interacting
   partners were unknown to date, the identified signalling pathways
   correlate well with many responses evoked by p53. Among these are
   indeed found autophagy, tumour suppression, cell cycle and chromatin
   assembly, dysregulation of RNA polymerase III gene and inflammation
   [[146]39]. We also found the expected enrichment of interplaying
   organellar-secretory pathways, demonstrating the relevance of p53 in
   controlling survival mechanisms involving autophagy and endocytosis. An
   interesting target was the COMMD1 gene, whose functions range from
   copper homoeostasis and sodium transport to regulation of the NF-κB
   pathway and expression of hypoxia-inducible factor-1α (HIF-1α)-mediated
   genes [[147]50]. Although there is no evidence for a direct interaction
   with p53, several COMMD1-influenced factors may induce activation of
   the p53 pathway, such as neurotoxicity due to copper accumulation,
   copper-mediated oxidative stress through protein aggregation between
   COMMD1 and mutated SOD1, and altered modulation with nuclear lamin A,
   leading to premature senescence [[148]51]. Therefore, the failure to
   identify the TP53-COMMD1 interaction in the pathological subnetwork
   allows us to postulate that altered regulation of copper homeostasis
   may be one of the main contributors in ALS pathogenesis, as well as one
   of the most studied mechanisms in SOD1 mutant mice to reverse the signs
   of ALS disease by copper therapy [[149]52]. The common downstream
   target of both subnetworks was LCAP/L1CAM-AS1 (lung
   carcinoma-associated 10), a novel long intergenic non-coding RNA
   (lncRNA). As few studies have investigated lncRNA-disease associations,
   we found only one transcriptional analysis (E-GEOD-59612) in the Open
   Targets Platform ([150]https://platform.opentargets.org/), showing
   significant downregulation (Log[2] -1.9; p-value 2.28 × 10-^3) in
   controls compared with glioma patients [[151]53].

2.2.2. SOD1 sub-networks

   In the SOD1 pathological subnetwork, we found a remarkable enrichment
   of genes associated with pathways related to neuronal development,
   namely: ALX4, FOXA1, NEUROG3, PAX5, RORB/NR1F2, SOX13, ZNF2, ZNF280A,
   ZNF513, ZNF555, ZNF594, ZNF747 and ZNF93 ([152]Fig. 4a–b). In contrast,
   genes involved in the role of counteracting oxidative stress, which
   relate to the main physiological function performed by the SOD1 gene,
   are mostly found in the healthy subnetwork ([153]Fig. 4a–b). This
   evidence is compatible with the notion that supersaturated proteins,
   like SOD1, may be are involved in ALS due to their propensity to
   aggregate, rather than because of their biological function [[154]15].
   In this regard, it is known that wild-type SOD1 produces misfolded
   aggregates in spinal cord of ALS patients [[155]54].

Fig. 4a–b.

   [156]Fig. 4a–b
   [157]Open in a new tab

   The SOD1 subnetwork. (a) The SOD1 subnetwork was obtained by including
   all the genes that have P < 1.0 × 10-8 based on their pairwise MI with
   SOD1. For representation purposes, only the first neighbors are shown.
   Red nodes indicate the target genes interacting with SOD1 in the
   pathological network (subGRN1). Blue nodes represent the predicted
   interactions in the healthy network (subGRN2). The size of each circle
   is proportional to the number of the gene interactions. For hubs with
   more than 100 interactions, the exact number of first neighbors is
   shown beside the gene symbol. In subGRN1, SOD1 shows to interact with
   14 genes (ALX4, CREB3, FOXA1, NEUROG3, PAX5, RORB, SOX13, ZNF2,
   ZNF280A, ZNF513, ZNF555, ZNF594, ZNF747, ZNF93), and with 4 genes
   (BCL6, ETV4, TBX6, TSHZ1) in the subGRN2. (b) Analysis of known (book
   symbol) and novel (flag symbol) interacting partners in the
   pathological and healthy subnetworks were schematically summarized.

   Considering FOXA1, this gene is known to regulate midbrain dopaminergic
   neuron development [[158]55], and the NEUROG3 gene controls the spinal
   cord gliogenesis [[159]56].

   Evidence of their interaction has been provided and, in particular,
   co-expression of Neurog3 and Foxa2 has been observed in the mouse
   ventral spinal cord, probably aimed at controlling the correct
   specification of ventral progenitors of serotonergic neurons.

   [[160]57]. The PAX5 gene, known for its involvement in the maintenance
   of neural progenitors in the human brain, has been identified as a
   marker of selectively vulnerable excitatory neurons in the entorhinal
   cortex, also expressed in neurofibrillary inclusions of AD patients
   [[161]58]. Regarding the CREB3 gene, it is strongly associated with the
   expression of several genes involved in endoplasmatic reticulum (ER)
   and Golgi stress response [[162]59]. In contrast, ALX4 and SOX13 are
   genes implicated in embryonic development and cell fate determination
   [[163]60]. Taken together, these findings underscore a lesser known
   activity of SOD1 as a putative neuromodulator of synaptic transmission
   and activation of the transduction pathway involving muscarinic
   acetylcholine receptors, the most abundant receptors type in the CNS
   [[164]61]. Moreover, it is noteworthy that the calculated interactive
   pattern was not based on the mutated form of SOD1, but the gene
   relationships detected involved the SOD1 wild type, as a hub gene, and
   its binding partners that are presumably affected in the pathological
   condition. This could help to understand the possible steps that SOD1
   is able to move in the molecular chessboard, especially in light of the
   pathogenic implication of wild-type SOD1 in ALS cases not linked to
   SOD1 mutations [[165]62].

   In contrast, genes involved in the role of counteracting oxidative
   stress, which is consistent with the main physiological function
   performed by the SOD1 gene, are mostly found in the healthy subnetwork
   ([166]Fig. 4a–b). Among them, we found the BCL6 and ETV4 genes, which
   have not yet been identified as SOD1 partners. The authors of the
   [167]GSE112681 dataset [[168]25,[169]26] used in this study, identified
   the BCL6 gene among the upregulated genes associated with hypoxia and
   inflammation. BCL6, in the context of inflammatory and cell death
   pathways, can determine both activation and inhibition of TP53
   transcription and/or p53 translation, depending on genomic instability
   [[170]63]. The potential BCL6-SOD1 complex could therefore represent a
   novel intermediate of p53-mediated mitochondrial apoptosis [[171]64].
   The role of ETV4 has been more documented in the pathological than in
   the normal context. In particular, apoptosis seem to be triggered by
   upregulation of ETV4 in response to lipid signalling defects, as
   observed in oligodendrocytes of SOD1^G37R mice [[172]65].

   Therefore, we cannot exclude that the unusual interplay among these
   genes is an information inferred by ARACNe-AP simulations. In fact, not
   all impaired gene interactions lead to up or down-regulation, but
   sometimes they are missed in the pathological condition.

2.2.3. ALS2 sub-networks

   The regulatory framework obtained for the ALS2 gene ([173]Fig. 5a–b)
   shows that both subnetworks are enriched mainly in zinc-finger
   transcription factors (ZF-TFs) which control the expression of multiple
   genes involved in cell proliferation and cell fate.

Fig. 5a–b.

   [174]Fig. 5a–b
   [175]Open in a new tab

   The ALS2 subnetwork. (a) The ALS2 subnetwork was obtained by including
   all the genes that have P < 1.0 × 10^−8 based on their pairwise MI with
   ALS2. For representation purposes, only the first neighbors are shown.
   Red nodes indicate the target genes interacting with ALS2 in the
   pathological network (subGRN1). Blue nodes represent the predicted
   interactions in the healthy network (subGRN2). The size of each circle
   is proportional to the number of the gene interactions. For hubs with
   more than 100 interactions, the exact number of first neighbors is
   shown beside the gene symbol. ALS2 shows to interact with 11 genes
   (ADNP, E2F2, FOXP3, SP6, TCF7, ZBTB5, ZFP90, ZIC1, ZNF268, ZNF620,
   ZNF780B) in subGRN1, and with 4 genes (GLIS3, ZNF439, ZNF687, ZNF763)
   in subGRN2. (b) Analysis of known (book symbol) and novel (flag symbol)
   interacting partners in the pathological and healthy subnetworks were
   schematically summarized. n/a = not available information.

   Although the identified ZF-TFs are not all well characterized, we found
   some of them associated with a highly CNS-specific expression pattern.
   In particular, the ZIC1 gene (expressed in the cerebellum [[176]66])
   and the ZFP90 gene (expressed in the caudate nucleus and substantia
   nigra [[177]67]) are found in subGRN1; while the ZNF687 gene (expressed
   in the amygdala, corpus callosum, cerebellum, caudate nucleus, and
   spinal cord [[178]67]) is located in subGRN2. Furthermore, the ALS2
   network analysis revealed at least three genes known to be associated
   with pathological events (ADNP, FOXP3, TCF7), supporting the validity
   of the predictions. In fact, there is ample evidence of the
   neurotrophic function of the ADNP (activity-dependent neuroprotective
   protein) gene, which is implicated in many neurodegenerative disorders
   (AD, PD and ALS) but also in neuropsychiatric disorders, such as
   schizophrenia and autism [[179]68,[180]69]. With regard to FOXP3, its
   upregulation in the T-helper cells of ALS patients has been defined as
   a potential marker of the early phase of the disease [[181]70]. In our
   case, however, the original datasets did not distinguish ALS samples
   according to clinical progression, and so we were unable to prepare
   more time-specific matrices (see 5. Material and Methods).
   Nevertheless, this observation could be interpreted as a useful
   confirmation of blood-specific regulatory signals. Finally, the TCF7
   (Transcription Factor 7) gene has been described as one of the “23
   potential master regulators in ALS motor neuron degeneration”
   [[182]13].

2.2.4. VDAC3 sub-networks

   The networks obtained for the VDAC3 gene ([183]Fig. 6a–b), represent
   the first attempt to characterize and interpret the regulatory pathways
   related to this gene.

Fig. 6a–b.

   [184]Fig. 6a–b
   [185]Open in a new tab

   The VDAC3 subnetwork. (a) The VDAC3 subnetwork was obtained by
   including all the genes that have P < 1.0 × 10^−8 based on their
   pairwise MI with VDAC3. For representation purposes, only the first
   neighbors are shown. Red nodes indicate the target genes interacting
   with VDAC3 in the pathological network (subGRN1). Blue nodes represent
   the predicted interactions in the healthy network (subGRN2). Grey nodes
   are the two common targets shared by both networks. The size of each
   circle is proportional to the number of the gene interactions. For hubs
   with more than 100 interactions, the exact number of first neighbors is
   shown beside the gene symbol. VDAC3 shows to interact with 12 genes
   (BACH1, E4F1, HOXD1, HOXD10, LMX1B, PLAGL1, POU4F3, ZBED5, ZHX2,
   ZNF491, ZNF70, ZNF786) in subGRN1, and with 12 genes (ESR2, FOXN4,
   HES4, HOXB3, IRF5, POU3F3, REXO4, TCF21, ZNF197, ZNF420, ZNF648, ZNF85)
   in subGRN2. Only two common targets (grey nodes) between GRN1 and GRN2
   are found (THRA, ZNF546). (b) Analysis of known (book symbol) and novel
   (flag symbol) interacting partners in the pathological and healthy
   subnetworks were schematically summarized. n/a = not available
   information.

   In the pathological subGRN1 subnetwork, we found some genes
   specifically implicated in defining the characteristics of the spinal
   motor neurons (HOXD1, HOXD10, LMX1B) and in the development of the
   oculomotor system (BRN3A/POU4F, ZHX2) [[186]71]. Indeed, HOXD1 and
   HOXD10 genes are known to coordinate the dorsal and lumbar
   specification of motor neurons [[187]72] ([188]Fig. 6a–b). LMX1B is an
   evolutionarily conserved TF that, in association with LMX1A,
   orchestrates the complex temporal and spatial gene expression of spinal
   cord dorsal horn neurons and midbrain dopaminergic neurons [[189]73].
   The POU4F1 gene (also known BRN3A) regulates the morphology and
   functional diversity of retinal ganglion cells [[190]74], while the
   ZHX2 gene functions as a transcriptional repressor of the developing
   retina [[191]75]. In addition, a small group of genes in the VDAC3
   network (BACH1, E4F1, PLAGL1) has been recognized to be part of the p53
   pathway. In particular, BACH1 is a transcriptional suppressor that
   heterodimerizes with small Maf proteins and binds to Maf recognition
   elements in the promoters of target genes related to the oxidative
   stress response [[192]76]. E4F1 is another transcriptional repressor,
   which can also function as a ubiquitin ligase mediating the
   ubiquitination of chromatin-associated p53 and directly controlling
   genes involved in mitochondrial functions and cell-cycle checkpoints
   [[193]77], its deficiency results in oxidative stress-mediated cell
   death. The PLAGL1 gene, also known as Zac1, seems to be implicated in
   neural plasticity mechanisms in the mouse model [[194]78,[195]79] as
   well as in the mitochondrial apoptosis, being a direct target of p53
   [[196]80]. In the healthy subGRN2 subnetwork, we found three genes
   (ESR2, HES4, IRF5) functionally related to neuroinflammatory
   mechanisms. ESR2 gene encodes a member of the estrogen receptor family
   that influences mitochondrial membrane permeabilization in response to
   toxic events, such as ischemic brain injury and neurodegeneration
   [[197]81]. Epigenetic alteration of the HES4 gene promoter is
   associated with striatal degeneration in post-mortem brains of
   Huntington disease patients [[198]82]. The IRF5 gene is a known key
   mediator of innate and acquired immunity able to regulate numerous
   immune cells and pro-inflammatory cytokines [[199]83]. Interestingly,
   its involvement in mediating neuronal injury was identified in a recent
   work, showing that knockdown of IRF5 in NSC-34 cells inhibits apoptosis
   and mitochondrial redox homeostasis [[200]84].

   The subGRN2 also showed a group of genes associated with CNS
   development, such as FOXN4, implicated in retinogenesis and spinal
   neurogenesis [[201]85]. In the same network the following genes have
   been also found: HOXB3, mainly investigated in PD mice models for its
   role in the development of dopaminergic neurons [[202]86]; POU3F3, also
   named Brain-1, a well-known inhibitor of apoptosis and involved in
   neural development; it appears to play a central role in developmental
   delays and/or intellectual disability [[203]87]. Moreover, the ZNF197
   gene regulates the DNA methylation pattern in the placenta, which is
   influenced by copper metabolism [[204]88], and ZNF420 (also named Apak)
   is a KRAB-type zinc-finger protein that negatively regulates
   p53-mediated apoptosis [[205]89]. Also, the involvement of the THRA
   gene in human cortical development has recently been discovered, with
   neurocognitive and sensory-motor deficits in cases of its
   downregulation [[206]90]. These results support the known role of
   mitochondrial ROS sensor attributed to VDAC3 [[207]36] together its
   involvement in the organelle functions. These VDAC3 functions are well
   correlated with the oxidative stress linked to mitochondrial
   dysfunction and neurodegeneration [[208]91]. Specifically, the presence
   of overoxidized cysteines in VDAC3 from ALS samples is reported
   [[209]37] while in physiological contexts the single cysteines of VDAC3
   are devoid of such modifications, thus able to form S-S bonds with
   protein interactors [[210]36,[211]37]. This would suggest that
   oxidative damage typical of ALS produces structural and functional
   defects at VDAC3 capable of both modifying the pool of its interactors
   and limiting its role as a sensor of mitochondrial oxidative state.

   Overall, the analysis of the VDAC3 subnetworks, pathological and
   healthy, suggests that VDAC3 may have a role in neuroinflammatory
   mechanisms but also in the development of spinal motor neurons and the
   oculomotor system. These findings complement the expression data from
   the Expression Atlas repository ([212]www.ebi.ac.uk/gxa/home) which,
   for humans, report higher levels of VDAC3 gene in the cerebellum,
   diencephalon and spinal cord [[213]79].

3. Discussion

   Over the past 15–20 years, different computational methods have been
   developed to infer gene regulatory networks for the analysis of
   transcriptomics data from different sources [[214][92], [215][93],
   [216][94]]. Of particular interest is the ability of differential
   co-expression analysis methods to identify regulatory genes underlying
   various phenotypes. These methods are mostly useful for identifying
   disease regulatory genes in pathological phenotypes [[217]95], but also
   offer interesting insights for developing effective therapeutic
   approaches. Unfortunately, to date, studies aimed at inferring gene
   regulatory networks in ALS are lacking and no synthetic biology
   approach has been used to address this topic.

   In this study, adopting a reverse engineering strategy based on the use
   of the ARACNe-AP algorithm, we analysed transcriptomic data from the
   blood of a large sample of ALS patients and their controls. By this
   approach we produced novel data on ALS and already known information
   that strengthen our study. Exactly, we inferred 7 different ALS
   regulatory genes networks (GRNs) for many phenotypic conditions of the
   disease. Specifically, GRN-1 and GRN3-7 were relative to survival time,
   site of onset (spinal or bulbar), and probability of developing the
   disease according to the sex. In each of these transcriptional
   regulatory networks, we detected the most relevant ALS biological
   pathways and potential ALS-promoting mechanisms, some of which
   completely novel. Among them, we identified signs of increased
   vulnerability in male patients and the activation of the radiation
   response-related network as a disease-specific signature for female
   BALS patients.

   In this regard, one study demonstrated reduced metabolism in BALS
   patients [[218]96], while another showed in SALS upregulation of the
   miR-23a-3p, a microRNA associated with motor neurons loss in BALS
   [[219]97].Taken together, these findings highlight the relevance of the
   distinction between SALS and BALS for predicting disease outcome and
   assessing treatment.

   Furthermore, we divided the large-scale data into smaller and more
   informative network segments, obtained thus subnetworks of some hub
   genes (TP53, SOD1, ALS2, VDAC3) that could function as potential
   pathological drivers of ALS. By investigating the pathological and
   healthy subnetworks of the selected hub genes, we discovered different
   disease-related pathways and, interestingly, their interconnection with
   p53-mediated pathways. This result suggests a potential neurovascular
   response to ALS neuroinflammation.

   In addition, data from ALS blood [[220]98,[221]99] confirm that patient
   blood cells respond to systemic and local signals with changes in gene
   expression similar to those found in the spinal cord or brain of ALS
   patients. This underlines the value of blood for diagnostic purposes,
   to assess the onset and progression of ALS. Moreover, the research
   strategy used in this study responds well to the growing need to study
   the disease with alternative approaches, also aimed at using
   investigation methods that do not involve animal models. In fact, the
   limitations of preclinical mouse models in recapitulating the
   complexity of the human disease are well known.

   Overall, the synthetic biology approach applied in this study has
   allowed to successfully address some of the new challenges of ALS
   research: the analysis of big data released in patient databases;
   developing a broad and detailed overview of the molecular mechanisms of
   the disease; reducing animal studies; paving the way for the search in
   the blood of ALS “gold standard” biomarkers; laying the foundations for
   the design of new potential therapies for ALS.

3.1. Identification of biological pathways and topological key motifs in ALS
GRNs

   The prevalence of certain biological processes in a network over others
   may denote the activation of a regulatory axis with a more or less
   significant impact on the phenotype. Our analysis revealed few some
   differences and altered functions in pathway enrichment, with some
   punctual changes resulting in a noticeable regulatory shift of specific
   groups of hub genes. In particular, pairwise comparison of pathological
   (GRN1), healthy (GRN2) and disease control (GRN3) networks revealed
   clear differences in the pathway representation of affected and healthy
   individuals ([222]Table 1a). Indeed, in GRN2 we mainly observed an
   enrichment related to common circulatory pathways (e.g. immune response
   and platelet aggregation), confirming already known data [[223]28].
   Interestingly, in GRN1 and GRN3 the top pathway found was 'radiation
   response’ (GO:0009314). Under this macro-category are the response
   pathways to light stimuli and ionizing radiation, comprising almost 60
   genes encoding for the antioxidant, anti-inflammatory and DNA repair
   machinery. This finding underlines the role of radiation-induced
   oxidative stress that correlates well with certain inflammatory
   mechanisms in ALS, such as altered lipid and protein metabolism and
   energy imbalance [[224]100]. The identification of this peculiar
   oxidative stress condition in blood samples from ALS patients supports
   the so-called “neurovascular scenario”, according to which alterations
   in gene expression may be associated with some altered ALS signalling
   pathways due to blood-brain barrier damage [[225]101]. This suggests
   that the adoption of diagnostically valid and non-invasive blood
   biomarkers related to genes of the ALS regulatory network, could be
   advantageous not only to initiate preventive actions, but also as an
   aid in monitoring the course [[226]75]. Furthermore, it is known that
   low-dose radiation can induces increased oxidative metabolism and redox
   imbalance of the central nervous system (CNS) microenvironment,
   mitochondrial dysfunction and protein degradation, leading to neural
   apoptosis in many neurodegenerative and inflammatory diseases
   [[227]102]. Among the 60 genes we identified by GO analysis ([228]Table
   1; [229]Table S2) there are 3 (CCND1, MMP1 and MMP3) suspected to play
   a role in ALS and related to radiation-induced oxidative stress. In
   particular, the cyclin D1 (CCND1), not only is a radiation-responsive
   gene but also interacts with the FUS gene by blocking its translation.
   This action is known to be toxic for motor neurons [[230]103].
   Regarding the matrix metalloproteinase genes, MMP1 and MMP3, they
   represent potential biomarkers of motor neuron degeneration since their
   deregulation is associated with ALS [[231]78]. The identification of
   reliable biomarkers is one of the main goals of ALS research [[232]25].
   ALS is still an incurable disease, with few treatment options useful
   only to slightly slow down its course. Having ‘gold standard’
   biomarkers would certainly help the development of new drugs for this
   disease.

   In addition, we performed a topological analysis of the inferred GRNs,
   since it is known that the structure of a network can be as informative
   as its complexity. Of particular interest is the analysis of
   disease-associate networks because pathogenesis usually induces
   topological variations useful to distinguish the pathology [[233]104].
   Our results shows that the genomic data from GRN1-GRN3 reflect well the
   'topological’ characteristics of their networks ([234]Fig. S2). In
   particular, a common L-shaped spatial conformation of gene interactions
   in all three networks (GRN1-GRN3) is easily recognizable. This could
   denote a hierarchical pattern of molecular signals expressed in blood
   as a gene signature of the target tissue. Moreover, the density and
   orientation of genes in GRN1 overlap with those in GRN3, sharing a
   similar arrangement of the central body and side branches ([235]Fig.
   S2). These data reveal thus features in common between the profiles of
   ALS patients (i.e. the GO:0009314 “radiation response” pathway) and
   those of controls. Also, some transcripts of the healthy regulatory
   profiles in GRN2 are missing or severely impaired in the pathological
   GRN1 network and the GRN3 fusion network ([236]Fig. S2). These results
   are in line with what is known, in general, about the topology of
   networks. Namely, that the interacting components of a network with a
   constant topology have evolved to perform specific cellular functions
   [[237]105]. Although parameters that do not change the topology of a
   network may equally influence its response dynamics [[238]106].

   Concerning the GO signal (GO:0009314) of “radiation response” was not
   found in SALS networks (GRN4) or “male ALS patients” (GRN7), enriched
   instead in pathways related to protein metabolism. The signal “response
   to radiation” was instead found in both BALS - patients with reduced
   survival (GRN5) ([239]Table 1b) and “female ALS patients” (GRN6)
   networks ([240]Table 1c) normally less affected than men by the disease
   and more prone to spinal onset ([241]Figs. S1a–b). This leads to the
   hypothesis that the gene enrichment characteristic of BALS women may
   derive from the impairment of the protective effects of antioxidant
   genes, essential to attenuate/counteract the deleterious effects of the
   disease [[242]79]. Therefore, radiation-induced oxidative damage would
   seem to have an important role in the pathogenesis of ALS. There are
   several reports that associate ALS with defective DNA repair. For
   example, FUS pathology has been shown to disrupt DNA damage response
   signalling (DDR) [[243]107,[244]108]. While an important role in the
   NHEJ mechanism of DNA double-strand break repair and mismatch repair
   was discovered for TDP43 [[245]109,[246]110].On the other hand, loss of
   key DDR components leads not only to accumulation of DNA damage but
   also to marked proteotoxic stress and genome instability [[247]111].
   The latter condition can be induced by radiation exposure, and
   interestingly, it has been reported that low-dose radiation can
   increase the risk of neurodegeneration [[248]112]. Regarding the
   radiation-induced oxidative damage, is it not surprising that
   edaravone, prescribed to ALS patients for its anti-oxidant and
   anti-inflammatory effects, was studied to assess its ability to protect
   cells from the oxidising action of radiation [[249]113].

   As for GRNs 1–3, also for GRNs 4–7, the ontological correspondence
   coordinates well with the physical model of the networks, a: GRN4
   resembles GRN6 and GRN5 resembles GRN7 ([250]Fig. S2). This shows that
   the networks reconstructed by ARACNe are plausibly matched and are able
   to predict two models of the same disease but distinct for phenotypic
   profile, a less resilient one composed of male BALS patients and a more
   resilient one composed of female SALS patients. These data correlate
   well with findings from other studies focusing on the influence of sex
   and site of onset in ALS progression. Moreover, they hint at the
   importance of sex-specific factors in understanding the pathophysiology
   of ALS, as also suggested by others [[251]114]. Interestingly, a recent
   study on iPSC motor neurons showed that few genes are differentially
   expressed between ALS patients and healthy controls, although the
   number of differentially expressed genes in ALS is much higher in males
   than in females. In particular, in ALS males, an enrichment of stress
   response pathways related to TNF, NF-kB, and radiation exposure was
   found. Instead, in ALS females, pathways related to ribosomal function,
   oxidative phosphorylation, as well as mitochondria were more involved
   [[252]115]. Also, it has emerged the idea that sexual dimorphism in ALS
   results from interactions between the central nervous system and
   peripheral organs, implicating the vascular, endocrine, metabolic,
   musculoskeletal, and immune systems, which are physiologically
   different between males and females [[253]116].

   In order to identify putative prognostic markers of ALS specifically
   associated with patients' survival time, we implemented disease
   stratification by enhancing the pathway-level analysis of gender
   responses to the type of ALS site of onset. The obtained results
   suggest the existence of a lowest common denominator among the
   different phenotypes associated with ALS patients [[254]118]. By
   analyzing the 61 gene signatures for predicting survival of ALS
   patients, we showed that the same two markers of poor prognosis
   (SLC40A1, CLEC7A) are found in the spinal, bulbar and female networks.
   This confirms the accurate inference of the algorithm in predicting
   significant regulators of gene perturbations. Contextually, we also
   questioned the significance of the absence of the CLEC7A marker in the
   male network. This gene is a member of the superfamily of proteins
   containing the C-type lectin/C-type lectin-like domain (CTL/CTLD),
   typically expressed by cells of the innate immune response. Recent
   evidence in ALS patients [[255]119] and in mouse models of ALS
   [[256]120] evaluated it as a key gene of microglial activity,
   associated with short survival and severe condition in a male-dependent
   manner. The results obtained surprised us because, considering the ALS
   case history, we expected more worse prognosis targets in the male
   network than in the female network. The Cox proportional-hazards model
   used by Swindell et al. [[257]25] identifies gene predictors of ALS
   survival with a different approach from ours. In fact, ARACNe-AP
   analysis can obtain information on the transcriptional relationships of
   hub genes, not on changes in gene expression levels [[258]24]. In
   particular, the construction of the matrix of gender-distinct disease
   profiles allowed us to identify some subtle regulatory differences more
   stringently. In this regard, loss of CLEC7A could be an aggravating
   factor that exacerbates the vulnerability of male patients. However,
   this is a hypothesis that requires further study to be confirmed. In
   any case, our results are in line with other evidence on the role of
   this gene and lead to narrow the assessment of a differential
   neuroinflammatory response in male and female due to the absence of a
   neuroprotective mechanism caused by CLEC7A. The inferred GRN thus
   reveals some hidden pathways of initiation, suggesting that attention
   should be turned to as yet under-investigated aspects [[259]5,[260]69].

3.2. ALS transcriptional networks of TP53, SOD1, ALS2 and VDAC3 genes

   In silico analysis of the subGRNs sub-networks of the four selected hub
   genes produced interesting and useful data to better understand ALS.
   First, within the individual subnetworks deduced for each hub gene, we
   found several novel target genes, not yet associated with ALS and
   putatively useful as disease biomarkers. Also, for each analysed hub
   gene, the relative involvement in specific pathways was highlighted,
   many of which related to mechanisms of neurodegeneration, some directly
   related to ALS, others more generic but no less interesting. For
   example, the neuronal development or spinal motor neuron pathways found
   associated, respectively, with the pathological subnetworks of SOD1 and
   VDAC3. Interestingly, VDAC3 also seems to be involved in the
   development of the oculomotor system, the only apparatus whose muscles
   are not affected by ALS. TP53 appears to promote some preferential
   pathways involving autophagy and neuronal apoptosis in the ALS contest,
   which are also correlated to endolysosomal and mitochondrial metabolism
   genes [[261]30]. Another interesting result emerged from the
   elaboration of the ALS2 gene subnetwork. Indeed, its transcriptional
   map was the one with the highest number of ALS-related hits. A data of
   great interest for drug design on new target molecules.

   Certainly, a strong finding from this analysis is that we have revealed
   the interconnection of the four selected hub genes (TP53, SOD1, ALS2,
   VDAC3) with p53-mediated pathways, which suggests a potential
   neurovascular response to ALS neuroinflammation. This is confirmed by
   the results of other studies that identify p53 as a key regulator of
   neurodegeneration caused by ALS-associated genes [[262]33,[263]34]. Our
   data therefore reinforce the previous results and, in addition, provide
   evidence that the aforementioned mechanisms may have a primary role in
   the onset of the ALS, both sporadic and genetic. Furthermore,
   neuroinflammation also correlates well with other pathways associated
   with the 4 hub genes identified by us, namely the response to radiation
   damage or to the oxidative stress associated with it. In this regard,
   VDAC3, functioning as an oxidative stress sensor, could well guide this
   response by controlling various pathways involved, including
   mitochondrial metabolism itself [[264]91]. It is therefore possible
   that DNA damage, as well as the related genomic instability, may have a
   significant role in the onset of the disease.

   Overall, the information on the molecular mechanisms associated with
   ALS resulting from this analysis, in addition to being overall
   consistent with the data reported in the literature, have the value of
   providing a new integrated vision of the transcriptional regulators of
   ALS. A vision that not only helps to broaden knowledge about the
   disease but is also useful to identify certain molecular features on
   which to focus future studies also with the intent of discovering
   effective ALS biomarkers.

3.3. Study limitations and strengths

   We are aware that our study is burdened by some weaknesses. Probably
   the main ones are related to the quality and quantity of input data
   (e.g. GEP) used to generate the GRNs. In this regard, the limited
   number and accessibility of repository datasets [[265]121] made it
   difficult to collect a sufficiently large sample to prepare input
   matrices for ARACNe-AP with case and control GEPs from different tissue
   samples. In particular, the limited availability of GEPs based on
   human/murine spinal cord and neurons prevented the expected
   cross-validation between tissues and species, forcing us to scale down
   the initial research design by looking for information from microarray
   data from blood of ALS patients. However, there are several research
   lines [[266][122], [267][123], [268][124], [269][125], [270][126],
   [271][127]] that demonstrate that blood is a promising alternative to
   the use of other biological samples. Also, some studies have evaluated
   the link between pathological processes in the brain and
   transcriptional levels in blood cells of ALS subjects
   [[272]99,[273][127], [274][128], [275][129]].

   In fact, although not directly affected by the disease, blood is still
   as “informative” as other tissues considered more suitable (e.g.
   cerebrospinal fluid, CSF) thanks to the continuous exchanges at the
   blood-CSF barrier. In addition, blood is easily available and sampled
   by patients and therefore highly suitable for meta-analysis studies
   with statistically valid results, unthinkable to obtain from CSF or
   spinal cord. Therefore, the transcriptional signature in peripheral
   blood cells is a useful surrogate for evaluating CNS gene expression.
   This makes blood an excellent biological sample in which to search for
   ALS biomarkers also useful to speed up the diagnosis, especially in the
   early stages of ALS, and to follow patients in the follow-up of
   treatment.

   Another gap of this study stems from the limited information retrieved
   on the exact number of sporadic and familial patients, genetic
   variations, sample ID associated with sporadic and familial profiles of
   the [276]GSE112681 dataset we used. Probably, this constraint hindered
   the opportunity to perform more accurate inferences and interpretations
   on the stratification of the ALS population and thus elaborate more
   deeply the regulatory mechanisms of the genotype-phenotype correlation.
   However, we believe that most of weaknesses may be compensated by the
   size of the final sample analysed and the potential of the algorithm
   used.

   Among the strength points of our GRN analysis is the ability of
   ARACNe-AP to model the complex nonlinear relationships between genes.
   Indeed, by capturing the interactions between multiple TFs and their
   target genes, GRN analysis can reveal how perturbations in one part of
   the network propagate through the overall system, leading to disease.
   This is particularly important in ALS for two reasons. First, because
   the effects of genetic mutations may not be immediately evident at the
   level of individual genes, becoming more evident when considering the
   broader regulatory context. Second, it is particularly useful if the
   input data, as in the present study, lack some anamnestic details
   without affecting the robustness of the reverse engineered models.

4. Conclusions

   In this work, blood networks of from ALS patients and control subjects
   were reverse engineered by ARACNe-AP algorithm to determine the
   specific gene interactions of each network, as well as the pathways
   linking the normal state to the ALS condition. The potential of the
   designed reverse engineering framework, based on the processing of
   transcriptome data useful for predicting phenotypic differences between
   pathological and healthy subjects, allowed us to identify a
   considerable number of potential ALS-promoting mechanisms and putative
   transcriptional biomarkers previously unknown. Noteworthy is the
   presence of SOD1 among the hub genes in the blood of non-SOD1 ALS
   patients. Among the most interesting results emerging from our
   predictive study is the characterization of the radiation-induced
   oxidative pathway (GO:0009314), which associated the “female gender
   with ALS” group (less vulnerable to the disease than the “male gender”
   group) with bulbar onset (ALS patients with worse prognosis) with the
   weakest pathway signal among the most significant pathological
   mechanisms usually investigated [[277]117]. Another significant finding
   is the presumed transcriptional cooperation of hub genes playing a role
   in inter-organelle communication. In particular, communication between
   the ER, mitochondria, lysosomes, and Golgi apparatus correlates well
   with the known activity of p53 in counteracting neuroinflammatory
   response [[278]25,[279]130].

   Overall, the originality and importance of the study consist in having
   predicted/identified, for the first time:
     * 1.
       Connections between single molecular mechanisms of ALS related to
       mutations on specific genes;
     * 2.
       Molecular pathways associated with specific ALS phenotypes, related
       to the site of onset and/or to the sex of the individual;
     * 3.
       VDAC3 as a hub gene of GRNs in ALS.
     * 4.
       Putative pathological targets of TP53 in ALS;
     * 5.
       Possible disease biomarkers in the blood of patients, from which to
       design putative therapies against ALS.

   Taken together, our results demonstrate the utility of an integrated
   systems biology approach to obtain a detailed view of the gene
   interaction networks putatively involved in the regulation of the
   molecular mechanisms related to ALS. The proposed approach is not
   limited to this biological context but could also find application in
   other disease contexts.

5. Materials and methods

5.1. The algorithm for the reconstruction of accurate cellular networks -
adaptive partitioning version (ARACNe-AP) pipeline

   In order to map the transcriptional interplay that may contribute to
   ALS pathogenesis, we used the reverse engineering algorithm for the
   reconstruction of accurate cellular networks - adaptive partitioning
   version (ARACNe-AP) [[280]24] and GEPs of the [281]GSE112681 dataset
   [[282]25,[283]26]. These datasets were generated in the Netherlands
   from peripheral blood samples taken from a panel of ALS patients and
   from a control group of neurologically healthy individuals. The
   ARACNe-AP algorithm, developed by Andrea Califano
   [[284]8,[285]23,[286]24] is an advanced computational methodology in
   the field of computational systems biology. ARACNe-AP is designed to
   infer regulatory relationships within biological systems by
   statistically analyzing gene expression data. It utilizes MI and
   statistical significance tests to deduce direct interactions between
   genes, thereby revealing potential gene regulatory networks. This
   algorithm aims to provide a rigorous and data-driven approach for the
   reconstruction of accurate cellular networks, contributing to a deeper
   understanding of biological processes at the genomic level.

   The inference of gene regulatory networks using the ARACNe-AP algorithm
   can be described in a series of 9 steps:
     * 1.
       Data Acquisition and Pre-processing:
          + -
            Begin with gene expression data, typically in the form of
            microarray or RNA-Seq datasets [[287]131].
          + -
            Pre-process the data by normalizing and filtering to remove
            noise and systematic biases.
     * 2.
       MI Calculation:
          + -
            Calculate the MI between pairs of genes. MI measures the
            statistical dependence between two variables - in this case,
            the expression levels of genes.
     * 3.
       Statistical Significance Assessment:
          + -
            Estimate the statistical significance of the MI scores to
            identify meaningful interactions while controlling for false
            positives.
          + -
            Typically, permutation tests or other statistical methods are
            used to determine significance thresholds.
     * 4.
       Network Construction:
          + -
            Create an initial network where nodes represent genes, and
            edges represent MI scores between gene pairs.

   These edges could transduce a signal of:

   Activation: A directed arc from a gene (node) A to a gene (node) B
   indicates an activation effect. In other words, gene A promotes or
   stimulates the expression of gene B. This can occur when gene A
   produces a protein or TF that acts as an activator for gene B.

   Inhibition: A directed arc from gene A to gene B can also represent an
   inhibition effect. This means that gene A suppresses or reduces the
   expression of gene B. This can occur when gene A encodes for a protein
   or TF that acts as a repressor for gene B.
     * -
       Apply a significance threshold to retain only significant
       interactions, thereby reducing the network to a subset of
       high-confidence edges.
     * 5.
       Data Processing Inequality (DPI) Pruning:
          + -
            Apply the DPI principle to remove indirect interactions and
            false positives from the network.
          + -
            DPI pruning ensures that only direct interactions are retained
            in the network.
     * 6.
       Network Refinement:
          + -
            Further refine the network by removing any remaining redundant
            or spurious interactions, using techniques such as the Minimum
            Spanning Tree (MST) algorithm.
          + -
            This step helps to simplify the network structure.
     * 7.
       Network Visualization and Analysis:
          + -
            Visualize the resulting gene regulatory network to gain
            insights into the relationships between genes.
          + -
            Analyse network properties, such as hubs, clusters, and
            central nodes, to identify key regulators and functional
            modules.
     * 8.
       Biological Interpretation:
     * -
       Functionally annotate the genes in the network to understand their
       roles in biological processes.
     * -
       Investigate the potential regulatory pathways and feedback loops
       within the network.
     * 9.
       Iterative Process:
          + -
            Gene regulatory network reconstruction is often an iterative
            process, with refinement based on additional data and
            experimental findings.

5.2. Gene expression dataset analysis

   The transcriptomic data that we used to infer the GRNs derive from 1117
   whole blood RNA samples included in the SuperSeries [288]GSE112681
   (composed by following SubSeries [289]GSE112676 and [290]GSE112680),
   available from the Gene Expression Omnibus (GEO) database
   ([291]www.ncbi.nlm.nih.gov/geo); (see Refs. [[292]25,[293]26]). In
   details, among the 1117 microarray expression profiles reported in
   [294]GSE112681, 397 are patients with ALS, 75 are patients defined as
   ‘ALS-mimics’ who affected by diverse ALS-like conditions (the most
   common diagnoses were benign fasciculations, spinal muscular atrophy,
   and myelopathy), and 645 are neurological healthy volunteers matched
   for gender and age. Samples were hybridized to two different platforms:
   Illumina's HumanHT-12 version 3 and version 4 BeadChips according to
   manufacturer's protocol (Illumina, Inc., San Diego, CA, U.S.A.). Some
   phenotypic data of [295]GSE112681 are reported in Refs.
   [[296]25,[297]26], and these allowed us to prepare different networks
   based on gender (239M/158F), survival status (from one year up to 10
   years); and site of onset (251 spinal-onset vs. 146 bulbar-onset
   samples) ([298]Figs. S1a–b). The gender composition of the original
   cohort study includes 60.2 % males (M) and 39.8 % females (F) of ALS
   patients; 55.3 % M and 44.7 % F in the control group. Regarding the
   genetic mutation screening of familial and sporadic ALS, the C9orf72
   expansion is found in 5.2 % of [299]GSE112676 and 12.8 % of
   [300]GSE112680; however, this feature has not been annotated in the
   corresponding GEO file of the patient. A very particular feature
   reported in both datasets is the diagnostic form of ALS patients, the
   spinal-onset (SALS) and bulbar-onset ALS (BALS). In the [301]GSE112676
   were diagnosed with the 61.4 % of SALS patients and the 38.6 % of BALS
   patients, while the 66.9 % of SALS and the 33.1 % BALS in the
   [302]GSE112680. Moreover, the average age of ALS patients is 63.6
   (55–72 years), and 62.3 for controls (55–69 years). The survival time
   of ALS patients ranges from less than one year to more than 10 years
   (in just a few cases).

   According to the properties of the study sample reported in
   [303]GSE112681, inclusion and exclusion criteria were defined to answer
   to our research question. Hence, the large-screening dataset
   ([304]GSE112681) is eligible (inclusion criteria) as ARACNe-AP
   algorithm requires ≥100 GEPs to build GRNs [[305]24]. With the aim to
   ensure the representativeness of ALS population and the statistical
   power of the sample-specific networks analysis [[306][132], [307][133],
   [308][134]], we assembled ARACNe-AP matrices to generate GRNs using a
   reduced set from the original entire sample of 1117 profiles, selecting
   794 expression profiles exclusively associated to patients with
   confirmed ALS diagnosis (inclusion criteria). Therefore, the previously
   mentioned 75 ALS-mimics (exclusion criteria) were excluded from the
   current study sample.

5.3. Gene network inference

   For optimal analysis, ARACNe-AP meets two certain requirements, large
   dataset of GEPs (≥100), as first input, and the list of gene regulators
   (e.g. TFs) of the study organism [[309]24], as second input.

   The consensus network inferred with ARACNe-AP was built using a subset
   of 794/1117 GEPs ([310]GSE112681) [[311]25,[312]26], and a list of 1635
   human TFs available from database of Terrence Donnelly Centre for
   Cellular and Biomolecular Research (CCBR), University of Toronto (1.01
   version) (humantfs.ccbr.utoronto.ca/download.php) ([313]Table S1)
   [[314]27] to determine whether the inferred activities of each
   regulator were significantly correlated using a conservative MI
   threshold (p ≤ 1.0 × 10-8, i.e., p ≤ 0.05 Bonferroni corrected for all
   candidate interactions).

   With regards of data normalization, each gene matrix prepared for
   ARACNe-AP contains as input data quantile normalized values of GEPs
   processed as reported in [315]GSE112681 [[316]25,[317]26]. In details,
   the used raw microarray data files contain the raw intensity
   measurements (raw counts) directly from the microarray scanner where
   the proper gene annotations are available to map probe sets to gene
   IDs. Background correction methods are used to correct for this noise.
   Microarray data have been normalized to make the distributions of probe
   intensities similar across all arrays. It has been used the standard
   procedure of the quantile normalization. Microarray expression have
   been log2-transformed to stabilize the variance and make the data more
   normally distributed. This step is very important because ARACNe-AP
   uses mutual information, which assumes that the data are continuous and
   appropriately scaled. In practice the data were normalized using
   quantile normalization with IlluminaGUI in R. After quantile
   normalization, gender was checked by expression of gender-specific Y
   chromosomal probes (JARID1D and RPS4Y1). Samples with gender mismatches
   were excluded. Subsequently expression values were log2 transformed and
   quantile normalized. Arrays were projected along the first and second
   principal component and outliers were dismissed. Moreover, the genes
   with very low variance across samples have been filtered out since they
   are unlikely to contribute meaningful information to network inference.
   Finally, using k-nearest neighbors, missing values were identified in
   the data set, samples or genes with missing values were removed. It is
   crucial to ensure that the final input file has no missing values, as
   ARACNe-AP requires complete data for accurate MI calculation.

   As previously specified, the clinical group of ALS-mimics of the
   original dataset was not included to avoid potential confounding
   variables in the regulatory analysis. The network inferred with
   ARACNe-AP is a directed graph that consists of a series of arches
   representing the connections between regulators (predefined list of
   input genes, e.g. TFs) and their targets. Each connection is computed
   by the correlation measure of MI that indicates the pairwise connection
   strength. The huge complexity of the network output was filtered using
   the statistical parameter of 95th perc of MI to select gene connections
   with higher-degree MI corresponding with the strongest gene regulators
   [[318]135] ([319]Table S2). To identify hub genes and their related ALS
   and healthy control subnetworks, subGRN1 and subGRN2 respectively, we
   used the top 5 % and 30 % ranked arcs of the complete network between
   the regulators and their targets [[320]24].

   We prepared different matrices with 794 GEPs from [321]GSE112681 to
   build the following seven GRNs: ALS patients (both genders, SALS and
   BALS condition; n = 397) named “pathological network” (GRN1); only
   controls (n = 397), defined as “healthy network” (GRN2); all samples of
   GRN1 and GRN2 (n = 694), defined as “disease-control network” (GRN3);
   only SALS patients (n = 251), called “SALS network” (GRN4); only BALS
   patients (n = 146), “BALS network” (GRN5); only female patients (SALS
   and BALS including; n = 158), “female ALS network” (GRN6); and, only
   male patients (SALS and BALS including; n = 239), “male ALS network”
   (GRN7).

   Before running all GRNs simulations, the hub gene list ([322]Table S2)
   was pruned of probes associated with uncharacterized or unknown
   functions (e.g. LOC, KIAA, FAM, FLJ, DKFZp, C#orf) to reduce random
   noise [[323]136]. In addition, before generating the two
   gender-specific GRNs (GRN6-7), the hub gene lists of GRN6-7 ([324]Table
   S2) were cleared of genes implicated in gonadal differentiation
   [[325]29]. To simulate a temporal signature of disease state in
   relation to the different ALS onset site and gender, all GEPs were
   preprocessed stratifying patients based on the survival duration, as
   illustrated in [326]Figs. S1a–b. The clustering of survival data was
   accomplished to compute GRN4-5 and GRN6-7, in order to analyse
   differences in the gene regulatory pattern among these GRNs and uncover
   possible critical mediators or interdependencies of genes/biological
   pathways involved in the ALS progression mechanisms and prognostic
   outcomes.

5.4. Pathway analysis

   We analysed the list of ARACNe-AP inferred genes at 95th perc of MI
   ([327]Table S2) by using Gene Ontology (GO) (geneontology.org/). We
   also used the hierarchically organized underlying GO-terms calculated
   by Fisher's exact test and using the Bonferroni correction for
   P < 0.05, and the Panther Overrepresentation test (release 20210224;
   Panther v.16.0 released 2020-12-01) in order to find the biological
   pathways of each reconstructed network [[328]137].

CRediT authorship contribution statement

   Xena G. Pappalardo: Writing – review & editing, Writing – original
   draft, Formal analysis. Giorgio Jansen: Writing – original draft,
   Visualization, Software, Formal analysis, Data curation. Matteo
   Amaradio: Visualization, Software. Jole Costanza: Software, Data
   curation. Renato Umeton: Software, Data curation. Francesca Guarino:
   Writing – review & editing, Supervision. Vito De Pinto: Writing –
   review & editing, Supervision, Funding acquisition. Stephen G. Oliver:
   Writing – review & editing, Writing – original draft, Supervision,
   Conceptualization. Angela Messina: Writing – review & editing, Writing
   – original draft, Supervision, Funding acquisition, Conceptualization.
   Giuseppe Nicosia: Supervision, Methodology, Conceptualization.

Data availability statement

   All data generated or analysed during this study are included in this
   published article (and its Supplementary Information files).

Ethical approval

   Not applicable.

Funding

   This research was funded by the European Union-Next Generation EU -
   PRIN 2022 PNRR call, Project No. P20225J5NB, to Angela Messina.

Declaration of competing interest

   The authors declare that they have no known competing financial
   interests or personal relationships that could have appeared to
   influence the work reported in this paper.

Acknowledgements