Highlights
• HistoMIL: a package that streamlines deep learning tasks on histopathology slides
• Comprehensive preprocessing and efficient implementation of MIL algorithms
• Up to 85% AUROC in predicting hallmark gene expression in breast cancer H&E slides
• Cell cycle-related gene expression was most easily captured in pathology slides

Histology; Artificial intelligence; Machine learning

Introduction

Histopathology slides stained with hematoxylin and eosin (H&E) are widely regarded as the gold standard for diagnosing cancer and other diseases. Deep learning (DL)^1 based approaches have demonstrated remarkable potential for reproducing the workflows of human experts employing such slides in a variety of tasks, e.g., diagnosing cancer and classifying tumor types^2,3; segmenting sub-regions at the pixel level to identify nuclei or tissue boundaries^4,5; and predicting important clinical metrics such as survival,^6 recurrence rates^7,8 and response to treatment.^9 Several studies have shown that DL-based approaches can also predict more complex molecular characteristics from whole slide image (WSI) datasets, such as microsatellite instability, DNA damage repair deficiencies, mutations or gene expression patterns.^10,11,12,13

Although transfer learning (TL) was initially widely used for WSI classification tasks, recent research has introduced multiple instance learning (MIL) as an alternative machine learning (ML) framework. MIL is designed to learn from bag-level labels rather than the more precise instance-level labels, and has been shown to outperform TL in certain tasks such as survival prediction.^14,15 However, implementing a MIL-based pipeline to predict molecular labels from WSI datasets presents significant challenges. Digital pathology WSI datasets consist of large images scanned from original diagnostic tissue slides stained with H&E, often containing billions of pixels in a single file. This brings about specific challenges when applying MIL-based methods: (1) WSI files cannot be read directly by widely used image processing packages such as PIL^16; (2) classic neural network architectures are designed for much lower-resolution inputs (e.g., 224 × 224 pixels^17); and (3) loading an entire batch of WSIs during training is practically intractable given the limited GPU memory available.

There are various strategies to tackle these issues; from a toolkit design perspective, they can be categorized into slide reading, preprocessing and ML-oriented packages. While earlier implementations primarily focused on user-friendly WSI reading APIs, an increasing number of packages are now being designed to meet the requirements of ML algorithms. However, the existing pipelines do not fully exploit the capabilities of DL approaches. WSI reading packages and libraries such as OpenSlide,^18 BioFormats,^19 and HighDicom^20 offer efficient tools for accessing raw WSI data formats, yet they exhibit shortcomings when integrated with neural network training. Notably, QuPath allows a basic ML pipeline with packages like scikit-learn^20 through its graphical user interface, but it is not designed for DL or MIL algorithms.
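To make the scale of the reading problem concrete, the snippet below is a minimal, hypothetical sketch (the file name and coordinates are placeholders) of region-based access through OpenSlide's Python API, which avoids loading the full multi-gigapixel image into memory:

# Minimal illustration (not part of HistoMIL) of why WSIs need dedicated readers:
# a single slide can hold billions of pixels, so only small regions are loaded at once.
import openslide

slide = openslide.OpenSlide("example_slide.svs")   # hypothetical file path
print(slide.dimensions)        # full-resolution size, often on the order of 100,000 pixels per side
print(slide.level_count)       # number of downsampled pyramid levels
print(slide.level_dimensions)  # size of each pyramid level

# Read a single 224 x 224 region at full resolution instead of the whole image
region = slide.read_region(location=(30000, 20000), level=0, size=(224, 224))
patch = region.convert("RGB")  # PIL image, now a standard CNN-sized input
slide.close()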
Pre-processing packages like PyHIST,^21 deep-histopath,^22 Multi_Scale_Tools,^23 and ASAP^24 include common processing steps like reading multi-scale information,^25 tissue segmentation or patching. However, designing a comprehensive and broadly applicable ML-oriented package that encompasses basic components like WSI reading, patch extraction and color normalisation^26,27 remains difficult. Some preprocessing packages, such as CLAM^28 and deep-histopath,^22 deliver quality preprocessing pipelines but may be restrictive when integrated with other algorithms or different datasets.

ML-oriented packages like DeepMed,^27 MONAI^29 or TIAToolBox^25 streamline the training process of DL models. For instance, MONAI extends PyTorch's capabilities for medical data and imaging, providing specialized AI model architectures, transformations and utilities tailored for specific purposes. However, its core code is not specifically optimized for H&E data or the various WSI formats. Packages like PathML^30 aim to deliver richer content for their users by integrating more extensive models such as tissue segmentation. Researchers can also easily and quickly train DL models on WSI datasets by using the TL protocol. In some of the open source packages, various useful tools have been integrated together, e.g., cell segmentation^30 or graph aggregators.^25 Packages like TIAToolBox also offer improved scalability through their modular design and unit-testable code.^25 However, a significant challenge in the context of TL arises when dealing with molecular-level labels lacking pixel-level annotations. This leads to reliance on pseudo-labels, potentially affecting model performance or generalization capabilities.^31 For instance, consider a scenario where a tumor region is marked as exhibiting high expression of a particular gene that is unique to a subtype of T cells. In such cases, the model may end up learning intricate patterns across all cell populations in the tumor tissue, leading to the misclassification of all cells as positive cases.

A robust feature encoder can also significantly influence the final results in target classification tasks. Self-supervised learning (SSL) has emerged as a prominent training paradigm that leverages the inherent data structure, negating the need for extra labeling. In the context of WSI classification, SSL holds the potential to create highly specialized feature encoders tailored for WSI datasets.^32 Nevertheless, employing existing SSL packages requires multiple preprocessing steps for WSIs, including conversion to PNG or JPEG files, before SSL training can be invoked and run. This essentially entails building a complete SSL pipeline alongside the core training process for WSI classification.

To address some of the challenges highlighted above, we introduce HistoMIL, a DL package based on PyTorch^33,34 and PyTorch Lightning^35 that simplifies the training of WSI-based classification models. HistoMIL provides a complete preprocessing pipeline, reducing the complexity of converting raw WSI data into a usable format for DL frameworks. We implement multiple MIL models that allow flexible training against different target labels, along with multiple SSL models that simplify the training of feature encoders. We demonstrate HistoMIL's performance in predicting over 2,000 cancer-relevant molecular labels in breast cancer H&E slides from The Cancer Genome Atlas (TCGA).
Additionally, we pinpoint specific pathways that can be effectively identified within histopathological tissue.

Results

To facilitate the implementation of MIL-based workflows for classifying histopathology images based on specific molecular labels, we have developed a DL Python package entitled HistoMIL. HistoMIL leverages PyTorch and PyTorch Lightning to provide an efficient framework for all essential steps in DL tasks involving H&E slides. These steps encompass preprocessing slide data, training MIL and TL models on the processed dataset to predict molecular labels of interest, and visualizing the classifier performance and prediction results within the examined slides. Below, we showcase the features and capabilities of the package, accompanied by examples of its application to large-scale cancer datasets.

Overview of the HistoMIL package

The HistoMIL library is structured into three tiers: data, model, and experiment (Figure 1A). The data level encompasses multiple data preprocessing steps, including WSI reading, tissue segmentation and patching. Similar to other ML-oriented packages, we incorporate functions for image normalization and feature extraction to store intermediate data and expedite model training. Additionally, we introduce a cohort level to handle metadata such as patient details, molecular labels, and other information. The design of HistoMIL aligns with the evolution of relevant packages in the existing literature (Figure 1B).

Figure 1. Overview of the HistoMIL package design and working pipeline. (A) The diagram illustrates the relevant modules comprising HistoMIL and the logical structure organizing them. HistoMIL features four levels encompassing file reading, WSI-related processing modules, deep learning algorithms, and experiment management modules. (B) The relationship between HistoMIL modules and state-of-the-art libraries. The proposed package covers the entire workflow of WSI reading, preprocessing, and MIL algorithms, a design informed by the primary functionalities of other packages in the literature. (C) How HistoMIL's various modules operate within an actual pipeline. We enumerate a typical processing workflow, including preprocessing, MIL training, and inference components; different modules correspond to distinct functional parts within the process.

The data level draws inspiration from various WSI reading and processing packages. We introduce a universal reader class that allows users to customize different wrappers to access interfaces from other readers. This approach mitigates the limitation of relying solely on one reader package. For instance, OpenSlide,^18 BioFormats,^19 and HighDicom^20 are typical tools that provide a user-friendly API to handle WSI data formats. However, the Python interface of OpenSlide^18 does not support certain formats, such as OME-TIFF.^18 BioFormats^19 and QuPath^20 support a wider array of WSI formats but are heavily dependent on Java environments. In HistoMIL, users only need to adhere to our predefined abstract class, implementing a unified interface function for various reading libraries. This simplifies the conversion of diverse data types into a standard NumPy array and associated metadata. Consequently, users have the flexibility to choose their preferred reading library based on individual requirements, thereby circumventing issues related to file formats.
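As a minimal sketch of this wrapper pattern (the class and method names below are illustrative and do not correspond to HistoMIL's actual API), a reader abstraction might look as follows:

# Illustrative sketch of the reader-wrapper pattern described above; names are
# hypothetical and do not match HistoMIL's actual classes.
from abc import ABC, abstractmethod
import numpy as np
import openslide


class SlideReaderBase(ABC):
    """Unified interface every reading backend must implement."""

    @abstractmethod
    def get_metadata(self) -> dict:
        """Return slide-level metadata (dimensions, pyramid levels, etc.)."""

    @abstractmethod
    def read_region(self, location, level, size) -> np.ndarray:
        """Return the requested region as a standard RGB NumPy array."""


class OpenSlideReader(SlideReaderBase):
    """Example backend wrapping openslide-python."""

    def __init__(self, path: str):
        self.handle = openslide.OpenSlide(path)

    def get_metadata(self) -> dict:
        return {
            "dimensions": self.handle.dimensions,
            "level_count": self.handle.level_count,
            "level_dimensions": self.handle.level_dimensions,
        }

    def read_region(self, location, level, size) -> np.ndarray:
        rgba = self.handle.read_region(location, level, size)
        return np.asarray(rgba.convert("RGB"))

A backend for BioFormats, DICOM or OME-TIFF files would subclass the same base class, so that downstream preprocessing and training code never needs to change.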
For the preprocessing steps, we have incorporated features inspired by the strengths of existing packages. For instance, we introduce a multi-threaded preprocessing design inspired by CLAM,^28 and we allow users to pre-calculate features for the selected patches, a concept drawn from TIAToolBox,^25 to expedite MIL model training.

The model level is divided into two key components: the backbone and the MIL method. Our backbone module seamlessly incorporates the interface of the timm PyTorch image model library,^33 enabling the download of various backbone network architectures and pre-trained parameters for feature extraction. This design philosophy is also present in other open source packages like DeepMed^27 or TIAToolBox.^25 However, these packages rely on TL methods, which consider only the original slide-level or patient-level labels, assigning them to each partitioned patch as training label information. In TL-based approaches, introducing pseudo-labels becomes necessary when pixel-level labels are unavailable, but this adds noise to the model training process. Training a batch of models may also pose challenges for packages like PathML,^30 as models may learn misleading information from pseudo-labels and become trapped in local optima. In extreme cases, areas containing the cells of interest may be extremely small, with only a handful of patches containing genuinely relevant information.

Our package implements multiple MIL methods (ABMIL,^31 DSMIL,^32 TransMIL^36) alongside a baseline TL model. The entire algorithm implementation is based on PyTorch Lightning,^35 enabling rapid training and fine-tuning of models for specific labels. Additionally, our package allows users to apply SSL protocols to train the feature extractor from scratch using only WSI datasets. Existing libraries often use pre-trained models from ImageNet as feature extractors. While recent research demonstrates that feature extractor networks trained with SSL on WSIs can enhance final performance,^14,32 this feature is not supported by existing ML packages. Furthermore, HistoMIL includes pre-defined parameters as default settings, which allows users to quickly try out and scale up an algorithm for different targets. Inspired by packages like stainlib,^26 we have included utility functions to handle data augmentation, employed to ensure that slides with different color ranges are transformed in a uniform way to enable meaningful comparisons.

At the experiment level, configuring different models for various datasets and hardware conditions becomes a straightforward task. For instance, researchers may wish to gain a comprehensive understanding of how multiple biomarkers are spatially distributed by predicting these molecular labels from H&E slides. They might want to predict hundreds of target labels in an initial setting. Most existing packages lack the ability to efficiently scale up algorithms for multiple molecular labels, as they are not designed to deploy a batch of models simultaneously. In HistoMIL, researchers can effortlessly initialize a series of instances to explore the hyperparameter space defined by the customizable para class. A Trainer class instance will initialize a PyTorch Lightning module, adapting to hardware requirements automatically. Furthermore, third-party tuners like the Ray tuner^37 can directly utilize these module instances, making it easy to discover the optimal model configuration.
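To illustrate this batch-deployment pattern, below is a minimal, self-contained sketch (with random stand-in data and hypothetical class names, not HistoMIL's actual para or Trainer classes) that loops over several gene labels and trains one small attention-based MIL classifier per label with PyTorch Lightning; in a real run, an early-stopping callback monitoring a validation metric would be added:

# Toy sketch of training one MIL model per molecular label; data are random placeholders.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class ToyAttentionMIL(pl.LightningModule):
    """ABMIL-style toy model: attention-weighted pooling of patch features."""

    def __init__(self, feat_dim: int = 512, lr: float = 1e-4):
        super().__init__()
        self.lr = lr
        self.attention = nn.Sequential(nn.Linear(feat_dim, 128), nn.Tanh(), nn.Linear(128, 1))
        self.classifier = nn.Linear(feat_dim, 1)

    def forward(self, bag):                                    # bag: (1, n_patches, feat_dim)
        weights = torch.softmax(self.attention(bag), dim=1)    # one weight per patch
        slide_feat = (weights * bag).sum(dim=1)                # weighted slide embedding
        return self.classifier(slide_feat).squeeze(-1)

    def training_step(self, batch, batch_idx):
        bag, label = batch
        return nn.functional.binary_cross_entropy_with_logits(self(bag), label.float())

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)


def toy_loader(n_slides=8, n_patches=100, feat_dim=512):
    feats = torch.randn(n_slides, n_patches, feat_dim)   # stand-in for pre-extracted patch features
    labels = torch.randint(0, 2, (n_slides,))            # one binary label per slide
    return DataLoader(TensorDataset(feats, labels), batch_size=1)


gene_labels = ["MAD2L1", "KIF2C", "AURKA"]   # in practice, thousands of hallmark genes
for gene in gene_labels:
    trainer = pl.Trainer(max_epochs=2, accelerator="auto", devices=1)
    trainer.fit(ToyAttentionMIL(), train_dataloaders=toy_loader())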
Figure 1C illustrates which modules are used at different stages of a process that employs a MIL algorithm. Details on the interplay between modules during preprocessing, SSL and MIL can be found in Figure 2. Moreover, Figure 2 demonstrates how the HistoMIL package streamlines complex pipelines with minimal code. Further information on the implementation of HistoMIL, including data-level design, model training and experimental setup, can be found in the STAR methods section.

Figure 2. Diagram depicting three primary application scenarios for HistoMIL and their corresponding code blocks. (A) The invocation method for HistoMIL during batch preprocessing. With predefined processing parameters, HistoMIL can process an entire dataset in one go using just three to four simple commands. (B) The code and calling logic for SSL training. HistoMIL facilitates SSL on WSI datasets by predefining SSL-related parameters and modifying the trainer accordingly. (C) The code and module calling process for MIL training. Compared to SSL, MIL adds adjustments to the model's parameters while simplifying the training setup process. For users with access to a GPU server, numerous models can be trained simultaneously by configuring parameters in the bash file.

Applying HistoMIL to predict cancer hallmarks in H&E stained slides

In this section, we demonstrate the usability and scalability of HistoMIL in WSI-level prediction tasks employing state-of-the-art DL models. One clinically relevant task in cancer that has recently been made feasible by modern computational pathology is assessing the state of specific genes or entire molecular pathways directly within histopathology tissue slides using DL techniques.^10,38 This rapid evaluation can aid in diagnosis, prognosis and treatment decisions. Typically, such assessments involve gene panel profiling or immunohistochemistry (IHC), but these tests introduce time delays and additional costs, as they are additional steps following visual inspection of routinely stained H&E sections. HistoMIL simplifies this process by providing a fast and efficient implementation for predicting diverse molecular labels in H&E-stained slides.

Here, we showcase HistoMIL's capacity to streamline the entire analysis workflow required to predict thousands of cancer hallmarks. We used 1,012 H&E-stained slides of breast cancer tissue and matched RNA-seq data from TCGA to train over 8,000 models for the classification of 2,487 cancer hallmark genes. First, we processed the WSI dataset using pre-defined preprocessing functions. This step involves extracting the tissue area from H&E histological images, generating patches automatically, and saving them as HDF5 and image files. Subsequently, we trained the feature extractor network using the SSL module and predicted the target gene expression labels using the built-in MIL algorithms. As depicted in Figure 3A, the implementation of the pipeline is simplified through the use of the generic interface provided by HistoMIL. This reduction in complexity significantly eases the workload for researchers aiming to expand these methods.
Figure 3. Experimental workflow and performance comparison. (A) The diagram showcases the complete workflow for utilizing HistoMIL in predictive experiments. The raw data from TCGA-BRCA undergoes preprocessing, yielding image patches and associated labels suitable for MIL processing. The experimental task involves predicting gene expression, with the model's performance displayed as AUROC. (B) Comparison of performance distribution among different algorithms in predicting the expression of 2,487 cancer-related genes. Each distribution contains 2,487 data points. The box centerlines depict the medians, and the edges depict the first/third quartiles. TransMIL exhibits superior performance relative to the other algorithms. (C) The top 30 genes with the highest AUROC scores. In the test set, the model's accuracy in predicting gene expression levels (high or low) reaches up to ∼86%. See also Table S2.

Despite the clear benefits of utilizing MIL algorithms for H&E-based predictions, researchers with limited coding experience may encounter difficulties when attempting to replicate MIL workflows, where slight variations in code may result in vastly different outcomes. Moreover, the complexity of handling high-dimensional histological data may discourage novice researchers from adopting this approach. In training our models, we adhered to the methodologies detailed in the original papers of TL,^27 ABMIL,^31 DSMIL^32 and TransMIL,^36 and employed the PyTorch Lightning wrapper provided by the HistoMIL library for the training process. This package simplifies these steps, making them accessible and ensuring reproducibility even for those lacking an in-depth understanding of MIL algorithms. For each algorithm, we trained over 2,000 models to predict the expression levels of selected cancer hallmark genes.

To demonstrate how HistoMIL handles the complex WSI processing steps, we have reproduced the pipeline in an example notebook, which can be found in the Notebooks folder of the HistoMIL GitHub repository (see STAR methods). We have used code snippets and condensed the code to illustrate the usage of HistoMIL, underscoring its efficiency for WSI prediction tasks. Additionally, our model implementation includes options for computing an attention score or providing patch-level predictions, enabling MIL algorithms to predict the target labels for individual patches. In this particular example, we focused on WSI-level labels, where one expression value is available per gene and per slide. To expedite training and inference times, we did not perform additional data augmentation, yet the models were still able to successfully predict the target label.

Predicting gene expression labels in the benchmark TCGA breast cancer dataset showcases the potential of using HistoMIL on WSI datasets. All of the predictions are based only on WSI data. Our target labels encompass the expression levels of 2,487 different cancer hallmark genes derived from MSigDB, selected as a benchmark task to demonstrate the scalability of our package. We regarded this task as a typical weakly supervised learning challenge and opted for the TransMIL method to learn the aggregation function. We utilized the ResNet-18 pre-trained feature extractor as a backbone network, and all training procedures were executed on the PyTorch Lightning platform with early stopping. Our experiments involved two training paradigms (TL and MIL) and a total of four different DL algorithms: TL, ABMIL,^31 DSMIL,^32 and TransMIL.^36 The experimental outcomes consistently demonstrated that MIL algorithms generally outperformed TL algorithms when using the same feature encoder (Figure 3B).
Furthermore, variations in performance were observed among different MIL algorithms, with a general trend of TransMIL > DSMIL > ABMIL. Considering the characteristics of these algorithms, it can be reasoned that the introduction of attention mechanisms and neighborhood information played a significant role in driving these performance differences. First, the slide-level attention mechanism could allow the algorithm to compare the importance of patches within a slide for classification. Additionally, TransMIL, which incorporates neighborhood information, yielded the best results in our experiments. This is possibly due to the neighborhood information capturing broader changes in tissue structure, which may be relevant in the context of a certain gene being expressed or deactivated.

Among the top genes where different models demonstrated good performance metrics on the test set (Figure 3C, Table S2) were MAD2L1, KIF2C and AURKA. These are cell cycle regulators involved in spindle assembly and stabilization, and they promote chromosome segregation during mitosis.^39,40,41 Other top-ranking genes such as F2RL2 or FSHB are involved in G protein-coupled receptor signaling,^41 while MMP2 and POSTN are involved in matrix remodeling and cell adhesion.^42,43 These highly relevant cancer-promoting processes would be expected to leave a clear morphological trace in the tissue. Therefore, the activity of these genes might be more easily identifiable due to linked changes in tumor cell morphology and tissue structure as seen in the H&E slides. In addition, the fact that different MIL algorithms show similar performance in predicting the expression of these genes further indicates that the expression patterns of these genes in H&E slides have a certain predictability. These patterns can be easily captured throughout the entire slide using heatmap gradients of expression, thus informing on the spatial distribution of activity for a specific gene throughout the entire tissue (Figure 4).

Figure 4. Model predictions for gene expression throughout the entire slide. The first image on the left in the top row is the original tissue slide. The remaining heat maps in the first row demonstrate the spatial distribution of gene activity for selected lowly expressed genes that were predicted by the model. The heat maps in the second row exhibit the spatial distribution of gene activity for highly expressed genes that were accurately predicted by the model. The purple to yellow gradient indicates increasing levels of predicted gene expression per patch. Intratumor heterogeneity of gene expression can be observed, particularly for LAS1L or CYB5R3.

Some genes, on the other hand, are more difficult to predict (Table S3). For instance, the performance of the TransMIL algorithm is poor when predicting the expression levels of SPRR3, a marker for terminal squamous cell differentiation linked with tumor progression in early-stage breast cancers,^44 and PGAM2, a gene involved in oxidative stress responses.^45 This could be explained by the complexity of the regulatory processes driven by these genes, which may not produce clear morphological changes in the cells. This could also be compounded by tumor heterogeneity, e.g., if oxidative stress is present in only part of the cancer tissue.
Furthermore, unlike MAD2L1, F2RL2, and KIF2C, genes such as SPRR3 and PGAM2 exhibit higher performance variability among different MIL algorithms, and their performance is typically below an AUROC of 65%.

To further explore the capability of MIL methods, we focused on the higher-level activity captured by individual genes within the tissue, which can be summarized within hallmark pathways underlying cancer initiation and progression. These included processes such as angiogenesis, which plays a crucial role in the formation of blood vessels to support tumor growth; hypoxia, triggered when cancer cells experience inadequate levels of oxygen; or the P53 pathway, which regulates cell-cycle arrest and apoptosis in response to DNA damage (see Table S1 for a complete list). Across 14 key hallmark pathways, we observed markedly consistent levels of performance of different MIL algorithms in predicting the expression of the genes involved in the respective pathways (Figure 5).

Figure 5. Performance of various algorithms on predicting pathway-level activity in cancer. TransMIL, DSMIL, ABMIL and TL (transfer learning) algorithm performance is compared across selected pathways. The boxes depict the AUROC distributions for each algorithm, colored according to the legend. White circles represent the median values, while black lines indicate the mean. The edges of the boxes depict the first/third quartiles and the whiskers depict the minima/maxima. The different hallmark pathway groups are arranged from left to right in descending order of median values according to the TransMIL algorithm.

The E2F target genes had the highest average AUROC in our experiments (Figure 5). This pathway participates in the cell cycle G1/S transition and DNA replication, and is generally upregulated in tumor cells, leading to abnormal cell proliferation.^46 Such proliferation differences may be more easily captured in the morphology of the cells as well as the nuclear staining within H&E-stained sections. In fact, the 50 highest-performing models were for genes involved in cell cycle checkpoints, mitosis and DNA integrity (Table S4), suggesting that cell division-related processes are most easily captured within cancer H&E slides using this methodology. This is not surprising given the remarkably high performance (>90%) of DL models in distinguishing tumor areas from normal cells,^47 and considering that proliferation is the key defining hallmark of cancer. In contrast, the 50 worst-performing models (with AUROCs <62%) were for genes involved in tyrosine kinase and apoptotic signaling, nucleotide excision repair and oxidative stress responses (Table S5), which might not be accompanied by visible phenotypic changes in the tumor microenvironment.

Thus, we demonstrate how HistoMIL can be used to assess the detection of thousands of disease-relevant molecules in a speedy and efficient manner, with the most informative results obtained for genes that are either expressed throughout the entire slide or not expressed at all within the tissue.

Discussion

Building a MIL pipeline for WSI datasets is a complex undertaking, demanding extensive engineering expertise and a deep understanding of model hyperparameters. The intricacies of WSIs necessitate additional effort in terms of data retrieval, preprocessing and normalization compared to the typical image processing models used in standard classification tasks.
In particular, WSI data requires specialized handling during retrieval and preprocessing due to its unique nature. Moreover, since H&E slides are relatively sensitive to color variations, they require an additional normalization step, and data augmentation steps must be selected carefully (e.g., avoiding extreme cropping operations).

In this paper, we introduce HistoMIL, a Python library designed to streamline the pre- and post-processing of WSIs within a MIL-based pipeline. It seamlessly integrates with PyTorch, a widely used DL toolkit, simplifying the training of a batch of MIL models for diverse targets. The use of PyTorch Lightning further simplifies the training process. Furthermore, our HistoMIL package offers an uncomplicated method for training a feature extractor using an SSL protocol, which can enhance the performance of various methods within the target data domain. We provide a detailed tutorial on how to use this package, tailored to both technical and non-technical users, in our Notebooks folder at https://github.com/secrierlab/HistoMIL.

HistoMIL aims to facilitate the training and application of MIL algorithms for predicting molecular labels based on digital pathology slides by providing an easy-to-use API. In doing so, we hope to equip users with a comprehensive toolkit that empowers them to concentrate on developing new algorithms and addressing biological questions. To achieve this goal, we have implemented a wide range of modules and functions to streamline the process of training and using SSL and MIL. With HistoMIL's SSL component, generating a feature extractor for the diagnostic slide dataset becomes effortless, subsequently enabling the prediction of various molecular labels. In our experiments, we illustrate how HistoMIL can be used to predict the expression status of multiple genes in H&E slides using the built-in MIL algorithms. These pipelines have been implemented in the form of interactive notebooks and can be opened and evaluated on cloud platforms such as Google Colab and Kaggle. This highlights how HistoMIL can greatly reduce the complexity of engineering implementations. We hope that the examples provided will assist other users in incorporating MIL methods as effective tools in their analysis pipelines.

By designing HistoMIL to maintain consistency and ease of use when introducing customized models, we have observed that the data processing steps of different algorithms can be shared due to the repeated use of HistoMIL modules. Additionally, different algorithms can also share the same intermediate data results. Furthermore, batch processing and patch aggregation in HistoMIL come with pre-set values, which greatly reduces the level of difficulty for users. It is important to note that HistoMIL is not limited to the implemented MIL algorithms. Due to its modular and scalable design, users can conveniently implement and modify new algorithms. This flexibility allows for the training of customizable algorithms based on existing work. Therefore, any algorithm implementation involving MIL and WSIs can benefit from the functionality we provide. In addition, many tasks involving WSIs can be easily accommodated through modifications to algorithms or loss functions.

We showcased the strong performance of HistoMIL across more than 2,000 gene expression prediction tasks using a variety of MIL algorithms. Notably, we achieved AUROCs exceeding 80% for 130 genes. TransMIL demonstrated superior results compared to other MIL or TL algorithms.
We showed that, amongst classical cancer hallmark pathways, the most identifiable within H&E-stained slides are the ones related to cell proliferation, in line with other findings in the field,^10 whereas kinase signaling, apoptosis and oxidative stress processes are more difficult to capture from the tissue morphology. This suggests that cancer diagnosis and progression tasks could be more easily automated on histopathology slides than tasks related to targeted treatment decisions. Furthermore, the spatial visualization capabilities of HistoMIL pave the way toward further analyses of spatial patterns of gene activity, which could be used to understand how cancers develop within the tissue and interact with their environment.

HistoMIL is an open-source project to which additional pre-trained models and functionality will continue to be added. In the future, we intend to enhance the currently available models by training them on additional datasets and including other MIL algorithms. A logical extension of our work is its application in tasks like cancer grading, survival prediction and the exploration of other molecular or clinical labels in different tumor types. Importantly, the package is not restricted to the analysis of cancer datasets alone, and could be readily applied to address a wide range of biomedical questions seeking to uncover meaningful patterns within H&E-stained slides. Future versions of HistoMIL could aid with patient diagnosis or help pathologists identify subtle differences in gene activity that might not be readily discernible in traditional histological sections. Extracting features and analyzing results is typically a time-consuming and computationally intensive task, but by using HistoMIL we can train models on a very large scale. This may lead to faster and more efficient analysis of whole slide images in future clinical usage.

Limitations of the study

Our experiments have shed light on certain limitations within the existing MIL algorithms. First, as expected, the performance of these algorithms is highly dependent on the image quality of the slides. As the expression levels of some genes are correlated with the color patterns of the images, the models are sensitive to color changes in the original slides. In our experiment, to maintain simplicity in the training process and reduce extraneous variables, we did not introduce additional data augmentation steps. This might have compromised the generalization capabilities of the resulting models. In practice, different molecular labels may require distinct data augmentation strategies as critical parameters for performance optimization. We underscore the importance of considering this as an essential step for accurate model construction and prediction. MIL algorithms also incorporate attention mechanisms and neighborhood information, rendering them more susceptible to variations within a single slide, such as local color changes.

Moreover, patch-level predictions can introduce biases. Some genes might not be expressed in all regions of the slide, but because a single expression label is available per slide, the models would tend to assume high expression in all areas when this label is high. We frequently observed this phenomenon in our experiments. Whether this reflects a genuine feature of gene activity within the tissue or a model limitation remains to be determined.
To address this issue, future work should focus on incorporating additional annotations, such as those obtained through immunohistochemistry, to acquire higher-resolution data for a specific biomarker across the entire tissue rather than relying on a single label per slide.

The importance of interpretability for AI models in biology and medical research cannot be overstated. In HistoMIL, the ABMIL and TransMIL implementations inherently provide attention scores, offering good interpretability, and the DSMIL method generates patch-level prediction values as part of its output. Furthermore, gradient-based interpretability methods can be incorporated with some simple additional functions. In subsequent extensions, we plan to introduce distinct interpretability components to the HistoMIL framework, thereby providing a unified interpretability interface for all the supported models. To further enhance interpretability, future research endeavors in the field should also incorporate experimental validation of the model predictions.

The number of samples in the tested dataset also limits our analyses. In the context of TL, the number of training samples is determined by the number of WSIs in the dataset multiplied by the number of patches within each WSI. In essence, each patch extracted from a WSI is treated as an independent sample, resulting in a substantial number of training samples. However, when employing the MIL method, the number of training samples for the slide-level classifier is equivalent to the number of WSIs in the dataset. This is because MIL treats each WSI as a single sample, rather than considering each patch within the WSI as an individual sample. This fundamental distinction has the potential to impact the generalization capability of the MIL method. It also makes the aggregation part of the model prone to being misled by certain samples and thus becoming trapped in local optima. Although we selected the largest cancer dataset available from TCGA (TCGA-BRCA), the total number of WSI samples remains relatively limited, which further constrains the potential performance of our models. Furthermore, due to time and resource constraints, our current experiments were restricted to TCGA-BRCA and did not include testing on other cancer datasets or datasets related to other diseases. Thus, our biological conclusions are unlikely to be generalizable to other types of cancers or diseases. Our future work will aim to address these issues by expanding the pool of diagnostic slides from other sources.
STAR★Methods

Key resources table

REAGENT or RESOURCE | SOURCE | IDENTIFIER

Software and algorithms
HistoMIL | This paper | GitHub: https://github.com/secrierlab/HistoMIL; Zenodo: https://doi.org/10.5281/zenodo.8220572
PyTorch | Paszke et al.^34 | https://pytorch.org/
PyTorch Lightning | Falcon^35 | https://lightning.ai/docs/pytorch/stable/
OpenSlide-Python | Goode et al.^18 | https://openslide.org/api/python/
ABMIL | Leiby et al.^31 | https://github.com/AMLab-Amsterdam/AttentionDeepMIL
DSMIL | Li et al.^32 | https://github.com/binli123/dsmil-wsi
TransMIL | Shao et al.^36 | https://github.com/szc19990412/TransMIL
TCGAbiolinks | Colaprico et al.^48 | https://bioconductor.org/packages/release/bioc/html/TCGAbiolinks.html
msigdbr | Igor Dolgalev | https://igordot.github.io/msigdbr/
GeneMania | Warde-Farley et al.^49 | https://genemania.org/

Other
H&E slides and RNA-seq data from the TCGA BRCA cohort (GDC data portal) | TCGA, Weinstein et al.^50 | https://www.cancer.gov/tcga; https://portal.gdc.cancer.gov/projects/TCGA-BRCA
MSigDB hallmark gene set | Liberzon et al.^51 | https://www.gsea-msigdb.org/gsea/msigdb/human/genesets.jsp?collection=H

Resource availability

Lead contact
Further information and requests for resources should be directed to and will be fulfilled by the lead contact, Maria Secrier (m.secrier@ucl.ac.uk).

Materials availability
This study did not generate new unique reagents.

Experimental model and study participant details

This paper analyses publicly available RNA-sequencing and H&E data from TCGA, collected from 1,062 breast cancer patients. All data comply with ethical regulations, with approval and informed consent for collection and sharing already obtained by The Cancer Genome Atlas (TCGA). Sex, age and ethnicity information for the study participants can be found at https://portal.gdc.cancer.gov/projects/TCGA-BRCA. Clinical information was available for 1,048 patients, of whom 99% (n = 1,036) were female and 1% (n = 12) were male, with ages ranging between 26 and 90 (median of 58). The cohort was split by ethnicity as follows: 38 Hispanic or Latino, 842 not Hispanic/Latino, 168 unreported. The tumor stages analyzed in the study were as follows: T1 (n = 266), T2 (n = 607), T3 (n = 134), T4 (n = 39), Tx (n = 2). Sex and ethnicity information were not taken into account in the analysis because breast cancer predominantly affects women (99% in the analyzed cohort) and TCGA data is not sufficiently diverse and powered for ethnicity analyses in the context of our study. This may limit the study's generalisability. Experimental groups were defined by the median expression of each hallmark gene considered in the study, as detailed below.

Ethics approval and consent to participate
All data employed in this study comply with ethical regulations, with approval and informed consent for collection and sharing already obtained by The Cancer Genome Atlas (TCGA).

Method details

Experimental setting and dataset
In this paper, we exemplify the power and versatility of HistoMIL in oncology-related tasks by building prediction models for 2,487 cancer-related genes.
Dataset
The Cancer Genome Atlas (TCGA) is a collaborative project that aims to characterise the genomic and molecular landscape of various cancers.^50 We chose TCGA-BRCA as the largest available dataset with diagnostic H&E-stained slides and matched RNA-sequencing, which we could use to define cancer-relevant labels (n = 2,487). The TCGA-BRCA dataset contains a large amount of data, including whole slide images, genomic data, clinical data, and more. It includes samples from a diverse patient population, including patients of different ages, races, and ethnicities, which helps mitigate biases from factors that may affect model performance. As a widely used data source, the TCGA-BRCA dataset has undergone quality control, which ensures that the data is of high quality and well annotated; this helps reduce noise and variability in the dataset. We downloaded 1,133 diagnostic WSIs from the Genomic Data Commons (GDC) Data Portal.^52 In our experiment, we split the WSIs into patches, automatically selected WSIs with more than 1,000 patches, and further removed images that were blurry or contained marks. This left us with 1,012 WSIs for analysis, which we split into 80% for training and 20% for testing.

Prediction task
The target labels for the experiments were extracted from the RNA-seq profiles of the same TCGA-BRCA patients for which H&E-stained diagnostic slides were also available, downloaded using the TCGAbiolinks package.^48 We surveyed 2,487 genes involved in various cancer hallmark pathways derived from MSigDB^51 using the msigdbr R package (see Table S1), and used the FPKM-normalised expression values for each gene to categorise tumors as "highly" or "lowly" expressing the respective gene based on the median split. This is a targeted task designed to demonstrate that HistoMIL can assist researchers in rapidly scaling up the classifiers required for their studies. In this task, the HistoMIL package was used to train and validate prediction models for the expression of the selected 2,487 cancer hallmark genes. In the steps involving MIL training and inference, we employed servers powered by A100 and V100 GPUs. Typically, training for one gene expression label spans 100 epochs with early stopping. Predictions for some genes might peak as early as the 5th or 6th epoch, triggering the early stopping mechanism.

Within the trainer of HistoMIL, we incorporated options for researchers to either randomly split the training set proportionally or choose K-fold validation, facilitating the validation of the training process. By integrating support for TorchMetrics, users can easily select how to evaluate model performance and monitor training progress. Researchers can also place slides generated by specific hospitals into a separate, independent test set to evaluate model performance. This approach facilitates a more robust evaluation of the model's generalization capabilities. For this experiment, we used an 80% training set and 20% test set strategy to keep the experimental setting simple. We note that we did not employ the SSL module in this analysis example, in order to simplify the training process and to focus on comparing the performance of TL and MIL methods. Finally, functional enrichment analysis was performed using GeneMania.^49 By predicting the expression of a gene of interest throughout the entire tissue slide, researchers can highlight the features derived from the attention score or gradient vectors.
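As a concrete illustration of the median-split labelling described above, the following minimal sketch (with hypothetical file and column names, not HistoMIL's actual data format) binarises FPKM expression values into the slide-level labels used for MIL training:

# Illustrative sketch of median-split label construction; paths and columns are placeholders.
import pandas as pd

expr = pd.read_csv("tcga_brca_fpkm.csv", index_col=0)   # rows: samples, columns: genes

labels = pd.DataFrame(index=expr.index)
for gene in ["MAD2L1", "KIF2C", "AURKA"]:
    median = expr[gene].median()
    # 1 = "highly expressed" (above the cohort median), 0 = "lowly expressed"
    labels[gene] = (expr[gene] > median).astype(int)

# Each slide then inherits the binary label of its matched RNA-seq sample,
# giving one weak slide-level label per gene for MIL training.
labels.to_csv("hallmark_gene_labels.csv")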
We believe using HistoMIL could aid in diagnosis or help pathologists identify subtle differences in gene activity that might not be apparent in traditional histological sections. Extracting features and analysing results is normally a time-consuming and computationally intensive task. By using HistoMIL, we can train models on a very large scale. This may lead to faster and more efficient analysis of whole slide images in future clinical usage.

Module design in HistoMIL

Data level design
The HistoMIL design includes a data level to handle different data formats, preprocessing and other meta-information of the target cohort. Pre-processing WSIs can be time-consuming and computationally expensive in many cases. By integrating a number of configurations and functions, the HistoMIL package offers a functional interface to smoothly run each necessary step in different pipelines and save the intermediate output as needed. After initialising a Slide instance and reading the slide files, we divide the preprocessing steps into three separate concept categories: Tissue, Patch, and Feature. Each of these concepts is linked with a parameter class to help users modify the preprocessing steps according to their own experimental requirements. A manager class named WSI_collector helps users unify the initialisation, calculation and loading process of these concepts to further decrease the complexity of usage.

Reading raw WSI data
HistoMIL can handle different data formats by implementing slide_backend wrappers for widely used packages such as OpenSlide-Python.^18 This design helps researchers access the raw data of different WSI formats and provides a common interface for the subsequent steps. For instance, the scale pyramid is an important aspect of WSI processing, as some WSI files natively include multi-scale representations of the raw data.^25 A series of downscaled or upscaled versions of the original slide provides deep learning models with additional information. HistoMIL creates a common scale pyramid as metadata of a WSI file. Researchers can also implement customised slide wrappers to access additional slide formats. This modular design enables the rapid integration of customised slide backends into existing pipelines, thereby facilitating the support of different datasets. All these elements are consolidated within the Slide instance when using HistoMIL.

Tissue masking and patch extraction
WSIs typically involve a considerable area that only includes non-biologically relevant background elements such as glass or marker pen. HistoMIL includes a Tissue class to identify and remove these areas by generating a tissue mask. Inspired by other ML-oriented packages, the Tissue class includes a wrapper for an Otsu thresholding function to categorise pixels into foreground or background.^28 Some basic morphological operations are integrated to eliminate small holes within the tissue region. Researchers can also implement different wrapper functions for different purposes. Patch extraction, which aims to reduce the GPU memory cost of training a model, works in a similar manner: a large WSI is partitioned into small patches by an integrated function that iteratively walks through the tissue area. All the intermediate data is stored for further usage, and the implementation avoids exhausting available memory.
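The tissue masking step just described can be sketched as follows; this is a minimal illustration using OpenSlide and scikit-image on a low-resolution overview, with a placeholder file path and size thresholds, and it is not HistoMIL's actual Tissue class:

# Minimal sketch of Otsu-based tissue masking on a downsampled view of a slide.
import numpy as np
import openslide
from skimage.color import rgb2gray
from skimage.filters import threshold_otsu
from skimage.morphology import remove_small_holes, remove_small_objects

slide = openslide.OpenSlide("example_slide.svs")           # hypothetical path
thumb = np.asarray(slide.get_thumbnail((2048, 2048)))      # low-resolution overview image

gray = rgb2gray(thumb)
otsu = threshold_otsu(gray)
tissue = gray < otsu                                        # tissue is darker than background glass

# Basic morphological clean-up: drop small specks and fill small holes in the mask
tissue = remove_small_objects(tissue, min_size=500)
tissue = remove_small_holes(tissue, area_threshold=500)

# Patch coordinates are then generated only where the mask marks tissue,
# and mapped back to full resolution for extraction.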
By using this scale pyramid, we configure the segmentation process at higher scale pyramid levels as our default setting to decrease computational costs. Similarly, the default settings of the patch selection functions, which decide whether the current patches need to be processed in a pipeline, are applied at a high pyramid level. We use a multiprocessing pool to further accelerate the patching step, similarly to the CLAM package.^28 All these parameters (such as the step size and window size in the Patch concept class) can easily be modified in the related parameter classes to fit different requirements.

Feature extraction
Pre-calculating features can decrease time costs during training. Some MIL algorithms^14,32,36 can work with pre-trained feature extraction networks and calculate feature vectors for each selected patch in a preprocessing step. Following this paradigm, we designed a Feature class to synchronise the manipulation and maintenance of the feature vectors generated from each slide. This design also simplifies potential clustering of feature vectors within each bag or across the entire dataset. Hence, preprocessing, which would otherwise have required the implementation of multiple functions and methods, has been simplified to a few lines of code invocation in HistoMIL (Figure 2A).

Model training
In HistoMIL, model training is a crucial part, especially for target molecular labels that may be difficult to identify by considering only tissue morphology differences. MIL has been introduced as a higher-performing weakly supervised learning protocol for WSI classification. A slide (or WSI) is considered as a bag of instances in the MIL protocol. Each bag may contain hundreds or thousands of instances (patches) that are assembled into slide-level features for classification. To the best of our knowledge, no existing package to date has been designed to handle MIL methods. There is considerable complexity in building a MIL pipeline for WSI classification tasks due to the inherent complexity of WSI formats, the heterogeneous nature of cell morphology, and unbalanced target labels. HistoMIL is designed to simplify the implementation of MIL-based pipelines by implementing self-supervised learning and multiple instance learning modules (Figures 2B and 2C).

Self-supervised learning for feature extraction
End-to-end training of a MIL pipeline that consists of feature extractors and aggregators may be computationally expensive, especially for large WSIs. Therefore, existing models either use fixed patch features derived from CNNs or employ only a small number of high-scoring patches to update the feature extractor. Inspired by the TL paradigm, the feature extractor network can re-use pre-trained parameters from general image classification domains (e.g., ImageNet). The advantage is that different pre-trained feature extraction networks can be chosen for different tasks, and target classification models can thus be trained quickly with a pre-trained feature extractor network. However, if a pre-trained feature network was not trained on a WSI dataset, it may require extra effort in the preprocessing steps and the model may not converge. The process of aggregating and classifying may also be susceptible to overfitting and inadequate supervision, resulting in potential bias and other limitations.
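The pre-computation of patch features described above can be sketched as follows; this is a minimal illustration with timm and HDF5 in which the file path and the random patch tensor are placeholders, not HistoMIL's actual Feature class:

# Illustrative sketch of caching patch features from an ImageNet-pretrained backbone.
import h5py
import timm
import torch

model = timm.create_model("resnet18", pretrained=True, num_classes=0)  # feature-only mode
model.eval()

def extract_features(patches: torch.Tensor) -> torch.Tensor:
    """patches: (n, 3, 224, 224) tensor of normalised RGB patches from one slide."""
    with torch.no_grad():
        return model(patches)          # (n, 512) feature vectors for ResNet-18

patches = torch.randn(32, 3, 224, 224)   # stand-in for real patches of one slide
features = extract_features(patches)

with h5py.File("slide_features.h5", "w") as f:
    f.create_dataset("features", data=features.numpy())
# MIL training then loads these cached vectors instead of raw pixels.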
Self-supervised learning (SSL) is frequently proposed as a solution to the domain-mismatch and weak-supervision problems described above. While some studies, such as Lu et al.,^28 have shown that the SSL phase is a crucial element in a pipeline, integrating a WSI dataset into an SSL pipeline remains difficult. HistoMIL offers an easy way to accomplish this through a user-friendly SSL module.

To train a feature extractor, the HistoMIL package offers an easy way to apply self-supervised training methods. First, we introduce a trainer class to simplify the training process on a WSI dataset. A wrapper class is created for the timm package,^33 which includes most of the popular backbone architectures and potential pre-trained parameters. Some widely employed SSL methods have been implemented, such as MoCo^53,54 and SimCLR.^55 For these built-in methods, HistoMIL also offers predetermined hyperparameters to further simplify the training process for new users. The module is also specifically designed for WSI data: users can simply download raw data from projects such as TCGA and do not need to worry about preprocessing and data augmentation. In addition, using the feature extractor network trained by HistoMIL's SSL module removes the need to account for discrepancies in operator implementation that may arise when importing trained parameters from other packages. Moreover, by embedding SSL methods within the HistoMIL package, researchers can effortlessly include the selection of SSL methods and backbone architectures as hyperparameters when optimising their models.

MIL methods
To train a slide-level classifier on target labels, HistoMIL offers a variety of algorithms that are ready to use. To the best of our knowledge, no existing package simplifies the training and classification process for MIL algorithms on WSI datasets. Transfer learning (TL) protocols and multiple instance learning (MIL) protocols have both been widely considered by researchers for classification tasks on WSIs. While packages such as DeepMed^27 have been designed to apply TL-based models to WSI datasets, there are considerable difficulties in implementing a MIL model for WSI classification tasks. Whole slide images often encompass several gigabytes or even terabytes, and the aggregation function of MIL methods may suffer due to the high-dimensional feature space, with tens of thousands of feature vectors per slide. Moreover, there is a lack of standard implementations for the various MIL algorithms. For molecular labels, training a MIL model may be even more difficult when faced with problems such as heterogeneity of cell morphology and imbalanced data. In some extreme cases, the target classification model may be vulnerable to overfitting and unable to explore rich feature representations because of an insufficient supervisory signal. All these challenges require users to have considerable experience with implementing and training deep neural networks, a requirement which may exclude researchers who need these models but lack the necessary experience. HistoMIL simplifies this aspect by introducing built-in MIL modules with pre-defined hyperparameters.

In HistoMIL, we introduce a customizable sampler function in our Dataloader implementation that samples only a subset of instances from each bag. Different batch sizes can be used if the MIL algorithm needs to sample N instances from each bag.
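A minimal sketch of this per-bag sampling idea (illustrative only; the class below is not HistoMIL's actual sampler) is shown here:

# Toy dataset that caps the number of instances returned per bag (slide).
import torch
from torch.utils.data import Dataset


class SampledBagDataset(Dataset):
    def __init__(self, bags, labels, n_instances=512):
        self.bags = bags              # list of (n_patches_i, feat_dim) tensors
        self.labels = labels          # one binary label per bag (slide)
        self.n_instances = n_instances

    def __len__(self):
        return len(self.bags)

    def __getitem__(self, idx):
        bag = self.bags[idx]
        if bag.shape[0] > self.n_instances:
            keep = torch.randperm(bag.shape[0])[: self.n_instances]
            bag = bag[keep]           # random subset of patch features for this bag
        return bag, self.labels[idx]

Because every sampled bag now contains at most n_instances patches, several bags can be padded and stacked into a single batch rather than being forced into a batch size of 1.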
If, however, all the patches must be read in at once, the batch size must be fixed at 1, which may lead to an unstable training process. In our default setting, we chose to decrease the initial learning rate and accumulate gradients over the training steps to smooth the optimisation process. We also include a cohort instance during training to handle patient metadata, which means users can also assign pseudo-labels to each patch based on the slide-level label, enabling models to be trained under the transfer learning paradigm as well.

Trainer and experiment manager
A significant amount of time and effort is spent on the tuning process and hyperparameter selection steps when training on WSI datasets. We designed the trainer class and experiment class in HistoMIL, which are built on the PyTorch Lightning framework, to cut these costs. Our target is to simplify the tuning process for different target labels. The SSL, TL and MIL algorithms mentioned above are each implemented as a PyTorch Lightning module in HistoMIL. Researchers can initialise a PyTorch Lightning instance with HistoMIL, and this instance can be easily fed into third-party tuners such as the Ray tuner package.^37

Software and package
HistoMIL utilises Python as the primary implementation language and PyTorch as the underlying deep learning platform.^33 Using these libraries, HistoMIL can easily implement customised algorithms. Furthermore, we introduced PyTorch Lightning as a higher-level framework to simplify the implementation of the training code.^35 All the built-in algorithms in HistoMIL adhere to PyTorch Lightning's design philosophy, which decomposes the training process into different functions. This facilitates user customization while also ensuring concise implementation. The HistoMIL package is available at the following GitHub repository: https://github.com/secrierlab/HistoMIL. The version of the code used to produce the results of the paper has been deposited at Zenodo (https://doi.org/10.5281/zenodo.8220572).

Development environment
In the present research, we employed the PyTorch framework due to its renowned flexibility and accessibility for deep learning tasks. The computational experiments were executed on a cluster equipped with NVIDIA A100 or V100 graphics cards and 1 TB of storage. Our system runs on the Ubuntu 18.04 operating system, with the Anaconda platform facilitating Python-based development. Additionally, to enhance the training efficiency of our deep learning models, GPU acceleration was leveraged using the CUDA toolkit provided by NVIDIA.

Quantification and statistical analysis

Evaluation metrics
To facilitate a robust comparison with state-of-the-art predictors, we adopted the area under the receiver operating characteristic curve (AUROC) as our primary performance evaluation criterion to validate the efficacy of our models. The dataset was randomly partitioned into training and test sets in an 8:2 ratio. Classification tasks for all datasets utilized in this study pertained to binary classification based on varying levels of gene expression. The AUROC metric encapsulates both the sensitivity and specificity of predictions. Generally, a higher AUROC score indicates superior classifier performance for a given task: a score of 0.5 signifies random guessing, whereas a score of 1 denotes perfect classification. The pathway enrichment analysis was performed via a hypergeometric test as implemented in GeneMania.
All enriched pathways with an FDR < 0.1 were taken into account.

Acknowledgments