Abstract

   Cancer prognosis is of essential interest, and extensive research has
   been conducted searching for biomarkers with prognostic power. Recent
   studies have shown that both omics profiles and histopathological
   imaging features have prognostic power. There are also studies
   exploring integrating the two types of measurements for prognosis
   modeling. However, there is a lack of study rigorously examining
   whether omics measurements have independent prognostic power
   conditional on histopathological imaging features, and vice versa. In
   this article, we adopt a rigorous statistical testing framework and
   test whether an individual gene expression measurement can improve
   prognosis modeling conditional on high-dimensional imaging features,
   and a parallel analysis is conducted reversing the roles of gene
   expressions and imaging features. In the analysis of The Cancer Genome
   Atlas (TCGA) lung adenocarcinoma and liver hepatocellular carcinoma
   data, it is found that multiple individual genes, conditional on
   imaging features, can lead to significant improvement in prognosis
   modeling; however, individual imaging features, conditional on gene
   expressions, only offer limited prognostic power. Being among the first
   to examine the independent prognostic power, this study may assist
   better understanding the “connectedness” between omics profiles and
   histopathological imaging features and provide important insights for
   data integration in cancer modeling.

   Keywords: cancer prognosis, independent prognostic power, omics
   profiles, histopathological imaging features

1. Introduction

   In cancer research and practice, prognosis is of essential interest.
   Extensive statistical modeling has been conducted, and yet there is
   still much room for additional research [[28]1,[29]2]. Multiple
   families of biomarkers have been used in cancer prognosis modeling. In
   the past decades, with the development of profiling techniques, omics
   data have been extensively used and shown to be effective. For example,
   in [[30]3], a signature composed of 21 gene expressions is used for
   modeling breast cancer prognosis. In [[31]4], hsa-mir-155 and
   hsa-let-7a-2 microRNAs are found as prognostic for lung cancer. In
   [[32]5], the prognostic power of methylated RASSF1A and/or APC serum
   DNA for breast cancer is identified. Such findings are biologically
   highly sensible as cancer is a genetic disease and the development and
   progression of cancer are strongly affected by molecular changes.

   Imaging techniques have been extensively used in cancer practice. In
   particular, for definitive diagnosis, biopsies are usually ordered, and
   pathologists review the slides of representative sections of tissues.
   The histopathological imaging data generated in this process directly
   reflect the histological organization and morphological characteristics
   of tumor cells and their surrounding tumor microenvironment. More
   recently, beyond diagnosis, histopathological imaging data have also
   been used for modeling other cancer outcomes/phenotypes. For example,
   Harpole et al. [[33]6] showed that the levels of tumor cell
   dedifferentiation are associated with survival outcomes. Automated
   histopathological imaging analysis systems have been shown to be
   effective in the prognostic determination of various malignancies,
   including breast cancer [[34]7], neuroblastoma [[35]8], lymphoma
   [[36]9], nonsmall cell lung cancer [[37]10], precancerous lesions in
   the esophagus [[38]11], and others.

   Omics profiles and histopathological images are biologically connected
   and contain overlapping information. In particular, properties of
   tumors and their microenvironment, as described in histopathological
   images, are highly regulated by molecular changes. Multiple statistical
   modelings have been conducted on their interconnections. For example,
   Yu et al. [[39]10] found the correlation of quantitative
   histopathological features with TP53 mutation in lung adenocarcinoma.
   Cooper et al. [[40]12] demonstrated that PDGFRA, EGFR, and MDM2
   amplifications are associated with imaging features such as greater
   minor axis length, area, and circularity in glioblastoma.

   On the other hand, it is also possible that omics profiles and
   histopathological images may contain independent information. Tissues
   used to generate omics measurements are usually heterogeneous and
   mixtures of different components, making the observed omics
   measurements represent an aggregation of distinct cells [[41]12].
   Histopathological images, with extremely high spatial resolutions, also
   contain information on tissue relationships and characteristics of the
   spatial organizations of tumor cells. Features of histopathological
   images, beyond regulated by molecular changes, may also be affected by
   personal immune system, microenvironment, and other factors. There are
   a few recent studies integrating omics measurements and imaging
   features and showing that such a data integration can lead to improved
   prognosis modeling for breast cancer [[42]13], brain tumor [[43]14],
   and nonsmall cell lung cancer [[44]15]. Such studies seem to suggest
   that there is independent information in omics and histopathological
   imaging data.

   Our literature review suggests that there is a lack of study rigorously
   quantifying whether omics profiles contain independent information,
   conditional on histopathological images, on cancer prognosis, and vice
   versa. It is noted that the aforementioned and other data integration
   studies do not suggest which type of measurement has independent
   information. In addition, they are mostly estimation-based and do not
   have a rigorous statistical inference framework. Some studies rely on
   prescreening to accommodate high data dimensionality and can be
   statistically limited. This study is conducted to directly tackle these
   problems. It can advance from the existing literature in multiple
   aspects. Specifically, it delivers an analysis pipeline which has a
   rigorous statistical inference framework and can show whether
   individuals of a type of measurement have additional prognostic power
   conditional on the other type of measurement. It can provide solid
   evidence on whether data integration is needed in modeling and clinical
   practice. In addition, data analysis may also provide further insights
   into two deadly cancers.

2. Data

   The Cancer Genome Atlas (TCGA) is one of the largest and most
   comprehensive cancer projects, and is organized by the National Cancer
   Institute (NCI) and National Human Genome Research Institute (NHGRI)
   [[45]16]. It has published high quality omics data and
   histopathological images for 33 types of cancer. In this study, we
   analyze data on lung adenocarcinoma (LUAD) and liver hepatocellular
   carcinoma (LIHC), both of which have high mortality rates and pose
   increasing public concerns. They also have relatively larger sample
   sizes, which is critical for making reliable findings. The proposed
   analysis can be extended to other cancer types straightforwardly. For
   omics measurements, we consider gene expressions, which have a central
   role in cancer prognosis modeling. The response of interest is overall
   survival time, which is right censored. It is noted that other types of
   omics measurements, for example protein expressions and microRNAs, and
   other prognosis outcomes can be analyzed in a similar manner.

2.1. mRNA Gene Expression Measurements

   We download the mRNA gene expression data from cBioPortal
   ([46]http://cbioportal.org), which have been measured using the
   Illumina Hiseq2000 RNA Sequencing Version 2 analysis platform, and
   processed and normalized using the RSEM software. We refer to the
   literature [[47]17,[48]18] and TCGA website for more detailed
   information on data collection and processing. Data are available on
   25,031 gene expression measurements. As the number of cancer-related
   genes is not expected to be large, we conduct a simple prescreening to
   reduce the dimensionality and also improve the stability of estimation.
   Specifically, the top 5000 genes with the largest variances are
   selected for downstream analysis.

2.2. Histopathological Imaging Features

   We download the whole-slide histopathology images directly from the
   TCGA website ([49]https://portal.gdc.cancer.gov), which are in the svs
   format. These tissue slides are formalin-fixed, paraffin-embedded
   slides with which the cell morphology is well-preserved and thus
   appropriate for image feature recognition. They are captured at 20× or
   40× magnification by the Aperio medical scanner. To extract imaging
   features, we conduct the following three-step preprocessing. The
   overall flowchart is provided in [50]Figure 1.

Figure 1.

   [51]Figure 1
   [52]Open in a new tab

   Flowchart of extracting imaging features. Step 1: whole-slide
   histopathology images are cropped into small subimages of 500 × 500
   pixels, and 20 subimages are then randomly selected. Step 2: Imaging
   features are extracted using CellProfiler [[53]19] for each subimage.
   Step 3: For each patient, features are averaged.

   In the first step, we process the downloaded svs files using the
   Openslide Python library [[54]20]. Specifically, to make data in an
   appropriate format for morphological feature extraction, each svs image
   is cropped into small subimages of 500 × 500 pixels, and each subimage
   is then saved in the tiff format. Among them, subimages that contain
   mostly (>50%) background white space are filtered out. From the
   remaining subimages, twenty representative ones are randomly selected
   for each svs image to avoid potential subjective bias and reduce
   computational cost [[55]21].

   In the second step, we use CellProfiler [[56]19] to extract features
   from each subimage. CellProfiler is an open source cell image analysis
   platform developed by the Broad Institute. Specifically, image colors
   are separated based on hematoxylin staining and eosin staining, and
   converted to grayscale to extract regional features, such as cell and
   tissue texture and granularity. These are “classic” image processing
   features, which have been examined in a large number of imaging
   studies. It is noted that they are not expected to be specific
   histopathological features, and cannot be recognized by pathologist.
   More specifically, texture describes a set of metrics calculated in
   CellProfiler to quantify the perceived texture of histopathological
   images, and includes information on the spatial arrangement of color or
   intensities in images. Granularity describes the size of how well the
   structure element fits in images. Then, cell nuclei are identified and
   segmented to collect the cell-level features, such as cell size, shape,
   distribution, and texture of nuclei. Other cell features such as
   regional occupation, area fraction, and neighboring architecture are
   also captured. The above process generates 832 features for each
   subimage. A further screening is conducted to exclude irrelevant
   features such as file size and execution information. Finally, for each
   subimage, a total of 772 features are available for analysis.

   In the third step, for each slide, the average feature values over
   twenty representative subimages are computed. When a subject has
   multiple slides, the average values over multiple slides are further
   computed for downstream analysis.

   The above preprocessing is applied to both LUAD and LIHC data. There
   are missing values in imaging feature data. Subjects with more than 25%
   missing imaging features are excluded. The remaining missing values are
   imputed using sample medians.

Remark 1.

     We adopt CellProfiler to extract imaging features, which is a
     popular choice in recent literature. The feature names are
     automatically provided by CellProfiler. The extracted features
     represent objective attributes of histopathological images,
     including the area and perimeter of nucleus and cytoplasm, mean and
     standard deviation of these measures, and other general image
     attributes. For each patient, multiple slides and subimages may be
     available. To extract as much information as possible, we consider
     multiple slides and subimages simultaneously, and the average values
     are used to summarize information. A closer examination suggests
     that they are not explicitly associated with specific
     histopathological findings, such as cellularity, atypia, anaplasia,
     nuclear pleomorphism, and some others. However, these attributes
     have been shown to be associated with pathological stages
     [[57]7,[58]10] and prognosis [[59]21], and also been used in
     multiple existing cancer modeling studies [[60]13,[61]15]. For
     example, Yu et al. [[62]22] show that some of these features are
     associated with specific subtypes of lung cancer. In this study, our
     goal is to rigorously quantify independent information in omics
     profiles and histopathological images for cancer prognosis, as
     opposed to focusing on specific interpretation of imaging features.
     As such, these features are sensible for our analysis, although they
     may not have simple, explicit biological interpretations.

2.3. Available Data

   After data matching, we obtain records on 316 and 358 subjects for LUAD
   and LIHC, respectively, with 5000 gene expression measurements, 772
   imaging features, and survival time. For LUAD, there are 103 deaths
   during follow-up, with survival times ranging from 0 to 214.77 months
   (median 6.03 months). For LIHC, there are 121 deaths during follow-up,
   with survival times ranging from 0 to 120.73 months (median 19.25
   months). Summary information of the analyzed subjects is provided in
   [63]Table 1.

Table 1.

   Sample characteristics.
            Characteristic               LUAD             LIHC
             Sample size                  316             358
   Age at diagnosis: median (range)   66 (39–88)       61 (16–90)
      Follow-up: median (range)     6.03 (0–214.77) 19.25 (0–120.73)
         Vital status: n (%)
                Alive                 213 (67.4%)     233 (65.0%)
               Deceased               103 (32.6%)     125 (35.0%)
              Sex: n (%)
                 Male                 144 (45.6%)     242 (67.6%)
                Female                172 (54.4%)     116 (32.4%)
         Cancer stage: n (%)
                  I                   180 (57.0%)     166 (46.4%)
                  II                  69 (18.7%)       82 (22.9%)
                 III                   41 (13%)        83 (23.2%)
                  IV                   21 (6.6%)        5 (1.4%)
                  NA                   5 (1.6%)        22 (6.1%)
   [64]Open in a new tab

3. Methods

   For describing cancer survival, we use the accelerated failure time
   (AFT) model. This model has been extensively adopted in cancer studies
   with high-dimensional variables because of its lucid interpretations
   and low computational cost [[65]23]. For each gene expression, we
   consider the prognosis model with its effect along with all imaging
   features. A statistically rigorous test is then conducted on the gene
   expression’s effect, which can suggest whether this particular gene has
   independent information for prognosis conditional on the imaging
   features. Then a parallel analysis is conducted, reversing the roles of
   gene expressions and imaging features. With a special emphasis on omics
   and imaging features, clinical factors are not included in the
   prognosis models.

   Consider n independent subjects. For the ith subject, denote
   [MATH: <mrow><mrow><msub><mi
   mathvariant="bold-italic">x</mi><mi>i</mi></msub><mo>=</mo><mrow><mo>(<
   /mo><mrow><msub><mi>x</mi><mrow><mi>i</mi><mn>1</mn></mrow></msub><mo>,
   </mo><mo> </mo><msub><mi>x</mi><mrow><mi>x</mi><mn>2</mn></mrow></msub>
   <mo>,</mo><mo> </mo><mo>…</mo><mo>,</mo><mo> </mo><msub><mi>x</mi><mrow
   ><mi>i</mi><mi>p</mi></mrow></msub></mrow><mo>)</mo></mrow></mrow></mro
   w> :MATH]
   and
   [MATH: <mrow><mrow><msub><mi
   mathvariant="bold-italic">z</mi><mi>i</mi></msub><mo>=</mo><mrow><mo>(<
   /mo><mrow><msub><mi>z</mi><mrow><mi>i</mi><mn>1</mn></mrow></msub><mo>,
   </mo><mo> </mo><msub><mi>z</mi><mrow><mi>i</mi><mn>2</mn></mrow></msub>
   <mo>,</mo><mo> </mo><mo>…</mo><mo>,</mo><mo> </mo><msub><mi>z</mi><mrow
   ><mi>i</mi><mi>q</mi></mrow></msub></mrow><mo>)</mo></mrow></mrow></mro
   w> :MATH]
   as the p- and q-dimensional vectors of gene expressions and imaging
   features,
   [MATH: <mrow><mrow><msub><mi>T</mi><mi>i</mi></msub></mrow></mrow>
   :MATH]
   and
   [MATH: <mrow><mrow><msub><mi>C</mi><mi>i</mi></msub></mrow></mrow>
   :MATH]
   as the logarithm of the event and censoring times. With right
   censoring, we observe
   [MATH:
   <mrow><mrow><mrow><mo>(</mo><mrow><msub><mi>y</mi><mi>i</mi></msub><mo>
   =</mo><mi>min</mi><mrow><mo>(</mo><mrow><msub><mi>T</mi><mi>i</mi></msu
   b><mo>,</mo><msub><mi>C</mi><mi>i</mi></msub></mrow><mo>)</mo></mrow><m
   o>,</mo><msub><mi>δ</mi><mi>i</mi></msub><mo>=</mo><mi
   mathvariant="normal">I</mi><mrow><mo>(</mo><mrow><msub><mi>T</mi><mi>i<
   /mi></msub><mo>≤</mo><msub><mi>C</mi><mi>i</mi></msub></mrow><mo>)</mo>
   </mrow><mo>,</mo><msub><mi
   mathvariant="bold-italic">x</mi><mi>i</mi></msub><mo>,</mo><msub><mi
   mathvariant="bold-italic">z</mi><mi>i</mi></msub></mrow><mo>)</mo></mro
   w></mrow></mrow> :MATH]
   .

   First consider the analysis with one gene expression and all imaging
   features. The analysis with one imaging feature and all gene
   expressions can be conducted in a similar manner and will not be
   reiterated. For the jth gene expression, consider the following AFT
   model:
   [MATH:
   <mrow><mrow><msub><mi>T</mi><mi>i</mi></msub><mo>=</mo><mi>α</mi><mo>+<
   /mo><msub><mi
   mathvariant="bold-italic">x</mi><mrow><mi>i</mi><mi>j</mi></mrow></msub
   ><msub><mi>β</mi><mi>j</mi></msub><mo>+</mo><msub><mi
   mathvariant="bold-italic">z</mi><mi>i</mi></msub><msub><mi
   mathvariant="bold-italic">θ</mi><mi>j</mi></msub><mo>+</mo><msub><mi
   mathvariant="sans-serif">ε</mi><mi>i</mi></msub></mrow></mrow> :MATH]
   (1)

   where
   [MATH: <mrow><mi>α</mi></mrow> :MATH]
   is the intercept,
   [MATH: <mrow><mrow><msub><mi>β</mi><mi>j</mi></msub></mrow></mrow>
   :MATH]
   and
   [MATH: <mrow><mrow><msub><mi
   mathvariant="bold-italic">θ</mi><mi>j</mi></msub><mo>=</mo><msup><mrow>
   <mrow><mo>(</mo><mrow><msub><mi>θ</mi><mrow><mi>j</mi><mn>1</mn></mrow>
   </msub><mo>,</mo><mo>…</mo><mo>,</mo><msub><mi>θ</mi><mrow><mi>j</mi><m
   i>q</mi></mrow></msub></mrow><mo>)</mo></mrow></mrow><mo>′</mo></msup><
   /mrow></mrow> :MATH]
   are the unknown coefficients for the jth gene expression and all
   imaging features, and
   [MATH: <mrow><mrow><msub><mi
   mathvariant="sans-serif">ε</mi><mi>i</mi></msub></mrow></mrow> :MATH]
   is the random error. A statistical test
   [MATH: <mrow><mrow><mrow><msub><mi
   mathvariant="normal">H</mi><mn>0</mn></msub><mo>:</mo><mo> </mo><msub><
   mi>β</mi><mi>j</mi></msub><mo>=</mo><mn>0</mn></mrow><mrow><mo> </mo><m
   i>vs</mi><mo>.</mo><mo> </mo><msub><mi
   mathvariant="normal">H</mi><mn>1</mn></msub><mo>:</mo><mo> </mo><msub><
   mi>β</mi><mi>j</mi></msub><mo>≠</mo><mn>0</mn></mrow></mrow></mrow>
   :MATH]

   can reveal whether
   [MATH: <mrow><mrow><msub><mi
   mathvariant="bold-italic">x</mi><mrow><mi>i</mi><mi>j</mi></mrow></msub
   ></mrow></mrow> :MATH]
   is independently associated with
   [MATH: <mrow><mrow><msub><mi>T</mi><mi>i</mi></msub></mrow></mrow>
   :MATH]
   given
   [MATH: <mrow><mrow><msub><mi
   mathvariant="bold-italic">z</mi><mi>i</mi></msub></mrow></mrow> :MATH]
   . Here, loosely speaking, a smaller p value can indicate a stronger
   association/more prognostic power. The analysis is challenging with the
   high dimensionality of imaging features, which makes the “standard”
   estimation and inference techniques not applicable. To tackle this
   problem, we consider a high-dimensional inference approach [[66]24]
   recently developed under a related but simpler context. Specifically,
   for estimation, the weighted least squares approach is adopted. Assume
   that data have been sorted according to
   [MATH: <mrow><mrow><msub><mi>y</mi><mi>i</mi></msub></mrow></mrow>
   :MATH]
   ’s from the smallest to the largest. Then we have the Kaplan–Meier
   weights defined as
   [MATH:
   <mrow><mrow><mrow><msub><mi>w</mi><mn>1</mn></msub><mo>=</mo><mfrac><mr
   ow><msub><mi>δ</mi><mn>1</mn></msub></mrow><mi>n</mi></mfrac></mrow><mr
   ow><mo>,</mo><mo> </mo><msub><mi>w</mi><mi>i</mi></msub><mo>=</mo><mfra
   c><mrow><msub><mi>y</mi><mi>i</mi></msub></mrow><mrow><mi>n</mi><mo>−</
   mo><mi>i</mi><mo>+</mo><mn>1</mn></mrow></mfrac><msubsup><mstyle
   mathsize="140%"
   displaystyle="true"><mo>∏</mo></mstyle><mrow><mi>j</mi><mo>=</mo><mn>1<
   /mn></mrow><mrow><mi>i</mi><mo>−</mo><mn>1</mn></mrow></msubsup><msup><
   mrow><mrow><mo>(</mo><mrow><mfrac><mrow><mi>n</mi><mo>−</mo><mi>j</mi><
   /mrow><mrow><mi>n</mi><mo>−</mo><mi>j</mi><mo>+</mo><mn>1</mn></mrow></
   mfrac></mrow><mo>)</mo></mrow></mrow><mrow><msub><mi>δ</mi><mi>j</mi></
   msub></mrow></msup><mo>,</mo><mi>i</mi><mo>=</mo><mn>2</mn><mo>,</mo><m
   o>…</mo><mo>,</mo><mi>n</mi><mo>,</mo></mrow></mrow></mrow> :MATH]

   where a further normalization is conducted to make
   [MATH: <mrow><mrow><msubsup><mstyle mathsize="80%"
   displaystyle="true"><mo>∑</mo></mstyle><mrow><mi>i</mi><mo>=</mo><mn>1<
   /mn></mrow><mi>n</mi></msubsup><msub><mi>w</mi><mi>i</mi></msub><mo>=</
   mo><mi>n</mi></mrow></mrow> :MATH]
   . Let
   [MATH: <mrow><mrow><mi
   mathvariant="bold-italic">W</mi><mo>=</mo><mi>diag</mi><mrow><mo>{</mo>
   <mrow><msub><mi>w</mi><mn>1</mn></msub><mo> </mo><mo>,</mo><msub><mi>w<
   /mi><mn>2</mn></msub><mo>,</mo><msub><mi>w</mi><mn>3</mn></msub><mo>,</
   mo><mo>…</mo><mo>,</mo><msub><mi>w</mi><mi>n</mi></msub></mrow><mo>}</m
   o></mrow></mrow></mrow> :MATH]
   ,
   [MATH: <mrow><mrow><msub><mi
   mathvariant="bold-italic">x</mi><mrow><mo>·</mo><mi>j</mi></mrow></msub
   ></mrow></mrow> :MATH]
   ,
   [MATH: <mrow><mrow><msub><mi
   mathvariant="bold-italic">z</mi><mrow><mo>·</mo><mi>k</mi></mrow></msub
   ></mrow></mrow> :MATH]
   , and
   [MATH: <mrow><mi mathvariant="bold-italic">y</mi></mrow> :MATH]
   denote the vectors composed of
   [MATH:
   <mrow><mrow><msub><mi>x</mi><mrow><mi>i</mi><mi>j</mi></mrow></msub></m
   row></mrow> :MATH]
   ’s,
   [MATH:
   <mrow><mrow><msub><mi>z</mi><mrow><mi>i</mi><mi>k</mi></mrow></msub></m
   row></mrow> :MATH]
   ’s, and
   [MATH: <mrow><mrow><msub><mi>y</mi><mi>i</mi></msub></mrow></mrow>
   :MATH]
   ’s, which are weighted normalized such that
   [MATH: <mrow><mrow><msup><mn
   mathvariant="bold">1</mn><mo>′</mo></msup><mi
   mathvariant="bold-italic">W</mi><msub><mi
   mathvariant="bold-italic">x</mi><mrow><mo>·</mo><mi>j</mi></mrow></msub
   ><mo>=</mo><mn>0</mn><mo>,</mo><mo> </mo><msup><mn
   mathvariant="bold">1</mn><mo>′</mo></msup><mi
   mathvariant="bold-italic">W</mi><msub><mi
   mathvariant="bold-italic">z</mi><mrow><mo>·</mo><mi>k</mi></mrow></msub
   ><mo>=</mo><mn>0</mn><mo>,</mo><mrow><mo> </mo><mi>and</mi><mo> </mo></
   mrow><mn mathvariant="bold">1</mn><mo>′</mo><mi
   mathvariant="bold-italic">W</mi><mi>y</mi><mo>=</mo><mn>0</mn></mrow></
   mrow> :MATH]
   , and
   [MATH: <mrow><mrow><mi
   mathvariant="bold-italic">Z</mi><mo>=</mo><mrow><mo>(</mo><mrow><msub><
   mi
   mathvariant="bold-italic">z</mi><mrow><mo>·</mo><mn>1</mn></mrow></msub
   ><mo>,</mo><mo>…</mo><mo>,</mo><msub><mi
   mathvariant="bold-italic">z</mi><mrow><mo>·</mo><mi>q</mi></mrow></msub
   ></mrow><mo>)</mo></mrow></mrow></mrow> :MATH]
   . With the high data dimensionality, regularized estimation is needed.
   In addition, not all imaging features are expected to be associated
   with survival, posing a variable selection problem. Consider the
   weighted penalized estimation:
   [MATH:
   <mrow><mrow><mrow><mo>{</mo><mrow><mtable><mtr><mtd><mrow><mrow><mo>(</
   mo><mrow><msubsup><mover
   accent="true"><mi>β</mi><mo>˜</mo></mover><mi>j</mi><mo>*</mo></msubsup
   ><mo>,</mo><msub><mover accent="true"><mi
   mathvariant="bold-italic">θ</mi><mo>˜</mo></mover><mi>j</mi></msub></mr
   ow><mo>)</mo></mrow><mo>=</mo><msub><mrow><mi>argmin</mi></mrow><mrow><
   msub><mi>β</mi><mi>j</mi></msub><mo>,</mo><mi
   mathvariant="bold-italic">θ</mi><mo>∈</mo><msup><mi>R</mi><mrow><mn>1</
   mn><mo>+</mo><mi>q</mi></mrow></msup></mrow></msub><mrow><mo>{</mo><mro
   w><mfrac><mn>1</mn><mrow><mn>2</mn><mi>n</mi></mrow></mfrac><mo>‖</mo><
   msup><mi
   mathvariant="bold-italic">W</mi><mrow><mfrac><mn>1</mn><mn>2</mn></mfra
   c></mrow></msup><msubsup><mrow><mrow><mo>(</mo><mrow><mi
   mathvariant="bold-italic">y</mi><mo>−</mo><msub><mi
   mathvariant="bold-italic">x</mi><mrow><mo>·</mo><mi>j</mi></mrow></msub
   ><msub><mi>β</mi><mi>j</mi></msub><mo>−</mo><mi
   mathvariant="bold-italic">Z</mi><msub><mi
   mathvariant="bold-italic">θ</mi><mi>j</mi></msub></mrow><mo>)</mo></mro
   w><mo>‖</mo></mrow><mn>2</mn><mn>2</mn></msubsup><mo>+</mo><msub><mi>λ<
   /mi><mn>0</mn></msub><msubsup><mstyle mathsize="80%"
   displaystyle="true"><mo>∑</mo></mstyle><mrow><mi>k</mi><mo>=</mo><mn>1<
   /mn></mrow><mi>q</mi></msubsup><mrow><mo>|</mo><mrow><msub><mi>θ</mi><m
   row><mi>j</mi><mi>k</mi></mrow></msub></mrow><mo>|</mo></mrow></mrow><m
   o>}</mo></mrow></mrow></mtd></mtr><mtr><mtd><mrow><msub><mover
   accent="true"><mi
   mathvariant="bold-italic">b</mi><mo>˜</mo></mover><mi>j</mi></msub><mo>
   =</mo><msub><mrow><mi>argmin</mi></mrow><mrow><msub><mi
   mathvariant="bold-italic">b</mi><mi
   mathvariant="bold-italic">j</mi></msub><mo>∈</mo><msup><mi>R</mi><mi>q<
   /mi></msup></mrow></msub><mrow><mo>{</mo><mrow><mfrac><mn>1</mn><mrow><
   mn>2</mn><mi>n</mi></mrow></mfrac><mo>‖</mo><msup><mi
   mathvariant="bold-italic">W</mi><mrow><mfrac><mn>1</mn><mn>2</mn></mfra
   c></mrow></msup><msubsup><mrow><mrow><mo>(</mo><mrow><msub><mi
   mathvariant="bold-italic">x</mi><mrow><mo>·</mo><mi>j</mi></mrow></msub
   ><mo>−</mo><mi mathvariant="bold-italic">Z</mi><msub><mi
   mathvariant="bold-italic">b</mi><mi>j</mi></msub></mrow><mo>)</mo></mro
   w><mo>‖</mo></mrow><mn>2</mn><mn>2</mn></msubsup><mo>+</mo><msub><mi>λ<
   /mi><mn>1</mn></msub><msubsup><mstyle mathsize="80%"
   displaystyle="true"><mo>∑</mo></mstyle><mrow><mi>k</mi><mo>=</mo><mn>1<
   /mn></mrow><mi>q</mi></msubsup><mrow><mo>|</mo><mrow><msub><mi>b</mi><m
   row><mi>j</mi><mi>k</mi></mrow></msub></mrow><mo>|</mo></mrow></mrow><m
   o>}</mo></mrow></mrow></mtd></mtr></mtable></mrow></mrow><mo>,</mo></mr
   ow></mrow> :MATH]
   (2)

   where
   [MATH:
   <mrow><mrow><msub><mrow><mo>‖</mo><mo>·</mo><mo>‖</mo></mrow><mn>2</mn>
   </msub></mrow></mrow> :MATH]
   denotes the L[2] norm,
   [MATH: <mrow><mrow><msub><mi mathvariant="bold-italic">b</mi><mi
   mathvariant="bold-italic">j</mi></msub><mo>=</mo><msup><mrow><mrow><mo>
   (</mo><mrow><msub><mi>b</mi><mrow><mi>j</mi><mn>1</mn></mrow></msub><mo
   >,</mo><mo>…</mo><mo>,</mo><msub><mi>b</mi><mrow><mi>j</mi><mi>q</mi></
   mrow></msub></mrow><mo>)</mo></mrow></mrow><mo>′</mo></msup></mrow></mr
   ow> :MATH]
   is the unknown coefficient vector, and
   [MATH: <mrow><mrow><msub><mi>λ</mi><mn>0</mn></msub></mrow></mrow>
   :MATH]
   ,
   [MATH: <mrow><mrow><msub><mi>λ</mi><mn>1</mn></msub></mrow></mrow>
   :MATH]
   > 0 are data-dependent tuning parameters for Lasso penalties.

   With
   [MATH: <mrow><mrow><msub><mover accent="true"><mi
   mathvariant="bold-italic">θ</mi><mo>˜</mo></mover><mi>j</mi></msub></mr
   ow></mrow> :MATH]
   and
   [MATH: <mrow><mrow><msub><mover accent="true"><mi
   mathvariant="bold-italic">b</mi><mo>˜</mo></mover><mi>j</mi></msub></mr
   ow></mrow> :MATH]
   , the final estimate
   [MATH: <mrow><mrow><msub><mover
   accent="true"><mi>β</mi><mo>˜</mo></mover><mi>j</mi></msub></mrow></mro
   w> :MATH]
   of
   [MATH: <mrow><mrow><msub><mi>β</mi><mi>j</mi></msub></mrow></mrow>
   :MATH]
   is defined as
   [MATH: <mrow><mrow><msub><mover
   accent="true"><mi>β</mi><mo>˜</mo></mover><mi>j</mi></msub><mo>=</mo><m
   sup><mrow><mrow><mo>(</mo><mrow><msub><mover accent="true"><mi
   mathvariant="bold-italic">x</mi><mo>˜</mo></mover><mrow><mo>.</mo><mi>j
   </mi></mrow></msub><msup><mrow></mrow><mo>′</mo></msup><mi
   mathvariant="bold-italic">W</mi><msub><mi
   mathvariant="bold-italic">x</mi><mrow><mo>.</mo><mi>j</mi></mrow></msub
   ></mrow><mo>)</mo></mrow></mrow><mrow><mo>−</mo><mn>1</mn></mrow></msup
   ><msub><mover accent="true"><mi
   mathvariant="bold-italic">x</mi><mo>˜</mo></mover><mrow><mo>.</mo><mi>j
   </mi></mrow></msub><msup><mrow></mrow><mo>′</mo></msup><mi
   mathvariant="bold-italic">W</mi><mrow><mo>(</mo><mrow><mi>y</mi><mo>−</
   mo><mi mathvariant="bold-italic">Z</mi><msub><mover accent="true"><mi
   mathvariant="bold-italic">θ</mi><mo>˜</mo></mover><mi>j</mi></msub></mr
   ow><mo>)</mo></mrow><mo>,</mo></mrow></mrow> :MATH]
   (3)

   where
   [MATH: <mrow><mrow><msub><mover accent="true"><mi
   mathvariant="bold-italic">x</mi><mo>˜</mo></mover><mrow><mo>.</mo><mi>j
   </mi></mrow></msub><mo>=</mo><msub><mi
   mathvariant="bold-italic">x</mi><mrow><mo>·</mo><mi>j</mi></mrow></msub
   ><mo>−</mo><mi mathvariant="bold-italic">Z</mi><msub><mover
   accent="true"><mi
   mathvariant="bold-italic">b</mi><mo>˜</mo></mover><mi>j</mi></msub></mr
   ow></mrow> :MATH]
   . It has been shown in [[67]24] that
   [MATH:
   <mrow><mrow><msqrt><mi>n</mi></msqrt><mrow><mo>(</mo><mrow><msub><mover
   accent="true"><mi>β</mi><mo>˜</mo></mover><mi>j</mi></msub><mo>−</mo><m
   sub><mi>β</mi><mi>j</mi></msub></mrow><mo>)</mo></mrow></mrow></mrow>
   :MATH]
   is asymptotically normal with variance defined as
   [MATH: <mrow><mrow><mover
   accent="true"><mi>Σ</mi><mo>˜</mo></mover><mrow><mo>(</mo><mrow><msub><
   mover
   accent="true"><mi>β</mi><mo>˜</mo></mover><mi>j</mi></msub></mrow><mo>)
   </mo></mrow><mo>=</mo><mfrac><mn>1</mn><mi>n</mi></mfrac><msup><mrow><m
   row><mo>(</mo><mrow><msub><mover accent="true"><mi
   mathvariant="bold-italic">x</mi><mo>˜</mo></mover><mrow><mo>.</mo><mi>j
   </mi></mrow></msub><msup><mrow></mrow><mo>′</mo></msup><mi
   mathvariant="bold-italic">W</mi><msub><mi
   mathvariant="bold-italic">x</mi><mrow><mo>.</mo><mi>j</mi></mrow></msub
   ></mrow><mo>)</mo></mrow></mrow><mrow><mo>−</mo><mn>1</mn></mrow></msup
   ><mover accent="true"><mrow><msub><mi>Σ</mi><mn>1</mn></msub></mrow><mo
   stretchy="true">˜</mo></mover><msup><mrow><mrow><mo>(</mo><mrow><msub><
   mi
   mathvariant="bold-italic">x</mi><mrow><mo>.</mo><mi>j</mi></mrow></msub
   ><msup><mrow></mrow><mo>′</mo></msup><mi
   mathvariant="bold-italic">W</mi><msub><mover accent="true"><mi
   mathvariant="bold-italic">x</mi><mo>˜</mo></mover><mrow><mo>.</mo><mi>j
   </mi></mrow></msub></mrow><mo>)</mo></mrow></mrow><mrow><mo>−</mo><mn>1
   </mn></mrow></msup><mo>,</mo></mrow></mrow> :MATH]

   where
   [MATH: <mrow><mrow><mover
   accent="true"><mrow><msub><mi>Σ</mi><mn>1</mn></msub></mrow><mo
   stretchy="true">˜</mo></mover></mrow></mrow> :MATH]
   is the sample variance based on
   [MATH: <mrow><mrow><msub><mover
   accent="true"><mi>x</mi><mo>˜</mo></mover><mrow><mi>i</mi><mi>j</mi></m
   row></msub><mo
   stretchy="false">(</mo><msub><mi>y</mi><mi>i</mi></msub><mo>−</mo><msub
   ><mi>x</mi><mrow><mi>i</mi><mi>j</mi></mrow></msub><msub><mover
   accent="true"><mi>β</mi><mo>˜</mo></mover><mi>j</mi></msub><mo>+</mo><m
   sub><mi mathvariant="bold-italic">z</mi><mi>i</mi></msub><msub><mover
   accent="true"><mi
   mathvariant="bold-italic">θ</mi><mo>˜</mo></mover><mi>j</mi></msub><mo
   stretchy="false">)</mo></mrow></mrow> :MATH]
   . With this asymptotic normal distribution, the test statistic for
   [MATH: <mrow><mrow><msub><mi
   mathvariant="normal">H</mi><mn>0</mn></msub><mo>:</mo><msub><mrow><mrow
   ><mo> </mo><mi>β</mi></mrow></mrow><mi>j</mi></msub><mo>=</mo><mn>0</mn
   ></mrow></mrow> :MATH]
   can be defined as
   [MATH: <mrow><mrow><msub><mover
   accent="true"><mi>β</mi><mo>˜</mo></mover><mi>j</mi></msub><mo>/</mo><m
   sqrt><mrow><mover
   accent="true"><mi>Σ</mi><mo>˜</mo></mover><mrow><mo>(</mo><mrow><msub><
   mover
   accent="true"><mi>β</mi><mo>˜</mo></mover><mi>j</mi></msub></mrow><mo>)
   </mo></mrow><mo>/</mo><mi>n</mi></mrow></msqrt><mo>,</mo></mrow></mrow>
   :MATH]

   which follows Student’s T distribution. The unadjusted p value can then
   be obtained. When all gene expressions are considered together, to
   accommodate multiple comparisons, p values are adjusted using the
   voxel-level false discovery rate approach [[68]25].

   An advantage of the above analysis is that it can be realized via
   simple coding. The most challenging step is the estimation in (2),
   which can be realized using the R function glmnet. The tuning
   parameters
   [MATH: <mrow><mrow><msub><mi>λ</mi><mn>0</mn></msub></mrow></mrow>
   :MATH]
   and
   [MATH: <mrow><mrow><msub><mi>λ</mi><mn>1</mn></msub></mrow></mrow>
   :MATH]
   are selected using the EBIC approach [[69]26]. To facilitate data
   analysis in and beyond this study, we have developed R code
   implementing the proposed approach. To illustrate its usage, we have
   also provided an example R file with the LUAD data. The code and data
   are publicly available at [70]http://www.github.com/shuanggema/TestLDHD
   as well as in [71]Supplementary Materials.

Remark 2.

     The effectiveness of the AFT model for cancer prognosis modeling has
     been well tested. Penalization has been shown effective for
     screening out irrelevant variables and accommodating high
     dimensionality. As shown in [[72]24], the estimation (2) can
     effectively “single out” the effect of the one gene expression. It
     is noted that, as the gene expression effect is of particular
     interest, its coefficient is not subject to penalization in
     estimation. A “byproduct” of penalized estimation is that imaging
     features associated with prognosis, conditional on the one gene
     expression effect, are identified, which may assist understanding
     the associations between imaging features and prognosis as well as
     the associations between imaging features and gene expressions. The
     statistical distribution result for the test statistic has been
     rigorously proved in [[73]24]. The T distribution makes inference
     very easy.

4. Results

4.1. Identification of Gene Expressions with Independent Prognostic Power
Conditional on Imaging Features

   We first apply the analysis approach described above to identify
   individual gene expressions that have independent prognostic power
   conditional on the high dimensional histopathological imaging features.
   With significance level 0.05 as cutoff, 85 genes are identified as
   significantly associated with prognosis for LUAD. Detailed estimation
   results are provided in the [74]Table S1. A quick literature search of
   PubMed suggests that many of the findings have sound biological basis.
   For top ten representative genes, we provide brief information and
   references on their associations with lung cancer prognosis in