Abstract

   Molecularly targeted therapeutics hold promise of revolutionizing
   treatments of advanced malignancies. However, a large number of
   patients do not respond to these treatments. Here, we take a systems
   biology approach to understand the molecular mechanisms that prevent
   breast cancer (BC) cells from responding to lapatinib, a dual kinase
   inhibitor that targets human epidermal growth factor receptor 2 (HER2)
   and epidermal growth factor receptor (EGFR). To this end, we analysed
   temporal gene expression profiles of four BC cell lines, two of which
   respond and the remaining two do not respond to lapatinib. For this
   analysis, we developed a Gaussian process based algorithm which can
   accurately find differentially expressed genes by analysing time course
   gene expression profiles at a fraction of the computational cost of
   other state-of-the-art algorithms. Our analysis identified 519
   potential genes which are characteristic of lapatinib
   non-responsiveness in the tested cell lines. Data from the Genomics of
   Drug Sensitivity in Cancer (GDSC) database suggested that the basal
   expressions 120 of the above genes correlate with the response of BC
   cells to HER2 and/or EGFR targeted therapies. We selected 27 genes from
   the larger panel of 519 genes for experimental verification and 16 of
   these were successfully validated. Further bioinformatics analysis
   identified vitamin D receptor (VDR) as a potential target of interest
   for lapatinib non-responsive BC cells. Experimentally, calcitriol, a
   commonly used reagent for VDR targeted therapy, in combination with
   lapatinib additively inhibited proliferation in two HER2 positive cell
   lines, lapatinib insensitive MDA-MB-453 and lapatinib resistant HCC
   1954-L cells.

1 Introduction

   BC is the most common type of cancer in women accounting for 25% of all
   cases [[36]1]. Clinically, BC is classified based on tumour
   progression, histopathology, and the expression status of HER2,
   oestrogen, and progesterone receptors. The most suitable treatment for
   a patient is decided based on these parameters. HER2-positive BC, in
   which the HER2 receptor is either overexpressed or amplified, accounts
   for approximately 20–25% of human BC cases [[37]2] and is associated
   with poor prognosis [[38]3]. Standard treatment options such as
   radiation, surgery and chemotherapy as well as more targeted approaches
   are used to treat these types of patients. The monoclonal antibody
   trastuzumab [[39]4] and the dual tyrosine kinase inhibitor lapatinib
   are among the targeted therapies currently in clinical use [[40]5].
   Here, we focused on lapatinib, which inhibits both HER2 and EGFR and
   prevents activation of important downstream pathways, such as ERK/MAPK
   (extracellular-signal-regulated kinase/mitogen-activated protein
   kinase) and PI3K (phosphatidylinositol 3-kinase) which can drive cancer
   progression [[41]6, [42]7]. Lapatinib is currently approved for
   treatment of metastatic BC in combination with capecitabine [[43]8],
   and experimentally may synergise with trastuzumab [[44]9]. However, a
   significant proportion of HER2-positive tumours do not respond to
   lapatinib. Recent clinical studies suggest lapatinib has a success rate
   of 12.4–24.7% depending on whether it is administered alone or in
   combination with capecitabine or trastuzumab [[45]8, [46]10, [47]11].
   These low response rates underline the clinical urgency to understand
   and overcome the molecular mechanisms that prevent BC from responding
   to lapatinib treatment.

   Here, we address this question by reanalysing a previously published
   dataset consisting of gene expression changes in a panel of lapatinib
   responsive and non-responsive cell lines over a period of time
   following different doses of lapatinib treatment [[48]12]. We sought to
   compare the temporal patterns of gene expression between the responsive
   and non-responsive cell lines in order to identify genes which are
   characteristic of lapatinib non-responsiveness [[49]12]. Comparing
   temporal patterns of gene expression across different cell
   lines/experimental conditions is not straightforward, mainly because of
   the different types of variabilities in these measurements [[50]13,
   [51]14]. Firstly, genotypic differences within different cell
   populations and measurement errors add random variability to the data
   [[52]13, [53]14]. Secondly, gene expression is subjected to various
   levels of regulation, including epigenetic control, efficiency of
   transcriptional initiation and extension, and post-transcriptional
   processing, such as mRNA splicing and degradation. These processes add
   systematic time dependent variabilities to the measured expression
   [[54]13, [55]14]. Standard statistical tools such as t-test, rank sum
   test etc. which are commonly used to compare gene expression, account
   for the random variabilities, but fail to capture the systematic
   temporal variabilities in gene expression data [[56]13, [57]14].
   Several methods have been proposed to model and compare temporal gene
   expression patterns [[58]15–[59]21]. Gaussian Process (GP) based
   methods are particularly attractive, since they stem from widely
   studied, tried and tested theories and are well known to accurately
   model time course data [[60]16, [61]17, [62]19, [63]21, [64]22]. In
   this paper, we developed a fast and accurate GP based differential time
   course gene expression analysis tool, called GEAGP (Gene Expression
   Analysis using Gaussian Process), to compare temporal gene expression
   patterns. We first tested this tool on a simulated dataset and compared
   its performance with other similar statistical tools. Subsequently, we
   used this tool to identify potential genes which characterize
   non-responsiveness of BC cells to lapatinib. Some of the identified
   genes were then validated by quantitative RT-PCR. Further
   bioinformatics analysis led to the discovery of a potentially new
   target for treating lapatinib insensitive cells, the vitamin D receptor
   (VDR). The potential of the VDR as a new treatment option was then
   examined in cell line models using concurrent treatment with varying
   doses of the VDR agonist calcitriol in combination with different doses
   of lapatinib in lapatinib insensitive and resistant cell lines.

2 Materials and methods

2.1 Imputing missing values in the microarray data

   In the microarray dataset most experiments were performed in
   quadruplicates, except in a few cases where only three replicates were
   available. In these cases, we calculated the sample mean (μ[s]) and
   standard deviations (σ[s]) using the three available replicates and
   then generated a random number by sampling a normal distribution (
   [MATH:
   <mrow><mi>N</mi><mrow><mo>(</mo><mrow><msub><mi>μ</mi><mi>s</mi></msub>
   <mo>,</mo><msubsup><mi>σ</mi><mi>s</mi><mn>2</mn></msubsup></mrow><mo>)
   </mo></mrow></mrow> :MATH]
   ) with the same mean and standard deviation. This random number was
   then used to represent the missing fourth replicate value [[65]23].

2.2 Empirical estimation of the GP parameters μ[i](t), Σ[i]

   We assumed that the expression profile of an mRNA is a GP with mean
   function and covariance matrix μ[i](t), Σ[i] respectively, i.e.
   m[i](t)∼N(μ[i](t), Σ[i]). We further assumed that the mean function
   itself is a GP, i.e. it has the following prior distribution:
   μ[i](t)∼N(μ[0](t), Σ[0]) where μ[0](t), Σ[0] are hyperparameters. Since
   the distribution of the mean function is unknown a priori, μ[0](t) was
   assumed to be zero, and the Σ[0] was assumed to be defined by a
   Ornstein–Uhlenbeck function (
   [MATH:
   <mrow><msubsup><mo>Σ</mo><mn>0</mn><mrow><mi>t</mi><mo>,</mo><msup><mi>
   t</mi><mo>′</mo></msup></mrow></msubsup><mo>=</mo><msub><mi>ψ</mi><mi>i
   </mi></msub><mtext>exp</mtext><mrow><mo>(</mo><mrow><mo>−</mo><mfrac><m
   row><mrow><mo>|</mo><mrow><mi>t</mi><mo>−</mo><msup><mi>t</mi><mo>′</mo
   ></msup></mrow><mo>|</mo></mrow></mrow><mrow><msub><mi>δ</mi><mi>i</mi>
   </msub></mrow></mfrac></mrow><mo>)</mo></mrow></mrow> :MATH]
   ) [[66]24, [67]25] which is an exponential functional form that
   describes the correlation between values of μ[i](t) at different time
   points (μ, μ[i]). The parameters δ[i], ψ[i], d[i], v[i], and
   [MATH: <mrow><msubsup><mi>σ</mi><mi>i</mi><mn>2</mn></msubsup></mrow>
   :MATH]
   of the covariance matrices Σ[0] and Σ[i] can only take positive values.
   Therefore, they were assumed to have gamma prior distributions with
   location and scale parameters α = 0.1 and b = 100 respectively. The
   choices of a, b ensures that the corresponding gamma distributions are
   flat, thereby allowing a wide range of values for the above parameters
   to be considered during inference/estimation of μ[i](t), Σ[i].
   Multiplying the likelihood (see [68]Eq 2 in the “[69]Results” section)
   by the prior distributions of μ[i](t), δ[i], ψ[i], d[i], v[i], and
   [MATH: <mrow><msubsup><mi>σ</mi><mi>i</mi><mn>2</mn></msubsup></mrow>
   :MATH]
   , and integrating the resulting joint distributions with respect to
   μ[i](t) one arrives at the following marginal posterior.
   [MATH: <mrow><mi>p</mi><mo
   stretchy="false">(</mo><msub><mi>δ</mi><mi>i</mi></msub><mo>,</mo><mtex
   t> </mtext><msub><mi>ψ</mi><mi>i</mi></msub><mo>,</mo><mtext> </mtext><
   msub><mi>d</mi><mi>i</mi></msub><mo>,</mo><mtext> </mtext><msub><mi>v</
   mi><mi>i</mi></msub><mo>,</mo><msubsup><mrow><mtext> </mtext><mo>σ</mo>
   </mrow><mtext>i</mtext><mn>2</mn></msubsup><mo>|</mo><msubsup><mi>m</mi
   ><mi>i</mi><mi>r</mi></msubsup><mrow><mo>(</mo><mi>t</mi><mo>)</mo></mr
   ow><mo
   stretchy="false">)</mo><mo>=</mo><mfrac><mn>1</mn><mrow><msup><mrow><mr
   ow><mo>(</mo><mrow><mn>2</mn><mi>π</mi></mrow><mo>)</mo></mrow></mrow><
   mrow><mfrac><mrow><msub><mi>N</mi><mi>r</mi></msub><mo>*</mo><msub><mi>
   N</mi><mi>T</mi></msub></mrow><mn>2</mn></mfrac></mrow></msup><mo> </mo
   ><msup><mrow><mrow><mo>|</mo><mrow><msub><mo>Σ</mo><mi>i</mi></msub></m
   row><mo>|</mo></mrow></mrow><mrow><mfrac><mrow><msub><mi>N</mi><mi>r</m
   i></msub></mrow><mn>2</mn></mfrac></mrow></msup></mrow></mfrac><mstyle
   displaystyle="true"><msubsup><mo>∏</mo><mrow><mi>r</mi><mo>=</mo><mn>1<
   /mn></mrow><mrow><msub><mi>N</mi><mi>r</mi></msub></mrow></msubsup><mro
   w><mtext>exp</mtext><mrow><mo>(</mo><mrow><mo>−</mo><mfrac><mrow><msup>
   <mrow><mover
   accent="true"><mrow><msubsup><mi>μ</mi><mi>m</mi><mi>r</mi></msubsup><m
   row><mo>(</mo><mi>t</mi><mo>)</mo></mrow></mrow><mo
   stretchy="true">¯</mo></mover></mrow><mi>T</mi></msup><msubsup><mrow><m
   text> </mtext><mo>Σ</mo></mrow><mi>i</mi><mrow><mo>−</mo><mn>1</mn></mr
   ow></msubsup><mover
   accent="true"><mrow><msubsup><mi>μ</mi><mi>m</mi><mi>r</mi></msubsup><m
   row><mo>(</mo><mi>t</mi><mo>)</mo></mrow></mrow><mo
   stretchy="true">¯</mo></mover></mrow><mn>2</mn></mfrac></mrow><mo>)</mo
   ></mrow><mo>×</mo><mo>Γ</mo><mrow><mo>(</mo><mrow><msup><mi>σ</mi><mn>2
   </mn></msup><mo>|</mo><mi>a</mi><mo>,</mo><mi>b</mi></mrow><mo>)</mo></
   mrow><mo>Γ</mo><mrow><mo>(</mo><mrow><msub><mi>δ</mi><mi>i</mi></msub><
   mo>|</mo><mi>a</mi><mo>,</mo><mi>b</mi></mrow><mo>)</mo></mrow><mo>Γ</m
   o><mrow><mo>(</mo><mrow><msub><mi>σ</mi><mi>i</mi></msub><mo>|</mo><mi>
   a</mi><mo>,</mo><mi>b</mi></mrow><mo>)</mo></mrow><mo>Γ</mo><mrow><mo>(
   </mo><mrow><msub><mi>d</mi><mi>i</mi></msub><mo>|</mo><mi>a</mi><mo>,</
   mo><mi>b</mi></mrow><mo>)</mo></mrow><mo>Γ</mo><mrow><mo>(</mo><mrow><m
   sub><mi>v</mi><mi>i</mi></msub><mo>|</mo><mi>a</mi><mo>,</mo><mi>b</mi>
   </mrow><mo>)</mo></mrow></mrow></mstyle></mrow> :MATH]

   Where
   [MATH: <mrow><mover
   accent="true"><mrow><msubsup><mi>μ</mi><mi>m</mi><mi>r</mi></msubsup><m
   row><mo>(</mo><mi>t</mi><mo>)</mo></mrow></mrow><mo
   stretchy="true">¯</mo></mover><mo>=</mo><msubsup><mi>m</mi><mi>i</mi><m
   i>r</mi></msubsup><mrow><mo>(</mo><mi>t</mi><mo>)</mo></mrow><mo>−</mo>
   <mover
   accent="true"><mrow><msub><mi>μ</mi><mi>i</mi></msub><mrow><mo>(</mo><m
   i>t</mi><mo>)</mo></mrow></mrow><mo
   stretchy="true">¯</mo></mover></mrow> :MATH]
   and
   [MATH: <mrow><mover
   accent="true"><mrow><msub><mi>μ</mi><mi>i</mi></msub><mrow><mo>(</mo><m
   i>t</mi><mo>)</mo></mrow></mrow><mo
   stretchy="true">¯</mo></mover><mo>=</mo><msup><mrow><mo
   stretchy="false">(</mo><msubsup><mo>Σ</mo><mn>0</mn><mrow><mo>−</mo><mn
   >1</mn></mrow></msubsup><mo>+</mo><msub><mi>N</mi><mi>r</mi></msub><msu
   bsup><mo>Σ</mo><mtext>i</mtext><mrow><mo>−</mo><mn>1</mn></mrow></msubs
   up><mo
   stretchy="false">)</mo></mrow><mrow><mo>−</mo><mn>1</mn></mrow></msup><
   mtext> </mtext><msub><mi>N</mi><mi>r</mi></msub><msubsup><mo>Σ</mo><mte
   xt>i</mtext><mrow><mo>−</mo><mn>1</mn></mrow></msubsup><mtext> </mtext>
   <msub><mi>m</mi><mi>a</mi></msub></mrow> :MATH]
   (1)

   We used active set algorithm implemented in MATLAB’s fmincon function
   (see
   [70]http://uk.mathworks.com/help/optim/ug/constrained-nonlinear-optimiz
   ation-algorithms.html for details) to find the values of δ[i], ψ[i],
   d[i], v[i], and
   [MATH: <mrow><msubsup><mi>σ</mi><mi>i</mi><mn>2</mn></msubsup></mrow>
   :MATH]
   which maximizes the above posterior probability. The likelihood of an
   observed mRNA expression profile was calculated by replacing the
   maximum a posteriori estimates of δ[i], ψ[i] in [71]Eq 1, replacing
   μ[i](t) in the likelihood function (see [72]Eq 2 in the “[73]Results”
   section) by
   [MATH: <mrow><mover
   accent="true"><mrow><msub><mi>μ</mi><mi>i</mi></msub><mrow><mo>(</mo><m
   i>t</mi><mo>)</mo></mrow></mrow><mo
   stretchy="true">¯</mo></mover></mrow> :MATH]
   in [74]Eq 1, replacing d[i], v[i], and
   [MATH: <mrow><msubsup><mi>σ</mi><mi>i</mi><mn>2</mn></msubsup></mrow>
   :MATH]
   in [75]Eq 2 by their maximum a posteriori values.

2.3 Simulating time course gene expression profiles

   The temporal profiles m[ij](t) of the i^th gene in the j^th condition
   were generated by sampling a hierarchical GP with mean and covariance
   matrix μ[ij](t), Σ[ij] respectively. The covariance matrix (Σ[ij]) has
   two components
   [MATH:
   <mrow><msubsup><mo>Σ</mo><mrow><mi>i</mi><mi>j</mi></mrow><mrow><mi>t</
   mi><mo>,</mo><msup><mi>t</mi><mo>′</mo></msup></mrow></msubsup></mrow>
   :MATH]
   ,
   [MATH:
   <mrow><msubsup><mo>Σ</mo><mrow><mi>i</mi><mi>j</mi></mrow><mi>N</mi></m
   subsup></mrow> :MATH]
   to account for systematic and random variabilities, respectively. For
   simulation, we used a Gaussian kernel function (
   [MATH:
   <mrow><msubsup><mo>Σ</mo><mrow><mi>i</mi><mi>j</mi></mrow><mrow><mi>t</
   mi><mo>,</mo><msup><mi>t</mi><mo>′</mo></msup></mrow></msubsup><mo>=</m
   o><mtext>exp</mtext><mo>−</mo><mfrac><mrow><msup><mrow><mrow><mo>(</mo>
   <mrow><mi>t</mi><mo>−</mo><msup><mi>t</mi><mo>′</mo></msup></mrow><mo>)
   </mo></mrow></mrow><mn>2</mn></msup></mrow><mrow><msub><mi>l</mi><mrow>
   <mi>i</mi><mi>j</mi></mrow></msub></mrow></mfrac></mrow> :MATH]
   ) to model the systematic variabilities
   [MATH:
   <mrow><mrow><mo>(</mo><mrow><msubsup><mo>Σ</mo><mi>i</mi><mrow><mi>t</m
   i><mo>,</mo><msup><mi>t</mi><mo>′</mo></msup></mrow></msubsup></mrow><m
   o>)</mo></mrow></mrow> :MATH]
   where l[ij] is the smoothing parameter. The random noise is Gaussian
   with 0 mean and variance
   [MATH:
   <mrow><msubsup><mi>σ</mi><mrow><mi>i</mi><mi>j</mi></mrow><mn>2</mn></m
   subsup></mrow> :MATH]
   , i.e.
   [MATH:
   <mrow><msubsup><mo>Σ</mo><mrow><mi>i</mi><mi>j</mi></mrow><mi>N</mi></m
   subsup><mo>=</mo><msubsup><mi>σ</mi><mrow><mi>i</mi><mi>j</mi></mrow><m
   n>2</mn></msubsup><mi>I</mi></mrow> :MATH]
   , I is an identity matrix. The smoothing parameter l[ij] was sampled
   from a gamma distribution with shape and scale parameters a[ij] and
   b[ij], respectively. The mean function μ[ij](t) itself was randomly
   generated by sampling a GP with mean 0 and covariance matrix
   [MATH:
   <mrow><mo> </mo><msubsup><mo>Σ</mo><mrow><mi>i</mi><mi>j</mi></mrow><mr
   ow><mi>t</mi><mo>,</mo><msup><mi>t</mi><mo>′</mo></msup></mrow></msubsu
   p></mrow> :MATH]
   . If a gene (i) is not differentially expressed between two conditions
   (j = 1, 2), then μ[i1](t) = μ[i2](t) and a[i1] = a[i2] = 15, b[ij] = 1.
   In the opposite case, i.e. when a gene (i) is differentially expressed
   between two conditions (j = 1, 2), we assumed μ[i1](t) ≠ μ[i2](t) and
   these were generated by sampling two corresponding GPs. The smoothing
   parameters l[i1],l[i2] were generated by sampling Gamma distributions
   with the parameters a[i1] = 15, b[i2] = 1; a[i2] = 25, b[i2] = 1
   respectively.

2.4 Gene expression data

   The gene expression dataset, which has been described previously, was
   obtained from GSK in the form of raw data files (.cel files) [[76]12].
   Hegde et al., treated four cell lines (BT-474, SKBR-3, T47D and
   MDA-MB-468) with 0.1% DMSO (control), 0.1 μM lapatinib and 1 μM
   lapatinib at 0, 2, 6, 12, 24 hours. Not all samples that were taken had
   data for all time-points (see [77]Table 1). All measurements were
   performed using the U133A Affymetrix human 22,000-element microarray.
   There were 36 arrays for each cell line, so a total of 144 arrays were
   used in our analysis. We performed a probe centric analysis, i.e. the
   intensity values of each Affymetrix probe was modelled using the
   aforementioned GP model. Once a probe was identified as a potential
   marker, the corresponding gene was identified using BioMart
   ([78]http://www.biomart.org/) mapping service.

Table 1. Treatment timepoints available from the Hedge et al.’s dataset.

   Cell lines 0.1 μM lapatinib   1 μM lapatinib    0.1% DMSO
   SKBR-3     6, 12 and 24 hours 6 and 12 hours    0, 2, 6, 12, 24 hours
   BT-474     2, 6, 12, 24 hours
   T-47D                         6, 12 and 24 hours
   MDA-MB-468
   [79]Open in a new tab

2.5 Cell-lines and reagents

   SKBR-3, BT-474, MDA-MB-453, HCC 1954, MDA-MB-468, and T47D were
   obtained from ATCC and the JIMT-1 cells were obtained from the German
   Tissue Repository DSMZ. HCC 1954-L was developed in-house through
   continuous exposure to lapatinib [[80]26]. BC cell lines were
   maintained in RPMI 1640 medium supplemented with 10% fetal bovine serum
   (Gibco). HCC 1954-L were maintained in culture in 1μM lapatinib,
   however, cells were removed from drug 7 days prior to all assays. All
   cell lines were maintained at 37°C in a 5% CO[2] incubator.

   Calcitriol and lapatinib were obtained from Sequoia Research Products
   (Pangbourne UK). Cell culture media, DMSO, ethanol were obtained from
   Sigma-Aldrich (Dublin, Ireland).

   Lapatinib stock solution (10 mM) was prepared in DMSO. Calcitriol stock
   solution (10 mM) was prepared in ethanol and stored at -20°C.

2.6 Lapatinib treatment and RNA extraction

   Triplicate RNA samples were prepared from SKBR-3, BT-474, MDA-MB-468,
   T47D, MDA-MB-453 and JIMT-1. The cell lines were grown to approximately
   75% confluency. Test samples were treated with 1 μM lapatinib for 16
   hours. Control samples were treated with equivalent concentrations of
   DMSO. After the 16 hour incubation, RNA was extracted from the control
   and treated samples using the RNeasy mini Kit (Qiagen, Hilden, Germany)
   according to the manufacturer’s protocol and were treated with
   RNase-free DNase (Qiagen).

   RNA was isolated from MDA-MB-453 and JIMT-1 cells using TriReagent
   (Sigma, Ireland). The RNA was extracted using a liquid-liquid
   extraction procedure with chloroform (0.2 mL) and iso-propanol. The RNA
   was washed with 75% ethanol and resuspended in 40 μL RNase free water.

   cDNA was prepared from 2 μg of total RNA using a high capacity RNA to
   cDNA kit (Applied Biosystems, Foster City, CA, USA).

2.7 Taqman RT PCR

   Taqman Array 96 well fast plates (Applied Biosystems, Dublin Ireland)
   were pre-coated with primers for the specific genes for validation
   (ALDH3A2, AMD1, ANKRD10, ATP2B4, COPS3, EIF2S2, EIF4B, GAPDH, GTPBP4,
   GUSB, IGBP1, ING4, KLHDC2, KLHL24, MPZL1, NUDC, PPIC, PPP2CA, PTP4A1,
   PTP4A2, RB1CC1, RNF13, SOCS3, SPAG1, TFAP2C, TIMM17A, TJP2, VDR, WIPI2,
   ZDHHC17).

   40 ng of cDNA and 5 μL of Taqman Fast Universal Master Mix (2x), no
   AmpErase UNG (Applied Biosystems) were dispensed into each well.

   The following thermal cycling specifications were performed on the ABI
   7900 Fast Real-Time PCR system (Applied Biosystems, Foster City, CA,
   USA); 20 s at 95°C and 40 cycles each for 3 s at 95°C and 30 s at 60°C.
   Expression values were calculated using the comparative cycle threshold
   (Ct) method [[81]27]. 18S ribosomal RNA was selected as the endogenous
   control. The cycle threshold (Ct) indicates the cycle number by which
   the amount of amplified target reaches a fixed threshold. The Ct data
   for 18S was used to create ΔCt values [ΔCt = Ct (target gene)-Ct
   (18S)]. ΔΔCt values were calculated by subtracting ΔCt of the
   calibrator (DMSO controls) from the ΔCt value of each target. Relative
   quantification (RQ) values were calculated using the equation 2-ΔΔCt.
   Differences in the mRNA expression level between untreated and drug
   treated samples were assessed using the Students t test.

   A VDR gene expression assays (Hs01045840_m1) was performed with
   MDA-MB-453 and JIMT-1 cDNA (5 ng/well), using 18S as (Hs99999901_S2) as
   endogenous control.

2.8 Proliferation assay in vitro

   Proliferation was measured using an acid phosphatase assay [[82]28].
   2.5 to 5 x 10^3 cells per well were seeded in 96 plates and incubated
   for 24 hours prior to the addition of drug. After 5 days of drug
   treatment cells were washed with PBS. 10 mM paranitrophenol phosphate
   substrate (Sigma-Aldrich) in 0.1 M sodium acetate buffer with 0.1%
   Triton X (Sigma Aldrich) was added to each well and incubated at 37°C
   for 2 hours. The reaction as stopped with 50 μL of 1 M NaOH and the
   absorbance was read at 405 nM (reference—620 nM). Growth of drug
   treated cells was calculated relative to control untreated cells in
   biological triplicate.

3 Results

3.1 Comparing time course gene expression profiles using GP

   Time course gene expression data consists of noisy measurements of mRNA
   levels at different time points. As discussed before, these
   measurements contain both systematic and random variabilities. It is
   necessary to take both types of variabilities into account when
   comparing two sets of time course measurements of mRNA expression. We
   developed an empirical GP model of time course mRNA profiles that
   captures (a) average mRNA levels at different points in time, (b) the
   extent of random uncertainty caused by cell to cell variability and
   experimental measurement noise; and (c) the temporal variability in
   mRNA profiles caused by internal mechanisms of gene regulation. In our
   formulation, a time course mRNA profile (m[i](t)) is assumed to have a
   Gaussian distribution with a mean (μ[i](t)) which represents the
   average mRNA levels at different time points (t), and a covariance
   matrix (Σ[i]) which captures the overall variability in the mRNA
   profile. The covariance matrix (Σ[i]) has two components (
   [MATH:
   <mrow><msub><mo>Σ</mo><mi>i</mi></msub><mo>=</mo><msubsup><mo>Σ</mo><mi
   >i</mi><mrow><mi>t</mi><mo>,</mo><msup><mi>t</mi><mo>′</mo></msup></mro
   w></msubsup><mo>+</mo><msubsup><mo>Σ</mo><mi>i</mi><mi>N</mi></msubsup>
   </mrow> :MATH]
   ), one (
   [MATH:
   <mrow><msubsup><mo>Σ</mo><mi>i</mi><mrow><mi>t</mi><mo>,</mo><msup><mi>
   t</mi><mo>′</mo></msup></mrow></msubsup></mrow> :MATH]
   ) accounts for systematic variability between mRNA expressions at any
   two instants of time (t, t') and the other (
   [MATH: <mrow><msubsup><mo>Σ</mo><mi>i</mi><mi>N</mi></msubsup></mrow>
   :MATH]
   ) accounts for random noise. We used a modified Ornstein–Uhlenbeck
   function (
   [MATH:
   <mrow><msubsup><mo>Σ</mo><mi>i</mi><mrow><mi>t</mi><mo>,</mo><msup><mi>
   t</mi><mo>′</mo></msup></mrow></msubsup><mo>=</mo><msub><mi>v</mi><mi>i
   </mi></msub><mtext>exp</mtext><mrow><mo>(</mo><mrow><mo>−</mo><mfrac><m
   row><mrow><mo>|</mo><mrow><mi>t</mi><mo>−</mo><msup><mi>t</mi><mo>′</mo
   ></msup></mrow><mo>|</mo></mrow></mrow><mrow><msub><mi>d</mi><mi>i</mi>
   </msub></mrow></mfrac></mrow><mo>)</mo></mrow></mrow> :MATH]
   ) [[83]24, [84]25, [85]29], to model the systematic variabilities
   [MATH:
   <mrow><mrow><mo>(</mo><mrow><msubsup><mo>Σ</mo><mi>i</mi><mrow><mi>t</m
   i><mo>,</mo><msup><mi>t</mi><mo>′</mo></msup></mrow></msubsup></mrow><m
   o>)</mo></mrow></mrow> :MATH]
   in mRNA expressions between two instants in time (t, t'), where v[i]
   and d[i] are two parameters [[86]30]. The random noise is assumed to be
   Gaussian with 0 mean and variance
   [MATH: <mrow><msubsup><mi>σ</mi><mi>i</mi><mn>2</mn></msubsup></mrow>
   :MATH]
   (i.e.
   [MATH:
   <mrow><msubsup><mo>Σ</mo><mi>i</mi><mi>N</mi></msubsup><mo>=</mo><msubs
   up><mi>σ</mi><mi>i</mi><mn>2</mn></msubsup><mi>I</mi></mrow> :MATH]
   , I is an identity matrix). Under these assumptions, the likelihood of
   a set of observed mRNA profiles (
   [MATH:
   <mrow><msubsup><mi>m</mi><mi>i</mi><mi>r</mi></msubsup><mrow><mo>(</mo>
   <mi>t</mi><mo>)</mo></mrow></mrow> :MATH]
   ) is given by
   [MATH:
   <mrow><mi>p</mi><mrow><mo>(</mo><mrow><msubsup><mi>m</mi><mi>i</mi><mi>
   r</mi></msubsup><mrow><mo>(</mo><mi>t</mi><mo>)</mo></mrow><mo>|</mo><m
   sub><mi>μ</mi><mi>i</mi></msub><mrow><mo>(</mo><mi>t</mi><mo>)</mo></mr
   ow><mo>,</mo><msub><mo>Σ</mo><mi>i</mi></msub></mrow><mo>)</mo></mrow><
   mo>=</mo><mfrac><mn>1</mn><mrow><msup><mrow><mrow><mo>(</mo><mrow><mn>2
   </mn><mi>π</mi></mrow><mo>)</mo></mrow></mrow><mrow><mfrac><mrow><msub>
   <mi>N</mi><mi>r</mi></msub><mo>*</mo><msub><mi>N</mi><mi>T</mi></msub><
   /mrow><mn>2</mn></mfrac></mrow></msup><mo> </mo><msup><mrow><mrow><mo>|
   </mo><mrow><msub><mo>Σ</mo><mi>i</mi></msub></mrow><mo>|</mo></mrow></m
   row><mrow><mfrac><mrow><msub><mi>N</mi><mi>r</mi></msub></mrow><mn>2</m
   n></mfrac></mrow></msup></mrow></mfrac><mstyle
   displaystyle="true"><msubsup><mo>∏</mo><mrow><mi>r</mi><mo>=</mo><mn>1<
   /mn></mrow><mrow><msub><mi>N</mi><mi>r</mi></msub></mrow></msubsup><mro
   w><mtext>exp</mtext><mrow><mo>(</mo><mrow><mo>−</mo><mfrac><mrow><msubs
   up><mi>μ</mi><mi>m</mi><mi>r</mi></msubsup><msup><mrow><mrow><mo>(</mo>
   <mi>t</mi><mo>)</mo></mrow></mrow><mi>T</mi></msup><msubsup><mo>Σ</mo><
   mi>i</mi><mrow><mo>−</mo><mn>1</mn></mrow></msubsup><msubsup><mi>μ</mi>
   <mi>m</mi><mi>r</mi></msubsup><mrow><mo>(</mo><mi>t</mi><mo>)</mo></mro
   w></mrow><mn>2</mn></mfrac></mrow><mo>)</mo></mrow></mrow></mstyle></mr
   ow> :MATH]
   (2)

   where N[r] is the number of replicates and
   [MATH:
   <mrow><msubsup><mi>μ</mi><mi>m</mi><mi>r</mi></msubsup><mrow><mo>(</mo>
   <mi>t</mi><mo>)</mo></mrow><mo>=</mo><msubsup><mi>m</mi><mi>i</mi><mi>r
   </mi></msubsup><mrow><mo>(</mo><mi>t</mi><mo>)</mo></mrow><mo>−</mo><ms
   ub><mi>μ</mi><mi>i</mi></msub><mrow><mo>(</mo><mi>t</mi><mo>)</mo></mro
   w></mrow> :MATH]
   . μ[i](t) and Σ[i] are estimated from observed data as shown in
   Materials and Methods ([87]M&M).

   We designed a statistical hypothesis test to see whether two sets of (
   [MATH:
   <mrow><msubsup><mi>m</mi><mi>i</mi><mrow><msub><mi>c</mi><mn>1</mn></ms
   ub></mrow></msubsup><mrow><mo>(</mo><mi>t</mi><mo>)</mo></mrow><mo>,</m
   o><msubsup><mi>m</mi><mi>i</mi><mrow><msub><mi>c</mi><mn>2</mn></msub><
   /mrow></msubsup><mrow><mo>(</mo><mi>t</mi><mo>)</mo></mrow></mrow>
   :MATH]
   ) expression profiles of an mRNA, measured under two experimental
   conditions (c[1], c[2]), are significantly different from each other.
   In this test we compared the likelihoods ([88]Eq 2) of two hypotheses:
   (a) both sets of mRNA profiles (
   [MATH:
   <mrow><msubsup><mi>m</mi><mi>i</mi><mrow><msub><mi>c</mi><mn>1</mn></ms
   ub></mrow></msubsup><mrow><mo>(</mo><mi>t</mi><mo>)</mo></mrow><mo>,</m
   o><msubsup><mi>m</mi><mi>i</mi><mrow><msub><mi>c</mi><mn>2</mn></msub><
   /mrow></msubsup><mrow><mo>(</mo><mi>t</mi><mo>)</mo></mrow></mrow>
   :MATH]
   ) have the same GP model (
   [MATH:
   <mrow><msubsup><mi>m</mi><mi>i</mi><mrow><msub><mi>c</mi><mn>1</mn></ms
   ub></mrow></msubsup><mrow><mo>(</mo><mi>t</mi><mo>)</mo></mrow><mo>,</m
   o><msubsup><mi>m</mi><mi>i</mi><mrow><msub><mi>c</mi><mn>2</mn></msub><
   /mrow></msubsup><mrow><mo>(</mo><mi>t</mi><mo>)</mo></mrow><mo>~</mo><m
   i>G</mi><mi>P</mi><mrow><mo>(</mo><mrow><msub><mi>μ</mi><mi>i</mi></msu
   b><mrow><mo>(</mo><mi>t</mi><mo>)</mo></mrow><mo>,</mo><msub><mo>Σ</mo>
   <mi>i</mi></msub></mrow><mo>)</mo></mrow></mrow> :MATH]
   ); and (b) they come from two different sets of GP models (
   [MATH:
   <mrow><msubsup><mi>m</mi><mi>i</mi><mrow><msub><mi>c</mi><mn>1</mn></ms
   ub></mrow></msubsup><mrow><mo>(</mo><mi>t</mi><mo>)</mo></mrow><mo>~</m
   o><mi>G</mi><mi>P</mi><mrow><mo>(</mo><mrow><msubsup><mi>μ</mi><mi>i</m
   i><mrow><msub><mi>c</mi><mn>1</mn></msub></mrow></msubsup><mrow><mo>(</
   mo><mi>t</mi><mo>)</mo></mrow><mo>,</mo><msubsup><mo>Σ</mo><mi>i</mi><m
   row><msub><mi>c</mi><mn>1</mn></msub></mrow></msubsup></mrow><mo>)</mo>
   </mrow><mo>,</mo><msubsup><mi>m</mi><mi>i</mi><mrow><msub><mi>c</mi><mn
   >2</mn></msub></mrow></msubsup><mrow><mo>(</mo><mi>t</mi><mo>)</mo></mr
   ow><mo>~</mo><mi>G</mi><mi>P</mi><mrow><mo>(</mo><mrow><msubsup><mi>μ</
   mi><mi>i</mi><mrow><msub><mi>c</mi><mn>2</mn></msub></mrow></msubsup><m
   row><mo>(</mo><mi>t</mi><mo>)</mo></mrow><mo>,</mo><msubsup><mo>Σ</mo><
   mi>i</mi><mrow><msub><mi>c</mi><mn>2</mn></msub></mrow></msubsup></mrow
   ><mo>)</mo></mrow></mrow> :MATH]
   )). The comparisons were performed using a likelihood ratio test. The
   test was repeated for each mRNA in the dataset, producing a set of
   p-values. Subsequently, the Benjamini-Hochberg method was used to
   correct for multiple testing, and differentially expressed mRNAs were
   selected within a 5% false discovery rate (FDR). These mRNAs are the
   ones whose temporal expression patterns are significantly different
   between the two experimental conditions under investigation. An
   implementation of the above pipeline, named GEAGP, in MATLAB is freely
   available for download from [89]https://github.com/SBIUCD/GEAGP.git.

3.2 Comparing the performance of GEAGP with other methods

   To evaluate and compare the performance of the GEAGP framework with
   other methods we generated 100 simulated datasets. Each simulated
   dataset consists of the expression profiles of 1000 genes measured at
   10 equal intervals over a period of 100 hours in two different
   conditions. Four replicates were generated for each expression profile.
   Approximately half of the genes in each dataset were randomly chosen to
   have differential expression patterns between the two conditions. For
   the simulation, we assumed that the genes which are not differentially
   expressed have the same underlying pattern (average expression at each
   time point) with the same level of added systematic and random
   variabilities in each replicate across the two conditions. On the other
   hand, the genes which are differentially expressed were assumed to have
   different underlying patterns with different levels of systematic
   variabilities across the two conditions. The random variability, which
   represents biological variability and measurement errors, was kept at
   the same level in both conditions. Details of the data simulation
   process are described in the methods section. Random noise was kept at
   a moderate level (standard deviation = 0.2).

   Then, we used four different methods to find differentially expressed
   genes in each simulated dataset. Two of these methods, ttest & ranksum
   test, are widely used to analyse static gene expression data. The other
   two methods, GEAGP and GPREGE [[90]17], use GP to analyse temporal
   expression profiles. The main differences between GEAGP and GPREGE
   [[91]17] are the following: GEAGP directly estimates the mean functions
   of the GP from data, whereas GPREGE assumes that the mean function is a
   linear combination of a set of basis functions, and attempts to infer
   the coefficients of these linear equations. The two methods also use
   different mathematical formulations for parameter estimation and
   optimization process (see [92]M&M and [[93]17] for details). Prediction
   accuracy of each method was evaluated by calculating the area under the
   receiver operating characteristic (AUROC) curve (for details see
   [[94]31]) based on the results produced by these methods on each
   dataset. AUROCs can have values between 0 and 1, and the closer these
   values are to 1 the better is the accuracy of the inferred networks,
   with AUROC = 1 being the ideal case. The average AUROCs across all 100
   datasets represents the overall prediction accuracy of a method and the
   corresponding standard deviation represents the confidence intervals
   surrounding these estimates. The results of the above analysis suggest
   that GEAGP and GPREGE have comparable accuracies (AUROCs = 0.946 ±
   0.0074 and 0.952 ± 0.0046 respectively), whereas both t-test and
   ranksum test were less accurate than the GP based methods (average
   AUROCs = 0.7538± 0.0087 and 0.7432 ± 0.0082 respectively). Although the
   accuracies of GPREGE and GEAGP are comparable, GEAGP has the distinct
   advantage of speed. In the above dataset GEAGP and GPREGE took on an
   average ≈ 0.5 and ≈ 7.5 seconds, respectively, to execute each
   differential expression test on a Intel Core i7 (3rd Gen) 3720QM based
   laptop with 20Gb RAM, i.e. GEAGP is on an average about 15 times faster
   than the GPREGE. To illustrate what it means for a medium scale
   bioinformatic analysis, consider a dataset consisting 20,000 genes and
   10 experimental conditions. To find differentially expressed genes
   between each pair of conditions one needs to perform
   [MATH:
   <mrow><mrow><mo>(</mo><mrow><mtable><mtr><mtd><mrow><mn>10</mn></mrow><
   /mtd></mtr><mtr><mtd><mn>2</mn></mtd></mtr></mtable></mrow><mo>)</mo></
   mrow><mo>=</mo><mn>45</mn></mrow> :MATH]
   differential expression tests for each gene. Using a single core of the
   Intel Core i7 (3rd Gen) 3720QM processor, GEAGP will take approximately
   5 days to do this analysis, whereas GPREGE will take more than 78 days.
   Using all four cores of the processor GEAGP will finish the task in
   about 1 days and 8 hours, whereas GPREGE will take more than 19 days.
   Thus, GEAGP provides similar accuracy as GPREGE at a fraction of the
   computational cost, making it more suitable for the analysis of medium
   to large scale time course data sets. Encouraged by the above results,
   we used GEAGP to analyse real gene expression datasets.

3.3 Discovering markers of lapatinib insensitivity in BC cells using GEAGP

   GEAGP was used to analyse time course mRNA profiles in BC cell lines
   which were treated with different doses of lapatinib and DMSO [[95]12].
   In this experiment the mRNA expression profiles were measured in two
   lapatinib responsive (BT-474, SKBR-3) and two lapatinib non-responsive
   (MDA-MB-468, T47D) BC cell lines. HER2 and EGFR, the two receptor
   kinases targeted by lapatinib, have different expression patterns in
   these cell lines ([96]Table 2).

Table 2. Histological-subtypes, HER2 and EGFR receptor expressions and
mutation status of SKBR-3, BT-474, T47D and MDA-MB-468 cell lines.

   Protein Expression Gene Mutation Status
   Cell line Subtype EGFR HER2 KRAS BRAF PI3K RB1 p53 PTEN ERBB2
   SKBR-3 HER2 +ve - - R175H
   BT-474 HER2 +ve 38.65 97.99 K111N E285K AMPL
   T47D Luminal A 44.35 77.55 H1047R L194F
   MDA-MB-468 Basal Like 99.66 39.05 DEL R273H E545A
   [97]Open in a new tab

   The percentile scores indicate the percentage of cell lines in GDSC
   database which had similar or lower expressions of the same genes. The
   first column contains cell-line name. Second column contains
   histological subtypes, third and fourth columns contain expression of
   EGFR and HER2 receptors. The expressions are given in percentiles, i.e.
   percent of cell-lines in the GDSC database which have similar or lower
   expressions of HER2 or EGFR than the cell line corresponding to the
   percentile value. Fifth till eleventh columns contain mutation status.
   Empty cells represent no mutation, DEL represents deletion, AMPL
   represents amplification, and the remaining non-empty cells contains
   amino acid substitution information.

   HER2 is over-expressed in both lapatinib responsive cell lines SKBR-3
   and BT-474 [[98]32]. EGFR is overexpressed in SKBR3 [[99]33], but has
   relatively low expression in BT-474 (~62% of 1018 cancer cell lines in
   the GDSC database had higher EGFR expression than BT-474) [[100]34].
   The non-responsive cell lines, T47D and MDA-MB-468 are classified as
   luminal A and basal-like BC cells in literature [[101]32]. Both types
   of BC cells have low HER2 expression. However, T47D has high HER2
   expression (more than 77.55% of GDSC cell lines [[102]34]) and moderate
   low EGFR expression (lower than ~56% of GDSC cell lines [[103]34]).
   MDA-MB-468 cells have low HER2 expression (lower than ~61% of GDSC cell
   lines [[104]34])) and very high level of EGFR expression (higher than
   ~99.6% of all GDSC cell lines [[105]34]). We also looked at the
   mutation spectrum of these cell lines using the COSMIC database
   (cancer.sanger.ac.uk). Data on some of the mutations which were
   previously implicated in lapatinib resistance are shown in [106]Table
   2. Among the responsive cell lines, SKBR3 has a p53 mutation, whereas,
   BT-474 has PI3K, ERBB2 and p53 mutations. Among the non-responsive cell
   lines, T47D has a p53 and PI3KCA mutation, whereas MDA-MB-468 has p53,
   PTEN and RB1 mutations. None of the cell lines have KRAS or BRAF
   mutation. Therefore, there is no particular set of mutations that
   separate the above responsive and non-responsive cell lines. This
   suggest that the expressions of EGFR and HER2 receptors and/or the
   mutation statuses do not explain the responsiveness of the above cell
   lines to lapatinib treatment. Hence, these cell lines are reasonable
   choices for studying lapatinib response in BC cells.

   The cells were treated with 0.1 μM DMSO, 0.1 μM or 1 μM
   lapatinib[[107]12]. The measurements were taken at 0, 2, 6, 12 and 24
   hours for DMSO treated cells; at 2, 6, 12, 24 hours for cells treated
   with low doses (0.1μM) of lapatinib (except for SKBR-3 cells for which
   no measurements were available at 2 hours); and at 2, 6 and 12 hours
   for cells treated with high doses (1.0 μM) of lapatinib (except for
   BT-474 and SKBR-3 cell lines for which measurements were not available
   at 2 hours). Each experiment was replicated 3–4 times. Missing
   replicates (where only 3 replicates were available instead of 4) were
   imputed based on the intensities observed in the other replicate
   experiments (see [108]M&M for details). The resulting dataset was then
   used to identify potential markers of lapatinib insensitivity in BC
   cells.

   We assumed that genes which satisfy all of the following criteria are
   potential markers for lapatinib insensitivity:
     * Have similar expression profiles within the responsive or
       non-responsive groups of cell lines, but are differentially
       expressed between these two groups.
     * Have altered expression in response to both doses of lapatinib
       treatment in responsive cells but not in non-responsive cells.

   To find genes which satisfy the above criteria we performed a two stage
   analysis. In the first stage each mRNA expression profile was subjected
   to the following hypothesis tests ([109]Fig 1):

Fig 1. A schematic diagram of the GP based analysis performed on the gene
expression dataset of [[110]12].

   [111]Fig 1
   [112]Open in a new tab
     * Responsive vs non-responsive groups (Test 1): The objective of this
       test was to identify genes which have differential expression
       patterns between DMSO treated responsive (SKBR-3, BT-474) and
       non-responsive cell lines (MDA468, T47D). Here we tested the
       hypothesis that a single GP model is sufficient to represent the
       mRNA expression profiles of a gene in all four cell lines when the
       cells are treated with DMSO, vs the alternative hypothesis that two
       GP models are needed, one for the responsive cell lines and another
       for the non-responsive cell lines.
     * Responds to 0.1μM of lapatinib in responsive cells (Test 2): This
       test aimed to identify genes which have differential expression
       patterns between DMSO treated and 0.1 μM lapatinib treated
       responsive cells (BT-474, SKBR-3). Here we tested the hypothesis
       that one GP model is sufficient to model the mRNA expression
       profiles of a gene in both DMSO and lapatinib (0.1 μM) treated
       responsive cell-lines (BT-474, SKBR-3) against the alternative
       hypothesis that two separate GP models are needed, one for DMSO
       treated cells and the other for lapatinib (0.1 μM) treated cells.
     * Responds to 1 μM of lapatinib in responsive cells (Test 3): The
       goal of this test was to identify genes which have differential
       expression patterns between DMSO and 1 μM lapatinib treated
       responsive cells (BT-474, SKBR-3). Here, we tested the hypothesis
       that the one GP model is sufficient to model the mRNA expression
       profiles in DMSO and lapatinib (1 μM) treated responsive cells
       (BT-474, SKBR-3) against the alternative hypothesis that two
       separate GP models are needed, one for DMSO treated cells and the
       other for lapatinib (0.1 μM) treated cells.
     * Responds to 0.1 μM of lapatinib in non-responsive cells (Test 4):
       This test identified genes whose expression is altered in response
       to low doses (0.1 μM) of lapatinib in non-responsive cells (MDA468
       and T47D). Here, we tested the hypothesis that one GP model is
       sufficient to model the mRNA expression profiles in DMSO and
       lapatinib (0.1 μM) treated non-responsive cells (MDA468 and T47D)
       against the hypothesis that two separate GP models are needed, one
       for DMSO treated cells and the other for lapatinib (0.1 μM) treated
       cells.
     * Responds to 1μM of lapatinib in non-responsive cells (Test 5): This
       test identified the genes whose expression in altered in response
       to high doses (1 μM) of lapatinib in non-responsive cells (MDA468
       and T47D). This tested the hypothesis that only one GP model is
       sufficient to model the mRNA expression profiles in DMSO and
       lapatinib (1 μM) treated non-responsive cells (MDA468 and T47D)
       against the hypothesis that two separate GP models are need, one
       for DMSO treated cells and the other for lapatinib (0.1 μM) treated
       cells.

   The genes which passed the first test and at least one of the second
   and third tests, but fail both of the last two tests within a 5% false
   discovery rate were assumed to satisfy the criteria of genes
   selectively regulated by lapatinib in responsive, but not in
   non-esponsive cells. This analysis selected 519 genes out of 22000
   genes as potential markers of lapatinib response (Table A in [113]S1
   File). In a second step, these genes were further analysed using
   established bioinformatics methods as described below.

3.4 Bioinformatics analysis of markers of lapatinib insensitivity in BC cells

   First, we performed gene ontology enrichment analysis on the 519
   selected genes using DAVID [[114]35, [115]36]. The enriched gene
   ontology terms were then summarized and visualized by REVIGO ([116]Fig
   2A) [[117]37]. The most enriched gene ontology terms fall into seven
   main categories, i.e. (i) cell cycle, (ii) apoptosis or programmed cell
   death, (iii) DNA damage, (iv) RNA splicing, (v) intracellular
   transport, (vi) signal transduction and (vii) chromosome assembly.
   These biological processes are usually found altered in cancer cells,
   and therefore it is not surprising to see genes which participate in
   these process responding differentially to lapatinib treatment in
   responsive and non-responsive BC cells. We also performed pathway
   enrichment analysis using DAVID [[118]35, [119]36] and employing KEGG
   [[120]38] and PANTHER pathway databases [[121]39]. Several signalling
   pathways that play key roles in cancer, e.g. EGFR pathway, FGF pathway,
   spliceosome, proteasome, FcγR mediated signalling, were also found to
   be enriched in the selected genes ([122]Fig 2B). FGF and FcγR share
   several components with EGFR pathway which is activated by the EGFR
   family of receptors, two of which are targets of lapatinib. Ten genes,
   RAC1, STAT1, YWHAZ, PPP2CB, MAP2K1, MAPK1, PPP2R5B, PLCG1, PPP2CA,
   STAT5B, belonging to the EGFR pathway were in the list of potential
   markers of lapatinib insensitivity.

Fig 2. Bioinformatics analysis of the selected genes.

   [123]Fig 2
   [124]Open in a new tab

   (A) Enriched gene ontology terms are summarized and visualized by
   REVIGO. Here, each circle represents a gene ontology term and the size
   of the circle represents the extent of enrichment. (B) Pathway
   enrichment analysis of the selected genes. Pathways are shown in X-axis
   and the enrichment scores are shown in Y-axis. (C) Transcriptional
   module found in the identified genes. (D) Transcriptional activity of
   VDR. (E) PPI network induced by the identified genes. (F,G) PPIs of VDR
   and EIF2S2.

   We then looked for known transcription regulatory interactions (TRIs)
   among the selected genes. Human genome wide known TRIs were retrieved
   from two sources, the ENCODE database containing ChIP (chromatin
   immunoprecipitation) on ChIP data [[125]40–[126]42] and the HTRIB
   database [[127]43] which contains experimentally validated TRIs. All
   TRIs, where both the transcription regulator and its target gene were
   identified as potential markers for lapatinib insensitivity in the
   aforementioned GEAGP analysis, were retrieved. The ensemble of these
   TRIs ([128]Fig 2C) represents part of the known human transcription
   regulatory network which is dysregulated in lapatinib non-responsive BC
   cells. Interestingly, this dysregulated network consists of a
   transcriptional module formed by eight transcription factors STAT1,
   CTCF, EP300, FOXA1, VDR, MAPK1, TFAP2C, E2F1 ([129]Fig 2C) which
   regulate each other and a large number of the genes selected by the
   GEAGP analysis. Among these transcription factors, STAT1, MAPK1, FOXA1,
   TFAP2C, E2F1 are associated with resistance or insensitivity to several
   anticancer therapies including tyrosine kinase inhibitors such as
   lapatinib in BC cells [[130]44–[131]51]. CTCF and EP300 also have been
   linked to BC. The putative tumour suppressor gene EP300 is frequently
   mutated in BC [[132]52, [133]53], while CTCF overexpression is
   correlated with resistance to apoptosis in BC [[134]52, [135]53].
   Polymorphism of the VDR (Vitamin D Receptor) was previously associated
   with BC [[136]54], but its expression has no known association either
   with BC or with lapatinib insensitivity. Since it was found to be part
   of the same transcriptional module whose constituents are associated
   with lapatinib insensitivity, VDR could be a potential target for
   modulating lapatinib response in BC cells which do not respond to
   lapatinib.

   We further investigated the protein-protein interaction (PPI) network
   formed by the protein products of the selected genes. The STRING
   database [[137]55] was used to find PPIs between the products of the
   selected genes ([138]Fig 2E). Interestingly, many of the above
   transcription factors (MAPK1, TFAP2C, EP300) act as large hubs in the
   PPI network. They were also found to interact with each other, not only
   at the transcriptional level as found in the TRI network, but also at
   the protein level ([139]Fig 2E–2G). In the PPI networks, VDR interacts
   with relatively few proteins ([140]Fig 2F). However, its interactors
   include STAT1, MAPK1 and PPP2CA, all of which have been associated with
   lapatinib response [[141]44–[142]46, [143]56]. This strengthens the
   case for VDR as a potential target for influencing lapatinib
   insensitivity. Another interesting protein is EIF2S2. It interacts with
   several large hubs such as MAP2K1, ELAVL1 and PAPOLA1 in the PPI
   network ([144]Fig 2G), and therefore may play important roles in
   insensitivity of BC cells to cancer drugs.

   The above analysis suggests that many of the genes which were
   identified by the GEAGP algorithm have association with BC and
   lapatinib insensitivity. Additionally, it also demonstrates that some
   of these genes play important role in the transcriptional and protein
   interaction networks that are dysregulated in lapatinib insensitive BC
   cells, and therefore can be potential targets for treating these cells.

3.5 Validating the markers of lapatinib insensitivity using GDSC data

   We analysed the GDSC database [[145]34] to see if the expression of the
   genes selected by the GEAGP algorithm correlates with responses of BC
   cells to HER2 and/or EGFR targeted therapies. Any such correlation will
   strengthen their potential as markers for BC cell response to HER2
   and/or EGFR targeted therapies. GDSC database [[146]34] contains basal
   gene expression and drug response data for fifty BC cell lines
   belonging to different histological subtypes and five drugs (lapatinib,
   erlotinib, EKB-569, afatinib and CP724714) that target HER2 and/or EGFR
   tyrosine kinases. Expression data was available for 458 of 519 genes
   selected by the GEAGP algorithm. We calculated pearson correlation
   between the mRNA expressions of each gene and the area under the dose
   response curves of each cell-line for each drug. mRNA expressions of
   120 genes out of 458 had statistically significant (p-value<0.05)
   correlation with the responses of BC cells to at least one of the five
   drugs (Table B in [147]S1 File). When corrected for false discovery
   rate (FDR), 32 of the 120 genes were selected at 10% FDR (Table B in
   [148]S1 File). Most of these genes’ expressions correlated with
   response to afatinib which, like lapatinib, targets HER2 and EGFR.
   There were relatively little correlation between the expressions of
   these genes and lapatinib response in GDSC dataset [[149]34]. This is
   most likely due to lack of lapatinib response data which is available
   for only 13 out of 50 BC cell lines. Nevertheless, the above analysis
   suggests that many of the genes that were identified as markers of
   lapatinib insensitivity in T47D and MDA-MB-468 cells, can indeed be
   generally used as markers of BC cell response to a variety of HER2
   and/or EGFR targeted therapies.

3.6 Experimental validation of selected genes

   It is not feasible to experimentally validate all 519 genes which were
   identified in the above analysis due to time and cost factors. To
   select a smaller subset for experimental validation, we first compared
   our list of potential lapatinib insensitivity markers to that
   determined in a previous study by O'Neill et al.[[150]57]. O’Neill et
   al. used multiple parallel t-tests to analyse part of the same
   microarray dataset used here for identifying potential lapatinib
   biomarkers. 25 genes were found to be common between our list and that
   of the O’Neill et al.‘s study. These common genes include
     * VDR, EIF2S2 and SOCS3 (regulator of STAT1) which were found to play
       important roles in the PPI and transcriptional networks in the
       bioinformatics analysis
     * ANKRD10, PTP4A1, PTP4A2, VDR, ATP2B4 which had statistically
       significant correlation with BC cell responses to HER2 and/or EGFR
       targeted therapies

   Therefore, this list includes a mix of genes identified in the
   bioinformatics analysis, GDSC data [[151]34] analysis, and a set of new
   lapatinib insensitivity markers. We further added two more genes to
   this list, phosphatase PPP2CA and transcription factor TFAP2C, both of
   which was identified to be important in the bioinformatics analysis and
   the later also has statistically significant correlation with BC cell
   responses to HER2 and/or EGFR targeted therapies (Table B in [152]S1
   File).

   The differential gene expression levels of the resulting 27 genes (see
   [153]Table 1) and an endogenous control gene (18S ribosomal RNA) were
   measured in mRNA from SKBR-3, BT-474, T47D and MDA-MB-468 cell lines
   treated with DMSO or lapatinib (1 μM) for 16 hours, using real time
   quantitative polymerase chain reaction (qRT-PCR). The data obtained
   from the above experiment were then statistically analysed to examine
   whether expression of these genes was differentially altered in
   response to lapatinib in responsive and non-responsive cell lines.
   Relative expression values were calculated using the comparative
   threshold cycle (C[T]) method [[154]27]. First, we subtracted the cycle
   threshold (C[T]) values of the housekeeping gene (18S) from those of
   the selected genes [[155]58]. The resulting values (ΔC[T]) represent
   the responses of each gene to DMSO/ lapatinib and were analysed using
   ANCOVA (Analysis of Covariance) analysis [[156]59]. For the ANCOVA
   analysis, the ΔC[T] values of each gene were used as response
   variables, the dose of lapatinib was used as predictor variable (1 μM
   lapatinib for treated cell, 0 μM for DMSO treated control cells), and
   the responsive or non-responsive (to lapatinib) status of the cell
   lines were used as covariates. The p-values calculated by the ANCOVA
   analysis were then subjected to correction for multiple testing. For
   this purpose, a false discovery rate (FDR) corresponding to each
   p-value was calculated using the Benjamini Hochberg method. Genes with
   FDR lower than 0.1 (representing 10% false discovery rate) were
   considered to pass the validation test. We used MATLAB functions
   aoctools and mafdr for the ANCOVA and FDR analysis, respectively. 16
   out of 28 genes passed the test and their changes (in terms of average
   log-fold change with respect to DMSO treated cells) to lapatinib
   treatment in all four cells lines are shown in [157]Fig 3.

Fig 3. Real time qPCR based validation of a set of selected potential
biomarkers.

   [158]Fig 3
   [159]Open in a new tab

   Each panel consists of the log fold changes (lapatinib treated vs
   untreated, shown in Y-axis) of the corresponding gene (shown in the
   header of each panel) in all four cell lines (X-axis). Plots are based
   on N = 3 biological replicates.

3.7 Identifying potential therapeutic targets for treating lapatinib
insensitive BC cells

   To find potential targets, we focussed on the genes which satisfy the
   following criteria:
     * identified by GEAGP as potential lapatinib insensitivity markers
     * have statistically significant correlation (at <10% FDR) with BC
       cell response to HER2 and/or EGFR targeted therapy.

   32 genes (Table B in [160]S1 File) satisfy these conditions. To further
   narrow down the list of potential targets we investigated whether the
   expressions of any of these genes correlate with BC patient survival.
   Ideally, we would like to analyse data from BC patients who was treated
   with HER2 and/or EGFR targeted therapies. But, due to lack of
   sufficient details on treatment history of patients, we analysed data
   from HER2 negative, and basal type BC patients who typically do not
   respond to lapatinib therapies [[161]60, [162]61]. In some cases, HER2
   positive patients do not respond to these types of therapies due to
   lack of EGFR expression or acquired resistance [[163]62]. Therefore, we
   also included HER2 positive patients in our survival data analysis. We
   used Kaplan-Meier plot to find association between expression of each
   gene in the list (Table B in [164]S1 File) and recurrence free
   survivals of HER2 negative, basal-like and HER2 positive patient
   cohorts. We used kmplot [[165]63] web tool which integrate data from
   several public sources to perform this analysis. 26, 21 and 23 genes
   (out of total 32) were associated at 10% FDR with the survival of HER2
   negative, basal and HER2 positive patient cohorts (Table C in [166]S1
   File). 14 genes, SDS, UCP2, KPNB1, REEP5, ALAS3, ZNF10, SDC1, GPAA1,
   RHOB, EXOSC4, FOXA1, VDR, GLANT7, MLPH, had significant association
   with the survival of all three patient cohorts. We did not find any
   reference to REEP5, ALAS3, ZNF10, EXOSC4 and GLANT7 in existing BC
   literature. Therefore association of these genes to BC, especially
   lapatinib insensitivity of BC cells, may be novel finds of this study.
   SDS, UCP2, KPNB1, SDC1, GPAA1, and MLPH had been sporadically studied
   in the context of BC [[167]64–[168]69]. But, to the best of our
   knowledge, their role in lapatinib insensitivity of BC cells are not
   known. VDR, FOXA1 and RHOB are well studied in the context of BC.
   However, to the best of our knowledge, their potential as a therapeutic
   target for treating lapatinib insensitive BC cells was not previously
   investigated. We selected VDR for further experimental testing.

3.8 Investigation of VDR as a therapeutic target

   Vitamin D Receptor (VDR), a member of the nuclear receptor family of
   proteins, acts as a receptor for active vitamin D as well as a
   DNA-binding transcription factor. In that respect, it is similar to
   more familiar cancer targets such as oestrogen receptor, androgen
   receptor and progesterone receptor [[169]70]. Patient survival analysis
   using data from kmplot database [[170]63] suggests that VDR expression
   does not correlate with recurrence free survival of BC patients in
   general ([171]Fig 4A). However, HER2 negative BC patients (typically do
   not respond to lapatinib) with high VDR expressions are more likely to
   survive longer than those with low VDR expressions ([172]Fig 4B).
   Interestingly, basal-like BC patients, most of whom have low or no HER2
   expression show the opposite trend, that is, patients with low VDR
   expression are more likely to survive longer than patients with high
   VDR expression ([173]Fig 4C). HER2 positive patients exhibit similar
   trend as the basal-like patients ([174]Fig 4D). Furthermore, using
   Regulome explorer ([175]http://explorer.cancerregulome.org/) we
   investigated whether VDR expression has any significant association
   with other genomic features of BC patients. It was found to have
   significant correlations with protein expression of Estrogen receptor
   and the expressions of twenty two miRNAs ([176]Fig 4E), most of which
   had been previously implicated in proliferation, apoptosis, senescence,
   resistance to drugs including HER2 targeted therapies [[177]71,
   [178]72]. This further strengthens VDR’s potential as therapeutic
   target for lapatinib insensitive BC patients. One of the most commonly
   used commercially available reagents for VDR targeted therapy is
   Calcitriol, the active hormonal form of Vitamin D[3] [[179]73]. It
   primarily functions by binding to and activating VDR leading to
   signalling and transcriptomic changes [[180]73]. However, the effect of
   calcitriol depends on dose, frequency and time lapsed following
   treatment. For instance, low levels of calcitriol have been shown to
   possess pro-tumorigenic effects, whereas high doses of calcitriol,
   especially when applied in pulses, were shown to have
   anti-proliferative and anti-tumorigenic effects in several types of
   cancer cells [[181]74–[182]76]. Although at the protein level
   calcitriol activates VDR signalling by binding to it, it was also shown
   to reduce the mRNA expression level of VDR [[183]77–[184]79]. Recently,
   combinations of calcitriol and antiestrogen or gefitinib (a tyrosine
   kinase inhibitor), respectively, were found to reduce proliferation and
   increase apoptosis of BC cells more effectively than any of these
   compounds alone [[185]80, [186]81].

Fig 4. Interrogation of VDR as potential therapeutic targets for sensitizing
lapatinib insensitive breast cancer cell lines to lapatinib treatment.

   [187]Fig 4
   [188]Open in a new tab

   (A-D) Association between recursion free survival of all, HER2
   negative, triple negative, HER2 positive breast cancer patients with
   VDR expression. (E) Association between VDR expression and the
   expressions of 22 miRNAs in the TCGA breast cancer patient cohort. (F)
   ΔΔC[T] values for Vitamin D Receptor in HER2 positive lapatinib
   insensitive cell lines (MDA-MB-453 and JIMT-1) showing changes between
   untreated cells and cells expose to lapatinib for 16 hours. (G-J)
   Combined treatment with lapatinib and calcitriol in MDA-MB-453, JIMT-1,
   HCC 1954 and HCC 1956-L cells respectively (* denotes p<0.05).Plots are
   based on N = 3 biological replicates.

   We investigated whether the combination of calcitriol and lapatinib can
   be used as an effective treatment strategy for lapatinib insensitive BC
   patients. We selected four cell lines for our experiment. Two of these,
   MDA-MB-453 and JIMT-1 are HER2 positive but innately insensitive to
   lapatinib. Whereas HCC 1954-L cell lines, which was developed by
   exposing HER2 positive HCC 1954 cells to lapatinib for a long period of
   time, has acquired resistance to lapatinib. The clinico-pathological
   subtypes and the expression levels of HER2, EGFR and VDR which are
   targets of lapatinib and calcitriol are shown in [189]Table 3.

Table 3. Clinico-pathological subtypes of and receptor expressions in
MDA-MB-453 and JIMT1 cell lines.

   Cell line Clinico pathological subtype HER2 (percentile) EGFR
   (percentile) VDR (percentile)
   MDA-MB-453 Triple negative[[190]82], HER2 positive [[191]83] 97.50
   79.03 76.18
   JIMT-1 HER2 positive 97.30 2.90 95.33
   HCC 1954 HER2 positive 99.21 76.20 87.92
   [192]Open in a new tab

   The receptor expression data were collected from the GDSC database
   [[193]34]. The expression data are shown in percentile, here X
   percentile means X percent of all cell lines in the GDSC database
   [[194]34] had equal or less expression than the above cell lines.

   Among the innately lapatinib insensitive cell lines,
   clinico-pathological subtype of MDA-MB-453 is fiercely debated in
   literature, while some suggest it is a triple negative BC cell with no
   or low HER2 expression [[195]82], others reports it to be HER2 positive
   with overexpressed HER2 [[196]83]. We compared HER2 expression in
   MDA-MB-453 cells with other cancer cell lines in GDSC database
   [[197]34] and found that HER2 has higher mRNA expression in MDA-MB-453
   than ~97.5% of all (1018) cell lines in this database. JIMT-1 is known
   to be HER2 positive [[198]84] which is corroborated by the GDSC data
   [[199]34] ([200]Table 3). While EGFR is over expressed in MDA-MB-453,
   it is significantly under-expressed in JIMT-1 ([201]Table 3). VDR has
   above average expressions in MDA-MB-453 and JIMT-1 cell lines and have
   higher expression in JIMT1 than in MDA-MB-453 according to the GDSC
   database. However, in our qRT-PCR analysis VDR has higher expression in
   MDA-MB-453 than in JIMT-1 cells (Figure A in [202]S1 File). Despite
   overexpressing at least one of HER2 or EGFR neither of these cell lines
   respond to lapatinib. Following 16 hours of treatment with 1 μM
   lapatinib neither cell-line showed a significant change in VDR gene
   expression ([203]Fig 4F), reflecting the same behaviour as the
   lapatinib insensitive T47D and MDA-MB-468 cell lines, which were used
   to generate the time course gene expression data.

   HCC1954, which was used to develop HCC 1954-L with acquired lapatinib
   resistace, is known to be HER2 positive and has overexpressed HER2,
   EGFR, and VDR ([204]Table 3).

   The cells were treated with calcitriol (6 μM for MDA-MB-453, JIMT-1,
   HCC 1954 and 500nM for HCC 1954-L) and lapatinib (0.25 μM, 0.5 μM, 1.0
   μM for all four cell lines), individually and in combination
   (calcitriol + lapatinib) for 5 days before measuring cell
   proliferation. In MDA-MB-453, JIMT-1, HCC1954 and HCC1954-L cells
   calcitriol alone resulted in 19%, 11%, 21% and 30% growth inhibition
   respectively ([205]Fig 4G–4J), whereas, different doses of lapatinib
   inhibited the growth of MDA-MB-453, JIMT-1, HCC1954 and HCC1954-L cells
   by 10–19%, 1–5%, 50–64% and 25–44% respectively ([206]Fig 4G–4J).
   However, the combination of calcitriol + lapatinib resulted in 36–38%,
   13–17% and 43–52%, 49–70% growth inhibition in MDA-MB-453, JIMT-1,
   HCC1954 and HCC1954-L cells respectively depending on the doses of
   lapatinib ([207]Fig 4G–4J). Thus, among the innately insensitive cells,
   treatment with lapatinib and calcitriol showed a significant additive
   effect in MDA-MB-453 but not in JIMT-1 cells. The lower expression
   levels of VDR expression in the JIMT-1 cells compared to MDA-MB-453
   cells may plausibly explain why these cells are less sensitive to
   calcitriol treatment.

   On the other hand, among the isogenic HCC1954 and HCC1954-L cells, only
   the latter displayed an additive effect of the lapatinib and calcitriol
   combination ([208]Fig 4I and 4J). Based on qRT-PCR analysis, VDR mRNA
   expression is increased in the HCC1954-L cells (1.7 fold) (Figure B in
   [209]S1 File), which may contribute to benefit observed with calcitriol
   treatment.

4 Discussion

   The rapid advent of omics technology made it possible to investigate
   the molecular networks of living cells in unprecedented detail.
   Consequently, it is now possible to track the behaviours of thousands
   of genes simultaneously over a period of time. However, identifying
   genes which show different temporal behaviour under different
   experimental conditions is not straightforward. Here, we developed a
   statistical tool that addresses this question and applied it to study
   the gene expression changes associated with lapatinib resistance in BC
   cells. Using existing datasets, our study identified several potential
   biomarkers of lapatinib resistance in BC, some of which were
   experimentally validated.

   We identified more than 500 genes as potential lapatinib resistance
   markers, many of which can also be potential targets for treating
   lapatinib and, more generally, HER2-targeted TKI resistant BC cells.
   One such target identified in this study is VDR. We experimentally
   verified whether VDR can be a viable target for combination therapy for
   treating HER2 positive lapatinib resistant (innate and acquired) BC
   patients. We selected two cell lines for the investigation of innate
   insensitivity, MDM-MB-453 where both HER2 and VDR are relatively highly
   expressed, and JIMT-1 where HER2 is highly expressed but VDR has
   significantly lower mRNA expression compared to MDA-MB-453. The
   combination of lapatinib and calcitriol inhibited proliferation more
   effectively than individual treatment by either reagents in MDA-MB-453,
   but not in JIMT-1. In the acquired resistance setting we used the
   HCC1954-L cells which were developed by exposing HCC1954 cells to
   lapatinib for 6 months. The combination of calcitriol and lapatinib
   inhibited proliferation of the lapatinib resistant HCC1954-L more
   effectively than either agent alone. These results suggest that the
   combination of calcitriol and lapatinib can be a potential combination
   therapy regimen for patients who express HER2 and VDR but do not
   respond to lapatinib treatment. However, these are preliminary results
   and need further experimental validation in a larger panel of cell
   lines and animal models to assess the clinical viability of this drug
   combination.

   Drug resistance is a complex process which involves dynamic adaptations
   of the interplay between signalling and transcriptional networks of
   cancer cells induced by drug treatments. A system level understanding
   of transcriptional and signalling networks is required to design
   therapeutic strategies targeted at overcoming such resistance. Such
   insight can be gained by reconstructing quantitative models of
   transcriptional and signalling network from experimental data using
   network reconstruction methods [[210]50, [211]85–[212]90]. The
   reconstructed models can then be used to simulate the effects of drug
   treatments on signalling and transcriptional networks of drug resistant
   cancer cells. We are currently pursuing this avenue to identify potent
   combination therapy regimens for treating BC patients.

Supporting information

   S1 File. Supplementary information for this manuscript.

   Contains supplementary Tables A-C and supplementary Figures A,B.

   (DOCX)
   [213]Click here for additional data file.^ (65.1KB, docx)

Data Availability

   The dataset was previously published with the following article: Hegde,
   P.S., et al., Delineation of molecular mechanisms of sensitivity to
   lapatinib in breast cancer cell lines using global gene expression
   profiles. Mol Cancer Ther, 2007. 6(5): p. 1629-40.

Funding Statement

   This work was supported by the Science Foundation Ireland under Grant
   No. 06/CE/B1129 (CSET) and the Irish Cancer Society Collaborative
   Cancer Research Centre BREAST-PREDICT (CCRC13GAL).

References