Abstract

   Endocrine therapies targeting the estrogen receptor (ER/ESR1) are the
   cornerstone to treat ER-positive breast cancers patients, but
   resistance often limits their effectiveness. Notable progress has been
   made although the fragmented way data is reported has reduced their
   potential impact. Here, we introduce EstroGene2.0, an expanded database
   of its precursor 1.0 version. EstroGene2.0 focusses on response and
   resistance to endocrine therapies in breast cancer models.
   Incorporating multi-omic profiling of 361 experiments from 212 studies
   across 28 cell lines, a user-friendly browser offers comprehensive data
   visualization and metadata mining capabilities
   ([60]https://estrogeneii.web.app/). Taking advantage of the harmonized
   data collection, our follow-up meta-analysis revealed transcriptomic
   landscape and substantial diversity in response to different classes of
   ER modulators. Endocrine-resistant models exhibit a spectrum of
   transcriptomic alterations including a contra-directional shift in ER
   and interferon signalings, which is recapitulated clinically.
   Dissecting multiple ESR1-mutant cell models revealed the different
   clinical relevance of cell model engineering and identified
   high-confidence mutant-ER targets, such as NPY1R. These examples
   demonstrate how EstroGene2.0 helps investigate breast cancer’s response
   to endocrine therapies and explore resistance mechanisms.

   Subject terms: Breast cancer, Next-generation sequencing,
   Transcriptomics, Receptor pharmacology

Introduction

   Breast cancer remains the most common cause of cancer-associated death
   among women and is responsible for nearly 15% of all new cancer cases
   each year in the United States^[61]1. This disease presents as a
   heterogeneous entity, with various subtypes distinguished by unique
   molecular profiles, guiding different treatment strategies. Among these
   subtypes, estrogen receptor-positive (ER+) breast cancer, characterized
   by the presence of estrogen receptor (ER/ESR1), accounts for
   approximately 70% of cases^[62]2,[63]3. Not surprisingly, targeted
   therapies which directly inhibit the transcriptional activity of ER, or
   which block the synthesis of estrogens have and will continue to be the
   cornerstone of interventions used to treat this disease at all
   stages^[64]4–[65]6.

   The Selective Estrogen Receptor Modulators (SERMs), like tamoxifen,
   function as competitive inhibitors of estrogen binding to ER and also
   disrupt the integrity of the protein-protein interaction surfaces
   needed for coactivator recruitment^[66]7. Tamoxifen is used in
   combination with GnRH analogues (to disrupt estrogen biosynthesis) in
   premenopausal women with ER-positive breast cancer^[67]5. In
   postmenopausal women with this disease subtype, aromatase inhibitors
   (AIs), such as letrozole and anastrozole, which inhibit the peripheral
   conversion of androgens to estrogens are now the standard of care for
   frontline intervention^[68]8. The classic models of ER action suggest
   that a small molecule ligand (i.e. an estrogen) is required for the
   activation of the receptor. However, in cancer cells it has been shown
   that ER transcriptional activity can occur in a ligand-independent
   manner secondary to the overexpression of certain coactivators (i.e.
   SRC3) or increased activity of signaling pathways which impinge upon
   the ER-coregulator complex^[69]9. Further dysregulated expression of
   the expression and/or activity of coregulators such as SRC3 can also
   have a profound effect on the pharmacology of
   ER-modulators^[70]10,[71]11. These mechanistic insights gave rise to
   the idea that the elimination of ER as opposed to its inhibition may be
   a better approach in cancer.

   Emerging from these efforts was the first-generation SERD fulvestrant,
   whose clinical efficacy, albeit modest, validated the general approach
   of eliminating the receptor as a therapeutic approach^[72]12. The first
   non-steroidal oral SERD, Etacstil, was developed over 25 years ago and
   although it demonstrated clinical efficacy its development was
   discontinued^[73]13. However, the lessons learned from studying this
   drug informed the discovery and development of the third generation
   SERDs, a large number of which have been evaluated in the clinic^[74]6.
   Also driving SERD development is the identification of constitutively
   active hotspot mutations like ESR1 Y537S and D538G^[75]14,[76]15, ESR1
   fusions which exhibit neomorphic transcriptional activities and the
   demonstration of ESR1 amplification in some tumors^[77]16,[78]17.
   Elacestrant was the first SERD of this class of medicine to be FDA
   approved and is currently used in patients whose tumors harbor an
   activating ESR1 mutation^[79]18,[80]19. Camizestrant, giredestrant,
   imlunestrant and palazestrant are all in late-stage clinical trials in
   patients with AI-resistant disease with some being evaluated in earlier
   settings in patients at high risk of recurrence^[81]20–[82]24. All of
   the SERDs described above have a common mechanism of action in that
   they induce denaturing conformational changes in ER structure which
   result in it being targeted to the proteasome. However,
   proteosome-dependent degradation is also achieved by the
   ligand-directed degrader ARV471 (vepdegestrant), a bifunctional ER
   ligand-directed degrader (LDD) which has a receptor binding moiety and
   a second functionality which results in the recruitment of the cereblon
   E3 ligase^[83]25. Challenging the idea that SERD activity is an
   absolute requirement of ER modulators for advanced disease is the SERM
   lasofoxifene, has shown substantial clinical activity in patients whose
   tumors express ESR1 mutations. It is likely that several oral SERDs,
   ARV-471 and the SERM lasofoxifene will be approved clinically and thus
   the next challenge will be to understand how to distinguish these
   functionally distinct drugs and identify best clinical
   utility^[84]26,[85]27. Key to success in this regard will be to define
   how ER activity and pharmacology are impacted by alterations in the
   expression and activity of specific coregulators (coactivators and
   corepressors) and by the activity of PI3K^[86]28, MAPK^[87]29, and
   ERBB2 signaling pathways^[88]30. Also important will be to define how
   epigenetic modifications, such as changes in DNA methylation and
   histone modifications, impact ER pharmacology and the response of
   tumors to these agents.

   There are few validated in vivo models of ER+ breast cancer and thus
   cell line models have played a vital role in studies that have
   elucidated the molecular mechanisms that determine ER pharmacology and
   how pathways are dysregulated upon drug resistance. Two broad
   categories of endocrine resistance models have been widely utilized:
   consequence-mimicking and molecular-mimicking models^[89]31. The former
   involves the development of cell subclones under selective pressure of
   chronic treatment with different endocrine therapies, in vitro or in
   vivo, and subsequent elucidation of the cause-and-effect relationships
   between specific molecular events and drug pharmacology. The
   tamoxifen-resistant (TamR) models and long-term estradiol deprivation
   (LTED) models are among the most widely used. Molecular-mimicking
   models, on the other hand, engineer cells with known genetic
   alterations identified from endocrine-resistant cases, such as ESR1
   mutations. These models, created using genome editing or ectopic
   overexpression, have resulted in the discovery of new biology and
   druggable therapeutic vulnerabilities^[90]32. Advances in
   high-throughput sequencing technologies have revolutionized our ability
   to explore the molecular landscapes of ER modulator activity in models
   of endocrine therapy sensitivity and resistance. However, despite the
   wealth of publicly available datasets, and tools like NCBI GEO^[91]33
   and Cistrome DB^[92]34, accessing and analyzing these data in a
   comprehensive manner remains challenging.

   The EstroGene2.0 knowledgebase has been developed to address the
   limitations in processing and analyzing current datasets that probe
   ER-biology focusing on transcriptomic and cistromic analyses. Extending
   its predecessor, EstroGene1.0^[93]35, which focused solely on
   experiments analyzing treatment with estrogens. EstroGene2.0
   encompasses transcriptomic and ER ChIP-seq profiling data from studies
   performed with a large number of different ER modulators in models of
   both endocrine therapy sensitive and resistant breast cancer cells. Our
   updated platform features an enhanced user-friendly browser, enabling
   researchers to swiftly access and review experimental documentation,
   dataset quality controls, and insights into gene regulation and ER
   binding associations across hundreds of curated experiments.
   Additionally, we have integrated an unpublished ER Rapid
   Immunoprecipitation Mass spectrometry of Endogenous protein (RIME)
   profiling data from 16 ER+ cancer cell lines, offering additional
   access into ER interactomes. Following the development of EstroGene2.0,
   we conducted a series of pilot studies to explore potential mechanisms
   of response and resistance to ER modulators and herein report these
   findings and provide examples of new testable hypothesis that can be
   tested experimentally.

Methods

EstroGene2.0 data curation

   To obtain comprehensive ER modulator treatment and endocrine-resistant
   cell model-related database in breast cancer, we established a
   standardized curation model with three main steps. First, we conducted
   a literature search from the Gene Expression Omnibus (GEO database)
   using the combination of ER modulator compound (e.g. “tamoxifen” or
   “4OHT) or endocrine resistant model type (e.g. “tamoxifen resistant” or
   TamR) plus “breast cancer” plus the name a specific type of sequencing
   technology (e.g., “RNA-seq” or “RNA sequencing”) towards publications
   released earlier than January 2023. Secondly, we manually reviewed
   these articles, only literature conducting the required experiments on
   human breast cancer cell lines were incorporated into the EstroGene2.0
   database. We curated details of (1) publications including GEO
   accession number, PMID, publication date and institution; (2)
   experimental designs including cell lines, replicates used, compound
   dose, duration for ER modulator treatment, resistant development
   procedure, model construction method and variants type (for ESR1
   mutations), library preparation method and NGS sequencing platforms.
   All the relevant information is summarized in Supplementary Data1. Of
   note, among the 53 tamoxifen-treatment experiments we documented, 36 of
   them were clarified as 4-hydroxytamoxifen (4OHT) while the rest 17 were
   not specified (Supplementary Data1). Special cautions need to be drawn
   to the compound type when mining tamoxifen treatment data sets given
   they are likely a pool of both compounds.

Website construction

   The EstroGene II website utilizes Firebase
   ([94]https://firebase.google.com) for its robust NoSQL database
   infrastructure. We developed specialized gene-based indices and stored
   this data structure in Firebase, which specifically provides constant
   time retrieval for our website. It reduces search latency, thereby
   enhancing the responsiveness of our platform. As a result, EstroGene II
   operates with higher efficiency and speed compared to its predecessor.
   Our front end is crafted using Angular 16.2.0 ([95]https://angular.io)
   with TypeScript, HTML, and CSS. Angular simplifies the development
   process with its tooling and libraries. Its efficient rendering
   pipelines minimize the resource footprint and enhance the speed of view
   updates. In addition, R Shiny was used to construct the external
   analysis page including mode2 and mode3 of the analysis tab. These
   features enable EstroGene II to handle complex interactions and
   maintain a reactive user experience across all user activities.

Transcriptomic data process and analysis

   For all the RNA-seq experiments, we took advantage of the recently
   released NCBI-generated RNA-seq count data (BETA version). Briefly, SRA
   runs are aligned to genome assembly GCA_000001405.15 (hg38) using
   HISAT2. Runs that pass a 50% alignment rate are further processed with
   Subread featureCounts which outputs a raw count file for each run. For
   Human data, the Homo sapiens Annotation Release 109.20190905 was used
   for gene annotation. Genes with 0 counts in each experiment were
   removed and DESeq2^[96]36 was used to compute log2-fold change and
   adjust P values of each gene between control and targeted samples. For
   specific datasets lacking replicates, we generated log2-fold change of
   each gene by subtracting TMM normalized log2(CPM + 1) values of
   controls from the corresponding targeted samples.

   For microarray datasets, we collected the raw array files from GEO
   database and normalized the data with different packages according to
   the platform. Affy^[97]37 and oligo^[98]38 packages were used to
   process Affymetrix-based microarray data following RMA normalizations.
   For data generated based on Agilent platform, loess normalization was
   performed directly on preprocessed data were downloaded from GEO.
   Different version of probe ID were converted to gene ID using BioMart
   package. Probes representing the same gene were merged by averaging the
   normalized intensity. LIMMA^[99]39 was used to compute differential
   expressing genes for datasets including biological replicates. For
   experiments without replicates, log2-fold changes were calculated by
   subtracting the control values from the matched samples.

   For the integration of RNA-seq and microarray comparisons under the
   same section (e.g. short-term ER modulator treatment), the percentile
   of each gene within each comparison was calculated independently by
   ranking the treatment or condition-induced fold changes, with -100–0 as
   repression and 0–100 as activation and 0 as no regulation. For the
   merged regulatory percentile, genes appeared in below 80% comparisons
   of each section were filtered out. For downstream analysis, regulator
   prediction were performed using LISA^[100]40. Venn diagrams were
   generated using jVenn^[101]41. Data visualizations were performed using
   “ggpubr”^[102]42 and “Complexheatmap”^[103]43.

   For multivariant regression modeling of short-term ER modulator
   treatment, comparisons defined by ER modulator treatment group of the
   same cell line, drug time, and time, compared to control group, we
   linearly scaled expression fold change of each gene (from differential
   gene expression analysis) to percentile rank, ranging from −100 to 100.
   Specifically, the most down-regulated gene is set as −100, and most
   up-regulated gene as 100, and genes with no change remain as 0. We
   selected experiments with tamoxifen or fulvestrant treatment as input
   for the regression model. For each gene, we modeled its percentile rank
   by mixed effect linear regression model, using each comparison as a
   single observation.
   [MATH: <mtable><mtr><mtd columnalign="left"><mi
   mathvariant="normal">Percentile</mi><mspace
   width="0.25em"></mspace><msub><mrow><mi
   mathvariant="normal">Rank</mi></mrow><mrow><mi
   mathvariant="normal">Gene</mi></mrow></msub><mo>~</mo><mi
   mathvariant="normal">Cell</mi><mspace width="0.16em"></mspace><mi
   mathvariant="normal">Line</mi><mo>+</mo><mi
   mathvariant="normal">Sequencing</mi><mspace width="0.25em"></mspace><mi
   mathvariant="normal">Technique</mi></mtd></mtr><mtr><mtd
   columnalign="left"><mo>+</mo><mspace width="0.25em"></mspace><mi
   mathvariant="normal">Drug</mi><mspace width="0.16em"></mspace><mi
   mathvariant="normal">Type</mi><mo>×</mo><mi
   mathvariant="normal">Treatment</mi><mspace width="0.25em"></mspace><mi
   mathvariant="normal">Duration</mi><mspace
   width="0.25em"></mspace><mrow><mo>(</mo><mrow><mi
   mathvariant="normal">h</mi></mrow><mo>)</mo></mrow><mspace
   width="0.25em"></mspace><mo>∣</mo><mspace width="0.25em"></mspace><mi
   mathvariant="normal">Batch</mi><mspace width="0.16em"></mspace><mi
   mathvariant="normal">ID</mi></mtd></mtr></mtable> :MATH]

   Specifically, sequencing technique is either RNA-seq or Microarray.
   Batch ID marks a unique experiment which consists of one to multiple
   experiment groups using same set of control samples. Regression model
   is calculated with python statsmodel package^[104]44 (v.0.14.1). We
   recorded regression coefficient and significance p value for each term
   and interaction term for each model.

   For pathway enrichment analysis, we used EnrichR^[105]45
   ([106]https://maayanlab.cloud/Enrichr/) to calculate pathway
   enrichment, selecting the following datasets: KEGG (human, 2021),
   Reactome (2022), Gene Ontology Biological Process (2023), Gene Ontology
   Cellular Component (2023), and Gene Ontology Molecular Function (2023).
   For each dataset in ER modulator treatment, the top 5 enriched gene
   set, ranked by adjusted p value, were selected for presentation for
   each enrichment analysis. The enrichment plot was generated using
   gseapy^[107]46 (v1.1.0) dotplot function, showing ratio of gene overlap
   with pathways as dot size, and colored by adjusted p value from pathway
   enrichment. For other data sets, we mainly focused on the Hallmark
   signature collection, and generated the enrichment bubble plots for
   significantly enriched pathways using adjusted p values and odds ratio.

   We also used NES based method for different types of compound pathway
   enrichment in Fig. [108]2J. Briefly, we first calculated average gene
   percentile rankings of each gene from all comparisons treated with the
   specific ER modulator type. Then for each drug type, we ran gene set
   enrichment analysis (prerank function from gseapy v1.1.0), using MSigDB
   hallmark (2020) gene sets using the average gene percentile ranking as
   pre-ranked correlation. Other computation parameters include minimum
   and maximum allowed gene number from gene set and data set as 5 and
   1000 respectively, permutation number as 1000, and random seed as 6.
   This generated a normalized enrichment score (NES) and an FDR value for
   each pathway and each drug type. In the heatmap, we colored
   drug-pathway pairs by NES values.

Fig. 2. Diversity of short-term ER modulator transcriptomic response.

   [109]Fig. 2
   [110]Open in a new tab

   A Boxplots depicting the log10 (coefficient variations)^2 as a
   measurement of data set variation within all the tamoxifen and
   fulvestrant comparisons. Mann–Whitney U test was used. B Distribution
   of log10 p values of each variable from mixed effect multivariant
   linear regression model of gene regulatory percentiles tile from −100%
   to 100% with short-term tamoxifen or fulvestrant treatment experiments,
   among all genes. C Scatter plot showing Pearson correlation of average
   regulatory percentiles of each gene from tamoxifen and fulvestrant
   treatment experiments. Highlighted dots include consistently regulated
   genes with average fold change percentile ≥25 and ≤−25 in both
   tamoxifen or fulvestrant treatment experiments (‘Tam-up & Fulv-up’,
   blue and ‘Tam-down & Fulv-down’, green) as well as inconsistently
   regulated genes with average regulatory percentile ≥25 and ≤−25 in one
   of the compounds (‘Tam-up & Fulv-down’, red and ‘Tam-down & Fulv-up’,
   yellow). D–G Dot plots showing enriched pathways in genes which were
   ‘Tam-up & Fulv-up’ (D), ‘Tam-down & Fulv-down’ (E), ‘Tam-up &
   Fulv-down’ (F), and ‘Tam-down & Fulv-up’ (G), using five different
   databases. H, K Heatmaps illustrating the Pearson correlation
   coefficients of global gene regulatory percentile between each of the
   two compounds (H) and cell lines (K) with unsupervised clustering. I A
   heatmap showing the regulatory percentile of 11,191 genes in each of
   the indicated experiments and average values of fulvestrant and
   tamoxifen. Cell lines and compound names are labeled with color coded
   experiments and drug types. J A heatmap showing GSEA normalized
   enrichment score (NES) of average drug-caused enrichment changes from
   50 MSigDB Hallmark signatures with unsupervised clustering. L Left:
   Venn diagram showing the overlapping of ER interactors from RIME
   experiment (log2FC > 5, padj < 0.05 to IgG) between CAMA1 and union of
   MCF7, T47D, BT483, BT474 and EFM19. Right: Overlapping of CAMA1 unique
   ER interactors and predicted regulators from CAMA1-specific up- and
   down-reglated genes. Consistent targets are labeled.

   For Oncotype DX analysis, 19 genes were extraced from the
   panel^[111]47. CTSL2 and GUS were excluded due to limited number of
   comparisons included them after data filtering. The weighed impact on
   Oncotype DX recurrence scores were further calculated by using the
   following formula:
   [MATH: <mtable><mtr><mtd columnalign="left"><mi
   mathvariant="normal">Weighted</mi><mspace width="0.16em"></mspace><mi
   mathvariant="normal">impact</mi><mo>=</mo><mn>0.47</mn><mo>×</mo><mrow>
   <mo>(</mo><mrow><mi mathvariant="normal">HER</mi><mn>2</mn><mspace
   width="0.25em"></mspace><mi mathvariant="normal">group</mi><mspace
   width="0.16em"></mspace><mi mathvariant="normal">gene</mi><mspace
   width="0.16em"></mspace><mi
   mathvariant="normal">percentile</mi></mrow><mo>)</mo></mrow><mo>−</mo><
   mn>0.34</mn></mtd></mtr><mtr><mtd columnalign="left"><mo>×</mo><mspace
   width="0.25em"></mspace><mrow><mo>(</mo><mrow><mi
   mathvariant="normal">ER</mi><mspace width="0.16em"></mspace><mi
   mathvariant="normal">group</mi><mspace width="0.16em"></mspace><mi
   mathvariant="normal">gene</mi><mspace width="0.16em"></mspace><mi
   mathvariant="normal">percentile</mi></mrow><mo>)</mo></mrow><mo>+</mo><
   mspace
   width="0.25em"></mspace><mn>1.04</mn><mo>×</mo><mrow><mo>(</mo><mrow><m
   i mathvariant="normal">Proliferation</mi><mspace
   width="0.16em"></mspace><mi mathvariant="normal">group</mi><mspace
   width="0.16em"></mspace><mi mathvariant="normal">gene</mi><mspace
   width="0.16em"></mspace><mi
   mathvariant="normal">percentile</mi></mrow><mo>)</mo></mrow></mtd></mtr
   ><mtr><mtd columnalign="left"><mo>+</mo><mspace
   width="0.25em"></mspace><mn>0.1</mn><mo>×</mo><mrow><mo>(</mo><mrow><mi
   mathvariant="normal">Invasion</mi><mspace width="0.16em"></mspace><mi
   mathvariant="normal">group</mi><mspace width="0.16em"></mspace><mi
   mathvariant="normal">gene</mi><mspace width="0.16em"></mspace><mi
   mathvariant="normal">percentile</mi></mrow><mo>)</mo></mrow><mo>+</mo><
   mn>0.05</mn><mo>×</mo><mspace width="0.16em"></mspace><mi
   mathvariant="normal">CD</mi><mn>68</mn><mspace
   width="0.25em"></mspace><mi mathvariant="normal">regulatory</mi><mspace
   width="0.16em"></mspace><mi
   mathvariant="normal">percentile</mi></mtd></mtr><mtr><mtd
   columnalign="left"><mo>−</mo><mspace
   width="0.25em"></mspace><mn>0.08</mn><mo>×</mo><mi
   mathvariant="normal">GSTM</mi><mn>1</mn><mspace
   width="0.25em"></mspace><mi mathvariant="normal">regulatory</mi><mspace
   width="0.16em"></mspace><mi
   mathvariant="normal">percentile</mi><mo>−</mo><mn>0.07</mn><mo>×</mo><m
   i mathvariant="normal">BAG</mi><mn>1</mn><mspace
   width="0.25em"></mspace><mi mathvariant="normal">regulatory</mi><mspace
   width="0.16em"></mspace><mi
   mathvariant="normal">percentile</mi><mo>.</mo></mtd></mtr></mtable>
   :MATH]

   For clinical cohort analysis, expression data and sample metadata of
   POETIC trial were downloaded from [112]GSE105777^[113]48. Enrichment of
   interferon and estrogen response signatures were calculated using GSVA
   package^[114]49. For ESR1 mutant metastatic breast cancer cohorts,
   differentially expression genes between WT and mutant samples in each
   cohort was described before using padj < 0.1 as the cutoff. For
   logistic regression modeling, GSVA enrichment scores were first
   calculated with up- and down-regulated genes with regulatory percentile
   above 60% or below -60% from 32 genome-edited and 12 ectopic
   overexpression models. An integrated enrichment score of each sample
   were calculated by subtracting the repression score from the activation
   score. Lme4 package^[115]50 was used for the regression modeling using
   ESR1 mutation status as predictor and pROC package^[116]51 was used to
   generate ROC curves. Immune infiltration scores were inferred using the
   ESTIMATE package^[117]52.

ChIP-seq data process and analysis

   ChIP-seq raw fastq files were downloaded from GEO with corresponding
   SRR accession numbers. Reads were aligned to hg19 genome assembly using
   Bowtie 2.0^[118]53, and peaks were called using MACS2.0 with q value
   below 0.05^[119]54. For gained peaks in ESR1 mutant cells versus WT
   controls, we used DiffBind package^[120]55 to intersect BED files.
   Intensity plots for binding peaks were visualized by Seqplots^[121]56
   using BigWig files and BED files as input. Peak visualization was
   conducted via WashU EpiGenome Browser^[122]57 using BigWig files as
   input. For gene annotation, Binding and Expression Target
   Analysis^[123]58 minus was used with 100 kb as the distance from gene
   TSS within which peaks. The average BETA score from 16 ESR1 mutant
   gained peak sets were calculated for each gene.

Cell culture

   BT474, BT483, EFM19, EFM192A, HCC1428, T47D, ZR751 & ZR7530 were
   cultured in in RPMI 1640 Medium (no phenol red, no L-glutamine) (Gibco
   # 32404014). The following supplements were added: 1 mM sodium pyruvate
   (Gibco # 11360039) and 2 mM GlutaMax (Gico # 35050038) and
   heat-inactivated FBS was used. For BT483, human insulin (Sigma # I0516)
   was added at a final concentration of 0.01 mg/ml. Genome-edited ESR1 WT
   and mutant T47D (RRID: CVCL_0553) ESR1 mutant cell models experiments,
   cells were deprived in phenol-red-free IMEM with 5% CSS. CSS was
   purchased from Hyclone (#SH30068).

ER RIME

   Cells were seeded in 15 cm plates and when 85-90% confluent, cells were
   double crosslinked with 2 mM disuccinimidyl glutarate for 20 min
   followed by 1% formaldehyde. Pulldown was performed with either ER
   antibody mix or Rabbit IgG as described before^[124]59. RIME samples
   were digested with trypsin (Pierce) and the peptides were purified with
   Ultra-Micro C18 Spin Columns (Harvard Apparatus) prior to mass
   spectrometry analysis as previously described^[125]60. Dried peptides
   were reconstituted in 0.1% formic acid for further LC–MS/MS analysis.

   For LC-MS/MS Analysis, reconstituted peptides were analyzed on a Dionex
   Ultimate 3000 system coupled with the nano-ESI Fusion Lumos (Thermo
   Scientific). Peptides were loaded on the Acclaim PepMap 100,
   100 μm × 2 cm C18, 5 μm, 100 Ȧ trapping column and separated with the
   EASY-Spray analytical column (75 μm × 25 cm, C18, 2 μm.) with a 5–45%
   acetonitrile gradient in 0.1% formic acid at 300 nL/min flow rate. The
   full scans were performed in the Orbitrap in the range of 400 to
   1600 m/z at 120k resolution. The MS2 scans were performed in the ion
   trap with 2.0 Th isolation window, HCD collision energy 28% and dynamic
   exclusion 30 seconds.

   For data analysis, Spectral .raw files from data-dependent acquisition
   were processed with the SequestHT search engine on Thermo Scientific
   Proteome Discoverer 2.4 software. Data was searched using Uniprot
   Database Homo sapiens fasta file (taxon ID 9606—Version June2). The
   node for SequestHT included the following parameters: Precursor Mass
   Tolerance 20 ppm, Fragment Mass Tolerance 0.5 Da, Dynamic Modifications
   were Oxidation of M ( + 15.995 Da) and Deamidation of N, Q
   ( + 0.984 Da). The Precursor Ion Quantifier node (Minora Feature
   Detector) was used for label-free quantification included a Minimum
   Trace Length of 5, Max. ΔRT of Isotope Pattern 0.2 min. The consensus
   workflow included peptide validator, protein filter and scorer. For
   calculation of Precursor ion intensities, the Feature mapper was set
   True for RT alignment, with the mass tolerance of 10 ppm. Precursor
   abundance was quantified based on intensity and the level of confidence
   for peptide identifications was estimated using the Percolator node
   with a Strict FDR at q-value < 0.01.

   For bioinformatics analysis, label-free quantification was used to
   generate peptide intensity values. The data analysis was performed
   using Bioconductor-based R package qPLEXanalyzer. Initially, the data
   was normalized via within-group median scaling, where IgG served as one
   group and all other pulldowns constituted the other. Peptides detected
   in less than 25% of pulldown samples were excluded from subsequent
   analysis. For the remaining peptides, missing values were imputed by
   two methods: utilizing the minimum value in each sample (for all IgG
   controls and in pulldown samples for peptides found in less than half
   of replicates) or employing a k-nearest neighbors algorithm (for
   pulldown samples with peptides detected in more than half of
   replicates). Subsequently, a limma-based statistical analysis was
   conducted to identify differentially enriched proteins compared to the
   IgG control.

Transient siRNA transfection

   Individual ESR1 WT and mutant T47D clones were hormone-deprived in
   IMEM/5% CSS for 3 days and evenly pooled for siRNA transfection.
   Briefly, 500,000 cells were seeded in 6-well plates in IMEM/5% CSS and
   forward transfected with 50 nM final concentration of ON-TARGETplus
   Non-targeting Control Pool siRNA (Dharmacon #D-001810-10-05),
   ON-TARGETplus Human NPY1R SMARTPool siRNA (Dharmacon
   #L-005672-00-0005), ON-TARGETplus Human RLN1 SMARTPool siRNA (Dharmacon
   #L-017403-00-0005), or ON-TARGETplus Human SUSD3 SMARTPool siRNA
   (Dharmacon #L-016811-02-0005) using Lipofectamine RNAiMAX (Invitrogen
   #13778) protocol for 24 or 72 h.

qRT-PCR

   ESR1 WT and mutant T47D cells were collected after 3 days of hormone
   deprivation or 72 h post-siRNA transfection. RNA extraction was
   performed using RNeasy mini kit (Qiagen #74106). Reverse transcription
   to cDNA was performed with PrimeScript™ RT Master Mix (Takara Bio
   #RR036B). RT-PCR was then performed with SsoAdvanced Universal SYBR
   (Bio-Rad #1726275) with primers as detailed in supplemental materials
   (see below). The ΔΔCt method was used to analyze relative mRNA fold
   changes with GAPDH serving as the internal control. Statistical
   differences were evaluated using a paired t test. Sequences of primers
   are list below:
    Name   Sequence 5’ > 3'
   NPY1R-F CCATCGGACTCTCATAGGTTGTC
   NPY1R-R GACCTGTACTTATTGTCTCTCATC
   GAPDH-F TCGGAGTCAACGGATTTGGTC
   GAPDH-R AAACCATGTAGTTGAGGTCAATG
   [126]Open in a new tab

Growth assay

   Twenty-four hours after siRNA transfection, ESR1 WT and mutant T47D
   cells were plated into 96-well plates at 2000 cells per well. Cell
   viability was measured at day 0, 2, 4, 6, and 8 and quantified with
   PrestoBlue HS Cell Viability Reagent (Life Technology #[127]P50201)
   following the manufacturer’s protocol. Growth rate was determined as
   fold change compared to day 0.

Results

Expansion of EstroGene database to include studies of ER modulator response
and resistance

   To increase the clinical relevance and utility of the EstroGene
   database, we expanded its scope to include data generated in breast
   cancer cell models subjected to ER modulator treatment or designed to
   mimic endocrine therapy resistance. A comprehensive search was
   undertaken of publicly available data from the Gene Expression Omnibus
   (GEO) using specific keywords related to compound names (e.g.,
   tamoxifen, fulvestrant) or resistant model types (e.g., tamoxifen
   resistance, ESR1 mutation), coupled with the name of a specific
   technology (e.g., “RNA-seq” or “RNA sequencing”). Following this
   initial search, we performed manual filtration to ensure data accuracy.
   Our focus for endocrine-resistant models centered on
   tamoxifen-resistant (TamR), long-term estradiol deprivation (LTED), and
   ESR1 mutation models, given their prevalence in published literature.
   This search strategy yielded 178 experiments from 94 studies,
   comprising 104 RNA-seq, 57 microarray, and 17 ER ChIP-seq experiments,
   supplementing the existing 92 transcriptomic and 75 ER ChIP-seq
   experiments from the first version of the EstroGene database.
   Additionally, we incorporated in-house experiments of ER RIME (Rapid
   Immunoprecipitation Mass Spectrometry of Endogenous proteins)^[128]61
   proteomic profiling across 16 ER+ cancer cell lines (15 breast cancer
   and one endometrial cancer) representing an unbiased survey of
   ER-associated protein complexes using IP-mass spectrometry. The
   EstroGene2.0 database now encompasses multi-omic profiling data from
   361 experiments across 212 studies, spanning 28 cell lines and
   publications from 2004 to 2023 (Fig. [129]1A and Supplementary
   Data[130]1).

Fig. 1. Expansion of EstroGene database towards ER modulator response and
resistance profiles.

   [131]Fig. 1
   [132]Open in a new tab

   A A flowchat depicting the process for the EstroGene2.0 database
   collection. B–J Stacked histogram showing the metadata separated by
   technologies and two major model types, ER modulator treatment and
   endocrine resistance (Endo Resistant), across all the curated datasets
   related to year of dataset publication (B), cell line used (C),
   replicates used (D). Within ER modulator treatment collection, further
   separation by compound types (E), dose (F) and duration (G) for
   tamoxifen and fulvestrant are shown. Within endocrine resistant models,
   further subgroup by specific model types (H), ESR1 mutant cell model
   types (I) and specific variants (J) are shown.

   Similar to observations from EstroGene1.0^[133]35, publicly available
   data from studies performed after 2017 was for the most part generated
   using next-generation sequencing technologies, while microarray was
   more prevalent in earlier studies (Fig. [134]1B). Notably, most studies
   (78.6% RNA profiling and 77.4% ChIP seq) were performed in the
   ER-positive MCF7 and T47D cell lines (or endocrine therapy-resistant
   models thereof), and while this ensured that the data generated were
   robust it is also expected to introduce unavoidable biases (Fig.
   [135]1C). Most of the transcriptomic profiling experiments employed
   biological duplicates (39.8%) or triplicates (36%), whereas 82.3% of ER
   ChIP-seq experiments included only one sample (Fig. [136]1D).

   For experiments using ER-modulators, we ingested data from six
   categories of compounds (SERM, SERD, SERM/SERD Hybrid, SERCA,
   LDD/PROTAC and SERD Analog), with tamoxifen (42.7%) and fulvestrant
   (29%) being the most frequently used SERM and SERD, respectively (Fig.
   [137]1E). It is noteworthy that the majority of studies used doses of
   tamoxifen or fulvestrant ranging from 100 nmol/L to 1000 nmol/L (Fig.
   [138]1F), considerably higher than the reported physiological serum
   peak concentrations from previous pharmacokinetic studies in animal
   models and in humans^[139]62–[140]64. This raises the concern that some
   of the changes noted may be related to off-target activities of these
   drugs. Transcriptomic profiling experiments typically involved longer
   compound exposure, with a median treatment duration of 24 h, while all
   ER ChIP-seq experiments employed a maximum treatment duration of only
   1 h (Fig. [141]1G).

   We gathered 30, 16, and 8 experiments from ESR1 mutation, TamR, and
   LTED endocrine-resistant models, respectively (Fig. [142]1H). TamR
   models were profiled as early as 2008, whereas studies on ESR1 mutation
   models have increased since 2013 (Supplementary Fig. [143]1A).
   Researchers have employed various strategies to model hotspot ESR1
   mutations in breast cancer cells, including genome editing approaches
   such as CRISPR/Cas9 or recombinant adeno-associated viral system (rAAV,
   53.3%), and conventional ectopic overexpression (33.3%). Only one study
   revealed cell lines with natural occurrence of ESR1 mutations under
   long-term estradiol deprivation. In genetically engineered cells, the
   ESR1 Y537S and D538G hotspot variants were the most frequently modeled,
   reflecting their high frequency detected in patient samples^[144]65
   (Fig. [145]1J). The impact of clonal selection of engineered cells and
   how this introduces biases into the resultant transcriptomes is largely
   unanswered in studies performed to date. Regarding TamR models, the
   development procedures vary, with most models using in vitro selection
   with 100-1000 nmol/L 4OHT for over six months, while one study employed
   in vivo selection followed by harvesting of tumor cells for downstream
   characterization (Supplementary Fig. [146]1B–[147]S1C)^[148]66.

The EstroGene2.0 browser

   With the curated datasets, we downloaded and performed unified data
   processing (detailed in the Methods section). We amalgamated these
   datasets into a singular web server platform, facilitating tailored
   data exploration and visualization for researchers, particularly those
   without specialized bioinformatics skills. The EstroGene2.0 web server
   categorizes datasets into five distinct biological sections: treatment
   with an estrogen (from EstroGene1.0), ER-modulator treatment, TamR,
   LTED, and ESR1 mutation models. Building upon the foundation of the
   original EstroGene browser, this updated version introduces significant
   feature enhancements detailed below.

   The web server is structured around two primary functional modules:
   ‘metadata’ and ‘analysis’ (Supplementary Fig. [149]2A). The metadata
   section offers detailed insights into individual experiments. Users can
   navigate through specific sections and refine their search parameters
   by employing filters such as modalities, cell lines, compounds
   (ER-modulator), or mutation variants (ESR1 mutation) (Supplementary
   Fig. [150]2B). Upon identifying experiments of interest, users can
   delve deeper into sample-level documentation by accessing corresponding
   experiment IDs. For transcriptomic profiling, users can retrieve raw
   normalized gene expression values from the “Expression Matrix” tab by
   inputting a gene of interest (Supplementary Fig. [151]2C), while the
   “Differential Expression” tab furnishes log2 fold change values with
   adjusted p-values for all reasoned comparisons within a specific
   experiment (Supplementary Fig. [152]2D). Additionally, principal
   component analysis is available for each experiment, aiding in the
   evaluation of variations across replicates and biological intervention
   (Supplementary Fig. [153]2E). The complete sample-level normalized
   expression matrix and differential expression output are accessible for
   download via the “Download” tab (Supplementary Fig. [154]2F). For ER
   ChIP-seq datasets, users can access ER peak numbers and mapped reads
   for each sample (Supplementary Fig. [155]2G). Notably, a sample-level
   genomic track view feature allows direct visualization of ChIP signal
   intensity and the distribution of the user’s locus of interest through
   integration with the WashU Epigenome Browser (Supplementary Fig.
   [156]2H). Additionally, sample-level peak calling files and normalized
   read counts are available for download (Supplementary Fig. [157]2I).

   The analysis module encompasses four modes (Supplementary Fig.
   [158]3A). The single gene visualization mode (Mode1) continues the same
   strategy used in the Estrogene1.0 browser, whereby users can input a
   gene of interest (GOI) and generate volcano plots to visualize
   concordance of gene expressional changes associated with the specific
   conditions across all the pre-computed comparisons (E2: n = 149; ER
   modulators: n = 112; TamR: n = 15; LTED: n = 16; ESR1 mutation: n = 46,
   Supplementary Fig. [159]3B). A filter function allows users to tailor
   analyses towards specific experimental contexts. Users can access the
   original GEO deposition page by clicking on a data point in the plot.
   In Estrogene2.0, we additionally provide a percentile distribution plot
   as an alternative approach, which projects the gene regulatory
   percentile in a −100 to +100 window (100 represents the value of the
   topmost repression or activation normalized to all the processed genes
   in each comparison) within each comparison and merges all the results
   together, which offers further quantitative estimation of regulatory
   trend and consistency across experiments. As shown in Supplementary
   Fig. [160]3C, a well studied ER inducible gene GREB1 showed predominant
   enrichment in the range of +80 to +100 with a few exceptions below 0 in
   ESR1 mutant cells, depicting the quality of this visualization method
   alongside a representation of noted experimental variations. For the
   estrogen treatment and ESR1 mutation sections, the analysis pages
   include targeted ER genomic binding plots based upon ER ChIP-seq data.
   The TSS (Transcriptional Start Site) region view summarizes peak
   intensities within a −/+ 200 kb range of the input gene’s
   transcriptional start site (Supplementary Fig. [161]3D), while the
   genomic track view redirects users to the WashU Epigenome Browser for
   more detailed visualization (Supplementary Fig. [162]3E). While the dot
   represents the center of each ER peak in the TSS proximal region view,
   users can directly assess the full genomic coordinates by hovering the
   mouse over a dot, facilitating the selection of regions of interest for
   downstream applications such as ChIP-qPCR. The gene signature
   enrichment analysis mode (Mode2) enables an estimation of enrichment
   changes of a user-defined multiple-gene signature across all the
   comparisons in a volcano plot format. The backend calculates geometric
   mean of all the customized input genes, estimates the fold change and
   associated adjusted p value of each comparison and separately plots by
   estrogen treatment, ER-modulator treatment and endocrine resistance
   models with specific condition filters. As shown in Supplementary Fig.
   [163]3F, inputting our previously generated EstroGene estrogen response
   signature^[164]35 resulted in nearly all positive values in all the
   E2-treatment experiments, confirming the robustness of this signature.
   In Mode 3, the cross-model pattern and similarity analysis provides a
   holistic examination of a user’s input gene. Firstly, it showcases the
   regulatory trend and consistency across four biological models in a
   side-by-side comparison. Secondly, it identifies other genes displaying
   similar regulation patterns within each section, scored by Pearson
   correlation of regulation percentile. Thirdly, it determines the
   intersection of significantly similar genes across different models.
   For instance, using GREB1 as an illustration, we observe a bimodal
   distribution of its regulatory percentile in E2 treatment and
   ESR1-mutant models (close to +100%) and ER modulator treatment
   experiments (close to -100%) and vice versa for an ER repressed gene
   CCNG2 (Supplementary Fig. [165]3G). Notably, ESR1-mutant cells exhibit
   a considerably higher number of genes with similar regulatory trends as
   GREB1 (Supplementary Fig. [166]3H). Finally, in Mode 4, we integrate
   in-house RIME experiments from 16 ER+ cancer cell lines cultured in
   full medium. Users can ascertain whether a target of interest is an ER
   interaction partner and determine the most suitable cell line for
   studying this interaction. FOXA1, an ER interactive pioneer factor, is
   presented as a default entry (Supplementary Fig. [167]3I).

   In summary, the EstroGene2.0 browser not only broadens the scope of
   biological contexts and experimental modalities but also significantly
   enhances analytical functionality and diversity. This update provides
   an accessible, rapid, and comprehensive overview of cis- and trans
   regulation in estrogen/ER modulator response and resistance in breast
   cancer.

Diversity of short-term transcriptomic response to ER-modulators

   The EstroGene2.0 database not only represents a user-friendly
   accessible webserver for customized investigation, but also produces a
   harmonized large data set. We utilized this opportunity to address
   several crucial questions in the field, particularly focusing on data
   congruence. We first analyzed the impact of short-term treatment with
   ER modulators, based upon 123 pre-computed comparisons across 51
   individual studies. Within each comparison, genes were ranked based on
   fold change normalized to corresponding controls, with regulatory
   percentiles scaled from −100 to +100, representing the spectrum from
   topmost down-regulation to up-regulation. Given the prevalence of
   tamoxifen (n = 63 comparisons) and fulvestrant (n = 29 comparisons)
   experiments in our dataset and the long-standing history in breast
   cancer therapeutics of both drugs, we initiated our analysis by
   directly comparing their effects. We observed a significantly higher
   variation in overall gene regulation in tamoxifen-treated experiments
   compared to fulvestrant-treated (Fig. [168]2A), indicating a more
   heterogeneous response to tamoxifen. To elucidate the factors driving
   this variation, we constructed a multivariate regression model
   incorporating drug type, dose, duration, sequencing modality, and cell
   line for each gene across all tamoxifen and fulvestrant experiments. In
   addition to drug types, dose emerged as one of the most influential
   predictors for post drug treatment transcriptomic alterations (Fig.
   [169]2B). Previous studies have highlighted the dose-dependent nature
   of interventions with ER modulators^[170]62,[171]63. Therefore, we
   initially assessed the regulatory percentile correlation between
   experiments using 1 μmol/L and 100 nmol/L doses of tamoxifen and
   fulvestrant. Fulvestrant exhibited a robust positive correlation
   (R = 0.78) between the two doses (Supplementary Fig. [172]4A), whereas
   tamoxifen displayed a weaker correlation (R = 0.21) (Supplementary Fig.
   [173]4B), indicating a notable dose-related discrepancy. Further
   stratification identified 193 and 169 genes that were more susceptible
   to repression by 1 μmol/L and 100 nmol/L doses of tamoxifen
   (Supplementary Data[174]2), respectively, including notable candidates
   such as TFF3 and IRX4 (Supplementary Fig. [175]4C). Pathway enrichment
   analysis revealed that 100 nmol/L tamoxifen could more effectively
   block estrogen response signaling, while 1 μmol/L tamoxifen
   preferentially suppressed TNFα/NFkB, hypoxia, and TGF-β pathways
   (Supplementary Fig. [176]4D). These analyses underscore the importance
   of dose in considering the pharmacology of ER-modulators and explain in
   part the large intrinsic variation among data sets describing the
   activities of these drugs in different models.

   We next integrated the regulatory effects of tamoxifen and fulvestrant.
   Analyzing the gene-level correlation of average regulatory percentiles
   between tamoxifen and fulvestrant treatments revealed a positive
   association (R = 0.49). Notably, tamoxifen exhibited a narrower average
   distribution range, confirming the higher intrinsic variation (Fig.
   [177]2C). To delve deeper, we examined genes showing concordant or
   discordant changes between these two compounds. Pathways associated
   with cell cycle and DNA replication were downregulated in both
   compounds (Fig. [178]2D), aligning with the anti-proliferative
   activities that are attributable to SERMs and SERDs. Among commonly
   upregulated genes, enrichment was observed in FOXO-mediated cell death,
   glycan degradation, and lysosome pathways, indicating an acute
   elevation in adaptation to external stress stimuli (Fig. [179]2E).
   Notably, genes specifically upregulated with tamoxifen but
   downregulated with fulvestrant were enriched in transport activity
   (Fig. [180]2F), while genes upregulated with fulvestrant but
   downregulated with tamoxifen were involved in amino acid metabolism
   (Fig. [181]2G). Some of these differences may relate to the partial
   agonist activity of tamoxifen manifest in some cell/promoter contexts
   but may also be a manifestation of the difference in inhibiting ER
   action (SERMs) vs eliminating ER expression (SERDs).

   Tamoxifen and fulvestrant are currently used for the treatment of
   ER-positive breast cancers. However, of late there have emerged several
   new classes of ER-modulators among these are (a) oral SERDs
   (camizestrant (AZD9833), giredestrant (GDC9545), imlunestrant, and
   palazestrant (b) SERMs (lasofoxifene) (c) SERM/SERD hybrids (i.e.
   elacestrant and bazedoxifene), (d) ligand directed degraders (LDDs)
   (ARV-471), (e) ligands which inhibit ER through covalent binding to the
   receptor (SERCAs) and (f) ER non-degrader structurally similar to SERD
   (SERD analog)^[182]6. This prompts the question as to the mechanisms by
   which these drugs distinguish themselves at the level of
   transcriptional activity and how and if this could impact their
   clinical use. In our analysis, we evaluated four types of compounds
   (SERMs, SERDs, a SERCA, and ARV-471) from 10 individual studies, using
   ESR1 siRNA knockdown as a reference. Pairwise average gene regulatory
   percentile correlations of each pair of compounds revealed an overall
   positive correlation, with R values ranging between 0.2 and 0.7, except
   for the SERM endoxifen, which showed no correlation with other
   compounds (Fig. [183]2H). ARV-471 and a number of SERDs (including
   fulvestrant, giredestrant, and camizestrant) were generally more
   similar to each other, whereas SERMs/SERM SERD hybrids (including
   tamoxifen, raloxifene, and bazedoxifene) were distinctly clustered from
   each other. Additionally, SERCA H3B-6545 and H3B-9707 showed a more
   similar regulatory landscape. Unsupervised clustering of all genes
   recapitulated an overall similar regulatory trend, despite data from
   the same studies tending to be more alike (Fig. [184]2I). To
   investigate unique pathway regulation by each drug type, we calculated
   normalized enrichment scores (NES) using the MSigDB Hallmark database
   (Fig. [185]2J). Notably, we observed consistent inhibitory effects of
   all drugs towards estrogen response and cell cycle-related signatures,
   as expected. The two SERCAs (H3B-6545, H3B-9709), endoxifen and
   lasofoxifene, specifically inhibited metabolic pathways involved in
   peroxisome and fatty acid metabolism. ARV-471 and camizestrant
   exhibited additional blockade towards notch signaling, Wnt-β catenin,
   and apical surface cell adhesion function, which were not discerned
   with fulvestrant. Furthermore, a strong repression of interferon
   response signaling was observed in with second generation SERDs
   GDC0927, GNE274, and with lasofoxifene. Overall, this analysis
   highlights the functional diversity exhibited by different types of
   ER-modulators despite their similar antiproliferative activities in
   cellular models.

   Context-dependent effects of ER modulators was observed by gene level
   clustering in a few cell lines (Fig. [186]2I), namely MDA-MB-330,
   HCC1500, and CAMA1. These cell lines displayed distinct gene regulatory
   patterns, with some genes even showing opposite regulation (Fig.
   [187]2I). This effect was further reflected in a cell-line based
   clustering, where a major cluster (MCF7, T47D, BT483, BT474, and EFM19)
   and a minor cluster consisting of MDA-MB-330, CAMA1, and HCC1500 were
   segregated (Fig. [188]2K). Additionally, MDA-MB-134VI, an invasive
   lobular carcinoma cell line, exhibited a distinct response pattern
   distinguished from other models, as we reported previously^[189]67. It
   is likely that these differences in response to ER-modulators reflects
   the impact of these ligands on receptor structure and how this
   regulates the interaction of the receptor with functionally distinct
   cell-line specific coregulators. To explore this possibility, we
   conducted ER RIME profiling in 16 ER+ cell lines, including all five in
   the major cluster and CAMA1 in the minor cluster. Among the 15 breast
   cancer lines, we identified 95 proteins that consistently interact with
   ER including the canonical coregulator NCOA5 (Supplementary Fig.
   [190]4E and Supplementary Data[191]3). Further intersection analysis
   identified 42 CAMA1-specific ER interactors (Fig. [192]2L,
   Supplementary Fig. [193]4F and Supplementary Data[194]4). We inferred
   the upstream regulators controlling genes uniquely up and
   down-regulated in CAMA1 normalized to the major cluster cell lines
   using LISA^[195]40 (Supplementary Data[196]4) and overlaid the unique
   ER interactors with the predicted regulators in CAMA1 (Fig. [197]2L and
   Supplementary SFig. [198]4G, H). We identified four overlapping
   targets, including FOXA1, DDX20, and SUPT5H in both groups, and TLE3 as
   a unique ER interactor associated with CAMA1-specific up-regulated
   genes in response to ER modulator treatment. Of note, TLE3 has
   previously been characterized as an ER co-repressor^[199]68. This
   suggests that ER repression in CAMA1 cells might rely heavily on TLE3,
   and dictate a subset of transcriptional responses in this cell line.

Transcriptomic heterogeneity of endocrine resistant breast cancer cell models

   The substantial diversity in response to ER-modulators underscores the
   heterogeneous mechanisms that impact receptor pharmacology and which
   underly resistance. Consequently, we classified the transcriptomic
   features of models with resistance to different types of ER-modulators.
   Our primary focus was on models of tamoxifen resistance (TamR),
   Long-Term Estrogen Deprivation (LTED), and those which harbored ESR1
   mutations, where the number of datasets collected provided sufficient
   statistical power to draw solid conclusions (n = 15 comparisons for
   TamR, n = 16 for LTED, and n = 46 for ESR1 mutation). Correlation
   analysis utilizing average regulatory percentages from all comparisons
   of these three models, along with E2 regulation as a control, revealed
   distinct shifts in estrogenic signaling amongst the three models (Fig.
   [200]3A): 1) ESR1-mutant cells exhibited a high similarity to E2
   treatment (R = 0.46), indicative of their well-characterized
   ligand-independent ER activation. 2) LTED displayed a strong negative
   association with E2 stimulation and ESR1 mutant models, underscoring
   the selection of clones with a repression of ER signaling, potentially
   making their growth less dependent on ER. 3) Transcriptomic features of
   TamR models showed minimal association with global estrogen response.
   Examining the overlap of genes enriched in the top 10% percentile of
   up- or down-regulated targets across at least 40% of comparisons for
   each model highlighted the distinct nature of different
   endocrine-resistant types (Fig. [201]3B and Supplementary Data [202]2)
   with minimal to no overlap of regulated genes. LIN7A emerged as the
   only consistent downregulated target, while ER target genes GREB1, PGR,
   and CA12 were shared down-regulated targets between TamR and LTED
   models. Unsupervised clustering of Hallmark signature alterations
   within each model normalized to their corresponding controls (i.e.,
   parental cells for LTED and TamR, isogenic ESR1 wild-type models for
   ESR1 mutants) further revealed divergent functional shifts that
   correlated with the resistance phenotype (Fig. [203]3B). On average,
   ESR1 mutant cells strongly upregulated the expression of genes
   associated with estrogen signaling, cell cycle progression E2F and G2M
   checkpoints, as well as Wnt- β-catenin signaling. In contrast, LTED and
   TamR models showed a decline in estrogenic and cell cycle progression
   signature enrichment but gained unique features. LTED models
   exclusively exhibited enhanced metabolic functions such as fatty acid
   metabolism, bile acid metabolism, and peroxisome activities, while TamR
   lines displayed a more diverse range of gain-of-functions with several
   exclusively enriched in proinflammatory signatures such as TNFα
   signaling via NFκB, IL6-JAK-STAT3, and inflammatory response (Fig.
   [204]3C). Notably, the evident repression of E2 response signatures but
   lack of a global E2 regulation association in TamR models suggests that
   only a subset of canonical estrogen response program was impacted,
   which might be sufficient to trigger or maintain tamoxifen resistance.
   We also assessed how different endocrine resistance types and ER
   modulators affect gene expressions within the Oncotype DX panel, a
   widely used genetic test for clinical decision making towards early
   stage ER+ breast cancer. We found ER hyperactivation (E2 and ESR1
   mutation) resulted in increased proliferation and estrogen signaling
   genes and repressed HER2 pathway genes while TamR, LTED and
   ER-modulators displayed an inverse pattern (Supplementary Fig.
   [205]5A). ER signaling hyperactivation (E2 treatment group), ER
   modulator treatment and LTED results in overall decreased Oncotype DX
   recurrence score while TamR induces a higher score.

Fig. 3. Transcriptomic heterogeneity of endocrine resistant breast cancer
cell models.

   [206]Fig. 3
   [207]Open in a new tab

   A A heatmap representing the Pearson correlation coefficient between
   average regulatory percentile from each of the two endocrine resistant
   models and estrogen treatment experiments. B Venn diagrams showing the
   overlay of genes that fall into top 10% up- and down-regulation and
   consistent in at least 40% comparisons of each endocrine resistant
   models. C A heatmap depicting the average alteration of GSVA enrichment
   scores of each cell models normalized to the corresponding controls
   from 50 MSigDB Hallmark signatures with unsupervised clustering. D
   Scatter plots showing the correlation of percentage of each gene that
   falls into top 10% up- and down-regulated parts. Color code indicates
   the density of each proportion. Bi-direction genes are highlighted in
   red frame and defined as genes showing top10% activation and repression
   simultaneously in at least 15% comparisons. E A Venn diagram depicting
   the overlapping of bidirectional genes from the three endocrine
   resistant models in (D). F Dot plot showing the significantly enriched
   Hallmark pathways from bidirectional genes of each mode. Pathways are
   prioritized towards commonly changed ones. G Scatter plot showing the
   Pearson correlation between average shift of interferon response
   signatures (mean of interferon response α and γ) and estrogen response
   signatures (mean of estrogen response early and late) in ESR1 mutant
   models and TamR/LTED models respectively. H Heatmaps showing the
   average shift of interferon response signatures and estrogen response
   signatures of each comparison indicated in (G) as rows. I Line plots
   depicting the change of estrogen response signatures (left), interferon
   response signatures (middle) and ESTIMATE immune infiltration scores
   (right) between baseline and surgical samples in responder and
   non-responder from POETIOC clinical trial. Wilcoxon signed rank paired
   sample test was used. J, K Scatter plots representing the Pearson
   correlation between treatment-caused changes of interferon response
   signatures and estrogen response signatures (J) as well as each of them
   towards treatment-induced changes of KI67% (K) of each patient
   separated by responders and non-responders.

   Correlating the probability of a gene being in the top 10% of up- and
   down-regulated genes revealed a non-linear negative association,
   indicating that most genes are regulated in a single direction. The
   majority of these genes showed a correlation below 20%, underscoring
   significant variation among these experimental models (Fig. [208]3D).
   Intriguingly, we identified a subset of bidirectionally regulated genes
   in each model within the top 10% of both upregulated and downregulated
   targets and in at least 15% of comparisons (Fig. [209]3D and
   Supplementary Data [210]2). Five genes were shared among all three
   models, with four of them being involved in innate immune response
   (BST2, IFI27, IFIT1, and OAS1) (Fig. [211]3E and Supplementary Fig.
   [212]5B). Furthermore, unbiased pathway analysis consistently revealed
   enrichment of interferon α and γ response signatures in these
   bidirectional gene sets from all three models, and this divergence was
   not associated with the cell line background (Supplementary Fig.
   [213]5C). Given that the shift in estrogen response is a prominent
   feature of these endocrine-therapy resistant models, we tested whether
   the noted heterogeneity in the innate immune response signatures and
   estrogen response are linked. We separately analyzed this correlation
   in ESR1-mutant and TamR/LTED models, given the previously characterized
   nature of activated or repressed estrogen response programs,
   respectively. Surprisingly, we found that the degree of change in
   estrogen response signature showed a negative association with an
   interferon response signature in both models (Fig. [214]3G), where
   interferon signatures were only elevated in ESR1-mutant models with
   weak estrogen activation and TamR/LTED models with weak estrogen
   repression (Fig. [215]3H). These results highlight the possibility that
   the innate immune cascade serves as a universal negative mediator of
   estrogen signaling in endocrine-therapy resistant breast cancer.

   We examined the relationship between response to endocrine therapy and
   innate immunity using data from the recently published POETIC phase III
   neoadjuvant endocrine therapy trial in postmenopausal women
   (n = 115)^[216]69. We compared baseline and surgical samples after
   treatment with an aromatase inhibitor revealing a significant overall
   decrease in estrogen response signatures in both responders (n = 85)
   and non-responders (n = 30), albeit with considerable inter-patient
   heterogeneity (Fig. [217]3I). Intriguingly, interferon response
   signatures as well as inferred immune infiltration scores were
   increased within the responder group (Fig. [218]3I). Moreover, we found
   that the degree of estrogen response and interferon response associated
   with endocrine therapy were inversely correlated, regardless of patient
   outcome (Fig. [219]3J), suggesting either mutual or mono-directional
   inhibitory effects between the two signals and consistent with the
   observation in cell models. Importantly, we observed a numerically
   significant inverse correlation between interferon response and changes
   in tumor Ki67 levels associated with treatment, specifically in
   non-responders (Fig. [220]3K), while a similar trend was also observed
   between treatment-induced immune infiltration scores and Ki67 levels
   (Supplementary Fig. [221]5D). In conclusion, these analyses unveiled
   that endocrine therapy may trigger interferon response signatures and
   immune activation which is associated with greater endocrine response.
   This sheds light on potential therapeutic avenues for overcoming
   endocrine resistance in breast cancer patients.

Concordance among ESR1-mutant breast cancer cell models

   Among all the endocrine resistance model types, hotspot ESR1-mutant
   models have the largest data set available. These mutations have
   undergone extensive characterization over the past decade, facilitated
   by the establishment of a variety of different cell
   models^[222]70–[223]73. However, the majority of published studies have
   drawn their conclusions from a single cell model, with only a few
   conducting cross-validation experiments through collaborative
   efforts^[224]65,[225]74. Leveraging our comprehensive collection of
   datasets we aimed to address several key questions regarding data
   congruency.

   First, the distinct methods of cell model construction using either
   genome editing (GE), or ectopic overexpression (OE) may introduce
   potential artifacts. We compared transcriptomic regulation in 12
   ectopic overexpression and 32 genome-edited ESR1 mutant models to their
   wild-type control counterparts in the absence of estrogen stimulation.
   Surprisingly, only a weak correlation (R = 0.19) was observed in
   regulatory percentile correlation, highlighting significant inter-model
   discrepancies (Fig. [226]4A). Subsequently, we identified 539 and 133
   genes that were more prevalently regulated in GE and OE models,
   respectively (delta regulatory PCT > 70%), such as BFSP2 and ENTPD8
   (Supplementary Data [227]2 and Supplementary Fig. [228]6A).
   Interestingly, 32 out of 133 (24%) OE-specific targets belonged to the
   category of noncoding RNA such as lncRNA, ncRNA, and miRNA (e.g.,
   LINC00886, Supplementary Fig. [229]6B), suggesting that the
   overexpression step might particularly influence RNA processing.
   Importantly, this elevation was consistent in both transient
   doxycycline-inducible models and long-term stable overexpression models
   (Supplementary Fig. [230]6B). Pathway analysis identified a strong
   enrichment of innate immune response pathways (Interferon, complement,
   and TNFα/NFkB) in GE unique genes, possibly as a result of genome
   disruption. On the other hand, OE unique genes were enriched for
   estrogen response signatures, likely due to the overexpression of
   receptor levels amplifying ER-related differences (Fig. [231]4B). Given
   the differences identified, we examined how each type of model compared
   to clinical samples harboring ESR1 mutations. We harmonized four
   published cohorts comprising 313 ER+ metastatic breast cancer patients,
   consisting of 246 and 67 patients harboring WT and mutant ESR1,
   respectively. Logistic regression modeling, using genes with average
   regulatory PCT above 60% in each model type, revealed that signatures
   from both models could effectively distinguish ESR1 mutant samples from
   WT counterparts, with genome-edited models showing a greater
   performance in terms of specificity and sensitivity trade-off (Fig.
   [232]4C), suggesting a better clinical representation.

Fig. 4. Concordance among ESR1 mutant breast cancer cell models.

   [233]Fig. 4
   [234]Open in a new tab

   A Scatter plot showing Pearson correlation of average regulatory
   percentile between 12 ectopic overexpression models (OE) and 32
   genome-edited (GE) ESR1 mutant models of all the genes. Genes that more
   pronouncedly regulated by each model were highlighted (|delta
   regulatory percentile| >100 between the two models). B Bar graph
   showing the significantly enriched Hallmark pathways in GE and
   OE-preferentially regulated genes in (I). C Receiver operating
   characteristic curve depicting the performance of GE and OE upregulated
   genes (average percentile > 60%) as signatures in distinguishing ESR1
   mutant from WT clinical samples from four merged cohort of 313 samples.
   D Scatter plot showing Pearson correlation of average regulatory
   percentile between ESR1 mutant and estrogen treatment of all the genes.
   Genes are subgrouped into three parts: ligand-dependent genes (average
   percentile above 50% or below −50% in both conditions, red and blue)
   and de novo genes (average percentile above 50% in TamR and within −15
   to 15% in E2 treatment, green). E Box plot representing the BETA score
   comparison of each selected gene subgroups in (D). BETA score were
   calculated based on gained peaks of ESR1 mutant cells merged from 16
   ChIP-seq samples normalized to their corresponding controls.
   Mann–Whitney U test was used. F Scatter plot showing the correlation of
   −log10 p values of LISA predicted regulators from ligand-dependent and
   de novo genes in (A). Only significantly enriched regulators were
   shown, and top targets skewed to each side were labeled. G Scatter plot
   showing correlation of average regulatory percentile between ESR1
   mutant and estrogen treatment of 127 epigenetic modifiers. Targets
   consistently altered in both conditions or uniquely altered in one of
   the conditions were labeled. H Line plots representing the intensity of
   ATAC-seq signals from MCF7 WT and ESR1 mutant cell lines on different
   histone modification regions from ChIP-seq data of MCF7 cell line. I
   Scatter plot showing Pearson correlation of average regulatory
   percentile between Y537S and D538G ESR1 mutant variants of all the
   genes. Genes that more pronouncedly regulated by Y537S and D538G were
   highlighted (|delta regulatory percentile| >70 between the two
   variants). J Scatter plot showing the correlation of −log10 p values of
   LISA predicted regulators from Y537S and D538G-preferrentially
   regulated genes in (I). Only significantly enriched regulators were
   shown and top targets skewed to each side were labeled. K Box plots
   representing the BETA score comparison of each selected gene subgroups
   in (I). BETA score were calculated based on gained peaks of ESR1 mutant
   cells merged from 9 Y537S and 6 D538G ChIP-seq samples normalized to
   their corresponding controls. Mann–Whitney U test was used.

   Multiple research teams, including ours, have reported de novo
   transcriptomic reprogramming by ESR1 mutations, which may confer
   neomorphic gain-of-function phenotypes not observed by ligand-activated
   wild-type ER^[235]15,[236]70,[237]71. However, there are considerable
   differences in results from different groups. We observed a strong
   positive correlation (R = 0.46), between genes regulated in 146
   estrogen-treated experiments and 46 experiments in ESR1-mutant cells
   compared to their corresponding WT controls in the absence of E2 (Fig.
   [238]4D). However, we noted 78 genes showing ligand-independent
   activation and 113 genes showing ligand-independent repression (average
   PCT > 40% in both E2 and ESR1 mutation regulation) (Fig. [239]4D and
   Supplementary Data [240]2), such as EGR3 and FNBP1L, as well as 87 de
   novo targets (average PCT > 40% in ESR1 mutation and -15%-15% in E2
   regulation), such as CNTFR and CRLF1 (Supplementary Fig. [241]6C,[242]
   D). Next, we investigated whether these ligand-independent and de novo
   targets are associated with mutant ER genomic binding. We analyzed
   gained peak sets from 16 ER ChIP-seq datasets in ESR1-mutant models
   normalized to their corresponding wild-type controls from four
   independent studies^[243]32,[244]65,[245]70,[246]72,[247]74, and
   calculated the proximal gene regulatory score (−100 to +100 kb range of
   the TSS) from the peak union using Binding and Expression Target
   Analysis (BETA)^[248]58. De novo genes showed significantly lower BETA
   scores compared to both upregulated and downregulated
   ligand-independent genes (Fig. [249]4E), suggesting these targets are
   less likely to be direct ER transcriptional targets. Utilizing the LISA
   algorithm^[250]40, we predicted potential regulators of these genes.
   Canonical ligand-independent genes were closely associated with ER and
   FOXA1, while numerous epigenetic regulators at the DNA and histone
   levels were linked to de novo genes, including DNA methylation and
   histone modifiers such as KDM5C, JARID2, and KDM2B (Fig. [251]4F).
   Furthermore, we comprehensively assessed the regulatory percentile
   correlations of 127 epigenetic modifiers in ESR1-mutant and
   E2-stimulated conditions (Fig. [252]4G). We found that top
   mutant-specific mediators predominantly regulate histone H3 lysine 27
   (H3K27) methylation, including the unique increase of H3K27me3
   demethylase KDM6B. In line with this, we found H3K27me3 marked regions
   in ESR1 mutant cells presented higher degree of chromatin accessibility
   by projecting ATAC-seq signals from a previous study^[253]70, which was
   not discerned in other histone marked regions such as H3K9me3, H3K27ac
   and H3K4me3 (Fig. [254]4H). In summary, these analyses provide valuable
   insights indicating that ESR1-mutant cells exhibit an ER
   binding-independent gene expression profile via H3K27me3
   modification-driven epigenetic remodeling.

   Y537S and D538G are the two most frequently detected ESR1 mutations in
   metastatic breast cancers resistant to endocrine therapy^[255]65. While
   both mutations contribute to constitutive ER activation, pre-clinical
   studies have also revealed allele-specific
   effects^[256]15,[257]65,[258]70,[259]71, although with limited
   consistency across studies. We correlated the global gene regulatory
   percentiles and hallmark pathway alterations from 18 Y537S and 15 D538G
   cell model comparisons. While we observed a strong positive correlation
   (R = 0.74), we identified 420 and 364 genes that were more prevalently
   regulated in Y537S and D538G variants compared to WT in the absence of
   E2 (Fig. [260]4I and Supplementary Data [261]2), respectively, such as
   SYT3 and SH3TC1 (Supplementary Fig [262]6E). At the pathway level,
   Y537S was enriched for genes encoding proteins associated with
   metabolic functions such as glycolysis, oxidative phosphorylation, and
   fatty acid metabolism, while D538G regulated genes were more associated
   with hedgehog signaling. However, estrogen response and E2F signatures
   were highly upregulated in both variants (Supplementary Fig [263]6F).
   LISA analysis revealed that Y537S-specific genes were consistently
   linked to ESR1 and FOXA1 across different studies, whereas the top
   D538G regulomes involved a diverse range of developmental-related
   regulators such as EOMES and WT1, as well as alternative nuclear
   receptors such as AR and PR (Fig. [264]4J). Integration of ER ChIP-seq
   data from 9 Y537S and 6 D538G specific models revealed that
   Y537S-specific genes showed significantly higher association with Y537S
   ER genomic binding, while such a difference was not observed in the
   D538G ER ChIP-seq profile (Fig. [265]4K). In conclusion, we found
   highly consistent transcriptomic regulation between ESR1 Y537S and
   D538G variants, while each variant possesses unique features. The
   unique alteration in Y537S is likely due to its distinct ER genomic
   binding, while D538G may involve the hijacking of nonconventional
   transcriptional factors to partially reshape its unique transcriptome.

Consensus targets in ESR1-mutant breast cancer cell models

   We have determined that despite the differences in gene expression in
   the ESR1 mutant cell models, they show similarities to the
   transcriptomes identified in human clinical samples. With this in mind,
   we merged the ESR1-mutant cell model data with profiles of clinical
   specimens to pinpoint consistently regulated genes that may have been
   overlooked in previous studies using single models and datasets. We
   focused primarily on upregulated genes, given their potential utility
   as therapeutic targets. Among all 46 comparisons in ESR1 mutant models,
   we identified 25 genes that consistently ranked among the top 10% of
   upregulated targets in at least 50% of comparisons. Notably, twelve of
   these genes also overlapped with differentially upregulated genes (padj
   < 0.1, log2FC > 0) detected in ESR1 mutant versus WT clinical samples
   from at least one of four individual metastatic breast cancer cohorts
   (Fig. [266]5A and Supplementary Fig. [267]7A). Surprisingly, many of
   these genes have not been previously studied in the context of ESR1
   mutation or even estrogen regulation, such as RBM24 and C5AR2.
   Examination of their estrogen regulatory potential revealed that the
   majority of these targets are regulated via ligand-independent ER
   activation, while a few, such as RND2 and NUPR1, are subject to de novo
   regulation (Fig. [268]5B). This finding underscores the potential value
   of comprehensive data analysis in uncovering novel insights that might
   otherwise be missed in single-dataset analyses.

Fig. 5. Consensus targets in ESR1 mutant breast cancer cell models.

   [269]Fig. 5
   [270]Open in a new tab

   A A bar plot ranking the genes by consisteny of falling into top10%
   upregulated targets in ESR1 mutant cell models. Only genes showing
   equal to or above 50% consistency out of 46 comparisons were shown.
   Color code indicates the number of cohorts (four in total) in which
   this gene is also identified as a significantly upregulated target in
   ESR1 mutant samples. B Scatter plot showing the correlation of average
   regulatory percentile between ESR1 mutant and estrogen treatment of 25
   targets in (A). C Box plots showing the expression of NPY1R between
   ESR1 WT and mutant samples from the three cohorts. D, E Bar plot
   showing the distribution of regulatory percentile of NPY1R in all the
   comparisons from ESR1 mutant cells (D), estrogen and ER modulator
   treatments (E) experiments. Percentiles are ranged between −100 to +100
   and larger number indicates stronger regulation. F Genomic track view
   of ER ChIP-seq signal from ESR1 WT and ESR1 mutant models in two cell
   lines from [271]GSE148277. Genomic track scale is adjusted to the
   identical level for all the samples. G Box plots showing the relative
   expression of NPY1R in genome-edited MCF7 and T47D ESR1 mutant cells
   and WT treated with E2 condition. ∆ ∆Ct method is used and p values
   were calculated using one-way ANOVA. H Line plot showing the growth
   rate of T47D WT and ESR1 mutant cells in the presence of scramble
   (siNT) and NPY1R siRNA transfection for 8 days. Two-way ANOVA was used
   for statistic comparison. Experiment has been repeated for 3 times.

   Of particular interest, we observed two neuropeptide Y receptor (NPYR)
   genes, NPY1R and NPY5R, among the top three consistently upregulated
   targets in all the ESR1 mutant models. NPY1R has been studied in the
   context of endocrine sensitivity in ER+ breast cancer^[272]75 although
   it specific role(s) in ESR1 mutant breast cancers has not been
   explored. Initially, we confirmed a significant upregulation of NPY1R
   in ESR1 mutant tumors in three out of four clinical cohorts (Fig.
   [273]5C), along with its consistent elevation in ESR1 mutant cell
   models (irrespective of cell line and ESR1 mutant type) (Fig. [274]5D
   and Supplementary Fig. [275]7B). Notably, NPY1R demonstrated dominant
   upregulation upon E2 stimulation, while its expression was suppressed
   by inhibitors of ER signaling (Fig. [276]5E). Furthermore, we
   consistently observed gained ER binding at the proximal region of the
   NPY1R locus in ESR1 mutant cells (Fig. [277]5F). Taken together, these
   findings identify NPY1R as one of the most consistent and universal
   upregulated targets in ESR1 mutant breast cancer, likely driven by the
   constitutive activation of mutant ER and their increased binding
   activity at this gene locus. To further investigate its role in
   endocrine resistance in ESR1 mutant cells, we conducted knockdown
   experiments using MCF7 and T47D genome-edited cell models^[278]71.
   qRT-PCR validated its elevated expression in both T47D cell lines and
   MCF7 Y537S models (Fig. [279]5G). Remarkably, knockdown of NPY1R
   significantly inhibited the growth of T47D WT cells under E2
   stimulation, as well as the ligand-independent growth of ESR1 mutant
   cells (Fig. [280]5H and Supplementary Fig. [281]7C), underscoring the
   essential role of NPY1R in estrogen-dependent and ESR1 mutant breast
   cancer cell growth.

Discussion

   The increase in multiomic cancer data represents an invaluable
   resource, yet integration and cross-comparison between individual
   studies poses significant challenges. The initial EstroGene
   database^[282]35 laid a solid foundation by providing a unified
   analysis of multiple datasets and offering a versatile tool for
   comprehensive exploration of transcriptional responses to estrogens in
   validated models of breast cancer. EstroGene2.0 represents a
   substantial enhancement by incorporating experimental data on responses
   to endocrine response and on resistance to various interventions.
   EstroGene2.0 has a robust and user-friendly interface, empowering
   researchers to assess individual genes or gene signatures generated
   under specific experimental conditions. With access to both gene
   expression data and ER proximity binding information, users can
   investigate cross-dataset consistency. EstroGene2.0 provides an
   unprecedented opportunity to dissect the technical and biological
   intricacies inherent in response to endocrine therapies.

   In this study, we identified a more heterogeneous transcriptomic
   response in tamoxifen treatment compared to fulvestrant. This disparity
   may stem from the fact that tamoxifen is a partial ER agonist^[283]76.
   For instance, the diverse transcriptomic reprogramming triggered by
   different doses of tamoxifen may arise from the recruitment of distinct
   co-activators and the distinct status of monomeric versus dimeric ER.
   It is also plausible that there were greater dose-dependent variations
   in tamoxifen than fulvesntrant as indicated in Supplementary Fig.
   [284]4A, [285]B. Our analysis revealed that a concentration of 100
   nmol/L tamoxifen in vitro is sufficient to block estrogen signaling. In
   contrast, a dose of 1 μmol/L predominantly resulted in alterations in
   other signaling pathways, such as TNF-α signaling and hypoxia,
   suggesting potential off-target effects due to an oversaturation of the
   dose. Notably, a single 20 mg dose of tamoxifen (considered
   physiological) in human only results in a peak plasma level of
   approximately 20 ng/mL (or 54 nmol/L) after 24 h^[286]77. Therefore,
   significant caution must be exercised when selecting appropriate doses
   of ER modulators for in vitro studies. It is crucial to consider
   physiologically relevant concentrations, such as 100 nmol/L for both
   4-hydroxy tamoxifen (4OHT) and fulvestrant, to avoid potential false
   positives arising from off-target effects. We also observed that 32% of
   the experiments (17 out of 53) in our database did not specify the type
   of tamoxifen compound used (e.g., 4OHT or another form). This lack of
   detail may contribute to the observed heterogeneity in transcriptomic
   response patterns. To enhance reproducibility, we recommend that future
   studies clearly document the specific compound details, including the
   commercial catalog number. Moreover, despite an overall positive
   correlation between gene regulation across all ER-modulators evaluated,
   each compound type can regulate distinct gene clusters. Notably, the ER
   LDD ARV-471 exhibited similar effects to SERDs, attributable to their
   shared mechanism of ER downregulation. Among SERMs, tamoxifen,
   raloxifene, and bazedoxifene elicited markedly different effects
   compared to lasofoxifene and endoxifen. While this variation may
   reflect the limited experiments included for the latter two compounds
   and the differential distribution of cell model and modality used, it
   is plausible that different SERMs confer unique molecular portraits,
   warranting further investigation. It is also possible that these agents
   may differentially regulates other estrogen receptors such as ERβ and G
   Protein-Coupled Estrogen Receptor (GPER). Furthermore, we observed a
   tight link between transcriptomic response and cell line context.
   Integration with RIME data revealed the differential dependency of
   cell-line-specific transcriptional regulators. For instance, ER
   demonstrated a stronger binding affinity with co-repressor TLE3 in the
   CAMA1 cell line, and ER modulator treatment could more profoundly
   reshape TLE3/ER-related downstream targets, resulting in a distinct
   molecular signature compared to other cell lines. Importantly, the
   identification of these context-dependent effects underscores the
   necessity for future studies to encompass a wider spectrum of models
   when testing ER modulators. The examination of more clinically relevant
   models, such as patient-derived organoids and immunocompetent animal
   models, is warranted to further elucidate the complex interplay between
   ER modulators and complexed breast cancer biology.

   Endocrine resistance presents a complex landscape of molecular
   determinants, as our integrated analysis revealed. We observed
   interferon response signaling showing as a consistent negative
   association with estrogen signaling shift. Mutual regulation between
   these two axes has been previously reported in wild-type breast cancer
   cells. Specifically, activation of ER signaling can inhibit type I
   interferon response by restricting the engagement of the IFN-stimulated
   gene factor 3 (ISGF3) complex at the ISG promoters and disrupting the
   ISGF3 complex via interaction with STAT2^[287]78. Additionally, it has
   been demonstrated that the addition of type I and II interferon
   treatment can enhance the anti-proliferative effects of tamoxifen in
   breast cancer cell lines^[288]79. These findings suggest that the
   plasticity of interferon response may buffer the alterations in
   estrogen signaling seen in endocrine-resistant settings, either
   boosting ER signaling in ER repressive models like LTED or attenuating
   ER signaling in ER hyperactivation models like ESR1 mutants. Combining
   therapies that increase interferon signaling with standard-of-care ER
   modulator treatment may offer a potential avenue to overcome this
   resistance. Analysis from the POETIC clinical trial^[289]48 also
   highlights a potential prognostic role of interferon response and
   predicted immune infiltration scores in endocrine-resistant cases,
   suggesting this cell-autonomous interferon signaling variation might
   directly affect immune response in patients. However, caution should be
   exercised when interpreting these data, as the interferon response
   signature from bulk tumor sequencing may also be influenced by
   different immune cell subtypes and their activities. Additionally, the
   source of divergent interferon response in these models warrants
   further investigation. One possibility relates to the method of model
   construction; for instance, tamoxifen is known to induce DNA
   damage^[290]80, which could trigger innate immune responses via the
   cGAS-STING pathway. Therefore, the varying doses and durations of
   tamoxifen selection in various studies could introduce different
   degrees of interferon response, and some selected clones may harbor
   mitigated interferon signaling. It once again emphasizes the importance
   of incorporating optimal control into experimental design, particularly
   for models undergone extensive engineering or selection.

   ESR1 mutations are likely drivers of resistance to endocrine therapy
   and are associated with metastatic disease. While pre-clinical cell
   models have extensively probed this phenomenon, the consistency of
   findings from in vitro models remains to be examined. In our prior
   studies we uncovered epigenetic dysregulation in ESR1 mutant cells,
   notably the pronounced enrichment of FOXA1, CTCF, and OCT1 in open
   chromatin present in ESR1 mutant cells^[291]70, alongside alterations
   in chromatin interaction patterns driven by CTCF/cohesin complex
   remodeling^[292]74. Building upon this foundation, investigation of 46
   ESR1 mutant transcriptomic profiles highlights a tight association
   between epigenetic regulators and de novo transcriptomes. This
   correlation may stem from increased accessibility at facultative
   heterochromatin regions marked by H3K27me3, while changes at
   H3K27ac-marked regions were not discerned. These insights underscore
   how ESR1 mutations, by reducing repressive effects at existing
   heterochromatin marks, can sufficiently reshape the transcriptome, and
   the elevation of H3K27me3 demethylase KDM6B in ESR1 mutant cells
   supports this proposition. Furthermore, our observations point towards
   the involvement of DNA methylation in regulating these de novo genes,
   aligning with findings of mutant-specific alterations in DNA
   methyltransferase, such as DNMT3B. Notably, previous pharmacological
   studies have revealed heightened sensitivity of ESR1 mutant cell models
   towards epigenetic modifier inhibitors like OTX015, which targets the
   bromodomain and extra terminal domain (BET)^[293]81. These findings
   underscore the potential for further exploration into the mechanisms of
   epigenetic reprogramming in ESR1 mutant breast cancer, offering a
   promising avenue for uncovering novel vulnerabilities and therapeutic
   strategies.

   Previous preclinical studies have shed light on allele-specific
   functions of ESR1 Y537S and D538G variants, an idea bolstered by
   several clinical observations. For example, the BOLERO2 trial showcased
   differences in overall survival and response to everolimus treatment
   between tumors harboring Y537S and D538G mutations^[294]82. Our
   comprehensive analysis, comparing 18 Y537S and 15 D538G-induced
   transcriptomic experiments, reveals an overall high degree of
   similarity but we also note unique biology attributable to each mutant.
   Specifically, our findings suggest that Y537S variant-specific
   transcriptomes are predominantly regulated by ER genomic binding,
   whereas the shifts unique to D538G variants stem from a diverse array
   of ER-independent functions. This aligns with previous clinical data
   analyses indicating that Y537S mutant tumors exhibit a more pronounced
   enhancement of ER signaling^[295]65. Given that these models
   predominantly utilize MCF7 and T47D cell lines, there is an urgent need
   to establish additional ESR1-mutant cell lines and PDX and organoid
   models with diverse genetic backgrounds to corroborate these
   observations. Lastly, leveraging our comprehensive data collection, we
   identified several consistently upregulated genes in both ESR1 mutant
   models and clinical samples, many of which have not been previously
   described. Notably, two neuropeptide Y receptor (NPYR) genes, NPY1R and
   NPY5R, emerged among the top three genes in our analysis. Our
   additional experiments validated the increased expression of NPY1R in
   ESR1 mutant cells and its role in conferring growth dependency.
   Interestingly, a recent study also identified NPY1R as an estrogen
   (E2)-activated gene in breast cancer cell lines^[296]75, and showed
   that activation of NPY1R by ligand stimulation partially inhibited
   E2-driven cell proliferation. This finding contrasts with our finding
   that knockdown of NPY1R reduced estrogen stimulated and ESR1 mutant
   mediated growth, however, our studies were performed in the absence of
   NPY1R ligand activation. It is plausible that the inactive state of
   NPY1R itself may engage in crosstalk with other signaling pathways at
   baseline, contributing to the ligand-independent growth phenotype in
   ESR1 mutant cells, given previous studies demonstrating its extensive
   crosstalk network in central nervous system diseases^[297]83.

   In conclusion, the EstroGene2.0 database stands as a user-friendly
   platform tailored for the analysis and visualization of endocrine
   response and resistance in breast cancer. Similar to our initial
   observations stemming from Estrogene1.0, meta-analysis of multiple
   experiments and model systems reveals numerous novel findings not
   possible from single model studies. Our future vision extends beyond
   its current capabilities, as we aim to enrich the platform with
   additional epigenetic datasets, including histone modifications (e.g.,
   histone ChIP-seq) and chromatin interaction profiles (e.g., Hi-C),
   thereby providing deeper mechanistic insights in the future.
   Furthermore, our roadmap includes the integration of datasets from
   clinical trials involving various endocrine therapies in ER+ breast
   cancer. This expansion will culminate in the development of a consensus
   network informed by user input from clinical settings, facilitating
   informed decision-making in patient care. Moreover, our studies have
   generated a plethora of hypotheses, underscoring the need for future
   experimental validation and analysis of clinical specimens. We
   anticipate that the EstroGene2.0 database will serve as a catalyst in
   the quest to overcome resistance to endocrine therapy in breast cancer
   patients and beyond, ultimately contributing to improved patient
   outcomes.

Supplementary information

   [298]Supplementary Information^ (2.2MB, pdf)
   [299]Supplementary Data1^ (46.4KB, xlsx)
   [300]Supplementary Data2^ (37.8KB, xlsx)
   [301]Supplementary Data3^ (519.3KB, xlsx)
   [302]Supplementary Data4^ (26.5KB, xlsx)

Acknowledgements