Abstract

   Microarray technology (Human OneArray microarray, phylanxbiotech.com)
   was used to compare the gene expression profiles of non-invasive MCF-7
   and invasive MDA-MB-231 breast cancer cells exposed to dioscin (DS), a
   steroidal saponin isolated from the roots of wild yam (Dioscorea
   villosa). Differentially expressed genes (DEG) were identified first,
   followed by pathway enrichment analysis (PEA). Of the genes queried on
   the OneArray, we identified 4641 DEG that differed between
   vehicle-treated MCF-7 and MDA-MB-231 cells at a cut-off of log2 |fold
   change|≧1. Among these genes, 2439 were upregulated and 2002 were
   downregulated. Exposure of these cells to DS (2.30 μM, 72 h) yielded
   801 (MCF-7) and 96 (MDA-MB-231) DEG that differed significantly from
   the untreated cells (p<0.05). Within these gene sets, DS upregulated
   395 genes and downregulated 406 genes in MCF-7 cells, and upregulated
   36 genes and downregulated 60 genes in MDA-MB-231 cells. A further
   comparison between DS-exposed MCF-7 and MDA-MB-231 cells identified
   3626 DEG, of which 1700 were upregulated and 1926 were downregulated.
   In the PEA, 12 canonical pathways were significantly altered between
   the two vehicle-treated cell lines. DS exposure altered none of these
   pathways in MCF-7 cells, while in MDA-MB-231 cells only the MAPK
   pathway was significantly altered. When the DS-exposed cell lines were
   compared with each other, only 2 pathways were significantly affected.
   Finally, intersection analysis (Venn diagram) was used to identify the
   DEG targeted by DS in both MCF-7 and MDA-MB-231 cells. Seven DEG
   overlapped, of which six are reported in the database. These data
   highlight the diverse gene networks and pathways in MCF-7 and
   MDA-MB-231 human breast cancer cell lines treated with dioscin.
     __________________________________________________________________

   Specification Table
   Subject area: Biology
   More specific subject area: Breast cancer
   Type of data: Table, figure
   How data was acquired: Microarray analysis performed by Phalanx
   Biotech Group using the Human OneArray (array version HOA 6.1), which
   contains 31,741 mRNA probes that can detect 20,672 genes in the human
   genome.
   Data format: Analyzed
   Experimental factors: Both MCF-7 and MDA-MB-231 cells (~500×10^3
   cells) were treated with DS (2.30 µM) for three days, followed by RNA
   extraction and analysis.
   Experimental features: MCF-7 and MDA-MB-231 cells were cultured in
   phenol red-free DMEM-F12 (1:1) medium supplemented with 10% dextran
   charcoal-treated fetal bovine serum, 50 U/mL penicillin and 50 µg/mL
   streptomycin (Pen-Strep), and 2 mM l-glutamine at 37 °C in a
   humidified atmosphere of 95% air and 5% CO2. The cells (~500×10^3
   cells) were allowed to attach in 25 cm^2 culture flasks in a 6 mL
   volume, and after 24 h the cultures were treated with DS (2.30 µM) for
   three days.
   Data source location: N/A
   Data accessibility: Data are within this article and available at the
   NCBI database via GEO accession numbers GSE79465, GPL19137,
   GSM2095708, GSM2095709, and GSM2095710.

   Value of the data
     * May stimulate further research on the utility of DS as a preventive
       agent of metastatic breast cancer.
     * May facilitate new therapies to target specific genes that are
       associated with metastatic breast cancer.
     * Genes participating in MAPK signaling pathways are the probable
       targets of breast cancer metastasis.

1. Data

   Table 1 shows the global gene expression profile of MCF-7 and
   MDA-MB-231 cell lines treated with vehicle (DMSO) or DS in vitro.
   Tables 2, 3, and 4 show the gene ontology analysis based on molecular
   functions (Table 2), biological processes (Table 3), and cellular
   components (Table 4). The canonical pathways that were significantly
   altered between the vehicle-treated cell lines or after DS treatment
   are presented in Table 5. The genes that overlapped between the two
   cell lines (MCF-7 and MDA-MB-231) after DS treatment are listed in
   Table 6 and shown as a Venn diagram in Fig. 1.

Table 1.

   Number of differentially expressed genes in MCF-7 and MDA-MB-231 cells.
           Comparison        Up-regulated (number) Down-regulated (number)
   1 MCF-7C/MDA-MB-231C      2439                  2002
   2 MCF-7C/MCF-7T           395                   406
   3 MDA-MB-231C/MDA-MB-231T 36                    60
   4 MCF-7T/MDA-MB-231T      1700                  1926
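
   The up/down counts in Table 1 follow from the log2 |fold change|≧1
   cut-off. The following is a minimal sketch of that labeling step only;
   the function names, the example intensities, and the choice of which
   sample is the numerator are assumptions for illustration, and the
   significance filter (p<0.05) that the article also applies is not
   modelled here.

```python
import math

LOG2_CUTOFF = 1.0  # |log2 fold change| >= 1, the cut-off stated in the article


def classify(intensity_a, intensity_b, cutoff=LOG2_CUTOFF):
    """Label one gene from two normalized intensities (ratio a over b)."""
    log2_ratio = math.log2(intensity_a / intensity_b)
    if log2_ratio >= cutoff:
        return "up"
    if log2_ratio <= -cutoff:
        return "down"
    return "unchanged"


def count_deg(pairs):
    """Count up- and downregulated genes in a list of (a, b) intensity pairs."""
    labels = [classify(a, b) for a, b in pairs]
    return {"up": labels.count("up"), "down": labels.count("down")}


# Hypothetical normalized intensities, not taken from the data set.
example = [(800.0, 200.0), (150.0, 600.0), (310.0, 300.0)]
print(count_deg(example))
```

   A gene whose intensity changes by less than twofold in either
   direction is left out of both counts.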

Table 2.

   Gene ontology analysis based on molecular functions.
                                             Number of genes in overlap
                                    Genes    MCF-7    MDA-MB-231  MCF-7 (C)/      MCF-7 (T)/
   Gene set name                    in set   (T/C)    (T/C)       MDA-MB-231 (C)  MDA-MB-231 (T)
   Magnesium ion binding            452      38*      –           125*            97
   Cytokine activity                195      –        8*          –               –
   Enzyme binding                   523      38       –           141*            109
   Actin binding                    326      23       –           95*             76
   Cytoskeletal protein binding     504      –        –           135*            102
   Purine ribonucleotide binding    1836     95       –           410*            306
   Ribonucleotide binding           1836     95       –           410*            –
   Purine nucleotide binding        1918     96       –           424*            323
   Nucleotide binding               2245     110      –           485*            –
   Adenyl ribonucleotide binding    1497     81       –           332*            –
   ATP binding                      1477     81                   328*            251
   Protein domain specific binding  331      –        –           89*             –
   Nucleoside binding               1612     84       –           353*            278
   Purine nucleoside binding        1601     83       –           350*            273
   Adenyl nucleotide binding        1577     82       –           345*            270
   Transcription factor binding     513      29       –           127*            –
   Enzyme activator activity        335      21       –           88*             62

   The asterisk (*) indicates q<0.05 [3].

Table 3.

   Gene ontology analysis based on biological process.
                                                            Number of genes in overlap
                                                   Genes    MCF-7    MDA-MB-231  MCF-7 (C)/      MCF-7 (T)/
   Gene set name                                   in set   (T/C)    (T/C)       MDA-MB-231 (C)  MDA-MB-231 (T)
   Protein complex biogenesis                      505      47*      –           129*            –
   Protein complex assembly                        505      47*      –           129*            –
   Macromolecular complex assembly                 665      55*      –           –               –
   Macromolecular complex subunit organization     710      56*      –           165             –
   Protein oligomerization                         174      20*      –           50              –
   Protein amino acid phosphorylation              667      47*      –           156             –
   Protein heterooligomerization                   52       10*      –           17              –
   Negative regulation of cell proliferation       361      22       10*         81              86*
   Cell cycle                                      776      46       –           210*            –
   Regulation of cell death                        815      52       11          205*            165*
   Regulation of apoptosis                         804      52       11          202*            163*
   Induction of programmed cell death              321      21       –           94*             73*
   Regulation of programmed cell death             812      52       11          202*            163*
   Induction of apoptosis                          320      21       –           93*             72*
   Positive regulation of cell death               435      27       –           119*            95*
   Cell cycle process                              565      34       –           147*            –
   Regulation of binding                           153      78       4           52*             42*
   Positive regulation of programmed cell death    433      27       –           117*            94*
   Positive regulation of apoptosis                430      27       –           116*            93*
   Cell death                                      719      47       8           176*            146*
   Mitotic cell cycle                              370      –        –           100*            –
   Cell division                                   295      –        –           83*             –
   Death                                           724      47       8           176*            –
   Programmed cell death                           611      40       8           152*            –
   Apoptosis                                       602      38       8           150*            –
   Regulation of DNA binding                       121      –        4           41*             35*
   Regulation of cell proliferation                787      48       12          182             183*
   Positive regulation of cell proliferation       414      29       –           –               97*
   Cell proliferation                              436      28       –           110             99*
   Neuron differentiation                          438      –        –           –               98*
   Death                                           724      47       8           –               146*
   Regulation of locomotion                        192      15       –           56              50*
   Cell migration                                  276      –        5           69              66*
   Regulation of cell motion                       193      16       –           56              50*
   Blood vessel development                        245      –        –           64              60*
   Neuron projection development                   256      –        –           –               62*
   Vasculature development                         251      –        –           64              61*
   Cell projection organization                    368      –        –           91              82*
   Regulation of cellular component size           271      –        6           66              64*
   Transmembrane receptor protein serine/threonine
     kinase signaling pathway                      103      12       –           35              31*
   Regulation of cell migration                    169      14       –           51              44*
   Hemopoietic or lymphoid organ development       260      –        –           60              61*
   Positive regulation of developmental process    278      18       6           72              64*
   Axon guidance                                   107      –        –           –               31*
   Hemopoiesis                                     236      –        –           –               56*
   Positive regulation of locomotion               98       12       –           32              29*
   Locomotory behavior                             274      –        –                           63*
   Response to vitamin                             66       –        –                           22*

   The asterisk (*) indicates q<0.05 [3].

Table 4.

   Gene ontology analysis based on cellular component.
                                                          Number of genes in overlap
                                                 Genes    MCF-7    MDA-MB-231  MCF-7 (C)/      MCF-7 (T)/
   Gene set name                                 in set   (T/C)    (T/C)       MDA-MB-231 (C)  MDA-MB-231 (T)
   Membrane-enclosed lumen                       1856     111*     –           397*            –
   Organelle lumen                               1820     108*     –           391*            300
   Intracellular organelle lumen                 1779     106*     –           382*            291
   Nuclear lumen                                 1450     91*      –           312*            243
   Nucleoplasm                                   882      62*      –           186             –
   Intracellular non-membrane-bounded organelle  2596     134*     –           –               –
   Non-membrane-bounded organelle                2596     134*     –           –               –
   Cytosol                                       1330     74*      –           285*            –
   Cytoskeleton                                  1381     74*      –           –               –
   Nuclear matrix                                56       9*       –           –               –
   Nuclear periphery                             61       9*       –           –               –
   Extracellular space                           685      –        12*         –
   Extracellular region part                     960      –        14*         –               –
   Lytic vacuole                                 211      17       –           71*             56*
   Lysosome                                      211      –        –           71*             56*
   Vacuole                                       252      18       –           79*             62*
   Basolateral plasma membrane                   203      14       –           64*             –
   Non-membrane-bounded organelle                2596     134      –           543*            –
   Intracellular non-membrane-bounded organelle  2596     134      –           543*            413*
   Anchoring junction                            172      14       –           52*             46*
   Adherens junction                             155      –        –           48*             41*
   Golgi apparatus                               872      –        –           197*            150
   Mitochondrion                                 1087     56       –           239*            –
   Cell fraction                                 1083     –        –           237*            209*
   Nucleolus                                     698      –        –           107*            129
   Cell leading edge                             138      –        –           41*             37*
   Extracellular matrix                          345      –        5           –               78*
   Insoluble fraction                            839      –        –           –               159*

   The asterisk (*) indicates q<0.05 [3].

Table 5.

   Gene set enrichment analysis based on the canonical pathway.
                                              Number of genes in overlap
                                     Genes    MCF-7    MDA-MB-231  MCF-7 (C)/      MCF-7 (T)/
   Gene set name                     in set   (T/C)    (T/C)       MDA-MB-231 (C)  MDA-MB-231 (T)
   MAPK signaling pathway            267      –        7*          70              56
   Pathways in cancer                328      27       –           99*             76*
   Apoptosis                         87       –        –           34*             23
   Lysosome                          117      –        –           41*             37*
   VEGF signaling pathway            75       –        –           29*             –
   Focal adhesion                    201      –        –           60*             –
   Prostate cancer                   89       –        –           32*             –
   mTOR signaling pathway            52       –        –           21*             –
   Pancreatic cancer                 72       –        –           26*             –
   Colorectal cancer                 84       –        –           29*             –
   Renal cell carcinoma              70       –        –           25*             –
   Regulation of actin cytoskeleton  215      16       –           59*             –
   Small cell lung cancer            84       –        –           28*             –

   The asterisk (*) indicates q<0.05 [3].
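
   The overlap counts in Table 5 are the input to an enrichment test.
   DAVID's EASE score is a modified one-sided Fisher exact test in which
   one gene is removed from the overlap before the p value is computed.
   The sketch below illustrates that idea with a hypergeometric tail; the
   function names and all numbers in the example call are hypothetical,
   and it does not reproduce DAVID's own background gene set or its
   multiple-testing adjustment.

```python
from math import comb


def hypergeom_tail(k, list_size, set_size, population):
    """P(X >= k) for X = number of pathway genes in a random gene list.

    population: background genes, set_size: genes in the pathway,
    list_size: genes in the submitted list.
    """
    if k <= 0:
        return 1.0
    upper = min(list_size, set_size)
    total = comb(population, list_size)
    hits = sum(
        comb(set_size, i) * comb(population - set_size, list_size - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def ease_score(overlap, list_size, set_size, population):
    """Fisher-type p value computed on (overlap - 1): more conservative."""
    return hypergeom_tail(overlap - 1, list_size, set_size, population)


# Hypothetical illustration: 3 of 300 list genes fall in a 40-gene pathway,
# against an assumed background of 30,000 genes.
fisher_p = hypergeom_tail(3, 300, 40, 30000)
ease_p = ease_score(3, 300, 40, 30000)
print(f"Fisher p = {fisher_p:.4f}, EASE score = {ease_p:.4f}")
```

   Because one gene is subtracted from the overlap, the EASE score is
   never smaller than the plain Fisher p value, which penalizes pathways
   supported by only a few genes.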

Table 6.

   List of genes overlapped between the two cell lines.
                                                                      Log2 (ratio)
                                                                      MDA-MB-231C/  MCF-7T/       MDA-MB-231T/  MDA-MB-231T/
   Gene symbol  Description of the gene                               MCF-7C        MCF-7C        MDA-MB-231C   MCF-7T
   ERRFI1       ERBB receptor feedback inhibitor 1                    0.01          1.33          1.06          −0.35
   MMP1         Matrix metallopeptidase 1 (interstitial collagenase)  1.59          2.70          2.09          0.96
   SOD2         Superoxide dismutase 2, mitochondrial                 2.54          1.04          1.08          2.61
   IL24         Interleukin 24                                        −0.93         1.44          2.86          0.37
   PTRF         Polymerase I and transcript release factor            −1.54         −2.35         −1.03         −0.23
   ALKBH5       AlkB, alkylation repair homolog 5 (E. coli)           −0.70         −1.36         −1.01         −0.40
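
   The values in Table 6 are log2 ratios, so a value of 1 corresponds to
   a twofold change. As a small worked example, the MCF-7T/MCF-7C column
   can be converted to linear fold changes as follows; the helper name
   and the signed-fold convention (negative values for decreases) are
   assumptions made for illustration.

```python
# Log2 ratios copied from the MCF-7T/MCF-7C column of Table 6.
log2_ratios = {
    "ERRFI1": 1.33,
    "MMP1": 2.70,
    "SOD2": 1.04,
    "IL24": 1.44,
    "PTRF": -2.35,
    "ALKBH5": -1.36,
}


def to_signed_fold(log2_ratio):
    """Return +x for an x-fold increase and -x for an x-fold decrease."""
    fold = 2.0 ** abs(log2_ratio)
    return fold if log2_ratio >= 0 else -fold


for gene, value in log2_ratios.items():
    print(f"{gene}: log2 ratio {value:+.2f} -> {to_signed_fold(value):+.2f}-fold")
```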

Fig. 1.

   Venn diagram of the overlap among DEGs of MCF-7 and MDA-MB-231 cells
   exposed to DS (2.30 µM, 72 h). The MCF-7 and MDA-MB-231 cells shared
   seven genes, of which six were found in the database.
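
   The intersection behind a two-set Venn diagram can be sketched with
   set operations. The six shared symbols below are those listed in
   Table 6; the GENE_A, GENE_B, and GENE_C entries are hypothetical
   placeholders standing in for cell line-specific DEG, and the function
   name is an assumption for illustration.

```python
def venn_regions(deg_a, deg_b):
    """Split two DEG sets into the three regions of a two-set Venn diagram."""
    return {
        "only_a": deg_a - deg_b,
        "shared": deg_a & deg_b,
        "only_b": deg_b - deg_a,
    }


table6_genes = {"ERRFI1", "MMP1", "SOD2", "IL24", "PTRF", "ALKBH5"}

# Placeholder gene names mark DEG found in one cell line only (hypothetical).
mcf7_deg = table6_genes | {"GENE_A", "GENE_B"}
mda_mb_231_deg = table6_genes | {"GENE_C"}

regions = venn_regions(mcf7_deg, mda_mb_231_deg)
print(sorted(regions["shared"]))
print(len(regions["only_a"]), len(regions["only_b"]))
```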

2. Experimental design, materials and methods

2.1. Cell culture, DS treatment, and extraction of nucleic acids

   The detailed procedures for cell culture, DS treatment, and RNA
   isolation have been described in our previous study [1]. In brief,
   human breast adenocarcinoma MCF-7 (ER^+) and MDA-MB-231 (ER^−) cells
   were maintained in phenol red-free DMEM-F12 (1:1) medium supplemented
   with 10% dextran charcoal-treated fetal bovine serum, 50 U/mL
   penicillin, 50 µg/mL streptomycin, and 2 mM l-glutamine. The cells
   (~500×10^3 cells) were allowed to attach in 25 cm^2 culture flasks in
   a 6 mL volume for 24 h before treatment with DS (2.30 µM) for three
   days. After complete removal of the media, the cells were trypsinized,
   resuspended in medium, and washed twice with PBS. RNA was extracted
   with Trizol reagent as described previously [1]. Briefly, Trizol
   reagent (Invitrogen, Carlsbad, CA) was used to lyse the cells, and
   chloroform was added to the lysate for phase separation. The aqueous
   phase (RNA) was transferred to a clean 1.5 mL Eppendorf tube and the
   RNA was precipitated with 2-propanol. After a quick wash in 75%
   ethanol, the extracted RNA was dissolved in nuclease-free water. The
   extracted RNA was further treated with DNase I (Promega, Madison, WI)
   to remove any DNA contamination. Finally, the RNA concentration was
   determined with a NanoDrop 2000c (Thermo Fisher Scientific, Waltham,
   MA), and the samples were stored at −80 °C until they were sent to
   Phalanx Biotech Group for microarray analysis.

2.2. Microarray analysis

   Microarray analysis was carried out by Phalanx Biotech Group using the
   OneArray (array version HOA 6.1), which contains 31,741 mRNA probes
   that can detect 20,672 genes in the human genome. In brief, the purity
   of the extracted RNA was checked using a NanoDrop ND-1000. The pass
   criteria for the absorbance ratios were A260/A280≥1.8 and
   A260/A230≥1.5. RNA integrity was determined from RIN values obtained
   with the Agilent RNA 6000 Nano assay; the pass criterion was RIN >6.
   Genomic DNA (gDNA) contamination was evaluated by gel electrophoresis.
   Any RNA that did not meet these criteria was excluded from the
   analysis.
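
   The numeric pass criteria above can be expressed as a simple filter.
   This is only an illustrative sketch: the class, field names, and
   sample readings are hypothetical, and the gel-based gDNA check is not
   modelled because it is not a numeric threshold.

```python
from dataclasses import dataclass

# Thresholds as stated in the text.
MIN_A260_A280 = 1.8
MIN_A260_A230 = 1.5
MIN_RIN_EXCLUSIVE = 6.0  # RIN must be greater than 6


@dataclass
class RnaSample:
    name: str
    a260_a280: float
    a260_a230: float
    rin: float


def passes_qc(sample):
    """True when the sample meets all three numeric pass criteria."""
    return (
        sample.a260_a280 >= MIN_A260_A280
        and sample.a260_a230 >= MIN_A260_A230
        and sample.rin > MIN_RIN_EXCLUSIVE
    )


# Hypothetical readings, not taken from the study.
samples = [
    RnaSample("sample_1", 2.02, 2.10, 9.4),
    RnaSample("sample_2", 1.75, 1.90, 8.8),  # fails A260/A280
    RnaSample("sample_3", 1.95, 1.60, 6.0),  # fails RIN (must exceed 6)
]
accepted = [s.name for s in samples if passes_qc(s)]
print(accepted)
```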

   Target preparation was performed using an Eberwine-based amplification
   method with the Amino Allyl MessageAmp II aRNA Amplification Kit
   (Ambion, AM1753) to generate amino-allyl antisense RNA (aa-aRNA).
   Labeled aRNA coupled with NHS-CyDye (Cy5) was prepared and purified
   prior to hybridization. Purified coupled aRNA was quantified using a
   NanoDrop ND-1000; the pass criterion for CyDye incorporation
   efficiency was >15 dye molecules/1000 nt. All raw data are available
   in NCBI's Gene Expression Omnibus and are accessible through GEO
   series accession number GSE79465
   (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE79465).

2.3. Gene expression data analysis

   Global scaling normalization (scatter plot, histogram, volcano plot,
   and principal component analysis) was carried out, and fold changes
   were calculated from the relative signal intensities (scanned with the
   Agilent 0.1 XDR protocol) using a cut-off of log2 |fold change|≧1. A
   filtering step was performed using the Rosetta error model [2], which
   allowed the statistical significance of each pairwise gene comparison
   between groups to be determined. The default multiple-testing
   correction was the Benjamini and Hochberg [3] false discovery rate
   with a q value cut-off of <0.05. This correction was the least
   stringent of the available corrections and provided a good balance
   between the discovery of statistically significant genes and the
   limitation of false positives, by removing all gene spots with a q
   value >0.05 in all conditions. This procedure narrowed the list of
   genes to those significantly affected by DS treatment. Gene annotation
   was based on two databases: NCBI RefSeq release 57 and Ensembl release
   70 cDNA sequences (homo_sapiens_core_70_37).

   Finally, pathway enrichment analysis (PEA) was used to group and
   display genes with similar expression profiles. The online tool
   Database for Annotation, Visualization, and Integrated Discovery
   (DAVID) [4] was used for PEA. KEGG (Kyoto Encyclopedia of Genes and
   Genomes) pathways with an adjusted EASE (Expression Analysis
   Systematic Explorer) score p value ≤0.05 and a count >2 were selected.
   Data gained by this technique may help to understand in vitro studies
   of botanical natural products used in breast cancer treatment. The
   pathway analysis was used to examine functional correlations within
   the cell lines and between treatment groups. Data sets containing gene
   identifiers and corresponding expression values were uploaded into the
   application, and each gene identifier was mapped to its corresponding
   gene object in the KEGG pathway map using the same EASE score and
   count thresholds. Networks were named after the most common functional
   group(s) present in the database. Canonical pathway analysis (GeneGo
   maps) identified function-specific genes that were significantly
   represented within the network [5].
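
   The Benjamini and Hochberg adjustment mentioned above can be sketched
   in a few lines. This is a generic illustration of the step-up
   procedure, not the vendor's analysis pipeline; the function name and
   the example p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so each q value is the minimum of
    # p * m / rank over its own rank and all higher ranks (monotone q values).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical p values for eight gene spots.
p_example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
q_example = benjamini_hochberg(p_example)
kept = [i for i, q in enumerate(q_example) if q < 0.05]  # q value cut-off <0.05
print(kept)
```

   Note that several spots with raw p<0.05 can still be removed once the
   q value cut-off is applied, which is the intended effect of the
   correction.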

Acknowledgments