Abstract

Microarray technology (Human OneArray microarray, phylanxbiotech.com) was used to compare the gene expression profiles of non-invasive MCF-7 and invasive MDA-MB-231 breast cancer cells exposed to dioscin (DS), a steroidal saponin isolated from the roots of wild yam (Dioscorea villosa). Differentially expressed genes (DEG) were identified first, followed by pathway enrichment analysis (PEA). Of the genes queried on OneArray, we identified 4641 DEG between vehicle-treated MCF-7 and MDA-MB-231 cells with a cut-off of |log2 fold change| ≥ 1. Among these genes, 2439 were upregulated and 2002 were downregulated. DS exposure (2.30 μM, 72 h) yielded 801 DEG in MCF-7 cells and 96 DEG in MDA-MB-231 cells that differed significantly from the untreated cells (p<0.05). Within these gene sets, DS upregulated 395 genes and downregulated 406 genes in MCF-7 cells, and upregulated 36 genes and downregulated 60 genes in MDA-MB-231 cells. A further comparison between DS-exposed MCF-7 and DS-exposed MDA-MB-231 cells identified 3626 DEG, of which 1700 were upregulated and 1926 were downregulated. In the PEA, 12 canonical pathways were significantly altered between the two vehicle-treated cell lines. After DS exposure, however, none of the pathways was altered in MCF-7 cells, while in MDA-MB-231 cells only the MAPK pathway showed significant alteration. When the two DS-exposed cell lines were compared with each other, only 2 pathways were significantly affected. We also used intersection analysis (Venn diagram) to identify the DEG that were targeted by DS in both MCF-7 and MDA-MB-231 cells. Seven DEG overlapped, of which six are reported in the database. These data highlight the diverse gene networks and pathways in MCF-7 and MDA-MB-231 human breast cancer cell lines treated with dioscin.
Specification Table

Subject area: Biology
More specific subject area: Breast cancer
Type of data: Table, figure
How data was acquired: Microarray analysis, performed by Phalanx Biotech Group using the Human OneArray (array version HOA 6.1), which contains 31,741 mRNA probes that can detect 20,672 genes in the human genome.
Data format: Analyzed
Experimental factors: Both MCF-7 and MDA-MB-231 cells (~500×10^3 cells) were treated with DS (2.30 µM) for three days, followed by RNA extraction and analysis.
Experimental features: MCF-7 and MDA-MB-231 cells were cultured in phenol red free DMEM-F12 (1:1) medium supplemented with 10% dextran charcoal treated fetal bovine serum, 50 U/mL penicillin and 50 µg/mL streptomycin (Pen-Strep), and 2 mM l-glutamine at 37 °C in a humidified atmosphere of 95% air and 5% CO2. The cells (~500×10^3 cells) were allowed to attach in 25 cm^3 culture flasks in a 6 mL volume, and after 24 h the cultures were treated with DS (2.30 µM) for three days.
Data source location: N/A
Data accessibility: Data are within this article and available at the NCBI database via GEO accession numbers GSE79465, GPL19137, GSM2095708, GSM2095709, and GSM2095710.

Value of the data

• May stimulate further research on the utility of DS as a preventive agent of metastatic breast cancer.
• May facilitate new therapies to target specific genes that are associated with metastatic breast cancer.
• Genes participating in MAPK signaling pathways are the probable targets of breast cancer metastasis.

1. Data

Table 1 shows data on the global gene expression profile in MCF-7 and MDA-MB-231 cell lines treated with vehicle (DMSO) or DS in vitro. Tables 2, 3, and 4 show gene ontology analysis based on molecular functions (Table 2), biological processes (Table 3), and cellular components (Table 4).
Various canonical pathways that were significantly altered between the vehicle-treated cell lines or after DS treatment are presented in Table 5. The genes that overlapped between the two cell lines (MCF-7 and MDA-MB-231) after DS treatment are listed in Table 6 and shown as a Venn diagram in Fig. 1.

Table 1. Number of differentially expressed genes in MCF-7 and MDA-MB-231 cells.

| No. | Comparison | Up-regulated (number) | Down-regulated (number) |
|---|---|---|---|
| 1 | MCF-7C/MDA-MB-231C | 2439 | 2002 |
| 2 | MCF-7C/MCF-7T | 395 | 406 |
| 3 | MDA-MB-231C/MDA-MB-231T | 36 | 60 |
| 4 | MCF-7T/MDA-MB-231T | 1700 | 1926 |

Table 2. Gene ontology analysis based on molecular functions.

| Gene set name | Number of genes in the gene set | Overlap: MCF-7 (T/C) | Overlap: MDA-MB-231 (T/C) | Overlap: MCF-7 (C)/MDA-MB-231 (C) | Overlap: MCF-7 (T)/MDA-MB-231 (T) |
|---|---|---|---|---|---|
| Magnesium ion binding | 452 | 38⁎ | – | 125⁎ | 97 |
| Cytokine activity | 195 | – | 8⁎ | – | – |
| Enzyme binding | 523 | 38 | – | 141⁎ | 109 |
| Actin binding | 326 | 23 | – | 95⁎ | 76 |
| Cytoskeletal protein binding | 504 | – | – | 135⁎ | 102 |
| Purine ribonucleotide binding | 1836 | 95 | – | 410⁎ | 306 |
| Ribonucleotide binding | 1836 | 95 | – | 410⁎ | – |
| Purine nucleotide binding | 1918 | 96 | – | 424⁎ | 323 |
| Nucleotide binding | 2245 | 110 | – | 485⁎ | – |
| Adenyl ribonucleotide binding | 1497 | 81 | – | 332⁎ | – |
| ATP binding | 1477 | 81 |  | 328⁎ | 251 |
| Protein domain specific binding | 331 | – | – | 89⁎ | – |
| Nucleoside binding | 1612 | 84 | – | 353⁎ | 278 |
| Purine nucleoside binding | 1601 | 83 | – | 350⁎ | 273 |
| Adenyl nucleotide binding | 1577 | 82 | – | 345⁎ | 270 |
| Transcription factor binding | 513 | 29 | – | 127⁎ | – |
| Enzyme activator activity | 335 | 21 | – | 88⁎ | 62 |

⁎ The asterisk indicates q<0.05 [3].

Table 3. Gene ontology analysis based on biological process.

| Gene set name | Number of genes in the gene set | Overlap: MCF-7 (T/C) | Overlap: MDA-MB-231 (T/C) | Overlap: MCF-7 (C)/MDA-MB-231 (C) | Overlap: MCF-7 (T)/MDA-MB-231 (T) |
|---|---|---|---|---|---|
| Protein complex biogenesis | 505 | 47⁎ | – | 129⁎ | – |
| Protein complex assembly | 505 | 47⁎ | – | 129⁎ | – |
| Macromolecular complex assembly | 665 | 55⁎ | – | – | – |
| Macromolecular complex subunit organization | 710 | 56⁎ | – | 165 | – |
| Protein oligomerization | 174 | 20⁎ | – | 50 | – |
| Protein amino acid phosphorylation | 667 | 47⁎ | – | 156 | – |
| Protein heterooligomerization | 52 | 10⁎ | – | 17 | – |
| Negative regulation of cell proliferation | 361 | 22 | 10⁎ | 81 | 86⁎ |
| Cell cycle | 776 | 46 | – | 210⁎ | – |
| Regulation of cell death | 815 | 52 | 11 | 205⁎ | 165⁎ |
| Regulation of apoptosis | 804 | 52 | 11 | 202⁎ | 163⁎ |
| Induction of programmed cell death | 321 | 21 | – | 94⁎ | 73⁎ |
| Regulation of programmed cell death | 812 | 52 | 11 | 202⁎ | 163⁎ |
| Induction of apoptosis | 320 | 21 | – | 93⁎ | 72⁎ |
| Positive regulation of cell death | 435 | 27 | – | 119⁎ | 95⁎ |
| Cell cycle process | 565 | 34 | – | 147⁎ | – |
| Regulation of binding | 153 | 78 | 4 | 52⁎ | 42⁎ |
| Positive regulation of programmed cell death | 433 | 27 | – | 117⁎ | 94⁎ |
| Positive regulation of apoptosis | 430 | 27 | – | 116⁎ | 93⁎ |
| Cell death | 719 | 47 | 8 | 176⁎ | 146⁎ |
| Mitotic cell cycle | 370 | – | – | 100⁎ | – |
| Cell division | 295 | – | – | 83⁎ | – |
| Death | 724 | 47 | 8 | 176⁎ | – |
| Programmed cell death | 611 | 40 | 8 | 152⁎ | – |
| Apoptosis | 602 | 38 | 8 | 150⁎ | – |
| Regulation of DNA binding | 121 | – | 4 | 41⁎ | 35⁎ |
| Regulation of cell proliferation | 787 | 48 | 12 | 182 | 183⁎ |
| Positive regulation of cell proliferation | 414 | 29 | – | – | 97⁎ |
| Cell proliferation | 436 | 28 | – | 110 | 99⁎ |
| Neuron differentiation | 438 | – | – | – | 98⁎ |
| Death | 724 | 47 | 8 | – | 146⁎ |
| Regulation of locomotion | 192 | 15 | – | 56 | 50⁎ |
| Cell migration | 276 | – | 5 | 69 | 66⁎ |
| Regulation of cell motion | 193 | 16 | – | 56 | 50⁎ |
| Blood vessel development | 245 | – | – | 64 | 60⁎ |
| Neuron projection development | 256 | – | – | – | 62⁎ |
| Vasculature development | 251 | – | – | 64 | 61⁎ |
| Cell projection organization | 368 | – | – | 91 | 82⁎ |
| Regulation of cellular component size | 271 | – | 6 | 66 | 64⁎ |
| Transmembrane receptor protein serine/threonine kinase signaling pathway | 103 | 12 | – | 35 | 31⁎ |
| Regulation of cell migration | 169 | 14 | – | 51 | 44⁎ |
| Hemopoietic or lymphoid organ development | 260 | – | – | 60 | 61⁎ |
| Positive regulation of developmental process | 278 | 18 | 6 | 72 | 64⁎ |
| Axon guidance | 107 | – | – | – | 31⁎ |
| Hemopoiesis | 236 | – | – | – | 56⁎ |
| Positive regulation of locomotion | 98 | 12 | – | 32 | 29⁎ |
| Locomotory behavior | 274 | – | – |  | 63⁎ |
| Response to vitamin | 66 | – | – |  | 22⁎ |

⁎ The asterisk indicates q<0.05 [3].

Table 4. Gene ontology analysis based on cellular component.

| Gene set name | Number of genes in the gene set | Overlap: MCF-7 (T/C) | Overlap: MDA-MB-231 (T/C) | Overlap: MCF-7 (C)/MDA-MB-231 (C) | Overlap: MCF-7 (T)/MDA-MB-231 (T) |
|---|---|---|---|---|---|
| Membrane-enclosed lumen | 1856 | 111⁎ | – | 397⁎ | – |
| Organelle lumen | 1820 | 108⁎ | – | 391⁎ | 300 |
| Intracellular organelle lumen | 1779 | 106⁎ | – | 382⁎ | 291 |
| Nuclear lumen | 1450 | 91⁎ | – | 312⁎ | 243 |
| Nucleoplasm | 882 | 62⁎ | – | 186 | – |
| Intracellular non-membrane-bounded organelle | 2596 | 134⁎ | – | – | – |
| Non-membrane-bounded organelle | 2596 | 134⁎ | – | – | – |
| Cytosol | 1330 | 74⁎ | – | 285⁎ | – |
| Cytoskeleton | 1381 | 74⁎ | – | – | – |
| Nuclear matrix | 56 | 9⁎ | – | – | – |
| Nuclear periphery | 61 | 9⁎ | – | – | – |
| Extracellular space | 685 | – | 12⁎ | – |  |
| Extracellular region part | 960 | – | 14⁎ | – | – |
| Lytic vacuole | 211 | 17 | – | 71⁎ | 56⁎ |
| Lysosome | 211 | – | – | 71⁎ | 56⁎ |
| Vacuole | 252 | 18 | – | 79⁎ | 62⁎ |
| Basolateral plasma membrane | 203 | 14 | – | 64⁎ | – |
| Non-membrane-bounded organelle | 2596 | 134 | – | 543⁎ | – |
| Intracellular non-membrane-bounded organelle | 2596 | 134 | – | 543⁎ | 413⁎ |
| Anchoring junction | 172 | 14 | – | 52⁎ | 46⁎ |
| Adherens junction | 155 | – | – | 48⁎ | 41⁎ |
| Golgi apparatus | 872 | – | – | 197⁎ | 150 |
| Mitochondrion | 1087 | 56 | – | 239⁎ | – |
| Cell fraction | 1083 | – | – | 237⁎ | 209⁎ |
| Nucleolus | 698 | – | – | 107⁎ | 129 |
| Cell leading edge | 138 | – | – | 41⁎ | 37⁎ |
| Extracellular matrix | 345 | – | 5 | – | 78⁎ |
| Insoluble fraction | 839 | – | – | – | 159⁎ |

⁎ The asterisk indicates q<0.05 [3].

Table 5. Gene set enrichment analysis based on the canonical pathway.

| Gene set name | Number of genes in the gene set | Overlap: MCF-7 (T/C) | Overlap: MDA-MB-231 (T/C) | Overlap: MCF-7 (C)/MDA-MB-231 (C) | Overlap: MCF-7 (T)/MDA-MB-231 (T) |
|---|---|---|---|---|---|
| MAPK signaling pathway | 267 | – | 7⁎ | 70 | 56 |
| Pathways in cancer | 328 | 27 | – | 99⁎ | 76⁎ |
| Apoptosis | 87 | – | – | 34⁎ | 23 |
| Lysosome | 117 | – | – | 41⁎ | 37⁎ |
| VEGF signaling pathway | 75 | – | – | 29⁎ | – |
| Focal adhesion | 201 | – | – | 60⁎ | – |
| Prostate cancer | 89 | – | – | 32⁎ | – |
| mTOR signaling pathway | 52 | – | – | 21⁎ | – |
| Pancreatic cancer | 72 | – | – | 26⁎ | – |
| Colorectal cancer | 84 | – | – | 29⁎ | – |
| Renal cell carcinoma | 70 | – | – | 25⁎ | – |
| Regulation of actin cytoskeleton | 215 | 16 | – | 59⁎ | – |
| Small cell lung cancer | 84 | – | – | 28⁎ | – |

⁎ The asterisk indicates q<0.05 [3].

Table 6. List of genes overlapped between the two cell lines.

| Gene symbol | Description of the gene | Log2 (ratio): MDA-MB-231C/MCF-7C | Log2 (ratio): MCF-7T/MCF-7C | Log2 (ratio): MDA-MB-231T/MDA-MB-231C | Log2 (ratio): MDA-MB-231T/MCF-7T |
|---|---|---|---|---|---|
| ERRFI1 | ERBB receptor feedback inhibitor 1 | 0.01 | 1.33 | 1.06 | −0.35 |
| MMP1 | Matrix metallopeptidase 1 (interstitial collagenase) | 1.59 | 2.70 | 2.09 | 0.96 |
| SOD2 | Superoxide dismutase 2, mitochondrial | 2.54 | 1.04 | 1.08 | 2.61 |
| IL24 | Interleukin 24 | −0.93 | 1.44 | 2.86 | 0.37 |
| PTRF | Polymerase I and transcript release factor | −1.54 | −2.35 | −1.03 | −0.23 |
| ALKBH5 | AlkB, alkylation repair homolog 5 (E. coli) | −0.70 | −1.36 | −1.01 | −0.40 |

Fig. 1. Venn diagram of the overlap among DEGs of MCF-7 and MDA-MB-231 cells exposed to DS (2.30 µM, 72 h).
The MCF-7 and MDA-MB-231 cells shared seven genes, of which six were found in the database.

2. Experimental design, materials and methods

2.1. Cell culture, DS treatment, and extraction of nucleic acids

The detailed procedures for cell culture, DS treatment, and RNA isolation have been described in our previous study [1]. In brief, human breast adenocarcinoma MCF-7 (ER^+) and MDA-MB-231 (ER^−) cells were maintained in phenol red free DMEM-F12 (1:1) medium supplemented with 10% dextran charcoal treated fetal bovine serum, 50 U/mL penicillin, 50 µg/mL streptomycin, and 2 mM l-glutamine. The cells (~500×10^3 cells) were allowed to attach in 25 cm^3 culture flasks in a 6 mL volume for 24 h before treatment with DS (2.30 µM) for three days. After complete removal of the media, the cells were trypsinized, resuspended in the medium, and washed twice with PBS. RNA was extracted with Trizol reagent as described previously [1]. Briefly, Trizol reagent (Invitrogen, Carlsbad, CA) was used to lyse the cells, and chloroform was added to the lysate for phase separation. The aqueous phase, which contains the RNA, was transferred to a clean 1.5 mL Eppendorf tube, and the RNA was precipitated with 2-propanol. After a quick wash in 75% ethanol, the extracted RNA was dissolved in nuclease-free water. The RNA samples were then treated with DNase I (Promega, Madison, WI) to remove any DNA contamination. Finally, the RNA concentration was determined with a NanoDrop 2000c (Thermo Fisher Scientific, Waltham, MA), and the samples were stored at −80 °C until they were sent to Phalanx Biotech Group for microarray analysis.

2.2. Microarray analysis

Microarray analysis was carried out by Phalanx Biotech Group using OneArray (array version HOA 6.1), which contains 31,741 mRNA probes that can detect 20,672 genes in the human genome. In brief, the purity of the extracted RNA was checked using NanoDrop ND-1000.
The pass criteria for the absorbance ratios were A260/A280 ≥ 1.8 and A260/A230 ≥ 1.5. RNA integrity was determined from RIN values obtained with the Agilent RNA 6000 Nano assay, with a pass criterion of RIN > 6. Genomic DNA (gDNA) contamination was evaluated by gel electrophoresis. Any RNA that did not meet these criteria was excluded from the analysis. Target preparation was performed using an Eberwine-based amplification method with the Amino Allyl MessageAmp II aRNA Amplification Kit (Ambion, AM1753) to generate amino-allyl antisense RNA (aa-aRNA). Labeled aRNA coupled with NHS-CyDye (Cy5) was prepared and purified prior to hybridization. Purified coupled aRNA was quantified using NanoDrop ND-1000; the pass criterion for CyDye incorporation efficiency was >15 dye molecules/1000 nt. All the raw data are available in NCBI's Gene Expression Omnibus and are accessible through GEO series accession number GSE79465 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE79465).

2.3. Gene expression data analysis

Global scaling normalization (scatter plot, histogram and volcano plot, principal component analysis) was carried out, and the fold changes (cut-off: |log2 fold change| ≥ 1) were calculated from the relative signal intensities (scanned by Agilent 0.1 XDR protocol). A filtering step was performed using the Rosetta error model [2], which allowed the statistical significance of every pairwise gene comparison between groups to be determined. The default multiple testing correction was the Benjamini and Hochberg [3] false discovery rate with a q value cutoff of <0.05. This correction is the least stringent of the available corrections and provides a good balance between the discovery of statistically significant genes and the limitation of false positives; all gene spots with a q value >0.05 in all conditions were removed. This procedure narrowed the list of genes to those significantly affected by DS treatment.
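The two-step filter described above (a fold-change cut-off followed by Benjamini and Hochberg correction) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the vendor's pipeline: per-gene p-values are taken as given (the Rosetta error model that produced them in this study is not reimplemented), and the function names, gene names, intensities, and p-values below are all hypothetical.

```python
import math


def log2_fold_change(treated, control):
    """Log2 ratio of two positive signal intensities."""
    return math.log2(treated / control)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each q-value is the minimum
    # of p * m / rank over the current rank and all higher ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


def select_deg(records, lfc_cutoff=1.0, q_cutoff=0.05):
    """Keep genes with |log2 fold change| >= lfc_cutoff and q < q_cutoff."""
    q_values = benjamini_hochberg([r["p"] for r in records])
    selected = []
    for rec, q in zip(records, q_values):
        lfc = log2_fold_change(rec["treated"], rec["control"])
        if abs(lfc) >= lfc_cutoff and q < q_cutoff:
            selected.append((rec["gene"], round(lfc, 2), round(q, 4)))
    return selected


# Hypothetical intensities and p-values, for illustration only.
example = [
    {"gene": "GENE_A", "treated": 800.0, "control": 200.0, "p": 0.001},
    {"gene": "GENE_B", "treated": 210.0, "control": 200.0, "p": 0.300},
    {"gene": "GENE_C", "treated": 100.0, "control": 450.0, "p": 0.010},
    {"gene": "GENE_D", "treated": 900.0, "control": 300.0, "p": 0.200},
]
print(select_deg(example))
```

With these made-up inputs, GENE_D passes the fold-change cut-off but is dropped because its q value exceeds 0.05, which shows why the two criteria are applied together.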
Gene annotation was based on two databases: NCBI RefSeq release 57 and Ensembl release 70 cDNA sequences (homo_sapiens_core_70_37). Finally, pathway enrichment analysis (PEA) was used to group and display genes with similar expression profiles. The online tool Database for Annotation, Visualization, and Integrated Discovery (DAVID) [4] was used for PEA. KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways were selected with an adjusted EASE (Expression Analysis Systematic Explorer) score p value ≤0.05 and a count >2. Data gained by this technique may help in understanding in vitro studies of botanical natural products used in breast cancer treatment. The pathway analysis was used to examine functional correlations within the cell lines and between the treatment groups. Data sets containing gene identifiers and corresponding expression values were uploaded into the application, and each gene identifier was mapped to its corresponding gene object in the KEGG pathway map using the same thresholds. Networks were "named" after the most common functional group(s) present in the database. Canonical pathway analysis (GeneGo maps) identified the known function-specific genes that were significantly represented within the network [5].

Acknowledgments