Abstract

   Colorectal cancer (CRC) is the third most common cancer worldwide and
   liver metastases are the leading cause of death in patients with CRC.
   In this study, we performed next‐generation sequencing profiling on
   primary colorectal tumor tissues obtained from three CRC patients with
   liver metastases and three CRC patients without liver metastases to
   identify differentially expressed genes (DEGs) that might be
   responsible for the metastases process. After filtering 2690 DEGs,
   comprising 996 upregulated and 1694 downregulated RNAs, 22 upregulated
   and 73 downregulated DEGs were identified. Gene ontology (GO) and
   pathway analyses were performed to determine the underlying mechanisms.
   Single‐organism process (biological process), cell (cellular
   component), and binding (molecular function) were the most related
   terms in the GO analysis. We selected the top 13 upregulated and top 12
   downregulated genes by fold change to verify their differential
   expression using quantitative real‐time reverse transcription PCR
   (qRT‐PCR) and immunohistochemistry (IHC). The validation showed that
   three most significantly upregulated DEGs were HOXD10,UGT2A3, and
   SLC13A2, whereas the five most significantly downregulated DEGs were
   SPP1,CXCL8,MMP3,OSM, and CXCL6, respectively. These aberrantly
   expressed genes may play pivotal roles in promoting or inhibiting
   metastases. Further studies are required to determine the functions of
   DEGs to promote the diagnosis of metastases and provide novel
   chemotherapy targets.

   Keywords: Colorectal cancer, CRC, liver metastases, next‐generation
   sequencing, NGS

Introduction

   Colorectal cancer (CRC) is the third most common type of cancer
   globally, resulting in 700,000 deaths annually, and this number is
   still increasing [44]1. The liver is the most frequent distant
   metastases organ for CRC, and liver metastases are the leading death
   cause for CRC patients [45]2. Approximately 10–15% of patients are
   diagnosed with CRC with synchronous liver metastases; however, only
   10–20% of patients qualify for surgery, which it is the only possible
   treatment to cure CRC with liver metastases currently [46]3.
   Unfortunately, about 30–50% of patients would suffer from reoccurrence
   of local tumor or distant metastasis even after curative resection of
   primary tumors [47]4, [48]5. Liver metastases are a multiple‐step
   process influenced by various factors, and is not just the simple
   process whereby cells migrate from primary lesions to distant organs.
   Given our poor understanding of genetic factors that affect this
   process, it is critically important to reveal the genetic and molecular
   causes that underlie the occurrence of liver metastasis. Meanwhile,
   conventional methods to detect distant metastases are still radiology
   and biopsy, which lack economic efficiency and convenience [49]6. Thus,
   novel biomarkers of metastatic progression are required urgently to
   detect liver metastases and improve diagnostic accuracy.

   The emergence of next‐generation sequencing (NGS) has already changed
   the ways we perform genomic studies. Before NGS, gene expression was
   quantified by other methods, such as hybridization‐based microarrays.
   Compared with microarrays, NGS is able to detect RNAs with very low
   expressions and identify unrecognized novel targets [50]7, as well as
   providing more detailed information [51]8. Several previous studies
   [52]9, [53]10, [54]11, [55]12, [56]13, [57]14, [58]15, [59]16 used NGS
   to seek mutations and differentially expressed genes (DEGs) that could
   play pivotal roles in the colorectal carcinogenesis and metastases,
   which reveals the significance of NGS.

   However, although NGS has been applied in studies involving CRC liver
   metastases, it was reported that metastatic and primary CRC
   demonstrated a high concordance in gene expression. Therefore, our
   study adopted another strategy. Comprehensive NGS‐based
   whole‐transcriptome profiling was performed on six primary colorectal
   cancer tissues obtained from three patients with liver metastases and
   three patients without liver metastases. Libraries of total RNA and
   small RNAs were constructed for sequencing. DEGs were examined and
   filtered after sequencing. Our goal was to identify DEGs that might be
   vital in regulating the process of metastases and potential target
   biomarkers through next‐generation RNA‐seq technology.

Materials and Methods

Patients and samples

   We adopted some methods to screen all the samples from the biobank in
   the Department of Colorectal Surgery, the First Affiliated Hospital of
   Nanjing Medical University, between 2014 and 2016. First, we divided
   these samples into two categories: one is primary CRC samples with
   liver metastases and the other is primary CRC samples without liver
   metastases. Then, to decrease potential discordance, only T4 samples
   (according to TNM staging) were collected from patients with only left
   colon tumors who were enrolled in this study. Finally, samples must be
   obtained from patients without receiving neoadjuvant chemotherapy.
   Through this filtering process, 36 fresh primary CRC samples were
   collected and then we performed quality tests twice on all 36 samples
   before next‐generation RNA‐seq. After that, three primary CRC samples
   with liver metastasis and three primary CRC samples without metastasis
   (i.e., six primary CRC samples) were subjected to NGS. We therefore
   performed NGS on these qualified samples. All the patients and grouping
   information are listed in the Table [60]1. All these samples were
   confirmed as pathologically adenocarcinoma. All samples were
   snap‐frozen in the liquid nitrogen after surgery [61]15 and stored at
   −80°C within 30 min after excision of the tumor. Written informed
   consent was obtained from all patients, and this study was approved by
   the Ethics Committee of the First Clinical Medicine College, Nanjing
   Medical University.

Table 1.

   Information for patients whose tissue were obtained for next‐generation
   sequencing (NGS)‐based profiling
   No. Group Age Gender Location Pathology Metastases Stage Primary tumor
   size (cm)
   T2 T 62 M Left colon Adenocarcinoma Liver T4N2M1 4*3*1.5
   T8 T 57 M Left colon Adenocarcinoma Liver T4N1M1 7*6*2
   T9 T 54 F Left colon Adenocarcinoma Liver T4N2M1 4.5*2.5*0.5
   T4 C 57 M Left colon Adenocarcinoma No T4N0M0 6*4*2
   T6 C 59 M Left colon Adenocarcinoma No T4N0M0 6*6*1.5
   T7 C 62 M Left colon Adenocarcinoma No T4N0M0 3*2*1
   [62]Open in a new tab

RNA extraction

   Total RNA was isolated from each sample using the TRIzol reagent
   (Invitrogen, Carlsbad, CA), following the manufacturer's protocol. The
   purified RNA samples were quantified using a NanoDrop 2000
   spectrophotometer (Agilent, Santa Clara, CA). The isolated RNA was
   stored at −80°C until analysis. After screening all 36 RNA samples
   using the Agilent 2100, six of them met the level A standard for
   subsequent sequencing. The RNA Integrity Number (RIN) was confirmed as
   above 7.0 and rRNA 28S/18S ratio was above 1.0 separately in these six
   samples.

Illumina‐based sequencing

   The total RNA samples were first treated with DNase I to degrade any
   possible DNA contamination. Oligo(dT) magnetic beads were then used to
   enrich the mRNA. The mRNA was fragmented into short fragments after
   mixing with fragmentation buffer. Using random hexamer‐primers, first
   strand cDNA was synthesized. The second strand was synthesized using
   buffer, dNTPs, RNase H, and DNA polymerase I. The double strand cDNAs
   were purified with magnetic beads. Finally, sequencing adaptors were
   ligated to the fragments. PCR amplification then enriched the
   fragments. The library products were ready for sequencing using the
   Illumina HiSeq^™ 2000 system.

Quality control and filtering

   Primary sequencing data produced by the Illumina HiSeq^™ 2000 were
   termed raw reads. “Dirty” raw reads were defined as those that
   contained the sequence of the adaptor, high content of unknown bases,
   and low‐quality reads. These dirty reads were filtered out as follows:
   (1) Reads with adaptors were removed; (2) reads in which unknown bases
   were more than 10% were removed; and (3) low‐quality reads (the
   percentage of low‐quality bases was over 50% in a read) were removed.
   All subsequent analyses were based on the clean data.

Gene quantification

   To quantify each DEG, the fragments per kb per million fragments (FPKM)
   method are adopted [63]17. To calculate the expression level, the
   following formula was used:
   [MATH:
   <mrow><mtext>FPKM</mtext><mo>=</mo><mfrac><mrow><msup><mn>10</mn><mn>6<
   /mn></msup><mi
   mathvariant="normal">C</mi></mrow><mrow><mtext>NL</mtext><mo
   stretchy="false">/</mo><msup><mn>10</mn><mn>3</mn></msup></mrow></mfrac
   ></mrow> :MATH]

   In this formula, Given the gene A, C represents the number of fragments
   that are uniquely mapped to gene A, N is the total number of fragments
   that are uniquely mapped to all genes, and L is the number of bases of
   gene A. The FPKM method can eliminate the influence of different gene
   length and sequencing discrepancy on the calculation of gene
   expression. Therefore, the calculated gene expression can be used
   directly for comparisons between samples.

Data analysis

   P‐value and false discovery rate (FDR) filtering was used to identify
   the mRNAs and small RNAs whose expression was significantly different
   between Treatment (T) Groups (T2, T8, and T9) and Control (C) Groups
   (T4, T6, and T7). These differentially expressed mRNAs and small RNAs
   were identified through fold change filtering.

Gene function and Kyoto Encyclopedia of Genes and Genomes pathway enrichment
analysis

   Gene Ontology (GO) is an international standard gene functional
   classification system that provides a dynamically up‐to‐date
   vocabulary. GO covers three ontologies: molecular function, cellular
   component, and biological process. First, this method mapped all DEGs
   to GO terms in the database ([64]http://www.geneontology.org/),
   calculating the numbers of DEGs for every term. Then, a hypergeometric
   test was used to identify significantly enriched GO terms in the input
   list of DEGs. This analysis is able to recognize the main biological
   functions and processes that involve various DEGs. Pathway analysis is
   a functional analysis that maps genes to the Kyoto Encyclopedia of
   Genes and Genomes (KEGG) pathways. This analysis identifies metabolic
   or signal transduction pathways that are significantly enriched by
   DEGs.

Quantitative RT‐PCR verification for target mRNAs

   After filtering the results of the next‐generation sequencing, several
   target RNAs were selected for quantitative real‐time reverse
   transcription PCR (qRT‐PCR) validation. Eight samples of primary CRC
   with liver metastases and another eight without liver metastases were
   collected from the Department of Colorectal Surgery, the First
   Affiliated Hospital of Nanjing Medical University. For each sample,
   total RNA was isolated using the TRIzol (Invitrogen, Carlsbad, CA) and
   then reverse transcribed using the Prime Script RT reagent kit. QRT‐PCR
   was performed with SYBR green mix reagents on an ABI 7500 (Applied
   Biosystems, Carlsbad, CA), according to the manufacturer's
   instructions, to detect the expression levels of the target mRNAs.

Immunohistochemistry analysis

   Primary samples collected from patients with and without liver
   metastases were cut into 4‐μm‐thick serial sections. The sections were
   submerged in antigenic retrieval buffer and microwaved for antigen
   fixation, after which they were deparaffinized with xylene. Slides were
   treated with hydrogen peroxide to quench endogenous nonspecific binding
   activity. The slides were then probed with the following primary
   antibodies: rabbit anti‐human SPP1, rabbit anti‐human MMP3, rabbit
   anti‐human OSM, rabbit anti‐human SLC13A2, and mouse anti‐human CXCL8
   (all Proteintech, Inc. Wuhan, China); rabbit anti‐human HOXD10 (Bioss,
   Inc. Beijing, China), and rabbit anti‐human CXCL6 (Abcam, Cambridge,
   UK), respectively. Next, the slides were incubated with horseradish
   peroxidase (HRP)‐polymer‐conjugated secondary anti‐rabbit or anti‐mouse
   antibody at 37°C for 1 h. Then, the immunoreactivity was detected with
   the diaminobenzidine (DAB) substrate to determine protein expression. A
   negative control was performed by replacing the primary antibodies with
   phosphate‐buffered saline (PBS). The stained tissue slices were
   reviewed and scored by two pathologists separately.

Results

Overview

   Six primary CRC samples were included in the NGS‐based profiling,
   generating approximately 12,807,481 raw sequencing reads, of which
   12,782,782 clean reads remained after filtering low‐quality samples,
   making the average clean reads ratio 99.81% (Table [65]2). After
   Quality control (QC), clean reads are mapped to reference using BWA
   [66]18 and Bowtie2 [67]19 tools. The average mapping ratio with
   reference genes was 74.48%. Table [68]3 lists separate mapping rates
   for each sample.

Table 2.

   Summary of sequencing data
   Sample Sequencing strategy Raw data size (bp) Raw reads number Clean
   data size (bp) Clean reads number Clean data rate (%)
   T2 SE49 639,113,762 13,043,138 638,528,996 13,031,204 99.9
   T8 SE49 611,143,876 12,472,324 610,085,427 12,450,723 99.82
   T9 SE49 613,761,652 12,525,748 612,491,866 12,499,834 99.79
   T4 SE49 639,188,046 13,044,654 636,882,547 12,997,603 99.63
   T6 SE49 639,151,394 13,043,906 638,229,557 13,025,093 99.85
   T7 SE49 623,040,733 12,715,117 621,919,564 12,692,236 99.82
   [69]Open in a new tab

   Strategy to sequence samples with paired‐end (PE); the following number
   reflects the read length.

Table 3.

   Alignment statistics of reads aligned to the reference gene
   Sample Total reads Total bases Total mapped reads (%) Perfect match (%)
   Mismatch (%) Unique match (%) Multi‐position match (%) Total unmapped
   reads (%)
   T2 13,031,204 638,528,996 72.43 62.27 10.15 63.82 8.6 27.57
   T8 12,450,723 61,008,5427 75.17 61.7 13.47 66.75 8.42 24.83
   T9 12,499,834 612,491,866 73.37 60.03 13.34 64.78 8.59 26.63
   T4 12,997,603 636,882,547 77.59 63.28 14.31 68.74 8.86 22.41
   T6 13,025,093 638,229,557 73.69 63.53 10.15 65.09 8.6 26.31
   T7 12,692,236 621,919,564 74.62 61.19 13.44 65.98 8.65 25.38
   [70]Open in a new tab

   Gene expression levels were quantified by a software package called
   RSEM [71]20. We counted the number of identified expressed genes and
   calculated their proportion to total number of genes (26,929 in total)
   in the database for each sample, as shown in Figure [72]1A. Also,
   histogram distributions were performed for each sample based on gene
   expressions. The average FPKM values were 24.76, 22.26, 24.69, 21.53,
   22.84, and 24.42 for the T2, T8, T9, T4, T6, and T7 samples,
   respectively (Fig. [73]1B–G).

Figure 1.

   Figure 1
   [74]Open in a new tab

   Analyses of identified genes (A) Number of identified genes. The x‐axis
   is the sample name. The y‐axis is number of identified expressed genes.
   The proportion at the top of each bar equals the number of expressed
   genes divided by the total gene number reported in database. (B–G)
   Histograms showing the distribution of genes at the expression level of
   T2, T8, T9, T4, T6, and T7. The x‐axis is fragments per kb per million
   fragments (FPKM) value. The y‐axis is gene number of at the
   corresponding FPKM.

   The whole‐transcriptome analysis revealed the differences between
   Treatment (T) Group and Control (C) Group. The 2690 dysregulated RNAs
   (log 2 ratio ≥2.0‐fold change) comprised 996 upregulated and 1694
   downregulated DEGs. In addition, probability analysis reveals the
   possibility of how significantly these genes are differentially
   expressed between groups. The resulting range narrowed a lot at a
   probability ≥0.8, identifying of 32 upregulated and 111 downregulated
   DEGs (log 2 ≥ 2.0‐fold change, P < 0.05, P ≥ 0.8) (Fig. [75]2A and B).
   We then removed genes with extremely low FPKM (FPKM < 0.5) values in
   both the T and C groups on that basis [76]17, [77]21. Therefore, 22
   upregulated and 73 downregulated DEGs were included after filtering
   (Table [78]S1). Among the upregulated genes, HOXD10 was the most
   upregulated DEG with a 4.56‐fold change, whereas the most downregulated
   DEG was SPP1, with a −7.12 fold‐change.

Figure 2.

   Figure 2
   [79]Open in a new tab

   Scatter plots and Volcano graphs of all expressed genes. (A) Scatter
   plots of all expressed genes in each pairwise comparison. The x and
   y‐axes represent the log 2 value of gene expression for treatment (T)
   and control (C) groups. Blue means a downregulated gene, orange means
   an upregulated gene, and brown means a nonregulated gene. The screening
   threshold is shown on the top legend. (B) Volcano graph of all
   expressed genes in each pairwise comparison. The x‐axis shows the log 2
   (fold change) and the y‐axis shows the –log 10 (1−P), representing the
   threshold values in log transformation. Each dot is a differentially
   expressed gene (DEG). Red dots mean significant DEGs and black dots are
   nonsignificant DEGs.

Gene ontology and pathway analyses

   In the GO analysis, the DEGs were annotated to different terms and
   grouped into three categories: biological process, cellular component,
   and molecular function. Between T and C groups, the three terms most
   associated with biological process were single‐organism process (82),
   cellular process (74), and response to stimulus (65). In the cellular
   component category, cell (64), cell part (64), and organelle (44) were
   the three most substantial components. Among molecular functions,
   binding (73) ranked first, followed by catalytic activity (23) and
   molecular transducer activity (8) (Fig. [80]3A).

Figure 3.

   Figure 3
   [81]Open in a new tab

   Functional analysis of differentially expressed genes (DEGs). (A) Gene
   ontology (GO) functional classification on DEGs between groups. The
   x‐axis represents the number of DEGs (the number is presented by its
   square root value). The y‐axis represents the GO terms. All GO terms
   are grouped into three ontologies: blue represents biological process,
   brown is cellular component, and orange is molecular function. (B) The
   Rich Factor is the ratio of DEGs numbers annotated in this pathway term
   to all gene numbers annotated in this pathway. A higher Rich Factor
   means greater intensiveness. The Q‐value is the corrected P‐value
   ranging from 0 to 1, and a smaller Q‐value means greater intensiveness.
   (C) The phagosome pathway is annotated with most DEGs among all the
   referred pathways.

   The pathway enrichment analysis reveals the way in which DEGs would
   interact with other factors and how they participate in the various
   biological functions. Between the two groups, the top 10 pathways
   annotated with higher numbers of DEGs among all the dysregulated genes
   were: Phagosome (17), Cytokine‐cytokine receptor interaction (14),
   Pertussis (14), Cell adhesion molecules (CAMs) (13), PI3K‐Akt signaling
   pathway (13), Hematopoietic cell lineage (12), Leishmaniasis (12),
   Proteoglycans in cancer (12), Rheumatoid arthritis (11), and
   ECM‐receptor interaction (11). All pathways displayed in Figure [82]3B
   reached statistically significance (P < 0.05). The pathway involving
   the most DEG annotations is displayed in Figure [83]3C.

Hierarchical clustering analysis of DEGs

   Genes with similar expression patterns usually have functional
   correlations. Therefore, clustering analysis of DEGs was performed
   using Cluster [84]22, according to the provided cluster plans for DEGs.
   We also provided a complete intersection and union DEGs heatmap
   (Fig. [85]4A) for each cluster plan.

Figure 4.

   Figure 4
   [86]Open in a new tab

   Analysis of differentially expressed genes (DEGs). (A) Only genes that
   were differentially expressed in all pairwise comparisons of the
   cluster plan were used to build this heatmap. The gradient color
   barcode at the right top indicates the log 2 fold change value. Each
   row represents a DEG and each column represents a pairwise condition.
   DEGs with similar fold changes value are clustered both at the row and
   column level. (B) Comparison of the expressions obtained using two
   platforms showed good correspondence for most pairs. The blue columns
   represent the log 2 ratio fold change in each DEG with the application
   of next‐generation sequencing (NGS) and red columns represent log 2
   fold changes using quantitative real‐time reverse transcription PCR
   (qRT‐PCR).

Target DEGs selection and validation

   After removing genes with extremely low FPKM values (FPKM < 0.5), we
   selected the top 13 upregulated and top 12 downregulated genes by fold
   change to verify their differential expression using qRT‐PCR. These
   target DEGs are listed in Table [87]4. QRT‐PCR validated that eight of
   the 25 selected DEGs demonstrated significant changes between groups.
   The three most significantly upregulated DEGs were HOXD10, UGT2A3, and
   SLC13A2, whereas the five most significantly downregulated DEGs were
   SPP1, CXCL8, MMP3, OSM, and CXCL6, respectively. The most up‐ and
   downregulated DEGs in the list reached statistical significance. A
   comparison was also performed between outcomes of NGS and qRT‐PCR, as
   shown in the Figure [88]4B and Table [89]4. Good correlations of
   expressions were observed in most pairs between the two platforms,
   demonstrating the credibility of these results.

Table 4.

   Validation outcomes of target genes via next‐generation sequencing
   (NGS) and quantitative real‐time reverse transcription PCR (qRT‐PCR)
   separately
   Symbol Regulation (T/C) Log 2 ratio (T/C) (NGS) Probability (T/C)
   P‐value (PCR) Significance (PCR) Log 2 ratio (T/C) (PCR)
   HOXD10 UP 4.56 0.85 0.0399 * 1.7818003
   MAGEB17 UP 4.44 0.87 0.2719 NS 1.5873628
   UGT2A3 UP 4.35 0.88 0.0231 * 1.9476197
   HOXD11 UP 4.20 0.83 N NS N
   UGT2B17 UP 3.77 0.87 0.2305 NS 5.8695526
   CYP2W1 UP 3.43 0.88 0.6263 NS 0.49270165
   ADH1C UP 3.34 0.87 0.9272 NS −0.10134707
   CKB UP 3.32 0.89 0.2883 NS 1.066491
   SLC39A2 UP 3.28 0.82 0.4047 NS 2.0332544
   PCP4 UP 2.77 0.81 0.1535 NS 5.333551
   SLC13A2 UP 2.72 0.82 0.0088 ** 2.0332544
   HOXB13 UP 2.69 0.81 0.1130 NS 0.88917387
   PRAC1 UP 2.62 0.85 0.6127 NS 0.58156765
   SPP1 DOWN −7.12 0.97 0.0055 ** −2.5303776
   KLK10 DOWN −6.43 0.96 0.3147 NS 1.0371485
   HCAR2 DOWN −5.88 0.92 0.0572 NS 1.1890949
   IGFL2 DOWN −5.76 0.92 0.0545 NS −0.13972865
   CXCL8 DOWN −5.65 0.96 0.0073 ** −2.2027724
   SLC11A1 DOWN −5.56 0.90 0.7785 NS −0.154957
   PPBP DOWN −5.37 0.90 0.3707 NS −2.3749266
   HCAR3 DOWN −5.32 0.91 0.3892 NS 0.61707544
   MMP3 DOWN −5.29 0.93 0.0301 * −5.1104937
   FCGR3B DOWN −5.21 0.91 0.1368 NS −1.0682427
   OSM DOWN −5.20 0.89 0.0437 * −1.4058526
   CXCL6 DOWN −5.15 0.90 0.0392 * −1.5858094
   [90]Open in a new tab

   NS means not significant and statistical significance (*) is defined as
   P < 0.05.

   **P<0.01

Immunohistochemistry validation of the expressions of target genes

   Immunohistochemistry (IHC) validations were performed on primary CRC
   samples with and without liver metastases. Among the upregulated DEGs,
   IHC for HOXD10 and SLC13A2 showed higher levels in primary samples with
   liver metastases (LM) than in those without liver metastases (NLM)
   (Fig. [91]5A and B). Notably, since anti‐human UGT2A3 could not be
   applied in IHC according to the Abcam official instructions, we did not
   perform IHC to detect UGT2A3 protein levels. For the downregulated
   DEGs, OSM, MMP3, CXCL6, and CXCL8 all displayed expression patterns
   similar to the qRT‐PCR validations; that is, the protein levels were
   lower in the primary CRC tissues with liver metastases than in those
   without liver metastases (Fig. [92]5C–F). The IHC staining on tissues
   with liver metastases with cytokine CXCL6 and CXCL8 antibodies were
   extremely weak, demonstrating a strong distinction compared to those in
   tissues without liver metastases. However, IHC analysis between tissues
   using SPP1 antibodies demonstrated no significant difference
   (Fig. [93]5G). Collectively, the outcomes of the IHC analyses were
   mostly consistent with the qRT‐PCR results, except for SPP1.

Figure 5.

   Figure 5
   [94]Open in a new tab

   IHC validation for HOXD10 (A), SLC13A2 (B), OSM (C), MMP3 (D), CXCL6
   (E), CXCL8 (F), and SPP1 (G). For each gene: LM represents primary CRC
   samples with liver metastases; NLM represents primary CRC samples
   without liver metastases; NC represents negative control; original
   magnification: ×200.

Discussion

   Liver is the most frequent organ to form distant metastases in CRC, and
   liver metastases are the leading cause of death for patients with CRC.
   Therefore, it is important to explore the underlying mechanism and
   detect novel biomarkers of liver metastases. Several microarray studies
   revealed that liver metastases sites demonstrated a high correspondence
   [95]17 with their primary tumor in terms of gene expression; therefore,
   we adopted the strategy whereby primary tumors from patients with and
   without liver metastases were selected to detect gene expressions via
   NGS, which avoided the potential bias caused by different organs.
   Moreover, we selected only left colon tumors as samples to rule out the
   bias from different locations of CRC, because the new NCCN guidelines
   of colon cancer have already mentioned that anti‐EGFR therapy has
   relatively lower benefits on the right‐sided colon compared with the
   left‐sided colon. Our results revealed that the potential DEGs might
   contribute to or inhibit the process of liver metastases.

   HOXD10 was the most upregulated DEG in this study. HOX genes are a
   family of regulators that play important roles in organ and cell
   development during embryogenesis. Interestingly, studies have revealed
   that HOXD10 affects the outcomes of different tumors. According to
   Sharpe and colleagues [96]23, HOXD10 and HOXD11 are expressed at high
   levels in head and neck squamous cell carcinoma, which corresponded to
   our results. Knockdown of HOXD10 led to decreased migration and
   proliferation in head and neck squamous cell carcinoma cells in their
   study, which suggested that HOXD10 acts as an oncogene that promote
   metastases. Interestingly, studies concerning breast cancer and gastric
   cancer showed opposite effects. According to Wang and colleagues
   [97]24, HOXD10 is downregulated in gastric cancer tissues and cell
   lines compared with normal tissues, suggesting that high expression of
   HOXD10 impaired cell migration and invasion, which may function as
   tumor suppressor. In breast cancer, significant differences were found
   between cancerous and normal tissues, and low expression of HOXD10 was
   associated with high‐grade breast cancer, displaying a similar tumor
   suppressor function to that in gastric cancer [98]19. In our study,
   HOXD10 was expressed at a significantly higher level in primary CRC
   with liver metastases than in those without liver metastases,
   indicating that our results favor the identification of HOXD10 as an
   oncogene in colorectal cancer. This might need further validation.

   The UDP‐glucuronosyltransferase (UGT) enzymes are responsible for the
   glucuronidation of exogenous and endogenous compounds, including drugs,
   environmental intoxicants, steroids, neurotransmitters, bile acids, and
   other hormones. UGT2A3 is one of the UGT family members. It is reported
   that UGT2A3 expressed at its highest levels in tissues that are mostly
   related to drug clearance, of which liver is the most expressed organ,
   followed by the gastrointestinal tract, and the kidneys. Also, UGT2A3
   was found to glucuronidate bile acid specifically, which is a product
   of cholesterol metabolism. Normally, 90% of bile acids that pass
   through the biliary system would be reabsorbed and returned to the
   liver. Therefore, if the bile flow is somewhat impaired or limited by
   cholestatic liver disease, liver injuries would occur due to the
   accumulation of bile acids. In our study, primary CRC patients with
   liver metastases showed significantly higher levels of UGT2A3 than
   those without liver metastases. The liver dysfunction caused by liver
   metastases may influence the normal bile flow, therefore leading to
   cholestasis. The accumulation of bile acids requires higher UGT2A3
   activity to catalyze the process of glucuronidation. This hypothesis
   highlights the significance of metabolism in carcinogenesis, as
   reported by Lindsey and colleagues [99]25. We hypothesized that UGT2A3
   could be a potential biomarker of CRC with liver metastases [100]26.

   Like UGT2A3, few reports have focused on SLC13A2. SLC13A2 encodes the
   Na^+‐dicarboxylate cotransporter isoform 1 (NaDC1), which plays an
   important role in dealing with renal citrate. Mostly, NaDC1 is located
   on the apical membrane of epithelial cells of the renal proximal tubule
   and small intestine, where absorption of citric acids take place
   [101]27. In the kidney, the function of NaDC1 is involved with
   formation of kidney stones because of the existence of divalent citrate
   (citrate^2+). Although it seems irrelevant to CRC with liver
   metastases, our results showed that SLC13A2 is associated with liver
   metastases, although the underlying mechanism has yet to be determined.

   Among the downregulated DEGs, OSM and SPP1 shared one the same pathway,
   the PI3K‐Akt pathway, indicating that this pathway is might be involved
   in tumorigenesis and metastases. SPP1, also known as OPN, activates the
   PI3K/Akt signaling pathway through an extracellular matrix
   (ECM)‐Receptor interaction, whereas OSM activates the pathway via a
   cytokine‐cytokine receptor interaction. Matrix metalloproteinase‐3
   (MMP‐3) is a member of the MMP family that is associated with several
   kinds of malignant tumors [102]28. The CXCR‐1 binding chemokines
   IL‐8/CXCL8 and CXCL6 are often coinduced with CXCL6, according to
   several reports [103]29, [104]30, resulting in the same trend in
   regulation. Interestingly, it is reported that CXCL6 staining
   correlates with MMP9 expression in gastrointestinal malignancies. In
   contrast, another study reported that CXCL6 suppressed the activity of
   MMP2 in the first trimester, indicating various functions and roles of
   CXCL6 [105]31. In addition, with regard to CXCL8, it is reported that a
   high serum level of CXCL8 is a protective factor to prevent metastases,
   and high CXCL8 levels are associated with better prognosis [106]32.
   Meanwhile, some studies [107]33, [108]34 show that CXCL8 promotes
   cancer progression and is linked to poor prognosis. Thus, it still
   remains controversial how CXCL8 would influence the metastases;
   however, our results showed that reductions in the expression of CXCL8
   is associated with tumor progression and metastases, indicating that
   these cytokines could possibly play roles as tumor suppressors and
   biomarkers. Both CXCL6 and CXCL8 showed diverse expression between
   them, by qRT‐PCR and IHC analysis, and both methods demonstrated a
   great correspondence in all samples.

   Advanced bioinformatics tool could help us to understand the roles of
   these DEGs. In terms of Gene Ontology, one of the top terms associated
   with biological process is single‐organism process, which agrees with a
   previous study by Chen and colleagues [109]35, whereas in the molecular
   function analysis, binding is another term shared between this study
   and the previous study [110]35. This consistency not only suggests that
   most DEGs contribute to these two common processes, but also enhanced
   the credibility of both studies. For the KEGG pathways,
   cytokine‐cytokine receptor interaction, CAMs, PI3K‐Akt signaling
   pathway, Leishmaniasis, and rheumatoid arthritis were also identified
   in the previous study [111]35. Among these, the PI3K‐Akt pathway is
   recognized as a carcinogenesis‐related pathway, which is also the
   pathway shared by two validated DEGs, OSM and SPP1, suggesting that the
   PI3K‐Akt pathway is very significant [112]36, [113]37.
   Cytokine‐cytokine receptor interaction is often associated with
   carcinogenesis and metastases of CRC [114]38, [115]39, [116]40,
   [117]41, which also corresponds to our observation of significantly
   differentially expressed cytokines CXCL6 and CXCL8. CAMs also ranked
   high among all these pathways because the process of cells migration to
   distant sites is associated with adhesion ability and the
   epithelial‐mesenchymal transition (EMT)[118]42, [119]43, [120]44, which
   is also highly correlated with “binding” in GO molecular function. We
   believe that these pathways might play more important roles in the
   metastases as well as in carcinogenesis.

   Obviously, one of the disadvantages of this study is that number of
   samples for NGS is relatively small. More qualified samples are
   required to perform large‐sized NGS‐based whole‐transcriptome
   profiling. Moreover, functional validation of these DEGs is also
   required for the connections between some DEGs and carcinogenesis and
   metastases. Lastly, since there is abundant information for numerous
   DEGs, more valuable results could be obtained from the primary
   statistics.

   Collectively, the goal of this study was to discover DEGs that might be
   responsible for liver metastases or could be biomarkers of liver
   metastases, using NGS‐based whole‐transcriptome profiling on CRC
   tissues with and without metastases. Validations of the top 25 up and
   downregulated DEGs were performed via qRT‐PCR and IHC. The results
   showed that these DEGs were significantly differentially expressed and
   corresponded to the previous NGS profile. To the best of our knowledge,
   this is the first study to perform NGS‐based mRNA whole‐transcriptome
   on CRC tissues with and without liver metastases. These findings might
   provide novel targets for colorectal liver metastases and biomarker
   discovery, suggesting potential roles in therapeutic application.

Conflict of Interest

   The authors declare that they have no competing interests.

Supporting information

   Table S1. Potential target DEGs
   [121]Click here for additional data file.^ (68.3KB, xlsx)

   Cancer Medicine 2017; 6(8):1976–1987

References