Abstract Objective Using high-throughput RNA sequencing technology, this study aimed to sequence the transcriptome of kidney and liver tissues harvested from Peninsular Malaysia cynomolgus macaque (Macaca fascicularis). M. fascicularis are significant nonhuman primate models in the biomedical field, owing to the macaque’s biological similarities with humans. The additional transcriptomic dataset will supplement the previously described Peninsular Malaysia M. fascicularis transcriptomes obtained in a past endeavour. Results A total of 75,350,240 sequence reads were obtained via Hi-seq 2500 sequencing technology. A total of 5473 significant differentially expressed genes were called. Gene ontology functional categorisation showed that cellular process, catalytic activity, and cell part categories had the highest number of expressed genes, while the metabolic pathways category possessed the highest number of expressed genes in the KEGG pathway analysis. The additional sequence dataset will further enrich existing M. fascicularis transcriptome assemblies, and provide a dataset for further downstream studies. Electronic supplementary material The online version of this article (10.1186/s13104-018-4014-1) contains supplementary material, which is available to authorized users. Keywords: Macaca fascicularis, Cynomolgus macaque, RNA sequencing, Transcriptome, Biomedical science, Kidney, Liver Introduction Cynomolgus macaques (Macaca fascicularis) are nonhuman primate (NHP) models significant to biomedicine due to their close evolutionary relationship with humans. The cynomolgus macaque’s recapitulation of human physiology, genetics, and behaviour is advantageous as translational models for various studies in the biomedical field, including drug development and safety testing [[37]1, [38]2]. Cynomolgus macaque individuals from different geographical locations have been shown to exhibit genetic characteristics that varies between geographical origins [[39]3]. Therefore, it is vital that the genomes and transcriptomes of cynomolgus macaque NHP models originating from different geographical locations are sequenced as a reference for future biomedical research design and implementation. This study describes the RNA sequencing (RNA-seq) of two tissues—kidney and liver—harvested from wild Peninsular Malaysian cynomolgus macaques, and is an extension of a previous study [[40]4] whereby the lymph node, spleen, and thymus transcriptomes of wild Peninsular Malaysian M. fascicularis were also sequenced with RNA-seq technology. An additional 75,350,240 sequence reads were obtained from this study, supplementing the previous wild Peninsular Malaysian M. fascicularis dataset for downstream applications. Furthermore, additional Malaysian cynomolgus macaque RNA-seq datasets will further furnish the cynomolgus macaque genome and transcriptome annotations and also provide a valuable asset for biomedical studies involving cynomolgus macaques. Main text Materials and methods Samples Two male conflict macaques that appeared to belong in the same family group were captured by the Department of Wildlife and National Parks (DWNP) from the state of Selangor. Visual examination showed that the macaques were free of disease. Euthanisation and harvesting of organs were also performed by authorised and qualified veterinarians of DWNP. In brief, the macaques were sedated intramuscularly using general anaesthesia (combination of ketamine, 5–10 mg/kg and xylazine, 0.2–0.4 mg/kg) before a lethal dosage of Dolethal^® were given intravenously. Harvested kidney and liver tissues were stored in RNAlater RNA Stabilization Agent (Qiagen). Total RNA extraction was carried out with RNeasy Mini Kit (Qiagen). A modified protocol was employed to remove genomic DNA contamination by incorporating Epicentre’s DNase I solution. RNA extract integrity was determined using 2100 Bioanalyzer (Agilent Technologies, Inc., United States of America) via RNA Pico chip. Samples with RIN > 7.0 were selected for library construction. High-throughput sequencing, data analysis, and validation Preparation of M. fascicularis kidney and liver RNA-seq libraries followed methods previously described in Ee Uli et al. [[41]4]. Raw sequencing reads were submitted to NCBI Short Read Archive under accessions SRX2499144-SRX2499147. For data analyses, we used the same bioinformatic workflow and tools as depicted in our earlier publication [[42]4]. Briefly, the assembly of raw sequence reads, normalisation, empirical analysis of differentially expressed genes (EDGE test), and gene ontology (GO) and KEGG pathway annotations were performed in CLC Genomics Workbench (CLCGW) version 7.5.1 (Qiagen Denmark), while functional classification of expressed genes in the kidney and liver transcriptomes was performed using Panther Database version 11.1 released on 2016-10-24 ([43]http://pantherdb.org/) and DAVID Bioinformatics Resources version 6.8 ([44]https://david-d.ncifcrf.gov/home.jsp). To better present the tissue-specific genes, we combined gene expression data from this study and our previous study [[45]4] to create a Venn diagram representation of tissue-specific genes using the web tool Venn Diagrams ([46]http://bioinformatics.psb.ugent.be/webtools/Venn). Validation of the RNA-seq dataset was performed with NanoString nCounter Elements XT (NanoString Technologies Inc., Seattle, WA, USA) gene expression assay. The methodology was the same as previously described in Ee Uli et al. [[47]4]. Additional file [48]1 lists the genes utilised for the RNA-seq validation and their respective probe pair sequences. Results Sequence reads trimming and RNA-seq mapping The number of raw kidney and liver sequence reads obtained were 44,452,083 and 30,898,157 respectively, each read possessing a sequence length of 70 bp. After trimming, the number of kidney and liver sequence reads were 41,854,334 and 28,943,585 singly. Summing up to a total of 70,788,919 good quality reads retained for assembly, with the average lengths of the reads ranging from 58.8 to 60.1 bp. Post-trimming, 93.94% of the reads were retained and were considered suitable for RNA-seq mapping. The overall mapping percentage of the sequence reads for each RNA-seq library ranged from 65.00 to 68.59%. Additional file [49]2 summarises the statistics of the sequence reads trimming and mapping. Normalisation and differential gene expression analysis A total of 5473 significant differentially expressed genes were called. Additional file [50]3 lists the differentially expressed genes identified from the EDGE test with their respective annotated gene names and descriptions. Within the Kidney vs. Liver tissue comparison, the top three most upregulated genes in liver were PLA2G2A, SAA1, and ORM1, while the three most downregulated genes include UMOD, TINAG, and DHDH. Tissue-specific genes Using gene expression data obtained from our previous study [[51]4] and this study, a Venn diagram representation of tissue-specific genes in kidney, liver, lymph node, spleen, and thymus tissues was generated (Fig. [52]1). Lymph node showed the highest number of tissue-specific genes (2156 genes) among the five tissues compared; while kidney and liver had 310 and 154 tissue-specific genes respectively. The three most highly expressed tissue-specific genes in the kidney include MIOX, AQP2, and CDH16, while the top three tissue-specific genes expressed in the liver include APCS, APOA2, and LOC102121341. A complete list of tissue-specific genes can be found in Additional file [53]4. Fig. 1. Fig. 1 [54]Open in a new tab Venn diagram of the number of expressed genes (normalised expression value > 1) in their respective tissues Functional annotation and classification In the kidney, 14,001, 7046, and 5635 expressed genes were categorised to biological process, molecular function, and cellular component domains respectively, while 12,004, 6059, and 4935 expressed genes in the liver were categorised to BP, MF, and CC domains respectively. The distribution of the expressed genes into different level 2 GO categories are shown in Table [55]1. The BP category with the highest number of expressed genes for both tissues is cellular process (kidney, 4039; liver, 3496). For the MF domain, the catalytic activity category contains the highest number of expressed genes for both tissues (kidney, 2709; liver, 2419). As for CC, both tissues show the highest number of expressed genes in the cell part category (kidney, 2185; liver, 1954). Table 1. Categorisation of expressed genes to gene ontology categories in kidney and liver tissues Gene ontology Number of genes Kidney Liver Biological process Cellular process 4039 3496 Metabolic process 3493 3114 Localization 1060 926 Response to stimulus 1031 878 Developmental process 939 745 Cellular component organization or biogenesis 842 753 Biological regulation 923 740 Multicellular organismal process 668 499 Immune system process 499 443 Biological adhesion 247 211 Reproduction 165 120 Locomotion 81 71 Rhythmic process 6 1 Growth 4 4 Cell killing 4 3 Molecular function Catalytic activity 2709 2419 Binding 2660 2310 Transporter activity 572 435 Receptor activity 446 348 Structural molecule activity 450 384 Signal transducer activity 147 110 Translation regulator activity 32 31 Antioxidant activity 14 11 Channel regulator activity 16 11 Cellular component Cell part 2185 1954 Organelle 1359 1220 Membrane 851 676 Macromolecular complex 744 681 Extracellular region 318 264 Extracellular matrix 96 75 Cell junction 50 39 Synapse 32 26 [56]Open in a new tab In the kidney and liver, a total of 4406 and 3960 genes were assigned to KEGG pathway categories respectively. The metabolic pathways category contains the highest number of expressed genes for both kidney and liver tissues—903 and 839 genes respectively. The distribution of the expressed genes into different KEGG pathway categories are shown in Table [57]2. Table 2. Categorisation of expressed genes to KEGG pathway categories in kidney and liver tissues KEGG pathway Number of genes Kidney Liver Metabolism Global overview and maps 903 1046 mcf01100: metabolic pathways 903 839 mcf01230: biosynthesis of amino acids 0 58 Carbohydrate metabolism 55 37 mcf00520: amino sugar and nucleotide sugar metabolism 0 37 mcf00562: inositol phosphate metabolism 55 0 Energy metabolism 139 135 mcf00190: oxidative phosphorylation 139 135 Glycan biosynthesis and metabolism 0 35 mcf00510: N-glycan biosynthesis 0 35 Genetic information processing Folding, sorting and degradation 124 119 mcf04141: protein processing in endoplasmic reticulum 124 119 Environmental information processing Membrane transport 37 0 mcf02010: ABC transporters 37 0 Signal transduction 600 128 mcf04010: MAPK signalling pathway 180 0 mcf04012: ErbB signalling pathway 69 67 mcf04015: Rap1 signalling pathway 154 0 mcf04022: cGMP-PKG signalling pathway 119 0 mcf04064: NF-kappa B signalling pathway 0 61 mcf04668: TNF signalling pathway 78 0 Signalling molecules and interaction 177 0 mcf04512: ECM-receptor interaction 71 0 mcf04514: cell adhesion molecules (CAMs) 106 0 Organismal system Development 101 0 mcf04360: axon guidance 101 0 Immune system 154 199 mcf04610: complement and coagulation cascades 0 56 mcf04611: platelet activation 101 89 mcf04662: B cell receptor signalling pathway 53 54 Nervous system 172 88 mcf04722: neurotrophin signalling pathway 93 88 mcf04725: cholinergic synapse 79 0 Endocrine system 302 221 mcf04910: insulin signalling pathway 106 98 mcf04915: estrogen signalling pathway 80 73 mcf04917: prolactin signalling pathway 0 50 mcf04921: oxytocin signalling pathway 116 0 Cellular processes Transport and catabolism 250 325 mcf04142: lysosome 0 86 mcf04144: endocytosis 182 174 mcf04146: peroxisome 68 65 Cellular community—eukaryotes 168 152 mcf04510: focal adhesion 168 152 [58]Open in a new tab Validation of RNA-seq differential gene expression Thirteen genes were selected to validate the RNA-seq differential gene expression dataset via NanoString gene expression assay. Normalised fold change values between the RNA-seq and NanoString platforms show concordance in magnitude directionality (Additional file [59]5). Our previous study also employed the NanoString gene expression assay to validate the RNA-seq dataset, and has shown similar concordance between the two platforms [[60]4]. Discussion In our previous endeavour [[61]4], the lymph node, spleen, and thymus transcriptomes of the Malaysian cynomolgus macaque were sequenced via Illumina Hi-seq 2500 technology. The dataset presented in this study provides additional transcriptomic sequences from the kidney and liver of the Malaysian cynomolgus macaque, with a sum of 75,350,240 sequence reads obtained from four tissue libraries. In Ee Uli et al. [[62]4], the transcriptome sequences were derived from immune-related tissues, whereas the kidney and liver transcriptomes generated in this study were obtained from metabolism-related tissues, and to a certain extent, these tissues also pertain to immune functions as well. Cynomolgus macaques are utilised as NHP models for liver-expressed drug metabolising enzyme studies, for example, the pharmacokinetics of the cytochrome P450 family of enzymes have been extensively studied in cynomolgus macaques [[63]2, [64]5, [65]6]. It was observed that the percentages of the processed sequence reads that mapped back to the reference genome for all four libraries were below 70%. To rule out sequence contamination, we performed a BLAST alignment of the unmapped reads to NCBI’s nucleotide (nt) database and classified the hits into their specific taxonomic classification. The majority of the unmapped reads were assigned to Cercopithecidae and other primate family taxa, while the remaining reads were unassigned or did not have any BLASTN hits (Additional file [66]6). This suggests the absence of contaminating sequences, and also suggests the presence of novel transcripts in the kidney and liver dataset of the Malaysian cynomolgus macaque, which are yet to be described and curated in the M. fascicularis reference genome. Comparisons were made between the kidney and liver tissues due to the similarities in metabolic function between the two tissues. However, the specific metabolic functions of both kidney and liver tissues are ultimately different. For instance, the KEGG pathway enrichment analysis highlights 58 liver genes that were specifically enriched in the biosynthesis of amino acids pathway, while 55 kidney genes were involved in the Inositol phosphate metabolism pathway. Based on the differential gene expression analysis of kidney and liver tissues, the top three most upregulated genes in liver were PLA2G2A, SAA1, and ORM1, while the three most downregulated genes in kidney include UMOD, TINAG, and DHDH. The PLA2G2A gene codes for phospholipase A2 group IIA, and is suggested to be involved in the metabolism of phospholipids, inflammatory responses, and antimicrobial defence [[67]7]. SAA1 codes for the acute-phase serum amyloid A1 protein, and is involved in inflammation response [[68]8]. ORM1 codes for orosomucoid 1, which is suggested to induce the conversion of monocytes to M2b macrophages in response to type 2 cytokines in cancer patients [[69]9]. UMOD codes for uromodulin, a urinary protein which inhibits the crystallisation of renal fluids that causes renal stone formation, and also expedites transepithelial migration of neutrophils in inflammation responses [[70]10]. Tubulointerstitial nephritis antigen is a glycoprotein coded by TINAG, which is a nephritogenic antigen implicated in tubular homeostasis and cell survival in the kidney [[71]11]. DHDH codes for the enzyme dihydrodiol dehydrogenase, and is suggested to be involved in detoxification processes in the kidney and liver [[72]12]. In the GO analysis, both kidney and liver transcriptomes were observed to have the majority of genes represented by the same categories in their respective GO domains. In the BP domain, the highly represented category was cellular process, while the category with the highest gene representation in the MF domain was catalytic activity. As for CC domain, the highest number of expressed genes was represented in the cell part category. The representation of functional GO categories in the M. fascicularis transcriptome were observed to be similar to that of other chordates, including that of the silver carp [[73]13], pig [[74]14], black grouse [[75]15], and pufferfish [[76]16], and suggest a fair representation of the M. fascicularis transcriptome with that of other chordates. The Malaysian cynomolgus macaque, M. f. fascicularis, is a unique subspecies that is suggested to have high nucleotide diversity and is genetically unique compared to other South East Asian M. fascicularis populations [[77]17–[78]19]. Genetic heterogeneity is suggested to contribute to varied responses in cynomolgus macaques towards drugs and pathogens [[79]20], making research into the Malaysian population of cynomolgus macaque invaluable to the biomedicine field. The transcriptome dataset of the Peninsular Malaysian cynomolgus macaque obtained from this study can be compared with cynomolgus macaques originating from other geographical locations including Mauritius, the Philippines, Indochina, Indonesia, and China. Such comparisons can be utilised for the detection of single nucleotide polymorphisms in the macaque transcriptome, to infer phylogeny utilising a larger set of genic sequences, and also to determine the nucleotide diversity of the Peninsular Malaysian cynomolgus macaque. Furthermore, the transcriptomic data obtained from this study may be utilised to design genic molecular markers for rapid assessment of wild populations of cynomolgus macaques in Peninsular Malaysia in a relatively shorter period of time and at a lower cost. Conclusion Via the Illumina Hi-seq 2500 platform, 75,350,240 sequence reads in total were obtained from kidney and liver transcriptomes of the wild Peninsular Malaysian cynomolgus macaque. Additional sequence reads from the Malaysian cynomolgus macaque individuals will potentially further supplement the present genomic and transcriptomic data of this significant NHP primate model. Limitations * Only two biological replicates were available for library construction and data analyses. * RNA sequencing of additional tissues will provide more robust transcriptomic dataset of wild Peninsular Malaysia M. fascicularis. Additional files [80]13104_2018_4014_MOESM1_ESM.xlsx^ (11KB, xlsx) Additional file 1. Selected genes for NanoString nCounter XT Gene Expression assay validation of RNA-Seq gene expression profiles. Table lists selected genes utilised for the validation of RNA-Seq gene expression profiles via NanoString nCounter XT Gene Expression assay and their respective probes nucleotide sequences. [81]13104_2018_4014_MOESM2_ESM.xlsx^ (9.8KB, xlsx) Additional file 2. Macaca fascicularis kidney and liver sequence reads trimming and reference guided assembly summary statistics. Summary of Macaca fascicularis kidney and liver sequence reads quality trimming and reference guided assembly. [82]13104_2018_4014_MOESM3_ESM.xlsx^ (2.8MB, xlsx) Additional file 3. List of differentially expressed genes from kidney vs. liver comparison. Differentially expressed genes in kidney and liver tissues called from the empirical differential gene expression analysis in CLC Genomics Workbench. Listed are the genes with their corresponding fold change values, together with their gene ontology and pathway annotations. [83]13104_2018_4014_MOESM4_ESM.xlsx^ (64.4KB, xlsx) Additional file 4. Tissue-specific genes (sorted from highest to lowest Normalised mean expression value). A list of tissue-specific genes generated from a Venn diagram comparison of expressed genes (normalised expression value > 1) in kidney, liver, lymph node, spleen, and thymus tissues. [84]13104_2018_4014_MOESM5_ESM.pdf^ (5.9KB, pdf) Additional file 5. Validation of RNA-seq differential gene expression results. Log[2] transformed fold change values were obtained from RNA-seq and NanoString nCounter XT platforms. Black bar represents fold change values obtained from RNA-Seq platform, while white bar represents fold change value obtained from NanoString nCounter XT platform. [85]13104_2018_4014_MOESM6_ESM.jpg^ (433.2KB, jpg) Additional file 6. BLAST alignment of unmapped Macaca fascicularis reads to NCBI nucleotide database. Graphical representation of unmapped M. fascicularis sequence reads assigned to taxonomic ranks based on BLAST alignment with NCBI nucleotide (nt) database. Intensity of the colour green represents the number of unmapped reads mapped to a respective taxonomic rank—the more intense the green, the higher the number of reads assigned to a particular taxon. Authors’ contributions JEU, CS-YY, SKY, NBA, and SGT conceptualised and designed the study. JJR-R provided cynomolgus macaque samples. JEU performed the lab work. JEU, CS-YY, SKY, and NMI analysed the dataset. Manuscript was prepared by JEU. All authors read and approved the final manuscript. Acknowledgements