Abstract Background Transcriptomic sequencing (RNA-seq) related applications allow for rapid explorations due to their high-throughput and relatively fast experimental capabilities, providing unprecedented progress in gene functional annotation, gene regulation analysis, and environmental factor verification. However, with increasing amounts of sequenced reads and reference model species, the selection of appropriate reference species for gene annotation has become a new challenge. Methods We proposed a novel approach for finding the most effective reference model species through taxonomic associations and ultra-conserved orthologous (UCO) gene comparisons among species. An online system for multiple species selection (MSS) for RNA-seq differential expression analysis was developed, and comprehensive genomic annotations from 291 reference model eukaryotic species were retrieved from the RefSeq, KEGG, and UniProt databases. Results Using the proposed MSS pipeline, gene ontology and biological pathway enrichment analysis can be efficiently achieved, especially in the case of transcriptomic analysis of non-model organisms. The results showed that the proposed method solved problems related to limitations in annotation information and provided a roughly twenty-fold reduction in computational time, resulting in more accurate results than those of traditional approaches of using a single model reference species or the large non-redundant reference database. Conclusions Selection of appropriate reference model species helps to reduce missing annotation information, allowing for more comprehensive results than those obtained with a single model reference species. In addition, adequate model species selection reduces the computational time significantly while retaining the same order of accuracy. The proposed system indeed provides superior performance by selecting appropriate multiple species for transcriptomic analysis compared to traditional approaches. Electronic supplementary material The online version of this article (10.1186/s12859-018-2278-z) contains supplementary material, which is available to authorized users. Keywords: RNA-seq, Reference model species, Differential expression analysis, Ultra-conserved orthologous gene, Gene ontology, Biological pathway Background High-throughput RNA sequencing (RNA-seq) involves the sequencing of cDNA samples using next-generation sequencing instruments, thus permitting the analysis of complete transcriptomes from samples subjected to different experimental conditions. In addition to providing a snapshot of an organism’s transcriptional profile, RNA-seq can also be used to observe changes in gene expression levels by comparing transcriptomes obtained at various time points or from different tissues under different environmental settings [[37]1]. In recent years, owing to advances allowing for high-throughput, relatively fast, and low-cost experiments, RNA-seq has led to unprecedented development of studies focusing on gene functions and regulation, as well as correlations between genes and environmental factors. Compared to traditional microarray hybridization approaches for gene expression analysis, a major advantage of RNA-seq is that it allows for analysis of non-model target species without comprehensive gene annotations in advance [[38]2]. This is because RNA-seq can still accurately detect novel gene transcripts and expression levels in different tissues or under different conditions through de novo assembly of the sequenced reads. In addition, RNA-seq provides high dynamic expression ranges, and costs are relatively low for obtaining whole-transcriptome data [[39]3]. Based on these advantages, non-model organisms now have been frequently selected by the research community for extensive study. It is also due to the facts that non-model organisms possess many interesting and unique features compared to model organisms, and comprehensive transcriptome analysis reveals novel genes and helps to address genetic evolution and ecological issues from different aspects. Although transcriptional data can be obtained using RNA-seq without any prior knowledge of gene sequences, for functional genomic analysis of non-model species, a sequence similarity alignment must be performed with reference sequence datasets containing known gene functions in order to accurately annotate the gene functions of assembled sequences. The most popular gene sequence database is the International Nucleotide Sequence Database Collaboration (INSDC), which regularly updates sequence information from three databases: the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EMBL-EBI), and the DNA Data Bank of Japan (DDBJ). The non-redundant (nr) sequence database GenBank, maintained by NCBI, contains a popular reference dataset for sequence annotation; thus, several RNA-seq studies performed on novel species have thus far aligned sequences to the nr sequence database for gene function annotation [[40]4–[41]6]. However, the volume of this database is enormous. As of June 2016, a total of 37,382,402 sequences had been indexed. Using this dataset as a reference for assembled sequence alignment and gene function annotation not only requires a large amount of computational resources, but the alignment results may also contain false positive annotations due to contaminated samples and false assembled contigs. An alternative method commonly used by biologists is to select the single reference model species that has the greatest evolutionary relevance to the target species. While this method is rapid and simple, the selected model species may not necessarily have complete gene annotations, and thus it is possible that certain information may be missed. Therefore, in this study, we aimed to develop a tool that can use multiple model species simultaneously as references for annotating genes from new RNA-seq