Abstract Time-course experiments using parallel sequencers have the potential to uncover gradual changes in cells over time that cannot be observed in a two-point comparison. An essential step in time-series data analysis is the identification of temporal differentially expressed genes (TEGs) under two conditions (e.g. control versus case). Model-based approaches, which are typical TEG detection methods, often set one parameter (e.g. degree or degree of freedom) for one dataset. This approach risks modeling of linearly increasing genes with higher-order functions, or fitting of cyclic gene expression with linear functions, thereby leading to false positives/negatives. Here, we present a Jonckheere–Terpstra–Kendall (JTK)-based non-parametric algorithm for TEG detection. Benchmarks, using simulation data, show that the JTK-based approach outperforms existing methods, especially in long time-series experiments. Additionally, application of JTK in the analysis of time-series RNA-seq data from seven tissue types, across developmental stages in mouse and rat, suggested that the wave pattern contributes to the TEG identification of JTK, not the difference in expression levels. This result suggests that JTK is a suitable algorithm when focusing on expression patterns over time rather than expression levels, such as comparisons between different species. These results show that JTK is an excellent candidate for TEG detection. INTRODUCTION Time-course experiments using a parallel sequencer or mass spectrometry capture dynamic changes during the development or perturbation of a cellular system over time ([26]1,[27]2). Although specific issues in biological data, such as low sampling frequency, exist ([28]3), recent advancements in modern high-throughput techniques have enabled description of the regulatory molecular circuits that drive differentiation processes and adaptation to the environment in greater detail ([29]4,[30]5). One of the major steps in analyzing time-series omics data is the identification of genes that are differentially expressed between two groups (e.g. wild-type versus knockout strain) on a time axis ([31]6). We defined differentially expressed genes over time as temporal differentially expressed genes (TEGs). While algorithms for general differentially expressed gene analysis ([32]7,[33]8) and algorithms for interpretation of time-series data in the field of circadian rhythms have been relatively well studied ([34]9,[35]10), the golden standard for TEG analysis has not been established. Many tools have been implemented in the detection of TEGs in time-course experiments ([36]11). MaSigPro ([37]12,[38]13) performs polynomial regressions to model time-course expression values, and a log likelihood ratio test to detect TEGs. MaSigPro consists of two steps; in the first step, dynamic (non-flat) genes are selected, and in the second step, the best model is sought, and the P-value is calculated using user-specified parameters. SplineTimeR (splineTC) ([39]14) was originally developed for the construction of gene networks and for provision of pathway-enrichment analysis and visualization functions. SplineTC fits natural cubic spline curves to time-course data and applies empirical Bayes moderate F-statics between two groups. ImpulseDE2 ([40]15) fits the impulse model ([41]16,[42]17) to time-course data, and the null model is represented by a common impulse model; therefore, the alternative model is represented by a different impulse model. LimoRhyde is designed to detect differential rhythmicity and differential expression using cosinor regression ([43]18). These methods have shown high accuracy in comprehensive comparative studies of Spies et al., except for the recently published LimpRhyde ([44]11). These studies are considered as good benchmarks for TEG detection algorithms. Previous approaches for TEG analysis have mainly been performed by fitting regression models to two groups and by assessing whether the models are statistically consistent by using a hypothesis test. However, experimental data contain genes and proteins that differ in complexity, which poses the risk of modeling linearly increasing genes with higher-order functions or fitting cyclically varying genes with linear functions, thereby leading to false positives or false negatives. To tackle this issue, we need to explore the parameters to model each gene or develop a model-free and non-parametric approach. Here, we propose a Jonckheere–Terpstra–Kendall (JTK)-based non-parametric TEG detection algorithm to characterize time-course experiments. The Jonckheere–Terpstra test ([45]19,[46]20) is a non-parametric test for the detection of ordering patterns between two measured quantities, with the correlation coefficient, Kendall’s τ, measuring the ordinal association between two groups. In circadian rhythm studies, the JTK algorithm, which combines these two statistical methods, has been widely used to detect oscillating molecules in omics datasets ([47]9,[48]21). We expanded this application of the JTK algorithm to TEG detection. To the best of our knowledge, JTK is the only non-parametric TEG detection method. Our study demonstrates the novelty of a powerful non-parametric approach in the exploration of differentially expressed genes in time-series omics datasets. MATERIALS AND METHODS Design The schematic diagram of JTK and the definition of TEG in each method are shown in Figure [49]1. JTK calculates Kendall’s τ between two groups (e.g. control and case). For two time-series with lengths n, the replicates were averaged, x = (x[1], x[2], ..., x[n]) and y = (y[1], y[2], ..., y[n]), we define: graphic file with name M8.gif (1) graphic file with name M9.gif (2) The numerator in Equation ([50]1) indicates the number of pairs with consistent behavior between the two time-series, minus the number with inconsistent behavior. All possible pairs, and not just neighboring points, are included in the calculation of τ. The denominator indicates the total number of pairs in the datasets, and τ must be in the range from −1 to 1. Two time-series are perfectly correlated if τ = 1, and are perfectly anti-correlated if τ = −1, or uncorrelated if τ = 0. Null distributions for each gene were predicted by permutation. The number of Kendall’s τ obtained by permutation, which is smaller than the τ obtained from the actual order, was counted and divided by the number of permutations (1000 times) to be considered as the P-value. Figure 1. Figure 1. [51]Open in a new tab (A) An illustration of the JTK algorithm. Solid arrows pointing upward, downward and sideways indicate an increase, decrease and constant, respectively, between two points. JTK compares the increase/decrease pattern of two sequences (e.g. control versus case). The relationship of all combinations between the points is summed up, and normalized by the number of combinations that are within the range of −1 to 1. The two time-series are perfectly correlated if τ = 1, are perfectly anti-correlated if τ = −1 and are uncorrelated if τ = 0. (B) The definition of the TEG in each method. (*1) MaSigPro performs a TEG test for non-flat genes only. (*2) SplineTC offers a choice to include or exclude the intercept in the test. Data simulation To assess the performance of each method, we generated synthetic data that mimicked read count data of RNA-seq data. The time-course of each gene was simulated using the following four functions based on the work of Wang et al. ([52]22): graphic file with name M10.gif (3) graphic file with name M11.gif (4) graphic file with name M12.gif (5) graphic file with name M13.gif (6) where, t is the time point, and α and β set the amplitude and intercept, respectively. The α and β followed a power-law distribution, and were generated by the rplcon function (n = 1, min = 30, α = 2.5) of the powerRaw package in R. The values obtained by these functions (average expression values) were converted to read counts by a negative binomial (NB) distribution. graphic file with name M14.gif (7) where ϕ indicates the magnitude of the noise, and in this study, the noise level refers to ϕ. TEGs, which had different time-series expression patterns between controls and cases, were generated from different functions for controls and cases, while non-TEGs were generated from the same function. The synthetic dataset for Figures [53]2 and [54]3 contains 200 time-series gene expression data conducted for each condition. The data in Figure [55]4 contains 100 time-series gene expression data for each condition. For the gene-labeled non-TEG, the time-series data for the control and cases were generated using the same function, and for the gene-labeled TEG, were generated using the two specified functions. The combination of functions in the labeled genes is constant and is shown in the figure. Figure 2. [56]Figure 2. [57]Open in a new tab Example plots and ROC curves of simulated data. (A) Simulated data for different sequence lengths and (B) number of points. In the case of sine curves, the number of peaks increases as the sequence length increases. As the number of time points increases, the number of peaks remains the same, but the sampling intervals become shorter. The box on the left indicates typical examples of non-TEG genes, and the box on the right indicates typical examples of TEG genes. The noise level is 0.05, and the error bars represent the standard deviation (n = 3). The top boxes represent examples of sequence length or time points of 8, while bottom boxes represent examples of 9. Comparison of method performance with ROC curves for different (C) sequence lengths and (D) number of time points. Datasets in A were used to generate the ROC curves in (C), and datasets in (B) corresponded to (D). The numbers at the top indicate the length of the sequence or the number of time points, and the numbers on the right indicate the noise level. The degree and degree of freedom are shown using the color legend. As the degree and degree of freedom should be less than a time point, only limited conditions are illustrated in the boxes with sequence length or time points of 4. Figure 3. Figure 3. [58]Open in a new tab Comparison of method performance with ROC curves for different number of replicates with 16 time points. Numbers at the top indicate the number of replicates, while numbers on the right indicate the noise level. Since ImpulseDE2 requires multiple replicates to predict the NB distribution, the condition of n = 1 is not shown. Figure 4. [59]Figure 4. [60]Open in a new tab Preferences of each method and function. Functions 1–4 correspond to