Abstract
Discovering dominant patterns and exploring dynamic behaviors, especially critical state transitions and tipping points, in high-dimensional time-series data are challenging tasks in the study of real-world complex systems, which demand interpretable data representations to facilitate comprehension of both spatial and temporal information within the original data space. This study proposes a general and analytical ultralow-dimensionality reduction method for dynamical systems, named spatial-temporal principal component analysis (stPCA), that fully represents the dynamics of a high-dimensional time series by only a single latent variable without distortion, transforming high-dimensional spatial information into one-dimensional temporal information based on nonlinear delay-embedding theory. The dynamics of this single variable are analytically solved and theoretically preserve the temporal properties of the original high-dimensional time series, thereby accurately and reliably identifying the tipping point before an upcoming critical transition. Its applications to real-world datasets, such as individual-specific heterogeneous ICU records, demonstrate the effectiveness of stPCA, which quantitatively and robustly provides early-warning signals of the critical/tipping state for each patient.
Keywords: critical state transition, interpretable data representation,
spatial‐temporal PCA, ultralow‐dimensionality reduction
__________________________________________________________________
The proposed spatial‐temporal principal component analysis (stPCA)
method analytically reduces high‐dimensional time‐series data to a
single latent variable by transforming spatial information into
temporal dynamics. By preserving the temporal properties of the
original data, stPCA effectively identifies critical transitions and
tipping points. It provides robust early‐warning signals, demonstrating
effectiveness on both simulation and real‐world datasets.
1. Introduction
Numerous physical and biological processes are characterized by high-dimensional nonlinear dynamical systems, in which critical state transitions often occur.[1] However, for most real-world systems that are too complex to be described by an explicit model, we have to study their dynamics, especially identifying the tipping point before an upcoming critical transition, through high-dimensional time-series or sequential data. The exploratory analysis of these systems requires data dimensionality reduction and linear or nonlinear data representations,[2] which are critical in dynamical analysis, pattern recognition, and visualization.[3] It remains a challenging problem to efficiently perform dimensionality reduction and explore ultralow-dimensional dynamics in an interpretable way for complex dynamical systems. A variety of data-driven techniques have been developed to facilitate this task, including t-distributed stochastic neighbor embedding (t-SNE),[4] uniform manifold approximation and projection (UMAP),[5] multidimensional scaling (MDS),[6] local linear embedding (LLE),[7] and isometric feature mapping (ISOMAP).[8] Nevertheless, these approaches generally lack the desirable linear and temporal interpretability, which is essential for a lucid depiction of location, direction, distance, and dynamics in the low-dimensional representation space.
Linear dimensionality reduction methods, including principal component analysis (PCA),[9] linear discriminant analysis,[10] sparse dictionary learning (SDL),[11] and independent component analysis,[12] offer interpretability by preserving the relations among variables through linear transformations. For example, PCA linearly projects the original data onto a lower-dimensional subspace such that the projected latent variables capture the maximum variance in the representation space. However, these existing methods do not leverage the temporal or inherent relations among sequentially observed samples within the original data; hence, explaining the dynamic nature of time-series data with these methods is difficult. Moreover, these methods are sensitive to outliers and thereby susceptible to noise interference. These limitations are a barrier to the practical use of data representation for real-world dynamical systems.
Deep neural networks (DNNs) have emerged as powerful tools for sequential data analysis and dynamic modeling, leveraging deep learning techniques to handle nonlinear embeddings and intricate patterns in dynamical systems; examples include recurrent neural networks (RNNs)[13] and their low-rank variants,[14,15] temporal convolutional networks,[16] autoencoders (AEs),[17] deep belief networks (DBNs),[18] transformers,[19] CEBRA,[20] LFADS,[21] CANDyMan,[22] and MARBLE.[23] Despite the state-of-the-art performance of these DNN architectures, traditional RNNs can be computationally intensive and may suffer from issues such as vanishing or exploding gradients, which can limit their ability to capture all relevant information. Additionally, many DNN architectures still demand significant computational and memory resources, which can be limiting in resource-constrained environments or scenarios with strict latency requirements.
Recently, Koopman operator theory has offered an alternative perspective, suggesting that linear superposition may characterize nonlinear dynamics through the infinite-dimensional linear Koopman operator,[24-27] but obtaining its representations has proven difficult in all but the simplest systems. One approach to finding tractable finite-dimensional representations of the Koopman operator is to use DNNs;[28] however, they still face challenges related to the requirement of large training samples and other computational issues. Dynamic mode decomposition (DMD)[24-26] is another commonly employed method for approximating the Koopman operator from time-series data. It decomposes the system dynamics into temporal modes, each of which is associated with a fixed oscillation frequency and decay/growth rate. Some variants of the DMD algorithm mitigate noise interference by using techniques such as a Schweppe-type Huber maximum-likelihood estimator to flag outliers, incorporating regularizers and constraints on the decomposition parameters, or deriving a maximum a posteriori (MAP) estimator for systems corrupted by multiplicative noise.[29-31] Nonetheless, these DMD methods may require extensive time-series data and involve the computation of pseudo-inverses of a linear matrix operator and truncation of the spectrum, all of which potentially impact accuracy. Another method for representing dynamical systems is sparse identification of nonlinear dynamics (SINDy),[32] which combines sparsity-promoting techniques and machine learning with nonlinear dynamical systems to discover governing equations, but its accuracy highly depends on the choice of measurement coordinates and the sparsifying function.
Furthermore, the dimensionality reduction methods mentioned above generally require multiple variables or principal components to represent the original high-dimensional data. In other words, multiple representation components are usually mandatory for reconstructing a dynamical system by those methods to avoid information loss, which leads to challenges in efficiently visualizing or quantifying nonlinear phenomena, e.g., the critical slowing down (CSD) effect.[33,34] In fact, when a dynamical system approaches a bifurcation point from a stable steady state, the state space is generically constrained to a center manifold, which typically has a codim-1 local bifurcation with a dominant real eigenvalue, such as the saddle-node bifurcation; thus, a one-dimensional variable can roughly represent the center manifold near the tipping point. A natural question arises for analyzing observed high-dimensional time-series data: is it possible to devise an ultralow-dimensionality reduction method with only a single latent variable that fully represents or reconstructs the whole dynamics of the original, yet unknown, high-dimensional system?
Based on the generalized Takens' embedding theory,[35,36] the dynamics of a system can be topologically reconstructed from a delay-embedding scheme if L > 2d > 0, where d is the box-counting dimension of the attractor of the original system and L is the embedding dimension. By assuming that the steady state of a high-dimensional system is contained in a low-dimensional manifold, which generally holds for dissipative real-world systems, the spatial-temporal information (STI)[37] transformation has been theoretically derived from the delay-embedding theory. This approach transforms the spatial information of high-dimensional data into the temporal dynamics of any (one) target variable, i.e., a one-dimensional delay embedding of this single variable, thus exploiting not only inter-sample (cross-sample) information but also intra-sample information embedded in the high-dimensional/spatial data. Several methods, such as randomly distributed embedding,[37] the autoreservoir neural network,[38] and the spatiotemporal information conversion machine,[39] have been developed for predicting short-term time series within the framework of STI transformation, which depends on an explicit target variable from the high-dimensional data.
In this study, following the STI transformation but relying on a latent variable, we develop a novel ultralow-dimensionality reduction framework, spatial-temporal principal component analysis (stPCA), to fully represent the dynamics of high-dimensional time-series data by only a single latent variable. This framework enables effective detection of impending critical transitions in dynamical systems with codim-1 local bifurcations by examining the fluctuation of this single variable, which is derived by transforming high-dimensional spatial information into low-dimensional temporal information. On the basis of a solid theoretical background in nonlinear dynamics and delay-embedding theory (Figure 1A), stPCA naturally imposes constraints on samples from the inherent temporal characteristics of dynamical systems, rendering it an applicable method for representing a complex dynamical system with almost no information loss. Furthermore, this single representation variable is obtained analytically from a delay embedding-based optimization problem by solving a characteristic equation, rather than by an iterative numerical optimization algorithm that generally depends on the initial values of the parameters W,[40] and it theoretically preserves the dynamical properties of the original high-dimensional system.
Figure 1.
Schematic illustration of stPCA. A) The given high-dimensional "spatial" information X^t can be transformed into one-dimensional "temporal" information Z^t based on the STI equations derived from the delay-embedding theorem. B) Given the high-dimensional time series X, stPCA seeks the transformation matrix W and the projected Hankel matrix Z. The optimization objective consists of two terms: the first maximizes the variance of the projected variable Z, and the second ensures that the projected Hankel matrix Z satisfies the delay-embedding condition. The analytical solution of the optimization problem is obtained by solving a characteristic equation H(X)V = αV, where H(X) is a block-tridiagonal matrix. C) Through stPCA, the high-dimensional time series X is effectively reduced to the Hankel matrix Z, which is actually formed by a single variable. On the other hand, PCA reduces the high-dimensional time series X to Y. Notably, both the single latent variable z from stPCA and the multiple latent variables from PCA can topologically reconstruct the original dynamical system.
Specifically, stPCA employs a spatial-temporal transformation, denoted W, merging i) the linear interpretability of PCA and ii) the temporal dynamics of delay embeddings (Figure 1B). On the one hand, similar to PCA, stPCA maximizes the projected variance to attain a proper feature transformation (i.e., explainable features), ensuring that the latent variable can quantify critical state transitions based on CSD theory; on the other hand, it requires the projected latent variable to adhere to the delay-embedding constraints (i.e., temporal features);[38,39] thus, this ultralow one-dimensional variable can theoretically be used to topologically reconstruct the original high-dimensional dynamical system. The variable is therefore obtained by simultaneously optimizing the projected variance and the consistency with the delay embedding for the representative data. The existence of a closed-form solution makes stPCA analytically tractable and offers a fast computational route even for large datasets.
Owing to the single representative variable of stPCA, it becomes convenient to explore, quantify, and even predict the short-term nonlinear phenomena/behaviors of real-world dynamical systems from observed high-dimensional time-series data, e.g., the deterioration of diseases,[41] climate changes,[42] and the occurrence of strong earthquakes.[43] In particular, such a single representative variable derived from stPCA can be directly utilized to detect early-warning signals of a critical state or tipping point just before a sudden event based on the CSD effect (a state-transition condition).[33,41,44] We applied stPCA first to three representative models, including coupled Lorenz systems,[45,46] multiple-node networks with Michaelis-Menten forms,[47] and a DREAM4 dataset.[48] Subsequently, we extended the analysis to real-world scenarios, i.e., the Medical Information Mart for Intensive Care III (MIMIC-III)[49] and Intensive Care IV (MIMIC-IV)[50] datasets, with the aim of identifying the tipping points of patients during their ICU hospitalizations. With the dynamics of a single variable derived by stPCA from heterogeneous high-dimensional ICU data, we propose a decision-making scheme for ICU discharge with the goal of improving patient survival rates and ICU performance. In addition, stPCA was utilized to analyze a single-cell RNA sequencing dataset of human embryonic stem cells,[51] successfully detecting the cell fate transition of differentiation into definitive endoderm. Finally, we predicted the risk of transitioning from a healthy state to type 2 diabetes (T2D) based on continuous glucose monitoring data.[52] All of these results validate the effectiveness and efficiency of the proposed stPCA as a new ultralow-dimensionality reduction tool for dynamically decoding high-dimensional time series.
2. Results
2.1. The stPCA Framework for High-Dimensional Time-Series Data: An Analytical
and Explainable Approach to a Dynamical System
The proposed stPCA framework is a dynamic dimensionality reduction method designed to maintain the interpretability of a linear subspace while representing a high-dimensional dynamical system with a single projected variable through a delay-embedding strategy. For an observed high-dimensional time series $X^t = (x_1^t, x_2^t, \ldots, x_n^t)'$ with $n$ variables and $t = 1, 2, \ldots, m$, we construct a corresponding delayed vector $Z^t = (z^t, z^{t+1}, \ldots, z^{t+L-1})'$ by a delay-embedding strategy with $L > 1$ as the embedding dimension (Figure 1A), where the symbol "$'$" denotes the transpose of a vector. Clearly, $X^t$ is a spatial vector of $n$ variables observed at a single time point $t$, while $Z^t$ is a temporal vector formed by only one variable $z$ at many time points $t, t+1, \ldots, t+L-1$. According to Takens' embedding theory and its generalized versions, such as a delay-embedding scheme, $Z^t$ topologically reconstructs the equivalent dynamics of the original system $X^t$ if $L > 2d > 0$, where $d$ is the box-counting dimension of the attractor of $X^t$, via the spatial-temporal information (STI) transformation equations $\Phi(X^t) = Z^t$ shown in the Materials and Methods section. In this way, $Z = [Z^1, Z^2, \ldots, Z^m]$ is a Hankel matrix formed by an extended vector $z = (z^1, z^2, \ldots, z^m, z^{m+1}, \ldots, z^{m+L-1})$ of a single variable. Then, aiming to identify the critical state transition of a complex dynamical system, we linearize the STI equations as $Z = WX$. Based on the delay-embedding requirement for $Z$, the latter $m-1$ elements in the $i$-th row are identical to the former $m-1$ elements in the $(i+1)$-th row of the matrix $Z$; thus, we can establish the following equation:

$$W_i P = W_{i+1} Q \quad (1)$$

where $P$ and $Q$ are two submatrices of $X$, i.e., $P = [X^2\; X^3 \cdots X^m]$, $Q = [X^1\; X^2 \cdots X^{m-1}]$, and $W_i = (w_{i1}, w_{i2}, \ldots, w_{in})$, $i = 1, 2, \ldots, L-1$ (see the Experimental Section for details). More explanations about the linear embedding are provided in the Discussion section.
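The delay-embedding (Hankel) constraint behind Equation (1) can be sketched numerically: if Z is a Hankel matrix built from a single series z, the last m − 1 entries of each row equal the first m − 1 entries of the next row, which is exactly the shift condition the submatrices P and Q impose on W. The sizes below are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the Hankel shift property underlying Equation (1).
# Sizes m and L are illustrative assumptions.
m, L = 6, 3
z = np.arange(1.0, m + L)            # extended series z^1 .. z^{m+L-1}

# Hankel matrix Z (L x m): row i holds z^{i+1} .. z^{i+m}
Z = np.array([z[i:i + m] for i in range(L)])

# Shift property: row i moved left one step matches row i+1 moved right.
assert np.allclose(Z[:-1, 1:], Z[1:, :-1])

# The submatrices of the data X used in Equation (1) follow the same idea:
X = np.vstack([z[:m], 2 * z[:m]])    # toy 2-variable "data" matrix
P, Q = X[:, 1:], X[:, :-1]           # P = [X^2 ... X^m], Q = [X^1 ... X^{m-1}]
```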
On the other hand, to obtain $W$ such that $Z$ can be computed as $Z = WX$ given $X = [X^1\; X^2 \cdots X^m]$ with $n$ variables $\{x_1^t, x_2^t, \ldots, x_n^t\}$, i.e., $Z_i = W_i X$, it is necessary to minimize the reconstruction error or, equivalently, to maximize the variance $\|Z_i\|_2^2$ of the projected variable. The sample mean of $X$ is removed beforehand, as is done in PCA. Thus, the loss function is

$$\underset{W}{\operatorname{minimize}}\; -(1-\lambda)\sum_{i=1}^{L}\|W_i X\|_2^2 + \lambda \sum_{i=1}^{L-1}\|W_i P - W_{i+1} Q\|_2^2 \quad (2)$$

$$\text{s.t.} \quad \sum_{i=1}^{L} W_i W_i^T = 1 \quad (3)$$

where $\lambda$ is a predetermined regularization parameter between 0 and 1. The first term of Equation (2) is the typical PCA loss, which guarantees the maximum projected variance and enables identification of the critical state transition by examining the fluctuation of $z$, as detailed in the Materials and Methods section. The second term makes $Z$ a Hankel matrix, i.e., governed by the delay-embedding constraint according to Equation (1). Theoretically, the obtained single variable $z^t$ can be used to topologically reconstruct the dynamics of the original system $X^t$, thus characterizing the dynamical features of the whole original system.
Given $X$, in this work, an analytical solution to the optimization problem in Equation (2) is obtained by solving the following characteristic equation:

$$H(X)V = \alpha V \quad (4)$$

where $H$ is a function that converts $X$ into a block-tridiagonal matrix, and $\alpha$ and $V$ denote the dominant eigenvalue and a corresponding eigenvector of the matrix $H(X)$, respectively. The matrix $W$ is obtained by reshaping the vector $V$ (Figure 1B), which then yields the Hankel matrix $Z$ (Figure 1A). The main time cost comes from solving for the dominant eigenvalue of $H(X)$. Utilizing the tridiagonal matrix algorithm,[53] the time complexity is O(nL), which ensures the high efficiency of the stPCA algorithm.
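The eigen-solve can be illustrated concretely: the objective in Equation (2) is a quadratic form in the stacked vector V = vec(W), so under the unit-norm constraint (3) the constrained minimizer is an extreme eigenvector of a block-tridiagonal matrix assembled from X, P, and Q. The block layout and sign conventions below are our reconstruction from Equation (2), not necessarily the paper's exact H(X), and NumPy's dense `eigh` stands in for the faster tridiagonal-matrix algorithm mentioned in the text.

```python
import numpy as np

# Hedged sketch of the analytical stPCA solve: assemble the quadratic
# form of Equation (2) as a block-tridiagonal matrix H and take the
# eigenvector of its smallest eigenvalue. All sizes are illustrative.
rng = np.random.default_rng(0)
n, m, L, lam = 5, 40, 4, 0.9
X = rng.standard_normal((n, m))
X -= X.mean(axis=1, keepdims=True)       # remove the sample mean, as in PCA
P, Q = X[:, 1:], X[:, :-1]

H = np.zeros((L * n, L * n))
for i in range(L):                        # diagonal blocks
    blk = -(1 - lam) * X @ X.T            # PCA (variance) term
    if i < L - 1:                         # from lam*||W_i P - W_{i+1} Q||^2
        blk += lam * P @ P.T
    if i > 0:
        blk += lam * Q @ Q.T
    H[i*n:(i+1)*n, i*n:(i+1)*n] = blk
for i in range(L - 1):                    # off-diagonal (cross) blocks
    H[i*n:(i+1)*n, (i+1)*n:(i+2)*n] = -lam * P @ Q.T
    H[(i+1)*n:(i+2)*n, i*n:(i+1)*n] = -lam * Q @ P.T

vals, vecs = np.linalg.eigh(H)            # eigenvalues in ascending order
V = vecs[:, 0]                            # minimizer of V' H V with ||V|| = 1
W = V.reshape(L, n)                       # reshape V back into W, as in Figure 1B
Z = W @ X                                 # projected (approximately Hankel) matrix
```

Note that the constraint (3) is exactly the unit norm of V, which `eigh`'s normalized eigenvectors satisfy automatically.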
2.2. Ultralow‐Dimensionality Data Representation and High Dynamic Robustness:
Application of stPCA to Time‐Series Data of a Coupled Lorenz Model
To illustrate the mechanism of the proposed framework, a series of coupled Lorenz models[45,46]

$$\dot{X}(t) = G(X(t); P) \quad (5)$$

was employed to generate synthetic time-series datasets under different noise conditions, where $G(\cdot)$ is the nonlinear function of the Lorenz system with $X(t) = (x_1^t, \ldots, x_n^t)'$, $P$ is a parameter vector, and $n$ is the dimension of $X(t)$. More details about the Lorenz system are provided in Note S4 (Supporting Information).
To study the intrinsic dynamical information of the variable $z$ obtained via stPCA, singular value decomposition (SVD) is carried out on the Hankel matrix $Z$ stacked from $z$:

$$Z = \begin{bmatrix} z^1 & z^2 & \cdots & z^m \\ z^2 & z^3 & \cdots & z^{m+1} \\ \vdots & \vdots & \ddots & \vdots \\ z^L & z^{L+1} & \cdots & z^{m+L-1} \end{bmatrix}_{L \times m} = U \Sigma R \quad (6)$$

The columns of the matrices $U$ and $R$ after SVD are ordered hierarchically according to their capability to represent the columns and rows of $Z$. Usually, $Z$ can be approximated at low rank by utilizing the first $r$ columns of $U$ and $R$.[20,21] The initial $r$ columns of $R$ represent a time series of the intensity of each column of $U\Sigma$.[54] In this manner, we can decompose the single variable $z$ into multiple principal-component projections (Figure 2C,D).
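The decomposition of Equation (6) can be sketched as follows; the series z here is synthetic (a sum of two sinusoids), purely for illustration, and NumPy's SVD returns the transpose of R.

```python
import numpy as np

# Sketch of Equation (6): SVD of the Hankel matrix of a single series z,
# then a rank-r approximation. z is a synthetic, illustrative signal.
m, L, r = 50, 20, 4
t = np.arange(m + L - 1)
z = np.sin(0.2 * t) + 0.5 * np.sin(0.05 * t)

# Hankel matrix Z (L x m): row i holds z^{i+1} .. z^{i+m}
Z = np.array([z[i:i + m] for i in range(L)])
U, s, Rt = np.linalg.svd(Z, full_matrices=False)   # Rt is R transposed

# Low-rank approximation from the first r components; the rows of Rt
# (columns of R) are time series of the intensity of each mode U*Sigma.
Z_r = U[:, :r] @ np.diag(s[:r]) @ Rt[:r, :]
pcs = Rt[:r, :]                                    # component projections
```

Because each sinusoid contributes a rank-2 subspace to the Hankel matrix, r = 4 already reconstructs this particular Z almost exactly.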
Figure 2.
Synthetic time‐series datasets of a coupled Lorenz model were generated
in noise‐free and noisy situations. A–I) For the strong noise‐perturbed
scenario (noise intensity σ = 20), the dimensionality reduction results
of a time series with length m = 50 and embedding dimension L = 20 of
the 60D‐coupled (n = 60) Lorenz system. A) The original
high‐dimensional time series X. B) The heatmap of the matrix
[MATH: Z^
:MATH]
obtained by stPCA. C) The one‐dimensional variable z obtained by stPCA.
D) Projections of the top three principal components (denoted as PC for
short in the figure) obtained from the Hankel matrix from stPCA. E) The
projected phase of the one‐dimensional variable z. F) The heatmap of
the projections Y of X obtained by PCA. G) The proportions of PC
variances from PCA. H) Projections of the top three PCs from PCA. I)
The projected phase of the #1 PC projection from PCA. J) Projections of the top two PCs from stPCA and PCA for a low-dimensional system (n = 3) with m = 10, L = 5, and noise intensity σ = 0. K–N) For dimensionalities n = 3, 9, 15, and 30, the curves of the top two PC projections from stPCA and PCA, together with the PCFD between each curve in the strong noise-perturbed scenario and the corresponding noise-free curve in Figure 2J, computed for stPCA and PCA, respectively.
To quantify the similarity between the noisy and noise-free principal-component projections of the stPCA and PCA algorithms, we designed a metric called the principal component Fréchet distance (PCFD) based on the Fréchet distance[55,56] (Note S7, Supporting Information). Given a metric space $(\mathcal{H}, D)$ with $D$ as the distance metric in $\mathcal{H}$, for two curves $y: [b_1, b_2] \to \mathcal{H}$ and $z: [b_1', b_2'] \to \mathcal{H}$, their discrete Fréchet distance (DFD) is given as

$$\mathrm{DFD}(y, z) = \inf_{\beta_y, \beta_z} \max_{t \in [0,1]} D\big(y(\beta_y(t)),\; z(\beta_z(t))\big) \quad (7)$$

where $\beta_y$ and $\beta_z$ are arbitrary continuous nondecreasing functions from $[0, 1]$ onto $[b_1, b_2]$ and $[b_1', b_2']$, respectively. To better evaluate the distance between two projection trajectories, we normalize the curves/projections as $\tilde{y} = \mathrm{Normalize}(y)$ and $\tilde{z} = \mathrm{Normalize}(z)$; any commonly used normalization method, such as the z-score, can be applied. Then

$$\mathrm{PCFD}(y, z) = \min\big\{\mathrm{DFD}(\tilde{y}, \tilde{z}),\; \mathrm{DFD}(-\tilde{y}, \tilde{z})\big\} \quad (8)$$
By plotting the projections of the first three columns of the matrix R in Equation (6) for one noise-free case (m = 50, L = 20, n = 20, σ = 0), we found that the projections of the top principal components from Z by stPCA are very similar to those by PCA (Figure S5, Supporting Information), with PCFD values of 0.108, 0.291, and 0.254 for the top three components, respectively, which indicates that the one-dimensional variable z actually contains as much information as the multiple projected variables from PCA. This characteristic is evidence that our stPCA method achieves efficient dimensionality reduction to an ultralow-dimensional space with almost no information loss compared with traditional PCA. The analysis results obtained by stPCA and PCA for more cases are presented in Figure S1 (Supporting Information). Furthermore, both stPCA and PCA were applied in two scenarios: one with the original sample order and another with a randomly shuffled sample order, as shown in Figure S2 (Supporting Information).
In addition, we discuss the situation with additive noise, where n = 60, m = 50, L = 20, and the noise intensity σ = 20 (Figure 2A). The results are illustrated in Figure 2B–I. The projections of the top three principal components from stPCA in the presence of noise closely resemble those of the noise-free situation, shown in Figure S5 (Supporting Information). Moreover, Figure 2D,H demonstrates the high robustness of stPCA against noise in contrast to PCA, which may be due to the inherent temporal constraints that the dynamical system in a steady state imposes on stPCA. The results of more cases with different noise intensities are illustrated in Figure S5 (Supporting Information). The quantified PCFD values for different embedding dimensions L are presented in Table S1 (Supporting Information).
We also applied stPCA and PCA to short-term time series with m = 10 and L = 5. With the relatively low dimensionality n = 3 and a noise intensity σ = 0 in the system of Equation (5), the projections of the top two principal components obtained from stPCA and PCA are illustrated in Figure 2J. When strong noise is introduced (σ = 20), the top two principal components for dimensionalities n = 3, 9, 15, and 30 are illustrated in Figure 2K–N. Clearly, at low dimensionality (n = 3), both stPCA and PCA exhibit unsatisfactory performance in the presence of noise compared with the noise-free situation (Figure 2K). As the dimensionality increases, the #1-#2 projection curve obtained from stPCA becomes smoother, and the PCFD between this curve and that of stPCA in Figure 2J decreases from 1.6 to 0.23 (Figure 2L–N). Meanwhile, the projection curve from PCA always exhibits irregular fluctuations owing to noise perturbations. The improved performance of stPCA with the increasing dimensionality of the system in Equation (5) indicates that the inclusion of additional relevant variables expands the spatial information. According to Takens' embedding theory, such extra spatial information can be transformed into temporal information, thereby enhancing the robustness of stPCA to noise, which also demonstrates the ability of stPCA to convert spatiotemporal information. More results are given in Figure S3 (Supporting Information). We also applied stPCA to longer time series of a coupled Lorenz system and the DREAM4 dataset; the results are shown in Figures S4 and S7 (Supporting Information).
2.3. The Early‐Warning Signals of a Critical Transition: Application of stPCA
to the Tipping Point Detection of Multiple‐Node Networks
To illustrate how stPCA effectively characterizes high-dimensional time series, two network models were employed:[47] a sixteen-node/variable system with a fold bifurcation (Figure 3A) and an eighteen-node/variable system with a Hopf bifurcation (Figure 3E). More details of these two networks are presented in Figures S8 and S9 and in Note S5 (Supporting Information). From these systems, two datasets were generated by varying the parameter τ from 1 to 310 and from 1 to 190, respectively (Tables S2 and S3, Supporting Information). The critical transition points were set at τ = 210 (Hopf bifurcation) and τ = 100 (fold bifurcation). To investigate the different states under the varying parameter of a dynamical system, a sliding-window scheme was implemented to divide the entire process into smaller windows. For each window, the dynamics of the high-dimensional time series are reduced to those of a latent variable z from stPCA (Figure 3B,F). Based on CSD or the dynamic network biomarker (DNB) theory,[44] an abrupt increase in the standard deviation (SD) of $z = (z^1, z^2, \ldots, z^m, z^{m+1}, \ldots, z^{m+L-1})$ (i.e., in the fluctuations of the z dynamics within a sliding window) indicates an upcoming critical transition (Figure 3C,G). The details are shown in the Materials and Methods section.
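The sliding-window warning scheme can be sketched as follows. In the actual pipeline, each window's high-dimensional series would first be reduced to z by stPCA; here z is a synthetic one-dimensional series whose fluctuation grows after a simulated tipping point (mimicking CSD), and the abrupt rise of the windowed SD serves as the early-warning signal. The window size, baseline, and threshold factor are illustrative assumptions.

```python
import numpy as np

# Sketch of the sliding-window SD early-warning scheme. z stands in for
# the stPCA latent variable; its noise level jumps at a simulated
# tipping point, as critical slowing down predicts.
rng = np.random.default_rng(1)
T, tip = 300, 210
noise = np.where(np.arange(T) < tip, 0.1, 1.0)   # fluctuation grows at the tip
z = rng.standard_normal(T) * noise

win = 20                                          # sliding-window length (assumed)
sd = np.array([z[i:i + win].std() for i in range(T - win)])

# First window whose SD abruptly exceeds 3x the early baseline (assumed rule)
baseline = sd[:50].mean()
alarm = int(np.argmax(sd > 3 * baseline))
```

With this synthetic series the alarm fires close to the simulated tipping point, which is the behavior Figure 3C,G illustrates for the network models.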
Figure 3.
Application of stPCA to tipping point detection in simulated datasets generated from multiple-node network models. A) A 16-node model of a 16-dimensional dynamical system with a Hopf bifurcation. B) The observed dynamics of each node/variable as the parameter τ varies for the 16-node model. To capture the early-warning signals of the tipping points before the critical transitions of the dynamical system, the original high-dimensional time series is partitioned into sliding windows, and the one-dimensional latent variable z is obtained by stPCA from each sliding window. C) In each sliding window, the standard deviation (SD) of the latent variable z is calculated as the parameter τ and the regularization parameter λ vary for the 16-node model. D) The red and gray curves represent the average embedding error and all embedding errors, respectively, as λ varies. E) An 18-node model of an 18-dimensional dynamical system with a fold bifurcation. F) The observed dynamics of each node/variable as the parameter τ varies for the 18-node model; the one-dimensional latent variable z is obtained by stPCA from each sliding window. G) The SD of z within each sliding window as the parameters τ and λ vary for the 18-node model. H) The red and gray curves represent the average embedding error and all embedding errors, respectively, as λ varies for the 18-node model.
In addition, when the regularization parameter λ varies from 0.78 to 0.98, stPCA can effectively detect the imminent critical transitions of systems with a fold bifurcation (Figure 3C) and a Hopf bifurcation (Figure 3G) through the one-dimensional dynamics of z. However, the embedding error

$$E_{\mathrm{embedding}} = \|Z - \hat{Z}\|_F$$

increases accordingly with decreasing λ (Figure 3D,H). To satisfy the delay-embedding condition so that $\hat{Z}$ can reconstruct the original high-dimensional dynamics, it is necessary to keep the embedding error below 2, which suggests that λ typically ranges from 0.9 to 1.
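The embedding error can be sketched numerically: for λ < 1, Z = WX is only approximately Hankel, and a natural Ẑ is the nearest Hankel matrix, obtained here by averaging anti-diagonals. This is a common construction, and the paper's exact definition of Ẑ may differ.

```python
import numpy as np

# Sketch of the embedding error E = ||Z - Z_hat||_F used to choose lambda.
# Z_hat is built by averaging the anti-diagonals of Z (an assumed, common
# way to project onto the set of Hankel matrices).
def hankelize(Z):
    L, m = Z.shape
    z = np.zeros(L + m - 1)
    for k in range(L + m - 1):                  # k indexes anti-diagonals
        vals = [Z[i, k - i] for i in range(max(0, k - m + 1), min(L, k + 1))]
        z[k] = np.mean(vals)
    return np.array([z[i:i + m] for i in range(L)])

rng = np.random.default_rng(2)
Z = rng.standard_normal((4, 10))                # a non-Hankel "projection"
Z_hat = hankelize(Z)
E_embedding = np.linalg.norm(Z - Z_hat)         # Frobenius-norm embedding error
```

For a matrix that is already Hankel, the error is exactly zero, which is why small values of E indicate that the delay-embedding constraint is effectively satisfied.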
2.4. ICU Decision‐Making: Real‐World Application of stPCA on MIMIC Datasets
The MIMIC‐III database^[ [153]^49 ^] is a widely used resource for
studying critical care patients.^[ [154]^57 , [155]^58 ^] It comprises
deidentified health‐related data associated with patients who stayed in
the critical care units of the Beth Israel Deaconess Medical Center
between 2001 and 2012 and contains comprehensive clinical information
collected from the ICU stays. The percentage of patients who
experienced death within a 5‐year period was 2.3%, and unplanned 30‐day
readmissions were 12.9%; therefore, making accurate judgments about
whether a patient is ready for discharge from the ICU is of utmost
importance in optimizing life‐saving measures with the constraints of
available medical resources. By analyzing these heterogeneous
high‐dimensional noisy data, we aimed to gain insights into the complex
nature of critical care conditions and contribute to improving patient
management and decision‐making processes.
In this study, we first applied stPCA to a subset of the MIMIC-III database. We randomly selected 10 000 patients from this database. After a rigorous preprocessing procedure shown in Figure S14 (Supporting Information), which involved data cleaning and quality control measures, a final dataset comprising 2908 patients was obtained (Figure 4A). The data of each patient were represented by a heterogeneous high-dimensional time series, i.e., a matrix consisting of at least 10 time points indicating the number of hours since his/her admission and containing no fewer than five diagnosis-related indicators. Additional details of the preprocessing process are provided in Figure S11 (Supporting Information).
Figure 4.
Performance of stPCA applied to the MIMIC-III dataset, with patients divided
into three groups by the duration of their ICU stay. A) Patient admission data were extracted from
the ICU PDMS. According to the screening criteria, we obtained 2908
patient‐specific matrices that represent tested disease indicators at
different hours. The patients were divided into three groups, with B–D)
representing patients who stayed in the ICU for 1–24 h, E–G)
representing patients who stayed in the ICU for 25–48 h, and H–J)
representing patients who stayed in the ICU for 49–96 h. For each
group, the high‐dimensional disease indicators of each patient were
dimensionally reduced to a single variable by stPCA. Then, a DTW scheme
was adopted to align these time series so that they have the same
length. B) Each colored curve corresponds to the SD of the
one‐dimensional variable for each patient, and the black curve
represents the average SD. Then, time series clustering was performed
on these curves, resulting in four clusters. Notably, Cluster 3
exhibits strong fluctuations in the later stages. C) Survival analysis
results for the four clusters with an ICU duration of 1–24 h. D)
Proportion of patients in the four clusters with an ICU duration of
1–24 h who experienced the highest disease mortality rate. E) Time
series clustering results for patients with an ICU duration of 25–48 h,
with Cluster 3 exhibiting strong fluctuations in the later stages. F)
Survival analysis results for the four clusters with ICU durations of
25–48 h. G) Proportion of patients in the four clusters with ICU
durations of 25–48 h who experienced diseases with the highest
mortality rates. H) Time series clustering results for patients with
ICU durations of 49–96 h, with Cluster 2 exhibiting strong fluctuations
in the later stages. I) Survival analysis results for the four clusters
with ICU durations of 49–96 h. J) Proportion of patients in the four
clusters with ICU durations of 49–96 h who experienced diseases with
the highest mortality rates.
For each patient, the high‐dimensional time series was reduced to one
variable z by stPCA with the sliding‐window scheme, and then the
fluctuation of z, Fl^z, was calculated for the kth sliding window, i.e.,
$$Fl^z_{(k)} = \mathrm{SD}\big(z_{(k)1}, z_{(k)2}, \ldots, z_{(k)m}, z_{(k)m+1}, \ldots, z_{(k)m+L-1}\big)$$
(Figure S12, Supporting Information). Based on the individual variations
in ICU stay duration,
we classified the curves of Fl^z into four groups: Group 1 (ICU stay
duration 1–24 h), Group 2 (ICU stay duration 25–48 h), Group 3 (ICU
stay duration 49–96 h), and Group 4 (ICU stay duration exceeding 4
days). To explore underlying dynamic patterns among patients in each
group, we first used the dynamic time warping (DTW) algorithm[59, 60]
to align the curves of Fl^z so that they have a consistent length. The
details of the DTW algorithm are presented in Note S8 (Supporting
Information).
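As an illustration of the fluctuation index defined above, the sliding‐window standard deviation can be sketched in a few lines of Python (the window length w, corresponding to m + L − 1 in the formula, is a free parameter here; stPCA itself and the DTW alignment are omitted):

```python
import numpy as np

def fluctuation_index(z, w):
    """Sliding-window standard deviation Fl^z of a 1-D latent series z.

    For window k, the SD of the w consecutive values z_k, ..., z_{k+w-1}
    is returned; the result has len(z) - w + 1 entries.
    """
    z = np.asarray(z, dtype=float)
    return np.array([z[k:k + w].std() for k in range(len(z) - w + 1)])
```

For a flat series the index is zero, and it rises only in windows that contain a fluctuation, which is the behavior the clustering step exploits.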
Then, the SD curves were clustered[60, 61] within each
group based on the aligned time series. Four clusters were obtained
within each group (Figure 4B,E,H). Notably, there is a common
pattern among Cluster 3 of Group 1 (Figure 4C), Cluster 3 of Group
2 (Figure 4F), and Cluster 2 of Group 3 (Figure 4I), all of
which exhibit relatively strong fluctuations in z in the later stages,
correlating with the poorest recorded prognosis within the
corresponding cluster. In addition, there was a significant difference
(p‐value < 0.05) in the prognoses of patients among different clusters
in each group. Furthermore, by examining the proportion of patients
within each cluster who suffered from diseases/diagnoses with the
highest mortality rates (Figure 4D,G,J), we found that in the clusters
with poor prognoses, the proportions of patients afflicted by
high‐mortality diagnoses were also higher.
Based on these observations, we speculate that for an individual
patient, the significant fluctuations of z from stPCA may indicate a
critical condition, warranting close monitoring and ongoing treatment
until the Fl^z index stabilizes.
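The DTW step used above to equalize the lengths of the Fl^z curves follows the textbook dynamic‐programming recurrence; the sketch below is a generic illustration, not the exact alignment procedure of the paper (which is detailed in Note S8, Supporting Information):

```python
import math

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D series.

    d[i][j] holds the minimal cumulative cost of aligning the first i
    points of a with the first j points of b, using the standard
    match/insert/delete recurrence with absolute-difference cost.
    """
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

Because DTW allows one point to align with several, two series tracing the same shape at different lengths obtain distance zero, which is what makes it suitable for comparing stays of different durations.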
Furthermore, on the basis of Fl^z, i.e., the fluctuation of the latent
variable z, and 2–5 diagnosis‐specific indicators, we proposed a new
scheme to determine whether a patient should be discharged from the ICU
at time point t, i.e., ICU decision‐making, as follows:
In recent hours $\{t - wl + 1, \ldots, t - 1, t\}$, where wl represents
the window length, if the average Fl^z for a patient is significantly
lower than the highest value observed since admission, it indicates a
relatively stable condition after treatment. An index idx_z(t) of z is
defined as follows:
$$\mathrm{idx}_z(t) = \max_{j}\,\mathrm{mean}\big(S_{\mathrm{past}}(j)\big)\,/\,\mathrm{mean}\big(S_{\mathrm{recent}}\big) \tag{9}$$
where $S_{\mathrm{past}}(j)$ represents a vector of standard deviations
of z during a past period (time window j), i.e.,
$S_{\mathrm{past}}(j) = \big(Fl^z_{(j-wl+1)}, \ldots, Fl^z_{(j-1)}, Fl^z_{(j)}\big)$,
with $j = wl, wl+1, \ldots, t-1$. In addition, the vector
$S_{\mathrm{recent}}$ represents the standard deviations of z over the
most recent wl time points, i.e.,
$S_{\mathrm{recent}} = \big(Fl^z_{(t-wl+1)}, \ldots, Fl^z_{(t-1)}, Fl^z_{(t)}\big)$.
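As a minimal illustration of Equation (9), assuming fl holds the Fl^z values observed up to the current time t:

```python
import numpy as np

def idx_z(fl, wl):
    """Discharge index of Equation (9) (sketch).

    fl : Fl^z values up to the current time t (fl[-1] = Fl^z_(t));
    wl : window length. The largest mean fluctuation over all past
    windows is divided by the mean over the most recent wl points.
    """
    fl = np.asarray(fl, dtype=float)
    s_recent = fl[-wl:]                                   # S_recent
    # S_past(j) for j = wl, ..., t-1 (each window ends before the last point)
    past_means = [fl[j - wl + 1:j + 1].mean() for j in range(wl - 1, len(fl) - 1)]
    return max(past_means) / s_recent.mean()
```

A large ratio thus means the recent fluctuation is much smaller than its historical peak, i.e., the patient has stabilized relative to the worst observed period.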
During the recent period, the 2–5 diagnosis‐specific items $x_i^t$ that
are most closely associated should fall within the normal range (see
Table S4, Supporting Information). This can be evaluated using the
following condition:
$$\mathrm{itmFlg}(t) = \sum_{i=1}^{K} \#\big(x_i^t\big) \tag{10}$$
with item flags
$$\#\big(x_i^t\big) = \begin{cases} 1, & Lb_i \le x_i^t \le Ub_i,\\ 0, & \text{otherwise}, \end{cases}$$
where K is the number of selected items, and $Lb_i$ and $Ub_i$
represent the lower and upper bounds of the normal range for $x_i^t$,
respectively.
Then, the decision for ICU discharge at time point t is described as
follows:
$$\mathrm{Decision}(t) = \begin{cases} 1, & \mathrm{idx}_z(t) \ge FC \ \text{and} \ \mathrm{itmFlg}(t) = K,\\ 0, & \text{otherwise}, \end{cases} \tag{11}$$
where FC represents the fold‐change threshold. If Decision(t) = 1,
i.e., a positive decision at time t, then the patient's condition is
stable at time point t, thereby indicating suitability for ICU
discharge. Otherwise, if Decision(t) = 0, i.e., a negative decision at
time t, then the patient should stay in the ICU for further treatment
and monitoring. An illustration of this discharge scheme is presented
in Figure S13 (Supporting Information).
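Equations (10) and (11) reduce to a simple rule; the sketch below is illustrative only (the normal‐range bounds and the fold‐change threshold fc are hypothetical values, not ones taken from this study):

```python
def itm_flg(x, bounds):
    """Equation (10): count of indicators x_i^t inside their normal
    range [Lb_i, Ub_i]."""
    return sum(1 for xi, (lb, ub) in zip(x, bounds) if lb <= xi <= ub)

def decision(idx, x, bounds, fc):
    """Equation (11): discharge (1) iff idx_z(t) >= FC and all K
    selected items lie in their normal ranges."""
    return int(idx >= fc and itm_flg(x, bounds) == len(bounds))
```

For example, with K = 2 indicators whose (hypothetical) normal ranges are (3, 7) and (60, 100), a patient with idx_z = 4.0 and readings (5, 90) would receive a positive decision, whereas an out‐of‐range reading or a low idx_z blocks discharge.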
A set of standardized metrics is employed to evaluate the performance
of the ICU discharge decision for a specific method, as follows. When
Decision(t) = 1, if the patient is discharged within a period of 5 h
after t or all the relevant items thereafter are within the normal
range, then this decision is considered a true positive (TP);
otherwise, it is a false positive (FP). When Decision(t) = 0, if
the patient is currently receiving treatment in the ICU, then this
decision is considered a true negative (TN); otherwise, this decision
is considered a false negative (FN). The F1‐score, accuracy (ACC), true
positive rate (TPR)/recall, false positive rate (FPR), and precision
metrics are calculated based on TP, FP, TN, and FN. The detailed
descriptions and calculation formulas for these metrics are provided in
Note S6 (Supporting Information).
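These are the standard confusion‐matrix quantities; a minimal sketch computing them from the four counts:

```python
def decision_metrics(tp, fp, tn, fn):
    """Standard evaluation metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                      # TPR
    return {
        "precision": precision,
        "recall": recall,
        "fpr": fp / (fp + tn),
        "acc": (tp + tn) / (tp + fp + tn + fn),
        "f1": 2 * precision * recall / (precision + recall),
    }
```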
Based on the aforementioned scheme, the overall performance of
stPCA‐based decision‐making was assessed over the course of time since
admission (Figure 5A). The precision and F1‐score are ≈0.6 and
0.7, respectively, while the TPR/recall and ACC metrics are all greater
than 0.8 when t ≥ 30 h. When t < 30 h, the slightly lower F1‐score and
TPR are due to the limited number of discharged patients. The
performances across five age cohorts are shown in Figure 5B. Moreover,
Figure 5C presents the decision performance of stPCA for the top‐8
diagnoses with the highest mortality rates: sepsis, intracranial
hemorrhage, pneumonia, congestive heart failure, altered mental status,
abdominal pain, stroke, and chest pain. Across various metrics, our
decision scheme based on stPCA consistently demonstrates favorable
results. Figure 5D illustrates the assessment of ICU discharge for
four subjects diagnosed with sepsis, intracranial hemorrhage,
pneumonia, and congestive heart failure. When t ≥ 40 h, 38 h, and 36 h
for Subjects 1, 2, and 3, respectively, we determined that Decision
(t) = 1; thus, Subjects 1–3 met the criteria for ICU discharge after
treatment, which is consistent with their positive prognoses, as
recorded. Conversely, Subject 4 exhibited inadequate treatment response
and experienced a deteriorating state over time, which is consistent
with the recorded negative prognosis.
Figure 5.
Performance of the decision for ICU discharge. A) For all samples
across diagnoses of 2908 patients, the performance of stPCA‐based
discharge decision was assessed for each hour since admission. B)
TPR/Recall and Precision across five age cohorts are presented. C) For
the top eight diagnoses with the highest mortality rate, we evaluate
the performance of our discharge decision metrics based on the stPCA
algorithm. D) We present four typical cases (patients suffering from
sepsis, intracranial hemorrhage, pneumonia, and congestive heart
failure). Combining the dimensionality reduction results of stPCA and
2–5 diagnosis‐related indicators, we can make decisions on whether
patients should be discharged from the ICU. The red interval indicates
that the patient should continue receiving ICU treatment or
observation, while the blue interval suggests that the patient's
condition is relatively stable and that they can be discharged from the
ICU.
2.5. Performance Comparisons of Dimensionality Reduction Methods on the
MIMIC‐III and MIMIC‐IV Datasets
To further evaluate the performance of the stPCA algorithm in ICU
decision‐making, stPCA is compared with ten commonly used
dimensionality reduction or data representation methods: (1) PCA,[9]
(2) LLE,[7] (3) t‐SNE,[4] (4) MDS,[6] (5) ISOMAP,[8] (6) AE,[17]
(7) variational autoencoder (VAE),[62] (8) MARBLE,[23]
(9) CANDyMan,[22] and (10) HAVOK,[26] all of which can potentially be
applied to the decision‐making process of ICU patient discharge. The
ten methods for comparison are listed below, with hyperparameters
mainly adopted from the original references. Further