Abstract

   T cell receptors (TCR) and gene expression provide two complementary
   and essential aspects in T cell understanding, yet their diversity
   presents challenges in integrative analysis. We introduce TCRclub, a
   novel method integrating single-cell RNA sequencing data and
   single-cell TCR sequencing data using local harmony to identify
   functionally similar T cell groups, termed ‘clubs’. We applied TCRclub
   to 298,106 T cells across seven datasets encompassing various diseases.
   First, TCRclub outperforms the state-of-the-art methods in clustering T
   cells on a dataset with over 400 verified peptide-major
   histocompatibility complex categories. Second, TCRclub reveals a
   transition from activated to exhausted T cells in cholangiocarcinoma
   patients. Third, TCRclub discovered the pathways that could intervene
   in response to anti-PD-1 therapy for patients with basal cell carcinoma
   by analyzing the pre-treatment and post-treatment samples. Furthermore,
   TCRclub unveiled different T-cell responses and gene patterns at
   different severity levels in patients with COVID-19. Hence, TCRclub
   aids in developing more effective immunotherapeutic strategies for
   cancer and infectious diseases.

   Keywords: Integration, Local Harmony, Single-Cell Analysis, T-Cell
   Clustering

   Subject terms: Computational Biology, Immunology

Synopsis

   graphic file with name 44320_2024_70_Figa_HTML.jpg

   TCRclub integrates scRNA-seq and scTCR-seq data by local harmony to
   identify functionally similar T cell groups, termed ‘clubs’, enhancing
   the understanding of T cell diversity.
     * TCRclub models the relationship between pairwise TCR embedding
       distances and pairwise expression distances, emphasizing local
       homogeneity to integrate TCRs and gene expressions.
     * TCRclub considers the nested cell hierarchy to capture the cellular
       functional diversity of T cells.
     * TCRclub demonstrates better performance in clustering T cells into
       pMHC-specificity clusters.
     * TCRclub unveils the functional landscape of T cells in diverse
       biological contexts, including cholangiocarcinoma, Anti-PD1
       therapy, and SARS-CoV-2 infection.
     __________________________________________________________________

   TCRclub integrates scRNA-seq and scTCR-seq data by local harmony to
   identify functionally similar T cell groups, termed ‘clubs’, enhancing
   the understanding of T cell diversity.

   graphic file with name 44320_2024_70_Figb_HTML.jpg

Introduction

   The immune system is an intricate network comprising cells, tissues,
   and organs that safeguard the body from infections and illnesses. T
   cells are crucial in recognizing and eliminating infected or cancerous
   cells among the various components of the immune system. The diversity
   of T cells is reflected in the diversity of their antigen receptors,
   known as the T cell receptor (TCR). TCR allows T cells to recognize
   pathogens and antigens, making them essential for effective immune
   responses against pathogens and cancer cells (De Simone et al,
   [37]2018; Emerson et al, [38]2017; Rocamora-Reverte et al, [39]2021).

   One of the primary challenges in immunology is identifying functionally
   similar T cells. Several computational methods have been developed to
   cluster T cells to address this challenge based on the similarity of
   the complementarity-determining region 3 (CDR3) sequences. CDR3 is a
   highly variable region of TCR and is directly involved in the binding
   of epitopes presented by major histocompatibility complex (MHC)
   molecules (Stadinski et al, [40]2014; Chronister et al, [41]2021a).
   TCRdist (Dash et al, [42]2017; Mayer-Blackwell et al, [43]2021) applies
   a pairwise distance metric to quantify the similarity of CDR3
   sequences. GLIPH (Glanville et al, [44]2017) focuses on identifying
   TCRs targeting specific antigens, grouping TCR sequences according to
   their shared amino acid motifs and global sequence similarity. GLIPH2
   (Huang et al, [45]2020) is an improved version of GLIPH that employs
   Fisher’s exact test to identify k-mer motifs and to revise global
   similarity to accelerate the processing of large datasets. ClusTCR
   (Valkiers et al, [46]2021) builds similarity graphs for every
   supercluster divided by Facebook artificial intelligence similarity
   search (Faiss) and then uses Markov clustering algorithm (MCL) to
   identify TCR clusters within each graph. GIANA (Zhang et al, [47]2021a)
   uses a geometric isometry-based method to transform CDR3 sequences into
   numerical vectors that preserve their similarity and then uses a
   classic nearest-neighbor search to group TCRs by their Euclidean
   distances.

   Although some methods, such as TCRdist (Dash et al, [48]2017;
   Mayer-Blackwell et al, [49]2021) and GIANA (Zhang et al, [50]2021a),
   employ a nearest-neighbor strategy to identify clusters directly based
   on TCR distances, the clusters may be unreliable in accurately
   reflecting the diversity of functions due to the neglect of the
   underlying T-cell hierarchy (Wu and Wu, [51]2020). In a recent study
   (Chen and Li, [52]2022), we have shown that cell-cell relationships can
   be mapped to a hierarchical structure of multiple layers, where
   cellular homogeneity increases from the root to the leaves. In the case
   of T cells, the root layer represents the entire population, followed
   by the subpopulation layer that contains a broad category of T cells,
   such as helper T cells and naive T cells. The club layer, which is the
   next layer in the hierarchy, provides a finer resolution of cellular
   homogeneity by grouping individual T cells based on their similar
   functions. The hierarchy is built in a specific order, sequentially
   adding each layer to the internal cell hierarchy. By contrast, methods
   that build clusters by the nearest-neighbor search are limited to basic
   layer information and generate flat cell groups, failing to fully
   capture the functional diversity of the cells.

   Moreover, while TCR sequences alone are crucial for T cell
   identification, they do not fully encompass the functional diversity of
   T cells, since functional diversity is also influenced by their gene
   expression profiles (Cheadle et al, [53]2005; Chtanova et al,
   [54]2005). Integrating TCR sequences with single-cell RNA sequencing
   (scRNA-seq) data can provide a more comprehensive view of the T cell
   landscape and reveal the phenotypes and functions of different T cell
   subsets. TESSA (Zhang et al, [55]2021a) leverages both gene expression
   and TCR sequences. It uses a Dirichlet process to map the TCR
   repertoire’s functional landscape and identify antigen-specific TCR
   clusters. However, one limitation of TESSA is that it assumes a fixed
   prior distribution for the Dirichlet process, which may not reflect the
   true diversity and complexity of the TCR repertoire. Additionally,
   TESSA requires all T cells in every iteration of the calculation to
   infer the posterior distribution of the latent variables and
   parameters. It may not be suitable for analyzing highly dynamic T-cell
   populations. Another clustering method, CoNGA (Schattgen et al,
   [56]2022), also utilizes both gene expression and TCR sequences through
   statistical analysis of gene expression and TCR similarity graphs.
   While CoNGA considers the neighbors of T cells, it constructs separate
   graphs for gene expression and TCR sequences and then compares them,
   rather than integrate them into a single graph.

   Here, we propose TCRclub, a novel approach that identifies the
   functional relevance of T cells. TCRclub receives single-cell gene
   expressions and numeric embeddings of TCRs as input. It aims to bridge
   the gap between gene expression and TCRs by modeling the relationship
   between pairwise TCR embedding and pairwise expression distances
   according to local harmony. Local harmony refers to the homogeneity of
   local neighbors surrounding an instance (e.g., a cell), since the
   neighbors in the distance space are more likely to have similar
   characteristics and belong to the same category (Uddin et al, [57]2022;
   Cover and Hart, [58]1967). By emphasizing local harmony, TCRclub
   reduces noise and increases the robustness of integration. Considering
   the built-in cell structure, TCRclub builds the T-cell hierarchy based
   on the residual-distance matrix obtained by integration and identifies
   the T-cell clubs. Finally, the TCRclub is repeated multiple times to
   obtain the consensus results of the clubs as the final output.

   We applied TCRclub to 298,106 T cells from seven scRNA-seq + scTCR-seq
   datasets encompassing various samples representing distinct diseases.
   First, we validated that our model performed better than existing
   methods in effectively grouping T cells into peptide-major
   histocompatibility complex (pMHC)—specificity clubs, with a higher
   clustering purity of 52.25% and clustering coverage of 87.45% across
   over 400 pMHC categories. Furthermore, we showcased the ability of
   TCRclub to unveil the functional landscape of T cells in diverse
   biological contexts, including cholangiocarcinoma, Anti-PD1 therapy,
   and SARS-CoV-2 infection.

Results

TCRclub identifies T-cell clubs by TCR sequences and gene expressions

   TCRclub takes paired gene expressions and TCR sequences as input to
   identify the T-cell clubs. A T-cell club is defined as a subpopulation
   of T cells that exhibits high functional similarity within the
   hierarchy of the T cells (Chen and Li, [59]2022).

   Given that CDR3β is the most diverse and variable component of TCR
   (Rosati et al, [60]2017), our study specifically focuses on the CDR3β
   sequence as the input for TCRclub. The term “TCR embedding” in our
   study refers to the numerical vector obtained by encoding the CDR3β
   sequence using an encoder-classifier approach (Luo et al, [61]2023).
   This CDR3β embedding represents the TCR for subsequent integration with
   gene expression data. A T-cell clone is defined as a group of T cells
   sharing identical CDR3β sequences from the same sample (Zhang et al,
   [62]2021a). The goal of TCRclub is to group functionally similar T
   cells into a club. Since T cells within a clone are likely to have the
   same antigen-specificity (Smith et al, [63]2021; Chronister et al,
   [64]2021) and tend to have similar gene expressions (Schattgen et al,
   [65]2022) (Appendix Fig. [66]S1), TCRclub takes the T-cell clone as the
   basic unit.

   As shown in Fig. [67]1A, TCRclub takes paired gene expressions and TCR
   embeddings as two input matrices. TCRclub consists of two main steps:
   (1) iterative integration of gene expressions and TCR embeddings using
   local harmony to produce a residual-distance matrix at convergence and
   (2) hierarchical clustering of T-cell clones based on the
   residual-distance matrix to identify T-cell clubs. TCRclub repeats this
   workflow for a specified number of rounds and obtains the consensus
   result by aggregating the T-cell clubs of rounds (Fig. [68]1B).

Figure 1. Schematic overview of TCRclub.

   [69]Figure 1
   [70]Open in a new tab

   (A) The workflow of TCRclub. (B) Aggregation of clubs for consensus
   result.

TCRclub discovers latent representations of TCRs

   During the integration of scRNA and TCR data, TCRclub assigns varying
   levels of importance to different dimensions of TCR embeddings, thereby
   highlighting regions of the TCR sequence potentially critical for
   antigen recognition and function. We used the concept score metric
   (Brocki and Chung, [71]2019) to evaluate the TCRs clustered by TCRclub.
   The concept score, obtained by calculating the dot product between the
   TCR embedding and the generated concept vector, allowed us to interpret
   the latent representation of high-level concepts within the TCR
   sequence (Brocki and Chung, [72]2019). Using concept scores, we
   generated density histograms for each club and compared them with other
   clubs within the same sample (Fig. [73]2A,B; Appendix Fig. [74]S2).
   Remarkably, when each club corresponds to prominent epitopes, a
   distinct separation between the clubs based on their peak positions was
   observed (Fig. [75]2A). In contrast, when the club does not correlate
   with prominent epitopes, a relatively closer separation between the
   clubs was noted (Fig. [76]2B). Moreover, for the clubs associated with
   prominent epitopes (Fig. [77]2A), we generated saliency maps using
   Deepexplain (Ancona et al, [78]2017) for the TCRs with the highest
   concept scores within each club. These maps unveiled shared attention
   positions among the TCRs within the same club (Fig. [79]2C; Appendix
   Fig. [80]S2). In summary, our findings underscore that TCRclub
   effectively captures both similarities and distinctions within and
   among TCR clubs.

Figure 2. Performance evaluation of TCRclub for T cell clustering.

   [81]Figure 2
   [82]Open in a new tab

   (A–C) Density histograms of three clubs obtained from the same sample
   illustrate the distribution of concept scores. The blue histograms
   represent the club of interest, while the red histograms represent
   other clubs from the same sample. (A) Three clubs presenting different
   prominent epitopes. (B) Three clubs without prominent epitopes. (C)
   Saliency maps for the three clubs in (A). Each row displays the
   saliency maps for the top four TCRs with the highest concept scores
   within the club. The dark-background heatmap indicates the one-hot
   embedding (see [83]Appendix Supplementary Methods) for each TCR, while
   the white-background heatmap represents the saliency of each amino acid
   in the TCR. The color bar denotes the saliency level, with deeper
   colors indicating higher importance. Shared attentional positions are
   highlighted in red. (D) Heatmap comparing the clustering results of
   different models with the ground truth. RNA expressions were
   dimensionally reduced by uniform manifold approximation and projection
   (UMAP) and are depicted in blue. The ground truth shows the verified
   pMHCs of T cells, with each pMHC assigned a unique color. In the ground
   truth, the same colors depict T cells with the same pMHC. For the
   models, T cells were colored according to the clustering result. For
   example, if the largest number of T cells in the cluster is against a
   particular pMHC, then all the T cells in this cluster will be assigned
   the color corresponding to that pMHC in the ground truth colormap. If a
   T cell does not belong to any cluster, it is assigned a color that is
   not present in the ground truth colormap. Therefore, the clustering
   result that aligns more closely with the colormap of the ground truth
   indicates higher purity and clustering coverage. (E–H) iTOL plots
   display the hierarchy of the clustering results from TCRclub (E), TESSA
   (F), and GLIPH2 (G). GIANA, CoNGA, and ClusTCR did not generate any
   clusters in the depicted example (H). Same as in (D), T cells are
   color-coded based on the verified pMHCs, where identical colors
   represent T cells recognizing the same pMHC. TCRclub (E) exhibits a
   tree-like structure, where T cells associated with the same node in the
   club layer are anticipated to share similar functions. Conversely,
   alternative models. Other models (F–H) generate produced flat cell
   groupings, where T cells affiliated with the same child node are
   expected to exhibit similar functions. The blue and green dotted lines
   mark the 50 and 100% clustering purity, respectively. (I) Comparison of
   TCRclub with other models in terms of average effectiveness, purity,
   and coverage of the dataset.

TCRclub demonstrates better performance in clustering T cells into
pMHC-specificity clusters

   In this section, we assess the ability of TCRclub to cluster T cells
   into various pMHC-specificity groups. We utilized a dataset (Data ref:
   Francis et al, [84]2021) containing 55 samples of single-cell gene
   expression coupled to TCR profiling, with pMHC specificity verified for
   each T cell by tetramer staining technique. Given the large number of
   pMHC categories (over 400), accurately distinguishing between
   functionally similar T-cell clusters posed a challenge. To benchmark
   TCRclub, we compared its performance with TESSA, CoNGA, GIANA, ClusTCR,
   and the series of GLIPH using the same samples and their default
   parameter settings, while quantifying the clustering performance. The
   clustering accuracy was assessed through clustering purity which counts
   the number of T-cell clones targeting the most frequent pMHC in each
   T-cell clone cluster, while clustering coverage represents the
   percentage of T-cell clones that can be assigned to clusters among all
   the input T-cell clones. In most scenarios, tuning the clustering to
   improve purity typically reduces coverage, and vice versa. Therefore,
   we introduce clustering effectiveness, defined as the product of the
   clustering purity and coverage, to capture the trade-off between these
   two critical aspects of T-cell clustering performance.

   We selected T cells with single-specificity (i.e., not considering
   multi-matching T cells) to provide an intuitive illustration of the
   result of its cluster. TCRclub exhibited a higher level of agreement
   with the ground truth, as demonstrated by the colormap (Fig. [85]2D).
   The clustering results obtained using TCRclub constructed a tree-like
   structure for T cells (Fig. [86]2E), where the level of functional
   similarity increases as one moves from the root towards the leaves
   (Chen and Li, [87]2022) (Appendix Fig. [88]S3). The “club layer”, the
   focus of our study, represents the finest resolution of functional
   similarity, and it is positioned just before the last layer (the
   individual T cells). Since each cell has a unique TCR, the last layer
   can also be regarded as T-cell clones. In contrast, other methods
   resulted in flat cell groups (Fig. [89]2F–H). We used the bar chart to
   display the purity of the T-cell clone clusters (Fig. [90]2E–H). We
   observed that TCRclub demonstrated higher purity and coverage compared
   to other methods.

   We compared TCRclub and other methods across all samples. TCRclub
   achieves a higher average clustering purity of 52.25% and a coverage of
   87.45% (Fig. [91]2I). When considering both clustering purity and
   coverage, TCRclub achieved a significantly higher clustering
   effectiveness of 45.34%, indicating its balanced ability to detect
   functionally similar clusters across various samples. In contrast,
   other methods demonstrated lower clustering effectiveness, regardless
   of whether they used only TCRs (e.g., GLIPH series and GIANA) or both
   TCRs and RNA (e.g., TESSA and CoNGA). Most methods had clustering
   effectiveness below 10%. The most comparable method, TESSA, achieved
   only 22.97%, significantly lower than TCRclub, indicating their
   inability to balance clustering purity and coverage effectively. The
   methods compared frequently produce no T-cell cluster in certain
   samples, highlighting their stringent dataset requirements and limited
   generalizability.

   We further evaluated conditions that might affect TCRclub’s clustering
   performance. Because of the prevalence of dropouts in scRNA data, we
   examined whether these dropouts would affect TCRclub’s clustering
   performance or not. By introducing additional dropouts into the raw
   gene expression counts while maintaining the same preprocessing steps,
   we found that increased dropouts could negatively impact both average
   clustering purity and clustering coverage (Appendix Fig. [92]S4). We
   also investigated how extreme clonal expansion might influence
   clustering outcomes. In simulations of extreme clonal expansion
   (see [93]Appendix Supplementary Methods), we observed that this
   condition does not affect clustering results compared to scenarios
   without such expansion (Appendix Fig. [94]S5).

TCRclub reveals the activated-to-exhausted transition in cholangiocarcinoma
patients

   In this investigation, employing TCRclub, we explored the dynamic
   trajectory of T cells within cholangiocarcinoma (CCA) patients.
   Specifically, we utilized TCRclub to analyze a dataset consisting of
   matched scRNA-seq and scTCR-seq profiles of T cells derived from five
   patients diagnosed with cholangiocarcinoma (Shi et al, [95]2022b; Data
   ref: Shi et al, [96]2022a). This dataset includes samples obtained from
   peripheral blood (PB), the tumor's primary site, and lymph nodes.
   First, we obtained the regression coefficients between gene expression
   distances and TCR distances for each clone using TCRclub (see Methods).
   We then compared these coefficients across PB, tumor, and lymph node
   tissues. Interestingly, we found no significant differences in the
   regression coefficients between the different tissues (Appendix
   Fig. [97]S6). This consistency suggests that the relationship between
   gene expression and TCR repertoire remains stable across different
   tissue origins in CCA patients. We annotated the cells of this dataset
   based on the annotation guidelines outlined by Shi et al (Shi et al,
   [98]2022b) (Appendix Figs. [99]S7–[100]S9). We integrated PB and tumor
   samples, or lymph node and tumor samples (if available), for each
   patient and subsequently applied TCRclub to the integrated samples. For
   a better joint analysis of different samples, we recommend integrating
   T-cell gene expressions and removing batch effects before applying
   TCRclub to cells from different samples. As an example, we compared
   integrated PB-tumor samples with unintegrated PB-tumor samples from
   five patients as an instance. We applied TCRclub to the groups,
   respectively. We calculated the ratio of clubs containing T-cell clones
   from both PB and tumor (i.e., mixed clubs). The results showed a
   significantly higher ratio of mixed clubs in the integrated samples
   compared to the unintegrated ones (Appendix Fig. [101]S10). This
   finding suggests that integrating gene expression data is crucial for
   the joint analysis of different samples; otherwise, TCRclub tends to
   cluster T cells with those from the same sample.

   For the T cells sharing identical CDR3β sequences within the same
   sample, we defined them as a T-cell clone, and our observations
   indicated that cells within a given clone tend to exhibit similar gene
   expressions (Appendix Fig. [102]S1). However, T cells with the same
   CDR3β sequences from different samples should not be regarded as a
   single clone, as their gene expression could be very different due to
   variations in the microenvironment. We integrated different samples
   from the same patient across various diseases to assess whether the
   similarity in gene expression among T cells sharing the same CDR3β
   sequences differed between cells from the same sample and those from
   different samples (Appendix Fig. [103]S11). Our analysis revealed a
   significant difference in gene expression similarity, suggesting that T
   cells with identical CDR3β sequences exhibit distinct gene expression
   profiles across samples, even after removing batch effects. Therefore,
   in our work, when applying TCRclub to T cells from different samples,
   gene expressions were integrated prior to TCRclub’s analysis. Clones
   sharing identical CDR3β sequences from different samples were treated
   as distinct clones, ensuring that all T-cell clones from each sample
   could be analysed.

   After applying TCRclub to the integrated PB (or lymph) and tumor
   samples, we categorized PB (or lymph)-derived T cells as
   ‘tumor-unrelated cells’ if their club comprised solely PB (or lymph) T
   cells. Conversely, PB (or lymph)-derived T cells were labeled as
   ‘tumor-related cells’ if their club included both PB (or lymph) T cells
   and tumor T cells. In lymph-derived T cells from three patients
   (Fig. [104]3A), upon comparing the numbers of tumor-related and
   tumor-unrelated T cells in each T-cell subtype, we observed a
   substantial number of tumor-related activated T cells (Fig. [105]3B;
   Appendix Fig. [106]S12). Similarly, in PB-derived T cells from five
   patients (Fig. [107]3C), we noted a significant presence of activated T
   cells associated with tumor T cells in each subtype (Fig. [108]3D;
   Appendix Fig. [109]S12). The findings validate the successful migration
   of activated T cells from lymph nodes or PB and infiltrate into the
   tumor microenvironment (TME) in response to tumor-specific antigens.
   Within the tumor (Fig. [110]3E), a cell was termed ‘PB-related T cells’
   if its club contained PB T cells; otherwise, it was classified as
   ‘PB-unrelated T cells.’ We found a significant number of activated
   cells, such as effector and memory CD8 cells, were associated with PB,
   whereas many exhausted CD8 T cells in the tumor were unrelated to PB T
   cells (Fig. [111]3F; Appendix Fig. [112]S12). The presence of activated
   effector and memory CD8 T cells associated with PB in tumors indicates
   immune infiltration from systemic circulation into the TME. The
   dissociation between exhausted CD8 T cells and PB T cells in the tumor
   suggests that the TME may employ immune evasion mechanisms to evade
   systemic immune surveillance after the activated T cells infiltrated
   into TME.

Figure 3. TCRclub reveals the dynamic journey: early-stage activated T cells
transition to late-stage exhausted phenotype in cholangiocarcinoma patients.

   [113]Figure 3
   [114]Open in a new tab

   (A) UMAP plot of lymph T cells labeled with distinct colors and
   accompanied by cell type annotations. (B) Tumor-unrelated T cells
   (left) and tumor-related T cells (right) depicted in lymph samples. (C)
   UMAP plot of PB T cells labeled with distinct colors and accompanied by
   cell type annotations. (D) Tumor-unrelated T cells (left) and
   Tumor-related T cells (right) visualized in PB samples. (E) UMAP plot
   showing T cells labeled with varying colors for tumor primary locus
   samples, with corresponding cell type annotations. (F) PB-related T
   cells (left) and PB-unrelated T cells (right) are depicted in tumor
   samples. (G) Trajectory plot depicting the progression of effector CD8
   T cells and exhausted CD8 T cells within tumor samples. (H) Pseudotime
   trajectory plot of effector CD8 T cells and exhausted CD8 T cells
   within tumor samples. (I) Trajectory plot demonstrating the states of
   effector CD8 T cells and exhausted CD8 T cells within tumor samples.
   (J) The ratio of PB-related to PB-unrelated effector CD8 T cells, along
   with the ratio of effector CD8 T cells to exhausted T cells in each
   state depicted in (I). Monocle 2 was employed for trajectory and
   pseudotime analysis. (K) Volcano plot displaying DEGs for PB-unrelated
   effector CD8 T cells (n = 520) compared with PB-related effector CD8 T
   cells (n = 440). The p values were generated from the two-sided
   Wilcoxon rank-sum test. (L) Volcano plot depicting DEGs for
   PB-unrelated memory CD8 T cells (n = 313) compared with PB-related
   memory CD8 T cells (n = 222). The p values were generated from the
   two-sided Wilcoxon rank-sum test.

   Given the critical role of CD8 T cells in mounting an immune response
   against tumor cells, primarily through their capacity to directly
   eliminate tumor cells, we isolated all CD8 T cells within the tumor
   primary locus for subsequent trajectory analysis (Appendix
   Fig. [115]S13A) and pseudotime analysis (Appendix Fig. [116]S13B). Our
   observations revealed progressive exhaustion of effector and memory CD8
   T cells over time, while a subset of memory and effector CD8 T cells
   persisted. Our analysis of CD8 T-cell distribution within the clubs
   echoed the findings: a majority of the clubs exhibited a simultaneous
   presence of effector, memory, and exhausted CD8 T-cell types, whereas a
   smaller subset of clubs only displayed the presence of effector and
   memory CD8 T cells (Appendix Fig. [117]S14). Specifically, we
   investigated the differentiation dynamics between PB-related activated
   T cells and PB-unrelated activated T cells. We applied trajectory and
   pseudotime analysis to effector CD8 T cells (Fig. [118]3G,H). We
   identified that PB-related effector CD8 T cells predominantly occupied
   the early stages, gradually transitioning into PB-unrelated effector
   CD8 T cells, and ultimately culminating in exhaustion (Fig. [119]3I,J).
   A similar trend was observed in PB-related memory CD8 T cells, where
   their numbers decreased over time. Some PB-related memory CD8 T cells
   transitioned into PB-unrelated memory CD8 T cells and persisted in the
   tumor, while others progressively progressed toward exhaustion
   (Appendix Fig. [120]S15).

   To elucidate the underlying mechanisms behind the transition from
   PB-related T cells to PB-unrelated T cells, we conducted a differential
   gene expression (DEG) analysis on effector (Fig. [121]3K) and memory
   CD8 T cell subsets (Fig. [122]3L) to uncover the distinctions between
   these two populations. Specifically, we observed the upregulation of
   CXCL13 in both PB-unrelated effector T cells and PB-unrelated memory T
   cells. CXCL13, a chemokine crucial for leukocyte trafficking and
   lymphoid tissue organization, has been implicated in T cell exhaustion.
   The upregulation of CXCL13 in PB-unrelated CD8 T cells may signify an
   indicative mechanism driving enhanced T cell exhaustion within these
   subsets (Dai et al, [123]2021; Yang et al, [124]2021). Additionally, we
   observed the downregulation of FGFBP2 and the upregulation of RPS26 in
   PB-unrelated effector T cells. FGFBP2, known for its involvement in
   regulating fibroblast growth factors (FGFs), negatively impacts T
   cells’ survival, proliferation, and antigen responsiveness (Chang et
   al, [125]2014). Conversely, RPS26 acts as a checkpoint regulating
   T-cell survival and homeostasis. The upregulation of RPS26 could
   represent an adaptive survival response aimed at preserving cellular
   functions and homeostasis within the TME (Chen et al, [126]2021).
   Collectively, the coordinated upregulation of both CXCL13 and RPS26,
   along with the downregulation of FGFBP2, may unveil a nuanced
   equilibrium between exhaustion and survival within PB-unrelated CD8 T
   cell subsets. It implies how the TME influences immune cell behavior,
   leading to a trend of exhaustion and struggle for survival in T cells.

TCRclub discovers the mechanisms underlying response to Anti-PD1 therapy in
BCC patients

   We conducted a study to analyze the response of T cells to therapeutic
   interventions in basal cell carcinoma (BCC) patients. The dataset used
   in this study comprises the paired scRNA-seq and scTCR-seq data of T
   cells obtained from site-matched tumors of BCC patients collected both
   before and after anti-PD1 therapy (Yost et al, [127]2019b; Data ref:
   Yost et al, [128]2019a). The dataset includes samples from a total of
   eleven patients, consisting of six responders and five non-responders
   to the therapy. We divided the dataset into four cohorts,
   responder-pre, responder-post, nonresponder-pre, and nonresponder-post,
   based on the patient’s response to Anti-PD1 therapy and the time of
   sample collection (pre-treatment or post-treatment).

   We explored the regression coefficient between pairwise expression
   distance and pairwise embedding distance calculated by TCRclub for the
   responder and nonresponder cohorts. Our analysis did not reveal a
   significant difference in these regression coefficients between the
   responder-post and nonresponder-post cohorts (Appendix Fig. [129]S16).
   However, TCRclub offers an alternative perspective on the origins of
   T-cell clone proliferation, unlike the previous discovery that
   suggested the proliferation did not originate from pre-existing
   tumor-infiltrating T cells (Yost et al, [130]2019b). We defined four
   cell types based on the patient’s response to therapy and whether the
   TCRs exclusively emerged in the post-treatment sample. For responders,
   we classified T cells as “Respost-PD1” cells if their CDR3β sequences
   were only detected in the post-treatment sample. Otherwise, they were
   labeled as “Respre-PD1” cells. Similarly, the definitions “Nonpre-PD1”
   and “Nonpost-PD1” were used for the non-responders. We aimed to explore
   whether the post-PD1 cells were related to the pre-PD1 cells.
   Consequently, we re-assigned the cell-type label for each T-cell clone
   based on the majority voting results of its corresponding club. The
   clustering results were visualized using UMAP (Fig. [131]4A).
   Interestingly, we observed that in both responders and non-responders,
   a subset of T cells changed their labels due to clustering with another
   cell type. However, this change was more pronounced in responders
   (Fig. [132]4B,C). Specifically, a high proportion of T cells initially
   labeled as post-PD1 in responders were re-labeled as pre-PD1 cells,
   indicating that more post-PD1 cells were related to the pre-treatment
   batch in responders than non-responders.

Figure 4. Investigating the mechanisms of response to Anti-PD1 therapy in BCC
patients using TCRclub.

   [133]Figure 4
   [134]Open in a new tab

   (A) UMAP plots of T cells colored according to the original cell-type
   labels. The upper plot displays the original labels prior to applying
   TCRclub, while the lower plot shows the re-assigned labels after the
   TCRclub application. (B) UMAP plots of T cells colored according to the
   cell-type labels before and after applying TCRclub. The upper plot
   depicts the original Respost-PD1 labels before the TCRclub application,
   while the lower plot highlights the re-assigned Respost-PD1 cells in
   red after the TCRclub application. (C) UMAP plots of T cells colored
   according to the cell-type labels before and after applying TCRclub.
   The upper plot shows the original Nonpost-PD1 labels before the TCRclub
   application, while the lower plot highlights the re-assigned
   NonPost-PD1 cells in red after the TCRclub application. (A–C) share the
   same legend. (D) Heatmap illustrating the DEGs discovered by analyzing
   the post-PD1 T cells related to the pre-PD1 T cells in responders. The
   top 25 genes with the highest p values are depicted. (E, F), Sankey
   plot (E) and Bubble plot (F) presenting the pathway enrichment analysis
   for the DEGs identified in the Respost-PD1 T cells related to the
   Respre-PD1 T cells (Respost-PD1 T cells, n = 671; Nonpost-PD1 T cells,
   n = 450). Seven enriched pathways, sorted based on the −log10 p values,
   along with their associated DEGs in (D), are displayed. The p values
   were generated from the two-sided Wilcoxon rank-sum test.

   We extracted the post-PD1 cells related to the pre-PD1 cells in two
   cohorts: responders and non-responders, and identified differentially
   expressed genes (DEGs) in responder vs. nonresponder (Fig. [135]4D;
   Appendix Fig. [136]S17). Among the DEGs of responders, we obtained some
   clues concerning the mechanism of the response to Anti-PD1 therapy. For
   instance, HLA-A, a member of MHC class I genes, plays a critical role
   in presenting antigens to cytotoxic T cells. When HLA-A is highly
   expressed, it increases the likelihood of presenting tumor-specific
   antigens to T cells, thereby enhancing T-cell recognition and response
   against tumor cells (Krensky, [137]1997). We also found JAK1 and SLA2
   genes in the DEGs of responders. They are involved in mediating the
   signaling of various cytokines and growth factors that regulate the
   activation and differentiation of immune cells (Wu et al, [138]2023;
   Albacker et al, [139]2017). The identified DEGs highlight the
   involvement of specific immune-related pathways, including antigen
   presentation and immune signaling, contributing to an effective
   response to PD1 therapy among responders.

   To further elucidate the functional implications of the DEGs in
   responders, we conducted pathway enrichment analysis (Appendix
   Fig. [140]S18) and ranked the results based on their −log10 p values.
   Seven pathways particularly related to immune response were selected
   (Fig. [141]4E,F). These pathways encompass critical aspects of immune
   signaling and antigen presentation. For instance, pathways such as
   [142]GO:0002479 and [143]GO:0042590 facilitate the generation and
   presentation of tumor antigens to T cells, the primary targets of PD1
   therapy (Niu et al, [144]2021). The efficiency and quality of these
   pathways can significantly impact the recognition and elimination of
   tumor cells by the immune system. Noteworthy is the signaling pathway
   [145]GO:0038061, which regulates the expression of genes involved in
   immune response, inflammation, cell survival, and apoptosis. This
   pathway may modulate the activity and regulation of NF-kappaB, a
   transcription factor controlling the expression of PD-L1 and other
   immune checkpoints (Yi et al, [146]2022).

TCRclub characterizes antigen-specific T-cell response and gene patterns in
COVID-19 patients

   Understanding the strength and distribution of T-cell responses to
   SARS-CoV-2 is essential for elucidating the pathogenesis of COVID-19
   and developing effective therapies. In this study, we analysed the
   paired scRNA-seq and scTCR-seq data from the dataset consisting of 41
   peripheral blood mononuclear cell (PBMC) samples from COVID-19 patients
   with disease severity, including asymptomatic (AS) patients, moderate
   patients, severe patients, and severe recovery (SR) patients (Wang et
   al, [147]2022; Data ref: Zhang C, [148]2022). We used TCRanno (Luo et
   al, [149]2023) to predict the target antigens and virus for the T
   cells, and we adopted the club-based approach to determine the targeted
   antigens for each cell. T cells in a club were regarded as having the
   same antigen-specificity as this club’s majority voting
   antigen-specificity. Similarly, the club was assumed to be specific to
   the voted antigen.

   Previous studies (Chen and John Wherry, [150]2020; Wan et al,
   [151]2020; Liu et al, [152]2020) have reported a negative correlation
   between peripheral blood T lymphocyte counts and the severity of
   COVID-19. However, the extent of T cell response specific to SARS-CoV-2
   among different patients remains to be clarified. We investigated the
   percentage of T cells specific to SARS-CoV-2 in relation to different
   severity levels. Specifically, we found a trend indicating that the
   symptomatic patients may exhibit a higher proportion of T cells
   specific to SARS-CoV-2 compared to those AS patients (Fig. [153]5A). We
   also found that the individuals aged over 60 remarkably showed a larger
   amount of T cells against SARS-CoV-2 (Fig. [154]5B). This observation
   suggests that patients with severe symptoms or older adults display an
   intensified and more pronounced immune response. It also implies that a
   higher count of T cells tailored to SARS-CoV-2 does not necessarily
   correlate with effective infection control or preventing severe
   symptoms. Multiple factors could contribute to this scenario, including
   a potential decline in overall immune function due to immunosenescence
   and the presence of cytokine storms (Del Valle et al, [155]2020;
   Nikolich-Zugich et al, [156]2020; Diao et al, [157]2020). Additionally,
   there was no such significant difference in the percentage of T cells
   targeted for SARS-CoV-2 between males and females or between long-term
   and short-term patients (Appendix Fig. [158]S19). We also compared the
   regression coefficients obtained by TCRclub across those cohorts and
   observed that the symptomatic patients tend to have higher regression
   coefficients than the AS patients. Besides, individuals aged over 60
   displayed higher regression coefficients than those aged below 60
   (Appendix Fig. [159]S20).

Figure 5. TCRclub characterizes antigen-specific T-cell response and gene
patterns in COVID-19 patients.

   [160]Figure 5
   [161]Open in a new tab

   (A, B) Percentage of T cells specific to SARS-CoV-2 across different
   severity levels (A) and age groups (B). Number of samples: AS, n = 5;
   Moderate, n = 13; Severe, n = 11; SR, n = 12; Age <60, n = 20; Age >60,
   n = 21. The p values were generated from the one-sided Mann–Whitney
   U-test. (C, D) Correlation of the usage of various V genes (C) and J
   genes (D) between different severity levels for different antigens. MG
   denotes membrane glycoprotein. NP denotes nucleocapsid phosphoprotein,
   and SG represents surface glycoprotein. (E) Most frequent V-J gene
   pairs for T cells specific to ORF1ab in different severity levels. (F,
   G) Percentage of T-cell clubs specific to ORF1ab (F) and surface
   glycoprotein (G) among patients with different severity levels. Number
   of samples: AS, n = 4; Moderate, n = 11; Severe, n = 10; SR, n = 12 (F)
   and AS, n = 2; Moderate, n = 10; Severe, n = 6; SR, n = 10 (G). The p
   values were generated from the one-sided Mann–Whitney U-test. (H, I)
   Volcano plots of DGEs for T-cell clubs specific to ORF1ab (H) and
   surface glycoprotein (I) in severe patients, compared with AS and
   moderate patients. The p values were generated from the two-sided
   Wilcoxon rank-sum test. Number of cells: AS and moderate, n = 1291;
   Severe, n = 1318 (H); AS and moderate, n = 802; Severe, n = 457 (I).
   Data information: For boxplots in (A, B, F, G), The central band
   represents the median. The lower and upper hinges represent the first
   and third quartiles, respectively. Whiskers are drawn to the farthest
   datapoint within ±1.5 × IQR from the nearest hinge.

   To explore whether patients with varying severity levels exhibit
   similar preferences for V and J gene usage towards the same antigen, we