Abstract T cell receptors (TCR) and gene expression provide two complementary and essential aspects in T cell understanding, yet their diversity presents challenges in integrative analysis. We introduce TCRclub, a novel method integrating single-cell RNA sequencing data and single-cell TCR sequencing data using local harmony to identify functionally similar T cell groups, termed ‘clubs’. We applied TCRclub to 298,106 T cells across seven datasets encompassing various diseases. First, TCRclub outperforms the state-of-the-art methods in clustering T cells on a dataset with over 400 verified peptide-major histocompatibility complex categories. Second, TCRclub reveals a transition from activated to exhausted T cells in cholangiocarcinoma patients. Third, TCRclub discovered the pathways that could intervene in response to anti-PD-1 therapy for patients with basal cell carcinoma by analyzing the pre-treatment and post-treatment samples. Furthermore, TCRclub unveiled different T-cell responses and gene patterns at different severity levels in patients with COVID-19. Hence, TCRclub aids in developing more effective immunotherapeutic strategies for cancer and infectious diseases. Keywords: Integration, Local Harmony, Single-Cell Analysis, T-Cell Clustering Subject terms: Computational Biology, Immunology Synopsis graphic file with name 44320_2024_70_Figa_HTML.jpg TCRclub integrates scRNA-seq and scTCR-seq data by local harmony to identify functionally similar T cell groups, termed ‘clubs’, enhancing the understanding of T cell diversity. * TCRclub models the relationship between pairwise TCR embedding distances and pairwise expression distances, emphasizing local homogeneity to integrate TCRs and gene expressions. * TCRclub considers the nested cell hierarchy to capture the cellular functional diversity of T cells. * TCRclub demonstrates better performance in clustering T cells into pMHC-specificity clusters. * TCRclub unveils the functional landscape of T cells in diverse biological contexts, including cholangiocarcinoma, Anti-PD1 therapy, and SARS-CoV-2 infection. __________________________________________________________________ TCRclub integrates scRNA-seq and scTCR-seq data by local harmony to identify functionally similar T cell groups, termed ‘clubs’, enhancing the understanding of T cell diversity. graphic file with name 44320_2024_70_Figb_HTML.jpg Introduction The immune system is an intricate network comprising cells, tissues, and organs that safeguard the body from infections and illnesses. T cells are crucial in recognizing and eliminating infected or cancerous cells among the various components of the immune system. The diversity of T cells is reflected in the diversity of their antigen receptors, known as the T cell receptor (TCR). TCR allows T cells to recognize pathogens and antigens, making them essential for effective immune responses against pathogens and cancer cells (De Simone et al, [37]2018; Emerson et al, [38]2017; Rocamora-Reverte et al, [39]2021). One of the primary challenges in immunology is identifying functionally similar T cells. Several computational methods have been developed to cluster T cells to address this challenge based on the similarity of the complementarity-determining region 3 (CDR3) sequences. CDR3 is a highly variable region of TCR and is directly involved in the binding of epitopes presented by major histocompatibility complex (MHC) molecules (Stadinski et al, [40]2014; Chronister et al, [41]2021a). TCRdist (Dash et al, [42]2017; Mayer-Blackwell et al, [43]2021) applies a pairwise distance metric to quantify the similarity of CDR3 sequences. GLIPH (Glanville et al, [44]2017) focuses on identifying TCRs targeting specific antigens, grouping TCR sequences according to their shared amino acid motifs and global sequence similarity. GLIPH2 (Huang et al, [45]2020) is an improved version of GLIPH that employs Fisher’s exact test to identify k-mer motifs and to revise global similarity to accelerate the processing of large datasets. ClusTCR (Valkiers et al, [46]2021) builds similarity graphs for every supercluster divided by Facebook artificial intelligence similarity search (Faiss) and then uses Markov clustering algorithm (MCL) to identify TCR clusters within each graph. GIANA (Zhang et al, [47]2021a) uses a geometric isometry-based method to transform CDR3 sequences into numerical vectors that preserve their similarity and then uses a classic nearest-neighbor search to group TCRs by their Euclidean distances. Although some methods, such as TCRdist (Dash et al, [48]2017; Mayer-Blackwell et al, [49]2021) and GIANA (Zhang et al, [50]2021a), employ a nearest-neighbor strategy to identify clusters directly based on TCR distances, the clusters may be unreliable in accurately reflecting the diversity of functions due to the neglect of the underlying T-cell hierarchy (Wu and Wu, [51]2020). In a recent study (Chen and Li, [52]2022), we have shown that cell-cell relationships can be mapped to a hierarchical structure of multiple layers, where cellular homogeneity increases from the root to the leaves. In the case of T cells, the root layer represents the entire population, followed by the subpopulation layer that contains a broad category of T cells, such as helper T cells and naive T cells. The club layer, which is the next layer in the hierarchy, provides a finer resolution of cellular homogeneity by grouping individual T cells based on their similar functions. The hierarchy is built in a specific order, sequentially adding each layer to the internal cell hierarchy. By contrast, methods that build clusters by the nearest-neighbor search are limited to basic layer information and generate flat cell groups, failing to fully capture the functional diversity of the cells. Moreover, while TCR sequences alone are crucial for T cell identification, they do not fully encompass the functional diversity of T cells, since functional diversity is also influenced by their gene expression profiles (Cheadle et al, [53]2005; Chtanova et al, [54]2005). Integrating TCR sequences with single-cell RNA sequencing (scRNA-seq) data can provide a more comprehensive view of the T cell landscape and reveal the phenotypes and functions of different T cell subsets. TESSA (Zhang et al, [55]2021a) leverages both gene expression and TCR sequences. It uses a Dirichlet process to map the TCR repertoire’s functional landscape and identify antigen-specific TCR clusters. However, one limitation of TESSA is that it assumes a fixed prior distribution for the Dirichlet process, which may not reflect the true diversity and complexity of the TCR repertoire. Additionally, TESSA requires all T cells in every iteration of the calculation to infer the posterior distribution of the latent variables and parameters. It may not be suitable for analyzing highly dynamic T-cell populations. Another clustering method, CoNGA (Schattgen et al, [56]2022), also utilizes both gene expression and TCR sequences through statistical analysis of gene expression and TCR similarity graphs. While CoNGA considers the neighbors of T cells, it constructs separate graphs for gene expression and TCR sequences and then compares them, rather than integrate them into a single graph. Here, we propose TCRclub, a novel approach that identifies the functional relevance of T cells. TCRclub receives single-cell gene expressions and numeric embeddings of TCRs as input. It aims to bridge the gap between gene expression and TCRs by modeling the relationship between pairwise TCR embedding and pairwise expression distances according to local harmony. Local harmony refers to the homogeneity of local neighbors surrounding an instance (e.g., a cell), since the neighbors in the distance space are more likely to have similar characteristics and belong to the same category (Uddin et al, [57]2022; Cover and Hart, [58]1967). By emphasizing local harmony, TCRclub reduces noise and increases the robustness of integration. Considering the built-in cell structure, TCRclub builds the T-cell hierarchy based on the residual-distance matrix obtained by integration and identifies the T-cell clubs. Finally, the TCRclub is repeated multiple times to obtain the consensus results of the clubs as the final output. We applied TCRclub to 298,106 T cells from seven scRNA-seq + scTCR-seq datasets encompassing various samples representing distinct diseases. First, we validated that our model performed better than existing methods in effectively grouping T cells into peptide-major histocompatibility complex (pMHC)—specificity clubs, with a higher clustering purity of 52.25% and clustering coverage of 87.45% across over 400 pMHC categories. Furthermore, we showcased the ability of TCRclub to unveil the functional landscape of T cells in diverse biological contexts, including cholangiocarcinoma, Anti-PD1 therapy, and SARS-CoV-2 infection. Results TCRclub identifies T-cell clubs by TCR sequences and gene expressions TCRclub takes paired gene expressions and TCR sequences as input to identify the T-cell clubs. A T-cell club is defined as a subpopulation of T cells that exhibits high functional similarity within the hierarchy of the T cells (Chen and Li, [59]2022). Given that CDR3β is the most diverse and variable component of TCR (Rosati et al, [60]2017), our study specifically focuses on the CDR3β sequence as the input for TCRclub. The term “TCR embedding” in our study refers to the numerical vector obtained by encoding the CDR3β sequence using an encoder-classifier approach (Luo et al, [61]2023). This CDR3β embedding represents the TCR for subsequent integration with gene expression data. A T-cell clone is defined as a group of T cells sharing identical CDR3β sequences from the same sample (Zhang et al, [62]2021a). The goal of TCRclub is to group functionally similar T cells into a club. Since T cells within a clone are likely to have the same antigen-specificity (Smith et al, [63]2021; Chronister et al, [64]2021) and tend to have similar gene expressions (Schattgen et al, [65]2022) (Appendix Fig. [66]S1), TCRclub takes the T-cell clone as the basic unit. As shown in Fig. [67]1A, TCRclub takes paired gene expressions and TCR embeddings as two input matrices. TCRclub consists of two main steps: (1) iterative integration of gene expressions and TCR embeddings using local harmony to produce a residual-distance matrix at convergence and (2) hierarchical clustering of T-cell clones based on the residual-distance matrix to identify T-cell clubs. TCRclub repeats this workflow for a specified number of rounds and obtains the consensus result by aggregating the T-cell clubs of rounds (Fig. [68]1B). Figure 1. Schematic overview of TCRclub. [69]Figure 1 [70]Open in a new tab (A) The workflow of TCRclub. (B) Aggregation of clubs for consensus result. TCRclub discovers latent representations of TCRs During the integration of scRNA and TCR data, TCRclub assigns varying levels of importance to different dimensions of TCR embeddings, thereby highlighting regions of the TCR sequence potentially critical for antigen recognition and function. We used the concept score metric (Brocki and Chung, [71]2019) to evaluate the TCRs clustered by TCRclub. The concept score, obtained by calculating the dot product between the TCR embedding and the generated concept vector, allowed us to interpret the latent representation of high-level concepts within the TCR sequence (Brocki and Chung, [72]2019). Using concept scores, we generated density histograms for each club and compared them with other clubs within the same sample (Fig. [73]2A,B; Appendix Fig. [74]S2). Remarkably, when each club corresponds to prominent epitopes, a distinct separation between the clubs based on their peak positions was observed (Fig. [75]2A). In contrast, when the club does not correlate with prominent epitopes, a relatively closer separation between the clubs was noted (Fig. [76]2B). Moreover, for the clubs associated with prominent epitopes (Fig. [77]2A), we generated saliency maps using Deepexplain (Ancona et al, [78]2017) for the TCRs with the highest concept scores within each club. These maps unveiled shared attention positions among the TCRs within the same club (Fig. [79]2C; Appendix Fig. [80]S2). In summary, our findings underscore that TCRclub effectively captures both similarities and distinctions within and among TCR clubs. Figure 2. Performance evaluation of TCRclub for T cell clustering. [81]Figure 2 [82]Open in a new tab (A–C) Density histograms of three clubs obtained from the same sample illustrate the distribution of concept scores. The blue histograms represent the club of interest, while the red histograms represent other clubs from the same sample. (A) Three clubs presenting different prominent epitopes. (B) Three clubs without prominent epitopes. (C) Saliency maps for the three clubs in (A). Each row displays the saliency maps for the top four TCRs with the highest concept scores within the club. The dark-background heatmap indicates the one-hot embedding (see [83]Appendix Supplementary Methods) for each TCR, while the white-background heatmap represents the saliency of each amino acid in the TCR. The color bar denotes the saliency level, with deeper colors indicating higher importance. Shared attentional positions are highlighted in red. (D) Heatmap comparing the clustering results of different models with the ground truth. RNA expressions were dimensionally reduced by uniform manifold approximation and projection (UMAP) and are depicted in blue. The ground truth shows the verified pMHCs of T cells, with each pMHC assigned a unique color. In the ground truth, the same colors depict T cells with the same pMHC. For the models, T cells were colored according to the clustering result. For example, if the largest number of T cells in the cluster is against a particular pMHC, then all the T cells in this cluster will be assigned the color corresponding to that pMHC in the ground truth colormap. If a T cell does not belong to any cluster, it is assigned a color that is not present in the ground truth colormap. Therefore, the clustering result that aligns more closely with the colormap of the ground truth indicates higher purity and clustering coverage. (E–H) iTOL plots display the hierarchy of the clustering results from TCRclub (E), TESSA (F), and GLIPH2 (G). GIANA, CoNGA, and ClusTCR did not generate any clusters in the depicted example (H). Same as in (D), T cells are color-coded based on the verified pMHCs, where identical colors represent T cells recognizing the same pMHC. TCRclub (E) exhibits a tree-like structure, where T cells associated with the same node in the club layer are anticipated to share similar functions. Conversely, alternative models. Other models (F–H) generate produced flat cell groupings, where T cells affiliated with the same child node are expected to exhibit similar functions. The blue and green dotted lines mark the 50 and 100% clustering purity, respectively. (I) Comparison of TCRclub with other models in terms of average effectiveness, purity, and coverage of the dataset. TCRclub demonstrates better performance in clustering T cells into pMHC-specificity clusters In this section, we assess the ability of TCRclub to cluster T cells into various pMHC-specificity groups. We utilized a dataset (Data ref: Francis et al, [84]2021) containing 55 samples of single-cell gene expression coupled to TCR profiling, with pMHC specificity verified for each T cell by tetramer staining technique. Given the large number of pMHC categories (over 400), accurately distinguishing between functionally similar T-cell clusters posed a challenge. To benchmark TCRclub, we compared its performance with TESSA, CoNGA, GIANA, ClusTCR, and the series of GLIPH using the same samples and their default parameter settings, while quantifying the clustering performance. The clustering accuracy was assessed through clustering purity which counts the number of T-cell clones targeting the most frequent pMHC in each T-cell clone cluster, while clustering coverage represents the percentage of T-cell clones that can be assigned to clusters among all the input T-cell clones. In most scenarios, tuning the clustering to improve purity typically reduces coverage, and vice versa. Therefore, we introduce clustering effectiveness, defined as the product of the clustering purity and coverage, to capture the trade-off between these two critical aspects of T-cell clustering performance. We selected T cells with single-specificity (i.e., not considering multi-matching T cells) to provide an intuitive illustration of the result of its cluster. TCRclub exhibited a higher level of agreement with the ground truth, as demonstrated by the colormap (Fig. [85]2D). The clustering results obtained using TCRclub constructed a tree-like structure for T cells (Fig. [86]2E), where the level of functional similarity increases as one moves from the root towards the leaves (Chen and Li, [87]2022) (Appendix Fig. [88]S3). The “club layer”, the focus of our study, represents the finest resolution of functional similarity, and it is positioned just before the last layer (the individual T cells). Since each cell has a unique TCR, the last layer can also be regarded as T-cell clones. In contrast, other methods resulted in flat cell groups (Fig. [89]2F–H). We used the bar chart to display the purity of the T-cell clone clusters (Fig. [90]2E–H). We observed that TCRclub demonstrated higher purity and coverage compared to other methods. We compared TCRclub and other methods across all samples. TCRclub achieves a higher average clustering purity of 52.25% and a coverage of 87.45% (Fig. [91]2I). When considering both clustering purity and coverage, TCRclub achieved a significantly higher clustering effectiveness of 45.34%, indicating its balanced ability to detect functionally similar clusters across various samples. In contrast, other methods demonstrated lower clustering effectiveness, regardless of whether they used only TCRs (e.g., GLIPH series and GIANA) or both TCRs and RNA (e.g., TESSA and CoNGA). Most methods had clustering effectiveness below 10%. The most comparable method, TESSA, achieved only 22.97%, significantly lower than TCRclub, indicating their inability to balance clustering purity and coverage effectively. The methods compared frequently produce no T-cell cluster in certain samples, highlighting their stringent dataset requirements and limited generalizability. We further evaluated conditions that might affect TCRclub’s clustering performance. Because of the prevalence of dropouts in scRNA data, we examined whether these dropouts would affect TCRclub’s clustering performance or not. By introducing additional dropouts into the raw gene expression counts while maintaining the same preprocessing steps, we found that increased dropouts could negatively impact both average clustering purity and clustering coverage (Appendix Fig. [92]S4). We also investigated how extreme clonal expansion might influence clustering outcomes. In simulations of extreme clonal expansion (see [93]Appendix Supplementary Methods), we observed that this condition does not affect clustering results compared to scenarios without such expansion (Appendix Fig. [94]S5). TCRclub reveals the activated-to-exhausted transition in cholangiocarcinoma patients In this investigation, employing TCRclub, we explored the dynamic trajectory of T cells within cholangiocarcinoma (CCA) patients. Specifically, we utilized TCRclub to analyze a dataset consisting of matched scRNA-seq and scTCR-seq profiles of T cells derived from five patients diagnosed with cholangiocarcinoma (Shi et al, [95]2022b; Data ref: Shi et al, [96]2022a). This dataset includes samples obtained from peripheral blood (PB), the tumor's primary site, and lymph nodes. First, we obtained the regression coefficients between gene expression distances and TCR distances for each clone using TCRclub (see Methods). We then compared these coefficients across PB, tumor, and lymph node tissues. Interestingly, we found no significant differences in the regression coefficients between the different tissues (Appendix Fig. [97]S6). This consistency suggests that the relationship between gene expression and TCR repertoire remains stable across different tissue origins in CCA patients. We annotated the cells of this dataset based on the annotation guidelines outlined by Shi et al (Shi et al, [98]2022b) (Appendix Figs. [99]S7–[100]S9). We integrated PB and tumor samples, or lymph node and tumor samples (if available), for each patient and subsequently applied TCRclub to the integrated samples. For a better joint analysis of different samples, we recommend integrating T-cell gene expressions and removing batch effects before applying TCRclub to cells from different samples. As an example, we compared integrated PB-tumor samples with unintegrated PB-tumor samples from five patients as an instance. We applied TCRclub to the groups, respectively. We calculated the ratio of clubs containing T-cell clones from both PB and tumor (i.e., mixed clubs). The results showed a significantly higher ratio of mixed clubs in the integrated samples compared to the unintegrated ones (Appendix Fig. [101]S10). This finding suggests that integrating gene expression data is crucial for the joint analysis of different samples; otherwise, TCRclub tends to cluster T cells with those from the same sample. For the T cells sharing identical CDR3β sequences within the same sample, we defined them as a T-cell clone, and our observations indicated that cells within a given clone tend to exhibit similar gene expressions (Appendix Fig. [102]S1). However, T cells with the same CDR3β sequences from different samples should not be regarded as a single clone, as their gene expression could be very different due to variations in the microenvironment. We integrated different samples from the same patient across various diseases to assess whether the similarity in gene expression among T cells sharing the same CDR3β sequences differed between cells from the same sample and those from different samples (Appendix Fig. [103]S11). Our analysis revealed a significant difference in gene expression similarity, suggesting that T cells with identical CDR3β sequences exhibit distinct gene expression profiles across samples, even after removing batch effects. Therefore, in our work, when applying TCRclub to T cells from different samples, gene expressions were integrated prior to TCRclub’s analysis. Clones sharing identical CDR3β sequences from different samples were treated as distinct clones, ensuring that all T-cell clones from each sample could be analysed. After applying TCRclub to the integrated PB (or lymph) and tumor samples, we categorized PB (or lymph)-derived T cells as ‘tumor-unrelated cells’ if their club comprised solely PB (or lymph) T cells. Conversely, PB (or lymph)-derived T cells were labeled as ‘tumor-related cells’ if their club included both PB (or lymph) T cells and tumor T cells. In lymph-derived T cells from three patients (Fig. [104]3A), upon comparing the numbers of tumor-related and tumor-unrelated T cells in each T-cell subtype, we observed a substantial number of tumor-related activated T cells (Fig. [105]3B; Appendix Fig. [106]S12). Similarly, in PB-derived T cells from five patients (Fig. [107]3C), we noted a significant presence of activated T cells associated with tumor T cells in each subtype (Fig. [108]3D; Appendix Fig. [109]S12). The findings validate the successful migration of activated T cells from lymph nodes or PB and infiltrate into the tumor microenvironment (TME) in response to tumor-specific antigens. Within the tumor (Fig. [110]3E), a cell was termed ‘PB-related T cells’ if its club contained PB T cells; otherwise, it was classified as ‘PB-unrelated T cells.’ We found a significant number of activated cells, such as effector and memory CD8 cells, were associated with PB, whereas many exhausted CD8 T cells in the tumor were unrelated to PB T cells (Fig. [111]3F; Appendix Fig. [112]S12). The presence of activated effector and memory CD8 T cells associated with PB in tumors indicates immune infiltration from systemic circulation into the TME. The dissociation between exhausted CD8 T cells and PB T cells in the tumor suggests that the TME may employ immune evasion mechanisms to evade systemic immune surveillance after the activated T cells infiltrated into TME. Figure 3. TCRclub reveals the dynamic journey: early-stage activated T cells transition to late-stage exhausted phenotype in cholangiocarcinoma patients. [113]Figure 3 [114]Open in a new tab (A) UMAP plot of lymph T cells labeled with distinct colors and accompanied by cell type annotations. (B) Tumor-unrelated T cells (left) and tumor-related T cells (right) depicted in lymph samples. (C) UMAP plot of PB T cells labeled with distinct colors and accompanied by cell type annotations. (D) Tumor-unrelated T cells (left) and Tumor-related T cells (right) visualized in PB samples. (E) UMAP plot showing T cells labeled with varying colors for tumor primary locus samples, with corresponding cell type annotations. (F) PB-related T cells (left) and PB-unrelated T cells (right) are depicted in tumor samples. (G) Trajectory plot depicting the progression of effector CD8 T cells and exhausted CD8 T cells within tumor samples. (H) Pseudotime trajectory plot of effector CD8 T cells and exhausted CD8 T cells within tumor samples. (I) Trajectory plot demonstrating the states of effector CD8 T cells and exhausted CD8 T cells within tumor samples. (J) The ratio of PB-related to PB-unrelated effector CD8 T cells, along with the ratio of effector CD8 T cells to exhausted T cells in each state depicted in (I). Monocle 2 was employed for trajectory and pseudotime analysis. (K) Volcano plot displaying DEGs for PB-unrelated effector CD8 T cells (n = 520) compared with PB-related effector CD8 T cells (n = 440). The p values were generated from the two-sided Wilcoxon rank-sum test. (L) Volcano plot depicting DEGs for PB-unrelated memory CD8 T cells (n = 313) compared with PB-related memory CD8 T cells (n = 222). The p values were generated from the two-sided Wilcoxon rank-sum test. Given the critical role of CD8 T cells in mounting an immune response against tumor cells, primarily through their capacity to directly eliminate tumor cells, we isolated all CD8 T cells within the tumor primary locus for subsequent trajectory analysis (Appendix Fig. [115]S13A) and pseudotime analysis (Appendix Fig. [116]S13B). Our observations revealed progressive exhaustion of effector and memory CD8 T cells over time, while a subset of memory and effector CD8 T cells persisted. Our analysis of CD8 T-cell distribution within the clubs echoed the findings: a majority of the clubs exhibited a simultaneous presence of effector, memory, and exhausted CD8 T-cell types, whereas a smaller subset of clubs only displayed the presence of effector and memory CD8 T cells (Appendix Fig. [117]S14). Specifically, we investigated the differentiation dynamics between PB-related activated T cells and PB-unrelated activated T cells. We applied trajectory and pseudotime analysis to effector CD8 T cells (Fig. [118]3G,H). We identified that PB-related effector CD8 T cells predominantly occupied the early stages, gradually transitioning into PB-unrelated effector CD8 T cells, and ultimately culminating in exhaustion (Fig. [119]3I,J). A similar trend was observed in PB-related memory CD8 T cells, where their numbers decreased over time. Some PB-related memory CD8 T cells transitioned into PB-unrelated memory CD8 T cells and persisted in the tumor, while others progressively progressed toward exhaustion (Appendix Fig. [120]S15). To elucidate the underlying mechanisms behind the transition from PB-related T cells to PB-unrelated T cells, we conducted a differential gene expression (DEG) analysis on effector (Fig. [121]3K) and memory CD8 T cell subsets (Fig. [122]3L) to uncover the distinctions between these two populations. Specifically, we observed the upregulation of CXCL13 in both PB-unrelated effector T cells and PB-unrelated memory T cells. CXCL13, a chemokine crucial for leukocyte trafficking and lymphoid tissue organization, has been implicated in T cell exhaustion. The upregulation of CXCL13 in PB-unrelated CD8 T cells may signify an indicative mechanism driving enhanced T cell exhaustion within these subsets (Dai et al, [123]2021; Yang et al, [124]2021). Additionally, we observed the downregulation of FGFBP2 and the upregulation of RPS26 in PB-unrelated effector T cells. FGFBP2, known for its involvement in regulating fibroblast growth factors (FGFs), negatively impacts T cells’ survival, proliferation, and antigen responsiveness (Chang et al, [125]2014). Conversely, RPS26 acts as a checkpoint regulating T-cell survival and homeostasis. The upregulation of RPS26 could represent an adaptive survival response aimed at preserving cellular functions and homeostasis within the TME (Chen et al, [126]2021). Collectively, the coordinated upregulation of both CXCL13 and RPS26, along with the downregulation of FGFBP2, may unveil a nuanced equilibrium between exhaustion and survival within PB-unrelated CD8 T cell subsets. It implies how the TME influences immune cell behavior, leading to a trend of exhaustion and struggle for survival in T cells. TCRclub discovers the mechanisms underlying response to Anti-PD1 therapy in BCC patients We conducted a study to analyze the response of T cells to therapeutic interventions in basal cell carcinoma (BCC) patients. The dataset used in this study comprises the paired scRNA-seq and scTCR-seq data of T cells obtained from site-matched tumors of BCC patients collected both before and after anti-PD1 therapy (Yost et al, [127]2019b; Data ref: Yost et al, [128]2019a). The dataset includes samples from a total of eleven patients, consisting of six responders and five non-responders to the therapy. We divided the dataset into four cohorts, responder-pre, responder-post, nonresponder-pre, and nonresponder-post, based on the patient’s response to Anti-PD1 therapy and the time of sample collection (pre-treatment or post-treatment). We explored the regression coefficient between pairwise expression distance and pairwise embedding distance calculated by TCRclub for the responder and nonresponder cohorts. Our analysis did not reveal a significant difference in these regression coefficients between the responder-post and nonresponder-post cohorts (Appendix Fig. [129]S16). However, TCRclub offers an alternative perspective on the origins of T-cell clone proliferation, unlike the previous discovery that suggested the proliferation did not originate from pre-existing tumor-infiltrating T cells (Yost et al, [130]2019b). We defined four cell types based on the patient’s response to therapy and whether the TCRs exclusively emerged in the post-treatment sample. For responders, we classified T cells as “Respost-PD1” cells if their CDR3β sequences were only detected in the post-treatment sample. Otherwise, they were labeled as “Respre-PD1” cells. Similarly, the definitions “Nonpre-PD1” and “Nonpost-PD1” were used for the non-responders. We aimed to explore whether the post-PD1 cells were related to the pre-PD1 cells. Consequently, we re-assigned the cell-type label for each T-cell clone based on the majority voting results of its corresponding club. The clustering results were visualized using UMAP (Fig. [131]4A). Interestingly, we observed that in both responders and non-responders, a subset of T cells changed their labels due to clustering with another cell type. However, this change was more pronounced in responders (Fig. [132]4B,C). Specifically, a high proportion of T cells initially labeled as post-PD1 in responders were re-labeled as pre-PD1 cells, indicating that more post-PD1 cells were related to the pre-treatment batch in responders than non-responders. Figure 4. Investigating the mechanisms of response to Anti-PD1 therapy in BCC patients using TCRclub. [133]Figure 4 [134]Open in a new tab (A) UMAP plots of T cells colored according to the original cell-type labels. The upper plot displays the original labels prior to applying TCRclub, while the lower plot shows the re-assigned labels after the TCRclub application. (B) UMAP plots of T cells colored according to the cell-type labels before and after applying TCRclub. The upper plot depicts the original Respost-PD1 labels before the TCRclub application, while the lower plot highlights the re-assigned Respost-PD1 cells in red after the TCRclub application. (C) UMAP plots of T cells colored according to the cell-type labels before and after applying TCRclub. The upper plot shows the original Nonpost-PD1 labels before the TCRclub application, while the lower plot highlights the re-assigned NonPost-PD1 cells in red after the TCRclub application. (A–C) share the same legend. (D) Heatmap illustrating the DEGs discovered by analyzing the post-PD1 T cells related to the pre-PD1 T cells in responders. The top 25 genes with the highest p values are depicted. (E, F), Sankey plot (E) and Bubble plot (F) presenting the pathway enrichment analysis for the DEGs identified in the Respost-PD1 T cells related to the Respre-PD1 T cells (Respost-PD1 T cells, n = 671; Nonpost-PD1 T cells, n = 450). Seven enriched pathways, sorted based on the −log10 p values, along with their associated DEGs in (D), are displayed. The p values were generated from the two-sided Wilcoxon rank-sum test. We extracted the post-PD1 cells related to the pre-PD1 cells in two cohorts: responders and non-responders, and identified differentially expressed genes (DEGs) in responder vs. nonresponder (Fig. [135]4D; Appendix Fig. [136]S17). Among the DEGs of responders, we obtained some clues concerning the mechanism of the response to Anti-PD1 therapy. For instance, HLA-A, a member of MHC class I genes, plays a critical role in presenting antigens to cytotoxic T cells. When HLA-A is highly expressed, it increases the likelihood of presenting tumor-specific antigens to T cells, thereby enhancing T-cell recognition and response against tumor cells (Krensky, [137]1997). We also found JAK1 and SLA2 genes in the DEGs of responders. They are involved in mediating the signaling of various cytokines and growth factors that regulate the activation and differentiation of immune cells (Wu et al, [138]2023; Albacker et al, [139]2017). The identified DEGs highlight the involvement of specific immune-related pathways, including antigen presentation and immune signaling, contributing to an effective response to PD1 therapy among responders. To further elucidate the functional implications of the DEGs in responders, we conducted pathway enrichment analysis (Appendix Fig. [140]S18) and ranked the results based on their −log10 p values. Seven pathways particularly related to immune response were selected (Fig. [141]4E,F). These pathways encompass critical aspects of immune signaling and antigen presentation. For instance, pathways such as [142]GO:0002479 and [143]GO:0042590 facilitate the generation and presentation of tumor antigens to T cells, the primary targets of PD1 therapy (Niu et al, [144]2021). The efficiency and quality of these pathways can significantly impact the recognition and elimination of tumor cells by the immune system. Noteworthy is the signaling pathway [145]GO:0038061, which regulates the expression of genes involved in immune response, inflammation, cell survival, and apoptosis. This pathway may modulate the activity and regulation of NF-kappaB, a transcription factor controlling the expression of PD-L1 and other immune checkpoints (Yi et al, [146]2022). TCRclub characterizes antigen-specific T-cell response and gene patterns in COVID-19 patients Understanding the strength and distribution of T-cell responses to SARS-CoV-2 is essential for elucidating the pathogenesis of COVID-19 and developing effective therapies. In this study, we analysed the paired scRNA-seq and scTCR-seq data from the dataset consisting of 41 peripheral blood mononuclear cell (PBMC) samples from COVID-19 patients with disease severity, including asymptomatic (AS) patients, moderate patients, severe patients, and severe recovery (SR) patients (Wang et al, [147]2022; Data ref: Zhang C, [148]2022). We used TCRanno (Luo et al, [149]2023) to predict the target antigens and virus for the T cells, and we adopted the club-based approach to determine the targeted antigens for each cell. T cells in a club were regarded as having the same antigen-specificity as this club’s majority voting antigen-specificity. Similarly, the club was assumed to be specific to the voted antigen. Previous studies (Chen and John Wherry, [150]2020; Wan et al, [151]2020; Liu et al, [152]2020) have reported a negative correlation between peripheral blood T lymphocyte counts and the severity of COVID-19. However, the extent of T cell response specific to SARS-CoV-2 among different patients remains to be clarified. We investigated the percentage of T cells specific to SARS-CoV-2 in relation to different severity levels. Specifically, we found a trend indicating that the symptomatic patients may exhibit a higher proportion of T cells specific to SARS-CoV-2 compared to those AS patients (Fig. [153]5A). We also found that the individuals aged over 60 remarkably showed a larger amount of T cells against SARS-CoV-2 (Fig. [154]5B). This observation suggests that patients with severe symptoms or older adults display an intensified and more pronounced immune response. It also implies that a higher count of T cells tailored to SARS-CoV-2 does not necessarily correlate with effective infection control or preventing severe symptoms. Multiple factors could contribute to this scenario, including a potential decline in overall immune function due to immunosenescence and the presence of cytokine storms (Del Valle et al, [155]2020; Nikolich-Zugich et al, [156]2020; Diao et al, [157]2020). Additionally, there was no such significant difference in the percentage of T cells targeted for SARS-CoV-2 between males and females or between long-term and short-term patients (Appendix Fig. [158]S19). We also compared the regression coefficients obtained by TCRclub across those cohorts and observed that the symptomatic patients tend to have higher regression coefficients than the AS patients. Besides, individuals aged over 60 displayed higher regression coefficients than those aged below 60 (Appendix Fig. [159]S20). Figure 5. TCRclub characterizes antigen-specific T-cell response and gene patterns in COVID-19 patients. [160]Figure 5 [161]Open in a new tab (A, B) Percentage of T cells specific to SARS-CoV-2 across different severity levels (A) and age groups (B). Number of samples: AS, n = 5; Moderate, n = 13; Severe, n = 11; SR, n = 12; Age <60, n = 20; Age >60, n = 21. The p values were generated from the one-sided Mann–Whitney U-test. (C, D) Correlation of the usage of various V genes (C) and J genes (D) between different severity levels for different antigens. MG denotes membrane glycoprotein. NP denotes nucleocapsid phosphoprotein, and SG represents surface glycoprotein. (E) Most frequent V-J gene pairs for T cells specific to ORF1ab in different severity levels. (F, G) Percentage of T-cell clubs specific to ORF1ab (F) and surface glycoprotein (G) among patients with different severity levels. Number of samples: AS, n = 4; Moderate, n = 11; Severe, n = 10; SR, n = 12 (F) and AS, n = 2; Moderate, n = 10; Severe, n = 6; SR, n = 10 (G). The p values were generated from the one-sided Mann–Whitney U-test. (H, I) Volcano plots of DGEs for T-cell clubs specific to ORF1ab (H) and surface glycoprotein (I) in severe patients, compared with AS and moderate patients. The p values were generated from the two-sided Wilcoxon rank-sum test. Number of cells: AS and moderate, n = 1291; Severe, n = 1318 (H); AS and moderate, n = 802; Severe, n = 457 (I). Data information: For boxplots in (A, B, F, G), The central band represents the median. The lower and upper hinges represent the first and third quartiles, respectively. Whiskers are drawn to the farthest datapoint within ±1.5 × IQR from the nearest hinge. To explore whether patients with varying severity levels exhibit similar preferences for V and J gene usage towards the same antigen, we