Abstract

Background

Identification of non-sputum diagnostic markers for tuberculosis (TB) is urgently needed. This exploratory study aimed to discover potential serum protein biomarkers for the diagnosis of active pulmonary TB (PTB).

Method

We employed Proximity Extension Assay (PEA) to measure levels of 92 protein biomarkers related to inflammation in serum samples from three patient groups: 30 patients with active PTB, 29 patients with other respiratory diseases with latent TB (ORD with LTBI+), and 29 patients with other respiratory diseases without latent TB (ORD with LTBI-). To understand the functional mechanisms associated with differentially expressed proteins, we performed Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses. Least absolute shrinkage and selection operator (LASSO) regression was employed to identify potential TB diagnostic protein biomarkers. Network interactions among the identified candidate diagnostic markers were then analyzed, and their diagnostic performance was evaluated using logistic regression and receiver operating characteristic (ROC) analysis.

Result

The analysis revealed 37 differentially expressed proteins (DEPs) in the active PTB group compared to both ORD with LTBI+ and ORD with LTBI- groups. Gene Ontology analysis indicated that these DEPs were primarily involved in the inflammatory response, while KEGG enrichment analysis highlighted the cytokine-cytokine receptor interaction pathway as the top significant hit. LASSO regression identified eight promising candidate protein biomarkers: IFN-gamma, LIF, uPA, CSF-1, SCF, SIRT2, 4E-BP1, and GDNF. The combined set of these eight proteins yielded an AUC of 0.943 for differentiating active PTB from ORD with LTBI+, and an AUC of 0.927 for distinguishing PTB from ORD with LTBI-.

Conclusion

We have identified eight protein markers that reliably differentiate active PTB from ORD irrespective of LTBI presence.
Further large-scale validation and translation of these protein markers into a user-friendly and affordable point-of-care test hold the potential to significantly enhance TB control in high-burden regions.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12879-024-10224-3.

Keywords: Proximity extension assay, Serum protein markers, Tuberculosis diagnosis

Background

Tuberculosis (TB) is the second leading cause of mortality from a single infectious agent, despite being preventable and curable. An estimated 10.6 million new cases and 1.3 million deaths were attributed to TB in 2022 [1]. Around one-fourth of the world’s population is estimated to be latently infected with Mycobacterium tuberculosis (M. tuberculosis), of which 5–10% are estimated to eventually develop active disease [1, 2]. Ethiopia is one of the 30 high TB and TB-HIV burden countries with estimated incidence and mortality rates of 156 and 21 per 100,000 population, respectively [1].

One of the most critical problems in global TB control strategy relates to case finding and diagnosis. More than 3 million TB cases were undetected and unreported worldwide in 2022 [1], mainly due to inadequate diagnostics, which underscores the need to strengthen the capacity to diagnose the disease.

Pulmonary TB (PTB), the most common and contagious form of TB, is diagnosed by identifying M. tuberculosis in a sputum sample, through either staining and microscopy, culture, or DNA detection. However, it can be difficult or even impossible to get sputum from all TB patients. Sputum scarcity contributes to delays in diagnosis for TB and HIV co-infected patients. This may lead to excess mortality in people living with TB-HIV coinfection [3].
Sputum-based diagnosis has no benefit for extrapulmonary TB patients, whose diagnosis remains challenging due to paucibacillary load in the biological specimens and localization of disease at sites that are difficult to access [4]. Furthermore, sputum-based TB diagnosis has a low yield in children since most children cannot expectorate sputum and because of the paucibacillary nature of childhood TB [5].

The development of the GeneXpert assay for M. tuberculosis DNA detection has been a significant advance in the TB diagnostic field. However, its performance on non-sputum samples is suboptimal [6]. Tuberculin skin tests (TSTs) and interferon-gamma release assays (IGRAs) are widely available immunological tests for TB. However, they are non-specific, insensitive, and unreliable for distinguishing between latent TB infection (LTBI) and active TB disease [7]. As a result, their utility is limited, particularly in regions with a high prevalence of TB. Delay in TB diagnosis associated with the current TB diagnosis approach has an impact on both disease prognosis at the individual level and transmission within the community [8]. Therefore, there is a pressing need to identify and validate novel biomarkers for the diagnosis of active TB.

Proteomics, the study of the collective set of proteins expressed by a cell or an organism at any given time, is being used to find new protein biomarkers for the early detection, diagnosis, and treatment monitoring of TB [9]. The serum proteome is an attractive alternative for TB diagnosis because it is highly suitable for translation into lateral flow-based rapid test devices. Studies conducted to discover potential TB diagnostic protein biomarkers in serum and plasma have produced promising candidates, albeit with variable accuracy [10–16].
The TB target product profile set by the WHO prioritizes non-sputum biomarker tests with a minimum of 65% sensitivity/98% specificity for diagnosis tests and 90% sensitivity/70% specificity for triage tests [17]. These requirements are met by some proteomics studies [10–13, 16], and results provide proof of principle that a diagnostic approach based on a proteomic signature can be applied to TB. The documented variety of protein markers, however, highlights the need for continued searching for potential biomarkers from different geographical regions to create consistent and accurate data and to facilitate transforming protein markers into readily available TB diagnostic tools.

Mass spectrometry has been a commonly used platform for discovering TB diagnostic protein biomarkers [9]. However, it has low sensitivity, particularly in the detection of low-abundance proteins. In addition, it can be labor-intensive and has a relatively low throughput for biomarker discovery [18]. Affinity proteomics strategies such as Proximity Extension Assay (PEA) offer a powerful alternative for protein biomarker discovery because they combine high sensitivity, multiplexing capability, accuracy, and ease of use [19]. Explorations of PEA-based biomarker discovery have been increasing, particularly in areas where its high sensitivity and specificity for low-abundance proteins offer advantages over conventional mass spectrometry techniques. While PEA technology has been shown to successfully uncover biomarkers for various diseases [20], its application for identifying novel diagnostic biomarkers for TB has been comparatively limited. Given its established sensitivity, specificity, and robustness, PEA holds significant potential for TB diagnostic biomarker discovery. Therefore, in the present study, we used PEA to identify serum protein biomarkers that distinguish patients with active PTB from those with other respiratory diseases (ORD).
The study included patients with LTBI and those without LTBI to comprehensively evaluate the markers’ discriminatory power across various scenarios. The results reported here support the growing evidence that serum protein biomarkers can be used for TB diagnosis.

Materials and methods

Study population and setting

We analyzed a case-control series of serum samples collected from patients with active PTB and ORD from primary healthcare clinics in Addis Ababa, Ethiopia. Adults between the ages of 18 and 70 years who presented with respiratory symptoms compatible with TB, including cough for more than two weeks plus at least one of the following conditions: fever, weight loss, hemoptysis, and night sweats, were included in this study. Both active PTB patients and those with ORD were recruited prospectively between January 2017 and August 2018. Individuals who were already taking anti-TB drugs, had undergone immunosuppressive therapy, were known to have alcohol or drug abuse issues, or had a hemoglobin level below 10 g/dL were excluded from this study.

Data collection and sampling of clinical specimen

The demographic and clinical data of the study participants were collected using structured questionnaires (S1 Table). TB diagnosis was initially made by GeneXpert MTB/RIF (Xpert) at diagnostic clinics and then confirmed by culture at the Armauer Hansen Research Institute (AHRI) laboratory. HIV testing was performed after pretest counseling as a part of routine care at the TB clinic. Blood samples (6 mL) were collected directly into red-top tubes and allowed to clot at room temperature for serum separation. Simultaneously, 4 mL of blood was collected in a lithium heparin tube for the QuantiFERON-TB Gold Plus (QFT_Plus) assay. Sputum was collected by asking the participant to cough and collect expectorated sputum in a sterile 50 mL falcon tube.

Sample transport, processing, and storage

Samples arrived at the AHRI laboratory within 2 h of collection, with blood transported at ambient temperature and sputum kept on ice packs. Blood samples were centrifuged at 2000 x g for 10 min, aliquoted, and frozen at -80 °C for further analysis. Aliquots (25 µL) of serum were shipped on dry ice to the Science for Life Laboratory (SciLifeLab), Affinity Proteomics Unit, Stockholm, Sweden, for proteomics analysis. Serum samples underwent no more than two freeze-thaw cycles before analysis.

Whole blood collected in lithium-heparin tubes was aliquoted into QFT_Plus tubes (1 mL per tube). Following incubation for 16–24 h at 37 °C, the tubes were centrifuged at 2000 x g for 15 min. Supernatants were then harvested and stored at -80 °C until further analysis. IFN-gamma responses in the QFT supernatants were measured using the QFT_Plus ELISA kit following the manufacturer’s instructions.

Mycobacterial culturing was performed on sputum samples following the procedure indicated in the Mycobacteriology Laboratory Manual [21]. In brief, samples were decontaminated by the standard N-acetyl-L-cysteine and sodium hydroxide (NALC/NaOH) method with a final NaOH concentration of 1%. An equal volume of standard NALC/NaOH solution was added to the specimen and incubated for 15 min. After neutralization with PBS and 15 min centrifugation at 3,000 x g, the sediment was re-suspended in 1 mL of sterile PBS, inoculated on Löwenstein-Jensen (LJ) medium, and incubated at 37 °C. LJ slopes were examined weekly for up to eight weeks for any visible growth. Positive culture was confirmed by smear microscopy and Ziehl-Neelsen staining.

Classification of study participants

Participants in this study were classified as active PTB, ORD with LTBI+, and ORD with LTBI-.
Active PTB was defined based on positive sputum Xpert results with subsequent confirmation by culture; ORD with LTBI+ was defined as a positive QFT_Plus assay in the absence of clinical and microbiological evidence for active PTB; and ORD with LTBI- was defined as a negative QFT_Plus assay in the absence of clinical and microbiological evidence for active PTB. Participants with ORD were followed for 2 months to confirm that cultures had not become positive, and were re-classified accordingly when needed. Figure 1 depicts a flowchart of participant groups and analyses conducted in this study.

Fig. 1. Flowchart of study participant groups and analyses conducted in this study. The figure illustrates the classification of participant groups and the laboratory and data analyses performed in this study. TB: Tuberculosis; PTB: Pulmonary TB; ORD with LTBI+: Other Respiratory Diseases with Latent TB; ORD with LTBI-: Other Respiratory Diseases without Latent TB; DEPs: Differentially Expressed Proteins; GO: Gene Ontology; KEGG: Kyoto Encyclopedia of Genes and Genomes Pathway Enrichment; ROC: Receiver Operating Characteristic; PPI: Protein-Protein Interaction

Serum proteomic profiling

A total of 92 proteins were analyzed using the PEA inflammation panel from Olink Proteomics, Uppsala, Sweden (www.olink.com). Proteins included in the Olink inflammation panel are available at http://www.olink.com/inflammation and are also listed in S2 Table. PEA technology has been well described previously [19]. The assay was performed following the manufacturer’s instructions. Briefly, 1 µL of serum is mixed with a set of 92 pairs of antibodies that are linked to oligonucleotides. Upon binding to the target antigen, the DNA oligonucleotides hybridize with each other, forming a unique PCR target that is subsequently amplified and quantified by microfluidic real-time PCR.
The raw data are log2 transformed and finally returned as Normalized Protein eXpression (NPX) values. A high NPX value corresponds to a high protein concentration [19]. One unit increase in NPX denotes a two-fold increase in the concentration of the protein of interest. The limit of detection (LOD) was determined as three times the standard deviation above the average value of negative controls. Samples were randomized across the plates, and serum samples from active PTB and ORD were assayed on the same plate. Data were quality controlled and normalized using an internal extension control and an inter-plate control. Details about PEA technology, assay performance, and validation data are available at the manufacturer’s website (www.olink.com). Analyses were performed by Olink-certified lab personnel at SciLifeLab, Affinity Proteomics Unit, Stockholm, Sweden.

Statistical and bioinformatics analysis

Statistical analyses and graphing were performed using R version 4.3.2 and RStudio version 2023.12.1. Demographic and clinical data were compared using chi-square tests or Fisher’s exact tests as appropriate. NPX values were compared between the groups using a Welch two-sample t-test and ANOVA F-test where appropriate. Differentially expressed proteins (DEPs) were visualized with volcano plots, which were generated using ggplot2 functions integrated into the OlinkAnalyze R package. Venn diagrams were plotted using the VennDiagram package to depict DEP overlap across groups. To further explore the potential functions of DEPs, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were carried out using the Database for Annotation, Visualization and Integrated Discovery (DAVID) tool [22, 23]. p values were adjusted with the Benjamini-Hochberg method (False Discovery Rate, FDR), with significance set at p < 0.05.
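The NPX scale, the LOD rule, the Welch comparison, and the Benjamini-Hochberg adjustment described above can be sketched in a few lines. The following Python sketch is illustrative only: the study's analyses were run in R, and every function name and number below is hypothetical rather than taken from the study data or the Olink software. It assumes the sample standard deviation for the LOD, which the text does not specify, and it stops at the Welch t statistic and degrees of freedom without computing a p-value.

```python
import math
from statistics import mean, stdev, variance


def fold_change_from_npx(delta_npx):
    """NPX is on a log2 scale, so a difference of d NPX units is a 2**d fold change."""
    return 2.0 ** delta_npx


def limit_of_detection(negative_controls):
    """LOD as described in the text: mean of negative controls plus 3 standard deviations.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    return mean(negative_controls) + 3.0 * stdev(negative_controls)


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (no p-value)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For instance, a protein whose mean NPX is 3 units higher in one group corresponds to an 8-fold higher concentration under the log2 convention stated above.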
The glmnet package was used for Least Absolute Shrinkage and Selection Operator (LASSO) regression analysis with 10-fold cross-validation to select the value of lambda that gives the minimum cross-validated mean squared error. GraphPad Prism version 9.5.1 was used to depict the differences in serum protein levels among the groups. The protein-protein interaction (PPI) network analysis was performed using the STRING database (version 12.0) [24]. The pROC package was used for receiver operating characteristic (ROC) curve plotting and to assess the diagnostic efficacy of identified protein markers.

Results

Demographic and clinical characteristics of the study population

Thirty patients with active PTB and 58 patients with ORD were included in this study. All active PTB cases were bacteriologically confirmed. All patients with ORD tested negative for sputum-Xpert and sputum-culture. These patients responded to broad-spectrum antibiotics and were ultimately diagnosed and treated for ORD by their physicians. The ORD group had a range of diagnoses: 28 (48.3%) had bronchitis, 12 (20.7%) had pneumonia, 7 (12.1%) had upper respiratory tract infection (URTI), 1 (1.7%) had allergic rhinitis, 1 (1.7%) had allergic rhinitis with bronchitis, 1 (1.7%) had URTI with pneumonia, 1 (1.7%) had lung fibrosis, and 7 (12.1%) had undefined diagnoses. Of the fifty-eight patients with ORD, 29 (50%) tested positive on the QFT_Plus test and were classified as having ORD with LTBI+, while 29 (50%) tested negative on the QFT_Plus test and were classified as having ORD with LTBI-. Low BMI was significantly more common in patients with active PTB (46.7%) compared to those with ORD with LTBI+ (14.3%) or ORD with LTBI- (20.7%). Fever (p = 0.040), loss of appetite (p = 0.006), and weight loss (p < 0.001) were all significantly more prevalent in the active PTB group than in the control groups.
No differences were found in the number of individuals with BCG vaccination, previous history of TB, and HIV-positive status between the groups (p > 0.05). Detailed demographic and clinical characteristics of the study population are presented in Table 1.

Table 1. Demographic and clinical characteristics of the study participants

| Characteristic | All study participants (N = 88) | Patients with PTB (N = 30) | Patients with ORD with LTBI+ (N = 29) | Patients with ORD with LTBI- (N = 29) | p value |
|---|---|---|---|---|---|
| Age, mean (SD) | 35.8 (10.9) | 36.4 (10.8) | 35.2 (10.7) | 35.7 (10.9) | 0.609 |
| Sex | | | | | 0.857 |
| Female | 35 (39.8%) | 11 (36.7%) | 11 (37.9%) | 13 (44.8%) | |
| Male | 53 (60.2%) | 19 (63.3%) | 18 (62.1%) | 16 (55.2%) | |
| BMI | | | | | 0.009 |
| < 18.5 | 24 (27.6%) | 14 (46.7%) | 4 (14.3%) | 6 (20.7%) | |
| 18.5–24.9 | 49 (56.3%) | 14 (46.7%) | 15 (53.6%) | 20 (69.0%) | |
| 25–29.9 | 12 (12.8%) | 2 (6.7%) | 7 (25.0%) | 3 (10.3%) | |
| ≥ 30 | 2 (2.3%) | 0 (0.0%) | 2 (7.1%) | 0 (0.0%) | |
| BCG scar: present | 37 (42.0%) | 17 (56.7%) | 10 (34.5%) | 10 (34.5%) | 0.138 |
| Previous TB history: yes | 13 (14.8%) | 6 (20.0%) | 3 (23.1%) | 4 (13.8%) | 0.920 |
| QFT_result: positive | 55 (62.5%) | 26 (86.7%) | 29 (100%) | 0 (0.0%) | |
| Smoking history: yes | 15 (17.0%) | 10 (33.3%) | 3 (10.3%) | 2 (6.9%) | 0.013 |
| HIV: positive | 8 (9.1%) | 5 (16.7%) | 0 (0.0%) | 3 (10.3%) | 0.069 |
| TB suggestive symptoms | | | | | |
| Cough > 2 weeks | 55 (88.7%) | 27 (90.0%) | 25 (86.2%) | 26 (89.7%) | 0.919 |
| Fever | 18 (31.8%) | 14 (46.7%) | 8 (27.6%) | 6 (20.7%) | 0.040 |
| Night sweating | 41 (46.6%) | 15 (50.0%) | 11 (37.9%) | 15 (51.7%) | 0.577 |
| Chest pain | 49 (55.7%) | 17 (58.6%) | 14 (48.3%) | 18 (60.0%) | 0.640 |
| Loss of appetite | 44 (50.0%) | 22 (73.3%) | 10 (34.5%) | 12 (41.4%) | 0.006 |
| Weight loss | 31 (35.6%) | 20 (66.7%) | 8 (27.6%) | 3 (10.7%) | < 0.001 |
| Sputum_Culture: positive | 29 (33.0%) | 29 (96.7%) | 0 (0.0%) | 0 (0.0%) | |
| Sputum_Xpert: positive | 28 (31.8%) | 28 (93.3%) | 0 (0.0%) | 0 (0.0%) | |
| Sputum_Smear: positive | 23 (26.1%) | 23 (76.7%) | 0 (0.0%) | 0 (0.0%) | |
| Scanty | 1 (1.1%) | 1 (4.3%) | | | |
| 1+ | 5 (5.7%) | 5 (27.7%) | | | |
| 2+ | 10 (11.7%) | 10 (43.5%) | | | |
| 3+ | 7 (8.0%) | 7 (30.5%) | | | |

PTB: Pulmonary Tuberculosis; ORD with LTBI+: Other Respiratory Disease with Latent TB Infection; ORD with LTBI-: Other Respiratory Disease without latent TB Infection; SD: Standard Deviation; BMI: Body Mass Index; BCG: Bacille Calmette-Guérin; HIV: Human Immunodeficiency Virus; Scanty: 1–9 acid-fast bacilli (AFB) in 100 fields; 1+: 10–99 AFB in 100 fields; 2+: 1–10 AFB per field; 3+: >10 AFB per field. p-values were generated using chi-square tests or Fisher’s exact tests, as appropriate.

Identification of differentially expressed proteins

We employed the PEA to evaluate the expression levels of 92 inflammation-related proteins in the active PTB, ORD with LTBI+, and ORD with LTBI- groups. Proteins with more than 26% of samples with NPX values below LOD (n = 5) were excluded from further analysis. Three samples (two ORD with LTBI- and one PTB) were removed for failing the quality control check for all proteins analyzed. Following data pre-processing, we investigated the difference in serum levels of 87 inflammation-related proteins between active PTB and ORD in the presence and absence of LTBI. A total of 44 DEPs were identified between active PTB and ORD with LTBI+ (p < 0.05), of which 39 were upregulated and 5 were downregulated in the active PTB group (Fig. 2, S3 Table). Similarly, a comparison between active PTB patients and ORD with LTBI- revealed 52 DEPs (p < 0.05), with 50 proteins upregulated and 2 downregulated in the active PTB group (Fig. 3, S4 Table). In contrast, no DEPs were observed between ORD with LTBI+ and ORD with LTBI-. Subsequently, we investigated whether positive smear microscopy results influenced serum inflammation marker levels within the active PTB group. However, no proteins were differentially expressed between smear-positive and smear-negative cases.

Fig. 2. Differentially expressed proteins between active PTB and ORD with LTBI+.
The volcano plots display the difference in NPX values of serum protein level between active PTB and ORD with LTBI+. A horizontal dashed line indicates a p-value of less than 0.05. NPX: Normalized protein expression; PTB: Pulmonary Tuberculosis; ORD with LTBI+: Other Respiratory Diseases with Latent TB infection

Fig. 3. Differentially expressed proteins between active PTB and ORD with LTBI-. The volcano plots display the difference in NPX values of serum protein level between active PTB and ORD with LTBI-. A horizontal dashed line indicates a p-value of less than 0.05. NPX: Normalized protein expression; PTB: Pulmonary Tuberculosis; ORD with LTBI-: Other Respiratory Diseases without Latent TB infection

Functional analysis of differentially expressed proteins

Initially, to specifically identify proteins differentially expressed in PTB, intersection analysis was conducted on DEPs between active PTB versus ORD with LTBI+ and active PTB versus ORD with LTBI- following post hoc ANOVA analysis. This analysis revealed 37 proteins with significantly different serum protein levels in the active PTB group (Fig. 4). To understand the functional roles of these 37 DEPs, we conducted GO and KEGG pathway enrichment analyses. We limited the reporting to the top 15 statistically significant terms for each GO category (Biological Process [BP], Molecular Function [MF], and Cellular Component [CC]), and similarly, for KEGG pathway enrichment analysis, we presented the top 15 statistically significant pathways. The most significant GO terms for BP included inflammatory response, immune response, and neutrophil chemotaxis. The most significant MF terms were cytokine activity, chemokine activity, and growth factor activity. Additionally, prominent CC terms included extracellular region, extracellular space, and external side of the plasma membrane (Fig. 5A).
The KEGG pathway enrichment analysis revealed that the identified DEPs are involved in various signaling pathways. Notably, key TB-related pathways like Cytokine-cytokine receptor interaction, Toll-like receptor signaling pathway, and Chemokine signaling pathway were found among the top 15 significant hits. However, the analysis also revealed pathways not directly linked to TB, such as Malaria and Rheumatoid Arthritis, within the top hits (Fig. 5B).

Fig. 4. A Venn diagram illustrating overlapping differentially expressed proteins between the groups. The diagram displays the number of proteins that are differentially expressed in the PTB group after intersection analysis for protein expression between PTB vs. ORD with LTBI+ and PTB vs. ORD with LTBI-. PTB: Pulmonary Tuberculosis; ORD with LTBI+: Other Respiratory Diseases with Latent TB infection; ORD with LTBI-: Other Respiratory Diseases without Latent TB infection

Fig. 5. Functional enrichment analysis of differentially expressed proteins. (A) Gene Ontology (GO) analysis. The top 15 significant terms are shown for biological process (BP), molecular function (MF), and cellular component (CC). (B) Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. The top 15 significant pathways are shown. Both GO and KEGG analyses were performed using the background of all annotated genes

Candidate protein diagnostic marker identification

To discover a set of potential protein biomarkers that can distinguish active PTB from ORD, in the presence and absence of LTBI, LASSO regression analysis was applied to the 37 identified DEPs.
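As a rough illustration of how a LASSO penalty performs variable selection, the sketch below fits a penalized linear model by cyclic coordinate descent with soft-thresholding. It is a teaching example under stated assumptions, not the study's analysis: the authors used the glmnet R package with 10-fold cross-validation, whereas this pure-Python version uses a fixed penalty, a continuous toy outcome, and made-up data. The function names and the values of `lam` are hypothetical.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/(2n)) * sum((y - X b)^2) + lam * sum(|b|) by cyclic coordinate descent.

    Assumes y and every column of X are already centred (no intercept is fitted)
    and that no column of X is all zeros.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # inner product of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (norm / n)
    return beta


# Toy data (hypothetical): the outcome depends only on the first feature.
X = [[-1.5, 1.0], [-0.5, -1.0], [0.5, -1.0], [1.5, 1.0]]
y = [2.0 * row[0] for row in X]

selected = lasso_coordinate_descent(X, y, lam=0.5)  # first coefficient shrunk, second exactly 0
all_zero = lasso_coordinate_descent(X, y, lam=3.0)  # penalty large enough to zero everything
```

The uninformative second feature receives a coefficient of exactly zero, which is the property that lets LASSO reduce a larger panel of proteins to a short candidate list. In practice the penalty strength would be chosen by cross-validation rather than fixed by hand.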
The LASSO analysis identified eight promising candidates: interferon-gamma (IFN-gamma), leukemia inhibitory factor (LIF), urokinase-type plasminogen activator (uPA), colony-stimulating factor 1 (CSF-1), stem cell factor (SCF), sirtuin 2 (SIRT2), eukaryotic translation initiation factor 4E-binding protein 1 (4E-BP1), and glial cell line-derived neurotrophic factor (GDNF). Seven of these proteins were upregulated in the PTB group, while one was downregulated (Fig. 6). To understand the potential interactions within the 37 DEPs and 8 candidate biomarkers, we performed a PPI network analysis. Fig. 7A presents the PPI network of the 37 DEPs, while Fig. 7B illustrates the network among the 8 candidate proteins. Of the 8 candidate proteins, 6 show interactions among themselves, whereas 2 do not interact directly with each other or the others.

Fig. 6. Distribution of protein expression levels for candidate biomarkers across groups. Box plot illustrating the difference in serum protein levels for the eight candidate protein biomarkers across the three groups. PTB: Pulmonary Tuberculosis; ORD with LTBI+: Other Respiratory Diseases with Latent TB infection; ORD with LTBI-: Other Respiratory Diseases without Latent TB infection; * p < 0.01; ** p < 0.001; *** p < 0.0001

Fig. 7. Protein-Protein Interaction (PPI) Network: (A) PPI network of the 37 differentially expressed proteins; (B) PPI network highlighting the eight candidate protein biomarkers. In the PPI network, black lines indicate co-expression, yellow lines represent automated text mining, green lines indicate known interactions from curated databases, purple lines represent experimentally determined known interactions, and light violet lines indicate protein homology.
The figure was created using the STRING database.

We conducted ROC analysis using logistic regression to evaluate the diagnostic efficacy of these proteins in identifying active PTB. The ROC curve was constructed based on two distinct comparisons: active PTB versus ORD with LTBI+ and active PTB versus ORD with LTBI-. The combined AUC was 0.943 (95% CI: 0.826–0.991) for PTB versus ORD with LTBI+ (Fig. 8A), with a sensitivity of 86% (95% CI: 68–96%) and a specificity of 97% (95% CI: 82–100%) at a probability cut-off of 0.5 or greater. For PTB versus ORD with LTBI- (Fig. 8C), the AUC was 0.927 (95% CI: 0.774–0.971), with a sensitivity of 86% (95% CI: 68–96%) and a specificity of 89% (95% CI: 72–98%) at a probability cut-off of 0.5 or greater. IFN-gamma emerged as the most accurate single marker for distinguishing PTB versus ORD with LTBI+, with an AUC of 0.908 (Fig. 8B), while uPA exhibited the highest performance (AUC 0.879) for PTB versus ORD with LTBI- (Fig. 8D).

Fig. 8. Receiver operating characteristic (ROC) curve analysis: (A) Diagnostic performance of the combined set of 8 protein biomarkers to distinguish active pulmonary tuberculosis (PTB) from other respiratory diseases in patients with latent tuberculosis infection (ORD with LTBI+). The area under the curve (AUC) is indicated; (B) Diagnostic performance of each protein marker in distinguishing active PTB from ORD with LTBI+. Each curve has its corresponding AUC value; (C) Diagnostic performance of the combined set of 8 protein biomarkers to distinguish active PTB from ORD in patients without LTBI (ORD with LTBI-). The AUC is indicated; (D) Diagnostic performance of each protein marker in distinguishing active PTB from ORD with LTBI-. Each curve has its corresponding AUC value

Discussion

The identification of reliable non-sputum biomarkers is crucial for the early and accurate diagnosis of TB.
In this study, we used PEA to uncover potential diagnostic protein biomarkers in serum that could identify active PTB in a high TB burden setting. Our results support the growing body of evidence suggesting that serum protein profiling can be a valuable approach for diagnosing TB. The use of PEA enhances our ability to detect and interpret subtle variations in low-abundance proteins, which conventional proteomic approaches such as mass spectrometry may overlook [19, 25]. Such proteins could prove pivotal in identifying distinctive protein markers for PTB diagnosis.

Of the 92 inflammation-related proteins analyzed in this study, a total of 37 proteins were differentially expressed in the PTB group compared to both the ORD with LTBI+ and ORD with LTBI- groups. Gene ontology and KEGG analysis of these DEPs shed light on the underlying biological processes and inflammatory pathways potentially associated with PTB. KEGG pathway analysis showed a diverse array of pathways associated with our DEPs, with key TB-associated pathways such as cytokine-cytokine receptor interaction, Toll-like receptor signaling pathway, and chemokine signaling pathway ranking among the top 15 significant hits. These findings align with the established involvement of these pathways in M. tuberculosis recognition, orchestration of immune responses, and granuloma formation [26–29]. Nonetheless, pathways not directly linked to TB, but to other diseases such as malaria and rheumatoid arthritis, were also included among the top 15 significant hits. This observation may stem from shared molecular mechanisms in host responses to different pathogens or the multifaceted effects of DEPs in various disease scenarios.

Relying on individual circulating proteins for TB diagnosis through traditional serological methods has proven unreliable and imprecise.
This approach frequently yields high rates of both false-positive and false-negative results, prompting the WHO to advise against its use [30]. As a means to enhance the sensitivity and specificity of TB diagnosis, analyzing combinations of multiple protein biomarkers, rather than single ones, is proposed [31]. In the present study, we examined 92 host serum protein markers and identified eight promising candidates: IFN-gamma, LIF, uPA, CSF-1, SCF, SIRT2, 4E-BP1, and GDNF. This combined set of proteins effectively distinguishes PTB from ORD with LTBI+ with an AUC of 0.943, a sensitivity of 86%, and a specificity of 97% at a probability cut-off of 0.5 or higher, and PTB from ORD with LTBI- with an AUC of 0.927, a sensitivity of 86%, and a specificity of 89% at a probability cut-off of 0.5 or higher.

IFN-gamma is a widely investigated TB biomarker, with its level increasing during active TB [32, 33]. IFN-gamma serves as the primary cytokine in mediating protection against M. tuberculosis by stimulating macrophages and enhancing their oxidative defense mechanisms. Macrophages activated by IFN-gamma demonstrate the capability to overcome the blockage of phagolysosomes through apoptosis, thus thwarting the survival of pathogenic mycobacteria [34].

A previous study documented the regulation of LIF and SCF during TB infection, reporting significantly elevated expression levels in plasma obtained from stimulated whole blood cells in both active TB and LTBI groups compared to non-TB controls [35]. Notably, our study using unstimulated serum showed an upregulation of LIF but a downregulation of SCF in the active PTB group. This discrepancy might be due to the difference in sample type (stimulated whole blood vs. unstimulated serum) and the potential impact of stimulation on protein expression.
LIF, a pleiotropic cytokine belonging to the IL-6 superfamily, exhibits diverse functions in a highly context-dependent manner, significantly impacting various physiological and pathological processes across different cell types and tissues [[120]36]. A mouse model study illustrated that elevated levels of LIF protein expression occur during respiratory syncytial viral infection, suggesting a potential protective role [[121]37]. Additionally, in vitro experiments have shown increased LIF expression following cell sensitization with M. tuberculosis in H37RV [[122]38]. SCF, also known as Kit Ligand (KITLG), regulates various cellular functions including cell proliferation, differentiation, and apoptosis [[123]39]. Prior research has highlighted the necessity of down-regulating SCF production for the survival of mast cells within tissues [[124]40]. The down-regulation of SCF observed in active PTB patients in our study may have implications for the function of tissue-resident mast cells. Despite extensive exploration of immune cell involvement in TB infection, mast cells have received comparatively less attention. However, these cells, predominantly found in the lungs and other peripheral tissues, possess the capacity to directly detect M. tuberculosis in antigens, suggesting a potential role in shaping the host’s immune response [[125]41, [126]42]. Animal study has demonstrated a substantial increase in lung mast cells following BCG inoculation [[127]43] and similarly, a significant increase in mast cells has been reported in TB lymphadenitis [[128]44]. This suggests that alterations in SCF levels could potentially influence mast cell dynamics and, consequently, impact the host’s immune response to TB infection. The upregulation of uPA observed in active PTB patients in our study is consistent with previous reports showing increased uPA and its receptor (uPAR) levels in active TB patients [[129]45–[130]47]. Furthermore, infection with M. 
tuberculosis has been found to induce high levels of uPA expression in the lungs of infected animals [[131]48]. This evidence suggests that uPA may play a role in TB pathogenesis, potentially by impacting inflammation and immune response through established functions in the fibrinolytic pathway, immune cell migration, and adhesion [[132]49, [133]50]. CSF-1, also upregulated in the active PTB group in our study, is recognized for its critical role in the differentiation, proliferation, and survival of macrophages [[134]51], which are the primary cells involved in engulfing and eliminating M. tuberculosis. However, the role of CSF-1 in TB infection appears multifaceted, potentially serving as a double-edged sword. On the one hand, CSF-1 may promote the adaptive immune response by increasing the expression of CCR7 and MHC class II molecules [[135]52]. On the other hand, it has been linked to the generation of M2-polarized macrophages, which exacerbate M. tuberculosis infection [[136]53]. Our study revealed a significant upregulation of both SIRT2 and 4E-BP1 in PTB patients. SIRT2 is a member of the sirtuin family of proteins, known for its role in regulating various cellular processes such as energy metabolism, cell aging, and genomic stability [[137]54]. While prior research has suggested a protective role of SIRT2 against certain bacterial infections, such as Shigella infection [[138]55], it can also facilitate bacterial infection, as evidenced in Listeria infection [[139]56]. Notably, SIRT2 inhibition has been linked to reduced M. tuberculosis burden, suggesting a possible role in promoting M. tuberculosis persistence within the host [[140]57]. Similarly, the upregulation of 4E-BP1 in a cell signifies activated mTOR signaling, a pathway known to promote M2 macrophage polarization and hinder the ability of macrophages to eliminate bacteria [[141]58]. Inhibition of SIRT2 and mTOR has emerged as a potential host-directed treatment for TB [[142]59, [143]60]. 
This underscores a distinct advantage of protein biomarker detection in TB: it can serve not only diagnosis but also the identification of potential therapeutic targets. It is noteworthy that in our PPI analysis, SIRT2 and 4E-BP1 do not interact with each other or with the other six candidate markers reported in this study. However, these proteins hold individual significance in TB diagnosis, reflecting diverse pathways or mechanisms underlying TB pathogenesis. GDNF protein, primarily known for its role in maintaining and promoting neuronal survival, also plays a significant role in inflammation and infection [[144]61]. GDNF was upregulated in the PTB group in this study. Previous studies have reported dysregulation of GDNF in various disease contexts, including asthma, S. pneumoniae infection, and hepatitis [[145]62–[146]64]. However, the specific role of GDNF in TB remains unexplored. Therefore, further research is required to fully understand the impact of neuroimmune crosstalk in TB disease, potentially shedding light on the diagnostic significance of GDNF. A notable disparity exists between the potential diagnostic markers identified in our study and those previously reported by other researchers. Chegou et al. utilized Luminex to identify a seven-marker set consisting of C-reactive protein (CRP), transthyretin (TTR), IFN-gamma, complement factor H, apolipoprotein-A1 (APOA1), IP-10, and serum amyloid A (SAA) with a sensitivity of 93.8% and specificity of 73.3%, meeting the WHO target product profile (TPP) criteria for a triage test [[147]10]. On the other hand, Jacobs et al. utilized a similar platform and identified a combination of six markers, including neural cell adhesion molecule, serum amyloid P, IL-1β, sCD40L, IL-13, and APOA1, which exhibited a sensitivity of 100% and specificity of 89.3%, with 100% accuracy [[148]12]. De Groote et al. 
identified a six-marker signature, namely SYWC, kallistatin, complement C9, gelsolin, testican-2, and aldolase, capable of discriminating TB from non-TB patients with 90% sensitivity and 80% specificity, utilizing the SOMAscan proteomics platform [[149]11]. Garay-Baquero et al. used mass spectrometry and identified five protein markers, FHR5, LRG1, CRP, LBP, and SAA1, which distinguished TB from ORD with an AUC of 0.81 [[150]15]. Schiff et al. used mass spectrometry for discovery and PEA for validation and reported six protein markers, FCGR3B, FETUB, LRG1, ADA2, CD14, and SELL, with an AUC of 0.972 for differentiating TB from healthy controls and an AUC of 0.930 for differentiating TB from ORD [[151]16]. Mousavian et al. used PEA and reported 12 plasma protein markers (IFN-gamma, IL-6, CDCP1, CXCL9, MMP-1, MCP-3, CCL19, CD40, VEGFA, IL-7, IL-12B, and PD-L1), which were highly enriched in PTB and extrapulmonary TB, and were further associated with disease severity [[152]65]. Clearly, there is variability in the reported candidate protein diagnostic biomarkers. This disparity can likely be attributed to methodological differences between studies, including the proteomics platforms used, the sample type (serum or plasma), and the types of control groups included. The ideal control group for a PTB diagnostic biomarker discovery study would include individuals with ORD, both in the presence and absence of LTBI. However, some of the studies discussed above excluded patients with LTBI or did not clearly report the LTBI status of the control group [[153]10–[154]12, [155]16]. As a result, potential confounding factors may not be captured, which raises questions about the specificity of the markers for use in high TB endemic areas where there is a high background presence of both ORD and LTBI. There is also a noticeable difference between hypothesis-driven and unbiased proteomics approaches. 
Targeted platforms often quantify low-abundance proteins, while untargeted methods more readily identify abundant ones [[156]9, [157]25]. Furthermore, a study has shown variations in protein profiles when analyzing the same samples using different platforms [[158]25]. However, targeted and unbiased proteomics approaches offer complementary strengths, and leveraging both in future biomarker discovery efforts holds significant promise for a more comprehensive approach [[159]9, [160]25]. In addition to methodological factors, genetic heterogeneity in both the host and M. tuberculosis strains can contribute to the observed variations in reported candidate protein biomarkers [[161]66]. A key strength of this study is the enrollment of both PTB cases and control participants from the same TB-endemic areas. More importantly, the control participants in this study were patients with ORD, which enabled us to specifically identify candidate protein markers for TB diagnosis in high TB transmission areas, where distinguishing TB from ORD cases is challenging. However, the study has some limitations. First, the sample size is relatively small. While this constraint did not hinder the identification of statistically significant differences in serum protein levels between active PTB cases and ORD in our exploratory analysis, it underscores the necessity for further validation of the identified candidate protein markers in a larger study. Second, the study focused solely on adult, bacteriologically confirmed PTB patients. The utility of the identified protein markers in more challenging diagnostic scenarios, such as pediatric TB and extrapulmonary TB, needs further evaluation. Third, since we only included baseline samples collected before treatment, our analysis focused solely on the diagnostic potential of the identified markers. Exploring their value as therapeutic targets would be a valuable direction for future research. 
Conclusion

In conclusion, our study has identified eight host protein biomarkers, some of which have not been previously investigated in the context of TB, that effectively distinguish active PTB from ORD regardless of LTBI status. The diagnostic performance exhibited by these protein markers shows encouraging accuracy for blood-based TB diagnosis. Further large-scale validation and translation of these protein markers into a user-friendly and affordable point-of-care test hold the potential to significantly enhance TB control in high-burden regions.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1: Questionnaire

Supplementary Material 2 (12879_2024_10224_MOESM2_ESM.docx): List of proteins included in the Olink Target 96 Inflammation panel. The table includes the names of all 92 proteins analyzed and their corresponding UniProt IDs.

Supplementary Material 3 (12879_2024_10224_MOESM3_ESM.xlsx): Differentially expressed proteins between PTB and ORD with LTBI+. The table includes a list of proteins exhibiting significant differences in serum protein levels between PTB and ORD with LTBI+, along with the corresponding test statistics and NPX values.

Supplementary Material 4 (12879_2024_10224_MOESM4_ESM.xlsx): Differentially expressed proteins between PTB and ORD with LTBI-. The table includes a list of proteins exhibiting significant differences in serum protein levels between PTB and ORD with LTBI-, along with the corresponding test statistics and NPX values.

Acknowledgements