Abstract

Accurate early diagnosis is essential for preventing disease and improving cure and survival rates, yet reliable early-diagnosis biomarkers are lacking for most major diseases. Here, esophageal squamous cell carcinoma (ESCC) is used as a disease model to develop a platform that detects a panel of proteomic biomarkers for accurate early diagnosis by integrating a barcode immunoassay biochip with machine learning. The biochip captures small extracellular vesicles (EVs) from serum, lyses them in situ, and quantifies multiple proteins, including both membrane and internal EV proteins. It is used to test 273 clinical samples from multiple centers, and the resulting data are analyzed with machine learning to build a precise diagnostic model for ESCC. This model, based on nine protein biomarkers selected from differentially expressed proteins identified by mass spectrometry, achieves an accuracy of 91.0% in external validation and 90.8% in detecting early-stage ESCC. These results substantially exceed the detection rate (only 14.4%) of the squamous cell carcinoma marker currently used in the clinic. Thus, integrating EV protein analysis with machine learning can identify ESCC patients, and the developed EV analysis platform offers a promising tool for the clinical application of multi-biomarker detection and for advancing the early diagnosis of ESCC.

Keywords: early diagnosis, machine learning, microfluidic platform, small extracellular vesicle proteins

This study develops a diagnostic platform for early detection of esophageal squamous cell carcinoma (ESCC) by integrating a barcode immunoassay biochip with machine learning. The biochip captures and analyzes small extracellular vesicle proteins from serum samples, identifying nine biomarkers linked to ESCC.
Testing of 273 clinical samples yields 91% accuracy in external validation, substantially outperforming the currently used serum marker SCC.

1. Introduction

Malignant tumors are among the leading causes of death worldwide. Esophageal cancer (EC), for instance, is the seventh most common malignant tumor globally; squamous cell carcinoma accounts for 90% of cases, and the disease has a high mortality rate that severely impairs patients' quality of life.[1,2] Although traditional screening methods such as endoscopy reduce the incidence of EC, they are invasive and costly and require skilled operators.[3–5] In addition, tissue biopsies are limited by the difficulty of obtaining samples and are subject to sampling bias owing to the temporal and spatial heterogeneity of tumors.[6] Less invasive methods are therefore needed to screen a larger portion of the population.[7] Liquid biopsies are minimally invasive and safe, and they can overcome the difficulties associated with intratumor heterogeneity.[8,9] Cell-free DNA (cfDNA) and free proteins are the most commonly used liquid biopsy markers; however, circulating cfDNA and free proteins have less than 20% sensitivity for early-stage EC.[10–12]

Small extracellular vesicles, a subset of extracellular vesicles (EVs) smaller than 200 nm, are ubiquitous in the interstitial space of tissues and in body fluids, and they carry molecular fingerprints indicative of their cellular origin.[13,14] Compared with other liquid biopsy markers such as circulating tumor cells, small EVs are more abundant in blood, carry more diverse information, and are more stable owing to their intact membrane.[15] In various cancer types, including colorectal cancer (CRC),[16] pancreatic ductal adenocarcinoma,[17,18] prostate cancer,[19] and ovarian cancer,[20] RNA and proteins carried by small EVs can serve as promising biomarkers, providing reliable information for tumor diagnosis and prognosis. Small EV membrane proteins are primarily involved in EV formation, intercellular communication, and targeting specificity.[21] In contrast, most proteins inside small EVs reflect the physiological or pathological state of the parent cell. The internal proteins of small EVs therefore mirror intracellular biological processes and signaling mechanisms, offering insight into the role of small EVs in intercellular information transfer.[22] A comprehensive analysis of both membrane and internal small EV proteins can thus yield more holistic and precise biological data, which is crucial for evaluating the overall physiological state of the organism. Mass spectrometry can profile both membrane and internal small EV proteins and can identify thousands of proteins. However, mass spectrometry takes days per analysis, is costly, and requires large sample volumes, which limits its use in large-scale clinical screening. Combining mass spectrometry with other detection techniques is promising for clinical applications: mass spectrometry provides a list of differentially expressed proteins, and a second technique then performs fast, economical, and sensitive detection in small volumes of clinical samples.
Biochips have gained recognition for their detection speed, low cost, and low sample consumption.[23,24] Current biochip-based platforms for small EV capture demonstrate high specificity and sensitivity.[25–29] However, existing biochip platforms for small EV marker detection focus primarily on surface proteins, and a rapid platform for the comprehensive detection of both membrane and internal small EV proteins is lacking. Here, using esophageal squamous cell carcinoma (ESCC) as a model, we developed a small EV protein detection platform that integrates mass spectrometry and a barcode immunoassay biochip to investigate both membrane and internal proteins and to identify, from the serum small EV proteome, protein markers with diagnostic potential (i.e., proteins differentially expressed between cancer patients and healthy controls). Mass spectrometry identified over 2000 proteins and provided candidate protein markers for early diagnosis. The biochip captures small EVs from serum, lyses them in situ, and quantitatively analyzes multiple candidate markers identified by mass spectrometry in clinical samples from two cohorts. A diagnostic model for ESCC is then developed with machine learning algorithms and validated in an independent clinical cohort. This platform enables high-throughput, rapid analysis of multiple samples from multiple centers, facilitating the translation of multi-biomarker assays into clinical applications.

2. Results

2.1. Design of Small EV Protein Analysis for Early Diagnosis

The general workflow of this study, including details of participant recruitment for each analysis, is presented in Figure 1.
In particular, we employed a liquid chromatography–mass spectrometry (LC-MS)-based 4D data-independent acquisition (4D-DIA) method on serum small EVs from 12 ESCC patients and 18 healthy controls (HC) to obtain proteomic data. Over 2000 small EV proteins were identified, and 14 candidate markers were screened according to their differential expression and bioinformatic significance. The expression of these markers was then compared between 66 ESCC patients and 80 HC in Cohort 1. Machine learning was used to explore the correlation between protein profiles and clinical phenotypes: a diagnostic model for ESCC was developed, 9 of the 14 candidate markers were selected to build the 9-DM model, and its performance in distinguishing ESCC patients from HC was evaluated. Finally, an external test set of 67 samples (47 ESCC patients and 20 healthy individuals) was used to verify the robustness of the machine learning model.

Figure 1. Schematic diagram of the small EV protein panel for ESCC diagnosis. Abbreviations. DM: diagnostic markers; ESCC: esophageal squamous cell carcinoma; EVs: extracellular vesicles; HC: healthy controls; PC: principal component.

2.2. Screening of Candidate Small EV Protein Markers

To characterize the protein profile of serum small EVs in ESCC, we performed 4D-DIA mass spectrometry on small EVs isolated from the serum of 12 patients with ESCC and 18 HC (Figure 2A). In total, 2062 proteins were identified in serum small EVs from ESCC patients and 2370 in those from HC, of which 2053 were common to both groups (Figure 2B). Principal component analysis (PCA) clearly separated ESCC samples from HC, suggesting that the serum small EV proteome is substantially remodeled in ESCC (Figure 2C).

Figure 2. Proteomic landscape for ESCC and diagnostic relevance of functional protein modules.
A) Flow chart of serum small EV mass spectrometry. B) Venn diagram showing the overlap between proteins identified in ESCC and HC serum small EVs. C) PCA plot of analyzable proteins in ESCC and HC serum small EVs. Each dot represents a sample (blue, HC; red, ESCC). Percentages indicate the explained variance. D) Volcano plot of differentially abundant proteins between ESCC and HC. Each dot represents a protein (red, significantly upregulated in ESCC; blue, significantly downregulated in ESCC). Proteins are considered upregulated in tumors if adjusted P < 0.05 and log2FC > 0.58 (FC > 1.5), downregulated if adjusted P < 0.05 and log2FC < −0.58, and otherwise not significant. P values were calculated with the R package “limma” and adjusted with the Benjamini–Hochberg method; two-sided P values are reported. E) Bubble plots showing GO and F) KEGG pathway enrichment for the ESCC and HC groups. Adjusted P < 0.05 is considered statistically significant. P values were calculated with KOBAS and adjusted with the Benjamini–Hochberg method; two-sided P values are reported. Abbreviations. ESCC: esophageal squamous cell carcinoma; HC: healthy controls; PC: principal component; PCA: principal component analysis; Not sig: not significant.

Differences in the common serum small EV proteins between ESCC and HC were further analyzed using a fold change (FC) threshold of ≥1.5 and a significance level of adjusted P < 0.05. This analysis revealed 203 proteins significantly upregulated in the serum small EVs of ESCC patients compared with HC (Figure 2D). Gene Ontology (GO) analysis of the differential proteins showed significant enrichment in the terms “Extracellular exosome,” “Protein binding,” and “Neutrophil degranulation” (Figure 2E).
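The volcano-plot thresholds described above can be made concrete with a short sketch. It assumes the raw two-sided P values have already been computed upstream (the study used the R package limma, whose moderated test this sketch does not reproduce) and shows only the Benjamini–Hochberg adjustment and the up/down/not-significant labeling rule from the Figure 2 legend. The function names and example values are hypothetical.

```python
# Sketch: Benjamini-Hochberg adjustment plus the volcano-plot labeling rule.
# Raw P values are assumed to come from an upstream test (e.g., limma in R).

def benjamini_hochberg(pvalues):
    """Return BH-adjusted P values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest P value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify(log2fc, p_adj, fc_cut=0.58, alpha=0.05):
    """Label a protein; fc_cut = 0.58 is approximately log2(1.5)."""
    if p_adj < alpha and log2fc > fc_cut:
        return "up"
    if p_adj < alpha and log2fc < -fc_cut:
        return "down"
    return "not sig"


# Hypothetical example values (not data from the study).
pvals = [0.001, 0.004, 0.03, 0.20]
log2fc = [1.2, -0.9, 0.3, 2.0]
padj = benjamini_hochberg(pvals)
labels = [classify(fc, p) for fc, p in zip(log2fc, padj)]
print(labels)  # ['up', 'down', 'not sig', 'not sig']
```

Note that the third protein passes the adjusted-P filter but is still "not sig" because its fold change is below 1.5; both conditions must hold.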
Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis further indicated significant upregulation of the “Ribosome,” “Chemokine signaling pathway,” and “Endocytosis” pathways (Figure 2F). These findings suggest that multiple functional and metabolic pathways are activated in the serum small EVs of patients with ESCC. Based on the proteomic differences between ESCC patients and HC, together with the top two GO terms, 14 small EV proteins significantly elevated in ESCC were selected as candidate markers for subsequent diagnostic analysis (Figure 2G; Figures S1 and S2, Supporting Information).

2.3. Barcode Immunoassay Chip for Serum Small EV Protein Screening

Efficient capture and detection of small EVs are crucial for applying small EV biomarkers. We developed a barcode microfluidic platform that integrates small EV capture, in situ lysis, and EV protein detection to comprehensively analyze proteins both on the surface and inside small EVs, thereby improving the sensitivity of marker detection. Graphene oxide quantum dots (GOQDs) have demonstrated a strong capability to bind proteins and nucleic acids.[23,30,31] We therefore fabricated a polydimethylsiloxane (PDMS) chip with 60 reaction units and aligned it with a GOQD self-assembled glass substrate.[31] CD63, CD81, and CD9 are tetraspanins commonly used as small EV markers, and they label overlapping but distinct populations of small EVs.[22,24–26] Accordingly, a mixture of anti-CD63, anti-CD81, and anti-CD9 antibodies was immobilized on the GOQD substrate in each reaction unit to improve the capture efficiency of small EVs (Figure 3A). Serum samples were then loaded into each reaction unit, and small EVs were captured on the chip.
Transmission electron microscopy (TEM) and nanoparticle tracking analysis (NTA) of the small EVs are shown in Figure 3B,C and Figure S3 (Supporting Information); most small EVs were smaller than 200 nm. The capture efficiency of the mixed CD63, CD81, and CD9 antibodies was compared with that of each individual capture antibody, and the mixture captured the largest number of small EVs from both cell line supernatant and serum (Figure S4A,B, Supporting Information); the mixed antibodies were therefore used for small EV capture.

Figure 3. Small EV analysis platform. A) Schematic of the chip for small EV capture and in situ lysis. Capture antibodies against transmembrane proteins of small EVs are immobilized on a self-assembled GOQD substrate to facilitate small EV capture, followed by in situ lysis of the captured small EVs. The chip enables simultaneous detection of 60 samples, each requiring 10 µL of serum. B) Representative transmission electron microscopy images of isolated small EVs showing a lipid bilayer and cup-shaped structure. Scale bar: 100 nm. C) Particle size distribution of small EVs measured by nanoparticle tracking analysis. D) Fluorescence scanning images demonstrating the chip's capture capacity at concentrations ranging from 10^8 to 10^10 particles mL^−1. Captured small EVs are detected using APC-conjugated CD63 antibodies, and scanning is performed with a fluorescence scanner using a 635 nm laser source. Capture efficiency = [Initial fluorescence intensity/(Initial fluorescence intensity + Recovered fluorescence intensity)] × 100%. E) Fluorescence scanning images depicting the lysis of small EVs at a concentration of 10^9 particles mL^−1. F) Schematic of the barcode immunoassay chip, constructed from GOQD-assembled glass slides and a 60-well PDMS microchamber layer.
Antibodies for target proteins are micropatterned in a specified order to specifically capture target proteins from small EV lysates. After incubation with APC-conjugated detection antibodies, fluorescence scanning is performed. G) Bar chart of the detection limits of the 14 proteins. LOD = 3δ/S, where δ is the standard deviation of the blank signal and S is the slope of the linear regression equation. H) Scatter plot of the stability of fluorescence readings for the 14 proteins at a concentration of 1 ng mL^−1; reading spots were randomly selected along the barcode. Abbreviations. Ab: antibody; GOQD: graphene oxide quantum dots; LOD: limit of detection.

To evaluate capture efficiency, small EVs at a series of concentrations were introduced into the reaction units. The captured small EVs were labeled with an APC-conjugated antibody, and the fluorescence signal increased progressively from 10^5 to 10^10 particles mL^−1 (Figure S4C, Supporting Information). Capture efficiency was calculated by comparing the fluorescence of captured small EVs with that of small EVs remaining in the residual liquid; it reached 91.9% at 10^10 particles mL^−1, 93.5% at 10^9 particles mL^−1, and 97.7% at 10^8 particles mL^−1 (Figure 3D). Across three independent experimental replicates, the platform consistently achieved capture efficiencies exceeding 90% at concentrations from 10^8 to 10^9 particles mL^−1 (Figure S5, Supporting Information). Because the typical small EV concentration in healthy humans is ≈10^9 particles mL^−1, cell line-derived small EVs at 10^9 particles mL^−1 were used to optimize the incubation time. The amount of captured small EVs peaked after 60 min of incubation at room temperature, and longer incubation did not significantly increase the fluorescence signal (Figure S4D, Supporting Information).
The optimal incubation time for small EVs was therefore set to 60 min. After capture, small EVs were lysed in situ, and the amount of small EVs remaining on the chip was evaluated at different lysis times. Radioimmunoprecipitation assay (RIPA) buffer with protease inhibitors was added to reaction units containing small EVs labeled with APC-conjugated CD63 antibody, and the decrease in fluorescence signal reflected the lysis process (Figure 3E). The fluorescence intensity of small EVs gradually weakened over time and converged to background level after 30 min of lysis (Figure S4E, Supporting Information); a lysis time of 30 min was therefore used. We then constructed an antibody barcode chip by microprinting capture antibodies for the target proteins onto the GOQD substrate. First, a PDMS chip with 14 parallel microchannels was fabricated by soft photolithography and aligned with a GOQD self-assembled glass substrate,[27] enabling simultaneous printing of a barcode of 14 capture antibodies for the corresponding small EV proteins (Table S1, Supporting Information). After the capture antibody barcode was printed on the GOQD substrate, the microchannel PDMS chip was removed, and a PDMS sample-loading chip was aligned with the barcode chip (Figure 3F). The sample-loading chip has 60 reaction units, each covering a complete barcode array, which enables simultaneous detection of 14 proteins in each of 60 samples. The lysates were transferred to the antibody barcode chip for small EV protein detection, and the fluorescence signals from the antibody barcode were recorded (Figure S4F, Supporting Information).
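The two calculations defined in the Figure 3 legend, capture efficiency and limit of detection, can be written as a minimal sketch. It assumes background-corrected fluorescence readings, treats the legend's "initial" signal as that of EVs retained on the chip and the "recovered" signal as that of EVs left in the residual liquid, and takes δ as the sample standard deviation of replicate blank readings with S as the least-squares slope of signal versus concentration. Function names and example numbers are hypothetical.

```python
import statistics


def capture_efficiency(captured_signal, residual_signal):
    """Percent of EVs retained on chip: captured / (captured + residual) * 100.

    Assumes both signals are background-corrected fluorescence intensities.
    """
    total = captured_signal + residual_signal
    if total <= 0:
        raise ValueError("signals must be background-corrected and positive")
    return 100.0 * captured_signal / total


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def limit_of_detection(blank_readings, concentrations, signals):
    """LOD = 3 * sd(blank) / S, in the concentration units of the curve."""
    sd_blank = statistics.stdev(blank_readings)  # sample SD of replicate blanks
    return 3.0 * sd_blank / ols_slope(concentrations, signals)


# Hypothetical readings in arbitrary fluorescence units.
print(capture_efficiency(935.0, 65.0))  # 93.5
lod = limit_of_detection(
    blank_readings=[100, 102, 98, 101, 99],
    concentrations=[0, 10, 20, 40],  # pg/mL
    signals=[100, 150, 200, 300],    # slope = 5 units per pg/mL
)
print(round(lod, 3))  # 0.949
```

Because the denominator of the capture ratio is the total signal (captured plus residual), the efficiency is bounded between 0 and 100%, and a steeper calibration slope S directly lowers the LOD for the same blank noise.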
In control chips coated with BSA but lacking EV capture antibodies, no specific signals of the target biomarkers were detected (Figure S6, Supporting Information), demonstrating that the chip system achieved specific EV capture while effectively preventing nonspecific binding of nonvesicular components. The GOQD-functionalized chips also showed stable performance, with < 5% RSD in fluorescence signals over 24 weeks of storage at 4 °C, and antibody-conjugated GOQD chips showed comparable stability, with < 10% signal attenuation over 10 months of storage at −20 °C (Figure S7, Supporting Information). Next, 14 recombinant proteins were detected at various concentrations, and their quantitative curves were plotted (Figure S8, Supporting Information). The 14 assays showed excellent specificity toward one another (Figure S9, Supporting Information). For the 14 small EV proteins, the chip showed detection limits of 0.01–96.61 pg mL^−1 (Figure 3G). To assess the uniformity of barcode chip readings, the protein concentration was fixed at 1 ng mL^−1 and 10 spots were randomly selected from the barcode fluorescent stripe; readings varied by 2–7% of the mean value (Figure 3H).

2.4. Barcode Immunoassay Platform-Based Clinical Sample Testing

To validate the consistency between the microfluidic chip and mass spectrometry, we detected the 14 candidate proteins in parallel with both platforms in the same cohorts of ESCC patients and healthy controls. Comparative analysis showed strong concordance in differential expression patterns for most proteins across platforms (Figures S10 and S11, Supporting Information). Using the barcode immunoassay platform described above, we then measured the 14 candidate proteins in serum small EVs from Cohort 1 and Cohort 2.
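Converting a barcode fluorescence reading into a concentration requires inverting the protein's quantitative curve, and the uniformity check reduces to a spread statistic across spots. The sketch below assumes a monotonic standard curve inverted by piecewise-linear interpolation (the curve model used in the study is not stated here; a fitted four-parameter logistic is a common alternative) and reports the coefficient of variation as one possible reading of the 2–7% error range. Names and numbers are hypothetical.

```python
import bisect
import statistics


def concentration_from_signal(signal, std_conc, std_signal):
    """Invert a monotonic standard curve by piecewise-linear interpolation.

    Assumed curve model; readings outside the calibrated range are rejected.
    """
    if any(b <= a for a, b in zip(std_signal, std_signal[1:])):
        raise ValueError("standard signals must increase strictly with concentration")
    if not std_signal[0] <= signal <= std_signal[-1]:
        raise ValueError("signal is outside the calibrated range")
    j = bisect.bisect_left(std_signal, signal)  # first standard >= signal
    if std_signal[j] == signal:
        return std_conc[j]
    x0, x1 = std_conc[j - 1], std_conc[j]
    y0, y1 = std_signal[j - 1], std_signal[j]
    return x0 + (signal - y0) * (x1 - x0) / (y1 - y0)


def percent_cv(readings):
    """Coefficient of variation (%) of replicate spot readings."""
    return 100.0 * statistics.stdev(readings) / statistics.fmean(readings)


# Hypothetical standards (ng/mL) and fluorescence readings.
std_conc = [0.0, 0.1, 1.0, 10.0]
std_signal = [50.0, 200.0, 1200.0, 9000.0]
print(round(concentration_from_signal(700.0, std_conc, std_signal), 3))  # 0.55
print(round(percent_cv([98, 102, 100, 100]), 2))  # 1.63
```

Rejecting out-of-range signals, rather than extrapolating, mirrors the usual practice of reporting values below the lowest standard as below the quantifiable range.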
The ESCC patients and HC in both cohorts had similar sex ratios (Figure 4A; Figure S12A,C, Supporting Information), and most participants were older than 50 years (Figure 4B; Figure S12B,D, Supporting Information), reducing potential confounding by age and sex differences between groups. In addition, the basic clinicopathologic characteristics of ESCC patients did not differ significantly between the two cohorts (Figure S12E, Table S2, Supporting Information), indicating that the two clinical cohorts are comparable.

Figure 4. 14 candidate diagnostic markers and SCC detection based on a microfluidic platform. A) Bar chart of the sex ratio of ESCC and HC in the two cohorts. B) Violin plot of the age distribution of ESCC and HC in the two cohorts. C) Heatmap of the relative abundance of the 14 proteins in Cohort 1 and Cohort 2. D) Bar chart of serum SCC concentration in ESCC patients and HC. Each bar represents an individual sample (red, ESCC; blue, HC). E) PCA plot of the 14 proteins in ESCC and HC serum small EVs. Each dot represents a sample (blue, HC; red, ESCC). Percentages indicate the explained variance. F) ROC curve for SCC used to differentiate ESCC from HC. Abbreviations. ESCC: esophageal squamous cell carcinoma; HC: healthy controls; PC: principal component; PCA: principal component analysis; AUC: area under the curve.

The levels of these 14 proteins in the serum small EVs of ESCC patients in Cohorts 1 and 2 differed significantly from those of HC (Figure 4C). Specifically, the mean expression levels of PTX3, ICAM3, MMP9, CXCL6, BPIFB1, FST, RETN, EIF2S2, LCN2, and CRP were significantly higher in ESCC patients (Figures S13–S16, Supporting Information).
To assess whether the screened markers offer advantages over existing clinical markers, we measured SCC, a commonly used serum marker for squamous cell carcinoma. Using the microfluidic platform, the fluorescence intensity of SCC standard protein was measured at varying concentrations to construct a standard curve (Figure S17A, Supporting Information). Serum SCC levels were then measured in 104 ESCC patients and 34 healthy controls (Figure S17B, Supporting Information), and concentrations were calculated from the measured fluorescence intensity using the standard curve. Only 14.4% of ESCC patients had SCC levels exceeding 1.5 ng mL^−1 (Figure 4D), and the mean SCC level in ESCC patients did not differ significantly from that in HC (Figure S17C, Supporting Information). In contrast, PCA based on the fluorescence levels of the 14 screened proteins separated a distinct subgroup of ESCC patients from HC (Figure 4E). The diagnostic efficacy of the markers was evaluated with ROC curves: the area under the ROC curve (AUC) for individual protein markers ranged from 0.52 to 0.88 (Figure S18, Supporting Information), whereas the AUC for SCC was only 0.51 (Figure 4F). Together, these results indicate that the 14 markers identified in this study have potential clinical diagnostic value for ESCC.

2.5. Construction and Validation of a Precise Diagnostic Model

To develop a diagnostic method for ESCC based on the candidate proteins, machine learning was used to build models that predict clinical status (Figure 5A). A random forest model was first constructed from the 14 proteins, and its parameters were optimized based on a combined score of accuracy (ACC) and AUC (Figure S19A–C, Supporting Information).
The optimal values were ntree = 145 and mtry = 1.

Figure 5. Machine learning-derived prediction model based on serum small EV proteins for ESCC diagnosis. A) Workflow for constructing the diagnostic model. Feature selection and model training were performed with the random forest algorithm, and the 14-DM and 9-DM models were validated in the internal test set (test set 1) and the external test set (test set 2). B) Predictive performance of the 14-DM model in distinguishing ESCC (red) from HC (blue) in test set 1 and C) test set 2. The dashed line indicates a cutoff value of 0.50 separating predicted HC (left) from predicted ESCC (right). D) ROC curves for the 14-DM model in diagnosing ESCC in test sets 1 and 2. E) Importance distribution plot of the 14 biomarkers. Each point represents a protein biomarker, and point size indicates the total number of nodes at which the biomarker is used for splitting. The X-axis shows the mean minimal depth of the variable (smaller values indicate greater importance), and the Y-axis shows the total number of trees in which the variable is used for splitting. F) Line graph of model accuracy, AUC, and the combined score of accuracy and AUC as different numbers of variables are included. G) Bar chart of the decline in accuracy of the 9-DM model upon removal of each variable. H) Predictions of the 9-DM model for ESCC (red) versus HC (blue) in test set 1 and I) test set 2. The dashed line indicates a cutoff value of 0.50 separating predicted HC (left) from predicted ESCC (right). J) ROC curves for the 9-DM model in diagnosing ESCC in test sets 1 and 2. Abbreviations. AUC: area under the curve; DM: diagnostic markers; ESCC: esophageal squamous cell carcinoma; HC: healthy controls; PC: principal component.
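The evaluation quantities used for tuning and testing (classification at the 0.50 cutoff, ACC, and AUC) can be computed as below. This is a sketch under stated assumptions: scores are taken to be the model's predicted ESCC probability (for a random forest, the fraction of trees voting ESCC), a score exactly equal to the cutoff is assigned to ESCC, and the "combined score" is assumed to be the unweighted mean of ACC and AUC because its exact definition is not given here. The parameter names ntree and mtry match the R randomForest package; the corresponding scikit-learn RandomForestClassifier arguments are n_estimators and max_features. All function names and values are hypothetical.

```python
def metrics_at_cutoff(scores, labels, cutoff=0.5):
    """Sensitivity, specificity, and accuracy; label 1 = ESCC, 0 = HC.

    Assumption: a score equal to the cutoff is called ESCC.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return {
        "sensitivity": tp / n_pos,
        "specificity": tn / n_neg,
        "accuracy": (tp + tn) / len(labels),
    }


def roc_auc(scores, labels):
    """AUC = P(random ESCC score > random HC score), counting ties as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def combined_score(accuracy, auc):
    """Hypothetical combined score: unweighted mean of ACC and AUC."""
    return (accuracy + auc) / 2


# Hypothetical predicted ESCC probabilities for 3 ESCC and 5 HC samples.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0, 0]
m = metrics_at_cutoff(scores, labels)
auc = roc_auc(scores, labels)
print(m["accuracy"], round(auc, 3))  # 0.625 0.867
print(round(combined_score(m["accuracy"], auc), 3))  # 0.746
```

The pairwise AUC is quadratic in sample size but exact and easy to audit, which is adequate for test sets of tens of samples; it also shows why ACC and AUC can disagree, since AUC ignores the cutoff while ACC depends on it.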
Cohort 1 was randomly split into a training set and a test set at a ratio of 7:3, and Cohort 2 served as an independent external test set (test set 2). To visualize model performance, predicted values were plotted against actual disease status (ESCC or HC). With a classification cutoff of 0.5, the 14-diagnostic-marker (14-DM) model correctly identified 63.2% of ESCC patients in test set 1 and 78.7% in test set 2 (Figure 5B,C). The model achieved 100% accuracy in the training set (Figure S19D, Supporting Information) and accuracies of 79.1% in test set 1 (Figure S19E, Supporting Information) and 82.1% in test set 2 (Figure S19F, Supporting Information), with AUC values of 0.94 and 0.96, respectively (Figure 5D). To further optimize the model and reduce detection costs, the variables in the 14-DM model were ranked by importance with the random forest algorithm. ELA2, CXCL6, MMP9, CRP, and BPIFB1 ranked in the top five based on mean minimal depth and node counts (Figure 5E). Models containing different numbers of variables were then compared, and the model with nine protein variables yielded the highest combined score of ACC and AUC (Figure 5F). The selected variables are MMP9, CRP, CXCL6, FST, EIF2S2, ELA2, BPIFB1, PTX3, and CD16. Removing the variables one at a time caused varying degrees of accuracy decline, with the largest decrease observed for MMP9 (Figure 5G), indicating that MMP9 plays an important role in the diagnostic model. An improved random forest model was constructed with the nine variables, with parameters optimized to ntree = 400 and mtry = 1 (Figure S20A–C, Supporting Information). This 9-DM model correctly identified 79.0% of ESCC patients in test set 1 and 91.5% in test set 2 (Figure 5H,I).
Its accuracies were 88.4% in test set 1 (Figure S20D, Supporting Information) and 91.0% in test set 2 (Figure S20E, Supporting Information), with AUC values of 0.96 and 0.95, respectively (Figure 5J). These results indicate that the 9-DM model has high sensitivity and reliability. To further validate the generalizability of the selected biomarkers and the 9-DM model, an independent validation cohort (Cohort 3) of 30 ESCC patients and 30 healthy controls was recruited at Qilu Hospital between January and March 2025. In this cohort, most candidate biomarkers showed expression patterns consistent with those in Cohorts 1 and 2 (Figures S21 and S22, Supporting Information), and the 9-DM model maintained robust diagnostic performance, with 83.3% classification accuracy (AUC = 0.93) (Figure S23, Supporting Information), supporting the clinical utility and reliability of this protein signature for ESCC detection. To assess its effectiveness in diagnosing early-stage ESCC, the 9-DM model was applied to distinguish stage I–II ESCC from HC in both test sets. The model correctly identified 85.7% of early-stage ESCC patients and 93.2% of HC, for an overall accuracy of 90.8% (Figure S20F,G, Supporting Information) and an AUC of 0.96 (Figure S20H, Supporting Information). In clinical settings, early detection of ESCC is vital for prompt intervention and radical resection, which can reduce surgical difficulty and significantly improve patient survival.[28,29]

2.6. The Specificity of Small EV Protein Markers to Differentiate Cancers

To evaluate the specificity of the 9-DM model for ESCC diagnosis, serum samples were collected from patients with CRC, gastric cancer (GC), and breast cancer (BC). The sex and age distributions of these patients are presented in Figure S24 (Supporting Information).
In the serum small EVs of CRC, GC, and BC patients, the expression levels of the nine markers differed significantly from those in ESCC (Figure 6A; Figure S25, Supporting Information). PCA based on the 9-DM markers suggested that ESCC samples have distinct characteristics compared with other cancer types (Figure 6B), whereas it failed to effectively separate CRC, GC, or BC samples from HC samples (Figure S26A–C, Supporting Information). Notably, except for CXCL6 and EIF2S2, the remaining seven markers were expressed at lower levels in the serum small EVs of CRC, GC, and BC patients than in those of ESCC patients (Figure 6C–K). When data from CRC, GC, and BC were evaluated with the 9-DM model, the model failed to differentiate them from HC (Figure S26D,E, Supporting Information). These findings underscore the specificity of the selected small EV protein markers and the 9-DM model for diagnosing ESCC.

Figure 6. The specificity of ESCC small EV proteins: expression of ESCC biomarkers in other cancers. A) Heatmap of the expression of the nine biomarkers in serum small EVs from HC and patients with ESCC, CRC, GC, and BC. B) PCA plot based on the nine biomarkers comparing other cancers with ESCC. Each point represents a sample (blue, other cancers; red, ESCC). Percentages indicate the explained variance. C) Box plots of the average expression levels of the nine biomarkers in serum small EVs from patients with ESCC, CRC, GC, and BC. Abbreviations. BC: breast cancer; CRC: colorectal cancer; ESCC: esophageal squamous cell carcinoma; GC: gastric cancer; HC: healthy controls; PC: principal component; PCA: principal component analysis.

2.7. Tissue-Based Validation of ESCC Biomarkers

To further validate the selected biomarkers, we performed proteomic analysis of EVs derived from ESCC cell lines (KYSE150 and KYSE510) and a normal esophageal epithelial cell line (HEEC). PTX3, EIF2S2, and FST were significantly overexpressed in ESCC-derived EVs, with fold changes exceeding 10 relative to HEEC controls. Subsequent immunofluorescence analysis of three paired ESCC and adjacent normal tissue samples revealed tumor-specific expression patterns, with positive signals in ESCC tissues but only faint or negligible staining in histologically normal adjacent regions (Figure S27, Supporting Information). These findings are consistent with previous reports of tumor-restricted expression of these biomarkers.[32–34]

3. Discussion and Conclusion

Although ESCC was used here as a model to test early diagnosis with the screened small EV protein panel, the approach is generalizable: it could be used to screen specific early-diagnosis biomarkers, biomarkers of treatment efficacy, and prognostic biomarkers for other diseases.
Early diagnosis can significantly improve patient prognosis.[26,35] Current endoscopic diagnostic and screening methods are limited, creating a need for new noninvasive approaches to screen for and continuously monitor precancerous lesions.[1] Although liquid biopsy has been explored in recent years, cfDNA has shown less than 20% sensitivity for the early diagnosis of EC.[10–12] In contrast, small EVs are widely present in biological fluids and are enclosed by a lipid bilayer membrane, which gives them high stability.[22] Several studies have shown that miRNAs in small EVs can distinguish tumor patients from healthy individuals, but few of these findings have been translated into clinical applications.[36,37] Compared with RNA, small EV-derived proteins are more stable and therefore better suited for clinical testing.[38] Increasingly sophisticated proteomic assays have created an opportunity to understand protein function in both tumor patients and healthy individuals.[39,40] In this study, 4D-DIA was used to identify and characterize proteins in serum small EVs from patients and HC, revealing altered small EV protein composition in patients. Notably, functional categories such as small EV composition, protein binding, and neutrophil degranulation were significantly upregulated, reflecting the functional state of the patients' small EV proteome. In addition, KEGG pathway analysis showed significant enrichment of tumorigenesis-related pathways such as endocytosis, chemokine signaling, and endothelial cell migration. This suggests that small EV proteins may play an important role in tumor formation and progression, and it further supports the serum small EV proteome as a source of cancer diagnostic markers.
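The altered protein composition described above rests on the differential-abundance criteria given in the Experimental Section (limma; adjusted P < 0.05 and log2 FC > 0.58 or < −0.58). The minimal Python sketch below shows only the final filtering step, under two assumptions. First, raw p-values are adjusted with the Benjamini–Hochberg procedure, which is the default of limma's topTable; the study does not name its adjustment method. Second, the moderated statistics that limma computes are taken as given. Protein names and values are hypothetical. P3 shows that a significant protein with a small fold change is not called, and P4 shows that a large fold change without significance is not called either.

```python
from typing import Dict, List


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Step down from the largest p-value so adjusted values stay monotone (and capped at 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_differential(log2fc: Dict[str, float], pvals: Dict[str, float],
                      alpha: float = 0.05, lfc: float = 0.58) -> Dict[str, str]:
    """Label each protein 'up', 'down' or 'ns' using adjusted P and log2 FC cut-offs."""
    names = list(log2fc)
    padj = dict(zip(names, benjamini_hochberg([pvals[n] for n in names])))
    calls = {}
    for n in names:
        if padj[n] < alpha and log2fc[n] > lfc:
            calls[n] = "up"
        elif padj[n] < alpha and log2fc[n] < -lfc:
            calls[n] = "down"
        else:
            calls[n] = "ns"
    return calls


# Hypothetical proteins: log2 FC of ESCC vs HC and raw p-values.
log2fc = {"P1": 1.2, "P2": -0.9, "P3": 0.3, "P4": 2.0}
pvals = {"P1": 0.001, "P2": 0.004, "P3": 0.0005, "P4": 0.2}
print(call_differential(log2fc, pvals))  # {'P1': 'up', 'P2': 'down', 'P3': 'ns', 'P4': 'ns'}
print(round(2 ** 0.58, 2))  # 1.49, so a log2 FC of 0.58 is roughly a 1.5-fold change
```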
Owing to the inherent heterogeneity of tumors and the complexity of their microenvironments,[41,42] a single marker can rarely reflect the overall changes in tumor characteristics accurately. Consequently, combining multiple markers has become a prevailing trend in diagnostic and therapeutic approaches.[43,44] This trend places higher demands on detection methods, calling for assays with low sample consumption and high throughput, which are invaluable for marker detection and cancer diagnosis. CD63, CD81, and CD9 are tetraspanin transmembrane proteins of the small EV membrane that mark partially overlapping small EV populations.[22,24,25] The microfluidic platform developed in this study uses a mixture of antibodies against CD63, CD9, and CD81 to capture serum small EVs specifically, achieving a capture efficiency of over 90% at concentrations of 10^8 to 10^10 particles mL^−1. The captured small EVs are lysed in situ on the chip, releasing total small EV proteins, including both membrane and internal proteins. This approach minimizes the loss of small EV proteins and enhances the overall sensitivity of the assay. The entire process requires only 10 µL of serum, substantially reducing sample consumption. Furthermore, the barcode immunoassay platform can detect at least 14 markers simultaneously across 60 clinical samples, significantly reducing the time required for multisample analysis. This makes it well suited for screening diverse populations. The detection limits of the chip for the target proteins range from 0.01 to 96.61 pg mL^−1, and fluorescence readings are relatively stable. Thus, the barcode immunoassay platform developed in this study holds substantial potential for the clinical detection of small EV proteins.
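The detection limits quoted above follow the LOD definition in the Statistical Methods, LOD = 3δ/S. The minimal Python sketch below computes it under two assumptions: δ is the sample standard deviation of replicate blank signals, and S is the slope of an ordinary least-squares calibration line. The concentrations, signals, and function names are hypothetical, not the study's calibration data. Because δ is in signal units and S in signal per unit concentration, the LOD comes out in the calibration's concentration units (here pg mL^−1).

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit y = slope * x + intercept; returns (slope, intercept, R^2)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


def limit_of_detection(blank_signals: Sequence[float], slope: float) -> float:
    """LOD = 3 * SD(blank) / S, in the concentration units of the calibration x-axis."""
    return 3.0 * stdev(blank_signals) / slope


# Hypothetical calibration: antigen concentration (pg/mL) vs fluorescence intensity.
conc = [0, 10, 20, 40, 80]
signal = [100, 300, 500, 900, 1700]
blanks = [98, 100, 102]  # hypothetical replicate blank readings

slope, intercept, r2 = linear_fit(conc, signal)
lod = limit_of_detection(blanks, slope)
print(f"slope={slope:.1f}, R^2={r2:.3f}, LOD={lod:.2f} pg/mL")  # slope=20.0, R^2=1.000, LOD=0.30 pg/mL
```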
Serum small EV proteomics is well suited to characterizing disease-associated proteins and identifying promising diagnostic biomarkers, but interpreting such complex omics data remains challenging.[16,45–47] Machine learning is increasingly recognized as a valuable tool for improving the accuracy of medical monitoring and diagnosis. In this study, the random forest algorithm was used, and the optimal model was selected by tuning parameters based on accuracy (ACC) and the area under the ROC curve (AUC). The initial diagnostic model reached an accuracy of 79.1% in test set 1. To further improve accuracy and reduce detection costs, machine learning was used to identify the best model, the 9-DM, which achieved an accuracy of 88.4% in test set 1. Machine learning also revealed the predictive potential of certain small EV proteins that traditional analytical methods tend to overlook. For instance, CD16 and ELA2 did not differ significantly in overall mean values between ESCC patients and HC, yet both were retained among the model's optimal variables. The diagnostic models were validated for generalization and outperformed clinically available squamous cell carcinoma markers. Specifically, Cohort 2 served as an independent external test set, in which the 14-DM and 9-DM models achieved accuracies of 82.1% and 91.0%, respectively. Notably, the 9-DM model maintained robust performance in the subsequent Cohort 3 (83.3% accuracy; AUC = 0.93), supporting its generalizability across distinct patient populations. Furthermore, the 9-DM model performs well in identifying early-stage ESCC. In a set of stage I–II ESCC patients and HC, it correctly identified 85.7% of early ESCC patients, with an overall accuracy of 90.8%, surpassing existing clinical markers.
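The accuracy, sensitivity, specificity, and AUC figures quoted above are related in simple ways, sketched below in Python. The function names are illustrative; this is not the caret or pROC code used in the study.

- Samples with a predicted probability > 0.5 are called ESCC, matching the threshold stated in the Experimental Section.
- AUC is computed in its rank (Mann–Whitney) form: the probability that a randomly chosen ESCC sample scores higher than a randomly chosen control, with ties counted as one half. This equals the empirical ROC AUC.
- The last function uses the identity accuracy = sensitivity × p + specificity × (1 − p), where p is the fraction of ESCC cases evaluated. With the rounded early-stage figures, it implies that roughly one third of that evaluation set were ESCC patients. This is an approximation derived from rounded percentages, not a reported count.

```python
from typing import Dict, Sequence


def confusion_metrics(probs: Sequence[float], is_escc: Sequence[bool],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity, calling 'probability > threshold' ESCC."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, is_escc):
        called_escc = p > threshold  # probability <= threshold is called healthy control
        if called_escc and y:
            tp += 1
        elif called_escc:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one ESCC and one control sample")
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc_rank(probs: Sequence[float], is_escc: Sequence[bool]) -> float:
    """Empirical ROC AUC via the Mann-Whitney formulation (ties count one half)."""
    pos = [p for p, y in zip(probs, is_escc) if y]
    neg = [p for p, y in zip(probs, is_escc) if not y]
    score = 0.0
    for a in pos:
        for b in neg:
            score += 1.0 if a > b else 0.5 if a == b else 0.0
    return score / (len(pos) * len(neg))


def implied_case_fraction(sensitivity: float, specificity: float, accuracy: float) -> float:
    """Solve accuracy = sens * p + spec * (1 - p) for the case fraction p."""
    if sensitivity == specificity:
        raise ValueError("case fraction is not identifiable when sensitivity == specificity")
    return (specificity - accuracy) / (specificity - sensitivity)


# Hypothetical predicted probabilities and labels, not data from this study.
probs = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.1]
is_escc = [True, True, True, False, False, False, False, False]
print(confusion_metrics(probs, is_escc))  # accuracy 0.625, sensitivity 2/3, specificity 0.6
print(round(auc_rank(probs, is_escc), 3))  # 0.867

# Rounded early-stage figures from the text: sensitivity 85.7%, specificity 93.2%, accuracy 90.8%.
print(round(implied_case_fraction(0.857, 0.932, 0.908), 2))  # 0.32
```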
Molecular characterization revealed significant overexpression of PTX3, FST, and EIF2S2 in ESCC-derived extracellular vesicles (FC > 10), and immunofluorescence confirmed their tumor-specific expression. Together, these findings support the model's clinical utility and provide a molecular basis for it, justifying further development through expanded validation studies and algorithmic refinement toward clinical implementation. Serum samples from patients with CRC, GC, and BC were also tested in this study. Compared with ESCC, most of the marker proteins were expressed at lower levels in these three cancer types, and the 9-DM model identified ESCC patients without separating the other cancers from HC. This further substantiates the specificity of the selected markers and the diagnostic model for ESCC. Overall, the strengths of this study lie in the proteomic analysis of serum small EVs and the description of the functional landscape of serum small EV proteins in patients. A compact panel of protein markers was screened, and machine learning was applied to construct a robust diagnostic model for ESCC that is straightforward to replicate, optimize, and apply clinically. In addition, a barcode immunoassay platform was developed that uses 10 µL serum samples for simultaneous high-throughput, multisample, multiparameter assays, which holds significant translational value for clinical applications.

This study has several limitations. First, because the predictive model was developed from relative quantitative proteomic data, applying it to assess ESCC risk in new patients will require standardized quality control samples during implementation. Second, the current platform requires test results to be transferred manually to the machine learning model, which may add operational complexity in clinical settings. Despite these limitations, this work identified key EV proteome markers that effectively discriminate ESCC from healthy controls, a crucial step toward clinically applicable diagnostic models. Moving forward, we will pursue absolute quantitative proteomic analysis of these biomarkers in large-scale multicenter cohorts to establish clinically relevant reference ranges and detection thresholds. In parallel, we will develop a fully integrated, automated platform that combines detection, data acquisition, and diagnostic output to improve clinical utility and operational efficiency.

In conclusion, the proposed platform identifies biomarkers with potential diagnostic value and builds a diagnostic model using machine learning algorithms. Furthermore, the barcode immunoassay platform integrates small EV capture, in situ lysis, and multiparameter detection, enabling rapid analysis of multiple samples. These findings may improve understanding of ESCC pathology, facilitate early detection, and provide a new method for multiparameter small EV detection. More broadly, machine learning-based interpretation of small EV data offers distinct advantages in tumor detection and clinical decision-making. The microfluidic assay platform has significant translational prospects for clinical practice and may be extended to the study of other diseases.

4. Experimental Section

Clinical Cohort: Between August 2023 and May 2024, 66 patients with ESCC were recruited from Qilu Hospital of Shandong University, and 80 HC who underwent physical examinations were included, forming Cohort 1. Between July 2023 and September 2023, an additional 47 ESCC patients treated at the Provincial Hospital of Shandong First Medical University and 20 HC were recruited, forming Cohort 2.
Furthermore, from January to March 2025, 30 ESCC patients and 30 HC were enrolled at Qilu Hospital to establish Cohort 3 for validation (Figure S28, Supporting Information). Inclusion criteria for patients: age 18–85 years, histologically confirmed ESCC, and no prior treatment. Exclusion criteria: dual or multiple primary tumors; immune disorders or autoimmune diseases; and a history of organ transplantation, non-autologous bone marrow transplantation, or stem cell transplantation. Inclusion criteria for healthy individuals: 1) written informed consent; 2) age 18–85 years; and 3) absence of malignant disease. The exclusion criteria above also applied to healthy individuals. All samples used in this study were obtained retrospectively from established institutional biobanks. The use of human materials was approved by the Medical Science Research Ethics Committee of Qilu Hospital, Shandong University (Approval Nos. KYLL-2021(KS)-011 and KYLL-202409(YJ)-024). All participants voluntarily signed the informed consent form.

Collection of Clinical Samples: Serum samples were collected from ESCC patients and HC. A 5 mL peripheral venous blood sample was obtained from each participant and transferred to the laboratory at 4 °C within 1 h for further processing. Freshly collected blood samples were kept at room temperature (RT) for 1 h to allow clotting. They were then centrifuged at 1000 × g for 15 min at 4 °C, and the supernatant was carefully transferred to a clean tube. To ensure complete removal of platelets and other debris, the samples underwent a second centrifugation at 10 000 × g for 10 min at 4 °C. The collected supernatant was stored at −80 °C.
Collection of Cell Supernatants and Isolation of Small EVs: Cells were cultured in complete medium until they reached 70% confluence, at which point the medium was replaced with fetal bovine serum (FBS)-free medium. Supernatants were collected after 48 h of culture under FBS-free conditions and centrifuged serially at 300 × g for 10 min, 2000 × g for 10 min, and 10 000 × g for 30 min. The supernatants were then passed through a 0.22 µm filter and ultracentrifuged at 100 000 × g for 70 min. The supernatant obtained after this centrifugation was designated the small EV-depleted fraction. The small EV pellets were resuspended in phosphate-buffered saline (PBS) and ultracentrifuged again at 100 000 × g for 70 min to purify the small EVs.

Nanoparticle Tracking Analysis: Small EVs were resuspended and diluted 100- to 500-fold to obtain 20–100 particles per frame. Analysis was performed using a NanoSight NS300 system (NanoSight Technology, Malvern, UK) equipped with a 488 nm laser and a high-sensitivity sCMOS camera. After manual injection of small EVs into the chamber, each sample was measured in triplicate at camera level 13, with a 30 s acquisition time and a detection threshold of 7. NTA software version 2.3 was used to analyze at least 200 completed tracks per video.

Transmission Electron Microscopy: Small EVs were fixed with 2.5% glutaraldehyde for 10 min at RT. A 5–10 µL drop of the small EV suspension was then placed on a copper grid. Excess liquid was removed with filter paper, and the samples were washed three times with PBS for 10 min each. Finally, 10 µL of 1% uranyl acetate was applied for 1 min to negatively stain the small EVs on the grid, which were then examined using a Tecnai G2 F20 transmission electron microscope.
Preparation of Serum Small EVs for Liquid Chromatography–Mass Spectrometry (LC-MS) and Data-Independent Acquisition (DIA): Serum small EVs were isolated using the EVtrap kit. The isolated small EVs were lysed, and the extracted proteins were reduced, alkylated, and digested with trypsin to generate peptides for mass spectrometry. Peptides from each sample were then analyzed by LC-MS in DIA mode. Additional details are given in Figure S29 (Supporting Information).

Differential Protein and Pathway Enrichment Analysis: Differential protein abundance between ESCC patients and HC was calculated using the "limma" package in R. Proteins were considered differentially abundant if they had an adjusted P-value < 0.05 and a log2 fold change (FC) > 0.58 (upregulated in tumor) or < −0.58 (downregulated in tumor). A log2 FC magnitude of 0.58 corresponds to a fold change of about 1.5. Functional enrichment analysis was performed using KOBAS (http://bioinfo.org/kobas) with default parameters.

Fabrication of Microfluidic Platforms: GOQD-functionalized glass slides and PDMS microchamber layers were prepared as previously reported.[31] A microprinting chip with 14 parallel microchannels was first bonded to a GOQD-functionalized glass slide; this nanomaterial substrate markedly enhances antibody immobilization efficiency. Using a precision pipette, 3 µL of capture antibody solution was dispensed into the inlet of each independent microchannel. Vacuum was then applied at room temperature to draw the solution from the inlet to the outlet, immobilizing the capture antibody on the substrate surface within each microchannel. After the microprinting chip was removed, the substrate was treated with 1% BSA solution for surface passivation and then blocked with 3% BSA solution to saturate any remaining reactive sites outside the microchannels.
Finally, the chips were thoroughly washed, spin-dried, and stored at 4 °C until use. The microprinting chip contained 14 zigzag-patterned microchannels that remained fully fluidically isolated, and each detection unit covered one full cycle of all 14 microchannels (Figure S30, Supporting Information). To capture small EVs, 4.5 µL of a mixed antibody solution containing anti-CD63 (100 µg mL^−1), anti-CD81 (100 µg mL^−1), and anti-CD9 (100 µg mL^−1) was added to each reaction unit and incubated overnight. The units were then blocked with 3% BSA for 10 min and washed successively with PBS and ultrapure water to remove unbound components. To enrich small EVs, 10 µL of serum was added to each reaction unit and incubated for 60 min. The supernatant was then removed, and the wells were rinsed with PBS. Subsequently, 5 µL of RIPA lysis buffer mixed with protease inhibitor (100:1) was added to each reaction unit to lyse the captured small EVs in situ for 30 min. For antigen detection, the PDMS microchamber of the microarray chip was blocked with 1% BSA. The small EV lysate was then transferred to the wells on the microarray chip and incubated for 45 min, after which the supernatant was removed. The microarray chip was then treated with 1% BSA, and the wells were incubated with detection antibody at a final concentration of 5 µg mL^−1 for 45 min, followed by incubation with APC for 30 min. The microarrays were thoroughly washed with PBS and distilled water and then dried. The fluorescence intensity of each antigen was measured using a fluorescence scanner. The commercial antibodies and proteins used in this study were sourced from manufacturers including R&D Systems, BioLegend, NOVUS, eBioscience, Abcam, and Santa Cruz Biotechnology.
Microfluidic Platforms Data Analysis: Data were processed in two steps. 1) Signal acquisition: the biochip was scanned in the 635 nm laser channel, and the fluorescence signal of each barcode was quantified with GenePix Pro software as the mean fluorescence intensity over the detection area. 2) Data normalization: raw fluorescence intensities were log10-transformed before being input to the machine learning model, to normalize the data distribution and improve model performance.

Diagnostic Predictive Modeling: A diagnostic prediction model for ESCC was developed in R version 4.4.2 using an integrated machine learning approach. The following R packages were used:
- randomForest for random forest modeling
- caret for data preprocessing and model tuning
- pROC for performance evaluation
- randomForestExplainer for feature importance analysis
- ggplot2 for visualization
- Rtsne for dimensionality reduction of high-dimensional data

During model construction, the data were first standardized and split by stratified sampling (70% training set, 30% test set) using caret to ensure a balanced class distribution. The random forest model was built with the bootstrap aggregation algorithm implemented in randomForest, initially using 14 serum small EV proteins as predictors. Through caret-guided grid search, the model parameters were optimized to ntree = 145 decision trees, mtry = 14, and nodesize = 5. Here mtry is the number of features considered at each node split; with 14 predictors, every split therefore evaluates all features. Nodesize is the minimum number of samples per terminal node. Parameter optimization was guided by both the AUC values calculated with pROC and the confusion matrix metrics generated by caret.
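The preprocessing described above can be sketched in plain Python as a minimal illustration; this is not the study's R code. Raw intensities are log10-transformed, and sample indices are split 70%/30% within each class so that the ESCC:HC ratio is preserved in both sets. In R, caret's createDataPartition performs this kind of outcome-stratified split. The seed, the optional pseudocount for zero readings, and the function names are assumptions introduced here. The class sizes are borrowed from Cohort 1 purely for illustration, and the intensities are hypothetical. Rounding of per-class counts may differ slightly from caret's.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def log10_transform(intensities: Sequence[float], pseudocount: float = 0.0) -> List[float]:
    """log10 of raw fluorescence intensities; the pseudocount is a hypothetical guard for zeros."""
    return [math.log10(x + pseudocount) for x in intensities]


def stratified_split(labels: Sequence[str], train_frac: float = 0.7,
                     seed: int = 1) -> Tuple[List[int], List[int]]:
    """Split sample indices so each class is divided about train_frac : (1 - train_frac)."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    by_class: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train: List[int] = []
    test: List[int] = []
    for idxs in by_class.values():
        shuffled = idxs[:]
        rng.shuffle(shuffled)  # randomize within the class before cutting
        n_train = round(len(shuffled) * train_frac)
        train.extend(shuffled[:n_train])
        test.extend(shuffled[n_train:])
    return sorted(train), sorted(test)


# Cohort 1 class sizes (66 ESCC, 80 HC), used only to illustrate the split.
labels = ["ESCC"] * 66 + ["HC"] * 80
train_idx, test_idx = stratified_split(labels)
n_escc_train = sum(labels[i] == "ESCC" for i in train_idx)
print(len(train_idx), len(test_idx), n_escc_train)  # 102 44 46

# Hypothetical raw intensities for one protein.
print(log10_transform([10.0, 1000.0, 25000.0]))
```

Splitting within each class, rather than over the pooled samples, keeps the test-set case fraction close to that of the full cohort. This matters because overall accuracy depends on that fraction.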
Feature importance was analyzed with randomForestExplainer using three complementary measures:
1) minimum depth analysis (min_depth_distribution), to identify the features most critical for classification;
2) node purity improvement (measure_importance), to quantify each feature's contribution to classification;
3) the frequency with which each feature served as the root split, reflecting global feature importance.

On the basis of these analyses, the nine most predictive protein biomarkers were selected from the initial 14 features for the final model. Validation used several complementary evaluations. The caret-generated confusion matrix provided accuracy, sensitivity, and specificity; pROC calculated AUC values from ROC curves; ggplot2 produced calibration curves; and t-SNE (Rtsne) visualized the separation between ESCC patients and healthy controls in the protein biomarker space. The model outputs an individual prediction probability (range 0–1) through the predict function of randomForest, with a diagnostic threshold of 0.5 (probability > 0.5 classified as ESCC, ≤ 0.5 as healthy control). All visualizations, including ROC curves, multidimensional feature importance heatmaps, t-SNE scatter plots, and calibration curves, were created with ggplot2.

Immunofluorescence Staining: Formalin-fixed, paraffin-embedded sections were deparaffinized in xylene and rehydrated through a graded ethanol series, followed by antigen retrieval in citrate buffer (pH 6.0) with microwave heating. After blocking with 10% serum from the secondary antibody host species, slides were incubated overnight at 4 °C with the following primary antibodies: anti-FST (1:400, Proteintech #60060-1-Ig), anti-PTX3 (1:200, Proteintech #13797-1-AP), and anti-EIF2S2 (1:500, Servicebio #GB111135).
Alexa Fluor-conjugated secondary antibodies (1:200) were applied for 50 min at room temperature, followed by DAPI counterstaining and autofluorescence quenching. Sections were mounted with antifade medium and imaged using a digital slide scanner. Sections were washed in PBS (3 × 5 min) between all steps.

Statistical Methods: The Kolmogorov–Smirnov test was used to assess the normality of continuous data, and categorical data were analyzed with the chi-square test. Normally distributed data were compared with unpaired two-tailed t-tests, and non-normally distributed data with the Mann–Whitney nonparametric test. The linear relationship between fluorescence intensity or optical density (O.D.) and target antigen concentration was evaluated with the coefficient of determination (R^2). P < 0.05 was considered statistically significant. The limit of detection (LOD) was determined as LOD = 3δ/S, where δ is the standard deviation of the blank signal and S is the slope of the linear regression (calibration) equation. Diagnostic efficiency was assessed using accuracy, ROC curves, and other relevant metrics. Analyses were conducted using GraphPad Prism (v.9.5), R software (v.4.4.1) (https://www.r-project.org/), and Origin (v.2021).

Conflict of Interest
The authors declare no conflict of interest.

Author Contributions
L.H., Y.Z., J.W., X.Z., and C.M. conceived the idea of the study. X.Z., Y.Z., J.Q., and Y.L. performed the experiments, wrote the article, and created the figures and tables. Y.Z., C.W., M.S., and X.C. guided the preparation of this manuscript. Z.L. and Y.Z. performed the sample collection and data acquisition. H.L., L.H., and C.M. revised the paper. All authors read and approved the final manuscript.

Supporting Information
Supporting Information: ADVS-12-e06167-s001.pdf (3.5 MB, PDF)

Acknowledgements