Abstract

Objective

Aging is a complex process that increases susceptibility to age‐related disease through intercellular communication in the microenvironment. While the classic secretome of the senescence‐associated secretory phenotype (SASP), including soluble factors, growth factors, and extracellular matrix remodeling enzymes, is known to impact tissue homeostasis during aging, the effects of novel SASP components, extracellular small noncoding RNAs (sncRNAs), on human aging are not well established.

Methods

Using 446 small RNA‐seq samples from plasma and serum of healthy donors in the Extracellular RNA (exRNA) Atlas data repository, we assessed linear and nonlinear associations between circulating sncRNA expression and age with the maximal information coefficient (MIC). Age predictors were generated by ensemble machine learning methods (Adaptive Boosting, Gradient Boosting, and Random Forest), and core age‐related sncRNAs were determined through weighted coefficients in the machine learning models. Functional investigation was performed via target prediction of age‐related miRNAs.

Results

The number of highly expressed transfer RNAs (tRNAs) and microRNAs (miRNAs) showed positive and negative associations with age, respectively. Two‐variable relationships (sncRNA expression and individual age) were detected by MIC, and sncRNA‐based age predictors were established. For all three ensemble machine learning methods, R^2 values were greater than 0.96 and root‐mean‐square errors (RMSE) were less than 3.7 years. Furthermore, important age‐related sncRNAs were identified based on modeling, and the biological pathways of age‐related miRNAs were characterized by their predicted targets, including multiple pathways in intercellular communication, cancer, and immune regulation.
Conclusion

This study provides insight into circulating sncRNA expression dynamics during human aging and, with further elucidation, may advance understanding of the functions of age‐related sncRNAs.

Keywords: aging clock, circulating sncRNAs, human aging, machine learning

Human aging is associated with increased susceptibility to age‐related diseases due to alteration of biological processes. Here we identified changes in extracellular small noncoding RNA (sncRNA) expression with age from plasma and serum samples. A machine learning‐based aging clock was developed using age‐related sncRNAs and is capable of predicting individual age. By profiling the circulating sncRNA transcriptome, we identified putative core biomarkers linked to the aging process.

1. INTRODUCTION

Heterogeneity in human lifespan and health outcomes arises from differences in the aging process.^1, ^2, ^3 Organismal aging is often accompanied by dysregulation of numerous cellular and molecular processes, which triggers age‐related pathologies such as tissue degradation,^4 tissue fibrosis,^5 arthritis,^6 renal dysfunction,^7 diabetes,^8 and cancer.^9 The highly active secretome of senescent cells, termed the senescence‐associated secretory phenotype (SASP), is one of the main drivers of age‐related pathogenesis through intercellular communication.^10 The classical SASP comprises soluble factors, growth factors, and extracellular matrix remodeling enzymes,^11 and it can transmit age‐related signals to healthy cells via cell‐to‐cell contact.
As one of the emerging SASP components protected by extracellular vesicles (EVs), ribonucleoprotein (RNP) complexes, and lipoproteins,^12 extracellular RNAs (exRNAs) are found in many biological fluids^13 and can bridge communication between “donor” and “recipient” cells through endocytosis, inducing paracrine senescence and pro‐tumorigenic processes.^14, ^15 Deep sequencing of human plasma exRNA revealed that more than 80% of sequencing reads mapped to small noncoding RNAs (sncRNAs) in the human genome, including microRNAs (miRNAs), PIWI‐interacting RNAs (piRNAs), transfer RNAs (tRNAs), small nuclear RNAs (snRNAs), and small nucleolar RNAs (snoRNAs).^16 Extracellular miRNA expression in mouse plasma changes with age, and cellular senescence can affect age‐related homeostasis throughout the body via circulating miRNAs.^17 Other studies uncovered roles of circulating miRNAs in age‐related dysfunction such as osteogenesis imperfecta,^18 decreased myelination,^19 tumorigenesis,^20 and cardiovascular disease.^21 However, the molecular functions of other circulating sncRNAs in aging and age‐related diseases have been overlooked, and their expression profiles during human aging need further characterization.

In this study, we determined the extracellular sncRNA landscape during healthy human aging. Furthermore, we generated an aging clock based on dynamic changes in extracellular sncRNAs and identified putative core sncRNAs with larger contribution weights in machine learning models for age‐related risk prediction. To achieve this, we used 446 pre‐selected small RNA‐seq datasets from plasma and serum samples (age: 20–99 years) and employed differential expression analysis and linear and nonlinear association measurements to identify age‐related sncRNAs as primary inputs for comprehensive machine learning modeling.
Based on supervised machine learning models, age estimators were built with high accuracy, and sncRNA candidates with the top importance values in these models were considered the final age‐related biomarkers. Additionally, pathway enrichment of the targets of core miRNAs supports our view that extracellular sncRNAs change with age‐related processes.

2. RESULTS

2.1. Overview of integrated human small RNAs dataset

To profile sncRNA features during healthy human aging, we obtained small RNA‐seq datasets from the Extracellular RNA (exRNA) Atlas data repository (https://exrna-atlas.org).^22 We included the studies for which information on age, health status, and gender was available, but only individuals undergoing healthy aging were retained for analysis. For datasets meeting the quality control standards established by the Extracellular RNA Communication Consortium (ERCC) (see experimental procedures), we created a bioinformatics pipeline for read mapping, processing, normalization, categorization, and modeling (Figure 1A). After applying these criteria, 302 plasma and 144 serum samples (Figure 1B) were used in this study, with a similar number of samples from each gender and ages ranging from 20 to 99 years (Figure 1C, Table S1). Because these datasets originate from distinct studies with different sampling and library preparation methods, there are clear batch effects after Counts Per Million (CPM) normalization (Figure S1A,B). The ComBat function from the R package sva (v3.40.0) in Bioconductor^23 was employed to reduce batch effects that could distort cross‐study results (Figure S1C,D). These corrected data were used for the correlation measurements and machine learning training described below.

FIGURE 1. Identifying practical computational models of healthy aging via plasma and serum small noncoding RNAs (sncRNAs).
(A) Flow chart of data preprocessing, normalization, batch effect correction, and analyses of 446 blood samples. (B) Proportion of plasma and serum samples from healthy donors. (C) Distribution of age and gender in plasma and serum.

2.2. Identification of expressed sncRNAs in plasma and serum

To determine which sncRNAs are expressed during aging, we considered sncRNAs with ≥1 CPM in at least 30% of individuals within an age group (young, 20–30 years; adult, 31–60 years; aged, ≥61 years) as expressed. As a result, 7953 and 6476 sncRNAs were observed in plasma and serum samples, respectively (Figure 1A). We then identified highly expressed sncRNAs by raising the minimum CPM to 10, which retained 1243 and 1139 sncRNAs in plasma and serum samples, respectively (Figure 1A, Table S2). In terms of the distribution of sncRNA subtypes across the three age groups, miRNAs account for a high proportion (26.5%–63.4%) of all sncRNAs in both plasma and serum, and their abundance consistently decreased with age (Figure 2A,B). tRNAs increased and became the dominant sncRNA subtype in the aged group, while miRNA expression was reduced in older individuals (Figure 2A,B). The corresponding mapped reads are proportional to the number of each highly expressed subtype, although miRNAs showed relatively more sequencing reads than other subtypes in both plasma and serum (Figure 2C,D).

FIGURE 2. Highly expressed sncRNAs in plasma and serum. Subtype distribution of highly expressed sncRNAs, which meet the expression cutoff (≥10 CPM in ≥30% of samples), among young (20–30 years), adult (31–60 years), and aged (≥61 years) individuals in plasma (A) and serum (B). Total sequencing reads of highly expressed sncRNAs among the three age groups in plasma (C) and serum (D).

2.3. Exploring the correlation between sncRNAs and human aging

We calculated the maximal information coefficient (MIC)^24 to investigate both linear and nonlinear associations between sncRNA expression and individual age. Using batch‐corrected data for expressed sncRNAs, we identified 364 and 1941 age‐related sncRNAs from plasma and serum, respectively (Figure 3A,B, Table S3). Intriguingly, piRNAs were the most abundant sncRNAs among those identified by MIC, followed by snRNAs (Figure S2A,B). Similarly, the over‐represented biological processes of miRNA targets were identified: cellular response and epigenetic modification were enriched in plasma (Figure 3C), while biosynthetic processes were significantly enriched in serum samples (Figure 3D).

FIGURE 3. Identification of age‐related sncRNAs. MIC‐based age‐related sncRNAs in plasma (A) and serum (B), identified by both MIC and total information coefficient (TIC) values ≥0.7. Over‐representation analysis of biological processes of MIC‐based age‐associated miRNA targets in plasma (C) and serum (D) (p‐adjusted value <0.05).

2.4. Core feature selection of age‐related sncRNAs

As the expression of sncRNAs changes with age, further data‐driven analysis was conducted to construct a human aging clock. MIC‐based age‐correlated sncRNAs were used as inputs to train regression models in plasma and serum samples. Compared with linear models, such as Linear Regression (without feature selection) and Elastic Net (feature selection through regularization), the tree‐based ensemble machine learning methods (Adaptive Boosting, Gradient Boosting, and Random Forest regressors) showed stronger predictive power and better accuracy (Figure 4), owing to their capacity to learn the underlying nonlinear patterns.
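The accuracy measures used to compare these models (R^2, RMSE, and MAE) follow standard definitions. The sketch below shows how they can be computed for one set of predictions. It is illustrative only: the function name and the example ages are hypothetical and are not taken from the study's pipeline or data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSE, MAE) for paired true and predicted ages in years."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)            # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return r2, rmse, mae


# Hypothetical ages, invented for illustration (not study data).
true_ages = [20.0, 40.0, 60.0, 80.0]
predicted_ages = [22.0, 38.0, 61.0, 79.0]
r2, rmse, mae = regression_metrics(true_ages, predicted_ages)
print(f"R^2={r2:.3f}  RMSE={rmse:.2f} years  MAE={mae:.2f} years")
```

RMSE penalizes large errors more heavily than MAE, which is why an aging clock can show an RMSE noticeably larger than its MAE when a few individuals are predicted poorly.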
With consistently strong performance on the test subsets (Table S4), all three ensemble models trained on age‐correlated sncRNAs (MIC_plasma and MIC_serum) accurately predicted the ages of the corresponding individuals in the test sets, with average R^2 values greater than 0.96, root mean squared error (RMSE) values less than 3.7 years, and mean absolute error (MAE) values less than 1.9 years (Figure 4A–C).

FIGURE 4. Performance evaluation of sncRNA‐based aging clocks built by Linear Regression, Elastic Net, Adaptive Boosting, Gradient Boosting, and Random Forest approaches. Summary of R^2 value (A), root mean squared error (RMSE) (B), and mean absolute error (MAE) (C). (D) Model fit based on plasma MIC‐based associated sncRNAs. (E) Model fit based on serum MIC‐based associated sncRNAs. All model fits were constructed using the Adaptive Boosting method.

Given the strong generalization ability of all ensemble learning methods, core sncRNAs associated with aging were determined by combining the three methods, with the sum of importance ranks across the three methods as the criterion for core sncRNA identification. As a result, 222 and 321 core sncRNAs overlapped across all three methods with MIC_plasma and MIC_serum as inputs, respectively (Table S5). In particular, four snRNAs, three piRNAs, two small cytoplasmic RNAs (scRNAs), and one miRNA were identified as the top core sncRNAs in plasma (Table 1). In serum, seven snRNAs, two tRNAs, and one small cytoplasmic RNA were identified as the top core sncRNAs (Table 2).

TABLE 1. Top core sncRNAs associated with age in plasma

Model input   Gene name           RNA type   Adaboost   GB   RF   Sum of rank
MIC_plasma    piR‐33748           piRNA      3          1    1    5
MIC_plasma    U5‐L97              snRNA      1          2    2    5
MIC_plasma    HY3‐L319            scRNA      2          6    3    11
MIC_plasma    U6‐L1016            snRNA      7          7    4    18
MIC_plasma    hsa‐miR‐11181‐3p    miRNA      6          13   14   33
MIC_plasma    HY1‐L12             scRNA      22         4    10   36
MIC_plasma    piR‐61840‐L3        piRNA      16         3    18   37
MIC_plasma    U7‐L212             snRNA      18         17   11   46
MIC_plasma    piR‐49811           piRNA      20         8    23   51
MIC_plasma    U1‐L72              snRNA      36         5    26   67

Note: Importance ranking from three ensemble learning methods and corresponding sum of rank.
Abbreviations: Adaboost, Adaptive Boosting; GB, Gradient Boosting; RF, Random Forest.

TABLE 2. Top core sncRNAs associated with age in serum

Model input   Gene name                RNA type   Adaboost   GB   RF   Sum of rank
MIC_serum     U5‐L192                  snRNA      2          9    6    17
MIC_serum     U6‐L317                  snRNA      27         4    4    35
MIC_serum     HY3‐L199                 scRNA      4          5    36   45
MIC_serum     U2‐L87                   snRNA      34         13   28   75
MIC_serum     U3‐L6                    snRNA      16         56   24   96
MIC_serum     tRNA‐Thr‐AGT‐1‐1         tRNA       96         18   2    116
MIC_serum     tRNA‐Ala‐AGC‐8‐1‐tRF5    tRNA       37         44   71   152
MIC_serum     U2‐L1053                 snRNA      135        3    15   153
MIC_serum     U3‐L119                  snRNA      109        41   3    153
MIC_serum     U6‐L1640                 snRNA      136        10   13   159

Note: Importance ranking from three ensemble learning methods and corresponding sum of rank.
Abbreviations: Adaboost, Adaptive Boosting; GB, Gradient Boosting; RF, Random Forest.

Notably, we also observed gender‐specific model performance. When male‐only samples were used as the training set to predict female‐only test sets, or vice versa, some core sncRNAs were unique to one gender (Figure S3A,B and Table S6), and performance, as measured by R^2 and RMSE, was slightly lower than for models trained on gender‐mixed data (Figure S3C,D).

2.5. Core miRNAs are involved in aging‐related processes

To gain further insight into the potential functions of extracellular sncRNAs in the microenvironment, we focused on miRNAs, which are well characterized in post‐transcriptional gene regulation.
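Returning briefly to the sum‐of‐rank criterion behind Tables 1 and 2, the idea can be sketched in a few lines. The example uses four plasma rows from Table 1 as input. The function name, the restriction to features ranked by every model, and the alphabetical tie‐break are assumptions made for illustration, not a description of the authors' code.

```python
def sum_of_ranks(rank_tables):
    """Sum each feature's importance rank across models (rank 1 = most important).

    rank_tables maps a model name to a {feature: rank} dictionary.
    Only features ranked by every model are kept (an assumption of this sketch).
    Returns (feature, total_rank) pairs sorted from smallest to largest total;
    ties are broken alphabetically by feature name (also an assumption).
    """
    if not rank_tables:
        raise ValueError("at least one model ranking is required")
    per_model = list(rank_tables.values())
    shared = set(per_model[0])
    for ranks in per_model[1:]:
        shared &= set(ranks)
    totals = {feature: sum(ranks[feature] for ranks in per_model) for feature in shared}
    return sorted(totals.items(), key=lambda item: (item[1], item[0]))


# Importance ranks for four plasma sncRNAs, copied from Table 1.
ranks = {
    "Adaboost": {"U5-L97": 1, "HY3-L319": 2, "U6-L1016": 7, "U1-L72": 36},
    "GB": {"U5-L97": 2, "HY3-L319": 6, "U6-L1016": 7, "U1-L72": 5},
    "RF": {"U5-L97": 2, "HY3-L319": 3, "U6-L1016": 4, "U1-L72": 26},
}
core = sum_of_ranks(ranks)
for feature, total in core:
    print(feature, total)
```

A feature such as U1‐L72 illustrates why ranks are combined: it is ranked fifth by Gradient Boosting but much lower by the other two methods, so its summed rank places it below features that all three methods agree on.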
The top‐ranked miRNAs with the largest importance scores in plasma and serum, hsa‐miR‐11181‐3p and hsa‐miR‐7845‐5p, respectively (Table S5), were selected, and their targets were predicted separately by integrating eight miRNA databases. The expression profiles of these two miRNAs in the three age groups are shown in Figure S4, and the corresponding targets are listed in Table S7. As expected, these miRNA targets are enriched in canonical cell–cell communication pathways such as the Sulfur relay system and Endocytosis pathways, as well as the Immune development, Asthma, and Ras signaling pathways, which are closely related to immune dysfunction and tumorigenesis during aging (Figure 5A).

FIGURE 5. Top core miRNAs are associated with human aging and aging‐related disease. (A) KEGG pathway enrichment analysis of core miRNA targets. Pathway terms are ranked by combined score in Enrichr.^73 (B) Interaction network among core miRNAs (in red), targets (in blue), and corresponding regulatory proteins (in purple). Only targets and interacting proteins with validated functions in cell senescence, human aging, and longevity (information from HAGR) are shown.

We also investigated the association between miRNA targets and protein‐coding genes previously validated in the human aging process from Human Aging Genomic Resources (HAGR),^25 and we found that targets including DDIT3, HLA‐DQA1, PTK2B, TTR, and YWHAG have been experimentally linked to cancer progression, senescence, aging, and longevity (Table S8). Based on protein–protein interaction enrichment analysis, these targets have regulatory relationships with hallmark proteins such as PIK3R1, STAT3, IL7R, and JAK2 (Figure 5B and Table S9), which function in cancer, immune response, and intercellular signal transduction. This raises the possibility that non‐miRNA sncRNAs also function in aging and aging‐related diseases.

3. DISCUSSION

Our study comprehensively profiled the relationship between extracellular sncRNAs and age in blood and built an aging clock for healthy individuals using sncRNAs linearly and nonlinearly correlated with age. Previously, age predictors were developed from DNA methylation sites,^26 transcriptome expression,^27, ^28 repeat elements,^29 microRNAs,^2 and protein abundance.^30 This study provides the first detailed analysis of the relationship between circulating sncRNAs and age based on regression models and core sncRNAs whose expression changes with age, allowing reliable age prediction.

Previous studies have reported differences in small RNA composition across human biofluids. Godoy et al.^31 profiled 12 normal human biofluids, including plasma and serum. Among mapped RNA sequencing (RNA‐seq) reads, miRNAs made up a relatively high fraction in adult plasma (63.8906%, median) compared with serum (36.0154%, median). In contrast, tRNA‐mapped reads were the most abundant RNA biotype in serum (42.2067%, median), whereas the median was 0.7759% in adult plasma. Another study determined the diversity of small RNAs in different biofluids, and tRNAs showed the largest percentage of mapped reads in serum (39.7%) compared with plasma (5.8%) and whole blood (2.1%).^32 Max et al.^33 characterized extracellular RNAs (exRNAs) from both plasma and serum samples of the same healthy volunteers and found substantial differences in small RNA composition, with a higher proportion of miRNA in plasma and more tRNA reads in serum. Some of our serum and plasma samples come from the same individuals (Table S1), and we observed consistent results (Figure 2).
Max et al.^33 also concluded that, even though plasma and serum come from the same origin, the biofluid type is a significant variable that impacts the exRNA profile. One reason is that additional absorption and continuous degradation of exRNAs by the retained blood clot reduce exRNA abundance.^33 Proper exRNA isolation is therefore essential, and immediate depletion of platelets and cell debris during plasma collection may minimize the loss of exRNA characteristics.

It is of interest that we detected an increase in highly expressed tRNAs in aged individuals. Spleen and brain have been reported to have the highest tRNA expression,^34 which may indicate that distinct biological processes occur as individuals age. A previous report similarly found that tRNAs were the second most abundant sncRNAs in healthy adults (20–40 years); small cytoplasmic RNAs were not mentioned in that report.^35 Unlike tRNAs, which drive protein synthesis, tRNA‐derived small RNAs (tsRNAs), including tRNA‐derived fragments (tRFs) and stress‐induced tRNA halves (tiRNAs), have been identified as aging‐related sncRNAs.^36 As in human studies, the expression of tsRNAs increased during aging in Drosophila,^37 C. elegans,^38 and mouse brain cells.^39 Differential expression of tsRNAs between patients and healthy controls has been used for prediction of age‐related diseases such as Alzheimer's disease and Parkinson's disease,^40 ischaemic stroke,^41 and osteoporosis.^42 tsRNAs serve not only as potential biomarkers but also as regulators of the expression of age‐related mRNAs.^36 For example, 5′‐tRF^Tyr from tyrosine pre‐tRNA can silence PKM2, an inhibitor of p53, to cause p53‐dependent neuronal death.^43

The number of highly expressed miRNAs in our study tended to decrease in the older group, in both plasma and serum.
Both core miRNAs identified by the machine learning models had reduced expression as age increased, similar to the decreased expression of a majority of age‐associated miRNAs in whole blood,^2 serum,^44 and peripheral blood mononuclear cells.^45 It has previously been demonstrated that circulating sncRNAs from serum samples show a strong association with human aging,^46 but a regression‐based model of human aging had not yet been built. In our study, the potential function of core sncRNAs was predicted via miRNA target prediction, and these targets showed enrichment in cancer, cell cycle, and longevity‐regulating pathways. Some genes are shared between the cancer and longevity regulation pathways, and this result is consistent with an earlier study that profiled miRNA expression in young and old individuals.^45 For example, increased PIK3R1 expression has been found to impair the anti‐tumor effect of chemotherapy through PI3K‐Akt activation in breast and ovarian cancer.^47, ^48 Previous research determined that the protein level of p85α, the PI3K regulatory subunit encoded by PIK3R1, was elevated with age, and that age‐associated miRNAs potentially targeting PIK3R1 were downregulated.^45 Studies of human aging also show that sequence variations within the PIK3R1 gene are significantly correlated with longevity,^49 and that particular PIK3R1 genotypes were associated with longevity through reduced mortality risk from cardiovascular disease.^50 Interestingly, both core miRNAs (hsa‐miR‐11181‐3p and hsa‐miR‐7845‐5p), which are potentially involved in PIK3R1 regulation (Figure 5B), showed lower expression in aged individuals (Figure S4).
hsa‐miR‐11181‐3p has been used as a biomarker to distinguish glioma from other brain tumor types.^51 By suppressing the Wnt signaling inhibitor APC2, overexpression of hsa‐miR‐11181‐3p can promote the Wnt signaling pathway and increase cell viability in a colon malignant tumor cell line.^52 For hsa‐miR‐7845‐5p, serum expression has been used to construct a diagnostic classifier for ovarian cancer,^53 and higher expression was also observed in the serum of patients with persistent atrial fibrillation.^54

Some direct targets of the core miRNAs have been identified as drivers of age‐related processes. For example, protein tyrosine kinase 2β (PTK2B) is a tyrosine kinase activated by angiotensin II through Ca^2+‐dependent pathways to regulate ion channels as well as the MAP kinase signaling pathway.^55 Once activated, PTK2B is involved in cell growth, inflammatory response, and osmotic pressure regulation, and mutated PTK2B is statistically associated with hypertension in the Japanese population.^56 PTK2B has also been implicated in memory formation, and the corresponding protein variants can trigger cognitive dysfunction and a higher prevalence of Alzheimer's disease.^57 As a nuclear protein activated by DNA damage, DNA‐damage inducible transcript 3 (DDIT3) shows increased expression and prevents gene transcription by dimerizing with transcription factors.^58 Specifically, DDIT3 plays a role in endoplasmic reticulum (ER) protein processing, and the resulting ER stress promotes cardiomyocyte senescence in mouse hearts.^59 The function of most of the age‐associated sncRNAs identified in this study is unknown, and further investigation of their function may provide meaningful results.

We also observed mild sex‐dependent differences in the aging clock modeling.
Similarly, a previous study indicated that sncRNA differences between genders were minor,^33 and sex‐specific training sets yielded relatively low prediction performance compared with gender‐mixed training sets. During this process, some gender‐dependent core sncRNAs were identified, including the male‐specific sncRNAs piR‐31143 and piR‐48977 in plasma and piR‐33527 and piR‐57256 in serum, and the female‐specific sncRNAs hsa‐miR‐3789 and U5‐L214 in plasma and U6‐L989 and piR‐30597 in serum (Table S6). Further mechanistic study is needed to uncover their prospective roles in aging and aging‐related disease.

A major limitation of our current study is that the datasets used were generated by different researchers for distinct projects with multiple RNA extraction protocols, which may bias extracellular RNA abundance.^35 Furthermore, traits such as ethnicity, body mass, and smoking habits were not considered in our study because the information was lacking. More sophisticated and systematic sample processing and recording would help future research on big data‐based modeling of human aging.

In conclusion, we provide novel insight into the circulating sncRNA profile of human aging. We developed predictive models to uncover core sncRNAs and estimate age using meta‐analysis‐based correlation measurement and machine learning modeling. The sncRNA dynamics with age provide valuable references for extracellular RNA