Abstract

This manuscript describes the design and protocol of the Biobank for Metabolic Syndrome Consequences (BMSC), a prospective cohort study in Northwest China. Metabolic syndrome (MetS) is characterized by a cluster of interrelated disorders, including abdominal obesity, hyperglycemia, hypertension, and dyslipidemia. The presence of three or more of these conditions markedly increases the risk of multiple chronic diseases and mortality. However, the pathophysiology and natural course of MetS and its consequences remain insufficiently understood, and longitudinal research that combines biomarkers with repeated measurements over multiple time points is needed. The BMSC, launched in August 2021 and still ongoing, is a prospective observational study that aims to enroll 2000 Chinese participants aged 18 to 75 years with MetS or related metabolic disorders living in Northwest China. At baseline, data on sociodemographics, disease history, behavior and lifestyle, and mental health are collected using a structured questionnaire. Anthropometric measurements are performed by trained researchers. Fasting peripheral venous blood, urine, stool, and hair samples are collected according to standardized protocols. Extensive physical examinations are conducted in specific subgroups. Participants will be followed up every 3 months for at least 5 years for the incidence of MetS-related outcomes, such as cardiovascular disease, with clinical data and biological samples collected repeatedly using the same procedures as at baseline. Findings from the BMSC may contribute to improved prevention, early diagnosis, and personalized treatment of MetS-related conditions.

Keywords: metabolic syndrome, obesity, diabetes, biobank, study design

Background

Metabolic syndrome (MetS) is a cluster of chronic conditions, including abdominal obesity, hyperglycemia, hypertension, and dyslipidemia,^1 and has become a major global public health concern.
The prevalence of MetS varies depending on the diagnostic criteria used. A meta-analysis of global data from 28 million individuals reported a prevalence of 12.5% to 31.4% among adults worldwide.^2 In China, the prevalence of MetS increased from 8.8% in 1991–1995 to 29.3% in 2011–2015,^3 and recent estimates reach 29.2% in men and 35.4% in women,^4 paralleling the steep rise in the prevalence of obesity and diabetes. These high estimates call for a greater focus on MetS prevention. MetS and its components are risk factors for cardiometabolic diseases (CMDs), such as type 2 diabetes (T2D), metabolic dysfunction-associated steatotic liver disease (MASLD), and cardiovascular diseases (CVDs),^5–8 and contribute to an increased mortality risk.^8–10 Comorbidities such as chronic kidney disease (CKD),^11,12 chronic obstructive pulmonary disease (COPD),^13,14 depression,^15 and cognitive impairment^16 are also more prevalent in populations with MetS. Given the enormous public health and economic burden of MetS, it is essential to understand its underlying pathophysiology, its natural course, and its consequences.

The etiology and mechanisms of MetS are heterogeneous and remain to be fully elucidated. Its development is influenced by modifiable risk factors, including obesity, sedentary behavior, and a high-fat diet.^17,18 However, the pathways through which these factors contribute to MetS, and their interactions with genetic predisposition, are still not fully understood. Furthermore, evidence on long-term dynamic changes in risk factors and their impact on the progression and consequences of MetS remains scarce, largely because few prospective cohorts have collected repeated measurements.
Mechanistic studies have suggested that defective adipocyte differentiation and excessive visceral fat accumulation can drive the sustained release of proinflammatory cytokines, which may serve as primary triggers of MetS.^19,20 Against this background, novel biomarkers may help to better characterize the pathophysiology of MetS and its consequences; however, biomarkers suitable for the early diagnosis of MetS-related diseases, such as metabolic dysfunction-associated steatohepatitis (MASH), are still lacking. Early identification of high-risk subgroups within populations with MetS is of great value for reducing the risk of serious complications, underscoring the importance of developing such biomarkers.

Although many prospective cohorts have been established, they were largely community-based and focused broadly on the prevention of MetS and related metabolic disorders, with limited emphasis on in-depth mechanistic exploration. Several large-scale biobanks, such as UK Biobank in the United Kingdom, the China Kadoorie Biobank in China, FinnGen in Finland, and the All of Us Research Program in the United States, have provided valuable insights into the epidemiology, genetic architecture, and risk factors of chronic diseases, including metabolic disorders. Their success has demonstrated the power of biobank-based resources in advancing disease prevention and precision medicine. However, few of these cohorts have incorporated multiple repeated measurements during follow-up, particularly after the onset of MetS, which restricts their ability to capture dynamic changes over time. Moreover, none of these biobanks was specifically designed with MetS as its primary focus, nor did they systematically integrate multi-omics profiling. In addition, dietary behavior is closely related to the occurrence and development of metabolic diseases.
Northwest China is characterized by a high-carbohydrate diet and distinct lifestyle patterns, which may lead to a disease profile that differs from those of other regions. To date, no large-scale study with repeated follow-up assessments and ample biological samples has been established for this population. The BMSC cohort was therefore established to fill this gap by combining longitudinal repeated measurements with multi-omics data in an underrepresented population. To combine biomarkers with longitudinal data and thereby better understand the pathophysiology and natural course of MetS, we aim to establish a comprehensive databank and biobank in Northwest China, i.e., the BMSC. This biobank is designed to 1) further understand the pathophysiology and consequences of MetS; 2) explore novel biomarkers for the prevention and diagnosis of MetS and its consequences; and 3) provide useful information for the treatment of MetS and related disorders through a longitudinal observational design. By providing longitudinal multi-omics data in a well-characterized population, the BMSC may also inform personalized prevention strategies, enable early risk prediction, and guide targeted interventions for MetS-related conditions. This paper describes the study design and the data collection protocol in detail.

Study Population and Design

Study Population

Men and women aged 18 to 75 years are invited to participate in the BMSC study if they have been diagnosed with at least one of the following metabolic disorders: pre-diabetes or T2D, body mass index (BMI) ≥ 24.0 kg/m², hypertension, or dyslipidemia. The definitions of these metabolic disorders are based on established clinical guidelines and are applied uniformly to all participants.
Specifically, pre-diabetes is defined as a fasting plasma glucose (PG) level of 5.6–6.9 mmol/L or a 2-hour PG level of 7.8–11.0 mmol/L during an oral glucose tolerance test (OGTT), while diabetes is defined as a fasting PG ≥ 7.0 mmol/L, a 2-hour PG ≥ 11.1 mmol/L, or a glycated hemoglobin (HbA1c) ≥ 6.5%.^21 Hypertension is defined as a systolic blood pressure ≥ 140 mm Hg, a diastolic blood pressure ≥ 90 mm Hg, or current use of antihypertensive medication, in line with the Chinese Hypertension Guidelines (2020, in Chinese). Dyslipidemia is defined as meeting any of the following criteria: total cholesterol ≥ 5.2 mmol/L, low-density lipoprotein cholesterol (LDL-C) ≥ 3.4 mmol/L, triglycerides ≥ 1.7 mmol/L, or high-density lipoprotein cholesterol (HDL-C) < 1.0 mmol/L.^22 Individuals who have lived locally for more than 1 year and are willing to accept long-term follow-up are eligible for the BMSC study. Individuals are excluded if they 1) are judged to have a life expectancy of less than 5 years; 2) are addicted to drugs or have a history of drug abuse; 3) have viral hepatitis, a sexually transmitted disease such as AIDS or syphilis, or another infectious disease such as active tuberculosis; or 4) have any other circumstances that could affect enrollment. The study has been approved by the Institutional Review Board of the First Affiliated Hospital of Xi’an Jiaotong University (Xi’an, China). Investigators explain the research aims and protocol to eligible participants face to face, and those who agree to participate sign informed consent for participation, for the storage of biological samples (blood, urine, stool, and hair), and for access to their medical records at both baseline and follow-up. The BMSC study started in August 2021 and is still ongoing. By August 2024, a total of 1589 participants had been recruited, of whom 951 (59.8%) were men; the mean age was 47 years.
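The inclusion thresholds above can be expressed as simple rules. The Python sketch below is illustrative only and is not the study's screening software: the `Measures` class and all function names are hypothetical, it uses measured values (plus current antihypertensive use) rather than previously recorded diagnoses, it treats the stated ranges as half-open intervals so that values reported to 0.1 mmol/L fall into exactly one category, and it does not model residence, willingness to be followed up, or the exclusion criteria.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Measures:
    """Hypothetical container for screening values (units as in the protocol)."""
    age: int
    fpg: Optional[float] = None       # fasting plasma glucose, mmol/L
    pg_2h: Optional[float] = None     # 2-hour OGTT plasma glucose, mmol/L
    hba1c: Optional[float] = None     # glycated hemoglobin, %
    sbp: Optional[float] = None       # systolic blood pressure, mm Hg
    dbp: Optional[float] = None       # diastolic blood pressure, mm Hg
    on_antihypertensives: bool = False
    tc: Optional[float] = None        # total cholesterol, mmol/L
    ldl_c: Optional[float] = None     # LDL cholesterol, mmol/L
    tg: Optional[float] = None        # triglycerides, mmol/L
    hdl_c: Optional[float] = None     # HDL cholesterol, mmol/L
    bmi: Optional[float] = None       # body mass index, kg/m^2


def _ge(value: Optional[float], threshold: float) -> bool:
    """True only if the value was measured and reaches the threshold."""
    return value is not None and value >= threshold


def _in(value: Optional[float], low: float, high: float) -> bool:
    """Half-open interval [low, high) so that adjacent categories do not overlap."""
    return value is not None and low <= value < high


def glycemic_status(m: Measures) -> str:
    if _ge(m.fpg, 7.0) or _ge(m.pg_2h, 11.1) or _ge(m.hba1c, 6.5):
        return "diabetes"
    if _in(m.fpg, 5.6, 7.0) or _in(m.pg_2h, 7.8, 11.1):
        return "prediabetes"
    return "normal"


def has_hypertension(m: Measures) -> bool:
    return _ge(m.sbp, 140) or _ge(m.dbp, 90) or m.on_antihypertensives


def has_dyslipidemia(m: Measures) -> bool:
    low_hdl = m.hdl_c is not None and m.hdl_c < 1.0
    return _ge(m.tc, 5.2) or _ge(m.ldl_c, 3.4) or _ge(m.tg, 1.7) or low_hdl


def meets_inclusion(m: Measures) -> bool:
    """Age 18-75 plus at least one qualifying metabolic disorder."""
    if not 18 <= m.age <= 75:
        return False
    return (
        glycemic_status(m) != "normal"
        or _ge(m.bmi, 24.0)
        or has_hypertension(m)
        or has_dyslipidemia(m)
    )
```

For example, `meets_inclusion(Measures(age=50, fpg=6.1))` is true because the fasting glucose falls in the pre-diabetes range, whereas `Measures(age=80, fpg=7.5)` is rejected on age alone.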
The characteristics of the participants are presented in Table 1.

Table 1. Baseline Characteristics of Participants in the BMSC Study Up to August 2024

| Characteristics | Total (n = 1589) | Male (n = 951) | Female (n = 638) |
|---|---|---|---|
| Demographic information | | | |
| Age, years, mean ± SD | 46.7 ± 13.7 | 48.0 ± 12.7 | 44.7 ± 14.9 |
| Education, n (%) | | | |
| Senior high school and below | 496 (31.2) | 272 (28.6) | 224 (35.1) |
| College and above | 1093 (68.8) | 679 (71.4) | 414 (64.9) |
| Marital status, n (%) | | | |
| Unmarried | 180 (11.3) | 82 (8.6) | 98 (15.4) |
| Married | 1409 (88.7) | 869 (91.4) | 540 (84.6) |
| Profession, n (%) | | | |
| Employed | 986 (62.1) | 644 (67.7) | 342 (53.6) |
| Unemployed | 603 (38.0) | 307 (32.3) | 296 (46.4) |
| Monthly income, ¥, n (%) | | | |
| < 3000 | 351 (22.1) | 150 (15.8) | 201 (31.5) |
| ≥ 3000 | 1238 (77.9) | 801 (84.2) | 437 (68.5) |
| Census register, n (%) | | | |
| Shaanxi Province | 1315 (82.8) | 803 (84.4) | 512 (80.2) |
| Others | 274 (17.2) | 148 (15.6) | 126 (19.8) |
| Recruitment source, n (%) | | | |
| Hospital | 1420 (89.4) | 861 (90.5) | 559 (87.6) |
| Public areas | 169 (10.6) | 90 (9.5) | 79 (12.4) |
| Anthropometry, mean ± SD | | | |
| Body mass index, kg/m² | 27.9 ± 6.2 | 27.0 ± 5.6 | 29.2 ± 6.8 |
| Waist circumference, cm | 96.9 ± 14.0 | 96.7 ± 12.6 | 97.1 ± 16.0 |
| Hip circumference, cm | 100.9 ± 11.1 | 99.5 ± 9.7 | 102.9 ± 12.7 |
| Biceps circumference, cm | 31.3 ± 4.8 | 30.8 ± 4.1 | 32.0 ± 5.6 |
| Thigh circumference, cm | 53.6 ± 8.3 | 52.2 ± 7.0 | 55.6 ± 9.5 |
| Biological samples, n (%) | | | |
| Fasting blood | 1589 (100.0) | 951 (100.0) | 638 (100.0) |
| Urine | 1510 (95.0) | 914 (96.1) | 596 (93.4) |
| Stool | 1307 (82.3) | 790 (83.1) | 517 (81.0) |
| Hair | 1248 (78.5) | 632 (66.5) | 616 (96.6) |
| Metabolic disorders, n (%) | | | |
| Hyperglycemia | 1286 (80.9) | 848 (89.2) | 438 (68.9) |
| Obesity | 598 (37.6) | 283 (29.8) | 315 (49.4) |
| Hypertension | 771 (48.5) | 490 (51.5) | 281 (44.0) |
| Dyslipidemia | 1062 (66.8) | 674 (70.9) | 388 (60.8) |
| MetS | 1046 (65.8) | 653 (68.7) | 393 (61.6) |
| Number of MetS components, n (%) | | | |
| 0 | 13 (0.8) | 4 (0.4) | 9 (1.4) |
| 1 | 184 (11.6) | 100 (10.5) | 84 (13.2) |
| 2 | 346 (21.8) | 194 (20.4) | 152 (23.8) |
| 3 | 425 (26.8) | 261 (27.4) | 164 (25.7) |
| 4 | 406 (25.6) | 267 (28.1) | 139 (21.8) |
| 5 | 215 (13.5) | 125 (13.1) | 90 (14.1) |

Abbreviations: BP, blood pressure; BMSC, Biobank for Metabolic Syndrome Consequences; FBG, fasting blood glucose; HDL-C, high-density lipoprotein cholesterol; MetS, metabolic syndrome; TG, triglyceride.

Recruitment Strategies

Participants are recruited through two strategies. First, patients presenting to the Department of Endocrinology and Metabolism at the First Affiliated Hospital of Xi’an Jiaotong University are evaluated by physicians, and those who meet the inclusion criteria for the BMSC study are invited to participate and referred to the investigators. Second, participants are recruited through online advertisements and posters displayed in public areas.

Study Design

The BMSC study is a prospective observational study of participants aged 18–75 years. For the baseline survey, eligible participants attend the research center or the inpatient ward after an overnight fast. Trained researchers perform anthropometric measurements according to standard methods. At the first visit, participants also complete a structured questionnaire reporting basic demographic information, history of disease and medication use, family history of disease, lifestyle and behaviors, physical activity, diet, life events, symptoms of osteoarthritis, sleep, depression, and anxiety. All participants are asked to provide biological samples according to the study protocol. Professional nurses collect fasting blood, and trained researchers collect hair; participants collect their own urine and stool after the researchers explain the sampling methods. Additional examinations, such as bioelectrical impedance analysis (BIA), electrocardiography (ECG), ultrasonography, electromyography (EMG), and fundus photography, are also performed. Echocardiography is performed in a subgroup of patients for whom it is indicated after their physician’s evaluation. The measurements are listed in Table 2.

Table 2.
Overview of Measurements within the BMSC Study

| Measurements | Content |
|---|---|
| Questionnaire | Demographic information, socioeconomic status, history of disease and medication use, disease history of first-degree relatives, smoking behavior, alcohol consumption, physical activity, life events, dietary intake, sleep, symptoms of osteoarthritis, mental health (anxiety and depression), birth weight, and weight change |
| Anthropometry | Height, weight, waist circumference, hip circumference, abdomen circumference, biceps circumference, and thigh circumference |
| Biobanking | Fasting serum, EDTA-plasma, RNA, and DNA; spot morning urine samples; stool samples; hair samples |
| Physical examinations | Blood pressure, heart rate, and area of subcutaneous and visceral fat; bioelectrical impedance analysis, electrocardiogram, electromyography, ultrasonography (neck vessels and lower limbs), and echocardiography |

Abbreviation: BMSC, Biobank for Metabolic Syndrome Consequences.

After enrollment, participants are followed up every 3 months until the 5-year mark. Anthropometry is performed at every follow-up visit, and biological samples are collected at every second follow-up visit. Changes in metabolic indicators and the incidence of CMDs and mortality are followed for at least 5 years (Figure 1).

Figure 1. The data framework of the Biobank for Metabolic Syndrome Consequences study. ① Height, weight, WC, BP; ② Smoking, drinking, sleeping, and dietary intake; ③ Depression, anxiety; ④ Blood, urine, stool, and hair; ⑤ CMDs, MetS-related complications; ⑥ Bioelectrical impedance analysis, electrocardiogram, and ultrasonography, etc.; ⑦ Glucose, glycosylated hemoglobin, serum lipid profile, renal function, and liver function, etc. Blue text indicates data actively collected by researchers as part of cohort requirements, whereas green text represents clinical data, with specific tests determined primarily by physicians according to patients’ conditions.
Yellow and red text denote follow-up time points. Abbreviations: BP, blood pressure; CMD, cardiometabolic disease; MetS, metabolic syndrome; WC, waist circumference.

Data Collection

Questionnaires

At baseline, participants complete the structured questionnaire, which includes questions on basic characteristics; history of diseases (including but not limited to diabetes mellitus, hypertension, hyperlipidemia, hyperuricemia, polycystic ovary syndrome, coronary heart disease, stroke, congestive heart failure, peripheral arterial disease, cancer, and thyroid disease) and medication use; disease history of first-degree relatives; lifestyle and behaviors; physical activity; life events; dietary intake; sleep; symptoms of osteoarthritis; and mental health (anxiety and depression). Socioeconomic data, including education level, occupation, household income, and insurance status, are collected through standardized, interviewer-administered questionnaires. Details on the requested information are as follows.

Physical Activity

Participants’ physical activity is evaluated using the Chinese version of the long form of the International Physical Activity Questionnaire (IPAQ), which has been well validated in the Chinese population.^23,24

Dietary Intake

Participants report their long-term dietary intake using a semi-quantitative food frequency questionnaire (FFQ). Because the FFQ is relatively easy to use, inexpensive, and reflects long-term dietary patterns, it has become the most common tool for assessing diet in large epidemiological studies.
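A semi-quantitative FFQ records how often each food is eaten and a typical portion size, from which average daily intake can be derived. The Python sketch below is a generic illustration of that conversion, not the BMSC scoring procedure: the frequency periods, the 30-day month and 365-day year, the example foods and portion sizes, and all class and function names are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumed number of days per reporting period (30-day month, 365-day year).
DAYS_PER_PERIOD = {"day": 1, "week": 7, "month": 30, "year": 365}


@dataclass
class FFQResponse:
    """One hypothetical FFQ answer: how often an item is eaten and how much per occasion."""
    item: str
    category: str      # food category, e.g., "grain" or "fruit"
    times: float       # number of occasions per period
    period: str        # key of DAYS_PER_PERIOD
    portion_g: float   # grams consumed per occasion


def grams_per_day(r: FFQResponse) -> float:
    """Average daily intake = (occasions per day) x (grams per occasion)."""
    if r.period not in DAYS_PER_PERIOD:
        raise ValueError(f"unknown period: {r.period!r}")
    return r.times / DAYS_PER_PERIOD[r.period] * r.portion_g


def intake_by_category(responses: list[FFQResponse]) -> dict[str, float]:
    """Sum item-level daily intakes within each food category."""
    totals: dict[str, float] = defaultdict(float)
    for r in responses:
        totals[r.category] += grams_per_day(r)
    return dict(totals)


# Hypothetical answers for one participant.
answers = [
    FFQResponse("steamed bun", "grain", 2, "day", 100),
    FFQResponse("noodles", "grain", 3, "week", 210),
    FFQResponse("apple", "fruit", 4, "week", 175),
]
print(intake_by_category(answers))
```

Here two steamed buns a day (200 g/day) plus noodles three times a week (90 g/day) give about 290 g/day of grain; in practice, category totals would then feed into nutrient calculations using a food composition table.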
The FFQ used in the BMSC study covers eighteen food categories (grain, 11 items; beans, 8 items; fresh beans, 5 items; root vegetables, 5 items; melon vegetables, 7 items; leaf vegetables, 16 items; fruit, 4 items; nuts, 4 items; animal meat, 13 items; poultry, 5 items; milk, 1 item; eggs, 4 items; aquatic products, 8 items; fungi, 5 items; pickles, 4 items; alcohol and other beverages, 5 items; oil, 6 items; and condiments, 8 items) as well as nutritional supplements.

Sleep

Sleep is assessed using three questionnaires: the Epworth Sleepiness Scale (ESS) for daytime sleepiness,^25 the Pittsburgh Sleep Quality Index (PSQI) for sleep quality,^26 and the Berlin Questionnaire for the presence of obstructive sleep apnea syndrome.^27

Symptoms of Osteoarthritis

Symptoms of osteoarthritis are assessed using the Knee Injury and Osteoarthritis Outcome Score (KOOS), which was developed to capture patients’ opinions about their knee health and related problems and has been validated in the Chinese population.^28

Mental Health

Both depression and anxiety are evaluated. The Zung Self-Rating Depression Scale (SDS) assesses depressive symptoms with 20 items, and anxiety is assessed using the Beck Anxiety Inventory (BAI). Both the SDS^29 and the BAI^30 have been well validated in the Chinese population.

Physical Examination

All participants undergo a physical examination at baseline, including anthropometric measurements, blood pressure, heart rate, and subcutaneous and visceral fat. Height and weight are measured using an ultrasonic height and weight scale (OMRON MEDICAL Beijing Co., Ltd. HNH-318) to the nearest 0.01 m and 0.1 kg, respectively.
Waist circumference is measured with a tape measure, to the nearest 0.1 cm, at the midpoint between the lowest rib and the upper margin of the iliac crest (the narrowest point of the waist), at the end of exhalation and before the beginning of inhalation. Hip circumference is measured at the maximum circumference of the buttocks. Abdomen circumference is measured at the level of the iliac crest at the end of exhalation and before the beginning of inhalation. Biceps circumference is measured at the upper third of the right arm, and thigh circumference at the upper third of the right thigh. Blood pressure and heart rate are measured with an electronic blood pressure monitor (OMRON MEDICAL Beijing Co., Ltd. HEM-7121) after the participant has rested quietly for at least 5 minutes. Subcutaneous and visceral fat areas (cm²) are measured using a visceral fat analyzer (OMRON MEDICAL Beijing Co., Ltd. DUALSCAN HDS-2000).

Advanced examinations such as BIA, ECG, and ultrasonography are also performed. Body composition, including body fat (body fat percentage), muscle mass (skeletal muscle content), and visceral fat (visceral fat grade), is measured by direct segmental multifrequency BIA (DSM-BIA; InBody H20) to the nearest 0.1 kg. A resting 12-lead ECG (Mortara Eli-350) is recorded and archived electronically at baseline. Carotid intima-media thickness is measured by color Doppler ultrasonography of the neck vessels. Peripheral arterial disease is assessed by color Doppler ultrasonography of the lower limbs. Fundus photography of both eyes is performed to determine the presence of diabetic retinopathy. Surface EMG is performed only to assist in the diagnosis of diabetic peripheral neuropathy. Echocardiography is performed in a subset of individuals for whom it is indicated after their physician’s evaluation.
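BMI, which underlies the BMI ≥ 24.0 kg/m² inclusion criterion and the anthropometry rows of Table 1, is derived from the measured height and weight. The minimal Python sketch below assumes height in metres and weight in kilograms, as recorded by the scale described above; the function names are hypothetical, and the waist-to-hip ratio is included only as an example of a ratio that could be derived from the measured circumferences, not as a variable reported in this protocol.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2, rounded to 0.1."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return round(weight_kg / height_m ** 2, 1)


def waist_to_hip_ratio(waist_cm: float, hip_cm: float) -> float:
    """Waist circumference divided by hip circumference, rounded to 0.01."""
    if hip_cm <= 0:
        raise ValueError("hip circumference must be positive")
    return round(waist_cm / hip_cm, 2)


# Example: 80.0 kg at 1.70 m gives 27.7 kg/m^2, above the 24.0 inclusion threshold.
print(bmi(80.0, 1.70))
print(waist_to_hip_ratio(92.0, 100.0))
```

Rounding is applied only to the final value so that the measurement precision (0.01 m, 0.1 kg, 0.1 cm) is not degraded by intermediate rounding.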
Fasting Blood Sampling

A total of 20 mL of peripheral venous blood is drawn after an overnight fast of at least 8 hours and processed into serum, EDTA-plasma, RNA, and DNA, all of which are stored at −80°C for future multi-omics analyses. Separate blood samples are taken for immediate clinical tests, including glucose, HbA1c, the serum lipid profile (total cholesterol, triglycerides, HDL-C, LDL-C, apolipoprotein A1, total apolipoprotein B, and apolipoprotein E), indicators of liver function (aspartate transaminase, alanine transaminase, total protein, albumin, globulin, total bile acids, alkaline phosphatase, γ-glutamyltranspeptidase, total bilirubin, direct bilirubin, and indirect bilirubin), indicators of renal function (urea, cystatin C, uric acid, creatinine, glycated albumin, and retinol-binding protein), indicators of thyroid function (thyroid-stimulating hormone, total and free thyroxine, and total and free triiodothyronine), blood cell count, mean red blood cell (RBC) volume, platelet concentration, platelet distribution width, mean platelet volume, and high-sensitivity C-reactive protein (CRP). Insulin and C-peptide release tests and serum insulin and C-peptide measurements are performed only in a subset of individuals for whom these examinations are clinically indicated.

Urine Sampling

A 15-mL tube of urine is requested from all participants at enrollment. Participants are asked to fast for at least 8 hours before collecting a midstream urine sample. Collected urine samples are stored at −80°C for future analyses. Urine samples for routine analyses are collected separately; creatinine and urinary microalbumin are measured using standard methods in the clinical laboratory.

Stool Sampling

After urination, two tubes of fecal samples are collected. On receipt, researchers snap-freeze the fecal samples in liquid nitrogen and then store them at −80°C for future analyses.
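The collection rules described above (a fast of at least 8 hours before blood and urine collection, storage at −80°C, and snap-freezing of stool in liquid nitrogen) lend themselves to an automated check at sample accession. The Python sketch below is hypothetical: the sample-type keys, the record fields, and the idea of flagging deviations at accession are assumptions for illustration, not a description of the BMSC biobank software.

```python
from dataclasses import dataclass
from typing import Optional

# Protocol parameters taken from the text; the dictionary layout is hypothetical.
PROTOCOL = {
    "serum": {"min_fast_h": 8, "storage_c": -80, "snap_freeze": False},
    "edta_plasma": {"min_fast_h": 8, "storage_c": -80, "snap_freeze": False},
    "rna": {"min_fast_h": 8, "storage_c": -80, "snap_freeze": False},
    "dna": {"min_fast_h": 8, "storage_c": -80, "snap_freeze": False},
    "urine": {"min_fast_h": 8, "storage_c": -80, "snap_freeze": False},
    "stool": {"min_fast_h": None, "storage_c": -80, "snap_freeze": True},
}


@dataclass
class SampleRecord:
    sample_type: str
    fasting_hours: Optional[float]  # None if not recorded
    storage_c: float                # freezer set point, degrees Celsius
    snap_frozen: bool = False


def protocol_deviations(rec: SampleRecord) -> list[str]:
    """Return human-readable deviations; an empty list means the record passes."""
    rule = PROTOCOL.get(rec.sample_type)
    if rule is None:
        return [f"unknown sample type: {rec.sample_type}"]
    problems = []
    min_fast = rule["min_fast_h"]
    if min_fast is not None and (rec.fasting_hours is None or rec.fasting_hours < min_fast):
        problems.append(f"fasting shorter than {min_fast} h or not recorded")
    if rec.storage_c != rule["storage_c"]:
        problems.append(f"stored at {rec.storage_c} °C instead of {rule['storage_c']} °C")
    if rule["snap_freeze"] and not rec.snap_frozen:
        problems.append("not snap-frozen in liquid nitrogen")
    return problems
```

For instance, `protocol_deviations(SampleRecord("urine", 6, -80))` returns a single fasting deviation, while a correctly handled serum aliquot returns an empty list.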
Hair Sampling

Using blunt curved scissors, researchers cut 30 to 50 hairs from the occipital region, as close to the scalp as possible, and retain the 3-cm segment nearest the root. The hair sample is wrapped in tinfoil and stored at room temperature.

Ascertainment of Incident Diseases During Follow-Up

The primary endpoints of the BMSC study include major CMDs, such as CVDs (coronary heart disease, stroke, myocardial infarction, and congestive heart failure), MASLD, mortality, and other MetS-related diseases (T2D, depression, and osteoarthritis). Endpoints are ascertained through three complementary sources: (1) standardized questionnaires administered annually, including self-reported physician diagnoses; (2) biochemical indicators collected every 6 months, with prespecified thresholds (e.g., for fasting glucose, HbA1c, the lipid profile, and liver function tests) used to define disease onset or progression; and (3) electronic medical records from the First Affiliated Hospital of Xi’an Jiaotong University, linked and extracted 2 years after enrollment and at 1-year intervals thereafter. Diagnoses from medical records will be verified using ICD-10 codes and clinical criteria and, where necessary, adjudicated by an independent panel of physicians to ensure accuracy. This multi-source, standardized approach is designed to enhance the validity and reliability of endpoint determination and thereby the robustness of event-based analyses.

Biological Sample Processing Methods

As mentioned above, common clinical biomarkers, such as blood type and counts, glucose, HbA1c, lipids, and indicators of liver, kidney, and thyroid function, are tested using automated analyzers in the hospital laboratory (Table 3). Other, more advanced assays are described below.

Table 3.
Laboratory Analyses at Baseline within the BMSC Study

| Laboratory analyses | Analysis method |
|---|---|
| Serum/plasma concentrations | |
| Alanine transaminase | Velocity assay |
| Albumin | Bromocresol green method |
| Alkaline phosphatase | AMP buffer method |
| Apolipoprotein A1, apolipoprotein B, apolipoprotein E | Immunoturbidimetric |
| Aspartate transaminase | Velocity assay |
| Blood cell count | Flow cytometry method |
| Platelet concentration | Impedance method |
| Creatinine | Enzymatic |
| Cystatin C | Immunoturbidimetric |
| Direct bilirubin | Vanadate oxidation method |
| Free thyroxine | Chemiluminescence |
| Free triiodothyronine | Chemiluminescence |
| Globulin | Calculation with total protein and albumin |
| Glucose | Hexokinase method |
| Glycosylated hemoglobin | HPLC |
| Hemoglobin | Colorimetric analysis |
| High-density lipoprotein cholesterol | Selective inhibition method |
| High-sensitivity C-reactive protein | Immunoturbidimetric |
| Indirect bilirubin | Calculation with total bilirubin and direct bilirubin |
| Insulin | Direct chemiluminescence |
| Low-density lipoprotein cholesterol | Selective direct method |
| Mean platelet volume | Impedance method |
| Mean red cell volume | Calculation |
| Platelet distribution width | Calculation |
| Prealbumin | Immunoturbidimetric |
| Retinol-binding protein | Latex immunoturbidimetric assay |
| Thyroid-stimulating hormone | Chemiluminescence |
| Thyroxine | Chemiluminescence |
| Total bile acid | Enzymatic cycling method |
| Total bilirubin | Vanadate oxidation method |
| Total cholesterol | Cholesterol oxidase |
| Total protein | Biuret method |
| Triglyceride | Deglycerin method |
| Triiodothyronine | Chemiluminescence |
| Urea | Urease UV kinetic assay |
| Uric acid | Enzymatic |
| γ-glutamyltranspeptidase | Velocity assay |
| Urine (morning spot) | |
| Albumin | Chemical analysis method |
| Creatinine | Enzymatic |

Abbreviations: BMSC, Biobank for Metabolic Syndrome Consequences; HPLC, high-performance liquid chromatography.

Blood samples are used for genomics according to standard protocols.
After DNA is extracted from white blood cells and passes strict quality control (e.g., gel electrophoresis, NanoDrop, and Bioanalyzer), samples are sent to a professional sequencing company for high-throughput genotyping (the Infinium CoreExome-24 BeadChip is the intended platform). Plasma concentrations of cholesterol and its precursors, including desmosterol, are measured using rapid gas chromatography-mass spectrometry (GC-MS). Plasma inflammatory factors such as ICAM-1, TNF-α, IL-6, and TGF-β are assessed using ELISA kits. Fecal samples are mainly intended for microbiome analysis according to established protocols. Genomic DNA is extracted from fecal samples using the QIAamp DNA Stool Mini Kit (Qiagen, Germany) according to the manufacturer’s instructions and then used for 16S rRNA gene sequencing. Plasma and fecal samples are also intended for metabolomic and lipidomic profiling. Levels of steroid hormones in hair samples are determined by liquid chromatography-mass spectrometry (LC-MS).

Quality Control

Several strategies are used to ensure the quality of the BMSC study (Figure 2). First, the BMSC study is led by a professional data collection team, and every data collector must complete a professional training program; in addition, a data collection handbook has been developed to guide daily work. Second, both manual and system queries are performed to ensure the quality of the survey and of the anthropometric data. Third, biological samples are sent to the biobank as soon as possible after collection to preserve sample quality. Last, in the laboratory where samples are analyzed, researchers follow strict experimental procedures to ensure the validity of the experimental data.

Figure 2. The process of data collection and quality control of the Biobank for Metabolic Syndrome Consequences study.
The blue text and arrows represent the data and sample collection process, and the red text indicates the quality control steps during data collection.

Statistical Analysis

The BMSC study encompasses multiple research aims and a wide range of exposures and outcomes. To ensure adequate statistical power, we estimated the sample size using T2D as the exposure and stroke as the outcome. Based on results from the Jinchang Cohort Study in Gansu Province, the incidence of stroke was estimated to be 5.26% among participants with T2D and 1.71% among those without T2D. Assuming a two-sided significance level (α) of 0.05, statistical power (1 − β) of 80%, and a 1:1 ratio between groups, the sample size required to detect a difference between groups was calculated to be 840. To account for potential loss to follow-up and withdrawal during long-term observation, we increased the sample size by 20%, resulting in 1008 participants. Because the study aims to investigate multiple outcomes beyond stroke, we determined that a baseline sample size of approximately 2000 participants would be appropriate.

Basic characteristics will be summarized using appropriate descriptive statistics. Regression models (e.g., logistic regression) will be fitted to analyze cross-sectional associations between influencing factors and metabolic disorders, while prospective associations between influencing factors and endpoint diseases will be assessed using linear mixed-effects models combined with Cox regression. Because the BMSC is an open prospective cohort, baseline dates differ among participants according to their recruitment date; the resulting variation in follow-up duration will be addressed through time-to-event analysis. Most core covariates (age, sex, and anthropometrics) are completely recorded at baseline.
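The sample size estimate described above does not state the formula or software used. As a rough cross-check, the Python sketch below assumes the common normal-approximation formula for comparing two independent proportions with equal group sizes and no continuity correction; the function name is hypothetical. Under these assumptions it gives 418 participants per group (836 in total), close to the reported 840, and inflating 840 by 20% reproduces the 1008 participants stated in the protocol.

```python
import math
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size for two independent proportions
    (normal approximation, equal allocation, no continuity correction)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


# Stroke incidence with vs. without T2D, as assumed in the protocol.
per_group = n_per_group(0.0526, 0.0171)
total = 2 * per_group

# 20% inflation for loss to follow-up, applied to the reported 840
# (integer arithmetic avoids floating-point rounding before the ceiling).
inflated = math.ceil(840 * 120 / 100)

print(per_group, total, inflated)  # 418 836 1008
```

Adding a continuity correction would raise the per-group figure, so the reported 840 appears closer to the uncorrected formula; the small remaining gap may reflect rounding or the software used.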
For questionnaire-derived data (e.g., lifestyle and socioeconomic status), missing covariates will be handled using multiple imputation where appropriate. For microbiome data, the Kruskal–Wallis test or regression analysis will be used to compare α-diversity indices, such as the Shannon index, observed features, and the Gini–Simpson index. Species-level analyses will be performed using MaAsLin or multiple regression. For metabolomic and lipidomic data, the least absolute shrinkage and selection operator (LASSO) and principal component analysis will be used to reduce dimensionality. Correlation analysis, regression analysis, and machine learning methods (e.g., Light Gradient Boosting Machine) will be applied to identify associations between omics features and clinical outcomes. To integrate multi-omics datasets (genomics, metabolomics, and microbiome) with epidemiological and clinical variables, we will employ data fusion approaches such as multi-view learning and network-based methods to identify cross-omics biomarkers and molecular signatures. Pathway enrichment analysis will further aid in interpreting biological relevance. This integrative framework is expected to support biomarker discovery and risk prediction and to provide mechanistic insights into the pathophysiology of MetS and its consequences.

Basic Characteristics of the Participants at Recruitment

Up to August 2024, a total of 1589 participants have been included in the BMSC study. Of them, 951 (59.8%) are men, and 1420 (89.4%) were recruited in the hospital. Participants’ ages ranged from 18 to 74 years (mean, 47; standard deviation [SD], 14). The mean BMI was 27.0 kg/m² (SD, 5.6) for men and 29.2 kg/m² (SD, 6.9) for women. The collection rate for each type of biological sample was at least 78.5%. Hyperglycemia was the most prevalent metabolic disorder (80.9%), 37.6% of participants were obese, and the prevalence of MetS was 65.8%.
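Because MetS is counted as present when three or more components are present, the MetS rows of Table 1 follow directly from the distribution of component counts (0–5). The Python sketch below recomputes the MetS counts and prevalences from the Table 1 component distribution; the names are hypothetical, and the thresholds that define each individual component are not modeled here.

```python
# Number of participants with 0-5 MetS components, from Table 1.
COMPONENT_COUNTS = {
    "total": {0: 13, 1: 184, 2: 346, 3: 425, 4: 406, 5: 215},
    "male": {0: 4, 1: 100, 2: 194, 3: 261, 4: 267, 5: 125},
    "female": {0: 9, 1: 84, 2: 152, 3: 164, 4: 139, 5: 90},
}

MIN_COMPONENTS_FOR_METS = 3  # three or more components define MetS


def has_mets(n_components: int) -> bool:
    return n_components >= MIN_COMPONENTS_FOR_METS


def mets_summary(counts: dict[int, int]) -> tuple[int, int, float]:
    """Return (participants with MetS, group size, prevalence in % rounded to 0.1)."""
    n = sum(counts.values())
    n_mets = sum(c for k, c in counts.items() if has_mets(k))
    return n_mets, n, round(100 * n_mets / n, 1)


for group, counts in COMPONENT_COUNTS.items():
    print(group, mets_summary(counts))
```

The output agrees with the MetS rows of Table 1: 1046 of 1589 (65.8%) overall, 653 of 951 (68.7%) men, and 393 of 638 (61.6%) women.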
Among the study population, the proportions of participants with 0, 1, 2, 3, 4, and 5 MetS components were 0.8%, 11.6%, 21.8%, 26.8%, 25.6%, and 13.5%, respectively (Table 1). Table 4 presents the distribution of several cardiometabolic biomarkers. Men had higher fasting plasma glucose, HbA1c, TyG index, triglycerides, systolic blood pressure, diastolic blood pressure, and serum uric acid. Women had higher total cholesterol, HDL-C, and LDL-C (all P values ≤ 0.026).

Table 4. Cardiometabolic Biomarkers of Study Participants in the BMSC Study

| Biomarker     | Total (n = 1589) | Male (n = 951) | Female (n = 638) | P value* |
|---------------|------------------|----------------|------------------|----------|
| FPG, mmol/L   | 7.7 ± 3.5        | 8.2 ± 3.9      | 7.1 ± 2.8        | <0.001   |
| HbA1c, %      | 8.3 ± 2.0        | 8.6 ± 2.1      | 7.8 ± 1.7        | <0.001   |
| TyG index     | 9.1 ± 0.8        | 9.2 ± 0.8      | 9.0 ± 0.7        | <0.001   |
| TC, mmol/L    | 4.4 ± 1.1        | 4.3 ± 1.2      | 4.5 ± 1.0        | <0.001   |
| TG, mmol/L    | 2.0 ± 1.8        | 2.1 ± 2.0      | 1.9 ± 1.4        | 0.019    |
| HDL-C, mmol/L | 1.0 ± 0.2        | 1.0 ± 0.2      | 1.1 ± 0.3        | <0.001   |
| LDL-C, mmol/L | 2.6 ± 0.8        | 2.5 ± 0.9      | 2.7 ± 0.8        | <0.001   |
| SBP, mm Hg    | 129.6 ± 16.3     | 130.4 ± 16.4   | 128.4 ± 16.2     | 0.019    |
| DBP, mm Hg    | 80.0 ± 11.2      | 81.1 ± 11.1    | 78.3 ± 11.1      | <0.001   |
| SUA, µmol/L   | 355.8 ± 93.8     | 360.1 ± 98.4   | 349.4 ± 86.0     | 0.026    |

Notes: Values are mean ± SD. *P values for the comparison between males and females were calculated using the two-sample Student’s t test.
Abbreviations: BMSC, Biobank for Metabolic Syndrome Consequences; DBP, diastolic blood pressure; FPG, fasting plasma glucose; HbA1c, glycated haemoglobin; HDL-C, high-density lipoprotein cholesterol; LDL-C, low-density lipoprotein cholesterol; SBP, systolic blood pressure; SUA, serum uric acid; TC, total cholesterol; TG, triglycerides; TyG, triglyceride-glucose index.

Strengths and Limitations

The BMSC study is an ongoing prospective observational study of approximately 2000 adults with metabolic disorders. It has several strengths.
First, the study’s deep phenotyping yields multidimensional profiles of individual characteristics, including anthropometry, lifestyle factors, psychosocial characteristics, medical records, clinical biomarkers, and multi-omics profiles. Second, strict quality control measures adopted throughout the project help ensure high-quality data collection. Third, standardized biobanking procedures safeguard the quality of the biological samples, which is one of the distinctive strengths of the BMSC study. Fourth, repeated measurements and biological sampling across multiple time points make it possible to assess dynamic changes in influencing factors and biomarkers. However, some limitations should be acknowledged. First, participants in the BMSC study are recruited in Northwest China; hence, results should be extrapolated to other regions or countries with caution. Second, only participants with metabolic disorders meet the inclusion criteria of the BMSC study, which precludes the recruitment of healthy participants as controls. Without a control group, we cannot readily determine whether the observed alterations are disease-specific or reflect normal variation. This reduces the capacity to infer causality and restricts the generalizability of our findings to broader populations. To address this limitation, we have begun recruiting metabolically healthy individuals from the Department of Health Medicine of our hospital, who may serve as a control group in future analyses. This will allow us to better delineate disease-specific alterations and strengthen the robustness of our conclusions. Third, observational cohort designs inherently limit causal inference; accordingly, the BMSC study aims to elucidate potential mechanistic pathways and identify clinically relevant associations rather than to establish definitive causal relationships. In addition, the current sample size is modest compared with national mega-cohorts in China.
However, the BMSC study specifically focuses on deeply phenotyped, metabolically characterized individuals, with integrated multi-omics datasets and long-term biobanked biospecimens. Finally, several practical challenges should be acknowledged. First, maintaining participant retention over a follow-up of at least 5 years may be difficult, and loss to follow-up could reduce statistical power. Second, the management and analysis of high-dimensional multi-omics data are complex and will require rigorous bioinformatics pipelines and quality control procedures. Third, as the study is conducted in a single regional population, the generalizability of findings to other geographic or ethnic groups may be limited. These challenges will be carefully considered in implementing the BMSC study and interpreting its findings.

Prospective Contributions of the BMSC Study

Although no study results are yet available, the BMSC protocol has several distinctive features that may provide novel insights into the etiology and progression of MetS and related chronic diseases. Repeated biospecimen collection combined with multi-omics profiling and clinical data enables comprehensive characterization of biological pathways. Moreover, the relatively high-frequency follow-up (every 3 months), which is uncommon in large cohort studies, increases the granularity of longitudinal assessments. Together, these features make the BMSC study a valuable resource with strong potential to contribute to precision medicine research and to inform prevention and intervention strategies for metabolic and cardiovascular diseases. Beyond scientific discovery, the BMSC study may also have important implications for public health and clinical practice. For example, longitudinal biomarker profiling could help refine risk stratification tools for MetS and related CMDs, thereby informing clinical guidelines for earlier detection and more personalized prevention strategies.
In addition, integrating multi-omics data with lifestyle and clinical outcomes may identify novel therapeutic targets, which could ultimately support the development of precision interventions for metabolic disorders. These potential applications highlight the translational value of the BMSC study beyond research contexts.

Conclusion

The BMSC study provides a new framework for a prospective MetS biobank with multidimensional information in an underrepresented population. This resource is expected to contribute to disease prediction, early diagnosis, and mechanistic understanding, thereby informing precision medicine and public health strategies. We recognize practical challenges such as long-term participant retention, management of complex data, and limited generalizability beyond the study region. These can be mitigated through rigorous methodology, robust infrastructure, and collaborative efforts. Continued stakeholder support and open opportunities for data sharing and collaboration will be critical to maximizing the value of the BMSC and ensuring its long-term impact.

Acknowledgments