Abstract
Rice metabolomics is widely used for biomarker research in pharmacology. Consequently, characterizing the variation among the pigmented and non-pigmented traditional rice varieties of Tamil Nadu is crucial. These varieties contain fatty acids, sugars, terpenoids, plant sterols, phenols, carotenoids and other compounds that play a major role in achieving Sustainable Development Goal 2 (SDG 2). Gas chromatography coupled with mass spectrometry was used for the first time to profile the complete untargeted metabolome of Kullakar (red) and Milagu Samba (white), and a total of 168 metabolites were identified. The metabolite profiles were subjected to data mining, including principal component analysis (PCA), orthogonal partial least squares discriminant analysis (OPLS-DA) and heat map analysis. OPLS-DA identified 144 differential metabolites between the two rice groups using the criteria variable importance in projection (VIP) ≥ 1 and fold change (FC) ≥ 2 or FC ≤ 0.5. A volcano plot (64 down-regulated, 80 up-regulated) was used to illustrate the differential metabolites. The OPLS-DA predictive model showed good fit (R²X = 0.687) and predictability (Q² = 0.977). Pathway enrichment analysis revealed three distinct enriched pathways. These findings serve as a foundation for further investigation into the function and nutritional significance of both pigmented and non-pigmented rice grains, and can thereby contribute to achieving SDG 2.

Supplementary Information
The online version contains supplementary material available at 10.1186/s12870-024-05123-3.

Keywords: Traditional rice variety, Metabolite biomarkers, Principal component analysis, Gas chromatography-mass spectrometry, SDG 2, KEGG pathway, Univariate and multivariate analysis

Background
Rice (Oryza sativa L.) is a primary dietary component for more than half of the global population and ranks as the second largest cultivated cereal crop across the globe [1, 2].
SDG 2 specifically addresses food, aiming to “end hunger, achieve food security and improved nutrition and promote sustainable agriculture.” However, a number of other goals also address issues facing the food system. In recent years it has been recognized that traditional rice varieties constitute a valuable gene pool for traits that may help modern rice varieties adapt to climate change. After the COVID-19 pandemic, the eating behaviour of Indian people, especially Tamilians, changed drastically, which highlighted the importance of rebuilding the immune system through a nutrient-rich, balanced diet [3]. Traditional rice can be categorized into two types: pigmented and non-pigmented. Non-pigmented rice is consumed by approximately 85% of the world’s population, whereas pigmented rice has traditionally been enjoyed primarily in China, Japan, and Korea due to its distinct flavor and perceived health benefits [4]. The rice grain contains metabolites that exhibit protective properties against human diseases when consumed through the diet. These metabolites also contribute positively to the immune system. In recent years, there has been growing interest in pigmented rice varieties, with red rice in particular garnering attention due to its bioactive compounds. These compounds have been found to possess superior antioxidant, anti-inflammatory, antitumor, and hypoglycemic effects, as supported by various studies [5–9]. Beyond these health advantages and the other benefits reported in [10, 11], rice with darker shades holds properties not found in light-colored varieties. Whole grain rice of different colors contains distinct polyphenol subgroups, which have the potential to positively influence human well-being [11, 12].
The primary phenolics found in red rice varieties include ferulic acid, p-coumaric acid, and vanillic acid [12], which are also reported in this study. Studies have suggested that p-coumaric acid and vanillic acid may play a crucial role in the antioxidant activity of red rice [13]. Furthermore, existing research indicates a notable positive correlation between phenolic components and antioxidant activity [12, 14–16]. The predominant focus of research on pigmented rice has been on the relationship between anthocyanins and antioxidants, as well as its nutritional properties. This is because rice is widely acknowledged as a functional food in various Asian countries, known for its numerous reported health advantages [17]. Red rice (another type of pigmented rice), in contrast, has elevated levels of proanthocyanidins and other phenolics [18, 19]. Additionally, pigmented rice has been observed to exhibit higher antioxidant activity than non-pigmented rice. The degree of pigmentation also plays a role, with darker pigmentation indicating a greater presence of flavonoids and thus stronger antioxidant properties [12, 20, 21]. These cultivars have rich and varied nutrient profiles, including antioxidants, which contribute greatly to global food security and nutrition. Traditional rice is therefore highly compatible with SDG 2’s objective of guaranteeing that everyone has access to safe and nourishing food, because of its ability to prevent malnutrition and address certain nutrient deficits. Its production also promotes resilience to climate change, sustainable farming methods, and the preservation of agricultural biodiversity. Extensive research has explored the different metabolites found in both types of rice around the world. However, until now, there have been few published studies on the metabolomics of pigmented and non-pigmented traditional rice varieties of Tamil Nadu.
Hence, it is important to highlight the absence of comprehensive metabolic profiling that encompasses primary and secondary metabolites across various cultivars of pigmented and non-pigmented traditional rice. Additionally, there is a need to understand the metabolic networks that connect extensive metabolite profiling datasets with metabolic pathways. The present study employed a gas chromatography-tandem mass spectrometry (GC-MS/MS) metabolomics technique, specifically using triple quadrupole mass spectrometry, to explore and quantify primary and secondary metabolites in pigmented and non-pigmented traditional rice varieties, namely Kullakar and Milagu Samba. This is the first metabolite profiling study of Kullakar and Milagu Samba. The analysis involved chemometric methods such as principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA) to classify the samples effectively based on their diversity. To categorize the functional metabolites from traditional rice varieties that offer health benefits, we conducted hierarchical cluster analysis and metabolic pathway identification. The results of this study will provide a valuable theoretical foundation for developing functional foods using traditional rice.

Materials and methods
Two rice cultivars were chosen based on their pericarp colour: red rice (Kullakar) and white rice (Milagu Samba) (Fig. 1; Table 1). Prior to collecting the plant material, we obtained legal permission from the Director of Research at Tamil Nadu Agricultural University in Coimbatore, Tamil Nadu, India. The collection process adhered to all institutional, national and international guidelines and legislation. The samples included in this study were collected exclusively from farmers.
These traditional varieties were cultivated by farmers of Thanjavur district during the Samba season (July-December, 2021) using the required agronomic practices. Following cultivation, the grains were sun-dried to a water content of approximately 10–11% and subsequently stored in darkness at 4 °C until further use. Prior to analysis, each sample was manually dehulled to obtain brown rice, and any broken grains were discarded. To ensure the accuracy and reliability of the results, three biological replicates were analysed for each rice cultivar.

Fig. 1. Photographs illustrating the pigmented and non-pigmented traditional rice

Table 1. Detailed agronomic characteristics of the selected traditional rice
Name of the variety | Kullakar | Milagu Samba
Origin | Tamil Nadu, India | Tamil Nadu, India
Pedigree | Unknown | Unknown
Duration (days) | 120–125 | 135–140
Average height (cm) | 110 | 120
Number of grains per ear head | 100–110 | 150–170
Yield of grain (kg acre⁻¹) | 1400 | 1500
1000-grain weight (g) | 25 | 20
Colour of pericarp | Red | White

Sample preparation and extraction
One gram of finely ground rice sample was placed in a 20 ml centrifuge tube, and 10 ml of HPLC-grade ethanol was added. The mixture was vortexed using a LABOID (International instrument from Himachal Pradesh, India) at 2000 rpm for ten minutes and then centrifuged for 20 min at 5000 rpm [22]. The supernatant was concentrated using a rotary evaporator and subsequently filtered through a PVDF syringe filter with a pore size of 0.2 μm. The resulting filtrate was stored in an air-tight glass vial at 4 °C until chromatographic analysis.
Chromatography conditions and analysis
The filtered ethanolic extract was analyzed for metabolites using a Thermo Fisher ISQ triple quadrupole gas chromatograph-mass spectrometer, specifically the Thermo Fisher TSQ 8000 Duo Triple Quadrupole GC-MS/MS. The gas chromatograph (GC) was equipped with a DB-5 fused silica capillary column (30 m length, 0.25 mm internal diameter, 0.25 μm film thickness). Helium was used as the carrier gas at a flow rate of 1.0 ml/min. A 1 ml volume of the specimen was transferred to a 2 ml screw-top vial, which was then loaded into the auto-injector. A 1 µl aliquot of the sample was injected in split mode (ratio 1:10). The detector and injector temperatures were both maintained at 250 °C throughout the run. The oven temperature was programmed to hold at 70 °C for 15 min, followed by a rapid rise to 280 °C at a rate of 30 °C per minute. It then remained constant for another ten minutes before being lowered to 250 °C at a rate of 10 °C per minute. The mass spectrometer was operated in full scan mode with electron impact ionization at 70 eV. The ion source temperature was set at 260 °C and the transfer line temperature was held at 280 °C. The mass scan range was m/z 50 to 650. A solvent delay of 3 min was implemented [23]. Bioactive molecules were identified by comparing their mass spectra with the NIST 08 Mass Spectral Library, which is maintained by the National Institute of Standards and Technology. The name, molecular weight, and structure of the identified molecules were determined using databases such as NIST, PubChem and HMDB.
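The oven program described above implies a fixed run length, which can be worked out from the hold times and ramp rates. The following is a minimal sketch of that arithmetic, assuming linear ramps, no instrument equilibration time, and that the final cool-down to 250 °C counts as part of the run; the function and variable names are illustrative and not taken from the study.

```python
def ramp_duration_min(start_c, end_c, rate_c_per_min):
    """Minutes needed to move linearly from start_c to end_c at the given rate."""
    return abs(end_c - start_c) / rate_c_per_min

# Hold segments stated in the text: 15 min at 70 °C and 10 min at 280 °C.
hold_minutes = [15.0, 10.0]

# Ramp segments stated in the text: (start °C, end °C, rate in °C per minute).
ramps = [(70.0, 280.0, 30.0), (280.0, 250.0, 10.0)]

ramp_minutes = [ramp_duration_min(*ramp) for ramp in ramps]
total_minutes = sum(hold_minutes) + sum(ramp_minutes)

print(ramp_minutes)   # [7.0, 3.0]
print(total_minutes)  # 35.0
```

Under these assumptions the program lasts 35 min, and with the 3 min solvent delay the spectra would be acquired from 3 min onwards.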
Statistical analysis
The experiments were conducted in triplicate, and the metabolites were annotated using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and the Human Metabolome Database (HMDB). Both univariate and multivariate analyses were performed using the R package-based MetaboAnalyst 5.0 [24]. Specifically, an OPLS-DA model was used to compare the metabolic characteristics of pigmented and non-pigmented rice varieties. Prior to analysis, the metabolite data were normalized and auto-scaled. Differential metabolites were screened by requiring a variable importance in the projection (VIP) value of at least 1 in the OPLS-DA model together with an absolute log2 fold change (log2FC) of at least 1. Metabolites selected by VIP were analysed with the Debiased Sparse Partial Correlation (DSPC) algorithm, and the correlation and pathway map networks were visualized in Cytoscape software. Venn diagrams were used to represent the counts of these differential metabolites visually. Furthermore, pathways containing significantly regulated metabolites were subjected to metabolite set enrichment analysis (MSEA). The significance of these pathways was evaluated using p-values derived from hypergeometric tests.

Results
Metabolite detection
The metabolite profiles of pigmented (Kullakar) and non-pigmented (Milagu Samba) rice were systematically analyzed for the first time in this work. A comprehensive non-targeted metabolite analysis of pigmented and non-pigmented rice using GC-MS/MS revealed a total of 168 metabolites (Table S1, Fig. S1 & S2). Specifically, Kullakar exhibited 114 metabolites whereas the non-pigmented Milagu Samba exhibited 103 metabolites, with 49 metabolites shared between the two varieties.
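The counts reported above can be cross-checked with simple set arithmetic: the number of distinct metabolites equals the two variety-specific counts plus the shared count. The sketch below is only an illustration of that check; the function name is hypothetical and the inputs are the three numbers given in the text.

```python
def venn_counts(n_a, n_b, n_shared):
    """Return (unique to A, unique to B, total distinct) for two overlapping sets."""
    only_a = n_a - n_shared
    only_b = n_b - n_shared
    total = only_a + only_b + n_shared
    return only_a, only_b, total

# 114 metabolites in Kullakar, 103 in Milagu Samba, 49 shared (values from the text).
only_kullakar, only_milagu_samba, total = venn_counts(114, 103, 49)

print(only_kullakar, only_milagu_samba, total)  # 65 54 168
```

The result agrees with the 168 metabolites reported in total and with the variety-specific counts (65 and 54) given with the Venn diagram.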
The 168 identified metabolites include 11 Benzene and substituted derivatives, 13 each of Prenol lipids and Saturated hydrocarbons, 19 each of Organooxygen compounds and Steroids and steroid derivatives, and 62 Fatty Acyls. The classes Acyl halides, Carboxylic acids and derivatives, Dihydrofurans, Phenols, Pyridines and derivatives and Unsaturated hydrocarbons contain 2 metabolites each. The remaining 19 classes contain one metabolite each. The identified metabolites were categorized into 32 significant biological classes. These comprised Benzene and substituted derivatives (6.63%), Prenol lipids (7.83%), Saturated hydrocarbons (7.83%), Organooxygen compounds (11.40%), Steroids and steroid derivatives (11.40%) and Fatty Acyls (37.3%) (Fig. 2). Classes such as Acyl halides, Carboxylic acids and derivatives, Dihydrofurans, Phenols, Pyridines and derivatives and Unsaturated hydrocarbons account for 1.20% each.

Fig. 2. Pie chart illustrating the distribution of identified metabolites across biological classes, following the classification of the Human Metabolome Database (HMDB)

A Venn diagram was constructed to show the overlapping and distinct metabolites between Kullakar and Milagu Samba (Fig. 3; Table S2). In the pairwise comparison, 49 metabolites overlapped, mainly fatty acids, steroids, sugars, terpenes and alkanes. The Venn diagram showed that 65 metabolites were specific to Kullakar and 54 to Milagu Samba.

Fig. 3. Venn diagram showing the shared and distinct metabolites between the pigmented and non-pigmented rice varieties

Principal component analysis
PCA, a multivariate statistical analysis, was performed on the two differently coloured traditional rice varieties to explain the dissimilarities in their metabolite composition.
The principal components (PCs) with an eigenvalue greater than one were retained in the study. The selected PCs in the score plot (Fig. 4) clearly place the metabolite loadings of Kullakar on the negative side and those of the non-pigmented variety on the positive side, which demonstrates a discernible pigmentation-based variation between the samples. The unsupervised classification extracted two PCs explaining 98.30% of the total variation (Table S3; Fig. S3).

Fig. 4. Score plot of PCA

The first PC alone explained 96.4% of the variation and was correlated with 21 metabolites: (3beta,4alpha,5alpha,9beta)-4,14-Dimethyl-9,19-cycloergost-24-en-3-ol, (9Z,11S,16S)-1-Acetoxy-9,17-octadecadiene-12,14-diyne-11,16-diol, 6-Octadecenoic acid, Acetylhydrazine, alpha-Sitosterol, Benzothiazole, Bovinic acid, Cholest-5-ene, Cholestan-3-ol, cis-Vaccenic acid, Clionasterol, Dihydrobrassicasterol, Isofucosterol 3-O-[6-O-(9,12-Octadecadienoyl)-b-D-glucopyranoside], Lignocerane, Linoleic acid, Methyl hexadecanoic acid, Palmitic acid, Paullinic acid, Pentadecanoic acid, Stigmasterol and trans-12-Octadecenoic acid. The second PC accounted for 1.4% of the total variation and was mainly contributed by 1-Hexadecanol, 1-hydroxylycopene, 2,6,6-Trimethylcyclohex-2-en-1-one, 3-Palmitoyl-sn-glycerol, 6-Octadecenoic acid, Ascorbic acid, Cholest-5-ene, Clionasterol, cis-Vaccenic acid, Dihydrobrassicasterol, Geranylgeranyl-PP, Heptadecanoic acid, Isofucosterol 3-O-[6-O-(9,12-Octadecadienoyl)-b-D-glucopyranoside], Lignocerane, Oleic acid, Palmitic acid, Palmitoyl chloride, Paullinic acid, petroselinate and trans-12-Octadecenoic acid.

Screening of differential metabolites
An OPLS-DA model was constructed to identify precisely the metabolites that differ between the two rice varieties.
The values of R²Y and Q² were close to 1 (R²X = 0.687, R²Y = 0.999, Q² = 0.977), which validates the reliability and stability of the OPLS-DA model for identifying the differential metabolites between the sample groups (Fig. S4, Table S4). The OPLS-DA score plot (Fig. 5) clearly separated Kullakar from Milagu Samba, indicating distinct variation in the metabolite phenotypes of the pigmented and non-pigmented rice varieties. The score plot efficiently discriminated the metabolite distribution within and between the sample groups, and the arrangement of the samples matched the PCA score plot exactly (Fig. 4). The PLS-DA showed variance similar to that of the PCA, with 68.7% and 9.5% of the variance along latent variables 1 and 2, respectively (Fig. 5).

Fig. 5. Score plot of OPLS-DA for differential metabolite analysis of pigmented rice compared to non-pigmented rice

Differential metabolites between the two rice groups were selected according to the OPLS-DA model (Fig. 5) using variable importance in projection (VIP) ≥ 1 and fold change (FC) ≥ 2 or FC ≤ 0.5. The 144 screened metabolites (80 up-regulated and 64 down-regulated), which differed significantly between Kullakar and Milagu Samba, are depicted in a volcano plot (Fig. 6; Table S5).

Fig. 6. Volcano plot depicting the expression levels of differential metabolites (with a fold change greater than 2) between pigmented and non-pigmented rice varieties

Variable importance in projection (VIP)
The PLS-DA model was used to calculate the Variable Importance in the Projection (VIP) score of each metabolite. Metabolites with a VIP score higher than 1.0 were considered the most significant biomarkers. These metabolites are able to distinguish between pigmented and non-pigmented rice samples.
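The screening rule used in this section (VIP ≥ 1 together with FC ≥ 2 or FC ≤ 0.5) is equivalent to the absolute log2FC ≥ 1 criterion stated in the methods, since log2(2) = 1 and log2(0.5) = -1. The following is a minimal sketch of such a filter; the function names and the four example metabolites are hypothetical, the VIP and FC values are invented for illustration, and treating FC > 1 as up-regulated is an assumption about how the ratio is oriented.

```python
import math

def is_differential(vip, fold_change, vip_min=1.0, log2fc_min=1.0):
    """True when VIP >= vip_min and |log2(FC)| >= log2fc_min (FC must be positive)."""
    return vip >= vip_min and abs(math.log2(fold_change)) >= log2fc_min

def direction(fold_change):
    """Label the change, assuming FC > 1 means up-regulated."""
    return "up" if fold_change > 1 else "down"

# Hypothetical (VIP, FC) pairs for illustration only; not data from the study.
examples = {
    "metabolite_A": (1.8, 4.0),   # passes both thresholds, up
    "metabolite_B": (1.2, 0.25),  # passes both thresholds, down
    "metabolite_C": (1.5, 1.4),   # fold change too small
    "metabolite_D": (0.6, 8.0),   # VIP too low
}

selected = {
    name: direction(fc)
    for name, (vip, fc) in examples.items()
    if is_differential(vip, fc)
}

print(selected)  # {'metabolite_A': 'up', 'metabolite_B': 'down'}
```

Counting the "up" and "down" labels in such a selection gives the kind of tally reported for the volcano plot.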
The top 30 metabolite features with a VIP value of more than one were identified from the pigmented and non-pigmented rice (Fig. 7). The most discriminating metabolites included Stearic acid, Paullinic acid, Cholestan-3-ol, Linoleic acid, Dihydrobrassicasterol, Lignocerane, Palmitic acid, Bovinic acid, Palmitoleic acid, Ribavirin, petroselinate, alpha-Sitosterol and Stigmasterol.

Fig. 7. VIP scores for the selected differential metabolites

Correlation network analysis
To further elucidate the importance and association of the top 30 metabolite features, a correlation network was constructed in Cytoscape software using DSPC, a regularization technique developed to handle high-dimensional metabolomics data derived from mass spectrometry [25]. A total of 30 metabolites were identified as candidate biomarkers between the pigmented and non-pigmented varieties based on their VIP scores in the OPLS-DA analysis. A visualization of the correlation network is presented in Fig. 8. All 30 highly influential metabolites belonged to fatty acids, steroids, alkanes and organooxygen compounds. Among the fatty acids, a high positive correlation was found between (9Z,11S,16S)-1-Acetoxy-9,17-octadecadiene-12,14-diyne-11,16-diol and 9Z,12E-Octadecadienoic acid. The organooxygen metabolite 2,6,6-Trimethylcyclohex-2-en-1-one was positively correlated with 9Z,12Z-octadecadienoyl-CoA and Methyl hexadecanoic acid. Among the steroids, 24-Methylenecycloartan-3-ol was positively correlated with fatty acid compounds such as (9Z,11S,16S)-1-Acetoxy-9,17-octadecadiene-12,14-diyne-11,16-diol and 9Z,12E-Octadecadienoic acid. The alkanes Lignocerane and Heneicosane had a high negative correlation with 6-Octadecenoic acid. No negative correlation was found among the fatty acids. Steroids had a strong negative correlation with fatty acids.

Fig. 8. Correlation network among the VIP metabolites. Positive correlations are shown in orange and negative correlations in violet. The thickness of the network lines reflects the strength of the associations between the metabolites

Pathway enrichment analysis
Enrichment analysis was conducted to reveal the key biological roles of the metabolites, which contributes to understanding their primary molecular functions. The 144 differential metabolites identified through the volcano plot and fold change analysis were subjected to enrichment analysis based on over-representation analysis and the p-values of the biological pathways, which involved 27 metabolic pathways (Fig. 9; Table S6). The pathways Fatty Acid Biosynthesis, Beta Oxidation of Very Long Chain Fatty Acids and Plasmalogen Synthesis were significantly enriched (p-value < 0.05) in the comparison of Kullakar and Milagu Samba.

Fig. 9. KEGG pathway analysis of the differential metabolites for pigmented and non-pigmented rice. Each bubble represents a metabolic pathway, with the horizontal axis indicating the extent of associated factors (larger bubbles indicate more significant impacts). The colour of the bubbles corresponds to the p-value obtained from the enrichment analysis, with lighter colours indicating lower levels of enrichment

Discussion
Rice holds great significance on a global scale, both in terms of human impact and economic value. Enhancing grain yield and disease resistance are the main objectives in improving crop genetics [26]. Additionally, enhancing the nutritional quality of rice has gained considerable attention in recent years [27]. Wild rice and landraces, with their diverse genetic makeup, have become crucial resources for advancing rice genetics and developing new cultivars [28].
Landraces have garnered attention for their ability to adapt to local environments, tolerate abiotic stress, and possess specific metabolic components [29]. Through ongoing domestication and improvement processes, landraces have exhibited distinct characteristics compared to cultivated rice [30]. Amid the COVID pandemic, pigmented rice has become increasingly popular owing to its nutritional benefits and its bioactive compounds such as phenols, flavonoids, minerals, vitamins and plant sterols [11, 31, 32]. An untargeted metabolomic approach was used to produce a comprehensive report on the metabolomic profile of pigmented and non-pigmented rice. The study identified 168 metabolites comprising fatty acids, sugars, steroids and benzene compounds, which have significant roles in the nutritional as well as the pharmacological sectors (Table 2 & Table S1).

Table 2. Biological significance of metabolites identified from the study
S.No. Important metabolites Biological significance References