Abstract

Sorghum, a genetically diverse C[4] cereal, is an ideal model to study natural variation in photosynthetic capacity. Specific leaf nitrogen (SLN) and leaf mass per leaf area (LMA), as well as the maximal rates of Rubisco carboxylation (V[cmax]), phosphoenolpyruvate (PEP) carboxylation (V[pmax]), and electron transport (J[max]), quantified using a C[4] photosynthesis model, were evaluated in two field-grown training sets (n = 169 plots including 124 genotypes) in 2019 and 2020. Partial least squares regression (PLSR) was used to predict V[cmax] (R^2 = 0.83), V[pmax] (R^2 = 0.93), J[max] (R^2 = 0.76), SLN (R^2 = 0.82), and LMA (R^2 = 0.68) from tractor-based hyperspectral sensing. The PLSR models for V[cmax], V[pmax], J[max], SLN, and LMA were further assessed by extrapolating them to two genome-wide association study trials adjacent to the training sets in 2019 (n = 875 plots including 650 genotypes) and 2020 (n = 912 plots including 634 genotypes). The predicted traits showed medium to high heritability, and genome-wide association studies using the predicted values identified four QTL for V[cmax] and two QTL for J[max]. Candidate genes within 200 kb of the V[cmax] QTL were involved in nitrogen storage, which is closely associated with Rubisco, although not directly with Rubisco activity per se. The J[max] QTL were enriched for candidate genes involved in electron transport. These outcomes suggest that the methods presented here hold great promise for effectively screening large germplasm collections for enhanced photosynthetic capacity.

1. Introduction

Sorghum (Sorghum bicolor L. Moench), a C[4] pathway species and the world's fifth most produced cereal [1], is adapted to a range of environments and retains high photosynthetic efficiency in diverse conditions [2–4].
These characteristics make it a crop of interest for the dual challenge of meeting increasing demands for food and adapting to the effects of climate change [5, 6]. In addition to the C[4] pathway, which confers adaptation to hot and dry environments, the natural genetic diversity of sorghum provides potential to identify genotypes or genetic loci associated with greater photosynthetic capacity [7]. However, to select photosynthetically favourable genotypes adapted to contrasting environments, tools are required to quantify the biochemical parameters underpinning photosynthetic capacity in a high-throughput manner, removing the phenotyping bottleneck imposed by the traditional gas exchange approach.

Photosynthesis is the process of converting captured solar radiation into chemical energy by fixing carbon dioxide (CO[2]) to form carbohydrates and biomass. Improving photosynthetic capacity is seen as a major target to further improve crop yields [2, 3, 8]. Screening germplasm to directly breed for improved photosynthetic responses to environmental conditions is constrained by the complexity of measuring such responses and requires the development of higher-throughput indirect phenotyping techniques.

In the C[4] photosynthetic pathway, the biochemical processes in the mesophyll cells are coordinated with a CO[2] concentrating mechanism in the bundle-sheath cells [9, 10]. In the mesophyll, CO[2] is initially fixed by phosphoenolpyruvate (PEP) carboxylase into C[4] acids, which are then decarboxylated in the bundle-sheath cells, leading to high CO[2] levels and hence more efficient carboxylation of ribulose-1,5-bisphosphate (RuBP) by ribulose-1,5-bisphosphate carboxylase-oxygenase (Rubisco) [11, 12]. The energy for the regeneration of RuBP in the bundle sheath and PEP in the mesophyll comes from chloroplast electron transport [11].
Due to their key roles in the photosynthetic pathway, the maximal rates of Rubisco carboxylation (V[cmax], μmol m^−2 s^−1), PEP carboxylation (V[pmax], μmol m^−2 s^−1), and electron transport (J[max], μmol m^−2 s^−1) largely determine the photosynthetic capacity of C[4] plants and therefore underpin crop productivity. Simulations using a diurnal canopy photosynthesis model predict that the canopy growth rate of C[4] cereals responds largely to changes in J[max] [13]. Quantification of these biochemical parameters is hence of value when selecting for enhanced photosynthesis and growth. This is traditionally achieved by conducting gas exchange measurements and fitting the observed photosynthetic responses to CO[2] or light with the Rubisco-activity-limited or electron-transport-limited equations of the C[4] photosynthesis model [11, 14]. However, this method is very time-consuming and not suitable for high-throughput screening of large germplasm collections.

The capacity of leaves to convert absorbed CO[2] and radiation into biomass also depends on key leaf physiological and structural properties [15]. Two such properties are specific leaf nitrogen (SLN, g m^−2) and leaf mass per leaf area (LMA, g m^−2), both of which are known to be closely associated with photosynthetic capacity [16, 17]. Because nitrogen is a key element in photosynthetic machinery, such as chloroplasts, plant nitrogen status is closely linked with leaf photosynthetic rates and canopy radiation use efficiency [18–20] and is hence an important parameter in canopy performance modelling [13, 21]. The relationship between leaf nitrogen content and maximal net photosynthesis rate is influenced by LMA, which is strongly associated with leaf lifespan and thus affects the rates of the photosynthetic parameters [15, 16, 22].
However, conventional measurements of SLN and LMA are destructive and slow, limiting their potential to identify germplasm with higher photosynthetic capacity in large breeding programs.

High-throughput plant phenotyping technologies enable the collection of plant biochemical and physiological traits rapidly and nondestructively at large scale [23–26]. Various vegetation indices, which are usually calculated using a few selected wavelengths, have been correlated with plant structural traits (e.g., leaf area index and biomass) or leaf pigment concentration (e.g., chlorophyll). Typical canopy size indicators include the normalized difference vegetation index (NDVI) [27, 28] and the optimized soil adjusted vegetation index (OSAVI) [29]. Chlorophyll content, on the other hand, has been indicated by indices such as the normalized difference red edge (NDRE) [30] and the chlorophyll vegetation index (CVI), which is an indirect measure of nitrogen content [31]. Adjustments to these vegetation indices have also been reported. For example, replacing red bands with red edge bands when calculating some indices gave better performance in estimating chlorophyll content [32]. More recently, hyperspectral imaging sensors with wavelengths in the visible (400-700 nm), near infrared (700-1000 nm), and shortwave infrared (1000-2500 nm) domains have advanced the development of high-resolution spectroscopy techniques. This has led to significant increases in the accuracy and the types of physiological properties that can be retrieved [26, 33]. The linkage between photosynthetic capacity and hyperspectral features therefore constitutes a promising avenue to predict photosynthetic performance of plants across broad scales [20, 34–36].
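As an illustration of how such indices are built from a few selected wavelengths, the following minimal sketch computes NDVI and NDRE from a reflectance spectrum. The band centres (670 nm red, 720 nm red edge, 800 nm near infrared) and the helper name `band` are assumptions chosen for this example; they are not necessarily the wavelengths used in this study.

```python
import numpy as np

def band(wavelengths, reflectance, target_nm):
    """Return the reflectance of the band whose centre is closest to target_nm."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    reflectance = np.asarray(reflectance, dtype=float)
    return float(reflectance[np.argmin(np.abs(wavelengths - target_nm))])

def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)

def ndre(nir, red_edge):
    """Normalized difference red edge index (red band replaced by red edge)."""
    return (nir - red_edge) / (nir + red_edge)

# Hypothetical four-band spectrum, for illustration only.
wavelengths = [550.0, 670.0, 720.0, 800.0]
reflectance = [0.10, 0.05, 0.20, 0.45]

red = band(wavelengths, reflectance, 670.0)       # assumed red band centre
red_edge = band(wavelengths, reflectance, 720.0)  # assumed red edge band centre
nir = band(wavelengths, reflectance, 800.0)       # assumed NIR band centre

ndvi_value = ndvi(nir, red)
ndre_value = ndre(nir, red_edge)
```

Both indices share the same normalized-difference form; NDRE simply substitutes the red edge band for the red band, which is the kind of adjustment mentioned above.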
Various studies have exploited the plethora of bands (>270) and the much narrower band width (<6 nm) available from current hyperspectral sensors to better quantify biochemical and physiological properties in crops [35, 37]. However, most studies so far have used hyperspectral reflectance to estimate leaf photosynthetic capacity in C[3] crops [34, 35, 37–41], and similar studies are much rarer for C[4] crops. At least one study estimated V[cmax], V[pmax], leaf nitrogen content, and specific leaf area from whole-spectrum reflectance (500-2400 nm) using partial least squares regression (PLSR) in the C[4] crop maize [42]. However, J[max], which quantifies the electron-transport-limited photosynthetic rate [11], is also important in determining daily biomass growth [13] but has not previously been targeted. A more comprehensive study quantifying the key photosynthetic parameters V[cmax], V[pmax], and J[max] in a C[4] crop species is therefore proposed. In addition, a high-throughput method to predict key parameters linked to photosynthetic capacity from canopy-level hyperspectral measurements will aid the selection of genetic material with improved photosynthetic capacity at a large scale. To our knowledge, there are no previously published attempts to estimate the full set of key parameters known to limit C[4] photosynthesis, at canopy level, using hyperspectral reflectance.

Additionally, next generation sequencing techniques have provided a high-throughput and cost-efficient tool for detecting genomic regions associated with crop traits of interest via genome-wide association studies (GWAS) [43–45]. Combining hyperspectral sensing with GWAS would greatly facilitate the improvement of photosynthetic capacity and ultimately crop performance, but this combination has rarely been explored to date.
The main objective of this study was to estimate traits associated with photosynthetic capacity from proximal hyperspectral sensing of sorghum canopies. Specifically, we aimed to (i) develop algorithms to predict photosynthetic parameters (V[cmax], V[pmax], and J[max]), SLN, and LMA from proximal hyperspectral canopy reflectance captured with a spectrometer attached to a mobile phenotyping platform in two field-grown training sets; (ii) extrapolate the algorithms to GWAS trials grown adjacent to the training sets using a fully genotyped sorghum diversity panel; (iii) evaluate the heritability of the predicted traits; and (iv) undertake GWAS to detect genomic loci associated with the key photosynthetic parameters and identify potential candidate genes to assess the usefulness and robustness of the approaches used in this study.

2. Materials and Methods

2.1. GWAS Trials

Two field experiments were conducted during two consecutive summer seasons (2019 and 2020) at Gatton Research Station (GAT), Gatton, Queensland, Australia (27°33′S, 152°20′E, 94 m above sea level). GAT1 and GAT2 were sown on 14 January 2019 and 12 November 2019, respectively. Both trials were designed using partial replication with spatially randomised genotypes arranged in rows and columns. There were 875 plots, including 650 genotypes, in GAT1 and 912 plots, including 634 genotypes, in GAT2, with 70 genotypes in common between trials (Table 1). The genotypes in GAT1 comprised inbred lines (n = 649) from a sorghum diversity panel of world-wide collections [43] plus one hybrid. In GAT2, 89% of the genotypes were hybrids from the Queensland breeding program, and the rest were inbred lines from the sorghum diversity panel. Each plot (4.5 m length and 3 m width) was sown to a single genotype and consisted of four rows. Both trials were planted with a GPS precision planter at a population density of 108,000 plants ha^−1.
For both trials, 150 kg of nitrogen per hectare was applied preplanting, and plots were irrigated regularly to provide nutrient and water nonlimiting conditions. The temperature, photosynthetic photon flux (PPF), and relative humidity (RH) from 6 am to 6 pm for the duration of each trial are shown in Table 1.

Table 1. Top: mean and maximum daily temperatures, mean daily photosynthetic photon flux, and relative humidity during the two GWAS trials and two training sets in 2019 and 2020; bottom: number of plots and genotypes used in each experiment (diagonal) and number of genotypes in common between trials (off-diagonal).

Year    Mean temperature (°C)    Maximum temperature (°C)    Mean PPF (μmol m^−2 s^−1)    RH (%)
2019    26.84                    38.98                       743.11                       62.86
2020    29.22                    38.52                       1000.95                      56.1

Trials    TS1                        TS2                         GAT1                         GAT2
TS1       80 plots (60 genotypes)    19 genotypes                60 genotypes                 36 genotypes
TS2       -                          108 plots (93 genotypes)    30 genotypes                 92 genotypes
GAT1      -                          -                           875 plots (650 genotypes)    70 genotypes
GAT2      -                          -                           -                            912 plots (634 genotypes)

Note: photosynthetic photon flux (PPF) and relative humidity (RH); the trials in 2019 include the training set TS1 and the GWAS trial GAT1; the trials in 2020 include the training set TS2 and the GWAS trial GAT2.

2.2. Training Sets

Adjacent to each of the GWAS trials, a training set comprising a representative sample of the lines in the GWAS trials was used to collect ground truth data for association with hyperspectral measurements. Completely randomised block designs (row-column) were also used in the training sets. The middle two rows (0.63 m row spacing) of each four-row plot were used for the ground truth data collection, while the outside two rows (0.75 m row spacing) were guard rows. The training set in 2019 (TS1) consisted of 80 plots comprising 60 genotypes, all of which were inbred lines that were also included in GAT1. In the training set of 2020 (TS2), there were 108 plots with 93 genotypes, of which 63 (68%) were hybrids. There were 19 genotypes in common between TS1 and TS2.
Due to differences in germination and vigour of the diverse germplasm used, there was substantial variability in final plant establishment in both trials. The ground truth measurements were only taken from plots which had good establishment, which reduced the number of possible observations that could be used to develop the models. To maximise the number and the range of observations, the ground truth data from TS1 and TS2 were pooled.

2.3. Ground Truth Measurements in the Training Sets

In both trials, gas exchange measurements were taken under mostly cloudless conditions (between 9 am and 12 pm) between 35 and 50 days after sowing (DAS). This was after full canopy closure but during the active vegetative growth period for all genotypes, and hence before the switch to reproductive growth, which may introduce physiological and metabolic changes. This period is known to be the most critical period for grain production in sorghum [46]. In total, 75 CO[2] (ACi) and 75 light (Ai) response curves were collected across TS1 (n = 31 plots comprising 29 inbred lines) and TS2 (n = 44 plots comprising 30 hybrid and 10 inbred lines), with six inbred lines in common between TS1 and TS2. One plant per plot was randomly selected for gas exchange measurements. The ACi curves were performed on the last or second last fully expanded leaf using a LI-6400 (LI-COR, Inc., Lincoln, Nebraska, USA) with a 6400-02B Red/Blue LED light source illuminating a leaf chamber of 6 cm^2. To measure ACi curves, photosynthetically active radiation (PAR) was set at 1800 μmol photons m^−2 s^−1, the flow rate through the chamber was set at 500 μmol s^−1, and temperature was set to the leaf temperature measured at the commencement of each curve. Vapour-pressure deficit (VPD) was generally held at around 3.0 kPa by adjusting the scrubbing of the incoming air via the desiccant.
For each ACi curve, the reference CO[2] level was stepped through the sequence 200, 100, 50, 250, 400, 650, 800, 1000, 1200, and 1400 ppm, with a duration of 1-5 min for each step. Measurements were made at each CO[2] supply point once gas exchange had equilibrated, that is, when the coefficient of variation for the CO[2] concentration differential between the sample and reference analysers was below 1%. The light levels for the Ai curves were set at 2000, 1500, 1000, 500, 250, 120, 60, 30, 15, and 0 μmol m^−2 s^−1. The other controls were set as follows: reference CO[2] was constant at 400 μmol mol^−1, flow was 500 μmol s^−1, temperature was set to leaf temperature, and humidity was controlled by scrubbing incoming air to maintain a VPD of around 3.0 kPa. The duration of each light level was 1-3 min. Sample and reference analysers were matched before each data point was logged.

A small square section of the leaf (1.6 cm^2) was collected with a leaf punch from the same leaf section as was used for gas exchange measurements. The leaf sections were dried at 80 °C and weighed to calculate LMA (g m^−2). Percent nitrogen of each sample was determined with a continuous flow isotope ratio mass spectrometer (CF-IRMS), and SLN (g m^−2) was calculated by multiplying percent nitrogen by LMA. Across the two training sets, 129 SLN and 169 LMA observations (plots) were obtained, involving 124 unique genotypes. To maximise the dataset and to make the association between ground truth data and hyperspectral measurements from the same plot more robust, individual plots, rather than genotypes, were treated as the observational unit.

2.4. Canopy Hyperspectral Measurements

Hyperspectral data captured before anthesis and around the same time as the ground-truthing data (at 58 and 52 DAS in 2019 and 2020, respectively) were associated with the ground truth data.
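As a worked illustration of the leaf-punch calculations in Section 2.3, the sketch below derives LMA from the dry mass of a 1.6 cm^2 leaf section and SLN from percent nitrogen and LMA. The function names and the example dry mass and nitrogen values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
PUNCH_AREA_CM2 = 1.6   # leaf punch area reported in Section 2.3
CM2_PER_M2 = 1.0e4     # unit conversion: 1 m^2 = 10,000 cm^2

def lma_g_per_m2(dry_mass_g, area_cm2=PUNCH_AREA_CM2):
    """Leaf mass per leaf area (g m^-2) from dry mass (g) and punch area (cm^2)."""
    return dry_mass_g / (area_cm2 / CM2_PER_M2)

def sln_g_per_m2(percent_nitrogen, lma):
    """Specific leaf nitrogen (g m^-2): nitrogen mass fraction times LMA."""
    return (percent_nitrogen / 100.0) * lma

# Hypothetical sample: 8 mg dry mass and 3% nitrogen.
lma = lma_g_per_m2(0.008)       # 0.008 g / 1.6e-4 m^2, about 50 g m^-2
sln = sln_g_per_m2(3.0, lma)    # 0.03 * 50, about 1.5 g m^-2
```

Percent nitrogen is converted to a mass fraction before multiplying, so SLN carries the same units as LMA.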
At this stage of sorghum crop growth, canopies are fully closed and the nitrogen content of individual leaves is expected to be at a maximum, as all mainstem leaves are fully expanded but nitrogen has not yet been translocated during senescence [47]. A tractor-based field phenotyping platform (GECKO; developed at The University of Queensland), which enables simultaneous proximal sensing of crop canopies, was used [48]. The tractor moves at a constant 1.1 metres per second and is integrated with a GPS real-time kinematic system with 2 cm accuracy to locate sampling plots (individual size of 4.5 × 3 m). A microhyperspectral imager (Micro-Hyperspec VNIR model, Headwall Photonics, Fitchburg, MA, USA) mounted on this phenotyping platform (3 m above ground and ~1.7 m above the canopy) was used to obtain the spectral response of each pixel (5 × 5 mm) at 272 spectral wavelengths between 395 and 997 nm (visible and near infrared). The resolution was approximately 2.2 nm, with a full width at half maximum (FWHM) of 6.0 nm. A radiometric calibration (dark signal calibration) of the hyperspectral camera was performed weekly. A spectral calibration was conducted every three months using the nominal white and spectral diffusers, with specific band sets focused on the highest possible spectral resolution, by comparing their respective responses under almost identical illumination conditions. An automated software data calibration pipeline was used to convert raw digital numbers to reflectance values at each pixel. Pixel reflectance was calculated as the ratio between the pixel radiance from the microhyperspectral imager and the reference radiance from an upward-facing sensor measuring incoming radiance.
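The reflectance ratio described above, together with the NDVI > 0.5 masking and plot averaging described in the following paragraph, can be sketched as below. This is a minimal illustration rather than the calibration pipeline used in the study: the function name, the array layout (pixels by bands), and the red and near infrared band centres (670 and 800 nm) used for NDVI are assumptions.

```python
import numpy as np

def plot_reflectance(radiance, reference, wavelengths,
                     red_nm=670.0, nir_nm=800.0, ndvi_threshold=0.5):
    """Average reflectance of plant pixels in one plot.

    radiance:    (n_pixels, n_bands) pixel radiance from the imager
    reference:   (n_bands,) incoming radiance from the upward-facing sensor
    wavelengths: (n_bands,) band centres in nm
    Returns the mean plant-pixel spectrum and the boolean plant mask.
    """
    radiance = np.asarray(radiance, dtype=float)
    reference = np.asarray(reference, dtype=float)
    wavelengths = np.asarray(wavelengths, dtype=float)

    # Reflectance as the ratio of pixel radiance to reference radiance.
    reflectance = radiance / reference

    # NDVI per pixel from the bands nearest the assumed red and NIR centres.
    red = reflectance[:, np.argmin(np.abs(wavelengths - red_nm))]
    nir = reflectance[:, np.argmin(np.abs(wavelengths - nir_nm))]
    ndvi = (nir - red) / (nir + red)

    # Keep only pixels classified as green vegetation.
    plant = ndvi > ndvi_threshold
    if not plant.any():
        raise ValueError("no pixel exceeds the NDVI threshold")

    return reflectance[plant].mean(axis=0), plant

# Hypothetical two-band example: two leaf pixels and one soil pixel.
wavelengths = [670.0, 800.0]
reference = [100.0, 100.0]
radiance = [[5.0, 45.0],    # leaf, NDVI = 0.8
            [20.0, 25.0],   # soil, NDVI about 0.11
            [4.0, 36.0]]    # leaf, NDVI = 0.8

mean_spectrum, plant_mask = plot_reflectance(radiance, reference, wavelengths)
```

Averaging after masking means that soil and shadow pixels do not dilute the plot spectrum, at the cost of depending on the chosen NDVI threshold.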
To segment plants from soil and remove background noise from lower canopy levels, a threshold of NDVI > 0.5 was applied to each pixel based on the fractional vegetation cover [27, 36, 49]. This ensures that only spectral information from green leaves is retained for the reflectance calculations and that shadows and other background noise are excluded from the hyperspectral images. After masking by NDVI > 0.5, plant pixels within a plot were averaged to calculate the reflectance of each plot. All hyperspectral data were collected from 9 am to 12 pm to minimise the effects of the relative orientation of the sun, and no adjustments were made for the sensor or the distribution of leaf angles in the masking. As an example, images, radiance, and reflectance pre- and postmasking by NDVI > 0.5 for plot 361 in 2020 are shown in Figure 1.

Figure 1. An example (plot 361 in training set 2) of plant canopy area (a) before and (c) after masking by (b) NDVI > 0.5; averaged plot radiance and reflectance before and after masking by NDVI > 0.5 (d).

A set of hyperspectral vegetation indices known to be associated with photosynthesis was computed from the plot reflectance involving 16 wavelengths as shown in Figure 1. The equations used to calculate the indices in this study are summarised in Table 2.

Table 2. Summary of the equations for the set of vegetation indices associated with photosynthesis.
Acronym Indices Traits associated Equations References