Abstract

Over the past decade, liquid chromatography coupled with tandem mass spectrometry (LC–MS/MS) has evolved into the main proteome discovery technology. Up to several thousand proteins can now be reliably identified from a sample and the relative abundance of the identified proteins can be determined across samples. However, the remeasurement of substantially similar proteomes, for example those generated by perturbation experiments in systems biology, at high reproducibility and throughput remains challenging. Here, we apply a directed MS strategy to detect and quantify sets of pre-determined peptides in tryptic digests of cells of the human pathogen Leptospira interrogans at 25 different states. We show that in a single LC–MS/MS experiment around 5000 peptides, covering 1680 L. interrogans proteins, can be consistently detected and their absolute expression levels estimated, revealing new insights about the proteome changes involved in pathogenic progression and antibiotic defense of L. interrogans. This is the first study that describes the absolute quantitative behavior of any proteome over multiple states, and represents the most comprehensive proteome abundance pattern comparison for any organism to date.

Introduction

System-wide investigation of gene expression at the mRNA transcript level has become routine and is widely used in systems biology and clinical studies to identify sets of genes that show distinct transcript profiles for a specific cellular state and to classify samples according to their respective molecular patterns ([35]van ’t Veer et al, 2002; [36]Gilchrist et al, 2006; [37]Ishii et al, 2007).
It has also been shown that neither the concentration of transcripts ([38]Gygi et al, 1999; [39]Griffin et al, 2002) nor their quantitative change in response to perturbations ([40]MacKay et al, 2004; [41]Kislinger et al, 2006; [42]de Godoy et al, 2008) strongly correlates with the quantitative change of their corresponding proteins, the main functional products of gene expression. Therefore, quantitative proteomics holds great promise to enhance or complement the picture of gene expression in cells, and thus to contribute to the understanding of most molecular mechanisms in a cell. However, owing to the large heterogeneity in the amount and the physico-chemical properties of proteins, along with the lack of protein amplification methods, system-wide quantitative proteome analysis has been more technically challenging than transcriptome analysis. Recent advances in liquid chromatography–tandem mass spectrometry (LC–MS/MS), currently the method of choice for large-scale protein studies, have made the reliable identification and quantification of thousands of proteins in a single study a reality ([43]Brunner et al, 2007; [44]de Godoy et al, 2008; [45]Ahrens et al, 2010). However, particularly because precursor ions are selected by a simple intensity-driven heuristic (data-dependent analysis, DDA), results from such studies still show a bias against the detection of low-abundance protein species and a decreasing level of reproducibility of identified peptides with decreasing abundance. Comprehensive and more highly reproducible proteome coverage can be achieved by extensive sample pre-fractionation and the mass spectrometric analysis of each fraction, albeit at a cost that multiplies analysis time and limits throughput.
Additionally, the detection of different proteome subsets in repetitive LC–MS analyses of similar samples impairs the generation of consistent, reproducible quantitative data sets across multiple samples, a crucial prerequisite in systems biology studies ([46]Ideker et al, 2001; [47]Rifai et al, 2006; [48]Schiess et al, 2009). Therefore, several alternative or complementary MS strategies have been developed to overcome some of the limitations of current LC–MS/MS workflows ([49]Schmidt et al, 2008; [50]Picotti et al, 2009; [51]Domon and Aebersold, 2010). They make use of a priori information gathered from previous MS studies to increase the reliability, reproducibility and/or throughput of subsequent measurements. Specifically, in each of these strategies, MS analysis is focused on a few proteotypic peptides (PTPs) per protein, thereby minimizing instrument time without compromising analytical sensitivity. Two specific implementations of such strategies have been proposed ([52]Pan et al, 2009; [53]Schmidt et al, 2009; [54]Domon and Aebersold, 2010), which we have termed targeted and directed MS, respectively. Targeted MS is based on selected reaction monitoring (SRM also known as multiple reaction monitoring) and is typically carried out on triple quadrupole mass spectrometers. Because of very high selectivity and sensitivity, it is capable of covering the full dynamic range of proteomes in moderately complex organisms such as yeast ([55]Picotti et al, 2009). However, since each LC–MS/MS run is limited to a few hundred targeted peptides ([56]Stahl-Zeng et al, 2007), the throughput required for proteome-wide measurements is currently difficult to achieve. Directed MS makes use of inclusion mass lists in order to guide the MS sequencing to a desired, pre-determined subset of peptides ([57]Jaffe et al, 2008; [58]Schmidt et al, 2008, [59]2009). Directed sequencing is carried out on the same types of instruments as discovery measurements by DDA. 
In contrast to the SRM methodology, directed MS monitors far larger sets of peptides per analysis. However, because the precursor ion signal of the peptide of interest has to be explicitly detected to trigger its identification, the overall dynamic range and sensitivity of directed sequencing is lower than that of SRM and more dependent on the sample matrix ([60]Domon and Aebersold, 2010). Here, we have studied global and time-resolved changes in the proteome of cells of the human pathogen Leptospira interrogans that were perturbed by antibiotic stress and serum stimulation. Overall, in 31 samples, representing 25 cellular states, 1669 proteins, representing 75% of the Leptospira proteome discovered by saturation sequencing using DDA MS, were consistently detected and their cellular concentrations determined ([61]Supplementary Table SV). This unique data set was generated via an integrated inclusion list driven MS strategy that maximizes protein coverage in individual samples by focusing precious MS-sequencing time on the best-flying PTPs of each protein ([62]Mallick and Kuster, 2010). The cellular concentrations of the detected proteins were estimated in each sample by correlating the average of the signal intensities of the three most highly responding peptides per protein with a calibration curve generated with a set of isotopically labeled reference peptides ([63]Malmström et al, 2009). We show that the protein components of entire pathways can be quantified across several time points and, for the first time, large-scale, consistent proteome data sets can be subjected to cluster analysis, a tool that was previously limited to the transcript level due to incomplete sampling at the protein level. We show that the proteomic changes measured differ from the available transcriptomics data. We demonstrate that Leptospira cells adjust the cellular abundance of a certain subset of proteins as a general response to stress while other parts of the proteome respond in a highly specific manner.
They furthermore react to individual treatments by ‘fine tuning’ the abundance of certain proteins and pathways in order to cope with the specific cause of stress. Using serum treatment, we simulated the host environment and elucidated which proteomic adjustments underlie virulence. The method can be implemented with standard high-resolution mass spectrometers and software tools that are readily available in the majority of proteomics laboratories. It is scalable to any proteome of low-to-medium complexity and can be extended to post-translational modifications or peptide-labeling strategies for quantification. We therefore expect the approach outlined here to become a cornerstone for microbial systems biology.

Results

To consistently detect and absolutely quantify the same, extensive subset of the L. interrogans proteome in multiple samples, we developed and deployed the general workflow displayed in [64]Figure 1. It consists of two main phases, proteome discovery and scoring. During the initial discovery phase, a comprehensive atlas of peptides and proteins identified by LC–MS/MS was generated by saturation sequencing of the L. interrogans proteome. To maximize proteome coverage, a pooled sample was generated and analyzed that consisted of aliquots from cells at different states. Subsequently, during the scoring phase, selected PTPs were detected in individual samples via inclusion list driven sequencing and quantified based on the ion current of the selected peptides, to generate quantitative proteome maps for each cellular state. Using this technique, comprehensive LC–MS/MS maps could be generated without the need for sample- and time-consuming pre-fractionation steps, which significantly increases sample throughput.

Figure 1. Global protein profiling workflow.
In the first phase of the study (discovery phase), the peptide samples representing different cell states were mixed and analyzed by data-dependent acquisition (DDA) followed by directed 1D-LC–MS/MS. To achieve comprehensive proteome coverage, all detectable precursor ions, referred to as features, were extracted, sequenced in sequential directed LC–MS/MS analyses and identified by database searching. All identified peptide sequences were stored in a 1D-PeptideAtlas together with their precursor ion signal intensity, elution times and mass-to-charge ratio. For each protein, mass and time coordinates from the five most suitable peptides (PTPs) for quantification were extracted from the PeptideAtlas and stored in an inclusion list. Additionally, a spectral library was generated from the identified spectra to improve both the sensitivity and the speed of spectral matching in the quantification phase. In this phase (scoring phase), LC–MS/MS analysis was focused on the pre-selected PTPs as well as a set of heavy labeled reference peptides that were added to each sample. The reference peptides were used to determine the concentrations of the corresponding proteins in the sample, which served as anchor points to translate the MS response of each identified protein into its concentration ([67]Malmström et al, 2009). After spectral matching, label-free quantification was employed to extract and align identified features and monitor their corresponding protein abundances redundantly over all samples.

Generation of a L. interrogans PeptideAtlas

To build a PeptideAtlas ([68]Desiere et al, 2006; [69]Deutsch et al, 2008) with maximal coverage of the L. interrogans proteome, we generated a pooled sample in which aliquots of extracts from different cell states were combined. Specifically, one aliquot of an untreated control sample and four aliquots of the individual perturbed cells (24 h treatments only, see [70]Figure 3) were pooled.
We used a single dimension high-performance LC–MS/MS platform in combination with the recently introduced directed MS technique ([71]Schmidt et al, 2008) to maximize proteome coverage. In such measurements, precursor ion chromatograms are first extracted from two initial data-dependent (DDA) LC–MS/MS runs and the precursor ion maps (retention time versus mass over charge) that are also generated by these measurements are subjected to a peak extraction algorithm ([72]Mueller et al, 2007) to detect precursor ions not identified by DDA MS. In subsequent injections of the same sample, the mass spectrometer was then directed to acquire product ion spectra of previously non-selected precursor ions, to incrementally increase proteome coverage to saturation. We have shown earlier that this procedure maximizes the coverage of moderately complex proteomes at the peptide level while minimizing measurement and computational time ([73]Schmidt et al, 2008).

Figure 3. Hierarchical clustering of protein concentration changes. Hierarchical clustering of absolute protein abundance changes relative to the corresponding untreated control samples in copies/cell (log10) for all 24 treatments. The column dendrogram representing the clustering of the differentially perturbed samples is displayed and the clusters (1–6) obtained are indicated. Significantly enriched (P<0.05) biological processes (BP) based on GO are indicated for all eight protein clusters obtained (a–h).

Specifically, the following sequence of analyses was carried out to collect the data for the L. interrogans PeptideAtlas. LC–MS/MS runs #1 and #2 were conventional DDA runs where precursor ions of different charge states (2 and >2, respectively) were selected.
In subsequent LC–MS/MS runs #3–#20, precursor ions selected by the following criteria were added to inclusion lists and identified by directed precursor ion selection: (i) all features detected by a feature detection algorithm ([76]Mueller et al, 2007) in the initial DDA runs; (ii) precursor ions corresponding to all PTPs extracted from a recently published large-scale proteome analysis on the same species ([77]Beck et al, 2009); and (iii) predicted precursor ion signals for all PTPs that were computed but not observed from the L. interrogans genomic sequence. PTP predictions were carried out by the algorithm PeptideSieve ([78]Mallick et al, 2007). The L. interrogans proteome is highly accessible for the LC–MS analysis employed here since for the majority of gene products (3402/3658) five or more PTPs could be predicted ([79]Supplementary Figure S1). The fragment ion spectra generated from each of these analyses were database searched and the resulting data were filtered to a peptide and protein level false discovery rate (FDR) of 1% ([80]Reiter et al, 2009). At each stage, already identified features as well as proteins identified with more than five PTPs were excluded from further analysis in the subsequent stages. In the two initial DDA LC–MS/MS runs, we detected 37 833 unique features of which 7776 could be assigned to a peptide sequence, resulting in 6861 peptide identifications corresponding to 1223 proteins ([81]Table I). The remaining features (27 968) for which no MS/MS spectra were acquired were split into four inclusion lists, each comprising around 7000 features. These were then specifically sequenced by directed LC–MS/MS analyses. Thereby, the PeptideAtlas could be extended by 2356 (228) additional peptides (proteins). 
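The bookkeeping behind these directed runs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary keys, the function names and the round-robin split are hypothetical and not taken from the study. Only the cap of roughly 7000 features per inclusion list and the exclusion rule (already identified features, and proteins with more than five PTPs) come from the text.

```python
def remaining_targets(candidates, identified_ids, ptps_per_protein, max_ptps=5):
    """Drop features that need no further sequencing.

    A feature is dropped if it was already identified, or if its protein is
    already covered by more than `max_ptps` PTPs (the rule stated in the text).
    The keys "id" and "protein" are hypothetical field names.
    """
    kept = []
    for feature in candidates:
        if feature["id"] in identified_ids:
            continue
        if ptps_per_protein.get(feature.get("protein"), 0) > max_ptps:
            continue
        kept.append(feature)
    return kept


def split_inclusion_lists(features, max_per_list=7000):
    """Split features into the fewest lists that each hold <= max_per_list entries.

    Round-robin assignment is an assumption of this sketch, not the published
    procedure; it keeps the lists equal in size to within one feature.
    """
    if not features:
        return []
    n_lists = -(-len(features) // max_per_list)  # ceiling division
    return [features[i::n_lists] for i in range(n_lists)]


if __name__ == "__main__":
    # 27 968 features without MS/MS spectra, as reported above.
    unsequenced = [{"id": i, "protein": None} for i in range(27968)]
    lists = split_inclusion_lists(unsequenced)
    print(len(lists), [len(chunk) for chunk in lists])  # 4 [6992, 6992, 6992, 6992]
```

If the input is sorted by retention time, a round-robin split also spreads each list across the whole gradient, which is one plausible way to avoid overloading any elution window; the source does not state how the four lists were actually composed.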
Finally, 12 and 10 additional directed LC–MS/MS-sequencing runs for the identification of missing proteins using PTPs from a recently published PeptideAtlas or predicted PTPs, respectively, increased the overall number of identifications to a total of 13 113 features, corresponding to 11 611 peptides and 1680 proteins. To reach this coverage, 28 LC–MS/MS runs were required ([82]Table I). As is evident from [83]Figure 2A, the number of protein identifications reaches saturation toward completion of each experimental phase, after rising at the beginning of the phase, indicating that different peptide subsets are identified at each of the analytical stages. The final feature map generated in this discovery phase contains the exact mass and time coordinates of each identified feature and represents a rich resource for the directed sequencing of all detected proteins in the scoring phase. Importantly, the identified features are well distributed by time and mass ([84]Figure 2C), which allowed their specific sequencing in a high number of samples by directed LC–MS/MS.

Table I. Number of unique features and peptides identified in the discovery phase.

Data filtering | Number of entries for all 1680 proteins identified in the discovery phase | Number of entries for GroEL
Detected unique features^a | 37 833 | Not available
Identified unique features^b | 13 113 | 86
Identified unique peptides^b | 11 611 | 76
Identified unique PTPs^b,c | 6889 | 23
Selected unique PTPs^d | 4953 | 5

^aDetected using the SuperHirn algorithm ([85]Mueller et al, 2007).
^bIdentified by database searching. FDR was set to 1%.
^cPTPs are defined as features whose assigned peptide sequences show full tryptic cleavage, contain no modification and only match to one protein sequence in the database used.
^dUp to the five most intense PTPs per protein were selected for the screening phase.

Figure 2. Directed LC–MS/MS analysis of the L. interrogans proteome.
A pool of peptide samples generated from different perturbations was analyzed by LC–MS to generate a comprehensive protein/peptide atlas of L. interrogans. (A) This was achieved by accumulating the MS data obtained from (i) two non-directed (DDA) LC–MS runs followed by (ii) directed LC–MS analysis of all detected features, (iii) previously detected PTPs ([89]Beck et al, 2009) and (iv) predicted PTPs ([90]Mallick et al, 2007). Proteins detected with five or more PTPs were excluded in the following analysis. The numbers of identified proteins (y axis) and peptides (inset) versus the number of identified tandem mass spectra are displayed. For comparison, the protein discoveries obtained from a non-directed LC–MS analysis of 24 OGE fractions (OGE/LC) are shown in red. A recently developed algorithm was deployed to estimate the increase in protein/peptide discoveries with additional LC–MS experiments (dashed lines) ([91]Claassen et al, 2009). (B) Venn diagram showing the overlap of proteins identified with the LC-only and the OGE/LC–MS approach. (C) LC–MS map of all identified features (green). Precursor ions identified as tryptic peptides of the GroEL protein are shown in red. The sequences as well as the coordinates of the five PTPs selected for GroEL monitoring in the scoring phase are indicated in blue.

We next compared the extent of proteome coverage achieved by this iterative directed sequencing strategy with that achieved by more conventional proteome analyses via extensive sample fractionation and DDA analysis of each fraction. For the latter strategy, the same peptide sample used for inclusion list sequencing was fractionated by isoelectric focusing using off-gel electrophoresis (OGE) ([92]Heller et al, 2005) and each of the 24 fractions was analyzed once by DDA LC–MS/MS analysis.
Intriguingly, this data set contained 60% more peptide identifications, but only 19% additional protein hits (number versus number, [93]Figure 2A), indicating a higher peptide per protein ratio of 12 (OGE) over 7 (LC only). We thus conclude that 81% of the proteins detected by the OGE–LC–MS/MS approach were also detected by the directed LC–MS/MS method, most of them with a sufficient number of peptides for accurate quantification in the scoring phase. Notably, only a slight increase in protein identifications is expected by additional LC–MS/MS analyses ([94]Claassen et al, 2009), demonstrating that we have detected most of the proteins identifiable by the two LC–MS/MS strategies employed ([95]Figure 2A, dashed lines). As expected, the majority of proteins (67.9%) were identified with both approaches. However, 23.3% and 8.9% of the identified proteins were exclusively detected by the OGE–LC and the LC-only approach, respectively ([96]Figure 2B). Functional annotation revealed that many of the 194 protein hits exclusively identified by the directed (LC only) LC–MS/MS approach and missed by the OGE–LC–MS/MS approach are membrane proteins ([97]Supplementary Figure S2), suggesting a decreased recovery of hydrophobic peptides after OGE. Conversely, the OGE–LC–MS/MS strategy showed an increased coverage, particularly of low-abundance proteins, like transcription factors and regulators, confirming the higher protein concentration range accessible after extensive sample fractionation. In general, extensive proteome coverage was achieved with both strategies, which is supported by the lack of biases against any functional groups ([98]Supplementary Figure S2). Overall, of the 13 113 different features identified by directed LC–MS/MS ([99]Supplementary Table SII), 6889 represented suitable PTPs for protein quantification ([100]Supplementary Table SIII).
For each protein, the five most suitable PTPs for protein quantification, referred to as top five PTPs, were extracted from the feature list considering the following attributes; (i) specificity to a single database entry, (ii) true tryptic cleavage termini, (iii) lack of modifications and (iv) high MS-signal response determined by the SuperHirn algorithm ([101]Mueller et al, 2007). The selected 4953 PTPs ([102]Table I) covered the whole feature intensity range ([103]Supplementary Figure S3) and all 1680 identified proteins ([104]Table I). The feature intensity range for the PTP precursor ions on the inclusion list spanned more than three orders of magnitude, a dynamic range that is expected to capture most of the L. interrogans proteome ([105]Malmström et al, 2009). The benefits of focusing on the most suitable PTPs for monitoring each protein can be demonstrated in the case of the chaperone GroEL. For this abundant protein, 86 different features could be identified ([106]Table I) of which the five most intense fulfill all PTP selection criteria ([107]Figure 2C, blue), supporting the observation that unspecifically proteolyzed or modified peptides constitute a minor but detectable fraction of the total ion current generated by the peptides from a protein ([108]Picotti et al, 2007). By focusing on these PTPs, >90% of the MS-sequencing cycles required to detect and monitor GroEL levels in the following scoring phase could be saved and thus used for measuring different proteins of interest. It is important to note that this effect is more pronounced for highly abundant and larger proteins for which high numbers of peptides are identified. Finally, 38 heavy labeled reference peptides from 19 proteins were added to estimate absolute protein concentration on a system-wide scale in each sample following a recently described protocol ([109]Malmström et al, 2009) ([110]Figure 1; [111]Supplementary Table SI). 
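The PTP selection rules listed above (unique to one database entry, fully tryptic, unmodified, then ranked by MS signal) can be illustrated with a short Python sketch. The `Feature` record and its field names are hypothetical, chosen only for this example; the intensities in the usage example are invented toy values, not measurements from the study.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical record for one identified LC-MS feature."""
    protein: str
    sequence: str
    intensity: float          # precursor ion signal, arbitrary units
    n_protein_matches: int    # number of database entries the sequence maps to
    fully_tryptic: bool       # both termini are true tryptic cleavage sites
    modified: bool            # carries any modification


def is_ptp(feature):
    """Criteria (i)-(iii): unique to one protein, fully tryptic, unmodified."""
    return (
        feature.n_protein_matches == 1
        and feature.fully_tryptic
        and not feature.modified
    )


def select_top_ptps(features, per_protein=5):
    """Criterion (iv): keep up to `per_protein` most intense PTPs per protein."""
    by_protein = defaultdict(list)
    for feature in features:
        if is_ptp(feature):
            by_protein[feature.protein].append(feature)
    return {
        protein: sorted(group, key=lambda f: f.intensity, reverse=True)[:per_protein]
        for protein, group in by_protein.items()
    }


if __name__ == "__main__":
    toy = [Feature("GroEL", f"PEP{i}K", float(i), 1, True, False) for i in range(1, 8)]
    toy.append(Feature("GroEL", "MODK", 100.0, 1, True, True))      # modified: rejected
    toy.append(Feature("GroEL", "SHAREDK", 50.0, 2, True, False))   # not unique: rejected
    picked = select_top_ptps(toy)["GroEL"]
    print([f.intensity for f in picked])  # [7.0, 6.0, 5.0, 4.0, 3.0]
```

Applying the filters before ranking means an intense but modified or shared peptide never displaces a usable PTP. Proteins with fewer than five qualifying features simply keep all of them, in line with "up to the five most intense PTPs" in Table I.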
Thus, the final inclusion mass list was distributed over two LC–MS/MS runs and the coordinates of the heavy reference peptides and their endogenous counterparts were included in both runs. Therefore, the data generated in the discovery phase of the project allowed us to establish a method in which 1680 proteins per sample could be detected and absolutely quantified in two inclusion list LC–MS/MS runs with a total analysis time per sample of 4 h. To increase the speed and identification yield of the selected PTPs in the scoring phase, we computed a spectral library from the acquired MS-sequencing data in the discovery phase using SpectraST ([112]Lam et al, 2009). We included additional MS data from a recent large-scale LC–MS/MS study on the same species ([113]Beck et al, 2009) to further enhance the quality of the consensus spectra in the spectral library and applied very stringent filtering criteria to keep the overall FDR <0.2%. Overall, 321 498 identified MS2 spectra were merged to 33 766 distinct consensus spectra covering >2300 proteins. The library was added to the current L. interrogans PeptideAtlas and can be downloaded from [114]http://www.peptideatlas.org. Next, we assessed the performance of the described approach by analyzing a single control sample and comparing the number of identified peptides/proteins to the conventional shotgun LC–MS/MS methodology using the same number of runs. While the non-directed DDA LC–MS/MS analysis ([115]Supplementary Figure S4A, blue) identified a larger number of peptides, 404 (40%) additional proteins could be detected by the directed strategy (1593) ([116]Supplementary Figure S4A, red). The coverage was particularly enhanced for proteins of mid-to-low abundance, indicating an increased identification efficiency for these proteins by the directed MS approach compared with DDA LC–MS/MS-based strategies ([117]Supplementary Figure S4B). 
Finally, we assessed the utility of the generated inclusion list/spectral library on a different LC–MS platform in a different proteomics laboratory. After adjusting the retention times of the PTPs to the new LC system, the identified proteins could be detected with the same high consistency ([118]Supplementary Figure S5A and B) and coverage ([119]Supplementary Figure S5C) as on the LC–MS platform that was used to build the inclusion list and spectral library. This demonstrates the value of the generated data for the application in other laboratories and the usefulness of the generated, global PeptideAtlas and inclusion mass list for the proteomics community.

Quantitative time course measurements of perturbed L. interrogans cells

We next used the method established above to acquire quantitative proteome profiles of Leptospira cells grown under different conditions. Specifically, cells were cultured in EMJH supplement (control samples) and in the presence of fetal bovine serum (FBS; 10% v/v) and antibiotics (5 μg/ml ciprofloxacin, 10 μg/ml penicillin G, 15 μg/ml doxycycline, respectively) in EMJH supplement. The underlying molecular mechanisms of the individual treatments are displayed in [120]Figure 3. Samples were taken after 3, 6, 12, 24, 48 and 168 h of treatment. Thus, overall 31 protein samples were generated, including 7 controls. We used label-free quantification to generate proteome maps of all detected PTPs and employed them for absolute protein quantification within each sample as well as relative protein quantification across all samples. Two technical replicates were acquired and averaged for all samples, to improve quantification accuracy. We first evaluated the combined technical and biological reproducibility of the relative protein quantification by comparing the proteome maps of three different control samples ([121]Supplementary Figure S6).
The high squared Pearson correlation R^2 (0.945–0.965) and the near straight lines indicated a nearly optimal linear relationship between the replicates. Specifically, minimal abundance variations between the replicate samples were observed with the inclusion list driven LC–MS/label-free quantification approach even for proteins of low abundance ([122]Supplementary Figure S6A–C). Consequently, with the measured coefficients of variation of the protein ratios being <26% between all controls, 1.5-fold changes (2 × σ) with a P-value <0.05 (ANOVA) can be confidently detected for most proteins by the described approach ([123]Supplementary Figure S6D–F). We next used the proteome maps to estimate the absolute quantities of the proteins in each perturbed sample and thus, in conjunction with the number of cells used to generate the samples, the cellular concentrations of the detected proteins. This was accomplished by translating the signal intensities of the high responder peptides from each detected protein into absolute protein quantities, using a recently published approach with some modifications ([124]Malmström et al, 2009). First, the absolute protein quantity of a consistent set of proteins was accurately determined in each sample by comparing the signal intensities of the sample intrinsic peptides with the corresponding signals generated by known amounts of isotopically labeled reference peptides of identical sequence that were added to each sample. Since these peptides were included in the directed LC–MS analysis, no additional SRM LC–MS analyses were required for their quantification. In this way, the precise concentrations of 29 peptides corresponding to 19 proteins could be calculated ([125]Supplementary Table SI).
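The arithmetic of this first step can be written out as a minimal Python sketch. It assumes that the endogenous (light) and reference (heavy) peptide respond identically in the mass spectrometer, that one peptide copy corresponds to one protein copy, and that digestion and recovery are complete. The function names, the femtomole unit and the averaging over peptides are choices made for this illustration, not details given in the text.

```python
import statistics

AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def peptide_copies_per_cell(light_intensity, heavy_intensity, heavy_fmol, n_cells):
    """Estimate copies per cell from one light/heavy peptide pair.

    light_intensity: signal of the endogenous (sample intrinsic) peptide
    heavy_intensity: signal of the isotopically labeled reference peptide
    heavy_fmol:      known amount of reference peptide added, in femtomoles
    n_cells:         number of cells that contributed to the analyzed sample
    """
    if heavy_intensity <= 0 or n_cells <= 0:
        raise ValueError("heavy_intensity and n_cells must be positive")
    light_fmol = heavy_fmol * light_intensity / heavy_intensity
    molecules = light_fmol * 1e-15 * AVOGADRO
    return molecules / n_cells


def protein_copies_per_cell(peptide_estimates):
    """Combine several peptide-level estimates for one protein.

    Taking the arithmetic mean is an assumption of this sketch.
    """
    return statistics.mean(peptide_estimates)
```

For example, a light/heavy intensity ratio of 2 against 1 fmol of reference peptide corresponds to 2 fmol of endogenous peptide; dividing the resulting number of molecules by the number of cells gives the copies per cell. Where a protein is covered by more than one reference peptide, the peptide-level estimates can be compared directly as a consistency check.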
The concentrations of these proteins spanned almost three orders of magnitude, from 68 copies/cell for the flagellar M-ring protein ([126]YP_001355.1) to 13 649 copies/cell for the GroEL protein ([127]YP_001299.1, [128]Supplementary Table SI), confirming the high dynamic abundance range covered by the method ([129]Supplementary Figure S3). In general, the protein abundances determined by multiple heavy reference peptides per protein showed good agreement, even for low abundance proteins ([130]Supplementary Table SI). Moreover, the values determined here matched very well with those published in a recent study and the structural benchmarks employed therein ([131]Malmström et al, 2009) ([132]Supplementary Figure S7). In a second step, these abundance values were aligned with the average intensities of the three PTPs of each protein with the highest MS response, the same peptides that were in the focus of the directed LC–MS analysis for peptide identification. In the same operation, we therefore consistently estimated the absolute abundances of all identified proteins in each of the samples. On average, a high squared Pearson correlation (R^2=0.805) of the absolute abundances accurately determined by heavy peptide references and their average feature