Abstract

Circulating cell-free DNA (ccfDNA) has great potential for non-invasive diagnosis, prognosis and treatment monitoring of disease. However, a sensitive and specific whole-genome sequencing (WGS) method is required to identify novel genetic variations (i.e., SNVs, CNVs and INDELs) in ccfDNA that can be used as clinical biomarkers. In this article, five WGS methods were compared: ThruPLEX Plasma-seq, QIAseq cfDNA All-in-One, NEXTFLEX Cell Free DNA-seq, Accel-NGS 2S PCR-Free DNA and Accel-NGS 2S Plus DNA. The Accel PCR-free kit did not produce enough material for sequencing. The other kits shared a significant number of SNVs, INDELs and CNVs and showed similar results for SNVs and CNVs. The detection of variants and genomic signatures depends more on the type of plasma sample than on the WGS method used. Accel detected several variants not observed by the other kits. ThruPLEX seemed to identify more low-abundance SNVs, and its SNV signatures were similar to those observed with the QIAseq kit. Accel and NEXTFLEX had similar CNV and SNV signatures. These results demonstrate the importance of establishing a standardized workflow for identifying non-invasive candidate biomarkers. Moreover, the combination of variants discovered in ccfDNA using WGS has the potential to identify enrichment pathways, while the analysis of signatures could identify new subgroups of patients.

Subject terms: Biological techniques, Biotechnology, Cancer, Biomarkers

Introduction

The analysis of circulating cell-free DNA (ccfDNA) from plasma holds great promise for the diagnosis, prognosis and treatment monitoring of cancer^1. In the context of precision medicine, the identification of novel non-invasive biomarkers is crucial, but the analysis of ccfDNA is still a challenge. Indeed, ccfDNA is present at low concentration and is highly fragmented, and its abundance depends on the type and stage of cancer and on the pre-analytical steps^2,3–5.
Because of these properties, a complete workflow covering sample preparation, library preparation, sequencing and data analysis should be established to ensure standardized sample analysis, especially for clinical cohorts^4,6,7. Pre-analytical steps including sample collection, storage, processing and extraction have been compared to maximize the yield and size of ccfDNA^3,5,8–12. Furthermore, size analysis and quantification methods have been used to evaluate the extracted ccfDNA. Sensitive approaches such as quantitative PCR, digital PCR, mass spectrometry and next generation sequencing (NGS) are commonly applied to analyze extracted ccfDNA^2. With the improvement of NGS analysis, whole-genome sequencing (WGS) has become a powerful approach for identifying all types of genomic alteration, including single nucleotide variants (SNVs), insertions and deletions (INDELs), copy number variations (CNVs) and structural variants (SVs), and thus for identifying candidate biomarkers in cancer^13. In particular, several specific and sensitive low-coverage sequencing approaches have been applied to the analysis of CNVs from cancer plasma samples^14–20. In addition, recent WGS studies have enabled the analysis of nucleosome positioning, tumor fraction, fragmentation patterns, and chromosomal and microsatellite instability using specific ccfDNA WGS methods^21–28. In the present work, we compared commercially available WGS kits based on Illumina sequencing for the analysis of ccfDNA. To ensure optimal analysis of samples, a sample preparation workflow was established^8. Then, five commercially available WGS kits, including one PCR-free kit and four kits with a final amplification step, were compared for the detection of germline and somatic mutations as well as CNVs.

Results

Five commercially available WGS kits were compared: ThruPLEX, QIAseq, NEXTFLEX, Accel with PCR and Accel PCR-free.
Each library was prepared starting with 5–10 ng of input material to obtain a sufficient amount of library to sequence at 10X or 30X coverage. Both germline and somatic mutations were detected using GATK, and CNVs were detected using ichorCNA^29–31.

Sample preparation

A complete workflow was developed to maximize the yield of ccfDNA extracted from plasma, based on previously compared ccfDNA extraction methods^8. Commercially available plasma containing K2-EDTA as an anticoagulant was chosen to optimize ccfDNA analysis^32. Thawed plasma samples were centrifuged to remove potential contamination by high molecular weight (HMW) DNA before extraction^33. The extractions were performed using the commonly used QIAamp Circulating Nucleic Acid kit, starting with 1 mL of plasma and using an elution volume of 100 µL. ccfDNA was then quantified using a fluorometric assay, and the fragment sizes were analysed by electrophoresis to normalize each sample. A plasma control sample (HD816) was used to check the extraction efficiency; the recovery of this control sample was 80.7% ± 4.3%. The average concentration of all extracted ccfDNA samples was 26.7 ± 13.5 ng/mL of plasma. The average fragment size of all ccfDNA samples was 167 bp ± 4 bp (Supplementary Fig. S1). The fragment size analysis of the breast cancer 1 sample also showed HMW DNA at about 10,000 bp, and the pool of healthy donors also had a peak at about 8,500 bp. Only the prostate cancer patient provided enough ccfDNA (52 ng/mL of plasma) to evaluate all library constructions. The other ccfDNA samples were analyzed using the ThruPLEX method, which has been used in several other studies^15,22,23,26. Three fragmented control DNAs (NA12878, HD780 and HD786) were used to mimic ccfDNA and to evaluate the detected variants.
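The yield arithmetic in this section can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names are hypothetical and not part of the study, the 1 mL plasma input and 100 µL elution volume are taken from the text, the eluate concentration of 0.52 ng/µL is back-calculated from the reported 52 ng/mL, and material consumed by quantification and size analysis is ignored.

```python
def yield_ng_per_ml(eluate_ng_per_ul, elution_ul=100.0, plasma_ml=1.0):
    """Convert an eluate concentration (ng/uL) into ccfDNA yield per mL of plasma.

    Defaults follow the extraction described in the text (assumed here):
    100 uL elution volume from 1 mL of plasma.
    """
    return eluate_ng_per_ul * elution_ul / plasma_ml


def libraries_possible(total_ng, input_ng):
    """Whole number of library preparations that total_ng of ccfDNA can supply."""
    return int(total_ng // input_ng)


if __name__ == "__main__":
    # Hypothetical eluate concentration, back-calculated from 52 ng/mL of plasma.
    prostate_ng = yield_ng_per_ml(0.52)
    print("prostate sample, libraries at 10 ng:", libraries_possible(prostate_ng, 10.0))

    # Average yield reported for all samples (26.7 ng/mL of plasma).
    print("average sample, libraries at 10 ng:", libraries_possible(26.7, 10.0))
```

Under these assumptions, about 52 ng covers five library preparations at the upper input of 10 ng, whereas the average yield of 26.7 ng covers only two, which is consistent with only the prostate cancer sample being usable for all library constructions.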
Sequencing of library preparation

To ensure a fair evaluation of the library preparation kits, a process was established starting with 5–10 ng of input material. To avoid adapter dimers, adapters were diluted for the QIAseq and NEXTFLEX protocols, and PCR libraries were purified at 0.8X for QIAseq^34. Indeed, a high proportion of adapter dimers in the library generates clusters on the flow cell and could consequently reduce the sequencing capacity available to the sample^34,35. Although the adapters were diluted, adapter dimers were still detected in the NEXTFLEX library preparation, but they represented only about 1% of all clusters of this sample (Fig. 1).

Figure 1. Size profiles of WGS libraries prepared using Accel, NEXTFLEX, QIAseq and ThruPLEX from prostate cancer plasma.

The PCR-free product of Accel was not detected, and consequently this protocol could not be compared in this manuscript. For the four other library preparation kits, the number of PCR cycles was determined using qPCR assays for each sample to maximize the PCR library yield for 10X or 30X sequencing starting with this low amount of input material (5–10 ng)^36,37. The number of PCR cycles was between 7 and 10 for all kits, which corresponds to the manufacturers' recommendations except for Accel, for which it was greater (7 instead of 2 cycles) with this input. Consequently, optimizing the number of cycles provided enough library to sequence at either 10X or 30X. Finally, PCR libraries were quantified by qPCR and the size of each library was analysed for equimolar pooling of samples (Fig. 1). Libraries from the four kits were sequenced at 10X and/or 30X coverage. The median coverage and percentage of paired-end (PE) reads of all 30 WGS samples are shown in Supplementary Fig. S2. Although the median coverage is similar across kits for 10X or 30X sequencing, the Accel kit shows the highest median coverage.
The percentages of PE reads are not significantly different between kits (p-values between 0.19 and 0.75). Furthermore, the ThruPLEX library constructions from plasma samples show that the median coverage (10.3X ± 2.5X), the percentage of PE reads (90.4% ± 6.8%) and the insert size (163 bp ± 14.6 bp) also depend on the type of plasma sample. Finally, for the NEXTFLEX library construction, 5 ng of starting material was used as recommended by the manufacturer, except for the NA12878 WGS at 30X. The WGS comparison of all NA12878 samples at 30X shows that 10 ng of starting input can also be used for NEXTFLEX.

Detection of targeted variants from the reference control sample

To compare the sensitivity and specificity of germline and somatic mutation detection, three standard reference samples were used: NA12878 (Table 1) and the HD786 and HD780 ccfDNA reference standards (Table 2).

Table 1. Germline SNV and INDEL detection in the NA12878 sample (NIST reference HG001 of GIAB, https://www.nist.gov/programs-projects/genome-bottle) from the Accel, NEXTFLEX, QIAseq and ThruPLEX kits.

WGS method  Median coverage (X)  Number of SNVs  Number of INDELs  SNV TPR (%)  SNV PPV (%)  INDEL TPR (%)  INDEL PPV (%)
Accel       12.0                 3616493         702550            95.96        99.42        87.47          96.06
Accel       38                   3838215         927664            99.9         99.68        98.85          93.18
NEXTFLEX    9.0                  3303878         582961            88.37        98.98        76.92          94.7
NEXTFLEX    37                   3810345         882677            99.82        99.66        98.04          94.17
QIAseq      8.0                  3209340         598051            85.35        98.41        74.59          89.64
QIAseq      35                   3808366         931168            99.77        99.62        97.22          87.22
ThruPLEX    8.0                  3084349         575960            81.44        97.18        68.1           81.16
ThruPLEX    33                   3777238         916835            99.56        99.54        93.45          80.33

The number of detected SNVs and INDELs and the TPR and PPV of each.

Table 2. Somatic SNV detection in the HD780 and HD786 samples from the Accel, NEXTFLEX, QIAseq and ThruPLEX kits.

Sample  WGS method  Median coverage (X)  Detected SNV
HD780   NEXTFLEX    9.0                  -
HD780   Accel       9.0                  -
HD780   Accel       45.4                 -
HD780   ThruPLEX    8.0                  PIK3CA (E545K)
HD780   ThruPLEX    40.0                 PIK3CA (E545K)
HD780   QIAseq      8.0                  KRAS (G12D)
HD786   NEXTFLEX    8.0                  -
HD786   Accel       9.0                  PIK3CA (E545K)
HD786   Accel       47.0                 PIK3CA (E545K)
HD786   ThruPLEX    8.0                  -
HD786   ThruPLEX    38.0                 PIK3CA (E545K) and GNA11 (Q209L)
HD786   QIAseq      8.0                  -

-: no SNV listed.

NA12878 DNA was used to assess whether 10X or 30X sequencing coverage was sufficient to detect the correct germline mutations (Table 1). The true positive rate (TPR) and the positive predictive value (PPV) were calculated to compare the sensitivity and the specificity of detection of known germline SNPs and INDELs in this sample (Table 1). The Accel method detects more SNVs and INDELs than the other kits and has a higher TPR and PPV for SNVs and INDELs, especially at 10X read depth. In addition, at 30X read depth, the TPR and PPV of SNVs are higher than 99.5% for each method, and the TPR and PPV of INDELs are 93.45–98.85% and 80.33–94.17%, respectively. The TPR and PPV of INDELs are lower than those of SNVs because INDELs are usually more difficult to detect. Finally, the TPR of INDELs at 30X (≥93.45%) is higher than the TPR at 10X (≤87.47%), whereas the PPV at 30X (80.33% to 94.17%) is lower than the PPV at 10X (81.16% to 96.06%) for all methods. Furthermore, WGS of the NA12878 sample using the NEXTFLEX kit was performed using 5 ng for 10X, according to the manufacturer's recommendation, and 10 ng for 30X, whereas the three other WGS kits were prepared starting with 10 ng of input material for both 10X and 30X coverage. At both coverage levels and input amounts, NEXTFLEX is the second-best kit for the detection of germline variants. Moreover, the HD780 control sample carries six somatic SNVs at ∼5% in the EGFR (L858R and T790M), KRAS (G12D), NRAS (Q61K and A59T) and PIK3CA (E545K) genes. HD786 contains three somatic SNVs at ∼5% in the GNA11 (Q209L), AKT1 (E17K) and PIK3CA (E545K) genes. These two references also contain two INDELs: