Source: https://github.com/markziemann/GeneNameErrors2020
View the reports: http://ziemann-lab.net/public/gene_name_errors/
Gene name errors result when data are imported improperly into MS Excel and other spreadsheet programs (Zeeberg et al, 2004). Certain gene names like MARCH3, SEPT2 and DEC1 are converted into date format. These errors are surprisingly common in supplementary data files in the field of genomics (Ziemann et al, 2016). This could be considered a minor problem because it affects only a small number of genes; however, it is symptomatic of poor data processing methods. The purpose of this script is to identify gene name errors present in supplementary files of PubMed Central articles from the previous month.
library("XML")
library("jsonlite")
library("xml2")
library("reutils")
library("readxl")
library("RCurl")
Here I will be getting PubMed Central IDs for the previous month.
Start by working out which month to search in PubMed Central.
CURRENT_MONTH=format(Sys.time(), "%m")
CURRENT_YEAR=format(Sys.time(), "%Y")
if (CURRENT_MONTH == "01") {
  PREV_YEAR=as.character(as.numeric(format(Sys.time(), "%Y"))-1)
  PREV_MONTH="12"
} else {
  PREV_YEAR=CURRENT_YEAR
  PREV_MONTH=as.character(as.numeric(format(Sys.time(), "%m"))-1)
}
DATE=paste(PREV_YEAR,"/",PREV_MONTH,sep="")
DATE
## [1] "2023/2"
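The date above has no leading zero on the month, and the search further down uses day 31 as the end of every month. For readers who prefer exact calendar bounds, here is a minimal sketch in base R. The helper name `prev_month_range` is hypothetical and is not part of the original script.

```r
# Hypothetical helper (not in the original script): returns the first and last
# day of the month preceding `today`, formatted as YYYY/MM/DD.
prev_month_range <- function(today = Sys.Date()) {
  # First day of the current month
  first_of_this_month <- as.Date(format(today, "%Y-%m-01"))
  # Subtracting one day lands on the last day of the previous month
  last_of_prev <- first_of_this_month - 1
  # First day of that previous month
  first_of_prev <- as.Date(format(last_of_prev, "%Y-%m-01"))
  c(mindate = format(first_of_prev, "%Y/%m/%d"),
    maxdate = format(last_of_prev, "%Y/%m/%d"))
}

prev_month_range(as.Date("2023-03-15"))
```

Because the calculation works on `Date` objects, the January case needs no special branch: a date in January 2023 yields a range in December 2022.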
Let’s see how many PMC IDs we have in the past month.
QUERY ='((genom*[Abstract]))'
ESEARCH_RES <- esearch(term=QUERY, db = "pmc", rettype = "uilist", retmode = "xml", retstart = 0,
  retmax = 5000000, usehistory = TRUE, webenv = NULL, querykey = NULL, sort = NULL, field = NULL,
  datetype = NULL, reldate = NULL,
  mindate = paste(DATE,"/1",sep="") , maxdate = paste(DATE,"/31",sep=""))
pmc <- efetch(ESEARCH_RES,retmode="text",rettype="uilist",outfile="pmcids.txt")
## Retrieving UIDs 1 to 500
## Retrieving UIDs 501 to 1000
## Retrieving UIDs 1001 to 1500
## Retrieving UIDs 1501 to 2000
## Retrieving UIDs 2001 to 2500
## Retrieving UIDs 2501 to 3000
## Retrieving UIDs 3001 to 3500
pmc <- read.table(pmc)
pmc <- paste("PMC",pmc$V1,sep="")
NUM_ARTICLES=length(pmc)
NUM_ARTICLES
## [1] 3476
writeLines(pmc,con="pmc.txt")
Now run the bash script. Note that false positives can occur (~1.5%) and these results have not been verified by a human.
Here are some definitions:
NUM_XLS = Number of supplementary Excel files in this set of PMC articles.
NUM_XLS_ARTICLES = Number of articles matching the PubMed Central search which have supplementary Excel files.
GENELISTS = The gene lists found in the Excel files. Each Excel file is counted once even if it has multiple gene lists.
NUM_GENELISTS = The number of Excel files with gene lists.
NUM_GENELIST_ARTICLES = The number of PMC articles with supplementary Excel gene lists.
ERROR_GENELISTS = Files suspected to contain gene name errors. The dates and five-digit numbers indicate transmogrified gene names.
NUM_ERROR_GENELISTS = Number of Excel gene lists with errors.
NUM_ERROR_GENELIST_ARTICLES = Number of articles with supplementary Excel gene name errors.
ERROR_PROPORTION = This is the proportion of articles with Excel gene lists that have errors.
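The definitions above can be illustrated on a toy input. This is a sketch only: the three lines below are invented, and the assumed layout (article ID, file path, species, count, then the converted names, separated by single spaces) is inferred from the output shown later in this report, not from the bash script itself.

```r
# Toy input; these lines are invented for illustration and are not real results.
toy <- c(
  "PMC0000001 files/a.xlsx",                          # Excel file, no gene list
  "PMC0000002 files/b.xlsx Hsapiens",                 # gene list, no errors
  "PMC0000003 files/c.xlsx Hsapiens 2 44621 44805"    # gene list with errors
)

# Number of space-separated fields on each line
n_fields <- lengths(strsplit(toy, " "))

genelists <- toy[n_fields > 2]         # a species field is present
error_genelists <- toy[n_fields > 3]   # a count and converted names are present

# Extract the article ID (first field) from each line
article_id <- function(x) vapply(strsplit(x, " "), `[[`, character(1), 1)

# Article-level proportion, computed the same way as ERROR_PROPORTION
error_proportion <- length(unique(article_id(error_genelists))) /
  length(unique(article_id(genelists)))
error_proportion
```

Here two articles have gene lists and one of them has errors, so the proportion is 0.5.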
system("./gene_names.sh pmc.txt")
results <- readLines("results.txt")
XLS <- results[grep("XLS",results,ignore.case=TRUE)]
NUM_XLS = length(XLS)
NUM_XLS
## [1] 6229
NUM_XLS_ARTICLES = length(unique(sapply(strsplit(XLS," "),"[[",1)))
NUM_XLS_ARTICLES
## [1] 1316
GENELISTS <- XLS[lapply(strsplit(XLS," "),length)>2]
#GENELISTS
NUM_GENELISTS <- length(unique(sapply(strsplit(GENELISTS," "),"[[",2)))
NUM_GENELISTS
## [1] 654
NUM_GENELIST_ARTICLES <- length(unique(sapply(strsplit(GENELISTS," "),"[[",1)))
NUM_GENELIST_ARTICLES
## [1] 346
ERROR_GENELISTS <- XLS[lapply(strsplit(XLS," "),length)>3]
#ERROR_GENELISTS
NUM_ERROR_GENELISTS = length(ERROR_GENELISTS)
NUM_ERROR_GENELISTS
## [1] 245
GENELIST_ERROR_ARTICLES <- unique(sapply(strsplit(ERROR_GENELISTS," "),"[[",1))
GENELIST_ERROR_ARTICLES
## [1] "PMC9968710" "PMC9958381" "PMC9957872" "PMC9957247" "PMC9956962"
## [6] "PMC9954656" "PMC9954001" "PMC9953877" "PMC9953556" "PMC9951815"
## [11] "PMC9951354" "PMC9949586" "PMC9948016" "PMC9947504" "PMC9947029"
## [16] "PMC9946837" "PMC9942122" "PMC9938127" "PMC9935838" "PMC9935506"
## [21] "PMC7614190" "PMC9929516" "PMC9928584" "PMC9928580" "PMC9928424"
## [26] "PMC9924891" "PMC9924296" "PMC9922290" "PMC9921068" "PMC9918054"
## [31] "PMC9917645" "PMC9917435" "PMC9916342" "PMC9912572" "PMC9910644"
## [36] "PMC9899606" "PMC9897672" "PMC9882984" "PMC9902559" "PMC9892829"
## [41] "PMC9870088" "PMC9886953" "PMC9968929" "PMC9965082" "PMC9961725"
## [46] "PMC9958104" "PMC9950624" "PMC9950457" "PMC9947337" "PMC9945660"
## [51] "PMC9943656" "PMC9936996" "PMC9939431" "PMC9936472" "PMC9935668"
## [56] "PMC9935575" "PMC9932166" "PMC9932081" "PMC9931374" "PMC9912024"
## [61] "PMC9928207" "PMC9926862" "PMC9926786" "PMC9922817" "PMC9916187"
## [66] "PMC9917439" "PMC9916484" "PMC9916442" "PMC9913674" "PMC9910260"
## [71] "PMC9903843" "PMC9903828" "PMC9903716" "PMC9909915" "PMC9907855"
## [76] "PMC9906884" "PMC9899247" "PMC9894539" "PMC9892568" "PMC9892530"
## [81] "PMC9890215" "PMC9890070" "PMC9889090" "PMC9889089" "PMC9888242"
## [86] "PMC9887306" "PMC9887000" "PMC9883539"
NUM_ERROR_GENELIST_ARTICLES <- length(GENELIST_ERROR_ARTICLES)
NUM_ERROR_GENELIST_ARTICLES
## [1] 88
ERROR_PROPORTION = NUM_ERROR_GENELIST_ARTICLES / NUM_GENELIST_ARTICLES
ERROR_PROPORTION
## [1] 0.2543353
Here you can have a look at all the gene lists detected in the past month, as well as those with errors. The dates are obvious errors; they are commonly dates in September, March, December and October. The five-digit numbers represent dates as they are encoded in the Excel internal format, which counts the number of days since the start of 1900. If you were to put these numbers into Excel and format the cells as dates, most of them would also map to dates in September, March, December and October.
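The conversion from five-digit numbers to dates can also be checked in R rather than Excel. The sketch below assumes the 1900 date system, for which `"1899-12-30"` is the conventional origin when converting modern serial numbers with `as.Date`. The three values are taken from the first entry in the output below.

```r
# Serial numbers copied from the first ERROR_GENELISTS entry in this report
serials <- c(44621, 44805, 44896)

# Assumption: Excel 1900 date system; origin "1899-12-30" is the usual choice
# in R for serial numbers later than February 1900.
dates <- as.Date(serials, origin = "1899-12-30")
format(dates, "%Y-%m-%d")
```

These map to 1 March, 1 September and 1 December 2022, which is what one would expect from names of the form MARCH1, SEPT1 and DEC1. The year most likely reflects when the spreadsheet was edited, not anything about the gene.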
#GENELISTS
ERROR_GENELISTS
## [1] "PMC9968710 PMC_DL/PMC9968710/supplementaryfiles/41398_2023_2370_MOESM3_ESM.xlsx Hsapiens 25 44818 44817 44621 44813 44624 44625 44817 44805 44627 44817 44626 44622 44815 44626 44624 44625 44809 44896 44623 44815 44817 44815 44809 44621 44626"
## [2] "PMC9968710 PMC_DL/PMC9968710/supplementaryfiles/41398_2023_2370_MOESM3_ESM.xlsx Hsapiens 13 44817 44815 44622 44627 44817 44627 44817 44815 44896 44624 44621 44805 44809"
## [3] "PMC9968710 PMC_DL/PMC9968710/supplementaryfiles/41398_2023_2370_MOESM3_ESM.xlsx Hsapiens 5 44817 44622 44815 44805 44815"
## [4] "PMC9968710 PMC_DL/PMC9968710/supplementaryfiles/41398_2023_2370_MOESM3_ESM.xlsx Hsapiens 7 44624 44805 44625 44622 44817 44815 44815"
## [5] "PMC9968710 PMC_DL/PMC9968710/supplementaryfiles/41398_2023_2370_MOESM3_ESM.xlsx Hsapiens 7 44815 44627 44622 44817 44809 44817 44621"
## [6] "PMC9968710 PMC_DL/PMC9968710/supplementaryfiles/41398_2023_2370_MOESM3_ESM.xlsx Hsapiens 4 44817 44815 44624 44805"
## [7] "PMC9958381 PMC_DL/PMC9958381/supplementaryfiles/mmc10.xlsx Hsapiens 1 44627"
## [8] "PMC9957872 PMC_DL/PMC9957872/supplementaryfiles/NIHMS1859677-supplement-MMC10.xlsx Mmusculus 9 44448 44449 44441 44446 44447 44443 44444 44442 44450"
## [9] "PMC9957247 zip/Table_S6.xlsx Hsapiens 1 44808"
## [10] "PMC9956962 zip/Table_S3_CircRNAs_Information_detected_in_sequence_data_of_hypothalamic_transcriptome_of_Leizhou_Goat.xlsx Hsapiens 13 3-Mar 10-Mar 43532 11-Mar 1-Mar 5-Mar 7-Mar 43714 43526 43713 4-Mar 6-Mar 9-Mar"
## [11] "PMC9954656 zip/Additional_file_S3.xlsx Mmusculus 2 44624 44621"
## [12] "PMC9954656 zip/Additional_file_S3.xlsx Mmusculus 1 44624"
## [13] "PMC9954656 zip/Additional_file_S3.xlsx Mmusculus 1 44986"
## [14] "PMC9954656 zip/Additional_file_S3.xlsx Mmusculus 1 44989"
## [15] "PMC9954001 zip/Table_S5.xlsx Hsapiens 1 44442"
## [16] "PMC9954001 zip/Table_S5.xlsx Hsapiens 1 44442"
## [17] "PMC9953877 zip/cancers-2130992-supplementary/Suppl_material_cancers-2130992_version2/Suppl_Tables_1-3_BC_HGSOC_Cuello_M_Cancers_new.xls Hsapiens 26 44896 44631 44818 44819 44630 44624 44810 44627 44811 44807 44816 44625 44813 44806 44622 44809 44629 44808 44814 44812 44623 44621 44815 44628 44805 44626"
## [18] "PMC9953556 zip/biomolecules-2146911-table_S1.xlsx Hsapiens 1 38961"
## [19] "PMC9951815 PMC_DL/PMC9951815/supplementaryfiles/Table_2.xlsx Hsapiens 1 44531"
## [20] "PMC9951815 PMC_DL/PMC9951815/supplementaryfiles/Table_1.xlsx Hsapiens 27 44257 44442 44443 44257 44446 44445 44262 44450 44264 44451 44259 44256 44261 44453 44447 44263 44441 44531 44265 44258 44440 44266 44448 44444 44256 44449 44260"
## [21] "PMC9951354 zip/ECE3_9854_SupplementaryTablesS1-S20_RevisedVersion.xlsx Hsapiens 2 44079 44089"
## [22] "PMC9951354 zip/ECE3_9854_SupplementaryTablesS1-S20_RevisedVersion.xlsx Hsapiens 4 44262 44262 44445 44449"
## [23] "PMC9951354 zip/ECE3_9854_SupplementaryTablesS1-S20_RevisedVersion.xlsx Hsapiens 9 44446 44442 44450 44454 44447 44445 44257 44449 44441"
## [24] "PMC9951354 zip/ECE3_9854_SupplementaryTablesS1-S20_RevisedVersion.xlsx Ggallus 13 44446 44262 44442 44450 44257 44447 44445 44263 44262 44257 44449 44263 44441"
## [25] "PMC9951354 zip/ECE3_9854_SupplementaryTablesS1-S20_RevisedVersion.xlsx Hsapiens 13 44446 44262 44447 44262 44262 44451 44446 44445 44261 44263 44448 44257 44258"
## [26] "PMC9951354 zip/ECE3_9854_SupplementaryTablesS1-S20_RevisedVersion.xlsx Ggallus 1 44447"
## [27] "PMC9951354 zip/ECE3_9854_SupplementaryTablesS1-S20_RevisedVersion.xlsx Hsapiens 3 44447 44256 44256"
## [28] "PMC9951354 zip/ECE3_9854_SupplementaryTablesS1-S20_RevisedVersion.xlsx Hsapiens 3 44447 44448 44442"
## [29] "PMC9951354 zip/ECE3_9854_SupplementaryTablesS1-S20_RevisedVersion.xlsx Hsapiens 7 44447 44256 44448 44263 44265 44442 44256"
## [30] "PMC9951354 zip/ECE3_9854_SupplementaryTablesS1-S20_RevisedVersion.xlsx Hsapiens 10 44450 44265 44447 44446 44448 44263 44445 44262 44449 44262"
## [31] "PMC9951354 zip/ECE3_9854_SupplementaryTablesS1-S20_RevisedVersion.xlsx Hsapiens 2 44262 44447"
## [32] "PMC9951354 zip/ECE3_9854_SupplementaryTablesS1-S20_RevisedVersion.xlsx Hsapiens 1 44447"
## [33] "PMC9951354 zip/ECE3_9854_SupplementaryTablesS1-S20_RevisedVersion.xlsx Hsapiens 1 44447"
## [34] "PMC9951354 zip/ECE3_9854_SupplementaryTablesS1-S20_RevisedVersion.xlsx Hsapiens 10 44447 44256 44256 44265 44257 44446 44446 44451 44263 44441"
## [35] "PMC9951354 zip/ECE3_9854_SupplementaryTablesS1-S20_RevisedVersion.xlsx Hsapiens 4 44265 44447 44257 44262"
## [36] "PMC9951354 zip/ECE3_9854_SupplementaryTablesS1-S20_RevisedVersion.xlsx Hsapiens 2 44447 44446"
## [37] "PMC9951354 zip/ECE3_9854_SupplementaryTablesS1-S20_RevisedVersion.xlsx Hsapiens 5 44447 44256 44262 44256 44446"
## [38] "PMC9949586 zip/djac160_Supplementary_Data/JNCI-21-1799R2_Jones_Supp_Tables_4-5_081622.xlsx Hsapiens 18 44088 44088 44088 44088 44083 44088 44083 44084 44088 44166 43891 43891 43891 43898 44166 43892 43892 43898"
## [39] "PMC9949586 zip/djac160_Supplementary_Data/JNCI-21-1799R2_Jones_Supp_Tables_4-5_081622.xlsx Hsapiens 17 14-Sep 14-Sep 14-Sep 14-Sep 9-Sep 1-Dec 8-Mar 9-Sep 1-Dec 10-Sep 1-Mar 8-Mar 1-Mar 2-Mar 4-Mar 2-Mar 4-Mar"
## [40] "PMC9948016 PMC_DL/PMC9948016/supplementaryfiles/DataSheet_2.xlsx Hsapiens 3 45178 45178 45178"
## [41] "PMC9947504 PMC_DL/PMC9947504/supplementaryfiles/Table_1.xlsx Hsapiens 26 44806 44813 44628 44819 44627 44815 44809 44621 44625 44811 44626 44624 44816 44807 44818 44805 44896 44630 44808 44631 44810 44629 44814 44812 44622 44623"
## [42] "PMC9947029 PMC_DL/PMC9947029/supplementaryfiles/125_2022_5856_MOESM2_ESM.xlsx Hsapiens 3 37865 39326 42248"
## [43] "PMC9946837 PMC_DL/PMC9946837/supplementaryfiles/41586_2023_5711_MOESM6_ESM.xlsx Mmusculus 1 44448"
## [44] "PMC9946837 PMC_DL/PMC9946837/supplementaryfiles/41586_2023_5711_MOESM6_ESM.xlsx Mmusculus 1 44258"
## [45] "PMC9946837 PMC_DL/PMC9946837/supplementaryfiles/41586_2023_5711_MOESM4_ESM.xlsx Hsapiens 5 43353 43349 43344 43348 43351"
## [46] "PMC9946837 PMC_DL/PMC9946837/supplementaryfiles/41586_2023_5711_MOESM4_ESM.xlsx Hsapiens 2 42985 42987"
## [47] "PMC9946837 PMC_DL/PMC9946837/supplementaryfiles/41586_2023_5711_MOESM4_ESM.xlsx Hsapiens 1 43345"
## [48] "PMC9946837 PMC_DL/PMC9946837/supplementaryfiles/41586_2023_5711_MOESM4_ESM.xlsx Mmusculus 4 43715 43714 43713 43719"
## [49] "PMC9942122 PMC_DL/PMC9942122/supplementaryfiles/mmc5.xlsx Hsapiens 22 43530 43712 43718 43533 43532 43716 43710 43526 43526 43719 43714 43715 43525 43525 43525 43722 43714 43709 43712 43526 43526 43711"
## [50] "PMC9942122 PMC_DL/PMC9942122/supplementaryfiles/mmc5.xlsx Hsapiens 1 44085"
## [51] "PMC9942122 PMC_DL/PMC9942122/supplementaryfiles/mmc5.xlsx Hsapiens 10 43533 43525 43711 43714 43722 43711 43527 43525 43709 43719"
## [52] "PMC9942122 PMC_DL/PMC9942122/supplementaryfiles/mmc5.xlsx Hsapiens 30 43526 43533 43528 43530 43800 43529 43525 43532 43527 43531 43532 43525 43533 43528 43530 43529 43526 43527 43531 43800 43528 43530 43800 43531 43532 43526 43525 43529 43527 43533"
## [53] "PMC9942122 PMC_DL/PMC9942122/supplementaryfiles/mmc5.xlsx Hsapiens 69 43711 43713 43713 43530 43709 43711 43530 43527 43530 43527 43530 43714 43714 43715 43714 43715 43715 43529 43710 43525 43525 43526 43526 43718 43531 43525 43716 43712 43532 43529 43527 43719 43533 43710 43717 43711 43718 43717 43718 43716 43525 43527 43711 43533 43712 43528 43711 43716 43530 43717 43525 43526 43718 43722 43714 43531 43525 43528 43717 43710 43530 43533 43718 43711 43531 43526 43712 43715 43716"
## [54] "PMC9942122 PMC_DL/PMC9942122/supplementaryfiles/mmc6.xlsx Hsapiens 4 43711 43530 43718 43712"
## [55] "PMC9942122 PMC_DL/PMC9942122/supplementaryfiles/mmc4.xlsx Hsapiens 2 44261 44261"
## [56] "PMC9942122 PMC_DL/PMC9942122/supplementaryfiles/mmc7.xlsx Hsapiens 14 44259 44261 44447 44446 44262 44256 44259 44262 44261 44444 44257 44256 44260 44447"
## [57] "PMC9938127 PMC_DL/PMC9938127/supplementaryfiles/41467_2023_36535_MOESM9_ESM.xlsx Hsapiens 105 44819 44819 44622 44622 44622 44621 44621 44621 44628 44628 44628 44625 44625 44625 44625 44629 44816 44816 44805 44805 44805 44808 44808 44630 44630 44630 44630 44630 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44622 44622 44622 44814 44814 44627 44627 44627 44627 44627 44624 44806 44806 44806 44806 44809 44809 44807 44807 44815 44815 44815 44815 44815 44815 44815 44815 44815 44621 44621 44621 44621 44621 44626 44626 44626 44626 44626 44626 44626 44631 44623 44623 44623 44623 44812 44812 44812 44812 44811 44811 44811 44811 44811 44818 44896 44896 44810 44810"
## [58] "PMC9938127 PMC_DL/PMC9938127/supplementaryfiles/41467_2023_36535_MOESM9_ESM.xlsx Hsapiens 218 44819 44819 44819 44819 44819 44622 44622 44622 44622 44622 44622 44622 44621 44621 44621 44621 44621 44628 44628 44628 44628 44625 44625 44625 44625 44625 44629 44629 44629 44816 44816 44816 44816 44816 44805 44805 44805 44805 44805 44805 44805 44805 44805 44805 44805 44805 44808 44808 44808 44808 44808 44808 44808 44808 44808 44630 44630 44630 44630 44630 44630 44630 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44622 44622 44622 44622 44814 44814 44814 44814 44814 44627 44627 44627 44627 44627 44627 44627 44627 44627 44624 44806 44806 44806 44806 44806 44806 44806 44806 44806 44806 44806 44806 44809 44809 44809 44809 44809 44809 44809 44809 44809 44809 44809 44807 44807 44807 44815 44815 44815 44815 44815 44815 44815 44815 44815 44815 44815 44815 44815 44815 44815 44815 44815 44621 44621 44621 44621 44621 44621 44621 44626 44626 44626 44626 44626 44626 44626 44626 44626 44626 44626 44626 44626 44626 44626 44631 44631 44631 44623 44623 44623 44623 44623 44623 44623 44812 44812 44812 44812 44812 44812 44812 44812 44812 44812 44812 44811 44811 44811 44811 44811 44811 44811 44811 44811 44811 44811 44811 44811 44818 44896 44896 44810 44810 44810 44810"
## [59] "PMC9935838 PMC_DL/PMC9935838/supplementaryfiles/Table_2.XLSX Hsapiens 1 44812"
## [60] "PMC9935506 PMC_DL/PMC9935506/supplementaryfiles/41419_2023_5671_MOESM9_ESM.xlsx Mmusculus 1 44449"
## [61] "PMC9935506 PMC_DL/PMC9935506/supplementaryfiles/41419_2023_5671_MOESM9_ESM.xlsx Mmusculus 2 44260 44262"
## [62] "PMC9935506 PMC_DL/PMC9935506/supplementaryfiles/41419_2023_5671_MOESM9_ESM.xlsx Mmusculus 2 44447 44442"
## [63] "PMC9935506 PMC_DL/PMC9935506/supplementaryfiles/41419_2023_5671_MOESM9_ESM.xlsx Mmusculus 2 44441 44257"
## [64] "PMC9935506 PMC_DL/PMC9935506/supplementaryfiles/41419_2023_5671_MOESM10_ESM.xlsx Mmusculus 2 44264 44261"
## [65] "PMC7614190 PMC_DL/PMC7614190/supplementaryfiles/EMS157168-supplement-Supplementary_Tables_1_7.xlsx Hsapiens 1 37316"
## [66] "PMC9929516 PMC_DL/PMC9929516/supplementaryfiles/can-22-2245_table_s2_suppst2.xlsx Hsapiens 25 44624 44622 44815 44806 44814 44621 44623 44626 44811 44805 44627 44812 44810 44625 44622 44628 44807 44808 44621 44630 44813 44629 44896 44818 44816"
## [67] "PMC9929516 PMC_DL/PMC9929516/supplementaryfiles/can-22-2245_table_s2_suppst2.xlsx Hsapiens 23 44807 44621 44629 44630 44813 44812 44628 44627 44625 44806 44810 44623 44805 44626 44815 44814 44811 44622 44624 44622 44621 44896 44808"
## [68] "PMC9928584 PMC_DL/PMC9928584/supplementaryfiles/41556_2022_1059_MOESM3_ESM.xlsx Hsapiens 26 44627 44622 44625 44631 44812 44628 44805 44818 44624 44621 44626 44629 44623 44621 44622 44813 44807 44809 44808 44630 44814 44816 44810 44806 44896 44815"
## [69] "PMC9928584 PMC_DL/PMC9928584/supplementaryfiles/41556_2022_1059_MOESM3_ESM.xlsx Hsapiens 1 44449"
## [70] "PMC9928584 PMC_DL/PMC9928584/supplementaryfiles/41556_2022_1059_MOESM3_ESM.xlsx Hsapiens 26 44621 44813 44621 44622 44631 44627 44629 44815 44630 44808 44805 44809 44623 44806 44624 44625 44896 44807 44814 44622 44816 44810 44818 44812 44626 44628"
## [71] "PMC9928584 PMC_DL/PMC9928584/supplementaryfiles/41556_2022_1059_MOESM3_ESM.xlsx Hsapiens 26 44628 44623 44622 44624 44630 44816 44896 44818 44621 44806 44805 44631 44625 44809 44812 44810 44627 44815 44621 44626 44814 44622 44808 44813 44807 44629"
## [72] "PMC9928584 PMC_DL/PMC9928584/supplementaryfiles/41556_2022_1059_MOESM3_ESM.xlsx Hsapiens 1 44531"
## [73] "PMC9928580 PMC_DL/PMC9928580/supplementaryfiles/Table_1.xlsx Mmusculus 1 44622"
## [74] "PMC9928580 PMC_DL/PMC9928580/supplementaryfiles/Table_1.xlsx Mmusculus 1 44814"
## [75] "PMC9928424 PMC_DL/PMC9928424/supplementaryfiles/elife-80317-supp3.xlsx Mmusculus 16 37865 38231 37135 39326 40057 40787 38596 40422 38961 39692 37500 42248 37012 38108 41883 41153"
## [76] "PMC9928424 PMC_DL/PMC9928424/supplementaryfiles/elife-80317-supp7.xlsx Mmusculus 14 39692 38596 40057 37865 38231 37135 40422 40787 38961 37500 39326 37012 38108 41883"
## [77] "PMC9928424 PMC_DL/PMC9928424/supplementaryfiles/elife-80317-supp2.xlsx Mmusculus 11 37865 39326 37500 38231 40787 37135 40057 38596 39692 38961 40422"
## [78] "PMC9928424 PMC_DL/PMC9928424/supplementaryfiles/elife-80317-supp4.xlsx Mmusculus 11 37865 39326 37500 38231 40787 37135 40057 38596 39692 38961 40422"
## [79] "PMC9928424 PMC_DL/PMC9928424/supplementaryfiles/elife-80317-supp6.xlsx Mmusculus 16 37865 37500 39326 38231 38961 40422 40787 40057 37135 38596 39692 42248 37012 38108 41883 41153"
## [80] "PMC9928424 PMC_DL/PMC9928424/supplementaryfiles/elife-80317-supp5.xlsx Mmusculus 11 37865 39326 37500 38231 40787 37135 40057 38596 39692 38961 40422"
## [81] "PMC9924891 PMC_DL/PMC9924891/supplementaryfiles/12094_2023_3108_MOESM1_ESM.xlsx Hsapiens 4 44622 44621 44805 44810"
## [82] "PMC9924891 PMC_DL/PMC9924891/supplementaryfiles/12094_2023_3108_MOESM1_ESM.xlsx Hsapiens 1 44621"
## [83] "PMC9924296 PMC_DL/PMC9924296/supplementaryfiles/bcd-21-0050_supp.table_1_suppst1.xlsx Hsapiens 2 43898 43897"
## [84] "PMC9924296 PMC_DL/PMC9924296/supplementaryfiles/bcd-21-0050_supp.table_1_suppst1.xlsx Hsapiens 2 43898 43897"
## [85] "PMC9924296 PMC_DL/PMC9924296/supplementaryfiles/bcd-21-0050_supp.table_1_suppst1.xlsx Hsapiens 4 43897 43898 44166 44166"
## [86] "PMC9924296 PMC_DL/PMC9924296/supplementaryfiles/bcd-21-0050_supp.table_1_suppst1.xlsx Hsapiens 1 43898"
## [87] "PMC9924296 PMC_DL/PMC9924296/supplementaryfiles/bcd-21-0050_supp.table_1_suppst1.xlsx Hsapiens 1 43898"
## [88] "PMC9924296 PMC_DL/PMC9924296/supplementaryfiles/bcd-21-0050_supp.table_6_suppst6.xlsx Hsapiens 15 44075 43893 44080 43896 44079 43897 44083 43892 43891 43898 44081 43895 44076 44082 44085"
## [89] "PMC9924296 PMC_DL/PMC9924296/supplementaryfiles/bcd-21-0050_supp.table_6_suppst6.xlsx Hsapiens 1 44080"
## [90] "PMC9924296 PMC_DL/PMC9924296/supplementaryfiles/bcd-21-0050_supp.table_6_suppst6.xlsx Hsapiens 17 44085 44080 43895 43896 44076 43893 43892 43892 44082 44081 43899 43898 43897 43891 43891 44075 44083"
## [91] "PMC9924296 PMC_DL/PMC9924296/supplementaryfiles/bcd-21-0050_supp.table_6_suppst6.xlsx Hsapiens 15 44080 43891 44081 43893 44075 44085 44076 44083 43898 43895 43896 44082 43892 43899 43897"
## [92] "PMC9924296 PMC_DL/PMC9924296/supplementaryfiles/bcd-21-0050_supp.table_6_suppst6.xlsx Hsapiens 4 44085 44075 43898 43891"
## [93] "PMC9924296 PMC_DL/PMC9924296/supplementaryfiles/bcd-21-0050_supp.table_6_suppst6.xlsx Hsapiens 17 44080 44083 44075 44081 43895 44076 44084 44085 43891 43898 44079 43893 44082 43896 43899 43897 43892"
## [94] "PMC9924296 PMC_DL/PMC9924296/supplementaryfiles/bcd-21-0050_supp.table_6_suppst6.xlsx Hsapiens 186 44445 44262 44450 44260 44440 44256 44450 44448 44445 44256 44440 44257 44258 44256 44258 44446 44450 44445 44260 44440 44262 44448 44261 44441 44264 44257 44445 44448 44264 44446 44256 44262 44260 44440 44441 44450 44261 44448 44264 44440 44447 44445 44256 44448 44256 44441 44445 44261 44258 44263 44262 44440 44440 44448 44256 44450 44262 44260 44446 44261 44264 44257 44258 44440 44448 44256 44450 44260 44262 44261 44445 44258 44450 44261 44264 44448 44256 44447 44260 44445 44257 44440 44256 44260 44257 44440 44446 44258 44261 44450 44447 44256 44440 44448 44262 44446 44258 44441 44447 44263 44264 44256 44440 44262 44450 44261 44445 44448 44258 44257 44264 44450 44256 44448 44258 44440 44441 44260 44445 44257 44262 44447 44448 44256 44445 44258 44257 44450 44264 44263 44441 44440 44256 44448 44447 44257 44262 44263 44440 44448 44256 44450 44441 44261 44260 44445 44263 44264 44446 44257 44262 44447 44448 44440 44450 44260 44256 44258 44263 44448 44264 44445 44256 44261 44262 44441 44450 44446 44446 44263 44447 44450 44445 44448 44262 44264 44261 44440 44256 44257 44440 44448 44262 44441 44264 44256"
## [95] "PMC9924296 PMC_DL/PMC9924296/supplementaryfiles/bcd-21-0050_supp.table_6_suppst6.xlsx Hsapiens 7 44258 44445 44260 44450 44448 44440 44256"
## [96] "PMC9924296 PMC_DL/PMC9924296/supplementaryfiles/bcd-21-0050_supp.table_6_suppst6.xlsx Hsapiens 67 44080 44075 44083 44083 44075 44083 44080 44083 44075 44081 44080 44075 44085 44083 43893 44081 43891 44085 44075 44080 44083 43893 44075 44083 44085 44080 43893 44075 44080 44083 44085 43893 44081 44075 44083 44075 44085 44083 44080 44075 44083 44085 43893 44080 43891 44085 44081 43893 44083 44075 43893 44080 44075 43891 44075 44085 43891 44075 44083 44080 44085 44076 43892 44080 44083 43893 43897"
## [97] "PMC9924296 PMC_DL/PMC9924296/supplementaryfiles/bcd-21-0050_supp.table_6_suppst6.xlsx Hsapiens 10 44445 44440 44260 44258 44262 44446 44441 44447 44257 44256"
## [98] "PMC9924296 PMC_DL/PMC9924296/supplementaryfiles/bcd-21-0050_supp.table_6_suppst6.xlsx Hsapiens 11 44256 44447 44264 44441 44446 44262 44260 44258 44440 44448 44445"
## [99] "PMC9924296 PMC_DL/PMC9924296/supplementaryfiles/bcd-21-0050_supp.table_6_suppst6.xlsx Hsapiens 5 44446 44445 44448 44258 44440"
## [100] "PMC9924296 PMC_DL/PMC9924296/supplementaryfiles/bcd-21-0050_supp.table_6_suppst6.xlsx Hsapiens 4 44256 44260 44448 44445"
## [101] "PMC9924296 PMC_DL/PMC9924296/supplementaryfiles/bcd-21-0050_supp.table_6_suppst6.xlsx Hsapiens 15 44445 44260 44262 44450 44440 44256 44448 44446 44263 44258 44441 44261 44447 44264 44257"
## [102] "PMC9924296 PMC_DL/PMC9924296/supplementaryfiles/bcd-21-0050_supp.table_6_suppst6.xlsx Hsapiens 17 44083 44085 43891 44080 43896 43892 43891 44079 44075 43893 43899 43895 43898 44082 44076 44081 43897"
## [103] "PMC9924296 PMC_DL/PMC9924296/supplementaryfiles/bcd-21-0050_supp.table_6_suppst6.xlsx Hsapiens 3 44080 44083 43891"
## [104] "PMC9924296 PMC_DL/PMC9924296/supplementaryfiles/bcd-21-0050_supp.table_6_suppst6.xlsx Hsapiens 18 43891 44083 44080 43891 44085 43896 43892 43895 44076 43898 44084 44075 44081 44082 43897 43899 43893 43892"
## [105] "PMC9924296 PMC_DL/PMC9924296/supplementaryfiles/bcd-21-0050_supp.table_6_suppst6.xlsx Hsapiens 1 43891"
## [106] "PMC9924296 PMC_DL/PMC9924296/supplementaryfiles/bcd-21-0050_supp.table_6_suppst6.xlsx Hsapiens 1 44080"
## [107] "PMC9924296 PMC_DL/PMC9924296/supplementaryfiles/bcd-21-0050_supp.table_3_suppst3.xlsx Hsapiens 1 43898"
## [108] "PMC9922290 PMC_DL/PMC9922290/supplementaryfiles/41467_2023_36462_MOESM16_ESM.xlsx Hsapiens 17 44813 44812 44811 44810 44809 44808 44807 44806 44815 44814 44805 44819 44628 44626 44625 44622 44621"
## [109] "PMC9921068 PMC_DL/PMC9921068/supplementaryfiles/13148_2023_1438_MOESM2_ESM.xlsx Hsapiens 10 37681 40057 40057 37316 37500 38047 40057 36951 38047 40057"
## [110] "PMC9921068 PMC_DL/PMC9921068/supplementaryfiles/13148_2023_1438_MOESM2_ESM.xlsx Hsapiens 3 40787 40057 40057"
## [111] "PMC9921068 PMC_DL/PMC9921068/supplementaryfiles/13148_2023_1438_MOESM2_ESM.xlsx Hsapiens 12 40057 37681 40057 37316 37500 38047 40057 36951 38047 40057 40057 40787"
## [112] "PMC9918054 zip/Table_S1_immune_genes.xlsx Hsapiens 1 44811"
## [113] "PMC9917645 zip/Supplementary_Tables_01_19_2023.xlsx Hsapiens 1 44810"
## [114] "PMC9917645 zip/Supplementary_Tables_01_19_2023.xlsx Hsapiens 1 44815"
## [115] "PMC9917435 PMC_DL/PMC9917435/supplementaryfiles/elife-71235-supp1.xlsx Hsapiens 1 44262"
## [116] "PMC9916342 zip/Source_data_Figure_1.xlsx Hsapiens 3 44628 44627 44629"
## [117] "PMC9912572 PMC_DL/PMC9912572/supplementaryfiles/12885_2022_10491_MOESM3_ESM.xlsx Hsapiens 3 44810 44814 44625"
## [118] "PMC9912572 PMC_DL/PMC9912572/supplementaryfiles/12885_2022_10491_MOESM3_ESM.xlsx Hsapiens 16 44805 44810 44809 44807 44623 44621 44626 44815 44629 44627 44818 44812 44631 44816 44814 44806"
## [119] "PMC9912572 PMC_DL/PMC9912572/supplementaryfiles/12885_2022_10491_MOESM3_ESM.xlsx Hsapiens 3 44623 44808 44815"
## [120] "PMC9912572 PMC_DL/PMC9912572/supplementaryfiles/12885_2022_10491_MOESM3_ESM.xlsx Hsapiens 10 44805 44810 44621 44626 44630 44622 44625 44815 44631 44816"
## [121] "PMC9912572 PMC_DL/PMC9912572/supplementaryfiles/12885_2022_10491_MOESM3_ESM.xlsx Hsapiens 11 44814 44815 44805 44622 44621 44630 44806 44808 44812 44623 44627"
## [122] "PMC9912572 PMC_DL/PMC9912572/supplementaryfiles/12885_2022_10491_MOESM3_ESM.xlsx Hsapiens 12 44813 44807 44622 44805 44814 44629 44808 44809 44626 44628 44625 44811"
## [123] "PMC9912572 PMC_DL/PMC9912572/supplementaryfiles/12885_2022_10491_MOESM3_ESM.xlsx Hsapiens 17 44807 44813 44809 44631 44628 44805 44625 44626 44624 44808 44812 44622 44819 44623 44815 44627 44818"
## [124] "PMC9912572 PMC_DL/PMC9912572/supplementaryfiles/12885_2022_10491_MOESM3_ESM.xlsx Hsapiens 10 44805 44621 44625 44812 44813 44819 44623 44818 44814 44624"
## [125] "PMC9912572 PMC_DL/PMC9912572/supplementaryfiles/12885_2022_10491_MOESM3_ESM.xlsx Hsapiens 10 44805 44621 44810 44623 44811 44629 44809 44626 44622 44819"
## [126] "PMC9912572 PMC_DL/PMC9912572/supplementaryfiles/12885_2022_10491_MOESM3_ESM.xlsx Hsapiens 12 44805 44621 44810 44809 44626 44622 44629 44819 44631 44812 44807 44625"
## [127] "PMC9912572 PMC_DL/PMC9912572/supplementaryfiles/12885_2022_10491_MOESM3_ESM.xlsx Hsapiens 3 44805 44623 44621"
## [128] "PMC9912572 PMC_DL/PMC9912572/supplementaryfiles/12885_2022_10491_MOESM3_ESM.xlsx Hsapiens 13 44805 44621 44810 44812 44809 44624 44896 44629 44807 44628 44816 44630 44625"
## [129] "PMC9912572 PMC_DL/PMC9912572/supplementaryfiles/12885_2022_10491_MOESM3_ESM.xlsx Hsapiens 11 44621 44806 44805 44631 44623 44624 44813 44814 44809 44811 44819"
## [130] "PMC9912572 PMC_DL/PMC9912572/supplementaryfiles/12885_2022_10491_MOESM3_ESM.xlsx Hsapiens 6 44624 44631 44809 44813 44819 44630"
## [131] "PMC9912572 PMC_DL/PMC9912572/supplementaryfiles/12885_2022_10491_MOESM3_ESM.xlsx Hsapiens 3 44813 44623 44815"
## [132] "PMC9912572 PMC_DL/PMC9912572/supplementaryfiles/12885_2022_10491_MOESM3_ESM.xlsx Hsapiens 10 44621 44805 44809 44623 44812 44813 44625 44810 44630 44629"
## [133] "PMC9912572 PMC_DL/PMC9912572/supplementaryfiles/12885_2022_10491_MOESM3_ESM.xlsx Hsapiens 3 44805 44621 44810"
## [134] "PMC9912572 PMC_DL/PMC9912572/supplementaryfiles/12885_2022_10491_MOESM3_ESM.xlsx Hsapiens 10 44805 44621 44622 44629 44809 44806 44627 44626 44630 44625"
## [135] "PMC9912572 PMC_DL/PMC9912572/supplementaryfiles/12885_2022_10491_MOESM3_ESM.xlsx Hsapiens 14 44805 44621 44810 44622 44809 44814 44626 44630 44813 44806 44818 44631 44811 44808"
## [136] "PMC9912572 PMC_DL/PMC9912572/supplementaryfiles/12885_2022_10491_MOESM3_ESM.xlsx Hsapiens 13 44805 44809 44811 44627 44808 44625 44806 44814 44819 44626 44621 44807 44815"
## [137] "PMC9912572 PMC_DL/PMC9912572/supplementaryfiles/12885_2022_10491_MOESM3_ESM.xlsx Hsapiens 12 44805 44621 44622 44624 44631 44814 44809 44806 44807 44815 44812 44808"
## [138] "PMC9912572 PMC_DL/PMC9912572/supplementaryfiles/12885_2022_10491_MOESM3_ESM.xlsx Hsapiens 12 44621 44805 44810 44631 44813 44809 44815 44812 44818 44811 44808 44629"
## [139] "PMC9912572 PMC_DL/PMC9912572/supplementaryfiles/12885_2022_10491_MOESM3_ESM.xlsx Hsapiens 18 44805 44621 44810 44807 44628 44813 44626 44814 44806 44625 44627 44809 44630 44816 44812 44896 44624 44631"
## [140] "PMC9912572 PMC_DL/PMC9912572/supplementaryfiles/12885_2022_10491_MOESM3_ESM.xlsx Hsapiens 5 44807 44811 44627 44623 44816"
## [141] "PMC9912572 PMC_DL/PMC9912572/supplementaryfiles/12885_2022_10491_MOESM3_ESM.xlsx Hsapiens 11 44805 44621 44627 44626 44806 44815 44811 44810 44625 44624 44814"
## [142] "PMC9912572 PMC_DL/PMC9912572/supplementaryfiles/12885_2022_10491_MOESM3_ESM.xlsx Hsapiens 10 44629 44807 44621 44815 44819 44811 44631 44806 44627 44810"
## [143] "PMC9912572 PMC_DL/PMC9912572/supplementaryfiles/12885_2022_10491_MOESM3_ESM.xlsx Hsapiens 13 44805 44621 44623 44809 44815 44806 44810 44626 44630 44629 44807 44808 44622"
## [144] "PMC9912572 PMC_DL/PMC9912572/supplementaryfiles/12885_2022_10491_MOESM3_ESM.xlsx Hsapiens 7 44805 44621 44810 44622 44624 44811 44809"
## [145] "PMC9912572 PMC_DL/PMC9912572/supplementaryfiles/12885_2022_10491_MOESM3_ESM.xlsx Hsapiens 4 44805 44810 44815 44809"
## [146] "PMC9912572 PMC_DL/PMC9912572/supplementaryfiles/12885_2022_10491_MOESM3_ESM.xlsx Hsapiens 8 44624 44629 44807 44626 44622 44818 44819 44621"
## [147] "PMC9910644 PMC_DL/PMC9910644/supplementaryfiles/pgen.1010583.s003.xlsx Hsapiens 1 44447"
## [148] "PMC9899606 PMC_DL/PMC9899606/supplementaryfiles/CAS-114-665-s020.xls Hsapiens 58 44266 44261 44261 44261 44445 44444 44257 44448 44256 44261 44266 44261 44440 44451 44448 44442 44257 44266 44261 44257 44440 44266 44261 44448 44264 44266 44261 44261 44261 44257 44266 44261 44445 44448 44264 44266 44261 44266 44261 44261 44264 44448 44261 44440 44266 44261 44264 44264 44448 44261 44440 44264 44264 44440 44451 44257 44266 44261"
## [149] "PMC9899606 PMC_DL/PMC9899606/supplementaryfiles/CAS-114-665-s020.xls Hsapiens 8 44440 44266 44257 44451 44264 44261 44445 44448"
## [150] "PMC9899606 PMC_DL/PMC9899606/supplementaryfiles/CAS-114-665-s007.xls Hsapiens 67 44259 44440 44448 44449 44266 44447 44453 44445 44256 44261 44256 44258 44256 44261 44453 44440 44449 44263 44256 44257 44263 44450 44262 44445 44444 44259 44446 44261 44442 44448 44442 44450 44256 44453 44260 44443 44448 44262 44256 44444 44256 44258 44449 44266 44443 44261 44446 44256 44445 44265 44263 44451 44256 44453 44443 44256 44259 44256 44451 44266 44443 44448 44448 44259 44451 44256 44266"
## [151] "PMC9899606 PMC_DL/PMC9899606/supplementaryfiles/CAS-114-665-s007.xls Hsapiens 12 44440 44448 44449 44258 44440 44450 44442 44450 44448 44258 44443 44256"
## [152] "PMC9899606 PMC_DL/PMC9899606/supplementaryfiles/CAS-114-665-s007.xls Hsapiens 19 44256 44263 44445 44440 44259 44450 44262 44442 44453 44444 44446 44258 44448 44443 44449 44261 44451 44256 44266"
## [153] "PMC9897672 zip/sciadv.ade1085_table_s2.xlsx Dmelanogaster 5 37135 38596 38231 37500 37226"
## [154] "PMC9897672 zip/sciadv.ade1085_table_s2.xlsx Dmelanogaster 5 37135 38596 37500 37226 38231"
## [155] "PMC9897672 zip/sciadv.ade1085_table_s2.xlsx Dmelanogaster 5 38596 37500 38231 37135 37226"
## [156] "PMC9882984 zip/sciadv.adf6277_dataset_s3.xlsx Mmusculus 1 43719"
## [157] "PMC9902559 PMC_DL/PMC9902559/supplementaryfiles/41398_2023_2326_MOESM2_ESM.xlsx Hsapiens 24 37104 37104 37104 37104 37104 37104 37104 37104 37104 37104 37104 37104 37104 37104 37104 37104 37104 37104 37104 37104 37104 37104 37104 37104"
## [158] "PMC9892829 PMC_DL/PMC9892829/supplementaryfiles/MOL2-17-238-s002.xlsx Hsapiens 2 40057 40057"
## [159] "PMC9870088 PMC_DL/PMC9870088/supplementaryfiles/jciinsight-8-153740-s153.xlsx Mmusculus 20 44257 44449 44257 44258 44444 44441 44440 44261 44256 44263 44260 44448 44264 44443 44262 44447 44450 44446 44445 44256"
## [160] "PMC9870088 PMC_DL/PMC9870088/supplementaryfiles/jciinsight-8-153740-s153.xlsx Mmusculus 20 44264 44258 44449 44257 44450 44256 44263 44260 44447 44261 44440 44262 44441 44256 44446 44445 44448 44444 44257 44443"
## [161] "PMC9886953 PMC_DL/PMC9886953/supplementaryfiles/42003_2023_4463_MOESM4_ESM.xlsx Rnorvegicus 37 44446 44446 44266 44266 44261 44257 44256 44445 44450 44260 44266 44256 44256 44256 44446 44262 44256 44256 44265 44446 44256 44258 44450 44265 44450 44256 44256 44450 44261 44256 44266 44256 44266 44265 44450 44257 44450"
## [162] "PMC9886953 PMC_DL/PMC9886953/supplementaryfiles/42003_2023_4463_MOESM4_ESM.xlsx Rnorvegicus 80 44450 44448 44450 44450 44448 44450 44450 44448 44450 44448 44450 44450 44448 44258 44450 44448 44450 44450 44448 44448 44448 44448 44448 44448 44450 44448 44448 44448 44448 44448 44448 44450 44450 44448 44450 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44450 44448 44450 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44256 44448 44448 44448 44448 44448 44448 44448 44256 44448 44450"
## [163] "PMC9886953 PMC_DL/PMC9886953/supplementaryfiles/42003_2023_4463_MOESM4_ESM.xlsx Mmusculus 55 44450 44450 44450 44450 44448 44450 44266 44448 44266 44450 44266 44448 44448 44450 44448 44256 44448 44450 44450 44449 44450 44448 44448 44448 44448 44450 44448 44448 44256 44256 44448 44448 44256 44448 44448 44450 44448 44448 44450 44448 44256 44448 44256 44448 44448 44256 44450 44450 44450 44256 44450 44256 44449 44441 44446"
## [164] "PMC9886953 PMC_DL/PMC9886953/supplementaryfiles/42003_2023_4463_MOESM4_ESM.xlsx Mmusculus 164 44266 44448 44448 44448 44450 44448 44448 44448 44448 44448 44448 44450 44448 44448 44448 44448 44448 44266 44450 44450 44266 44256 44450 44450 44448 44256 44448 44450 44450 44450 44450 44448 44448 44448 44450 44450 44258 44450 44441 44450 44256 44450 44448 44450 44450 44450 44450 44450 44450 44450 44450 44450 44450 44448 44448 44448 44448 44450 44450 44256 44448 44256 44258 44450 44258 44450 44258 44450 44450 44258 44448 44450 44448 44448 44448 44450 44450 44448 44450 44450 44448 44450 44450 44448 44448 44448 44450 44448 44450 44450 44450 44448 44448 44448 44448 44450 44450 44448 44448 44448 44448 44450 44448 44256 44448 44450 44448 44450 44448 44448 44448 44448 44450 44448 44450 44448 44448 44448 44448 44448 44450 44256 44448 44448 44448 44448 44448 44448 44448 44448 44450 44448 44256 44450 44448 44256 44450 44448 44450 44450 44450 44448 44448 44450 44450 44448 44448 44450 44448 44450 44450 44448 44445 44450 44450 44450 44450 44450 44262 44450 44450 44446 44256 44256"
## [165] "PMC9968929 PMC_DL/PMC9968929/supplementaryfiles/Table1.XLSX Hsapiens 26 44440 44256 44449 44445 44257 44257 44448 44265 44266 44256 44259 44264 44263 44447 44443 44442 44446 44451 44531 44260 44262 44454 44441 44261 44450 44258"
## [166] "PMC9965082 zip/Supplimentry_File1.xlsx Hsapiens 14 40787 38961 39326 38047 40603 38777 39692 37865 40422 40057 38777 38231 38231 40057"
## [167] "PMC9965082 zip/Supplimentry_File1.xlsx Hsapiens 16 40603 38047 40787 38777 38231 38231 39692 37865 38961 39326 39692 37500 40057 38777 37500 40057"
## [168] "PMC9965082 zip/Supplimentry_File1.xlsx Hsapiens 3 40057 37865 38961"
## [169] "PMC9961725 zip/Table_S1_DEPs_and_DEPPs.xlsx Mmusculus 2 44811 44811"
## [170] "PMC9958104 PMC_DL/PMC9958104/supplementaryfiles/42003_2023_4607_MOESM4_ESM.xlsx Athaliana 2 44442 44287"
## [171] "PMC9950624 PMC_DL/PMC9950624/supplementaryfiles/Table_1.xlsx Hsapiens 24 2023-03-02 2023-03-04 2023-09-09 2023-09-10 2023-09-01 2023-03-07 2023-03-10 2023-03-01 2023-03-02 2023-09-08 2023-03-01 2023-09-03 2023-09-07 2023-03-03 2023-09-04 2023-09-06 2023-09-02 2023-03-06 2023-09-05 2023-09-12 2023-03-05 2023-09-11 2023-03-08 2023-03-09"
## [172] "PMC9950457 PMC_DL/PMC9950457/supplementaryfiles/mmc3.xlsx Hsapiens 3 44265 44449 44265"
## [173] "PMC9950457 PMC_DL/PMC9950457/supplementaryfiles/mmc3.xlsx Hsapiens 28 44445 44447 44266 44449 44258 44448 44448 44446 44266 44441 44450 44449 44531 44265 44261 44443 44260 44256 44257 44263 44450 44448 44260 44442 44440 44443 44450 44265"
## [174] "PMC9950457 PMC_DL/PMC9950457/supplementaryfiles/mmc3.xlsx Hsapiens 5 44811 44814 44815 44818 44814"
## [175] "PMC9950457 PMC_DL/PMC9950457/supplementaryfiles/mmc3.xlsx Hsapiens 1 44624"
## [176] "PMC9947337 PMC_DL/PMC9947337/supplementaryfiles/mmc2.xls Hsapiens 7 2019/09/04 2019/09/05 2019/09/11 2019/03/08 2019/09/08 2019/09/09 2019/03/10"
## [177] "PMC9945660 PMC_DL/PMC9945660/supplementaryfiles/12903_2023_2806_MOESM1_ESM.xlsx Hsapiens 27 44807 44815 44813 44818 44811 44626 44810 44806 44627 44624 44808 44628 44816 44629 44630 44621 44625 44631 44622 44896 44812 44805 44622 44623 44814 44809 44621"
## [178] "PMC9943656 zip/LY-MAM-SupplementaryMaterial-transcriptome.xlsx Scerevisiae 1 37165"
## [179] "PMC9936996 zip/MCB00402-22R1-Supplemental_Table_S1.xlsx Scerevisiae 1 44105"
## [180] "PMC9936996 zip/MCB00402-22R1-Supplemental_Table_S1.xlsx Scerevisiae 1 44105"
## [181] "PMC9936996 zip/MCB00402-22R1-Supplemental_Table_S3.xlsx Scerevisiae 2 44835 44705"
## [182] "PMC9936996 zip/MCB00402-22R1-Supplemental_Table_S3.xlsx Scerevisiae 2 44470 44340"
## [183] "PMC9936996 zip/MCB00402-22R1-Supplemental_Table_S3.xlsx Scerevisiae 2 44705 44835"
## [184] "PMC9936996 zip/MCB00402-22R1-Supplemental_Table_S3.xlsx Scerevisiae 2 44340 44470"
## [185] "PMC9939431 PMC_DL/PMC9939431/supplementaryfiles/mmc3.xlsx Hsapiens 1 44256"
## [186] "PMC9936472 PMC_DL/PMC9936472/supplementaryfiles/41467_2023_36586_MOESM8_ESM.xlsx Scerevisiae 1 44835"
## [187] "PMC9936472 PMC_DL/PMC9936472/supplementaryfiles/41467_2023_36586_MOESM5_ESM.xlsx Scerevisiae 1 44470"
## [188] "PMC9936472 PMC_DL/PMC9936472/supplementaryfiles/41467_2023_36586_MOESM5_ESM.xlsx Scerevisiae 1 44470"
## [189] "PMC9936472 PMC_DL/PMC9936472/supplementaryfiles/41467_2023_36586_MOESM7_ESM.xlsx Scerevisiae 6 44835 44835 44835 44835 44835 44835"
## [190] "PMC9935668 PMC_DL/PMC9935668/supplementaryfiles/12033_2022_526_MOESM4_ESM.xlsx Hsapiens 13 44819 44627 44624 44621 44626 44631 44623 44896 44628 44625 44629 44630 44622"
## [191] "PMC9935668 PMC_DL/PMC9935668/supplementaryfiles/12033_2022_526_MOESM3_ESM.xlsx Hsapiens 13 44819 44627 44624 44621 44626 44631 44623 44896 44628 44625 44629 44630 44622"
## [192] "PMC9935575 zip/Raw_data.xlsx Hsapiens 14 44622 44621 44627 44624 44621 44626 44631 44623 44896 44628 44625 44629 44630 44622"
## [193] "PMC9932166 PMC_DL/PMC9932166/supplementaryfiles/41467_2023_36518_MOESM5_ESM.xlsx Mmusculus 1 44808"
## [194] "PMC9932081 PMC_DL/PMC9932081/supplementaryfiles/41598_2023_29212_MOESM8_ESM.xlsx Hsapiens 24 44262 44441 44263 44257 44265 44447 44446 44444 44258 44256 44450 44449 44257 44448 44440 44261 44260 44256 44443 44264 44259 44266 44451 44453"
## [195] "PMC9932081 PMC_DL/PMC9932081/supplementaryfiles/41598_2023_29212_MOESM12_ESM.xlsx Hsapiens 24 44441 44262 44259 44449 44264 44446 44257 44447 44258 44256 44450 44448 44265 44443 44257 44256 44266 44261 44444 44260 44451 44440 44453 44263"
## [196] "PMC9931374 PMC_DL/PMC9931374/supplementaryfiles/pdig.0000151.s029.xlsx Hsapiens 1 44813"
## [197] "PMC9912024 PMC_DL/PMC9912024/supplementaryfiles/MSB-19-e11084-s004.xlsx Scerevisiae 1 37165"
## [198] "PMC9912024 PMC_DL/PMC9912024/supplementaryfiles/MSB-19-e11084-s004.xlsx Scerevisiae 1 37165"
## [199] "PMC9928207 PMC_DL/PMC9928207/supplementaryfiles/DataSheet_2.xlsx Hsapiens 1 44621"
## [200] "PMC9926862 PMC_DL/PMC9926862/supplementaryfiles/40104_2022_826_MOESM1_ESM.xlsx Ggallus 1 44623"
## [201] "PMC9926786 PMC_DL/PMC9926786/supplementaryfiles/12915_2023_1533_MOESM9_ESM.xlsx Mmusculus 2 44080 44080"
## [202] "PMC9926786 PMC_DL/PMC9926786/supplementaryfiles/12915_2023_1533_MOESM9_ESM.xlsx Mmusculus 4 44445 44445 44445 44445"
## [203] "PMC9926786 PMC_DL/PMC9926786/supplementaryfiles/12915_2023_1533_MOESM9_ESM.xlsx Mmusculus 3 44080 44080 44080"
## [204] "PMC9926786 PMC_DL/PMC9926786/supplementaryfiles/12915_2023_1533_MOESM4_ESM.xlsx Mmusculus 1 44810"
## [205] "PMC9926786 PMC_DL/PMC9926786/supplementaryfiles/12915_2023_1533_MOESM4_ESM.xlsx Mmusculus 1 44814"
## [206] "PMC9926786 PMC_DL/PMC9926786/supplementaryfiles/12915_2023_1533_MOESM4_ESM.xlsx Mmusculus 1 44813"
## [207] "PMC9922817 PMC_DL/PMC9922817/supplementaryfiles/mmc3.xlsx Hsapiens 9 44625 44819 44814 44815 44806 44809 44811 44812 44813"
## [208] "PMC9916187 PMC_DL/PMC9916187/supplementaryfiles/mmc4.xlsx Hsapiens 60 44806 44811 44806 44806 44627 44811 44806 44806 44896 44811 44806 44806 44806 44627 44626 44806 44806 44627 44806 44811 44810 44806 44813 44811 44806 44627 44806 44627 44811 44806 44811 44813 44813 44813 44806 44806 44806 44806 44806 44806 44806 44806 44806 44806 44806 44806 44810 44815 44625 44627 44627 44627 44627 44627 44626 44811 44811 44811 44811 44811"
## [209] "PMC9917439 PMC_DL/PMC9917439/supplementaryfiles/elife-81828-supp1.xlsx Dmelanogaster 4 44441 44440 44444 44454"
## [210] "PMC9917439 PMC_DL/PMC9917439/supplementaryfiles/elife-81828-supp1.xlsx Dmelanogaster 1 44440"
## [211] "PMC9916484 zip/ijms-2076201-supplementary.xlsx Hsapiens 26 42248 37316 36951 40422 39142 38047 37500 40787 36951 38777 40603 37681 39692 39326 41883 37226 39508 38412 39873 41153 38231 40238 40057 37316 38596 37865"
## [212] "PMC9916442 zip/table_S1.xlsx Hsapiens 8 43720 43528 43534 43713 43525 43711 43527 43718"
## [213] "PMC9913674 zip/Table_S1.xls Hsapiens 13 2023/03/01 2023/12/01 2023/03/08 2023/03/03 2023/03/02 2023/03/10 2023/03/11 2023/03/04 2023/03/05 2023/03/06 2023/03/07 2023/03/09 2023/09/15"
## [214] "PMC9910260 PMC_DL/PMC9910260/supplementaryfiles/41598_2022_17107_MOESM2_ESM.xlsx Hsapiens 24 44260 44441 44264 44263 44446 44266 44445 44262 44448 44449 44443 44451 44440 44442 44531 44258 44447 44265 44256 44261 44257 44259 44450 44454"
## [215] "PMC9910260 PMC_DL/PMC9910260/supplementaryfiles/41598_2022_17107_MOESM3_ESM.xlsx Hsapiens 24 44441 44445 44262 44448 44256 44446 44454 44450 44261 44447 44257 44443 44449 44258 44260 44531 44263 44442 44264 44440 44259 44265 44451 44266"
## [216] "PMC9903843 PMC_DL/PMC9903843/supplementaryfiles/mmc5.xlsx Hsapiens 13 44625 44806 44625 44625 44622 44621 44622 44621 44625 44627 44628 44627 44809"
## [217] "PMC9903843 PMC_DL/PMC9903843/supplementaryfiles/mmc5.xlsx Hsapiens 13 44625 44806 44625 44625 44622 44621 44622 44621 44625 44627 44628 44627 44809"
## [218] "PMC9903843 PMC_DL/PMC9903843/supplementaryfiles/mmc7.xlsx Hsapiens 26 44624 44806 44818 44896 44629 44630 44813 44625 44625 44818 44622 44621 44807 44806 44622 44621 44622 44621 44807 44621 44627 44628 44627 44808 44809 44625"
## [219] "PMC9903843 PMC_DL/PMC9903843/supplementaryfiles/mmc7.xlsx Hsapiens 26 44624 44806 44818 44896 44629 44630 44813 44625 44625 44818 44622 44621 44807 44806 44622 44621 44622 44621 44807 44621 44627 44628 44627 44808 44809 44625"
## [220] "PMC9903828 PMC_DL/PMC9903828/supplementaryfiles/mmc3.xlsx Hsapiens 2 43718 43714"
## [221] "PMC9903716 PMC_DL/PMC9903716/supplementaryfiles/mmc21.xlsx Hsapiens 2 44448 44442"
## [222] "PMC9909915 PMC_DL/PMC9909915/supplementaryfiles/12920_2023_1450_MOESM4_ESM.xls Hsapiens 1 2023/09/04"
## [223] "PMC9907855 PMC_DL/PMC9907855/supplementaryfiles/Table_1.xlsx Athaliana 11 44654 44654 44805 44835 44781 44838 44774 44775 44835 44836 44806"
## [224] "PMC9906884 PMC_DL/PMC9906884/supplementaryfiles/13075_2023_3003_MOESM1_ESM.xlsx Hsapiens 34 44263 44531 44257 44441 44258 44448 44257 44449 44447 44261 44262 44441 44531 44263 44258 44262 44450 44257 44441 44449 44261 44256 44263 44257 44256 44441 44257 44262 44261 44450 44441 44263 44260 44258"
## [225] "PMC9899247 PMC_DL/PMC9899247/supplementaryfiles/41467_2023_35995_MOESM5_ESM.xlsx Hsapiens 44 43160 43435 43169 43160 43435 43169 43160 43435 43166 43352 43166 43347 43345 43345 43160 43345 43160 43160 43160 43160 43351 43353 43352 43163 43160 43168 43168 43353 43354 43160 43160 43160 43160 43160 43352 43353 43353 43353 43345 43345 43354 43165 43351 43354"
## [226] "PMC9894539 PMC_DL/PMC9894539/supplementaryfiles/ppat.1010753.s009.xlsx Hsapiens 2 5-Sep 1-Sep"
## [227] "PMC9892568 PMC_DL/PMC9892568/supplementaryfiles/41467_2023_35974_MOESM9_ESM.xlsx Hsapiens 1 37681"
## [228] "PMC9892568 PMC_DL/PMC9892568/supplementaryfiles/41467_2023_35974_MOESM3_ESM.xlsx Hsapiens 29 42248 37316 36951 40422 39142 38047 37500 40787 36951 38777 40603 37681 39692 39326 41883 37226 39508 38412 39873 41153 37135 37135 38231 40238 40057 37316 38596 37865 38961"
## [229] "PMC9892530 PMC_DL/PMC9892530/supplementaryfiles/41420_2023_1315_MOESM8_ESM.xlsx Hsapiens 24 4-Mar 11-Mar 11-Sep 4-Sep 8-Sep 9-Mar 5-Mar 6-Mar 10-Sep 8-Mar 14-Sep 5-Sep 12-Sep 3-Mar 3-Sep 9-Sep 7-Mar 6-Sep 1-Dec 10-Mar 15-Sep 2-Sep 2-Mar 7-Sep"
## [230] "PMC9890215 PMC_DL/PMC9890215/supplementaryfiles/izac201_suppl_supplementary_table_s3.xlsx Hsapiens 5 44621 44621 44621 44621 44621"
## [231] "PMC9890215 PMC_DL/PMC9890215/supplementaryfiles/izac201_suppl_supplementary_table_s2.xlsx Hsapiens 10 44626 44626 44628 44628 44623 44626 44623 44621 44621 44627"
## [232] "PMC9890070 PMC_DL/PMC9890070/supplementaryfiles/Table_1.xlsx Dmelanogaster 3 44835 44805 44806"
## [233] "PMC9889090 PMC_DL/PMC9889090/supplementaryfiles/elife-83077-supp2.xlsx Hsapiens 15 42248 41153 37135 37135 38231 40057 40422 37500 38596 37865 40787 39692 39326 41883 38961"
## [234] "PMC9889090 PMC_DL/PMC9889090/supplementaryfiles/elife-83077-supp2.xlsx Hsapiens 2 38231 38596"
## [235] "PMC9889090 PMC_DL/PMC9889090/supplementaryfiles/elife-83077-supp2.xlsx Hsapiens 15 42248 41153 37135 37135 38231 40057 40422 37500 38596 37865 40787 39692 39326 41883 38961"
## [236] "PMC9889090 PMC_DL/PMC9889090/supplementaryfiles/elife-83077-supp2.xlsx Hsapiens 13 37865 38231 39326 38961 40787 41153 39692 37500 37135 42248 40057 38596 40422"
## [237] "PMC9889089 PMC_DL/PMC9889089/supplementaryfiles/elife-80135-supp1.xlsx Hsapiens 78 40057 38961 42248 41883 39326 39692 42248 41883 38961 40057 41883 41883 40057 40057 40787 37500 41153 37500 40422 38961 41153 41153 37500 40787 37865 41883 38596 42248 40057 38231 42248 40787 37500 38231 40787 38596 41153 39326 41153 40422 40422 38961 38231 37865 40787 41153 40057 38596 39692 40422 39692 37865 38596 37500 39326 38961 38961 40422 38596 39326 40787 39692 42248 38231 40422 38231 41883 37865 39326 37500 39692 39326 39692 37865 42248 37865 38231 38596"
## [238] "PMC9889089 PMC_DL/PMC9889089/supplementaryfiles/elife-80135-supp1.xlsx Hsapiens 78 40057 38961 42248 41883 39326 39692 42248 41883 38961 40057 41883 41883 40057 40057 40787 37500 41153 37500 40422 38961 41153 41153 37500 40787 37865 41883 38596 42248 40057 38231 42248 40787 37500 38231 40787 38596 41153 39326 41153 40422 40422 38961 38231 37865 40787 41153 40057 38596 39692 40422 39692 37865 38596 37500 39326 38961 38961 40422 38596 39326 40787 39692 42248 38231 40422 38231 41883 37865 39326 37500 39692 39326 39692 37865 42248 37865 38231 38596"
## [239] "PMC9888242 PMC_DL/PMC9888242/supplementaryfiles/Table_1.xlsx Hsapiens 1 44440"
## [240] "PMC9887306 PMC_DL/PMC9887306/supplementaryfiles/Table_2.xlsx Hsapiens 1 44818"
## [241] "PMC9887000 PMC_DL/PMC9887000/supplementaryfiles/41467_2023_36097_MOESM6_ESM.xlsx Mmusculus 22 37135 39326 39692 40422 38231 40603 37865 38412 39508 37500 39142 37681 38777 38047 39873 38961 40787 40057 38596 37316 40238 37316"
## [242] "PMC9887000 PMC_DL/PMC9887000/supplementaryfiles/41467_2023_36097_MOESM12_ESM.xlsx Mmusculus 29 36951 36951 40057 40787 39326 38777 39326 39326 36951 40787 40057 39326 40603 40787 38047 40787 40238 37681 40603 40238 40057 40603 38047 37316 38047 38047 40603 38777 40787"
## [243] "PMC9887000 PMC_DL/PMC9887000/supplementaryfiles/41467_2023_36097_MOESM15_ESM.xlsx Mmusculus 5 36951 38412 39326 37500 40422"
## [244] "PMC9883539 PMC_DL/PMC9883539/supplementaryfiles/CAM4-12-2089-s003.xlsx Hsapiens 1 44531"
## [245] "PMC9883539 PMC_DL/PMC9883539/supplementaryfiles/CAM4-12-2089-s003.xlsx Hsapiens 12 44256 44258 44259 44261 44257 44263 44260 44266 44265 44531 44264 44262"
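Most of the five-digit values in the listing above look like Excel date serial numbers, which is how a date-converted gene name shows up when the cell is read back as a number. Here is a minimal sketch for turning them back into readable dates. It assumes the Windows 1900 date system (origin "1899-12-30" in R); the helper names `excel_serial_to_date` and `serial_to_label` are made up for this example and are not part of the script.

```r
# Convert Excel serial numbers to R Dates.
# Assumption: the workbook uses the 1900 date system, for which the
# conventional origin in R is "1899-12-30".
excel_serial_to_date <- function(serial) {
  as.Date(as.numeric(serial), origin = "1899-12-30")
}

# Build a "day-Mon" label (e.g. "1-Mar") similar to what Excel displays.
# month.abb is a base R constant, so the label does not depend on locale.
serial_to_label <- function(serial) {
  d <- excel_serial_to_date(serial)
  paste0(as.integer(format(d, "%d")), "-", month.abb[as.integer(format(d, "%m"))])
}

# Three serials taken from the listing above.
serials <- c(44621, 44805, 44896)
dates <- excel_serial_to_date(serials)
labels <- serial_to_label(serials)
print(data.frame(serial = serials, date = format(dates), label = labels))
```

The label is not enough to recover the original gene symbol with certainty. For example, "1-Mar" can come from either MARCH1 or MARC1, so treat the mapping as a hint only.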
Let’s investigate the errors in more detail.
# By species
SPECIES <- sapply(strsplit(ERROR_GENELISTS," "),"[[",3)
table(SPECIES)
## SPECIES
##     Athaliana Dmelanogaster       Ggallus      Hsapiens     Mmusculus
##             2             6             3           182            37
##   Rnorvegicus   Scerevisiae
##             2            13
par(mar=c(5,12,4,2))
barplot(table(SPECIES),horiz=TRUE,las=1)
par(mar=c(5,5,4,2))
# Number of affected Excel files per paper
DIST <- table(sapply(strsplit(ERROR_GENELISTS," "),"[[",1))
DIST
##
## PMC7614190 PMC9870088 PMC9882984 PMC9883539 PMC9886953 PMC9887000 PMC9887306
## 1 2 1 2 4 3 1
## PMC9888242 PMC9889089 PMC9889090 PMC9890070 PMC9890215 PMC9892530 PMC9892568
## 1 2 4 1 2 1 2
## PMC9892829 PMC9894539 PMC9897672 PMC9899247 PMC9899606 PMC9902559 PMC9903716
## 1 1 3 1 5 1 1
## PMC9903828 PMC9903843 PMC9906884 PMC9907855 PMC9909915 PMC9910260 PMC9910644
## 1 4 1 1 1 2 1
## PMC9912024 PMC9912572 PMC9913674 PMC9916187 PMC9916342 PMC9916442 PMC9916484
## 2 30 1 1 1 1 1
## PMC9917435 PMC9917439 PMC9917645 PMC9918054 PMC9921068 PMC9922290 PMC9922817
## 1 2 2 1 3 1 1
## PMC9924296 PMC9924891 PMC9926786 PMC9926862 PMC9928207 PMC9928424 PMC9928580
## 25 2 6 1 1 6 2
## PMC9928584 PMC9929516 PMC9931374 PMC9932081 PMC9932166 PMC9935506 PMC9935575
## 5 2 1 2 1 5 1
## PMC9935668 PMC9935838 PMC9936472 PMC9936996 PMC9938127 PMC9939431 PMC9942122
## 2 1 4 6 2 1 8
## PMC9943656 PMC9945660 PMC9946837 PMC9947029 PMC9947337 PMC9947504 PMC9948016
## 1 1 6 1 1 1 1
## PMC9949586 PMC9950457 PMC9950624 PMC9951354 PMC9951815 PMC9953556 PMC9953877
## 2 4 1 17 2 1 1
## PMC9954001 PMC9954656 PMC9956962 PMC9957247 PMC9957872 PMC9958104 PMC9958381
## 2 4 1 1 1 1 1
## PMC9961725 PMC9965082 PMC9968710 PMC9968929
## 1 3 6 1
summary(as.numeric(DIST))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   1.000   1.000   1.000   2.784   2.250  30.000
hist(DIST,main="Number of affected Excel files per paper")
# PMC Articles with the most errors
DIST_DF <- as.data.frame(DIST)
DIST_DF <- DIST_DF[order(-DIST_DF$Freq),,drop=FALSE]
head(DIST_DF,20)
## Var1 Freq
## 30 PMC9912572 30
## 43 PMC9924296 25
## 74 PMC9951354 17
## 63 PMC9942122 8
## 45 PMC9926786 6
## 48 PMC9928424 6
## 60 PMC9936996 6
## 66 PMC9946837 6
## 87 PMC9968710 6
## 19 PMC9899606 5
## 50 PMC9928584 5
## 55 PMC9935506 5
## 5 PMC9886953 4
## 10 PMC9889090 4
## 23 PMC9903843 4
## 59 PMC9936472 4
## 72 PMC9950457 4
## 79 PMC9954656 4
## 6 PMC9887000 3
## 17 PMC9897672 3
MOST_ERR_FILES = as.character(DIST_DF[1,1])
MOST_ERR_FILES
## [1] "PMC9912572"
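One thing to keep in mind: in the listing, the same file can appear in several entries (it seems to get one entry per affected gene list), so the counts in DIST are entries per article rather than strictly distinct files. Below is a minimal sketch of counting distinct files per article instead. It assumes each entry is space-separated as "PMCID path species count ..." and that file paths contain no spaces; `count_files_per_article` is a hypothetical helper, not part of the script.

```r
# Four entries copied from the listing above, truncated to the first four fields.
example_entries <- c(
  "PMC9899606 PMC_DL/PMC9899606/supplementaryfiles/CAS-114-665-s020.xls Hsapiens 58",
  "PMC9899606 PMC_DL/PMC9899606/supplementaryfiles/CAS-114-665-s020.xls Hsapiens 8",
  "PMC9899606 PMC_DL/PMC9899606/supplementaryfiles/CAS-114-665-s007.xls Hsapiens 67",
  "PMC9910644 PMC_DL/PMC9910644/supplementaryfiles/pgen.1010583.s003.xlsx Hsapiens 1"
)

# Count the number of distinct file paths seen for each PMC ID.
# Assumption: field 1 is the PMC ID and field 2 is the file path.
count_files_per_article <- function(entries) {
  fields <- strsplit(entries, " ", fixed = TRUE)
  pmc <- vapply(fields, `[[`, character(1), 1)
  path <- vapply(fields, `[[`, character(1), 2)
  vapply(split(path, pmc), function(p) length(unique(p)), integer(1))
}

files_per_article <- count_files_per_article(example_entries)
print(files_per_article)
```

With the real data this would be called as `count_files_per_article(ERROR_GENELISTS)`, and the result could be sorted the same way as DIST_DF.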
# Number of errors per paper
NERR <- as.numeric(sapply(strsplit(ERROR_GENELISTS," "),"[[",4))
names(NERR) <- sapply(strsplit(ERROR_GENELISTS," "),"[[",1)
NERR <-tapply(NERR, names(NERR), sum)
NERR
## PMC7614190 PMC9870088 PMC9882984 PMC9883539 PMC9886953 PMC9887000 PMC9887306
## 1 40 1 13 336 56 1
## PMC9888242 PMC9889089 PMC9889090 PMC9890070 PMC9890215 PMC9892530 PMC9892568
## 1 156 45 3 15 24 30
## PMC9892829 PMC9894539 PMC9897672 PMC9899247 PMC9899606 PMC9902559 PMC9903716
## 2 2 15 44 164 24 2
## PMC9903828 PMC9903843 PMC9906884 PMC9907855 PMC9909915 PMC9910260 PMC9910644
## 2 78 34 11 1 48 1
## PMC9912024 PMC9912572 PMC9913674 PMC9916187 PMC9916342 PMC9916442 PMC9916484
## 2 290 13 60 3 8 26
## PMC9917435 PMC9917439 PMC9917645 PMC9918054 PMC9921068 PMC9922290 PMC9922817
## 1 5 2 1 25 17 9
## PMC9924296 PMC9924891 PMC9926786 PMC9926862 PMC9928207 PMC9928424 PMC9928580
## 425 5 12 1 1 79 2
## PMC9928584 PMC9929516 PMC9931374 PMC9932081 PMC9932166 PMC9935506 PMC9935575
## 80 48 1 48 1 9 14
## PMC9935668 PMC9935838 PMC9936472 PMC9936996 PMC9938127 PMC9939431 PMC9942122
## 26 1 9 10 323 1 152
## PMC9943656 PMC9945660 PMC9946837 PMC9947029 PMC9947337 PMC9947504 PMC9948016
## 1 27 14 3 7 26 3
## PMC9949586 PMC9950457 PMC9950624 PMC9951354 PMC9951815 PMC9953556 PMC9953877
## 35 37 24 90 28 1 26
## PMC9954001 PMC9954656 PMC9956962 PMC9957247 PMC9957872 PMC9958104 PMC9958381
## 2 5 13 1 9 2 1
## PMC9961725 PMC9965082 PMC9968710 PMC9968929
## 2 33 61 26
hist(NERR,main="Number of errors per PMC article")
NERR_DF <- as.data.frame(NERR)
NERR_DF <- NERR_DF[order(-NERR_DF$NERR),,drop=FALSE]
head(NERR_DF,20)
## NERR
## PMC9924296 425
## PMC9886953 336
## PMC9938127 323
## PMC9912572 290
## PMC9899606 164
## PMC9889089 156
## PMC9942122 152
## PMC9951354 90
## PMC9928584 80
## PMC9928424 79
## PMC9903843 78
## PMC9968710 61
## PMC9916187 60
## PMC9887000 56
## PMC9910260 48
## PMC9929516 48
## PMC9932081 48
## PMC9889090 45
## PMC9899247 44
## PMC9870088 40
MOST_ERR = rownames(NERR_DF)[1]
MOST_ERR
## [1] "PMC9924296"
GENELIST_ERROR_ARTICLES <- gsub("PMC","",GENELIST_ERROR_ARTICLES)
# JSON parsing is more reliable than XML
ARTICLES <- esummary( GENELIST_ERROR_ARTICLES , db="pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLES,as= "parsed")
ARTICLE_DATA <- ARTICLE_DATA$result
ARTICLE_DATA <- ARTICLE_DATA[2:length(ARTICLE_DATA)]
JOURNALS <- unlist(lapply(ARTICLE_DATA,function(x) {x$fulljournalname} ))
JOURNALS_TABLE <- table(JOURNALS)
JOURNALS_TABLE <- JOURNALS_TABLE[order(-JOURNALS_TABLE)]
length(JOURNALS_TABLE)
## [1] 53
NUM_JOURNALS=length(JOURNALS_TABLE)
par(mar=c(5,25,4,2))
barplot(head(JOURNALS_TABLE,10), horiz=TRUE, las=1,
xlab="Articles with gene name errors in supp files",
main="Top journals this month")
Congrats to our Journal of the Month winner!
JOURNAL_WINNER <- names(head(JOURNALS_TABLE,1))
JOURNAL_WINNER
## [1] "International Journal of Molecular Sciences"
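Taking the first element of the sorted table reports a single journal even if two or more journals are tied for first place. Here is a minimal sketch that returns all tied leaders. The journal names and counts are invented for illustration, and `top_journals` is a hypothetical helper rather than something the script defines.

```r
# Toy data, invented for illustration only.
toy_journals <- c("Journal A", "Journal B", "Journal A", "Journal B", "Journal C")
toy_table <- table(toy_journals)

# Return the names of every journal whose count equals the maximum count.
top_journals <- function(journal_table) {
  counts <- as.vector(journal_table)
  names(journal_table)[counts == max(counts)]
}

winners <- top_journals(toy_table)
print(winners)
```

In the report this could be applied as `top_journals(JOURNALS_TABLE)`; when there is no tie it gives the same answer as the single-winner code.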
There are two categories:
Paper with the most supplementary files affected by gene name errors (MOST_ERR_FILES)
Paper with the most gene names converted to dates (MOST_ERR)
Sometimes one paper wins both categories. Congrats to our winners.
MOST_ERR_FILES <- gsub("PMC","",MOST_ERR_FILES)
ARTICLES <- esummary( MOST_ERR_FILES , db="pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLES,as= "parsed")
ARTICLE_DATA <- ARTICLE_DATA[2]
ARTICLE_DATA
## $result
## $result$uids
## [1] "9912572"
##
## $result$`9912572`
## $result$`9912572`$uid
## [1] "9912572"
##
## $result$`9912572`$pubdate
## [1] "2023 Feb 9"
##
## $result$`9912572`$epubdate
## [1] "2023 Feb 9"
##
## $result$`9912572`$printpubdate
## [1] ""
##
## $result$`9912572`$source
## [1] "BMC Cancer"
##
## $result$`9912572`$authors
## name authtype
## 1 Tu Z Author
## 2 Li K Author
## 3 Ji Q Author
## 4 Huang Y Author
## 5 Lv S Author
## 6 Li J Author
## 7 Wu L Author
## 8 Huang K Author
## 9 Zhu X Author
##
## $result$`9912572`$title
## [1] "Pan-cancer analysis: predictive role of TAP1 in cancer prognosis and response to immunotherapy"
##
## $result$`9912572`$volume
## [1] "23"
##
## $result$`9912572`$issue
## [1] ""
##
## $result$`9912572`$pages
## [1] "133"
##
## $result$`9912572`$articleids
## idtype value
## 1 pmid 36759763
## 2 doi 10.1186/s12885-022-10491-w
## 3 pmcid PMC9912572
##
## $result$`9912572`$fulljournalname
## [1] "BMC Cancer"
##
## $result$`9912572`$sortdate
## [1] "2023/02/09 00:00"
##
## $result$`9912572`$pmclivedate
## [1] "2023/02/11"
MOST_ERR <- gsub("PMC","",MOST_ERR)
ARTICLE_DATA <- esummary(MOST_ERR,db = "pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLE_DATA,as= "parsed")
ARTICLE_DATA
## $header
## $header$type
## [1] "esummary"
##
## $header$version
## [1] "0.3"
##
##
## $result
## $result$uids
## [1] "9924296"
##
## $result$`9924296`
## $result$`9924296`$uid
## [1] "9924296"
##
## $result$`9924296`$pubdate
## [1] "2021 Aug 24"
##
## $result$`9924296`$epubdate
## [1] "2021 Aug 24"
##
## $result$`9924296`$printpubdate
## [1] "2022 Jan"
##
## $result$`9924296`$source
## [1] "Blood Cancer Discov"
##
## $result$`9924296`$authors
## name authtype
## 1 Petti AA Author
## 2 Khan SM Author
## 3 Xu Z Author
## 4 Helton N Author
## 5 Fronick CC Author
## 6 Fulton R Author
## 7 Ramakrishnan SM Author
## 8 Nonavinkere Srivatsan S Author
## 9 Heath SE Author
## 10 Westervelt P Author
## 11 Payton JE Author
## 12 Walter MJ Author
## 13 Link DC Author
## 14 DiPersio J Author
## 15 Miller C Author
## 16 Ley TJ Author
##
## $result$`9924296`$title
## [1] "Genetic and Transcriptional Contributions to Relapse in Normal Karyotype Acute Myeloid Leukemia"
##
## $result$`9924296`$volume
## [1] "3"
##
## $result$`9924296`$issue
## [1] "1"
##
## $result$`9924296`$pages
## [1] "32-49"
##
## $result$`9924296`$articleids
## idtype value
## 1 pmid 35019859
## 2 doi 10.1158/2643-3230.BCD-21-0050
## 3 pmcid PMC9924296
##
## $result$`9924296`$fulljournalname
## [1] "Blood Cancer Discovery"
##
## $result$`9924296`$sortdate
## [1] "2021/08/24 00:00"
##
## $result$`9924296`$pmclivedate
## [1] "2023/02/14"
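The full records printed above are quite verbose. As a minimal sketch, the few fields that identify a winning paper can be pulled into a one-row data frame. The example below works on a hand-built list that mimics the structure shown in the output (a `title`, a `fulljournalname` and an `articleids` data frame with `idtype` and `value` columns). The real parsed object may differ in detail, and `summarise_record` is a hypothetical helper.

```r
# A hand-built record that mimics one element of the parsed esummary result.
# Values are copied from the output above.
example_record <- list(
  uid = "9912572",
  title = "Pan-cancer analysis: predictive role of TAP1 in cancer prognosis and response to immunotherapy",
  fulljournalname = "BMC Cancer",
  articleids = data.frame(
    idtype = c("pmid", "doi", "pmcid"),
    value = c("36759763", "10.1186/s12885-022-10491-w", "PMC9912572"),
    stringsAsFactors = FALSE
  )
)

# Reduce one record to a compact one-row data frame.
summarise_record <- function(rec) {
  ids <- rec$articleids
  # Look up one identifier type; return NA if it is absent.
  get_id <- function(type) {
    hit <- ids$value[ids$idtype == type]
    if (length(hit) == 0) NA_character_ else hit[1]
  }
  data.frame(
    pmcid = get_id("pmcid"),
    doi = get_id("doi"),
    journal = rec$fulljournalname,
    title = rec$title,
    stringsAsFactors = FALSE
  )
}

winner_summary <- summarise_record(example_record)
print(winner_summary)
```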
Next, plot the trend over the past 6-12 months, using the figures from the previously published monthly reports.
url <- "https://ziemann-lab.net/public/gene_name_errors/"
doc <- htmlParse(url)
links <- xpathSApply(doc, "//a/@href")
links <- links[grep("html",links)]
listing <- htmlParse( getURL(url, ftp.use.epsv = FALSE, dirlistonly = TRUE) )
listing <- xpathSApply(listing, "//a/@href")
listing <- listing[grep("html",listing)]
unlink("online_files/",recursive=TRUE)
dir.create("online_files")
sapply(listing, function(mylink) {
download.file(paste(url,mylink,sep=""),destfile=paste("online_files/",mylink,sep=""))
} )
## href href href href href href href href href href href href href href href href
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## href href href href href href href href
## 0 0 0 0 0 0 0 0
myfilelist <- list.files("online_files/",full.names=TRUE)
trends <- sapply(myfilelist, function(myfilename) {
x <- readLines(myfilename)
# Num XL gene list articles
NUM_GENELIST_ARTICLES <- x[grep("NUM_GENELIST_ARTICLES",x)[3]+1]
NUM_GENELIST_ARTICLES <- sapply(strsplit(NUM_GENELIST_ARTICLES," "),"[[",3)
NUM_GENELIST_ARTICLES <- sapply(strsplit(NUM_GENELIST_ARTICLES,"<"),"[[",1)
NUM_GENELIST_ARTICLES <- as.numeric(NUM_GENELIST_ARTICLES)
# number of affected articles
NUM_ERROR_GENELIST_ARTICLES <- x[grep("NUM_ERROR_GENELIST_ARTICLES",x)[3]+1]
NUM_ERROR_GENELIST_ARTICLES <- sapply(strsplit(NUM_ERROR_GENELIST_ARTICLES," "),"[[",3)
NUM_ERROR_GENELIST_ARTICLES <- sapply(strsplit(NUM_ERROR_GENELIST_ARTICLES,"<"),"[[",1)
NUM_ERROR_GENELIST_ARTICLES <- as.numeric(NUM_ERROR_GENELIST_ARTICLES)
# Error proportion
ERROR_PROPORTION <- x[grep("ERROR_PROPORTION",x)[3]+1]
ERROR_PROPORTION <- sapply(strsplit(ERROR_PROPORTION," "),"[[",3)
ERROR_PROPORTION <- sapply(strsplit(ERROR_PROPORTION,"<"),"[[",1)
ERROR_PROPORTION <- as.numeric(ERROR_PROPORTION)
# number of journals
NUM_JOURNALS <- x[grep('JOURNALS_TABLE',x)[3]+1]
NUM_JOURNALS <- sapply(strsplit(NUM_JOURNALS," "),"[[",3)
NUM_JOURNALS <- sapply(strsplit(NUM_JOURNALS,"<"),"[[",1)
NUM_JOURNALS <- as.numeric(NUM_JOURNALS)
NUM_JOURNALS
res <- c(NUM_GENELIST_ARTICLES,NUM_ERROR_GENELIST_ARTICLES,ERROR_PROPORTION,NUM_JOURNALS)
return(res)
})
colnames(trends) <- sapply(strsplit(colnames(trends),"_"),"[[",3)
colnames(trends) <- gsub(".html","",colnames(trends),fixed=TRUE)
trends <- as.data.frame(trends)
rownames(trends) <- c("NUM_GENELIST_ARTICLES","NUM_ERROR_GENELIST_ARTICLES","ERROR_PROPORTION","NUM_JOURNALS")
trends <- t(trends)
trends <- as.data.frame(trends)
CURRENT_RES <- c(NUM_GENELIST_ARTICLES,NUM_ERROR_GENELIST_ARTICLES,ERROR_PROPORTION,NUM_JOURNALS)
trends <- rbind(trends,CURRENT_RES)
paste(CURRENT_YEAR,CURRENT_MONTH,sep="-")
## [1] "2023-03"
rownames(trends)[nrow(trends)] <- paste(CURRENT_YEAR,CURRENT_MONTH,sep="-")
plot(trends$NUM_GENELIST_ARTICLES, xaxt = "n" , type="b" , main="Number of articles with Excel gene lists per month",
ylab="number of articles", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
plot(trends$NUM_ERROR_GENELIST_ARTICLES, xaxt = "n" , type="b" , main="Number of articles with gene name errors per month",
ylab="number of articles", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
plot(trends$ERROR_PROPORTION, xaxt = "n" , type="b" , main="Proportion of articles with Excel gene list affected by errors",
ylab="proportion", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
plot(trends$NUM_JOURNALS, xaxt = "n" , type="b" , main="Number of journals with affected articles",
ylab="number of journals", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
unlink("online_files/",recursive=TRUE)
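The trend-parsing function above repeats the same four steps for each statistic: find the third line mentioning the variable, take the following line, keep its third space-separated token, and drop everything from the first "<". A minimal sketch of that logic as a single helper follows. `extract_reported_value` is a hypothetical name, and the mock report lines are an assumption about what the rendered HTML looks like, not a copy of a real report.

```r
# Extract the numeric value printed after the n-th mention of `key`.
# Returns NA if the key is not mentioned often enough or no number is found.
extract_reported_value <- function(lines, key, occurrence = 3) {
  hits <- grep(key, lines, fixed = TRUE)
  if (length(hits) < occurrence) return(NA_real_)
  value_line <- lines[hits[occurrence] + 1]
  token <- strsplit(value_line, " ", fixed = TRUE)[[1]][3]
  # Drop trailing HTML such as "</code></pre>" before converting.
  as.numeric(sub("<.*$", "", token))
}

# Mock lines standing in for a rendered report (hypothetical content).
mock_report <- c(
  "NUM_GENELIST_ARTICLES is defined here",
  "some other line",
  "NUM_GENELIST_ARTICLES is mentioned again",
  "<pre class=\"r\"><code>NUM_GENELIST_ARTICLES</code></pre>",
  "<pre><code>## [1] 456</code></pre>"
)

n_articles <- extract_reported_value(mock_report, "NUM_GENELIST_ARTICLES")
missing_value <- extract_reported_value(mock_report, "ERROR_PROPORTION")
print(c(n_articles, missing_value))
```

Returning NA for a missing key means one malformed report leaves a gap in the trend plot instead of stopping the whole run.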
Zeeberg, B.R., Riss, J., Kane, D.W. et al. Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics. BMC Bioinformatics 5, 80 (2004). https://doi.org/10.1186/1471-2105-5-80
Ziemann, M., Eren, Y. & El-Osta, A. Gene name errors are widespread in the scientific literature. Genome Biol 17, 177 (2016). https://doi.org/10.1186/s13059-016-1044-7
sessionInfo()
## R version 4.2.2 Patched (2022-11-10 r83330)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
## [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
## [7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] RCurl_1.98-1.10 readxl_1.4.2 reutils_0.2.3 xml2_1.3.3
## [5] jsonlite_1.8.4 XML_3.99-0.13
##
## loaded via a namespace (and not attached):
## [1] assertthat_0.2.1 digest_0.6.31 bitops_1.0-7 cellranger_1.1.0
## [5] R6_2.5.1 evaluate_0.20 highr_0.10 rlang_1.0.6
## [9] cachem_1.0.7 cli_3.6.0 jquerylib_0.1.4 bslib_0.4.2
## [13] rmarkdown_2.20 tools_4.2.2 xfun_0.37 yaml_2.3.7
## [17] fastmap_1.1.1 compiler_4.2.2 htmltools_0.5.4 knitr_1.42
## [21] sass_0.4.5