Source: https://github.com/markziemann/GeneNameErrors2020
View the reports: http://ziemann-lab.net/public/gene_name_errors/
Gene name errors result when data are imported improperly into MS Excel and other spreadsheet programs (Zeeberg et al, 2004). Certain gene names, such as MARCH3, SEPT2 and DEC1, are converted into dates. These errors are surprisingly common in supplementary data files in the field of genomics (Ziemann et al, 2016). This could be considered a minor error because it affects only a small number of genes. However, it is symptomatic of poor data processing methods. The purpose of this script is to identify gene name errors in the supplementary files of PubMed Central articles from the previous month.
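The sketch below is a rough illustration of what converted entries look like in a table. It flags cell values that match three date-like forms seen in the error listings of this report: day-month strings such as "2-Sep", slash-separated dates such as "2021/03/02", and five-digit Excel serial numbers such as "44623". The function name and patterns are hypothetical. They are not the rules used by gene_names.sh. The five-digit rule in particular will also flag legitimate numbers, such as numeric gene IDs.

```r
# Hypothetical detector for gene symbols that a spreadsheet has turned into dates.
# Illustration only: NOT the logic used by gene_names.sh.
looks_like_date_error <- function(x) {
  # e.g. "2-Sep", "8-mar" (day followed by a month abbreviation)
  day_month <- grepl("^[0-9]{1,2}-(Sep|Mar|Dec|Oct)$", x, ignore.case = TRUE)
  # e.g. "2021/03/02" (full date written with slashes)
  slash_date <- grepl("^[0-9]{4}/[0-9]{2}/[0-9]{2}$", x)
  # e.g. "44623" (Excel date serial number); prone to false positives
  serial <- grepl("^[0-9]{5}$", x)
  day_month | slash_date | serial
}

looks_like_date_error(c("2-Sep", "8-mar", "2021/03/02", "44623", "SEPT2", "TP53"))
# TRUE TRUE TRUE TRUE FALSE FALSE
```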
library("jsonlite")
library("xml2")
library("reutils")
library("readxl")
Here I retrieve PubMed Central IDs for articles from the previous month.
First, set the month to search, written as YYYY/M.
DATE="2022/6"
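The month above is hard-coded. The sketch below shows one way to derive the "YYYY/M" string for the calendar month before a given date, so that the report could be rerun each month without editing it. The helper name `prev_month_string` is hypothetical and is not part of the original script.

```r
# Hypothetical helper: return "YYYY/M" for the calendar month before `today`.
prev_month_string <- function(today = Sys.Date()) {
  # First day of the current month
  first_of_month <- as.Date(format(today, "%Y-%m-01"))
  # Step back one month; the second element is the first day of last month
  last_month <- seq(first_of_month, by = "-1 month", length.out = 2)[2]
  # Month without a leading zero, matching the DATE="2022/6" style
  paste0(format(last_month, "%Y"), "/", as.integer(format(last_month, "%m")))
}

prev_month_string(as.Date("2022-07-15"))  # "2022/6"
prev_month_string(as.Date("2022-01-10"))  # "2021/12"
```

With this helper, the assignment above could become `DATE <- prev_month_string()`.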
Let’s see how many PMC IDs we have in the past month.
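QUERY = '((genom*[Abstract]))'
# Search PMC for articles with "genom*" in the abstract during the month DATE.
# usehistory = TRUE keeps the result set on the NCBI history server for efetch.
ESEARCH_RES <- esearch(term = QUERY, db = "pmc", rettype = "uilist", retmode = "xml",
  retstart = 0, retmax = 5000000, usehistory = TRUE, webenv = NULL, querykey = NULL,
  sort = NULL, field = NULL, datetype = NULL, reldate = NULL,
  mindate = DATE, maxdate = DATE)
# Download the matching UIDs (one per line) to pmcids.txt
pmc <- efetch(ESEARCH_RES, retmode = "text", rettype = "uilist", outfile = "pmcids.txt")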
## Retrieving UIDs 1 to 500
## Retrieving UIDs 501 to 1000
## Retrieving UIDs 1001 to 1500
## Retrieving UIDs 1501 to 2000
## Retrieving UIDs 2001 to 2500
## Retrieving UIDs 2501 to 3000
## Retrieving UIDs 3001 to 3500
## Retrieving UIDs 3501 to 4000
# Read the downloaded UIDs and add the "PMC" prefix
pmc <- read.table(pmc)
pmc <- paste("PMC",pmc$V1,sep="")
NUM_ARTICLES=length(pmc)
NUM_ARTICLES
## [1] 3503
writeLines(pmc,con="pmc.txt")
Now run the bash script. Because PMC has changed and now restricts the scraping of journal articles, it is best to use the dedicated utility pygetpapers for the download.
Note that false positives can occur (~1.5%) and these results have not been verified by a human.
Here are some definitions:
NUM_XLS = Number of supplementary Excel files in this set of PMC articles.
NUM_XLS_ARTICLES = Number of articles matching the PubMed Central search which have supplementary Excel files.
GENELISTS = The gene lists found in the Excel files. A single Excel file may contain several gene lists.
NUM_GENELISTS = The number of Excel files with gene lists. Each Excel file is counted once even if it has multiple gene lists.
NUM_GENELIST_ARTICLES = The number of PMC articles with supplementary Excel gene lists.
ERROR_GENELISTS = Files suspected to contain gene name errors. The dates and five-digit numbers indicate transmogrified gene names.
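NUM_ERROR_GENELISTS = Number of Excel gene lists with errors. Unlike NUM_GENELISTS, this counts gene lists rather than files, so a file with several affected gene lists is counted more than once.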
NUM_ERROR_GENELIST_ARTICLES = Number of articles with supplementary Excel gene name errors.
ERROR_PROPORTION = The proportion of articles with Excel gene lists that have errors (NUM_ERROR_GENELIST_ARTICLES / NUM_GENELIST_ARTICLES).
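The printed output suggests that each line of results.txt has a specific layout. It starts with the PMCID and the file path. A species field follows when a gene list was found. If suspected errors were found, their count and the suspect entries come last. This layout is inferred from the output and is not documented here, so treat it as an assumption. The hypothetical helper below splits one such line into named fields. Like the script itself, it splits on single spaces, so a file path containing spaces would break it.

```r
# Hypothetical helper: split one line of results.txt into named fields.
# Assumed layout (inferred from printed output, not from gene_names.sh):
#   <PMCID> <file path> [<species> [<n suspect entries> <entry 1> ... <entry n>]]
parse_result_line <- function(line) {
  f <- strsplit(line, " ", fixed = TRUE)[[1]]
  list(
    pmcid   = f[1],
    file    = f[2],
    species = if (length(f) >= 3) f[3] else NA_character_,
    n_error = if (length(f) >= 4) as.integer(f[4]) else 0L,
    errors  = if (length(f) >= 5) f[5:length(f)] else character(0)
  )
}

x <- parse_result_line(
  "PMC9234272 PMC_DL/PMC9234272/supplementaryfiles/Table_1.xlsx Hsapiens 1 44623")
x$species  # "Hsapiens"
x$errors   # "44623"
```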
system("./gene_names.sh pmc.txt")
results <- readLines("results.txt")
XLS <- results[grep("XLS",results,ignore.case=TRUE)]
NUM_XLS = length(XLS)
NUM_XLS
## [1] 5570
NUM_XLS_ARTICLES = length(unique(sapply(strsplit(XLS," "),"[[",1)))
NUM_XLS_ARTICLES
## [1] 954
# Keep lines with more than two fields, i.e. files in which a gene list was detected
GENELISTS <- XLS[lengths(strsplit(XLS, " ")) > 2]
#GENELISTS
NUM_GENELISTS <- length(unique(sapply(strsplit(GENELISTS," "),"[[",2)))
NUM_GENELISTS
## [1] 660
NUM_GENELIST_ARTICLES <- length(unique(sapply(strsplit(GENELISTS," "),"[[",1)))
NUM_GENELIST_ARTICLES
## [1] 345
# Lines with more than three fields also list suspected gene name errors
ERROR_GENELISTS <- XLS[lengths(strsplit(XLS, " ")) > 3]
#ERROR_GENELISTS
NUM_ERROR_GENELISTS = length(ERROR_GENELISTS)
NUM_ERROR_GENELISTS
## [1] 285
GENELIST_ERROR_ARTICLES <- unique(sapply(strsplit(ERROR_GENELISTS," "),"[[",1))
GENELIST_ERROR_ARTICLES
## [1] "PMC9239442" "PMC9234272" "PMC9226039" "PMC9222349" "PMC9222903"
## [6] "PMC9222793" "PMC9219160" "PMC9164292" "PMC9207522" "PMC9206655"
## [11] "PMC9202886" "PMC9202124" "PMC9198913" "PMC9198845" "PMC9197769"
## [16] "PMC9194130" "PMC9178395" "PMC9186538" "PMC8982554" "PMC9179874"
## [21] "PMC9177981" "PMC9171052" "PMC9169770" "PMC9168984" "PMC9168533"
## [26] "PMC9167495" "PMC9163161" "PMC9163142" "PMC9162925" "PMC9155246"
## [31] "PMC8929108" "PMC9161486" "PMC9160332" "PMC9152293" "PMC9241257"
## [36] "PMC9226526" "PMC9226523" "PMC9208075" "PMC9233711" "PMC9223334"
## [41] "PMC9217776" "PMC9213774" "PMC9213508" "PMC9211071" "PMC9207414"
## [46] "PMC9205910" "PMC9205776" "PMC9203529" "PMC9203013" "PMC9201760"
## [51] "PMC9200849" "PMC9200740" "PMC9186391" "PMC9198558" "PMC9194831"
## [56] "PMC9117153" "PMC9098121" "PMC9190170" "PMC9174882" "PMC9187083"
## [61] "PMC9181722" "PMC9177989" "PMC9177860" "PMC9177428" "PMC9174980"
## [66] "PMC9170766" "PMC9169118" "PMC9168273" "PMC9168240" "PMC9165556"
## [71] "PMC9165502" "PMC9164415" "PMC9164052" "PMC9163145" "PMC9131284"
## [76] "PMC8928958" "PMC9159952" "PMC9159589" "PMC9158132" "PMC9157503"
## [81] "PMC9154123" "PMC9153011"
NUM_ERROR_GENELIST_ARTICLES <- length(GENELIST_ERROR_ARTICLES)
NUM_ERROR_GENELIST_ARTICLES
## [1] 82
ERROR_PROPORTION = NUM_ERROR_GENELIST_ARTICLES / NUM_GENELIST_ARTICLES
ERROR_PROPORTION
## [1] 0.2376812
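The counting logic above can be checked on a tiny vector. The lines below are made up (hypothetical) and follow the layout of results.txt as it appears in the printed output. An Excel file without a gene list is assumed to produce a two-field line. The example also shows why NUM_ERROR_GENELISTS counts lines rather than files: one file here contributes two error lines.

```r
# Made-up results lines (hypothetical) in the layout seen in results.txt
toy <- c(
  "PMC1 a/s1.xlsx",                         # Excel file, no gene list
  "PMC1 a/s2.xlsx Hsapiens",                # gene list, no errors
  "PMC2 b/s1.xlsx Hsapiens 1 44623",        # gene list with one error
  "PMC2 b/s1.xlsx Hsapiens 2 44805 44896",  # same file, second gene list with errors
  "PMC3 c/s1.xls Mmusculus"                 # gene list, no errors
)
n_fields <- lengths(strsplit(toy, " "))
genelists <- toy[n_fields > 2]
error_lines <- toy[n_fields > 3]

# Extract the i-th space-separated field from each line
field <- function(x, i) sapply(strsplit(x, " "), "[[", i)

n_genelist_files <- length(unique(field(genelists, 2)))        # 3
n_genelist_articles <- length(unique(field(genelists, 1)))     # 3
n_error_genelists <- length(error_lines)                       # 2
n_error_articles <- length(unique(field(error_lines, 1)))      # 1
n_error_articles / n_genelist_articles                         # 0.3333333
```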
Here you can look at all the gene lists detected in the past month, as well as those with errors. The dates are obvious errors; they are usually dates in September, March, December and October. The five-digit numbers are dates stored in Excel's internal format: a serial number counting days from 1 January 1900. If you enter these numbers into Excel and format the cells as dates, most of them will also map to dates in September, March, December and October.
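The five-digit numbers can also be decoded in R. Excel's default 1900 date system wrongly treats 1900 as a leap year. For dates from March 1900 onward, R reproduces Excel's serials by counting from an origin of 30 December 1899. The serials below are taken from the ERROR_GENELISTS output above. This sketch only converts numbers to dates. It does not attempt to recover the original gene symbol.

```r
# Excel serial numbers copied from the ERROR_GENELISTS output
serials <- c(44805, 44623, 44896, 37500, 36951)
# Origin 1899-12-30 accounts for Excel's fictitious 29 Feb 1900
dates <- as.Date(serials, origin = "1899-12-30")
format(dates)
# "2022-09-01" "2022-03-03" "2022-12-01" "2002-09-01" "2001-03-01"

# Tally by month number (locale-independent, unlike month names)
table(format(dates, "%m"))
```

Some serials decode to the first day of a month in an earlier year (e.g. 37500 becomes 2002-09-01). That would fit a name like SEPT2 being read as a month and a year rather than a month and a day.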
#GENELISTS
ERROR_GENELISTS
## [1] "PMC9239442 PMC_DL/PMC9239442/supplementaryfiles/pgen.1010225.s002.xlsx Hsapiens 3 44805 44807 44809"
## [2] "PMC9234272 PMC_DL/PMC9234272/supplementaryfiles/Table_1.xlsx Hsapiens 1 44623"
## [3] "PMC9226039 PMC_DL/PMC9226039/supplementaryfiles/41598_2022_14964_MOESM1_ESM.xls Hsapiens 12 2021/03/02 2021/03/09 2021/03/01 2021/03/08 2021/03/10 2021/03/11 2021/03/03 2021/12/01 2021/03/07 2021/03/06 2021/03/04 2021/03/05"
## [4] "PMC9222349 PMC_DL/PMC9222349/supplementaryfiles/mmc11.xlsx Hsapiens 1 37694"
## [5] "PMC9222349 PMC_DL/PMC9222349/supplementaryfiles/mmc11.xlsx Hsapiens 2 37694 SOX2-OCT4"
## [6] "PMC9222349 PMC_DL/PMC9222349/supplementaryfiles/mmc9.xlsx Hsapiens 4 39326 40787 37500 40057"
## [7] "PMC9222349 PMC_DL/PMC9222349/supplementaryfiles/mmc9.xlsx Hsapiens 4 39326 37500 40787 40057"
## [8] "PMC9222349 PMC_DL/PMC9222349/supplementaryfiles/mmc9.xlsx Hsapiens 4 39326 37500 40787 40057"
## [9] "PMC9222349 PMC_DL/PMC9222349/supplementaryfiles/mmc2.xlsx Hsapiens 7 37681 39142 40787 39326 39326 40057 37500"
## [10] "PMC9222349 PMC_DL/PMC9222349/supplementaryfiles/mmc2.xlsx Hsapiens 2 37500 39326"
## [11] "PMC9222903 zip/Supplementary_table_S1.xlsx Hsapiens 1 43893"
## [12] "PMC9222903 zip/Supplementary_table_S1.xlsx Hsapiens 1 43893"
## [13] "PMC9222793 zip/Supplementary_Files_S1-S3.xlsx Hsapiens 2 44443 44443"
## [14] "PMC9219160 PMC_DL/PMC9219160/supplementaryfiles/13073_2022_1071_MOESM4_ESM.xlsx Hsapiens 3 40787 39326 40057"
## [15] "PMC9219160 PMC_DL/PMC9219160/supplementaryfiles/13073_2022_1071_MOESM4_ESM.xlsx Hsapiens 1 39873"
## [16] "PMC9164292 PMC_DL/PMC9164292/supplementaryfiles/med-2022-0493-sm.xlsx Hsapiens 2 44624 44818"
## [17] "PMC9164292 PMC_DL/PMC9164292/supplementaryfiles/med-2022-0493-sm.xlsx Hsapiens 5 44818 44818 44624 44624 44624"
## [18] "PMC9164292 PMC_DL/PMC9164292/supplementaryfiles/med-2022-0493-sm.xlsx Hsapiens 2 44624 44818"
## [19] "PMC9164292 PMC_DL/PMC9164292/supplementaryfiles/med-2022-0493-sm.xlsx Hsapiens 5 44816 44631 44624 44816 44816"
## [20] "PMC9164292 PMC_DL/PMC9164292/supplementaryfiles/med-2022-0493-sm.xlsx Hsapiens 3 44624 44816 44631"
## [21] "PMC9164292 PMC_DL/PMC9164292/supplementaryfiles/med-2022-0493-sm.xlsx Hsapiens 2 44818 44624"
## [22] "PMC9207522 PMC_DL/PMC9207522/supplementaryfiles/Table1.XLSX Drerio 8 37316 39873 38047 36951 38777 39508 40603 38412"
## [23] "PMC9207522 PMC_DL/PMC9207522/supplementaryfiles/Table1.XLSX Drerio 8 37316 39873 38047 36951 38777 39508 40603 38412"
## [24] "PMC9206655 PMC_DL/PMC9206655/supplementaryfiles/41389_2022_410_MOESM2_ESM.xlsx Hsapiens 1 44531"
## [25] "PMC9206655 PMC_DL/PMC9206655/supplementaryfiles/41389_2022_410_MOESM2_ESM.xlsx Hsapiens 1 44531"
## [26] "PMC9206655 PMC_DL/PMC9206655/supplementaryfiles/41389_2022_410_MOESM5_ESM.xlsx Hsapiens 1 44531"
## [27] "PMC9206655 PMC_DL/PMC9206655/supplementaryfiles/41389_2022_410_MOESM5_ESM.xlsx Hsapiens 1 44531"
## [28] "PMC9206655 PMC_DL/PMC9206655/supplementaryfiles/41389_2022_410_MOESM5_ESM.xlsx Hsapiens 1 44531"
## [29] "PMC9206655 PMC_DL/PMC9206655/supplementaryfiles/41389_2022_410_MOESM4_ESM.xlsx Hsapiens 1 44531"
## [30] "PMC9202886 PMC_DL/PMC9202886/supplementaryfiles/pgen.1010230.s010.xlsx Hsapiens 1 38991"
## [31] "PMC9202124 PMC_DL/PMC9202124/supplementaryfiles/13073_2022_1068_MOESM7_ESM.xlsx Hsapiens 26 44811 44624 44814 44631 44818 44630 44896 44621 44622 44623 44625 44626 44627 44628 44629 44819 44805 44815 44816 44806 44807 44808 44809 44810 44812 44813"
## [32] "PMC9202124 PMC_DL/PMC9202124/supplementaryfiles/13073_2022_1068_MOESM7_ESM.xlsx Hsapiens 28 44622 44623 44808 44621 44811 44814 44624 44806 44805 44812 44628 44809 44818 44626 44631 44813 44630 44816 44810 44807 44815 44629 44627 44625 44622 44621 44819 44896"
## [33] "PMC9202124 PMC_DL/PMC9202124/supplementaryfiles/13073_2022_1068_MOESM7_ESM.xlsx Hsapiens 28 44623 44622 44621 44624 44806 44805 44630 44811 44631 44818 44626 44621 44814 44812 44813 44808 44816 44810 44807 44628 44815 44627 44809 44629 44625 44622 44819 44896"
## [34] "PMC9202124 PMC_DL/PMC9202124/supplementaryfiles/13073_2022_1068_MOESM7_ESM.xlsx Hsapiens 28 44808 44811 44624 44814 44622 44621 44626 44813 44621 44818 44805 44627 44810 44812 44628 44816 44807 44809 44631 44806 44815 44629 44625 44622 44623 44630 44819 44896"
## [35] "PMC9202124 PMC_DL/PMC9202124/supplementaryfiles/13073_2022_1068_MOESM7_ESM.xlsx Hsapiens 27 44624 44622 44623 44814 44811 44621 44818 44810 44631 44621 44807 44625 44627 44630 44812 44626 44809 44819 44815 44628 44629 44816 44805 44808 44813 44622 44810"
## [36] "PMC9202124 PMC_DL/PMC9202124/supplementaryfiles/13073_2022_1068_MOESM7_ESM.xlsx Hsapiens 27 44624 44623 44622 44621 44810 44814 44811 44625 44807 44627 44812 44621 44626 44819 44815 44631 44818 44628 44629 44816 44805 44808 44630 44813 44622 44809 44810"
## [37] "PMC9202124 PMC_DL/PMC9202124/supplementaryfiles/13073_2022_1068_MOESM7_ESM.xlsx Hsapiens 27 44811 44622 44818 44621 44624 44631 44814 44630 44809 44819 44627 44810 44815 44621 44626 44623 44812 44628 44625 44629 44816 44805 44808 44813 44622 44807 44810"
## [38] "PMC9202124 PMC_DL/PMC9202124/supplementaryfiles/13073_2022_1068_MOESM7_ESM.xlsx Hsapiens 26 44811 44805 44621 44806 44628 44627 44623 44631 44807 44630 44818 44625 44814 44624 44896 44622 44626 44629 44819 44815 44816 44808 44809 44810 44812 44813"
## [39] "PMC9202124 PMC_DL/PMC9202124/supplementaryfiles/13073_2022_1068_MOESM7_ESM.xlsx Hsapiens 26 44805 44806 44627 44623 44621 44811 44807 44628 44625 44814 44624 44896 44630 44631 44622 44626 44629 44819 44815 44816 44818 44808 44809 44810 44812 44813"
## [40] "PMC9198913 PMC_DL/PMC9198913/supplementaryfiles/advancesADV2021005381-suppl2.xlsx Hsapiens 2 42797 42797"
## [41] "PMC9198913 PMC_DL/PMC9198913/supplementaryfiles/advancesADV2021005381-suppl2.xlsx Hsapiens 1 44452"
## [42] "PMC9198845 PMC_DL/PMC9198845/supplementaryfiles/mmc2.xlsx Hsapiens 4 44626 44623 44629 44622"
## [43] "PMC9198845 PMC_DL/PMC9198845/supplementaryfiles/mmc2.xlsx Hsapiens 4 44626 44623 44629 44622"
## [44] "PMC9197769 PMC_DL/PMC9197769/supplementaryfiles/41588_2022_1082_MOESM5_ESM.xlsx Hsapiens 3 6-Sep 8-Sep 6-Mar"
## [45] "PMC9194130 PMC_DL/PMC9194130/supplementaryfiles/mmc3.xlsx Hsapiens 5 44265 44449 44445 44448 44448"
## [46] "PMC9178395 PMC_DL/PMC9178395/supplementaryfiles/CTM2-12-e885-s004.xlsx Hsapiens 1 44621"
## [47] "PMC9178395 PMC_DL/PMC9178395/supplementaryfiles/CTM2-12-e885-s004.xlsx Hsapiens 1 44815"
## [48] "PMC9178395 PMC_DL/PMC9178395/supplementaryfiles/CTM2-12-e885-s004.xlsx Hsapiens 1 44621"
## [49] "PMC9178395 PMC_DL/PMC9178395/supplementaryfiles/CTM2-12-e885-s004.xlsx Hsapiens 1 44815"
## [50] "PMC9178395 PMC_DL/PMC9178395/supplementaryfiles/CTM2-12-e885-s004.xlsx Hsapiens 1 44811"
## [51] "PMC9186538 PMC_DL/PMC9186538/supplementaryfiles/NIHMS1784392-supplement-Table_S3.xls Mmusculus 7 44257 44257 44257 44257 44446 44442 44259"
## [52] "PMC8982554 zip/Supplementary_Table_S2.xlsx Hsapiens 2 44077 44077"
## [53] "PMC8982554 zip/Supplementary_Table_S2.xlsx Hsapiens 1 44085"
## [54] "PMC8982554 zip/Supplementary_Table_S2.xlsx Hsapiens 1 44085"
## [55] "PMC8982554 zip/Supplementary_Table_S2.xlsx Hsapiens 1 44077"
## [56] "PMC8982554 zip/Supplementary_Table_S2.xlsx Hsapiens 3 44077 44077 44077"
## [57] "PMC9179874 zip/Supp_Table_S2.xlsx Hsapiens 1 44813"
## [58] "PMC9177981 zip/sup_table_1.xlsx Mmusculus 1 43896"
## [59] "PMC9177981 zip/sup_table_1.xlsx Mmusculus 1 44083"
## [60] "PMC9171052 PMC_DL/PMC9171052/supplementaryfiles/Data_Sheet_2.XLSX Hsapiens 1 14977"
## [61] "PMC9169770 PMC_DL/PMC9169770/supplementaryfiles/pnas.2119593119.sd01.xlsx Scerevisiae 2 44470 44340"
## [62] "PMC9168984 PMC_DL/PMC9168984/supplementaryfiles/Table_1.xlsx Hsapiens 5 44896 44819 44623 44624 44626"
## [63] "PMC9168533 PMC_DL/PMC9168533/supplementaryfiles/Table2.xlsx Hsapiens 2 44089 44086"
## [64] "PMC9168533 PMC_DL/PMC9168533/supplementaryfiles/Table2.xlsx Hsapiens 1 44086"
## [65] "PMC9167495 PMC_DL/PMC9167495/supplementaryfiles/12864_2022_8659_MOESM5_ESM.xlsx Hsapiens 1 SNX7/15/19/32"
## [66] "PMC9163161 PMC_DL/PMC9163161/supplementaryfiles/41467_2022_30765_MOESM10_ESM.xlsx Hsapiens 1 37226"
## [67] "PMC9163161 PMC_DL/PMC9163161/supplementaryfiles/41467_2022_30765_MOESM10_ESM.xlsx Hsapiens 1 37226"
## [68] "PMC9163161 PMC_DL/PMC9163161/supplementaryfiles/41467_2022_30765_MOESM8_ESM.xlsx Hsapiens 1 37226"
## [69] "PMC9163161 PMC_DL/PMC9163161/supplementaryfiles/41467_2022_30765_MOESM11_ESM.xlsx Hsapiens 1 37226"
## [70] "PMC9163142 PMC_DL/PMC9163142/supplementaryfiles/41525_2022_305_MOESM2_ESM.xlsx Hsapiens 1 44626"
## [71] "PMC9162925 PMC_DL/PMC9162925/supplementaryfiles/41375_2022_1571_MOESM7_ESM.xlsx Hsapiens 20 38961 40787 40422 39326 37865 37316 39508 37135 40057 37500 38412 37681 42248 38777 39142 38231 39692 39873 36951 38596"
## [72] "PMC9162925 PMC_DL/PMC9162925/supplementaryfiles/41375_2022_1571_MOESM7_ESM.xlsx Hsapiens 20 39873 37865 40787 38961 39326 37135 38412 38231 38777 39508 37316 36951 42248 37681 37500 40057 39142 39692 40422 38596"
## [73] "PMC9162925 PMC_DL/PMC9162925/supplementaryfiles/41375_2022_1571_MOESM6_ESM.xlsx Hsapiens 20 40057 37500 38412 42248 39692 39508 39326 37865 38231 38961 37316 38596 37135 38777 39142 40787 40422 36951 37681 39873"
## [74] "PMC9162925 PMC_DL/PMC9162925/supplementaryfiles/41375_2022_1571_MOESM6_ESM.xlsx Hsapiens 20 38961 40057 37135 40422 39873 37865 39326 37316 39508 37500 37681 38777 42248 39142 36951 38596 38231 40787 39692 38412"
## [75] "PMC9155246 zip/Table_S1.xlsx Hsapiens 4 44450 44444 44262 44454"
## [76] "PMC8929108 zip/Supplementary_Table_S3.xlsx Ggallus 1 44450"
## [77] "PMC8929108 zip/Supplementary_Table_S3.xlsx Hsapiens 1 44443"
## [78] "PMC9161486 PMC_DL/PMC9161486/supplementaryfiles/13046_2022_2380_MOESM2_ESM.xlsx Hsapiens 7 39692 42248 40057 37500 40787 39326 40422"
## [79] "PMC9161486 PMC_DL/PMC9161486/supplementaryfiles/13046_2022_2380_MOESM2_ESM.xlsx Hsapiens 8 39692 40422 42248 39326 40057 37500 40787 38961"
## [80] "PMC9161486 PMC_DL/PMC9161486/supplementaryfiles/13046_2022_2380_MOESM2_ESM.xlsx Hsapiens 6 39692 37500 40057 39326 40422 40787"
## [81] "PMC9161486 PMC_DL/PMC9161486/supplementaryfiles/13046_2022_2380_MOESM2_ESM.xlsx Hsapiens 7 39692 42248 40057 37500 40787 39326 40422"
## [82] "PMC9161486 PMC_DL/PMC9161486/supplementaryfiles/13046_2022_2380_MOESM2_ESM.xlsx Hsapiens 8 39692 40422 42248 39326 40057 37500 40787 38961"
## [83] "PMC9161486 PMC_DL/PMC9161486/supplementaryfiles/13046_2022_2380_MOESM2_ESM.xlsx Hsapiens 6 39692 37500 40057 39326 40422 40787"
## [84] "PMC9161486 PMC_DL/PMC9161486/supplementaryfiles/13046_2022_2380_MOESM2_ESM.xlsx Hsapiens 6 40422 37500 40787 39692 37865 40057"
## [85] "PMC9160332 PMC_DL/PMC9160332/supplementaryfiles/Data_Sheet_5.XLS Hsapiens 10 2011/09/01 2005/03/01 2006/03/01 2001/09/01 2005/03/01 2001/12/01 2007/09/01 2006/03/01 2003/09/01 2002/03/01"
## [86] "PMC9160332 PMC_DL/PMC9160332/supplementaryfiles/Data_Sheet_4.XLS Hsapiens 5 2011/09/01 2007/09/01 2006/03/01 2008/09/01 2002/09/01"
## [87] "PMC9152293 PMC_DL/PMC9152293/supplementaryfiles/Table_4.xlsx Drerio 1 8-mar"
## [88] "PMC9152293 PMC_DL/PMC9152293/supplementaryfiles/Table_6.xlsx Drerio 2 43898 44079"
## [89] "PMC9152293 PMC_DL/PMC9152293/supplementaryfiles/Table_8.xlsx Drerio 3 44263 44263 44263"
## [90] "PMC9241257 PMC_DL/PMC9241257/supplementaryfiles/13287_2022_2970_MOESM2_ESM.xlsx Hsapiens 2 44259 44258"
## [91] "PMC9226526 zip/Supplementary_Table_S8.xlsx Mmusculus 1 44260"
## [92] "PMC9226526 zip/Supplementary_Table_S8.xlsx Mmusculus 1 44260"
## [93] "PMC9226526 zip/Supplementary_Table_S4.xlsx Mmusculus 1 44263"
## [94] "PMC9226526 zip/Supplementary_Table_S4.xlsx Hsapiens 1 44257"
## [95] "PMC9226526 zip/Supplementary_Table_S4.xlsx Mmusculus 7 44257 44263 44260 44260 44257 44263 44257"
## [96] "PMC9226526 zip/Supplementary_Table_S4.xlsx Mmusculus 3 44260 44257 44260"
## [97] "PMC9226523 zip/Table_S1.xlsx Celegans 7 36982 36982 37681 37681 37135 37135 37165"
## [98] "PMC9208075 PMC_DL/PMC9208075/supplementaryfiles/MOL2-16-2432-s003.xlsx Hsapiens 1 40603"
## [99] "PMC9233711 PMC_DL/PMC9233711/supplementaryfiles/41467_2022_31220_MOESM6_ESM.xlsx Hsapiens 8 44444 44257 44443 44256 44442 44265 44260 44449"
## [100] "PMC9233711 PMC_DL/PMC9233711/supplementaryfiles/41467_2022_31220_MOESM6_ESM.xlsx Hsapiens 7 44442 44444 44257 44443 44265 44257 44440"
## [101] "PMC9223334 PMC_DL/PMC9223334/supplementaryfiles/ppat.1010089.s003.xlsx Scerevisiae 1 37864"
## [102] "PMC9223334 PMC_DL/PMC9223334/supplementaryfiles/ppat.1010089.s009.xlsx Scerevisiae 2 39326 36982"
## [103] "PMC9223334 PMC_DL/PMC9223334/supplementaryfiles/ppat.1010089.s010.xlsx Scerevisiae 2 39326 36982"
## [104] "PMC9217776 PMC_DL/PMC9217776/supplementaryfiles/401_2022_2431_MOESM2_ESM.xlsx Hsapiens 4 44261 44257 44450 44447"
## [105] "PMC9213774 PMC_DL/PMC9213774/supplementaryfiles/mmc3.xlsx Mmusculus 14 39326 38596 40787 40057 37865 37316 39692 38961 38231 37500 40422 38412 42248 39326"
## [106] "PMC9213508 PMC_DL/PMC9213508/supplementaryfiles/42003_2022_3564_MOESM2_ESM.xlsx Hsapiens 76 44819 44819 44819 44622 44621 44622 44621 44622 44621 44814 44814 44814 44814 44814 44627 44627 44627 44624 44624 44806 44806 44806 44806 44806 44806 44806 44806 44806 44815 44621 44626 44626 44626 44626 44626 44626 44623 44623 44812 44812 44812 44812 44811 44896 44896 44628 44628 44628 44628 44628 44628 44628 44625 44625 44625 44625 44629 44816 44816 44816 44816 44805 44805 44805 44813 44813 44813 44622 44622 44622 44622 44622 44622 44622 44622 44807"
## [107] "PMC9211071 zip/supplementary/Supplementary_Table_5-1-.xlsx Hsapiens 26 44257 44442 44443 44257 44446 44445 44262 44450 44264 44259 44256 44261 44453 44447 44263 44441 44531 44265 44258 44440 44266 44448 44444 44256 44449 44260"
## [108] "PMC9211071 zip/supplementary/Supplementary_Table_5.xlsx Hsapiens 26 44257 44442 44443 44257 44446 44445 44262 44450 44264 44259 44256 44261 44453 44447 44263 44441 44531 44265 44258 44440 44266 44448 44444 44256 44449 44260"
## [109] "PMC9211071 zip/supplementary/Supplementary_Table_1.xlsx Hsapiens 2 44081 44085"
## [110] "PMC9211071 zip/supplementary/Supplementary_Table_3.xlsx Hsapiens 2 44264 44450"
## [111] "PMC9207414 PMC_DL/PMC9207414/supplementaryfiles/DataSheet2.XLSX Hsapiens 13 44805 44630 44626 44815 44624 44810 44622 44631 44627 44816 44818 44806 44812"
## [112] "PMC9205910 PMC_DL/PMC9205910/supplementaryfiles/41467_2022_31197_MOESM8_ESM.xlsx Hsapiens 12 44447 44259 44448 44257 44264 44256 44441 44446 44261 44262 44450 44263"
## [113] "PMC9205776 PMC_DL/PMC9205776/supplementaryfiles/41594_2022_773_MOESM3_ESM.xlsx Hsapiens 2 36951 37316"
## [114] "PMC9203529 PMC_DL/PMC9203529/supplementaryfiles/41598_2022_13693_MOESM1_ESM.xlsx Drerio 4 44348 44349 44350 44351"
## [115] "PMC9203013 PMC_DL/PMC9203013/supplementaryfiles/pgen.1009995.s010.xlsx Dmelanogaster 1 37135"
## [116] "PMC9203013 PMC_DL/PMC9203013/supplementaryfiles/pgen.1009995.s010.xlsx Dmelanogaster 1 37135"
## [117] "PMC9203013 PMC_DL/PMC9203013/supplementaryfiles/pgen.1009995.s011.xlsx Dmelanogaster 1 37135"
## [118] "PMC9203013 PMC_DL/PMC9203013/supplementaryfiles/pgen.1009995.s011.xlsx Dmelanogaster 1 37135"
## [119] "PMC9203013 PMC_DL/PMC9203013/supplementaryfiles/pgen.1009995.s005.xlsx Dmelanogaster 2 37135 37226"
## [120] "PMC9201760 PMC_DL/PMC9201760/supplementaryfiles/Table1.xlsx Hsapiens 2 44812 44815"
## [121] "PMC9200849 PMC_DL/PMC9200849/supplementaryfiles/41467_2022_31086_MOESM4_ESM.xlsx Hsapiens 1 37135"
## [122] "PMC9200849 PMC_DL/PMC9200849/supplementaryfiles/41467_2022_31086_MOESM4_ESM.xlsx Hsapiens 1 37135"
## [123] "PMC9200849 PMC_DL/PMC9200849/supplementaryfiles/41467_2022_31086_MOESM4_ESM.xlsx Hsapiens 2 37135 37135"
## [124] "PMC9200849 PMC_DL/PMC9200849/supplementaryfiles/41467_2022_31086_MOESM4_ESM.xlsx Hsapiens 2 37135 37135"
## [125] "PMC9200849 PMC_DL/PMC9200849/supplementaryfiles/41467_2022_31086_MOESM4_ESM.xlsx Hsapiens 1 37226"
## [126] "PMC9200740 PMC_DL/PMC9200740/supplementaryfiles/41467_2022_30961_MOESM5_ESM.xlsx Mmusculus 1 44258"
## [127] "PMC9200740 PMC_DL/PMC9200740/supplementaryfiles/41467_2022_30961_MOESM6_ESM.xlsx Mmusculus 2 44448 44442"
## [128] "PMC9200740 PMC_DL/PMC9200740/supplementaryfiles/41467_2022_30961_MOESM6_ESM.xlsx Mmusculus 1 44446"
## [129] "PMC9186391 PMC_DL/PMC9186391/supplementaryfiles/supp_gad.349456.122_Supplemental_Table_S1.xlsx Hsapiens 26 43530 43720 43528 43719 43526 43525 43723 43715 43722 43527 43534 43535 43717 43714 43531 43529 43800 43532 43709 43718 43716 43711 43713 43710 43712 43533"
## [130] "PMC9186391 PMC_DL/PMC9186391/supplementaryfiles/supp_gad.349456.122_Supplemental_Table_S3.xlsx Hsapiens 15 44083 43892 44082 44080 43896 44077 43892 43895 43899 44076 43898 44085 43897 44084 44081"
## [131] "PMC9198558 PMC_DL/PMC9198558/supplementaryfiles/Table1.xlsx Hsapiens 7 44257 44258 44260 44261 44262 44263 44531"
## [132] "PMC9194831 PMC_DL/PMC9194831/supplementaryfiles/Table4.xls Hsapiens 2 2022/03/11 2022/03/04"
## [133] "PMC9194831 PMC_DL/PMC9194831/supplementaryfiles/Table3.xls Hsapiens 1 2022/03/10"
## [134] "PMC9117153 PMC_DL/PMC9117153/supplementaryfiles/mmc2.xlsx Hsapiens 95 39692 39692 37865 38231 40787 36951 38412 38777 40787 38231 42248 42248 38412 42248 38047 40787 39508 39508 37681 42248 42248 36951 42248 39508 39508 40422 42248 40422 37865 40787 36951 37500 37500 36951 39508 37500 37316 39508 38231 41153 37316 41153 38231 37500 37500 37316 40787 38777 38412 41153 39508 41153 40422 36951 37500 41153 39508 37500 40787 36951 42248 41153 37500 38777 36951 37500 39508 37500 41153 39508 37500 36951 37316 37500 37500 39142 37500 37500 41153 37316 39508 39508 40787 42248 42248 41153 41153 41153 37500 42248 41153 37500 41153 41153 41153"
## [135] "PMC9117153 PMC_DL/PMC9117153/supplementaryfiles/mmc2.xlsx Hsapiens 97 37865 39692 39692 38231 38231 42248 42248 42248 42248 42248 36951 40787 42248 36951 40422 40787 36951 36951 42248 39142 42248 40238 39508 41153 37500 40422 41153 37500 37500 39508 41153 38777 39508 37500 37500 38412 41153 41153 40787 41153 36951 37500 41153 38777 37500 37500 37500 40422 41153 41153 42248 40787 37500 41153 38412 40787 38231 41153 40787 42248 40422 37316 41153 37316 38231 39508 36951 37316 37500 38047 37500 41153 37500 41153 41153 37500 36951 39508 40787 39508 37865 36951 39508 39508 39508 42248 37500 37500 39508 38412 37681 39508 39508 37316 38777 37500 37316"
## [136] "PMC9117153 PMC_DL/PMC9117153/supplementaryfiles/mmc2.xlsx Hsapiens 27 39692 40603 40057 37226 40787 37681 40238 37865 38596 38231 42248 37500 38412 39326 39508 41153 37316 40422 36951 37135 38777 39873 36951 39142 41883 38047 37316"
## [137] "PMC9117153 PMC_DL/PMC9117153/supplementaryfiles/mmc2.xlsx Hsapiens 27 40057 39692 42248 38596 38231 37226 40787 36951 40238 36951 41153 38412 37681 37865 38777 40603 39873 41883 39326 37500 38047 37316 39142 37316 40422 37135 39508"
## [138] "PMC9098121 PMC_DL/PMC9098121/supplementaryfiles/mmc4.xlsx Hsapiens 12 42248 38777 39692 38961 40422 40787 37500 39326 37316 39142 38412 40057"
## [139] "PMC9098121 PMC_DL/PMC9098121/supplementaryfiles/mmc2.xlsx Hsapiens 6 36951 39508 38412 36951 40422 40057"
## [140] "PMC9098121 PMC_DL/PMC9098121/supplementaryfiles/mmc3.xlsx Hsapiens 1 37316"
## [141] "PMC9190170 PMC_DL/PMC9190170/supplementaryfiles/12916_2022_2399_MOESM1_ESM.xlsx Hsapiens 6 39508 39508 39508 39508 39508 39508"
## [142] "PMC9190170 PMC_DL/PMC9190170/supplementaryfiles/12916_2022_2399_MOESM1_ESM.xlsx Hsapiens 22 39508 39508 39508 39508 39508 39508 39508 39508 39508 39508 39508 39508 39508 39508 39508 39508 39508 39508 39508 39508 39508 39508"
## [143] "PMC9190170 PMC_DL/PMC9190170/supplementaryfiles/12916_2022_2399_MOESM1_ESM.xlsx Hsapiens 6 39508 39508 39508 39508 39508 39508"
## [144] "PMC9190170 PMC_DL/PMC9190170/supplementaryfiles/12916_2022_2399_MOESM1_ESM.xlsx Hsapiens 22 39508 39508 39508 39508 39508 39508 39508 39508 39508 39508 39508 39508 39508 39508 39508 39508 39508 39508 39508 39508 39508 39508"
## [145] "PMC9190170 PMC_DL/PMC9190170/supplementaryfiles/12916_2022_2399_MOESM1_ESM.xlsx Hsapiens 1 39692"
## [146] "PMC9190170 PMC_DL/PMC9190170/supplementaryfiles/12916_2022_2399_MOESM1_ESM.xlsx Hsapiens 1 39692"
## [147] "PMC9190170 PMC_DL/PMC9190170/supplementaryfiles/12916_2022_2399_MOESM1_ESM.xlsx Hsapiens 1 39692"
## [148] "PMC9190170 PMC_DL/PMC9190170/supplementaryfiles/12916_2022_2399_MOESM1_ESM.xlsx Hsapiens 1 39692"
## [149] "PMC9190170 PMC_DL/PMC9190170/supplementaryfiles/12916_2022_2399_MOESM1_ESM.xlsx Hsapiens 1 39508"
## [150] "PMC9190170 PMC_DL/PMC9190170/supplementaryfiles/12916_2022_2399_MOESM1_ESM.xlsx Hsapiens 1 39508"
## [151] "PMC9190170 PMC_DL/PMC9190170/supplementaryfiles/12916_2022_2399_MOESM1_ESM.xlsx Hsapiens 1 39508"
## [152] "PMC9190170 PMC_DL/PMC9190170/supplementaryfiles/12916_2022_2399_MOESM1_ESM.xlsx Hsapiens 1 39508"
## [153] "PMC9174882 PMC_DL/PMC9174882/supplementaryfiles/EMMM-14-e15816-s001.xlsx Mmusculus 12 38231 37865 39508 37681 36951 38047 39873 38961 40787 40057 40238 37316"
## [154] "PMC9174882 PMC_DL/PMC9174882/supplementaryfiles/EMMM-14-e15816-s001.xlsx Mmusculus 11 40787 38961 37500 40422 39692 37681 40238 36951 38231 38596 39508"
## [155] "PMC9174882 PMC_DL/PMC9174882/supplementaryfiles/EMMM-14-e15816-s004.xlsx Hsapiens 1 38231"
## [156] "PMC9174882 PMC_DL/PMC9174882/supplementaryfiles/EMMM-14-e15816-s008.xlsx Mmusculus 135 36951 37681 38961 40787 39508 37500 41883 37135 36951 38777 41153 39326 40422 37316 42248 40057 38047 37316 39692 37865 39142 38596 39873 40238 38231 40603 38412 36951 40787 39508 37681 38961 37500 41883 37135 36951 41153 39326 40422 37316 42248 38777 39142 38047 40057 37316 37865 39692 38231 40603 38596 39873 40238 38412 38777 40422 39326 41153 42248 37316 38047 40057 37865 39692 37316 39142 40238 39873 38596 38231 40603 38412 36951 37681 38961 40787 39508 37500 41883 37135 36951 38412 40238 39873 38596 40603 38231 37865 39692 37316 38047 40057 39142 38777 42248 37316 40422 39326 41153 36951 37135 41883 37500 38961 37681 39508 40787 36951 38777 37316 42248 39326 41153 40422 37316 37865 39692 38047 40057 39142 38596 40238 39873 40603 38231 38412 36951 38961 37681 39508 40787 41883 37500 36951 37135"
## [157] "PMC9174882 PMC_DL/PMC9174882/supplementaryfiles/EMMM-14-e15816-s002.xlsx Hsapiens 1 39142"
## [158] "PMC9174882 PMC_DL/PMC9174882/supplementaryfiles/EMMM-14-e15816-s005.xlsx Mmusculus 135 38777 37135 37135 37135 37135 37135 38596 43528 43526 43526 43526 43526 43526 43526 43526 43526 43526 43526 43526 43531 43531 43531 43531 43531 43531 43531 43719 43719 43719 43722 43722 43532 43532 43532 Ccdc61-Nova2 43709 43709 43709 43709 43709 43709 43709 43709 43709 43525 43525 43525 43525 43525 43525 43525 43525 43525 43718 43716 43716 43716 43716 43716 43716 43716 43712 43712 43712 43712 43712 43712 43712 43712 43712 43712 43712 43712 43712 43712 43534 43534 43534 43534 43534 43534 43534 43717 43717 43717 43535 43535 43535 43530 43711 43711 43711 43711 43711 43711 43720 43720 43720 43713 43713 43526 43526 43526 43526 43526 43526 43526 43526 43526 43526 43527 43527 43527 43527 43527 43527 43529 43529 43529 43529 43714 43714 43714 43714 43714 43714 43714 43714 43714 43714 43714 43714 43714 43714 43714"
## [159] "PMC9187083 PMC_DL/PMC9187083/supplementaryfiles/pntd.0010435.s002.xlsx Mmusculus 76 44442 44263 44258 44264 44451 44257 44260 44262 44260 44265 44265 44443 44441 44266 44442 44442 44266 44262 44454 44440 44257 44258 44444 44256 44449 44264 44263 44256 44265 44256 44446 44259 44443 44449 44448 44258 44261 44450 44447 44444 44445 44257 44257 44260 44454 44266 44257 44258 44257 44449 44450 44446 44257 44261 44440 44258 44445 44446 44262 44450 44453 44450 44453 44442 44445 44450 44450 44447 44447 44441 44263 44260 44450 44446 44441 44257"
## [160] "PMC9181722 zip/Table_S3.xlsx Hsapiens 1 40057"
## [161] "PMC9181722 zip/Table_S3.xlsx Hsapiens 2 38200 37469"
## [162] "PMC9181722 zip/Table_S3.xlsx Hsapiens 3 39326 37469 37135"
## [163] "PMC9181722 zip/Table_S3.xlsx Hsapiens 1 37469"
## [164] "PMC9181722 zip/Table_S3.xlsx Hsapiens 2 37469 38231"
## [165] "PMC9181722 zip/Table_S3.xlsx Hsapiens 1 39692"
## [166] "PMC9181722 zip/Table_S3.xlsx Hsapiens 1 37469"
## [167] "PMC9181722 zip/Table_S3.xlsx Hsapiens 1 38961"
## [168] "PMC9181722 zip/Table_S4.xlsx Hsapiens 1 38961"
## [169] "PMC9177989 zip/Supplementary_Tables.xlsx Hsapiens 4 44256 44446 44446 44256"
## [170] "PMC9177989 zip/Supplementary_Tables.xlsx Mmusculus 11 44443 44450 44444 44446 44257 44446 44443 44443 44450 44258 44444"
## [171] "PMC9177989 zip/Supplementary_Tables.xlsx Hsapiens 8 44446 44440 44440 44440 44256 44440 44257 44446"
## [172] "PMC9177860 PMC_DL/PMC9177860/supplementaryfiles/42003_2022_3496_MOESM3_ESM.xlsx Mmusculus 7 44807 44624 44631 44808 44622 44809 44815"
## [173] "PMC9177428 PMC_DL/PMC9177428/supplementaryfiles/41586_2022_4786_MOESM11_ESM.xlsx Hsapiens 9 39692 40603 39326 36951 36951 40603 38777 41883 40238"
## [174] "PMC9174980 zip/Supplemental_Materials_tables.xlsx Hsapiens 1 44257"
## [175] "PMC9170766 PMC_DL/PMC9170766/supplementaryfiles/mmc27.xlsx Hsapiens 29 43632 43525 43714 43714 43714 43712 43713 43714 43723 43718 43529 43711 43710 43715 43532 43739 43526 43525 43717 43719 43714 43534 43712 43720 43716 43722 43713 43709 43530"
## [176] "PMC9170766 PMC_DL/PMC9170766/supplementaryfiles/mmc29.xlsx Hsapiens 2 43167 43161"
## [177] "PMC9170766 PMC_DL/PMC9170766/supplementaryfiles/mmc20.xlsx Hsapiens 5 44260 44261 44263 44258 44264"
## [178] "PMC9169118 PMC_DL/PMC9169118/supplementaryfiles/pnas.2115083119.sd03.xlsx Hsapiens 24 3-Sep 8-Mar 10-Sep 15-Sep 2-Mar 6-Sep 5-Mar 4-Sep 12-Sep 9-Mar 5-Sep 14-Sep 7-Sep 4-Mar 11-Sep 1-Dec 8-Sep 7-Mar 2-Sep 9-Sep 6-Mar 10-Mar 3-Mar 11-Mar"
## [179] "PMC9169118 PMC_DL/PMC9169118/supplementaryfiles/pnas.2115083119.sd04.xlsx Hsapiens 144 44531 44257 44257 44265 44266 44266 44266 44258 44258 44259 44260 44260 44261 44261 44261 44262 44263 44263 44264 44264 44454 44454 44454 44454 44454 44450 44451 44441 44442 44442 44443 44443 44444 44445 44446 44446 44446 44447 44447 44448 44448 44531 44257 44257 44265 44266 44258 44258 44258 44259 44260 44260 44260 44261 44262 44262 44263 44263 44263 44264 44264 44454 44454 44449 44449 44449 44450 44450 44451 44453 44453 44441 44442 44442 44442 44443 44443 44444 44446 44446 44446 44447 44447 44447 44448 44448 44265 44260 44262 44454 44449 44449 44450 44441 44443 44447 44448 44265 44266 44259 44261 44453 44441 44444 44445 44445 44265 44264 44261 44441 44451 44444 44444 44531 44531 44262 44450 44266 44445 44265 44451 44259 44445 44449 44454 44442 44263 44258 44448 44531 44451 44531 44257 44441 44450 44264 44451 44259 44259 44262 44443 44445 44257 44444"
## [180] "PMC9169118 PMC_DL/PMC9169118/supplementaryfiles/pnas.2115083119.sd02.xlsx Hsapiens 144 1-Dec 1-Dec 1-Dec 2-Mar 10-Mar 11-Mar 11-Mar 11-Mar 3-Mar 4-Mar 5-Mar 6-Mar 6-Mar 6-Mar 7-Mar 7-Mar 8-Mar 9-Mar 9-Mar 15-Sep 15-Sep 15-Sep 10-Sep 11-Sep 11-Sep 12-Sep 2-Sep 2-Sep 4-Sep 4-Sep 4-Sep 5-Sep 6-Sep 6-Sep 7-Sep 8-Sep 8-Sep 9-Sep 9-Sep 2-Mar 10-Mar 10-Mar 10-Mar 11-Mar 3-Mar 3-Mar 3-Mar 5-Mar 5-Mar 5-Mar 6-Mar 6-Mar 6-Mar 7-Mar 7-Mar 8-Mar 9-Mar 9-Mar 9-Mar 15-Sep 10-Sep 10-Sep 11-Sep 11-Sep 12-Sep 14-Sep 2-Sep 2-Sep 3-Sep 4-Sep 4-Sep 5-Sep 5-Sep 5-Sep 6-Sep 6-Sep 7-Sep 7-Sep 7-Sep 8-Sep 8-Sep 9-Sep 9-Sep 9-Sep 2-Mar 10-Mar 10-Mar 3-Mar 3-Mar 4-Mar 8-Mar 9-Mar 15-Sep 15-Sep 15-Sep 10-Sep 10-Sep 12-Sep 2-Sep 3-Sep 3-Sep 3-Sep 5-Sep 5-Sep 6-Sep 7-Sep 9-Sep 1-Dec 2-Mar 11-Mar 11-Mar 4-Mar 4-Mar 7-Mar 8-Mar 15-Sep 12-Sep 14-Sep 14-Sep 8-Sep 4-Mar 5-Mar 7-Mar 8-Mar 11-Sep 12-Sep 8-Sep 1-Dec 2-Mar 4-Mar 2-Sep 7-Sep 4-Sep 1-Dec 5-Mar 3-Sep 11-Sep 12-Sep 6-Sep 2-Mar 15-Sep 10-Sep 8-Mar 3-Sep"
## [181] "PMC9169118 PMC_DL/PMC9169118/supplementaryfiles/pnas.2115083119.sd05.xlsx Hsapiens 24 43894 43892 44166 44086 44078 43897 44080 43899 44083 44089 44076 44079 44085 43893 43898 44084 44077 43900 43901 44088 43896 44081 44082 43895"
## [182] "PMC9168273 zip/Supplementary_materials-revision/Table_S3-revision.xlsx Hsapiens 1 43897"
## [183] "PMC9168240 PMC_DL/PMC9168240/supplementaryfiles/Table_1.xls Hsapiens 1 2022/09/09"
## [184] "PMC9168240 PMC_DL/PMC9168240/supplementaryfiles/Table_1.xls Hsapiens 1 2021/03/03"
## [185] "PMC9165556 PMC_DL/PMC9165556/supplementaryfiles/izab318_suppl_supplementary_data_2.xlsx Hsapiens 3 37226 36951 37316"
## [186] "PMC9165502 PMC_DL/PMC9165502/supplementaryfiles/ADVS-9-2104979-s001.xlsx Hsapiens 3 44445 44444 44264"
## [187] "PMC9165502 PMC_DL/PMC9165502/supplementaryfiles/ADVS-9-2104979-s005.xlsx Hsapiens 3 44445 44444 9-Mar"
## [188] "PMC9164415 zip/Table_S1a-Frameshift_Similarities-ClustalW-MSA.xlsx Drerio 11 44077 43895 43898 44084 43892 43894 43892 44089 44080 43897 43898"
## [189] "PMC9164415 zip/Table_S1a-Frameshift_Similarities-ClustalW-MSA.xlsx Dmelanogaster 17 44166 44166 44166 44078 44078 44078 44075 44078 44078 44078 44078 44076 44079 44079 44166 44166 44166"
## [190] "PMC9164415 zip/Table_S1a-Frameshift_Similarities-ClustalW-MSA.xlsx Celegans 18 43893 43892 43894 44105 44105 43896 43891 43983 43983 43983 43922 43922 43983 43983 44106 44106 44075 43895"
## [191] "PMC9164415 zip/Table_S1a-Frameshift_Similarities-ClustalW-MSA.xlsx Scerevisiae 1 44105"
## [192] "PMC9164415 zip/Table_S1a-Frameshift_Similarities-ClustalW-MSA.xlsx Hsapiens 21 43891 43898 44086 44075 43899 43901 43900 43894 44082 43898 44085 44078 43896 44077 43891 43893 43897 44076 44079 44088 43895"
## [193] "PMC9164415 zip/Table_S1a-Frameshift_Similarities-ClustalW-MSA.xlsx Mmusculus 49 43891 43894 43892 44076 44076 44076 43899 44084 44078 44083 44078 43900 44082 44083 43900 44082 44083 44083 44078 44078 44077 43901 43896 43901 44079 44086 43892 43892 43893 43895 43895 43897 44089 44085 44085 44088 44085 43898 43898 43898 44075 43891 43891 43891 43891 44081 44080 44080 44080"
## [194] "PMC9164415 zip/Table_S1a-Frameshift_Similarities-ClustalW-MSA.xlsx Drerio 12 43893 44129 43899 44086 43895 44084 44076 43892 43898 44082 44079 44085"
## [195] "PMC9164415 zip/Table_S1b-Frameshift_Similarities-FrameAlign.xlsx Dmelanogaster 17 44531 44531 44531 44443 44443 44443 44440 44443 44443 44443 44443 44441 44444 44444 44531 44531 44531"
## [196] "PMC9164415 zip/Table_S1b-Frameshift_Similarities-FrameAlign.xlsx Celegans 18 44258 44257 44259 44470 44470 44261 44256 44348 44348 44348 44287 44287 44348 44348 44471 44471 44440 44260"
## [197] "PMC9164415 zip/Table_S1b-Frameshift_Similarities-FrameAlign.xlsx Scerevisiae 1 44470"
## [198] "PMC9164415 zip/Table_S1b-Frameshift_Similarities-FrameAlign.xlsx Hsapiens 21 44256 44263 44451 44440 44264 44266 44265 44259 44447 44263 44450 44443 44261 44442 44256 44258 44262 44441 44444 44453 44260"
## [199] "PMC9164415 zip/Table_S1b-Frameshift_Similarities-FrameAlign.xlsx Mmusculus 49 44256 44259 44257 44441 44441 44441 44264 44449 44443 44448 44443 44265 44447 44448 44265 44447 44448 44448 44443 44443 44442 44266 44261 44266 44444 44451 44257 44257 44258 44260 44260 44262 44454 44450 44450 44453 44450 44263 44263 44263 44440 44256 44256 44256 44256 44446 44445 44445 44445"
## [200] "PMC9164415 zip/Table_S1b-Frameshift_Similarities-FrameAlign.xlsx Drerio 12 44258 44494 44264 44451 44260 44449 44441 44257 44263 44447 44444 44450"
## [201] "PMC9164415 zip/Table_S1b-Frameshift_Similarities-FrameAlign.xlsx Drerio 11 44442 44260 44263 44449 44257 44259 44257 44454 44445 44262 44263"
## [202] "PMC9164052 zip/Suppl_Table_S1.xls Hsapiens 1 44089"
## [203] "PMC9163145 PMC_DL/PMC9163145/supplementaryfiles/mmc2.xlsx Hsapiens 3 44626 44621 44813"
## [204] "PMC9163145 PMC_DL/PMC9163145/supplementaryfiles/mmc2.xlsx Hsapiens 1 44808"
## [205] "PMC9163145 PMC_DL/PMC9163145/supplementaryfiles/mmc2.xlsx Hsapiens 1 44627"
## [206] "PMC9163145 PMC_DL/PMC9163145/supplementaryfiles/mmc2.xlsx Hsapiens 1 44621"
## [207] "PMC9163145 PMC_DL/PMC9163145/supplementaryfiles/mmc2.xlsx Hsapiens 1 44626"
## [208] "PMC9131284 PMC_DL/PMC9131284/supplementaryfiles/thnov12p3946s1.xlsx Hsapiens 1 44256"
## [209] "PMC9131284 PMC_DL/PMC9131284/supplementaryfiles/thnov12p3946s1.xlsx Hsapiens 1 44256"
## [210] "PMC9131284 PMC_DL/PMC9131284/supplementaryfiles/thnov12p3946s1.xlsx Hsapiens 2 44256 44443"
## [211] "PMC9131284 PMC_DL/PMC9131284/supplementaryfiles/thnov12p3946s1.xlsx Hsapiens 4 44256 44443 44440 44445"
## [212] "PMC8928958 zip/cimb-1492332-supplementary/Supplementary_Material_1-Gene_Set_1-Microarray_Results-.xlsx Hsapiens 1 44454"
## [213] "PMC8928958 zip/cimb-1492332-supplementary/Supplementary_Material_1-Gene_Set_1-Microarray_Results-.xlsx Hsapiens 1 44454"
## [214] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM11_ESM.xlsx Mmusculus 6 44260 44442 44262 44259 44258 44450"
## [215] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM11_ESM.xlsx Mmusculus 3 44450 44440 44256"
## [216] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM11_ESM.xlsx Mmusculus 5 44256 44442 44261 44447 44258"
## [217] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM11_ESM.xlsx Mmusculus 3 44258 44441 44450"
## [218] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM11_ESM.xlsx Mmusculus 4 44258 44449 44264 44256"
## [219] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM11_ESM.xlsx Mmusculus 7 44263 44447 44449 44259 44258 44445 44446"
## [220] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM11_ESM.xlsx Mmusculus 6 44445 44446 44262 44261 44266 44257"
## [221] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM11_ESM.xlsx Mmusculus 6 44440 44256 44260 44257 44258 44442"
## [222] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM11_ESM.xlsx Mmusculus 9 44260 44441 44442 44450 44258 44261 44266 44257 44440"
## [223] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM11_ESM.xlsx Mmusculus 3 44259 44258 44450"
## [224] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM11_ESM.xlsx Mmusculus 4 44256 44446 44440 44258"
## [225] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM11_ESM.xlsx Mmusculus 4 44446 44440 44256 44450"
## [226] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM11_ESM.xlsx Mmusculus 6 44266 44440 44257 44445 44256 44447"
## [227] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM11_ESM.xlsx Mmusculus 1 44266"
## [228] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM11_ESM.xlsx Mmusculus 4 44264 44442 44263 44259"
## [229] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM11_ESM.xlsx Mmusculus 7 44258 44445 44256 44259 44257 44257 44449"
## [230] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM11_ESM.xlsx Mmusculus 7 44256 44262 44258 44446 44442 44260 44449"
## [231] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM11_ESM.xlsx Mmusculus 12 44447 44256 44440 44441 44258 44442 44444 44445 44259 44263 44450 44257"
## [232] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM11_ESM.xlsx Mmusculus 11 44447 44449 44442 44264 44259 44260 44263 44266 44446 44440 44256"
## [233] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM11_ESM.xlsx Mmusculus 1 44266"
## [234] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM11_ESM.xlsx Mmusculus 5 44266 44445 44440 44261 44263"
## [235] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM11_ESM.xlsx Mmusculus 2 44258 44256"
## [236] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM11_ESM.xlsx Mmusculus 1 44256"
## [237] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM11_ESM.xlsx Mmusculus 7 44445 44259 44448 44261 44446 44447 44440"
## [238] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM13_ESM.xlsx Mmusculus 1 44256"
## [239] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM13_ESM.xlsx Mmusculus 1 44446"
## [240] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM13_ESM.xlsx Mmusculus 1 44258"
## [241] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM13_ESM.xlsx Mmusculus 2 44256 44259"
## [242] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM13_ESM.xlsx Mmusculus 2 44442 44259"
## [243] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM13_ESM.xlsx Mmusculus 6 44448 44257 44258 44446 44262 44450"
## [244] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM13_ESM.xlsx Mmusculus 2 44257 44258"
## [245] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM13_ESM.xlsx Mmusculus 1 44447"
## [246] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM13_ESM.xlsx Mmusculus 1 44257"
## [247] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM13_ESM.xlsx Mmusculus 1 44266"
## [248] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM13_ESM.xlsx Mmusculus 2 44263 44266"
## [249] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM13_ESM.xlsx Mmusculus 4 44257 44263 44266 44446"
## [250] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM13_ESM.xlsx Mmusculus 8 44450 44257 44447 44442 44259 44266 44256 44445"
## [251] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM13_ESM.xlsx Mmusculus 3 44442 44447 44258"
## [252] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM13_ESM.xlsx Mmusculus 2 44256 44442"
## [253] "PMC9159952 PMC_DL/PMC9159952/supplementaryfiles/41586_2022_4686_MOESM4_ESM.xlsx Mmusculus 21 44441 44445 44257 44264 44258 44454 44256 44444 44450 44449 44257 44262 44443 44447 44263 44259 44260 44446 44442 44261 44448"
## [254] "PMC9159589 PMC_DL/PMC9159589/supplementaryfiles/pgen.1010207.s019.xlsx Drerio 26 41153 42248 41153 40422 38777 38777 41153 41153 42248 38961 37500 38777 38777 38961 42248 41153 38961 38777 40422 41153 42248 40422 41153 38777 38961 40422"
## [255] "PMC9158132 PMC_DL/PMC9158132/supplementaryfiles/Table_2.xlsx Ggallus 3 44441 44446 44448"
## [256] "PMC9158132 PMC_DL/PMC9158132/supplementaryfiles/Table_2.xlsx Hsapiens 9 44450 44444 44445 44445 44446 44446 44447 44448 44448"
## [257] "PMC9158132 PMC_DL/PMC9158132/supplementaryfiles/Table_2.xlsx Ggallus 2 44446 44448"
## [258] "PMC9157503 PMC_DL/PMC9157503/supplementaryfiles/Table3.XLSX Hsapiens 9 44806 44806 44806 44627 44627 44810 44812 44811 44625"
## [259] "PMC9157503 PMC_DL/PMC9157503/supplementaryfiles/Table3.XLSX Ggallus 7 44622 44622 44813 44813 44809 44811 44812"
## [260] "PMC9154123 PMC_DL/PMC9154123/supplementaryfiles/ppat.1010003.s014.xlsx Hsapiens 26 43901 44082 43891 43897 44075 44088 43900 43896 43894 44080 44086 43893 44077 44089 43892 43898 44079 43899 44085 44166 44076 44084 44083 44078 44081 43895"
## [261] "PMC9154123 PMC_DL/PMC9154123/supplementaryfiles/ppat.1010003.s014.xlsx Hsapiens 26 44440 44453 44266 44262 44447 44454 44264 44265 44257 44441 44448 44450 44258 44263 44446 44444 44449 44451 44259 44445 44442 44531 44256 44443 44261 44260"
## [262] "PMC9154123 PMC_DL/PMC9154123/supplementaryfiles/ppat.1010003.s014.xlsx Hsapiens 26 44266 44259 44447 44261 44256 44451 44440 44444 44445 44442 44443 44454 44264 44262 44453 44258 44448 44265 44263 44441 44446 44531 44257 44260 44449 44450"
## [263] "PMC9154123 PMC_DL/PMC9154123/supplementaryfiles/ppat.1010003.s014.xlsx Hsapiens 26 44264 44531 44260 44445 44257 44262 44256 44447 44449 44454 44446 44441 44448 44453 44442 44443 44450 44444 44451 44259 44261 44258 44263 44265 44266 44440"
## [264] "PMC9154123 PMC_DL/PMC9154123/supplementaryfiles/ppat.1010003.s015.xlsx Hsapiens 2 44621 44621"
## [265] "PMC9153011 PMC_DL/PMC9153011/supplementaryfiles/advancesADV2021005360-suppl3.xlsx Hsapiens 3 37316 37681 39508"
## [266] "PMC9153011 PMC_DL/PMC9153011/supplementaryfiles/advancesADV2021005360-suppl3.xlsx Hsapiens 1 38596"
## [267] "PMC9153011 PMC_DL/PMC9153011/supplementaryfiles/advancesADV2021005360-suppl3.xlsx Hsapiens 1 38596"
## [268] "PMC9153011 PMC_DL/PMC9153011/supplementaryfiles/advancesADV2021005360-suppl1.xlsx Hsapiens 12 39508 37316 37681 39692 40787 38777 36951 38596 37500 39142 38412 37865"
## [269] "PMC9153011 PMC_DL/PMC9153011/supplementaryfiles/advancesADV2021005360-suppl1.xlsx Hsapiens 9 38961 38231 37135 36951 39873 40422 40057 39326 41883"
## [270] "PMC9153011 PMC_DL/PMC9153011/supplementaryfiles/advancesADV2021005360-suppl1.xlsx Hsapiens 3 39508 37316 37681"
## [271] "PMC9153011 PMC_DL/PMC9153011/supplementaryfiles/advancesADV2021005360-suppl1.xlsx Hsapiens 3 39508 37316 37681"
## [272] "PMC9153011 PMC_DL/PMC9153011/supplementaryfiles/advancesADV2021005360-suppl1.xlsx Hsapiens 7 39692 40787 37135 37500 36951 39326 36951"
## [273] "PMC9153011 PMC_DL/PMC9153011/supplementaryfiles/advancesADV2021005360-suppl1.xlsx Hsapiens 12 38596 38961 39508 37316 37681 40422 38412 39142 40057 37316 39873 38777"
## [274] "PMC9153011 PMC_DL/PMC9153011/supplementaryfiles/advancesADV2021005360-suppl1.xlsx Hsapiens 1 38596"
## [275] "PMC9153011 PMC_DL/PMC9153011/supplementaryfiles/advancesADV2021005360-suppl1.xlsx Hsapiens 1 38596"
## [276] "PMC9153011 PMC_DL/PMC9153011/supplementaryfiles/advancesADV2021005360-suppl1.xlsx Hsapiens 12 39142 37316 38412 36951 37681 39326 36951 39508 37500 41883 39692 38777"
## [277] "PMC9153011 PMC_DL/PMC9153011/supplementaryfiles/advancesADV2021005360-suppl1.xlsx Hsapiens 7 38596 38961 40057 37135 40787 40422 39873"
## [278] "PMC9153011 PMC_DL/PMC9153011/supplementaryfiles/advancesADV2021005360-suppl1.xlsx Hsapiens 12 39508 37316 37681 38777 39692 39326 37500 39873 39142 38412 37135 40787"
## [279] "PMC9153011 PMC_DL/PMC9153011/supplementaryfiles/advancesADV2021005360-suppl1.xlsx Hsapiens 7 38596 40057 40422 36951 36951 37316 38961"
## [280] "PMC9153011 PMC_DL/PMC9153011/supplementaryfiles/advancesADV2021005360-suppl1.xlsx Hsapiens 3 39508 37316 37681"
## [281] "PMC9153011 PMC_DL/PMC9153011/supplementaryfiles/advancesADV2021005360-suppl1.xlsx Hsapiens 3 39508 37316 37681"
## [282] "PMC9153011 PMC_DL/PMC9153011/supplementaryfiles/advancesADV2021005360-suppl1.xlsx Hsapiens 10 37135 37316 37500 37681 38596 38777 39508 39692 39873 40787"
## [283] "PMC9153011 PMC_DL/PMC9153011/supplementaryfiles/advancesADV2021005360-suppl1.xlsx Hsapiens 6 38412 38961 39142 39326 40057 40422"
## [284] "PMC9153011 PMC_DL/PMC9153011/supplementaryfiles/advancesADV2021005360-suppl1.xlsx Hsapiens 2 37316 37681"
## [285] "PMC9153011 PMC_DL/PMC9153011/supplementaryfiles/advancesADV2021005360-suppl1.xlsx Hsapiens 2 37316 37681"
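Most flagged values above are not dates stored as text. They are Excel date serial numbers, meaning the count of days from Excel's epoch. A sketch, assuming the files use Excel's default (Windows, 1900) date system: base R's `as.Date()` with `origin = "1899-12-30"` turns a serial back into a calendar date, and the date usually points straight at the original symbol. For example, 44531 becomes 1 December 2021, which is DEC1. Files saved with the older 1904 (Mac) date system would need a different origin. The helper name `excel_serial_to_date()` is illustrative.

```r
# Convert Excel date serial numbers back to calendar dates.
# Assumes the 1900 date system; the "1899-12-30" origin compensates for
# Excel treating 1900 as a leap year and is valid for dates after Feb 1900.
excel_serial_to_date <- function(serial) {
  as.Date(as.numeric(serial), origin = "1899-12-30")
}

# A few serials taken from the output above
serials <- c("44256", "44257", "44440", "44531")
data.frame(serial = serials,
           date = format(excel_serial_to_date(serials), "%d %b %Y"))
```

Entries such as `1-Dec` and `9-Sep` in the output appear to be the same conversion, stored as formatted text rather than as serial numbers.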
Let’s investigate the errors in more detail.
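Each element of `ERROR_GENELISTS` is one space-separated string made up of the PMC ID, the path to the supplementary file, the species, the number of affected gene names, and then the converted values. The analysis that follows extracts fields 1, 3 and 4 with repeated `strsplit()` calls. A minimal sketch below uses a hypothetical helper, `parse_error_records()`, to parse every record into one data frame in a single pass. It assumes file paths never contain spaces, which holds for the records shown above.

```r
# Hypothetical helper: turn the space-separated records into a data frame.
# Assumes fields are: PMC ID, file path, species, error count, values...
parse_error_records <- function(records) {
  fields <- strsplit(records, " ", fixed = TRUE)
  data.frame(
    pmcid    = vapply(fields, `[[`, character(1), 1),
    file     = vapply(fields, `[[`, character(1), 2),
    species  = vapply(fields, `[[`, character(1), 3),
    n_errors = as.integer(vapply(fields, `[[`, character(1), 4)),
    # keep the converted values together as one string for later inspection
    values   = vapply(fields, function(f) paste(f[-(1:4)], collapse = " "),
                      character(1)),
    stringsAsFactors = FALSE
  )
}

# Two records copied from the output above
recs <- c(
  "PMC9174980 zip/Supplemental_Materials_tables.xlsx Hsapiens 1 44257",
  "PMC9165556 PMC_DL/PMC9165556/supplementaryfiles/izab318_suppl_supplementary_data_2.xlsx Hsapiens 3 37226 36951 37316"
)
err <- parse_error_records(recs)
table(err$species)                    # same idea as table(SPECIES)
tapply(err$n_errors, err$pmcid, sum)  # same idea as NERR
```

In the live session, `parse_error_records(ERROR_GENELISTS)` would give one data frame that the species, per-paper and per-file summaries can all be drawn from.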
# By species
SPECIES <- sapply(strsplit(ERROR_GENELISTS," "),"[[",3)
table(SPECIES)
## SPECIES
##      Celegans Dmelanogaster        Drerio       Ggallus      Hsapiens
##             3             7            11             4           193
##     Mmusculus   Scerevisiae
##            61             6
par(mar=c(5,12,4,2))
barplot(table(SPECIES),horiz=TRUE,las=1)
par(mar=c(5,5,4,2))
# Number of affected gene lists per paper (one Excel file can contribute several)
DIST <- table(sapply(strsplit(ERROR_GENELISTS," "),"[[",1))
DIST
##
## PMC8928958 PMC8929108 PMC8982554 PMC9098121 PMC9117153 PMC9131284 PMC9152293
## 2 2 5 3 4 4 3
## PMC9153011 PMC9154123 PMC9155246 PMC9157503 PMC9158132 PMC9159589 PMC9159952
## 21 5 1 2 3 1 40
## PMC9160332 PMC9161486 PMC9162925 PMC9163142 PMC9163145 PMC9163161 PMC9164052
## 2 7 4 1 5 4 1
## PMC9164292 PMC9164415 PMC9165502 PMC9165556 PMC9167495 PMC9168240 PMC9168273
## 6 14 2 1 1 2 1
## PMC9168533 PMC9168984 PMC9169118 PMC9169770 PMC9170766 PMC9171052 PMC9174882
## 2 1 4 1 3 1 6
## PMC9174980 PMC9177428 PMC9177860 PMC9177981 PMC9177989 PMC9178395 PMC9179874
## 1 1 1 2 3 5 1
## PMC9181722 PMC9186391 PMC9186538 PMC9187083 PMC9190170 PMC9194130 PMC9194831
## 9 2 1 1 12 1 2
## PMC9197769 PMC9198558 PMC9198845 PMC9198913 PMC9200740 PMC9200849 PMC9201760
## 1 1 2 2 3 5 1
## PMC9202124 PMC9202886 PMC9203013 PMC9203529 PMC9205776 PMC9205910 PMC9206655
## 9 1 5 1 1 1 6
## PMC9207414 PMC9207522 PMC9208075 PMC9211071 PMC9213508 PMC9213774 PMC9217776
## 1 2 1 4 1 1 1
## PMC9219160 PMC9222349 PMC9222793 PMC9222903 PMC9223334 PMC9226039 PMC9226523
## 2 7 1 2 3 1 1
## PMC9226526 PMC9233711 PMC9234272 PMC9239442 PMC9241257
## 6 2 1 1 1
summary(as.numeric(DIST))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.000 2.000 3.476 4.000 40.000
hist(DIST,main="Number of affected gene lists per paper")
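A single spreadsheet can contribute several entries. For example, all five PMC9163145 entries come from `mmc2.xlsx`. As a result, `DIST` counts affected gene lists rather than distinct files. If the number of distinct files is what you want, one approach is to count unique file paths per PMC ID. The helper name `files_per_paper()` is illustrative, and it assumes field 1 is the PMC ID and field 2 the file path.

```r
# Count distinct affected files per paper from the raw records.
# Hypothetical helper; field 1 = PMC ID, field 2 = file path.
files_per_paper <- function(records) {
  fields <- strsplit(records, " ", fixed = TRUE)
  pmcid  <- vapply(fields, `[[`, character(1), 1)
  file   <- vapply(fields, `[[`, character(1), 2)
  tapply(file, pmcid, function(f) length(unique(f)))
}

recs <- c(
  "PMC9163145 PMC_DL/PMC9163145/supplementaryfiles/mmc2.xlsx Hsapiens 3 44626 44621 44813",
  "PMC9163145 PMC_DL/PMC9163145/supplementaryfiles/mmc2.xlsx Hsapiens 1 44808",
  "PMC9170766 PMC_DL/PMC9170766/supplementaryfiles/mmc29.xlsx Hsapiens 2 43167 43161",
  "PMC9170766 PMC_DL/PMC9170766/supplementaryfiles/mmc20.xlsx Hsapiens 5 44260 44261 44263 44258 44264"
)
files_per_paper(recs)
```

Applied to `ERROR_GENELISTS`, this would count PMC9159952 as three files (MOESM4, MOESM11 and MOESM13) rather than 40 entries.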
# PMC Articles with the most errors
DIST_DF <- as.data.frame(DIST)
DIST_DF <- DIST_DF[order(-DIST_DF$Freq),,drop=FALSE]
head(DIST_DF,20)
## Var1 Freq
## 14 PMC9159952 40
## 8 PMC9153011 21
## 23 PMC9164415 14
## 47 PMC9190170 12
## 43 PMC9181722 9
## 57 PMC9202124 9
## 16 PMC9161486 7
## 72 PMC9222349 7
## 22 PMC9164292 6
## 35 PMC9174882 6
## 63 PMC9206655 6
## 78 PMC9226526 6
## 3 PMC8982554 5
## 9 PMC9154123 5
## 19 PMC9163145 5
## 41 PMC9178395 5
## 55 PMC9200849 5
## 59 PMC9203013 5
## 5 PMC9117153 4
## 6 PMC9131284 4
MOST_ERR_FILES = as.character(DIST_DF[1,1])
MOST_ERR_FILES
## [1] "PMC9159952"
# Number of errors per paper
NERR <- as.numeric(sapply(strsplit(ERROR_GENELISTS," "),"[[",4))
names(NERR) <- sapply(strsplit(ERROR_GENELISTS," "),"[[",1)
NERR <- tapply(NERR, names(NERR), sum)
NERR
## PMC8928958 PMC8929108 PMC8982554 PMC9098121 PMC9117153 PMC9131284 PMC9152293
## 2 2 8 19 246 8 6
## PMC9153011 PMC9154123 PMC9155246 PMC9157503 PMC9158132 PMC9159589 PMC9159952
## 117 106 4 16 14 26 182
## PMC9160332 PMC9161486 PMC9162925 PMC9163142 PMC9163145 PMC9163161 PMC9164052
## 15 48 80 1 7 4 1
## PMC9164292 PMC9164415 PMC9165502 PMC9165556 PMC9167495 PMC9168240 PMC9168273
## 19 258 6 3 1 2 1
## PMC9168533 PMC9168984 PMC9169118 PMC9169770 PMC9170766 PMC9171052 PMC9174882
## 3 5 336 2 36 1 295
## PMC9174980 PMC9177428 PMC9177860 PMC9177981 PMC9177989 PMC9178395 PMC9179874
## 1 9 7 2 23 5 1
## PMC9181722 PMC9186391 PMC9186538 PMC9187083 PMC9190170 PMC9194130 PMC9194831
## 13 41 7 76 64 5 3
## PMC9197769 PMC9198558 PMC9198845 PMC9198913 PMC9200740 PMC9200849 PMC9201760
## 3 7 8 3 4 7 2
## PMC9202124 PMC9202886 PMC9203013 PMC9203529 PMC9205776 PMC9205910 PMC9206655
## 243 1 6 4 2 12 6
## PMC9207414 PMC9207522 PMC9208075 PMC9211071 PMC9213508 PMC9213774 PMC9217776
## 13 16 1 56 76 14 4
## PMC9219160 PMC9222349 PMC9222793 PMC9222903 PMC9223334 PMC9226039 PMC9226523
## 4 24 2 2 5 12 7
## PMC9226526 PMC9233711 PMC9234272 PMC9239442 PMC9241257
## 14 15 1 3 2
hist(NERR,main="Number of errors per PMC article")
NERR_DF <- as.data.frame(NERR)
NERR_DF <- NERR_DF[order(-NERR_DF$NERR),,drop=FALSE]
head(NERR_DF,20)
## NERR
## PMC9169118 336
## PMC9174882 295
## PMC9164415 258
## PMC9117153 246
## PMC9202124 243
## PMC9159952 182
## PMC9153011 117
## PMC9154123 106
## PMC9162925 80
## PMC9187083 76
## PMC9213508 76
## PMC9190170 64
## PMC9211071 56
## PMC9161486 48
## PMC9186391 41
## PMC9170766 36
## PMC9159589 26
## PMC9222349 24
## PMC9177989 23
## PMC9098121 19
MOST_ERR = rownames(NERR_DF)[1]
MOST_ERR
## [1] "PMC9169118"
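To put these counts in context, you can compute the share of screened articles with at least one flagged gene list from objects already in the session. `length(DIST)` is the number of affected papers (82 are listed above), and `NUM_ARTICLES` is the number of PMC records the search returned. In the sketch below, this month's values are written as literals so it runs on its own.

```r
# Percentage of screened articles with at least one flagged gene list.
# In the live session these would be length(DIST) and NUM_ARTICLES.
n_affected <- 82     # papers listed in DIST this month
n_screened <- 3503   # NUM_ARTICLES this month
pct_affected <- 100 * n_affected / n_screened
round(pct_affected, 2)
```

Many screened articles have no spreadsheet supplements at all, so this percentage understates the error rate among papers that actually share gene lists in Excel format. It is also not corrected for the ~1.5% false-positive rate noted at the start of the results.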
GENELIST_ERROR_ARTICLES <- gsub("PMC","",GENELIST_ERROR_ARTICLES)
### JSON PARSING is more reliable than XML
ARTICLES <- esummary( GENELIST_ERROR_ARTICLES , db="pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLES,as= "parsed")
ARTICLE_DATA <- ARTICLE_DATA$result
# the first element of $result is the vector of UIDs; keep only the article records
ARTICLE_DATA <- ARTICLE_DATA[2:length(ARTICLE_DATA)]
JOURNALS <- unlist(lapply(ARTICLE_DATA,function(x) {x$fulljournalname} ))
JOURNALS_TABLE <- table(JOURNALS)
JOURNALS_TABLE <- JOURNALS_TABLE[order(-JOURNALS_TABLE)]
length(JOURNALS_TABLE)
## [1] 51
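The parsed esummary `result` holds a `uids` vector followed by one record per UID, and the code above drops `uids` by position. Also, `unlist(lapply(...))` silently skips any record that lacks a `fulljournalname`. The sketch below instead looks each record up by its UID and returns `NA` for a missing journal, so every article keeps its slot. The helper name is hypothetical, and the mock list only imitates the structure printed further down. It is not real esummary output.

```r
# Extract one journal name per UID from a parsed esummary `result` list.
# Records are looked up by name via `uids`; missing journals become NA.
journals_from_result <- function(result) {
  vapply(result$uids, function(uid) {
    j <- result[[uid]]$fulljournalname
    if (is.null(j)) NA_character_ else j
  }, character(1))
}

# Mock structure imitating reutils::content(x, as = "parsed")$result
mock_result <- list(
  uids = c("9159952", "9169118"),
  `9159952` = list(uid = "9159952", fulljournalname = "Nature"),
  `9169118` = list(uid = "9169118",
                   fulljournalname = "Proceedings of the National Academy of Sciences of the United States of America")
)
journals <- journals_from_result(mock_result)
sort(table(journals), decreasing = TRUE)
```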
par(mar=c(5,25,4,2))
barplot(head(JOURNALS_TABLE,10), horiz=TRUE, las=1,
xlab="Articles with gene name errors in supp files",
main="Top journals this month")
Congrats to our Journal of the Month winner!
JOURNAL_WINNER <- names(head(JOURNALS_TABLE,1))
JOURNAL_WINNER
## [1] "Frontiers in Genetics"
There are two categories:
Paper with the most affected gene lists in its supplementary files (MOST_ERR_FILES)
Paper with the most gene names converted to dates (MOST_ERR)
Sometimes, one paper can win both categories. Congrats to our winners.
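Sorting with `order()` and taking the first row always picks a single winner, even when several papers share the top value. There is no tie at the top this month, but lower down PMC9181722 and PMC9202124 both have 9 affected gene lists. The sketch below uses a hypothetical helper to report every name tied for the maximum. It works on named vectors and on one-dimensional tables, so it could be applied to `DIST`, `NERR` or `JOURNALS_TABLE`.

```r
# Return all names tied for the largest value in a named vector or 1-D table.
top_names <- function(x) {
  stopifnot(!is.null(names(x)))
  names(x)[which(x == max(x, na.rm = TRUE))]
}

counts <- c(PMC9181722 = 9, PMC9202124 = 9, PMC9161486 = 7)
top_names(counts)
top_names(table(c("a", "b", "b", "c", "c")))
```

On this month's data, `top_names(DIST)` and `top_names(NERR)` would return only PMC9159952 and PMC9169118, respectively.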
MOST_ERR_FILES <- gsub("PMC","",MOST_ERR_FILES)
ARTICLES <- esummary( MOST_ERR_FILES , db="pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLES,as= "parsed")
# keep only $result (element 1 is the $header)
ARTICLE_DATA <- ARTICLE_DATA[2]
ARTICLE_DATA
## $result
## $result$uids
## [1] "9159952"
##
## $result$`9159952`
## $result$`9159952`$uid
## [1] "9159952"
##
## $result$`9159952`$pubdate
## [1] "2022 May 4"
##
## $result$`9159952`$epubdate
## [1] "2022 May 4"
##
## $result$`9159952`$printpubdate
## [1] "2022"
##
## $result$`9159952`$source
## [1] "Nature"
##
## $result$`9159952`$authors
## name authtype
## 1 Gegenhuber B Author
## 2 Wu MV Author
## 3 Bronstein R Author
## 4 Tollkuhn J Author
##
## $result$`9159952`$title
## [1] "Gene regulation by gonadal hormone receptors underlies brain sex differences"
##
## $result$`9159952`$volume
## [1] "606"
##
## $result$`9159952`$issue
## [1] "7912"
##
## $result$`9159952`$pages
## [1] "153-159"
##
## $result$`9159952`$articleids
## idtype value
## 1 pmid 35508660
## 2 doi 10.1038/s41586-022-04686-1
## 3 pmcid PMC9159952
##
## $result$`9159952`$fulljournalname
## [1] "Nature"
##
## $result$`9159952`$sortdate
## [1] "2022/05/04 00:00"
##
## $result$`9159952`$pmclivedate
## [1] "2022/06/03"
MOST_ERR <- gsub("PMC","",MOST_ERR)
ARTICLE_DATA <- esummary(MOST_ERR,db = "pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLE_DATA,as= "parsed")
ARTICLE_DATA
## $header
## $header$type
## [1] "esummary"
##
## $header$version
## [1] "0.3"
##
##
## $result
## $result$uids
## [1] "9169118"
##
## $result$`9169118`
## $result$`9169118`$uid
## [1] "9169118"
##
## $result$`9169118`$pubdate
## [1] "2022 Mar 28"
##
## $result$`9169118`$epubdate
## [1] "2022 Mar 28"
##
## $result$`9169118`$printpubdate
## [1] "2022 Apr 5"
##
## $result$`9169118`$source
## [1] "Proc Natl Acad Sci U S A"
##
## $result$`9169118`$authors
## name authtype
## 1 Wang Y Author
## 2 Menon AK Author
## 3 Maki Y Author
## 4 Liu YS Author
## 5 Iwasaki Y Author
## 6 Fujita M Author
## 7 Guerrero PA Author
## 8 Silva DV Author
## 9 Seeberger PH Author
## 10 Murakami Y Author
## 11 Kinoshita T Author
##
## $result$`9169118`$title
## [1] "Genome-wide CRISPR screen reveals CLPTM1L as a lipid scramblase required for efficient glycosylphosphatidylinositol biosynthesis"
##
## $result$`9169118`$volume
## [1] "119"
##
## $result$`9169118`$issue
## [1] "14"
##
## $result$`9169118`$pages
## [1] "e2115083119"
##
## $result$`9169118`$articleids
## idtype value
## 1 pmid 35344438
## 2 doi 10.1073/pnas.2115083119
## 3 pmcid PMC9169118
##
## $result$`9169118`$fulljournalname
## [1] "Proceedings of the National Academy of Sciences of the United States of America"
##
## $result$`9169118`$sortdate
## [1] "2022/03/28 00:00"
##
## $result$`9169118`$pmclivedate
## [1] "2022/06/07"
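The full esummary dumps above are long. A compact citation line for each winner could be built from a few of their fields: `authors$name`, `title`, `source`, `pubdate` and the DOI in `articleids`. In the sketch below, the helper name is hypothetical. It assumes the parsed record has the shape printed above, with `authors` and `articleids` as data frames.

```r
# Build a one-line citation from one parsed esummary record.
# Assumes `authors` and `articleids` are data frames, as printed above.
format_citation <- function(rec) {
  authors <- rec$authors$name
  author_str <- if (length(authors) > 3) {
    paste0(paste(authors[1:3], collapse = ", "), " et al")
  } else {
    paste(authors, collapse = ", ")
  }
  ids <- rec$articleids
  doi <- ids$value[ids$idtype == "doi"]
  paste0(author_str, ". ", rec$title, ". ", rec$source, " ", rec$pubdate, ".",
         if (length(doi) == 1) paste0(" doi:", doi) else "")
}

# Mock record copied from the PMC9159952 summary above
rec <- list(
  title   = "Gene regulation by gonadal hormone receptors underlies brain sex differences",
  source  = "Nature",
  pubdate = "2022 May 4",
  authors = data.frame(name = c("Gegenhuber B", "Wu MV", "Bronstein R", "Tollkuhn J"),
                       authtype = "Author", stringsAsFactors = FALSE),
  articleids = data.frame(idtype = c("pmid", "doi", "pmcid"),
                          value = c("35508660", "10.1038/s41586-022-04686-1", "PMC9159952"),
                          stringsAsFactors = FALSE)
)
format_citation(rec)
```

In the session above, the second winner would be `format_citation(ARTICLE_DATA$result[[MOST_ERR]])`.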
TODO: Plot the trend over the past 6 months.
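A possible first step for this TODO is to generate the `DATE` strings for the most recent months, in the same `YYYY/M` form used for the search, and then re-run the pipeline once per month. The base R sketch below uses a hypothetical helper name and does the month arithmetic with `seq()` on `Date` objects, so it handles year boundaries.

```r
# Hypothetical helper: the last `n` month strings ("YYYY/M"), newest first,
# in the same format as DATE = "2022/6" used for the esearch query.
previous_months <- function(end = "2022/6", n = 6) {
  first_day <- as.Date(paste0(end, "/1"), format = "%Y/%m/%d")
  months <- seq(first_day, by = "-1 month", length.out = n)
  paste0(format(months, "%Y"), "/", as.integer(format(months, "%m")))
}

previous_months("2022/6", 6)
```

You could pass each string as both `mindate` and `maxdate` to `esearch()` and then run the same download and screening steps. Plotting the monthly fraction of affected articles would then show the trend.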
Zeeberg, B.R., Riss, J., Kane, D.W. et al. Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics. BMC Bioinformatics 5, 80 (2004). https://doi.org/10.1186/1471-2105-5-80
Ziemann, M., Eren, Y. & El-Osta, A. Gene name errors are widespread in the scientific literature. Genome Biol 17, 177 (2016). https://doi.org/10.1186/s13059-016-1044-7
sessionInfo()
## R version 4.2.0 (2022-04-22)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##
## locale:
## [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
## [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
## [7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] readxl_1.4.0 reutils_0.2.3 xml2_1.3.3 jsonlite_1.8.0
##
## loaded via a namespace (and not attached):
## [1] knitr_1.39 magrittr_2.0.3 R6_2.5.1 rlang_1.0.2
## [5] fastmap_1.1.0 stringr_1.4.0 highr_0.9 tools_4.2.0
## [9] xfun_0.31 cli_3.3.0 jquerylib_0.1.4 htmltools_0.5.2
## [13] yaml_2.3.5 digest_0.6.29 assertthat_0.2.1 sass_0.4.1
## [17] bitops_1.0-7 RCurl_1.98-1.7 evaluate_0.15 rmarkdown_2.14
## [21] stringi_1.7.6 compiler_4.2.0 bslib_0.3.1 cellranger_1.1.0
## [25] XML_3.99-0.10