Source: https://github.com/markziemann/GeneNameErrors2020
View the reports: http://ziemann-lab.net/public/gene_name_errors/
Gene name errors result when data are imported improperly into MS Excel and other spreadsheet programs (Zeeberg et al, 2004). Certain gene names like MARCH3, SEPT2 and DEC1 are converted into date format. These errors are surprisingly common in supplementary data files in the field of genomics (Ziemann et al, 2016). This could be considered a small error because it only affects a small number of genes; however, it is symptomatic of poor data processing methods. The purpose of this script is to identify gene name errors present in supplementary files of PubMed Central articles in the previous month.
library("XML")
library("jsonlite")
library("xml2")
library("reutils")
library("readxl")
library("RCurl")
Here I will be getting PubMed Central IDs for the previous month.
Start with figuring out the date to search PubMed Central.
CURRENT_MONTH=format(Sys.time(), "%m")
CURRENT_YEAR=format(Sys.time(), "%Y")
if (CURRENT_MONTH == "01") {
PREV_YEAR=as.character(as.numeric(format(Sys.time(), "%Y"))-1)
PREV_MONTH="12"
} else {
PREV_YEAR=CURRENT_YEAR
PREV_MONTH=as.character(as.numeric(format(Sys.time(), "%m"))-1)
}
DATE=paste(PREV_YEAR,"/",PREV_MONTH,sep="")
DATE
## [1] "2022/10"
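The month arithmetic above can also be done with base R date sequences. The following is only a sketch: `previous_month` is a hypothetical helper, not part of the original script. It zero-pads the month (for example "2022/09"), whereas the code above yields "2022/9" for single-digit months.

```r
# Hypothetical helper (not in the original script): return the previous
# calendar month as a zero-padded "YYYY/MM" string.
previous_month <- function(today = Sys.Date()) {
  # Snap to the first day of the current month so that stepping back one
  # month never lands on a non-existent day (e.g. 31 February).
  first_of_month <- as.Date(format(today, "%Y-%m-01"))
  prev <- seq(first_of_month, by = "-1 month", length.out = 2)[2]
  format(prev, "%Y/%m")
}

previous_month(as.Date("2022-11-15"))  # "2022/10"
previous_month(as.Date("2023-01-03"))  # "2022/12"
```

Passing the date as an argument keeps the helper testable; calling it with no argument uses today's date, as the original code does.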
Let’s see how many PMC IDs we have in the past month.
QUERY ='((genom*[Abstract]))'
ESEARCH_RES <- esearch(term=QUERY, db = "pmc", rettype = "uilist", retmode = "xml", retstart = 0,
retmax = 5000000, usehistory = TRUE, webenv = NULL, querykey = NULL, sort = NULL, field = NULL,
datetype = NULL, reldate = NULL,
mindate = paste(DATE,"/1",sep="") , maxdate = paste(DATE,"/31",sep=""))
pmc <- efetch(ESEARCH_RES,retmode="text",rettype="uilist",outfile="pmcids.txt")
## Retrieving UIDs 1 to 500
## Retrieving UIDs 501 to 1000
## Retrieving UIDs 1001 to 1500
## Retrieving UIDs 1501 to 2000
## Retrieving UIDs 2001 to 2500
## Retrieving UIDs 2501 to 3000
## Retrieving UIDs 3001 to 3500
## Retrieving UIDs 3501 to 4000
pmc <- read.table(pmc)
pmc <- paste("PMC",pmc$V1,sep="")
NUM_ARTICLES=length(pmc)
NUM_ARTICLES
## [1] 3778
writeLines(pmc,con="pmc.txt")
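The search window above always ends on day 31, including in months that have fewer days. This report does not state how the server treats such a date, so the sketch below simply shows one way to compute the true last day of a month in base R. `month_window` is a hypothetical helper, and it assumes its input is a "YYYY/MM" string like `DATE`.

```r
# Hypothetical helper (not in the original script): given "YYYY/MM",
# return the first and last calendar day of that month as "YYYY/MM/DD".
month_window <- function(date_ym) {
  first <- as.Date(paste0(date_ym, "/01"), format = "%Y/%m/%d")
  # First day of the following month, minus one day, is the last day.
  last <- seq(first, by = "1 month", length.out = 2)[2] - 1
  c(mindate = format(first, "%Y/%m/%d"),
    maxdate = format(last, "%Y/%m/%d"))
}

month_window("2022/10")
month_window("2023/02")
```

The two elements could then be supplied as the `mindate` and `maxdate` arguments in place of the pasted strings.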
Now run the bash script. Note that false positives can occur (~1.5%) and these results have not been verified by a human.
Here are some definitions:
NUM_XLS = Number of supplementary Excel files in this set of PMC articles.
NUM_XLS_ARTICLES = Number of articles matching the PubMed Central search which have supplementary Excel files.
GENELISTS = The gene lists found in the Excel files. Each Excel file is counted once even if it has multiple gene lists.
NUM_GENELISTS = The number of Excel files with gene lists.
NUM_GENELIST_ARTICLES = The number of PMC articles with supplementary Excel gene lists.
ERROR_GENELISTS = Files suspected to contain gene name errors. The dates and five-digit numbers indicate transmogrified gene names.
NUM_ERROR_GENELISTS = Number of Excel gene lists with errors.
NUM_ERROR_GENELIST_ARTICLES = Number of articles with supplementary Excel gene name errors.
ERROR_PROPORTION = This is the proportion of articles with Excel gene lists that have errors.
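To illustrate what "dates and five-digit numbers" means in practice, here is a minimal sketch of a pattern check in R. It is not the detection logic of `gene_names.sh`, which is not shown in this report. `looks_like_date` and its three patterns are assumptions, chosen only to match the kinds of values that appear in the output of this report.

```r
# Hypothetical illustration (NOT the logic used by gene_names.sh):
# flag strings that resemble spreadsheet-converted gene names.
looks_like_date <- function(x) {
  month <- "(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
  serial    <- grepl("^[0-9]{5}$", x)                         # e.g. 44445
  day_month <- grepl(paste0("^[0-9]{1,2}-", month, "$"), x)   # e.g. 4-Mar
  ymd       <- grepl("^[0-9]{4}/[0-9]{2}/[0-9]{2}$", x)       # e.g. 2022/12/01
  serial | day_month | ymd
}

looks_like_date(c("44445", "4-Mar", "2022/12/01", "TP53", "MARCH3"))
# TRUE TRUE TRUE FALSE FALSE
```

A check this simple would also flag legitimate five-digit identifiers in a gene column, which is one way false positives can arise.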
system("./gene_names.sh pmc.txt")
results <- readLines("results.txt")
XLS <- results[grep("XLS",results,ignore.case=TRUE)]
NUM_XLS = length(XLS)
NUM_XLS
## [1] 5135
NUM_XLS_ARTICLES = length(unique(sapply(strsplit(XLS," "),"[[",1)))
NUM_XLS_ARTICLES
## [1] 895
GENELISTS <- XLS[lengths(strsplit(XLS," "))>2]
#GENELISTS
NUM_GENELISTS <- length(unique(sapply(strsplit(GENELISTS," "),"[[",2)))
NUM_GENELISTS
## [1] 657
NUM_GENELIST_ARTICLES <- length(unique(sapply(strsplit(GENELISTS," "),"[[",1)))
NUM_GENELIST_ARTICLES
## [1] 330
ERROR_GENELISTS <- XLS[lengths(strsplit(XLS," "))>3]
#ERROR_GENELISTS
NUM_ERROR_GENELISTS = length(ERROR_GENELISTS)
NUM_ERROR_GENELISTS
## [1] 241
GENELIST_ERROR_ARTICLES <- unique(sapply(strsplit(ERROR_GENELISTS," "),"[[",1))
GENELIST_ERROR_ARTICLES
## [1] "PMC9617917" "PMC9617877" "PMC9617049" "PMC9616860" "PMC9606253"
## [6] "PMC9605692" "PMC9586478" "PMC9584951" "PMC9584949" "PMC9584885"
## [11] "PMC9581932" "PMC9612055" "PMC9597788" "PMC9576059" "PMC9571470"
## [16] "PMC9575684" "PMC9568502" "PMC9596731" "PMC9605867" "PMC9584372"
## [21] "PMC9550834" "PMC9570186" "PMC9548106" "PMC9585224" "PMC9547065"
## [26] "PMC9547055" "PMC9546890" "PMC9544324" "PMC9586874" "PMC9582937"
## [31] "PMC9561262" "PMC9585938" "PMC9579367" "PMC9579347" "PMC9579318"
## [36] "PMC9558508" "PMC9535881" "PMC9534852" "PMC9532393" "PMC9531459"
## [41] "PMC9574330" "PMC9561881" "PMC9561266" "PMC9531352" "PMC9578707"
## [46] "PMC9561824" "PMC9546626" "PMC9524707" "PMC9523919" "PMC9570865"
## [51] "PMC9518689" "PMC9552960" "PMC9551187" "PMC9550876" "PMC9549141"
## [56] "PMC9549113" "PMC9548632" "PMC9548593" "PMC9548527" "PMC9550651"
## [61] "PMC9558549" "PMC9532972" "PMC9529142" "PMC9534044" "PMC9531689"
## [66] "PMC9531163" "PMC9530606" "PMC9530462" "PMC9530347" "PMC9482892"
## [71] "PMC9524857" "PMC9523398" "PMC9523360" "PMC9508470" "PMC9560165"
## [76] "PMC9483806" "PMC9512470" "PMC9494239" "PMC9613476" "PMC9584034"
## [81] "PMC9581494" "PMC9545373" "PMC9546057" "PMC9559112" "PMC9546433"
## [86] "PMC9596373" "PMC9545044"
NUM_ERROR_GENELIST_ARTICLES <- length(GENELIST_ERROR_ARTICLES)
NUM_ERROR_GENELIST_ARTICLES
## [1] 87
ERROR_PROPORTION = NUM_ERROR_GENELIST_ARTICLES / NUM_GENELIST_ARTICLES
ERROR_PROPORTION
## [1] 0.2636364
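The counting above relies on the number of space-separated fields in each line of `results.txt`. The sketch below walks through that logic on made-up lines. The `toy` vector and its PMC IDs are invented, and the layout of lines without errors is an assumption inferred from the thresholds used above (more than 2 fields for a gene list, more than 3 for errors).

```r
# Invented example lines; the field layout is assumed, not taken from
# gene_names.sh: <PMCID> <file> [<species> [<n_errors> <values...>]]
toy <- c(
  "PMC0000001 PMC_DL/PMC0000001/supplementaryfiles/a.xlsx",
  "PMC0000001 PMC_DL/PMC0000001/supplementaryfiles/b.xlsx Hsapiens",
  "PMC0000002 PMC_DL/PMC0000002/supplementaryfiles/c.xlsx Mmusculus 2 43891 44085"
)

n_fields <- lengths(strsplit(toy, " "))

genelists <- toy[n_fields > 2]  # a species field means a gene list was found
errors    <- toy[n_fields > 3]  # extra fields hold the suspect values

# Article-level counts use the first field (the PMC ID).
genelist_articles <- unique(sapply(strsplit(genelists, " "), "[[", 1))
error_articles    <- unique(sapply(strsplit(errors, " "), "[[", 1))

length(error_articles) / length(genelist_articles)
```

In this toy set two articles have gene lists and one of them has errors, so the proportion is 0.5.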
Here you can have a look at all the gene lists detected in the past month, as well as those with errors. The dates are obvious errors; they are commonly dates in September, March, December and October. The five-digit numbers represent dates as they are encoded in the Excel internal format, which stores a date as the number of days counted from the start of 1900. If you were to take these numbers and put them into Excel and format the cells as dates, then these will also mostly map to dates in September, March, December and October.
#GENELISTS
ERROR_GENELISTS
## [1] "PMC9617917 PMC_DL/PMC9617917/supplementaryfiles/41598_2022_23268_MOESM2_ESM.xls Hsapiens 8 44445 44441 44261 44446 44263 44443 44260 44257"
## [2] "PMC9617877 PMC_DL/PMC9617877/supplementaryfiles/41467_2022_34179_MOESM3_ESM.xlsx Hsapiens 3 43891 43891 43891"
## [3] "PMC9617049 PMC_DL/PMC9617049/supplementaryfiles/41467_2022_34111_MOESM4_ESM.xlsx Hsapiens 25 44256 44263 44262 44445 44446 44266 44453 44443 44441 44260 44449 44264 44447 44440 44451 44448 44261 44531 44444 44259 44442 44257 44258 44450 44265"
## [4] "PMC9617049 PMC_DL/PMC9617049/supplementaryfiles/41467_2022_34111_MOESM3_ESM.xlsx Hsapiens 81 44075 43892 44088 43896 44080 44077 43900 43892 43891 44081 44084 44086 44081 43891 44088 44078 43897 44085 44085 43893 44166 44086 43898 44076 44086 44082 44082 44080 43896 44079 43899 44079 44077 43899 43894 43895 43896 43900 43900 44084 44080 44077 43891 43899 44088 44081 43894 43894 43898 44085 44082 43891 43893 44078 44083 43897 44075 44083 44078 43893 43895 43892 43897 44166 43898 44076 43892 44083 44084 44076 44075 43901 44166 43901 43892 43901 43891 44079 43892 43895 43891"
## [5] "PMC9617049 PMC_DL/PMC9617049/supplementaryfiles/41467_2022_34111_MOESM3_ESM.xlsx Hsapiens 81 43901 43891 43896 44081 44086 43894 44075 44082 43896 43892 43899 43893 44079 43892 43891 43901 44078 43896 44076 43892 43891 43899 44083 44076 44084 44166 44075 44084 44076 44081 44080 43892 44082 44075 44085 44078 43895 44086 43893 44083 44081 43901 43899 44077 43898 44166 44088 43900 44079 43898 44078 43895 44084 44088 43892 44088 43894 43892 44085 44085 44166 43897 43895 43898 43894 44079 44080 43891 44077 43897 44086 43891 43900 44080 44082 43893 44077 43897 43900 43891 44083"
## [6] "PMC9616860 PMC_DL/PMC9616860/supplementaryfiles/41467_2022_34078_MOESM4_ESM.xlsx Hsapiens 29 43895 43896 44079 44082 43892 43891 44084 44078 43897 43900 44166 43901 44075 43893 44088 44081 44075 44086 43892 43899 44077 43898 44076 43891 44083 44085 44080 44089 43894"
## [7] "PMC9616860 PMC_DL/PMC9616860/supplementaryfiles/41467_2022_34078_MOESM5_ESM.xlsx Mmusculus 15 44085 44078 43897 43892 43898 43893 44075 43892 43895 44082 44080 43896 44083 44081 44076"
## [8] "PMC9616860 PMC_DL/PMC9616860/supplementaryfiles/41467_2022_34078_MOESM6_ESM.xlsx Mmusculus 18 44083 43892 43898 43895 43893 44081 44075 44076 44082 43892 43899 44084 43896 43897 43891 44085 44080 44079"
## [9] "PMC9616860 PMC_DL/PMC9616860/supplementaryfiles/41467_2022_34078_MOESM8_ESM.xlsx Mmusculus 18 44082 44075 44083 44084 43891 43897 44076 43898 44089 43892 43895 44079 44081 43892 43893 44085 44080 43896"
## [10] "PMC9616860 PMC_DL/PMC9616860/supplementaryfiles/41467_2022_34078_MOESM7_ESM.xlsx Mmusculus 2 43897 44085"
## [11] "PMC9606253 PMC_DL/PMC9606253/supplementaryfiles/41467_2022_34200_MOESM10_ESM.xlsx Mmusculus 22 43894 43900 44083 43901 44079 43893 43901 43897 44081 43894 43892 43892 44084 44082 43901 43896 43892 43893 43897 44085 43891 44081"
## [12] "PMC9606253 PMC_DL/PMC9606253/supplementaryfiles/41467_2022_34200_MOESM9_ESM.xlsx Mmusculus 8 43716 43717 43715 43715 43716 43528 43712 43717"
## [13] "PMC9606253 PMC_DL/PMC9606253/supplementaryfiles/41467_2022_34200_MOESM4_ESM.xlsx Mmusculus 16 43346 43354 43163 43347 43348 43351 43168 43160 43165 43162 43352 43169 43349 43353 43161 43161"
## [14] "PMC9606253 PMC_DL/PMC9606253/supplementaryfiles/41467_2022_34200_MOESM5_ESM.xlsx Mmusculus 13 44085 44077 43894 44078 44079 43891 43896 44082 43899 44080 43900 44083 43901"
## [15] "PMC9606253 PMC_DL/PMC9606253/supplementaryfiles/41467_2022_34200_MOESM5_ESM.xlsx Mmusculus 17 43346 43351 43348 43163 43168 43162 43160 43347 43165 43349 43354 43353 43170 43169 43161 43350 43352"
## [16] "PMC9606253 PMC_DL/PMC9606253/supplementaryfiles/41467_2022_34200_MOESM12_ESM.xlsx Mmusculus 128 43891 43891 43891 44085 43891 43891 44080 44076 43891 43891 44080 44080 43891 43893 43891 43891 43894 44083 43891 43891 43899 44083 44083 43891 43894 43900 44082 43894 44082 43894 44079 43900 44083 44082 44084 44083 44083 43894 43894 43894 43892 43893 44084 44083 43893 43891 43901 43896 43900 43896 43891 43891 44084 43891 43901 43891 43891 43892 43891 43901 43891 43891 43894 43891 43901 43891 44084 43893 44083 43892 44083 43896 43891 43893 43892 44086 43893 43891 43891 43891 43891 43896 43900 43898 43893 44085 43891 43898 44086 43893 44081 43893 44075 44075 44084 43893 43896 44085 43891 43901 44076 43899 44077 43894 44081 44081 44088 44083 44075 43892 44085 44083 44083 44083 44077 43892 43892 44086 43892 43898 43892 43900 43896 44083 44082 43892 43891 43891"
## [17] "PMC9606253 PMC_DL/PMC9606253/supplementaryfiles/41467_2022_34200_MOESM11_ESM.xlsx Mmusculus 14 43525 43711 43528 43525 43525 43718 43528 43713 43532 43527 43717 43715 43716 43711"
## [18] "PMC9606253 PMC_DL/PMC9606253/supplementaryfiles/41467_2022_34200_MOESM11_ESM.xlsx Mmusculus 11 43528 43532 43535 43525 43717 43525 43525 43717 43717 43532 43527"
## [19] "PMC9605692 PMC_DL/PMC9605692/supplementaryfiles/elife-78345-supp2.xlsx Hsapiens 7 44809 44621 44625 44811 44810 44621 44628"
## [20] "PMC9586478 zip/sciadv.abq8297_table_s3.xlsx Hsapiens 16 38200 37834 40422 39326 37469 39692 37104 38596 37500 40787 38961 40057 37135 38231 41153 37865"
## [21] "PMC9584951 PMC_DL/PMC9584951/supplementaryfiles/41598_2022_21473_MOESM5_ESM.xlsx Hsapiens 3 44896 44623 44624"
## [22] "PMC9584949 PMC_DL/PMC9584949/supplementaryfiles/41467_2022_33945_MOESM5_ESM.xlsx Mmusculus 14 38596 38231 39326 37135 42248 38777 37316 37500 38047 39508 40422 38412 39692 36951"
## [23] "PMC9584885 PMC_DL/PMC9584885/supplementaryfiles/41467_2022_33385_MOESM6_ESM.xlsx Hsapiens 360 40057 38412 38412 37500 40057 37135 40057 40057 40057 38961 40057 37135 37135 40787 37500 40787 40057 37500 40422 38961 38961 38961 38961 38961 38961 38961 38961 40787 40787 39692 39692 37500 37500 37500 37500 37500 37500 37500 37500 37135 37135 37135 37135 37135 37135 37135 37135 37135 40057 40057 40057 40057 40057 40422 40422 40422 40422 38412 38412 38412 38961 40057 38961 37135 38412 37135 40057 37135 40057 38961 38961 38961 38961 38961 38961 38961 40787 40787 40787 40787 39692 39692 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37135 37135 37135 37135 37135 37135 37135 37135 37135 40057 40057 40057 40057 40057 40057 40057 40057 40057 40422 40422 40422 40422 40422 38412 38412 38412 38412 40057 38961 40422 40057 37500 38961 37135 40422 40057 40787 40422 37135 37135 38412 39692 39692 37135 40057 38961 40057 40057 40057 37500 37500 40787 37500 38412 37135 37500 37500 37500 37500 40057 38961 38961 40787 37135 40057 40422 40422 40787 37135 38961 38412 38961 38961 38961 37500 37500 37500 37135 37135 37135 37135 37135 40057 40057 40057 38412 38412 40057 37135 38412 40422 40057 38961 38961 38961 38961 38961 38961 38961 38961 38961 40787 40787 40787 40787 39692 39692 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37135 37135 37135 37135 37135 37135 37135 37135 37135 37135 37135 40057 40057 40057 40057 40057 40057 40057 40057 40057 40057 40422 40422 40422 40422 38412 38412 38412 38412 38961 37135 37500 40057 40787 40057 40057 38961 38412 40057 40787 38961 38412 37500 37135 40057 38961 40057 38412 37135 37135 38961 38961 40422 37500 38961 40422 38961 40057 40057 38961 37500 40422 37135 37135 37500 38412 40787 37500 40422 40057 37135 37135 37500 37135 40057 40057 37500 40787 37500 40057 40422 39692 39692 37135 37135 37135 37500 38412 37500 38961 38961 38961 38961 38961 38961 38961 38961 38961 40787 40787 40787 40787 39692 39692 37500 
37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37135 37135 37135 37135 37135 37135 37135 37135 37135 37135 37135 37135 40057 40057 40057 40057 40057 40057 40057 40057 40057 40057 40057 40057 40422 40422 40422 40422 40422 38412 38412 38412 38412 38412"
## [24] "PMC9581932 PMC_DL/PMC9581932/supplementaryfiles/41467_2022_33904_MOESM4_ESM.xlsx Scerevisiae 1 41183"
## [25] "PMC9581932 PMC_DL/PMC9581932/supplementaryfiles/41467_2022_33904_MOESM4_ESM.xlsx Scerevisiae 1 44470"
## [26] "PMC9612055 zip/Supplemental_Tables_S1-S3/Supplementary_Table_S3-RNA-seq_SAHA_DEGs-.xlsx Hsapiens 17 44448 44447 44264 44445 44262 44260 44443 44440 44442 44263 44450 44257 44441 44446 44261 44258 44444"
## [27] "PMC9612055 zip/Supplemental_Tables_S1-S3/Supplementary_Table_S3-RNA-seq_SAHA_DEGs-.xlsx Hsapiens 1 44448"
## [28] "PMC9612055 zip/Supplemental_Tables_S1-S3/Supplementary_Table_S3-RNA-seq_SAHA_DEGs-.xlsx Hsapiens 17 44258 44443 44257 44263 44264 44448 44262 44440 44445 44446 44450 44442 44444 44447 44260 44441 44261"
## [29] "PMC9612055 zip/Supplemental_Tables_S1-S3/Supplementary_Table_S3-RNA-seq_SAHA_DEGs-.xlsx Hsapiens 3 44258 44443 44257"
## [30] "PMC9612055 zip/Supplemental_Tables_S1-S3/Supplementary_Table_S3-RNA-seq_SAHA_DEGs-.xlsx Hsapiens 1 44448"
## [31] "PMC9612055 zip/Supplemental_Tables_S1-S3/Supplementary_Table_S3-RNA-seq_SAHA_DEGs-.xlsx Hsapiens 17 44257 44443 44258 44440 44448 44264 44262 44263 44450 44445 44447 44261 44444 44446 44441 44260 44442"
## [32] "PMC9612055 zip/Supplemental_Tables_S1-S3/Supplementary_Table_S3-RNA-seq_SAHA_DEGs-.xlsx Hsapiens 3 44257 44443 44258"
## [33] "PMC9612055 zip/Supplemental_Tables_S1-S3/Supplementary_Table_S3-RNA-seq_SAHA_DEGs-.xlsx Hsapiens 2 44440 44448"
## [34] "PMC9612055 zip/Supplemental_Tables_S1-S3/Supplementary_Table_S1-RNA-seq_EPH_DEGs-.xlsx Hsapiens 17 43898 44085 44081 44076 43896 43893 44077 44082 44080 43895 44079 44075 44083 43892 44078 43899 43897"
## [35] "PMC9612055 zip/Supplemental_Tables_S1-S3/Supplementary_Table_S1-RNA-seq_EPH_DEGs-.xlsx Hsapiens 17 43893 44077 44079 44083 44080 43896 43897 44085 44075 43895 43892 43899 44082 44078 44076 43898 44081"
## [36] "PMC9612055 zip/Supplemental_Tables_S1-S3/Supplementary_Table_S1-RNA-seq_EPH_DEGs-.xlsx Hsapiens 3 43893 44079 44083"
## [37] "PMC9612055 zip/Supplemental_Tables_S1-S3/Supplementary_Table_S1-RNA-seq_EPH_DEGs-.xlsx Hsapiens 17 43898 43897 44081 44083 43895 43896 44079 43892 43893 44080 43899 44077 44075 44076 44078 44082 44085"
## [38] "PMC9612055 zip/Supplemental_Tables_S1-S3/Supplementary_Table_S1-RNA-seq_EPH_DEGs-.xlsx Hsapiens 5 43893 44083 44079 43897 44081"
## [39] "PMC9612055 zip/Supplemental_Tables_S1-S3/Supplementary_Table_S1-RNA-seq_EPH_DEGs-.xlsx Hsapiens 1 43898"
## [40] "PMC9597788 zip/Table_S3.xlsx Ggallus 21 44262 44259 44256 44258 44260 44262 44262 44262 44263 44263 44264 44258 44257 44262 44262 44262 44262 44265 44261 44266 44262"
## [41] "PMC9576059 PMC_DL/PMC9576059/supplementaryfiles/pone.0272368.s003.xlsx Hsapiens 2 37865 37865"
## [42] "PMC9571470 PMC_DL/PMC9571470/supplementaryfiles/12885_2022_10155_MOESM1_ESM.xlsx Hsapiens 26 44531 44265 44266 44256 44257 44258 44259 44260 44261 44262 44263 44264 44454 44449 44450 44451 44453 44440 44441 44442 44443 44444 44445 44446 44447 44448"
## [43] "PMC9575684 PMC_DL/PMC9575684/supplementaryfiles/peerj-10-14166-s006.xlsx Athaliana 4 44654 44837 44654 44654"
## [44] "PMC9568502 PMC_DL/PMC9568502/supplementaryfiles/41598_2022_20939_MOESM13_ESM.xlsx Hsapiens 74 43891 43891 43894 43897 43894 43894 43894 43897 43897 44084 44076 43894 43894 44084 44084 43891 43891 43891 43891 43891 43891 43891 43891 43896 43896 44081 44166 44166 43895 43898 43898 43899 44086 44086 44086 44086 44086 44086 44086 44086 44086 44086 44078 44083 44083 43900 44078 44083 44083 44083 44083 44083 44083 44083 44083 44083 43900 44083 44083 44083 44083 44083 44083 44083 44083 44083 44083 43892 43892 44077 44077 44080 44080 44080"
## [45] "PMC9568502 PMC_DL/PMC9568502/supplementaryfiles/41598_2022_20939_MOESM12_ESM.xlsx Hsapiens 68 43891 43892 43891 43891 43891 43892 43891 44089 43894 43897 43897 43894 43894 43894 43894 43894 43897 43894 43894 44084 44084 43891 43891 44085 44085 44085 44085 43891 43891 44085 43901 43893 43896 43893 43893 44081 44166 44166 43899 43899 44086 44086 44086 44086 44086 44086 44086 44083 44083 44083 44083 44083 44083 44083 44083 44083 44083 43900 44083 44083 44083 44083 44083 44083 44083 43892 43892 44077"
## [46] "PMC9596731 PMC_DL/PMC9596731/supplementaryfiles/mmc2.xlsx Mmusculus 5 44627 44815 44808 44812 44813"
## [47] "PMC9605867 PMC_DL/PMC9605867/supplementaryfiles/41586_2022_5275_MOESM3_ESM.xlsx Hsapiens 19 42248 38047 40787 36951 38777 39326 41883 37226 37226 37226 37226 39508 38412 38231 40238 40238 40238 40057 37316"
## [48] "PMC9605867 PMC_DL/PMC9605867/supplementaryfiles/41586_2022_5275_MOESM3_ESM.xlsx Hsapiens 14 40238 40238 40238 40238 39508 39508 39873 41153 37500 37500 37500 37500 37500 37500"
## [49] "PMC9605867 PMC_DL/PMC9605867/supplementaryfiles/41586_2022_5275_MOESM3_ESM.xlsx Hsapiens 12 42248 40787 36951 41883 37226 37226 38412 38231 40238 40238 40057 37316"
## [50] "PMC9605867 PMC_DL/PMC9605867/supplementaryfiles/41586_2022_5275_MOESM3_ESM.xlsx Hsapiens 2 37500 37226"
## [51] "PMC9605867 PMC_DL/PMC9605867/supplementaryfiles/41586_2022_5275_MOESM3_ESM.xlsx Hsapiens 1 37226"
## [52] "PMC9584372 PMC_DL/PMC9584372/supplementaryfiles/pbio.3001543.s011.xlsx Hsapiens 7 43717 43715 43528 43714 43710 43713 43532"
## [53] "PMC9550834 PMC_DL/PMC9550834/supplementaryfiles/41467_2022_33359_MOESM29_ESM.xlsx Hsapiens 1 4-Mar"
## [54] "PMC9550834 PMC_DL/PMC9550834/supplementaryfiles/41467_2022_33359_MOESM29_ESM.xlsx Hsapiens 1 4-Mar"
## [55] "PMC9550834 PMC_DL/PMC9550834/supplementaryfiles/41467_2022_33359_MOESM9_ESM.xlsx Hsapiens 1 42988"
## [56] "PMC9570186 zip/Table_S1.xlsx Ggallus 8 44257 44258 44449 44257 44442 44256 44443 44265"
## [57] "PMC9570186 zip/Table_S1.xlsx Ggallus 7 44257 44258 44449 44264 44443 44442 44440"
## [58] "PMC9570186 zip/Table_S1.xlsx Ggallus 11 44447 44258 44257 44449 44257 44444 44264 44442 44265 44440 44256"
## [59] "PMC9570186 zip/Table_S1.xlsx Hsapiens 4 44442 44258 44257 44449"
## [60] "PMC9570186 zip/Table_S15.xlsx Ggallus 4 44622 44808 44623 44623"
## [61] "PMC9548106 PMC_DL/PMC9548106/supplementaryfiles/12935_2022_2728_MOESM2_ESM.xlsx Hsapiens 4 44531 44448 44448 44257"
## [62] "PMC9548106 PMC_DL/PMC9548106/supplementaryfiles/12935_2022_2728_MOESM2_ESM.xlsx Hsapiens 5 44257 44257 44260 44260 44453"
## [63] "PMC9585224 PMC_DL/PMC9585224/supplementaryfiles/Table_2.xls Hsapiens 1 2022/12/01"
## [64] "PMC9547065 PMC_DL/PMC9547065/supplementaryfiles/41467_2022_33427_MOESM6_ESM.xlsx Hsapiens 7 43160 43160 43167 43167 43347 43346 43346"
## [65] "PMC9547055 PMC_DL/PMC9547055/supplementaryfiles/41467_2022_33544_MOESM7_ESM.xlsx Hsapiens 1 37865"
## [66] "PMC9547055 PMC_DL/PMC9547055/supplementaryfiles/41467_2022_33544_MOESM6_ESM.xlsx Hsapiens 1 37865"
## [67] "PMC9546890 PMC_DL/PMC9546890/supplementaryfiles/41598_2022_21003_MOESM2_ESM.xlsx Hsapiens 26 44450 44442 44260 44446 44256 44257 44266 44256 44447 44451 44448 44443 44265 44453 44449 44531 44262 44258 44259 44257 44263 44454 44440 44440 44441 44261"
## [68] "PMC9544324 zip/sciadv.abn5535_file_s2.xlsx Hsapiens 2 44261 44260"
## [69] "PMC9586874 PMC_DL/PMC9586874/supplementaryfiles/41556_2022_996_MOESM3_ESM.xlsx Mmusculus 2 37316 37316"
## [70] "PMC9586874 PMC_DL/PMC9586874/supplementaryfiles/41556_2022_996_MOESM3_ESM.xlsx Mmusculus 3 38412 38412 37316"
## [71] "PMC9586874 PMC_DL/PMC9586874/supplementaryfiles/41556_2022_996_MOESM3_ESM.xlsx Mmusculus 1 38777"
## [72] "PMC9582937 zip/Table_S13.xlsx Athaliana 1 44836"
## [73] "PMC9561262 zip/Supplementary_Data_S4.xlsx Mmusculus 11 44623 44813 44623 44812 44813 44623 44622 44626 44630 44628 44811"
## [74] "PMC9585938 PMC_DL/PMC9585938/supplementaryfiles/Table1.XLSX Hsapiens 11 40057 40787 37865 38596 40057 37500 37865 40057 37500 40057 39326"
## [75] "PMC9585938 PMC_DL/PMC9585938/supplementaryfiles/Table1.XLSX Hsapiens 35 37500 39326 37226 40057 40057 38961 40057 37865 38231 38231 40057 38596 40057 37226 39692 40787 37135 41883 38231 40057 40422 38961 40057 37226 40057 40057 40057 38231 40057 40057 40057 39692 38961 40057 40787"
## [76] "PMC9579367 PMC_DL/PMC9579367/supplementaryfiles/Table_1.xlsx Hsapiens 2 44812 44810"
## [77] "PMC9579347 PMC_DL/PMC9579347/supplementaryfiles/DataSheet1.xlsx Ggallus 1 10-Mar"
## [78] "PMC9579347 PMC_DL/PMC9579347/supplementaryfiles/DataSheet1.xlsx Hsapiens 1 44621"
## [79] "PMC9579318 PMC_DL/PMC9579318/supplementaryfiles/Table_3.xlsx Hsapiens 1 44260"
## [80] "PMC9579318 PMC_DL/PMC9579318/supplementaryfiles/Table_3.xlsx Hsapiens 1 44260"
## [81] "PMC9558508 zip/animals-1918231-supplementary/Supplementary_Materials/Supplementary_Tables/Table_S2.xlsx Hsapiens 14 44813 44621 44627 44630 44627 44621 44630 44630 44630 44813 44813 44627 44630 44813"
## [82] "PMC9558508 zip/animals-1918231-supplementary/Supplementary_Materials/Supplementary_Tables/Table_S2.xlsx Hsapiens 14 44813 44621 44630 44813 44630 44813 44630 44630 44630 44621 44813 44627 44627 44627"
## [83] "PMC9558508 zip/animals-1918231-supplementary/Supplementary_Materials/Supplementary_Tables/Table_S2.xlsx Hsapiens 14 44813 44630 44627 44630 44621 44621 44813 44627 44630 44630 44627 44813 44813 44630"
## [84] "PMC9558508 zip/animals-1918231-supplementary/Supplementary_Materials/Supplementary_Tables/Table_S2.xlsx Hsapiens 14 44621 44813 44627 44630 44627 44621 44630 44630 44630 44813 44813 44627 44630 44813"
## [85] "PMC9558508 zip/animals-1918231-supplementary/Supplementary_Materials/Supplementary_Tables/Table_S2.xlsx Hsapiens 11 44630 44813 44630 44627 44627 44813 44621 44621 44630 44630 44627"
## [86] "PMC9558508 zip/animals-1918231-supplementary/Supplementary_Materials/Supplementary_Tables/Table_S2.xlsx Hsapiens 14 44630 44627 44813 44630 44630 44813 44621 44621 44627 44627 44630 44813 44630 44813"
## [87] "PMC9558508 zip/animals-1918231-supplementary/Supplementary_Materials/Supplementary_Tables/Table_S2.xlsx Hsapiens 14 44627 44630 44630 44627 44630 44627 44621 44813 44813 44621 44630 44630 44813 44813"
## [88] "PMC9558508 zip/animals-1918231-supplementary/Supplementary_Materials/Supplementary_Tables/Table_S2.xlsx Hsapiens 14 44627 44813 44621 44630 44621 44627 44630 44630 44630 44813 44813 44627 44813 44630"
## [89] "PMC9558508 zip/animals-1918231-supplementary/Supplementary_Materials/Supplementary_Tables/Table_S2.xlsx Hsapiens 14 44627 44627 44813 44621 44630 44621 44813 44627 44630 44813 44630 44630 44630 44813"
## [90] "PMC9558508 zip/animals-1918231-supplementary/Supplementary_Materials/Supplementary_Tables/Table_S2.xlsx Hsapiens 14 44627 44621 44630 44813 44630 44627 44813 44630 44630 44630 44813 44621 44627 44813"
## [91] "PMC9535881 PMC_DL/PMC9535881/supplementaryfiles/12915_2022_1410_MOESM3_ESM.xlsx Mmusculus 1 38231"
## [92] "PMC9535881 PMC_DL/PMC9535881/supplementaryfiles/12915_2022_1410_MOESM2_ESM.xlsx Mmusculus 2 39692 38231"
## [93] "PMC9535881 PMC_DL/PMC9535881/supplementaryfiles/12915_2022_1410_MOESM6_ESM.xlsx Mmusculus 1 38231"
## [94] "PMC9534852 PMC_DL/PMC9534852/supplementaryfiles/41598_2022_20572_MOESM4_ESM.xlsx Hsapiens 1 44442"
## [95] "PMC9532393 PMC_DL/PMC9532393/supplementaryfiles/41467_2022_33558_MOESM4_ESM.xlsx Hsapiens 8 43526 43712 43526 43714 43723 43717 43525 43718"
## [96] "PMC9532393 PMC_DL/PMC9532393/supplementaryfiles/41467_2022_33558_MOESM5_ESM.xlsx Hsapiens 5 43711 43533 43525 43713 43525"
## [97] "PMC9532393 PMC_DL/PMC9532393/supplementaryfiles/41467_2022_33558_MOESM6_ESM.xlsx Hsapiens 1 43167"
## [98] "PMC9532393 PMC_DL/PMC9532393/supplementaryfiles/41467_2022_33558_MOESM7_ESM.xlsx Hsapiens 2 43891 43896"
## [99] "PMC9532393 PMC_DL/PMC9532393/supplementaryfiles/41467_2022_33558_MOESM7_ESM.xlsx Hsapiens 3 43891 43898 43893"
## [100] "PMC9531459 PMC_DL/PMC9531459/supplementaryfiles/41065_2022_252_MOESM3_ESM.xlsx Hsapiens 1 44626"
## [101] "PMC9574330 PMC_DL/PMC9574330/supplementaryfiles/Table_3.xls Hsapiens 2 2021/03/05 2021/09/15"
## [102] "PMC9561881 PMC_DL/PMC9561881/supplementaryfiles/Table8.xlsx Ggallus 2 44621 44809"
## [103] "PMC9561266 zip/Supplementary_data_final_180822_2_BDM_jgb_21092022.xlsx Athaliana 1 36982"
## [104] "PMC9531352 PMC_DL/PMC9531352/supplementaryfiles/12864_2022_8914_MOESM4_ESM.xlsx Athaliana 1 44440"
## [105] "PMC9578707 PMC_DL/PMC9578707/supplementaryfiles/elife-81398-fig2-data4.xlsx Hsapiens 24 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622"
## [106] "PMC9578707 PMC_DL/PMC9578707/supplementaryfiles/elife-81398-fig2-data4.xlsx Hsapiens 24 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622"
## [107] "PMC9578707 PMC_DL/PMC9578707/supplementaryfiles/elife-81398-fig2-data4.xlsx Hsapiens 24 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622"
## [108] "PMC9578707 PMC_DL/PMC9578707/supplementaryfiles/elife-81398-fig2-data4.xlsx Hsapiens 24 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622"
## [109] "PMC9578707 PMC_DL/PMC9578707/supplementaryfiles/elife-81398-fig2-data4.xlsx Hsapiens 24 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622"
## [110] "PMC9578707 PMC_DL/PMC9578707/supplementaryfiles/elife-81398-fig2-data4.xlsx Hsapiens 24 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622"
## [111] "PMC9578707 PMC_DL/PMC9578707/supplementaryfiles/elife-81398-fig2-data4.xlsx Hsapiens 24 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622"
## [112] "PMC9578707 PMC_DL/PMC9578707/supplementaryfiles/elife-81398-fig2-data4.xlsx Hsapiens 24 44623 44626 44631 44624 44628 44629 44621 44622 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622"
## [113] "PMC9578707 PMC_DL/PMC9578707/supplementaryfiles/elife-81398-fig2-data4.xlsx Hsapiens 24 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622"
## [114] "PMC9578707 PMC_DL/PMC9578707/supplementaryfiles/elife-81398-fig2-data4.xlsx Hsapiens 24 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622"
## [115] "PMC9578707 PMC_DL/PMC9578707/supplementaryfiles/elife-81398-fig2-data4.xlsx Hsapiens 24 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622"
## [116] "PMC9578707 PMC_DL/PMC9578707/supplementaryfiles/elife-81398-fig2-data4.xlsx Hsapiens 24 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622"
## [117] "PMC9578707 PMC_DL/PMC9578707/supplementaryfiles/elife-81398-fig2-data4.xlsx Hsapiens 24 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622"
## [118] "PMC9578707 PMC_DL/PMC9578707/supplementaryfiles/elife-81398-fig2-data4.xlsx Hsapiens 24 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622"
## [119] "PMC9578707 PMC_DL/PMC9578707/supplementaryfiles/elife-81398-fig2-data4.xlsx Hsapiens 24 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622"
## [120] "PMC9578707 PMC_DL/PMC9578707/supplementaryfiles/elife-81398-fig2-data4.xlsx Hsapiens 24 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622"
## [121] "PMC9578707 PMC_DL/PMC9578707/supplementaryfiles/elife-81398-fig2-data4.xlsx Hsapiens 8 44623 44631 44626 44629 44628 44621 44624 44622"
## [122] "PMC9578707 PMC_DL/PMC9578707/supplementaryfiles/elife-81398-fig2-data4.xlsx Hsapiens 24 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622"
## [123] "PMC9578707 PMC_DL/PMC9578707/supplementaryfiles/elife-81398-fig2-data4.xlsx Hsapiens 24 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622"
## [124] "PMC9578707 PMC_DL/PMC9578707/supplementaryfiles/elife-81398-fig2-data4.xlsx Hsapiens 24 44629 44628 44622 44621 44631 44623 44624 44626 44623 44631 44626 44629 44628 44621 44624 44622 44623 44631 44626 44629 44628 44621 44624 44622"
## [125] "PMC9561824 PMC_DL/PMC9561824/supplementaryfiles/Table2.XLS Hsapiens 16 2022/09/06 2022/03/08 2022/09/01 2022/03/01 2022/09/02 2022/09/09 2022/03/02 2022/03/03 2022/09/07 2022/09/08 2022/09/04 2022/03/05 2022/09/11 2022/03/06 2022/03/02 2022/03/07"
## [126] "PMC9546626 PMC_DL/PMC9546626/supplementaryfiles/pnas.2208844119.sd01.xlsx Hsapiens 26 44809 44631 44818 44816 44819 44630 44808 44621 44626 44629 44627 44807 44624 44813 44896 44622 44805 44625 44806 44811 44623 44628 44812 44814 44810 44815"
## [127] "PMC9524707 PMC_DL/PMC9524707/supplementaryfiles/pone.0275226.s008.xlsx Hsapiens 21 44624 44815 44814 44622 44806 44811 44626 44631 44813 44622 44628 44807 44812 44627 44623 44621 44810 44625 44629 44805 44808"
## [128] "PMC9523919 PMC_DL/PMC9523919/supplementaryfiles/12967_2022_3651_MOESM2_ESM.xlsx Hsapiens 1 44078"
## [129] "PMC9570865 zip/Table_S8._Salt-inducible_DEGs_in_the_shoot_apex.xlsx Athaliana 2 43923 43924"
## [130] "PMC9570865 zip/Table_S8._Salt-inducible_DEGs_in_the_shoot_apex.xlsx Athaliana 2 43923 43924"
## [131] "PMC9570865 zip/Table_S8._Salt-inducible_DEGs_in_the_shoot_apex.xlsx Athaliana 2 43923 43924"
## [132] "PMC9518689 PMC_DL/PMC9518689/supplementaryfiles/oncotarget-13-28277-s002.xls Hsapiens 1 44166"
## [133] "PMC9518689 PMC_DL/PMC9518689/supplementaryfiles/oncotarget-13-28277-s003.xls Hsapiens 1 44896"
## [134] "PMC9518689 PMC_DL/PMC9518689/supplementaryfiles/oncotarget-13-28277-s004.xls Hsapiens 1 44896"
## [135] "PMC9518689 PMC_DL/PMC9518689/supplementaryfiles/oncotarget-13-28277-s004.xls Hsapiens 1 44896"
## [136] "PMC9552960 PMC_DL/PMC9552960/supplementaryfiles/Table_2.xlsx Mmusculus 6 37865 37865 36951 36951 36951 36951"
## [137] "PMC9552960 PMC_DL/PMC9552960/supplementaryfiles/Table_2.xlsx Mmusculus 2 37681 37681"
## [138] "PMC9551187 PMC_DL/PMC9551187/supplementaryfiles/Table_1.xlsx Hsapiens 2 44442 44256"
## [139] "PMC9550876 PMC_DL/PMC9550876/supplementaryfiles/Table_1.xlsx Hsapiens 4 44257 44258 44444 44261"
## [140] "PMC9549141 PMC_DL/PMC9549141/supplementaryfiles/Table_4.xlsx Hsapiens 2 1-Mar 2-Mar"
## [141] "PMC9549141 PMC_DL/PMC9549141/supplementaryfiles/Table_4.xlsx Hsapiens 2 1-Mar 2-Mar"
## [142] "PMC9549141 PMC_DL/PMC9549141/supplementaryfiles/Table_4.xlsx Hsapiens 2 1-Mar 2-Mar"
## [143] "PMC9549141 PMC_DL/PMC9549141/supplementaryfiles/Table_4.xlsx Hsapiens 2 1-Mar 2-Mar"
## [144] "PMC9549141 PMC_DL/PMC9549141/supplementaryfiles/Table_4.xlsx Hsapiens 2 1-Mar 2-Mar"
## [145] "PMC9549141 PMC_DL/PMC9549141/supplementaryfiles/Table_4.xlsx Hsapiens 2 2-Mar 1-Mar"
## [146] "PMC9549141 PMC_DL/PMC9549141/supplementaryfiles/Table_6.xlsx Ggallus 2 2-Mar 1-Mar"
## [147] "PMC9549141 PMC_DL/PMC9549141/supplementaryfiles/Table_6.xlsx Hsapiens 2 1-Mar 2-Mar"
## [148] "PMC9549141 PMC_DL/PMC9549141/supplementaryfiles/Table_6.xlsx Hsapiens 2 2-Mar 1-Mar"
## [149] "PMC9549141 PMC_DL/PMC9549141/supplementaryfiles/Table_6.xlsx Ggallus 2 2-Mar 1-Mar"
## [150] "PMC9549141 PMC_DL/PMC9549141/supplementaryfiles/Table_7.xlsx Hsapiens 1 44621"
## [151] "PMC9549141 PMC_DL/PMC9549141/supplementaryfiles/Table_5.xlsx Hsapiens 2 1-Mar 2-Mar"
## [152] "PMC9549141 PMC_DL/PMC9549141/supplementaryfiles/Table_5.xlsx Hsapiens 2 1-Mar 2-Mar"
## [153] "PMC9549141 PMC_DL/PMC9549141/supplementaryfiles/Table_5.xlsx Hsapiens 2 1-Mar 2-Mar"
## [154] "PMC9549141 PMC_DL/PMC9549141/supplementaryfiles/Table_5.xlsx Hsapiens 2 1-Mar 2-Mar"
## [155] "PMC9549141 PMC_DL/PMC9549141/supplementaryfiles/Table_5.xlsx Hsapiens 2 1-Mar 2-Mar"
## [156] "PMC9549141 PMC_DL/PMC9549141/supplementaryfiles/Table_5.xlsx Hsapiens 2 1-Mar 2-Mar"
## [157] "PMC9549113 PMC_DL/PMC9549113/supplementaryfiles/Table3.XLSX Hsapiens 4 15612 60961 13250 13246"
## [158] "PMC9549113 PMC_DL/PMC9549113/supplementaryfiles/Table3.XLSX Hsapiens 4 15612 60961 13250 13246"
## [159] "PMC9549113 PMC_DL/PMC9549113/supplementaryfiles/Table4.XLSX Hsapiens 2 44808 44806"
## [160] "PMC9549113 PMC_DL/PMC9549113/supplementaryfiles/Table4.XLSX Hsapiens 2 44808 44806"
## [161] "PMC9549113 PMC_DL/PMC9549113/supplementaryfiles/Table4.XLSX Hsapiens 2 44808 44806"
## [162] "PMC9548632 PMC_DL/PMC9548632/supplementaryfiles/DataSheet2.xlsx Hsapiens 17 44810 44621 44622 44813 44808 44623 44812 44626 44896 44814 44815 44627 44625 44628 44819 44806 44811"
## [163] "PMC9548593 PMC_DL/PMC9548593/supplementaryfiles/Table_1.xlsx Hsapiens 13 44819 44621 44896 44627 44625 44623 44622 44626 44624 44631 44629 44628 44630"
## [164] "PMC9548527 PMC_DL/PMC9548527/supplementaryfiles/DataSheet2.XLS Hsapiens 23 2022/09/04 2022/09/07 2022/09/11 2022/03/07 2022/09/07 2022/09/03 2022/09/05 2022/09/06 2022/03/07 2022/03/08 2022/03/08 2022/03/08 2022/03/08 2022/03/06 2022/03/02 2022/03/02 2022/03/03 2022/09/06 2022/03/03 2022/09/05 2022/03/08 2022/09/06 2022/03/08"
## [165] "PMC9550651 PMC_DL/PMC9550651/supplementaryfiles/mmc2.xlsx Mmusculus 84 44445 44444 44262 44262 44442 44449 44263 44263 44263 44450 44263 44263 44445 44258 44258 44260 44450 44263 44263 44449 44449 44261 44261 44449 44260 44260 44441 44262 44260 44257 44261 44261 44257 44263 44447 44262 44262 44261 44446 44448 44447 44444 44258 44258 44260 44257 44262 44262 44447 44258 44441 44257 44262 44262 44262 44446 44263 44263 44257 44262 44257 44263 44441 44257 44450 44450 44450 44256 44441 44257 44262 44263 44262 44441 44450 44263 44450 44448 44441 44257 44262 44263 44257 44265"
## [166] "PMC9550651 PMC_DL/PMC9550651/supplementaryfiles/mmc2.xlsx Mmusculus 22 44263 44263 44263 44257 44257 44450 44263 44257 44262 44262 44260 44263 44263 44258 44262 44262 44257 44262 44257 44258 44260 44263"
## [167] "PMC9550651 PMC_DL/PMC9550651/supplementaryfiles/mmc2.xlsx Mmusculus 3 44450 44256 44262"
## [168] "PMC9550651 PMC_DL/PMC9550651/supplementaryfiles/mmc2.xlsx Mmusculus 9 44443 44447 44262 44262 44260 44444 44257 44257 44450"
## [169] "PMC9550651 PMC_DL/PMC9550651/supplementaryfiles/mmc2.xlsx Mmusculus 6 44443 44443 44443 44448 44446 44440"
## [170] "PMC9550651 PMC_DL/PMC9550651/supplementaryfiles/mmc3.xlsx Mmusculus 1 44445"
## [171] "PMC9550651 PMC_DL/PMC9550651/supplementaryfiles/mmc3.xlsx Mmusculus 1 44445"
## [172] "PMC9558549 zip/cancers-1768880_SupplementaryFiles/cancers-1768880_SupplementaryTables.xlsx Hsapiens 3 44813 44811 44806"
## [173] "PMC9558549 zip/cancers-1768880_SupplementaryFiles/cancers-1768880_SupplementaryTables.xlsx Hsapiens 6 44624 44624 44624 44624 44624 44809"
## [174] "PMC9558549 zip/cancers-1768880_SupplementaryFiles/cancers-1768880_SupplementaryTables.xlsx Hsapiens 5 44622 44622 44627 44810 44813"
## [175] "PMC9558549 zip/cancers-1768880_SupplementaryFiles/cancers-1768880_SupplementaryTables.xlsx Hsapiens 4 44805 44621 44808 44624"
## [176] "PMC9558549 zip/cancers-1768880_SupplementaryFiles/cancers-1768880_SupplementaryTables.xlsx Ggallus 2 44444 44444"
## [177] "PMC9532972 zip/Table_1.XLS Hsapiens 1 2021/03/02"
## [178] "PMC9532972 zip/Table_1.XLS Hsapiens 33 2020/03/01 2020/03/11 2020/03/11 2020/03/07 2020/09/09 2020/09/02 2020/03/06 2020/03/02 2020/09/09 2020/03/05 2020/03/08 2020/09/09 2020/03/10 2020/03/02 2020/09/09 2020/03/02 2020/09/05 2020/09/10 2020/03/11 2020/12/01 2020/03/10 2020/09/04 2020/03/01 2020/03/10 2020/03/01 2020/03/11 2020/09/12 2020/03/04 2020/03/04 2020/12/01 2020/12/01 2020/12/01 2020/12/01"
## [179] "PMC9529142 PMC_DL/PMC9529142/supplementaryfiles/pgen.1010416.s009.xls Mmusculus 1 44807"
## [180] "PMC9534044 PMC_DL/PMC9534044/supplementaryfiles/NIHMS1837784-supplement-5.xlsx Hsapiens 6 43535 43719 43718 43714 43526 43712"
## [181] "PMC9531689 PMC_DL/PMC9531689/supplementaryfiles/Table_2.XLSX Hsapiens 1 43349"
## [182] "PMC9531163 PMC_DL/PMC9531163/supplementaryfiles/Table1.xlsx Hsapiens 12 44622 44623 44624 44625 44626 44627 44628 44629 44630 44631 44819 44896"
## [183] "PMC9531163 PMC_DL/PMC9531163/supplementaryfiles/Table1.xlsx Hsapiens 8 44623 44624 44625 44627 44628 44629 44630 44631"
## [184] "PMC9530606 PMC_DL/PMC9530606/supplementaryfiles/Table_4.xlsx Hsapiens 12 44628 44623 44622 44621 44626 44621 44628 44629 44627 44622 44625 44624"
## [185] "PMC9530606 PMC_DL/PMC9530606/supplementaryfiles/Table_4.xlsx Hsapiens 12 44628 44623 44622 44621 44626 44621 44628 44629 44627 44622 44625 44624"
## [186] "PMC9530606 PMC_DL/PMC9530606/supplementaryfiles/Table_3.xlsx Hsapiens 12 44623 44627 44624 44626 44621 44628 44622 44621 44628 44622 44625 44629"
## [187] "PMC9530606 PMC_DL/PMC9530606/supplementaryfiles/Table_3.xlsx Hsapiens 12 44623 44627 44624 44626 44621 44628 44622 44621 44628 44622 44625 44629"
## [188] "PMC9530606 PMC_DL/PMC9530606/supplementaryfiles/Table_2.xlsx Hsapiens 12 44627 44626 44625 44628 44622 44622 44628 44624 44621 44623 44621 44629"
## [189] "PMC9530606 PMC_DL/PMC9530606/supplementaryfiles/Table_2.xlsx Hsapiens 12 44627 44626 44625 44628 44622 44622 44628 44624 44621 44623 44621 44629"
## [190] "PMC9530606 PMC_DL/PMC9530606/supplementaryfiles/Table_6.xlsx Hsapiens 12 44626 44628 44623 44624 44621 44621 44627 44622 44622 44629 44625 44628"
## [191] "PMC9530606 PMC_DL/PMC9530606/supplementaryfiles/Table_6.xlsx Hsapiens 12 44626 44628 44623 44624 44621 44621 44627 44622 44622 44629 44625 44628"
## [192] "PMC9530606 PMC_DL/PMC9530606/supplementaryfiles/Table_1.xlsx Hsapiens 12 44628 44623 44622 44621 44626 44624 44629 44622 44628 44627 44625 44621"
## [193] "PMC9530606 PMC_DL/PMC9530606/supplementaryfiles/Table_1.xlsx Hsapiens 12 44628 44623 44622 44621 44626 44624 44629 44622 44628 44627 44625 44621"
## [194] "PMC9530606 PMC_DL/PMC9530606/supplementaryfiles/Table_5.xlsx Hsapiens 12 44621 44629 44627 44622 44625 44622 44624 44626 44623 44621 44628 44628"
## [195] "PMC9530606 PMC_DL/PMC9530606/supplementaryfiles/Table_5.xlsx Hsapiens 12 44621 44629 44627 44622 44625 44622 44624 44626 44623 44621 44628 44628"
## [196] "PMC9530462 PMC_DL/PMC9530462/supplementaryfiles/Table3.XLSX Hsapiens 2 44630 44813"
## [197] "PMC9530462 PMC_DL/PMC9530462/supplementaryfiles/Table3.XLSX Hsapiens 1 44630"
## [198] "PMC9530347 PMC_DL/PMC9530347/supplementaryfiles/Table_1.xlsx Hsapiens 16 39326 37500 39873 37316 40787 42248 39692 39142 38961 38777 40422 40057 37316 40057 40422 39326"
## [199] "PMC9530347 PMC_DL/PMC9530347/supplementaryfiles/Table_1.xlsx Hsapiens 1 42248"
## [200] "PMC9482892 PMC_DL/PMC9482892/supplementaryfiles/13619_2022_131_MOESM1_ESM.xlsx Mmusculus 4 44805 44812 44623 44809"
## [201] "PMC9524857 PMC_DL/PMC9524857/supplementaryfiles/Table_1.xlsx Hsapiens 9 44628 44626 44622 44627 44625 44621 44629 44819 44623"
## [202] "PMC9523398 PMC_DL/PMC9523398/supplementaryfiles/mmc2.xlsx Hsapiens 2 44622 44621"
## [203] "PMC9523398 PMC_DL/PMC9523398/supplementaryfiles/mmc3.xlsx Hsapiens 3 44621 44896 44622"
## [204] "PMC9523360 PMC_DL/PMC9523360/supplementaryfiles/DataSheet_1.xlsx Hsapiens 1 44806"
## [205] "PMC9508470 PMC_DL/PMC9508470/supplementaryfiles/mmc4.xlsx Hsapiens 2 44811 44813"
## [206] "PMC9508470 PMC_DL/PMC9508470/supplementaryfiles/mmc3.xlsx Hsapiens 15 44815 44811 44806 44819 44623 44808 44812 44811 44806 44819 44626 44814 44621 44622 44815"
## [207] "PMC9560165 PMC_DL/PMC9560165/supplementaryfiles/elife-81755-fig5-data5.xlsx Hsapiens 2 36951 38412"
## [208] "PMC9560165 PMC_DL/PMC9560165/supplementaryfiles/elife-81755-fig5-data4.xlsx Hsapiens 1 36951"
## [209] "PMC9483806 PMC_DL/PMC9483806/supplementaryfiles/mmc1.xlsx Hsapiens 25 2022-03-02 2022-09-10 2022-09-04 2022-03-05 2022-09-09 2022-03-02 2022-03-06 2022-09-06 2022-09-08 2022-09-03 2022-03-08 2022-03-09 2022-03-10 2022-09-02 2022-09-15 2022-12-01 2022-09-01 2022-09-05 2022-03-04 2022-09-07 2022-03-07 2022-03-01 2022-03-03 2022-03-01 2022-09-11"
## [210] "PMC9483806 PMC_DL/PMC9483806/supplementaryfiles/mmc1.xlsx Hsapiens 13 2022-09-10 2022-09-04 2022-03-09 2022-09-09 2022-03-02 2022-03-05 2022-03-08 2022-09-03 2022-03-06 2022-09-07 2022-03-01 2022-03-10 2022-09-11"
## [211] "PMC9483806 PMC_DL/PMC9483806/supplementaryfiles/mmc1.xlsx Hsapiens 17 2022-09-11 2022-03-01 2022-09-03 2022-09-02 2022-09-04 2022-03-08 2022-03-05 2022-09-08 2022-03-09 2022-03-01 2022-03-10 2022-09-10 2022-03-02 2022-09-07 2022-09-09 2022-03-07 2022-09-05"
## [212] "PMC9483806 PMC_DL/PMC9483806/supplementaryfiles/mmc2.xlsx Hsapiens 1 2022-03-04"
## [213] "PMC9483806 PMC_DL/PMC9483806/supplementaryfiles/mmc2.xlsx Hsapiens 1 2022-03-04"
## [214] "PMC9512470 PMC_DL/PMC9512470/supplementaryfiles/HEP4-6-2950-s002.xlsx Hsapiens 2 44447 44441"
## [215] "PMC9494239 PMC_DL/PMC9494239/supplementaryfiles/mmc6.xlsx Hsapiens 1 37316"
## [216] "PMC9613476 PMC_DL/PMC9613476/supplementaryfiles/41375_2022_1671_MOESM2_ESM.xlsx Hsapiens 3 36951 36951 36951"
## [217] "PMC9584034 PMC_DL/PMC9584034/supplementaryfiles/NIHMS1829515-supplement-5.xlsx Hsapiens 31 44624 44621 44621 44621 44621 44621 44621 44626 44626 44623 44623 44623 44623 44811 44896 44628 44625 44630 44630 44813 44813 44813 44813 44813 44809 44809 44809 44809 44809 44809 44807"
## [218] "PMC9584034 PMC_DL/PMC9584034/supplementaryfiles/NIHMS1829515-supplement-5.xlsx Hsapiens 68 44621 44621 44624 44624 44624 44624 44621 44621 44621 44621 44621 44621 44621 44631 44631 44631 44631 44631 44631 44631 44631 44631 44631 44631 44631 44631 44631 44631 44631 44631 44631 44631 44631 44631 44631 44631 44631 44631 44623 44623 44623 44623 44623 44623 44623 44812 44812 44811 44896 44896 44896 44896 44896 44896 44628 44628 44628 44628 44630 44630 44630 44630 44813 44813 44813 44809 44807 44807"
## [219] "PMC9584034 PMC_DL/PMC9584034/supplementaryfiles/NIHMS1829515-supplement-5.xlsx Hsapiens 36 44622 44621 44624 44624 44624 44621 44621 44621 44621 44621 44621 44621 44621 44621 44626 44626 44626 44631 44623 44623 44812 44811 44896 44896 44628 44625 44630 44630 44630 44630 44813 44813 44813 44813 44813 44813"
## [220] "PMC9584034 PMC_DL/PMC9584034/supplementaryfiles/NIHMS1829515-supplement-5.xlsx Hsapiens 46 44621 44624 44815 44815 44815 44815 44623 44623 44812 44812 44812 44812 44812 44811 44811 44818 44896 44896 44896 44808 44630 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44622 44622 44807"
## [221] "PMC9584034 PMC_DL/PMC9584034/supplementaryfiles/NIHMS1829515-supplement-5.xlsx Hsapiens 4 44624 44813 44813 44622"
## [222] "PMC9584034 PMC_DL/PMC9584034/supplementaryfiles/NIHMS1829515-supplement-5.xlsx Ggallus 11 44621 44621 44621 44621 44621 44621 44623 44623 44629 44629 44813"
## [223] "PMC9584034 PMC_DL/PMC9584034/supplementaryfiles/NIHMS1829515-supplement-5.xlsx Hsapiens 29 44621 44624 44624 44624 44624 44621 44621 44621 44631 44631 44631 44631 44631 44631 44631 44631 44623 44630 44813 44813 44813 44813 44813 44813 44813 44813 44809 44809 44807"
## [224] "PMC9584034 PMC_DL/PMC9584034/supplementaryfiles/NIHMS1829515-supplement-5.xlsx Hsapiens 18 44621 44815 44815 44621 44621 44631 44631 44631 44623 44623 44625 44630 44813 44813 44813 44809 44809 44807"
## [225] "PMC9584034 PMC_DL/PMC9584034/supplementaryfiles/NIHMS1829515-supplement-5.xlsx Hsapiens 30 44814 44814 44621 44621 44621 44621 44621 44621 44631 44631 44631 44631 44631 44631 44631 44623 44811 44896 44896 44625 44630 44630 44630 44630 44630 44813 44813 44813 44809 44809"
## [226] "PMC9584034 PMC_DL/PMC9584034/supplementaryfiles/NIHMS1829515-supplement-5.xlsx Ggallus 20 44621 44621 44626 44626 44623 44811 44628 44628 44628 44628 44628 44628 44625 44625 44625 44625 44808 44808 44808 44630"
## [227] "PMC9584034 PMC_DL/PMC9584034/supplementaryfiles/NIHMS1829515-supplement-5.xlsx Ggallus 47 44621 44627 44627 44624 44624 44624 44624 44624 44624 44815 44815 44815 44815 44621 44621 44621 44621 44626 44631 44631 44631 44631 44623 44623 44623 44623 44623 44623 44623 44896 44896 44896 44896 44896 44896 44896 44628 44630 44630 44630 44630 44813 44813 44809 44809 44809 44809"
## [228] "PMC9584034 PMC_DL/PMC9584034/supplementaryfiles/NIHMS1829515-supplement-5.xlsx Hsapiens 81 44624 44624 44815 44815 44621 44621 44621 44621 44621 44621 44631 44631 44631 44631 44631 44631 44631 44631 44631 44631 44623 44623 44623 44623 44623 44623 44623 44623 44623 44812 44812 44812 44812 44812 44812 44812 44812 44812 44812 44812 44812 44812 44812 44812 44812 44812 44812 44812 44812 44811 44811 44811 44811 44811 44811 44811 44811 44811 44811 44811 44811 44811 44628 44628 44625 44808 44808 44808 44630 44630 44813 44809 44809 44809 44809 44809 44809 44809 44810 44810 44810"
## [229] "PMC9581494 PMC_DL/PMC9581494/supplementaryfiles/NIHMS1836573-supplement-12.xlsx Hsapiens 1 44265"
## [230] "PMC9545373 PMC_DL/PMC9545373/supplementaryfiles/AGE-53-709-s001.xlsx Hsapiens 10 LOC100686643-SEPT14 41883 41883 41883 41883 41883 41883 41883 41883 41883"
## [231] "PMC9546057 PMC_DL/PMC9546057/supplementaryfiles/MEC-31-4332-s001.xlsx Dmelanogaster 1 38231"
## [232] "PMC9559112 PMC_DL/PMC9559112/supplementaryfiles/mmc2.xlsx Hsapiens 23 44445 44257 44256 44266 44443 44260 44262 44259 44454 44265 44450 44258 44448 44261 44447 44446 44449 44451 44531 44263 44441 44440 44264"
## [233] "PMC9546433 PMC_DL/PMC9546433/supplementaryfiles/AJMG-189-151-s001.xlsx Hsapiens 27 42248 37316 36951 40422 39142 38047 37500 40787 36951 38777 40603 37681 39692 39326 41518 37226 39508 38412 39873 37135 38231 40238 40057 37316 38596 37865 38961"
## [234] "PMC9546433 PMC_DL/PMC9546433/supplementaryfiles/AJMG-189-151-s001.xlsx Hsapiens 27 42248 37316 36951 40422 39142 38047 37500 40787 36951 38777 40603 37681 39692 39326 41518 41883 39508 38412 39873 37135 38231 40238 40057 37316 38596 37865 38961"
## [235] "PMC9546433 PMC_DL/PMC9546433/supplementaryfiles/AJMG-189-151-s001.xlsx Hsapiens 25 36951 37316 36951 40603 37316 37681 38047 38412 38777 39142 39508 39873 42248 37135 40422 40787 41518 37500 37865 38231 38596 38961 39326 39692 40057"
## [236] "PMC9546433 PMC_DL/PMC9546433/supplementaryfiles/AJMG-189-151-s001.xlsx Hsapiens 2 37316 38047"
## [237] "PMC9546433 PMC_DL/PMC9546433/supplementaryfiles/AJMG-189-151-s001.xlsx Hsapiens 2 38047 37316"
## [238] "PMC9596373 PMC_DL/PMC9596373/supplementaryfiles/41379_2022_1112_MOESM2_ESM.xlsx Hsapiens 1 43723"
## [239] "PMC9545044 PMC_DL/PMC9545044/supplementaryfiles/APM-130-524-s002.xlsx Hsapiens 14 40422 37500 40787 39692 39326 41883 37226 41153 37135 38231 40057 38596 37865 38961"
## [240] "PMC9545044 PMC_DL/PMC9545044/supplementaryfiles/APM-130-524-s002.xlsx Hsapiens 8 40422 40787 39692 37226 41153 37135 38231 37865"
## [241] "PMC9545044 PMC_DL/PMC9545044/supplementaryfiles/APM-130-524-s002.xlsx Hsapiens 8 40422 40787 39692 37226 41153 37135 38231 37865"
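Many of the values listed above are five-digit integers rather than recognisable dates. These look like Excel date serial numbers, the day counts a spreadsheet stores once a gene name has been converted to a date. As a minimal sketch (not part of the original pipeline), they can be turned back into calendar dates in base R. This assumes the workbooks use Excel's default 1900 date system, for which the conventional R origin is 1899-12-30. The helper name `excel_serial_to_date` is hypothetical.

```r
# Hypothetical helper: convert Excel date serials to R Date objects.
# Assumes the 1900 date system (origin 1899-12-30 in R).
excel_serial_to_date <- function(serial) {
  as.Date(as.numeric(serial), origin = "1899-12-30")
}

# Serials copied from the listing above
serials <- c("44621", "44805", "44896")
dates <- excel_serial_to_date(serials)
data.frame(serial = serials, date = format(dates, "%Y-%m-%d"))
```

Under that assumption, 44621, 44805 and 44896 correspond to 1 March, 1 September and 1 December 2022, the same kind of value that appears elsewhere in the listing as `1-Mar` or `2022/09/01`.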
Let’s investigate the errors in more detail.
# By species
SPECIES <- sapply(strsplit(ERROR_GENELISTS," "),"[[",3)
table(SPECIES)
## SPECIES
## Athaliana Dmelanogaster Ggallus Hsapiens Mmusculus
## 7 1 13 186 32
## Scerevisiae
## 2
par(mar=c(5,12,4,2))
barplot(table(SPECIES),horiz=TRUE,las=1)
par(mar=c(5,5,4,2))
# Number of affected Excel files per paper
DIST <- table(sapply(strsplit(ERROR_GENELISTS," "),"[[",1))
DIST
##
## PMC9482892 PMC9483806 PMC9494239 PMC9508470 PMC9512470 PMC9518689 PMC9523360
## 1 5 1 2 1 4 1
## PMC9523398 PMC9523919 PMC9524707 PMC9524857 PMC9529142 PMC9530347 PMC9530462
## 2 1 1 1 1 2 2
## PMC9530606 PMC9531163 PMC9531352 PMC9531459 PMC9531689 PMC9532393 PMC9532972
## 12 2 1 1 1 5 2
## PMC9534044 PMC9534852 PMC9535881 PMC9544324 PMC9545044 PMC9545373 PMC9546057
## 1 1 3 1 3 1 1
## PMC9546433 PMC9546626 PMC9546890 PMC9547055 PMC9547065 PMC9548106 PMC9548527
## 5 1 1 2 1 2 1
## PMC9548593 PMC9548632 PMC9549113 PMC9549141 PMC9550651 PMC9550834 PMC9550876
## 1 1 5 17 7 3 1
## PMC9551187 PMC9552960 PMC9558508 PMC9558549 PMC9559112 PMC9560165 PMC9561262
## 1 2 10 5 1 2 1
## PMC9561266 PMC9561824 PMC9561881 PMC9568502 PMC9570186 PMC9570865 PMC9571470
## 1 1 1 2 5 3 1
## PMC9574330 PMC9575684 PMC9576059 PMC9578707 PMC9579318 PMC9579347 PMC9579367
## 1 1 1 20 2 2 1
## PMC9581494 PMC9581932 PMC9582937 PMC9584034 PMC9584372 PMC9584885 PMC9584949
## 1 2 1 12 1 1 1
## PMC9584951 PMC9585224 PMC9585938 PMC9586478 PMC9586874 PMC9596373 PMC9596731
## 1 1 2 1 3 1 1
## PMC9597788 PMC9605692 PMC9605867 PMC9606253 PMC9612055 PMC9613476 PMC9616860
## 1 1 5 8 14 1 5
## PMC9617049 PMC9617877 PMC9617917
## 3 1 1
summary(as.numeric(DIST))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 1.00 1.00 2.77 3.00 20.00
hist(DIST,main="Number of affected Excel files per paper")
# PMC Articles with the most errors
DIST_DF <- as.data.frame(DIST)
DIST_DF <- DIST_DF[order(-DIST_DF$Freq),,drop=FALSE]
head(DIST_DF,20)
## Var1 Freq
## 60 PMC9578707 20
## 39 PMC9549141 17
## 82 PMC9612055 14
## 15 PMC9530606 12
## 67 PMC9584034 12
## 45 PMC9558508 10
## 81 PMC9606253 8
## 40 PMC9550651 7
## 2 PMC9483806 5
## 20 PMC9532393 5
## 29 PMC9546433 5
## 38 PMC9549113 5
## 46 PMC9558549 5
## 54 PMC9570186 5
## 80 PMC9605867 5
## 84 PMC9616860 5
## 6 PMC9518689 4
## 24 PMC9535881 3
## 26 PMC9545044 3
## 41 PMC9550834 3
MOST_ERR_FILES = as.character(DIST_DF[1,1])
MOST_ERR_FILES
## [1] "PMC9578707"
# Number of errors per paper
NERR <- as.numeric(sapply(strsplit(ERROR_GENELISTS," "),"[[",4))
names(NERR) <- sapply(strsplit(ERROR_GENELISTS," "),"[[",1)
NERR <-tapply(NERR, names(NERR), sum)
NERR
## PMC9482892 PMC9483806 PMC9494239 PMC9508470 PMC9512470 PMC9518689 PMC9523360
## 4 57 1 17 2 4 1
## PMC9523398 PMC9523919 PMC9524707 PMC9524857 PMC9529142 PMC9530347 PMC9530462
## 5 1 21 9 1 17 3
## PMC9530606 PMC9531163 PMC9531352 PMC9531459 PMC9531689 PMC9532393 PMC9532972
## 144 20 1 1 1 19 34
## PMC9534044 PMC9534852 PMC9535881 PMC9544324 PMC9545044 PMC9545373 PMC9546057
## 6 1 4 2 30 10 1
## PMC9546433 PMC9546626 PMC9546890 PMC9547055 PMC9547065 PMC9548106 PMC9548527
## 83 26 26 2 7 9 23
## PMC9548593 PMC9548632 PMC9549113 PMC9549141 PMC9550651 PMC9550834 PMC9550876
## 13 17 14 33 126 3 4
## PMC9551187 PMC9552960 PMC9558508 PMC9558549 PMC9559112 PMC9560165 PMC9561262
## 2 8 137 20 23 3 11
## PMC9561266 PMC9561824 PMC9561881 PMC9568502 PMC9570186 PMC9570865 PMC9571470
## 1 16 2 142 34 6 26
## PMC9574330 PMC9575684 PMC9576059 PMC9578707 PMC9579318 PMC9579347 PMC9579367
## 2 4 2 464 2 2 2
## PMC9581494 PMC9581932 PMC9582937 PMC9584034 PMC9584372 PMC9584885 PMC9584949
## 1 2 1 421 7 360 14
## PMC9584951 PMC9585224 PMC9585938 PMC9586478 PMC9586874 PMC9596373 PMC9596731
## 3 1 46 16 6 1 5
## PMC9597788 PMC9605692 PMC9605867 PMC9606253 PMC9612055 PMC9613476 PMC9616860
## 21 7 48 229 121 3 82
## PMC9617049 PMC9617877 PMC9617917
## 187 3 8
hist(NERR,main="Number of errors per PMC article")
NERR_DF <- as.data.frame(NERR)
NERR_DF <- NERR_DF[order(-NERR_DF$NERR),,drop=FALSE]
head(NERR_DF,20)
## NERR
## PMC9578707 464
## PMC9584034 421
## PMC9584885 360
## PMC9606253 229
## PMC9617049 187
## PMC9530606 144
## PMC9568502 142
## PMC9558508 137
## PMC9550651 126
## PMC9612055 121
## PMC9546433 83
## PMC9616860 82
## PMC9483806 57
## PMC9605867 48
## PMC9585938 46
## PMC9532972 34
## PMC9570186 34
## PMC9549141 33
## PMC9545044 30
## PMC9546626 26
MOST_ERR = rownames(NERR_DF)[1]
MOST_ERR
## [1] "PMC9578707"
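The two rankings above are derived from the same space-delimited records: field 1 is the PMC ID, field 2 the file path, field 3 the species, field 4 the number of converted names, and the remaining fields are the converted values. As a sketch of how the two tallies relate, the following parses four records copied from the listing into a data frame. `records` and `parse_error_record` are illustrative names, not objects from the pipeline, and the sketch assumes file paths contain no spaces.

```r
records <- c(
  "PMC9530462 PMC_DL/PMC9530462/supplementaryfiles/Table3.XLSX Hsapiens 2 44630 44813",
  "PMC9530462 PMC_DL/PMC9530462/supplementaryfiles/Table3.XLSX Hsapiens 1 44630",
  "PMC9523398 PMC_DL/PMC9523398/supplementaryfiles/mmc2.xlsx Hsapiens 2 44622 44621",
  "PMC9523398 PMC_DL/PMC9523398/supplementaryfiles/mmc3.xlsx Hsapiens 3 44621 44896 44622"
)

# Split one record into its named fields
parse_error_record <- function(rec) {
  f <- strsplit(rec, " ", fixed = TRUE)[[1]]
  data.frame(pmc = f[1], file = f[2], species = f[3],
             n_errors = as.numeric(f[4]),
             values = paste(f[-(1:4)], collapse = ";"),
             stringsAsFactors = FALSE)
}

parsed <- do.call(rbind, lapply(records, parse_error_record))

# Rows per paper (what DIST counts) versus summed errors per paper (what NERR counts)
rows_per_paper <- table(parsed$pmc)
errors_per_paper <- tapply(parsed$n_errors, parsed$pmc, sum)
```

For these records the row counts are 2 and 2 and the error totals are 5 and 3, matching the DIST and NERR values reported above for PMC9523398 and PMC9530462.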
GENELIST_ERROR_ARTICLES <- gsub("PMC","",GENELIST_ERROR_ARTICLES)
# JSON parsing is more reliable than XML
ARTICLES <- esummary( GENELIST_ERROR_ARTICLES , db="pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLES,as= "parsed")
ARTICLE_DATA <- ARTICLE_DATA$result
ARTICLE_DATA <- ARTICLE_DATA[2:length(ARTICLE_DATA)]
JOURNALS <- unlist(lapply(ARTICLE_DATA,function(x) {x$fulljournalname} ))
JOURNALS_TABLE <- table(JOURNALS)
JOURNALS_TABLE <- JOURNALS_TABLE[order(-JOURNALS_TABLE)]
length(JOURNALS_TABLE)
## [1] 48
NUM_JOURNALS=length(JOURNALS_TABLE)
par(mar=c(5,25,4,2))
barplot(head(JOURNALS_TABLE,10), horiz=TRUE, las=1,
xlab="Articles with gene name errors in supp files",
main="Top journals this month")
Congrats to our Journal of the Month winner!
JOURNAL_WINNER <- names(head(JOURNALS_TABLE,1))
JOURNAL_WINNER
## [1] "Nature Communications"
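The journal tally relies on every parsed summary record carrying a `fulljournalname` field. A defensive variant, sketched here on a mock list rather than a live E-utilities response, substitutes `NA` when the field is absent so that an incomplete record does not silently shorten the vector. The `mock_summaries` object, its placeholder IDs and the `get_journal` helper are hypothetical.

```r
# Mock stand-in for the parsed esummary result (structure assumed, values illustrative)
mock_summaries <- list(
  "9578707" = list(uid = "9578707", fulljournalname = "eLife"),
  "uid_a"   = list(uid = "uid_a", fulljournalname = "Nature Communications"),
  "uid_b"   = list(uid = "uid_b", fulljournalname = "Nature Communications"),
  "uid_c"   = list(uid = "uid_c")  # record with no journal field
)

# Return the journal name, or NA when the record lacks one
get_journal <- function(x) {
  if (is.null(x$fulljournalname)) NA_character_ else x$fulljournalname
}

journals <- vapply(mock_summaries, get_journal, character(1))

# table() drops NA by default, so incomplete records are not counted
journal_counts <- sort(table(journals), decreasing = TRUE)
top_journal <- names(journal_counts)[1]
```

Using `vapply` with `character(1)` also makes the call fail loudly if a record ever returns something other than a single string.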
There are two categories:
Paper with the most supplementary files affected by gene name errors (MOST_ERR_FILES)
Paper with the most gene names converted to dates (MOST_ERR)
Sometimes, one paper can win both categories. Congrats to our winners.
MOST_ERR_FILES <- gsub("PMC","",MOST_ERR_FILES)
ARTICLES <- esummary( MOST_ERR_FILES , db="pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLES,as= "parsed")
ARTICLE_DATA <- ARTICLE_DATA[2]
ARTICLE_DATA
## $result
## $result$uids
## [1] "9578707"
##
## $result$`9578707`
## $result$`9578707`$uid
## [1] "9578707"
##
## $result$`9578707`$pubdate
## [1] "2022 Sep 30"
##
## $result$`9578707`$epubdate
## [1] "2022 Sep 30"
##
## $result$`9578707`$printpubdate
## [1] ""
##
## $result$`9578707`$source
## [1] "eLife"
##
## $result$`9578707`$authors
## name authtype
## 1 Siepe DH Author
## 2 Henneberg LT Author
## 3 Wilson SC Author
## 4 Hess GT Author
## 5 Bassik MC Author
## 6 Zinn K Author
## 7 Garcia KC Author
##
## $result$`9578707`$title
## [1] "Identification of orphan ligand-receptor relationships using a cell-based CRISPRa enrichment screening platform"
##
## $result$`9578707`$volume
## [1] "11"
##
## $result$`9578707`$issue
## [1] ""
##
## $result$`9578707`$pages
## [1] "e81398"
##
## $result$`9578707`$articleids
## idtype value
## 1 pmid 36178190
## 2 doi 10.7554/eLife.81398
## 3 pmcid PMC9578707
##
## $result$`9578707`$fulljournalname
## [1] "eLife"
##
## $result$`9578707`$sortdate
## [1] "2022/09/30 00:00"
##
## $result$`9578707`$pmclivedate
## [1] "2022/10/19"
MOST_ERR <- gsub("PMC","",MOST_ERR)
ARTICLE_DATA <- esummary(MOST_ERR,db = "pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLE_DATA,as= "parsed")
ARTICLE_DATA
## $header
## $header$type
## [1] "esummary"
##
## $header$version
## [1] "0.3"
##
##
## $result
## $result$uids
## [1] "9578707"
##
## $result$`9578707`
## $result$`9578707`$uid
## [1] "9578707"
##
## $result$`9578707`$pubdate
## [1] "2022 Sep 30"
##
## $result$`9578707`$epubdate
## [1] "2022 Sep 30"
##
## $result$`9578707`$printpubdate
## [1] ""
##
## $result$`9578707`$source
## [1] "eLife"
##
## $result$`9578707`$authors
## name authtype
## 1 Siepe DH Author
## 2 Henneberg LT Author
## 3 Wilson SC Author
## 4 Hess GT Author
## 5 Bassik MC Author
## 6 Zinn K Author
## 7 Garcia KC Author
##
## $result$`9578707`$title
## [1] "Identification of orphan ligand-receptor relationships using a cell-based CRISPRa enrichment screening platform"
##
## $result$`9578707`$volume
## [1] "11"
##
## $result$`9578707`$issue
## [1] ""
##
## $result$`9578707`$pages
## [1] "e81398"
##
## $result$`9578707`$articleids
## idtype value
## 1 pmid 36178190
## 2 doi 10.7554/eLife.81398
## 3 pmcid PMC9578707
##
## $result$`9578707`$fulljournalname
## [1] "eLife"
##
## $result$`9578707`$sortdate
## [1] "2022/09/30 00:00"
##
## $result$`9578707`$pmclivedate
## [1] "2022/10/19"
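The full summary record is verbose. As a sketch, a compact citation line can be assembled from a few of its fields. The `record` object below is a hand-built stand-in that reuses the values printed above, not a live `esummary()` result, and the variable names are illustrative.

```r
# Hand-built stand-in for one parsed summary record (values copied from the output above)
record <- list(
  title = "Identification of orphan ligand-receptor relationships using a cell-based CRISPRa enrichment screening platform",
  fulljournalname = "eLife",
  pubdate = "2022 Sep 30",
  articleids = data.frame(
    idtype = c("pmid", "doi", "pmcid"),
    value = c("36178190", "10.7554/eLife.81398", "PMC9578707"),
    stringsAsFactors = FALSE
  )
)

# Look up the DOI by its idtype rather than by row position
doi <- record$articleids$value[record$articleids$idtype == "doi"]

citation <- sprintf("%s. %s (%s). https://doi.org/%s",
                    record$title, record$fulljournalname, record$pubdate, doi)
citation
```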
Next, we plot the trend over the past 6-12 months, using the reports already published online.
url <- "https://ziemann-lab.net/public/gene_name_errors/"
# Fetch the directory listing and keep only the links to HTML reports
listing <- htmlParse( getURL(url, ftp.use.epsv = FALSE, dirlistonly = TRUE) )
listing <- xpathSApply(listing, "//a/@href")
listing <- listing[grep("html",listing)]
unlink("online_files/",recursive=TRUE)
dir.create("online_files")
sapply(listing, function(mylink) {
download.file(paste(url,mylink,sep=""),destfile=paste("online_files/",mylink,sep=""))
} )
## href href href href href href href href href href href href href href href href
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## href href href href
## 0 0 0 0
myfilelist <- list.files("online_files/",full.names=TRUE)
trends <- sapply(myfilelist, function(myfilename) {
x <- readLines(myfilename)
# Num XL gene list articles
NUM_GENELIST_ARTICLES <- x[grep("NUM_GENELIST_ARTICLES",x)[3]+1]
NUM_GENELIST_ARTICLES <- sapply(strsplit(NUM_GENELIST_ARTICLES," "),"[[",3)
NUM_GENELIST_ARTICLES <- sapply(strsplit(NUM_GENELIST_ARTICLES,"<"),"[[",1)
NUM_GENELIST_ARTICLES <- as.numeric(NUM_GENELIST_ARTICLES)
# number of affected articles
NUM_ERROR_GENELIST_ARTICLES <- x[grep("NUM_ERROR_GENELIST_ARTICLES",x)[3]+1]
NUM_ERROR_GENELIST_ARTICLES <- sapply(strsplit(NUM_ERROR_GENELIST_ARTICLES," "),"[[",3)
NUM_ERROR_GENELIST_ARTICLES <- sapply(strsplit(NUM_ERROR_GENELIST_ARTICLES,"<"),"[[",1)
NUM_ERROR_GENELIST_ARTICLES <- as.numeric(NUM_ERROR_GENELIST_ARTICLES)
# Error proportion
ERROR_PROPORTION <- x[grep("ERROR_PROPORTION",x)[3]+1]
ERROR_PROPORTION <- sapply(strsplit(ERROR_PROPORTION," "),"[[",3)
ERROR_PROPORTION <- sapply(strsplit(ERROR_PROPORTION,"<"),"[[",1)
ERROR_PROPORTION <- as.numeric(ERROR_PROPORTION)
# number of journals
NUM_JOURNALS <- x[grep('JOURNALS_TABLE',x)[3]+1]
NUM_JOURNALS <- sapply(strsplit(NUM_JOURNALS," "),"[[",3)
NUM_JOURNALS <- sapply(strsplit(NUM_JOURNALS,"<"),"[[",1)
NUM_JOURNALS <- as.numeric(NUM_JOURNALS)
res <- c(NUM_GENELIST_ARTICLES,NUM_ERROR_GENELIST_ARTICLES,ERROR_PROPORTION,NUM_JOURNALS)
return(res)
})
colnames(trends) <- sapply(strsplit(colnames(trends),"_"),"[[",3)
colnames(trends) <- gsub("\\.html$","",colnames(trends))
trends <- as.data.frame(trends)
rownames(trends) <- c("NUM_GENELIST_ARTICLES","NUM_ERROR_GENELIST_ARTICLES","ERROR_PROPORTION","NUM_JOURNALS")
trends <- t(trends)
trends <- as.data.frame(trends)
CURRENT_RES <- c(NUM_GENELIST_ARTICLES,NUM_ERROR_GENELIST_ARTICLES,ERROR_PROPORTION,NUM_JOURNALS)
trends <- rbind(trends,CURRENT_RES)
paste(CURRENT_YEAR,CURRENT_MONTH,sep="-")
## [1] "2022-11"
rownames(trends)[nrow(trends)] <- paste(CURRENT_YEAR,CURRENT_MONTH,sep="-")
plot(trends$NUM_GENELIST_ARTICLES, xaxt = "n" , type="b" , main="Number of articles with Excel gene lists per month",
ylab="number of articles", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
plot(trends$NUM_ERROR_GENELIST_ARTICLES, xaxt = "n" , type="b" , main="Number of articles with gene name errors per month",
ylab="number of articles", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
plot(trends$ERROR_PROPORTION, xaxt = "n" , type="b" , main="Proportion of articles with Excel gene lists affected by errors",
ylab="proportion", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
plot(trends$NUM_JOURNALS, xaxt = "n" , type="b" , main="Number of journals with affected articles",
ylab="number of journals", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
unlink("online_files/",recursive=TRUE)
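The trend extraction above repeats the same four-step pattern for each statistic: find the third line that mentions the variable name, take the line after it, keep its third space-delimited token, and cut that token at the first `<`. As a sketch, the pattern can be factored into one helper that returns `NA` when a report does not match. The helper name `extract_report_value`, its `occurrence` argument and the mock lines are assumptions for illustration; the markup of the real reports may differ.

```r
# Hypothetical helper mirroring the grep/strsplit pattern used above
extract_report_value <- function(lines, varname, occurrence = 3) {
  hits <- grep(varname, lines, fixed = TRUE)
  if (length(hits) < occurrence || hits[occurrence] >= length(lines)) {
    return(NA_real_)  # report does not contain the expected pattern
  }
  value_line <- lines[hits[occurrence] + 1]
  token <- strsplit(value_line, " ", fixed = TRUE)[[1]]
  if (length(token) < 3) return(NA_real_)
  token <- strsplit(token[3], "<", fixed = TRUE)[[1]][1]
  suppressWarnings(as.numeric(token))
}

# Mock lines imitating part of a rendered report (layout assumed)
mock_lines <- c(
  "JOURNALS_TABLE <- table(JOURNALS)",
  "JOURNALS_TABLE <- JOURNALS_TABLE[order(-JOURNALS_TABLE)]",
  "length(JOURNALS_TABLE)",
  "## [1] 48</code></pre>"
)

num_journals <- extract_report_value(mock_lines, "JOURNALS_TABLE")
missing_stat <- extract_report_value(mock_lines, "ERROR_PROPORTION")
```

Returning `NA` instead of raising an error lets a single malformed report show up as a gap in the trend plot rather than stopping the whole run.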
Zeeberg, B.R., Riss, J., Kane, D.W. et al. Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics. BMC Bioinformatics 5, 80 (2004). https://doi.org/10.1186/1471-2105-5-80
Ziemann, M., Eren, Y. & El-Osta, A. Gene name errors are widespread in the scientific literature. Genome Biol 17, 177 (2016). https://doi.org/10.1186/s13059-016-1044-7
sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##
## locale:
## [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
## [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
## [7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] RCurl_1.98-1.9 readxl_1.4.1 reutils_0.2.3 xml2_1.3.3 jsonlite_1.8.3
## [6] XML_3.99-0.12
##
## loaded via a namespace (and not attached):
## [1] knitr_1.40 magrittr_2.0.3 R6_2.5.1 rlang_1.0.6
## [5] fastmap_1.1.0 stringr_1.4.1 highr_0.9 tools_4.2.1
## [9] xfun_0.34 cli_3.4.1 jquerylib_0.1.4 htmltools_0.5.3
## [13] yaml_2.3.6 digest_0.6.30 assertthat_0.2.1 sass_0.4.2
## [17] bitops_1.0-7 cachem_1.0.6 evaluate_0.17 rmarkdown_2.17
## [21] stringi_1.7.8 compiler_4.2.1 bslib_0.4.0 cellranger_1.1.0