Source: https://github.com/markziemann/GeneNameErrors2020
View the reports: http://ziemann-lab.net/public/gene_name_errors/
Gene name errors result when data are imported improperly into MS Excel and other spreadsheet programs (Zeeberg et al, 2004). Gene names such as MARCH3, SEPT2 and DEC1 are automatically converted into dates. These errors are surprisingly common in supplementary data files in the field of genomics (Ziemann et al, 2016). This could be considered a small error because it affects only a small number of genes; however, it is symptomatic of poor data processing methods. The purpose of this script is to identify gene name errors in the supplementary files of PubMed Central articles published in the previous month.
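To make the conversion concrete: Excel typically displays SEPT2 as "2-Sep" and MARCH1 as "1-Mar", while storing the underlying value as a date serial number. The function below is a hypothetical, minimal sketch of how such values could be flagged in a column of gene symbols. It is not the detection method used by gene_names.sh, and the name looks_like_date() is made up for illustration. Because it flags any five-digit number, it would give false positives on genuinely numeric columns.

```r
# Hypothetical helper, NOT the method used by gene_names.sh:
# flags "day-month" strings as displayed by Excel (e.g. "2-Sep") and
# five-digit integers that may be Excel date serial numbers.
looks_like_date <- function(x) {
  months <- "(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
  day_month <- grepl(paste0("^[0-9]{1,2}-", months, "$"), x)
  serial <- grepl("^[0-9]{5}$", x)
  day_month | serial
}

looks_like_date(c("TP53", "2-Sep", "1-Mar", "1-Dec", "44075", "BRCA1"))
# [1] FALSE  TRUE  TRUE  TRUE  TRUE FALSE
```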
library("XML")
library("jsonlite")
library("xml2")
library("reutils")
library("readxl")
Here I retrieve the PubMed Central IDs of articles from the previous month.
Start by working out which year and month to search PubMed Central for.
# Work out the year and month of the previous calendar month
CURRENT_MONTH <- as.numeric(format(Sys.time(), "%m"))
CURRENT_YEAR <- as.numeric(format(Sys.time(), "%Y"))
if (CURRENT_MONTH == 1) {
  PREV_YEAR <- CURRENT_YEAR - 1
  PREV_MONTH <- 12
} else {
  PREV_YEAR <- CURRENT_YEAR
  PREV_MONTH <- CURRENT_MONTH - 1
}
# Zero-pad the month so the query date is always YYYY/MM (e.g. "2021/09")
DATE <- sprintf("%d/%02d", PREV_YEAR, PREV_MONTH)
DATE
## [1] "2021/12"
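An alternative way to get the same YYYY/MM string, sketched here as a hypothetical helper prev_month() that is not part of the original script, is to step back one day from the first of the current month. The day before the 1st always falls in the previous month, so January needs no special case and the month is always zero-padded.

```r
# Hypothetical helper: "YYYY/MM" for the calendar month before date d.
# The day before the 1st of d's month always falls in the previous month.
prev_month <- function(d = Sys.Date()) {
  first_of_month <- as.Date(format(d, "%Y-%m-01"))
  format(first_of_month - 1, "%Y/%m")
}

prev_month(as.Date("2022-01-15"))  # "2021/12"
prev_month(as.Date("2021-10-03"))  # "2021/09"
```

Calling prev_month() with no argument uses today's date, so DATE <- prev_month() would replace the if/else block.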
Let’s see how many PMC articles from the past month have a word beginning with "genom" (e.g. genome, genomic, genomics) in the abstract.
QUERY ='((genom*[Abstract]))'
ESEARCH_RES <- esearch(term=QUERY, db = "pmc", rettype = "uilist", retmode = "xml", retstart = 0,
retmax = 5000000, usehistory = TRUE, webenv = NULL, querykey = NULL, sort = NULL, field = NULL,
datetype = NULL, reldate = NULL, mindate = DATE, maxdate = DATE)
pmc <- efetch(ESEARCH_RES,retmode="text",rettype="uilist",outfile="pmcids.txt")
## Retrieving UIDs 1 to 500
## Retrieving UIDs 501 to 1000
## Retrieving UIDs 1001 to 1500
## Retrieving UIDs 1501 to 2000
## Retrieving UIDs 2001 to 2500
## Retrieving UIDs 2501 to 3000
## Retrieving UIDs 3001 to 3500
pmc <- read.table(pmc)
pmc <- paste("PMC",pmc$V1,sep="")
NUM_ARTICLES=length(pmc)
NUM_ARTICLES
## [1] 3196
writeLines(pmc,con="pmc.txt")
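For reference, this search corresponds to a request against NCBI's E-utilities ESearch endpoint. The sketch below only builds the URL and sends nothing; the helper name esearch_url() is hypothetical, and it is not assumed to match the exact request that reutils constructs, since reutils also handles batching and the history server.

```r
# Hypothetical helper: build (but do not send) an ESearch request URL.
# Parameter names follow the NCBI E-utilities ESearch documentation.
esearch_url <- function(term, db = "pmc", mindate, maxdate) {
  base <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
  params <- c(db = db, term = term, mindate = mindate, maxdate = maxdate,
              usehistory = "y")
  # Percent-encode each value, including reserved characters such as "/" and "*"
  encoded <- vapply(params, utils::URLencode, character(1), reserved = TRUE)
  paste0(base, "?", paste0(names(params), "=", encoded, collapse = "&"))
}

esearch_url("((genom*[Abstract]))", mindate = "2021/12", maxdate = "2021/12")
```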
Now run the bash script gene_names.sh on the list of PMC IDs. Note that false positives can occur (~1.5%) and these results have not been verified by a human.
Here are some definitions:
NUM_XLS = Number of supplementary Excel files in this set of PMC articles.
NUM_XLS_ARTICLES = Number of articles matching the PubMed Central search which have supplementary Excel files.
GENELISTS = The gene lists found in the Excel files.
NUM_GENELISTS = The number of Excel files with gene lists. Each Excel file is counted once even if it has multiple gene lists.
NUM_GENELIST_ARTICLES = The number of PMC articles with supplementary Excel gene lists.
ERROR_GENELISTS = Gene lists suspected to contain gene name errors. The dates and five-digit numbers indicate transmogrified gene names.
NUM_ERROR_GENELISTS = Number of Excel gene lists with errors.
NUM_ERROR_GENELIST_ARTICLES = Number of articles with supplementary Excel gene name errors.
ERROR_PROPORTION = The proportion of articles with Excel gene lists that have errors.
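The counting rules defined above can be checked on a small made-up input. This sketch re-implements them as a hypothetical summarise_results() function. The line layout it assumes (PMC ID, file, then optionally species, and for suspected errors a count followed by the flagged values) is inferred from the printed output of this report, not from any documentation of gene_names.sh.

```r
# Hypothetical re-implementation of the counting rules, for illustration only.
# Assumed line layout: <PMC ID> <file> [<species> [<n> <flagged values...>]]
summarise_results <- function(results) {
  xls <- results[grep("XLS", results, ignore.case = TRUE)]
  fields <- strsplit(xls, " ")
  n_fields <- lengths(fields)
  article <- vapply(fields, "[[", character(1), 1)  # PMC ID
  file <- vapply(fields, "[[", character(1), 2)     # supplementary file path
  is_list <- n_fields > 2   # a species was assigned: counts as a gene list
  is_error <- n_fields > 3  # flagged values follow: suspected errors
  n_list_articles <- length(unique(article[is_list]))
  n_error_articles <- length(unique(article[is_error]))
  list(
    NUM_XLS = length(xls),
    NUM_XLS_ARTICLES = length(unique(article)),
    NUM_GENELISTS = length(unique(file[is_list])),
    NUM_GENELIST_ARTICLES = n_list_articles,
    NUM_ERROR_GENELISTS = sum(is_error),
    NUM_ERROR_GENELIST_ARTICLES = n_error_articles,
    ERROR_PROPORTION = n_error_articles / n_list_articles
  )
}

# Made-up example input (not real results)
toy <- c(
  "PMC0000001 a.xlsx",
  "PMC0000001 b.xlsx Hsapiens",
  "PMC0000002 c.xlsx Hsapiens 2 44075 43891",
  "PMC0000002 c.xlsx Hsapiens 1 44531",
  "PMC0000003 d.xls Mmusculus"
)
str(summarise_results(toy))
```

In this toy input, file c.xlsx contributes two gene lists but is counted once in NUM_GENELISTS, and 1 of the 3 articles with gene lists has flagged values, giving ERROR_PROPORTION = 1/3.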
system("./gene_names.sh pmc.txt")
results <- readLines("results.txt")
XLS <- results[grep("XLS",results,ignore.case=TRUE)]
NUM_XLS = length(XLS)
NUM_XLS
## [1] 3701
NUM_XLS_ARTICLES = length(unique(sapply(strsplit(XLS," "),"[[",1)))
NUM_XLS_ARTICLES
## [1] 724
GENELISTS <- XLS[lengths(strsplit(XLS," ")) > 2]
#GENELISTS
NUM_GENELISTS <- length(unique(sapply(strsplit(GENELISTS," "),"[[",2)))
NUM_GENELISTS
## [1] 553
NUM_GENELIST_ARTICLES <- length(unique(sapply(strsplit(GENELISTS," "),"[[",1)))
NUM_GENELIST_ARTICLES
## [1] 264
ERROR_GENELISTS <- XLS[lengths(strsplit(XLS," ")) > 3]
#ERROR_GENELISTS
NUM_ERROR_GENELISTS = length(ERROR_GENELISTS)
NUM_ERROR_GENELISTS
## [1] 196
GENELIST_ERROR_ARTICLES <- unique(sapply(strsplit(ERROR_GENELISTS," "),"[[",1))
GENELIST_ERROR_ARTICLES
## [1] "PMC8715921" "PMC8702551" "PMC8669026" "PMC8660810" "PMC8716048"
## [6] "PMC8695320" "PMC8690979" "PMC8712874" "PMC8712770" "PMC8671590"
## [11] "PMC8710427" "PMC8691574" "PMC8649620" "PMC8648877" "PMC8648735"
## [16] "PMC8703359" "PMC8672579" "PMC8675923" "PMC8675387" "PMC8690596"
## [21] "PMC8689144" "PMC8642546" "PMC8640014" "PMC8633298" "PMC8686545"
## [26] "PMC8684099" "PMC8634119" "PMC8674316" "PMC8649757" "PMC8626462"
## [31] "PMC8617305" "PMC8672115" "PMC8672055" "PMC8655865" "PMC8655011"
## [36] "PMC8669612" "PMC8667787" "PMC8667366" "PMC8666981" "PMC8665569"
## [41] "PMC8649439" "PMC8642897" "PMC8640463" "PMC8604731" "PMC8602413"
## [46] "PMC8663025" "PMC8656106" "PMC8638096" "PMC8612935" "PMC8657403"
## [51] "PMC8637571" "PMC8636031" "PMC8634728" "PMC8633104" "PMC8632647"
## [56] "PMC8630779" "PMC8627048" "PMC8611784" "PMC8651141" "PMC8650217"
## [61] "PMC8643507" "PMC8641155" "PMC8640099" "PMC8607334" "PMC8626588"
## [66] "PMC8635166" "PMC8635117" "PMC8633307" "PMC8632918" "PMC8631276"
NUM_ERROR_GENELIST_ARTICLES <- length(GENELIST_ERROR_ARTICLES)
NUM_ERROR_GENELIST_ARTICLES
## [1] 70
ERROR_PROPORTION = NUM_ERROR_GENELIST_ARTICLES / NUM_GENELIST_ARTICLES
ERROR_PROPORTION
## [1] 0.2651515
Below are the gene lists with suspected errors detected in the past month (uncomment #GENELISTS to print all of the gene lists as well). The dates are obvious errors; they are most commonly dates in September, March, December and October. The five-digit numbers are the same dates in Excel's internal encoding: the serial number of the day in Excel's default 1900 date system, in which 1 January 1900 is day 1. If you were to paste these numbers into Excel and format the cells as dates, most of them would also map to dates in September, March, December and October.
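The serial numbers can also be converted back to dates in R with as.Date(). Using the origin "1899-12-30" rather than "1900-01-01" compensates for Excel treating 1900 as a leap year and is correct for serials after 60 (1 March 1900 onwards), which covers every value in this report. The serials below are three first-of-month examples plus the two values reported for PMC8702551.

```r
# Excel (1900 date system) serial numbers; the last two come from
# "PMC8702551 ... Mmusculus 2 44257 44442" in the output below
serials <- c(43891, 44075, 44166, 44257, 44442)
dates <- as.Date(serials, origin = "1899-12-30")
dates
# [1] "2020-03-01" "2020-09-01" "2020-12-01" "2021-03-02" "2021-09-03"

# Tally the months the flagged values fall in
table(format(dates, "%m"))
```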
#GENELISTS
ERROR_GENELISTS
## [1] "PMC8715921 /pmc/articles/PMC8715921/bin/Data_Sheet_2.xlsx Hsapiens 5 43526 43534 43717 43717 43709"
## [2] "PMC8715921 /pmc/articles/PMC8715921/bin/Data_Sheet_2.xlsx Hsapiens 2 44440 44442"
## [3] "PMC8715921 /pmc/articles/PMC8715921/bin/Data_Sheet_2.xlsx Hsapiens 2 44442 44440"
## [4] "PMC8715921 /pmc/articles/PMC8715921/bin/Data_Sheet_2.xlsx Hsapiens 2 44440 44442"
## [5] "PMC8702551 /pmc/articles/PMC8702551/bin/41598_2021_3567_MOESM2_ESM.xlsx Mmusculus 2 44257 44442"
## [6] "PMC8669026 /pmc/articles/PMC8669026/bin/41398_2021_1756_MOESM1_ESM.xlsx Hsapiens 25 43893 43892 44075 44076 43898 43891 43893 43892 43897 44081 43893 44076 43896 44083 44078 44085 43891 43895 43891 43898 43892 43898 44076 44089 44075"
## [7] "PMC8669026 /pmc/articles/PMC8669026/bin/41398_2021_1756_MOESM1_ESM.xlsx Hsapiens 6 43894 44088 44088 44088 44088 44088"
## [8] "PMC8669026 /pmc/articles/PMC8669026/bin/41398_2021_1756_MOESM1_ESM.xlsx Hsapiens 18 43891 44076 44076 44076 44076 44076 43892 43892 44076 43892 43892 44076 44076 44076 44076 43891 43896 43898"
## [9] "PMC8669026 /pmc/articles/PMC8669026/bin/41398_2021_1756_MOESM1_ESM.xlsx Hsapiens 8 44075 43892 43896 44081 43892 44076 44076 43891"
## [10] "PMC8669026 /pmc/articles/PMC8669026/bin/41398_2021_1756_MOESM1_ESM.xlsx Hsapiens 3 43896 44085 43891"
## [11] "PMC8669026 /pmc/articles/PMC8669026/bin/41398_2021_1756_MOESM1_ESM.xlsx Hsapiens 15 43898 44076 44076 43898 43892 44076 43891 43896 43892 43892 44078 44076 44076 43891 43891"
## [12] "PMC8669026 /pmc/articles/PMC8669026/bin/41398_2021_1756_MOESM1_ESM.xlsx Hsapiens 6 43898 44088 43891 43896 43892 43891"
## [13] "PMC8669026 /pmc/articles/PMC8669026/bin/41398_2021_1756_MOESM1_ESM.xlsx Hsapiens 2 44166 44088"
## [14] "PMC8669026 /pmc/articles/PMC8669026/bin/41398_2021_1756_MOESM1_ESM.xlsx Hsapiens 1 43893"
## [15] "PMC8669026 /pmc/articles/PMC8669026/bin/41398_2021_1756_MOESM1_ESM.xlsx Hsapiens 1 43898"
## [16] "PMC8669026 /pmc/articles/PMC8669026/bin/41398_2021_1756_MOESM1_ESM.xlsx Hsapiens 1 44075"
## [17] "PMC8669026 /pmc/articles/PMC8669026/bin/41398_2021_1756_MOESM1_ESM.xlsx Hsapiens 1 44088"
## [18] "PMC8660810 /pmc/articles/PMC8660810/bin/41467_2021_27506_MOESM5_ESM.xlsx Dmelanogaster 1 44078"
## [19] "PMC8660810 /pmc/articles/PMC8660810/bin/41467_2021_27506_MOESM5_ESM.xlsx Dmelanogaster 3 44079 44076 44078"
## [20] "PMC8660810 /pmc/articles/PMC8660810/bin/41467_2021_27506_MOESM5_ESM.xlsx Dmelanogaster 1 44078"
## [21] "PMC8660810 /pmc/articles/PMC8660810/bin/41467_2021_27506_MOESM5_ESM.xlsx Dmelanogaster 2 44076 44079"
## [22] "PMC8716048 /pmc/articles/PMC8716048/bin/pone.0260811.s003.xlsx Hsapiens 1 43901"
## [23] "PMC8695320 /pmc/articles/PMC8695320/bin/mmc4.xlsx Hsapiens 1 44451"
## [24] "PMC8690979 /pmc/articles/PMC8690979/bin/12711_2021_684_MOESM4_ESM.xlsx Hsapiens 1 43898"
## [25] "PMC8712874 /pmc/articles/PMC8712874/bin/Table1.XLSX Hsapiens 2 44077 44166"
## [26] "PMC8712874 /pmc/articles/PMC8712874/bin/Table7.XLSX Hsapiens 1 44531"
## [27] "PMC8712770 /pmc/articles/PMC8712770/bin/Table10.XLSX Hsapiens 4 44261 44256 44260 44454"
## [28] "PMC8712770 /pmc/articles/PMC8712770/bin/Table4.XLSX Hsapiens 2 38596 38047"
## [29] "PMC8712770 /pmc/articles/PMC8712770/bin/Table4.XLSX Hsapiens 4 36951 38596 37865 38231"
## [30] "PMC8712770 /pmc/articles/PMC8712770/bin/Table4.XLSX Hsapiens 6 38596 36951 37865 39692 36951 40787"
## [31] "PMC8712770 /pmc/articles/PMC8712770/bin/Table4.XLSX Hsapiens 8 38596 36951 39692 37865 36951 38777 40787 38231"
## [32] "PMC8712770 /pmc/articles/PMC8712770/bin/Table4.XLSX Hsapiens 2 37865 36951"
## [33] "PMC8712770 /pmc/articles/PMC8712770/bin/Table4.XLSX Hsapiens 1 38231"
## [34] "PMC8712770 /pmc/articles/PMC8712770/bin/Table4.XLSX Hsapiens 2 38596 38231"
## [35] "PMC8712770 /pmc/articles/PMC8712770/bin/Table4.XLSX Hsapiens 4 38231 38047 38596 37865"
## [36] "PMC8712770 /pmc/articles/PMC8712770/bin/Table6.XLSX Hsapiens 2 37865 36951"
## [37] "PMC8712770 /pmc/articles/PMC8712770/bin/Table6.XLSX Hsapiens 1 37316"
## [38] "PMC8712770 /pmc/articles/PMC8712770/bin/Table8.XLSX Hsapiens 23 37316 37865 38231 37316 39326 38961 39142 40787 39873 38047 36951 38777 39692 39508 37500 37681 37135 42248 40057 38596 36951 40422 38412"
## [39] "PMC8671590 /pmc/articles/PMC8671590/bin/41419_2021_4436_MOESM5_ESM.xlsx Rnorvegicus 7 43896 44085 44076 44082 44082 44084 44082"
## [40] "PMC8671590 /pmc/articles/PMC8671590/bin/41419_2021_4436_MOESM7_ESM.xlsx Mmusculus 1 43899"
## [41] "PMC8671590 /pmc/articles/PMC8671590/bin/41419_2021_4436_MOESM8_ESM.xlsx Mmusculus 945 43896 43896 43898 44082 44083 44084 43897 44082 44082 43898 44085 44085 43897 44082 44082 44082 44083 44085 44085 43896 43898 44085 43896 43898 44085 43898 43898 44082 44084 43895 43896 43898 44076 44081 44084 43895 43896 43898 44076 44081 44084 43898 43898 44082 44084 43896 43898 44085 44089 43895 43898 43898 43898 43895 43898 43898 43898 44081 43898 44076 44082 43898 44082 43898 44080 44081 44082 43896 44082 43898 44083 43891 43895 43896 43898 44085 43896 43898 44076 43891 43895 43896 43898 44085 43898 44081 44082 43895 43895 44082 44082 43895 43898 43898 43898 44081 44083 44081 44085 44076 43896 44083 44076 43896 44083 44082 44084 43895 44082 43895 43896 43896 43898 43898 43898 44085 43896 43898 43897 44076 43896 43898 44082 44081 44082 44082 43898 44076 43896 43898 44081 44082 43895 43898 43898 44076 43897 44082 44082 43896 43896 44082 44085 43896 44085 44081 44083 44081 44083 43896 43898 44083 44082 44085 43896 43898 43898 43898 44082 44082 44083 44083 43896 44075 43895 44084 43895 43898 44081 44085 43898 43898 44083 43898 43898 44076 44082 43898 43898 44089 43895 43896 43898 44076 44081 44084 43898 44082 44085 44083 43898 44083 43898 44082 44085 43898 44082 44085 43898 44082 44085 44082 43898 44083 43898 44083 44083 43898 43898 44083 43896 43898 43898 44083 44083 44085 43897 43898 43898 44082 43898 44082 44083 44083 44076 44085 43896 43898 43898 43898 44076 43898 43898 44082 44081 43898 44082 44083 44082 43895 44082 43895 43896 43896 43896 43898 43898 44076 44076 44081 44082 44083 44083 44083 44085 44085 43898 43898 44082 44085 44082 44082 44083 44083 43898 44082 44084 43896 44089 44082 43898 43898 43898 43898 43898 43898 44083 43898 43898 44083 44082 44082 43895 43896 44083 43898 43898 43895 43898 43896 43898 43891 43898 43898 43896 43898 43891 43898 43898 43895 44080 44085 44081 44082 44082 43891 43896 43898 44076 43891 43896 43898 44076 44083 43896 43898 44076 
44083 43895 43895 43898 43895 43896 43898 44076 44081 44084 43895 43896 43898 44076 44081 44084 43896 43898 43898 43898 43898 43896 43898 44083 44084 43896 43898 43898 43898 43898 43898 44084 44085 44082 44081 44081 44089 44085 43896 43896 43898 44082 44083 43898 43898 43898 44085 43891 44084 43898 43896 44085 43895 43898 43895 43898 43895 43897 44082 43896 44076 44089 43896 44076 43895 43896 43898 44082 44082 44082 43895 43896 43898 44082 44082 44082 44076 43896 43896 44082 43896 44082 43895 43898 43898 44076 44081 44081 43896 44082 44082 43895 43898 43898 44076 43898 44084 44084 44084 43896 43896 44080 44085 43898 43898 44075 44082 43895 43896 44076 44082 43895 44076 43895 43898 44082 44076 44082 44076 44082 43898 43898 44076 44082 44085 44076 44082 43895 43898 43898 44076 43895 43898 43898 44076 43898 43895 43896 43896 43898 44080 44082 44076 44085 43898 43898 44085 43898 43896 43898 44081 44089 43898 44082 44082 43898 43898 43898 43898 43898 44084 43896 43896 43898 44082 43898 43895 43898 43898 43898 43896 43898 44080 44076 44085 44085 43896 43898 44082 44083 43898 44080 44081 44082 44080 44081 44082 43898 43895 43898 44080 44081 44082 43898 44080 44081 44082 43898 44080 44081 44082 43896 44080 44081 44082 44076 44083 44083 43895 43898 44080 44083 43898 43898 43895 44085 43896 43898 43898 44081 44085 43895 44080 44085 43898 43898 44076 43898 43896 43898 43898 43897 43898 43898 43898 44082 44083 43895 44076 43896 43898 44085 43898 43898 43895 43895 43898 44083 43898 44081 44085 43898 43898 44076 44083 44089 43898 43898 44082 43895 44085 43898 44082 44082 44081 44083 43895 43898 44076 44082 43898 44082 43895 43896 43898 44076 43898 43895 43896 43896 44083 43898 43898 43898 44082 44083 43898 43896 43898 43896 43898 43898 44085 43895 43898 43898 44076 43896 43891 43896 44085 43895 43896 43898 43898 44082 43898 43896 43898 43898 43898 43898 44082 44083 43896 43891 43896 44083 44076 44082 43896 43896 43896 43898 43896 44083 43898 44082 43895 44076 43896 44085 44082 
43896 44085 43898 43898 44082 43896 43895 44075 44082 44085 43897 43895 43898 44083 43896 43896 43895 43898 44082 44083 44083 43896 43898 43891 43896 43898 44076 43898 43898 43898 44082 43896 43898 43896 43896 44076 43896 44076 44076 44081 44082 43896 44082 44082 44080 44081 44080 43896 43896 44080 43898 44080 43898 43898 44080 44081 43895 44082 43895 43898 43898 44076 43895 43895 43898 43898 44076 43896 43895 43898 43898 44076 43895 43896 43896 43896 43898 43898 43898 44082 43896 43898 44082 43895 43895 44082 44083 43895 43898 44080 43896 43896 43898 43896 43898 43896 43898 43898 43895 43895 43896 43897 43898 43898 44081 44081 44089 44082 44081 43898 43898 43898 43898 44083 44085 44082 43896 44082 43896 43898 43898 43895 43895 43898 43898 43895 43898 43896 44083 43895 44075 44076 44085 43898 44080 44081 43898 43898 43898 43896 43898 43898 43898 44082 44082 44083 44083 43898 44082 43895 43898 43898 43898 44082 43895 43896 43898 44083 44085 43895 43898 43898 44076 43898 43898 43898 43898 43895 43896 43897 44083 44085 44085 43898 43896 43898 44083 43898 43898 44082 43895 44082 44082 44080 44082 43896 43896 43898 43898 44089 43891 43898 43898 44076 44083 44084 43898 44085 43898 43898 44084 43896 44076 44082 44083 43895 44082 44085 43898 43898 44083 43895 44076 43895 43896 43898 44082 44082 44082 43895 43896 43898 43898 43898 44081 44082 44082 44083 44084 44082 44082 44076 43898 44089 43898 44081 43898 44083 43895 43898 43898 43896 44082 43895 43898 43898 44076 44081 44082 44083 44083 43898 43898 43895 43895 43896 43896 43897 43898 43898 44081 44081 44089 44083 44082 43895 43891 43898 44083 43898 43898 44082 44082 43895 43898 43897 43897 43898 44082 43895 43896 43898 44076 44081 44084 43895 43896 43896 43898 44081 44083 43896 43898 44081 44083"
## [42] "PMC8671590 /pmc/articles/PMC8671590/bin/41419_2021_4436_MOESM8_ESM.xlsx Mmusculus 1199 43893 43897 43897 44075 43893 43897 43897 44075 43893 43897 43897 44075 43893 43897 43897 44075 43893 43897 43897 44075 43893 43897 43897 44075 44084 43893 43897 43897 44075 43893 43897 43897 44075 43893 43897 43897 44075 43897 43897 44082 43893 43896 43897 43893 43896 43897 43893 43896 44078 44085 43893 43898 44081 44085 43893 43898 44081 44081 44085 43896 43896 43896 43896 43897 43898 43898 44081 44089 43893 43897 44085 43893 43893 43897 44085 43896 43896 43896 43896 43897 43898 43898 44081 44089 43893 43898 44078 44081 44081 44084 44085 44085 44084 43898 44082 43897 44083 44083 43895 44084 43893 43895 43895 44076 44082 43896 43893 43896 44075 43893 43893 43896 43897 44075 44075 44078 43897 44075 44075 43896 43897 44082 43898 44085 43893 43893 43896 43897 43898 44078 44082 44084 44085 43896 44081 44081 44083 44081 43893 43896 43897 43898 43896 43897 43897 44081 44081 44084 44085 43893 43896 43897 43898 43896 43897 43897 43897 44081 44081 44084 44085 43896 44076 44083 43897 44078 43897 43893 44084 44089 44084 44089 44078 43896 44081 44082 44084 43893 43893 44080 44083 43897 43891 43893 43896 43896 44076 44085 43896 43896 43896 44081 44081 44085 43895 44082 43896 44085 43893 43896 44082 44082 43893 43896 43897 43898 43893 43896 43897 43896 44078 44084 44084 44085 44089 43896 43898 44089 44089 43893 43896 43893 43896 44082 43896 44075 44085 43895 43896 43897 43895 43897 43893 43896 43897 44075 44081 43897 43898 43893 43896 43896 43896 43896 43896 43897 43898 43898 43898 44081 44089 43895 44078 43893 43896 43896 43896 43896 43896 43897 43898 43898 43898 44081 44089 43896 43891 43895 43893 43896 43896 43896 43896 43896 43897 43898 43898 43898 44081 44089 43893 43896 43898 44082 44082 44089 43893 43897 44085 43893 43898 43897 43895 43893 43898 43893 43898 43893 43898 43896 43896 43897 44081 43897 44078 44083 43893 43893 43898 43898 43897 43897 44076 43896 43898 43898 44083 
43898 44082 44084 44083 43893 44079 44083 44085 43893 44081 43896 43897 44085 43897 43893 43896 43897 44081 43897 44076 43897 44075 43893 44075 44084 44076 44085 44081 44085 43895 44085 43896 44082 44084 43891 43895 43895 43895 43896 43896 43896 43896 43896 43896 43896 43896 43896 43896 43896 43896 43897 43897 43897 43897 43897 43898 44075 44076 44076 44076 44078 44078 44078 44078 44080 44081 44081 44081 44083 44083 44083 44083 44083 44084 44084 44084 44084 44085 44085 44085 44089 44089 44089 44089 44089 43893 43893 43898 43898 44082 43898 43895 43893 43896 43897 43898 44076 44080 43897 44075 44083 44083 44083 44085 44089 44082 43896 43896 43896 43897 44083 44081 43897 44081 43893 43898 44082 43896 43896 43893 43893 43896 43896 43896 43896 43896 43898 43898 44089 43895 43893 43895 43893 43893 43896 43898 44076 44080 44082 44082 44085 44085 43893 44082 43897 43897 43897 43897 44076 43896 44075 43897 44089 44085 44089 43896 43893 43893 43896 43898 44089 43893 43893 43896 43898 44089 43898 43898 43891 44076 44081 44082 43898 43898 43898 43896 44081 43895 43898 44076 44078 44081 44082 44084 43898 43896 43897 44081 43893 43897 44085 43893 43897 44085 43896 44076 43893 44083 43893 43896 43896 43896 43896 43897 43898 44078 44081 44089 44084 44076 44076 43893 43893 43897 44081 43897 44085 43896 43896 44081 44075 44085 44085 44081 43893 44078 44078 44078 43893 43897 43897 44076 44078 44082 44084 44081 44083 44089 43895 43897 43895 43895 44081 43896 43893 43896 43896 43897 43898 43898 44078 44078 44082 44084 44085 44078 43893 43896 43896 43897 43898 43898 44078 44078 44082 44084 44085 44079 43898 43895 43895 43896 43896 43897 43895 44082 43893 43897 43898 43895 43895 43893 43897 43898 43893 43897 43896 44089 43893 43895 43895 43897 43898 43898 43898 44083 44089 44089 43898 44089 43896 43897 43893 44076 43898 44089 43898 44089 43896 44076 43893 43897 44089 43893 43896 43897 43898 44081 44083 44089 43893 43896 43897 43898 44081 44083 43893 43897 43898 43893 43897 43898 43897 
44076 44082 44082 44084 44078 44078 43893 43893 43896 44075 43896 43897 44084 44085 43895 44082 44081 43893 43893 44085 43896 43898 44089 43896 44076 44083 44081 44083 44084 44085 43893 43896 43897 43898 43896 44082 43893 44083 44085 44089 43898 43898 44078 44083 44083 44076 44089 43896 44075 44084 44084 43897 44075 43896 43896 44085 44089 44085 44089 43896 43897 44075 43896 43897 44075 43896 44089 44084 43896 44078 44084 43893 43897 44084 43896 44080 43897 44089 44085 43893 43896 43896 43896 43896 43896 43897 43898 43898 43898 44081 44089 43896 43896 43897 43898 44085 43898 43898 44082 43895 44078 44081 44081 44083 43893 43896 44082 43896 43897 44082 44078 44081 44081 44083 43898 44083 44089 43897 43897 43898 44075 44076 44081 44082 44085 44089 43896 43897 44081 43895 43897 44076 44082 44082 44083 43896 43896 43896 43896 43896 43896 44078 44089 43896 43898 43893 43893 43896 43898 43896 43898 44081 43896 43897 43897 44083 43896 43897 43898 43898 44078 44082 43895 43896 43897 43896 43891 43896 43891 43896 44075 44080 44085 43893 43897 43897 43898 43897 44075 44075 44083 43895 43896 44082 44082 44085 43896 44082 43897 44075 44082 44085 43896 43897 44081 44082 44084 44085 44078 44081 43895 43898 44081 44083 44078 44081 43893 43897 44076 43896 43897 43898 43896 43897 43898 44075 44083 44084 44089 43898 44085 43897 44076 44081 43893 44076 43898 44081 43893 43897 44089 43896 43898 44082 44083 44089 43896 44076 44078 44083 43893 43895 43897 43898 43897 43897 43897 43897 44078 44083 44081 43891 43896 43896 43896 43895 43896 43896 43896 43896 44084 43896 43896 44078 44084 43896 43898 43896 43896 43897 44078 43896 43896 43896 43896 43898 44078 43896 43898 44076 43896 43898 43896 43898 43896 44084 43893 43897 43893 43897 43897 43898 43898 43893 43897 43897 43898 43893 43897 43897 43898 43898 43898 43896 43897 44076 43896 44081 44083 44082 43898 44078 44082 44078 44084 43893 43895 43897 43898 44085 43897 44081 43897 44084 43893 43896 43896 43896 43896 43897 43898 43898 44081 
44089 44076 44078 44084 43897 43896 43897 43898 43896 43897 43898 43896 43893 43896 43896 43896 43896 43896 43897 43898 43898 43898 44081 44089 44085 43895 43895 43896 43897 43898 44076 43895 44075 44089 43895 43897 43898 44084 44081 44089 43897 43896 43897 44085 43893 43896 43897 43898 43895 43896 43897 43896 43897 43897 44080 44082 43896 43898 43898 43898 44083 44085 43897 44089 43896 43895 43897 44078 43893 44076 44076 43893 43897 44076 43897 43898 43898 44081 44081 44085 44085 44083 43893 43897 43898 44076 43893 44084 44089 44081 44083 43898 43895 43898 43896 44085 43891 43897 43897 43896 44085 43896 44085 43898 43896 44085 43896 43897 43897 44075 43896 43897 43893 43896 43896 43897 43898 43898 44078 44078 44082 44084 44085 43896 44084 43895 43896 43897 43893 44078 44083 43896 44076 44078 43897 44085 44085 43897 43897 43897 44076 44078 44081 44082 44082 44084 44084 43896 43898 44083 43896 43897 43895 43895 43896 43897 43896 43898 44076 44081 44082 44082 44083 44084 44084 44085 43891 43896 43897 44083 43895 43896 44085 43897 44078 43897 44078 43893 43896 43896 43896 43896 43897 43897 43893 43893 43895 43896 43898 44078 44082 44083 43896 43898 44078 44083 44076 43896 43898 44075 43893 43897 44085 44076 43896 43896 43897 44081 43893 43897 43897 44075 43896 44082 43896 44082"
## [43] "PMC8671590 /pmc/articles/PMC8671590/bin/41419_2021_4436_MOESM8_ESM.xlsx Mmusculus 910 43893 43893 43893 43893 43893 43897 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43898 43893 43893 43898 43893 43893 44083 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 44083 43893 43893 43893 43893 43893 43893 43898 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43898 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 44083 43893 43893 43896 43893 43898 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 44083 43893 44083 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 44083 43893 44083 44083 43893 43893 43893 43893 43893 43893 43893 43893 43896 43893 43893 44083 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43898 44083 43896 43893 43893 43893 43893 44083 43893 43893 43898 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43896 43893 43896 43893 43893 43893 43893 44083 43898 43893 43893 43893 43893 43893 43893 43893 43893 43898 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43898 43893 43893 43893 43898 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 44083 43893 43893 43898 43893 43893 44083 43893 43893 43893 44083 43893 43893 43893 43893 44076 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43898 43893 43893 43893 43893 43893 44083 43893 43898 43893 43893 43893 43893 43893 43893 43893 44083 43896 43896 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 
43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43898 43893 43893 43893 43893 43893 43893 43893 43898 43893 43893 43893 44083 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43897 43893 43893 43893 43893 43893 43893 43893 43893 44083 43893 43893 43893 43893 44083 43893 43893 43898 43893 44083 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 44083 43893 43893 43893 43893 43893 43893 43893 43893 43893 44083 43893 43893 44076 43893 43893 44083 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43896 43897 43893 43893 44083 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 44083 43893 43893 43893 43893 43893 43893 43896 43893 44083 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43898 44083 44076 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 44083 44083 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43896 43896 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 44083 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 44083 43893 44076 43893 43893 43893 43893 43893 43893 43893 43898 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 44083 44083 43893 43893 43893 43893 44083 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43898 43893 43893 43893 43893 43893 44083 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 
43893 43893 43893 43893 43893 44083 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43896 43893 44083 43896 43893 43893 44083 43893 43893 44076 43893 43893 43893 43893 44076 43893 43893 43893 44083 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43896 43893 43893 43893 44083 43893 43893 43893 43898 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 44083 43893 43893 44083 43893 44083 43893 43893 43893 43893 43893 43893 44076 43893 44076 43893 44083 43893 43893 44083 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 44083 43893 43898 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43896 43893 43893 43893 43893 43893 43893 43897 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43896 43893 43893 43893 43898 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 44083 43893 43898 44083 43893 43893 43893 43893 43893 43893 43893 43893 43893 44076 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 44083 43893 43893 43893 43893 43893 44083 44083 43893 44083 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 44083 44076 44083 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 44076 43898 43893 44083 43893 43893 43893 43893 43896 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893 43893"
## [44] "PMC8710427 /pmc/articles/PMC8710427/bin/NIHMS1763210-supplement-2.xlsx Mmusculus 26 42980 42801 42989 42802 42795 42985 42987 42986 42982 42805 42981 42799 42798 42795 42796 42992 42979 42984 42988 42803 42804 42800 42990 42983 42796 42797"
## [45] "PMC8691574 /pmc/articles/PMC8691574/bin/Table3.XLSX Hsapiens 1 44443"
## [46] "PMC8649620 /pmc/articles/PMC8649620/bin/mmc3.xlsx Hsapiens 24 43710 43715 43717 43530 43719 43713 43716 43526 43711 43531 43712 43718 43532 43714 43525 43529 43533 43527 43709 43528 43534 43800 43535 43720"
## [47] "PMC8649620 /pmc/articles/PMC8649620/bin/mmc3.xlsx Hsapiens 14 43525 43526 43529 43718 43719 43710 43711 43712 43713 43714 43715 43716 43717 43713"
## [48] "PMC8649620 /pmc/articles/PMC8649620/bin/mmc4.xlsx Hsapiens 24 43710 43715 43717 43530 43719 43713 43716 43526 43711 43531 43712 43718 43532 43714 43525 43529 43533 43527 43709 43528 43534 43800 43535 43720"
## [49] "PMC8649620 /pmc/articles/PMC8649620/bin/mmc4.xlsx Hsapiens 4 43712 43711 43528 43715"
## [50] "PMC8648877 /pmc/articles/PMC8648877/bin/41467_2021_27341_MOESM10_ESM.xlsx Hsapiens 6 44440 44440 44256 44263 44442 44442"
## [51] "PMC8648735 /pmc/articles/PMC8648735/bin/41467_2021_27427_MOESM4_ESM.xlsx Hsapiens 143 44454 44257 44256 44256 44256 44449 44449 44449 44449 44449 44449 44262 44262 44259 44259 44441 44441 44441 44441 44441 44441 44450 44450 44450 44450 44450 44450 44450 44256 44256 44256 44256 44256 44256 44256 44256 44256 44256 44256 44256 44256 44256 44256 44261 44261 44266 44266 44266 44258 44258 44258 44258 44258 44258 44258 44258 44258 44258 44447 44447 44447 44447 44447 44447 44447 44447 44447 44447 44447 44447 44447 44446 44446 44452 44531 44531 44531 44531 44531 44531 44531 44263 44263 44263 44263 44260 44260 44451 44451 44451 44440 44440 44443 44443 44265 44265 44265 44265 44265 44265 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44448 44257 44257 44257 44257"
## [52] "PMC8703359 zip/TableS8-AlternativeSplicingGene.xlsx Ggallus 6 43351 43351 43351 43349 43349 43166"
## [53] "PMC8703359 zip/TableS8-AlternativeSplicingGene.xlsx Hsapiens 3 43351 43349 43166"
## [54] "PMC8703359 zip/TableS5-filteredIsoform.xlsx Hsapiens 42 43351 43351 43355 43348 43348 43348 43352 43347 43170 43350 43350 43160 43160 43349 43349 43160 43354 43354 43167 43164 43164 43163 43166 43166 43345 43351 43348 43352 43350 43350 43350 43350 43165 43161 43349 43349 43354 43354 43166 43166 43166 43345"
## [55] "PMC8703359 zip/TableS5-filteredIsoform.xlsx Hsapiens 42 43351 43351 43348 43348 43348 43352 43347 43170 43350 43350 43350 43160 43160 43349 43160 43160 43354 43354 43167 43164 43164 43163 43166 43166 43345 43351 43348 43352 43350 43350 43350 43350 43165 43161 43349 43349 43349 43354 43354 43166 43166 43345"
## [56] "PMC8703359 zip/TableS5-filteredIsoform.xlsx Hsapiens 40 43351 43351 43355 43348 43348 43348 43352 43347 43170 43350 43350 43160 43160 43349 43349 43160 43354 43354 43167 43164 43164 43163 43166 43166 43345 43351 43348 43352 43350 43350 43350 43350 43165 43161 43349 43349 43354 43166 43166 43345"
## [57] "PMC8703359 zip/TableS5-filteredIsoform.xlsx Hsapiens 47 43351 43351 43355 43348 43348 43348 43352 43347 43170 43350 43350 43350 43160 43160 43349 43349 43160 43160 43354 43354 43167 43164 43164 43163 43166 43166 43345 43351 43348 43352 43350 43350 43350 43350 43350 43165 43161 43349 43349 43349 43354 43354 43354 43166 43166 43166 43345"
## [58] "PMC8703359 zip/TableS3-Ensemble_Ref_transcriptInfo.xlsx Hsapiens 39 43346 43350 43350 43350 43170 43165 43165 43160 43160 43349 43349 43349 43349 43349 43160 43160 43160 43354 43354 43354 43354 43162 43163 43166 43166 43167 43167 43164 43164 43345 43351 43351 43355 43348 43348 43348 43352 43347 43161"
## [59] "PMC8703359 zip/TableS2-Ensemble_Ref_geneInfo.xlsx Hsapiens 20 43346 43350 43170 43165 43160 43349 43160 43354 43162 43163 43166 43167 43164 43345 43351 43355 43348 43352 43347 43161"
## [60] "PMC8703359 zip/TableS4-filteredGene.xlsx Hsapiens 14 43351 43355 43348 43352 43347 43170 43350 43165 43161 43160 43349 43160 43163 43345"
## [61] "PMC8703359 zip/TableS4-filteredGene.xlsx Hsapiens 13 43351 43348 43352 43347 43170 43350 43165 43161 43160 43349 43160 43163 43345"
## [62] "PMC8703359 zip/TableS4-filteredGene.xlsx Hsapiens 14 43351 43355 43348 43352 43347 43170 43350 43165 43161 43160 43349 43160 43163 43345"
## [63] "PMC8672579 /pmc/articles/PMC8672579/bin/13059_2021_2539_MOESM2_ESM.xlsx Mmusculus 6 36951 38412 39142 40787 37865 38961"
## [64] "PMC8672579 /pmc/articles/PMC8672579/bin/13059_2021_2539_MOESM2_ESM.xlsx Mmusculus 1 39326"
## [65] "PMC8672579 /pmc/articles/PMC8672579/bin/13059_2021_2539_MOESM2_ESM.xlsx Mmusculus 4 37316 39873 42248 38596"
## [66] "PMC8675923 /pmc/articles/PMC8675923/bin/ppat.1010141.s001.xlsx Hsapiens 1 43349"
## [67] "PMC8675923 /pmc/articles/PMC8675923/bin/ppat.1010141.s001.xlsx Hsapiens 2 43349 43350"
## [68] "PMC8675923 /pmc/articles/PMC8675923/bin/ppat.1010141.s001.xlsx Hsapiens 2 43349 43350"
## [69] "PMC8675387 /pmc/articles/PMC8675387/bin/Table2.XLSX Mmusculus 27 44440 44446 44447 44449 44443 44266 44442 44451 44260 44263 44441 44256 44262 44258 44453 44256 44261 44259 44264 44445 44450 44448 44444 44257 44265 44257 44441"
## [70] "PMC8690596 /pmc/articles/PMC8690596/bin/NIHMS1610894-supplement-1610894_Source_Dat_Fig_4.xlsx Mmusculus 25 43891 43900 43901 43892 43893 43894 43895 43896 43897 43898 43899 44089 44075 44084 44085 44086 44088 44076 44077 44078 44079 44080 44081 44082 44083"
## [71] "PMC8690596 /pmc/articles/PMC8690596/bin/NIHMS1610894-supplement-1610894_Source_Dat_Fig_5.xlsx Mmusculus 25 43901 44086 43898 44083 43900 44080 44084 44082 44079 43892 44076 44089 44088 44085 43897 44077 43895 43899 44081 43893 44075 43894 43891 43896 44078"
## [72] "PMC8689144 /pmc/articles/PMC8689144/bin/Table1.XLSX Mmusculus 1 40603"
## [73] "PMC8642546 /pmc/articles/PMC8642546/bin/41467_2021_27367_MOESM4_ESM.xlsx Hsapiens 24 44531 44443 44261 44442 44441 44453 44265 44449 44450 44454 44259 44266 44445 44451 44258 44446 44257 44448 44260 44447 44264 44263 44262 44444"
## [74] "PMC8642546 /pmc/articles/PMC8642546/bin/41467_2021_27367_MOESM5_ESM.xlsx Hsapiens 144 44266 44263 44441 44442 44265 44454 44265 44444 44449 44259 44261 44258 44259 44442 44451 44258 44442 44441 44261 44531 44262 44451 44444 44453 44443 44261 44263 44258 44263 44442 44446 44448 44441 44451 44447 44448 44446 44443 44257 44263 44445 44266 44454 44445 44446 44262 44264 44447 44450 44454 44443 44450 44449 44265 44451 44448 44258 44453 44259 44454 44264 44441 44450 44449 44445 44444 44448 44261 44453 44258 44443 44442 44531 44454 44446 44441 44447 44531 44257 44260 44449 44264 44450 44259 44261 44266 44441 44263 44260 44448 44450 44257 44453 44445 44257 44531 44262 44443 44443 44262 44260 44446 44262 44449 44262 44264 44257 44446 44445 44259 44260 44447 44264 44531 44444 44451 44445 44258 44266 44266 44447 44449 44257 44265 44450 44261 44453 44266 44531 44259 44263 44265 44444 44260 44442 44448 44454 44453 44444 44264 44447 44451 44265 44260"
## [75] "PMC8640014 /pmc/articles/PMC8640014/bin/41467_2021_26840_MOESM7_ESM.xlsx Hsapiens 3 37226 37316 36951"
## [76] "PMC8633298 /pmc/articles/PMC8633298/bin/41467_2021_27258_MOESM4_ESM.xlsx Hsapiens 3 37226 37226 36951"
## [77] "PMC8633298 /pmc/articles/PMC8633298/bin/41467_2021_27258_MOESM4_ESM.xlsx Hsapiens 7 37226 36951 40057 37226 36951 36951 36951"
## [78] "PMC8686545 /pmc/articles/PMC8686545/bin/13046_2021_2210_MOESM3_ESM.xlsx Hsapiens 2 37316 37865"
## [79] "PMC8684099 /pmc/articles/PMC8684099/bin/12935_2021_2342_MOESM1_ESM.xlsx Hsapiens 1 44445"
## [80] "PMC8634119 zip/Dataset_EV3.xlsx Scerevisiae 1 44470"
## [81] "PMC8634119 zip/Dataset_EV3.xlsx Scerevisiae 1 44470"
## [82] "PMC8634119 zip/Dataset_EV1.xlsx Scerevisiae 2 44470 44340"
## [83] "PMC8674316 /pmc/articles/PMC8674316/bin/41598_2021_3432_MOESM6_ESM.xlsx Hsapiens 1 44258"
## [84] "PMC8649757 /pmc/articles/PMC8649757/bin/mbio.01766-21-sd002.xlsx Hsapiens 21 36951 37681 42248 38961 40422 40057 40787 38231 39326 39692 38596 39142 36951 37500 37316 38412 37135 38777 39873 39508 37865"
## [85] "PMC8649757 /pmc/articles/PMC8649757/bin/mbio.01766-21-sd003.xlsx Hsapiens 21 37500 40787 39873 39508 40422 37681 38231 36951 37135 40057 39326 37865 42248 37316 36951 39692 38777 38412 38596 38961 39142"
## [86] "PMC8626462 /pmc/articles/PMC8626462/bin/41467_2021_27176_MOESM6_ESM.xlsx Hsapiens 4 43714 43718 43723 43713"
## [87] "PMC8626462 /pmc/articles/PMC8626462/bin/41467_2021_27176_MOESM6_ESM.xlsx Hsapiens 4 43714 43718 43713 43723"
## [88] "PMC8626462 /pmc/articles/PMC8626462/bin/41467_2021_27176_MOESM6_ESM.xlsx Hsapiens 4 43714 43718 43713 43723"
## [89] "PMC8626462 /pmc/articles/PMC8626462/bin/41467_2021_27176_MOESM6_ESM.xlsx Hsapiens 12 43719 43710 43715 43717 43718 43710 43719 43716 43713 43712 43714 43716"
## [90] "PMC8626462 /pmc/articles/PMC8626462/bin/41467_2021_27176_MOESM6_ESM.xlsx Hsapiens 12 43717 43719 43715 43710 43714 43716 43716 43718 43713 43712 43710 43719"
## [91] "PMC8626462 /pmc/articles/PMC8626462/bin/41467_2021_27176_MOESM6_ESM.xlsx Hsapiens 12 43719 43717 43715 43710 43714 43719 43716 43716 43718 43713 43710 43712"
## [92] "PMC8626462 /pmc/articles/PMC8626462/bin/41467_2021_27176_MOESM7_ESM.xlsx Hsapiens 2 43710 43716"
## [93] "PMC8626462 /pmc/articles/PMC8626462/bin/41467_2021_27176_MOESM7_ESM.xlsx Hsapiens 2 43715 43717"
## [94] "PMC8626462 /pmc/articles/PMC8626462/bin/41467_2021_27176_MOESM7_ESM.xlsx Hsapiens 6 43717 43719 43714 43710 43715 43718"
## [95] "PMC8626462 /pmc/articles/PMC8626462/bin/41467_2021_27176_MOESM7_ESM.xlsx Hsapiens 5 43719 43710 43715 43717 43716"
## [96] "PMC8626462 /pmc/articles/PMC8626462/bin/41467_2021_27176_MOESM7_ESM.xlsx Hsapiens 1 43715"
## [97] "PMC8617305 /pmc/articles/PMC8617305/bin/41467_2021_27223_MOESM6_ESM.xlsx Hsapiens 21 44448 44443 44444 44264 44449 44440 44450 44265 44257 44454 44260 44259 44262 44261 44263 44258 44441 44442 44446 44451 44266"
## [98] "PMC8617305 /pmc/articles/PMC8617305/bin/41467_2021_27223_MOESM7_ESM.xlsx Hsapiens 21 44448 44443 44444 44264 44449 44440 44450 44265 44257 44454 44260 44259 44262 44261 44263 44258 44441 44442 44446 44451 44266"
## [99] "PMC8672115 /pmc/articles/PMC8672115/bin/Table1.XLSX Hsapiens 2 44448 44454"
## [100] "PMC8672115 /pmc/articles/PMC8672115/bin/Table2.XLSX Hsapiens 1 44454"
## [101] "PMC8672055 /pmc/articles/PMC8672055/bin/Table5.XLSX Hsapiens 23 44261 44451 44442 44453 44441 44448 44258 44444 44446 44447 44265 44259 44263 44445 44256 44257 44256 44266 44257 44262 44264 44450 44443"
## [102] "PMC8672055 /pmc/articles/PMC8672055/bin/Table6.XLSX Hsapiens 27 44454 44257 44256 44449 44262 44259 44441 44450 44256 44261 44266 44258 44447 44446 44453 44531 44263 44260 44264 44451 44440 44443 44265 44448 44257 44444 44442"
## [103] "PMC8655865 /pmc/articles/PMC8655865/bin/DataSheet_1.xlsx Hsapiens 1 44531"
## [104] "PMC8655011 /pmc/articles/PMC8655011/bin/41598_2021_3086_MOESM2_ESM.xlsx Hsapiens 1 43901"
## [105] "PMC8655011 /pmc/articles/PMC8655011/bin/41598_2021_3086_MOESM2_ESM.xlsx Hsapiens 1 43901"
## [106] "PMC8669612 /pmc/articles/PMC8669612/bin/DataSheet2.xlsx Hsapiens 2 44441 44447"
## [107] "PMC8667787 /pmc/articles/PMC8667787/bin/Table5.XLSX Hsapiens 2 37316 37316"
## [108] "PMC8667366 /pmc/articles/PMC8667366/bin/12916_2021_2186_MOESM3_ESM.xlsx Hsapiens 1 44442"
## [109] "PMC8667366 /pmc/articles/PMC8667366/bin/12916_2021_2186_MOESM4_ESM.xlsx Hsapiens 1 44442"
## [110] "PMC8666981 /pmc/articles/PMC8666981/bin/Table_1.xlsx Hsapiens 1 44256"
## [111] "PMC8666981 /pmc/articles/PMC8666981/bin/Table_1.xlsx Hsapiens 8 44453 44261 44453 44261 44266 44453 44453 44263"
## [112] "PMC8665569 /pmc/articles/PMC8665569/bin/12885_2021_9065_MOESM2_ESM.xlsx Hsapiens 1 43349"
## [113] "PMC8649439 /pmc/articles/PMC8649439/bin/sj-xls-1-tct-10.1177_15330338211060202.xls Hsapiens 31 2020/03/06 2020/03/05 2020/03/02 2020/03/06 2020/03/08 2020/03/01 2020/03/01 2020/03/01 2020/03/06 2020/03/07 2020/03/09 2020/03/11 2020/09/15 2020/03/09 2020/03/06 2020/03/07 2020/03/06 2020/03/03 2020/03/07 2020/03/07 2020/03/08 2020/03/01 2020/03/04 2020/03/08 2020/03/03 2020/03/07 2020/03/10 2020/03/10 2020/03/05 2020/03/06 2020/12/01"
## [114] "PMC8642897 /pmc/articles/PMC8642897/bin/12872_2021_2409_MOESM1_ESM.xls Hsapiens 2 44454 44256"
## [115] "PMC8640463 /pmc/articles/PMC8640463/bin/Table1.XLSX Mmusculus 4 42800 42796 42990 42805"
## [116] "PMC8604731 /pmc/articles/PMC8604731/bin/41591_2021_1541_MOESM8_ESM.xlsx Hsapiens 23 44257 44442 44443 44257 44446 44445 44262 44450 44264 44256 44261 44447 44263 44441 44258 44440 44454 44448 44444 44256 44449 44260 44440"
## [117] "PMC8602413 /pmc/articles/PMC8602413/bin/42003_2021_2817_MOESM6_ESM.xlsx Hsapiens 2 44084 43891"
## [118] "PMC8602413 /pmc/articles/PMC8602413/bin/42003_2021_2817_MOESM6_ESM.xlsx Hsapiens 1 44084"
## [119] "PMC8663025 /pmc/articles/PMC8663025/bin/Table_2.xlsx Hsapiens 2 44264 44263"
## [120] "PMC8656106 /pmc/articles/PMC8656106/bin/12870_2021_3339_MOESM3_ESM.xls Athaliana 4 2021/09/03 2021/09/03 2021/09/03 2021/09/03"
## [121] "PMC8638096 /pmc/articles/PMC8638096/bin/12920_2021_880_MOESM1_ESM.xls Hsapiens 9 43167 43169 43353 43353 43345 43346 43351 43435 43349"
## [122] "PMC8612935 /pmc/articles/PMC8612935/bin/41586_2021_4081_MOESM23_ESM.xlsx Mmusculus 2 39142 38961"
## [123] "PMC8612935 /pmc/articles/PMC8612935/bin/41586_2021_4081_MOESM30_ESM.xlsx Mmusculus 1 36951"
## [124] "PMC8612935 /pmc/articles/PMC8612935/bin/41586_2021_4081_MOESM30_ESM.xlsx Mmusculus 1 36951"
## [125] "PMC8612935 /pmc/articles/PMC8612935/bin/41586_2021_4081_MOESM30_ESM.xlsx Mmusculus 1 36951"
## [126] "PMC8612935 /pmc/articles/PMC8612935/bin/41586_2021_4081_MOESM30_ESM.xlsx Mmusculus 1 36951"
## [127] "PMC8612935 /pmc/articles/PMC8612935/bin/41586_2021_4081_MOESM30_ESM.xlsx Mmusculus 1 36951"
## [128] "PMC8612935 /pmc/articles/PMC8612935/bin/41586_2021_4081_MOESM30_ESM.xlsx Mmusculus 1 36951"
## [129] "PMC8612935 /pmc/articles/PMC8612935/bin/41586_2021_4081_MOESM34_ESM.xlsx Mmusculus 2 39142 38961"
## [130] "PMC8612935 /pmc/articles/PMC8612935/bin/41586_2021_4081_MOESM36_ESM.xlsx Mmusculus 2 44448 44256"
## [131] "PMC8657403 /pmc/articles/PMC8657403/bin/Table3.XLSX Hsapiens 13 44531 44256 44265 44266 44257 44258 44259 44260 44261 44262 44263 44264 44454"
## [132] "PMC8657403 /pmc/articles/PMC8657403/bin/Table8.XLSX Hsapiens 10 44454 44263 44258 44264 44265 44261 44257 44260 44262 44256"
## [133] "PMC8657403 /pmc/articles/PMC8657403/bin/Table8.XLSX Hsapiens 10 44261 44258 44262 44264 44256 44257 44454 44260 44265 44263"
## [134] "PMC8637571 /pmc/articles/PMC8637571/bin/MOL2-15-3280-s004.xlsx Mmusculus 23 37865 41153 38596 38231 40057 39873 37316 37316 40422 38961 41883 39692 39326 38412 39142 40787 38047 38777 37135 37500 39508 36951 36951"
## [135] "PMC8637571 /pmc/articles/PMC8637571/bin/MOL2-15-3280-s008.xlsx Hsapiens 24 37865 38596 38231 38412 37316 37226 39692 40422 39508 37681 40057 37316 40238 36951 38777 38961 39142 41153 37135 40787 39873 37500 39326 36951"
## [136] "PMC8636031 /pmc/articles/PMC8636031/bin/Table1.XLSX Mmusculus 8 44263 44260 44264 44261 44256 44257 44262 44258"
## [137] "PMC8636031 /pmc/articles/PMC8636031/bin/Table2.XLSX Hsapiens 24 43893 44080 43897 44085 44075 43891 43899 44078 44081 43895 43892 43898 44166 44082 44084 43900 44083 44079 43896 44076 44077 43891 43892 43894"
## [138] "PMC8636031 /pmc/articles/PMC8636031/bin/Table2.XLSX Hsapiens 24 43893 44080 43897 44085 44075 43891 43899 44078 44081 43895 43892 43898 44166 44082 44084 43900 44083 44079 43896 44076 44077 43891 43892 43894"
## [139] "PMC8636031 /pmc/articles/PMC8636031/bin/Table5.XLSX Hsapiens 3 43893 43897 44080"
## [140] "PMC8636031 /pmc/articles/PMC8636031/bin/Table8.XLSX Hsapiens 2 43901 43897"
## [141] "PMC8636031 /pmc/articles/PMC8636031/bin/Table9.XLSX Hsapiens 2 44262 44260"
## [142] "PMC8634728 /pmc/articles/PMC8634728/bin/Table5.XLSX Hsapiens 9 44531 44256 44257 44258 44260 44261 44262 44263 44454"
## [143] "PMC8633104 /pmc/articles/PMC8633104/bin/Table1.xlsx Hsapiens 5 44261 44262 44262 44443 44443"
## [144] "PMC8633104 /pmc/articles/PMC8633104/bin/Table2.xlsx Hsapiens 3 44261 44261 44261"
## [145] "PMC8632647 /pmc/articles/PMC8632647/bin/Table_1.XLSX Hsapiens 8 43901 43894 44077 43892 43891 44080 43895 44079"
## [146] "PMC8632647 /pmc/articles/PMC8632647/bin/Table_1.XLSX Hsapiens 14 44166 44086 43893 44078 43900 44082 44084 43898 44081 44083 43899 44085 43897 44076"
## [147] "PMC8632647 /pmc/articles/PMC8632647/bin/Table_1.XLSX Hsapiens 24 44082 44077 44078 44081 44166 44083 43898 44088 44080 44076 44086 43894 44085 43901 43893 43895 43896 43899 44084 43897 43892 43900 43891 44079"
## [148] "PMC8632647 /pmc/articles/PMC8632647/bin/Table_2.XLSX Hsapiens 1 38596"
## [149] "PMC8632647 /pmc/articles/PMC8632647/bin/Table_2.XLSX Hsapiens 4 40422 37316 37500 39142"
## [150] "PMC8632647 /pmc/articles/PMC8632647/bin/Table_2.XLSX Hsapiens 10 42248 37316 39692 40787 37500 38596 38412 39142 37316 40422"
## [151] "PMC8632647 /pmc/articles/PMC8632647/bin/Table_3.XLSX Hsapiens 1 38596"
## [152] "PMC8632647 /pmc/articles/PMC8632647/bin/Table_3.XLSX Hsapiens 6 38777 37865 39326 37316 39692 38596"
## [153] "PMC8632647 /pmc/articles/PMC8632647/bin/Table_4.XLSX Hsapiens 1 39326"
## [154] "PMC8632647 /pmc/articles/PMC8632647/bin/Table_4.XLSX Hsapiens 1 38047"
## [155] "PMC8632647 /pmc/articles/PMC8632647/bin/Table_4.XLSX Hsapiens 2 39326 38047"
## [156] "PMC8630779 /pmc/articles/PMC8630779/bin/NIHMS1757901-supplement-3.xlsx Hsapiens 12 42621 42437 42615 42705 42439 42432 42614 42440 42622 42618 42623 42434"
## [157] "PMC8630779 /pmc/articles/PMC8630779/bin/NIHMS1757901-supplement-4.xlsx Hsapiens 12 42621 42437 42615 42705 42439 42432 42614 42440 42622 42618 42623 42434"
## [158] "PMC8630779 /pmc/articles/PMC8630779/bin/NIHMS1757901-supplement-5.xlsx Hsapiens 12 42621 42437 42615 42705 42439 42432 42614 42440 42622 42618 42623 42434"
## [159] "PMC8630779 /pmc/articles/PMC8630779/bin/NIHMS1757901-supplement-7.xlsx Hsapiens 12 42621 42437 42615 42705 42439 42432 42614 42440 42622 42618 42623 42434"
## [160] "PMC8630779 /pmc/articles/PMC8630779/bin/NIHMS1757901-supplement-7.xlsx Hsapiens 2 42623 42705"
## [161] "PMC8630779 /pmc/articles/PMC8630779/bin/NIHMS1757901-supplement-8.xlsx Hsapiens 12 42621 42437 42615 42705 42439 42432 42614 42440 42622 42618 42623 42434"
## [162] "PMC8630779 /pmc/articles/PMC8630779/bin/NIHMS1757901-supplement-8.xlsx Hsapiens 12 42621 42437 42615 42705 42439 42432 42614 42440 42622 42618 42623 42434"
## [163] "PMC8627048 /pmc/articles/PMC8627048/bin/12938_2021_959_MOESM8_ESM.xlsx Hsapiens 1 44446"
## [164] "PMC8627048 /pmc/articles/PMC8627048/bin/12938_2021_959_MOESM8_ESM.xlsx Hsapiens 2 44266 44261"
## [165] "PMC8627048 /pmc/articles/PMC8627048/bin/12938_2021_959_MOESM8_ESM.xlsx Hsapiens 3 44448 44261 44266"
## [166] "PMC8627048 /pmc/articles/PMC8627048/bin/12938_2021_959_MOESM8_ESM.xlsx Hsapiens 2 44263 44260"
## [167] "PMC8627048 /pmc/articles/PMC8627048/bin/12938_2021_959_MOESM9_ESM.xlsx Hsapiens 1 44449"
## [168] "PMC8627048 /pmc/articles/PMC8627048/bin/12938_2021_959_MOESM9_ESM.xlsx Hsapiens 1 44261"
## [169] "PMC8627048 /pmc/articles/PMC8627048/bin/12938_2021_959_MOESM9_ESM.xlsx Hsapiens 1 44261"
## [170] "PMC8627048 /pmc/articles/PMC8627048/bin/12938_2021_959_MOESM9_ESM.xlsx Hsapiens 3 44440 44261 44445"
## [171] "PMC8611784 /pmc/articles/PMC8611784/bin/CNS-27-1483-s001.xls Ggallus 30 43891 44084 44084 43894 43894 44085 43891 43891 43891 43891 43891 43891 43891 43891 43891 43891 43891 43891 43891 43901 44088 43898 43898 43898 43895 43900 43900 43900 43900 44083"
## [172] "PMC8611784 /pmc/articles/PMC8611784/bin/CNS-27-1483-s001.xls Hsapiens 2 44085 43891"
## [173] "PMC8611784 /pmc/articles/PMC8611784/bin/CNS-27-1483-s001.xls Ggallus 17 43891 44084 43891 43891 43891 43891 43891 43891 43891 43891 43901 44088 43898 43898 43895 43900 44083"
## [174] "PMC8611784 /pmc/articles/PMC8611784/bin/CNS-27-1483-s001.xls Hsapiens 3 43895 43891 43891"
## [175] "PMC8611784 /pmc/articles/PMC8611784/bin/CNS-27-1483-s002.xls Hsapiens 2 43892 44077"
## [176] "PMC8611784 /pmc/articles/PMC8611784/bin/CNS-27-1483-s002.xls Ggallus 2 43900 43892"
## [177] "PMC8651141 /pmc/articles/PMC8651141/bin/pone.0259588.s001.xlsx Scerevisiae 2 39326 36982"
## [178] "PMC8650217 /pmc/articles/PMC8650217/bin/DataSheet_1.xlsx Hsapiens 27 44448 44441 44450 44446 44262 44261 44449 44445 44447 44260 44442 44258 44443 44263 44264 44257 44256 44257 44256 44444 44440 44531 44259 44265 44451 44453 44266"
## [179] "PMC8650217 /pmc/articles/PMC8650217/bin/DataSheet_2.xlsx Hsapiens 4 44442 44449 44446 44256"
## [180] "PMC8650217 /pmc/articles/PMC8650217/bin/DataSheet_3.xlsx Hsapiens 5 44443 44442 44264 44256 44256"
## [181] "PMC8650217 /pmc/articles/PMC8650217/bin/DataSheet_4.xlsx Hsapiens 8 44256 44442 44443 44447 44257 44264 44441 44256"
## [182] "PMC8643507 /pmc/articles/PMC8643507/bin/hmg-2021-ce-00288_aavikko_supplementary_table_8_revised_ddab206.xlsx Hsapiens 4 43529 43525 43527 43532"
## [183] "PMC8641155 /pmc/articles/PMC8641155/bin/12864_2021_8185_MOESM3_ESM.xlsx Hsapiens 5 44083 44083 44083 44082 44082"
## [184] "PMC8640099 /pmc/articles/PMC8640099/bin/Table1.XLSX Hsapiens 25 44454 44447 44449 44442 44454 44454 44265 44447 44449 44443 44446 44443 44442 44454 44261 44454 44442 44444 44454 44451 44454 44454 44454 44454 44442"
## [185] "PMC8640099 /pmc/articles/PMC8640099/bin/Table3.XLSX Hsapiens 4 44440 44443 44447 44264"
## [186] "PMC8640099 /pmc/articles/PMC8640099/bin/Table4.XLSX Hsapiens 13 44444 44445 44444 44444 44444 44443 44444 44444 44444 44443 44443 44443 44444"
## [187] "PMC8607334 /pmc/articles/PMC8607334/bin/BSR-2021-1847_supp.xlsx Mmusculus 25 43896 44085 44081 44084 43899 43891 44076 44089 43898 43892 43897 44077 44080 43895 44088 43891 43893 44078 43894 44082 44079 43892 43900 44075 44083"
## [188] "PMC8626588 /pmc/articles/PMC8626588/bin/mmc2.xlsx Mmusculus 1 43891"
## [189] "PMC8626588 /pmc/articles/PMC8626588/bin/mmc6.xlsx Hsapiens 2 43525 43712"
## [190] "PMC8626588 /pmc/articles/PMC8626588/bin/mmc6.xlsx Hsapiens 1 43525"
## [191] "PMC8635166 /pmc/articles/PMC8635166/bin/Table7.XLSX Mmusculus 1 44262"
## [192] "PMC8635117 /pmc/articles/PMC8635117/bin/Table_1.XLSX Hsapiens 26 44089 43892 44075 43893 44081 43900 43894 43895 43901 44084 44079 43899 44086 44076 43897 43891 44088 44078 44077 44166 44085 43896 44082 44083 44080 43898"
## [193] "PMC8633307 /pmc/articles/PMC8633307/bin/Table2.XLSX Hsapiens 2 43891 44084"
## [194] "PMC8632918 /pmc/articles/PMC8632918/bin/41467_2021_27087_MOESM6_ESM.xlsx Hsapiens 4 42988 42797 42796 42986"
## [195] "PMC8632918 /pmc/articles/PMC8632918/bin/41467_2021_27087_MOESM6_ESM.xlsx Hsapiens 3 42431 42623 42432"
## [196] "PMC8631276 /pmc/articles/PMC8631276/bin/DataSheet_3.xlsx Hsapiens 1 44257"
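Most values in these gene lists look like Excel date serial numbers (days counted from Excel's 1900 epoch) rather than gene symbols. A few entries show formatted dates instead (e.g. [113] and [120]). Below is a minimal sketch for turning serials back into calendar dates. It assumes Excel's default 1900 date system, which corresponds to an origin of 1899-12-30 in R. `excel_serial_to_date()` is a hypothetical helper, not part of the original script.

```r
# Hypothetical helper: decode Excel date serials (1900 date system) into Dates.
# Assumption: the effective origin is 1899-12-30, which compensates for Excel
# treating 1900 as a leap year.
excel_serial_to_date <- function(serial) {
  as.Date(as.numeric(serial), origin = "1899-12-30")
}

# A few values copied from the error lists above
serials <- c("44256", "44440", "44531", "36951")
excel_serial_to_date(serials)
# [1] "2021-03-01" "2021-09-01" "2021-12-01" "2001-03-01"
```

Decoded this way, 44256, 44440 and 44531 fall on 1 March, 1 September and 1 December 2021. That fits symbols such as MARCH1, SEPT1 and DEC1 being typed into a spreadsheet during 2021.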
Let’s investigate the errors in more detail.
# By species
SPECIES <- sapply(strsplit(ERROR_GENELISTS," "),"[[",3)
table(SPECIES)
## SPECIES
## Athaliana Dmelanogaster Ggallus Hsapiens Mmusculus
## 1 4 4 154 28
## Rnorvegicus Scerevisiae
## 1 4
par(mar=c(5,12,4,2))
barplot(table(SPECIES),horiz=TRUE,las=1)
par(mar=c(5,5,4,2))
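The summaries in this section call `strsplit()` on ERROR_GENELISTS several times. Below is a minimal sketch that parses the records once into a data frame instead. It assumes each record is space-delimited as PMCID, file path, species and error count, followed by the converted values. This layout is inferred from the printed output above. A file path containing a space would break the assumption. `parse_error_records()` is a hypothetical helper.

```r
# Hypothetical helper: parse space-delimited error records into a data frame.
# Assumed layout: PMCID, file path, species, number of errors, converted values...
parse_error_records <- function(records) {
  fields <- strsplit(records, " ", fixed = TRUE)
  get_field <- function(i) vapply(fields, `[[`, character(1), i)
  data.frame(
    PMCID    = get_field(1),
    FILE     = get_field(2),
    SPECIES  = get_field(3),
    N_ERRORS = as.numeric(get_field(4)),
    stringsAsFactors = FALSE
  )
}

# Two records copied from the output above
recs <- c(
  "PMC8684099 /pmc/articles/PMC8684099/bin/12935_2021_2342_MOESM1_ESM.xlsx Hsapiens 1 44445",
  "PMC8634119 zip/Dataset_EV1.xlsx Scerevisiae 2 44470 44340"
)
err_df <- parse_error_records(recs)
table(err_df$SPECIES)                          # errors lists per species
tapply(err_df$N_ERRORS, err_df$PMCID, sum)     # total errors per paper
```

With the full data this would be `err_df <- parse_error_records(ERROR_GENELISTS)`. The species, per-paper and error-count tables can then be built from its columns.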
# Number of affected Excel files per paper
DIST <- table(sapply(strsplit(ERROR_GENELISTS," "),"[[",1))
DIST
##
## PMC8602413 PMC8604731 PMC8607334 PMC8611784 PMC8612935 PMC8617305 PMC8626462
## 2 1 1 6 9 2 11
## PMC8626588 PMC8627048 PMC8630779 PMC8631276 PMC8632647 PMC8632918 PMC8633104
## 3 8 7 1 11 2 2
## PMC8633298 PMC8633307 PMC8634119 PMC8634728 PMC8635117 PMC8635166 PMC8636031
## 2 1 3 1 1 1 6
## PMC8637571 PMC8638096 PMC8640014 PMC8640099 PMC8640463 PMC8641155 PMC8642546
## 2 1 1 3 1 1 2
## PMC8642897 PMC8643507 PMC8648735 PMC8648877 PMC8649439 PMC8649620 PMC8649757
## 1 1 1 1 1 4 2
## PMC8650217 PMC8651141 PMC8655011 PMC8655865 PMC8656106 PMC8657403 PMC8660810
## 4 1 2 1 1 3 4
## PMC8663025 PMC8665569 PMC8666981 PMC8667366 PMC8667787 PMC8669026 PMC8669612
## 1 1 2 2 1 12 1
## PMC8671590 PMC8672055 PMC8672115 PMC8672579 PMC8674316 PMC8675387 PMC8675923
## 5 2 2 3 1 1 3
## PMC8684099 PMC8686545 PMC8689144 PMC8690596 PMC8690979 PMC8691574 PMC8695320
## 1 1 1 2 1 1 1
## PMC8702551 PMC8703359 PMC8710427 PMC8712770 PMC8712874 PMC8715921 PMC8716048
## 1 11 1 12 2 4 1
summary(as.numeric(DIST))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 1.0 2.0 2.8 3.0 12.0
hist(DIST,main="Number of affected Excel files per paper")
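DIST counts entries in ERROR_GENELISTS, and a single Excel file can contribute several entries, for example one per sheet or column. PMC8612935 has 9 entries, but they come from only 4 distinct files, because MOESM30 is listed six times. Below is a minimal sketch for counting distinct files per paper. It assumes the first two space-separated fields are the PMCID and the file path. `count_distinct_files()` is a hypothetical helper.

```r
# Hypothetical helper: number of distinct Excel files (not entries) per paper.
count_distinct_files <- function(records) {
  pmcid <- sub(" .*$", "", records)                    # first field
  path  <- sub("^[^ ]+ ([^ ]+) .*$", "\\1", records)   # second field
  keys  <- unique(paste(pmcid, path))                  # one key per (paper, file)
  table(sub(" .*$", "", keys))
}

# Records copied from the output above: two entries share the same file
recs <- c(
  "PMC8612935 /pmc/articles/PMC8612935/bin/41586_2021_4081_MOESM30_ESM.xlsx Mmusculus 1 36951",
  "PMC8612935 /pmc/articles/PMC8612935/bin/41586_2021_4081_MOESM30_ESM.xlsx Mmusculus 1 36951",
  "PMC8612935 /pmc/articles/PMC8612935/bin/41586_2021_4081_MOESM34_ESM.xlsx Mmusculus 2 39142 38961"
)
count_distinct_files(recs)   # PMC8612935 -> 2
```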
# PMC Articles with the most errors
DIST_DF <- as.data.frame(DIST)
DIST_DF <- DIST_DF[order(-DIST_DF$Freq),,drop=FALSE]
head(DIST_DF,20)
## Var1 Freq
## 48 PMC8669026 12
## 67 PMC8712770 12
## 7 PMC8626462 11
## 12 PMC8632647 11
## 65 PMC8703359 11
## 5 PMC8612935 9
## 9 PMC8627048 8
## 10 PMC8630779 7
## 4 PMC8611784 6
## 21 PMC8636031 6
## 50 PMC8671590 5
## 34 PMC8649620 4
## 36 PMC8650217 4
## 42 PMC8660810 4
## 69 PMC8715921 4
## 8 PMC8626588 3
## 17 PMC8634119 3
## 25 PMC8640099 3
## 41 PMC8657403 3
## 53 PMC8672579 3
MOST_ERR_FILES = as.character(DIST_DF[1,1])
MOST_ERR_FILES
## [1] "PMC8669026"
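`DIST_DF[1,1]` returns only the first row after sorting. This month, PMC8669026 and PMC8712770 are tied at 12 entries. `order()` leaves ties in their original order, and that order is alphabetical because it comes from `table()`, so PMC8669026 is picked by name alone. Below is a small sketch that reports every paper sharing the top count. It uses a subset of the DIST values above.

```r
# Report all papers tied for the maximum count (values copied from DIST above)
dist_example <- c(PMC8669026 = 12, PMC8712770 = 12, PMC8626462 = 11, PMC8632647 = 11)
top_papers <- names(dist_example)[dist_example == max(dist_example)]
top_papers
# [1] "PMC8669026" "PMC8712770"
```

On the real table, the equivalent is `names(DIST)[DIST == max(DIST)]`.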
# Number of errors per paper
NERR <- as.numeric(sapply(strsplit(ERROR_GENELISTS," "),"[[",4))
names(NERR) <- sapply(strsplit(ERROR_GENELISTS," "),"[[",1)
NERR <-tapply(NERR, names(NERR), sum)
NERR
## PMC8602413 PMC8604731 PMC8607334 PMC8611784 PMC8612935 PMC8617305 PMC8626462
## 3 23 25 56 12 42 64
## PMC8626588 PMC8627048 PMC8630779 PMC8631276 PMC8632647 PMC8632918 PMC8633104
## 4 14 74 1 72 7 8
## PMC8633298 PMC8633307 PMC8634119 PMC8634728 PMC8635117 PMC8635166 PMC8636031
## 10 2 4 9 26 1 63
## PMC8637571 PMC8638096 PMC8640014 PMC8640099 PMC8640463 PMC8641155 PMC8642546
## 47 9 3 42 4 5 168
## PMC8642897 PMC8643507 PMC8648735 PMC8648877 PMC8649439 PMC8649620 PMC8649757
## 2 4 143 6 31 66 42
## PMC8650217 PMC8651141 PMC8655011 PMC8655865 PMC8656106 PMC8657403 PMC8660810
## 44 2 2 1 4 33 7
## PMC8663025 PMC8665569 PMC8666981 PMC8667366 PMC8667787 PMC8669026 PMC8669612
## 2 1 9 2 2 87 2
## PMC8671590 PMC8672055 PMC8672115 PMC8672579 PMC8674316 PMC8675387 PMC8675923
## 3062 50 3 11 1 27 5
## PMC8684099 PMC8686545 PMC8689144 PMC8690596 PMC8690979 PMC8691574 PMC8695320
## 1 2 1 50 1 1 1
## PMC8702551 PMC8703359 PMC8710427 PMC8712770 PMC8712874 PMC8715921 PMC8716048
## 2 280 26 59 3 11 1
hist(NERR,main="Number of errors per PMC article")
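One article (PMC8671590, with 3062 errors) dominates this histogram, which squeezes most other articles into the first bin. Plotting on a log10 scale may show the rest of the distribution more clearly. Below is a minimal sketch using a handful of counts copied from NERR above.

```r
# Log-scale view of per-article error counts (values copied from NERR above)
nerr_example <- c(PMC8671590 = 3062, PMC8703359 = 280, PMC8642546 = 168,
                  PMC8631276 = 1, PMC8655865 = 1, PMC8602413 = 3)
log_counts <- log10(nerr_example)
hist(log_counts, main = "Errors per PMC article (log10 scale)",
     xlab = "log10(number of errors)")
```

With the full data, `hist(log10(NERR))` gives the same view.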
NERR_DF <- as.data.frame(NERR)
NERR_DF <- NERR_DF[order(-NERR_DF$NERR),,drop=FALSE]
head(NERR_DF,20)
## NERR
## PMC8671590 3062
## PMC8703359 280
## PMC8642546 168
## PMC8648735 143
## PMC8669026 87
## PMC8630779 74
## PMC8632647 72
## PMC8649620 66
## PMC8626462 64
## PMC8636031 63
## PMC8712770 59
## PMC8611784 56
## PMC8672055 50
## PMC8690596 50
## PMC8637571 47
## PMC8650217 44
## PMC8617305 42
## PMC8640099 42
## PMC8649757 42
## PMC8657403 33
MOST_ERR = rownames(NERR_DF)[1]
MOST_ERR
## [1] "PMC8671590"
GENELIST_ERROR_ARTICLES <- gsub("PMC","",GENELIST_ERROR_ARTICLES)
### JSON PARSING is more reliable than XML
ARTICLES <- esummary( GENELIST_ERROR_ARTICLES , db="pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLES,as= "parsed")
ARTICLE_DATA <- ARTICLE_DATA$result
ARTICLE_DATA <- ARTICLE_DATA[2:length(ARTICLE_DATA)]
JOURNALS <- unlist(lapply(ARTICLE_DATA,function(x) {x$fulljournalname} ))
JOURNALS_TABLE <- table(JOURNALS)
JOURNALS_TABLE <- JOURNALS_TABLE[order(-JOURNALS_TABLE)]
length(JOURNALS_TABLE)
## [1] 43
NUM_JOURNALS=length(JOURNALS_TABLE)
par(mar=c(5,25,4,2))
barplot(head(JOURNALS_TABLE,10), horiz=TRUE, las=1,
xlab="Articles with gene name errors in supp files",
main="Top journals this month")
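`ARTICLE_DATA[2:length(ARTICLE_DATA)]` drops the `uids` element by position, so it relies on `uids` coming first in the result. Below is a minimal sketch that selects article records by the IDs listed in `uids` instead. The JSON string is a hypothetical, trimmed-down stand-in, and its structure mirrors the esummary output printed later in this report. The example uses `jsonlite::fromJSON()` directly rather than `reutils::content()`.

```r
library(jsonlite)

# Hypothetical, trimmed-down esummary-style JSON
json <- '{"result": {"uids": ["8669026", "8671590"],
  "8669026": {"uid": "8669026", "fulljournalname": "Translational Psychiatry"},
  "8671590": {"uid": "8671590", "fulljournalname": "Cell Death & Disease"}}}'
res <- fromJSON(json)$result

# Select article records by name, using the uids list, rather than by position
records  <- res[res$uids]
journals <- vapply(records, function(r) r$fulljournalname, character(1))
sort(table(journals), decreasing = TRUE)
```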
Congrats to our Journal of the Month winner!
JOURNAL_WINNER <- names(head(JOURNALS_TABLE,1))
JOURNAL_WINNER
## [1] "Nature Communications"
There are two categories:
Paper with the most supplementary files affected by gene name errors (MOST_ERR_FILES)
Paper with the most gene names converted to dates (MOST_ERR)
Sometimes, one paper can win both categories. Congrats to our winners.
MOST_ERR_FILES <- gsub("PMC","",MOST_ERR_FILES)
ARTICLES <- esummary( MOST_ERR_FILES , db="pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLES,as= "parsed")
ARTICLE_DATA <- ARTICLE_DATA[2]
ARTICLE_DATA
## $result
## $result$uids
## [1] "8669026"
##
## $result$`8669026`
## $result$`8669026`$uid
## [1] "8669026"
##
## $result$`8669026`$pubdate
## [1] "2021 Dec 13"
##
## $result$`8669026`$epubdate
## [1] "2021 Dec 13"
##
## $result$`8669026`$printpubdate
## [1] ""
##
## $result$`8669026`$source
## [1] "Transl Psychiatry"
##
## $result$`8669026`$authors
## name authtype
## 1 Moore SR Author
## 2 Halldorsdottir T Author
## 3 Martins J Author
## 4 Lucae S Author
## 5 Müller-Myhsok B Author
## 6 Müller NS Author
## 7 Piechaczek C Author
## 8 Feldmann L Author
## 9 Freisleder FJ Author
## 10 Greimel E Author
## 11 Schulte-Körne G Author
## 12 Binder EB Author
## 13 Arloth J Author
##
## $result$`8669026`$title
## [1] "Sex differences in the genetic regulation of the blood transcriptome response to glucocorticoid receptor activation"
##
## $result$`8669026`$volume
## [1] "11"
##
## $result$`8669026`$issue
## [1] ""
##
## $result$`8669026`$pages
## [1] "632"
##
## $result$`8669026`$articleids
## idtype value
## 1 pmid 34903727
## 2 doi 10.1038/s41398-021-01756-2
## 3 pmcid PMC8669026
##
## $result$`8669026`$fulljournalname
## [1] "Translational Psychiatry"
##
## $result$`8669026`$sortdate
## [1] "2021/12/13 00:00"
##
## $result$`8669026`$pmclivedate
## [1] "2021/12/28"
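The esummary record printed above has enough fields to build a short citation for each winner. Below is a minimal sketch. It assumes a record shaped like the parsed output shown, with `authors` and `articleids` as data frames. `format_citation()` is a hypothetical helper, not part of reutils, and the record is rebuilt by hand from the printed fields.

```r
# Hypothetical helper: short citation string from a parsed esummary record
format_citation <- function(rec) {
  doi <- rec$articleids$value[rec$articleids$idtype == "doi"]
  sprintf("%s et al. %s. %s %s, %s (%s). doi:%s",
          rec$authors$name[1], rec$title, rec$source, rec$volume, rec$pages,
          substr(rec$pubdate, 1, 4), doi)
}

# Record rebuilt from the fields printed above (authors truncated)
rec <- list(
  title = "Sex differences in the genetic regulation of the blood transcriptome response to glucocorticoid receptor activation",
  source = "Transl Psychiatry", volume = "11", pages = "632", pubdate = "2021 Dec 13",
  authors = data.frame(name = c("Moore SR", "Halldorsdottir T"), authtype = "Author",
                       stringsAsFactors = FALSE),
  articleids = data.frame(idtype = c("pmid", "doi", "pmcid"),
                          value = c("34903727", "10.1038/s41398-021-01756-2", "PMC8669026"),
                          stringsAsFactors = FALSE)
)
format_citation(rec)
```

With the live object, this would be `format_citation(ARTICLE_DATA$result[["8669026"]])`.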
MOST_ERR <- gsub("PMC","",MOST_ERR)
ARTICLE_DATA <- esummary(MOST_ERR,db = "pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLE_DATA,as= "parsed")
ARTICLE_DATA
## $header
## $header$type
## [1] "esummary"
##
## $header$version
## [1] "0.3"
##
##
## $result
## $result$uids
## [1] "8671590"
##
## $result$`8671590`
## $result$`8671590`$uid
## [1] "8671590"
##
## $result$`8671590`$pubdate
## [1] "2021 Dec 14"
##
## $result$`8671590`$epubdate
## [1] "2021 Dec 14"
##
## $result$`8671590`$printpubdate
## [1] ""
##
## $result$`8671590`$source
## [1] "Cell Death Dis"
##
## $result$`8671590`$authors
## name authtype
## 1 Zhang Y Author
## 2 Tan YY Author
## 3 Chen PP Author
## 4 Xu H Author
## 5 Xie SJ Author
## 6 Xu SJ Author
## 7 Li B Author
## 8 Li JH Author
## 9 Liu S Author
## 10 Yang JH Author
## 11 Zhou H Author
## 12 Qu LH Author
##
## $result$`8671590`$title
## [1] "Genome-wide identification of microRNA targets reveals positive regulation of the Hippo pathway by miR-122 during liver development"
##
## $result$`8671590`$volume
## [1] "12"
##
## $result$`8671590`$issue
## [1] "12"
##
## $result$`8671590`$pages
## [1] "1161"
##
## $result$`8671590`$articleids
## idtype value
## 1 pmid 34907157
## 2 doi 10.1038/s41419-021-04436-7
## 3 pmcid PMC8671590
##
## $result$`8671590`$fulljournalname
## [1] "Cell Death & Disease"
##
## $result$`8671590`$sortdate
## [1] "2021/12/14 00:00"
##
## $result$`8671590`$pmclivedate
## [1] "2021/12/28"
Next, we plot the trend over the past 6-12 months, using the figures reported in previous monthly reports.
url <- "http://ziemann-lab.net/public/gene_name_errors/"
doc <- htmlParse(url)
links <- xpathSApply(doc, "//a/@href")
links <- links[grep("html",links)]
links
## href href href
## "Report_2021-02.html" "Report_2021-03.html" "Report_2021-04.html"
## href href href
## "Report_2021-05.html" "Report_2021-06.html" "Report_2021-07.html"
## href href href
## "Report_2021-08.html" "Report_2021-09.html" "Report_2021-10.html"
## href href
## "Report_2021-11.html" "Report_2021-12.html"
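The filter `grep("html",links)` keeps any href that contains "html" anywhere. Below is a minimal sketch of the same link-extraction step using xml2, which is already loaded. It applies a stricter, anchored pattern. The HTML string is a hypothetical stand-in for the live directory listing.

```r
library(xml2)

# Hypothetical directory-listing HTML standing in for the live index page
page <- read_html('<html><body>
  <a href="?C=N;O=D">Name</a>
  <a href="Report_2021-11.html">Report_2021-11.html</a>
  <a href="Report_2021-12.html">Report_2021-12.html</a>
  <a href="pmcids.txt">pmcids.txt</a></body></html>')

hrefs <- xml_attr(xml_find_all(page, "//a"), "href")
# Keep only monthly report pages
reports <- grep("^Report_.*\\.html$", hrefs, value = TRUE)
reports
```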
unlink("online_files/",recursive=TRUE)
dir.create("online_files")
sapply(links, function(mylink) {
download.file(paste(url,mylink,sep=""),destfile=paste("online_files/",mylink,sep=""))
} )
## href href href href href href href href href href href
## 0 0 0 0 0 0 0 0 0 0 0
myfilelist <- list.files("online_files/",full.names=TRUE)
trends <- sapply(myfilelist, function(myfilename) {
x <- readLines(myfilename)
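# Number of articles with Excel gene lists: the value is printed on the line after the 3rd mention of the variable name in the report HTML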
NUM_GENELIST_ARTICLES <- x[grep("NUM_GENELIST_ARTICLES",x)[3]+1]
NUM_GENELIST_ARTICLES <- sapply(strsplit(NUM_GENELIST_ARTICLES," "),"[[",3)
NUM_GENELIST_ARTICLES <- sapply(strsplit(NUM_GENELIST_ARTICLES,"<"),"[[",1)
NUM_GENELIST_ARTICLES <- as.numeric(NUM_GENELIST_ARTICLES)
# number of affected articles
NUM_ERROR_GENELIST_ARTICLES <- x[grep("NUM_ERROR_GENELIST_ARTICLES",x)[3]+1]
NUM_ERROR_GENELIST_ARTICLES <- sapply(strsplit(NUM_ERROR_GENELIST_ARTICLES," "),"[[",3)
NUM_ERROR_GENELIST_ARTICLES <- sapply(strsplit(NUM_ERROR_GENELIST_ARTICLES,"<"),"[[",1)
NUM_ERROR_GENELIST_ARTICLES <- as.numeric(NUM_ERROR_GENELIST_ARTICLES)
# Error proportion
ERROR_PROPORTION <- x[grep("ERROR_PROPORTION",x)[3]+1]
ERROR_PROPORTION <- sapply(strsplit(ERROR_PROPORTION," "),"[[",3)
ERROR_PROPORTION <- sapply(strsplit(ERROR_PROPORTION,"<"),"[[",1)
ERROR_PROPORTION <- as.numeric(ERROR_PROPORTION)
# number of journals
NUM_JOURNALS <- x[grep('JOURNALS_TABLE',x)[3]+1]
NUM_JOURNALS <- sapply(strsplit(NUM_JOURNALS," "),"[[",3)
NUM_JOURNALS <- sapply(strsplit(NUM_JOURNALS,"<"),"[[",1)
NUM_JOURNALS <- as.numeric(NUM_JOURNALS)
res <- c(NUM_GENELIST_ARTICLES,NUM_ERROR_GENELIST_ARTICLES,ERROR_PROPORTION,NUM_JOURNALS)
return(res)
})
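The four blocks inside the `sapply()` call above repeat the same steps. Each finds the third line that mentions a variable name, then reads the value printed on the following line. Below is a minimal sketch that factors this into one helper. It assumes, as the original parser does, that the value line looks like `## [1] 43</code></pre>`. `extract_printed_value()` is hypothetical, and the example lines imitate knitr HTML.

```r
# Hypothetical helper: numeric value printed on the line after the n-th line
# mentioning `varname` in a knitr HTML report. Returns NA if not found.
extract_printed_value <- function(lines, varname, n = 3) {
  hit <- grep(varname, lines, fixed = TRUE)[n]
  if (is.na(hit)) return(NA_real_)
  # Take the token after "[1] " and drop any trailing HTML tags
  token <- sub("^.*\\[1\\] ([^<]+).*$", "\\1", lines[hit + 1])
  as.numeric(token)
}

# Example lines imitating the rendered report
lines <- c('<pre class="r"><code>JOURNALS_TABLE &lt;- table(JOURNALS)',
           'JOURNALS_TABLE &lt;- JOURNALS_TABLE[order(-JOURNALS_TABLE)]',
           'length(JOURNALS_TABLE)</code></pre>',
           '<pre><code>## [1] 43</code></pre>')
extract_printed_value(lines, "JOURNALS_TABLE")   # 43
```

Inside the loop, each metric would then need only one line, e.g. `extract_printed_value(x, "ERROR_PROPORTION")`.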
# Column names are file paths such as "online_files//Report_2021-02.html"; keep only "2021-02"
colnames(trends) <- sub("^Report_(.*)\\.html$", "\\1", basename(colnames(trends)))
trends <- as.data.frame(trends)
rownames(trends) <- c("NUM_GENELIST_ARTICLES","NUM_ERROR_GENELIST_ARTICLES","ERROR_PROPORTION","NUM_JOURNALS")
trends <- t(trends)
trends <- as.data.frame(trends)
CURRENT_RES <- c(NUM_GENELIST_ARTICLES,NUM_ERROR_GENELIST_ARTICLES,ERROR_PROPORTION,NUM_JOURNALS)
trends <- rbind(trends,CURRENT_RES)
paste(CURRENT_YEAR,CURRENT_MONTH,sep="-")
## [1] "2022-01"
rownames(trends)[nrow(trends)] <- paste(CURRENT_YEAR,CURRENT_MONTH,sep="-")
plot(trends$NUM_GENELIST_ARTICLES, xaxt = "n" , type="b" , main="Number of articles with Excel gene lists per month",
ylab="number of articles", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
plot(trends$NUM_ERROR_GENELIST_ARTICLES, xaxt = "n" , type="b" , main="Number of articles with gene name errors per month",
ylab="number of articles", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
plot(trends$ERROR_PROPORTION, xaxt = "n" , type="b" , main="Proportion of articles with Excel gene list affected by errors",
ylab="proportion", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
plot(trends$NUM_JOURNALS, xaxt = "n" , type="b" , main="Number of journals with affected articles",
ylab="number of journals", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
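The four trend plots repeat the same `plot()` and `axis()` calls. Below is a minimal sketch of a hypothetical helper that draws one panel from a table with month labels as row names. The toy values are illustrative only and are not taken from any report.

```r
# Hypothetical helper: one monthly trend panel; row names are month labels
plot_trend <- function(trends, column, main, ylab) {
  plot(trends[[column]], xaxt = "n", type = "b", main = main,
       ylab = ylab, xlab = "month")
  axis(1, at = seq_len(nrow(trends)), labels = rownames(trends))
  invisible(trends[[column]])
}

# Illustrative toy values only (not real report data)
toy <- data.frame(ERROR_PROPORTION = c(0.30, 0.25, 0.28),
                  row.names = c("2021-11", "2021-12", "2022-01"))
plot_trend(toy, "ERROR_PROPORTION",
           "Proportion of articles with Excel gene list affected by errors",
           "proportion")
```

With the real `trends` table, calling `par(mfrow = c(2, 2))` first and then `plot_trend()` once per column would put all four panels on one page.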
unlink("online_files/",recursive=TRUE)
Zeeberg, B.R., Riss, J., Kane, D.W. et al. Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics. BMC Bioinformatics 5, 80 (2004). https://doi.org/10.1186/1471-2105-5-80
Ziemann, M., Eren, Y. & El-Osta, A. Gene name errors are widespread in the scientific literature. Genome Biol 17, 177 (2016). https://doi.org/10.1186/s13059-016-1044-7
sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##
## locale:
## [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
## [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
## [7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] readxl_1.3.1 reutils_0.2.3 xml2_1.3.2 jsonlite_1.7.2 XML_3.99-0.8
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.7 knitr_1.36 magrittr_2.0.1 R6_2.5.1
## [5] rlang_0.4.12 fastmap_1.1.0 stringr_1.4.0 highr_0.9
## [9] tools_4.1.2 xfun_0.28 jquerylib_0.1.4 htmltools_0.5.2
## [13] yaml_2.2.1 digest_0.6.28 assertthat_0.2.1 sass_0.4.0
## [17] bitops_1.0-7 RCurl_1.98-1.5 evaluate_0.14 rmarkdown_2.11
## [21] stringi_1.7.5 compiler_4.1.2 bslib_0.3.1 cellranger_1.1.0