Source: https://github.com/markziemann/GeneNameErrors2020
View the reports: http://ziemann-lab.net/public/gene_name_errors/
Gene name errors result when data are imported improperly into MS Excel and other spreadsheet programs (Zeeberg et al, 2004). Certain gene names, such as MARCH3, SEPT2 and DEC1, are converted into dates. These errors are surprisingly common in supplementary data files in the field of genomics (Ziemann et al, 2016). This could be considered a minor error because it affects only a small number of genes; however, it is symptomatic of poor data processing methods. The purpose of this script is to identify gene name errors in the supplementary files of PubMed Central articles from the previous month.
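As a rough illustration of which symbols are at risk, the sketch below flags names made of a month-like prefix followed by a number. The helper at_risk and its prefix list (MARCH, SEPT, DEC, OCT) are hypothetical and deliberately narrow; this is not the detection logic used by this project's script.

```r
# Hypothetical heuristic: flag gene symbols that spreadsheet programs may
# reinterpret as dates (a month-like prefix followed by digits).
at_risk <- function(symbols) {
  grepl("^(MARCH|SEPT|DEC|OCT)[0-9]+$", toupper(symbols))
}

genes <- c("MARCH3", "SEPT2", "DEC1", "TP53", "Sept7")
flagged <- genes[at_risk(genes)]
flagged
```

Matching on upper-cased symbols means mixed-case names such as Sept7 (common in mouse gene lists) are caught as well.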
library("jsonlite")
library("xml2")
library("reutils")
library("readxl")
Here I will get the PubMed Central IDs for the previous month.
Start by setting the date to search PubMed Central.
DATE="2021/1"
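DATE is hardcoded above. If the previous month should be derived from the current date instead, a minimal base-R sketch is shown here; the helper name previous_month is hypothetical, and it assumes the "YYYY/M" string format used for DATE.

```r
# Hypothetical helper: return the month before `today` as "YYYY/M",
# matching the format of DATE above.
previous_month <- function(today = Sys.Date()) {
  first_of_month <- as.Date(format(today, "%Y-%m-01"))
  last_month <- first_of_month - 1  # last day of the previous month
  paste(as.integer(format(last_month, "%Y")),
        as.integer(format(last_month, "%m")), sep = "/")
}

previous_month(as.Date("2021-02-15"))  # "2021/1"
```

Stepping back one day from the first of the month handles the January to December rollover without any special cases.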
Let’s see how many PMC IDs we have in the past month.
# Search PMC for abstracts matching genom*, limited to DATE (mindate = maxdate)
QUERY ='((genom*[Abstract]))'
ESEARCH_RES <- esearch(term=QUERY, db = "pmc", rettype = "uilist", retmode = "xml", retstart = 0,
retmax = 5000000, usehistory = TRUE, webenv = NULL, querykey = NULL, sort = NULL, field = NULL,
datetype = NULL, reldate = NULL, mindate = DATE, maxdate = DATE)
# Fetch the matching UIDs and write them to pmcids.txt
pmc <- efetch(ESEARCH_RES,retmode="text",rettype="uilist",outfile="pmcids.txt")
## Retrieving UIDs 1 to 500
## Retrieving UIDs 501 to 1000
## Retrieving UIDs 1001 to 1500
## Retrieving UIDs 1501 to 2000
## Retrieving UIDs 2001 to 2500
## Retrieving UIDs 2501 to 3000
## Retrieving UIDs 3001 to 3500
## Retrieving UIDs 3501 to 4000
## Retrieving UIDs 4001 to 4500
## Retrieving UIDs 4501 to 5000
pmc <- read.table(pmc)
pmc <- paste("PMC",pmc$V1,sep="")
NUM_ARTICLES=length(pmc)
NUM_ARTICLES
## [1] 4511
writeLines(pmc,con="pmc.txt")
Next, run the bash script gene_names.sh on the list of PMC IDs in pmc.txt. Note that false positives can occur (~1.5%) and these results have not been verified by a human.
Here are some definitions:
NUM_XLS = Number of supplementary Excel files in this set of PMC articles.
NUM_XLS_ARTICLES = Number of articles matching the PubMed Central search which have supplementary Excel files.
GENELISTS = The gene lists found in the Excel files. Each Excel file is counted once in NUM_GENELISTS, even if it has multiple gene lists.
NUM_GENELISTS = The number of Excel files with gene lists.
NUM_GENELIST_ARTICLES = The number of PMC articles with supplementary Excel gene lists.
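ERROR_GENELISTS = Gene lists suspected to contain gene name errors. The dates and five-digit numbers indicate transmogrified gene names.
NUM_ERROR_GENELISTS = Number of Excel gene lists with errors. Unlike NUM_GENELISTS, this counts every flagged gene list, so a single Excel file can contribute more than once.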
NUM_ERROR_GENELIST_ARTICLES = Number of articles with supplementary Excel gene name errors.
ERROR_PROPORTION = The proportion of articles with Excel gene lists that have errors (NUM_ERROR_GENELIST_ARTICLES / NUM_GENELIST_ARTICLES).
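To make these definitions concrete, here is a sketch that applies the same counting rules to a toy results vector. The line layout (PMC ID and file path; then species for gene lists; then an error count and the flagged values for lists with errors) is inferred from this report's code and printed output, not from gene_names.sh itself, and the file names are made up.

```r
# Toy stand-in for results.txt (hypothetical file names; layout inferred
# from the code and output in this report).
results <- c(
  "PMC1 /pmc/articles/PMC1/bin/a.xlsx",
  "PMC1 /pmc/articles/PMC1/bin/b.xlsx Hsapiens",
  "PMC2 /pmc/articles/PMC2/bin/c.xlsx Hsapiens 2 43891 44075",
  "PMC2 /pmc/articles/PMC2/bin/c.xlsx Hsapiens 1 44166",
  "PMC3 /pmc/articles/PMC3/bin/d.docx"
)
# Pull the n-th space-separated field from each line
field <- function(lines, n) sapply(strsplit(lines, " "), "[[", n)

xls <- results[grepl("XLS", results, ignore.case = TRUE)]
n_fields <- lengths(strsplit(xls, " "))

num_xls <- length(xls)                                # Excel entries
num_xls_articles <- length(unique(field(xls, 1)))     # PMC1, PMC2
genelists <- xls[n_fields > 2]                        # has a species field
num_genelists <- length(unique(field(genelists, 2)))  # distinct files
num_genelist_articles <- length(unique(field(genelists, 1)))
error_genelists <- xls[n_fields > 3]                  # has flagged values
num_error_genelists <- length(error_genelists)        # counts entries
num_error_files <- length(unique(field(error_genelists, 2)))
error_articles <- unique(field(error_genelists, 1))
error_proportion <- length(error_articles) / num_genelist_articles
```

In this toy input c.xlsx yields two flagged entries, so num_error_genelists is 2 while num_error_files is 1.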
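# The call below is commented out, so results.txt must already exist from an earlier run of gene_names.sh
#system("./gene_names.sh pmc.txt")
results <- readLines("results.txt")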
XLS <- results[grep("XLS",results,ignore.case=TRUE)]
NUM_XLS = length(XLS)
NUM_XLS
## [1] 3169
NUM_XLS_ARTICLES = length(unique(sapply(strsplit(XLS," "),"[[",1)))
NUM_XLS_ARTICLES
## [1] 648
GENELISTS <- XLS[lengths(strsplit(XLS," "))>2]
#GENELISTS
NUM_GENELISTS <- length(unique(sapply(strsplit(GENELISTS," "),"[[",2)))
NUM_GENELISTS
## [1] 398
NUM_GENELIST_ARTICLES <- length(unique(sapply(strsplit(GENELISTS," "),"[[",1)))
NUM_GENELIST_ARTICLES
## [1] 221
ERROR_GENELISTS <- XLS[lengths(strsplit(XLS," "))>3]
#ERROR_GENELISTS
NUM_ERROR_GENELISTS = length(ERROR_GENELISTS)
NUM_ERROR_GENELISTS
## [1] 204
GENELIST_ERROR_ARTICLES <- unique(sapply(strsplit(ERROR_GENELISTS," "),"[[",1))
GENELIST_ERROR_ARTICLES
## [1] "PMC7842022" "PMC7836161" "PMC7825165" "PMC7822813" "PMC7821633"
## [6] "PMC7817693" "PMC7815088" "PMC7814551" "PMC7844323" "PMC7826362"
## [11] "PMC7812929" "PMC7807721" "PMC7838615" "PMC7815943" "PMC7801399"
## [16] "PMC7821388" "PMC7820775" "PMC7819609" "PMC7794586" "PMC7794364"
## [21] "PMC7791999" "PMC7791837" "PMC7792223" "PMC7791882" "PMC7789526"
## [26] "PMC7788918" "PMC7788893" "PMC7788839" "PMC7788826" "PMC7788777"
## [31] "PMC7788743" "PMC7786480" "PMC7812315" "PMC7782737" "PMC7782510"
## [36] "PMC7780630" "PMC7797533" "PMC7773562" "PMC7811140" "PMC7793309"
## [41] "PMC7793258" "PMC7824480" "PMC7790417" "PMC7788095" "PMC7785587"
## [46] "PMC7757804" "PMC7773550" "PMC7779551" "PMC7745987" "PMC7744083"
## [51] "PMC7773997" "PMC7773990" "PMC7793715" "PMC7758050" "PMC7782839"
## [56] "PMC7796900" "PMC7781436" "PMC7784491" "PMC7803632" "PMC7762488"
## [61] "PMC7746351" "PMC7842189" "PMC7782095" "PMC7764503" "PMC7774731"
## [66] "PMC7790754" "PMC7759458" "PMC7818422" "PMC7116647"
NUM_ERROR_GENELIST_ARTICLES <- length(GENELIST_ERROR_ARTICLES)
NUM_ERROR_GENELIST_ARTICLES
## [1] 69
ERROR_PROPORTION = NUM_ERROR_GENELIST_ARTICLES / NUM_GENELIST_ARTICLES
ERROR_PROPORTION
## [1] 0.3122172
Here you can look at the gene lists with errors detected in the past month (uncomment GENELISTS to print every gene list detected). The dates are obvious errors; they mostly fall in September, March, December and October. The five-digit numbers are dates stored in Excel's internal serial format, which is roughly the number of days since 1 January 1900. If you were to paste these numbers into Excel and format the cells as dates, most of them would also map to dates in September, March, December and October.
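The same check can be done in R instead of Excel. The sketch below converts both kinds of flagged values back to dates; the helper name flagged_to_date is hypothetical, and it assumes Excel's 1900 date system, for which the origin 1899-12-30 gives correct dates from serial 61 onward (every serial in this report is far above that).

```r
# Convert flagged values to dates (hypothetical helper).
# Five-digit values are treated as Excel 1900-system serials; the
# 1899-12-30 origin compensates for Excel's fictitious 29 Feb 1900.
flagged_to_date <- function(tokens) {
  out <- rep(as.Date(NA), length(tokens))
  serial <- grepl("^[0-9]{5}$", tokens)                 # e.g. "43891"
  ymd <- grepl("^[0-9]{4}/[0-9]{2}/[0-9]{2}$", tokens)  # e.g. "2020/09/15"
  out[serial] <- as.Date(as.numeric(tokens[serial]), origin = "1899-12-30")
  out[ymd] <- as.Date(tokens[ymd], format = "%Y/%m/%d")
  out  # anything else stays NA
}

# Values copied from the error listing below
tokens <- c("43891", "44075", "44166", "36951", "2020/09/15")
dates <- flagged_to_date(tokens)
dates
# Tally by month; month.name avoids locale-dependent abbreviations
table(month.name[as.integer(format(dates, "%m"))])
```

Here 43891 and 36951 both land on 1 March (of 2020 and 2001 respectively), and 44075 and 44166 land on 1 September and 1 December 2020, in line with the months named above.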
#GENELISTS
ERROR_GENELISTS
## [1] "PMC7842022 /pmc/articles/PMC7842022/bin/12935_2021_1774_MOESM3_ESM.xlsx Hsapiens 1 44081"
## [2] "PMC7842022 /pmc/articles/PMC7842022/bin/12935_2021_1774_MOESM4_ESM.xlsx Hsapiens 27 43891 43898 44077 44075 43901 44166 44079 43899 44076 43894 44080 43896 43895 44081 44086 43897 43891 44083 44078 43892 44082 44084 44085 43892 44088 43900 43893"
## [3] "PMC7836161 /pmc/articles/PMC7836161/bin/13059_2020_2252_MOESM5_ESM.xlsx Hsapiens 5 43710 43710 43710 43710 43717"
## [4] "PMC7825165 /pmc/articles/PMC7825165/bin/12864_2021_7387_MOESM4_ESM.xlsx Ggallus 2 43715 43717"
## [5] "PMC7822813 /pmc/articles/PMC7822813/bin/41467_2020_20809_MOESM6_ESM.xlsx Mmusculus 4 37316 39142 39692 37681"
## [6] "PMC7822813 /pmc/articles/PMC7822813/bin/41467_2020_20809_MOESM6_ESM.xlsx Mmusculus 1 39508"
## [7] "PMC7821633 /pmc/articles/PMC7821633/bin/12860_2021_346_MOESM4_ESM.xlsx Hsapiens 180 44081 44076 44083 43896 44080 43900 43893 44079 44082 43897 44088 44077 44078 43894 43892 44086 43899 43898 43895 43892 44075 44085 44084 43901 43891 43891 43892 44077 44078 43892 44081 44080 43897 44085 43899 44086 43894 43891 43896 44088 44082 43898 44076 44166 43900 43893 44075 44089 43901 44083 44079 43891 44084 43895 43898 44083 44083 43900 43892 44076 43895 44081 43901 44078 44166 43896 43891 44082 43898 44077 43892 43891 44079 44084 44080 44086 44085 43897 43894 43893 44075 43899 44075 44083 43891 43896 43901 43892 43892 43897 44086 43895 43900 44080 43898 44085 44078 43894 44076 44081 43899 44075 44081 44082 44084 44078 43901 44077 44086 43895 43898 44076 43891 43897 43893 44088 43891 44089 43896 43894 43899 44080 44085 44083 44079 43892 43900 43892 43892 43891 43898 43899 44086 44075 44078 43900 43892 43897 43894 44079 44077 44085 43891 43896 43901 43893 44082 44081 44088 44166 44080 44076 43895 44085 44083 44086 43891 43897 44081 43900 44078 44082 44077 43892 43896 43898 43894 43895 44075 44076 43893 43901 43891 44079 43892 44088 44080 43899 44084 44089"
## [8] "PMC7821633 /pmc/articles/PMC7821633/bin/12860_2021_346_MOESM8_ESM.xlsx Hsapiens 147 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082 44082"
## [9] "PMC7817693 /pmc/articles/PMC7817693/bin/41467_2020_20783_MOESM3_ESM.xlsx Hsapiens 1 41334"
## [10] "PMC7815088 /pmc/articles/PMC7815088/bin/pone.0245526.s012.xlsx Hsapiens 4 43353 43170 43346 43349"
## [11] "PMC7814551 /pmc/articles/PMC7814551/bin/12967_2020_2698_MOESM16_ESM.xls Hsapiens 7 2020/03/02 2020/03/06 2020/09/15 2020/09/01 2020/09/02 2020/09/06 2020/09/09"
## [12] "PMC7814551 /pmc/articles/PMC7814551/bin/12967_2020_2698_MOESM16_ESM.xls Hsapiens 6 2020/03/08 2020/03/01 2020/03/02 2020/03/06 2020/03/07 2020/09/15"
## [13] "PMC7814551 /pmc/articles/PMC7814551/bin/12967_2020_2698_MOESM16_ESM.xls Hsapiens 2 2020/03/06 2020/03/07"
## [14] "PMC7814551 /pmc/articles/PMC7814551/bin/12967_2020_2698_MOESM16_ESM.xls Hsapiens 9 2020/03/01 2020/03/06 2020/03/09 2020/09/15 2020/09/01 2020/09/02 2020/09/06 2020/09/07 2020/09/09"
## [15] "PMC7814551 /pmc/articles/PMC7814551/bin/12967_2020_2698_MOESM2_ESM.xls Hsapiens 13 2020/12/01 2020/03/01 2020/03/10 2020/03/11 2020/03/02 2020/03/03 2020/03/04 2020/03/05 2020/03/06 2020/03/07 2020/03/08 2020/03/09 2020/09/15"
## [16] "PMC7814551 /pmc/articles/PMC7814551/bin/12967_2020_2698_MOESM2_ESM.xls Hsapiens 13 2020/12/01 2020/03/01 2020/03/10 2020/03/11 2020/03/02 2020/03/03 2020/03/04 2020/03/05 2020/03/06 2020/03/07 2020/03/08 2020/03/09 2020/09/15"
## [17] "PMC7814551 /pmc/articles/PMC7814551/bin/12967_2020_2698_MOESM2_ESM.xls Hsapiens 13 2020/12/01 2020/03/01 2020/03/10 2020/03/11 2020/03/02 2020/03/03 2020/03/04 2020/03/05 2020/03/06 2020/03/07 2020/03/08 2020/03/09 2020/09/15"
## [18] "PMC7814551 /pmc/articles/PMC7814551/bin/12967_2020_2698_MOESM2_ESM.xls Hsapiens 13 2020/12/01 2020/03/01 2020/03/10 2020/03/11 2020/03/02 2020/03/03 2020/03/04 2020/03/05 2020/03/06 2020/03/07 2020/03/08 2020/03/09 2020/09/15"
## [19] "PMC7814551 /pmc/articles/PMC7814551/bin/12967_2020_2698_MOESM2_ESM.xls Hsapiens 13 2020/12/01 2020/03/01 2020/03/10 2020/03/11 2020/03/02 2020/03/03 2020/03/04 2020/03/05 2020/03/06 2020/03/07 2020/03/08 2020/03/09 2020/09/15"
## [20] "PMC7814551 /pmc/articles/PMC7814551/bin/12967_2020_2698_MOESM2_ESM.xls Hsapiens 13 2020/12/01 2020/03/01 2020/03/10 2020/03/11 2020/03/02 2020/03/03 2020/03/04 2020/03/05 2020/03/06 2020/03/07 2020/03/08 2020/03/09 2020/09/15"
## [21] "PMC7814551 /pmc/articles/PMC7814551/bin/12967_2020_2698_MOESM3_ESM.xls Hsapiens 7 2020/03/02 2020/03/06 2020/09/15 2020/09/01 2020/09/02 2020/09/06 2020/09/09"
## [22] "PMC7814551 /pmc/articles/PMC7814551/bin/12967_2020_2698_MOESM3_ESM.xls Hsapiens 9 2020/03/01 2020/03/06 2020/03/09 2020/09/15 2020/09/01 2020/09/02 2020/09/06 2020/09/07 2020/09/09"
## [23] "PMC7814551 /pmc/articles/PMC7814551/bin/12967_2020_2698_MOESM3_ESM.xls Hsapiens 2 2020/03/06 2020/03/07"
## [24] "PMC7814551 /pmc/articles/PMC7814551/bin/12967_2020_2698_MOESM3_ESM.xls Hsapiens 6 2020/03/01 2020/03/02 2020/03/06 2020/03/07 2020/03/08 2020/09/15"
## [25] "PMC7814551 /pmc/articles/PMC7814551/bin/12967_2020_2698_MOESM5_ESM.xls Hsapiens 26 2020/12/01 2020/03/01 2020/03/02 2020/03/01 2020/03/10 2020/03/11 2020/03/02 2020/03/03 2020/03/04 2020/03/05 2020/03/06 2020/03/07 2020/03/08 2020/03/09 2020/09/15 2020/09/01 2020/09/10 2020/09/11 2020/09/12 2020/09/02 2020/09/03 2020/09/04 2020/09/06 2020/09/07 2020/09/08 2020/09/09"
## [26] "PMC7814551 /pmc/articles/PMC7814551/bin/12967_2020_2698_MOESM5_ESM.xls Hsapiens 26 2020/12/01 2020/03/01 2020/03/02 2020/03/01 2020/03/10 2020/03/11 2020/03/02 2020/03/03 2020/03/04 2020/03/05 2020/03/06 2020/03/07 2020/03/08 2020/03/09 2020/09/15 2020/09/01 2020/09/10 2020/09/11 2020/09/12 2020/09/02 2020/09/03 2020/09/04 2020/09/06 2020/09/07 2020/09/08 2020/09/09"
## [27] "PMC7814551 /pmc/articles/PMC7814551/bin/12967_2020_2698_MOESM5_ESM.xls Hsapiens 13 2020/12/01 2020/03/01 2020/03/10 2020/03/11 2020/03/02 2020/03/03 2020/03/04 2020/03/05 2020/03/06 2020/03/07 2020/03/08 2020/03/09 2020/09/15"
## [28] "PMC7814551 /pmc/articles/PMC7814551/bin/12967_2020_2698_MOESM9_ESM.xls Hsapiens 14 2020/03/09 2020/03/01 2020/09/15 2020/03/08 2020/03/10 2020/03/07 2020/12/01 2020/03/02 2020/03/06 2020/03/11 2020/03/05 2020/03/11 2020/03/04 2020/03/03"
## [29] "PMC7814551 /pmc/articles/PMC7814551/bin/12967_2020_2698_MOESM9_ESM.xls Hsapiens 13 2020/03/06 2020/03/02 2020/03/05 2020/03/07 2020/12/01 2020/03/08 2020/03/04 2020/03/11 2020/03/01 2020/09/15 2020/03/09 2020/03/03 2020/03/10"
## [30] "PMC7814551 /pmc/articles/PMC7814551/bin/12967_2020_2698_MOESM9_ESM.xls Hsapiens 20 2020/03/06 2020/03/03 2020/03/08 2020/03/03 2020/03/07 2020/03/06 2020/03/06 2020/03/05 2020/03/07 2020/03/06 2020/03/01 2020/03/10 2020/03/01 2020/12/01 2020/03/09 2020/03/06 2020/03/04 2020/03/09 2020/03/11 2020/03/01"
## [31] "PMC7814551 /pmc/articles/PMC7814551/bin/12967_2020_2698_MOESM9_ESM.xls Hsapiens 31 2020/03/07 2020/03/06 2020/09/15 2020/03/02 2020/03/06 2020/03/06 2020/03/07 2020/03/01 2020/03/07 2020/03/10 2020/03/06 2020/03/01 2020/03/04 2020/03/08 2020/03/07 2020/12/01 2020/03/11 2020/03/05 2020/03/07 2020/03/03 2020/03/09 2020/03/10 2020/03/08 2020/03/08 2020/03/06 2020/03/01 2020/03/06 2020/03/09 2020/03/03 2020/03/01 2020/03/05"
## [32] "PMC7814551 /pmc/articles/PMC7814551/bin/12967_2020_2698_MOESM9_ESM.xls Hsapiens 16 2020/03/02 2020/03/08 2020/03/08 2020/03/01 2020/03/11 2020/03/11 2020/03/06 2020/03/06 2020/03/06 2020/03/05 2020/03/05 2020/03/10 2020/03/02 2020/03/04 2020/03/08 2020/03/03"
## [33] "PMC7814551 /pmc/articles/PMC7814551/bin/12967_2020_2698_MOESM9_ESM.xls Hsapiens 25 2020/03/06 2020/03/01 2020/03/05 2020/03/09 2020/03/05 2020/03/10 2020/03/02 2020/03/11 2020/09/15 2020/03/06 2020/03/07 2020/03/08 2020/03/02 2020/03/02 2020/03/01 2020/03/03 2020/03/08 2020/03/10 2020/03/04 2020/03/02 2020/03/11 2020/03/06 2020/03/10 2020/12/01 2020/03/08"
## [34] "PMC7844323 /pmc/articles/PMC7844323/bin/Table_2.XLSX Hsapiens 1 43891"
## [35] "PMC7844323 /pmc/articles/PMC7844323/bin/Table_2.XLSX Hsapiens 1 43891"
## [36] "PMC7826362 /pmc/articles/PMC7826362/bin/pnas.2008890118.sd02.xlsx Mmusculus 5 39692 39692 39692 39692 39692"
## [37] "PMC7826362 /pmc/articles/PMC7826362/bin/pnas.2008890118.sd02.xlsx Mmusculus 1 39692"
## [38] "PMC7826362 /pmc/articles/PMC7826362/bin/pnas.2008890118.sd03.xlsx Mmusculus 24 37500 39508 40787 39142 38047 39326 40422 37865 37316 39873 38777 38596 40057 38412 37681 39692 40603 40238 38231 37135 36951 36951 38961 41153"
## [39] "PMC7812929 /pmc/articles/PMC7812929/bin/peerj-09-10671-s007.xlsx Hsapiens 30 43900 43900 43900 43900 43900 43900 43900 43900 43900 43900 43900 43900 43900 43900 43900 43900 43900 43900 43900 43900 43900 43900 43900 43900 43900 43900 43900 43900 43900 43900"
## [40] "PMC7807721 /pmc/articles/PMC7807721/bin/13059_2020_2258_MOESM10_ESM.xlsx Dmelanogaster 1 37104"
## [41] "PMC7807721 /pmc/articles/PMC7807721/bin/13059_2020_2258_MOESM10_ESM.xlsx Hsapiens 4 38231 40787 38231 39692"
## [42] "PMC7838615 /pmc/articles/PMC7838615/bin/Table_1.XLSX Hsapiens 1 44075"
## [43] "PMC7815943 /pmc/articles/PMC7815943/bin/mmc2.xlsx Mmusculus 6 44085 43891 44083 43893 43900 43894"
## [44] "PMC7815943 /pmc/articles/PMC7815943/bin/mmc2.xlsx Mmusculus 1 43891"
## [45] "PMC7801399 /pmc/articles/PMC7801399/bin/41598_2020_79538_MOESM7_ESM.xlsx Mmusculus 22 43527 43527 43527 43527 43718 43712 43525 43525 43525 43718 43535 43535 43535 43535 43527 43525 43530 43525 43525 43712 43715 43532"
## [46] "PMC7821388 /pmc/articles/PMC7821388/bin/DataSheet_1.xlsx Hsapiens 8 43892 43899 43895 43897 43892 43898 43896 43891"
## [47] "PMC7821388 /pmc/articles/PMC7821388/bin/DataSheet_2.xlsx Hsapiens 6 43892 43896 43892 43898 43891 43895"
## [48] "PMC7820775 /pmc/articles/PMC7820775/bin/Table_1.xlsx Hsapiens 16 42248 37316 40057 37500 38961 38777 37135 37135 36951 39873 39508 40787 38231 38412 36951 39326"
## [49] "PMC7819609 /pmc/articles/PMC7819609/bin/pgen.1009224.s002.xlsx Hsapiens 9 37316 37316 37316 40787 37316 40422 40422 41153 40422"
## [50] "PMC7794586 /pmc/articles/PMC7794586/bin/41467_2020_20460_MOESM8_ESM.xlsx Hsapiens 2 43896 44088"
## [51] "PMC7794364 /pmc/articles/PMC7794364/bin/41540_2020_161_MOESM3_ESM.xlsx Hsapiens 5 39142 37226 40238 40057 37316"
## [52] "PMC7794364 /pmc/articles/PMC7794364/bin/41540_2020_161_MOESM3_ESM.xlsx Hsapiens 13 39142 38047 36951 39692 37226 39873 37135 38231 40238 40057 37316 38961 39326"
## [53] "PMC7794364 /pmc/articles/PMC7794364/bin/41540_2020_161_MOESM3_ESM.xlsx Hsapiens 13 39142 38047 36951 39692 37226 39873 37135 38231 40238 40057 37316 38961 39326"
## [54] "PMC7794364 /pmc/articles/PMC7794364/bin/41540_2020_161_MOESM3_ESM.xlsx Hsapiens 14 39142 38047 36951 39692 37226 39873 37135 38231 40238 40057 37316 38961 39326 40787"
## [55] "PMC7794364 /pmc/articles/PMC7794364/bin/41540_2020_161_MOESM3_ESM.xlsx Hsapiens 13 39142 38047 36951 39692 37226 39873 37135 38231 40238 40057 37316 38961 39326"
## [56] "PMC7794364 /pmc/articles/PMC7794364/bin/41540_2020_161_MOESM3_ESM.xlsx Hsapiens 1 40787"
## [57] "PMC7791999 /pmc/articles/PMC7791999/bin/12864_2020_7334_MOESM2_ESM.xlsx Hsapiens 1 44085"
## [58] "PMC7791999 /pmc/articles/PMC7791999/bin/12864_2020_7334_MOESM2_ESM.xlsx Hsapiens 3 44085 44085 44083"
## [59] "PMC7791837 /pmc/articles/PMC7791837/bin/13059_2020_2208_MOESM3_ESM.xlsx Drerio 8 38412 39142 37500 37316 37681 39508 40422 38047"
## [60] "PMC7791837 /pmc/articles/PMC7791837/bin/13059_2020_2208_MOESM3_ESM.xlsx Drerio 16 39142 38777 42248 38961 37865 40603 36951 37316 39142 37500 38412 37681 39508 37316 40422 38047"
## [61] "PMC7792223 /pmc/articles/PMC7792223/bin/12864_2020_7329_MOESM23_ESM.xlsx Hsapiens 1 41153"
## [62] "PMC7791882 /pmc/articles/PMC7791882/bin/12920_2020_853_MOESM1_ESM.xlsx Hsapiens 1 44166"
## [63] "PMC7789526 /pmc/articles/PMC7789526/bin/12864_2020_7305_MOESM4_ESM.xlsx Ggallus 7 44082 44083 44081 43896 43896 43898 43894"
## [64] "PMC7789526 /pmc/articles/PMC7789526/bin/12864_2020_7305_MOESM5_ESM.xlsx Ggallus 1 43354"
## [65] "PMC7788918 /pmc/articles/PMC7788918/bin/13058_2020_1379_MOESM2_ESM.xlsx Hsapiens 4 42983 42984 42987 42980"
## [66] "PMC7788918 /pmc/articles/PMC7788918/bin/13058_2020_1379_MOESM2_ESM.xlsx Hsapiens 4 42984 42980 42987 42983"
## [67] "PMC7788893 /pmc/articles/PMC7788893/bin/12967_2020_2690_MOESM1_ESM.xlsx Hsapiens 19 44166 43892 44081 44082 43893 44084 43899 44083 43897 44085 43898 43896 44086 43894 44076 44089 43901 43891 44080"
## [68] "PMC7788893 /pmc/articles/PMC7788893/bin/12967_2020_2690_MOESM1_ESM.xlsx Hsapiens 19 43896 43898 44083 44076 44085 43897 43893 44084 43894 43901 44082 44166 43891 44086 44080 43892 43899 44089 44081"
## [69] "PMC7788893 /pmc/articles/PMC7788893/bin/12967_2020_2690_MOESM1_ESM.xlsx Hsapiens 19 43901 43894 43899 44080 44089 44086 44081 43892 43896 43898 44082 44085 44166 43893 44084 43891 44076 44083 43897"
## [70] "PMC7788893 /pmc/articles/PMC7788893/bin/12967_2020_2690_MOESM1_ESM.xlsx Hsapiens 19 43894 43901 44085 43891 44086 44089 44082 43892 44166 44080 44081 44083 43899 43898 44076 44084 43893 43896 43897"
## [71] "PMC7788893 /pmc/articles/PMC7788893/bin/12967_2020_2690_MOESM1_ESM.xlsx Hsapiens 19 43893 44080 43897 44076 44083 44081 43891 43894 43901 44082 43892 44084 44166 43898 44086 44085 44089 43899 43896"
## [72] "PMC7788893 /pmc/articles/PMC7788893/bin/12967_2020_2690_MOESM1_ESM.xlsx Hsapiens 19 44086 44081 44089 44082 43891 43892 43899 43898 43896 44084 43893 44076 44166 43897 43894 43901 44080 44083 44085"
## [73] "PMC7788893 /pmc/articles/PMC7788893/bin/12967_2020_2690_MOESM1_ESM.xlsx Hsapiens 19 44082 44089 43892 44084 43891 43898 44166 43899 44080 43896 44076 43901 43894 43897 44081 43893 44085 44083 44086"
## [74] "PMC7788839 /pmc/articles/PMC7788839/bin/12935_2020_1707_MOESM2_ESM.xlsx Hsapiens 1 43891"
## [75] "PMC7788839 /pmc/articles/PMC7788839/bin/12935_2020_1707_MOESM3_ESM.xlsx Hsapiens 1 43891"
## [76] "PMC7788839 /pmc/articles/PMC7788839/bin/12935_2020_1707_MOESM7_ESM.xlsx Hsapiens 2 44075 44080"
## [77] "PMC7788826 /pmc/articles/PMC7788826/bin/12864_2020_7186_MOESM7_ESM.xlsx Hsapiens 1 44078"
## [78] "PMC7788777 /pmc/articles/PMC7788777/bin/13072_2020_376_MOESM12_ESM.xlsx Mmusculus 2 44089 44076"
## [79] "PMC7788777 /pmc/articles/PMC7788777/bin/13072_2020_376_MOESM12_ESM.xlsx Mmusculus 1 44089"
## [80] "PMC7788777 /pmc/articles/PMC7788777/bin/13072_2020_376_MOESM12_ESM.xlsx Mmusculus 2 44076 44081"
## [81] "PMC7788777 /pmc/articles/PMC7788777/bin/13072_2020_376_MOESM1_ESM.xlsx Rnorvegicus 23 43527 43535 43535 43525 43535 43531 43525 43527 43717 43535 43527 43717 43534 43532 43717 43719 43719 43527 43715 43525 43709 43711 43525"
## [82] "PMC7788777 /pmc/articles/PMC7788777/bin/13072_2020_376_MOESM1_ESM.xlsx Rnorvegicus 7 43527 43532 43527 43531 43711 43527 43717"
## [83] "PMC7788743 /pmc/articles/PMC7788743/bin/12864_2020_7314_MOESM5_ESM.xlsx Hsapiens 3 44086 44075 44088"
## [84] "PMC7786480 /pmc/articles/PMC7786480/bin/12885_2020_7728_MOESM6_ESM.xlsx Hsapiens 1 43358"
## [85] "PMC7812315 /pmc/articles/PMC7812315/bin/LSA-2020-00864_TableS1.xlsx Mmusculus 2 38961 38231"
## [86] "PMC7812315 /pmc/articles/PMC7812315/bin/LSA-2020-00864_TableS1.xlsx Mmusculus 2 40422 38596"
## [87] "PMC7812315 /pmc/articles/PMC7812315/bin/LSA-2020-00864_TableS1.xlsx Mmusculus 1 40057"
## [88] "PMC7812315 /pmc/articles/PMC7812315/bin/LSA-2020-00864_TableS1.xlsx Mmusculus 3 38777 38231 37865"
## [89] "PMC7812315 /pmc/articles/PMC7812315/bin/LSA-2020-00864_TableS1.xlsx Mmusculus 3 41883 40238 39692"
## [90] "PMC7812315 /pmc/articles/PMC7812315/bin/LSA-2020-00864_TableS1.xlsx Mmusculus 2 40422 37865"
## [91] "PMC7812315 /pmc/articles/PMC7812315/bin/LSA-2020-00864_TableS1.xlsx Mmusculus 3 40422 39326 40238"
## [92] "PMC7812315 /pmc/articles/PMC7812315/bin/LSA-2020-00864_TableS1.xlsx Mmusculus 2 39142 36951"
## [93] "PMC7812315 /pmc/articles/PMC7812315/bin/LSA-2020-00864_TableS1.xlsx Mmusculus 5 40787 40057 40238 39142 38047"
## [94] "PMC7812315 /pmc/articles/PMC7812315/bin/LSA-2020-00864_TableS1.xlsx Mmusculus 2 40057 38777"
## [95] "PMC7812315 /pmc/articles/PMC7812315/bin/LSA-2020-00864_TableS1.xlsx Mmusculus 3 40238 38777 38047"
## [96] "PMC7812315 /pmc/articles/PMC7812315/bin/LSA-2020-00864_TableS1.xlsx Mmusculus 3 39326 39142 40238"
## [97] "PMC7812315 /pmc/articles/PMC7812315/bin/LSA-2020-00864_TableS1.xlsx Mmusculus 1 37681"
## [98] "PMC7782737 /pmc/articles/PMC7782737/bin/41467_2020_20236_MOESM5_ESM.xlsx Hsapiens 6 43717 43717 43526 43525 43525 43525"
## [99] "PMC7782737 /pmc/articles/PMC7782737/bin/41467_2020_20236_MOESM5_ESM.xlsx Hsapiens 11 43717 43528 43525 43525 43525 43525 43525 43535 43800 43800 43800"
## [100] "PMC7782510 /pmc/articles/PMC7782510/bin/41467_2020_20282_MOESM6_ESM.xlsx Mmusculus 21 37865 39873 38412 38231 38961 39508 40057 37681 38777 36951 39326 39142 37316 37316 38596 37135 39692 37500 40603 40787 40422"
## [101] "PMC7782510 /pmc/articles/PMC7782510/bin/41467_2020_20282_MOESM7_ESM.xlsx Mmusculus 13 37316 37316 40787 39326 38596 37681 37681 37500 39142 39142 39326 40057 37681"
## [102] "PMC7780630 /pmc/articles/PMC7780630/bin/13059_2020_2230_MOESM1_ESM.xlsx Hsapiens 1 40057"
## [103] "PMC7797533 /pmc/articles/PMC7797533/bin/ijmsv18p0706s3.xlsx Hsapiens 14 40973 40972 40973 40974 40976 41160 41162 41162 41160 40977 41163 41163 40974 40971"
## [104] "PMC7773562 /pmc/articles/PMC7773562/bin/mmc1.xlsx Hsapiens 1 37257"
## [105] "PMC7811140 /pmc/articles/PMC7811140/bin/mmc2.xlsx Hsapiens 11 44079 44085 43893 44078 44078 44083 44083 44076 44081 43895 44084"
## [106] "PMC7811140 /pmc/articles/PMC7811140/bin/mmc2.xlsx Hsapiens 5 44083 44166 43900 44078 43900"
## [107] "PMC7811140 /pmc/articles/PMC7811140/bin/mmc4.xlsx Hsapiens 28 44089 44083 44082 44085 44081 43899 44079 44084 44076 44080 43892 43898 43897 43895 43893 43892 43896 44086 44078 43891 44075 43891 44077 44166 43901 43900 43894 44088"
## [108] "PMC7811140 /pmc/articles/PMC7811140/bin/mmc4.xlsx Hsapiens 28 44088 43894 43900 43901 44166 44077 43891 43891 44075 44078 44086 43895 43897 44084 43893 43892 44080 43892 44081 44076 44089 43896 43898 44085 43899 44083 44082 44079"
## [109] "PMC7793309 /pmc/articles/PMC7793309/bin/pgen.1009270.s015.xlsx Hsapiens 1 43526"
## [110] "PMC7793309 /pmc/articles/PMC7793309/bin/pgen.1009270.s015.xlsx Ggallus 4 43713 43714 43717 43526"
## [111] "PMC7793309 /pmc/articles/PMC7793309/bin/pgen.1009270.s016.xlsx Hsapiens 1 44086"
## [112] "PMC7793258 /pmc/articles/PMC7793258/bin/pgen.1009235.s010.xlsx Dmelanogaster 1 44076"
## [113] "PMC7793258 /pmc/articles/PMC7793258/bin/pgen.1009235.s010.xlsx Dmelanogaster 1 44075"
## [114] "PMC7824480 /pmc/articles/PMC7824480/bin/cells-10-00025-s001.xlsx Hsapiens 27 43892 43891 44084 43897 43894 44076 44085 43891 43896 43901 43893 44082 44081 44088 44166 43898 43895 43899 44086 44075 44078 43900 44083 43892 44079 44077 44080"
## [115] "PMC7790417 /pmc/articles/PMC7790417/bin/pcbi.1008491.s010.xls Hsapiens 1 40972"
## [116] "PMC7790417 /pmc/articles/PMC7790417/bin/pcbi.1008491.s010.xls Hsapiens 1 36950"
## [117] "PMC7790417 /pmc/articles/PMC7790417/bin/pcbi.1008491.s010.xls Hsapiens 1 36950"
## [118] "PMC7790417 /pmc/articles/PMC7790417/bin/pcbi.1008491.s014.xlsx Hsapiens 2 37865 38231"
## [119] "PMC7790417 /pmc/articles/PMC7790417/bin/pcbi.1008491.s014.xlsx Hsapiens 1 37865"
## [120] "PMC7788095 /pmc/articles/PMC7788095/bin/mmc3.xlsx Hsapiens 1 38777"
## [121] "PMC7788095 /pmc/articles/PMC7788095/bin/mmc3.xlsx Hsapiens 1 38777"
## [122] "PMC7788095 /pmc/articles/PMC7788095/bin/mmc4.xlsx Hsapiens 1 38777"
## [123] "PMC7788095 /pmc/articles/PMC7788095/bin/mmc4.xlsx Hsapiens 1 38777"
## [124] "PMC7785587 /pmc/articles/PMC7785587/bin/Data_Sheet_1.xlsx Hsapiens 1 43891"
## [125] "PMC7785587 /pmc/articles/PMC7785587/bin/Data_Sheet_1.xlsx Hsapiens 1 43891"
## [126] "PMC7785587 /pmc/articles/PMC7785587/bin/Data_Sheet_1.xlsx Hsapiens 1 43891"
## [127] "PMC7757804 /pmc/articles/PMC7757804/bin/pntd.0008883.s010.xlsx Hsapiens 2 44077 44083"
## [128] "PMC7773550 /pmc/articles/PMC7773550/bin/mmc2.xlsx Hsapiens 22 37135 38961 38961 37135 37135 37500 40057 40057 40057 40057 40057 40057 37135 39692 40057 38961 39326 40057 38961 37135 40057 37135"
## [129] "PMC7773550 /pmc/articles/PMC7773550/bin/mmc3.xlsx Hsapiens 26 37135 38961 38961 37135 40057 40057 37500 40057 40057 37135 40057 40057 38961 39326 37135 40057 38961 39692 37135 40057 37135 40057 37135 37135 37135 40057"
## [130] "PMC7773550 /pmc/articles/PMC7773550/bin/mmc3.xlsx Hsapiens 3 38961 37135 38961"
## [131] "PMC7773550 /pmc/articles/PMC7773550/bin/mmc3.xlsx Hsapiens 3 38961 37135 38961"
## [132] "PMC7773550 /pmc/articles/PMC7773550/bin/mmc3.xlsx Hsapiens 1 38961"
## [133] "PMC7773550 /pmc/articles/PMC7773550/bin/mmc4.xlsx Hsapiens 3 44083 44083 44083"
## [134] "PMC7779551 /pmc/articles/PMC7779551/bin/Table_2.XLSX Mmusculus 1 44083"
## [135] "PMC7745987 zip/S5_Table_5.1.xlsx Hsapiens 12 43715 43526 43719 43530 43712 43527 43717 43530 43714 43715 43718 43714"
## [136] "PMC7745987 zip/S5_Table_5.1.xlsx Hsapiens 1 43527"
## [137] "PMC7745987 zip/S5_Table_5.1.xlsx Hsapiens 2 43717 43530"
## [138] "PMC7745987 zip/S5_Table_5.1.xlsx Hsapiens 1 43719"
## [139] "PMC7745987 zip/S5_Table_5.2.xlsx Hsapiens 2 43712 43710"
## [140] "PMC7744083 /pmc/articles/PMC7744083/bin/abc5629_Data_file_S1.xlsx Mmusculus 1 36951"
## [141] "PMC7744083 /pmc/articles/PMC7744083/bin/abc5629_Data_file_S3.xlsx Mmusculus 5 36892 37104 37469 37834 38200"
## [142] "PMC7744083 /pmc/articles/PMC7744083/bin/abc5629_Data_file_S3.xlsx Mmusculus 5 36892 37104 37469 37834 38200"
## [143] "PMC7744083 /pmc/articles/PMC7744083/bin/abc5629_Data_file_S4.xlsx Mmusculus 2 43352 43167"
## [144] "PMC7744083 /pmc/articles/PMC7744083/bin/abc5629_Data_file_S4.xlsx Mmusculus 2 43352 43167"
## [145] "PMC7744083 /pmc/articles/PMC7744083/bin/abc5629_Data_file_S4.xlsx Mmusculus 14 43353 43353 43168 43347 43169 43352 43352 43352 43352 43352 43354 43354 43167 43349"
## [146] "PMC7744083 /pmc/articles/PMC7744083/bin/abc5629_Data_file_S4.xlsx Mmusculus 2 43353 43167"
## [147] "PMC7744083 /pmc/articles/PMC7744083/bin/abc5629_Data_file_S5.xlsx Mmusculus 18 43345 43161 43168 43351 43352 43352 43165 43348 43161 43162 43164 43166 43354 43357 43167 43350 43350 43349"
## [148] "PMC7744083 /pmc/articles/PMC7744083/bin/abc5629_Data_file_S5.xlsx Mmusculus 18 43345 43161 43168 43351 43352 43352 43165 43348 43161 43162 43164 43166 43354 43357 43167 43350 43350 43349"
## [149] "PMC7744083 /pmc/articles/PMC7744083/bin/abc5629_Data_file_S5.xlsx Mmusculus 18 43345 43161 43168 43351 43352 43352 43165 43348 43161 43162 43164 43166 43354 43357 43167 43350 43350 43349"
## [150] "PMC7744083 /pmc/articles/PMC7744083/bin/abc5629_Data_file_S5.xlsx Mmusculus 18 43345 43161 43168 43351 43352 43352 43165 43348 43161 43162 43164 43166 43354 43357 43167 43350 43350 43349"
## [151] "PMC7773997 /pmc/articles/PMC7773997/bin/mBio.03044-20-sd001.xlsx Scerevisiae 1 44105"
## [152] "PMC7773990 /pmc/articles/PMC7773990/bin/mBio.02664-20-sd001.xlsx Mmusculus 7 43532 43531 43719 43717 43530 43526 43710"
## [153] "PMC7773990 /pmc/articles/PMC7773990/bin/mBio.02664-20-sd001.xlsx Mmusculus 4 43354 43345 43166 43165"
## [154] "PMC7793715 /pmc/articles/PMC7793715/bin/Table_1.XLSX Hsapiens 8 43892 44086 44077 43892 44088 44075 43900 44078"
## [155] "PMC7793715 /pmc/articles/PMC7793715/bin/Table_1.XLSX Hsapiens 18 44080 43897 44083 44085 43895 43898 44089 44082 44084 44081 44079 43891 43891 44076 44166 43899 43893 43896"
## [156] "PMC7793715 /pmc/articles/PMC7793715/bin/Table_2.XLSX Hsapiens 1 44080"
## [157] "PMC7793715 /pmc/articles/PMC7793715/bin/Table_3.XLSX Hsapiens 26 43892 44086 44077 43892 44088 44075 43900 44078 43896 43893 43899 44166 44076 43891 43891 44079 44081 44084 44082 44089 43898 43895 44085 44083 43897 44080"
## [158] "PMC7758050 /pmc/articles/PMC7758050/bin/pbio.3000975.s010.xlsx Ggallus 1 42251"
## [159] "PMC7782839 /pmc/articles/PMC7782839/bin/41416_2020_1178_MOESM3_ESM.xlsx Hsapiens 16 44078 44080 44075 43891 43892 43894 44081 44085 44079 44076 43893 43897 43895 44082 43899 43896"
## [160] "PMC7796900 /pmc/articles/PMC7796900/bin/mmc1.xlsx Hsapiens 28 44166 43891 43892 43891 43900 43901 43892 43893 43894 43895 43896 43897 43898 43899 44089 44075 44084 44085 44086 44088 44076 44077 44078 44079 44080 44081 44082 44083"
## [161] "PMC7796900 /pmc/articles/PMC7796900/bin/mmc1.xlsx Hsapiens 28 44166 43891 43892 43891 43900 43901 43892 43893 43894 43895 43896 43897 43898 43899 44089 44075 44084 44085 44086 44088 44076 44077 44078 44079 44080 44081 44082 44083"
## [162] "PMC7796900 /pmc/articles/PMC7796900/bin/mmc1.xlsx Hsapiens 28 44166 43891 43892 43891 43900 43901 43892 43893 43894 43895 43896 43897 43898 43899 44089 44075 44084 44085 44086 44088 44076 44077 44078 44079 44080 44081 44082 44083"
## [163] "PMC7796900 /pmc/articles/PMC7796900/bin/mmc1.xlsx Hsapiens 28 44166 43891 43892 43891 43900 43901 43892 43893 43894 43895 43896 43897 43898 43899 44089 44075 44084 44085 44086 44088 44076 44077 44078 44079 44080 44081 44082 44083"
## [164] "PMC7796900 /pmc/articles/PMC7796900/bin/mmc1.xlsx Hsapiens 28 44166 43891 43892 43891 43900 43901 43892 43893 43894 43895 43896 43897 43898 43899 44089 44075 44084 44085 44086 44088 44076 44077 44078 44079 44080 44081 44082 44083"
## [165] "PMC7781436 /pmc/articles/PMC7781436/bin/NIHMS1653552-supplement-Table_S3.xlsx Hsapiens 99 40603 40238 40603 38047 38961 38596 38596 39692 38596 39873 38596 38596 38596 37226 38047 40787 38596 38047 38596 38961 38596 38596 40238 38047 38047 38596 39873 38596 37316 40603 38777 37226 38596 39873 38596 38596 37865 38596 38047 38231 38596 38047 40422 37226 37316 39692 38596 37226 38596 40057 38047 38596 38596 38596 38047 38596 38596 37226 38596 38047 38047 38047 37226 38596 38047 37226 38047 38047 38596 38047 38596 40422 38596 38596 38596 38047 38596 37865 38596 37865 38596 40422 38596 38047 40422 37226 37226 37226 37226 40603 38047 38596 39692 37226 40238 38047 38596 38047 37226"
## [166] "PMC7781436 /pmc/articles/PMC7781436/bin/NIHMS1653552-supplement-Table_S3.xlsx Hsapiens 27 38596 38047 37226 40603 37865 40238 40422 39692 38231 39873 37316 40057 36951 38961 37681 40787 36951 39508 38777 41883 41153 38412 37135 39326 39142 37500 37316"
## [167] "PMC7781436 /pmc/articles/PMC7781436/bin/NIHMS1653552-supplement-Table_S5.xlsx Hsapiens 2 38596 38047"
## [168] "PMC7781436 /pmc/articles/PMC7781436/bin/NIHMS1653552-supplement-Table_S5.xlsx Hsapiens 3 40057 38047 36951"
## [169] "PMC7781436 /pmc/articles/PMC7781436/bin/NIHMS1653552-supplement-Table_S5.xlsx Hsapiens 2 38596 40787"
## [170] "PMC7781436 /pmc/articles/PMC7781436/bin/NIHMS1653552-supplement-Table_S5.xlsx Hsapiens 1 38596"
## [171] "PMC7781436 /pmc/articles/PMC7781436/bin/NIHMS1653552-supplement-Table_S5.xlsx Hsapiens 1 40057"
## [172] "PMC7781436 /pmc/articles/PMC7781436/bin/NIHMS1653552-supplement-Table_S5.xlsx Hsapiens 1 38596"
## [173] "PMC7781436 /pmc/articles/PMC7781436/bin/NIHMS1653552-supplement-Table_S5.xlsx Hsapiens 2 38961 36951"
## [174] "PMC7781436 /pmc/articles/PMC7781436/bin/NIHMS1653552-supplement-Table_S5.xlsx Hsapiens 3 40422 40057 38777"
## [175] "PMC7781436 /pmc/articles/PMC7781436/bin/NIHMS1653552-supplement-Table_S5.xlsx Hsapiens 1 38596"
## [176] "PMC7781436 /pmc/articles/PMC7781436/bin/NIHMS1653552-supplement-Table_S5.xlsx Hsapiens 1 40057"
## [177] "PMC7781436 /pmc/articles/PMC7781436/bin/NIHMS1653552-supplement-Table_S5.xlsx Hsapiens 1 37500"
## [178] "PMC7784491 /pmc/articles/PMC7784491/bin/supp_mcs.a005710_Supplemental_Table_1.xlsx Hsapiens 84 37226 36951 37316 36951 40238 40603 37316 37681 38047 38412 38777 39142 39508 39873 42248 37135 40422 40787 41153 41883 37500 37865 38231 38596 38961 39326 39692 40057 37226 36951 37316 36951 40238 40603 37316 37681 38047 38412 38777 39142 39508 39873 42248 37135 40422 40787 41153 41883 37500 37865 38231 38596 38961 39326 39692 40057 37226 36951 37316 36951 40238 40603 37316 37681 38047 38412 38777 39142 39508 39873 42248 37135 40422 40787 41153 41883 37500 37865 38231 38596 38961 39326 39692 40057"
## [179] "PMC7803632 /pmc/articles/PMC7803632/bin/mmc2.xlsx Mmusculus 2 43534 43534"
## [180] "PMC7803632 /pmc/articles/PMC7803632/bin/mmc2.xlsx Mmusculus 1 43533"
## [181] "PMC7803632 /pmc/articles/PMC7803632/bin/mmc2.xlsx Mmusculus 1 43527"
## [182] "PMC7803632 /pmc/articles/PMC7803632/bin/mmc2.xlsx Mmusculus 2 43535 43535"
## [183] "PMC7803632 /pmc/articles/PMC7803632/bin/mmc2.xlsx Rnorvegicus 1 43528"
## [184] "PMC7762488 /pmc/articles/PMC7762488/bin/aging-12-104062-s003.xlsx Hsapiens 4 43891 43891 43892 43892"
## [185] "PMC7762488 /pmc/articles/PMC7762488/bin/aging-12-104062-s003.xlsx Hsapiens 2 43892 43891"
## [186] "PMC7746351 /pmc/articles/PMC7746351/bin/aging-12-103460-s003..xlsx Hsapiens 23 43709 43525 43714 43713 43529 43532 43533 43711 43525 43719 43526 43717 43800 43716 43527 43530 43715 43718 43723 43712 43531 43526 43710"
## [187] "PMC7746351 /pmc/articles/PMC7746351/bin/aging-12-103460-s003..xlsx Hsapiens 23 43709 43714 43800 43527 43526 43525 43532 43525 43533 43711 43526 43715 43718 43719 43529 43717 43713 43723 43531 43716 43530 43710 43712"
## [188] "PMC7746351 /pmc/articles/PMC7746351/bin/aging-12-103460-s003..xlsx Hsapiens 23 43527 43526 43525 43533 43709 43711 43800 43526 43713 43715 43718 43714 43532 43723 43530 43529 43525 43717 43719 43531 43716 43712 43710"
## [189] "PMC7842189 /pmc/articles/PMC7842189/bin/NIHMS1647631-supplement-2.xlsx Mmusculus 8 42436 42438 42620 42437 42615 42623 42628 42439"
## [190] "PMC7782095 /pmc/articles/PMC7782095/bin/MOL2-15-138-s003.xlsx Hsapiens 2 44166 43891"
## [191] "PMC7764503 /pmc/articles/PMC7764503/bin/vdaa151_suppl_supplementary_table_s3.xlsx Hsapiens 38 37316 38777 36951 39326 39692 40057 40787 40057 42248 40057 37226 37316 39692 38777 38777 38961 38412 38961 40787 38961 40422 37681 38777 40422 38231 37500 40787 39326 38961 36951 39508 38777 37316 39142 39142 38777 38231 37500"
## [192] "PMC7774731 /pmc/articles/PMC7774731/bin/CAM4-9-9632-s001.xlsx Hsapiens 2 43530 43711"
## [193] "PMC7790754 /pmc/articles/PMC7790754/bin/41388_2020_1523_MOESM2_ESM.xlsx Hsapiens 18 44080 44084 43900 44086 43895 43896 43899 44082 44089 44078 44076 43893 43898 43892 44166 44083 44077 43897"
## [194] "PMC7759458 /pmc/articles/PMC7759458/bin/41586_2020_2853_MOESM10_ESM.xlsx Hsapiens 1 43899"
## [195] "PMC7818422 /pmc/articles/PMC7818422/bin/MEC-30-193-s002.xlsx Athaliana 1 37347"
## [196] "PMC7116647 /pmc/articles/PMC7116647/bin/EMS84795-supplement-Supplementary_tables.xlsx Hsapiens 26 40603 41153 39873 37135 39692 37316 36951 39508 38961 39142 41883 37681 40422 36951 37500 40238 38047 40057 38596 38231 38777 37865 38412 40787 37226 37316"
## [197] "PMC7116647 /pmc/articles/PMC7116647/bin/EMS84795-supplement-Supplementary_tables.xlsx Hsapiens 26 39873 40238 38596 38231 40603 37135 39508 40422 40057 36951 39142 37500 37316 37865 38412 38777 39692 41883 40787 36951 38961 37226 41153 37681 37316 38047"
## [198] "PMC7116647 /pmc/articles/PMC7116647/bin/EMS84795-supplement-Supplementary_tables.xlsx Hsapiens 26 39873 38596 40238 38231 36951 40422 37316 38412 39692 40057 38777 37135 39508 37226 36951 40603 38961 37865 38047 40787 37681 37316 39142 41153 41883 37500"
## [199] "PMC7116647 /pmc/articles/PMC7116647/bin/EMS84795-supplement-Supplementary_tables.xlsx Hsapiens 26 39873 40238 38231 38596 37316 40057 39692 37135 40422 40787 37226 38777 40603 39142 38412 41153 36951 37500 37865 39508 38961 41883 37681 36951 38047 37316"
## [200] "PMC7116647 /pmc/articles/PMC7116647/bin/EMS84795-supplement-Supplementary_tables.xlsx Hsapiens 26 38596 40238 38231 40057 39873 39692 40422 37681 37226 37135 37316 38777 40603 40787 38412 37500 39508 41153 36951 38961 36951 37865 41883 39142 37316 38047"
## [201] "PMC7116647 /pmc/articles/PMC7116647/bin/EMS84795-supplement-Supplementary_tables.xlsx Hsapiens 26 40238 38231 39873 38412 40057 40422 38596 39692 37226 37135 37316 37500 40787 40603 38777 36951 39508 41883 37865 37681 36951 38961 41153 37316 39142 38047"
## [202] "PMC7116647 /pmc/articles/PMC7116647/bin/EMS84795-supplement-Supplementary_tables.xlsx Hsapiens 26 37316 36951 38412 41153 37316 41883 38047 40603 39142 40057 40238 38961 38596 39692 36951 40422 37500 39873 38777 37135 37681 38231 37226 40787 39508 37865"
## [203] "PMC7116647 /pmc/articles/PMC7116647/bin/EMS84795-supplement-Supplementary_tables.xlsx Hsapiens 26 36951 37316 36951 37500 39142 38412 40603 38047 41153 41883 37681 37135 40057 40422 37226 39692 38777 38596 38961 39873 38231 40787 37865 39508 37316 40238"
## [204] "PMC7116647 /pmc/articles/PMC7116647/bin/EMS84795-supplement-Supplementary_tables.xlsx Hsapiens 26 38412 36951 37316 40422 37500 38596 38047 40238 38961 39873 40787 40603 41883 41153 36951 37135 37316 37681 37226 39142 39508 39692 40057 38231 38777 37865"
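The numbers after the species and count fields in each record are consistent with Excel serial dates. Excel's default 1900 date system counts 1900-01-01 as day 1 and also counts a nonexistent 1900-02-29. For any serial from 61 onward, the working origin is therefore 1899-12-30. The sketch below is a minimal decoder using base R that assumes the 1900 system, not the 1904 system used by some older Mac workbooks. The serials are copied from the records above.

```r
# Excel (1900 date system) serials copied from the records above.
serials <- c(43891, 44166, 37226, 38596)

# For serials >= 61, the fake 1900-02-29 is absorbed by using 1899-12-30
# as the origin, so base R's as.Date() recovers the calendar date.
dates <- as.Date(serials, origin = "1899-12-30")
print(data.frame(serial = serials, date = format(dates, "%Y-%m-%d")))
```

All four decode to the first day of a month: 2020-03-01, 2020-12-01, 2001-12-01 and 2005-09-01. That fits month-like symbols such as MARCH1 or DEC1 being coerced into dates.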
Let’s investigate the errors in more detail.
# By species
SPECIES <- sapply(strsplit(ERROR_GENELISTS," "),"[[",3)
table(SPECIES)
## SPECIES
##     Athaliana Dmelanogaster        Drerio       Ggallus      Hsapiens
##             1             3             2             5           144
##     Mmusculus   Rnorvegicus   Scerevisiae
##            45             3             1
par(mar=c(5,12,4,2))
barplot(table(SPECIES),horiz=TRUE,las=1)
par(mar=c(5,5,4,2))
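Each `sapply(strsplit(ERROR_GENELISTS," "),"[[",k)` call selects a field by position:

- field 1 is the PMC ID;
- field 2 is the file path;
- field 3 is the species;
- field 4 is the number of affected values;
- the values themselves follow.

The sketch below splits each record once into a data frame and checks that the count field matches the number of listed values. The helper `parse_error_records` is a hypothetical name, and the four records are copied from the output above.

```r
# Four records copied verbatim from the ERROR_GENELISTS output.
records <- c(
  "PMC7803632 /pmc/articles/PMC7803632/bin/mmc2.xlsx Mmusculus 2 43534 43534",
  "PMC7803632 /pmc/articles/PMC7803632/bin/mmc2.xlsx Rnorvegicus 1 43528",
  "PMC7818422 /pmc/articles/PMC7818422/bin/MEC-30-193-s002.xlsx Athaliana 1 37347",
  "PMC7762488 /pmc/articles/PMC7762488/bin/aging-12-104062-s003.xlsx Hsapiens 2 43892 43891"
)

# Hypothetical helper: split every record once and keep the fields by name.
parse_error_records <- function(x) {
  fields <- strsplit(x, " ", fixed = TRUE)
  data.frame(
    pmcid    = vapply(fields, `[[`, character(1), 1),
    file     = vapply(fields, `[[`, character(1), 2),
    species  = vapply(fields, `[[`, character(1), 3),
    n_errors = as.integer(vapply(fields, `[[`, character(1), 4)),
    n_values = lengths(fields) - 4L,  # values listed after the count field
    stringsAsFactors = FALSE
  )
}

err <- parse_error_records(records)
stopifnot(all(err$n_errors == err$n_values))  # count field matches the list
print(table(err$species))
```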
# Number of affected gene lists per paper (one Excel file can contribute several)
DIST <- table(sapply(strsplit(ERROR_GENELISTS," "),"[[",1))
DIST
##
## PMC7116647 PMC7744083 PMC7745987 PMC7746351 PMC7757804 PMC7758050 PMC7759458
##          9         11          5          3          1          1          1
## PMC7762488 PMC7764503 PMC7773550 PMC7773562 PMC7773990 PMC7773997 PMC7774731
##          2          1          6          1          2          1          1
## PMC7779551 PMC7780630 PMC7781436 PMC7782095 PMC7782510 PMC7782737 PMC7782839
##          1          1         13          1          2          2          1
## PMC7784491 PMC7785587 PMC7786480 PMC7788095 PMC7788743 PMC7788777 PMC7788826
##          1          3          1          4          1          5          1
## PMC7788839 PMC7788893 PMC7788918 PMC7789526 PMC7790417 PMC7790754 PMC7791837
##          3          7          2          2          5          1          2
## PMC7791882 PMC7791999 PMC7792223 PMC7793258 PMC7793309 PMC7793715 PMC7794364
##          1          2          1          2          3          4          6
## PMC7794586 PMC7796900 PMC7797533 PMC7801399 PMC7803632 PMC7807721 PMC7811140
##          1          5          1          1          5          2          4
## PMC7812315 PMC7812929 PMC7814551 PMC7815088 PMC7815943 PMC7817693 PMC7818422
##         13          1         23          1          2          1          1
## PMC7819609 PMC7820775 PMC7821388 PMC7821633 PMC7822813 PMC7824480 PMC7825165
##          1          1          2          2          2          1          1
## PMC7826362 PMC7836161 PMC7838615 PMC7842022 PMC7842189 PMC7844323
##          3          1          1          2          1          2
summary(as.numeric(DIST))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   1.000   1.000   2.000   2.957   3.000  23.000
hist(DIST,main="Number of affected gene lists per paper")
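The counts in `DIST` are records, and one supplementary file can contribute several of them. For example, PMC7796900 reaches 5 with the single file `mmc1.xlsx`, and PMC7781436 reaches 13 with two files.

If the number of distinct files is wanted instead, the sketch below counts both. The `pmcid`/`file` pairs are transcribed from the ERROR_GENELISTS output above, with the path prefix dropped. The variable names are illustrative.

```r
# pmcid/file pairs transcribed from the ERROR_GENELISTS output above.
pmcid <- c(rep("PMC7796900", 5), rep("PMC7781436", 13), "PMC7782839")
file  <- c(rep("mmc1.xlsx", 5),
           rep("NIHMS1653552-supplement-Table_S3.xlsx", 2),
           rep("NIHMS1653552-supplement-Table_S5.xlsx", 11),
           "41416_2020_1178_MOESM3_ESM.xlsx")

# Records per paper (what DIST counts) versus distinct files per paper.
records_per_paper <- table(pmcid)
files_per_paper   <- tapply(file, pmcid, function(f) length(unique(f)))

summary_df <- data.frame(
  pmcid   = names(records_per_paper),
  records = as.integer(records_per_paper),
  files   = as.integer(files_per_paper[names(records_per_paper)]),
  stringsAsFactors = FALSE
)
print(summary_df)
```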
# PMC Articles with the most errors
DIST_DF <- as.data.frame(DIST)
DIST_DF <- DIST_DF[order(-DIST_DF$Freq),,drop=FALSE]
head(DIST_DF,20)
## Var1 Freq
## 52 PMC7814551 23
## 17 PMC7781436 13
## 50 PMC7812315 13
## 2 PMC7744083 11
## 1 PMC7116647 9
## 30 PMC7788893 7
## 10 PMC7773550 6
## 42 PMC7794364 6
## 3 PMC7745987 5
## 27 PMC7788777 5
## 33 PMC7790417 5
## 44 PMC7796900 5
## 47 PMC7803632 5
## 25 PMC7788095 4
## 41 PMC7793715 4
## 49 PMC7811140 4
## 4 PMC7746351 3
## 23 PMC7785587 3
## 29 PMC7788839 3
## 40 PMC7793309 3
MOST_ERR_FILES = as.character(DIST_DF[1,1])
MOST_ERR_FILES
## [1] "PMC7814551"
# Number of errors per paper
NERR <- as.numeric(sapply(strsplit(ERROR_GENELISTS," "),"[[",4))
names(NERR) <- sapply(strsplit(ERROR_GENELISTS," "),"[[",1)
NERR <-tapply(NERR, names(NERR), sum)
NERR
## PMC7116647 PMC7744083 PMC7745987 PMC7746351 PMC7757804 PMC7758050 PMC7759458
##        234        103         18         69          2          1          1
## PMC7762488 PMC7764503 PMC7773550 PMC7773562 PMC7773990 PMC7773997 PMC7774731
##          6         38         58          1         11          1          2
## PMC7779551 PMC7780630 PMC7781436 PMC7782095 PMC7782510 PMC7782737 PMC7782839
##          1          1        144          2         34         17         16
## PMC7784491 PMC7785587 PMC7786480 PMC7788095 PMC7788743 PMC7788777 PMC7788826
##         84          3          1          4          3         35          1
## PMC7788839 PMC7788893 PMC7788918 PMC7789526 PMC7790417 PMC7790754 PMC7791837
##          4        133          8          8          6         18         24
## PMC7791882 PMC7791999 PMC7792223 PMC7793258 PMC7793309 PMC7793715 PMC7794364
##          1          4          1          2          6         53         59
## PMC7794586 PMC7796900 PMC7797533 PMC7801399 PMC7803632 PMC7807721 PMC7811140
##          2        140         14         22          7          5         72
## PMC7812315 PMC7812929 PMC7814551 PMC7815088 PMC7815943 PMC7817693 PMC7818422
##         32         30        310          4          7          1          1
## PMC7819609 PMC7820775 PMC7821388 PMC7821633 PMC7822813 PMC7824480 PMC7825165
##          9         16         14        327          5         27          2
## PMC7826362 PMC7836161 PMC7838615 PMC7842022 PMC7842189 PMC7844323
##         30          5          1         28          8          2
hist(NERR,main="Number of errors per PMC article")
NERR_DF <- as.data.frame(NERR)
NERR_DF <- NERR_DF[order(-NERR_DF$NERR),,drop=FALSE]
head(NERR_DF,20)
## NERR
## PMC7821633 327
## PMC7814551 310
## PMC7116647 234
## PMC7781436 144
## PMC7796900 140
## PMC7788893 133
## PMC7744083 103
## PMC7784491 84
## PMC7811140 72
## PMC7746351 69
## PMC7794364 59
## PMC7773550 58
## PMC7793715 53
## PMC7764503 38
## PMC7788777 35
## PMC7782510 34
## PMC7812315 32
## PMC7812929 30
## PMC7826362 30
## PMC7842022 28
MOST_ERR = rownames(NERR_DF)[1]
MOST_ERR
## [1] "PMC7821633"
GENELIST_ERROR_ARTICLES <- gsub("PMC","",GENELIST_ERROR_ARTICLES)
### JSON PARSING is more reliable than XML
ARTICLES <- esummary( GENELIST_ERROR_ARTICLES , db="pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLES,as= "parsed")
ARTICLE_DATA <- ARTICLE_DATA$result
ARTICLE_DATA <- ARTICLE_DATA[2:length(ARTICLE_DATA)]
JOURNALS <- unlist(lapply(ARTICLE_DATA,function(x) {x$fulljournalname} ))
JOURNALS_TABLE <- table(JOURNALS)
JOURNALS_TABLE <- JOURNALS_TABLE[order(-JOURNALS_TABLE)]
length(JOURNALS_TABLE)
## [1] 47
par(mar=c(5,25,4,2))
barplot(head(JOURNALS_TABLE,10), horiz=TRUE, las=1,
xlab="Articles with gene name errors in supp files",
main="Top journals this month")
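`ARTICLE_DATA[2:length(ARTICLE_DATA)]` drops the first element of `result` on the assumption that it is always `uids`. The sketch below instead selects records by name using the `uids` vector. It assumes the parsed layout shown in the esummary printouts later in this report: `result$uids` plus one element per UID carrying fields such as `fulljournalname`. The list here is hand-built for illustration rather than fetched.

```r
# Hand-built stand-in for reutils::content(ARTICLES, as = "parsed")$result,
# mirroring fields printed by esummary later in this report.
result <- list(
  uids = c("7814551", "7821633"),
  `7814551` = list(uid = "7814551", fulljournalname = "Journal of Translational Medicine"),
  `7821633` = list(uid = "7821633", fulljournalname = "BMC Molecular and Cell Biology")
)

# Select records by UID, independent of where 'uids' sits in the list.
records <- result[result$uids]

# vapply() stops with an error if a record lacks the field, whereas
# unlist(lapply(...)) would silently drop the NULL and misalign the names.
journals <- vapply(records, function(x) x$fulljournalname, character(1))
print(table(journals))
```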
Congrats to our Journal of the Month winner!
JOURNAL_WINNER <- names(head(JOURNALS_TABLE,1))
JOURNAL_WINNER
## [1] "BMC Genomics"
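`head(JOURNALS_TABLE,1)` keeps only the first entry after sorting, as do `DIST_DF[1,1]` and `rownames(NERR_DF)[1]`. A tie for first place would therefore be broken silently by sort order.

The sketch below is a small tie-aware alternative. `top_ties` is a hypothetical helper, and it is shown on toy counts rather than this month's data.

```r
# Hypothetical helper: every name sharing the maximum count, not just the first.
top_ties <- function(counts) {
  names(counts)[counts == max(counts)]
}

# Toy data (not from this report): two journals tie for first place.
toy <- table(c("Journal A", "Journal B", "Journal A", "Journal B", "Journal C"))
print(top_ties(toy))
```

The helper works equally on a `table` like `JOURNALS_TABLE` and on a named numeric vector like `NERR`.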
There are two categories:
Paper with the most gene lists affected by gene name errors in its supplementary files (MOST_ERR_FILES)
Paper with the most gene names converted to dates (MOST_ERR)
Sometimes, one paper can win both categories. Congrats to our winners.
MOST_ERR_FILES <- gsub("PMC","",MOST_ERR_FILES)
ARTICLES <- esummary( MOST_ERR_FILES , db="pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLES,as= "parsed")
ARTICLE_DATA <- ARTICLE_DATA[2]
ARTICLE_DATA
## $result
## $result$uids
## [1] "7814551"
##
## $result$`7814551`
## $result$`7814551`$uid
## [1] "7814551"
##
## $result$`7814551`$pubdate
## [1] "2021 Jan 19"
##
## $result$`7814551`$epubdate
## [1] "2021 Jan 19"
##
## $result$`7814551`$printpubdate
## [1] ""
##
## $result$`7814551`$source
## [1] "J Transl Med"
##
## $result$`7814551`$authors
## name authtype
## 1 Zhao X Author
## 2 Zhang L Author
## 3 Wang J Author
## 4 Zhang M Author
## 5 Song Z Author
## 6 Ni B Author
## 7 You Y Author
##
## $result$`7814551`$title
## [1] "Identification of key biomarkers and immune infiltration in systemic lupus erythematosus by integrated bioinformatics analysis"
##
## $result$`7814551`$volume
## [1] "19"
##
## $result$`7814551`$issue
## [1] ""
##
## $result$`7814551`$pages
## [1] "35"
##
## $result$`7814551`$articleids
## idtype value
## 1 pmid 33468161
## 2 doi 10.1186/s12967-020-02698-x
## 3 pmcid PMC7814551
##
## $result$`7814551`$fulljournalname
## [1] "Journal of Translational Medicine"
##
## $result$`7814551`$sortdate
## [1] "2021/01/19 00:00"
##
## $result$`7814551`$pmclivedate
## [1] "2021/01/19"
MOST_ERR <- gsub("PMC","",MOST_ERR)
ARTICLE_DATA <- esummary(MOST_ERR,db = "pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLE_DATA,as= "parsed")
ARTICLE_DATA
## $header
## $header$type
## [1] "esummary"
##
## $header$version
## [1] "0.3"
##
##
## $result
## $result$uids
## [1] "7821633"
##
## $result$`7821633`
## $result$`7821633`$uid
## [1] "7821633"
##
## $result$`7821633`$pubdate
## [1] "2021 Jan 22"
##
## $result$`7821633`$epubdate
## [1] "2021 Jan 22"
##
## $result$`7821633`$printpubdate
## [1] ""
##
## $result$`7821633`$source
## [1] "BMC Mol Cell Biol"
##
## $result$`7821633`$authors
## name authtype
## 1 Shen P Author
## 2 Xu A Author
## 3 Hou Y Author
## 4 Wang H Author
## 5 Gao C Author
## 6 He F Author
## 7 Yang D Author
##
## $result$`7821633`$title
## [1] "Conserved paradoxical relationships among the evolutionary, structural and expressional features of KRAB zinc-finger proteins reveal their special functional characteristics"
##
## $result$`7821633`$volume
## [1] "22"
##
## $result$`7821633`$issue
## [1] ""
##
## $result$`7821633`$pages
## [1] "7"
##
## $result$`7821633`$articleids
## idtype value
## 1 pmid 33482715
## 2 doi 10.1186/s12860-021-00346-w
## 3 pmcid PMC7821633
##
## $result$`7821633`$fulljournalname
## [1] "BMC Molecular and Cell Biology"
##
## $result$`7821633`$sortdate
## [1] "2021/01/22 00:00"
##
## $result$`7821633`$pmclivedate
## [1] "2021/01/25"
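The full esummary printouts are long. For announcing the winners, a one-line citation could be pulled from a parsed record instead.

The sketch below assumes the record structure printed above, with `authors` and `articleids` as data frames. `format_citation` is a hypothetical helper, and `rec` is hand-copied from the PMC7821633 output rather than fetched.

```r
# Record hand-copied from the esummary output for PMC7821633 above.
rec <- list(
  title = "Conserved paradoxical relationships among the evolutionary, structural and expressional features of KRAB zinc-finger proteins reveal their special functional characteristics",
  fulljournalname = "BMC Molecular and Cell Biology",
  pubdate = "2021 Jan 22",
  authors = data.frame(name = c("Shen P", "Xu A", "Hou Y", "Wang H", "Gao C", "He F", "Yang D"),
                       authtype = "Author", stringsAsFactors = FALSE),
  articleids = data.frame(idtype = c("pmid", "doi", "pmcid"),
                          value = c("33482715", "10.1186/s12860-021-00346-w", "PMC7821633"),
                          stringsAsFactors = FALSE)
)

# Hypothetical helper: first author, title, journal, date and DOI link.
format_citation <- function(rec) {
  ids <- setNames(rec$articleids$value, rec$articleids$idtype)
  sprintf("%s et al. %s. %s (%s). https://doi.org/%s",
          rec$authors$name[1], rec$title, rec$fulljournalname,
          rec$pubdate, ids[["doi"]])
}

cat(format_citation(rec), "\n")
```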
TODO: Plot the trend over the past 6 months.
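A first step toward the trend plot is generating the search months. The sketch below builds DATE strings for the six months ending at a given month, in the same "YYYY/M" form as `DATE="2021/1"` above. `month_strings` is a hypothetical helper name.

```r
# Hypothetical helper: DATE strings ("YYYY/M") for the n months ending at `last`.
month_strings <- function(last = "2021-01-01", n = 6) {
  months <- rev(seq(as.Date(last), by = "-1 month", length.out = n))
  paste(format(months, "%Y"), as.integer(format(months, "%m")), sep = "/")
}

print(month_strings())
```

Each string could be passed as both `mindate` and `maxdate` to `esearch()`, mirroring the query near the top of this report. The screening step runs in the bash script, so it would still need to be repeated per month.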
Zeeberg, B.R., Riss, J., Kane, D.W. et al. Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics. BMC Bioinformatics 5, 80 (2004). https://doi.org/10.1186/1471-2105-5-80
Ziemann, M., Eren, Y. & El-Osta, A. Gene name errors are widespread in the scientific literature. Genome Biol 17, 177 (2016). https://doi.org/10.1186/s13059-016-1044-7
sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##
## locale:
## [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
## [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
## [7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] jsonlite_1.7.2 XML_3.99-0.3 readxl_1.3.1 reutils_0.2.3 xml2_1.3.2
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.6 knitr_1.31 magrittr_2.0.1 R6_2.5.0
## [5] rlang_0.4.10 highr_0.8 stringr_1.4.0 tools_3.6.3
## [9] xfun_0.22 jquerylib_0.1.3 htmltools_0.5.1.1 yaml_2.2.1
## [13] digest_0.6.27 assertthat_0.2.1 sass_0.3.1 bitops_1.0-6
## [17] RCurl_1.98-1.3 evaluate_0.14 rmarkdown_2.7 stringi_1.5.3
## [21] compiler_3.6.3 bslib_0.2.4 cellranger_1.1.0