Source: https://github.com/markziemann/GeneNameErrors2020
View the reports: http://ziemann-lab.net/public/gene_name_errors/
Gene name errors arise when data are imported improperly into MS Excel and other spreadsheet programs (Zeeberg et al, 2004). Certain gene names such as MARCH3, SEPT2 and DEC1 are converted into dates. These errors are surprisingly common in supplementary data files in the field of genomics (Ziemann et al, 2016). This could be considered a minor problem because it affects only a small number of genes; however, it is symptomatic of poor data processing methods. The purpose of this script is to identify gene name errors present in supplementary files of PubMed Central articles from the previous month.
library("XML")
library("jsonlite")
library("xml2")
library("reutils")
library("readxl")
Here I will be getting PubMed Central IDs for the previous month.
Start by working out the date to use in the PubMed Central search.
CURRENT_MONTH=format(Sys.time(), "%m")
CURRENT_YEAR=format(Sys.time(), "%Y")
if (CURRENT_MONTH == "01") {
  PREV_YEAR=as.character(as.numeric(format(Sys.time(), "%Y"))-1)
  PREV_MONTH="12"
} else {
  PREV_YEAR=CURRENT_YEAR
  PREV_MONTH=as.character(as.numeric(format(Sys.time(), "%m"))-1)
}
DATE=paste(PREV_YEAR,"/",PREV_MONTH,sep="")
DATE
## [1] "2021/7"
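The same previous-month calculation can be sketched with date arithmetic rather than string handling. This is only an illustration: `prev_month_string()` is a hypothetical helper that is not part of the original script, and it returns a zero-padded month ("2021/07"), which differs from the unpadded "2021/7" produced above.

```r
# Hypothetical helper (not in the original script): return the previous
# calendar month as "YYYY/MM" for a given date.
prev_month_string <- function(today = Sys.Date()) {
  # Snap to the first day of the month so stepping back one month is
  # well defined for any day of the month.
  first_of_month <- as.Date(format(today, "%Y-%m-01"))
  prev <- seq(first_of_month, by = "-1 month", length.out = 2)[2]
  format(prev, "%Y/%m")
}

prev_month_string(as.Date("2021-08-15"))  # "2021/07"
prev_month_string(as.Date("2021-01-03"))  # "2020/12"
```

Stepping back from the first of the month handles the January case without a separate branch.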
Let’s see how many PMC IDs we have in the past month.
QUERY ='((genom*[Abstract]))'
ESEARCH_RES <- esearch(term=QUERY, db = "pmc", rettype = "uilist", retmode = "xml", retstart = 0,
retmax = 5000000, usehistory = TRUE, webenv = NULL, querykey = NULL, sort = NULL, field = NULL,
datetype = NULL, reldate = NULL, mindate = DATE, maxdate = DATE)
pmc <- efetch(ESEARCH_RES,retmode="text",rettype="uilist",outfile="pmcids.txt")
## Retrieving UIDs 1 to 500
## Retrieving UIDs 501 to 1000
## Retrieving UIDs 1001 to 1500
## Retrieving UIDs 1501 to 2000
## Retrieving UIDs 2001 to 2500
## Retrieving UIDs 2501 to 3000
## Retrieving UIDs 3001 to 3500
pmc <- read.table(pmc)
pmc <- paste("PMC",pmc$V1,sep="")
NUM_ARTICLES=length(pmc)
NUM_ARTICLES
## [1] 3440
writeLines(pmc,con="pmc.txt")
Now run the bash script. Note that false positives can occur (~1.5%) and these results have not been verified by a human.
Here are some definitions:
NUM_XLS = Number of supplementary Excel files in this set of PMC articles.
NUM_XLS_ARTICLES = Number of articles matching the PubMed Central search which have supplementary Excel files.
GENELISTS = The gene lists found in the Excel files. Each Excel file is counted once even if it has multiple gene lists.
NUM_GENELISTS = The number of Excel files with gene lists.
NUM_GENELIST_ARTICLES = The number of PMC articles with supplementary Excel gene lists.
ERROR_GENELISTS = Files suspected to contain gene name errors. The dates and five-digit numbers indicate transmogrified gene names.
NUM_ERROR_GENELISTS = Number of Excel gene lists with errors.
NUM_ERROR_GENELIST_ARTICLES = Number of articles with supplementary Excel gene name errors.
ERROR_PROPORTION = This is the proportion of articles with Excel gene lists that have errors.
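As a rough illustration of how these categories can be separated, the sketch below classifies result lines by their number of space-separated fields. The three example lines and their PMC IDs are made up, and the layout (article ID, file path, species, error count, then the suspect values) is an assumption inferred from the output shown in this report.

```r
# Made-up example lines; the field layout is an assumption.
example_lines <- c(
  "PMC0000001 /pmc/articles/PMC0000001/bin/a.xlsx",
  "PMC0000002 /pmc/articles/PMC0000002/bin/b.xlsx Hsapiens",
  "PMC0000003 /pmc/articles/PMC0000003/bin/c.xlsx Hsapiens 2 44256 44440"
)

# Count the space-separated fields on each line.
n_fields <- lengths(strsplit(example_lines, " "))

has_genelist <- n_fields > 2  # a species field is present
has_errors   <- n_fields > 3  # an error count and suspect values follow

data.frame(article = sub(" .*", "", example_lines), has_genelist, has_errors)
```

Because the fields are split on spaces, a file path containing a space would inflate the field count, so this approach assumes paths without spaces.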
system("./gene_names.sh pmc.txt")
results <- readLines("results.txt")
XLS <- results[grep("XLS",results,ignore.case=TRUE)]
NUM_XLS = length(XLS)
NUM_XLS
## [1] 4754
NUM_XLS_ARTICLES = length(unique(sapply(strsplit(XLS," "),"[[",1)))
NUM_XLS_ARTICLES
## [1] 819
GENELISTS <- XLS[lapply(strsplit(XLS," "),length)>2]
#GENELISTS
NUM_GENELISTS <- length(unique(sapply(strsplit(GENELISTS," "),"[[",2)))
NUM_GENELISTS
## [1] 585
NUM_GENELIST_ARTICLES <- length(unique(sapply(strsplit(GENELISTS," "),"[[",1)))
NUM_GENELIST_ARTICLES
## [1] 287
ERROR_GENELISTS <- XLS[lapply(strsplit(XLS," "),length)>3]
#ERROR_GENELISTS
NUM_ERROR_GENELISTS = length(ERROR_GENELISTS)
NUM_ERROR_GENELISTS
## [1] 232
GENELIST_ERROR_ARTICLES <- unique(sapply(strsplit(ERROR_GENELISTS," "),"[[",1))
GENELIST_ERROR_ARTICLES
## [1] "PMC8322331" "PMC8320537" "PMC8319688" "PMC8318262" "PMC8299662"
## [6] "PMC8312417" "PMC8312575" "PMC8300563" "PMC8298252" "PMC8270899"
## [11] "PMC8257577" "PMC8238961" "PMC8283158" "PMC8293916" "PMC8288011"
## [16] "PMC8253816" "PMC8289438" "PMC8280127" "PMC8279952" "PMC8284395"
## [21] "PMC8283691" "PMC8280078" "PMC8281136" "PMC8279762" "PMC8277874"
## [26] "PMC8278573" "PMC8278202" "PMC8276479" "PMC8274021" "PMC8273915"
## [31] "PMC8273913" "PMC8272312" "PMC8268579" "PMC8266393" "PMC8255842"
## [36] "PMC8254024" "PMC7611218" "PMC8270675" "PMC8268225" "PMC8267460"
## [41] "PMC8267417" "PMC8266308" "PMC8263711" "PMC8255068" "PMC8233376"
## [46] "PMC8225795" "PMC8222384" "PMC8219828" "PMC8219801" "PMC8219709"
## [51] "PMC8217501" "PMC8264799" "PMC8260770" "PMC8260754" "PMC8261551"
## [56] "PMC8260222" "PMC8219162" "PMC8259168" "PMC8258016" "PMC8247369"
## [61] "PMC8226361" "PMC8224457" "PMC8253514" "PMC8253049" "PMC8249860"
## [66] "PMC8248162" "PMC8246630" "PMC8213753" "PMC8211852" "PMC8203611"
## [71] "PMC8244194" "PMC8221374"
NUM_ERROR_GENELIST_ARTICLES <- length(GENELIST_ERROR_ARTICLES)
NUM_ERROR_GENELIST_ARTICLES
## [1] 72
ERROR_PROPORTION = NUM_ERROR_GENELIST_ARTICLES / NUM_GENELIST_ARTICLES
ERROR_PROPORTION
## [1] 0.2508711
Here you can have a look at all the gene lists detected in the past month, as well as those with errors. The dates are obvious errors; they are commonly dates in September, March, December and October. The five-digit numbers are dates stored in Excel's internal format, which counts the number of days since the start of 1900. If you put these numbers into Excel and format the cells as dates, most of them will also map to dates in September, March, December and October.
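The five-digit numbers can also be decoded in R without opening Excel. This is a minimal sketch, assuming the files use Excel's default 1900 date system; `excel_serial_to_date()` is a hypothetical helper and not part of the original script. The origin "1899-12-30" is the usual choice in R because it compensates for Excel treating 1900 as a leap year.

```r
# Hypothetical helper: convert Excel serial day numbers to R Dates.
# Assumes the 1900 date system and serials after February 1900.
excel_serial_to_date <- function(serial) {
  as.Date(as.numeric(serial), origin = "1899-12-30")
}

# Three serial numbers taken from the error list in this report.
serials <- c(37135, 44256, 44440)
excel_serial_to_date(serials)
# "2001-09-01" "2021-03-01" "2021-09-01"
```

All three values fall on 1 September or 1 March, matching the pattern described above.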
#GENELISTS
ERROR_GENELISTS
## [1] "PMC8322331 /pmc/articles/PMC8322331/bin/41598_2021_94805_MOESM2_ESM.xlsx Dmelanogaster 2 37135 38596"
## [2] "PMC8322331 /pmc/articles/PMC8322331/bin/41598_2021_94805_MOESM2_ESM.xlsx Dmelanogaster 3 37135 37500 38596"
## [3] "PMC8322331 /pmc/articles/PMC8322331/bin/41598_2021_94805_MOESM6_ESM.xlsx Dmelanogaster 2 04-Sep 05-Sep"
## [4] "PMC8320537 /pmc/articles/PMC8320537/bin/Table_2.XLS Hsapiens 11 2020/03/02 2020/03/07 2020/03/09 2020/03/04 2020/03/01 2020/03/06 2020/03/08 2020/12/01 2020/03/10 2020/03/03 2020/03/05"
## [5] "PMC8319688 /pmc/articles/PMC8319688/bin/Table_2.xlsx Scerevisiae 2 43556 43715"
## [6] "PMC8318262 /pmc/articles/PMC8318262/bin/erab223_suppl_supplementary_file002.xlsx Athaliana 3 44077 44077 44078"
## [7] "PMC8318262 /pmc/articles/PMC8318262/bin/erab223_suppl_supplementary_file002.xlsx Athaliana 1 44077"
## [8] "PMC8318262 /pmc/articles/PMC8318262/bin/erab223_suppl_supplementary_file006.xlsx Athaliana 1 44077"
## [9] "PMC8299662 /pmc/articles/PMC8299662/bin/12885_2021_8462_MOESM3_ESM.xlsx Hsapiens 3 43892 43891 43898"
## [10] "PMC8299662 /pmc/articles/PMC8299662/bin/12885_2021_8462_MOESM4_ESM.xlsx Hsapiens 1 43900"
## [11] "PMC8312417 /pmc/articles/PMC8312417/bin/aging-13-203250-s003.xlsx Hsapiens 25 44446 44446 44446 44446 44446 44446 44446 44446 44446 44446 44446 44446 44446 44446 44446 44446 44446 44446 44446 44446 44446 44446 44446 44446 44446"
## [12] "PMC8312575 /pmc/articles/PMC8312575/bin/Table_1.XLSX Mmusculus 4 43711 43527 43525 43713"
## [13] "PMC8312575 /pmc/articles/PMC8312575/bin/Table_5.XLSX Mmusculus 5 44084 44083 43899 43892 44083"
## [14] "PMC8312575 /pmc/articles/PMC8312575/bin/Table_6.XLSX Mmusculus 5 44084 44083 43899 43892 44083"
## [15] "PMC8300563 /pmc/articles/PMC8300563/bin/Table_1.xlsx Hsapiens 13 36951 37681 36951 38047 38047 36951 38047 36951 40057 40603 38047 38047 40057"
## [16] "PMC8300563 /pmc/articles/PMC8300563/bin/Table_3.xlsx Hsapiens 6 36951 36951 37135 37135 40057 40057"
## [17] "PMC8298252 /pmc/articles/PMC8298252/bin/204_2021_3108_MOESM2_ESM.xlsx Mmusculus 3 37135 38961 39692"
## [18] "PMC8298252 /pmc/articles/PMC8298252/bin/204_2021_3108_MOESM2_ESM.xlsx Rnorvegicus 4 38961 40787 40422 37135"
## [19] "PMC8270899 /pmc/articles/PMC8270899/bin/41467_2021_24445_MOESM3_ESM.xlsx Hsapiens 1 44083"
## [20] "PMC8257577 /pmc/articles/PMC8257577/bin/41467_2021_24466_MOESM4_ESM.xlsx Mmusculus 10 44084 44078 43900 43893 43895 44076 43897 44088 44075 43891"
## [21] "PMC8257577 /pmc/articles/PMC8257577/bin/41467_2021_24466_MOESM4_ESM.xlsx Mmusculus 3 43901 44077 43894"
## [22] "PMC8238961 /pmc/articles/PMC8238961/bin/41467_2021_23899_MOESM5_ESM.xlsx Hsapiens 1 43896"
## [23] "PMC8283158 /pmc/articles/PMC8283158/bin/mmc2.xlsx Hsapiens 1 43347"
## [24] "PMC8283158 /pmc/articles/PMC8283158/bin/mmc2.xlsx Hsapiens 3 39513 39693 39515"
## [25] "PMC8283158 /pmc/articles/PMC8283158/bin/mmc2.xlsx Hsapiens 3 42989 42981 42800"
## [26] "PMC8283158 /pmc/articles/PMC8283158/bin/mmc2.xlsx Hsapiens 1 42800"
## [27] "PMC8293916 /pmc/articles/PMC8293916/bin/Table_1.XLSX Hsapiens 1 44085"
## [28] "PMC8288011 /pmc/articles/PMC8288011/bin/CTM2-11-e498-s005.xlsx Hsapiens 1 44531"
## [29] "PMC8253816 /pmc/articles/PMC8253816/bin/41467_2021_24373_MOESM7_ESM.xlsx Mmusculus 75 44450 44445 44264 44266 44263 44443 44449 44258 44258 44440 44450 44262 44454 44265 44442 44259 44445 44256 44453 44264 44260 44443 44257 44441 44440 44450 44265 44443 44451 44257 44447 44259 44453 44262 44441 44444 44257 44445 44260 44451 44442 44260 44449 44444 44446 44256 44440 44451 44441 44448 44446 44259 44448 44258 44261 44444 44453 44447 44261 44446 44264 44261 44448 44265 44442 44454 44266 44263 44454 44449 44256 44266 44447 44263 44262"
## [30] "PMC8253816 /pmc/articles/PMC8253816/bin/41467_2021_24373_MOESM7_ESM.xlsx Mmusculus 25 42980 42805 42990 42804 42979 42988 42981 42803 42989 42985 42800 42798 42992 42795 42993 42983 42984 42987 42802 42796 42982 42797 42799 42801 42986"
## [31] "PMC8253816 /pmc/articles/PMC8253816/bin/41467_2021_24373_MOESM7_ESM.xlsx Mmusculus 75 44261 44450 44262 44445 44266 44446 44263 44440 44449 44258 44440 44450 44265 44442 44264 44445 44264 44262 44260 44257 44441 44450 44265 44443 44451 44257 44447 44259 44264 44453 44262 44441 44444 44257 44445 44260 44451 44442 44260 44444 44442 44453 44449 44446 44256 44440 44451 44263 44441 44454 44449 44448 44443 44446 44259 44448 44258 44261 44444 44454 44453 44447 44261 44256 44448 44265 44454 44266 44263 44256 44266 44443 44447 44258 44259"
## [32] "PMC8253816 /pmc/articles/PMC8253816/bin/41467_2021_24373_MOESM7_ESM.xlsx Mmusculus 25 44445 44261 44263 44447 44453 44451 44256 44454 44260 44262 44444 44257 44440 44265 44259 44264 44448 44443 44258 44446 44450 44442 44266 44449 44441"
## [33] "PMC8253816 /pmc/articles/PMC8253816/bin/41467_2021_24373_MOESM7_ESM.xlsx Mmusculus 25 44261 44445 44263 44447 44257 44260 44453 44256 44451 44454 44262 44444 44440 44265 44259 44448 44258 44442 44446 44443 44264 44266 44450 44441 44449"
## [34] "PMC8253816 /pmc/articles/PMC8253816/bin/41467_2021_24373_MOESM8_ESM.xlsx Mmusculus 9 44085 44084 44075 44083 44082 44081 44076 43896 43897"
## [35] "PMC8253816 /pmc/articles/PMC8253816/bin/41467_2021_24373_MOESM8_ESM.xlsx Mmusculus 7 44085 44084 44083 44082 44081 44076 44075"
## [36] "PMC8253816 /pmc/articles/PMC8253816/bin/41467_2021_24373_MOESM8_ESM.xlsx Mmusculus 8 44085 44089 44084 44075 44083 44082 44081 44076"
## [37] "PMC8253816 /pmc/articles/PMC8253816/bin/41467_2021_24373_MOESM8_ESM.xlsx Mmusculus 8 44450 44454 44260 44449 44440 44448 44446 44441"
## [38] "PMC8253816 /pmc/articles/PMC8253816/bin/41467_2021_24373_MOESM8_ESM.xlsx Mmusculus 6 44446 44441 44440 44448 44450 44449"
## [39] "PMC8289438 /pmc/articles/PMC8289438/bin/Table7.XLS Hsapiens 1 2021/03/01"
## [40] "PMC8289438 /pmc/articles/PMC8289438/bin/Table7.XLS Hsapiens 6 2021/09/07 2021/09/06 2021/09/10 2021/09/01 2021/03/01 2021/09/01"
## [41] "PMC8280127 /pmc/articles/PMC8280127/bin/41419_2021_3982_MOESM9_ESM.xlsx Hsapiens 2 44453 44443"
## [42] "PMC8279952 /pmc/articles/PMC8279952/bin/41594_2021_603_MOESM6_ESM.xlsx Mmusculus 2 39326 37500"
## [43] "PMC8284395 /pmc/articles/PMC8284395/bin/Table_2.XLSX Hsapiens 1 44083"
## [44] "PMC8284395 /pmc/articles/PMC8284395/bin/Table_2.XLSX Hsapiens 1 44083"
## [45] "PMC8284395 /pmc/articles/PMC8284395/bin/Table_3.XLSX Hsapiens 1 44085"
## [46] "PMC8284395 /pmc/articles/PMC8284395/bin/Table_3.XLSX Hsapiens 1 44085"
## [47] "PMC8284395 /pmc/articles/PMC8284395/bin/Table_4.XLSX Hsapiens 1 44348"
## [48] "PMC8283691 /pmc/articles/PMC8283691/bin/Table_1.xlsx Hsapiens 1 43349"
## [49] "PMC8280078 /pmc/articles/PMC8280078/bin/painreports-6-e944-s003.xlsx Mmusculus 8 44442 44442 44256 44257 44257 44256 44443 44442"
## [50] "PMC8280078 /pmc/articles/PMC8280078/bin/painreports-6-e944-s004.xlsx Mmusculus 7 44257 44257 44444 44444 44257 44257 44258"
## [51] "PMC8281136 /pmc/articles/PMC8281136/bin/Table_2.xlsx Hsapiens 14 43896 43895 43891 43892 43899 43901 43900 43891 43894 44166 43893 43897 43898 43892"
## [52] "PMC8279762 /pmc/articles/PMC8279762/bin/elife-61407-supp1.xlsx Hsapiens 8 40057 39692 37500 40422 40787 39326 38412 37316"
## [53] "PMC8277874 /pmc/articles/PMC8277874/bin/41598_2021_93904_MOESM2_ESM.xls Ggallus 23 2020/09/07 2020/09/11 2020/03/07 2020/03/06 2020/03/05 2020/09/02 2020/09/03 2020/03/02 2020/03/10 2020/03/02 2020/09/14 2020/09/01 2020/09/05 2020/03/01 2020/09/06 2020/03/03 2020/09/04 2020/03/11 2020/03/01 2020/09/08 2020/03/08 2020/03/09 2020/09/10"
## [54] "PMC8278573 /pmc/articles/PMC8278573/bin/Table_3.XLSX Hsapiens 3 44082 44082 44082"
## [55] "PMC8278573 /pmc/articles/PMC8278573/bin/Table_5.XLSX Hsapiens 11 43892 43893 43896 43897 44084 44085 44076 44080 44081 44082 44083"
## [56] "PMC8278573 /pmc/articles/PMC8278573/bin/Table_5.XLSX Hsapiens 11 43892 43893 43896 43897 44084 44085 44076 44080 44081 44082 44083"
## [57] "PMC8278573 /pmc/articles/PMC8278573/bin/Table_5.XLSX Hsapiens 11 43892 43893 43896 43897 44084 44085 44076 44080 44081 44082 44083"
## [58] "PMC8278573 /pmc/articles/PMC8278573/bin/Table_5.XLSX Hsapiens 12 43892 43893 43896 43897 44089 44084 44085 44076 44080 44081 44082 44083"
## [59] "PMC8278573 /pmc/articles/PMC8278573/bin/Table_5.XLSX Hsapiens 12 43892 43893 43896 43897 44089 44084 44085 44076 44080 44081 44082 44083"
## [60] "PMC8278573 /pmc/articles/PMC8278573/bin/Table_5.XLSX Hsapiens 12 43892 43893 43896 43897 44089 44084 44085 44076 44080 44081 44082 44083"
## [61] "PMC8278202 /pmc/articles/PMC8278202/bin/Table_2.XLSX Mmusculus 4 3-Sep 4-Mar 5-Sep 9-Mar"
## [62] "PMC8278202 /pmc/articles/PMC8278202/bin/Table_3.XLSX Mmusculus 11 43894 44077 44079 44079 43899 44077 44079 44079 44079 43899 44079"
## [63] "PMC8276479 /pmc/articles/PMC8276479/bin/12864_2021_7808_MOESM4_ESM.xlsx Hsapiens 1 43898"
## [64] "PMC8274021 /pmc/articles/PMC8274021/bin/12876_2021_1869_MOESM2_ESM.xlsx Hsapiens 1 44443"
## [65] "PMC8274021 /pmc/articles/PMC8274021/bin/12876_2021_1869_MOESM2_ESM.xlsx Hsapiens 1 44443"
## [66] "PMC8274021 /pmc/articles/PMC8274021/bin/12876_2021_1869_MOESM2_ESM.xlsx Hsapiens 3 44256 44259 44443"
## [67] "PMC8274021 /pmc/articles/PMC8274021/bin/12876_2021_1869_MOESM2_ESM.xlsx Hsapiens 9 44256 44259 44262 44263 44449 44450 44443 44445 44446"
## [68] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [69] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [70] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [71] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [72] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [73] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [74] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [75] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [76] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [77] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [78] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [79] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [80] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [81] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [82] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [83] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [84] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [85] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [86] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [87] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [88] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [89] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [90] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [91] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [92] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [93] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [94] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [95] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [96] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [97] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [98] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [99] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [100] "PMC8273915 /pmc/articles/PMC8273915/bin/DataSheet2.xlsx Hsapiens 2 44257 44256"
## [101] "PMC8273913 /pmc/articles/PMC8273913/bin/Data_Sheet_3.xlsx Hsapiens 6 42248 40057 38961 40422 40787 39692"
## [102] "PMC8273913 /pmc/articles/PMC8273913/bin/Data_Sheet_3.xlsx Hsapiens 2 39326 39692"
## [103] "PMC8273913 /pmc/articles/PMC8273913/bin/Data_Sheet_3.xlsx Hsapiens 2 38777 39873"
## [104] "PMC8273913 /pmc/articles/PMC8273913/bin/Data_Sheet_3.xlsx Hsapiens 9 40787 38961 36951 40422 37681 38231 39326 40057 39692"
## [105] "PMC8272312 /pmc/articles/PMC8272312/bin/12864_2021_7865_MOESM1_ESM.xlsx Hsapiens 1 42074"
## [106] "PMC8268579 /pmc/articles/PMC8268579/bin/13059_2021_2417_MOESM2_ESM.xlsx Mmusculus 21 43715 43716 43718 43535 43711 43529 43709 43532 43710 43531 43527 43525 43530 43533 43714 43719 43717 43712 43713 43526 43526"
## [107] "PMC8268579 /pmc/articles/PMC8268579/bin/13059_2021_2417_MOESM3_ESM.xlsx Mmusculus 20 43715 43716 43709 43718 43712 43535 43711 43529 43532 43710 43531 43527 43530 43533 43714 43719 43717 43713 43526 43526"
## [108] "PMC8266393 /pmc/articles/PMC8266393/bin/elife-69619-supp1.xlsx Scerevisiae 1 44470"
## [109] "PMC8266393 /pmc/articles/PMC8266393/bin/elife-69619-supp1.xlsx Hsapiens 20 44256 44265 44257 44258 44259 44260 44261 44262 44263 44264 44454 44450 44441 44442 44443 44444 44445 44446 44447 44448"
## [110] "PMC8266393 /pmc/articles/PMC8266393/bin/elife-69619-supp3.xlsx Scerevisiae 1 44470"
## [111] "PMC8266393 /pmc/articles/PMC8266393/bin/elife-69619-supp3.xlsx Scerevisiae 1 44470"
## [112] "PMC8266393 /pmc/articles/PMC8266393/bin/elife-69619-supp3.xlsx Scerevisiae 1 44470"
## [113] "PMC8266393 /pmc/articles/PMC8266393/bin/elife-69619-supp4.xlsx Scerevisiae 1 44470"
## [114] "PMC8266393 /pmc/articles/PMC8266393/bin/elife-69619-supp4.xlsx Scerevisiae 1 44470"
## [115] "PMC8266393 /pmc/articles/PMC8266393/bin/elife-69619-supp4.xlsx Scerevisiae 1 44470"
## [116] "PMC8255842 /pmc/articles/PMC8255842/bin/FEB4-11-1814-s002.xlsx Hsapiens 27 44257 44442 44443 44257 44446 44445 44262 44450 44264 44451 44259 44256 44261 44453 44447 44263 44441 44531 44265 44258 44440 44266 44448 44444 44256 44449 44260"
## [117] "PMC8254024 /pmc/articles/PMC8254024/bin/mmc3.xlsx Hsapiens 129 43720 43534 43526 43531 43710 43711 43530 43530 43716 43525 43531 43710 43530 43530 43716 43718 43530 43530 43715 43526 43532 43532 43533 43709 43720 43712 43534 43534 43718 43718 43718 43531 43528 43710 43710 43711 43525 43525 43525 43525 43525 43525 43525 43719 43719 43719 43530 43530 43527 43535 43535 43535 43535 43715 43715 43715 43722 43722 43722 43722 43722 43714 43714 43714 43713 43526 43532 43534 43718 43718 43718 43528 43710 43713 43711 43711 43525 43525 43525 43719 43719 43719 43530 43530 43530 43530 43715 43722 43722 43800 43800 43714 43525 43533 43717 43718 43528 43710 43719 43719 43530 43530 43525 43711 43718 43720 43717 43718 43719 43719 43720 43710 43530 43709 43530 43714 43716 43525 43710 43529 43717 43528 43534 43531 43534 43534 43715 43530 43530"
## [118] "PMC7611218 /pmc/articles/PMC7611218/bin/EMS123718-supplement-Extended_Table_1.xlsx Hsapiens 24 38412 38777 39142 37865 38596 37681 37316 38961 38047 39692 39873 38231 40422 42248 40787 39326 37226 37500 39508 41883 40603 40238 40057 41153"
## [119] "PMC7611218 /pmc/articles/PMC7611218/bin/EMS123718-supplement-Extended_Table_2.xlsx Hsapiens 4 37500 40057 40787 39326"
## [120] "PMC7611218 /pmc/articles/PMC7611218/bin/EMS123718-supplement-Extended_Table_2.xlsx Hsapiens 10 42984 42985 42988 42985 42989 42980 42993 42980 42987 42986"
## [121] "PMC8270675 /pmc/articles/PMC8270675/bin/NIHMS1664954-supplement-2.xlsx Mmusculus 81 43711 43529 43719 43719 43719 43719 43719 43719 43529 43719 43717 43711 43529 43717 43711 43529 43711 43711 43719 43715 43719 43711 43529 43719 43719 43717 43715 43719 43711 43719 43711 43715 43715 43717 43715 43719 43715 43715 43529 43715 43719 43719 43719 43719 43711 43711 43719 43715 43715 43715 43717 43715 43719 43719 43719 43719 43719 43717 43711 43529 43719 43717 43717 43719 43717 43717 43717 43529 43719 43717 43719 43719 43717 43717 43719 43719 43717 43719 43717 43717 43719"
## [122] "PMC8268225 /pmc/articles/PMC8268225/bin/13567_2021_972_MOESM4_ESM.xlsx Hsapiens 5 43717 43714 43716 43719 43526"
## [123] "PMC8268225 /pmc/articles/PMC8268225/bin/13567_2021_972_MOESM4_ESM.xlsx Hsapiens 3 43717 43714 43526"
## [124] "PMC8267460 /pmc/articles/PMC8267460/bin/NIHMS1713046-supplement-4.xlsx Hsapiens 3 43892 44166 43891"
## [125] "PMC8267417 /pmc/articles/PMC8267417/bin/Table_1.xls Hsapiens 13 2021/12/01 2021/03/10 2021/03/11 2021/03/01 2021/03/02 2021/03/03 2021/03/04 2021/03/05 2021/03/06 2021/03/07 2021/03/08 2021/03/09 2021/09/15"
## [126] "PMC8266308 /pmc/articles/PMC8266308/bin/aging-13-203162-s003.xlsx Hsapiens 1 44256"
## [127] "PMC8266308 /pmc/articles/PMC8266308/bin/aging-13-203162-s003.xlsx Hsapiens 1 44443"
## [128] "PMC8263711 /pmc/articles/PMC8263711/bin/41598_2021_93570_MOESM3_ESM.xlsx Hsapiens 1 43716"
## [129] "PMC8255068 /pmc/articles/PMC8255068/bin/peerj-09-11645-s009.xlsx Hsapiens 2 3-Sep 11-Sep"
## [130] "PMC8255068 /pmc/articles/PMC8255068/bin/peerj-09-11645-s009.xlsx Hsapiens 4 6-Mar 3-Sep 1-Mar 9-Sep"
## [131] "PMC8255068 /pmc/articles/PMC8255068/bin/peerj-09-11645-s009.xlsx Hsapiens 3 7-Sep 3-Sep 5-Sep"
## [132] "PMC8233376 /pmc/articles/PMC8233376/bin/41467_2021_24243_MOESM4_ESM.xlsx Hsapiens 1 43894"
## [133] "PMC8233376 /pmc/articles/PMC8233376/bin/41467_2021_24243_MOESM9_ESM.xlsx Hsapiens 1 44077"
## [134] "PMC8225795 zip/Source_Data1.xlsx Hsapiens 1 44441"
## [135] "PMC8225795 zip/Source_Data1.xlsx Hsapiens 1 44444"
## [136] "PMC8225795 zip/Source_data3-Somatic_mutation_profile.xlsx Hsapiens 6 43894 44076 44081 43901 43900 44086"
## [137] "PMC8222384 /pmc/articles/PMC8222384/bin/41467_2021_23993_MOESM11_ESM.xlsx Mmusculus 1 40057"
## [138] "PMC8222384 /pmc/articles/PMC8222384/bin/41467_2021_23993_MOESM11_ESM.xlsx Hsapiens 1 40057"
## [139] "PMC8222384 /pmc/articles/PMC8222384/bin/41467_2021_23993_MOESM9_ESM.xlsx Mmusculus 1 40057"
## [140] "PMC8219828 /pmc/articles/PMC8219828/bin/41467_2021_24140_MOESM4_ESM.xlsx Mmusculus 25 43891 43900 43901 43892 43893 43894 43895 43896 43897 43898 43899 44089 44075 44084 44085 44086 44088 44076 44077 44078 44079 44080 44081 44082 44083"
## [141] "PMC8219801 zip/RawData/Supplementary_Figures/FigS4/d/repressed_common_exp1_2.xlsx Hsapiens 6 39326 40057 38777 36951 38961 39508"
## [142] "PMC8219709 /pmc/articles/PMC8219709/bin/41698_2021_201_MOESM2_ESM.xlsx Hsapiens 1 MYSM1-JUN"
## [143] "PMC8217501 zip/Source_Data/source_data_supplementary_figure_5d.xlsx Mmusculus 1 37135"
## [144] "PMC8264799 /pmc/articles/PMC8264799/bin/DataSheet_2.xls Hsapiens 2 43901 43894"
## [145] "PMC8260770 /pmc/articles/PMC8260770/bin/41598_2021_93346_MOESM1_ESM.xlsx Hsapiens 6 41153 38412 38412 38412 38412 39508"
## [146] "PMC8260770 /pmc/articles/PMC8260770/bin/41598_2021_93346_MOESM1_ESM.xlsx Hsapiens 1 36951"
## [147] "PMC8260770 /pmc/articles/PMC8260770/bin/41598_2021_93346_MOESM1_ESM.xlsx Ggallus 1 36951"
## [148] "PMC8260754 /pmc/articles/PMC8260754/bin/41598_2021_92988_MOESM1_ESM.xlsx Hsapiens 26 43710 43713 43712 43723 43530 43715 43717 43714 43716 43800 43526 43529 43525 43719 43711 43528 43709 43531 43533 43527 43720 43718 43534 43532 43722 43535"
## [149] "PMC8260754 /pmc/articles/PMC8260754/bin/41598_2021_92988_MOESM2_ESM.xlsx Hsapiens 26 42982 42805 42980 42797 42800 42796 42796 42803 43070 42801 42987 42986 42802 42988 42799 42990 42795 42989 42993 42992 42983 42984 42981 42985 42798 42804"
## [150] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_11.XLSX Hsapiens 19 44440 44445 44264 44441 44446 44260 44449 44262 44448 44261 44257 44258 44447 44263 44259 44450 44266 44257 44443"
## [151] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_11.XLSX Hsapiens 19 44440 44446 44441 44264 44257 44258 44447 44263 44259 44450 44449 44266 44257 44443 44262 44261 44445 44448 44260"
## [152] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_11.XLSX Hsapiens 19 44445 44448 44440 44446 44441 44264 44263 44450 44260 44257 44261 44262 44258 44447 44259 44449 44266 44257 44443"
## [153] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_11.XLSX Hsapiens 19 44446 44262 44257 44264 44258 44447 44259 44450 44449 44266 44257 44443 44441 44440 44260 44263 44261 44445 44448"
## [154] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_11.XLSX Hsapiens 19 44445 44440 44446 44448 44262 44450 44257 44441 44260 44447 44264 44261 44263 44258 44259 44449 44266 44257 44443"
## [155] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_11.XLSX Hsapiens 19 44445 44440 44257 44448 44446 44261 44447 44258 44263 44259 44449 44266 44257 44443 44260 44264 44262 44441 44450"
## [156] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_11.XLSX Hsapiens 19 44440 44445 44446 44448 44257 44260 44450 44441 44262 44263 44261 44447 44449 44443 44258 44259 44264 44266 44257"
## [157] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_11.XLSX Hsapiens 19 44257 44445 44448 44450 44261 44443 44263 44258 44447 44259 44449 44264 44266 44257 44262 44260 44441 44446 44440"
## [158] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_11.XLSX Hsapiens 19 44445 44446 44448 44440 44262 44260 44257 44443 44261 44258 44447 44263 44259 44441 44450 44449 44264 44266 44257"
## [159] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_11.XLSX Hsapiens 19 44440 44446 44445 44257 44448 44258 44447 44263 44262 44259 44441 44450 44449 44264 44266 44261 44257 44443 44260"
## [160] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_11.XLSX Ggallus 19 44440 44446 44445 44448 44260 44257 44441 44262 44450 44261 44263 44447 44258 44259 44449 44264 44266 44257 44443"
## [161] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_11.XLSX Hsapiens 19 44450 44261 44445 44448 44258 44447 44263 44259 44449 44264 44266 44257 44443 44257 44260 44440 44446 44441 44262"
## [162] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_13.XLSX Ggallus 19 44446 44445 44440 44448 44450 44262 44260 44441 44257 44261 44263 44264 44447 44258 44259 44449 44266 44257 44443"
## [163] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_13.XLSX Hsapiens 19 44445 44257 44446 44448 44441 44262 44261 44440 44258 44447 44259 44449 44266 44257 44443 44450 44264 44263 44260"
## [164] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_14.XLSX Hsapiens 19 44440 44446 44257 44450 44441 44260 44445 44448 44261 44443 44258 44447 44263 44262 44259 44449 44264 44266 44257"
## [165] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_14.XLSX Hsapiens 19 44440 44441 44446 44448 44445 44443 44260 44450 44258 44447 44263 44262 44259 44449 44264 44266 44261 44257 44257"
## [166] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_14.XLSX Hsapiens 19 44450 44440 44446 44445 44262 44260 44443 44448 44257 44441 44261 44258 44447 44263 44259 44449 44264 44266 44257"
## [167] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_14.XLSX Hsapiens 19 44450 44257 44261 44258 44447 44263 44259 44441 44449 44264 44266 44257 44445 44448 44262 44260 44443 44446 44440"
## [168] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_14.XLSX Hsapiens 19 44440 44446 44445 44450 44448 44441 44262 44260 44261 44257 44257 44449 44263 44264 44258 44447 44259 44266 44443"
## [169] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_14.XLSX Hsapiens 19 44450 44448 44446 44445 44441 44257 44261 44260 44258 44447 44263 44259 44449 44264 44266 44443 44440 44257 44262"
## [170] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_14.XLSX Hsapiens 19 44446 44440 44445 44450 44448 44441 44261 44262 44257 44260 44263 44257 44264 44449 44447 44258 44259 44266 44443"
## [171] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_14.XLSX Hsapiens 19 44445 44446 44261 44441 44448 44257 44260 44449 44258 44447 44259 44264 44266 44257 44443 44262 44450 44263 44440"
## [172] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_14.XLSX Hsapiens 19 44450 44445 44440 44446 44448 44262 44260 44441 44261 44257 44263 44447 44264 44258 44259 44449 44266 44257 44443"
## [173] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_14.XLSX Hsapiens 19 44450 44445 44448 44257 44441 44260 44440 44262 44258 44447 44263 44259 44449 44266 44261 44257 44443 44264 44446"
## [174] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_14.XLSX Hsapiens 19 44440 44445 44446 44450 44448 44262 44260 44263 44257 44441 44258 44447 44259 44449 44264 44266 44261 44257 44443"
## [175] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_14.XLSX Hsapiens 19 44440 44446 44445 44448 44258 44447 44263 44260 44259 44441 44449 44264 44266 44261 44257 44443 44257 44450 44262"
## [176] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_14.XLSX Hsapiens 19 44445 44440 44448 44446 44262 44450 44260 44257 44441 44264 44447 44263 44258 44259 44449 44266 44261 44257 44443"
## [177] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_14.XLSX Hsapiens 19 44257 44446 44264 44450 44258 44447 44263 44260 44259 44449 44266 44261 44257 44443 44448 44441 44262 44440 44445"
## [178] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_2.XLSX Hsapiens 17 44261 44448 44449 44264 44266 44260 44442 44440 44450 44263 44257 44256 44445 44257 44447 44262 44446"
## [179] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_2.XLSX Hsapiens 17 44261 44448 44449 44264 44266 44260 44442 44440 44450 44263 44257 44256 44445 44257 44447 44262 44446"
## [180] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_2.XLSX Hsapiens 17 44261 44448 44449 44264 44266 44260 44442 44440 44450 44263 44257 44256 44445 44257 44447 44262 44446"
## [181] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_2.XLSX Hsapiens 17 44261 44448 44449 44264 44266 44260 44442 44440 44450 44263 44257 44256 44445 44257 44447 44262 44446"
## [182] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_2.XLSX Hsapiens 17 44261 44448 44449 44264 44266 44260 44442 44440 44450 44263 44257 44256 44445 44257 44447 44262 44446"
## [183] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_2.XLSX Hsapiens 17 44261 44448 44449 44264 44266 44260 44442 44440 44450 44263 44257 44256 44445 44257 44447 44262 44446"
## [184] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_2.XLSX Hsapiens 17 44256 44266 44449 44264 44450 44446 44257 44261 44448 44447 44440 44445 44263 44260 44442 44262 44257"
## [185] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_2.XLSX Hsapiens 17 44261 44448 44449 44264 44266 44260 44442 44440 44450 44263 44257 44256 44445 44257 44447 44262 44446"
## [186] "PMC8261551 /pmc/articles/PMC8261551/bin/Table_7.XLSX Hsapiens 1234 44080 44080 44075 44081 44080 44075 44075 44080 44081 44075 44080 44075 44075 44080 44080 44075 44080 44081 44080 44075 44075 44080 44081 44075 44080 44080 44075 44075 44075 44080 44081 44080 44075 44075 44080 44081 44075 44080 44075 44080 44075 44080 44075 44080 44080 44075 44080 44080 44080 44075 44080 44075 44080 44085 44075 44085 44081 44075 44085 44080 44081 44075 44085 44081 44075 44085 44081 44075 44081 44085 44075 44085 44075 44081 44075 44081 44080 44075 44081 44075 44081 44075 44081 44075 44080 44080 44081 44075 44080 44075 44080 44081 44080 44080 44075 44081 44080 44075 44080 44081 44080 44075 44080 44081 44080 44080 44075 44080 44081 44080 44075 44080 44081 44080 44075 44081 44075 44080 44080 44075 44080 44080 44080 44081 44080 44080 44080 44081 44080 44085 44080 44081 44080 44085 44075 44080 44081 44085 44083 44080 44080 44081 44075 44080 44081 44085 44083 44080 44075 44080 44081 44085 44083 44085 44080 44075 44081 44080 44085 44083 44080 44075 44081 44080 44085 44083 44085 44080 44081 44075 44085 44075 44085 44080 44081 44080 44075 44081 44080 44075 44080 44080 44081 44075 44083 44085 44080 44080 44081 44085 44075 44081 44075 44081 44075 44081 44075 44081 44075 44081 44075 44085 44085 44075 44085 44081 44080 44083 44075 44085 44080 44081 44083 44075 44085 44083 44081 44080 44075 44085 44081 44075 44085 44081 44085 44075 44085 44085 44085 44075 44081 44080 44085 44080 44075 44085 44081 44085 44075 44081 44085 44080 44075 44080 44085 44075 44081 44075 44081 44080 44085 44075 44081 44085 44080 44080 44085 44075 44075 44081 44085 44080 44085 44080 44085 44075 44085 44075 44081 44085 44080 44080 44075 44085 44081 44075 44081 44085 44080 44085 44075 44081 44080 44085 44075 44085 44081 44080 44075 44080 44081 44075 44075 44081 44080 44085 44080 44075 44085 44080 44081 44075 44085 44085 44085 44081 44080 44075 44085 44080 44085 44081 44085 44075 44081 44080 44085 44083 44080 44085 
44075 44081 44080 44075 44080 44085 44081 44080 44075 44081 44085 44080 44083 44085 44080 44085 44075 44085 44075 44081 44085 44080 44083 44080 44085 44081 44075 44075 44081 44080 44085 44083 44085 44075 44080 44081 44085 44085 44075 44085 44080 44081 44075 44080 44081 44075 44080 44080 44081 44075 44080 44085 44083 44085 44080 44085 44080 44081 44075 44085 44085 44081 44075 44081 44085 44075 44085 44081 44080 44085 44075 44075 44085 44081 44080 44075 44075 44085 44081 44080 44075 44081 44085 44075 44075 44085 44081 44080 44075 44085 44080 44081 44075 44085 44081 44080 44075 44085 44085 44075 44075 44085 44080 44085 44075 44075 44085 44081 44075 44085 44081 44080 44075 44081 44085 44085 44075 44081 44085 44081 44075 44085 44081 44075 44081 44075 44075 44075 44081 44085 44075 44075 44075 44085 44080 44075 44081 44085 44080 44085 44081 44085 44081 44080 44080 44081 44085 44075 44081 44085 44080 44085 44080 44085 44081 44081 44075 44081 44085 44080 44080 44081 44075 44081 44085 44080 44085 44080 44085 44075 44081 44085 44080 44080 44081 44075 44081 44085 44080 44081 44085 44081 44075 44085 44081 44085 44080 44085 44081 44080 44085 44083 44080 44080 44081 44075 44080 44075 44080 44081 44080 44080 44075 44081 44075 44080 44081 44080 44075 44080 44081 44080 44075 44075 44080 44081 44080 44075 44081 44080 44080 44075 44075 44080 44080 44075 44080 44080 44075 44080 44081 44081 44075 44081 44081 44075 44081 44075 44081 44075 44081 44075 44081 44075 44081 44080 44083 44080 44081 44080 44081 44075 44080 44081 44080 44080 44081 44080 44075 44080 44081 44083 44080 44081 44075 44080 44081 44080 44080 44075 44080 44081 44080 44081 44075 44080 44081 44080 44081 44080 44081 44075 44080 44080 44080 44080 44080 44081 44080 44080 44080 44080 44081 44080 44075 44081 44075 44081 44075 44081 44075 44081 44075 44081 44075 44080 44080 44075 44081 44080 44075 44080 44081 44083 44080 44080 44075 44075 44080 44081 44083 44080 44075 44080 44081 44083 44080 44075 44080 44081 44083 44080 44075 
44080 44081 44083 44080 44075 44081 44075 44080 44080 44075 44080 44075 44080 44080 44075 44081 44083 44080 44080 44080 44085 44075 44085 44081 44075 44085 44075 44085 44083 44081 44075 44085 44081 44075 44085 44081 44085 44075 44083 44075 44081 44075 44081 44075 44081 44075 44081 44075 44081 44075 44075 44081 44075 44081 44080 44075 44081 44083 44075 44081 44081 44075 44085 44075 44080 44081 44080 44075 44085 44085 44075 44081 44085 44080 44075 44080 44075 44085 44075 44080 44081 44075 44085 44080 44081 44080 44075 44075 44085 44081 44080 44085 44080 44075 44085 44085 44075 44081 44085 44080 44080 44075 44085 44075 44081 44085 44080 44075 44080 44081 44085 44075 44085 44080 44075 44080 44075 44075 44081 44080 44085 44080 44075 44085 44080 44075 44081 44085 44085 44083 44083 44083 44085 44076 44083 44085 44085 44076 44083 44075 44085 44083 44081 44085 44083 44085 44083 44083 44083 44085 44075 44083 44081 44076 44080 44083 44085 44075 44085 44083 44081 44085 44085 44083 44085 44075 44085 44083 44081 44085 44083 44075 44085 44083 44081 44085 44085 44083 44085 44075 44083 44083 44085 44085 44083 44083 44083 44085 44083 44085 44085 44081 44085 44075 44081 44085 44083 44080 44075 44085 44081 44080 44083 44075 44085 44081 44083 44080 44075 44081 44085 44080 44083 44075 44081 44085 44080 44083 44085 44081 44085 44075 44081 44083 44080 44085 44085 44085 44075 44085 44081 44075 44085 44081 44080 44075 44085 44081 44085 44075 44085 44081 44075 44085 44081 44085 44085 44075 44085 43892 43892 43892 43892 44075 43892 43892 44075 43892 44075 43892 44075 43892 44075 43892 43892 43892 43892 43892 44075 44080 44080 44075 44075 44080 44081 44075 44080 44075 44075 44080 44075 44080 44081 44080 44075 44075 44080 44081 44075 44080 44075 44075 44075 44080 44081 44080 44075 44075 44080 44081 44075 44075 44080 44075 44080 44075 44075 44080 44075 44080 44075 44075 44080 44081 44080 44075 44075 44080 44081 44080 44075 44075 44080 44081 44075 44080 44081 44080 44075 44075 44080 44081 44080 
44075 44075 44080 44081 44080 44075 44075 44081 44080 44075 44080 44075 44080 44075 44080 44075 44075 44080 44080 44075 44080 44075 44081 44085 44075 44081 44080 44085 44075 44080 44085 44085 44075 44075 44085 44081 44080 44075 44075 44080 44085 44075 44081 44085 44080 44075 44075 44085 44080 44081 44080 44075 44085 44075 44085 44081 44080 44075 44085 44075 44080 44085 44075 44085 44075 44085 44081 44080 44075 44080 44085 44075 44081 44085 44080 44085 44075 44075 44081 44080 44085 44075 44085 44080 44075 44080 44075 44075 44075 44081 44085 44075 44080 44085 44075 44080 44081 44085 44085 44085 44080 44083 44080 44081 44085 44080 44085 44081 44085 44075 44085 44081 44080 44080 44085 44080 44081 44075 44085 44080 44081 44080 44085 44081 44075 44085 44080 44081 44085 44080 44085 44085 44075 44085 44081 44080 44080 44085 44081 44075 44085 44081 44080 44083 44080 44081 44085 44081 44085 44075 44080 44080 44085 44080 44085 44080 44081 44085 44081 44075 44081 44075 44081 44080 44075 44081 44075 44081 44075 44081 44075 44085 44085 44075 44085 44081 44083 44075 44085 44081 44080 44083 44075 44085 44081 44083 44085 44075 44085 44081 44075 44085 44081 44085 44075 44085 44085 44085 44083 44075 44080 44075 44080 44083 44075 44075 44080 44081 44075 44075 44080 44075 44080 44075 44080 44081 44083 44080 44075 44075 44080 44081 44083 44075 44075 44080 44075 44075 44080 44081 44075 44080 44075 44081 44080 44075 44080 44075 44080 44075 44075 44075 44075 44075 44075 44080 44075 44080"
## [187] "PMC8260222 /pmc/articles/PMC8260222/bin/elife-69454-supp1.xlsx Hsapiens 1 43711"
## [188] "PMC8260222 /pmc/articles/PMC8260222/bin/elife-69454-supp2.xlsx Hsapiens 1 41888"
## [189] "PMC8219162 /pmc/articles/PMC8219162/bin/pgen.1009574.s008.xlsx Mmusculus 3 44442 44443 44264"
## [190] "PMC8219162 /pmc/articles/PMC8219162/bin/pgen.1009574.s008.xlsx Mmusculus 3 44443 44264 44440"
## [191] "PMC8219162 /pmc/articles/PMC8219162/bin/pgen.1009574.s008.xlsx Mmusculus 9 44265 44258 44259 44260 44261 44262 44450 44446 44448"
## [192] "PMC8219162 /pmc/articles/PMC8219162/bin/pgen.1009574.s008.xlsx Mmusculus 13 44256 44265 44266 44258 44260 44262 44264 44449 44450 44441 44444 44445 44448"
## [193] "PMC8219162 /pmc/articles/PMC8219162/bin/pgen.1009574.s008.xlsx Mmusculus 2 44258 44260"
## [194] "PMC8219162 /pmc/articles/PMC8219162/bin/pgen.1009574.s008.xlsx Mmusculus 2 44258 44260"
## [195] "PMC8219162 /pmc/articles/PMC8219162/bin/pgen.1009574.s008.xlsx Mmusculus 15 44256 44257 44256 44265 44259 44262 44263 44264 44449 44450 44451 44444 44446 44447 44448"
## [196] "PMC8219162 /pmc/articles/PMC8219162/bin/pgen.1009574.s008.xlsx Mmusculus 3 44442 44443 44264"
## [197] "PMC8219162 /pmc/articles/PMC8219162/bin/pgen.1009574.s008.xlsx Mmusculus 1 44264"
## [198] "PMC8219162 /pmc/articles/PMC8219162/bin/pgen.1009574.s008.xlsx Mmusculus 1 44264"
## [199] "PMC8259168 /pmc/articles/PMC8259168/bin/13059_2021_2413_MOESM14_ESM.xlsx Hsapiens 1 37865"
## [200] "PMC8258016 /pmc/articles/PMC8258016/bin/evab103_supplementary_data.xlsx Mmusculus 15 41883 40057 38961 38777 37865 38412 37681 38596 40603 39508 36951 39873 40787 37316 38047"
## [201] "PMC8247369 /pmc/articles/PMC8247369/bin/jcav12p4780s2.xlsx Hsapiens 1 44257"
## [202] "PMC8226361 /pmc/articles/PMC8226361/bin/sj-xlsx-2-cix-10.1177_11769351211027592.xlsx Hsapiens 31 44083 44083 44083 44083 44083 44083 44083 44083 44083 44083 44083 44083 44083 44079 43891 43900 43901 44082 43900 43891 43900 44083 44083 43901 44083 43901 44083 44079 44083 43891 44083"
## [203] "PMC8226361 /pmc/articles/PMC8226361/bin/sj-xlsx-2-cix-10.1177_11769351211027592.xlsx Hsapiens 1 43891"
## [204] "PMC8226361 /pmc/articles/PMC8226361/bin/sj-xlsx-2-cix-10.1177_11769351211027592.xlsx Hsapiens 2 44088 44083"
## [205] "PMC8224457 /pmc/articles/PMC8224457/bin/MSB-17-e9760-s005.xlsx Hsapiens 1 39873"
## [206] "PMC8253514 /pmc/articles/PMC8253514/bin/Data_Sheet_2.xlsx Hsapiens 1 40787"
## [207] "PMC8253049 /pmc/articles/PMC8253049/bin/DataSheet_1.xlsx Ggallus 12 44257 44261 44264 44256 44258 44259 44266 44265 44260 44454 44263 44262"
## [208] "PMC8253049 /pmc/articles/PMC8253049/bin/DataSheet_1.xlsx Hsapiens 13 44257 44259 44261 44258 44531 44266 44262 44264 44265 44256 44263 44454 44260"
## [209] "PMC8249860 /pmc/articles/PMC8249860/bin/Table_3.xlsx Hsapiens 4 43529 43722 43534 43720"
## [210] "PMC8249860 /pmc/articles/PMC8249860/bin/Table_7.xlsx Hsapiens 9 43529 43529 43529 43529 43529 43529 43529 43529 43529"
## [211] "PMC8248162 /pmc/articles/PMC8248162/bin/ALZ-17-984-s003.xlsx Hsapiens 1 44078"
## [212] "PMC8246630 /pmc/articles/PMC8246630/bin/NIHMS1610898-supplement-1610898_Source_Data_Fig_4.xlsx Mmusculus 25 43891 43900 43901 43892 43893 43894 43895 43896 43897 43898 43899 44089 44075 44084 44085 44086 44088 44076 44077 44078 44079 44080 44081 44082 44083"
## [213] "PMC8246630 /pmc/articles/PMC8246630/bin/NIHMS1610898-supplement-1610898_Source_Data_Fig_5.xlsx Mmusculus 23 43891 43892 43900 43901 43893 43895 43896 43897 43898 43899 44089 44075 44084 44085 44086 44076 44077 44078 44079 44080 44081 44082 44083"
## [214] "PMC8246630 /pmc/articles/PMC8246630/bin/NIHMS1610898-supplement-1610898_Source_Data_Fig_5.xlsx Mmusculus 23 43891 43901 43892 43893 43894 43895 43896 43897 43898 43899 44089 44075 44084 44085 44086 44076 44077 44078 44079 44080 44081 44082 44083"
## [215] "PMC8246630 /pmc/articles/PMC8246630/bin/NIHMS1610898-supplement-1610898_Source_Data_Fig_5.xlsx Mmusculus 24 43525 43528 43534 43712 43532 43527 43531 43713 43529 43716 43723 43715 43709 43710 43711 43526 43530 43719 43720 43533 43714 43718 43717 43535"
## [216] "PMC8246630 /pmc/articles/PMC8246630/bin/NIHMS1610898-supplement-1610898_Sup_Dataset_3.xlsx Mmusculus 1 43535"
## [217] "PMC8246630 /pmc/articles/PMC8246630/bin/NIHMS1610898-supplement-1610898_Sup_Dataset_4.xlsx Mmusculus 4 43531 43715 43529 43710"
## [218] "PMC8246630 /pmc/articles/PMC8246630/bin/NIHMS1610898-supplement-1610898_Sup_Dataset_4.xlsx Mmusculus 7 43531 43715 43532 43529 43530 43723 43710"
## [219] "PMC8246630 /pmc/articles/PMC8246630/bin/NIHMS1610898-supplement-1610898_Sup_Dataset_4.xlsx Mmusculus 4 43712 43532 43531 43529"
## [220] "PMC8213753 /pmc/articles/PMC8213753/bin/41467_2021_24043_MOESM13_ESM.xlsx Hsapiens 1 44077"
## [221] "PMC8213753 /pmc/articles/PMC8213753/bin/41467_2021_24043_MOESM20_ESM.xlsx Hsapiens 9 43901 44077 43896 43894 43900 43899 44084 43892 43892"
## [222] "PMC8213753 /pmc/articles/PMC8213753/bin/41467_2021_24043_MOESM5_ESM.xlsx Hsapiens 4 43894 44080 43892 44075"
## [223] "PMC8213753 /pmc/articles/PMC8213753/bin/41467_2021_24043_MOESM6_ESM.xlsx Hsapiens 1 44075"
## [224] "PMC8211852 /pmc/articles/PMC8211852/bin/41467_2021_24044_MOESM11_ESM.xlsx Hsapiens 21 43529 43534 43525 43525 43526 43530 43531 43532 43723 43709 43719 43710 43711 43714 43715 43716 43717 43527 43533 43718 43712"
## [225] "PMC8211852 /pmc/articles/PMC8211852/bin/41467_2021_24044_MOESM5_ESM.xlsx Hsapiens 21 43529 43534 43525 43525 43526 43530 43531 43532 43723 43709 43719 43710 43711 43714 43715 43716 43717 43527 43533 43718 43712"
## [226] "PMC8211852 /pmc/articles/PMC8211852/bin/41467_2021_24044_MOESM6_ESM.xlsx Hsapiens 21 43525 43534 43526 43527 43529 43530 43531 43532 43533 43719 43710 43711 43714 43715 43717 43718 43712 43723 43525 43709 43716"
## [227] "PMC8203611 /pmc/articles/PMC8203611/bin/41467_2021_23873_MOESM4_ESM.xlsx Hsapiens 6 43895 44083 43899 44082 43900 44088"
## [228] "PMC8244194 /pmc/articles/PMC8244194/bin/13148_2021_1119_MOESM4_ESM.xlsx Hsapiens 11 44083 44083 44080 44080 44080 44083 44166 44083 44083 44083 44083"
## [229] "PMC8221374 /pmc/articles/PMC8221374/bin/mmc1.xlsx Ggallus 1 37500"
## [230] "PMC8221374 /pmc/articles/PMC8221374/bin/mmc1.xlsx Ggallus 1 37500"
## [231] "PMC8221374 /pmc/articles/PMC8221374/bin/mmc1.xlsx Hsapiens 1 38231"
## [232] "PMC8221374 /pmc/articles/PMC8221374/bin/mmc1.xlsx Hsapiens 77 36951 37135 39142 40422 41153 37316 37500 38777 39692 40603 40603 41883 38412 40422 39142 37865 41153 42248 39508 38412 40057 38777 37316 40238 37865 40422 38047 37865 39326 39508 38231 41883 40057 39508 37316 37135 37135 41153 40057 40057 39142 39142 38047 38047 38047 38777 38777 39692 40057 38047 40787 39692 39508 41153 36951 40787 40057 39692 36951 38961 38231 38047 38047 36951 39326 41883 40057 38777 39692 41883 38412 38231 40238 37865 39142 39873 37500"
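The five-digit values in the listing above look like Excel date serial numbers, which is what a date-converted gene name becomes when the cell is read back as a number. As a minimal sketch, assuming the files use Excel's default 1900-based date system, the serials can be decoded in base R. The helper name `excel_serial_to_date` is hypothetical and not part of the original script.

```r
# Assumption: the values are serial dates in Excel's default (Windows,
# 1900-based) date system, for which the matching R origin is 1899-12-30.
excel_serial_to_date <- function(serial) {
  as.Date(serial, origin = "1899-12-30")
}

# Three values taken from the listing above
serials <- c(44440, 44256, 44075)
dates <- excel_serial_to_date(serials)

# Numeric year-month-day formatting avoids locale-dependent month names
decoded <- data.frame(
  serial = serials,
  date = format(dates, "%Y-%m-%d"),
  stringsAsFactors = FALSE
)
print(decoded)
```

Under that assumption, 44440 and 44256 decode to 1 September 2021 and 1 March 2021, the kind of dates Excel produces from symbols such as SEPT1 or MARCH1. The serial number alone cannot identify the original symbol, so this is only a sanity check.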
Let’s investigate the errors in more detail.
# By species
SPECIES <- sapply(strsplit(ERROR_GENELISTS," "),"[[",3)
table(SPECIES)
## SPECIES
## Athaliana Dmelanogaster Ggallus Hsapiens Mmusculus
## 3 3 7 163 47
## Rnorvegicus Scerevisiae
## 1 8
par(mar=c(5,12,4,2))
barplot(table(SPECIES),horiz=TRUE,las=1)
par(mar=c(5,5,4,2))
# Number of affected Excel files per paper
DIST <- table(sapply(strsplit(ERROR_GENELISTS," "),"[[",1))
DIST
##
## PMC7611218 PMC8203611 PMC8211852 PMC8213753 PMC8217501 PMC8219162 PMC8219709
## 3 1 3 4 1 10 1
## PMC8219801 PMC8219828 PMC8221374 PMC8222384 PMC8224457 PMC8225795 PMC8226361
## 1 1 4 3 1 3 3
## PMC8233376 PMC8238961 PMC8244194 PMC8246630 PMC8247369 PMC8248162 PMC8249860
## 2 1 1 8 1 1 2
## PMC8253049 PMC8253514 PMC8253816 PMC8254024 PMC8255068 PMC8255842 PMC8257577
## 2 1 10 1 3 1 2
## PMC8258016 PMC8259168 PMC8260222 PMC8260754 PMC8260770 PMC8261551 PMC8263711
## 1 1 2 2 3 37 1
## PMC8264799 PMC8266308 PMC8266393 PMC8267417 PMC8267460 PMC8268225 PMC8268579
## 1 2 8 1 1 2 2
## PMC8270675 PMC8270899 PMC8272312 PMC8273913 PMC8273915 PMC8274021 PMC8276479
## 1 1 1 4 33 4 1
## PMC8277874 PMC8278202 PMC8278573 PMC8279762 PMC8279952 PMC8280078 PMC8280127
## 1 2 7 1 1 2 1
## PMC8281136 PMC8283158 PMC8283691 PMC8284395 PMC8288011 PMC8289438 PMC8293916
## 1 4 1 5 1 2 1
## PMC8298252 PMC8299662 PMC8300563 PMC8312417 PMC8312575 PMC8318262 PMC8319688
## 2 2 2 1 3 3 1
## PMC8320537 PMC8322331
## 1 3
summary(as.numeric(DIST))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.000 2.000 3.222 3.000 37.000
hist(DIST,main="Number of affected Excel files per paper")
# PMC Articles with the most errors
DIST_DF <- as.data.frame(DIST)
DIST_DF <- DIST_DF[order(-DIST_DF$Freq),,drop=FALSE]
head(DIST_DF,20)
## Var1 Freq
## 34 PMC8261551 37
## 47 PMC8273915 33
## 6 PMC8219162 10
## 24 PMC8253816 10
## 18 PMC8246630 8
## 38 PMC8266393 8
## 52 PMC8278573 7
## 60 PMC8284395 5
## 4 PMC8213753 4
## 10 PMC8221374 4
## 46 PMC8273913 4
## 48 PMC8274021 4
## 58 PMC8283158 4
## 1 PMC7611218 3
## 3 PMC8211852 3
## 11 PMC8222384 3
## 13 PMC8225795 3
## 14 PMC8226361 3
## 26 PMC8255068 3
## 33 PMC8260770 3
MOST_ERR_FILES = as.character(DIST_DF[1,1])
MOST_ERR_FILES
## [1] "PMC8261551"
# Number of errors per paper
NERR <- as.numeric(sapply(strsplit(ERROR_GENELISTS," "),"[[",4))
names(NERR) <- sapply(strsplit(ERROR_GENELISTS," "),"[[",1)
NERR <-tapply(NERR, names(NERR), sum)
NERR
## PMC7611218 PMC8203611 PMC8211852 PMC8213753 PMC8217501 PMC8219162 PMC8219709
## 38 6 63 15 1 52 1
## PMC8219801 PMC8219828 PMC8221374 PMC8222384 PMC8224457 PMC8225795 PMC8226361
## 6 25 80 3 1 8 34
## PMC8233376 PMC8238961 PMC8244194 PMC8246630 PMC8247369 PMC8248162 PMC8249860
## 2 1 11 111 1 1 13
## PMC8253049 PMC8253514 PMC8253816 PMC8254024 PMC8255068 PMC8255842 PMC8257577
## 25 1 263 129 9 27 13
## PMC8258016 PMC8259168 PMC8260222 PMC8260754 PMC8260770 PMC8261551 PMC8263711
## 15 1 2 52 8 1902 1
## PMC8264799 PMC8266308 PMC8266393 PMC8267417 PMC8267460 PMC8268225 PMC8268579
## 2 2 27 13 3 8 41
## PMC8270675 PMC8270899 PMC8272312 PMC8273913 PMC8273915 PMC8274021 PMC8276479
## 81 1 1 19 66 14 1
## PMC8277874 PMC8278202 PMC8278573 PMC8279762 PMC8279952 PMC8280078 PMC8280127
## 23 15 72 8 2 15 2
## PMC8281136 PMC8283158 PMC8283691 PMC8284395 PMC8288011 PMC8289438 PMC8293916
## 14 8 1 5 1 7 1
## PMC8298252 PMC8299662 PMC8300563 PMC8312417 PMC8312575 PMC8318262 PMC8319688
## 7 4 19 25 14 5 2
## PMC8320537 PMC8322331
## 11 7
hist(NERR,main="Number of errors per PMC article")
NERR_DF <- as.data.frame(NERR)
NERR_DF <- NERR_DF[order(-NERR_DF$NERR),,drop=FALSE]
head(NERR_DF,20)
## NERR
## PMC8261551 1902
## PMC8253816 263
## PMC8254024 129
## PMC8246630 111
## PMC8270675 81
## PMC8221374 80
## PMC8278573 72
## PMC8273915 66
## PMC8211852 63
## PMC8219162 52
## PMC8260754 52
## PMC8268579 41
## PMC7611218 38
## PMC8226361 34
## PMC8255842 27
## PMC8266393 27
## PMC8219828 25
## PMC8253049 25
## PMC8312417 25
## PMC8277874 23
MOST_ERR = rownames(NERR_DF)[1]
MOST_ERR
## [1] "PMC8261551"
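The two summaries above both come from splitting each record on spaces and picking fields by position. A toy version, using made-up records in the same layout, may make the field positions easier to follow. The record strings and the variable names below are illustrative assumptions, not data from the report.

```r
# Mock records in the layout used above:
# "<PMCID> <file path> <species> <number of errors> <values ...>"
# These strings are invented for illustration only.
mock_lists <- c(
  "PMC0000001 /bin/a.xlsx Hsapiens 2 44440 44256",
  "PMC0000001 /bin/b.xlsx Hsapiens 1 44075",
  "PMC0000002 /bin/c.xlsx Mmusculus 3 44442 44443 44264"
)

fields <- strsplit(mock_lists, " ")
pmcid  <- vapply(fields, function(f) f[[1]], character(1))
n_err  <- vapply(fields, function(f) as.numeric(f[[4]]), numeric(1))

# Flagged records per article (the DIST step)
records_per_article <- table(pmcid)

# Total converted gene names per article (the NERR step)
errors_per_article <- tapply(n_err, pmcid, sum)

print(records_per_article)
print(errors_per_article)
```

Positional splitting like this relies on the file path containing no spaces; a path with a space would shift the species and count fields by one.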
GENELIST_ERROR_ARTICLES <- gsub("PMC","",GENELIST_ERROR_ARTICLES)
### JSON PARSING is more reliable than XML
ARTICLES <- esummary( GENELIST_ERROR_ARTICLES , db="pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLES,as= "parsed")
ARTICLE_DATA <- ARTICLE_DATA$result
ARTICLE_DATA <- ARTICLE_DATA[2:length(ARTICLE_DATA)]
JOURNALS <- unlist(lapply(ARTICLE_DATA,function(x) {x$fulljournalname} ))
JOURNALS_TABLE <- table(JOURNALS)
JOURNALS_TABLE <- JOURNALS_TABLE[order(-JOURNALS_TABLE)]
length(JOURNALS_TABLE)
## [1] 39
NUM_JOURNALS=length(JOURNALS_TABLE)
par(mar=c(5,25,4,2))
barplot(head(JOURNALS_TABLE,10), horiz=TRUE, las=1,
xlab="Articles with gene name errors in supp files",
main="Top journals this month")
Congrats to our Journal of the Month winner!
JOURNAL_WINNER <- names(head(JOURNALS_TABLE,1))
JOURNAL_WINNER
## [1] "Nature Communications"
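One detail of the journal tally is worth illustrating: `unlist(lapply(...))` silently drops `NULL` values, so a record without a `fulljournalname` field would vanish from the count. The sketch below uses a hypothetical mock of the parsed result (plain nested lists, no network call) and a hypothetical helper `get_journal` to keep one slot per article.

```r
# Hypothetical stand-in for the parsed esummary result: a named list with one
# record per article. The second record deliberately lacks a journal name.
mock_records <- list(
  "111" = list(uid = "111", fulljournalname = "Journal A"),
  "222" = list(uid = "222"),
  "333" = list(uid = "333", fulljournalname = "Journal B"),
  "444" = list(uid = "444", fulljournalname = "Journal A")
)

# Return NA instead of NULL so that every article keeps its position
get_journal <- function(rec) {
  j <- rec[["fulljournalname"]]
  if (is.null(j) || !nzchar(j)) NA_character_ else j
}

journals <- vapply(mock_records, get_journal, character(1))

# Count journals, keeping a separate NA bin for records without a name
journal_counts <- sort(table(journals, useNA = "ifany"), decreasing = TRUE)
print(journal_counts)
```

Whether real PMC summaries ever omit the journal name is not something the report shows; the guard simply makes such a case visible rather than hidden.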
There are two categories:
Paper with the most supplementary files affected by gene name errors (MOST_ERR_FILES)
Paper with the most gene names converted to dates (MOST_ERR)
Sometimes, one paper can win both categories. Congrats to our winners.
MOST_ERR_FILES <- gsub("PMC","",MOST_ERR_FILES)
ARTICLES <- esummary( MOST_ERR_FILES , db="pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLES,as= "parsed")
ARTICLE_DATA <- ARTICLE_DATA[2]
ARTICLE_DATA
## $result
## $result$uids
## [1] "8261551"
##
## $result$`8261551`
## $result$`8261551`$uid
## [1] "8261551"
##
## $result$`8261551`$pubdate
## [1] "2021 Jun 23"
##
## $result$`8261551`$epubdate
## [1] "2021 Jun 23"
##
## $result$`8261551`$printpubdate
## [1] ""
##
## $result$`8261551`$source
## [1] "Front Genet"
##
## $result$`8261551`$authors
## name authtype
## 1 Herrera-Uribe J Author
## 2 Wiarda JE Author
## 3 Sivasankaran SK Author
## 4 Daharsh L Author
## 5 Liu H Author
## 6 Byrne KA Author
## 7 Smith TP Author
## 8 Lunney JK Author
## 9 Loving CL Author
## 10 Tuggle CK Author
##
## $result$`8261551`$title
## [1] "Reference Transcriptomes of Porcine Peripheral Immune Cells Created Through Bulk and Single-Cell RNA Sequencing"
##
## $result$`8261551`$volume
## [1] "12"
##
## $result$`8261551`$issue
## [1] ""
##
## $result$`8261551`$pages
## [1] "689406"
##
## $result$`8261551`$articleids
## idtype value
## 1 pmid 34249103
## 2 doi 10.3389/fgene.2021.689406
## 3 pmcid PMC8261551
##
## $result$`8261551`$fulljournalname
## [1] "Frontiers in Genetics"
##
## $result$`8261551`$sortdate
## [1] "2021/06/23 00:00"
##
## $result$`8261551`$pmclivedate
## [1] "2021/07/08"
MOST_ERR <- gsub("PMC","",MOST_ERR)
ARTICLE_DATA <- esummary(MOST_ERR,db = "pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLE_DATA,as= "parsed")
ARTICLE_DATA
## $header
## $header$type
## [1] "esummary"
##
## $header$version
## [1] "0.3"
##
##
## $result
## $result$uids
## [1] "8261551"
##
## $result$`8261551`
## $result$`8261551`$uid
## [1] "8261551"
##
## $result$`8261551`$pubdate
## [1] "2021 Jun 23"
##
## $result$`8261551`$epubdate
## [1] "2021 Jun 23"
##
## $result$`8261551`$printpubdate
## [1] ""
##
## $result$`8261551`$source
## [1] "Front Genet"
##
## $result$`8261551`$authors
## name authtype
## 1 Herrera-Uribe J Author
## 2 Wiarda JE Author
## 3 Sivasankaran SK Author
## 4 Daharsh L Author
## 5 Liu H Author
## 6 Byrne KA Author
## 7 Smith TP Author
## 8 Lunney JK Author
## 9 Loving CL Author
## 10 Tuggle CK Author
##
## $result$`8261551`$title
## [1] "Reference Transcriptomes of Porcine Peripheral Immune Cells Created Through Bulk and Single-Cell RNA Sequencing"
##
## $result$`8261551`$volume
## [1] "12"
##
## $result$`8261551`$issue
## [1] ""
##
## $result$`8261551`$pages
## [1] "689406"
##
## $result$`8261551`$articleids
## idtype value
## 1 pmid 34249103
## 2 doi 10.3389/fgene.2021.689406
## 3 pmcid PMC8261551
##
## $result$`8261551`$fulljournalname
## [1] "Frontiers in Genetics"
##
## $result$`8261551`$sortdate
## [1] "2021/06/23 00:00"
##
## $result$`8261551`$pmclivedate
## [1] "2021/07/08"
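The full summary printed above is long, and only a few fields are needed to identify the winning paper. As a rough sketch, a small helper could pull out just those fields. The function name `summarise_article` is hypothetical, and the mock record copies its values from the output above rather than querying PMC.

```r
# Mock record shaped like one element of the parsed result printed above;
# the values are copied from that output, not fetched from PMC.
mock_record <- list(
  uid = "8261551",
  title = paste(
    "Reference Transcriptomes of Porcine Peripheral Immune Cells",
    "Created Through Bulk and Single-Cell RNA Sequencing"
  ),
  fulljournalname = "Frontiers in Genetics",
  articleids = data.frame(
    idtype = c("pmid", "doi", "pmcid"),
    value  = c("34249103", "10.3389/fgene.2021.689406", "PMC8261551"),
    stringsAsFactors = FALSE
  )
)

# Hypothetical helper: reduce one record to a short named character vector
summarise_article <- function(rec) {
  ids <- rec$articleids
  doi <- ids$value[ids$idtype == "doi"]
  if (length(doi) == 0) doi <- NA_character_
  c(
    pmcid   = paste0("PMC", rec$uid),
    journal = rec$fulljournalname,
    title   = rec$title,
    doi     = doi[1]
  )
}

article_summary <- summarise_article(mock_record)
print(article_summary)
```

With the real data, the same helper could be applied to the element of `ARTICLE_DATA$result` named by the numeric ID, assuming the parsed structure matches what is printed above.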
Next, I plot the trend over the past 6-12 months using the previously published reports.
url <- "http://ziemann-lab.net/public/gene_name_errors/"
doc <- htmlParse(url)
links <- xpathSApply(doc, "//a/@href")
links <- links[grep("html",links)]
links
## href href href
## "Report_2021-02.html" "Report_2021-03.html" "Report_2021-04.html"
## href href href
## "Report_2021-05.html" "Report_2021-06.html" "Report_2021-07.html"
unlink("online_files/",recursive=TRUE)
dir.create("online_files")
sapply(links, function(mylink) {
download.file(paste(url,mylink,sep=""),destfile=paste("online_files/",mylink,sep=""))
} )
## href href href href href href
## 0 0 0 0 0 0
myfilelist <- list.files("online_files/",full.names=TRUE)
trends <- sapply(myfilelist, function(myfilename) {
x <- readLines(myfilename)
# Num XL gene list articles
NUM_GENELIST_ARTICLES <- x[grep("NUM_GENELIST_ARTICLES",x)[3]+1]
NUM_GENELIST_ARTICLES <- sapply(strsplit(NUM_GENELIST_ARTICLES," "),"[[",3)
NUM_GENELIST_ARTICLES <- sapply(strsplit(NUM_GENELIST_ARTICLES,"<"),"[[",1)
NUM_GENELIST_ARTICLES <- as.numeric(NUM_GENELIST_ARTICLES)
# number of affected articles
NUM_ERROR_GENELIST_ARTICLES <- x[grep("NUM_ERROR_GENELIST_ARTICLES",x)[3]+1]
NUM_ERROR_GENELIST_ARTICLES <- sapply(strsplit(NUM_ERROR_GENELIST_ARTICLES," "),"[[",3)
NUM_ERROR_GENELIST_ARTICLES <- sapply(strsplit(NUM_ERROR_GENELIST_ARTICLES,"<"),"[[",1)
NUM_ERROR_GENELIST_ARTICLES <- as.numeric(NUM_ERROR_GENELIST_ARTICLES)
# Error proportion
ERROR_PROPORTION <- x[grep("ERROR_PROPORTION",x)[3]+1]
ERROR_PROPORTION <- sapply(strsplit(ERROR_PROPORTION," "),"[[",3)
ERROR_PROPORTION <- sapply(strsplit(ERROR_PROPORTION,"<"),"[[",1)
ERROR_PROPORTION <- as.numeric(ERROR_PROPORTION)
# number of journals
NUM_JOURNALS <- x[grep('JOURNALS_TABLE',x)[3]+1]
NUM_JOURNALS <- sapply(strsplit(NUM_JOURNALS," "),"[[",3)
NUM_JOURNALS <- sapply(strsplit(NUM_JOURNALS,"<"),"[[",1)
NUM_JOURNALS <- as.numeric(NUM_JOURNALS)
NUM_JOURNALS
res <- c(NUM_GENELIST_ARTICLES,NUM_ERROR_GENELIST_ARTICLES,ERROR_PROPORTION,NUM_JOURNALS)
return(res)
})
colnames(trends) <- sapply(strsplit(colnames(trends),"_"),"[[",3)
colnames(trends) <- gsub(".html","",colnames(trends),fixed=TRUE)
trends <- as.data.frame(trends)
rownames(trends) <- c("NUM_GENELIST_ARTICLES","NUM_ERROR_GENELIST_ARTICLES","ERROR_PROPORTION","NUM_JOURNALS")
trends <- t(trends)
trends <- as.data.frame(trends)
CURRENT_RES <- c(NUM_GENELIST_ARTICLES,NUM_ERROR_GENELIST_ARTICLES,ERROR_PROPORTION,NUM_JOURNALS)
trends <- rbind(trends,CURRENT_RES)
paste(CURRENT_YEAR,CURRENT_MONTH,sep="-")
## [1] "2021-08"
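The scraping function above repeats the same grep, split and convert pattern for each of the four statistics. One possible way to factor it out is sketched below. The helper name `extract_report_value`, its `occurrence` argument and the mock lines are all assumptions for illustration; in particular, the mock assumes the value line looks like `## [1] 39</code></pre>`, which is inferred from how the original code splits the line rather than copied from a real report.

```r
# Hypothetical helper: find the n-th line mentioning `key`, take the line
# after it, and read the number that follows the "[1]" index.
extract_report_value <- function(lines, key, occurrence = 3) {
  hits <- grep(key, lines, fixed = TRUE)
  # Give up if the key is too rare or the match is the last line
  if (length(hits) < occurrence || hits[occurrence] >= length(lines)) {
    return(NA_real_)
  }
  value_line <- lines[hits[occurrence] + 1]
  # Drop everything up to and including "[1]", then any trailing markup
  value <- sub("^.*\\[1\\]\\s*", "", value_line)
  value <- sub("<.*$", "", value)
  suppressWarnings(as.numeric(value))
}

# Made-up lines imitating a rendered report; not copied from a real file
mock_report <- c(
  "NUM_JOURNALS mentioned once",
  "NUM_JOURNALS mentioned twice",
  "<pre class=\"r\"><code>NUM_JOURNALS</code></pre>",
  "<pre><code>## [1] 39</code></pre>"
)

found   <- extract_report_value(mock_report, "NUM_JOURNALS")
missing <- extract_report_value(mock_report, "NOT_PRESENT")
print(c(found = found, missing = missing))
```

Returning `NA` for a missing key means an older report with a different layout would leave a gap in the trend rather than stop the whole run with a subscript error.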
rownames(trends)[nrow(trends)] <- paste(CURRENT_YEAR,CURRENT_MONTH,sep="-")
plot(trends$NUM_GENELIST_ARTICLES, xaxt = "n" , type="b" , main="Number of articles with Excel gene lists per month",
ylab="number of articles", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
plot(trends$NUM_ERROR_GENELIST_ARTICLES, xaxt = "n" , type="b" , main="Number of articles with gene name errors per month",
ylab="number of articles", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
plot(trends$ERROR_PROPORTION, xaxt = "n" , type="b" , main="Proportion of articles with Excel gene list affected by errors",
ylab="proportion", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
plot(trends$NUM_JOURNALS, xaxt = "n" , type="b" , main="Number of journals with affected articles",
ylab="number of journals", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
unlink("online_files/",recursive=TRUE)
Zeeberg, B.R., Riss, J., Kane, D.W. et al. Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics. BMC Bioinformatics 5, 80 (2004). https://doi.org/10.1186/1471-2105-5-80
Ziemann, M., Eren, Y. & El-Osta, A. Gene name errors are widespread in the scientific literature. Genome Biol 17, 177 (2016). https://doi.org/10.1186/s13059-016-1044-7
sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##
## locale:
## [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
## [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
## [7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] readxl_1.3.1 reutils_0.2.3 xml2_1.3.2 jsonlite_1.7.2 XML_3.99-0.6
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.7 knitr_1.33 magrittr_2.0.1 R6_2.5.0
## [5] rlang_0.4.11 stringr_1.4.0 highr_0.9 tools_4.1.0
## [9] xfun_0.24 jquerylib_0.1.4 htmltools_0.5.1.1 yaml_2.2.1
## [13] digest_0.6.27 assertthat_0.2.1 sass_0.4.0 bitops_1.0-7
## [17] RCurl_1.98-1.3 evaluate_0.14 rmarkdown_2.9 stringi_1.7.3
## [21] compiler_4.1.0 bslib_0.2.5.1 cellranger_1.1.0