Source: https://github.com/markziemann/GeneNameErrors2020
View the reports: http://ziemann-lab.net/public/gene_name_errors/
Gene name errors result when data are imported improperly into MS Excel and other spreadsheet programs (Zeeberg et al, 2004). Certain gene names like MARCH3, SEPT2 and DEC1 are converted into date format. These errors are surprisingly common in supplementary data files in the field of genomics (Ziemann et al, 2016). This could be considered a minor problem because it affects only a small number of genes; however, it is symptomatic of poor data processing methods. The purpose of this script is to identify gene name errors present in supplementary files of PubMed Central articles published in the previous month.
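To make the kind of damage concrete, here is a minimal sketch of how date-like strings could be flagged in a column of gene symbols. The helper `looks_like_excel_date` and its patterns are hypothetical and are not taken from `gene_names.sh`; they only assume that converted names show up either as a displayed date such as "2-Sep" or as a five-digit serial number.

```r
# Hypothetical helper (not from gene_names.sh): flag strings that look like
# Excel-converted dates, either displayed ("2-Sep") or as serial numbers ("43901").
looks_like_excel_date <- function(x) {
  months <- "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
  # Anchored so that only a whole cell of the form day-month matches
  displayed <- grepl(paste0("^[0-9]{1,2}-(", months, ")$"), x, ignore.case = TRUE)
  # Assumption: any bare five-digit number is treated as a possible serial date
  serial <- grepl("^[0-9]{5}$", x)
  displayed | serial
}

symbols <- c("TP53", "2-Sep", "1-Mar", "43901", "SEPT2", "4-octyl")
looks_like_excel_date(symbols)
# FALSE  TRUE  TRUE  TRUE FALSE FALSE
```

The anchors keep strings such as "4-octyl" from matching. The five-digit rule is deliberately loose, so it would also flag legitimate numeric identifiers, which is one way false positives can arise.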
library("XML")
library("jsonlite")
library("xml2")
library("reutils")
library("readxl")
Here I will be getting PubMed Central IDs for the previous month.
Start with figuring out the date to search PubMed Central.
CURRENT_MONTH=format(Sys.time(), "%m")
CURRENT_YEAR=format(Sys.time(), "%Y")
if (CURRENT_MONTH == "01") {
PREV_YEAR=as.character(as.numeric(format(Sys.time(), "%Y"))-1)
PREV_MONTH="12"
} else {
PREV_YEAR=CURRENT_YEAR
PREV_MONTH=as.character(as.numeric(format(Sys.time(), "%m"))-1)
}
DATE=paste(PREV_YEAR,"/",PREV_MONTH,sep="")
DATE
## [1] "2022/2"
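As an alternative sketch, the previous month can also be derived with date arithmetic instead of branching on January. The function name `prev_month_string` is hypothetical. Unlike the code above, it returns a zero-padded month ("2022/02" rather than "2022/2"); the unpadded form evidently worked for the search in this report, so this is a stylistic option rather than a required fix.

```r
# Hypothetical helper: "YYYY/MM" string for the month before `today`.
prev_month_string <- function(today = Sys.Date()) {
  # First day of the current month
  first_of_month <- as.Date(format(today, "%Y-%m-01"))
  # Step back one calendar month; this also handles the January-to-December rollover
  prev <- seq(first_of_month, by = "-1 month", length.out = 2)[2]
  format(prev, "%Y/%m")
}

prev_month_string(as.Date("2022-03-15"))  # "2022/02"
prev_month_string(as.Date("2022-01-10"))  # "2021/12"
```

Passing the date as an argument makes the function easy to check for a fixed day, while the default still uses the current date.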
Let’s see how many PMC IDs we have in the past month.
QUERY ='((genom*[Abstract]))'
ESEARCH_RES <- esearch(term=QUERY, db = "pmc", rettype = "uilist", retmode = "xml", retstart = 0,
retmax = 5000000, usehistory = TRUE, webenv = NULL, querykey = NULL, sort = NULL, field = NULL,
datetype = NULL, reldate = NULL, mindate = DATE, maxdate = DATE)
pmc <- efetch(ESEARCH_RES,retmode="text",rettype="uilist",outfile="pmcids.txt")
## Retrieving UIDs 1 to 500
## Retrieving UIDs 501 to 1000
## Retrieving UIDs 1001 to 1500
## Retrieving UIDs 1501 to 2000
## Retrieving UIDs 2001 to 2500
## Retrieving UIDs 2501 to 3000
## Retrieving UIDs 3001 to 3500
pmc <- read.table(pmc)
pmc <- paste("PMC",pmc$V1,sep="")
NUM_ARTICLES=length(pmc)
NUM_ARTICLES
## [1] 3383
writeLines(pmc,con="pmc.txt")
Now run the bash script. Note that false positives can occur (~1.5%) and these results have not been verified by a human.
Here are some definitions:
NUM_XLS = Number of supplementary Excel files in this set of PMC articles.
NUM_XLS_ARTICLES = Number of articles matching the PubMed Central search which have supplementary Excel files.
GENELISTS = The gene lists found in the Excel files. Each Excel file is counted once even if it has multiple gene lists.
NUM_GENELISTS = The number of Excel files with gene lists.
NUM_GENELIST_ARTICLES = The number of PMC articles with supplementary Excel gene lists.
ERROR_GENELISTS = Files suspected to contain gene name errors. The dates and five-digit numbers indicate transmogrified gene names.
NUM_ERROR_GENELISTS = Number of Excel gene lists with errors.
NUM_ERROR_GENELIST_ARTICLES = Number of articles with supplementary Excel gene name errors.
ERROR_PROPORTION = This is the proportion of articles with Excel gene lists that have errors.
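The definitions above depend on how each line of `results.txt` is laid out. The sketch below splits two lines copied from the ERROR_GENELISTS output of this report into named fields. The field meanings (article ID, file path, species, token count, suspect values) are inferred from that output and are an assumption, not a documented format of `gene_names.sh`; the column names are made up for illustration.

```r
# Two lines copied from the ERROR_GENELISTS output in this report
example_lines <- c(
  "PMC8873981 /pmc/articles/PMC8873981/bin/Table3.XLSX Hsapiens 1 43901",
  "PMC8850650 /pmc/articles/PMC8850650/bin/Table1.XLSX Hsapiens 2 44443 44531"
)

# Split on single spaces, as the report code does
fields <- strsplit(example_lines, " ")

# Assumed layout: 1 = article, 2 = file path, 3 = species, 4 = count, 5+ = suspect values
parsed <- data.frame(
  article  = sapply(fields, "[[", 1),
  file     = basename(sapply(fields, "[[", 2)),
  species  = sapply(fields, "[[", 3),
  n_tokens = as.integer(sapply(fields, "[[", 4)),
  stringsAsFactors = FALSE
)
# Everything after the fourth field is kept as one string per line
parsed$suspects <- sapply(fields, function(f) paste(f[-(1:4)], collapse = " "))
parsed
```

Under this reading, a line with more than two fields carries a species call and so counts as a gene list, and a line with more than three fields carries at least one suspect value.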
system("./gene_names.sh pmc.txt")
results <- readLines("results.txt")
XLS <- results[grep("XLS",results,ignore.case=TRUE)]
NUM_XLS = length(XLS)
NUM_XLS
## [1] 3654
NUM_XLS_ARTICLES = length(unique(sapply(strsplit(XLS," "),"[[",1)))
NUM_XLS_ARTICLES
## [1] 645
GENELISTS <- XLS[lapply(strsplit(XLS," "),length)>2]
#GENELISTS
NUM_GENELISTS <- length(unique(sapply(strsplit(GENELISTS," "),"[[",2)))
NUM_GENELISTS
## [1] 491
NUM_GENELIST_ARTICLES <- length(unique(sapply(strsplit(GENELISTS," "),"[[",1)))
NUM_GENELIST_ARTICLES
## [1] 247
ERROR_GENELISTS <- XLS[lapply(strsplit(XLS," "),length)>3]
#ERROR_GENELISTS
NUM_ERROR_GENELISTS = length(ERROR_GENELISTS)
NUM_ERROR_GENELISTS
## [1] 184
GENELIST_ERROR_ARTICLES <- unique(sapply(strsplit(ERROR_GENELISTS," "),"[[",1))
GENELIST_ERROR_ARTICLES
## [1] "PMC8873981" "PMC8873582" "PMC8872788" "PMC8837554" "PMC8865640"
## [6] "PMC8860442" "PMC8856659" "PMC8851870" "PMC8856606" "PMC8851680"
## [11] "PMC8851671" "PMC8848662" "PMC8851317" "PMC8850650" "PMC8850306"
## [16] "PMC8817631" "PMC8833116" "PMC8829032" "PMC8828869" "PMC8826452"
## [21] "PMC8822763" "PMC8819090" "PMC8807747" "PMC8803925" "PMC8791834"
## [26] "PMC8789861" "PMC8814347" "PMC8807355" "PMC8803663" "PMC8796709"
## [31] "PMC8782884" "PMC8812899" "PMC8811262" "PMC8795503" "PMC8794810"
## [36] "PMC8791277" "PMC8783018" "PMC8806071" "PMC8804361" "PMC8867041"
## [41] "PMC8860223" "PMC8865845" "PMC8863889" "PMC8856743" "PMC8851819"
## [46] "PMC8853518" "PMC8847715" "PMC8845119" "PMC8817049" "PMC8844495"
## [51] "PMC8844200" "PMC8763317" "PMC8830042" "PMC8819343" "PMC8821623"
## [56] "PMC8819394" "PMC8823376" "PMC8818873" "PMC8818738" "PMC8807822"
## [61] "PMC8807396" "PMC8789859" "PMC8782899" "PMC8776955" "PMC8766575"
## [66] "PMC8807568" "PMC8800112" "PMC8796118" "PMC8784157" "PMC8782496"
## [71] "PMC8792531"
NUM_ERROR_GENELIST_ARTICLES <- length(GENELIST_ERROR_ARTICLES)
NUM_ERROR_GENELIST_ARTICLES
## [1] 71
ERROR_PROPORTION = NUM_ERROR_GENELIST_ARTICLES / NUM_GENELIST_ARTICLES
ERROR_PROPORTION
## [1] 0.2874494
Here you can have a look at all the gene lists detected in the past month, as well as those with errors. The dates are obvious errors; they are commonly dates in September, March, December and October. The five-digit numbers are the same kind of dates in Excel's internal format, which stores a date as the number of days since the start of 1900. If you paste these numbers into Excel and format the cells as dates, most of them will also map to dates in September, March, December and October.
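The same check can be done in R without opening Excel. This is a minimal sketch that assumes the files use Excel's default 1900 date system, for which the conventional origin in R is 1899-12-30; the three serial numbers are copied from the ERROR_GENELISTS output.

```r
# Serial numbers copied from the ERROR_GENELISTS output
serials <- c(43901, 44083, 44166)

# Assumption: Excel 1900 date system, i.e. origin 1899-12-30 when converting in R
dates <- as.Date(serials, origin = "1899-12-30")
format(dates, "%Y-%m-%d")
# "2020-03-11" "2020-09-09" "2020-12-01"
```

These dates are consistent with gene symbols such as MARCH11, SEPT9 and DEC1 having been typed or pasted during 2020, since Excel fills in the current year when it converts such names.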
#GENELISTS
ERROR_GENELISTS
## [1] "PMC8873981 /pmc/articles/PMC8873981/bin/Table3.XLSX Hsapiens 1 43901"
## [2] "PMC8873582 /pmc/articles/PMC8873582/bin/Table11.XLSX Hsapiens 23 44079 44084 44089 43895 44077 44078 44082 44085 43893 43894 44083 43891 43892 43896 43892 44075 43897 43898 43891 44081 44080 43899 44076"
## [3] "PMC8873582 /pmc/articles/PMC8873582/bin/Table12.XLSX Hsapiens 27 43900 43897 44082 44088 43894 44089 44075 44077 44080 43896 44083 43892 44084 43892 43899 44078 44086 44081 43891 44079 43895 44166 44085 43893 43891 43901 43898"
## [4] "PMC8873582 /pmc/articles/PMC8873582/bin/Table2.XLSX Hsapiens 21 43896 43895 44076 44083 43897 44082 44081 43891 44085 43892 44084 43893 43898 43901 43891 44088 43900 44077 44166 43894 44086"
## [5] "PMC8873582 /pmc/articles/PMC8873582/bin/Table2.XLSX Hsapiens 21 43896 43895 44076 44083 43897 44082 44081 43891 44085 43892 44084 43893 43898 43901 43891 44088 43900 44077 44166 43894 44086"
## [6] "PMC8873582 /pmc/articles/PMC8873582/bin/Table2.XLSX Hsapiens 21 43896 43895 44076 44083 43897 44082 44081 43891 44085 43892 44084 43893 43898 43901 43891 44088 43900 44077 44166 43894 44086"
## [7] "PMC8873582 /pmc/articles/PMC8873582/bin/Table3.XLSX Hsapiens 308 44075 44083 44083 44081 44082 43897 43892 43892 43892 44079 43892 43892 43891 43892 44083 43892 43892 43892 44083 44079 44083 43892 44081 44083 44083 44081 44075 44083 44081 44079 44082 44081 44079 44077 43891 44077 43892 43893 43891 44083 44081 43901 44081 43891 44082 44083 44079 43891 44081 44083 43891 43891 44081 43891 44081 44079 43897 44083 44083 43891 43895 43891 43891 43891 43891 44083 44077 44078 44083 44082 44083 43892 44083 44083 44083 44079 43893 44083 44083 44082 43893 44082 43891 44083 43893 44083 44084 43895 44083 44083 43896 44079 43899 44083 43891 44084 44082 44083 44083 43893 43893 43893 43893 43891 44077 44082 44083 44085 43893 44078 44084 44083 43892 44082 44082 44078 44082 44084 44078 44079 44079 44078 44084 43893 44083 43900 43900 44083 44077 44083 44084 44079 44083 44083 43892 43893 43894 43891 44077 44083 44077 44082 44078 43901 44084 44083 44077 43892 43900 43894 44083 44082 43892 43901 44082 43895 43901 44085 43900 43897 44085 44083 43894 44082 44079 43898 44085 44085 44078 44078 44082 44085 43894 44083 43901 43901 43901 43901 43894 43892 43901 43894 43891 44085 44082 43901 43901 43894 44075 43895 43901 43901 44089 44083 44078 44083 44085 43892 43891 43898 43901 44075 43898 43894 43901 44080 44085 44083 44085 43898 43900 44083 43900 43900 44083 44083 43895 43894 44078 44078 44083 43894 44080 44083 44083 44083 44083 44075 44086 44089 44075 43895 44083 43896 44083 43897 43901 43898 44083 44083 44082 43894 43895 44083 43900 44085 43898 44089 44086 44085 43898 44075 43896 44083 44083 44084 44080 44086 44083 44085 44075 44078 44085 43898 43896 44083 44075 44083 44082 44083 44078 43891 44078 44083 44083 43900 44080 44078 44082 44080 44089 44166 44078 44075 44086 44089 44083 44086 44083 44086 43895 43899 44082 44075 43901 44083 44083 44083 43891 44083 44083 44083 43896 44083 44086 44166 44089 43901"
## [8] "PMC8873582 /pmc/articles/PMC8873582/bin/Table3.XLSX Hsapiens 308 44075 44083 44083 44081 44082 43897 43892 43892 43892 44079 43892 43892 43891 43892 44083 43892 43892 43892 44083 44079 44083 43892 44081 44083 44083 44081 44075 44083 44081 44079 44082 44081 44079 44077 43891 44077 43892 43893 43891 44083 44081 43901 44081 43891 44082 44083 44079 43891 44081 44083 43891 43891 44081 43891 44081 44079 43897 44083 44083 43891 43895 43891 43891 43891 43891 44083 44077 44078 44083 44082 44083 43892 44083 44083 44083 44079 43893 44083 44083 44082 43893 44082 43891 44083 43893 44083 44084 43895 44083 44083 43896 44079 43899 44083 43891 44084 44082 44083 44083 43893 43893 43893 43893 43891 44077 44082 44083 44085 43893 44078 44084 44083 43892 44082 44082 44078 44082 44084 44078 44079 44079 44078 44084 43893 44083 43900 43900 44083 44077 44083 44084 44079 44083 44083 43892 43893 43894 43891 44077 44083 44077 44082 44078 43901 44084 44083 44077 43892 43900 43894 44083 44082 43892 43901 44082 43895 43901 44085 43900 43897 44085 44083 43894 44082 44079 43898 44085 44085 44078 44078 44082 44085 43894 44083 43901 43901 43901 43901 43894 43892 43901 43894 43891 44085 44082 43901 43901 43894 44075 43895 43901 43901 44089 44083 44078 44083 44085 43892 43891 43898 43901 44075 43898 43894 43901 44080 44085 44083 44085 43898 43900 44083 43900 43900 44083 44083 43895 43894 44078 44078 44083 43894 44080 44083 44083 44083 44083 44075 44086 44089 44075 43895 44083 43896 44083 43897 43901 43898 44083 44083 44082 43894 43895 44083 43900 44085 43898 44089 44086 44085 43898 44075 43896 44083 44083 44084 44080 44086 44083 44085 44075 44078 44085 43898 43896 44083 44075 44083 44082 44083 44078 43891 44078 44083 44083 43900 44080 44078 44082 44080 44089 44166 44078 44075 44086 44089 44083 44086 44083 44086 43895 43899 44082 44075 43901 44083 44083 44083 43891 44083 44083 44083 43896 44083 44086 44166 44089 43901"
## [9] "PMC8872788 /pmc/articles/PMC8872788/bin/pnas.2115999119.sd04.xlsx Hsapiens 36 44078 44078 44083 44083 44083 44083 44083 44080 44083 44083 43897 44078 44079 44076 44083 44083 44083 44084 44076 44080 44083 44078 44075 44083 44083 44081 44075 44080 44083 44075 44080 44081 44084 44081 44079 44081"
## [10] "PMC8872788 /pmc/articles/PMC8872788/bin/pnas.2115999119.sd04.xlsx Hsapiens 36 44083 44083 44083 44083 44083 44083 44083 44083 44083 44078 44083 44075 44080 44083 44083 44083 44083 44084 44076 44081 44078 44080 44079 44076 44079 44080 44075 44078 44080 44081 44081 43897 44084 44075 44081 44083"
## [11] "PMC8872788 /pmc/articles/PMC8872788/bin/pnas.2115999119.sd04.xlsx Hsapiens 56 44079 44083 43897 44081 44076 44083 44084 44084 44083 44081 44083 44084 44080 44083 44083 44083 44076 44081 44081 44083 44078 44083 43897 44083 44078 44083 43897 44075 44080 44081 44084 44084 43897 44081 44082 44083 43897 44078 44083 44076 44078 44082 44081 44076 44083 44083 43897 44078 44080 44082 43897 44079 44081 44083 44075 44083"
## [12] "PMC8872788 /pmc/articles/PMC8872788/bin/pnas.2115999119.sd04.xlsx Hsapiens 25 44080 44075 44083 44084 44083 44083 44081 44083 44080 44075 44079 44081 44083 44075 44079 44084 44076 44084 44078 44081 44083 44079 44083 44078 44081"
## [13] "PMC8837554 /pmc/articles/PMC8837554/bin/41588_2021_990_MOESM4_ESM.xlsx Hsapiens 1 38231"
## [14] "PMC8837554 /pmc/articles/PMC8837554/bin/41588_2021_990_MOESM4_ESM.xlsx Hsapiens 28 38047 37226 37500 37865 36951 40238 39873 38412 39326 36951 41883 38231 37316 41153 37681 42248 40057 38961 39508 38596 39142 39692 37135 40603 37316 38777 40787 40422"
## [15] "PMC8865640 /pmc/articles/PMC8865640/bin/pbio.3001538.s013.xlsx Hsapiens 27 39326 37316 37500 39142 39692 38961 39873 40057 39508 40422 40603 40787 37135 41153 38777 38596 38412 37681 41883 37316 38047 38231 36951 37865 40238 37226 36951"
## [16] "PMC8865640 /pmc/articles/PMC8865640/bin/pbio.3001538.s013.xlsx Hsapiens 17 38200 36892 38200 38200 38200 36892 36892 37104 36892 37104 37834 37104 37469 37834 37834 37834 37469"
## [17] "PMC8865640 /pmc/articles/PMC8865640/bin/pbio.3001538.s013.xlsx Hsapiens 79 38777 40057 38961 39142 39508 40057 40057 40057 40057 40057 40057 40057 40057 40057 40057 40057 36951 39142 39142 40238 40238 40238 40238 40238 40238 40238 40238 40238 40238 40238 38777 38777 38961 38961 38961 38961 36951 36951 40787 40787 40787 36951 39508 39508 39508 37135 38961 39692 40787 40238 40238 38596 38596 38596 39692 39692 39692 39692 39692 39692 38412 38412 38412 37865 37865 37865 36951 39873 37681 41153 41153 40422 40422 40422 40422 40422 40422 40238 41883"
## [18] "PMC8865640 /pmc/articles/PMC8865640/bin/pbio.3001538.s013.xlsx Hsapiens 41 39692 37226 37316 38596 38231 38961 36951 37865 38047 39508 39508 39508 37135 37500 38596 38231 40057 40057 40057 40057 40057 40057 38961 38961 38961 39692 39692 39692 38412 38412 40787 37865 37865 39142 39873 37681 37681 41153 41153 40238 39508"
## [19] "PMC8865640 /pmc/articles/PMC8865640/bin/pbio.3001538.s014.xlsx Hsapiens 5 37104 37469 38200 36892 37834"
## [20] "PMC8865640 /pmc/articles/PMC8865640/bin/pbio.3001538.s014.xlsx Hsapiens 86 38777 41883 38412 37865 36951 40057 38777 38777 37500 38412 37681 38961 39508 38412 37135 40057 40057 40057 40057 40057 40057 40057 40057 40057 40057 40057 37865 40787 37500 37500 37500 37500 37500 37500 37500 37500 39326 38596 39142 40422 37865 39508 39508 39508 37316 39873 39692 36951 37681 38596 38596 38231 42248 42248 38961 38961 38961 38961 39692 39692 39692 39692 37316 37316 37316 37316 37316 40787 40787 38047 39142 39142 40238 40238 39326 39326 39692 39692 39692 39692 39692 36951 36951 36951 40787 40422"
## [21] "PMC8865640 /pmc/articles/PMC8865640/bin/pbio.3001538.s014.xlsx Hsapiens 54 40238 39508 40238 40238 40238 40238 40238 40238 40238 40238 40238 40238 40238 40238 40238 39508 42248 38777 41883 40603 39326 39326 39326 42248 38777 38777 38961 38961 38961 38961 38961 38961 37226 38412 37316 37316 36951 36951 36951 40787 40787 40787 40787 38047 36951 39142 39142 39142 40422 40422 40422 40422 40422 40422"
## [22] "PMC8865640 /pmc/articles/PMC8865640/bin/pbio.3001538.s014.xlsx Hsapiens 15 37316 40057 38047 37500 37500 37500 37500 37500 37500 37316 38412 38412 40422 40238 41883"
## [23] "PMC8865640 /pmc/articles/PMC8865640/bin/pbio.3001538.s017.xlsx Hsapiens 10 42248 40422 39142 37500 40787 38777 37681 39326 38412 38961"
## [24] "PMC8865640 /pmc/articles/PMC8865640/bin/pbio.3001538.s017.xlsx Hsapiens 27 39142 38047 38047 37500 40787 40787 40787 40787 40787 40787 38777 38777 38777 37681 39326 37226 41153 41153 37135 40057 40057 40057 40057 37316 37865 38961 38961"
## [25] "PMC8865640 /pmc/articles/PMC8865640/bin/pbio.3001538.s017.xlsx Hsapiens 31 42248 37316 40422 37500 37500 40787 37681 37681 37681 37681 39692 39873 39873 37135 37135 40238 40238 40057 40057 40057 40057 37316 37316 38596 38596 38596 37865 38961 38961 38961 38961"
## [26] "PMC8860442 /pmc/articles/PMC8860442/bin/elife-75132-supp2.xlsx Mmusculus 25 44621 44631 44622 44623 44624 44625 44626 44627 44628 44819 44814 44815 44806 44807 44808 44809 44811 44812 44813 44629 44810 44805 44630 44816 44818"
## [27] "PMC8856659 /pmc/articles/PMC8856659/bin/elife-73357-supp1.xlsx Dmelanogaster 1 44440"
## [28] "PMC8851870 /pmc/articles/PMC8851870/bin/13148_2022_1236_MOESM5_ESM.xlsx Ggallus 1 37226"
## [29] "PMC8856606 /pmc/articles/PMC8856606/bin/Data_Sheet_1.xlsx Hsapiens 2 44448 44448"
## [30] "PMC8851680 /pmc/articles/PMC8851680/bin/mmc2.xlsx Hsapiens 23 44266 44450 44442 44447 44444 44445 44256 44440 44448 44258 44449 44262 44257 44441 44257 44256 44260 44263 44264 44446 44265 44261 44443"
## [31] "PMC8851680 /pmc/articles/PMC8851680/bin/mmc2.xlsx Hsapiens 3 44257 44531 44256"
## [32] "PMC8851671 /pmc/articles/PMC8851671/bin/esab066_suppl_supplementary_table_s3.xlsx Dmelanogaster 5 42979 43070 42983 42980 42982"
## [33] "PMC8848662 /pmc/articles/PMC8848662/bin/13059_2022_2613_MOESM22_ESM.xlsx Hsapiens 3 44166 43891 43892"
## [34] "PMC8851317 /pmc/articles/PMC8851317/bin/Table_2.xls Hsapiens 4 44447 44450 44446 44256"
## [35] "PMC8850650 /pmc/articles/PMC8850650/bin/Table1.XLSX Hsapiens 2 44443 44531"
## [36] "PMC8850306 /pmc/articles/PMC8850306/bin/Table_8.xlsx Hsapiens 24 44445 44444 44256 44257 44449 44448 44258 44264 44453 44440 44447 44259 44262 44443 44446 44261 44450 44263 44256 44257 44260 44441 44442 44454"
## [37] "PMC8850306 /pmc/articles/PMC8850306/bin/Table_8.xlsx Hsapiens 17 44264 44449 44258 44262 44263 44447 44440 44441 44257 44454 44448 44261 44450 44256 44446 44260 44445"
## [38] "PMC8850306 /pmc/articles/PMC8850306/bin/Table_8.xlsx Hsapiens 16 44262 44446 44441 44445 44261 44260 44258 44450 44442 44440 44257 44263 44454 44447 44448 44264"
## [39] "PMC8850306 /pmc/articles/PMC8850306/bin/Table_9.xlsx Hsapiens 21 43353 43345 43349 43352 43168 43166 43435 43160 43162 43164 43163 43344 43167 43161 43351 43346 43170 43348 43347 43169 43355"
## [40] "PMC8817631 /pmc/articles/PMC8817631/bin/peerj-10-12909-s003.xlsx Hsapiens 26 44086 43891 43896 43898 44079 43891 44089 43895 44080 44075 44081 44077 43899 44083 44084 43897 44085 44082 43893 44088 43892 43892 43900 44076 43894 44078"
## [41] "PMC8817631 /pmc/articles/PMC8817631/bin/peerj-10-12909-s005.xlsx Hsapiens 5 43891 44084 43897 44085 43894"
## [42] "PMC8833116 /pmc/articles/PMC8833116/bin/aging-14-203809-s002.xlsx Hsapiens 1 44257"
## [43] "PMC8829032 /pmc/articles/PMC8829032/bin/Table1.XLSX Hsapiens 1 44259"
## [44] "PMC8828869 /pmc/articles/PMC8828869/bin/41598_2022_6239_MOESM1_ESM.xlsx Hsapiens 34 44440 44440 44256 44266 44440 44440 44261 44440 44440 44440 44447 44440 44440 44256 44448 44447 44256 44440 44443 44256 44440 44256 44261 44440 44451 44440 44256 44261 44440 44256 44449 44440 44440 44256"
## [45] "PMC8828869 /pmc/articles/PMC8828869/bin/41598_2022_6239_MOESM1_ESM.xlsx Hsapiens 310 44451 44257 44256 44454 44260 44451 44266 44261 44258 44447 44446 44453 44440 44446 44453 44257 44256 44453 44454 44263 44260 44261 44266 44447 44258 44453 44446 44264 44451 44265 44442 44258 44453 44451 44444 44261 44266 44446 44440 44451 44261 44266 44265 44257 44256 44444 44266 44261 44446 44531 44454 44440 44451 44257 44441 44444 44442 44256 44450 44266 44261 44446 44453 44256 44257 44263 44264 44451 44257 44258 44447 44261 44266 44453 44446 44440 44453 44264 44451 44453 44266 44261 44453 44454 44451 44443 44448 44265 44257 44442 44261 44266 44453 44261 44266 44264 44444 44256 44257 44263 44440 44444 44442 44266 44261 44447 44258 44446 44453 44531 44266 44261 44446 44257 44256 44263 44260 44264 44451 44444 44442 44266 44261 44446 44453 44531 44256 44257 44263 44260 44264 44257 44446 44531 44261 44266 44454 44265 44441 44259 44453 44444 44266 44261 44266 44261 44265 44450 44446 44261 44266 44453 44446 44257 44256 44454 44264 44451 44262 44259 44444 44256 44450 44447 44266 44261 44446 44453 44445 44446 44453 44257 44256 44454 44451 44440 44444 44442 44266 44261 44447 44258 44446 44453 44257 44256 44443 44265 44448 44256 44257 44264 44261 44266 44446 44440 44451 44444 44442 44258 44447 44266 44261 44446 44453 44445 44266 44261 44256 44257 44443 44265 44448 44257 44256 44257 44441 44266 44261 44453 44445 44257 44256 44261 44266 44453 44446 44256 44447 44261 44266 44446 44264 44257 44451 44257 44444 44453 44260 44453 44446 44261 44266 44263 44444 44453 44446 44440 44261 44446 44453 44261 44266 44257 44256 44264 44440 44451 44265 44443 44448 44257 44449 44450 44266 44261 44447 44258 44446 44454 44264 44451 44440 44265 44448 44444 44266 44261 44453 44446 44263 44451 44444 44442 44266 44261 44258 44447 44446 44453 44256 44257 44454 44440 44448 44261 44266 44446 44453 44257 44256 44454 44260 44448 44262 44444 44442 44446 44445 44264"
## [46] "PMC8826452 /pmc/articles/PMC8826452/bin/Table1.XLSX Hsapiens 19 44257 44262 44450 44264 44451 44259 44256 44261 44258 44265 44263 44454 44445 44260 44256 44257 44446 44448 44444"
## [47] "PMC8826452 /pmc/articles/PMC8826452/bin/Table2.XLSX Hsapiens 24 44257 44262 44450 44264 44451 44259 44256 44261 44258 44265 44443 44454 44266 44445 44441 44257 44447 44453 44442 44449 44446 44448 44444 44440"
## [48] "PMC8826452 /pmc/articles/PMC8826452/bin/Table3.XLSX Hsapiens 27 44257 44262 44450 44264 44451 44259 44256 44261 44258 44265 44263 44443 44454 44266 44445 44441 44256 44257 44531 44447 44453 44442 44449 44446 44448 44444 44440"
## [49] "PMC8826452 /pmc/articles/PMC8826452/bin/Table4.XLSX Hsapiens 26 44262 44450 44264 44451 44259 44256 44261 44258 44265 44263 44443 44454 44266 44445 44260 44441 44256 44257 44447 44453 44442 44449 44446 44448 44444 44440"
## [50] "PMC8822763 /pmc/articles/PMC8822763/bin/13293_2022_415_MOESM4_ESM.xlsx Hsapiens 5 44446 44256 44445 44263 44444"
## [51] "PMC8822763 /pmc/articles/PMC8822763/bin/13293_2022_415_MOESM4_ESM.xlsx Hsapiens 2 44446 44445"
## [52] "PMC8822763 /pmc/articles/PMC8822763/bin/13293_2022_415_MOESM4_ESM.xlsx Hsapiens 6 44446 44256 44445 44263 44260 44444"
## [53] "PMC8822763 /pmc/articles/PMC8822763/bin/13293_2022_415_MOESM4_ESM.xlsx Hsapiens 6 44446 44256 44445 44263 44260 44444"
## [54] "PMC8819090 /pmc/articles/PMC8819090/bin/Table1.XLSX Hsapiens 1 43893"
## [55] "PMC8807747 /pmc/articles/PMC8807747/bin/41467_2022_28198_MOESM6_ESM.xls Hsapiens 1 44257"
## [56] "PMC8807747 /pmc/articles/PMC8807747/bin/41467_2022_28198_MOESM6_ESM.xls Hsapiens 1 44257"
## [57] "PMC8803925 /pmc/articles/PMC8803925/bin/41408_2021_576_MOESM4_ESM.xlsx Hsapiens 14 44257 44257 44262 44264 44259 44256 44261 44263 44531 44265 44258 44266 44256 44260"
## [58] "PMC8791834 /pmc/articles/PMC8791834/bin/41586_2021_4278_MOESM4_ESM.xlsx Hsapiens 14 38231 38777 40238 36951 36951 37500 40057 40057 39142 36951 40057 36951 36951 39692"
## [59] "PMC8789861 /pmc/articles/PMC8789861/bin/41467_2022_28028_MOESM12_ESM.xlsx Hsapiens 2 44453 44453"
## [60] "PMC8789861 /pmc/articles/PMC8789861/bin/41467_2022_28028_MOESM12_ESM.xlsx Hsapiens 4 44453 44453 44453 44453"
## [61] "PMC8789861 /pmc/articles/PMC8789861/bin/41467_2022_28028_MOESM12_ESM.xlsx Hsapiens 2 44453 44453"
## [62] "PMC8789861 /pmc/articles/PMC8789861/bin/41467_2022_28028_MOESM12_ESM.xlsx Hsapiens 4 44260 44260 44260 44260"
## [63] "PMC8814347 /pmc/articles/PMC8814347/bin/Table_1.xlsx Hsapiens 1 44445"
## [64] "PMC8807355 /pmc/articles/PMC8807355/bin/MOL2-16-665-s001.xlsx Hsapiens 20 38231 36951 38596 39326 39692 39142 37316 40057 36951 42248 40787 37500 38961 37681 38412 39873 40422 38777 37316 39508"
## [65] "PMC8807355 /pmc/articles/PMC8807355/bin/MOL2-16-665-s001.xlsx Hsapiens 20 38231 36951 38596 39326 39692 39142 37316 40057 36951 42248 40787 37500 38961 37681 38412 39873 40422 38777 37316 39508"
## [66] "PMC8807355 /pmc/articles/PMC8807355/bin/MOL2-16-665-s001.xlsx Hsapiens 25 40422 39142 38047 37500 40787 36951 38777 40603 37681 39692 39326 41883 37226 39508 38412 39873 41153 37135 38231 40238 40057 37316 38596 37865 38961"
## [67] "PMC8807355 /pmc/articles/PMC8807355/bin/MOL2-16-665-s001.xlsx Hsapiens 1 38047"
## [68] "PMC8803663 /pmc/articles/PMC8803663/bin/mmc1.xlsx Hsapiens 1 42248"
## [69] "PMC8803663 /pmc/articles/PMC8803663/bin/mmc9.xlsx Hsapiens 6 40787 40422 42248 37500 38412 40057"
## [70] "PMC8803663 /pmc/articles/PMC8803663/bin/mmc9.xlsx Hsapiens 11 38412 39326 37500 36951 40057 38596 40422 38961 40787 39692 39326"
## [71] "PMC8796709 /pmc/articles/PMC8796709/bin/peerj-10-12843-s003.xlsx Hsapiens 6 44449 44446 44453 44443 44264 44257"
## [72] "PMC8782884 /pmc/articles/PMC8782884/bin/41467_2022_28135_MOESM5_ESM.xlsx Mmusculus 2 43893 43891"
## [73] "PMC8812899 /pmc/articles/PMC8812899/bin/pmed.1003897.s014.xlsx Hsapiens 25 42248 39692 36951 39326 37226 39873 38596 38777 39508 40787 37681 38961 37316 37135 36951 38047 37865 40057 38412 38231 37316 39142 37500 37135 40422"
## [74] "PMC8812899 /pmc/articles/PMC8812899/bin/pmed.1003897.s016.xlsx Hsapiens 1 40057"
## [75] "PMC8811262 /pmc/articles/PMC8811262/bin/DataSheet1.xlsx Hsapiens 3 44447 44447 44447"
## [76] "PMC8795503 /pmc/articles/PMC8795503/bin/pnas.2114314119.sd04.xlsx Hsapiens 1 40422"
## [77] "PMC8795503 /pmc/articles/PMC8795503/bin/pnas.2114314119.sd07.xlsx Hsapiens 7 37316 39326 39692 40057 40787 40422 37681"
## [78] "PMC8794810 /pmc/articles/PMC8794810/bin/pnas.2107879119.sd02.xlsx Athaliana 6 37895 37165 37530 38200 38930 38261"
## [79] "PMC8791277 /pmc/articles/PMC8791277/bin/mmc3.xlsx Hsapiens 1 39325"
## [80] "PMC8783018 /pmc/articles/PMC8783018/bin/12264_2021_770_MOESM6_ESM.xls Hsapiens 7 40424 40431 40423 40427 40243 40512 40425"
## [81] "PMC8806071 /pmc/articles/PMC8806071/bin/pmed.1003679.s023.xlsx Hsapiens 1 36951"
## [82] "PMC8804361 /pmc/articles/PMC8804361/bin/Table6.XLSX Hsapiens 1 44257"
## [83] "PMC8867041 /pmc/articles/PMC8867041/bin/Table2.XLSX Hsapiens 3 44450 44262 44266"
## [84] "PMC8867041 /pmc/articles/PMC8867041/bin/Table2.XLSX Hsapiens 3 44448 44256 44257"
## [85] "PMC8860223 /pmc/articles/PMC8860223/bin/cir-145-606-s002.xlsx Mmusculus 12 37135 37865 38596 38961 38231 40422 40057 40787 39692 42248 39326 37500"
## [86] "PMC8865845 /pmc/articles/PMC8865845/bin/elife-70382-fig4-data1.xlsx Hsapiens 2 44257 44256"
## [87] "PMC8863889 /pmc/articles/PMC8863889/bin/41598_2022_6813_MOESM2_ESM.xlsx Hsapiens 315 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44261 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260 44260"
## [88] "PMC8863889 /pmc/articles/PMC8863889/bin/41598_2022_6813_MOESM2_ESM.xlsx Hsapiens 172 44444 44449 44444 44442 44449 44449 44449 44449 44449 44442 44444 44449 44449 44449 44449 44261 44449 44449 44449 44449 44444 44449 44442 44449 44450 44256 44443 44449 44256 44449 44442 44444 44449 44449 44443 44445 44449 44440 44449 44444 44449 44444 44443 44449 44449 44445 44442 44449 44444 44449 44449 44443 44449 44257 44443 44449 44447 44449 44449 44444 44440 44445 44449 44256 44449 44449 44449 44449 44443 44449 44449 44257 44444 44444 44442 44441 44256 44444 44445 44440 44444 44261 44449 44449 44444 44449 44443 44441 44444 44449 44449 44449 44449 44256 44450 44450 44441 44445 44449 44449 44449 44449 44450 44449 44449 44449 44444 44442 44444 44449 44447 44440 44449 44449 44440 44442 44440 44449 44445 44444 44449 44449 44442 44258 44258 44259 44444 44256 44449 44444 44449 44445 44260 44449 44449 44443 44440 44449 44449 44449 44444 44256 44440 44261 44442 44442 44440 44442 44445 44443 44449 44449 44447 44449 44256 44449 44444 44258 44259 44444 44444 44441 44257 44442 44445 44444 44260 44449 44259 44449 44449 44261"
## [89] "PMC8856743 /pmc/articles/PMC8856743/bin/mmc18.xlsx Hsapiens 27 43526 43525 43718 43531 43528 43710 43719 43525 43530 43535 43527 43716 43715 43722 43800 43532 43529 43533 43720 43709 43712 43534 43717 43526 43713 43711 43714"
## [90] "PMC8851819 /pmc/articles/PMC8851819/bin/12864_2022_8304_MOESM4_ESM.xlsx Ggallus 67 37316 37316 37316 37316 37316 37316 38412 37316 37316 37316 38412 37316 37316 37316 37316 37316 37316 37316 37316 36951 36951 37316 37316 36951 38777 38777 38777 38777 38777 38961 38961 38961 38961 37500 38961 38961 38961 40787 39508 37500 38961 40422 41153 40422 39692 39508 39326 38412 39326 39692 41153 40603 39692 39692 37500 38596 41153 37500 37500 38596 37500 38596 37500 38596 40603 40603 40603"
## [91] "PMC8853518 /pmc/articles/PMC8853518/bin/pcbi.1009800.s005.xlsx Hsapiens 2 4-octyl itaconate"
## [92] "PMC8853518 /pmc/articles/PMC8853518/bin/pcbi.1009800.s006.xlsx Hsapiens 1 SOX2-OCT4-NANOG"
## [93] "PMC8853518 /pmc/articles/PMC8853518/bin/pcbi.1009800.s006.xlsx Hsapiens 2 4-octyl itaconate"
## [94] "PMC8853518 /pmc/articles/PMC8853518/bin/pcbi.1009800.s006.xlsx Hsapiens 2 10-decarbamoylmitomycin C"
## [95] "PMC8847715 /pmc/articles/PMC8847715/bin/Table1.xlsx Hsapiens 1 44531"
## [96] "PMC8845119 /pmc/articles/PMC8845119/bin/oncotarget-13-28195-s002.xls Hsapiens 20 40788 40610 40603 40609 40786 40605 40795 40787 40602 40799 40797 40791 40789 40606 40790 40604 40794 40792 40607 40796"
## [97] "PMC8845119 /pmc/articles/PMC8845119/bin/oncotarget-13-28195-s002.xls Hsapiens 20 40788 40610 40603 40609 40786 40605 40795 40787 40602 40799 40797 40791 40789 40606 40790 40604 40794 40792 40607 40796"
## [98] "PMC8817049 /pmc/articles/PMC8817049/bin/41467_2022_28365_MOESM4_ESM.xlsx Hsapiens 1 44453"
## [99] "PMC8817049 /pmc/articles/PMC8817049/bin/41467_2022_28365_MOESM4_ESM.xlsx Hsapiens 1 44442"
## [100] "PMC8844495 /pmc/articles/PMC8844495/bin/Table2.xlsx Mmusculus 3 44259 44444 44448"
## [101] "PMC8844495 /pmc/articles/PMC8844495/bin/Table4.xlsx Mmusculus 3 44259 44444 44448"
## [102] "PMC8844495 /pmc/articles/PMC8844495/bin/Table4.xlsx Mmusculus 8 44266 44259 44261 44262 44450 44444 44446 44448"
## [103] "PMC8844200 /pmc/articles/PMC8844200/bin/Table12.XLSX Athaliana 2 42980 42826"
## [104] "PMC8844200 /pmc/articles/PMC8844200/bin/Table12.XLSX Athaliana 1 43709"
## [105] "PMC8844200 /pmc/articles/PMC8844200/bin/Table5.XLSX Athaliana 5 42827 42979 42980 42828 43014"
## [106] "PMC8844200 /pmc/articles/PMC8844200/bin/Table6.XLSX Athaliana 2 43558 43744"
## [107] "PMC8844200 /pmc/articles/PMC8844200/bin/Table7.XLSX Athaliana 4 42979 42828 42826 43014"
## [108] "PMC8844200 /pmc/articles/PMC8844200/bin/Table8.XLSX Athaliana 1 43558"
## [109] "PMC8844200 /pmc/articles/PMC8844200/bin/Table9.XLSX Athaliana 2 42979 43014"
## [110] "PMC8844200 /pmc/articles/PMC8844200/bin/Table9.XLSX Athaliana 5 42980 42828 42827 42980 42826"
## [111] "PMC8763317 /pmc/articles/PMC8763317/bin/MO-018-D1MO00236H-s002.xlsx Hsapiens 1 40057"
## [112] "PMC8763317 /pmc/articles/PMC8763317/bin/MO-018-D1MO00236H-s002.xlsx Hsapiens 1 40057"
## [113] "PMC8830042 /pmc/articles/PMC8830042/bin/13148_2022_1239_MOESM1_ESM.xlsx Hsapiens 2 43892 44082"
## [114] "PMC8830042 /pmc/articles/PMC8830042/bin/13148_2022_1239_MOESM1_ESM.xlsx Hsapiens 2 44082 43892"
## [115] "PMC8830042 /pmc/articles/PMC8830042/bin/13148_2022_1239_MOESM1_ESM.xlsx Hsapiens 9 44083 44083 43891 44083 44083 43891 43897 43891 43892"
## [116] "PMC8819343 /pmc/articles/PMC8819343/bin/CAS-113-540-s001.xlsx Hsapiens 20 43526 43712 43526 43715 43714 43531 43719 43533 43528 43525 43530 43716 43532 43710 43527 43709 43717 43713 43718 43529"
## [117] "PMC8819343 /pmc/articles/PMC8819343/bin/CAS-113-540-s001.xlsx Hsapiens 3 43526 43714 43716"
## [118] "PMC8819343 /pmc/articles/PMC8819343/bin/CAS-113-540-s001.xlsx Hsapiens 20 43717 43530 43713 43715 43712 43723 43526 43718 43716 43533 43528 43719 43527 43525 43711 43531 43532 43710 43714 43529"
## [119] "PMC8821623 /pmc/articles/PMC8821623/bin/41598_2022_5737_MOESM2_ESM.xlsx Ggallus 1 44047"
## [120] "PMC8819394 /pmc/articles/PMC8819394/bin/mmc4.xlsx Rnorvegicus 1 42429"
## [121] "PMC8819394 /pmc/articles/PMC8819394/bin/mmc6.xlsx Rnorvegicus 1 42438"
## [122] "PMC8819394 /pmc/articles/PMC8819394/bin/mmc7.xlsx Rnorvegicus 1 42438"
## [123] "PMC8823376 /pmc/articles/PMC8823376/bin/NIHMS1687331-supplement-Supplemental_Table_6.xlsx Hsapiens 3 43345 43350 43350"
## [124] "PMC8818873 /pmc/articles/PMC8818873/bin/Table4.XLSX Hsapiens 26 44896 44630 44631 44621 44622 44623 44624 44625 44626 44627 44628 44629 44819 44814 44815 44816 44818 44805 44806 44807 44808 44809 44810 44811 44812 44813"
## [125] "PMC8818738 /pmc/articles/PMC8818738/bin/DataSheet_2.xlsx Hsapiens 7 44079 43891 44075 44078 43894 43900 43893"
## [126] "PMC8807822 /pmc/articles/PMC8807822/bin/41467_2022_28287_MOESM10_ESM.xlsx Hsapiens 2 44441 44448"
## [127] "PMC8807822 /pmc/articles/PMC8807822/bin/41467_2022_28287_MOESM5_ESM.xlsx Hsapiens 5 44448 44441 44445 44446 44447"
## [128] "PMC8807396 /pmc/articles/PMC8807396/bin/41375_2021_1381_MOESM9_ESM.xlsx Mmusculus 1 40238"
## [129] "PMC8789859 /pmc/articles/PMC8789859/bin/41392_2021_838_MOESM2_ESM.xlsx Hsapiens 8 43891 43891 43891 43891 43891 43891 43891 43891"
## [130] "PMC8789859 /pmc/articles/PMC8789859/bin/41392_2021_838_MOESM2_ESM.xlsx Hsapiens 2 43901 43891"
## [131] "PMC8782899 /pmc/articles/PMC8782899/bin/41467_2022_27953_MOESM4_ESM.xlsx Hsapiens 83 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500"
## [132] "PMC8782899 /pmc/articles/PMC8782899/bin/41467_2022_27953_MOESM4_ESM.xlsx Hsapiens 1 37500"
## [133] "PMC8782899 /pmc/articles/PMC8782899/bin/41467_2022_27953_MOESM4_ESM.xlsx Hsapiens 3 40787 37500 39326"
## [134] "PMC8782899 /pmc/articles/PMC8782899/bin/41467_2022_27953_MOESM4_ESM.xlsx Hsapiens 6 39326 40787 39692 37500 37135 40057"
## [135] "PMC8782899 /pmc/articles/PMC8782899/bin/41467_2022_27953_MOESM4_ESM.xlsx Hsapiens 10 40422 39692 41883 41153 37500 40057 40787 39326 37135 38961"
## [136] "PMC8782899 /pmc/articles/PMC8782899/bin/41467_2022_27953_MOESM4_ESM.xlsx Hsapiens 3 40787 37500 39326"
## [137] "PMC8782899 /pmc/articles/PMC8782899/bin/41467_2022_27953_MOESM4_ESM.xlsx Hsapiens 3 37500 37500 37500"
## [138] "PMC8782899 /pmc/articles/PMC8782899/bin/41467_2022_27953_MOESM4_ESM.xlsx Hsapiens 2 37500 39326"
## [139] "PMC8776955 /pmc/articles/PMC8776955/bin/42003_2022_3031_MOESM6_ESM.xlsx Hsapiens 1 37135"
## [140] "PMC8776955 /pmc/articles/PMC8776955/bin/42003_2022_3031_MOESM6_ESM.xlsx Hsapiens 1 37135"
## [141] "PMC8766575 /pmc/articles/PMC8766575/bin/42003_2022_3011_MOESM11_ESM.xlsx Hsapiens 27 44257 44256 44449 44262 44259 44441 44450 44256 44261 44266 44258 44447 44446 44453 44531 44263 44260 44264 44451 44440 44443 44265 44448 44257 44444 44442 44445"
## [142] "PMC8766575 /pmc/articles/PMC8766575/bin/42003_2022_3011_MOESM6_ESM.xlsx Hsapiens 4 44449 44261 44450 44449"
## [143] "PMC8807568 /pmc/articles/PMC8807568/bin/Table10.XLSX Drerio 2 44454 44260"
## [144] "PMC8800112 /pmc/articles/PMC8800112/bin/mmc3.xlsx Hsapiens 1 2002/09/01"
## [145] "PMC8796118 /pmc/articles/PMC8796118/bin/mmc3.xls Ggallus 2 2021/09/14 \"ZNF713,14-Sep\""
## [146] "PMC8784157 /pmc/articles/PMC8784157/bin/pnas.2105171119.sd01.xlsx Hsapiens 28 38961 37865 40057 38777 40422 41883 38412 37316 38596 39873 40238 37316 42248 37226 38231 36951 39326 36951 40603 39692 37135 37681 40787 39142 39508 37500 41153 38047"
## [147] "PMC8784157 /pmc/articles/PMC8784157/bin/pnas.2105171119.sd01.xlsx Hsapiens 28 38961 37316 38596 39142 38412 37681 41153 39508 39873 40422 40057 39326 36951 37865 40603 38231 38047 37316 36951 37500 37135 40238 40787 39692 41883 37226 42248 38777"
## [148] "PMC8784157 /pmc/articles/PMC8784157/bin/pnas.2105171119.sd01.xlsx Hsapiens 28 38777 39508 38961 36951 38412 40238 37316 40603 39873 37316 38596 37226 41153 36951 42248 38047 37681 40422 41883 38231 39326 37865 40787 39142 37500 39692 40057 37135"
## [149] "PMC8784157 /pmc/articles/PMC8784157/bin/pnas.2105171119.sd02.xlsx Hsapiens 28 43532 43717 43526 43715 43531 43710 43718 43526 43533 43723 43535 43525 43716 43528 43720 43721 43530 43534 43527 43529 43800 43713 43712 43719 43711 43709 43722 43714"
## [150] "PMC8784157 /pmc/articles/PMC8784157/bin/pnas.2105171119.sd02.xlsx Hsapiens 28 37135 39326 38596 38231 37500 38777 38961 40057 37316 36951 38412 40787 37865 38047 39508 39873 37681 41153 37226 41883 42248 40422 39142 37316 40238 41518 40603 39692"
## [151] "PMC8784157 /pmc/articles/PMC8784157/bin/pnas.2105171119.sd02.xlsx Hsapiens 28 43709 43715 43713 43712 43710 43530 43714 43717 43526 43525 43529 43719 43711 43528 43532 43533 43527 43720 43800 43722 43723 43718 43531 43526 43534 43721 43535 43716"
## [152] "PMC8784157 /pmc/articles/PMC8784157/bin/pnas.2105171119.sd02.xlsx Hsapiens 28 37681 39508 38777 38231 38596 40787 42248 37865 37500 39873 37316 40422 40238 39142 40603 38961 38047 39692 39326 40057 36951 37135 41153 37316 41518 38412 41883 37226"
## [153] "PMC8784157 /pmc/articles/PMC8784157/bin/pnas.2105171119.sd02.xlsx Hsapiens 28 43532 43526 43531 43530 43716 43710 43720 43717 43533 43528 43721 43712 43719 43526 43800 43535 43534 43715 43713 43711 43527 43529 43722 43709 43718 43723 43714 43525"
## [154] "PMC8784157 /pmc/articles/PMC8784157/bin/pnas.2105171119.sd02.xlsx Hsapiens 28 39692 39326 38596 39508 37316 40057 38777 41518 39142 37500 39873 41153 38047 38231 37226 40787 37316 40603 40238 37865 37135 37681 38412 38961 41883 40422 36951 42248"
## [155] "PMC8782496 /pmc/articles/PMC8782496/bin/CTM2-12-e670-s001.xls Hsapiens 7 43901 43899 44080 43896 44078 44086 44083"
## [156] "PMC8782496 /pmc/articles/PMC8782496/bin/CTM2-12-e670-s001.xls Hsapiens 6 44266 44264 44445 44261 44451 44443"
## [157] "PMC8782496 /pmc/articles/PMC8782496/bin/CTM2-12-e670-s001.xls Hsapiens 10 44531 44256 44265 44257 44258 44259 44260 44262 44263 44256"
## [158] "PMC8782496 /pmc/articles/PMC8782496/bin/CTM2-12-e670-s001.xls Hsapiens 11 44257 44454 44440 44449 44450 44453 44441 44442 44444 44446 44447"
## [159] "PMC8782496 /pmc/articles/PMC8782496/bin/CTM2-12-e670-s001.xls Hsapiens 1 44448"
## [160] "PMC8792531 /pmc/articles/PMC8792531/bin/13073_2022_1013_MOESM10_ESM.xlsx Ggallus 24 44266 44446 44449 44441 44447 44260 44444 44263 44258 44531 44451 44262 44443 44257 44448 44454 44264 44261 44445 44265 44259 44442 44450 44453"
## [161] "PMC8792531 /pmc/articles/PMC8792531/bin/13073_2022_1013_MOESM10_ESM.xlsx Hsapiens 28 44440 44256 44447 44531 44265 44263 44445 44262 44453 44258 44450 44449 44257 44260 44443 44442 44261 44441 44451 44259 44266 44454 44264 44257 44448 44256 44446 44444"
## [162] "PMC8792531 /pmc/articles/PMC8792531/bin/13073_2022_1013_MOESM10_ESM.xlsx Hsapiens 28 44449 44256 44264 44531 44448 44260 44263 44454 44450 44445 44262 44258 44453 44451 44441 44447 44444 44443 44259 44265 44257 44256 44442 44261 44257 44440 44266 44446"
## [163] "PMC8792531 /pmc/articles/PMC8792531/bin/13073_2022_1013_MOESM10_ESM.xlsx Hsapiens 24 44446 44262 44266 44258 44261 44265 44441 44451 44445 44260 44447 44531 44453 44449 44450 44264 44448 44257 44454 44443 44263 44259 44444 44442"
## [164] "PMC8792531 /pmc/articles/PMC8792531/bin/13073_2022_1013_MOESM10_ESM.xlsx Hsapiens 24 44450 44441 44266 44448 44443 44444 44454 44261 44442 44262 44449 44264 44259 44445 44453 44531 44447 44446 44260 44451 44263 44265 44258 44257"
## [165] "PMC8792531 /pmc/articles/PMC8792531/bin/13073_2022_1013_MOESM10_ESM.xlsx Hsapiens 28 44531 44451 44264 44265 44453 44257 44256 44258 44263 44447 44445 44443 44449 44266 44262 44257 44256 44450 44454 44259 44448 44444 44261 44441 44442 44260 44446 44440"
## [166] "PMC8792531 /pmc/articles/PMC8792531/bin/13073_2022_1013_MOESM10_ESM.xlsx Hsapiens 1 44446"
## [167] "PMC8792531 /pmc/articles/PMC8792531/bin/13073_2022_1013_MOESM10_ESM.xlsx Hsapiens 28 44265 44443 44264 44263 44266 44531 44257 44258 44453 44445 44447 44262 44451 44441 44256 44257 44454 44260 44450 44448 44256 44446 44444 44449 44259 44440 44261 44442"
## [168] "PMC8792531 /pmc/articles/PMC8792531/bin/13073_2022_1013_MOESM2_ESM.xlsx Hsapiens 4 44257 44257 44257 44257"
## [169] "PMC8792531 /pmc/articles/PMC8792531/bin/13073_2022_1013_MOESM2_ESM.xlsx Hsapiens 4 44257 multiKO_2-Mar_ENSCSAG00000015876 44257 44257"
## [170] "PMC8792531 /pmc/articles/PMC8792531/bin/13073_2022_1013_MOESM3_ESM.xlsx Hsapiens 4 44257 44257 multiKO_2-Mar_ENSCSAG00000015876 44257"
## [171] "PMC8792531 /pmc/articles/PMC8792531/bin/13073_2022_1013_MOESM3_ESM.xlsx Hsapiens 4 44257 44257 multiKO_2-Mar_ENSCSAG00000015876 44257"
## [172] "PMC8792531 /pmc/articles/PMC8792531/bin/13073_2022_1013_MOESM4_ESM.xlsx Hsapiens 2 44257 multiKO_2-Mar_ENSCSAG00000015876"
## [173] "PMC8792531 /pmc/articles/PMC8792531/bin/13073_2022_1013_MOESM5_ESM.xlsx Hsapiens 2 44257 multiKO_2-Mar_ENSCSAG00000015876"
## [174] "PMC8792531 /pmc/articles/PMC8792531/bin/13073_2022_1013_MOESM5_ESM.xlsx Hsapiens 2 44257 multiKO_2-Mar_ENSCSAG00000015876"
## [175] "PMC8792531 /pmc/articles/PMC8792531/bin/13073_2022_1013_MOESM6_ESM.xlsx Hsapiens 28 44531 44259 44265 44257 44262 44451 44257 44440 44260 44264 44442 44444 44261 44446 44449 44454 44258 44443 44447 44266 44445 44263 44450 44256 44441 44453 44448 44256"
## [176] "PMC8792531 /pmc/articles/PMC8792531/bin/13073_2022_1013_MOESM6_ESM.xlsx Hsapiens 28 44531 44265 44262 44259 44263 44450 44447 44445 44446 44260 44256 44444 44443 44258 44449 44454 44440 44441 44257 44257 44266 44261 44451 44256 44442 44453 44448 44264"
## [177] "PMC8792531 /pmc/articles/PMC8792531/bin/13073_2022_1013_MOESM6_ESM.xlsx Hsapiens 28 44265 44447 44531 44263 44262 44264 44442 44260 44446 44443 44257 44256 44449 44454 44441 44258 44444 44448 44261 44440 44266 44257 44453 44259 44256 44451 44450 44445"
## [178] "PMC8792531 /pmc/articles/PMC8792531/bin/13073_2022_1013_MOESM6_ESM.xlsx Hsapiens 28 44531 44263 44262 44442 44259 44265 44264 44260 44446 44256 44257 44258 44440 44261 44444 44449 44454 44257 44443 44445 44447 44441 44451 44453 44450 44256 44448 44266"
## [179] "PMC8792531 /pmc/articles/PMC8792531/bin/13073_2022_1013_MOESM6_ESM.xlsx Hsapiens 28 44447 44265 44264 44531 44266 44259 44262 44446 44260 44257 44449 44440 44258 44256 44454 44443 44451 44444 44448 44261 44442 44257 44256 44453 44441 44445 44263 44450"
## [180] "PMC8792531 /pmc/articles/PMC8792531/bin/13073_2022_1013_MOESM7_ESM.xlsx Hsapiens 28 44261 44257 44448 44260 44445 44256 44441 44258 44446 44453 44256 44262 44443 44265 44447 44450 44257 44259 44449 44266 44440 44263 44531 44442 44444 44451 44264 44454"
## [181] "PMC8792531 /pmc/articles/PMC8792531/bin/13073_2022_1013_MOESM7_ESM.xlsx Hsapiens 28 44440 44260 44257 44453 44448 44441 44444 44261 44445 44265 44258 44447 44256 44263 44259 44531 44450 44257 44443 44262 44442 44256 44266 44264 44451 44449 44454 44446"
## [182] "PMC8792531 /pmc/articles/PMC8792531/bin/13073_2022_1013_MOESM8_ESM.xlsx Hsapiens 28 44441 44260 44261 44256 44449 44450 44258 44445 44256 44448 44531 44259 44454 44440 44446 44257 44444 44442 44266 44264 44443 44453 44265 44257 44451 44263 44262 44447"
## [183] "PMC8792531 /pmc/articles/PMC8792531/bin/13073_2022_1013_MOESM8_ESM.xlsx Hsapiens 27 44531 44262 44441 44260 44265 44256 44448 44263 44264 44257 44447 44261 44445 44258 44450 44442 44266 44453 44257 44451 44444 44454 44449 44446 44440 44443 44256"
## [184] "PMC8792531 /pmc/articles/PMC8792531/bin/13073_2022_1013_MOESM8_ESM.xlsx Hsapiens 27 44531 44441 44260 44449 44448 44442 44445 44454 44261 44266 44262 44446 44264 44450 44257 44451 44443 44447 44258 44257 44263 44453 44256 44440 44265 44444 44256"
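The five-digit values in the listing above look like Excel date serial numbers, which is what a gene name turns into once Excel has stored it as a date. Here is a minimal sketch that decodes a few of them back to calendar dates. It assumes the workbooks use Excel's 1900 date system, for which the conventional R origin is 1899-12-30. The helper name `serial_to_date` is made up for illustration and is not part of the original script.

```r
# Hypothetical helper: convert Excel serial numbers to R Dates.
# Assumes the 1900 date system; a workbook saved with the 1904 system
# would need a different origin.
serial_to_date <- function(serial) {
  as.Date(as.numeric(serial), origin = "1899-12-30")
}

# A few serial numbers taken from the listing above
serials <- c(44257, 44531, 37500)
decoded <- serial_to_date(serials)

# Show each serial number next to its decoded date
data.frame(serial = serials, date = format(decoded, "%Y-%m-%d"))
```

Under that assumption, 44257 decodes to 2 March 2021, which agrees with the `2-Mar` label that appears next to it in the same listing. The decoded date only hints at the original gene symbol; it cannot recover it with certainty.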
Let’s investigate the errors in more detail.
# By species
SPECIES <- sapply(strsplit(ERROR_GENELISTS," "),"[[",3)
table(SPECIES)
## SPECIES
## Athaliana Dmelanogaster Drerio Ggallus Hsapiens
## 9 2 1 5 157
## Mmusculus Rnorvegicus
## 7 3
par(mar=c(5,12,4,2))
barplot(table(SPECIES),horiz=TRUE,las=1)
par(mar=c(5,5,4,2))
# Number of affected Excel files per paper
DIST <- table(sapply(strsplit(ERROR_GENELISTS," "),"[[",1))
DIST
##
## PMC8763317 PMC8766575 PMC8776955 PMC8782496 PMC8782884 PMC8782899 PMC8783018
## 2 2 2 5 1 8 1
## PMC8784157 PMC8789859 PMC8789861 PMC8791277 PMC8791834 PMC8792531 PMC8794810
## 9 2 4 1 1 25 1
## PMC8795503 PMC8796118 PMC8796709 PMC8800112 PMC8803663 PMC8803925 PMC8804361
## 2 1 1 1 3 1 1
## PMC8806071 PMC8807355 PMC8807396 PMC8807568 PMC8807747 PMC8807822 PMC8811262
## 1 4 1 1 2 2 1
## PMC8812899 PMC8814347 PMC8817049 PMC8817631 PMC8818738 PMC8818873 PMC8819090
## 2 1 2 2 1 1 1
## PMC8819343 PMC8819394 PMC8821623 PMC8822763 PMC8823376 PMC8826452 PMC8828869
## 3 3 1 4 1 4 2
## PMC8829032 PMC8830042 PMC8833116 PMC8837554 PMC8844200 PMC8844495 PMC8845119
## 1 3 1 2 8 3 2
## PMC8847715 PMC8848662 PMC8850306 PMC8850650 PMC8851317 PMC8851671 PMC8851680
## 1 1 4 1 1 1 2
## PMC8851819 PMC8851870 PMC8853518 PMC8856606 PMC8856659 PMC8856743 PMC8860223
## 1 1 4 1 1 1 1
## PMC8860442 PMC8863889 PMC8865640 PMC8865845 PMC8867041 PMC8872788 PMC8873582
## 1 2 11 1 2 4 7
## PMC8873981
## 1
summary(as.numeric(DIST))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.000 1.000 2.592 3.000 25.000
hist(DIST,main="Number of affected Excel files per paper")
# PMC articles with the most affected files
DIST_DF <- as.data.frame(DIST)
DIST_DF <- DIST_DF[order(-DIST_DF$Freq),,drop=FALSE]
head(DIST_DF,20)
## Var1 Freq
## 13 PMC8792531 25
## 66 PMC8865640 11
## 8 PMC8784157 9
## 6 PMC8782899 8
## 47 PMC8844200 8
## 70 PMC8873582 7
## 4 PMC8782496 5
## 10 PMC8789861 4
## 23 PMC8807355 4
## 39 PMC8822763 4
## 41 PMC8826452 4
## 52 PMC8850306 4
## 59 PMC8853518 4
## 69 PMC8872788 4
## 19 PMC8803663 3
## 36 PMC8819343 3
## 37 PMC8819394 3
## 44 PMC8830042 3
## 48 PMC8844495 3
## 1 PMC8763317 2
MOST_ERR_FILES = as.character(DIST_DF[1,1])
MOST_ERR_FILES
## [1] "PMC8792531"
# Number of errors per paper
NERR <- as.numeric(sapply(strsplit(ERROR_GENELISTS," "),"[[",4))
names(NERR) <- sapply(strsplit(ERROR_GENELISTS," "),"[[",1)
NERR <-tapply(NERR, names(NERR), sum)
NERR
## PMC8763317 PMC8766575 PMC8776955 PMC8782496 PMC8782884 PMC8782899 PMC8783018
## 2 31 2 35 2 111 7
## PMC8784157 PMC8789859 PMC8789861 PMC8791277 PMC8791834 PMC8792531 PMC8794810
## 252 10 12 1 14 485 6
## PMC8795503 PMC8796118 PMC8796709 PMC8800112 PMC8803663 PMC8803925 PMC8804361
## 8 2 6 1 18 14 1
## PMC8806071 PMC8807355 PMC8807396 PMC8807568 PMC8807747 PMC8807822 PMC8811262
## 1 66 1 2 2 7 3
## PMC8812899 PMC8814347 PMC8817049 PMC8817631 PMC8818738 PMC8818873 PMC8819090
## 26 1 2 31 7 26 1
## PMC8819343 PMC8819394 PMC8821623 PMC8822763 PMC8823376 PMC8826452 PMC8828869
## 43 3 1 19 3 96 344
## PMC8829032 PMC8830042 PMC8833116 PMC8837554 PMC8844200 PMC8844495 PMC8845119
## 1 13 1 29 22 14 40
## PMC8847715 PMC8848662 PMC8850306 PMC8850650 PMC8851317 PMC8851671 PMC8851680
## 1 3 78 2 4 5 26
## PMC8851819 PMC8851870 PMC8853518 PMC8856606 PMC8856659 PMC8856743 PMC8860223
## 67 1 7 2 1 27 12
## PMC8860442 PMC8863889 PMC8865640 PMC8865845 PMC8867041 PMC8872788 PMC8873582
## 25 487 392 2 6 153 729
## PMC8873981
## 1
hist(NERR,main="Number of errors per PMC article")
NERR_DF <- as.data.frame(NERR)
NERR_DF <- NERR_DF[order(-NERR_DF$NERR),,drop=FALSE]
head(NERR_DF,20)
## NERR
## PMC8873582 729
## PMC8863889 487
## PMC8792531 485
## PMC8865640 392
## PMC8828869 344
## PMC8784157 252
## PMC8872788 153
## PMC8782899 111
## PMC8826452 96
## PMC8850306 78
## PMC8851819 67
## PMC8807355 66
## PMC8819343 43
## PMC8845119 40
## PMC8782496 35
## PMC8766575 31
## PMC8817631 31
## PMC8837554 29
## PMC8856743 27
## PMC8812899 26
MOST_ERR = rownames(NERR_DF)[1]
MOST_ERR
## [1] "PMC8873582"
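The summaries above split every ERROR_GENELISTS entry on spaces several times to pull out one field at a time. As a rough sketch of an alternative, the entries could be split once and stored in a data frame. This assumes file paths never contain spaces. The column names and the toy `entries` vector (three lines copied from the listing) are for illustration only.

```r
# Toy entries copied from the listing above; the real vector is ERROR_GENELISTS
entries <- c(
  "PMC8847715 /pmc/articles/PMC8847715/bin/Table1.xlsx Hsapiens 1 44531",
  "PMC8844495 /pmc/articles/PMC8844495/bin/Table2.xlsx Mmusculus 3 44259 44444 44448",
  "PMC8844495 /pmc/articles/PMC8844495/bin/Table4.xlsx Mmusculus 3 44259 44444 44448"
)

# Split each entry once and keep the first four fields
# (assumes there are no spaces inside the file path)
fields <- strsplit(entries, " ")
parsed <- data.frame(
  pmcid   = sapply(fields, "[[", 1),
  file    = sapply(fields, "[[", 2),
  species = sapply(fields, "[[", 3),
  n_err   = as.numeric(sapply(fields, "[[", 4)),
  stringsAsFactors = FALSE
)

# Per-paper summaries, analogous to DIST and NERR above
n_lists_per_paper  <- table(parsed$pmcid)
n_errors_per_paper <- tapply(parsed$n_err, parsed$pmcid, sum)
n_errors_per_paper
```

Keeping the fields together in one data frame also makes it easy to sort or filter by species and paper at the same time.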
GENELIST_ERROR_ARTICLES <- gsub("PMC","",GENELIST_ERROR_ARTICLES)
### JSON PARSING is more reliable than XML
ARTICLES <- esummary( GENELIST_ERROR_ARTICLES , db="pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLES,as= "parsed")
ARTICLE_DATA <- ARTICLE_DATA$result
ARTICLE_DATA <- ARTICLE_DATA[2:length(ARTICLE_DATA)]
JOURNALS <- unlist(lapply(ARTICLE_DATA,function(x) {x$fulljournalname} ))
JOURNALS_TABLE <- table(JOURNALS)
JOURNALS_TABLE <- JOURNALS_TABLE[order(-JOURNALS_TABLE)]
length(JOURNALS_TABLE)
## [1] 43
NUM_JOURNALS=length(JOURNALS_TABLE)
par(mar=c(5,25,4,2))
barplot(head(JOURNALS_TABLE,10), horiz=TRUE, las=1,
xlab="Articles with gene name errors in supp files",
main="Top journals this month")
Congrats to our Journal of the Month winner!
JOURNAL_WINNER <- names(head(JOURNALS_TABLE,1))
JOURNAL_WINNER
## [1] "Frontiers in Genetics"
There are two categories:
Paper with the most supplementary files affected by gene name errors (MOST_ERR_FILES)
Paper with the most gene names converted to dates (MOST_ERR)
Sometimes, one paper can win both categories. Congrats to our winners.
MOST_ERR_FILES <- gsub("PMC","",MOST_ERR_FILES)
ARTICLES <- esummary( MOST_ERR_FILES , db="pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLES,as= "parsed")
ARTICLE_DATA <- ARTICLE_DATA[2]
ARTICLE_DATA
## $result
## $result$uids
## [1] "8792531"
##
## $result$`8792531`
## $result$`8792531`$uid
## [1] "8792531"
##
## $result$`8792531`$pubdate
## [1] "2022 Jan 27"
##
## $result$`8792531`$epubdate
## [1] "2022 Jan 27"
##
## $result$`8792531`$printpubdate
## [1] ""
##
## $result$`8792531`$source
## [1] "Genome Med"
##
## $result$`8792531`$authors
## name authtype
## 1 Grodzki M Author
## 2 Bluhm AP Author
## 3 Schaefer M Author
## 4 Tagmount A Author
## 5 Russo M Author
## 6 Sobh A Author
## 7 Rafiee R Author
## 8 Vulpe CD Author
## 9 Karst SM Author
## 10 Norris MH Author
##
## $result$`8792531`$title
## [1] "Genome-scale CRISPR screens identify host factors that promote human coronavirus infection"
##
## $result$`8792531`$volume
## [1] "14"
##
## $result$`8792531`$issue
## [1] ""
##
## $result$`8792531`$pages
## [1] "10"
##
## $result$`8792531`$articleids
## idtype value
## 1 pmid 35086559
## 2 doi 10.1186/s13073-022-01013-1
## 3 pmcid PMC8792531
##
## $result$`8792531`$fulljournalname
## [1] "Genome Medicine"
##
## $result$`8792531`$sortdate
## [1] "2022/01/27 00:00"
##
## $result$`8792531`$pmclivedate
## [1] "2022/01/27"
MOST_ERR <- gsub("PMC","",MOST_ERR)
ARTICLE_DATA <- esummary(MOST_ERR,db = "pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLE_DATA,as= "parsed")
ARTICLE_DATA
## $header
## $header$type
## [1] "esummary"
##
## $header$version
## [1] "0.3"
##
##
## $result
## $result$uids
## [1] "8873582"
##
## $result$`8873582`
## $result$`8873582`$uid
## [1] "8873582"
##
## $result$`8873582`$pubdate
## [1] "2022 Feb 11"
##
## $result$`8873582`$epubdate
## [1] "2022 Feb 11"
##
## $result$`8873582`$printpubdate
## [1] ""
##
## $result$`8873582`$source
## [1] "Front Genet"
##
## $result$`8873582`$authors
## name authtype
## 1 Chen J Author
## 2 Liu W Author
## 3 Du J Author
## 4 Wang P Author
## 5 Wang J Author
## 6 Ye K Author
##
## $result$`8873582`$title
## [1] "Comprehensive Genomic and Epigenomic Analyses on Transcriptomic Regulation in Stomach Adenocarcinoma"
##
## $result$`8873582`$volume
## [1] "12"
##
## $result$`8873582`$issue
## [1] ""
##
## $result$`8873582`$pages
## [1] "778095"
##
## $result$`8873582`$articleids
## idtype value
## 1 pmid 35222516
## 2 doi 10.3389/fgene.2021.778095
## 3 pmcid PMC8873582
##
## $result$`8873582`$fulljournalname
## [1] "Frontiers in Genetics"
##
## $result$`8873582`$sortdate
## [1] "2022/02/11 00:00"
##
## $result$`8873582`$pmclivedate
## [1] "2022/02/26"
Finally, let's plot the trend over the past 6-12 months using the earlier monthly reports.
url <- "http://ziemann-lab.net/public/gene_name_errors/"
doc <- htmlParse(url)
links <- xpathSApply(doc, "//a/@href")
links <- links[grep("html",links)]
links
## href href href
## "Report_2021-02.html" "Report_2021-03.html" "Report_2021-04.html"
## href href href
## "Report_2021-05.html" "Report_2021-06.html" "Report_2021-07.html"
## href href href
## "Report_2021-08.html" "Report_2021-09.html" "Report_2021-10.html"
## href href href
## "Report_2021-11.html" "Report_2021-12.html" "Report_2022-01.html"
## href
## "Report_2022-02.html"
unlink("online_files/",recursive=TRUE)
dir.create("online_files")
sapply(links, function(mylink) {
download.file(paste(url,mylink,sep=""),destfile=paste("online_files/",mylink,sep=""))
} )
## href href href href href href href href href href href href href
## 0 0 0 0 0 0 0 0 0 0 0 0 0
myfilelist <- list.files("online_files/",full.names=TRUE)
trends <- sapply(myfilelist, function(myfilename) {
x <- readLines(myfilename)
# Num XL gene list articles
NUM_GENELIST_ARTICLES <- x[grep("NUM_GENELIST_ARTICLES",x)[3]+1]
NUM_GENELIST_ARTICLES <- sapply(strsplit(NUM_GENELIST_ARTICLES," "),"[[",3)
NUM_GENELIST_ARTICLES <- sapply(strsplit(NUM_GENELIST_ARTICLES,"<"),"[[",1)
NUM_GENELIST_ARTICLES <- as.numeric(NUM_GENELIST_ARTICLES)
# number of affected articles
NUM_ERROR_GENELIST_ARTICLES <- x[grep("NUM_ERROR_GENELIST_ARTICLES",x)[3]+1]
NUM_ERROR_GENELIST_ARTICLES <- sapply(strsplit(NUM_ERROR_GENELIST_ARTICLES," "),"[[",3)
NUM_ERROR_GENELIST_ARTICLES <- sapply(strsplit(NUM_ERROR_GENELIST_ARTICLES,"<"),"[[",1)
NUM_ERROR_GENELIST_ARTICLES <- as.numeric(NUM_ERROR_GENELIST_ARTICLES)
# Error proportion
ERROR_PROPORTION <- x[grep("ERROR_PROPORTION",x)[3]+1]
ERROR_PROPORTION <- sapply(strsplit(ERROR_PROPORTION," "),"[[",3)
ERROR_PROPORTION <- sapply(strsplit(ERROR_PROPORTION,"<"),"[[",1)
ERROR_PROPORTION <- as.numeric(ERROR_PROPORTION)
# number of journals
NUM_JOURNALS <- x[grep('JOURNALS_TABLE',x)[3]+1]
NUM_JOURNALS <- sapply(strsplit(NUM_JOURNALS," "),"[[",3)
NUM_JOURNALS <- sapply(strsplit(NUM_JOURNALS,"<"),"[[",1)
NUM_JOURNALS <- as.numeric(NUM_JOURNALS)
res <- c(NUM_GENELIST_ARTICLES,NUM_ERROR_GENELIST_ARTICLES,ERROR_PROPORTION,NUM_JOURNALS)
return(res)
})
colnames(trends) <- sapply(strsplit(colnames(trends),"_"),"[[",3)
colnames(trends) <- gsub(".html","",colnames(trends),fixed=TRUE)
trends <- as.data.frame(trends)
rownames(trends) <- c("NUM_GENELIST_ARTICLES","NUM_ERROR_GENELIST_ARTICLES","ERROR_PROPORTION","NUM_JOURNALS")
trends <- t(trends)
trends <- as.data.frame(trends)
CURRENT_RES <- c(NUM_GENELIST_ARTICLES,NUM_ERROR_GENELIST_ARTICLES,ERROR_PROPORTION,NUM_JOURNALS)
trends <- rbind(trends,CURRENT_RES)
paste(CURRENT_YEAR,CURRENT_MONTH,sep="-")
## [1] "2022-03"
rownames(trends)[nrow(trends)] <- paste(CURRENT_YEAR,CURRENT_MONTH,sep="-")
plot(trends$NUM_GENELIST_ARTICLES, xaxt = "n" , type="b" , main="Number of articles with Excel gene lists per month",
ylab="number of articles", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
plot(trends$NUM_ERROR_GENELIST_ARTICLES, xaxt = "n" , type="b" , main="Number of articles with gene name errors per month",
ylab="number of articles", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
plot(trends$ERROR_PROPORTION, xaxt = "n" , type="b" , main="Proportion of articles with Excel gene list affected by errors",
ylab="proportion", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
plot(trends$NUM_JOURNALS, xaxt = "n" , type="b" , main="Number of journals with affected articles",
ylab="number of journals", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
unlink("online_files/",recursive=TRUE)
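The function above repeats the same four steps for each metric: find the third line that mentions a variable, take the line after it, and strip the surrounding markup. A minimal sketch of how that could be folded into one helper is shown here. The helper name `extract_value` and the toy input are hypothetical, and the layout of the rendered HTML is an assumption (a printed value such as `## [1] 43` wrapped in tags on the line after the variable name).

```r
# Hypothetical helper: find the nth line mentioning `varname` and parse the
# number printed on the following line, e.g. "<pre><code>## [1] 43</code></pre>"
extract_value <- function(lines, varname, occurrence = 3) {
  hit <- grep(varname, lines, fixed = TRUE)[occurrence]
  # Return NA if the variable is not mentioned often enough,
  # or if there is no following line to read
  if (is.na(hit) || hit >= length(lines)) return(NA_real_)
  out <- lines[hit + 1]
  # Drop everything up to and including the "[1]" index, then trailing markup
  value <- sub("^.*\\[1\\] *", "", out)
  value <- sub("<.*$", "", value)
  suppressWarnings(as.numeric(value))
}

# Toy stand-in for readLines() on a rendered report (assumed layout)
toy <- c(
  "NUM_JOURNALS first mention",
  "NUM_JOURNALS second mention",
  "<pre class=\"r\"><code>NUM_JOURNALS</code></pre>",
  "<pre><code>## [1] 43</code></pre>"
)
extract_value(toy, "NUM_JOURNALS")
extract_value(toy, "ERROR_PROPORTION")  # not present in the toy input
```

Returning `NA` when a metric cannot be found means a report with an unexpected layout leaves a gap in the trend plot instead of stopping the whole run.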
Zeeberg, B.R., Riss, J., Kane, D.W. et al. Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics. BMC Bioinformatics 5, 80 (2004). https://doi.org/10.1186/1471-2105-5-80
Ziemann, M., Eren, Y. & El-Osta, A. Gene name errors are widespread in the scientific literature. Genome Biol 17, 177 (2016). https://doi.org/10.1186/s13059-016-1044-7
sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##
## locale:
## [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
## [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
## [7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] readxl_1.3.1 reutils_0.2.3 xml2_1.3.3 jsonlite_1.7.2 XML_3.99-0.8
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.7 knitr_1.37 magrittr_2.0.1 R6_2.5.1
## [5] rlang_0.4.12 fastmap_1.1.0 stringr_1.4.0 highr_0.9
## [9] tools_4.1.2 xfun_0.29 jquerylib_0.1.4 htmltools_0.5.2
## [13] yaml_2.2.1 digest_0.6.29 assertthat_0.2.1 sass_0.4.0
## [17] bitops_1.0-7 RCurl_1.98-1.5 evaluate_0.14 rmarkdown_2.11
## [21] stringi_1.7.6 compiler_4.1.2 bslib_0.3.1 cellranger_1.1.0