Source: https://github.com/markziemann/GeneNameErrors2020
View the reports: http://ziemann-lab.net/public/gene_name_errors/
Gene name errors occur when data are imported improperly into MS Excel and other spreadsheet programs (Zeeberg et al, 2004): gene symbols such as MARCH3, SEPT2 and DEC1 are converted into dates. These errors are surprisingly common in supplementary data files in genomics (Ziemann et al, 2016). Each instance could be considered minor because it affects only a small number of genes, but it is symptomatic of poor data processing practices. The purpose of this script is to identify gene name errors in the supplementary files of PubMed Central articles published in the previous month.
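To make the failure mode concrete, here is a minimal R sketch that flags spreadsheet values resembling date-converted gene symbols. The three patterns (day-month strings such as 2-Sep, ISO-style dates, and bare five-digit Excel serial numbers) and the helper name `looks_date_converted()` are illustrative assumptions; this is not the detection logic used by gene_names.sh.

```r
# Hypothetical helper: flag values that look like gene symbols mangled into dates.
# The patterns are illustrative assumptions, not the rules used by gene_names.sh.
looks_date_converted <- function(x) {
  x <- trimws(as.character(x))
  day_month <- grepl("^[0-9]{1,2}-(Mar|Sep|Oct|Dec)$", x, ignore.case = TRUE)  # e.g. "2-Sep"
  iso_date  <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)                          # e.g. "2001-03-01"
  serial    <- grepl("^[0-9]{5}$", x)                                            # e.g. "44621"
  day_month | iso_date | serial
}

cells <- c("TP53", "2-Sep", "MARCH3", "1-Dec", "44621", "2001-03-01", "BRCA1")
cells[looks_date_converted(cells)]
```

A bare five-digit number is only suspicious inside a column that should contain gene symbols, which is one reason checks like this can produce false positives.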
library("XML")
library("jsonlite")
library("xml2")
library("reutils")
library("readxl")
library("RCurl")
Here I retrieve the PubMed Central (PMC) IDs for the previous month.
First, work out which year and month to search.
# Work out the year and month of the previous calendar month
CURRENT_MONTH <- as.numeric(format(Sys.Date(), "%m"))
CURRENT_YEAR <- as.numeric(format(Sys.Date(), "%Y"))
if (CURRENT_MONTH == 1) {
  # January: the previous month is December of the previous year
  PREV_YEAR <- CURRENT_YEAR - 1
  PREV_MONTH <- 12
} else {
  PREV_YEAR <- CURRENT_YEAR
  PREV_MONTH <- CURRENT_MONTH - 1
}
# Zero-pad the month so the search date is always YYYY/MM
DATE <- sprintf("%d/%02d", PREV_YEAR, PREV_MONTH)
DATE
## [1] "2022/12"
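The same month arithmetic can also be done with R's Date class, which handles the January rollover and month lengths automatically. A minimal sketch, assuming a hypothetical helper `prev_month_range()` that returns the first and last day of the previous month in the YYYY/MM/DD form accepted for mindate/maxdate:

```r
# Hypothetical helper: date bounds for the calendar month before `today`.
prev_month_range <- function(today = Sys.Date()) {
  first_this_month <- as.Date(format(today, "%Y-%m-01"))
  last_prev_month  <- first_this_month - 1   # day before the 1st = last day of previous month
  first_prev_month <- as.Date(format(last_prev_month, "%Y-%m-01"))
  c(mindate = format(first_prev_month, "%Y/%m/%d"),
    maxdate = format(last_prev_month, "%Y/%m/%d"))
}

prev_month_range(as.Date("2023-01-15"))
```

Computing the last day this way avoids hard-coding day 31, which does not exist in every month.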
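Next, search PMC for articles from the previous month whose abstract contains a word beginning with "genom" (e.g. genome, genomic, genomics), and count how many PMC IDs are returned.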
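QUERY <- '((genom*[Abstract]))'
# Last day of the previous month (the day before the 1st of the current month),
# so that months with fewer than 31 days get a valid maxdate
LAST_DAY <- format(as.Date(format(Sys.Date(), "%Y-%m-01")) - 1, "%d")
ESEARCH_RES <- esearch(term = QUERY, db = "pmc", rettype = "uilist", retmode = "xml",
                       retstart = 0, retmax = 5000000, usehistory = TRUE,
                       webenv = NULL, querykey = NULL, sort = NULL, field = NULL,
                       datetype = NULL, reldate = NULL,
                       mindate = paste(DATE, "/1", sep = ""),
                       maxdate = paste(DATE, "/", LAST_DAY, sep = ""))
pmc <- efetch(ESEARCH_RES, retmode = "text", rettype = "uilist", outfile = "pmcids.txt")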
## Retrieving UIDs 1 to 500
## Retrieving UIDs 501 to 1000
## Retrieving UIDs 1001 to 1500
## Retrieving UIDs 1501 to 2000
## Retrieving UIDs 2001 to 2500
## Retrieving UIDs 2501 to 3000
## Retrieving UIDs 3001 to 3500
## Retrieving UIDs 3501 to 4000
pmc <- read.table(pmc)
pmc <- paste("PMC",pmc$V1,sep="")
NUM_ARTICLES=length(pmc)
NUM_ARTICLES
## [1] 3567
writeLines(pmc,con="pmc.txt")
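Under the hood, esearch() sends a request to NCBI's E-utilities ESearch endpoint. The sketch below only builds (it does not send) a comparable request URL, to show which parameters carry the query and the date window. The helper name `esearch_url()` and the choice to pass only db, term, mindate, maxdate, retmax and usehistory are assumptions; reutils may add further parameters of its own.

```r
# Hypothetical helper: build (but do not send) an ESearch request URL.
esearch_url <- function(term, db = "pmc", mindate, maxdate, retmax = 5000000) {
  base <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
  params <- c(db = db, term = term, mindate = mindate, maxdate = maxdate,
              retmax = format(retmax, scientific = FALSE), usehistory = "y")
  # Percent-encode each value so brackets, asterisks and slashes survive
  encoded <- vapply(params, URLencode, character(1), reserved = TRUE)
  paste0(base, "?", paste0(names(params), "=", encoded, collapse = "&"))
}

esearch_url("((genom*[Abstract]))", mindate = "2022/12/01", maxdate = "2022/12/31")
```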
Next, run the bash script gene_names.sh on the list of PMC IDs. Note that false positives occur (~1.5%) and these results have not been manually verified.
Here are some definitions:
NUM_XLS = Number of supplementary Excel files in this set of PMC articles.
NUM_XLS_ARTICLES = Number of articles matching the PubMed Central search which have supplementary Excel files.
GENELISTS = The gene lists found in the Excel files.
NUM_GENELISTS = The number of Excel files with gene lists. Each Excel file is counted once, even if it has multiple gene lists.
NUM_GENELIST_ARTICLES = The number of PMC articles with supplementary Excel gene lists.
ERROR_GENELISTS = Files suspected to contain gene name errors. The dates and five-digit numbers indicate transmogrified gene names.
NUM_ERROR_GENELISTS = Number of Excel gene lists with errors.
NUM_ERROR_GENELIST_ARTICLES = Number of articles with supplementary Excel gene name errors.
ERROR_PROPORTION = The proportion of articles with Excel gene lists that have errors (NUM_ERROR_GENELIST_ARTICLES / NUM_GENELIST_ARTICLES).
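Judging from the printed output further down, each line of results.txt appears to hold space-separated fields: the PMC ID, the file path, then (for files with a gene list) the species, the number of suspect values and the suspect values themselves. Under that assumption, this sketch applies the same field-count thresholds used in the code below to a few made-up lines, to show how the definitions above translate into counts.

```r
# Made-up lines in the layout inferred from the printed output:
# PMCID, file path, [species, number of suspect values, suspect values ...]
toy_results <- c(
  "PMC0000001 PMC_DL/PMC0000001/supplementaryfiles/s1.xlsx",
  "PMC0000001 PMC_DL/PMC0000001/supplementaryfiles/s2.xlsx Hsapiens",
  "PMC0000002 PMC_DL/PMC0000002/supplementaryfiles/s1.xlsx Mmusculus",
  "PMC0000003 PMC_DL/PMC0000003/supplementaryfiles/t1.xlsx Hsapiens 2 44621 44805"
)

fields   <- strsplit(toy_results, " ")
n_fields <- lengths(fields)            # number of space-separated fields per line
article  <- sapply(fields, "[[", 1)    # first field is the PMC ID

genelist_articles <- unique(article[n_fields > 2])  # species assigned: gene list found
error_articles    <- unique(article[n_fields > 3])  # suspect values reported
error_proportion  <- length(error_articles) / length(genelist_articles)
error_proportion
```

Here one of the three articles with gene lists has suspected errors, so the proportion is 1/3.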
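# Run the detection pipeline; it is expected to write its findings to results.txt
status <- system("./gene_names.sh pmc.txt")
if (status != 0) stop("gene_names.sh exited with status ", status)
results <- readLines("results.txt")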
XLS <- results[grep("XLS",results,ignore.case=TRUE)]
NUM_XLS = length(XLS)
NUM_XLS
## [1] 6654
NUM_XLS_ARTICLES = length(unique(sapply(strsplit(XLS," "),"[[",1)))
NUM_XLS_ARTICLES
## [1] 1171
# Lines with more than two fields include a species, i.e. a gene list was found
GENELISTS <- XLS[lengths(strsplit(XLS, " ")) > 2]
#GENELISTS
NUM_GENELISTS <- length(unique(sapply(strsplit(GENELISTS," "),"[[",2)))
NUM_GENELISTS
## [1] 634
NUM_GENELIST_ARTICLES <- length(unique(sapply(strsplit(GENELISTS," "),"[[",1)))
NUM_GENELIST_ARTICLES
## [1] 351
# Lines with more than three fields also report suspected gene name errors
ERROR_GENELISTS <- XLS[lengths(strsplit(XLS, " ")) > 3]
#ERROR_GENELISTS
NUM_ERROR_GENELISTS = length(ERROR_GENELISTS)
NUM_ERROR_GENELISTS
## [1] 247
GENELIST_ERROR_ARTICLES <- unique(sapply(strsplit(ERROR_GENELISTS," "),"[[",1))
GENELIST_ERROR_ARTICLES
## [1] "PMC9801655" "PMC9795187" "PMC9793550" "PMC9792465" "PMC9757741"
## [6] "PMC9789999" "PMC9785075" "PMC9784255" "PMC9775101" "PMC9772819"
## [11] "PMC9769457" "PMC9768194" "PMC9763387" "PMC9763382" "PMC9762592"
## [16] "PMC9762028" "PMC9759858" "PMC9758924" "PMC9755029" "PMC9750130"
## [21] "PMC9748153" "PMC9746999" "PMC9742853" "PMC9734637" "PMC9733279"
## [26] "PMC9729295" "PMC9729111" "PMC9728798" "PMC9727928" "PMC9724437"
## [31] "PMC9721428" "PMC9720402" "PMC9720150" "PMC9683724" "PMC9718428"
## [36] "PMC9715950" "PMC9715678" "PMC9713440" "PMC9714951" "PMC9713698"
## [41] "PMC9713327" "PMC9713173" "PMC9708086" "PMC9706722" "PMC9703941"
## [46] "PMC9800021" "PMC9795334" "PMC9792466" "PMC9791056" "PMC9776514"
## [51] "PMC9776006" "PMC9775906" "PMC9775105" "PMC9774719" "PMC9771818"
## [56] "PMC9768914" "PMC9764863" "PMC9763853" "PMC9763118" "PMC9763110"
## [61] "PMC9762029" "PMC9758519" "PMC9753033" "PMC9749333" "PMC9748020"
## [66] "PMC9748018" "PMC9746894" "PMC9744761" "PMC9743592" "PMC9743561"
## [71] "PMC9738480" "PMC9736101" "PMC9734139" "PMC9731154" "PMC9723631"
## [76] "PMC9724785" "PMC9722939" "PMC9718667" "PMC9716074" "PMC9715725"
## [81] "PMC9714834" "PMC9713371" "PMC9705836"
NUM_ERROR_GENELIST_ARTICLES <- length(GENELIST_ERROR_ARTICLES)
NUM_ERROR_GENELIST_ARTICLES
## [1] 83
ERROR_PROPORTION = NUM_ERROR_GENELIST_ARTICLES / NUM_GENELIST_ARTICLES
ERROR_PROPORTION
## [1] 0.2364672
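For this month's run, the proportion works out as follows; roughly one in four articles with supplementary Excel gene lists had at least one suspected error (before any manual check for false positives):

```latex
\text{ERROR\_PROPORTION}
  = \frac{\text{NUM\_ERROR\_GENELIST\_ARTICLES}}{\text{NUM\_GENELIST\_ARTICLES}}
  = \frac{83}{351} \approx 0.236
```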
Below are the gene lists with suspected errors detected in the past month (uncomment #GENELISTS to print all detected gene lists). The dates are obvious errors; they are commonly dates in September, March, December and October. The five-digit numbers are the same dates stored in Excel's internal serial format, which counts days from the start of 1900 (serial 1 is 1 January 1900; for example, 44621 is 1 March 2022). If you paste these numbers into Excel and format the cells as dates, most of them will also map to dates in September, March, December and October.
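The five-digit values can also be decoded without Excel. A minimal sketch, assuming the files used Excel's default 1900 date system (files saved with the 1904 system, an older Mac default, would be offset by 1462 days); `excel_serial_to_date()` is a hypothetical helper name:

```r
# Excel's 1900 date system counts a non-existent 29 Feb 1900, so using an origin of
# 1899-12-30 gives correct dates for every serial after 60 (1 March 1900 onward).
excel_serial_to_date <- function(serial) {
  as.Date(as.numeric(serial), origin = "1899-12-30")
}

suspects <- c("44621", "44805", "44896", "37865")
dates <- excel_serial_to_date(suspects)
dates
as.integer(format(dates, "%m"))   # month number of each decoded date
```

The decoded dates fall in March, September and December, consistent with the MARCH, SEPT and DEC gene families.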
#GENELISTS
ERROR_GENELISTS
## [1] "PMC9801655 PMC_DL/PMC9801655/supplementaryfiles/13578_2022_948_MOESM3_ESM.xlsx Ggallus 2 44813 44630"
## [2] "PMC9801655 PMC_DL/PMC9801655/supplementaryfiles/13578_2022_948_MOESM3_ESM.xlsx Hsapiens 1 44806"
## [3] "PMC9801655 PMC_DL/PMC9801655/supplementaryfiles/13578_2022_948_MOESM3_ESM.xlsx Hsapiens 1 44624"
## [4] "PMC9801655 PMC_DL/PMC9801655/supplementaryfiles/13578_2022_948_MOESM3_ESM.xlsx Ggallus 2 44622 44621"
## [5] "PMC9801655 PMC_DL/PMC9801655/supplementaryfiles/13578_2022_948_MOESM3_ESM.xlsx Hsapiens 2 44812 44623"
## [6] "PMC9801655 PMC_DL/PMC9801655/supplementaryfiles/13578_2022_948_MOESM3_ESM.xlsx Ggallus 1 44806"
## [7] "PMC9801655 PMC_DL/PMC9801655/supplementaryfiles/13578_2022_948_MOESM3_ESM.xlsx Hsapiens 1 44624"
## [8] "PMC9801655 PMC_DL/PMC9801655/supplementaryfiles/13578_2022_948_MOESM3_ESM.xlsx Hsapiens 2 44622 44621"
## [9] "PMC9801655 PMC_DL/PMC9801655/supplementaryfiles/13578_2022_948_MOESM3_ESM.xlsx Hsapiens 1 44622"
## [10] "PMC9801655 PMC_DL/PMC9801655/supplementaryfiles/13578_2022_948_MOESM3_ESM.xlsx Hsapiens 2 44625 44628"
## [11] "PMC9801655 PMC_DL/PMC9801655/supplementaryfiles/13578_2022_948_MOESM3_ESM.xlsx Hsapiens 2 44621 44815"
## [12] "PMC9801655 PMC_DL/PMC9801655/supplementaryfiles/13578_2022_948_MOESM3_ESM.xlsx Hsapiens 1 44896"
## [13] "PMC9795187 zip/Table_2.xlsx Hsapiens 2 44808 44809"
## [14] "PMC9793550 PMC_DL/PMC9793550/supplementaryfiles/12859_2022_5109_MOESM4_ESM.xlsx Hsapiens 5 44443 44257 44442 44265 44259"
## [15] "PMC9792465 PMC_DL/PMC9792465/supplementaryfiles/42003_2022_4351_MOESM4_ESM.xlsx Drerio 1 44454"
## [16] "PMC9792465 PMC_DL/PMC9792465/supplementaryfiles/42003_2022_4351_MOESM4_ESM.xlsx Drerio 5 44624 44816 44631 44816 44816"
## [17] "PMC9792465 PMC_DL/PMC9792465/supplementaryfiles/42003_2022_4351_MOESM4_ESM.xlsx Drerio 11 44626 44628 44625 44625 44808 44626 44626 44628 44625 44625 44808"
## [18] "PMC9792465 PMC_DL/PMC9792465/supplementaryfiles/42003_2022_4351_MOESM4_ESM.xlsx Drerio 2 44819 44819"
## [19] "PMC9792465 PMC_DL/PMC9792465/supplementaryfiles/42003_2022_4351_MOESM4_ESM.xlsx Drerio 1 44443"
## [20] "PMC9792465 PMC_DL/PMC9792465/supplementaryfiles/42003_2022_4351_MOESM4_ESM.xlsx Drerio 3 44451 44266 44259"
## [21] "PMC9792465 PMC_DL/PMC9792465/supplementaryfiles/42003_2022_4351_MOESM4_ESM.xlsx Drerio 3 44263 44261 44260"
## [22] "PMC9757741 zip/sciadv.abo4082_table_s1.xlsx Celegans 10 44621 44805 44652 44624 44626 44625 44836 44713 44835 44622"
## [23] "PMC9789999 PMC_DL/PMC9789999/supplementaryfiles/41467_2022_35604_MOESM6_ESM.xlsx Mmusculus 24 44631 44815 44628 44808 44818 44819 44813 44812 44627 44622 44630 44624 44806 44625 44805 44621 44814 44816 44807 44809 44626 44810 44811 44629"
## [24] "PMC9789999 PMC_DL/PMC9789999/supplementaryfiles/41467_2022_35604_MOESM5_ESM.xlsx Mmusculus 24 44805 44812 44625 44630 44621 44818 44622 44814 44628 44627 44816 44806 44811 44807 44626 44819 44808 44809 44810 44629 44813 44624 44631 44815"
## [25] "PMC9789999 PMC_DL/PMC9789999/supplementaryfiles/41467_2022_35604_MOESM4_ESM.xlsx Mmusculus 24 44631 44628 44818 44813 44630 44812 44807 44808 44811 44806 44809 44819 44814 44815 44816 44626 44810 44805 44625 44627 44622 44624 44629 44621"
## [26] "PMC9785075 zip/Supplementary_Table_S5.XLSX Hsapiens 3 38200 37135 38961"
## [27] "PMC9784255 PMC_DL/PMC9784255/supplementaryfiles/12863_2022_1099_MOESM7_ESM.xlsx Hsapiens 1 44256"
## [28] "PMC9775101 zip/Table_S1-S11.xlsx Hsapiens 2 44623 44630"
## [29] "PMC9772819 zip/Table_S6_630_PSGs_expression_in_each_cell_type.xlsx Hsapiens 1 43895"
## [30] "PMC9772819 zip/Table_S12_Gene_expression_average_and_proportion.xlsx Hsapiens 25 44256 44257 44256 44265 44266 44257 44258 44259 44260 44261 44262 44263 44264 44440 44449 44450 44451 44441 44442 44443 44444 44445 44446 44447 44448"
## [31] "PMC9772819 zip/Table_S12_Gene_expression_average_and_proportion.xlsx Hsapiens 25 44256 44257 44256 44265 44266 44257 44258 44259 44260 44261 44262 44263 44264 44440 44449 44450 44451 44441 44442 44443 44444 44445 44446 44447 44448"
## [32] "PMC9772819 zip/Table_S13_The_mean_expression_of_DEGs_in_different_cell_types.xlsx Hsapiens 4 44621 44628 44808 44811"
## [33] "PMC9769457 PMC_DL/PMC9769457/supplementaryfiles/Table_2.xlsx Hsapiens 14 44624 44630 44807 44816 44629 44627 44806 44811 44622 44628 44810 44622 44814 44805"
## [34] "PMC9768194 PMC_DL/PMC9768194/supplementaryfiles/Table1.XLSX Hsapiens 8 44621 44624 44626 44628 44805 44807 44810 44811"
## [35] "PMC9768194 PMC_DL/PMC9768194/supplementaryfiles/Table1.XLSX Hsapiens 2 44626 44815"
## [36] "PMC9768194 PMC_DL/PMC9768194/supplementaryfiles/Table1.XLSX Hsapiens 5 44621 44626 44628 44818 44806"
## [37] "PMC9768194 PMC_DL/PMC9768194/supplementaryfiles/Table1.XLSX Hsapiens 11 44896 44621 44622 44621 44627 44628 44815 44806 44807 44808 44812"
## [38] "PMC9768194 PMC_DL/PMC9768194/supplementaryfiles/Table1.XLSX Hsapiens 8 44621 44621 44630 44624 44625 44628 44806 44812"
## [39] "PMC9768194 PMC_DL/PMC9768194/supplementaryfiles/Table1.XLSX Hsapiens 18 44896 44621 44630 44624 44625 44626 44627 44628 44629 44814 44815 44816 44807 44808 44810 44811 44812 44813"
## [40] "PMC9768194 PMC_DL/PMC9768194/supplementaryfiles/Table1.XLSX Hsapiens 13 44896 44622 44623 44625 44629 44814 44815 44818 44806 44807 44811 44812 44813"
## [41] "PMC9768194 PMC_DL/PMC9768194/supplementaryfiles/Table1.XLSX Hsapiens 17 44621 44621 44623 44624 44626 44627 44628 44819 44814 44815 44806 44807 44808 44810 44811 44812 44813"
## [42] "PMC9768194 PMC_DL/PMC9768194/supplementaryfiles/Table1.XLSX Hsapiens 16 44621 44621 44624 44625 44627 44628 44629 44815 44816 44818 44807 44809 44810 44811 44812 44813"
## [43] "PMC9768194 PMC_DL/PMC9768194/supplementaryfiles/Table1.XLSX Hsapiens 17 44621 44623 44624 44625 44626 44627 44628 44819 44815 44818 44806 44807 44809 44810 44811 44812 44813"
## [44] "PMC9768194 PMC_DL/PMC9768194/supplementaryfiles/Table1.XLSX Hsapiens 15 44896 44621 44621 44622 44623 44624 44626 44627 44628 44819 44815 44818 44810 44811 44812"
## [45] "PMC9768194 PMC_DL/PMC9768194/supplementaryfiles/Table1.XLSX Hsapiens 19 44621 44630 44623 44624 44625 44626 44627 44628 44629 44819 44814 44815 44818 44806 44807 44810 44811 44812 44813"
## [46] "PMC9763387 PMC_DL/PMC9763387/supplementaryfiles/41387_2022_228_MOESM1_ESM.xlsx Hsapiens 5 44625 44624 44813 44621 44813"
## [47] "PMC9763387 PMC_DL/PMC9763387/supplementaryfiles/41387_2022_228_MOESM1_ESM.xlsx Hsapiens 13 44896 44621 44626 44624 44814 44625 44806 44810 44627 44630 44814 44629 44626"
## [48] "PMC9763387 PMC_DL/PMC9763387/supplementaryfiles/41387_2022_228_MOESM1_ESM.xlsx Hsapiens 1 44806"
## [49] "PMC9763382 PMC_DL/PMC9763382/supplementaryfiles/mmc3.xlsx Hsapiens 52 44896 44896 44896 44896 44630 44630 44630 44630 44621 44631 44631 44631 44631 44621 44621 44621 44622 44622 44622 44622 44623 44623 44623 44623 44624 44624 44624 44624 44625 44625 44625 44625 44626 44626 44626 44626 44627 44627 44627 44627 44628 44628 44628 44628 44629 44629 44629 44629 44819 44819 44819 44819"
## [50] "PMC9763382 PMC_DL/PMC9763382/supplementaryfiles/mmc3.xlsx Hsapiens 13 44622 44625 44629 44626 44819 44631 44628 44630 44627 44623 44624 44621 44896"
## [51] "PMC9763382 PMC_DL/PMC9763382/supplementaryfiles/mmc3.xlsx Hsapiens 50 44896 44896 44896 44896 44630 44630 44630 44630 44621 44631 44631 44631 44631 44621 44621 44621 44622 44622 44622 44622 44623 44623 44623 44623 44624 44624 44624 44624 44625 44625 44625 44625 44626 44626 44626 44627 44627 44627 44628 44628 44628 44628 44629 44629 44629 44629 44819 44819 44819 44819"
## [52] "PMC9763382 PMC_DL/PMC9763382/supplementaryfiles/mmc3.xlsx Hsapiens 13 44629 44625 44622 44630 44626 44621 44819 44627 44623 44624 44631 44628 44896"
## [53] "PMC9762592 PMC_DL/PMC9762592/supplementaryfiles/pgen.1010080.s011.xls Dmelanogaster 5 44441 44531 44440 44443 44444"
## [54] "PMC9762592 PMC_DL/PMC9762592/supplementaryfiles/pgen.1010080.s013.xls Dmelanogaster 1 44166"
## [55] "PMC9762028 PMC_DL/PMC9762028/supplementaryfiles/40779_2022_432_MOESM1_ESM.xlsx Hsapiens 77 44622 44812 44805 44626 44625 44622 44628 44628 44805 44814 44621 44809 44807 44628 44814 44810 44806 44628 44815 44621 44626 44808 44815 44621 44812 44806 44622 44809 44811 44630 44623 44813 44628 44813 44814 44814 44813 44814 44627 44811 44811 44813 44811 44815 44625 44624 44813 44815 44622 44806 44627 44811 44626 44808 44806 44813 44813 44806 44806 44815 44813 44812 44806 44810 44806 44811 44806 44812 44813 44813 44810 44813 44629 44815 44806 44629 44623"
## [56] "PMC9762028 PMC_DL/PMC9762028/supplementaryfiles/40779_2022_432_MOESM1_ESM.xlsx Hsapiens 70 44626 44628 44808 44626 44623 44807 44813 44813 44808 44809 44815 44622 44622 44629 44812 44806 44805 44622 44815 44813 44813 44626 44808 44806 44813 44806 44813 44625 44813 44629 44622 44622 44628 44809 44813 44813 44814 44813 44810 44806 44806 44809 44806 44815 44623 44626 44814 44621 44806 44627 44813 44627 44807 44621 44809 44815 44806 44621 44811 44806 44806 44815 44806 44811 44815 44627 44811 44806 44621 44811"
## [57] "PMC9762028 PMC_DL/PMC9762028/supplementaryfiles/40779_2022_432_MOESM1_ESM.xlsx Hsapiens 43 44622 44811 44621 44810 44622 44626 44815 44813 44813 44623 44814 44623 44623 44806 44621 44805 44623 44814 44626 44812 44624 44806 44811 44627 44811 44815 44809 44813 44806 44813 44810 44808 44809 44815 44814 44627 44814 44806 44808 44813 44808 44812 44808"
## [58] "PMC9759858 PMC_DL/PMC9759858/supplementaryfiles/13148_2022_1401_MOESM2_ESM.xlsx Hsapiens 2 40603 39142"
## [59] "PMC9758924 PMC_DL/PMC9758924/supplementaryfiles/12967_2022_3820_MOESM2_ESM.xlsx Hsapiens 13 43526 43719 43716 43715 43710 43529 43712 43525 43718 43717 43713 43714 43709"
## [60] "PMC9758924 PMC_DL/PMC9758924/supplementaryfiles/12967_2022_3820_MOESM2_ESM.xlsx Hsapiens 13 43712 43710 43715 43715 43715 43715 43716 43713 43718 43717 43717 43717 43717"
## [61] "PMC9758924 PMC_DL/PMC9758924/supplementaryfiles/12967_2022_3820_MOESM2_ESM.xlsx Hsapiens 1 42256"
## [62] "PMC9758924 PMC_DL/PMC9758924/supplementaryfiles/12967_2022_3820_MOESM2_ESM.xlsx Hsapiens 1 44166"
## [63] "PMC9758924 PMC_DL/PMC9758924/supplementaryfiles/12967_2022_3820_MOESM2_ESM.xlsx Hsapiens 3 44531 44447 44447"
## [64] "PMC9755029 PMC_DL/PMC9755029/supplementaryfiles/mmc2.xlsx Mmusculus 2 39326 39508"
## [65] "PMC9755029 PMC_DL/PMC9755029/supplementaryfiles/mmc2.xlsx Mmusculus 1 40787"
## [66] "PMC9755029 PMC_DL/PMC9755029/supplementaryfiles/mmc2.xlsx Mmusculus 1 36951"
## [67] "PMC9755029 PMC_DL/PMC9755029/supplementaryfiles/mmc2.xlsx Mmusculus 2 36951 38231"
## [68] "PMC9755029 PMC_DL/PMC9755029/supplementaryfiles/mmc2.xlsx Mmusculus 2 36951 38231"
## [69] "PMC9755029 PMC_DL/PMC9755029/supplementaryfiles/mmc2.xlsx Mmusculus 2 36951 37865"
## [70] "PMC9755029 PMC_DL/PMC9755029/supplementaryfiles/mmc2.xlsx Mmusculus 1 36951"
## [71] "PMC9755029 PMC_DL/PMC9755029/supplementaryfiles/mmc2.xlsx Mmusculus 1 38231"
## [72] "PMC9755029 PMC_DL/PMC9755029/supplementaryfiles/mmc2.xlsx Mmusculus 1 37865"
## [73] "PMC9755029 PMC_DL/PMC9755029/supplementaryfiles/mmc2.xlsx Mmusculus 20 39326 39508 40787 37316 38231 38047 36951 39873 37316 40603 38777 40057 40422 37865 39692 37135 39142 37500 38961 38412"
## [74] "PMC9755029 PMC_DL/PMC9755029/supplementaryfiles/mmc2.xlsx Mmusculus 3 36951 38231 37865"
## [75] "PMC9755029 PMC_DL/PMC9755029/supplementaryfiles/mmc2.xlsx Mmusculus 7 38231 38231 38231 38231 38231 38231 38231"
## [76] "PMC9755029 PMC_DL/PMC9755029/supplementaryfiles/mmc2.xlsx Mmusculus 2 38231 38231"
## [77] "PMC9755029 PMC_DL/PMC9755029/supplementaryfiles/mmc2.xlsx Mmusculus 1 39326"
## [78] "PMC9755029 PMC_DL/PMC9755029/supplementaryfiles/mmc2.xlsx Mmusculus 2 37316 39326"
## [79] "PMC9755029 PMC_DL/PMC9755029/supplementaryfiles/mmc2.xlsx Mmusculus 4 39326 39508 40787 37316"
## [80] "PMC9750130 zip/Mouterde_SupplementaryTablesS1toS25_GBE2022_v3.xlsx Ggallus 498 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 
37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865"
## [81] "PMC9748153 zip/Raw_data_and_code/GSE7305.xlsx Hsapiens 23 44806 44810 44627 44813 44621 44811 44815 44626 44812 44622 44808 44814 44623 44625 44896 44628 44807 44629 44805 44624 44630 44816 44631"
## [82] "PMC9748153 zip/Raw_data_and_code/Figure4/GSE7305_differentially_expressed_genes_profile.xlsx Hsapiens 3 44811 44622 44808"
## [83] "PMC9746999 PMC_DL/PMC9746999/supplementaryfiles/pbio.3000221.s009.xlsx Hsapiens 2 38047 40057"
## [84] "PMC9746999 PMC_DL/PMC9746999/supplementaryfiles/pbio.3000221.s009.xlsx Ggallus 3 40422 38047 37226"
## [85] "PMC9746999 PMC_DL/PMC9746999/supplementaryfiles/pbio.3000221.s009.xlsx Hsapiens 1 37226"
## [86] "PMC9746999 PMC_DL/PMC9746999/supplementaryfiles/pbio.3000221.s009.xlsx Hsapiens 3 38047 40057 40787"
## [87] "PMC9746999 PMC_DL/PMC9746999/supplementaryfiles/pbio.3000221.s009.xlsx Hsapiens 4 40057 38047 40787 37226"
## [88] "PMC9746999 PMC_DL/PMC9746999/supplementaryfiles/pbio.3000221.s009.xlsx Hsapiens 3 40057 38047 40787"
## [89] "PMC9746999 PMC_DL/PMC9746999/supplementaryfiles/pbio.3000221.s009.xlsx Ggallus 3 40057 38047 40787"
## [90] "PMC9746999 PMC_DL/PMC9746999/supplementaryfiles/pbio.3000221.s008.xlsx Hsapiens 5 40238 40057 38047 40787 36951"
## [91] "PMC9746999 PMC_DL/PMC9746999/supplementaryfiles/pbio.3000221.s008.xlsx Ggallus 3 40057 38047 40787"
## [92] "PMC9746999 PMC_DL/PMC9746999/supplementaryfiles/pbio.3000221.s008.xlsx Hsapiens 2 40057 38047"
## [93] "PMC9746999 PMC_DL/PMC9746999/supplementaryfiles/pbio.3000221.s022.xlsx Ggallus 1 40787"
## [94] "PMC9746999 PMC_DL/PMC9746999/supplementaryfiles/pbio.3000221.s022.xlsx Hsapiens 7 36951 40787 38047 40787 36951 40238 40787"
## [95] "PMC9746999 PMC_DL/PMC9746999/supplementaryfiles/pbio.3000221.s018.xlsx Hsapiens 13 40238 40057 40057 40057 40057 40057 38047 38047 40787 40787 40787 40787 36951"
## [96] "PMC9746999 PMC_DL/PMC9746999/supplementaryfiles/pbio.3000221.s007.xlsx Ggallus 13 40238 40057 40057 40057 40057 40057 38047 38047 40787 40787 40787 40787 36951"
## [97] "PMC9742853 PMC_DL/PMC9742853/supplementaryfiles/mmc2.xlsx Hsapiens 1 39142"
## [98] "PMC9734637 PMC_DL/PMC9734637/supplementaryfiles/239_2022_10079_MOESM1_ESM.xlsx Scerevisiae 1 42879"
## [99] "PMC9733279 PMC_DL/PMC9733279/supplementaryfiles/12915_2022_1479_MOESM12_ESM.xlsx Dmelanogaster 6 17608 17403 17572 17568 17903 17395"
## [100] "PMC9733279 PMC_DL/PMC9733279/supplementaryfiles/12915_2022_1479_MOESM5_ESM.xlsx Dmelanogaster 2 37226 44896"
## [101] "PMC9733279 PMC_DL/PMC9733279/supplementaryfiles/12915_2022_1479_MOESM5_ESM.xlsx Dmelanogaster 1 37226"
## [102] "PMC9729295 PMC_DL/PMC9729295/supplementaryfiles/41467_2022_35065_MOESM6_ESM.xlsx Hsapiens 1 37500"
## [103] "PMC9729295 PMC_DL/PMC9729295/supplementaryfiles/41467_2022_35065_MOESM6_ESM.xlsx Hsapiens 2 40787 37500"
## [104] "PMC9729111 PMC_DL/PMC9729111/supplementaryfiles/41588_2022_1233_MOESM4_ESM.xlsx Hsapiens 22 37681 39873 38596 40238 40603 40057 39142 36951 42248 41153 40422 36951 39508 38777 38231 41883 39692 40787 37500 37135 37316 38047"
## [105] "PMC9728798 zip/Supplementary_Data_6.xlsx Ggallus 1 43892"
## [106] "PMC9728798 zip/Supplementary_Data_6.xlsx Hsapiens 1 43891"
## [107] "PMC9728798 zip/Supplementary_Data_6.xlsx Hsapiens 1 43892"
## [108] "PMC9728798 zip/Supplementary_Data_6.xlsx Hsapiens 1 43891"
## [109] "PMC9728798 zip/Supplementary_Data_6.xlsx Hsapiens 1 43891"
## [110] "PMC9728798 zip/Supplementary_Data_6.xlsx Ggallus 2 43896 43891"
## [111] "PMC9728798 zip/Supplementary_Data_6.xlsx Hsapiens 2 43897 43898"
## [112] "PMC9727928 PMC_DL/PMC9727928/supplementaryfiles/EMMM-14-e15200-s004.xlsx Hsapiens 3 38596 38961 40057"
## [113] "PMC9727928 PMC_DL/PMC9727928/supplementaryfiles/EMMM-14-e15200-s011.xlsx Hsapiens 7 44806 44810 44811 44812 44814 44813 44819"
## [114] "PMC9724437 PMC_DL/PMC9724437/supplementaryfiles/13059_2022_2821_MOESM2_ESM.xlsx Hsapiens 28 44257 44442 44443 44257 44446 44445 44262 44450 44264 44451 44259 44256 44261 44453 44447 44263 44441 44531 44265 44258 44440 44266 44448 44444 44256 44449 44260 44440"
## [115] "PMC9721428 PMC_DL/PMC9721428/supplementaryfiles/mmc11.xlsx Hsapiens 1 41153"
## [116] "PMC9721428 PMC_DL/PMC9721428/supplementaryfiles/mmc12.xlsx Hsapiens 2 38777 39326"
## [117] "PMC9721428 PMC_DL/PMC9721428/supplementaryfiles/mmc10.xlsx Hsapiens 2 39142 39692"
## [118] "PMC9720402 PMC_DL/PMC9720402/supplementaryfiles/Table2.xlsx Hsapiens 1 44628"
## [119] "PMC9720150 PMC_DL/PMC9720150/supplementaryfiles/Table_1.xlsx Hsapiens 1 44256"
## [120] "PMC9683724 zip/sciadv.abq3806_table_s2.xlsx Mmusculus 27 44624 44806 44621 44622 44627 44819 44815 44818 44628 44805 44621 44811 44814 44629 44812 44808 44630 44813 44631 44626 44807 44816 44809 44622 44623 44625 44810"
## [121] "PMC9718428 PMC_DL/PMC9718428/supplementaryfiles/Table3.XLSX Hsapiens 13 44622 44623 44626 44627 44819 44814 44815 44806 44808 44810 44811 44812 44813"
## [122] "PMC9718428 PMC_DL/PMC9718428/supplementaryfiles/Table4.XLSX Hsapiens 19 44896 44621 44622 44621 44622 44623 44625 44626 44627 44628 44819 44814 44815 44806 44808 44810 44811 44812 44813"
## [123] "PMC9715950 PMC_DL/PMC9715950/supplementaryfiles/41598_2022_23140_MOESM1_ESM.xlsx Hsapiens 26 44266 44265 44449 44531 44454 44445 44441 44257 44448 44443 44263 44260 44259 44262 44261 44256 44258 44264 44453 44450 44451 44447 44444 44446 44440 44442"
## [124] "PMC9715678 PMC_DL/PMC9715678/supplementaryfiles/42003_2022_4269_MOESM3_ESM.xlsx Mmusculus 14 38961 39326 40057 37316 37500 37135 39692 41153 38231 37865 41883 40787 38596 40422"
## [125] "PMC9715678 PMC_DL/PMC9715678/supplementaryfiles/42003_2022_4269_MOESM3_ESM.xlsx Mmusculus 14 38231 38596 40057 39692 40787 39326 37316 40422 37135 38961 41883 41153 37500 37865"
## [126] "PMC9715678 PMC_DL/PMC9715678/supplementaryfiles/42003_2022_4269_MOESM3_ESM.xlsx Mmusculus 14 40057 39326 39692 37316 37500 38231 38961 37135 37865 38596 41883 40422 40787 41153"
## [127] "PMC9715678 PMC_DL/PMC9715678/supplementaryfiles/42003_2022_4269_MOESM3_ESM.xlsx Mmusculus 14 37865 38231 40422 39326 39692 41153 41883 38961 40057 37135 37500 40787 37316 38596"
## [128] "PMC9715678 PMC_DL/PMC9715678/supplementaryfiles/42003_2022_4269_MOESM3_ESM.xlsx Mmusculus 14 39326 40057 38961 37316 37865 38231 38596 39692 40422 40787 37135 37500 41153 41883"
## [129] "PMC9715678 PMC_DL/PMC9715678/supplementaryfiles/42003_2022_4269_MOESM3_ESM.xlsx Mmusculus 8 40057 39326 37865 39692 38231 40787 40422 38596"
## [130] "PMC9713440 PMC_DL/PMC9713440/supplementaryfiles/jkac276_supplementary_data.xlsx Drerio 1 44805"
## [131] "PMC9714951 PMC_DL/PMC9714951/supplementaryfiles/pone.0278108.s006.xlsx Hsapiens 18 40238 41344 41522 41334 41334 41522 41334 41334 41522 41337 41344 41337 41342 41526 41530 41336 41336 41336"
## [132] "PMC9714951 PMC_DL/PMC9714951/supplementaryfiles/pone.0278108.s004.xlsx Hsapiens 3 41344 41344 40238"
## [133] "PMC9714951 PMC_DL/PMC9714951/supplementaryfiles/pone.0278108.s003.xlsx Hsapiens 4 41526 41336 41336 41336"
## [134] "PMC9714951 PMC_DL/PMC9714951/supplementaryfiles/pone.0278108.s002.xlsx Hsapiens 6 41336 41344 41344 40238 41336 41344"
## [135] "PMC9713698 PMC_DL/PMC9713698/supplementaryfiles/Table2.xlsx Hsapiens 1 44626"
## [136] "PMC9713327 PMC_DL/PMC9713327/supplementaryfiles/mmc4.xlsx Scerevisiae 1 44470"
## [137] "PMC9713173 PMC_DL/PMC9713173/supplementaryfiles/41586_2022_5448_MOESM5_ESM.xlsx Hsapiens 1 44627"
## [138] "PMC9708086 PMC_DL/PMC9708086/supplementaryfiles/elife-79676-supp1.xlsx Hsapiens 1 40057"
## [139] "PMC9706722 PMC_DL/PMC9706722/supplementaryfiles/DataSheet_1.xlsx Hsapiens 3 5-Sep 9-Sep 6-Sep"
## [140] "PMC9703941 PMC_DL/PMC9703941/supplementaryfiles/supplementary_table_14_ddac158.xlsx Hsapiens 98 2001-03-01 2003-03-01 2007-03-01 2009-03-01 2001-09-01 2010-09-01 2011-09-01 2003-09-01 2004-09-01 2006-09-01 2007-09-01 2008-09-01 2001-03-01 2003-03-01 2005-03-01 2007-03-01 2008-03-01 2011-09-01 2004-09-01 2007-09-01 2009-09-01 2001-03-01 2001-03-01 2003-03-01 2005-03-01 2006-03-01 2007-03-01 2008-03-01 2009-03-01 2010-09-01 2011-09-01 2002-09-01 2003-09-01 2004-09-01 2006-09-01 2007-09-01 2009-09-01 2006-03-01 2007-03-01 2002-09-01 2003-09-01 2006-09-01 2007-09-01 2003-03-01 2006-03-01 2009-03-01 2010-09-01 2011-09-01 2006-09-01 2007-09-01 2009-09-01 2005-03-01 2006-03-01 2007-03-01 2009-03-01 2011-09-01 2002-09-01 2007-09-01 2004-03-01 2006-03-01 2007-03-01 2009-03-01 2001-09-01 2004-09-01 2007-09-01 2001-03-01 2002-03-01 2003-03-01 2005-03-01 2007-03-01 2008-03-01 2003-09-01 2004-09-01 2006-09-01 2002-03-01 2007-03-01 2003-09-01 2007-09-01 2003-03-01 2007-03-01 2011-09-01 2006-09-01 2007-09-01 2003-03-01 2007-03-01 2009-03-01 2011-09-01 2006-09-01 2007-09-01 2008-09-01 2009-09-01 2001-03-01 2005-03-01 2011-09-01 2003-09-01 2006-09-01 2007-09-01 2009-09-01"
## [141] "PMC9800021 PMC_DL/PMC9800021/supplementaryfiles/mmc4.xlsx Ggallus 132 43892 43891 43898 43895 43895 43895 43899 43899 43899 43899 43891 44075 44075 44086 43892 43892 44084 44084 44084 43897 43897 43897 44076 44076 43891 43891 44085 44085 44085 44085 43896 43896 43893 43893 44082 44082 44082 44082 43901 43901 44088 44166 44080 44080 44080 43892 43891 43891 43891 44088 44088 43898 43895 44075 44075 44075 44075 44075 44075 44086 43892 44084 43897 44076 43891 44085 44085 44085 43896 43893 43893 43901 44081 44088 44166 44166 44080 43892 44166 43901 44088 44166 43891 43898 43893 43892 43891 43896 44166 44166 43898 43891 44081 43891 43891 44086 43898 44080 43892 43892 44080 43892 43891 44081 44081 44081 43901 44085 43891 44076 44166 44080 43891 44080 43891 44080 44082 43891 44166 43901 43891 44088 44084 43891 43893 44088 44085 43901 43893 43901 44166 44080"
## [142] "PMC9800021 PMC_DL/PMC9800021/supplementaryfiles/mmc8.xlsx Hsapiens 27 44257 44442 44443 44257 44446 44445 44262 44450 44264 44451 44259 44256 44261 44453 44447 44263 44441 44531 44265 44258 44440 44266 44448 44444 44256 44449 44260"
## [143] "PMC9800021 PMC_DL/PMC9800021/supplementaryfiles/mmc8.xlsx Hsapiens 27 44257 44442 44443 44257 44446 44445 44262 44450 44264 44451 44259 44256 44261 44453 44447 44263 44441 44531 44265 44258 44440 44266 44448 44444 44256 44449 44260"
## [144] "PMC9795334 PMC_DL/PMC9795334/supplementaryfiles/mmc3.xlsx Hsapiens 1 42192"
## [145] "PMC9795334 PMC_DL/PMC9795334/supplementaryfiles/mmc3.xlsx Hsapiens 1 42192"
## [146] "PMC9795334 PMC_DL/PMC9795334/supplementaryfiles/mmc3.xlsx Hsapiens 1 42258"
## [147] "PMC9795334 PMC_DL/PMC9795334/supplementaryfiles/mmc3.xlsx Mmusculus 1 44449"
## [148] "PMC9795334 PMC_DL/PMC9795334/supplementaryfiles/mmc3.xlsx Mmusculus 1 44256"
## [149] "PMC9795334 PMC_DL/PMC9795334/supplementaryfiles/mmc3.xlsx Mmusculus 1 44447"
## [150] "PMC9792466 PMC_DL/PMC9792466/supplementaryfiles/41419_2022_5519_MOESM4_ESM.xlsx Mmusculus 22 42796 42795 42796 42797 42798 42799 42800 42801 42802 42803 42993 42979 42988 42989 42980 42981 42982 42983 42984 42985 42986 42987"
## [151] "PMC9791056 PMC_DL/PMC9791056/supplementaryfiles/Table_1.xlsx Athaliana 1 44837"
## [152] "PMC9791056 PMC_DL/PMC9791056/supplementaryfiles/Table_1.xlsx Athaliana 1 44805"
## [153] "PMC9776514 zip/Supplementary_Table_S5.xlsx Hsapiens 2 40057 40057"
## [154] "PMC9776514 zip/Supplementary_Table_S6.xlsx Hsapiens 1 40057"
## [155] "PMC9776514 zip/Supplementary_Table_S4.xlsx Hsapiens 2 40057 40057"
## [156] "PMC9776006 zip/Supplementary_file_3.xlsx Hsapiens 19 44621 44622 44630 44622 44623 44625 44626 44627 44628 44805 44815 44816 44806 44807 44808 44809 44810 44811 44813"
## [157] "PMC9776006 zip/Supplementary_file_3.xlsx Hsapiens 19 44621 44622 44630 44622 44623 44625 44626 44627 44628 44805 44815 44816 44806 44807 44808 44809 44810 44811 44813"
## [158] "PMC9776006 zip/Supplementary_file_3.xlsx Hsapiens 1 44622"
## [159] "PMC9776006 zip/Supplementary_file_3.xlsx Hsapiens 1 44622"
## [160] "PMC9776006 zip/Supplementary_file_2.xlsx Hsapiens 24 44626 44813 44623 44816 44814 44630 44805 44628 44810 44818 44806 44896 44622 44809 44808 44625 44622 44621 44815 44807 44624 44621 44629 44811"
## [161] "PMC9775906 zip/supplementary_Table_S2.xlsx Hsapiens 3 44621 44622 44896"
## [162] "PMC9775105 zip/Table_S1.xlsx Hsapiens 4 39692 37316 38961 38231"
## [163] "PMC9775105 zip/Table_S3.xlsx Hsapiens 3 44627 44814 44806"
## [164] "PMC9774719 zip/Table_S5_Liver_novel_annotated.xlsx Rnorvegicus 1 43892"
## [165] "PMC9774719 zip/Table_S5_Liver_novel_annotated.xlsx Rnorvegicus 1 43892"
## [166] "PMC9771818 PMC_DL/PMC9771818/supplementaryfiles/41586_2022_5477_MOESM3_ESM.xlsx Ggallus 4 44624 44621 44624 44621"
## [167] "PMC9768914 PMC_DL/PMC9768914/supplementaryfiles/12929_2022_892_MOESM1_ESM.xlsx Mmusculus 2 38231 39326"
## [168] "PMC9764863 zip/Supplementary_Data_7.xlsx Hsapiens 3 44622 44809 44627"
## [169] "PMC9763853 PMC_DL/PMC9763853/supplementaryfiles/mmc6.xlsx Hsapiens 3 44448 44444 44445"
## [170] "PMC9763118 PMC_DL/PMC9763118/supplementaryfiles/41380_2022_1779_MOESM7_ESM.xlsx Hsapiens 1 40787"
## [171] "PMC9763118 PMC_DL/PMC9763118/supplementaryfiles/41380_2022_1779_MOESM7_ESM.xlsx Hsapiens 2 40422 38231"
## [172] "PMC9763118 PMC_DL/PMC9763118/supplementaryfiles/41380_2022_1779_MOESM7_ESM.xlsx Hsapiens 1 40787"
## [173] "PMC9763118 PMC_DL/PMC9763118/supplementaryfiles/41380_2022_1779_MOESM7_ESM.xlsx Hsapiens 1 38231"
## [174] "PMC9763110 PMC_DL/PMC9763110/supplementaryfiles/41380_2022_1776_MOESM7_ESM.xlsx Hsapiens 3 44258 44256 44450"
## [175] "PMC9763110 PMC_DL/PMC9763110/supplementaryfiles/41380_2022_1776_MOESM7_ESM.xlsx Hsapiens 20 44896 44628 44625 44631 44805 44630 44816 44629 44818 44814 44815 44623 44621 44624 44811 44807 44810 44813 44627 44622"
## [176] "PMC9763110 PMC_DL/PMC9763110/supplementaryfiles/41380_2022_1776_MOESM7_ESM.xlsx Hsapiens 12 44815 44623 44621 44624 44627 44813 44628 44810 44629 44818 44896 44814"
## [177] "PMC9763110 PMC_DL/PMC9763110/supplementaryfiles/41380_2022_1776_MOESM7_ESM.xlsx Hsapiens 17 44625 44628 44631 44805 44630 44816 44818 44624 44815 44623 44811 44621 44807 44810 44813 44622 44896"
## [178] "PMC9762029 PMC_DL/PMC9762029/supplementaryfiles/13059_2022_2834_MOESM11_ESM.xlsx Athaliana 2 44805 44840"
## [179] "PMC9762029 PMC_DL/PMC9762029/supplementaryfiles/13059_2022_2834_MOESM11_ESM.xlsx Athaliana 1 44775"
## [180] "PMC9762029 PMC_DL/PMC9762029/supplementaryfiles/13059_2022_2834_MOESM7_ESM.xlsx Athaliana 1 44653"
## [181] "PMC9762029 PMC_DL/PMC9762029/supplementaryfiles/13059_2022_2834_MOESM7_ESM.xlsx Athaliana 1 44806"
## [182] "PMC9762029 PMC_DL/PMC9762029/supplementaryfiles/13059_2022_2834_MOESM10_ESM.xlsx Athaliana 304 44653 44652 44777 44841 44775 44653 44805 44652 44838 44841 44777 44653 44805 44775 44774 44777 44838 44779 44653 44652 44805 44841 44840 44777 44838 44653 44652 44806 44840 44774 44777 44838 44653 44652 44777 44841 44775 44653 44654 44654 44805 44775 44652 44806 44780 44777 44653 44652 44775 44838 44805 44781 44781 44654 44841 44806 44780 44836 44839 44839 44805 44835 44774 44654 44781 44779 44775 44777 44838 44840 44840 44840 44776 44778 44779 44839 44835 44839 44778 44781 44806 44780 44805 44781 44781 44654 44841 44806 44780 44836 44839 44839 44805 44835 44775 44774 44654 44781 44779 44775 44777 44840 44840 44840 44776 44778 44779 44839 44835 44839 44778 44781 44806 44780 44805 44781 44781 44654 44841 44806 44780 44836 44839 44839 44805 44835 44841 44654 44781 44779 44775 44777 44840 44840 44840 44776 44778 44839 44835 44839 44778 44652 44781 44806 44780 44805 44781 44781 44654 44841 44806 44780 44836 44839 44839 44805 44835 44775 44774 44654 44781 44779 44775 44777 44840 44840 44776 44778 44779 44839 44835 44839 44778 44781 44806 44780 44805 44805 44781 44781 44654 44841 44806 44780 44836 44839 44839 44805 44835 44775 44841 44654 44781 44779 44775 44777 44840 44840 44776 44778 44779 44839 44835 44839 44778 44781 44780 44805 44805 44781 44781 44654 44841 44806 44780 44836 44839 44839 44805 44835 44774 44654 44781 44779 44775 44777 44838 44840 44840 44840 44776 44778 44779 44839 44835 44839 44778 44781 44806 44780 44805 44781 44781 44841 44806 44780 44836 44839 44839 44805 44835 44774 44841 44781 44779 44775 44777 44838 44840 44840 44840 44776 44778 44779 44839 44835 44839 44778 44781 44805 44805 44781 44781 44654 44841 44806 44780 44836 44839 44839 44805 44835 44774 44841 44654 44781 44779 44775 44777 44777 44840 44840 44840 44776 44778 44779 44839 44835 44839 44778 44781 44806 44780 44805"
## [183] "PMC9758519 PMC_DL/PMC9758519/supplementaryfiles/mmc3.xls Hsapiens 2 38961 39873"
## [184] "PMC9753033 PMC_DL/PMC9753033/supplementaryfiles/41598_2022_26135_MOESM2_ESM.xlsx Hsapiens 3 44621 44622 44896"
## [185] "PMC9749333 PMC_DL/PMC9749333/supplementaryfiles/12967_2022_3777_MOESM2_ESM.xls Hsapiens 1 44624"
## [186] "PMC9748020 PMC_DL/PMC9748020/supplementaryfiles/41598_2022_26011_MOESM3_ESM.xlsx Hsapiens 1 37226"
## [187] "PMC9748018 PMC_DL/PMC9748018/supplementaryfiles/41419_2022_5483_MOESM3_ESM.xlsx Mmusculus 25 37316 37500 37500 40787 40787 40787 41883 37135 39326 39326 39326 40422 39692 38231 38231 40057 40057 40057 40057 37865 37865 37865 41153 38596 38961"
## [188] "PMC9748018 PMC_DL/PMC9748018/supplementaryfiles/41419_2022_5483_MOESM2_ESM.xlsx Mmusculus 2 38961 37865"
## [189] "PMC9748018 PMC_DL/PMC9748018/supplementaryfiles/41419_2022_5483_MOESM2_ESM.xlsx Mmusculus 2 38231 38231"
## [190] "PMC9746894 zip/Table_1.XLSX Hsapiens 27 44809 44628 44896 44631 44630 44623 44626 44815 44810 44805 44819 44624 44622 44808 44621 44806 44814 44818 44813 44816 44621 44622 44627 44625 44812 44807 44629"
## [191] "PMC9746894 zip/Table_1.XLSX Hsapiens 27 44809 44628 44896 44631 44630 44623 44626 44815 44810 44805 44819 44624 44622 44808 44621 44806 44814 44818 44813 44816 44621 44622 44627 44625 44812 44807 44629"
## [192] "PMC9744761 PMC_DL/PMC9744761/supplementaryfiles/DataSheet3.xls Hsapiens 12 2022/12/01 2022/03/01 2022/03/10 2022/03/11 2022/03/02 2022/03/03 2022/03/04 2022/03/05 2022/03/06 2022/03/07 2022/03/08 2022/03/09"
## [193] "PMC9743592 PMC_DL/PMC9743592/supplementaryfiles/13148_2022_1386_MOESM3_ESM.xlsx Hsapiens 4 37469 37469 37469 37834"
## [194] "PMC9743561 PMC_DL/PMC9743561/supplementaryfiles/12887_2022_3764_MOESM1_ESM.xlsx Hsapiens 2 43714 43525"
## [195] "PMC9738480 zip/Supplementary_Table_S2.xlsx Ggallus 1 44447"
## [196] "PMC9738480 zip/Supplementary_Table_S11.xlsx Hsapiens 2 44447 44447"
## [197] "PMC9738480 zip/Supplementary_Table_S1.xlsx Hsapiens 76 43897 43901 43896 43900 44078 44078 44078 44078 44078 44078 44083 44083 44083 44083 44083 44083 44083 43897 43897 43897 43897 43891 43892 44084 44084 43892 44082 44082 44082 44082 44082 44082 44082 44082 44082 43893 44085 44085 44085 44085 44085 44085 44085 44085 44077 44077 44077 43899 44081 44081 44081 44081 44079 44080 44080 44080 44080 43898 43898 44086 44086 44076 44078 44078 43897 43897 43897 43895 44088 44075 44078 43893 43891 44081 43894 43897"
## [198] "PMC9738480 zip/Supplementary_Table_S1.xlsx Hsapiens 76 44257 44447 44262 44447 44448 44262 44447 44262 44447 44256 44262 44265 44258 44443 44445 44445 44266 44448 44445 44446 44446 44443 44446 44263 44262 44448 44449 44443 44442 44450 44262 44447 44445 44442 44259 44447 44260 44449 44441 44448 44443 44447 44443 44447 44440 44448 44443 44450 44443 44447 44262 44256 44263 44453 44448 44258 44262 44262 44257 44444 44450 44442 44450 44450 44450 44448 44264 44451 44443 44443 44450 44450 44261 44451 44446 44446"
## [199] "PMC9736101 zip/Table_S2_Gene-based-GWAS.xlsx Hsapiens 2 44814 44631"
## [200] "PMC9736101 zip/Table_S2_Gene-based-GWAS.xlsx Hsapiens 1 44808"
## [201] "PMC9734139 zip/Supplementary_Data/Supplementary_Data_7.xlsx Drerio 2 44449 44449"
## [202] "PMC9734139 zip/Supplementary_Data/Supplementary_Data_5.xlsx Drerio 8 44626 44627 44622 44629 44814 44816 44625 44810"
## [203] "PMC9731154 PMC_DL/PMC9731154/supplementaryfiles/DataSheet_2.xlsx Hsapiens 1 44812"
## [204] "PMC9731154 PMC_DL/PMC9731154/supplementaryfiles/DataSheet_2.xlsx Hsapiens 2 44624 44808"
## [205] "PMC9731154 PMC_DL/PMC9731154/supplementaryfiles/DataSheet_2.xlsx Hsapiens 3 44440 44441 44450"
## [206] "PMC9731154 PMC_DL/PMC9731154/supplementaryfiles/DataSheet_2.xlsx Hsapiens 1 44264"
## [207] "PMC9731154 PMC_DL/PMC9731154/supplementaryfiles/DataSheet_2.xlsx Hsapiens 2 44447 44450"
## [208] "PMC9731154 PMC_DL/PMC9731154/supplementaryfiles/DataSheet_2.xlsx Hsapiens 3 44447 44449 44258"
## [209] "PMC9723631 zip/Supplementary_data_1_DEGs_after_drug_treatments.xlsx Hsapiens 1 44076"
## [210] "PMC9724785 PMC_DL/PMC9724785/supplementaryfiles/DataSheet1.xlsx Hsapiens 1 38412"
## [211] "PMC9722939 PMC_DL/PMC9722939/supplementaryfiles/41467_2022_34460_MOESM9_ESM.xlsx Hsapiens 15 44445 44260 44441 44443 44454 44447 44446 44450 44257 44444 44448 44449 44440 44442 44256"
## [212] "PMC9722939 PMC_DL/PMC9722939/supplementaryfiles/41467_2022_34460_MOESM6_ESM.xlsx Hsapiens 11 44441 44446 44448 44446 44447 44446 44448 44441 44450 44448 44448"
## [213] "PMC9722939 PMC_DL/PMC9722939/supplementaryfiles/41467_2022_34460_MOESM8_ESM.xlsx Hsapiens 15 44813 44805 44809 44806 44808 44814 44810 44811 44625 44815 44812 44622 44621 44807 44819"
## [214] "PMC9722939 PMC_DL/PMC9722939/supplementaryfiles/41467_2022_34460_MOESM5_ESM.xlsx Hsapiens 36 44264 44445 44453 44453 44440 44440 44440 44441 44441 44262 44453 44443 44447 44451 44451 44443 44450 44451 44261 44261 44451 44443 44443 44450 44440 44261 44261 44264 44443 44443 44443 44443 44441 44440 44440 44449"
## [215] "PMC9722939 PMC_DL/PMC9722939/supplementaryfiles/41467_2022_34460_MOESM5_ESM.xlsx Hsapiens 36 44264 44445 44453 44453 44440 44440 44440 44441 44441 44262 44453 44443 44447 44451 44451 44443 44450 44451 44261 44261 44451 44443 44443 44450 44440 44261 44261 44264 44443 44443 44443 44443 44441 44440 44440 44449"
## [216] "PMC9722939 PMC_DL/PMC9722939/supplementaryfiles/41467_2022_34460_MOESM5_ESM.xlsx Hsapiens 1 44816"
## [217] "PMC9722939 PMC_DL/PMC9722939/supplementaryfiles/41467_2022_34460_MOESM5_ESM.xlsx Hsapiens 28 44454 44257 44256 44449 44262 44259 44441 44450 44256 44261 44266 44258 44447 44446 44453 44531 44263 44260 44264 44451 44440 44443 44265 44448 44257 44444 44442 44445"
## [218] "PMC9722939 PMC_DL/PMC9722939/supplementaryfiles/41467_2022_34460_MOESM7_ESM.xlsx Hsapiens 15 44256 44440 44449 44450 44454 44257 44441 44442 44443 44260 44444 44445 44446 44447 44448"
## [219] "PMC9722939 PMC_DL/PMC9722939/supplementaryfiles/41467_2022_34460_MOESM7_ESM.xlsx Hsapiens 3 44622 44807 44621"
## [220] "PMC9722939 PMC_DL/PMC9722939/supplementaryfiles/41467_2022_34460_MOESM7_ESM.xlsx Hsapiens 15 44807 44621 44809 44622 44808 44811 44806 44813 44814 44812 44815 44810 44805 44819 44625"
## [221] "PMC9718667 PMC_DL/PMC9718667/supplementaryfiles/41380_2022_1709_MOESM3_ESM.xlsx Hsapiens 25 44450 44257 44446 44442 44258 44265 44447 44257 44451 44266 44260 44256 44440 44443 44259 44256 44262 44440 44531 44453 44454 44441 44263 44448 44261"
## [222] "PMC9718667 PMC_DL/PMC9718667/supplementaryfiles/41380_2022_1709_MOESM3_ESM.xlsx Hsapiens 26 44627 44814 44808 44629 44806 44631 44625 44815 44622 44812 44896 44628 44816 44622 44630 44805 44621 44813 44623 44811 44805 44624 44621 44807 44819 44818"
## [223] "PMC9718667 PMC_DL/PMC9718667/supplementaryfiles/41380_2022_1709_MOESM3_ESM.xlsx Hsapiens 24 44450 44447 44446 44266 44442 44258 44257 44265 44451 44256 44260 44262 44531 44257 44448 44263 44259 44256 44443 44453 44440 44441 44454 44440"
## [224] "PMC9718667 PMC_DL/PMC9718667/supplementaryfiles/41380_2022_1709_MOESM3_ESM.xlsx Hsapiens 25 44621 44815 44811 44622 44625 44816 44806 44631 44622 44805 44630 44628 44805 44623 44621 44807 44808 44818 44626 44624 44896 44627 44812 44819 44813"
## [225] "PMC9716074 PMC_DL/PMC9716074/supplementaryfiles/Table8.xlsx Dmelanogaster 1 44808"
## [226] "PMC9715725 PMC_DL/PMC9715725/supplementaryfiles/41422_2022_736_MOESM11_ESM.xlsx Hsapiens 11 44814 44627 44631 44819 44628 44622 44629 44806 44813 44623 44625"
## [227] "PMC9715725 PMC_DL/PMC9715725/supplementaryfiles/41422_2022_736_MOESM11_ESM.xlsx Hsapiens 1 44622"
## [228] "PMC9715725 PMC_DL/PMC9715725/supplementaryfiles/41422_2022_736_MOESM11_ESM.xlsx Hsapiens 1 44622"
## [229] "PMC9715725 PMC_DL/PMC9715725/supplementaryfiles/41422_2022_736_MOESM13_ESM.xlsx Hsapiens 1 44818"
## [230] "PMC9715725 PMC_DL/PMC9715725/supplementaryfiles/41422_2022_736_MOESM13_ESM.xlsx Hsapiens 1 44626"
## [231] "PMC9715725 PMC_DL/PMC9715725/supplementaryfiles/41422_2022_736_MOESM13_ESM.xlsx Hsapiens 6 44621 44622 44809 44808 44807 44625"
## [232] "PMC9715725 PMC_DL/PMC9715725/supplementaryfiles/41422_2022_736_MOESM10_ESM.xlsx Hsapiens 32 44621 44813 44624 44622 44621 44621 44624 44806 44806 44815 44815 44621 44628 44816 44813 44813 44809 44627 44810 44813 44806 44813 44809 44807 44626 44812 44809 44808 44810 44813 44623 44812"
## [233] "PMC9715725 PMC_DL/PMC9715725/supplementaryfiles/41422_2022_736_MOESM16_ESM.xlsx Hsapiens 6 44256 44453 44531 44265 44454 44266"
## [234] "PMC9715725 PMC_DL/PMC9715725/supplementaryfiles/41422_2022_736_MOESM16_ESM.xlsx Hsapiens 3 44443 44451 44440"
## [235] "PMC9715725 PMC_DL/PMC9715725/supplementaryfiles/41422_2022_736_MOESM16_ESM.xlsx Hsapiens 4 44442 44446 44450 44449"
## [236] "PMC9715725 PMC_DL/PMC9715725/supplementaryfiles/41422_2022_736_MOESM16_ESM.xlsx Hsapiens 7 44445 44262 44261 44441 44448 44444 44260"
## [237] "PMC9715725 PMC_DL/PMC9715725/supplementaryfiles/41422_2022_736_MOESM16_ESM.xlsx Hsapiens 2 44621 44807"
## [238] "PMC9715725 PMC_DL/PMC9715725/supplementaryfiles/41422_2022_736_MOESM12_ESM.xlsx Hsapiens 4 44808 44622 44896 44818"
## [239] "PMC9715725 PMC_DL/PMC9715725/supplementaryfiles/41422_2022_736_MOESM12_ESM.xlsx Hsapiens 4 44622 44808 44625 44626"
## [240] "PMC9714834 PMC_DL/PMC9714834/supplementaryfiles/pone.0278270.s001.xls Hsapiens 3 2023/09/09 2023/03/02 2023/09/09"
## [241] "PMC9714834 PMC_DL/PMC9714834/supplementaryfiles/pone.0278270.s001.xls Hsapiens 19 2023/09/09 2023/09/09 2023/03/01 2023/09/09 2023/09/09 2023/09/09 2023/09/09 2023/09/09 2023/09/09 2023/09/09 2023/09/09 2023/03/10 2023/09/08 2023/09/09 2023/09/09 2023/09/09 2023/09/09 2023/09/09 2023/09/09"
## [242] "PMC9714834 PMC_DL/PMC9714834/supplementaryfiles/pone.0278270.s001.xls Hsapiens 20 2023/09/09 2023/09/09 2023/03/01 2023/09/09 2023/09/09 2023/09/09 2023/09/09 2023/09/09 2023/09/09 2023/09/09 2023/09/09 2023/03/10 2023/03/02 2023/09/08 2023/09/09 2023/09/09 2023/09/09 2023/09/09 2023/09/09 2023/09/09"
## [243] "PMC9713371 PMC_DL/PMC9713371/supplementaryfiles/mmc4.xlsx Hsapiens 30 38412 40057 40057 40057 37316 37316 39142 39142 39142 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 37500 38777 38777 38961 38961 38961 38961 38961 38961 38961 38961"
## [244] "PMC9713371 PMC_DL/PMC9713371/supplementaryfiles/mmc4.xlsx Hsapiens 2 38961 38961"
## [245] "PMC9713371 PMC_DL/PMC9713371/supplementaryfiles/mmc4.xlsx Hsapiens 1 38961"
## [246] "PMC9705836 PMC_DL/PMC9705836/supplementaryfiles/Table1.XLSX Ggallus 1 44838"
## [247] "PMC9705836 PMC_DL/PMC9705836/supplementaryfiles/Table4.XLSX Hsapiens 1 44838"
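In each record above, the numbers after the count are the converted cell values. Most are Excel date serial numbers, meaning days counted from Excel's 1900 epoch. A few `.xls` files are the exception: there the values are printed as `yyyy/mm/dd` strings. Below is a minimal sketch for decoding serial numbers back into calendar dates. It assumes the default 1900 date system and serials after 28 February 1900. For those dates the origin 1899-12-30 compensates for Excel's non-existent 29 February 1900. The helper name `excel_serial_to_date` is hypothetical and is not part of the original script.

```r
# Hypothetical helper (not part of the original script): decode Excel date
# serial numbers, assuming the default 1900 date system. For serials after
# 28 February 1900, the origin 1899-12-30 absorbs Excel's phantom 29 Feb 1900.
excel_serial_to_date <- function(serial) {
  as.Date(as.numeric(serial), origin = "1899-12-30")
}

# Serial numbers copied from the records above
serials <- c(44621, 44805, 44896, 44256)
excel_serial_to_date(serials)
# 2022-03-01, 2022-09-01, 2022-12-01, 2021-03-01
```

Decoded this way, 1 March, 1 September and 1 December are consistent with symbols such as MARCH1, SEPT1 and DEC1. The 2021 dates presumably come from files saved in 2021, because Excel completes a day-month entry with the current year.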
Let’s investigate the errors in more detail.
# By species
SPECIES <- sapply(strsplit(ERROR_GENELISTS," "),"[[",3)
table(SPECIES)
## SPECIES
## Athaliana Celegans Dmelanogaster Drerio Ggallus
## 7 1 6 10 15
## Hsapiens Mmusculus Rnorvegicus Scerevisiae
## 170 34 2 2
par(mar=c(5,12,4,2))
barplot(table(SPECIES),horiz=TRUE,las=1)
par(mar=c(5,5,4,2))
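The analysis here repeatedly calls `strsplit(ERROR_GENELISTS, " ")` and picks out fields by position. Judging from the printed records, each string holds five things in order: the PMC ID, the file path, the species, the number of affected gene names, and then the converted values. Below is a minimal sketch that parses the records once into a data frame. It assumes that layout and that file paths contain no spaces. `parse_error_records` is a hypothetical name.

```r
# Hypothetical helper: split each space-delimited record into named fields.
# Assumed layout: PMC ID, file path, species, count, then converted values.
parse_error_records <- function(records) {
  fields <- strsplit(records, " ", fixed = TRUE)
  data.frame(
    pmcid    = vapply(fields, `[[`, character(1), 1),
    file     = vapply(fields, `[[`, character(1), 2),
    species  = vapply(fields, `[[`, character(1), 3),
    n_errors = as.integer(vapply(fields, `[[`, character(1), 4)),
    stringsAsFactors = FALSE
  )
}

# Three records copied from the output above
recs <- c(
  "PMC9795334 PMC_DL/PMC9795334/supplementaryfiles/mmc3.xlsx Hsapiens 1 42192",
  "PMC9795334 PMC_DL/PMC9795334/supplementaryfiles/mmc3.xlsx Mmusculus 1 44449",
  "PMC9775906 zip/supplementary_Table_S2.xlsx Hsapiens 3 44621 44622 44896"
)
err_df <- parse_error_records(recs)
table(err_df$species)                         # species tally, as with SPECIES
sums <- tapply(err_df$n_errors, err_df$pmcid, sum)  # errors per paper, as with NERR
sums
```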
# Number of affected gene lists (records) per paper; one Excel file can contribute several
DIST <- table(sapply(strsplit(ERROR_GENELISTS," "),"[[",1))
DIST
##
## PMC9683724 PMC9703941 PMC9705836 PMC9706722 PMC9708086 PMC9713173 PMC9713327
## 1 1 2 1 1 1 1
## PMC9713371 PMC9713440 PMC9713698 PMC9714834 PMC9714951 PMC9715678 PMC9715725
## 3 1 1 3 4 6 14
## PMC9715950 PMC9716074 PMC9718428 PMC9718667 PMC9720150 PMC9720402 PMC9721428
## 1 1 2 4 1 1 3
## PMC9722939 PMC9723631 PMC9724437 PMC9724785 PMC9727928 PMC9728798 PMC9729111
## 10 1 1 1 2 7 1
## PMC9729295 PMC9731154 PMC9733279 PMC9734139 PMC9734637 PMC9736101 PMC9738480
## 2 6 3 2 1 2 4
## PMC9742853 PMC9743561 PMC9743592 PMC9744761 PMC9746894 PMC9746999 PMC9748018
## 1 1 1 1 2 14 3
## PMC9748020 PMC9748153 PMC9749333 PMC9750130 PMC9753033 PMC9755029 PMC9757741
## 1 2 1 1 1 16 1
## PMC9758519 PMC9758924 PMC9759858 PMC9762028 PMC9762029 PMC9762592 PMC9763110
## 1 5 1 3 5 2 4
## PMC9763118 PMC9763382 PMC9763387 PMC9763853 PMC9764863 PMC9768194 PMC9768914
## 4 4 3 1 1 12 1
## PMC9769457 PMC9771818 PMC9772819 PMC9774719 PMC9775101 PMC9775105 PMC9775906
## 1 1 4 2 1 2 1
## PMC9776006 PMC9776514 PMC9784255 PMC9785075 PMC9789999 PMC9791056 PMC9792465
## 5 3 1 1 3 2 7
## PMC9792466 PMC9793550 PMC9795187 PMC9795334 PMC9800021 PMC9801655
## 1 1 1 6 3 12
summary(as.numeric(DIST))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.000 2.000 2.976 3.500 16.000
hist(DIST,main="Number of affected gene lists per paper")
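`DIST` counts records, and a single spreadsheet can contribute several of them. PMC9795334, for example, has six records that all point to `mmc3.xlsx`. Below is a sketch that counts distinct files per paper instead. It assumes the first field of each record is the PMC ID and the second is the file path. The records are copied from the output above.

```r
# Count distinct supplementary files per paper, rather than records.
# Assumes field 1 is the PMC ID and field 2 is the file path.
count_files_per_paper <- function(records) {
  fields <- strsplit(records, " ", fixed = TRUE)
  pmcid <- vapply(fields, `[[`, character(1), 1)
  file  <- vapply(fields, `[[`, character(1), 2)
  # unique() drops repeated (paper, file) pairs before tabulating
  pairs <- unique(data.frame(pmcid, file, stringsAsFactors = FALSE))
  table(pairs$pmcid)
}

recs <- c(
  "PMC9795334 PMC_DL/PMC9795334/supplementaryfiles/mmc3.xlsx Hsapiens 1 42192",
  "PMC9795334 PMC_DL/PMC9795334/supplementaryfiles/mmc3.xlsx Hsapiens 1 42258",
  "PMC9795334 PMC_DL/PMC9795334/supplementaryfiles/mmc3.xlsx Mmusculus 1 44449",
  "PMC9776514 zip/Supplementary_Table_S5.xlsx Hsapiens 2 40057 40057",
  "PMC9776514 zip/Supplementary_Table_S6.xlsx Hsapiens 1 40057",
  "PMC9776514 zip/Supplementary_Table_S4.xlsx Hsapiens 2 40057 40057"
)
files <- count_files_per_paper(recs)
files  # PMC9776514: 3 files, PMC9795334: 1 file (both papers have 3 records)
```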
# PMC articles with the most affected gene lists
DIST_DF <- as.data.frame(DIST)
DIST_DF <- DIST_DF[order(-DIST_DF$Freq),,drop=FALSE]
head(DIST_DF,20)
## Var1 Freq
## 48 PMC9755029 16
## 14 PMC9715725 14
## 41 PMC9746999 14
## 62 PMC9768194 12
## 83 PMC9801655 12
## 22 PMC9722939 10
## 27 PMC9728798 7
## 77 PMC9792465 7
## 13 PMC9715678 6
## 30 PMC9731154 6
## 81 PMC9795334 6
## 51 PMC9758924 5
## 54 PMC9762029 5
## 71 PMC9776006 5
## 12 PMC9714951 4
## 18 PMC9718667 4
## 35 PMC9738480 4
## 56 PMC9763110 4
## 57 PMC9763118 4
## 58 PMC9763382 4
MOST_ERR_FILES = as.character(DIST_DF[1,1])
MOST_ERR_FILES
## [1] "PMC9755029"
# Number of errors per paper
NERR <- as.numeric(sapply(strsplit(ERROR_GENELISTS," "),"[[",4))
names(NERR) <- sapply(strsplit(ERROR_GENELISTS," "),"[[",1)
NERR <-tapply(NERR, names(NERR), sum)
NERR
## PMC9683724 PMC9703941 PMC9705836 PMC9706722 PMC9708086 PMC9713173 PMC9713327
## 27 98 2 3 1 1 1
## PMC9713371 PMC9713440 PMC9713698 PMC9714834 PMC9714951 PMC9715678 PMC9715725
## 33 1 1 42 31 78 83
## PMC9715950 PMC9716074 PMC9718428 PMC9718667 PMC9720150 PMC9720402 PMC9721428
## 26 1 32 100 1 1 5
## PMC9722939 PMC9723631 PMC9724437 PMC9724785 PMC9727928 PMC9728798 PMC9729111
## 175 1 28 1 10 9 22
## PMC9729295 PMC9731154 PMC9733279 PMC9734139 PMC9734637 PMC9736101 PMC9738480
## 3 12 9 10 1 3 155
## PMC9742853 PMC9743561 PMC9743592 PMC9744761 PMC9746894 PMC9746999 PMC9748018
## 1 2 4 12 54 63 29
## PMC9748020 PMC9748153 PMC9749333 PMC9750130 PMC9753033 PMC9755029 PMC9757741
## 1 26 1 498 3 52 10
## PMC9758519 PMC9758924 PMC9759858 PMC9762028 PMC9762029 PMC9762592 PMC9763110
## 2 31 2 190 309 6 52
## PMC9763118 PMC9763382 PMC9763387 PMC9763853 PMC9764863 PMC9768194 PMC9768914
## 5 128 19 3 3 149 2
## PMC9769457 PMC9771818 PMC9772819 PMC9774719 PMC9775101 PMC9775105 PMC9775906
## 14 4 55 2 2 7 3
## PMC9776006 PMC9776514 PMC9784255 PMC9785075 PMC9789999 PMC9791056 PMC9792465
## 64 5 1 3 72 2 26
## PMC9792466 PMC9793550 PMC9795187 PMC9795334 PMC9800021 PMC9801655
## 22 5 2 6 186 18
hist(NERR,main="number of errors per PMC article")
NERR_DF <- as.data.frame(NERR)
NERR_DF <- NERR_DF[order(-NERR_DF$NERR),,drop=FALSE]
head(NERR_DF,20)
## NERR
## PMC9750130 498
## PMC9762029 309
## PMC9762028 190
## PMC9800021 186
## PMC9722939 175
## PMC9738480 155
## PMC9768194 149
## PMC9763382 128
## PMC9718667 100
## PMC9703941 98
## PMC9715725 83
## PMC9715678 78
## PMC9789999 72
## PMC9776006 64
## PMC9746999 63
## PMC9772819 55
## PMC9746894 54
## PMC9755029 52
## PMC9763110 52
## PMC9714834 42
MOST_ERR = rownames(NERR_DF)[1]
MOST_ERR
## [1] "PMC9750130"
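After sorting, taking the first row picks one paper arbitrarily when counts tie. This month, for instance, PMC9715725 and PMC9746999 share second place with 14 records each. Both winners happened to be unique this time. Below is a sketch that returns every tied leader instead. It assumes a named count object such as `DIST` (a 1-d table) or `NERR` (a 1-d array). `top_ids` is a hypothetical name, and the example input is a toy table.

```r
# Return the names of all entries sharing the maximum value, so that
# tied papers are reported together instead of one being picked by order.
top_ids <- function(counts) {
  # names() works for named vectors, 1-d tables and 1-d arrays alike
  names(counts)[counts == max(counts)]
}

# Toy input with a tie at the top
example_counts <- table(c("PMC9715725", "PMC9746999", "PMC9715725",
                          "PMC9746999", "PMC9705836"))
top_ids(example_counts)  # "PMC9715725" "PMC9746999"
```

In the report, `top_ids(DIST)` and `top_ids(NERR)` could stand in for the `DIST_DF[1,1]` and `rownames(NERR_DF)[1]` look-ups.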
GENELIST_ERROR_ARTICLES <- gsub("PMC","",GENELIST_ERROR_ARTICLES)
### JSON PARSING is more reliable than XML
ARTICLES <- esummary( GENELIST_ERROR_ARTICLES , db="pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLES,as= "parsed")
ARTICLE_DATA <- ARTICLE_DATA$result
ARTICLE_DATA <- ARTICLE_DATA[2:length(ARTICLE_DATA)]
JOURNALS <- unlist(lapply(ARTICLE_DATA,function(x) {x$fulljournalname} ))
JOURNALS_TABLE <- table(JOURNALS)
JOURNALS_TABLE <- JOURNALS_TABLE[order(-JOURNALS_TABLE)]
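The parsed esummary JSON has a `result` list. Its first element, `uids`, is the vector of IDs, followed by one list per article, as in the esummary output printed later in this report. A slightly more defensive variant drops `uids` by name rather than by position. It also returns `NA` when a record lacks `fulljournalname`. Below is a sketch of that variant on a hand-built list that mimics the structure. The two IDs and journal names are taken from this report's output, and `journal_names` is a hypothetical helper.

```r
# Extract fulljournalname from each article record of a parsed esummary
# result, dropping the "uids" element by name instead of by position.
journal_names <- function(result) {
  records <- result[names(result) != "uids"]
  vapply(records, function(rec) {
    if (is.null(rec$fulljournalname)) NA_character_ else rec$fulljournalname
  }, character(1))
}

# Mock list mimicking reutils::content(..., as = "parsed")$result
mock_result <- list(
  uids = c("9755029", "9750130"),
  `9755029` = list(uid = "9755029", fulljournalname = "Neurobiology of Stress"),
  `9750130` = list(uid = "9750130", fulljournalname = "Genome Biology and Evolution")
)
j <- journal_names(mock_result)
sort(table(j), decreasing = TRUE)
```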
length(JOURNALS_TABLE)
## [1] 52
NUM_JOURNALS=length(JOURNALS_TABLE)
par(mar=c(5,25,4,2))
barplot(head(JOURNALS_TABLE,10), horiz=TRUE, las=1,
xlab="Articles with gene name errors in supp files",
main="Top journals this month")
Congrats to our Journal of the Month winner!
JOURNAL_WINNER <- names(head(JOURNALS_TABLE,1))
JOURNAL_WINNER
## [1] "Frontiers in Genetics"
There are two categories:
Paper with the most supplementary gene lists affected by gene name errors (MOST_ERR_FILES)
Paper with the most gene names converted to dates (MOST_ERR)
Sometimes, one paper can win both categories. Congrats to our winners.
MOST_ERR_FILES <- gsub("PMC","",MOST_ERR_FILES)
ARTICLES <- esummary( MOST_ERR_FILES , db="pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLES,as= "parsed")
ARTICLE_DATA <- ARTICLE_DATA[2]
ARTICLE_DATA
## $result
## $result$uids
## [1] "9755029"
##
## $result$`9755029`
## $result$`9755029`$uid
## [1] "9755029"
##
## $result$`9755029`$pubdate
## [1] "2022 Oct 14"
##
## $result$`9755029`$epubdate
## [1] "2022 Oct 14"
##
## $result$`9755029`$printpubdate
## [1] ""
##
## $result$`9755029`$source
## [1] "Neurobiol Stress"
##
## $result$`9755029`$authors
## name authtype
## 1 Gerstner N Author
## 2 Krontira AC Author
## 3 Cruceanu C Author
## 4 Roeh S Author
## 5 Pütz B Author
## 6 Sauer S Author
## 7 Rex-Haffner M Author
## 8 Schmidt MV Author
## 9 Binder EB Author
## 10 Knauer-Arloth J Author
##
## $result$`9755029`$title
## [1] "DiffBrainNet: Differential analyses add new insights into the response to glucocorticoids at the level of genes, networks and brain regions"
##
## $result$`9755029`$volume
## [1] "21"
##
## $result$`9755029`$issue
## [1] ""
##
## $result$`9755029`$pages
## [1] "100496"
##
## $result$`9755029`$articleids
## idtype value
## 1 pmid 36532379
## 2 doi 10.1016/j.ynstr.2022.100496
## 3 pmcid PMC9755029
##
## $result$`9755029`$fulljournalname
## [1] "Neurobiology of Stress"
##
## $result$`9755029`$sortdate
## [1] "2022/10/14 00:00"
##
## $result$`9755029`$pmclivedate
## [1] "2022/12/17"
MOST_ERR <- gsub("PMC","",MOST_ERR)
ARTICLE_DATA <- esummary(MOST_ERR,db = "pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLE_DATA,as= "parsed")
ARTICLE_DATA
## $header
## $header$type
## [1] "esummary"
##
## $header$version
## [1] "0.3"
##
##
## $result
## $result$uids
## [1] "9750130"
##
## $result$`9750130`
## $result$`9750130`$uid
## [1] "9750130"
##
## $result$`9750130`$pubdate
## [1] "2022 Nov 29"
##
## $result$`9750130`$epubdate
## [1] "2022 Nov 29"
##
## $result$`9750130`$printpubdate
## [1] ""
##
## $result$`9750130`$source
## [1] "Genome Biol Evol"
##
## $result$`9750130`$authors
## name authtype
## 1 Mouterde M Author
## 2 Daali Y Author
## 3 Rollason V Author
## 4 Čížková M Author
## 5 Mulugeta A Author
## 6 Al Balushi KA Author
## 7 Fakis G Author
## 8 Constantinidis TC Author
## 9 Al-Thihli K Author
## 10 Černá M Author
## 11 Makonnen E Author
## 12 Boukouvala S Author
## 13 Al-Yahyaee S Author
## 14 Yimer G Author
## 15 Černý V Author
## 16 Desmeules J Author
## 17 Poloni ES Author
##
## $result$`9750130`$title
## [1] "Joint Analysis of Phenotypic and Genomic Diversity Sheds Light on the Evolution of Xenobiotic Metabolism in Humans"
##
## $result$`9750130`$volume
## [1] "14"
##
## $result$`9750130`$issue
## [1] "12"
##
## $result$`9750130`$pages
## [1] "evac167"
##
## $result$`9750130`$articleids
## idtype value
## 1 pmid 36445690
## 2 doi 10.1093/gbe/evac167
## 3 pmcid PMC9750130
##
## $result$`9750130`$fulljournalname
## [1] "Genome Biology and Evolution"
##
## $result$`9750130`$sortdate
## [1] "2022/11/29 00:00"
##
## $result$`9750130`$pmclivedate
## [1] "2022/12/15"
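Printing the whole parsed record is verbose. Below is a sketch that pulls out the fields a reader is most likely to want: title, journal, publication date and DOI. It assumes the structure printed above, in which `articleids` is a data frame with `idtype` and `value` columns. `summarise_article` is a hypothetical helper, and the mock record reuses values from the output above.

```r
# Hypothetical helper: condense one esummary article record into a one-row
# data frame. Assumes articleids is a data frame with idtype and value columns.
summarise_article <- function(rec) {
  ids <- rec$articleids
  doi <- ids$value[ids$idtype == "doi"]
  data.frame(
    pmcid   = paste0("PMC", rec$uid),
    title   = rec$title,
    journal = rec$fulljournalname,
    pubdate = rec$pubdate,
    doi     = if (length(doi) == 1) doi else NA_character_,
    stringsAsFactors = FALSE
  )
}

rec <- list(
  uid = "9750130",
  pubdate = "2022 Nov 29",
  title = "Joint Analysis of Phenotypic and Genomic Diversity Sheds Light on the Evolution of Xenobiotic Metabolism in Humans",
  fulljournalname = "Genome Biology and Evolution",
  articleids = data.frame(
    idtype = c("pmid", "doi", "pmcid"),
    value  = c("36445690", "10.1093/gbe/evac167", "PMC9750130"),
    stringsAsFactors = FALSE
  )
)
summarise_article(rec)
```

In the report, `summarise_article(ARTICLE_DATA$result[[MOST_ERR]])` would give the same one-row summary for the winner. This works because `MOST_ERR` holds the bare numeric ID at that point.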
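Next, we plot the trends over the past 6-12 months, using the summary statistics recorded in the previous monthly reports published on the website.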
url <- "https://ziemann-lab.net/public/gene_name_errors/"
# XML::htmlParse() does not fetch HTTPS URLs itself, so download the
# directory listing with RCurl first, then keep the links to HTML reports
listing <- htmlParse( getURL(url, ftp.use.epsv = FALSE, dirlistonly = TRUE) )
listing <- xpathSApply(listing, "//a/@href")
listing <- listing[grep("html",listing)]
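The `xml2` package, already loaded at the top of the script, offers another way to collect the report links. Its `read_html()` fetches HTTPS URLs itself, so `getURL()` is not needed. Below is a sketch demonstrated on an inline HTML string, so that it runs offline. The file names in the string are made up. The pattern is anchored to the `.html` suffix, which is stricter than the original `grep("html", ...)`.

```r
library(xml2)

# Return the href targets of all <a> elements whose target ends in ".html".
html_links <- function(x) {
  doc <- read_html(x)
  hrefs <- xml_attr(xml_find_all(doc, "//a"), "href")
  hrefs[grepl("\\.html$", hrefs)]
}

# Offline demonstration with a made-up directory listing
listing_html <- '<html><body>
  <a href="?C=N;O=D">Name</a>
  <a href="report_a.html">report_a.html</a>
  <a href="report_b.html">report_b.html</a>
  <a href="notes.txt">notes.txt</a>
</body></html>'
links <- html_links(listing_html)
links
# Against the live site: html_links("https://ziemann-lab.net/public/gene_name_errors/")
```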
# Download each earlier report into a fresh local folder
unlink("online_files/",recursive=TRUE)
dir.create("online_files")
sapply(listing, function(mylink) {
download.file(paste(url,mylink,sep=""),destfile=paste("online_files/",mylink,sep=""))
} )
## href href href href href href href href href href href href href href href href
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## href href href href href href
## 0 0 0 0 0 0
myfilelist <- list.files("online_files/",full.names=TRUE)
# For each earlier report, read back the four summary statistics it printed
trends <- sapply(myfilelist, function(myfilename) {
x <- readLines(myfilename)
# Num XL gene list articles
NUM_GENELIST_ARTICLES <- x[grep("NUM_GENELIST_ARTICLES",x)[3]+1]
NUM_GENELIST_ARTICLES <- sapply(strsplit(NUM_GENELIST_ARTICLES," "),"[[",3)
NUM_GENELIST_ARTICLES <- sapply(strsplit(NUM_GENELIST_ARTICLES,"<"),"[[",1)
NUM_GENELIST_ARTICLES <- as.numeric(NUM_GENELIST_ARTICLES)
# number of affected articles
NUM_ERROR_GENELIST_ARTICLES <- x[grep("NUM_ERROR_GENELIST_ARTICLES",x)[3]+1]
NUM_ERROR_GENELIST_ARTICLES <- sapply(strsplit(NUM_ERROR_GENELIST_ARTICLES," "),"[[",3)
NUM_ERROR_GENELIST_ARTICLES <- sapply(strsplit(NUM_ERROR_GENELIST_ARTICLES,"<"),"[[",1)
NUM_ERROR_GENELIST_ARTICLES <- as.numeric(NUM_ERROR_GENELIST_ARTICLES)
# Error proportion
ERROR_PROPORTION <- x[grep("ERROR_PROPORTION",x)[3]+1]
ERROR_PROPORTION <- sapply(strsplit(ERROR_PROPORTION," "),"[[",3)
ERROR_PROPORTION <- sapply(strsplit(ERROR_PROPORTION,"<"),"[[",1)
ERROR_PROPORTION <- as.numeric(ERROR_PROPORTION)
# number of journals
NUM_JOURNALS <- x[grep('JOURNALS_TABLE',x)[3]+1]
NUM_JOURNALS <- sapply(strsplit(NUM_JOURNALS," "),"[[",3)
NUM_JOURNALS <- sapply(strsplit(NUM_JOURNALS,"<"),"[[",1)
NUM_JOURNALS <- as.numeric(NUM_JOURNALS)
res <- c(NUM_GENELIST_ARTICLES,NUM_ERROR_GENELIST_ARTICLES,ERROR_PROPORTION,NUM_JOURNALS)
return(res)
})
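The four blocks inside `sapply()` repeat the same three steps. Each takes the line after the third mention of a variable name, keeps its third space-separated token, and cuts that token at the first `<`. This assumes each earlier report prints the value as knitr HTML along the lines of `## [1] 52</code></pre>`. Below is a sketch that factors the steps into one hypothetical helper, tried on mock report lines. Unlike the original code, it returns `NA` instead of failing when the name occurs fewer times than expected.

```r
# Hypothetical helper: extract a numeric value printed in a knitted HTML
# report. Assumes the value sits on the line after the `occurrence`-th
# line mentioning `pattern`, formatted like "## [1] 52</code></pre>".
get_report_value <- function(lines, pattern, occurrence = 3) {
  hits <- grep(pattern, lines, fixed = TRUE)
  if (length(hits) < occurrence) return(NA_real_)
  target <- lines[hits[occurrence] + 1]
  token <- strsplit(target, " ", fixed = TRUE)[[1]][3]
  as.numeric(strsplit(token, "<", fixed = TRUE)[[1]][1])
}

# Mock lines imitating an earlier report (the value 52 is this month's)
mock_report <- c(
  '<pre class="r"><code>JOURNALS_TABLE &lt;- table(JOURNALS)',
  'JOURNALS_TABLE &lt;- JOURNALS_TABLE[order(-JOURNALS_TABLE)]',
  'length(JOURNALS_TABLE)</code></pre>',
  '<pre><code>## [1] 52</code></pre>'
)
get_report_value(mock_report, "JOURNALS_TABLE")  # 52
```

Inside the loop, each statistic would then become a single call such as `get_report_value(x, "ERROR_PROPORTION")`.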
colnames(trends) <- sapply(strsplit(colnames(trends),"_"),"[[",3)
colnames(trends) <- gsub(".html","",colnames(trends),fixed=TRUE)
trends <- as.data.frame(trends)
rownames(trends) <- c("NUM_GENELIST_ARTICLES","NUM_ERROR_GENELIST_ARTICLES","ERROR_PROPORTION","NUM_JOURNALS")
trends <- t(trends)
trends <- as.data.frame(trends)
CURRENT_RES <- c(NUM_GENELIST_ARTICLES,NUM_ERROR_GENELIST_ARTICLES,ERROR_PROPORTION,NUM_JOURNALS)
trends <- rbind(trends,CURRENT_RES)
paste(CURRENT_YEAR,CURRENT_MONTH,sep="-")
## [1] "2023-01"
rownames(trends)[nrow(trends)] <- paste(CURRENT_YEAR,CURRENT_MONTH,sep="-")
plot(trends$NUM_GENELIST_ARTICLES, xaxt = "n" , type="b" , main="Number of articles with Excel gene lists per month",
ylab="number of articles", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
plot(trends$NUM_ERROR_GENELIST_ARTICLES, xaxt = "n" , type="b" , main="Number of articles with gene name errors per month",
ylab="number of articles", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
plot(trends$ERROR_PROPORTION, xaxt = "n" , type="b" , main="Proportion of articles with Excel gene list affected by errors",
ylab="proportion", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
plot(trends$NUM_JOURNALS, xaxt = "n" , type="b" , main="Number of journals with affected articles",
ylab="number of journals", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
unlink("online_files/",recursive=TRUE)
Zeeberg, B.R., Riss, J., Kane, D.W. et al. Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics. BMC Bioinformatics 5, 80 (2004). https://doi.org/10.1186/1471-2105-5-80
Ziemann, M., Eren, Y. & El-Osta, A. Gene name errors are widespread in the scientific literature. Genome Biol 17, 177 (2016). https://doi.org/10.1186/s13059-016-1044-7
sessionInfo()
## R version 4.2.2 Patched (2022-11-10 r83330)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
## [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
## [7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] RCurl_1.98-1.9 readxl_1.4.1 reutils_0.2.3 xml2_1.3.3 jsonlite_1.8.4
## [6] XML_3.99-0.13
##
## loaded via a namespace (and not attached):
## [1] knitr_1.41 magrittr_2.0.3 R6_2.5.1 rlang_1.0.6
## [5] fastmap_1.1.0 highr_0.9 stringr_1.5.0 tools_4.2.2
## [9] xfun_0.35 cli_3.4.1 jquerylib_0.1.4 htmltools_0.5.4
## [13] assertthat_0.2.1 yaml_2.3.6 digest_0.6.31 lifecycle_1.0.3
## [17] sass_0.4.4 vctrs_0.5.1 bitops_1.0-7 glue_1.6.2
## [21] cachem_1.0.6 evaluate_0.19 rmarkdown_2.19 stringi_1.7.8
## [25] cellranger_1.1.0 compiler_4.2.2 bslib_0.4.1