Source: https://github.com/markziemann/GeneNameErrors2020
View the reports: http://ziemann-lab.net/public/gene_name_errors/
Gene name errors result when data are imported improperly into MS Excel and other spreadsheet programs (Zeeberg et al, 2004). Certain gene names like MARCH3, SEPT2 and DEC1 are converted into date format. These errors are surprisingly common in supplementary data files in the field of genomics (Ziemann et al, 2016). This could be considered a minor error because it affects only a small number of genes; however, it is symptomatic of poor data processing methods. The purpose of this script is to identify gene name errors present in supplementary files of PubMed Central articles from the previous month.
library("XML")
library("jsonlite")
library("xml2")
library("reutils")
library("readxl")
library("RCurl")
Here I get the PubMed Central IDs for articles from the previous month.
The first step is to work out the date to use in the PubMed Central search.
CURRENT_MONTH=format(Sys.time(), "%m")
CURRENT_YEAR=format(Sys.time(), "%Y")
if (CURRENT_MONTH == "01") {
PREV_YEAR=as.character(as.numeric(format(Sys.time(), "%Y"))-1)
PREV_MONTH="12"
} else {
PREV_YEAR=CURRENT_YEAR
PREV_MONTH=as.character(as.numeric(format(Sys.time(), "%m"))-1)
}
DATE=paste(PREV_YEAR,"/",PREV_MONTH,sep="")
DATE
## [1] "2023/5"
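As a hedged alternative to the string arithmetic above, the search window for the previous month could also be derived with base R date arithmetic. This sketch is not part of the original script, and `prev_month_window` is a hypothetical helper name. Unlike the code above, it zero-pads the month (giving "2023/05" rather than "2023/5") and uses the true last day of the month instead of a fixed 31.

```r
# Hypothetical helper (not in the original script): returns the first and
# last day of the month preceding `today`, formatted as YYYY/MM/DD.
prev_month_window <- function(today = Sys.Date()) {
  first_of_this_month <- as.Date(format(today, "%Y-%m-01"))
  # Stepping back one month from the 1st gives the 1st of the previous month.
  first_of_prev_month <- seq(first_of_this_month, by = "-1 month", length.out = 2)[2]
  # The day before the 1st of this month is the last day of the previous month.
  last_of_prev_month <- first_of_this_month - 1
  list(
    mindate = format(first_of_prev_month, "%Y/%m/%d"),
    maxdate = format(last_of_prev_month, "%Y/%m/%d")
  )
}

prev_month_window(as.Date("2023-06-15"))
```

The two strings could then be supplied as the `mindate` and `maxdate` arguments of the search.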
Let’s see how many PMC IDs we have in the past month.
QUERY ='((genom*[Abstract]))'
ESEARCH_RES <- esearch(term=QUERY, db = "pmc", rettype = "uilist", retmode = "xml", retstart = 0,
retmax = 5000000, usehistory = TRUE, webenv = NULL, querykey = NULL, sort = NULL, field = NULL,
datetype = NULL, reldate = NULL,
mindate = paste(DATE,"/1",sep="") , maxdate = paste(DATE,"/31",sep=""))
pmc <- efetch(ESEARCH_RES,retmode="text",rettype="uilist",outfile="pmcids.txt")
## Retrieving UIDs 1 to 500
## Retrieving UIDs 501 to 1000
## Retrieving UIDs 1001 to 1500
## Retrieving UIDs 1501 to 2000
## Retrieving UIDs 2001 to 2500
## Retrieving UIDs 2501 to 3000
pmc <- read.table(pmc)
pmc <- paste("PMC",pmc$V1,sep="")
NUM_ARTICLES=length(pmc)
NUM_ARTICLES
## [1] 2996
writeLines(pmc,con="pmc.txt")
Now run the bash script. Note that false positives can occur (~1.5%) and these results have not been verified by a human.
Here are some definitions:
NUM_XLS = Number of supplementary Excel files in this set of PMC articles.
NUM_XLS_ARTICLES = Number of articles matching the PubMed Central search which have supplementary Excel files.
GENELISTS = The gene lists found in the Excel files. Each Excel file is counted once even if it has multiple gene lists.
NUM_GENELISTS = The number of Excel files with gene lists.
NUM_GENELIST_ARTICLES = The number of PMC articles with supplementary Excel gene lists.
ERROR_GENELISTS = Files suspected to contain gene name errors. The dates and five-digit numbers indicate transmogrified gene names.
NUM_ERROR_GENELISTS = Number of Excel gene lists with errors.
NUM_ERROR_GENELIST_ARTICLES = Number of articles with supplementary Excel gene name errors.
ERROR_PROPORTION = This is the proportion of articles with Excel gene lists that have errors.
system("./gene_names.sh pmc.txt")
results <- readLines("results.txt")
XLS <- results[grep("XLS",results,ignore.case=TRUE)]
NUM_XLS = length(XLS)
NUM_XLS
## [1] 4715
NUM_XLS_ARTICLES = length(unique(sapply(strsplit(XLS," "),"[[",1)))
NUM_XLS_ARTICLES
## [1] 1078
GENELISTS <- XLS[lapply(strsplit(XLS," "),length)>2]
#GENELISTS
NUM_GENELISTS <- length(unique(sapply(strsplit(GENELISTS," "),"[[",2)))
NUM_GENELISTS
## [1] 504
NUM_GENELIST_ARTICLES <- length(unique(sapply(strsplit(GENELISTS," "),"[[",1)))
NUM_GENELIST_ARTICLES
## [1] 291
ERROR_GENELISTS <- XLS[lapply(strsplit(XLS," "),length)>3]
#ERROR_GENELISTS
NUM_ERROR_GENELISTS = length(ERROR_GENELISTS)
NUM_ERROR_GENELISTS
## [1] 161
GENELIST_ERROR_ARTICLES <- unique(sapply(strsplit(ERROR_GENELISTS," "),"[[",1))
GENELIST_ERROR_ARTICLES
## [1] "PMC10225086" "PMC10218407" "PMC10216680" "PMC10214836" "PMC10226651"
## [6] "PMC10225930" "PMC10217816" "PMC10216487" "PMC10214295" "PMC10210621"
## [11] "PMC10210411" "PMC10209733" "PMC10209063" "PMC10206494" "PMC10206047"
## [16] "PMC10203873" "PMC10203443" "PMC10203276" "PMC10202655" "PMC10200853"
## [21] "PMC10199912" "PMC10191432" "PMC10191288" "PMC10185564" "PMC10185556"
## [26] "PMC10181825" "PMC10164992" "PMC10156726" "PMC10151398" "PMC10150886"
## [31] "PMC10150505" "PMC10150242" "PMC10150019" "PMC10208567" "PMC10206066"
## [36] "PMC10203218" "PMC10199099" "PMC10194222" "PMC10188588" "PMC10188112"
## [41] "PMC10183592" "PMC10183018" "PMC10179737" "PMC10174503" "PMC10174322"
## [46] "PMC10174054" "PMC10170458" "PMC10171335" "PMC10170761" "PMC10170086"
## [51] "PMC10169700" "PMC10167087" "PMC10160042" "PMC10158796" "PMC10157938"
## [56] "PMC10156747" "PMC10154557" "PMC10150489"
NUM_ERROR_GENELIST_ARTICLES <- length(GENELIST_ERROR_ARTICLES)
NUM_ERROR_GENELIST_ARTICLES
## [1] 58
ERROR_PROPORTION = NUM_ERROR_GENELIST_ARTICLES / NUM_GENELIST_ARTICLES
ERROR_PROPORTION
## [1] 0.1993127
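The counts above rely on the number of space-separated fields in each line of results.txt. As an illustration only, the sketch below splits one such line into named parts. The field meanings (article ID, file path, species label, error count, then the suspected error values) are inferred from the ERROR_GENELISTS output in this report rather than from any documentation of gene_names.sh, and `parse_result_line` is a hypothetical helper.

```r
# Hypothetical helper: split one results.txt line into named fields.
# Assumes the layout "PMCID path species count value value ..." as seen
# in the ERROR_GENELISTS output; lines with fewer fields get NA / empty values.
parse_result_line <- function(line) {
  fields <- strsplit(line, " ", fixed = TRUE)[[1]]
  n <- length(fields)
  list(
    pmcid    = fields[1],
    file     = if (n >= 2) fields[2] else NA_character_,
    species  = if (n >= 3) fields[3] else NA_character_,
    n_errors = if (n >= 4) as.integer(fields[4]) else 0L,
    errors   = if (n >= 5) fields[5:n] else character(0)
  )
}

parse_result_line("PMC10216680 zip/Supplement_Tables.xlsx Hsapiens 1 44623")
```

Because the split is on single spaces, a suspected error value that itself contains spaces (for example a chemical name) is broken into several tokens, so the length of `errors` can exceed `n_errors`.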
Here you can have a look at all the gene lists detected in the past month, as well as those with errors. The dates are obvious errors; they are commonly dates in September, March, December and October. The five-digit numbers represent dates as they are encoded in Excel's internal format, which stores a date as a serial number counting days from the start of 1900. If you were to put these numbers into Excel and format the cells as dates, most of them would also map to dates in September, March, December and October.
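The serial-number encoding described above can be checked directly in R. The following is a minimal sketch and not part of the original pipeline: `excel_serial_to_date` is a hypothetical helper, and it assumes the workbook uses Excel's default 1900 date system, for which the conventional R origin is 1899-12-30 (the offset absorbs Excel's treatment of 1900 as a leap year).

```r
# Hypothetical helper: convert Excel serial day numbers to R Dates,
# assuming the 1900 date system (origin 1899-12-30 for modern dates).
excel_serial_to_date <- function(serial) {
  as.Date(as.numeric(serial), origin = "1899-12-30")
}

# A few five-digit values that appear in the ERROR_GENELISTS output.
serials <- c("44621", "44806", "45261")
dates <- excel_serial_to_date(serials)

# Show each serial with its calendar date and numeric month.
data.frame(serial = serials, date = dates, month = format(dates, "%m"))
```

Under this assumption, 44621 corresponds to 1 March 2022, 44806 to 2 September 2022 and 45261 to 1 December 2023, which is the kind of date that names such as MARCH1, SEPT2 and DEC1 would be converted to.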
#GENELISTS
ERROR_GENELISTS
## [1] "PMC10225086 PMC_DL/PMC10225086/supplementaryfiles/13071_2023_5785_MOESM3_ESM.xlsx Celegans 2 44836 44624"
## [2] "PMC10225086 PMC_DL/PMC10225086/supplementaryfiles/13071_2023_5785_MOESM3_ESM.xlsx Celegans 2 44626 44621"
## [3] "PMC10225086 PMC_DL/PMC10225086/supplementaryfiles/13071_2023_5785_MOESM3_ESM.xlsx Celegans 2 44626 44623"
## [4] "PMC10218407 zip/Supplemental_Table_1_DEGs_v2.xlsx Drerio 18 45201 45184 44987 44992 44993 45172 45175 44992 44994 45179 45171 45173 44989 44996 44991 44992 44990 45181"
## [5] "PMC10218407 zip/Supplemental_Table_1_DEGs_v2.xlsx Drerio 18 44992 45179 45184 45181 44993 45172 45171 45175 44987 45201 44992 44992 44991 44990 44994 44989 45173 44996"
## [6] "PMC10218407 zip/Supplemental_Table_1_DEGs_v2.xlsx Drerio 18 45184 44992 45179 45181 45175 45171 45172 44993 44991 45201 44990 44994 44989 44992 45173 44996 44987 44992"
## [7] "PMC10218407 zip/Supplemental_Table_1_DEGs_v2.xlsx Drerio 18 45175 45201 44996 45184 44994 44993 44989 44992 44992 44992 45173 45181 44991 45171 44987 45172 45179 44990"
## [8] "PMC10218407 zip/Supplemental_Table_3_DEGs_v3.xlsx Drerio 18 45201 45184 44987 44992 44993 45172 45175 44992 44994 45179 45171 45173 44989 44996 44991 44992 44990 45181"
## [9] "PMC10218407 zip/Supplemental_Table_3_DEGs_v3.xlsx Drerio 18 44992 45179 45184 45181 44993 45172 45171 45175 44987 45201 44992 44992 44991 44990 44994 44989 45173 44996"
## [10] "PMC10218407 zip/Supplemental_Table_3_DEGs_v3.xlsx Drerio 18 45184 44992 45179 45181 45175 45171 45172 44993 44991 45201 44990 44994 44989 44992 45173 44996 44987 44992"
## [11] "PMC10218407 zip/Supplemental_Table_3_DEGs_v3.xlsx Drerio 18 45175 45201 44996 45184 44994 44993 44989 44992 44992 44992 45173 45181 44991 45171 44987 45172 45179 44990"
## [12] "PMC10216680 zip/Supplement_Tables.xlsx Hsapiens 1 44623"
## [13] "PMC10214836 PMC_DL/PMC10214836/supplementaryfiles/Table_1.xlsx Hsapiens 1 45261"
## [14] "PMC10226651 PMC_DL/PMC10226651/supplementaryfiles/Table_2.xlsx Hsapiens 5 44444 44446 44443 44261 44442"
## [15] "PMC10225930 PMC_DL/PMC10225930/supplementaryfiles/mmc3.xlsx Hsapiens 3 37316 37316 37500"
## [16] "PMC10225930 PMC_DL/PMC10225930/supplementaryfiles/mmc3.xlsx Hsapiens 7 38961 37316 37500 40057 42248 39326 39142"
## [17] "PMC10217816 zip/genes-2253974-supplementary/Telomere_dkd_Supp_Tables_2023_revised.xlsx Ggallus 1 40787"
## [18] "PMC10216487 zip/Supplemental_Table_S4_combined_gene_hit_list.xlsx Hsapiens 26 44989 45178 45261 44986 44987 44986 44995 44996 44987 44988 44990 44991 44992 44993 44994 45170 45179 45180 45181 45183 45171 45172 45173 45174 45175 45177"
## [19] "PMC10214295 PMC_DL/PMC10214295/supplementaryfiles/mmc4.xlsx Hsapiens 3 44814 44627 44811"
## [20] "PMC10210621 zip/Supplemental-Tables_TCAI-revised.xlsx Hsapiens 1 43357"
## [21] "PMC10210411 PMC_DL/PMC10210411/supplementaryfiles/13058_2023_1627_MOESM5_ESM.xlsx Hsapiens 12 12-Sep 2-Mar 4-Sep 3-Sep 10-Mar 2-Mar 9-Mar 2-Sep 8-Sep 5-Mar 10-Sep 11-Mar"
## [22] "PMC10210411 PMC_DL/PMC10210411/supplementaryfiles/13058_2023_1627_MOESM5_ESM.xlsx Hsapiens 5 10-Mar 2-Mar 8-Sep 4-Sep 3-Sep"
## [23] "PMC10210411 PMC_DL/PMC10210411/supplementaryfiles/13058_2023_1627_MOESM5_ESM.xlsx Hsapiens 8 2-Sep 3-Sep 10-Mar 5-Mar 2-Mar 8-Sep 11-Mar 10-Sep"
## [24] "PMC10210411 PMC_DL/PMC10210411/supplementaryfiles/13058_2023_1627_MOESM5_ESM.xlsx Hsapiens 10 8-Mar 3-Sep 4-Mar 2-Sep 1-Sep 12-Sep 5-Mar 7-Mar 11-Mar 10-Mar"
## [25] "PMC10210411 PMC_DL/PMC10210411/supplementaryfiles/13058_2023_1627_MOESM5_ESM.xlsx Hsapiens 3 2-Sep 8-Sep 11-Sep"
## [26] "PMC10210411 PMC_DL/PMC10210411/supplementaryfiles/13058_2023_1627_MOESM5_ESM.xlsx Hsapiens 12 8-Mar 3-Sep 7-Mar 1-Sep 12-Sep 11-Mar 2-Mar 2-Sep 5-Mar 10-Mar 4-Mar 11-Sep"
## [27] "PMC10210411 PMC_DL/PMC10210411/supplementaryfiles/13058_2023_1627_MOESM5_ESM.xlsx Hsapiens 1 1-Mar"
## [28] "PMC10210411 PMC_DL/PMC10210411/supplementaryfiles/13058_2023_1627_MOESM5_ESM.xlsx Hsapiens 3 4-Sep 6-Sep 10-Mar"
## [29] "PMC10210411 PMC_DL/PMC10210411/supplementaryfiles/13058_2023_1627_MOESM5_ESM.xlsx Hsapiens 5 1-Mar 6-Sep 2-Mar 1-Mar 12-Sep"
## [30] "PMC10209733 zip/TableS4.xlsx Hsapiens 3 44622 44626 44813"
## [31] "PMC10209733 zip/TableS7.xlsx Hsapiens 10 44622 44813 44626 44812 44810 44813 44812 44626 44810 44810"
## [32] "PMC10209063 PMC_DL/PMC10209063/supplementaryfiles/41467_2023_38785_MOESM4_ESM.xlsx Drerio 14 44836 44814 44624 44819 44810 44621 44621 44622 44625 44629 44807 44631 30363 30417"
## [33] "PMC10209063 PMC_DL/PMC10209063/supplementaryfiles/41467_2023_38785_MOESM4_ESM.xlsx Hsapiens 11 44621 44631 44814 44628 44626 44810 44621 44622 44625 44624 44809"
## [34] "PMC10209063 PMC_DL/PMC10209063/supplementaryfiles/41467_2023_38785_MOESM4_ESM.xlsx Drerio 14 44621 44814 44836 44819 44810 44624 30363 44807 44622 30417 44631 44629 44621 44625"
## [35] "PMC10209063 PMC_DL/PMC10209063/supplementaryfiles/41467_2023_38785_MOESM4_ESM.xlsx Hsapiens 11 44621 44814 44810 44626 44624 44631 44622 44809 44621 44625 44628"
## [36] "PMC10209063 PMC_DL/PMC10209063/supplementaryfiles/41467_2023_38785_MOESM4_ESM.xlsx Drerio 14 44624 44807 44629 44810 44819 44814 44836 44621 30417 44631 44625 44621 30363 44622"
## [37] "PMC10209063 PMC_DL/PMC10209063/supplementaryfiles/41467_2023_38785_MOESM4_ESM.xlsx Hsapiens 11 44810 44624 44621 44809 44814 44625 44626 44631 44621 44628 44622"
## [38] "PMC10209063 PMC_DL/PMC10209063/supplementaryfiles/41467_2023_38785_MOESM6_ESM.xlsx Hsapiens 14 45174 44986 44986 45174 44996 44987 45175 44993 44992 44991 41701 45173 45179 45172"
## [39] "PMC10206494 PMC_DL/PMC10206494/supplementaryfiles/mmc2.xlsx Hsapiens 12 44811 44806 44623 44814 44623 44626 44813 44623 44623 44621 44623 44815"
## [40] "PMC10206047 PMC_DL/PMC10206047/supplementaryfiles/Table7.XLSX Hsapiens 1 44990"
## [41] "PMC10203873 PMC_DL/PMC10203873/supplementaryfiles/Table_4.xlsx Hsapiens 2 44819 44621"
## [42] "PMC10203443 PMC_DL/PMC10203443/supplementaryfiles/mmc4.xls Athaliana 2 44835 44778"
## [43] "PMC10203443 PMC_DL/PMC10203443/supplementaryfiles/mmc4.xls Athaliana 1 44835"
## [44] "PMC10203276 PMC_DL/PMC10203276/supplementaryfiles/mmc17.xlsx Hsapiens 5 44625 44627 44626 44629 44628"
## [45] "PMC10202655 PMC_DL/PMC10202655/supplementaryfiles/mmc3.xlsx Hsapiens 81 39873 37316 37135 37226 37316 39873 38231 37316 38412 37135 40057 39508 41883 36951 41153 41153 41883 39142 38961 39326 40787 36951 38961 38777 39692 40422 40238 38777 39142 41153 38596 40422 38047 36951 38231 39326 38047 36951 39142 39326 37865 40057 40787 37500 37316 39873 40787 37865 37681 40238 40422 40057 40603 40603 39508 38596 37865 37500 37681 37681 38777 39692 38412 38412 37226 37135 39508 39692 37316 38047 37316 38961 38231 38596 40603 37226 36951 41883 36951 40238 37500"
## [46] "PMC10202655 PMC_DL/PMC10202655/supplementaryfiles/mmc3.xlsx Hsapiens 27 37226 36951 37316 36951 40238 40603 37316 37681 38047 38412 38777 39142 39508 39873 37135 40422 40787 41153 41883 37500 37865 38231 38596 38961 39326 39692 40057"
## [47] "PMC10200853 PMC_DL/PMC10200853/supplementaryfiles/mmc3.xlsx Hsapiens 27 44995 44996 45180 45178 44986 44987 45175 44988 45183 44994 45182 44989 45174 45181 45261 45170 45173 45179 44991 45184 45176 44990 44992 45171 45177 44993 45172"
## [48] "PMC10199912 PMC_DL/PMC10199912/supplementaryfiles/41598_2023_35312_MOESM7_ESM.xlsx Hsapiens 6 37500 40057 36951 36951 37226 39326"
## [49] "PMC10191432 zip/adg2235_Data_file_S1.xlsx Hsapiens 28 36951 37226 38231 40422 39326 38777 38596 40787 41883 39508 37865 40603 38961 36951 37316 37500 39873 40238 37135 39142 42248 41153 39692 38047 40057 37681 37316 38412"
## [50] "PMC10191432 zip/adg2235_Data_file_S2.xlsx Hsapiens 28 40603 37135 39326 38777 40238 37681 41153 39508 39142 36951 37316 37226 36951 38961 39873 37500 40422 39692 38231 37865 41883 37316 38047 38412 40787 42248 38596 40057"
## [51] "PMC10191288 PMC_DL/PMC10191288/supplementaryfiles/pone.0283553.s007.xlsx Hsapiens 1 25419"
## [52] "PMC10185564 PMC_DL/PMC10185564/supplementaryfiles/41467_2023_38140_MOESM7_ESM.xlsx Hsapiens 1 IRF5-Oct"
## [53] "PMC10185564 PMC_DL/PMC10185564/supplementaryfiles/41467_2023_38140_MOESM4_ESM.xlsx Hsapiens 9 44623 44623 44627 44806 44806 44806 44806 44806 44806"
## [54] "PMC10185564 PMC_DL/PMC10185564/supplementaryfiles/41467_2023_38140_MOESM4_ESM.xlsx Hsapiens 10 44807 44808 44622 44621 44621 44624 44622 44626 44815 44812"
## [55] "PMC10185564 PMC_DL/PMC10185564/supplementaryfiles/41467_2023_38140_MOESM4_ESM.xlsx Hsapiens 35 44807 44626 44622 44621 44811 44812 44815 44626 44815 44629 44629 44811 44811 44626 44621 44815 44819 44625 44625 44622 44819 44625 44812 44811 44626 44812 44812 44812 44815 44811 44622 44812 44626 44811 44625"
## [56] "PMC10185564 PMC_DL/PMC10185564/supplementaryfiles/41467_2023_38140_MOESM6_ESM.xlsx Hsapiens 4 44625 44622 44621 44622"
## [57] "PMC10185564 PMC_DL/PMC10185564/supplementaryfiles/41467_2023_38140_MOESM6_ESM.xlsx Hsapiens 2 44622 44621"
## [58] "PMC10185564 PMC_DL/PMC10185564/supplementaryfiles/41467_2023_38140_MOESM6_ESM.xlsx Hsapiens 25 44621 44622 44622 44625 44621 44621 44621 44626 44626 44626 44621 44626 44621 44625 44622 44621 44622 44622 44622 44626 44621 44624 44622 44626 44631"
## [59] "PMC10185564 PMC_DL/PMC10185564/supplementaryfiles/41467_2023_38140_MOESM6_ESM.xlsx Hsapiens 7 44626 44622 44631 44621 44621 44622 44624"
## [60] "PMC10185564 PMC_DL/PMC10185564/supplementaryfiles/41467_2023_38140_MOESM6_ESM.xlsx Hsapiens 24 44621 44629 44626 44625 44626 44622 44621 44621 44621 44626 44626 44621 44622 44629 44625 44896 44626 44624 44626 44626 44626 44621 44631 44622"
## [61] "PMC10185564 PMC_DL/PMC10185564/supplementaryfiles/41467_2023_38140_MOESM6_ESM.xlsx Hsapiens 3 44622 44621 44631"
## [62] "PMC10185564 PMC_DL/PMC10185564/supplementaryfiles/41467_2023_38140_MOESM6_ESM.xlsx Hsapiens 31 44625 44622 44621 44621 44621 44629 44621 44621 44626 44621 44621 44621 44622 44629 44621 44622 44626 44621 44626 44621 44622 44621 44621 44626 44626 44626 44626 44622 44625 44625 44621"
## [63] "PMC10185564 PMC_DL/PMC10185564/supplementaryfiles/41467_2023_38140_MOESM5_ESM.xlsx Hsapiens 6 44807 44621 44622 44819 44815 44626"
## [64] "PMC10185564 PMC_DL/PMC10185564/supplementaryfiles/41467_2023_38140_MOESM5_ESM.xlsx Hsapiens 107 44806 44621 44811 44811 44626 44626 44806 44814 44625 44813 44811 44627 44819 44819 44811 44626 44815 44628 44622 44622 44622 44811 44811 44811 44810 44810 44810 44810 44810 44627 44627 44627 44627 44815 44815 44815 44815 44815 44815 44815 44629 44626 44626 44626 44626 44812 44812 44812 44812 44812 44628 44806 44806 44806 44806 44806 44806 44623 44623 44819 44813 44813 44813 44813 44621 44814 44814 44814 44625 44808 44809 44807 44813 44813 44806 44627 44811 44815 44806 44819 44814 44814 44622 44622 44806 44819 44814 44625 44812 44813 44813 44621 44621 44627 44626 44626 44626 44806 44806 44806 44806 44806 44625 44622 44810 44815 44629"
## [65] "PMC10185564 PMC_DL/PMC10185564/supplementaryfiles/41467_2023_38140_MOESM5_ESM.xlsx Hsapiens 108 44811 44815 44806 44806 44625 44812 44623 44627 44626 44819 44625 44811 44623 44626 44815 44806 44813 44813 44813 44811 44627 44622 44622 44622 44811 44811 44811 44810 44810 44810 44810 44627 44627 44815 44815 44815 44815 44626 44626 44812 44812 44628 44806 44806 44819 44819 44813 44813 44813 44813 44813 44621 44621 44814 44814 44814 44814 44625 44808 44809 44807 44806 44814 44806 44819 44810 44622 44622 44811 44806 44815 44627 44627 44627 44626 44626 44626 44806 44814 44814 44626 44806 44626 44806 44806 44819 44811 44812 44621 44815 44625 44815 44810 44815 44629 44626 44812 44806 44806 44806 44806 44621 44622 44629 44812 44628 44813 44815"
## [66] "PMC10185564 PMC_DL/PMC10185564/supplementaryfiles/41467_2023_38140_MOESM5_ESM.xlsx Hsapiens 24 44807 44622 44811 44622 44621 44811 44819 44812 44819 44811 44815 44625 44819 44815 44625 44812 44812 44626 44819 44815 44815 44622 44626 44625"
## [67] "PMC10185564 PMC_DL/PMC10185564/supplementaryfiles/41467_2023_38140_MOESM5_ESM.xlsx Hsapiens 8 44807 44624 44812 44621 44622 44815 44626 44622"
## [68] "PMC10185564 PMC_DL/PMC10185564/supplementaryfiles/41467_2023_38140_MOESM5_ESM.xlsx Hsapiens 35 44812 44811 44621 44815 44808 44629 44626 44811 44812 44815 44812 44819 44807 44622 44622 44626 44622 44815 44819 44812 44815 44815 44626 44626 44819 44622 44626 44625 44815 44811 44625 44815 44812 44621 44811"
## [69] "PMC10185564 PMC_DL/PMC10185564/supplementaryfiles/41467_2023_38140_MOESM5_ESM.xlsx Hsapiens 1 44807"
## [70] "PMC10185556 PMC_DL/PMC10185556/supplementaryfiles/41467_2023_38439_MOESM3_ESM.xlsx Hsapiens 1 40057"
## [71] "PMC10181825 PMC_DL/PMC10181825/supplementaryfiles/elife-86273-supp3.xlsx Mmusculus 5 44814 44805 44621 44813 44629"
## [72] "PMC10181825 PMC_DL/PMC10181825/supplementaryfiles/elife-86273-supp3.xlsx Mmusculus 3 44622 44812 44808"
## [73] "PMC10181825 PMC_DL/PMC10181825/supplementaryfiles/elife-86273-supp3.xlsx Mmusculus 1 44812"
## [74] "PMC10181825 PMC_DL/PMC10181825/supplementaryfiles/elife-86273-supp2.xlsx Mmusculus 8 44814 44805 44621 44813 44629 44622 44812 44808"
## [75] "PMC10181825 PMC_DL/PMC10181825/supplementaryfiles/elife-86273-supp2.xlsx Mmusculus 1 44812"
## [76] "PMC10164992 PMC_DL/PMC10164992/supplementaryfiles/Table_4.xlsx Hsapiens 2 43897 43900"
## [77] "PMC10164992 PMC_DL/PMC10164992/supplementaryfiles/Table_4.xlsx Hsapiens 2 43897 43900"
## [78] "PMC10164992 PMC_DL/PMC10164992/supplementaryfiles/Table_4.xlsx Hsapiens 1 43896"
## [79] "PMC10164992 PMC_DL/PMC10164992/supplementaryfiles/Table_4.xlsx Hsapiens 1 43896"
## [80] "PMC10164992 PMC_DL/PMC10164992/supplementaryfiles/Table_4.xlsx Hsapiens 1 43901"
## [81] "PMC10164992 PMC_DL/PMC10164992/supplementaryfiles/Table_4.xlsx Hsapiens 1 43901"
## [82] "PMC10156726 PMC_DL/PMC10156726/supplementaryfiles/41467_2023_37909_MOESM7_ESM.xlsx Drerio 1 44076"
## [83] "PMC10156726 PMC_DL/PMC10156726/supplementaryfiles/41467_2023_37909_MOESM10_ESM.xlsx Drerio 2 43892 44077"
## [84] "PMC10151398 zip/Table_S1_G3-2023-404112.xlsx Dmelanogaster 18 44806 44805 44809 44809 44808 44808 44808 44808 44806 44896 44808 44896 44896 44896 44808 44808 44896 44896"
## [85] "PMC10150886 PMC_DL/PMC10150886/supplementaryfiles/Table_1.XLSX Hsapiens 79 44995 44987 45178 44989 45178 45178 45178 45178 45178 45180 45178 44988 45178 45178 45178 44995 45178 45178 45178 45178 44988 45178 45174 45178 45178 44993 45178 45179 44994 44995 44995 45178 45178 44988 44993 45178 45178 45173 45180 44995 45180 45178 44995 45178 45178 45178 45178 45173 45179 45178 45178 45178 45180 44995 45178 45177 45173 44995 45178 45178 45178 44989 45178 45178 44986 44986 45184 45178 45178 45178 44989 45170 44989 45178 44996 45178 45179 45178 45183"
## [86] "PMC10150886 PMC_DL/PMC10150886/supplementaryfiles/Table_5.XLSX Hsapiens 1 44987"
## [87] "PMC10150886 PMC_DL/PMC10150886/supplementaryfiles/Table_5.XLSX Ggallus 38 45181 45181 45174 45180 44987 45178 45181 45178 44989 44993 44989 44994 44995 44995 45178 45178 44987 45261 44988 45179 45173 44995 45178 45179 44995 45178 45178 45178 45178 44989 44995 44988 45178 45179 45178 44993 44995 44995"
## [88] "PMC10150505 PMC_DL/PMC10150505/supplementaryfiles/13059_2023_2938_MOESM2_ESM.xlsx Mmusculus 25 44076 44088 43898 43891 43891 43896 43900 44081 44076 44078 43892 44077 43897 43892 44085 43893 43899 44075 44082 44084 43894 44080 44083 44079 43895"
## [89] "PMC10150242 PMC_DL/PMC10150242/supplementaryfiles/mcr-22-0635_supplementary_table_s3_suppst3.xlsx Hsapiens 4 40787 39326 37500 40057"
## [90] "PMC10150242 PMC_DL/PMC10150242/supplementaryfiles/mcr-22-0635_supplementary_table_s3_suppst3.xlsx Hsapiens 4 40787 39326 37500 40057"
## [91] "PMC10150242 PMC_DL/PMC10150242/supplementaryfiles/mcr-22-0635_supplementary_table_s3_suppst3.xlsx Hsapiens 5 37500 40787 39326 38412 40057"
## [92] "PMC10150242 PMC_DL/PMC10150242/supplementaryfiles/mcr-22-0635_supplementary_table_s3_suppst3.xlsx Hsapiens 4 37500 40787 39326 40057"
## [93] "PMC10150242 PMC_DL/PMC10150242/supplementaryfiles/mcr-22-0635_supplementary_table_s3_suppst3.xlsx Hsapiens 6 37500 40422 40787 39326 38412 40057"
## [94] "PMC10150242 PMC_DL/PMC10150242/supplementaryfiles/mcr-22-0635_supplementary_table_s3_suppst3.xlsx Hsapiens 4 39692 37500 39326 40057"
## [95] "PMC10150242 PMC_DL/PMC10150242/supplementaryfiles/mcr-22-0635_supplementary_table_s3_suppst3.xlsx Hsapiens 4 37500 40787 39326 40057"
## [96] "PMC10150242 PMC_DL/PMC10150242/supplementaryfiles/mcr-22-0635_supplementary_table_s3_suppst3.xlsx Hsapiens 4 37500 40787 39326 40057"
## [97] "PMC10150242 PMC_DL/PMC10150242/supplementaryfiles/mcr-22-0635_supplementary_table_s3_suppst3.xlsx Hsapiens 7 37500 40787 39326 40057 40422 39326 38412"
## [98] "PMC10150242 PMC_DL/PMC10150242/supplementaryfiles/mcr-22-0635_supplementary_table_s2_suppst2.xlsx Hsapiens 4 40238 40422 36951 36951"
## [99] "PMC10150242 PMC_DL/PMC10150242/supplementaryfiles/mcr-22-0635_supplementary_table_s2_suppst2.xlsx Hsapiens 6 40057 36951 36951 36951 36951 40603"
## [100] "PMC10150242 PMC_DL/PMC10150242/supplementaryfiles/mcr-22-0635_supplementary_table_s2_suppst2.xlsx Hsapiens 5 39508 40057 40057 36951 37226"
## [101] "PMC10150019 PMC_DL/PMC10150019/supplementaryfiles/Table1.XLSX Athaliana 5 44807 44840 44808 44653 44806"
## [102] "PMC10208567 zip/adg8156_Table_S2.xlsx Hsapiens 4 11-SEP 4-MAR 10-MAR 9-MAR"
## [103] "PMC10208567 zip/adg8156_Table_S2.xlsx Hsapiens 2 14-SEP 8-SEP"
## [104] "PMC10208567 zip/adg8156_Table_S2.xlsx Hsapiens 6 8-SEP 14-SEP 11-SEP 9-MAR 4-MAR 10-MAR"
## [105] "PMC10206066 PMC_DL/PMC10206066/supplementaryfiles/41540_2023_275_MOESM3_ESM.xlsx Hsapiens 5 39508 40422 39142 38412 37500"
## [106] "PMC10203218 PMC_DL/PMC10203218/supplementaryfiles/Table1.XLSX Mmusculus 11 44993 44996 44990 44994 44991 44986 44989 44987 44992 44988 44995"
## [107] "PMC10199099 PMC_DL/PMC10199099/supplementaryfiles/41467_2023_38543_MOESM7_ESM.xlsx Mmusculus 1 44987"
## [108] "PMC10199099 PMC_DL/PMC10199099/supplementaryfiles/41467_2023_38543_MOESM7_ESM.xlsx Mmusculus 1 44622"
## [109] "PMC10199099 PMC_DL/PMC10199099/supplementaryfiles/41467_2023_38543_MOESM7_ESM.xlsx Mmusculus 1 44622"
## [110] "PMC10199099 PMC_DL/PMC10199099/supplementaryfiles/41467_2023_38543_MOESM7_ESM.xlsx Mmusculus 1 44812"
## [111] "PMC10194222 PMC_DL/PMC10194222/supplementaryfiles/mmc6.xlsx Hsapiens 431 44809 44811 44812 44812 44812 44813 44813 44813 44813 44813 44813 44816 44811 44626 44626 44813 44813 44813 44813 44812 44812 44815 44622 44622 44811 44809 44626 44813 44810 44812 44622 44625 44622 44815 44807 44816 44807 44810 44623 44810 44812 44812 44812 44815 44816 44810 44810 44626 44816 44810 44625 44812 44815 44815 44815 44815 44815 44623 44812 44812 44812 44812 44812 44812 44812 44812 44811 44811 44811 44811 44816 44622 44807 44816 44809 44809 44628 44811 44811 44626 44626 44628 44811 44626 44628 44628 44628 44626 44628 44628 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44622 44622 44812 44812 44631 44813 44816 44812 44812 44814 44814 44814 44814 44814 44814 44814 44814 44814 44814 44814 44814 44814 44814 44814 44814 44812 44622 44816 44814 44628 44814 44628 44814 44816 44814 44814 44814 44814 44814 44816 44622 44628 44816 44622 44816 44622 44622 44812 44812 44631 44813 44812 44812 44812 44622 44629 44628 44628 44622 44628 44622 44812 44816 44813 44812 44810 44812 44807 44816 44807 44810 44810 44810 44816 44816 44623 44812 44807 44816 44628 44813 44813 44813 44813 44813 44813 44628 44813 44813 44813 44813 44813 44628 44628 44628 44628 44628 44809 44622 44805 44623 44805 44622 44816 44816 44622 44622 44622 44622 44806 44806 44812 44813 44806 44806 44806 44806 44806 44806 44806 44806 44806 44806 44806 44806 44806 44806 44806 44806 44806 44806 44806 44806 44806 44806 44806 44806 44806 44806 44806 44806 44806 44624 44812 44806 44806 44812 44812 44812 44812 44812 44807 44806 44806 44806 44806 44806 44806 44806 44813 44631 44622 44814 44814 44814 44814 44814 44814 44814 44814 44814 44814 44814 44814 44814 44814 44814 44814 44622 44622 44814 44814 44814 44622 44814 44814 44814 44814 44814 44806 44809 44628 44813 44813 44813 44813 44813 44813 44816 44628 44813 44813 44813 44813 44813 44807 44624 44816 44628 44807 44623 44628 44628 44816 44816 44628 44628 44623 
44623 44623 44623 44623 44816 44807 44816 44813 44813 44813 44813 44813 44813 44810 44629 44810 44810 44810 44624 44624 44813 44626 44626 44626 44626 44626 44626 44626 44813 44813 44813 44813 44813 44813 44813 44809 44626 44626 44626 44628 44626 44811 44811 44811 44811 44816 44816 44816 44816 44816 44625 44625 44631 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44626 44626 44626 44624 44626 44806 44814 44814 44814 44814 44814 44814 44814 44814 44814 44814 44814 44806 44806 44814 44806 44622 44809 44812 44812 44812 44812 44812 44809 44812 44622 44624 44812 44812 44812 44622 44622 44812 44812 44812 44812 44812 44812 44812 44812 44812 44622 44622"
## [112] "PMC10188588 PMC_DL/PMC10188588/supplementaryfiles/41467_2023_37714_MOESM3_ESM.xlsx Hsapiens 1 44531"
## [113] "PMC10188112 PMC_DL/PMC10188112/supplementaryfiles/elife-82249-supp5.xlsx Drerio 1 44628"
## [114] "PMC10188112 PMC_DL/PMC10188112/supplementaryfiles/elife-82249-supp5.xlsx Drerio 20 44816 44629 44816 44621 44628 44814 44816 44819 44621 44814 44810 44621 44629 44816 44819 44814 44816 44816 44629 44628"
## [115] "PMC10183592 PMC_DL/PMC10183592/supplementaryfiles/Table_1.xlsx Hsapiens 2 44627 44806"
## [116] "PMC10183018 PMC_DL/PMC10183018/supplementaryfiles/41598_2023_34467_MOESM2_ESM.xlsx Hsapiens 1 42804"
## [117] "PMC10179737 zip/Supplementary_Table_S2.xlsx Hsapiens 2 44444 44266"
## [118] "PMC10179737 zip/Supplementary_Table_S2.xlsx Hsapiens 3 44257 44450 44261"
## [119] "PMC10174503 PMC_DL/PMC10174503/supplementaryfiles/pgen.1010566.s005.xlsx Mmusculus 17 44257 44263 44447 44259 44446 44441 44453 44260 44454 44262 44451 44258 44449 44444 44289 44265 44266"
## [120] "PMC10174503 PMC_DL/PMC10174503/supplementaryfiles/pgen.1010566.s006.xlsx Mmusculus 2 43711 43712"
## [121] "PMC10174503 PMC_DL/PMC10174503/supplementaryfiles/pgen.1010566.s006.xlsx Mmusculus 2 43711 43712"
## [122] "PMC10174322 PMC_DL/PMC10174322/supplementaryfiles/Table6.xls Hsapiens 1 44621"
## [123] "PMC10174322 PMC_DL/PMC10174322/supplementaryfiles/Table6.xls Hsapiens 30 44986 44986 44986 44986 44986 44986 44986 44986 44986 44986 44986 44986 44986 44986 44986 44986 44986 44986 44986 44986 44986 44986 44986 44986 44986 44986 44986 44986 44986 44986"
## [124] "PMC10174054 zip/The_differentially_expressed_genes_list_in_GSE58708_dataset.xls Hsapiens 2 2023/09/06 2023/09/01"
## [125] "PMC10170458 PMC_DL/PMC10170458/supplementaryfiles/41586_2023_6003_MOESM16_ESM.xlsx Hsapiens 23 37135 39326 37500 37500 39326 40787 39326 39692 37500 38961 39326 40787 40057 38961 37500 40057 40422 39692 40422 40057 39326 37500 39692"
## [126] "PMC10170458 PMC_DL/PMC10170458/supplementaryfiles/41586_2023_6003_MOESM16_ESM.xlsx Hsapiens 11 37500 37500 36951 37500 40787 39326 40057 39326 37316 42248 39326"
## [127] "PMC10170458 PMC_DL/PMC10170458/supplementaryfiles/41586_2023_6003_MOESM15_ESM.xlsx Hsapiens 2 37500 37135"
## [128] "PMC10170458 PMC_DL/PMC10170458/supplementaryfiles/41586_2023_6003_MOESM15_ESM.xlsx Hsapiens 3 36951 42248 37316"
## [129] "PMC10171335 PMC_DL/PMC10171335/supplementaryfiles/mmc6.xlsx Hsapiens 2 44622 44621"
## [130] "PMC10171335 PMC_DL/PMC10171335/supplementaryfiles/mmc6.xlsx Hsapiens 2 44626 44631"
## [131] "PMC10171335 PMC_DL/PMC10171335/supplementaryfiles/mmc6.xlsx Hsapiens 2 44622 44621"
## [132] "PMC10171335 PMC_DL/PMC10171335/supplementaryfiles/mmc6.xlsx Hsapiens 1 44813"
## [133] "PMC10171335 PMC_DL/PMC10171335/supplementaryfiles/mmc6.xlsx Hsapiens 2 44812 44623"
## [134] "PMC10171335 PMC_DL/PMC10171335/supplementaryfiles/mmc6.xlsx Hsapiens 2 44622 44621"
## [135] "PMC10171335 PMC_DL/PMC10171335/supplementaryfiles/mmc6.xlsx Hsapiens 1 44811"
## [136] "PMC10170761 PMC_DL/PMC10170761/supplementaryfiles/13072_2023_491_MOESM9_ESM.xlsx Hsapiens 1 44807"
## [137] "PMC10170086 PMC_DL/PMC10170086/supplementaryfiles/41598_2023_34519_MOESM6_ESM.xlsx Hsapiens 6 44896 44630 44621 44622 44623 44627"
## [138] "PMC10169700 zip/PNASNEXUS-PNASNEXUS-2022-00816-T-s02.xlsx Athaliana 1 44440"
## [139] "PMC10167087 PMC_DL/PMC10167087/supplementaryfiles/12672_2023_671_MOESM1_ESM.xlsx Hsapiens 1 44258"
## [140] "PMC10167087 PMC_DL/PMC10167087/supplementaryfiles/12672_2023_671_MOESM1_ESM.xlsx Hsapiens 7 44257 44257 44260 44264 44442 44444 44448"
## [141] "PMC10160042 PMC_DL/PMC10160042/supplementaryfiles/41398_2023_2449_MOESM1_ESM.xlsx Hsapiens 9 4-octyl itaconate 4-methylene-2-octyl-5-oxotetrahydrofuran-3-carboxylic acid 3-deoxy-2-octulosonic acid(2)-lipid A 10-decarbamoylmitomycin C"
## [142] "PMC10158796 PMC_DL/PMC10158796/supplementaryfiles/crc-22-0301-s04.xlsx Hsapiens 273 45184 45184 44990 45261 44990 45178 37316 44990 44990 44996 37316 37316 37316 44989 44993 44990 45184 37316 44986 45261 44995 44986 37316 45172 37316 45179 45181 44991 44990 44996 45175 44987 45261 44986 45173 45176 45172 44990 44986 45180 45183 45176 45183 45172 45171 44992 45172 45172 44987 44990 44987 37316 44986 45261 45183 45181 45178 44989 45170 45175 45176 45180 44986 45181 44988 45178 45170 45170 45170 44986 44996 44987 44991 44986 44986 44988 44995 44993 44993 45175 44986 44988 45179 44991 45179 44994 45177 45178 45178 44989 45261 44986 45178 45178 44986 45175 44987 44987 44995 44991 45177 44986 45181 45171 44987 45171 44986 45184 44994 44993 45180 37316 45172 45180 45170 44991 45171 44987 45179 45184 45176 45184 45172 44988 44995 45171 44989 45175 45183 45176 44995 44986 44987 44992 44996 45184 44988 44995 45261 44996 44986 44992 44995 44991 44992 45171 45177 45183 45181 45173 44987 45170 44993 44992 45175 45180 44991 45179 45261 45173 45175 45176 44987 44989 45261 45177 45183 37316 45184 44996 44993 45179 45177 45181 45176 44987 45183 44987 44989 45183 44988 44991 44990 45175 45177 45179 45184 44993 44987 45171 44989 44989 45179 45184 45180 45180 44995 44991 44992 44996 44992 44993 45172 45171 45183 44987 45183 45178 44993 44993 44986 45170 44987 44988 45177 44994 44990 45170 44986 44995 44992 44991 44987 44986 44995 44992 44986 45175 45178 45176 45181 45177 45171 44989 45173 44996 45178 45170 45172 44989 45177 44992 45261 45181 44996 45175 45181 45173 45179 45180 45177 45173 44988 44987 45172 44987 45170 45179 45173 45180 44988 45173 45171 45261 44996 44988 45180 45181 45173 44987 44994 45173 44994"
## [143] "PMC10158796 PMC_DL/PMC10158796/supplementaryfiles/crc-22-0301-s04.xlsx Hsapiens 273 44987 44995 45172 44991 45175 45178 45178 45170 45261 44986 44987 44989 44987 45184 45173 44996 44986 45173 45173 45184 45175 45261 44991 45172 45181 44987 37316 45175 45183 45175 45179 44989 45173 45173 37316 44987 45178 44995 45183 44994 44990 44996 37316 44991 44996 45175 37316 37316 45175 44987 45261 45261 44991 44986 45177 44995 44987 44988 37316 44992 44991 45178 45172 37316 45177 45170 45178 44991 45172 44990 45183 44996 45172 44991 45175 45176 37316 45178 44987 45170 45176 44988 45172 45181 44987 44986 45177 45180 45177 45171 45261 45179 45171 45170 44988 44992 45171 44993 45181 45176 45176 45180 44990 45181 45179 44996 44987 44989 44992 44987 44988 37316 45179 45171 44986 44989 44995 44993 45178 45181 45173 45183 44988 45261 45179 45261 44986 45181 45184 45170 45181 45171 45184 37316 45179 44993 44986 45181 44996 45177 45176 44986 44994 45181 44988 45178 44986 45172 45183 44992 45180 44986 45177 45180 45181 44993 44995 44990 45183 44992 44990 45180 45171 45176 45171 45176 44993 44987 45261 45171 45171 44987 44994 45176 45184 44992 44990 44992 44993 45177 45170 44988 44991 44994 45173 44990 45183 44986 44989 44986 45170 44986 45177 45175 44995 45177 45183 45184 45184 45183 45179 45184 44987 44989 44991 45179 45172 45180 45180 44995 44987 44992 44996 44992 44993 44987 45172 45184 45261 44987 44990 44986 45180 45261 44996 44988 44989 45184 45177 45170 44986 45180 44987 45178 44992 45175 45173 45183 44989 44986 44989 44986 44988 45172 44988 44989 44990 44987 44986 45180 44993 45173 44986 45170 44991 45170 44996 44995 45179 45178 44993 45173 44996 44987 44995 44990 44995 44986 45179 45175 44994 45171 44993"
## [144] "PMC10158796 PMC_DL/PMC10158796/supplementaryfiles/crc-22-0301-s04.xlsx Hsapiens 273 45178 44989 45261 45173 44986 44995 44990 44993 45261 45177 44987 44988 45181 45175 45176 45184 45177 45183 44991 44986 44996 45173 45184 45173 44992 45181 45177 44987 45178 44994 44990 45173 45261 44992 45183 45184 44987 44986 44987 45177 44995 45177 45172 45183 44995 45179 45176 45184 45176 44993 44987 44987 44992 45171 44992 44992 45175 37316 37316 44989 44989 45176 44991 45179 45179 45184 45180 45180 45172 45180 44986 45180 44995 44991 44996 45184 44992 44992 44993 45179 44987 45176 44987 45183 45171 45180 45180 44986 44989 44989 45171 45179 45180 45181 44991 45184 44991 44986 45176 37316 45261 45181 44992 44990 45175 44986 45179 44996 44987 37316 45172 45171 37316 45173 44987 45170 44988 45175 45179 44986 44990 44996 44986 44991 45181 45181 44987 45179 45184 44987 45173 44986 45177 45175 45170 37316 44991 45172 44994 44993 37316 45170 44990 44995 45172 44993 45175 45170 45181 45171 44987 45171 45178 44987 44990 45261 44995 44996 45183 44995 45170 45170 44988 45171 44996 45170 45171 44986 45175 44992 44986 44986 45181 45183 44995 44990 44989 45261 45177 45175 44992 37316 44993 44989 45178 44988 45178 44986 45181 44988 45180 44986 45178 45173 44987 44987 44986 45171 45170 45177 44989 44990 45183 44987 44988 44996 45178 45261 44988 45181 45172 45173 44986 45176 44991 45170 37316 45177 45172 45173 45176 44996 44988 44990 44990 45178 45172 45175 45172 45177 45184 45183 45175 44994 44988 45173 44995 45179 44991 44986 45179 44994 45178 44986 44989 44993 45171 44996 44991 44986 45172 44993 44989 45261 45178 44993 45183 44994 44987 44987 44988 45261 44987 45170 44996 45184 45180 44995 45183 44993 45261 37316 45180"
## [145] "PMC10158796 PMC_DL/PMC10158796/supplementaryfiles/crc-22-0301-s04.xlsx Hsapiens 273 45176 45171 45171 44995 45175 44995 45183 44986 45171 45177 44986 45175 45179 44995 44986 45175 44989 44986 45183 45172 45180 44995 45171 44987 45178 44995 45178 44993 44993 44987 45171 45177 44987 45181 45173 44988 37316 45173 37316 45178 37316 37316 37316 44996 44987 37316 45181 45170 37316 44986 44987 44990 45176 44994 44990 44990 45178 44991 45172 44990 45175 45179 44990 44986 45173 44990 45183 45178 45261 44991 45170 44986 44993 44990 45177 44991 45179 45181 45183 44995 45175 45178 37316 37316 44987 45177 45184 44990 44996 45180 45180 44987 44987 44992 45178 44992 45172 44991 45170 45179 44988 44986 45184 45177 44986 44986 44995 45261 44992 44996 45173 45178 45176 44992 45184 45181 45173 44988 44995 45175 44993 44987 45172 45183 44989 44989 45179 44989 45172 45181 45171 45177 44990 45180 45183 44992 45180 44996 44994 44996 45181 44991 44993 45183 45179 45184 45176 44993 44987 45171 44989 44989 45179 45184 45180 44987 44992 45180 44991 44992 44991 44996 44992 44993 44986 45170 45173 44987 45261 45181 44996 45172 45170 44991 44992 44996 45176 44994 44988 44986 44988 44991 45184 44987 45175 45180 45176 44987 44988 45177 45176 45171 44993 45172 44995 44986 44988 44988 45178 45181 44996 44989 45261 45173 44992 45170 44987 44989 45261 45175 44996 45171 45177 45170 44994 45261 37316 44986 44989 45177 44991 45184 44986 45170 45184 45261 44986 44993 45181 45183 44988 45176 45184 45170 45183 45177 44986 45261 45261 45172 44986 45175 44987 44993 44986 44990 45179 45173 45180 45178 44986 45170 45181 45172 44987 44988 44989 44987 45179 45179 45184 44994 45183 44987 45171 44995 45173 45172 45180 45261 45173 44987 45175"
## [146] "PMC10158796 PMC_DL/PMC10158796/supplementaryfiles/crc-22-0301-s05.xlsx Hsapiens 26 37316 44990 45176 44991 45183 45170 44992 45184 45178 45171 44994 45180 45179 44986 45181 45261 45175 45177 44989 44996 44995 44993 44987 45173 44988 45172"
## [147] "PMC10158796 PMC_DL/PMC10158796/supplementaryfiles/crc-22-0301-s05.xlsx Hsapiens 26 37316 45176 45171 44992 45181 45177 45179 45180 45170 44990 45184 44989 44995 45183 44987 44986 44994 44993 45261 44996 44988 44991 45173 45172 45178 45175"
## [148] "PMC10158796 PMC_DL/PMC10158796/supplementaryfiles/crc-22-0301-s05.xlsx Hsapiens 26 37316 44990 44992 44991 45176 45180 44989 44993 45179 44986 45173 44996 45181 45183 44995 45175 45184 45172 44994 45170 45261 45171 45178 44987 44988 45177"
## [149] "PMC10158796 PMC_DL/PMC10158796/supplementaryfiles/crc-22-0301-s05.xlsx Hsapiens 26 44992 45180 37316 45176 45171 45170 45184 45179 44991 45181 44989 44995 44993 44986 45172 44987 44996 45175 45183 44988 44990 45261 44994 45178 45173 45177"
## [150] "PMC10157938 PMC_DL/PMC10157938/supplementaryfiles/13046_2023_2671_MOESM6_ESM.xlsx Hsapiens 26 44815 44814 44810 44812 44811 44806 44627 44808 44813 44625 44623 44809 44819 44630 44631 44896 44626 44805 44629 44621 44622 44624 44628 44807 44818 44816"
## [151] "PMC10157938 PMC_DL/PMC10157938/supplementaryfiles/13046_2023_2671_MOESM5_ESM.xls Hsapiens 26 44084 44085 43895 43893 43897 44076 44089 44083 44082 43891 44081 44166 43896 43892 44080 43898 44088 43899 44078 44086 44077 44075 43900 44079 43901 43894"
## [152] "PMC10157938 PMC_DL/PMC10157938/supplementaryfiles/13046_2023_2671_MOESM4_ESM.xlsx Hsapiens 8 44628 44626 44621 44625 44629 44627 44622 44623"
## [153] "PMC10156747 PMC_DL/PMC10156747/supplementaryfiles/41419_2023_5827_MOESM6_ESM.xlsx Hsapiens 5 42248 42248 39142 40787 38777"
## [154] "PMC10154557 PMC_DL/PMC10154557/supplementaryfiles/Table_4.xlsx Hsapiens 34 45176 45184 45171 45176 45178 44991 45175 45170 45173 45170 45180 45178 44986 45184 45176 45175 45178 45184 45176 45171 45175 45171 45176 45171 45178 45175 45176 45184 45184 45170 45184 45180 44987 45184"
## [155] "PMC10154557 PMC_DL/PMC10154557/supplementaryfiles/Table_3.xlsx Hsapiens 34 44986 45176 45078 26177 45175 45170 45178 45184 33482 44996 45079 45180 26543 22525 45080 45081 44997 45082 45083 45084 45085 45181 22890 45086 44987 45087 44998 45088 45089 45171 26908 45090 45091 45182"
## [156] "PMC10154557 PMC_DL/PMC10154557/supplementaryfiles/Table_6.xlsx Hsapiens 3 44991 44987 45180"
## [157] "PMC10150489 PMC_DL/PMC10150489/supplementaryfiles/40104_2023_864_MOESM14_ESM.xlsx Hsapiens 1 44627"
## [158] "PMC10150489 PMC_DL/PMC10150489/supplementaryfiles/40104_2023_864_MOESM14_ESM.xlsx Hsapiens 2 44621 44627"
## [159] "PMC10150489 PMC_DL/PMC10150489/supplementaryfiles/40104_2023_864_MOESM14_ESM.xlsx Hsapiens 4 44627 44627 44626 44626"
## [160] "PMC10150489 PMC_DL/PMC10150489/supplementaryfiles/40104_2023_864_MOESM12_ESM.xlsx Hsapiens 52 44625 44621 44621 44628 44628 44628 44628 44628 44623 44621 44627 44627 44631 44626 44626 44621 44621 44624 44627 44627 44627 44627 44627 44627 44626 44626 44626 44626 44627 44627 44627 44627 44627 44627 44627 44627 44624 44622 44622 44623 44626 44626 44626 44626 44626 44626 44626 44621 44625 44627 44622 44626"
## [161] "PMC10150489 PMC_DL/PMC10150489/supplementaryfiles/40104_2023_864_MOESM12_ESM.xlsx Hsapiens 52 44625 44621 44621 44628 44628 44628 44628 44628 44623 44621 44627 44627 44631 44626 44626 44621 44621 44624 44627 44627 44627 44627 44627 44627 44626 44626 44626 44626 44627 44627 44627 44627 44627 44627 44627 44627 44624 44622 44622 44623 44626 44626 44626 44626 44626 44626 44626 44621 44625 44627 44622 44626"
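The five-digit values in the records above are consistent with Excel date serial numbers, which is what a gene name looks like once it has been converted to a date and read back as a number. As a minimal sketch, assuming the workbooks use Excel's default 1900 date system (for which the usual R origin is `1899-12-30`), the serials can be turned back into calendar dates. The helper name `serial_to_date` is hypothetical and is not part of the original script.

```r
# Hypothetical helper: convert Excel serial numbers to Date objects,
# assuming the 1900 date system (origin 1899-12-30 in R).
serial_to_date <- function(serial) {
  as.Date(as.numeric(serial), origin = "1899-12-30")
}

# A few serials taken from the records above
serials <- c(44986, 45170, 45261, 37316)
data.frame(serial = serials, date = serial_to_date(serials))
```

Under that assumption, 44986 maps to 2023-03-01, 45170 to 2023-09-01 and 45261 to 2023-12-01, which is the kind of date that names such as MARCH1, SEPT1 and DEC1 could plausibly turn into.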
Let’s investigate the errors in more detail.
# By species
SPECIES <- sapply(strsplit(ERROR_GENELISTS," "),"[[",3)
table(SPECIES)
## SPECIES
## Athaliana Celegans Dmelanogaster Drerio Ggallus
## 4 3 1 15 2
## Hsapiens Mmusculus
## 122 14
par(mar=c(5,12,4,2))
barplot(table(SPECIES),horiz=TRUE,las=1)
par(mar=c(5,5,4,2))
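The species tally relies on each entry of ERROR_GENELISTS being a space-delimited record: PMC ID, file path, species, number of errors, then the converted values. The sketch below parses a few toy records in that layout into a data frame. The record values and the names `records`, `parsed` and `n_values` are made up for illustration, and the approach assumes that file paths contain no spaces.

```r
# Toy records in the same space-delimited layout as the listing above
# (hypothetical values, not taken from the report).
records <- c(
  "PMC0000001 PMC_DL/PMC0000001/supplementaryfiles/a.xlsx Hsapiens 3 44986 45170 45261",
  "PMC0000001 PMC_DL/PMC0000001/supplementaryfiles/b.xlsx Hsapiens 1 44987",
  "PMC0000002 PMC_DL/PMC0000002/supplementaryfiles/c.xlsx Mmusculus 2 44621 44627"
)

fields <- strsplit(records, " ")

# The first four fields are fixed; everything after them is a converted value
parsed <- data.frame(
  pmcid   = sapply(fields, "[[", 1),
  file    = sapply(fields, "[[", 2),
  species = sapply(fields, "[[", 3),
  n_err   = as.numeric(sapply(fields, "[[", 4)),
  stringsAsFactors = FALSE
)

# Sanity check: the stated error count should match the number of trailing values
n_values <- lengths(fields) - 4
all(parsed$n_err == n_values)

table(parsed$species)
```

A file path containing a space would shift every later field by one position, so a check like the one on `n_values` is a cheap way to catch malformed records before tabulating.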
# Number of affected Excel files per paper
DIST <- table(sapply(strsplit(ERROR_GENELISTS," "),"[[",1))
DIST
##
## PMC10150019 PMC10150242 PMC10150489 PMC10150505 PMC10150886 PMC10151398
## 1 12 5 1 3 1
## PMC10154557 PMC10156726 PMC10156747 PMC10157938 PMC10158796 PMC10160042
## 3 2 1 3 8 1
## PMC10164992 PMC10167087 PMC10169700 PMC10170086 PMC10170458 PMC10170761
## 6 2 1 1 4 1
## PMC10171335 PMC10174054 PMC10174322 PMC10174503 PMC10179737 PMC10181825
## 7 1 2 3 2 5
## PMC10183018 PMC10183592 PMC10185556 PMC10185564 PMC10188112 PMC10188588
## 1 1 1 18 2 1
## PMC10191288 PMC10191432 PMC10194222 PMC10199099 PMC10199912 PMC10200853
## 1 2 1 4 1 1
## PMC10202655 PMC10203218 PMC10203276 PMC10203443 PMC10203873 PMC10206047
## 2 1 1 2 1 1
## PMC10206066 PMC10206494 PMC10208567 PMC10209063 PMC10209733 PMC10210411
## 1 1 3 7 2 9
## PMC10210621 PMC10214295 PMC10214836 PMC10216487 PMC10216680 PMC10217816
## 1 1 1 1 1 1
## PMC10218407 PMC10225086 PMC10225930 PMC10226651
## 8 3 2 1
summary(as.numeric(DIST))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.000 1.000 2.776 3.000 18.000
hist(DIST,main="Number of affected Excel files per paper")
# PMC articles with the most affected Excel files
DIST_DF <- as.data.frame(DIST)
DIST_DF <- DIST_DF[order(-DIST_DF$Freq),,drop=FALSE]
head(DIST_DF,20)
## Var1 Freq
## 28 PMC10185564 18
## 2 PMC10150242 12
## 48 PMC10210411 9
## 11 PMC10158796 8
## 55 PMC10218407 8
## 19 PMC10171335 7
## 46 PMC10209063 7
## 13 PMC10164992 6
## 3 PMC10150489 5
## 24 PMC10181825 5
## 17 PMC10170458 4
## 34 PMC10199099 4
## 5 PMC10150886 3
## 7 PMC10154557 3
## 10 PMC10157938 3
## 22 PMC10174503 3
## 45 PMC10208567 3
## 56 PMC10225086 3
## 8 PMC10156726 2
## 14 PMC10167087 2
MOST_ERR_FILES = as.character(DIST_DF[1,1])
MOST_ERR_FILES
## [1] "PMC10185564"
# Number of errors per paper
NERR <- as.numeric(sapply(strsplit(ERROR_GENELISTS," "),"[[",4))
names(NERR) <- sapply(strsplit(ERROR_GENELISTS," "),"[[",1)
NERR <-tapply(NERR, names(NERR), sum)
NERR
## PMC10150019 PMC10150242 PMC10150489 PMC10150505 PMC10150886 PMC10151398
## 5 57 111 25 118 18
## PMC10154557 PMC10156726 PMC10156747 PMC10157938 PMC10158796 PMC10160042
## 71 3 5 60 1196 9
## PMC10164992 PMC10167087 PMC10169700 PMC10170086 PMC10170458 PMC10170761
## 8 8 1 6 39 1
## PMC10171335 PMC10174054 PMC10174322 PMC10174503 PMC10179737 PMC10181825
## 12 2 31 21 5 18
## PMC10183018 PMC10183592 PMC10185556 PMC10185564 PMC10188112 PMC10188588
## 1 2 1 440 21 1
## PMC10191288 PMC10191432 PMC10194222 PMC10199099 PMC10199912 PMC10200853
## 1 56 431 4 6 27
## PMC10202655 PMC10203218 PMC10203276 PMC10203443 PMC10203873 PMC10206047
## 108 11 5 3 2 1
## PMC10206066 PMC10206494 PMC10208567 PMC10209063 PMC10209733 PMC10210411
## 5 12 12 89 13 59
## PMC10210621 PMC10214295 PMC10214836 PMC10216487 PMC10216680 PMC10217816
## 1 3 1 26 1 1
## PMC10218407 PMC10225086 PMC10225930 PMC10226651
## 144 6 10 5
hist(NERR,main="Number of errors per PMC article")
NERR_DF <- as.data.frame(NERR)
NERR_DF <- NERR_DF[order(-NERR_DF$NERR),,drop=FALSE]
head(NERR_DF,20)
## NERR
## PMC10158796 1196
## PMC10185564 440
## PMC10194222 431
## PMC10218407 144
## PMC10150886 118
## PMC10150489 111
## PMC10202655 108
## PMC10209063 89
## PMC10154557 71
## PMC10157938 60
## PMC10210411 59
## PMC10150242 57
## PMC10191432 56
## PMC10170458 39
## PMC10174322 31
## PMC10200853 27
## PMC10216487 26
## PMC10150505 25
## PMC10174503 21
## PMC10188112 21
MOST_ERR = rownames(NERR_DF)[1]
MOST_ERR
## [1] "PMC10158796"
GENELIST_ERROR_ARTICLES <- gsub("PMC","",GENELIST_ERROR_ARTICLES)
# JSON parsing is more reliable than XML
ARTICLES <- esummary( GENELIST_ERROR_ARTICLES , db="pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLES,as= "parsed")
ARTICLE_DATA <- ARTICLE_DATA$result
ARTICLE_DATA <- ARTICLE_DATA[2:length(ARTICLE_DATA)]
JOURNALS <- unlist(lapply(ARTICLE_DATA,function(x) {x$fulljournalname} ))
JOURNALS_TABLE <- table(JOURNALS)
JOURNALS_TABLE <- JOURNALS_TABLE[order(-JOURNALS_TABLE)]
length(JOURNALS_TABLE)
## [1] 42
NUM_JOURNALS=length(JOURNALS_TABLE)
par(mar=c(5,25,4,2))
barplot(head(JOURNALS_TABLE,10), horiz=TRUE, las=1,
xlab="Articles with gene name errors in supp files",
main="Top journals this month")
Congrats to our Journal of the Month winner!
JOURNAL_WINNER <- names(head(JOURNALS_TABLE,1))
JOURNAL_WINNER
## [1] "Nature Communications"
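Taking the first element of the sorted table picks a single journal even when several journals share the top count. As a hedged sketch on toy data (the journal names and the variables `journals`, `journal_counts` and `top_journals` are hypothetical), all tied journals can be listed like this:

```r
# Toy journal names, one per affected article (hypothetical values)
journals <- c("Journal B", "Journal A", "Journal B", "Journal A", "Journal C")

# Count articles per journal; table() orders the names alphabetically
journal_counts <- table(journals)

# Keep every journal whose count equals the maximum, not just the first one
top_journals <- names(journal_counts)[as.vector(journal_counts) == max(journal_counts)]
top_journals
```

Here both "Journal A" and "Journal B" have two articles, so both are returned.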
There are two categories:
Paper with the most supplementary files affected by gene name errors (MOST_ERR_FILES)
Paper with the most gene names converted to dates (MOST_ERR)
Sometimes, one paper can win both categories. Congrats to our winners.
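The two categories measure different things, so the winners can differ. The sketch below reproduces both tallies on toy records (all values and variable names are hypothetical) and checks whether one paper tops both. It also counts distinct file paths per paper, because the listing earlier in this report repeats some paths, which suggests that the first tally counts records rather than unique files.

```r
# Toy records in the layout "PMCID file species n_errors" (hypothetical values)
records <- c(
  "PMC1 PMC_DL/PMC1/supplementaryfiles/s1.xlsx Hsapiens 2",
  "PMC1 PMC_DL/PMC1/supplementaryfiles/s1.xlsx Hsapiens 1",
  "PMC1 PMC_DL/PMC1/supplementaryfiles/s2.xlsx Hsapiens 1",
  "PMC2 PMC_DL/PMC2/supplementaryfiles/s1.xlsx Hsapiens 50"
)

fields <- strsplit(records, " ")
pmcid  <- sapply(fields, "[[", 1)
file   <- sapply(fields, "[[", 2)
n_err  <- as.numeric(sapply(fields, "[[", 4))

# Category 1: number of records per paper (the same tally as DIST)
records_per_paper <- table(pmcid)

# Variant: number of distinct file paths per paper
files_per_paper <- tapply(file, pmcid, function(f) length(unique(f)))

# Category 2: total number of converted gene names per paper (the same tally as NERR)
errors_per_paper <- tapply(n_err, pmcid, sum)

most_records <- names(which.max(records_per_paper))
most_errors  <- names(which.max(errors_per_paper))

# TRUE only when one paper wins both categories
same_winner <- identical(most_records, most_errors)
same_winner
```

In this toy case PMC1 has the most records while PMC2 has the most errors, so `same_winner` is FALSE.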
MOST_ERR_FILES <- gsub("PMC","",MOST_ERR_FILES)
ARTICLES <- esummary( MOST_ERR_FILES , db="pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLES,as= "parsed")
ARTICLE_DATA <- ARTICLE_DATA[2]
ARTICLE_DATA
## $result
## $result$uids
## [1] "10185564"
##
## $result$`10185564`
## $result$`10185564`$uid
## [1] "10185564"
##
## $result$`10185564`$pubdate
## [1] "2023 May 15"
##
## $result$`10185564`$epubdate
## [1] "2023 May 15"
##
## $result$`10185564`$printpubdate
## [1] ""
##
## $result$`10185564`$source
## [1] "Nat Commun"
##
## $result$`10185564`$authors
## name authtype
## 1 Ursini G Author
## 2 Di Carlo P Author
## 3 Mukherjee S Author
## 4 Chen Q Author
## 5 Han S Author
## 6 Kim J Author
## 7 Deyssenroth M Author
## 8 Marsit CJ Author
## 9 Chen J Author
## 10 Hao K Author
## 11 Punzi G Author
## 12 Weinberger DR Author
##
## $result$`10185564`$title
## [1] "Prioritization of potential causative genes for schizophrenia in placenta"
##
## $result$`10185564`$volume
## [1] "14"
##
## $result$`10185564`$issue
## [1] ""
##
## $result$`10185564`$pages
## [1] "2613"
##
## $result$`10185564`$articleids
## idtype value
## 1 pmid 37188697
## 2 doi 10.1038/s41467-023-38140-1
## 3 pmcid PMC10185564
##
## $result$`10185564`$fulljournalname
## [1] "Nature Communications"
##
## $result$`10185564`$sortdate
## [1] "2023/05/15 00:00"
##
## $result$`10185564`$pmclivedate
## [1] "2023/05/17"
MOST_ERR <- gsub("PMC","",MOST_ERR)
ARTICLE_DATA <- esummary(MOST_ERR,db = "pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLE_DATA,as= "parsed")
ARTICLE_DATA
## $header
## $header$type
## [1] "esummary"
##
## $header$version
## [1] "0.3"
##
##
## $result
## $result$uids
## [1] "10158796"
##
## $result$`10158796`
## $result$`10158796`$uid
## [1] "10158796"
##
## $result$`10158796`$pubdate
## [1] "2023 May 4"
##
## $result$`10158796`$epubdate
## [1] "2023 May 4"
##
## $result$`10158796`$printpubdate
## [1] ""
##
## $result$`10158796`$source
## [1] "Cancer Res Commun"
##
## $result$`10158796`$authors
## name authtype
## 1 Erasimus H Author
## 2 Kolnik V Author
## 3 Lacroix F Author
## 4 Sidhu S Author
## 5 D'Agostino S Author
## 6 Lemaitre O Author
## 7 Rohaut A Author
## 8 Sanchez I Author
## 9 Thill G Author
## 10 Didier M Author
## 11 Debussche L Author
## 12 Marcireau C Author
##
## $result$`10158796`$title
## [1] "Genome-wide CRISPR Screen Reveals RAB10 as a Synthetic Lethal Gene in Colorectal and Pancreatic Cancers Carrying SMAD4 Loss"
##
## $result$`10158796`$volume
## [1] "3"
##
## $result$`10158796`$issue
## [1] "5"
##
## $result$`10158796`$pages
## [1] "780-792"
##
## $result$`10158796`$articleids
## idtype value
## 1 pmid 0
## 2 doi 10.1158/2767-9764.CRC-22-0301
## 3 pmcid PMC10158796
##
## $result$`10158796`$fulljournalname
## [1] "Cancer Research Communications"
##
## $result$`10158796`$sortdate
## [1] "2023/05/04 00:00"
##
## $result$`10158796`$pmclivedate
## [1] "2023/05/05"
Next, plot the trend over the past 6-12 months, using the reports already published online.
url <- "https://ziemann-lab.net/public/gene_name_errors/"
doc <- htmlParse(url)
links <- xpathSApply(doc, "//a/@href")
links <- links[grep("html",links)]
listing <- htmlParse( getURL(url, ftp.use.epsv = FALSE, dirlistonly = TRUE) )
listing <- xpathSApply(listing, "//a/@href")
listing <- listing[grep("html",listing)]
unlink("online_files/",recursive=TRUE)
dir.create("online_files")
sapply(listing, function(mylink) {
download.file(paste(url,mylink,sep=""),destfile=paste("online_files/",mylink,sep=""))
} )
## href href href href href href href href href href href href href href href href
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## href href href href href href href href href href href
## 0 0 0 0 0 0 0 0 0 0 0
myfilelist <- list.files("online_files/",full.names=TRUE)
trends <- sapply(myfilelist, function(myfilename) {
x <- readLines(myfilename)
# Num XL gene list articles
NUM_GENELIST_ARTICLES <- x[grep("NUM_GENELIST_ARTICLES",x)[3]+1]
NUM_GENELIST_ARTICLES <- sapply(strsplit(NUM_GENELIST_ARTICLES," "),"[[",3)
NUM_GENELIST_ARTICLES <- sapply(strsplit(NUM_GENELIST_ARTICLES,"<"),"[[",1)
NUM_GENELIST_ARTICLES <- as.numeric(NUM_GENELIST_ARTICLES)
# number of affected articles
NUM_ERROR_GENELIST_ARTICLES <- x[grep("NUM_ERROR_GENELIST_ARTICLES",x)[3]+1]
NUM_ERROR_GENELIST_ARTICLES <- sapply(strsplit(NUM_ERROR_GENELIST_ARTICLES," "),"[[",3)
NUM_ERROR_GENELIST_ARTICLES <- sapply(strsplit(NUM_ERROR_GENELIST_ARTICLES,"<"),"[[",1)
NUM_ERROR_GENELIST_ARTICLES <- as.numeric(NUM_ERROR_GENELIST_ARTICLES)
# Error proportion
ERROR_PROPORTION <- x[grep("ERROR_PROPORTION",x)[3]+1]
ERROR_PROPORTION <- sapply(strsplit(ERROR_PROPORTION," "),"[[",3)
ERROR_PROPORTION <- sapply(strsplit(ERROR_PROPORTION,"<"),"[[",1)
ERROR_PROPORTION <- as.numeric(ERROR_PROPORTION)
# number of journals
NUM_JOURNALS <- x[grep('JOURNALS_TABLE',x)[3]+1]
NUM_JOURNALS <- sapply(strsplit(NUM_JOURNALS," "),"[[",3)
NUM_JOURNALS <- sapply(strsplit(NUM_JOURNALS,"<"),"[[",1)
NUM_JOURNALS <- as.numeric(NUM_JOURNALS)
NUM_JOURNALS
res <- c(NUM_GENELIST_ARTICLES,NUM_ERROR_GENELIST_ARTICLES,ERROR_PROPORTION,NUM_JOURNALS)
return(res)
})
colnames(trends) <- sapply(strsplit(colnames(trends),"_"),"[[",3)
colnames(trends) <- gsub(".html","",colnames(trends),fixed=TRUE)
trends <- as.data.frame(trends)
rownames(trends) <- c("NUM_GENELIST_ARTICLES","NUM_ERROR_GENELIST_ARTICLES","ERROR_PROPORTION","NUM_JOURNALS")
trends <- t(trends)
trends <- as.data.frame(trends)
CURRENT_RES <- c(NUM_GENELIST_ARTICLES,NUM_ERROR_GENELIST_ARTICLES,ERROR_PROPORTION,NUM_JOURNALS)
trends <- rbind(trends,CURRENT_RES)
paste(CURRENT_YEAR,CURRENT_MONTH,sep="-")
## [1] "2023-06"
rownames(trends)[nrow(trends)] <- paste(CURRENT_YEAR,CURRENT_MONTH,sep="-")
plot(trends$NUM_GENELIST_ARTICLES, xaxt = "n" , type="b" , main="Number of articles with Excel gene lists per month",
ylab="number of articles", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
plot(trends$NUM_ERROR_GENELIST_ARTICLES, xaxt = "n" , type="b" , main="Number of articles with gene name errors per month",
ylab="number of articles", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
plot(trends$ERROR_PROPORTION, xaxt = "n" , type="b" , main="Proportion of articles with Excel gene list affected by errors",
ylab="proportion", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
plot(trends$NUM_JOURNALS, xaxt = "n" , type="b" , main="Number of journals with affected articles",
ylab="number of journals", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
unlink("online_files/",recursive=TRUE)
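The scraping step above pulls each number out of a rendered report by taking the line after the variable name, keeping the third space-delimited token and cutting it at the first `<`. A minimal sketch of that token logic follows, assuming a knitted HTML line shaped like `## [1] 42</code></pre>`. Both the example lines and the helper name `extract_value` are hypothetical.

```r
# Hypothetical helper mirroring the two strsplit() steps used above
extract_value <- function(line) {
  token <- strsplit(line, " ")[[1]][3]    # third token, e.g. "42</code></pre>"
  token <- strsplit(token, "<")[[1]][1]   # drop any trailing HTML tags
  as.numeric(token)
}

# Lines as they might appear in a knitted HTML report (assumed format)
extract_value("## [1] 42</code></pre>")
extract_value("## [1] 0.25</code></pre>")
```

This position-based parsing is brittle: if the layout of the rendered reports changes, the third token may no longer be the value, so checking the result with `is.na()` before plotting would be a reasonable safeguard.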
Zeeberg, B.R., Riss, J., Kane, D.W. et al. Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics. BMC Bioinformatics 5, 80 (2004). https://doi.org/10.1186/1471-2105-5-80
Ziemann, M., Eren, Y. & El-Osta, A. Gene name errors are widespread in the scientific literature. Genome Biol 17, 177 (2016). https://doi.org/10.1186/s13059-016-1044-7
sessionInfo()
## R version 4.3.0 (2023-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
## [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
## [7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Australia/Melbourne
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] RCurl_1.98-1.12 readxl_1.4.2 reutils_0.2.3 xml2_1.3.4
## [5] jsonlite_1.8.4 XML_3.99-0.14
##
## loaded via a namespace (and not attached):
## [1] assertthat_0.2.1 digest_0.6.31 R6_2.5.1 fastmap_1.1.1
## [5] cellranger_1.1.0 xfun_0.39 cachem_1.0.7 knitr_1.42
## [9] htmltools_0.5.5 rmarkdown_2.21 bitops_1.0-7 cli_3.6.1
## [13] sass_0.4.5 jquerylib_0.1.4 compiler_4.3.0 highr_0.10
## [17] tools_4.3.0 evaluate_0.20 bslib_0.4.2 yaml_2.3.7
## [21] rlang_1.1.1