Source: https://github.com/markziemann/GeneNameErrors2020
View the reports: http://ziemann-lab.net/public/gene_name_errors/
Gene name errors result when data are imported improperly into MS Excel and other spreadsheet programs (Zeeberg et al, 2004). Certain gene names like MARCH3, SEPT2 and DEC1 are converted into date format. These errors are surprisingly common in supplementary data files in the field of genomics (Ziemann et al, 2016). This could be considered a minor error because it affects only a small number of genes; however, it is symptomatic of poor data processing methods. The purpose of this script is to identify gene name errors present in supplementary files of PubMed Central articles from the previous month.
library("XML")
library("jsonlite")
library("xml2")
library("reutils")
library("readxl")
library("RCurl")
Here I will be getting PubMed Central IDs for the previous month.
Start with figuring out the date to search PubMed Central.
CURRENT_MONTH=format(Sys.time(), "%m")
CURRENT_YEAR=format(Sys.time(), "%Y")
if (CURRENT_MONTH == "01") {
PREV_YEAR=as.character(as.numeric(format(Sys.time(), "%Y"))-1)
PREV_MONTH="12"
} else {
PREV_YEAR=CURRENT_YEAR
PREV_MONTH=as.character(as.numeric(format(Sys.time(), "%m"))-1)
}
DATE=paste(PREV_YEAR,"/",PREV_MONTH,sep="")
DATE
## [1] "2023/3"
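The search further down builds its date window by appending "/1" and "/31" to this string. As a rough alternative sketch (not the method used in this report), the first and last day of the previous month can be computed from `Date` arithmetic, which avoids asking for day 31 of a shorter month. The helper name `prev_month_range` is hypothetical.

```r
# Hypothetical helper: first and last day of the month before `today`,
# formatted as YYYY/MM/DD strings.
prev_month_range <- function(today = Sys.Date()) {
  # First day of the current month
  first_this <- as.Date(format(today, "%Y-%m-01"))
  # Step back one calendar month to get the first day of the previous month
  first_prev <- seq(first_this, by = "-1 month", length.out = 2)[2]
  # The day before the first of this month is the last day of the previous month
  last_prev <- first_this - 1
  c(mindate = format(first_prev, "%Y/%m/%d"),
    maxdate = format(last_prev, "%Y/%m/%d"))
}

prev_month_range(as.Date("2023-04-15"))
prev_month_range(as.Date("2023-01-10"))  # January rolls back to December of the prior year
```

The two returned strings could then be passed as `mindate` and `maxdate` in place of the pasted "/1" and "/31" values.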
Let’s see how many PMC IDs we have in the past month.
QUERY ='((genom*[Abstract]))'
ESEARCH_RES <- esearch(term=QUERY, db = "pmc", rettype = "uilist", retmode = "xml", retstart = 0,
retmax = 5000000, usehistory = TRUE, webenv = NULL, querykey = NULL, sort = NULL, field = NULL,
datetype = NULL, reldate = NULL,
mindate = paste(DATE,"/1",sep="") , maxdate = paste(DATE,"/31",sep=""))
pmc <- efetch(ESEARCH_RES,retmode="text",rettype="uilist",outfile="pmcids.txt")
## Retrieving UIDs 1 to 500
## Retrieving UIDs 501 to 1000
## Retrieving UIDs 1001 to 1500
## Retrieving UIDs 1501 to 2000
## Retrieving UIDs 2001 to 2500
## Retrieving UIDs 2501 to 3000
## Retrieving UIDs 3001 to 3500
pmc <- read.table(pmc)
pmc <- paste("PMC",pmc$V1,sep="")
NUM_ARTICLES=length(pmc)
NUM_ARTICLES
## [1] 3417
writeLines(pmc,con="pmc.txt")
Now run the bash script. Note that false positives can occur (~1.5%) and these results have not been verified by a human.
Here are some definitions:
NUM_XLS = Number of supplementary Excel files in this set of PMC articles.
NUM_XLS_ARTICLES = Number of articles matching the PubMed Central search which have supplementary Excel files.
GENELISTS = The gene lists found in the Excel files. Each Excel file is counted once even if it has multiple gene lists.
NUM_GENELISTS = The number of Excel files with gene lists.
NUM_GENELIST_ARTICLES = The number of PMC articles with supplementary Excel gene lists.
ERROR_GENELISTS = Files suspected to contain gene name errors. The dates and five-digit numbers indicate transmogrified gene names.
NUM_ERROR_GENELISTS = Number of Excel gene lists with errors.
NUM_ERROR_GENELIST_ARTICLES = Number of articles with supplementary Excel gene name errors.
ERROR_PROPORTION = This is the proportion of articles with Excel gene lists that have errors.
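The definitions above rest on how many space-separated fields each line of results.txt has. The sketch below illustrates that logic on made-up lines. The field layout (article ID, file path, species, error count, converted values) is inferred from the output printed in this report, and the assumption that a clean gene list line stops after the species field is a guess rather than something the bash script documents. The PMC IDs and file names are hypothetical.

```r
# Hypothetical sample lines mimicking the assumed layout of results.txt
sample_results <- c(
  # Excel file with no gene list detected: ID and path only
  "PMC0000001 PMC_DL/PMC0000001/supplementaryfiles/Table_S1.xlsx",
  # Excel file with a gene list and no errors (assumed): ID, path, species
  "PMC0000001 PMC_DL/PMC0000001/supplementaryfiles/Table_S2.xlsx Hsapiens",
  # Gene list with errors: ID, path, species, count, converted values
  "PMC0000002 PMC_DL/PMC0000002/supplementaryfiles/Table_S1.xlsx Mmusculus 2 44621 44805"
)

# Count the space-separated fields on each line
n_fields <- lengths(strsplit(sample_results, " "))

has_genelist <- n_fields > 2  # same threshold used for GENELISTS
has_errors   <- n_fields > 3  # same threshold used for ERROR_GENELISTS

data.frame(n_fields, has_genelist, has_errors)
```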
system("./gene_names.sh pmc.txt")
results <- readLines("results.txt")
XLS <- results[grep("XLS",results,ignore.case=TRUE)]
NUM_XLS = length(XLS)
NUM_XLS
## [1] 4658
NUM_XLS_ARTICLES = length(unique(sapply(strsplit(XLS," "),"[[",1)))
NUM_XLS_ARTICLES
## [1] 887
GENELISTS <- XLS[lapply(strsplit(XLS," "),length)>2]
#GENELISTS
NUM_GENELISTS <- length(unique(sapply(strsplit(GENELISTS," "),"[[",2)))
NUM_GENELISTS
## [1] 545
NUM_GENELIST_ARTICLES <- length(unique(sapply(strsplit(GENELISTS," "),"[[",1)))
NUM_GENELIST_ARTICLES
## [1] 293
ERROR_GENELISTS <- XLS[lapply(strsplit(XLS," "),length)>3]
#ERROR_GENELISTS
NUM_ERROR_GENELISTS = length(ERROR_GENELISTS)
NUM_ERROR_GENELISTS
## [1] 260
GENELIST_ERROR_ARTICLES <- unique(sapply(strsplit(ERROR_GENELISTS," "),"[[",1))
GENELIST_ERROR_ARTICLES
## [1] "PMC10040137" "PMC10039465" "PMC10037080" "PMC10036921" "PMC10034058"
## [6] "PMC10033906" "PMC10033451" "PMC10028295" "PMC10028223" "PMC10027475"
## [11] "PMC10022859" "PMC10022277" "PMC10022194" "PMC10017747" "PMC10016690"
## [16] "PMC10016662" "PMC10010668" "PMC10009272" "PMC10008571" "PMC10005326"
## [21] "PMC10005268" "PMC10002577" "PMC10000440" "PMC10000302" "PMC9997660"
## [26] "PMC9984532" "PMC9983314" "PMC9981464" "PMC9981200" "PMC9980695"
## [31] "PMC9980301" "PMC9977441" "PMC9977313" "PMC9975319" "PMC9972295"
## [36] "PMC10045015" "PMC10043811" "PMC10041653" "PMC10035230" "PMC10033758"
## [41] "PMC10033677" "PMC10033585" "PMC10028083" "PMC10028036" "PMC10027880"
## [46] "PMC10025971" "PMC10025435" "PMC10025395" "PMC10022665" "PMC10020157"
## [51] "PMC10014956" "PMC10014730" "PMC10012531" "PMC10011742" "PMC10011429"
## [56] "PMC10011141" "PMC10011137" "PMC10011132" "PMC10010006" "PMC10008496"
## [61] "PMC10003012" "PMC10002421" "PMC9998611" "PMC9998435" "PMC9998083"
## [66] "PMC9997923" "PMC9996951" "PMC9995926" "PMC9995291" "PMC9982299"
## [71] "PMC9981717" "PMC9981215" "PMC9978639" "PMC9978323" "PMC9977725"
## [76] "PMC9977003" "PMC9975292" "PMC9975158" "PMC9972178"
NUM_ERROR_GENELIST_ARTICLES <- length(GENELIST_ERROR_ARTICLES)
NUM_ERROR_GENELIST_ARTICLES
## [1] 79
ERROR_PROPORTION = NUM_ERROR_GENELIST_ARTICLES / NUM_GENELIST_ARTICLES
ERROR_PROPORTION
## [1] 0.2696246
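For reporting, the proportion may be easier to read as a percentage. A small sketch using the counts printed above for this run (the variable names here are illustrative, not part of the original script):

```r
# Counts taken from this run's output above
num_error_articles    <- 79L
num_genelist_articles <- 293L

# Express the error proportion as a percentage with one decimal place
summary_line <- sprintf("%d of %d articles with Excel gene lists (%.1f%%)",
                        num_error_articles, num_genelist_articles,
                        100 * num_error_articles / num_genelist_articles)
summary_line
```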
Here you can have a look at all the gene lists detected in the past month, as well as those with errors. The dates are obvious errors; they are commonly dates in September, March, December and October. The five-digit numbers are dates in Excel's internal serial format, which counts days from the start of 1900. If you were to paste these numbers into Excel and format the cells as dates, most of them would also map to dates in September, March, December and October.
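The five-digit numbers can also be decoded in R without opening Excel. This is a minimal sketch, assuming the files use Excel's default 1900 date system, for which the usual R origin is "1899-12-30". The four serial numbers are examples taken from the error lists in this report.

```r
# Example serial numbers copied from the error lists in this report
serials <- c(44621, 44805, 44896, 36951)

# Convert Excel serial numbers to dates, assuming the 1900 date system.
# The 1899-12-30 origin accounts for Excel treating 1900 as a leap year.
dates <- as.Date(serials, origin = "1899-12-30")

data.frame(serial = serials, date = format(dates), month_day = format(dates, "%m-%d"))
```

These four values decode to 1 March 2022, 1 September 2022, 1 December 2022 and 1 March 2001, which is the pattern expected when names such as MARCH1, SEPT1 or DEC1 are typed into a spreadsheet. The year usually reflects when the file was edited, so the serial number alone is not enough to recover the original gene symbol.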
#GENELISTS
ERROR_GENELISTS
## [1] "PMC10040137 PMC_DL/PMC10040137/supplementaryfiles/13148_2023_1442_MOESM2_ESM.xlsx Ggallus 33 44441 44441 44445 44441 44262 44442 44445 44258 44259 44261 44441 44448 44256 44263 44449 44263 44260 44454 44444 44443 44263 44442 44445 44450 44446 44258 44258 44443 44264 44531 44449 44443 44443"
## [2] "PMC10040137 PMC_DL/PMC10040137/supplementaryfiles/13148_2023_1442_MOESM2_ESM.xlsx Hsapiens 3 44257 44258 44440"
## [3] "PMC10039465 PMC_DL/PMC10039465/supplementaryfiles/NIHMS1877580-supplement-7.xlsx Hsapiens 4 7-Sep 3-Sep 2-Sep 11-Sep"
## [4] "PMC10037080 zip/SourceData_figures.xlsx Athaliana 8 45171 45170 45017 45017 45017 45171 45170 45017"
## [5] "PMC10037080 zip/SourceData_figures.xlsx Athaliana 4 44652 44806 44652 44652"
## [6] "PMC10037080 zip/SourceData_figures.xlsx Athaliana 8 45017 45171 45170 45017 45017 45171 45170 45017"
## [7] "PMC10037080 zip/SourceData_figures.xlsx Athaliana 8 45017 45171 45170 45017 45017 45017 45171 45170"
## [8] "PMC10037080 zip/SourceData_figures.xlsx Athaliana 8 45017 45171 45170 45171 45170 45017 45017 45017"
## [9] "PMC10036921 PMC_DL/PMC10036921/supplementaryfiles/Table_1.xlsx Hsapiens 1 44994"
## [10] "PMC10036921 PMC_DL/PMC10036921/supplementaryfiles/Table_1.xlsx Hsapiens 1 45176"
## [11] "PMC10034058 PMC_DL/PMC10034058/supplementaryfiles/Table4.XLSX Hsapiens 15 38200 37104 37834 40422 37500 40787 39692 39326 37469 41153 37135 38231 40057 38596 37865"
## [12] "PMC10033906 PMC_DL/PMC10033906/supplementaryfiles/41467_2023_37266_MOESM14_ESM.xlsx Hsapiens 17 44813 44806 44626 44808 44811 44805 44810 44622 44814 44627 44815 44625 44621 44628 44818 44629 44631"
## [13] "PMC10033906 PMC_DL/PMC10033906/supplementaryfiles/41467_2023_37266_MOESM3_ESM.xlsx Hsapiens 268 44622 44622 44622 44811 44622 44622 44812 44811 44806 44813 44813 44621 44626 44812 44811 44626 44813 44626 44815 44626 44626 44815 44626 44806 44813 44622 44806 44622 44625 44626 44806 44819 44808 44808 44808 44806 44806 44806 44806 44622 44813 44813 44806 44806 44806 44811 44806 44806 44806 44806 44806 44626 44626 44626 44896 44625 44806 44806 44806 44806 44806 44811 44813 44811 44813 44806 44622 44625 44813 44814 44811 44806 44813 44814 44815 44626 44811 44813 44622 44622 44813 44815 44622 44806 44813 44806 44813 44812 44806 44806 44806 44808 44806 44626 44813 44626 44813 44806 44813 44813 44813 44815 44806 44622 44622 44806 44806 44625 44813 44626 44810 44805 44622 44813 44808 44626 44626 44811 44811 44810 44810 44621 44805 44813 44813 44813 44621 44626 44628 44814 44811 44811 44806 44806 44806 44813 44814 44626 44818 44818 44626 44627 44805 44814 44813 44627 44813 44622 44626 44810 44808 44808 44629 44627 44626 44626 44625 44625 44625 44813 44625 44625 44625 44806 44805 44813 44813 44806 44806 44813 44806 44811 44813 44622 44813 44806 44806 44813 44808 44813 44813 44813 44813 44813 44627 44813 44627 44806 44815 44814 44814 44622 44813 44813 44813 44813 44806 44810 44810 44811 44806 44813 44814 44806 44626 44626 44811 44810 44626 44621 44626 44813 44806 44806 44806 44626 44806 44814 44814 44805 44621 44811 44622 44808 44806 44622 44626 44626 44813 44815 44813 44896 44815 44811 44813 44813 44813 44813 44813 44622 44806 44813 44813 44622 44814 44806 44814 44811 44811 44811 44813 44806 44813 44815 44813 44626 44811 44813 44813 44813 44622 44622 44622 44814 44814 44806 44815 44815"
## [14] "PMC10033906 PMC_DL/PMC10033906/supplementaryfiles/41467_2023_37266_MOESM15_ESM.xlsx Hsapiens 19 44813 44806 44811 44626 44814 44815 44622 44810 44808 44627 44896 44621 44805 44621 44622 44819 44628 44623 44818"
## [15] "PMC10033906 PMC_DL/PMC10033906/supplementaryfiles/41467_2023_37266_MOESM4_ESM.xlsx Hsapiens 47 44819 44810 44626 44813 44815 44815 44631 44631 44622 44628 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44627 44815 44623 44818 44896 44896 44896 44896 44896 44813 44813 44813 44819 44819 44815 44813 44813 44813 44622 44622 44622 44814 44814 44806 44815 44815"
## [16] "PMC10033906 PMC_DL/PMC10033906/supplementaryfiles/41467_2023_37266_MOESM13_ESM.xlsx Hsapiens 13 44806 44813 44622 44811 44626 44808 44814 44815 44625 44896 44622 44812 44810"
## [17] "PMC10033906 PMC_DL/PMC10033906/supplementaryfiles/41467_2023_37266_MOESM6_ESM.xlsx Hsapiens 1 44819"
## [18] "PMC10033906 PMC_DL/PMC10033906/supplementaryfiles/41467_2023_37266_MOESM5_ESM.xlsx Hsapiens 5 44626 44626 44811 44621 44819"
## [19] "PMC10033451 PMC_DL/PMC10033451/supplementaryfiles/41591_2023_2221_MOESM3_ESM.xlsx Hsapiens 19 44815 44812 44626 44627 44629 44805 44811 44628 44622 44806 44625 44814 44813 44810 44622 44809 44621 44623 44621"
## [20] "PMC10033451 PMC_DL/PMC10033451/supplementaryfiles/41591_2023_2221_MOESM3_ESM.xlsx Hsapiens 19 44809 44622 44623 44621 44805 44815 44626 44810 44622 44621 44806 44628 44812 44811 44814 44629 44813 44625 44627"
## [21] "PMC10028295 PMC_DL/PMC10028295/supplementaryfiles/Table_4.xlsx Hsapiens 24 44986 44994 44990 44993 44992 44988 44991 44987 44991 44992 44994 44988 44987 44990 44993 44986 44991 44986 44987 44993 44988 44992 44990 44994"
## [22] "PMC10028295 PMC_DL/PMC10028295/supplementaryfiles/Table_4.xlsx Hsapiens 8 44992 44991 44988 44994 44990 44993 44986 44987"
## [23] "PMC10028295 PMC_DL/PMC10028295/supplementaryfiles/Table_4.xlsx Hsapiens 32 44992 44994 44991 44988 44986 44990 44987 44993 44991 44988 44992 44990 44993 44987 44986 44994 44994 44992 44990 44988 44991 44993 44986 44987 44994 44991 44992 44987 44986 44990 44993 44988"
## [24] "PMC10028295 PMC_DL/PMC10028295/supplementaryfiles/Table_4.xlsx Hsapiens 24 44991 44994 44990 44987 44988 44992 44993 44986 44990 44988 44992 44994 44991 44993 44987 44986 44991 44992 44990 44993 44988 44994 44987 44986"
## [25] "PMC10028295 PMC_DL/PMC10028295/supplementaryfiles/Table_4.xlsx Hsapiens 24 44991 44987 44990 44994 44988 44992 44993 44986 44990 44992 44988 44994 44993 44986 44991 44987 44991 44990 44992 44993 44987 44986 44988 44994"
## [26] "PMC10028295 PMC_DL/PMC10028295/supplementaryfiles/Table_4.xlsx Hsapiens 24 44988 44987 44991 44994 44992 44990 44993 44986 44994 44987 44992 44993 44988 44990 44991 44986 44987 44994 44988 44993 44990 44992 44991 44986"
## [27] "PMC10028295 PMC_DL/PMC10028295/supplementaryfiles/Table_4.xlsx Hsapiens 32 44994 44991 44992 44986 44987 44990 44988 44993 44991 44988 44992 44994 44990 44993 44987 44986 44994 44991 44990 44988 44992 44986 44987 44993 44991 44986 44994 44992 44988 44987 44993 44990"
## [28] "PMC10028295 PMC_DL/PMC10028295/supplementaryfiles/Table_4.xlsx Hsapiens 8 44991 44990 44992 44987 44986 44994 44993 44988"
## [29] "PMC10028295 PMC_DL/PMC10028295/supplementaryfiles/Table_2.xlsx Hsapiens 1 44991"
## [30] "PMC10028223 PMC_DL/PMC10028223/supplementaryfiles/Supplementary_Data3.xlsx Hsapiens 2 44621 44622"
## [31] "PMC10028223 PMC_DL/PMC10028223/supplementaryfiles/Supplementary_Data4.xlsx Hsapiens 3 44621 44622 44896"
## [32] "PMC10027475 PMC_DL/PMC10027475/supplementaryfiles/mmc2.xlsx Hsapiens 16 36951 39508 36951 39508 36951 36951 36951 36951 36951 36951 39508 36951 36951 36951 36951 39508"
## [33] "PMC10022859 zip/Supplementary_Table_1_Akbar_et_al.xlsx Hsapiens 11 37681 39326 37500 37316 40422 39692 41883 36951 38961 38412 39142"
## [34] "PMC10022277 PMC_DL/PMC10022277/supplementaryfiles/12943_2023_1734_MOESM7_ESM.xlsx Hsapiens 2 39326 41883"
## [35] "PMC10022277 PMC_DL/PMC10022277/supplementaryfiles/12943_2023_1734_MOESM7_ESM.xlsx Hsapiens 1 40422"
## [36] "PMC10022277 PMC_DL/PMC10022277/supplementaryfiles/12943_2023_1734_MOESM7_ESM.xlsx Hsapiens 6 40422 39692 38961 37135 38231 37865"
## [37] "PMC10022277 PMC_DL/PMC10022277/supplementaryfiles/12943_2023_1734_MOESM7_ESM.xlsx Hsapiens 13 40422 40787 40057 39692 42248 37865 38596 37500 39326 38231 37135 38961 41883"
## [38] "PMC10022194 PMC_DL/PMC10022194/supplementaryfiles/12957_2022_2836_MOESM1_ESM.xlsx Hsapiens 3 43900 44078 44080"
## [39] "PMC10022194 PMC_DL/PMC10022194/supplementaryfiles/12957_2022_2836_MOESM1_ESM.xlsx Hsapiens 1 44077"
## [40] "PMC10017747 PMC_DL/PMC10017747/supplementaryfiles/Table_1.xlsx Mmusculus 2 43166 43160"
## [41] "PMC10017747 PMC_DL/PMC10017747/supplementaryfiles/Table_1.xlsx Mmusculus 4 44621 44622 44626 44628"
## [42] "PMC10017747 PMC_DL/PMC10017747/supplementaryfiles/Table_1.xlsx Hsapiens 2 44621 44622"
## [43] "PMC10017747 PMC_DL/PMC10017747/supplementaryfiles/Table_1.xlsx Hsapiens 5 44808 44807 44809 44623 44630"
## [44] "PMC10016690 PMC_DL/PMC10016690/supplementaryfiles/Table_3.xlsx Hsapiens 3 44621 44809 44623"
## [45] "PMC10016690 PMC_DL/PMC10016690/supplementaryfiles/Table_2.xlsx Hsapiens 3 44621 44622 44812"
## [46] "PMC10016690 PMC_DL/PMC10016690/supplementaryfiles/Table_1.xlsx Hsapiens 3 44622 44621 44896"
## [47] "PMC10016662 PMC_DL/PMC10016662/supplementaryfiles/ppat.1011232.s013.xlsx Hsapiens 24 43898 43900 44089 44166 43892 44081 43897 43894 44076 44080 44085 43896 44079 44088 43893 43901 44078 44084 44083 43895 44082 43899 44086 44077"
## [48] "PMC10010668 PMC_DL/PMC10010668/supplementaryfiles/cir-147-881-s006.xlsx Mmusculus 67 44627 44627 44627 44627 44627 44627 44627 44627 44627 44627 44627 44627 44815 44815 44815 44815 44815 44815 44815 44805 44805 44811 44811 44811 44811 44811 44811 44812 44812 44812 44812 44812 44812 44808 44808 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44813 44631 44626 44626 44626 44626 44622 44622 44623 44623 44623 44623 44623 44623 44623 44623 44623 44623 44810 44810 44810 44810"
## [49] "PMC10009272 PMC_DL/PMC10009272/supplementaryfiles/Table5.XLSX Hsapiens 28 41699 41884 41885 41699 41888 41887 41704 41892 41706 41893 41701 41698 41703 41895 41889 41705 41883 41973 41707 41700 41882 41896 41708 41890 41886 41698 41891 41702"
## [50] "PMC10009272 PMC_DL/PMC10009272/supplementaryfiles/Table1.XLSX Hsapiens 28 44813 44806 44811 44812 44815 44814 44627 44819 44626 44625 44624 44628 44810 44622 44629 44809 44622 44623 44621 44818 44808 44805 44807 44621 44896 44630 44816 44631"
## [51] "PMC10009272 PMC_DL/PMC10009272/supplementaryfiles/Table6.XLSX Hsapiens 28 43892 44077 44078 43892 44081 44080 43897 44085 43899 44086 43894 43891 43896 44088 44082 43898 44076 44166 43900 43893 44075 44089 43901 44083 44079 43891 44084 43895"
## [52] "PMC10009272 PMC_DL/PMC10009272/supplementaryfiles/Table7.XLSX Hsapiens 28 43894 44085 44084 43899 43892 44082 44075 43897 43895 44081 44089 44079 44088 43891 43892 44083 44080 44166 43893 44077 43900 43896 43898 44076 44078 43891 44086 43901"
## [53] "PMC10008571 PMC_DL/PMC10008571/supplementaryfiles/41467_2023_36922_MOESM4_ESM.xls Hsapiens 6 44621 44630 44626 44627 44628 44810"
## [54] "PMC10005326 zip/Supp_Table_S3.xlsx Athaliana 2 44777 44777"
## [55] "PMC10005268 zip/nutrients-2184192-sup1/Table_S9.xlsx Mmusculus 1 11-Sep"
## [56] "PMC10002577 zip/Supp_Table_S3._Total_number_of_Single_Nucleotide_Variants_of_hepatic_lysosomal_enzyme_activities_and_GSLs_identified_by_GWAS_analysis_02.xlsx Mmusculus 2 38200 37135"
## [57] "PMC10002577 zip/Supp_Table_S3._Total_number_of_Single_Nucleotide_Variants_of_hepatic_lysosomal_enzyme_activities_and_GSLs_identified_by_GWAS_analysis_02.xlsx Mmusculus 1 40422"
## [58] "PMC10000440 zip/Table_S3.xlsx Ggallus 1 44622"
## [59] "PMC10000440 zip/Table_S3.xlsx Ggallus 1 44622"
## [60] "PMC10000440 zip/Table_S3.xlsx Ggallus 2 44622 44622"
## [61] "PMC10000440 zip/Table_S3.xlsx Ggallus 1 44621"
## [62] "PMC10000440 zip/Table_S2.xlsx Ggallus 4 44622 44622 44622 44622"
## [63] "PMC10000440 zip/Table_S2.xlsx Ggallus 1 44621"
## [64] "PMC10000302 PMC_DL/PMC10000302/supplementaryfiles/peerj-11-14913-s001.xlsx Mmusculus 22 44813 44621 44805 44630 44814 44631 44815 44622 44806 44623 44807 44624 44808 44625 44809 44626 44810 44627 44811 44628 44812 44629"
## [65] "PMC10000302 PMC_DL/PMC10000302/supplementaryfiles/peerj-11-14913-s003.xlsx Mmusculus 22 44808 44805 44630 44622 44631 44809 44627 44815 44812 44628 44811 44625 44810 44624 44629 44807 44626 44814 44621 44806 44623 44813"
## [66] "PMC9997660 PMC_DL/PMC9997660/supplementaryfiles/JCB_202208159_DataS1.xlsx Hsapiens 35 43534 43718 43529 43535 43528 43719 43722 43526 43527 43711 43800 43533 43712 43717 43716 43710 43714 43720 43713 43532 43717 43714 43718 43715 43715 43711 43713 43710 43723 43712 43716 43530 43720 43719 43531"
## [67] "PMC9997660 PMC_DL/PMC9997660/supplementaryfiles/JCB_202208159_DataS2.xls Hsapiens 27 2021/12/01 2021/03/01 2021/03/02 2021/03/01 2021/03/10 2021/03/11 2021/03/02 2021/03/03 2021/03/04 2021/03/05 2021/03/06 2021/03/07 2021/03/08 2021/03/09 2021/09/01 2021/09/10 2021/09/11 2021/09/12 2021/09/14 2021/09/02 2021/09/03 2021/09/04 2021/09/05 2021/09/06 2021/09/07 2021/09/08 2021/09/09"
## [68] "PMC9984532 PMC_DL/PMC9984532/supplementaryfiles/41467_2023_36866_MOESM7_ESM.xlsx Scerevisiae 2 37165 45413"
## [69] "PMC9984532 PMC_DL/PMC9984532/supplementaryfiles/41467_2023_36866_MOESM7_ESM.xlsx Scerevisiae 2 37165 45413"
## [70] "PMC9984532 PMC_DL/PMC9984532/supplementaryfiles/41467_2023_36866_MOESM7_ESM.xlsx Scerevisiae 2 37165 45413"
## [71] "PMC9983314 PMC_DL/PMC9983314/supplementaryfiles/JCMM-27-714-s001.xlsx Hsapiens 30 44896 44630 44630 44622 44622 44623 44623 44625 44626 44627 44628 44628 44819 44819 44805 44814 44815 44816 44806 44806 44806 44806 44808 44809 44809 44809 44811 44811 44813 44813"
## [72] "PMC9981464 PMC_DL/PMC9981464/supplementaryfiles/41564_2023_1326_MOESM3_ESM.xlsx Mmusculus 28 44446 44446 44447 44443 44260 44260 44263 44263 44263 44263 44441 44441 44262 44262 44454 44261 44445 44445 44450 44450 44448 44444 44444 44444 44444 44257 44257 44257"
## [73] "PMC9981464 PMC_DL/PMC9981464/supplementaryfiles/41564_2023_1326_MOESM3_ESM.xlsx Mmusculus 1 44260"
## [74] "PMC9981200 PMC_DL/PMC9981200/supplementaryfiles/crc-22-0037-s05.xlsx Hsapiens 25 44260 44448 44257 44266 44265 44256 44447 44256 44443 44262 44258 44444 44450 44259 44440 44263 44257 44441 44264 44445 44261 44451 44442 44446 44449"
## [75] "PMC9981200 PMC_DL/PMC9981200/supplementaryfiles/crc-22-0037-s06.xlsx Hsapiens 23 43164 43348 43160 43344 43168 43166 43354 43162 43345 43169 43351 43165 43163 43161 43352 43170 43353 43350 43346 43347 43349 43355 43167"
## [76] "PMC9980695 zip/Supplementary/SM_table_S1.xlsx Mmusculus 4 42064 42248 42256 42262"
## [77] "PMC9980301 PMC_DL/PMC9980301/supplementaryfiles/MOL2-17-499-s008.xlsx Hsapiens 5 44813 44813 44813 44813 44809"
## [78] "PMC9980301 PMC_DL/PMC9980301/supplementaryfiles/MOL2-17-499-s008.xlsx Hsapiens 2 44621 44808"
## [79] "PMC9980301 PMC_DL/PMC9980301/supplementaryfiles/MOL2-17-499-s014.xlsx Hsapiens 5 44813 44813 44813 44813 44809"
## [80] "PMC9980301 PMC_DL/PMC9980301/supplementaryfiles/MOL2-17-499-s005.xlsx Hsapiens 2 44621 44808"
## [81] "PMC9977441 PMC_DL/PMC9977441/supplementaryfiles/jciinsight-8-162409-s115.xlsx Hsapiens 3 38200 37104 37834"
## [82] "PMC9977441 PMC_DL/PMC9977441/supplementaryfiles/jciinsight-8-162409-s115.xlsx Hsapiens 4 38200 37469 37104 37834"
## [83] "PMC9977441 PMC_DL/PMC9977441/supplementaryfiles/jciinsight-8-162409-s115.xlsx Hsapiens 4 38200 37469 37104 37834"
## [84] "PMC9977441 PMC_DL/PMC9977441/supplementaryfiles/jciinsight-8-162409-s115.xlsx Hsapiens 3 38200 37469 37104"
## [85] "PMC9977441 PMC_DL/PMC9977441/supplementaryfiles/jciinsight-8-162409-s115.xlsx Hsapiens 4 37469 37834 37104 38200"
## [86] "PMC9977441 PMC_DL/PMC9977441/supplementaryfiles/jciinsight-8-162409-s115.xlsx Hsapiens 4 37469 37834 37104 38200"
## [87] "PMC9977441 PMC_DL/PMC9977441/supplementaryfiles/jciinsight-8-162409-s115.xlsx Hsapiens 4 37469 37834 37104 38200"
## [88] "PMC9977313 PMC_DL/PMC9977313/supplementaryfiles/jciinsight-8-150368-s052.xlsx Hsapiens 1 37681"
## [89] "PMC9977313 PMC_DL/PMC9977313/supplementaryfiles/jciinsight-8-150368-s053.xlsx Hsapiens 3 36951 36951 41883"
## [90] "PMC9975319 PMC_DL/PMC9975319/supplementaryfiles/mmc4.xlsx Hsapiens 14 43892 43892 43897 43899 43894 43891 43896 43898 44166 43900 43893 43901 43891 43895"
## [91] "PMC9975319 PMC_DL/PMC9975319/supplementaryfiles/mmc4.xlsx Hsapiens 10 43892 43892 43897 43899 43891 43896 43898 43893 43891 43895"
## [92] "PMC9972295 PMC_DL/PMC9972295/supplementaryfiles/Table9.XLSX Hsapiens 28 43900 43901 44080 44085 44085 43897 44083 43899 43899 43892 43894 44076 43892 44084 44078 43895 44079 43893 44075 44083 44078 43896 43898 44082 44081 44077 44088 43891"
## [93] "PMC10045015 zip/Table_S1._General_information_regarding_the_2507_human_genes_in_the_LCGene.xlsx Hsapiens 2 40057 38231"
## [94] "PMC10043811 PMC_DL/PMC10043811/supplementaryfiles/mmc1.xlsx Hsapiens 12 44621 44623 44626 44624 44625 44896 44627 44631 44622 44630 44628 44629"
## [95] "PMC10043811 PMC_DL/PMC10043811/supplementaryfiles/mmc1.xlsx Hsapiens 12 44626 44627 44621 44623 44628 44625 44629 44622 44631 44624 44630 44896"
## [96] "PMC10043811 PMC_DL/PMC10043811/supplementaryfiles/mmc1.xlsx Hsapiens 12 44628 44622 44629 44627 44626 44624 44623 44630 44625 44621 44896 44631"
## [97] "PMC10043811 PMC_DL/PMC10043811/supplementaryfiles/mmc1.xlsx Hsapiens 11 44621 44627 44626 44623 44630 44625 44629 44631 44624 44622 44628"
## [98] "PMC10043811 PMC_DL/PMC10043811/supplementaryfiles/mmc1.xlsx Hsapiens 12 44624 44626 44627 44621 44622 44625 44628 44896 44631 44629 44630 44623"
## [99] "PMC10043811 PMC_DL/PMC10043811/supplementaryfiles/mmc1.xlsx Hsapiens 12 44623 44621 44626 44624 44631 44896 44622 44628 44625 44627 44629 44630"
## [100] "PMC10043811 PMC_DL/PMC10043811/supplementaryfiles/mmc1.xlsx Hsapiens 11 44621 44626 44627 44623 44622 44628 44625 44629 44630 44624 44631"
## [101] "PMC10043811 PMC_DL/PMC10043811/supplementaryfiles/mmc1.xlsx Hsapiens 11 44626 44627 44625 44622 44623 44628 44630 44629 44624 44621 44896"
## [102] "PMC10043811 PMC_DL/PMC10043811/supplementaryfiles/mmc1.xlsx Hsapiens 10 44623 44626 44625 44621 44627 44628 44629 44896 44624 44631"
## [103] "PMC10043811 PMC_DL/PMC10043811/supplementaryfiles/mmc1.xlsx Hsapiens 10 44628 44623 44627 44629 44626 44621 44625 44896 44624 44631"
## [104] "PMC10043811 PMC_DL/PMC10043811/supplementaryfiles/mmc1.xlsx Hsapiens 10 44623 44621 44896 44626 44624 44631 44625 44629 44627 44628"
## [105] "PMC10043811 PMC_DL/PMC10043811/supplementaryfiles/mmc1.xlsx Hsapiens 9 44623 44621 44626 44627 44625 44628 44624 44629 44896"
## [106] "PMC10043811 PMC_DL/PMC10043811/supplementaryfiles/mmc1.xlsx Hsapiens 10 44626 44624 44625 44628 44623 44896 44621 44627 44631 44629"
## [107] "PMC10043811 PMC_DL/PMC10043811/supplementaryfiles/mmc1.xlsx Hsapiens 10 44629 44626 44621 44627 44631 44624 44628 44625 44623 44896"
## [108] "PMC10043811 PMC_DL/PMC10043811/supplementaryfiles/mmc1.xlsx Hsapiens 9 44627 44623 44624 44625 44626 44896 44621 44629 44628"
## [109] "PMC10043811 PMC_DL/PMC10043811/supplementaryfiles/mmc1.xlsx Hsapiens 10 44626 44896 44627 44621 44628 44631 44625 44624 44629 44623"
## [110] "PMC10043811 PMC_DL/PMC10043811/supplementaryfiles/mmc1.xlsx Hsapiens 10 44621 44623 44627 44625 44626 44896 44624 44628 44631 44629"
## [111] "PMC10043811 PMC_DL/PMC10043811/supplementaryfiles/mmc1.xlsx Hsapiens 9 44621 44623 44629 44625 44626 44628 44896 44624 44627"
## [112] "PMC10043811 PMC_DL/PMC10043811/supplementaryfiles/mmc1.xlsx Hsapiens 10 44623 44627 44626 44629 44628 44621 44625 44896 44624 44631"
## [113] "PMC10043811 PMC_DL/PMC10043811/supplementaryfiles/mmc1.xlsx Hsapiens 10 44626 44628 44627 44621 44629 44625 44631 44624 44896 44623"
## [114] "PMC10043811 PMC_DL/PMC10043811/supplementaryfiles/mmc1.xlsx Hsapiens 9 44626 44629 44625 44623 44624 44896 44628 44627 44621"
## [115] "PMC10043811 PMC_DL/PMC10043811/supplementaryfiles/mmc1.xlsx Hsapiens 12 44623 44621 44628 44626 44627 44896 44631 44630 44624 44629 44625 44622"
## [116] "PMC10043811 PMC_DL/PMC10043811/supplementaryfiles/mmc1.xlsx Hsapiens 11 44623 44627 44622 44621 44896 44628 44629 44630 44624 44625 44626"
## [117] "PMC10041653 PMC_DL/PMC10041653/supplementaryfiles/JCB_202205117_TableS1.xlsx Hsapiens 24 40787 38047 38412 37681 37865 37316 38777 41153 39692 37500 39873 38231 38596 40422 38961 40603 41883 40238 39508 39142 42248 37226 39326 40057"
## [118] "PMC10035230 PMC_DL/PMC10035230/supplementaryfiles/13148_2023_1463_MOESM7_ESM.xlsx Hsapiens 28 44815 44813 44813 44629 44813 44813 44813 44813 44813 44813 44813 44623 44813 44813 44813 44813 44813 44815 44809 44813 44812 44621 44630 44813 44813 44806 44810 44630"
## [119] "PMC10033758 PMC_DL/PMC10033758/supplementaryfiles/Table2.XLSX Hsapiens 5 44806 44623 44815 44814 44812"
## [120] "PMC10033758 PMC_DL/PMC10033758/supplementaryfiles/Table2.XLSX Hsapiens 2 44812 44622"
## [121] "PMC10033758 PMC_DL/PMC10033758/supplementaryfiles/Table2.XLSX Hsapiens 6 44806 44625 44815 44805 44810 44626"
## [122] "PMC10033758 PMC_DL/PMC10033758/supplementaryfiles/Table2.XLSX Hsapiens 3 44625 44806 44813"
## [123] "PMC10033758 PMC_DL/PMC10033758/supplementaryfiles/Table1.XLSX Hsapiens 2 44623 44819"
## [124] "PMC10033758 PMC_DL/PMC10033758/supplementaryfiles/Table1.XLSX Hsapiens 2 44815 44810"
## [125] "PMC10033758 PMC_DL/PMC10033758/supplementaryfiles/Table1.XLSX Hsapiens 3 44810 44815 44621"
## [126] "PMC10033758 PMC_DL/PMC10033758/supplementaryfiles/Table1.XLSX Hsapiens 1 44621"
## [127] "PMC10033758 PMC_DL/PMC10033758/supplementaryfiles/Table1.XLSX Hsapiens 2 44623 44625"
## [128] "PMC10033677 PMC_DL/PMC10033677/supplementaryfiles/41598_2023_31701_MOESM2_ESM.xlsx Hsapiens 28 42248 37316 36951 40422 39142 38047 37500 40787 36951 38777 40603 37681 39692 39326 41883 37226 39508 38412 39873 41153 37135 37135 38231 40238 40057 37316 38596 37865"
## [129] "PMC10033585 PMC_DL/PMC10033585/supplementaryfiles/Table_1.xls Hsapiens 8 2023/12/01 2023/03/01 2023/03/02 2023/03/03 2023/03/05 2023/03/06 2023/03/07 2023/03/08"
## [130] "PMC10033585 PMC_DL/PMC10033585/supplementaryfiles/Table_5.xls Hsapiens 2 2022/03/08 2022/03/06"
## [131] "PMC10033585 PMC_DL/PMC10033585/supplementaryfiles/Table_4.xls Hsapiens 3 2022/03/03 2022/03/06 2022/03/09"
## [132] "PMC10033585 PMC_DL/PMC10033585/supplementaryfiles/Table_2.xls Hsapiens 7 2023/03/01 2023/03/02 2023/03/03 2023/03/05 2023/03/06 2023/03/07 2023/03/08"
## [133] "PMC10028083 PMC_DL/PMC10028083/supplementaryfiles/DataSheet_2.xlsx Hsapiens 7 44622 44628 44625 44814 44623 44815 44626"
## [134] "PMC10028083 PMC_DL/PMC10028083/supplementaryfiles/DataSheet_2.xlsx Hsapiens 2 44813 44810"
## [135] "PMC10028083 PMC_DL/PMC10028083/supplementaryfiles/DataSheet_2.xlsx Hsapiens 1 44809"
## [136] "PMC10028083 PMC_DL/PMC10028083/supplementaryfiles/DataSheet_2.xlsx Hsapiens 8 44623 44816 44628 44814 44631 44896 44624 44625"
## [137] "PMC10028083 PMC_DL/PMC10028083/supplementaryfiles/DataSheet_2.xlsx Hsapiens 7 44808 44806 44813 44622 44621 44805 44810"
## [138] "PMC10028036 PMC_DL/PMC10028036/supplementaryfiles/CAM4-12-6009-s003.xlsx Hsapiens 1 44083"
## [139] "PMC10027880 zip/Supplementary_Raw_data_8.xlsx Hsapiens 5 44258 44444 44256 44531 44442"
## [140] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc13.xlsx Hsapiens 1 44805"
## [141] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc13.xlsx Hsapiens 1 44630"
## [142] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc13.xlsx Hsapiens 3 44805 44628 44809"
## [143] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc13.xlsx Hsapiens 4 44808 44623 44818 44807"
## [144] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc13.xlsx Hsapiens 1 44807"
## [145] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc13.xlsx Hsapiens 2 44624 44807"
## [146] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc13.xlsx Hsapiens 1 44807"
## [147] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc13.xlsx Hsapiens 8 44818 44816 44631 44810 44624 44621 44808 44628"
## [148] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc13.xlsx Hsapiens 2 44621 44631"
## [149] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc13.xlsx Hsapiens 6 44805 44626 44814 44815 44621 44628"
## [150] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc13.xlsx Hsapiens 2 44808 44628"
## [151] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc13.xlsx Hsapiens 1 44623"
## [152] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc13.xlsx Hsapiens 4 44630 44814 44807 44628"
## [153] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc13.xlsx Hsapiens 2 44624 44623"
## [154] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc13.xlsx Hsapiens 3 44818 44629 44624"
## [155] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc16.xlsx Hsapiens 1 44631"
## [156] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc16.xlsx Hsapiens 1 44631"
## [157] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc16.xlsx Hsapiens 1 44807"
## [158] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc16.xlsx Hsapiens 1 44818"
## [159] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc16.xlsx Hsapiens 1 44809"
## [160] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc16.xlsx Hsapiens 3 44621 44818 44813"
## [161] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc16.xlsx Hsapiens 1 44819"
## [162] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc16.xlsx Hsapiens 1 44813"
## [163] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc16.xlsx Hsapiens 2 44813 44812"
## [164] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc16.xlsx Hsapiens 4 44810 44812 44807 44808"
## [165] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc16.xlsx Hsapiens 11 44809 44816 44818 44807 44628 44813 44626 44819 44627 44631 44814"
## [166] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc16.xlsx Hsapiens 3 44629 44621 44812"
## [167] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc16.xlsx Hsapiens 2 44629 44621"
## [168] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc16.xlsx Hsapiens 4 44810 44621 44631 44816"
## [169] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc16.xlsx Hsapiens 4 44630 44624 44817 44818"
## [170] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc16.xlsx Hsapiens 4 44819 44812 44811 44630"
## [171] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc16.xlsx Hsapiens 3 44630 44819 44810"
## [172] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc16.xlsx Hsapiens 12 44624 44814 44623 44621 44806 44809 44815 44817 44631 44626 44813 44811"
## [173] "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc16.xlsx Hsapiens 7 44816 44896 44630 44813 44809 44624 44621"
## [174] "PMC10025435 PMC_DL/PMC10025435/supplementaryfiles/supplementary_data_1_bbad032.xlsx Hsapiens 3 44809 44813 44810"
## [175] "PMC10025395 PMC_DL/PMC10025395/supplementaryfiles/Table_4.xlsx Hsapiens 5 44080 43891 43892 44083 43896"
## [176] "PMC10022665 zip/Suppl_data_Kim_etal_2022_Supplemental_Tables.xlsx Hsapiens 12 44621 44622 44819 44814 44815 44806 44807 44809 44810 44811 44812 44813"
## [177] "PMC10020157 PMC_DL/PMC10020157/supplementaryfiles/41467_2023_37057_MOESM4_ESM.xlsx Mmusculus 6 37500 40787 39326 39692 42248 37135"
## [178] "PMC10014956 PMC_DL/PMC10014956/supplementaryfiles/42003_2023_4655_MOESM3_ESM.xlsx Mmusculus 4 44627 44631 44811 44628"
## [179] "PMC10014730 PMC_DL/PMC10014730/supplementaryfiles/Table2.XLSX Hsapiens 3 45178 45177 45173"
## [180] "PMC10012531 PMC_DL/PMC10012531/supplementaryfiles/12920_2023_1479_MOESM1_ESM.xlsx Hsapiens 19 44814 44815 44819 44812 44813 44810 44811 44808 44806 44622 44621 44628 44627 44626 44625 44623 44622 44621 44896"
## [181] "PMC10012531 PMC_DL/PMC10012531/supplementaryfiles/12920_2023_1479_MOESM1_ESM.xlsx Hsapiens 26 44814 44815 44816 44819 44812 44813 44810 44811 44805 44808 44806 44807 44631 44630 44622 44621 44629 44628 44627 44626 44625 44624 44623 44622 44621 44896"
## [182] "PMC10012531 PMC_DL/PMC10012531/supplementaryfiles/12920_2023_1479_MOESM1_ESM.xlsx Hsapiens 10 44629 44628 44627 44626 44625 44624 44623 44622 44621 44896"
## [183] "PMC10011742 PMC_DL/PMC10011742/supplementaryfiles/mmc2.xlsx Ggallus 2 44806 44810"
## [184] "PMC10011742 PMC_DL/PMC10011742/supplementaryfiles/mmc3.xlsx Hsapiens 19 44986 44986 44987 44988 44990 44991 44992 44993 44994 45184 45170 45179 45180 45171 45173 45175 45176 45177 45178"
## [185] "PMC10011429 PMC_DL/PMC10011429/supplementaryfiles/mmc1.xlsx Hsapiens 1 14977"
## [186] "PMC10011141 PMC_DL/PMC10011141/supplementaryfiles/41588_2023_1324_MOESM7_ESM.xlsx Mmusculus 10 44622 44623 44627 44811 44806 44806 44816 44623 44809 44627"
## [187] "PMC10011141 PMC_DL/PMC10011141/supplementaryfiles/41588_2023_1324_MOESM7_ESM.xlsx Mmusculus 41 44625 44621 44627 44815 44808 44630 44814 44807 44622 44809 44808 44813 44628 44629 44621 44628 44627 44812 44626 44808 44621 44807 44814 44626 44622 44813 44631 44809 44807 44623 44622 44816 44809 44624 44628 44808 44807 44628 44809 44626 44624"
## [188] "PMC10011141 PMC_DL/PMC10011141/supplementaryfiles/41588_2023_1324_MOESM7_ESM.xlsx Mmusculus 60 44628 44814 44627 44811 44809 44808 44629 44622 44626 44625 44811 44628 44814 44627 44811 44622 44625 44808 44808 44627 44628 44626 44625 44813 44808 44622 44621 44814 44621 44809 44814 44809 44621 44811 44625 44808 44816 44814 44622 44627 44815 44622 44622 44624 44811 44808 44628 44622 44808 44627 44816 44623 44811 44627 44630 44808 44811 44623 44809 44807"
## [189] "PMC10011141 PMC_DL/PMC10011141/supplementaryfiles/41588_2023_1324_MOESM7_ESM.xlsx Mmusculus 84 44809 44621 44621 44816 44809 44816 44808 44807 44816 44811 44628 44809 44622 44818 44628 44809 44813 44811 44809 44814 44809 44812 44627 44631 44623 44631 44626 44815 44621 44628 44625 44818 44631 44814 44808 44621 44622 44626 44818 44811 44631 44808 44627 44631 44812 44624 44809 44628 44808 44627 44624 44806 44630 44807 44628 44625 44630 44808 44809 44631 44809 44623 44622 44622 44628 44625 44621 44811 44812 44627 44621 44627 44621 44627 44626 44815 44625 44622 44811 44628 44630 44812 44811 44621"
## [190] "PMC10011141 PMC_DL/PMC10011141/supplementaryfiles/41588_2023_1324_MOESM7_ESM.xlsx Mmusculus 89 44808 44621 44623 44630 44630 44630 44621 44813 44811 44630 44621 44621 44621 44812 44624 44627 44623 44621 44628 44813 44630 44621 44813 44814 44624 44813 44630 44624 44624 44623 44621 44624 44631 44808 44621 44814 44624 44623 44810 44813 44621 44622 44814 44624 44814 44631 44814 44811 44627 44631 44813 44631 44631 44809 44813 44807 44628 44621 44628 44628 44813 44631 44626 44621 44624 44811 44811 44631 44631 44814 44806 44814 44811 44630 44811 44621 44631 44621 44626 44624 44628 44811 44809 44621 44631 44811 44622 44807 44813"
## [191] "PMC10011141 PMC_DL/PMC10011141/supplementaryfiles/41588_2023_1324_MOESM6_ESM.xlsx Mmusculus 1 44810"
## [192] "PMC10011137 PMC_DL/PMC10011137/supplementaryfiles/41588_2023_1314_MOESM4_ESM.xlsx Hsapiens 11 37316 36951 37500 37500 37681 40238 40057 37316 38596 38596 37865"
## [193] "PMC10011137 PMC_DL/PMC10011137/supplementaryfiles/41588_2023_1314_MOESM4_ESM.xlsx Hsapiens 2 37681 40057"
## [194] "PMC10011137 PMC_DL/PMC10011137/supplementaryfiles/41588_2023_1314_MOESM4_ESM.xlsx Hsapiens 2 37681 40057"
## [195] "PMC10011132 PMC_DL/PMC10011132/supplementaryfiles/41588_2023_1327_MOESM4_ESM.xlsx Hsapiens 59 39326 37500 39326 37500 40057 37500 39326 40057 36951 39326 37500 37500 40787 40057 37500 39508 39326 39326 37500 39326 39326 39326 39326 39326 40787 40057 37500 39508 39326 37500 39326 37500 37500 39508 39326 37500 37500 37500 40057 40057 37500 39326 40057 37500 39326 39326 37681 39326 39326 37500 40057 38777 37500 39326 39326 37316 39508 38412 36951"
## [196] "PMC10011132 PMC_DL/PMC10011132/supplementaryfiles/41588_2023_1327_MOESM4_ESM.xlsx Hsapiens 6 37500 36951 39508 37681 37316 38412"
## [197] "PMC10010006 PMC_DL/PMC10010006/supplementaryfiles/13287_2023_3259_MOESM6_ESM.xlsx Hsapiens 301 43895 43895 43895 43899 43895 43899 43895 43895 43895 43899 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43899 43895 43895 43896 43895 43901 43895 43896 43895 43895 43895 43895 43895 43901 43899 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43899 43893 43901 43899 43895 43895 43895 43895 43895 43895 43895 43895 43895 43893 43895 43895 43899 43895 43895 43899 43899 43895 43895 43895 43895 43901 43895 43895 43899 43895 43895 43895 43895 43899 43901 43899 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43893 43895 43895 43895 43895 43895 43895 43901 43899 43895 43899 43895 43895 43899 43895 43895 43895 43895 43895 43895 43893 43895 43895 43895 43901 43899 43895 43901 43895 43895 43895 43895 43899 43895 43895 43895 43899 43895 43899 43895 43893 43895 43899 43899 43895 43895 43895 43895 43895 43901 43899 43895 43895 43895 43895 43896 43895 43895 43893 43899 43901 43895 43899 43901 43895 43895 43893 43895 43895 43899 43895 43895 43895 43895 43899 43895 43895 43901 43895 43901 43895 43899 43895 43895 43899 43895 43895 43895 43895 43895 43893 43895 43899 43895 43895 43895 43899 43895 43895 43895 43899 43895 43895 43895 43901 43895 43901 43899 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43899 43895 43901 43895 43899 43895 43895 43895 43899 43899 43895 43895 43895 43899 43899 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43899 43895 43895 43895 43901 43899 43895 43895 43901 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43899 43895 43896 43895 43895 43895 43895 43895 43895 43895 43895 43901 43895 43895 43899 43895 43893 43895 43895 43895 43895 43895 43899 43895 43893 43895 43895"
## [198] "PMC10010006 PMC_DL/PMC10010006/supplementaryfiles/13287_2023_3259_MOESM6_ESM.xlsx Hsapiens 304 43895 43895 43895 43895 43895 43895 43899 43895 43899 43895 43895 43895 43895 43895 43901 43895 43895 43895 43895 43895 43895 43895 43896 43895 43895 43895 43895 43895 43895 43895 43895 43899 43901 43895 43895 43896 43895 43893 43895 43895 43899 43899 43899 43895 43895 43895 43895 43895 43895 43899 43895 43895 43895 43895 43895 43899 43895 43895 43895 43895 43895 43895 43895 43901 43895 43901 43895 43895 43895 43895 43901 43895 43895 43895 43895 43899 43895 43895 43895 43895 43895 43899 43893 43895 43901 43901 43895 43895 43899 43895 43895 43895 43895 43895 43895 43895 43895 43901 43895 43895 43895 43895 43895 43895 43899 43895 43895 43895 43895 43901 43895 43895 43895 43899 43895 43895 43901 43895 43895 43895 43895 43893 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43901 43895 43895 43895 43899 43895 43893 43901 43895 43899 43895 43895 43895 43895 43899 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43899 43895 43895 43895 43895 43895 43895 43893 43899 43895 43899 43895 43895 43895 43895 43895 43895 43895 43899 43895 43893 43899 43899 43895 43895 43895 43895 43895 43895 43899 43901 43901 43895 43895 43901 43895 43895 43895 43895 43893 43899 43895 43901 43895 43895 43895 43895 43895 43893 43895 43899 43901 43893 43895 43895 43901 43895 43895 43895 43895 43895 43899 43895 43896 43895 43895 43895 43895 43895 43895 43899 43895 43893 43895 43901 43895 43895 43895 43895 43895 43895 43895 43895 43899 43899 43901 43901 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43893 43895 43899 43895 43895 43899 43895 43899 43899 43899 43895 43895 43895 43895 43899 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43895 43901 43895 43895 43895 43901 43895 43895 43895"
## [199] "PMC10010006 PMC_DL/PMC10010006/supplementaryfiles/13287_2023_3259_MOESM2_ESM.xlsx Hsapiens 1 43895"
## [200] "PMC10008496 PMC_DL/PMC10008496/supplementaryfiles/aging-15-204552-s003.xlsx Hsapiens 2 44810 44805"
## [201] "PMC10003012 zip/Table_S5_Lists_of_mRNAs_correlated_across_cancer_types.xlsx Hsapiens 3 38231 36951 38961"
## [202] "PMC10002421 PMC_DL/PMC10002421/supplementaryfiles/Table_1.xls Hsapiens 2 2019/03/03 2019/03/02"
## [203] "PMC10002421 PMC_DL/PMC10002421/supplementaryfiles/Table_1.xls Hsapiens 1 2019/03/09"
## [204] "PMC10002421 PMC_DL/PMC10002421/supplementaryfiles/Table_6.xls Hsapiens 1 2019/03/03"
## [205] "PMC9998611 PMC_DL/PMC9998611/supplementaryfiles/41598_2023_30926_MOESM6_ESM.xlsx Hsapiens 27 42795 42980 42981 42795 42984 42983 42800 42988 42802 42989 42797 42794 42799 42991 42985 42801 42979 43069 42803 42796 42978 42804 42986 42982 42794 42987 42798"
## [206] "PMC9998611 PMC_DL/PMC9998611/supplementaryfiles/41598_2023_30926_MOESM10_ESM.xlsx Hsapiens 1 44265"
## [207] "PMC9998435 PMC_DL/PMC9998435/supplementaryfiles/41598_2023_31141_MOESM5_ESM.xlsx Hsapiens 7 36951 40422 37500 40057 39692 39142 38047"
## [208] "PMC9998435 PMC_DL/PMC9998435/supplementaryfiles/41598_2023_31141_MOESM5_ESM.xlsx Hsapiens 12 36951 38777 37865 37316 38231 39508 38961 39326 42248 40787 38412 39873"
## [209] "PMC9998435 PMC_DL/PMC9998435/supplementaryfiles/41598_2023_31141_MOESM5_ESM.xlsx Hsapiens 11 36951 37865 37316 40787 38777 39326 38961 42248 38231 39508 38412"
## [210] "PMC9998435 PMC_DL/PMC9998435/supplementaryfiles/41598_2023_31141_MOESM5_ESM.xlsx Hsapiens 8 36951 39692 40422 39142 40057 37500 38047 39873"
## [211] "PMC9998435 PMC_DL/PMC9998435/supplementaryfiles/41598_2023_31141_MOESM5_ESM.xlsx Hsapiens 4 39142 39692 38961 38047"
## [212] "PMC9998435 PMC_DL/PMC9998435/supplementaryfiles/41598_2023_31141_MOESM5_ESM.xlsx Hsapiens 12 36951 40787 37316 40422 38412 39873 40057 38777 37500 37865 39326 42248"
## [213] "PMC9998435 PMC_DL/PMC9998435/supplementaryfiles/41598_2023_31141_MOESM8_ESM.xlsx Hsapiens 14 37865 40787 40422 38047 39326 40057 39692 38412 39873 37500 37316 37681 39142 38961"
## [214] "PMC9998435 PMC_DL/PMC9998435/supplementaryfiles/41598_2023_31141_MOESM8_ESM.xlsx Hsapiens 10 42248 36951 39508 37135 36951 38231 37316 40238 40603 38777"
## [215] "PMC9998435 PMC_DL/PMC9998435/supplementaryfiles/41598_2023_31141_MOESM8_ESM.xlsx Hsapiens 9 38777 42248 36951 37681 39508 39142 38231 38961 40057"
## [216] "PMC9998435 PMC_DL/PMC9998435/supplementaryfiles/41598_2023_31141_MOESM8_ESM.xlsx Hsapiens 14 40422 40787 37135 37865 38047 39326 38412 39692 37500 40238 40603 36951 39873 37316"
## [217] "PMC9998435 PMC_DL/PMC9998435/supplementaryfiles/41598_2023_31141_MOESM8_ESM.xlsx Hsapiens 7 40422 39326 40787 37865 37316 39142 42248"
## [218] "PMC9998435 PMC_DL/PMC9998435/supplementaryfiles/41598_2023_31141_MOESM8_ESM.xlsx Hsapiens 5 40057 39873 38777 38412 37500"
## [219] "PMC9998435 PMC_DL/PMC9998435/supplementaryfiles/41598_2023_31141_MOESM8_ESM.xlsx Hsapiens 1 40787"
## [220] "PMC9998083 PMC_DL/PMC9998083/supplementaryfiles/elife-81097-supp1.xlsx Hsapiens 4 36892 36892 36892 36892"
## [221] "PMC9997923 PMC_DL/PMC9997923/supplementaryfiles/pone.0281061.s002.xlsx Hsapiens 1 44815"
## [222] "PMC9996951 PMC_DL/PMC9996951/supplementaryfiles/40659_2023_417_MOESM3_ESM.xlsx Hsapiens 4 40057 38596 40787 40787"
## [223] "PMC9996951 PMC_DL/PMC9996951/supplementaryfiles/40659_2023_417_MOESM3_ESM.xlsx Hsapiens 2 39692 41518"
## [224] "PMC9995926 PMC_DL/PMC9995926/supplementaryfiles/Data_Sheet_2.xlsx Hsapiens 5 7-Mar 8-Mar 1-Mar 6-Mar 3-Mar"
## [225] "PMC9995926 PMC_DL/PMC9995926/supplementaryfiles/Data_Sheet_2.xlsx Hsapiens 1 4-Mar"
## [226] "PMC9995291 PMC_DL/PMC9995291/supplementaryfiles/mmc4.xlsx Hsapiens 2 44621 44623"
## [227] "PMC9995291 PMC_DL/PMC9995291/supplementaryfiles/mmc7.xlsx Hsapiens 1 44623"
## [228] "PMC9995291 PMC_DL/PMC9995291/supplementaryfiles/mmc2.xlsx Hsapiens 5 44621 44623 44805 44815 44811"
## [229] "PMC9995291 PMC_DL/PMC9995291/supplementaryfiles/mmc2.xlsx Hsapiens 3 44621 44814 44811"
## [230] "PMC9995291 PMC_DL/PMC9995291/supplementaryfiles/mmc3.xlsx Ggallus 2 44621 44623"
## [231] "PMC9995291 PMC_DL/PMC9995291/supplementaryfiles/mmc3.xlsx Hsapiens 2 44628 44623"
## [232] "PMC9982299 PMC_DL/PMC9982299/supplementaryfiles/mmc2.xlsx Hsapiens 6 FOSL2-JUND FOSL2-JUNB FOSL2-JUN FOSL1-JUND FOSL1-JUN FOSL1-JUNB"
## [233] "PMC9982299 PMC_DL/PMC9982299/supplementaryfiles/mmc3.xlsx Hsapiens 15 43527 43525 43527 43527 43529 43527 43722 43723 43531 43532 43525 43533 43532 43532 43529"
## [234] "PMC9981717 PMC_DL/PMC9981717/supplementaryfiles/412_2023_787_MOESM1_ESM.xlsx Dmelanogaster 6 44443 44441 44440 1-Dec 44444 44441"
## [235] "PMC9981717 PMC_DL/PMC9981717/supplementaryfiles/412_2023_787_MOESM1_ESM.xlsx Dmelanogaster 5 38231 37500 37135 38596 37500"
## [236] "PMC9981717 PMC_DL/PMC9981717/supplementaryfiles/412_2023_787_MOESM4_ESM.xlsx Dmelanogaster 2 1-Sep 2-Sep"
## [237] "PMC9981717 PMC_DL/PMC9981717/supplementaryfiles/412_2023_787_MOESM4_ESM.xlsx Dmelanogaster 2 37135 37500"
## [238] "PMC9981717 PMC_DL/PMC9981717/supplementaryfiles/412_2023_787_MOESM4_ESM.xlsx Dmelanogaster 1 4-Sep"
## [239] "PMC9981717 PMC_DL/PMC9981717/supplementaryfiles/412_2023_787_MOESM4_ESM.xlsx Dmelanogaster 1 38231"
## [240] "PMC9981717 PMC_DL/PMC9981717/supplementaryfiles/412_2023_787_MOESM4_ESM.xlsx Dmelanogaster 1 1-Sep"
## [241] "PMC9981717 PMC_DL/PMC9981717/supplementaryfiles/412_2023_787_MOESM4_ESM.xlsx Dmelanogaster 1 37135"
## [242] "PMC9981717 PMC_DL/PMC9981717/supplementaryfiles/412_2023_787_MOESM4_ESM.xlsx Dmelanogaster 2 2-Sep 1-Sep"
## [243] "PMC9981717 PMC_DL/PMC9981717/supplementaryfiles/412_2023_787_MOESM4_ESM.xlsx Dmelanogaster 2 37500 37135"
## [244] "PMC9981717 PMC_DL/PMC9981717/supplementaryfiles/412_2023_787_MOESM3_ESM.xlsx Dmelanogaster 1 2-Sep"
## [245] "PMC9981717 PMC_DL/PMC9981717/supplementaryfiles/412_2023_787_MOESM3_ESM.xlsx Dmelanogaster 1 37500"
## [246] "PMC9981717 PMC_DL/PMC9981717/supplementaryfiles/412_2023_787_MOESM3_ESM.xlsx Dmelanogaster 1 2-Sep"
## [247] "PMC9981215 PMC_DL/PMC9981215/supplementaryfiles/crc-21-0194-s06.xlsx Hsapiens 3 44813 44809 44810"
## [248] "PMC9978639 PMC_DL/PMC9978639/supplementaryfiles/mmc11.xlsx Hsapiens 3 44811 44806 44810"
## [249] "PMC9978323 PMC_DL/PMC9978323/supplementaryfiles/mmc2.xlsx Hsapiens 14 36951 37316 36951 37316 38047 38412 39873 42248 41153 41883 37500 38596 39326 39692"
## [250] "PMC9977725 PMC_DL/PMC9977725/supplementaryfiles/41598_2023_29843_MOESM2_ESM.xlsx Hsapiens 6 44989 44992 44991 45170 45180 44991"
## [251] "PMC9977725 PMC_DL/PMC9977725/supplementaryfiles/41598_2023_29843_MOESM2_ESM.xlsx Hsapiens 1 45170"
## [252] "PMC9977725 PMC_DL/PMC9977725/supplementaryfiles/41598_2023_29843_MOESM2_ESM.xlsx Hsapiens 3 45170 44991 44991"
## [253] "PMC9977003 PMC_DL/PMC9977003/supplementaryfiles/pone.0280495.s002.xlsx Hsapiens 1 43891"
## [254] "PMC9975292 PMC_DL/PMC9975292/supplementaryfiles/mmc4.xlsx Hsapiens 3 44806 44811 44813"
## [255] "PMC9975158 zip/Table-/Table_S5._The_detailed_information_of_significantly_changed_m6A_peaks_among_the_TBI_group_and_TBI+Hypo_group.xlsx Mmusculus 1 44262"
## [256] "PMC9975158 zip/Table-/Table_S4._The_detailed_information_of_significantly_changed_m6A_peaks_after_TBI.xlsx Mmusculus 1 44988"
## [257] "PMC9975158 zip/Table-/Table_S7._The_detailed_information_of_conjoint_analysis_between_m6A_methylationm_and_RNA_expression_after_TBI.xlsx Rnorvegicus 1 44258"
## [258] "PMC9975158 zip/Table-/Table_S6._The_detailed_information_of_significantly_changed_mRNA_after_TBI.xlsx Mmusculus 1 44258"
## [259] "PMC9972178 PMC_DL/PMC9972178/supplementaryfiles/CAM4-12-4993-s002.xlsx Hsapiens 6 44257 44256 44449 44262 44259 44441"
## [260] "PMC9972178 PMC_DL/PMC9972178/supplementaryfiles/CAM4-12-4993-s002.xlsx Hsapiens 6 44257 44256 44449 44262 44259 44441"
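Most of the flagged values in the listing above are five-digit numbers such as 44621 or 36951. These look like Excel date serial numbers, i.e. day counts from Excel's date origin, rather than gene symbols. The sketch below decodes a few of them in base R. It assumes the Windows 1900 date system, for which the usual origin in R is "1899-12-30"; `serial_to_day_month` is a hypothetical helper and is not part of the original script.

```r
# Hypothetical helper (not in the original script): decode Excel date serials.
# Assumes the Windows 1900 date system, whose origin in R is "1899-12-30".
serial_to_day_month <- function(serial) {
  d <- as.Date(as.numeric(serial), origin = "1899-12-30")
  # month.abb is a built-in English constant, so the label is locale independent
  day_month <- paste(as.integer(format(d, "%d")),
                     month.abb[as.integer(format(d, "%m"))], sep = "-")
  data.frame(serial = serial, date = format(d), day_month = day_month,
             stringsAsFactors = FALSE)
}

# Serial values copied from the listing above
decoded <- serial_to_day_month(c(44621, 44818, 44896, 36951))
decoded
```

A day-month label such as 1-Mar is the form Excel displays for a symbol like MARCH1, and it matches the text-style entries (for example 1-Sep) that also appear in the listing. The year is not part of the gene name, so different serials (44621 and 36951 here) can point to the same day and month.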
Let’s investigate the errors in more detail.
# By species
SPECIES <- sapply(strsplit(ERROR_GENELISTS," "),"[[",3)
table(SPECIES)
## SPECIES
## Athaliana Dmelanogaster Ggallus Hsapiens Mmusculus
## 6 13 9 206 22
## Rnorvegicus Scerevisiae
## 1 3
par(mar=c(5,12,4,2))
barplot(table(SPECIES),horiz=TRUE,las=1)
par(mar=c(5,5,4,2))
# Number of affected Excel files per paper
DIST <- table(sapply(strsplit(ERROR_GENELISTS," "),"[[",1))
DIST
##
## PMC10000302 PMC10000440 PMC10002421 PMC10002577 PMC10003012 PMC10005268
## 2 6 3 2 1 1
## PMC10005326 PMC10008496 PMC10008571 PMC10009272 PMC10010006 PMC10010668
## 1 1 1 4 3 1
## PMC10011132 PMC10011137 PMC10011141 PMC10011429 PMC10011742 PMC10012531
## 2 3 6 1 2 3
## PMC10014730 PMC10014956 PMC10016662 PMC10016690 PMC10017747 PMC10020157
## 1 1 1 3 4 1
## PMC10022194 PMC10022277 PMC10022665 PMC10022859 PMC10025395 PMC10025435
## 2 4 1 1 1 1
## PMC10025971 PMC10027475 PMC10027880 PMC10028036 PMC10028083 PMC10028223
## 34 1 1 1 5 2
## PMC10028295 PMC10033451 PMC10033585 PMC10033677 PMC10033758 PMC10033906
## 9 2 4 1 9 7
## PMC10034058 PMC10035230 PMC10036921 PMC10037080 PMC10039465 PMC10040137
## 1 1 2 5 1 2
## PMC10041653 PMC10043811 PMC10045015 PMC9972178 PMC9972295 PMC9975158
## 1 23 1 2 1 4
## PMC9975292 PMC9975319 PMC9977003 PMC9977313 PMC9977441 PMC9977725
## 1 2 1 2 7 3
## PMC9978323 PMC9978639 PMC9980301 PMC9980695 PMC9981200 PMC9981215
## 1 1 4 1 2 1
## PMC9981464 PMC9981717 PMC9982299 PMC9983314 PMC9984532 PMC9995291
## 2 13 2 1 3 6
## PMC9995926 PMC9996951 PMC9997660 PMC9997923 PMC9998083 PMC9998435
## 2 2 2 1 1 13
## PMC9998611
## 2
summary(as.numeric(DIST))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.000 2.000 3.291 3.000 34.000
hist(DIST,main="Number of affected Excel files per paper")
# PMC articles with the most affected Excel files
DIST_DF <- as.data.frame(DIST)
DIST_DF <- DIST_DF[order(-DIST_DF$Freq),,drop=FALSE]
head(DIST_DF,20)
## Var1 Freq
## 31 PMC10025971 34
## 50 PMC10043811 23
## 68 PMC9981717 13
## 78 PMC9998435 13
## 37 PMC10028295 9
## 41 PMC10033758 9
## 42 PMC10033906 7
## 59 PMC9977441 7
## 2 PMC10000440 6
## 15 PMC10011141 6
## 72 PMC9995291 6
## 35 PMC10028083 5
## 46 PMC10037080 5
## 10 PMC10009272 4
## 23 PMC10017747 4
## 26 PMC10022277 4
## 39 PMC10033585 4
## 54 PMC9975158 4
## 63 PMC9980301 4
## 3 PMC10002421 3
MOST_ERR_FILES = as.character(DIST_DF[1,1])
MOST_ERR_FILES
## [1] "PMC10025971"
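Each entry of ERROR_GENELISTS is a space-separated record: PMC ID, file path, inferred species, number of flagged names, then the flagged values. The same file can appear in several records (for example `mmc16.xlsx` in the listing), so counting records and counting distinct files can give different numbers. Below is a minimal sketch of both counts on four records copied from the listing, assuming file paths contain no spaces; `records` and `parsed` are illustrative names.

```r
# Four records copied verbatim from the listing above
records <- c(
  "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc13.xlsx Hsapiens 3 44818 44629 44624",
  "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc16.xlsx Hsapiens 1 44631",
  "PMC10025971 PMC_DL/PMC10025971/supplementaryfiles/mmc16.xlsx Hsapiens 1 44807",
  "PMC10011742 PMC_DL/PMC10011742/supplementaryfiles/mmc2.xlsx Ggallus 2 44806 44810"
)

# Split each record into fields and keep the first four as columns
fields <- strsplit(records, " ")
parsed <- data.frame(
  pmc     = sapply(fields, "[[", 1),
  file    = sapply(fields, "[[", 2),
  species = sapply(fields, "[[", 3),
  n_err   = as.numeric(sapply(fields, "[[", 4)),
  stringsAsFactors = FALSE
)

# Records per paper (what table() on the first field counts)
records_per_paper <- table(parsed$pmc)
# Distinct affected files per paper
files_per_paper <- tapply(parsed$file, parsed$pmc, function(f) length(unique(f)))
# Total flagged names per paper
errors_per_paper <- tapply(parsed$n_err, parsed$pmc, sum)

records_per_paper
files_per_paper
errors_per_paper
```

In this toy subset PMC10025971 has three records but only two distinct files, which is the kind of difference to keep in mind when reading the per-paper counts.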
# Number of errors per paper
NERR <- as.numeric(sapply(strsplit(ERROR_GENELISTS," "),"[[",4))
names(NERR) <- sapply(strsplit(ERROR_GENELISTS," "),"[[",1)
NERR <-tapply(NERR, names(NERR), sum)
NERR
## PMC10000302 PMC10000440 PMC10002421 PMC10002577 PMC10003012 PMC10005268
## 44 10 4 3 3 1
## PMC10005326 PMC10008496 PMC10008571 PMC10009272 PMC10010006 PMC10010668
## 2 2 6 112 606 67
## PMC10011132 PMC10011137 PMC10011141 PMC10011429 PMC10011742 PMC10012531
## 65 15 285 1 21 55
## PMC10014730 PMC10014956 PMC10016662 PMC10016690 PMC10017747 PMC10020157
## 3 4 24 9 13 6
## PMC10022194 PMC10022277 PMC10022665 PMC10022859 PMC10025395 PMC10025435
## 4 22 12 11 5 3
## PMC10025971 PMC10027475 PMC10027880 PMC10028036 PMC10028083 PMC10028223
## 107 16 5 1 25 5
## PMC10028295 PMC10033451 PMC10033585 PMC10033677 PMC10033758 PMC10033906
## 177 38 20 28 26 370
## PMC10034058 PMC10035230 PMC10036921 PMC10037080 PMC10039465 PMC10040137
## 15 28 2 36 4 36
## PMC10041653 PMC10043811 PMC10045015 PMC9972178 PMC9972295 PMC9975158
## 24 242 2 12 28 4
## PMC9975292 PMC9975319 PMC9977003 PMC9977313 PMC9977441 PMC9977725
## 3 24 1 4 26 10
## PMC9978323 PMC9978639 PMC9980301 PMC9980695 PMC9981200 PMC9981215
## 14 3 14 4 48 3
## PMC9981464 PMC9981717 PMC9982299 PMC9983314 PMC9984532 PMC9995291
## 29 26 21 30 6 15
## PMC9995926 PMC9996951 PMC9997660 PMC9997923 PMC9998083 PMC9998435
## 6 6 62 1 4 114
## PMC9998611
## 28
hist(NERR,main="Number of errors per PMC article")
NERR_DF <- as.data.frame(NERR)
NERR_DF <- NERR_DF[order(-NERR_DF$NERR),,drop=FALSE]
head(NERR_DF,20)
## NERR
## PMC10010006 606
## PMC10033906 370
## PMC10011141 285
## PMC10043811 242
## PMC10028295 177
## PMC9998435 114
## PMC10009272 112
## PMC10025971 107
## PMC10010668 67
## PMC10011132 65
## PMC9997660 62
## PMC10012531 55
## PMC9981200 48
## PMC10000302 44
## PMC10033451 38
## PMC10037080 36
## PMC10040137 36
## PMC9983314 30
## PMC9981464 29
## PMC10033677 28
MOST_ERR = rownames(NERR_DF)[1]
MOST_ERR
## [1] "PMC10010006"
GENELIST_ERROR_ARTICLES <- gsub("PMC","",GENELIST_ERROR_ARTICLES)
### JSON PARSING is more reliable than XML
ARTICLES <- esummary( GENELIST_ERROR_ARTICLES , db="pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLES,as= "parsed")
ARTICLE_DATA <- ARTICLE_DATA$result
ARTICLE_DATA <- ARTICLE_DATA[2:length(ARTICLE_DATA)]
JOURNALS <- unlist(lapply(ARTICLE_DATA,function(x) {x$fulljournalname} ))
JOURNALS_TABLE <- table(JOURNALS)
JOURNALS_TABLE <- JOURNALS_TABLE[order(-JOURNALS_TABLE)]
length(JOURNALS_TABLE)
## [1] 53
NUM_JOURNALS=length(JOURNALS_TABLE)
par(mar=c(5,25,4,2))
barplot(head(JOURNALS_TABLE,10), horiz=TRUE, las=1,
xlab="Articles with gene name errors in supp files",
main="Top journals this month")
Congrats to our Journal of the Month winner!
JOURNAL_WINNER <- names(head(JOURNALS_TABLE,1))
JOURNAL_WINNER
## [1] "Frontiers in Immunology"
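`head(JOURNALS_TABLE,1)` returns a single journal even when several journals share the top count. If ties matter, one option is to keep every journal whose count equals the maximum. The sketch below uses made-up journal names (`journals_demo` is invented for illustration and is not data from this report).

```r
# Invented example data, not taken from the report
journals_demo <- c("Journal A", "Journal B", "Journal A", "Journal C", "Journal B")
tab <- sort(table(journals_demo), decreasing = TRUE)

# Single winner, as in the code above
single_winner <- names(head(tab, 1))

# All journals sharing the top count
tied_winners <- names(tab)[tab == max(tab)]
tied_winners
```

Here both "Journal A" and "Journal B" appear twice, so `tied_winners` holds two names while `single_winner` holds only one of them.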
There are two categories:
Paper with the most supplementary files affected by gene name errors (MOST_ERR_FILES)
Paper with the most gene names converted to dates (MOST_ERR)
Sometimes, one paper can win both categories. Congrats to our winners.
MOST_ERR_FILES <- gsub("PMC","",MOST_ERR_FILES)
ARTICLES <- esummary( MOST_ERR_FILES , db="pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLES,as= "parsed")
ARTICLE_DATA <- ARTICLE_DATA[2]
ARTICLE_DATA
## $result
## $result$uids
## [1] "10025971"
##
## $result$`10025971`
## $result$`10025971`$uid
## [1] "10025971"
##
## $result$`10025971`$pubdate
## [1] "2023 Mar 2"
##
## $result$`10025971`$epubdate
## [1] "2023 Mar 2"
##
## $result$`10025971`$printpubdate
## [1] ""
##
## $result$`10025971`$source
## [1] "iScience"
##
## $result$`10025971`$authors
## name authtype
## 1 Li J Author
## 2 Li B Author
## 3 Zhao R Author
## 4 Li G Author
##
## $result$`10025971`$title
## [1] "Systematic analysis of the aberrances and functional implications of cuproptosis in cancer"
##
## $result$`10025971`$volume
## [1] "26"
##
## $result$`10025971`$issue
## [1] "4"
##
## $result$`10025971`$pages
## [1] "106319"
##
## $result$`10025971`$articleids
## idtype value
## 1 pmid 36950125
## 2 doi 10.1016/j.isci.2023.106319
## 3 pmcid PMC10025971
##
## $result$`10025971`$fulljournalname
## [1] "iScience"
##
## $result$`10025971`$sortdate
## [1] "2023/03/02 00:00"
##
## $result$`10025971`$pmclivedate
## [1] "2023/03/21"
MOST_ERR <- gsub("PMC","",MOST_ERR)
ARTICLE_DATA <- esummary(MOST_ERR,db = "pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLE_DATA,as= "parsed")
ARTICLE_DATA
## $header
## $header$type
## [1] "esummary"
##
## $header$version
## [1] "0.3"
##
##
## $result
## $result$uids
## [1] "10010006"
##
## $result$`10010006`
## $result$`10010006`$uid
## [1] "10010006"
##
## $result$`10010006`$pubdate
## [1] "2023 Mar 13"
##
## $result$`10010006`$epubdate
## [1] "2023 Mar 13"
##
## $result$`10010006`$printpubdate
## [1] ""
##
## $result$`10010006`$source
## [1] "Stem Cell Res Ther"
##
## $result$`10010006`$authors
## name authtype
## 1 Luo HT Author
## 2 He Q Author
## 3 Yang W Author
## 4 He F Author
## 5 Dong J Author
## 6 Hu CF Author
## 7 Yang XF Author
## 8 Li N Author
## 9 Li FR Author
##
## $result$`10010006`$title
## [1] "Single-cell analyses reveal distinct expression patterns and roles of long non-coding RNAs during hESC differentiation into pancreatic progenitors"
##
## $result$`10010006`$volume
## [1] "14"
##
## $result$`10010006`$issue
## [1] ""
##
## $result$`10010006`$pages
## [1] "38"
##
## $result$`10010006`$articleids
## idtype value
## 1 pmid 36907881
## 2 doi 10.1186/s13287-023-03259-x
## 3 pmcid PMC10010006
##
## $result$`10010006`$fulljournalname
## [1] "Stem Cell Research & Therapy"
##
## $result$`10010006`$sortdate
## [1] "2023/03/13 00:00"
##
## $result$`10010006`$pmclivedate
## [1] "2023/03/14"
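The parsed esummary result is a nested list, so printing it whole is verbose. To show only the key fields, the entry can be looked up by its UID. The sketch below works on a hand-built list that mimics the structure printed above (a subset of fields, with values copied from that output) rather than a live call to `esummary()`; `summarise_article` is a hypothetical helper.

```r
# Hand-built stand-in for reutils::content(esummary(...), as = "parsed"),
# holding a subset of the fields shown in the output above
article_demo <- list(
  header = list(type = "esummary", version = "0.3"),
  result = list(
    uids = "10010006",
    `10010006` = list(
      uid = "10010006",
      pubdate = "2023 Mar 13",
      title = "Single-cell analyses reveal distinct expression patterns and roles of long non-coding RNAs during hESC differentiation into pancreatic progenitors",
      fulljournalname = "Stem Cell Research & Therapy",
      articleids = data.frame(
        idtype = c("pmid", "doi", "pmcid"),
        value  = c("36907881", "10.1186/s13287-023-03259-x", "PMC10010006"),
        stringsAsFactors = FALSE
      )
    )
  )
)

# Hypothetical helper: one-line summary for the first UID in the result
summarise_article <- function(parsed) {
  uid <- parsed$result$uids[1]
  rec <- parsed$result[[uid]]
  ids <- rec$articleids
  doi <- ids$value[ids$idtype == "doi"]
  paste0(rec$title, ". ", rec$fulljournalname, " (", rec$pubdate, "). doi:", doi)
}

article_line <- summarise_article(article_demo)
article_line
```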
Next, we plot the trend over the past 6-12 months.
url <- "https://ziemann-lab.net/public/gene_name_errors/"
doc <- htmlParse(url)
links <- xpathSApply(doc, "//a/@href")
links <- links[grep("html",links)]
listing <- htmlParse( getURL(url, ftp.use.epsv = FALSE, dirlistonly = TRUE) )
listing <- xpathSApply(listing, "//a/@href")
listing <- listing[grep("html",listing)]
unlink("online_files/",recursive=TRUE)
dir.create("online_files")
sapply(listing, function(mylink) {
download.file(paste(url,mylink,sep=""),destfile=paste("online_files/",mylink,sep=""))
} )
## href href href href href href href href href href href href href href href href
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## href href href href href href href href href
## 0 0 0 0 0 0 0 0 0
myfilelist <- list.files("online_files/",full.names=TRUE)
trends <- sapply(myfilelist, function(myfilename) {
x <- readLines(myfilename)
# Num XL gene list articles
NUM_GENELIST_ARTICLES <- x[grep("NUM_GENELIST_ARTICLES",x)[3]+1]
NUM_GENELIST_ARTICLES <- sapply(strsplit(NUM_GENELIST_ARTICLES," "),"[[",3)
NUM_GENELIST_ARTICLES <- sapply(strsplit(NUM_GENELIST_ARTICLES,"<"),"[[",1)
NUM_GENELIST_ARTICLES <- as.numeric(NUM_GENELIST_ARTICLES)
# number of affected articles
NUM_ERROR_GENELIST_ARTICLES <- x[grep("NUM_ERROR_GENELIST_ARTICLES",x)[3]+1]
NUM_ERROR_GENELIST_ARTICLES <- sapply(strsplit(NUM_ERROR_GENELIST_ARTICLES," "),"[[",3)
NUM_ERROR_GENELIST_ARTICLES <- sapply(strsplit(NUM_ERROR_GENELIST_ARTICLES,"<"),"[[",1)
NUM_ERROR_GENELIST_ARTICLES <- as.numeric(NUM_ERROR_GENELIST_ARTICLES)
# Error proportion
ERROR_PROPORTION <- x[grep("ERROR_PROPORTION",x)[3]+1]
ERROR_PROPORTION <- sapply(strsplit(ERROR_PROPORTION," "),"[[",3)
ERROR_PROPORTION <- sapply(strsplit(ERROR_PROPORTION,"<"),"[[",1)
ERROR_PROPORTION <- as.numeric(ERROR_PROPORTION)
# number of journals
NUM_JOURNALS <- x[grep('JOURNALS_TABLE',x)[3]+1]
NUM_JOURNALS <- sapply(strsplit(NUM_JOURNALS," "),"[[",3)
NUM_JOURNALS <- sapply(strsplit(NUM_JOURNALS,"<"),"[[",1)
NUM_JOURNALS <- as.numeric(NUM_JOURNALS)
NUM_JOURNALS
res <- c(NUM_GENELIST_ARTICLES,NUM_ERROR_GENELIST_ARTICLES,ERROR_PROPORTION,NUM_JOURNALS)
return(res)
})
colnames(trends) <- sapply(strsplit(colnames(trends),"_"),"[[",3)
colnames(trends) <- sub("\\.html$","",colnames(trends))
trends <- as.data.frame(trends)
rownames(trends) <- c("NUM_GENELIST_ARTICLES","NUM_ERROR_GENELIST_ARTICLES","ERROR_PROPORTION","NUM_JOURNALS")
trends <- t(trends)
trends <- as.data.frame(trends)
CURRENT_RES <- c(NUM_GENELIST_ARTICLES,NUM_ERROR_GENELIST_ARTICLES,ERROR_PROPORTION,NUM_JOURNALS)
trends <- rbind(trends,CURRENT_RES)
paste(CURRENT_YEAR,CURRENT_MONTH,sep="-")
## [1] "2023-04"
rownames(trends)[nrow(trends)] <- paste(CURRENT_YEAR,CURRENT_MONTH,sep="-")
plot(trends$NUM_GENELIST_ARTICLES, xaxt = "n" , type="b" , main="Number of articles with Excel gene lists per month",
ylab="number of articles", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
plot(trends$NUM_ERROR_GENELIST_ARTICLES, xaxt = "n" , type="b" , main="Number of articles with gene name errors per month",
ylab="number of articles", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
plot(trends$ERROR_PROPORTION, xaxt = "n" , type="b" , main="Proportion of articles with Excel gene list affected by errors",
ylab="proportion", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
plot(trends$NUM_JOURNALS, xaxt = "n" , type="b" , main="Number of journals with affected articles",
ylab="number of journals", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
unlink("online_files/",recursive=TRUE)
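The trend code above repeats the same four steps for each statistic: find the third line that mentions the variable name, take the line after it, keep the third space-separated token, and drop everything from the first `<` onward. Those steps could be wrapped in a single function. The sketch below does this on a made-up fragment of rendered HTML; the exact markup of the real report pages is an assumption, and `extract_stat` is a hypothetical name.

```r
# Hypothetical helper wrapping the repeated extraction steps
extract_stat <- function(lines, label, occurrence = 3) {
  hit <- grep(label, lines, fixed = TRUE)[occurrence]
  value_line <- lines[hit + 1]                 # printed value follows the match
  token <- strsplit(value_line, " ")[[1]][3]   # e.g. "53</code></pre>"
  as.numeric(strsplit(token, "<")[[1]][1])     # drop trailing markup
}

# Made-up lines imitating a knitted report (assumed layout, not real data)
demo_lines <- c(
  "<pre><code>NUM_JOURNALS=length(JOURNALS_TABLE)",
  "NUM_JOURNALS",
  "<pre><code>NUM_JOURNALS</code></pre>",
  "<pre><code>## [1] 53</code></pre>"
)
num_journals_demo <- extract_stat(demo_lines, "NUM_JOURNALS")
num_journals_demo
```

With a helper like this, the body of the `sapply()` call could reduce to four calls, one per label, which makes it easier to see that every statistic is read the same way.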
Zeeberg, B.R., Riss, J., Kane, D.W. et al. Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics. BMC Bioinformatics 5, 80 (2004). https://doi.org/10.1186/1471-2105-5-80
Ziemann, M., Eren, Y. & El-Osta, A. Gene name errors are widespread in the scientific literature. Genome Biol 17, 177 (2016). https://doi.org/10.1186/s13059-016-1044-7
sessionInfo()
## R version 4.2.3 (2023-03-15)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
## [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
## [7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] RCurl_1.98-1.10 readxl_1.4.2 reutils_0.2.3 xml2_1.3.3
## [5] jsonlite_1.8.4 XML_3.99-0.13
##
## loaded via a namespace (and not attached):
## [1] assertthat_0.2.1 digest_0.6.31 bitops_1.0-7 cellranger_1.1.0
## [5] R6_2.5.1 evaluate_0.20 highr_0.10 rlang_1.1.0
## [9] cachem_1.0.7 cli_3.6.0 jquerylib_0.1.4 bslib_0.4.2
## [13] rmarkdown_2.20 tools_4.2.3 xfun_0.37 yaml_2.3.7
## [17] fastmap_1.1.1 compiler_4.2.3 htmltools_0.5.4 knitr_1.42
## [21] sass_0.4.5