Source: https://github.com/markziemann/GeneNameErrors2020
View the reports: http://ziemann-lab.net/public/gene_name_errors/
Gene name errors result when data are imported improperly into MS Excel and other spreadsheet programs (Zeeberg et al, 2004). Certain gene names like MARCH3, SEPT2 and DEC1 are converted into date format. These errors are surprisingly common in supplementary data files in the field of genomics (Ziemann et al, 2016). This could be considered a small error because it only affects a small number of genes; however, it is symptomatic of poor data processing methods. The purpose of this script is to identify gene name errors present in supplementary files of PubMed Central articles in the previous month.
library("XML")
library("jsonlite")
library("xml2")
library("reutils")
library("readxl")
Here I get the PubMed Central IDs for the previous month.
Start by working out which month to search in PubMed Central.
CURRENT_MONTH=format(Sys.time(), "%m")
CURRENT_YEAR=format(Sys.time(), "%Y")
if (CURRENT_MONTH == "01") {
PREV_YEAR=as.character(as.numeric(format(Sys.time(), "%Y"))-1)
PREV_MONTH="12"
} else {
PREV_YEAR=CURRENT_YEAR
PREV_MONTH=as.character(as.numeric(format(Sys.time(), "%m"))-1)
}
DATE=paste(PREV_YEAR,"/",PREV_MONTH,sep="")
DATE
## [1] "2021/6"
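The January special case above can also be handled with date arithmetic. The following is a minimal sketch, not the code used to produce this report: `prev_month_string` is a hypothetical helper, and it zero-pads the month (`2021/06` rather than `2021/6`). Whether the search behaves identically with the padded form has not been checked here.

```r
# Hypothetical helper: return the previous calendar month as "YYYY/MM".
prev_month_string <- function(today = Sys.Date()) {
  # Snap to the first day of the current month so stepping back is unambiguous.
  first_of_month <- as.Date(format(today, "%Y-%m-01"))
  # seq() with by = "-1 month" steps back one calendar month, across year ends.
  prev <- seq(first_of_month, by = "-1 month", length.out = 2)[2]
  format(prev, "%Y/%m")
}

prev_month_string(as.Date("2021-07-15"))  # "2021/06"
prev_month_string(as.Date("2021-01-10"))  # "2020/12"
```

Working from the first day of the month avoids any ambiguity when the current day (for example the 31st) does not exist in the previous month.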
Let’s see how many PMC IDs we have in the past month.
QUERY ='((genom*[Abstract]))'
ESEARCH_RES <- esearch(term=QUERY, db = "pmc", rettype = "uilist", retmode = "xml", retstart = 0,
retmax = 5000000, usehistory = TRUE, webenv = NULL, querykey = NULL, sort = NULL, field = NULL,
datetype = NULL, reldate = NULL, mindate = DATE, maxdate = DATE)
pmc <- efetch(ESEARCH_RES,retmode="text",rettype="uilist",outfile="pmcids.txt")
## Retrieving UIDs 1 to 500
## Retrieving UIDs 501 to 1000
## Retrieving UIDs 1001 to 1500
## Retrieving UIDs 1501 to 2000
## Retrieving UIDs 2001 to 2500
## Retrieving UIDs 2501 to 3000
## Retrieving UIDs 3001 to 3500
pmc <- read.table(pmc)
pmc <- paste("PMC",pmc$V1,sep="")
NUM_ARTICLES=length(pmc)
NUM_ARTICLES
## [1] 3129
writeLines(pmc,con="pmc.txt")
Now run the bash script. Note that false positives can occur (~1.5%) and these results have not been verified by a human.
Here are some definitions:
NUM_XLS = Number of supplementary Excel files in this set of PMC articles.
NUM_XLS_ARTICLES = Number of articles matching the PubMed Central search which have supplementary Excel files.
GENELISTS = The gene lists found in the Excel files. Each Excel file is counted once even if it has multiple gene lists.
NUM_GENELISTS = The number of Excel files with gene lists.
NUM_GENELIST_ARTICLES = The number of PMC articles with supplementary Excel gene lists.
ERROR_GENELISTS = Files suspected to contain gene name errors. The dates and five-digit numbers indicate transmogrified gene names.
NUM_ERROR_GENELISTS = Number of Excel gene lists with errors.
NUM_ERROR_GENELIST_ARTICLES = Number of articles with supplementary Excel gene name errors.
ERROR_PROPORTION = Proportion of articles with Excel gene lists in which at least one gene list has a suspected error.
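To make these definitions concrete, here is a small sketch that applies the field-counting idea to three made-up result lines. It assumes each line is space-separated with the layout seen in the output of this report (PMC ID, file path, then species, error count and converted names when present). The `toy` lines, IDs and file names are invented for illustration, and the exact form of a line without errors is an assumption.

```r
# Invented example lines; not real results.
toy <- c(
  "PMC0000001 /bin/a.xlsx",                        # Excel file, no gene list detected
  "PMC0000001 /bin/b.xlsx Hsapiens",               # gene list, no errors (assumed form)
  "PMC0000002 /bin/c.xlsx Hsapiens 2 44082 43895"  # gene list with two suspected errors
)

fields <- strsplit(toy, " ")
n_fields <- lengths(fields)

# More than two fields: a species was assigned, so the file holds a gene list.
genelists <- toy[n_fields > 2]
# More than three fields: an error count and converted names follow the species.
error_genelists <- toy[n_fields > 3]

# Field 1 is the PMC ID, so unique values count articles rather than files.
genelist_articles <- unique(sapply(strsplit(genelists, " "), "[[", 1))
error_articles <- unique(sapply(strsplit(error_genelists, " "), "[[", 1))

# Proportion of gene-list articles that have at least one suspected error.
length(error_articles) / length(genelist_articles)  # 0.5
```

In this toy set, two articles have gene lists and one of them has a suspected error, so the proportion is 0.5.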
system("./gene_names.sh pmc.txt")
results <- readLines("results.txt")
XLS <- results[grep("XLS",results,ignore.case=TRUE)]
NUM_XLS = length(XLS)
NUM_XLS
## [1] 3997
NUM_XLS_ARTICLES = length(unique(sapply(strsplit(XLS," "),"[[",1)))
NUM_XLS_ARTICLES
## [1] 685
GENELISTS <- XLS[lengths(strsplit(XLS," "))>2]
#GENELISTS
NUM_GENELISTS <- length(unique(sapply(strsplit(GENELISTS," "),"[[",2)))
NUM_GENELISTS
## [1] 465
NUM_GENELIST_ARTICLES <- length(unique(sapply(strsplit(GENELISTS," "),"[[",1)))
NUM_GENELIST_ARTICLES
## [1] 241
ERROR_GENELISTS <- XLS[lengths(strsplit(XLS," "))>3]
#ERROR_GENELISTS
NUM_ERROR_GENELISTS = length(ERROR_GENELISTS)
NUM_ERROR_GENELISTS
## [1] 257
GENELIST_ERROR_ARTICLES <- unique(sapply(strsplit(ERROR_GENELISTS," "),"[[",1))
GENELIST_ERROR_ARTICLES
## [1] "PMC8225970" "PMC8222551" "PMC8237048" "PMC8236889" "PMC8236812"
## [6] "PMC8233618" "PMC8220000" "PMC8195742" "PMC8187656" "PMC8231655"
## [11] "PMC8226268" "PMC8226229" "PMC8209599" "PMC8187149" "PMC8214279"
## [16] "PMC8207177" "PMC8217872" "PMC8217514" "PMC8213770" "PMC8215672"
## [21] "PMC8210593" "PMC8196104" "PMC8195999" "PMC8192578" "PMC8191494"
## [26] "PMC8166871" "PMC8159740" "PMC8207764" "PMC8207585" "PMC8204412"
## [31] "PMC8163257" "PMC8201885" "PMC8195602" "PMC8192774" "PMC8191028"
## [36] "PMC8186478" "PMC8186232" "PMC8185950" "PMC8185105" "PMC8180056"
## [41] "PMC8176597" "PMC8175737" "PMC8175556" "PMC8208808" "PMC8172904"
## [46] "PMC8172568" "PMC8203102" "PMC8164820" "PMC8164247" "PMC8193354"
## [51] "PMC8193101" "PMC8163762" "PMC8161999" "PMC8189496" "PMC8168385"
## [56] "PMC8161968" "PMC8160133" "PMC8186667" "PMC8149827" "PMC8183685"
## [61] "PMC8181421" "PMC8144567" "PMC8144423" "PMC8144406" "PMC8140133"
## [66] "PMC8139961" "PMC8172126" "PMC8168535" "PMC8149808" "PMC8149807"
## [71] "PMC8166323" "PMC8166252" "PMC8128874" "PMC8191307" "PMC8178510"
## [76] "PMC8186902" "PMC8188494" "PMC8184692" "PMC8183939" "PMC8187225"
## [81] "PMC8183694" "PMC8185079" "PMC8167930" "PMC8148052" "PMC8159799"
## [86] "PMC8168789" "PMC8183600" "PMC8204688"
NUM_ERROR_GENELIST_ARTICLES <- length(GENELIST_ERROR_ARTICLES)
NUM_ERROR_GENELIST_ARTICLES
## [1] 88
ERROR_PROPORTION = NUM_ERROR_GENELIST_ARTICLES / NUM_GENELIST_ARTICLES
ERROR_PROPORTION
## [1] 0.3651452
Here you can have a look at all the gene lists detected in the past month, as well as those with errors. The dates are obvious errors; they are commonly dates in September, March, December and October. The five-digit numbers are the same kind of date, stored in Excel's internal format: a serial number counting days in the 1900 date system, where 1 corresponds to 1 January 1900. If you were to put these numbers into Excel and format the cells as dates, most of them would also map to dates in September, March, December and October.
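The conversion of serial numbers to dates can be checked in R without opening Excel. This is a minimal sketch assuming the files use Excel's 1900 date system, for which the usual R origin is 1899-12-30. `excel_serial_to_date` is a hypothetical helper name, and the two serial numbers are taken from the error list in this report.

```r
# Hypothetical helper: convert Excel 1900-system serial numbers to R Dates.
# The origin is 1899-12-30 rather than 1900-01-01 because Excel counts from 1
# and treats 1900 as a leap year.
excel_serial_to_date <- function(serial) {
  as.Date(as.numeric(serial), origin = "1899-12-30")
}

serials <- c("44082", "43895")
dates <- excel_serial_to_date(serials)
format(dates, "%Y-%m-%d")  # "2020-09-08" "2020-03-05"
```

Both dates fall in September or March, in line with the months named above.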
#GENELISTS
ERROR_GENELISTS
## [1] "PMC8225970 /pmc/articles/PMC8225970/bin/mmc2.xlsx Mmusculus 2 44082 43895"
## [2] "PMC8225970 /pmc/articles/PMC8225970/bin/mmc2.xlsx Mmusculus 2 44082 43895"
## [3] "PMC8225970 /pmc/articles/PMC8225970/bin/mmc2.xlsx Mmusculus 25 43534 43712 43525 43525 43714 43526 43723 43533 43529 43719 43531 43715 43526 43711 43710 43718 43716 43530 43717 43713 43535 43528 43532 43527 43709"
## [4] "PMC8222551 /pmc/articles/PMC8222551/bin/CNR2-4-e1335-s005.xlsx Hsapiens 15 43892 43891 44086 44078 43900 44083 44083 44083 43897 44077 43891 44080 43897 44079 44085"
## [5] "PMC8222551 /pmc/articles/PMC8222551/bin/CNR2-4-e1335-s006.xlsx Ggallus 16 43897 44079 44085 44080 43892 43891 44086 44078 43900 44083 44083 44083 43897 44077 43891 44080"
## [6] "PMC8237048 /pmc/articles/PMC8237048/bin/41436_2021_1243_MOESM2_ESM.xlsx Hsapiens 25 44257 44256 44449 44262 44259 44441 44450 44256 44261 44266 44258 44447 44446 44531 44263 44260 44264 44451 44440 44443 44265 44448 44257 44444 44442"
## [7] "PMC8236889 /pmc/articles/PMC8236889/bin/Table_1.XLSX Hsapiens 1 43164"
## [8] "PMC8236889 /pmc/articles/PMC8236889/bin/Table_2.XLS Hsapiens 2 2021/03/08 2021/03/01"
## [9] "PMC8236812 /pmc/articles/PMC8236812/bin/Table_1.xlsx Hsapiens 1 44262"
## [10] "PMC8233618 /pmc/articles/PMC8233618/bin/13287_2021_2444_MOESM5_ESM.xlsx Hsapiens 2 43894 43891"
## [11] "PMC8220000 /pmc/articles/PMC8220000/bin/mmc1.xlsx Hsapiens 2 44441 44446"
## [12] "PMC8220000 /pmc/articles/PMC8220000/bin/mmc1.xlsx Hsapiens 2 44441 44446"
## [13] "PMC8195742 /pmc/articles/PMC8195742/bin/41594_2021_590_MOESM10_ESM.xlsx Mmusculus 2 38047 37500"
## [14] "PMC8195742 /pmc/articles/PMC8195742/bin/41594_2021_590_MOESM10_ESM.xlsx Mmusculus 13 40787 39142 39508 39692 37865 39873 39326 38961 37135 40057 40422 36951 37316"
## [15] "PMC8195742 /pmc/articles/PMC8195742/bin/41594_2021_590_MOESM10_ESM.xlsx Mmusculus 1 38047"
## [16] "PMC8195742 /pmc/articles/PMC8195742/bin/41594_2021_590_MOESM10_ESM.xlsx Mmusculus 4 38961 39873 40787 39142"
## [17] "PMC8187656 /pmc/articles/PMC8187656/bin/42003_2021_2227_MOESM4_ESM.xlsx Hsapiens 2 37500 39692"
## [18] "PMC8187656 /pmc/articles/PMC8187656/bin/42003_2021_2227_MOESM4_ESM.xlsx Hsapiens 1 39692"
## [19] "PMC8231655 zip/SupplementaryTable1_AllTEs_Koks.xlsx Hsapiens 235 37681 37226 40238 37226 41153 41153 38231 40603 40603 37226 40238 39692 39692 40603 41153 41153 37681 40238 40057 40603 40603 37316 37226 38047 38047 37865 37865 38231 38231 40422 40422 37865 40603 38047 37316 40422 37681 37316 40422 39873 39508 37226 36951 40603 40603 40422 40422 40422 40422 36951 36951 38596 38596 38596 41153 40422 38047 40057 40057 40057 40057 40057 37226 41883 41883 40603 37865 37226 39692 38231 36951 38231 39873 38412 38412 41883 41153 38047 39692 41153 37316 41883 37316 37316 37316 37316 36951 40603 39873 41883 41883 38047 38047 38047 39692 41153 41153 41153 40422 39508 37865 37681 41883 41883 40238 39692 41883 37681 37681 40057 40057 37316 38231 41153 41153 38961 38961 38961 38961 38961 38047 37316 37316 40603 40057 37865 40603 39692 40603 38047 38412 38412 41153 37865 40422 41153 38047 38961 38961 38961 38412 36951 37865 38231 40057 40057 37316 40422 38596 38596 38961 37865 40422 38047 41883 36951 36951 37226 37226 37316 39142 39142 37865 38596 39873 37681 37681 37681 37226 41883 41883 37226 37226 40057 38412 38412 38047 41883 39692 39692 40057 40057 40603 40422 37681 37316 37316 37316 37316 38961 39508 41153 41883 40422 36951 37681 39692 40057 38047 37316 37316 38596 36951 38231 38412 39326 38047 39873 38596 37865 37865 37865 37865 40422 39692 37226 40422 40422 40422 40057 40057 37135 39873 40057 40603 37226 37316 40422 40238 39692 38047 41153 39873 39873 39873"
## [20] "PMC8226268 /pmc/articles/PMC8226268/bin/Table_1.XLSX Hsapiens 3 44256 44257 44256"
## [21] "PMC8226229 /pmc/articles/PMC8226229/bin/DataSheet_2.xlsx Hsapiens 20 44257 44442 44443 44257 44446 44445 44262 44450 44264 44261 44447 44263 44441 44258 44266 44448 44444 44256 44449 44260"
## [22] "PMC8226229 /pmc/articles/PMC8226229/bin/DataSheet_2.xlsx Hsapiens 1 44531"
## [23] "PMC8209599 /pmc/articles/PMC8209599/bin/CAM4-10-4150-s005.xlsx Hsapiens 2 44440 44256"
## [24] "PMC8209599 /pmc/articles/PMC8209599/bin/CAM4-10-4150-s006.xlsx Hsapiens 1 44256"
## [25] "PMC8187149 /pmc/articles/PMC8187149/bin/41436_2021_1116_MOESM4_ESM.xlsx Hsapiens 1 43344"
## [26] "PMC8187149 /pmc/articles/PMC8187149/bin/41436_2021_1116_MOESM4_ESM.xlsx Hsapiens 1 43678"
## [27] "PMC8187149 /pmc/articles/PMC8187149/bin/41436_2021_1116_MOESM4_ESM.xlsx Hsapiens 1 43922"
## [28] "PMC8214279 /pmc/articles/PMC8214279/bin/12864_2021_7793_MOESM2_ESM.xlsx Hsapiens 1 39326"
## [29] "PMC8214279 /pmc/articles/PMC8214279/bin/12864_2021_7793_MOESM3_ESM.xlsx Hsapiens 1 39326"
## [30] "PMC8207177 /pmc/articles/PMC8207177/bin/mmc3.xlsx Hsapiens 1 38596"
## [31] "PMC8217872 /pmc/articles/PMC8217872/bin/Table1.XLSX Hsapiens 1 44075"
## [32] "PMC8217872 /pmc/articles/PMC8217872/bin/Table1.XLSX Hsapiens 1 44075"
## [33] "PMC8217872 /pmc/articles/PMC8217872/bin/Table1.XLSX Hsapiens 1 44075"
## [34] "PMC8217514 /pmc/articles/PMC8217514/bin/41598_2021_92332_MOESM2_ESM.xlsx Hsapiens 72 37135 38596 37500 41883 37865 38231 41883 41153 40422 41153 37865 37135 38231 41153 37500 41153 39326 37865 40787 38961 39692 38596 40422 41883 38231 37865 38596 41883 37500 39692 39326 40787 37135 40787 38596 41153 38961 39692 39692 37135 38596 37500 38596 40787 39326 37135 41883 40422 38231 38961 41883 38961 38231 41153 40787 38961 39692 37500 37865 37500 39692 37135 40787 37865 38231 40422 40422 38961 40422 39326 39326 39326"
## [35] "PMC8217514 /pmc/articles/PMC8217514/bin/41598_2021_92332_MOESM2_ESM.xlsx Hsapiens 12 39326 38231 38231 39326 38231 38231 39326 39326 39326 38231 38231 39326"
## [36] "PMC8217514 /pmc/articles/PMC8217514/bin/41598_2021_92332_MOESM2_ESM.xlsx Hsapiens 1 37500"
## [37] "PMC8213770 /pmc/articles/PMC8213770/bin/41598_2021_91690_MOESM8_ESM.xlsx Celegans 1 44470"
## [38] "PMC8213770 /pmc/articles/PMC8213770/bin/41598_2021_91690_MOESM8_ESM.xlsx Celegans 1 44471"
## [39] "PMC8213770 /pmc/articles/PMC8213770/bin/41598_2021_91690_MOESM8_ESM.xlsx Celegans 1 14346"
## [40] "PMC8213770 /pmc/articles/PMC8213770/bin/41598_2021_91690_MOESM8_ESM.xlsx Celegans 1 16378"
## [41] "PMC8213770 /pmc/articles/PMC8213770/bin/41598_2021_91690_MOESM8_ESM.xlsx Celegans 1 44287"
## [42] "PMC8213770 /pmc/articles/PMC8213770/bin/41598_2021_91690_MOESM8_ESM.xlsx Celegans 1 44440"
## [43] "PMC8213770 /pmc/articles/PMC8213770/bin/41598_2021_91690_MOESM9_ESM.xlsx Celegans 1 44470"
## [44] "PMC8213770 /pmc/articles/PMC8213770/bin/41598_2021_91690_MOESM9_ESM.xlsx Celegans 1 44471"
## [45] "PMC8213770 /pmc/articles/PMC8213770/bin/41598_2021_91690_MOESM9_ESM.xlsx Celegans 1 14346"
## [46] "PMC8213770 /pmc/articles/PMC8213770/bin/41598_2021_91690_MOESM9_ESM.xlsx Celegans 1 16378"
## [47] "PMC8213770 /pmc/articles/PMC8213770/bin/41598_2021_91690_MOESM9_ESM.xlsx Celegans 1 44287"
## [48] "PMC8213770 /pmc/articles/PMC8213770/bin/41598_2021_91690_MOESM9_ESM.xlsx Celegans 1 44440"
## [49] "PMC8215672 /pmc/articles/PMC8215672/bin/DataSheet_6.xlsx Hsapiens 1 43900"
## [50] "PMC8215672 /pmc/articles/PMC8215672/bin/DataSheet_6.xlsx Hsapiens 1 44265"
## [51] "PMC8215672 /pmc/articles/PMC8215672/bin/DataSheet_6.xlsx Hsapiens 2 44264 44453"
## [52] "PMC8210593 /pmc/articles/PMC8210593/bin/thnov11p7175s3.xlsx Hsapiens 30 44262 44262 44453 44453 44256 44256 44453 44453 44453 44453 44453 44262 44262 44260 44260 44449 44261 44261 44261 44261 44440 44453 44262 44262 44262 44262 44444 44444 44260 44265"
## [53] "PMC8196104 /pmc/articles/PMC8196104/bin/41467_2021_23681_MOESM5_ESM.xlsx Hsapiens 6 43900 43891 43900 43898 43901 43900"
## [54] "PMC8195999 /pmc/articles/PMC8195999/bin/41467_2021_23596_MOESM4_ESM.xlsx Mmusculus 4 37316 40422 39692 38596"
## [55] "PMC8192578 /pmc/articles/PMC8192578/bin/41467_2021_23472_MOESM4_ESM.xlsx Hsapiens 1 38231"
## [56] "PMC8192578 /pmc/articles/PMC8192578/bin/41467_2021_23472_MOESM4_ESM.xlsx Hsapiens 1 38596"
## [57] "PMC8191494 /pmc/articles/PMC8191494/bin/mmc2.xlsx Mmusculus 12 39326 39692 40422 38231 37865 41153 37500 41883 38961 40787 40057 38596"
## [58] "PMC8191494 /pmc/articles/PMC8191494/bin/mmc8.xlsx Mmusculus 3 40603 38047 37500"
## [59] "PMC8166871 /pmc/articles/PMC8166871/bin/41467_2021_23539_MOESM9_ESM.xlsx Hsapiens 22 44082 44077 44083 43897 43893 43891 44085 43899 43896 43891 43898 44078 44081 43892 44079 44086 43895 44075 44080 43900 44084 43892"
## [60] "PMC8159740 /pmc/articles/PMC8159740/bin/41380_2019_563_MOESM14_ESM.xlsx Hsapiens 2 43712 43531"
## [61] "PMC8159740 /pmc/articles/PMC8159740/bin/41380_2019_563_MOESM14_ESM.xlsx Hsapiens 2 43716 43719"
## [62] "PMC8159740 /pmc/articles/PMC8159740/bin/41380_2019_563_MOESM14_ESM.xlsx Hsapiens 11 43533 43717 43531 43711 43719 43719 43530 43714 43714 43716 43719"
## [63] "PMC8159740 /pmc/articles/PMC8159740/bin/41380_2019_563_MOESM14_ESM.xlsx Hsapiens 13 43717 43716 43714 43719 43531 43531 43719 43714 43711 43533 43712 43530 43719"
## [64] "PMC8159740 /pmc/articles/PMC8159740/bin/41380_2019_563_MOESM14_ESM.xlsx Hsapiens 9 43533 43717 43531 43711 43719 43719 43530 43714 43714"
## [65] "PMC8159740 /pmc/articles/PMC8159740/bin/41380_2019_563_MOESM17_ESM.xlsx Hsapiens 6 43710 43713 43525 43531 43712 43714"
## [66] "PMC8159740 /pmc/articles/PMC8159740/bin/41380_2019_563_MOESM17_ESM.xlsx Hsapiens 5 43716 43530 43533 43718 43719"
## [67] "PMC8159740 /pmc/articles/PMC8159740/bin/41380_2019_563_MOESM17_ESM.xlsx Hsapiens 3 43711 43715 43532"
## [68] "PMC8159740 /pmc/articles/PMC8159740/bin/41380_2019_563_MOESM17_ESM.xlsx Hsapiens 16 43711 43715 43710 43713 43713 43525 43531 43532 43714 43525 43531 43710 43712 43713 43714 43714"
## [69] "PMC8159740 /pmc/articles/PMC8159740/bin/41380_2019_563_MOESM17_ESM.xlsx Ggallus 18 43710 43716 43713 43525 43531 43710 43712 43713 43714 43714 43530 43533 43718 43718 43719 43719 43713 43713"
## [70] "PMC8159740 /pmc/articles/PMC8159740/bin/41380_2019_563_MOESM20_ESM.xlsx Hsapiens 27 43723 43715 43712 43719 43715 43715 43530 43712 43533 43530 43530 43531 43712 43714 43714 43530 43723 43525 43531 43531 43711 43712 43712 43712 43716 43716 43723"
## [71] "PMC8159740 /pmc/articles/PMC8159740/bin/41380_2019_563_MOESM20_ESM.xlsx Hsapiens 3 43712 43723 43723"
## [72] "PMC8159740 /pmc/articles/PMC8159740/bin/41380_2019_563_MOESM20_ESM.xlsx Hsapiens 1 43531"
## [73] "PMC8159740 /pmc/articles/PMC8159740/bin/41380_2019_563_MOESM20_ESM.xlsx Hsapiens 1 43531"
## [74] "PMC8159740 /pmc/articles/PMC8159740/bin/41380_2019_563_MOESM20_ESM.xlsx Ggallus 10 43723 43531 43530 43530 43530 43533 43712 43714 43714 43719"
## [75] "PMC8159740 /pmc/articles/PMC8159740/bin/41380_2019_563_MOESM20_ESM.xlsx Hsapiens 7 43530 43712 43712 43715 43715 43723 43715"
## [76] "PMC8159740 /pmc/articles/PMC8159740/bin/41380_2019_563_MOESM20_ESM.xlsx Hsapiens 7 43530 43712 43712 43715 43715 43723 43715"
## [77] "PMC8159740 /pmc/articles/PMC8159740/bin/41380_2019_563_MOESM20_ESM.xlsx Hsapiens 3 43712 43723 43525"
## [78] "PMC8159740 /pmc/articles/PMC8159740/bin/41380_2019_563_MOESM20_ESM.xlsx Hsapiens 3 43715 43716 43531"
## [79] "PMC8159740 /pmc/articles/PMC8159740/bin/41380_2019_563_MOESM20_ESM.xlsx Hsapiens 3 43715 43716 43531"
## [80] "PMC8159740 /pmc/articles/PMC8159740/bin/41380_2019_563_MOESM20_ESM.xlsx Hsapiens 1 43716"
## [81] "PMC8159740 /pmc/articles/PMC8159740/bin/41380_2019_563_MOESM20_ESM.xlsx Hsapiens 1 43711"
## [82] "PMC8159740 /pmc/articles/PMC8159740/bin/41380_2019_563_MOESM20_ESM.xlsx Hsapiens 1 43711"
## [83] "PMC8207764 /pmc/articles/PMC8207764/bin/12885_2021_8422_MOESM1_ESM.xlsx Hsapiens 3 43900 44077 44088"
## [84] "PMC8207764 /pmc/articles/PMC8207764/bin/12885_2021_8422_MOESM1_ESM.xlsx Hsapiens 2 43900 44088"
## [85] "PMC8207585 /pmc/articles/PMC8207585/bin/12920_2021_1011_MOESM1_ESM.xlsx Hsapiens 21 44085 44081 44081 44085 44081 43891 44081 43891 44081 44081 44085 44085 44085 44085 44078 44081 44083 44085 43896 43891 44085"
## [86] "PMC8204412 /pmc/articles/PMC8204412/bin/12885_2021_8470_MOESM2_ESM.xlsx Hsapiens 1 44083"
## [87] "PMC8163257 /pmc/articles/PMC8163257/bin/mmc3.xlsx Hsapiens 27 43900 44081 44075 43896 43898 43891 43901 44086 44076 44075 43892 44089 43894 43891 43897 43899 43893 44078 44077 44088 44084 43895 44083 44166 44085 43892 44082"
## [88] "PMC8163257 /pmc/articles/PMC8163257/bin/mmc3.xlsx Hsapiens 28 43895 44088 43898 44082 44084 43899 43892 44079 43900 43891 43892 43891 44076 43901 44086 44077 43897 43896 44166 43893 44085 44075 44075 44078 44083 44081 44089 43894"
## [89] "PMC8163257 /pmc/articles/PMC8163257/bin/mmc3.xlsx Hsapiens 27 43892 44082 44077 43897 44076 44085 43894 44075 43900 43891 44088 43896 44075 44084 43893 44086 43892 44081 43891 44078 43901 43898 44089 43899 43895 44166 44083"
## [90] "PMC8163257 /pmc/articles/PMC8163257/bin/mmc3.xlsx Hsapiens 27 43892 43897 43896 44078 44076 44075 44075 44082 43893 44089 43900 44081 43894 44085 43891 44077 44088 43891 44083 43901 43898 43892 44086 43895 44166 43899 44084"
## [91] "PMC8163257 /pmc/articles/PMC8163257/bin/mmc3.xlsx Hsapiens 27 44166 44081 44075 44082 43891 44075 43895 44085 44078 44077 44083 43891 44084 43894 44088 43897 43892 43899 43892 43901 43900 43898 44089 43896 43893 44076 44086"
## [92] "PMC8163257 /pmc/articles/PMC8163257/bin/mmc3.xlsx Hsapiens 28 44077 44088 44079 44081 44084 44082 44085 43898 44083 44086 43900 43895 43892 43896 43897 43894 43891 43899 43892 43901 44166 44075 43893 44076 44075 44078 44089 43891"
## [93] "PMC8163257 /pmc/articles/PMC8163257/bin/mmc3.xlsx Hsapiens 28 44086 44078 44079 43900 44077 44085 43901 44166 43897 44084 43892 44089 44075 43894 43899 43891 44083 43892 43895 44082 43898 44075 44076 43896 44081 43891 43893 44088"
## [94] "PMC8163257 /pmc/articles/PMC8163257/bin/mmc3.xlsx Hsapiens 28 43894 44088 44082 43891 43893 43898 44075 44083 43892 44075 43892 44078 43900 43899 43897 44077 44076 43895 43896 44086 44079 44081 43891 44166 43901 44084 44089 44085"
## [95] "PMC8163257 /pmc/articles/PMC8163257/bin/mmc3.xlsx Hsapiens 28 44083 43895 43892 43898 43897 44078 44084 44079 43892 44089 44088 43901 44081 43891 44086 44075 43899 43891 44076 44166 44085 43893 44075 43894 43896 43900 44082 44077"
## [96] "PMC8163257 /pmc/articles/PMC8163257/bin/mmc3.xlsx Hsapiens 27 43893 44082 44078 43891 43901 44089 43895 44084 43900 43892 43897 44083 44085 43898 44166 44088 43894 44077 44081 44075 44075 44076 43896 44086 43891 43899 43892"
## [97] "PMC8163257 /pmc/articles/PMC8163257/bin/mmc3.xlsx Hsapiens 28 44076 43897 43893 43892 44082 43895 44085 44086 43894 43891 44083 44166 44081 44075 43900 44079 44078 43896 43898 44088 43891 43899 43901 44089 44077 44084 43892 44075"
## [98] "PMC8163257 /pmc/articles/PMC8163257/bin/mmc3.xlsx Hsapiens 27 43895 43899 44077 43893 44085 43892 43892 43900 44076 44075 43896 44086 44075 44081 43897 43891 44166 44082 43891 44083 43898 43894 44088 43901 44089 44078 44084"
## [99] "PMC8163257 /pmc/articles/PMC8163257/bin/mmc3.xlsx Hsapiens 28 44082 44086 43891 44084 43896 44089 43892 44078 44077 44083 44081 44088 44075 44166 44075 43891 43898 43897 43895 44079 43893 43901 44085 44076 43899 43894 43892 43900"
## [100] "PMC8163257 /pmc/articles/PMC8163257/bin/mmc3.xlsx Hsapiens 28 44081 44084 43901 43891 43895 44079 44086 43899 43894 44166 43897 44089 43896 43892 44076 43891 44085 43900 43892 44083 44078 43893 44082 43898 44088 44077 44075 44075"
## [101] "PMC8163257 /pmc/articles/PMC8163257/bin/mmc3.xlsx Hsapiens 27 43892 44089 43899 44083 44076 43897 44081 43891 44075 43896 44166 43898 44075 44077 44082 44084 44078 43893 44085 43901 43900 44088 44086 43891 43895 43894 43892"
## [102] "PMC8163257 /pmc/articles/PMC8163257/bin/mmc3.xlsx Hsapiens 28 43898 43895 43891 44166 43893 44076 44077 44084 43897 44079 43899 44081 44075 43891 43900 43901 44086 43892 44088 44082 44078 44089 43896 43892 43894 44075 44083 44085"
## [103] "PMC8163257 /pmc/articles/PMC8163257/bin/mmc3.xlsx Hsapiens 27 43892 44084 43895 43896 43898 43900 44075 44088 43899 43894 44075 43893 44081 43901 43891 44078 44166 43891 44077 44082 43892 44089 44086 44076 43897 44085 44083"
## [104] "PMC8163257 /pmc/articles/PMC8163257/bin/mmc3.xlsx Hsapiens 27 44166 44078 43892 44081 44083 43900 43892 43891 44089 43895 43893 43894 44082 44086 43896 44084 43898 44085 44075 44088 43897 44075 43891 44076 44077 43899 43901"
## [105] "PMC8163257 /pmc/articles/PMC8163257/bin/mmc3.xlsx Hsapiens 28 43900 44078 43898 43899 43894 44079 44089 43896 43901 44081 44084 43892 44077 43893 43891 44075 44076 44083 44086 43895 44166 44088 43892 43891 43897 44085 44075 44082"
## [106] "PMC8163257 /pmc/articles/PMC8163257/bin/mmc3.xlsx Hsapiens 27 43900 44083 44076 44086 43901 43891 44081 43897 44089 44082 43894 44078 43892 44075 44075 43891 44077 43893 44088 43899 43895 44084 43898 44085 44166 43892 43896"
## [107] "PMC8201885 /pmc/articles/PMC8201885/bin/12920_2021_1010_MOESM6_ESM.xlsx Hsapiens 8 lnc-C6orf201-17-002 lnc-C6orf201-17-001 lnc-USP35-15-001 lnc-SAMD11-13-001 lnc-SAMD11-12-001 lnc-SAMD11-12-002 lnc-SAMD11-13-002 lnc-USP35-15-002"
## [108] "PMC8195602 /pmc/articles/PMC8195602/bin/elife-66222-fig1-data1.xlsx Hsapiens 11 43534 43532 43715 43720 43723 43527 43711 43530 43718 43800 43717"
## [109] "PMC8195602 /pmc/articles/PMC8195602/bin/elife-66222-fig7-data1.xlsx Mmusculus 26 44257 44260 44447 44258 44256 44443 44440 44264 44441 44450 44263 44449 44266 44453 44446 44444 44442 44259 44262 44256 44445 44454 44451 44448 44261 44257"
## [110] "PMC8192774 /pmc/articles/PMC8192774/bin/41598_2021_91934_MOESM2_ESM.xlsx Hsapiens 1 43892"
## [111] "PMC8191028 /pmc/articles/PMC8191028/bin/13100_2021_241_MOESM7_ESM.xlsx Hsapiens 12 44447 44450 44257 44448 44256 44444 44445 44259 44440 44264 44258 44265"
## [112] "PMC8186478 /pmc/articles/PMC8186478/bin/MSB-17-e9522-s001.xlsx Hsapiens 3 43899 44081 44076"
## [113] "PMC8186478 /pmc/articles/PMC8186478/bin/MSB-17-e9522-s010.xlsx Hsapiens 5 43345 43358 43164 43162 43166"
## [114] "PMC8186478 /pmc/articles/PMC8186478/bin/MSB-17-e9522-s011.xlsx Hsapiens 33 43352 43162 43168 43168 43165 43352 43163 43163 43165 43164 43163 43165 43167 43352 43167 43352 43162 43167 43168 43168 43344 43344 43352 43162 43164 43163 43344 43352 43352 43164 43352 43165 43352"
## [115] "PMC8186478 /pmc/articles/PMC8186478/bin/MSB-17-e9522-s011.xlsx Hsapiens 34 43354 43346 43357 43355 43355 43354 43166 43353 43345 43348 43353 43355 43345 43346 43353 43344 43166 43357 43348 43354 43166 43348 43357 43346 43344 43166 43166 43348 43344 43348 43346 43345 43345 43166"
## [116] "PMC8186232 /pmc/articles/PMC8186232/bin/13148_2021_1110_MOESM2_ESM.xlsx Hsapiens 1 44448"
## [117] "PMC8186232 /pmc/articles/PMC8186232/bin/13148_2021_1110_MOESM3_ESM.xlsx Hsapiens 4 44448 44444 44265 44265"
## [118] "PMC8185950 /pmc/articles/PMC8185950/bin/13059_2021_2374_MOESM12_ESM.xlsx Mmusculus 3 40422 37865 40057"
## [119] "PMC8185105 /pmc/articles/PMC8185105/bin/41523_2021_282_MOESM1_ESM.xlsx Hsapiens 2 38961 37104"
## [120] "PMC8180056 /pmc/articles/PMC8180056/bin/12859_2021_4185_MOESM4_ESM.xlsx Hsapiens 2 37226 39508"
## [121] "PMC8176597 /pmc/articles/PMC8176597/bin/12864_2021_7717_MOESM1_ESM.xlsx Hsapiens 5 43723 43715 43719 43717 43710"
## [122] "PMC8175737 /pmc/articles/PMC8175737/bin/41419_2021_3738_MOESM2_ESM.xls Hsapiens 31 37316 37865 38231 37316 39326 38961 39142 40787 39873 41153 38047 36951 38777 41883 39692 39508 37500 37226 40238 37681 37135 42248 40603 40057 38596 36951 40422 38412 40057 39508 37135"
## [123] "PMC8175737 /pmc/articles/PMC8175737/bin/41419_2021_3738_MOESM2_ESM.xls Hsapiens 4 37316 37681 38596 38961"
## [124] "PMC8175737 /pmc/articles/PMC8175737/bin/41419_2021_3738_MOESM2_ESM.xls Hsapiens 32 37316 36951 40422 39142 38047 40787 36951 38777 38777 40603 37681 39692 39326 39326 41883 39508 39873 41153 38231 40057 40057 40057 40057 40057 40057 40057 40057 40057 37316 37865 37865 38961"
## [125] "PMC8175737 /pmc/articles/PMC8175737/bin/41419_2021_3738_MOESM2_ESM.xls Hsapiens 9 37316 37865 39326 38961 39142 40787 39873 39508 37681"
## [126] "PMC8175737 /pmc/articles/PMC8175737/bin/41419_2021_3738_MOESM2_ESM.xls Rnorvegicus 9 37316 37865 39326 38961 39142 40787 39873 39508 37681"
## [127] "PMC8175556 /pmc/articles/PMC8175556/bin/42003_2021_2140_MOESM4_ESM.xlsx Hsapiens 2 44083 44089"
## [128] "PMC8175556 /pmc/articles/PMC8175556/bin/42003_2021_2140_MOESM7_ESM.xlsx Hsapiens 1 43717"
## [129] "PMC8208808 /pmc/articles/PMC8208808/bin/Table_1.xlsx Mmusculus 4 44447 44450 44448 44445"
## [130] "PMC8172904 /pmc/articles/PMC8172904/bin/42003_2021_2201_MOESM4_ESM.xlsx Hsapiens 2 44443 44446"
## [131] "PMC8172904 /pmc/articles/PMC8172904/bin/42003_2021_2201_MOESM4_ESM.xlsx Hsapiens 22 43895 43892 44089 43898 43897 43900 43893 44081 43892 43896 43891 44079 44080 44076 44085 44082 43899 44083 44084 44077 44078 44075"
## [132] "PMC8172568 /pmc/articles/PMC8172568/bin/42003_2021_2208_MOESM4_ESM.xlsx Mmusculus 10 44262 44257 44257 44446 44443 44260 44450 44263 44261 44447"
## [133] "PMC8172568 /pmc/articles/PMC8172568/bin/42003_2021_2208_MOESM4_ESM.xlsx Mmusculus 6 44442 44264 44447 44262 44450 44446"
## [134] "PMC8172568 /pmc/articles/PMC8172568/bin/42003_2021_2208_MOESM4_ESM.xlsx Mmusculus 6 44446 44257 44262 44260 44256 44257"
## [135] "PMC8172568 /pmc/articles/PMC8172568/bin/42003_2021_2208_MOESM4_ESM.xlsx Mmusculus 4 44445 44447 44443 44260"
## [136] "PMC8172568 /pmc/articles/PMC8172568/bin/42003_2021_2208_MOESM4_ESM.xlsx Mmusculus 2 44260 44257"
## [137] "PMC8172568 /pmc/articles/PMC8172568/bin/42003_2021_2208_MOESM4_ESM.xlsx Mmusculus 9 44262 44257 44260 44448 44446 44450 44257 44443 44261"
## [138] "PMC8203102 /pmc/articles/PMC8203102/bin/Table_10.XLS Ggallus 6 42987 42987 42987 42987 42987 42987"
## [139] "PMC8203102 /pmc/articles/PMC8203102/bin/Table_10.XLS Ggallus 6 42987 42987 42987 42987 42987 42987"
## [140] "PMC8203102 /pmc/articles/PMC8203102/bin/Table_5.XLS Hsapiens 5 43532 43715 43717 43525 43529"
## [141] "PMC8203102 /pmc/articles/PMC8203102/bin/Table_5.XLS Hsapiens 5 43532 43529 43715 43717 43713"
## [142] "PMC8164820 /pmc/articles/PMC8164820/bin/13287_2021_2388_MOESM8_ESM.xls Hsapiens 1 2021/03/01"
## [143] "PMC8164247 /pmc/articles/PMC8164247/bin/12885_2021_8388_MOESM1_ESM.xlsx Hsapiens 22 44531 44257 44265 44266 44258 44259 44260 44261 44262 44263 44264 44449 44450 44451 44441 44442 44443 44444 44445 44446 44447 44448"
## [144] "PMC8193354 /pmc/articles/PMC8193354/bin/Data_Sheet_3.XLSX Hsapiens 8 39873 39142 37135 40787 38961 36951 38777 36951"
## [145] "PMC8193354 /pmc/articles/PMC8193354/bin/Data_Sheet_3.XLSX Hsapiens 7 38231 36951 38596 39692 38777 40787 36951"
## [146] "PMC8193354 /pmc/articles/PMC8193354/bin/Data_Sheet_3.XLSX Hsapiens 5 37316 39692 38231 38596 39508"
## [147] "PMC8193354 /pmc/articles/PMC8193354/bin/Data_Sheet_3.XLSX Hsapiens 10 39508 40787 39142 37135 38777 39873 38596 37316 36951 38961"
## [148] "PMC8193354 /pmc/articles/PMC8193354/bin/Data_Sheet_3.XLSX Hsapiens 3 38231 39692 36951"
## [149] "PMC8193354 /pmc/articles/PMC8193354/bin/Data_Sheet_3.XLSX Hsapiens 5 39142 39873 39508 36951 40787"
## [150] "PMC8193354 /pmc/articles/PMC8193354/bin/Data_Sheet_3.XLSX Hsapiens 8 38231 39692 38596 36951 37316 38961 37135 38777"
## [151] "PMC8193354 /pmc/articles/PMC8193354/bin/Data_Sheet_3.XLSX Hsapiens 7 39142 39508 38777 40787 38961 39873 38596"
## [152] "PMC8193354 /pmc/articles/PMC8193354/bin/Data_Sheet_3.XLSX Ggallus 6 37135 36951 38231 36951 39692 37316"
## [153] "PMC8193354 /pmc/articles/PMC8193354/bin/Data_Sheet_3.XLSX Hsapiens 6 37135 39873 39142 38961 39508 37316"
## [154] "PMC8193101 /pmc/articles/PMC8193101/bin/Table_2.XLSX Athaliana 3 43344 43345 43380"
## [155] "PMC8163762 /pmc/articles/PMC8163762/bin/41467_2021_23142_MOESM5_ESM.xlsx Scerevisiae 1 37165"
## [156] "PMC8161999 /pmc/articles/PMC8161999/bin/12967_2021_2903_MOESM1_ESM.xlsx Hsapiens 2 43526 43713"
## [157] "PMC8189496 /pmc/articles/PMC8189496/bin/ppat.1009599.s001.xlsx Hsapiens 23 43899 44078 44076 44085 44086 44079 44077 43894 44089 43901 44083 43895 44166 43896 43892 44080 43898 43893 44082 44084 43897 43900 44081"
## [158] "PMC8168385 /pmc/articles/PMC8168385/bin/2a8527d91253de19a3b16993.xlsx Hsapiens 18 40422 38596 38108 37135 37865 38231 41153 37500 40057 38961 42248 37226 39326 37012 41883 40787 39692 44531"
## [159] "PMC8168385 /pmc/articles/PMC8168385/bin/2a8527d91253de19a3b16993.xlsx Hsapiens 1 44531"
## [160] "PMC8161968 /pmc/articles/PMC8161968/bin/12935_2021_1968_MOESM3_ESM.xlsx Hsapiens 1 43892"
## [161] "PMC8160133 /pmc/articles/PMC8160133/bin/41467_2021_23384_MOESM5_ESM.xlsx Hsapiens 16 37226 37012 42248 37135 40422 40787 41153 41883 37500 37865 38231 38596 38961 39326 39692 40057"
## [162] "PMC8160133 /pmc/articles/PMC8160133/bin/41467_2021_23384_MOESM7_ESM.xlsx Hsapiens 16 37226 37012 42248 37135 40422 40787 41153 41883 37500 37865 38231 38596 38961 39326 39692 40057"
## [163] "PMC8186667 /pmc/articles/PMC8186667/bin/Table_1.XLSX Hsapiens 1 44445"
## [164] "PMC8149827 /pmc/articles/PMC8149827/bin/41467_2021_22843_MOESM6_ESM.xlsx Hsapiens 1 38047"
## [165] "PMC8183685 /pmc/articles/PMC8183685/bin/Table_1.XLSX Hsapiens 14 44089 43892 43891 44084 43897 44076 44085 43896 43898 43895 43899 44078 44083 43892"
## [166] "PMC8183685 /pmc/articles/PMC8183685/bin/Table_1.XLSX Hsapiens 14 44089 43892 43891 44084 43897 44076 44085 43896 43898 43895 43899 44078 44083 43892"
## [167] "PMC8183685 /pmc/articles/PMC8183685/bin/Table_2.XLSX Hsapiens 29 44089 44083 44083 44083 44083 44089 44083 44083 43891 43891 44078 44083 43899 44078 43892 44083 44083 44078 44078 44083 44082 44075 44083 44083 44083 44078 44079 43897 44083"
## [168] "PMC8181421 /pmc/articles/PMC8181421/bin/Table_1.xlsx Mmusculus 1 40057"
## [169] "PMC8181421 /pmc/articles/PMC8181421/bin/Table_1.xlsx Mmusculus 1 39326"
## [170] "PMC8181421 /pmc/articles/PMC8181421/bin/Table_1.xlsx Mmusculus 1 39326"
## [171] "PMC8181421 /pmc/articles/PMC8181421/bin/Table_1.xlsx Mmusculus 1 39326"
## [172] "PMC8181421 /pmc/articles/PMC8181421/bin/Table_1.xlsx Mmusculus 13 38777 40057 38961 38961 40787 39326 38777 39326 38961 39326 39142 40057 38777"
## [173] "PMC8181421 /pmc/articles/PMC8181421/bin/Table_1.xlsx Mmusculus 2 40057 37681"
## [174] "PMC8181421 /pmc/articles/PMC8181421/bin/Table_3.xlsx Mmusculus 1 39326"
## [175] "PMC8181421 /pmc/articles/PMC8181421/bin/Table_3.xlsx Mmusculus 1 39326"
## [176] "PMC8181421 /pmc/articles/PMC8181421/bin/Table_3.xlsx Mmusculus 3 40057 37135 37316"
## [177] "PMC8181421 /pmc/articles/PMC8181421/bin/Table_3.xlsx Mmusculus 3 40057 37135 37316"
## [178] "PMC8181421 /pmc/articles/PMC8181421/bin/Table_4.xlsx Mmusculus 2 36951 40238"
## [179] "PMC8181421 /pmc/articles/PMC8181421/bin/Table_4.xlsx Mmusculus 1 40057"
## [180] "PMC8144567 /pmc/articles/PMC8144567/bin/41467_2021_23379_MOESM7_ESM.xlsx Hsapiens 1 44088"
## [181] "PMC8144567 /pmc/articles/PMC8144567/bin/41467_2021_23379_MOESM7_ESM.xlsx Hsapiens 2 44082 43893"
## [182] "PMC8144567 /pmc/articles/PMC8144567/bin/41467_2021_23379_MOESM7_ESM.xlsx Hsapiens 1 44089"
## [183] "PMC8144567 /pmc/articles/PMC8144567/bin/41467_2021_23379_MOESM7_ESM.xlsx Hsapiens 1 44088"
## [184] "PMC8144567 /pmc/articles/PMC8144567/bin/41467_2021_23379_MOESM7_ESM.xlsx Hsapiens 2 44075 44086"
## [185] "PMC8144567 /pmc/articles/PMC8144567/bin/41467_2021_23379_MOESM7_ESM.xlsx Hsapiens 2 44075 44086"
## [186] "PMC8144423 /pmc/articles/PMC8144423/bin/41467_2021_23308_MOESM6_ESM.xlsx Mmusculus 24 44075 43892 43896 43899 43897 43892 44085 44083 43901 43898 44084 43891 43891 44082 44081 44078 44077 44086 43895 44076 43893 43894 44080 44079"
## [187] "PMC8144423 /pmc/articles/PMC8144423/bin/41467_2021_23308_MOESM6_ESM.xlsx Mmusculus 24 44075 43896 43892 43898 44085 43899 43897 44083 44082 43895 43901 44077 44081 44084 44078 44086 44076 43891 43893 43891 43894 44080 44079 43892"
## [188] "PMC8144423 /pmc/articles/PMC8144423/bin/41467_2021_23308_MOESM6_ESM.xlsx Mmusculus 22 44075 44081 44082 44084 44078 43901 44077 43895 43898 44076 43897 43893 43891 43896 43894 43899 44080 44085 44083 44079 43892 43892"
## [189] "PMC8144423 /pmc/articles/PMC8144423/bin/41467_2021_23308_MOESM6_ESM.xlsx Rnorvegicus 24 44075 44081 44082 44084 44078 43901 44077 44086 43895 43898 44076 43897 43893 43891 43896 43894 43899 44080 44085 44083 44079 43892 43900 43892"
## [190] "PMC8144423 /pmc/articles/PMC8144423/bin/41467_2021_23308_MOESM6_ESM.xlsx Mmusculus 24 44085 44075 44083 44081 43896 43899 43898 44078 44076 44082 43897 44080 44084 44077 43892 43891 43901 44086 43895 43891 43893 43894 44079 43892"
## [191] "PMC8144423 /pmc/articles/PMC8144423/bin/41467_2021_23308_MOESM6_ESM.xlsx Mmusculus 25 43896 44085 44077 44078 43891 43892 43899 43895 44075 44076 43891 43901 43897 44081 44082 44084 44086 43898 43893 44088 43894 44080 44083 44079 43892"
## [192] "PMC8144423 /pmc/articles/PMC8144423/bin/41467_2021_23308_MOESM6_ESM.xlsx Mmusculus 24 44075 43899 43892 44077 44081 44082 44084 44078 43901 44086 43895 43898 44076 43891 43897 43893 43891 43896 43894 44080 44085 44083 44079 43892"
## [193] "PMC8144423 /pmc/articles/PMC8144423/bin/41467_2021_23308_MOESM6_ESM.xlsx Mmusculus 25 44088 43899 43901 44075 44077 44083 43891 43896 44076 44085 44080 44084 44081 43892 44079 43892 43897 43895 44082 43898 43893 43891 44078 44086 43900"
## [194] "PMC8144423 /pmc/articles/PMC8144423/bin/41467_2021_23308_MOESM6_ESM.xlsx Mmusculus 25 44075 44081 44082 44084 44078 43901 44077 44086 43895 43898 44076 43891 43897 43893 44088 43891 43896 43894 43899 44080 44085 44083 44079 43892 43892"
## [195] "PMC8144423 /pmc/articles/PMC8144423/bin/41467_2021_23308_MOESM7_ESM.xlsx Mmusculus 3 44081 44083 44075"
## [196] "PMC8144423 /pmc/articles/PMC8144423/bin/41467_2021_23308_MOESM7_ESM.xlsx Mmusculus 3 44081 44083 44075"
## [197] "PMC8144423 /pmc/articles/PMC8144423/bin/41467_2021_23308_MOESM7_ESM.xlsx Mmusculus 3 44081 44083 44075"
## [198] "PMC8144423 /pmc/articles/PMC8144423/bin/41467_2021_23308_MOESM7_ESM.xlsx Mmusculus 5 44075 44081 44085 44083 44076"
## [199] "PMC8144406 /pmc/articles/PMC8144406/bin/41467_2021_23171_MOESM6_ESM.xlsx Hsapiens 2 43891 44080"
## [200] "PMC8140133 /pmc/articles/PMC8140133/bin/41467_2021_23327_MOESM7_ESM.xlsx Hsapiens 1 40787"
## [201] "PMC8140133 /pmc/articles/PMC8140133/bin/41467_2021_23327_MOESM7_ESM.xlsx Hsapiens 3 37865 38231 37135"
## [202] "PMC8140133 /pmc/articles/PMC8140133/bin/41467_2021_23327_MOESM7_ESM.xlsx Hsapiens 3 37865 38231 37135"
## [203] "PMC8140133 /pmc/articles/PMC8140133/bin/41467_2021_23327_MOESM7_ESM.xlsx Hsapiens 2 38231 37135"
## [204] "PMC8140133 /pmc/articles/PMC8140133/bin/41467_2021_23327_MOESM7_ESM.xlsx Hsapiens 3 37865 37135 38231"
## [205] "PMC8140133 /pmc/articles/PMC8140133/bin/41467_2021_23327_MOESM7_ESM.xlsx Hsapiens 1 40787"
## [206] "PMC8140133 /pmc/articles/PMC8140133/bin/41467_2021_23327_MOESM9_ESM.xlsx Hsapiens 1 38961"
## [207] "PMC8140133 /pmc/articles/PMC8140133/bin/41467_2021_23327_MOESM9_ESM.xlsx Hsapiens 1 39692"
## [208] "PMC8140133 /pmc/articles/PMC8140133/bin/41467_2021_23327_MOESM9_ESM.xlsx Hsapiens 3 38231 38961 39692"
## [209] "PMC8139961 /pmc/articles/PMC8139961/bin/41537_2021_159_MOESM2_ESM.xlsx Ggallus 7 43891 44083 43891 43891 44083 43891 44081"
## [210] "PMC8172126 /pmc/articles/PMC8172126/bin/Table_11.XLSX Ggallus 13 44446 44442 44446 44442 44446 44446 44446 44446 44446 44446 44442 44446 44446"
## [211] "PMC8172126 /pmc/articles/PMC8172126/bin/Table_4.XLSX Hsapiens 1 44081"
## [212] "PMC8172126 /pmc/articles/PMC8172126/bin/Table_9.XLSX Hsapiens 6 44260 44260 44448 44448 44448 44448"
## [213] "PMC8172126 /pmc/articles/PMC8172126/bin/Table_9.XLSX Mmusculus 1 44258"
## [214] "PMC8168535 /pmc/articles/PMC8168535/bin/Data_Sheet_4.xlsx Hsapiens 2 43891 44166"
## [215] "PMC8149808 /pmc/articles/PMC8149808/bin/mmc2.xlsx Mmusculus 24 37135 40603 40422 38596 40787 38047 39508 42248 37865 37316 38231 39142 37681 38412 39692 40057 38961 41883 39873 36951 40238 38777 39326 37500"
## [216] "PMC8149807 /pmc/articles/PMC8149807/bin/mmc3.xlsx Hsapiens 14 37865 36951 39692 40787 38961 37500 38412 37316 40057 40422 38596 39326 38777 39142"
## [217] "PMC8149807 /pmc/articles/PMC8149807/bin/mmc3.xlsx Hsapiens 13 37316 38412 38961 40057 37865 39692 36951 40787 38777 37500 39326 39142 40422"
## [218] "PMC8166323 /pmc/articles/PMC8166323/bin/Table_4.XLSX Mmusculus 2 43892 43891"
## [219] "PMC8166252 /pmc/articles/PMC8166252/bin/Data_Sheet_1.xlsx Hsapiens 1 72086"
## [220] "PMC8128874 /pmc/articles/PMC8128874/bin/41467_2021_22872_MOESM9_ESM.xlsx Hsapiens 5 43899 44083 44076 44081 44080"
## [221] "PMC8128874 /pmc/articles/PMC8128874/bin/41467_2021_22872_MOESM9_ESM.xlsx Hsapiens 3 44080 44081 44089"
## [222] "PMC8191307 /pmc/articles/PMC8191307/bin/mmc1.xlsx Hsapiens 144 43800 43800 43800 43526 43526 43526 43534 43534 43534 43535 43535 43535 43527 43527 43527 43528 43528 43528 43529 43529 43529 43530 43530 43530 43531 43531 43531 43532 43532 43532 43533 43533 43533 43722 43722 43722 43723 43723 43723 43718 43718 43718 43719 43719 43719 43720 43720 43720 43710 43710 43710 43711 43711 43711 43712 43712 43712 43713 43713 43713 43714 43714 43714 43715 43715 43715 43716 43716 43716 43717 43717 43717 43800 43800 43800 43526 43526 43526 43534 43534 43534 43535 43535 43535 43527 43527 43527 43528 43528 43528 43529 43529 43529 43530 43530 43530 43531 43531 43531 43532 43532 43532 43533 43533 43533 43723 43723 43723 43718 43718 43718 43719 43719 43719 43720 43720 43720 43722 43722 43722 43710 43710 43710 43711 43711 43711 43712 43712 43712 43713 43713 43713 43714 43714 43714 43715 43715 43715 43716 43716 43716 43717 43717 43717"
## [223] "PMC8191307 /pmc/articles/PMC8191307/bin/mmc2.xlsx Hsapiens 24 44080 44076 44079 43894 43892 43893 43901 43897 44166 43900 44088 43896 44084 43899 44081 44078 44082 44086 44077 44089 44085 43895 44083 43898"
## [224] "PMC8178510 /pmc/articles/PMC8178510/bin/CAM4-10-3700-s003.xlsx Hsapiens 4 42432 42618 42437 42429"
## [225] "PMC8186902 /pmc/articles/PMC8186902/bin/elife-66921-supp6.xlsx Mmusculus 4 43716 43526 43525 43710"
## [226] "PMC8188494 /pmc/articles/PMC8188494/bin/mmc2.xlsx Hsapiens 13 36951 40603 40238 37316 39508 41153 40057 39142 38047 37500 37681 39326 38231"
## [227] "PMC8188494 /pmc/articles/PMC8188494/bin/mmc2.xlsx Hsapiens 2 44077 44083"
## [228] "PMC8184692 /pmc/articles/PMC8184692/bin/JCMM-25-5534-s001.xlsx Hsapiens 4 43901 43894 44077 44079"
## [229] "PMC8184692 /pmc/articles/PMC8184692/bin/JCMM-25-5534-s002.xlsx Hsapiens 1 44265"
## [230] "PMC8183939 /pmc/articles/PMC8183939/bin/JCLA-35-e23791-s004.xlsx Hsapiens 3 43892 44076 44079"
## [231] "PMC8183939 /pmc/articles/PMC8183939/bin/JCLA-35-e23791-s008.xlsx Hsapiens 2 43892 44079"
## [232] "PMC8187225 /pmc/articles/PMC8187225/bin/11064_2021_3324_MOESM1_ESM.xlsx Mmusculus 1 43891"
## [233] "PMC8183694 /pmc/articles/PMC8183694/bin/cm9-134-1310-s001.xlsx Hsapiens 1 43170"
## [234] "PMC8185079 /pmc/articles/PMC8185079/bin/41418_2020_726_MOESM10_ESM.xlsx Hsapiens 1 43162"
## [235] "PMC8185079 /pmc/articles/PMC8185079/bin/41418_2020_726_MOESM10_ESM.xlsx Hsapiens 1 43162"
## [236] "PMC8167930 /pmc/articles/PMC8167930/bin/mmc1.xlsx Hsapiens 27 37226 36951 37316 36951 40238 37316 37681 38047 38412 38777 39142 39508 39873 42248 37135 40422 40787 41153 41883 37500 37865 38231 38596 38961 39326 39692 40057"
## [237] "PMC8167930 /pmc/articles/PMC8167930/bin/mmc1.xlsx Hsapiens 27 37681 37316 39873 38047 39326 37865 41153 37226 37135 41883 37316 42248 40057 39508 40238 38231 38412 38777 36951 40422 38961 38596 40787 39142 36951 39692 37500"
## [238] "PMC8167930 /pmc/articles/PMC8167930/bin/mmc2.xlsx Hsapiens 9 37316 39142 39873 40787 37865 38231 38596 39326 40057"
## [239] "PMC8167930 /pmc/articles/PMC8167930/bin/mmc2.xlsx Hsapiens 1 39326"
## [240] "PMC8167930 /pmc/articles/PMC8167930/bin/mmc2.xlsx Hsapiens 1 38596"
## [241] "PMC8167930 /pmc/articles/PMC8167930/bin/mmc2.xlsx Hsapiens 1 39142"
## [242] "PMC8167930 /pmc/articles/PMC8167930/bin/mmc2.xlsx Hsapiens 1 40787"
## [243] "PMC8167930 /pmc/articles/PMC8167930/bin/mmc2.xlsx Hsapiens 2 37865 38231"
## [244] "PMC8167930 /pmc/articles/PMC8167930/bin/mmc2.xlsx Hsapiens 2 37316 40057"
## [245] "PMC8148052 /pmc/articles/PMC8148052/bin/mmc2.xlsx Hsapiens 3 42993 42993 42979"
## [246] "PMC8148052 /pmc/articles/PMC8148052/bin/mmc2.xlsx Hsapiens 3 42980 42987 42980"
## [247] "PMC8159799 /pmc/articles/PMC8159799/bin/10571_2020_971_MOESM1_ESM.xlsx Hsapiens 2 37500 39326"
## [248] "PMC8168789 /pmc/articles/PMC8168789/bin/NIHMS1702004-supplement-tables2.xlsx Hsapiens 3 43891 44084 44081"
## [249] "PMC8183600 /pmc/articles/PMC8183600/bin/NIHMS1527246-supplement-1527246_SD_2.xlsx Hsapiens 1 43160"
## [250] "PMC8183600 /pmc/articles/PMC8183600/bin/NIHMS1527246-supplement-1527246_SD_2.xlsx Hsapiens 2 43347 43346"
## [251] "PMC8183600 /pmc/articles/PMC8183600/bin/NIHMS1527246-supplement-1527246_SD_2.xlsx Hsapiens 1 43164"
## [252] "PMC8183600 /pmc/articles/PMC8183600/bin/NIHMS1527246-supplement-1527246_SD_2.xlsx Hsapiens 6 43161 43348 43168 43349 43351 43352"
## [253] "PMC8183600 /pmc/articles/PMC8183600/bin/NIHMS1527246-supplement-1527246_SD_9.xlsx Hsapiens 25 43718 43526 43713 43709 43532 43528 43525 43716 43710 43534 43720 43527 43719 43529 43722 43525 43533 43531 43711 43530 43535 43717 43800 43526 43712"
## [254] "PMC8183600 /pmc/articles/PMC8183600/bin/NIHMS1527246-supplement-1527246_SD_9.xlsx Hsapiens 27 43528 43715 43529 43526 43713 43710 43718 43531 43714 43800 43534 43709 43719 43525 43530 43720 43711 43532 43722 43527 43535 43526 43717 43716 43712 43533 43525"
## [255] "PMC8204688 /pmc/articles/PMC8204688/bin/NIHMS1704835-supplement-Table_S2.xlsx Hsapiens 1 42804"
## [256] "PMC8204688 /pmc/articles/PMC8204688/bin/NIHMS1704835-supplement-Table_S2.xlsx Hsapiens 1 42798"
## [257] "PMC8204688 /pmc/articles/PMC8204688/bin/NIHMS1704835-supplement-Table_S4.xlsx Ggallus 3 41525 41521 41525"
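The five-digit numbers at the end of each row above look like Excel date serial numbers, so it can help to turn them back into calendar dates when checking a file by hand. Here is a minimal sketch, assuming the workbooks use Excel's default 1900 date system (origin 1899-12-30 in R); `excel_serial_to_date` is a helper name I made up for illustration and is not part of the original script. The example row is copied from the output above.

```r
# Convert Excel date serial numbers to R Date objects.
# Assumption: the 1900 date system (origin 1899-12-30), which is the Excel default.
excel_serial_to_date <- function(serial) {
  as.Date(as.numeric(serial), origin = "1899-12-30")
}

# One row copied from the ERROR_GENELISTS output above
example_row <- "PMC8144406 /pmc/articles/PMC8144406/bin/41467_2021_23171_MOESM6_ESM.xlsx Hsapiens 2 43891 44080"

# Fields 1-4 are PMC ID, file path, species and error count;
# everything from field 5 onwards is a serial number
fields <- strsplit(example_row, " ")[[1]]
serials <- fields[5:length(fields)]

dates <- format(excel_serial_to_date(serials), "%Y-%m-%d")
dates
```

Here the two serials decode to 1 March 2020 and 6 September 2020, which is consistent with symbols such as MARCH1 and SEPT6, although the date alone cannot prove which gene name was originally entered.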
Let’s investigate the errors in more detail.
# By species
SPECIES <- sapply(strsplit(ERROR_GENELISTS," "),"[[",3)
table(SPECIES)
## SPECIES
## Athaliana Celegans Ggallus Hsapiens Mmusculus Rnorvegicus
## 1 12 9 184 48 2
## Scerevisiae
## 1
par(mar=c(5,12,4,2))
barplot(table(SPECIES),horiz=TRUE,las=1)
par(mar=c(5,5,4,2))
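The species table above counts rows of ERROR_GENELISTS. As a possible complement, the sketch below splits each row into named columns once and then sums the number of converted gene names per species. It assumes the space-delimited layout seen in the output (PMC ID, file path, species, error count, then the serials) and that file paths contain no spaces; `error_df` and `species_totals` are illustrative names, not objects from the original script.

```r
# Three rows copied from the ERROR_GENELISTS output above
rows <- c(
  "PMC8181421 /pmc/articles/PMC8181421/bin/Table_1.xlsx Mmusculus 1 40057",
  "PMC8181421 /pmc/articles/PMC8181421/bin/Table_4.xlsx Mmusculus 2 36951 40238",
  "PMC8144406 /pmc/articles/PMC8144406/bin/41467_2021_23171_MOESM6_ESM.xlsx Hsapiens 2 43891 44080"
)

# Split once, then pull out each field by position
parts <- strsplit(rows, " ")
error_df <- data.frame(
  pmc      = sapply(parts, "[[", 1),
  file     = basename(sapply(parts, "[[", 2)),
  species  = sapply(parts, "[[", 3),
  n_errors = as.numeric(sapply(parts, "[[", 4)),
  stringsAsFactors = FALSE
)

# Total number of converted gene names per species
species_totals <- tapply(error_df$n_errors, error_df$species, sum)
species_totals
```

Keeping the parsed fields in one data frame avoids repeating the `strsplit()` call for every summary.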
# Number of affected Excel files per paper
DIST <- table(sapply(strsplit(ERROR_GENELISTS," "),"[[",1))
DIST
##
## PMC8128874 PMC8139961 PMC8140133 PMC8144406 PMC8144423 PMC8144567 PMC8148052
## 2 1 9 1 13 6 2
## PMC8149807 PMC8149808 PMC8149827 PMC8159740 PMC8159799 PMC8160133 PMC8161968
## 2 1 1 23 1 2 1
## PMC8161999 PMC8163257 PMC8163762 PMC8164247 PMC8164820 PMC8166252 PMC8166323
## 1 20 1 1 1 1 1
## PMC8166871 PMC8167930 PMC8168385 PMC8168535 PMC8168789 PMC8172126 PMC8172568
## 1 9 2 1 1 4 6
## PMC8172904 PMC8175556 PMC8175737 PMC8176597 PMC8178510 PMC8180056 PMC8181421
## 2 2 5 1 1 1 12
## PMC8183600 PMC8183685 PMC8183694 PMC8183939 PMC8184692 PMC8185079 PMC8185105
## 6 3 1 2 2 2 1
## PMC8185950 PMC8186232 PMC8186478 PMC8186667 PMC8186902 PMC8187149 PMC8187225
## 1 2 4 1 1 3 1
## PMC8187656 PMC8188494 PMC8189496 PMC8191028 PMC8191307 PMC8191494 PMC8192578
## 2 2 1 1 2 2 2
## PMC8192774 PMC8193101 PMC8193354 PMC8195602 PMC8195742 PMC8195999 PMC8196104
## 1 1 10 2 4 1 1
## PMC8201885 PMC8203102 PMC8204412 PMC8204688 PMC8207177 PMC8207585 PMC8207764
## 1 4 1 3 1 1 2
## PMC8208808 PMC8209599 PMC8210593 PMC8213770 PMC8214279 PMC8215672 PMC8217514
## 1 2 1 12 2 3 3
## PMC8217872 PMC8220000 PMC8222551 PMC8225970 PMC8226229 PMC8226268 PMC8231655
## 3 2 2 3 2 1 1
## PMC8233618 PMC8236812 PMC8236889 PMC8237048
## 1 1 2 1
summary(as.numeric(DIST))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 1.00 2.00 2.92 3.00 23.00
hist(DIST,main="Number of affected Excel files per paper")
# PMC Articles with the most errors
DIST_DF <- as.data.frame(DIST)
DIST_DF <- DIST_DF[order(-DIST_DF$Freq),,drop=FALSE]
head(DIST_DF,20)
## Var1 Freq
## 11 PMC8159740 23
## 16 PMC8163257 20
## 5 PMC8144423 13
## 35 PMC8181421 12
## 74 PMC8213770 12
## 59 PMC8193354 10
## 3 PMC8140133 9
## 23 PMC8167930 9
## 6 PMC8144567 6
## 28 PMC8172568 6
## 36 PMC8183600 6
## 31 PMC8175737 5
## 27 PMC8172126 4
## 45 PMC8186478 4
## 61 PMC8195742 4
## 65 PMC8203102 4
## 37 PMC8183685 3
## 48 PMC8187149 3
## 67 PMC8204688 3
## 76 PMC8215672 3
MOST_ERR_FILES = as.character(DIST_DF[1,1])
MOST_ERR_FILES
## [1] "PMC8159740"
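Note that the same file can appear in several rows of ERROR_GENELISTS (for example, Table_1.xlsx of PMC8181421 is listed six times above), so DIST counts affected gene lists rather than distinct files. If distinct files are of interest, something along these lines could be used. This is a sketch on a few rows copied from the output; `rows_per_paper` and `files_per_paper` are names introduced here for illustration.

```r
# Four rows copied from the ERROR_GENELISTS output above
rows <- c(
  "PMC8181421 /pmc/articles/PMC8181421/bin/Table_1.xlsx Mmusculus 1 40057",
  "PMC8181421 /pmc/articles/PMC8181421/bin/Table_1.xlsx Mmusculus 1 39326",
  "PMC8181421 /pmc/articles/PMC8181421/bin/Table_3.xlsx Mmusculus 1 39326",
  "PMC8186667 /pmc/articles/PMC8186667/bin/Table_1.XLSX Hsapiens 1 44445"
)

parts <- strsplit(rows, " ")
pmc_id <- sapply(parts, "[[", 1)
file_path <- sapply(parts, "[[", 2)

# What DIST measures: rows (affected gene lists) per paper
rows_per_paper <- table(pmc_id)

# Distinct affected files per paper: drop repeated (paper, file) pairs first
unique_pairs <- unique(data.frame(pmc_id, file_path, stringsAsFactors = FALSE))
files_per_paper <- table(unique_pairs$pmc_id)

rows_per_paper
files_per_paper
```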
# Number of errors per paper
NERR <- as.numeric(sapply(strsplit(ERROR_GENELISTS," "),"[[",4))
names(NERR) <- sapply(strsplit(ERROR_GENELISTS," "),"[[",1)
NERR <-tapply(NERR, names(NERR), sum)
NERR
## PMC8128874 PMC8139961 PMC8140133 PMC8144406 PMC8144423 PMC8144567 PMC8148052
## 8 7 18 2 231 9 6
## PMC8149807 PMC8149808 PMC8149827 PMC8159740 PMC8159799 PMC8160133 PMC8161968
## 27 24 1 153 2 32 1
## PMC8161999 PMC8163257 PMC8163762 PMC8164247 PMC8164820 PMC8166252 PMC8166323
## 2 550 1 22 1 1 2
## PMC8166871 PMC8167930 PMC8168385 PMC8168535 PMC8168789 PMC8172126 PMC8172568
## 22 71 19 2 3 21 37
## PMC8172904 PMC8175556 PMC8175737 PMC8176597 PMC8178510 PMC8180056 PMC8181421
## 24 3 85 5 4 2 30
## PMC8183600 PMC8183685 PMC8183694 PMC8183939 PMC8184692 PMC8185079 PMC8185105
## 62 57 1 5 5 2 2
## PMC8185950 PMC8186232 PMC8186478 PMC8186667 PMC8186902 PMC8187149 PMC8187225
## 3 5 75 1 4 3 1
## PMC8187656 PMC8188494 PMC8189496 PMC8191028 PMC8191307 PMC8191494 PMC8192578
## 3 15 23 12 168 15 2
## PMC8192774 PMC8193101 PMC8193354 PMC8195602 PMC8195742 PMC8195999 PMC8196104
## 1 3 65 37 20 4 6
## PMC8201885 PMC8203102 PMC8204412 PMC8204688 PMC8207177 PMC8207585 PMC8207764
## 8 22 1 5 1 21 5
## PMC8208808 PMC8209599 PMC8210593 PMC8213770 PMC8214279 PMC8215672 PMC8217514
## 4 3 30 12 2 4 85
## PMC8217872 PMC8220000 PMC8222551 PMC8225970 PMC8226229 PMC8226268 PMC8231655
## 3 4 31 29 21 3 235
## PMC8233618 PMC8236812 PMC8236889 PMC8237048
## 2 1 3 25
hist(NERR,main="Number of errors per PMC article")
NERR_DF <- as.data.frame(NERR)
NERR_DF <- NERR_DF[order(-NERR_DF$NERR),,drop=FALSE]
head(NERR_DF,20)
## NERR
## PMC8163257 550
## PMC8231655 235
## PMC8144423 231
## PMC8191307 168
## PMC8159740 153
## PMC8175737 85
## PMC8217514 85
## PMC8186478 75
## PMC8167930 71
## PMC8193354 65
## PMC8183600 62
## PMC8183685 57
## PMC8172568 37
## PMC8195602 37
## PMC8160133 32
## PMC8222551 31
## PMC8181421 30
## PMC8210593 30
## PMC8225970 29
## PMC8149807 27
MOST_ERR = rownames(NERR_DF)[1]
MOST_ERR
## [1] "PMC8163257"
GENELIST_ERROR_ARTICLES <- gsub("PMC","",GENELIST_ERROR_ARTICLES)
# JSON parsing is more reliable than XML
ARTICLES <- esummary( GENELIST_ERROR_ARTICLES , db="pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLES,as= "parsed")
ARTICLE_DATA <- ARTICLE_DATA$result
ARTICLE_DATA <- ARTICLE_DATA[2:length(ARTICLE_DATA)]
JOURNALS <- unlist(lapply(ARTICLE_DATA,function(x) {x$fulljournalname} ))
JOURNALS_TABLE <- table(JOURNALS)
JOURNALS_TABLE <- JOURNALS_TABLE[order(-JOURNALS_TABLE)]
length(JOURNALS_TABLE)
## [1] 53
NUM_JOURNALS=length(JOURNALS_TABLE)
par(mar=c(5,25,4,2))
barplot(head(JOURNALS_TABLE,10), horiz=TRUE, las=1,
xlab="Articles with gene name errors in supp files",
main="Top journals this month")
Congrats to our Journal of the Month winner!
JOURNAL_WINNER <- names(head(JOURNALS_TABLE,1))
JOURNAL_WINNER
## [1] "Nature Communications"
There are two categories:
Paper with the most supplementary files affected by gene name errors (MOST_ERR_FILES)
Paper with the most gene names converted to dates (MOST_ERR)
Sometimes, one paper can win both categories. Congrats to our winners.
MOST_ERR_FILES <- gsub("PMC","",MOST_ERR_FILES)
ARTICLES <- esummary( MOST_ERR_FILES , db="pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLES,as= "parsed")
ARTICLE_DATA <- ARTICLE_DATA[2]
ARTICLE_DATA
## $result
## $result$uids
## [1] "8159740"
##
## $result$`8159740`
## $result$`8159740`$uid
## [1] "8159740"
##
## $result$`8159740`$pubdate
## [1] "2019 Oct 30"
##
## $result$`8159740`$epubdate
## [1] "2019 Oct 30"
##
## $result$`8159740`$printpubdate
## [1] "2021"
##
## $result$`8159740`$source
## [1] "Mol Psychiatry"
##
## $result$`8159740`$authors
## name authtype
## 1 Ivashko-Pachima Y Author
## 2 Hadar A Author
## 3 Grigg I Author
## 4 Korenková V Author
## 5 Kapitansky O Author
## 6 Karmon G Author
## 7 Gershovits M Author
## 8 Sayas CL Author
## 9 Kooy RF Author
## 10 Attems J Author
## 11 Gurwitz D Author
## 12 Gozes I Author
##
## $result$`8159740`$title
## [1] "Discovery of autism/intellectual disability somatic mutations in Alzheimer's brains: mutated ADNP cytoskeletal impairments and repair as a case study"
##
## $result$`8159740`$volume
## [1] "26"
##
## $result$`8159740`$issue
## [1] "5"
##
## $result$`8159740`$pages
## [1] "1619-1633"
##
## $result$`8159740`$articleids
## idtype value
## 1 pmid 31664177
## 2 doi 10.1038/s41380-019-0563-5
## 3 pmcid PMC8159740
##
## $result$`8159740`$fulljournalname
## [1] "Molecular Psychiatry"
##
## $result$`8159740`$sortdate
## [1] "2019/10/30 00:00"
##
## $result$`8159740`$pmclivedate
## [1] "2021/06/17"
MOST_ERR <- gsub("PMC","",MOST_ERR)
ARTICLE_DATA <- esummary(MOST_ERR,db = "pmc" , retmode = "json" )
ARTICLE_DATA <- reutils::content(ARTICLE_DATA,as= "parsed")
ARTICLE_DATA
## $header
## $header$type
## [1] "esummary"
##
## $header$version
## [1] "0.3"
##
##
## $result
## $result$uids
## [1] "8163257"
##
## $result$`8163257`
## $result$`8163257`$uid
## [1] "8163257"
##
## $result$`8163257`$pubdate
## [1] "2021 Jun 15"
##
## $result$`8163257`$epubdate
## [1] ""
##
## $result$`8163257`$printpubdate
## [1] "2021 Jun 15"
##
## $result$`8163257`$source
## [1] "Biol Psychiatry"
##
## $result$`8163257`$authors
## name
## 1 Martin J
## 2 Khramtsova EA
## 3 Goleva SB
## 4 Blokland GA
## 5 Traglia M
## 6 Walters RK
## 7 Hübel C
## 8 Coleman JR
## 9 Breen G
## 10 Børglum AD
## 11 Demontis D
## 12 Grove J
## 13 Werge T
## 14 Bralten J
## 15 Bulik CM
## 16 Lee PH
## 17 Mathews CA
## 18 Peterson RE
## 19 Winham SJ
## 20 Wray N
## 21 Edenberg HJ
## 22 Guo W
## 23 Yao Y
## 24 Neale BM
## 25 Faraone SV
## 26 Petryshen TL
## 27 Weiss LA
## 28 Duncan LE
## 29 Goldstein JM
## 30 Smoller JW
## 31 Stranger BE
## 32 Davis LK
## 33 Sex Differences Cross-Disorder Analysis Group of the Psychiatric Genomics ConsortiumAldaMartinMBortolatoMarcoMBurtonChristie L.CLByrneEndaECareyCaitlin E.CEErdmanLaurenLHuckinsLaura M.LMMattheisenManuelMRobinsonEliseEStahlEliE
## authtype
## 1 Author
## 2 Author
## 3 Author
## 4 Author
## 5 Author
## 6 Author
## 7 Author
## 8 Author
## 9 Author
## 10 Author
## 11 Author
## 12 Author
## 13 Author
## 14 Author
## 15 Author
## 16 Author
## 17 Author
## 18 Author
## 19 Author
## 20 Author
## 21 Author
## 22 Author
## 23 Author
## 24 Author
## 25 Author
## 26 Author
## 27 Author
## 28 Author
## 29 Author
## 30 Author
## 31 Author
## 32 Author
## 33 CollectiveName
##
## $result$`8163257`$title
## [1] "Examining Sex-Differentiated Genetic Effects Across Neuropsychiatric and Behavioral Traits"
##
## $result$`8163257`$volume
## [1] "89"
##
## $result$`8163257`$issue
## [1] "12"
##
## $result$`8163257`$pages
## [1] "1127-1137"
##
## $result$`8163257`$articleids
## idtype value
## 1 pmid 33648717
## 2 doi 10.1016/j.biopsych.2020.12.024
## 3 pmcid PMC8163257
##
## $result$`8163257`$fulljournalname
## [1] "Biological Psychiatry"
##
## $result$`8163257`$sortdate
## [1] "2021/06/15 00:00"
##
## $result$`8163257`$pmclivedate
## [1] "2021/06/15"
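The full esummary record is quite long. A short citation could be pulled out of the parsed list instead, as sketched below. To keep the example runnable offline, `rec` is a hand-built list that mimics only the fields shown in the output above (authors, title, fulljournalname, pubdate), with the author table cut to four names; `format_citation` is a hypothetical helper, not a function from reutils.

```r
# Build a one-line citation from one parsed esummary record.
# Assumption: 'rec' has the fields seen in the printed output above.
format_citation <- function(rec, max_authors = 3) {
  authors <- rec$authors$name
  author_str <- paste(head(authors, max_authors), collapse = ", ")
  if (length(authors) > max_authors) {
    author_str <- paste(author_str, "et al")
  }
  paste0(author_str, ". ", rec$title, ". ",
         rec$fulljournalname, " (", rec$pubdate, ").")
}

# Mock record: values copied from the output above, author list truncated
rec <- list(
  authors = data.frame(
    name = c("Martin J", "Khramtsova EA", "Goleva SB", "Blokland GA"),
    authtype = "Author",
    stringsAsFactors = FALSE
  ),
  title = "Examining Sex-Differentiated Genetic Effects Across Neuropsychiatric and Behavioral Traits",
  fulljournalname = "Biological Psychiatry",
  pubdate = "2021 Jun 15"
)

citation <- format_citation(rec)
citation
```

With the real data, the record would be something like `ARTICLE_DATA$result[[MOST_ERR]]`, assuming the parsed list is shaped as printed above.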
Next, plot the trend over the past 6-12 months using the reports already published online.
url <- "http://ziemann-lab.net/public/gene_name_errors/"
doc <- htmlParse(url)
links <- xpathSApply(doc, "//a/@href")
links <- links[grep("html",links)]
links
## href href href
## "Report_2021-02.html" "Report_2021-03.html" "Report_2021-04.html"
## href href
## "Report_2021-05.html" "Report_2021-06.html"
unlink("online_files/",recursive=TRUE)
dir.create("online_files")
sapply(links, function(mylink) {
download.file(paste(url,mylink,sep=""),destfile=paste("online_files/",mylink,sep=""))
} )
## href href href href href
## 0 0 0 0 0
myfilelist <- list.files("online_files/",full.names=TRUE)
trends <- sapply(myfilelist, function(myfilename) {
x <- readLines(myfilename)
# Num XL gene list articles
NUM_GENELIST_ARTICLES <- x[grep("NUM_GENELIST_ARTICLES",x)[3]+1]
NUM_GENELIST_ARTICLES <- sapply(strsplit(NUM_GENELIST_ARTICLES," "),"[[",3)
NUM_GENELIST_ARTICLES <- sapply(strsplit(NUM_GENELIST_ARTICLES,"<"),"[[",1)
NUM_GENELIST_ARTICLES <- as.numeric(NUM_GENELIST_ARTICLES)
# number of affected articles
NUM_ERROR_GENELIST_ARTICLES <- x[grep("NUM_ERROR_GENELIST_ARTICLES",x)[3]+1]
NUM_ERROR_GENELIST_ARTICLES <- sapply(strsplit(NUM_ERROR_GENELIST_ARTICLES," "),"[[",3)
NUM_ERROR_GENELIST_ARTICLES <- sapply(strsplit(NUM_ERROR_GENELIST_ARTICLES,"<"),"[[",1)
NUM_ERROR_GENELIST_ARTICLES <- as.numeric(NUM_ERROR_GENELIST_ARTICLES)
# Error proportion
ERROR_PROPORTION <- x[grep("ERROR_PROPORTION",x)[3]+1]
ERROR_PROPORTION <- sapply(strsplit(ERROR_PROPORTION," "),"[[",3)
ERROR_PROPORTION <- sapply(strsplit(ERROR_PROPORTION,"<"),"[[",1)
ERROR_PROPORTION <- as.numeric(ERROR_PROPORTION)
# number of journals
NUM_JOURNALS <- x[grep('JOURNALS_TABLE',x)[3]+1]
NUM_JOURNALS <- sapply(strsplit(NUM_JOURNALS," "),"[[",3)
NUM_JOURNALS <- sapply(strsplit(NUM_JOURNALS,"<"),"[[",1)
NUM_JOURNALS <- as.numeric(NUM_JOURNALS)
NUM_JOURNALS
res <- c(NUM_GENELIST_ARTICLES,NUM_ERROR_GENELIST_ARTICLES,ERROR_PROPORTION,NUM_JOURNALS)
return(res)
})
colnames(trends) <- sapply(strsplit(colnames(trends),"_"),"[[",3)
colnames(trends) <- gsub(".html","",colnames(trends),fixed=TRUE)
trends <- as.data.frame(trends)
rownames(trends) <- c("NUM_GENELIST_ARTICLES","NUM_ERROR_GENELIST_ARTICLES","ERROR_PROPORTION","NUM_JOURNALS")
trends <- t(trends)
trends <- as.data.frame(trends)
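The four extraction steps inside the `sapply()` above repeat the same pattern: find the third line mentioning a variable, take the following line, keep the third space-separated token and strip any trailing HTML tag. One way this could be factored into a single helper is sketched here. `extract_report_value` is a hypothetical name, and the mock lines only imitate the assumed layout of a rendered report (a value line such as `## [1] 53</code></pre>`); the real HTML may differ.

```r
# Pull one numeric value out of the lines of a rendered report.
# Returns NA if the variable is not mentioned often enough.
extract_report_value <- function(lines, varname, occurrence = 3) {
  hit <- grep(varname, lines, fixed = TRUE)[occurrence]
  if (is.na(hit) || hit + 1 > length(lines)) {
    return(NA_real_)
  }
  # The value sits on the line after the match, e.g. "## [1] 53</code></pre>"
  token <- strsplit(lines[hit + 1], " ")[[1]][3]
  # Drop anything from the first "<" onwards (trailing HTML tags)
  as.numeric(strsplit(token, "<")[[1]][1])
}

# Mock report lines (assumed layout, not copied from a real report)
mock_lines <- c(
  "<p>NUM_JOURNALS is defined below</p>",
  "<pre>some other line</pre>",
  "length(JOURNALS_TABLE) # NUM_JOURNALS",
  "<pre>## [1] 99</pre>",
  "NUM_JOURNALS=length(JOURNALS_TABLE)",
  "## [1] 53</code></pre>"
)

num_journals <- extract_report_value(mock_lines, "NUM_JOURNALS")
missing_value <- extract_report_value(mock_lines, "ERROR_PROPORTION")
num_journals
missing_value
```

Returning NA for a missing variable means an older report with a different layout would show up as a gap in the trend plot rather than stopping the whole run.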
CURRENT_RES <- c(NUM_GENELIST_ARTICLES,NUM_ERROR_GENELIST_ARTICLES,ERROR_PROPORTION,NUM_JOURNALS)
trends <- rbind(trends,CURRENT_RES)
paste(CURRENT_YEAR,CURRENT_MONTH,sep="-")
## [1] "2021-07"
rownames(trends)[nrow(trends)] <- paste(CURRENT_YEAR,CURRENT_MONTH,sep="-")
plot(trends$NUM_GENELIST_ARTICLES, xaxt = "n" , type="b" , main="Number of articles with Excel gene lists per month",
ylab="number of articles", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
plot(trends$NUM_ERROR_GENELIST_ARTICLES, xaxt = "n" , type="b" , main="Number of articles with gene name errors per month",
ylab="number of articles", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
plot(trends$ERROR_PROPORTION, xaxt = "n" , type="b" , main="Proportion of articles with Excel gene list affected by errors",
ylab="proportion", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
plot(trends$NUM_JOURNALS, xaxt = "n" , type="b" , main="Number of journals with affected articles",
ylab="number of journals", xlab="month")
axis(1, at=1:nrow(trends), labels=rownames(trends))
unlink("online_files/",recursive=TRUE)
Zeeberg, B.R., Riss, J., Kane, D.W. et al. Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics. BMC Bioinformatics 5, 80 (2004). https://doi.org/10.1186/1471-2105-5-80
Ziemann, M., Eren, Y. & El-Osta, A. Gene name errors are widespread in the scientific literature. Genome Biol 17, 177 (2016). https://doi.org/10.1186/s13059-016-1044-7
sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##
## locale:
## [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
## [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
## [7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] readxl_1.3.1 reutils_0.2.3 xml2_1.3.2 jsonlite_1.7.2 XML_3.99-0.6
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.6 knitr_1.33 magrittr_2.0.1 R6_2.5.0
## [5] rlang_0.4.11 stringr_1.4.0 highr_0.9 tools_4.1.0
## [9] xfun_0.23 jquerylib_0.1.4 htmltools_0.5.1.1 yaml_2.2.1
## [13] digest_0.6.27 assertthat_0.2.1 sass_0.4.0 bitops_1.0-7
## [17] RCurl_1.98-1.3 evaluate_0.14 rmarkdown_2.8 stringi_1.6.2
## [21] compiler_4.1.0 bslib_0.2.5.1 cellranger_1.1.0