Related Researcher

남덕우

Nam, Dougu
Bioinformatics Lab.

Full metadata record

DC Field Value Language
dc.citation.startPage 408 -
dc.citation.title BMC GENOMICS -
dc.citation.volume 18 -
dc.contributor.author Yoon, Sora -
dc.contributor.author Nam, Dougu -
dc.date.accessioned 2023-12-21T22:14:26Z -
dc.date.available 2023-12-21T22:14:26Z -
dc.date.created 2017-06-20 -
dc.date.issued 2017-05 -
dc.description.abstract Background: In differential expression analysis of RNA-sequencing (RNA-seq) read count data for two sample groups, highly expressed genes (or longer genes) are known to be more likely to be differentially expressed, a phenomenon called read count bias (or gene length bias). This bias has a great effect on downstream Gene Ontology over-representation analysis. However, it has not been systematically analyzed for different replicate types of RNA-seq data. Results: We show, by mathematical inference and by tests on a number of simulated and real RNA-seq datasets, that the dispersion coefficient of a gene in the negative binomial modeling of read counts is the critical determinant of the read count bias (and gene length bias). We demonstrate that the read count bias is mostly confined to data with small gene dispersions (e.g., technical replicates and some genetically identical replicates such as cell lines or inbred animals), and that many biological replicate datasets from unrelated samples do not suffer from such a bias except for genes with some small counts. We also show that the sample-permuting GSEA method yields a considerable number of false positives caused by the read count bias, while the preranked method does not. Conclusion: We show for the first time that small gene variance (similarly, dispersion) is the main cause of read count bias (and gene length bias), and we analyze the read count bias for different replicate types of RNA-seq data and its effect on gene-set enrichment analysis. -
dc.identifier.bibliographicCitation BMC GENOMICS, v.18, pp.408 -
dc.identifier.doi 10.1186/s12864-017-3809-0 -
dc.identifier.issn 1471-2164 -
dc.identifier.scopusid 2-s2.0-85019724972 -
dc.identifier.uri https://scholarworks.unist.ac.kr/handle/201301/22265 -
dc.identifier.url https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-017-3809-0 -
dc.identifier.wosid 000404075000004 -
dc.language English -
dc.publisher BIOMED CENTRAL LTD -
dc.title Gene dispersion is the key determinant of the read count bias in differential expression analysis of RNA-seq data -
dc.type Article -
dc.description.isOpenAccess TRUE -
dc.relation.journalWebOfScienceCategory Biotechnology & Applied Microbiology; Genetics & Heredity -
dc.relation.journalResearchArea Biotechnology & Applied Microbiology; Genetics & Heredity -
dc.description.journalRegisteredClass scie -
dc.description.journalRegisteredClass scopus -
dc.subject.keywordAuthor RNA-seq -
dc.subject.keywordAuthor Differential expression analysis -
dc.subject.keywordAuthor Read count bias -
dc.subject.keywordAuthor Gene length bias -
dc.subject.keywordAuthor Dispersion -
dc.subject.keywordPlus BIOCONDUCTOR PACKAGE -
dc.subject.keywordPlus ENRICHMENT ANALYSIS -
dc.subject.keywordPlus BREAST-CANCER -
dc.subject.keywordPlus LENGTH BIAS -
dc.subject.keywordPlus IDENTIFICATION -
dc.subject.keywordPlus REPRODUCIBILITY -
dc.subject.keywordPlus NORMALIZATION -
dc.subject.keywordPlus SEQUENCE -
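
The abstract's central claim, that small dispersion drives the read count bias, can be illustrated with the standard negative binomial mean-variance relation Var = mu + phi * mu^2, so that the squared coefficient of variation is 1/mu + phi. The sketch below is not taken from the paper. The function names, the mean counts, and the two dispersion values are assumptions chosen purely for illustration. With a small phi the relative noise falls sharply as the mean count grows, so high-count genes are easier to detect. With a large phi the relative noise is nearly flat across counts.

```python
# Illustrative sketch only: names and parameter values are assumptions,
# not the authors' code or data.

def nb_variance(mean, dispersion):
    """Variance of a negative binomial count: mu + phi * mu^2."""
    return mean + dispersion * mean ** 2


def nb_squared_cv(mean, dispersion):
    """Squared coefficient of variation: Var / mu^2 = 1/mu + phi."""
    return nb_variance(mean, dispersion) / mean ** 2


# Hypothetical mean read counts and dispersion values.
MEANS = [10, 100, 1000, 10000]
DISPERSIONS = {"small dispersion": 0.001, "large dispersion": 0.5}


def cv_table():
    """Squared CV for every (dispersion, mean) combination."""
    return {
        label: [nb_squared_cv(m, phi) for m in MEANS]
        for label, phi in DISPERSIONS.items()
    }


if __name__ == "__main__":
    for label, row in cv_table().items():
        print(label, ["%.4f" % v for v in row])
```

Comparing the first and last entries of each row shows how strongly the relative noise depends on the mean count in the small-dispersion case and how weakly it does in the large-dispersion case.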

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.