Related Researcher

이세민

Lee, Semin
Computational Biology Lab.


Full metadata record

DC Field Value Language
dc.citation.startPage 16927 -
dc.citation.title SCIENTIFIC REPORTS -
dc.citation.volume 9 -
dc.contributor.author Lee, Kanggeun -
dc.contributor.author Jeong, Hyoung-oh -
dc.contributor.author Lee, Semin -
dc.contributor.author Jeong, Won-Ki -
dc.date.accessioned 2023-12-21T18:21:21Z -
dc.date.available 2023-12-21T18:21:21Z -
dc.date.created 2019-12-06 -
dc.date.issued 2019-11 -
dc.description.abstract With recent advances in DNA sequencing technologies, fast acquisition of large-scale genomic data has become commonplace. For cancer studies, in particular, there is an increasing need for the classification of cancer type based on somatic alterations detected from sequencing analyses. However, the ever-increasing size and complexity of the data make the classification task extremely challenging. In this study, we evaluate the contributions of various input features, such as mutation profiles, mutation rates, mutation spectra and signatures, and somatic copy number alterations that can be derived from genomic data, and further utilize them for accurate cancer type classification. We introduce a novel ensemble of machine learning classifiers, called CPEM (Cancer Predictor using an Ensemble Model), which is tested on 7,002 samples representing over 31 different cancer types collected from The Cancer Genome Atlas (TCGA) database. We first systematically examined the impact of the input features. Features known to be associated with specific cancers had relatively high importance in our initial prediction model. We further investigated various machine learning classifiers and feature selection methods to derive the ensemble-based cancer type prediction model achieving up to 84% classification accuracy in the nested 10-fold cross-validation. Finally, we narrowed down the target cancers to the six most common types and achieved up to 94% accuracy. -
dc.identifier.bibliographicCitation SCIENTIFIC REPORTS, v.9, pp.16927 -
dc.identifier.doi 10.1038/s41598-019-53034-3 -
dc.identifier.issn 2045-2322 -
dc.identifier.scopusid 2-s2.0-85075114846 -
dc.identifier.uri https://scholarworks.unist.ac.kr/handle/201301/30674 -
dc.identifier.url https://www.nature.com/articles/s41598-019-53034-3 -
dc.identifier.wosid 000496716800011 -
dc.language English -
dc.publisher NATURE PUBLISHING GROUP -
dc.title CPEM: Accurate cancer type classification based on somatic alterations using an ensemble of a random forest and a deep neural network -
dc.type Article -
dc.description.isOpenAccess TRUE -
dc.relation.journalWebOfScienceCategory Multidisciplinary Sciences -
dc.relation.journalResearchArea Science & Technology - Other Topics -
dc.type.docType Article -
dc.description.journalRegisteredClass scie -
dc.description.journalRegisteredClass scopus -
dc.subject.keywordPlus PATTERNS -
dc.subject.keywordPlus GENES -

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.