Efficient normalization and feature selection methods for single-cell RNA-seq data analysis

Author(s)
Cho, Juok
Advisor
Nam, Dougu
Issued Date
2024-08
URI
https://scholarworks.unist.ac.kr/handle/201301/84106
http://unist.dcollection.net/common/orgView/200000813277
Abstract
Over the past decade, single-cell RNA sequencing (scRNA-seq) has emerged as a pivotal technique. It allows researchers to explore cellular diversity and uncover regulatory interactions between different cell populations. This capability is crucial for developing biomarkers and predicting responses to drugs. However, despite these advances, scRNA-seq still faces technical hurdles in steps such as isolating individual cells and preparing libraries, which can lead to low-abundance, noisy counts. Additionally, the raw read counts need preprocessing to correct for variation caused by technical factors across cells. Effective preprocessing aims to eliminate such technical errors while preserving the biological diversity inherent in the data. Many attempts have been made to understand the biological distinctions among cell types using scRNA-seq data. Despite the availability of numerous tools addressing this challenge, researchers often arrive at divergent conclusions because they choose different pipelines and analysis options. A major factor contributing to these discrepancies is the preprocessing steps, which strongly influence subsequent analyses. Most researchers rely on traditional normalization and feature selection methods provided by widely used tools such as the Seurat R package, often without exploring potentially superior alternatives. This approach risks overlooking biologically significant findings. Moreover, there is a lack of comprehensive discussion regarding the best strategies and the theoretical underpinnings and mathematical rationale behind the methods' varying performance. Motivated by this gap, I investigated various data preprocessing techniques to identify optimal methods for accurately detecting genuine signals that reflect variations in gene expression levels in scRNA-seq data.
To achieve this goal, I examined diverse data processing techniques currently employed in scRNA-seq data analysis, focusing on the mathematical and theoretical distinctions among different normalization techniques and feature selection methods. I then evaluated their performance in downstream analyses to assess their effects and discussed the corresponding biological implications. These evaluations were conducted using both simulated and real scRNA-seq data under varied conditions, including data sparsity, number of cell populations, and feature count, to gauge the impact of each condition. Notably, feature selection methods based on deviance outperformed those based on highly variable genes (HVG) and highly expressed genes (HEG) in the context of T-cell separation and detection. Furthermore, I proposed novel normalization and feature selection approaches that account for varying sequencing depths among cells, and these showed promising results in testing. In summary, the choice of normalization and feature selection methods significantly influenced clustering, differential expression analysis, cell type classification, and trajectory analysis. I anticipate that my research will contribute to identifying robust mathematical solutions for accurately estimating true expression signals from scRNA-seq data.
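For readers unfamiliar with deviance-based feature selection, the following is a minimal sketch of one common formulation: a per-gene binomial deviance against a null model in which each gene takes up a constant proportion of every cell's total count. The abstract does not state which deviance formulation the thesis uses, so the formula, the function names `binomial_deviance` and `top_genes`, and the toy count matrix are all illustrative assumptions, not the author's method.

```python
import numpy as np

def binomial_deviance(counts):
    """Per-gene binomial deviance for a cells x genes matrix of raw counts.

    Assumed null model: gene j makes up the same proportion pi_j of every
    cell's total count n_i, so the expected count is mu_ij = n_i * pi_j.
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.sum(axis=1, keepdims=True)              # sequencing depth per cell
    pi = counts.sum(axis=0, keepdims=True) / n.sum()   # null proportion per gene
    mu = n * pi                                        # expected counts under the null
    rest = n - counts                                  # counts assigned to all other genes
    with np.errstate(divide="ignore", invalid="ignore"):
        # Use the convention 0 * log(0) = 0 for empty entries.
        t1 = np.where(counts > 0, counts * np.log(counts / mu), 0.0)
        t2 = np.where(rest > 0, rest * np.log(rest / (n - mu)), 0.0)
    return 2.0 * (t1 + t2).sum(axis=0)

def top_genes(counts, k):
    """Indices of the k genes with the largest deviance (hypothetical helper)."""
    dev = binomial_deviance(counts)
    return np.argsort(dev)[::-1][:k]

# Toy data (made up): 4 cells x 3 genes, each cell with depth 100.
# Gene 0 has the same proportion in every cell; gene 1 is on in only two cells.
counts = np.array([
    [10,  0, 90],
    [10,  0, 90],
    [10, 20, 70],
    [10, 20, 70],
])
dev = binomial_deviance(counts)
print(dev)
print(top_genes(counts, 1))
```

In this toy matrix the gene with a constant proportion across cells gets a deviance of essentially zero, while the gene expressed in only half of the cells ranks highest. Because the null model conditions on each cell's total count, differences in sequencing depth alone do not inflate a gene's score.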
Publisher
Ulsan National Institute of Science and Technology
Degree
Doctor
Major
Department of Biomedical Engineering

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.