Human genomes are routinely compared against a universal human reference. However, this strategy can miss population-specific and personal genomic variations, which may be detected more efficiently using an ethnically relevant or personal reference. Here I describe the principles and methods for constructing a hybrid assembly of the first Korean reference genome (KOREF) by combining all the major contemporary sequencing and mapping technologies: short and long paired-end sequences, synthetic and single-molecule long reads, and optical and nanochannel genome maps. This low-cost hybrid approach demonstrates that reference-quality de novo assemblies can feasibly be produced routinely, enabling precise analysis of many personal and ethnic genomes in the future. I also introduce the concept of the consensus variome reference, which provides information on millions of variants incorporated directly from 40 additional ethnically homogeneous genomes from the Korean Personal Genome Project. KOREF is the first de novo assembled consensus variome reference. KOREF was constructed according to standardized production and evaluation procedures, and was registered as standard reference data for ethnic Korean genomes following an evaluation of its traceability, uncertainty, and consistency. By comparing KOREF against other ethnic references, I find that an ethnically relevant consensus reference can be beneficial for efficient variant detection and possibly for other purposes in the future. Therefore, I propose that, despite the limited divergence within our species, the level of genome-scale variation is sufficiently high to warrant the use of ethnically relevant references for large-scale personal and disease genome projects. Systematic comparison of human assemblies also shows the importance of assembly quality, suggesting that new technologies are needed to comprehensively map ethnic and personal genomic structural variations.
In the era of large-scale population genome projects, leveraging ethnicity-specific genome assemblies alongside the human reference genome will accelerate the mapping of all human genome diversity on Earth.
Publisher
Ulsan National Institute of Science and Technology (UNIST)