
Related Researcher


Bhak, Jong
KOrean GenomIcs Center (KOGIC)
Research Interests
  • Geromics, genomics, bioinformatics, protein engineering, OMICS


RSDB: representative protein sequence databases have high information content

Title
RSDB: representative protein sequence databases have high information content
Author
Bhak, Jong Hwa; Holm, Liisa; Heger, Andreas; Chothia, Cyrus
Keywords
HIDDEN MARKOV-MODELS; ALIGNMENTS; SEARCH; SENSITIVITY; FAMILIES; PFAM
Issue Date
2000-05
Publisher
OXFORD UNIV PRESS
Citation
BIOINFORMATICS, v.16, no.5, pp. 458-464
Abstract
Motivation: Biological sequence databases are highly redundant for two main reasons: (1) databanks keep many identical and nearly identical sequences, and (2) natural sequences often share high sequence identity because of gene duplication. We wanted to know how many sequences can be removed before the databases start losing homology information. Can a database of sequences with a mutual sequence identity of 50% or less provide the same amount of biological information as the original full database?
Results: Comparisons of nine representative sequence databases (RSDBs) derived from full protein databanks showed that the information content of a sequence database is not linearly proportional to its size. An RSDB reduced to a mutual sequence identity of around 50% (RSDB50) was equivalent to the original full database in terms of the effectiveness of homology searching. At one third of the full database size, it made iterative profile searching six times faster. The RSDBs are produced at different levels of granularity for efficient homology searching.
Availability: All the RSDB files generated and the full analysis results are available on the Internet: ftp://ftp.ebi.ac.uk/pub/contrib/jong/RSDB/ and http://cyrah.ebi.ac.uk:1111/Proj/Bio/RSDB
Contact: jong@biosophy.org
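The abstract describes shrinking a databank until no two retained sequences share more than about 50% identity. This record does not state the authors' actual clustering procedure, so the following is only a minimal sketch of one common approach (greedy, longest-first selection of representatives). It assumes a toy global-alignment score (match +1, mismatch -1, gap -2) and measures identity as identical aligned residues divided by the length of the shorter sequence; the names `fraction_identity` and `representative_set` are hypothetical.

```python
"""Greedy redundancy reduction of protein sequences to a maximum mutual identity.

Hypothetical sketch: the scoring scheme, the identity definition and the
longest-first greedy order are assumptions, not the RSDB authors' procedure.
"""

MATCH, MISMATCH, GAP = 1, -1, -2  # assumed toy scores, not a substitution matrix


def fraction_identity(a: str, b: str) -> float:
    """Needleman-Wunsch global alignment; identity = identical pairs / shorter length."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            score[i][j] = max(diag, score[i - 1][j] + GAP, score[i][j - 1] + GAP)
    # Trace back one optimal path, counting identical aligned residues.
    i, j, identical = n, m, 0
    while i > 0 and j > 0:
        pair = MATCH if a[i - 1] == b[j - 1] else MISMATCH
        if score[i][j] == score[i - 1][j - 1] + pair:
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + GAP:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
    shorter = min(n, m)
    return identical / shorter if shorter else 0.0


def representative_set(seqs: dict[str, str], max_identity: float = 0.5) -> list[str]:
    """Keep a sequence only if its identity to every kept one is <= max_identity."""
    kept: list[str] = []
    # Longest first, so each redundant group is represented by its longest member.
    for name in sorted(seqs, key=lambda k: len(seqs[k]), reverse=True):
        if all(fraction_identity(seqs[name], seqs[r]) <= max_identity for r in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    demo = {
        "a": "MKTAYIAKQRQISFVKSHFSRQ",
        "b": "MKTAYIAKQRQISFVKSHFSRQ",  # identical to "a", so it is dropped
        "c": "WWWWWWWWWW",              # unrelated to "a", so it is kept
    }
    print(representative_set(demo))  # → ['a', 'c']
```

Each candidate is aligned against every representative kept so far, so the cost grows quickly with database size; the sketch favours readability over speed, and the 0.5 threshold is inclusive to match "50% or less".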
DOI
10.1093/bioinformatics/16.5.458
ISSN
1367-4803
Appears in Collections:
BME_Journal Papers
Files in This Item:
Bioinformatics-2000-Park-458-64.pdf




Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
