Abstract
A large number of prokaryotic genomes have been sequenced, so describing and distinguishing them accurately is becoming a key issue in prokaryotic taxonomy. We propose an efficient algorithm that filters out most horizontally transferred genome fragments, and we extract a new genome vector (GV). To demonstrate the power of the GV, we applied it to identify prokaryotes and their variable-size genome fragments. The results indicate that the new vector, used as a species tag, can accurately identify genome fragments as short as 3,000 bp at the species level.
Acknowledgments
This research was supported by the Graduate Innovation Fund of Jilin University (20121101). We thank the anonymous reviewers for their helpful comments on our work, and Dr. Y. Xu and Dr. F. Zhou for their helpful discussions.
Cite this article
Hou, T., Liu, F., Lin, C.X. et al. A New Vector for Identification of Prokaryotes and Their Variable-Size Genomes. Curr Microbiol 66, 96–101 (2013). https://doi.org/10.1007/s00284-012-0246-9