Baskin Center for Computer Engineering and Science
University of California
Santa Cruz, CA 95064
Technical Report UCSC-CRL-99-11
Updated for SAM Version 3.4
July 31, 2003
Mark Diekhans, Leslie Grate
Christian Barrett, Michael Brown, Jonathan Casper, Melissa Cline,
Terry Figel, Rachel Karchin, Kimmen Sjölander, Christopher Tarnas
Supersedes UCSC-CRL-95-07 and UCSC-CRL-96-22
The Sequence Alignment and Modeling system (SAM) is a collection of flexible software tools for creating, refining, and using linear hidden Markov models (HMMs) for biological sequence analysis. The model states can be viewed as representing the sequence of columns in a multiple sequence alignment, with provisions for arbitrary position-dependent insertions and deletions in each sequence. The models are trained on a family of protein or nucleic acid sequences using an expectation-maximization algorithm and a variety of algorithmic heuristics. A trained model can then be used both to generate multiple alignments and to search databases for new members of the family.
SAM includes programs and scripts for the SAM-T2K method of remote homology detection. SAM-T2K builds an HMM from a single protein sequence or seed alignment by iteratively searching a protein database. The method is currently the most sensitive purely sequence-based remote homology detection algorithm. SAM-T2K is based on successful methods created for the CASP2 and CASP3 protein structure prediction experiments.
The algorithms and methods used by SAM have been described in several pioneering papers from the University of California, Santa Cruz. These papers, as well as the SAM software suite, several servers, and links to related sites are available on the World-Wide Web to