ISMB-97: Intelligent Systems for Molecular Biology 1997
Gelfand Tutorial
The Fifth International Conference on Intelligent Systems for
Molecular Biology (ISMB-97)
Prediction of Protein Secondary Structure:
Parametric and Nonparametric Statistical Methods
a tutorial
Peter J. Munson, Valentina DiFrancesco, Geetha Vasudevan
Analytical Biostatistics Section, Laboratory of Structural Biology
National Institutes of Health
Bethesda, MD 20892-5626, USA
Knowledge of protein structure is an essential tool for deducing the
biological function of a novel DNA or protein sequence. While it is not
yet possible to reliably predict three-dimensional structure from sequence,
methods for predicting secondary (or local) structure are showing steadily
increasing accuracy. Bioinformaticians should be aware of the many methods
for structure prediction and of their relative merits. Widely available
commercial packages do not yet make use of the best available methods, and
research into improving current methods is quite active.
Prediction methods can be grouped as either theoretical (based on
underlying physics) or statistical (based on the growing database of known
structures). The statistical methods may be viewed as either parametric or
nonparametric in nature. We will review many published methods within this
classification and discuss the following questions: How reliable are current structure
predictions? Are predictions becoming more reliable with time? How does
one choose between multiple predictions for the same protein? How do
multiple sequences improve prediction accuracy? How are such methods used
to determine protein class, protein architecture or fold?
We will also discuss our experience with our hybrid (semiparametric)
prediction method, the Quadratic-Logistic.
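
To make the parametric/nonparametric distinction concrete, here is a minimal sketch in Python. It is not the Quadratic-Logistic method or any published predictor. The training windows, the two-state (helix/strand) labels, and the function names are all invented for illustration. The parametric predictor compresses the training data into one log-odds weight per residue type; the nonparametric predictor keeps every training window and votes among the nearest ones.

```python
from collections import Counter
import math

# Hypothetical toy training set: (3-residue window, state of the central residue).
# States: "H" = helix, "E" = strand. Window letters are amino acid codes, so an
# "E" inside a window means glutamate. All examples are invented.
TRAIN = [
    ("AEL", "H"), ("LEA", "H"), ("AAL", "H"), ("EAL", "H"),
    ("VIV", "E"), ("IVT", "E"), ("TVI", "E"), ("VTV", "E"),
]

def fit_parametric(train, pseudocount=1.0):
    """Parametric: summarize the data as one log-odds weight per residue type."""
    counts = {"H": Counter(), "E": Counter()}
    for window, state in train:
        counts[state].update(window)
    alphabet = sorted(set("".join(window for window, _ in train)))
    totals = {
        state: sum(counts[state].values()) + pseudocount * len(alphabet)
        for state in counts
    }
    # Positive weight: residue is more frequent in helix windows than in strand windows.
    return {
        aa: math.log((counts["H"][aa] + pseudocount) / totals["H"])
        - math.log((counts["E"][aa] + pseudocount) / totals["E"])
        for aa in alphabet
    }

def predict_parametric(weights, window):
    """Sum the fitted weights over the window; unseen residues contribute 0."""
    score = sum(weights.get(aa, 0.0) for aa in window)
    return "H" if score > 0 else "E"

def predict_nearest_neighbor(train, window, k=3):
    """Nonparametric: keep all training windows and vote among the k closest."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    nearest = sorted(train, key=lambda example: hamming(example[0], window))[:k]
    votes = Counter(state for _, state in nearest)
    return votes.most_common(1)[0][0]

weights = fit_parametric(TRAIN)
print(predict_parametric(weights, "ALA"))       # uses only the fitted weights
print(predict_nearest_neighbor(TRAIN, "VIT"))   # consults the stored examples
```

In this sketch the parametric model has a fixed number of parameters regardless of how much data is available, while the nearest-neighbor predictor grows with the database of known structures. A semiparametric method combines elements of both approaches.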