ISMB-97: Intelligent Systems for Molecular Biology 1997 Gelfand Tutorial

The Fifth International Conference on Intelligent Systems for Molecular Biology (ISMB-97)

Prediction of Protein Secondary Structure: Parametric and Nonparametric Statistical Methods
a tutorial
Peter J. Munson, Valentina DiFrancesco, Geetha Vasudevan
Analytical Biostatistics Section, Laboratory of Structural Biology
National Institutes of Health
Bethesda, MD 20892-5626, USA

Knowledge of protein structure is an essential tool for deducing the biological function of a novel DNA or protein sequence. While it is not yet possible to reliably predict three-dimensional structure from sequence, methods for predicting secondary (or local) structure are showing steadily increasing accuracy. Bioinformaticians should be aware of the many methods for structure prediction and their relative merits. Widely available commercial packages do not yet make use of the best available methods, and research into improving current methods is quite active. Prediction methods can be grouped as either theoretical (based on underlying physics) or statistical (based on the growing database of known structures). The statistical methods may in turn be viewed as either parametric or nonparametric in nature. We will review many published methods using this dichotomy and discuss the following questions: How reliable are current structure predictions? Are predictions becoming more reliable with time? How does one choose between multiple predictions for the same protein? How do multiple sequences improve prediction accuracy? How are such methods used to determine protein class, protein architecture, or fold? Experience with our hybrid, or semiparametric, prediction method, the Quadratic-Logistic method, will also be discussed.