The automated server karypis.srv.2 uses machine learning techniques for protein structure prediction. After a set of probable templates is selected, protein structures are built by comparative modeling, followed by a model refinement stage and an evaluation stage that ranks the resulting structures. Given a target protein sequence, we use a meta domain predictor to identify the domain boundaries. Each protein domain sequence is then classified into one of the 945 fold classes from the SCOP database (version 1.69). For this we build a set of 945 one-versus-rest discriminative SVM-based classifiers, which use direct kernel functions derived from profile and secondary structure information. Having selected the top 3 fold classes, we use a profile-based local alignment scheme to build a set of 30 models with MODELLER and refine them with SCWRL. We use ProQ to assess and rank the models, and use the top-scoring models as our final prediction. We also classify our target domain sequences into one of the 1538 superfamilies (remote homology detection) and build a hierarchical multiclass classifier to select the best possible fold class. We used a 40-processor Linux cluster to parallelize the classification of target sequences into fold- and superfamily-level classes.
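The fold-selection step can be pictured as scoring each domain with all one-versus-rest classifiers and keeping the three highest-scoring fold classes. The sketch below is a simplified illustration, not the server's code: it assumes each trained classifier has been reduced to a hypothetical linear decision function `w·x + b` over a feature vector, whereas the server uses SVMs with profile- and secondary-structure-based kernels. The fold labels and weights in the usage example are toy values.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in for one trained "fold vs. rest" classifier: (weights, bias).
Model = Tuple[Sequence[float], float]


def decision_value(model: Model, x: Sequence[float]) -> float:
    """Linear score w.x + b; a larger value means 'this fold' rather than 'rest'."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def top_k_folds(models: Dict[str, Model], x: Sequence[float], k: int = 3) -> List[str]:
    """Score x with every one-versus-rest model and keep the k best fold labels."""
    scores = {label: decision_value(m, x) for label, m in models.items()}
    # Sort by descending score; break ties by label so the result is deterministic.
    return sorted(scores, key=lambda label: (-scores[label], label))[:k]


# Toy usage with four made-up fold classes and a 2-dimensional feature vector.
toy_models = {
    "a.1": ([1.0, 0.0], 0.0),
    "b.1": ([0.0, 1.0], 0.0),
    "c.1": ([-1.0, 0.0], 0.5),
    "d.1": ([0.5, 0.5], -0.2),
}
print(top_k_folds(toy_models, [2.0, 1.0]))  # ['a.1', 'd.1', 'b.1']
```

With 945 classes the same ranking applies unchanged. Only the cost of evaluating the kernel classifiers grows, which is the part the abstract says was parallelized on the cluster.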
Structure-based functional annotation has gained considerable importance with the emergence of structural genomics (SG). SG targets are often functionally uncharacterized, and structure-based annotation methods are essential if this huge investment is to yield biological insight. Here we show that various energy terms that assign high energy values to rare structural features can be used to identify catalytic residues. Spatial averaging and summation of the atom-based energies considerably improve the signal of each energy term, and the consensus of the energy terms has even higher predictive power. A program that implements the new method will soon be available as part of the MESHI molecular modeling suite (http://www.cs.bgu.ac.il/~meshi).
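As an illustration of how spatial averaging and a consensus over energy terms could be combined, here is a minimal sketch. It is not the MESHI implementation. The neighbourhood definition (all atoms within a hypothetical `radius` of the residue centroid), the use of averaging rather than summation, and the z-score consensus are all assumptions chosen for clarity.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
Atom = Tuple[str, Point, float]  # (residue id, coordinates, per-atom energy for one term)


def spatial_average(atoms: List[Atom], radius: float) -> Dict[str, float]:
    """Per-residue score: mean energy of all atoms within `radius` of the residue centroid."""
    coords: Dict[str, List[Point]] = {}
    for res, xyz, _ in atoms:
        coords.setdefault(res, []).append(xyz)
    scores = {}
    for res, pts in coords.items():
        centroid = tuple(sum(p[k] for p in pts) / len(pts) for k in range(3))
        nearby = [e for _, xyz, e in atoms if math.dist(centroid, xyz) <= radius]
        scores[res] = mean(nearby) if nearby else 0.0
    return scores


def zscores(values: Dict[str, float]) -> Dict[str, float]:
    """Standardize one energy term across all residues of the protein."""
    vals = list(values.values())
    mu, sd = mean(vals), pstdev(vals)
    return {k: (v - mu) / sd if sd > 0 else 0.0 for k, v in values.items()}


def consensus(terms: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the per-residue z-scores of several energy terms."""
    zs = [zscores(t) for t in terms]
    return {res: mean(z[res] for z in zs) for res in terms[0]}
```

Residues with the highest consensus score would be the candidate catalytic residues. Standardizing each term first keeps terms with large numeric ranges from dominating the consensus.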
Of all the experimental methods at our disposal, protein X-ray crystallography delivers the most accurate and detailed protein structure models. However, although the models are generally of high quality, they are not perfect. The plasticity of proteins, their local propensities, and the nature of the method can give rise to inaccuracies ranging from nuisance-level errors to serious overinterpretation. For structure-based drug discovery, the quality of the structure around active sites and ligands is particularly important, and validation of drug target crystal structure models is crucial. We will discuss sources of error and how to detect them, using examples.
Joint work with David Haussler, Harry Noller, Jeremy Darot.
We build a general model of sequence substitution that is capable of detecting coevolution between nucleotides and between amino acid residues. The model extends the continuous-time Markov process to the substitution of two sites. It introduces a simple, general reweighting scheme for the substitution rate matrix based on covariation of the sequences. Unlike previous approaches, this scheme requires no knowledge of the sequence pairing rules and introduces very few extra free parameters. Applied to aligned 16S rRNA sequences from 146 species, the model detects secondary-structure interactions substantially better than several other methods, such as mutual information measures. It also detects tertiary interactions that do not conform to the standard Watson-Crick or GU base-pairing rules. Applying the model to the aligned sequences of all known protein domains, we identify pairs of domain families, and position pairs within these families, that exhibit strong coevolution patterns. The coevolving domain families are highly enriched for domains belonging to the same proteins or protein complexes, or participating in the same functional processes, suggesting that sequence coevolution has strong structural and functional implications. In addition, coevolving positions within the same proteins or protein complexes tend to be close together in 3D structure. We also identify a number of cases of multi-way coevolution among multiple domain families. The coevolving positions of these domains form "pathways" connecting distant amino acid residues in large protein structures. The results suggest that, in addition to direct physical interactions, coevolution may also reflect other structural and functional properties of proteins.
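For reference, the mutual-information baseline that the model is compared against can be computed directly from column frequencies. This sketch is the plain MI score only, not the authors' reweighted Markov model, whose details the abstract does not give. It assumes raw counts, with no pseudocounts, gap handling, or sequence weighting.

```python
import math
from collections import Counter
from typing import Sequence


def column_mi(col_i: Sequence[str], col_j: Sequence[str]) -> float:
    """Mutual information (in bits) between two aligned columns, from raw frequencies."""
    n = len(col_i)
    f_i, f_j = Counter(col_i), Counter(col_j)
    f_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in f_ij.items():
        p_ab = count / n
        # Each observed symbol pair contributes p(a,b) * log2(p(a,b) / (p(a) p(b))).
        mi += p_ab * math.log2(p_ab / ((f_i[a] / n) * (f_j[b] / n)))
    return mi


# Two columns that covary perfectly as Watson-Crick partners across four sequences.
print(column_mi("AUGC", "UACG"))  # 2.0
```

MI rewards any consistent covariation, whatever the pairing rule, but it ignores the phylogeny linking the sequences. That limitation motivates the substitution-model approach described above.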
Unraveling the principles by which proteins fold, associate, and cooperatively function is a major challenge in biology, and meeting it will give scientists insight into the mechanisms of life. Understanding these principles requires managing large-scale, hierarchical simulations that investigate the behavior of biological systems from the relationships between their individual components. The complexity of these simulations demands powerful hardware and innovative software; in particular, software for studying, analyzing, and modeling proteins, complexes, and their function dynamically. To be successful, such software must bring together high-performance parallel computing, human-computer interaction, and innovative algorithm design. As more powerful computers open up new opportunities for interactive applications, human intervention becomes an integral part of the scientific investigation process; there is therefore a clear need for environments that facilitate interaction between humans and computer-based simulations. Furthermore, as the amount of computation done on a human timescale increases, effective interfaces for rapid analysis of generated data and feedback-based steering tools become a necessary part of application development.
My goal is to develop a new generation of steering, exploration, and visualization tools for biological applications, powerful enough to handle the complexity of the biological problems at hand and to allow the level of human intervention needed to integrate human knowledge, experiments, and intuition into the simulation process. As a first step in that direction, my group has created ProteinShop and DockingShop, two cutting-edge graphical environments for protein modeling and simulation. ProteinShop is a graphical infrastructure for the interactive modeling, manipulation, optimization, and analysis of proteins. It uses inverse kinematics algorithms to manipulate protein structures interactively in a natural, intuitive way, treating the structure as a chain of jointed segments whose angles can all move. ProteinShop provides a number of useful visual cues to guide the user during manipulation; for instance, energy visualization based on volume rendering offers insight into the behavior of energy functions as well as into the optimization algorithms that may be used to minimize them. ProteinShop also lets users monitor and steer a protein structure prediction simulation while it runs on a remote parallel computer, through an interface that allows them to complement and guide the simulation process.
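To make the inverse kinematics idea concrete, the sketch below solves a toy version of the problem: find joint angles that bring the end of a chain of rigid segments to a chosen point. The abstract does not say which IK algorithm ProteinShop uses. This example substitutes cyclic coordinate descent (CCD), a common IK technique, on a planar chain, as a hypothetical stand-in for rotating backbone dihedral angles while bond lengths stay fixed.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def joint_positions(angles: Sequence[float], lengths: Sequence[float]) -> List[Point]:
    """Positions of all joints of a planar chain; each angle is relative to the previous segment."""
    pts = [(0.0, 0.0)]
    heading = 0.0
    for angle, length in zip(angles, lengths):
        heading += angle
        x, y = pts[-1]
        pts.append((x + length * math.cos(heading), y + length * math.sin(heading)))
    return pts


def ccd_solve(angles: Sequence[float], lengths: Sequence[float], target: Point,
              iters: int = 200, tol: float = 1e-6) -> List[float]:
    """Cyclic coordinate descent: rotate one joint at a time, from the tip back to
    the base, so that the chain end swings toward the target. Segment lengths never change."""
    angles = list(angles)
    for _ in range(iters):
        for j in reversed(range(len(angles))):
            pts = joint_positions(angles, lengths)
            end = pts[-1]
            if math.dist(end, target) < tol:
                return angles
            jx, jy = pts[j]
            # Rotating joint j turns every downstream segment about pts[j].
            to_end = math.atan2(end[1] - jy, end[0] - jx)
            to_target = math.atan2(target[1] - jy, target[0] - jx)
            angles[j] += to_target - to_end
    return angles


# Bend a straight three-segment chain so that its end reaches (1.5, 1.5).
solution = ccd_solve([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], (1.5, 1.5))
print(joint_positions(solution, [1.0, 1.0, 1.0])[-1])
```

Because only angles change, the geometry of each segment is preserved. This is the property that makes IK-style manipulation feel natural when a user drags part of a protein backbone.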
DockingShop is an interactive environment for steering docking simulations. Its design will allow users to easily integrate different software tools and launch docking programs. DockingShop is more than a rendering module that creates 3D images of models or checks the quality of fits: it provides an interactive environment with real-time visual feedback for steering the docking process, enabling rapid estimation of the binding mode while taking into account side-chain flexibility and backbone movement. DockingShop allows users to easily discard false positives and to steer the prediction process toward faster convergence on the solution, effectively combining human knowledge and intuition with computational power.
Martin Paluszewski, Thomas Hamelryck, and Pawel Winter
Algorithms for Molecular Biology 2006, 1:20 doi:10.1186/1748-7188-1-20 http://www.almob.org/content/1/1/20/abstract/
A new, promising solvent exposure measure, called half-sphere exposure (HSE), has recently been proposed. Here, we study the reconstruction of a protein's Cα trace solely from structure-derived HSE information. This problem is relevant to de novo structure prediction using predicted HSE values. For comparison, we also consider the well-established contact number (CN) measure. We define energy functions based on the HSE or CN vectors and minimize them using two conformational search heuristics: Monte Carlo simulation (MCS) and tabu search (TS). While MCS has been the dominant conformational search heuristic in the literature, TS has been applied only a few times. To discretize the conformational space, we use lattice models of varying complexity.
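The two measures can be computed from a Cα trace alone. In the sketch below, CN counts the Cα atoms within a sphere around each residue, and HSE splits that sphere into an "up" half facing the approximate side-chain direction and a "down" half. Several choices here are assumptions: the side-chain direction is taken as the bisector of the vectors pointing from the two neighbouring Cα atoms toward residue i, the default 13 Å radius is one common choice, and `hse_energy` is a hypothetical squared-deviation energy, not necessarily the one used in the paper.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _unit(a: Vec) -> Vec:
    n = math.sqrt(_dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)


def cn_and_hse(ca: Sequence[Vec], radius: float = 13.0) -> List[Tuple[int, int, int]]:
    """(CN, HSE-up, HSE-down) for every interior residue of a C-alpha trace."""
    result = []
    for i in range(1, len(ca) - 1):
        # Approximate side-chain direction: bisector of the unit vectors pointing
        # from the two neighbouring C-alphas toward C-alpha i.
        u1 = _unit(_sub(ca[i], ca[i - 1]))
        u2 = _unit(_sub(ca[i], ca[i + 1]))
        side = (u1[0] + u2[0], u1[1] + u2[1], u1[2] + u2[2])
        cn = up = 0
        for j, other in enumerate(ca):
            if j == i:
                continue
            d = _sub(other, ca[i])
            if math.sqrt(_dot(d, d)) <= radius:
                cn += 1
                if _dot(d, side) > 0:  # inside the half-sphere facing the side chain
                    up += 1
        result.append((cn, up, cn - up))
    return result


def hse_energy(model_ca: Sequence[Vec], target: Sequence[Tuple[int, int]],
               radius: float = 13.0) -> int:
    """Hypothetical energy: squared deviation of model HSE-up/down from target values."""
    model = cn_and_hse(model_ca, radius)
    return sum((up - t_up) ** 2 + (down - t_down) ** 2
               for (_, up, down), (t_up, t_down) in zip(model, target))
```

A conformational search would then try to drive `hse_energy` to zero for a target HSE vector. Replacing the HSE terms with CN alone gives the corresponding CN-based energy, which discards the up/down directional information.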
The proposed TS heuristic with a novel tabu definition generally performs better than MCS for this problem. Our experiments show that, at least for small proteins (up to 35 amino acids), it is possible to reconstruct the protein backbone solely from the HSE or CN information. In general, the HSE measure leads to better models than the CN measure, as judged by the RMSD and the angle correlation with the native structure. The angle correlation, a measure of structural similarity, evaluates whether equivalent residues in two structures have the same general orientation. Our results indicate that the HSE measure is potentially very useful to represent solvent exposure in protein structure prediction, design and simulation.
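The abstract does not spell out the novel tabu definition, so the sketch below shows only a generic, textbook tabu search. It works over discrete vectors, which could stand for lattice direction codes. A move changes one position, and recently changed positions are tabu unless the move would beat the best energy found so far (the aspiration criterion). Unlike a greedy descent, it always takes the best allowed move, even an uphill one, which helps it escape local minima. The parameter names and the toy energy are assumptions for illustration.

```python
from collections import deque
from typing import Callable, List, Tuple


def tabu_search(start: List[int], energy: Callable[[List[int]], float],
                n_values: int, iters: int = 100, tenure: int = 5) -> List[int]:
    """Generic tabu search over vectors whose entries take values 0..n_values-1."""
    current = list(start)
    best, best_e = list(current), energy(current)
    tabu: deque = deque(maxlen=tenure)  # recently changed positions
    for _ in range(iters):
        candidate: Tuple[float, int, List[int]] | None = None
        for pos in range(len(current)):
            for val in range(n_values):
                if val == current[pos]:
                    continue
                trial = current.copy()
                trial[pos] = val
                e = energy(trial)
                if pos in tabu and e >= best_e:
                    continue  # tabu move not rescued by the aspiration criterion
                if candidate is None or e < candidate[0]:
                    candidate = (e, pos, trial)
        if candidate is None:
            break  # every move is tabu
        e, pos, current = candidate  # accept the best allowed move, even if uphill
        tabu.append(pos)
        if e < best_e:
            best, best_e = list(current), e
    return best


# Toy energy: number of positions that differ from a hidden target vector.
target = [2, 0, 1, 3, 1]
print(tabu_search([0] * 5, lambda v: sum(a != b for a, b in zip(v, target)), n_values=4))
```

In a real lattice model the energy would be an HSE- or CN-based function of the decoded conformation, and moves producing self-intersecting chains would have to be rejected.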
no abstract provided yet
Thomas Huber, Dmitri Mouradov, Gordon King, Jade K. Forwood, David A. Hume, Jennifer L. Martin, Bostjan Kobe, Ian Ross
no abstract provided yet
A model for the proteolipid ring and bafilomycin/concanamycin-binding site in the vacuolar ATPase of Neurospora crassa
Bowman BJ, McCall ME, Baertsch R, Bowman EJ.
J Biol Chem. 2006 Oct 20;281(42):31885-93. Epub 2006 Aug 15.
The vacuolar ATPase has been implicated in a variety of physiological processes in eukaryotic cells. Bafilomycin and concanamycin, highly potent and specific inhibitors of the vacuolar ATPase, have been widely used to investigate the enzyme. Derivatives have been developed as possible therapeutic drugs. We have used random mutagenesis and site-directed mutagenesis to identify 23 residues in the c subunit involved in binding these drugs. We generated a model for the structure of the ring of c subunits in Neurospora crassa by using data from the crystal structure of the homologous subunits of the bacterium Enterococcus hirae (Murata, T., Yamato, I., Kakinuma, Y., Leslie, A.G., and Walker, J.E. (2005) Science 308, 654-659 doi:10.1126/science.1110064). In the model, 10 of the 11 mutation sites that confer the highest degree of resistance are closely clustered. They form a putative drug-binding pocket at the interface between helices 1 and 2 of one c subunit and helix 4 of the adjacent c subunit. The excellent fit of the N. crassa sequence to the E. hirae structure and the degree to which the structural model predicts the clustering of these residues suggest that the bacterial and eukaryotic polypeptides fold in a very similar way.
no abstract provided yet
no abstract provided yet
Guanglei Cui¹, Andrew Wollacot², Jae Shick Yang³, Eugene Shakhnovich³, Kenneth M. Merz, Jr.¹
Protein structure prediction from sequence consists of two equally important steps: structure generation and structure evaluation. With the advent of linear-scaling algorithms that compute protein conformational energies quantum mechanically, with proper treatment of solvation, it has become possible to incorporate quantum mechanics (QM) based potentials into the structure evaluation step. In this presentation we describe the first generation of our QM-based (semiempirical QM) scoring function (divscore) and discuss our results on four CASP7 targets (T0358, T0363, T0382, and T0383), for which decoy structures were generated by the Harvard group. The future of this approach will be discussed in light of the results obtained to date.