UCSC/QB3 Symposium on Bioinformatics
December 10, 2010

The main announcement can be found at http://compbio.soe.ucsc.edu/workshop-2010.html

Thomas Huber, Bioinformatic/experimental hybrid approaches: Protein structure determination with paramagnetic probes
Schmitz, C.1, Vernon, R.2, Otting, G.3, Baker, D.2 and Huber, T.1
1Bijvoet Center for Biomolecular Research, Utrecht University, The Netherlands.
2Department of Biochemistry, University of Washington, Seattle, USA.
3Research School of Chemistry, Australian National University, Canberra, Australia.

The pseudocontact shift (PCS) effect, induced by a bound paramagnetic lanthanide ion, is becoming widely used in protein nuclear magnetic resonance (NMR) spectroscopy because it yields a complementary combination of orientational and long-range distance restraints. This versatile effect can be measured accurately with highly sensitive NMR experiments and has been used successfully (i) to assign NMR resonances automatically, (ii) to determine the structure of protein-protein and protein-ligand complexes, and (iii) to refine NMR structures. Until now, however, it has remained an open question whether PCS data, as the only experimental restraints, are sufficient for de novo structure determination of a protein.
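As a rough illustration of why a PCS carries both distance and orientational information, the sketch below evaluates the standard point-dipole PCS expression, in which the shift falls off as 1/r^3 and depends on the position of the nucleus relative to the principal axes of the magnetic susceptibility anisotropy tensor. The function name, argument names, and example tensor values are assumptions made for illustration; they are not taken from the abstract or from ROSETTA.

```python
import math


def pcs_ppm(x, y, z, dchi_ax, dchi_rh=0.0):
    """Pseudocontact shift in ppm for a nucleus at (x, y, z).

    Coordinates are in metres, expressed in the principal-axis frame of the
    delta-chi tensor with the metal ion at the origin (an assumption of this
    sketch). dchi_ax and dchi_rh are the axial and rhombic tensor components
    in m^3.

    Uses delta = 1/(12*pi*r^3) * [dchi_ax*(3cos^2(theta) - 1)
                                  + 1.5*dchi_rh*sin^2(theta)*cos(2*phi)],
    rewritten in Cartesian form to avoid computing the angles explicitly.
    """
    r2 = x * x + y * y + z * z
    if r2 == 0.0:
        raise ValueError("nucleus cannot sit on the metal position")
    r5 = r2 ** 2.5
    axial = dchi_ax * (2.0 * z * z - x * x - y * y)   # r^2 * (3cos^2(theta) - 1)
    rhombic = 1.5 * dchi_rh * (x * x - y * y)         # r^2 * sin^2(theta) * cos(2*phi)
    return 1e6 * (axial + rhombic) / (12.0 * math.pi * r5)


# Hypothetical axial tensor of 10e-32 m^3, nucleus 10 Angstrom from the metal.
on_axis = pcs_ppm(0.0, 0.0, 1e-9, dchi_ax=10e-32)
in_plane = pcs_ppm(1e-9, 0.0, 0.0, dchi_ax=10e-32)
print(f"along z: {on_axis:.3f} ppm, along x: {in_plane:.3f} ppm")
```

The same distance gives shifts of opposite sign along and perpendicular to the tensor axis, and doubling the distance reduces the shift eightfold, which is the sense in which a single PCS combines orientational and distance information.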

Here we present the first full structure determination of a protein from PCS data alone. We present results for nine proteins with different folds, ranging in size from 56 to 186 amino acids, and show that PCS restraints implemented in the fragment assembly step of the ROSETTA software are highly efficient in biasing the sampling of conformational space towards the correct target structure. We further show that the best structures computed have a backbone RMSD from the native structure as low as 1.0 Ångström. We raise and discuss the question of whether we are performing structure prediction assisted by experimental data or truly de novo structure determination.

Jerry Tsai, Constructing a Useful Definition of Tertiary Structure
University of the Pacific

We propose a new definition of protein tertiary structure that provides insight into, and an explanation of, the structure of protein packing. Current definitions of tertiary structure are unsatisfactory because they have difficulty making sense of the overwhelming amount of information in a full side-chain representation. As a result, common representations only imply protein tertiary structure from the placement of secondary structure. A construct describing tertiary structure should simplify this complexity and yet intuitively represent the non-specific complexity of side-chain packing. As an example at the level of protein secondary structure, the ribbon diagram conveys the backbone conformation and hydrogen-bonding pattern without explicitly showing all the atoms or bonds. To address this challenge, we have developed a descriptor of protein tertiary structure called the relative packing group. This construct defines an elementary unit of tertiary structure: a set of residues that all contact each other. Contacts are identified using the Delaunay tessellation, and only contacts between side-chain atoms are considered. In an analysis of the PDB, we find that four-body interactions are the most dominant and that surface residues contribute more than just pairwise contacts. As a fundamental unit of tertiary structure, the relative packing group captures the non-local and non-specific nature of tertiary structure in a discrete and logical classification system. The scheme is based on the locality of residues with respect to each other, and considers residues from a common secondary structure element as acting together. For instance, for a four-residue relative packing group, if three residues come from the same helix and one from another secondary structure element, the group is defined as 3+1. We have found that helices prefer 3+1 packing groups and that sheet structures prefer 1+2+1 packing groups.
Looking at interactions between secondary structure elements, we find certain patterns for interactions between helices, packing within and between sheets, and packing between helices and sheets. This construct also provides the basis for an intuitive graphical representation of tertiary structure.
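The relative packing group labels described above (3+1, 1+2+1) can be illustrated with a minimal sketch. It assumes that residue contacts have already been computed (for example from a Delaunay tessellation, which is not reproduced here) and that the counts in a label are ordered by where each secondary structure element first appears along the sequence. That ordering rule, the function names, and the residue numbers are assumptions for illustration, not the authors' implementation.

```python
from collections import Counter
from itertools import combinations


def is_packing_group(residues, contacts):
    """True if every pair of residues is in contact (the group is a clique).

    contacts is a set of frozenset({i, j}) residue pairs.
    """
    return all(frozenset(pair) in contacts for pair in combinations(residues, 2))


def packing_group_label(residues, sse_of):
    """Label a group by how many residues each secondary structure element contributes.

    sse_of maps residue number -> secondary structure element id.
    Counts are listed in order of each element's first residue (assumed convention).
    """
    counts = Counter(sse_of[r] for r in residues)
    first_residue = {}
    for r in sorted(residues):
        first_residue.setdefault(sse_of[r], r)
    ordered = sorted(counts, key=first_residue.get)
    return "+".join(str(counts[sse]) for sse in ordered)


# Hypothetical four-residue groups.
helix_group = [10, 14, 17, 45]
helix_sse = {10: "H1", 14: "H1", 17: "H1", 45: "H2"}
sheet_group = [5, 20, 22, 40]
sheet_sse = {5: "E1", 20: "E2", 22: "E2", 40: "E3"}

all_pairs = {frozenset(p) for p in combinations(helix_group, 2)}
print(is_packing_group(helix_group, all_pairs))      # every pair is in contact
print(packing_group_label(helix_group, helix_sse))   # three from one helix, one from another
print(packing_group_label(sheet_group, sheet_sse))   # one, two, and one residue from three strands
```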

Krishna Roskin, Present, and Future of Sequence Alignment
University of California, Santa Cruz

The aim of comparative genomics is to study variation between genomes to gain a deeper understanding of the forces and processes that shape them. In this dissertation, I chronicle my involvement at the dawn of mammalian comparative genomics, when the second and third mammalian genomes were sequenced and analyzed. This dissertation covers the methods used to align the mouse and rat genomes to the human genome and how these methods enabled me to analyze the process of neutral evolution with unprecedented fidelity. This new wealth of data enabled the observation of large- and fine-scale variations in the processes of neutral evolution. My results and observations have impacted the field of comparative genomics and continue to be used in the current literature. This dissertation also presents my work applying alignment algorithms to very large alignment problems. Continuing developments in alignment algorithms have produced more sophisticated and principled alignment methods, but these new algorithms are often computationally expensive. I developed two general methods that harness the power of parallelization and enable these expensive methods to be applied to large, biologically relevant problems.

Andrew Uzilov, FragSeq: transcriptome-wide RNA structure probing using high-throughput sequencing
University of California, Santa Cruz

Classical approaches to determine structures of noncoding RNA (ncRNA) probed only one RNA at a time with enzymes and chemicals, using gel electrophoresis to identify reactive positions. To accelerate RNA structure inference, we developed fragmentation sequencing (FragSeq), a high-throughput RNA structure probing method that uses high-throughput RNA sequencing of fragments generated by digestion with nuclease P1, which specifically cleaves single-stranded nucleic acids. In experiments probing the entire mouse nuclear transcriptome, we accurately and simultaneously mapped single-stranded RNA regions in multiple ncRNAs with known structure. We probed in two cell types to verify reproducibility. We also identified and experimentally validated structured regions in ncRNAs with, to our knowledge, no previously reported probing data.

Juyong Lee, De novo protein structure prediction by dynamic fragment assembly and conformational space annealing
Juyong Lee1,4, Jinhyuk Lee4, Takeshi N. Sasaki2, Masaki Sasai3,4, Chaok Seok1,4, and Jooyoung Lee4
1Department of Chemistry, Seoul National University, Korea
2Department of Human Informatics, Aichi Shukutoku University, Japan
3Department of Applied Physics, Nagoya University, Japan
4Center for In Silico Protein Science, School of Computational Sciences, Korea Institute for Advanced Study, Korea

Ab initio protein structure prediction is a challenging problem that requires both an accurate energetic representation of a protein structure and an efficient conformational sampling method. In this paper, we present an ab initio structure prediction method that combines a recently proposed approach to fragment assembly, dynamic fragment assembly (DFA), with the conformational space annealing (CSA) algorithm. In DFA, model structures are scored by continuous functions constructed from short- and long-range structural restraint information from a fragment library. Here DFA is represented by the full-atom model of CHARMM with the addition of the empirical DFIRE potential. The relative contributions of the various energy terms are optimized using linear programming. Conformational sampling was carried out with the CSA algorithm, which can find lower-energy conformations more efficiently than the simulated annealing (SA) used in the existing DFA study. The newly introduced DFA energy function and CSA sampling algorithm are implemented in CHARMM. Test results on 30 small single-domain proteins and 14 template-free modeling targets of the 8th Critical Assessment of protein Structure Prediction show that the current method provides prediction results comparable and complementary to those of existing top methods.

Manel Camps, Mapping polymerase replication in the cell using mutation footprinting
Jennifer M. Allen1, David M. Simcha2, Nolan G. Ericson3, David L. Alexander1, Jacob T. Marquette1, Benjamin P. Van Biber4, Chris J. Troll1, Rachel Karchin2, Jason H. Bielas3, Lawrence A. Loeb4, and Manel Camps1
1 Department of Microbiology and Environmental Toxicology, University of California, Santa Cruz
2 Biomedical Engineering Department and Institute for Computational Medicine, Johns Hopkins University
3 Public Health Sciences; Molecular Diagnostics; Fred Hutchinson Cancer Research Center
4 Department of Pathology, University of Washington, Seattle

DNA polymerase I (pol I) is one of five polymerases expressed in E. coli. It has a well-established role in initiating leading strand synthesis of ColE1 plasmids, extending an RNA transcript at the plasmid origin of replication. Pol I is also involved in Okazaki primer processing during lagging strand synthesis in both plasmid and chromosomal DNA. Using a highly error-prone variant of pol I, we have identified by direct sequencing a large set (>1000) of neutral mutations introduced by pol I in a ColE1 plasmid sequence. The resulting pol I mutation footprint confirmed the known roles of pol I in ColE1 plasmid replication, and established the length of the pol I extension product (~1000 bp) and of Okazaki primer processing (~20 nt) in vivo. A similar approach could be used to map the template of DNA polymerases in other prokaryotic or eukaryotic organisms by taking advantage of naturally high error rates and/or by altering their fidelity.