The main announcement can be found at http://compbio.soe.ucsc.edu/workshop-2010.html
The pseudocontact shift (PCS) effect, induced by a bound paramagnetic lanthanide ion, is becoming widely used in protein nuclear magnetic resonance (NMR) spectroscopy because it yields a complementary combination of orientational and long-range distance restraints. PCSs can be measured accurately with highly sensitive NMR experiments and have been used successfully (i) to assign NMR resonances automatically, (ii) to determine the structures of protein-protein and protein-ligand complexes, and (iii) to refine NMR structures. Until now, however, it has remained an open question whether PCS data, as the only experimental restraints, are sufficient for de novo structure determination of a protein.
Here we present the first full structure determination of a protein from PCS data alone. We present results for nine proteins with different folds, ranging in size from 56 to 186 amino acids, and show that PCS restraints implemented in the fragment assembly step of the ROSETTA software are highly efficient at biasing conformational sampling towards the correct target structure. The best computed structures have a backbone RMSD from the native structure as low as 1.0 Ångström. We also raise and discuss the question of whether this constitutes structure prediction assisted by experimental data or truly de novo structure determination.
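The PCS restraints above are commonly modelled with the standard point-dipole expression Δδ = [Δχ_ax(3cos²θ − 1) + (3/2)Δχ_rh sin²θ cos 2φ] / (12π r³), where r, θ and φ are the polar coordinates of the nucleus in the principal frame of the magnetic susceptibility anisotropy tensor centred on the metal. The r⁻³ factor supplies the distance information and the angular terms the orientational information. The sketch below evaluates this expression for a nucleus given in Cartesian coordinates in that frame; the function name `pcs_ppm` and the unit choices (Å for positions, m³ for Δχ) are assumptions for illustration, not the ROSETTA implementation used in this work.

```python
import math


def pcs_ppm(x, y, z, dchi_ax, dchi_rh):
    """Pseudocontact shift (ppm) of a nucleus at (x, y, z) in Angstrom.

    Coordinates are in the principal frame of the Delta-chi tensor with the
    lanthanide at the origin; dchi_ax and dchi_rh are the axial and rhombic
    anisotropies in m^3 (hypothetical helper for illustration).
    """
    # Convert Angstrom to metres so that Delta-chi / r^3 is dimensionless
    xm, ym, zm = (c * 1e-10 for c in (x, y, z))
    r2 = xm * xm + ym * ym + zm * zm
    if r2 == 0.0:
        raise ValueError("nucleus cannot coincide with the metal ion")
    r = math.sqrt(r2)
    # Cartesian forms of the angular factors:
    #   3cos^2(theta) - 1         = (3z^2 - r^2) / r^2
    #   sin^2(theta) cos(2 phi)   = (x^2 - y^2) / r^2
    axial = dchi_ax * (3.0 * zm * zm - r2) / r2
    rhombic = 1.5 * dchi_rh * (xm * xm - ym * ym) / r2
    # Divide by 12 pi r^3 and scale the dimensionless shift to ppm
    return (axial + rhombic) / (12.0 * math.pi * r ** 3) * 1e6
```

For example, `pcs_ppm(0, 0, 10, 1e-32, 0)` gives roughly 0.53 ppm for a nucleus 10 Å from the metal along the tensor z axis. Nuclei near the "magic angle" cone (3cos²θ = 1) show no axial contribution, which is why PCSs constrain orientation as well as distance.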
We propose a new definition of protein tertiary structure that provides insight into, and an explanation of, the structure of protein packing. Current definitions of tertiary structure are unsatisfactory because they have difficulty making sense of the overwhelming amount of information in a full side-chain representation. As a result, common representations only imply tertiary structure through the placement of secondary structure. A construct describing tertiary structure should reduce this complexity yet still represent the non-specific complexity of side-chain packing intuitively. By analogy, at the level of secondary structure, the ribbon diagram conveys the backbone conformation and hydrogen-bonding pattern without explicitly showing all atoms or bonds. To address this challenge, we have developed a descriptor of protein tertiary structure called the relative packing group, which defines an elementary unit of tertiary structure. A relative packing group is a set of residues that all contact each other. Contacts are identified with the Delaunay tessellation, and only contacts between side-chain atoms are considered. In an analysis of the PDB, we find that 4-body interactions are the most dominant and that surface residues contribute more than just pairwise contacts. As a fundamental unit of tertiary structure, the relative packing group organizes non-local and non-specific tertiary structure into a discrete and logical classification system. The scheme is based on the locality of residues with respect to each other and treats residues from a common secondary structure element as acting together. For instance, in a 4-residue relative packing group in which 3 residues come from the same helix and one comes from another secondary structure element, the group is classified as 3+1. We have found that helices prefer 3+1 packing groups and that sheets prefer 1+2+1 packing groups.
Looking at interactions between secondary structure elements, we find characteristic patterns for interactions between helices, for packing within and between sheets, and for packing between helices and sheets. This construct also provides the basis for an intuitive graphical representation of tertiary structure.
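The 3+1 and 1+2+1 labels can be illustrated by grouping the residues of one packing group according to the secondary structure element they belong to and counting each group. The abstract does not state how the elements are ordered within a label, so this sketch assumes sequence order, treats a label and its reverse as equivalent (so 1+3 and 3+1 coincide), and counts each coil residue as its own element. The function `packing_group_label` and these conventions are hypothetical, and the sketch assumes the all-contact residue set has already been found, for example from the Delaunay tessellation.

```python
def packing_group_label(residues):
    """Label a relative packing group, e.g. '3+1' or '1+2+1'.

    residues: iterable of (residue_index, sse_id) pairs, where sse_id names
    the secondary structure element (e.g. 'H1', 'E2') or is None for coil.
    Hypothetical conventions: elements ordered by sequence, a label and its
    reverse are equivalent, and each coil residue is its own element.
    """
    groups = {}
    for idx, sse in sorted(residues, key=lambda r: r[0]):
        # Coil residues get a unique key so they are never merged
        key = sse if sse is not None else ("coil", idx)
        groups.setdefault(key, []).append(idx)
    # dicts keep insertion order, so counts follow sequence order
    counts = tuple(len(members) for members in groups.values())
    # Pick one canonical orientation of the label
    canonical = max(counts, counts[::-1])
    return "+".join(str(c) for c in canonical)


# Three residues from helix H1 packing against one residue from helix H2
print(packing_group_label([(10, "H1"), (11, "H1"), (14, "H1"), (40, "H2")]))
```

Under these assumptions, a group drawn from three strands with two residues on the middle strand, such as `[(5, "E1"), (20, "E2"), (22, "E2"), (40, "E3")]`, is labelled `1+2+1`.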
The aim of comparative genomics is to study variation between genomes to gain a deeper understanding of the forces and processes that shape them. In this dissertation, I chronicle my involvement at the dawn of mammalian comparative genomics, when the second and third mammalian genomes were sequenced and analyzed. The dissertation covers the methods used to align the mouse and rat genomes to the human genome and how these methods enabled me to analyze the process of neutral evolution with unprecedented fidelity. This new wealth of data made it possible to observe both large- and fine-scale variation in the processes of neutral evolution. My results and observations have influenced the field of comparative genomics and continue to be used in the current literature. This dissertation also presents my work applying alignment algorithms to very large alignment problems. Continuing developments in alignment algorithms have produced more sophisticated and principled alignment methods, but these are often computationally expensive. I developed two general methods that harness parallelization and enable these expensive methods to be applied to large, biologically relevant problems.
Classical approaches to determining the structures of noncoding RNAs (ncRNAs) probed only one RNA at a time with enzymes and chemicals, using gel electrophoresis to identify reactive positions. To accelerate RNA structure inference, we developed fragmentation sequencing (FragSeq), a high-throughput RNA structure probing method that sequences the fragments generated by digestion with nuclease P1, which specifically cleaves single-stranded nucleic acids. In experiments probing the entire mouse nuclear transcriptome, we accurately and simultaneously mapped single-stranded regions in multiple ncRNAs of known structure. We probed two cell types to verify reproducibility. We also identified and experimentally validated structured regions in ncRNAs for which, to our knowledge, no probing data had previously been reported.
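Because nuclease P1 cuts single-stranded RNA, the 5' ends of sequenced fragments mark candidate cleavage sites. One simple way to turn read positions into a per-nucleotide signal, assumed here and not necessarily the scoring used by FragSeq, is to compare normalized 5'-end counts in the P1-treated sample with an untreated control and flag positions that are enriched. The function names, the pseudocount, and the threshold are hypothetical choices for illustration.

```python
import math
from collections import Counter


def cut_counts(read_starts, length):
    """Number of reads whose 5' end falls at each position 0..length-1."""
    counts = Counter(read_starts)
    return [counts.get(i, 0) for i in range(length)]


def cleavage_scores(treated_starts, control_starts, length, pseudocount=1.0):
    """log2 enrichment of normalized 5'-end counts, treated vs control."""
    t = cut_counts(treated_starts, length)
    c = cut_counts(control_starts, length)
    # Pseudocounts avoid division by zero at positions with no reads
    t_total = sum(t) + pseudocount * length
    c_total = sum(c) + pseudocount * length
    return [
        math.log2(((ti + pseudocount) / t_total) / ((ci + pseudocount) / c_total))
        for ti, ci in zip(t, c)
    ]


def single_stranded_positions(scores, threshold=1.0):
    """Positions whose enrichment exceeds a (hypothetical) threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


treated = [3] * 10 + [0, 1, 2, 4]  # P1 cuts concentrated at position 3
control = [0, 1, 2, 3, 4] * 2      # uniform background
scores = cleavage_scores(treated, control, length=5)
print(single_stranded_positions(scores))
```

Normalizing to the control helps separate nuclease cleavage from fragment ends that arise for other reasons, such as spontaneous hydrolysis or library preparation biases.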
Ab initio protein structure prediction is a challenging problem that requires both an accurate energetic representation of protein structure and an efficient conformational sampling method. In this paper, we present an ab initio structure prediction method that combines dynamic fragment assembly (DFA), a recently proposed approach to fragment assembly, with the conformational space annealing (CSA) algorithm. In DFA, model structures are scored by continuous functions constructed from short- and long-range structural restraint information derived from a fragment library. Here, DFA is implemented with the full-atom CHARMM model, supplemented by the empirical DFIRE potential. The relative weights of the energy terms are optimized using linear programming. Conformational sampling is carried out with CSA, which finds lower-energy conformations more efficiently than the simulated annealing (SA) used in the original DFA study. The new DFA energy function and the CSA sampling algorithm are implemented in CHARMM. Tests on 30 small single-domain proteins and 14 template-free modeling targets from the 8th Critical Assessment of protein Structure Prediction (CASP8) show that the method gives results comparable and complementary to those of the top existing methods.
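The total energy in such a scheme is a weighted sum of individual terms, E = Σ wᵢEᵢ. A common way to cast weight optimization as a linear program, assumed here because the abstract does not give the formulation, is to require the native structure to score lower than every decoy, which yields one linear inequality in the weights per decoy. The sketch below only checks those inequalities for a candidate weight vector; actually solving the LP would require an LP solver. The helper names and the margin parameter are hypothetical.

```python
def total_energy(weights, terms):
    """Weighted sum of energy terms: E = sum_i w_i * E_i."""
    if len(weights) != len(terms):
        raise ValueError("weights and terms must have the same length")
    return sum(w * e for w, e in zip(weights, terms))


def violated_decoys(weights, native_terms, decoy_terms, margin=0.0):
    """Indices of decoys whose energy is not at least `margin` above the native's.

    Each decoy contributes the linear constraint
        sum_i w_i * (E_i(decoy) - E_i(native)) >= margin,
    so an LP over the weights would aim to make this list empty.
    """
    e_native = total_energy(weights, native_terms)
    return [
        i
        for i, terms in enumerate(decoy_terms)
        if total_energy(weights, terms) - e_native < margin
    ]


# Two energy terms (e.g. a restraint term and a statistical potential)
native = (-10.0, -4.0)
decoys = [(-8.0, -2.0), (-11.0, -6.0)]
print(violated_decoys((1.0, 0.5), native, decoys))
```

Here the second decoy scores below the native under weights (1.0, 0.5), so those weights violate one constraint. Because every constraint is linear in the weights, the search for a feasible or margin-maximizing weight vector remains a linear program.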
DNA polymerase I (pol I) is one of five polymerases expressed in E. coli. It has a well-established role in initiating leading-strand synthesis of ColE1 plasmids by extending an RNA transcript at the plasmid origin of replication. Pol I is also involved in Okazaki primer processing during lagging-strand synthesis of both plasmid and chromosomal DNA. Using a highly error-prone variant of pol I, we identified by direct sequencing a large set (>1000) of neutral mutations introduced by pol I into a ColE1 plasmid sequence. The resulting pol I mutation footprint confirmed the known roles of pol I in ColE1 plasmid replication and established in vivo the length of the pol I extension product (~1000 bp) and of Okazaki primer processing (~20 nt). A similar approach could be used to map the templates of DNA polymerases in other prokaryotic or eukaryotic organisms by exploiting naturally high error rates and/or by altering polymerase fidelity.
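A mutation footprint of this kind can be summarized by measuring how far each mutation lies downstream of the replication origin on the circular plasmid. The abstract does not describe how the ~1000 bp extension length was estimated, so the sketch below assumes one simple summary: the distance from the origin that encloses a chosen fraction of the mutations. The function names, the `fraction` parameter, and the coordinates in the example are hypothetical.

```python
import math


def downstream_distances(positions, origin, plasmid_length, forward=True):
    """Sorted distances (bp) of mutations downstream of the origin on a circular plasmid."""
    if forward:
        return sorted((p - origin) % plasmid_length for p in positions)
    return sorted((origin - p) % plasmid_length for p in positions)


def footprint_length(positions, origin, plasmid_length, fraction=0.9, forward=True):
    """Smallest downstream distance that encloses `fraction` of the mutations."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    distances = downstream_distances(positions, origin, plasmid_length, forward)
    if not distances:
        raise ValueError("no mutation positions given")
    # Index of the last distance needed to cover the requested fraction
    k = math.ceil(fraction * len(distances)) - 1
    return distances[k]


# Hypothetical 5 kb plasmid with the origin near the end of the coordinate range
mutations = [4950, 100, 300, 500, 800]
print(footprint_length(mutations, origin=4900, plasmid_length=5000, fraction=0.8))
```

Taking distances modulo the plasmid length handles mutations that lie past the coordinate origin of the circular sequence, and the `forward` flag selects the direction of synthesis.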