ISMB-97: Intelligent Systems for Molecular Biology 1997
Gelfand Tutorial
The Fifth International Conference on Intelligent Systems for
Molecular Biology (ISMB-97)
Prediction of function in DNA sequence analysis
a tutorial
Misha Gelfand
Striking advances in large-scale DNA sequencing have resulted in
the complete sequencing of several bacterial genomes and the yeast
genome, and in megabase sequences of higher eukaryotes, in
particular cosmid-size fragments of human DNA. Thus one of
the most important problems in computational molecular
biology is now the interpretation and functional mapping of the
obtained sequence data.
The tutorial will cover the problems of computer-assisted
functional analysis of nucleotide sequences from both the
developer's and the user's points of view. It will start with
an overview of DNA statistics. Then the algorithms for
recognition of functional sites, protein-coding regions and
other DNA features will be considered, as well as the
algorithms for database search. Finally, a user's strategy
for analyzing a newly sequenced DNA fragment will be
discussed.
The supplementary material will include a list of
existing algorithms for functional interpretation of DNA
sequences as well as locations of e-mail and WWW servers.
- DNA statistics.
- Oligonucleotide counts.
- "Linguistics of DNA": preferred and avoided
oligonucleotides.
- Information theory analysis (Shannon entropy,
Kolmogorov and Lempel-Ziv complexity).
- Analysis of periodicity (Fourier transform,
cross-correlation functions etc.).
- Fractal analysis.
- Zipf's law in linguistics and bioinformatics.
- Statistical analysis and recognition of protein-coding
regions.
- Codon usage.
- Coding potentials.
- Statistical regularities of the exon-intron
structure.
- Statistical analysis and recognition of functional
sites.
- Signal detection and algorithms of multiple local
alignment.
- Consensus. Compilations of transcription factors.
- Weight matrices. Probabilistic and
statistical-mechanical interpretation.
- Neural network recognizers.
- Other functional regions.
- tRNA genes and self-splicing introns.
- CpG islands.
- Matrix attachment regions.
- Nucleosome positioning.
- Database similarity search.
- Gene recognition.
- Statistical algorithms.
- Gene detection by similarity search.
- Prediction of gene structure by spliced alignment.
- Analysis of complete genomes.
- Prediction of protein function.
- Comparison of genomes. The minimal gene set.
- Combining diverse evidence. Case study: recognition
of restriction-modification system proteins and
prediction of their specificity by protein homology
and DNA statistics.
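As an illustration of the first outline items (oligonucleotide
counts and Shannon entropy), a minimal Python sketch is given
here. It is not taken from the tutorial material: the function
names kmer_counts and shannon_entropy are hypothetical, and the
entropy is computed from the empirical distribution of
overlapping oligonucleotides, which is only one of several
possible definitions.

```python
from collections import Counter
from math import log2


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping oligonucleotides (k-mers) of length k in seq."""
    seq = seq.upper()
    # Slide a window of width k along the sequence, one base at a time.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shannon_entropy(counts: Counter) -> float:
    """Shannon entropy, in bits, of the empirical k-mer distribution."""
    total = sum(counts.values())
    # Each k-mer contributes -p * log2(p), where p is its observed frequency.
    return -sum((n / total) * log2(n / total) for n in counts.values())
```

For example, shannon_entropy(kmer_counts(seq, 1)) approaches 2 bits
for a sequence with equal base frequencies and drops towards 0 for
a homopolymer; comparing the observed count of an oligonucleotide
with the count expected from base composition is one simple way to
look for preferred and avoided words.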
References:
- M.S.Gelfand. Prediction of function in DNA sequence
analysis. J. Computational Biology 2, 87-115 (1995).
- M.S.Gelfand. Computer functional analysis of nucleotide
sequences: problems and approaches. DIMACS Series in
Discrete and Applied Mathematics, vol. 8 "Mathematical
Analysis of Biopolymer Sequences", pp. 19-61 (1992).
- J.W.Fickett, C.-S.Tung. Assessment of protein coding
measures. Nucleic Acids Res. 20, 6441-6450 (1992).
- J.W.Fickett. The gene identification problem: an
overview for developers. Comput. Chem.
- J.W.Fickett. Finding genes by computer: the state of the
art. Trends Genet. 12, 316-320 (1996).
- P.Bork. Go hunting in sequence databases but watch out
for traps. Trends Genet. 12, 425-427 (1996).