
Gribskov average-score method

 

The Gribskov profile [GME87] or average-score method [TAK94] computes the weighted average of scores from a score matrix M. There are several standard scoring matrices in use, most notably the Dayhoff matrices [DSO78] and the BLOSUM matrices [HH92], which were originally created for aligning one sequence with another (|s|=1).

The scores are best interpreted as the logarithm of the ratio of the probability of the amino acid in the context to the background probability [Alt91]:

$$M_{i,j} = \log \frac{\hat{P}_s(i)}{\hat{P}_0(i)}$$

where s is a sample containing exactly one amino acid: j.

The averaging of the score matrices is intended to create a new score. With the interpretation of scores given above, and assuming natural logarithms are used, the posterior counts are

$$X_s(i) = \hat{P}_0(i) \exp\left( \frac{\sum_j M_{i,j}\, s(j)}{|s|} \right)$$

We can avoid recording the extra parameters $\hat{P}_0(i)$ by redefining the score matrix slightly. If we let $M'_{i,j} = M_{i,j} + \ln \hat{P}_0(i)$, then

$$X_s(i) = \exp\left( \frac{\sum_j M'_{i,j}\, s(j)}{|s|} \right)$$
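The weighted-average computation can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: it assumes natural-log scores, posterior counts of the form P0(i) * exp(average score), and a sample given as a list of per-residue counts. The names `gribskov_counts`, `score`, `background`, and `sample` are hypothetical.

```python
import math

def gribskov_counts(score, background, sample):
    """Posterior counts under the assumed form
    X_s(i) = P0(i) * exp(sum_j M[i][j] * s(j) / |s|).

    score[i][j]   : natural-log score M_{i,j} (assumed row/column layout)
    background[i] : background probability P0(i)
    sample[j]     : observed count s(j) of residue j
    """
    total = sum(sample)
    if total == 0:
        raise ValueError("the average score is undefined for an empty sample")
    counts = []
    for i, p0 in enumerate(background):
        # Average of row i of the score matrix, weighted by the sample counts.
        avg = sum(score[i][j] * sample[j] for j in range(len(sample))) / total
        counts.append(p0 * math.exp(avg))
    return counts

def normalize(counts):
    """Turn posterior counts into an estimated probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]

# Usage with made-up numbers for a two-letter alphabet.
ln2 = math.log(2.0)
toy_score = [[ln2, 0.0], [0.0, ln2]]
estimate = normalize(gribskov_counts(toy_score, [0.5, 0.5], [3, 1]))
```

Folding ln P0(i) into the score matrix, as in the redefinition above, gives the same counts because the weights s(j)/|s| sum to one.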

The BLOSUM substitution matrices provide a score matrix

$$M_{i,j} = \log \frac{P(i,j)}{P(i)\, P(j)}$$

for matching amino acid i and amino acid j, where P(i,j) is the probability of i and j appearing as an ordered pair in any column of a correct alignment. Let's take natural logarithms in creating the score matrix (to match the exponential in the computation of $X_s(i)$). If we use j to name the sample consisting of a single amino acid j, then

$$X_j(i) = \hat{P}_0(i)\, e^{M_{i,j}} = \frac{P(i,j)}{P(j)} = P(i \mid j)$$

This is the optimal value for $X_j(i)$, and so the Gribskov average-score method is optimal for |s|=1 (with a properly chosen score matrix).
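The |s|=1 claim can be checked numerically. The sketch below uses a made-up symmetric joint distribution over a three-letter alphabet (the numbers are illustrative, not taken from any BLOSUM data), and assumes the background probability equals the marginal P(i). It builds a natural-log log-odds matrix and confirms that a single-residue sample yields posterior counts equal to P(i|j).

```python
import math

# Toy symmetric joint distribution P(i,j); entries are made up and sum to 1.
joint = [[0.30, 0.05, 0.05],
         [0.05, 0.20, 0.05],
         [0.05, 0.05, 0.20]]
n = len(joint)
marginal = [sum(row) for row in joint]  # P(i) = sum_j P(i,j)

# Natural-log score matrix M[i][j] = ln(P(i,j) / (P(i) * P(j))).
score = [[math.log(joint[i][j] / (marginal[i] * marginal[j])) for j in range(n)]
         for i in range(n)]

def single_residue_counts(j):
    """X_j(i) = P(i) * exp(M[i][j]) for a sample holding one copy of residue j."""
    return [marginal[i] * math.exp(score[i][j]) for i in range(n)]

def conditional(j):
    """P(i | j) = P(i,j) / P(j)."""
    return [joint[i][j] / marginal[j] for i in range(n)]

# The two agree up to floating-point error for every choice of j.
matches = all(
    math.isclose(x, p, rel_tol=1e-12)
    for j in range(n)
    for x, p in zip(single_residue_counts(j), conditional(j))
)
```

Because the counts already equal a conditional distribution, they sum to one for each j without further normalization.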

Although the Gribskov average-score method is optimal at |s|=1, it does not perform well at the extremes. For |s|=0, it predicts a completely flat distribution (just as zero-offset methods do). As $|s| \to \infty$, the Gribskov average-score method does not approach a maximum-likelihood estimate for $\hat{P}_s(i)$.
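Both extremes can be illustrated with a small Python sketch. It assumes the redefined form of the method (counts equal to the exponential of the average redefined score), takes the average over an empty sample to be zero, and uses a made-up table of conditional probabilities for a three-letter alphabet. The name `gribskov_estimate` is hypothetical.

```python
import math

def gribskov_estimate(score_prime, sample):
    """Normalized exp(average redefined score); an empty average is taken as 0."""
    total = sum(sample)
    counts = []
    for row in score_prime:
        avg = sum(m * c for m, c in zip(row, sample)) / total if total else 0.0
        counts.append(math.exp(avg))
    z = sum(counts)
    return [c / z for c in counts]

# Made-up conditional probabilities P(i|j); each column sums to 1.
cond = [[0.75, 1 / 6, 1 / 6],
        [0.125, 2 / 3, 1 / 6],
        [0.125, 1 / 6, 2 / 3]]
score_prime = [[math.log(p) for p in row] for row in cond]

flat = gribskov_estimate(score_prime, [0, 0, 0])      # empty sample
one = gribskov_estimate(score_prime, [1, 0, 0])       # a single residue 0
many = gribskov_estimate(score_prime, [1000, 0, 0])   # a thousand copies of residue 0
```

Here `flat` is uniform, and `many` equals `one`: the estimate depends only on the relative frequencies in the sample, so a thousand observations of residue 0 leave its estimated probability at 0.75 rather than moving it toward the maximum-likelihood value of 1.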

We can get much better performance for |s|>1 by optimizing the score matrix as described in Section 4, but the Gribskov average-score method does not generalize to other values of |s| as well as the substitution matrix method described in Section 3.4.



Rey Rivera
Thu Aug 1 17:59:45 PDT 1996