The Gribskov profile [GME87] or average-score method [TAK94] computes the weighted average of scores from a score matrix M. There are several standard scoring matrices in use, most notably the Dayhoff matrices [DSO78] and the BLOSUM matrices [HH92], which were originally created for aligning one sequence with another (|s|=1).
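The scores are best interpreted as the logarithm of the ratio of the probability of the amino acid in the context to the background probability [Alt91]:
\[ M_{i,j} = \log \frac{\hat{P}_s(i)}{P_0(i)} , \]
where $\hat{P}_s(i)$ is the estimated probability of amino acid i given the sample s, $P_0(i)$ is the background probability of i, and s is a sample containing exactly one amino acid: j.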
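The averaging of the score matrices is intended to create a new score. With the interpretation of scores given above, and assuming natural logarithms are used, the posterior counts are
\[ X_s(i) = P_0(i) \exp\left( \frac{\sum_j s(j) M_{i,j}}{|s|} \right) , \]
where $s(j)$ is the count of amino acid j in the sample s and $P_0(i)$ is the background probability of amino acid i.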
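We can avoid recording the extra parameters $P_0(i)$ by redefining the score matrix slightly. If we let $M'_{i,j} = M_{i,j} + \ln P_0(i)$, then
\[ X_s(i) = \exp\left( \frac{\sum_j s(j) M'_{i,j}}{|s|} \right) . \]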
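The averaging step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes the posterior counts take the form X_s(i) = P_0(i) exp(sum_j s(j) M[i][j] / |s|), and the three-letter alphabet, the background probabilities, and the score values are all hypothetical. It also checks that folding ln P_0(i) into the score matrix gives the same counts.

```python
import math

def posterior_counts(sample, M, background):
    """Assumed form: X_s(i) = P_0(i) * exp(sum_j s(j) * M[i][j] / |s|)."""
    total = sum(sample.values())
    if total == 0:
        raise ValueError("the average score is undefined for an empty sample")
    counts = {}
    for i in background:
        avg_score = sum(n * M[i][j] for j, n in sample.items()) / total
        counts[i] = background[i] * math.exp(avg_score)
    return counts

def posterior_counts_shifted(sample, M_shifted):
    """Same counts from the shifted matrix M'[i][j] = M[i][j] + ln P_0(i)."""
    total = sum(sample.values())
    if total == 0:
        raise ValueError("the average score is undefined for an empty sample")
    return {
        i: math.exp(sum(n * row[j] for j, n in sample.items()) / total)
        for i, row in M_shifted.items()
    }

def normalize(counts):
    """Turn posterior counts into an estimated distribution."""
    z = sum(counts.values())
    return {i: x / z for i, x in counts.items()}

# Hypothetical toy alphabet, background, and natural-log scores.
background = {"A": 0.5, "C": 0.3, "D": 0.2}
M = {
    "A": {"A": 0.4, "C": -0.6, "D": -0.3},
    "C": {"A": -0.6, "C": 0.9, "D": -0.2},
    "D": {"A": -0.3, "C": -0.2, "D": 1.1},
}
M_shifted = {
    i: {j: M[i][j] + math.log(background[i]) for j in M[i]} for i in M
}

sample = {"A": 2, "C": 1}  # s(A) = 2, s(C) = 1, so |s| = 3
x = posterior_counts(sample, M, background)
x_shifted = posterior_counts_shifted(sample, M_shifted)
p_hat = normalize(x)
print(p_hat)
```

The two functions differ only in where the background term lives, which is why the shifted matrix removes the need to store P_0(i) separately.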
The BLOSUM substitution matrices provide a score matrix
\[ M_{i,j} = \log \frac{P(i,j)}{P_0(i) P_0(j)} \]
for matching amino acid i and amino acid j, where P(i,j) is the probability of i and j appearing as an ordered pair in any column of a correct alignment.
Let's take natural logarithms in creating the score matrix (to match the exponential in the computation of $X_s(i)$).
If we use j to name the sample consisting of a single amino acid j, then
\[ X_j(i) = P_0(i) e^{M_{i,j}} = \frac{P(i,j)}{P_0(j)} = P(i \mid j) . \]
This is the optimal value for $X_j(i)$, and so the Gribskov average-score method is optimal for |s|=1 (with a properly chosen score matrix).
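The |s|=1 case can be checked numerically. The sketch below assumes a log-odds score matrix of the form M[i][j] = ln(P(i,j) / (P_0(i) P_0(j))) and single-sample counts X_j(i) = P_0(i) exp(M[i][j]). The joint distribution is a hypothetical toy table over three letters, not real BLOSUM data.

```python
import math

alphabet = ["A", "C", "D"]

# Hypothetical symmetric joint distribution P(i, j); entries sum to 1.
P = {
    ("A", "A"): 0.30, ("A", "C"): 0.08, ("A", "D"): 0.07,
    ("C", "A"): 0.08, ("C", "C"): 0.20, ("C", "D"): 0.04,
    ("D", "A"): 0.07, ("D", "C"): 0.04, ("D", "D"): 0.12,
}

# Background probabilities taken as the marginals of P.
P0 = {i: sum(P[(i, j)] for j in alphabet) for i in alphabet}

# Natural-log log-odds scores (assumed form of the score matrix).
M = {
    i: {j: math.log(P[(i, j)] / (P0[i] * P0[j])) for j in alphabet}
    for i in alphabet
}

def counts_for_single(j):
    """Posterior counts X_j(i) when the sample is the single amino acid j."""
    return {i: P0[i] * math.exp(M[i][j]) for i in alphabet}

# Conditional probabilities P(i | j) computed directly from the joint table.
conditional = {
    j: {i: P[(i, j)] / P0[j] for i in alphabet} for j in alphabet
}

for j in alphabet:
    x = counts_for_single(j)
    for i in alphabet:
        assert math.isclose(x[i], conditional[j][i])
```

Because the counts equal P(i|j) exactly, they already sum to 1 for each j and need no further normalization in this case.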
Although the Gribskov average-score method is optimal at |s|=1, it does not perform well at the extremes.
For |s|=0, it predicts a completely flat distribution (just as zero-offset methods do).
As $|s| \to \infty$, the Gribskov average-score method does not approach a maximum-likelihood estimate for $\hat{P}_s(i)$.
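Both extremes can be illustrated with a small sketch. It assumes the shifted form X_s(i) = exp(sum_j s(j) M'[i][j] / |s|), treats the average over an empty sample as 0 (an assumed convention), and uses hypothetical scores and background values. Since the average depends only on the relative frequencies in the sample, scaling every count by the same factor leaves the estimate unchanged, so it cannot converge to the observed frequencies.

```python
import math

def estimate(sample, M_shifted):
    """Normalized estimate from X_s(i) = exp(sum_j s(j) * M'[i][j] / |s|)."""
    total = sum(sample.values())
    counts = {}
    for i, row in M_shifted.items():
        # Assumed convention: the average score of an empty sample is 0.
        avg = sum(n * row[j] for j, n in sample.items()) / total if total else 0.0
        counts[i] = math.exp(avg)
    z = sum(counts.values())
    return {i: x / z for i, x in counts.items()}

# Hypothetical toy background and natural-log scores.
background = {"A": 0.5, "C": 0.3, "D": 0.2}
M = {
    "A": {"A": 0.4, "C": -0.6, "D": -0.3},
    "C": {"A": -0.6, "C": 0.9, "D": -0.2},
    "D": {"A": -0.3, "C": -0.2, "D": 1.1},
}
M_shifted = {
    i: {j: M[i][j] + math.log(background[i]) for j in M[i]} for i in M
}

# |s| = 0: every X_s(i) is exp(0) = 1, so the estimate is flat.
flat = estimate({}, M_shifted)

# Growing samples with fixed proportions 90% A, 10% C, 0% D.
small = estimate({"A": 9, "C": 1}, M_shifted)
large = estimate({"A": 90000, "C": 10000}, M_shifted)
max_likelihood = {"A": 0.9, "C": 0.1, "D": 0.0}

print(flat)
print(small)
print(large)
print(max_likelihood)
```

In this toy setting the estimate for the large sample matches the one for the small sample and still assigns noticeable probability to D, which never occurs in the sample.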
We can get much better performance for |s|>1 by optimizing the score matrix as described in Section 4, but the Gribskov average-score method does not generalize to other values of |s| as well as the substitution matrix method described in Section 3.4.