We can use Bayesian probability techniques to interpret the
pseudocount regularizers.
To apply these methods we have to view amino acids as being generated
by a two-stage random process.
First, a 20-dimensional density vector $\rho$ over the amino acids is
chosen randomly; then amino acids are chosen randomly with
probabilities $\rho(i)$.
The probability of amino acid *i* given a sample *s* is the integral
over all possible vectors of the probability of choosing that
vector times the probability of choosing *i* given that vector:

$$P(i \mid s) = \int_\rho P(\rho \mid s)\, \rho(i)\, d\rho .$$

Computing the probability $P(\rho \mid s)$ requires applying Bayes' rule:

$$P(\rho \mid s) = \frac{P(s \mid \rho)\, P(\rho)}{P(s)} ,$$

giving us a new formula for the probability of amino acid *i*:

$$P(i \mid s) = \frac{\int_\rho P(s \mid \rho)\, P(\rho)\, \rho(i)\, d\rho}{P(s)} .$$

The probability $P(s \mid \rho)$ is easily computed for any density vector $\rho$, but we need to know the prior distribution of $\rho$ in order to compute the integral. The computation for $P(s \mid \rho)$ is the same as in Section 2.1:

$$P(s \mid \rho) = \frac{|s|!}{\prod_j s(j)!} \prod_j \rho(j)^{s(j)} .$$

There is an obvious generalization to non-integer *s*(*j*) values:
replace the factorial function with the equivalent expression using
the Gamma function:

$$P(s \mid \rho) = \frac{\Gamma(|s|+1)}{\prod_j \Gamma(s(j)+1)} \prod_j \rho(j)^{s(j)} .$$
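
As a quick numerical check, here is a small Python sketch of the Gamma-generalized likelihood (the function name is ours, not from the paper); for integer counts it agrees with the factorial form, and it also accepts fractional counts.

```python
import math

def multinomial_likelihood(s, rho):
    """P(s | rho): probability of count vector s under densities rho.

    Gamma(x + 1) replaces x!, so the counts s(j) may be non-integer.
    """
    coef = math.gamma(sum(s) + 1) / math.prod(math.gamma(sj + 1) for sj in s)
    return coef * math.prod(r ** sj for sj, r in zip(s, rho))

# integer counts reproduce the factorial form: 3!/(2! 1! 0!) * 0.5^2 * 0.3 = 0.225
p_int = multinomial_likelihood([2, 1, 0], [0.5, 0.3, 0.2])
# fractional counts are also accepted
p_frac = multinomial_likelihood([1.5, 0.5, 0.0], [0.5, 0.3, 0.2])
```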

In order to compute the integral, we must choose a model for the prior distribution of $\rho$. One choice that allows us to compute the integral is to model the prior as a Dirichlet distribution, that is

$$P(\rho) = C \prod_j \rho(j)^{z(j)-1}$$

for some parameter vector *z*, where *C* is a constant chosen so that
$\int_\rho P(\rho)\, d\rho = 1$.
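
One way to get a feel for this prior is to sample from it; the normalized-Gamma construction below is the standard way to draw from a Dirichlet distribution (a sketch with an illustrative parameter vector, not values from the paper).

```python
import random

def sample_dirichlet(z, rng):
    """Draw one density vector rho from the Dirichlet distribution with parameters z."""
    g = [rng.gammavariate(zj, 1.0) for zj in z]
    total = sum(g)
    return [x / total for x in g]

rng = random.Random(0)
z = [1.0] * 20                                   # illustrative uniform prior
draws = [sample_dirichlet(z, rng) for _ in range(20000)]
# each draw is a valid density vector, and E[rho(i)] = z(i)/|z| = 0.05
mean0 = sum(d[0] for d in draws) / len(draws)
```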

Showing in detail how to compute the integral is beyond the scope of this paper, but the answer can be derived from the standard definition of the Beta function [GR65, p. 948]

$$B(x, y) = \frac{\Gamma(x)\,\Gamma(y)}{\Gamma(x+y)}$$

and the combining formula [GR65, p. 285]:

$$\int_0^u t^{x-1} (u-t)^{y-1}\, dt = u^{x+y-1}\, B(x, y) .$$
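
The combining identity is easy to confirm numerically; the sketch below compares a midpoint-rule quadrature of the Beta-type integral against the Gamma-function expression (all names and the test values are ours).

```python
import math

def beta_gamma(x, y):
    """B(x, y) computed from the Gamma-function definition."""
    return math.gamma(x) * math.gamma(y) / math.gamma(x + y)

def combined_integral(x, y, u, n=100000):
    """Midpoint-rule quadrature of the integral of t^(x-1) (u-t)^(y-1) over [0, u]."""
    h = u / n
    return h * sum(((k + 0.5) * h) ** (x - 1) * (u - (k + 0.5) * h) ** (y - 1)
                   for k in range(n))

# the integral should equal u^(x+y-1) * B(x, y)
lhs = combined_integral(2.0, 3.0, 0.7)
rhs = 0.7 ** 4 * beta_gamma(2.0, 3.0)
```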

By writing the integral over all vectors as a multiple integral over the 20 dimensions of the vector and doing some rearrangement, we can get the solution

$$\int_\rho \prod_j \rho(j)^{z(j)-1}\, d\rho = \frac{\prod_j \Gamma(z(j))}{\Gamma(|z|)} = B(z) ,$$

where we have introduced the notation $B(z)$ as a simple
generalization of $B(x, y)$ to the vector argument *z*.
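
In log space the vector Beta function is cheap to compute, and a one-line check confirms the ratio identity $B(z+e_i)/B(z) = z(i)/|z|$ (where $e_i$ is the *i*th unit vector) that the pseudocount result rests on. A sketch with made-up parameter values:

```python
import math

def log_B(z):
    """log B(z) = sum_j log Gamma(z(j)) - log Gamma(|z|)."""
    return sum(math.lgamma(zj) for zj in z) - math.lgamma(sum(z))

z = [0.5, 1.0, 2.5, 4.0]                        # made-up parameters, |z| = 8
i = 2
z_plus_ei = [zj + (1.0 if j == i else 0.0) for j, zj in enumerate(z)]
# B(z + e_i) / B(z) collapses to z(i)/|z| via Gamma(x+1) = x Gamma(x)
ratio = math.exp(log_B(z_plus_ei) - log_B(z))
```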

With this choice of prior distribution for $\rho$, we can compute

$$C = \frac{1}{B(z)} = \frac{\Gamma(|z|)}{\prod_j \Gamma(z(j))} .$$

We can now compute the estimated probability of the sample:

$$P(s) = \int_\rho P(s \mid \rho)\, P(\rho)\, d\rho = \frac{\Gamma(|s|+1)}{\prod_j \Gamma(s(j)+1)} \cdot \frac{B(s+z)}{B(z)} .$$

The integral for estimating the conditional probability of amino acid
*i* given sample *s* is then

$$P(i \mid s) = \frac{\int_\rho P(s \mid \rho)\, P(\rho)\, \rho(i)\, d\rho}{P(s)} = \frac{B(s+z+e_i)}{B(s+z)} = \frac{s(i)+z(i)}{|s|+|z|} .$$

Notation: $e_i$ is used above to mean the vector consisting of
a one in the *i*th position and zeros elsewhere; that is,
$e_i(j) = \delta_{ij}$, which is one if *i*=*j* and zero otherwise.

This rather involved computation finally ends up with the pseudocount
method for estimating the probability of an amino acid given a sample
of amino acids.
The regularizer parameters *z* can be interpreted as assuming a
Dirichlet distribution for the prior probabilities $\rho$.
Previous work with pseudocounts has relied heavily on this Bayesian
interpretation of the parameters, going so far as to set *z*
proportional to the background amino acid probabilities, which does
indeed provide the optimal estimates for $|s|=0$, but which we have
seen in Section 3.2 is not the best setting of the parameters for
$|s|>0$.
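
The end result is small enough to state as code; the sample and regularizer values below are illustrative, not from the paper.

```python
def pseudocount_estimate(s, z):
    """Pseudocount estimate P(i | s) = (s(i) + z(i)) / (|s| + |z|)."""
    total = sum(s) + sum(z)
    return [(si + zi) / total for si, zi in zip(s, z)]

s = [3, 1] + [0] * 18      # made-up sample: 4 observations over 20 amino acids
z = [1.0] * 20             # illustrative uniform regularizer
p = pseudocount_estimate(s, z)
# p[0] = (3 + 1) / (4 + 20) = 1/6; unseen amino acids still get 1/24 each
```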

The *posterior distribution* of $\rho$ after seeing a sample *s* is
$P(\rho \mid s) = P(s \mid \rho)\, P(\rho) / P(s)$. As we can see from the
above computations, this posterior distribution is again a Dirichlet
distribution, with parameters *s*(*j*)+*z*(*j*), instead of the prior
distribution's parameters *z*(*j*). This interpretation of
$s(j)+z(j)$ as the parameters of the posterior distribution is what
inspired naming them the *posterior counts*. The scaling of the
posterior counts does matter for this interpretation, and so not all
the posterior counts produced by regularizers can automatically be
interpreted as Dirichlet posterior distributions on $\rho$.
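
A short sketch of the update: the posterior parameters are simply *s*(*j*)+*z*(*j*), and the posterior mean reproduces the pseudocount estimate (the values below are made up, using a 4-letter alphabet for brevity).

```python
def posterior_counts(s, z):
    """Parameters of the Dirichlet posterior after observing sample s under prior z."""
    return [sj + zj for sj, zj in zip(s, z)]

s = [3, 1, 0, 0]                           # made-up sample counts
z = [0.5, 0.5, 0.5, 0.5]                   # made-up prior parameters
X = posterior_counts(s, z)                 # the posterior counts s(j) + z(j)
# the posterior mean X(i)/|X| equals the pseudocount estimate (s(i)+z(i))/(|s|+|z|)
posterior_mean = [x / sum(X) for x in X]
```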

We can extend the Bayesian analysis to compute the posterior distribution of $\rho$ given that we have seen several independent samples: $P(\rho \mid s_1, \ldots, s_n)$. The computation is fairly straightforward. First we apply Bayes' rule:

$$P(\rho \mid s_1, \ldots, s_n) = \frac{P(\rho) \prod_{k=1}^{n} P(s_k \mid \rho)}{P(s_1, \ldots, s_n)} .$$

Repeating the mathematics for a single sample would be tedious, but we
can take a shortcut. Since the posterior distribution after seeing
a sample is again a Dirichlet distribution, we can treat it as the
prior distribution for adding the next sample. Using this trick, we
can see that the final posterior distribution after seeing all *n*
samples is a Dirichlet distribution with parameters
$z(j) + \sum_{k=1}^{n} s_k(j)$. In other words, we get the same result from
observing *n* independent samples as we would get from adding all the
samples together and using the resulting counts as a single sample.
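
This shortcut can be verified directly; the sketch below updates a made-up Dirichlet prior one sample at a time and compares the result against pooling the samples first.

```python
def update(z, s):
    """Dirichlet update: add observed counts s to the current parameters z."""
    return [zj + sj for zj, sj in zip(z, s)]

prior = [1.0, 1.0, 1.0]                          # made-up 3-letter alphabet
samples = [[2, 0, 1], [0, 3, 0], [1, 1, 1]]

# sequential: treat each posterior as the prior for the next sample
seq = prior
for s in samples:
    seq = update(seq, s)

# pooled: sum the samples and apply them as a single sample
pooled = update(prior, [sum(col) for col in zip(*samples)])
# seq and pooled agree: n independent samples are equivalent to their sum
```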
