
Bayesian interpretation of pseudocount regularizers


We can use Bayesian probability techniques to interpret the pseudocount regularizers. To apply these methods, we view amino acids as being generated by a two-stage random process: first, a 20-dimensional density vector $\rho$ over the amino acids is chosen at random; then amino acids are chosen randomly with probabilities $\rho_i$. The probability of amino acid $i$ given a sample $s$ is the integral, over all possible vectors $\rho$, of the probability of that vector given the sample times the probability of choosing $i$ given that vector:

\[ P(i \mid s) = \int_\rho P(\rho \mid s)\, \rho_i \, d\rho \;. \]

Computing the probability $P(\rho \mid s)$ requires applying Bayes' rule:

\[ P(\rho \mid s) = \frac{P(s \mid \rho)\, P(\rho)}{P(s)} \;, \]

giving us a new formula for the probability of amino acid $i$:

\[ P(i \mid s) = \frac{1}{P(s)} \int_\rho P(s \mid \rho)\, P(\rho)\, \rho_i \, d\rho \;. \]

The probability $P(s \mid \rho)$ is easily computed for any density vector $\rho$, but we need to know the prior distribution of $\rho$ in order to compute the integral. The computation of $P(s \mid \rho)$ is the same as in Section 2.1:

\[ P(s \mid \rho) = \frac{|s|!}{\prod_j s(j)!} \prod_j \rho_j^{s(j)} \;. \]

There is an obvious generalization to non-integer $s(j)$ values, obtained by replacing the factorial function with the equivalent expression using the Gamma function:

\[ P(s \mid \rho) = \frac{\Gamma(|s|+1)}{\prod_j \Gamma(s(j)+1)} \prod_j \rho_j^{s(j)} \;. \]

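As a quick numerical sanity check (not part of the original derivation), the sketch below evaluates this sample probability in Python using only the standard library; since $\Gamma(n+1) = n!$ for integer $n$, integer counts reduce to the ordinary multinomial formula, while non-integer (for example, weighted) counts are handled identically. The function name and the example counts are illustrative.

```python
from math import gamma, factorial, prod

def sample_prob(s, rho):
    """P(s | rho): multinomial likelihood in its Gamma-function form,
    valid for non-integer counts s(j)."""
    coef = gamma(sum(s) + 1) / prod(gamma(sj + 1) for sj in s)
    return coef * prod(r ** sj for sj, r in zip(s, rho))

# Gamma(n+1) = n! for integer n, so integer counts give the usual formula.
for n in range(6):
    assert abs(gamma(n + 1) - factorial(n)) < 1e-9

# Integer counts: P([2,1] | [0.7, 0.3]) = 3 * 0.7^2 * 0.3 = 0.441.
assert abs(sample_prob([2, 1], [0.7, 0.3]) - 0.441) < 1e-12

# Non-integer counts are evaluated the same way and stay a valid probability.
p = sample_prob([1.5, 0.5, 1.0], [0.5, 0.2, 0.3])
assert 0.0 < p < 1.0
```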
In order to compute the integral, we must choose a model for the prior distribution of $\rho$. One choice that allows us to compute the integral is to model the prior as a Dirichlet distribution, that is,

\[ P(\rho) = C \prod_j \rho_j^{z(j)-1} \]

for some parameter vector $z$, where $C$ is a constant chosen so that $\int_\rho P(\rho)\, d\rho = 1$.

Showing in detail how to compute the integral is beyond the scope of this paper, but the answer can be derived from the standard definition of the Beta function [GR65, p. 948]

\[ B(x, y) = \int_0^1 t^{x-1} (1-t)^{y-1} \, dt \]

and the combining formula [GR65, p. 285]:

\[ B(x, y) = \frac{\Gamma(x)\,\Gamma(y)}{\Gamma(x+y)} \;. \]

By writing the integral over all $\rho$ vectors as a multiple integral over the 20 dimensions of the vector and doing some rearrangement, we can get the solution

\[ \int_\rho \prod_j \rho_j^{z(j)-1} \, d\rho = \frac{\prod_j \Gamma(z(j))}{\Gamma(|z|)} = B(z) \;, \]

where we have introduced the $B(z)$ notation as a simple generalization of $B(x,y)$ to the vector argument $z$, with $|z| = \sum_j z(j)$.
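The vector Beta function is straightforward to implement, and for two components it must agree with the classical $B(x,y)$; the sketch below (illustrative, stdlib only) checks it against a direct midpoint-rule evaluation of the defining integral.

```python
from math import gamma, prod

def beta_vec(z):
    """B(z) = prod_j Gamma(z(j)) / Gamma(|z|): the Beta function
    generalized to a vector argument."""
    return prod(gamma(zj) for zj in z) / gamma(sum(z))

def beta_integral(x, y, steps=100000):
    """Numerical evaluation of B(x,y) = int_0^1 t^(x-1) (1-t)^(y-1) dt
    by the midpoint rule."""
    h = 1.0 / steps
    return h * sum(((k + 0.5) * h) ** (x - 1) * (1 - (k + 0.5) * h) ** (y - 1)
                   for k in range(steps))

# Two-component case agrees with the classical Beta function:
# B(2,3) = Gamma(2) Gamma(3) / Gamma(5) = 2/24 = 1/12.
assert abs(beta_vec([2.0, 3.0]) - 1.0 / 12.0) < 1e-12
assert abs(beta_vec([2.0, 3.0]) - beta_integral(2.0, 3.0)) < 1e-4
```

As a side check, $B([1,1,1]) = 1/\Gamma(3) = 1/2$, the volume of the 2-simplex, which is consistent with the uniform Dirichlet prior integrating to one.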

With this choice of prior distribution for $\rho$, we can compute

\[ C = \frac{1}{B(z)} = \frac{\Gamma(|z|)}{\prod_j \Gamma(z(j))} \;. \]

We can now compute the estimated probability of the sample

\[ P(s) = \int_\rho P(s \mid \rho)\, P(\rho)\, d\rho
       = \frac{\Gamma(|s|+1)}{\prod_j \Gamma(s(j)+1)} \cdot \frac{B(s+z)}{B(z)} \;. \]

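Since $P(s)$ is a distribution over samples, its values must sum to one over all count vectors of a fixed size; the illustrative Python sketch below checks this for a toy three-letter alphabet with $|s| = 4$ (the alphabet size, counts, and parameter vector are assumptions for the example).

```python
from math import gamma, prod
from itertools import product as cartesian

def beta_vec(z):
    """B(z) = prod_j Gamma(z(j)) / Gamma(|z|)."""
    return prod(gamma(zj) for zj in z) / gamma(sum(z))

def prob_sample(s, z):
    """P(s) = [Gamma(|s|+1) / prod_j Gamma(s(j)+1)] * B(s+z) / B(z)."""
    coef = gamma(sum(s) + 1) / prod(gamma(sj + 1) for sj in s)
    return coef * beta_vec([sj + zj for sj, zj in zip(s, z)]) / beta_vec(z)

# Summed over every count vector with |s| = n, the probabilities total 1.
z = [0.5, 1.0, 2.0]          # illustrative Dirichlet parameters
n = 4
total = sum(prob_sample(s, z)
            for s in cartesian(range(n + 1), repeat=3) if sum(s) == n)
assert abs(total - 1.0) < 1e-9

# Degenerate one-letter alphabet: the only possible sample has probability 1.
assert abs(prob_sample([3], [0.7]) - 1.0) < 1e-12
```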
The integral for estimating the conditional probability of amino acid $i$ given sample $s$ is then

\[ P(i \mid s) = \frac{1}{P(s)} \int_\rho P(s \mid \rho)\, P(\rho)\, \rho_i \, d\rho
              = \frac{B(s+z+e_i)}{B(s+z)}
              = \frac{s(i)+z(i)}{|s|+|z|} \;. \]

Notation: $e_i$ is used above to mean the vector consisting of a one in the $i$th position and a zero elsewhere, so that $(s+z+e_i)(j) = s(j)+z(j)+\delta_{ij}$, where the Kronecker delta $\delta_{ij}$ is one if $i=j$ and zero otherwise.
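The collapse of the Beta-function ratio to the pseudocount formula follows from $\Gamma(x+1) = x\,\Gamma(x)$; the sketch below (stdlib Python, illustrative counts and parameters) checks the identity numerically and confirms the estimates form a distribution.

```python
from math import gamma, prod

def beta_vec(z):
    """B(z) = prod_j Gamma(z(j)) / Gamma(|z|)."""
    return prod(gamma(zj) for zj in z) / gamma(sum(z))

def prob_aa(i, s, z):
    """P(i | s) = B(s + z + e_i) / B(s + z), where e_i is the unit
    vector with a one in position i."""
    sz = [sj + zj for sj, zj in zip(s, z)]
    szi = [v + (1.0 if j == i else 0.0) for j, v in enumerate(sz)]
    return beta_vec(szi) / beta_vec(sz)

s = [3.0, 0.0, 1.0, 2.0]     # illustrative observed counts
z = [0.5, 0.5, 1.0, 2.0]     # illustrative pseudocounts

# The Beta-function ratio equals the pseudocount estimate
# (s(i) + z(i)) / (|s| + |z|) for every position i.
for i in range(len(s)):
    pseudo = (s[i] + z[i]) / (sum(s) + sum(z))
    assert abs(prob_aa(i, s, z) - pseudo) < 1e-12

# The estimates sum to one, as a distribution must.
assert abs(sum(prob_aa(i, s, z) for i in range(len(s))) - 1.0) < 1e-12
```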

This rather involved computation finally ends up with the pseudocount method for estimating the probability of an amino acid given a sample of amino acids. The regularizer parameters $z$ can be interpreted as the parameters of an assumed Dirichlet distribution for the prior $P(\rho)$. Previous work with pseudocounts has relied heavily on this Bayesian interpretation of the parameters, going so far as to assign each $z(i)$ the background probability of amino acid $i$, which does indeed provide the optimal estimates for $|s|=0$, but which we have seen in Section 3.2 is not the best setting of the parameters for $|s|>0$.

The posterior distribution of $\rho$ after seeing a sample $s$ is $P(\rho \mid s) = P(s \mid \rho)\, P(\rho) / P(s) \propto P(s \mid \rho)\, P(\rho)$. As we can see from the above computations, this posterior distribution is again a Dirichlet distribution, with parameters $s(j)+z(j)$ instead of the prior distribution's parameters $z(j)$. This interpretation of $s(j)+z(j)$ as the parameters of the posterior distribution is what inspired naming them the posterior counts. The scaling of $z$ does matter for this interpretation, and so not all the posterior counts produced by regularizers can be automatically interpreted as Dirichlet posterior distributions on $\rho$.
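The conjugacy can be checked directly: the product likelihood times prior should be proportional to a Dirichlet density with parameters $s(j)+z(j)$, so their ratio should be the same constant everywhere on the simplex. The sketch below verifies this at a few illustrative points of a three-letter alphabet (all numbers are assumptions for the example).

```python
from math import gamma, prod

s = [2.0, 1.0, 3.0]          # illustrative observed counts
z = [0.5, 1.5, 1.0]          # illustrative prior parameters

def dirichlet_unnorm(rho, a):
    """Unnormalized Dirichlet density: prod_j rho_j^(a(j)-1)."""
    return prod(r ** (aj - 1) for r, aj in zip(rho, a))

def likelihood(rho, s):
    """P(s | rho) in its Gamma-function form."""
    coef = gamma(sum(s) + 1) / prod(gamma(sj + 1) for sj in s)
    return coef * prod(r ** sj for r, sj in zip(rho, s))

posterior_params = [sj + zj for sj, zj in zip(s, z)]
ratios = []
for a, b in [(0.2, 0.3), (0.1, 0.6), (0.4, 0.4), (0.25, 0.5)]:
    rho = [a, b, 1.0 - a - b]
    post = likelihood(rho, s) * dirichlet_unnorm(rho, z)
    ratios.append(post / dirichlet_unnorm(rho, posterior_params))

# Likelihood * prior is proportional to Dirichlet(s + z): the ratio is
# the same constant at every point on the simplex.
assert max(ratios) - min(ratios) < 1e-12 * max(ratios)
```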

We can extend the Bayesian analysis to compute the posterior distribution of $\rho$ given that we have seen several independent samples: $P(\rho \mid s_1, \ldots, s_n)$. The computation is fairly straightforward. First we apply Bayes' rule:

\[ P(\rho \mid s_1, \ldots, s_n)
   = \frac{P(s_1, \ldots, s_n \mid \rho)\, P(\rho)}{P(s_1, \ldots, s_n)}
   = \frac{P(\rho) \prod_{t=1}^n P(s_t \mid \rho)}{P(s_1, \ldots, s_n)} \;. \]

Repeating the mathematics of the single-sample case for each sample would be tedious, but we can take a shortcut. Since the posterior distribution after seeing a sample is again a Dirichlet distribution, we can treat it as the prior distribution when adding the next sample. Using this trick, we can see that the final posterior distribution after seeing all $n$ samples is a Dirichlet distribution with parameters $z(j) + \sum_{t=1}^n s_t(j)$. In other words, we get the same result from observing $n$ independent samples as we would get from adding all the samples together and using the resulting counts as a single sample.
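The shortcut can be sketched in a few lines: folding the samples into the Dirichlet parameters one at a time gives exactly the same posterior parameters as pooling all the counts first (the prior and samples below are illustrative).

```python
prior = [1.0, 2.0, 0.5]                        # illustrative z(j)
samples = [[3, 0, 1], [0, 2, 2], [1, 1, 0]]    # illustrative counts s_t(j)

# Sequential updating: the posterior after each sample becomes the
# prior for the next sample.
params = list(prior)
for s in samples:
    params = [p + c for p, c in zip(params, s)]

# Pooled updating: add all the samples together and apply them once.
pooled = [zj + sum(s[j] for s in samples) for j, zj in enumerate(prior)]

assert all(abs(a - b) < 1e-12 for a, b in zip(params, pooled))
```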


Rey Rivera
Thu Aug 1 17:59:45 PDT 1996