The simplest way to handle zero counts is to add a zero-offset
z to all counts, so that the estimated probability of character b
in a given context C is
.
If all the counts in a context are zero (the usual case in
uninteresting sequences), this method assigns a probability of 0.25 (a
cost of 2 bits) to all four possible characters, independent of the
choice of z.
The second most common situation is to have a count of one for one
character and zero for the other characters in the context.
This method assigns a probability of (1+z)/(1+4z) to the character
that was seen, and z/(1+4z) to the other characters.
If the seed is large enough to contain multiple instances of interesting sequences, we can also choose z to minimize the encoding cost of doing adaptive compression on the seed using the model. The minimization is currently done by using Newton's method to set the derivative of the encoding cost to zero, but almost any standard optimization technique should work.