next up previous
Next: Results and Discussion Up: Hidden Markov models for Previous: Additional features

Implementation and Performance

The inner loop of the algorithm is an tex2html_wrap_inline3779 dynamic programming algorithm that calculates the forward and backward values for each of the three states at a given pair of model and sequence indices. The serial implementation of this algorithm is straightforward. For Cray vectorization, the two inner loops were modified to calculate counter-diagonals in order (i.e., tex2html_wrap_inline3781 ) to make effective use of the vector pipelines.

The MasPar parallel processor features a 2-dimensional mesh of processing elements (PEs) and a global router [Nickolls, 1990]. For large numbers of sequences, the best algorithm mapping can be to perform a complete model-sequence dynamic programming in each PE; unfortunately, the 64 KBytes of local memory per PE is too small for such coarse-grain parallelism. Instead, the array is treated as collection of linear arrays of PEs, where each linear array contains one model. Thus, if the models are of length 100, forty sequence-against-model calculations are performed at a time using 4000 of the 4096 processing elements.

The linear arrays are used as follows. During the forward part of the dynamic programming, the (i,j) value is computed during step i+j in i-th PE of one of the linear arrays. This calculation depends on values from (i-1,j), (i,j-1), and (i-1,j-1), all of which have already been calculated in either PE tex2html_wrap_inline3795 or PE tex2html_wrap_inline3797 . The characters of the sequence are also shifted through the linear array, one PE at a time, to ensure that character j is in PE tex2html_wrap_inline3795 at time step i+j. During the backward calculation, this process is reversed, calculating each diagonal at time tex2html_wrap_inline3805 . After all sequences have been compared to the model, the frequencies are combined using log-time reduction, and uploaded to the host computer for high-level processing such as surgery and noise injection.

Performance can be rated in terms of dynamic programming cell updates per second, or CUPS. Performing both the forward and backward calculations at a single dynamic programming cell (one character from one sequence against one model node) is a single cell update. The total number of cell updates, which depends greatly on parameter settings, is the sum of model lengths over all reestimation cycles multiplied by the total number of characters in all the sequences. For the experiments described in the next section, a typical training run on 50 globins of average length 146 and a model length of 145 with 40 reestimatation cycles, required 50 seconds on our 4096-PE MasPar MP-2 or about 7 minutes on a DEC Alpha 3000/400 (longer runs runs heighten the difference between the two machines to factors of 10-15). Experiments based on 400 training sequences and an earlier version of the code are summarized in Table 1.

   table3042
Table 1: HMM training on various machines.


next up previous
Next: Results and Discussion Up: Hidden Markov models for Previous: Additional features

Rey Rivera
Thu Aug 29 15:28:54 PDT 1996