Methods

The human/mouse alignments produced by blastz were analyzed by counting the number of occurrences of aligned human and mouse base pairs in categorized sites. The number of substitutions per site from the ancestral DNA was estimated using the REV model. Sites were categorized by site type, GC content, and whether CpGs were included. Results were generated for all possible combinations of these attributes.
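The counting step can be illustrated with a minimal sketch. It assumes the alignment is available as two equal-length strings with '-' for gaps, and that each column already carries a category label; the function name, the input layout, and the label format are hypothetical and are not taken from the original pipeline. The REV estimation itself is not shown; tables like these would be its input.

```python
from collections import Counter

BASES = "ACGT"


def count_aligned_pairs(human_seq, mouse_seq, categories):
    """Tally (human base, mouse base) pairs separately for each category.

    human_seq, mouse_seq: equal-length aligned strings, '-' marks a gap.
    categories: one hashable label per alignment column (hypothetical),
    for example a (site type, GC range, CpG flag) tuple.
    """
    if not (len(human_seq) == len(mouse_seq) == len(categories)):
        raise ValueError("alignment rows and category labels must have equal length")
    counts = {}
    for h, m, cat in zip(human_seq.upper(), mouse_seq.upper(), categories):
        # Only columns where both species have an unambiguous base are counted;
        # gaps and symbols such as N are skipped.
        if h not in BASES or m not in BASES:
            continue
        counts.setdefault(cat, Counter())[(h, m)] += 1
    return counts
```

Using a tuple of attributes as the label yields one table of pair counts per combination of attributes, which matches the "all possible combinations" layout described above.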

The following types of sites were considered:

GC content was computed from the aligned bases in non-overlapping 100 kb windows of the human genome, using either the human or the mouse DNA. The windows were then partitioned into three ranges chosen so that the human GC-content windows were distributed evenly among them. This resulted in the following ranges:

Sites were identified as CpG if they contained either base of a CG pair on the positive strand. Because CG is its own reverse complement, this also includes CpGs on the negative strand, read in the opposite direction.
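The GC windowing and the CpG marking described above can be sketched as follows. This is an illustration under stated assumptions, not the original implementation: the input is assumed to be a plain sequence string that already contains only the aligned bases, "uniform distribution" is read as an equal number of windows per range, and all function names are hypothetical.

```python
def window_gc_fractions(seq, window=100_000):
    """GC fraction of each non-overlapping window, counting only A/C/G/T."""
    fractions = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        gc = chunk.count("G") + chunk.count("C")
        total = gc + chunk.count("A") + chunk.count("T")
        # Windows with no unambiguous bases get None instead of a fraction.
        fractions.append(gc / total if total else None)
    return fractions


def tertile_cutoffs(fractions):
    """Two cutoffs that split the windows into three equally populated ranges.

    Assumed convention: low is f < c1, middle is c1 <= f < c2, high is f >= c2.
    """
    values = sorted(f for f in fractions if f is not None)
    n = len(values)
    if n < 3:
        raise ValueError("need at least three windows with a GC fraction")
    return values[n // 3], values[(2 * n) // 3]


def cpg_mask(seq):
    """True at every position that is either base of a CG dinucleotide."""
    seq = seq.upper()
    mask = [False] * len(seq)
    for i in range(len(seq) - 1):
        if seq[i] == "C" and seq[i + 1] == "G":
            # Scanning the positive strand is enough: a CG on the negative
            # strand appears as a CG at the same positions on the positive one.
            mask[i] = mask[i + 1] = True
    return mask
```

For example, `cpg_mask("ACGT")` marks the second and third positions, and `window_gc_fractions(seq, window=4)` can be used to try the windowing on a short test string.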