Assemblathon 1 was a competition to assess the performance of de novo assemblers using a simulated short-read data set generated from a simulated ~110 megabase diploid genome. Because the data were simulated, we knew the correct assembly, including both haplotypes and their evolutionary history. This allowed us to perform unique analyses of the submitted assemblies using a graph model of the alignment of the assembly, the haplotypes and the bacterial contamination.
The code on this page was used by the UC Santa Cruz analysis team (Dent Earl, Benedict Paten, John St. John, Ngan Nguyen, Mark Diekhans, David Haussler) to assess the assemblies. It can be run to reproduce the Assemblathon 1 assessments, and it can also be applied to novel data sets where the "truth" (or some proxy for the one or two underlying haplotypes) is known.
The Assemblathon 1 analysis was run as a collaborative project with the UC Davis Assemblathon 1 analysis team (Keith Bradnam, Aaron Darling, Joseph Fass, Dawei Lin, Ian Korf).
Earl et al. Assemblathon 1: A competitive assessment of de novo short read assembly methods. Genome Res (2011) vol. 21 (12) pp. 2224-41.