The fly data set comprises 20 real fly genomes in various states of completion, from the nearly complete Drosophila melanogaster dm3 (chromosome sequences) to the fragmentary D. rhopaloa droRho (34,000 contigs).
These genomes are real and therefore have no root genome or burn-in process.
The phylogeny below is a combination of two trees: the phylogeny presented in the modENCODE white paper proposing the sequencing of eight additional fly genomes (genome.gov), courtesy of Artyom Kopp (UC Davis), and the phylogeny used by UCSC for the 15-way insect alignment. The Kopp tree lacked droSim1 and droSec1, which were added by normalizing the branch lengths between the dm3 branches on the two trees. Extraneous species were trimmed using tree_doctor from PHAST. This tree is provided for progressive aligners that need a guide tree, and it will be used in the analysis for StatSigMa-w.
((droGri2:0.183954, droVir3:0.093575):0.000000, (droMoj3:0.110563, ((((droBip:0.034265, droAna3:0.042476):0.121927, (droKik:0.097564, ((droFic:0.109823, (((dm3:0.023047, (droSim1:0.015485, droSec1:0.015184):0.013850):0.016088, (droYak2:0.026909, droEre2:0.029818):0.008929):0.047596, (droEug:0.102473, (droBia:0.069103, droTak:0.060723):0.015855):0.005098):0.010453):0.008044, (droEle:0.062413, droRho:0.051516):0.015405):0.046129):0.018695):0.078585, (droPer1:0.007065, dp4:0.005900):0.185269):0.068212, droWil1:0.259408):0.097093):0.035250);
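As a quick sanity check on the guide tree, the following minimal Python sketch extracts the leaf labels from the Newick string and counts them. It is not part of the original build process; the helper name `leaf_names` is hypothetical, and the regular expression assumes that leaf labels contain only letters, digits, and underscores, which holds for the assembly names used here.

```python
import re

# Guide tree copied verbatim from the text above (Newick format).
NEWICK = (
    "((droGri2:0.183954, droVir3:0.093575):0.000000, (droMoj3:0.110563, "
    "((((droBip:0.034265, droAna3:0.042476):0.121927, (droKik:0.097564, "
    "((droFic:0.109823, (((dm3:0.023047, (droSim1:0.015485, "
    "droSec1:0.015184):0.013850):0.016088, (droYak2:0.026909, "
    "droEre2:0.029818):0.008929):0.047596, (droEug:0.102473, "
    "(droBia:0.069103, droTak:0.060723):0.015855):0.005098):0.010453):0.008044, "
    "(droEle:0.062413, droRho:0.051516):0.015405):0.046129):0.018695):0.078585, "
    "(droPer1:0.007065, dp4:0.005900):0.185269):0.068212, "
    "droWil1:0.259408):0.097093):0.035250);"
)


def leaf_names(newick):
    """Return leaf labels in left-to-right order (hypothetical helper).

    A leaf label follows '(' or ',' and precedes ':'. Branch lengths of
    internal nodes follow ')' instead, so they are never matched.
    """
    return re.findall(r"[(,]\s*([A-Za-z0-9_]+)\s*:", newick)


leaves = leaf_names(NEWICK)
print(len(leaves))  # number of species in the guide tree
print(leaves)
```

The count should agree with the 20 genomes in the data set; a mismatch would suggest that the tree and the sequence set have drifted apart.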
Readme with notes on the build process used to download and create the flies.seq.tar.gz file (downloading it is not necessary, since the data is contained in downloadFlies.sh below; this link is here for the sake of a paper trail): createFlies.txt
Script to download and create the correct directory structure: downloadFlies.sh
An analysis package has the following directory structure:
packagePrimates/
.. README.txt
.. annotations/
.. predictions/
.. regions/
.. sequences/
.. truths/
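A package layout like this can be checked before running an analysis. The sketch below is illustrative only and is not part of the distributed scripts: the function name `missing_entries` is hypothetical, and it assumes that every entry in the listing above sits directly under the package root.

```python
from pathlib import Path

# Entries named in the directory listing above.
EXPECTED_DIRS = ["annotations", "predictions", "regions", "sequences", "truths"]
EXPECTED_FILES = ["README.txt"]


def missing_entries(package_root):
    """Return the expected entries that are absent under package_root."""
    root = Path(package_root)
    missing = [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


# "packagePrimates" is the top-level name shown in the listing; substitute
# the actual package directory. An empty list means the layout is complete.
print(missing_entries("packagePrimates"))
```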
These directories may be populated with the following:
tree drawn using phyfi