Each data point produced by a DNA microarray hybridization experiment
represents the ratio of expression levels of a particular gene under
two different experimental conditions. The
result, from an experiment with n genes on a single chip, is a
series of n expression-level ratios. Typically, the numerator of
each ratio is the expression level of the gene in the varying
condition of interest, whereas the denominator is the expression level
of the gene in some reference condition. The data from a series of
m such experiments may be represented as a gene expression matrix,
in which each of the n rows consists of an m-element expression
vector for a single gene.
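
The matrix layout described above can be illustrated with a minimal sketch. It assumes that each entry is stored as the logarithm of the expression ratio, as the phrase "log expression ratio" in the figure caption suggests. The function name `log_ratio_matrix` and all input values are hypothetical and are not taken from the actual data set.

```python
import math


def log_ratio_matrix(measured, reference):
    """Build an n-by-m matrix of log expression ratios.

    measured[i][j] and reference[i][j] are hypothetical positive expression
    levels for gene i in experiment j (condition of interest and reference).
    """
    matrix = []
    for gene_levels, gene_refs in zip(measured, reference):
        # One m-element expression vector per gene.
        row = [math.log(e / r) for e, r in zip(gene_levels, gene_refs)]
        matrix.append(row)
    return matrix


# Toy input: n = 2 genes, m = 3 experiments (made-up numbers).
measured = [[2.0, 1.0, 0.5], [4.0, 4.0, 1.0]]
reference = [[1.0, 1.0, 1.0], [2.0, 4.0, 2.0]]
matrix = log_ratio_matrix(measured, reference)
```

Under this assumption, an entry is positive when the measured level exceeds the reference level, zero when the two are equal, and negative otherwise.
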
The expression measurement, taken as the logarithm of this ratio, is positive if the gene is induced
(turned up) with respect to the reference state and negative if it is
repressed (turned down). Experiments are carried out using a set of 79-element gene
expression vectors for 2467 annotated yeast genes. The data were collected at
various time points during the diauxic shift, the
mitotic cell division cycle, sporulation, and temperature and reducing shocks,
and are available via the Stanford web site.
The class definitions that we used to train SVMs are taken from the
MIPS Yeast Genome Database (MYGD) and comprise six functional classes: tricarboxylic acid
cycle (TCA), respiration, cytoplasmic ribosomes, proteasome, histones
and helix-turn-helix proteins. The MYGD class definitions come from
biochemical and genetic studies of gene function, while the microarray
expression data measure mRNA levels of genes. Many classes in MYGD,
especially structural classes such as protein kinases, will be
unlearnable from expression data by any classifier. The first five
classes were selected because they represent categories of genes that
are expected, on biological grounds, to exhibit similar expression
profiles. Furthermore, Eisen et al. suggested that the mRNA
expression vectors for these classes cluster well using hierarchical
clustering. The sixth class, the
helix-turn-helix proteins, is included as a control group. Since
there is no reason to believe that the members of this class are
similarly regulated, we did not expect any classifier to learn to
recognize members of this class based upon mRNA expression
measurements.
Inspection of the raw expression data shows that genes in the TCA
and respiration classes are regulated very similarly, as shown in the figure below.
Similarity between the average expression profiles of
the tricarboxylic-acid pathway and respiration chain complexes. Each
series represents the average log expression ratio for all genes in
the given family plotted as a function of DNA microarray experiment.
Ticks along the X-axis represent the beginnings of experimental
series.
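
The per-class averaging described in the caption can be sketched as follows. This is an illustration only: the helper `average_profile`, the gene names, and the numeric values are hypothetical placeholders, not real MIPS class members or measurements.

```python
def average_profile(matrix, gene_names, class_members):
    """Return the mean log expression ratio per experiment for one class.

    matrix[i] is the expression vector of gene_names[i]; class_members is a
    set of gene names belonging to the functional class of interest.
    """
    rows = [row for name, row in zip(gene_names, matrix) if name in class_members]
    if not rows:
        raise ValueError("no genes from this class are present in the matrix")
    n_experiments = len(rows[0])
    # Average column by column, i.e. one value per microarray experiment.
    return [sum(row[j] for row in rows) / len(rows) for j in range(n_experiments)]


# Toy input: three genes, three experiments (made-up log ratios).
gene_names = ["geneA", "geneB", "geneC"]
matrix = [
    [1.0, -1.0, 0.5],
    [3.0, -3.0, 1.5],
    [0.0, 2.0, -2.0],
]
toy_class = {"geneA", "geneB"}  # hypothetical class membership
profile = average_profile(matrix, gene_names, toy_class)
```

Plotting one such profile per class against the experiment index would give one series of the kind the caption describes.
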
- The target
gene labels from the MIPS database for six functional classes.
- A web
page for plotting gene expression data used for cross-validation tests
against a plot of the genes in a single MIPS class.
- A web
page for plotting gene expression data used for predictions
against a plot of the genes in a single MIPS class.
- A web
page for plotting gene expression data used for predictions for any set of genes.