

Support Vector Machine Classification of
Microarray Gene Expression Data


Michael P. S. Brown$^\ddag $
William Noble Grundy$^\ddag $
David Lin$^\ddag $
Nello Cristianini$^\S$
Charles Sugnet$^\P$
Manuel Ares, Jr.$^\P$
David Haussler$^\ddag $

$^\ddag $Department of Computer Science
University of California, Santa Cruz
Santa Cruz, CA 95065

$^\P$Center for Molecular Biology of RNA
Department of Biology
University of California, Santa Cruz
Santa Cruz, CA 95065

$^\S$Department of Engineering Mathematics
University of Bristol
Bristol, UK

June 12, 1999


We introduce a new method of functionally classifying genes using gene expression data from DNA microarray hybridization experiments. The method is based on the theory of support vector machines (SVMs). We describe SVMs that use different similarity metrics including a simple dot product of gene expression vectors, polynomial versions of the dot product, and a radial basis function. Compared to the other SVM similarity metrics, the radial basis function SVM appears to provide superior performance in identifying sets of genes with a common function using expression data. In addition, SVM performance is compared to four standard machine learning algorithms. SVMs have many features that make them attractive for gene expression analysis, including their flexibility in choosing a similarity function, sparseness of solution when dealing with large data sets, the ability to handle large feature spaces, and the ability to identify outliers.
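As a minimal sketch of the three similarity metrics named above, the functions below compute a dot product, a polynomial version of the dot product, and a radial basis function for a pair of expression vectors. The function names, the `+ 1.0` offset in the polynomial kernel, the `degree` and `width` parameters, and the example vectors are all illustrative assumptions; the abstract does not specify them.

```python
import math

def dot_kernel(x, y):
    """Simple dot product of two gene expression vectors."""
    return sum(a * b for a, b in zip(x, y))

def poly_kernel(x, y, degree=2):
    """Polynomial version of the dot product.
    The +1.0 offset and the default degree are assumptions."""
    return (dot_kernel(x, y) + 1.0) ** degree

def rbf_kernel(x, y, width=1.0):
    """Radial basis function (Gaussian) similarity.
    The width parameterization is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * width ** 2))

# Hypothetical expression vectors for two genes over four experiments.
gene_a = [0.5, -1.2, 0.3, 2.0]
gene_b = [0.4, -1.0, 0.1, 1.8]

print(dot_kernel(gene_a, gene_b))
print(poly_kernel(gene_a, gene_b, degree=3))
print(rbf_kernel(gene_a, gene_b, width=1.0))
```

Any of these functions could in principle be supplied to an SVM trainer as the kernel; only the similarity computation changes, which is the flexibility referred to above.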

Keywords: Gene Microarrays, Gene Expression, Support Vector Machines, Pattern Classification, Functional Gene Annotation

Running head: SVM Classification of Gene Expression Data
