Currently, most approaches to the computational analysis of gene expression data attempt to learn functionally significant classifications of genes in an unsupervised fashion. A learning method is considered unsupervised if it learns in the absence of a teacher signal, that is, without being told in advance which genes belong together.
Support vector machines (SVMs) and other supervised learning techniques, such as Parzen windows, Fisher's linear discriminant, and decision tree learners, use a training set to specify in advance which data should cluster together.
As applied to gene expression data, an SVM would begin with a set of genes that share a common function, for example, genes coding for ribosomal proteins or genes coding for components of the proteasome. In addition, a separate set of genes known not to be members of the functional class is specified. The two sets are combined into a set of training examples in which genes in the functional class are labeled positive and genes known not to be in the class are labeled negative. Such a training set can easily be assembled from the literature and from database sources.
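The labeling step can be sketched in a few lines of Python. This is a minimal illustration, not the code used for these tests: the function name `build_training_set`, the gene names, and the expression values are all hypothetical, and each gene's expression data is assumed to be stored as one list of measurements (one value per microarray experiment).

```python
from typing import Dict, List, Sequence, Set, Tuple


def build_training_set(
    profiles: Dict[str, Sequence[float]],
    members: Set[str],
    non_members: Set[str],
) -> Tuple[List[str], List[List[float]], List[int]]:
    """Pair each labeled gene's expression profile with a +1/-1 label.

    +1 marks a gene known to be in the functional class, -1 a gene known
    not to be. Genes without an expression profile are skipped.
    """
    conflicts = members & non_members
    if conflicts:
        raise ValueError(f"genes labeled both ways: {sorted(conflicts)}")
    names: List[str] = []
    X: List[List[float]] = []
    y: List[int] = []
    for name in sorted(members | non_members):
        if name not in profiles:
            continue  # no expression data for this gene
        names.append(name)
        X.append(list(profiles[name]))
        y.append(1 if name in members else -1)
    return names, X, y


# Hypothetical gene names and made-up expression values, for illustration only.
profiles = {
    "gene_a": [1.2, 0.9, 1.1],
    "gene_b": [1.0, 1.1, 0.8],
    "gene_c": [-0.5, 0.3, -1.0],
    "gene_d": [0.1, -0.2, 0.0],
}
names, X, y = build_training_set(
    profiles, {"gene_a", "gene_b"}, {"gene_c", "gene_d", "gene_e"}
)
print(names, y)  # gene_e is skipped because it has no expression profile
```

Rejecting genes that appear in both sets catches labeling mistakes before any training happens; genes that lack expression data are simply left out.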
Using this training set, an SVM would learn to discriminate between the members and non-members of a given functional class based on expression data. Having learned the expression features of the class, the SVM could recognize new genes as members or non-members of the class based on their expression data. The SVM could also be reapplied to the training examples to identify outliers, that is, genes that may have been assigned to the wrong class in the training set. Thus, an SVM would use the biological information in the investigator's training set to determine which expression features are characteristic of a given functional group, and use this information to decide whether any given gene is likely to be a member of the group.
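The train, classify, and recheck loop described above can be sketched as follows. This is an illustrative stand-in, not the implementation used in these tests: it fits a plain linear soft-margin SVM by dual coordinate descent, with the bias folded in as an extra constant feature, and the function names, the cost parameter `C`, the epoch count, and the made-up profiles are all assumptions for the example. Flagging training genes whose predicted label disagrees with their given label is the simplest form of the outlier check; a held-out scheme such as cross-validation, which classifies each gene with a model that never saw it, is a more careful variant.

```python
from typing import List, Sequence


def train_linear_svm(
    X: Sequence[Sequence[float]], y: Sequence[int], C: float = 1.0, epochs: int = 1000
) -> List[float]:
    """Soft-margin linear SVM fit by dual coordinate descent (illustrative).

    Minimizes 0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * <w, x_i>). A constant
    1.0 feature is appended to every profile, so the last weight acts as a
    (regularized) bias term.
    """
    Xa = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xa[0])
    alpha = [0.0] * len(Xa)  # one dual variable per training gene, kept in [0, C]
    q = [sum(v * v for v in x) for x in Xa]  # diagonal of the Gram matrix
    for _ in range(epochs):
        for i, (xi, yi) in enumerate(zip(Xa, y)):
            # Partial derivative of the dual objective with respect to alpha_i.
            grad = yi * sum(wj * xj for wj, xj in zip(w, xi)) - 1.0
            new_alpha = min(max(alpha[i] - grad / q[i], 0.0), C)
            delta = new_alpha - alpha[i]
            if delta:
                alpha[i] = new_alpha
                # Keep w = sum_i alpha_i * y_i * x_i in sync with alpha.
                w = [wj + delta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 (member) or -1 (non-member) from the sign of the SVM score."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score > 0 else -1


# Hypothetical, made-up profiles: m* are class members, n* are non-members,
# and "suspect" is labeled a member but looks like the non-members.
names = ["m1", "m2", "m3", "m4", "n1", "n2", "n3", "n4", "n5", "suspect"]
X = [
    [2.0, 1.8, 2.2], [1.8, 2.2, 2.0], [2.2, 2.0, 1.8], [2.0, 2.0, 2.0],
    [-1.0, -0.8, -1.2], [-0.8, -1.2, -1.0], [-1.2, -1.0, -0.8],
    [-0.9, -1.1, -1.0], [-1.1, -0.9, -1.0],
    [-1.0, -1.0, -1.0],
]
y = [1, 1, 1, 1, -1, -1, -1, -1, -1, 1]
w = train_linear_svm(X, y)

# Classify genes that were not in the training set.
new_genes = {"new_a": [2.1, 1.9, 2.0], "new_b": [-0.9, -1.0, -1.1]}
calls = {name: predict(w, x) for name, x in new_genes.items()}

# Reapply the model to its own training set; disagreements are candidate
# mislabeled genes worth rechecking against the literature.
outliers = [n for n, x, label in zip(names, X, y) if predict(w, x) != label]
print(calls, outliers)
```

In this toy data, "suspect" has exactly the average profile of the five non-members, so any linear model that places all non-members on the negative side must place it there too, and the recheck flags it as a possible labeling error.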
A more mathematically complete explanation of how SVMs were implemented for these tests, and descriptions of the other supervised learning methods, are provided separately.
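For a book-length introduction to SVMs, see N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods (Cambridge University Press, 2000).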
SVMs are now being employed in a wide variety of applications.