Confidence Levels for GPCR Subfamily Classification Scores

These statistics were computed by testing classifiers that were trained on only half of our full set of GPCR sequence data. The remaining sequences were held out during training and used to estimate how the classifiers perform on previously unseen sequence data. [Find all training data here]. Please note that the web server classifiers you are using were trained on all of the sequence data and should be more reliable than the results reported below suggest. (However, we have no way of testing this.)

Each of 93 subfamily classifiers was tested with a set of 710 sequences, containing:


How confident should I be if my sequence is accepted or rejected by a particular classifier?

Some general guidelines:

Look at the score which the subfamily classifier gave your sequence.

Examples:
 0.9418235   histamine   histamine receptor classifier predicts your sequence is histamine receptor.
-0.4217733   olfactory   olfactory receptor classifier predicts your sequence is not olfactory receptor.
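
The two examples above suggest a simple sign rule: a positive score means the classifier accepts the sequence, and a negative score means it rejects it. A minimal sketch of that rule follows. The function name interpret_score is hypothetical, and treating a score of exactly 0 as a rejection is an assumption made here, not something the server documents.

```python
def interpret_score(score: float, subfamily: str) -> str:
    """Turn a subfamily classifier score into a plain-language prediction.

    Assumption: positive scores are acceptances and negative scores are
    rejections, as in the two examples above. A score of exactly 0 is
    treated as a rejection; that choice is arbitrary.
    """
    if score > 0:
        return (f"{subfamily} receptor classifier predicts "
                f"your sequence is {subfamily} receptor.")
    return (f"{subfamily} receptor classifier predicts "
            f"your sequence is not {subfamily} receptor.")


print(interpret_score(0.9418235, "histamine"))
print(interpret_score(-0.4217733, "olfactory"))
```

The sign only gives the prediction. How far the score lies from zero is what the confidence guidance below is about.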

You can evaluate the confidence of your classification by looking at a plot of percent misclassified vs. discriminant score from our tests.
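
As a rough illustration of how a percent-misclassified figure could be derived from held-out test results, here is a minimal sketch. It is not necessarily how our plot was produced. It assumes that for a positive score the relevant error is a false positive among test sequences scoring at least that high, and for a negative score a false negative among test sequences scoring at least that low. The function name percent_misclassified and the toy_results values are invented for illustration and are not our test data.

```python
from typing import List, Optional, Tuple


def percent_misclassified(results: List[Tuple[float, bool]],
                          threshold: float) -> Optional[float]:
    """Percent of test sequences at or beyond `threshold` that were wrong.

    `results` holds (score, is_member) pairs from a held-out test set.
    For threshold >= 0, errors are non-members scoring >= threshold
    (false positives). For threshold < 0, errors are true members
    scoring <= threshold (false negatives). Returns None when no test
    sequence lies at or beyond the threshold.
    """
    if threshold >= 0:
        selected = [is_member for score, is_member in results
                    if score >= threshold]
        errors = sum(1 for is_member in selected if not is_member)
    else:
        selected = [is_member for score, is_member in results
                    if score <= threshold]
        errors = sum(1 for is_member in selected if is_member)
    if not selected:
        return None
    return 100.0 * errors / len(selected)


# Invented toy data: (score, whether the sequence truly belongs to the subfamily).
toy_results = [(1.2, True), (0.9, True), (0.3, False), (0.1, True),
               (-0.2, True), (-0.4, False), (-1.1, False)]

print(percent_misclassified(toy_results, 0.0))   # errors among all acceptances
print(percent_misclassified(toy_results, 0.5))   # errors among high-scoring acceptances
```

In this toy data the error rate drops as the threshold moves away from zero, which is the pattern to look for when reading the plot for a real classifier.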

Analysis of false positive errors shown in the plot.

Training a good classifier with only a few examples is difficult, and the classifiers in these tests were trained on only half of the available sequences. It is not surprising that some classifiers built for small subfamilies do not test well. However, the web classifiers you are using were built with approximately twice as much data as the classifiers we tested, and are likely to perform better for you than they did in our tests.


Email rachelk@cse.ucsc.edu with problems and questions.
UCSC Computational Biology Group


