Increased regularization does not necessarily decrease the number of support vectors

Although neural networks seem to be getting all the attention in machine learning these days, support vector machines (SVMs) are still quite useful in some situations. For example, if you want a classifier to work on a wearable/implantable device that consumes very little energy, SVMs-on-a-chip are currently probably your best option.

If you are making an extremely efficient SVM though, you will probably need to limit the number of support vectors (SVs) that define the classifier and need to be compared to each observation you want to classify. Intuitively the regularization parameter, C, used to train the SVM should reduce the number of SVs since decreasing it penalizes complex models. However, this is not necessarily the case as illustrated in the plot below from some data I’m working on. You can see that as C decreases, the number of SVs often increases.

The number of support vectors in each trained radial-basis function SVM for various hyperparameter values. The green star indicates the hyperparameter values that produce the best results on validation data.

The simplest way to limit the number of SVs is to limit the number of samples in the training data that can be selected as SVs (e.g., Lin et al., 2003, Wang et al., 2012). There are some savvier methods in the literature that try to approximate multiple SVs with a new single SV (Nguyen et al., 2005) or that directly try reduce the number of SVs in the optimization procedure (reviewed in Keerthi et al., 2006). None of these methods seem to be widely used, but MATLAB code for Wang et al.’s method is publicly available.


Keerthi, S. Sathiya, Olivier Chapelle, and Dennis DeCoste. “Building support vector machines with reduced classifier complexity.” Journal of Machine Learning Research 7, no. Jul (2006): 1493-1515.

Lin, Kuan-Ming, and Chih-Jen Lin. “A study on reduced support vector machines.” IEEE transactions on Neural Networks 14, no. 6 (2003): 1449-1459.

Nguyen, DucDung, and TuBao Ho. “An efficient method for simplifying support vector machines.” In Proceedings of the 22nd international conference on Machine learning, pp. 617-624. ACM, 2005.

Wang, Zhuang, Koby Crammer, and Slobodan Vucetic. “Breaking the curse of kernelization: Budgeted stochastic gradient descent for large-scale svm training.” Journal of Machine Learning Research 13, no. Oct (2012): 3103-3131.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s