Why are kernel methods with RBF kernels effective for handwritten digit classification?

  • The question emerged while reading Ch. 3 of http://www.gaussianprocess.org/gpml/ . At the end of that chapter, the authors give results for handwritten digit classification (16x16 greyscale images); the features are the 256 pixel intensities plus a bias term. I was surprised that in such a high-dimensional problem, 'metric' methods like Gaussian processes with a squared-exponential kernel, or an SVM with the same kernel, work quite well without any preceding dimensionality reduction. I have also heard that SVMs do well on (essentially bag-of-words) text classification. Why don't they suffer from the curse of dimensionality?

  • Answer:

    To put this interesting discussion on concrete ground, one can modify the toy script from http://scikit-learn.org/stable/auto_examples/plot_digits_classification.html to get some simple statistics (see the sketch below). For example, look at the last test vector and how far it is from the 519 support vectors (the space has 8*8 = 64 dimensions). The distances are fairly large, lying in the interval [26.74, 64.88]. However, gamma is small (0.001), i.e. the variance is large, so the "effective distance" is small (and that, I think, is the key!). As a consequence, the scalar product in feature space, i.e. exp(-gamma * ||test - support||^2), with each of the 519 support vectors lies in the narrow interval [0.937, 0.974].

ostrodmit at Cross Validated
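The statistics described in the answer above can be reproduced with a short script along these lines. This is a minimal sketch, not the poster's exact modification; the SVC parameters and the half/half train-test split are assumptions taken from the linked scikit-learn example, and the exact numbers (number of support vectors, distance ranges) may differ from those quoted.

```python
# Sketch: distances from one test digit to the SVM's support vectors,
# and the corresponding RBF kernel values, on the scikit-learn digits
# data (8x8 images, 64 features).
import numpy as np
from sklearn import datasets, svm

digits = datasets.load_digits()
n = len(digits.images)
X = digits.images.reshape((n, -1))   # flatten 8x8 images to 64-d vectors
y = digits.target

# Same setup as the tutorial script (assumed): train on the first half
clf = svm.SVC(gamma=0.001)
clf.fit(X[: n // 2], y[: n // 2])

x_test = X[-1]                       # the last test vector
sv = clf.support_vectors_

# Euclidean distances from the test point to every support vector
dists = np.linalg.norm(sv - x_test, axis=1)
print("number of support vectors:", len(sv))
print("distance range: [%.2f, %.2f]" % (dists.min(), dists.max()))

# RBF kernel values exp(-gamma * ||x_test - sv||^2): despite the large
# distances, the small gamma keeps all kernel values close to 1
k = np.exp(-clf.gamma * dists ** 2)
print("kernel value range: [%.3f, %.3f]" % (k.min(), k.max()))
```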

Other answers

From a machine learning perspective, 257 dimensions is far from high-dimensional. A wide range of problems, including text classification, are solved in thousands of dimensions; problems that are currently considered high-dimensional have millions of dimensions. Both Gaussian processes and SVMs belong to a larger class of algorithms called kernel methods (http://en.wikipedia.org/wiki/Kernel_methods). The combination of the kernel trick (http://en.wikipedia.org/wiki/Kernel_trick), the representer theorem (http://www.cs.berkeley.edu/~bartlett/courses/281b-sp08/8.pdf) and proper use of regularization (http://en.wikipedia.org/wiki/Regularization_%28mathematics%29) makes these methods robust against the curse of dimensionality. Kernel methods always operate on distances (or inner products) between points, regardless of the dimensionality of the space in which those distances are defined. That dimensionality can even be infinite, as with the Gaussian kernel (see e.g. slide 11 of http://www.csie.ntu.edu.tw/~cjlin/talks/kuleuven_svm.pdf). From the representer theorem we know that the solution of any kernel method can be written in terms of the training instances, and is therefore limited in dimensionality and complexity.

Marc Claesen
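The representer-theorem point above can be made concrete: the decision function of a fitted RBF SVM is just a weighted sum of kernel evaluations against its support vectors, no matter how high- (or infinite-) dimensional the implicit feature space is. A minimal sketch with scikit-learn, assuming an arbitrary binary sub-problem of the digits data (the class pair and gamma value are illustrative choices, not from the answer):

```python
# Sketch: verify that the SVM decision function equals the kernel
# expansion over support vectors, f(x) = sum_i alpha_i * k(sv_i, x) + b.
import numpy as np
from sklearn import datasets, svm

digits = datasets.load_digits()
X = digits.images.reshape((len(digits.images), -1))
y = digits.target

# Restrict to two classes so there is a single decision function
mask = (y == 3) | (y == 8)
X, y = X[mask], y[mask]

gamma = 0.001
clf = svm.SVC(kernel="rbf", gamma=gamma).fit(X, y)

x = X[0]
# Manual kernel expansion over the support vectors
k = np.exp(-gamma * np.sum((clf.support_vectors_ - x) ** 2, axis=1))
f_manual = clf.dual_coef_[0] @ k + clf.intercept_[0]

# The two values agree: the solution lives in the span of the instances
print(f_manual, clf.decision_function([x])[0])
```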
