
Why is the squared dot product a kernel function for performing circular classification in support vector machines?

  • I'm trying to get an overall picture of support vector machines, and came across kernel functions K(x1, x2) in the descriptions. Can someone give a big-picture explanation of how the kernel function quantifies similarity in different scenarios? I understand that plain x1.x2 is akin to cosine similarity, but how is it a linear kernel? Similarly, (x1.x2)^2 can perform nonlinear classification of the form below, but why? I don't see it anywhere close to any equation of a circle.
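One way to see where the circle hides in (x1.x2)^2 is that this kernel equals an ordinary dot product after an explicit feature map φ(x) = (x1², √2·x1·x2, x2²). The sketch below (variable names are my own, not from the question) checks that identity numerically; a hyperplane in the φ-space with weights (1, 0, 1) corresponds exactly to x1² + x2² = r², i.e. a circle, back in the original plane.

```python
import numpy as np

def phi(x):
    # Explicit feature map for the degree-2 homogeneous polynomial kernel:
    # phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

def poly_kernel(x, y):
    # K(x, y) = (x . y)^2 -- computed without ever forming phi
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

# Both routes give the same inner product:
print(poly_kernel(x, y))        # (1*3 + 2*(-1))^2 = 1.0
print(np.dot(phi(x), phi(y)))   # 1.0 as well
```

So a linear decision boundary w·φ(x) = b in the three-dimensional feature space is a quadratic curve in the two-dimensional input space, and circles are the special case w = (1, 0, 1).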

  • Answer:

    Imagine that you add another dimension; let's call it z. On this z-axis your data is represented by some kernel function. With a suitable kernel function, which in this case could be a Gaussian kernel, this new surface would go to 1 on the z-axis as (x1, x2) -> (0, 0), and would go to zero as (x1, x2) -> (inf, inf). A circular boundary in the original plane then becomes a flat horizontal cut through that surface, i.e. a linear separator in the lifted space. The nonlinearity comes from the fact that you transform your data in a nonlinear fashion from, in your case, a two-dimensional space to a three-dimensional space. The same idea covers (x1.x2)^2: it is the inner product after the explicit feature map (x1^2, sqrt(2)*x1*x2, x2^2), and a hyperplane in that feature space maps back to a quadratic curve, such as a circle, in the original plane. Here's a nice blog post about kernel functions: http://crsouza.blogspot.dk/2010/03/kernel-functions-for-machine-learning.html And here's a lecture on SVMs: http://www.cs.ucf.edu/courses/cap6412/fall2009/papers/Berwick2003.pdf
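The z-axis lift the answer describes can be sketched in a few lines. This is a minimal illustration, not the full SVM machinery: I pick z = exp(-||x||²), a Gaussian bump centred at the origin (the centre and width are my own illustrative choices), and show that a circle in 2-D becomes a horizontal-plane cut in 3-D.

```python
import numpy as np

def lift(x):
    # The extra z-coordinate: a Gaussian bump that is ~1 near the
    # origin and decays toward 0 far from it, as in the answer.
    return np.exp(-np.dot(x, x))

inner = np.array([0.1, 0.2])   # point near the origin
outer = np.array([2.0, 2.0])   # point far from the origin

print(lift(inner))  # close to 1
print(lift(outer))  # close to 0

# A circular boundary ||x|| = r in 2-D becomes the flat cut
# z = exp(-r^2) in 3-D: points inside the circle lie above the
# plane, points outside lie below it -- a linear separator.
r = 1.0
threshold = np.exp(-r**2)
print(lift(inner) > threshold)   # True  (inside the circle)
print(lift(outer) > threshold)   # False (outside)
```

The key point is that nothing nonlinear happens in the lifted space; all the curvature of the decision boundary comes from the lift itself.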

Rógvi Dávid Arge at Quora

