Many linear parametric models can be recast into an equivalent 'dual representation' in which the predictions are also based on linear combinations of a kernel function evaluated at the training data points. As we shall see, for models which are based on a fixed nonlinear feature space mapping $\Phi(x)$, the kernel function is given by the relation

$$k(x, x') = \Phi(x)^{\mathrm{T}} \Phi(x') = \sum_i \Phi_i(x)\, \Phi_i(x').$$

From this definition, we see that the kernel is a symmetric function of its arguments, so that $k(x, x') = k(x', x)$.
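To make this relation concrete, consider the quadratic kernel $k(x, x') = (x^{\mathrm{T}} x')^2$ on two-dimensional inputs, which corresponds to the explicit feature map $\Phi(x) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)$. The following NumPy sketch (an illustration, not part of the text) verifies numerically that the inner product of the features equals the kernel evaluated directly in input space:

```python
import numpy as np

def phi(x):
    """Explicit feature map for the quadratic kernel on 2-D inputs:
    Phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def k(x, xp):
    """Same kernel computed directly in input space: k(x, x') = (x^T x')^2."""
    return np.dot(x, xp) ** 2

x = np.array([1.0, 2.0])
xp = np.array([3.0, -1.0])

# Both routes give the same value: the inner product in feature space
# equals the kernel evaluated on the original inputs.
print(np.dot(phi(x), phi(xp)))  # 1.0
print(k(x, xp))                 # 1.0
```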
The simplest example of a kernel function is obtained by considering the identity mapping for the feature space, so that $\Phi(x) = x$, in which case $k(x, x') = x^{\mathrm{T}} x'$. We shall refer to this as the linear kernel. The concept of a kernel formulated as an inner product in a feature space allows us to build interesting extensions of many well-known algorithms by making use of the kernel trick, also known as kernel substitution. The general idea is that, if we have an algorithm formulated in such a way that the input vector $x$ enters only in the form of scalar products, then we can replace that scalar product with some other choice of kernel.
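As a small illustration of kernel substitution (a sketch under assumed choices, not a prescribed method): any algorithm that needs only distances between points can operate in the feature space induced by a kernel, because the squared distance expands into kernel evaluations alone, $\|\Phi(x) - \Phi(x')\|^2 = k(x, x) - 2k(x, x') + k(x', x')$. Here a Gaussian kernel is swapped in for the scalar product:

```python
import numpy as np

def gaussian_kernel(x, xp, gamma=0.5):
    """Gaussian (RBF) kernel, one possible substitute for x^T x'."""
    return np.exp(-gamma * np.sum((x - xp) ** 2))

def feature_space_sq_distance(x, xp, kernel):
    """Squared distance between Phi(x) and Phi(x'), expressed purely
    through kernel evaluations:
    ||Phi(x) - Phi(x')||^2 = k(x,x) - 2*k(x,x') + k(x',x')."""
    return kernel(x, x) - 2 * kernel(x, xp) + kernel(xp, xp)

x = np.array([0.0, 1.0])
xp = np.array([1.0, 1.0])
print(feature_space_sq_distance(x, xp, gaussian_kernel))
```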
We require that the kernel $k(x, x')$ be symmetric and positive semidefinite and that it express the appropriate form of similarity between $x$ and $x'$ according to the intended application.
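Positive semidefiniteness means that, for any finite set of points $\{x_n\}$, the Gram matrix $K$ with entries $K_{nm} = k(x_n, x_m)$ has no negative eigenvalues. This can be checked empirically for a given data set, as in the following NumPy sketch (assuming the linear kernel and randomly drawn points for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))  # 20 random 3-D points

# Gram matrix for the linear kernel: K[n, m] = x_n^T x_m.
K = X @ X.T

# A valid kernel must yield a symmetric Gram matrix whose eigenvalues
# are all non-negative, for any choice of points.
eigvals = np.linalg.eigvalsh(K)
print(np.allclose(K, K.T))       # True: symmetric
print(eigvals.min() >= -1e-10)   # True: positive semidefinite (up to rounding)
```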