203-Kernel-SVM


Kernel Trick

Dual SVM: another QP with valuable geometric messages and almost no dependence on $\tilde{d}$
$q_{n,m}=y_ny_mz_n^Tz_m$, and what we need is $z_n^Tz_m=\phi(x_n)^T\phi(x_m)$ calculated faster than $O(\tilde{d})$. For example, for $\phi_2$, transform+inner product would be carefully done in $O(d)$ rather than $O(\tilde{d})=O(d^2)$.
2nd order polynomial transform:
$\phi_2(x)=(1,x_1,x_2,…,x_d,x_1^2,x_1x_2,…,x_1x_d,x_2x_1,x_2^2,…,x_2x_d,…,x_d^2)$

Kernel is a trick that can do transform & inner product at the same time. In addition to calculating $Q$, when solving optimal bias $b$ from $SV(x_s,y_s)$, we don’t use the form of complete $w$. In order to use the trick, we just substitute the primal formula of $w$. Similarily, when testing input $x$, we don’t use final form of $w$.
Thus in kernel SVM, $w$ was saved in $\alpha_s$ of SVs & we predict with SVs only.


Polynomial Kernel (Polynomial SVM)

General Polynomial Kernel:
$K_Q(x,x^{‘}=(\zeta+\gamma x^Tx^{‘})^Q\ with\ \gamma>0,\ \zeta\geq0$


Gaussian Kernel (Infinite Dimensional Transform)

$K(x,x^{‘})=exp(-\gamma||x-x^{‘}||^2)\ with\ \gamma>0$
also called Radial Basis Function(RBF) Kernel.
may fall into overfitting, need careful selection of $\gamma$, powerful.


Other Valid Kernels

Necessary & sufficient conditions for valid kernel:(Mercer’s condition)

  • symmetric
  • let $k_{ij}=K(x_i, x_j)$, the matrix $K$ must always be positive semi-definite