Dual-SVM

The primal svm (linear svm) is solvable with quadratic programming:
$
\begin{eqnarray}
\mathop{min}\limits_{b,w}\qquad \frac{1}{2}w^Tw\qquad\quad s.t.\quad y_n(w^Tx_n+b)\geq1,\ for\ n=1,2,…,N
\end{eqnarray}
$
For linear SVM, we can throw the problem into a QP with $d+1$(not many) unknown variables. For non-linear SVM, we can replace $x_n$ with $z_n=\Phi(x_n)$, and we get $\tilde{d}+1$(can be infinite) unknown varibales.

Goal: SVM without dependence on $\tilde{d}$

Starting Point: Constrained to ‘Unconstrained’ by Lagrange Multiples
The primal problem is equal to the problem below:
$
\begin{eqnarray}
\mathop{min}\limits_{b,w}(\mathop{max}\limits_{\alpha\geq0}\quad L(b,w,\alpha)=\frac{1}{2}w^Tw+\sum\limits_{n=1}^{N}\alpha_n(1-y_n(w^Tz_n+b)))
\end{eqnarray}
$
Constraints now hidden in $max$.

Lagrange Dual Problem
For any fixed $\alpha^{‘}$ with all $\alpha^{‘}_n\geq0$,
$
\begin{eqnarray}
\mathop{min}\limits_{b,w}(\mathop{max}\limits_{\alpha\geq0}\quad L(b,w,\alpha))&\geq &\mathop{min}\limits_{b,w}\quad L(b,w,\alpha^{‘})\\
\mathop{min}\limits_{b,w}(\mathop{max}\limits_{\alpha\geq0}\quad L(b,w,\alpha))&\geq &\mathop{max}\limits_{\alpha^{‘}\geq0}\quad\mathop{min}\limits_{b,w}\quad L(b,w,\alpha^{‘})
\end{eqnarray}
$

$\geq$: weak duality
$=$: strong duality, true for QP if convex primal & feasible primal & linear constraints, called constraint qualification.

Simplifications
For $b$,
$
\begin{eqnarray}
\frac{\partial L}{\partial b}=0=-\sum_{n=1}^N\alpha_ny_n
\end{eqnarray}
$
and we can remove $b$ from primal formula.

For $w$,
$
\begin{eqnarray}
\frac{\partial L}{\partial w}=0=w-\sum_{n=1}^N\alpha_ny_nz_n
\end{eqnarray}
$

Finally, we substituting $w$,
$
\begin{eqnarray}
\mathop{min}\limits_{\alpha}\quad \frac{1}{2}\sum_{n=1}^N\sum_{m=1}^N\alpha_n\alpha_my_ny_mz_n^Tz_m-\sum_{n=1}^N\alpha_n\qquad s.t.\ \sum_{n=1}^Ny_n\alpha_n=0\ and\ \alpha\geq0
\end{eqnarray}
$

Now we got a QP form problem with $N$ unknown variables and $N+1$ constraints. Actually the $Q$ symetric matrix $Q_{n,m}=y_ny_mz_n^Tz_m$ can be very large, 30000*30000 for instence.
In that case, we need special solver for not storing whole $Q$ & utilizing special constraints properly.

As we get $\alpha$ from QP, we can use KKT conditions to solve Optimal(b,w):
if primal-dual optimal $(b,w,\alpha)$,

primal feasible: $y_n(w^Tz_n+b)\geq1$
dual feasible: $\alpha\geq0$
dual-inner optimal: $\sum y_n\alpha_n=0$; $w=\sum\alpha_n(y_nz_n)$
primal-inner optimal: $\alpha_n(1-y_n(w^Tz_n+b))=0$

For any $\alpha_n>0$, the corresponding $(1-y_n(w^Tz_n+b))$ must be zero. And the physical meaning is that the corresponding $z_n$ is exactly on the ‘fat’ boundary. Actually, we use these points to compute $w$ and any one to compute $b=y_n-w^Tz_n$.
Those points are called SV as we only need it to get $w$ and $b$.

Moreover, $w_{PLA} = \sum\beta_n(y_nx_n)$, that is $w$ can be linear combinations of $(y_nx_n)$ on $\beta_n$ where $\beta_n$ is the mistake counts happened on $x_n$. And SVM only use SVs to represent $w$.

Lastly, we may cost too much on naive computation of $Q$ for large $\tilde{d}$, and avoiding naive computation of $Q$ can be done.