Trade-off on M

can we make sure that $E_{out}(g)$ is close enough to $E_{in}(g)$ ?
can we make $E_{out}(g)$ small enough?

$\mathbb{P}[BAD]\leq 2·M·exp(…)$
对于小的M来说，我们可以做到第一点，但是第二点做不到，因为选择太少了；
对于大的M来说，我们可以做到第二点，但是第一点做不到.

$\mathbb{P}[|E_{in}(g)-E_{out}(g)|>\epsilon]\leq 2·M·exp(-2\epsilon^2N)$

overlapping for similar hypothesis $h_1\approx h_2$
to account for overlapping, we can group similar hypothesis by kind.
lines in 2D
| N |数量(effective(N))|
| :—–:| :—-: |
| 1 | 2 |
| 2| 4 |
| 3 | 8 |
|4|$14<2^N$|

we can replace M by effective(N)

Effective Number of Hypothesis

$h(x_1,x_2,…,x_N)$ called a dichotomy: hypothesis ‘limited’ to the eyes of $x_1,x_2,…,x_N$
$h(x_1,x_2,…,x_N)$ all dichotomies ‘implemented’ by $H$ on $x_1,x_2,…,x_N$ with upper bounded by $2^N$
仅对每一个在数据集上的不同划分进行计数，不同的假设

Growth Function: remove dependence by taking max of all possible $(x_1,x_2,…,x_N)$
$m_H(N)=\mathop{max}\limits_{x_1,x_2,…,x_N\epsilon \chi}|H(x_1,x_2,…,x_N)|$

positive rays : $\qquad \qquad m_H(N)=N+1$

positive intervals : $\qquad\ \ m_H(N)=0.5·N^2+0.5·N+1$

convex sets : $\qquad\qquad\ \ m_H(N)=2^N$

2D perceptrons : $\qquad\ \ \ \ m_H(N)<2^N\ in\ some\ cases$

Break Point: if no k inputs can be shattered by $H$, call k a break point for $H$. Will foucus on minimum break point k.

positive rays : $O(N)$ break point at 2

positive intervals : $O(N^2)$ break point at 3

convex sets : $O(2^N)$ no break point

2D perceptrons : break point at 4

推论：$m_H(N)=O(N^{k-1})$

Restriction of Break Point

e.g. 当minimum break point k=2时，在N=3时，maximum possible=4<<$2^3$
—>break point k restricts maximum possible $m_H(N)$ a lot for $N>k$
idea: $m_H(N) \leq\ maximum\ possible\ m_H(N)\ given\ k \ \ \leq\ poly(N)$

Bounding Function

$B(N,K)$ : maximum possible $m_H(N)$ when break point = k
Table of Bounding Function:

这里我们知道B(2,2)=3; B(3,2)=4

当然B(N,1)=1

这里当$N<k,\ B(N,k)=2^N$

当$N=k,\ B(N,k)=2^N-1$

$B(N,k)\leq B(N-1,k)+B(N-1,k-1)$

Bounding Function: The Theorem
$B(N,k)\leq \sum_{i=0}^{k-1}\binom{N}{i}$
右边最高项是$N^{k-1}\Rightarrow m_H(N)\ is\ poly(N)\ if\ break\ point\ exists$.
$’\leq’\ can\ be\ ‘=’\ actually$.

Replace M by $m_H(N)$

$M$ 的计数应当是所有可能的 $D$, 而不是 $samples$, 也就是无论 $m_H(N)$ 如何, $N$ 应当是无限的. 如果要替换成关于 $samples$ 的 $N$, 就需要结合 $validation$ 估计 $E_{out}$ 和相应的概率知识作替换.
want: $\mathbb{P}[\exists h\epsilon H\ s.t.\ |E_{in}(h)-E_{out}(h)|>\epsilon]\leq 2·m_H(N)·exp(-2\epsilon^2N)$
from: $\mathbb{P}[\exists h\epsilon H\ s.t.\ |E_{in}(h)-E_{out}(h)|>\epsilon]\leq 2·M·exp(-2\epsilon^2N)$
但我们不能够直接这么做，因为原始式子中|H|是无限的，$E_{out}(h)$是无限的，之前所讨论的是基于采样的in-sample N. 所以为了替换成功我们还需要作些推导.

Step1: Replace $E_{out}\ by\ E_{in}^{‘}$, i.e. sample verification set $D’$ of size N to caculate $E_{in}^{‘}$

$\frac {1}{2}\mathbb{P}[\exists h\epsilon H\ s.t.\ |E_{in}(h)-E_{out}(h)|>\epsilon]\leq \mathbb{P}[\exists h\epsilon H\ s.t.\ |E_{in}(h)-E_{in}^{‘}(h)|>\frac {\epsilon}{2}]$
对于上式，基于从$E_{out}$ 到 $E_{in}^{‘}$ 的替换，第一个1/2表示替换之后式子中差值的绝对值大于误差值的概率增加了一倍；第二个1/2表示更加严格的误差限制(数学需要？).

Step2: Decompose $H$ by Kind
$BAD\leq 2\mathbb{P}[\exists h\epsilon H\ s.t.\ |E_{in}(h)-E_{in}^{‘}(h)|>\frac {\epsilon}{2}]\\
\qquad\ \leq 2m_H(N)\mathbb{P}[fixed\ h\ s.t.\ |E_{in}(h)-E_{in}^{‘}(h)|>\frac {\epsilon}{2}]$
$E_{in}(h)$ 有 N 笔数据，$E_{in}^{‘}(h)$ 也有 N 笔不同的数据，无限的 $H$ 变成了$|H(x_1,…,x_N,x_1^{‘},…,x_N^{‘})|$，这样我们终于可以使用$m_H(2N)$来计算BAD-overlap了.

Step3: Use Hoeffding without Replacement
$BAD \leq 2m_H(N)\mathbb{P}[fixed\ h\ s.t.\ |E_{in}(h)-E_{in}^{‘}(h)|>\frac {\epsilon}{2}]\\
\qquad\ =2m_H(N)\mathbb{P}[fixed\ h\ s.t.\ |E_{in}(h)-\frac{E_{in}+E_{in}^{‘}(h)}{2}|>\frac {\epsilon}{4}]\\
\qquad\ \leq 2m_H(N)·2exp(-2(\frac{\epsilon}{4})^2N)$
其中的代换是很有必要的，因为比较第一次sample与两次sample平均的$E$是使用Hoeffding的条件.

Vapnik-Chervonenkis (VC) bound:
$\mathbb{P}[\exists h\epsilon H\ s.t.\ |E_{in}(h)-E_{out}(h)|>\epsilon]\leq 4m_H(2N)·exp(-\frac{\epsilon^2}{8}N)$