In statistics and related fields, a similarity measure or similarity function is a real-valued function that quantifies the similarity between two objects. Although no single definition of a similarity measure exists, such measures are usually in some sense the inverse of distance metrics: they take on large values for similar objects and either zero or a negative value for very dissimilar objects. For example, in the context of cluster analysis, Frey and Dueck suggest defining a similarity measure $s(x,y)=-\|x-y\|_2^2$, i.e., the negative of the squared Euclidean distance.
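A minimal sketch of this measure in NumPy (the function name `similarity` is my own):

```python
import numpy as np

def similarity(x, y):
    """Frey & Dueck-style similarity: negative squared Euclidean distance.

    Close to 0 for similar points, strongly negative for dissimilar ones.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return -np.sum((x - y) ** 2)

print(similarity([1.0, 2.0], [1.0, 2.0]))  # 0.0   (identical -> maximal)
print(similarity([1.0, 2.0], [4.0, 6.0]))  # -25.0 (farther -> more negative)
```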
16-Three-Learning-Principles
Occam’s Razor
The simplest model that fits the data is also the most plausible.
Simple Model
simple hypothesis $h$: small $\Omega(h)$, specified by few parameters.
simple model $H$: small $\Omega(H)$, contains a small number of hypotheses.
small $\Omega(h)$ $\Leftarrow$ small $\Omega(H)$: a model containing only $|H|=2^\ell$ hypotheses needs just $\ell$ bits to specify any one of them, so a simple model forces each hypothesis to be simple.
simple: small hypothesis/model complexity.
15-Validation
Model Selection Problem
There are many possible models to learn, even just for binary classification (a count of this grid is sketched after the list):
$$A \in \{\text{PLA},\ \text{pocket},\ \text{linear regression},\ \text{logistic regression}\}\ \times$$
$$T \in \{100,\ 1000,\ 10000\}\ \times$$
$$\eta \in \{1,\ 0.01,\ 0.0001\}\ \times$$
$$\Phi \in \{\text{linear},\ \text{quadratic},\ \text{poly-10},\ \text{Legendre-poly-10}\}\ \times$$
$$\Omega(w) \in \{\text{L2 regularizer},\ \text{L1 regularizer},\ \text{symmetry regularizer}\}\ \times$$
$$\lambda \in \{0,\ 0.01,\ 1\}$$
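A quick way to appreciate the size of this search space is to enumerate it. A minimal sketch (the lists below just mirror the sets above; the variable names are my own):

```python
from itertools import product

algorithms = ["PLA", "pocket", "linear regression", "logistic regression"]
iterations = [100, 1000, 10000]          # T
learning_rates = [1, 0.01, 0.0001]       # eta
transforms = ["linear", "quadratic", "poly-10", "Legendre-poly-10"]  # Phi
regularizers = ["L2", "L1", "symmetry"]  # Omega(w)
lambdas = [0, 0.01, 1]

grid = list(product(algorithms, iterations, learning_rates,
                    transforms, regularizers, lambdas))
print(len(grid))  # 4 * 3 * 3 * 4 * 3 * 3 = 1296 candidate models
```

Selecting among these 1296 combinations by $E_{in}$ alone would amount to fitting the choice itself to the training data, which is why validation is needed.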
Adrein-Movies
14-Regularization
Regularization Hypothesis Set
idea: ‘step back’ from $H_{10}$ to $H_2$
E.g.
hypothesis $w$ in $H_{10}$: $w_0+w_1x+w_2x^2+\dots+w_{10}x^{10}$
hypothesis $w$ in $H_2$: $w_0+w_1x+w_2x^2$
that is, $H_2=H_{10}$ AND ‘the constraint that $w_3=w_4=\dots=w_{10}=0$’ (see the sketch below).
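A minimal NumPy sketch of this ‘step back’ (my own toy data; the constraint $w_3=\dots=w_{10}=0$ is implemented by simply dropping the columns for $x^3,\dots,x^{10}$ from the least-squares fit):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-1, 1, 15))
y = np.sin(np.pi * x) + rng.normal(0, 0.2, x.size)  # noisy toy target

def poly_features(x, degree):
    """Feature matrix with columns 1, x, x^2, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)

# Hypothesis in H10: all eleven weights w0..w10 are free.
w10, *_ = np.linalg.lstsq(poly_features(x, 10), y, rcond=None)

# Hypothesis in H2 = H10 + constraint w3 = ... = w10 = 0:
# equivalent to fitting only the first three columns.
w2, *_ = np.linalg.lstsq(poly_features(x, 2), y, rcond=None)

print(len(w10), len(w2))  # 11 free parameters vs. 3
```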
Regular-Expression-Matching
Question
Implement regular expression matching with support for ‘.’ and ‘*’.
‘.’ matches any single character.
‘*’ matches zero or more of the preceding element. The matching should cover the entire input string (not partial).
The function prototype should be:
bool isMatch(const char *s, const char *p)
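A sketch of one standard approach, top-down recursion with memoization, written here in Python rather than against the C prototype (the recursion carries over directly):

```python
from functools import lru_cache

def is_match(s: str, p: str) -> bool:
    """Match the ENTIRE string s against pattern p ('.' and '*' supported)."""

    @lru_cache(maxsize=None)
    def match(i: int, j: int) -> bool:
        # Pattern exhausted: success only if the string is exhausted too.
        if j == len(p):
            return i == len(s)
        # Does the current pattern char match s[i]?
        first = i < len(s) and p[j] in (s[i], '.')
        # If a '*' follows, either skip "p[j]*" entirely or consume one char.
        if j + 1 < len(p) and p[j + 1] == '*':
            return match(i, j + 2) or (first and match(i + 1, j))
        return first and match(i + 1, j + 1)

    return match(0, 0)

assert is_match("aab", "c*a*b")                 # 'c*' matches zero c's
assert not is_match("mississippi", "mis*is*p*.")
```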
13-Hazard-of-Overfitting
Bad Generalization & Overfitting
Bad Generalization: low $E_{in}$ and high $E_{out}$.
Overfitting: $E_{in}$ getting ever lower while $E_{out}$ gets ever higher, i.e. fitting the data more than is warranted.
Causes of overfitting: excessive $d_{vc}$, noise, and limited data size $N$.
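A toy illustration of the hazard, under my own assumptions (quadratic target, Gaussian noise, small $N$): the degree-10 fit typically drives $E_{in}$ far below the degree-2 fit while its $E_{out}$ blows up.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    x = rng.uniform(-1, 1, n)
    y = x**2 + rng.normal(0, 0.3, n)   # quadratic target plus noise
    return x, y

x_tr, y_tr = make_data(15)             # limited data size N
x_te, y_te = make_data(1000)           # large held-out set as a proxy for E_out

for degree in (2, 10):                 # modest vs. excessive complexity
    w = np.polyfit(x_tr, y_tr, degree)
    e_in = np.mean((np.polyval(w, x_tr) - y_tr) ** 2)
    e_out = np.mean((np.polyval(w, x_te) - y_te) ** 2)
    print(f"degree {degree:2d}: E_in = {e_in:.4f}, E_out = {e_out:.4f}")
```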
12-Nonlinear-Transformation
Quadratic Hypotheses
Sometimes, on certain datasets $D$, every line has a large $E_{in}$: the linear hypothesis set simply lacks the expressive power. Yet the data may still be circular separable, e.g. by the hypothesis $h(x)=\text{sign}(-x_1^2-x_2^2+0.6)$. How can we apply our linear learning algorithms to such circle-separable hypotheses?
The trick is to treat terms such as $x_1^2$ as new variables in their own right, say $z_1=x_1^2$, which maps the problem back to the familiar linear setting.
$\{(x_n,y_n)\}\ \text{circular separable}\ \Rightarrow\ \{(z_n,y_n)\}\ \text{linear separable}$
$x\in X\ \overset{\Phi}{\longmapsto}\ z\in Z$, with nonlinear feature transform $\Phi$
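A minimal sketch of this transform (my own toy data; for the circle above, $\Phi(x)=(1,x_1^2,x_2^2)$ already suffices):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, (200, 2))
y = np.sign(-X[:, 0]**2 - X[:, 1]**2 + 0.6)   # circular-separable labels

def phi(X):
    """Nonlinear feature transform: (x1, x2) -> (1, x1^2, x2^2)."""
    return np.column_stack([np.ones(len(X)), X[:, 0]**2, X[:, 1]**2])

Z = phi(X)
w = np.array([0.6, -1.0, -1.0])     # a separating weight vector in Z-space
print(np.all(np.sign(Z @ w) == y))  # True: linearly separable in Z
```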
11-Linear-Models-for-Classification
Linear Models for Binary Classification
Visualizing Error Functions ($s=w^Tx\ ;\ y\in\{-1,+1\}$; a numeric comparison follows the list)
- linear classification:
$\qquad h(x)=\text{sign}(s)\ ;\ err(h,x,y)=[h(x)\neq y]$
$\qquad err_{0/1}(s,y)=[\text{sign}(s)\neq y]=[\text{sign}(ys)\neq 1]$
- linear regression:
$\qquad h(x)=s\ ;\ err(h,x,y)=(h(x)-y)^2$
$\qquad err_{SQR}(s,y)=(s-y)^2=(ys-1)^2$
- logistic regression:
$\qquad h(x)=\theta(s)\ ;\ err(h,x,y)=-\ln h(yx)$
$\qquad err_{CE}(s,y)=\ln(1+\exp(-ys))$
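A small numeric sketch of the three error functions as functions of $ys$ (names are my own):

```python
import numpy as np

ys = np.linspace(-3, 3, 7)                    # ys = y * w^T x

err_01 = (np.sign(ys) != 1).astype(float)     # [sign(ys) != 1]
err_sqr = (ys - 1) ** 2                       # (ys - 1)^2
err_ce = np.log(1 + np.exp(-ys))              # ln(1 + exp(-ys))

for row in zip(ys, err_01, err_sqr, err_ce):
    print("ys = %5.1f   0/1 = %.0f   SQR = %5.2f   CE = %5.3f" % row)
```

Scaled to base 2, $\log_2(1+\exp(-ys))$ upper-bounds $err_{0/1}$, which is what justifies minimizing the cross-entropy error as a surrogate for classification error.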
10-Logistic-Regression
Logistic Regression
binary classification: ideal $f(x)=\text{sign}\left(P(+1|x)-\frac{1}{2}\right)\in\{-1,+1\}$
‘soft’ binary classification: $f(x)=P(+1|x)\in[0,1]$, which is the target function here
Logistic Hypothesis $h(x)=\theta (w^Tx)$ with $\theta (s)=\frac{1}{1+e^{-s}}$
Logistic regression uses $h(x)=\frac{1}{1+\exp(-w^Tx)}$ to approximate the target function $f(x)=P(+1|x)$.
Error Function
The output of logistic regression is a probability $P(y|x)$, so to define an error measure we introduce the likelihood: we view the data $D$ as generated by the target function $f(x)$ with probability
$likelihood(f)=p(x_1)f(x_1)\times p(x_2)(1-f(x_2))\times\dots\times p(x_N)(1-f(x_N))$
(for concreteness this assumes labels $y_1=+1$ and $y_2=\dots=y_N=-1$).
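A minimal sketch connecting this likelihood to the cross-entropy error $err_{CE}$ above (my own toy data; uses the symmetry $1-\theta(s)=\theta(-s)$, so maximizing the likelihood over $h(x)=\theta(w^Tx)$ is minimizing $\frac{1}{N}\sum_n \ln(1+\exp(-y_n w^T x_n))$):

```python
import numpy as np

def theta(s):
    """Logistic function theta(s) = 1 / (1 + e^{-s})."""
    return 1.0 / (1.0 + np.exp(-s))

def log_likelihood(w, X, y):
    """ln prod_n h(y_n x_n), using 1 - theta(s) = theta(-s)."""
    return np.sum(np.log(theta(y * (X @ w))))

def cross_entropy_error(w, X, y):
    """E_in(w) = (1/N) sum_n ln(1 + exp(-y_n w^T x_n))."""
    return np.mean(np.log(1 + np.exp(-y * (X @ w))))

rng = np.random.default_rng(3)
X = rng.normal(size=(5, 3))
y = np.array([1, -1, -1, 1, -1])
w = rng.normal(size=3)

# The two quantities agree up to sign and a 1/N scaling:
print(np.isclose(-log_likelihood(w, X, y) / len(y),
                 cross_entropy_error(w, X, y)))  # True
```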