AA-09-14-PIT-Fingerprinting


bases: Fundamental Theorem of Algebra/ Field/ ring of multivariate polynomials/ Chebyshev’s theorem/ Prime Number Theorem

For univariate polynomial of degree $d$, it is trivial that the probability hitting occurs of randomly picked $r$ from fixed $S$ is $\Pr[f(r)=0]\le\frac{d}{|S|}.$ By fixing $S\subseteq\mathbb{F}$ to be 2d, this probability is at most 1/2, we can reduce it to an arbitrarily small constant $\delta$ by repeat the above testing independently for $\log_2 \frac{1}{\delta}$ times, since the error probability decays geometrically as we repeat the algorithm independently.

Communication Complexity of Equality

$$\mathrm{EQ}:\{0,1\}^n\times\{0,1\}^n\rightarrow\{0,1\}$$

A Randomized protocol for EQ
Bob does:

  • pick $r\in\mathbb{Z}_p$ uniformly at random;
  • send $r$ and $g(r)$ to Alice;

Upon receiving $r$ and $g(r)$ Alice does:

  • compute $f(r)$;
  • if $f(r)=g(r)$ return Y, else N.

$\Pr[f(r)=g(r)]\le \frac{n-1}{p}$
By choosing $p$ to be a prime in the interval $[n^2, 2n^2]$, the protocol’s false positive probablity is $O(1/n)$, with communication complexity $O(log\ n)$.


True form of Polynomial Identity Testing(PIT)

Input: $f(x_1,x_2,\ldots,x_n)=\sum_{i_1,i_2,\ldots,i_n\ge 0\atop i_1+i_2+\cdots+i_n\le d}a_{i_1,i_2,\ldots,i_n}x_{1}^{i_1}x_2^{i_2}\cdots x_{n}^{i_n}$.
Determine whether $f\equiv0$.

Randomized algorithm for multivariate PIT

  • suppose we have a finite subset $S\subseteq\mathbb{F}$;
  • pick $r_1,r_2,\ldots,r_n\in S$ uniformly and independently at random;
  • if $f(\vec{r})=f(r_1,r_2,\ldots,r_n) = 0$ then return “yes” else return “no”.

Again, the false positive probability’s upper bound, as Schwartz-Zippel Theorem
told us, $\Pr[f(r_1,r_2,\ldots,r_n)=0]\le\frac{d}{|S|}.$

Proof by Induction:
Induction basis: …
Induction hypothesis: …
Induction step:
$f(x_1,x_2,\ldots,x_n)=\sum_{i=0}^kx_n^{i}f_i(x_1,x_2,\ldots,x_{n-1})$
where $k$ is the highest degree of $x_n$. In particular, we write $f$ as:
$f(x_1,x_2,\ldots,x_n)=x_n^k f_k(x_1,x_2,\ldots,x_{n-1})+\bar{f}(x_1,x_2,\ldots,x_n)$
That is when $f_k(r_1,r_2,\ldots,r_{n-1})$ non-zero, the problem reduce to unvariate case with guarantee that $\bar{f}$ has no $x_n^k$ term, otherwise, it is a probability conditioning on $f_k(r_1,r_2,\ldots,r_{n-1})=0$ which can be calculate from hypothesis.
By the law of total probability, we have

\begin{align}
&\Pr[f(r_1,r_2,\ldots,r_n)=0]\\
=
&\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})=0]\cdot\Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]\\
&+\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]\cdot\Pr[f_k(r_1,r_2,\ldots,r_{n-1})\neq0].
\end{align}

\begin{align}
(*)
&\qquad
&\Pr[f_k(r_1,r_2,\ldots,r_{n-1})=0]\le\frac{d-k}{|S|}.
\end{align}

\begin{align}
(**)
&\qquad
&\Pr[f(\vec{r})=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]=\Pr[g_{r_1,\ldots,r_{n-1}}(r_n)=0\mid f_k(r_1,r_2,\ldots,r_{n-1})\neq0]\le\frac{k}{|S|}
\end{align}

$\Pr[f(r_1,r_2,\ldots,r_n)=0]
\le\frac{d-k}{|S|}+\frac{k}{|S|}=\frac{d}{|S|}.$


Fingerprinting

By PIT

For any bit-string $x\in\{0,1\}^n$, its random fingerprint is $\mathrm{FING}(x)=\sum_{i=1}^n x_i r^{i}$, where the additions and multiplications are defined over a finite field $\mathbb{Z}_p$, and $r$ is chosen uniformly at random from $\mathbb{Z}_p$, where $p$ is some suitable prime which can be represented in $\Theta(\log n)$ bits. For instance, we can choose $p$ to be any prime from the interval $[n^2, 2n^2]$. Due to Chebyshev’s theorem, such prime must exist. If $x\neq y$, $\Pr[\mathrm{FING}(x)=\mathrm{FING}(y)] \le \frac{n-1}{p}\le \frac{1}{n}.$

By randomized checksum

Communication protocol for EQ by random checksum
Bob does:
$\qquad$for some parameter k,
$\qquad$ - choose a prime $p\in[k]$ uniformly at random;
$\qquad$ - send $p$ and $x\ mod\ p$ to Alice.
Upon receiving $p$ and $x\ mod\ p$, Alice does:
$\qquad$ - check whether $x\bmod p=y\bmod p$.

Communication complexity is obviously $O(\log k)$. The error probability $\Pr[z\bmod p=0]\le\frac{\mbox{the number of prime divisors of }z}{\mbox{the number of primes in }[k]}$, let $k=2n^2\ln n$, we have $\Pr[z\bmod p=0]\le\frac{n}{\pi(k)}\sim\frac{1}{n}$.


Checking distinctness

  • input: two multisets $A=\{a_1,a_2,…,a_n\}$ and $B=\{b_1,b_2,…,b_n\}$ where $a_1,a_2,…,b_1,b_2,…,b_n\in\{1,2,…,n\}$
  • Determine whether $A=B$.

We define a unvariate polynomial $f_A\in\mathbb{Z}_p[x]$ over the finite field $\mathbb{Z}_p$ as follows: $f_A(x)=\prod_{i=1}^n(x-a_i)$, where $+$ and $·$ are defined over the finite field $\mathbb{Z}_p$. And the random fingerprinting function is: $\mathrm{FING}(A)=f_A(r)=\prod_{i=1}^n(r-a_i)$.

Theorem(Lipton 1989)
if $A\neq B$, then $Pr[FING(A)=FING(B)]=O(1/n)$ with p in $[(n\log n)^2, 2(n\log n)^2]$.

Assuming that $A\neq B$, we must have $\tilde{f}_A\not\equiv\tilde{f}_B$. But $f_A\equiv f_B$ is possible, then by the law of total probability:
$Pr[FING(A)=FING(B)]=Pr[f_A(r)=f_B(r)|f_A\not\equiv f_B]Pr[f_A\not\equiv f_B]+Pr[f_A(r)=f_B(r)|f_A\equiv f_B]Pr[f_A\equiv f_B]$

We finally have $Pr[FING(A)=FING(B)]=O(1/n).$