
Basic Definitions

In this section, we introduce the basic prerequisites for understanding the proofs in this book.

Essential Algebra

In this book, we will mostly work over two kinds of algebraic structures: integral domains and fields.

An integral domain is a commutative ring with no “zero divisors”. All this means is that: $a \cdot b = 0 \implies a = 0 \lor b = 0$.

The integers form an integral domain.

An example of a ring which is not an integral domain is $\mathbb{Z}/6\mathbb{Z}$, since $2 \cdot 3 \equiv 0 \pmod 6$, which means that $2$ is a zero divisor.

A field is an integral domain where every non-zero element is a unit. All this means is that: for every $a \neq 0$ there exists $a^{-1}$ such that $a \cdot a^{-1} = 1$.

In other words, $a^{-1}$ is defined for all $a \neq 0$.

The ring $\mathbb{Z}/p\mathbb{Z}$ is a field whenever $p$ is prime.

This corresponds to the “integers modulo $p$”.
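A small Python sketch makes the field property concrete; the prime $p = 17$ and the use of Fermat's little theorem to compute inverses are illustrative choices, not part of the book's development:

```python
# Illustrative sketch: in Z_p with p prime, every non-zero a has an inverse,
# e.g. a^(p-2) mod p by Fermat's little theorem.
p = 17  # assumption: any prime works here

for a in range(1, p):
    inv = pow(a, p - 2, p)
    assert (a * inv) % p == 1

# In Z_6 (composite modulus) the element 2 has no inverse: 2*b mod 6 is never 1.
assert all((2 * b) % 6 != 1 for b in range(6))
```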

It is also easy to see that every field is an integral domain. Broadly speaking:

  • The integral domain structure will be sufficient to argue soundness.
  • Sometimes, not always, we will need the additional field structure to prove completeness.

The relationship between these algebraic structures can be visualized as:

Venn diagram showing the hierarchical relationship between commutative rings, integral domains, and fields

Other, more fine-grained notions exist, for instance unique factorization domains (UFDs), principal ideal domains (PIDs), and Euclidean domains. But for the purposes of this book, only the three structures above will be relevant.

Integral Domains

After showing this general theorem, we are going to immediately restrict ourselves a bit, but just a bit. Namely, we are going to require that $R$ is an integral domain. An integral domain is a commutative ring which has no zero divisors, which is a fancy way of saying: $a \cdot b = 0 \implies a = 0 \lor b = 0$.

In other words: the only way to get a zero product is to multiply by zero. Fields satisfy this, the integers satisfy this; however, e.g. $\mathbb{Z}/6\mathbb{Z}$ does not satisfy this: $2 \cdot 3 \equiv 0 \pmod 6$, yet $2 \neq 0$ and $3 \neq 0$.

The fact that $R$ is an integral domain is important, because it allows us to conclude that if we have a product of polynomials $f(X) = g(X) \cdot h(X)$ and a point $x$ such that $f(x) = 0$, then either $g(x) = 0$ or $h(x) = 0$. Combined with the factor theorem, this allows us to upper bound the number of roots of any polynomial $f$ by the degree of $f$.

Tensor Products & Hypercubes

The "Boolean Hypercube" is a tensor product, i.e. "all vectors with entries from", the set :

For instance: $\{0,1\}^2 = \{(0,0),\ (0,1),\ (1,0),\ (1,1)\}$.

It is clear that the $n$-dimensional Boolean hypercube has $2^n$ elements: $|\{0,1\}^n| = 2^n$. In many ways, the choice of $\{0,1\}$ is arbitrary; another popular choice is $\{-1,1\}$, which has the slight advantage that it is a group under multiplication, making some things slightly nicer.

The most important part about $\{0,1\}^n$ is not the particular set $\{0,1\}$ or $\{-1,1\}$, nor that every coordinate is from the same set, but the tensor structure. In the most general case $S = S_1 \times S_2 \times \cdots \times S_n$ for small sets $S_i$, in which case $|S| = \prod_i |S_i|$. However, most naturally $S_i = \{0,1\}$, hence the name "Boolean hypercube".
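To make the tensor structure concrete, here is a tiny Python sketch (illustrative only; `itertools.product` is just one convenient way to enumerate such products):

```python
from itertools import product

n = 3
hypercube = list(product((0, 1), repeat=n))      # the Boolean hypercube {0,1}^n
assert len(hypercube) == 2 ** n

# the same tensor structure works for mixed per-coordinate sets S_1 x ... x S_n
S = list(product((0, 1), (-1, 1), (0, 1, 2)))
assert len(S) == 2 * 2 * 3
```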

Multivariate Polynomials

A multivariate polynomial is, as the name suggests, simply a polynomial in one or more variables.

For instance:

  • $f(X, Y) = X^2 Y + 3X + 1$ is a multivariate polynomial of individual degrees $2$ and $1$ in the variables $X$ and $Y$.

  • $f(X, Y) = X Y^3 + X Y$ is a multivariate polynomial of individual degrees $1$ and $3$ in the variables $X$ and $Y$.

  • $f(X) = X^2 + 1$ is a multivariate polynomial of individual degree $2$ in the variable $X$.

A multilinear polynomial is a multivariate polynomial where the individual degree in each variable is at most one. For instance: $f(X, Y, Z) = X Y Z + X Y + Z + 1$.

Whereas the earlier examples of multivariate polynomials were not multilinear.

Roots of Polynomials

The Factor Theorem

The factor theorem states that if a non-zero polynomial $f$ has a root at $x$, i.e. $f(x) = 0$, then we can write $f(X) = (X - x) \cdot g(X)$ for some polynomial $g$.

Let $f \in R[X]$ with $\deg(f) = d > 0$, and let $x \in R$ be such that $f(x) = 0$.

Then there exists $g \in R[X]$ with $\deg(g) = d - 1$ such that $f(X) = (X - x) \cdot g(X)$.

We first consider a special case, then use this to prove the general case.

Consider the two cases:

  • Case $x = 0$. First we show the theorem when $x = 0$; then the claim is that $f(X) = X \cdot g(X)$. To see this, observe that if $f(X) = a_0 + a_1 X + \dots + a_d X^d$ then $f(0) = a_0$, and since $f(0) = 0$ we have $a_0 = 0$. Therefore $f$ is of the form: $f(X) = a_1 X + a_2 X^2 + \dots + a_d X^d$. If we define $g(X) = a_1 + a_2 X + \dots + a_d X^{d-1}$, we see that $f(X) = X \cdot g(X)$, as desired.

  • Case $x \neq 0$. Suppose $f(x) = 0$. Define the polynomial $\hat{f}(X) = f(X + x)$ and observe that $\hat{f}(0) = f(x) = 0$. Therefore, by the “x = 0” case, we conclude that $\hat{f}(X) = X \cdot \hat{g}(X)$ for some $\hat{g} \in R[X]$. Next write:

$f(X) = \hat{f}(X - x) = (X - x) \cdot \hat{g}(X - x)$

If we define $g(X) = \hat{g}(X - x)$, then we see that $f(X) = (X - x) \cdot g(X)$.

Actually, the factor theorem can be broadened a bit: no part of the proof requires $R$ to be an integral domain; at no point did we use the fact that $a \cdot b = 0$ implies $a = 0$ or $b = 0$. Therefore, the factor theorem applies even to polynomials over any commutative ring, for instance polynomials over $\mathbb{Z}/m\mathbb{Z}$ for a composite $m$.
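To make the factor theorem concrete, the following Python sketch divides a polynomial by $(X - x)$ using synthetic division and checks the factorization numerically; the prime modulus, coefficient encoding and helper names are illustrative assumptions, not notation from this book:

```python
p = 97  # assumption: a toy prime modulus; coefficients are lists, lowest degree first

def poly_eval(f, x):
    # Horner evaluation of f at x modulo p
    acc = 0
    for c in reversed(f):
        acc = (acc * x + c) % p
    return acc

def divide_by_linear(f, x):
    # synthetic division: returns (g, r) with f(X) = (X - x) * g(X) + r and r = f(x)
    out, acc = [], 0
    for c in reversed(f):
        acc = (acc * x + c) % p
        out.append(acc)
    r = out.pop()                      # the final accumulator equals f(x)
    return list(reversed(out)), r

f = [15, (-8) % p, 1]                  # f(X) = X^2 - 8X + 15 = (X - 3)(X - 5)
g, r = divide_by_linear(f, 3)
assert r == poly_eval(f, 3) == 0       # 3 is a root, so the remainder vanishes
for t in range(10):                    # spot-check f(X) = (X - 3) * g(X)
    assert poly_eval(f, t) == ((t - 3) * poly_eval(g, t)) % p
```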

The Fundamental Theorem

Let $R$ be an integral domain and let $f \in R[X]$ be a non-zero polynomial of degree $d$; then $f$ has at most $d$ roots in $R$.

We prove this using induction over the degree of $f$.

  • Base $d = 0$. If $\deg(f) = 0$, then $f(X) = c$ for some non-zero constant $c \in R$. Clearly, $f$ has $0$ roots.

  • Step $d > 0$. Let $f$ be a polynomial of degree $d$. If $f$ has no roots in $R$, then we are done. Otherwise, let $x$ be a root of $f$. Then, using the factor theorem, we can write $f(X) = (X - x) \cdot g(X)$ for some polynomial $g$ of degree $d - 1$. By the inductive hypothesis, $g$ has at most $d - 1$ roots and $(X - x)$ has at most $1$ root. Finally, $f$ has at most $d$ roots because $R$ is an integral domain: if $y$ is such that $f(y) = (y - x) \cdot g(y) = 0$, then either $y - x = 0$ or $g(y) = 0$; in other words, $y$ must be a root of either $(X - x)$ or of $g$, of which there are at most $d$.

The theorem also allows us to conclude that if two polynomials $f$ and $g$ of degree at most $d$ share more than $d$ evaluations in $R$, then the polynomials must be equal.

Let $f$ and $g$ be polynomials of degree at most $d$ over $R$. Let $x_1, \dots, x_{d+1} \in R$ be distinct elements. Then: if $f(x_i) = g(x_i)$ for all $i$, we have $f = g$.

Define $h(X) = f(X) - g(X)$. Note that $h$ is a polynomial in $R[X]$ of degree at most $d$ and observe that $h(x_i) = 0$ for all $i$: $h$ has at least $d + 1$ roots in $R$. Therefore, $h$ must be the zero polynomial, i.e. $f = g$.

Schwartz-Zippel Lemma

We can turn the corollary above into a probabilistic check of equality between polynomials. This technique is called the Schwartz-Zippel lemma and is widely used; we will use it a lot.

Let $f \neq g$ be distinct polynomials of degree at most $d$ and let $S$ be an arbitrary finite subset of the integral domain $R$; then: $\Pr_{x \leftarrow S}\left[ f(x) = g(x) \right] \leq d / |S|$.

Note that the statement is vacuous if $|S| \leq d$. When $\Pr_{x \leftarrow S}[f(x) = g(x)] > d / |S|$, the statement implies that there must exist a subset $T \subseteq S$ with $|T| > d$ such that: $f(x) = g(x)$ for all $x \in T$, in which case the previous corollary shows that $f = g$.
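A minimal Python sketch of the resulting probabilistic equality check; the prime, the coefficient encoding and the helper names are our own illustrative choices:

```python
import random

p = 2**31 - 1   # assumption: a large prime, so Z_p is an integral domain

def poly_eval(f, x):
    acc = 0
    for c in reversed(f):
        acc = (acc * x + c) % p
    return acc

def probably_equal(f, g, trials=20):
    # Schwartz-Zippel: if f != g, each trial accepts with probability <= d/p
    return all(poly_eval(f, r) == poly_eval(g, r)
               for r in (random.randrange(p) for _ in range(trials)))

f = [1, 2, 3, 4]            # 1 + 2X + 3X^2 + 4X^3
g = [1, 2, 3, 4]
h = [1, 2, 3, 5]
assert probably_equal(f, g)       # equal polynomials always agree
assert not probably_equal(f, h)   # distinct ones are caught except with prob <= (3/p)^20
```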

Multivariate Polynomial Roots

An important way for us to view multivariate polynomials will be as univariate polynomials over polynomial rings. To this end, it is useful for us to verify that multivariate polynomials form integral domains allowing us to apply the fundamental theorem:

Let $R$ be an integral domain; then $R[X]$ is an integral domain.

We can multiply and add polynomials, and it is also clear that polynomial multiplication is commutative (since $R$ is), i.e. for $f, g \in R[X]$ we have: $f \cdot g = g \cdot f$. Finally, let us verify that there are no zero divisors in $R[X]$, i.e. show: $f \cdot g = 0 \implies f = 0 \lor g = 0$. To see this, let $d_f$ and $d_g$ be the degrees of $f$ and $g$ respectively, and denote by $a_{d_f}$ and $b_{d_g}$ the leading coefficients of $f$ and $g$. Note that $a_{d_f} \neq 0$ and $b_{d_g} \neq 0$, otherwise $f = 0$ or $g = 0$ respectively. Then the leading coefficient of $f \cdot g$ is $a_{d_f} \cdot b_{d_g}$, which is non-zero since $R$ is an integral domain. Therefore, $f \cdot g = 0$ if and only if $f = 0$ or $g = 0$.

By applying the theorem above $k$ times, we can conclude that $R[X_1, \dots, X_k]$ is an integral domain: let $R_0 = R$ and $R_i = R_{i-1}[X_i]$ for $i = 1, \dots, k$. Observe that $R_k = R[X_1, \dots, X_k]$ is an integral domain.

This "iterative" construction of allows us to view -variate polynomials over as univariate polynomials over :

With this interpretation $f \in R[X_1, \dots, X_k]$ is a polynomial over $R' = R[X_1, \dots, X_{k-1}]$ and therefore the variable $X_k$ can take any value in $R'$ (not just in $R$), i.e. we can evaluate $f$ at $X_k = g$ for every $(k-1)$-variate polynomial $g \in R'$, which includes the constant polynomials, i.e. the elements of $R$.

If we apply the fundamental theorem to this particular setting we get:

Let $f \in R[X_1, \dots, X_k]$ with $f \neq 0$ be a $k$-variate polynomial. Let $d_k$ be the degree in the variable $X_k$; then there exist at most $d_k$ distinct $(k-1)$-variate polynomials $g \in R[X_1, \dots, X_{k-1}]$ such that $f(X_1, \dots, X_{k-1}, g) = 0$. In particular, there exist at most $d_k$ field elements (constant polynomials) $x_k \in R$ such that $f(X_1, \dots, X_{k-1}, x_k) = 0$.

If we apply this observation recursively, we can conclude that for sufficiently large tensor products, a polynomial vanishes over the tensor product if and only if the polynomial is the zero polynomial:

Let $f \in R[X_1, \dots, X_k]$ be a non-zero $k$-variate polynomial with degree $d_i$ in each variable $X_i$. Let $S$ be the tensor product of $S_1, \dots, S_k \subseteq R$ where $|S_i| > d_i$: $S = S_1 \times S_2 \times \cdots \times S_k$. Then: there exists $\vec{x} \in S$ such that $f(\vec{x}) \neq 0$.

We prove this by induction:

  • Base $k = 1$. When $k = 1$ the “multivariate” polynomial is a univariate polynomial $f(X_1)$; by applying the fundamental theorem with $d = d_1$, we observe that the number of roots of $f$ is at most $d_1$, however since $|S_1| > d_1$ the polynomial cannot evaluate to zero on all of $S_1$. So the claim holds.

  • Step $k > 1$. Define $R' = R[X_1, \dots, X_{k-1}]$ and now rewrite $f$ as a polynomial in $X_k$ with coefficients in $R'$: $f = \sum_{i=0}^{d_k} c_i(X_1, \dots, X_{k-1}) \cdot X_k^i$. Since $R'$ is an integral domain, we can apply the fundamental theorem, this time to $R'[X_k]$ rather than $R[X_k]$. We conclude that at most $d_k$ values $g \in R'$ satisfy: $f(X_1, \dots, X_{k-1}, g) = 0$. And, in particular, there exist at most $d_k$ elements $x_k \in R$ (constant polynomials) satisfying this. On the other hand, since $|S_k| > d_k$ there must exist at least one $x_k \in S_k$ which is not a root, in other words: $\hat{f}(X_1, \dots, X_{k-1}) = f(X_1, \dots, X_{k-1}, x_k) \neq 0$. We then apply the induction hypothesis on $\hat{f}$ to conclude that it does not vanish over $S_1 \times \cdots \times S_{k-1}$. In other words, we conclude that there is at least one $(x_1, \dots, x_{k-1}) \in S_1 \times \cdots \times S_{k-1}$ such that $\hat{f}(x_1, \dots, x_{k-1}) \neq 0$, which also allows us to conclude: $f(x_1, \dots, x_{k-1}, x_k) \neq 0$. So $f$ also cannot vanish over $S$ and the claim holds for $k$ as well.

Setting $S_i = \{0,1\}$ and $S = \{0,1\}^k$ as the $k$-dimensional hypercube, we conclude that a non-zero multilinear polynomial cannot vanish on the hypercube, i.e. if $f \neq 0$ then there exists at least one $\vec{x} \in \{0,1\}^k$ such that $f(\vec{x}) \neq 0$.

An easy, but very important, observation is that two multilinear polynomials agree on the hypercube if and only if they are equal as polynomials.

Let $f, g \in R[X_1, \dots, X_k]$ be two multilinear polynomials such that: $f(\vec{x}) = g(\vec{x})$ for all $\vec{x} \in \{0,1\}^k$. Then $f = g$.

We can form: $h = f - g$, which is again multilinear. By assumption $h(\vec{x}) = 0$ for all $\vec{x} \in \{0,1\}^k$, therefore we conclude that $h = 0$ by the theorem above. Hence $f = g$.

Multivariate Schwartz-Zippel

We can extend the techniques above to reason about the probability that a multivariate polynomial vanishes at a random point $\vec{x} \leftarrow S^k$.

Let $S \subseteq R$ be a finite subset and let $f \neq 0$ be a multivariate polynomial of individual degrees $d_1, \dots, d_k$ in $X_1, \dots, X_k$. Then the probability that the polynomial vanishes at a uniformly random $\vec{x} \leftarrow S^k$ can be bounded as follows: $\Pr_{\vec{x} \leftarrow S^k}\left[ f(\vec{x}) = 0 \right] \leq d / |S|$, where $d = \sum_{i=1}^{k} d_i$.

We show this by induction:

  • Base $k = 1$. When $k = 1$ the “multivariate” polynomial is a univariate polynomial $f(X_1)$; by applying the fundamental theorem with $d = d_1$ we observe that the number of roots of $f$ is at most $d_1$. Hence for uniform $x_1 \leftarrow S$, the probability that $f(x_1) = 0$, i.e. that $x_1$ is one of the at most $d_1$ roots, is at most $d_1 / |S|$.

  • Step $k > 1$. Basically, there are two ways that $f(\vec{x})$ could be zero:

    • When we partially evaluate $f$ at $X_k = x_k$ we get the zero polynomial, i.e. $f(X_1, \dots, X_{k-1}, x_k) = 0$.
    • Or, $f(X_1, \dots, X_{k-1}, x_k) \neq 0$, but $f(x_1, \dots, x_{k-1}, x_k) = 0$.

    Define $R' = R[X_1, \dots, X_{k-1}]$ and view $f$ as a polynomial in $X_k$ over $R'$: $f = \sum_{i=0}^{d_k} c_i(X_1, \dots, X_{k-1}) \cdot X_k^i$. Since $R'$ is an integral domain, we conclude that at most $d_k$ values $g \in R'$ satisfy: $f(X_1, \dots, X_{k-1}, g) = 0$. And, in particular, there exist at most $d_k$ elements $x_k \in R$ which make $f(X_1, \dots, X_{k-1}, x_k) = 0$, hence the probability that a uniform $x_k \leftarrow S$ makes $f(X_1, \dots, X_{k-1}, x_k) = 0$ is at most $d_k / |S|$.

    On the other hand, if $x_k$ is not a root, i.e. $\hat{f}(X_1, \dots, X_{k-1}) = f(X_1, \dots, X_{k-1}, x_k) \neq 0$: we can apply the induction hypothesis on $\hat{f}$ to conclude that: $\Pr_{x_1, \dots, x_{k-1} \leftarrow S}\left[ \hat{f}(x_1, \dots, x_{k-1}) = 0 \right] \leq \sum_{i=1}^{k-1} d_i / |S|$. By applying a union bound on both these events we conclude that: $\Pr_{\vec{x} \leftarrow S^k}\left[ f(\vec{x}) = 0 \right] \leq d_k / |S| + \sum_{i=1}^{k-1} d_i / |S| = \sum_{i=1}^{k} d_i / |S|$. (A small numeric check of this bound follows below.)
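A small numeric sanity check of the bound; the particular polynomial, prime and trial count are arbitrary illustrative choices:

```python
import random

p = 101                     # assumption: the sample set is S = Z_101
S = range(p)

def f(x, y, z):
    # a non-zero 3-variate polynomial with individual degrees d1 = 2, d2 = 1, d3 = 1
    return (x * x * y + x * z + 3 * y * z + 7) % p

trials = 100_000
zeros = sum(f(random.choice(S), random.choice(S), random.choice(S)) == 0
            for _ in range(trials))
# the empirical vanishing probability should respect the (d1 + d2 + d3)/|S| bound
print(zeros / trials, "<=", (2 + 1 + 1) / p)
```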

Lagrange Interpolation

Subgroup Fast Fourier Transform

Also called the "Cooley-Tukey FFT".

Subspace Fast Fourier Transform

Interpolation over The Unit Circle

Interpolation over Algebraic Curves

Reductions of Knowledge

Karp-Levin reductions reduce membership in one language to membership in another.

What if we were to broaden this notion so that it could encompass randomized reductions?

Multivariate Reductions

Multivariate Sum-Check

The multivariate sum-check allows reducing a claim of the form $\sum_{\vec{x} \in \{0,1\}^k} f(\vec{x}) = \sigma$ to a claim about a single evaluation of $f$ at a random point.

Let

Input Relation:

Output Relation:

Reduction:

  1. Prover computes:
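As a rough illustration of this reduction, here is a hedged Python sketch of the textbook multilinear sum-check, simulating both prover and verifier in one function; the representation of $f$ by its evaluation table and the helper names (`mle_eval`, `sumcheck`) are assumptions of this sketch, not notation from the book:

```python
import random
from itertools import product

p = 2**61 - 1   # assumption: a prime field

def mle_eval(table, r):
    # evaluate the multilinear extension of f: {0,1}^k -> Z_p at a point r in Z_p^k;
    # `table` maps tuples in {0,1}^k to field elements
    acc = 0
    for w, v in table.items():
        eq = 1
        for wi, ri in zip(w, r):
            eq = eq * ((ri if wi else 1 - ri) % p) % p
        acc = (acc + v * eq) % p
    return acc

def sumcheck(table, k):
    # reduces  sum_{x in {0,1}^k} f(x) = claim  to a single evaluation f(r) = claim'
    claim = sum(table.values()) % p
    prefix = []
    for i in range(k):
        # prover: s_i(X) = sum over the remaining boolean variables with X_i = X
        def s(X):
            tails = product((0, 1), repeat=k - i - 1)
            return sum(mle_eval(table, prefix + [X] + list(t)) for t in tails) % p
        s0, s1 = s(0), s(1)
        assert (s0 + s1) % p == claim           # verifier's round-i consistency check
        r_i = random.randrange(p)               # verifier's challenge
        claim = (s0 + (s1 - s0) * r_i) % p      # s_i(r_i), since s_i has degree <= 1
        prefix.append(r_i)
    # final check: query (the extension of) f at the random point
    assert mle_eval(table, prefix) == claim
    return prefix, claim

k = 3
table = {w: random.randrange(p) for w in product((0, 1), repeat=k)}
sumcheck(table, k)
```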

Polynomial Packing

Input Relation:

Output Relation:

Reduction:

  1. Prover computes:

Univariate

Smooth Univariate Sum-Check

The following is taken from the Aurora paper [BSCRSVW18].

In the section that follows, let $R$ be an integral domain, in particular any field $\mathbb{F}$.

Let $H \subseteq R$ be a multiplicative coset of a cyclic subgroup, and let $0 < k < |H|$; then: $\sum_{x \in H} x^k = 0$.

In other words: the combined sum of the $k$-th powers (for non-zero $k$) of the elements of $H$ is zero.

Pick $g$, such that $g^k \neq 1$ and $g H = H$, which exists since $0 < k < |H|$ and $H$ is a coset of a cyclic subgroup (take $g$ to be a generator of the subgroup). Since $g H = H$ we have $\sum_{x \in H} x^k = \sum_{x \in H} (g x)^k$, so: $\sum_{x \in H} x^k = g^k \cdot \sum_{x \in H} x^k$. Rearranging: $(g^k - 1) \cdot \sum_{x \in H} x^k = 0$. Since $g^k \neq 1$, we have $g^k - 1 \neq 0$, therefore $\sum_{x \in H} x^k = 0$ since $R$ is an integral domain.

Observe that the proof above only relies on the existence of $g$ such that $g H = H$ and $g^k \neq 1$. We can generalize this proof to polynomials (not just monomials), by simply observing that all monomials individually will sum to zero, except the constant term:

Let $H$ be a multiplicative coset (of a cyclic subgroup) of the integral domain $R$ and let $f \in R[X]$ be a polynomial of degree less than $|H|$. Then: $\sum_{x \in H} f(x) = |H| \cdot f_0$, where $f_0$ is the constant coefficient of $f$.

Let $f_0, f_1, \dots, f_{|H|-1}$ be the coefficients of $f$.

Then: $\sum_{x \in H} f(x) = \sum_{x \in H} \sum_i f_i \cdot x^i = \sum_i f_i \cdot \sum_{x \in H} x^i = f_0 \cdot |H|$, since every non-constant monomial sums to zero over $H$ by the previous theorem.
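A quick numeric check of both statements over $\mathbb{Z}/17\mathbb{Z}$; the concrete prime, subgroup generator and coset below are arbitrary illustrative choices:

```python
import random

p = 17                                   # assumption: toy field Z_17
gen = 9                                  # 9 has multiplicative order 8 modulo 17
H = [pow(gen, i, p) for i in range(8)]
assert len(set(H)) == 8

# non-trivial powers sum to zero over the subgroup ...
for k in range(1, 8):
    assert sum(pow(x, k, p) for x in H) % p == 0

# ... and over any coset c*H as well
coset = [(5 * x) % p for x in H]
for k in range(1, 8):
    assert sum(pow(x, k, p) for x in coset) % p == 0

# hence the sum of any f with deg(f) < |H| over H is |H| times its constant term
f = [random.randrange(p) for _ in range(8)]                 # coefficients f_0..f_7
total = sum(sum(c * pow(x, i, p) for i, c in enumerate(f)) for x in H) % p
assert total == (len(H) * f[0]) % p
```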

Interactive Reduction

The theorem above gives rise to a natural protocol for proving that a polynomial sums to a claimed value $\sigma$ over a multiplicative subgroup $H$ of $\mathbb{F}$. Suppose the prover has $f \in \mathbb{F}[X]$ of degree greater than $|H|$ and claims that $\sum_{x \in H} f(x) = \sigma$. The trick behind the reduction is to ask the prover to split $f$ into provided $g$ and $h$, with $\deg(g) < |H| - 1$, such that: $f(X) = X \cdot g(X) + Z_H(X) \cdot h(X) + \sigma / |H|$, where $Z_H$ is the polynomial of degree $|H|$ that vanishes on $H$. Observe that: $\sum_{x \in H} f(x) = \sum_{x \in H} \left( x \cdot g(x) + \sigma / |H| \right)$, because, by definition, $Z_H(x) = 0$ for all $x \in H$. Now, observe that $X \cdot g(X) + \sigma / |H|$ is a polynomial of degree less than $|H|$ and hence, by the previous theorem: $\sum_{x \in H} \left( x \cdot g(x) + \sigma / |H| \right) = |H| \cdot \sigma / |H| = \sigma$. So we simply "put" the sum divided by the cardinality of $H$ "into" the constant term of the decomposition. The verifier checks that the polynomial is decomposed correctly by querying all the polynomials involved at a random point $r \leftarrow \mathbb{F}$, i.e. check: $f(r) = r \cdot g(r) + Z_H(r) \cdot h(r) + \sigma / |H|$.

Input.

Output.

Protocol

  1. Prover computes $g, h \in \mathbb{F}[X]$, s.t. $f(X) = X \cdot g(X) + Z_H(X) \cdot h(X) + \sigma / |H|$ with $\deg(g) < |H| - 1$.
  2. Verifier samples $r \leftarrow \mathbb{F}$ and checks: $f(r) = r \cdot g(r) + Z_H(r) \cdot h(r) + \sigma / |H|$.
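For intuition, the following Python sketch carries out the whole reduction over a toy field: it divides $f$ by $Z_H(X) = X^{|H|} - 1$, reads off $g$ and the constant term $\sigma / |H|$, and performs the verifier's spot check. The parameters and helper names are illustrative assumptions, and $H$ is taken to be the subgroup itself (offset $1$) for simplicity:

```python
import random

p, m = 17, 8
gen = 9                                        # generator of the order-8 subgroup H of Z_17*
H = [pow(gen, i, p) for i in range(m)]

def poly_eval(f, x):
    acc = 0
    for c in reversed(f):
        acc = (acc * x + c) % p
    return acc

f = [random.randrange(p) for _ in range(20)]   # the prover's polynomial, degree > |H|
sigma = sum(poly_eval(f, x) for x in H) % p    # the claimed sum

# prover: divide f by Z_H(X) = X^m - 1, giving f = h * Z_H + rem with deg(rem) < m
rem, h = list(f), [0] * (len(f) - m)
for i in range(len(f) - 1, m - 1, -1):
    c = rem[i]
    h[i - m] = (h[i - m] + c) % p
    rem[i - m] = (rem[i - m] + c) % p
    rem[i] = 0
# by the corollary, sum_{x in H} f(x) = |H| * rem_0, so rem_0 must equal sigma / |H|
assert rem[0] == (sigma * pow(m, -1, p)) % p
g = rem[1:]                                    # rem(X) = sigma/|H| + X * g(X)

# verifier: spot-check f = X*g + Z_H*h + sigma/|H| at a random point r
r = random.randrange(p)
lhs = poly_eval(f, r)
rhs = (r * poly_eval(g, r) + (pow(r, m, p) - 1) * poly_eval(h, r)
       + sigma * pow(m, -1, p)) % p
assert lhs == rhs
```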

Small Characteristic Sum-Check

The following proofs occurred originally in "Power Sums over Finite Subspaces of a Field" [BC99], but were included in the Aurora paper [BSCRSVW18], in which this sumcheck was introduced. The following details an efficient sumcheck over subspaces of $\mathbb{F}_{q^m}$: vector spaces over $\mathbb{F}_q$ spanned by a set of vectors/elements $b_1, \dots, b_d \in \mathbb{F}_{q^m}$. These techniques are efficient when $q$ is small, for instance $q = 2$.

Let $H$ be an affine subspace of $\mathbb{F}_{q^m}$ and let $0 \leq k < |H| - 1$; then: $\sum_{x \in H} x^k = 0$.

In other words, summing any sufficiently small power of the elements in an affine subspace of $\mathbb{F}_{q^m}$ yields zero.
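Before developing the proof machinery, here is a small numeric sanity check in Python over $\mathbb{F}_{2^4}$; the irreducible polynomial, the basis and the offset are illustrative assumptions of this sketch:

```python
# GF(2^4) arithmetic with the irreducible polynomial x^4 + x + 1 (an illustrative
# choice; any irreducible of degree 4 works). Field addition is XOR.
M, MODULUS = 4, 0b10011

def gf_mul(a, b):
    acc = 0
    while b:
        if b & 1:
            acc ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= MODULUS
    return acc

def gf_pow(a, k):
    acc = 1
    for _ in range(k):
        acc = gf_mul(acc, a)
    return acc

# a 3-dimensional affine subspace: offset + span{1, x, x^2} over F_2
offset, basis = 0b1001, [0b0001, 0b0010, 0b0100]
H = [offset ^ (c0 * basis[0]) ^ (c1 * basis[1]) ^ (c2 * basis[2])
     for c0 in (0, 1) for c1 in (0, 1) for c2 in (0, 1)]
assert len(set(H)) == 8

# every power x^k with k < |H| - 1 (equivalently, binary digit sum of k < 3) sums to zero
for k in range(0, 7):
    total = 0
    for x in H:
        total ^= gf_pow(x, k)
    assert total == 0
```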

The generalized derivative of a polynomial $f$ in the direction $b \in \mathbb{F}_{q^m}$ is defined as: $(\partial_b f)(X) = \sum_{c \in \mathbb{F}_q} f(X + c \cdot b)$

For $t > 1$ we inductively define the derivative in the directions $b_1, \dots, b_t$ as: $\partial_{b_1, \dots, b_t} f = \partial_{b_t}\left( \partial_{b_1, \dots, b_{t-1}} f \right)$

Observe that the summation is over the subfield $\mathbb{F}_q$, not the whole field $\mathbb{F}_{q^m}$, whereas the "direction" $b$ is from the whole field. In other words, the direction is a vector (extension field element), and we consider all steps (scalar multiples) in the direction of this vector. When there are multiple directions, we consider the sum over all the combinations: $(\partial_{b_1, \dots, b_t} f)(X) = \sum_{c_1, \dots, c_t \in \mathbb{F}_q} f(X + c_1 b_1 + \dots + c_t b_t)$

The relevance to the question at hand is clear: if $b_1, \dots, b_d$ are a basis of the affine subspace $H = \zeta + \mathrm{span}\{b_1, \dots, b_d\}$, then: $(\partial_{b_1, \dots, b_d} f)(\zeta) = \sum_{c_1, \dots, c_d \in \mathbb{F}_q} f(\zeta + c_1 b_1 + \dots + c_d b_d) = \sum_{x \in H} f(x)$

Which is exactly the sum of $f$ over all elements in the affine space $H$.

Next, we define a notion of “weight”. The trick is going to be induction over this “metric”: it measures the “size” of a polynomial, and we will prove by induction that the statement holds up to a given “size”, which happens to contain all polynomials of degree less than $|H| - 1$.

The $q$-nary digit sum of a number is the sum of the digits in its $q$-nary representation. Let $k \in \mathbb{N}$ and write it as: $k = \sum_i k_i \cdot q^i$

Where $0 \leq k_i < q$. Then the $q$-nary digit sum of $k$ is defined as: $s_q(k) = \sum_i k_i$

So, the induction is going to be over the "size" of the exponents in the polynomial, where "size" is measured by the $q$-nary digit sum of the degree of the polynomial. We introduce the notion of "weight" for a polynomial, which is simply the maximum $q$-nary digit sum of the exponent of any (non-zero) monomial.

The weight of a polynomial is the maximum $q$-nary digit sum of the exponent of any (non-zero) monomial. Let: $f(X) = \sum_i a_i \cdot X^i$

Where $a_i \in \mathbb{F}_{q^m}$. Then: $\mathrm{wt}(f) = \max \{\, s_q(i) : a_i \neq 0 \,\}$

Our first claim is that applying $\partial_b$ reduces the "weight" of any polynomial: $\mathrm{wt}(\partial_b f) \leq \max\{\, \mathrm{wt}(f) - (q - 1),\ 0 \,\}$

With the max simply being there, because the weight of the polynomial is always non-negative.

We also claim that if $\mathrm{wt}(f) < q - 1$, then $\partial_b f = 0$.

Let $f(X) = X^k$ be a monomial; by linearity of $\partial_b$ it suffices to prove both claims for monomials.

The rewrites are as follows:

  1. Expanding the definition of $\partial_b$.
  2. Is the binomial expansion of $(X + c \cdot b)^k$.
  3. Rearranging sums.
  4. Applying the fact that $\sum_{c \in \mathbb{F}_q} c^t = 0$ unless $t$ is a positive multiple of $q - 1$.
  5. Removes the $c$ in the exponents: since $c \in \mathbb{F}_q$, $c^{q-1} = 1$ for all $c \neq 0$.
  6. When $t = 0$ then $\sum_{c \in \mathbb{F}_q} c^0 = q = 0$ in $\mathbb{F}_{q^m}$.

The theorem trivially extends to polynomials of degree less than $|H| - 1$: since the sum over the affine space of every monomial of degree less than $|H| - 1$ is zero, so must be the sum of any linear combination of such monomials, i.e. the sum of the polynomial over the affine space: $\sum_{x \in H} f(x) = 0$.

Arithmetization

In proof-system parlance, arithmetization is the process of reducing statements about some model of computation into algebraic relations between polynomials, which makes them suitable for the algebraic techniques described above.

Multilinear Ext. of Branching Programs

The techniques in this section were first explored by Holmgren and Rothblum [HR18].

Introduction

For a general function $f : \{0,1\}^n \to \mathbb{F}$, we can evaluate its multilinear extension as: $\tilde{f}(\vec{r}) = \sum_{\vec{w} \in \{0,1\}^n} f(\vec{w}) \cdot \prod_{i=1}^{n} \big( r_i w_i + (1 - r_i)(1 - w_i) \big)$

This requires $O(2^n)$ time, linear in the size of the domain of the function, because we enumerate every possible input. For a black-box function this is the best we can do: the smallest possible description of the function may be its evaluation table. However, it turns out that for certain classes of functions, we can do better, much better. One such class is the class of read-once branching programs, which we will now explore.
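A direct Python sketch of this naive evaluation; the field, the XOR example and the helper name `mle` are illustrative choices:

```python
from itertools import product

p = 2**61 - 1     # assumption: a prime field

def mle(table, r):
    # naive O(2^n) evaluation of the multilinear extension of f: {0,1}^n -> Z_p,
    # where `table` maps each w in {0,1}^n to f(w)
    acc = 0
    for w in product((0, 1), repeat=len(r)):
        eq = 1
        for wi, ri in zip(w, r):
            eq = eq * ((ri if wi else 1 - ri) % p) % p
        acc = (acc + table[w] * eq) % p
    return acc

table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}   # f = XOR on two bits
assert mle(table, (0, 1)) == 1                          # agrees with f on the cube
print(mle(table, (3, 5)))                               # and is defined off the cube too
```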

Read-Once Branching Programs

Let $\Sigma = \{0,1\}^b$ be the alphabet, with $b$ the (bit) size of a single symbol. Let $n$ be the length of the branching program. A Read-Once Branching Program is a function: $P : \Sigma^n \to \mathbb{F}$

Implemented by "walking" through a directed graph: starting at a vertex designated the "source", until you reach the "sink" vertexes, each of which are labeled with a field element.

A picture is worth a thousand words.

Let b = 1, n = 4, in other words: $P : \{0,1\}^4 \to \mathbb{F}$

The program looks like this:

Read-once branching program generated from DOT, showing the same structure programmatically

The program has length 4 and width 2.

  • What is the output of the program on 0 0 1 1?
  • What is the output of the program on 1 0 1 0?
  • What does this program do?

Let b = 3, n = 2. In other words: $P : (\{0,1\}^3)^2 \to \mathbb{F}$

The program looks like this:

Read-once branching program with alphabet size b=3, showing multi-labeled edges between states over 2 steps

This simply means that there are e.g. 4 edges from the (teal) source node with the labels 000, 101, 011, 110 going to the top green node. We omit these to avoid cluttering the diagram with a lot of edges. The program has length 2, because it takes two symbols from the alphabet $\{0,1\}^3$.

  • What is the width of this Read-Once Branching Program?
  • What is the output of the branching program on 000 010?
  • What is the output of the branching program on 011 110?
  • What does this program do?

Matrix Branching Programs

To "arithmetize" the read-once braching programs, instead express them as Matrix Branching Programs.

For simplicity, we let $b = 1$, i.e. $\Sigma = \{0,1\}$. The technique extends trivially to larger $b$.

A matrix branching program consists of $n$ pairs of matrices $(M_{i,0}, M_{i,1})$ with $M_{i,0}, M_{i,1} \in \mathbb{F}^{w \times w}$ for $i = 1, \dots, n$.

And a sink vector $\vec{t} \in \mathbb{F}^w$.

To evaluate the Matrix Branching Program on $x \in \{0,1\}^n$, compute: $\vec{v} = M_{1,x_1} \cdot M_{2,x_2} \cdots M_{n,x_n} \cdot \vec{t}$

The output of the MBP is $v_1$, the first entry of $\vec{v}$. Observe that even though matrix multiplication is linear, Matrix Branching Programs (MBPs) are not linear in $x$: the picking of each matrix from the pairs gives them some "discrete" structure.
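A minimal Python sketch of MBP evaluation; the width-2 matrices below (identity for $0$, swap for $1$) form an illustrative parity program, not the example from the figures:

```python
# toy matrix branching program: width 2, length 3, alphabet {0,1}
n = 3
I2   = [[1, 0], [0, 1]]
SWAP = [[0, 1], [1, 0]]
M = [(I2, SWAP)] * n          # (M_{i,0}, M_{i,1}) for each step i
t = [0, 1]                    # sink labels

def mat_vec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def evaluate(x):
    # v = M_1(x_1) * ... * M_n(x_n) * t; the output is the first entry of v,
    # since the walk starts at the first node of the first stage
    v = t
    for i in reversed(range(n)):
        v = mat_vec(M[i][x[i]], v)
    return v[0]

assert evaluate((0, 0, 0)) == 0
assert evaluate((1, 0, 0)) == 1
assert evaluate((1, 1, 0)) == 0    # this particular program computes the parity of x
```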

Adjacency Matrices: ROBP → MBP

Walks in directed graphs can be expressed by application of an adjacency matrix $A$, which has $A_{i,j} = 1$ iff there is an edge from node $i$ to node $j$, and $A_{i,j} = 0$ otherwise. Hence it should be no surprise that we can easily convert Read-Once Branching Programs to Matrix Branching Programs: the trick is to define pairs of matrices $(M_{i,0}, M_{i,1})$:

Such that $M_{i,0}$ describes the transitions from the current stage to the next when the input symbol is $0$, and $M_{i,1}$ describes the transitions from the current stage to the next when it is $1$. Letting $\vec{s} = \vec{e}_1$, meaning the "active" node is the first node in the stage, we evaluate the program by computing: $\vec{s}^\top \cdot M_{1,x_1} \cdot M_{2,x_2} \cdots M_{n,x_n}$

Where $\vec{e}_1$ has a $1$ in the first coordinate and $0$ everywhere else.

An example and an image:

Recall this guy:

Read-once branching program generated from DOT, showing the same structure programmatically

If we look just at the 0-labelled edges:

Graph showing only the 0-labeled edge transitions in the branching program

We can describe the transitions using this sequence of matrices:

The way to read these: the row indicates the current node, the column the next node. There is a $1$ iff the current node goes to the next node on input $0$.

Similarly, we can look at the “1 edges”:

Graph showing only the 1-labeled edge transitions in the branching program

As a sanity check, let’s check the output on 1010:

Step-by-step:

  1. The program is in the bottom state:
  2. Leaves the program in the bottom state:
  3. Changes the program state to the top:
  4. Leaves the program state in the top state:

By convention the Matrix Branching Program produces as output the first entry of the vector produced by the product we just computed.

  1. What if the label of the second sink was not 0? What if the label of the first sink was not 1? What if they were switched?
  2. What if my branching program had multiple sinks with the same label?
  3. What if some of the sink labels were not 0/1?
    e.g. how do I make the ROBP output instead of ?

Hint: Use “single row” matrices, i.e. matrices in $\mathbb{F}^{1 \times w}$.

Symbolic Evaluation

A note about Symbolic Evaluation of MBPs:

  1. By computing: $M = M_{1,x_1} \cdot M_{2,x_2} \cdots M_{n,x_n}$ and taking the first row, equivalently: $\vec{u}^\top = \vec{e}_1^\top \cdot M_{1,x_1} \cdots M_{n,x_n}$, where $\vec{e}_1 = (1, 0, \dots, 0)$.

  2. We can compute the output for any sink vector $\vec{t}$ as $\vec{u}^\top \cdot \vec{t}$.

This means that for an input $x$ we can "precompute" a vector $\vec{u}$ of length $w$ which allows us to evaluate the branching program for any sink vector: e.g. we can explore which node we would end up in if we "started" in another node than the source.

Multilinear Basis Polynomials

All the "non-linearlity" of a Matrix Branching Program is "contained" in the selection of the correct matrix. To implement this section, we use multilinear Lagrange polynomials:

Such that for $\vec{x}, \vec{w} \in \{0,1\}^n$ with $\vec{x} = \vec{w}$ we have: $\mathrm{eq}_{\vec{w}}(\vec{x}) = 1$

And otherwise $\mathrm{eq}_{\vec{w}}(\vec{x}) = 0$.

Low-Degree Extensions of Matrix Branching Programs

Let's start small, with an MBP of length 1. I claim that: $\tilde{P}(r) = \vec{e}_1^\top \cdot \big( (1 - r) \cdot M_{1,0} + r \cdot M_{1,1} \big) \cdot \vec{t}$

Is the low-degree extension of the MBP.

  • Pause and think: what does this expression “do”?
  • Convince yourself: when $r = 0$ and $r = 1$ it is correct.
  • Why is it the unique “multi“linear extension?

This trick extends to MBPs of length greater than $1$:

  1. We compute the "mixed" matrixies:
  2. We compute their product:
  3. Compute and take the first component.

Another, completely equivalent method:

  1. Define: $\vec{v}_0^\top = \vec{e}_1^\top$, where $\vec{e}_1 = (1, 0, \dots, 0)$, which has the effect of "taking" the first row.
  2. Iteratively compute: $\vec{v}_i^\top = \vec{v}_{i-1}^\top \cdot \big( (1 - r_i) \cdot M_{i,0} + r_i \cdot M_{i,1} \big)$
  3. Output $\vec{v}_n^\top \cdot \vec{t}$.

Which allows you to compute the low-degree extension in a streaming way, if needed for your application.
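Here is a hedged Python sketch of the streaming method, checked against the naive $O(2^n)$ extension computed from the truth table; the toy parity MBP and the helper names are illustrative assumptions of this sketch:

```python
from itertools import product

p = 2**31 - 1                       # assumption: a prime field
n = 3
# illustrative width-2 parity MBP again: identity matrix for 0, swap matrix for 1
M = [([[1, 0], [0, 1]], [[0, 1], [1, 0]])] * n
t = [0, 1]

def vec_mat(v, A):
    # row vector times matrix, modulo p
    return [sum(v[j] * A[j][i] for j in range(len(v))) % p for i in range(len(A[0]))]

def mbp_eval(x):
    v = [1, 0]                                   # e_1: start at the first node
    for i in range(n):
        v = vec_mat(v, M[i][x[i]])
    return sum(a * b for a, b in zip(v, t)) % p  # dot with the sink vector

def mbp_mle(r):
    # streaming: v_i = v_{i-1} * ((1 - r_i) * M_{i,0} + r_i * M_{i,1}), one step per symbol
    v = [1, 0]
    for i in range(n):
        A0, A1 = M[i]
        mixed = [[((1 - r[i]) * a0 + r[i] * a1) % p for a0, a1 in zip(row0, row1)]
                 for row0, row1 in zip(A0, A1)]
        v = vec_mat(v, mixed)
    return sum(a * b for a, b in zip(v, t)) % p

def naive_mle(r):
    # O(2^n) evaluation from the truth table, for comparison
    acc = 0
    for w in product((0, 1), repeat=n):
        eq = 1
        for wi, ri in zip(w, r):
            eq = eq * ((ri if wi else 1 - ri) % p) % p
        acc = (acc + mbp_eval(w) * eq) % p
    return acc

r = [12345, 678, 91011]
assert mbp_mle(r) == naive_mle(r)                                  # agree off the cube
assert all(mbp_mle(list(w)) == mbp_eval(w) for w in product((0, 1), repeat=n))
```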

Let’s compute the multilinear extension for the simple MBP from the earlier example:

Read-once branching program generated from DOT, showing the same structure programmatically

Recall the matrices are: for , and:

Let’s compute the multilinear extension and evaluate it at .

First, compute the mixed matrices :

For with :

For with :

Now compute using the streaming method with :

  1. Start:

Finally:

This makes sense! The multilinear extension at gives us the “average” of and .

Why is this the unique multilinear extension of the branching program?

Hint, we need to check:

  • $\tilde{P}$ is multilinear
  • $\tilde{P}$ agrees with $P$ on $\{0,1\}^n$

Protocols

Bibliography

[BSCRSVW18] Eli Ben-Sasson, Alessandro Chiesa, Michael Riabzev, Nicholas Spooner, Madars Virza, Nicholas P. Ward. Aurora: Transparent Succinct Arguments for R1CS. 2018.

[CHMMVW19] Alessandro Chiesa, Yuncong Hu, Mary Maller, Pratyush Mishra, Psi Vesely, Nicholas Ward. Marlin: Preprocessing zkSNARKs with Universal and Updatable SRS. 2019.

[BC99] Nigel P. Byott, Robin J. Chapman. Power Sums over Finite Subspaces of a Field. 1999.