Have you checked your sums?

Introduction
There has recently been a growing interest in zk-SNARKs (zero-knowledge, succinct, non-interactive arguments of knowledge) due to their capabilities in decentralized private computations and scaling blockchains. These constructions involve a protocol between two parties, a prover and a verifier, where the former attempts to convince the latter of the validity of a given statement. Sometimes, the prover tries to do this without revealing sensitive information. We want the work needed for the verifier to check the statement to be significantly smaller than just doing it himself. For example, we would like to delegate an expensive computation to an untrusted server (for which we do not have the necessary resources) and be able to verify the correctness of the computation using a smartphone. The zero-knowledge property allows us to prove the possession of some secret (such as a private key or the preimage of some hash) without giving that information to the verifier. At the heart of these constructions, we have polynomials and can reduce the statement to some relation between polynomials. For example, STARKs uses univariate polynomials and the FRI protocol to prove the correctness of a given computation. The sumcheck protocol, which involves polynomials in several variables, can be used to build SNARKs.
In this post, we will first describe how to encode vectors as multilinear polynomials (similar to how we encoded vectors as univariate polynomials) and how the sumcheck protocols work. We are currently implementing the sumcheck protocol and multilinear polynomials as part of the learning path of the Sparkling Water Bootcamp; you can follow the development at Lambdaworks.
Encoding vectors as multilinear polynomials
A polynomial p in n variables is called multilinear if the degree of each variable xi is at most one in every term. For example, p1(x1,x2,x3,x4)=x1+2x2+x1x2x4x3 is a multilinear polynomial because the power of each xi is either 0 or 1 in each term. The polynomial p2(x1,x2)=x1x22 is not, since the degree of x2 is 2. The total degree of a multilinear polynomial is the highest sum of all the powers of a term (monomial). For p1, this is 4. For multilinear polynomials, the maximum degree is at most m.
We will restrict ourselves now to polynomials defined over the set D=0,1m. Given a function f defined over D, we can define a multilinear polynomial p(x1,x2,...,xm) such that p coincides with f over the set D, that is p(x)=f(x) for every x∈D. Since this polynomial is unique, the polynomial p is called the multilinear extension of f.
We can use the multilinear extension to represent a vector v containing 2m elements. Suppose the vector v 's elements belong to some finite field F. We first create the function f:D→F, which maps each element of D into an element of v. One easy way to do this is by representing the position in the vector k in its bit form. For example, if the vector has 256 elements, we need 8 variables (bits), and we can define the map as:
f(0,0,0,0,0,0,0,0)=v0
f(0,0,0,0,0,0,0,1)=v1
f(0,0,0,0,0,0,1,0)=v2
f(0,0,0,0,0,0,1,1)=v3
⋮
f(1,1,1,1,1,1,1,1)=v255
In general form, we assign to a tuple (x0,x1,...xm−1) the value corresponding to index k=x0+2x1+4x2+⋯+2m−1xm−1. Then, we can use the fact that the multilinear extension of f exists and create it by Lagrange interpolation, for example. Thus,
p(x0,x1,...xm−1)=∑x0,...,xm−1f(k)Bk(x0,x1,...,xm−1)
where Bk is the Lagrange basis polynomial, which equals one when (x0,x1,...,xm−1) corresponds to the binary representation of k and zero otherwise. If we represent k=(k0,k1,...km−1) (remember each ki is either 0 or 1), the function Bk(x0,x1,...xm−1) has the explicit expression
Bk(x0,x1,...,xm−1)=∏(xiki+(1−xi)(1−ki))
For example, if we have the vector v=(2,5,7,8), we have four Lagrange basis polynomials:
B0(x0,x1)=(1−x0)(1−x1)=1−x1−x0+x1x0
B1(x0,x1)=x0(1−x1)=x0−x0x1
B2(x0,x1)=(1−x0)x1=x1−x0x1
B3(x0,x1)=x0x1
and
p(x0,x1)=2B0+5B1+7B2+8B3
Replacing everything,
p(x0,x1)=2+3x0+5x1−2x0x1
This way, we have encoded our vector as a multilinear polynomial in two variables. We could generally encode a vector of length n as a polynomial in ⌈log2(n)⌉ variables. We can then use this encoding to reduce the validity of some calculation to the sum of this polynomial over all possible values of x0,x1...xn.
The sumcheck protocol
The sumcheck protocol is an interactive proof introduced in 1992 with a fundamental role in the theory of probabilistic proofs in complexity theory and cryptography, leading to the construction of succinct arguments. One of its essential properties is that the prover can be implemented in a number of operations that scale linearly (that is, its running time is O(n)), which has a better asymptotic complexity than algorithms based on the Fast Fourier Transform (O(nlogn)). It also provides the basis for folding techniques for Pedersen commitments in the discrete logarithm setting. For an in-depth explanation of the protocol, look at proofs, arguments and zero-knowledge and sumcheck arguments and their applications.
The sumcheck protocol yields an interactive proof for statements of the form
∑x∈Hmp(x)=S
that is, the sum of all the evaluations of an m-variate polynomial over a domain equals S. The prover is given the polynomial p(x), and the verifier will send him random challenges, rk, from a set C and receive polynomials qk(x), which will allow him to be convinced that the statement is true. The protocol will reduce the workload of the verifier from having to evaluate the m-variate polynomial over |H|m (for example, if the size of H, |H|, is two and we have 16 variables, we need to do 216 evaluations) to a single evaluation over a random point over Fm, plus some additional smaller operations.
The protocol proceeds in rounds and works as follows:
- The prover sends to the verifier the polynomial
qk(x)=∑aj∈H,j≥k+1p(r1,r2,...,rk−1,x,ak+1,...am) - The verifier checks that ∑a1∈Hq1(a1)=S and ∑ak∈Hqk(ak)=qk−1(rk−1).
- If all checks pass, the verifier outputs v=qm(rm) and outputs r1,r2,...,rm,v.
Let's explain the protocol in simple terms. In the first round, the prover sends the verifier a polynomial q1(x1) by summing over all possible values of the rest of the variables. This way, the verifier can check the sum by evaluating the polynomial q1(x1) over all its values, which is much faster than summing over all the variables. However, how does the verifier know that the prover did not cheat and send some fake polynomial q1(x1)? The verifier sends a random challenge r1, and the prover responds with a new polynomial of one variable, q2(r1,x2), which is obtained by fixing the first coordinate and summing over all the other variables except x2. If we evaluate q1(r1), we should get the same as adding over all possible values of q2(x2) (because q1 was obtained by summing over all values of x2). The verifier always has to do a few evaluations of a univariate polynomial.
If the challenge subset C is a sampling subset, then the sumcheck protocol satisfies:
a. Completeness.
b. Soundness, where the soundness error is bounded by md/|C| (the number of variables, the maximum degree in the polynomial, and the number of elements in the challenge subset).
In many cases, we would like to work with Hm={0,1}m, so that x=(x1,x2,...,xm) is the collection of all bitstrings of length m and we can use the encoding for vectors as multilinear polynomials.
To make the sumcheck protocol zero-knowledge, we need to mask the polynomial. We can achieve this by adding a random polynomial.
Conclusion
In this post, we covered the sumcheck protocol, which is at the heart of some SNARKs. It allows the verifier to check that the sum of the evaluations of some multivariate polynomial over a set is equal to some number by delegating most of the computational burden to the prover. The protocol involves a number of rounds equal to the number of variables, where the prover sends at each round a univariate polynomial, and the verifier responds by sending a random challenge. The verifier's highest cost is involved in evaluating the multivariate at one random point, significantly less than trivial verification. In an upcoming post, we will cover how to implement the sumcheck protocol from scratch.