PSI with FHE

This work was done by Gokay Saldamli and Lisandro (Lichu) Acuña at Semiotic Labs. Special thanks to Jonathan Passerat-Palmbach and the broader Flashbots team for their collaboration and comments on the project.

Private Set Intersection with Fully Homomorphic Encryption

Private Set Intersection (PSI) is a cryptographic technique that plays a crucial role in safeguarding data privacy. It enables two parties to identify common elements in their sets without leaking any information about the rest of the elements. This can be used, for example, in secure contact tracing during epidemics or in privacy-preserving human genome testing.

Over the last year, Semiotic Labs has been studying the possible applications of this powerful cryptographic technique to blockchain infrastructure problems, particularly in the context of Maximal Extractable Value (MEV). Through discussions with the Flashbots team, we arrived at the conclusion that an efficient PSI protocol could add a lot of value to the field, for example in access list comparison (EIP-2930) or as a primitive to enable private auctions. This is why we decided to build such a PSI protocol, and in this article we will detail how it works, covering everything from the math to the code.

The Math

If you stop for a few minutes to try to devise a PSI protocol, the first idea that will come to mind will probably have to do with hash functions. It is a natural idea to have Alice hash all the items in her set and send these hashes to Bob, who will then hash the elements in his set and compare them to Alice’s hashed elements. This scheme works and is extremely fast, but unfortunately, it is also extremely insecure, as it can leak Alice’s inputs if the input space is small – Bob could simply hash all the elements in the input space and compare them to Alice’s hashed elements.

Nowadays, most existing and functional PSI implementations rely on an external server (they are “server-aided protocols”). In this case, Alice and Bob learn nothing about each other’s set except the shared elements, and nothing is visible to the server as long as there are no collusions. However, when the server colludes with one of the actors (i.e., they “cooperate”), the protocol becomes completely insecure and the server and its colluding party can read all elements of the other party’s set.

In the PSI protocol we constructed, we use Fully Homomorphic Encryption (FHE) to eliminate reliance on an external server. As you may know, FHE is a family of cryptographic techniques that enable computing directly on encrypted data, preserving data privacy throughout processing. FHE was the original focus of Semiotic when the team started four years ago, and we spent a lot of time working on it. Using FHE to perform PSI in a completely trustless manner is a nod to the past for the team.

Our solution is based on Chang et al.’s “Fast Private Set Intersection from Homomorphic Encryption”. Extracted directly from their paper, this is how their PSI protocol works:

  1. Context: Alice has a set $\mathcal{A} = {a_1, …, a_{N_A}}$ and Bob has a set $\mathcal{B} = {b_1, …, b_{N_B}}$.
  2. Setup: Alice and Bob jointly agree on an FHE scheme, and Alice generates a public-secret key pair for it, sending the public key to Bob and keeping the secret key for herself.
  3. Set encryption: Alice encrypts each element $a_i$ in her set $\mathcal{A}$ using the FHE scheme, and sends the $N_A$ ciphertexts $c_1, …, c_{N_A}$ to Bob.
  4. Computing intersection: For each $c_i$, Bob samples a random non-zero element $r_i$ and homomorphically computes: $$d_i = r_i \prod_{j=1}^{N_B} (c_i – b_j)$$ After doing this for all $c_i$, Bob sends the ciphertexts $d_1, …, d_{N_A}$ to Alice.
  5. Reply extraction: Alice decrypts the ciphertexts $d_1, …, d_{N_A}$.

Understanding these steps may take some time, so feel free to stop and go through them again. Steps 1 and 2 are pretty much just the problem statement and setup. In step 3, Alice encrypts her own set of elements, which Bob will not be able to decrypt as he doesn’t have Alice’s secret key. Step 3 is the key step, and where we actually make use of FHE: the product $\prod_{j=1}^{N_B} (c_i – b_j)$ is the encryption under Alice’s secret key of the product $\prod_{j=1}^{N_B} (a_i – b_j)$, as $c_i$ is the encryption of $a_i$ and we are running this computation in FHE. Observe that $\prod_{j=1}^{N_B} (a_i – b_j)$ will be 0 if and only if one of $b_1, …, b_{N_B}$ (Bob’s set elements) equals $a_i$, and therefore $\prod_{j=1}^{N_B}(c_i – b_j)$ will be the encryption of 0 precisely when this happens, same as $r_i \prod_{j=1}^{N_B}(c_i – b_j)$. If this is not the case, i.e. no $b_j$ equals $a_i$, the term $r_i$ will act as a “randomizer”, making the final product look basically random. Observe that Bob, who performed the computation, doesn’t know if the product ended up being 0 or not, as $d_i$ is the encrypted result of the computation. Only Alice can decrypt this result, which is exactly what she does in step 4. It’s worth noticing that, if both Alice and Bob want to get the intersection, they have to run the protocol twice, interchanging roles.

Once you understand these steps, it is quite intuitive that this PSI protocol works, and its privacy comes from doing all the computation on encrypted data. However, there is a small problem. When running computation over FHE, there is a “noise” that increases with each operation and almost doubles with multiplication. This noise cannot exceed a certain threshold, or else the outputs will be erroneous. In this sense, the multiplicative depth of a circuit is defined as the number of sequential homomorphic multiplications that are computed in our FHE circuit, a parameter that cannot be too high, or else the noise will be too high.

As you can see, the circuit depth in Chang et al.’s protocol is $N_B$, the size of Bob’s set. This is not ideal, as it makes it impossible to run the protocol for relatively large sets. Thus, we modified the protocol and added an optimization that keeps the circuit depth fixed regardless of the size of the sets and therefore allows running PSI on sets of any size. Hooray!

This is not as complex as it seems. We simply define a parameter $N$ that represents the maximum circuit depth that our FHE scheme can tolerate and split Bob’s set of size $N_B$ into $\lceil \frac{N_B}{N} \rceil$ subsets of size at most $N$ each. Next, Alice runs the original protocol with each of Bob’s subsets separately, having a circuit of depth at most $N$ each time.

This leads to an incredibly fast and scalable protocol, with runtime growing linearly on the size of Bob’s set. Moreover, the separate private intersections with each subset of Bob’s set can be run in parallel, leading to an even faster protocol.

The Code

We implemented the protocol using node-seal, a wrapper library of Microsoft SEAL, the most widely used FHE library both in the academia and in the industry. We will walk through the important pieces of this implementation here, but you can also access our public GitHub repository (http://github.com/LichuAcu/psi-demo) where you can read the whole code and try the protocol yourself using a visual interface.

The core logic of the protocol is implemented in the logic.tsx file. Each step in our PSI description is implemented as a separate function.

For Step 1, for example, we see in the setup function how the public and secret keys are created:

const keyGenerator = seal.KeyGenerator(context);
const publicKey = keyGenerator.createPublicKey();
const secretKey = keyGenerator.secretKey();

For Step 2, Alice’s set is encrypted in the alice_encrypt_locations functions:

const set_ciphertexts_alice = encryptor.encrypt(set_plaintexts_alice) as CipherText;

Step 3 is arguably the most interesting one! In the bob_homomorphic_operations function, we first split Bob set into subsets:

const sets_plaintexts_bob = [];
for (let i = 0; i < locations_bob.length; i += batch_size) {
    const batch = locations_bob.slice(i, i + batch_size);
    sets_plaintexts_bob.push(Int32Array.from(batch));
}

Next, each subset is intersected with Alice’s set, following the logic specified in Step 3:

sets_plaintexts_bob.forEach((set_plaintexts_bob) => {
    const final_product = this.seal.CipherText();

    // Homomorphically initialize result to first Alice's element - first Bob's element
    const first_element_bob = Int32Array.from(Array(set_alice_length).fill(set_plaintexts_bob[0]));
    const first_element_bob_encoded = this.encoder.encode(first_element_bob) as PlainText;
    this.evaluator.subPlain(set_ciphertexts_alice, first_element_bob_encoded, final_product);

    for (let i = 1; i < set_plaintexts_bob.length; i++) {
        const ith_element_bob = Int32Array.from(Array(set_alice_length).fill(set_plaintexts_bob[i]));
        const ith_element_bob_encoded = this.encoder.encode(ith_element_bob) as PlainText;
        const temp = this.seal.CipherText();
        this.evaluator.subPlain(set_ciphertexts_alice, ith_element_bob_encoded, temp);
        this.evaluator.multiply(final_product, temp, final_product);
    }

    let random_plaintext = new Int32Array(set_alice_length);
    let random_plaintext_encoded: PlainText;
    getRandomValues(random_plaintext);
    random_plaintext_encoded = this.encoder.encode(random_plaintext) as PlainText;
    this.evaluator.multiplyPlain(final_product, random_plaintext_encoded, final_product);

    const final_product_string = final_product.save();
    counter++;
    final_products.push(final_product_string);
});

return final_products;

The evaluator object in the above code is in charge of homomorphically performing our mathematical operations. Its methods take the operands of our operation as its first two inputs, and the third input is the variable in which the result will be stored. Observe how we sometimes call the multiply method while other times we call the multiplyPlain one; this is because in the former we are multiplying two encrypted values together, while in the latter we are multiplying an encrypted value by a plain value.

Finally, Step 4 is fairly simple:

for (const final_product of final_products) {
    let final_product_ciphertext: CipherText = this.seal.CipherText();
    final_product_ciphertext.load(this.context, final_product);
    const decrypted = this.decryptor.decrypt(final_product_ciphertext) as PlainText;
    const decoded = this.encoder.decode(decrypted);
    for (let i = 0; i < set_alice_length; i++) {
        if (decoded[i] == 0) {
            // The i-th element is in the intersection!
        }
    }
    counter++;
}

Benchmarks

Now that we know why and how our protocol works, let’s see it in action! Since network connectivity and delay may vary depending on the context in which the protocol is used, we ran our benchmarks performing Alice’s and Bob’s calculations locally, thus ignoring the communication time. To estimate it, we should consider the $N_A$ ciphertexts that Alice sends in Step 2 and the $N_A$ ciphertexts that Bob sends in Step 3.

All these benchmarks were run in a 64 GB M2 Max MacBook Pro. Our first set of benchmarks shows something that can be easily intuited from the protocol design, but is worth looking at in numbers: that the size of the intersection does not affect the execution time. We can check this by keeping all other parameters fixed, changing the size of the intersection and observing that the execution time remains basically the same:

Alice’s set sizeBob’s set sizeIntersection sizeRunning time (ms)
100100133884
100100253811
100100503846
1001001003834

Thus, we will not take this parameter into account in the rest of our benchmarks.

Let’s now see what happens when the size of Bob’s set increases. Since Bob’s set will be divided into a number of subsets directly proportional to the size of his original set, it would make sense to expect the runtime to grow linearly as the size of Bob’s set grows. That’s exactly what happens! Take a look at it:

Alice’s set sizeBob’s set sizeRunning time (ms)
100501919
1001003811
1002007646
10040015286
10080030496

Finally, we run the same experiments with different sizes for Alice’s set. It turns out that because of the way Microsoft SEAL is implemented in an array-first way, increasing the size of the Alice set (up to an upper-bound) does not increase the execution time. You can see this in the following benchmarks chart, where running time only changes when Bob’s set size does:

Note that the graph looks exponential because the X-axis is in logarithmic scale, which is actually a visual indication of the underlying linear relationship between running time and Bob’s set size indicated above.

Limitations and conclusion

Overall, these benchmarks show a pretty fast protocol! Performing a completely trustless FHE-based PSI on arbitrarily large sets in just a few seconds is a very good result.

However, there are some design limitations to it that should be kept in mind. The main one is that this protocol is designed for only two entities, and we have not found an efficient way to extend it to more. In the context of access lists, for example, it would probably be desirable to ensure that no element is repeated in any pair of all access lists (i.e., that each storage location is in at most one access list). This could be achieved if each party executed the PSI protocol with each other party, but this would involve $o(n^2)$ executions of the protocol across the network, which is not ideal.

If you have any thoughts on how to solve this limitation, or want to discuss other ideas from this project, we look forward to hearing from you! We are also very interested in hearing about other applications you can think of for this efficient PSI protocol. Feel free to shoot me an email, submit a PR to our GitHub repo or ping me on Twitter at @lichuacu. 🫡

Introduction to the Sum-Check Protocol

TL;DR

It is expensive to run transactions on the Ethereum EVM. Verifiable computing (VC) lets us outsource computing away from the EVM. Today, a popular and exciting form of VC algorithm is the SNARK. There are various families of SNARKs that use the sum-check protocol, which is a simple algorithm to introduce VC. This is a tutorial on the sum-check protocol. This post is focused on how the sum-check protocol is implemented – it does not go into theory. You can skip straight to the finished code here.

Thank you to Gokay Saldamli, Gabriel Soule, and Tomasz Kornuta for providing valuable feedback on this article.

Background

Executing code on Ethereum is expensive. Verifiable computing algorithms promise a way to reduce costs, by outsourcing computing to untrusted parties and only verifying the result on-chain. A key point about useful verifiable computing algorithms is that the verification must be less expensive than the original computation. The sum-check protocol is a foundational algorithm in the field of verifiable computing. In a stand-alone setting, sum-check is not particularly useful. However, it is an important building block of more sophisticated and useful SNARK algorithms. This tutorial introduces sum-check as a way to familiarize the reader with a relatively simple verifiable computing algorithm.

The sum-check protocol is used to outsource the computation of the following sum:

$$H := \sum_{x_1 \in \{0,1\}} \sum_{x_2 \in \{0,1\}} \cdots \sum_{x_v \in \{0,1\}} g(x_1, x_2, \cdots, x_v)$$

The sum-check protocol allows a Prover $P$ to convince a Verifier $V$ that $P$ computed the sum $H$ correctly. You can assume that $P$ has ample computational power or ample time to compute, e.g., it could be an untrusted server somewhere on the internet, and $V$ is has limited computational power, e.g., it could be the EVM.

Why would you want to sum a function over all Boolean inputs, as is done in the equation above? One reason would be if you wanted to provide an answer to the #SAT (“sharp sat”) problem. #SAT is concerned with counting the number of inputs to a Boolean-valued function that result in a true output. For example, what is the #SAT of $f(x_1, x_2, x_3) = x_1 \text{ AND } x_2 \text{ AND } x_3$? It is 1, because only a single input, $x_1=1$, $x_2=1$, $x_3 = 1$, results in 1 being output from $f$. You can probably see that it would be computationally intensive for you to solve #SAT for complex problems. With the sum-check protocol, $P$ can do that work instead of you.

Unless you are a complexity theorist, you may not get too excited about the #SAT problem at first glance. However, more practical problems can be mapped to #SAT, and then sum-check can be used. For example, given an adjacency matrix, you can use sum-check to solve for how many triangles are connected. You can also use sum-check to outsource matrix multiplication. Check out the book mentioned next for more details.

In this tutorial, we use the same example as given in Justin Thaler’s book: Proofs, Arguments, and Zero-Knowledge. In Justin’s book, sum-check is used to solve #SAT for the following example function:

$$\phi(x_1, x_2, x_3, x_4) = (\bar{x_1} \text{ AND } x_2) \text{ AND } (x_3 \text{ OR } x_4)$$

where $\bar{x_1}$ means NOT $x_1$. The function $\phi$ (phi) can also be represented graphically as:

In the graph of $\phi$, we see the logic symbols for AND ($\wedge$) and OR ($\vee$). Note that $\phi$ is only defined over the Boolean inputs (0/1) and that $\phi$ has a Boolean output. However, most popular verifiable computing algorithms only support arithmetic operations, so we need to convert $\phi$ to its arithmetic version. The arithmetized version is called $g$. For our example, $g$ is defined as:

$g(x_1, x_2, x_3, x_4) = (1-x_1)x_2((x_3+x_4)-(x_3x_4))$

and visualized as:

In $g$, notice that the Boolean functions: NOT, AND, OR have been compiled into their arithmetic equivalents. Take a moment to convince yourself that if you pick Boolean values for $x_1$, $x_2$, $x_3$, and $x_4$, that $\phi$ and $g$ will output the same thing.

For reference, here are the specific compiler rules used when converting $\phi$ into $g$:

Boolean gate       Arithmetized version
A AND B A*B
A OR B (A+B)-(A*B)
NOT A 1-A

Notice that $g$ has extended the domain of our original Boolean formula and is now valid for Boolean inputs and integers. The facts that $g$ equals $\phi$ when evaluated at Boolean inputs and that integers are valid inputs to $g$ are leveraged by the sum-check protocol.

The sum-check protocol

This section introduces the sum-check protocol. We will follow the same steps and naming convention as used in Justin’s book. The protocol steps are summarized as:

  1. Prover $P$ calculates the total sum of $g$ evaluated at all Boolean inputs
  2. $P$ computes a partial sum of $g$, leaving the first variable $x_1$ free
  3. Verifier $V$ checks that the partial sum and total sum agree when the partial sum is evaluated at 0 and 1 and its outputs added
  4. $V$ picks a random number $r_1$ for the free variable and sends it to $P$
  5. $P$ replaces the free variable $x_1$ with the random number $r_1$ and computes a partial sum leaving next variable $x_2$ free
  6. $P$ and $V$ repeat steps similar to 3–5 for the rest of the variables: $x_2, \cdots, x_v$
  7. $V$ evaluates $g$ at one input using access to an “oracle”, i.e., $V$ must make a singled trusted execution of $g(r_1, r_2, \cdots, r_v)$

If $V$ makes it through Step 7 without error, then $V$ accepts $P$’s proof. Otherwise, $V$ rejects.

We expand the steps of the algorithm in the following sections.

1. $P$ calculates the total sum of $g$ evaluated at all Boolean inputs

$H$ is the sum described in the first equation in this article. In round zero, $P$ should perform the summation described in the first equation. $P$ assigns this sum to $g_0$ and returns it to $V$. Note that this sum is the #SAT solution, i.e., it gives the answer of how many unique inputs to the function result in 1 being output from $g$. The remainder of the sum-check algorithm is designed to ensure that $P$ performed the #SAT solution correctly.

In our example, $g_0 = 3$ will be returned in Step 1. The following table shows $P$’s work where the must iterate over all possible Boolean combinations of $x_1$,$x_2$, $x_3$, and $x_4$ and evaluate $g(x_1, x_2, x_3, x_4) = (1-x_1)x_2((x_3+x_4)-(x_3x_4))$:

$$\begin{aligned}g(0, 0, 0, 0) &= (1-0) \cdot 0 \cdot ((0 + 0)-(0 \cdot 0)) = 0\\
g(0, 0, 0, 1) &= (1-0) \cdot 0 \cdot ((0 + 1)-(0 \cdot 1)) = 0\\
g(0, 0, 1, 0) &= (1-0) \cdot 0 \cdot ((1 + 0)-(1 \cdot 0)) = 0\\
g(0, 0, 1, 1) &= (1-0) \cdot 0 \cdot ((1 + 1)-(1 \cdot 1)) = 0\\
g(0, 1, 0, 0) &= (1-0) \cdot 1 \cdot ((0 + 0)-(0 \cdot 0)) = 0\\
g(0, 1, 0, 0) &= (1-0) \cdot 1 \cdot ((0 + 1)-(0 \cdot 1)) = 1\\
g(0, 1, 1, 0) &= (1-0) \cdot 1 \cdot ((1 + 0)-(1 \cdot 0)) = 1\\
g(0, 1, 1, 0) &= (1-0) \cdot 1 \cdot ((1 + 1)-(1 \cdot 1)) = 1\\
g(1, 0, 0, 0) &= (1-1) \cdot 0 \cdot ((0 + 1)-(0 \cdot 1)) = 0\\
g(1, 0, 0, 1) &= (1-1) \cdot 0 \cdot ((0 + 1)-(0 \cdot 1)) = 0\\
g(1, 0, 1, 0) &= (1-1) \cdot 0 \cdot ((1 + 0)-(1 \cdot 0)) = 0\\
g(1, 0, 1, 1) &= (1-1) \cdot 0 \cdot ((1 + 1)-(1 \cdot 1)) = 0\\
g(1, 1, 0, 0) &= (1-1) \cdot 1 \cdot ((0 + 0)-(0 \cdot 0)) = 0\\
g(1, 1, 0, 0) &= (1-1) \cdot 1 \cdot ((0 + 1)-(0 \cdot 1)) = 0\\
g(1, 1, 1, 0) &= (1-1) \cdot 1 \cdot ((1 + 0)-(1 \cdot 0)) = 0\\
g(1, 1, 1, 0) &= (1-1) \cdot 1 \cdot ((1 + 1)-(1 \cdot 1)) = 0\end{aligned}$$

The outputs of the above evaluations are added to arrive at $g_0 = 3$.

2. $P$ computes a partial sum of $g$, leaving the first variable $x_1$ free

In Step 2, $P$ computes a similar sum as they did in Step 1, except the first variable is left free/unassigned. I.e.,

$$g_1(X_1) = \sum_{x_2 \in \{0,1\}} \sum_{x_3 \in \{0,1\}} \cdots \sum_{x_v \in \{0,1\}} g(X_1, x_2, \cdots, x_v)$$

Note that $g_1$ is a monomial in $X_1$, and all other variables have been assigned values in each iteration of the loop. We refer to this monomial as the “partial sum” in the next step. Also observe that the only difference between the definition for $g_1$ and the definition for $H$ is that the $g_1$ summation is missing $\sum_{x_1 \in \{0,1\}}$.

For our example, $g(x_1, x_2, x_3, x_4) = (1-x_1)x_2((x_3+x_4)-(x_3x_4))$. To calculate $g_1(X_1)$, we leave $x_1$ free and then calculate the sum of $g$ evaluated at all possible Boolean combinations assigned to $x_2$, $x_3$, and $x_4$. The following table contains the intermediate calculations:
$$\begin{aligned}g(X1, 0, 0, 0) &= (1-X1) \cdot 0 \cdot ((0 + 0)-(0 \cdot 0)) = 0\\
g(X1, 0, 0, 1) &= (1-X1) \cdot 0 \cdot ((0 + 1)-(0 \cdot 1)) = 0\\
g(X1, 0, 1, 0) &= (1-X1) \cdot 0 \cdot ((1 + 0)-(1 \cdot 0)) = 0\\
g(X1, 0, 1, 1) &= (1-X1) \cdot 0 \cdot ((1 + 1)-(1 \cdot 1)) = 0\\
g(X1, 1, 0, 0) &= (1-X1) \cdot 1 \cdot ((0 + 0)-(0 \cdot 0)) = 0\\
g(X1, 1, 0, 0) &= (1-X1) \cdot 1 \cdot ((0 + 1)-(0 \cdot 1)) = (1-X1)\\
g(X1, 1, 1, 0) &= (1-X1) \cdot 1 \cdot ((1 + 0)-(1 \cdot 0)) = (1-X1)\\
g(X1, 1, 1, 0) &= (1-X1) \cdot 1 \cdot ((1 + 1)-(1 \cdot 1)) = (1-X1)\end{aligned}$$
After summing all the rows in the table, we get $g_1(X_1) = -3X_1 + 3$.

3. $V$ checks that the partial sum and total sum agree when the partial sum is evaluated at 0 and 1 and its outputs added

In Step 3, $V$ checks that $P$ was consistent with what they committed in Step 0 and Step 1. Specifically, $V$ checks the following:

$$g_0 \stackrel{?}{=} g_1(0) + g_1(1)$$

In our example, $P$ sent $g_0 = 3$ and $g_1(X_1) = -3x_1 + 3$ to $V$ during Steps 1 and 2, respectively. So $V$ must now check that $3 \stackrel{?}{=} (-3 \cdot 0 + 3) + (-3 \cdot 1 + 3)$, which it does. If this check failed, then $V$ would end the protocol, and $P$’s claim for $g_0$ (which is the supposed answer to the #SAT problem) would be rejected.

Note here what is happening: in Step 1, $P$ made a commitment to the #SAT answer. In Step 2, $P$ also did almost all of the work for $V$, except $V$ still needed to evaluate the monomial $g_1(X_1)$, at two points: $X_1=0$ and $X_1=1$. This is much less work than what $P$ must do.

Observe that at this point, $V$ doesn’t know whether the polynomial being evaluated by $P$ is actually the polynomial of interest, $g$. At this point, $V$ only knows $P$ has been consistent with the polynomial they have used. In fact, everything until Step 7 is only used to verify the consistency of $P$’s answers. Step 7 resolves this uncertainty and is used to prove that $P$ has been both consistent and correct. This observation is one of the “slippery” aspects in understanding the sum-check protocol.

4. $V$ picks a random number $r_1$ for the free variable and sends it to $P$

Before beginning this step, it’s time to introduce a security aspect of the protocol. When the protocol starts, $V$ tells $P$ what finite field $\mathbb{F_p}$ they will be working with. This means that all numbers will be restricted (by using the modulus operator) to be between 0 and $p$. For example, if $P$ is told that $p=16$ then $P$ will reduce everything by mod 16, e.g. $74 \text{ mod } 16 = 10$ and $31x + 17 \text{ mod } 16 = 15x + 1$.

Next, $V$ picks a random number $r_1$ for $P$ to assign to $X_1$. $r_1$ is selected from the finite field $\mathbb{F}_p$. For the running example in this article, we will pick $p = 16$. From here on, all example calculations will be performed mod 16.

For the remainder of the protocol, $P$ will always replace $X_1$ with $r_1$. Note that something interesting is now happening. Up until this point, $P$ only evaluated $g$ with Boolean inputs, but now the protocol is leveraging the fact that because $g$ is a polynomial, it can be evaluated using arbitrary integers instead of only 0s or 1s. This is where the security of the protocol is derived: if $p$ is very large, it will be difficult for $P$ to guess the random challenges before receiving them.

For our running example, we will assume that $V$ selects $r_1 = 2$.

5. $P$ replaces the free variable $x_1$ with the random number $r_1$ and computes a partial sum leaving next variable $x_2$ free

$P$ calculates the sum again, except now they replace $X_1$ with the random number $r_1$:

$$g_2(X_2) =\sum_{x_3 \in \{0,1\}} \sum_{x_4 \in \{0,1\}} \cdots \sum_{x_v \in \{0,1\}} g(r_1, X_2, x_3, \cdots, x_v)$$

For our example, near the end of Step 3, $P$ calculated $g_2(X_1, X_2) = -3X_1X_2 + 3X_2$. At the end of Step 4 above, we mention that $V$ selected $r_1 = 2$. So $P$ replaces $X_1$ with $r_1$, giving $g_2(X_2) = -3X_2$. $P$ returns $g_2(X_2) = -3X_2 \text{ mod } 16 = 13X_2$ to $V$.

6. $P$ and $V$ repeat steps similar to 3–5 for the rest of the variables: $x_2, \cdots, x_v$

Steps 3 through 5 above constitute one “round” of the protocol. In total, if $v$ is the number of variables in $g$, then $v$ rounds in total are required to complete the protocol. Each round includes the same operations as given in Steps 3 through 5 above, with the exception that different variables are left free each round.

For example, at the end of Step 5, $P$ returns $g_2(X_2)$ to $V$. If we apply the pattern in Step 3, noting that $g_1(r_1) = -3X_1 + 3 \text { mod } 16 = 13X_1 + 3$, we see that that $V$ must now check that:

$$\begin{aligned}g_1(r_1) \text{ mod } 16 &\stackrel{?}{=} (g_2(0) + g_2(1)) \text{ mod } 16 \\
(13 \cdot 2 + 3) \text{ mod } 16 &\stackrel{?}{=} (13 \cdot 0 + 13 \cdot 1) \text{ mod } 16 \\
13 \text{ mod } 16 &\stackrel{\checkmark}{=} 13 \text{ mod } 16\end{aligned}$$

So round 2 passes.

In our example, $g$ has four variables, so we will cycle through Steps 3–5 four times, then we move to Step 7.

7. $V$ evaluates $g$ at one input using access to an “oracle”, i.e., $V$ must make a singled trusted execution of $g(r_1, r_2, \cdots, r_v)$

After round $v$ has successfully been checked, then $g_v(X_v)$ has incorporated $r_1, r_2, \cdots, r_{v-1}$. The last step is for $V$ to select $r_v$ and use an oracle to evaluate $g(r_1, r_2, \dots, r_v)$. The oracle is a trusted source which will be guaranteed to evaluate $g$ correctly at a single input. Alternatively, if no oracle is available, then $V$ must compute $g$ once by itself. Without an oracle, $V$ would be required to store and compute $g$ — this could be costly.

After $g(r_1, r_2, \dots, r_v)$ is provided by the oracle, or computed by $V$ itself, then $V$ checks:

$$g(r_1, r_2, \dots, r_v) \stackrel{?}{=} g_v(r_v)$$

If this check passes, then $P$ accepts $V$’s claim that $g_0$ is equal to $H$, or, in our example, that $g_0$ is the #SAT of $g$.

Conclusions and further reading

Python code implementing sum-check can be found in this repo. We omitted security checks in this tutorial for brevity, but those checks are included in the linked repo. Also, this article did not explain why sum-check is secure, nor did it cover the practical costs of the algorithm. Briefly, the security of sum-check can be analyzed using the Schwartz–Zippel lemma — it is basically like random sampling used for quality control. Regarding the costs, to solve #SAT without the sum-check protocol, $V$ would have had to evaluate $g$ at $2^v$ inputs, but with sum-check, the cost for $V$ drops to $v$ steps of the protocol plus a single evaluation of $g$. If you want more depth, see Justin’s book, which is linked above, and these resources:

1) Sum-check article by Justin Thaler: The Unreasonable Power of the Sum-Check Protocol

2) Sum-check notes by Edge & Node cryptographer, Gabriel Soule: https://drive.google.com/file/d/1tU50f-IpwPdCEJkZcA7K0vCr7nwwzCLh/view

If you have made it this far, you now hopefully have a feel for how interactive arguments of knowledge work. This is the first time we have mentioned “interactive”. Note that $P$ and $V$ had to communicate in each round. This is not ideal in scenarios where computers can go offline, i.e., the real world. The Fiat–Shamir transform makes sum-check non-interactive. As you may know, SNARKs also eliminate interactions, as implied by the “N” in the name which stands for “Non-interactive”. SNARKs also use Fiat-Shamir. Perhaps we will make a part 2 of this article where we implement a non-interactive version of sum-check.