TL;DR

It is expensive to run transactions on the Ethereum EVM. Verifiable computing (VC) lets us outsource computing away from the EVM. Today, a popular and exciting form of VC algorithm is the SNARK. There are various families of SNARKs that use the sum-check protocol, which is a simple algorithm to introduce VC. This is a tutorial on the sum-check protocol. This post is focused on how the sum-check protocol is implemented – it does not go into theory. You can skip straight to the finished code here.

Thank you to Gokay Saldamli and Gabriel Soule for reviewing this article.

Background

Executing code on Ethereum is expensive. Verifiable computing algorithms promise a way to reduce costs, by outsourcing computing to untrusted parties and only verifying the result on-chain. A key point about useful verifiable computing algorithms is that the verification must be less expensive than the original computation. The sum-check protocol is a foundational algorithm in the field of verifiable computing. In a stand-alone setting, sum-check is not particularly useful. However, it is an important building block of more sophisticated and useful SNARK algorithms. This tutorial introduces sum-check as a way to familiarize the reader with a relatively simple verifiable computing algorithm.

The sum-check protocol is used to outsource the computation of the following sum:

$$H := \sum_{\vec{b} \in \{0,1\}^v} g(b_1, b_2, \cdots, b_v)$$

where $g$ is a $v$-variate function and $\vec{b} \in \{0,1\}^v$ is the set of all $v$-bit strings, e.g., if $v = 2$, then $\{0,1\}^2 = \{(0,0), (0,1), (1,0), (1,1)\}$. One other detail about the summation defining $H$ is that it is performed in the finite field $\mathbb{F}_p$. That means all arithmetic is reduced modulo an integer $p$, where $p$ is publicly known and chosen before the protocol begins.

The sum-check protocol allows a Prover $P$ to convince a Verifier $V$ that $P$ computed the sum $H$ correctly. You can assume that $P$ has ample computational power or ample time to compute, e.g., it could be an untrusted server somewhere on the internet, and $V$ is has limited computational power, e.g., it could be the EVM.

Why would you want to sum a function over all Boolean inputs, as is done in the equation above? One reason would be if you wanted to provide an answer to the #SAT (“sharp sat”) problem. #SAT is concerned with counting the number of inputs to a Boolean-valued function that result in a true output. For example, what is the #SAT of $f(b_1, b_2, b_3) = b_1 \text{ AND } b_2 \text{ AND } b_3$? It is 1, because only a single input, $b_1=1$, $b_2=1$, $b_3 = 1$, results in 1 being output from $f$. You can probably see that it would be computationally intensive for you to solve #SAT for complex problems. With the sum-check protocol, $P$ can do that work instead of you.

Unless you are a complexity theorist, you may not get too excited about the #SAT problem at first glance. However, more practical problems can be mapped to #SAT, and then sum-check can be used. For example, given an adjacency matrix, you can use sum-check to solve for how many triangles are connected. You can also use sum-check to outsource matrix multiplication. Check out the book mentioned next for more details.

In this tutorial, we use the same example as given in Justin Thaler’s book: Proofs, Arguments, and Zero-Knowledge. In Justin’s book, sum-check is used to solve #SAT for the following example function:

$$\phi(x_1, x_2, x_3, x_4) = (\bar{x_1} \text{ AND } x_2) \text{ AND } (x_3 \text{ OR } x_4)$$

where $\bar{x_1}$ means NOT $x_1$. The function $\phi$ (“phi”) can also be represented graphically as:

In the graph of $\phi$, we see the logic symbols for AND ($\wedge$) and OR ($\vee$). Note that $\phi$ is only defined over the Boolean inputs (0/1) and that $\phi$ has a Boolean output. However, most popular verifiable computing algorithms only support arithmetic operations. So we need to convert $\phi$ to its arithmetic version. The arithmetized version is called $g$. For our example, $g$ is defined as:

$g(x_1, x_2, x_3, x_4) = (1-x_1)x_2((x_3+x_4)-(x_3x_4))$

and visualized as:

In $g$, notice that the Boolean functions: NOT, AND, OR have been compiled into their arithmetic equivalents. Take a moment to convince yourself that if you pick random Boolean values for $x_1$, $x_2$, $x_3$, and $x_4$, that $\phi$ and $g$ will output the same thing.

For reference, here are the specific compiler rules used when converting $\phi$ into $g$:

Boolean gate       Arithmetized version
A AND B A*B
A OR B (A+B)-(A*B)
NOT A 1-A

Notice that $g$ has extended the domain of our original Boolean formula and is now valid for Boolean inputs, integers, real numbers, and complex numbers. We will only use Boolean and integer values in the sum-check protocol. The facts that $g$ equals $\phi$ when evaluated at Boolean inputs and that integers are valid inputs to $g$ is leveraged by the sum-check protocol.

In the second part of this tutorial, we will implement the sum-check protocol in Python and apply the protocol to the example function defined by $\phi$. The implementation leverages SymPy, which is a Python package for symbolic manipulation of mathematical objects. In the following snippet, SymPy is used to instantiate $g$. Treat the code in this tutorial as pseudocode – see this GitHub repo for the finished and commented version.

import numpy as np
import sympy as sp

# `g` is the arithmetized version of `phi`.
x1, x2, x3, x4 = sp.symbols('x1, x2, x3, x4')
g = sp.Poly((1-x1)*x2*((x3+x4)-(x3*x4)), x1, x2, x3, x4)

The sum-check protocol

This section introduces the sum-check protocol. We will follow the same steps and naming convention as used in Justin’s book. The protocol steps are summarized as:

  1. Prover $P$ calculates the total sum of $g$ evaluated at all Boolean inputs
  2. $P$ computes a partial sum of $g$, leaving the first variable $x_1$ free
  3. Verifier $V$ checks that the partial sum and total sum agree when the partial sum is evaluated at 0 and 1 and its outputs added
  4. $V$ picks a random number $r_1$ for the free variable and sends it to $P$
  5. $P$ replaces the free variable $x_1$ with the random number $r_1$ and computes a partial sum leaving next variable $x_2$ free
  6. $P$ and $V$ repeat steps similar to 3-5 for the rest of the variables $x_2, \cdots, x_v$
  7. $V$ evaluates $g$ at one input using access to an “oracle”, i.e. $V$ must make a singled trusted execution of $g(r_1, r_2, \cdots, r_v)$

If $V$ makes it through Step 7 without error, then $V$ accepts $P$’s proof. Otherwise, $V$ rejects.

As we introduce the protocol, we will gradually introduce the Prover and Verifier classes required by the protocol. In the following constructor, we see that the Prover is only given the arithmetic function $g$. From that, the Prover can deduce other required information, including the name and number of variables in $g$ (assuming the $v$ variables in $g$ are called $x_1, x_2, \cdots, x_v$) and the number of rounds in the protocol.

# Helper function
def get_variables(v):
    """Creates a list of `v` variables named x1,x2,...x`v`.

    Returns:
      List of variables in `g`. E.g. [x1, x2, ..., xv]
    """
    ret_val = []
    for i in range(1,v+1):
        ret_val.append(eval(f'x{i}'))
    return ret_val

class Prover:
    def __init__(self, g):
        """Initializes P with the arithmetic function `g`.

        Args:
          g: The function that we will apply sum-check to.
        """
        self.g = g
        self.num_vars = self.rounds = len(g.free_symbols)
        self.variables = get_variables(self.num_vars)
1. $P$ calculates the total sum of $g$ evaluated at all Boolean inputs

This is round zero of the protocol. $H$ is the sum described in the first equation in this article. In round zero, $P$ performs the summation described in the first equation. We call this sum $C_1$, where we believe “C” stands for a “claim”, as in $P$ claims that $C_1$ is $H$, the sum of interest. Also, note that this sum is the #SAT solution, i.e., it gives the answer of how many unique inputs to the function result in 1 being output from $g$.

The following code shows $P$’s method needed to calculate $C_1$, which is never assigned to that variable explicitly, but it is calculated during round zero.

def getSum(self, round):
    """Calculates the sum of `g`.

    Args:
      round: The round of the protocol.
    """
    # Helper function to convert an integer to binary 
    #   string with `n` leading zeros
    getbinary = lambda x, n: format(x, 'b').zfill(n)

    # Number of variables to sum over a Boolean hypercube
    free_vars = self.rounds-round

    # Initialize the sum to zero
    g_j = 0
    # Sum the assigned function `g` over the Boolean
    #   hypercube.
    for input_int in range(2**free_vars):
        # Convert integer to a binary string
        input_bin = getbinary(input_int, free_vars)
        # Assign each bit in binary string to corresponding
        #   variable. E.g. input_bin[0] -> x_1,
        #   input_bin[1] -> x_2, ...
        assignments = {}
        for bit, var in zip(
            input_bin,
            self.variables[round:]
        ):
            assignments[var] = bit
        # Substitute variable names for bit assignments, 
        #   evaluate `g` and accumulate result in `g_j`
        g_j += self.g.subs(assignments)

    return g_j

In our example, after round zero, the value $g_0 = 3$ will be returned. The value returned at round zero is also called $C_1$, i.e., $P$’s answer to the #SAT problem applied to $g$.

2. $P$ computes a partial sum of $g$, leaving the first variable $x_1$ free

In round one, $P$ computes a similar sum as they did in round zero, except the first variable is left free. I.e.,

$$g_1(X_1) = \sum_{(x_2, \cdots x_v) \in \{0,1\}^{v-1}} g(X_1, x_2, \cdots, x_v)$$

Note that $g_1$ is a univariate function. All other variables have been eliminated.

In our example, for round one, getSum will return $g_1(X_1) = -3X_1 + 3$. We refer to this polynomial as the “partial sum” in the next step.

3. $V$ checks that the partial sum and total sum agree when the partial sum is evaluated at 0 and 1 and its outputs added

In Step 2, $P$ basically recalculated the summation that they performed in Step 1, except they left $X_1$ free. In Step 3, $V$ checks that $P$ was consistent with what they committed in rounds zero and one. Specifically, $V$ checks the following:

$$C_1 \stackrel{?}{=} g_1(0) + g_1(1)$$

In our example, $P$ sent $C_1 = 3$ and $g_1(X_1) = -3x_1 + 3$ to $V$ during rounds zero and one, respectively. So $V$ must now check that $3 \stackrel{?}{=} (-3 \cdot 0 + 3) + (-3 \cdot 1 + 3)$, which it does. If this check failed, then $V$ would end the protocol, and $P$’s claim for $C_1$ (which is the supposed answer to the #SAT problem) would be rejected.

Note here what is happening: in round zero, $P$ made a commitment to their answer. In round one, $P$ also did almost all of the work for $V$, except $V$ still needed to evaluate the single degree polynomial, $g_1(X_1)$, at two points: $X_1=0$ and $X_1=1$. This is much less work than what $P$ must do. The rest of the protocol will proceed like this, with the remainder of the variables: $x_2, \cdots, x_v$.

The following code implements the logic described in the previous paragraph:

class Verifier:
def __init__(self, g):
    """Initializes Verifier.

    Args:
        g: The arithmetic function to use for the protocol
    """
    self.g = g
    self.num_vars = self.rounds = len(g.free_symbols)
    self.variables = get_variables(self.num_vars)

def executeSumCheck(self, prover):
    """Allows Verifier to execute the sum-check protocol.

    Args:
        prover: Instance of a Prover object.
    """
    for round in range(self.rounds + 1):
        g_j = prover.getSum(round)

        if round > 0:
            # Must cast `g` to a SymPy Polynomial for 
            #   some reason.
            g_j = sp.Poly(g_j)

        if round == 0:
            # Save the claimed answer for printing
            C1 = g_j
        elif round == 1:
            assert last_g_j == g_j(0) + g_j(1)

        last_g_j = g_j
    
    print(f"Sum-check passed! Answer is {C1}")
4. $V$ picks a random number $r_1$ for the free variable and sends it to $P$

Next, $V$ picks a random number $r_1$ for $P$ to assign to $X_1$. A subtlety of the protocol is that $r_1$ is selected from a finite field $\mathbb{F}_p$. For this example, we will pick $p = 16$, so all calculations will be done mod 16.

For the remainder of the protocol, $P$ will always replace $X_1$ with $r_1$. Here is the modified logic for $V$ (including the use of modular reduction):

class Verifier:
    def __init__(self, g: sp.Poly, F: int) -> None:
        """Initializes Verifier.

        Args:
          g: The arithmetic function to use for the protocol.
          F: Finite field selection. This is the security parameter.

        """
        self.F = F
        self.g = g
        # In sum-check, the verifier must be able to make one call to an oracle
        self.num_vars = self.rounds = len(g.free_symbols)
        self.variables = get_variables(self.num_vars)
        self.challenges = np.random.randint(F, size=len(self.variables))

    def getRandom(self, round: int) -> List:
        """Gets random variable assignments to send to Prover.

        Args:
          round: Current round number of protocol.

        Returns:
          List of random values.
        """
        random = {}
        for rnd in range(round - 1):
            random[self.variables[rnd]] = self.challenges[rnd]
        return random

    def executeSumCheck(self, prover: Prover) -> None:
        """Allows a verifier to execute the sum-check protocol.

        Args:
          prover: Instance of a Prover object.

        Returns:
          None
        """
        for round in range(self.rounds + 1):
            random = self.getRandom(round)
            g_j = prover.getSum(round, random)

            if round > 0:
                # Must cast `g` to a SymPy Polynomial for some reason.
                g_j = sp.Poly(g_j)

                # Check that g_j is univariate
                assert len(g_j.free_symbols) <= 1

                # Check that g_j is of degree at most deg_j(g)
                assert sp.degree(g_j, gen=self.variables[round - 1]) <= sp.degree(
                    self.g, gen=self.variables[round - 1]
                )

            if round == 0:
                # Save the claimed answer for printing
                C1 = g_j
            elif round == 1:
                assert last_g_j == (g_j(0) + g_j(1)) % self.F

            last_g_j = g_j

When we ran the code, $V$ selected $r_1 = 2$.

5. $P$ replaces the free variable $x_1$ with the random number $r_1$ and computes a partial sum leaving next variable $x_2$ free

$P$ calculates the sum again, except now they replace $X_1$ with the random number $r_1$:

$$g_2(X_2) = \sum_{(x_3, x_4, \cdots x_v) \in {\{0,1\}}^{v-2}} g(r_1, X_2, x_3, \cdots, x_v)$$

$P$’s getSum method must now be modified to support random values:

def getSum(self, round: int, r: List) -> sp.Poly:
    """Calculates the sum of `g`.

    Args:
        round: The round of the protocol.
        r: Random challenge(s) from Verifier.
        F: Finite field selection. This is the security parameter.

    Returns:
        Sum of `g`
    """

    # Helper function to convert an integer to binary string with `n`
    #   leading zeros
    getbinary = lambda x, n: format(x, "b").zfill(n)

    # Number of variables to sum over a Boolean hypercube
    free_vars = self.rounds - round

    # Initialize the sum to zero
    g_j = sp.Poly(0, x1)
    # Sum the assigned function `g` over the Boolean hypercube
    for input_int in range(2**free_vars):
        input_bin = getbinary(input_int, free_vars)
        assignments = {}
        for bit, var in zip(input_bin, self.variables[round:]):
            assignments[var] = bit
        print(f"g_j before bit subs: {g_j}")
        # Substitute variable names for bit assignments,
        #   evaluate `g` and accumulate result in `g_j`
        g_j += self.g.subs(assignments)
        g_j = g_j

    if round > 1:
        g_j = g_j.subs(r)

    # Perform modular reduction and return result
    return g_j.set_modulus(self.F)

For our example, near the end of round two, $P$ calculated $g_2(X_1, X_2) = -3X_1X_2 + 3X_2$. At the end of Step 5 above, we mention that $V$ selected $r_1 = 2$. So $P$ replaces $X_1$ with $r_1$, giving $g_2(X_2) = -3X_2$. $P$ returns $g_2(X_2) = -3X_2 \text{ mod } 16 = 13X_2 \text{ mod } 16$ to $V$.

6. $P$ and $V$ repeat steps similar to 3-5 for the rest of the variables $x_2, \cdots, x_v$

Steps 3 through 5 above constitute one “round”. In total, if $v$ is the number of variables in $g$, then $v$ rounds in total are required to complete the protocol. Each round includes the same operations as given in Steps 3 through 5 above, with the exception that different variables are left free each round.

For example, at the end of Step 5, $P$ returns $g_2(X_2)$ to $V$. If we apply the pattern in Step 3, noting that $g_1(r_1) = -3X_1 + 3 = 13X_1 + 3\text { mod } 16$, we see that that $V$ must now check that:

$$\begin{aligned}g_1(r_1) \text{ mod } 16 &\stackrel{?}{=} (g_2(0) + g_2(1)) \text{ mod } 16 \\
(13 \cdot 2 + 3) \text{ mod } 16 &\stackrel{?}{=} (13 \cdot 0 + 13 \cdot 1) \text{ mod } 16 \\
13 \text{ mod } 16 &\stackrel{\checkmark}{=} 13 \text{ mod } 16\end{aligned}$$

So round 2 passes.

In our example, $g$ has four variables, so the protocol will end after round 4 completes.

7. $V$ evaluates $g$ at one input using access to an “oracle”, i.e. $V$ must make a singled trusted execution of $g(r_1, r_2, \cdots, r_v)$

After round $v$ has successfully been checked, then $g_v(X_v)$ has incorporated $r_1, r_2, \cdots, r_{v-1}$. The last step is for $V$ to select $r_v$, use an oracle to evaluate $g(r_1, r_2, \dots, r_v)$. It must be guaranteed that the oracle evaluates $g$ correctly. Alternatively, if no oracle is available, then $V$ must compute $g$ once by itself. Without an oracle, $V$ would be required to store and compute $g$ — this could be costly.

After $g(r_1, r_2, \dots, r_v)$ is provided by the oracle, or computed by $V$ itself, then $V$ checks:

$$g(r_1, r_2, \dots, r_v) \stackrel{?}{=} g_v(r_v)$$

If this check passes, then $P$ accepts $V$’s claim that $C_1$ is equal to $H$, or, in our example, that $C_1$ is the #SAT of $g$.

Conclusions and further reading

The complete code for this tutorial can be found in this repo. We omitted security checks in this tutorial for brevity, but those checks are included in the linked repo. Also, this article did not explain why sum-check is secure, nor did it cover the computational cost of the algorithm. Briefly, the security of sum-check can be analyzed using the Schwartz–Zippel lemma — it is basically like random sampling used for quality control. Regarding the cost, to solve #SAT, $V$ originally had to evaluate $g$ at $2^v$ inputs, but with sum-check, the cost for $V$ drops to $v$ steps of the protocol plus a single evaluation of $g$. If you want more depth, see Justin’s book, which is linked above, and these resources:

1) Sum-check article by Justin Thaler: The Unreasonable Power of the Sum-Check Protocol

2) Sum-check notes by Edge & Node cryptographer, Gabriel Soule: https://drive.google.com/file/d/1tU50f-IpwPdCEJkZcA7K0vCr7nwwzCLh/view

If you have made it this far, you now hopefully have a feel for how interactive arguments of knowledge work. This is the first time we have mentioned “interactive”. Note that $P$ and $V$ had to communicate in each round. This is not ideal in scenarios where computers can go offline, i.e., the real world. The Fiat–Shamir transform makes sum-check non-interactive. As you may know, SNARKs also eliminate interactions, as implied by the “N” in the name which stands for “Non-Interactive”. SNARKs also use Fiat-Shamir. Perhaps we will make a part 2 of this article where we implement a non-interactive version of sum-check.