Multilinear Ext. of Branching Programs

The techniques in this section were first explored by Holmgren and Rothblum [HR18].

Introduction

For a general function, $f : \{0, 1\}^{k} \to F$, we can evaluate its multilinear extension as:

$\tilde{f} (x) = \sum_{b \in H_{k}} eq (x, b) \cdot f (b)$

This requires $O (2^{k})$ time, linear in the domain of the function, because we enumerate every possible input. For a black-box function this is the best we can do: the smallest possible description of the function may be its evaluation table. However, it turns out that for certain classes of functions, we can do better, much better. One such class is the class of read-once branching programs, which we will now explore.
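As a point of comparison, the brute-force sum above can be sketched as follows. This is a minimal illustration, not a reference implementation: exact rationals (`Fraction`) stand in for the field $F$, the sum runs over all bit strings of length $k$, and `f_and` is a hypothetical example function that does not appear in the text.

```python
from fractions import Fraction
from itertools import product

def eq(x, b):
    """Multilinear equality polynomial: on bits it is 1 if x == b, else 0."""
    result = Fraction(1)
    for xi, bi in zip(x, b):
        result *= xi * bi + (1 - xi) * (1 - bi)
    return result

def mle_bruteforce(f, k, x):
    """Evaluate the multilinear extension of f at x by enumerating all 2^k inputs."""
    return sum(eq(x, b) * f(b) for b in product((0, 1), repeat=k))

# Hypothetical example function (an assumption for illustration): AND of two bits.
f_and = lambda b: b[0] & b[1]

value = mle_bruteforce(f_and, 2, (Fraction(1, 2), Fraction(1, 3)))
print(value)  # 1/6, since the extension of AND is x_1 * x_2
```

The loop over `product((0, 1), repeat=k)` is exactly where the $O (2^{k})$ cost comes from.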

Read-Once Branching Programs

Let $Σ = \{0, 1\}^{b}$ be the alphabet, with $b$ the size (in bits) of a single symbol. Let $n$ be the length of the branching program. A Read-Once Branching Program is a function:

$P : Σ^{n} \to F$

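It is implemented by "walking" through a directed graph: starting at a vertex designated the "source", we read one input symbol per step and follow the outgoing edge labeled with that symbol, until we reach one of the "sink" vertices, each of which is labeled with a field element. The label of the sink we reach is the output.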
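One possible way to represent such a walk in code is sketched below. The representation (one transition table per layer, integer node names) and the small `and_layers` program are assumptions made for illustration; they are not the programs shown in the figures.

```python
def run_robp(layers, sink_labels, symbols, source=0):
    """Walk a layered read-once branching program and return the sink label.

    layers[i] maps (node, symbol) -> node in the next layer, so the i-th
    input symbol is read exactly once, at layer i.
    """
    if len(symbols) != len(layers):
        raise ValueError("input length must equal program length")
    node = source
    for transitions, symbol in zip(layers, symbols):
        node = transitions[(node, symbol)]
    return sink_labels[node]

# Hypothetical program of length 2 over the alphabet {0, 1}:
# it outputs 1 iff both input bits are 1.
and_layers = [
    {(0, 0): 0, (0, 1): 1},                        # edges leaving the source
    {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1},  # edges into the sinks
]
and_sinks = {0: 0, 1: 1}  # sink node -> label (here 0 and 1)

outputs = {m: run_robp(and_layers, and_sinks, m)
           for m in [(0, 0), (0, 1), (1, 0), (1, 1)]}
print(outputs)
```

For a larger alphabet ($b > 1$) the same sketch works if each symbol is, for example, a tuple of bits used as the dictionary key.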

A picture is worth a thousand words.

Example

Let b = 1, n = 4, in other words:

$Σ = \{0, 1\}$
$P : \{0, 1\}^{4} \to \{0, 1\}$

The program looks like this:

Read-once branching program generated from DOT, showing the same structure programmatically

The program has length 4 and width 2.

Exercise

What is the output of the program on 0 0 1 1?
What is the output of the program on 1 0 1 0?
What does this program do?

Example

Let b = 3, n = 2. In other words:

$Σ = \{0, 1\}^{3}$
$P : (\{0, 1\}^{3})^{2} \to \{0, 1\}$

The program looks like this:

Read-once branching program with alphabet size b=3, showing multi-labeled edges between states over 2 steps

An edge carrying several labels simply stands for several parallel edges: e.g. there are 4 edges from the (teal) source node, with the labels 000, 101, 011, 110, going to the top green node. We draw them as a single edge to avoid cluttering the diagram. The program has length 2 because it takes two symbols from the alphabet $\{0, 1\}^{3}$.

Exercise

What is the width of this Read-Once Branching Program?
What is the output of the branching program on 000 010?
What is the output of the branching program on 011 110?
What does this program do?

Matrix Branching Programs

To "arithmetize" read-once branching programs, we instead express them as Matrix Branching Programs.

For simplicity, we let $b = 1$, i.e. $Σ = \{0, 1\}$. The technique extends trivially to larger $b$.

A matrix branching program consists of $n$ pairs of $w \times w$ matrices $(M_{i}^{(0)}, M_{i}^{(1)})$:

$(M_{1}^{(0)}, M_{1}^{(1)}), (M_{2}^{(0)}, M_{2}^{(1)}), \dots, (M_{n}^{(0)}, M_{n}^{(1)})$

And sink vector $u \in F^{w}$ .

To evaluate the Matrix Branching Program on $m \in \{0, 1\}^{n}$ compute:

$v = (\prod_{i = n}^{1} M_{i}^{(m_{i})}) \cdot u = (M_{n}^{(m_{n})} \cdots M_{2}^{(m_{2})} \cdot M_{1}^{(m_{1})}) \cdot u = M_{n}^{(m_{n})} (\cdots (M_{2}^{(m_{2})} (M_{1}^{(m_{1})} u)))$

The output of the MBP is $v_{1}$ , the first entry of $v$ . Observe that even though matrix multiplication is linear, Matrix Branching Programs (MBPs) are not linear in $m$ : the picking of each matrix from the pairs gives them some "discrete" structure.
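The evaluation rule can be sketched in a few lines. This is an illustrative sketch only: matrices are plain lists of rows, and `toy_pairs` is a hypothetical width-2 program (identity on bit 0, swap on bit 1) chosen for brevity.

```python
def mat_vec(M, v):
    """Multiply matrix M (a list of rows) by the column vector v."""
    return [sum(M[r][c] * v[c] for c in range(len(v))) for r in range(len(M))]

def eval_mbp(pairs, u, m):
    """Compute v = M_n^(m_n) ... M_1^(m_1) u and return its first entry."""
    if len(m) != len(pairs):
        raise ValueError("input length must equal program length")
    v = list(u)
    for (M0, M1), bit in zip(pairs, m):  # M_1 is applied first, M_n last
        v = mat_vec(M1 if bit else M0, v)
    return v[0]

# Hypothetical toy program of length 3 and width 2 (an assumption for illustration).
IDENTITY = [[1, 0], [0, 1]]
SWAP = [[0, 1], [1, 0]]
toy_pairs = [(IDENTITY, SWAP)] * 3
u = [1, 0]

print(eval_mbp(toy_pairs, u, (1, 0, 1)))  # 1: two swaps return to the first entry
print(eval_mbp(toy_pairs, u, (1, 0, 0)))  # 0: a single swap moves to the second entry
```

Evaluating right to left as a chain of matrix-vector products, rather than forming the full matrix product, costs $O (n w^{2})$ field operations instead of $O (n w^{3})$.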

Adjacency Matrixes: ROBP $\to$ MBP

Walks in directed graphs can be expressed by application of an adjacency matrix $M$, which has $M_{i, j} = 1$ iff there is an edge from node $i$ to $j$ and $0$ otherwise. Hence it should be no surprise that we can easily convert Read-Once Branching Programs to Matrix Branching Programs: the trick is to define pairs of matrices:

$(M_{i}^{(0)}, M_{i}^{(1)})$

Such that $M_{i}^{(0)}$ describes the transitions from the current stage to the next when input $m_{i} = 0$ and $M_{i}^{(1)}$ describes the transitions from the current stage to the next when $m_{i} = 1$. Letting $u = (1, 0, \dots, 0)$, meaning the "active" node is the first node in the stage, we obtain the indicator vector of the next active node by computing:

$M_{n}^{0} \cdot u = (0, \dots, 0, 1, 0, \dots, 0)$

Where $M_{n}$ has

An example and an image:

Example

Recall this guy:

Read-once branching program generated from DOT, showing the same structure programmatically

If we look just at the 0-labelled edges:

Graph showing only the 0-labeled edge transitions in the branching program

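We can describe the transitions using this sequence of matrices:

$M_{1}^{(0)} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, M_{2}^{(0)} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \dots, M_{n}^{(0)} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$

The way to read these: the column indicates the current node, the row is the next node. There is a $1$ iff the current node goes to the next node on input $0$. This is the transpose of the usual adjacency matrix, so that multiplying a matrix by a column vector marking the current node yields the vector marking the next node.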

Similarly, we can look at the “1 edges”:

Graph showing only the 1-labeled edge transitions in the branching program

$M_{1}^{(1)} = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, M_{2}^{(1)} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \dots, M_{n}^{(1)} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$

As a sanity check, let’s check the output on 1010:

$v = M_{4}^{(0)} (M_{3}^{(1)} (M_{2}^{(0)} (M_{1}^{(1)} (u))))$

Step-by-step:

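The program is in the bottom state: $M_{1}^{(1)} \cdot u = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$
Leaves the program in the bottom state: $M_{2}^{(0)} \cdot \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$
Changes the program state to the top: $M_{3}^{(1)} \cdot \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \cdot \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$
Leaves the program in the top state: $M_{4}^{(0)} \cdot \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$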

By convention, the Matrix Branching Program produces $1$ as output, since that is the first entry of the vector $(1, 0)^{T}$ produced by the product we just computed.
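The conversion from transition tables to matrix pairs can be sketched as below. This is a sketch under stated assumptions: the tables are read off the example's matrices with node 0 as the top node and node 1 as the bottom node, and the function names `robp_to_mbp` and `eval_mbp` are hypothetical. The entry `[next][current]` is set so that the matrices act on column vectors, matching $v = M_{n} \cdots M_{1} u$.

```python
def robp_to_mbp(layers, width):
    """Turn per-layer transition tables into pairs of width x width 0/1 matrices.

    Entry [nxt][current] is 1 iff the program moves from node `current`
    to node `nxt` on the given bit.
    """
    pairs = []
    for transitions in layers:
        pair = []
        for bit in (0, 1):
            M = [[0] * width for _ in range(width)]
            for (current, symbol), nxt in transitions.items():
                if symbol == bit:
                    M[nxt][current] = 1
            pair.append(M)
        pairs.append(tuple(pair))
    return pairs

def eval_mbp(pairs, u, m):
    """Apply M_1, then M_2, ..., then M_n to u and return the first entry."""
    v = list(u)
    for (M0, M1), bit in zip(pairs, m):
        M = M1 if bit else M0
        v = [sum(M[r][c] * v[c] for c in range(len(v))) for r in range(len(M))]
    return v[0]

# Assumed encoding of the example: node 0 = top, node 1 = bottom.
first = {(0, 0): 0, (0, 1): 1}                        # only the source is active
later = {(0, 0): 0, (1, 0): 1, (0, 1): 1, (1, 1): 0}  # stay on 0, swap on 1
pairs = robp_to_mbp([first, later, later, later], width=2)

print(pairs[0])                               # ([[1, 0], [0, 0]], [[0, 0], [1, 0]])
print(eval_mbp(pairs, [1, 0], (1, 0, 1, 0)))  # 1
```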

Exercise

What if the label of the second sink was not 0? What if the label of the first sink was not 1? What if they were switched?
What if my branching program had multiple sinks with the same label?
What if some of the sink labels were not 0/1?
e.g. how do I make the ROBP output $a, b \in F$ instead of $0, 1$ ?

Hint: Use “single row” matrices, i.e. $\begin{pmatrix} a & b \\ 0 & 0 \end{pmatrix}$

Symbolic Evaluation

A note about Symbolic Evaluation of MBPs:

By computing: $(\prod_{i = n}^{1} M_{i}^{(m_{i})}) = \begin{pmatrix} row_{1} \\ \vdots \\ row_{w} \end{pmatrix}$ and taking the first row, equivalently: $e_{1}^{T} \cdot (\prod_{i = n}^{1} M_{i}^{(m_{i})}) = row_{1}$, where $e_{1} = (1, 0, \dots, 0)$,
we can compute the output for any sink $u$ as $⟨ u, row_{1} ⟩$.

This means that for an input $m$ we can "precompute" a vector $row_{1}$ of length $w$ which allows us to evaluate the branching program for any sink vector: e.g. we can explore which node we would end up in if we "started" in another node than $u = (1, 0, \dots, 0)$
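A minimal sketch of this precomputation follows. It multiplies the row vector $e_{1}^{T}$ through the matrices from $M_{n}$ down to $M_{1}$, so the full product is never formed. The matrices are those of the running example under the column-vector convention; the helper names are hypothetical.

```python
def row_times_matrix(row, M):
    """Multiply the row vector `row` by the matrix M (a list of rows)."""
    return [sum(row[r] * M[r][c] for r in range(len(row))) for c in range(len(M[0]))]

def first_row_of_product(pairs, m):
    """Return row_1 = e_1^T * M_n^(m_n) * ... * M_1^(m_1)."""
    width = len(pairs[0][0])
    row = [1] + [0] * (width - 1)  # e_1^T
    for (M0, M1), bit in zip(reversed(pairs), reversed(m)):
        row = row_times_matrix(row, M1 if bit else M0)
    return row

def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

# Matrices of the running example (assumed convention: column = current node).
first = ([[1, 0], [0, 0]], [[0, 0], [1, 0]])
later = ([[1, 0], [0, 1]], [[0, 1], [1, 0]])
pairs = [first, later, later, later]

row_1 = first_row_of_product(pairs, (1, 0, 1, 0))
print(row_1)                 # [1, 0]
print(inner([1, 0], row_1))  # 1: the output for u = (1, 0)
print(inner([0, 1], row_1))  # 0: the output had we used u = (0, 1) instead
```

Once `row_1` is known for an input $m$, trying a different $u$ costs only one inner product of length $w$.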

Multilinear Basis Polynomials

All the "non-linearity" of a Matrix Branching Program is "contained" in the selection of the correct matrix. To implement this selection, we use multilinear Lagrange polynomials:

$eq (X, Y) \in F^{\leq 1} [X, Y]$

Such that for $x \in \{0, 1\}^{n}$ and $y \in \{0, 1\}^{n}$ we have:

$eq (x, y) = 1 ⟺ x = y$

And otherwise $eq (x, y) = 0$ .
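For concreteness, a standard closed form with these properties (given here as a reminder rather than taken from the text above) is the product of per-coordinate equality checks:

$eq (X, Y) = \prod_{i = 1}^{n} (X_{i} Y_{i} + (1 - X_{i}) (1 - Y_{i}))$

On bits, each factor is $1$ when $X_{i} = Y_{i}$ and $0$ when they differ, and the product has degree at most $1$ in every variable. For a single variable this gives $eq (X_{1}, 0) = 1 - X_{1}$ and $eq (X_{1}, 1) = X_{1}$.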

Low-Degree Extensions of Matrix Branching Programs

Let's start small, with an MBP of length 1. I claim that:

$f (X_{1}) = e_{1}^{T} \cdot (eq (X_{1}, 0) \cdot M_{1}^{(0)} + eq (X_{1}, 1) \cdot M_{1}^{(1)}) \cdot u$

Is the low-degree extension of the MBP $((M_{1}^{(0)}, M_{1}^{(1)}), u)$

Exercise

Pause and think: what does this expression “do”?
Convince yourself: when $x_{1} = 0$ and $x_{1} = 1$ it is correct.
Why is $f (X_{1})$ the unique “multi”linear extension?

This trick extends to MBPs of length greater than $1$ :

We compute the "mixed" matrices: $M_{i} = eq (x_{i}, 0) \cdot M_{i}^{(0)} + eq (x_{i}, 1) \cdot M_{i}^{(1)}$
We compute their product: $M_{*} = \prod_{i = n}^{1} M_{i}$
Compute $M_{*} \cdot u$ and take the first component.

Another, completely equivalent method:

Define: $χ^{(n)} = e_{1}^{T} \cdot M_{n} \in F^{w}$, where $e_{1} = (1, 0, \dots, 0)$, which has the effect of "taking" the first row.
Iteratively compute, for $i = n - 1, \dots, 1$: $χ^{(i)} = χ^{(i + 1)} \cdot M_{i} \in F^{w}$
Output $⟨ χ^{(1)}, u ⟩$

Which allows you to compute the low-degree extension in a streaming way, if needed for your application.
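The streaming method can be sketched as follows. This is an illustration under assumptions: exact rationals (`Fraction`) stand in for the field $F$, the single-variable identities $eq (x, 0) = 1 - x$ and $eq (x, 1) = x$ are used for the mixing, and the matrices are those of the running example with the column-vector convention. The function names are hypothetical.

```python
from fractions import Fraction

def mixed_matrix(M0, M1, x):
    """Return eq(x, 0) * M0 + eq(x, 1) * M1, with eq(x, 0) = 1 - x and eq(x, 1) = x."""
    return [[(1 - x) * a + x * b for a, b in zip(r0, r1)] for r0, r1 in zip(M0, M1)]

def mbp_mle(pairs, u, x):
    """Evaluate the extension at x, processing M_n first and M_1 last."""
    width = len(u)
    chi = [Fraction(1)] + [Fraction(0)] * (width - 1)  # e_1^T
    for (M0, M1), xi in zip(reversed(pairs), reversed(x)):
        M = mixed_matrix(M0, M1, xi)
        chi = [sum(chi[r] * M[r][c] for r in range(width)) for c in range(width)]
    return sum(c * ui for c, ui in zip(chi, u))

# Matrices of the running example (assumed convention: column = current node).
first = ([[1, 0], [0, 0]], [[0, 0], [1, 0]])
later = ([[1, 0], [0, 1]], [[0, 1], [1, 0]])
pairs = [first, later, later, later]
u = [1, 0]

print(mbp_mle(pairs, u, (1, 0, 1, 0)))               # 1, the Boolean output on 1010
print(mbp_mle(pairs, u, (Fraction(1, 3), 0, 0, 0)))  # 2/3
```

Only the current vector `chi` of length $w$ is kept in memory, so the cost is $O (n w^{2})$ field operations instead of the $O (2^{n})$ of the generic sum.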

Example

Let’s compute the multilinear extension for the simple MBP from the earlier example:

Read-once branching program generated from DOT, showing the same structure programmatically

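Recall the matrices are: $M_{i}^{(0)} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, M_{i}^{(1)} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ for $i = 2, 3, 4$, and: $M_{1}^{(0)} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, M_{1}^{(1)} = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$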

Let’s compute the multilinear extension $f (X_{1}, X_{2}, X_{3}, X_{4})$ and evaluate it at $X_{1} = \frac{1}{2}, X_{2} = X_{3} = X_{4} = 0$ .

First, compute the mixed matrices $M_{i} = eq (X_{i}, 0) \cdot M_{i}^{(0)} + eq (X_{i}, 1) \cdot M_{i}^{(1)}$ :

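For $i = 1$ with $X_{1} = \frac{1}{2}$: $M_{1} = \frac{1}{2} \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} + \frac{1}{2} \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} \frac{1}{2} & 0 \\ \frac{1}{2} & 0 \end{pmatrix}$

For $i = 2, 3, 4$ with $X_{i} = 0$: $M_{i} = 1 \cdot \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + 0 \cdot \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$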

Now compute using the streaming method with $u = (1, 0)^{T}$ :

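Start: $χ^{(4)} = e_{1}^{T} \cdot M_{4} = (1, 0) \cdot \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = (1, 0)$
$χ^{(3)} = χ^{(4)} \cdot M_{3} = (1, 0) \cdot \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = (1, 0)$
$χ^{(2)} = χ^{(3)} \cdot M_{2} = (1, 0) \cdot \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = (1, 0)$
$χ^{(1)} = χ^{(2)} \cdot M_{1} = (1, 0) \cdot \begin{pmatrix} \frac{1}{2} & 0 \\ \frac{1}{2} & 0 \end{pmatrix} = (\frac{1}{2}, 0)$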

Finally: $f (\frac{1}{2}, 0, 0, 0) = ⟨ χ^{(1)}, u ⟩ = ⟨(\frac{1}{2}, 0), (1, 0)⟩ = \frac{1}{2}$

This makes sense! The multilinear extension at $X_{1} = \frac{1}{2}$ gives us the “average” of $P (0, 0, 0, 0) = 1$ and $P (1, 0, 0, 0) = 0$ .

Exercise

Why is this the unique multilinear extension of $P$ ?

Hint, we need to check:

$f (x_{1}, \dots, x_{n})$ is multilinear
$f (x_{1}, \dots, x_{n})$ agrees with $P$ on all $(x_{1}, \dots, x_{n}) \in \{0, 1\}^{n}$
