What makes blockchains secure? [3/5]
In the first two parts of this series, we explored the interplay between the probabilistic nature of distributed block data structures (DBDS) and how one has to modify the traditional analysis of Byzantine Fault Tolerant (BFT) systems for DBDS. Recall that our goal with this series is to demonstrate that there are a small number of common security primitives that underlie most (if not all) distributed consensus algorithms and DBDS. We first evinced this by discussing the seeming universality of analysis techniques for DBDS and distributed consensus protocols, most of which appear to do the following: list a set of axioms, construct an environment for a game to be played between honest agents and a malicious adversary, and then provide probabilistic bounds of the adversary’s success in this game. We also found that the set of axioms that are desired by these systems often boil down to two things:
  • Liveness: The ledger continues to grow and accepts new transactions or state transitions
  • Persistence/Safety: Honest transactions eventually make it into the ledger and cannot be removed (without gargantuan expense) after a sufficient amount of time 
Given this level of similarity between the analyses of seemingly distinct protocols, it seems obvious to ask if there is a way to describe a single framework that can encompass all known distributed consensus methodologies. In this article and the sequel, we will look at two vignettes that suggest this might be possible:
  • [Part 3] Avalanche can be viewed as a randomized realization of the Stellar Consensus Protocol
  • [Part 4] Reductions to “ideal functionality” work for asynchronous models of Bitcoin, Oasis Labs’ Ekiden, and Zexe
But how does a generic framework for analysis relate to what makes blockchains secure? While a general framework for evaluating security is nice, it doesn't quite provide insight into security itself. What I hope to convey is that by generalizing security (e.g. figuring out which pieces of security are common to all protocols), one can get a better understanding of what the minimal necessary & sufficient conditions for a secure blockchain look like. Moreover, the fact that certain protocols can be implemented with the primatives of other protocols suggests that there is some underlying commonality to the more specialized protocol. In this vein (and unlike protocols discussed in the previous articles), the Stellar Consensus Protocol (SCP) provides a general framework as opposed to a concrete implementation — we will first dive into this protocol and illustrate that it is a general framework for a variety of consensus mechanisms. Next, we will give an overview of Avalanche, describe how it can be implemented as an instance of the SCP, and provide a sketch of a proof and numerical evidence for this. We should note that this will implicitly endow Avalanche with much better security properties than those proved in the original paper. Finally, we will conclude by looking forward to ideal functionality and how probabilistic equivalences between abstract state machines (such as the equivalence between Avalanche and an instance of the SCP) provide a way for unifying the analysis of consensus and to bootstrap strong guarantees from weak ones. 
The Stellar Consensus Protocol was written by David Mazières, a professor at Stanford and Chief Scientist at Interstellar (which manages the Stellar network). The SCP is implemented in stellar-core and in my opinion, is one of the best written pieces of blockchain client code that exists within the ecosystem. Mazières’s previous work includes Kademlia, a peer-to-peer distributed hash table (and mesh networking algorithm) and limits on the trade-offs between safety and liveness in BFT systems. Given this background, it is unsurprising that this work focuses on a protocol that has BFT-like guarantees with blockchain-esque communication complexity. In particular, the protocol tries to toe the line between a fully permissioned BFT system and a permissionless blockchain system. It does this by removing the all-to-all validity set of traditional BFT and replaces it with a localized set of “trusted nodes” that are used for validation. Let’s try to describe this visually:
At a high-level, the SCP achieves this by having users choose subsets of users that they trust and a user only amends a transaction if it gets unanimous agreement among these subsets. Why does this provide an improvement?

Recall that in traditional BFT, we require at least 2/3 of validators to agree before a transaction can be amended. In the first drawing, we see a network of five nodes that communicate to each other via the graph in the drawing. In order for any node to decide whether or not to add a transaction to it’s local copy of a ledger, it must receive valid messages from all other nodes (e.g. at least 2/3 of nodes). This means that in order for consensus to be reached, there needs to be a network path from any pair of nodes, which implies that the communication cost is quadratic in the number of nodes [0]. Instead of focusing on the explicit communication network, as BFT protocols do, the SCP focuses instead on what Mazières terms a quorum slice [1]. Intuitively, a quorum slice is a set of subsets of nodes that we need to reach agreement with before we can amend a transaction to our local ledger. Importantly, this subset does not need to be large (relative to the number of nodes). Moreover, a single node usually participates in multiple quorum slices and this is specified via a quorum selection function, Q:N22NQ : N \rightarrow 2^{2^N}-\emptyset for a node set NN. For instance, in the lower portion of the diagram, we see that for node 1, there are two subsets that it can reach consensus with: {1,3,5}\{1,3,5\} and {1,2,5}\{1,2,5\}. The SCP protocol provides an algorithm for a node to only have to reach consensus with one element of its quorum slice. This means that node 1 only needs to reach consensus with {1,3,5}\{1,3,5\} or {1,2,5}\{1,2,5\} to make progress within its own ledger. The idea is that if the size of the set of subsets we have to reach consensus with is sublinear, then we can reach consensus while having subquadratic communication complexity. This seems like a free lunch!

Before continuing onto some of the other assumptions and constraints that are necessary to execute the SCP, let’s first look at how traditional BFT can be represented within this framework (and is a subset of the SCP). Suppose that we have a BFT protocol on a node set NN with N=n|N| = n. For each nNn \in N, define the following set of subsets:

BFT(n)={AN:nA,A=2n3}BFT(n) = \{ A \subset N : n \in A, |A| = \frac{2n}{3}\} 

These sets are the smallest sets that nn can reach BFT consensus with, allowing it to add a transaction to its local copy of a ledger. Then, any valid quorum selection function QBFT:N22NQ_{BFT} : N \rightarrow 2^{2^N} - \emptyset , must satisfy the following constraint:

nN:BFT(n)QBFT(n),SQBFT(n):nSS2n3\forall n \in N: \; BFT(n) \subseteq Q_{BFT}(n), \forall S \in Q_{BFT}(n): n \in S \wedge |S| \geq \frac{2n}{3}

Less formally, this says that in order for a node nNn\in N to reach consensus, it must have at least reached consensus with a set of size at least 2n3\frac{2n}{3} in order for it to add a transaction to its ledger.

At this point, a reader versed in distributed systems theory should be skeptical of the idea that a single node need only agree with a sublinear number of people in order to reach consensus. Indeed, there is no free lunch here — instead of explicitly quantifying the number of nodes that we need to agree with to reach consensus, the SCP instead provides necessary and sufficient conditions on the set of quorum slices such that consensus can be achieved. These conditions are topological in nature and correspond to connectivity properties of quorum slices. We will present visualizations of these conditions. Firstly, let’s consider the concept of a quorum, which is a set of nodes and quorum slices that have a sort of “transitive closure” property. In a sense, this means that U is a self-consistent set — the nodes do not need input from any other nodes to reach consensus. Here’s a picture to illustrate this:


In the left panel, we see three nodes and a set of quorum slices such that each slice is contained within the set of three nodes. On the right, we see a set of five nodes such that the same set, {1,2,3}\{1,2,3\} is not self-consistent. We need to add nodes 4 and 5 in order to create a quorum. Consensus protocols all specify (sometime implicitly) a minimum quorum size for consensus [e.g. 2n3\lceil\frac{2n}{3}\rceil for BFT and n2\lceil\frac{n}{2}\rceil for Nakamoto]. However, the key to SCP is that each node chooses its own notion of ‘size needed to reach consensus’. Moreover, since each node can be contained within multiple quorum slices, there is also redundancy — we don’t rely on a single set of nodes for agreement, but rather we rely on just achieving consensus with one. This gives us two parameters to play with: average size of a quorum slice and the number of slices associate to a single node. In BFT and Nakamoto, both of these parameters are fixed; the SCP provides minimal conditions for reaching consensus so that a protocol developer can tune these two parameters. 

The topological conditions that the SCP provides are reminiscent of definitions in elementary point-set topology. In particular, the most important property that is required for consensus is called the quorum intersection property (QIP), which says that any two quora have non-empty intersection (reminiscent of compactness, no?). This strong condition implies that one can get from any node to another node via quora. If we rephrase this in human terms and let sets of friends represent a quorum, it would mean that if I agree with my friends and one of my friends agrees with your friends, then I will agree with your friends. The beauty of this type of formalism is that it provides a mechanism for agreement to propagate between nodes, regardless of the underlying network graph. It also formalizes how nodes build in redundancy to avoid Eclipse attacks without resorting to describing the explicit peer-to-peer network graph. With this framework, one can reason about security assumptions, such as Algorand’s liveness assumption, in a much cleaner manner as the quora size distribution and overlap controls security. However, there is still no free lunch — we have to construct a quorum selection function that satisfies all of these constraints in order for the protocol to be useful. This has been the main criticism of the protocol, as it is seems extremely hard to construct such a quorum selection function (indeed, doing it algorithmically might be #P-complete)

Finally, we can speak to the protocol itself: the SCP is a modified version of the “single decree synod” of Paxos and provides a way for transactions to be nominated, tentatively accepted, and then finalized via local agreement with one’s quora. The main assumption needed to ensure that this modified Paxos algorithm achieves liveness and persistence is that the set of quora still satisfies the QIP when we remove the set of Byzantine nodes. There are a variety of details in the paper about the precise state transitions that take place within the system — the only thing we will need to know is that Avalanche (defined below) also has a notion of a bivalent state, which is when a network partition causes part of the network to believe that statement a is true and another part to believe a is false. Avalanche and Stellar both provide explicit solutions for the network to regain consensus, even if the network is trapped within a bivalent state. Now onto Avalanche!

Avalanche is a clever reuse of the heavy-hitters algorithm (e.g. generalization of cardinality estimation algorithms like HyperLogLog) to generate a probabilistic consensus algorithm [2]. The core consensus idea is very simple and has far less complexity than the SCP when trying to agree on whether a statement AA is either true or false (represented as A\overline{A}): 
  1. Initialize a table that stores your confidence in either AA or A\overline{A}
  1. Repeatedly query k other users for their decision on AA or A\overline{A}
  1. Increment your confidence in A,AA, \overline{A} when a k-sample votes for A,AA, \overline{A} (respectively)
  1. Once you have overwhelming confidence in either AA or A\overline{A}, accept that choice
This methodology achieves consensus by relying on ‘metastability’ — once you get enough samples, you are ‘forced’ into a decision of truthiness, just as you are forced into a minima of a potential energy function when you are at a metastable critical point (e.g. ball 2) [3]:
Although Avalanche cites the SCP paper, they do not mention that metastability is exactly the same as bivalency in Stellar. 

To go from consensus to a protocol, Avalanche stores transactions in a DAG and decides on whether a new path in the DAG is the one accepted by most participants via this consensus mechanism. If we ignore a few technical details, this is starting to look reminiscent of the SCP: 
  • We have randomized quorum slices (e.g. our queries to k other participants)
  • Each query to k participants corresponds to a new quorum slices
  • With enough samples from ([n]k)\binom{[n]}{k}, the set of k-element subsets, we get close to covering the whole space which should generate quora (with high probability)
  • At some point, we should have a high-enough density of quora that we satisfy the QIP with high probability