State commitments and storage review

State commitments and storage review
Author: Robert Zaremba <robert [at] regen.network>

Introduction

Currently, Cosmos SDK uses IAVL for both state commitments and storage.  Unfortunately, the IAVL project is not well maintained and has performance issues. It’s stated that it’s the second biggest bottleneck in Cosmos:
  • Zaki mentioned: “IAVL is very problematic in it's IO patterns. After Amino, it's probably the largest Cosmos SDK performance bottleneck. It also uses a lot of indirection that makes look ups very inefficient.”
  • Snapshots are very expensive (IAVL is a huge bottleneck).  
The IAVL performance comes from not well optimized cache layers and access patterns. Moreover, updates in IAVL may trigger tree reorganization and possible O(log(n)) hashes re-computation, which can become a CPU bottleneck.

Instead of optimizing the IAVL, we are looking in another solutions for both storage and state commitments.

Basic requirements and an approach

Cosmos SDK blockchains requires:
  • Database - a storage layer to save a current state.
  • State commitment - an efficient mechanism to commit to a set of records and provably reveal one or many records at specific positions at a later time. Efficient means:
  • generating a commitment for a data set should be in polynomial time (usually we want linear time), 
  • generating a proof for record inclusion in the state must be polylograithmic.
  • validating a proof must be polylogarithmic.
  • State commitment is stored in each block header as an `AppHash` and it’s used to commit and validate the state of the storage up to the given block. Usually it’s implemented as a some flavor of a Merkle Tree (here IAVL).

Given the two requirements above are not coupled. We can gain a lot of performance for using a storage with highly optimized read and write operations and keeping the state commitments in a separate  dedicated structure. In fact, in Cosmos SDK the storage is kept separate from the state commitment using tm-dm abstraction (DB: reduce supported dbs #6406)

Functional Requirements

Storage has to provide:
  1. Quick index access
  1. Key / value update
  1. Prefix iteration

Sate commitment:
  1. Record inclusion
  1. Record exclusion
  1. Range proofs (proof for a sequence of records).
We can solve record exclusion and range proofs with a state commitment schema is a vector commitment (we commit to a sequence — an order). If we proof that [a, b]  is in a vector commitment, then there is no a < x < b such that [a, b, c] is in a vector commitment.

Storage commitment structures.

Here we overview few recent approaches for State Commitments and investigate if we can use it in Cosmos SDK.

MMR

Mountain Merkle Range Tree (MMR) is introduced in the FlyClient paper.
MMR has the following properties:
  • P1 Balanced binary search tree (search time is O(log(n)).
  • P2 Is a Merkle Tree (parent node is a commitment to children). 
  • P3 insertion is O(log(n)).
Restrictions:
  • R1 Is an append only data structure. 
  • R2 Key insertion order must be strictly increasing (if k_i is inserted after k_j, then k_i > k_j).
n = total number of elements in the tree.

MMR in FlyClient is used for a whole blockchain commitment (a chain of all blocks). The key of each blocks is a total difficulty of the blockchain. This satisfies the key insertion restriction. MMR allows efficient appends at the prover side and efficient block inclusion verification at the verifier side. Further, it enables efficient sub-tree proofs, a proof that two MMRs agree on the first k leaves. At every block height i, the prover appends the hash of the previous block, B_{i−1} , to the most recent MMR and records the new MMR root, M_i , in the header of B_i . As a result, each MMR root stored at every block height can be seen as a commitment to the entire blockchain up to that height.

Conclusion
Although the implementation guarantees optimal tree operations (no rotations, perfectly balanced binary search tree), we can't use it for state commitments because we can't guarantee [R2] restriction.
Is there any way we can benefit from MMR?
YES! MMR is great for light client implementation.
At it's base, each block has a root hash of the MMR containing all the previous blocks. With this once a block B_n is validated by a light client, he automatically can automatically verify any previous by himself. This is how it works: