Storm v3 read pipeline design
  • This paper is a draft of the Storm v3 read design.

  • This document describes the various components of the read pipeline and how they would work together. Knowledge of the current version of Storm (v2) is necessary to understand the concepts described in this document.

The problem Storm is trying to solve is performing complex queries on arbitrary types without too much boilerplate.

Dealing with unknown types in Go can be done in two ways:
  • Reflection
  • Non-empty interfaces

Storm v2 uses reflection heavily and it works fine. It has some drawbacks though:
  • It’s slower and less memory efficient than using concrete types directly
  • The code is hard to read and to maintain
  • No compile time checks
  • The public API seems "magic" with a lot of interface{} parameters

Non-empty interfaces, on the other hand, are great because they abstract the model and focus on behavior.
But a library like Storm needs to perform too many actions on a single model. Having either too many interfaces to implement or one big interface would require the user to write too much boilerplate for Storm to be worth using.

Also, even though Storm doesn’t know anything about the types it will receive, the user knows exactly which types they are using. Even if the rest of their code uses strong typing, they lose the guarantees of strong typing when using Storm.

With that in mind, Storm v3 will try to combine the best of the approaches above.

It will focus on having a read pipeline that will:
  • use reflection by default
  • allow customization on various parts of the pipeline
  • allow strong typing through code generation 


The read pipeline is the code that will execute user queries.
To simplify the visualisation, SQL queries will be used to illustrate what a user can do in Storm, even though Storm will not support them.


A Cursor is an object used to scan a bucket. Every time it is called, it returns the next record. In its simplest form, it can be represented as this interface:

type Cursor interface {
  Next() (key []byte, value []byte)
}

Bolt provides a cursor that scans a bucket from an arbitrary position. This will be the default cursor for Storm and it will be used to scan the entire bucket.
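To illustrate the interface, here is a minimal in-memory cursor (a stand-in for Bolt's bucket cursor, so the example stays self-contained; the sliceCursor type is hypothetical, not Storm's API). Like Bolt's cursor, it signals exhaustion by returning a nil key:

```go
package main

import "fmt"

// Cursor returns the next record on each call; a nil key signals exhaustion.
type Cursor interface {
	Next() (key []byte, value []byte)
}

// sliceCursor is a hypothetical in-memory implementation used for illustration.
type sliceCursor struct {
	keys, values [][]byte
	i            int
}

func (c *sliceCursor) Next() ([]byte, []byte) {
	if c.i >= len(c.keys) {
		return nil, nil // exhausted
	}
	k, v := c.keys[c.i], c.values[c.i]
	c.i++
	return k, v
}

func main() {
	c := &sliceCursor{
		keys:   [][]byte{[]byte("a"), []byte("b")},
		values: [][]byte{[]byte("1"), []byte("2")},
	}
	// Typical consumption loop: iterate until the key is nil.
	for k, v := c.Next(); k != nil; k, v = c.Next() {
		fmt.Printf("%s=%s\n", k, v)
	}
}
```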

A note on indexes
Cursors can also be generated: a cursor can, for example, hold a list of keys to fetch and return one on each call to Next(). This is how the index optimisation will work, pre-selecting a list of keys to narrow the scan of the bucket. We will detail this part when we tackle the index optimisation algorithm.
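A sketch of such a pre-selected cursor, using a plain map as a stand-in for a Bolt bucket (keyListCursor and its fields are assumptions for illustration, not Storm's API):

```go
package main

import "fmt"

// keyListCursor holds keys pre-selected by an index lookup and fetches each
// one from the bucket on demand, so only candidate records are visited.
type keyListCursor struct {
	bucket map[string][]byte // stand-in for a Bolt bucket
	keys   [][]byte          // keys chosen by the index lookup
	i      int
}

func (c *keyListCursor) Next() ([]byte, []byte) {
	for c.i < len(c.keys) {
		k := c.keys[c.i]
		c.i++
		if v, ok := c.bucket[string(k)]; ok {
			return k, v
		}
		// key no longer present in the bucket; skip it
	}
	return nil, nil // exhausted
}

func main() {
	bucket := map[string][]byte{
		"u1": []byte("john"),
		"u2": []byte("jane"),
		"u3": []byte("jack"),
	}
	// Only u1 and u3 were selected by the index, so u2 is never scanned.
	c := &keyListCursor{bucket: bucket, keys: [][]byte{[]byte("u1"), []byte("u3")}}
	for k, v := c.Next(); k != nil; k, v = c.Next() {
		fmt.Printf("%s=%s\n", k, v)
	}
}
```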

Bucket scan

This is the main loop when scanning a bucket. Each record returned by the cursor will be passed to the matching function alongside the matchers. If the record matches, it gets added to a sink, otherwise it is ignored.
The decoding will be done lazily inside the matching because we might not necessarily need to decode a record to match it. We can imagine matchers that match on the raw data instead of the decoded version of the record (e.g. gjson, matchers using the bytes package, etc.).
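The loop above could be sketched as follows. The Matcher and Sink interfaces are assumptions made for this sketch, not Storm's final API; the example matcher works on the raw bytes (via the bytes package), so no decoding happens at all:

```go
package main

import (
	"bytes"
	"fmt"
)

// Cursor yields records; a nil key means the cursor is exhausted.
type Cursor interface {
	Next() (key, value []byte)
}

// sliceCursor is an in-memory stand-in for Bolt's bucket cursor.
type sliceCursor struct {
	keys, values [][]byte
	i            int
}

func (c *sliceCursor) Next() ([]byte, []byte) {
	if c.i >= len(c.keys) {
		return nil, nil
	}
	k, v := c.keys[c.i], c.values[c.i]
	c.i++
	return k, v
}

// Matcher tests a raw record; hypothetical interface for this sketch.
type Matcher interface {
	Match(key, value []byte) bool
}

// rawContains matches on raw bytes with the bytes package: no decoding needed.
type rawContains struct{ sub []byte }

func (m rawContains) Match(_, value []byte) bool {
	return bytes.Contains(value, m.sub)
}

// Sink collects matching records; also hypothetical.
type Sink interface {
	Add(key, value []byte)
}

type listSink struct{ keys []string }

func (s *listSink) Add(k, _ []byte) { s.keys = append(s.keys, string(k)) }

// scan is the main loop: each record from the cursor is tested against the
// matcher; matches go to the sink, everything else is ignored.
func scan(c Cursor, m Matcher, s Sink) {
	for k, v := c.Next(); k != nil; k, v = c.Next() {
		if m.Match(k, v) {
			s.Add(k, v)
		}
	}
}

func main() {
	c := &sliceCursor{
		keys:   [][]byte{[]byte("u1"), []byte("u2")},
		values: [][]byte{[]byte(`{"name":"john"}`), []byte(`{"name":"jane"}`)},
	}
	sink := &listSink{}
	scan(c, rawContains{sub: []byte("john")}, sink)
	fmt.Println(sink.keys)
}
```

A decoded-record matcher would implement the same Matcher interface but unmarshal the value inside Match, which is what makes the lazy-decoding choice possible.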