Storm v3 design: Second iteration
After experimenting a bit, I realized that some of Storm v2’s features may not be a good fit for the next version, and that I should rethink whether to keep them or not.
Codecs are one of them. Let’s experiment with a design without them.

Writes

To write something in a bucket, we need a Schema. It describes all the fields and their respective types for a specific bucket. Schemas are not tied to a specific data structure: they can be generated from a struct and used with a map, and the other way around.

Using a Schema, we can reason in terms of tables and store each field separately.

Since the Schema is typed, we can encode each field using the right method:
  • string → []byte(s)
  • int64 → binary.PutVarint
  • etc.

type Schema map[string]*Field

type Field struct {
    Name string
    Type FieldType // e.g. an enum of all the supported types?
}
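
To make this more concrete, here is a minimal sketch under a few assumptions: a FieldType that only knows strings and int64s, a schemaOf helper that derives a Schema from a struct, and an encodeValue helper that applies the per-type encoding listed above. All of these names are placeholders, not part of the design.

import (
    "encoding/binary"
    "fmt"
    "reflect"
)

// FieldType enumerates the supported types (assumed set).
type FieldType int

const (
    String FieldType = iota
    Int64
)

// schemaOf derives a Schema from a struct using reflection.
// Reflection happens only here, at schema-build time.
func schemaOf(v interface{}) (Schema, error) {
    t := reflect.TypeOf(v)
    if t.Kind() == reflect.Ptr {
        t = t.Elem()
    }
    if t.Kind() != reflect.Struct {
        return nil, fmt.Errorf("expected a struct, got %s", t.Kind())
    }
    s := make(Schema)
    for i := 0; i < t.NumField(); i++ {
        f := t.Field(i)
        var ft FieldType
        switch f.Type.Kind() {
        case reflect.String:
            ft = String
        case reflect.Int64:
            ft = Int64
        default:
            return nil, fmt.Errorf("unsupported field type %s", f.Type)
        }
        s[f.Name] = &Field{Name: f.Name, Type: ft}
    }
    return s, nil
}

// encodeValue encodes a single value using the method matching its type.
func encodeValue(t FieldType, v interface{}) ([]byte, error) {
    switch t {
    case String:
        return []byte(v.(string)), nil
    case Int64:
        buf := make([]byte, binary.MaxVarintLen64)
        n := binary.PutVarint(buf, v.(int64))
        return buf[:n], nil
    default:
        return nil, fmt.Errorf("unsupported field type %d", t)
    }
}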

The problem is that a bucket has one dimension, while a table has two.
Using one bucket per “column” would add too much overhead, though; what’s missing is the concept of a row.

Using Bolt’s NextSequence feature (see the sketch below), we can generate row IDs to:
  • virtually add a new dimension within a bucket
  • uniquely identify each row
  • group fields that belong to the same row
  • ensure all the fields are contiguous within a bucket

key: rowID + '-' + fieldName
value: byte representation of the field value

// example (pseudo data)
1-Name: "john"
1-Age: 2
2-Name: "jack"
2-Age: 29
...
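
Here is a rough sketch of that write path on top of the Bolt API (writeRow and its signature are assumptions; go.etcd.io/bbolt stands in for the Bolt import, and the values would come from something like encodeValue above):

import (
    "fmt"

    bolt "go.etcd.io/bbolt"
)

// writeRow stores one row as a set of keys sharing the same rowID
// prefix, all inside a single bucket.
func writeRow(tx *bolt.Tx, bucket string, fields map[string][]byte) (uint64, error) {
    b, err := tx.CreateBucketIfNotExists([]byte(bucket))
    if err != nil {
        return 0, err
    }
    // NextSequence is atomic within the write transaction.
    rowID, err := b.NextSequence()
    if err != nil {
        return 0, err
    }
    for name, value := range fields {
        key := fmt.Sprintf("%d-%s", rowID, name)
        if err := b.Put([]byte(key), value); err != nil {
            return 0, err
        }
    }
    return rowID, nil
}

Note that with decimal row IDs, the fields of a row stay contiguous (they share the same prefix), even though rows themselves are sorted lexicographically rather than numerically.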

Reads

That’s where all the benefit of reasoning with tables lies.
Decoupling the data from the data structure (struct or map) allows us to avoid reflection in the low-level parts of the code.
But the most interesting part comes next: we can now create a read pipeline that transforms “tables” into other in-memory tables.

// Table is a stream of rows that share the same Schema.
type Table interface {
    // Next returns the next row, or an error once the table is exhausted.
    Next() (Row, error)
    // Schema describes the fields of the rows returned by Next.
    Schema() (*Schema, error)
}
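
As an illustration, here is a sketch of one such pipeline stage, a filtering transform (Row, filterTable, and Filter are assumed names; the design above only defines Table):

// Row is an assumed representation of a single decoded row.
type Row map[string][]byte

// filterTable wraps a source Table and only yields the rows
// matching a predicate.
type filterTable struct {
    src  Table
    keep func(Row) bool
}

func Filter(src Table, keep func(Row) bool) Table {
    return &filterTable{src: src, keep: keep}
}

func (f *filterTable) Next() (Row, error) {
    for {
        row, err := f.src.Next()
        if err != nil {
            return nil, err // e.g. io.EOF once the source is exhausted
        }
        if f.keep(row) {
            return row, nil
        }
    }
}

// Filtering drops rows, not fields, so the Schema is unchanged.
func (f *filterTable) Schema() (*Schema, error) {
    return f.src.Schema()
}

Since Filter returns a Table, stages compose naturally: the output of one transform can feed the next one.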