*verkle* trie

257-ary *Disclaimer: the code in this package is experimental. It can only be used in research and is not suitable for use in production. The trusted setup must be created in a secure environment. This is a responsibility of the user.*

## General

The repository contains an implementation of the so-called *verkle tree* as a 257-ary trie, a prefix tree. Here's an article explaining what are verkle trees. In short, these are significantly more efficient replacement for Merkle trees. *Verkle trees* uses *polynomial KZG (aka Kate) commitments* for *vector commitments* instead of usual hash function. The approach offers significant advantages.

The implementation uses *trusted setup* completely in Lagrange basis. Please find here all math and formulas as well as references to the articles it is based upon.

The implementation mostly follows the structure of the Merkle Patricia Tree. Instead of being *hexary*, it commits to 257-long vector in each node: 256 possible values of the byte plus one for an optional commitment to the terminal value in each node. So constant `d = 257`

in the trusted setup, hence the *257-ary trie*.

Keys in the trie are of arbitrary length. They are prefixes of the keys from the state's key/values storage. So, the structure of the trie follows the hierarchical structure of the state. This allows commit to partitions of the state and results in shorter keys, more predictable and slow-changing structure of the trie. Any key in the trie can point to a terminal value and same time it can be a prefix in other keys. As it is seen from the implementation, the special 257th "character" does not introduce any significant overhead.

## Repository and dependencies

The repository contains:

`kzg`

package with the implementation of the*KZG commitments*and the*trusted setup*.`kzg_setup`

, the CLI program to create a*trusted setup*from a secret and store it into the file.`trie`

package contains implementation of the*trie*as well as corresponding tests and benchmarks.

The implementation of *KZG commitments* uses DEDIS Advanced Crypto Library for Go Kyber v3 and its `bn256`

bilinear pairing suite as cryptographic primitives. The implementation follows formulas presented in this article.

## Implementation

### The state

The state is assumed to be an arbitrary collection of the key/value pairs. Empty key (`nil`

or `""`

) in the implementation is a valid key. The state assumes the empty key always contains serialized binary value of the *trusted setup* upon which the commitments are calculated. So, you can always check if the root commitment contains the commitment to the *trusted setup* itself.

**Determinism of the state**: the state is a **set** of key/value pairs, i.e. no matter the order of how those key/value pairs were added to the storage and trie, the state (and the commitment to it) is the same.

The key/value store is and implementation of `trie.KVStore`

interface.

The state is implemented as `trie.State`

. It contains partitions for key/values pairs of the state and for the trie itself. It also contains the cache for keeping nodes being updated during bulky state update operations to make them atomic.

### The trie

The trie is represented as a collection of key/value pairs in the `trie`

partition of the state. Each key/value pair in the trie represents a *node* of the trie in serialized form.

```
type Node struct {
pathFragment []byte
children [256]kyber.Point
terminalValue kyber.Scalar
}
```

Each node can keep commitments to its up to 256 children and to the terminal value as one vector.

The *ith* child has a commitment to it in `children[i]`

or `nil`

if there's not commitment to it. Commitment is represented by `kyber.Point`

interface which here is a point on the curve `G1`

.

The commitment to the terminal value, if exists, is not `nil`

and is equal to the `blake2b`

hash of the data itself, adjusted to the underlying field of the curves. It represented by `kyber.Scalar`

interface.

Each *node* represents a *vector* `V = (v0, v1, ...., v256)`

of length 257. Value of `vi`

is 0 if value of the underlying commitment is `nil`

. Otherwise, `v256`

corresponds to the terminal value and other `vi`

are `blake2b`

hashes of commitments adjusted to the field.

Commitment to the node is the commitment to the vector `V`

.

The `pathFragment`

is a slice (can be empty) of bytes taken from the key of the key/value pair in the state.

Let's say the node `N`

is stored in the trie under the key `K`

. Concatenation `P = K || N.pathFragment`

means the following:

- if
`N`

contains commitment to the terminal value`V`

, the`P`

is the key of that value in the state:`P: V`

. - for any not
`nil`

child with index`0 <= i < 256`

, the`Pi = P || {i} = K || N.pathFragment || {i}`

is the key of the node with the*vector*of commitments of the child. Here`{i}`

is a slice of one byte with value`i`

.

So, whenever we need a proof for the key/value pair `K: V`

in the state, we start from the empty key which corresponds to the root node and then recursively follow the path by concatenating corresponding `pathFragment`

values and picking corresponding byte of the next child in each node. The process is finished when we reach our key and the corresponding node which contains commitment of the terminal value `V`

.

## Example

Let's say we have the following key/value pairs in the state:

` "": `
"abra": "something"
"abrakadabra": "anything"
"abra+": "314"
"[email protected]": "217"
"abra-+" "42"

The resulting 257-ary verkle trie will look like this:

## Benchmarks

On *Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz* laptop.

- building a trie:
*0.67 ms*per 1 added key/value pair - generating proof from the state in memory with 100000 keys:
*168 ms per proof of 1 key/value*. It is an expensive operation because always requires`K x 257`

operations on the curve, where`K`

is number of nodes in the proof. - verifying proof (state with 100000 keys):
*12.6 ms per verification*

Trie size estimates:

With 100000 key/value pairs in the state generated uniformly randomly with max key size `70`

:

- keys/nodes in the trie:
*128648*(excess factor*28%*) - average length of the key in the trie:
*2.6 bytes* - average number of children in the node:
*1.78* - number of nodes with only terminal value (no children):
*98685*(*76.7%*) - 96% of nodes has 3 or fewer children
- distribution of length of proofs:
*22%*have length 3,*78%*have length 4

With 100000 key/value pairs in the state generated with max key size *60* assuming realistic patterns of the state of the IOTA Smart Contract chain: first 4-6 bytes identified partition of the smart contract.

- keys/nodes in the trie:
*107940*(excess factor*7%*) - average length of the key in the trie:
*6.07 bytes* - average number of children in the node:
*1.93* - number of nodes with only terminal value (no children):
*98497*(*91.2%*) - 96% of nodes has 1 or 2 children
- distribution of length of the proof:
*38%*have length*4*,*49%*have length*5*,*13%*have length*6*

The following chart shows size of keys in the trie partition in bytes:

The keys are very short due to the big width of the tree.

## TODO

- The function to remove a key/value pair from the state is not implemented yet.
- Optimize on the pattern when most (90%+) nodes are nodes which commits to terminals

## Links

- Constant-Size Commitments to Polynomials and Their Applications, the original paper
- KZG polynomial commitments by Dankrad Feist
- PCS multiproofs using random evaluation by Dankrad Feist
- Verkle trees by Vitalik Buterin
- Kate Commitments: A Primer
- DEDIS Advanced Crypto Library for Go Kyber v3
- Modified Merkle Patricia Trie Specification
- Experimental KZG implementation. Math of this implementation explained in detail

## Acknowledgements

Special thanks to *Dankrad Feist* for pointing out my crypto-mathematical mistakes