- Case tree empty: AddBatch was 10.95x faster than without AddBatch
  nCPU: 4, nLeafs: 1024, hash: Poseidon, db: memory
  dbgStats(hash: 2.047k, dbGet: 1, dbPut: 2.049k)
- Case tree not empty w/ few leafs: AddBatch was 7.28x faster than without AddBatch
  nCPU: 4, nLeafs: 1024, hash: Poseidon, db: memory
  dbgStats(hash: 2.047k, dbGet: 198, dbPut: 2.049k)
- Case tree not empty w/ enough leafs: AddBatch was 5.94x faster than without AddBatch
  nCPU: 4, nLeafs: 1024, hash: Poseidon, db: memory
  dbgStats(hash: 2.047k, dbGet: 1.000k, dbPut: 2.049k)
- Case tree not empty: AddBatch was 9.27x faster than without AddBatch
  nCPU: 4, nLeafs: 4096, hash: Poseidon, db: memory
  dbgStats(hash: 8.191k, dbGet: 1.800k, dbPut: 8.193k)
- Case tree not empty & unbalanced: AddBatch was 10.67x faster than without AddBatch
  nCPU: 4, nLeafs: 4096, hash: Poseidon, db: memory
  dbgStats(hash: 10.409k, dbGet: 2.668k, dbPut: 10.861k)
- TestAddBatchBench: nCPU: 4, nLeafs: 50000, hash: Blake2b, db: badgerdb
  Add loop: 10.10829114s
  AddBatch: 732.030263ms
  dbgStats(hash: 122.518k, dbGet: 1, dbPut: 122.520k)
- TestDbgStats
  add in loop in emptyTree:  dbgStats(hash: 141.721k, dbGet: 134.596k, dbPut: 161.721k)
  addbatch caseEmptyTree:    dbgStats(hash: 24.402k, dbGet: 1, dbPut: 24.404k)
  addbatch caseNotEmptyTree: dbgStats(hash: 26.868k, dbGet: 2.468k, dbPut: 26.872k)
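
The numbers above compare adding the leafs one by one in a loop against a single AddBatch call. The sketch below only shows that comparison pattern; the Tree interface and the Add/AddBatch signatures are assumptions for illustration, not the package's exact API.

```go
package bench

import (
	"fmt"
	"time"
)

// Tree is a stand-in for the Merkle tree used in the benchmarks above
// (assumed methods; check the real package for the exact signatures).
type Tree interface {
	Add(key, value []byte) error
	AddBatch(keys, values [][]byte) error
}

// compare measures adding the leafs one by one versus in a single batch,
// printing timings in the same style as the benchmark output above.
func compare(loopTree, batchTree Tree, keys, values [][]byte) {
	start := time.Now()
	for i := range keys {
		if err := loopTree.Add(keys[i], values[i]); err != nil {
			panic(err)
		}
	}
	fmt.Println("Add loop:", time.Since(start))

	start = time.Now()
	if err := batchTree.AddBatch(keys, values); err != nil {
		panic(err)
	}
	fmt.Println("AddBatch:", time.Since(start))
}
```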
CASE D: Already populated Tree
==============================
- Use A, B, C, D as subtrees
- Sort the keys into buckets that share the initial part of the path (see the sketch below the diagram)
- For each subtree, add the new leafs that fall into its bucket
              R
             / \
            /   \
           /     \
          *       *
         / \     / \
        /   \   /   \
  L:   A     B C     D
      /\    /\ /\    /\
     ...   ... ...   ...
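
The bucket split in the second step can be sketched as follows: each key is assigned to a bucket according to the first bits of its path, so all keys in a bucket belong to the same subtree. The kv type, the splitInBuckets name, and the bit order used to derive the path are assumptions for illustration, not the package's actual code.

```go
package vtree

// kv pairs a key with its value (illustrative type).
type kv struct {
	k, v []byte
}

// splitInBuckets groups the key-values by the first log2(nBuckets) bits of
// the key, so every bucket shares the initial part of the path and can be
// built as an independent subtree. nBuckets is assumed to be a power of two,
// and the bit order (low bits of the first key byte first) is an assumption.
func splitInBuckets(kvs []kv, nBuckets int) [][]kv {
	buckets := make([][]kv, nBuckets)
	nBits := 0
	for 1<<nBits < nBuckets {
		nBits++
	}
	for _, e := range kvs {
		b := 0
		for i := 0; i < nBits; i++ {
			bit := (e.k[i/8] >> (i % 8)) & 1 // i-th bit of the key
			b |= int(bit) << i
		}
		buckets[b] = append(buckets[b], e)
	}
	return buckets
}
```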
buildTreeBottomUp splits the key-values into n buckets (where n is the
number of CPUs) and builds a subtree for each bucket in parallel. Once all
the subtrees are built, their roots are used as the keys of a new tree, so
the complete tree ends up being built from the bottom up, with everything
below the top log2(nCPU) levels computed in parallel. As a result, the tree
construction is parallelized up to almost the top level, dividing the build
time by nearly the number of CPUs.