marp
true

Kademlia

2019-04-26

nodes self sets a random unique ID (UUID)
nodes are grouped in neighbourhoods determined by the node ID distance
Kademlia uses distance calculation between two nodes
- distance is computed as XOR (exclusive or) of the two node IDs

with that last 3 properties we ensure that XOR
- captures all of the essential & important features of a "real" distance function
- is simple and cheap to calculate
each search iteration comes one bit closer to the target
- a basic Kademlia network with 2^n nodes will only take n steps (in worst case) to find that node

k-buckets
- k is a system wide number
- every k-bucket is a list having up to k entries inside
- example:
  - network with k=20
  - each node will have lists containing up to 20 nodes for a particular bit
possible nodes for each k-bucket decreases quickly
- as there will be very few nodes that are that close
since quantity of possible IDs is much larger than any node population, some of the k-buckets corresponding to very short distances will remain empty

Each node knows its neighbourhood well and has contact with a few nodes far away which can help locate other nodes far away.
Kademlia priorizes long connected nodes to remain stored in the k-buckets
- as the nodes that have been connected for a long time in a network will probably remain connected for a long time in the future

when a k-bucket is full and a new node is discovered for that k-bucket
- node sends a ping to the last recently seen node in the k-bucket
- if the node is still alive, the new node is stored in a secondary list (a replacement cache)
  - replacement cache is used if a node in the k-bucket stops responding
- basically, new nodes are used only when older nodes disappear

PING
STORE
FIND_NODE
FIND_VALUE Each rpc msg includes a random value from the initiator, to ensure that the response corresponds to the request

node lookups can proceed asynchronously
- α denotes the quantity of simultaneous lookups
- α tipically is 3
node initiates a FIND_NODE request to the α nodes in its own k-bucket that are closest ones to the desired key

when the recipient nodes receive the request, they will look in their k-buckets and return the k closest nodes to the desired key that they know
the requester will update a results list with the results (node IDs) that receives
- keeping the k best ones (the k nodes that are closer to the searched key)
the requester node will select these k best results and issue the request to them
the proces is repeated again and again until get the searched key

iterations continue until no nodes are returned that are closer than the best previous results
- when iterations stop, the best k nodes in the results list are the ones in the whole network that are the closest to the desired key
node information can be augmented with RTT (round trip times)
- when the RTT is spended, another query can be initiated
- always the query's number are <= α (quantity of simultaneous lookups)

data (values) located by mapping it to a key
- typically a hash is used for the map
locating data follows the same procedure as locating the closest nodes to a key
- except the search terminates when a node has the requested value in his store and returns this value

values are stored at several nodes (k of them)
the node that stores a value
- periodically explores the network to find the k nodes close to the key value
- to replicate the value onto them
- this compensates the disappeared nodes

avoiding "hot spots"
- for popular values (might have many requests)
- near nodes outside the k closest ones, store the value
  - this new storing is called cache
  - caching nodes will drop the value after a certain time
    - depending on their distance from the key
- in this way the value is stored farther away from the key
  - depending on the quantity of requests
- allows popular searches to find a storer more quickly
- alleviates possible "hot spots"
not all implementations of Kademlia have these functionallities (replicating & caching)
- in order to remove old information quickly from the system

to join the net, a node must first go through a bootstrap process
bootstrap process
- needs to know the IP address & port of another node (bootstrap node)
- compute random unique node ID number
- inserts the bootstrap node into one of its k-buckets