goal - reduce the number of keys that need to be moved to new servers when a server is taken down or added

the gist is we hash both the object keys and the server names onto the same ring; a key is owned by the first server clockwise from its hash, so removing a server only remaps the keys it owned.
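
a minimal sketch of the ring idea (my own illustration, not from any particular library; md5 as the hash and the server names are arbitrary choices):

```python
import bisect
import hashlib

def h(s: str) -> int:
    # hash a string to an integer position on the ring
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, servers):
        # place each server on the ring at the hash of its name
        self.points = sorted((h(s), s) for s in servers)
        self.hashes = [p for p, _ in self.points]

    def lookup(self, key: str) -> str:
        # first server clockwise from the key's hash (wrap around at the end)
        i = bisect.bisect(self.hashes, h(key)) % len(self.points)
        return self.points[i][1]

ring = Ring(["server-a", "server-b", "server-c"])
print(ring.lookup("user:42"))
```

with this layout, dropping "server-b" only moves the keys that hashed between "server-a" and "server-b"; everything else stays put.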

challenge - uneven distribution. mitigation - virtual nodes: as the number of virtual nodes increases the distribution becomes more even, akin to taking more random samples. sketch below.
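
rough sketch of virtual nodes (again my own illustration; each physical server gets `vnodes` points on the ring by hashing made-up names like "a#3", and the key counts just eyeball how the spread evens out as vnodes grow):

```python
import bisect
import hashlib
from collections import Counter

def h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class VRing:
    def __init__(self, servers, vnodes=100):
        # each server contributes vnodes points, spreading its share of the ring
        self.points = sorted(
            (h(f"{s}#{i}"), s) for s in servers for i in range(vnodes)
        )
        self.hashes = [p for p, _ in self.points]

    def lookup(self, key: str) -> str:
        i = bisect.bisect(self.hashes, h(key)) % len(self.points)
        return self.points[i][1]

# distribution of 10k keys across 3 servers, for increasing vnode counts
for v in (1, 10, 100):
    ring = VRing(["a", "b", "c"], vnodes=v)
    counts = Counter(ring.lookup(f"key-{k}") for k in range(10_000))
    print(v, dict(counts))
```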

read further

  1. Load Imbalance
  2. Hotspots
  3. Data movement during scale down and scale up

Questions

  1. Why can't Kafka use consistent hashing to reduce the number of partitions, provided the offsets are the same?
