goal - reduce the number of keys that need to be moved between servers when a server is taken down or added
the gist is we hash both the object keys and the server names onto the same ring; a key is stored on the first server found moving clockwise from its hash.
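a minimal sketch of the idea (my own names and choices: md5 for hashing, bisect for the clockwise lookup):

```python
import bisect
import hashlib

def h(key: str) -> int:
    # hash a string to a point on the ring
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, servers):
        # servers are hashed onto the same ring as the keys
        self.ring = sorted((h(s), s) for s in servers)
        self.points = [p for p, _ in self.ring]

    def get_server(self, key: str) -> str:
        # a key belongs to the first server clockwise from its hash
        i = bisect.bisect_right(self.points, h(key)) % len(self.ring)
        return self.ring[i][1]

ring = Ring(["s0", "s1", "s2"])
print(ring.get_server("user:42"))
```

removing or adding one server only reassigns the keys that fall in that server's arc; everything else stays put.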
challenge - uneven distribution. mitigation - virtual nodes: each server is hashed to many positions on the ring. as the number of virtual nodes increases the distribution becomes more even, akin to taking more random samples (sketch below).
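a rough sketch of virtual nodes (names and vnode counts are illustrative, not from any particular library):

```python
import bisect
import hashlib
from collections import Counter

def h(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def build_ring(servers, vnodes):
    # hash "server#i" for each virtual node instead of the server name once
    return sorted((h(f"{s}#{i}"), s) for s in servers for i in range(vnodes))

def owner(ring, key):
    points = [p for p, _ in ring]
    i = bisect.bisect_right(points, h(key)) % len(ring)
    return ring[i][1]

servers = [f"s{i}" for i in range(4)]
keys = [f"key{i}" for i in range(10_000)]
for vnodes in (1, 10, 100):
    ring = build_ring(servers, vnodes)
    counts = Counter(owner(ring, k) for k in keys)
    print(vnodes, dict(counts))  # spread evens out as vnodes grows
```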
read further
- Load Imbalance
- Hotspots
- Data movement during scale down and scale up
- reduce disruption
- Why can't Kafka use consistent hashing to reduce the number of partitions that need to be moved, provided the offsets stay the same?