goal - reduce the number of keys that need to be moved to new servers when a server is taken down or added

the gist is we hash both the object keys and the server names onto the same ring; a key is owned by the first server clockwise from its hash, so removing a server only remaps the keys it owned.
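
a minimal sketch of the ring idea (my own illustration, not from any particular library; md5 as the hash and the server names are arbitrary choices):

```python
import bisect
import hashlib

def h(s: str) -> int:
    # hash a string to an integer position on the ring
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, servers):
        # place each server on the ring at the hash of its name
        self.points = sorted((h(s), s) for s in servers)
        self.hashes = [p for p, _ in self.points]

    def lookup(self, key: str) -> str:
        # first server clockwise from the key's hash (wrap around at the end)
        i = bisect.bisect(self.hashes, h(key)) % len(self.points)
        return self.points[i][1]

ring = Ring(["server-a", "server-b", "server-c"])
print(ring.lookup("user:42"))
```

with this layout, dropping "server-b" only moves the keys that hashed between "server-a" and "server-b"; everything else stays put.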

challenge - uneven distribution. mitigation - virtual nodes: as the number of virtual nodes increases the distribution becomes more even, akin to taking more random samples. sketch below.
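
rough sketch of virtual nodes (again my own illustration; each physical server gets `vnodes` points on the ring by hashing made-up names like "a#3", and the key counts just eyeball how the spread evens out as vnodes grow):

```python
import bisect
import hashlib
from collections import Counter

def h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class VRing:
    def __init__(self, servers, vnodes=100):
        # each server contributes vnodes points, spreading its share of the ring
        self.points = sorted(
            (h(f"{s}#{i}"), s) for s in servers for i in range(vnodes)
        )
        self.hashes = [p for p, _ in self.points]

    def lookup(self, key: str) -> str:
        i = bisect.bisect(self.hashes, h(key)) % len(self.points)
        return self.points[i][1]

# distribution of 10k keys across 3 servers, for increasing vnode counts
for v in (1, 10, 100):
    ring = VRing(["a", "b", "c"], vnodes=v)
    counts = Counter(ring.lookup(f"key-{k}") for k in range(10_000))
    print(v, dict(counts))
```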

read further

  1. Load Imbalance
  2. Hotspots
  3. Data movement during scale down and scale up

Questions

  1. Why can't Kafka use consistent hashing to reduce the number of partitions, provided the offsets are the same?
