A fixed number of “Thinking Sessions” (M-loop) would be incredibly inefficient. This is where HRM introduces its final, most elegant component: an Efficiency Expert who tells the company when it’s time to stop. In the paper, this is called ==Adaptive Computation Time (ACT)==, and it’s powered by Q-learning.

Referenced in:

All notes