How does the Expert get so smart? It learns through simple, direct feedback. During training, if the Expert advises “halt” and the answer is correct, it gets a reward of +1; if the answer is wrong, it gets 0. Over many training examples, this teaches the Expert to advise halting only when the CEO’s internal state is truly coherent and the puzzle is solved.
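To make the feedback rule concrete, here is a minimal Python sketch. Only the +1 / 0 reward comes from the description above. The names `halt_reward` and `update_halt_value`, the learning rate, and the two hand-labeled states are hypothetical and chosen purely for illustration; the real Expert is a learned component, and its actual update rule may differ.

```python
def halt_reward(answer_is_correct: bool) -> float:
    """Reward when the Expert advises "halt": 1 if the answer is correct, else 0."""
    return 1.0 if answer_is_correct else 0.0


def update_halt_value(current_value: float, reward: float,
                      learning_rate: float = 0.1) -> float:
    """Nudge the estimated value of halting toward the observed reward.

    Assumption: a simple running-average update, used here only to show
    how repeated feedback shapes the Expert's confidence.
    """
    return current_value + learning_rate * (reward - current_value)


# Hypothetical toy run with two hand-labeled kinds of internal state.
halt_values = {"coherent": 0.5, "incoherent": 0.5}

for _ in range(50):
    # Assumption: halting in a coherent state yields a correct answer,
    # and halting in an incoherent state yields a wrong one.
    halt_values["coherent"] = update_halt_value(
        halt_values["coherent"], halt_reward(True))
    halt_values["incoherent"] = update_halt_value(
        halt_values["incoherent"], halt_reward(False))

print(halt_values)
```

In this toy run the estimated value of halting climbs toward 1 for the coherent state and falls toward 0 for the incoherent one, which is the behavior the reward is meant to encourage: confidence in “halt” only when the answer is likely to be right.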