gokulnk reading list notes

The reward is also propagated back to each of the earlier steps, so that the model knows what to do at every step, not just the last one.
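As a rough illustration of what "propagated back" can mean, here is a minimal sketch using discounted returns. It assumes a single episode with a per-step reward list and a discount factor `gamma`. Both the function name `discounted_returns` and the discounting itself are assumptions for illustration, not something the note specifies. Methods such as PPO typically use advantage estimates rather than raw returns, so this is a simpler stand-in for the same idea.

```python
def discounted_returns(rewards, gamma=0.99):
    """Propagate rewards backwards so every step gets a learning signal.

    rewards: list of per-step rewards for one episode (hypothetical input).
    gamma:   discount factor (an assumption; the note does not name one).
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk the episode from the last step to the first, so each step's
    # return includes everything that happened after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# A sparse reward that only arrives at the final step...
episode_rewards = [0.0, 0.0, 0.0, 1.0]
# ...still yields a non-zero signal at every earlier step.
print(discounted_returns(episode_rewards, gamma=0.5))
# → [0.125, 0.25, 0.5, 1.0]
```

Steps closer to the reward receive more credit, and earlier steps receive a smaller but non-zero share.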

Referenced in:

basal-ganglia
PPO


© 2026, Site By @gokulnk