You optimise over the values of all the parameters jointly. You are not tuning a single parameter, or a subset of parameters, in isolation; you are looking for the combination of values that is best when all parameters are considered at once.
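
A minimal sketch of the difference, assuming a made-up two-parameter loss in which the parameters are coupled. The loss function, the learning rate and the step count are all illustrative choices, not taken from any real model. Joint gradient descent updates both parameters on every step; the second function tunes only `x` and leaves `y` frozen.

```python
# Toy loss with coupled parameters (hypothetical, chosen for illustration).
# Its minimum is at x = 2, y = 1, where the loss is 0.
def loss(x, y):
    return (x + y - 3) ** 2 + (x - 2 * y) ** 2


def grad(x, y):
    """Partial derivatives of the loss with respect to x and y."""
    a = x + y - 3
    b = x - 2 * y
    return 2 * a + 2 * b, 2 * a - 4 * b


def joint_descent(x, y, lr=0.1, steps=200):
    """Update every parameter on every step."""
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y


def tune_x_only(x, y, lr=0.1, steps=200):
    """Update x alone while y stays at its starting value."""
    for _ in range(steps):
        gx, _ = grad(x, y)
        x = x - lr * gx
    return x, y


if __name__ == "__main__":
    xj, yj = joint_descent(0.0, 0.0)
    xi, yi = tune_x_only(0.0, 0.0)
    print("joint:      ", xj, yj, loss(xj, yj))
    print("x in isolation:", xi, yi, loss(xi, yi))
```

Starting from the same point, the joint version should settle near the true minimum, while the isolated version gets stuck at the best `x` for a `y` that was never adjusted. A neural network does the first thing, just with far more parameters.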

Do neural networks and LLMs do this for billions of parameters at once?

Does a neural network do this for one specific problem?

Since next-token prediction is generic enough, does it optimise for much broader use cases too?
