Data for the models to train:
- Llama is trained on 15T tokens.
- Public GitHub repos hold less than 1T tokens, so public code alone falls an order of magnitude short.
- Mitigations (sketched below): synthetic data, self-play, RL approaches (reinforcement learning).
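The mitigation list above is only keywords; as a rough illustration of how synthetic data via self-play might be produced, here is a minimal sketch of a rejection-sampling loop: a model proposes candidate solutions, an automatic verifier (unit tests here) filters them, and only verified samples are kept as new training data. All names (`generate`, `verify`, `TASKS`) are hypothetical placeholders, not a real API, and `generate` is stubbed so the sketch runs standalone.

```python
import random

# Hypothetical task set: each task has a prompt and a unit test.
TASKS = [
    {"prompt": "add(a, b): return the sum", "test": lambda f: f(2, 3) == 5},
    {"prompt": "neg(x): return the negation", "test": lambda f: f(4) == -4},
]

def generate(prompt: str) -> str:
    """Stand-in for sampling code from a model; returns a candidate solution."""
    candidates = {
        "add(a, b): return the sum": "def f(a, b):\n    return a + b",
        "neg(x): return the negation": "def f(x):\n    return -x",
    }
    # Occasionally emit a wrong solution so the filter has something to reject.
    if random.random() < 0.3:
        return "def f(*args):\n    return 0"
    return candidates[prompt]

def verify(code: str, test) -> bool:
    """Run the task's unit test against the generated code."""
    namespace = {}
    try:
        exec(code, namespace)
        return bool(test(namespace["f"]))
    except Exception:
        return False

def self_play_round(tasks, samples_per_task: int = 4):
    """One round: sample candidates, keep only verified ones as training data."""
    kept = []
    for task in tasks:
        for _ in range(samples_per_task):
            code = generate(task["prompt"])
            if verify(code, task["test"]):
                kept.append({"prompt": task["prompt"], "completion": code})
    return kept

if __name__ == "__main__":
    data = self_play_round(TASKS)
    print(f"kept {len(data)} verified synthetic samples")
```

In an RL variant, the verifier's pass/fail signal would serve as the reward instead of a hard filter; the loop structure stays the same.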