universal function approximator so I like to think that gradient descent is smarter than me
Ashish Vaswani