Follow along at:
Learning from data
Features of the data are extracted/engineered
Training data is used to infer model parameters
Test data is used to evaluate accuracy
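A minimal sketch of this workflow, assuming hypothetical toy data (two Gaussian blobs) and a nearest-centroid classifier chosen only for brevity. The model parameters here are simply one mean vector per class, inferred from the training split; accuracy is measured on examples the model never saw.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: two classes, each a Gaussian blob in a 2-D feature space.
n = 200
X = np.vstack([
    rng.normal(loc=-2.0, size=(n, 2)),
    rng.normal(loc=2.0, size=(n, 2)),
])
y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])

# Shuffle, then hold out 25% of the examples as test data.
idx = rng.permutation(len(X))
n_test = len(X) // 4
test_idx, train_idx = idx[:n_test], idx[n_test:]

# Training: infer the parameters (one mean vector per class) from training data only.
X_train, y_train = X[train_idx], y[train_idx]
centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in (0, 1)])

# Testing: predict the nearest centroid and measure accuracy on unseen examples.
dists = np.linalg.norm(X[test_idx][:, None, :] - centroids[None, :, :], axis=2)
pred = dists.argmin(axis=1)
accuracy = (pred == y[test_idx]).mean()
print(f"test accuracy: {accuracy:.3f}")
```

Keeping the test indices out of the parameter estimate is the point of the split: accuracy on the training data alone would overstate how well the model generalizes.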
A family of machine learning algorithms
How would the error change if we changed the activity in each unit just a little?
How much would a change in each weight affect the activity in layer L?
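These two questions are the chain rule in words. As an illustrative sketch, assume a single layer of sigmoid units with a squared-error loss (both are assumptions for the example, not taken from the slides). The code computes the gradients analytically and then checks them by nudging each weight "just a little".

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.normal(size=3)        # activity in the layer below
W = rng.normal(size=(2, 3))   # weights into layer L
t = np.array([0.0, 1.0])      # hypothetical target

def error(W):
    a = sigmoid(W @ x)        # activity in layer L
    return 0.5 * np.sum((a - t) ** 2)

# Chain rule.
a = sigmoid(W @ x)
dE_da = a - t                       # how the error changes with each unit's activity
da_dz = a * (1.0 - a)               # how each activity changes with its summed input
dE_dW = np.outer(dE_da * da_dz, x)  # how the error changes with each weight

# Numerical check: perturb each weight slightly and watch the error.
eps = 1e-6
numeric = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[i, j] += eps
        W_minus[i, j] -= eps
        numeric[i, j] = (error(W_plus) - error(W_minus)) / (2 * eps)

max_diff = np.max(np.abs(dE_dW - numeric))
print(f"largest gap between analytic and numeric gradient: {max_diff:.2e}")
```

The numerical version needs two forward passes per weight, which is why the analytic chain-rule form is what gets used in practice.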
Choose a random batch from your training data
Calculate the errors from the top of the network back to the first layer
Adjust the weights by a small amount
Repeat until convergence
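The four steps above can be sketched as a short training loop. This is an illustration under stated assumptions, not a reference implementation: the toy data, the one-hidden-layer tanh network, the learning rate, and the batch size are all hypothetical, and a fixed step budget stands in for a real convergence test.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy regression data: the target is the product of two inputs.
X = rng.uniform(-1, 1, size=(256, 2))
y = (X[:, 0] * X[:, 1])[:, None]

# One hidden layer of 8 tanh units followed by a linear output.
W1 = 0.5 * rng.normal(size=(2, 8))
b1 = np.zeros(8)
W2 = 0.5 * rng.normal(size=(8, 1))
b2 = np.zeros(1)

def loss(X, y):
    h = np.tanh(X @ W1 + b1)
    return float(np.mean((h @ W2 + b2 - y) ** 2))

initial_loss = loss(X, y)
lr, batch_size = 0.1, 32  # assumed values, not tuned

for step in range(2000):  # fixed budget in place of a convergence check
    # 1. Choose a random batch from the training data.
    batch = rng.choice(len(X), size=batch_size, replace=False)
    xb, yb = X[batch], y[batch]

    # Forward pass.
    h = np.tanh(xb @ W1 + b1)
    out = h @ W2 + b2

    # 2. Calculate the errors from the top of the network back.
    d_out = 2.0 * (out - yb) / batch_size
    dW2 = h.T @ d_out
    db2 = d_out.sum(axis=0)
    d_h = d_out @ W2.T
    d_z = d_h * (1.0 - h ** 2)  # derivative of tanh
    dW1 = xb.T @ d_z
    db1 = d_z.sum(axis=0)

    # 3. Adjust the weights by a small amount.
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

final_loss = loss(X, y)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```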
Inspired by the visual system
Capitalize on spatial correlations in images
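To make the spatial-correlation idea concrete, here is a small sketch in which one kernel slides over every position of an image, so the same few weights are reused everywhere. The helper `conv2d_valid`, the 6x6 image, and the edge-detecting kernel are hypothetical examples. As in most deep learning code, the operation is technically cross-correlation (the kernel is not flipped).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with no padding and a stride of 1."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The same weights are applied to every local patch.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical image: dark left half, bright right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# A 1x2 kernel that responds to a dark-to-bright step between horizontal neighbors.
kernel = np.array([[-1.0, 1.0]])

response = conv2d_valid(image, kernel)
print(response)
```

The response is nonzero only in the column where the brightness changes, and the kernel has just two weights regardless of the image size. A fully connected layer over the same image would need a separate weight for every pixel.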