Implement the backpropagation algorithm for neural networks and apply it to the task of handwritten digit recognition.
1. Neural Networks
Implement the backpropagation algorithm to learn the parameters of the neural network.
1.1 Visualizing the data

There are 5000 training examples.
Each training example is a 20 pixel by 20 pixel grayscale image of a digit.
The 20 by 20 grid of pixels is “unrolled” into a 400-dimensional vector.
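A minimal sketch of the unrolling step, written in Python rather than the Octave used elsewhere in these notes. The helper name `unroll`, the synthetic image, and the row-by-row ordering are assumptions made for illustration; the actual data set may order the pixels differently.

```python
def unroll(image):
    """Flatten a 2-D grid (a list of rows) into one flat list, row by row."""
    return [pixel for row in image for pixel in row]


# Hypothetical 20x20 "image": each pixel stores its own flat index.
image = [[r * 20 + c for c in range(20)] for r in range(20)]
x = unroll(image)

print(len(x))  # 400
```

With 5000 such vectors stacked as rows, the training set becomes a 5000 by 400 matrix.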
1.2 Model representation
The network has 3 layers: an input layer, a hidden layer and an output layer.
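To make the three-layer structure concrete, here is a hedged Python sketch of a forward pass. It assumes sigmoid activations and weight matrices whose first column holds the bias weights; the names `layer_forward`, `forward`, `theta1` and `theta2` are illustrative and do not come from the notes.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(theta, a_prev):
    """Compute one layer's activations.

    theta is a list of rows; each row is [bias weight, w_1, ..., w_n].
    """
    a_with_bias = [1.0] + a_prev  # prepend the bias unit
    return [sigmoid(sum(w * a for w, a in zip(row, a_with_bias)))
            for row in theta]


def forward(theta1, theta2, x):
    """Input layer -> hidden layer -> output layer."""
    a2 = layer_forward(theta1, x)   # hidden activations
    a3 = layer_forward(theta2, a2)  # output activations
    return a2, a3


# Tiny hypothetical network: 2 inputs, 2 hidden units, 1 output, all-zero weights.
theta1 = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
theta2 = [[0.0, 0.0, 0.0]]
a2, a3 = forward(theta1, theta2, [1.0, 2.0])
print(a2, a3)  # [0.5, 0.5] [0.5]
```

For the digit task, the input layer would have 400 units, one per pixel; the hidden and output layer sizes are not stated in these notes.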
1.3 Feedforward and cost function
Implement the cost function and gradient for the neural network.

The terms that correspond to the bias should not be regularized.
Cost function with regularization:
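The formula itself is missing from these notes. As a sketch, the standard regularized cost for this kind of network is the cross-entropy averaged over the m examples, plus lambda / (2m) times the sum of squared non-bias weights. The Python below assumes that form; the function name, argument layout, and the convention that column 0 of each weight matrix holds the bias weights are assumptions.

```python
import math


def regularized_cost(outputs, labels, thetas, lam):
    """Cross-entropy cost plus L2 penalty on the non-bias weights.

    outputs[i][k]: predicted probability of class k for example i.
    labels[i][k]:  1 if example i belongs to class k, else 0.
    thetas:        list of weight matrices; column 0 holds the bias weights.
    lam:           regularization strength (lambda).
    """
    m = len(outputs)

    # Data term: sum over examples and output units, averaged over m.
    data_term = 0.0
    for h, y in zip(outputs, labels):
        for h_k, y_k in zip(h, y):
            data_term += -y_k * math.log(h_k) - (1 - y_k) * math.log(1 - h_k)
    data_term /= m

    # Penalty term: squared weights, skipping the bias column (index 0).
    reg_term = 0.0
    for theta in thetas:
        for row in theta:
            reg_term += sum(w * w for w in row[1:])
    reg_term *= lam / (2 * m)

    return data_term + reg_term


# Hypothetical single example with two output units.
outputs = [[0.5, 0.5]]
labels = [[1, 0]]
thetas = [[[10.0, 1.0, 2.0]]]  # the bias weight 10.0 is not penalized
cost = regularized_cost(outputs, labels, thetas, lam=2.0)
```

Here the penalty is (1^2 + 2^2) * 2 / (2 * 1) = 5; the bias weight of 10.0 contributes nothing to it.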
2. Backpropagation

Compute the gradient for the neural network cost function.
2.1 Sigmoid gradient
Gradient for the sigmoid function: g'(z) = g(z)(1 - g(z)), where g(z) = 1 / (1 + e^(-z)).
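A short Python sketch of the sigmoid gradient, using the identity that the derivative of the sigmoid equals the sigmoid times one minus the sigmoid. The function names are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_gradient(z):
    """Derivative of the sigmoid at z, computed from the sigmoid itself."""
    g = sigmoid(z)
    return g * (1.0 - g)


print(sigmoid_gradient(0.0))  # 0.25
```

The gradient peaks at z = 0 and shrinks toward zero for large positive or negative z.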
2.2 Random initialization
When training neural networks, it is important to randomly initialize the parameters for symmetry breaking.
epsilon_init = 0.12;
% Each weight is drawn uniformly from [-epsilon_init, epsilon_init].
W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;
2.3 Backpropagation
Intuition behind the backpropagation algorithm:
Given a training example (x(t), y(t)), first run a “forward pass” to compute all the activations throughout the network.

Then, for each node j in layer l, compute an “error term” δ_j^(l) that measures how much that node was “responsible” for any errors in our output.
Steps 1-4 to implement backpropagation:
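The steps themselves are not reproduced in these notes, so the following is only a hedged sketch of the usual per-example procedure. It assumes a 3-layer sigmoid network, a cross-entropy cost, and weight matrices whose first column holds the bias weights; the function and variable names are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def backprop_example(theta1, theta2, x, y):
    """Return the unregularized gradients (grad1, grad2) for one example."""
    # Forward pass: compute all activations, adding a bias unit per layer.
    a1 = [1.0] + x
    z2 = [sum(w * a for w, a in zip(row, a1)) for row in theta1]
    a2 = [1.0] + [sigmoid(z) for z in z2]
    z3 = [sum(w * a for w, a in zip(row, a2)) for row in theta2]
    a3 = [sigmoid(z) for z in z3]

    # Output layer error: prediction minus target.
    delta3 = [a - t for a, t in zip(a3, y)]

    # Hidden layer error: propagate delta3 back through theta2 (skipping
    # the bias column) and scale by the sigmoid gradient at z2.
    delta2 = []
    for j, z in enumerate(z2):
        back = sum(theta2[k][j + 1] * delta3[k] for k in range(len(delta3)))
        g = sigmoid(z)
        delta2.append(back * g * (1.0 - g))

    # Gradients: outer product of each error vector with the activations
    # feeding into that layer.
    grad1 = [[d * a for a in a1] for d in delta2]
    grad2 = [[d * a for a in a2] for d in delta3]
    return grad1, grad2


# Hypothetical tiny network: 2 inputs, 2 hidden units, 1 output.
theta1 = [[0.1, -0.2, 0.3], [0.4, 0.5, -0.6]]
theta2 = [[0.2, -0.1, 0.3]]
grad1, grad2 = backprop_example(theta1, theta2, [1.0, 2.0], [1.0])
```

To get the gradient over a whole training set, these per-example gradients would be summed over all examples and divided by m, with the regularization term added for the non-bias weights.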