【Stanford ML Exercise 3】 Multi-class Classification and Neural Networks

In this exercise, we use logistic regression and neural networks to recognize handwritten digits (0 to 9).

1. Multi-class Classification

  • First part: extend the logistic regression implemented previously and apply it to one-vs-all classification

1.1 Dataset

There are 5000 training examples; each training example is a 20 pixel by 20 pixel grayscale image of a digit.
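
Each 20×20 image is "unrolled" into a 400-dimensional row vector, so the data matrix X is 5000×400 and the label vector y is 5000×1 (the digit 0 is mapped to label 10). A minimal loading sketch, assuming the exercise's ex3data1.mat data file:

load('ex3data1.mat');    % creates X (5000 x 400) and y (5000 x 1)
m = size(X, 1);          % number of training examples
fprintf('X is %d x %d, y is %d x 1\n', size(X, 1), size(X, 2), m);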

1.3 Vectorizing Logistic Regression

  • train 10 separate regularized logistic regression classifiers, one for each class
  • implement a fully vectorized version of logistic regression that does not use any for loops (the sigmoid helper it relies on is sketched right after this list)
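
The vectorized cost and gradient below rely on a sigmoid function that works element-wise on vectors and matrices. A minimal sketch of that helper (the exercise's starter code supplies its own sigmoid.m; this version is shown only for completeness):

function g = sigmoid(z)
% Element-wise sigmoid of z (works for scalars, vectors and matrices)
g = 1 ./ (1 + exp(-z));
end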

1.3.3 Vectorizing regularized logistic regression

cost function:

J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left[ -y^{(i)} \log\left(h_\theta(x^{(i)})\right) - (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2

where h_\theta(x) = g(\theta^T x) and g is the sigmoid function.

the partial derivative of regularized logistic regression cost:
\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)} \quad \text{(for } j = 0\text{)}

\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} \theta_j \quad \text{(for } j \ge 1\text{)}
Code: lrCostFunction.m
function [J, grad] = lrCostFunction(theta, X, y, lambda)

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly 
J = 0;
grad = zeros(size(theta));

h = sigmoid(X*theta);             % hypothesis for all m examples at once

% Regularized cost: the intercept term theta(1) is not regularized
J = (-y'*log(h) - (1 - y)'*log(1 - h))/m;
J = J + (lambda/(2*m))*sum(theta(2:end).^2);

% Gradient: add the regularization term to every entry except the intercept
grad = (X'*(h - y))/m;
grad(2:end) = grad(2:end) + (lambda/m)*theta(2:end);

% =============================================================

grad = grad(:);

end
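
A quick sanity check on a tiny hand-built problem (the same style of check the exercise script uses; with these values the cost should come out to roughly 2.5348):

theta_t = [-2; -1; 1; 2];
X_t = [ones(5,1) reshape(1:15,5,3)/10];   % 5 examples, intercept plus 3 features
y_t = ([1; 0; 1; 0; 1] >= 0.5);
lambda_t = 3;
[J_t, grad_t] = lrCostFunction(theta_t, X_t, y_t, lambda_t);
fprintf('Cost: %f\n', J_t);               % expect approximately 2.5348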

1.4 One-vs-all Classification

  • implement one-vs-all classification by training multiple regularized logistic regression classifiers
  • train one classifier for each class
  • return all the classifier parameters in a single matrix, where each row holds the learned parameters for one class

Code: oneVsAll.m

function [all_theta] = oneVsAll(X, y, num_labels, lambda)

% Some useful variables
m = size(X, 1);
n = size(X, 2);

% You need to return the following variables correctly 
all_theta = zeros(num_labels, n + 1);

% Add ones to the X data matrix
X = [ones(m, 1) X];

for c = 1:num_labels
    % Train the c-th classifier: relabel y so class c is 1 and all others are 0
    initial_theta = zeros(n + 1, 1);
    options = optimset('GradObj', 'on', 'MaxIter', 50);
    [theta] = ...
        fmincg(@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
               initial_theta, options);
    all_theta(c,:) = theta(:);
end

% =========================================================================

end
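
Training all ten classifiers on the digit data then looks like this (num_labels = 10 and lambda = 0.1 follow the exercise's setup; adjust as needed):

num_labels = 10;                % digits 0-9, with "0" stored as label 10
lambda = 0.1;
all_theta = oneVsAll(X, y, num_labels, lambda);   % all_theta is 10 x 401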

1.4.1 One-vs-all Prediction

  • for each example, compute the “probability” that it belongs to each class using the trained logistic regression classifiers, and predict the class with the highest probability

Code: predictOneVsAll.m

function p = predictOneVsAll(all_theta, X)

m = size(X, 1);
num_labels = size(all_theta, 1);

% You need to return the following variables correctly 
p = zeros(size(X, 1), 1);

% Add ones to the X data matrix
X = [ones(m, 1) X];

% Pick, for each example, the class whose classifier gives the highest probability
[~, p] = max(sigmoid(X*all_theta'), [], 2);

% =========================================================================
end
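
Using the trained parameters to predict on the training set and measure accuracy (the exercise reports training set accuracy of roughly 95% with these settings):

pred = predictOneVsAll(all_theta, X);
fprintf('Training Set Accuracy: %f\n', mean(double(pred == y)) * 100);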

2. Neural Networks

  • logistic regression cannot form more complex hypotheses as it is only a linear classifier
  • The neural network will be able to represent complex models that form non-linear hypotheses

2.1 Model representation

  • It has 3 layers – an input layer, a hidden layer and an output layer
  • the images are of size 20×20, which gives us 400 input layer units (plus an extra bias unit)
  • the provided parameters Theta1 and Theta2 are sized for a neural network with 25 units in the second (hidden) layer and 10 output units, corresponding to the 10 digit classes (a loading sketch follows this list)
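
A minimal sketch of loading the pre-trained parameters and checking their shapes (ex3weights.mat is assumed to be the weights file shipped with the exercise):

load('ex3weights.mat');   % creates Theta1 and Theta2
size(Theta1)              % 25 x 401: hidden units x (400 input units + bias)
size(Theta2)              % 10 x 26 : output units x (25 hidden units + bias)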

2.2 Feedforward Propagation and Prediction

  • implement feedforward propagation for the neural network

Code: predict.m

function p = predict(Theta1, Theta2, X)

% Useful values
m = size(X, 1);
num_labels = size(Theta2, 1);

% You need to return the following variables correctly 
p = zeros(size(X, 1), 1);

% Hidden layer activations: add the bias unit to X, then apply Theta1
A2 = sigmoid([ones(m,1), X] * Theta1');

% Output layer: add the bias unit to A2, then apply Theta2
h = sigmoid([ones(m,1), A2] * Theta2');

% Predicted label = index of the output unit with the largest activation
[~, p] = max(h, [], 2);

% =========================================================================

end
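
Running feedforward prediction on the training data and measuring accuracy (the exercise quotes about 97.5% with the provided weights):

pred = predict(Theta1, Theta2, X);
fprintf('Training Set Accuracy: %f\n', mean(double(pred == y)) * 100);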