Project 4 - Backpropagation

Data files

Each line of a data file is one pattern: the input values x and y (and z for Problem 2), followed by the expected output value as the last number on the line.
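As an illustration of this layout, the following is a minimal sketch of reading a data file in Python. The function names parse_pattern and load_patterns are hypothetical, and the sketch assumes the numbers on a line are separated by whitespace or commas, which the handout does not specify.

```python
def parse_pattern(line):
    """Split one line into (inputs, expected); the last number is the expected output."""
    # Assumption: values are separated by whitespace and/or commas.
    values = [float(token) for token in line.replace(",", " ").split()]
    return values[:-1], values[-1]


def load_patterns(path):
    """Read a data file, skipping blank lines, into a list of (inputs, expected) pairs."""
    with open(path) as f:
        return [parse_pattern(line) for line in f if line.strip()]
```

For example, load_patterns("training1.txt") would return one (inputs, expected) pair per pattern, with two inputs for Problem 1 and three for Problem 2.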

For Problem 1 (2 inputs)

training1.txt
testing1.txt
validation1.txt

For Problem 2 (3 inputs)

training2.txt
testing2.txt
validation2.txt


Inputs to the program

Your program should accept the following either on the command line or from standard input:
It may help to also accept the number of inputs to the network (2 for Problem 1 and 3 for Problem 2). Note that each hidden layer can have a different number of nodes.


Representing the neural network

The network can be represented as a three-dimensional array for the hidden layers and a one-dimensional array for the output layer. The first dimension of the hidden layer array selects the layer, the second selects the neuron within that layer, and the third selects the weight associated with an input node (for the first hidden layer) or with a node from the previous hidden layer (for later layers). The output layer array holds the weights associated with the nodes in the last hidden layer. Each neuron in the hidden and output layers also needs a bias weight. These can be stored as separate arrays or as the 0th elements of the hidden and output arrays.

You will also need a way of keeping track of the output (h and sigma) and the delta values for each hidden node and the output node.
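One possible layout is sketched below in Python using nested lists. It takes the second option above and stores each bias weight as the 0th element. The function name make_network and the dictionary of per-node state are assumptions made for illustration, not a required design. Because each hidden layer can have a different number of nodes, the nested lists are ragged rather than a rectangular array.

```python
def make_network(num_inputs, hidden_sizes):
    """Allocate zero-filled weight and state arrays.

    hidden[l][j][0] is the bias weight of neuron j in hidden layer l.
    hidden[l][j][i + 1] is the weight from node i of the previous layer
    (or from input i when l == 0). output[0] is the output node's bias.
    """
    hidden = []
    fan_in = num_inputs
    for size in hidden_sizes:
        hidden.append([[0.0] * (fan_in + 1) for _ in range(size)])
        fan_in = size
    output = [0.0] * (fan_in + 1)
    # Per-node state for the hidden layers: weighted sum h, sigma, and delta.
    state = {
        "h": [[0.0] * size for size in hidden_sizes],
        "sigma": [[0.0] * size for size in hidden_sizes],
        "delta": [[0.0] * size for size in hidden_sizes],
    }
    return hidden, output, state
```

The output node's own h, sigma, and delta values are single numbers and can be kept in ordinary variables.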


Initializing the weights

Each weight in the network (hidden and output layers) should be initialized to a random number between -0.1 and 0.1.
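A minimal sketch of this initialization in Python follows. The helper name randomize is hypothetical. It assumes the weights are held in (possibly nested) Python lists and fills them in place, so the same call works for the hidden layer array and the output layer array, bias weights included.

```python
import random


def randomize(weights, low=-0.1, high=0.1, rng=random):
    """Fill a (possibly nested) list of weights in place with uniform random values."""
    for i, w in enumerate(weights):
        if isinstance(w, list):
            randomize(w, low, high, rng)
        else:
            weights[i] = rng.uniform(low, high)
```

Passing a seeded generator such as random.Random(0) as rng makes a run repeatable, which can help when comparing experiments.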


Training the network

The training method given below uses online learning, where the weights are updated after each pattern is presented. For each epoch, you will iterate through all the training patterns. For each pattern, you will compute the outputs, compute the delta values, and update the weights as follows.

First initialize all output values (h and sigma) and delta values of each hidden and output node to 0. These must be reset for every training pattern, because the updates described below accumulate into them with +=.

Computing the outputs

The forward pass through the network consists of computing the outputs of each successive hidden layer and then finally the output layer. For the hidden layers, iterate through each node of each hidden layer. For each of these nodes, check to see whether the current hidden layer is the first hidden layer. If so, then iterate through the number of inputs, and update the output value (h) of that node as follows

h += hiddenweight * input

where hiddenweight is the weight associated with the current hidden node and input node and input is the value of the current input node.

If the current hidden layer is not the first one, then iterate through all the nodes of the previous hidden layer. Update the output value (h) of the current hidden node as follows

h += hiddenweight * sigmahiddenprevious

where hiddenweight is the weight associated with the node to be updated and the node of the previous layer and sigmahiddenprevious is the sigma value of the output of the node from the previous layer.

After computing the output of the current node by going through the previous layer, find the sigma value of the output of the current node according to the following formula

sigmahidden = 1/(1+exp(-h))

Then compute the output at the output node by iterating through the nodes in the last hidden layer and updating the output value as follows

output += outputweight * sigmahiddenlast

where outputweight is the weight of the output layer (from the last hidden layer) and sigmahiddenlast is the sigma value of the output of the node from the last hidden layer.

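Then find the sigma value of this output value using the same formula as for the hidden nodes. The result is sigmaoutput, the output of the network.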

Note: Be sure to add in the bias factor when computing the output of each node.
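The forward pass described above can be sketched in Python as follows. This is one possible arrangement, not a required one: it assumes the bias weight is stored at index 0 of each neuron's weight list, and the names sigmoid and forward are hypothetical. Starting each sum from the bias weight takes the place of resetting h to 0 and adding the bias separately.

```python
import math


def sigmoid(h):
    """sigma = 1/(1+exp(-h))"""
    return 1.0 / (1.0 + math.exp(-h))


def forward(hidden, output, inputs):
    """Return (sigmas, sigma_out): the sigma values of every hidden layer
    and the sigma value of the output node."""
    sigmas = []
    prev = inputs  # the first hidden layer reads the input values
    for layer in hidden:
        current = []
        for weights in layer:
            h = weights[0]  # bias weight
            for w, x in zip(weights[1:], prev):
                h += w * x
            current.append(sigmoid(h))
        sigmas.append(current)
        prev = current  # later layers read the previous layer's sigma values
    out = output[0]  # bias weight of the output node
    for w, s in zip(output[1:], prev):
        out += w * s
    return sigmas, sigmoid(out)
```

Keeping the previous layer's values in prev avoids a separate branch for the first hidden layer, since the loop body is otherwise identical.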

Computing the delta values

The backward pass through the network involves computing the delta values first at the output layer then backwards from the last hidden layer to the first hidden layer.

Compute delta of the output node as

deltaoutput = sigmaoutput * (1-sigmaoutput) * (expectedoutput-sigmaoutput)

where sigmaoutput is the sigma value of the output node and expectedoutput is the expected output of the current pattern.

To compute the delta values for the hidden nodes, iterate through each node of each hidden layer. For each of these nodes, check whether the current hidden layer is the last hidden layer. If so, then compute delta for that node as follows

deltahidden = sigmahidden * (1-sigmahidden) * deltaoutput * outputweight

where sigmahidden is the sigma value of the output of the current node, deltaoutput is the delta value of the output node computed above, and outputweight is the weight associated with the current hidden node and the output node.

If the current hidden layer is not the last hidden layer, then iterate through the number of nodes in the next (forward) hidden layer and update delta for the current hidden node as follows

deltahidden += sigmahidden * (1-sigmahidden) * deltahiddennext * hiddenweight

where sigmahidden is the sigma value of the output of the current node, deltahiddennext is the delta value of the node from the next hidden layer, and hiddenweight is the weight associated with the node from the current hidden layer and the node from the next hidden layer.
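The backward pass can be sketched in Python as below. The function name compute_deltas is hypothetical, and the sketch assumes bias weights sit at index 0 of each weight list, so the weight leaving hidden node j is found at index j + 1 in the next layer. The deltas are computed from the weights as they were during the forward pass, before any weight is updated.

```python
def compute_deltas(hidden, output, sigmas, sigma_out, expected):
    """Return (deltas, delta_out) for one pattern.

    sigmas[l][j] is the sigma value of node j in hidden layer l and
    sigma_out is the sigma value of the output node.
    """
    delta_out = sigma_out * (1.0 - sigma_out) * (expected - sigma_out)
    deltas = [[0.0] * len(layer) for layer in hidden]
    last = len(hidden) - 1
    for l in range(last, -1, -1):  # last hidden layer back to the first
        for j, s in enumerate(sigmas[l]):
            if l == last:
                # Index j + 1 skips the output node's bias weight.
                downstream = delta_out * output[j + 1]
            else:
                downstream = 0.0
                for k, next_weights in enumerate(hidden[l + 1]):
                    downstream += deltas[l + 1][k] * next_weights[j + 1]
            deltas[l][j] = s * (1.0 - s) * downstream
    return deltas, delta_out
```

Summing the downstream terms first and multiplying by sigma * (1-sigma) once gives the same result as accumulating the full product for each node of the next layer.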

Updating the weights

For updating the hidden layers, iterate through each node of each hidden layer. For each node, check to see whether or not the current hidden layer is the first hidden layer. If so, then iterate through the number of inputs and update the weight of the hidden node as follows

hiddenweight += learningrate * deltahidden * input

where deltahidden is the delta value of the current hidden node and input is the current input value.

If the current hidden layer is not the first layer, then iterate through the nodes of the previous hidden layer and update the weight of the hidden node as follows

hiddenweight += learningrate * deltahidden * sigmahiddenprevious

where deltahidden is the delta value of the current hidden node and sigmahiddenprevious is the sigma value of the output of the node from the previous hidden layer.

To update the weights of the output layer, iterate through the nodes in the last hidden layer and update the weight as follows

outputweight += learningrate * deltaoutput * sigmahiddenlast

where deltaoutput is the delta value of the output node and sigmahiddenlast is the sigma value of the output of the node from the last hidden layer.

Note: Be sure to also update the bias weights.
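The update step can be sketched in Python as follows, again assuming bias weights are stored at index 0. The function name update_weights and its parameter names are hypothetical. A bias weight is treated as a weight whose input is always 1, so its update is simply learningrate * delta.

```python
def update_weights(hidden, output, inputs, sigmas, deltas, delta_out, learning_rate):
    """Update all weights in place after one pattern (online learning)."""
    prev = inputs  # the first hidden layer is fed by the input values
    for l, layer in enumerate(hidden):
        for j, weights in enumerate(layer):
            step = learning_rate * deltas[l][j]
            weights[0] += step  # bias weight: its input is the constant 1
            for i, x in enumerate(prev):
                weights[i + 1] += step * x
        prev = sigmas[l]  # the next layer is fed by this layer's sigma values
    output[0] += learning_rate * delta_out  # bias weight of the output node
    for j, s in enumerate(prev):
        output[j + 1] += learning_rate * delta_out * s
```

This function should only be called once all delta values for the current pattern have been computed, so that the deltas reflect the weights used in the forward pass.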


Testing the network

After each training epoch, test the network by iterating through all the patterns of the testing set and computing the output of each hidden and output node, using the same procedure as for computing outputs during training. To evaluate how well the network performs, calculate the root mean square error over the testing data. Start with sum set to 0 and, after presenting each testing pattern to the network, accumulate

sum += (expectedoutput-sigmaoutput)^2

where expectedoutput is the expected output of the current testing pattern and sigmaoutput is the sigma value of the output node.

After presenting all the testing patterns, calculate the root mean square error as

rmse = sqrt((1/(2*numtestingpatterns))*sum)
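A minimal Python sketch of this error measure is given below. The function name rmse and its two list arguments are assumptions for illustration. It follows the formula above exactly, including the factor of 2 in the denominator.

```python
import math


def rmse(expected, predicted):
    """sqrt((1/(2*N)) * sum of squared errors) over N patterns."""
    total = 0.0
    for e, p in zip(expected, predicted):
        total += (e - p) ** 2
    return math.sqrt(total / (2 * len(expected)))
```

Here expected holds the expected output of each pattern and predicted holds the corresponding sigma value of the output node.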

Print to the screen or write to a file this rmse value after each training epoch. You can use this data to generate graphs for your report.

After the training process, apply this same technique to the validation data.


Experiments and report

Run several experiments that vary the number of hidden layers, the number of neurons in each hidden layer, and the learning rate. Graph the rmse (and any other measure you use) against the epoch number, and include these graphs in your report along with a discussion of which network architecture seems best for each problem and how changing the architecture affects the network's performance. Do all this for both Problems 1 and 2 (and the graduate part for 594 students). Be sure to evaluate performance on both the testing sets and the validation sets.