Project 4 - Backpropagation
Data files
Each line of a data file is one pattern: the input values x and y (and z
for Problem 2), followed by the expected output value as the last number
on that line.
For Problem 1 (2 inputs)
training1.txt
testing1.txt
validation1.txt
For Problem 2 (3 inputs)
training2.txt
testing2.txt
validation2.txt
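As a rough sketch of reading these files, the Python below splits each
line into its input values and its expected output. It assumes the
numbers on a line are separated by whitespace, which the description
above does not state; parse_patterns and load_patterns are hypothetical
names chosen for illustration.

```python
from typing import Iterable, List, Tuple

Pattern = Tuple[List[float], float]  # (input values, expected output)


def parse_patterns(lines: Iterable[str]) -> List[Pattern]:
    """Turn text lines into patterns; the last number is the expected output."""
    patterns = []
    for line in lines:
        fields = line.split()  # assumption: whitespace-separated numbers
        if not fields:
            continue  # skip blank lines
        values = [float(field) for field in fields]
        patterns.append((values[:-1], values[-1]))
    return patterns


def load_patterns(path: str) -> List[Pattern]:
    """Read one of the data files, e.g. training1.txt."""
    with open(path) as handle:
        return parse_patterns(handle)


# Two made-up lines in the Problem 1 layout (2 inputs, then the output).
sample = parse_patterns(["0.1 0.9 1", "0.5 0.5 0"])
print(sample)
```

The same function handles Problem 2, since it takes however many values
precede the last number as the inputs.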
Inputs to the program
Your program should accept the following either on the command line or
from standard input:
- number of hidden layers
- number of neurons in each hidden layer
- learning rate
- training, testing, and validation data files
- number of training epochs
It may help to also accept the number of inputs to the network (2 for
Problem 1 and 3 for Problem 2). Note that each hidden layer can have a
different number of nodes.
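One possible way to take these values on the command line is sketched
below with Python's argparse. The flag names (--hidden, --rate, and so
on) are invented for this example and are not required by the
assignment. Giving one --hidden value per layer covers both the number
of hidden layers and the number of neurons in each.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Backpropagation (sketch)")
    # One integer per hidden layer, e.g. "--hidden 4 3" means two layers.
    parser.add_argument("--hidden", type=int, nargs="+", required=True,
                        help="neurons in each hidden layer")
    parser.add_argument("--rate", type=float, required=True,
                        help="learning rate")
    parser.add_argument("--epochs", type=int, required=True,
                        help="number of training epochs")
    parser.add_argument("--inputs", type=int, default=2,
                        help="number of network inputs (2 or 3)")
    parser.add_argument("--train", required=True, help="training data file")
    parser.add_argument("--test", required=True, help="testing data file")
    parser.add_argument("--validate", required=True,
                        help="validation data file")
    return parser


# Parse an explicit argument list so the sketch runs without a real
# command line; a program would call parse_args() with no arguments.
args = build_parser().parse_args([
    "--hidden", "4", "3", "--rate", "0.1", "--epochs", "100",
    "--train", "training1.txt", "--test", "testing1.txt",
    "--validate", "validation1.txt",
])
print(len(args.hidden), "hidden layers with sizes", args.hidden)
```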
Representing the neural network
The network can be represented as a three-dimensional array for the
hidden layers and a one-dimensional array for the output layer. The
first dimension of the hidden layer array gives the layer, the second
dimension gives the neuron in a given layer, and the third dimension
denotes the weight associated with an input node or a node from the
previous hidden layer. The output layer array holds the weights
associated with the nodes in the last hidden layer. Each neuron in the
hidden and output layers also needs a bias weight. These can be stored
as separate arrays or as the 0th elements of the hidden and output
arrays.
You will also need a way of keeping track of the output (h and sigma)
and the delta values for each hidden node and the output node.
Initializing the weights
Each weight in the network (hidden and output layers) should be
initialized to a random number between -0.1 and 0.1.
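The representation and initialization described above might look like
the following in Python, using nested lists in place of arrays. This is
only a sketch: it takes the option of storing each bias as the 0th
element of a node's weight list, and init_network is a hypothetical
name.

```python
import random
from typing import List, Tuple

Hidden = List[List[List[float]]]  # hidden[layer][neuron][weight]


def init_network(n_inputs: int, layer_sizes: List[int],
                 rng: random.Random) -> Tuple[Hidden, List[float]]:
    """Build hidden and output weights, each drawn from [-0.1, 0.1].

    For every node, element 0 is the bias weight and element k (k >= 1)
    is the weight from input or previous-layer node k - 1.
    """
    hidden = []
    fan_in = n_inputs  # the first hidden layer is fed by the inputs
    for size in layer_sizes:
        layer = [[rng.uniform(-0.1, 0.1) for _ in range(fan_in + 1)]
                 for _ in range(size)]
        hidden.append(layer)
        fan_in = size  # the next layer is fed by this one
    output = [rng.uniform(-0.1, 0.1) for _ in range(fan_in + 1)]
    return hidden, output


# Two inputs, hidden layers of 4 and 3 neurons, fixed seed for repeatability.
hidden, output = init_network(2, [4, 3], random.Random(0))
print(len(hidden), len(hidden[0]), len(hidden[0][0]), len(output))
```

Because each hidden layer can have a different number of nodes, the
nested lists are ragged, which a fixed-size three-dimensional array
would not allow without padding.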
Training the network
The training method given below uses online learning, where the weights
are updated after each pattern is presented. For each epoch, you
will iterate through all the training patterns. For each pattern, you
will compute the outputs, compute the delta values, and update the
weights as follows.
Before each pattern is presented, set all output values (h and sigma)
and delta values of each hidden and output node to 0. The formulas below
accumulate into these values with +=, so they must be reset for every
training pattern.
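The overall shape of online training can be sketched as two nested
loops. In the Python below, train_one and after_epoch are hypothetical
callables standing in for the per-pattern work and for whatever is done
at the end of an epoch; neither name comes from the assignment.

```python
from typing import Callable, Sequence, Tuple

Pattern = Tuple[Sequence[float], float]  # (input values, expected output)


def run_epochs(patterns: Sequence[Pattern],
               epochs: int,
               train_one: Callable[[Sequence[float], float], None],
               after_epoch: Callable[[int], None] = lambda epoch: None) -> None:
    """Online learning: train_one runs once per pattern in every epoch."""
    for epoch in range(epochs):
        for inputs, expected in patterns:
            # train_one is expected to compute the outputs, compute the
            # delta values, and update the weights for this one pattern.
            train_one(inputs, expected)
        after_epoch(epoch)  # e.g. measure the error on the testing set


# Demonstration with a stand-in that only records what it was given.
seen = []
run_epochs([([0.0, 1.0], 1.0), ([1.0, 1.0], 0.0)], epochs=3,
           train_one=lambda x, t: seen.append((tuple(x), t)))
print(len(seen))  # 2 patterns * 3 epochs
```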
Computing the outputs
The forward pass through the network consists of computing the outputs
of each successive hidden layer and then finally the output layer. For
the hidden layers, iterate through each node of each hidden layer. For
each of these nodes, check to see whether the current hidden layer is
the first hidden layer. If so, then iterate through the number of
inputs, and update the output value (h) of that node as follows
h += hiddenweight * input
where hiddenweight is the weight associated with the current hidden node
and input node, and input is the value of the current input node.
If the current hidden layer is not the first one, then iterate through
all the nodes of the previous hidden layer. Update the output value (h)
of the current hidden node as follows
h += hiddenweight * sigmahiddenprevious
where hiddenweight is the weight associated with the node to be updated
and the node of the previous layer, and sigmahiddenprevious is the sigma
value of the output of the node from the previous layer.
After computing the output of the current node by going through the
previous layer, find the sigma value of the output of the current node
according to the following formula
sigmahidden = 1/(1+exp(-h))
Then compute the output at the output node by iterating through the
nodes in the last hidden layer and updating the output value as follows
output += outputweight * sigmahiddenlast
where outputweight is the weight of the output layer (from the last
hidden layer) and sigmahiddenlast is the sigma value of the output of the
node from the last hidden layer.
Then find the sigma value of this output value.
Note: Be sure to add in the bias factor when computing the output of
each node.
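The forward pass above can be sketched in Python as follows. The sketch
assumes the weights are held in nested lists with each node's bias
weight stored as element 0, and that the bias is simply added to h; the
function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(h: float) -> float:
    return 1.0 / (1.0 + math.exp(-h))


def forward(hidden: List[List[List[float]]], output: List[float],
            inputs: Sequence[float]) -> Tuple[List[List[float]], float]:
    """Return (sigmas, sigma_out).

    sigmas[l][j] is the sigma value of node j in hidden layer l, and
    sigma_out is the sigma value of the output node.
    """
    sigmas = []
    previous = inputs  # the first hidden layer reads the inputs
    for layer in hidden:
        layer_sigmas = []
        for weights in layer:
            h = weights[0]  # start from the bias weight
            for k, value in enumerate(previous):
                h += weights[k + 1] * value
            layer_sigmas.append(sigmoid(h))
        sigmas.append(layer_sigmas)
        previous = layer_sigmas  # later layers read the previous sigmas
    out = output[0]  # bias weight of the output node
    for k, value in enumerate(previous):
        out += output[k + 1] * value
    return sigmas, sigmoid(out)


# One hidden node whose two input weights cancel, so h = 0 and sigma = 0.5.
sigmas, sigma_out = forward([[[0.0, 1.0, -1.0]]], [0.0, 2.0], [1.0, 1.0])
print(sigmas, sigma_out)
```

Using one loop with a "previous" variable replaces the explicit check
for the first hidden layer, but the computation is the same.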
Computing the delta values
The backward pass through the network involves computing the delta
values first at the output layer then backwards from the last hidden
layer to the first hidden layer.
Compute delta of the output node as
deltaout = sigmaoutput * (1-sigmaoutput) * (expectedoutput-sigmaoutput)
where sigmaoutput is the sigma value of the output node and
expectedoutput is the expected output of the current pattern.
To compute the delta values for the hidden nodes, iterate through each
node of each hidden layer. For each of these nodes, check whether the
current hidden layer is the last hidden layer. If so, then compute
delta for that node as follows
deltahidden = sigmahidden * (1-sigmahidden) * deltaout * outputweight
where sigmahidden is the sigma value of the output of the current node,
deltaout is the delta value of the output node, and outputweight is the
weight associated with the current hidden node and the output node.
If the current hidden layer is not the last hidden layer, then iterate
through the number of nodes in the next (forward) hidden layer and
update delta for the current hidden node as follows
deltahidden += sigmahidden * (1-sigmahidden) * deltahiddennext * hiddenweight
where sigmahidden is the sigma value of the output of the current node,
deltahiddennext is the delta value of the node from the next hidden
layer, and hiddenweight is the weight associated with the node from the
current hidden layer and the node from the next hidden layer.
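A Python sketch of the backward pass is given below. It assumes nested
weight lists with each node's bias weight at element 0 (so the weight
from node j of the previous layer sits at index j + 1) and that the
sigma values from the forward pass are available; compute_deltas is a
hypothetical name.

```python
from typing import List, Tuple


def compute_deltas(hidden: List[List[List[float]]], output: List[float],
                   sigmas: List[List[float]], sigma_out: float,
                   expected: float) -> Tuple[List[List[float]], float]:
    """Return (deltas, delta_out) for one pattern."""
    delta_out = sigma_out * (1 - sigma_out) * (expected - sigma_out)
    deltas = [[0.0] * len(layer) for layer in hidden]
    last = len(hidden) - 1
    for l in range(last, -1, -1):  # last hidden layer back to the first
        for j, sigma in enumerate(sigmas[l]):
            if l == last:
                # Only the output node is fed by the last hidden layer.
                back = delta_out * output[j + 1]
            else:
                # Sum over every node m of the next (forward) layer.
                back = 0.0
                for m, next_weights in enumerate(hidden[l + 1]):
                    back += deltas[l + 1][m] * next_weights[j + 1]
            deltas[l][j] = sigma * (1 - sigma) * back
    return deltas, delta_out


# Two hidden layers of one node each, with sigma values of 0.5 everywhere.
deltas, delta_out = compute_deltas(
    [[[0.0, 1.0]], [[0.0, 3.0]]], [0.0, 2.0], [[0.5], [0.5]], 0.5, 1.0)
print(deltas, delta_out)
```

Factoring sigma * (1 - sigma) out of the sum gives the same result as
accumulating it term by term, as in the formula above. The deltas are
computed with the weights as they stood before any update.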
Updating the weights
For updating the hidden layers, iterate through each node of each hidden
layer. For each node, check to see whether or not the current hidden
layer is the first hidden layer. If so, then iterate through the number
of inputs and update the weight of the hidden node as follows
hiddenweight += learningrate * deltahidden * input
where deltahidden is the delta value of the current hidden
node and input is the current input value.
If the current hidden layer is not the first layer, then iterate through
the nodes of the previous hidden layer and update the weight of the
hidden node as follows
hiddenweight += learningrate * deltahidden * sigmahiddenprevious
where deltahidden is the delta value of the current hidden node and
sigmahiddenprevious is the sigma value of the output of the node from the
previous hidden layer.
To update the weights of the output layer, iterate through the nodes in
the last hidden layer and update the weight as follows
outputweight += learningrate * deltaout * sigmahiddenlast
where deltaout is the delta value of the output node and
sigmahiddenlast is the sigma value of the output of the node from the
last hidden layer.
Note: Be sure to also update the bias weights.
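The weight updates might be written in Python as below. This sketch
assumes nested weight lists with each node's bias weight at element 0,
and it treats the bias as a weight on a constant input of 1, so its
update is learningrate * delta. That treatment is a common convention
rather than something stated above, and update_weights is a
hypothetical name.

```python
from typing import List, Sequence


def update_weights(hidden: List[List[List[float]]], output: List[float],
                   inputs: Sequence[float], sigmas: List[List[float]],
                   deltas: List[List[float]], delta_out: float,
                   rate: float) -> None:
    """Update all weights in place for one pattern."""
    for l, layer in enumerate(hidden):
        # The first hidden layer is fed by the inputs, later layers by
        # the sigma values of the previous hidden layer.
        source = inputs if l == 0 else sigmas[l - 1]
        for j, weights in enumerate(layer):
            weights[0] += rate * deltas[l][j]  # bias: constant input of 1
            for k, value in enumerate(source):
                weights[k + 1] += rate * deltas[l][j] * value
    output[0] += rate * delta_out  # bias weight of the output node
    for k, value in enumerate(sigmas[-1]):
        output[k + 1] += rate * delta_out * value


# One input, one hidden node, all weights starting at zero.
hidden = [[[0.0, 0.0]]]
output = [0.0, 0.0]
update_weights(hidden, output, inputs=[2.0], sigmas=[[0.5]],
               deltas=[[0.5]], delta_out=1.0, rate=0.5)
print(hidden, output)
```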
Testing the network
After each epoch of the training, the network is tested by iterating
through all the patterns of the testing set and computing the output of
each hidden and output node as in the procedure for computing outputs
during training. For evaluating how well the network performs,
calculate the root mean square error of the testing data. After
presenting each testing pattern to the network, accumulate a sum
sum += (expectedoutput-sigmaoutput)^2
where expectedoutput is the expected output of the current
testing pattern and sigmaoutput is the sigma value of the
output node.
After presenting all the testing patterns, calculate the root mean
square error as
rmse = sqrt((1/(2*numtestingpatterns))*sum)
Print to the screen or write to a file this rmse value after each
training epoch. You can use this data to generate graphs for your
report.
After the training process, apply this same technique to the validation
data.
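The error measure above can be sketched in Python as follows. The
function takes the expected outputs and the network's sigma outputs as
two lists, which is an assumption about how a program would collect
them, and it keeps the factor of 2 from the formula given in this
handout.

```python
import math
from typing import Sequence


def rmse(expected: Sequence[float], predicted: Sequence[float]) -> float:
    """Error over a data set, following the handout's formula."""
    if len(expected) != len(predicted) or not expected:
        raise ValueError("need two non-empty lists of equal length")
    total = 0.0
    for target, sigma_output in zip(expected, predicted):
        total += (target - sigma_output) ** 2
    return math.sqrt(total / (2 * len(expected)))


# Two patterns, one predicted perfectly and one off by 1.
print(rmse([1.0, 0.0], [0.0, 0.0]))  # sqrt(1 / (2 * 2)) = 0.5
```

The same function can be used for the testing set after each epoch and
for the validation set once training has finished.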
Experiments and report
Run several experiments that vary the number of hidden layers, the
number of neurons in each hidden layer, and the learning rate. Make
graphs of the rmse (and any other measure you use) over time (epochs).
Include these graphs in your report, along with a discussion of which
network architecture seems optimal for the problem and how changing the
network architecture affects the network's performance. Do all this for
both Problems 1 and 2 (and the graduate part for 594 students). Be sure
to evaluate performance on both the testing sets and the validation
sets.