Neural Networks Tutorial, Part #3
Posted on December 13, 2007
Filed Under Algorithms
In our previous tutorial we laid out the basic form of our two-layer feed-forward neural network (FFNN). In this installment we’ll derive a way of training it. Just to remind you, here is the basic outline of our neural network, along with all relevant variables:
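For reference, a compact way to write that outline as equations is sketched below. The symbols $w_{ij}$, $\bar{w}_{ki}$, $f$, $\bar{f}$, $h_i^\mu$, $\bar{h}_k^\mu$ and $x_j^\mu$ appear later in this post; the names $y$, $\bar{y}$, $b$ and $\bar{b}$ for the layer outputs and biases are assumed here for illustration and may differ from the original figure.

```latex
\begin{aligned}
h_i^\mu &= \sum_j w_{ij}\, x_j^\mu + b_i,
  & y_i^\mu &= f\left(h_i^\mu\right)
  && \text{(first layer)} \\
\bar{h}_k^\mu &= \sum_i \bar{w}_{ki}\, y_i^\mu + \bar{b}_k,
  & \bar{y}_k^\mu &= \bar{f}\left(\bar{h}_k^\mu\right)
  && \text{(output layer)}
\end{aligned}
```

Unbarred quantities belong to the first layer and barred quantities to the output layer.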
In general, when training a network, one prepares a set of inputs, $x^\mu$, and a corresponding set of desired outputs, $t^\mu$. The superscript $\mu$ labels an input-output pair. Note that each $x^\mu$ is a vector, with components $x_j^\mu$:
$$x^\mu = \left( x_1^\mu, x_2^\mu, \ldots, x_N^\mu \right)$$
In this example, N denotes the number of inputs.
In reality, however, when we feed the $x^\mu$ into our FFNN, the corresponding outputs $\bar{y}^\mu$ often don’t match our expectations, $t^\mu$. We can quantify the total error from comparing all actual outputs to the desired outputs by defining an error function as follows:
$$E = \frac{1}{2} \sum_{\mu} \sum_k \left( \bar{y}_k^\mu - t_k^\mu \right)^2$$
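As a minimal sketch, assuming the usual half sum-of-squares form for E (the factor 1/2, the function name `total_error`, and the sample numbers are assumptions for illustration), the error could be computed like this:

```python
def total_error(outputs, targets):
    """Half the sum of squared differences over all pairs and all components."""
    return 0.5 * sum(
        (y - t) ** 2
        for y_vec, t_vec in zip(outputs, targets)  # loop over input-output pairs
        for y, t in zip(y_vec, t_vec)              # loop over output components
    )

# Two input-output pairs, each with two output components (made-up numbers).
outputs = [[0.8, 0.1], [0.3, 0.9]]   # what the network produced
targets = [[1.0, 0.0], [0.0, 1.0]]   # what we wanted it to produce
E = total_error(outputs, targets)
print(E)
```

The error is zero only when every output component matches its target exactly.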
The double sum here ranges over all input-output pairs, and over all components. We would like to minimize $E$: ideally make it zero, or at least as small as possible. We do this by employing a gradient descent method: at each iteration of our algorithm we compute the derivatives of $E$ with respect to the bias vectors and weight matrices, and change them slightly, by an amount proportional to the derivatives and in the opposite direction. In other words:
$$w_{ij} \to w_{ij} - \eta \frac{\partial E}{\partial w_{ij}}, \quad b_i \to b_i - \eta \frac{\partial E}{\partial b_i}, \quad \bar{w}_{ij} \to \bar{w}_{ij} - \eta \frac{\partial E}{\partial \bar{w}_{ij}}, \quad \bar{b}_i \to \bar{b}_i - \eta \frac{\partial E}{\partial \bar{b}_i}$$
where $\eta$ is some small number, the choice of which is something of an art and involves some trial and error (more on that later). This method of adjusting our weights and biases is termed backpropagation, for a reason that will be made evident at the end of this post. Meanwhile, we need to get down and busy with computing the above derivatives. Be warned, things are going to get quite messy. For a firm understanding, I suggest following them on your own with pen and paper.
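To make the update rule concrete, here is a small sketch of gradient descent on a toy error function with two parameters. The toy function, the helper names `gradient_step` and `toy_gradients`, and the step size 0.1 are all assumptions chosen for illustration; they are not the network's actual error.

```python
def gradient_step(params, grads, eta):
    """Move every parameter a small step against its derivative."""
    return [p - eta * g for p, g in zip(params, grads)]

def toy_gradients(params):
    """Derivatives of the toy error E(w, b) = (w - 3)^2 + (b + 1)^2."""
    w, b = params
    return [2.0 * (w - 3.0), 2.0 * (b + 1.0)]

params = [0.0, 0.0]   # starting guess for (w, b)
eta = 0.1             # the "small number"; picked by trial and error
for _ in range(100):
    params = gradient_step(params, toy_gradients(params), eta)

print(params)  # ends up close to the minimum at w = 3, b = -1
```

With a step size that is too large the iterates would overshoot the minimum and diverge, which is one reason the choice takes some experimentation.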
Computing the Derivatives: Output Layer
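Our main tool in this computation will be the chain rule, which I’ll assume you’re familiar with. Let us start with the “easier” derivatives first:
$$\frac{\partial E}{\partial \bar{w}_{ij}}$$
Since the network outputs are $\bar{y}_k^\mu = \bar{f}\left( \bar{h}_k^\mu \right)$, we employ our chain rule as follows (with $t_k^\mu$ the desired outputs):
$$\frac{\partial E}{\partial \bar{w}_{ij}} = \sum_{\mu} \sum_k \left( \bar{y}_k^\mu - t_k^\mu \right) \bar{f}' \left( \bar{h}_k^\mu \right) \frac{\partial \bar{h}_k^\mu}{\partial \bar{w}_{ij}}$$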
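Since $\bar{h}_k^\mu = \sum_l \bar{w}_{kl} y_l^\mu + \bar{b}_k$, where the $y_l^\mu$ are the first layer’s outputs, we can immediately infer $\partial \bar{h}_k^\mu / \partial \bar{w}_{ij} = \delta_{ki} y_j^\mu$, where $\delta_{ij}$ is the Kronecker delta function, equal to 1 if i=j and 0 otherwise. Substituting and summing over k:
$$\frac{\partial E}{\partial \bar{w}_{ij}} = \sum_{\mu} \left( \bar{y}_i^\mu - t_i^\mu \right) \bar{f}' \left( \bar{h}_i^\mu \right) y_j^\mu$$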
The quantities that appear in this equation are all directly computable by taking the input (for a particular $\mu$) and propagating it forwards in our neural network. We’ll see how it’s actually done in an upcoming installment when we implement things in MATLAB, but for the time being, I’d just like to introduce a notation:
$$\bar{e}_k^\mu = \bar{y}_k^\mu - t_k^\mu$$
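This is the error in the output. Using this definition:
$$\frac{\partial E}{\partial \bar{w}_{ij}} = \sum_{\mu} \bar{e}_i^\mu \bar{f}' \left( \bar{h}_i^\mu \right) y_j^\mu$$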
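The computation of the bias derivative is done similarly, and is even easier (you should try it yourself before reading further):
$$\frac{\partial E}{\partial \bar{b}_i} = \sum_{\mu} \bar{e}_i^\mu \bar{f}' \left( \bar{h}_i^\mu \right)$$
where we have used:
$$\frac{\partial \bar{h}_k^\mu}{\partial \bar{b}_i} = \delta_{ki}$$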
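The output-layer derivatives can be checked numerically. The sketch below assumes a logistic sigmoid for the output activation, the half sum-of-squares error, and made-up weights and data; the function names are hypothetical. It computes the derivatives as a sum over pairs of (output error) × (activation derivative) × (first-layer output), then compares one weight and one bias derivative against a finite-difference estimate.

```python
import math

def f_bar(h):
    """Output-layer activation; a logistic sigmoid is assumed here."""
    return 1.0 / (1.0 + math.exp(-h))

def f_bar_prime(h):
    s = f_bar(h)
    return s * (1.0 - s)

def output_layer(y, w_bar, b_bar):
    """Return (h_bar, y_bar) for a vector y of first-layer outputs."""
    h_bar = [sum(w * v for w, v in zip(row, y)) + b for row, b in zip(w_bar, b_bar)]
    return h_bar, [f_bar(h) for h in h_bar]

def error(ys, ts, w_bar, b_bar):
    """Half sum-of-squares error over all pairs and output components."""
    total = 0.0
    for y, t in zip(ys, ts):
        _, y_bar = output_layer(y, w_bar, b_bar)
        total += 0.5 * sum((o - tk) ** 2 for o, tk in zip(y_bar, t))
    return total

def output_gradients(ys, ts, w_bar, b_bar):
    """Analytic dE/dw_bar and dE/db_bar from the chain-rule result."""
    grad_w = [[0.0] * len(row) for row in w_bar]
    grad_b = [0.0] * len(b_bar)
    for y, t in zip(ys, ts):
        h_bar, y_bar = output_layer(y, w_bar, b_bar)
        for i in range(len(b_bar)):
            delta = (y_bar[i] - t[i]) * f_bar_prime(h_bar[i])  # e_bar_i * f_bar'(h_bar_i)
            grad_b[i] += delta
            for j in range(len(y)):
                grad_w[i][j] += delta * y[j]
    return grad_w, grad_b

# Made-up first-layer outputs, targets, weights and biases.
ys = [[0.2, 0.7, 0.5], [0.9, 0.1, 0.4]]
ts = [[1.0, 0.0], [0.0, 1.0]]
w_bar = [[0.1, -0.2, 0.3], [0.4, 0.5, -0.6]]
b_bar = [0.05, -0.05]

grad_w, grad_b = output_gradients(ys, ts, w_bar, b_bar)

# Central finite differences for one weight and one bias.
eps = 1e-6
w_plus = [row[:] for row in w_bar]
w_minus = [row[:] for row in w_bar]
w_plus[1][2] += eps
w_minus[1][2] -= eps
numeric_w = (error(ys, ts, w_plus, b_bar) - error(ys, ts, w_minus, b_bar)) / (2 * eps)

b_plus, b_minus = b_bar[:], b_bar[:]
b_plus[0] += eps
b_minus[0] -= eps
numeric_b = (error(ys, ts, w_bar, b_plus) - error(ys, ts, w_bar, b_minus)) / (2 * eps)

print(grad_w[1][2], numeric_w)
print(grad_b[0], numeric_b)
```

If the analytic and numeric values disagree by much more than rounding error, the derivation or the code has a mistake, which makes this kind of check a useful companion to the pen-and-paper work.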
Computing the Derivatives: Input Layer
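We now turn to the task of computing the derivatives relating to the first, input layer. The method is similar, only now we need to apply the chain rule twice. Here is how it’s done. The first step is a rehash of the computations we did above:
$$\frac{\partial E}{\partial w_{ij}} = \sum_{\mu} \sum_k \left( \bar{y}_k^\mu - t_k^\mu \right) \frac{\partial \bar{y}_k^\mu}{\partial w_{ij}}$$
Proceeding as above,
$$\frac{\partial E}{\partial w_{ij}} = \sum_{\mu} \sum_k \bar{e}_k^\mu \bar{f}' \left( \bar{h}_k^\mu \right) \frac{\partial \bar{h}_k^\mu}{\partial w_{ij}}$$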
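We now re-apply our chain rule as follows: first of all, note that
$$\frac{\partial \bar{h}_k^\mu}{\partial w_{ij}} = \sum_l \bar{w}_{kl} \frac{\partial y_l^\mu}{\partial w_{ij}}$$
and that
$$\frac{\partial y_l^\mu}{\partial w_{ij}} = f' \left( h_l^\mu \right) \frac{\partial h_l^\mu}{\partial w_{ij}} = f' \left( h_l^\mu \right) \delta_{li} x_j^\mu$$
and hence:
$$\frac{\partial \bar{h}_k^\mu}{\partial w_{ij}} = \bar{w}_{ki} f' \left( h_i^\mu \right) x_j^\mu$$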
Substituting this back into our main derivation, we obtain:
$$\frac{\partial E}{\partial w_{ij}} = \sum_{\mu} \left[ \sum_k \bar{e}_k^\mu \bar{f}' \left( \bar{h}_k^\mu \right) \bar{w}_{ki} \right] f' \left( h_i^\mu \right) x_j^\mu$$
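Formally speaking, this looks exactly like our expression for $\partial E / \partial \bar{w}_{ij}$ derived above, provided we define the error in the first layer’s outputs as:
$$e_i^\mu = \sum_k \bar{e}_k^\mu \bar{f}' \left( \bar{h}_k^\mu \right) \bar{w}_{ki}$$
which yields the compact expression (compare with the expression for $\partial E / \partial \bar{w}_{ij}$):
$$\frac{\partial E}{\partial w_{ij}} = \sum_{\mu} e_i^\mu f' \left( h_i^\mu \right) x_j^\mu$$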
I will leave it up to you as an exercise (don’t you just hate it when someone says that?) to show that:
$$\frac{\partial E}{\partial b_i} = \sum_{\mu} e_i^\mu f' \left( h_i^\mu \right)$$
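Putting the pieces of this section together, here is a rough sketch of the first-layer derivatives for a tiny network, checked against finite differences. It assumes logistic sigmoids for both activations, the half sum-of-squares error, and made-up sizes, weights and data; all function names are hypothetical. Note how the output errors are carried backwards through the second-layer weights to obtain the first-layer errors.

```python
import math

def f(h):
    """Activation used for both layers in this sketch (logistic sigmoid, assumed)."""
    return 1.0 / (1.0 + math.exp(-h))

def f_prime(h):
    s = f(h)
    return s * (1.0 - s)

def layer(inputs, weights, biases):
    """Return (h, f(h)) for one layer."""
    h = [sum(w * v for w, v in zip(row, inputs)) + b for row, b in zip(weights, biases)]
    return h, [f(v) for v in h]

def error(xs, ts, w, b, w_bar, b_bar):
    """Half sum-of-squares error of the two-layer network."""
    total = 0.0
    for x, t in zip(xs, ts):
        _, y = layer(x, w, b)
        _, y_bar = layer(y, w_bar, b_bar)
        total += 0.5 * sum((o - tk) ** 2 for o, tk in zip(y_bar, t))
    return total

def first_layer_gradients(xs, ts, w, b, w_bar, b_bar):
    """Analytic dE/dw and dE/db for the first layer."""
    grad_w = [[0.0] * len(row) for row in w]
    grad_b = [0.0] * len(b)
    for x, t in zip(xs, ts):
        h, y = layer(x, w, b)
        h_bar, y_bar = layer(y, w_bar, b_bar)
        # Output error times output activation derivative, one entry per output k.
        d_bar = [(o - tk) * f_prime(hb) for o, tk, hb in zip(y_bar, t, h_bar)]
        for i in range(len(b)):
            # First-layer error: output errors propagated back through w_bar.
            e_i = sum(d_bar[k] * w_bar[k][i] for k in range(len(d_bar)))
            delta = e_i * f_prime(h[i])
            grad_b[i] += delta
            for j in range(len(x)):
                grad_w[i][j] += delta * x[j]
    return grad_w, grad_b

# Made-up network: 2 inputs, 3 first-layer units, 2 outputs.
xs = [[0.5, -1.0], [1.5, 0.25]]
ts = [[1.0, 0.0], [0.0, 1.0]]
w = [[0.1, -0.3], [0.2, 0.4], [-0.5, 0.6]]
b = [0.0, 0.1, -0.1]
w_bar = [[0.3, -0.2, 0.1], [-0.4, 0.5, 0.2]]
b_bar = [0.05, -0.05]

grad_w, grad_b = first_layer_gradients(xs, ts, w, b, w_bar, b_bar)

# Central finite differences for one first-layer weight and one bias.
eps = 1e-6
w_plus = [row[:] for row in w]
w_minus = [row[:] for row in w]
w_plus[2][0] += eps
w_minus[2][0] -= eps
numeric_w = (error(xs, ts, w_plus, b, w_bar, b_bar)
             - error(xs, ts, w_minus, b, w_bar, b_bar)) / (2 * eps)

b_plus, b_minus = b[:], b[:]
b_plus[1] += eps
b_minus[1] -= eps
numeric_b = (error(xs, ts, w, b_plus, w_bar, b_bar)
             - error(xs, ts, w, b_minus, w_bar, b_bar)) / (2 * eps)

print(grad_w[2][0], numeric_w)
print(grad_b[1], numeric_b)
```

The backward flow of the error terms, from the output layer to the first layer, is what the name backpropagation refers to.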
Now What?
Having computed the derivatives, all we need to do is adjust our weights and biases according to some gradient descent method. In the next installment I will give a “cookbook recipe” for doing that - no need to turn any mental gears. After that we’ll discuss how to initialize the weights and biases in the first place, and conclude with a discussion of some MATLAB code that will put the concepts in this tutorial to use. Until next time - have fun!