rbfn.m
For simplicity, suppose that we are dealing with a 2-3-2 radial basis function network, and we have 100 input-output data pairs as the training data set. The training data set can be represented by a 100-by-4 matrix $[X_0 \; T]$, where the input part is denoted by $X_0$:

$$X_0 = \begin{bmatrix} x_{1,1} & x_{2,1} \\ \vdots & \vdots \\ x_{1,100} & x_{2,100} \end{bmatrix},$$

and the output (target) part is denoted by $T$:

$$T = \begin{bmatrix} t_{6,1} & t_{7,1} \\ \vdots & \vdots \\ t_{6,100} & t_{7,100} \end{bmatrix}.$$

For discussion convenience, we shall also define $X_1$ (100-by-3) and $X_2$ (100-by-2) as the node outputs for layers 1 and 2, respectively.
The centers of the Gaussian functions in the first layer can be expressed as a 3-by-2 matrix

$$CENTER = \begin{bmatrix} c_{3,1} & c_{3,2} \\ c_{4,1} & c_{4,2} \\ c_{5,1} & c_{5,2} \end{bmatrix},$$

where each row represents a center point in a two-dimensional space. The standard deviations of the Gaussian functions in the first layer can be expressed as

$$SIGMA = \begin{bmatrix} \sigma_3 \\ \sigma_4 \\ \sigma_5 \end{bmatrix},$$

where each element represents the standard deviation of a Gaussian function in the first layer. The parameters for the second layer can be defined as a 3-by-2 weight matrix

$$W = \begin{bmatrix} w_{3,6} & w_{3,7} \\ w_{4,6} & w_{4,7} \\ w_{5,6} & w_{5,7} \end{bmatrix},$$

where $w_{i,j}$ denotes the weight from node $i$ to node $j$.
The equations for computing the output of the first layer are

$$x_i = \exp\left(-\frac{d_i^2}{2\sigma_i^2}\right), \quad i = 3, 4, 5,$$

where $d_i = \left\| [x_1 \; x_2] - [c_{i,1} \; c_{i,2}] \right\|$ is the distance between the input vector and the center of node $i$. After plugging 100 inputs into the preceding equation, we have

$$x_{i,p} = \exp\left(-\frac{d_{i,p}^2}{2\sigma_i^2}\right), \quad i = 3, 4, 5; \; p = 1, \ldots, 100,$$

where $d_{i,p}$ is the distance between the $p$th input vector and the center of node $i$. The preceding expression can be further simplified into the following matrix expression:

$$X_1 = \exp\left(-D^{.2}\, \mathrm{diag}\!\left(\tfrac{1}{2\sigma_3^2}, \tfrac{1}{2\sigma_4^2}, \tfrac{1}{2\sigma_5^2}\right)\right),$$

where $D$ is the 100-by-3 matrix with entries $d_{i,p}$, and the squaring and exponential are applied element-wise. This corresponds to line 63 (or so) of rbfn.m:
X1 = exp(-(dist.^2)*diag(1./(2*SIGMA.^2)));
Here dist is the distance matrix between the 100 input vectors and the 3 Gaussian centers.
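To make this computation concrete, here is a NumPy sketch (Python rather than the MATLAB of rbfn.m; the inputs, centers, and standard deviations below are made-up stand-ins) that builds dist and X1 the same way, with broadcasting over columns playing the role of `diag(1./(2*SIGMA.^2))`:

```python
import numpy as np

rng = np.random.default_rng(0)
X0 = rng.standard_normal((100, 2))    # 100 two-dimensional input vectors (made-up)
CENTER = rng.standard_normal((3, 2))  # one row per Gaussian center (made-up)
SIGMA = np.array([0.8, 1.0, 1.2])     # standard deviations of the 3 Gaussian nodes (made-up)

# dist[p, i] = Euclidean distance between input p and center i (100-by-3)
dist = np.linalg.norm(X0[:, None, :] - CENTER[None, :, :], axis=2)

# MATLAB: X1 = exp(-(dist.^2)*diag(1./(2*SIGMA.^2)))
X1 = np.exp(-dist**2 / (2 * SIGMA**2))
```

Each entry of X1 is a Gaussian activation in (0, 1], largest when the input sits exactly on the center.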
The equations for computing the output of the second layer are

$$x_6 = w_{3,6}\,x_3 + w_{4,6}\,x_4 + w_{5,6}\,x_5, \qquad x_7 = w_{3,7}\,x_3 + w_{4,7}\,x_4 + w_{5,7}\,x_5,$$

or equivalently,

$$[x_6 \;\; x_7] = [x_3 \;\; x_4 \;\; x_5]\, W.$$

After plugging 100 data entries into the preceding equation, we have

$$x_{6,p} = \sum_{i=3}^{5} w_{i,6}\, x_{i,p}, \qquad x_{7,p} = \sum_{i=3}^{5} w_{i,7}\, x_{i,p}, \quad p = 1, \ldots, 100,$$

or equivalently,

$$X_2 = X_1 W.$$

The preceding equation corresponds to line 64 (or so) of rbfn.m:
X2 = X1*W;
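In NumPy the same step is a single matrix product (the layer-1 outputs and weights below are made-up stand-ins), and each row of X2 matches the node equations above:

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.random((100, 3))         # layer-1 outputs (made-up stand-in values)
W = rng.standard_normal((3, 2))   # weights w_{i,j} from node i to output node j (made-up)

X2 = X1 @ W                       # MATLAB: X2 = X1*W

# Row p of X2 is [x_{6,p}, x_{7,p}]; check node 6 for p = 0 by hand:
x6 = W[0, 0]*X1[0, 0] + W[1, 0]*X1[0, 1] + W[2, 0]*X1[0, 2]
```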
The instantaneous error measure for the pth data pair is defined by
$$E_p = (t_{6,p} - x_{6,p})^2 + (t_{7,p} - x_{7,p})^2,$$

where $t_{6,p}$ and $t_{7,p}$ are the $p$th target outputs, and $x_{6,p}$ and $x_{7,p}$ are the $p$th network outputs. The derivative of the above instantaneous error measure with respect to the network outputs is written as

$$\frac{\partial E_p}{\partial x_{6,p}} = -2\,(t_{6,p} - x_{6,p}), \qquad \frac{\partial E_p}{\partial x_{7,p}} = -2\,(t_{7,p} - x_{7,p}).$$

We can stack the above equations for each $p$ to obtain the following matrix expression:

$$\frac{\partial E}{\partial X_2} = -2\,(T - X_2),$$

where $X_2$ is the actual output of the radial basis function network. The preceding equation corresponds to line 56 of rbfn.m:
dE_dX2 = -2*(T - X2);
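This derivative is easy to verify numerically. The sketch below (NumPy, with made-up targets and outputs) compares one entry of -2*(T - X2) against a central finite difference of the total error E, the sum of all Ep:

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.standard_normal((100, 2))    # targets (made-up)
X2 = rng.standard_normal((100, 2))   # network outputs (made-up)

dE_dX2 = -2 * (T - X2)               # MATLAB: dE_dX2 = -2*(T - X2)

def E(X2):
    """Total error: sum over p of Ep = (t6p - x6p)^2 + (t7p - x7p)^2."""
    return np.sum((T - X2) ** 2)

# Central finite difference on entry (0, 0) of X2
eps = 1e-6
X2p, X2m = X2.copy(), X2.copy()
X2p[0, 0] += eps
X2m[0, 0] -= eps
numeric = (E(X2p) - E(X2m)) / (2 * eps)
```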
Now we can compute the derivatives of Ep with respect to the second layer's weights. The derivatives of Ep with respect to the parameters of node 6 are

$$\frac{\partial E_p}{\partial w_{i,6}} = \frac{\partial E_p}{\partial x_{6,p}}\, x_{i,p}, \quad i = 3, 4, 5.$$

The derivatives of Ep with respect to the parameters of node 7 are

$$\frac{\partial E_p}{\partial w_{i,7}} = \frac{\partial E_p}{\partial x_{7,p}}\, x_{i,p}, \quad i = 3, 4, 5.$$

We can combine the above six equations to have the following concise expression:

$$\frac{\partial E_p}{\partial W} = \begin{bmatrix} x_{3,p} \\ x_{4,p} \\ x_{5,p} \end{bmatrix} \begin{bmatrix} \dfrac{\partial E_p}{\partial x_{6,p}} & \dfrac{\partial E_p}{\partial x_{7,p}} \end{bmatrix}.$$

Therefore the accumulated gradient is

$$\frac{\partial E}{\partial W} = \sum_{p=1}^{100} \frac{\partial E_p}{\partial W} = X_1^T\, \frac{\partial E}{\partial X_2}.$$
The preceding equation corresponds to line 74 (or so) of rbfn.m:
dE_dW = X1'*dE_dX2;
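The same finite-difference technique confirms the weight gradient. The NumPy sketch below (made-up layer-1 outputs, weights, and targets) checks one entry of X1'*dE_dX2 against a central difference of E with respect to that weight:

```python
import numpy as np

rng = np.random.default_rng(2)
X1 = rng.random((100, 3))           # layer-1 outputs (made-up)
W = rng.standard_normal((3, 2))     # second-layer weights (made-up)
T = rng.standard_normal((100, 2))   # targets (made-up)

def E(W):
    """Total error as a function of the second-layer weights."""
    return np.sum((T - X1 @ W) ** 2)

dE_dX2 = -2 * (T - X1 @ W)
dE_dW = X1.T @ dE_dX2               # MATLAB: dE_dW = X1'*dE_dX2

# Central finite difference on weight w_{4,6}, i.e. entry (1, 0) of W
eps = 1e-6
Wp, Wm = W.copy(), W.copy()
Wp[1, 0] += eps
Wm[1, 0] -= eps
numeric = (E(Wp) - E(Wm)) / (2 * eps)
```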
For derivatives of Ep with respect to x3, we have

$$\frac{\partial E_p}{\partial x_{3,p}} = \frac{\partial E_p}{\partial x_{6,p}}\, w_{3,6} + \frac{\partial E_p}{\partial x_{7,p}}\, w_{3,7}.$$

Similarly, we have

$$\frac{\partial E_p}{\partial x_{4,p}} = \frac{\partial E_p}{\partial x_{6,p}}\, w_{4,6} + \frac{\partial E_p}{\partial x_{7,p}}\, w_{4,7}, \qquad \frac{\partial E_p}{\partial x_{5,p}} = \frac{\partial E_p}{\partial x_{6,p}}\, w_{5,6} + \frac{\partial E_p}{\partial x_{7,p}}\, w_{5,7}.$$

The preceding three equations can be put into matrix form:

$$\left[ \frac{\partial E_p}{\partial x_{3,p}} \;\; \frac{\partial E_p}{\partial x_{4,p}} \;\; \frac{\partial E_p}{\partial x_{5,p}} \right] = \left[ \frac{\partial E_p}{\partial x_{6,p}} \;\; \frac{\partial E_p}{\partial x_{7,p}} \right] W^T.$$

Hence the accumulated derivatives of E with respect to $X_1$ are

$$\frac{\partial E}{\partial X_1} = \frac{\partial E}{\partial X_2}\, W^T.$$
The preceding equation corresponds to line 77 (or so) of rbfn.m:
dE_dX1 = dE_dX2*W';
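Back-propagating through the linear layer can also be checked by perturbing a layer-1 output directly. The NumPy sketch below (made-up values, as before) compares one entry of dE_dX2*W' against a central difference of E with respect to that entry of X1:

```python
import numpy as np

rng = np.random.default_rng(3)
X1 = rng.random((100, 3))           # layer-1 outputs (made-up)
W = rng.standard_normal((3, 2))     # second-layer weights (made-up)
T = rng.standard_normal((100, 2))   # targets (made-up)

def E(X1):
    """Total error as a function of the layer-1 outputs."""
    return np.sum((T - X1 @ W) ** 2)

dE_dX2 = -2 * (T - X1 @ W)
dE_dX1 = dE_dX2 @ W.T               # MATLAB: dE_dX1 = dE_dX2*W'

# Central finite difference on x_{5,1}, i.e. entry (0, 2) of X1
eps = 1e-6
X1p, X1m = X1.copy(), X1.copy()
X1p[0, 2] += eps
X1m[0, 2] -= eps
numeric = (E(X1p) - E(X1m)) / (2 * eps)
```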
The derivatives of layer 1's outputs with respect to the standard deviations are

$$\frac{\partial x_{3,p}}{\partial \sigma_3} = x_{3,p}\, \frac{d_{3,p}^2}{\sigma_3^3},$$

where $d_{3,p}$ is the distance between the $p$th input vector and the center of node 3. Similarly,

$$\frac{\partial x_{4,p}}{\partial \sigma_4} = x_{4,p}\, \frac{d_{4,p}^2}{\sigma_4^3}, \qquad \frac{\partial x_{5,p}}{\partial \sigma_5} = x_{5,p}\, \frac{d_{5,p}^2}{\sigma_5^3}.$$

The preceding three equations can be put into a matrix format:

$$\frac{\partial X_1}{\partial \sigma} = X_1 \odot \left( D^{.2}\, \mathrm{diag}(\sigma_3^{-3},\, \sigma_4^{-3},\, \sigma_5^{-3}) \right),$$

where $\odot$ denotes element-wise multiplication and $D^{.2}$ is the element-wise square of the distance matrix dist. This corresponds to line 78 of rbfn.m:
dX1_dSigma = X1.*(dist.^2*diag(SIGMA.^(-3)));
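This element-wise derivative can be verified column by column. The NumPy sketch below (made-up distances and widths) perturbs $\sigma_3$ and checks that the first column of dX1_dSigma matches the central difference of the first column of X1:

```python
import numpy as np

rng = np.random.default_rng(4)
dist = rng.random((100, 3)) + 0.1   # distances to the 3 centers (made-up, kept positive)
SIGMA = np.array([0.8, 1.0, 1.2])   # standard deviations (made-up)

def layer1(SIGMA):
    """Layer-1 Gaussian outputs for given standard deviations."""
    return np.exp(-dist**2 / (2 * SIGMA**2))

X1 = layer1(SIGMA)
# MATLAB: dX1_dSigma = X1.*(dist.^2*diag(SIGMA.^(-3)))
dX1_dSigma = X1 * dist**2 / SIGMA**3

# Central finite difference: perturb sigma_3 and watch column 0 of X1
eps = 1e-6
Sp, Sm = SIGMA.copy(), SIGMA.copy()
Sp[0] += eps
Sm[0] -= eps
numeric = (layer1(Sp)[:, 0] - layer1(Sm)[:, 0]) / (2 * eps)
```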
The derivatives of Ep with respect to the standard deviations are

$$\frac{\partial E_p}{\partial \sigma_i} = \frac{\partial E_p}{\partial x_{i,p}}\, \frac{\partial x_{i,p}}{\partial \sigma_i}, \quad i = 3, 4, 5.$$

Hence

$$\frac{\partial E}{\partial \sigma_i} = \sum_{p=1}^{100} \frac{\partial E_p}{\partial x_{i,p}}\, \frac{\partial x_{i,p}}{\partial \sigma_i}. \tag{1}$$

The preceding equation corresponds to the following line of rbfn.m:
dE_dSigma = sum(dE_dX1.*dX1_dSigma)';
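Putting the whole chain together, Equation (1) can be checked end to end. The NumPy sketch below (made-up data, centers, widths, weights, and targets) computes dE_dSigma exactly as the MATLAB line does and compares one component against a central difference of E with respect to $\sigma_3$:

```python
import numpy as np

rng = np.random.default_rng(5)
X0 = rng.standard_normal((100, 2))    # inputs (made-up)
CENTER = rng.standard_normal((3, 2))  # centers (made-up)
SIGMA = np.array([0.8, 1.0, 1.2])     # standard deviations (made-up)
W = rng.standard_normal((3, 2))       # second-layer weights (made-up)
T = rng.standard_normal((100, 2))     # targets (made-up)

def forward(SIGMA):
    """Full forward pass; returns distances, layer-1 and layer-2 outputs."""
    dist = np.linalg.norm(X0[:, None, :] - CENTER[None, :, :], axis=2)
    X1 = np.exp(-dist**2 / (2 * SIGMA**2))
    return dist, X1, X1 @ W

dist, X1, X2 = forward(SIGMA)
dE_dX2 = -2 * (T - X2)
dE_dX1 = dE_dX2 @ W.T
dX1_dSigma = X1 * dist**2 / SIGMA**3
dE_dSigma = (dE_dX1 * dX1_dSigma).sum(axis=0)   # MATLAB: sum(dE_dX1.*dX1_dSigma)'

def E(SIGMA):
    _, _, X2 = forward(SIGMA)
    return np.sum((T - X2) ** 2)

# Central finite difference with respect to sigma_3
eps = 1e-6
Sp, Sm = SIGMA.copy(), SIGMA.copy()
Sp[0] += eps
Sm[0] -= eps
numeric = (E(Sp) - E(Sm)) / (2 * eps)
```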
Now we are moving toward the final step: to calculate the derivative of E with respect to the centers of the Gaussians. Since

$$x_{i,p} = \exp\left(-\frac{(x_{1,p} - c_{i,1})^2 + (x_{2,p} - c_{i,2})^2}{2\sigma_i^2}\right),$$

where $(x_{1,p}, x_{2,p})$ is the $p$th input vector, the derivatives of x3 (and likewise x4 and x5) with respect to the center coordinates are

$$\frac{\partial x_{i,p}}{\partial c_{i,j}} = x_{i,p}\, \frac{x_{j,p} - c_{i,j}}{\sigma_i^2}, \quad i = 3, 4, 5; \; j = 1, 2.$$

Hence

$$\frac{\partial E}{\partial c_{i,j}} = \sum_{p=1}^{100} \frac{\partial E_p}{\partial x_{i,p}}\, \frac{\partial x_{i,p}}{\partial c_{i,j}} = \frac{1}{\sigma_i^2} \left\{ \sum_{p=1}^{100} \frac{\partial E_p}{\partial x_{i,p}}\, x_{i,p}\, x_{j,p} \;-\; c_{i,j} \sum_{p=1}^{100} \frac{\partial E_p}{\partial x_{i,p}}\, x_{i,p} \right\}. \tag{2}$$

The first term in the curly braces can be further simplified: it is the $(i, j)$ entry of $\left(\frac{\partial E}{\partial X_1} \odot X_1\right)^T X_0$, where $\odot$ denotes element-wise multiplication. The second term in the curly braces of Equation (2) can be simplified similarly as what we have done in Equation (1), which leads to the $(i, j)$ entry of $\mathrm{diag}\!\left(\mathrm{sum}\!\left(\frac{\partial E}{\partial X_1} \odot X_1\right)\right) CENTER$, where sum denotes the column-sum vector. Consequently, Equation (2) can be simplified as follows:

$$\frac{\partial E}{\partial CENTER} = \mathrm{diag}(\sigma_3^{-2},\, \sigma_4^{-2},\, \sigma_5^{-2}) \left[ \left(\frac{\partial E}{\partial X_1} \odot X_1\right)^T X_0 - \mathrm{diag}\!\left(\mathrm{sum}\!\left(\frac{\partial E}{\partial X_1} \odot X_1\right)\right) CENTER \right].$$
The preceding equation corresponds to line 81 (or so) of rbfn.m:
dE_dCenter=diag(SIGMA.^(-2))*((dE_dX1.*X1)'*X0-diag(sum(dE_dX1.*X1))*CENTER);
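As a final sanity check, Equation (2) can also be verified numerically. The NumPy sketch below (made-up data and parameters, as in the earlier sketches) computes dE_dCenter exactly as the MATLAB line does, with broadcasting in place of the two diag(...) factors, and compares one entry against a central difference of E with respect to $c_{3,1}$:

```python
import numpy as np

rng = np.random.default_rng(6)
X0 = rng.standard_normal((100, 2))    # inputs (made-up)
CENTER = rng.standard_normal((3, 2))  # centers (made-up)
SIGMA = np.array([0.8, 1.0, 1.2])     # standard deviations (made-up)
W = rng.standard_normal((3, 2))       # second-layer weights (made-up)
T = rng.standard_normal((100, 2))     # targets (made-up)

def forward(CENTER):
    """Full forward pass; returns layer-1 and layer-2 outputs."""
    dist = np.linalg.norm(X0[:, None, :] - CENTER[None, :, :], axis=2)
    X1 = np.exp(-dist**2 / (2 * SIGMA**2))
    return X1, X1 @ W

X1, X2 = forward(CENTER)
dE_dX1 = (-2 * (T - X2)) @ W.T
G = dE_dX1 * X1                       # dE_dX1 .* X1

# MATLAB: dE_dCenter = diag(SIGMA.^(-2))*((dE_dX1.*X1)'*X0
#                      - diag(sum(dE_dX1.*X1))*CENTER)
dE_dCenter = (G.T @ X0 - G.sum(axis=0)[:, None] * CENTER) / SIGMA[:, None]**2

def E(CENTER):
    _, X2 = forward(CENTER)
    return np.sum((T - X2) ** 2)

# Central finite difference on c_{3,1}, i.e. entry (0, 0) of CENTER
eps = 1e-6
Cp, Cm = CENTER.copy(), CENTER.copy()
Cp[0, 0] += eps
Cm[0, 0] -= eps
numeric = (E(Cp) - E(Cm)) / (2 * eps)
```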