rbfn.m
For simplicity, suppose that we are dealing with a 2-3-2
radial basis function network, and we have
100 input-output data pairs as the training data set.
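The training data set can be represented by a $100 \times 4$ matrix whose $p$th row holds the $p$th input pair and the $p$th target pair:
$$
[\mathbf{X}_0 \;\; \mathbf{T}] =
\begin{bmatrix}
x_{1,1} & x_{2,1} & t_{6,1} & t_{7,1} \\
\vdots & \vdots & \vdots & \vdots \\
x_{1,100} & x_{2,100} & t_{6,100} & t_{7,100}
\end{bmatrix}
$$
For convenience of discussion, we also define $\mathbf{X}_1$ and $\mathbf{X}_2$
as the node outputs for layers 1 and 2, respectively:
$$
\mathbf{X}_1 =
\begin{bmatrix}
x_{3,1} & x_{4,1} & x_{5,1} \\
\vdots & \vdots & \vdots \\
x_{3,100} & x_{4,100} & x_{5,100}
\end{bmatrix},
\qquad
\mathbf{X}_2 =
\begin{bmatrix}
x_{6,1} & x_{7,1} \\
\vdots & \vdots \\
x_{6,100} & x_{7,100}
\end{bmatrix}
$$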
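The centers of the Gaussian functions in the first layer can be expressed as a $3 \times 2$ matrix with one center per row (CENTER in rbfn.m):
$$
\mathbf{C} =
\begin{bmatrix}
\mathbf{c}_3 \\ \mathbf{c}_4 \\ \mathbf{c}_5
\end{bmatrix}
$$
The corresponding standard deviations $\sigma_3$, $\sigma_4$, $\sigma_5$ are stored in the $3 \times 1$ vector SIGMA.
The parameters for the second layer can be defined as follows, where $w_{ij}$ is the weight connecting node $i$ to node $j$:
$$
\mathbf{W} =
\begin{bmatrix}
w_{36} & w_{37} \\
w_{46} & w_{47} \\
w_{56} & w_{57}
\end{bmatrix}
$$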
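The equations for computing the output of the first layer are
$$
x_{i,p} = \exp\left(-\frac{\|\mathbf{x}_p - \mathbf{c}_i\|^2}{2\sigma_i^2}\right), \qquad i = 3, 4, 5,
$$
where $\mathbf{x}_p = [x_{1,p} \;\; x_{2,p}]$ is the $p$th input vector, and $\mathbf{c}_i$ and $\sigma_i$ are the center and standard deviation of node $i$. The corresponding line in rbfn.m:
X1 = exp(-(dist.^2)*diag(1./(2*SIGMA.^2)));
Here dist is the $100 \times 3$ matrix whose $(p, i)$ entry is the Euclidean distance between the $p$th input vector and the $i$th center.
The equations for computing the output of the second layer are
$$
x_{6,p} = w_{36}x_{3,p} + w_{46}x_{4,p} + w_{56}x_{5,p}, \qquad
x_{7,p} = w_{37}x_{3,p} + w_{47}x_{4,p} + w_{57}x_{5,p},
$$
where $w_{ij}$ is the weight connecting node $i$ to node $j$. The corresponding line in rbfn.m: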
X2 = X1*W;
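The two forward-pass lines can be tried on made-up data with the short sketch below. The variable names X0, CENTER, SIGMA, W, dist, X1, and X2 follow the lines quoted from rbfn.m, but the random data, the parameter values, and the loop that builds dist are assumptions for illustration; how rbfn.m itself computes dist is not shown here.

```matlab
% Forward pass of a 2-3-2 RBFN on made-up data (values are illustrative only).
X0 = rand(100, 2);          % 100 input vectors, one per row
CENTER = rand(3, 2);        % one Gaussian center per row (assumed layout)
SIGMA = 0.5*ones(3, 1);     % one standard deviation per hidden node
W = randn(3, 2);            % second-layer weights

% dist(p,i) = Euclidean distance between input p and center i.
% This loop is an assumption; rbfn.m may build dist differently.
dist = zeros(size(X0, 1), size(CENTER, 1));
for i = 1:size(CENTER, 1)
    delta = X0 - repmat(CENTER(i, :), size(X0, 1), 1);
    dist(:, i) = sqrt(sum(delta.^2, 2));
end

X1 = exp(-(dist.^2)*diag(1./(2*SIGMA.^2)));   % layer 1 outputs, 100x3
X2 = X1*W;                                    % layer 2 outputs, 100x2
```

Every entry of X1 lies in (0, 1], and equals 1 only when an input coincides with a center.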
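The instantaneous error measure for the $p$th data pair is defined by
$$
E_p = (t_{6,p} - x_{6,p})^2 + (t_{7,p} - x_{7,p})^2,
$$
where $t_{6,p}$ and $t_{7,p}$ are the $p$th target outputs, and $x_{6,p}$ and $x_{7,p}$ are the $p$th network outputs. The derivative of the above instantaneous error measure with respect to the network outputs is written as
$$
\frac{\partial E_p}{\partial x_{6,p}} = -2(t_{6,p} - x_{6,p}), \qquad
\frac{\partial E_p}{\partial x_{7,p}} = -2(t_{7,p} - x_{7,p}).
$$
The corresponding line in rbfn.m: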
dE_dX2 = -2*(T - X2);
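Now we can compute the derivatives of $E_p$ with respect to the second-layer weights, where $w_{ij}$ denotes the weight connecting node $i$ to node $j$. The derivatives of $E_p$ with respect to the parameters of node 6 are
$$
\frac{\partial E_p}{\partial w_{36}} = \frac{\partial E_p}{\partial x_{6,p}}x_{3,p}, \qquad
\frac{\partial E_p}{\partial w_{46}} = \frac{\partial E_p}{\partial x_{6,p}}x_{4,p}, \qquad
\frac{\partial E_p}{\partial w_{56}} = \frac{\partial E_p}{\partial x_{6,p}}x_{5,p},
$$
and those for node 7 follow the same pattern with $x_{7,p}$ in place of $x_{6,p}$.
We can combine the above six equations to have the following concise expression:
$$
\frac{\partial E_p}{\partial \mathbf{W}} =
\begin{bmatrix} x_{3,p} \\ x_{4,p} \\ x_{5,p} \end{bmatrix}
\begin{bmatrix} \dfrac{\partial E_p}{\partial x_{6,p}} & \dfrac{\partial E_p}{\partial x_{7,p}} \end{bmatrix}
$$
Therefore the accumulated gradient of $E = \sum_{p=1}^{100} E_p$ is
$$
\frac{\partial E}{\partial \mathbf{W}} = \sum_{p=1}^{100}\frac{\partial E_p}{\partial \mathbf{W}} = \mathbf{X}_1^T\frac{\partial E}{\partial \mathbf{X}_2},
$$
where $\partial E/\partial \mathbf{X}_2$ is the $100 \times 2$ matrix whose $p$th row is $[\partial E_p/\partial x_{6,p} \;\; \partial E_p/\partial x_{7,p}]$.
The preceding equation corresponds to line 74 (or so) of rbfn.m: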
dE_dW = X1'*dE_dX2;
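One way to gain confidence in this expression is to compare it with a central-difference estimate. The sketch below does that on made-up matrices; the data, the step size h, and the helper names E, num, and max_err are assumptions for illustration and are not part of rbfn.m.

```matlab
% Finite-difference check of dE_dW on made-up data.
X1 = rand(100, 3);  T = rand(100, 2);  W = randn(3, 2);
E = @(W) sum(sum((T - X1*W).^2));   % total error summed over all data pairs

X2 = X1*W;
dE_dX2 = -2*(T - X2);
dE_dW = X1'*dE_dX2;                 % analytic gradient from the text

% Central-difference estimate, perturbing one weight at a time.
h = 1e-6;
num = zeros(size(W));
for k = 1:numel(W)
    Wp = W;  Wp(k) = Wp(k) + h;
    Wm = W;  Wm(k) = Wm(k) - h;
    num(k) = (E(Wp) - E(Wm))/(2*h);
end
max_err = max(abs(num(:) - dE_dW(:)));   % should be small compared with the gradient entries
```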
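For the derivative of $E_p$ with respect to $x_{3,p}$, we have
$$
\frac{\partial E_p}{\partial x_{3,p}} = \frac{\partial E_p}{\partial x_{6,p}}w_{36} + \frac{\partial E_p}{\partial x_{7,p}}w_{37},
$$
where $w_{ij}$ is the weight connecting node $i$ to node $j$.
Similarly, we have
$$
\frac{\partial E_p}{\partial x_{4,p}} = \frac{\partial E_p}{\partial x_{6,p}}w_{46} + \frac{\partial E_p}{\partial x_{7,p}}w_{47}, \qquad
\frac{\partial E_p}{\partial x_{5,p}} = \frac{\partial E_p}{\partial x_{6,p}}w_{56} + \frac{\partial E_p}{\partial x_{7,p}}w_{57}.
$$
The preceding three equations can be put into matrix form:
$$
\begin{bmatrix} \dfrac{\partial E_p}{\partial x_{3,p}} & \dfrac{\partial E_p}{\partial x_{4,p}} & \dfrac{\partial E_p}{\partial x_{5,p}} \end{bmatrix}
=
\begin{bmatrix} \dfrac{\partial E_p}{\partial x_{6,p}} & \dfrac{\partial E_p}{\partial x_{7,p}} \end{bmatrix}\mathbf{W}^T
$$
Hence the accumulated derivatives of $E$ with respect to $\mathbf{X}_1$ are
$$
\frac{\partial E}{\partial \mathbf{X}_1} = \frac{\partial E}{\partial \mathbf{X}_2}\mathbf{W}^T.
$$
The corresponding line in rbfn.m: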
dE_dX1 = dE_dX2*W';
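The derivatives of layer 1's outputs with respect to the standard deviations are
$$
\frac{\partial x_{i,p}}{\partial \sigma_i} = x_{i,p}\frac{\|\mathbf{x}_p - \mathbf{c}_i\|^2}{\sigma_i^3}, \qquad i = 3, 4, 5,
$$
where $\|\mathbf{x}_p - \mathbf{c}_i\|$ is the distance between the $p$th input vector and the center of node $i$. The corresponding line in rbfn.m: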
dX1_dSigma = X1.*(dist.^2*diag(SIGMA.^(-3)));
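The derivatives of $E$ with respect to the standard deviations, accumulated over all data pairs, are
$$
\frac{\partial E}{\partial \sigma_i} = \sum_{p=1}^{100}\frac{\partial E_p}{\partial x_{i,p}}\frac{\partial x_{i,p}}{\partial \sigma_i}, \qquad i = 3, 4, 5. \tag{1}
$$
The corresponding line in rbfn.m: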
dE_dSigma = sum(dE_dX1.*dX1_dSigma)';
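Now we move to the final step: calculating the derivative
of $E$ with respect to the centers of the Gaussians.
Since
$$
x_{3,p} = \exp\left(-\frac{\|\mathbf{x}_p - \mathbf{c}_3\|^2}{2\sigma_3^2}\right),
$$
where $\mathbf{x}_p$ is the $p$th input vector and $\mathbf{c}_3$ is the center of node 3, the derivative
of $x_{3,p}$ with respect to $\mathbf{c}_3$ is
$$
\frac{\partial x_{3,p}}{\partial \mathbf{c}_3} = x_{3,p}\frac{\mathbf{x}_p - \mathbf{c}_3}{\sigma_3^2}.
$$
Accumulating over all data pairs gives
$$
\frac{\partial E}{\partial \mathbf{c}_3} = \frac{1}{\sigma_3^2}\left\{\sum_{p=1}^{100}\frac{\partial E_p}{\partial x_{3,p}}x_{3,p}\mathbf{x}_p - \left(\sum_{p=1}^{100}\frac{\partial E_p}{\partial x_{3,p}}x_{3,p}\right)\mathbf{c}_3\right\}. \tag{2}
$$
The first term in the curly braces can be further simplified: it is the row for node 3 of $\left(\frac{\partial E}{\partial \mathbf{X}_1} \circ \mathbf{X}_1\right)^T\mathbf{X}_0$, where $\circ$ denotes the element-wise product and $\mathbf{X}_0$ is the $100 \times 2$ input matrix. Stacking the results for nodes 3, 4, and 5 gives
$$
\frac{\partial E}{\partial \mathbf{C}} = \mathrm{diag}(\sigma_3^{-2}, \sigma_4^{-2}, \sigma_5^{-2})\left\{\left(\frac{\partial E}{\partial \mathbf{X}_1} \circ \mathbf{X}_1\right)^T\mathbf{X}_0 - \mathrm{diag}\left(\mathbf{1}^T\left(\frac{\partial E}{\partial \mathbf{X}_1} \circ \mathbf{X}_1\right)\right)\mathbf{C}\right\},
$$
where $\mathbf{C}$ is the $3 \times 2$ matrix of centers and $\mathbf{1}$ is a column of 100 ones.
The preceding equation corresponds to line 81 (or so) of rbfn.m: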
dE_dCenter=diag(SIGMA.^(-2))*((dE_dX1.*X1)'*X0-diag(sum(dE_dX1.*X1))*CENTER);
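The expressions for dE_dSigma and dE_dCenter can be sanity-checked the same way, against central differences of the total error. The sketch below chains the lines quoted from rbfn.m on made-up data; the data, the helper handles sqdist and E, the step size h, and the way dist is obtained are all assumptions for illustration rather than the contents of rbfn.m.

```matlab
% Finite-difference check of dE_dSigma and dE_dCenter on made-up data.
X0 = rand(100, 2);  T = rand(100, 2);
CENTER = rand(3, 2);  SIGMA = 0.5 + rand(3, 1);  W = randn(3, 2);

% Squared distances via ||x||^2 + ||c||^2 - 2*x*c' (assumed helper, 100x3).
sqdist = @(C) repmat(sum(X0.^2, 2), 1, 3) + repmat(sum(C.^2, 2)', 100, 1) - 2*X0*C';
% Total error as a function of the centers C and standard deviations S.
E = @(C, S) sum(sum((T - exp(-sqdist(C)*diag(1./(2*S.^2)))*W).^2));

% Analytic gradients, using the lines quoted from rbfn.m.
dist = sqrt(max(sqdist(CENTER), 0));
X1 = exp(-(dist.^2)*diag(1./(2*SIGMA.^2)));
X2 = X1*W;
dE_dX2 = -2*(T - X2);
dE_dX1 = dE_dX2*W';
dX1_dSigma = X1.*(dist.^2*diag(SIGMA.^(-3)));
dE_dSigma = sum(dE_dX1.*dX1_dSigma)';
dE_dCenter = diag(SIGMA.^(-2))*((dE_dX1.*X1)'*X0 - diag(sum(dE_dX1.*X1))*CENTER);

% Central differences, perturbing one parameter at a time.
h = 1e-6;
numSigma = zeros(3, 1);
for k = 1:3
    d = zeros(3, 1);  d(k) = h;
    numSigma(k) = (E(CENTER, SIGMA + d) - E(CENTER, SIGMA - d))/(2*h);
end
numCenter = zeros(3, 2);
for k = 1:6
    d = zeros(3, 2);  d(k) = h;
    numCenter(k) = (E(CENTER + d, SIGMA) - E(CENTER - d, SIGMA))/(2*h);
end
errSigma = max(abs(numSigma - dE_dSigma));
errCenter = max(abs(numCenter(:) - dE_dCenter(:)));
```

Both errSigma and errCenter should be several orders of magnitude smaller than the gradient entries themselves; a large mismatch would point to a sign or indexing slip in the derivation.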