For simplicity, suppose that we are dealing with a 1-3-2 multilayer
perceptron with hyperbolic tangent activation functions, and we have
100 input-output data pairs as the training data set.
The training data set can be represented by a 100-by-3 matrix whose pth row holds the input and the two target outputs of the pth data pair:

$$
\begin{bmatrix}
x_{1,1} & t_{5,1} & t_{6,1} \\
x_{1,2} & t_{5,2} & t_{6,2} \\
\vdots  & \vdots  & \vdots  \\
x_{1,100} & t_{5,100} & t_{6,100}
\end{bmatrix},
$$

where node 1 is the input node, nodes 2 through 4 are the hidden nodes, and nodes 5 and 6 are the output nodes; $x_{j,p}$ denotes the output of node $j$ for the pth data pair, and $t_{5,p}$, $t_{6,p}$ are the corresponding target outputs.
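As a concrete illustration (not part of tanmlp.m), a training set of this shape could be generated as follows; the target function, producing two outputs per scalar input, is purely hypothetical:

x = linspace(-1, 1, 100)';   % 100 scalar inputs x_{1,p}
t5 = 0.5*sin(pi*x);          % hypothetical target for output node 5
t6 = 0.5*cos(pi*x);          % hypothetical target for output node 6
data = [x t5 t6];            % 100-by-3 training data matrix
X0 = data(:, 1);             % input matrix X0 (100-by-1)
T  = data(:, 2:3);           % target matrix T (100-by-2)
one = ones(100, 1);          % column of ones (the variable one in tanmlp.m) for the bias terms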
For discussion convenience, we shall also define $X_1$ and $X_2$ as the node output matrices for layers 1 and 2, respectively:

$$
X_1 = \begin{bmatrix}
x_{2,1} & x_{3,1} & x_{4,1} \\
\vdots  & \vdots  & \vdots  \\
x_{2,100} & x_{3,100} & x_{4,100}
\end{bmatrix}, \qquad
X_2 = \begin{bmatrix}
x_{5,1} & x_{6,1} \\
\vdots  & \vdots  \\
x_{5,100} & x_{6,100}
\end{bmatrix}.
$$

(Analogously, $X_0$ is the 100-by-1 column of inputs $x_{1,p}$, and $T$ is the 100-by-2 matrix of targets.)
Similarly, the parameter matrices $W_1$ and $W_2$ for the first and second layers can be defined as follows:

$$
W_1 = \begin{bmatrix}
w_{1,2} & w_{1,3} & w_{1,4} \\
w_{0,2} & w_{0,3} & w_{0,4}
\end{bmatrix}, \qquad
W_2 = \begin{bmatrix}
w_{2,5} & w_{2,6} \\
w_{3,5} & w_{3,6} \\
w_{4,5} & w_{4,6} \\
w_{0,5} & w_{0,6}
\end{bmatrix},
$$

where $w_{i,j}$ is the weight from node $i$ to node $j$ and $w_{0,j}$ is the bias of node $j$. The last row of each matrix holds the biases, which is why a column of ones is appended to the node outputs in the code below.
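Assuming the dimensions above, the two parameter matrices might be initialized with small random values; the specific initialization scheme here is an assumption for illustration and need not match tanmlp.m:

in_n = 1;  hidden_n = 3;  out_n = 2;   % 1-3-2 network; hidden_n reappears in tanmlp.m below
W1 = 0.5*randn(in_n + 1, hidden_n);    % 2-by-3: one weight row plus the bias row
W2 = 0.5*randn(hidden_n + 1, out_n);   % 4-by-2: three weight rows plus the bias row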
The equations for computing the outputs of the first layer are

$$
x_{j,p} = \tanh\!\left(w_{1,j}\,x_{1,p} + w_{0,j}\right), \qquad j = 2, 3, 4,
$$

or, in matrix form, $X_1 = \tanh\!\left([X_0 \;\; \mathbf{1}]\,W_1\right)$, where $\mathbf{1}$ is a 100-by-1 column of ones. This corresponds to the following line of tanmlp.m:
X1 = tanh([X0 one]*W1);
Similarly, the outputs of the second layer are computed as $X_2 = \tanh\!\left([X_1 \;\; \mathbf{1}]\,W_2\right)$, which corresponds to the following line of tanmlp.m:
X2 = tanh([X1 one]*W2);
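With the hypothetical setup sketched above, these two lines can be exercised directly to confirm the matrix dimensions:

X1 = tanh([X0 one]*W1);   % 100-by-3 matrix of hidden-node outputs
X2 = tanh([X1 one]*W2);   % 100-by-2 matrix of network outputs
disp(size(X1))            % prints: 100  3
disp(size(X2))            % prints: 100  2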
The instantaneous error measure for the pth data pair is defined by

$$
E_p = (t_{5,p} - x_{5,p})^2 + (t_{6,p} - x_{6,p})^2,
$$

where $t_{5,p}$ and $t_{6,p}$ are the pth target outputs, and $x_{5,p}$ and $x_{6,p}$ are the pth network outputs. The derivatives of this instantaneous error measure with respect to the network outputs are

$$
\frac{\partial E_p}{\partial x_{5,p}} = -2\,(t_{5,p} - x_{5,p}), \qquad
\frac{\partial E_p}{\partial x_{6,p}} = -2\,(t_{6,p} - x_{6,p}),
$$

which correspond to the following line of tanmlp.m:
dE_dX2 = -2*(T - X2);
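The total error accumulated over all 100 data pairs, which the accumulated gradients below refer to, is simply the sum of the instantaneous errors. A one-line sketch (the variable name E is an assumption, not necessarily the one used in tanmlp.m):

E = sum(sum((T - X2).^2));   % total squared error: sum of E_p over p = 1..100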
Now we can compute the derivatives of $E_p$ with respect to the second layer's weights and biases. Since $x_{5,p} = \tanh\!\left(w_{2,5}\,x_{2,p} + w_{3,5}\,x_{3,p} + w_{4,5}\,x_{4,p} + w_{0,5}\right)$ and $\tanh'(u) = 1 - \tanh^2(u)$, the derivatives of $E_p$ with respect to the parameters (weights and bias) of node 5 are

$$
\frac{\partial E_p}{\partial w_{i,5}} = \frac{\partial E_p}{\partial x_{5,p}}\,(1 - x_{5,p}^2)\,x_{i,p}, \quad i = 2, 3, 4,
\qquad
\frac{\partial E_p}{\partial w_{0,5}} = \frac{\partial E_p}{\partial x_{5,p}}\,(1 - x_{5,p}^2),
$$

and similarly for node 6. Therefore the accumulated gradient, summed over all 100 data pairs, is

$$
\frac{\partial E}{\partial W_2} = [X_1 \;\; \mathbf{1}]^T \left( \frac{\partial E}{\partial X_2} \odot (1 + X_2) \odot (1 - X_2) \right), \tag{1}
$$

where $\odot$ denotes element-wise multiplication and $1 - x^2$ has been factored as $(1 + x)(1 - x)$. This corresponds to the following line of tanmlp.m:
dE_dW2 = [X1 one]'*(dE_dX2.*(1+X2).*(1-X2));
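Equation (1) can be spot-checked numerically: perturb one entry of W2, recompute the total error, and compare the finite-difference slope with the corresponding entry of dE_dW2. This check is not part of tanmlp.m; it is only a verification sketch using the hypothetical setup above:

delta = 1e-6;
W2p = W2;  W2p(1,1) = W2p(1,1) + delta;   % perturb w_{2,5}
X2p = tanh([X1 one]*W2p);                 % re-run the second layer only
E_pert = sum(sum((T - X2p).^2));          % perturbed total error
E_base = sum(sum((T - X2).^2));           % unperturbed total error
numeric  = (E_pert - E_base)/delta;       % finite-difference estimate
analytic = dE_dW2(1,1);                   % corresponding entry from Equation (1)
disp([numeric analytic])                  % the two values should agree closely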
For the derivatives of $E_p$ with respect to the first-layer output $x_{2,p}$, we have, by the chain rule,

$$
\frac{\partial E_p}{\partial x_{2,p}} = \frac{\partial E_p}{\partial x_{5,p}}\,(1 - x_{5,p}^2)\,w_{2,5} + \frac{\partial E_p}{\partial x_{6,p}}\,(1 - x_{6,p}^2)\,w_{2,6}.
$$

Similarly, we have the corresponding expressions for $\partial E_p/\partial x_{3,p}$ and $\partial E_p/\partial x_{4,p}$, with the weights $w_{3,5}, w_{3,6}$ and $w_{4,5}, w_{4,6}$, respectively. The preceding three equations can be put into matrix form:

$$
\left[ \frac{\partial E_p}{\partial x_{2,p}} \;\; \frac{\partial E_p}{\partial x_{3,p}} \;\; \frac{\partial E_p}{\partial x_{4,p}} \right]
= \left[ \frac{\partial E_p}{\partial x_{5,p}}\,(1 - x_{5,p}^2) \;\;\; \frac{\partial E_p}{\partial x_{6,p}}\,(1 - x_{6,p}^2) \right] \tilde{W}_2^T,
$$

where $\tilde{W}_2$ is $W_2$ with its last (bias) row removed; in the code this is W2(1:hidden_n,:), with hidden_n equal to 3. Hence the accumulated derivatives of $E$ with respect to $X_1$ are

$$
\frac{\partial E}{\partial X_1} = \left( \frac{\partial E}{\partial X_2} \odot (1 - X_2) \odot (1 + X_2) \right) \tilde{W}_2^T.
$$

The preceding equation corresponds to line 62 of tanmlp.m:
dE_dX1 = dE_dX2.*(1-X2).*(1+X2)*W2(1:hidden_n,:)';
By proceeding as in Equation (1), we have

$$
\frac{\partial E}{\partial W_1} = [X_0 \;\; \mathbf{1}]^T \left( \frac{\partial E}{\partial X_1} \odot (1 + X_1) \odot (1 - X_1) \right),
$$

which corresponds to the following line of tanmlp.m:
dE_dW1 = [X0 one]'*(dE_dX1.*(1+X1).*(1-X1));
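Putting the pieces together, one pass of steepest descent over the whole training set could look like the following sketch; the learning rate eta and the fixed number of epochs are assumptions for illustration, not taken from tanmlp.m:

eta = 0.01;                                          % assumed learning rate
for epoch = 1:1000
    % forward pass
    X1 = tanh([X0 one]*W1);
    X2 = tanh([X1 one]*W2);
    % backward pass: the gradient formulas derived above
    dE_dX2 = -2*(T - X2);
    dE_dW2 = [X1 one]'*(dE_dX2.*(1+X2).*(1-X2));
    dE_dX1 = dE_dX2.*(1-X2).*(1+X2)*W2(1:hidden_n,:)';
    dE_dW1 = [X0 one]'*(dE_dX1.*(1+X1).*(1-X1));
    % steepest-descent update of both parameter matrices
    W1 = W1 - eta*dE_dW1;
    W2 = W2 - eta*dE_dW2;
end
E = sum(sum((T - X2).^2));                           % final total squared error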