tanmlp.m
For simplicity, suppose that we are dealing with a 1-3-2 multilayer perceptron (one input node, three hidden nodes, two output nodes) with hyperbolic tangent activation functions, and that we have 100 input-output data pairs as the training data set. The training data set can be represented by the matrix

$$\begin{bmatrix} X_0 & T \end{bmatrix} = \begin{bmatrix} x_{1,1} & t_{5,1} & t_{6,1} \\ \vdots & \vdots & \vdots \\ x_{1,100} & t_{5,100} & t_{6,100} \end{bmatrix},$$

where the input part is denoted by the $100 \times 1$ matrix $X_0$ and the output (target) part is denoted by the $100 \times 2$ matrix $T$. For discussion convenience, we shall also define $X_1$ and $X_2$ as the node outputs of layers 1 and 2, respectively:

$$X_1 = \begin{bmatrix} x_{2,1} & x_{3,1} & x_{4,1} \\ \vdots & \vdots & \vdots \\ x_{2,100} & x_{3,100} & x_{4,100} \end{bmatrix} \ (100 \times 3), \qquad X_2 = \begin{bmatrix} x_{5,1} & x_{6,1} \\ \vdots & \vdots \\ x_{5,100} & x_{6,100} \end{bmatrix} \ (100 \times 2),$$

where nodes 2, 3, 4 are the hidden nodes and nodes 5, 6 are the output nodes. Similarly, the parameter matrices $W_1$ and $W_2$ for the first and second layers can be defined as follows:

$$W_1 = \begin{bmatrix} w_{1,2} & w_{1,3} & w_{1,4} \\ w_{0,2} & w_{0,3} & w_{0,4} \end{bmatrix} \ (2 \times 3), \qquad W_2 = \begin{bmatrix} w_{2,5} & w_{2,6} \\ w_{3,5} & w_{3,6} \\ w_{4,5} & w_{4,6} \\ w_{0,5} & w_{0,6} \end{bmatrix} \ (4 \times 2),$$

where $w_{i,j}$ is the weight from node $i$ to node $j$ and $w_{0,j}$ is the bias of node $j$; each column holds the parameters of one node.
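As a concrete illustration of these sizes, the following MATLAB sketch sets up matrices with exactly the dimensions described above. The toy data (a sine/cosine target) and the random weight initialization are assumptions made here for illustration only; the actual data and initialization used by tanmlp.m are not shown in this derivation.

% Hypothetical setup with the dimensions described above (not taken from tanmlp.m)
data_n   = 100;                     % number of input-output pairs
hidden_n = 3;                       % hidden nodes (nodes 2, 3, 4)
out_n    = 2;                       % output nodes (nodes 5, 6)

X0  = linspace(-1, 1, data_n)';     % 100-by-1 input matrix
T   = [sin(3*X0) cos(3*X0)];        % 100-by-2 target matrix (assumed toy targets)
one = ones(data_n, 1);              % 100-by-1 column of ones for the bias terms

W1 = 0.5*randn(2, hidden_n);        % 2-by-3: [weight from x1; bias] of nodes 2-4
W2 = 0.5*randn(hidden_n+1, out_n);  % 4-by-2: [weights from x2-x4; bias] of nodes 5-6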
The equations for computing the outputs of the first layer are

$$x_2 = \tanh(w_{1,2}\,x_1 + w_{0,2}), \quad x_3 = \tanh(w_{1,3}\,x_1 + w_{0,3}), \quad x_4 = \tanh(w_{1,4}\,x_1 + w_{0,4}),$$

or equivalently,

$$\begin{bmatrix} x_2 & x_3 & x_4 \end{bmatrix} = \tanh\!\left(\begin{bmatrix} x_1 & 1 \end{bmatrix} W_1\right).$$

After plugging the 100 inputs into the preceding equation, we have

$$\begin{bmatrix} x_{2,1} & x_{3,1} & x_{4,1} \\ \vdots & \vdots & \vdots \\ x_{2,100} & x_{3,100} & x_{4,100} \end{bmatrix} = \tanh\!\left(\begin{bmatrix} x_{1,1} & 1 \\ \vdots & \vdots \\ x_{1,100} & 1 \end{bmatrix} W_1\right),$$

or equivalently,

$$X_1 = \tanh\!\left(\begin{bmatrix} X_0 & \mathbf{1} \end{bmatrix} W_1\right),$$

where $\mathbf{1}$ is a $100 \times 1$ column of ones. The preceding equation corresponds to line 47 of tanmlp.m:
X1 = tanh([X0 one]*W1);
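To see that this vectorized line really is the batch version of the node-by-node formulas, a small sketch (continuing the setup above; not part of tanmlp.m) can evaluate the first layer one data pair at a time and compare the result with the matrix expression:

% Pair-by-pair evaluation of the first layer, for comparison with the vectorized form
X1_loop = zeros(data_n, hidden_n);
for p = 1:data_n
    % row p of X1: [x2 x3 x4] = tanh([x1 1]*W1) for the pth input
    X1_loop(p,:) = tanh([X0(p,:) 1]*W1);
end
X1 = tanh([X0 one]*W1);
max(abs(X1_loop(:) - X1(:)))        % should be (numerically) zero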
Similarly, the outputs of the second layer are computed as

$$X_2 = \tanh\!\left(\begin{bmatrix} X_1 & \mathbf{1} \end{bmatrix} W_2\right),$$

which corresponds to the following line of tanmlp.m:
X2 = tanh([X1 one]*W2);
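In the notation used below, row p of X2 holds the two network outputs x_{5,p} and x_{6,p}. The following lines (again an illustration continuing the setup above, not part of tanmlp.m) make this correspondence explicit for a single data pair:

% Outputs x5 and x6 for a single data pair p, compared with row p of X2
X2 = tanh([X1 one]*W2);
p  = 1;
x5 = tanh([X1(p,:) 1]*W2(:,1));     % output of node 5 for the pth pair
x6 = tanh([X1(p,:) 1]*W2(:,2));     % output of node 6 for the pth pair
[x5 x6; X2(p,:)]                    % the two rows should match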
The instantaneous error measure for the pth data pair is defined by
$$E_p = (t_{5,p} - x_{5,p})^2 + (t_{6,p} - x_{6,p})^2,$$

where $t_{5,p}$ and $t_{6,p}$ are the $p$th target outputs, and $x_{5,p}$ and $x_{6,p}$ are the $p$th network outputs. The derivatives of the above instantaneous error measure with respect to the network outputs are

$$\frac{\partial E_p}{\partial x_{5,p}} = -2\,(t_{5,p} - x_{5,p}), \qquad \frac{\partial E_p}{\partial x_{6,p}} = -2\,(t_{6,p} - x_{6,p}).$$

We can stack the above equations for each $p$ to obtain the following matrix expression:

$$\frac{\partial E}{\partial X_2} = -2\,(T - X_2),$$

where $X_2$ is the actual output of the MLP and $E = \sum_{p=1}^{100} E_p$ is the total error measure. The preceding equation corresponds to line 59 of tanmlp.m:
dE_dX2 = -2*(T - X2);
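Since E is just the sum of the E_p, the stacked derivative can be spot-checked numerically. The sketch below (an illustration continuing the setup above, not part of tanmlp.m) perturbs one entry of X2 with a central difference and compares the numerical slope with the corresponding entry of dE_dX2:

% Total squared error over all 100 pairs, and the stacked derivative matrix
X2 = tanh([X1 one]*W2);             % forward pass through layer 2
E  = sum(sum((T - X2).^2));         % E = sum over p of Ep
dE_dX2 = -2*(T - X2);               % 100-by-2 matrix of dEp/dx5p and dEp/dx6p

% Central-difference spot check of one entry (illustration only)
delta = 1e-6;
X2p = X2;  X2p(1,1) = X2p(1,1) + delta;
X2m = X2;  X2m(1,1) = X2m(1,1) - delta;
num_slope = (sum(sum((T - X2p).^2)) - sum(sum((T - X2m).^2)))/(2*delta);
[num_slope  dE_dX2(1,1)]            % the two values should agree closely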
Now we can compute the derivatives of $E_p$ with respect to the second layer's weights and biases. The derivatives of $E_p$ with respect to the parameters (weights and bias) of node 5 are

$$\frac{\partial E_p}{\partial w_{2,5}} = \frac{\partial E_p}{\partial x_{5,p}}(1 - x_{5,p}^2)\,x_{2,p}, \quad
\frac{\partial E_p}{\partial w_{3,5}} = \frac{\partial E_p}{\partial x_{5,p}}(1 - x_{5,p}^2)\,x_{3,p}, \quad
\frac{\partial E_p}{\partial w_{4,5}} = \frac{\partial E_p}{\partial x_{5,p}}(1 - x_{5,p}^2)\,x_{4,p}, \quad
\frac{\partial E_p}{\partial w_{0,5}} = \frac{\partial E_p}{\partial x_{5,p}}(1 - x_{5,p}^2),$$

where we have used the identity $\tanh'(z) = 1 - \tanh^2(z)$, so that $1 - x_{5,p}^2 = (1 + x_{5,p})(1 - x_{5,p})$. The derivatives of $E_p$ with respect to the parameters (weights and bias) of node 6 are

$$\frac{\partial E_p}{\partial w_{2,6}} = \frac{\partial E_p}{\partial x_{6,p}}(1 - x_{6,p}^2)\,x_{2,p}, \quad
\frac{\partial E_p}{\partial w_{3,6}} = \frac{\partial E_p}{\partial x_{6,p}}(1 - x_{6,p}^2)\,x_{3,p}, \quad
\frac{\partial E_p}{\partial w_{4,6}} = \frac{\partial E_p}{\partial x_{6,p}}(1 - x_{6,p}^2)\,x_{4,p}, \quad
\frac{\partial E_p}{\partial w_{0,6}} = \frac{\partial E_p}{\partial x_{6,p}}(1 - x_{6,p}^2).$$

We can combine the above eight equations into the following concise expression:

$$\frac{\partial E_p}{\partial W_2} =
\begin{bmatrix} x_{2,p} \\ x_{3,p} \\ x_{4,p} \\ 1 \end{bmatrix}
\begin{bmatrix} \dfrac{\partial E_p}{\partial x_{5,p}}(1 - x_{5,p}^2) & \dfrac{\partial E_p}{\partial x_{6,p}}(1 - x_{6,p}^2) \end{bmatrix}.$$

Therefore the accumulated gradient is

$$\frac{\partial E}{\partial W_2} = \sum_{p=1}^{100} \frac{\partial E_p}{\partial W_2}
= \begin{bmatrix} X_1 & \mathbf{1} \end{bmatrix}^T \left( \frac{\partial E}{\partial X_2} \circ (1 + X_2) \circ (1 - X_2) \right), \qquad (1)$$

where $\circ$ denotes element-wise multiplication.
The preceding equation corresponds to the following line of tanmlp.m:
dE_dW2 = [X1 one]'*(dE_dX2.*(1+X2).*(1-X2));
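One way to confirm Equation (1), and hence the line above, is a finite-difference check: perturb a single entry of W2, recompute the total error, and compare the numerical slope with the analytical gradient. The check below is a sketch added for illustration (the entry indices and step size are arbitrary choices); it is not part of tanmlp.m.

% Analytical gradient of E with respect to W2, as in Equation (1)
X1 = tanh([X0 one]*W1);
X2 = tanh([X1 one]*W2);
dE_dX2 = -2*(T - X2);
dE_dW2 = [X1 one]'*(dE_dX2.*(1+X2).*(1-X2));

% Numerical gradient for one entry of W2 (central difference)
i = 2; j = 1; delta = 1e-6;
W2p = W2;  W2p(i,j) = W2p(i,j) + delta;
W2m = W2;  W2m(i,j) = W2m(i,j) - delta;
Ep = sum(sum((T - tanh([X1 one]*W2p)).^2));
Em = sum(sum((T - tanh([X1 one]*W2m)).^2));
[(Ep - Em)/(2*delta)  dE_dW2(i,j)]  % the two values should agree closely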
For the derivatives of $E_p$ with respect to the hidden-node output $x_2$, we have

$$\frac{\partial E_p}{\partial x_{2,p}} = \frac{\partial E_p}{\partial x_{5,p}}(1 - x_{5,p}^2)\,w_{2,5} + \frac{\partial E_p}{\partial x_{6,p}}(1 - x_{6,p}^2)\,w_{2,6}.$$

Similarly, we have

$$\frac{\partial E_p}{\partial x_{3,p}} = \frac{\partial E_p}{\partial x_{5,p}}(1 - x_{5,p}^2)\,w_{3,5} + \frac{\partial E_p}{\partial x_{6,p}}(1 - x_{6,p}^2)\,w_{3,6},$$

$$\frac{\partial E_p}{\partial x_{4,p}} = \frac{\partial E_p}{\partial x_{5,p}}(1 - x_{5,p}^2)\,w_{4,5} + \frac{\partial E_p}{\partial x_{6,p}}(1 - x_{6,p}^2)\,w_{4,6}.$$

The preceding three equations can be put into matrix form:

$$\begin{bmatrix} \dfrac{\partial E_p}{\partial x_{2,p}} & \dfrac{\partial E_p}{\partial x_{3,p}} & \dfrac{\partial E_p}{\partial x_{4,p}} \end{bmatrix} =
\begin{bmatrix} \dfrac{\partial E_p}{\partial x_{5,p}}(1 - x_{5,p}^2) & \dfrac{\partial E_p}{\partial x_{6,p}}(1 - x_{6,p}^2) \end{bmatrix}
\begin{bmatrix} w_{2,5} & w_{3,5} & w_{4,5} \\ w_{2,6} & w_{3,6} & w_{4,6} \end{bmatrix}.$$

Hence the accumulated derivatives of $E$ with respect to $X_1$ are

$$\frac{\partial E}{\partial X_1} = \left( \frac{\partial E}{\partial X_2} \circ (1 + X_2) \circ (1 - X_2) \right) \widetilde{W}_2^{\,T},$$

where $\widetilde{W}_2$ denotes the first three rows of $W_2$, that is, $W_2$ with its bias row removed. The preceding equation corresponds to line 62 of tanmlp.m:
dE_dX1 = dE_dX2.*(1-X2).*(1+X2)*W2(1:hidden_n,:)';
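A similar finite-difference check (again an illustration continuing the setup above, not part of tanmlp.m) can confirm the backpropagated derivatives with respect to the hidden-layer outputs: perturb one entry of X1, push it through the second layer, and compare the numerical slope of E with the corresponding entry of dE_dX1.

% Numerical check of dE_dX1 for one hidden-node output (central difference)
dE_dX1 = dE_dX2.*(1-X2).*(1+X2)*W2(1:hidden_n,:)';
p = 1; j = 2; delta = 1e-6;         % pth data pair, jth hidden node
X1p = X1;  X1p(p,j) = X1p(p,j) + delta;
X1m = X1;  X1m(p,j) = X1m(p,j) - delta;
Ep = sum(sum((T - tanh([X1p one]*W2)).^2));
Em = sum(sum((T - tanh([X1m one]*W2)).^2));
[(Ep - Em)/(2*delta)  dE_dX1(p,j)]  % the two values should agree closely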
By proceeding as we did in Equation (1), we have

$$\frac{\partial E}{\partial W_1} = \begin{bmatrix} X_0 & \mathbf{1} \end{bmatrix}^T \left( \frac{\partial E}{\partial X_1} \circ (1 + X_1) \circ (1 - X_1) \right).$$

The preceding equation corresponds to line 63 of tanmlp.m:
dE_dW1 = [X0 one]'*(dE_dX1.*(1+X1).*(1-X1));
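Putting the four derivative expressions together, a plain batch gradient-descent loop might look as follows. The learning rate and the number of epochs are assumptions for illustration; tanmlp.m itself may use a different update rule (for example, momentum or an adaptive step size), which is not covered by this derivation.

% A minimal batch gradient-descent sketch using the gradients derived above
eta = 0.01;                                            % assumed learning rate
for epoch = 1:1000
    X1 = tanh([X0 one]*W1);                            % forward pass, layer 1
    X2 = tanh([X1 one]*W2);                            % forward pass, layer 2
    dE_dX2 = -2*(T - X2);                              % derivative of E w.r.t. outputs
    dE_dW2 = [X1 one]'*(dE_dX2.*(1+X2).*(1-X2));       % gradient, layer-2 parameters
    dE_dX1 = dE_dX2.*(1-X2).*(1+X2)*W2(1:hidden_n,:)'; % backpropagate to hidden outputs
    dE_dW1 = [X0 one]'*(dE_dX1.*(1+X1).*(1-X1));       % gradient, layer-1 parameters
    W1 = W1 - eta*dE_dW1;                              % update layer-1 parameters
    W2 = W2 - eta*dE_dW2;                              % update layer-2 parameters
end
sum(sum((T - X2).^2))                                  % total squared error after training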