CIS 311: Neural Networks
Sigmoidal Perceptrons
1. Sigmoidal Perceptron
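Simple single-layer Perceptrons with threshold or linear activation functions do not generalize to more powerful learning mechanisms such as multilayer neural networks. A threshold unit gives no useful gradient, because its derivative is zero everywhere except at the threshold, where it is undefined. A stack of linear units still computes only a linear function. That is why single-layer Perceptrons with sigmoidal activation functions were developed. The sigmoidal Perceptron produces the output: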
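$$o = \sigma(s) = \frac{1}{1 + e^{-s}}, \quad \text{where } s = \sum_{i=0}^{d} w_i x_i$$
Here $x_0 = 1$ is a constant input, so $w_0$ acts as the bias weight.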
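As a rough illustration, the output formula can be sketched in Python. The function names `sigmoid` and `perceptron_output` are hypothetical, and the sketch assumes each input vector carries the constant bias input $x_0 = 1$ in position 0.

```python
import math

def sigmoid(s: float) -> float:
    """Logistic sigmoid: sigma(s) = 1 / (1 + e^{-s})."""
    return 1.0 / (1.0 + math.exp(-s))

def perceptron_output(w: list[float], x: list[float]) -> float:
    """Output o = sigma(s) with s = sum_{i=0}^{d} w_i * x_i.

    Assumes x[0] == 1.0 is the constant bias input, so w[0] is the bias weight.
    """
    s = sum(wi * xi for wi, xi in zip(w, x))
    return sigmoid(s)

# Weights and first input of the worked example in these notes: (x0=1, x1=2, x2=1)
print(round(perceptron_output([-1.0, 0.5, 0.3], [1.0, 2.0, 1.0]), 4))  # 0.5744
```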
2. Training Sigmoidal Perceptrons
The gradient descent rule for training sigmoidal Perceptrons is again:
$$w_i = w_i - \eta \frac{\partial E}{\partial w_i}$$
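The difference is in the error derivative $\partial E / \partial w_i$, where $E = \frac{1}{2} \sum_e (y_e - o_e)^2$ is the squared error over the training examples. Because of the sigmoidal function $\sigma(s)$, the derivative becomes:
$$
\begin{aligned}
\frac{\partial E}{\partial w_i}
&= \frac{1}{2} \sum_e \frac{\partial (y_e - o_e)^2}{\partial w_i} \\
&= \frac{1}{2} \sum_e 2\,(y_e - o_e)\, \frac{\partial (y_e - o_e)}{\partial w_i} \\
&= \sum_e (y_e - o_e)\, \frac{\partial (y_e - \sigma(s))}{\partial w_i} \\
&= \sum_e (y_e - o_e)\, \sigma'(s)\, (-x_{ie})
\end{aligned}
$$
The last step applies the chain rule with $\partial s / \partial w_i = x_{ie}$, where $s$ is the weighted sum computed for example $e$.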
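One way to sanity-check this derivation is to compare the closed-form gradient with a central finite-difference estimate of $E$. The sketch below is a minimal Python check under stated assumptions: the helper names are hypothetical, the step size `h = 1e-6` is an arbitrary choice, and the two training examples are the ones used in the worked example of this section (with $x_0 = 1$ prepended as the bias input).

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def error(w, examples):
    """Squared error E = 1/2 * sum_e (y_e - o_e)^2."""
    total = 0.0
    for x, y in examples:
        o = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        total += (y - o) ** 2
    return 0.5 * total

def analytic_grad(w, examples):
    """Closed form: dE/dw_i = sum_e (y_e - o_e) * sigma'(s) * (-x_ie)."""
    grad = [0.0] * len(w)
    for x, y in examples:
        o = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        dsig = o * (1.0 - o)  # sigma'(s) = sigma(s) * (1 - sigma(s))
        for i, xi in enumerate(x):
            grad[i] += (y - o) * dsig * (-xi)
    return grad

def numeric_grad(w, examples, h=1e-6):
    """Central finite-difference estimate of dE/dw_i, one weight at a time."""
    grad = []
    for i in range(len(w)):
        w_plus = w[:i] + [w[i] + h] + w[i + 1:]
        w_minus = w[:i] + [w[i] - h] + w[i + 1:]
        grad.append((error(w_plus, examples) - error(w_minus, examples)) / (2 * h))
    return grad

examples = [([1.0, 2.0, 1.0], 0.0), ([1.0, 1.0, 2.0], 1.0)]  # ((x0=1, x1, x2), y)
w = [-1.0, 0.5, 0.3]
for a, n in zip(analytic_grad(w, examples), numeric_grad(w, examples)):
    print(f"analytic {a:+.6f}   numeric {n:+.6f}")
```

If the two columns agree to several decimal places, the sign and the $\sigma'(s)$ factor in the derivation are consistent with the error function.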
Substituting this derivative into the gradient descent rule gives the training rule for sigmoidal Perceptrons:
$$w_i = w_i + \eta \sum_e (y_e - o_e)\, \sigma'(s)\, x_{ie}$$
where $\sigma'(s) = \sigma(s)\,(1 - \sigma(s))$.
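The identity $\sigma'(s) = \sigma(s)(1 - \sigma(s))$ is what makes the rule cheap to evaluate: the derivative is obtained from the output that has already been computed. A small Python sketch, with hypothetical function names, compares it against a finite-difference estimate at a few arbitrarily chosen points.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def sigmoid_prime(s):
    """Closed form: sigma'(s) = sigma(s) * (1 - sigma(s))."""
    o = sigmoid(s)
    return o * (1.0 - o)

def sigmoid_prime_numeric(s, h=1e-6):
    """Central finite difference, used only to check the closed form."""
    return (sigmoid(s + h) - sigmoid(s - h)) / (2 * h)

for s in (-3.0, 0.0, 0.3, 2.5):
    print(f"s={s:+.1f}  closed form {sigmoid_prime(s):.8f}  numeric {sigmoid_prime_numeric(s):.8f}")
```

The closed form also shows that $\sigma'(s)$ never exceeds 0.25 (its value at $s = 0$), so the corrections become small when the unit saturates near 0 or 1.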
Gradient Descent Learning Algorithm for Sigmoidal Perceptrons
Initialization: examples $\{(x_e, y_e)\}_{e=1}^{N}$, initial weights $w_i$ set to small random values, learning rate parameter $\eta = 0.1$
Repeat
  set $\Delta w_i = 0$ for all $i$
  for each training example $(x_e, y_e)$
  - calculate the output: $o_e = \sigma(s) = 1 / (1 + e^{-s})$, where $s = \sum_{i=0}^{d} w_i x_{ie}$
  - if the Perceptron does not respond correctly, compute the weight corrections:
    $\Delta w_i = \Delta w_i + \eta\, (y_e - o_e)\, \sigma(s)\, (1 - \sigma(s))\, x_{ie}$
  update the weights with the corrections accumulated over all examples:
    $w_i = w_i + \Delta w_i$   // Gradient Descent Rule
until the termination condition is satisfied.
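A minimal Python sketch of the batch algorithm follows. Several details are assumptions not fixed by the notes and are labeled in the code: the termination condition is a fixed number of epochs, "responds correctly" is taken to mean $|y_e - o_e| \le$ `tol` (with the default `tol = 0.0`, every example contributes a correction), and initial weights are drawn uniformly from $[-0.05, 0.05]$. The function name `train_batch` is hypothetical.

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_batch(examples, eta=0.1, epochs=1000, tol=0.0, seed=0):
    """Batch gradient descent for a single sigmoidal Perceptron.

    examples: list of (x, y) pairs, where x[0] == 1.0 is the bias input.
    Assumptions: a fixed epoch count is the termination condition, and an
    example is treated as answered correctly when |y - o| <= tol.
    """
    rng = random.Random(seed)
    d = len(examples[0][0])
    w = [rng.uniform(-0.05, 0.05) for _ in range(d)]  # small random initial weights
    for _ in range(epochs):
        delta = [0.0] * d  # reset the accumulated corrections each pass
        for x, y in examples:
            o = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            if abs(y - o) <= tol:
                continue  # assumed "correct" response: no correction
            for i in range(d):
                delta[i] += eta * (y - o) * o * (1.0 - o) * x[i]
        w = [wi + dwi for wi, dwi in zip(w, delta)]  # one weight update per pass
    return w

# Usage: learn logical AND (x0 = 1 is the bias input)
and_examples = [([1.0, 0.0, 0.0], 0.0), ([1.0, 0.0, 1.0], 0.0),
                ([1.0, 1.0, 0.0], 0.0), ([1.0, 1.0, 1.0], 1.0)]
w = train_batch(and_examples, eta=0.5, epochs=5000)
for x, y in and_examples:
    o = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    print(x[1:], y, round(o, 3))
```

The AND data set and the larger learning rate $\eta = 0.5$ are illustrative choices that let the sketch converge quickly; they are not taken from the notes.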
Example: Consider a Perceptron with two inputs $x_1$ and $x_2$, weights $w_1 = 0.5$ and $w_2 = 0.3$, and bias weight $w_0 = -1$ (applied to the constant input $x_0 = 1$).
Suppose the following training example is given: $x_1 = 2$, $x_2 = 1$, $y = 0$.
The output of the Perceptron is:
$$o = \sigma(-1 + 2 \cdot 0.5 + 1 \cdot 0.3) = \sigma(0.3) = 0.5744$$
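The weight corrections according to the gradient descent algorithm are (to keep the arithmetic simple, the learning rate is omitted here, i.e. $\eta = 1$):
$$\Delta w_0 = (0 - 0.5744) \cdot 0.5744 \cdot (1 - 0.5744) \cdot 1 = -0.1404$$
$$\Delta w_1 = (0 - 0.5744) \cdot 0.5744 \cdot (1 - 0.5744) \cdot 2 = -0.2808$$
$$\Delta w_2 = (0 - 0.5744) \cdot 0.5744 \cdot (1 - 0.5744) \cdot 1 = -0.1404$$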
Suppose a second example is given: $x_1 = 1$, $x_2 = 2$, $y = 1$.
The output of the Perceptron is:
$$o = \sigma(-1 + 1 \cdot 0.5 + 2 \cdot 0.3) = \sigma(0.1) = 0.525$$
Adding this example's contributions to the accumulated corrections gives:
$$\Delta w_0 = -0.1404 + (1 - 0.525) \cdot 0.525 \cdot (1 - 0.525) \cdot 1 = -0.0219$$
$$\Delta w_1 = -0.2808 + (1 - 0.525) \cdot 0.525 \cdot (1 - 0.525) \cdot 1 = -0.1623$$
$$\Delta w_2 = -0.1404 + (1 - 0.525) \cdot 0.525 \cdot (1 - 0.525) \cdot 2 = 0.0966$$
Since there are no more examples in the batch, the weights are modified as follows:
$$w_0 = -1 + (-0.0219) = -1.0219$$
$$w_1 = 0.5 + (-0.1623) = 0.3377$$
$$w_2 = 0.3 + 0.0966 = 0.3966$$
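To double-check the arithmetic, the same batch pass can be recomputed in Python without rounding intermediate results. This is a sketch: the helper name `batch_corrections` is hypothetical, and `eta=1.0` mirrors the hand calculation, which omits the learning rate. Because the hand calculation rounds intermediate values to four decimals, the printed results may differ from it in the last decimal place.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def batch_corrections(w, examples, eta=1.0):
    """Accumulate Delta w_i = sum_e eta * (y_e - o_e) * o_e * (1 - o_e) * x_ie."""
    delta = [0.0] * len(w)
    for x, y in examples:
        o = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for i, xi in enumerate(x):
            delta[i] += eta * (y - o) * o * (1.0 - o) * xi
    return delta

w = [-1.0, 0.5, 0.3]                 # w0 (bias), w1, w2
examples = [([1.0, 2.0, 1.0], 0.0),  # x0 = 1, x1 = 2, x2 = 1, y = 0
            ([1.0, 1.0, 2.0], 1.0)]  # x0 = 1, x1 = 1, x2 = 2, y = 1
delta = batch_corrections(w, examples)  # eta = 1, as in the hand calculation
w_new = [wi + dwi for wi, dwi in zip(w, delta)]
print("corrections:", [f"{v:.4f}" for v in delta])
print("new weights:", [f"{v:.4f}" for v in w_new])
```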
Incremental Gradient Descent Learning Algorithm for Sigmoidal Perceptrons
Initialization: examples $\{(x_e, y_e)\}_{e=1}^{N}$, initial weights $w_i$ set to small random values, learning rate parameter $\eta = 0.1$
Repeat
  for each training example $(x_e, y_e)$
  - calculate the output: $o_e = \sigma(s) = 1 / (1 + e^{-s})$, where $s = \sum_{i=0}^{d} w_i x_{ie}$
  - if the Perceptron does not respond correctly, update the weights:
    $w_i = w_i + \eta\, (y_e - o_e)\, \sigma(s)\, (1 - \sigma(s))\, x_{ie}$   // Incremental Gradient Descent Rule
until the termination condition is satisfied.
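For comparison, here is a Python sketch of the incremental variant. It differs from the batch algorithm above only in applying each correction immediately instead of accumulating it over the pass. As before, the name `train_incremental`, the fixed epoch count, the tolerance `tol` standing in for "responds correctly", and the initialization range $[-0.05, 0.05]$ are assumptions made for illustration.

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_incremental(examples, eta=0.1, epochs=1000, tol=0.0, seed=0):
    """Incremental (per-example) gradient descent for a sigmoidal Perceptron.

    examples: list of (x, y) pairs, where x[0] == 1.0 is the bias input.
    Assumptions: a fixed epoch count is the termination condition, and an
    example is treated as answered correctly when |y - o| <= tol.
    """
    rng = random.Random(seed)
    d = len(examples[0][0])
    w = [rng.uniform(-0.05, 0.05) for _ in range(d)]  # small random initial weights
    for _ in range(epochs):
        for x, y in examples:
            o = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            if abs(y - o) <= tol:
                continue  # assumed "correct" response: leave weights unchanged
            # Update immediately, so the next example already sees the new weights
            w = [wi + eta * (y - o) * o * (1.0 - o) * xi for wi, xi in zip(w, x)]
    return w

# Usage: learn logical AND (x0 = 1 is the bias input)
and_examples = [([1.0, 0.0, 0.0], 0.0), ([1.0, 0.0, 1.0], 0.0),
                ([1.0, 1.0, 0.0], 0.0), ([1.0, 1.0, 1.0], 1.0)]
w = train_incremental(and_examples, eta=0.5, epochs=5000)
print([round(v, 3) for v in w])
```

With incremental updates the result depends on the order in which examples are presented, whereas the batch rule evaluates every example in a pass with the same weights.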
Suggested Readings:
Bishop, C. (1995). "Neural Networks for Pattern Recognition". Oxford University Press, Oxford, UK, pp. 98-105.
Haykin, S. (1999). "Neural Networks: A Comprehensive Foundation", 2nd edition. Prentice-Hall, New Jersey.