CIS 311: Neural Networks

Sigmoidal Perceptrons

1. Sigmoidal Perceptron

Simple single-layer Perceptrons with threshold or linear activation functions do not generalize to more powerful learning mechanisms such as multilayer neural networks: a threshold function is not differentiable, and a combination of linear units is still linear. For this reason, single-layer Perceptrons with sigmoidal activation functions were developed. The sigmoidal Perceptron produces the output:

$o = \sigma(s) = 1 / (1 + e^{-s})$, where: $s = \sum_{i=0}^{d} w_i x_i$
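
The output formula above can be sketched in a few lines of Python. This is only an illustration: the function names `sigmoid` and `perceptron_output` are hypothetical, and the sketch assumes that the input x0 paired with w0 is the constant 1.

```python
import math

def sigmoid(s):
    """Logistic function 1 / (1 + e^(-s))."""
    return 1.0 / (1.0 + math.exp(-s))

def perceptron_output(weights, inputs):
    """Output of a sigmoidal Perceptron.

    weights = [w0, w1, ..., wd]; inputs = [x1, ..., xd].
    Assumption of this sketch: x0 = 1 is prepended for the weight w0.
    """
    x = [1.0] + list(inputs)
    s = sum(w * xi for w, xi in zip(weights, x))  # weighted sum s
    return sigmoid(s)
```

With all weights equal to zero the weighted sum is 0, so the output is 0.5 for any input.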

2. Training Sigmoidal Perceptrons

The Gradient descent rule for training sigmoidal Perceptrons is again:

$w_i = w_i - \eta \, \partial E / \partial w_i$

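The difference is in the error derivative $\partial E / \partial w_i$, which, due to the use of the sigmoidal function $\sigma(s)$, becomes:

$\partial E / \partial w_i = \partial \left( \frac{1}{2} \sum_e (y_e - o_e)^2 \right) / \partial w_i$

$= \frac{1}{2} \sum_e \partial (y_e - o_e)^2 / \partial w_i$

$= \frac{1}{2} \sum_e 2 (y_e - o_e) \, \partial (y_e - o_e) / \partial w_i$

$= \sum_e (y_e - o_e) \, \partial (y_e - \sigma(s)) / \partial w_i$

$= \sum_e (y_e - o_e) \, \sigma'(s) \, (-x_{ie})$

where $x_{ie}$ denotes the i-th component of example e.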

The gradient descent rule for training sigmoidal Perceptrons is therefore:

$w_i = w_i + \eta \sum_e (y_e - o_e) \, \sigma'(s) \, x_{ie}$

where: $\sigma'(s) = \sigma(s)(1 - \sigma(s))$.
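
As a sanity check on the derivation, the analytic derivative can be compared against a finite-difference estimate of the error. The following is a minimal sketch under stated assumptions: the helper names (`error`, `gradient`, `numeric_gradient`) and the step size `h` are hypothetical, and each input vector is assumed to already contain the constant component x0 = 1.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def sigmoid_prime(s):
    """Closed form: sigma'(s) = sigma(s) * (1 - sigma(s))."""
    o = sigmoid(s)
    return o * (1.0 - o)

def error(weights, examples):
    """E = 1/2 * sum over examples of (y_e - o_e)^2."""
    total = 0.0
    for x, y in examples:
        s = sum(w * xi for w, xi in zip(weights, x))
        total += 0.5 * (y - sigmoid(s)) ** 2
    return total

def gradient(weights, examples):
    """Analytic dE/dw_i = -sum_e (y_e - o_e) * sigma'(s) * x_ie."""
    grad = [0.0] * len(weights)
    for x, y in examples:
        s = sum(w * xi for w, xi in zip(weights, x))
        factor = (y - sigmoid(s)) * sigmoid_prime(s)
        for i, xi in enumerate(x):
            grad[i] -= factor * xi
    return grad

def numeric_gradient(weights, examples, h=1e-6):
    """Central finite differences, used only as an independent check."""
    grad = []
    for i in range(len(weights)):
        up = list(weights)
        up[i] += h
        down = list(weights)
        down[i] -= h
        grad.append((error(up, examples) - error(down, examples)) / (2 * h))
    return grad
```

If the derivation is right, `gradient` and `numeric_gradient` should agree to several decimal places for any weights and examples.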

Gradient Descent Learning Algorithm for Sigmoidal Perceptrons

Initialization: Examples $\{(x_e, y_e)\}_{e=1}^{N}$, initial weights $w_i$ set to small random values, learning rate parameter $\eta = 0.1$

Repeat

set each accumulated correction $\Delta w_i$ to zero

for each training example $(x_e, y_e)$

- calculate the output: $o_e = \sigma(s) = 1 / (1 + e^{-s})$, where: $s = \sum_{i=0}^{d} w_i x_{ie}$

- if the Perceptron does not respond correctly, compute the weight corrections:

$\Delta w_i = \Delta w_i + \eta (y_e - o_e) \, \sigma(s)(1 - \sigma(s)) \, x_{ie}$

update the weights with the accumulated corrections from all examples

$w_i = w_i + \Delta w_i$ // Gradient Descent Rule

until the termination condition is satisfied.
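
The batch algorithm above might be sketched in Python as follows. Several choices here are assumptions, not part of the algorithm as stated: the names `train_batch`, `init`, `epochs` and `seed` are hypothetical, the termination condition is simply a fixed number of passes, the initial weights are drawn from [-0.05, 0.05], and a correction is accumulated for every example (it is small when the output is already close to the target).

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_batch(examples, n_inputs, eta=0.1, epochs=1000, init=None, seed=0):
    """Batch gradient descent for a single sigmoidal Perceptron.

    examples: list of (inputs, target) pairs, inputs = [x1, ..., xd].
    Returns the weights [w0, w1, ..., wd].
    """
    rng = random.Random(seed)
    if init is not None:
        w = list(init)
    else:
        w = [rng.uniform(-0.05, 0.05) for _ in range(n_inputs + 1)]
    for _ in range(epochs):               # assumed termination: fixed passes
        delta = [0.0] * len(w)            # reset accumulated corrections
        for inputs, y in examples:
            x = [1.0] + list(inputs)      # x0 = 1 for the weight w0
            s = sum(wi * xi for wi, xi in zip(w, x))
            o = sigmoid(s)
            for i, xi in enumerate(x):
                delta[i] += eta * (y - o) * o * (1.0 - o) * xi
        # one update per pass, using the corrections from all examples
        w = [wi + di for wi, di in zip(w, delta)]
    return w
```

The weights change only once per pass over the data, so every example in a pass is evaluated with the same weights.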

Example: Suppose a Perceptron accepts two inputs $x_1$ and $x_2$, with weights $w_1 = 0.5$, $w_2 = 0.3$ and $w_0 = -1$ (the weight of the constant input $x_0 = 1$).

Let the following example be given: $x_1 = 2$, $x_2 = 1$, $y = 0$
The output of the Perceptron is:

$o = \sigma(-1 + 2 \cdot 0.5 + 1 \cdot 0.3) = \sigma(0.3) = 0.5744$

The weight corrections according to the gradient descent algorithm will be as follows (the factor $\eta$ is omitted in this example, which is equivalent to taking $\eta = 1$):

$\Delta w_0 = (0 - 0.5744) \cdot 0.5744 \cdot (1 - 0.5744) \cdot 1 = -0.1404$

$\Delta w_1 = (0 - 0.5744) \cdot 0.5744 \cdot (1 - 0.5744) \cdot 2 = -0.2808$

$\Delta w_2 = (0 - 0.5744) \cdot 0.5744 \cdot (1 - 0.5744) \cdot 1 = -0.1404$

Let another example be given: $x_1 = 1$, $x_2 = 2$, $y = 1$

The output of the Perceptron is:

$o = \sigma(-1 + 1 \cdot 0.5 + 2 \cdot 0.3) = \sigma(0.1) = 0.525$

The accumulated weight corrections according to the gradient descent algorithm will be:

$\Delta w_0 = -0.1404 + (1 - 0.525) \cdot 0.525 \cdot (1 - 0.525) \cdot 1 = -0.0219$

$\Delta w_1 = -0.2808 + (1 - 0.525) \cdot 0.525 \cdot (1 - 0.525) \cdot 1 = -0.1623$

$\Delta w_2 = -0.1404 + (1 - 0.525) \cdot 0.525 \cdot (1 - 0.525) \cdot 2 = 0.0965$

If there are no more examples in the batch, the weights will be modified as follows:

$w_0 = -1 + (-0.0219) = -1.0219$

$w_1 = 0.5 + (-0.1623) = 0.3377$

$w_2 = 0.3 + 0.0965 = 0.3965$
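
The arithmetic of this two-example batch can be recomputed with a short script such as the one below. It is a sketch that takes the learning rate as 1, since the hand computation leaves the factor out; the variable names are illustrative only. Because the hand computation rounds intermediate values to four decimals, the last printed digit may differ slightly.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

w = [-1.0, 0.5, 0.3]              # w0, w1, w2 from the example
batch = [([2.0, 1.0], 0.0),       # ((x1, x2), y) for the first example
         ([1.0, 2.0], 1.0)]       # ((x1, x2), y) for the second example
eta = 1.0                         # assumption: the example omits the factor

delta = [0.0, 0.0, 0.0]           # accumulated corrections for w0, w1, w2
for inputs, y in batch:
    x = [1.0] + inputs            # x0 = 1 pairs with w0
    o = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    for i in range(3):
        delta[i] += eta * (y - o) * o * (1.0 - o) * x[i]
    print("o =", round(o, 4), "corrections so far:", [round(d, 4) for d in delta])

# weights are updated once, after both examples have been processed
new_w = [wi + di for wi, di in zip(w, delta)]
print("updated weights:", [round(v, 4) for v in new_w])
```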

Incremental Gradient Descent Learning Algorithm for Sigmoidal Perceptrons

Initialization: Examples $\{(x_e, y_e)\}_{e=1}^{N}$, initial weights $w_i$ set to small random values, learning rate parameter $\eta = 0.1$

Repeat

for each training example $(x_e, y_e)$

- calculate the output: $o_e = \sigma(s) = 1 / (1 + e^{-s})$, where: $s = \sum_{i=0}^{d} w_i x_{ie}$

- if the Perceptron does not respond correctly, update the weights immediately:

$w_i = w_i + \eta (y_e - o_e) \, \sigma(s)(1 - \sigma(s)) \, x_{ie}$ // Incremental Gradient Descent Rule

until the termination condition is satisfied.
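
For comparison with the batch version, the incremental rule might be sketched as below. The same caveats apply: `train_incremental`, `init`, `epochs` and `seed` are hypothetical names, a fixed number of passes stands in for the termination condition, and the update is applied for every example rather than only for incorrectly handled ones.

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_incremental(examples, n_inputs, eta=0.1, epochs=1000, init=None, seed=0):
    """Incremental gradient descent for a single sigmoidal Perceptron.

    examples: list of (inputs, target) pairs, inputs = [x1, ..., xd].
    Returns the weights [w0, w1, ..., wd].
    """
    rng = random.Random(seed)
    if init is not None:
        w = list(init)
    else:
        w = [rng.uniform(-0.05, 0.05) for _ in range(n_inputs + 1)]
    for _ in range(epochs):               # assumed termination: fixed passes
        for inputs, y in examples:
            x = [1.0] + list(inputs)      # x0 = 1 for the weight w0
            o = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            step = eta * (y - o) * o * (1.0 - o)
            # weights change right after each example, not once per pass
            w = [wi + step * xi for wi, xi in zip(w, x)]
    return w
```

Unlike the batch version, each example after the first in a pass is evaluated with weights that the preceding examples have already changed.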

Suggested Readings:

Bishop, C. (1995). "Neural Networks for Pattern Recognition", Oxford University Press, Oxford, UK, pp. 98-105.

Haykin, S. (1999). "Neural Networks: A Comprehensive Foundation", Second Edition, Prentice-Hall, Inc., New Jersey.