311: Neural Networks

On-line vs. Batch Learning Algorithms
for Single-layer Perceptrons

1. The Delta Rule

The gradient descent rule faces two difficulties in practice:
- it converges very slowly
- if there are multiple local minima in the error surface then there is no guarantee that it will find the global minimum

That is why, a stochastic version called incremental gradient descent rule is developed to overcome these difficulties.

Whereas the gradient descent rule updates the weights after calculating the whole error accumulated from all examples, the incremental version approximates the gradient descent error decrease by updating the weights after each training example.

Incremental gradient descent is implemented according to the Delta rule:

wi = wi + h ( ye - oe ) xie where o = Si=0d wi xie

The Delta rule is also called LMS (least-mean-square) rule or Widrow-Hoff rule.

Incremental Gradient Descent Learning Algorithm

Initialization: Examples {( xe, ye )}e=1N, initial weights wi set to small random values, learning rate parameter h = 0.1

Repeat

for each training example ( xe, ye )

- calculate the output: o = S i=0d wi xie

- if the Perceptron does not respond correctly update the weights:

wi = wi + h ( ye - oe ) xie // this is the Delta Rule

until termination condition is satisfied.

Example: Suppose an example of perceptron which accepts two inputs x1 = 2 and x2 = 1, with weights w1 = 0.5 and w2 = 0.3 and w0 = -1.

The output of the perceptron is :

O = 2 * 0.5 + 1 * 0.3 - 1 = 0.3

If the given correct output is zero, that is ye=0, the weights will be adjusted according to the incremental gradient descent as follows:

w1 = 0.5 + ( 0 - 0.3 ) * 2 = - 0.1

w2 = 0.3 + ( 0 - 0.3 ) * 1 = 0

w0 = - 1 + ( 0 - 0.3 ) * 1 = - 1.3

Suggested Readings:

Bishop,C. (1995) "Neural Networks for Pattern Recognition",
Oxford University Press, Oxford, UK, pp.98-105.

Haykin, Simon. (1999). "Neural Networks. A Comprehensive Foundation",
Second Edition, Prentice-Hall, Inc., New Jersey, 1999.