CIS 311: Neural Networks

Multilayer Perceptrons

1. The Multilayer Perceptron

The multilayer perceptron (MLP) is a hierarchical structure of several perceptrons, and overcomes the shortcomings of these single-layer networks.

The multilayer perceptron is an artificial neural network that learns nonlinear function mappings. It is capable of learning a rich variety of nonlinear decision surfaces.

Nonlinear functions can be represented by multilayer perceptrons with units that use nonlinear activation functions. Multiple layers of cascaded linear units still produce only linear mappings.
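The claim about cascaded linear units can be checked numerically. The following is a minimal sketch, assuming two bias-free linear layers with arbitrarily chosen weight matrices W1 and W2 (illustrative values, not taken from the lecture); the helper names matvec and matmul are likewise hypothetical.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A by matrix B (both given as lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

# Illustrative weights: a 2-input, 3-unit linear layer followed by a 2-unit linear layer.
W1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]
W2 = [[2.0, 0.0, 1.0], [-1.0, 1.0, 0.5]]
x = [0.5, -2.0]

# Passing x through both linear layers in turn ...
two_layers = matvec(W2, matvec(W1, x))
# ... gives the same result as one linear layer with weight matrix W2 * W1.
collapsed = matvec(matmul(W2, W1), x)
```

Because the two results agree for every input, the second linear layer adds no representational power; a nonlinear activation between the layers is what prevents this collapse.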

1.1 Differentiable Activation Functions

The training algorithm for multilayer networks requires differentiable, continuous nonlinear activation functions.
One such function is the sigmoid, or logistic, function:

$$ o = \sigma(s) = \frac{1}{1 + e^{-s}} $$

where $s = \sum_{i=0}^{d} w_i x_i$ is the sum of the products of the weights $w_i$ and the inputs $x_i$.

The sigmoid is sometimes called a squashing function, since it maps a very large input domain to a small range of outputs, the interval (0, 1).

Another nonlinear function often used in practice is the hyperbolic tangent:

$$ o = \tanh(s) = \frac{e^{s} - e^{-s}}{e^{s} + e^{-s}} $$

The hyperbolic tangent is sometimes preferred because it makes training a little easier; its outputs lie in (-1, 1) and are centered at zero.
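Both activation functions and their derivatives are easy to compute, which is what makes them convenient for training. The sketch below is one possible implementation using only Python's standard library; the function names and the two-branch form of the sigmoid (chosen to avoid overflow for large negative inputs) are illustrative choices, not part of the lecture.

```python
import math

def sigmoid(s):
    """Logistic function 1 / (1 + e^-s), written to avoid overflow in exp."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)  # safe: s < 0, so exp(s) is at most 1
    return e / (1.0 + e)

def sigmoid_prime(s):
    """Derivative of the sigmoid, expressed through its own output."""
    o = sigmoid(s)
    return o * (1.0 - o)

def tanh_prime(s):
    """Derivative of the hyperbolic tangent: 1 - tanh(s)^2."""
    return 1.0 - math.tanh(s) ** 2

# The two functions are closely related: tanh(s) = 2 * sigmoid(2s) - 1.
s = 0.7
gap = abs(math.tanh(s) - (2.0 * sigmoid(2.0 * s) - 1.0))
```

Both derivatives can be written in terms of the unit's output alone, so a unit's slope is available from a value the forward pass has already computed.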

1.2 Multilayer Network Structure

A neural network with one or more layers of nodes between the input and the output nodes is called a multilayer network.

The multilayer network structure, or architecture, or topology, consists of an input layer, one or more hidden layers, and one output layer. The input nodes pass values to the first hidden layer, whose nodes pass values to the second, and so on until the output layer produces the outputs.

A network with a layer of input units, a layer of hidden units and a layer of output units is a two-layer network. A network with two layers of hidden units is a three-layer network, and so on. A justification for this is that the layer of input units is used only as an input channel and can therefore be discounted.

Figure 1. Multilayer Perceptron (MLP)

The network in Figure 1 is a two-layer neural network that implements the function:

$$ f(x) = \sigma\left( \sum_{j=1}^{J} w_{jk} \, \sigma\left( \sum_{i=1}^{I} w_{ij} x_i + w_{0j} \right) + w_{0k} \right) $$

where: $x$ is the input vector,

$w_{0j}$ and $w_{0k}$ are the thresholds,

$w_{ij}$ are the weights connecting the input with the hidden nodes,

$w_{jk}$ are the weights connecting the hidden with the output nodes,

$\sigma$ is the sigmoid activation function.
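The function computed by the two-layer network can be written out directly as code. This is a minimal sketch, assuming a single output node and weights stored as nested Python lists; the function name forward and the data layout are illustrative assumptions, while the parameter names follow the symbols defined above.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def forward(x, w_ij, w_0j, w_jk, w_0k):
    """Output of a two-layer network with one output node.

    w_ij[i][j]: weight from input i to hidden node j
    w_0j[j]:    threshold of hidden node j
    w_jk[j]:    weight from hidden node j to the output node
    w_0k:       threshold of the output node
    """
    # Inner sum and sigmoid: the activation of each hidden node.
    hidden = [
        sigmoid(sum(w_ij[i][j] * x[i] for i in range(len(x))) + w_0j[j])
        for j in range(len(w_0j))
    ]
    # Outer sum and sigmoid: the network output.
    return sigmoid(sum(w_jk[j] * hidden[j] for j in range(len(hidden))) + w_0k)

# Illustrative call: 2 inputs, 2 hidden nodes, all input-to-hidden weights zero.
y = forward([1.0, 2.0], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [1.0, 1.0], -1.0)
```

With all input-to-hidden weights at zero, each hidden node outputs 0.5, so the output node sees 0.5 + 0.5 - 1 = 0 and returns 0.5.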

It is the hidden units that enable the multilayer network to learn complex tasks by extracting progressively more meaningful information from the input examples.

The MLP has a highly connected topology: every input is connected to all nodes in the first hidden layer, every unit in a hidden layer is connected to all nodes in the next layer, and so on.

The input signals, which initially are the input examples, propagate through the neural network in a forward direction, layer by layer; this is why such networks are often called feedforward multilayer networks.

Two kinds of signals pass through these networks:
- function signals: the input examples propagated through the hidden units and processed by their activation functions emerge as outputs;
- error signals: the errors at the output nodes are propagated backward layer by layer through the network so that each node returns its error back to the nodes in the previous hidden layer.
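The two kinds of signals can be illustrated on a small network. The sketch below assumes sigmoid units, a single output node and a squared-error measure, none of which this section fixes; the function name forward_backward and the delta variables are hypothetical, and the code only computes the signals, it does not update any weights.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def forward_backward(x, target, w_ij, w_0j, w_jk, w_0k):
    """Return the output and the error signals of a two-layer network."""
    # Function signals: the input example flows forward through the hidden layer.
    h = [
        sigmoid(sum(w_ij[i][j] * x[i] for i in range(len(x))) + w_0j[j])
        for j in range(len(w_0j))
    ]
    o = sigmoid(sum(w_jk[j] * h[j] for j in range(len(h))) + w_0k)

    # Error signal at the output node (assumed squared error):
    # the output error scaled by the sigmoid slope o * (1 - o).
    delta_k = (target - o) * o * (1.0 - o)
    # Error signals at the hidden nodes: the output error signal sent back
    # along each weight w_jk and scaled by the hidden unit's own slope.
    delta_j = [h[j] * (1.0 - h[j]) * w_jk[j] * delta_k for j in range(len(h))]
    return o, delta_k, delta_j

# Illustrative call: zero input-to-hidden weights, target output 1.
o, delta_k, delta_j = forward_backward(
    [1.0, 2.0], 1.0, [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [1.0, -1.0], 0.0
)
```

In this example the two hidden nodes receive error signals of equal size and opposite sign, because their weights to the output node are +1 and -1.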

1.3 Representation Power of MLP

Several properties concerning the representational power of the feedforward MLP have been proven:

- learning arbitrary functions: any function can be learned with an arbitrary accuracy by a three-layer network;

- learning continuous functions: every bounded continuous function can be learned with a small error by a two-layer network (the number of hidden units depends on the function to be approximated);

- learning boolean functions: every boolean function can be learned exactly by a two-layer network, although the number of hidden units grows exponentially with the input dimension.
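As a small illustration of the boolean case, the XOR function, which a single perceptron cannot represent, can be computed by a two-layer network with two hidden units. The weights below are picked by hand rather than learned, and are an assumption made for this sketch; since sigmoid outputs only approach 0 and 1, "exactly" is taken here to mean after rounding the output at 0.5.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# Hand-picked (not learned) weights; each tuple is (weight 1, weight 2, threshold).
HIDDEN = [
    (20.0, 20.0, -10.0),   # behaves like OR of the two inputs
    (-20.0, -20.0, 30.0),  # behaves like NAND of the two inputs
]
OUTPUT = (20.0, 20.0, -30.0)  # behaves like AND of the two hidden units

def xor_net(x1, x2):
    """Two-layer sigmoid network whose rounded output is x1 XOR x2."""
    h = [sigmoid(w1 * x1 + w2 * x2 + b) for (w1, w2, b) in HIDDEN]
    w1, w2, b = OUTPUT
    return sigmoid(w1 * h[0] + w2 * h[1] + b)

# Evaluate the network on all four input combinations.
truth_table = [(a, b, round(xor_net(a, b))) for a in (0, 1) for b in (0, 1)]
```

The rounded outputs reproduce the XOR truth table: 0 for equal inputs and 1 for differing inputs.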