CIS 311: Neural Networks

Radial - Basis Function Networks

1. The Interpolation Problem

The multilayer perceptron(MLP) address the problem of stochastic approximation.

The radial-basis function networks (RBF) address the problem of curve-fitting, that is approximation in high-dimensional spaces. Learning in this case is equivalent to finding an interpolating surface in the multidimensional space that provides a best fit to the training data, measured by preselected statistical criteria.

The radial-basis function networks are developed for addressing the interpolation problem.

The curve-fitting or interpolation problem can be stated as follows:

Given a set of examples {( xe, ye )}e=1N find a function F: RN® R1that satisfies the interpolation condition: F( xe ) = ye for each e=1,2,...,N.

2. Radial-Basis Functions

The radial-basis functions technique suggests constructing of interpolation functions F of the following form:

F( x ) = Si=1N wi j( || x - xi || )

where: j( || x - xi || ) is a set of nonlinear radial-basis functions, xi are the centers of these functions, and ||.|| is the Euclidean norm.

The unknown weights can be found by solving the following linear matrix equation:

w = ( FTF )-1 FT y

where: w is the weight vector, y is the response vector, and F is the design matrix

F = { jei | jei = j( || xe - xi || ), ( e,i )=1,2,...,N }

Several widely used basis functions are:

- multiquadratics: j( x ) = ( x2 + c2 )1/2 for some c>0

- inverse multiquadratics: j( x ) = 1 / ( x2 + c2 )1/2 for some c>0

- Gaussian: j( x ) = exp( - x2 / 2s 2 ) for some s >0

The multivariate Gaussian function provides also two important properties that make it a proper choice for building radial-basis functions: it is both translation and rotation invariant. The multivariate Gaussian function with these properties is called also Green's function:

G( x, xi ) = exp( - || x - xi ||2 / 2si2 )

3. Regularization Networks

The regularization network models the interpolation function F as a linear superposition (linear weighted sum ) of multivariate Gaussian functions which number is equal to the number of the given input examples N:

F( x ) = Si=1N wi G( x, xi ) or F( x ) = Si=1N wi exp( - || x - xi ||2 / 2si2 )

where: wi are the weights.

The regularization network guarantees three important characteristics:
- it is a universal approximator in that it can approximate arbitrarily well any multivariate continuous function on a compact set, given a sufficiently large number of units;
- it has the best approximation property in that given an unknown nonlinear function there always exists a choice of coefficients that approximates the function better than all other choices;
- it produces optimal solutions that minimize the functional which measures how much the solution deviates from its true value as represented by the training examples.

4. Generalized RBF Networks

The one-to-one correspondence between the number of Green functions and the number of the tarining examples often become computationally inefficient in practice, in the sense that it may require combining a very large number of basis functions (since often in practice a large number of training examples are provided).

In real-world practical situations finding of the linear basis function weights needs to invert very large N ´ N matrices which may be ill-conditioned. In order to avoid such problems the network topology should be reduced.

Reducing of the network complexity means that we will attempt to find a solution that approximates the solution produced by the regularization network.

4.1. RBF Network Structure

An RBF network architecture consists of three layers:
- input layer: which passes the example vectors to the next layers
- hidden layer: applies a non-linear transformation function to the inputs and expandsthem in the usually very high-dimensional hidden space
- output layer: applies a linear transformation to the given vector

Note, however, that sometimes RBF is called one-layer network as it has only one hidden layer!

4.2. Function Mapping

The radial-basis function network models the interpolation function F as a linear superposition of a finite number of multivariate Gaussians as follows:

F( x ) = Si=1n wi exp( - || x - xi ||2 / 2si2 )

where: wi are the weights, and the number of basis functions n is less than the number of points in the training set n < N. The basis function centers xi are determined in advance.

4.3. RBF Learning

Learning of the network weights wi involves minimizing of the cost functional:

C = Se=1N ( ye - F( xe ))2 + l Sj=1n wj2

where: l is a regularization parameter.

The weight vector solution is given by the normal equation:

w = ( FT F + lI )-1 FT y

where: I is n ´ n identity matrix which elements are all zero except these on the diagonal

y is the output vector given with the examples

F is the n ´ n design matrix of basis functions j = exp( - || x - xi ||2 / 2si2 )


F= j11 j12 ... j1n
       j21 j22 ... j2n
       ... 
       jN1 jN2 ... jNn

RBF Training Algorithm

Initialization: Examples {( xe, ye )}e=1N

Determine:

- the network structure with a number n of basis functions ji, i=1,2,...,n

- the basis function centers xi, i=1,2,...,n ( using for example the k-means clustering algorithm )

- the basis function variances si2, i=1,2,...,n ( by setting si2 equal to the average Euclidean distance from xi to its 5 nearest neighbors )

Training:

- calculate the outputs from each eth example with the Gaussian basis functions:

jei = exp( - || xe - xi ||2 / 2si2 ) // at each ith hidden unit

where i=1,2,...,n; e=1,2,...,N

- compute the correlation matrix: FT F, and perform the summation: FT F + lI

- invert the matrix: ( FT F + lI )-1

- compute the vector: FT y

- estimate and the weights: w = ( FT F + lI )-1 FT y

The radial-basis function centers xi and variances s i2 may be trained along with the output weights but this takes a long time. That is why, they are often determined in advance.

Suggested Readings:

Bishop,C. (1995). Neural Networks for Pattern Recognition, Oxford University Press, Oxford, UK, pp.164-190 (Chapter 5).

Haykin, Simon. (1999). Neural Networks. A Comprehensive Foundation., Second Edition, Prentice-Hall, Inc., New Jersey, Section 5.