CIS 311: Neural Networks
Radial - Basis Function Networks
1. The Interpolation Problem
The multilayer perceptron(MLP) address the problem of stochastic approximation.
The radial-basis function networks (RBF) address the problem of curve-fitting, that is approximation in high-dimensional spaces. Learning in this case is equivalent to finding an interpolating surface in the multidimensional space that provides a best fit to the training data, measured by preselected statistical criteria.
The radial-basis function networks are developed for addressing the interpolation problem.
The curve-fitting or interpolation problem can be stated as follows:
Given a set of examples {( xe, ye )}e=1N find a function F: RN® R1that satisfies the interpolation condition: F( xe ) = ye for each e=1,2,...,N.
2. Radial-Basis Functions
The radial-basis functions technique suggests constructing of interpolation functions F of the following form:
F( x ) = Si=1N wi j( || x - xi || )
where: j( || x - xi || ) is a set of nonlinear radial-basis functions, xi are the centers of these functions, and ||.|| is the Euclidean norm.
The unknown weights can be found by solving the following linear matrix equation:
w = ( FTF )-1 FT y
where: w is the weight vector, y is the response vector, and F is the design matrix
F = { jei | jei = j( || xe - xi || ), ( e,i )=1,2,...,N }
Several widely used basis functions are:
- multiquadratics: j( x ) = ( x2 + c2 )1/2 for some c>0
- inverse multiquadratics: j( x ) = 1 / ( x2 + c2 )1/2 for some c>0
- Gaussian: j( x ) = exp( - x2 / 2s 2 ) for some s >0
The multivariate Gaussian function provides also two important properties that make it a proper choice for building radial-basis functions: it is both translation and rotation invariant. The multivariate Gaussian function with these properties is called also Green's function:
G( x, xi ) = exp( - || x - xi ||2 / 2si2 )
3. Regularization Networks
The regularization network models the interpolation function F as a linear superposition (linear weighted sum ) of multivariate Gaussian functions which number is equal to the number of the given input examples N:
F( x ) = Si=1N wi G( x, xi ) or F( x ) = Si=1N wi exp( - || x - xi ||2 / 2si2 )
where: wi are the weights.
The regularization network guarantees three important characteristics:
- it is a universal approximator in that it can approximate arbitrarily well any multivariate continuous function on a compact set, given a sufficiently large number of units;
- it has the best approximation property in that given an unknown nonlinear function there always exists a choice of coefficients that approximates the function better than all other choices;
- it produces optimal solutions that minimize the functional which measures how much the solution deviates from its true value as represented by the training examples.
4. Generalized RBF Networks
The one-to-one correspondence between the number of Green functions and the number of the tarining examples often become computationally inefficient in practice, in the sense that it may require combining a very large number of basis functions (since often in practice a large number of training examples are provided).
In real-world practical situations finding of the linear basis function weights needs to invert very large N ´ N matrices which may be ill-conditioned. In order to avoid such problems the network topology should be reduced.
Reducing of the network complexity means that we will attempt to find a solution that approximates the solution produced by the regularization network.
4.1. RBF Network Structure
An RBF network architecture consists of three layers:
- input layer: which passes the example vectors to the next layers
- hidden layer: applies a non-linear transformation function to the inputs and expandsthem in the usually very high-dimensional hidden space
- output layer: applies a linear transformation to the given vector
Note, however, that sometimes RBF is called one-layer network as it has only one hidden layer!
4.2. Function Mapping
The radial-basis function network models the interpolation function F as a linear superposition of a finite number of multivariate Gaussians as follows:
F( x ) = Si=1n wi exp( - || x - xi ||2 / 2si2 )
where: wi are the weights, and the number of basis functions n is less than the number of points in the training set n < N. The basis function centers xi are determined in advance.
4.3. RBF Learning
Learning of the network weights wi involves minimizing of the cost functional:
C = Se=1N ( ye - F( xe ))2 + l Sj=1n wj2
where: l is a regularization parameter.
The weight vector solution is given by the normal equation:
w = ( FT F + lI )-1 FT y
where: I is n ´ n identity matrix which elements are all zero except these on the diagonal
y is the output vector given with the examples
F is the n ´ n design matrix of basis functions j = exp( - || x - xi ||2 / 2si2 )
F= j11 j12 ... j1n
j21 j22 ... j2n
...
jN1 jN2 ... jNn
RBF Training Algorithm
Initialization: Examples {( xe, ye )}e=1N
Determine:
- the network structure with a number n of basis functions ji, i=1,2,...,n
- the basis function centers xi, i=1,2,...,n ( using for example the k-means clustering algorithm )
- the basis function variances si2, i=1,2,...,n ( by setting si2 equal to the average Euclidean distance from xi to its 5 nearest neighbors )
Training:
- calculate the outputs from each eth example with the Gaussian basis functions:
jei = exp( - || xe - xi ||2 / 2si2 ) // at each ith hidden unit
where i=1,2,...,n; e=1,2,...,N
- compute the correlation matrix: FT F, and perform the summation: FT F + lI
- invert the matrix: ( FT F + lI )-1
- compute the vector: FT y
- estimate and the weights: w = ( FT F + lI )-1 FT y
The radial-basis function centers xi and variances s i2 may be trained along with the output weights but this takes a long time. That is why, they are often determined in advance.
Suggested Readings:
Bishop,C. (1995). Neural Networks for Pattern Recognition, Oxford University Press, Oxford, UK, pp.164-190 (Chapter 5).
Haykin, Simon. (1999). Neural Networks. A Comprehensive Foundation., Second Edition, Prentice-Hall, Inc., New Jersey, Section 5.