CIS 311: Neural Networks
Learning Tasks: The Regression Problem
1. Inductive Learning and Regression
The inductive learning problem can be formulated as a multivariate
regression problem:
Given: a number N of example instantiated vectors
$D = \{ ( x_i, y_i ) \}_{i=1}^{N}$
of several independent variables, that is, pattern vectors
$x_i = ( x_{i1}, x_{i2}, \ldots, x_{id} ) \in \mathbb{R}^d$
and corresponding values of the dependent variable $y_i \in \mathbb{R}$
Determine: a function model y = f( x ).
A common assumption is that the examples are drawn independently from a certain probability distribution. The function models are considered to be of the following kind:
$y = f( x ) + \varepsilon$
where $\varepsilon$ is zero-mean noise with constant variance $\sigma_\varepsilon$. The solution to the problem is the regression function $\tilde{f}( x )$ that maps a given x to the conditional mean E[ y | x ] with respect to the underlying probability distribution.
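As a short supporting derivation, the following sketch shows why the conditional mean is the natural target when errors are measured by squared distance. The symbol g( x ) is an assumption introduced here for illustration only: it stands for an arbitrary candidate function. The cross term drops out because E[ y - E[ y | x ] | x ] = 0.

```latex
\begin{align*}
E\big[ ( y - g(x) )^2 \mid x \big]
  &= E\big[ ( y - E[ y \mid x ] )^2 \mid x \big]
     + \big( E[ y \mid x ] - g(x) \big)^2 \\
  &\ge E\big[ ( y - E[ y \mid x ] )^2 \mid x \big]
\end{align*}
```

Equality holds exactly when g( x ) = E[ y | x ], so no other function of x achieves a smaller expected squared error.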
Since the underlying probability distribution is unknown, and the examples are often noisy in practice, the goal is reformulated: find a good approximation f( x ) of the regression of y on x by minimizing the empirical error on the examples.
We search for f by picking candidate functions from a model family chosen in advance. The objective is to select from the model family the function that has the minimum distance from the provided data examples. This distance is usually called the error function.
Traditionally the noise is assumed to follow a Gaussian distribution, and the least squares fitting criterion is used to search for the function f( x ) that minimizes the average squared residual ASR:
$\mathrm{ASR} = \frac{1}{N} \sum_{i=1}^{N} ( y_i - f( x_i ) )^2$
where $y_i$ is the true outcome of the i-th example, $f( x_i )$ is the outcome estimated from the input vector $x_i$ of the same example, and N is the sample size.
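The average squared residual can be computed directly from its definition. The following is a minimal sketch; the function name `average_squared_residual`, the toy data, and the candidate model are hypothetical and chosen only for illustration.

```python
import numpy as np

def average_squared_residual(f, X, y):
    """ASR = (1/N) * sum_i (y_i - f(x_i))^2 for a candidate model f."""
    predictions = np.array([f(x) for x in X])
    residuals = y - predictions
    return float(np.mean(residuals ** 2))

# Hypothetical toy data: N = 4 examples with d = 2 inputs each.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([1.0, 2.5, 3.0, 4.0])

# Hypothetical candidate model: f(x) = 2*x1 + 1*x2.
def candidate(x):
    return 2.0 * x[0] + 1.0 * x[1]

asr = average_squared_residual(candidate, X, y)
print(asr)
```

Comparing this value across candidate functions from the same model family is what "minimizing the empirical error" amounts to.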
2. Model Function Families
Some widely used model families are:
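linear models: $f( x ) = w_0 + \sum_{i=1}^{d} w_i x_i = w^T x$, with x extended by a constant component $x_0 = 1$
polynomial models: $f( x ) = w_0 + \sum_{i=1}^{p} w_i x^i$
models that are linear in the parameters: $f( x ) = w_0 + \sum_{i=1}^{d} w_i h_i( x )$
where $h_i()$ are prespecified, fixed functions.
multilayer neural networks: $f( x ) = \varphi( \sum_{i=1}^{d} w_i \varphi( \sum_{j=1}^{d} w_j \varphi( \ldots \varphi( \sum_{k=1}^{d} w_k x_k ) ) ) )$
where $\varphi()$ is usually the sigmoid activation function.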
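The model families above can be written as small functions. This is a sketch under stated assumptions, not a reference implementation: all function names are hypothetical, the second family is read as a polynomial in a scalar input x, and the nested model is implemented as a stack of weight matrices without bias terms, each followed by the sigmoid.

```python
import numpy as np

def linear_model(x, w0, w):
    # f(x) = w0 + sum_i w_i * x_i
    return w0 + np.dot(w, x)

def polynomial_model(x, w0, w):
    # f(x) = w0 + sum_{i=1..p} w_i * x**i, for a scalar input x
    powers = np.array([x ** i for i in range(1, len(w) + 1)])
    return w0 + np.dot(w, powers)

def basis_function_model(x, w0, w, basis):
    # f(x) = w0 + sum_i w_i * h_i(x), with fixed functions h_i
    return w0 + sum(wi * h(x) for wi, h in zip(w, basis))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layered_network(x, weight_matrices):
    # Nested sigmoid model: each layer computes sigmoid(W @ activations).
    a = np.asarray(x, dtype=float)
    for W in weight_matrices:
        a = sigmoid(W @ a)
    return a

x = np.array([1.0, 2.0])
print(linear_model(x, 0.5, np.array([1.0, -1.0])))
print(polynomial_model(2.0, 1.0, np.array([1.0, 0.5])))
print(basis_function_model(x, 0.0, [1.0, 2.0], [np.sum, np.prod]))
```

The first three models are linear in the weights, while in the layered model the weights enter through nested nonlinearities.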
3. Linear vs. Nonlinear Models
Linear regression is done by solving a set of linear equations, which is very fast and is available in various mathematical packages. Therefore, one can easily run many experiments with the data.
The least squares solution to a linear regression problem is unique, provided the input variables are not linearly dependent on each other.
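A minimal sketch of linear regression as the solution of a set of linear equations, using NumPy. The synthetic data, the weight values, and the variable names are assumptions made for illustration. The normal equations are solved directly and compared with `np.linalg.lstsq`; the direct solve assumes the augmented input matrix has full column rank.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: y = w0 + w.x + small Gaussian noise.
N, d = 50, 3
X = rng.normal(size=(N, d))
true_w0, true_w = 1.5, np.array([2.0, -1.0, 0.5])
y = true_w0 + X @ true_w + 0.1 * rng.normal(size=N)

# Augment the inputs with a column of ones so that w0 is learned too.
A = np.column_stack([np.ones(N), X])

# Normal equations: (A^T A) w = A^T y.
w_normal = np.linalg.solve(A.T @ A, A.T @ y)

# The same fit through NumPy's least squares routine.
w_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

print(w_normal)
print(w_lstsq)
```

Both routes give the same weights here, which illustrates that there is a single least squares solution in the full-rank case.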
Nonlinear regression requires slow iterative search methods to learn the model parameters. Unfortunately, there are usually many suboptimal solutions to such problems, that is, there are many local optima in the search space.
It therefore takes considerably longer to find nonlinear models, which leaves less time for conducting experiments.
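The iterative search can be illustrated with plain gradient descent on the average squared residual of a single sigmoid unit. This is a sketch under assumptions: the model f( x ) = sigmoid( w x + b ), the synthetic data, the learning rate, the step count, and all names are hypothetical. The loop starts from two different points, since the result of an iterative search can depend on where it starts.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def asr(params, x, y):
    w, b = params
    return float(np.mean((y - sigmoid(w * x + b)) ** 2))

def asr_gradient(params, x, y):
    w, b = params
    out = sigmoid(w * x + b)
    # Chain rule: d/dz (y - out)^2 = -2 (y - out) * out * (1 - out)
    delta = -2.0 * (y - out) * out * (1.0 - out)
    return np.array([np.mean(delta * x), np.mean(delta)])

def gradient_descent(start, x, y, learning_rate=0.5, steps=2000):
    params = np.array(start, dtype=float)
    for _ in range(steps):
        params -= learning_rate * asr_gradient(params, x, y)
    return params

rng = np.random.default_rng(1)
x = np.linspace(-3.0, 3.0, 40)
y = sigmoid(2.0 * x - 1.0) + 0.05 * rng.normal(size=x.size)

for start in [(0.0, 0.0), (-5.0, 5.0)]:
    fitted = gradient_descent(start, x, y)
    print(start, "->", fitted, "ASR:", asr(fitted, x, y))
```

Each fit needs thousands of gradient evaluations, whereas the linear case needs a single linear solve.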
4. Accuracy and Overfitting
Since the training data are only a subsample of all possible data that represent the unknown function, statistical and learning algorithms may overemphasize the importance of the provided training data, that is, they may overfit.
We use two notions to study overfitting:
Model bias is a measure of how well we can model the true, unknown function with our selected model family.
If the true function belongs to our model family, so that some member of the family fits it perfectly, then we say this family has zero model bias.
If the true function is not a member of the selected family, then we say that our model family is biased.
Model variance is a measure of how much our models vary when we train them with different training subsets.
If the model family is small, then there will be small differences between models trained with different training subsets, and we say that the model variance is small.
If the model family is large, then there can be large differences between models trained with different training subsets, and we say that the model variance is large.
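Both notions can be estimated by simulation: train the same model family on many different training subsets and look at how the fitted models behave on fixed test inputs. The sketch below is illustrative only. The target function sin( 2 pi x ), the noise level, the polynomial degrees 1 and 5, and all names are assumptions, with the polynomial degree standing in for the size of the model family.

```python
import numpy as np

def true_function(x):
    # Hypothetical "unknown" function used to generate the data.
    return np.sin(2.0 * np.pi * x)

def bias_and_variance(degree, n_sets=200, n_points=20, noise_std=0.2, seed=0):
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0.1, 0.9, 50)
    predictions = np.empty((n_sets, x_test.size))
    for s in range(n_sets):
        # A fresh training subset for every fit.
        x_train = rng.uniform(0.0, 1.0, size=n_points)
        y_train = true_function(x_train) + noise_std * rng.normal(size=n_points)
        coeffs = np.polyfit(x_train, y_train, degree)
        predictions[s] = np.polyval(coeffs, x_test)
    # Squared bias: gap between the average model and the true function.
    mean_prediction = predictions.mean(axis=0)
    squared_bias = float(np.mean((mean_prediction - true_function(x_test)) ** 2))
    # Variance: spread of the fitted models around their own average.
    variance = float(np.mean(predictions.var(axis=0)))
    return squared_bias, variance

bias_small, var_small = bias_and_variance(degree=1)
bias_large, var_large = bias_and_variance(degree=5)
print("degree 1: squared bias", bias_small, "variance", var_small)
print("degree 5: squared bias", bias_large, "variance", var_large)
```

In this setup the straight-line family cannot represent the sine target, so it is biased but stable across subsets, while the degree 5 family follows the target closely on average but changes more from one training subset to the next.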
Suggested Readings:
Bishop, C. (1995). Neural Networks for Pattern Recognition. Oxford University Press, Oxford, UK, pp. 1-16.
Seber, G.A.F. (1977). Linear Regression Analysis. John Wiley & Sons, New York.
Wild, C.J. and Seber, G.A.F. (1989). Nonlinear Regression. John Wiley & Sons, New York.
McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models. Chapman & Hall, London.