
Kernel Machines: SVM and RVM
Our research in machine learning develops kernel methods, such as support vector machines (SVM) [Vapnik, 1995]
and relevance vector machines (RVM) [Tipping, 2001], which are contemporary alternatives to the traditional
nearest neighbours, decision trees [Quinlan, 1994], random forests and ensemble techniques (bagging and boosting)
[Friedman, Tibshirani and Hastie, 2009] for classification and regression tasks. Kernel machines provide
effective mechanisms for learning models that are less prone to overfitting, built as superpositions of basis
functions centred on a subset of the data. They find well-performing models by transforming the inputs into a
higher-dimensional feature space in which the data become more amenable to learning.
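The feature-space idea can be illustrated with a dual-form kernel perceptron (a textbook device, not our published method): the learned model is a superposition of RBF kernels centred on training points, and it separates the XOR data, which no linear model in the input space can. All names and settings below are illustrative.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Gaussian (RBF) kernel: an implicit map into a high-dimensional feature space
    return np.exp(-gamma * np.sum((a - b) ** 2))

def kernel_perceptron(X, y, kernel, epochs=20):
    # Dual perceptron: the decision function is a superposition of kernels
    # centred on the training points, weighted by the dual coefficients alpha.
    alpha = np.zeros(len(X))
    for _ in range(epochs):
        for i in range(len(X)):
            s = sum(alpha[j] * y[j] * kernel(X[j], X[i]) for j in range(len(X)))
            pred = 1.0 if s >= 0 else -1.0
            if pred != y[i]:
                alpha[i] += 1.0
    return alpha

# XOR: not linearly separable in input space, separable in the RBF feature space
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([-1.0, 1.0, 1.0, -1.0])
alpha = kernel_perceptron(X, y, rbf_kernel)

def predict(x):
    s = sum(alpha[j] * y[j] * rbf_kernel(X[j], x) for j in range(len(X)))
    return 1.0 if s >= 0 else -1.0
```

The same dual form underlies the SVM and the RVM; they differ in how the coefficients are determined (margin maximisation versus Bayesian inference).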
We specialise in relevance vector machines (RVM) that properly handle the uncertainty of the
parameters and the hyperparameters in a unified probabilistic way. The RVM has the same functional form as the SVM
but has fewer parameters, infers the full predictive distribution with sparser models, and allows a more
liberal use of kernels. The RVM simultaneously performs parameter search (estimation of the weights distribution)
and model selection (via estimation of the noise distribution and the weights prior). The emphasis in our work is
on the design of sequential training algorithms that estimate the variables of interest in a recursive manner,
following both the expectation maximisation (EM) and the variational Bayes (VB) frameworks. We have implemented:
a sequential RVM (SRVM) machine operating with non-Gaussian noise (using the recursive Gauss–Newton algorithm)
[Nikolaev and de Menezes, 2008];
an online RVM for Bayesian learning from time series with incremental parameter optimisation
[Nikolaev and Tino, 2005].
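The sparse Bayesian machinery behind the RVM can be sketched in its batch form [Tipping, 2001]: alternate between computing the Gaussian posterior over the weights and re-estimating one precision hyperparameter per weight from the evidence; the precisions of irrelevant weights diverge, switching their basis functions off. This is a minimal illustration rather than our sequential algorithm, and the data set, basis width and clipping threshold are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 30
x = np.linspace(-5.0, 5.0, N)
t = np.sinc(x) + 0.05 * rng.standard_normal(N)

# RBF basis functions centred on the training inputs
Phi = np.exp(-((x[:, None] - x[None, :]) ** 2) / 2.0)

alpha = np.ones(N)   # one precision hyperparameter per weight
beta = 100.0         # noise precision (initial guess)

for _ in range(300):
    # Gaussian posterior over the weights given the current hyperparameters
    Sigma = np.linalg.inv(beta * Phi.T @ Phi + np.diag(alpha))
    mu = beta * Sigma @ Phi.T @ t
    # Evidence-based fixed-point re-estimation of the hyperparameters
    gamma = 1.0 - alpha * np.diag(Sigma)       # how well-determined each weight is
    alpha = np.minimum(gamma / mu ** 2, 1e12)  # precisions of pruned weights diverge
    beta = (N - gamma.sum()) / np.sum((t - Phi @ mu) ** 2)

relevant = alpha < 1e6  # surviving basis functions: the "relevance vectors"
```

Typically only a handful of the N basis functions survive, which is the source of the RVM's sparsity relative to the SVM.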
The sequential RVM is derived within a factorised variational approximation framework, and it can process data
containing unusual observations and outliers by assuming a non-Gaussian noise distribution. The SRVM performs
robust Bayesian inference at two levels: 1) the mean and covariance matrix of the weights distribution are
computed recursively, and 2) the mean of the noise distribution is also obtained in a recursive manner.
Experimental results show that these RVM machines produce accurate predictions on benchmark and real-world
time-series forecasting and classification problems.
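The first of these recursions can be illustrated in the Gaussian case: each new observation updates the mean and covariance of the weights distribution in closed form, and after one pass over the data the recursive posterior coincides with the batch posterior. This is a minimal sketch with Gaussian noise only, so it shows the recursion itself rather than the robust non-Gaussian treatment of the SRVM; all names and values are illustrative.

```python
import numpy as np

def sequential_update(m, P, phi, t, beta):
    # Fold one observation (phi, t) into the Gaussian posterior N(w | m, P):
    # the mean and covariance of the weights distribution are updated recursively.
    Pphi = P @ phi
    s = 1.0 / beta + phi @ Pphi   # predictive variance of the target
    k = Pphi / s                  # gain vector
    m = m + k * (t - phi @ m)     # updated posterior mean
    P = P - np.outer(k, Pphi)     # updated posterior covariance
    return m, P

rng = np.random.default_rng(1)
D, N, beta = 3, 50, 25.0
w_true = np.array([0.5, -1.0, 2.0])
Phi = rng.standard_normal((N, D))
t = Phi @ w_true + rng.standard_normal(N) / np.sqrt(beta)

m, P = np.zeros(D), 10.0 * np.eye(D)   # prior N(0, 10 I) over the weights
for n in range(N):
    m, P = sequential_update(m, P, Phi[n], t[n], beta)
```

Because each step costs O(D^2) and touches one observation, this style of update is what makes sequential training from streaming time series feasible.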


Mixture Density and Recurrent Neural Networks
Mixture density neural networks (MDNN) offer powerful mechanisms for flexibly fitting data coming from non-normal
distributions. Equipped with Bayesian training techniques [Bishop, 2006], they allow us to fit and accurately predict
real-world time series from the fields of finance, medicine and environmental studies. Our research develops
static as well as dynamic recurrent mixture density neural networks (RMDN) [Nikolaev et al., 2013a].
Recurrent neural networks (RNN) [Williams and Zipser, 1989] are nonlinear models capable of exhibiting a rich
set of dynamical behaviours. RNNs are particularly suitable for fitting temporal data, as they operate on the
current input as well as a trace of previously acquired information (due to their recurrent connections), allowing
direct processing of temporal dependencies. We have designed mixture density neural networks for environmental
modelling, and density recurrent networks for heteroscedastic time-series modelling and volatility forecasting,
which were applied to risk evaluation tasks [Nikolaev et al., 2013a]. Our approach is probabilistic and involves
recursive second-order training of recurrent networks with a regularised Bayesian Levenberg–Marquardt algorithm.
Currently we investigate deep neural networks (DNN) with restricted Boltzmann machines (RBM)
for time-series modelling.
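The density-output idea can be illustrated with the loss that mixture density networks minimise: the output layer parameterises the mixing weights, means and spreads of a Gaussian mixture, and training minimises the negative log-likelihood of the targets under that mixture [Bishop, 2006]. A minimal numpy sketch of the loss alone, with made-up outputs in place of a network (no recurrence):

```python
import numpy as np

def logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along an axis
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))).squeeze(axis)

def mdn_nll(pi_logits, mu, log_sigma, t):
    # Negative log-likelihood of targets t under a Gaussian mixture whose
    # parameters (mixing weights, means, spreads) would be network outputs.
    # Shapes: (batch, K) for the mixture parameters, (batch,) for t.
    log_pi = pi_logits - logsumexp(pi_logits, axis=1)[:, None]  # log-softmax
    z = (t[:, None] - mu) / np.exp(log_sigma)
    log_comp = -0.5 * np.log(2.0 * np.pi) - log_sigma - 0.5 * z ** 2
    return -np.mean(logsumexp(log_pi + log_comp, axis=1))
```

With K = 1 this reduces to the ordinary Gaussian negative log-likelihood; with K > 1 the network can represent multimodal and heavy-tailed predictive densities, which is what makes the approach suitable for heteroscedastic volatility modelling.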

References
Bishop, C.M. (2006). Pattern Recognition and Machine Learning, Springer-Verlag, New York.
Friedman, J.H., Tibshirani, R. and Hastie, T. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction,
Springer Series in Statistics, Springer-Verlag, New York.
Nikolaev, N. and Tino, P. (2005). Sequential Relevance Vector Machine Learning from Time Series.
In: Proc. Int. Joint Conference on Neural Networks (IJCNN), pp. 1308–1313.
Nikolaev, N. and de Menezes, L. (2008). Sequential Bayesian Kernel Modelling with Non-Gaussian Noise,
Neural Networks, 21(1):36–47.
Nikolaev, N., Tino, P. and Smirnov, E. (2013a). Time-dependent Series Variance Learning with
Recurrent Mixture Density Networks, Neurocomputing, 122:501–512.
Nikolaev, N., Boshnakov, G. and Zimmer, R. (2013b). Heavy-tail Mixture GARCH Volatility Modeling and
Value-at-Risk Estimation, Expert Systems with Applications, 40(6):2233–2243.
Quinlan, J.R. (1994). C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers.
Smirnov, E.N., Sprinkhuizen-Kuyper, I.G. and Nikolaev, N.Y. (2006). Generalizing Version Space
Support Vector Machines for Non-Separable Data. In: Proceedings of the IEEE Int. Workshop on
Reliability Issues in Knowledge Discovery (RIKD'06), IEEE Computer Society, Hong Kong, China, pp. 744–748.
Tipping, M.E. (2001). Sparse Bayesian Learning and the Relevance Vector Machine, Journal of Machine Learning
Research, 1:211–244.
Vapnik, V.N. (1995). The Nature of Statistical Learning Theory, Springer-Verlag, New York.
Williams, R.J. and Zipser, D. (1989). A learning algorithm for continuously running fully connected
recurrent neural networks, Neural Computation, 1(2):270–280.
Contact: n.nikolaev@gold.ac.uk