Nikolay Nikolaev, Goldsmiths College, University of London

Machine Learning


Kernel Machines: SVM and RVM

Our research in machine learning develops kernel methods, such as support vector machines (SVM) [Vapnik, 1995] and relevance vector machines (RVM) [Tipping, 2001], which are contemporary alternatives to the traditional nearest neighbours, decision trees [Quinlan, 1994], random forests and ensemble techniques (bagging and boosting) [Friedman, Tibshirani and Hastie, 2009] for classification and regression tasks. Kernel machines provide effective mechanisms for learning models that are less prone to overfitting, by expressing the model as a superposition of basis functions centered on a subset of the training data. They find well-performing models by implicitly transforming the inputs into a higher-dimensional feature space, where the data become more amenable to learning.
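
To make the idea of a superposition of basis functions centered on the data concrete, here is a minimal sketch of kernel ridge regression with a Gaussian (RBF) kernel. Note that it is not an SVM: an SVM would use the hinge or epsilon-insensitive loss with a quadratic-programming solver and would keep only the support vectors. The kernel width `gamma`, the ridge parameter `lam`, the function names and the toy sine data are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """Gaussian kernel matrix K[i, j] = exp(-gamma * (a_i - b_j)^2) for 1-D inputs."""
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

def fit_kernel_ridge(x, t, gamma=2.0, lam=1e-3):
    """Solve (K + lam * I) c = t for the coefficients c of the kernel expansion."""
    K = rbf_kernel(x, x, gamma)
    return np.linalg.solve(K + lam * np.eye(len(x)), t)

def predict(x_new, x_train, coef, gamma=2.0):
    """y(x) = sum_n coef_n * K(x, x_n): one basis function centered on each training point."""
    return rbf_kernel(x_new, x_train, gamma) @ coef

x_train = np.linspace(0.0, 2.0 * np.pi, 20)
t_train = np.sin(x_train)
coef = fit_kernel_ridge(x_train, t_train)
train_rmse = np.sqrt(np.mean((predict(x_train, x_train, coef) - t_train) ** 2))
y_mid = predict(np.array([1.0]), x_train, coef)   # prediction between training points
print(train_rmse, y_mid)
```

Here every training point contributes a basis function; SVMs and RVMs differ mainly in how they drive most of these coefficients to zero.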

We specialise in relevance vector machines (RVM) that properly handle the uncertainty in both the parameters and the hyperparameters in a unified probabilistic way. The RVM has the same functional form as the SVM, but it typically yields sparser models with fewer parameters, infers the full predictive distribution, and allows a more liberal use of kernels (the kernel need not satisfy Mercer's condition). The RVM simultaneously performs parameter search (estimation of the weight distribution) and model selection (via estimation of the noise distribution and the weight prior). The emphasis of our work is on the design of sequential training algorithms that estimate the variables of interest recursively, following both the expectation maximisation (EM) and the variational Bayes (VB) frameworks. We have implemented:

-a sequential RVM (SRVM) operating with non-Gaussian noise (using the recursive Gauss-Newton algorithm) [Nikolaev and de Menezes, 2008];
-an online RVM for Bayesian learning from time series with incremental parameter optimisation [Nikolaev and Tino, 2005].
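
For reference, the batch (non-sequential) form of RVM regression can be sketched with the re-estimation rules of Tipping [2001]. The Gaussian posterior over the weights performs the parameter search, and re-estimating the prior precisions `alpha` and the noise precision `beta` performs the model selection. This is only a sketch under assumed settings (Gaussian noise, a Gaussian kernel with `gamma=0.2`, a pruning threshold of 1e6, a fixed iteration count, synthetic sinc data). It is not the sequential EM/VB algorithms listed above.

```python
import numpy as np

def rbf_design(x, centres, gamma):
    """Design matrix Phi[n, m] = exp(-gamma * (x_n - c_m)^2); one basis function per centre."""
    return np.exp(-gamma * (x[:, None] - centres[None, :]) ** 2)

def rvm_regression(Phi, t, n_iter=300, alpha_prune=1e6):
    """Batch RVM regression with Tipping's (2001) re-estimation rules (a sketch).

    Returns the indices of retained basis functions (relevance vectors), their
    posterior weight mean and covariance, and the noise precision beta.
    """
    N, M = Phi.shape
    alpha = np.ones(M)                  # prior precision of each weight
    beta = 1.0 / (0.1 * np.var(t))      # initial noise precision (heuristic assumption)
    keep = np.arange(M)
    for _ in range(n_iter):
        P = Phi[:, keep]
        Sigma = np.linalg.inv(beta * P.T @ P + np.diag(alpha[keep]))
        mu = beta * Sigma @ P.T @ t
        # gamma_i in [0, 1] measures how well the data determine weight i
        gamma = 1.0 - alpha[keep] * np.diag(Sigma)
        alpha[keep] = np.maximum(gamma, 1e-12) / (mu ** 2 + 1e-12)
        resid = t - P @ mu
        beta = max(N - gamma.sum(), 1e-6) / (resid @ resid + 1e-12)
        # prune weights whose prior precision has grown so large that they are effectively zero
        keep = keep[alpha[keep] < alpha_prune]
    P = Phi[:, keep]
    Sigma = np.linalg.inv(beta * P.T @ P + np.diag(alpha[keep]))
    mu = beta * Sigma @ P.T @ t
    return keep, mu, Sigma, beta

rng = np.random.default_rng(0)
x = np.linspace(-10.0, 10.0, 50)
t = np.sinc(x / np.pi) + 0.05 * rng.standard_normal(x.size)   # sin(x)/x plus noise
Phi = rbf_design(x, x, gamma=0.2)       # a basis function centred on every input
rv, mu, Sigma, beta = rvm_regression(Phi, t)
rmse = np.sqrt(np.mean((Phi[:, rv] @ mu - np.sinc(x / np.pi)) ** 2))
print(f"{rv.size} relevance vectors out of {x.size}, RMSE {rmse:.3f}")
```

The surviving basis functions (the relevance vectors) are usually a small fraction of the inputs, which is where the sparsity of the RVM comes from.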

The sequential RVM is derived using a factorized variational approximation framework, and it can process data containing unusual observations and outliers by assuming a non-Gaussian noise distribution. SRVM performs robust Bayesian inference at two levels: 1) the mean and covariance matrix of the weight distribution are computed recursively, and 2) the mean of the noise distribution is also obtained recursively.
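
The recursive computation of the weight mean and covariance can be illustrated with a Kalman-style update for a linear-in-parameters model. This sketch is a simplification and not the SRVM derivation. The noise precision `beta` is fixed rather than estimated recursively. Robustness comes from a common robust-filtering heuristic, used here in place of the factorized variational approximation: each observation's noise precision is scaled by a Student-t style factor `u` computed from its standardised prediction error. With `nu=None` the update is the exact Gaussian one, which reproduces the batch posterior. The data, `nu=3` and all names are assumptions.

```python
import numpy as np

def recursive_update(m, S, phi, t, beta, nu=None):
    """One recursive Bayesian update of the weight mean m and covariance S.

    nu=None : exact Gaussian-noise update (Kalman / recursive least squares form).
    nu=float: heuristic Student-t reweighting; the observation's precision is scaled by
              u = (nu + 1) / (nu + z^2), z = residual / predictive standard deviation.
    """
    S_phi = S @ phi
    s = phi @ S_phi                      # weight-uncertainty part of the predictive variance
    r = t - phi @ m                      # prediction error (innovation)
    u = 1.0 if nu is None else (nu + 1.0) / (nu + r * r / (1.0 / beta + s))
    k = S_phi / (1.0 / (beta * u) + s)   # gain vector
    return m + k * r, S - np.outer(k, S_phi), u

def run(Phi, t, beta, prior_var, nu=None):
    """Process the observations one at a time, starting from the prior N(0, prior_var * I)."""
    m = np.zeros(Phi.shape[1])
    S = prior_var * np.eye(Phi.shape[1])
    weights = []
    for phi, t_n in zip(Phi, t):
        m, S, u = recursive_update(m, S, phi, t_n, beta, nu)
        weights.append(u)
    return m, S, np.array(weights)

rng = np.random.default_rng(1)
n, beta, prior_var = 50, 100.0, 10.0
x = rng.uniform(-1.0, 1.0, n)
w_true = np.array([1.0, 2.0])
Phi = np.column_stack([np.ones(n), x])
t = Phi @ w_true + rng.standard_normal(n) / np.sqrt(beta)
t[30] += 20.0                            # a single gross outlier

# Gaussian recursion: should match the batch posterior
m_gauss, S_gauss, _ = run(Phi, t, beta, prior_var)
S_batch = np.linalg.inv(np.eye(2) / prior_var + beta * Phi.T @ Phi)
m_batch = beta * S_batch @ Phi.T @ t

# Heuristic Student-t reweighting: the outlier should receive a tiny weight u
m_rob, _, u = run(Phi, t, beta, prior_var, nu=3.0)
print("Gaussian:", m_gauss, "robust:", m_rob, "outlier weight:", u[30])
```

The Gaussian recursion is pulled towards the outlier, while the reweighted recursion largely ignores it.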

Experimental results show that these RVMs make accurate predictions on benchmark and real-world time-series forecasting and classification problems.


Mixture Density and Recurrent Neural Networks

Mixture density neural networks (MDNN) offer powerful mechanisms for flexibly fitting data drawn from non-normal distributions. Equipped with Bayesian training techniques [Bishop, 2006], they allow us to accurately fit and predict real-world time series from finance, medicine and environmental studies. Our research develops static as well as dynamic recurrent mixture density neural networks (RMDN) [Nikolaev et al., 2013a].
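
A mixture density network places a mixture model on top of an ordinary network. The last layer emits raw values that are mapped to mixing coefficients, means and standard deviations, and training minimises the mixture negative log-likelihood [Bishop, 2006]. The minimal sketch below covers only this output head for a scalar target. The layout of the raw output vector `z` (K logits, then K means, then K log standard deviations) and the function names are assumptions. The hidden layers and the Bayesian training are omitted.

```python
import numpy as np

def logsumexp(a):
    """Numerically stable log(sum(exp(a)))."""
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

def mdn_params(z, K):
    """Map 3K raw outputs to log mixing coefficients, means and standard deviations."""
    logits, mu, log_sigma = z[:K], z[K:2 * K], z[2 * K:3 * K]
    log_pi = logits - logsumexp(logits)  # log-softmax: coefficients positive, summing to 1
    return log_pi, mu, np.exp(log_sigma) # exp keeps the standard deviations positive

def mdn_nll(z, t, K):
    """Negative log-likelihood -log sum_k pi_k N(t | mu_k, sigma_k^2) of a scalar target t."""
    log_pi, mu, sigma = mdn_params(z, K)
    log_comp = (log_pi - np.log(sigma) - 0.5 * np.log(2.0 * np.pi)
                - 0.5 * ((t - mu) / sigma) ** 2)
    return -logsumexp(log_comp)

def mixture_moments(pi, mu, sigma):
    """Predictive mean and variance of the mixture; both vary with the network input."""
    mean = pi @ mu
    var = pi @ (sigma ** 2 + mu ** 2) - mean ** 2
    return mean, var

z = np.array([0.2, -0.4, -1.0, 1.0, -0.5, 0.3])   # made-up raw outputs for K = 2
log_pi, mu, sigma = mdn_params(z, K=2)
print(mdn_nll(z, 0.7, K=2), mixture_moments(np.exp(log_pi), mu, sigma))
```

Because the mixture variance depends on the input, the same head can represent a time-varying (heteroscedastic) variance when it is driven by a recurrent network.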

Recurrent neural networks (RNN) [Williams and Zipser, 1989] are nonlinear models capable of exhibiting a rich set of dynamical behaviors. RNNs are particularly suitable for fitting temporal data because they operate on the current input as well as on a trace of previously acquired information (held in their recurrent connections), which allows temporal dependencies to be processed directly. We have designed mixture density neural networks for environmental modeling, and recurrent mixture density networks for heteroscedastic time series modeling and volatility forecasting, which were applied to risk evaluation tasks [Nikolaev et al., 2013a]. Our approach is probabilistic and involves recursive second-order training of recurrent networks with a regularized Bayesian Levenberg–Marquardt algorithm.
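
The sketch below pairs a tiny Elman-style recurrent network, whose hidden state `h` carries the trace of past inputs, with a regularized Levenberg–Marquardt loop. The loop minimises `beta/2 * ||r||^2 + alpha/2 * ||w||^2`. It is a batch illustration under stated assumptions, not the authors' recursive algorithm. The Jacobian comes from central finite differences, whereas Williams and Zipser compute recurrent derivatives with real-time recurrent learning. The hyperparameters `alpha` and `beta` are held fixed rather than re-estimated in a Bayesian way. The network has three hidden units, and the target is a synthetic noisy sine series.

```python
import numpy as np

H = 3                                   # hidden units (illustrative choice)
N_PARAMS = H + H * H + H + H + 1        # input, recurrent, bias, output weights, output bias

def unpack(w):
    """Split the flat parameter vector into the RNN weight arrays."""
    W_in, rest = w[:H], w[H:]
    W_rec, rest = rest[:H * H].reshape(H, H), rest[H * H:]
    b, rest = rest[:H], rest[H:]
    return W_in, W_rec, b, rest[:H], rest[H]

def rnn_outputs(w, x):
    """Elman RNN: h_t = tanh(W_in x_t + W_rec h_{t-1} + b), y_t = w_out . h_t + b_out."""
    W_in, W_rec, b, w_out, b_out = unpack(w)
    h = np.zeros(H)
    ys = np.empty(len(x))
    for step, x_t in enumerate(x):
        h = np.tanh(W_in * x_t + W_rec @ h + b)
        ys[step] = w_out @ h + b_out
    return ys

def residuals(w, x, y):
    return rnn_outputs(w, x) - y

def jacobian(w, x, y, eps=1e-6):
    """Central finite-difference Jacobian of the residuals with respect to w."""
    J = np.empty((len(x), len(w)))
    for j in range(len(w)):
        d = np.zeros_like(w)
        d[j] = eps
        J[:, j] = (residuals(w + d, x, y) - residuals(w - d, x, y)) / (2.0 * eps)
    return J

def objective(w, x, y, alpha, beta):
    r = residuals(w, x, y)
    return 0.5 * beta * r @ r + 0.5 * alpha * w @ w

def lm_train(w, x, y, alpha=1e-3, beta=1.0, lam=1e-2, n_iter=30):
    """Regularized Levenberg-Marquardt: accept a step only if the objective decreases."""
    f = objective(w, x, y, alpha, beta)
    for _ in range(n_iter):
        r = residuals(w, x, y)
        J = jacobian(w, x, y)
        g = beta * J.T @ r + alpha * w                 # gradient of the objective
        A = beta * J.T @ J + alpha * np.eye(len(w))    # Gauss-Newton Hessian plus prior term
        step = np.linalg.solve(A + lam * np.eye(len(w)), -g)
        f_new = objective(w + step, x, y, alpha, beta)
        if f_new < f:
            w, f, lam = w + step, f_new, lam / 10.0    # success: trust the quadratic model more
        else:
            lam *= 10.0                                # failure: shorter, more gradient-like step
    return w, f

rng = np.random.default_rng(2)
series = np.sin(0.3 * np.arange(61)) + 0.05 * rng.standard_normal(61)
x, y = series[:-1], series[1:]          # one-step-ahead prediction
w0 = 0.1 * rng.standard_normal(N_PARAMS)
f_init = objective(w0, x, y, alpha=1e-3, beta=1.0)
w_fit, f_final = lm_train(w0, x, y)
print(f"objective {f_init:.4f} -> {f_final:.4f}")
```

Raising the damping `lam` after a failed step moves the update from a Gauss-Newton step towards a short gradient step, which is what keeps the second-order training stable.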

We are currently investigating deep neural networks (DNN) based on restricted Boltzmann machines (RBM) for time series modeling.


Bishop, C.M. (2006). Pattern Recognition and Machine Learning, Springer-Verlag, New York.

Friedman, J.H., Tibshirani, R. and Hastie, T. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer Series in Statistics, Springer-Verlag, New York.

Nikolaev, N. and Tino, P. (2005). Sequential Relevance Vector Machine Learning from Time Series. In: Proc. Int. Joint Conference on Neural Networks IJCNN, pp.1308-1313.

Nikolaev, N. and de Menezes, L. (2008). Sequential Bayesian Kernel Modelling with Non-Gaussian Noise, Neural Networks, 21(1):36-47.

Nikolaev, N., Tino, P. and Smirnov, E. (2013a). Time-dependent Series Variance Learning with Recurrent Mixture Density Networks, Neurocomputing, 122:501-512.

Nikolaev, N., Boshnakov, G. and Zimmer, R. (2013b). Heavy-tail Mixture GARCH Volatility Modeling and Value-at-Risk Estimation, Expert Systems with Applications, 40(6):2233-2243.

Quinlan, J.R. (1994). C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers.

Smirnov, E.N., Sprinkhuizen-Kuyper, I.G. and Nikolaev, N.Y. (2006). Generalizing Version Space Support Vector Machines for Non-Separable Data. In: Proceedings of the IEEE Int. Workshop on Reliability Issues in Knowledge Discovery (RIKD'06), IEEE Computer Society, Hong Kong, China, pp.744-748.

Tipping, M.E. (2001). Sparse Bayesian Learning and the Relevance Vector Machine, Journal of Machine Learning Research, 1:211-244.

Vapnik, V.N. (1995). The Nature of Statistical Learning Theory, Springer-Verlag, New York.

Williams, R.J. and Zipser, D. (1989). A Learning Algorithm for Continually Running Fully Recurrent Neural Networks, Neural Computation, 1(2):270-280.
