Nikolay Nikolaev, Goldsmiths College, University of London
Automated Algorithmic Trading: Learning Agents

An Automated Algorithmic Trading System is a software platform that collects buy and sell orders from algorithmic trading agents and executes the trades on a computerized financial market. These trading systems have become increasingly popular due to the electronic nature of the transactions executed on many of the current stock exchanges [Johnson, 2010], [Aldridge, 2013], [Treleaven et al., 2013], [Kissell, 2014]. Our research implements such quantitative trading systems and specializes in the elaboration of optimal intelligent trading agents that bid on the market (rather than in the discovery of equilibrium prices). The aim is to construct profitable trading agents, not only to simulate the market evolution. We build dynamic agents that continuously learn to trade on a double auction market, assuming that it is populated by other static agents with preprogrammed behaviour and also by market makers. By retraining the dynamic agents over time we seek to hedge away risks and achieve high profits.

Double Auction Market. The Continuous Double Auction (CDA) Market [Friedman, 2005] is an exchange mechanism that manages the trades of financial instruments (assets) and serves as an arbiter between the players according to the market protocol. The term double refers to the fact that traders may send two different types of orders, buy (bid) and sell (ask), and the term continuous to the fact that they may do so at any time. The auction runs as a sequence of trading sessions (each lasting a number of steps), each starting with an allocation of stocks and money among the participants and terminating with clearing. Several major exchanges (including NYSE, NASDAQ, LSE, TSE) use such a CDA marketplace today for price formation [Aldridge, 2013]. Following these rules we develop a CDA market simulator (Fig.1), which may be considered a plausible realization of a real stock market.
Our virtual electronic market accepts orders, placed with a certain rate (intensity), into a waiting queue of submitted orders, on which it operates like a limit order book.

Limit Order Book. The Limit Order Book (LOB) is an engine that drives the process of price formation [Cont, 2011]. The LOB [Gould et al., 2013] maintains the orders to buy and sell an asset, arranged in two corresponding priority heaps (with price-time priority, where time refers to the arrival time). Our LOB receives market and limit orders and also allows spontaneous order cancellations. The market orders are executed immediately at the best available price. The limit orders come with a specified price, and if they are not filled they are inserted into the book (that is, they may not be executed instantaneously, and may not be executed at all). The limit orders provide quantities of demand and supply, and thus they contribute to sustaining the liquidity in the market. The market orders cause execution of limit orders from the book, and thus they consume the liquidity.

The LOB handles the orders depending on their price, size (quantity of shares), and arrival time. A trade is triggered when the price of the newly arrived order matches that of the best (outstanding) opposite order. If the order is not fully filled, it is walked through the opposite orders from the best to the worst so as to complete it if further price matches exist. Technically this is facilitated by keeping the bid orders sorted by decreasing price and the ask orders sorted by increasing price. When a trade occurs, the orders that formed it are deleted from the book (and new outstanding bids and asks then arise). The orders persist in the book until matched or until cancelled. The book is cleaned periodically (at the end of each trading session) by removing obsolete unmatched orders. After processing every order the book is readjusted.
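The matching rules above can be illustrated with a minimal Python sketch of a limit order book with price-time priority. It is not the simulator described here: the names Order, LimitOrderBook, submit, cancel, best_bid and best_ask are hypothetical, cancellations are handled by lazy deletion, and the unfilled remainder of a market order is simply discarded, which is an assumption rather than a rule stated in the text.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class Order:
    order_id: int
    side: str                      # "buy" or "sell"
    size: int                      # remaining quantity of shares
    price: Optional[float] = None  # None marks a market order


class LimitOrderBook:
    """Two heaps keyed by (price, arrival number) give price-time priority."""

    def __init__(self):
        self._bids = []                # best bid first: price is negated
        self._asks = []                # best ask first
        self._seq = itertools.count()  # arrival counter used as the time key
        self._cancelled = set()
        self.trades = []               # (price, size, incoming_id, resting_id)

    def _top(self, book):
        """Return the best live resting order, dropping stale heap entries."""
        while book:
            order = book[0][2]
            if order.order_id in self._cancelled or order.size == 0:
                heapq.heappop(book)
            else:
                return order
        return None

    def submit(self, order):
        """Match an incoming order, then rest any unfilled limit remainder."""
        opposite = self._asks if order.side == "buy" else self._bids
        while order.size > 0:
            best = self._top(opposite)
            if best is None:
                break
            if order.price is not None:  # a limit order stops when prices no longer match
                if order.side == "buy" and order.price < best.price:
                    break
                if order.side == "sell" and order.price > best.price:
                    break
            qty = min(order.size, best.size)
            order.size -= qty
            best.size -= qty
            # The trade executes at the price of the resting order.
            self.trades.append((best.price, qty, order.order_id, best.order_id))
        if order.size > 0 and order.price is not None:
            own = self._bids if order.side == "buy" else self._asks
            key = -order.price if order.side == "buy" else order.price
            heapq.heappush(own, (key, next(self._seq), order))

    def cancel(self, order_id):
        """Mark an order as cancelled; it is removed lazily from its heap."""
        self._cancelled.add(order_id)

    def best_bid(self):
        top = self._top(self._bids)
        return None if top is None else top.price

    def best_ask(self):
        top = self._top(self._asks)
        return None if top is None else top.price


book = LimitOrderBook()
book.submit(Order(1, "sell", 5, 101.0))
book.submit(Order(2, "sell", 5, 100.0))
book.submit(Order(3, "sell", 5, 100.0))  # same price as order 2, but arrived later
book.submit(Order(4, "buy", 7))          # market order: fills order 2, then part of order 3
print(book.trades)
```

The market buy in the usage lines consumes liquidity from the earliest order at the best ask before touching the later one at the same price, which is the price-time rule in action.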
A trade execution is followed by recording its parameters, updating the asset statistics (VWAP, liquidity), and informing the agents.

Fig.1. Diagram of the structure of the automated algorithmic trading system. Boxes represent the functions performed on the incoming and internal data, along with their parameters and some internal data. The labels on the arrows show the data flowing along them.

Trading Agents. The typical software trading agents are static and controlled by heuristic preprogrammed strategies. Popular baseline static trading players are the zero-intelligence (ZI) agents [Gode and Sunder, 1993], which place random orders, but in such a way that the setting of their type, price, size (and intensity) leads to realistic aggregate order flows. Related examples are the minimal-intelligence ZIP agents [Cliff and Bruten, 1997] and the patient and impatient agents [Farmer et al., 2005]. The impatient agents send only market orders, while the patient agents send only limit orders. Other auction participants are the market makers (MM) [Aldridge, 2013], who continuously provide liquidity to the market by posting buy limit orders priced slightly below the best bid and sell limit orders priced slightly above the best ask. Every agent is given the opportunity to enter the market at any time (using a discrete-time representation) with a certain probability per time step. That is, the agents are activated asynchronously (stochastically), and their orders are then accepted in our system sequentially in time (with zero latency).

Dynamic Learning Agents. The key aspect of our research is the design of intelligent machine learning agents whose behaviour is controlled by dynamic neural networks [Nikolaev and Iba, 2006]. Our kind of agent is trained using recent market information to improve its bidding strategy and to become more efficient. The neural network is calibrated with a dynamic algorithm to adapt its performance to the changing market activity.
The network is given as inputs returns computed from midprices (the middle of the bid/ask spread) and predicts the direction of the midprice change, which helps us to generate useful trading signals. We use small input vectors (fixed-length memory) that reflect only a short history of past returns, to force the agent to be more consistent with the recent past. A positive network output indicates that a long position should be taken, while a negative output suggests a short position. The order price is determined using the network forecast and takes into account the spread (and possibly the price grid given by the tick values). The price (return) is included in the training formula together with the transaction cost, and together they affect both the learning process and the investment results.

The conceptual strength of our agent lies in its training objective, which is to maximize the Sharpe ratio criterion rather than to predict prices (or returns) directly. The neural network training algorithm maximises the agent performance using feedback action signals, without forecasting returns and without explicitly provided targets from an external supervisor. Thus, the learning agent is stimulated to achieve higher profits with lower risk exposure; that is, because it is rewarded through the Sharpe ratio, it learns to take decisions that reduce risk. Our dynamic agent is risk-averse, while the static agents are risk-neutral. The trading recommendations made by the neural machine are restricted by additional parameters to protect from losses and mitigate risks in cases of higher market uncertainty. We also use stop-loss indicators and predefined limits for the maximum allowed drawdown.

The novelty in our learning agent is that it includes a stochastic volatility factor [Nikolaev et al., 2014] in the mean of the temporal neural network model, and this is an enhancement to previous Sharpe ratio maximizing neural networks [Choey and Weigend, 1997].
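As a rough illustration of the Sharpe ratio objective described above, the Python sketch below turns network outputs into long/short positions, charges a transaction cost on every change of position, and computes the Sharpe ratio of the resulting returns. The exact training formula is not given in the text, so the proportional cost model, the flat starting position, and the function names trading_returns and sharpe_ratio are assumptions made for this example only.

```python
from statistics import fmean, pstdev


def trading_returns(positions, returns, cost=0.0):
    """Per-step strategy returns for positions of +1 (long) or -1 (short).

    positions[t] is assumed to be the position held while returns[t] is earned.
    A cost proportional to the change in position is charged at each step
    (an assumed cost model, not one taken from the text).
    """
    out = []
    previous = 0.0  # assume the agent starts with no position
    for position, ret in zip(positions, returns):
        turnover = abs(position - previous)
        out.append(position * ret - cost * turnover)
        previous = position
    return out


def sharpe_ratio(strategy_returns):
    """Mean return divided by its standard deviation (0.0 if there is no variation)."""
    sd = pstdev(strategy_returns)
    return fmean(strategy_returns) / sd if sd > 0 else 0.0


# Hypothetical network outputs and midprice returns.
outputs = [0.3, -0.2, 0.5, 0.1]
midprice_returns = [0.01, -0.02, 0.015, -0.005]

# A positive output means long, a negative output means short.
positions = [1.0 if y > 0 else -1.0 for y in outputs]

net = trading_returns(positions, midprice_returns, cost=0.001)
print(sharpe_ratio(net))
```

A training procedure in this spirit would adjust the network weights to increase the value returned by sharpe_ratio, so frequent position changes are penalised through the cost term and erratic returns are penalised through the standard deviation.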
Volatility is the time-dependent (heteroskedastic) variance in series of returns on prices, which is considered predictable to a great extent [Bollerslev, Engle and Nelson, 1994]. The rationale for accommodating the volatility in the neural network model is to improve its forecasting potential and to capture the volatility clustering effects observed in financial time series of returns on assets (without being fooled by outliers). In this sense, the design of our agent uses statistical modeling as well as machine learning techniques to predict movements in midprices and to construct an efficient computational tool for limit order markets.

Currently we carry out experiments to study the usefulness of our dynamic learning agent, implemented as a Market Taker, in high-frequency trading of S&P500 index futures (Matlab code coming soon). The plots below report preliminary results obtained using a series of S&P500 index prices over the period from 2 November 1995 to 1 November 2013. The curves in the figures are generated using the initial 500 points for training (in-sample data) and the remaining points for testing (out-of-sample data).

Figure 2 offers plots of the cumulative return achieved by our agent using a long-short trading strategy on the S&P500, in comparison with the price changes in this period (which may be considered as the return from a buy-and-hold strategy). The cumulative return is a well-known measure for evaluating the overall predictive performance of a trading strategy. Figure 3 plots the distribution of these cumulative returns and a Gaussian fit to show that it is close to normal. Figure 4 plots the maximum drawdown of these returns as an illustration of the losses that can be incurred during the same trading period. The maximum drawdown is a risk measure defined as the maximum drop from the previous peak in the series.
A low maximum drawdown indicates that the trading strategy suffered only small peak-to-trough losses, whereas a high maximum drawdown indicates that the trader was exposed to a large loss. Figure 5 demonstrates how the evolution of the online Sharpe ratio affects the changes in the total Sharpe ratio up to the particular moment. The online Sharpe ratio shows the influence of the current, instantaneous return and risk on the agent performance during training.
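The evaluation measures discussed above can be sketched in a few lines of Python. The example assumes additive (for instance, logarithmic) per-period returns and uses a simple running Sharpe ratio over all returns seen so far. It is not necessarily the online Sharpe ratio used to produce Figure 5, and the function names are hypothetical.

```python
from itertools import accumulate
from math import sqrt


def cumulative_returns(returns):
    """Running sum of per-period returns (additive returns are assumed)."""
    return list(accumulate(returns))


def max_drawdown(cumulative):
    """Largest drop from a previous peak in a cumulative-return series."""
    peak = float("-inf")
    worst = 0.0
    for value in cumulative:
        peak = max(peak, value)
        worst = max(worst, peak - value)
    return worst


def running_sharpe(returns):
    """Sharpe ratio of the first t returns for each t (None while the std is zero).

    Uses running sums of r and r*r; simple, but not numerically robust for very long series.
    """
    out = []
    total = total_sq = 0.0
    for n, r in enumerate(returns, start=1):
        total += r
        total_sq += r * r
        mean = total / n
        var = max(total_sq / n - mean * mean, 0.0)
        out.append(mean / sqrt(var) if var > 0 else None)
    return out


# Hypothetical per-period strategy returns.
returns = [0.02, -0.01, 0.03, -0.04, 0.01]
curve = cumulative_returns(returns)
print(max_drawdown(curve))  # about 0.04: the fall from the peak at step 3 to step 4
print(running_sharpe(returns))
```

Here the cumulative curve peaks at about 0.04 after the third step and falls back to about zero after the fourth, so the maximum drawdown is about 0.04 even though the final cumulative return is positive.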
Further research will investigate the application of the learning agent for guiding Market Makers to estimate the supply and demand in the market more accurately. The expected advantage of the learning agent as a Market Maker is its ability to make inferences from the orders that help to set the quoted ask and bid prices precisely.
Nikolaev,N., de Menezes,L. and Smirnov,R. (2014). Nonlinear Asymmetric Stochastic Volatility Filtering, In: Proc. IEEE Conf. Computational Intelligence for Financial Engineering and Economics (CIFEr2014), London.
Kissell,R. (2014). The Science of Algorithmic Trading and Portfolio Management, Academic Press (Elsevier Inc.), San Diego, CA.
Aldridge,I. (2013). High-Frequency Trading: A Practical Guide to Algorithmic Strategies and Trading Systems, J.Wiley and Sons, Hoboken, NJ.
Gould,M.D., Porter,M.A., Williams,S., McDonald,M., Fenn,D.J. and Howison,S. (2013). Limit Order Books, Quantitative Finance, vol.13, N:11, pp.1709-1742.
Treleaven,P., Galas,M. and Lalchand,V. (2013). Algorithmic Trading Review, Communications of the ACM, vol.56, N:11, pp.76-85.
Cont,R. (2011). Statistical Modeling of High Frequency Financial Data: Facts, Models and Challenges, IEEE Signal Processing, vol.28, N:5, pp.16-25.
Johnson,B. (2010). Algorithmic Trading and DMA: An Introduction to Direct Access Trading Strategies, 4Myeloma Press, London, UK.
Nikolaev,N. and Iba,H. (2006). Adaptive Learning of Polynomial Networks: Genetic Programming, Backpropagation and Bayesian Methods, Springer, New York.
Farmer,J.D., Patelli,P. and Zovko,I. (2005). The Predictive Power of Zero Intelligence in Financial Markets, Proc. of the National Academy of Sciences of the USA, vol.102, pp.2254-2259.
Friedman,D. (2005). The Double Auction Market Institution: A Survey, In: Friedman,D. and Rust,J. (Eds.), The Double Auction Market: Institutions, Theories, and Evidence. Perseus Publ., Cambridge, MA, pp.3-25.
Choey,M. and Weigend,A.S. (1997). Nonlinear Trading Models Through Sharpe Ratio Maximization, Int. Journal of Neural Systems, vol.8, N:4, pp.417-431.
Cliff,D. and Bruten,J. (1997). Minimal-Intelligence Agents for Bargaining Behaviors in Market-Based Environments, HP Labs Technical Report HPL-97-91, Bristol, UK.
Bollerslev,T., Engle,R.F. and Nelson,D.B. (1994). ARCH models, In: R.F.Engle and D.McFadden (Eds.), Handbook of Econometrics, Vol.IV, North-Holland, Amsterdam.
Gode,D. and Sunder,S. (1993). Allocative Efficiency of Markets with Zero-Intelligence Traders: Market as a Partial Substitute for Individual Rationality, Journal of Political Economy, vol.101, N:1, pp.119-137.