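Preface
A Survey of Deep Learning Techniques Applied to Trading
Deep learning has been getting a lot of attention lately, especially in image classification and speech recognition. However, its application does not seem to have spread widely into trading. This survey covers what the author (Greg Harris) has found so far that is relevant to systematic trading. (The original survey is available as a PDF.)
Some terminology: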
DBN = Deep Belief Network
LSTM = Long Short-Term Memory, a type of recurrent neural network
MLP = Multi-layer Perceptron
RBM = Restricted Boltzmann Machine
ReLU = Rectified Linear Unit, an activation function
CNN = Convolutional Neural Network
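Limit Order Book Modeling
Sirignano (2016) predicts changes in limit order books. He designs a “spatial neural network” that exploits local spatial structure; it can be used as a classifier and is more computationally efficient than a standard neural network. He models the joint distribution of the best bid and best ask prices at the next state change, and also how a change in one of them (bid or ask) affects the other.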
Architecture – Each neural network has 4 layers. The standard neural network has 250 neurons per hidden layer, and the spatial neural network has 50. He uses the tanh activation function on the hidden layer neurons.
Training – He trained and tested on order books from 489 stocks from 2014 to 2015 (a separate model for each stock). He uses Level III limit order book data from the NASDAQ with event times having nanosecond decimal precision. Training involved 50TB of data and used a cluster with 50 GPUs. He includes 200 features: the price and size of the limit order book across the first 50 non-zero bid and ask levels. He uses dropout to prevent overfitting. He uses batch normalization between each hidden layer to prevent internal covariate shift. Training is done with the RMSProp algorithm. RMSProp is similar to stochastic gradient descent with momentum, but it normalizes the gradient by the root of a running average of the squared past gradients. He uses an adaptive learning rate where the learning rate is decreased by a constant factor whenever the training error increases over a training epoch. He uses early stopping imposed via a validation set to reduce overfitting. He also includes an L2 penalty when training in order to reduce overfitting.
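The RMSProp update and the shrink-on-error learning-rate rule described above can be sketched on a toy problem. This is a minimal illustration, not Sirignano's training code: the quadratic objective, the decay of 0.9, the shrink factor of 0.5, and the names rmsprop_step and train_toy are all assumptions, and each "epoch" here is a single update.

```python
import numpy as np

def rmsprop_step(w, grad, avg_sq, lr, decay=0.9, eps=1e-8):
    """One RMSProp update; returns the new weights and running average."""
    # Running average of squared gradients (decay=0.9 is an assumed value).
    avg_sq = decay * avg_sq + (1.0 - decay) * grad ** 2
    # Scale the step by the root of that running average.
    w = w - lr * grad / (np.sqrt(avg_sq) + eps)
    return w, avg_sq

def train_toy(w0, lr=0.1, shrink=0.5, epochs=100):
    """Minimize 0.5 * ||w||^2, shrinking lr whenever the error rises."""
    w = np.asarray(w0, dtype=float)
    avg_sq = np.zeros_like(w)
    prev_err = 0.5 * float(w @ w)
    for _ in range(epochs):
        grad = w  # gradient of the toy objective 0.5 * ||w||^2
        w, avg_sq = rmsprop_step(w, grad, avg_sq, lr)
        err = 0.5 * float(w @ w)
        if err > prev_err:
            lr *= shrink  # decrease by a constant factor on an error increase
        prev_err = err
    return w, lr, prev_err

w_final, lr_final, err_final = train_toy([3.0, -2.0])
```

In a real setting the error would be measured over a full pass through the training data, and early stopping would monitor a separate validation set.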
Results – He shows that limit order books exhibit some degree of local spatial structure. He predicts the order book 1 second ahead and also at the time of the next bid/ask change. The spatial neural network outperforms the standard neural network and logistic regression with non-linear features. Both neural networks have 10% lower error than logistic regression.
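Price-based Classification Models
Dixon et al. (2016) use a deep neural network to predict the sign of the price change over the next 5 minutes for 43 commodity and FX futures.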
Architecture – Their input layer has 9,896 neurons for input features made up of lagged price differences and co-movements between contracts. There are 5 learned fully-connected layers. The first of the four hidden layers contains 1,000 neurons, and each subsequent layer tapers by 100 neurons. The output layer has 135 neurons (3 for each class {-1, 0, 1} times 43 contracts).
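As a rough check on the scale of this network, the layer widths can be written out and the learned parameters counted. This is only a sketch: the one-bias-per-neuron assumption and the helper name count_parameters are mine, and the output width of 135 is taken as stated rather than derived (3 × 43 would give 129).

```python
def tapered_hidden_sizes(first=1000, step=100, n_hidden=4):
    """Hidden layer widths that shrink by a fixed step per layer."""
    return [first - step * i for i in range(n_hidden)]

def count_parameters(sizes):
    """Weights plus biases for a chain of fully-connected layers."""
    weights = sum(a * b for a, b in zip(sizes[:-1], sizes[1:]))
    biases = sum(sizes[1:])  # assumes one bias per non-input neuron
    return weights + biases

# Input width and output width as reported in the text.
layer_sizes = [9896] + tapered_hidden_sizes() + [135]
n_params = count_parameters(layer_sizes)
print(layer_sizes, n_params)
```

Almost all of the parameters sit in the first weight matrix (9,896 × 1,000), which is why the input feature construction dominates the model size.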
Training – They used the standard back-propagation with stochastic gradient descent. They speed up training by using mini-batching (computing the gradient on several training examples at once rather than individual examples). Rather than an nVidia GPU, they used an Intel Xeon Phi co-processor.
Results – They report 42% accuracy, overall, for three-class classification. They do some walk-forward training instead of a traditional backtest. Their boxplot shows some generally positive Sharpe ratios from the mini-backtests for each contract. They did not include transaction costs or crossing the bid-ask spread. All their predictions and features were based on the mid-price at the end of each 5-minute time period.
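Walk-forward training can be sketched as a sequence of train/test windows that roll forward through time. The text does not give the window lengths or say whether the training window rolls or expands, so the rolling scheme, the sizes, and the name walk_forward_splits below are assumptions for illustration.

```python
def walk_forward_splits(n_samples, train_size, test_size):
    """Yield (train_range, test_range) pairs that roll forward in time."""
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # advance by one test window

# Tiny example: 10 time-ordered samples, train on 4, test on the next 2.
splits = [(list(tr), list(te)) for tr, te in walk_forward_splits(10, 4, 2)]
for tr, te in splits:
    print("train", tr, "test", te)
```

Each test window lies strictly after its training window, so every mini-backtest only uses information available at the time.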
Takeuchi and Lee (2013) study how the momentum effect can be used to predict monthly stock returns.
Architecture – They use an auto-encoder composed of stacked RBMs to extract features from stock prices which they then pass to a feed-forward neural network classifier. Each RBM consists of one layer of visible units and one layer of hidden units connected by symmetric links. The first layer has 33 units for input features from one stock at a time. For every month t, the features include the 12 monthly returns for month t-2 through t-13 and the 20 daily returns approximately corresponding to month t. They normalize each of the return features by calculating the z-score relative to the cross-section of all stocks for each month or day. The number of hidden units in the final layer of the encoder is sharply reduced, forcing dimensionality reduction. The output layer has 2 units, corresponding to whether the stock ended up above or below the median return for the month. Final layer sizes are 33-40-4-50-2.
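The cross-sectional normalization step can be sketched as follows. The array layout (one row per stock for a single month, one column per return feature) and the synthetic data are assumptions; the 32 columns stand for the 12 monthly and 20 daily returns listed above.

```python
import numpy as np

def cross_sectional_zscore(features):
    """Z-score each column across stocks (rows) for one time period."""
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / std

# Synthetic stand-in for one month: 500 stocks x 32 return features.
rng = np.random.default_rng(0)
raw = rng.normal(loc=0.01, scale=0.05, size=(500, 32))
z = cross_sectional_zscore(raw)
```

Normalizing against the cross-section, rather than against each stock's own history, makes a feature describe how a stock did relative to its peers in that period.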
Training – During pre-training, they split the dataset into smaller, non-overlapping mini-batches. Afterwards, they un-roll the RBMs to form an encoder-decoder, which is fine-tuned using back-propagation. They consider all stocks trading on the NYSE, AMEX, or NASDAQ with a price greater than $5. They train on data from 1965 to 1989 (848,000 stock-month samples) and test on data from 1990 to 2009 (924,300 stock-month samples). Some training data is held out for validation to choose the number of layers and the number of units per layer.
Results – Their overall accuracy is around 53%. When they consider the difference between the top decile and the bottom decile predictions, they get 3.35% per month, or 45.93% annualized return.
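The top-minus-bottom decile comparison can be sketched like this. The function name decile_spread, the equal weighting within each decile, and the toy inputs are assumptions; the numbers below are synthetic and are not meant to reproduce the reported figures.

```python
import numpy as np

def decile_spread(scores, returns):
    """Mean return of the top-scored decile minus the bottom-scored decile."""
    scores = np.asarray(scores, dtype=float)
    returns = np.asarray(returns, dtype=float)
    order = np.argsort(scores)           # ascending by predicted score
    n = max(1, len(scores) // 10)        # stocks per decile
    return returns[order[-n:]].mean() - returns[order[:n]].mean()

# Toy month with 20 stocks whose returns rise with the classifier score.
scores = np.arange(20) / 20.0
returns = np.arange(20) * 0.01
spread = decile_spread(scores, returns)
```

A long-short portfolio built this way only profits to the extent that the classifier's confidence ranks stocks correctly at the extremes, which is why it can look much stronger than a 53% overall accuracy suggests.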
Batres-Estrada (2015) predicts which S&P 500 stocks will have above-median returns on a given trading day. His work appears to be influenced by Takeuchi and Lee (2013).
Architecture – He uses a 3-layer DBN coupled to an MLP. He uses 400 neurons in each hidden layer, and he uses a sigmoid activation function. The output layer is a softmax layer with two output neurons for binary classification (above median or below). The DBN is composed of stacked RBMs, each trained sequentially.
Training – He first pre-trains the DBN module, then fine-tunes the entire DBN-MLP using back-propagation. The input includes 33 features: monthly log-returns for months t-2 to t-13, 20 daily log-returns for each stock at month t, and an indicator variable for the January effect. The features are normalized using the Z-score for each time period. He uses S&P 500 constituent data from 1985 to 2006 with a 70-15-15 split for training-validation-test. He uses the validation data to choose the number of layers, the number of neurons, and the regularization parameters. He uses early stopping to prevent overfitting.
Results – His model has 53% accuracy, which outperforms regularized logistic regression and a few MLP baselines.
Sharang and Rao (2015) use a DBN trained on technical indicators to classify the movement of a portfolio.
Architecture – They use a DBN consisting of 2 stacked RBMs. The first RBM is Gaussian-Bernoulli (15 nodes), and the second RBM is Bernoulli (20 nodes). The DBN produces latent features which they try feeding into three different classifiers: regularized logistic regression, support vector machines, and a neural network with 2 hidden layers. They predict 1 if the portfolio goes up over 5 days, and -1 otherwise.
Training – They train the DBN using a contrastive divergence algorithm. They calculate signals based on open, high, low, close, open interest, and volume data, beginning in 1985, with some points removed during the 2008 financial crisis. They use 20 features: the “daily trend” calculated over different time frames, and then normalized. All parameters are chosen using a validation dataset. When training the neural net classifier, they mention using a momentum parameter during mini-batch gradient descent training to shrink the coefficients by half during every update.
Results – The portfolio is constructed using PCA to be neutral to the first principal component. The portfolio is an artificial spread of instruments, so actually trading it is done with a spread between the ZF and ZN contracts. All input prices are mid-prices, meaning the bid-ask spread is ignored. The results look profitable, with all three classification models performing 5-10% more accurately than a random predictor.
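Zhu et al. (2016) use a DBN to make trading decisions based on oscillation box theory. The theory holds that a stock price oscillates within a certain range (the box), and that if the price moves out of this range it enters a new box. Their trading strategy buys when the price breaks through the top of the box and sells when it falls through the bottom.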
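The breakout rule itself is simple to sketch. How the box boundaries are estimated is not described here (in the paper the DBN is involved in that decision), so the fixed box_bottom and box_top values, the function name, and the prices below are all hypothetical.

```python
def breakout_signals(prices, box_bottom, box_top):
    """Return +1 (buy), -1 (sell), or 0 (hold) for each price."""
    signals = []
    for price in prices:
        if price > box_top:
            signals.append(1)    # broke through the top of the box
        elif price < box_bottom:
            signals.append(-1)   # fell through the bottom of the box
        else:
            signals.append(0)    # still oscillating inside the box
    return signals

# Hypothetical prices against a box spanning 10.0 to 11.0.
signals = breakout_signals([10.2, 10.8, 11.3, 10.5, 9.7],
                           box_bottom=10.0, box_top=11.0)
print(signals)
```

A full strategy would also re-estimate the box after each breakout, since the theory says the price has then moved into a new range.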
Architecture – They use a DBN made up of stacked RBMs and a final back-propagation layer.
Training – They used block Gibbs sampling to greedily train each layer from lowest to highest in an unsupervised way. They then train the back-propagation layer in a supervised way, which fine-tunes the whole model. They chose 400 stocks out of the S&P 500 for testing, and the test set covers 400 days from 2004 to 2005. They use open, high, low, close prices as well as technical analysis indicators, for a total of 14 model inputs. Some indicators are given more influence in the prediction through the use of “gray relation analysis” or “gray correlation degree.”
Results – In their trading strategy, they charge 0.5% transaction costs per trade and add a couple of parameters for stop-loss and “transaction rate.” I don’t fully understand the result tables, but they seem to be reporting significant profits.
Volatility Prediction
Xiong et al. (2015) predict the daily volatility of the S&P 500, as estimated from open, high, low, and close prices.
Architecture – They use a single LSTM hidden layer consisting of one LSTM block. For inputs they use daily S&P 500 returns and volatilities. They also include 25 domestic Google trends, covering sectors and major areas of the economy.
Training – They used the “Adam” method with 32 samples per batch and mean absolute percent error (MAPE) as the objective loss function. They set the maximum lag of the LSTM to include 10 successive observations.
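For reference, MAPE averages the absolute error relative to the true value. A minimal sketch, with made-up volatility numbers and an assumed function name:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percent error, in percent. Requires non-zero y_true."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

# Made-up daily volatilities and forecasts; each forecast is off by 10%.
loss = mape([0.10, 0.20], [0.11, 0.18])
```

Because the error is scaled by the true value, a miss on a quiet day is penalized as heavily as a proportionally equal miss on a turbulent day, which suits a target like volatility that is always positive but varies widely in level.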
Results – They show their LSTM method outperforms GARCH, Ridge, and LASSO techniques.
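Text-based Classification Models
Rönnqvist and Sarlin (2016) use news articles to predict bank distress. Specifically, they build a classifier that judges whether a sentence indicates a period of distress or a period of tranquility.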
Architecture – They use two neural networks in this paper. The first is for semantic pre-training to reduce dimensionality. For this, they run a sliding window over text, taking a sequence of 5 words and learning to predict the next word. They use a feed-forward topology where a projection layer in the middle provides the semantic vectors once the connection weights have been learned. They also include the sentence ID as an input to the model, to provide context and inform the prediction of the next word. They use binary Huffman coding to map sentence IDs and words to activation patterns in the input layer, which organizes the words roughly by frequency. They say feed-forward topologies with fixed context sizes are more efficient than recurrent neural networks for modeling text sequences. The second neural network is for classification. Instead of a million inputs (one for each word), they use 600 inputs from the learned semantic model. The first layer has 600 nodes, the middle layer has 50 rectified linear hidden nodes, and the output layer has 2 nodes (distress/tranquil).
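The sliding-window setup for the pre-training network can be sketched as generating (context, next word) training pairs. The example sentence and the function name are made up, and the sketch leaves out the sentence ID input and the Huffman coding.

```python
def context_target_pairs(tokens, context_size=5):
    """Pair each run of context_size words with the word that follows it."""
    return [
        (tokens[i:i + context_size], tokens[i + context_size])
        for i in range(len(tokens) - context_size)
    ]

# Made-up sentence with 8 tokens, giving 3 training pairs.
tokens = "the bank said it would seek state aid".split()
pairs = context_target_pairs(tokens)
for context, target in pairs:
    print(context, "->", target)
```

A fixed context size like this is what lets the model stay feed-forward: every training example has the same input width, so no recurrence is needed.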
Training – They train it with 243 distress events over 101 banks observed during the financial crisis of 2007-2009. They use 716k sentences mentioning the banks, taken from 6.6m Reuters news articles published during and after the crisis.
Results – They evaluate their classification model using a custom “Usefulness” measure. The evaluation is done using cross-validation, leaving N banks out in each fold. They aggregate the distress counts into various time series but don’t go so far as to consider creating a trading strategy.
Fehrer and Feuerriegel (2015) train a model on news headlines to predict German stock returns.
Architecture – They use a recursive autoencoder with an additional softmax layer in each autoencoder for estimating probabilities. They perform three-class prediction {-1, 0, 1} for the following day’s return of the stock associated with the headline.
Training – They initialize the weights with Gaussian noise, and then update through back-propagation. They use an English ad-hoc news announcement dataset (8,359 headlines) for the German market covering 2004 to 2011.
Results – Their recursive autoencoder has 56% accuracy, which is an improvement over a more traditional random forest modeling approach with 53% accuracy. They do not develop a trading strategy. They have made a Java implementation of their code publicly available.
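Ding et al. (2015) use structured information extracted from news headlines to predict movements of the S&P 500. They process the headlines with Open IE (Open Information Extraction) to obtain the information each news event expresses (actor, action, object, time). Unlike more common networks, they use a neural tensor network to learn semantic composition.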
Architecture – They combine short-term and long-term effects of events, using a CNN to perform semantic composition over the input event sequence. They use a max pooling layer on top of the convolutional layer, which makes the network retain only the most useful features produced by the convolutional layer. They have separate convolutional layers for long-term events and mid-term events. Both of these layers, along with an input layer for short-term events, feed into a hidden layer which then feeds into two output nodes.
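Convolution over an event sequence followed by max pooling can be sketched in a few lines. The embedding size, window width of 3, number of filters, and random data below are assumptions chosen only to show the shapes; no bias or non-linearity is included.

```python
import numpy as np

def conv_max_pool(events, filters, width=3):
    """events: (T, d) event embeddings; filters: (width * d, k) weights."""
    n_events = events.shape[0]
    # Flatten each run of `width` consecutive events into one row.
    windows = np.stack([
        events[t:t + width].reshape(-1)
        for t in range(n_events - width + 1)
    ])
    feature_maps = windows @ filters      # shape (T - width + 1, k)
    return feature_maps.max(axis=0)       # keep the strongest response per filter

rng = np.random.default_rng(0)
events = rng.normal(size=(30, 4))         # 30 events, 4-dim embeddings (assumed)
filters = rng.normal(size=(3 * 4, 8))     # 8 filters over windows of 3 events
pooled = conv_max_pool(events, filters)
```

The pooled vector has a fixed length regardless of how many events were in the sequence, which is what lets the long-term and mid-term branches feed a common hidden layer.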
Training – They extracted 10 million events from Reuters and Bloomberg news. For training, they corrupt events by replacing one event argument with a random argument. During training, they assume that the actual event should be given a higher score than the corrupted event. When it isn’t, model parameters get updated.
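The corrupt-and-compare idea can be sketched with a margin ranking loss. The margin of 1.0, the placeholder event and vocabulary, and the function names are assumptions for illustration; the scoring model itself (the tensor network) is not shown, so the scores are passed in as plain numbers.

```python
import random

def corrupt(event, vocabulary, rng):
    """Replace exactly one argument of the event with a different random one."""
    position = rng.randrange(len(event))
    choices = [arg for arg in vocabulary if arg != event[position]]
    corrupted = list(event)
    corrupted[position] = rng.choice(choices)
    return tuple(corrupted)

def margin_loss(score_real, score_corrupt, margin=1.0):
    """Zero when the real event outscores the corrupted one by the margin."""
    return max(0.0, margin - score_real + score_corrupt)

rng = random.Random(0)
event = ("actor_a", "acquires", "object_b")          # placeholder event
vocabulary = ["actor_a", "actor_c", "acquires", "sues", "object_b", "object_d"]
bad_event = corrupt(event, vocabulary, rng)

# Parameters would only be updated when the loss is positive.
needs_update = margin_loss(score_real=0.2, score_corrupt=0.5) > 0.0
```

Training this way needs no labels: the real events extracted from the news are the positives, and the negatives are manufactured on the fly.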
Results – They find that structured events are better features than words for stock market prediction. Their approach outperforms baseline methods by 6%. They make predictions for the S&P 500 index and 15 individual stocks, and a table appears to show that they can predict the S&P 500 with 65% accuracy.
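Portfolio Models
Heaton et al. (2016) attempt to build a portfolio that outperforms the biotech index IBB. Their goal is to track the index with a small number of stocks, and to still beat the index during large drawdowns. They fit a model that allows for non-linear structure rather than modeling the covariance matrix directly.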
Architecture – They use auto-encoding with regularization and ReLUs. Their auto-encoder has one hidden layer with 5 neurons.
Training – They use weekly return data for the component stocks of IBB from 2012 to 2016. They auto-encode all stocks in the index and evaluate the difference between each stock and its auto-encoded version. They keep the 10 most “communal” stocks that are most similar to the auto-encoded version. They also keep a varying number of other stocks, where the number is chosen with cross-validation.
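Ranking stocks by how well they are reconstructed can be sketched with a linear stand-in for the auto-encoder. Note the substitution: instead of a regularized ReLU auto-encoder, this uses a rank-5 truncated SVD, which is the best linear auto-encoder with 5 hidden units. The synthetic returns and the function name are also assumptions.

```python
import numpy as np

def communal_stocks(returns, n_hidden=5, n_keep=10):
    """returns: (n_weeks, n_stocks). Rank stocks by reconstruction error."""
    centered = returns - returns.mean(axis=0)
    # Rank-n_hidden reconstruction via truncated SVD (linear stand-in).
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    recon = (u[:, :n_hidden] * s[:n_hidden]) @ vt[:n_hidden]
    # Distance between each stock's return series and its encoded version.
    errors = np.linalg.norm(centered - recon, axis=0)
    order = np.argsort(errors)            # smallest error = most communal
    return order[:n_keep], errors

rng = np.random.default_rng(0)
weekly_returns = rng.normal(scale=0.03, size=(200, 30))  # synthetic data
kept, errors = communal_stocks(weekly_returns)
```

Stocks with small reconstruction error are the ones whose behavior is well explained by the few shared factors, which is the sense in which they are “communal.”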
Results – They show the tracking error as a function of the number of stocks included in the portfolio, but don’t seem to compare against traditional methods. They also replace index drawdowns with positive returns and find portfolios that track this modified index.