机器学习案例二：缺失时间序列数据填补与ESN(回声状态网络）

　　时间序列数据是一种与时间因素有关系的连续的数据，通常使用传感器等来获取，具有极高的应用价值，可以实时记录被监测设备或人的状态，同时可以用于预测建模，得到对某事件未来发展的一个期望。

在使用传感器进行数据采集的过程中，在没有备用传感器的情况下，会由于种种原因出现采集到的数据在某个时间段内数据缺失的现象。针对某个时间段内的部分数据缺失需要进行科学的验证，最重要的是要验证的是在数据缺失的前后传感器采集的数据是否发生了质的变化（如果发生则认为缺失数据前或后的数据是可用的，整体不可用）。

　　时间序列数据的填补不像单一缺失值的填补那么轻松，特别是在时间序列具有变化趋势和明显的周期波动现象。常用的时间学列填补方法的思路是从前到后填补、从后到前填补和两端同时开始填补。

　　本例中以某传感器采集的时间序列数据为基础，来使用具有递归性质的神经网络来对缺失的数据进行填补。（数据量在1500左右，数据量不是很大）

　　常用的具有递归性质的神经网络有Elman神经网络和ESN神经网络（由于本例数据较少，因此没有使用现在很流行的LSTM神经网络）。Elman神经网络的出现时间较早，原理较简单，这里介绍ESN神经网络。Jarger在2004年首先提出针对传统递归神经网络训练算法改进的新型递归神经网络，即回声状态网络（ESN）。对于BP神经网络中训练样本效率非常低的情况，回声状态网络凭借独特结构形态和训练方式有效避免了神经网络规模无法扩大以及局部最优情况。为了解决传统神经网络遇到的收敛慢和局部最小等问题，全新的ESN神经网络内部构造了储备池作为中心计算单元的重要结构，最大程度地模仿了生物神经元的构造和计算特征。由于没有使用梯度下降的学习算法，回声状态网络转而使用单次训练算法而非大量重复多次训练。另外模型中的复杂网络结构（储备池）由数量极大的神经元群相互连接，需要事先初始化储备池神经网络连接矩阵的权值，这使得ESN较之其他神经网络具有更好的稳定性。相对于其他神经网络而言，ESN能够更好的描述非线性混沌时间序列。

　　ESN的代码如下：

%% Prepare
clear all; 
disp('Loading data......');    

%% Data input
% Train data
traindata = '';
% Teach data
teachdata ='';

%% Data prepare
train = xlsread(traindata);
teach = xlsread(teachdata);

%% Exercise
tic
InputSequence = train;
OutputSequence = teach;
%% ESN
% 训练集和测试集
% [Am,An] = size(YA);
% tic
% InputSequence = YA(1:494,:);
% OutputSequence = input2;

% split the data into train and test

tic

train_fraction = 0.7 ; % use 50% in training and 50% in testing
[trainInputSequence, testInputSequence] = split_train_test(InputSequence,train_fraction);
[trainOutputSequence,testOutputSequence] = split_train_test(OutputSequence,train_fraction);

% generate an esn 
nInputUnits = 9; nInternalUnits = 50; nOutputUnits = 1; 
esn = generate_esn(nInputUnits, nInternalUnits, nOutputUnits, ...
    'spectralRadius',0.2,'inputScaling',[0.1;0.1;0.1;0.1;0.1;0.1;0.1;0.1;0.1],'inputShift',[0;0;0;0;0;0;0;0;0], ...
    'teacherScaling',[0.3],'teacherShift',[-0.2],'feedbackScaling', 0, ...
    'type', 'plain_esn'); 

esn.internalWeights = esn.spectralRadius * esn.internalWeights_UnitSR;

% train the ESN
nForgetPoints = 50 ; % discard the first 100 points
[trainedEsn, stateMatrix] = train_esn(trainInputSequence, trainOutputSequence, esn, nForgetPoints) ; 

nForgetPoints = 50 ; 
predictedTrainOutput = test_esn(trainInputSequence, trainedEsn, nForgetPoints);
predictedTestOutput = test_esn(testInputSequence,  trainedEsn, nForgetPoints) ; 

% create input-output plots
nPlotPoints = 60 ; 
nPlotPoints1 = 100 ; 
plot_sequence(trainOutputSequence(nForgetPoints+1:end,:), predictedTrainOutput, nPlotPoints1,...
    'training: teacher sequence (red) vs predicted sequence (blue)');
grid on;
plot_sequence(testOutputSequence(nForgetPoints+1:end,:), predictedTestOutput, nPlotPoints, ...
    'testing: teacher sequence (red) vs predicted sequence (blue)') ; 
grid on;

%compute NRMSE training error
trainError = compute_error(predictedTrainOutput, trainOutputSequence); 
disp(sprintf('train NRMSE = %s', num2str(trainError)))

%compute NRMSE testing error
testError = compute_error(predictedTestOutput, testOutputSequence); 
disp(sprintf('test NRMSE = %s', num2str(testError)))

disp('训练结束!');
toc

　　ESN神经网络的特殊结构需要调节的参数有隐含层神经元的个数、储备池的半径、输入信号的缩放比例、输入信号的偏移、输出信号的缩放比例和缩放信号的偏移。其中，隐含层的神经元个数对模型的预测精度影响最大，剩余的其他参数中，储备池的半径也对预测精度有较大的影响。在进行仿真的时候，神经元的传递函数选择ESN神经网络中的plain_esn。

　　因此在使用ESN神经网络的时候，主要需要调整的参数是隐含层的神经元个数和储备池的半径，一下是ESN的主函数：

nInputUnits = 9; nInternalUnits = 50; nOutputUnits = 1; 
esn = generate_esn(nInputUnits, nInternalUnits, nOutputUnits, ...
    'spectralRadius',0.2,'inputScaling',[0.1;0.1;0.1;0.1;0.1;0.1;0.1;0.1;0.1],'inputShift',[0;0;0;0;0;0;0;0;0], ...
    'teacherScaling',[0.3],'teacherShift',[-0.2],'feedbackScaling', 0, ...
    'type', 'plain_esn');

需要调节的是：1）nInternalUnits（隐含层的神经元个数）和 2）spectralRadius（储备池的半径）。

通过调节参数就可以进行数据的预测填补了。

plain_esn的源码：

function internalState = plain_esn(totalstate , esn , varargin)

% PLAIN_ESN computes the new internal states of the ESN by using the simple
% esn equations
%
% input arguments:
% totalstate: the previous totalstate, vector of size 
%     (esn.nInternalUnits + esn.nInputUnits + esn.nOutputUnits) x 1
% esn: the ESN structure
%
% output: 
% internalState: the updated internal state, size esn.nInternalUnits x 1
%
% Created April 30, 2006, D. Popovici
% Copyright: Fraunhofer IAIS 2006 / Patent pending%
% Revision 1, June 6, 2006, H.Jaeger
% Revision 2, June 23, 2007, H. Jaeger (include esn.feedbackScaling)


internalState =  feval( esn.reservoirActivationFunction , ...
    [esn.internalWeights , esn.inputWeights , esn.feedbackWeights * diag(esn.feedbackScaling )] * totalstate)  ;   
%%%% add noise to the Esn 
internalState = internalState + esn.noiseLevel * (rand(esn.nInternalUnits,1) - 0.5) ;

还有其他源码需要去下载资源包(http://bbs.06climate.com/forum.php?mod=viewthread&tid=35933)

相关阅读:
五一训练礼包 — B
五一训练礼包—坐标问题
 单链表
 顺序表
 链表
 基础DP（3）
基础DP（2）
基础DP（1）
分治法
 最小表示法
原文地址：https://www.cnblogs.com/zhuozige/p/12877824.html