• Quantitative Investing Study Notes 31: "Python Machine Learning Applications" Course Notes 05


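    Predicting rises and falls of the Shanghai Composite Index with classification algorithms.
    Using data from the 150 trading days before today, predict whether the market rises or falls today. In the code below, a "rise" means that the day's close is at or above its open.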
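
The feature construction can be sketched on a toy table. This is only an illustration under assumptions: the helper name `make_windows`, the 2-day window, and the six rows of prices are made up for the example; the column names and the labeling rule (close at or above open) follow the script in this post.

```python
import numpy as np
import pandas as pd


def make_windows(data, window, cols=("close", "open", "low", "high")):
    """Features: the previous `window` days of prices plus today's open.
    Label: 1 if today's close >= today's open, otherwise 0."""
    cols = list(cols)
    n = len(data) - window
    x = np.zeros((n, len(cols) * window + 1))
    y = np.zeros(n)
    for i in range(n):
        # Flatten the window row by row: day i first, then day i+1, ...
        x[i, :-1] = data.iloc[i:i + window][cols].to_numpy().ravel()
        today = data.iloc[i + window]
        x[i, -1] = today["open"]
        y[i] = 1 if today["close"] >= today["open"] else 0
    return x, y


# Hypothetical prices, 6 trading days
toy = pd.DataFrame({
    "open":  [10, 11, 12, 11, 13, 12],
    "close": [11, 12, 11, 13, 12, 14],
    "low":   [9, 10, 10, 10, 11, 11],
    "high":  [12, 13, 13, 14, 14, 15],
}, dtype=float)

x, y = make_windows(toy, window=2)
print(x.shape)  # (4, 9): 4 samples, 2 days * 4 prices + today's open
print(y)        # [0. 1. 0. 1.]
```

With the post's setting of 150 days, each sample has 4 * 150 + 1 = 601 features, and the number of samples is the number of rows minus 150.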

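    The idea of cross-validation: partition the data set D into k mutually exclusive subsets of similar size, each preserving the data distribution as closely as possible, that is, obtained from D by stratified sampling. Then, in each round, use the union of k-1 subsets as the training set and the remaining subset as the test set. This yields k training/test splits, so training and testing can be run k times, and the final result is the mean of the k test results. This is usually called "k-fold cross-validation"; a common choice is k = 10.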
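
A minimal sketch of stratified k-fold cross-validation as described above, assuming scikit-learn's `StratifiedKFold` and `cross_val_score`. The synthetic data set is made up for illustration and has nothing to do with the stock data used in this post.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic two-class data (an assumption for the example)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

k = 10
# Stratified splitting keeps the class ratio similar in every fold
cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)

# Each sample lands in exactly one test fold (the subsets are mutually exclusive)
test_idx = np.concatenate([test for _, test in cv.split(X, y)])

# Train on k-1 folds, test on the remaining one, k times
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=cv)
print(scores)         # one accuracy per fold
print(scores.mean())  # the value reported as the cross-validation result
```

Note that the script in this post does not use k-fold splitting: it draws five independent random 80/20 splits with `train_test_split`, so its test sets may overlap.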

    # coding:utf-8
    # Predict stock market rises and falls with classification algorithms
    
    
    import pandas as pd
    import numpy as np
    from sklearn import svm
    from sklearn.model_selection import train_test_split
    
    
    if __name__ == "__main__":
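        # Load the stock data
        data = pd.read_csv("HS300_his.csv")
        print(data.head())
        # Sort by index in ascending order
        data.sort_index(axis=0, ascending=True, inplace=True)
        print(data.head())
        dayfeature = 150  # number of past trading days used as features
        featurenum = 4 * dayfeature  # close/open/low/high for each day
        # One extra column holds the current day's opening price
        x = np.zeros((data.shape[0] - dayfeature, featurenum + 1))
        y = np.zeros((data.shape[0] - dayfeature))
        # DataFrame.ix was removed from pandas; use positional indexing with iloc
        for i in range(0, data.shape[0] - dayfeature):
            x[i, 0:featurenum] = np.array(data.iloc[i:i+dayfeature][["close", "open", "low", "high"]]).reshape((1, featurenum))
            x[i, featurenum] = data.iloc[i + dayfeature]["open"]
        # Label: 1 if the day closes at or above its open, otherwise 0
        for i in range(0, data.shape[0] - dayfeature):
            if data.iloc[i + dayfeature]["close"] >= data.iloc[i + dayfeature]["open"]:
                y[i] = 1
            else:
                y[i] = 0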
        # Modeling: SVM with an RBF kernel, evaluated on 5 random 80/20 splits
        clf = svm.SVC(kernel="rbf")
        result = []
        for i in range(5):
            x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
            clf.fit(x_train, y_train)
            result.append(np.mean(y_test == clf.predict(x_test)))
        print("Prediction accuracy with the rbf kernel:")
        print(result)
        
        # Same evaluation with a sigmoid kernel
        clf = svm.SVC(kernel="sigmoid")
        result = []
        for i in range(5):
            x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
            clf.fit(x_train, y_train)
            result.append(np.mean(y_test == clf.predict(x_test)))
        print("Prediction accuracy with the sigmoid kernel:")
        print(result)
    

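    Prediction results
    Prediction accuracy with the rbf kernel: [0.6842105263157895, 0.5263157894736842, 0.47368421052631576, 0.47368421052631576, 0.5263157894736842]
    Prediction accuracy with the sigmoid kernel: [0.47368421052631576, 0.6842105263157895, 0.5263157894736842, 0.42105263157894735, 0.5789473684210527]
    As you can see, the prediction accuracy is around 50%, about the same as random guessing.
    Code for this post: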
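
The "about the same as guessing" reading can be checked with a little arithmetic. The sketch below copies the accuracies printed above; the test-set size of 19 is inferred from the numbers (every value is a multiple of 1/19), not stated in the post, and the 95% band uses a normal approximation that is only rough for such a small sample.

```python
import math

# Accuracies copied from the results printed above
rbf = [0.6842105263157895, 0.5263157894736842, 0.47368421052631576,
       0.47368421052631576, 0.5263157894736842]
sigmoid = [0.47368421052631576, 0.6842105263157895, 0.5263157894736842,
           0.42105263157894735, 0.5789473684210527]

n_test = 19  # inferred: each accuracy times 19 is a whole number
all_multiples = all(
    abs(acc * n_test - round(acc * n_test)) < 1e-9 for acc in rbf + sigmoid
)

# Standard error of an accuracy estimate if the true hit rate were 0.5
p = 0.5
se = math.sqrt(p * (1 - p) / n_test)
low, high = p - 1.96 * se, p + 1.96 * se  # approximate 95% band

within = all(low <= acc <= high for acc in rbf + sigmoid)
mean_rbf = sum(rbf) / len(rbf)
mean_sigmoid = sum(sigmoid) / len(sigmoid)

print(all_multiples)            # whether all values are multiples of 1/19
print(round(se, 4))             # spread expected from chance alone
print(within)                   # whether every run falls inside the band
print(mean_rbf, mean_sigmoid)   # mean accuracy per kernel
```

With only 19 test samples per split, a single run can easily land anywhere between roughly 28% and 72% by chance, so values such as 68% in one run are not evidence of predictive power.
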
    https://github.com/zwdnet/MyQuant/blob/master/30

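    I publish my articles in the four places below. Feel free to share them on WeChat Moments and elsewhere, and to tap "Wow" (在看).
    My personal blog: https://zwdnet.github.io
    My Zhihu articles: https://www.zhihu.com/people/zhao-you-min/posts
    My cnblogs blog: https://www.cnblogs.com/zwdnet/
    My WeChat subscription account: 赵瑜敏的口腔医学学习园地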

  • Original article: https://www.cnblogs.com/zwdnet/p/12390462.html