• [Heartbeat Signal Classification and Prediction] Datawhale Check-in - Task 2: Data Analysis

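    The data analysis in the tutorial is of little use here. The core column is heartbeat_signals; it is the real focus, and its time-series nature has to be taken into account.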

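    Things to try:

    • Split heartbeat_signals into one column per sample
    • Plot the first 10 ECG signals
    • Plot 10 signals with label 1

    import pandas as pd
    # Forward slashes avoid backslash escapes: '\016' is an octal escape,
    # and a trailing '\' would swallow the closing quote
    win_file_path = 'E:/competition-data/016_heartbeat_signals/'
    train = pd.read_csv(win_file_path + 'train.csv')
    test = pd.read_csv(win_file_path + 'testA.csv')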
    

    Print the row and column counts

    print('train.shape', train.shape)
    print('test.shape', test.shape)
    
    train.shape (100000, 3)
    test.shape (20000, 2)
    
    train.head(1)
    
       id                                  heartbeat_signals  label
    0   0  0.9912297987616655,0.9435330436439665,0.764677...    0.0

    Check for missing values and anomalies (none found)

    data.isnull().sum() counts the NaN values in each column.

    train.isnull().sum()
    
    id                   0
    heartbeat_signals    0
    label                0
    dtype: int64
    
    test.isnull().sum()
    
    id                   0
    heartbeat_signals    0
    dtype: int64
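
    isnull() only catches missing cells; it cannot see problems inside the comma-separated string. A minimal sketch of an extra check follows, assuming every row should parse to the same number of floats. The helper name check_signals and the tiny demo frame are made up for illustration and are not part of the original analysis.

```python
import numpy as np
import pandas as pd


def check_signals(df, column="heartbeat_signals"):
    """Summarize per-row sample counts and the value range of the parsed signals."""
    parsed = df[column].str.split(",")
    lengths = parsed.str.len()
    # Flatten every row into one float array to inspect the overall range.
    values = np.concatenate([np.asarray(row, dtype=float) for row in parsed])
    return {
        "lengths": sorted(lengths.unique().tolist()),  # distinct samples-per-row counts
        "min": float(values.min()),
        "max": float(values.max()),
        "has_nan": bool(np.isnan(values).any()),
    }


# Made-up stand-in for train.csv (values are illustrative only).
demo = pd.DataFrame({
    "id": [0, 1],
    "heartbeat_signals": ["0.99,0.94,0.0", "1.0,0.5,0.25"],
    "label": [0.0, 2.0],
})
report = check_signals(demo)
print(report)
```

    Calling check_signals(train) on the real data should report a single length if all rows are equally long; more than one entry in "lengths" would point to ragged rows.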
    

    Look at the distribution of the target label

    train['label'].describe()
    
    count    100000.000000
    mean          0.856960
    std           1.217084
    min           0.000000
    25%           0.000000
    50%           0.000000
    75%           2.000000
    max           3.000000
    Name: label, dtype: float64
    
    train['label'].value_counts()
    
    0.0    64327
    3.0    17912
    2.0    14199
    1.0     3562
    Name: label, dtype: int64
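
    The raw counts are easier to compare as shares. The sketch below rebuilds the counts by hand from the output above (so it runs without the CSV) and derives each class's share and the ratio between the largest and smallest class; the variable names are illustrative.

```python
import pandas as pd

# Class counts copied from the value_counts() output above.
counts = pd.Series({0.0: 64327, 3.0: 17912, 2.0: 14199, 1.0: 3562}, name="label")

# Share of each class in the training set.
proportions = counts / counts.sum()

# How many times larger the most frequent class is than the rarest one.
imbalance_ratio = counts.max() / counts.min()

print(proportions.round(4))
print(round(imbalance_ratio, 2))
```

    On the loaded data, train['label'].value_counts(normalize=True) should give the same shares directly.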
    

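    2.3.7 Generate a data report with pandas_profiling

    import pandas_profiling
    
    # train is the DataFrame loaded above (the tutorial calls it data_train)
    pfr = pandas_profiling.ProfileReport(train)
    pfr.to_file("./example.html")
    

    pandas_profiling is of no use here, at least for now, since the only feature, heartbeat_signals, is still one comma-separated string per row.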

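    A separate analysis to try instead:

    • Split heartbeat_signals into one column per sample
    • Plot the first 5 ECG signals
    • Plot 5 overlaid signals for each label from 0 to 3

    Split heartbeat_signals into columns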

    train['heartbeat_signals'] = train['heartbeat_signals'].astype('string')
    x = train['heartbeat_signals'].str.split(",", expand=True)
    x
    
    0 1 2 3 4 5 6 7 8 9 ... 195 196 197 198 199 200 201 202 203 204
    0 0.9912297987616655 0.9435330436439665 0.7646772997256593 0.6185708990212999 0.3796321642826237 0.19082233510621885 0.040237131594430715 0.02599520771717858 0.03170886048677242 0.06552357497104398 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    1 0.9714822034884503 0.9289687459588268 0.5729328050711678 0.1784566262750076 0.1229615224365985 0.13236021729815928 0.09439236984499814 0.08957535516351411 0.030480606866741047 0.04049936195430977 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    2 1.0 0.9591487564065292 0.7013782792997189 0.23177753487886463 0.0 0.08069805776387916 0.12837603937503544 0.18744837555079963 0.28082571505275855 0.3282610568488903 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    3 0.9757952826275774 0.9340884687738161 0.6596366611990001 0.2499208267606008 0.23711575621286213 0.28144491730834825 0.2499208267606008 0.2499208267606008 0.24139674778512604 0.2306703464848836 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    4 0.0 0.055816398940721094 0.26129357194994196 0.35984696254197834 0.43314263962884686 0.45369772898632504 0.49900406742109477 0.5427959768500487 0.6169044962835193 0.6766958323316207 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
    99995 1.0 0.677705342021188 0.22239242747868546 0.2571578307224994 0.20469042415279454 0.05466497618736314 0.026152286890497062 0.11818142707296006 0.24483757081121627 0.3289485158861968 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    99996 0.9268571578157265 0.9063471198026871 0.6369932212888393 0.41503751002775946 0.37474480119929776 0.3825812845814957 0.35894293360916163 0.34135861850914284 0.3365254578264915 0.3170292884548231 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    99997 0.9258351628306013 0.5873839035878395 0.6332261741951388 0.6323533645350808 0.6392827243034813 0.6142923239940205 0.5991551019747257 0.5176324324889339 0.4038033525475481 0.2531748788594435 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    99998 1.0 0.9947621698382489 0.8297017704865509 0.45819277171637834 0.26416169623741237 0.24022845026183584 0.21376575735540573 0.18929103849637752 0.20381573166587716 0.21086610220048516 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    99999 0.9259994004527861 0.916476635326053 0.4042900774399834 0.0 0.2630344094167657 0.3854310437765884 0.3610665021846972 0.33270794046870034 0.33985000288462475 0.3504972538285509 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

    100000 rows × 205 columns

    type(x)
    
    
    pandas.core.frame.DataFrame
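
    Every cell of x is still a string after str.split, which is why the plotting code later converts each value with float(). One possible alternative, sketched here on a made-up two-row frame, is to cast the whole frame once; the column names s_0, s_1, ... are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the real column (the real rows have 205 values each).
train_demo = pd.DataFrame({
    "heartbeat_signals": ["0.99,0.94,0.76,0.0", "1.0,0.95,0.70,0.0"],
})

# Split into one column per sample, then cast every cell to float in one step.
signals = train_demo["heartbeat_signals"].str.split(",", expand=True).astype(float)
signals.columns = [f"s_{i}" for i in range(signals.shape[1])]  # hypothetical names

# A plain NumPy matrix of shape (n_rows, n_samples) for numeric work.
matrix = signals.to_numpy()
print(signals.dtypes.unique(), matrix.shape)
```

    With a numeric frame, a row can be plotted directly, for example plt.plot(matrix[0]), without a per-value conversion.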
    

    Plot the first 5 ECG signals (train)

    from matplotlib import pyplot as plt
    import numpy as np
    
    # One figure with five stacked subplots, one per signal
    plt.figure(12)
    for i in range(0, 5):
        # Cells are strings after str.split, so convert each row to floats
        val = [float(v) for v in np.array(x.iloc[i, :])]
        plt.subplot(5, 1, i + 1)
        my_y_ticks = np.arange(-1.0, 1.0, 0.1)
        plt.yticks(my_y_ticks)
        plt.plot(val)
    
    plt.show()
    

    Plot 5 overlaid signals for each label from 0 to 3 (train)

    for _label in range(0, 4):
        for random_state in [2020]:
            # Draw 5 random rows of this label; the seed keeps the sample reproducible
            spl = train[train['label'] == _label].sample(n=5, random_state=random_state)
            sample = x[x.index.isin(spl.index)]
            # Create the figure before plotting so that figsize applies to these curves
            plt.figure(figsize=(32, 8))
            for i in range(spl.shape[0]):
                float_val = [float(v) for v in np.array(sample.iloc[i, :])]
                plt.plot(float_val)
    
            plt.title("_label=" + str(_label) + ",random_state=" + str(random_state))
            plt.yticks(np.arange(0, 1.0, 0.1))
            plt.xticks(np.arange(0, 250, 25))
            plt.show()
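
    Five random curves per label give only a rough impression of each class. As a possible complement (not something the original post does), the per-label mean waveform can be computed with a groupby. The sketch below uses made-up numeric signals and labels; it assumes the signals have already been cast to float.

```python
import numpy as np
import pandas as pd

# Made-up data: 4 signals of 3 samples each, two per label.
signals = pd.DataFrame([
    [1.0, 0.5, 0.0],
    [0.8, 0.3, 0.0],
    [0.0, 0.4, 1.0],
    [0.2, 0.6, 0.8],
])
labels = pd.Series([0.0, 0.0, 1.0, 1.0], name="label")

# Average the signals sample by sample within each label.
mean_wave = signals.groupby(labels).mean()
print(mean_wave)
```

    Each row of mean_wave is one label's average curve and can be passed to plt.plot in the same way as the individual signals.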
    

  • Original article: https://www.cnblogs.com/zhazhaacmer/p/14540604.html