• 2.8数据-paddlepaddle数据集uci_housing


    UCI Housing数据集

    • 该模块将从 https://archive.ics.uci.edu/ml/machine-learning-databases/housing/ 下载数据集,并将训练集和测试集解析为paddle reader creator
    • 每个样本都是正则化和价格编号后的特征

    paddle.dataset.uci_housing:https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/data/dataset_cn/uci_housing_cn.html


    import paddle
    import paddle.fluid as fluid
    import numpy as np
    import paddle.dataset.uci_housing as uci_housing
    
    # 返回一个reader creator,reader中的每个样本都是正则化和价格编号后的特征
    train=uci_housing.train() # <function paddle.dataset.uci_housing.train.<locals>.reader()>
    '''
    [==================================================]i_housing/housing.data not found, downloading http://paddlemodels.bj.bcebos.com/uci_housing/housing.data
    '''
    test=uci_housing.test() # <function paddle.dataset.uci_housing.train.<locals>.reader()>
    
    a_sample=next(train())
    
    print(len(a_sample)) # 2
    print(a_sample[1]) # [24.]
    print(a_sample[0].shape) # (13,)
    print(a_sample[0]) # [-0.0405441   0.06636364 -0.32356227 -0.06916996 -0.03435197  0.05563625 -0.03475696  0.02682186 -0.37171335 -0.21419304 -0.33569506  0.10143217 -0.21172912]
    
    len(uci_housing.feature_names) # 13
    uci_housing.feature_names
    '''
    ['CRIM',
      'ZN',
      'INDUS',
      'CHAS',
      'NOX',
      'RM',
      'AGE',
      'DIS',
      'RAD',
      'TAX',
      'PTRATIO',
      'B',
      'LSTAT']
    ''' 
    
    # 13 var x ,1 var y
    # uci_housing.UCI_TEST_DATA.shape:(102, 14)
    # uci_housing.UCI_TRAIN_DATA.shape:(404, 14)
    uci_housing.UCI_TEST_DATA
    '''
    array([[ 0.42616306, -0.11363636,  0.25525005, ..., -0.0686218 ,
             0.40637243,  8.5       ],
           [ 0.72279828, -0.11363636,  0.25525005, ...,  0.07134996,
             0.28495962,  5.        ],
           [ 0.19222996, -0.11363636,  0.25525005, ...,  0.03415696,
             0.2948934 , 11.9       ],
           ...,
           [-0.03993221, -0.11363636,  0.02907703, ...,  0.10143217,
            -0.1935172 , 23.9       ],
           [-0.03938337, -0.11363636,  0.02907703, ...,  0.09273279,
            -0.17033839, 22.        ],
           [-0.04008226, -0.11363636,  0.02907703, ...,  0.10143217,
            -0.13170704, 11.9       ]])
    '''
    
    
    
  • 相关阅读:
    parallel-fastq-dump是一个大坑
    生信软件安装(2)
    2018年一些感悟
    raw data/PF data/Q30 data/clean data的不同
    专题
    结构体
    指针和数组
    指针
    函数的声明
    C语言中的函数
  • 原文地址:https://www.cnblogs.com/onenoteone/p/12441672.html
Copyright © 2020-2023  润新知