• 3.2:pandas数据的导入与导出【CSV,JSON】


    一:CSV数据

      一】:导入数据

        1)从CSV文件读入数据:pd.read_csv("文件名"),默认以逗号为分隔符

          D:dataex1.csv文件内容:                    D:dataex2.csv文件内容

            a,b,c,d,message            1,2,3,4,hello
            1,2,3,4,hello             5,6,7,8,world
            5,6,7,8,world              9,10,11,12,foo
            9,10,11,12,foo            

     1 In [3]: df1 = pd.read_csv('D:dataex1.csv')    #打开后默认添加index为从0自增长,columns默认用第一行数据
     2 
     3 In [4]: df1
     4 Out[4]:
     5    a   b   c   d message
     6 0  1   2   3   4   hello
     7 1  5   6   7   8   world
     8 2  9  10  11  12     foo
     9 
    10 In [15]: df2 = pd.read_csv('D:dataex2.csv')
    12 In [16]: df2
    13 Out[16]:
    14    1   2   3   4  hello
    15 0  5   6   7   8  world
    16 1  9  10  11  12    foo
    17 
    18 In [17]: df2 = pd.read_csv('D:dataex2.csv',header=None)        #header参数指定columns都为从0自增长的数
    20 In [18]: df2
    21 Out[18]:
    22    0   1   2   3      4
    23 0  1   2   3   4  hello
    24 1  5   6   7   8  world
    25 2  9  10  11  12    foo
    26 
    27 In [8]: df2 = pd.read_csv('D:dataex2.csv',names=list('abcde'))    #用names参数指定columns的值
    29 In [9]: df2
    30 Out[9]:
    31    a   b   c   d      e
    32 0  1   2   3   4  hello
    33 1  5   6   7   8  world
    34 2  9  10  11  12    foo
    35 
    36 In [13]: df2 = pd.read_csv('D:dataex2.csv',names=list('abcde'),index_col='e')    #用index_col用指定的columns首元素作为index
    38 In [14]: df2
    39 Out[14]:
    40        a   b   c   d
    41 e
    42 hello  1   2   3   4
    43 world  5   6   7   8
    44 foo    9  10  11  12

        2)其他格式:pd.read_table('文件名', sep='划分依据'),划分依据可用正则表达式【s:空格等不可见字符】

          注:read_table方法几乎可以读所有的表格型数据,包括txt,csv等等

          D:dataex3.txt                      D:dataex4.txt

            A B C                           D A B C
            aaa -0.2 -1.02 -0.62                    aaa -0.2 -1.02 -0.62
            bbb 0.93 0.3 -0.03                     bbb 0.93 0.3 -0.03
            ccc -0.26 -0.39 -0.22                     ccc -0.26 -0.39 -0.22
            ddd -0.87 -0.35 1.1                      ddd -0.87 -0.35 1.1

     1 In [37]: df1 = pd.read_table('D:dataex3.txt',sep='s+')
     3 In [38]: df1
     4 Out[38]:
     5         A     B     C                          #以最小列数为准,取dataframe数据,且第一行数据作为columns,剩下的如果第一列作为多出则作为index,否者从0自增数作为index
     6 aaa -0.20 -1.02 -0.62
     7 bbb  0.93  0.30 -0.03
     8 ccc -0.26 -0.39 -0.22
     9 ddd -0.87 -0.35  1.10
    10 
    11 In [44]: df2 = pd.read_table('D:dataex4.txt',sep='s+')
    13 In [45]: df2
    14 Out[45]:
    15      D     A     B     C
    16 0  aaa -0.20 -1.02 -0.62
    17 1  bbb  0.93  0.30 -0.03
    18 2  ccc -0.26 -0.39 -0.22
    19 3  ddd -0.87 -0.35  1.10

        3)扩展技巧

          read_csv/read_table函数参数

          

          

          D:dataex5.csv                D:dataex6.csv

          #hey!                       something,a,b,c,d,message            
          a,b,c,d,message                  one,1,2,3,4,NA
          #just wanted to make things            two,5,6,,8,world
          #who are you                   three,9,10,11,12,foo
          1,2,3,4,hello
          5,6,7,8,world
          9,10,11,12,foo

     1 In [46]: df5 = pd.read_csv('D:dataex5.csv')
     3 In [47]: df5
     4 Out[47]:
     5                                            #hey!
     6 a                           b   c   d    message
     7 #just wanted to make things NaN NaN NaN      NaN
     8 #who are you                NaN NaN NaN      NaN
     9 1                           2   3   4      hello
    10 5                           6   7   8      world
    11 9                           10  11  12       foo
    12 
    13 In [48]: df5 = pd.read_csv('D:dataex5.csv',skiprows=[0,2,3])
    15 In [49]: df5
    16 Out[49]:
    17    a   b   c   d message
    18 0  1   2   3   4   hello
    19 1  5   6   7   8   world
    20 2  9  10  11  12     foo
    21 
    22 
    23 In [59]: df5 = pd.read_csv('D:dataex6.csv',nrows = 2)
    25 In [60]: df5
    26 Out[60]:
    27   something  a  b   c  d message
    28 0       one  1  2   3  4     NaN
    29 1       two  5  6 NaN  8   world
    30 
    31 In [55]: df6 = pd.read_csv('D:dataex6.csv',na_values={'message':['foo','NA'],'something':['two']})
    33 In [56]: df6
    34 Out[56]:
    35   something  a   b   c   d message
    36 0       one  1   2   3   4     NaN
    37 1       NaN  5   6 NaN   8   world
    38 2     three  9  10  11  12     NaN

      二】 数据的写出:to_csv('文件名' , [index=..., header=...] )

     1 In [7]: df = pd.read_csv('D:dataex1.csv',header=None)
     2 
     3 In [8]: df
     4 Out[8]:
     5    0   1   2   3        4
     6 0  a   b   c   d  message
     7 1  1   2   3   4    hello
     8 2  5   6   7   8    world
     9 3  9  10  11  12      foo
    10 
    11 In[9]:df.to_csv('D:dataout1.csv')
    12 
    13 In[10]:df.to_csv('D:dataout2.csv',index=False,header=False)    #即是把index和columns都弃掉,header表示columns

    二:JSON格式

       Json类型基本数据类型有对象(字典),数组(列表),字符串,数值,bool 以及 null。

      注:若是字典,键的类型必须是string

    In [22]: js = """{
       ....: "name":"Wes",
       ....: "places_lived":["US","Spain","Germany"],
       ....: "pet":null,
       ....: "siblings":[{"name":"Scott","age":25,"pet":"Zuko"},
       ....: {"name":"Katie","age":33,"pet":"Cisco"}]}
       ....: """
    
    In [26]: import json
    In [27]: data = json.loads(js)      #将json格式转化为python格式
    In [28]: data
    Out[28]:
    {u'name': u'Wes',
     u'pet': None,
     u'places_lived': [u'US', u'Spain', u'Germany'],
     u'siblings': [{u'age': 25, u'name': u'Scott', u'pet': u'Zuko'},
      {u'age': 33, u'name': u'Katie', u'pet': u'Cisco'}]}
    
    In [29]: #a_js = json.dumps(data)    #将python格式转化为json格式
    
    #最简单构造方法就是提取其中的数据,注意columns list中的值对应json数据中的需要提取的键并将其作为columns
    In [31]: siblings = pd.DataFrame(data['siblings'],columns=['name','age'])
    In [32]: siblings
    Out[32]:
        name  age
    0  Scott   25
    1  Katie   33

    三:XML与HTML

    pass

    四:二进制 179

      pass

    五:Excel 180

      pass

    七:HTML 与 Web API 181

      许多网站提供基于json格式的数据API,通过request等库可以获取

      pass

    六:数据库 182

      pass

    七:MongDB

      184

  • 相关阅读:
    App架构师实践指南四之性能优化一
    App架构师实践指南三之基础组件
    App架构师实践指南二之App开发工具
    App架构师实践指南一之App基础语法
    Linux下阅读MHT文件
    What Is Docker & Docker Container ? A Deep Dive Into Docker !
    Difference between Docker Image and Container?
    RabbitMQ .NET/C# Client API Guide
    How RabbitMQ Works and RabbitMQ Core Concepts
    Message Queue vs Message Bus — what are the differences?
  • 原文地址:https://www.cnblogs.com/pengsixiong/p/5050833.html
Copyright © 2020-2023  润新知