• Python: pandas && DataFrame (Part 2)


    Basic operations

    Hierarchical indexing (MultiIndex) in pandas

    >>> import numpy as np
    >>> import pandas as pd
    >>> data = pd.Series(np.random.randn(10),index=[['a','a','a','b','b','c','c','d','d','d'],[1,2,3,1,2,1,2,3,1,2]])
    >>> data
    a  1   -0.168871
       2    0.828841
       3    0.786215
    b  1    0.506081
       2   -2.304898
    c  1    0.864875
       2    0.183091
    d  3   -0.678791
       1   -1.241735
       2    0.778855
    dtype: float64
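
    A minimal sketch of how a Series with a two-level index like the one above can be queried. The values here are fixed with np.arange instead of random numbers, and the inner labels for 'd' are listed in sorted order; the variable names outer_b, inner_2 and single are only illustrative.

```python
import numpy as np
import pandas as pd

# Two-level index: outer labels a-d, inner labels 1-3 (sorted for this sketch).
index = [['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd', 'd'],
         [1, 2, 3, 1, 2, 1, 2, 1, 2, 3]]
data = pd.Series(np.arange(10.0), index=index)

outer_b = data.loc['b']       # every row whose outer label is 'b'
inner_2 = data.loc[:, 2]      # every row whose inner label is 2
single = data.loc[('d', 3)]   # one value addressed by both levels
```

    Selecting on the outer level drops that level from the result, so outer_b is indexed by the inner labels only, and inner_2 is indexed by the outer labels only.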

    Converting between a hierarchically indexed Series and a DataFrame

    >>> data.unstack()
              1         2         3
    a -0.168871  0.828841  0.786215
    b  0.506081 -2.304898       NaN
    c  0.864875  0.183091       NaN
    d -1.241735  0.778855 -0.678791
    >>> type(data.unstack())
    <class 'pandas.core.frame.DataFrame'>
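
    The conversion also works in the other direction. The sketch below uses a small fixed Series (not the random data above) and shows stack() undoing unstack(). The explicit dropna() is an assumption made for portability, since whether stack() discards the NaN cells by itself depends on the pandas version.

```python
import numpy as np
import pandas as pd

index = [['a', 'a', 'a', 'b', 'b'], [1, 2, 3, 1, 2]]
data = pd.Series(np.arange(5.0), index=index)

wide = data.unstack()            # inner level becomes columns; ('b', 3) is NaN
restored = wide.stack().dropna() # back to a MultiIndex Series without the gap
```

    After the round trip, restored carries the same labels and values as data.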

    Merge, join, concatenate

    >>> df2 = pd.DataFrame({'apts':[55000,60000],'cars':[15000,12000]},index=['hangzhou','najing'])
    >>> df1 = pd.DataFrame({'apts':[55000,60000],'cars':[20000,30000]},index=['shanghai','beijing'])
    >>> df3 = pd.DataFrame({'apts':[55000,60000],'cars':[15000,12000]},index=['guangzhou','chongqing'])
    >>> [df1,df2,df3]
    [           apts   cars
    shanghai  55000  20000
    beijing   60000  30000,            apts   cars
    hangzhou  55000  15000
    najing    60000  12000,             apts   cars
    guangzhou  55000  15000
    chongqing  60000  12000]
    >>> pd.concat([df1,df2,df3])
                apts   cars
    shanghai   55000  20000
    beijing    60000  30000
    hangzhou   55000  15000
    najing     60000  12000
    guangzhou  55000  15000
    chongqing  60000  12000
    >>> frames = [df1,df2,df3]
    >>> result2 = pd.concat(frames,keys=['x','y','z'])
    >>> result2
                  apts   cars
    x shanghai   55000  20000
      beijing    60000  30000
    y hangzhou   55000  15000
      najing     60000  12000
    z guangzhou  55000  15000
      chongqing  60000  12000

    Concatenating along columns with concat

    >>> df4 = pd.DataFrame({"salaries":[10000,30000,30000,20000,15000]},index=['suzhou','beijing','shanghai','guanghzou','tianjin'])
    >>> result = pd.concat([df1,df2,df3])
    >>> result3 = pd.concat([result,df4],axis=1)
    >>> result3
                  apts     cars  salaries
    beijing    60000.0  30000.0   30000.0
    chongqing  60000.0  12000.0       NaN
    guanghzou      NaN      NaN   20000.0
    guangzhou  55000.0  15000.0       NaN
    hangzhou   55000.0  15000.0       NaN
    najing     60000.0  12000.0       NaN
    shanghai   55000.0  20000.0   30000.0
    suzhou         NaN      NaN   10000.0
    tianjin        NaN      NaN   15000.0

    Concatenating two DataFrames and keeping only the intersection of their indexes

    >>> result3 = pd.concat([result,df4],axis=1,join='inner')
    >>> result3
               apts   cars  salaries
    shanghai  55000  20000     30000
    beijing   60000  30000     30000
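
    To make the difference between the two join modes concrete, here is a small sketch with two toy frames (left and right are hypothetical names with cut-down data, not the frames above). The default join='outer' keeps the union of the index labels and fills gaps with NaN, while join='inner' keeps only labels present in both.

```python
import pandas as pd

left = pd.DataFrame({'apts': [55000, 60000]}, index=['shanghai', 'beijing'])
right = pd.DataFrame({'salaries': [10000, 30000]}, index=['suzhou', 'beijing'])

outer = pd.concat([left, right], axis=1)                # union of labels
inner = pd.concat([left, right], axis=1, join='inner')  # intersection only

missing_cells = int(outer.isna().sum().sum())           # gaps created by the union
```

    This also explains the float columns in the outer result earlier: a column that has to hold NaN is upcast from int64 to float64, whereas the inner result keeps its integers.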

    Concatenating a Series with a DataFrame

    >>> s1 = pd.Series([60,50],index=['shanghai','beijing'],name='meal')
    >>> s1
    shanghai    60
    beijing     50
    Name: meal, dtype: int64
    >>> type(s1)
    <class 'pandas.core.series.Series'>
    >>> df1
               apts   cars
    shanghai  55000  20000
    beijing   60000  30000
    >>> type(df1)
    <class 'pandas.core.frame.DataFrame'>
    >>> pd.concat([df1,s1],axis=1)
               apts   cars  meal
    shanghai  55000  20000    60
    beijing   60000  30000    50
    >>> 
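
    In the example above the new column is called meal because the Series has that name. A brief sketch of what happens when the name is missing: pandas falls back to an integer label. The names s_named and s_unnamed are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'apts': [55000, 60000], 'cars': [20000, 30000]},
                   index=['shanghai', 'beijing'])
s_named = pd.Series([60, 50], index=['shanghai', 'beijing'], name='meal')
s_unnamed = pd.Series([60, 50], index=['shanghai', 'beijing'])

with_name = pd.concat([df1, s_named], axis=1)       # column label is 'meal'
without_name = pd.concat([df1, s_unnamed], axis=1)  # column label is 0
```

    Giving the Series a name up front is usually simpler than renaming the column afterwards.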

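    DataFrame.append adds a Series as a new row, matching the Series index against the column names. pd.concat does not do this directly: with axis=0 the Series becomes an extra column named 0, and with axis=1 it is aligned on the row index. Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the append call below only runs on older versions.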

    >>> s2 = pd.Series([18000,12000],index=['apts','cars'],name='xiamen')
    >>> s2
    apts    18000
    cars    12000
    Name: xiamen, dtype: int64
    >>> df1.append(s2)
               apts   cars
    shanghai  55000  20000
    beijing   60000  30000
    xiamen    18000  12000
    >>> pd.concat([df1,s2],axis=0)
                    0     apts     cars
    shanghai      NaN  55000.0  20000.0
    beijing       NaN  60000.0  30000.0
    apts      18000.0      NaN      NaN
    cars      12000.0      NaN      NaN
    >>> pd.concat([df1,s2],axis=1)
                 apts     cars   xiamen
    apts          NaN      NaN  18000.0
    beijing   60000.0  30000.0      NaN
    cars          NaN      NaN  12000.0
    shanghai  55000.0  20000.0      NaN
    >>> 
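
    On pandas versions without DataFrame.append, one way to get the same row-wise result is to turn the Series into a one-row DataFrame first and then use concat. This is a sketch of that workaround, not the only option; row and appended are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({'apts': [55000, 60000], 'cars': [20000, 30000]},
                   index=['shanghai', 'beijing'])
s2 = pd.Series([18000, 12000], index=['apts', 'cars'], name='xiamen')

# to_frame() gives a single column named 'xiamen'; transposing turns it
# into a single row labelled 'xiamen' with columns 'apts' and 'cars'.
row = s2.to_frame().T
appended = pd.concat([df1, row])
```

    The Series name becomes the row label, which matches what df1.append(s2) produced above.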

    Merging with merge

    >>> df1 = pd.DataFrame({"salaries":[10000,30000,30000,20000,15000],'cities':['suzhou','beijing','shanghai','guanghzou','tianjin']})
    >>> df4 = pd.DataFrame({'apts':[55000,60000],'cars':[15000,12000],'cities':['shanghai','beijing']})
    >>> result = pd.merge(df1,df4,on='cities') # on names the column to merge on
    >>> result
         cities  salaries   apts   cars
    0   beijing     30000  60000  12000
    1  shanghai     30000  55000  15000
    >>> result = pd.merge(df1,df4,on='cities',how='right')
    >>> result
         cities  salaries   apts   cars
    0   beijing     30000  60000  12000
    1  shanghai     30000  55000  15000
    >>> result = pd.merge(df1,df4,on='cities',how='left')
    >>> result
          cities  salaries     apts     cars
    0     suzhou     10000      NaN      NaN
    1    beijing     30000  60000.0  12000.0
    2   shanghai     30000  55000.0  15000.0
    3  guanghzou     20000      NaN      NaN
    4    tianjin     15000      NaN      NaN
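
    Besides the default inner join and the left and right joins shown above, merge also accepts how='outer', and indicator=True adds a _merge column recording where each row came from. The sketch below uses two small hypothetical frames (salaries and housing, including a made-up hangzhou row) rather than the data above.

```python
import pandas as pd

salaries = pd.DataFrame({'cities': ['suzhou', 'beijing', 'shanghai'],
                         'salaries': [10000, 30000, 30000]})
housing = pd.DataFrame({'cities': ['shanghai', 'beijing', 'hangzhou'],
                        'apts': [55000, 60000, 50000]})  # hangzhou value is invented

merged = pd.merge(salaries, housing, on='cities', how='outer', indicator=True)

# Index by city so each row's origin can be looked up directly;
# values are 'left_only', 'right_only' or 'both'.
origin = merged.set_index('cities')['_merge']
```

    Row order in an outer merge can differ between pandas versions, so it is safer to look rows up by key than by position.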
  • Original article: https://www.cnblogs.com/chenyang920/p/8007527.html