• Python data analysis with the pandas library


    pandas has two core data structures: Series and DataFrame.

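    >>> import pandas as pd
    >>> a = pd.Series([1, 2], index=['a', 'b'])   # Series with an explicit label index
    >>> a
    a    1
    b    2
    dtype: int64
    >>> b = pd.Series(['b', 'a'])   # definition missing from the original; inferred from the output below
    >>> b.index
    RangeIndex(start=0, stop=2, step=1)
    >>> b.values
    array(['b', 'a'], dtype=object)
    >>> a / 2   # arithmetic is applied element-wise
    a    0.5
    b    1.0
    dtype: float64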
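
The session above can be condensed into a small script. This is only a sketch: the variable `half` is introduced here for illustration, and `b` is built without an index on the assumption that this is how the author obtained the `RangeIndex` shown above.

```python
import pandas as pd

# A Series with an explicit label index.
a = pd.Series([1, 2], index=["a", "b"])

# Without an index argument, pandas assigns a default RangeIndex.
b = pd.Series(["b", "a"])

print(list(a.index))  # the labels supplied by the caller
print(b.index)        # default integer positions

# Arithmetic is element-wise and keeps the labels.
half = a / 2
print(half)
```

Note that dividing an integer Series produces a float Series, which matches the `float64` dtype in the transcript.
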

    Slicing and selection

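    >>> s = pd.Series([2, 5, 6, 3], index=['a', 'b', 'c', 'd'])   # definition inferred from the output of s shown later
    >>> s[0:3:2]   # positions 0 and 2
    a    2
    c    6
    dtype: int64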
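
A plain `s[0:3:2]` slices by position. A more explicit way to write the same thing, sketched below, is to use `iloc` for positions and `loc` for labels; the names `by_position` and `by_label` are illustrative only.

```python
import pandas as pd

s = pd.Series([2, 5, 6, 3], index=["a", "b", "c", "d"])

# Positional slice: positions 0 and 2; the stop position is excluded.
by_position = s.iloc[0:3:2]

# Label slice: labels "a" through "c"; the stop label is included.
by_label = s.loc["a":"c"]

print(by_position)
print(by_label)
```

The difference in how the end point is treated is the usual source of off-by-one surprises when switching between the two.
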
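    >>> import numpy as np
    >>> arr = np.array([1, 2, 3, 4])   # definition missing from the original; inferred from the output below
    >>> s3 = pd.Series(arr)   # a Series can also be built from a NumPy array
    >>> s3
    0    1
    1    2
    2    3
    3    4
    dtype: int32
    >>> s3 = pd.Series(s)   # or from an existing Series
    >>> s3
    a    2
    b    5
    c    6
    d    3
    dtype: int64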
    >>> s[s > 8]   # no element is greater than 8, so the result is empty
    Series([], dtype: int64)
    >>> s
    a    2
    b    5
    c    6
    d    3
    dtype: int64
    >>> s[s > 3]   # select the elements greater than 3
    b    5
    c    6
    dtype: int64
    >>> np.log(s)   # NumPy functions can be applied directly to a Series
    a    0.693147
    b    1.609438
    c    1.791759
    d    1.098612
    dtype: float64
    >>> s.isin([5, 6])   # membership test; returns a boolean Series
    a    False
    b     True
    c     True
    d    False
    dtype: bool
    >>> s[s.isin([5, 6])]   # use the boolean Series as a filter
    b    5
    c    6
    dtype: int64
    >>> s2 = pd.Series([5, 2, np.nan, 7, np.nan])   # np.nan marks missing values (the np.NaN alias was removed in NumPy 2.0)
    >>> s2
    0    5.0
    1    2.0
    2    NaN
    3    7.0
    4    NaN
    dtype: float64
    >>> s2.isnull()   # True where the value is missing
    0    False
    1    False
    2     True
    3    False
    4     True
    dtype: bool
    >>> s2.notnull()   # True where the value is present
    0     True
    1     True
    2    False
    3     True
    4    False
    dtype: bool
    >>> s2[s2.isnull()]   # select only the missing entries
    2   NaN
    4   NaN
    dtype: float64
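
All of the filters above work the same way: an expression such as `s > 3` or `s2.isnull()` builds a boolean Series, and indexing with it keeps the rows marked `True`. The sketch below extends that idea to cleaning missing values with `dropna` and `fillna`, which the original session does not use; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([2, 5, 6, 3], index=["a", "b", "c", "d"])
s2 = pd.Series([5, 2, np.nan, 7, np.nan])

# A comparison yields a boolean mask; indexing with it filters the Series.
big = s[s > 3]

# Keep only the entries that are present, in two equivalent ways.
present = s2[s2.notnull()]
dropped = s2.dropna()

# Alternatively, replace the missing entries with a default value.
filled = s2.fillna(0)

# Summing a boolean mask counts the True values.
n_missing = int(s2.isnull().sum())

print(big, present, filled, n_missing, sep="\n")
```
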
    

    Using DataFrame

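    >>> fram = pd.DataFrame(np.arange(16).reshape((4, 4)),
    ...                     index=['red', 'yellow', 'blue', 'black'],
    ...                     columns=['id', 'name', 'age', 'home'])   # definition inferred from the output of fram shown later
    >>> frame2 = pd.DataFrame(fram, columns=['name', 'age'])   # keep only two of the columns
    >>> frame2
            name  age
    red        1    2
    yellow     5    6
    blue       9   10
    black     13   14
    >>> frame2.values   # the underlying NumPy array
    array([[ 1,  2],
           [ 5,  6],
           [ 9, 10],
           [13, 14]])
    >>> frame2.index   # row labels
    Index([u'red', u'yellow', u'blue', u'black'], dtype='object')
    >>> frame2.columns   # column labels
    Index([u'name', u'age'], dtype='object')
    >>> frame2['name']   # select a column by name
    red        1
    yellow     5
    blue       9
    black     13
    Name: name, dtype: int32
    >>> frame2.name   # attribute access selects the same column
    red        1
    yellow     5
    blue       9
    black     13
    Name: name, dtype: int32
    >>> frame2.age
    red        2
    yellow     6
    blue      10
    black     14
    Name: age, dtype: int32
    >>> frame2.loc[['red']]   # frame2[index=['red']] is a syntax error; use .loc to select rows by label
         name  age
    red     1    2
    >>> frame2[0:2]   # a slice selects rows by position
            name  age
    red        1    2
    yellow     5    6
    >>> frame2['name'].iloc[2]   # positional access within a column; plain [2] is deprecated on a labelled Series
    9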
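
The selections above can also be written with the explicit `loc` (label-based) and `iloc` (position-based) indexers. In this sketch `fram` is an assumed reconstruction based on the values printed in this article, since the original never shows how it was created.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of `fram`, inferred from the printed outputs.
fram = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["red", "yellow", "blue", "black"],
    columns=["id", "name", "age", "home"],
)
frame2 = pd.DataFrame(fram, columns=["name", "age"])

row = frame2.loc["red"]            # one row by label, returned as a Series
first_two = frame2.iloc[0:2]       # the first two rows by position
cell = frame2.loc["blue", "name"]  # a single value by row and column label

print(row, first_two, cell, sep="\n")
```

Using `loc` with both a row and a column label avoids the chained form `frame2['name'][2]`, which mixes a label lookup with a positional one.
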
    >>> s.idxmin()   # index label of the smallest value
    'a'
    >>> s.idxmax()   # index label of the largest value
    'c'
    >>> s.index.is_unique   # True when no index label is repeated
    True
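
As a small illustration of why `is_unique` matters, the sketch below contrasts the Series used above with a hypothetical Series `dup` whose index repeats a label. `dup` does not appear in the original and is introduced only for this example.

```python
import pandas as pd

s = pd.Series([2, 5, 6, 3], index=["a", "b", "c", "d"])

# idxmax/idxmin return labels, which can be fed back into .loc.
largest_label = s.idxmax()
smallest_label = s.idxmin()
largest_value = s.loc[largest_label]

# Hypothetical Series with a repeated label.
dup = pd.Series([1, 2, 3], index=["a", "a", "b"])
dup_unique = dup.index.is_unique

# With a repeated label, .loc returns every matching row as a Series.
dup_a = dup.loc["a"]

print(largest_label, smallest_label, dup_unique)
print(dup_a)
```
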
    >>> fram
            id  name  age  home
    red      0     1    2     3
    yellow   4     5    6     7
    blue     8     9   10    11
    black   12    13   14    15
    >>> frame4 = fram.drop(['name', 'age'], axis=1)   # drop columns; axis=1 means columns
    >>> frame4
            id  home
    red      0     3
    yellow   4     7
    blue     8    11
    black   12    15
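
One detail worth making explicit is that `drop` returns a new DataFrame and leaves the original untouched. The sketch below shows this, together with the `columns=` and `index=` keyword forms as an alternative to `axis`; `fram` is again an assumed reconstruction from the printed output, and `same` and `no_red` are illustrative names.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of `fram`, inferred from the printed outputs.
fram = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["red", "yellow", "blue", "black"],
    columns=["id", "name", "age", "home"],
)

frame4 = fram.drop(["name", "age"], axis=1)  # drop two columns
same = fram.drop(columns=["name", "age"])    # keyword form of the same call
no_red = fram.drop(index=["red"])            # drop a row by label

print(frame4)
print(list(fram.columns))  # the original still has all four columns
```
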
    >>> f = lambda x: x.max() - x.min()   # custom function: the range of the values
    >>> fram.apply(f)   # applied to each column by default
    id      12
    name    12
    age     12
    home    12
    dtype: int64
    >>> fram.apply(f, axis=1)   # axis=1 applies the function to each row
    red       3
    yellow    3
    blue      3
    black     3
    dtype: int64
    >>> fram.apply(f, axis=0)   # axis=0 is the default: one result per column
    id      12
    name    12
    age     12
    home    12
    dtype: int64
    >>> def f(x):   # returning a Series produces one output row per entry
    ...     return pd.Series([x.min(), x.max()], index=['min', 'max'])
    ...
    >>> fram.apply(f)
         id  name  age  home
    min   0     1    2     3
    max  12    13   14    15
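
The `axis` argument decides what the function receives: with the default `axis=0` it is called once per column, and with `axis=1` once per row. The sketch below restates this with a named function, `value_range`, which is an illustrative name, and compares the result with the equivalent built-in reductions. `fram` is an assumed reconstruction from the printed output.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of `fram`, inferred from the printed outputs.
fram = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["red", "yellow", "blue", "black"],
    columns=["id", "name", "age", "home"],
)


def value_range(x):
    """Return the spread between the largest and smallest value."""
    return x.max() - x.min()


per_column = fram.apply(value_range)      # one result per column
per_row = fram.apply(value_range, axis=1) # one result per row

# The same column-wise result using built-in reductions.
vectorized = fram.max() - fram.min()

print(per_column, per_row, sep="\n")
```

For simple reductions like this one, the built-in methods are usually the more direct choice, and `apply` is most useful when no built-in equivalent exists.
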

    Descriptive statistics for a DataFrame

    >>> fram.describe()
                  id       name        age       home
    count   4.000000   4.000000   4.000000   4.000000
    mean    6.000000   7.000000   8.000000   9.000000
    std     5.163978   5.163978   5.163978   5.163978
    min     0.000000   1.000000   2.000000   3.000000
    25%     3.000000   4.000000   5.000000   6.000000
    50%     6.000000   7.000000   8.000000   9.000000
    75%     9.000000  10.000000  11.000000  12.000000
    max    12.000000  13.000000  14.000000  15.000000
    >>> fram.sum()
    id      24
    name    28
    age     32
    home    36
    dtype: int64
    >>> fram.mean()
    id      6.0
    name    7.0
    age     8.0
    home    9.0
    dtype: float64
    >>> fram.min()
    id      0
    name    1
    age     2
    home    3
    dtype: int32
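
The reductions above run down each column. As a sketch of two related points, they can also run across each row with `axis=1`, and the `std` figure of 5.163978 reported by `describe()` is the sample standard deviation (`ddof=1`). `fram` is an assumed reconstruction from the printed output, and the other variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of `fram`, inferred from the printed outputs.
fram = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["red", "yellow", "blue", "black"],
    columns=["id", "name", "age", "home"],
)

col_mean = fram.mean()      # one mean per column
row_sum = fram.sum(axis=1)  # one sum per row

# pandas defaults to the sample standard deviation (ddof=1).
sample_std = fram["id"].std()
# Passing ddof=0 gives the population standard deviation instead.
population_std = fram["id"].std(ddof=0)

print(col_mean, row_sum, sample_std, population_std, sep="\n")
```
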
  • Original article: https://www.cnblogs.com/caojunjie/p/6702307.html