• pandas操作数据库


    1.pandas需要配合sqlalchemy的使用

    import pandas as pd
    from sqlalchemy import create_engine
    
    engine = create_engine("mysql+pymysql://%s:%s@%s:3306/%s?charset=utf8" % (TEST_DB.user, TEST_DB.password, TEST_DB.host, TEST_DB.db))
    
    exec_sql = ''
    source_table = pd.read_sql(exec_sql, engine)

    2.遍历数据

    # index代表索引,从0开始
    # v代表数据库中的数据
    for index,v in source_table.iterrows():
            print(index, v)
    
    
    
    # 精确取出数据,需要注意的是,取出的数据值,是一个series类型数据,不是string,需要string()后才可使用split
    v["字段名称"]即可

    3.更改数据

    # 使用iloc方法,
    source_table.iloc[index,2] = int(result)
    
    >>> df = pd.DataFrame(mydict)
            >>> df
                  a     b     c     d
            0     1     2     3     4
            1   100   200   300   400
            2  1000  2000  3000  4000
    
            **Indexing just the rows**
    
            With a scalar integer.
    
            >>> type(df.iloc[0])
            <class 'pandas.core.series.Series'>
            >>> df.iloc[0]
            a    1
            b    2
            c    3
            d    4
            Name: 0, dtype: int64
    
            With a list of integers.
    
            >>> df.iloc[[0]]
               a  b  c  d
            0  1  2  3  4
            >>> type(df.iloc[[0]])
            <class 'pandas.core.frame.DataFrame'>
    
            >>> df.iloc[[0, 1]]
                 a    b    c    d
            0    1    2    3    4
            1  100  200  300  400
    
            With a `slice` object.
    
            >>> df.iloc[:3]
                  a     b     c     d
            0     1     2     3     4
            1   100   200   300   400
            2  1000  2000  3000  4000
    
            With a boolean mask the same length as the index.
    
            >>> df.iloc[[True, False, True]]
                  a     b     c     d
            0     1     2     3     4
            2  1000  2000  3000  4000
    
            With a callable, useful in method chains. The `x` passed
            to the ``lambda`` is the DataFrame being sliced. This selects
            the rows whose index label even.
    
            >>> df.iloc[lambda x: x.index % 2 == 0]
                  a     b     c     d
            0     1     2     3     4
            2  1000  2000  3000  4000
    
            **Indexing both axes**
    
            You can mix the indexer types for the index and columns. Use ``:`` to
            select the entire axis.
    
            With scalar integers.
    
            >>> df.iloc[0, 1]
            2
    
            With lists of integers.
    
            >>> df.iloc[[0, 2], [1, 3]]
                  b     d
            0     2     4
            2  2000  4000
    
            With `slice` objects.
    
            >>> df.iloc[1:3, 0:3]
                  a     b     c
            1   100   200   300
            2  1000  2000  3000
    
            With a boolean array whose length matches the columns.
    
            >>> df.iloc[:, [True, False, True, False]]
                  a     c
            0     1     3
            1   100   300
            2  1000  3000
    
            With a callable function that expects the Series or DataFrame.
    
            >>> df.iloc[:, lambda df: [0, 2]]
                  a     c
            0     1     3
            1   100   300
            2  1000  3000
            """

    4.删除字段

    # axis=1代表列
    values_table = source_table.drop('字段名', axis=1)

    5.数据库更新

    .to_sql()更新数据时,con必须使用"sqlalchemy",如果使用pymysql会报错

     6.选择某些列

    import pandas as pd
    
    # 从Excel中读取数据,生成DataFrame数据
    # 导入Excel路径和sheet name
    df = pd.read_excel(excelName, sheet_name=sheetName)
    
    # 读取某些列,生成新的DataFrame
    newDf = pd.DataFrame(df, columns=[column1, column2, column3])

    7.读取某些列,并根据某个列的值筛选行

    newDf = pd.DataFrame(df, columns=[column1, column2, column3])[(df.column1 == value1) & (df.column2 == value2)]

    8.添加新的列

    # 第一种直接赋值
    df["newColumn"] = newValue
    
    # 第二种用concat组合两个DataFrame
    pd.concat([oldDf, newDf])

    9.更改某一列的值

    # 第一种,replace
    df["column1"] = df["column1"].replace(oldValue, newValue)
    
    # 第二种,map
    df["column1"] = df["column1"].map({oldValue: newValue})
    
    # 第三种,loc
    # 将column2 中某些行(通过column1中的value1来过滤出来的)的值为value2
    df.loc[df["column1"] == value1, "column2"] = value2

    10.填充缺失值

    # fillna填充缺失值
    df["column1"] = df["column1"].fillna(value1)

    11.过滤出某些列

    Examples
    
    df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6])),
                      index=['mouse', 'rabbit'],
                      columns=['one', 'two', 'three'])
    df
            one  two  three
    mouse     1    2      3
    rabbit    4    5      6
    # select columns by name
    df.filter(items=['one', 'three'])
             one  three
    mouse     1      3
    rabbit    4      6
    # select columns by regular expression
    df.filter(regex='e$', axis=1)
             one  three
    mouse     1      3
    rabbit    4      6
    # select rows containing 'bbi'
    df.filter(like='bbi', axis=0)
             one  two  three
    rabbit    4    5      6

    12.mean()用法

    Pandas Series.mean()函数返回给定Series对象中基础数据的平均值。

    用法: Series.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)


    参数:
    axis:要应用的功能的轴。
    skipna:计算结果时排除NA /null值。
    level:如果轴是MultiIndex(分层),则沿特定级别计数,并折叠成标量。
    numeric_only:仅包括float,int,boolean列。
    **kwargs:要传递给函数的其他关键字参数。

    返回:均值:标量或系列(如果指定级别)

    # 求和,求平均:
    
    import pandas as pd
    student = pd.read_excel("C:/Users/Administrator/Desktop/Students.xlsx",index_col="ID")
    temp = student[["Test_1","Test_2","Test_3"]]
    student["total"] = temp.sum(axis=1)#axis 0为列,1为行
    student["avg"] = temp.mean(axis=1)
    print(student)

    #算各科成绩平均,求和:
    
    col_mean = student[["Test_1","Test_2","Test_3","total","avg"]].mean()
    col_mean["Name"]="Summary"
    student = student.append(col_mean,ignore_index=True)
    student[["Test_1","Test_2","Test_3","total","avg"]] = student[["Test_1","Test_2","Test_3","total","avg"]].astype(int)
    print(student)

    转自https://www.cnblogs.com/jiangxinyang/p/9672785.html

    转自https://blog.csdn.net/glittledream/article/details/87902161

  • 相关阅读:
    DNS 原理入门
    DiG HOWTO How to use dig to query DNS name servers.
    Top 10 Free Wireless Network hacking/monitoring tools for ethical hackers and businesses
    LoadRunner:视频教程、课件
    Android:开发环境搭建相关问题
    Android:使用代理服务器安装SDKs
    JavaSe:Cookie 管理的API介绍
    使用 Eclipse 玩转 C、C++
    C++: 主要知识点
    C、C++: 引用、指针、实例、内存模型、namespace
  • 原文地址:https://www.cnblogs.com/ttyypjt/p/13692444.html
Copyright © 2020-2023  润新知