某些SQL查询语句与Pandas语句可以互相转换。
以下表(命名为df)为例:
sex tip total_bill
0 Female 1.01 16.99
1 Male 1.66 10.34
2 Male 3.50 23.68
3 Male 3.31 23.68
4 Female 3.61 24.59
import pandas as pd import numpy as np df = pd.DataFrame({'total_bill': [16.99, 10.34, 23.68, 23.68, 24.59], 'tip': [1.01, 1.66, 3.50, 3.31, 3.61], 'sex': ['Female', 'Male', 'Male', 'Male', 'Female']})
1,选择
SQL:SELECT ... FROM ...
SELECT total_bill, tip FROM df
Pandas:loc, iloc
df.loc[:, ['total_bill', 'tip']]
2,条件过滤
SQL:WHERE
SELECT * FROM df WHERE total_bill > 20
Pandas:df[df[colunm] boolean expr],.query()
df[df['total_bill'] > 20]
df.query('total_bill > 20')
3,限制行数
SQL:LIMIT
SELECT total_bill, tip FROM df LIMIT 3
Pandas:loc, iloc,.head()
df.loc[:2, ['total_bill', 'tip']]
df[['total_bill', 'tip']].head(3)
4,跳过指定行数
SQL:OFFSET
SELECT total_bill, tip
FROM df
LIMIT 3 OFFSET 1
Pandas:loc, iloc,.tail()
df.loc[1:3, ['total_bill', 'tip']]
df.loc[:3, ['total_bill', 'tip']].tail(3)
5,不重复的值
SQL:DISTINCT
SELECT DISTINCT sex FROM df
Python:.unique()
df['sex'].unique()
6,条件连接
SQL:AND,OR
SELECT * FROM df WHERE sex = "Female" AND tip > 2
Pandas:&,|
df[(df['sex']=='Female') & (df['tip']>2)]
7,排序
SQL:ORDER BY
SELECT * FROM df ORDER BY total_bill DESC, tip
Pandas:.sort_values()
df.sort_values(['total_bill', 'tip'], ascending=[False, True])
8,在/不在...里面
SQL:IN,NOT IN
SELECT * FROM df WHERE tip NOT IN (1.66, 3.31)
Pandas:.isin(),~
df[~df['tip'].isin([1.66, 3.31])]
9,是否是空值
SQL:IS NULL,IS NOT NULL
SELECT total_bill FROM df WHERE total_bill IS NOT NULL
Pandas:.isnull(),.notnull()
df.total_bill[df['total_bill'].notnull()]
10,分组
SQL:GROUP BY
SELECT sex, COUNT(tip) FROM df GROUP BY sex
Pandas:.groupby()
df.groupby('sex')['tip'].count()
11,计数
SQL:COUNT
SELECT sex, COUNT(tip) FROM df GROUP BY sex
Pandas:.count(), .size()
df.groupby('sex')['tip'].count()
注:.count()返回非空字符出现的次数,相当于SQL中的COUNT(col_name);而.size()返回所有行数,相当于SQL中的COUNT(*)。
12,对分组结果进行条件过滤
SQL:HAVING
SELECT sex, COUNT(tip) FROM df GROUP BY sex HAVING COUNT(tip) > 2
Pandas:
temp=df.groupby("sex")["tip"].count() temp[temp>2]
13,聚合函数
SQL: MIN,MAX,AVG,SUM
SELECT sex, MAX(tip), SUM(total_bill) FROM df GROUP BY sex
Pandas:.agg()
df.groupby('sex').agg({'tip': np.max, 'total_bill': np.sum})
14,表连接
SQL:JOIN,LEFT JOIN,RIGHT JOIN
Pandas:.merge()
15, 表并集
SQL:UNION,UNION ALL
Pandas:pd.concat() + .drop_dupplicates(),pd.concat()