pandas-20: Basic DataFrame() operations
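A pandas DataFrame feels a lot like a NumPy matrix, except that it carries column names and index labels, which makes it more convenient to work with in practice.
For example: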
df = pd.read_clipboard()
df.columns
df.Ratings
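Because pd.read_clipboard() depends on whatever happens to be on the clipboard, the snippet above is hard to reproduce. The sketch below is one possible workaround, not part of the original walkthrough: it hard-codes the same table as tab-separated text (the TSV string and its name are assumptions made for illustration) and parses it with pd.read_csv. It also shows that attribute access such as df.Ratings only works for column names that are valid Python identifiers, so a column like 'Programming Language' has to be accessed with brackets.

```python
import io

import pandas as pd

# Tab-separated copy of the table used in this post. It is hard-coded here
# (an assumption for illustration) so the example does not need the clipboard.
TSV = (
    "Oct 2018\tOct 2017\tChange\tProgramming Language\tRatings\tChange.1\n"
    "1\t1\t\tJava\t17.801%\t+5.37%\n"
    "2\t2\t\tC\t15.376%\t+7.00%\n"
    "3\t3\t\tC++\t7.593%\t+2.59%\n"
    "4\t5\tchange\tPython\t7.156%\t+3.35%\n"
    "5\t8\tchange\tVisual Basic .NET\t5.884%\t+3.15%\n"
)

# Parse the text as a tab-separated table; empty fields become NaN.
df = pd.read_csv(io.StringIO(TSV), sep="\t")

# Column labels as a plain Python list.
columns = list(df.columns)

# Attribute access and bracket access return the same Series.
ratings_attr = df.Ratings
ratings_item = df["Ratings"]

# A column name containing a space can only be reached with brackets.
language = df["Programming Language"]

print(columns)
print(language)
```

Splitting on tabs rather than on whitespace keeps multi-word values such as 'Visual Basic .NET' in a single column.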
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
# Open the source page in a browser
#import webbrowser
#link = 'https://www.tiobe.com/tiobe-index'
#webbrowser.open(link)
# Read the table that was copied to the clipboard
df = pd.read_clipboard()
print(df)
'''
   Oct 2018  Oct 2017  Change Programming Language  Ratings Change.1
0         1         1     NaN                 Java  17.801%   +5.37%
1         2         2     NaN                    C  15.376%   +7.00%
2         3         3     NaN                  C++   7.593%   +2.59%
3         4         5  change               Python   7.156%   +3.35%
4         5         8  change    Visual Basic .NET   5.884%   +3.15%
'''
print(type(df)) # <class 'pandas.core.frame.DataFrame'>
# Print all column names
print(df.columns)
'''
Index(['Oct 2018', 'Oct 2017', 'Change', 'Programming Language', 'Ratings',
'Change.1'],
dtype='object')
'''
# Print the values of a single column
print(df.Ratings)
'''
0 17.801%
1 15.376%
2 7.593%
3 7.156%
4 5.884%
Name: Ratings, dtype: object
'''
# A column can also be accessed dictionary-style
print(df['Ratings'])
'''
0 17.801%
1 15.376%
2 7.593%
3 7.156%
4 5.884%
Name: Ratings, dtype: object
'''
print(type(df['Ratings']))  # each column is a Series: <class 'pandas.core.series.Series'>
# Pick some of the columns to build a new DataFrame
df_new = DataFrame(df, columns=['Change', 'Ratings'])
print(df_new)
df_new = DataFrame(df, columns=['Change', 'Ratings', 'name'])
print(df_new)
'''
   Change  Ratings  name
0     NaN  17.801%   NaN
1     NaN  15.376%   NaN
2     NaN   7.593%   NaN
3  change   7.156%   NaN
4  change   5.884%   NaN
A newly added column is filled with NaN by default.
'''
# Assign values to the new column
df_new['name'] = range(0, 5)
print(df_new)
'''
   Change  Ratings  name
0     NaN  17.801%     0
1     NaN  15.376%     1
2     NaN   7.593%     2
3  change   7.156%     3
4  change   5.884%     4
'''
# Overwrite the values of an existing column
df_new['Change'] = np.arange(0, 5)
print(df_new)
'''
   Change  Ratings  name
0       0  17.801%     0
1       1  15.376%     1
2       2   7.593%     2
3       3   7.156%     3
4       4   5.884%     4
'''
# Assigning a Series directly also works
df_new['Ratings'] = pd.Series(np.arange(0, 5))
print(df_new)
# Assign only some rows of a column: the Series is aligned on its index, and the other rows become NaN
df_new['name'] = pd.Series([100, 300], index = [1, 2])
print(df_new)
'''
   Change  Ratings   name
0       0        0    NaN
1       1        1  100.0
2       2        2  300.0
3       3        3    NaN
4       4        4    NaN
'''
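The last step is worth a closer look. Assigning a Series to a column aligns on index labels, so the whole column is replaced: rows 1 and 2 get the new values, every other row becomes NaN, and the column turns into float. The sketch below reproduces that on a small stand-in frame and contrasts it with .loc, which updates only the selected cells. The names frame and score are made up for illustration and do not appear in the script above.

```python
import numpy as np
import pandas as pd

# Small stand-in for df_new (hypothetical data, same shape as in the post).
frame = pd.DataFrame({"Change": np.arange(5), "Ratings": np.arange(5)})

# Assigning a Series aligns on index labels: rows 1 and 2 receive values,
# all other rows become NaN, so the column is stored as float64.
frame["name"] = pd.Series([100, 300], index=[1, 2])

# To update a few cells and keep the rest of the column untouched,
# create the column first and then assign through .loc.
frame["score"] = 0  # "score" is a hypothetical column used only here
frame.loc[[1, 2], "score"] = [100, 300]

print(frame)
print(frame.dtypes)
```

Which form to use depends on intent: plain Series assignment is convenient when the missing rows really should be NaN, while .loc is the safer choice when the existing values in the other rows have to survive.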