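1. Why learn NumPy?
- NumPy can run complex computations on an entire array without the explicit loops a list requires
- Its ndarray is a memory-efficient container that provides fast array-based numerical operations
- It is a prerequisite for learning pandas
A quick benchmark showing that NumPy outperforms a list: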
import numpy as np
my_arr = np.arange(1000000)
my_list = list(range(1000000))
%time for _ in range(10): my_arr2 = my_arr * 2 # Wall time: 25 ms
%time for _ in range(10): my_list2 = [x * 2 for x in my_list] # Wall time: 933 ms
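The `%time` magic only works in IPython or Jupyter. As a rough sketch, the same comparison can be run in a plain Python script with the standard `timeit` module. The variable names `t_numpy`, `t_list`, and `same_result` are chosen here for illustration, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np

n = 1_000_000
my_arr = np.arange(n)
my_list = list(range(n))

# Time 10 repetitions of each approach, mirroring the %time loops above.
t_numpy = timeit.timeit(lambda: my_arr * 2, number=10)
t_list = timeit.timeit(lambda: [x * 2 for x in my_list], number=10)

print(f"NumPy: {t_numpy:.3f} s, list comprehension: {t_list:.3f} s")

# Both approaches compute the same values.
same_result = (my_arr * 2).tolist() == [x * 2 for x in my_list]
print(same_result)
```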
2. NumPy basics
2.1. Creating an np.ndarray
Note: a NumPy array holds elements of a single data type (dtype).
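A small sketch of what the single-dtype rule means in practice: when the input mixes types, NumPy converts every element to one common dtype. The variable names are illustrative only.

```python
import numpy as np

ints = np.array([1, 2, 3])        # all integers: an integer dtype
mixed = np.array([1, 2.5, 3])     # the integers are upcast to float64
with_text = np.array([1, "a"])    # every element becomes a string

print(ints.dtype, mixed.dtype, with_text.dtype)

# The dtype can also be requested explicitly.
forced = np.array([1, 2, 3], dtype=np.float64)
print(forced)
```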
# Method 1: np.array()
## 1-D
a = np.array([1,2,3])
a.shape # (3,)
a.dtype # an integer dtype such as int64 (platform-dependent); other dtypes include bool, str, float
a.ndim # 1
## 2-D
a = np.array([[0,1,2],[3,4,5]])
# Method 2: use constructor functions (arange, linspace, ones, zeros, eye, diag, random)
a = np.arange(10)
a = np.linspace(0,1,6, endpoint=False)
a = np.ones((3,3))
a = np.zeros((3,3))
a = np.eye(3)
a = np.diag(np.array([1,2,3,4]))
a = np.triu(np.ones((3,3)),1)
# Method 3: Random values
a = np.random.rand(4) # uniform in [0, 1)
a = np.random.randn(4) # standard Gaussian
np.random.seed(1234) # fix the seed so that later random draws are reproducible
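The effect of seeding can be checked with a short sketch: resetting the same seed before each draw yields identical numbers. The last lines use `np.random.default_rng`, the newer Generator interface available in NumPy 1.17 and later, which is not part of the notes above and is shown only as an alternative.

```python
import numpy as np

np.random.seed(1234)
first = np.random.rand(4)

np.random.seed(1234)            # reset to the same state
second = np.random.rand(4)

print(np.array_equal(first, second))

# Alternative: a local generator object with its own seed.
rng = np.random.default_rng(1234)
sample = rng.standard_normal(4)
print(sample.shape)
```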
2.2. Indexing and Slicing
- A slice creates a view on the original array (changes to the slice affect the original array)
# 1-D
a = np.arange(10)
a[5], a[-1] # Index: 5, 9
a[5:8] = 12 # Slice: the elements at indices 5, 6, and 7 are set to 12
a[5:8].copy() # Copy of the slice, independent of the original
# 2-D
a = np.ones((3,3))
a[2] # third row
a[2].copy() # copy of the row, not a view
a[0][2] # a single element; equivalent to a[0, 2]
a[:2] # first two rows
a[:2, 1:] = 0 # first two rows, columns 1 onward, set to 0
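The difference between a view and a copy can be illustrated with a minimal sketch. `np.shares_memory` is used here only to make the distinction visible; the names `view` and `snapshot` are illustrative.

```python
import numpy as np

a = np.arange(10)
view = a[5:8]            # a slice is a view on a
view[:] = 12             # writing through the view changes a
print(a)

b = np.arange(10)
snapshot = b[5:8].copy() # an independent copy
snapshot[:] = 12         # b is left untouched
print(b)

print(np.shares_memory(a, view), np.shares_memory(b, snapshot))
```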
Boolean indexing
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.random.randn(7, 4)
data[names == 'Bob'] # select the rows of data where the corresponding name equals 'Bob'
data[~(names == 'Bob')] # rows where the name is not 'Bob'
data[(names == 'Bob') | (names == 'Will')] # rows where the name is 'Bob' or 'Will'
data[data<0] = 0 # set all negative values to 0
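Because the example above uses random data, the selected rows are hard to check by eye. The following sketch repeats the same masks on a small deterministic array; the shortened `names` array and the `mask` variable are assumptions made for illustration.

```python
import numpy as np

names = np.array(["Bob", "Joe", "Will", "Bob"])
data = np.arange(12).reshape(4, 3)    # one row per name

mask = names == "Bob"                 # [ True False False  True]
bob_rows = data[mask]                 # rows 0 and 3
others = data[~mask]                  # rows 1 and 2
bob_or_will = data[(names == "Bob") | (names == "Will")]  # rows 0, 2, 3

# Unlike a slice, boolean indexing returns a copy.
selected = data[mask]
selected[:] = -1                      # data itself is unchanged
print(bob_rows, others, bob_or_will, data, sep="\n")
```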
2.3. Universal Functions
A universal function (ufunc) performs element-wise operations on the data in ndarrays.
a = np.arange(10)
b = np.arange(2,12)
# unary
a + 1
a*2
np.sqrt(a)
np.exp(a)
np.sin(a)
# binary
a>b # returns a boolean ndarray
np.array_equal(a,b) # True only if both arrays have the same shape and elements
np.maximum(a, b) # element-wise maximum of each pair of values
np.logical_or(a,b) # element-wise OR; for non-boolean input, nonzero values count as True
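A short sketch with small arrays makes the element-wise behaviour easy to verify by hand. The arrays here are chosen for illustration and differ from the ones above.

```python
import numpy as np

a = np.arange(5)                   # [0 1 2 3 4]
b = np.array([4, 3, 2, 1, 0])

doubled = a * 2                    # [0 2 4 6 8]
larger = np.maximum(a, b)          # [4 3 2 3 4]
greater = a > b                    # [False False False  True  True]

# logical_or accepts non-boolean input: zero is False, anything else is True.
either = np.logical_or(a, np.zeros(5, dtype=int))  # [False  True  True  True  True]
print(doubled, larger, greater, either, sep="\n")
```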
2.4. Array-oriented programming
- Problem 1
We want to evaluate the function `sqrt(x^2 + y^2)` across a regular grid of values.
The np.meshgrid function takes two 1D arrays and produces two 2D matrices corresponding to all pairs of (x, y) in the two arrays:
points = np.arange(-5, 5, 0.01) # 1000 equally spaced points
xs, ys = np.meshgrid(points, points)
z = np.sqrt(xs ** 2 + ys ** 2)
import matplotlib.pyplot as plt
%matplotlib inline
plt.imshow(z, cmap=plt.cm.gray); plt.colorbar()
plt.title(r"Image plot of $\sqrt{x^2 + y^2}$ for a grid of values")
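To see what `np.meshgrid` returns, it helps to run it on inputs small enough to print. This sketch uses two short arrays of different lengths (an assumption for illustration) so that the row and column roles are distinguishable.

```python
import numpy as np

x = np.array([0, 1, 2])
y = np.array([10, 20])

xs, ys = np.meshgrid(x, y)
# xs repeats x along each row:      [[ 0  1  2], [ 0  1  2]]
# ys repeats y down each column:    [[10 10 10], [20 20 20]]
print(xs)
print(ys)

# The function is then evaluated at every (x, y) pair in one expression.
z = np.sqrt(xs ** 2 + ys ** 2)
print(z.shape)   # (2, 3)
```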
- Problem 2
We have two arrays (x and y) and one boolean array. We want to take the value from x where the boolean is True and the value from y where it is False; np.where() does this:
xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])
result = np.where(cond, xarr, yarr) # array([1.1, 2.2, 1.3, 1.4, 2.5])
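The second and third arguments of np.where can be arrays or scalars. With scalars, np.where can be used for replacement. For example, in a randomly generated array we replace values greater than 0 with 2 and all other values with -2:
arr = np.random.randn(4, 4)
np.where(arr > 0, 2, -2) # positive values become 2, all others become -2
np.where(arr > 0, 2, arr) # positive values become 2, all others are left unchanged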
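The replacement behaviour is easier to check on fixed values than on random ones. The sketch below uses a hand-written 2x2 array (an assumption for illustration) and also shows the single-argument form of `np.where`, which returns the indices where the condition holds.

```python
import numpy as np

arr = np.array([[-1.5, 0.3],
                [2.0, -0.7]])

signs = np.where(arr > 0, 2, -2)      # [[-2  2], [ 2 -2]]
clipped = np.where(arr > 0, 2, arr)   # [[-1.5  2. ], [ 2.  -0.7]]

# With only a condition, np.where returns one index array per dimension.
rows, cols = np.where(arr > 0)        # positions (0, 1) and (1, 0)
print(signs, clipped, rows, cols, sep="\n")
```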
2.5. Mathematical Operations
a = np.random.randn(5, 4)
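np.mean(a) # mean of all elements
np.mean(a, axis = 1) # mean of each row
np.sum(a) # sum of all elements
a.cumsum() # cumulative sum over the flattened array
a.sort() # sorts in place along the last axis
a.argmax() # index of the maximum in the flattened array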
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
np.unique(names) # sorted unique values
sorted(set(names)) # pure-Python equivalent
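The operations in this section can be traced on a small fixed array, where each result is easy to confirm by hand. This is a sketch with illustrative names; `np.unravel_index`, `np.sort`, and the `return_counts` option of `np.unique` are not used in the notes above and are included only to clarify the flattened index, the in-place sort, and the unique values.

```python
import numpy as np

a = np.array([[3, 1, 2],
              [6, 5, 4]])

total = a.sum()                  # 21
col_means = a.mean(axis=0)       # [4.5 3.  3. ]  (one value per column)
row_means = a.mean(axis=1)       # [2. 5.]        (one value per row)
running = a.cumsum()             # [ 3  4  6 12 17 21]

flat_idx = a.argmax()            # 3, an index into the flattened array
row, col = np.unravel_index(flat_idx, a.shape)   # row 1, column 0

sorted_copy = np.sort(a)         # returns a sorted copy; a is unchanged here
a.sort()                         # sorts a itself, row by row

names = np.array(["Bob", "Joe", "Will", "Bob"])
values, counts = np.unique(names, return_counts=True)
print(values, counts)
```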