DataFrame & Series, Columns & Index, Missing values (NaN)
df.index
df.columns
df.values
type(...)
df.dtypes
series.to_frame()
s.value_counts()
s.describe()
s.isnull()
s.fillna(0)
s.dropna()
s.value_counts(normalize=True)
s.hasnans
dataframe.isnull()
df.sum()
pd.read_csv(..., index_col="...")
df.reset_index()
df.rename(index={...}, columns={...})
idx_list = df.index.tolist()
idx_list[1] = ...
df.index = idx_list
df.drop("...", axis="columns")
df.insert(loc=..., column="...", value=[])
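A minimal sketch tying together the missing-value and index calls listed above. The sample frame, its column names, and the labels are made up for illustration and are not part of these notes.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not from the notes).
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "age": [31.0, np.nan, 25.0]},
    index=["a", "b", "c"],
)

s = df["age"]
has_missing = s.hasnans                 # property, not a method
missing_per_col = df.isnull().sum()     # NaN count per column
filled = s.fillna(0)                    # replace NaN with 0
dropped = s.dropna()                    # drop the NaN entries
shares = df["name"].value_counts(normalize=True)  # relative frequencies

# Rename one index label and one column (returns a new frame).
renamed = df.rename(index={"a": "first"}, columns={"age": "years"})

# Replace a single index label via a list round-trip.
idx_list = df.index.tolist()
idx_list[1] = "second"
df.index = idx_list

# Insert a column at position 0 (in place), then drop it in a copy.
df.insert(loc=0, column="id", value=[1, 2, 3])
without_id = df.drop("id", axis="columns")
flat = df.reset_index()                 # old index becomes a regular column
```

Note that `rename`, `drop`, and `reset_index` return new objects here, while `insert` and assigning to `df.index` modify `df` itself.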
Operations
df.filter(like="...")
df.filter(regex="...")
df.count(...)  # counts non-NaN values
df.isnull()
df.sum()
df.head()
df.memory_usage()
df.nunique()
col.astype("category")
df.nlargest(n, "...")
df.sort_values(...)
df.drop_duplicates()
df.iloc[...]  # by integer position
df.loc[...]  # by label
df.columns
df.columns.get_loc(...)
df.col.pct_change()
pd.cut(col, bins)
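A short sketch of several of the operations above on a toy frame. The city and temperature columns are invented for the example, and the bin edges are arbitrary.

```python
import pandas as pd

# Hypothetical sample data (not from the notes).
df = pd.DataFrame({
    "city": ["Oslo", "Rome", "Oslo", "Lima"],
    "temp_c": [4.0, 18.0, 4.0, 22.0],
    "temp_f": [39.2, 64.4, 39.2, 71.6],
})

temps = df.filter(like="temp")              # columns whose name contains "temp"
ends_c = df.filter(regex="_c$")             # columns whose name matches a regex
df["city"] = df["city"].astype("category")  # the dtype name is "category"
top2 = df.nlargest(2, "temp_c")             # n and the column are both required
deduped = df.drop_duplicates()              # removes the repeated Oslo row
by_temp = df.sort_values("temp_c", ascending=False)

first_row = df.iloc[0]                      # selection by integer position
cell = df.loc[1, "temp_c"]                  # selection by label
pos = df.columns.get_loc("temp_f")          # column label -> integer position

change = df["temp_c"].pct_change()          # relative change vs. the previous row
bands = pd.cut(df["temp_c"], bins=[0, 10, 20, 30])  # right-closed intervals
```

`get_loc` lives on the index object (`df.columns` or `df.index`), which makes it handy for feeding a label-derived position to `iloc` or `insert`.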
Tidy data (Hadley Wickham)
- Stack & melt
- vs Unstack & pivot
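One way to see the two pairs side by side, as a sketch: `melt` and `stack` go from wide to long, `pivot` and `unstack` go back. The table and the names `country`, `year`, and `value` are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical wide table: one column per year.
wide = pd.DataFrame({
    "country": ["NO", "IT"],
    "2019": [1, 3],
    "2020": [2, 4],
})

# melt: wide -> long, one row per (country, year) observation.
long = wide.melt(id_vars="country", var_name="year", value_name="value")

# pivot: long -> wide again.
back = long.pivot(index="country", columns="year", values="value")

# stack / unstack do the same reshaping through the index.
stacked = wide.set_index("country").stack()   # Series with a 2-level index
unstacked = stacked.unstack()                 # back to one column per year
```

`melt` and `pivot` work on regular columns, while `stack` and `unstack` move levels between the column labels and the row index.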
The Zen of Python
Combining Pandas Objects
df.loc[len(df)] = {"Age": ...}
pd.concat([df1, df2])
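A small sketch of both ways of adding rows. The `Name` and `Age` columns and their values are placeholders chosen for the example.

```python
import pandas as pd

# Hypothetical sample data (not from the notes).
df = pd.DataFrame({"Name": ["Ann", "Bob"], "Age": [31, 25]})

# Append one row by assigning to the next integer label.
# The dict keys are matched against the column names.
df.loc[len(df)] = {"Name": "Cy", "Age": 40}

more = pd.DataFrame({"Name": ["Di"], "Age": [52]})

# concat keeps the original labels, so label 0 appears twice here.
stacked = pd.concat([df, more])

# ignore_index=True builds a fresh 0..n-1 index instead.
renumbered = pd.concat([df, more], ignore_index=True)
```

The `len(df)` trick assumes the frame has the default 0..n-1 index; with any other index it may overwrite an existing row rather than append one.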
Time Series Analysis
- date
- time
- datetime
- timedelta
- pd.Timestamp
df.between_time()
df.at_time()
df.resample("W")
df.resample("W").size()
df.resample("W", on="col1")
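A sketch of the time-based selection and resampling calls above, assuming hourly data on a `DatetimeIndex`. The column `n`, the dates, and the `when` column name are invented for the example.

```python
import pandas as pd

# Hypothetical hourly data covering two full weeks (2024-01-01 is a Monday).
idx = pd.date_range("2024-01-01", periods=14 * 24, freq="h")
df = pd.DataFrame({"n": 1}, index=idx)

ts = pd.Timestamp("2024-01-01 09:30")
later = ts + pd.Timedelta(hours=2)            # Timestamp plus a timedelta

morning = df.between_time("09:00", "11:00")   # rows between two times of day
at_nine = df.at_time("09:00")                 # rows at exactly 09:00

# "W" groups into weeks ending on Sunday.
weekly_rows = df.resample("W").size()         # number of rows per week
weekly_sum = df.resample("W")["n"].sum()

# on= resamples by a datetime column instead of the index.
flat = df.rename_axis("when").reset_index()
weekly_on = flat.resample("W", on="when")["n"].sum()
```

`between_time` and `at_time` need a `DatetimeIndex`, whereas `resample(..., on=...)` is the option to reach for when the timestamps live in an ordinary column.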
REF
https://gist.github.com/MaximePawlakFr/71a5cfbaef45ad5b0f4f23536752f229