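Introduction to Machine Learning - Numerical Features - Quartile Features: 1. .quantile (returns the values at the given quantile fractions) 2. plt.axvline (draws a vertical line) 3. pd.qcut (bins a feature by quantiles to create a new feature)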

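Function notes:

1. .quantile(cut_list) is called directly on a DataFrame or Series and returns the values at the quantile fractions given in the list. Here it is used to get the quartile values.

2. plt.axvline() draws a vertical line on the current plot.

3. pd.qcut(feature, cut_list, labels) bins a feature by quantiles. cut_list holds the quantile fractions to cut at, and labels holds the names assigned to the resulting bins.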
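To see how .quantile and pd.qcut relate, here is a minimal sketch on a small made-up Series. The income values and the variable names are assumptions for illustration only, not part of the survey data used later.

```python
import pandas as pd

# Hypothetical income values, chosen only to make the arithmetic easy to follow
income = pd.Series([1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000])

cut_list = [0, 0.25, 0.5, 0.75, 1]

# .quantile returns a Series indexed by the requested fractions
quartiles = income.quantile(q=cut_list)
print(quartiles)

# pd.qcut places each value into the quantile bin it falls in
labels = ['0-25Q', '25-50Q', '50-75Q', '75-100Q']
binned = pd.qcut(income, q=cut_list, labels=labels)
print(binned)
```

With eight evenly spaced values, each of the four bins receives two values.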

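Based on the quartile values of a feature, we can derive a new feature that records which quartile range each value falls into.

The quartiles are the values at the 1/4, 1/2 (the median) and 3/4 positions of the sorted data.

For example, we can use the quartiles of salary to give each salary a new quartile feature. The quantile fractions do not have to be quartiles; we can define our own.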
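As a rough illustration of user-defined fractions, the sketch below cuts at the 10%, 50% and 90% points instead of the quartiles. The salary values, the cut points and the label names are all hypothetical.

```python
import pandas as pd

# Hypothetical salaries: the integers 1 to 100
salary = pd.Series(range(1, 101))

# Custom quantile fractions instead of [0, 0.25, 0.5, 0.75, 1]
custom_cuts = [0, 0.1, 0.5, 0.9, 1]
custom_labels = ['bottom 10%', '10-50%', '50-90%', 'top 10%']

salary_band = pd.qcut(salary, q=custom_cuts, labels=custom_labels)

# Count how many salaries land in each band, in band order
counts = salary_band.value_counts().sort_index()
print(counts)
```

The bands are no longer equally sized: the two outer bands hold 10 values each and the two inner bands hold 40 each.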

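Code:

Step 1: Load the data

Step 2: Plot a histogram of the income feature

Step 3: Use .quantile(cut_list) to find the feature values at the fractions in cut_list

Step 4: Use plt.axvline to draw a vertical line at each quartile value

Step 5: Use pd.qcut(data, cut_list, labels) to bin the income feature by quartile. labels gives the names of the resulting bins; if labels is omitted, each bin is named by its value interval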

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Step 1: load the data
fcc_survey_df = pd.read_csv('datasets/fcc_2016_coder_survey_subset.csv', encoding='utf-8')

# Step 2: plot a histogram of the income data
fig, ax = plt.subplots()
fcc_survey_df['Income'].hist(bins=30, color='#A9C5D3')
ax.set_xlabel('Income', fontsize=10)
ax.set_ylabel('Frequency', fontsize=10)
ax.set_title('Income Histogram', fontsize=10)
plt.show()

# Step 3: use .quantile to find the values at the quartile positions
cut_list = [0, 0.25, 0.5, 0.75, 1]
cut_income = fcc_survey_df['Income'].quantile(q=cut_list)
print(cut_income)

# Step 4: redraw the histogram and add vertical lines with plt.axvline
fig, ax = plt.subplots()
fcc_survey_df['Income'].hist(bins=30, color='#A9C5D3', ax=ax)
colors = ['red', 'green', 'blue', 'yellow']
# zip stops after the four colors, so lines are drawn at 0, 0.25, 0.5 and 0.75;
# the line at 1 (the maximum) is left out
for q, color in zip(cut_list, colors):
    plt.axvline(cut_income[q], color=color, label=str(q) + '_line')
plt.legend(fontsize=14)
ax.set_xlabel('Income', fontsize=10)
ax.set_ylabel('Frequency', fontsize=10)
ax.set_title('Income Histogram with Quartiles', fontsize=10)
plt.show()

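# Step 5: use pd.qcut to convert the raw feature into a quartile feature
cut_labels = ['0-25Q', '25-50Q', '50-75Q', '75-100Q']
# Without labels, each bin is named by its value interval
fcc_survey_df['cut_qua'] = pd.qcut(fcc_survey_df['Income'], cut_list)
# With labels, each bin is given the corresponding name from cut_labels
fcc_survey_df['cut_qua_labels'] = pd.qcut(fcc_survey_df['Income'], cut_list, labels=cut_labels)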
print(fcc_survey_df[['Income', 'cut_qua', 'cut_qua_labels']].head())
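
Steps 3 and 5 should agree with each other: the bin edges used by pd.qcut are the same values that .quantile returns. The sketch below checks this on synthetic data, using the retbins=True option of pd.qcut to get the edges back. The random incomes are an assumption standing in for the survey CSV.

```python
import numpy as np
import pandas as pd

# Synthetic incomes standing in for the survey data (not the real dataset)
rng = np.random.default_rng(0)
income = pd.Series(rng.integers(6000, 200000, size=500))

cut_list = [0, 0.25, 0.5, 0.75, 1]

# retbins=True also returns the bin edges that qcut computed
binned, edges = pd.qcut(income, q=cut_list, retbins=True)
quartiles = income.quantile(q=cut_list)

# Compare the qcut edges with the .quantile output
edges_match = np.allclose(edges, quartiles.to_numpy())
print(edges_match)

# Each quartile bin should hold roughly a quarter of the rows
print(binned.value_counts().sort_index())
```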

Original article: https://www.cnblogs.com/my-love-is-python/p/10322024.html