Lending Club 公司2007-2018贷款业务好坏帐分析

1 好贷款商业前景

1.1 按地域分析

IA(艾奥瓦州) 违约率最高，但由于其贷款数量少，总的只有三笔贷款
CA(加利福尼亚)、TX(德克萨斯)、NY(纽约州) 贷款额大，收益率和违约率都在平均水平，是稳定收入来源
HI(夏威夷州) 违约率处于中等水平，但平均收益率达到最高是值得研究学习的，VT(佛蒙特州) 也有如此特点

1.2 收入类型与贷款质量

收入越高的用户倾向于大笔贷款，收入低的用户贷款额相对也低
收入越高的用户坏账率相对较少，而收入低的用户坏账率相对高一些

2 坏贷款深入查看

2.1 坏款率与住房

按揭贷款的用户坏贷款额较高，其次是自有住房用户，最后是租房用户，贷款额逐年增高

2.2 坏款率与贷款利率

相对于低利率水平（<13%）,高利率水平更容易出现坏账
用户在高利率时的长期贷款比比低利率时长期贷款比更高
相对于长期贷款（60月），短期贷款（36月）更容易出现坏账

2.3 坏款率与贷款等级

信用等级良好贷款额逐年上升，信用等级低贷款额在2012年达到顶峰，之后逐年降低
贷款利率随着贷款信用等级的降低而提高,信用等级高的用户利率十年间基本维持在统一水平线，而低信用用户则利率逐年上升，甚至达到 30+%
高信用等级用户坏款率较低，低信用等级用户坏账率较高
C 等级用户也有很高的坏账率，这是一个值得关注的群体

2.4 坏款率与目的

总体上来看，用于教育、小生意的贷款坏账率较高，但教育贷款的利率不是很高，而小生意的利率较高
大部分贷款用于还其他债务，这部分坏账率14.2% 左右
高收入人群用于教育贷款的坏款率较高，达到了50%，其他处于正常水准
中等收入人群用于教育、结婚、小生意的贷款率较高，处于10-20%左右
低收入人群的用于教育和小生意的坏款率较高，处于15-25%左右

附录代码：

#coding:utf-8

import pandas as pd
import numpy as np
import time
import matplotlib.pyplot as plt
import seaborn as sns

plt.rcParams['font.sans-serif'] = ['SimHei']  # 用来正常显示中文标签
plt.rcParams['axes.unicode_minus'] = False    # 用来正常显示负号,#有中文出现的情况，需要u'内容'

# 核心代码，设置显示的最大列、宽等参数，消掉打印不完全中间的省略号
pd.set_option('display.max_columns', 1000)
pd.set_option('display.width', 1000)
pd.set_option('display.max_colwidth', 1000)

start_time = time.process_time()

# # # 读取数据
# loan = pd.read_csv('loan.csv',low_memory = False)
# # df = loan[['issue_d','addr_state','emp_length', 'emp_title', 'annual_inc','home_ownership',
# #          'pub_rec', 'dti','purpose',
# #          'loan_amnt','funded_amnt','funded_amnt_inv','term', 'grade', 'sub_grade','int_rate','loan_status',
# #          'tot_cur_bal','avg_cur_bal', 'total_acc', 'total_bc_limit']].copy()
# df=loan.copy()
# # 将 ‘issue_d’ str形式的时间日期转换
# dt_series = pd.to_datetime(df['issue_d'])
# df['year'] = dt_series.dt.year
# df['month'] = dt_series.dt.month
#
# # 根据收入 'annual_inc' 定义收入“低、中、高”
# # <10万 --> low , 10万~20万 --> medium , >20 万 --> high
# df['income_category'] = np.nan
# lst = [df]
# for col in lst:
#     col.loc[col['annual_inc'] <= 100000, 'income_category'] = 'Low'
#     col.loc[(col['annual_inc'] > 100000) & (col['annual_inc'] <= 200000), 'income_category'] = 'Medium'
#     col.loc[col['annual_inc'] > 200000, 'income_category'] = 'High'
#
# # 根据贷款状态 ‘loan_status’ 定义好坏贷款
# bad_loan = ["Charged Off",
#             "Default",
#             "Does not meet the credit policy. Status:Charged Off",
#             "In Grace Period",
#             "Late (16-30 days)",
#             "Late (31-120 days)"]
# df['loan_condition'] = np.nan
#
# def loan_condition(status):
#     if status in bad_loan:
#         return 'Bad Loan'
#     else:
#         return 'Good Loan'
#
# df['loan_condition'] = df['loan_status'].apply(loan_condition)
# df['loan_condition_int'] = np.nan
# for col in lst:
#     col.loc[df['loan_condition'] == 'Good Loan', 'loan_condition_int'] = 0  # Negative (Bad Loan)
#     col.loc[df['loan_condition'] == 'Bad Loan', 'loan_condition_int'] = 1  # Positive (Good Loan)
# df['loan_condition_int'] = df['loan_condition_int'].astype(int)
#
# # df.to_pickle("second_df.pkl")
# df.to_pickle("loan_condition.pkl")
#

df = pd.read_pickle('second_df.pkl')
df_state = df.groupby('addr_state').agg({'loan_amnt':'sum','annual_inc':'sum','int_rate':'mean',
                                      'loan_condition_int':'sum','loan_status':'count',
                                         'dti':'mean'}).reset_index()
df_state = df_state.rename(columns = {'loan_amnt':'loan_amnt_sum','annual_inc':'annual_inc_sum','int_rate':'int_rate_mean',
                                      'loan_condition_int':'default_num','loan_status':'loan_num','dti':'dti_mean'})
df_state['default_rate'] = df_state['default_num']/df_state['loan_num']*100

# # 3. 好贷款，有前景
# # 3.1 各个地区 贷款量、违约率、收益率之间关系
# print(df_state.sort_values(by='default_rate'))
# print(df_state.sort_values(by='int_rate_mean'))
# print(df_state.sort_values(by='loan_amnt_sum'))
# print(df_state.sort_values(by='annual_inc_sum'))
# print(df_state.sort_values(by='dti_mean'))
# print(df_state.columns)
#
# fig1_1 = plt.figure(figsize=(6,5))
# ax = fig1_1.add_subplot(111)
# plt.scatter(df_state['default_rate'],df_state['int_rate_mean'],color='royalblue',s=40)
# plt.ylabel('平均收益率（%）',fontsize=12)
# plt.xlabel('违约率（%）',fontsize=12)
# plt.title('各洲违约率-平均收益率',fontsize=16)
# plt.axhline(df_state['int_rate_mean'].mean(),color='r',linestyle='--')
# plt.axvline(df_state['default_rate'].median(),color='r',linestyle='--')
# plt.subplots_adjust(top=1,bottom=0,left=0,right=1,hspace=0,wspace=0)
# plt.savefig("3.1.1 2007-2015 Lending Club 各洲违约率-平均收益率.png",dpi=1000,bbox_inches = 'tight')#解决图片不清晰，不完整的问题
# # plt.show()
#
#
# fig1_2 = plt.figure(figsize=(6,5))
# ax = fig1_2.add_subplot(111)
# plt.scatter(df_state['loan_amnt_sum'],df_state['int_rate_mean'],color='royalblue',s=40)
# # plt.ylim(0,25)
# plt.ylabel('平均收益率（%）',fontsize=12)
# plt.xlabel('贷款总额',fontsize=12)
# plt.title('各洲贷款总额-平均收益率',fontsize=16)
# plt.axhline(df_state['default_rate'].median(),color='r',linestyle='--')
# plt.axvline(df_state['loan_amnt_sum'].median(),color='r',linestyle='--')
# plt.subplots_adjust(top=1,bottom=0,left=0,right=1,hspace=0,wspace=0)
# plt.savefig("3.1.2 2007-2015 Lending Club 各洲贷款总额-平均收益率.png",dpi=1000,bbox_inches = 'tight')#解决图片不清晰，不完整的问题
# # plt.show()
#
# fig1_3 = plt.figure(figsize=(6,5))
# ax = fig1_3.add_subplot(111)
# plt.scatter(df_state['loan_amnt_sum'],df_state['default_rate'],color='royalblue',s=40)
# plt.ylim(0,25)
# plt.ylabel('违约率（%）',fontsize=12)
# plt.xlabel('贷款总额',fontsize=12)
# plt.title('各洲贷款总额-违约率',fontsize=16)
# plt.axvline(df_state['loan_amnt_sum'].median(),color='r',linestyle='--')
# plt.axhline(df_state['default_rate'].median(),color='r',linestyle='--')
# plt.subplots_adjust(top=1,bottom=0,left=0,right=1,hspace=0,wspace=0)
# plt.savefig("3.1.3 2007-2015 Lending Club 各洲贷款总额-违约率.png",dpi=1000,bbox_inches = 'tight')#解决图片不清晰，不完整的问题
#
# # 2. 贷款质量与收入分布
# fig2,ax=plt.subplots(1,2,figsize=(12,5))
# # plt.suptitle('贷款质量和收入分布', fontsize=16)
# sns.violinplot(x='income_category',y='loan_amnt',data=df,ax=ax[0],palette='Set2')
# ax[0].set_ylabel("贷款额",fontsize=12)
# ax[0].set_xlabel("收入类型",fontsize=12)
# ax[0].set_title("收入类型-贷款额",fontsize=14)
# sns.violinplot(x='income_category',y='loan_condition_int',data=df,ax=ax[1],palette='Set2')
# ax[1].set_ylabel("贷款好坏",fontsize=12)
# ax[1].set_xlabel("收入类型",fontsize=12)
# ax[1].set_title("收入类型-贷款好坏",fontsize=14)
# plt.subplots_adjust(top=1,bottom=0,left=0,right=1,hspace=0,wspace=0.3)
# plt.savefig("3.2 2007-2015 Lending Club  贷款质量和收入分布情况.png",dpi=1000,bbox_inches = 'tight')#解决图片不清晰，不完整的问题
# plt.show()


# 4. 风险评估
# 1. 每年不同状态贷款分布
df_status=df.groupby(by=['loan_status','year']).loan_amnt.mean().unstack(-1)
print("
每年不同状态贷款分布
",df_status)

# 2. 违约贷款地域分布 TX > NY > CA
df_default=df[df['loan_status']=='Default']
print("
违约贷款地域分布
",df_default.groupby(by='addr_state').loan_amnt.sum().sort_values())

# 3. 各地区债务比分布情况
print("
各地区债务比分布情况
",df_state.sort_values(by='dti_mean'))

# # 4. 相关原因
# # a) 各变量相关性 协方差热力图
# # 与其从负相关到正相关 分别是 total_rec_prncp ,last_pymnt_amnt ,out_prncp
# # --> total_rec_late_fee,int_rate,hardship_dpd,collection_recovery_fee,recoveries
#
df_all = pd.read_pickle('loan_condition.pkl')
all_corr=df_all.corr()
print("
与坏贷款线性相关性
",all_corr['loan_condition_int'].sort_values())
#
# fig4_1,ax=plt.subplots(figsize=(12,12))
# sns.heatmap(all_corr,cmap='bwr',square=True)
# plt.title('变量协方差矩阵图',fontsize=16)
# plt.subplots_adjust(top=1,bottom=0,left=0,right=1,hspace=0.1,wspace=0.1)
# plt.savefig("4.1 2007-2015 变量协方差矩阵图.png",dpi=1000,bbox_inches = 'tight')#解决图片不清晰，不完整的问题
# # plt.show()
#
# # b) 住房类型-坏贷款
# df_bad=df.loc[df['loan_condition_int']==1]
# fig4_2,ax = plt.subplots(2,1,figsize=(11,5))
#
# sns.boxplot(x='home_ownership',y='loan_amnt',hue='loan_condition',
#             data=df_bad,color='royalblue',ax=ax[0])
# ax[0].set_ylabel("贷款额",fontsize=12)
# ax[0].set_xlabel("住房类型",fontsize=12)
# ax[0].set_title("不同住房类型下的坏贷款额分布",fontsize=12)
#
# sns.boxplot(x='year',y='loan_amnt',hue='home_ownership',
#             data=df_bad,palette='Set3',ax=ax[1])
# ax[1].set_ylabel("贷款额",fontsize=12)
# ax[1].set_xlabel("年",fontsize=12)
# ax[1].set_title("不同年份不同住房类型下的坏贷款额分布",fontsize=12)
# plt.subplots_adjust(top=1,bottom=0,left=0,right=1,hspace=0.5,wspace=0.1)
# plt.savefig("4.2 2007-2015 不同住房类型与坏贷款额关系.png",dpi=1000,bbox_inches = 'tight')#解决图片不清晰，不完整的问题
# plt.show()
# #
# # c) 收入类型-坏贷款
# df_purpose = df.groupby(by=['income_category','purpose']).aggregate({'int_rate':'mean','loan_amnt':'sum',
#                                                                    'loan_condition_int':'sum','loan_condition':'count'}).reset_index()
# df_purpose = df_purpose.rename(columns = {'int_rate':'int_rate_mean','loan_amnt':'loan_amnt_sum',
#                                         'loan_condition_int':'bad_loans_count','loan_condition':'all_loans_count'})
# df_purpose['bad_ratio'] = df_purpose['bad_loans_count']/df_purpose['all_loans_count']*100
# 
# # df_purpose.to_csv('df_purpose.csv')
# df_purpose_high_int = df_purpose[df_purpose['income_category'] == 'High']
# df_purpose_medium_int = df_purpose[df_purpose['income_category'] == 'Medium']
# df_purpose_low_int = df_purpose[df_purpose['income_category'] == 'Low']
#
# # 高中低收入类型的贷款目的分布雷达图
# high_lales = df_purpose_high_int['purpose']
# theta1 = np.linspace(0,2*np.pi,len(high_lales),endpoint = False)
# theta1 = np.concatenate((theta1,[theta1[0]]))
# data1 = df_purpose_high_int['bad_ratio']
# data1 = np.concatenate((data1,[data1[0]]))
#
# fig4_3_1 = plt.figure(figsize=(6,5))
# ax1 = fig4_3_1 .add_subplot(111, polar=True) # polar参数！！
# ax1.plot(theta1, data1, 'royalblue', linewidth=2)# 画线
# ax1.fill(theta1, data1, facecolor='r', alpha=0.25)# 填充
# ax1.set_thetagrids(theta1 * 180/np.pi, high_lales, fontproperties="SimHei")
# ax1.set_title("高收入类型坏贷款职业分布", va='bottom', fontproperties="SimHei",fontsize=14)
# ax1.set_rlim(0,70)
# ax1.grid(True)
# plt.subplots_adjust(top=1,bottom=0,left=0,right=1,hspace=0.1,wspace=0.1)
# plt.savefig("4.3.1 2007-2015 高收入类型坏贷款职业分布.png",dpi=1000,bbox_inches = 'tight') #解决图片不清晰，不完整的问题
# plt.show()
#
# medium_labels = df_purpose_medium_int['purpose']
# theta2 = np.linspace(0,2*np.pi,len(medium_labels),endpoint = False)
# theta2 = np.concatenate((theta2,[theta2[0]]))
# data2 = list(df_purpose_medium_int['bad_ratio'])
# data2 = np.concatenate((data2,[data2[0]]))
#
# fig4_3_2 = plt.figure(figsize=(6,5))
# ax2 = fig4_3_2 .add_subplot(111, polar=True) # polar参数！！
# ax2.plot(theta2, data2, 'royalblue', linewidth=2)# 画线
# ax2.fill(theta2, data2, facecolor='g', alpha=0.25)# 填充
# ax2.set_thetagrids(theta2 * 180/np.pi, medium_labels, fontproperties="SimHei")
# ax2.set_title("中收入类型坏贷款职业分布", va='bottom', fontproperties="SimHei",fontsize=14)
# ax2.set_rlim(0,30)
# ax2.grid(True)
# plt.subplots_adjust(top=1,bottom=0,left=0,right=1,hspace=0.1,wspace=0.1)
# plt.savefig("4.3.2 2007-2015 中收入类型坏贷款职业分布.png",dpi=1000,bbox_inches = 'tight') #解决图片不清晰，不完整的问题
# plt.show()
#
# low_labels = df_purpose_low_int['purpose']
# theta3 = np.linspace(0,2*np.pi,len(low_labels),endpoint = False)
# theta3 = np.concatenate((theta3,[theta3[0]]))
# data3 = list(df_purpose_low_int['bad_ratio'])
# data3 = np.concatenate((data3,[data3[0]]))
#
# fig4_3_3 = plt.figure(figsize=(6,5))
# ax3 = fig4_3_3.add_subplot(111, polar=True) # polar参数！！
# ax3.plot(theta3, data3, 'royalblue', linewidth=2)# 画线
# ax3.fill(theta3, data3, facecolor='grey', alpha=0.25)# 填充
# ax3.set_thetagrids(theta3 * 180/np.pi, low_labels, fontproperties="SimHei")
# ax3.set_title("低收入类型坏贷款职业分布", va='bottom', fontproperties="SimHei",fontsize=14)
# ax3.set_rlim(0,30)
# ax3.grid(True)
# plt.subplots_adjust(top=1,bottom=0,left=0,right=1,hspace=0.1,wspace=0.1)
# plt.savefig("4.3.1 2007-2015 低收入类型坏贷款职业分布.png",dpi=1000,bbox_inches = 'tight') #解决图片不清晰，不完整的问题
# # plt.show()
#
# # d) 贷款利率水平-坏贷款
# # print(df['int_rate'].describe())
# df['int_payments'] = np.nan
# lst = [df]
# for col in lst:
#     col.loc[col['int_rate'] <= 13.09, 'int_payments'] = 'Low'
#     col.loc[col['int_rate'] > 13.09, 'int_payments'] = 'High'
#
# fig4_4 = plt.figure(figsize=(12,5))
# palette = ['royalblue','darkorange']
# plt.subplot(121)
# ax=sns.countplot(x='int_payments', hue='loan_condition', data = df,palette=palette)
# ax.set_xlabel('利率水平',fontsize=12)
# ax.set_ylabel('贷款额',fontsize=12)
# ax.set_title("不同贷款利率水平下好坏贷款额度情况",fontsize=14)
#
# plt.subplot(122)
# ax1=sns.countplot(x='int_payments', hue='term', data = df,palette=palette)
# ax1.set_xlabel('利率水平',fontsize=12)
# ax1.set_ylabel('贷款额',fontsize=12)
# ax1.set_title("不同贷款利率水平下长短期贷款额度情况",fontsize=14)
# plt.subplots_adjust(top=1,bottom=0,left=0,right=1,hspace=0.1,wspace=0.1)
# plt.savefig("4.4 2007-2015 贷款利率与贷款额关系.png",dpi=1000,bbox_inches = 'tight')#解决图片不清晰，不完整的问题
# # plt.show()
#
#
# # e) 贷款信用等级-坏贷款
# #  每年各个贷款等级贷款额、贷款利率关系
# df_score_loan_amnt = df.groupby(by=['year','grade']).loan_amnt.mean()
# df_score_loan_amnt = df_score_loan_amnt.unstack(-1)
# df_score_int_rate  = df.groupby(by=['year','grade']).int_rate.mean()
# df_score_int_rate  = df_score_int_rate.unstack(-1)
#
# fig4_5_1,ax = plt.subplots(1,2,figsize=(12,5))
# cmap = plt.cm.coolwarm
# df_score_loan_amnt.plot(legend=False, ax=ax[0], colormap=cmap)
# ax[0].set_title("每年各贷款等级平均发放额",fontsize = 14)
# df_score_int_rate.plot(legend=True, ax=ax[1],  colormap=cmap)
# ax[1].legend(bbox_to_anchor=(-1.0, -0.2, 1.7, 0.1), loc=0, prop={'size':12},
#            ncol=7, mode="expand", borderaxespad=0.)
# ax[1].set_title("每年各贷款等级平均利率",fontsize = 14)
# plt.subplots_adjust(top=1,bottom=0,left=0,right=1,hspace=0,wspace=0.1)
# plt.savefig("4.5.1 2007-2015 Lending Club 每年贷款等级分布情况.png",dpi=1000,bbox_inches = 'tight')#解决图片不清晰，不完整的问题
# # plt.show()
#
# # 各个贷款等级的贷款好坏分布
# df_score_condition = df.groupby(by=['grade', 'loan_condition']).size()
# df_score_condition = df_score_condition.unstack(-1)
#
# df_score_condition_sub = df.groupby(by=['sub_grade', 'loan_condition']).size()
# df_score_condition_sub = df_score_condition_sub.unstack(-1)
#
# fig4_5_2,ax = plt.subplots(1,2,figsize=(12,5))
# df_score_condition['Good Loan'].plot(kind='bar',legend=False, ax=ax[0], color='royalblue')
# df_score_condition['Bad Loan'].plot(kind='bar',legend=False, ax=ax[0], color='darkorange')
# ax[0].set_title("各贷款等级好坏贷款分布",fontsize = 14)
# df_score_condition_sub['Good Loan'].plot(kind='bar',legend=False, ax=ax[1], color='royalblue')
# df_score_condition_sub['Bad Loan'].plot(kind='bar',legend=False, ax=ax[1], color='darkorange')
# ax[1].set_title("各贷款子等级好坏贷款分布",fontsize = 14)
# ax[1].legend(bbox_to_anchor=(-0.6, -0.2, 1, 0.1), loc=0, prop={'size':12},
#            ncol=7, mode="expand", borderaxespad=0.)
# plt.subplots_adjust(top=1,bottom=0,left=0,right=1,hspace=0,wspace=0.1)
# plt.savefig("4.5.2 2007-2015 Lending Club 各贷款等级好坏贷款分布情况.png",dpi=1000,bbox_inches = 'tight')#解决图片不清晰，不完整的问题
# # plt.show()
#
#
# # f) 贷款周期-坏贷款
# fig4_6 = plt.figure(figsize=(6,5))
# palette = ['royalblue','darkorange']
#
# plt.subplot(111)
# ax=sns.countplot(x='term', hue='loan_condition', data = df,palette=palette)
# ax.set_xlabel('贷款周期',fontsize=12)
# ax.set_ylabel('贷款额',fontsize=12)
# ax.set_title("不同贷款周期下好坏贷款额度情况",fontsize=14)
# plt.subplots_adjust(top=1,bottom=0,left=0,right=1,hspace=0,wspace=0.1)
# plt.savefig("4.6 2007-2015 Lending Club 不同贷款周期下好坏贷款额度情况.png",dpi=1000,bbox_inches = 'tight')#解决图片不清晰，不完整的问题
# plt.show()
#
#
# # g) 贷款目的-坏贷款
# # 好坏贷款目的分布直方图
# purpose_condition = round(pd.crosstab(df['loan_condition'], df['purpose']).apply(lambda x: x/x.sum() * 100), 2)
# print()
# ticks=list(purpose_condition.columns)
# x=np.arange(purpose_condition.shape[1])
# y1=purpose_condition.iloc[1,:]
# y2=purpose_condition.iloc[0,:]
#
# fig4_7 = plt.figure(figsize=(10,8))
# p1 = plt.bar(x,y1, color='royalblue')
# p2 = plt.bar(x,y2,color='darkorange')
# for (a,b) in zip(x,y1):
#     plt.text(a,b+0.05,s=str(b)+'%',ha='center',va='bottom')
# for (a,b) in zip(x,y2):
#     plt.text(a,b+0.05,s=str(b)+'%',ha='center',va='bottom')
# plt.ylim(0,100)
# plt.xticks(x,ticks,rotation=20)
# plt.xlabel('贷款目的',fontsize=12)
# plt.ylabel('贷款占比（%）',fontsize=12)
# plt.title('好坏贷款的不同目的占比情况',fontsize=16)
# plt.legend([p1,p2],labels=['Good Loan','Bad Loan'],loc='best')
# plt.subplots_adjust(top=1,bottom=0,left=0,right=1,hspace=0.1,wspace=0.1)
# plt.savefig("4.7 2007-2015 好坏贷款的不同目的占比情况.png",dpi=1000,bbox_inches = 'tight') #解决图片不清晰，不完整的问题
# plt.show()
#

end_time = time.process_time()
print("程序运行了 %s 秒"%(end_time-start_time))

相关阅读:
jquery接收后台数组或集合回显复选框
 解决微信小程序滑动遮罩时底层跟着滑动的问题
 前端实现滑动开关
 css简单动画
 MyBatis的数据库操作
 前端开发面试题-JavaScript（转载）
前端开发面试题-CSS（转载）
前端开发面试题-HTML（转载）
H5 canvas 实现飞机大战游戏
 vuejs学习笔记（1）--属性，事件绑定，ajax
原文地址：https://www.cnblogs.com/Ray-0808/p/12668594.html