Python Credit Scorecard Modeling (with code, recorded by the blogger)
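Because a model is developed on a sample from a specific period, whether it applies to populations outside the development sample can only be determined through stability testing. The population stability index (PSI) measures the difference between the score distributions of a test sample and the model-development sample, and it is the most common metric for evaluating model stability. In essence, PSI bins scores into bands and checks whether the population distribution has changed across different samples, or across samples from different periods, i.e., whether the share of the population falling into each score band has changed significantly. The formula is:

$$\mathrm{PSI} = \sum_{i=1}^{B} \left(A_i - E_i\right) \ln\left(\frac{A_i}{E_i}\right)$$

where $A_i$ and $E_i$ are the actual (test-sample) and expected (development-sample) proportions in score band $i$, $B$ is the number of bands, and ln() is the natural logarithm.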
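A minimal Python sketch of PSI = Σ (Actual% − Expected%) × ln(Actual% / Expected%), assuming both inputs list per-band proportions (or raw counts, which the function normalizes) over the same bands in the same order. The `eps` floor for empty bands is an assumption added here because ln(0) is undefined; it is not part of the formula itself.

```python
import math

def psi(actual, expected, eps=1e-6):
    """Population stability index between two binned distributions.

    actual, expected: per-band proportions or counts over the same bands.
    eps: assumed floor for empty bands, since ln(0) is undefined.
    """
    if len(actual) != len(expected):
        raise ValueError("actual and expected must have the same number of bands")
    a_total, e_total = sum(actual), sum(expected)
    total = 0.0
    for a, e in zip(actual, expected):
        a_pct = max(a / a_total, eps)  # normalize to a share, then floor at eps
        e_pct = max(e / e_total, eps)
        total += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return total
```

For instance, `psi([0.6, 0.4], [0.5, 0.5])` evaluates to about 0.0405, and `psi([60, 40], [50, 50])` gives the same value because counts are normalized first.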
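Practical applications of PSI:
1) Out-of-sample testing
Test the model's stability on different samples, for example the training set versus the test set. This also shows how well the model was trained; my understanding is that it reflects the model's variance.
2) Out-of-time testing
The further the test reference date is from the model-development reference date, the more the risk characteristics of the test sample are likely to differ from those of the development sample, so the PSI is usually higher. This also indicates whether the model has been in use too long and needs to be rebuilt on a newer sample.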
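Both tests come down to the same computation: fix the score bands on the development (expected) sample, then compare how the development sample and the held-out or later sample spread across those bands. The sketch below is a minimal stdlib-only version under stated assumptions: equal-frequency deciles of the development scores serve as the bands (a common convention, not something this post specifies), and empty bands are floored at a small `eps` so that ln(0) never occurs. The names `quantile_cuts`, `band_shares` and `score_psi` are hypothetical.

```python
import bisect
import math

def quantile_cuts(scores, n_bins=10):
    """Interior cut points at equal-frequency quantiles of the development scores."""
    s = sorted(scores)
    return [s[int(len(s) * k / n_bins)] for k in range(1, n_bins)]

def band_shares(scores, cuts):
    """Share of scores falling into each band defined by the sorted cut points."""
    counts = [0] * (len(cuts) + 1)
    for x in scores:
        counts[bisect.bisect_right(cuts, x)] += 1
    return [c / len(scores) for c in counts]

def score_psi(dev_scores, new_scores, n_bins=10, eps=1e-6):
    """PSI of new_scores (actual) against dev_scores (expected), bands fixed on dev."""
    cuts = quantile_cuts(dev_scores, n_bins)
    expected = band_shares(dev_scores, cuts)
    actual = band_shares(new_scores, cuts)
    total = 0.0
    for a, e in zip(actual, expected):
        a, e = max(a, eps), max(e, eps)  # assumed floor for empty bands
        total += (a - e) * math.log(a / e)
    return total
```

For example, `score_psi(train_scores, test_scores)` would give an out-of-sample PSI and `score_psi(dev_scores, recent_scores)` an out-of-time PSI (all four variable names are hypothetical). A band that is empty in one sample contributes a large term through ln(eps), which is one reason to choose bands wide enough to be populated in both samples.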
http://ucanalytics.com/blogs/population-stability-index-psi-banking-case-study/
This is a continuation of the banking case study on creating application risk scorecards that we have discussed in some previous articles. You can find the previous parts of the series at the following links: (Part 1), (Part 2), (Part 3) and (Part 4).
In this article, we will discuss the Population Stability Index (PSI), an important metric for identifying a population shift in retail credit scorecards. Before we delve deeper into the calculation of the PSI and its utility, let’s try to understand the overall purpose of the PSI and similar indexes by connecting a few dots.
Dictators and Credit Crisis
What do Napoleon’s and, later, Hitler’s attempts to invade Russia have in common with the financial crisis of 2007-08?
Napoleon tried to invade Russia in 1812 and Hitler repeated Napoleon’s misdeeds in 1941 – both invasions ended with severe defeats for the armies of the dictators. The armies of both Napoleon and Hitler were far superior to the Russians. It was the conditions in which the battles were fought that resulted in these defeats. Russian winters are often held responsible for the fate of these armies. In reality, it was the ill-preparedness and bad judgment of both Napoleon’s and Hitler’s men that caused them the humiliating defeats. They were very well trained men but they were trained in benevolent conditions of France and Germany. This time, the battle was in completely different and extreme conditions, and they could not cope with it.
The failure of credit risk models during the financial crisis of 2007-08 can be likened to the fate of both the French and German armies. The models were built and trained in a benevolent economic environment and were ill-prepared to deal with the extreme economic conditions of the time. Additionally, a series of bad judgments by executives at financial firms resulted in a total economic collapse.
The moral of the above stories is that one has to keep close tabs on how conditions in the current environment differ from those in the training environment. The Basel III accord pays significant attention to regular portfolio monitoring, and for good reason. The population stability index (PSI) is one such index that helps risk managers perform this task for retail credit scorecards.
Population Stability Index (PSI) – Our Banking Case Continues
You are the chief risk officer at CyndiCat bank. It has been a couple of years since your team, under your supervision, built the auto-loan credit scorecard. Since then, the bank's overall risk assessment process has improved significantly. Being a prudent risk manager, you have asked your team to regularly compare the population on which the scorecard was built with the current through-the-door population (applicants for auto loans). A good place to start this comparison is to check how the two populations are distributed across the risk bands created by the scorecard. The following is a representation of the latest quarterly comparison your team has performed against the benchmark sample. Here ‘Actual %’ is the population distribution for the latest quarter and ‘Expected %’ is the population distribution for the validation sample (a.k.a. benchmark sample).
Comparing the two populations visually is a good place to start. The current population seems to have shifted towards the right side of the graph. To a small extent this is expected, since scorecards often influence the through-the-door population as the market starts reacting to the bank's approval strategies. However, the question we need to ask is whether this is a major shift in the population. Essentially, you are comparing two different distributions and could use any goodness-of-fit measure, such as the chi-square test. However, the population stability index is an industry-accepted metric that comes with convenient rules of thumb for interpretation. The PSI formula (refer to ‘Credit Risk Scorecards’ by Naeem Siddiqui) sums, over all score bands, (Actual % − Expected %) × ln(Actual % / Expected %).
Like the weight of evidence and the information value, PSI has its roots in information theory: it is the symmetrized Kullback–Leibler divergence (also called the Jeffreys divergence) between the actual and expected distributions. Let’s calculate the PSI for our population (we have already seen a histogram of it above).
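To make the information-theory connection concrete, the sketch below checks numerically that Σ (a − e) × ln(a / e) equals KL(A‖E) + KL(E‖A). The identity follows by splitting the sum into Σ a ln(a/e) + Σ e ln(e/a). The two-band distributions are made-up illustrative values, not data from the case study.

```python
import math

# Made-up two-band distributions, for illustration only.
p = [0.6, 0.4]   # "actual" shares
q = [0.5, 0.5]   # "expected" shares

def kl(x, y):
    """Kullback-Leibler divergence KL(x || y) = sum(x_i * ln(x_i / y_i))."""
    return sum(xi * math.log(xi / yi) for xi, yi in zip(x, y))

psi_value = sum((a - e) * math.log(a / e) for a, e in zip(p, q))
jeffreys = kl(p, q) + kl(q, p)   # symmetrized KL (Jeffreys divergence)
print(round(psi_value, 4), round(jeffreys, 4))  # 0.0405 0.0405
```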
| Score band | Actual % | Expected % | Ac − Ex | ln(Ac / Ex) | Index = (Ac − Ex) × ln(Ac / Ex) |
|---|---|---|---|---|---|
| < 251 | 5% | 8% | -3% | -0.47 | 0.014 |
| 251–290 | 6% | 9% | -3% | -0.41 | 0.012 |
| 291–320 | 6% | 10% | -4% | -0.51 | 0.020 |
| 321–350 | 8% | 13% | -5% | -0.49 | 0.024 |
| 351–380 | 10% | 12% | -2% | -0.18 | 0.004 |
| 381–410 | 12% | 11% | 1% | 0.09 | 0.001 |
| 411–440 | 14% | 10% | 4% | 0.34 | 0.013 |
| 441–470 | 14% | 9% | 5% | 0.44 | 0.022 |
| 471–520 | 13% | 9% | 4% | 0.37 | 0.015 |
| > 520 | 9% | 8% | 1% | 0.12 | 0.001 |
| Population Stability Index (PSI) | | | | | 0.1269 |
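The table can be reproduced in a few lines of Python. The shares are typed in exactly as printed; note that they sum to 97% (Actual) and 99% (Expected), presumably because of rounding, so this sketch uses them unnormalized to match the published 0.1269. Renormalizing them to 100% would give a slightly different value.

```python
import math

# Score bands and shares exactly as printed in the table above.
bands = ["< 251", "251-290", "291-320", "321-350", "351-380",
         "381-410", "411-440", "441-470", "471-520", "> 520"]
actual = [0.05, 0.06, 0.06, 0.08, 0.10, 0.12, 0.14, 0.14, 0.13, 0.09]
expected = [0.08, 0.09, 0.10, 0.13, 0.12, 0.11, 0.10, 0.09, 0.09, 0.08]

rows = []
for band, a, e in zip(bands, actual, expected):
    diff = a - e                  # Ac - Ex
    log_ratio = math.log(a / e)   # ln(Ac / Ex)
    index = diff * log_ratio      # this band's contribution to PSI
    rows.append((band, diff, log_ratio, index))
    print(f"{band:>8} | {diff:+.2f} | {log_ratio:+.2f} | {index:.3f}")

psi = sum(row[3] for row in rows)
print(f"PSI = {psi:.4f}")  # PSI = 0.1269
```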
The last column of the table above is what we care about. Let us take the score band 251–290 and calculate the index value for this row: (6% − 9%) × ln(6% / 9%) = (−0.03) × (−0.41) ≈ 0.012.
The final PSI value, 0.1269 (about 0.13), is the sum of all values in the last column. Now the question is how to interpret this value. The rule of thumb for PSI is shown below:
| PSI value | Inference | Action |
|---|---|---|
| Less than 0.1 | Insignificant change | No action required |
| 0.1 – 0.25 | Some minor change | Check other scorecard monitoring metrics |
| Greater than 0.25 | Major shift in population | Need to delve deeper |
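The rule of thumb translates directly into a lookup function. This is a hypothetical helper (`interpret_psi` is not from any library), and assigning the boundary values 0.1 and 0.25 to the middle bucket is an assumption, since the table does not say which side they belong to.

```python
def interpret_psi(value):
    """Map a PSI value to the rule-of-thumb (inference, action) pair.

    Assumption: 0.1 and 0.25 themselves are placed in the middle bucket.
    """
    if value < 0.1:
        return "Insignificant change", "No action required"
    if value <= 0.25:
        return "Some minor change", "Check other scorecard monitoring metrics"
    return "Major shift in population", "Need to delve deeper"
```

`interpret_psi(0.1269)` returns `('Some minor change', 'Check other scorecard monitoring metrics')`, the same reading as for the case-study value.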
The value of 0.13 falls in the second bucket, which indicates a minor shift in population relative to the validation (benchmark) sample. These are handy rules to have. However, one must ask: how will this population shift affect the scorecard? Actually, it may or may not make any difference. Each score band of a scorecard has an associated bad rate, i.e., the probability that customers in that band will not pay off their loans. For instance, score band 251–290 in our scorecard has a bad rate of 10%: one in every ten customers in this score band won’t service his/her loan. The population stability index only indicates changes in the population of loan applicants; this may or may not degrade the scorecard's ability to predict risk. Nevertheless, the PSI signals changes in the environment that need to be investigated further by analyzing changes in macroeconomic conditions and in the bank's overall lending policies.
Sign-off Note
The population stability index is one of the metrics for keeping a check on changing conditions. The broader point, however, is that one has to track robust metrics and keep a close eye on the ever-changing economic winds to prevent a crash landing. On the other hand, Russian winters did change the history of the planet for the better; I guess change is not always bad.
This was a bit of a detour from our previous article on books for learning probability and Bayesian statistics. Hopefully you have had a chance to check out some of the books mentioned in that article; see you soon with its second part.
Python risk-control modeling in practice: LendingClub (recorded by the blogger; CatBoost and LightGBM modeling; 2K ultra-HD resolution)
https://study.163.com/course/courseMain.htm?courseId=1005988013&share=2&shareId=400000000398149