Author: Ayhan Dis · Translator: 张睿毅 · Proofreader: 丁楠雅

# Hands-On Introduction to creditR: An Amazing R Package to Enhance Credit Risk Scoring and Validation (with Code)

### Background

"Credit scoring is a statistical analysis performed by lenders and financial institutions to determine a person's creditworthiness. Lenders use credit scoring to decide whether to extend or deny credit."

-- Investopedia

This is where the creditR package comes in! It lets you easily build the basic models of credit risk scoring before moving on to machine learning applications. It also includes a number of functions for validating these processes.

### 1. Why use creditR?

The creditR package both automates the traditional credit-scoring methods and provides functions for validating traditional and machine learning models.

### 2. Getting started with creditR

`install.packages("devtools", dependencies = TRUE)`

```
library(devtools)
devtools::install_github("ayhandis/creditR")
library(creditR)
```

### 3. The functions included in creditR

`ls("package:creditR")`

### 4. An example application of the creditR package

```
# Attaching the library
library(creditR)

# Model data and data structure
data("germancredit")
str(germancredit)

# Preparing a sample data set
sample_data <- germancredit[,c("duration.in.month","credit.amount","installment.rate.in.percentage.of.disposable.income","age.in.years","creditability")]

# Converting the 'creditability' (default flag) variable into numeric type
# (assuming the factor levels are "good"/"bad", with "bad" marking default)
sample_data$creditability <- ifelse(sample_data$creditability == "bad", 1, 0)

# Calculating the missing ratios
missing_ratio(sample_data)
```

```
# Splitting the data into train and test sets
traintest <- train_test_split(sample_data,123,0.70)
train <- traintest$train
test <- traintest$test
```
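For readers who prefer not to depend on the helper, the same split can be sketched in a few lines of base R. This assumes the second argument of `train_test_split` (123) plays the role of a random seed and 0.70 is the train share; the toy data frame below stands in for `sample_data`:

```r
# Hypothetical toy data frame standing in for sample_data
df <- data.frame(x = 1:10, y = letters[1:10])

set.seed(123)  # assumed to correspond to the seed argument above
idx <- sample(seq_len(nrow(df)), size = floor(0.70 * nrow(df)))
train <- df[idx, ]   # 70% of the rows
test  <- df[-idx, ]  # the remaining 30%
```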

The WOE (weight of evidence) transformation is a method that converts a variable into a categorical variable based on its relationship with the target variable. The `woerules` object below holds the WOE rules.

```
# Applying WOE transformation on the variables
woerules <- woe.binning(df = train, target.var = "creditability", pred.var = train, event.class = 1)
train_woe <- woe.binning.deploy(train, woerules, add.woe.or.dum.var = "woe")

# Creating a dataset with the transformed variables and default flag
train_woe <- woe.get.clear.data(train_woe, default_flag = "creditability", prefix = "woe")

# Applying the WOE rules used on the train data to the test data
test_woe <- woe.binning.deploy(test, woerules, add.woe.or.dum.var = "woe")
test_woe <- woe.get.clear.data(test_woe, default_flag = "creditability", prefix = "woe")
```

```
# Performing the IV and Gini calculations for the whole data set
IV.calc.data(train_woe, "creditability")
```
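As a sanity check on what these helpers compute, the WOE and IV formulas can be reproduced in a few lines of base R. The bin counts below are invented for illustration:

```r
# Hypothetical good/bad counts across three bins of one variable
goods <- c(40, 30, 20)
bads  <- c(5, 10, 25)

dist_good <- goods / sum(goods)  # share of goods falling in each bin
dist_bad  <- bads / sum(bads)    # share of bads falling in each bin

# WOE per bin: log of the ratio of the two distributions
woe <- log(dist_good / dist_bad)

# IV: sum over bins of (dist_good - dist_bad) * WOE (always >= 0)
iv <- sum((dist_good - dist_bad) * woe)
```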

`Gini.univariate.data(train_woe,"creditability")`

```
# Creating a new dataset by Gini elimination. IV elimination is also possible
eliminated_data <- Gini_elimination(train_woe, "creditability", 0.10)
str(eliminated_data)
```

```
# A demonstration of the functions useful in performing clustering
clustering_data <- variable.clustering(eliminated_data, "creditability", 2)
clustering_data
```

```
# Returns the data for variables that have the maximum Gini value in the dataset
selected_data <- variable.clustering.gini(eliminated_data, "creditability", 2)
```

`correlation.cluster(eliminated_data,clustering_data,variables = "variable",clusters = "Group")`

```
# Creating a logistic regression model of the data
model <- glm(formula = creditability ~ ., family = binomial(link = "logit"), data = eliminated_data)
summary(model)
```

```
# Calculating variable weights
woe.glm.feature.importance(eliminated_data, model, "creditability")
```

```
# Generating the PD values for the train and test data
ms_train_data <- cbind(eliminated_data, model$fitted.values)
ms_test_data <- cbind(test_woe[,colnames(eliminated_data)], predict(model, type = "response", newdata = test_woe))
colnames(ms_train_data) <- c("woe.duration.in.month.binned","woe.age.in.years.binned","woe.installment.rate.in.percentage.of.disposable.income.binned","creditability","PD")
colnames(ms_test_data) <- c("woe.duration.in.month.binned","woe.age.in.years.binned","woe.installment.rate.in.percentage.of.disposable.income.binned","creditability","PD")
```

```
# An example application of the regression calibration method. The model is calibrated to the test_woe data
regression_calibration <- regression.calibration(model, test_woe, "creditability")
regression_calibration$calibration_data
regression_calibration$calibration_model
regression_calibration$calibration_formula
```

```
# Creating a master scale
master_scale <- master.scale(ms_train_data, "creditability", "PD")
master_scale
```

```
# Calibrating the master scale and the modeling data to a default rate of 5% using the Bayesian calibration method
ms_train_data$Score <- log(ms_train_data$PD / (1 - ms_train_data$PD))
ms_test_data$Score <- log(ms_test_data$PD / (1 - ms_test_data$PD))
bayesian_method <- bayesian.calibration(data = master_scale, average_score = "Score", total_observations = "Total.Observations", PD = "PD", central_tendency = 0.05, calibration_data = ms_train_data, calibration_data_score = "Score")

# After calibration, the information and data related to the calibration process can be obtained as follows
bayesian_method$Calibration.model
bayesian_method$Calibration.formula
```

```
# The scaled score can be created using the following function
scaled.score(bayesian_method$calibration_data, "calibrated_pd", 3000, 15)
```
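`scaled.score` handles the scaling internally, but the standard scorecard scaling it is based on is easy to write out: `score = offset + factor * ln(odds)`. The scaling constants below (20 points to double the odds, score 600 at odds 50:1) are hypothetical illustration choices, not the function's parameters:

```r
pdo        <- 20    # points to double the odds (hypothetical choice)
base_score <- 600   # score assigned at the base odds (hypothetical)
base_odds  <- 50    # good:bad odds at the base score (hypothetical)

factor <- pdo / log(2)
offset <- base_score - factor * log(base_odds)

pd    <- 0.05
odds  <- (1 - pd) / pd               # good:bad odds implied by the PD
score <- offset + factor * log(odds)
```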

```
# Calculating the VIF values of the variables
vif.calc(model)
```
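For intuition, the VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the others. A base-R sketch on synthetic data (the variables here are made up, not the model's):

```r
set.seed(1)
d <- data.frame(x1 = rnorm(100), x3 = rnorm(100))
d$x2 <- d$x1 + rnorm(100, sd = 0.5)  # x2 deliberately correlated with x1

r2     <- summary(lm(x2 ~ x1 + x3, data = d))$r.squared
vif_x2 <- 1 / (1 - r2)               # well above 1, flagging collinearity
```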

```
# Calculating the Gini for the model
Gini(model$fitted.values, ms_train_data$creditability)
```

0.3577422
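The Gini reported here is 2·AUC − 1, which can be checked in base R via the Mann–Whitney form of the AUC. The PDs and default flags below are made up for illustration:

```r
pd   <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1)  # hypothetical model PDs
flag <- c(1,   1,   0,   1,   0,   0,   1,   0)     # default flags

r  <- rank(pd)
n1 <- sum(flag == 1)
n0 <- sum(flag == 0)

# AUC from the Mann-Whitney U statistic, then Gini = 2*AUC - 1
auc  <- (sum(r[flag == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
gini <- 2 * auc - 1   # 0.5 for this toy sample
```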

```
# Performing 5-fold cross validation
k.fold.cross.validation.glm(ms_train_data, "creditability", 5, 1)
```

```
# The KS test is performed on the distributions of the estimates for good and bad observations
Kolmogorov.Smirnov(ms_train_data, "creditability", "PD")
Kolmogorov.Smirnov(ms_test_data, "creditability", "PD")
```
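The KS statistic is the maximum gap between the empirical distribution functions of the PDs of defaulted and non-defaulted observations. A base-R sketch on the same kind of toy data:

```r
pd   <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1)  # hypothetical PDs
flag <- c(1,   1,   0,   1,   0,   0,   1,   0)     # default flags

cuts   <- sort(unique(pd))
F_bad  <- ecdf(pd[flag == 1])(cuts)  # ECDF of PDs among defaulters
F_good <- ecdf(pd[flag == 0])(cuts)  # ECDF of PDs among non-defaulters
ks     <- max(abs(F_bad - F_good))   # maximum separation between the two
```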

```
# Variable stabilities are measured
SSI.calc.data(train_woe, test_woe, "creditability")
```
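The stability index behind this check follows the usual PSI formula, which is easy to verify by hand. The bin shares below are invented:

```r
expected <- c(0.20, 0.30, 0.50)  # train-set share per bin (hypothetical)
actual   <- c(0.25, 0.25, 0.50)  # test-set share per bin (hypothetical)

# PSI: sum over bins of (actual - expected) * log(actual / expected)
psi <- sum((actual - expected) * log(actual / expected))
# Common rule of thumb: < 0.10 stable, 0.10-0.25 worth monitoring
```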

```
# The HHI test is performed to measure the concentration of the master scale
Herfindahl.Hirschman.Index(master_scale, "Total.Observations")
```

0.1463665
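The Herfindahl–Hirschman index is simply the sum of squared shares of observations across the rating grades: 1/(number of grades) for a perfectly even scale, 1 for total concentration in one grade. A quick base-R check with invented counts:

```r
obs    <- c(300, 250, 200, 150, 100)  # hypothetical observations per grade
shares <- obs / sum(obs)              # share of observations in each grade
hhi    <- sum(shares^2)               # 0.225 for these counts
```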

```
# Performing the anchor point test
Anchor.point(master_scale, "PD", "Total.Observations", 0.30)
```

```
# The Chi-square test is applied on the master scale
# (function name and argument order assumed; check ls("package:creditR") for the exact signature)
chisquare.test(master_scale, "PD", "Bad.Count", "Total.Observations", 0.90)
```

```
# The Binomial test is applied on the master scale
Binomial.test(master_scale, "Total.Observations", "PD", "DR", 0.90, "one")
```
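The idea behind the binomial test: under the null hypothesis that a grade's assigned PD is correct, the number of observed defaults in that grade follows a Binomial(n, PD) distribution. It can be sketched with base R's `binom.test`; the grade figures below are made up:

```r
n_grade  <- 500    # observations in the grade (hypothetical)
pd_grade <- 0.04   # PD assigned to the grade (hypothetical)
defaults <- 30     # observed defaults in the grade (hypothetical)

# One-sided test: are there significantly more defaults than the PD implies?
p_value <- binom.test(defaults, n_grade, p = pd_grade,
                      alternative = "greater")$p.value
reject  <- p_value < 0.10  # at the 90% confidence level used above
```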

### Final notes

The creditR package offers users a number of methods for performing traditional credit risk scoring, along with methods for testing model validity that can also be applied to machine learning algorithms. In addition, because the package automates the application of the traditional methods, it can reduce the operational cost of these processes.

Ayhan Dis is a senior risk consultant. He works on consulting projects such as IFRS 9 / IRB model development and validation, as well as advanced analytics solutions, including ML/DL in areas such as fraud analytics, customer analytics and risk analytics. He is proficient in Python, R, Base SAS and SQL.

• https://github.com/ayhandis

• disayhan@gmail.com

Hands-On Introduction to creditR: An Amazing R Package to Enhance Credit Risk Scoring and Validation

https://www.analyticsvidhya.com/blog/2019/03/introduction-creditr-r-package-enhance-credit-risk-scoring-validation-r-codes/

THU数据派
