When someone applies for credit, it is unfair to reject applicants who duly qualify, and it can be costly for the company to accept applicants who do not. Both mistakes are likely if we make such decisions based on gut feeling.
So how do we use machine learning to increase our chances of approving the right applicants, and of screening out those who are not yet qualified, using the information we know about them?
In this tutorial, we will be using the Credit Approval Data Set from the UCI Machine Learning Repository. The data can be downloaded here: Credit Approval Data Set
Now let’s read in our dataset and proceed with our model building. We will be using logistic regression in R, but there are a host of other algorithms you could use.
library(data.table)
Let’s read in the credit data
crx.data <- data.table(read.table("crx.data.txt", header = FALSE, sep = ",", na.strings = "?"))
Let’s preview our credit data
head(crx.data)
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16
## 1: b 30.83 0.000 u g w v 1.25 t t 1 f g 202 0 +
## 2: a 58.67 4.460 u g q h 3.04 t t 6 f g 43 560 +
## 3: a 24.50 0.500 u g q h 1.50 t f 0 f g 280 824 +
## 4: b 27.83 1.540 u g w v 3.75 t t 5 t g 100 3 +
## 5: b 20.17 5.625 u g w v 1.71 t f 0 f s 120 0 +
## 6: b 32.08 4.000 u g m v 2.50 t f 0 t g 360 0 +
It is good to have an idea of the class of each variable, its levels, and some sample data
str(crx.data)
## Classes 'data.table' and 'data.frame': 690 obs. of 16 variables:
## $ V1 : Factor w/ 2 levels "a","b": 2 1 1 2 2 2 2 1 2 2 ...
## $ V2 : num 30.8 58.7 24.5 27.8 20.2 ...
## $ V3 : num 0 4.46 0.5 1.54 5.62 ...
## $ V4 : Factor w/ 3 levels "l","u","y": 2 2 2 2 2 2 2 2 3 3 ...
## $ V5 : Factor w/ 3 levels "g","gg","p": 1 1 1 1 1 1 1 1 3 3 ...
## $ V6 : Factor w/ 14 levels "aa","c","cc",..: 13 11 11 13 13 10 12 3 9 13 ...
## $ V7 : Factor w/ 9 levels "bb","dd","ff",..: 8 4 4 8 8 8 4 8 4 8 ...
## $ V8 : num 1.25 3.04 1.5 3.75 1.71 ...
## $ V9 : Factor w/ 2 levels "f","t": 2 2 2 2 2 2 2 2 2 2 ...
## $ V10: Factor w/ 2 levels "f","t": 2 2 1 2 1 1 1 1 1 1 ...
## $ V11: int 1 6 0 5 0 0 0 0 0 0 ...
## $ V12: Factor w/ 2 levels "f","t": 1 1 1 2 1 2 2 1 1 2 ...
## $ V13: Factor w/ 3 levels "g","p","s": 1 1 1 1 3 1 1 1 1 1 ...
## $ V14: int 202 43 280 100 120 360 164 80 180 52 ...
## $ V15: int 0 560 824 3 0 0 31285 1349 314 1442 ...
## $ V16: Factor w/ 2 levels "-","+": 2 2 2 2 2 2 2 2 2 2 ...
## - attr(*, ".internal.selfref")=<externalptr>
Let’s see the full column names in our credit data set
names(crx.data)
## [1] "V1" "V2" "V3" "V4" "V5" "V6" "V7" "V8" "V9" "V10" "V11" ## [12] "V12" "V13" "V14" "V15" "V16"
Let’s examine the distribution of the target variable, the one we are trying to predict. To view the distribution, we will convert its values to numeric. We saw that the target variable’s values were “+” and “-”; in our conversion, “+” = 1 and “-” = 0
hist(as.numeric(crx.data$V16)-1)
Let’s see how our target variable now looks
as.numeric(crx.data$V16)-1
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [71] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [106] 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [141] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [176] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [211] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [246] 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0
## [281] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [316] 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [351] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [386] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [421] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [456] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [491] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0
## [526] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1
## [561] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [596] 1 1 1 1 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0
## [631] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [666] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Let’s now actually convert the target variable to 0s and 1s in our dataset for the rest of the analysis
crx.data$V16 <- as.numeric(crx.data$V16)-1
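Note that this conversion relies on the factor level order (“-” is level 1, “+” is level 2). If you prefer not to depend on level order, an explicit mapping does the same thing:

# Equivalent, but independent of the factor level order:
# crx.data$V16 <- ifelse(crx.data$V16 == "+", 1, 0)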
Let’s check the distribution of each of the numeric attributes of the credit data using summary statistics and histograms
numeric_data <- as.data.frame(crx.data[, c(2,3,8,11,14,15)])
summary(numeric_data)
## V2 V3 V8 V11
## Min. :13.75 Min. : 0.000 Min. : 0.000 Min. : 0.0
## 1st Qu.:22.60 1st Qu.: 1.000 1st Qu.: 0.165 1st Qu.: 0.0
## Median :28.46 Median : 2.750 Median : 1.000 Median : 0.0
## Mean :31.57 Mean : 4.759 Mean : 2.223 Mean : 2.4
## 3rd Qu.:38.23 3rd Qu.: 7.207 3rd Qu.: 2.625 3rd Qu.: 3.0
## Max. :80.25 Max. :28.000 Max. :28.500 Max. :67.0
## NA's :12
## V14 V15
## Min. : 0 Min. : 0.0
## 1st Qu.: 75 1st Qu.: 0.0
## Median : 160 Median : 5.0
## Mean : 184 Mean : 1017.4
## 3rd Qu.: 276 3rd Qu.: 395.5
## Max. :2000 Max. :100000.0
## NA's :13
par(mfrow = c(2,3))
for (i in 1:6) {
  hist(numeric_data[,i], main = names(numeric_data)[i])
}
The variable V2 has a roughly bell-shaped curve, but most of the numeric variables in our dataset are skewed to the right (their tails point to the right). This suggests we could preprocess or transform the data using techniques like Box-Cox, to see if it enhances model performance after our first model is built. (We will not cover that technique in depth in this article, but a sketch follows.)
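If you want to try such a transformation, caret (loaded further below for the train/test split) offers a convenient preProcess step. This is only a sketch under assumptions: it is run on complete cases because the missing values are handled later, and it uses the Yeo-Johnson variant since Box-Cox requires strictly positive values while variables like V15 contain zeros.

library(caret)
# Sketch: estimate a skewness-reducing transformation on the numeric columns
pp <- preProcess(na.omit(numeric_data), method = c("YeoJohnson", "center", "scale"))
numeric_transformed <- predict(pp, na.omit(numeric_data))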
A closer look at variables V14 and particularly V15 (and also the summary statistics above) indicates there might be some outliers. For instance, the mean of V15 is 1017.4 whereas the maximum is 100000.0, which is far from typical. There are advanced techniques to detect outliers statistically (not covered here), and we will not be removing these data points in this analysis, but a quick rule of thumb is sketched below.
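As a quick illustration (not one of the advanced techniques), a common rule of thumb flags points lying more than 1.5 times the interquartile range above the third quartile:

# Sketch: flag V15 values beyond the upper Tukey fence (Q3 + 1.5 * IQR)
q <- quantile(crx.data$V15, probs = c(0.25, 0.75))
upper_fence <- q[2] + 1.5 * (q[2] - q[1])
sum(crx.data$V15 > upper_fence)  # how many points the rule would flag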
Let’s take a closer look at V15 again
par(mfrow = c(1,1))
hist(numeric_data[,"V15"])
# hist(log10(numeric_data[,"V15"]))
summary(numeric_data$V15)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 0.0 5.0 1017.0 395.5 100000.0
Let’s look at the non-numeric variables to get a sense of how they are distributed
non_numeric <- as.data.frame(crx.data[, -c(2,3,8,11,14,15)])
How many are there?
dim(non_numeric)
## [1] 690 10
What are they?
names(non_numeric)
## [1] "V1" "V4" "V5" "V6" "V7" "V9" "V10" "V12" "V13" "V16"
Let’s see their distribution. V16 was a factor variable that we converted to numeric earlier, but we will still view it as part of this group
par(mfrow = c(2,5))
for (i in 1:10) {
  plot(non_numeric[,i], main = names(non_numeric)[i])
}
Let’s examine the relationships between the numeric features of the applications
plot(crx.data[, c(2,3,8,11,14,15)], col = crx.data$V16 + 1)  # + 1 so that class 0 is not drawn in the invisible background colour
Let’s check the correlations between the numeric features. Note that V2 and V14 still contain missing values at this point, so their correlations show up as NA (cor() would need use = "complete.obs" to ignore the incomplete rows); we handle the missing values later.
correlations <- cor(crx.data[, c(2,3,8,11,14,15)])
print(correlations)
## V2 V3 V8 V11 V14 V15
## V2 1 NA NA NA NA NA
## V3 NA 1.0000000 0.29890156 0.27120674 NA 0.12312115
## V8 NA 0.2989016 1.00000000 0.32232967 NA 0.05134493
## V11 NA 0.2712067 0.32232967 1.00000000 NA 0.06369244
## V14 NA NA NA NA 1 NA
## V15 NA 0.1231212 0.05134493 0.06369244 NA 1.00000000
Let’s get a pairwise visualization of the non-numeric part of the dataset
pairs(V16 ~ ., data = non_numeric, col = non_numeric$V16 + 1)
Let’s also get a full summary of the dataset
summary(crx.data)
## V1 V2 V3 V4 V5
## a :210 Min. :13.75 Min. : 0.000 l : 2 g :519
## b :468 1st Qu.:22.60 1st Qu.: 1.000 u :519 gg : 2
## NA's: 12 Median :28.46 Median : 2.750 y :163 p :163
## Mean :31.57 Mean : 4.759 NA's: 6 NA's: 6
## 3rd Qu.:38.23 3rd Qu.: 7.207
## Max. :80.25 Max. :28.000
## NA's :12
## V6 V7 V8 V9 V10
## c :137 v :399 Min. : 0.000 f:329 f:395
## q : 78 h :138 1st Qu.: 0.165 t:361 t:295
## w : 64 bb : 59 Median : 1.000
## i : 59 ff : 57 Mean : 2.223
## aa : 54 j : 8 3rd Qu.: 2.625
## (Other):289 (Other): 20 Max. :28.500
## NA's : 9 NA's : 9
## V11 V12 V13 V14 V15
## Min. : 0.0 f:374 g:625 Min. : 0 Min. : 0.0
## 1st Qu.: 0.0 t:316 p: 8 1st Qu.: 75 1st Qu.: 0.0
## Median : 0.0 s: 57 Median : 160 Median : 5.0
## Mean : 2.4 Mean : 184 Mean : 1017.4
## 3rd Qu.: 3.0 3rd Qu.: 276 3rd Qu.: 395.5
## Max. :67.0 Max. :2000 Max. :100000.0
## NA's :13
## V16
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.4449
## 3rd Qu.:1.0000
## Max. :1.0000
##
What percentage of the rows are missing some VALUES? As a rule of thumb, if more than 10% were missing we should go and find more (and more appropriate) data. Will this percentage of missing data have an impact on our model? How do we handle missing data?
missing_pct <- sum(!complete.cases(crx.data)) / dim(crx.data)[1] * 100
missing_pct
## [1] 5.362319
This shows that about 5.36% of the rows contain at least one missing value
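To see which columns carry those missing values (V1, V2, V4, V5, V6, V7 and V14, judging from the summary above), a one-liner helps:

# Count the missing values per column
colSums(is.na(crx.data))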
HANDLING MISSING VALUES
Impute the missing values in variable 1 (V1) with the most frequently occurring value, the MODE. The variables are few, so we will impute the missing values one after the other. Ideally, you would create a function that shortens the entire process (see the sketch right after the mode function below).
Create a function to get the mode
getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}
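For reference, here is one possible shape of such a helper, a sketch only: it assumes, as we do manually below, that factor columns get the mode and numeric columns get the mean, and it uses data.table’s set() for the in-place assignment.

# Sketch of a generic imputer: mode for factors, mean for numeric columns
impute_all <- function(dt) {
  for (col in names(dt)) {
    rows <- which(is.na(dt[[col]]))
    if (length(rows) == 0) next
    fill <- if (is.numeric(dt[[col]])) {
      mean(dt[[col]], na.rm = TRUE)
    } else {
      getmode(dt[[col]])
    }
    set(dt, i = rows, j = col, value = fill)
  }
  dt
}
# e.g. imputed <- impute_all(copy(crx.data))  # copy() keeps the original intact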
v1.mode <- getmode(crx.data$V1)
Impute the missing values with the mode
crx.data[is.na(crx.data$V1), 1] <- v1.mode
Preview the imputation
table(crx.data$V1)
##
## a b
## 210 480
Impute the missing values in variable V2 with the average (mean)
crx.data[is.na(crx.data$V2), 2] <- mean(crx.data$V2, na.rm = TRUE)
Preview the imputation
summary(crx.data$V2)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 13.75 22.67 28.62 31.57 37.71 80.25
Impute the missing values in V4 with the mode
crx.data[is.na(crx.data$V4), 4] <- getmode(crx.data$V4)
# preview the imputation
table(crx.data$V4)
##
## l u y
## 2 525 163
Impute the missing values in V5 with the mode
crx.data[is.na(crx.data$V5), 5] <- getmode(crx.data$V5)
# preview the imputation
table(crx.data$V5)
##
## g gg p
## 525 2 163
V6: impute the missing values with the mode
crx.data[is.na(crx.data$V6), 6] <- getmode(crx.data$V6)
# preview the imputation
table(crx.data$V6)
##
## aa c cc d e ff i j k m q r w x
## 54 146 41 30 25 53 59 10 51 38 78 3 64 38
Any missing values left?
table(is.na(crx.data$V6))
##
## FALSE
## 690
V7: impute the missing values with the mode
crx.data[is.na(crx.data$V7), 7] <- getmode(crx.data$V7)
# preview the imputation
table(crx.data$V7)
##
## bb dd ff h j n o v z
## 59 6 57 138 8 4 2 408 8
V14: impute the missing values with the (integer) mean
crx.data[is.na(crx.data$V14), 14] <- as.integer(mean(crx.data$V14, na.rm = TRUE))
# preview the imputation (note: R is case-sensitive, so V14 must be capitalized)
table(is.na(crx.data$V14))

##
## FALSE
## 690
Let’s load the mlbench and caret libraries. You can install them first if they are not already installed.
#install.packages("mlbench") #install.packages("caret") library(mlbench) library(caret)
## Loading required package: ggplot2
Split the data into train and test sets. We define a 70%/30% train/test split of the dataset.
set.seed(257)
trainIndex <- createDataPartition(crx.data$V16, p = 0.70, list = FALSE)
dataTrain <- crx.data[trainIndex,]
dataTest <- crx.data[-trainIndex,]
Run logistic regression on the training dataset
model <- glm(V16 ~.,family=binomial(link='logit'),data=dataTrain)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Let’s see the summary of our model results. (The warning above tells us that some training observations are predicted almost perfectly, a sign of quasi-complete separation; it also explains the huge standard errors on some of the coefficients below.)
summary(model)
##
## Call:
## glm(formula = V16 ~ ., family = binomial(link = "logit"), data = dataTrain)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.4383 -0.3272 -0.1221 0.4432 3.3254
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.243e+00 2.058e+03 -0.002 0.998355
## V1b -2.702e-01 3.618e-01 -0.747 0.455196
## V2 1.543e-02 1.524e-02 1.012 0.311384
## V3 -1.895e-02 3.193e-02 -0.594 0.552808
## V4u 1.338e-01 2.058e+03 0.000 0.999948
## V4y -3.783e-01 2.058e+03 0.000 0.999853
## V5gg NA NA NA NA
## V5p NA NA NA NA
## V6c 3.418e-01 5.888e-01 0.580 0.561654
## V6cc 1.250e+00 8.525e-01 1.466 0.142633
## V6d 6.549e-01 9.276e-01 0.706 0.480152
## V6e 2.194e+00 1.282e+00 1.711 0.087154 .
## V6ff -1.944e+01 1.455e+03 -0.013 0.989340
## V6i 3.286e-01 7.948e-01 0.413 0.679283
## V6j -1.753e+01 1.455e+03 -0.012 0.990392
## V6k -1.303e-01 7.723e-01 -0.169 0.866028
## V6m 3.100e-01 8.137e-01 0.381 0.703214
## V6q 6.079e-01 6.437e-01 0.944 0.344941
## V6r -1.399e+01 1.455e+03 -0.010 0.992329
## V6w 1.226e+00 6.897e-01 1.777 0.075571 .
## V6x 2.662e+00 9.430e-01 2.823 0.004762 **
## V7dd -3.426e-01 1.970e+00 -0.174 0.861909
## V7ff 1.865e+01 1.455e+03 0.013 0.989778
## V7h 1.237e+00 6.711e-01 1.844 0.065242 .
## V7j 1.900e+01 1.455e+03 0.013 0.989584
## V7n 3.458e+00 1.660e+00 2.083 0.037278 *
## V7o -1.259e+01 1.455e+03 -0.009 0.993098
## V7v 5.835e-01 6.088e-01 0.958 0.337841
## V7z -2.992e+00 1.861e+00 -1.607 0.107950
## V8 3.096e-02 5.223e-02 0.593 0.553275
## V9t 3.621e+00 3.997e-01 9.060 < 2e-16 ***
## V10t 9.237e-01 4.395e-01 2.102 0.035592 *
## V11 7.929e-02 6.318e-02 1.255 0.209516
## V12t -1.047e-01 3.268e-01 -0.321 0.748577
## V13p 4.057e+00 1.051e+00 3.861 0.000113 ***
## V13s 2.915e-01 6.023e-01 0.484 0.628408
## V14 -2.854e-03 9.985e-04 -2.859 0.004256 **
## V15 6.357e-04 2.190e-04 2.903 0.003699 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 667.84 on 482 degrees of freedom
## Residual deviance: 291.88 on 447 degrees of freedom
## AIC: 363.88
##
## Number of Fisher Scoring iterations: 14
From the result, we can see that the variables V6x, V7n, V9t, V10t, V13p, V14, and V15 are statistically significant, as they have p-values less than 0.05.
For instance, from the result we can see that V9 changing from “f” to “t” increases the log-odds of being approved for credit by about 3.62, holding the other variables constant (the coefficient of V9t is positive).
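Since these coefficients are on the log-odds scale, exponentiating them gives odds ratios, which are often easier to read (a sketch):

# Odds ratios: exp(3.621) for V9t is roughly 37, i.e. V9 = "t" multiplies
# the odds of approval by about 37, other variables held constant
round(exp(coef(model)), 3)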
Now let’s check our deviances. The smaller the deviance value, the better.
Our null deviance = 667.84. The null deviance measures how well our model performs with no predictor variables at all, accounting only for the intercept. Our residual deviance = 291.88, which indicates how well our model performs once we add the predictor variables.
We can see that the deviance reduces, which means our model performs better when we add in our predictor variables. In other words, we can make better decisions on whether to approve a person for credit by considering these variables, rather than guessing without taking significant variables into account.
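This improvement can also be tested formally: the drop in deviance is approximately chi-squared distributed, with degrees of freedom equal to the number of estimated coefficients. A quick check (a sketch):

# p-value for the full model's improvement over the intercept-only model
with(model, pchisq(null.deviance - deviance,
                   df.null - df.residual,
                   lower.tail = FALSE))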
Let’s see a table of deviances by running anova on our model
anova(model, test="Chisq")
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: V16
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 482 667.84
## V1 1 1.406 481 666.43 0.2356401
## V2 1 14.303 480 652.13 0.0001556 ***
## V3 1 10.101 479 642.03 0.0014819 **
## V4 2 20.973 477 621.05 2.791e-05 ***
## V5 0 0.000 477 621.05
## V6 13 71.731 464 549.32 3.847e-10 ***
## V7 8 7.839 456 541.48 0.4493155
## V8 1 11.281 455 530.20 0.0007831 ***
## V9 1 175.302 454 354.90 < 2.2e-16 ***
## V10 1 21.597 453 333.30 3.364e-06 ***
## V11 1 5.416 452 327.89 0.0199566 *
## V12 1 2.241 451 325.65 0.1343758
## V13 2 13.604 449 312.04 0.0011113 **
## V14 1 8.351 448 303.69 0.0038541 **
## V15 1 11.814 447 291.88 0.0005877 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Taking note of the null deviance, we can observe that as the predictor variables are added sequentially to the model (first to last, as they appear in the result), the deviance reduces and the model performs better. The largest single drop comes from V9, which reduces the deviance by about 175, consistent with V9’s strong significance in the model summary.
Let’s make predictions using the data we kept aside, which is the Test Data
probabilities <- predict(model, newdata = dataTest[,-16], type='response')
## Warning in predict.lm(object, newdata, se.fit, scale = 1, type =
## ifelse(type == : prediction from a rank-deficient fit may be misleading
predictions <- ifelse(probabilities > 0.5,'1','0')
Let’s summarize the accuracy of the predictions
table(predictions, dataTest$V16)
##
## predictions 0 1
## 0 112 10
## 1 15 70
It can be seen that the correct predictions are on the diagonal and the wrong ones are on the “off-diagonal”
We predicted 70 to be “1” (TRUE, that they should be approved) and they were indeed “1” in the test dataset (correct to approve them). Those are our TRUE POSITIVES (TP = 70).
We also predicted 112 to be “0” (FALSE, that they should be rejected) and they were indeed “0” in the test dataset. Those are our TRUE NEGATIVES (TN = 112).
We predicted 15 to be “1” (that they should be accepted), but in the test dataset they were actually “0”. Those are our FALSE POSITIVES (FP = 15).
And we predicted 10 to be “0” (that they should be rejected), but in the test dataset they were actually “1”. Those are our FALSE NEGATIVES (FN = 10).
Let’s check the overall accuracy of our model
misclass_error <- mean(predictions != dataTest$V16)
print(paste('Accuracy', 1 - misclass_error))
## [1] "Accuracy 0.879227053140097"
The overall accuracy of our model is 87.9%. This can be improved further with other advanced techniques and parameter tuning (not covered here).
Even with 87.9% overall accuracy, it is always better to also check the recall (TRUE-POSITIVE rate, or sensitivity): Recall = TP / (TP + FN)
and also the precision: Precision = TP / (TP + FP)
And it is also good to take a peek at the TRUE-NEGATIVE rate (the specificity): Specificity = TN / (TN + FP). All three are computed below.
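Using the counts from the confusion matrix above (TP = 70, TN = 112, FP = 15, FN = 10), these rates work out as follows:

TP <- 70; TN <- 112; FP <- 15; FN <- 10
recall      <- TP / (TP + FN)   # 0.875  (true-positive rate / sensitivity)
precision   <- TP / (TP + FP)   # ~0.824
specificity <- TN / (TN + FP)   # ~0.882 (true-negative rate)
c(recall = recall, precision = precision, specificity = specificity)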
Let’s match the predictions to the original test dataset (we preview the first rows here; the full table is exported to CSV below)
copy_dataTest <- data.frame(dataTest)
dim(copy_dataTest)
## [1] 207 16
copy_dataTest_pred <- cbind(copy_dataTest, predictions)
head(copy_dataTest_pred)
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 predictions
## 1 b 30.83000 0.000 u g w v 1.250 t t 1 f g 202 0 1 1
## 2 a 24.50000 0.500 u g q h 1.500 t f 0 f g 280 824 1 1
## 3 b 27.83000 1.540 u g w v 3.750 t t 5 t g 100 3 1 1
## 4 b 32.08000 4.000 u g m v 2.500 t f 0 t g 360 0 1 0
## 5 b 33.17000 1.040 u g r h 6.500 t f 0 t g 164 31285 1 1
## 6 a 22.92000 11.585 u g cc v 0.040 t f 0 f g 80 1349 1 1
write.csv(copy_dataTest_pred, file = "match_predictions.csv")
Let’s check the Area Under the Curve (AUC) and the ROC plot
#install.packages("ROCR") library(ROCR)
## Loading required package: gplots
##
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
##
## lowess
predicted <- predict(model, newdata=dataTest[,-16], type="response")
## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = ## ifelse(type == : prediction from a rank-deficient fit may be misleading
pr <- prediction(predicted, dataTest$V16)
prf <- performance(pr, measure = "tpr", x.measure = "fpr")
plot(prf)
The ROC curve sits high above the diagonal and close to the top-left corner (toward 1), which is good.
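The same prf object can also suggest an alternative to the default 0.5 cutoff, for instance the threshold that maximizes TPR minus FPR (Youden’s J statistic); a sketch:

# Pull TPR, FPR and the corresponding cutoffs out of the performance object,
# then pick the cutoff with the largest TPR - FPR gap (Youden's J)
tpr <- prf@y.values[[1]]
fpr <- prf@x.values[[1]]
cutoffs <- prf@alpha.values[[1]]
cutoffs[which.max(tpr - fpr)]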
Let’s compute the area under the curve (AUC)
auc <- performance(pr, measure = "auc")
auc <- auc@y.values[[1]]
auc
## [1] 0.9265748
Hopefully this gives you a basis for predicting whether or not someone will be approved for credit, based on some characteristics (variables) we know about them!