“confusionMatrix” function in R – The data contain levels not found in the data

The “confusionMatrix” function of the “caret” package threw the error below while validating prediction results in R.

Error message:
Error in confusionMatrix.default(loan$Defaulter, loan$Prediction) :
The data contain levels not found in the data.

Reason:
This error comes up because the two columns fed into the confusionMatrix function are factors with different levels.

The line of R code that raises the error is:

> confusionMatrix(loan$Defaulter, loan$Prediction)

Running “str” on the data frame shows the following.

> str(loan)
'data.frame': 15 obs. of 3 variables:
 $ Customer : Factor w/ 15 levels "A","B","C",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Defaulter : Factor w/ 2 levels "0","1": 2 2 1 2 1 1 1 2 2 1 ...
 $ Prediction: Factor w/ 2 levels "1","2": 2 1 1 2 2 1 1 2 1 1 ...

The Defaulter variable, which already exists in the loan dataset, has levels 0 and 1, where 0 denotes a non-defaulter and 1 denotes a defaulter.

The Prediction variable, which is created and populated by our R code, has levels 1 and 2, where 1 denotes a non-defaulter and 2 denotes a defaulter.

Due to this mismatch, a confusion matrix cannot be created.
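The mismatch can also be confirmed by comparing the level sets directly before calling confusionMatrix. The following is a minimal base R sketch; loan_toy is a hypothetical stand-in for the loan data frame that reuses only the first ten values visible in the str output, not the full 15 rows.

```r
# Toy stand-in for the loan data frame (assumption: only the first ten
# values shown by str() are reproduced here, not the real dataset).
loan_toy <- data.frame(
  Defaulter  = factor(c(1, 1, 0, 1, 0, 0, 0, 1, 1, 0), levels = c(0, 1)),
  Prediction = factor(c(2, 1, 1, 2, 2, 1, 1, 2, 1, 1), levels = c(1, 2))
)

# Inspect the levels of each factor.
levels(loan_toy$Defaulter)
levels(loan_toy$Prediction)

# TRUE only when both factors have exactly the same levels in the same order.
same_levels <- identical(levels(loan_toy$Defaulter), levels(loan_toy$Prediction))

# Levels that appear in one factor but not in the other.
only_in_defaulter  <- setdiff(levels(loan_toy$Defaulter), levels(loan_toy$Prediction))
only_in_prediction <- setdiff(levels(loan_toy$Prediction), levels(loan_toy$Defaulter))
```

Here same_levels is FALSE, and the two setdiff results name the labels that have no counterpart on the other side.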

Fix:
The required information is all there; it is just denoted by mismatching labels.

The levels of the Prediction variable are changed to 0 and 1 using “levels”, as below. Each entry in the list reads new label = old label.

> levels(loan$Prediction) <- list("0" = "1", "1" = "2")

After this level correction, “str” on the data frame shows matching levels in the Defaulter and Prediction variables.

> str(loan)
'data.frame': 15 obs. of 3 variables:
 $ Customer : Factor w/ 15 levels "A","B","C",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Defaulter : Factor w/ 2 levels "0","1": 2 2 1 2 1 1 1 2 2 1 ...
 $ Prediction: Factor w/ 2 levels "0","1": 2 1 1 2 2 1 1 2 1 1 ...
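As the str output suggests, relabelling only changes the level names; the underlying integer codes stay the same. The sketch below illustrates this on a hypothetical toy factor (pred is not the real Prediction column) and shows one alternative way to do the same relabelling with factor(), which may be preferable when the factor is being built from scratch.

```r
# Toy predictions coded 1/2 (assumption: illustrative values, not the real data).
pred <- factor(c(2, 1, 1, 2, 2, 1, 1, 2, 1, 1), levels = c(1, 2))

# Option A: rename the levels in place; each list entry is new_label = old_label.
pred_a <- pred
levels(pred_a) <- list("0" = "1", "1" = "2")

# Option B: rebuild the factor, mapping old levels to new labels by position.
pred_b <- factor(pred, levels = c("1", "2"), labels = c("0", "1"))

# Both options carry the same labels, and the integer codes match the original.
labels_agree    <- all(as.character(pred_a) == as.character(pred_b))
codes_unchanged <- all(as.integer(pred_a) == as.integer(pred))
```

Option A is the form used in this post; Option B makes the old-to-new mapping explicit in a single call.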

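The confusionMatrix function now runs without error. Note that caret expects the predicted classes as the first argument (data) and the true classes as the second (reference). The call below passes them in the opposite order, so the rows labelled Prediction in the output hold the Defaulter values and the columns labelled Reference hold the Prediction values.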

> confusionMatrix(loan$Defaulter, loan$Prediction)
Confusion Matrix and Statistics

          Reference
Prediction 0 1
         0 5 2
         1 3 5
                                          
               Accuracy : 0.6667          
                 95% CI : (0.3838, 0.8818)
    No Information Rate : 0.5333          
    P-Value [Acc > NIR] : 0.2201          
                                          
                  Kappa : 0.3363          
 Mcnemar's Test P-Value : 1.0000          
                                          
            Sensitivity : 0.6250          
            Specificity : 0.7143          
         Pos Pred Value : 0.7143          
         Neg Pred Value : 0.6250          
             Prevalence : 0.5333          
         Detection Rate : 0.3333          
   Detection Prevalence : 0.4667          
      Balanced Accuracy : 0.6696          
                                          
       'Positive' Class : 0
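The headline statistics can be checked by hand from the four cell counts. The sketch below copies the counts from the output above into a matrix (cm is a hypothetical name) and recomputes a few of the figures in base R, treating "0" as the positive class as the output does.

```r
# Counts copied from the printed confusion matrix
# (rows = first argument, columns = second argument of the call).
cm <- matrix(c(5, 3, 2, 5), nrow = 2,
             dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1")))

# Accuracy: agreeing cells over all observations.
accuracy <- sum(diag(cm)) / sum(cm)

# Sensitivity: share of the reference "0" column that falls in row "0".
sensitivity <- cm["0", "0"] / sum(cm[, "0"])

# Specificity: share of the reference "1" column that falls in row "1".
specificity <- cm["1", "1"] / sum(cm[, "1"])

# Balanced accuracy: mean of sensitivity and specificity.
balanced_accuracy <- (sensitivity + specificity) / 2

round(c(accuracy, sensitivity, specificity, balanced_accuracy), 4)

# With caret loaded and the real loan data available, the arguments can be
# named to make their roles explicit and to treat defaulters as positive:
# confusionMatrix(data = loan$Prediction, reference = loan$Defaulter, positive = "1")
```

The rounded values agree with the Accuracy, Sensitivity, Specificity and Balanced Accuracy lines printed by confusionMatrix.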