The “confusionMatrix” function of the “caret” package threw the error below while I was validating prediction results in R.
Error message:
Error in confusionMatrix.default(loan$Defaulter, loan$Prediction) :
The data contain levels not found in the data.
Reason:
This error comes up because the two columns we feed into the confusionMatrix function have different factor levels.
The line of R code that gives the error is:
> confusionMatrix(loan$Defaulter, loan$Prediction)
Running “str” on the data frame reveals the following.
> str(loan)
'data.frame': 15 obs. of 3 variables:
 $ Customer  : Factor w/ 15 levels "A","B","C",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Defaulter : Factor w/ 2 levels "0","1": 2 2 1 2 1 1 1 2 2 1 ...
 $ Prediction: Factor w/ 2 levels "1","2": 2 1 1 2 2 1 1 2 1 1 ...
The Defaulter variable in the loan dataset has levels 0 and 1, where 0 denotes a non-defaulter and 1 denotes a defaulter.
The Prediction variable, which is created and populated by our R code, has levels 1 and 2, where 1 denotes a non-defaulter and 2 denotes a defaulter.
Due to this mismatch, a confusion matrix cannot be created.
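A quick way to confirm this kind of mismatch before calling confusionMatrix is to compare the level sets directly. The sketch below uses base R only; the two vectors are small stand-ins built from the first five rows of the “str” output above, and the variable names are made up for illustration.

```r
# Stand-ins for loan$Defaulter and loan$Prediction (first five rows only)
actual    <- factor(c("1", "1", "0", "1", "0"), levels = c("0", "1"))
predicted <- factor(c("2", "1", "1", "2", "2"), levels = c("1", "2"))

# TRUE only when both factors have the same labels in the same order
same_levels <- identical(levels(actual), levels(predicted))

# Labels that appear on one side but not on the other
only_in_actual    <- setdiff(levels(actual), levels(predicted))
only_in_predicted <- setdiff(levels(predicted), levels(actual))

print(same_levels)        # FALSE
print(only_in_actual)     # "0"
print(only_in_predicted)  # "2"
```

Any label reported by setdiff is one that the other factor does not know about, which is the condition that triggers the error.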
Fix:
We have the required information, but it is denoted by mismatching labels.
The levels of the Prediction variable are changed to 0 and 1 using “levels”, as below.
> levels(loan$Prediction) <- list("0" = "1", "1" = "2")
After this level correction, “str” on the data frame shows matching levels in the Defaulter and Prediction variables.
> str(loan)
'data.frame': 15 obs. of 3 variables:
 $ Customer  : Factor w/ 15 levels "A","B","C",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Defaulter : Factor w/ 2 levels "0","1": 2 2 1 2 1 1 1 2 2 1 ...
 $ Prediction: Factor w/ 2 levels "0","1": 2 1 1 2 2 1 1 2 1 1 ...
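Note that the integer codes for Prediction (2 1 1 2 2 ...) are the same before and after the fix; only the labels attached to them changed. In the named list passed to “levels”, each name is the new label and each value is the old label it replaces. A minimal base R sketch of that behaviour, using the first five predictions from the output above as a stand-in vector:

```r
# First five predictions from the str() output (labels "1" and "2")
prediction   <- factor(c("2", "1", "1", "2", "2"), levels = c("1", "2"))
codes_before <- as.integer(prediction)

# Named list: name = new label, value = old label
levels(prediction) <- list("0" = "1", "1" = "2")
codes_after <- as.integer(prediction)

print(levels(prediction))                     # "0" "1"
print(as.character(prediction))               # "1" "0" "0" "1" "1"
print(identical(codes_before, codes_after))   # TRUE
```

Because the mapping is written out explicitly, it is easy to reverse if the labels turn out to be assigned the other way around.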
The confusionMatrix function now produces the desired result.
> confusionMatrix(loan$Defaulter, loan$Prediction)
Confusion Matrix and Statistics

          Reference
Prediction 0 1
         0 5 2
         1 3 5

               Accuracy : 0.6667
                 95% CI : (0.3838, 0.8818)
    No Information Rate : 0.5333
    P-Value [Acc > NIR] : 0.2201

                  Kappa : 0.3363
 Mcnemar's Test P-Value : 1.0000

            Sensitivity : 0.6250
            Specificity : 0.7143
         Pos Pred Value : 0.7143
         Neg Pred Value : 0.6250
             Prevalence : 0.5333
         Detection Rate : 0.3333
   Detection Prevalence : 0.4667
      Balanced Accuracy : 0.6696

       'Positive' Class : 0
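As a sanity check, the headline figures can be recomputed by hand from the four counts in the table. The sketch below rebuilds the 2x2 table in base R (the object name cm is made up here) and treats "0" as the positive class, as the output does. One thing to keep in mind when reading the table: caret names the first argument of confusionMatrix `data` (the predicted classes) and the second `reference` (the observed classes), so with the call used above the rows hold the Defaulter values and the columns hold the Prediction values. Accuracy is unaffected by the order, but sensitivity and specificity would change if the arguments were swapped.

```r
# Counts copied from the printed table (rows = first argument, columns = second)
cm <- matrix(c(5, 3, 2, 5), nrow = 2,
             dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1")))

accuracy    <- sum(diag(cm)) / sum(cm)         # (5 + 5) / 15
sensitivity <- cm["0", "0"] / sum(cm[, "0"])   # 5 / 8, positive class is "0"
specificity <- cm["1", "1"] / sum(cm[, "1"])   # 5 / 7

print(round(c(accuracy, sensitivity, specificity), 4))
```

The rounded values agree with the Accuracy, Sensitivity and Specificity lines reported by confusionMatrix.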
Hello Arun,
I am getting an error in R as follows:
> confusionMatrix(as.factor(predicted), as.factor(Netflix.train$type))
Error in confusionMatrix.default(as.factor(predicted), as.factor(Netflix.train$type)) :
The data must contain some levels that overlap the reference.
I checked the levels and they are different. How do I change them?
> levels(as.factor(Netflix.train$type))
"1" "2"
> levels(as.factor(predicted))
"Movie" "TVShow"
Thanks,
Mohamed Asfar
Hi Mohamed,
Do you have a line of code that converts predicted probabilities into labels? I believe you must be passing the labels “Movie” and “TVShow” somewhere in your R code, assigning them to certain probability ranges.
Let me assume 1 maps to Movie and 2 maps to TVShow. Please try out the following command. You can just swap the values if it is the other way around.
> predicted <- as.factor(predicted)
> levels(predicted) <- list("1" = "Movie", "2" = "TVShow")
Then try your confusion matrix again.
> confusionMatrix(as.factor(predicted), as.factor(Netflix.train$type))
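If predicted is a plain character vector rather than a factor, another option is to build the factor and relabel it in one step with factor(). This is only a sketch: the three values below are made up, and the mapping of Movie to 1 and TVShow to 2 is the same assumption as above.

```r
# Hypothetical values; the real vectors come from your model and data
predicted <- c("Movie", "TVShow", "Movie")   # character, not yet a factor
reference <- factor(c("1", "2", "2"), levels = c("1", "2"))

# Assumed mapping: "Movie" -> "1", "TVShow" -> "2"
predicted <- factor(predicted, levels = c("Movie", "TVShow"), labels = c("1", "2"))

print(identical(levels(predicted), levels(reference)))   # TRUE
print(table(predicted, reference))
```

Once the two level sets are identical, both factors can be passed to confusionMatrix as they are.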