“prediction” function in R – Number of cross-validation runs must be equal for predictions and labels

A blog reader reached out to me about an error he faced with the prediction function while trying to plot an ROC curve. This post captures the error and the fix. Error message: Error in prediction(Traindata$predict.score, Traindata$Subscribe) : Number of cross-validation runs must be equal for predictions and labels. Reason: R code segment that gives error […]
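The post's own reason and fix are truncated here, so the sketch below is only a hedged illustration. To my knowledge, the `prediction()` function that raises this message comes from the ROCR package. It treats each column of a matrix or data frame, or each element of a list, as a separate cross-validation run, and a plain vector counts as one run. If predictions and labels describe a different number of runs, this error results. One way that can happen (an assumption, not necessarily the reader's case) is passing a two-column matrix of class probabilities as predictions alongside a single label vector. The helper `count_runs()` is hypothetical, written in base R, and only mimics that counting rule so the mismatch can be checked before calling `prediction()`.

```r
# Hypothetical helper (not part of ROCR): count how many "runs" an input
# represents under the rule described above. A data frame is also a list,
# so it is checked first and counted by columns.
count_runs <- function(x) {
  if (is.data.frame(x) || is.matrix(x)) {
    ncol(x)
  } else if (is.list(x)) {
    length(x)
  } else {
    1L
  }
}

# Hypothetical scores: one column of probabilities per class (class 0, class 1)
scores <- matrix(c(0.2, 0.8,
                   0.6, 0.4,
                   0.9, 0.1),
                 ncol = 2, byrow = TRUE)
labels <- c(1, 0, 0)

count_runs(scores)  # 2 runs
count_runs(labels)  # 1 run, so the two inputs do not line up

# Keep a single score column (probability of class 1) so both inputs describe one run
scores_fixed <- scores[, 2]
stopifnot(count_runs(scores_fixed) == count_runs(labels),
          length(scores_fixed) == length(labels))
```

With ROCR installed, the corrected call would then look like `ROCR::prediction(scores_fixed, labels)`.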

12 – Types of Sampling

In the previous post on Sampling and Estimation, we were introduced to some important sampling terms and concepts, and to the types of estimation used in the sampling approach. In this post we will delve into the types of sampling and the pros and cons of each approach. Probability Sampling: Samples selected through probability sampling techniques listed below […]
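Simple random sampling and stratified sampling are two widely used probability sampling techniques. The post's own list is truncated here, so this sketch does not claim to match it. It draws both kinds of sample from a small hypothetical population in base R: in the simple random sample every unit has an equal chance of selection, while the stratified sample draws a fixed number of units from each region.

```r
set.seed(42)  # make the random draws reproducible

# Hypothetical population of 100 customers spread evenly over 4 regions
population <- data.frame(
  id = 1:100,
  region = rep(c("North", "South", "East", "West"), each = 25)
)

# Simple random sampling: 10 units drawn without replacement
srs <- population[sample(nrow(population), 10), ]

# Stratified sampling: split by region, then draw 3 units from each stratum
strata <- split(population, population$region)
stratified <- do.call(rbind, lapply(strata, function(s) {
  s[sample(nrow(s), 3), ]
}))

table(stratified$region)  # 3 units per region
```

Stratification guarantees that every region is represented, whereas a simple random sample of 10 could, by chance, miss a region entirely.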

11 – Sampling and Estimation

Ideally, we would base our estimates on all the data relevant to a given business problem, since that would give the highest accuracy. Often, however, it is not practically feasible to collect and process the entire data. We may not be able to afford the cost […]
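To make the idea of estimating from a sample concrete, here is a minimal sketch on simulated data; the population, its size, and the sample size of 400 are all hypothetical. It estimates the population mean from a simple random sample and attaches an approximate 95% confidence interval using the normal approximation, mean ± 1.96 × standard error.

```r
set.seed(1)

# Hypothetical population we normally could not afford to measure in full
population <- rnorm(100000, mean = 50, sd = 10)

# Measure only a random sample of 400 units
s <- sample(population, 400)

estimate <- mean(s)                             # point estimate of the population mean
se <- sd(s) / sqrt(length(s))                   # standard error of the sample mean
ci <- estimate + c(-1, 1) * qnorm(0.975) * se   # approximate 95% interval

c(estimate = estimate, lower = ci[1], upper = ci[2])
mean(population)  # the "true" value, known here only because the data are simulated
```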

“confusionMatrix” function in R – The data contain levels not found in the data

The “confusionMatrix” function of the “caret” package threw the error below while I was validating prediction results in R. Error message: Error in confusionMatrix.default(loan$Defaulter, loan$Prediction) : The data contain levels not found in the data. Reason: This error comes up because the two columns we feed into the confusion matrix function have different levels. R line of code that gives […]
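The reason given above points to the fix: make both columns factors with the same set of levels before building the matrix. The sketch below uses hypothetical `actual` and `predicted` vectors and base R only. The predictions never contain "Yes", so the predicted factor lacks that level, which is exactly the mismatch described. Note also that caret documents `confusionMatrix(data, reference)` with the predicted classes as the first argument and the true classes as the second.

```r
# Hypothetical labels: the model never predicted "Yes"
actual    <- factor(c("Yes", "No", "No", "Yes", "No"))
predicted <- factor(c("No", "No", "No", "No", "No"))

levels(actual)     # "No" "Yes"
levels(predicted)  # "No" only: the levels differ

# Re-level both factors on the union of their levels
common    <- union(levels(actual), levels(predicted))
actual    <- factor(actual, levels = common)
predicted <- factor(predicted, levels = common)

# Both factors now share the same levels, so a 2 x 2 table can be built
tab <- table(Predicted = predicted, Reference = actual)
tab
```

With caret installed, the equivalent call would be `caret::confusionMatrix(data = predicted, reference = actual)`.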

10 – Descriptive Statistics – Numeric Variable

In the previous post we saw the distributions and charts available to summarize categorical variables. Similar distributions and charts exist for numeric variables, and we will look at them in this post. Frequency Distribution: As with categorical variables, frequency distributions can be created for quantitative variables too. In the case of categorical variables […]
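For a numeric variable, a frequency distribution first needs the values grouped into class intervals. A minimal sketch in base R, using hypothetical ages and an interval width of 10: `cut()` assigns each value to an interval, `table()` counts them, and `prop.table()` and `cumsum()` give the relative and cumulative frequencies.

```r
# Hypothetical ages of 10 customers
ages <- c(23, 35, 41, 29, 52, 47, 33, 38, 61, 26)

# Group into intervals [20,30), [30,40), ..., [60,70)
bins <- cut(ages, breaks = seq(20, 70, by = 10), right = FALSE)

freq    <- table(bins)         # frequency per interval
rel     <- prop.table(freq)    # relative frequency (proportion)
cumfreq <- cumsum(freq)        # cumulative frequency

data.frame(interval   = names(freq),
           frequency  = as.vector(freq),
           relative   = as.vector(rel),
           cumulative = as.vector(cumfreq))
```

`hist(ages, breaks = seq(20, 70, by = 10), right = FALSE)` would draw the matching histogram.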

9 – Descriptive Statistics – Categorical Variable

Before getting into statistical modelling and more detailed analytics, it is important to understand the data and its distribution at a basic level. Below are some distributions and plots that help us understand the categorical variables in our data set; these summaries are called Descriptive Statistics. Frequency Distribution: Assume a […]
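A frequency distribution for a categorical variable counts how often each category occurs. A minimal sketch in base R with a hypothetical customer-segment variable: `table()` gives the counts and `prop.table()` turns them into proportions.

```r
# Hypothetical customer segments
segment <- c("Retail", "Corporate", "Retail", "SME",
             "Retail", "SME", "Corporate", "Retail")

freq <- table(segment)            # counts per category (alphabetical order)
rel  <- prop.table(freq)          # relative frequencies
pct  <- round(100 * rel, 1)       # percentages

freq
pct
```

`barplot(freq)` then draws the corresponding bar chart.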