11 – Sampling and Estimation

Ideally, we would base our estimates on all the data relevant to a given business problem, since that would give higher accuracy. Often, however, it is not practically feasible to collect and process the entire data set. We may not be able to afford the cost […]

confusionMatrix function in R – The data contain levels not found in the data

The “confusionMatrix” function of the “caret” package threw the error below while validating prediction results in R. Error message: Error in confusionMatrix.default(loan$Defaulter, loan$Prediction) : The data contain levels not found in the data. Reason: This error occurs because the two columns we pass to the confusionMatrix function have different factor levels. R line of code that gives […]
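A minimal base-R sketch of the level mismatch described in this excerpt, and one common way to align the levels. The `loan` data frame below is made up for illustration; only the column names `Defaulter` and `Prediction` come from the error message above. The sketch does not call caret itself, so it shows the mismatch rather than reproducing the exact error.

```r
# Hypothetical data (assumption): the model never predicted class "1",
# so Prediction ends up with fewer factor levels than Defaulter.
loan <- data.frame(
  Defaulter  = factor(c(0, 1, 0, 0, 1)),
  Prediction = factor(c(0, 0, 0, 0, 0))
)

levels(loan$Defaulter)   # two levels
levels(loan$Prediction)  # one level only

# Rebuild both columns on one shared set of levels.
all_levels <- union(levels(loan$Defaulter), levels(loan$Prediction))
loan$Defaulter  <- factor(loan$Defaulter,  levels = all_levels)
loan$Prediction <- factor(loan$Prediction, levels = all_levels)

# The cross-tabulation now has a row and a column for every level.
tab <- table(Predicted = loan$Prediction, Actual = loan$Defaulter)
print(tab)
```

Once both columns share the same levels, the two factors can be passed to a confusion-matrix function without a level mismatch.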

10 – Descriptive Statistics – Numeric Variable

In the previous post we saw the different distributions and charts available to summarize categorical variables. Similar distributions and charts are available for numeric variables, which we will see in this post. Frequency Distribution: As with categorical variables, frequency distributions can be created for quantitative variables too. In the case of categorical variables […]

9 – Descriptive Statistics – Categorical Variable

Before getting into statistical modelling and more detailed analytics, it is important to understand the data and its distribution at a more basic level. Below are some distributions and plots that will help us understand the categorical variables in our data set. These are called Descriptive Statistics. Frequency Distribution: Assume a […]