Understanding the different types of data is fundamental to applying statistics. The variables in a dataset can belong to different scales, or levels, of measurement. The scale a variable belongs to determines which summaries and statistical methods can be applied to it.
A variable can belong to one of these four scales of measurement – Nominal, Ordinal, Interval & Ratio.
Scales of Measurement
The nominal scale is the least informative of the measurement scales. As the literal meaning of the word “nominal” suggests, this data exists in name only. It can only be used to identify which category or segment an observation belongs to.
For example, a school organizes a sports event by dividing the students into houses called Blue, Green, Red & Yellow.
By looking at a list of student names and their houses, we cannot tell which group is the strongest and which is the weakest. We cannot rank the observations or perform arithmetic on this variable; the most we can do is count how many observations fall into each category.
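To make this concrete, here is a minimal Python sketch of what can be done with a nominal variable. The student names and house assignments are made up for illustration. Counting the members of each house and finding the most frequent house (the mode) are meaningful; an average or a ranking of houses is not.

```python
from collections import Counter

# Hypothetical roster: student name -> house (illustrative data only).
houses = {
    "Asha": "Blue",
    "Ben": "Green",
    "Chen": "Red",
    "Dina": "Blue",
    "Eli": "Yellow",
}

# Counting observations per category is meaningful for nominal data.
counts = Counter(houses.values())

# The mode (most frequent category) is the only sensible "typical value".
mode_house, mode_count = counts.most_common(1)[0]

print(counts)
print(mode_house, mode_count)
```

Note that nothing here compares "Blue" with "Green"; the labels are only used to group the observations.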
The ordinal scale is one level higher than the nominal scale. An ordinal variable has all the characteristics of nominal data and, in addition, a ranking, as the word “ordinal” suggests.
For example, suppose we take an online learning course and, at the end, are asked to fill out a survey rating our experience. One survey question offers five responses ranging from “strongly disagree” to “strongly agree”.
This is similar to nominal data to the extent that the user experience is tagged against one of the five defined categories. In addition, the responses can be ranked, with “strongly agree” being the best experience and “strongly disagree” being the worst. However, the distance between “strongly disagree” and “disagree” need not be equal to the distance between “agree” and “strongly agree”.
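A small Python sketch of working with ordinal responses follows. The label of the middle category ("neutral") and the sample responses are assumptions made for illustration. Because the categories have an order, sorting and taking a median are meaningful, while averaging the rank numbers would silently assume equal distances between categories.

```python
import statistics

# Response labels in rank order, worst to best.
# "neutral" is an assumed label for the middle category.
LEVELS = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
RANK = {label: position for position, label in enumerate(LEVELS)}

# Hypothetical survey responses (illustrative data only).
responses = ["agree", "strongly agree", "disagree", "agree", "neutral"]

# Sorting by rank is meaningful because the categories are ordered.
ordered = sorted(responses, key=RANK.__getitem__)

# The median only relies on order, so it is a valid summary.
# median_low always returns a value that actually occurs in the data.
median_rank = statistics.median_low(RANK[r] for r in responses)
median_response = LEVELS[median_rank]

print(ordered)
print(median_response)
```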
The interval scale is one level higher than the ordinal scale. Ordinal data can only be ordered, and the differences between successive categories need not be equal. With the interval scale, as the name suggests, the differences between values are measurable and comparable.
Let me take the most common example you will find on this topic.
Assume the temperatures on three successive days were recorded as 10, 20 & 30 °C respectively. The difference between the first and second days is 10 degrees, which is the same as the difference between the second and third days. The intervals can be measured and compared.
However, it is not possible to say the second day was twice as hot as the first day. The reason is that the Celsius scale does not start from an absolute zero. Its zero is a convenient, arbitrary reference point.
In other words, we cannot say there is no temperature when the thermometer reads 0 °C. 0 °C is equal to 32 °F and 273.15 K, so there is some temperature at that point. The Celsius scale simply uses the freezing point of water as its reference and denotes it as 0 °C. For this reason, multiplying or dividing such values does not yield meaningful results, while adding and subtracting them does.
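The point about the arbitrary zero can be checked with a few lines of Python. This sketch uses the three temperatures from the example and the standard conversion formulas; the function and variable names are mine. The differences between days stay equal in any unit, but the "twice as hot" ratio only appears in Celsius and changes as soon as the unit changes.

```python
def celsius_to_fahrenheit(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvin."""
    return c + 273.15


day1_c, day2_c, day3_c = 10.0, 20.0, 30.0

# Differences are meaningful: both intervals are 10 degrees Celsius.
diff_12 = day2_c - day1_c
diff_23 = day3_c - day2_c

# The same intervals are also equal to each other in Fahrenheit.
diff_12_f = celsius_to_fahrenheit(day2_c) - celsius_to_fahrenheit(day1_c)
diff_23_f = celsius_to_fahrenheit(day3_c) - celsius_to_fahrenheit(day2_c)

# Ratios are not meaningful: the answer depends on the unit chosen.
ratio_c = day2_c / day1_c
ratio_f = celsius_to_fahrenheit(day2_c) / celsius_to_fahrenheit(day1_c)
ratio_k = celsius_to_kelvin(day2_c) / celsius_to_kelvin(day1_c)

print(diff_12, diff_23, diff_12_f, diff_23_f)
print(ratio_c, ratio_f, ratio_k)
```

If day two were really "twice as hot", all three ratios would agree. They do not, which is why ratios are avoided on interval data.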
The ratio level is the highest of all measurement levels. It has all the properties of the interval level and, in addition, an absolute zero. So, once again as the name “ratio” implies, relative measurements or ratios are meaningful. Multiplication and division can be applied to such data.
Examples include Age, Height, Weight, Income and so on.
A person aged 20 years can be considered twice as old as another aged 10 years. Someone 50 cm tall is half as tall as someone 100 cm tall. Similarly, Weight and Income can be subjected to mathematical operations like addition, subtraction, multiplication and division.
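As a quick check, the Python sketch below takes the two heights from the example and converts them to inches. The names are illustrative. Because a height of zero means "no height" in every unit, the ratio survives the unit change, which is exactly what fails for Celsius temperatures.

```python
import math

CM_PER_INCH = 2.54  # exact by definition of the inch


def cm_to_inches(cm):
    """Convert a length in centimetres to inches."""
    return cm / CM_PER_INCH


short_cm, tall_cm = 50.0, 100.0

# The ratio in the original unit.
ratio_cm = short_cm / tall_cm

# The same ratio after converting both heights to inches.
ratio_in = cm_to_inches(short_cm) / cm_to_inches(tall_cm)

# Compare with a tolerance to allow for floating-point rounding.
same_ratio = math.isclose(ratio_cm, ratio_in)

print(ratio_cm, ratio_in, same_ratio)
```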
Types of Variables
Broadly, the four levels of measurement discussed above can be grouped into two types of variables – Categorical & Numeric.
A Categorical Variable is one which contains nominal or ordinal level data. This data can also be referred to as qualitative data or non-metric data. Statistical methods applied to categorical variables are typically non-parametric, and they are limited compared to what can be done with numeric variables.
Templates used to collect and store data often use numeric values to denote categories. At the start of an analytics exercise, we need to take care to identify variables that contain numbers but are actually categorical. Serial numbers, customer IDs and sports jersey numbers are examples where even numeric data should be interpreted as categorical.
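Here is a minimal Python sketch of this pitfall, using made-up jersey numbers. The language will happily compute a mean of the numbers, but the result describes nothing real. Converting the values to text labels makes the categorical intent explicit and leaves counting as the natural operation.

```python
import statistics
from collections import Counter

# Hypothetical jersey numbers seen across several matches (illustrative only).
jersey_numbers = [7, 10, 10, 23, 7, 10]

# This runs without error, yet an "average jersey number" has no meaning.
meaningless_mean = statistics.mean(jersey_numbers)

# Treat the numbers as labels instead: convert to strings and count.
jersey_labels = [str(number) for number in jersey_numbers]
label_counts = Counter(jersey_labels)
most_common_label = label_counts.most_common(1)[0][0]

print(meaningless_mean)
print(label_counts)
print(most_common_label)
```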
A Numeric Variable is one which contains interval or ratio level data. This data can also be referred to as quantitative or metric data. Parametric statistical methods can be applied to this type of data.
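The grouping described in this section is small enough to write down as a lookup table. The sketch below is only an illustration; the dictionary and function names are my own.

```python
# Grouping of the four scales into the two variable types.
SCALE_TO_TYPE = {
    "nominal": "categorical",
    "ordinal": "categorical",
    "interval": "numeric",
    "ratio": "numeric",
}


def variable_type(scale):
    """Return 'categorical' or 'numeric' for a scale name (case-insensitive)."""
    return SCALE_TO_TYPE[scale.strip().lower()]


print(variable_type("Ordinal"))
print(variable_type("Ratio"))
```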
If this blog felt a little too basic or boring, I can assure you that it will get more interesting as we move along. No matter how simple or complex the Statistical Models you build in the future are, they will still depend squarely on your understanding of these fundamental concepts. Nominal, Ordinal, Interval & Ratio are the four cornerstones of your Analytics foundation.