4 – Central Tendency

A measure of central tendency describes the middle point around which the data is spread. It is one of the basic measures in descriptive statistics and helps us understand the data at hand before proceeding to more complex analysis.

Described below are some common methods for computing central tendency and the scenarios in which each should be applied.

Mean:

Mean refers to the Arithmetic Mean or Average. It is the sum of all values of a variable divided by the number of observations.

Imagine the variable “x” contains the ages of students in a training session.

x = {17,13,19,12,18,19,14}

The mean for this set is

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

which is 112/7 = 16.

The mean is the most commonly used measure of central tendency for numeric variables. It is most appropriate when the values are spread fairly evenly around the center, without extreme values.
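As a quick check of the arithmetic above, here is a minimal Python sketch. It assumes only the standard library's `statistics` module; the variable names are illustrative and not part of the original example.

```python
from statistics import mean

# Ages of the students from the example above
x = [17, 13, 19, 12, 18, 19, 14]

# Arithmetic mean by definition: sum of values / number of observations
mean_by_hand = sum(x) / len(x)

# The standard library computes the same quantity
mean_stdlib = mean(x)

print(sum(x), len(x))   # the 112 and 7 used in the text
print(mean_by_hand)
print(mean_stdlib)
```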

Median:

Median is the value that occurs in the middle after all values are arranged in sorted order. When the number of values is even, the median is taken as the average of the two middle values.

In the example

x = {17,13,19,12,18,19,14}

the values are arranged in sorted order as below

{12,13,14,17,18,19,19}

and the value that occurs at the center is 17.
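The sort-then-pick-the-middle procedure can be sketched in Python as below. This is only an illustration for a list with an odd number of values, as in the example; `statistics.median` from the standard library is shown alongside for comparison.

```python
from statistics import median

# Ages of the students from the example above
x = [17, 13, 19, 12, 18, 19, 14]

# Step 1: arrange the values in sorted order
ordered = sorted(x)

# Step 2: pick the value at the center (valid as written for an odd count only)
middle_index = len(ordered) // 2
median_by_hand = ordered[middle_index]

# The standard library handles both odd and even counts
median_stdlib = median(x)

print(ordered)
print(median_by_hand, median_stdlib)
```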

Median is generally used for variables measured on at least an ordinal scale, since computing it only requires that the values can be ordered. The mean cannot be computed for an ordinal variable because it is categorical.

With numeric variables, the mean is sensitive to extreme values (outliers). For example, suppose a variable contains the salaries of employees in a company and we want to calculate the mean salary. The resulting mean will be pulled considerably higher by the salary of the CEO, so it will not be a true indicator of the central value of employee salaries. In such cases, where the distribution of the data is skewed, the median is a better measure of central tendency than the mean.

The disadvantage of the median is that it is determined by position alone, so the information available in the other values is not captured. The mean, on the other hand, uses the information in every value.
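To make the salary scenario concrete, here is a small sketch. The salary figures are invented purely for illustration (they do not come from any real company or from the original post); the last value plays the role of the CEO.

```python
from statistics import mean, median

# Hypothetical salaries, made up for illustration; the last one is the "CEO"
salaries = [30000, 32000, 35000, 38000, 40000, 1000000]

mean_salary = mean(salaries)      # pulled upward by the single extreme value
median_salary = median(salaries)  # depends only on the middle positions

# Dropping the outlier changes the mean a lot but the median only slightly
mean_without_ceo = mean(salaries[:-1])
median_without_ceo = median(salaries[:-1])

print(mean_salary, median_salary)
print(mean_without_ceo, median_without_ceo)
```

In this made-up data the mean is larger than every salary except the CEO's, while the median stays among the typical salaries.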

Mode:

Mode is the most commonly occurring value of a given variable.

Taking the same example

x = {17,13,19,12,18,19,14}

the value 19 occurs twice and every other value occurs once. So, the mode for this set is 19.

When a categorical variable is measured on a nominal scale, neither the mean nor the median can be computed. Mode is a good measure of central tendency in this case.

If two different values share the highest frequency, the data is called bimodal. If more than two values share the highest frequency, it is called multimodal.
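A short Python sketch of the mode, using the standard library's `statistics.mode` and `statistics.multimode` (the latter requires Python 3.8 or newer). The `colors` list is a hypothetical nominal variable added only to show a bimodal case.

```python
from statistics import mode, multimode

# Ages of the students from the example above
x = [17, 13, 19, 12, 18, 19, 14]

single_mode = mode(x)      # the most frequent value
all_modes = multimode(x)   # list of every value sharing the top frequency

# Hypothetical nominal data where two categories tie for the top frequency
colors = ["red", "blue", "red", "green", "blue"]
color_modes = multimode(colors)

print(single_mode, all_modes)
print(color_modes)
```

`multimode` returns the tied values in the order they first appear, so its length tells us whether the data is unimodal, bimodal or multimodal.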

Weighted Mean:

When the values of a variable carry different weights, they cannot be averaged directly. Weights need to be applied to get a meaningful mean.

For example, suppose we buy the plots of land listed below.

  • 2400 square feet of land at 200 rupees per square foot
  • 4000 square feet of land at 250 rupees per square foot
  • 3600 square feet of land at 300 rupees per square foot

What is the average rate per square foot at which we bought our total of 10000 square feet of property?

To answer this question we cannot take a straight average of just the rates, because we bought different proportions of our property at different rates. The sizes of the plots act as the weights in the average computation.

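The mean rate of purchase is

$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

where x_i is the rate and w_i is the area bought at that rate,

which is ((200 * 2400) + (250 * 4000) + (300 * 3600)) / 10000 = 256 rupees per square foot.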
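The same computation can be sketched in Python. The list names `areas` and `rates` are illustrative; the straight average of the rates is included only to show how it differs from the weighted result.

```python
# Land purchases from the example above
areas = [2400, 4000, 3600]   # square feet (these act as the weights)
rates = [200, 250, 300]      # rupees per square foot

# Total amount paid: each rate multiplied by the area bought at that rate
total_cost = sum(rate * area for rate, area in zip(rates, areas))

# Weighted mean: total cost divided by total area
total_area = sum(areas)
weighted_mean_rate = total_cost / total_area

# Straight (unweighted) average of the rates, for comparison
simple_mean_rate = sum(rates) / len(rates)

print(total_area, total_cost)
print(weighted_mean_rate, simple_mean_rate)
```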

Geometric Mean:

Geometric mean is applied in scenarios such as financial growth rates. The arithmetic mean is applicable only in additive scenarios. Because growth compounds multiplicatively, the geometric mean is the appropriate measure here.

For example, we invest Rs. 100 in a share whose value changes at the rates given below over 3 years.

  • 10% in year 1
  • -5% in year 2
  • 6% in year 3

The final value of the investment will be 100 * 1.10 * 0.95 * 1.06 = 110.77

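The mean growth factor per year is

$$G = \left( \prod_{i=1}^{n} (1 + r_i) \right)^{1/n}$$

where r_i is the growth rate in year i,

which is the 3rd root of (1.10 * 0.95 * 1.06) = 1.0347

or a growth rate of about 3.47% per year.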

The arithmetic mean of the three rates is (10 - 5 + 6) / 3 = 3.67%, which is inflated and not a true indicator of the yearly growth rate.
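The comparison between the two means can be checked with a short sketch. It assumes Python 3.8 or newer for `statistics.geometric_mean`; the variable names are illustrative. Note that the geometric mean is taken over the growth factors (1 + rate), not over the rates themselves.

```python
from statistics import geometric_mean, mean

# Yearly growth rates from the example above, written as fractions
rates = [0.10, -0.05, 0.06]

# Convert each rate to a growth factor (1 + rate) before averaging
factors = [1 + r for r in rates]

# Geometric mean of the factors: the equivalent constant yearly factor
yearly_factor = geometric_mean(factors)

# Compounding that constant factor for 3 years reproduces the final value
final_value = 100 * yearly_factor ** len(factors)

# Arithmetic mean of the rates, for comparison
arithmetic_rate = mean(rates)

print(f"geometric mean rate:  {yearly_factor - 1:.2%}")
print(f"arithmetic mean rate: {arithmetic_rate:.2%}")
print(f"final value:          {final_value:.2f}")
```

Compounding the arithmetic mean rate for three years would overshoot the actual final value, which is why it overstates the yearly growth.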

Mean, Median and Mode are the three most common measures of central tendency that are used in analytics.
