13 – Introduction to Probability

Probability is another integral subject in analytics. The significance of the variables or features used in a model, of the relationships between variables, and of a statistical model as a whole is reported in terms of p-values, which are themselves probabilities. A sound understanding of probability is key to setting up a hypothesis, building models and interpreting the results.

Definitions & Concepts:

Probability is the likelihood of a certain event occurring. It is a number between 0 and 1, with 0 indicating no chance of the event occurring and 1 indicating a 100% chance of the event occurring. For example, a toss of a fair coin has a 0.5 probability of landing heads.

Experiment is a process that generates a result from a defined set of outcomes. A single execution of an experiment generates exactly one outcome from that set. Examples of experiments are:

  • Coin toss with head & tail as possible outcomes
  • Roll of a die with 1 through 6 as possible outcomes
  • Pick from a deck of cards having 52 possible outcomes.

Sample Space for an experiment is the list of all possible outcomes.

Sample Point is a single element (experimental outcome) in the sample space.

Counting Rules:

While it is easy to list all possible outcomes for small experiments such as the examples above, it is not always necessary to do so, and it becomes impractical as the numbers grow. For probability calculations we often need only the total number of possible outcomes. Below are a few rules that can be used to arrive at that number.

Multi-step experiment: In a multi-step experiment, the total number of possible outcomes is the product of the number of outcomes at each step.

For example, let us consider the coin toss scenario. Two tosses of a coin produce four possible outcomes: {(H,H),(H,T),(T,H),(T,T)}.

Here, the number of possible outcomes in step 1 is 2 (Head or Tail). Similarly, the number of possible outcomes in step 2 is again 2 (Head or Tail). The total number of possible outcomes of this 2-step experiment as per the counting rule is 2*2 = 4.
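
The counting rule above can be checked with a short Python sketch. The variable names (steps, sample_space, count_by_rule) are chosen here for illustration only. The sketch enumerates the sample space for two coin tosses and compares its size with the product of the per-step counts.

```python
from itertools import product

# Outcomes available at each step of the experiment (two coin tosses).
steps = [("H", "T"), ("H", "T")]

# Enumerate the full sample space as the Cartesian product of the steps.
sample_space = list(product(*steps))

# Counting rule: multiply the number of outcomes at each step.
count_by_rule = 1
for outcomes in steps:
    count_by_rule *= len(outcomes)

print(sample_space)
print(count_by_rule, len(sample_space))
```

Adding a third tuple to steps, for example the six faces of a die, extends the same check to a 3-step experiment without listing the outcomes by hand.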

Combinations: When we select r items from a total of n and the order of items does not matter, the number of possible combinations is given by the formula below.

$$C(n, r) = \binom{n}{r} = \frac{n!}{r!\,(n-r)!}$$

Permutations: When we select r items from a total of n and the order of items is important, the number of possibilities is given by the formula below.

$$P(n, r) = r!\,\binom{n}{r} = \frac{n!}{(n-r)!}$$

The combination formula simply gets multiplied by r! in this case as each set of r items can be ordered in r! different ways.
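
As a quick check of the relationship between the two counts, here is a minimal sketch using Python's standard library (math.comb and math.perm need Python 3.8 or later). The values n = 5 and r = 2 and the item labels are illustrative assumptions, not taken from the post.

```python
import math
from itertools import combinations, permutations

n, r = 5, 2          # illustrative values: choose 2 items out of 5
items = "ABCDE"      # hypothetical labels for the 5 items

n_comb = math.comb(n, r)   # order does not matter: n! / (r! (n - r)!)
n_perm = math.perm(n, r)   # order matters: n! / (n - r)!

# Each combination of r items can be ordered in r! ways.
assert n_perm == n_comb * math.factorial(r)

# Cross-check the formulas by listing the selections explicitly.
assert len(list(combinations(items, r))) == n_comb
assert len(list(permutations(items, r))) == n_perm

print(n_comb, n_perm)
```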

Probability Assignment:

Probabilities can be assigned to experimental outcomes by three different methods – Classical, Relative Frequency & Subjective.

Rules: Two rules need to always be satisfied in probability assignment.

  1. The probability of any particular outcome is always a value between 0 and 1 (inclusive): $0 \le P(E_i) \le 1$ for all $i$, where $E_i$ is the ith experimental outcome and $P(E_i)$ is the probability of the ith experimental outcome.
  2. The sum of the probabilities of all experimental outcomes in a sample space is 1: $P(E_1) + P(E_2) + \dots + P(E_n) = 1$, where there are n possible outcomes.

Classical Method: When there are n possible experimental outcomes and all of them are equally likely to occur, a probability of 1/n is assigned to each outcome. This is the classical method. In the coin toss scenario, head & tail are equally likely outcomes when the coin is fair, so each gets a probability of ½.

Relative Frequency Method: When we have historic data that tells us how many times a particular outcome has occurred, the probability of that outcome is estimated as (number of occurrences / total number of executions). For example, if a cricket player has scored 5 centuries in 20 matches, the estimated probability of him scoring a century in the upcoming match is 5/20 or 0.25.
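
The classical and relative frequency methods can be sketched in a few lines of Python. The function names classical_probability and relative_frequency are hypothetical helpers written for this example; they are not part of any library.

```python
from fractions import Fraction


def classical_probability(n_outcomes: int) -> Fraction:
    """Classical method: each of n equally likely outcomes gets 1/n."""
    if n_outcomes <= 0:
        raise ValueError("n_outcomes must be positive")
    return Fraction(1, n_outcomes)


def relative_frequency(occurrences: int, trials: int) -> Fraction:
    """Relative frequency method: occurrences / total executions."""
    if trials <= 0 or not 0 <= occurrences <= trials:
        raise ValueError("need 0 <= occurrences <= trials and trials > 0")
    return Fraction(occurrences, trials)


p_head = classical_probability(2)        # fair coin: 2 equally likely outcomes
p_century = relative_frequency(5, 20)    # 5 centuries in 20 matches

print(p_head, float(p_century))
```

Fractions are used instead of floats so that the results stay exact. Note that the relative frequency value is only an estimate based on past data, and it will change as more matches are recorded.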

Subjective Method: When the experimental outcomes are not all equally likely and there is no historic data available for relative frequency calculations, we can rely on experience or intuition. Probabilities assigned this way are subjective, so different individuals may arrive at different values. The classical and relative frequency methods automatically satisfy the two rules discussed above. With the subjective method, care must be taken to assign the probabilities in such a way that both rules are satisfied.
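
One way to guard a subjective assignment against mistakes is to test it against the two rules directly. The sketch below is an illustration only: is_valid_assignment is a hypothetical helper, and the weather probabilities are made-up values standing in for a decision maker's judgement.

```python
import math


def is_valid_assignment(probabilities: dict, tol: float = 1e-9) -> bool:
    """Check the two rules of probability assignment.

    Rule 1: every probability lies between 0 and 1 (inclusive).
    Rule 2: the probabilities sum to 1 (within a small float tolerance).
    """
    values = list(probabilities.values())
    in_range = all(0.0 <= p <= 1.0 for p in values)
    sums_to_one = math.isclose(sum(values), 1.0, abs_tol=tol)
    return in_range and sums_to_one


# Hypothetical subjective assignments for tomorrow's weather.
forecast_ok = {"rain": 0.6, "cloudy": 0.3, "sunny": 0.1}
forecast_bad = {"rain": 0.7, "cloudy": 0.3, "sunny": 0.2}  # sums to 1.2

print(is_valid_assignment(forecast_ok))
print(is_valid_assignment(forecast_bad))
```

The tolerance is there because floating-point sums such as 0.6 + 0.3 + 0.1 may not equal 1 exactly.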

Read more about Probability in my next post Introduction to Probability – Part 2.

