In the previous post on Sampling and Estimation we were introduced to some important sampling terms and concepts and to the types of estimation that use a sampling approach. In this post we will delve into the types of sampling and the pros and cons of each approach.
Samples selected through the probability sampling techniques listed below result in predictable sampling distributions. Hence, standard formulae can be used to make inferences about population parameters from the derived sample statistics, and statements can be made about the confidence level (or error rate) of those inferences. Researchers are generally advised to use probability sampling techniques wherever feasible, as the results are more reliable.
Simple Random Sampling & Random Sampling:
Simple Random Sampling is a method by which a sample of n elements is selected from a population of N elements in such a way that every possible combination of n elements has an equal chance of being selected. In other words, when the first element is selected for the sample, every element in the population has an equal chance of being selected. Similarly, when the second element is selected, every remaining element in the population has an equal chance. This method ensures the selection is random and free of selection bias, which makes the sample likely to be representative of the population. The total number of possible combinations of n elements that can be selected from a population of size N is given by $\binom{N}{n} = \frac{N!}{n!\,(N-n)!}$.
Simple Random Sampling is applied when working with a finite population (e.g. the employees of an organization).
In the method described above, an element can be selected only once. The second selection is made from the remaining N-1 elements. This is called Sampling Without Replacement. In this approach, there will be no duplicate elements in the sample. Alternatively, the element selected first can be put back into the population before the second element is selected. This way, the same element has a chance of being picked more than once and the resulting sample can have duplicate elements. This approach is called Sampling With Replacement.
What we have discussed so far is the scenario of sampling from a finite population. There are situations where the total population is theoretically infinite or cannot be enumerated. Examples are an ongoing production process or the customers entering a retail store. Random Sampling is the method used in this case. The following conditions need to be met for a sample to be considered a Random Sample.
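The difference between the two approaches can be illustrated with a minimal Python sketch. The population of employee IDs, the sample size and the fixed seed are assumptions made purely for illustration.

```python
import math
import random

# Hypothetical finite population: employee IDs 1..20 (N = 20)
population = list(range(1, 21))
n = 5

# Number of distinct samples of size n, ignoring order: N! / (n! (N - n)!)
num_possible_samples = math.comb(len(population), n)

rng = random.Random(42)  # fixed seed only so that the sketch is repeatable

# Sampling without replacement: an element can appear at most once
without_replacement = rng.sample(population, n)

# Sampling with replacement: every draw is made from the full population,
# so the same element may appear more than once
with_replacement = rng.choices(population, k=n)

print(num_possible_samples)
print(without_replacement)
print(with_replacement)
```

With N = 20 and n = 5 there are 15504 possible samples, and under simple random sampling each of them is equally likely to be the one drawn.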
Each selected element comes from the same population (sampling is done around same points in time ensuring similar working conditions)
Each element is selected independently (the selection of one element doesn’t influence the selection of any other element)
Stratified Random Sampling
Stratified Random Sampling can be considered where the population comprises different groups (strata) and the characteristics of the elements within each group are homogeneous. A single element belongs to one and only one stratum (there is no overlap). Separate random samples are drawn within each stratum. There are formulae available to combine the results of the analysis within each stratum into an estimate for the overall population. Relatively small sample sizes in each stratum will suffice, as the assumption is that the elements within a stratum are similar and there is not much variance in their characteristics. The method is also more beneficial when the heterogeneity between strata is higher.
Examples of strata are employees grouped by job domain, or shoppers grouped by age, sex, socio-economic status, religion, ethnicity and so on.
This sampling method is also useful in situations where we want to make sure the overall sample taken from a given population contains equal representation from each stratum. Another requirement could be to sample from each stratum in proportion to that stratum's share of the population.
Stratified random sampling usually costs more, but increases the accuracy of estimation.
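As a rough sketch of proportional allocation and of combining per-stratum results, the Python below draws a simple random sample within each stratum and weights each stratum's sample mean by its share of the population. The strata, their sizes and values, and the helper names are all hypothetical. The weighted mean is one common way of combining stratum results, not the only one.

```python
import random
from statistics import mean

def proportional_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to their size.

    Rounding (and the floor of 1 per stratum) means the parts may not sum
    to exactly n for arbitrary inputs.
    """
    total = sum(stratum_sizes.values())
    return {name: max(1, round(n * size / total))
            for name, size in stratum_sizes.items()}

def stratified_sample(strata, allocation, rng):
    """Draw a simple random sample (without replacement) inside each stratum."""
    return {name: rng.sample(values, allocation[name])
            for name, values in strata.items()}

def stratified_mean(strata, samples):
    """Combine per-stratum sample means, weighted by stratum share of N."""
    total = sum(len(values) for values in strata.values())
    return sum(len(strata[name]) / total * mean(samples[name])
               for name in strata)

# Hypothetical population: salaries (in thousands) for three job domains
rng = random.Random(0)
strata = {
    "engineering": [rng.gauss(90, 5) for _ in range(600)],
    "sales": [rng.gauss(60, 5) for _ in range(300)],
    "support": [rng.gauss(40, 5) for _ in range(100)],
}

sizes = {name: len(values) for name, values in strata.items()}
allocation = proportional_allocation(sizes, n=50)
samples = stratified_sample(strata, allocation, rng)
estimate = stratified_mean(strata, samples)

print(allocation)
print(estimate)
```

Here the 50 sampled elements are split 30 / 15 / 5, mirroring the 60% / 30% / 10% make-up of the population.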
In Cluster Sampling the population is split into a number of clusters and some of those clusters are then selected at random. Every element within a selected cluster becomes part of the sample. In this method too, a single element belongs to one and only one cluster. The key difference from stratified random sampling is that here the elements within a given cluster are heterogeneous. Each cluster on its own is expected to be a smaller representation of the population, with a similar distribution of characteristics.
A typical Cluster Sampling scenario is selecting certain area blocks, colonies, houses or colleges for surveys. The sample sizes are usually larger than in stratified random sampling, but the cost is lower: once surveyors reach a selected location, all members of that area are accessible for interviews and a relatively large number of observations can be captured in a short time. This method is chosen more often when, in practice, we cannot sample from the entire population.
When simple random sampling is tedious to implement on a large population, Systematic Sampling can be used as an alternative. Here, the population is divided into blocks of size “k”. The block size “k” is given by N/n, where “N” is the size of the population and “n” is the required sample size. A single element is selected at random from the first block only. Starting from that element, every kth element is then selected until the last block is reached.
As long as the data is in a random order, without any particular bias or periodic pattern in its sequence, this method is expected to yield results similar in accuracy to those of simple random sampling.
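A minimal Python sketch of this selection, assuming the population is already grouped into named clusters. The block names and household IDs are hypothetical.

```python
import random

def cluster_sample(clusters, num_clusters, rng):
    """Randomly choose whole clusters and keep every element inside them."""
    chosen = rng.sample(sorted(clusters), num_clusters)
    sample = [element for name in chosen for element in clusters[name]]
    return chosen, sample

# Hypothetical area blocks, each holding the households located in it
clusters = {
    "block_A": ["A1", "A2", "A3"],
    "block_B": ["B1", "B2"],
    "block_C": ["C1", "C2", "C3", "C4"],
    "block_D": ["D1", "D2", "D3"],
}

rng = random.Random(7)  # fixed seed only so that the sketch is repeatable
chosen, sample = cluster_sample(clusters, num_clusters=2, rng=rng)

print(chosen)
print(sample)
```

Note that randomness enters only at the level of clusters. Within a chosen cluster nothing is sampled, since every element is taken.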
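The procedure can be sketched in a few lines of Python. This sketch assumes N is at least n and uses integer division for k, so when N is not an exact multiple of n the last few elements can never be selected. The population of 100 numbered records is hypothetical.

```python
import random

def systematic_sample(population, n, rng):
    """Pick a random start in the first block, then take every k-th element."""
    k = len(population) // n   # block size (sampling interval)
    start = rng.randrange(k)   # random position within the first block
    return [population[start + i * k] for i in range(n)]

# Hypothetical ordered population of 100 records, sample size 10, so k = 10
population = list(range(100))
rng = random.Random(3)  # fixed seed only so that the sketch is repeatable
sample = systematic_sample(population, n=10, rng=rng)

print(sample)
```

Only one random draw is needed for the whole sample, which is what makes the method cheap on large populations.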
Samples obtained through methods other than probability sampling may not yield a predictable sampling distribution. Sometimes, researchers apply the formulae from probability techniques to these methods too, arguing that these are also random selections in a way. But the results should be interpreted with caution, as they could be highly error-prone.
Sometimes it is not feasible to follow a probability method for sampling, and Convenience Sampling is used instead. For example, shoppers exiting a retail store in a particular time window can be surveyed instead of taking random samples distributed over a month or a year. Students from certain courses who are easily available for interviews during weekend classes can be interviewed to make inferences about all courses and all students of the university. Here, the choice is driven by what is easily accessible to the surveyors. Not all elements of the population have an equal probability of becoming part of the sample.
Sometimes the judgement of the analyst or interviewer is used to selectively pick some elements; this is Judgement Sampling. The interviewer may select certain patients in a hospital in the belief that those patients form a sample that is representative of all patients and diseases treated in that hospital.
This judgement may or may not be correct and such a selection is highly subjective depending on the researcher involved. There is no objective way to measure how much the judgement could differ across the researchers and whose judgement should be deemed more accurate.
Quota Sampling is similar to stratified random sampling in the sense that the population is split into strata. But within each stratum a convenience sample is taken instead of a simple random sample. The most recent or most easily accessible elements are chosen until the quota set for that stratum is filled. It is essentially convenience sampling fitted into a stratified design.
This essentially takes out the randomness and hence the sampling error is not measurable statistically.
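To make the absence of randomness concrete, here is a minimal Python sketch of filling quotas in arrival order. The age-group strata, the quotas and the shopper labels are hypothetical, and "first to arrive" stands in for "most easily accessible".

```python
def quota_sample(arrivals, quotas):
    """Take elements in the order they become available until each quota is full.

    arrivals: iterable of (stratum, element) pairs in order of availability.
    quotas:   dict mapping each stratum to the number of elements wanted.
    """
    filled = {name: [] for name in quotas}
    for stratum, element in arrivals:
        if stratum in filled and len(filled[stratum]) < quotas[stratum]:
            filled[stratum].append(element)
        # Stop as soon as every stratum has reached its quota
        if all(len(filled[name]) == quotas[name] for name in quotas):
            break
    return filled

# Hypothetical shoppers leaving a store, tagged with their age group
arrivals = [
    ("18-30", "s1"), ("18-30", "s2"), ("31-50", "s3"), ("18-30", "s4"),
    ("51+", "s5"), ("31-50", "s6"), ("31-50", "s7"), ("51+", "s8"),
]
quotas = {"18-30": 2, "31-50": 2, "51+": 1}

sample = quota_sample(arrivals, quotas)
print(sample)
# {'18-30': ['s1', 's2'], '31-50': ['s3', 's6'], '51+': ['s5']}
```

No random number generator appears anywhere: the sample is fully determined by who happens to show up first, which is why the sampling error cannot be quantified.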
In Snowball Sampling, one subject is chosen for study and the next subjects are chosen by taking references from the first subject. This reference chain continues until the required sample size is met. Such an approach can be useful when the research involves rare types of elements that are not easy to locate. But again, it is effectively a non-random technique and its accuracy cannot be quantified statistically.
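The referral chain described above (often called snowball sampling) can be sketched as a walk over a referral map. The names and the referral map are hypothetical, and following referrals breadth-first is an assumption of this sketch. In practice the order depends on how the study is run.

```python
from collections import deque

def snowball_sample(referrals, seed, sample_size):
    """Grow a sample from one seed subject by following referrals."""
    sample = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue and len(sample) < sample_size:
        current = queue.popleft()
        for referred in referrals.get(current, []):
            if referred not in seen:
                seen.add(referred)
                sample.append(referred)
                queue.append(referred)
                if len(sample) == sample_size:
                    break
    return sample

# Hypothetical referral map: who each subject can point the researcher to
referrals = {
    "ana": ["ben", "cho"],
    "ben": ["ana", "dev"],
    "cho": ["eli"],
    "dev": [],
    "eli": ["fay"],
}

sample = snowball_sample(referrals, seed="ana", sample_size=4)
print(sample)
# ['ana', 'ben', 'cho', 'dev']
```

Everyone in the sample is reachable from the seed, so the result depends heavily on which first subject is chosen.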