What is Statistical Significance Testing, and Why is it Important?
by Tony Zahorik
A Marketing Information Problem
Suppose you want to compare two populations – let’s call them the A’s and the B’s – to see which one has a higher percentage of individuals that are interested in purchasing a product. However, each population is far too big to survey everyone, i.e., to conduct a census. The best we can do is interview a sample of people from each population and compare the results. How will that help? We know that any given sample is unlikely to perfectly represent the population it is drawn from; so just because the sample of A’s has a higher percentage of interested people than the sample of B’s, that doesn’t prove that the population of A’s has a higher percentage of interested people than the population of B’s. Maybe if we drew two other samples we might get a higher percentage among the B’s than A’s. So how can we accurately compare two populations if we can only study samples from each?
The theories of probability and statistics come to the rescue. It can be proven that when a random sample is drawn from a population, most samples will be pretty much like the population. For example, if 20% of the entire population is interested in your product, probability theory shows that roughly two-thirds of all samples of size 200 drawn from the population will have between 17.5% and 22.5% interested people, and about 95% of all such samples will have between 14.5% and 25.5% interested people. So while a small fraction of the samples will be quite different from the population, and could even contain 0% or 100% interested, these unrepresentative samples are a very tiny percent of possible outcomes. Therefore, when we choose a random sample of 200 people from a population, the odds are good that our sample will be closely representative of the whole population.
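As a rough check on figures of this kind, the share of samples that land in a given range can be computed directly from the binomial distribution. The sketch below assumes simple random sampling from a very large population, and the function names are illustrative rather than taken from any library. The exact percentages depend on whether the endpoints of each range are counted, so they may differ slightly from rounded figures.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability that exactly k of n randomly sampled people are interested."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_count_between(n: int, p: float, k_low: int, k_high: int) -> float:
    """Probability that the number of interested people is in [k_low, k_high]."""
    return sum(binomial_pmf(k, n, p) for k in range(k_low, k_high + 1))

n, p = 200, 0.20  # sample size and true population share from the example

# 17.5% to 22.5% of 200 people is 35 to 45 people; 14.5% to 25.5% is 29 to 51.
narrow = prob_count_between(n, p, 35, 45)
wide = prob_count_between(n, p, 29, 51)

print(f"Share of samples within 17.5%-22.5%: {narrow:.1%}")
print(f"Share of samples within 14.5%-25.5%: {wide:.1%}")
```

Working in whole-person counts rather than percentages avoids rounding surprises at the edges of each range.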
However, different random samples drawn from the same population are likely to be slightly different from each other. Some might have higher percentages of interested people than the population and some lower. So how can we compare two populations based on samples when samples are only approximations of their populations? The fact is, if the two populations are very similar to one another, it will be difficult to tell them apart based on samples. Samples from one of the populations will look very much like those from the other population. However, if the populations are different enough, then the samples are unlikely to look that similar, since most sample values will be very close to their respective population values.
The Basic Logic Behind a Significance Test
In most cases, we start with the working assumption that the two populations are actually the same, for example, that the percent of people in population A that are interested in your product is the same as the percent in population B. (This is called the “null hypothesis” in statistical lingo.) Therefore:
- If the populations actually are the same, the percentages of interested people in the two samples should be fairly close to one another. But we know that each sample value is only an approximation of its population value, so we don’t require the two samples to have identical values to support the null hypothesis. If they’re “close,” we can’t really claim that the two populations are different.
- However, if the two sample values are very different, that suggests that the populations are probably different. We can’t be sure, since there’s always a chance that we selected two very unrepresentative samples from similar populations. But we know that the odds of that are small, and so we can reasonably claim that the populations are probably different.
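The logic of these two cases can be illustrated with a small simulation. The sketch below assumes both samples of 200 come from the same population with 20% interested (numbers borrowed from the earlier example), and it counts how often the two samples end up at least a given number of people apart. The function names, the number of trials and the random seed are arbitrary choices made for illustration.

```python
import random

def interested_count(p: float, n: int, rng: random.Random) -> int:
    """Number of interested people in one random sample of n people."""
    return sum(rng.random() < p for _ in range(n))

def share_of_pairs_at_least_apart(min_gap: int, p: float = 0.20, n: int = 200,
                                  trials: int = 2000, seed: int = 1) -> float:
    """Fraction of sample pairs, both drawn from the SAME population,
    whose interested counts differ by at least min_gap people."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = interested_count(p, n, rng)
        b = interested_count(p, n, rng)
        if abs(a - b) >= min_gap:
            hits += 1
    return hits / trials

# With n = 200, a gap of 4 people is 2 percentage points; 20 people is 10 points.
small_gap = share_of_pairs_at_least_apart(4)
large_gap = share_of_pairs_at_least_apart(20)

print(f"Pairs at least 2 points apart:  {small_gap:.1%}")
print(f"Pairs at least 10 points apart: {large_gap:.1%}")
```

Small gaps between samples from identical populations should turn up routinely, while large gaps should be rare. That rarity is what makes a large observed gap evidence against the null hypothesis.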
The probabilities of these various outcomes can all be calculated using probability theory. In fact, that is what a significance test reports. The test examines the samples and determines how likely it would be to get sample values that differ by at least as much as the observed ones if the null hypothesis is true, i.e., if the populations are actually the same. The output is a probability: the probability of getting two random samples that differ this much when the populations are really the same. This probability is called the “significance level” (statistical software often reports it as the “p-value”).
- If this probability is fairly high, that means the samples could easily have come from identical populations, and so there is no case to be made that the populations are different.
- If this probability is very small, that suggests that the samples are far enough apart that they are unlikely to have come from identical populations. In other words, a case can be made now that the populations are probably different from each other. Note that the result doesn’t definitively say that the populations are different, only that the two samples are very unlikely to have come from equal populations. We can never be 100% certain about the populations if we have examined only random samples from each.
- How small is “small”? A standard cutoff used in marketing applications is 5%. In other words, if there is less than a 5% chance of getting two samples so far apart from equal populations, we might be willing to take the chance of declaring the two populations different. For example, if the significance level from a significance test is .03, that means that the analysis determined that 3% of the time samples taken from two identical populations are at least as different as the two you have drawn. So the significance level can also be interpreted as the probability of wrongly declaring two populations to be different, based on the samples, when they are in fact identical. Again, since we are only looking at samples, we must live with some risk of being wrong about our conclusions. The 5% cutoff is standard for many applications. But for some major decisions (e.g., with million-dollar consequences), a 5% chance of being wrong might be too high. The researcher might not be willing to declare a difference unless the significance level is less than, say, 1%. For some less consequential decisions, researchers have been known to use a 10% cutoff. It is ultimately a judgment call based on the consequences of being wrong.
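One common way to compute a significance level for two sample percentages is the two-proportion z-test. The sketch below is a minimal version of that test, relying on a normal approximation that is reasonable for samples of a few hundred. The article does not say which test it has in mind, so treat this as one possible illustration. The survey counts are hypothetical, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(x_a: int, n_a: int, x_b: int, n_b: int) -> float:
    """Two-sided significance level for the null hypothesis that
    populations A and B have the same share of interested people."""
    share_a, share_b = x_a / n_a, x_b / n_b
    pooled = (x_a + x_b) / (n_a + n_b)           # best estimate of the common share
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:                                   # everyone or no one interested
        return 1.0
    z = abs(share_a - share_b) / se
    return 2 * (1 - NormalDist().cdf(z))

# Hypothetical results: 60 of 200 A's and 40 of 200 B's are interested.
p_far = two_proportion_p_value(60, 200, 40, 200)
# Hypothetical results: 44 of 200 A's and 40 of 200 B's are interested.
p_close = two_proportion_p_value(44, 200, 40, 200)

for label, p_value in (("30% vs 20%", p_far), ("22% vs 20%", p_close)):
    for cutoff in (0.10, 0.05, 0.01):
        verdict = "different" if p_value < cutoff else "no case for a difference"
        print(f"{label}: significance level {p_value:.3f}, cutoff {cutoff:.0%}: {verdict}")
```

The same pair of samples can pass one cutoff and fail a stricter one, which is why the choice of cutoff should reflect the cost of being wrong.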
The Importance of Using Significance Testing
When we as researchers are limited to studying only samples of target populations, we have to remember that our clients, the marketing decision makers, don’t really care what a mere sample of a few hundred people says when they are marketing to populations of many thousands or millions, unless the sample provides information about those populations. But that’s what significance tests enable us to do. They allow us to draw population-level comparisons from sample differences. So when sample differences are declared to be statistically significant, it doesn’t merely mean that the samples were “quite different.” It means much more than that: it gives us permission to treat the populations they represent as different from each other.
Cautions in Using Significance Testing
There are two cautions to mention when using significance testing. First, when a significance test declares two sample values to be “significantly different” from one another, it means that we can be fairly certain that the populations are not the same on some dimension. However, it doesn’t necessarily mean that the difference is meaningful in a managerial sense. It’s unfortunate that statisticians have used the term “significant” in this context, because decision makers often interpret it to mean that the difference is big enough to have an impact on profits or other marketing measures. It doesn’t mean that at all. The difference between the two populations might be inconsequential to a marketer, even though we can be fairly sure that there is some difference.
Second, the calculation of the significance level reported by a significance test is based on the assumption that the samples being studied were selected from the universe of all possible samples by a random process. In fact, truly random (“probability”) samples are becoming fairly rare in marketing research; quota samples constructed from online panels, in particular, are not random samples. In that case, the calculations of significance levels technically are not valid. However, many analysts calculate and report them anyway to provide some perspective on whether observed differences appear large enough to indicate population differences.
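The caution about statistical versus managerial significance can be made concrete with a small calculation. The sketch below holds a hypothetical half-point gap (20.5% versus 20.0% interested) fixed and varies only the sample size, using a normal-approximation test for two proportions. The shares, sample sizes and function name are assumptions chosen purely for illustration.

```python
from math import sqrt
from statistics import NormalDist

def p_value_for_shares(share_a: float, share_b: float, n_per_sample: int) -> float:
    """Two-sided significance level for two equally sized samples
    with the given observed shares of interested people."""
    pooled = (share_a + share_b) / 2             # equal sizes, so a simple average
    se = sqrt(pooled * (1 - pooled) * 2 / n_per_sample)
    z = abs(share_a - share_b) / se
    return 2 * (1 - NormalDist().cdf(z))

# Same half-point gap, increasingly large samples.
p_values = {n: p_value_for_shares(0.205, 0.200, n) for n in (200, 20_000, 200_000)}

for n, p_value in p_values.items():
    print(f"n = {n:>7,} per sample: significance level {p_value:.4f}")
```

With very large samples, even a gap that no marketer would act on can fall below the usual cutoffs, so the size of the difference should always be reported alongside its significance level.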
As a member of the teaching staff at Burke Institute, Dr. Tony Zahorik enjoys traveling around the world sharing his extensive knowledge of marketing research methodology with a variety of industries. He has been acclaimed for his ability to teach technical subjects to both technically and non-technically oriented students.