In this unit, we'll deal a lot with questions about variation between observed and expected counts in categorical data. In other words, the difference between what we find and what we expect to find may be due to random chance, or it may signal that something real is going on.
What does this mean? 🤔
As with any random process, there is always a chance that something unusual happens. The tricky part (AKA statistics) comes into play when we determine whether our outcome was just due to random chance or due to something incorrect in the original claim.
When conducting a statistical test, there is always the possibility that the observed difference between the variables is due to chance rather than a true relationship between the variables. This is known as random variation. The p-value that is calculated in a statistical test reflects the likelihood that the observed difference between the variables occurred by chance.
If the p-value is very low (e.g., below 0.05), then the observed difference would be unlikely to occur by chance alone, and we can conclude that there is a significant difference between the variables. However, if the p-value is high (e.g., above 0.05), then the observed difference is consistent with chance variation, and we cannot conclude that there is a significant difference between the variables. 🍀
For instance, if we flip a fair coin 10 times, we would expect to get 5 heads and 5 tails. Would this always be our result? Probably not. Flipping a coin is a random process that could result in a variety of outcomes. The most likely outcome would be 5 heads and 5 tails, but other outcomes are nearly as likely.
If we flipped a coin 10 times and got 4 heads and 6 tails, would we doubt that the coin was a fair coin? Probably not. That is a normal outcome, and it is pretty close to our expected counts of 5 and 5. If we were to get 10 heads and 0 tails, that is a much larger discrepancy, so it might cause us to doubt that the coin is really fair.
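To put rough numbers on that intuition, here is a minimal sketch in Python (standard library only) that computes the chance of getting a result at least as far from 5 heads as the one observed, assuming the coin really is fair. The helper name binomial_p_value is just for illustration.

```python
from math import comb

def binomial_p_value(heads, flips, p=0.5):
    """Chance of a result at least as far from the expected number of
    heads as the one observed, assuming the true probability is p."""
    expected = flips * p
    gap = abs(heads - expected)
    total = 0.0
    for k in range(flips + 1):
        if abs(k - expected) >= gap:
            total += comb(flips, k) * p**k * (1 - p)**(flips - k)
    return total

print(binomial_p_value(4, 10))   # ~0.75 -> 4 heads is completely normal for a fair coin
print(binomial_p_value(10, 10))  # ~0.002 -> 10 heads would be very surprising for a fair coin
```

A value around 0.75 means a split at least as lopsided as 4/6 happens about three-quarters of the time with a fair coin, while a 10/0 split happens only about 0.2% of the time.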
Just like with our other inference procedures, our sample size plays a huge part in our conclusion. When we flip a coin 10 times and get 4 heads and 6 tails, it's no big deal. If we flipped a coin 1000 times and got 400 heads and 600 tails, that result would be far less likely to happen with a fair coin. 🪙
The main reason sample size matters is that the standard deviation of our sample statistic decreases as the sample size increases. This inverse relationship between sample size and standard deviation holds for all of the statistics we have discussed this year and is almost sure to show up on the AP exam several times!
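Here's a small sketch of that inverse relationship (again in Python, assuming a fair coin): the standard deviation of the sample proportion of heads is sqrt(p(1-p)/n), so the same 40% heads result sits many more standard deviations away from 0.5 when n = 1000 than when n = 10.

```python
from math import sqrt

p = 0.5  # true proportion of heads for a fair coin

for n, observed in [(10, 0.4), (1000, 0.4)]:
    sd = sqrt(p * (1 - p) / n)   # standard deviation of the sample proportion
    z = (observed - p) / sd      # how many SDs the observed proportion is from 0.5
    print(f"n = {n:4d}: SD = {sd:.4f}, observed 0.40 is {abs(z):.2f} SDs from 0.50")

# n =   10: SD = 0.1581, observed 0.40 is 0.63 SDs from 0.50 -> unremarkable
# n = 1000: SD = 0.0158, observed 0.40 is 6.32 SDs from 0.50 -> very surprising
```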
The sample size does play a role in the statistical power of a test, which is the probability of correctly detecting a difference between the variables if one actually exists. In general, the larger the sample size, the greater the statistical power of the test.
One reason a larger sample size can increase the statistical power of a test is that it shrinks the chance variation in the observed counts. For example, in a chi-square test, the same proportional departure from the expected counts produces a larger chi-square statistic when the sample size is bigger, so the test has a higher probability of detecting a true difference between the variables. 👊
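Here's a hedged sketch of how that plays out for the coin example, assuming SciPy is available (its scipy.stats.chisquare function runs a chi-square goodness-of-fit test): the same 40/60 split that looks fine at n = 10 becomes overwhelming evidence against fairness at n = 1000.

```python
from scipy.stats import chisquare

# Same 40% heads / 60% tails split at two different sample sizes
small = chisquare(f_obs=[4, 6], f_exp=[5, 5])
large = chisquare(f_obs=[400, 600], f_exp=[500, 500])

print(small.statistic, small.pvalue)  # chi-square ~ 0.4,  p ~ 0.53    -> no evidence against fairness
print(large.statistic, large.pvalue)  # chi-square ~ 40.0, p ~ 2.5e-10 -> strong evidence against fairness
```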
The law of large numbers states that as the sample size increases, the sample mean will get closer and closer to the population mean. This happens because the standard deviation of the sample mean shrinks as the sample size grows, so the sample mean becomes a more precise estimate of the population mean. 🏙️
In the context of flipping a coin, the law of large numbers says that as the number of coin flips increases, the proportion of heads will converge toward the true probability of getting heads (which is 0.5 for a fair coin). For example, if you flip a coin 10 times and get 6 heads, the sample is too small to tell you much either way about the coin's fairness. If that 40/60 split persisted after 1000 flips, however, we would have real reason to doubt that the coin is fair. On the other hand, if you flip the coin 1000 times and get about 500 heads, you would be more confident that the coin is fair, because the proportion of heads has settled very close to the true probability of getting heads.
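To see the law of large numbers in action, here is a small simulation sketch (standard-library Python, with a fixed seed purely so the run is reproducible) that tracks the running proportion of heads as the number of flips grows.

```python
import random

random.seed(1)  # fixed seed just so the run is reproducible

heads = 0
checkpoints = {10, 100, 1000, 10000, 100000}

for flip in range(1, 100001):
    heads += random.random() < 0.5  # simulate one fair-coin flip
    if flip in checkpoints:
        print(f"after {flip:6d} flips: proportion of heads = {heads / flip:.4f}")

# The printed proportions bounce around for small numbers of flips
# and settle closer and closer to 0.5 as the number of flips grows.
```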
The law of large numbers is a fundamental principle of statistics that underlies many statistical procedures and is important for understanding how to make reliable inferences about a population based on a sample. 😄