Variation in the shape of a data distribution can be either random or non-random.
Random variation in the shape of a data distribution occurs when the data values are randomly distributed and there is no underlying pattern or structure to the distribution. This type of variation is often seen in data that is collected from a random sample of a population.
Non-random variation in the shape of a data distribution occurs when there is an underlying pattern or structure to the distribution. This type of variation can be caused by factors such as measurement error, bias, or systematic differences in the population. Non-random variation can result in distorted or skewed data distributions.
It's important to consider the possible sources of variation when analyzing data, as they can affect the conclusions that are drawn from the data and the inferences that are made about the population. The next question is: what about normality? What does it mean to be normal (or not) in the context of statistics and sampling distributions?
If you're looking back at the snowboarder study from the Unit 6 Overview, "being normal" here isn't referring to being regular-footed as a snowboarder. 🤣
In fact, the term "normal" has much larger statistical implications. When we are performing statistical inference, our calculations are largely based on the sampling distribution of a sample proportion, which, yep, you guessed it, is approximately a normal curve.
If you refer back to Unit 1.1, we know that a lot of fancy calculus can allow us to calculate probabilities using these normal curves. When testing a statistical claim or estimating a population proportion, we need the normal curve to calculate probabilities in our sampling distribution (see Unit 5). By taking our sample statistic and standardizing it to a common density curve, we can use calculator functions or a z-score chart to draw inferences using the normal curve.
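The standardizing step described above can be sketched in a few lines of Python using the standard library's `NormalDist` (which plays the role of the z-score chart). The numbers here (p = 0.5, n = 100, p̂ = 0.56) are made-up illustration values, not from the text:

```python
from statistics import NormalDist

# Hypothetical example values: claimed population proportion p,
# sample size n, and observed sample proportion p_hat.
p, n, p_hat = 0.5, 100, 0.56

# Standard deviation of the sampling distribution of p_hat: sqrt(p(1-p)/n)
sd = (p * (1 - p) / n) ** 0.5

# Standardize to a z-score, then look up the standard normal curve
z = (p_hat - p) / sd
prob_at_least = 1 - NormalDist().cdf(z)  # P(p_hat >= 0.56) if the claim is true

print(round(z, 2), round(prob_at_least, 4))
```

Here z comes out to 1.2, and the normal curve gives the tail probability, just as a z-score chart would.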
To check if our sampling distribution is approximately normal, we need to verify that the expected successes and expected failures of our study are each at least 10. This is known as the Large Counts Condition.
In formula form, this is np ≥ 10 and n(1 − p) ≥ 10.
If both conditions hold, our sampling distribution is approximately normal and we can continue with z-scores to calculate our probabilities or intervals.
Let's say that we believe that hockey players have a 95% chance of breaking a bone at some point in their life. We decide to test that claim by taking a sample of 500 retired hockey players and asking them if they have ever broken a bone. To verify that we can use the normal curve in this test/interval, we would list the following:
✔️ 500(0.95) ≥ 10 & 500(0.05) ≥ 10
✔️ 475 ≥ 10 & 25 ≥ 10
Since both counts come out to at least 10, we can use the proportion from our sample with the normal curve to test whether the claimed 95% value is plausible!
🎥
Watch: AP Stats - Normal Distributions