Statistics is all about data. We collect data, analyze it, and, ultimately, use it to make inferences about the larger population of individuals it represents.
In this unit, we are going to focus on univariate, or one-variable, data: data in which only one characteristic of each individual is measured. We will divide univariate data into two types: quantitative and categorical. 📝
Have you ever wondered what the average AP score was? Or perhaps the average number of bananas per bunch purchased at the grocery store? Both of these are examples of quantitative data because each individual is assigned a quantity. Whether we are assigning each test taker an AP score or each banana bunch a count of bananas, every individual being measured is assigned a number. One of the big giveaways for quantitative data is that we can take the mean, or average, of the data set. In other words, quantitative data is average-able. 📲
EXAMPLE: You have taken 5 exams in your math class and you want to know your average score. The scores on the exams are as follows:
Exam 1: 80
Exam 2: 90
Exam 3: 70
Exam 4: 85
Exam 5: 75
To find the average, you need to add up all of the exam scores and then divide by the total number of exams. In this case, the total score is 80 + 90 + 70 + 85 + 75 = 400, and the total number of exams is 5.
Therefore, your average exam score is 400 / 5 = 80.
This is a very simple example; in practice, you may encounter more complex problems that involve larger data sets and more variables. However, the basic principle of finding the average by summing the values and dividing by the count remains the same.
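The sum-then-divide rule from the exam example can be sketched in a few lines of Python. This is a minimal illustration using the five exam scores above; the standard-library `statistics.mean` is included only as a cross-check on the hand calculation.

```python
from statistics import mean

# Scores from the five-exam example above
scores = [80, 90, 70, 85, 75]

# Average = (sum of the values) / (number of values)
total = sum(scores)        # add up every score
count = len(scores)        # count how many scores there are
average = total / count    # divide the total by the count

print(f"Total: {total}, exams: {count}, average: {average}")

# The standard-library helper should agree with the hand calculation
print(mean(scores))
```

The same three steps (sum, count, divide) work no matter how many values the list holds.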
💡 Quantitative data uses means, or averages, to make inference!
On the flip side, we have categorical data. Have you ever asked a group of people whether they like coffee? What about their favorite vegetable? How about whether they prefer 🍩 or 🍪 for dessert? Each of these surveys would produce categorical data, because each individual falls into a category: do you fall into the 🍩 category or the 🍪 category? Because the responses are categories rather than numbers, it is impossible to calculate an average dessert preference. After all, it would not make sense to make a statement like "the average dessert preference is a cookie." Instead, we typically summarize categorical data using measures like proportions. It makes a lot more sense to make a statement like, "the proportion of people who prefer cookies is 0.65."
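To see why a proportion works where a mean does not, here is a small sketch using a hypothetical set of 20 dessert responses, made up for illustration and chosen so that 13 of the 20 prefer cookies. Counting each category and dividing by the total gives the proportion; there is no meaningful way to add up the labels themselves.

```python
from collections import Counter

# Hypothetical survey: 13 "cookie" and 7 "donut" responses (illustrative only)
responses = ["cookie"] * 13 + ["donut"] * 7

counts = Counter(responses)   # tally how many individuals fall in each category
n = len(responses)            # total number of respondents

# Proportion = (number in the category) / (total number of individuals)
p_cookie = counts["cookie"] / n
print(f"Proportion who prefer cookies: {p_cookie:.2f}")

# Note: sum(responses) would raise a TypeError, because the labels are
# categories, not quantities; an "average dessert" is simply not defined.
```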
Here are some examples of statements that describe categorical data using proportions:
"In a survey of 100 people, 50% identified as male and 50% identified as female."
"In a sample of 300 customers, 20% reported having a positive experience with the company's customer service, while 80% reported a negative experience."
"Of the 1000 people surveyed, 30% reported having a bachelor's degree, while 70% reported having a high school diploma or lower level of education."
"Of the 200 products reviewed, 40% received a rating of 4 or 5 stars, while 60% received a rating of 3 stars or lower."
"In a study of 500 students, 25% reported experiencing bullying at school, while 75% reported not experiencing bullying."
In each of these examples, the proportion of individuals or items in each category is described using percentages. This allows us to see the relative frequency or prevalence of each category within the data. ⚖️
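A percentage is a relative frequency: the count in a category divided by the total, times 100. The sketch below uses the customer-service statement above (300 customers, 20% positive, 80% negative); the counts 60 and 240 are simply those percentages applied to 300. The function `relative_frequencies` is a hypothetical helper written for this example, not a library function.

```python
def relative_frequencies(counts: dict[str, int]) -> dict[str, float]:
    """Hypothetical helper: convert raw category counts into percentages of the total."""
    total = sum(counts.values())
    return {category: 100 * count / total for category, count in counts.items()}

# Counts implied by "In a sample of 300 customers, 20% ... positive, 80% ... negative"
customer_counts = {"positive": 60, "negative": 240}

percents = relative_frequencies(customer_counts)
for category, pct in percents.items():
    print(f"{category}: {pct:.0f}%")

# The category percentages always add up to 100% of the sample
print(sum(percents.values()))
```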
💡 Categorical data uses percentages, or proportions, to make inference.
In practice, statistics is used in a wide range of fields, including business, economics, biology, psychology, social sciences, and many others. It is a powerful tool for understanding and interpreting real-world phenomena, and is used to inform decision-making, policy-making, and research in a variety of contexts. 📈
Some common tasks in statistics include:
Collecting data through surveys, experiments, or other methods
Describing and summarizing data using measures such as mean, median, mode, and standard deviation
Visualizing data using graphs and plots
Testing hypotheses and making inferences about population parameters based on sample data
Building statistical models to predict outcomes or understand relationships between variables
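Several of the summary measures named in the list above are available in Python's standard `statistics` module. This sketch uses a hypothetical set of ten bananas-per-bunch counts, invented for illustration rather than taken from the text; note that `statistics.stdev` computes the sample standard deviation.

```python
import statistics

# Hypothetical bananas-per-bunch counts for ten purchases (illustrative only)
bunches = [2, 3, 4, 4, 5, 5, 5, 5, 5, 12]

print("mean:  ", statistics.mean(bunches))    # center: sum divided by count
print("median:", statistics.median(bunches))  # center: middle of the sorted values
print("mode:  ", statistics.mode(bunches))    # most frequent value
print("stdev: ", round(statistics.stdev(bunches), 3))  # spread: sample standard deviation
```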
Even beyond this course, there are many different branches of statistics, including descriptive statistics, inferential statistics, predictive modeling, and more. Each of these areas has its own set of techniques and approaches for analyzing data! ⌛️
One thing that will feel very different in this course compared to other mathematics courses you have taken is the way you record your answers. In an Algebra or Calculus course, it is sufficient to say "x = 5" when that is your answer. In AP Statistics, it is a good idea to get in the habit of tying your answer to the specific context of the problem you are working on. Instead of simply saying "x = 5," make your answer more specific by saying something like "the average number of bananas per bunch is 5." 🧩
💡 Our goal in statistics is not just to find the correct answer, but to communicate our findings to our audience so that the answer is useful in making further predictions.
Perhaps the biggest concept and skill of this first unit is being able to describe data. For quantitative data, a description consists of four main parts: center, outliers, spread, and shape. It is also important to include context in your answer. 💠
For example, if we had a set of data on the number of bananas per bunch purchased, a model response might look like the following: "The mean number of bananas per bunch was 5 bananas. There was one outlier, a customer who purchased a bunch of 12 bananas. The shape of our data distribution was fairly symmetric. The range of bananas per bunch was 10, with the largest bunch having 12 bananas and the smallest bunch having 2."
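The model response above pulls a few numbers out of the data and attaches context to each one. A minimal sketch, assuming a hypothetical bananas-per-bunch data set (the values are invented for illustration): it computes only the center and the range, since shape and outliers are usually judged from a graph of the data rather than from a single calculation.

```python
from statistics import mean

# Hypothetical bananas-per-bunch data (illustrative only)
bunches = [2, 3, 4, 4, 5, 5, 5, 5, 5, 12]

center = mean(bunches)
smallest, largest = min(bunches), max(bunches)
spread = largest - smallest   # range = maximum - minimum

# Tie every number back to the context of the problem
print(f"The mean number of bananas per bunch was {center}. "
      f"The range of bananas per bunch was {spread}, with the largest bunch "
      f"having {largest} bananas and the smallest having {smallest}.")
```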
For categorical data, this process looks different. It is usually more valuable to discuss which category was most likely to occur and which was least likely. For example, a description could look like this: "Our most likely outcome was people who prefer donuts, with a proportion of 0.45, and our least likely outcome was people who prefer cookies, with a proportion of 0.15." 👨‍🍳
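Finding the most and least likely categories comes down to comparing proportions. A sketch with hypothetical counts from 20 respondents, chosen to match the proportions in the example description; the third category, "brownie," is invented so that the proportions add up to 1.

```python
# Hypothetical dessert-preference counts from 20 respondents (illustrative only)
counts = {"donut": 9, "brownie": 8, "cookie": 3}
n = sum(counts.values())

# Proportion of respondents in each category
proportions = {dessert: count / n for dessert, count in counts.items()}

# The categories with the largest and smallest proportions
most_common = max(proportions, key=proportions.get)
least_common = min(proportions, key=proportions.get)

print(f"Most likely: {most_common} ({proportions[most_common]:.2f}); "
      f"least likely: {least_common} ({proportions[least_common]:.2f})")
```

Printing the raw `counts` alongside the proportions is an easy way to report both, if the question calls for counts.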
With categorical data, it is sometimes also useful to discuss raw counts rather than proportions. However, the AP exam is more likely to ask you to describe the distribution of a quantitative data set than a categorical one. For more information on content from Unit 1, check the link below! 🏃‍♂️
🎥 Watch: AP Stats - Unit 1 Streams