Josh Argo

Jed Quiaoit

*This unit covers three chi-square tests: the **chi-square test for goodness of fit** (for a distribution of proportions of one categorical variable in a population), the **chi-square test for independence** (for associations between categorical variables within a single population), and the **chi-square test for homogeneity** (for comparing distributions of a categorical variable across populations or treatments).*

In __Unit 6__, we began discussing inference for categorical data with one-proportion z-intervals and z-tests, along with two-proportion z-intervals and z-tests for the difference between two samples/populations. But what happens if we want to test for an association between two categorical variables, or compare the distribution of one categorical variable across several groups? A common method for these questions is a chi-square test. 🟧

As with our other hypothesis tests, these procedures require certain **conditions** to be checked before performing the **test**, and our conclusion will be based on a **p-value** obtained from a sampling distribution (here, a chi-square distribution).
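The p-value for a chi-square test is the area in the upper tail of a chi-square distribution, and the shape of that distribution depends on its degrees of freedom: number of categories − 1 for goodness of fit, and (rows − 1)(columns − 1) for a two-way table. On the exam this area comes from your calculator; the sketch below is only a minimal standard-library Python illustration, assuming no statistics package is available. The function names are hypothetical, and it uses the usual series / continued-fraction formulas for the regularized incomplete gamma function rather than any particular library's routine.

```python
import math

def df_goodness_of_fit(num_categories: int) -> int:
    # Goodness of fit: df = number of categories - 1
    return num_categories - 1

def df_two_way(rows: int, cols: int) -> int:
    # Independence or homogeneity: df = (rows - 1)(columns - 1)
    return (rows - 1) * (cols - 1)

def chi_square_p_value(stat: float, df: int, eps: float = 1e-15, max_iter: int = 500) -> float:
    """Upper-tail area P(X >= stat) for a chi-square distribution with df degrees of freedom.

    This equals Q(df/2, stat/2), the regularized upper incomplete gamma function.
    A power series is used when x < a + 1 and a continued fraction otherwise.
    """
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    log_prefactor = a * math.log(x) - x - math.lgamma(a)  # log of x^a * e^(-x) / Gamma(a)
    if x < a + 1:
        # Series for the lower function P(a, x); the p-value is 1 - P.
        term = total = 1.0 / a
        for n in range(1, max_iter):
            term *= x / (a + n)
            total += term
            if abs(term) < abs(total) * eps:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, x) directly (modified Lentz method).
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return math.exp(log_prefactor) * h

# A statistic of 5.0 from a goodness-of-fit test with 3 categories (df = 2)
print(chi_square_p_value(5.0, df_goodness_of_fit(3)))  # about 0.0821, i.e. e^(-2.5)
```

With df = 2 the upper-tail area has the closed form e^(−stat/2), which makes a handy sanity check for a sketch like this.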

A **chi-square test** is a statistical test used to determine whether there is a significant difference between the *observed* counts in a sample and the *expected* counts we would see if a hypothesized (reference) distribution were true. It is commonly used to test for associations between categorical variables. 🧮
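The comparison between observed and expected counts is summarized by the chi-square statistic, χ² = Σ (observed − expected)² / expected, summed over every category (or every cell of a two-way table). A minimal Python sketch of that sum follows; the function name and the counts are hypothetical, chosen only for illustration.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories or cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Made-up counts for three categories, illustration only.
observed = [50, 30, 20]
expected = [40, 40, 20]
print(chi_square_statistic(observed, expected))  # 5.0
```

For a two-way table, the same function works if the cells are passed row by row as one flat list. Larger values mean the observed counts sit farther from what the null hypothesis predicts.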

For example, if one wants to analyze the relationship between a person's state of residence and political party affiliation, a chi-square test could compare the number of Democratic/Republican voters we would expect in a given state (the number we would see if state of residence did *not* play a part in party affiliation) with the actual number of Democratic/Republican voters in that state.

This difference would likely be large in states such as California (mostly Democratic) and Alabama (mostly Republican). If the difference between actual and expected counts is large enough (as I am sure it would be in this example), we have convincing evidence that these two variables are in fact related to each other.

In order to perform a hypothesis test using a chi-square procedure, we need either a **two-way table** (for two categorical variables) or a **frequency table** (for one categorical variable). From there, we compare the actual (observed) counts to the expected counts, which come from the hypothesized proportions (or, for a two-way table, from the row and column totals). 🪑
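Expected counts are usually computed in one of two ways: for a one-variable frequency table, each expected count is the sample size times the hypothesized proportion; for a two-way table, each expected count is (row total × column total) / grand total. A minimal Python sketch of both, assuming the data are stored as plain lists; the function names and the counts are made up for illustration.

```python
def expected_counts_one_way(n, proportions):
    """Expected count for each category of one variable: n * hypothesized proportion."""
    return [n * p for p in proportions]

def expected_counts_two_way(table):
    """Expected count for each cell of a two-way table (list of rows), assuming no association:
    (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Made-up 2 x 2 table of counts (rows: two groups, columns: two categories), illustration only.
observed = [[30, 20],
            [10, 40]]
print(expected_counts_two_way(observed))  # [[20.0, 30.0], [20.0, 30.0]]
print(expected_counts_one_way(100, [0.5, 0.3, 0.2]))
```

Each row of the two-way expected table keeps the overall column proportions (40% and 60% here), which is exactly what "no association" means.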

Just like other inference procedures, our test hinges on certain conditions being met. With chi-square testing, we need the following two conditions: ❗

- **Random**: Our sample was taken randomly, or treatments were assigned **randomly** in an experiment.
- **Large Counts**: All expected counts are at least 5. This is similar to our *normal condition* in previous inference procedures.
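The Large Counts condition is about *expected* counts, not observed ones, so a check means computing every expected count and confirming that none falls below 5. A minimal Python sketch, assuming the expected counts are already computed; the helper names are hypothetical, and the sketch accepts either a flat list (one variable) or a list of rows (two-way table).

```python
def flatten(table):
    """Turn a two-way table (list of rows) into a flat list of cells; flat lists pass through."""
    cells = []
    for item in table:
        if isinstance(item, (list, tuple)):
            cells.extend(item)
        else:
            cells.append(item)
    return cells

def large_counts_ok(expected, minimum=5):
    """True if every expected count is at least `minimum` (5 for the Large Counts condition)."""
    return all(e >= minimum for e in flatten(expected))

print(large_counts_ok([[20.0, 30.0], [20.0, 30.0]]))  # True
print(large_counts_ok([12.5, 7.5, 4.0]))              # False: one expected count is below 5
```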

In our voting example, Joe Biden received 51.3% of the vote nationwide in the 2020 election, while Donald Trump received 46.9%. Based on these **expected percentages**, we would expect Biden to receive about 1.2 million (0.513 × 2.3 million) of the roughly 2.3 million votes cast in Alabama. However, Biden received only about 849,000. Since there is such a large discrepancy between the expected and actual vote counts, we would likely conclude that state of residence and candidate choice are related in some way (which we obviously know anyway), but there is actually a statistical test that gives evidence for this! 🗳️
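The arithmetic in this example can be checked in a few lines of Python. The numbers are the rounded figures quoted above; because the paragraph gives only Biden's Alabama count, this sketch computes the expected count and the (observed − expected)² / expected contribution for that single category, not a full goodness-of-fit test (which would also need Alabama's counts for Trump and for other candidates).

```python
national_share_biden = 0.513      # Biden's 2020 nationwide vote share, as quoted above
alabama_total_votes = 2_300_000   # approximate total votes cast in Alabama, as quoted above
alabama_biden_observed = 849_000  # Biden's approximate Alabama vote count, as quoted above

# Expected count if Alabama voted like the nation as a whole
expected = national_share_biden * alabama_total_votes
# This one category's contribution to the chi-square statistic
contribution = (alabama_biden_observed - expected) ** 2 / expected

print(round(expected))      # 1179900
print(round(contribution))  # 92800
```

That single term is already far larger than common 5% critical values such as 3.84 (df = 1) or 5.99 (df = 2), which is why a discrepancy this size is so convincing.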

When performing inference, it is a great idea to follow a template to make sure you cover all your bases when answering an inference FRQ on the exam. One popular inference template/acronym is SPDC:

- **S**tate: the parameter of interest and the hypotheses, if necessary
- **P**lan: the conditions for inference
- **D**o: the calculations (with calculator speak if using a calculator)
- **C**onclude: a conclusion based on the interval or p-value

Following this template is a *huge* test-taking tip that can help you succeed on the inference FRQ on the exam. There are other acronyms, such as PANIC or PHANTOM, that some students use, but I find SPDC to be the simplest, most consistent, and most efficient template for all inference procedures. 👻

See you in the next couple of study guides on chi-square tests! 👋

© 2023 Fiveable Inc. All rights reserved.