When you are working with categorical data that involves more than one variable or more than one group, the first step is to decide whether to perform a test for independence or a test for homogeneity.
A χ² test for independence is appropriate when we have one sample from a single population and record two categorical variables for each individual. Every individual is drawn from the same population! 1️⃣
A χ² test for homogeneity is appropriate when we have two or more separate samples (or treatment groups) and want to determine whether the distribution of one categorical variable differs across their respective populations. 2️⃣
Once you determine which test is appropriate, the next step is to write your hypotheses. Regardless of the test, be sure to include context in your hypotheses, either by using meaningful subscripts or identifying the parameters of interest. ✍️
Templates
The appropriate hypotheses for a chi-square test for homogeneity are:
H0: There is no difference in distributions of a categorical variable across populations or treatments.
Ha: There is a difference in distributions of a categorical variable across populations or treatments.
The appropriate hypotheses for a chi-square test for independence are:
H0: There is no association between two categorical variables in a given population (the two categorical variables are independent).
Ha: The two categorical variables are associated (dependent) in the population.
Example: Independence
When writing a set of hypotheses for a chi-squared test for independence, your null hypothesis is that there is no association between the two categorical variables in your given population. Your alternative hypothesis is that there IS an association between the two categorical variables of interest.
For example, let’s say that we want to know whether a student’s favorite sport is associated with their grade in an AP Statistics class. We could take a random sample of 100 students from your high school’s AP Statistics classes and ask each one for their favorite sport (football, basketball, or baseball) along with their letter grade for the class. 🏈
Our hypotheses would be as follows:
H0: There is no association between sports preference and letter grade in AP Statistics for students at XYZ High School.
Ha: There is an association between sports preference and letter grade in AP Statistics for students at XYZ High School.
Since this problem involves one population (AP Statistics students at XYZ High School), this would require a test for independence.
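To make the null hypothesis concrete, here is a minimal sketch of how expected counts follow from independence. The counts in the table are made up for illustration (only the sample size of 100 comes from the example), and the grade categories are an assumption. Under H0, P(sport and grade) = P(sport) × P(grade), so each expected count is (row total × column total) / table total.

```python
# Hypothetical counts for illustration only.
# Rows: favorite sport. Columns: letter grade (A, B, C or lower).
observed = {
    "football":   [12, 15, 8],
    "basketball": [14, 13, 8],
    "baseball":   [9, 12, 9],
}

n = sum(sum(row) for row in observed.values())            # table total
row_totals = {sport: sum(row) for sport, row in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]

# Under H0 (independence): expected = n * (row total / n) * (column total / n),
# which simplifies to row total * column total / n.
expected = {
    sport: [row_totals[sport] * col_total / n for col_total in col_totals]
    for sport in observed
}

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for sport in observed
    for o, e in zip(observed[sport], expected[sport])
)

# Degrees of freedom: (number of rows - 1) * (number of columns - 1).
df = (len(observed) - 1) * (len(col_totals) - 1)

print("expected counts:", expected)
print("chi-square statistic:", round(chi_square, 3), "with df =", df)
```

The larger the statistic, the further the observed table sits from what independence would predict.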
Example: Homogeneity
When writing a set of hypotheses for a chi-squared test for homogeneity, your null hypothesis is that there is no difference in the distribution of the categorical variable between population 1 and population 2. The alternative hypothesis would be that there is a difference in the distribution of the categorical variable between the two populations of interest.
For example, if we wanted to observe how the distribution of sports preference differs among AP Statistics students and AP Calculus students, we could take a random sample of 100 Stats students and 100 Calculus students and determine if the distribution of football, baseball, or basketball preference differs between these two groups. ⚾
Our hypotheses would be as follows:
H0: There is no difference in the distribution of sports preference between AP Statistics and AP Calculus students at XYZ High School.
Ha: There is a difference in the distribution of sports preference between AP Statistics and AP Calculus students at XYZ High School.
Since this problem involves two populations (AP Statistics students at XYZ High School and AP Calculus students at XYZ High School), this would require a test for homogeneity (we are looking to see if two populations are homogeneous in terms of sports preference).
A test for homogeneity is also used in a randomized experiment, since random assignment splits the subjects into treatment groups that act as separate “populations.” For instance, one group of individuals receives a new drug treatment and another group receives a placebo. 💉
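For the sports preference example, a rough sketch of the homogeneity calculation might look like the following. The counts are invented for illustration (only the two samples of 100 come from the example). Under H0 both populations share one distribution, so the best estimate of it is the pooled proportion for each sport, and each expected count is the sample size times that pooled proportion. The p-value line uses the closed form exp(-x/2), which holds only for exactly 2 degrees of freedom; for other tables you would use a chi-square table or a calculator.

```python
import math

# Hypothetical counts, in the order [football, basketball, baseball].
stats_sample = [40, 35, 25]   # 100 AP Statistics students
calc_sample = [30, 30, 40]    # 100 AP Calculus students
samples = [stats_sample, calc_sample]

# Pooled proportion for each sport across both samples (the H0 estimate).
grand_total = sum(sum(sample) for sample in samples)
pooled = [sum(col) / grand_total for col in zip(*samples)]

# Expected count = sample size * pooled proportion for that sport.
expected = [[sum(sample) * p for p in pooled] for sample in samples]

chi_square = sum(
    (o - e) ** 2 / e
    for sample, expected_row in zip(samples, expected)
    for o, e in zip(sample, expected_row)
)

df = (len(samples) - 1) * (len(pooled) - 1)

# Upper-tail area of the chi-square distribution.
# This shortcut is valid ONLY when df == 2.
assert df == 2
p_value = math.exp(-chi_square / 2)

print("chi-square:", round(chi_square, 3), "df:", df, "p-value:", round(p_value, 4))
```

The arithmetic is the same as in a test for independence; what differs is the data collection (two samples instead of one) and the wording of the hypotheses and conclusion.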
Chi-squared tests require two familiar conditions for inference:
When sampling without replacement, we should check the 10% condition for independence of observations: the sample size n should be less than 10% of the population size N (n < 0.10N).
For our large counts condition, we need to verify that all of our expected counts are at least 5 (similar to other chi-square test set-ups). 🗼
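Both conditions are simple enough to check mechanically. The helper names below (`ten_percent_condition`, `large_counts_condition`) and the numbers passed to them are hypothetical, chosen only to illustrate the two checks.

```python
def ten_percent_condition(sample_size, population_size):
    """True when the sample is less than 10% of the population (n < 0.10 N)."""
    # Written with integers to avoid floating-point rounding surprises.
    return 10 * sample_size < population_size


def large_counts_condition(expected_counts, minimum=5):
    """True when every EXPECTED count (not observed count) is at least `minimum`."""
    return all(count >= minimum for row in expected_counts for count in row)


# Hypothetical example: 100 students sampled from a school of 1,200.
print(ten_percent_condition(100, 1200))

# Hypothetical table of expected counts; one cell (4.9) is below 5.
print(large_counts_condition([[12.25, 14.0], [4.9, 7.5]]))
```

Note that the large counts check applies to the expected counts you calculate, not to the observed counts in your data table.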
Test for Independence
For our test for independence, we need to verify that our data was collected using a simple random sample.
To verify that your data was collected using a simple random sample, you can check that the following conditions have been met:
Every member of the population has an equal probability of being included in the sample.
Every possible group of the chosen sample size has an equal probability of being selected.
If these conditions have been met, then your data was likely collected using a simple random sample, which means that it should be representative of the population and can be used to draw conclusions about the population! 😄
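As a small illustration, this is one way a simple random sample could be drawn in Python. The roster of 250 students and the sample size of 20 are made-up values; `random.sample` selects without replacement, so no student can be chosen twice.

```python
import random

# Hypothetical roster: 250 AP Statistics students identified by ID.
roster = [f"student_{i:03d}" for i in range(1, 251)]

# A seeded generator makes the draw reproducible; the seed value is arbitrary.
rng = random.Random(2024)

# Draw 20 students without replacement. Every group of 20 is equally likely.
srs = rng.sample(roster, k=20)

print(sorted(srs))
```

Here the sample of 20 is less than 10% of the 250 students on the roster, so the 10% condition would also be satisfied.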
Test for Homogeneity
For our test for homogeneity, we need to verify that our data was collected using a stratified random sample or treatments were randomly assigned (experimental design).
To verify that your data was collected using a stratified random sample, you can check that the following conditions have been met:
The population has been divided into non-overlapping groups, or strata, based on some relevant characteristic.
A simple random sample is drawn from each stratum.
If these conditions have been met, then your data was likely collected using a stratified random sample, which means that it should be more representative of the population than a simple random sample because it takes into account the inherent structure of the population.
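The two steps of stratified sampling can be sketched in a few lines. The strata, roster sizes, and sample size of 10 per stratum are all hypothetical and serve only to show the structure: split into groups first, then take a separate simple random sample from each group.

```python
import random

# Step 1: the population is split into non-overlapping strata (hypothetical rosters).
strata = {
    "AP Statistics": [f"stats_{i:03d}" for i in range(1, 121)],
    "AP Calculus": [f"calc_{i:03d}" for i in range(1, 151)],
}

# A seeded generator makes the draw reproducible; the seed value is arbitrary.
rng = random.Random(7)

# Step 2: draw a simple random sample (without replacement) from each stratum.
stratified_sample = {
    course: rng.sample(students, k=10)
    for course, students in strata.items()
}

for course, chosen in stratified_sample.items():
    print(course, sorted(chosen))
```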
Alternatively, if you are conducting an experimental study, you can verify that treatments were randomly assigned by checking that the following conditions have been met:
The subjects in the study are randomly assigned to treatment groups.
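Where possible, neither the subjects nor the people measuring the response know who is in which treatment group (i.e., the study is double-blind). Blinding is separate from random assignment, but it helps keep bias out of the comparison.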
If these conditions have been met, then it is likely that the treatments were randomly assigned, which means that any differences between the treatment groups can be attributed to the treatments rather than to preexisting differences between the groups. 😄
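Random assignment itself is easy to picture in code. In this sketch the 40 subject IDs and the group labels are hypothetical; the idea is simply to shuffle the full list of subjects and then split it, so chance alone decides who receives which treatment.

```python
import random

# Hypothetical list of 40 volunteers in a drug trial.
subjects = [f"subject_{i:02d}" for i in range(1, 41)]

# A seeded generator makes the assignment reproducible; the seed value is arbitrary.
rng = random.Random(11)

# Shuffle a copy of the list so the original order is left untouched.
shuffled = subjects[:]
rng.shuffle(shuffled)

# First half receives the new drug, second half receives the placebo.
half = len(shuffled) // 2
groups = {
    "new drug": shuffled[:half],
    "placebo": shuffled[half:],
}

for treatment, members in groups.items():
    print(treatment, sorted(members))
```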
🎥 Watch: AP Stats Unit 8 - Chi Squared Tests