Planning a study involves defining the population that the study will examine, identifying the sampling method that will be used to select a sample from that population, and considering any potential sources of bias or error that could affect the results. ✏️
A census is a statistical study that collects data from every individual in a population. Because researchers must reach everyone rather than just a sample, a census is often extremely time-consuming and expensive.
Censuses are often conducted by governments to gather information about the characteristics of their populations, including demographic information such as age, gender, and race, as well as information about employment, education, and other social and economic factors.
Our goal is to sample in a way that lets us generalize what we learn from the sample to the whole population! After all, it's only appropriate to make generalizations about a population based on samples that are randomly selected or otherwise representative of that population.
To draw a conclusion about a certain population, statisticians begin planning a sample survey by identifying that population. A sample survey is a study that collects data from a sample that is chosen to represent a specific population. Next, they determine what they'd like to measure from the population the sample will be chosen from.
In an observational study, treatments are not imposed. Investigators either examine existing data for a sample of individuals (retrospective) or follow a sample of individuals into the future and collect data as they go (prospective) in order to investigate a topic of interest about the population. (Note that in an experiment, treatments are imposed, but in an observational study, you only gather data to examine possible relationships between variables.) ⌛
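To see why random selection matters for generalizing, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the population of 10,000 ages, the sample size of 200, and the "convenience" sample that only reaches the youngest people. It compares how far each sample's mean lands from the true population mean.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 ages between 18 and 90 (invented data)
population = [random.randint(18, 90) for _ in range(10_000)]
true_mean = statistics.mean(population)

# Simple random sample: every individual has the same chance of selection
srs = random.sample(population, 200)

# Convenience sample (assumption): we only reach the 200 youngest people,
# for example by surveying on a college campus
convenience = sorted(population)[:200]

# How far is each sample mean from the true population mean?
srs_error = abs(statistics.mean(srs) - true_mean)
convenience_error = abs(statistics.mean(convenience) - true_mean)

print(f"True mean age:            {true_mean:.1f}")
print(f"Random sample error:      {srs_error:.1f}")
print(f"Convenience sample error: {convenience_error:.1f}")
```

In this made-up setting, the random sample's mean should land close to the population mean, while the convenience sample misses badly because it systematically leaves out older individuals.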
A sample survey is a type of observational study that collects data from a sample in an attempt to learn about the population from which the sample was taken.
If the data from an observational study are valid, you can establish a plausible relationship or association between two variables, but not cause and effect, because no treatments are imposed. The conclusions you draw can only be generalized to the population from which your sample was selected.
(1) You are a statistician working for a public health research organization. You are considering conducting a study to investigate the relationship between air pollution and respiratory illness.
You are considering two different study designs: an observational study and an experiment. In the observational study, you will collect data on the levels of air pollution and the incidence of respiratory illness for a sample of individuals living in different neighborhoods. In the experiment, you will randomly assign individuals to different neighborhoods and measure the levels of air pollution and the incidence of respiratory illness for each group.
Which of these two study designs is more likely to produce reliable and valid results? Explain your answer.
(2) Going off the same set-up as number (1), which of the following variables could potentially confound the relationship between air pollution and respiratory illness? Explain your answer.
A. Age
B. Gender
C. Smoking status
D. Exercise habits
E. Income
(1) The experimental study design is more likely to produce reliable and valid results because it involves the random assignment of individuals to different groups. Random assignment tends to balance potential confounding variables across the groups, which minimizes the influence of any variables that may be related to both the levels of air pollution and the incidence of respiratory illness.
In contrast, the observational study design does not involve any manipulation of the variables being studied. This means that any observed relationship between air pollution and respiratory illness may be confounded by other factors that are related to both variables. For example, individuals living in neighborhoods with higher levels of air pollution may also be more likely to smoke or have other risk factors for respiratory illness, making it difficult to disentangle the effects of air pollution from these other factors.
Overall, the experimental study design is preferred because it allows researchers to more confidently attribute any observed relationships to the variables of interest, rather than to confounding variables.
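The balancing effect of random assignment described in answer (1) can be illustrated with a short simulation. This is only a sketch under invented assumptions: 4,000 hypothetical individuals, a 30% smoking rate, and smokers being more likely (70% vs. 30%) to live in a high-pollution neighborhood in the observational setting. The helper name `smoker_gap` is also made up for this example.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical individuals: True means the person smokes (assumed 30% rate)
smokers = [random.random() < 0.3 for _ in range(4000)]

def smoker_gap(in_high_pollution):
    """Smoking rate in the high-pollution group minus the low-pollution group."""
    high = [s for s, h in zip(smokers, in_high_pollution) if h]
    low = [s for s, h in zip(smokers, in_high_pollution) if not h]
    return sum(high) / len(high) - sum(low) / len(low)

# Observational study (assumption): smokers are more likely to already
# live in a high-pollution neighborhood
observed = [random.random() < (0.7 if s else 0.3) for s in smokers]

# Experiment: a coin flip decides each person's neighborhood,
# regardless of smoking status
assigned = [random.random() < 0.5 for _ in smokers]

observational_gap = smoker_gap(observed)
experimental_gap = smoker_gap(assigned)

print(f"Smoking-rate gap, observational: {observational_gap:.3f}")
print(f"Smoking-rate gap, experiment:    {experimental_gap:.3f}")
```

Under these assumptions, the observational groups differ noticeably in smoking rate, so smoking is tangled up with pollution. With random assignment, the two groups should have nearly the same smoking rate, so smoking can no longer explain a difference in illness between them.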
(2) Trick question! All of the variables listed could potentially confound the relationship between air pollution and respiratory illness.
Age: Older individuals may be more susceptible to respiratory illness, and may also be more likely to live in neighborhoods with higher levels of air pollution.
Gender: Men and women may differ in their susceptibility to respiratory illness, and may also differ in their exposure to air pollution.
Smoking status: Smokers may be more likely to develop respiratory illness, and may also be more likely to live in neighborhoods with higher levels of air pollution.
Exercise habits: Individuals who exercise regularly may have a lower risk of respiratory illness, and may also be more likely to live in neighborhoods with lower levels of air pollution.
Income: Individuals with lower incomes may be more likely to live in neighborhoods with higher levels of air pollution, and may also have a higher risk of respiratory illness.
Overall, it is important to consider and account for potential confounding variables in order to accurately interpret the results of a study. When answering AP Stats MCQs and FRQs about confounding variables, do your best to think outside the box as you come up with variables not accounted for in a study or experiment!
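To make the idea of a confounding variable concrete, here is a minimal simulation sketch. All numbers are invented, and the setup deliberately assumes that illness depends only on income and not on pollution. That is not a claim about real air pollution; it is only a way to isolate what confounding alone can do. The helper names `illness_rate` and `pollution_gap` are hypothetical.

```python
import random

random.seed(2)  # fixed seed so the sketch is reproducible

people = []
for _ in range(20_000):
    low_income = random.random() < 0.5
    # Assumption: low income raises the chance of living in high pollution
    high_pollution = random.random() < (0.8 if low_income else 0.2)
    # Assumption: illness depends ONLY on income, not on pollution
    ill = random.random() < (0.30 if low_income else 0.10)
    people.append((low_income, high_pollution, ill))

def illness_rate(rows):
    """Proportion of individuals in rows who are ill."""
    return sum(ill for _, _, ill in rows) / len(rows)

def pollution_gap(rows):
    """Illness rate in high-pollution areas minus low-pollution areas."""
    high = [r for r in rows if r[1]]
    low = [r for r in rows if not r[1]]
    return illness_rate(high) - illness_rate(low)

# Ignoring income: pollution appears to be linked to illness
overall_gap = pollution_gap(people)

# Accounting for income: compare only within each income group
gap_low_income = pollution_gap([r for r in people if r[0]])
gap_high_income = pollution_gap([r for r in people if not r[0]])

print(f"Gap ignoring income:      {overall_gap:.3f}")
print(f"Gap within low income:    {gap_low_income:.3f}")
print(f"Gap within high income:   {gap_high_income:.3f}")
```

In this made-up data, the overall comparison shows more illness in high-pollution areas even though pollution has no effect by construction. Once individuals are compared within the same income group, the gap should mostly disappear, which is what it means for income to confound the relationship.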