12 min read • January 4, 2021

Jerry Kosoff

Practicing FRQs is a great way to prep for the AP exam! Review student responses for a Mixed Units FRQ and corresponding feedback from Fiveable teacher Jerry Kosoff.

The city council of a large city is considering raising a city tax to provide funding for road repairs. The council wishes to gauge citizen interest in the plan. The council mails a survey to a random sample of 1,000 city residents, of whom 450 reply. The survey asks “Should we increase city taxes in order to provide additional funding for road repairs?” Of those who reply, 170 say yes; the other 280 say no.

- Explain why 0.378 (170/450) may be a biased estimate of the proportion of all city residents who would reply yes to the question.
- Another survey is commissioned; in it, a random sample of 300 city residents is contacted. They are also asked if they would support an increase in city taxes for the purpose of providing additional funding for road repairs. In the second survey, 161 of those surveyed reply “yes.” The city council plans to use the results of this survey to construct a 95% confidence interval to estimate *p*, the proportion of all city residents who would reply “yes” to the question.
  - (a) Explain the meaning of the 95% confidence **level** in the context of this problem.
  - (b) The 95% confidence interval constructed from the survey is (0.480, 0.593). Does this interval provide convincing statistical evidence that the proportion of all city residents who would say “yes” to the question is at least 50%? Justify your answer.

- One of the conditions that must be met for the confidence interval in (2) to be constructed is that at least 10 residents must have said “yes” and at least 10 residents must have said “no.” Explain why this condition is necessary.

- Those who responded yes probably make a lot of money during their job. People who have a well paid job usually travel daily on roads to arrive at work everyday. This makes it seem like that most of the city residents travel long distances on roads so would like to increase city taxes for road repairs.
- If I took many many samples and generated confidence intervals on them, 95% of city residents would contain the true proportion of all city residents who support an increase in city taxes to help fund road repairs.
- No, since the interval ranges from .48-.593 there is a possibility that the true mean of city residents who support an increase in city taxes for road repairs is not at least 50% since .48-.49999 is less than 50%.
- This condition is necessary as it makes sure that the samples are independent from one another meaning that one city resident whose decision supports an increase in city taxes for road repairs does not affect another city resident whose decision supports an increase in city taxes.

Teacher Feedback

In #1, you give a plausible reason for bias (people travel a lot and would like to see the taxes increase), but do not *explicitly* describe the impact this has on the estimate they got (37.8%). Saying “it seems like most of the city residents would like to increase city taxes” is not the same as clearly stating that you believe the survey resulted in an *overestimate* of the true proportion. I know it seems harsh, but that’s how the rubrics go with describing bias: (1) explain HOW the bias comes to be, (2) explain WHY the bias happens, (3) give a specific DIRECTION of the bias (over/under estimate of the true ____). Your answer only does #1 and #2.

In part 2, your sentence looks great - except you said “95% of city residents” instead of “95% of the constructed intervals” or something similar. Read your sentence back, and I think you’ll see what I mean. That would ding your answer from full to partial credit unfortunately. You have something similar in part 3 - you say “true MEAN of city residents” instead of “true PROPORTION of city residents”. Mixing up mean & proportion there would also knock you down a scoring level, even though the rest of your answer is on point.

Finally, in part 4, you are correct in that we need the *observations* (not “samples”) within the sample to be independent, but that comes from the “10% condition” (that the sample size is less than 10% of the overall population size). The reason for the “at least 10 successes/failures” condition is to ensure an approximately normal sampling distribution of p-hat, from which we can calculate the confidence interval.
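As a quick illustration of the “at least 10 successes/failures” check, here is a minimal Python sketch. The function name `large_counts_ok` is made up for this example, and the count of 139 “no” replies assumes that everyone in the second survey who did not say “yes” said “no” (the prompt only gives the 161 “yes” replies).

```python
def large_counts_ok(successes: int, failures: int, minimum: int = 10) -> bool:
    """Return True when both counts reach the minimum for the large counts condition."""
    return successes >= minimum and failures >= minimum


n = 300        # sample size in the second survey
yes = 161      # residents who replied "yes"
no = n - yes   # assumption: everyone else replied "no"

print(large_counts_ok(yes, no))   # True: 161 and 139 are both at least 10
print(large_counts_ok(4, 16))     # False: a hypothetical tiny sample with only 4 "yes" replies
```

When the check fails, the sampling distribution of p-hat is too skewed for the normal-based interval to be trusted.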

- The 0.378 may be a biased estimate of the proportion of all city residents who would reply yes to the question because the survey has non-response bias present. According to the information present in the passage, the survey indicated that not everyone responded to the survey which means that the proportion is not an accurate estimate of all city resident individuals who would reply yes to the question because the proportion is an underestimate of the true proportion who would reply yes to the question.
- **(A)** In the context of this problem, a 95% confidence level means that we are 95% confident that the interval captures the true portion of all city residents who would reply “yes” to the question. **(B)** This interval does provide convincing statistical evidence that the proportion of all city residents who would say “yes” to the question is at least 50% since 50% is included in the interval 0.480 to 0.593.
- This condition is necessary in order to ensure that the sampling distribution of p-hat is approximately normal.

Teacher Feedback

In part #1, you would likely earn partial credit. When discussing bias on the AP exam, you typically have to do 3 things: (1) explain the source of the bias (“how” it happens), (2) explain the reason for that source existing (“why” it happens), and (3) explain the impact on the result (“what” happens). When reading your response, I see evidence for #1 and #3 - you mention “not everyone responded to the survey” (#1 - how) and that this will probably “underestimate the true proportion” (#3 - the impact). To my eyes, though, your response does not address *why* this happens and *why* the non-response will lead to an underestimate, which would imply that the 37.8% is lower than the true proportion if we actually asked everyone (and it would be maybe more like 50% or something like that). You would need to make an argument for *why the people who responded to the survey are more likely to say no and thus produce an underestimate* - perhaps they are strongly opposed to taxes of any kind, or the wording of the question made them feel like their money could be better spent elsewhere. Whatever you decide is the case, you should present and defend why it impacts the responses. For nonresponse to turn into nonresponse bias, the people who *do* participate must be more likely to answer a certain way than the people who *don’t* participate.

Additionally, while I’m not assuming this is the case here, students often misunderstand this point: getting responses from fewer people than you expect does not automatically produce an *underestimate*. “Underestimate” specifically refers to the proportion/mean/whatever-statistic-is-being-measured being lower than would be reflected in the population. A small and biased sample can produce an *overestimate* just as easily as an underestimate - perhaps in this scenario we ask a small group of people who live near roads with lots of potholes what they think. They would be likely to support the city’s proposal more than others, and therefore produce an overestimate. [OK, thanks for coming to my TED Talk about bias. On to the next part…]
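To see how the direction of nonresponse bias depends on *who* replies, here is a small arithmetic sketch in Python. Every number except the 1,000 mailed surveys is hypothetical: the true support of 50% and the reply rates of 34% and 56% are invented for illustration, chosen only because they happen to reproduce the 170 “yes” and 280 “no” replies. The function name `observed_yes_share` is also made up.

```python
def observed_yes_share(mailed, true_support, reply_rate_yes, reply_rate_no):
    """Proportion of 'yes' among repliers when supporters and opponents reply at different rates."""
    supporters = mailed * true_support
    opponents = mailed - supporters
    yes_replies = supporters * reply_rate_yes
    no_replies = opponents * reply_rate_no
    return yes_replies / (yes_replies + no_replies)


# Hypothetical scenario: half the city supports the tax, but opponents reply more often.
print(round(observed_yes_share(1000, 0.50, 0.34, 0.56), 3))   # 0.378, an underestimate of 0.50

# Same hypothetical city, but both groups reply at the same rate: no bias in the estimate.
print(observed_yes_share(1000, 0.50, 0.45, 0.45))             # 0.5
```

Swapping the two reply rates would push the estimate above 50% instead, which is why a full-credit answer has to argue for a direction rather than just name the bias.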

In part #2, we have a little bit of reviewing to do. In part (a), you correctly interpret what a 95% confidence *interval* is, but that is not the same as a confidence *level*. A confidence *level* represents a “long-run capture rate” that is then reflected in each specific confidence interval. You can check out an overview from a previous stream at this link - it’s time-stamped to the part you’d need. The correct answer in this case would sound something like “if we were to take many, many random samples of 300 city residents and ask them the question, about 95% of the confidence intervals we constructed would capture the correct value for *p*, the proportion of all city residents who would respond yes to the question.”
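The “long-run capture rate” idea can be checked with a short simulation. This is only a sketch under stated assumptions: the true proportion `true_p = 0.5` is made up (in the real problem *p* is unknown), the interval uses the standard one-proportion formula with a critical value of 1.96, and the function names are invented for this example.

```python
import math
import random


def wald_interval(successes, n, z_star=1.96):
    """One-proportion z interval: p-hat plus or minus z* times the standard error."""
    p_hat = successes / n
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def capture_rate(true_p, n, repetitions, seed=0):
    """Fraction of simulated samples whose interval contains the true proportion."""
    rng = random.Random(seed)
    captured = 0
    for _ in range(repetitions):
        # Draw one random sample of n residents and count the "yes" replies.
        yes = sum(rng.random() < true_p for _ in range(n))
        low, high = wald_interval(yes, n)
        if low <= true_p <= high:
            captured += 1
    return captured / repetitions


# Hypothetical true proportion of 0.5, samples of 300 residents, 2000 repetitions.
rate = capture_rate(true_p=0.5, n=300, repetitions=2000)
print(rate)   # should land close to 0.95
```

Notice that the 95% describes the collection of intervals across many samples, not any one interval and not the residents themselves.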

For #2 part (b), you’ve also committed a relatively common error, in that while it is true that 50% is in fact in the interval, the presence of other, smaller values in the interval provides evidence against the claim that at least 50% of residents support the proposal. It’s just as plausible that 48.5%, or 49%, or 49.9% would say “yes”. And since *all* values within a confidence interval are considered “reasonable” values for *p*, we cannot say with confidence that the true population proportion is at least 50%. We could only say that if the *entire* interval is 50% or higher - for example, (0.512, 0.592).
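For anyone who wants to see where (0.480, 0.593) comes from, here is a minimal Python sketch. It assumes the interval was built with the usual one-proportion z formula and a critical value of 1.96 (the prompt does not say so, but the numbers match); the function name is made up for this example.

```python
import math


def one_proportion_z_interval(successes, n, z_star=1.96):
    """Return (lower, upper) for p-hat plus or minus z* times sqrt(p-hat(1 - p-hat)/n)."""
    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * standard_error, p_hat + z_star * standard_error


low, high = one_proportion_z_interval(161, 300)
print(round(low, 3), round(high, 3))   # 0.48 0.593

# Convincing evidence of "at least 50%" needs the whole interval at or above 0.50.
print(low >= 0.50)   # False, because the lower bound is below 0.50
```

The decision rests on the lower bound alone: since it sits below 0.50, values under 50% remain plausible for *p*.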

In part (c), you give the correct rationale for the “large counts” condition - short and to the point! This would earn full credit.

- The result may be a biased estimate of the proportion of all city residents who would reply yes to the question because only 450 of those 1000 people replied. This could lead to under coverage of the population and non-response bias: people who do not reply might have different opinions about the plan.
- Part a and b:
- a. To be 95% confidence in the answer ye of the residents means that if many random samples of size 300 residents are conducted and many confidence intervals are created, then 95% of the interval would contain the true proportion of city residents who would reply “yes” to the question.
- b. No. The interval (0.480, 0.593) does not provide convincing statistical evidence that the proportion of all city residents who would say “yes” to the question is at least 50% because 50% is contained in the interval. It means one could get a result of a random sample of 300 city residents that 50% say yes just by random chance alone.

- The Success/ Failures must be met for the confidence interval because to make sure the sample is large enough - the sampling distribution of the proportion is approximately Normal.

Teacher Feedback

For part 1, you would likely earn partial credit. When discussing bias on the AP exam, you typically have to do 3 things: (1) explain the source of the bias (“how” it happens), (2) explain the reason for that source existing (“why” it happens), and (3) explain the impact on the result (“what” happens). When reading your response, I see evidence for (1) and (2) when you describe “only 450 of those 1000 people replied” and cite the possibility that those who did not reply “might have different opinions about the plan.” From there, you should “take a stand” and make a conjecture as to *how* those people would differ in their opinions from the general population, and whether that will produce an overestimate (more likely to say yes) or underestimate (less likely to say yes) of the true proportion. You can actually justify either direction here, as long as your explanation is clear.

Your response in part 2a is strong, and shows a clear understanding of what a confidence *level* represents. In part 2b, you give the correct answer (“no”) with a correct reason (“50% is contained in the interval”), but you lose me a bit with your last sentence. In theory, we could get *any* proportion in a sample of 300 residents just by random chance alone. A more direct statement may be something like “this means that 50% is a plausible value for the proportion of all adults who would say yes, and we therefore do not have statistical evidence that the proportion is greater.”

Your response in part 3 is on the money - the approximately normal sampling distribution is why we check that condition!

- Since this is a survey that was sent out to a 1000 people but only 450 chose to reply, this may lead to a bias where the people who are strongly opinionated about the topic of raising a city tax to provide funding for road repairs, may choose to respond compared to others, who did not respond, may have a different opinion/ response or didn’t feel strongly enough to respond. This will lead to a over representation of the strong opinionated people.
- Part a and b
- **a)** using the method, if we sampled repeatedly, 95% of the intervals created would contain the true proportion of people who would reply yes to the survey
- **b)** no because 50 is included in the interval of plausible values of the 95% confidence interval. This means that 50 is a plausible value for the proportion of all adults who would say yes and therefore we do not have statistical evidence that the proportion is greater

- The reason of the rule is to ensure that we have a large enough sample and that we have an approximately normal sampling distribution.

Teacher Feedback

In part (a), you clearly identify the possibility of non-response bias. However, when discussing bias on the exam, it’s important to pick a *direction* of the bias - you’re correct that we may end up with an over-representation of strongly opinionated people, but you should “pick a side” as to how those people will land (either more or less in favor of the proposal than the general public), resulting in either an under or over-estimate of the true parameter. In most cases, you can defend either side, as long as you give a reasonable possibility. In parts (b) and (c), you’ve provided correct answers with appropriate context.

- In this scenario, there is a non-response bias because only 450 people out of the 1000 who were mailed the survey chose to respond to the survey. It is reasonable to assume that only people who were very opinionated on the subject chose to reply. People who saw an increase of taxes as a threat may have been more likely to respond than people who saw the increase as a good thing. This results in an underestimation of the true number of individuals who would respond yes to the survey.
- Part a and b
- a) If the city was to repeat the survey multiple times, about 95% of the intervals would contain the true proportion of residents who would reply yes to the question.
- b) This interval does not provide convincing evidence that the true proportion is at least 50% because the interval contains numbers that are less than 50.

- This is the large counts condition and is necessary to ensure that there is a approximately normal sampling distribution.

Teacher Feedback

Great work! All three parts are complete. In part (a), you named the source of bias, explained *how* it might impact people’s responses, *and* connected that to the proportion we were trying to estimate. In part (b), you correctly interpret both parts, and in part (c) you give the correct reason for checking that condition.

- 1. This is an example of Voluntary Response Bias. When given the option to respond, the people that are inclined to do it follow through with it; in this context that may be citizens that checked their mail, or citizens with strong opinions over whether or not to raise the city tax, based on their economic status or their history with road conditions or other factors.
- Part a and b:
- a. 95% of samples of 161 people capture the true proportion of city residents that would reply “yes” to the question.
- b. The interval implies that we are 95% confident that the true proportion of city residents that respond yes falls between 0.48 and 0.593. 50% falls within this interval, so this is a likely outcome, albeit not the only possibility because I can be less than 50%.

- The success/failure condition indicates that the sampling distribution is approximately Normal.

Teacher Feedback

When discussing bias like in part (a), you need to take it a step further and explicitly comment on whether you think the sample results in this case are too high (an overestimate of *p*) or too low (an underestimate of *p*). In a case like this where it’s not obvious, it’s OK to “pick a side” and just go with it: for example, “citizens who are concerned about tax increases may be more likely to respond and say no, producing an underestimate of the proportion of all citizens who would support the proposal.”

In part 2, your confidence level interpretation is well done (though I think you should say “95% of samples of 161 *produce confidence intervals* that…”), and you reach the correct conclusion in part b. Part 3 has the correct rationale for the 10 successes/failures condition.
