Practicing with FRQs is a great way to prep for the AP exam! Work through this FRQ from Unit 6, then review sample student responses and corresponding feedback from Fiveable teacher Jerry Kosoff!
After completing a sale, a car company likes to send a follow-up survey where customers can indicate their level of satisfaction with their experience. One of the questions in the survey asks “would you recommend our company to a friend looking to purchase a vehicle?” The company wonders if people would answer the question differently based on whether they bought a new or used vehicle. From a list of all 2018 vehicle sales, the company randomly selects 105 customers who bought a new vehicle and 120 customers who bought a used vehicle. 88 of the customers who bought new vehicles answered “yes,” while 85 of the customers who bought used vehicles answered “yes.”
At the significance level of 0.05, do the data provide convincing statistical evidence that the proportion of customers who would answer “yes” to the survey question is different for new vs used vehicle sales?
P(yes|new) = 88/105 = 0.838
P(yes|used) = 85/120 = 0.708
margin of error = +/- (1.960)sqrt[(0.838(0.162)/105)+(0.708(0.292)/120)]
margin of error = +/- 0.108
confidence interval = 0.05 +/- 0.108 = (-0.058, 0.113)
No, the data do not provide convincing statistical evidence that the proportion of customers who would answer “yes” to the survey question is different for new vs used vehicle sales. Since 0 is captured in the 95% confidence interval of -0.058 and 0.113, the data shows that the true difference in proportions could be 0.
Teacher Feedback
I’ll give feedback on your work below, but I want to start with noticing that you used a 2-sample confidence interval to answer the question. That is a totally valid strategy for a situation like this, but only because the alternative hypothesis was “different”; had the scenario asked “higher” or “lower” the confidence interval would not work in the same way. Typically, when given a significance level, and asked if there is “convincing statistical evidence” of something, we should be running a hypothesis test. That said, you will still be scored for your work with the confidence interval.
The scoring for a “convincing statistical evidence…” scenario includes:
Stating null/alternative hypotheses
Defining the parameters in the null/alternative hypotheses
Choosing an appropriate test/interval by name
Checking the conditions to run the chosen test/interval
Writing the results from the chosen test/interval
Correctly interpreting the results from the chosen test/interval in terms of whether we do or don’t have evidence for the alternative hypothesis.
Given that list (some parts are scored together to create a question with 3-4 scoring components), you can likely see that your work doesn’t have enough there to be earning much of the available credit. You calculate the appropriate margin of error, and therefore obtain a confidence interval, but never name the interval, check conditions (random samples, approximately normal sampling distribution [at least 10 successes/at least 10 failures], 10% condition), or write hypotheses. Additionally, you used “0.05” in the interval, instead of using (0.838 - 0.708 = 0.13) as your difference of proportions to add/subtract the margin of error. That would have led you to a different confidence interval where 0 was not included. Given that your interval did include 0 though, your conclusion that we do not have convincing evidence would get scored as correct, because you interpreted the answer you got correctly. Unfortunately, you would not get credit for the other components of the question.
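To make the feedback above concrete, here is a minimal Python sketch of the 2-sample z interval centered on the observed difference of proportions (0.838 - 0.708) rather than on 0.05. The function name two_prop_ci and its parameters are illustrative choices, not part of the original question or any scoring guideline; the counts come straight from the prompt.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_ci(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    # Critical value z* for the requested confidence level (about 1.960 for 95%).
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Unpooled standard error, matching the margin-of-error formula in the response.
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z_star * se
    diff = p1 - p2  # center on the observed difference, not on the significance level
    return diff - margin, diff + margin

# Counts from the prompt: 88 of 105 new-vehicle buyers, 85 of 120 used-vehicle buyers.
low, high = two_prop_ci(88, 105, 85, 120)
print(f"95% CI for p_new - p_used: ({low:.3f}, {high:.3f})")
```

Run with the counts from the prompt, the interval comes out to roughly (0.022, 0.237), which does not contain 0, in line with the feedback above.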
p_1 = the proportion of customers who bought a new vehicle and answered yes to the survey question
p_2 = the proportion of customers who bought a new vehicle and answered yes to the survey question
2-sample z test for p_1 - p_2
H_0: p_1 - p_2 = 0, H_a: p_1 - p_2 not equal 0
Conditions:
Random - Stated that the company “randomly selects” customers for the survey
10% Condition for Independence - satisfied since it is safe to assume that there are at least 105(10) = 1050 customers who bought a new vehicle at the car company, and at least 120(10) = 1200 customers who bought a used vehicle at the car company.
Large Counts Condition - satisfied since
n_1*p-hat_1 = 105*0.838 = 87.99 >= 10
n_1*(1-p-hat_1) = 105*0.162 = 17.01 >= 10
n_2*p-hat_2 = 120*0.708 = 84.96 >= 10
n_2(1-p-hat_2) = 120*0.292 = 35.04 >=10
With Large Counts Condition satisfied, the sampling distribution of p-hat_1 - p-hat_2 is approximately normal.
p-hat_1 = 88/105 = 0.838, n_1=105
p-hat_2 = 85/120 = 0.708, n_2=120
z* = 2.303
P-val = P(z>=2.303 or z<=-2.303) = 0.021247
Since 0.021247 < alpha of 0.05, we reject the null hypothesis, because there is convincing statistical evidence that the proportion of customers who would answer “yes” to the survey question is different for new vs used vehicle sales.
Teacher Feedback
Strong execution from top to bottom, presented clearly. One thing that you’re going to facepalm about: you defined the parameters p1 and p2 as the exact same thing. “p2” should say “used.”
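For readers who want to check the test statistic and p-value reported in the response above, the following is a minimal Python sketch of a pooled 2-sample z test for a difference in proportions. The function name two_prop_z_test is a hypothetical helper, and the sketch assumes the two-sided alternative used in this question.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """Two-sided, pooled 2-sample z test for H_0: p1 - p2 = 0."""
    p1, p2 = x1 / n1, x2 / n2
    # Under H_0 the two proportions are equal, so the samples are pooled.
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: area in both tails beyond |z|.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Counts from the prompt: 88 of 105 (new), 85 of 120 (used).
z, p_value = two_prop_z_test(88, 105, 85, 120)
print(f"z = {z:.4f}, p-value = {p_value:.4f}")
```

With these counts the sketch gives z of about 2.30 and a p-value of about 0.021, matching the values in the response.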
Test type- two sample proportional difference hypothesis z-test
H0= p1-p2=0
Ha=p1-p2 does not = 0
p1=the proportion of all customers from a list of 2018 vehicle sales who bought a new vehicle and would recommend our company to a friend
p2=the proportion of all customers from a list of 2018 vehicle sales who bought a used vehicle and would recommend our company to a friend
Conditions
Since we are dealing with a two-sample proportional difference hypothesis test, we will have to pool/combine our proportions.
pc=(88+85)/(105+120)=173/225=0.769
qc=1-0.769=0.231
*we have to check for the independence of our pooled data --> .769(225)=173 >= 10 & .231(225)=52 >= 10
Conditions for new cars:
- simple random sample: stated in the problem- “the company randomly selects 105 customers…”
- independence: 10(105)=1050 Assume that the population of new cars purchased in 2018 is greater than 1050.
- normal: .84(105)=(88 >= 10), .162(105)=(17 >= 10)
All of the conditions for new cars are met.
Conditions for old cars:
- simple random sample: stated in the problem- “the company randomly selects…120 customers…”
- independence: 10(120)=1200 Assume that the population of used cars purchased in 2018 is greater than 1200.
- normal: .708(120)=(85 >= 10), .292(120)=(35 >= 10)
- All of the conditions for old cars are met.
Solve
Conclusion
Teacher Feedback
This is about as thorough a response as I’ve seen! Very well done - you’ve nailed all of the components.
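The condition checks in the response above are simple arithmetic, so they can be sketched in a few lines of Python. The helper names large_counts_ok and min_population_for_ten_percent are hypothetical, and the threshold of 10 is the one used throughout these responses. The Random condition cannot be checked by code; it comes from the wording of the prompt.

```python
def large_counts_ok(successes, n, threshold=10):
    """Large Counts: at least `threshold` successes and failures in the sample."""
    failures = n - successes
    return successes >= threshold and failures >= threshold

def min_population_for_ten_percent(n):
    """10% condition: the population should be at least 10 times the sample size."""
    return 10 * n

# (number answering "yes", sample size) for each group, from the prompt.
samples = {"new": (88, 105), "used": (85, 120)}

for label, (yes_count, n) in samples.items():
    print(
        f"{label}: large counts met = {large_counts_ok(yes_count, n)}, "
        f"population must be at least {min_population_for_ten_percent(n)}"
    )
```

Whether the company actually sold at least 1050 new and 1200 used vehicles in 2018 remains an assumption, exactly as the response states.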
p1= Proportion of customers who bought a new car and answered “yes” to the survey question.
p2= Proportion of customers who bought a used car and answered “yes” to the survey question.
Ho= p1-p2=0 Ha= p1-p2 does not equal 0
We are interested in conducting a 2 sample z test for a difference in population proportions.
Conditions:
Random- A random sample of 105 customers who bought a new vehicle and 120 customers who bought a used vehicle is taken
Normal-
Sample of new cars: np = 105 * 0.838= 88 is greater than or equal to 10.
n(1-p)= 105(0.162)= 17 is greater than or equal to 10.
Sample of used cars: np= 120* 0.708= 85 is greater than or equal to 10.
n(1-p)= 120(0.292)= 35 is greater than or equal to 10.
Calculator: 2-Prop Z Test {x1=88, n1=105, x2=85, n2=120, p1 does not equal p2} = p: 0.0212
Since the p-value of 0.0212 is less than our alpha level of 0.05, we have convincing statistical evidence to reject the null hypothesis. The proportion of customers who would answer “yes” to the survey question is different for new vs. used vehicle sales.
Teacher Feedback
Nice job! You’ve defined parameters, checked conditions, named the test, obtained appropriate test statistic and p-value, and made an appropriate conclusion.
Hypotheses
H_o: p_1 = p_2
H_a: p_1 ≠ p_2
Where p_1 is the true proportion of customers who bought a new vehicle and answered “yes” to the survey question.
Where p_2 is the true proportion of customers who bought a used vehicle and answered “yes” to the survey question.
Assumptions
Independence:
Normality:
n_1 * p-hat_1 = 105 * 0.8381 = 88 ≥ 10
n_1 * (1-p-hat_1) = 105* (0.1619) = 16.9995 ≥ 10
n_2 * p-hat_2 = 120 * 0.7083 = 84.996 ≥ 10
n_2 * (1-p-hat_2) = 120 * 0.2917 = 35.004 ≥ 10
Since all 4 are greater than 10, the sampling distribution is approximately normal.
Calculations
p_hat_combined = (105(0.8381) + 120(0.7083)) / (105 + 120) = 0.7689
z = ((0.8381 - 0.7083) - 0) / sqrt((0.7689 * (1-0.7689) / 105) + (0.7689 * (1-0.7689) / 120)) = 2.3036
p-value = 2*normalcdf(2.3036, 1E99, 0, 1) = 0.0212
alpha = 0.05
p-value<alpha
Conclusion
Since the p-value<alpha, we reject the H_o. There is sufficient evidence to suggest that the proportion of customers who bought a new vehicle and answered “yes” to the survey question is different from the proportion of customers who bought a used vehicle and answered “yes” to the survey question.
Teacher Feedback
Well done from top to bottom - you’ve got parameters, conditions, appropriate calculations, and appropriate conclusions. You’re ready!
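The calculator command 2*normalcdf(2.3036, 1E99, 0, 1) in the response above can be mirrored in Python for anyone working without a graphing calculator. This is only a sketch: the normalcdf function below is a hypothetical stand-in written to imitate the calculator's argument order (lower, upper, mean, standard deviation), built on the standard library's NormalDist.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under a normal curve between `lower` and `upper`."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

z = 2.3036  # test statistic from the response
# Two-sided test: double the upper-tail area beyond z.
p_value = 2 * normalcdf(z, 1e99, 0, 1)
print(f"p-value = {p_value:.4f}")
```

This gives a p-value of about 0.0212, the same value the response reports.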
H null: p1-p2=0
H alternative: p1-p2 doesn’t equal 0.
Anything with subscript 1 pertains to the new vehicle group while anything with subscript 2 pertains to the used vehicle group.
The test we will be using is a two sample z-test (for difference in proportion).
Conditions:
np1 hat and nq1 hat both greater than 10 (or 15). np2 hat and nq2 hat both greater than 10 or (15). Calculations omitted, but it is clear that both groups both have more than 10 successes and failures.
Samples are random and independent. This is met as the question specifies that the selection of customers from both groups was random, and it is clear that the selection of a customer from one group does not affect the selection of a customer from the other group.
It is also safe to assume that the number of customers who bought new and used cars is at least 105*10 = 1050 and 120*10 = 1200 people, respectively.
Z= (p1 hat - p2 hat)/(phat *(1-phat)(1/n1 + 1/n2))^.5 where phat is the pooled proportion = 173/235 = 0.7361
p1 hat = 0.8381
p2 hat = 0.7083
Z=0.1298/((0.7361)(0.2639)(0.0179))^.5
Z=2.19
pvalue = 2(1-0.9857) = 0.0286.
Conclusion: Since the pvalue of 0.0286 is less than the alpha level of 0.05, we reject the null hypothesis, leading us to the conclusion that the data does provide convincing statistical evidence that the proportion of customers who would answer yes to the survey question is different for new vs used vehicle sales.
Teacher Feedback
Good work! You’ve done all required parts of a hypothesis test and answered appropriately. The only place where you might not receive full credit is in your hypotheses: you’ve used symbols, and named which group is which, but did not define the parameter in context. The parameter here was “proportion of people who would say yes to the survey”, and then can be differentiated between new/used sales.
STATE
Ho: p1=p2
Ha: p1=/p2 (=/ means not equal to)
Alpha = 0.05
p1: The proportion of customers who bought a new vehicle and answered “yes” to “would you recommend our company to a friend looking to purchase a vehicle?”
p2: The proportion of customers who bought a used vehicle and answered “yes” to “would you recommend our company to a friend looking to purchase a vehicle?”
2 sample z test for proportions
PLAN
Random: met, since stated in the problem that company selects customers randomly.
10%: 105<= 1/10 all customers who bought new vehicle
120<= 1/10 all customers who bought used vehicle
Large Counts: For new: 105(88/105)>=10 105(1-(88/105))>=10
For Used: 120(85/120)>=10 120(1-(85/120)) >=10
Since, Large counts conditions are met for both we can assume that the distribution is approximately normal.
DO
pc = (88+85)/(105+120) = 173/225 = 0.769
p1 hat - p2 hat = 88/105 - 85/120 = 109/840
z score = (109/840) / sqrt(0.769(1-0.769)/105 + 0.769(1-0.769)/120) = 2.3039
p-value=0.0212
CONCLUDE
Since the p-value is 0.0212 and the alpha level is 0.05, the p-value is smaller than alpha. Therefore, we reject the null hypothesis of p1=p2. We have convincing evidence that the proportion of customers who would answer “Yes” to the survey question is different for new vs used vehicle sales.
Teacher Feedback
Nicely done! You’ve done all components of a hypothesis test correctly (and shout-out to the state-plan-do-conclude method that’s in the textbook I use as well).
2-sample z test for proportion
Ho: p1 = p2
Ha: p1 =/= p2
Conditions:
Random: We are told that the customers are “randomly selected”
Normal: It is approximately normal because:
10% condition: There are more than 10 x 120 and 10 x 105 customers. The two samples are independent from each other.
Calculation: Pc = (88+85)/(105+120) = 173/225
z = ((88/105 - 85/120) - 0) / (sqrt((173/225)(52/225)) x sqrt(1/105 + 1/120)) = 2.304
p-value: .0212
Interpret: Because our p-value (.0212) is below the alpha (0.05), we reject the Ho. There is significant evidence that the proportion of customers who would answer “yes” to the survey question is different for new vs used vehicle sales.
Teacher Feedback
The only part you need more on is your hypotheses; you don’t define what you mean by “p1” and “p2”, so you’d only get partial credit there (you didn’t define the parameter). Every other part is done appropriately and would earn full credit.
pnew= proportion of customers who bought a new car and would recommend to a friend.
pused= proportion of customers who bought a used car and would recommend to a friend.
H0: pnew−pused=0
Ha: pnew−pused≠0
Conditions:
Random: “The company randomly selects 105 customers who bought a new vehicle and 120 customers who bought a used vehicle”
Normal: For the new vehicle population, there is 88≥10 successes and 105−88=17≥10 failures. For the old vehicle population, there is 85≥10 successes and 120−85=35≥10 failures.
Independent: It is reasonable to assume that the population of the people who bought new and old cars is at least 1050 and 1200 respectively.
We will be conducting a 2 sample z-test for difference of population proportions.
Calculations:
p=0.021
Conclusion:
Since our p-value of 0.021<0.05, we can reject H0. We have convincing statistical evidence that the population proportion of people who would recommend the company to a friend differs between those who bought new cars and those who bought old cars.
Teacher Feedback
Very good work. Little thing: on “calculations” part, it is typical to show both the test statistic (in this case, your z), as well as the p-value.