Paired t Procedures
Objectives: In this module you will learn how to construct a confidence interval and perform a paired t-test when two quantitative variables are collected in pairs. You will construct a confidence interval for, and test hypotheses about, the population mean difference. You will also be able to state how confident you are in your interval estimate or in your decision.
Overview: Matched or paired data result from a deliberate experimental design. For example, suppose we are examining the effect of a drug on a certain type of response. The drug is administered to a group of people, and each individual's response is measured both before and after the drug is given. Or consider an experiment where rats are matched by weight, and then one rat in each pair receives a new diet while the other receives a control diet. These are called paired data designs. Note that paired designs arise either when you have two measurements on the same individual or when you have two individuals that have been matched or paired before a treatment is administered.
The inference procedures for a paired data design are based on the single sample of differences, so the one-sample t procedures from Module 5 can be used. We are interested in estimating, or testing hypotheses about, the population mean difference, usually with a hypothesized value of zero, which indicates no difference on average.
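In symbols, the one-sample t procedures applied to the differences look like this. The notation is chosen here for illustration: x_i and y_i are the two measurements in pair i, μ_D is the population mean difference, and μ_0 is its hypothesized value.

```latex
d_i = x_i - y_i, \qquad i = 1, \dots, n
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2}
t = \frac{\bar{d} - \mu_0}{s_d/\sqrt{n}} \sim t(n-1) \quad \text{when } H_0\colon \mu_D = \mu_0 \text{ is true}
\bar{d} \pm t^{*}_{n-1}\,\frac{s_d}{\sqrt{n}} \quad \text{(confidence interval for } \mu_D\text{)}
```

The t(n − 1) reference distribution relies on the same assumptions as the one-sample case. The differences must be a random sample from a normally distributed population of differences.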
Activity: Do books purchased from Borders (in-store) cost more on average than if purchased online at Amazon.com?
Background: In recent years, the popularity of purchasing books via the Internet has increased dramatically, and the conventional bookstore no longer dominates book sales. The most influential factor that sways customers toward purchasing books online is lower prices compared with local bookstores.
A group of Statistics 350 students decided to compare Amazon.com prices with Borders bookstore (Ann Arbor) prices based on a sample of 40 books selected from a wide range of categories. For Amazon, a standard ground shipping charge of $4.29 and the local state tax of 6% were included in the cost.
The corresponding costs are available in the SPSS data set called books.sav (Source: Stat 350 group project, 2004). Do the data provide sufficient evidence to conclude that, on average, Borders (in-store) books are more expensive than Amazon.com books?
Task: Perform the appropriate paired t-test regarding the mean difference in book price, where the differences are computed as “Borders less Amazon” (i.e., price at Borders minus price on Amazon).
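The sketch below shows how each Amazon cost and each “Borders less Amazon” difference might be formed. It uses made-up prices; the real ones are in books.sav. It also assumes that the 6% tax is charged on the book price but not on shipping, which the source does not specify. A positive difference means the book cost more at Borders.

```python
# Illustrative prices only (hypothetical); the real data are in books.sav.
SHIPPING = 4.29   # standard ground shipping added to each Amazon purchase
TAX_RATE = 0.06   # 6% local state tax

def amazon_total(list_price: float) -> float:
    """Total Amazon cost, ASSUMING tax applies to the book price only, not shipping."""
    return list_price * (1 + TAX_RATE) + SHIPPING

borders = [24.95, 14.00, 7.99]       # hypothetical in-store prices
amazon_list = [16.47, 10.20, 5.99]   # hypothetical Amazon list prices

# Amazon cost including shipping and tax, rounded to cents
amazon = [round(amazon_total(p), 2) for p in amazon_list]

# "Borders less Amazon": positive values mean the book was more expensive at Borders
diffs = [round(b - a, 2) for b, a in zip(borders, amazon)]
```

The direction of subtraction matters. With this convention, “Borders is more expensive on average” corresponds to a positive population mean difference.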
Before conducting any test, here is a set of questions to ask yourself:
* How many populations are there? One Two More than two
* How many variables are there? One Two
* What is the response variable?
* What type of variable is the response? Categorical Quantitative
* What type of parameter would be useful for summarizing this response?
Proportion Mean Other (see Supplement 3)
Based on the answers to these questions, you should be able to identify the appropriate inference procedure. You may refer back to Supplement 3 – Name that Scenario for assistance.
The appropriate inference procedure for this scenario is ______________________________
and the specific parameter of interest is ___________________ .
NOTE: Why is this a paired procedure?
1. State the hypotheses: H0: __________ = __________ Ha: _______________________
where _____ represents
Your parameter definition should always be a statement about the population(s) under study.
2. Assumption Checks and Computing the Test Statistic:
a. For this scenario, we need to assume that the sampled differences are a ________ sample.
To check this assumption, we would make a _______ plot (if the data have a time order) of the
_________________ and look for _____________________________________.
b. We also need to assume that the ___________________ of differences
is normally distributed. To check this assumption, we would make a _____________ plot
of the __________________.
c. We will assume the assumptions are reasonable for this example.
d. Generate the paired t-test output.
Use Analyze > Compare Means > Paired-Samples T Test.
Note: If you want a confidence interval at a level other than 95%, use Options to change the confidence level.
e. The test value is _____ (this is the null value from the null hypothesis).
f. What is the value of the test statistic?
g. What is the distribution of the test statistic if the null hypothesis is true?
This is the same as asking what model you use to find the p-value.
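As a cross-check on the quantities in the SPSS paired-samples table (mean difference, standard error, t, df, two-sided p-value, confidence interval), here is a standard-library-only Python sketch. It is not how SPSS computes these values. The t distribution's CDF is approximated numerically with Simpson's rule, and the critical value t* is found by bisection. Both are choices made for this sketch so that no extra libraries are needed.

```python
import math
from statistics import mean, stdev

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0 (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximately P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p: float, df: int) -> float:
    """Value q with P(T <= q) = p, found by bisection on t_cdf."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def paired_t(x, y, mu0=0.0, conf=0.95):
    """Paired t-test done as a one-sample t-test on the differences x - y."""
    if len(x) != len(y):
        raise ValueError("paired data need the same number of values in x and y")
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    dbar = mean(d)
    se = stdev(d) / math.sqrt(n)          # standard error of the sample mean difference
    t = (dbar - mu0) / se
    df = n - 1
    p_two = 2 * (1 - t_cdf(abs(t), df))   # two-sided p-value, like SPSS "Sig. (2-tailed)"
    tstar = t_quantile(1 - (1 - conf) / 2, df)
    ci = (dbar - tstar * se, dbar + tstar * se)  # confidence interval for mu_D
    return {"mean": dbar, "se": se, "t": t, "df": df,
            "p_two_sided": p_two, "ci": ci}
```

For example, `paired_t([5, 7, 9, 6], [3, 4, 6, 5])` (made-up pairs) gives t ≈ 4.70 with df = 3. For the books data, you would pass the Borders prices as `x` and the Amazon costs as `y`, so that the differences run “Borders less Amazon.”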
3. Calculate the p-value:
a. What is the SPSS reported p-value? _____________. Is it the p-value we want? _____
b. Draw a picture of the p-value we want.
c. So, our p-value is _____________________
d. Provide an interpretation of the p-value.
4. Decision: What is your decision at a 5% significance level? Reject H0 Fail to reject H0
Remember: Reject H0 <=> Results statistically significant
Fail to reject H0 <=> Results not statistically significant
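SPSS reports a two-sided p-value, so a one-sided alternative needs a conversion. The sketch below shows one common way to do it. If the observed t statistic points in the direction of Ha, the one-sided p-value is half the two-sided value; otherwise it is 1 minus that half. The function names are hypothetical. The decision rule “reject H0 when p ≤ α” is the usual convention.

```python
def one_sided_p(p_two_sided: float, t: float, upper: bool = True) -> float:
    """Convert a two-sided p-value for a t statistic into a one-sided p-value.

    upper=True is for Ha: mu_D > mu_0; upper=False is for Ha: mu_D < mu_0.
    """
    half = p_two_sided / 2
    in_direction_of_ha = t > 0 if upper else t < 0
    return half if in_direction_of_ha else 1 - half

def decide(p: float, alpha: float = 0.05) -> str:
    """Reject H0 when the p-value is at most the significance level."""
    return "Reject H0" if p <= alpha else "Fail to reject H0"
```

For instance, a two-sided p-value of 0.04 with t = 2.1 gives a one-sided p-value of 0.02 for an upper-tailed alternative. That result leads to rejecting H0 at the 5% level.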
5. Conclusion: State your conclusion in the context of the problem.
Conclusions should not be too strong: say that you have sufficient evidence (or equivalent wording); do NOT say that you have proven anything.
Conclusions should always include a reference to the population parameter of interest.
Check Your Understanding:
1. The denominator of the test statistic is the standard error of the sample mean difference. The following two sentences attempt to interpret a standard error. Which one is correct and why?
“The standard error of the sample mean estimates roughly the average distance of the sample mean from the population mean”
“The standard error of the sample mean difference estimates roughly the average distance of the observed differences from the population mean differences”
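A quick simulation can make the idea of a standard error concrete. The sketch draws many samples of differences from a normal population and records each sample mean difference. The population values (μ_D = 2, σ_D = 5, n = 40) are invented for illustration. The spread of the sample means around μ_D comes out close to σ_D/√n. The spread of the individual differences is much larger.

```python
import math
import random
from statistics import stdev

random.seed(350)
MU_D, SIGMA_D, N = 2.0, 5.0, 40   # hypothetical population mean/SD of differences, sample size
REPS = 10_000                     # number of simulated samples

sample_means = []
for _ in range(REPS):
    d = [random.gauss(MU_D, SIGMA_D) for _ in range(N)]
    sample_means.append(math.fsum(d) / N)   # sample mean difference for this sample

# Typical distance of the sample mean difference from MU_D across repeated samples
spread_of_means = stdev(sample_means)
theory = SIGMA_D / math.sqrt(N)             # sigma_D / sqrt(n)

# Standard error estimated from one sample alone (the last simulated sample)
one_sample_se = stdev(d) / math.sqrt(N)
```

In practice only one sample is available, so s_d/√n (like `one_sample_se`) is used as the estimate of that spread.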
Think About It:
What is the connection between the paired t-test procedure and the one-sample t-test procedure from Module 5?
How could you carry out this test via the one-sample procedure?
Try this and compare your results. Comment on your findings.
Example Exam Question on Paired t-Test
A utilization study was conducted to see how often two rooms of a sports facility were being used during the lunch hour. The number of people in each room was counted at 12:30 p.m. each Monday for 10 weeks. The results are summarized below.
Suppose you want to test the hypothesis of no difference between the utilization of the two rooms against the alternative that, on average, the dance studio is used by more people during the lunch hour. You enter the above data into SPSS and conduct a (matched) paired t-test, obtaining the following output.
a. The observed test statistic is given as t = 2.13. State what this value tells you about the location of the sample mean difference of 3.6.
b. State the appropriate null and alternative hypotheses, and define the parameter of interest.
H0: _______ _____________________ Ha: _______ ____________________
where ______ is ____________________________________________________.
c. Report the p-value for the test in part (b) and your decision using a significance level of 0.10.
p-value: _____________________ Decision: (circle) Reject H0 Fail to Reject H0
d. A decision was made in part (c). Which type of error could have been made? Use the appropriate statistical name to identify the mistake. Error:
e. Circle each of the following statements that is an assumption required for performing the paired t-test.
…the population standard deviation for the difference in room use is known.
…the numbers using the dance studio are normally distributed.
…the difference in room use is normally distributed.
…the standard deviation for the number using the dance studio is equal to
the standard deviation for the number using the weight room.
…the numbers using the dance studio are independent of the numbers using the weight room.