A p-value is calculated with a hypothesis test (more formally
called a null hypothesis statistical test). The example you read about in Task 1 of
this module was one way of carrying out a hypothesis test: I created a fake population where there was no difference between the
groups, and then looked to see how many samples from that population would show a big difference. (We won't be conducting any more
tests like that in this class, but that kind of test is called a permutation test, in case you were wondering.)
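(For the curious, here is a rough sketch of what that kind of simulation can look like in code. You do not need to run or understand this for the class; the population values, sample sizes, and observed difference are made up for illustration, and the choice of Python with the numpy library is mine, not something the module requires.)
```python
import numpy as np

rng = np.random.default_rng(1)

observed_difference = 2.0   # made-up difference we saw in our real sample
n_per_group = 50            # made-up sample size per group
n_simulations = 5000        # same number of simulated tests as in Task 1

count_as_big = 0
for _ in range(n_simulations):
    # Both groups are drawn from the same fake population,
    # so any difference between them is due to chance alone.
    group_a = rng.normal(loc=20, scale=5, size=n_per_group)
    group_b = rng.normal(loc=20, scale=5, size=n_per_group)
    if abs(group_a.mean() - group_b.mean()) >= observed_difference:
        count_as_big += 1

# Proportion of simulated samples showing a difference at least as big
# (in either direction) as the one observed; this plays the role of a p-value.
print(count_as_big / n_simulations)
```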
It is often not feasible to create a fake population like this (for example, we might not know how much variation in
age there should be in the population, so we wouldn't know how to make up the fake population). Instead, people usually use a math
formula to approximate this kind of test. Specifically, they use a math formula to calculate a number (called a
test statistic) which falls somewhere on a known distribution (unlike my example in
the earlier task, where I had to make up a distribution by simulating tests on fake data 5000 times) and then get the p-value
by seeing where in the distribution the test statistic falls.
One of the simplest and most common examples of a statistical procedure like this is the t-test. Imagine that
I have ten students, and I compare each student's score on a math test and reading test; I want to see if the students do better at
reading than math. For each student, I can calculate the reading test score minus the math test score to see how much better they
did at reading. Here is a set of scores:
5, 17, -6, 3, 12, -11, 8, 13, 10, -2
We can see that, on average, students did better on reading than they did on math (the average of these values is
4.9, meaning they scored an average of 4.9 points higher on reading than they did on math). But some students actually did worse on
reading. And, as we know, the results from this sample might not match the results of the population. We want to do a statistical
test to help us decide whether to conclude that, in the population, this reading-minus-math difference is likely to be bigger than
zero (if reading and math scores are the same in the population, the difference between them would be zero). Keep in mind that
calculating a p-value does not actually answer this question, for the reasons we discussed in the previous tasks (a
p-value tells us the probability of a result under a certain population difference; it does not tell us the
probability of a population difference given a certain result); nonetheless, you may be expected to calculate a p-value
anyway.
To get a p-value, we first calculate a t statistic using the below formula:
\(t = \frac{\bar{x}}{s / \sqrt{N}}\)
\(\bar{x}\) refers to the average of the results (4.9), s refers to the sample standard deviation (a measure of
how much the results vary across people; you can calculate this in Excel or other statistical software, or by hand), and N
refers to the number of participants. If I plug in all the numbers (the mean is 4.9, the sample standard deviation is about 8.95, and there are 10 participants), I get the following:
\(t = \frac{4.9}{8.95 / \sqrt{10}} \approx 1.7313\)
So, the t-statistic for these data is 1.7313. The next step is to know what p-value this corresponds
to. In the past we would do this by looking up the value from a table, but nowadays most statistical software calculates this
automatically. In this case, the p-value is .05872, meaning that if there were no difference between reading and math
scores in the population, there is a 5.872% chance that we might have observed a t-value of 1.73133 or bigger in this
sample. (A good rule of thumb is that if your study has a large enough sample, t-values above 2 or below -2 will tend to
have p-values below .05. If your study has a small sample [e.g. less than about 30 participants or items] this rule of thumb
will not work reliably, though.)
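If you have statistical software available, you do not need to do any of this by hand. As one possibility, here is a minimal sketch of how the same numbers could be reproduced in Python, assuming the scipy library is installed (scipy is my choice for illustration, not something this module requires):
```python
from scipy import stats

# Reading-minus-math difference scores from the example above
scores = [5, 17, -6, 3, 12, -11, 8, 13, 10, -2]

# One-sample t-test against zero; "greater" matches the one-sided
# prediction that reading scores are higher than math scores.
result = stats.ttest_1samp(scores, popmean=0, alternative="greater")

print(result.statistic)  # about 1.7313
print(result.pvalue)     # about 0.0587
```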
There are many more nuances to how to do a t-test. First of all, the formula for calculating a t-value
is slightly different depending on whether you are examining just one average (as in our present example, seeing if the number is different from zero)
or comparing two averages that come from two different groups of people (as in the PolyU/HKUST example discussed above).
And the way you calculate a p-value from a t-statistic depends on your previous predictions (in this case, I had
predicted beforehand that reading scores would be higher; if I had not made that prediction, the p-value for this same
t-statistic would be different). If you plan to use a t-test in your own research, you will need to read more about it
to make sure you handle these issues correctly, as what you've read here is just a brief crash course and is not yet enough to
prepare you to use these tests responsibly (although some of these issues will be addressed in the next activity in this module).
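To make that last point concrete, here is a small sketch (again in Python with scipy, my own choice for illustration) of how the very same t-statistic gives a different p-value depending on whether the prediction was one-sided or two-sided:
```python
from scipy import stats

t_value = 1.7313
df = 9  # degrees of freedom: 10 participants minus 1

# One-sided p-value (we predicted in advance that reading > math):
p_one_sided = stats.t.sf(t_value, df)

# Two-sided p-value (no direction predicted): twice the one-sided value.
p_two_sided = 2 * stats.t.sf(t_value, df)

print(p_one_sided)  # about 0.059
print(p_two_sided)  # about 0.117
```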
Nevertheless, the formula for the t-statistic is a useful formula to know, because it neatly illustrates
all the things that are important when you design research. To get a statistically significant result (i.e., a small p-value),
you want to get a high t-statistic (the higher the t-statistic is, the smaller its corresponding p-value will
be, if all else is held constant). If you look at the formula for the t-statistic above, and do some basic math, you should
see that there are three things you could do to make t bigger:
If \(\bar{x}\) (the size of the effect) is bigger, t will be bigger;
If s (the variation across participants) is smaller, t will be bigger;
If N (the number of participants) is bigger, t will be bigger.
Therefore, the t formula is a perfect summary of the three things you can do in your study to maximize the
chance of finding a significant effect. If you try to find bigger effects (by doing whatever you can to make the difference as big as
possible; e.g., choosing tests that accentuate the difference between math and reading, as opposed to choosing a "math" test that
still requires a lot of reading), minimize the variation across participants (e.g., by trying to test people under as similar
circumstances as possible, rather than e.g. testing some students at night and some in the morning), and find as many volunteers as
you can, you can increase your chance of finding a significant effect. Even if you never actually use a t-test, the t
formula is an excellent reminder of how to design good research.
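If you would like to see this for yourself, here is a small sketch (in Python; the alternative numbers are made up purely for illustration) that plugs different values into the t formula from above:
```python
import math

def t_statistic(mean, sd, n):
    """t = mean / (sd / sqrt(n)), the formula given above."""
    return mean / (sd / math.sqrt(n))

print(t_statistic(4.9, 8.95, 10))   # the example above: about 1.73
print(t_statistic(9.8, 8.95, 10))   # bigger effect -> bigger t
print(t_statistic(4.9, 4.5, 10))    # less variation -> bigger t
print(t_statistic(4.9, 8.95, 40))   # more participants -> bigger t
```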
To practice calculating a t-statistic, try the exercise below.
Imagine I test students at both the beginning and end of the semester to see if they improve on some test. At the end
of the semester, I take their end-of-semester score minus their beginning-of-semester score to see how much
improvement they had. Now imagine my class has 1000 students but I could only perform these tests for 20 randomly
selected students. The university might then want to know whether the improvement I observed in just 20 students is
statistically significant (even though a statistical significance test doesn't tell us whether there was reliable
improvement across the whole population of students).
Below are the 20 improvement scores for the students I tested. Calculate the t-statistic for checking if these
improvement scores are significantly greater than zero.
When you have finished these activities, continue to the next section of the module:
"Other types of t-tests".
Answer: 3.866215. You can get this by plugging the mean difference score (5.8), standard deviation (6.708988), and number of students (20) into the t formula.
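(If you want to check this answer with software, here is a minimal sketch in Python using the summary numbers given in the answer; if you have the 20 raw scores in front of you, you could instead pass them to a function like scipy's ttest_1samp, as shown earlier.)
```python
import math

mean_improvement = 5.8
sd_improvement = 6.708988
n_students = 20

t = mean_improvement / (sd_improvement / math.sqrt(n_students))
print(t)  # about 3.866
```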