Advanced level (2 hours)

Find a dataset that would be appropriate for analysis with a t-test. This may be a dataset from a research paper (many are available at this repository, although not all will be appropriate for t-tests*) or it may be anything else you can download (there are lots of "big data" datasets online nowadays, such as country GDP, COVID-19 cases, and all kinds of other things); it doesn't need to be related to linguistics. You can use the data as-is, or you can rename the variables to be something more meaningful (e.g., you might download some data about the height of people from different countries, but call it "language proficiency" instead of "height" to make it easier to link to your own research). If a dataset has lots of variables, you don't need to analyze all of them; you could just choose a subset of the data that would be appropriate for a t-test. Once you've chosen a dataset, do a t-test.

*In particular, be aware that to use a t-test, you have to have just one (for an independent t-test) or two (for a dependent t-test) data points per person. Many of these sample datasets may have lots of data for each person; in that case, you would first need to average the data within each person (and, for a dependent t-test, within each condition) before you can do a t-test. Or you can search for data that are already in the proper format.
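
As a rough illustration of that averaging step, here is a minimal Python sketch (Python is just one option; any statistics software would do). The data, the participant labels, the condition names "A" and "B", and the helper functions `average_within_person` and `paired_t` are all made up for this example. It collapses several trials per person into one mean per person per condition, then computes the t statistic and degrees of freedom for a dependent t-test by hand.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

# Hypothetical long-format data: (participant, condition, reaction time in ms).
# Each participant contributes several trials per condition.
trials = [
    ("p1", "A", 510), ("p1", "A", 530), ("p1", "B", 560), ("p1", "B", 580),
    ("p2", "A", 480), ("p2", "A", 500), ("p2", "B", 500), ("p2", "B", 520),
    ("p3", "A", 600), ("p3", "A", 620), ("p3", "B", 640), ("p3", "B", 640),
    ("p4", "A", 550), ("p4", "A", 570), ("p4", "B", 570), ("p4", "B", 610),
]


def average_within_person(rows):
    """Collapse trials to one mean per participant per condition."""
    grouped = defaultdict(list)
    for person, condition, value in rows:
        grouped[(person, condition)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


def paired_t(means, cond_a, cond_b):
    """Return (t, df) for a dependent (paired) t-test on per-person means."""
    people = sorted({person for person, _ in means})
    # One difference score per person: condition B minus condition A.
    diffs = [means[(p, cond_b)] - means[(p, cond_a)] for p in people]
    n = len(diffs)
    # t = mean difference divided by its standard error (sample SD / sqrt(n)).
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


person_means = average_within_person(trials)
t_stat, df = paired_t(person_means, "A", "B")
print(f"t({df}) = {t_stat:.2f}")
```

With these made-up numbers, each of the four participants ends up with exactly two data points (one mean per condition), which is the format a dependent t-test needs. To get a p-value you would compare the t statistic against a t distribution with the printed degrees of freedom; a library function such as SciPy's `scipy.stats.ttest_rel`, given the two lists of per-person means, should report the same statistic along with the p-value.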

Describe the hypothesis that you want to test on these data, describe the results of the t-test you carried out, and describe what conclusion you can make based on the p-value of that test. (Remember that in activity #2 of this module, all the conclusions I listed there were "false"; this is your chance to describe what an accurate conclusion from a statistical test can be.)


by Stephen Politzer-Ahles. Last modified on 2021-05-15. CC-BY-4.0.