VoiceSauce is a set of MATLAB tools that calculates millisecond-by-millisecond acoustic measurements from audio files. Its output is a matrix in which each row is a sample and each column is a measurement (or filename/label information) for that sample. More precisely, each measurement is computed over a 25-ms window centered on that sample.
Segments of an audio recording may have different lengths: some words or segments are longer than others, and some speakers' productions are longer than others'. To make direct statistical comparisons across items and speakers, many researchers therefore choose to time-normalize the data in some way. One way to do so is to divide each recorded segment into windows based on proportions of its duration (e.g., the first 20% of a segment, the next 20%, the third 20%, etc.) and average over the samples within each of those windows.
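The windowing idea above can be sketched in a few lines of R. This is only a minimal illustration, not the code from the downloadable script: the function name window_means and the f0 values are made up for the example, and it assumes the segment has at least as many samples as windows.

```r
# Hypothetical helper (not from the original script): average a vector of
# per-sample measurements within nwindows equal-proportion windows.
window_means <- function(x, nwindows = 5) {
  n <- length(x)
  stopifnot(n >= nwindows)  # cannot fill 5 windows with fewer than 5 samples
  # Assign each sample to a window according to its proportional position
  window <- ceiling(seq_len(n) * nwindows / n)
  # Mean of the samples in each window, ignoring missing values
  as.numeric(tapply(x, window, mean, na.rm = TRUE))
}

# Made-up f0 values for a 10-sample segment: each window covers 2 samples
f0 <- c(200, 202, 204, 206, 208, 210, 212, 214, 216, 218)
window_means(f0)
```

Because windows are defined by proportion rather than by absolute time, a 10-sample segment and a 300-sample segment both reduce to the same number of values, which is what makes them comparable.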
The R code described on this page imports acoustic measurements from VoiceSauce output files (.txt format) into R and implements the normalization procedure described above.
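As a rough sketch of the import step, the example below reads a small tab-delimited text file with base R. It assumes the output file has a header row, and the column names used here (Filename, Label, t_ms, strF0) and all values are stand-ins for illustration; check them against your own VoiceSauce output.

```r
# Write a tiny stand-in for a VoiceSauce output file (made-up values),
# so that the example is self-contained.
example_file <- tempfile(fileext = ".txt")
writeLines(c(
  "Filename\tLabel\tt_ms\tstrF0",
  "spk1.mat\tword1\t101\t200.5",
  "spk1.mat\tword1\t102\t201.0",
  "spk1.mat\tword2\t350\t180.2"
), example_file)

# Read the tab-delimited file into a data frame, one row per sample
vsdata <- read.delim(example_file, header = TRUE, stringsAsFactors = FALSE)
str(vsdata)
```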
This code is only meant as an example; it is very likely that you will have to make some adjustments to it to fit your own data. For example, this code only imports a single file per speaker, but it is possible that you have multiple recordings per speaker (from different recording sessions, different conditions, etc.) that all need to be imported and combined. It is also possible that you have extra preprocessing to do to the data (such as combining or removing certain segments) before averaging samples in windows. You are welcome to modify the code as you see fit for your own purposes.
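If you do have several files per speaker, one possible adjustment is to read them all and stack them before averaging. The sketch below is an assumption about how that could look, not part of the original script: read_many and the sourcefile column are hypothetical names, and it assumes all files share the same columns.

```r
# Hypothetical helper (not from the original script): read several output
# files and stack them, recording which file each row came from.
read_many <- function(paths) {
  pieces <- lapply(paths, function(p) {
    d <- read.delim(p, header = TRUE, stringsAsFactors = FALSE)
    d$sourcefile <- basename(p)
    d
  })
  do.call(rbind, pieces)  # requires identical columns in every file
}

# Two made-up files standing in for two recording sessions of one speaker
session1 <- tempfile(fileext = ".txt")
session2 <- tempfile(fileext = ".txt")
writeLines(c("Label\tstrF0", "word1\t200", "word1\t201"), session1)
writeLines(c("Label\tstrF0", "word2\t180"), session2)

alldata <- read_many(c(session1, session2))
```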
You may download the code here. Lines that will need to be modified to run the code on your own computer are gathered near the beginning of the script and are indicated with comments off to the right.
If a segment is fewer than (numtocut * 2 + 5) samples long, then after trimming the beginning and end of the segment there will be fewer than 5 samples left, and the script will crash when trying to collapse fewer than 5 samples into 5 windows. Therefore, if your data include very short segments, you should either set numtocut to something lower (setting it to 0 means nothing is trimmed at all), modify the code to remove short segments before the section that performs the averaging, or modify the code to lengthen the middle 5 ms of extremely short segments by resampling (e.g. with the signal::resample() function).
Those changes are all that is needed to run the basic script on simple data. For other tasks you may need to adjust other parts of the code. Once the code is ready, you can run it by pasting it into R or by using an IDE like RStudio. The output of this script is an R data frame (called windowdata) in which each observation (row) is one window from one word from one speaker, and each column holds dependent measures or independent data (speaker, word, etc.) associated with that observation.
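For the very-short-segment problem mentioned above, one of the suggested fixes is to remove short segments before averaging. The following is a minimal sketch of that idea. The data, the column names, and the assumption that a segment is uniquely identified by Filename plus Label are all illustrative; if a label repeats within a file you would need an additional identifier.

```r
numtocut <- 2                         # samples trimmed from each end (example value)
nwindows <- 5                         # number of windows to average into
minlength <- numtocut * 2 + nwindows  # shortest segment that survives trimming

# Made-up data: one 12-sample segment and one 6-sample segment
vsdata <- data.frame(
  Filename = "spk1.mat",
  Label = rep(c("long", "short"), times = c(12, 6)),
  strF0 = seq(200, 217, by = 1),
  stringsAsFactors = FALSE
)

# Count the samples in each segment, repeated on every row of that segment
seglength <- ave(seq_len(nrow(vsdata)), vsdata$Filename, vsdata$Label,
                 FUN = length)

# Keep only segments long enough to leave at least nwindows samples
vsdata_ok <- vsdata[seglength >= minlength, ]
```

Dropping segments changes your sample, so it is worth noting how many were removed and from which items.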
By Stephen Politzer-Ahles. For questions, contact me by e-mail.