Peter Jansen | PhD Candidate | Cognitive Science Laboratory | SayWhen Tutorial

SayWhen 0.9a beta
SayWhen is an algorithm designed for the accurate offline measurement of spoken voice onset times for behavioral experiments in cognitive psychology. SayWhen is like a Voice Key -- the most common voice onset measurement tool in use today -- except that SayWhen is far more accurate, and is applied after the fact, to a recording of an experiment, rather than while the experiment is running.

SayWhen is currently provided in the form of a Windows executable with a (slightly clunky) graphical user interface. It's freely available for download and use for non-profit research or educational purposes.

Currently SayWhen is in beta. While under the hood the SayWhen algorithm itself performs quite well, it was important to us to develop not just a good algorithm, but also a graphical user interface to allow people to easily use it. The interface is functional but not without its bugs -- it's important to read the known issues list before using SayWhen.

about the tutorial
Thanks for choosing to use SayWhen. With any luck you should be up and running, and able to analyze the sample dataset, within a few minutes. While SayWhen's interface is fairly standard and should be very familiar to Windows users, there are a few procedures and buttons we'll have to introduce before we can make full use of SayWhen. A good familiarity with the Windows environment -- the ability to browse for files, unzip them, load programs, and so on -- is assumed.

This tutorial is structured as though one were running a real experiment, though it makes use of sample data. That sample data can be found here.

Step 1: run your experiment!
Of course first you'll need to run the experiment. The sample dataset includes a number of trials from an experiment where participants are shown a target word and asked to say the first word that comes to mind, as quickly as possible. This experiment was run using Presentation, although your choice of software isn't critical. A few considerations must be met to use SayWhen:
  • A quiet experimental room containing a computer and an audio recording setup including a microphone. An external CD recorder may be used to record the audio, or the computer itself may be -- this isn't critical. The recording must be stereo (see trial onset marker note below).
  • A trial onset marker in the form of a special short .WAV file must be played at the start of each trial. This .WAV file is how SayWhen accurately detects the beginning of the trial. This file is included with SayWhen, and must be played back on only a single channel (left OR right) of the stereo recording. We're currently working to streamline this and accurately detect a different type of trial onset marker that wouldn't require a stereo recording. (The issue here is using a trial onset marker we know we can accurately detect nearly 100% of the time, and that one would almost never encounter in speech or noise).
  • One long recording of one participant's experiment session. SayWhen currently takes as input one big .WAV file that's a recording of the entire experiment, one trial after another, delimited by the trial onset markers. This method is especially useful for those using CD recorders, but not for those whose experiment presentation software records a separate .WAV file for each trial. It has come to our attention that a lot of folks use software that records in this way, so we plan to write a simple utility that will merge many separate .WAV files into one large .WAV file, delimiting them with the trial onset marker. In this case, one wouldn't have to play the trial onset marker during their experiment. We're currently working on this -- with any luck the utility should be out soon.
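The planned merge utility described above can be sketched in a few lines. This is only an illustration of the idea, not SayWhen's actual code: it concatenates per-trial mono recordings into one stereo WAV, inserting a short 'blip' on the left channel only at the start of each trial. The marker shape, sample rate, and function names here are all assumptions for the sake of the sketch.

```python
import struct
import wave

RATE = 44100  # assumed sample rate for this sketch

def marker(n=441, amp=20000):
    # a brief square-wave 'blip' standing in for SayWhen's marker .WAV file
    return [amp if i % 100 < 50 else -amp for i in range(n)]

def merge(trial_samples, out_path):
    """trial_samples: list of lists of 16-bit mono samples, one per trial.
    Writes a stereo WAV with the marker on the left channel only, and
    returns the total number of frames written."""
    left, right = [], []
    for trial in trial_samples:
        blip = marker()
        left += blip + trial               # marker only on the left channel
        right += [0] * len(blip) + trial   # right channel stays silent during it
    with wave.open(out_path, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)                  # 16-bit samples
        w.setframerate(RATE)
        w.writeframes(b"".join(struct.pack("<hh", l, r)
                               for l, r in zip(left, right)))
    return len(left)
```

The key detail is the one carried over from the considerations above: the marker must appear on only a single channel, so SayWhen can tell it apart from speech, which appears on both.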
The sample dataset was recorded using the above considerations. Let's have a quick look (using Audacity):

sample dataset waveform. click to enlarge

Here we see eight trials, with trial 3 highlighted. The 'blip' before the speech is the trial onset marker -- notice how it's only present on one channel (the top), and not the other. SayWhen's speech onset measurements are made relative to this marker.
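Since the marker appears on only one channel while speech and room noise appear on both, one simple way to locate it is to scan for spans where the two channels differ sharply. The sketch below illustrates that idea only -- it is not SayWhen's actual detection algorithm, and the threshold and gap values are assumptions.

```python
def find_marker_starts(left, right, thresh=5000, min_gap=1000):
    """Return sample indices where the stereo channels suddenly diverge,
    which is where a single-channel trial onset marker begins.
    min_gap suppresses duplicate detections within the same marker."""
    starts, last = [], -min_gap
    for i, (l, r) in enumerate(zip(left, right)):
        if abs(l - r) > thresh and i - last >= min_gap:
            starts.append(i)
            last = i
    return starts
```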

Step 2: prepare your WAV file
This is a quick but necessary step. Load up your favorite WAV editing program and crop your WAV file so that it only contains trial data and doesn't contain any recordings of the experimenter giving the participant instructions, or other such things. The first trial onset marker should be about 1 second into the recording (not too soon), and the last spoken response should have a second or two between it and the end of the file. This step has already been completed in the sample dataset.
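If you'd rather script this step than do it in a WAV editor, a crop can be done with Python's standard wave module. This is a hypothetical convenience, not part of SayWhen; the frame offsets are values you would read off in your editor (about 1 second before the first marker, a second or two after the last response).

```python
import wave

def crop_wav(src, dst, start_frame, end_frame):
    """Copy frames [start_frame, end_frame) of src into a new WAV file dst,
    preserving the channel count, sample width, and sample rate."""
    with wave.open(src, "rb") as r:
        params = r.getparams()
        r.setpos(start_frame)
        frames = r.readframes(end_frame - start_frame)
    with wave.open(dst, "wb") as w:
        w.setparams(params)   # the header's frame count is fixed up on close
        w.writeframes(frames)
```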

Step 3: load the recording in SayWhen
Open SayWhen. Select File > Open, locate and select the sample dataset saywhen_sampledata1.wav, and click Open.

After you click 'Open', SayWhen will read in the data file, then perform its signal processing algorithms to locate the trial onset markers and determine speech onsets. The speed of this process will depend greatly on both the speed of the computer you're using for the analysis and the size of the dataset. It may take several minutes to load a very large dataset, so please be patient. The sample dataset we're using only contains a few trials, so it should load up in a couple of seconds. While SayWhen beta is loading and processing a dataset, it may appear unresponsive. Don't worry -- the screen will automatically update when it's done.

After it's loaded, you should see the following screen with the waveform in the middle, and a set of buttons along the top. Let's have a look:

saywhen after loading the sample dataset. click to enlarge

Here we're focused on trial one's onset marker, and ready to start exploring SayWhen's onset marking.

Step 4: becoming familiar with the interface
Let's have a quick introduction to the SayWhen Interface:

an illustration of the saywhen interface.

This illustration contains all the main components of the SayWhen interface. In the main view area we see the waveform of the recording, and along the top there are a number of buttons. We'll also use the mouse for some input -- let's take a quick look at each.

The Menu bar contains the File, View, and Help menus. Of these, File is the most useful. Its 'Open' option allows us to open WAV files of our recordings, and its 'Save' option allows us to save text files containing speech onset data.

The Toolbar or Button Bar contains a set of buttons we'll often use. From left to right, these are:
  • Open. This button allows one to open a WAV file containing speech data and trial onset markers.
  • About. Contains information about the version of SayWhen you're currently using.
  • Previous Trial / Next Trial. These two buttons allow one to skip to the trial onset marker of the previous or next trial, respectively.
  • Move Trial Onset Marker. Allows a poorly detected trial onset marker to be moved using the mouse. In our tests this has occurred extremely infrequently, but the button is included just in case.
  • Skip to Speech Onset Marker. Skips to the speech onset marker.
  • Delete Current Trial. Deletes the current trial. Used in the infrequent case where a trial onset marker is incorrectly identified.
  • Add Trial Onset Marker. Add a new trial onset marker using the mouse. Used in the infrequent case that a trial onset marker is not identified.
  • Algorithm Setup and Settings. Contains various parameters associated with the SayWhen algorithm.
  • Skip to Previous / Next Flagged Trial. These two buttons allow one to skip to the speech onset marker of the previous or next trial that was flagged by the SayWhen algorithm as potentially having been detected inaccurately and requiring human attention.

The Viewport contains the trial waveform, onset markers, and trial information.
  • Purple and Red bars represent the trial onset marker and speech onset marker, respectively.
  • Trial Information contains the current trial number, its time index in the WAV file, the detected reaction time (RT), and the distance to the next trial onset marker. This distance is a useful tool for determining if a trial marker was missed -- if the distance to the next trial is much larger than normal, it's possible there's a missed trial marker in this trial.
  • Sliders let one scroll through the waveform, which is likely very long and displayed at a fairly high zoom level. You'll likely use the 'skip to next trial' / 'skip to speech onset marker' buttons more often than manually scrolling through the trial.
  • The Status Bar at the bottom of the viewport normally reads 'Ready', but contains helpful descriptions if you hover the mouse over some of the buttons.
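The 'distance to next trial' sanity check described in the Trial Information bullet above can be expressed as a simple heuristic: if the gap between two consecutive trial onset markers is much larger than the typical gap, a marker in between may have been missed. The sketch below is an illustration of that reasoning, not SayWhen's code, and the 1.8× factor is an assumption.

```python
def flag_long_gaps(marker_times_ms, factor=1.8):
    """Return indices of inter-marker gaps much larger than the median gap,
    each of which may hide a missed trial onset marker."""
    gaps = [b - a for a, b in zip(marker_times_ms, marker_times_ms[1:])]
    median = sorted(gaps)[len(gaps) // 2]
    return [i for i, g in enumerate(gaps) if g > factor * median]
```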

Finally, the Mouse is likely how you'll be doing most of the navigation through SayWhen.
  • Left Click activates buttons in the button bar, or changes the Speech Onset Marker when used in the viewport. If you've just selected the 'Move Trial Onset Marker' or 'Add Trial Onset Marker' buttons, the next click in the viewport will be used to define the new location for the trial onset marker.
  • a Right Click in the viewport will play a brief section of the WAV recording from where you clicked. It will also overlay a small portion of the display with some spectral information and information about average amplitude -- you can largely ignore this, but it's useful for calibration.

Step 5: browsing through the trials
With that, you're now prepared to start browsing through the trials. Typically one can either inspect all trials, a flagged subset of them, or none of them. The Previous/Next Trial buttons, as well as the Previous/Next Flagged Trial buttons, allow one to explore the data using one's preferred method. If you've skipped to a trial's purple trial onset marker and would like to skip to that same trial's red speech onset marker, remember this can be done quickly using the 'Skip to Speech Onset Marker' button.

While you'll likely develop your own pattern and heuristics for verifying trials, a typical pattern might include:
  • Verify the (purple) Trial Onset Marker was detected correctly. It's important to verify that all trial onset markers have been correctly detected -- if one was missed or an extra one added, your trial indices will be offset when you export your data.
  • Skip to the (red) Speech Onset Marker.
  • Right-click the red line to play a short section starting from the speech onset marker. Does it sound like the beginning of a word?
  • Slide the View left and right. Does it look like there are any phonemes that were missed early on? Are there any very low-duration /f/ or /s/ phonemes present that were overlooked? Does the waveform continue to the right quite a bit from the red line?
  • If the speech onset marker needs to be moved, determine its best new location, and left click. It's that simple!
  • Once you're satisfied, move onto the next trial.
The sample data includes a couple of interesting trials to help you recognize some issues. Go through the sample data on your own and see what you can come up with.

Step 6: taking up the sample data
In the sample dataset, SayWhen should detect each trial onset correctly. The reaction times and other information for each trial are as follows:

  • Trial 1 (bee). SayWhen RT = 764 msec
    This looks to have been automatically tagged well.

  • Trial 2 (fire). SayWhen RT = 359 msec
    This trial has a very low amplitude onset that was missed.

  • Trial 3 (doctor). SayWhen RT = 933 msec
    This looks to have been automatically tagged well.

  • Trial 4 (bread). SayWhen RT = 812 msec
    This looks to have been automatically tagged well.

  • Trial 5 (east). SayWhen RT = 597 msec
    This looks to have been automatically tagged well.

  • Trial 6 (bad). SayWhen RT = 594 msec
    This looks to have been automatically tagged well.

  • Trial 7 (lose). SayWhen RT = 670 msec
    This looks to have been automatically tagged well.

  • Trial 8 (s..uhh..end [false start]). SayWhen RT = 664 msec
    This is a spoiled trial -- it contains a false start, and we likely wouldn't include it in our experimental analysis. Even so, SayWhen seems to have caught part of the early phoneme of the false start.

  • Trial 9 (nuts). SayWhen RT = 686 msec
    This looks to have been automatically tagged well.

  • Trial 10 (right). SayWhen RT = 585 msec
    This is an interesting trial -- SayWhen's definitely caught the earliest signal that isn't noise. It's a very low-amplitude signal, and continues all the way to a more recognizable onset. It's not clear if this is the onset of a low-amplitude start to the /r/, or if it's another noise -- you decide.

  • Trial 11 (earth). SayWhen RT = 810 msec
    This looks to have been automatically tagged well.

  • Trial 12 (square). SayWhen RT = 722 msec
    This is also an interesting low-amplitude onset. SayWhen catches the earliest onset of the /s/ within about 2 msec of what a human would likely tag.

  • Trial 13 (fruit). SayWhen RT = 673 msec
    A final low-amplitude onset. This trial is particularly interesting in that SayWhen misses the initial phoneme, and tags near the onset of the /r/. What's particularly interesting about this phoneme is that its amplitude over most of its duration is very near the noise level of the recording -- it's difficult (even for a human) to tell there's a phoneme there without listening to the waveform. Don't worry! Let's continue on to the next step in our tutorial (which we've intentionally skipped) -- calibration!

Step 7: calibration
So far SayWhen's done fairly well on our sample data set -- it's tagged most of the onsets very well, and flagged several for our attention. Most importantly, though, there are a couple of trials it hasn't done so well on, so it's vital that we check through and verify its work. As you've probably noticed, this process becomes very quick after you've worked through a couple of trials -- a few clicks over a few seconds and an attention to detail is often all that's required to verify a trial.

Something that we haven't yet discussed is the idea of calibration. Fine-tuning SayWhen's parameters to your recording has the potential to significantly increase its performance and detection efficiency (but the opposite is also true, so we must take care when tuning the parameters). Tuning and calibration is also likely to be a must when you try SayWhen on your own recordings -- you'll likely have to calibrate SayWhen for the noise level of your recording environment with a particular recording setup. The SayWhen setup dialog can be accessed via the Algorithm Setup and Settings button, and is pictured below.

the saywhen setup dialog.

There are quite a few parameters here, but don't worry -- in normal use you'll likely not have to tinker with them unless you're using SayWhen for a different language, or a dramatically different recording environment with a poor signal-to-noise ratio or very high gain on the recording. The algorithm and these parameters are all described in the Behavior Research Methods paper, so we don't go into detail for most of them. There are a few common ones that we'll now quickly investigate, centered around problem trial tagging and the noise threshold. Most of these parameters are either durations, specified in samples or milliseconds, or amplitudes, specified in the arbitrary units used by the WAV recording.

You probably noticed that a number of trials in the sample dataset are flagged for having speech onsets faster than a certain time, or having amplitudes below a certain threshold. The Problem Trial Tagging section lets us tune these.
  • the Quiet Phoneme Threshold lets us alter the threshold for the low-amplitude onset phoneme flagging. Decreasing this value means that ever quieter phonemes are flagged as potentially requiring human attention, at the expense of more time required for a human to check them over. Increasing this value means that fewer phonemes will be tagged, at the expense of potentially missing incorrectly detected onsets.

  • the Minimum/maximum Trial Reaction Time parameters are highly dependent upon each experiment, and can help detect spoiled trials or environmental noise. Several of the trials in our sample dataset have been categorized as too fast -- we'll likely have to tune these values for the sample dataset's experiment.
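The minimum/maximum reaction time flags above amount to a simple range check. The sketch below illustrates the idea -- it is not SayWhen's internals, and the default window here is the 500/1500 msec pair this tutorial suggests for the sample experiment.

```python
def flag_rts(rts_ms, rt_min=500, rt_max=1500):
    """Return indices of trials whose reaction time falls outside the
    [rt_min, rt_max] window, flagging them for human inspection."""
    return [i for i, rt in enumerate(rts_ms) if not (rt_min <= rt <= rt_max)]
```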

Of the other parameters, the Check-forward-for-silence heuristic is probably of the most interest to our current needs.
  • the Noise Threshold helps reject sudden spikes due to environmental noise by checking slightly ahead of any detected candidate signals for other signals. It does this by looking at the average amplitude ahead of the signal, and checking to see if it's above this threshold. The hitch is that some very quiet phonemes may actually be detected by SayWhen, but categorized as noise by this heuristic and passed over. This is likely what happened for both of the missed trials in our sample dataset.
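The check-forward-for-silence idea described above can be sketched as follows: a candidate onset is kept only if the average absolute amplitude of a short window just ahead of it stays above the noise threshold; otherwise it's treated as an isolated noise spike. This is an illustration of the heuristic only -- the window length is an assumption, and only the 180 default threshold comes from this tutorial.

```python
def passes_forward_check(samples, onset, window=200, noise_threshold=180):
    """Return True if the average absolute amplitude of the window of
    samples just ahead of the candidate onset exceeds the noise threshold
    (i.e. a sustained signal follows, so this is likely real speech)."""
    ahead = samples[onset:onset + window]
    avg = sum(abs(s) for s in ahead) / max(len(ahead), 1)
    return avg >= noise_threshold
```

This also shows why a very quiet phoneme can be passed over: if the signal right after the onset hovers near the recording's noise floor, the window average falls below the threshold even though a phoneme is there.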

Time to calibrate. The values you use are entirely up to you, and what you feel best suits the situation. Remember that right-clicking the waveform will display the average amplitude for a short region ahead of the point you clicked on, and this can be very helpful when calibrating. Whatever values you choose, clicking on Rescan will erase all the current onsets, then scan through the entire WAVE file again using the new values.
  • In this case we can probably alter the minimum/maximum trial reaction time to values that better suit this experiment -- say 500 msec / 1500 msec.
  • the noise threshold, as we discussed earlier, is critical -- so it's important we verify whatever changes we make work correctly. I tend to prefer conservative values, and don't mind going through to check each trial -- it's very quick using SayWhen.

    Let's try lowering the noise threshold from 180 to 170, and hitting 'Rescan'. How did that change the results? This doesn't have much of an effect.

    Let's now try lowering the noise threshold down to 100, to see what will happen. Hit 'Rescan'. How did that change the results? Trial 13 has now been caught, and the others are unchanged.

    Just for the fun of it, try increasing the threshold to 2000, and rescan. What happens? Now we categorize many of the initial phonemes (and in some cases, the entire word) as noise, and skip right over them. Don't forget to set the threshold back to a more useful value, and rescan before continuing.

Step 8: saving onset data
After you're done verifying and correcting the onsets of your data, you'll likely want to examine these onsets in another program to continue your experimental analysis. SayWhen saves speech onset data as comma-delimited text files (CSV), which can be easily read by many data analysis packages and spreadsheets, including Excel and OpenOffice.
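Because the export is plain CSV, it's also easy to pull into a script. The sketch below uses Python's standard csv module; the column names in the example data are assumptions -- check the header row of your own export before relying on them.

```python
import csv
import io

def load_onsets(csv_text):
    """Parse SayWhen's CSV export into a list of per-trial dictionaries
    keyed by the file's own header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))
```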

To save, simply select File > Save As..., and once you've completed the process, try to load it up. With any luck, you'll see something like this:

the sample dataset saved as a CSV and loaded in openoffice.

Congratulations! With any luck, you've now successfully used SayWhen to analyze the sample dataset!