One of your most important decisions when analyzing vocal sounds is choosing methods and setting parameters for distinguishing vocalization from background noise. At the macro scale, this allows you to detect bouts of vocalization; at shorter time scales, it allows you to segment the sound into syllables (vocal events separated by short stops) and silences.
Even though some analysis can be done on the continuous signal in a file, once vocal events are identified and segmented much more becomes possible, e.g. identifying units of vocalization, classifying sounds, and comparing them by similarity measurements and clustering methods.
In this chapter we focus on non-real-time analysis, but similar approaches to identifying vocal sounds are also used in real-time analysis during recording. In real time, however, we usually need to take intermediate steps to give way to higher-priority processes (the recording itself): the Sound Analysis Recorder first makes a crude decision about which sounds should be temporarily saved and, a few seconds later, the live-analysis engine performs proper segmentation and decides which sound files should be processed and permanently saved to specific folders.
In SAP2, detection of animal sound is based primarily on the amplitude envelope. However, certain spectral filters can be set to reject noise or to band-limit the amplitude detection. We offer the following approaches:
- Use a fixed amplitude threshold to segment sounds
- Use a dynamic (adaptive) amplitude threshold to segment sounds
- Write your own query for custom segmentation based on various features
- Export raw feature vectors to Matlab and design your own algorithm there
In this chapter we cover only approaches 1 and 2. Approach 3 is documented in the batch chapter, and approach 4 in the chapter on exporting data.
Using a fixed amplitude threshold to segment sounds - One of the simplest and most widely used methods for segmenting sounds is a fixed amplitude threshold:
Open Explore & Score and ensure that “fine segmentation” is turned off (see Fig 1 below)
Fig 1: Fine Segmentation "off"
Open your sound file or use Example1 (found in the SAP directory) and then move the amplitude threshold slider (the one closest to the frequency axis) up to about 43 dB:
Fig 2: Amplitude Threshold Slider
The yellow curve shows the amplitude, and the straight yellow line is the threshold. Amplitude is shown only when it is above the threshold. Syllable units are underlined in light blue below them, and bouts are underlined in red.
Note the segmentation outlines at the bottom of the sounds:
Fig 3: Segmentation outlines
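The walkthrough above is entirely slider-driven, but the underlying idea is simple enough to sketch in code. The following is a minimal illustration (not SAP2's internal implementation), assuming a mono sample array; the gap values for joining frames into syllables and syllables into bouts are hypothetical, and the dB scale is relative to an arbitrary reference, so the 43 dB value only mirrors the slider setting above.

```python
import numpy as np

def segment_fixed(samples, sr, frame_ms=1.0, threshold_db=43.0,
                  syllable_gap_ms=10.0, bout_gap_ms=200.0):
    """Return (syllables, bouts) as lists of (start_s, end_s) in seconds."""
    frame = max(1, int(sr * frame_ms / 1000))
    n_frames = len(samples) // frame
    x = np.asarray(samples, dtype=float)[:n_frames * frame].reshape(n_frames, frame)
    # Amplitude envelope in dB per frame (arbitrary reference, unlike SAP2's calibrated scale).
    env_db = 10 * np.log10(np.mean(x ** 2, axis=1) + 1e-12)
    voiced = env_db > threshold_db

    def group(mask, max_gap_frames):
        # Merge voiced frames separated by gaps shorter than max_gap_frames.
        events, start, last = [], None, None
        for i, v in enumerate(mask):
            if not v:
                continue
            if start is None:
                start, last = i, i
            elif i - last > max_gap_frames:
                events.append((start, last + 1))
                start, last = i, i
            else:
                last = i
        if start is not None:
            events.append((start, last + 1))
        return [(s * frame / sr, e * frame / sr) for s, e in events]

    syllables = group(voiced, max(1, int(syllable_gap_ms / frame_ms)))
    bouts = group(voiced, max(1, int(bout_gap_ms / frame_ms)))
    return syllables, bouts
```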
Additional constraints can be placed on segmentation so as to reject some sources of noise. Here is an example:
Set the “advance window” slider to 2 ms, and set the amplitude threshold to 30 dB. Open example3:
Fig 4: Frequency of syllables
As shown, the last three ‘syllables’ are actually low-frequency cage noise. Move the mouse to just above the noise level while observing the frequency value in the Features at Pointer panel (see red arrow). As shown, most of the noise is below 1500 Hz, whereas most of the power of the syllables is above that range.
We are not going to filter out those low frequencies. Instead, we will use this frequency boundary to distinguish between cage noise and song syllables: click the “Options & Settings” tab, turn the high-pass noise detector on, and change the frequency to 1500 Hz:
Fig 5: High Pass - Noise Detector
Go back to sound 1 and click update display below the sonogram image:
Fig 6: Noise no longer detected
Note that most of the noise is no longer detected as vocal sound:
Fig 7: Noise isolated from vocal sounds
This filter does not affect any analysis of the remaining vocal sounds. This is because we set the noise detector as an additional criterion (on top of the amplitude threshold) that eliminates ‘syllables’ where more than 90% of the energy is in the noise range.
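Conceptually, the criterion is easy to express. Here is a minimal sketch (not SAP2's code) of the noise detector test, assuming a candidate syllable has already been cut out as a sample array; the 1500 Hz cutoff and the 90% energy fraction mirror the settings above.

```python
import numpy as np

def is_cage_noise(syllable, sr, cutoff_hz=1500.0, energy_frac=0.9):
    """Reject a candidate syllable when more than 90% of its energy lies below the cutoff."""
    spectrum = np.abs(np.fft.rfft(np.asarray(syllable, dtype=float))) ** 2
    freqs = np.fft.rfftfreq(len(syllable), d=1.0 / sr)
    low_energy = spectrum[freqs < cutoff_hz].sum()
    return low_energy > energy_frac * spectrum.sum()
```

Because the test only removes whole candidate syllables and does not filter the signal, the feature values of the surviving syllables are left unchanged.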
There are several other controls that affect segmentation indirectly. These include the FFT window, the advance window, the band-pass filters on feature calculation, etc.
Here is an example of using the band-pass filter: turn the noise detector off and update the display so that the noise is once again detected as vocal sound. Then move the right sliders as shown:
Fig 8: Band-pass filter sliders
Now click update display:
Fig 9: Noise isolated from vocal sounds
The outlines under the noise below the detection band should disappear. Note, however, that now all features for all syllables are computed only within the band-pass filter that you set. Namely, frequencies outside the band are ignored across the board.
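In contrast to the noise detector, the band-pass setting changes what the features see. A minimal sketch, assuming a spectrogram array spec of shape (frequency bins × frames) and a matching freqs vector in Hz; the band edges here are hypothetical values, and a per-frame mean frequency stands in for SAP2's feature set.

```python
import numpy as np

def mean_frequency_in_band(spec, freqs, f_low=1500.0, f_high=8000.0):
    """Example feature computed only within the band: energy outside the band is zeroed first."""
    in_band = (freqs >= f_low) & (freqs <= f_high)
    spec_bp = np.where(in_band[:, None], spec, 0.0)   # zero out-of-band bins for every frame
    power = spec_bp.sum(axis=0) + 1e-12
    return (freqs[:, None] * spec_bp).sum(axis=0) / power
```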
Segmentation by a dynamic amplitude threshold - One limitation of a static amplitude threshold is that when an animal vocalizes, the “baseline” power often changes as the vocalization becomes more intense. For example, open the file “thrush nightingale example 1” with a 3 ms advance window and a 0 amplitude threshold. Let’s observe the amplitude envelope of this nightingale song sonogram:
Fig 10: Amplitude envelope of the nightingale song
Let’s also look at the spectral derivatives, with a fixed threshold indicated by the black line:
Fig 11: Spectral derivatives with a fixed threshold (black line)
It is easy to see that no fixed threshold can work in this case (see arrows). To address this, turn “fine segmentation” on. A new slider, called Diff, should appear between the amplitude threshold slider and the display contrast slider. Set it to zero (all the way up). In the fine segmentation box (bottom left of the SAP2 window), set the coarse filter to 500 and the fine filter to 0, then click update display and click filters:
Fig 12: White curve - coarse amplitude filter, black line - fine filter, and segmentation
The white curve shows the coarse amplitude filter, which serves as the dynamic (adaptive) threshold. The black line is the fine filter, which in this case is the same as the amplitude. The segmentation is set by the gap between them: diff=0 means that we segment when the black line touches the white line, i.e. vocal sound is detected whenever the fine filter is higher than the coarse filter.
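The relationship between the coarse filter, the fine filter, and the Diff slider can be sketched as follows. This is an assumption-laden illustration, not SAP2's exact filters: both filters are modeled as moving averages of the per-frame dB envelope (like the env_db computed in the earlier sketch), and the coarse/fine values (500 and 0 above) are treated as smoothing-window lengths in frames.

```python
import numpy as np

def smooth(x, win_frames):
    """Simple moving average; a window of 1 or less returns the input unchanged."""
    x = np.asarray(x, dtype=float)
    if win_frames <= 1:
        return x
    kernel = np.ones(win_frames) / win_frames
    return np.convolve(x, kernel, mode="same")

def adaptive_voiced(env_db, coarse_win=500, fine_win=0, diff_db=0.0):
    """Detect vocal sound where the fine filter exceeds the coarse one by diff_db."""
    coarse = smooth(env_db, coarse_win)   # white curve: the adaptive baseline
    fine = smooth(env_db, fine_win)       # black line: equals env_db when fine_win is 0
    return fine > coarse + diff_db
```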
We can see that all syllables are now detected and segmented, but there are two problems:
- The Diff detects low-amplitude sounds, but it also falsely detects small changes in background noise as sounds (look at the beginning of the file).
- Segmentation into syllables is often too sensitive and unreliable because each small amplitude modulation may cause a segmentation.
A simple way of avoiding false detections during silences is to impose a minimal fixed amplitude threshold on top of the filters. To do this, set the dB threshold to 24:
Fig 13: No more silence is detected
As shown, silences are no longer detected as sounds.
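Continuing the sketch above, the fixed floor is simply an extra condition combined with the adaptive criterion; the 24 dB value mirrors the slider, on whatever dB scale the envelope uses.

```python
# Hypothetical usage, reusing env_db and adaptive_voiced from the sketch above.
voiced = adaptive_voiced(env_db, coarse_win=500, fine_win=0, diff_db=0.0) & (env_db > 24.0)
```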
To decrease the sensitivity of segmentation we can use two methods. One is to make the Diff more liberal, allowing the detection of sounds even when the fine filter is slightly below the coarse one. Setting the diff to -2.5 gives this result:
Fig 14: Setting the diff filter to -2.5
An often better approach is to make the fine filter a bit coarser. For example, setting the fine filter to 5, keeping the coarse filter at 500, and setting the diff slider to -1.5 gives this segmentation:
Fig 15: Sound with the fine filter set to a coarser setting
As shown, we have achieved a rather reliable segmentation despite the wide range of amplitudes in this song.
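In terms of the earlier sketch, the two de-sensitizing options correspond to a more negative diff_db or a wider fine window; the values below are the slider settings above, under the same unit assumptions.

```python
# Option 1: liberal Diff.  Option 2: coarser fine filter with a mildly negative Diff.
voiced_liberal = adaptive_voiced(env_db, coarse_win=500, fine_win=0, diff_db=-2.5) & (env_db > 24.0)
voiced_coarser = adaptive_voiced(env_db, coarse_win=500, fine_win=5, diff_db=-1.5) & (env_db > 24.0)
```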