The automated sound recognition system
A brief review of the SAP2 Recording & Live processing functionality
Automated animal-voice recognition is a principal feature of SAP2, making it possible to
record and analyse an entire song development. Please read this section carefully and
make sure you understand the theory, practice, and limitations of the automated sound
recognition utilities.
We first present the computational framework in a nutshell, step by step. Steps 1-3 take
place in the Recorder, whereas steps 4-9 take place in the Sound Processing Live module.
1. The SAP2 Recorder captures sounds from an audio channel into a memory ring buffer.
It examines sound amplitude in real time, and if the amplitude is higher than the background
noise level, a recording session starts. Recording to a temporary wave file continues until
the sound amplitude has stayed below threshold for some time.
2. While recording, SAP2 keeps monitoring sound amplitude, and recording stops when the
amplitude stays below the background threshold for a certain duration (say 1 s). The
temporary wave file is saved immediately afterwards. This first-phase procedure is the
only ‘real’ real-time component of the SAP2 Recorder.
3. The Recorder then makes its final decision whether to save the file or delete it, based
on the number of peak amplitude events observed during the recording. This stage eliminates
long silence intervals and very short clicks. A file that is kept is then moved to the input
folder of the Sound Processing Live module.
4. The ‘Sound Processing Live’ application now takes over. Note that we separated the
Recorder from the SA+ application. This is done to ensure that recording will persist no
matter what happens during the later stages of analysis. Once a sound file has been
forwarded for analysis, it is picked up by the SAP2 processing module within several ms.
Since this off-line processing occurs almost in real time, it is called pseudo-real-time
analysis.
5. The module first performs multitaper spectral analysis of the recorded sounds and
extracts song features.
6. Based on the amplitude envelope and on Wiener entropy values, the sound is segmented
into syllable and bout units. Additional "noise detector" filters might be used at
that stage to eliminate sounds that are not species typical.
7. A final decision whether to accept or reject the sound is made based on the number of
syllables, their duration, and the bout duration.
8. If the sound is rejected, the file is deleted.
9. Otherwise, the module saves all or some of the following:
- The wave file (under a new name)
- A table of syllable features
- A table of raw features (every 1ms)
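The Recorder logic of steps 1-3 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the SAP2 code: the function name, the parameters (pre_roll, max_quiet, min_peaks) and their defaults are hypothetical, and each value in the input stands for the amplitude of one short audio frame.

```python
from collections import deque


def detect_recordings(amplitudes, threshold, pre_roll=3, max_quiet=5, min_peaks=4):
    """Split a stream of per-frame amplitudes into the sessions worth keeping.

    All names and defaults here are hypothetical, not SAP2 settings.
    """
    ring = deque(maxlen=pre_roll)  # memory ring buffer of the most recent frames
    session = None                 # frames of the session in progress, if any
    quiet = 0                      # consecutive below-threshold frames
    peaks = 0                      # above-threshold frames seen in this session
    kept = []

    for amp in amplitudes:
        loud = amp > threshold
        if session is None:
            if loud:
                # Step 1: amplitude exceeds background, so a session starts.
                # The frames held in the ring buffer are prepended.
                session = list(ring) + [amp]
                peaks, quiet = 1, 0
            else:
                ring.append(amp)
            continue

        session.append(amp)
        if loud:
            peaks += 1
            quiet = 0
        else:
            quiet += 1
            if quiet >= max_quiet:
                # Step 2: amplitude stayed below threshold long enough, stop.
                # Step 3: keep the session only if it has enough peak events.
                if peaks >= min_peaks:
                    kept.append(session)
                session = None
                ring.clear()

    # Apply the same decision to a session still open when the stream ends.
    if session is not None and peaks >= min_peaks:
        kept.append(session)
    return kept
```

With a threshold of 1, a burst of several loud frames followed by silence is kept as one session, whereas a single loud frame (a click) starts a session that is later discarded.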
Based on this analysis (which runs about 10-20 times faster than the real-time progression
of the sound), SA+ decides whether to save the sound or not.
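The segmentation and accept/reject decision of steps 6-7 can be illustrated with a simplified sketch. It makes several assumptions that are not taken from SAP2: segmentation uses an amplitude threshold only (the real module also uses Wiener entropy), the whole file is treated as a single bout, the input holds one amplitude value per ms, and the function names and threshold defaults are hypothetical.

```python
def segment_syllables(amplitude, threshold):
    """Return (start_ms, end_ms) pairs for runs where amplitude > threshold.

    Assumes one amplitude value per ms, so indices double as times in ms.
    """
    syllables = []
    start = None
    for i, value in enumerate(amplitude):
        if value > threshold and start is None:
            start = i                      # syllable onset
        elif value <= threshold and start is not None:
            syllables.append((start, i))   # syllable offset
            start = None
    if start is not None:
        # The sound ended while a syllable was still in progress.
        syllables.append((start, len(amplitude)))
    return syllables


def accept_sound(syllables, min_syllables=3, min_syllable_ms=20, min_bout_ms=200):
    """Decide whether to keep a sound, given its syllable boundaries.

    The three limits are hypothetical defaults, not SAP2 values.
    """
    # Ignore segments too short to count as syllables (e.g. clicks).
    long_enough = [(s, e) for s, e in syllables if e - s >= min_syllable_ms]
    if len(long_enough) < min_syllables:
        return False
    # Treat everything from the first onset to the last offset as one bout.
    bout_ms = long_enough[-1][1] - long_enough[0][0]
    return bout_ms >= min_bout_ms
```

For example, a sound with four 60 ms syllables separated by 50 ms gaps would be accepted under these defaults, while an isolated 5 ms click would be rejected.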