Introduction to processing live

The automated sound recognition system


A brief review of the SAP2 Recording & Live processing functionality

Automated animal-voice recognition is a principal feature of SAP2, making it possible to record and analyse the entire course of song development. Please read this section carefully and make sure you understand the theory, practice, and limitations of the automated sound recognition utilities.

We first present the computational framework in a nutshell, step by step. Steps 1-3 take place in the Recorder, whereas steps 4-9 take place in the Sound Processing Live module.

1. The SAP2 Recorder captures sounds from an audio channel into a memory ring buffer. It examines sound amplitude in real time, and if the amplitude is higher than the background noise level, a recording session starts. Recording to a temporary wave file continues until the sound amplitude has been below threshold for some time.

2. While recording, SAP2 keeps monitoring sound amplitude, and recording stops when the amplitude stays below the background threshold for a certain duration (say 1 s). The temporary wave file is then saved. This first-phase procedure is the only ‘real’ real-time component of the SAP2 Recorder.

3. The Recorder then makes its final decision whether to save the file or delete it, based on the number of peak amplitude events observed during the recording. This stage eliminates long silent intervals and very short clicks. A file that is kept is then moved to the input folder of the Sound Processing Live module.

4. The ‘Sound Processing Live’ application now takes over. Note that we separated the Recorder from the SA+ application. This is done to ensure that recording will persist no matter what happens during the later stages of analysis. Once a sound file has been forwarded for analysis, it is picked up by the SAP2 processing module within several ms. Since this off-line processing occurs almost in real time, it is called pseudo-real-time analysis.

5. The module first performs multitaper spectral analysis of the recorded sounds and extracts song features.

6. Based on the amplitude envelope and on Wiener entropy values, the sound is segmented into syllable and bout units. Additional "noise detector" filters may be used at this stage to eliminate sounds that are not species-typical.

7. A final decision whether to accept or reject the sound is made based on the number of syllables, their durations, and the bout duration.

8. If the sound is rejected, the file is deleted.

9. Otherwise, the module saves all or some of the following:
            - The wave file (under a new name)
            - A table of syllable features
            - A table of raw features (every 1ms)
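
As a rough illustration of steps 6-8, the Python sketch below segments a sound into syllables from per-frame amplitude and Wiener entropy values (one frame per ms, as in the raw features table) and then applies an accept/reject rule. It is not SAP2's actual code: the names Criteria, segment_syllables and accept_sound, and every threshold value, are assumptions made for illustration, and the whole file is treated as a single bout for simplicity.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Criteria:
    # All values are hypothetical placeholders, not SAP2 defaults.
    amp_threshold: float = 30.0      # frame counts as sound above this amplitude
    entropy_threshold: float = -2.0  # ...and below this Wiener entropy (log scale)
    min_syllable_ms: int = 20        # shorter segments are treated as clicks
    max_syllable_ms: int = 400       # longer segments are treated as noise
    min_syllables: int = 3           # minimum number of valid syllables
    min_bout_ms: int = 200           # minimum bout duration


def segment_syllables(amplitude: Sequence[float], entropy: Sequence[float],
                      c: Criteria) -> List[Tuple[int, int]]:
    """Return (onset, offset) frame indices, offset exclusive.

    Frames are assumed to be 1 ms apart, so indices are also milliseconds.
    """
    syllables: List[Tuple[int, int]] = []
    start = None
    for i, (a, e) in enumerate(zip(amplitude, entropy)):
        voiced = a > c.amp_threshold and e < c.entropy_threshold
        if voiced and start is None:
            start = i                      # syllable onset
        elif not voiced and start is not None:
            syllables.append((start, i))   # syllable offset
            start = None
    if start is not None:                  # sound ran to the end of the file
        syllables.append((start, min(len(amplitude), len(entropy))))
    return syllables


def accept_sound(syllables: List[Tuple[int, int]], c: Criteria) -> bool:
    """Accept the file if it has enough valid syllables and a long enough bout."""
    valid = [(on, off) for on, off in syllables
             if c.min_syllable_ms <= off - on <= c.max_syllable_ms]
    if len(valid) < c.min_syllables:
        return False
    # Simplification: the bout spans from the first onset to the last offset.
    bout_ms = valid[-1][1] - valid[0][0]
    return bout_ms >= c.min_bout_ms
```

A caller would delete the wave file when accept_sound returns False (step 8) and otherwise save the outputs listed in step 9.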

 


Based on the analysis in steps 5-7 (which runs about 10-20 times faster than the real-time progression of the sound), SA+ decides whether to save the sound.
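
For the Recorder side (steps 1-3), the amplitude-triggered logic can be sketched in Python as a small state machine over a stream of per-frame amplitudes. This is only a sketch under stated assumptions: the function triggered_recordings and its parameters are hypothetical, a "peak amplitude event" is approximated here as any frame above threshold, and a short deque stands in for the memory ring buffer.

```python
from collections import deque
from typing import Iterable, Iterator, List


def triggered_recordings(frames: Iterable[float], threshold: float,
                         silence_frames: int, min_peaks: int,
                         pre_frames: int = 5) -> Iterator[List[float]]:
    """Yield each recording that passes the Recorder's keep/delete decision."""
    ring = deque(maxlen=pre_frames)  # recent frames held before a trigger
    session: List[float] = []
    recording = False
    peaks = 0   # frames above threshold in the current session
    quiet = 0   # consecutive frames at or below threshold
    for amp in frames:
        if not recording:
            if amp > threshold:
                # Step 1: amplitude exceeds background, start a session
                # and include the buffered frames that preceded it.
                session = list(ring) + [amp]
                recording, peaks, quiet = True, 1, 0
                ring.clear()
            else:
                ring.append(amp)
            continue
        session.append(amp)
        if amp > threshold:
            peaks += 1
            quiet = 0
        else:
            quiet += 1
            if quiet >= silence_frames:
                # Step 2: sustained silence ends the session.
                recording = False
                # Step 3: keep only sessions with enough peak events.
                if peaks >= min_peaks:
                    yield session
    # A session still open when the input ends is discarded in this sketch.
```

With a threshold of 10, five silent frames to stop, and at least three peaks required, a single isolated loud frame is dropped as a click while a sustained sound is kept together with its buffered lead-in frames.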
  
