Sound Analysis Pro



Sound Analysis Pro has its own site now:


Click here to download Sound Analysis Pro (it's free)

Click here to see publications that used SAP


Sound Analysis Pro 2011 (SAP2011) is a software and hardware system designed specifically to manage the acquisition and analysis of animal vocalization. SA+ eliminates much of the effort involved in maintaining long-term vocal learning experiments, allowing automated acquisition and analysis of large amounts of sound data, scheduled training, and online monitoring of the behavior of several animals simultaneously. SAP2011 is open-code freeware with several options and extensions that can be implemented. The data acquisition component continuously monitors sounds and performs online sound analysis to recognize and record sounds of a specific category (e.g., birdsongs). The training component performs fully automated operant training with song playbacks and provides an online summary of vocal changes. By integrating four engines (recording control, training control, sound analysis and database), SAP2011 can record and analyze all (and little but) the relevant data during a prolonged period (e.g., throughout the vocal development of a bird). SA+ integrates online and offline analysis methods on a large amount of sound data, handling millions of sounds and summarizing them into simple graphs, histograms and movie clips. Sound similarity measurements are now faster and more reliable, with a few alternative methods to fit specific tasks. Finally, although we provide no formal technical support or liability, we made SA+ open-code, public domain freeware in order to encourage users to actively participate in this project and eventually develop standards that will enhance cross-lab studies. Hence, users are strongly encouraged to contact us when encountering problems, and we will make every effort to respond and solve the problem as best we can.


Feature summary of SAP2011:

SAP2011 can be installed on any Windows computer (XP to Windows 7). Other operating systems are not supported. Hardware requirements depend on the application (e.g., you will need a multi-channel sound card to perform multi-channel recording; see details in the installation section). Here and in the rest of this manual, the SAP2011 features are presented in the order of data flow, namely, from the recording and training setup, through real-time and nearly online sound processing, to offline analysis, similarity measurements and descriptive models of the sound data. Each of the modules described below is self-contained and can be used regardless of (and in parallel with) the other modules. Behind the scenes, however, modules interact with each other and share common resources. In most cases, running a few instances of SA+ in parallel will not cause problems.

1. Sound Analysis Recorder. The recorder performs multi-channel triggered recording of sound, monitoring of other behaviors (such as pecking on a key, or movement) and training with sound playbacks. It manages four input and four output audio channels simultaneously (and can be extended to handle an arbitrary number of channels*). It analyzes the sound signal of each channel in nearly real-time and performs a first-pass filter on the sound data to discard long silence intervals and some types of cage noise. It then transfers the sound data to wave files containing sound intervals that are likely to include animal vocalization. Those files are immediately captured by the Sound Processing Live module, which performs the multi-taper (MT) analysis and feature extraction (see below). The SA Recorder includes a fully automated operant training system that continuously interacts with the bird, monitors its behavior (e.g., when the bird pecks on a key), and responds with an appropriate playback or by activating peripheral devices when so indicated. The training regimen is fully automated and adjustable, including automated onset and termination of training on specific dates, alternating song models, setting daily quotas of playbacks and saving the training results into the database. There are no special hardware requirements for recording (except for a sound card with an appropriate number of channels), but for the operant-training setup you will need a low-cost ($170) digital I/O card. The training system can be extended to include delayed auditory feedback or the sound-activation of devices such as a light bulb, a fan, or any other on/off gadget*. Detailed instructions on how to build a training system from scratch are provided in chapter 2A.
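The first-pass filter described above can be pictured with a minimal sketch. It assumes the signal has already been reduced to one amplitude value per analysis frame; the function name `find_sound_intervals` and the parameters `threshold` and `max_gap` are hypothetical and are not taken from the SAP2011 recorder, whose actual criteria may differ.

```python
from typing import List, Tuple


def find_sound_intervals(
    amplitudes: List[float], threshold: float, max_gap: int
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges (end exclusive) that contain sound.

    A frame counts as sound when its amplitude exceeds `threshold`.
    Silent gaps of at most `max_gap` frames are kept inside an interval;
    longer silences split the signal and are discarded.
    """
    intervals = []
    start = None       # first loud frame of the interval being built
    last_loud = None   # most recent loud frame seen so far
    for i, amplitude in enumerate(amplitudes):
        if amplitude <= threshold:
            continue
        if start is None:
            start = i
        elif i - last_loud - 1 > max_gap:
            # The silence since the last loud frame was too long:
            # close the current interval and open a new one.
            intervals.append((start, last_loud + 1))
            start = i
        last_loud = i
    if start is not None:
        intervals.append((start, last_loud + 1))
    return intervals


# Illustrative data only: two bursts of sound separated by a long silence.
frames = [0, 0, 5, 6, 0, 7, 0, 0, 0, 0, 8, 9, 0]
print(find_sound_intervals(frames, threshold=1, max_gap=2))
# → [(2, 6), (10, 12)]
```

Each returned range would correspond to one candidate wave file; rejecting cage noise would need further criteria that this sketch does not attempt.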

2. Sound Processing Live. This module is the analysis companion of the Sound Analysis Recorder. It processes sound data that passed the recorder thresholds and performs online multi-taper (MT) spectral analysis, followed by calculation of acoustic features and by segmentation into syllables and bout units. The graphic user interface (GUI) makes it easy for the user to design a scheme of parameter settings that captures and records the kinds of sounds that are of interest (e.g., to save and process zebra finch songs but not single calls or cage noise). To achieve this, we designed a 3-stage decision process: stage i) animal-voice recognition, stage ii) segmentation into syllables and stage iii) analysis of the temporal structure of syllables. All the calculations are translated into a simple graphic representation, displayed in nearly real-time, so that the user can test how any parameter setting affects the outcome of each phase of the recording session. Once SA+ has ‘decided’ that sound data should be saved, it can save data of three types (we recommend that you save them all): i) the raw wave file containing the sound, ii) a binary file containing the curves of acoustic features, and iii) a syllable table containing a set of features that summarizes the acoustic structure of each syllable (e.g., its duration, mean pitch, mean FM, etc.). SA+ keeps data organized not only in the tables, but also by using a consistent file annotation template that includes animal ID, serial number, and date and time of recording. Data can be saved either to the local computer or through the network to any other network-accessible PC. To make data backup easier, files are automatically arranged into data folders of appropriate capacity, e.g., that of a DVD.
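As a rough illustration of how a row of the syllable table summarizes the feature curves, here is a minimal sketch. It assumes the pitch and FM curves are plain lists with one value per analysis frame and that the syllable boundaries are already known; the function name `summarize_syllable` and the field names are hypothetical and do not reflect the actual SAP2011 table layout.

```python
from statistics import mean
from typing import Dict, List


def summarize_syllable(
    pitch: List[float],
    fm: List[float],
    start: int,
    end: int,
    frame_ms: float,
) -> Dict[str, float]:
    """Collapse per-frame feature curves into one summary row.

    `start` and `end` are frame indices (end exclusive) of one syllable;
    `frame_ms` is the assumed time step between frames in milliseconds.
    """
    if not 0 <= start < end <= len(pitch) or len(pitch) != len(fm):
        raise ValueError("invalid syllable boundaries or curve lengths")
    return {
        "start_ms": start * frame_ms,
        "duration_ms": (end - start) * frame_ms,
        "mean_pitch": mean(pitch[start:end]),
        "mean_fm": mean(fm[start:end]),
    }


# Illustrative curves only: one syllable spanning frames 1 to 3.
pitch_curve = [0.0, 600.0, 700.0, 800.0, 0.0]
fm_curve = [0.0, 10.0, 20.0, 30.0, 0.0]
row = summarize_syllable(pitch_curve, fm_curve, start=1, end=4, frame_ms=1.0)
print(row)
```

Storing one such row per syllable, rather than the full curves, is what keeps tables with millions of syllables manageable.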

3. Explore & Score. This module is a thorough revision of the previous Sound Analysis 3 software. It allows you to explore the features of sounds, segment them (manually or automatically), perform a variety of measurements, explore feature space, and score the similarity between sounds. There are several improvements in this module. First, it is about 10 times faster than the old Sound Analysis, and you can now open sound files several minutes long. Second, database management and exporting of data directly to Excel or to Matlab are now fully implemented. Finally, similarity measurements have been improved, and we provide alternative methods for scoring similarity. All the results of the similarity measurements are saved into similarity tables.

4. Feature & similarity batch. The song data of even a single bird can easily accumulate to several gigabytes of sound. The SA+ approach is to analyze these data in nearly real-time. However, it is often desirable to reanalyze the data, or to analyze a large amount of existing data (e.g., data collected using software such as AviSoft or Raven). The features batch can analyze a very large amount of sound data with the following options: i) sort sound files according to content, ii) calculate acoustic features and save them into binary files, iii) segment the sound into syllable units and save syllable features to a syllable table, and iv) once binary files have been computed, use them instead of the sound files to re-segment the sound based on a different set of criteria, or to perform similarity measurements. The advantage of the binary files is storage gain (by a factor of 10) and speed gain (by a factor of up to 100). This allows the user to explore many segmentation methods and examine alternative Dynamic Vocal Development maps (see below) based on alternative segmentation methods. The similarity batch can be used to perform a large set of similarity measurements. It supports two batch modes: one compares ordered pairs of sounds, and the other compares matrices (M x N) of sounds.
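The difference between the two similarity batch modes can be sketched as follows. The scoring function here is a deliberately trivial placeholder (it compares two numbers standing in for sounds) and is not the SAP2011 similarity measure; the names `pairwise_batch`, `matrix_batch` and `toy_score` are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

Sound = TypeVar("Sound")
Score = Callable[[Sound, Sound], float]


def pairwise_batch(
    first: Sequence[Sound], second: Sequence[Sound], score: Score
) -> List[float]:
    """Ordered-pairs mode: compare first[i] with second[i] only."""
    if len(first) != len(second):
        raise ValueError("ordered pairs need two lists of equal length")
    return [score(a, b) for a, b in zip(first, second)]


def matrix_batch(
    rows: Sequence[Sound], cols: Sequence[Sound], score: Score
) -> List[List[float]]:
    """M x N mode: compare every sound in `rows` with every one in `cols`."""
    return [[score(a, b) for b in cols] for a in rows]


def toy_score(a: float, b: float) -> float:
    """Placeholder score: 1.0 for identical values, falling toward 0."""
    return 1.0 / (1.0 + abs(a - b))


# Illustrative stand-ins for sounds (e.g. syllable durations in ms).
tutor = [100.0, 200.0]
pupil = [100.0, 250.0, 90.0]

print(pairwise_batch(tutor, pupil[:2], toy_score))  # 2 scores
print(matrix_batch(tutor, pupil, toy_score))        # 2 x 3 scores
```

The ordered-pairs mode produces one score per pair, so its cost grows linearly with the number of pairs, while the M x N mode produces M times N scores.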

5. Dynamic Vocal Development maps. As summarized above, SAP2011 automatically generates and updates a syllable table for each bird, which summarizes every song syllable produced during vocal development (in a zebra finch, typically 1-2 million syllables). Obviously there is a lot of information in those syllable tables. To make this information easily accessible, we developed a descriptive model called the Dynamic Vocal Development (DVD) map. DVD maps are presented as movie clips showing how syllable features change during song learning (or as a result of experimental manipulation). In the adult bird, the distribution of syllable structure is highly clustered, and the DVD maps show how these clusters (syllable types) come about. We developed several types of such maps to show different aspects of song development, including syntax, circadian factors, and vocal changes across time scales. The different modes of DVD maps use shape, color and even sound-clicks to represent different aspects of song structure. Importantly, DVD maps can be played in nearly real-time, so that you can see a vocal change as it occurs. We believe that the DVD maps are the most important feature of SA+.

6. Clustering. Clustering is used to detect syllable types and to automatically trace vocal changes during song development. We are still in the process of developing appropriate methods. As a temporary solution, we implemented a nearest-neighbor hierarchical clustering method within an extensive graphic user interface, including a color-coded display of clusters, assessment of residuals, and a count of the number of members in each syllable type. The procedure performs the cluster analysis recursively throughout song development and provides online visual assessment of the outcome at each stage of analysis. The results of the clustering are automatically registered into the syllable table, so that as you do the cluster analysis you can play DVD maps and ensure, by inspecting the color code of each cluster, that the tracing procedure is indeed ‘locked on the target’. The tracing of each syllable type progresses backward from the mature song until the clustering procedure fails to trace the syllable type. As long as a cluster can be traced, it is easy to detect vocal changes that occur as the features of a cluster approach the final (target) state. Cluster analysis is therefore a formal way of analyzing (and parameterizing) the DVD map. The user can select alternative feature sets for clustering or impose different constraints on the procedure, so as to achieve stable and reproducible results even in difficult cases.
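For readers unfamiliar with nearest-neighbor (single-linkage) hierarchical clustering, here is a minimal sketch of the general technique. It is not the SAP2011 implementation: the function name `single_linkage`, the stopping rule `max_distance`, and the use of plain Euclidean distance between feature vectors are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def single_linkage(
    points: Sequence[Sequence[float]], max_distance: float
) -> List[List[int]]:
    """Group point indices by agglomerative nearest-neighbor clustering.

    Starts with one cluster per point and repeatedly merges the two
    clusters whose closest members are nearest to each other, stopping
    when that distance exceeds `max_distance`.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        best = None  # (distance, index of cluster a, index of cluster b)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(
                    math.dist(points[a], points[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > max_distance:
            break
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Illustrative syllables as (duration, mean pitch) pairs in arbitrary units.
syllables = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(single_linkage(syllables, max_distance=2.0))
# → [[0, 1], [2, 3]]
```

This brute-force version recomputes all distances on every pass, which is fine for a handful of points but would be far too slow for millions of syllables. In practice features on different scales would also need to be normalized before distances are compared.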

7. Database. All of the SAP2011 output is managed by the MySQL database engine, which is included in this package together with the MySQL control center. SQL is a simple, industry-standard language for querying databases. It is used extensively behind the scenes of many SAP2011 functions, e.g., when you are playing a DVD map. You can type SQL commands to set criteria for selecting and manipulating data in the syllable tables and similarity tables generated by SAP2011. Flexible filtering of data becomes really important in tables that include millions of records. SA+ provides simple procedures for filtering tables and for exporting syllable tables and similarity tables to Matlab and to Excel.
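The kind of SQL filtering described above might look like the following sketch. To keep it self-contained it uses Python's built-in `sqlite3` module as a stand-in for the MySQL engine, and the table name `syll` and columns `bird_id`, `duration` and `mean_pitch` are hypothetical; the actual SAP2011 syllable-table schema may differ.

```python
import sqlite3
from typing import List, Tuple


def summarize_long_syllables(
    rows: List[Tuple[str, float, float]], min_duration_ms: float
) -> List[Tuple[str, int, float]]:
    """Count long syllables and average their pitch, per bird.

    `rows` holds (bird_id, duration, mean_pitch) records that are loaded
    into a temporary in-memory table before being queried with SQL.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE syll (bird_id TEXT, duration REAL, mean_pitch REAL)"
        )
        conn.executemany("INSERT INTO syll VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT bird_id, COUNT(*), AVG(mean_pitch) "
            "FROM syll WHERE duration >= ? "
            "GROUP BY bird_id ORDER BY bird_id",
            (min_duration_ms,),
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Illustrative records only.
records = [
    ("R109", 120.0, 600.0),
    ("R109", 40.0, 900.0),
    ("R109", 150.0, 800.0),
    ("R110", 200.0, 500.0),
]
print(summarize_long_syllables(records, min_duration_ms=100.0))
# → [('R109', 2, 700.0), ('R110', 1, 500.0)]
```

The same `SELECT ... WHERE ... GROUP BY` pattern is standard SQL, so a query of this shape should carry over to MySQL once the real table and column names are substituted.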

8. Online help system. SAP2011 now includes many functions. To make them easily accessible to the user, we implemented a goal-oriented modularization of the graphic user interface, including a hierarchical arrangement of 8 modules, 35 windows and over 1100 gadgets (buttons, sliders, images), to keep each procedure simple and intuitive. In several windows we included a set of instructions that will help you perform procedures in an orderly and appropriate manner. In other cases we included question-mark ‘?’ buttons providing specific information that might help the user solve a problem without referring to the user manual. We also included warning ‘!’ messages near buttons that can cause trouble if not properly used.


About Sound Analysis Pro

Sound Analysis Pro 2011 is an integrated system for studying animal vocalization. The system includes operant training with playbacks, a smart recorder, a variety of online and offline sound analysis toolboxes and an integrated database system with easy exporting of data to MS Excel and to Matlab. All stages of data acquisition and analysis are linked: an automated vocalization recognition procedure records the entire relevant vocal output (i.e., songs but not calls), followed by automated sound analysis, segmentation into syllable units, and calculation of acoustic features and their distribution. Finally, transparent data management routines summarize the entire vocal data over time (i.e., song development) into a single database. The entire data set (often including millions of sounds) can then be presented in the form of images or movie clips. These Dynamic Vocal Development (DVD) maps show how each syllable type emerges and, in the case of vocal learning, how the animal manipulates syllable features to eventually approximate the model. Cluster analysis and similarity measurements are then used to detect syllable types, trace vocal changes and explore syntax structure across arbitrary time scales.

SAP2011 is an extension of Sound Analysis 3, although little remains from previous versions. SAP2011 was developed by a team of developers with a broader scope than that of Sound Analysis. Different features such as the smart multi-channel recorder, the animal-voice recognition system, training with sound-playbacks, the foundation for online auditory feedback controls and the automated classification of syllable types were developed by combining talents, often imposing challenges in maintaining an integrated and unified environment. The applications we had in mind were the continuous recording and nearly real-time analysis of vocal changes, as well as combined neural and acoustic data acquisition, but we also wanted to allow an easy generalization to a variety of applications that we cannot anticipate. We also made efforts to improve the similarity measurements. We extended the documentation to include both informal and formal descriptions of all measures. Furthermore, we provide the entire source-code as well as several Matlab functions. Despite hundreds of functions now included in SA+, we made efforts to maintain the approach of keeping it simple and friendly, with an online help system and the modular design of an extended graphic user interface (GUI).


The heart of SAP2011 is a digital signal processing (DSP) engine called ztBirdEngine, developed for us by David Swigger (ZettaTronics Inc). The engine handles sound input from several channels and performs online signal processing, recording and playback control. It is encapsulated into an ActiveX component, which is included in this package together with its entire source code and documentation. David also improved (by a factor of 20) the performance of the multi-taper spectral analysis. He encapsulated the open-source FFTW library, including features for optimizing the performance of frequency analysis to the hardware configuration of each computer (all those features are automatically and transparently implemented in SA+, but can be used for other applications). We provide the implementation of these FFTW routines, together with routines for random access to wave files, in a Dynamic Link Library with open source code and appropriate documentation. Calculation of acoustic features (pitch, Wiener entropy, FM…) is now encapsulated in a C++ class developed by David Swigger, which is also included in this package with proper documentation.
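To give a feel for one of the acoustic features named above, here is a minimal sketch of Wiener entropy, taken here as the log of the ratio between the geometric mean and the arithmetic mean of the power spectrum. The sketch is not the SAP2011 C++ class: it uses a slow, plain discrete Fourier transform instead of multi-taper estimates and FFTW, and the function names and the `floor` parameter are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """Power at each positive frequency bin (DC and Nyquist excluded).

    Uses a direct O(n^2) discrete Fourier transform, which is only
    practical for short frames.
    """
    n = len(frame)
    powers = []
    for k in range(1, n // 2):
        coefficient = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(frame)
        )
        powers.append(abs(coefficient) ** 2)
    return powers


def wiener_entropy(powers: Sequence[float], floor: float = 1e-12) -> float:
    """log(geometric mean / arithmetic mean) of a power spectrum.

    The result is 0 for a perfectly flat spectrum and becomes more
    negative as power concentrates in fewer bins. `floor` avoids log(0).
    """
    clipped = [max(p, floor) for p in powers]
    log_geometric_mean = sum(math.log(p) for p in clipped) / len(clipped)
    arithmetic_mean = sum(clipped) / len(clipped)
    return log_geometric_mean - math.log(arithmetic_mean)


# Illustrative input: a pure tone at bin 4 of a 32-sample frame.
tone = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
print(wiener_entropy(power_spectrum(tone)))   # strongly negative
print(wiener_entropy([2.0, 2.0, 2.0, 2.0]))   # flat spectrum, about 0
```

Tonal sounds therefore score strongly negative and noisy sounds score near zero.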


On top of this foundation, we designed several higher-level engines. Most of these were based on mathematical foundations and algorithms developed by Partha P. Mitra. The algorithms for classification of syllable types were implemented by Aylin Cimenser at Columbia University. Rafael Baptista helped us develop the database system using the MySQL database engine. MySQL is also an open-source application. Implementation of the MySQL engine was done through the Core Lab MyDAQ3 package. We used the Borland C++ Builder 5 compiler, the MS Visual Studio compiler, and the SDL Component Suite. The Excel exporting was implemented using XL Report by Afalina Co., Ltd. All other algorithms, implementations in C++ code, integration of components and software design, as well as the GUI and help system, were developed by Ofer Tchernichovski and Partha P. Mitra. Finally, we would like to thank the NIH (NIDCD) for supporting this project until 2009.


Many of the improvements in software design should be credited to user feedback and data sharing: we would like to thank Cheryl Harding, Michael Brainard, Heather Williams, Santosh Helekar, David Vicario, and Noam Leader for their help and data sharing.



Ofer Tchernichovski