1D developmental histograms
2D DVD maps
3D DVD map
3D DVD maps
About SAP2
about the features batch
About the MySQL database interface
additional features
Advanced usage of the SELECT keyword
Altering, cleaning and merging tables
Amplitude
Amplitude modulation
Animal, Actions, and Action -> Animal tables
Animals & Tables
An introduction to the SAP2 Database Structure
Articulation Based Analysis
Asymmetric comparisons
Asymmetric score -- an example
automatic authentication
Batch cluster target mode
Batch Setting step 6
Batch Setting step 7
Batch table 1 x table 2 mode
Calibrate thresholds
Chapter 3: Spectral Analysis
Comparing related and unrelated songs
Continuity Over Frequency
Continuity Over Time
Contour Derived Features
Correcting errors
Cumulative DVD images
Display options
Excel interface
Exhaustive clustering
Exploring similarity across features
Exporting tables into Matlab
Exporting the Feature Distribution Results to Excel
Feature Space
Finding and Filtering Data
flexible frequency range
Spectral Derivatives
Frequency Modulation (Contour Slope Estimate)
Frequency Modulation (Derivatives Estimate)
Glossary of terms
Goodness of pitch
hardware installation: building training boxes
hardware installation: sound hardware
hardware installation: storage media
How Features are used in SAP2
How SAP2 handles Animal Information
How to practice this tutorial in MCC or in Matlab
improved database
improved similarity score
Input Display
instant clustering
Interpreting the similarity score
title
Introduction to Spectral Analysis
Introduction to clustering
Introduction to DVD maps
Introduction to processing live
Introduction to Sound & Brain
Licence
live
live_1
Looking at the Tables in MySQL Control Center
master-slave recording
Mean Frequency
Mean Frequency Amplitude (MFA)
mySQL tutorial for SAP2
oscilloscope
Batch modes: Pairs and MxN modes
Performance control
Period of repetition
Pitch
Playbacks Control
Introduction
Principal Frequency
Raw features DVD maps
recorder
Running the batch
Saving the data
Segmentation to syllable units
segmented comparisons
SELECT and CREATE tables
Self-similarity test
Setting Excel Access
Setting Keys & other detectors
Setting National Instrument Cards
Setting the recorder: step 1
Setting the recorder: step 2
Setting the recorder: step 3
Setting the recorder: step 4
Setting the recorder: step 5
Spectral Derivatives
Similarity batch
Singing around the clock
software installation
sound & brain
Sound Feedback Control
sound input and output
Spectral Derivatives
state transitions
Batch Setting step 2
Batch Setting step 3
Batch Setting step 4
Batch Setting step 5
Step 6: setting playbacks
Step 7: start recording & troubleshooting
Step by step clustering
Symmetric comparisons
Syn-Song DVD maps
Syntax DVD maps
The Animals Table
The Channels Table
The Control Panel
The feature_scale table
The File Table
The Key-peck table
The main controls
the main window
The metric system
The NIDAQ table
The Raw Features Table
The settings table
The Similarity table
The Syllable Table
the tables structure
The Tasks table
The Wave Files
The Bouts table
time course versus mean values
A bird view of Sound Analysis Pro 2
recorder features
Batch Setting step 1
Viewing Feature Summaries of Individual Syllables
Viewing the features
Why should you learn some Standard Query Language (SQL)
Why use Features?
Wiener Entropy
Chapter 4: The Song Features
Chapter 5: Exploring the Song Features
Chapter 6: Your Animals
Chapter 2: Installation
Chapter 1: Overview of SAP2
Highlights
Chapter 7: Sound Analysis Data Structure
Chapter 9: SAP2 Recorder
Chapter 10A: Sound Processor Live
Chapter 10B: Sound and Brain Live
Chapter 11: Features Batch
Chapter 12: Similarities, Measurement & Batch
Chapter 13: Dynamic Vocal Development (DVD) Maps
Chapter 14: Clustering Syllables


1D developmental histograms



One-dimensional DVD maps (developmental histograms)


One-dimensional DVD-maps display changes in the values of a single feature, capturing the entire table (e.g., the whole song development) in a single image. Petr Janata was the first to suggest this idea, and showed that the evolution of duration histograms during development has traceable structure. The plot shows the distribution of syllable durations during any one day in each row using gray-scale, and the entire matrix spans the entire song development. SA+ uses a similar approach, except that the gray-scale is replaced by a 3D reconstruction.

Click the '1D developmental histograms' tab and open the table of bird109. Then select 'duration' for the X axis (Y axis selection has no effect - we only look at one feature) and click 'Histogram'. Computing the histogram takes about 30 seconds.

Each row in the 3D histogram shown below represents a single day of song development, and the height of the mountains represents the abundance (frequency) of a certain duration. The far side of the histogram represents the end of song development, and the 3 peaks stand for the 3 syllable types produced by this bird. As shown, it is very easy to trace the peaks backwards to their point of origin. The advantage of this representation is that it shows us the entire song development at a glance, based on the entire song development data. It also makes it obvious why it is so important to record and analyze the entire song development data - otherwise, computing such histograms won't be as robust and won't work on a short time scale. Note how one of the peaks 'takes a turn' during early song development (this is the vocal change of time-warping). You can see that this vocal change occurs within a few days, whereas thereafter, duration did not change much. As we shall discuss later on, vocal changes are often like this - they can take a few hours or a few days, and then nothing happens during the next several days. Given this hectic nature of vocal change, DVD-maps give us an easy way of detecting critical moments in song development.


graphic
Finally, histograms can be computed for any feature - not only duration. Try mean Wiener entropy and mean FM.


The range can be automatic or manual. When selecting a manual scale, the minimum and maximum of the range are set by the user. The number of bins, however, is fixed and equals 100, so selecting a duration range of 0-500ms will give bins of 5ms. If you choose auto-scale, SA+ will scale the features based on the normalization of syllable features (see options -> feature normalization).

Aliasing issues: never let the bin-size approach the encoding accuracy of a feature. The likely problem is with duration. The actual encoding step is the window step, which is about 1.4ms by default. Since we use 100 bins, the default bin size is about 5ms when the range is set to 0-500ms. Setting it to, say, 50-300ms takes us down to 2.5ms bins - which is still fine - but going down to, say, 100-200ms takes us down to 1ms bins, which is below the critical (Nyquist) value and will result in aliasing: a nasty, unresolvable artifact in the duration histogram.
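The arithmetic above is easy to check before running the histogram. Here is a minimal Matlab sketch (assuming the default 1.4ms window step and the fixed 100 bins):

range_min = 100; range_max = 200;    % manual duration range, in ms
n_bins = 100;                        % fixed in SA+
window_step = 1.4;                   % default encoding step of duration, in ms
bin_size = (range_max - range_min) / n_bins;
if bin_size < window_step
    disp('Bin size is below the encoding step: expect aliasing.');
end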
  

2D DVD maps


2-D DVD maps of features


Click 'open table', select 'bird109' and click 'DVD Play/Stop'. You should see an image like the one presented below:
Only syllables with duration longer or shorter than these thresholds are shown
File name containing the current syllable
Date and time of current syllable


 
graphic
Date and time of current syllable 
X-axis feature presented 

Y-axis feature presented




Each red dot represents a syllable; the X axis is 'duration' by default, and the Y axis is mean FM. Note that feature units are scaled, with the median at 0 and with a spread of a single MAD (median absolute deviation) unit. The beginning of the movie is also the beginning of song development, when song syllables are relatively unstructured.

Click the long slider at the bottom (it should turn yellow) and move its thumb to the right, about ¾ of the way. The movie should now look very different:
graphic
We can see 3 nice clusters, which correspond to the 3 syllable types of the song produced. To see other features of those clusters, click on the Y-axis feature choice (you can do this as the movie is playing). Clicking on pitch shows a different projection of the clusters. Moving the mouse into the movie image will show a hand pointer, indicating that you can move the image as desired. On the bottom right you will see a radio-group with 3 choices: rotate, pan and zoom - try them all during movie play.


graphicgraphic




At this point, our notion of clusters is informal, but in the next chapter you will learn how to use SA+ to automatically identify the clusters. In this example, we have already clustered the data. To view the clusters, select 'color by cluster' from the color scheme. Syllables that were not clustered are shown in grey, and clustered syllables in different colors. In the syllable table, each cluster is identified by a number, and these numbers correspond to the following colors:

1 Red; 2 Blue; 3 Yellow; 4 Lime; 5 Black; 6 Fuchsia; 7 Olive; 8 Silver; 9 Green; 10 Aqua

graphic
Looking at DVD maps after clustering is a useful method for validating that the outcome of the clustering is reasonable.
  

3D DVD map

  

3D DVD maps

graphic

About SAP2


About Sound Analysis Pro

            Sound Analysis Pro II (SAP2) is an integrated system for studying animal vocalization and for integrating measurements across auditory, peripheral and brain levels.
The system includes a multi-channel smart recorder, operant training with playbacks with preprogrammed regimes, a variety of online and offline sound analysis toolboxes and an integrated database system with easy exporting of data to MS Excel and to Matlab.

All stages of data acquisition and analysis are linked: an automated vocalization recognition procedure records the entire relevant vocal output (i.e., songs but not calls), followed by automated sound analysis, segmentation into syllable units, calculation of acoustic features and their distribution, and finally, transparent data management routines that summarize the entire vocal data over time (i.e., song development) into a single database. Recordings from additional channels (e.g., multiunit activity in RA) are fully synchronized and integrated into the database (as additional columns in the same tables used to summarize acoustic structure).

The entire data (often including millions of syllables or hundreds of millions of millisecond by millisecond features) is then presentable in the form of images or movie clips. These Dynamic Vocal Development (DVD) maps show how each syllable type emerges, and, in the case of vocal learning, how the animal manipulates syllable features to eventually approximate the model. Cluster analysis and similarity measurements are then used to detect syllable types, trace vocal changes and explore syntax structure across arbitrary time scales. Of course, it makes little sense to have all the possible segmentation and presentation methods ‘hardwired’ in SAP2, and therefore, we have made several Matlab functions available in a separate documented library called SAM (Sound Analysis in Matlab). 

SAP2 is an extension of SA+. It was developed by a team of developers so that different features such as the smart multi-channel recorder, the animal-voice recognition system, training with sound-playbacks, the foundation for online auditory feedback controls and the automated classification of syllable types were developed and revised by combining talents, often imposing challenges in maintaining an integrated and unified environment. The applications we had in mind were the continuous recording and nearly real-time analysis of vocal changes across system levels: from sound, to articulation and brain level investigation. As before, we also wanted to allow an easy generalization to a variety of applications that we cannot anticipate. We also made efforts to improve the similarity measurements. We extended the documentation to include both informal and formal descriptions of all measures. Furthermore, we provide the entire source-code as well as several Matlab functions. Despite hundreds of functions now included in SA+, we made efforts to maintain the approach of keeping it simple and friendly, with an online help system and the modular design of an extended graphic user interface (GUI).

            The heart of SAP2 is a digital signal processing (DSP) engine called ztBirdEngine, developed for us by David Swigger (ZettaTronic Inc). The revised engine is much enhanced now. It handles sound input from 10 channels and performs online signal processing, recording and playback control. It is encapsulated into an ActiveX component, which is included in this package together with its entire source code and documentation. David also improved (by a factor of 20) the performance of the multi-taper spectral analysis. He encapsulated the public domain FFTW (www.fftw.org) algorithm, including features for optimizing performance of frequency analysis to the hardware configuration of each computer (all those features are automatically and transparently implemented in SA+, but can be used for other applications). We provide the implementation of these FFTW routines, together with routines of randomly accessing wave files in a Dynamic Link Library with open source code and appropriate documentation. Calculation of acoustic features (pitch, Wiener entropy, FM…) is now encapsulated in a C++ class developed by David Swigger, which is also included with this package with proper documentation.

            On top of this foundation, we designed several higher-level engines. All the new Matlab utilities were developed by Sigal Saar. Most of these were based on mathematical foundations and algorithms developed by Partha P. Mitra. The algorithms for classification of syllable types were implemented by Aylin Cimenser at Columbia University. Rafael Baptista helped us develop the database system using the MySQL (www.mysql.com) database engine. MySQL is also a public domain and open-code application. Implementation of the MySQL engine was done through the Core Lab MyDAQ3 package (http://www.crlab.com). We used the Borland C++ Builder 5 compiler, the MS Visual Studio compiler, and the SDL Component Suite (http://www.lohninger.com). The Excel exporting was implemented using XL Report by Afalina Co., Ltd (http://www.afalinasoft.com). All other algorithms, implementations to C++ code, integration of components, software design as well as the GUI and help system were developed by Ofer Tchernichovski and Partha P. Mitra. Finally, we would like to thank the NIH (NIDCD) for supporting this project until 2009.

Many of the improvements in software design should be credited to user feedback and data sharing: we would like to thank Cheryl Harding, Michael Brainard, Heather Williams, Santosh Helekar, David Vicario, and Noam Leader for their help and data sharing. 


Ofer Tchernichovski, David Swigger, Partha P. Mitra & Sigal Saar

August 2007

about the features batch

What is "features batch"?
                                                                                                                        

In this chapter we document two types of batch operations: features batch and similarity batch. The features batch is designed to process a large amount of sound data stored as wave files. The similarity batch can perform many thousands of similarity measurements automatically. In addition to adding sound files manually, SA+ allows you to use the output of other modules to target specific sounds, e.g., those that belong to a certain cluster. In general, all the fields in a syllable table can be used to query and retrieve two sets of sounds, followed by MxN similarity measurements.


Feature batch can be used to:

·     calculate raw features from sound data

·     Segment the sound into syllables, compute syllable features and store them in tables

·     Sort sound files according to content
  

About the MySQL database interface


About the MySQL database interface

graphic

You have seen above that SA+ contains mechanisms for transferring data to Excel and to Matlab, and we mentioned that the data 'source' for these mechanisms is a database system called MySQL. MySQL is an open-source application that can be used freely for non-commercial purposes (in fact, MySQL is a success story: in contrast to companies like Microsoft, they make their profits by opening, rather than protecting, their code). MySQL provides a foundation of tables, queries and data manipulation methods. You can find the documentation and user manual in 'c:\mysql\docs', but in addition to this, we provide a 'mySQL in a nutshell' summary, with some useful commands, in Appendix I.

graphic

The MySQL control center will allow you to observe, copy and query your tables. SA+ does not create any new database but saves all the tables in the default 'mysql' database. To view a table, open the MySQL control center and browse the databases to obtain this view:
  


graphic

Then right-click the table and choose 'return limit'.
You should then select a limit such as 0-1000 and observe the (first 1000) data in the table.

Alternatively, you can select specific data from the table by clicking on a table, and then clicking the SQL button. Here is an example:
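For instance, a query of this form (the field names here are illustrative) can be typed into the SQL window:

select file_name, duration, mean_pitch from bird109 where duration > 100 limit 1000;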

For more details about how to design SQL commands, see Appendix I and the MySQL user manual at c:\mySQL\docs.
  

additional features


Contours and contour-based features were added:

graphic

Advanced usage of the SELECT keyword

Advanced usage of the SELECT keyword



Table1 includes only three song bouts produced by a bird named b109. In our database you will find a table called b109clustered, which contains the entire song development of this bird. On this table you may not want to run the query shown above, because it would return hundreds of thousands of records.
In such cases, you should always use the keyword limit, e.g., (Matlab):

[c1_duration, c1_pitch]=mysql('select duration, mean_pitch from b109clustered where duration>100 and duration<150 and mean_pitch>600 and mean_pitch<1100 limit 5000;');

will execute promptly and will return the first 5000 syllables that match your criteria. Because this table is already “clustered”, namely, the syllable types are already identified (see chapter xx), you can simplify this query: the syllables in this range are already identified as cluster number 4, so the query becomes

select duration, mean_pitch from b109clustered where cluster=4 limit 5000;

Now, this query will return the first 5000 syllables of type 4, but what if you want not the first 5000 syllables but a random sample of 5000 syllables? That's easy:

select duration, mean_pitch from b109clustered where cluster=4 order by rand() limit 5000;

What if you only want to count how many syllables of each type exist in the table?
select count(*) from b109clustered where cluster=4;

will return 173581, so you can now repeat this query for clusters 0 (not clustered), 1 (introductory notes) and 3, 4 (song syllable types of this bird) and tell that in this bird the frequency of types over development is:

0: 279354
1: 337884
3: 198997
4: 173581

 We can now see if those types are present in similar proportion throughout development. To do so, we will limit our queries to different dates.

select count(*) from b109clustered where month=8 and day<19 and cluster=4;

will return 0, but

select count(*) from b109clustered where month=8 and day=20 and cluster=4;

will return 858, telling us that (using our clustering method) this cluster was not identified before August 19. 

It should now be quite easy for you to see how to select data by any combination of criteria. Using a simple for loop, you can pump the results of these queries into Matlab functions. When properly used, the MySQL server is a very potent device, capable of returning complicated queries at remarkable speed.
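For example, here is a minimal Matlab sketch of such a loop (assuming the mysql() wrapper used above and the b109clustered table):

clusters = [0 1 3 4];                 % cluster IDs of interest
counts = zeros(size(clusters));
for i = 1:length(clusters)
    counts(i) = mysql(sprintf( ...
        'select count(*) from b109clustered where cluster=%d;', clusters(i)));
end
disp([clusters(:) counts(:)]);        % type frequencies over development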

  

Altering, cleaning and merging tables

Altering, cleaning and merging tables


Mistakes happen, and tables occasionally need to be altered or updated. You can alter data manually in the MCC, but as your tables become larger, it becomes impractical to correct values by hand. The simplest way of altering a table is to apply a change to an entire field, e.g.,

Update my_syllables set cluster=0;

is going to erase all your cluster values, setting them to zero. So, if you actually like most of your clusters, but not cluster 4, you could type:

Update my_syllables set cluster=0 where cluster=4;

In short, you can alter a subset of your data using the exact same methods used for selecting a subset of your data.

Eliminating duplicates:
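One common approach (a sketch; the table name my_syllables is illustrative) is to copy the distinct rows into a temporary table and swap it in. Note that if a unique column such as recnum makes every row distinct, you should select distinct over the remaining columns instead:

create table tmp select distinct * from my_syllables;
drop table my_syllables;
rename table tmp to my_syllables;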

 
Finally, it is quite easy to merge two tables into a single, large table:

Insert into my_first_table select * from my_second_table;


  

Amplitude

Amplitude

Formal definition: Amplitude, in decibel units, is calculated as

Amplitude = 10 * log10( Σf Pf ) + baseline

where Pf is the power at any one frequency and where the baseline is (arbitrarily) set to 70dB as a default.

Fig 1: Equation of Amplitude

Amplitude is measured in dB scale, but note that the power is un-scaled, hence the baseline is arbitrary. You can adjust the baseline value in options->feature calculation->amplitude baseline. SAPII uses amplitude as one of the criteria for segmenting syllables and adjustments are based on relative values (namely, you adjust the amplitude slider to be slightly above the cage noise observed in the spectral image). Other than that, however, amplitude is not used for any other procedure.
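As a minimal numerical sketch of this definition (Matlab; the frame below is a stand-in for one windowed segment of sound, not SAP2's exact window):

frame = randn(1, 441);                       % stand-in: ~10ms of sound at 44.1kHz
P = abs(fft(frame)).^2;                      % power at each frequency
baseline = 70;                               % default baseline, in dB
amplitude_dB = 10*log10(sum(P)) + baseline;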

Amplitude Modulation

Amplitude Modulation
 


Formal definition:
graphic
that is, AM is simply the overall time-derivative of power across all frequencies within a range. Units of AM are 1/t; SAP2 does not scale AM, and the time units are defined by the ‘advance window’ parameter.

Amplitude modulation captures changes in the amplitude envelope of sounds. It is positive in the beginning and negative at the end of each sound, with the sound unit being either a syllable or a change of vocal state within a syllable.
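A minimal sketch of this definition (Matlab; the spectrogram parameters are assumptions for illustration, not SAP2's exact settings):

x = randn(1, 44100);                          % stand-in for 1s of sound at 44.1kHz
S = abs(spectrogram(x, 409, 368, 1024)).^2;   % power spectrogram (Signal Processing Toolbox)
AM = sum(diff(S, 1, 2), 1);                   % time-derivative of power, summed across frequencies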

Animal, Actions, and Action -> Animal tables

Animal, Actions, and Action -> Animal tables
 
The Animal table we just explored is one of three linked tables:

graphic


Together, these three tables can help you organize your experiments and analysis. Actions is simply a list of actions performed on the animal (surgery, playbacks) or on the data (clustering, similarity scores):

graphic

You can add actions to this table as you like; then, each time you perform a procedure, just add it to the Animal_actions_log table as follows:
1. Go to the Animals page and click on an animal
2. Go to the Actions page and click on an action
3. Go to the Action -> Animal log page, add details if you like, and click “Add Record”

graphic
Note that the date and time of each assignment is automatically entered. Examples of how to use this utility are posted below.

Animals & Tables


Animals & Tables

Setting Animals
Setting the animal: Before recording and processing you must make sure that animals are appropriately identified for each channel. The Database module of SAP2 provides you with the most complete set of tools to add animals to the SAP2 database; you then only have to click "Change Bird" to switch. The recommended method is to set the animal for each channel using the SAP2 Recorder, which also allows you to add new birds. Once done, just click "Update All" in this module so as to synchronize the Processing Live with the Recorder. Note that if there is a queue of files waiting unprocessed from a previous session, these data will be falsely added to the new bird's table.

If you are processing several birds at the same time, it often happens that you change one bird "on the fly". In this case do not click "Update All" and instead click "Change Bird" in the appropriate channel. For more information, go to chapter 6 "your animals".



graphic

Saving Tables

By default, SAP2 Sound Processing Live saves only the wave files that were "accepted"; all the features that were calculated are discarded. Of course, you can always recalculate them using the batch module, but you should decide if you want to save them in real time. There are many reasons to save, but keep these two issues in mind:

1. If you want to save the raw features, namely the continuous feature vector, you are saving a record of 14 fields (about 200 bits) every millisecond of vocalization, which is quite a lot (for comparison, the raw wave files - your sound data - take about 700 bits/ms). Taking other database issues into account, you are increasing the storage need by maybe 30-40%. Another consideration is that database storage at this fast rate takes a toll on your memory and processor, which can be quite significant.

2. If you want to save the syllable table features, then the storage, memory and processing costs are all negligible. However, the syllable table is only as good as your segmentation. This is why we often recalculate the syllable table offline using more careful choices of segmentation parameters. We also have Matlab routines that can explore different thresholds and optimize the segmentation. So, in most cases, you should consider the real-time syllable table as crude, and try to improve it offline.

An introduction to the SAP2 Database Structure

An introduction to the SAP2 Database Structure
 
How analysis results are stored in the database: SAP2 automatically creates wave files and analyzes them. The results are saved in a database called SAP, where each table summarizes data of a different type. Here is a brief description of the data tables according to their types:

graphic


The Sound Analysis Recorder detects vocalization and saves the sound data in wave files. Wave files are stored in daily folders. During recording, in nearly real time, the Sound Processing Live engine performs multi-taper spectral analysis and then computes the song features. The features, usually computed in steps of 1ms, are stored in Raw Features tables. Note that these records are continuous, with no distinction between “silences” and “sounds”. By default, SAP2 creates a single Raw Features table per day, per bird. It is often a large table, including up to hundreds of millions of records. As the data from the (many) wave files flow into the Raw Features table, SAP2 updates a small File Table, which contains indexes and file names of each processed wave file. Indexes for those wave files are saved in each of the Raw Features table records. The Raw Features Table, together with the File Table, can be used to generate a Syllable Table. SAP2 automatically creates one Syllable Table that includes records of all the song syllables the bird produced during an entire experiment.

The transition from Raw Features Tables to the Syllable Table is computationally cheap. However, segmentation of sound data is a difficult task, which you might want to revisit. This is one reason why it is useful to have the Raw Features Tables - an analyzed version of your raw data that is still very raw. A second reason is that it is sometimes meaningful to analyze sound data “as is”, without any segmentation. Such analysis is particularly useful for so-called “graded” signals, where segmentation is not necessarily meaningful and might induce artifacts. It is easy to compare raw features using the KS statistic. The more complicated Similarity Measurements of SAP2 are also based on raw features.
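For example, a minimal Matlab sketch of such a comparison (the table names here are hypothetical, and kstest2 requires the Statistics Toolbox):

pitch1 = mysql('select pitch from bird109_raw_day1 limit 10000;');   % raw pitch, day 1
pitch2 = mysql('select pitch from bird109_raw_day2 limit 10000;');   % raw pitch, day 2
[h, p, ks] = kstest2(pitch1, pitch2);    % two-sample Kolmogorov-Smirnov test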

In the previous chapter we presented examples of how SAP2 data can be exported easily to MS Excel. This method, however, is very limited in its capacity and flexibility. In order to take full advantage of SAP2 you will need to understand some of its database structure, and you will need to know how to access raw and processed data using MySQL and Matlab. If you do not use Matlab, it is recommended that you take a look at this chapter and learn something about how to use MySQL to visualize and manipulate the data. If you do use Matlab, this chapter is very important, and we strongly recommend that you learn just a little SQL (we will help you), which will help you swiftly retrieve the appropriate data into Matlab variables. In this chapter we present the database structure in a simplified manner. We then describe the structure of the feature tables, and show how those tables can be accessed in Matlab. SAP2 raw data are saved as wave files, whereas all the derived data (features, similarity measures, clusters) are saved in MySQL database tables.

SAP2 automatically generates tables of several different types, but you only need to know about some of them. Here is a simplified flowchart of what happens to sound data in SAP2:

graphic


The SAP2 recorder detects and saves wave files into daily folders. Each daily folder might contain thousands of wave files. As wave files are recorded, the “sound processing live” engine processes them, computing the raw features and syllable records described above.

Articulation based analysis

Articulation based analysis

The analytic framework of Sound Analysis is rooted in a robust spectral analysis technique that is used to identify acoustic features that have good articulatory correlates. The acoustic features that we chose to characterize zebra finch song are represented by a set of simple, one-dimensional measures designed to summarize the multidimensional information present in a spectrogram. A procedure for measuring similarity based on such an analytic framework has the following advantages:

1. It enables the examination of one acoustic feature at a time, instead of having to cope with the entire complexity of the song of two birds. A distributed and then integrated assessment of similarity across different features promotes stability of scoring.

2. It is analytic: it evaluates each feature separately and tells you not only that two sounds are similar or different - but also in what sense they are similar/different. For example, it allows examination of how each of the features emerges during development and is affected by different experimental manipulations.

Note that all the features presented here (except for amplitude) are amplitude invariant. That is, the amplitude of the sound recorded does not affect them and hence the distance between the bird and the microphone should have only minor effect as long as the shape of the sound wave has not been distorted by the environment.

Asymmetric comparisons


Asymmetric similarity measurements


Asymmetric similarity measurements are those where sound 1 is the model (or template) and sound 2 is the copy, and we want to judge how good the copy is in reference to the model. For example, if a bird has copied 4 out of 5 syllables in the song playbacks it has heard, we will say that 80% of the model was copied. However, what should we say had the bird produced a song of 10 syllables, including accurate copies of the 5 model syllables and 5 improvised syllables?  It makes sense to state that all (100%) of the model song was copied, but the two songs are only 50% similar to each other. To capture both notions we will say that asymmetrically, the similarity to the model is 100%, and that symmetrically, the similarity between the two songs is 50%. We shall start with asymmetric comparisons:

Start SA+, open 'Example 2' and outline the entire song. Click the 'Sound 2' tab, open 'Example 2' and outline it. Make sure that the amplitude threshold is set to 37dB in both windows. Click the 'Similarity' tab and click 'Score'. The following image should appear within a few seconds:  
graphic
The gray level of the similarity matrix represents the Euclidean distances: the shorter the distance the brighter the color; intervals with feature distances that are higher than threshold are painted black.

graphic

Similarity sections are neighborhoods of intervals that passed the threshold (e.g., when the corresponding p-value of the Euclidean distance is less than 5% for all neighbors). As noted, the gray level represents the distance calculated for each pair of intervals. However, the only role of the distance calculation across (70ms) intervals is to set a threshold based on 'viewing' features across a reasonably long interval. The actual similarity values are calculated frame-to-frame within the similarity section, where p-value estimates are based on the cumulative distribution of Euclidean distances across a large sample (250,000) of random pairs of frames obtained from comparisons across 25 random pairs of zebra finch songs:






Local (frame level) similarity scores: Based on this distribution, we can endow each pair of frames with a local similarity score, which is simply the complement of the Euclidean distance p-value. That is, if a single-frame p-value is 5% we say that the similarity between the two frames is 95%. Local similarity is encoded by colors in the similarity matrix as follows:  
Score (1-p)%    Color
95-100          red
85-94           yellow
75-84           lime
65-74           green
50-64           olive
35-49           blue
graphic
graphic
Section-level similarity score: We now turn to the problem of estimating the overall similarity captured by each section. First, SA+ detects the boundaries of each section. Then, single-frame scores are calculated for each pixel, and finally SA+ searches for the best 'oblique cut' through the section, which maximizes the score. In the simplest case (e.g., of two identical sounds) similarity will maximize on a 45° angle at the center of the section. In practice, it is not always the center of the section that gives the highest similarity, and the angle might deviate from 45° if one of the sounds is time-warped in reference to the other. We therefore need to expand the search over different displacement areas and at different angles. The 'time warping tolerance' is set to 5% by default, allowing up to 5% angular deviation from the diagonal. Note that computation time increases exponentially with the tolerance. The search for the best match is illustrated below:

graphic
graphicgraphic

We now consider only the frames that are on the best-matching diagonal, and calculate the average score of the section. This score is plotted above the section. Boundaries of similarity sections can be observed more clearly by clicking the global 'combo' button:

 
The light blue lines show the boundaries of each section and the rectangles enclose the best diagonal match of each section

Similarity across sections: Note that there are several sections with overlapping projections on both songs. To obtain a unique similarity estimate, SA+ must eliminate redundancy by trimming (or omitting) sections that overlap with sections that explain more similarity. We call the former 'inferior sections' (blue rectangles) and the latter (red rectangle) 'superior sections'.  







Final sections: once redundancy has been trimmed, it often makes sense to perform one final filtering, by omitting similarity sections that explain very little similarity (which are likely to be 'noise'). By default, SA+ omits sections that explain less than the equivalent of 10ms x 100% similarity. Superior similarity sections that passed this final stage are called final sections.  


The overall similarity score is a product of 3 components: % similarity, mean accuracy and sequential match. You can eliminate each component from the overall assessment by un-checking it.
graphic
% similarity is the percentage of tutor's sounds included in final sections. Note that the p-value used to detect sections is computed across intervals of 70ms: This similarity estimate is asymmetric and it bears no relation to the local similarity score we discussed above.

Mean accuracy is the average local similarity scores across final sections.
To estimate a combined score, we simply multiply the accuracy by the % similarity. For example if we have 60% similarity and 70% accuracy (in the similar parts) the total score will be 42%.

Note: some people do not like to report combined scores because lower numbers are often judged as 'weak'. Presenting both significant similarity and combined scores is not a bad idea.

Sequential match is calculated by sorting the final sections according to their temporal order in reference to sound 1, and then examining their corresponding order in sound 2. We say that two sections Si and Si+1 are sequential if the beginning of Si+1 in sound 2 occurs between 0-80ms after the end of Si. This tolerance level accounts for the duration of stops, and also for the possible filtering effect of very short sections that are not sequential. This procedure is repeated for all the consecutive pairs of sections on sound 1, and the overall sequential match is estimated as:

graphic .

Note that multiplying by 2 is offsetting the effect of adding only one (the smallest) of two sections in the numerator. This definition is used for asymmetric scoring, whereas for symmetric scoring the sequential match is simply the ratio between the two outlined intervals on sound 1 and sound 2, namely:

graphic

Weighting the sequential match into the overall score: In the case of symmetric scoring, only the sequentially matching parts of the two sounds can be considered, so it makes sense to multiply the sequential match by the combined score. In the case of time-series comparison, it does not make sense to multiply the numbers, because this would mean giving 100% weight to sections that are sequential, and 0% weight to those that are not. Therefore, you have to decide what weight should be given to non-sequential sections. The problem is that sequential mismatches might have different meanings. For example, in extreme cases of 'low contrast' similarity matrices (with lots of gray areas) the sequence might be the only similarity measure that captures meaningful differences, but when several similar sounds are present in the compared intervals, it might be sheer luck whether SA+ sorts them out sequentially or not. In short - we cannot advise you what to do about it, and the default setting of 50% weight is arbitrary.
  

Asymmetric score -- an example


An example of asymmetric comparison

 
Start the SA+ 'explore and score' module, make sure that the amplitude threshold is 37dB (yellow slider) and open the sound Example 1. Select the 'auto segment a single syllable' radio-button and click on the second syllable:
graphic


graphic





Next click the 'Sound 2' tab, open Example 2 and click on the last syllable:
 
graphic
Click the 'similarity' tab and click score.


graphicgraphic
graphicgraphic
As shown, the % similarity is 100%, meaning that the similarity section captured the entire model song (sound 1). The accuracy is moderately high, showing that the frame-to-frame similarity is reasonable. Now reverse the comparison by opening Example 2 in the 'sound 1' tab and then Example 1 in 'sound 2'. Outline the appropriate syllables and score the similarity: note that the % similarity is now only 53%, because only the second part of the model matches the copy. Note also that the accuracy is similar, but not identical, to the reciprocal comparison. This is because the section is also cut asymmetrically, in reference to the model (since we do not insist on 45° but allow some jitter, the comparison is not one-to-one).

We next examine the outcome of comparing two songs using either the time-course score or the mean-values score. The alternative approach compares the means of the feature values rather than their curves. In this approach we no longer care about the order of feature values within the interval; instead, we look at a smooth estimate of the distances observed across raw feature values. Note that in contrast to the distance-across-curves method, which increases the contrast (by adding more information), here we flatten the distances by averaging before comparing (so these are, so to speak, opposite approaches).

  

automatic authentication

graphic

Batch cluster target mode


Cluster target mode


graphic

We use this mode frequently to trace changes in similarity values during song development, in conjunction with the clustering module. Once we have identified clusters and traced them during development, we want to know how similar each cluster is to the song model cluster. For example, the similarity to the song model might change gradually or abruptly, and tracing the similarity of all the instances of a syllable type to the model can tell us when, and sometimes even how, learning occurred. Since the bird produces several thousand syllables of this type each day, it seems too costly to score every single one in reference to the model syllable, but the batch mode makes it easy. Say that you have all the song files that SA+ saved for this bird during August 10 stored as wave files on a DVD. All you need to do is insert the DVD, start SA+ in similarity batch mode and select 'cluster target mode'. Next, go to the 'sound 1' page, open the model song and outline the target syllable. Then go back to the 'similarity batch' page, click 'Table 1' and open the syllable table of this bird (syllable tables are generated automatically by SA+ in both 'sound processing live' and 'syllable batch' modes). Now, identify the syllable type by typing a query. For example:

If clustering was done, and you would like to score cluster number 3, type 'cluster=3'

If you would like to score only non-modulated long calls, type 'duration>100 and FM<10'

The syllable table includes the entire song development of this bird, but the DVD only includes data from a few days of development. Therefore, we need to specify which of the syllables in the table can be accessed in the current DVD. There are two ways to do this. If your data are organized by dates, all you need is to add the appropriate date to the query, by typing 'month=8 and day=10' for August 10. Alternatively, you can find the record number (recnum) of this syllable in the table. The most efficient way of finding a record in the table is using a query. Here is how it is done:

graphic

Browse the DVD and find the earliest file (say bird109_9520_on_Aug_10_15_32.wav). Now go to the mySQL control panel, browse to the syllable table of this bird, and click the 'SQL' button.




Type the following line:


select * from bird109 where file_name="bird109_9520_on_Aug_10_15_32.wav"

and click the ! button to run the query. The first recnum in the list (100020) is what you want.

graphic

Now in SA+ your query line can look like this

Cluster=3 and recnum>100010

Note that we only found the first record - what about the last one? Do not worry about it: once the batch fails to find a wave file in the specified folder, it automatically stops.


Next, in order for SA+ to find the syllable, it has to make some assumptions about time unit conversion. By default, the time units are 1.36ms (as set in the 'advance window' slider). If for any reason you want to deviate from this default, you must type the new conversion in the conversion box. graphic

Note: time conversion units will be automated (and transparent) in the next update.


Finally, you might want to wrap the outlined syllable with some time interval, e.g., to test similarity also in the neighborhood of the syllable. We provide this option in the 'add before and after' box.

graphic


You should be aware, however, that changing the value might in some cases cause an error, if SA+ tries to outline an interval that does not exist in the sound file.


Note: SA+ does not check if the automated syllable outlining procedure passed the data boundaries. We will take care of this in the next update.


 You are now ready to run the batch.

  

Batch Setting step 6

Step 6: Identifying Data Age

Tracking time is important in most batch operations. For example, knowing the hatching date of a bird and the date of data collection, the batch can relate each data record to the age of the bird, the time of day, etc. All this information is then inserted into the data records of each table.

If you do not care at all about relating your sound data to any date and time, check the checkbox shown (graphic) and go to step 7 (the errors that will then be ignored include, e.g., a negative age).

Otherwise, you will have to tell SAP2 how to detect the date & time of data in each wave file, and there are 4 alternative methods for doing that:

graphic

Warning: incorrect setting of date & time stamp might result in errors and data corruption

Get age from file age: this method uses the MS Windows FileAge() function. This function detects the age of the data -- not the age of the file copy -- which is what we want. For example, say that you record a wave file on one day and then copy the file later on. The copy of the file still carries the same FileAge() as the original file. In many cases this option works reasonably well, but the time stamp is only accurate to about 2 seconds.

Get age from SAP2 recorder files: Use this option when processing files that were created by the SAP2 recorder without any further processing (for files processed by the Live mode, use the next option). These files are of the following name format:

birdName_month_day_year_milliseconds.wav

The FileAge() function used by 'get age from file age' has one significant disadvantage: the actual accuracy of the date & time stamp is around 2 seconds (do not let the numeric precision mislead you!). Therefore, if two files were created less than 2s apart, getting age from file age might mix their order (something that caused us a major headache!). To avoid this problem, the SAP2 recorder is equipped with an accurate timer, which keeps track of the number of milliseconds elapsed since midnight, saving it in the file name. The SAP2 batch can then extract that number from the file name and use it to improve the accuracy to the millisecond range.
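For instance, a minimal Matlab sketch of pulling the millisecond stamp out of such a file name (the file name below is hypothetical):

fname = 'bird109_8_10_2004_56154321.wav';        % hypothetical recorder file
tok = regexp(fname, '_(\d+)\.wav$', 'tokens');   % trailing number = ms since midnight
ms = str2double(tok{1}{1});
hh = floor(ms/3.6e6); mm = floor(mod(ms,3.6e6)/6e4);   % convert to clock time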


Get age from SAP2 Live Processed files: Use this option when processing files that were created by the SAP2 recorder and then further processed by the SAP2 Live (or Sound Live) modules. These files are of the following name format:

birdName_daysNumber_msNumber_month_day_hour_min_sec.wav, e.g. R109_39152.86197671_3_11_23_56_37.wav


Get age from SAP1 files: This option is obsolete -- use "get age from file age" instead.

Batch Setting step 7

Step 7: Set acceptance criteria

As in the Live modules, the batch can decide dynamically whether a wave file contains meaningful data, and exclude files that fail the criteria. For example, you might want to reject files that contain little or no vocalization. Otherwise, check the Save all box.

graphic

The pass/reject criteria are as in the other modules: SAP2 keeps track of the longest syllable (continuous sound) duration, of the longest bout duration, and of the number of syllables. Any combination of these can be used.
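Schematically, the test looks like this (a Matlab sketch; the threshold values are arbitrary examples, set in the batch options):

longest_syllable = 180;   % ms, measured per wave file
longest_bout     = 900;   % ms
n_syllables      = 12;
pass = (longest_syllable >= 100) && (n_syllables >= 5);   % any combination can be used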

By default, when a file is rejected its data are not saved into tables. This is different from what happens in the Live modules, where rejected files are deleted. Note, however, that you can use the batch to move passed files to a different folder.

Moving passed files to a different folder: to move passed files to a different folder, go to the tab "saving sounds and options" and click

graphic

You can then set a sound output folder and test its accessibility.

Batch table 1 x table 2 mode



  

Calibrate thresholds

Calibrating threshold to background noise level: in general, you should set the detection of a sound event (triggering) to a level only slightly higher than that of the background noise. Remove the animal from the training box and start the SA+ Recorder. Make sure that the oscilloscopic view is turned on. Click the Recording tab, then Chan options, and check the monitor only checkbox (both Active and Monitor only should be checked). Click ok, and then change the peaks value (top of the Recording window) to 1, and the occurred within value to 0. Now go to the main window and click Record. Locate the Thresh slider just below the oscilloscopic view and turn it all the way to the left -- a green light should turn on above the oscilloscopic view. Now move the slider slowly to the right until the green light turns off, and remains off (make sure that no sound enters the training box). Repeat for all boxes. This is the correct threshold level (you can go a bit above it, but not too much). Set the mode of each channel back to Recording, and turn the recording settings back to normal (do not leave the occurred within level at zero!). Place the animal in the box; as long as the animal stays quiet, you should see that SA+ records nothing. Any sound produced by the animal (including cage noise) will trigger a recording session, indicated by the green light. Note that if you decide to change the gain, you must recalibrate the threshold level, so it is recommended to first set the gain to a convenient level and keep it unchanged thereafter.
  

Chapter 3: Spectral Analysis

Chapter 3: Spectral Analysis

Comparing related and unrelated songs


Comparing related and unrelated songs


In this example we compare the song model Samba to the song of a bird that was trained with this model. 
                                
graphicgraphic
In the example below, we examine two songs that share a few syllable types, but in a different sequential order. Note that the low sequential match will reduce the overall similarity score to very low levels. In general, there are cases where sequential match is the most robust marker of song imitation, and other cases where it is not. A lot depends on the spectral diversity across syllables. In some cases, particularly if the recording quality is poor, you will find it very difficult to obtain a p-value that is selective enough to reject and accept sounds ‘properly’, but even in such cases one can hope that songs that are ‘really’ similar will show more sequential match than other songs.

graphic
For example, those two unrelated songs have several syllables of complex harmonic structure. Obviously, there are several similar sections, but their sequential match is extremely poor. In sum: in difficult cases, your choices are either to reduce the p-value threshold, or to take the sequential match as a criterion. 

graphicgraphic
In some cases we need to examine similarity between two sounds that differ in their time scale, such that one sound is a stretched or a squeezed version of the other, while the acoustic features of the two sounds (pitch, FM, etc) are otherwise similar. Although Sound Analysis does not perform any calculations of time warping, it can capture similarity across time-warped sounds, where the warp is represented as the slope of the similarity section.

One strength of scoring similarity by sections is that by setting appropriate levels of time warping tolerance SA+ can easily trace time-warped versions of a syllable type, because the similarity sections will ‘aim’ at the best angle that maximizes the similarity as demonstrated below:

Stretch
Squeeze
graphicgraphicgraphic
As shown, warping the sounds did not have much effect on the similarity score, and we can clearly see the different warps as different slopes of sections in the similarity matrix. You can change the tolerance of Sound Analysis to time warps: Click options and find the time warping slider at the bottom left. Moving the slider all the way to the left will cause rejection of similarity across warped sounds. After changing the threshold you can click Reset, and Score to see how the results change.
  

Continuity Over Frequency
 
For each time window t_i we detect all the contours and measure the frequency range of each contour. The mean frequency range across the contours is the continuity over frequency.

Continuity Over Time
 
For each time window t_i we detect all the contours and measure the duration of each contour. The mean duration across the contours is the continuity over time.

Contour Derived Features
 
We took a first look at the frequency contours in the last part of the previous chapter; now we look at them more deeply, explaining how contours are calculated and used to compute song features. SAP2 calculates frequency contours by detecting zero crossings of the spectral derivatives. In order to reject artifacts, we require that the contours pass a certain contrast threshold.

We use a dynamic threshold T, calculated for each time window t_i and frequency f_i as:
graphic
where T' is a user-defined threshold.
Therefore, the detection threshold is weighted by the distance from the mean frequency (the gravity center of the frequencies) and by the width of the power spectrum.

graphic

A pixel in the time-frequency plane is defined as a contour pixel if i) there is a zero crossing between the neighboring pixels in any one of the 8 possible directions (see diagram above), and ii) both neighboring pixels (in the direction of the zero crossing) are larger than T.

graphic
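
For intuition, here is a minimal Matlab sketch of this rule (not SAP2's implementation; D stands for a frequency-by-time matrix of spectral derivatives and T for the dynamic threshold matrix described above - both are illustrative names):

% Mark contour pixels: a zero crossing of the spectral derivatives (D)
% whose two neighbors both exceed the dynamic threshold (T).
[nF, nT] = size(D);
isContour = false(nF, nT);
offs = [-1 -1; -1 0; -1 1; 0 -1];   % four axes; with their opposites
                                    % they cover all 8 directions
for f = 2:nF-1
    for t = 2:nT-1
        for k = 1:4
            a = D(f+offs(k,1), t+offs(k,2));   % neighbor on one side
            b = D(f-offs(k,1), t-offs(k,2));   % neighbor on the other side
            % opposite signs = zero crossing; both magnitudes above T
            if a*b < 0 && abs(a) > T(f,t) && abs(b) > T(f,t)
                isContour(f,t) = true;
                break
            end
        end
    end
end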

Correcting errors


It occasionally happens that clustering goes wrong and the wrong data are written into the table. This refers not to cases where SA+ warns you about a back-tracing problem, but to cases where SA+ gives no warning yet visual inspection indicates that clustering went wrong. To correct a false clustering, click the forward arrow of the 'time control', change the cluster name to 0, and click trace-back twice.
  

Cumulative DVD images


Looking at the movie, you can see how clusters (syllable types) emerge and change with development. To obtain some solid evidence of this change, it is often useful to compare two 'frames' of the DVD map. To best observe the long-term effect of time on the distribution of syllable features, select 'color by linear time' from the color scheme, move the 'Time control' slider to the beginning of song development, and then select the 'cumulative movies' tab. Select duration for the X axis and FM for the Y axis. In the marker size selection, click 'tiny' and then click 'DVD Play/Stop' to start the movie. Color will slowly change from yellow to red; the little slider to the right of the 'color by linear time' control can be used to change this rate. The result should look like the figure below. Note that you can also change colors manually at any time during the movie, as well as move the 'Time control' slider to a different developmental time, either during play or offline.

graphic
Changes in feature distribution also occur over short time scales, e.g., after night sleep or during certain hours of the day. Click 'clear' and change the color scheme to 'color by time of day'. SA+ will now code circadian time by color: red for early morning, orange for late morning, green for noon, blue for the afternoon, and black for the evening song. Set the time control to late song development and click 'DVD Play/Stop'. You should be able to see how the duration fluctuates with the time of day. Looking at other features during early development will show even stronger changes - try the Wiener entropy variance. Feel free to switch back and forth between dynamic movies and cumulative movies on the fly.


graphic
  

Display options

These include the window size, oscilloscope display, and scrolling speed. It is always useful to observe the input signal (sound or neuronal activity) in real time. In SAP2, however, it is even more critical, because the shape of the input signal is used to determine the values of the sound-detection parameters. These parameters control the behavior of the sound-activated recording; false negatives (failing to detect a song) or indiscriminate recording are likely consequences of inappropriate parameter settings.

The default display mode is a side-by-side view of the recorder and processing applications. Keeping those applications separate promotes stability of recording regardless of any analysis issues. When you set the recorder parameters, however, it is recommended to use the "broad" mode, which lets you see more detail, as well as the ongoing statistics of the real-time analysis of the waveform:

graphicgraphic
The Oscilloscope Display allows you to choose between a regular oscilloscope and a scrolling display, as shown below. The scrolling display shows the amplitude envelope of the signal scrolling (slowly or fast) from right to left, allowing you to identify, for example, song syllables. The horizontal red dotted lines represent the recording threshold. We will talk about this threshold in detail shortly, but note how easy it is to see where the signal amplitude passed that threshold. Peak Count shows the number of samples that passed the threshold during the current "session".

graphic


Keep in mind that SAP2 allows pre-triggering of recording: the recorder keeps a ring buffer of the signal so that it can decide whether to save or discard a recording session retroactively (a few seconds later). Therefore, depending on how selective you would like the recorder to be, you might want to see a larger chunk of sound in the display. Moving the scroll slider to the left allows you to see several seconds of the sound signal, e.g., to detect a song bout by visual inspection. Of course, the display has no effect on the detection algorithm, but it allows you to judge the behavior of the detector against your own visual inspection.

Switching to the oscilloscope display allows you to see the waveform of the data, as shown below:

graphic
Switching the scrolling speed will now alter the update rate of the oscilloscope, which can be useful while looking for neuronal spikes, for example.

Note: the display of each channel can be set individually (e.g., scrolling or not) using the settings menu. When changes are made via the control panel, they immediately apply to ALL channels.
  

Excel interface

graphic

Exhaustive clustering


As stated earlier, the cluster analysis method included in SA+ is far from perfect, and we hope to shortly incorporate the new approach that Aylin Cimenser is currently developing. In difficult cases, one can still succeed in clustering using the exhaustive clustering approach, as follows:

·     Start from the easiest (most prominent and distinct) clusters and cluster them throughout development.
·     Use the mySQL Control Center to split the data into unclustered and clustered tables:

CREATE TABLE bird109_unclustered SELECT * FROM bird109 where cluster=0;
CREATE TABLE bird109_clustered SELECT * FROM bird109 where cluster>0;
ALTER TABLE bird109_unclustered ADD PRIMARY KEY (recnum);

·     Open SA+ and cluster bird109_unclustered - you might find that this is an easier task now
·     Finally, merge the two tables in mySQL Control Center by typing

                   INSERT INTO bird109_clustered SELECT * FROM bird109_unclustered;

Comment: always make sure that recnum is the primary key and that the table is sorted by recnum. You can change the sorting by editing the table:
 graphic
You can also sort by any field after deleting recnum, and then recreate recnum as an auto-increment primary key to fix the new sorting order, as shown in the sketch below.
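
As a concrete example of this recipe, here is a hedged sketch using the Matlab mysql() interface described later in this chapter (table and field names are illustrative only):

mysql('open'); mysql('use sap');
% make a copy of the table sorted by some field of interest
mysql('CREATE TABLE bird109_sorted SELECT * FROM bird109 ORDER BY duration');
% delete the old recnum, then recreate it as an auto-increment primary key
mysql('ALTER TABLE bird109_sorted DROP COLUMN recnum');
mysql(['ALTER TABLE bird109_sorted ADD recnum INT NOT NULL ' ...
       'AUTO_INCREMENT PRIMARY KEY FIRST']);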
  

Exploring similarity across features


graphic

As stated above, the global similarity estimate is based on five different acoustic features. You can view the similarity across each feature separately by clicking one of the buttons in the similarity display group. As noted earlier, asymmetric similarity is estimated in two stages: first, global comparisons across 70 ms intervals are used to threshold the match and detect similarity sections; then similarity is estimated locally, based on frame-by-frame scores. Both local and global distances can be viewed, and those views are useful for assessing what might have gone wrong when the similarity results do not seem reasonable. For example, you might discover that the pitch is not properly estimated, which will show similarity across all features but pitch.

FM
AM
graphicgraphicgraphicgraphicgraphicgraphicgraphicgraphic

The effect of global versus local estimates can be seen in the example below, showing FM and AM partial similarities on local and global scales. Note that locally, FM shows a similar area in the middle of the matrix, where the two sounds are not modulated, and we can see four bulges emerging from each corner of the central rectangle. These are the similarities between the modulated parts of the syllable. Since the syllable is frequency modulated both at its onset and its offset, we have similarity between beginning and end parts. Now look at the global similarity and note how the rectangle turned into a diagonal line, which captures the similarity in the transitions from high to low to high FM. In addition, we see short sidebands, indicating the shorter-scale similarity between the beginning of one syllable and the end of the other. Now examine the partial similarity of AM. Here the local similarity does not show any similarity between the beginning of one sound and the end of the other, but it does show strong similarity between the two beginnings and the two ends. This is because the sign of amplitude modulation is positive at the onset and negative at the offset of each sound. Hence, when looking at the global AM matrix we do not have sidebands.

Overall, the message is that by comparing similarity across different features we capture different aspects of the similarity. By taking all those features into account, we can often obtain a reasonable overall assessment of how good the similarity is, and we might then also develop some understanding of meaningful articulatory variables that are similar or different across the two sounds. However, it might also happen that the similarity is good in some features and poor in others, and in such cases it might be desirable to omit some features from the global estimate (this is not something you want to do just to obtain a better match!). In the options (similarity tab), you can set different scales and exclude features:
graphic
  

Exporting tables into Matlab


To use Matlab: download the files () from the link in () and copy them to ..\Matlab… This free utility will allow you to use MySQL in Matlab using the function 'mysql()'. To start type
mysql('open');

And Matlab should say something like:
Connecting to  host=localhost Uptime: 81473  Threads: 5  Questions: 5529  …

Then type
mysql('use sap');

and Matlab should say “Current database is "sap"” and that's it. You are now ready to type queries.
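
For example, assuming a syllable table named table1 exists (a hypothetical name), a query returns columns directly into Matlab variables:

[duration, pitch] = mysql('select duration, mean_pitch from table1');
plot(duration, pitch, '.');  % a quick look at the feature distribution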
 

We maintain another method of exporting data to Matlab directly through SAP2, but using this method is no longer recommended (the mysql() interface is much faster and more robust). Assuming you have Matlab installed, first make sure that you know which table you want to open. The current table name is presented at the right side of the screen:
graphic
Click 'Export to Matlab' and open that table. Check the fields you would like to export to Matlab. Note that this procedure can only export fields that contain numbers, not strings (do not try to export 'file name'!).

 graphic

Then click 'Convert to Matlab' and give the new file a name including the '.mat' extension, e.g., MyData.mat.

Note that you can select how many records to copy and where to start from.
Click 'open' and a message saying 'Mat file created successfully' should appear in the text box. Open Matlab and open the mat file. Type
 
MyData=LocalDouble'; % we need to transpose the matrix

 And you are all set.
  

Exporting the Feature Distribution Results to Excel

All the results of SAP2 processes are stored in MySQL tables. The next chapter will show you how to access those tables directly, but in addition SAP2 provides two mechanisms for exporting data: to MS Excel and to Matlab. Starting with Excel (assuming that you have MS Excel installed and that you followed the installation instructions), click 'Export to Excel'. You should see MS Excel open up with your data.


Feature Space

It is sometimes desirable to observe the distribution of raw features, either within a syllable or over a longer time scale. This is a qualitative method; its analytic counterparts are principal component analysis (PCA) and a variety of cluster analysis and feature classification techniques. Launch SA+, go to 'explore and score' and open Example 2. Now outline Syllable 1:

graphic

Click the feature space tab. At the top right you will find two radio-button groups titled 'Select sound'; make sure that the Sound 1 'Draw All' button is down. Select the red button from the color palette at the top and click Draw.
The image should look like this:

graphic

Next, outline the second syllable, click on the blue color and on the X marker, and then click Draw:

graphic

Now rotate the image and pan it using the mouse, until you can only see Wiener entropy versus pitch:

graphic

graphic

Clearly, the two syllables differ in pitch and in Wiener entropy, but differ less in FM. Now, outline syllable 1, select Red, and Draw; outline syllable 2, select Blue, and Draw; outline syllable 3, select Green, and Draw:
graphic

What is this exercise good for? It is a way of examining the emergence of categories of sounds in a continuous space. For example, if you suspect that the 'Dee' notes of two populations of chickadees are distinct, you may examine the data in feature space. Outline each note type and use a color code to distinguish between the populations. Select several examples from each population and see how the data look before you commit yourself to heavy quantitative analysis.

graphic

Finding and Filtering Data
 
Say you would like to find a particular bird, or perhaps a group of birds, based on certain criteria. There are two easy ways of doing this, using the find and filter fields of the table display. Type "R109" into the find field while observing the data below, and you can see how the data dynamically sort to match your criterion. Note that matching is case sensitive.

graphic

Using the filter is slightly more complicated: first type "R109" in the filter field and then press Enter (forgetting to press Enter is a common error); note that nothing has really happened yet:

graphic

Now click “Apply filter”:

graphic

You can use combined filters with "wild cards" written as * (for anything), or use any regular expression you like in the different fields - do not forget to press Enter after each entry. Here is an example that returns all the birds whose names start with R:



Click “Clear filters” after finding or filtering data to return to the full table display.

flexible frequency range

graphic

Frequency Contours
 
Open the song “Example 2” and Click the “Contours” button (next to the “Derivatives” button):

graphic
Fig 1: Decrease in contour threshold - show zero crossings

Click back and forth between "Derivatives" and "Contours" a few times; as you can see, this display shows the zero crossings of the spectral derivatives. The algorithm is a bit more complex than it seems; details are given in the next chapter. There are two parameters that might need adjustment to obtain good-quality contours with your data. This adjustment is recommended because the contours are used to calculate some of the sound features (continuity and principal contour).

graphic
Fig 2: The contour threshold slider

Decreasing the contour threshold will detect zero crossings against a lower-contrast background. For example, decreasing the threshold to zero (and then reopening the sound) looks like this:

graphic
Fig 3: Sound with contour threshold set to zero

Now set the threshold back to 10 and reopen the sound while looking at the zero crossings; this should give you an intuitive feel for what the threshold achieves.
graphic
Fig 4: Sound with contour threshold set to 10

In "options & settings" you will find a second parameter:

graphic
Fig 5: The contour bias parameter

The second parameter, contour bias, has no visual effect, but it does affect some interpretations of the contours; we will discuss these interpretations in the next chapter.

Frequency Modulation (Contour Slope Estimate)
 
Another estimate of frequency modulation is based on the slope of the principal contour in units of Hz/millisecond. This estimate is directional (positive for upsweeps and negative for downsweeps).

Frequency Modulation (Derivatives Estimate)
 
Formal definition: FM is the angular component of the squared time and frequency derivatives. This measure gives an absolute (unsigned) estimate of frequency modulation.



Frequency modulation is estimated from the time and frequency derivatives across frequencies. If the frequency derivatives are much higher than the time derivatives, we say that FM is low, and vice versa. Visually, FM is an estimate of the (absolute) slope of the frequency traces relative to the horizontal. Note: SAP2 calculates an absolute estimate of FM, but it is easy to change the definition to obtain a signed FM.
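
A hedged Matlab reading of this definition (not necessarily SAP2's exact formula): St and Sf stand for the time- and frequency-derivatives of the spectrum at one time window, pooled across frequencies; both are illustrative names.

FM = atand(sqrt(sum(St.^2)) / sqrt(sum(Sf.^2)));  % degrees, unsigned
% Sf >> St gives FM near 0 (little modulation); St >> Sf gives FM near 90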

Glossary of terms
 
Sound Data: Sound data are defined as files of digital sound. See chapter 8 for information about digital recording, digitizing and re-sampling sounds.

Digital recording: is a sequence of sound pressure data (waveform) in a digital format. The two parameters that determine the quality of the data are sampling rate and sampling accuracy.

Sampling rate: determines how many samples of sound data are collected during one second (Hz); e.g., 44100 Hz is sometimes referred to as 'CD quality'. The accuracy of the digital representation of each sound pressure sample is measured in bits; e.g., 16 bits means that each sample of sound is represented by one of 2^16 = 65536 possible values. Sound Analysis requires an accuracy of 16 bits and a sampling rate of either 22050, 44100 or 88200 Hz.

Hertz: (symbol: Hz) is the SI unit of frequency defined as the number of cycles per second of a periodic phenomenon

Sound units: Sound Analysis distinguishes between silence intervals and signal.

Syllable: is defined as a continuous sound, bounded by silence intervals. Depending on the task, Sound Analysis sometimes treats the sound as a continuous analog signal and sometimes as a set of syllables.

Fourier transformation: Fourier transformation (FT) transforms a short segment of sound to the frequency domain. The FT is implemented algorithmically using the Fast Fourier Transformation technique (FFT).

Time window (frame): the duration of the segment of sound upon which the Fast Fourier Transformation (FFT) is performed. The Sound Analysis default is 409 samples of sound (9.3 ms at 44100 Hz); the next window starts 1.4 ms after the beginning of the previous one, and therefore has an 85% overlap with the previous window.
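
To make the arithmetic behind these defaults explicit, assuming the 44100 Hz sampling rate:

fs = 44100; win = 409;        % samples per FFT window
winMs = 1000*win/fs           % = 9.27 ms, the ~9.3 ms window
advance = 0.0014*fs           % = ~62 samples between window starts
overlap = 1 - advance/win     % = 0.85, i.e., the 85% overlap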

Spectrogram: is a sequence of spectra computed on such windows, typically represented as an image where power is represented on a scale of gray ranging from white to black. Because frequency resolution is finite, the spectrogram does not capture a ‘pure sine wave’ but represents the ‘frequency’ as a ‘trace’ of a certain distribution in time and frequency. Using long time windows improves the frequency resolution at the expense of time resolution. 

MultiTaper (MT) spectral Analysis: MultiTaper methods are a modern framework for performing spectral analysis. In particular, they lead to spectral estimates that are similar but superior to the traditional spectrogram. Apart from the estimation of the spectrogram, MultiTaper methods also provide robust estimates of derivatives of the spectrogram as well as provide a framework for performing harmonic analysis (detection of sine waves in a broadband noisy background).

Spectral derivatives: The traditional sonogram represents the power of sound in the time-frequency plane, while the spectral derivatives represent the change of power. The derivatives behave as 'edge detectors' of frequency traces in the time-frequency plane, and provide a superior spectral image.

Spectral Analysis:

Goodness of Pitch
 
Formal definition: Goodness of pitch is the peak of the derivative-cepstrum calculated for harmonic pitch
graphic

Units are comparable to amplitude and can be converted to dB by subtracting a baseline and converting to log scale (not implemented in current version).

graphic


This is a measure of how periodic the sound is. It only captures the 'goodness' of harmonic pitch, whereas both noisy sounds and pure tones give low values. Noisy sounds, however, also give high entropy.

Therefore the combination of Wiener entropy and goodness of pitch is useful:

graphic

Hardware installation


General considerations
In the previous chapter we provided a brief summary of 8 features of SAP2. If you are only interested in features 3-7 (that is, without the recorder and live analysis components), skip to section 2B (Software Installation). If you already have a recording system that includes a multi-channel sound card, microphones connected to an existing experimental setup, and you want to implement SAP2 as a recorder - but not for operant training - you should read parts II and III of this section. Otherwise, you should read the entire chapter.

Section I: Hardware installation from scratch
 
Setting training boxes

graphic
Building your own sound boxes is easy and will save you several thousand dollars. The sound isolation quality will depend on the thickness of the foam used, and on the accuracy of application, but in general, it should be easy to achieve at least 30dB of attenuation.  In zebra finches, we did not detect any penetration of bird vocalization sounds across boxes (as long as both boxes are closed and airflow is intact).

Coolers: There are many brand names (cost is about $40-$60 per box), with some differences in the quality of hinges and ease of handling. We used Thermos 110 qt coolers (outside: 34 x 16 x 17 in; inside: 32 x 14 x 15 in). It is very likely that you will have to replace the hinges after a few years of use.

Sound isolation foam: The material cost is about $50 per box. We recommend SOUNDFOAM "M" from www.soundcoat.com/absorption.htm, an acoustic-quality, open-cell, flexible polyether-based urethane foam designed specifically to provide maximum sound absorption at minimum thickness. It is characterized by outstanding humidity resistance, excellent chemical resistance, fine and uniform cell size, and consistent sound absorption. It is available plain, embossed, or with decorative and protective surface finishes of aluminized polyester film, Tedlar, and matte urethane film. Soundfoam "M" is supplied with one of several high-performance pressure-sensitive adhesives in ready-to-use custom die-cut parts. We used glazed sheets, 3/4" and 1/2" thick, for the sides and the front lid:

graphic

Use silicone glue to seal the joints between sheets (you will need a glue gun and silicone glue tubes).

Latching: Remove the latches from the cooler and replace them with metal locks. We use Stanley LifeSpan USA+ from Home Depot.

Cages: 18 x 9 x 10 Previw-Hendrix Model DB (small breeding cages with a divider, which is not used in our application).

Lights: We use LB1-W12, 12-LED Light Bars (http://www.superbrightleds.com/light_bars.htm)

graphic

A 12 VDC power supply (http://www.superbrightleds.com/specs/ps_specs.htm) of 5 A can power at least 4 bars (i.e., 4 cages). You will also need a 36" LB1 jumper power cable and at least 2 light bar mounting tracks per cage. The overall cost is around $30 per cage (including the partial cost of the power supply). You will need to do some soldering, and make sure that the power supply is secured in an appropriate enclosure (the provided enclosure is not secure enough - place it inside an appropriate electrical box).

Air supply: The airflow system should be centralized. We use a 100 W or 120 W aquarium air pump, model #LPH120 (www.Jehmco.com or http://www.jehmco.com/PRODUCTS_/HARDWARE_/Central_Air_Pumps/); this should suffice for 20 boxes or so. These pumps are very quiet since they are surrounded by a 'sound box'. You can choose between a few different capacities: the highest one (xxx Amp) should suffice for 20 training boxes, whereas the smaller ones () can support 10 or 15 boxes. The pump cost is about $300 (about $20 per box).

Airline silicone tubing: 250 ft of small-diameter tubing with good damping of pump noise, from Foster & Smith. The price is about $30.
 
Racks: use any comfortable shelving solution, and secure the coolers (e.g., with bungee cords or screws).


graphic

Sound Hardware

Microphones
There are many good (and expensive!) choices, but we feel that most condenser microphones provide reasonable sound quality. We like the EarthWorks microphones (SR69 or SRO); they have high sensitivity and a very flat frequency response up to 20 kHz. Another good, and less expensive, option is Audio-Technica:

http://www.audio-technica.com/cgi-bin/product_search/wired_mics/mics_by_type.pl?product_type=Microphones%3A+Small-diaphragm+Condenser

Some sound cards supply appropriate phantom power, and the combination works very well. One thing we discovered, though, is that these microphones should be protected from bird feces. We found that zebra finches can spray their feces straight upward (they do so while performing wheelbarrow jumps in the cage!), and a direct hit from a single piece of feces can destroy the microphone. A piece of cloth set loosely around the recording tube solved that problem.


Sound card or Analog Input card

Option 1: Buy any multi-channel sound card that is MS-DirectSound compatible (ASIO won't do). Delta cards are inexpensive and stable, and SAP2 is now compatible with both recording and playbacks via those cards:

http://www.m-audio.com/index.php?do=products.family&ID=PCIinterfaces 

Option 2: Most computers have a sound card already installed. SAP2 can use such inexpensive cards, but their quality is often low and you might experience about 5% leakage of sound across the stereo channels. You can install two such cards to allow recording from 4 channels, but in most cases you will have to use cards of different brands to avoid driver conflicts (which can be very nasty). In addition, you will need to buy a preamplifier for each microphone, and for your speakers as well.

Option 3: National Instruments Card. SAP2 is compatible with several analog NI cards. For more details about how to install those and fit them to SAP2 click here

Speakers, mirrors & plastic models
The configuration of keys, plastic models and mirrors in the training boxes appears to affect the outcome, and probably also the speed, of song learning. We are now testing several combinations of speakers, plastic bird models and mirror configurations. The most up-to-date information will be posted on our website.

  

Data Storage Solutions


Option 1: External Hard Disks (HD)
At this time (winter 2008), a 500 GB external LaCie HD in a USB II enclosure costs about $130. A new "brick" configuration with 500 GB is now available for a similar price. The advantages are that connecting these drives with USB cables is easy and requires no installation, and it is easy to connect several such HDs to a single computer. The USB II interface is fast enough. The disadvantages are that there are sometimes reliability issues with those connections, and that a backup solution is necessary.
graphicgraphic
Option 2: RAID 5 storage

RAID 5 is a disk array configuration which combines several HDs into a single, fault-tolerant drive. The cost of a RAID system is relatively high (about $1000 per terabyte), but storage is extremely fast and fairly fault tolerant: RAID 5 can withstand the failure of a single drive without losing any data. It is usually "hot swappable", which means that when an HD fails, you simply replace it without powering down.


Option 3: Burning DVDs
This solution is no longer recommended. We used to save data to DVD/R media, but the cost of labor and media no longer justifies this option.



  

How Features are used in SAP2
 
Everything that SAP2 does is based on the features presented in chapter 4, including segmentation and characterization of syllable structure, similarity measurements, clustering of syllables into types, Dynamic Vocal Development maps, etc.

In this chapter we present a few methods for exploring features and exporting them to Excel. We start by looking at feature summary statistics within an interval, within a syllable, or over several syllables. Then we will look at the distribution of continuous (ms-by-ms) feature values using the 3D "feature space" utility.

How SAP2 handles Animal Information
 
Until now we have looked at derivatives and features without identifying the animals that the data belong to. Once you insert animal information into SAP2, such as the animal's name, date of birth, training method, etc., this information is automatically retrieved each time you access data from that animal. This automatic retrieval has different implementations in different modules, but the example below will clarify the principle and the advantages of this mechanism. Overall, we recommend that each animal you analyze be identified, as shown below.

Let us start by looking at some song development data of a bird called R109. When SAP2 recorded this bird, the user had already inserted the name of the bird, but your new SAP2 installation does not have this information. To insert a new bird using the Explore & Score module, click the "New/Change bird" button. You will see an "animals" window that allows you to either select an animal or insert a new one.
Click the “New Animal” tab and fill in the fields as shown below:
graphic


From now on, each time you record from this bird, open a wave file of this bird, or access any kind of data derived from it (e.g., a syllable table), SAP2 will use the information you just entered to make more information available to you.

For example, each time you record from this bird, SAP2 will generate wave files that start with the bird's name, e.g.:

R109_39152.86207609_3_11_23_56_47.wav

We will describe the SAP2 file and table naming conventions in the next chapter.

Each time you open one of those files, SAP2 will recognize the file as belonging to bird R109 and, based on the file attributes that indicate when the data were generated and on the hatching date information, will present the age of the bird at the time this file was recorded. Let's try it:

Click “Open Sound” and open the sound
graphic

The age 66 indicates that this bird was 66 days old when these data were recorded. Once you have several birds in the SAP2 database, it is easy to look at one or a few of them according to any search and filter criteria. We will start by uploading my CCNY animals database.
Go to xxxxxxx and copy the files … to … Now exit Explore & Score (or open SAP2) and click "Database":

graphic

The first page, Animals, presents a list of animals we have experimented with. Use the mouse to click on different records and scroll up and down; note that the data entry fields below are updated accordingly.

If you would like to change a bird's entry, or to add a new bird (perhaps based on values already entered for another bird), click "Allow Add/Update". The table and fields turn yellow, indicating that you can now edit the data fields. If you change any field other than "Name" and then click Add/Update Record, the information about the bird will be updated according to your changes. If you change the "Name" field and then click Add/Update Record, SAP2 will add a new bird at the end of the table.

graphic

How to practice this tutorial in MCC or in Matlab

This tutorial can be practiced in either the MySQL Control Center or in Matlab.

To use the MySQL Control Center (MCC): open MCC (Start -> All programs -> MySQL Control Center).
If this is the first time you have opened MCC, please follow the instructions in section 6c. If the server (root) is not connected, right-click on it and choose 'connect'. Once the console manager shows the databases, select 'sap' so that the tables of the 'sap' database are shown in the right panel. Double-click on a table to open it, and then click the 'sql' button (below the console menu). You can then easily type your query and click '!' to execute it. The results of the query (if any) should show up in the lower panel.

To use Matlab: download the files () from the link in () and copy them to ..\Matlab… This free utility will allow you to use MySQL in Matlab using the function 'mysql()'. To start type
mysql('open');

And Matlab should say something like:
Connecting to  host=localhost Uptime: 81473  Threads: 5  Questions: 5529  …

Then type
mysql('use sap');

and Matlab should say “Current database is "sap"” and that's it. You are now ready to type queries.

 
  

improved database

graphic

improved similarity score

graphic

Input Display

Let's now look at the Input Display more thoroughly:

graphic

Clicking on Channel Settings opens a pop-up menu that allows you to change all the parameters of that channel. Everything you see in the input display and control window is a subset of those parameters, "exteriorized" to give you express access to them on the fly. We will first present those exteriorized parameters, and then systematically go through the channel settings menu.

graphic
Input channel identity shows you the current input channel. In order to see the entire identity string, click the “long” button in the control panel.

The two LED lights - green and red - are very important:

Recording Indicator: The green LED, when turned on, tells you that the channel is currently recording data to the hard disk. The behavior of the indicator depends on the channel mode; for example, in Active mode, the green light turns on when recording is triggered by that channel.

Trimming Indicator: The red LED, when turned on, indicates trimming (bad news). Trimming occurs when the channel input amplitude reaches the maximum value of the digital recording range. Such a recording is of poor quality and suffers frequency distortions. This is particularly an issue when the bird makes very loud calls and very soft songs. If you are mostly interested in recording high-quality songs, it is not a big deal if trimming occurs while recording very loud calls.
graphic

                    Input gain (yellow), Trigger threshold (red) & Display gain (blue)


Each channel has three color sliders: yellow (input gain), blue (display gain) and red (peak threshold).

Input gain (yellow) slider controls the gain of the channel, namely it can increase or decrease the volume of the input signal. For example, setting it to 2 will give you a wave file of twice the intensity. If you find that you are trimming data (red light on) you can turn it down a bit, or, if you want to maximize gain, you can go up until trimming starts occurring. Note that there is an accuracy (recording quality) cost to changing the gain, although it is often low or negligible when you change the gain by a few notches only. We generally recommend keeping the gain at or around 1 and, instead, changing the signal amplitude prior to digitization. This can be done either by your external amplifier (if you have one) or by the control panel of your sound card (or of the NI card). To understand the cost of using the gain control, consider the following example: in 16-bit recording, the input signal is mapped onto numbers ranging from -32,768 to 32,767. Now, if your signal is weak compared to that range (say, ranging from -3,200 to 3,200) and you set the input gain to 10, your signal will look loud in the wave file, since it covers the entire digital range. However, multiplying by 10 gives numbers in steps of 10, and in fact yields a recording quality of about 12 bits instead of 16. If, on the other hand, you amplify the input signal in the sound card control panel, you do not lose any accuracy.
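
The resolution cost in this example can be made explicit with a back-of-the-envelope line in Matlab (illustrative arithmetic, not a SAP2 computation):

gain = 10;
effectiveBits = 16 - log2(gain)  % about 12.7, i.e., roughly 12-13 bit quality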

Trigger threshold (red) slider sets the threshold for triggering a recording session. When the channel is recording, the threshold is shown as two dotted red lines in the oscilloscope display, as shown in the earlier figures.

Display gain (blue) slider is the flip side of the peak threshold slider. It controls the gain of the signal display and has no effect on the recorded signal. However, it does affect the recording triggering just as the peak threshold slider does. For example, if you want to record soft sounds it is more convenient (visually) to increase the peak gain than to decrease the peak threshold.
  

instant clustering

You can now cluster your sound in a few clicks, and instantly see a visualization of the cluster analysis on top of each sound.

graphic

Interpreting the similarity score


graphic
Scoring similarity between two sounds as described above might work well in some cases and less well in others; it should be used carefully and wisely. The outcome of similarity scoring depends heavily on appropriate scaling of the features to units of median absolute deviation from the 'population' average. The next chapter explains how to scale features and when new feature scaling should be considered. A related factor is feature weight: the default assumption is that the five features are equally important. This assumption has not been tested but, empirically, giving an equal weight to the five features works well for scoring song similarity across adult zebra finches. Each feature has different strengths and weaknesses, and together they complement each other. The feature most likely to cause you trouble is pitch: pitch is sometimes difficult to calculate, and an unstable pitch estimate might bias the scoring procedure.

Reading about the complexities involved in calculating similarity scores, you might wonder about the consequences of improper use of these methods. Compared to the human-observer scoring method, the automated approach has pros and cons. No doubt, the judgment of any human observer is preferred over automated methods. The main difference is that automated methods provide well-defined metrics for distances between sounds and can quantify subtle differences. Statistically, however, you should handle automated similarity scores just as you would handle human scores, except that you might consider using parametric methods (if the score distribution appears to be normal) - but this is not a big issue. If at the end of the day all you care about is whether two groups of animals differ in their sounds, it might not matter much how the scores were calculated, under what assumptions, etc. For example, if you use the feature scale of zebra finches on monkey sounds, and find strong differences in similarity scores across two groups of animals using some non-parametric estimate of the scores, the difference is real regardless of the strong biases introduced by using the wrong normalization. Still, you do not want to use the wrong normalization: it means that some features get a higher weight than others in the overall estimate, which might reduce the sensitivity and reliability of the scoring method, making it less likely that significant differences will be found. Further, if you did find an effect, it might be due mostly to a single feature that was unintentionally given high weight and biased the score. Overall, in most cases you will want to use the scoring method that maximizes the difference between your groups. The actual p-value used for the threshold is just a yardstick, and it has nothing to do with statistical significance per se.
  

intro

Click on the images to proceed

graphic

Introduction to Spectral Analysis

This chapter presents some concepts of Spectral Analysis and Acoustic features including the minimal knowledge base needed in order to properly use SAP2 (and to understand the next chapters). We will use the Explore & Score module to present those concepts. This module is similar to the previous versions of Sound Analysis with several new features (you will also note that real-time performance is now about 20 times faster).

Because of time constraints, a few features that existed in previous versions of Sound Analysis Pro have not yet been implemented in the new version. Previous versions provided three spectral views: derivatives, sonogram, and contours; the current version, however, only includes the spectral derivatives and a low-quality display of the sonogram, but no contours display. We also did not implement the 'frequency range' and 'number of tapers' controls.

Introduction to clustering

Background & computational approach



In the previous chapter (DVD maps) we saw an example of song development where the syllable-feature distribution is broad during early stages and then becomes clustered. Most clusters (or types) appear to stay, and one can often keep track of each cluster throughout the rest of the movie. This cluster analysis module performs the (difficult) task of automatically classifying syllables into types and tracking each type over time. We use the terms 'type' and 'cluster' interchangeably, but one should keep in mind that a cluster is not a generic type, but a transient entity that is only defined in one animal (during a particular time). More often, the term 'syllable type' refers to a species-specific sound, sometimes in relation to perceptual categories or other behaviors. The narrow definition of type has the advantages, however, of being simple, robust and 'analytic'.

The clustering method is, at best, only as good as the segmentation method. We have no doubt that both the cluster analysis and segmentation methods presented here are sub-optimal. Aylin Cimenser, a physics PhD student at Columbia University, is now in the process of optimizing both methods, and we believe that it should be possible to properly segment and cluster vague sounds that the current technology cannot handle.

The cluster analysis methods used here were implemented by Aylin Cimenser at Columbia University under the supervision of Partha P Mitra and Tim Halperin Hilly. The algorithm is based on the Nearest Neighbor Hierarchical Clustering (NNC) method. This is a non-parametric, density-based method (in contrast to Gaussian mixture methods and K-means, which are parametric). It works nicely when clusters are reasonably segregated, regardless of their shape, and the back-tracing method is fully automated and self-checking. We cannot claim, however, that this approach always works: occasionally, clusters that can be identified visually are not easily identified automatically. Users should keep in mind that a perfect solution to clustering problems should not be expected. It is no coincidence that there are many alternative cluster analysis methods on the 'market' - there is no generic solution to this problem. Namely, one cannot cluster data without making some strong assumptions about either the shape or the density of clusters, and those assumptions are sometimes false. For example, NNC does not handle well cases where dense clusters are too close to sparse clusters.

More significantly, in song development, clustering has to be performed in recursive steps, starting from the end of song development, where identifying clusters is very easy, and concluding at an early stage of song development, where clustering becomes increasingly difficult and finally impossible. The difference between approaches is not whether the procedure will fail - but when.

NNC is a very simple algorithm and we encourage you to understand it (a sketch in Matlab follows the steps below):

1.    Obtain a sample of a few thousand (say 3,000) syllables produced at a given time and compute features.
2.    For each feature (e.g., syllable duration), calculate the Euclidean distances between all possible (3000 x 3000) pairs of different syllables. Repeat for all features, scale the units, and add. This step includes about 50 million multiplications (but takes only about 1 s).
3.    Sort syllable pairs according to Euclidean distances in ascending order and discard syllable-pairs with a distance greater than a certain (say 0.01 MAD) threshold.
4.    Now the best-best-friends, the syllable pair of shortest distance (the nearest-neighbor pair), establish the first cluster.
5.    Examine the next pair of nearest neighbors and consider the following scenarios: 
n    Case A. If both syllables are new (not clustered yet), define another cluster with this pair. 
n    Case B. If one of the syllables is already a member of a cluster (remember that any one syllable can appear in many pairs!), and its pair is not a member of any cluster, add this pair to that same cluster (that is: a friend of my friend is my friend!).
n    Case C. If one syllable belongs to one cluster, and the second one to a different cluster - merge the two clusters.
n    Case D. If both syllables belong to the same cluster, do nothing.
6.    Repeat step 5 for all the remaining pairs (with distances below the threshold) in order of distance.
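
Here is the promised Matlab sketch of steps 2-6 (not SAP2's implementation): X is an n-by-d matrix of scaled syllable features and thr is the joining threshold in MAD units; both names are illustrative.

n = size(X,1);
label = zeros(n,1);                      % 0 = not clustered yet
nextId = 0;
[I,J] = find(triu(true(n),1));           % all pairs of different syllables
d = sqrt(sum((X(I,:) - X(J,:)).^2, 2));  % Euclidean distances (step 2)
keep = d < thr;                          % discard distant pairs (step 3)
I = I(keep); J = J(keep); d = d(keep);
[~, ord] = sort(d);                      % ascending distance (steps 4-5)
for p = ord'
    a = label(I(p)); b = label(J(p));
    if a == 0 && b == 0                  % Case A: start a new cluster
        nextId = nextId + 1;
        label([I(p) J(p)]) = nextId;
    elseif a == 0                        % Case B: join the friend's cluster
        label(I(p)) = b;
    elseif b == 0
        label(J(p)) = a;
    elseif a ~= b                        % Case C: merge the two clusters
        label(label == b) = a;
    end                                  % Case D: same cluster - do nothing
end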

In many cases you will find that the default threshold suffices to properly cluster the data, but be aware that the choice of threshold is critical: setting the threshold too high will join all the data into a single cluster. The threshold is determined empirically, and if it is not optimal you will experience one of two problems: if the threshold is too conservative, you will see too many clusters and many residual (unclustered) syllables; if the threshold is too liberal, you will see 'false joining' of distinct clusters. In some cases (the most difficult ones) you will see both false joining and many residuals. To address these cases, you can endow certain features with veto power. The meaning of veto is: 'do not merge two clusters if the difference between them is too big with respect to a specific feature', so even if a pair of syllables (in case C) is really tight, SA+ won't join their clusters if the centers of those clusters are too far apart with respect to one feature. For example, you can tell SA+ to refuse to merge two clusters if the duration difference between them is more than 0.5 MAD (which is 36 ms in the zebra finch). If this 'distance gap' remains empty of other clusters, those two clusters will never merge.
  

Introduction to DVD maps

The online analysis of vocalization and the automatic generation of the syllable table are the cornerstone of SA+, and one of the most useful descriptive views of such tables is the dynamic vocal development (DVD) map, which shows vocal changes as movies. You should keep in mind, however, that segmentation of sound into syllable units is a double-edged sword: it can uncover song structure if properly used, but it might also chop the sound into inappropriate units. This is not only because the criteria for segmenting sounds are not always robust, but mainly because the deep structure of a vocalization bout is not trivially related to syllable units. The binary feature files that are automatically created during live analysis keep a record of the non-segmented (continuous) feature curves. SA+ does not provide a means of explicitly analyzing these data, but you can easily export them to Matlab.

Introduction to processing live

The automated sound recognition system


A brief review of the SAP2 Recording & Live processing functionality

Automated animal-voice recognition is a principal feature of SAP2, making it possible to record and analyse an entire song development. Please read this section carefully and make sure you understand the theory, practice, and limitations of the automated sound recognition utilities.

We first present the computational framework in a nutshell, step by step. Steps 1-3 take place in the Recorder, whereas steps 4-9 take place in the Sound Processing Live module.

1. The SAP2 Recorder captures sounds from an audio channel into a memory ring buffer. It examines the sound amplitude in real time and, if the amplitude is higher than the background noise level, a recording session starts. Recording to a temporary wave file continues until the sound amplitude stays below threshold for some time.

2. While recording, SAP2 keeps monitoring the sound amplitude, and recording stops when the amplitude stays below the background threshold for a certain duration (say 1 s); immediately after, the temporary wave file is saved. This first-phase procedure is the only 'real' real-time component of the SAP2 Recorder.

3. The recorder then makes its final decision whether to save the file or delete it, based on the number of peak amplitude events observed during the recording. This stage eliminates long silence intervals and very short clicks. The wave file is now moved to the input folder of the Sound Processing Live module.

4. The 'Sound Processing Live' application now takes over. Note that we separated the recorder from the SA+ application; this is done to ensure that recording will persist no matter what happens during the later stages of analysis. Once a sound file has been forwarded to analysis, it is captured by SAP2 processing within several ms. Since this off-line processing occurs almost in real time, it is called pseudo-real-time analysis.

5. The module first performs MultiTaper spectral analysis of the recorded sounds and extracts the song features.

6. Based on the amplitude envelope and on Wiener entropy values, the sound is segmented into syllable and bout units. Additional "noise detector" filters might be used at that stage to eliminate sounds that are not species-typical.

7. A final decision whether to accept or reject the sound is made based on the number of syllables, their durations, and the bout duration.

8. If the sound was rejected, the file is deleted.

9. Otherwise, the module saves all or some of the following:
            - The wave file (under a new name)
            - A table of syllable features
            - A table of raw features (every 1ms)

 


Based on this analysis (which runs about 10-20 times faster than the real-time progression of the sound), SA+ decides whether to save the sound or not.
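
A toy Matlab sketch of the triggering logic in steps 1-3, run offline on a waveform w sampled at fs Hz (the threshold thr and the 1 s stop gap are illustrative values, not SAP2's actual parameters):

frame = round(0.010*fs);              % 10 ms envelope frames
amp = sqrt(movmean(w.^2, frame));     % RMS amplitude envelope
above = amp > thr;                    % step 1: amplitude test
stopGap = round(1.0*fs);              % step 2: stop after 1 s of quiet
recording = false; startIdx = 0; lastAbove = -inf;
sessions = zeros(0,2);                % [start stop] sample indices
for i = 1:numel(w)
    if above(i), lastAbove = i; end
    if ~recording && above(i)
        recording = true; startIdx = i;     % trigger a recording session
    elseif recording && i - lastAbove > stopGap
        recording = false;
        sessions(end+1,:) = [startIdx i];   %#ok<AGROW> candidate session
    end
end
% step 3 would now count peak events in each session to save or delete it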
  

Introduction to Sound & Brain

  

Licence

Sound Analysis Pro version 2 is provided under the terms of

GNU GENERAL PUBLIC LICENSE Version 2, June 1991 Copyright (C) 1989, 1991 Free Software Foundation, Inc. 

Note: Sound Analysis Pro is provided "as is" with NO WARRANTY. However, please feel free to email ofer@sci.ccny.cuny.edu if you have any questions, suggestions or concerns.

live

graphic

live_1


10 recording and 8 analysis channels

graphic

Looking at the Tables in MySQL Control Center
 
SAP2 can auto-generate all the tables if they are not found in the database. All the SAP2 tables are located, by default, in a database called SAP, at c:\mysql\data\sap. Changing the location of this database (e.g., to another drive) is not a trivial task. To simplify it, we provide a separate application, "change_mySQL_data_place.exe", which you can find in the c:\sap\ folder. To use it, insert the destination folder
graphic
and click "do it!". Note that the tables located in the original c:\mysql\data\ folder might be essential for the mySQL server to work, and you might need to move that folder to the same location. You can access the tables via Matlab or via Excel, but the native table browser is the MySQL Control Center (MCC). If you do not like MCC, a more user-friendly alternative is MySQLyog (http://webyog.com/en/). When opening MCC for the first time you will need to start a new session (just give it a name, keeping all the default settings); you can then "connect" and observe the databases. Double-click the SAP database to observe the tables:
graphic

master-slave recording

graphic

Mean Frequency
 
Formal definition: mean frequency, in Hz, is calculated as a weighted mean of frequencies, using both the time-derivative and the frequency-derivative of the spectrum at each frequency f as weights.

Note that the mean is of the squared spectral derivatives. A similar result could be obtained with power but, empirically, we find the derivative-based estimate to be smoother.


The mean feature replaces the peak frequency.


Note that, in contrast to peak frequency, mean frequency provides a smooth estimate of the center of derivative power and does not 'stick' to any single frequency trace, as shown in the example above.
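
A hedged one-line Matlab reading of this weighted-average definition (f is a vector of frequency-bin centers and D the spectral-derivative magnitudes at one time window; illustrative names, not SAP2's exact code):

meanFreq = sum(f .* D.^2) / sum(D.^2);  % derivative-power-weighted mean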

Mean Frequency Amplitude (MFA)
 
This corollary (minor) feature calculates the amplitude within a dynamic range of 500 Hz around the mean frequency. MFA is sometimes useful for sub-syllabic segmentation because its value changes sharply during some sub-syllabic transitions. Also, the gap between amplitude and MFA (use 'group features' to observe them together) sometimes changes in an interesting manner during the song and during song development.

mySQL Tutorial for SAP2

We will start by creating a very small syllable table. Open SAP2 'explore & score' and click 'New table'. Call the table table1. Now set the amplitude (yellow slider) threshold to 42 dB. Next, click 'open sound' and open the sound 'bird102_long.wav' (you may download this file from http://). Select graphic and then outline the entire sound (click to the left of the first syllable, then move the scroll bar below all the way to the right before clicking at the end of the last syllable). You should now see 45 records (syllables) in the table.

7c. The SELECT keyword


Once you understand how to use the SELECT keyword, you can already achieve a lot. Open MCC, navigate to the sap database and click the SQL button; or open Matlab (after installing the mySQL interface), open sap as shown above, and then type:
MCC:  select duration, mean_pitch from table1; and click !
Matlab: [duration pitch]=mysql('select duration, mean_pitch from table1');

In MCC you should see the result of your query like this:

graphic

If you use MCC, select the entire output table, press Ctrl-C (to copy the fields to the clipboard), then open Excel and press Ctrl-V to paste. Click Data -> Text to Columns, choose 'Delimited', click 'Next', and in 'Other' type '|'

graphic
And then click Finish. You can now create an X-Y plot of duration versus pitch.

In Matlab, type
plot(duration, pitch, '.r');


graphic

Although the sample size is really small, you can already see that some of the data are clustered. Let's query to obtain data that belong to one cluster; for example, we can ask for data where duration is between 100-150 ms and pitch is between 600-1100 Hz:

MCC: select duration, mean_pitch from table1 where duration>100 and duration<150 and mean_pitch>600 and mean_pitch<1100;
Matlab: [c1_duration, c1_pitch]=mysql('select duration, mean_pitch from table1 where duration>100 and duration<150 and mean_pitch>600 and mean_pitch<1100;');
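
In Matlab, you can overlay the cluster on the full scatter plot (a small sketch using the variables returned by the queries above):

% Full data in black, the queried cluster in red
plot(duration, pitch, '.k'); hold on;
plot(c1_duration, c1_pitch, '.r'); hold off;
xlabel('duration (ms)'); ylabel('mean pitch (Hz)');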
  

oscilloscope

graphic



Score by pairs, and M x N modes


These modes are very similar. In both, the entire content of each sound file is compared to the other, as if you had opened the two files and scored them as a whole, without any outlining. Use the Drive, Folder and File-list boxes to navigate to the folder that contains the sound(s) of interest. Highlight the file(s) of interest and then highlight either Sound 1 or Sound 2 by clicking on it. Then click the Add button and the file should move to the highlighted box, as shown below.

graphic

You can add the same file as many times as you like, and files can be selected either by clicking or by clicking and dragging. If the number of files in the two lists is not the same, the batch will stop when the shorter list is exhausted. In batch mode the results are automatically stored in a table. In contrast to the syllable tables, SA+ keeps only a single similarity table. You should empty this table at the beginning of an analysis and save its content to Excel at the end of the batch. In addition to saving the scores in the table, SA+ also shows a display of the scores in the edit box. This display is only for visual feedback, but if you wish, you can highlight this text, copy it to the clipboard (Ctrl-C) and paste it into another application (Ctrl-V).

  

Performance control

Idle Interval: Because the Sound Processing Live procedure handles input sound files from multiple channels, and in order to maximize the speed of the frequency analysis, this module might be less responsive to user input. The idle time between processing gives the module some extra time to respond to your input. Increase its value if you want to improve responsiveness. If you set an idle time of 100 ms, SA+ will wait 100 ms after finishing the processing of any one sound. The faster your PC, the less idle time is needed.
graphic
Slow Down Execution: Depending on your system, demands imposed by the Sound Processing Live module might interfere with the Recorder module. In particular, the playbacks regimen might crash if the processor is 100% busy with analysis for too long. To overcome this problem, you should slow down the execution of the live analysis. This is done by halting execution for very short durations during the spectral analysis. You want to decrease processor load to about 60-70%. To do so, start the Windows Task Manager (Ctrl-Alt-Del) and click Performance.
graphic

Then start the Live analysis while the SA+ Recorder is running, and watch the CPU usage during the analysis (as sonograms appear on the screen). Increase the 'Slow-down execution' value until you see that the peak CPU usage is down to 80% or less.



Period of repetition


Period-of-repetition is somewhat similar to the concept of a song motif in that it is a repeated unit; it is an estimate of the typical interval duration between two consecutive renditions of similar sounds. A song motif, however, may contain repeating sounds, so the two measures may differ (SA+ does not calculate motif duration -- only period). Earlier we presented the similarity score as a means of evaluating similarity between two different sounds based on the Euclidean distance between their features. The same idea can be applied to finding similar sections within a single interval of sound (typically, a long one): We start by choosing a frame of sound (insert point) at a random location within an interval of sound outlined by the user and observe the features in a 70 ms window around it. A 'silence' is not allowed to be an insert point. We first need to depart from the insert point by moving the window forward in time until we encounter sound that is different enough from the original sound to meet a similarity rejection threshold. The default value is 75% similarity. We then keep sliding the window until we find a sound that is similar enough to the insert point to meet a similarity acceptance threshold. The default is 95% similarity. To obtain a stable estimate of the typical period, SA+ examines a random sample of 50 insert points, and measures the interval between each insert point and the next rendition of a similar sound. The output is the median of those measurements; that is, it is an estimate of the typical period of sound repetition. To obtain a good estimate of the time interval between two similar renditions you will need to outline a relatively long sound interval. We cannot tell you exactly how long the interval should be - but it should contain several renditions of syllables; e.g., 10 s is reasonable for most bird songs. You can combine several song bouts into one file and then calculate the period. The stability of the measurements should be validated experimentally (e.g., try it on different samples of sounds and make sure that the results are similar).
To sum up:
Period= a typical interval between two occurrences of 'the same' sound, starting from a random insert point (anywhere in the outlined interval).

Here is the algorithm:

1.    Period is calculated by implementing the similarity measurement (see next chapter) as follows: Starting from a randomly selected frame within a 10 s epoch, the time frame is first moved forward from the point of origin until it encounters a frame that is less than 75% similar to the sound at the selected frame, so as to depart from the original sound. The time window is moved further forward, until it encounters a sound that is at least 90% similar to the original frame. This gives one instance of the time elapsed between two similar sounds. This procedure is repeated 50 times, each time starting from a different randomly selected frame within the 10 s epoch, and the median of the durations thus identified is a measure of periodic structure in the song. We call this the median period.

Median period is calculated as follows:
1.    Choose a random insert point in a sound interval outlined by the user, and make sure that the insert point contains sound signal. Define a reference window r of 50 ms around the insert point.
2.    Set a similarity rejection threshold probability, say PTH=0.75
3.    Define a second 50ms time window b=r.
4.     Move b one frame forward in time.
5.    For each feature, calculate the Euclidean distance across the sequence of that feature in r and b, frame by frame, starting from the first frame f1 until the last frame fn:graphic
6.    Transform Ds to the p value of achieving such similarity by chance alone (for detailed description of this transformation see Tchernichovski et al. Animal Behaviour (2000) 59, 1167-1176).
7.    While (Ds>PTH) repeat steps 4-6. // Keep moving the window b until it is no longer similar to r.
8.    Set a similarity acceptance threshold probability, say PTH=0.92
9.    While (Ds<PTH) repeat steps 4-6. // Keep moving the window b until it is similar enough to r.
10.  Period is the time difference between r and b.
11.  Repeat steps 1-10 50 times and calculate the median period.
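
The loop structure can be summarized in a short Matlab sketch. Here simScore and randomSoundFrame are placeholders for SAP2's feature-based similarity and insert-point selection, not actual SAP2 functions:

function P = medianPeriod(featSeq, win, rejectTh, acceptTh, nTrials)
    % featSeq: per-frame feature matrix; win: window length in frames
    periods = nan(nTrials, 1);
    for k = 1:nTrials
        i = randomSoundFrame(featSeq);     % random insert point on sound (not silence)
        j = i;
        while simScore(featSeq, i, j, win) > rejectTh
            j = j + 1;                     % depart from the insert point
        end
        while simScore(featSeq, i, j, win) < acceptTh
            j = j + 1;                     % find the next similar rendition
        end
        periods(k) = j - i;                % period, in frames
    end
    P = median(periods);
end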

 


In addition to the period measure, SA+ also calculates a 'random period' measure: here, instead of progressing from the insert point until we find a match, SA+ makes a random search in the outlined interval until it finds a matching sound. In the ideal case of fully periodic syntax, e.g., 2,3,4,5,2,3,4,5,2,3,4,5… and given an insert of 5, the odds of obtaining 5 in each random draw are 25%, and the mean random period is 4, which is identical to the period. However, if the sound syntax is 2,2,2,3,3,3,4,4,4,5,5,5,… then the random period will remain 4, but the median estimate of period will be much shorter than the random period.

Note: both Period and Random-period are based on random sampling of the data, and will never give you the same answer twice. You must compute them several times and make sure that the estimates give you a stable central tendency.  
  

Pitch
 
Formal definition: Pitch is measured in Hz or cycles per second. We calculate two pitch estimates: harmonic pitch and mean frequency.
Mean frequency is defined above whereas:

graphic

Harmonic_Pitch is calculated as the cepstrum peak (the cepstrum is calculated as the spectrum of the log spectrum), except that instead of the log spectrum we use the derivative spectrum. At any given time, pitch might be either harmonic, sinusoidal (whistle) or not well-defined. In the two latter cases, mean frequency provides an appropriate pitch estimate.
Hence, at time window t, we calculate pitch according to three threshold parameters T1, T2, T3.

Pitch(t) = mean frequency
   IF the harmonic pitch estimate is higher than T1,
   OR IF Wiener entropy is lower than T2 AND goodness of pitch is lower than T3;
Otherwise Pitch(t) = harmonic pitch.
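
The same rule as a Matlab sketch (T1, T2, T3 and the per-frame estimates stand for whatever values you configured or computed; the names are illustrative):

if harmonicPitch > T1 || (wienerEntropy < T2 && goodness < T3)
    pitch = meanFrequency;   % tonal or ill-defined sound: use mean frequency
else
    pitch = harmonicPitch;   % harmonic sound: use the cepstral estimate
end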


Note that harmonic pitch is typically low; hence we reject harmonic pitch estimates that are higher than the natural range (e.g., in zebra finches harmonic pitch rarely approaches 2 kHz).



Pitch is a measure of the period of oscillation. It is the only feature that requires careful adjustment based on three parameters. Tonal pitch, as in a whistle, is simply THE frequency of the sound, whereas harmonic pitch is an estimate of the fundamental frequency of a complex sound composed of harmonically related frequencies. The fundamental is equivalent to the typical frequency difference between consecutive harmonics (the common denominator of the harmonics). The main challenge is the automatic distinction between tonal pitch and harmonic pitch: this distinction is species-specific and might vary with recording conditions as well. Nevertheless, pitch is a central feature of song, and with careful adjustments one can often obtain a good estimate. Furthermore, when we calculate the mean pitch of a syllable, we adjust the weight of each time window by the goodness of pitch, which often stabilizes the mean pitch estimate of that syllable type.

SAP2 distinguishes between the two based on three considerations: first, harmonic pitch is often frequency-bounded; e.g., in zebra finches we rarely see harmonic sounds with a fundamental higher than 1800 Hz. Therefore, we reject a cepstral estimate higher than this threshold and prefer the mean frequency estimate. Second, if the goodness of pitch is very low, pitch is unlikely to be harmonic. Third, if both goodness of pitch and Wiener entropy are low, pitch is even less likely to be harmonic. You can manipulate those parameters in the options.


Playbacks Control:

Playbacks are often used in birdsong research. We developed the playback controls primarily to allow automatic song training, but they can be used for other purposes as well. We train our birds with operant song playbacks triggered by key-pecks, or with passive playbacks, randomly triggered. The keys are represented by the circles. A yellow circle indicates that the key is in an off position, red indicates that the key is pressed down (on), and blue indicates “not in session”.


graphic

  




 

8a. Introduction to similarity measurements



The similarity measurements implement the same feature-based metrics of Euclidean distances used for the cluster analysis, except that here we do not look across thousands of syllables, but rather at the time course of feature values across two specific sounds. Similarity measurements should not be used for comparing simple sounds (such as two tones); in such cases, comparing the mean and range of features is preferred. Similarity measurements are necessary for comparing two complex sounds, e.g., a pair of songs or several pairs of complex syllables. Although SA+ segments sounds into syllables, this segmentation is not used as a unit of similarity analysis; that is, we compare everything to everything in the two sounds regardless of syllable structure.

Limitation: Similarity measures must be tuned to each species. The current version is tuned to zebra finches, but SA+ makes it very easy to set and save feature scales for other species: first set all the sliders to the new positions, then click 'save new scale' and type the new species name. You can then browse between the settings by clicking the up and down arrowheads. Note that SA+ will always start with the default setting (zebra finch), so you should remember to set it each time to the appropriate species.
graphic

The aim of analysis is to address three issues:

·     Assessing the likelihood that two sounds are related to each other.
·     Quantifying the accuracy of the vocal match (assuming that sounds are related).
·     Accounting for the temporal order (syntax) of sounds when scoring similarity.


Similarity measurements in SA+ are quite different from those used in previous versions of Sound Analysis:

·     Similarity measurements are more accurate and more transparent to the user.
·     Both symmetric and asymmetric similarity measurements methods are implemented. 
·     Partial similarity scores are shown for each syllable.
·     Users can open two long recording sessions (of a few minutes each) and perform partial similarities much more efficiently without draining memory.
·     New features include amplitude modulation (AM) and goodness of pitch. We eliminated the spectral continuity feature.
·     Automated batching provides a link between cluster analysis and similarity measurements, allowing fully automated measurement of thousands of sounds, which are automatically opened, outlined and scored.
·     The memory management scheme has been altered to reduce memory allocation errors and optimize the section detection.

  

Principal Frequency
 
For each time window ti, we call the longest contour (over time) the principal frequency. Principal frequency often captures sub-syllabic transitions because it tends to remain continuous within a note, but changes abruptly during transitions.

Raw features DVD maps

  

recorder



graphic


Step 7: Running the batch
                                                                                                                        
You should now be ready to run the batch. Click "Do It All" (if it is disabled, it is probably because you did not set a bird in step 4).

Running the batch will automatically switch display to the "batch progress" tab:
graphic

This display gives you important information about the batch. The status indicator should turn red within a few seconds. This means that the batch is actually working. Once the batch is done this indicator will turn green.

The other information is self-explanatory; keep monitoring it occasionally, making sure that table names are correct. E.g., we can see that the raw features table name raw_R109_95 indicates that this bird was 95 days old at the time of recording, confirming that the hatching date information is correct.
graphic

  

Saving the data

Based on the amplitude envelope and on Wiener entropy values, the sound is segmented into syllable and bout units. A final decision whether to accept or reject the sound is made based on the number of syllables, their duration, and the bout duration:
graphic

The criteria for saving a sound file (and its features) are a combination of these conditions. Note that the longest syllable and bout duration are determined in part by the amplitude and entropy thresholds. Make sure that those thresholds are appropriate by observing the light-blue and red lines at the bottom of the spectral images.

Segmentation to syllable units
 
One of your most important decisions when analyzing vocal sounds is choosing methods and setting parameters for distinguishing vocalization from background noise. At the macro scale, this allows you to detect bouts of vocalization; at a shorter time scale, it allows you to segment the sound into syllables (vocal events with short stops between them) and silences.

Even though some analysis can be done on the continuous signal in a file, once vocal events are identified and segmented it is possible to do much more, e.g., identify units of vocalization, classify sounds, and compare them by similarity measurements and clustering methods.

In this chapter we focus on non-real-time analysis, but similar approaches for identifying vocal sounds are also used in real-time analysis during recording. In real time, however, we usually need to take intermediate steps to give way to higher-priority processes (the recording itself): Sound Analysis Recorder first makes a crude decision about which sounds should be temporarily saved and, a few seconds later, the live-analysis engine performs proper segmentation and decides which sound files should be processed and permanently saved to specific folders.

In SAP2, detection of animal sound is based primarily on the amplitude envelope. However, certain spectral filters can be set to reject noise or band-limit the amplitude detection. We offer the following approaches:
  1. Use a fixed amplitude threshold to segment sounds
  2. Use a dynamic (adaptive) amplitude threshold to segment sounds
  3. Write your own query for custom segmentation based on various features
  4. Export raw feature vectors to Matlab and design your own algorithm there
In this chapter we only cover approaches 1 and 2. Approach 3 is documented in the batch chapter, and approach 4 under exporting data.

Using a fixed amplitude threshold to segment sounds - One of the simplest and most widely used methods for segmenting sounds is by a fixed amplitude threshold:
Open Explore & Score and ensure that “fine segmentation” is turned off (see Fig 1 below).

img_002
Fig 1: Fine Segmentation "off"


Open your sound file or use Example1 (found in the sap directory) and then move the amplitude threshold slider (the one closest to the frequency axis) up to about 43 dB:

img_004
Fig 2: Amplitude Threshold Slider


The yellow curve shows the amplitude, and the straight yellow line is the threshold. Amplitude is shown only when above threshold. Syllable units are underlined in light blue below them, and bouts are underlined in red.
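
As a minimal Matlab sketch of this fixed-threshold rule (amp is a hypothetical per-frame amplitude vector in dB):

th = 43;                               % amplitude threshold from the slider
above = amp > th;                      % frames above threshold
on  = find(diff([0; above(:)]) == 1);  % syllable onsets
off = find(diff([above(:); 0]) == -1); % syllable offsets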

Note the segmentation outlines at the bottom of the sounds:

img_006
Fig 3: Segmentation outlines

Additional constraints on segmentation can be set, so as to reject some sources of noise. Here is an example:
Set the “advance window” slider to 2 ms, set the amplitude threshold to 30 dB, and open example3:

img_007img_010
Fig 4: Frequency of syllables


As shown, the last 3 ‘syllables’ are actually low-frequency cage noise. Move the mouse to just above the noise level while observing the frequency value in the Features at Pointer panel (see red arrow). As shown, most of the noise is below 1500 Hz, whereas most of the power of the syllables is above that range.
We are not going to filter out those low frequencies. Instead, we will use this threshold to make a distinction between cage noise and song syllables: Click the “Options & Settings” tab, turn the high-pass noise detector on, and change the frequency to 1500 Hz:

img_012
Fig 5: High Pass - Noise Detector

Go back to sound 1, and click update display below the sonogram image:

img_014
Fig 6: Noise - No longer detected


Note that most of the noise is no longer detected as vocal sound:

img_016
Fig 7: Noise isolated from vocal sounds


This filter does not affect any analysis of the remaining vocal sounds. This is because we set the noise detector filter as an additional criterion (on top of the amplitude threshold) that eliminates ‘syllables’ where more than 90% of the energy is in the noise range.
There are several other controls that affect segmentation indirectly. Those include the FFT window, the advance window, the band-pass filters on feature calculation, etc.
Here is an example of using the band-pass filter: turn the noise detector off and update the display so that the noise is once again detected as vocal sound. Then move the right sliders as shown:
img_018
Fig 8: Noise isolated from vocal sounds

Now click update display:

img_020
Fig 9: Noise isolated from vocal sounds

And the outlines under the noise that is below the detection band should disappear. Note, however, that now all features for all syllables are computed only within the band-pass filter that you set. Namely, frequencies outside the band are ignored across the board.

Using a dynamic (adaptive) amplitude threshold to segment sounds - One limitation of a static amplitude threshold is that when an animal vocalizes, the “baseline” power often changes as vocalization becomes more intense. For example, open the file “thrush nightingale example 1” with a 3 ms advance window and 0 amplitude threshold. Let’s observe the amplitude envelope of this nightingale song sonogram:

img_022
Fig 10: Amplitude envelope of the nightingale song

And let’s also look at the spectral derivatives, and a certain threshold indicated by the black line:

img_026img_028
Fig 11: Spectral derivatives, with a threshold indicated by the black line


It is easy to see that no fixed threshold can work in this case (see arrows). To address this, turn “fine segmentation” on. A new slider – called Diff – should appear between the amplitude threshold slider and the display contrast slider. Set it to zero (all the way up). In the fine segmentation box (bottom left of the SAP2 window) set the coarse filter to 500 and the fine filter to 0, update the display, and click filters:

img_030
Fig 12: White curve - coarse amplitude filter; black line - fine filter; and segmentation


The white curve shows the coarse amplitude filter, which is the dynamic (adaptive) threshold. The black line is the fine filter, which in this case is the same as the amplitude. The segmentation is set by the gap between them, where diff=0 means that we segment when the black line touches the white line; namely, vocal sound is detected when the fine filter is higher than the coarse filter.
We can see that all syllables are now detected and segmented, but there are two problems:
  1. The diff detects low-amplitude sounds, but also falsely detects small changes in background noise as sounds (look at the beginning of the file).
  2. Segmentation into syllables is often too sensitive and unreliable, because each small modulation of amplitude may cause segmentation.
A simple way to avoid false detection of silences is to impose a minimal fixed amplitude threshold on top of the filters. To do this, set the dB threshold to 24:

img_030
Fig 13: No more silence is detected

As shown, no more silences are detected as sounds.
To decrease the sensitivity of segmentation we can use two methods. One is to make the Diff more liberal – allowing the detection of sounds even when the fine filter is slightly below the coarse one. Setting the diff to -2.5 gives this result:

img_034
Fig 14: Setting the diff filter to -2.5

An often better approach is to make the fine filter a bit coarser. For example, setting the fine filter to 5, keeping the coarse filter at 500, and setting the diff slider to -1.5 gives this segmentation:
img_036
Fig 15: Sound with the fine filter set to a coarser value

As shown, we have achieved rather reliable segmentation despite the wide range of amplitudes in this song.
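
The coarse/fine logic above can be sketched in a few lines of Matlab. This is only an illustration of the idea: amp is a hypothetical per-frame amplitude vector in dB, and mapping the coarse/fine filter values (500 and 5) onto moving-average window lengths is an assumption, not SAP2's actual filter implementation:

coarse = movmean(amp, 500);     % slowly varying adaptive baseline ("coarse filter")
fine   = movmean(amp, 5);       % lightly smoothed amplitude ("fine filter")
diffTh = -1.5;                  % the "Diff" slider value
minDb  = 24;                    % fixed floor to reject background noise
isVocal = (fine - coarse > diffTh) & (amp > minDb);   % frames detected as vocal sound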



Segmented comparisons


SA+ provides several improvements to the ‘segmented comparisons’ window. First, it is now very easy to move (slide) sounds relative to each other. Clicking just below the sound moves the onset to the mouse position. Double-clicking below the sound shifts to ‘sticky mouse’ mode, allowing continuous sliding of the sound. Double-clicking below the sound again will release the image.
graphic


graphic
The next feature you may find useful is auto-alignment: outline an interval in the top sound and then in the bottom sound. You will see that the two outlined intervals have moved to provide perfect alignment. This works nicely with the ‘auto segment a single syllable’ mode (you can set it in either the ‘sound 1’ or ‘sound 2’ window). You can then open two song bouts and align syllables to each other in a single click while monitoring changes in the ‘distribution similarity’ score.

Feature distribution scores: based on the MAD table of syllables (see Options -> Feature Normalization -> Normalization of syllable features) we can calculate a distance score, just as we did at the level of frames. When outlining arbitrary intervals of sound, SA+ scores the similarity between them as if they were syllables, which is not very meaningful as a p-value, but it can still be used as a non-metric score (just as a human score is). To obtain a more generic statistical distance estimate, in a way that does not rely on any assumption about the distribution of syllables (that is, without any normalization), the Kolmogorov-Smirnov statistic should be used.

The Kolmogorov-Smirnov (KS) statistic is simply the maximum distance between two cumulative histograms. For each given interval, we can calculate the cumulative histogram of a feature's distribution and compare it to the cumulative distribution of the same feature in the other interval. KS values are unit-less and additive across features. The KS statistic is simple, valid for all species, and makes very few assumptions (in contrast to all other methods we use to score similarity). It does have one major disadvantage: it uses the chosen interval to estimate some ‘momentary’ distribution of features, but this estimate is not robust for short intervals, because in a time series of feature values each value is strongly correlated with its neighbors (those that occur just prior to, or after, it), and this problem cannot be solved by reducing the overlap between frames. Therefore, the KS statistic is only meaningful statistically when the two intervals chosen include several syllables. In this case too, you can use the KS statistic between syllables as a non-metric estimate.
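
A minimal Matlab sketch of this cumulative-histogram distance for one feature, where x and y are hypothetical vectors of per-frame feature values from the two intervals:

edges = linspace(min([x(:); y(:)]), max([x(:); y(:)]), 100);  % common bins
cx = cumsum(histcounts(x, edges)) / numel(x);   % cumulative distribution of x
cy = cumsum(histcounts(y, edges)) / numel(y);   % cumulative distribution of y
ks = max(abs(cx - cy));                          % KS statistic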


SELECT and CREATE tables 


It is sometimes useful to create new tables from subsets (queries) of an existing table, or even by consolidating data from several tables. We cover this subject briefly here:

The simplest approach is demonstrated in this query:

create table b109cluster4 select * from  b109clustered where cluster=4 limit 9999999;

This creates a new table, called b109cluster4, that includes only records of syllable type 4. Note that if you do not use the limit keyword, mySQL may create the table using some default limit (usually 1000 records), so we advise that you always use limit.

Now, a slightly more sophisticated case is when you need to create a new table from certain fields, combining data from a few different tables. For example, say that you have a raw features table that is a bit faulty (my_faulty_table) and you suspect that some of the records in that table have positive Wiener entropy values (Wiener entropy must be negative!). Detecting those records is easy:

 SELECT entropy FROM my_faulty_table where entropy>0;

Of course, you can now create a new table excluding the faulty records:

create table my_right_table  SELECT * FROM my_faulty_table where entropy<0 limit 999999;

However, you also want to identify the wave files that those faulty data belong to, so as to figure out what happened. The problem is that the name of the wave file is not included in the raw features table; there you only have the file_index field. This field, however, points to the appropriate wave file name in another, linked table called file_table. This data structure saves much space, since we have thousands of records from each file in the raw features table. So we now need to join information across those related tables, and mySQL makes it very easy. All you need to do is identify the table where each field comes from, using the syntax table_name.field_name.

For example, the query

SELECT entropy, file_table.file_name FROM my_faulty_table, file_table where entropy>0 and my_faulty_table.file_index=file_table.file_index limit 99999;

returns the positive entropy values side by side with the file names that relate to those values.
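
The same join can be run from Matlab through the mySQL interface used earlier in this tutorial (a sketch, assuming the connection is already open):

[entropy, file_name] = mysql(['select entropy, file_table.file_name ' ...
    'from my_faulty_table, file_table where entropy>0 ' ...
    'and my_faulty_table.file_index=file_table.file_index limit 99999;']);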

The simple examples above demonstrate how a new table that includes only some of the records (rows) and fields (columns) of an existing table can be generated using the SELECT keyword combined with other commands such as CREATE TABLE.

  



Self-similarity test


graphic
It is often useful to compare the similarity of a sound to itself and confirm a similarity of 100% (or at least close to it). There are several reasons why the similarity estimate might deviate from 100%; some are important and others can often be ignored.


In the example on the right, we see a nice 100% diagonal similarity across all but one syllable. Note the diagonal red dashed line: it turns black during silent intervals, but on one occasion we see a small white section on the diagonal.
This section is white because it has been rejected for being too short. Reducing the minimum duration with the slider will eliminate this problem, but at the cost of adding the ‘noise’ of many short (and mostly meaningless) sections. Therefore, keep in mind that optimizing the similarity contrast is more important than reaching 100% (as opposed to 95%) during self-similarity tests.

Another issue is the setting of the amplitude and Wiener entropy thresholds: if those settings differ across Sound 1 and Sound 2, self-similarity might deviate from 100%, but usually not by more than 10%. Any self-similarity test that gives less than 90% similarity requires a careful investigation of what went wrong.
  

Setting Excel Access



Setting Excel Security to allow SAP2 access

In order to export tables into Excel spreadsheets you need to set Excel to accept data from SAP2: open MS Excel, click Tools -> Options and select the Security tab.


graphic


Click Macro Security and set the options as shown below:

graphic
graphic


For these settings to take effect you might need to restart your computer.


  

Setting Keys & other detectors







Setting keys & other detectors
graphic

  

Setting National Instrument Cards

Our results suggest that operant training facilitates song imitation in zebra finches compared to yoked-trained birds. We are still working on improving the training regimen, and users should consult our website (http://ofer.sci.ccny.cuny.edu) for the most current updates. In particular, we found that replacing the keys with strings, and adding mirrors in specific locations, might facilitate vocal learning. At this time, we are using a key-pecking system as our standard, as documented below.

Digital I/O card: if you want to set up operant training you will need a National Instruments PCI-6503 I/O card. This is the least expensive NI card, and it is great. You will have to install the NIDAQ-traditional driver for this card. Buy the NB1 Ribbon Cable (1m or 2m, part number 180524-20) and the Block Screw Terminals 777101-01 CB-50LP - Unshielded.

graphic   graphic




Install the card (follow instructions!), connect the cable to the screw terminal and then connect the screw terminal to the keys:

Keys: you may use any lever key; we had good experience with the Cherry 1g lever keys. Use a standard low-voltage electrical cable to connect them. Connect to the NO (normally open) connectors. You may join all the grounds together, as long as all of them are connected to the same NI card.

Screw terminal configuration: The screw terminal has even numerals (2-50) in the back row (closer to the ribbon cable connection); those are all grounds. The distant row, with the odd numerals, has the actual channels. The channels are divided into three ports (ports 0, 1, 2) and each port has eight lines (lines 0-7). Number 47 on the screw terminal corresponds to port 0 line 0, the next screw to the right, number 45, corresponds to port 0 line 1, and so on all the way to number 1, which corresponds to port 2 line 7. First, connect each terminal screw to the appropriate device. For example, port 0 line 0 to key 1 of training box 1, port 0 line 1 to key 2 of training box 1, port 0 line 2 to key 1 of training box 2, and so forth. Input wires such as lever keys should be connected via a long lamp cable (polarity does not matter in this case). Now you need to connect each device to the ground. All the even channels are common ground – connect whichever way you like. No resistor is necessary using this configuration.
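
The mapping just described follows a simple pattern, sketched here in Matlab (a convenience formula derived from the description above, not an NI-supplied one):

% Odd screws 47, 45, ..., 1 map to port/line in order
screw = 47 - 2 * (8 * port + line);   % e.g., port 0, line 1 -> screw 45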

graphicgraphic


graphic



Now you should test your system (before connecting the other keys). Turn your computer on and start the NIDAQ instrumentation panel. Double-click on the devices icon and you should see the PCI icon. Right-click on it and choose ‘test panel’. You should see a panel with buttons. Set the top row as input port 0 and the bottom row as input port 1. If the buttons of port 0 are not red, click ‘generate output’. Make sure that the port is set to 0 and to input, and then, while looking at the panel, click on the key. You should see that the first gray button has changed color – you just activated line 0. Every time the bird pecks on a key, or a motion detector is activated, and so forth, SA+ will capture the event via one of these lines and respond appropriately.

From here on you continue with every other wire: connecting wire 6 to a key and then back to the common should activate line 1, connecting wire 8 will activate line 2, and so forth. Keep track of your connections; you will have to remember them when setting up the input/output configuration of the SAP2 recorder.

Installation of a National Instruments analog card to work with SAP2:
SAP2 allows recording via most NI analog data cards. You will have to set up a data neighborhood for each channel and then use the channels to record with the SAP2 recorder just as you do with sound cards. This is particularly useful for recording combined sound and electrophysiology data. Contact Ofer (ofer@sci.ccny.cuny.edu) for more details about how to install it.
  

Setting the recorder for the first time

Step 1 -- set channel identification


Note: If you are just trying the recorder, skip stage A, scroll down to B, and type a fake name in the configuration menu.

A. If you are starting an experiment, click the Animals tab and then click new (for a new animal) or change (to switch to an animal you already recorded from). This will set the animal name, age, and everything else you need to specify.
graphic

Note that all the SAP2 modules can access the animal information you just inserted, so you only need to do this once per animal.

 

B. For each channel you plan to use, click graphic and go carefully through the options:
Setting Identification & Mode: type a bird name and check "Active" as shown below.
graphic
just to make sure, set the visualization as shown below
graphic


Go to the Next Stage ->


Or, below, you can find additional information (you probably do not need to know this):

ID: this is the internal identification of the channel. Normally you will not want to change it. Again -- do not change it unless you want to save different configurations for the same channel.
Warning: Changing “ID” instead of “Name” and then clicking graphic will reset all channel options.

Bird Name: The recommended way of setting an animal name is not via these settings but via the “Animals” tab. For now, we will set a temporary animal name just to allow testing. The name of the animal can be either a string or a number; no spaces are allowed. Feel free to change it whenever you want and it will have an immediate effect on the wave file names (each wave file created by the recorder starts with the bird's name).

States & Modes

Each channel can record in one of the following modes:

Active: The channel records sound when triggered by an appropriate input.

Monitor only: The channel displays the sound but does not record or trigger other channels.

Direct recording: The channel behaves like a simple tape recorder (remember those devices?): it turns on by clicking start and off by clicking stop. It records files of a certain duration, e.g., a new file every minute.

Pre-Buffer: Do not worry about this for now. It sets how many samples are recorded prior to triggering. Setting it to 44100 will record sound starting 1 second before triggering occurred (e.g., about 1 s before the song begins).
Slave: Recording is triggered by another (master) channel. This is useful when recording combined sound and electrophysiology. Note that the sampling and accuracy quality of most sound cards is good enough for capturing single- and multi-unit activity. For EEG recording you must take into account that low frequencies (2 Hz and lower) are not captured by most sound cards.

Note: in order to automatically save recorded data on a slave channel, you must check 'slave' in the Sound Processing Live module of this channel. Otherwise, the content of that channel will be judged separately from that of the master. For example, if the slave channel is used to capture neural data, you want those data to be coupled to the sound, so that a decision to save or delete the sound (master) channel will apply to the slave channels.
  

Setting the recorder: step 2

Step 2 -- setting the input channels


Set the input channels, starting by choosing the IO Module. The DirectSound module gives you access to the sound cards. The NIDAQ module (if present) gives you access to the National Instruments Data Neighborhood (you need to set it up properly first; see xxxx for details).
graphic
In the Device window you will see a list of compatible devices (the actual card channels). Once a device is highlighted, the Format window shows you the format options of the selected channel. In DirectSound devices, channels are paired as stereo. Choosing mono will simply unite the two channels (usually you will not want to do that). Once you choose stereo, you can select the left or right channel. For example, if you use an 8-channel sound card, it will appear as 4 stereo pairs.

  

Setting the recorder: step 3

Step 3 -- setting the output channels


If you plan to play sounds in your experiments, you should set the output channels. The settings are rather similar, but not identical, to the input settings.

graphic

As before, select an IO Module and a Device. Then (most likely) you will see only one format option, because Windows treats channels as stereo pairs: panning Left plays sounds through the first channel, Right plays through the other channel, and Center plays through both. Do not worry about the other settings for now.
  

Setting the recorder: step 4

Step 4 -- setting the sound storage folders


Once sound recording is triggered (actually a little bit before), the SAP2 recorder starts saving a temporary wave file to the hard disk. This storage folder (temporary file path) must exist; otherwise, nothing will ever be saved. Once recording is triggered off, the recorder “decides” either to delete the temporary file or to move it into the “Complete files path”. Note, however, that “complete” from the point of view of the recorder is the input folder of the “Live” module, which performs spectral analysis and makes the final decision about storage. By default, the path should look more or less like this:

graphic

If you have two hard disks in your computer, it is recommended to use the second hard disk (D) for both temporary and complete storage, so as to keep the system hard disk (C) protected.

Note: it is easier to test the Temporary and Completed folders from the "Recording" tab of the main window: 

graphic

Setting the recorder: step 5

Step 5 - setting the recording parameters


The default mode of SAP2 is sound-amplitude triggered recording.  You should decide what parameter values would provide you with the best results. You can set those parameters in the Recording tab:

graphic

SAP2 keeps recording into a memory ring buffer, which allows a recording session to start prior to triggering by amplitude. For example, setting "Pre-threshold recording duration" to 477 ms will generate sound files that start 477 ms before the bird started to sing.

Recording then continues for the duration specified in the "Post-threshold recording duration" -- or longer. Each time the sound amplitude is above threshold, this timer is reset to zero. In the case above, recording continues 1000 ms after the trigger. If during that time the threshold is crossed again, this clock is reset (and recording will continue for an additional 1000 ms). However, when recording exceeds the "Maximal recording duration" (30000 ms in the example above), recording stops and the sound file is saved to the hard disk.

Once the sound is saved to a temporary folder, the recorder "decides" whether to accept or reject it based on the number of times the threshold was crossed during the recording session. In the example above, sounds where the threshold was crossed 1000 times or more are saved and moved into the "stage 1 sound folder". Note that, in theory, the sound might cross threshold up to 22,000 times in each second of recording, and in practice even a single syllable may cross the threshold hundreds of times.
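
The trigger logic above can be summarized in a small Matlab-style sketch (illustrative names and per-buffer bookkeeping, not SAP2 code):

% For each incoming audio buffer:
if any(amp > threshold)
    crossings = crossings + sum(amp > threshold);  % count threshold crossings
    postTimer = 0;                                 % reset the post-threshold clock
end
postTimer = postTimer + bufferDur;
totalDur  = totalDur + bufferDur;
if postTimer > postDur || totalDur > maxDur        % 1000 ms / 30000 ms above
    if crossings >= minCrossings                   % 1000 crossings in the example
        moveToStage1(tempFile);                    % accept: move to stage 1 folder (placeholder)
    else
        delete(tempFile);                          % reject: delete the temporary file
    end
end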

Direct recording is for using SAP2 as a simple on/off tape recorder; here you should set up recording epochs. In the case above, a wave file is saved every 30 seconds.
  

Spectral Derivatives

Setting the Spectral Parameters
 
SAP2 supports sounds digitized at 44.1 or 22.05 kHz only. Opening sounds sampled at other rates will cause distortion. You can use GoldWave (www.goldwave.com) to automatically resample your sounds in batch mode. The frequency range is adjustable to 3 ranges in this version.

You can change the contrast of the derivative display without changing the contrast of the feature curves by moving the Gain slider on the left.


Figure 1: Contrast slider at the minimum



Figure 2: Contrast slider at the maximum



This manipulation has only a visual effect.


If, however, the sounds you are analyzing are of very low amplitude (e.g., forcing you to increase the display gain all the way up), you may want to boost them prior to analysis.
You can do this by going to the Options & Settings tab > General Options button > Input & Output Settings tab > check "boost sound amplitude x 5".

Be aware though, that this might virtually “clip” sounds of high amplitude.

Note that changing the amplitude of wave file data should have no effect on song features (except, of course, amplitude itself), as long as the data are not clipped (namely, as long as the numbers are within the 16-bit range).

Spectral parameters

There are 4 parameters that you can change within the ‘Explore & Score’ module: the frequency range, FFT data window, advance window and contour threshold.
Be aware that the scope of those changes is global – that is, all other modules will apply the change.



Figure 3: Four parameters that you can change within the ‘Explore & Score’ module


Start by changing the size of the FFT data window. Reducing it to 5 ms and reopening the sound will present a different time-frequency compromise. Note that feature calculation and scale are affected by such a manipulation.

Return the data window size to 9.27 ms and change the advance window to 0.4 ms, and the sound display, as well as the feature calculation, becomes more elaborate. The only feature that will change scale is AM. Try identifying and comparing the first two syllables across the two displays. Then change the advance window back to 1.2 ms.
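
To get a feel for this time-frequency trade-off outside SAP2, here is a small Matlab sketch (stock Matlab with the Signal Processing Toolbox; the file name is a placeholder):

[s, fs] = audioread('example1.wav');      % any 44.1 kHz sound
s = s(:, 1);                              % use the first channel
winLong = round(0.00927 * fs);            % 9.27 ms data window
adv     = round(0.0012  * fs);            % 1.2 ms advance window
spectrogram(s, hann(winLong), winLong - adv, [], fs, 'yaxis');
% Repeat with a 5 ms window: sharper timing, blurrier frequency
winShort = round(0.005 * fs);
figure; spectrogram(s, hann(winShort), winShort - adv, [], fs, 'yaxis');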


Overview of the similarity batch

The similarity batch allows you to perform similarity measurements on a large (even huge) data set. It has four modes:


·     Score by pairs selects two ordered sets, A and B, of sound files, and compares A1 against B1, A2 against B2, and so forth. The number of files in A and B must be the same.

·     M x N comparisons performs all possible comparisons between the sets A and B. The two sets do not have to be of equal length.

·     Target clusters reads the syllable table, finds sounds of a specific type, and compares them to a reference sound. For example, it can be used to compare sounds that belong to one cluster to a target sound (e.g., a model song syllable). SA+ reads the table, detects a syllable, finds the file that contains it, outlines only this syllable, and compares it to the target syllable.

·     Table1 x Table2 compares all the syllables in one table to all the syllables in another table. SA+ reads the tables, finds the file that contains each syllable, and scores similarity for all pairs of syllables across the two tables. For example, if table 1 contains several calls of one animal, and table 2 contains several calls from another animal, this mode will compare calls across the two animals.
  




5c. Around the clock monitoring of vocal activity


It is often useful to see how vocal activity is distributed around the clock. Many animals vocalize more during certain hours; also, the health of the animal, as well as the health of your system (bad microphones…), will be reflected in the circadian distribution of sounds. Therefore, SA+ provides an automatically updated online display of vocal activity. Furthermore, it shows 6 different curves, to capture the distribution of sounds of different ‘kinds’.

Note: In the coming two chapters, we will present methods of classifying syllables to types, based on their feature distribution. Here we only present a first-pass, and rather arbitrary categorization of sounds. To do off-line assessment of vocal activity, we suggest you use the cluster analysis approach.

The ‘around the clock’ page tab includes 4 graphs, one for each bird. In each graph you will find a 24 h count of six syllable types. Each count is performed within intervals (bins) of 15 minutes. Syllables are automatically (and arbitrarily) categorized into the following types:





  
Type    Definition                                   Comments
1       Duration < 100 ms, mean FM < 10              Non-modulated short notes
2       Not type 1 and mean pitch > 3000             High-pitch notes
3       All other sounds with duration < 100 ms      Introductory notes and cage noise
4       Duration > 100 ms, mean FM < 10              Non-modulated long calls
5       Duration > 100 ms, mean pitch > 2000         High-pitch song syllables
6       All other sounds with duration > 100 ms      Other syllables and cage noise
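
Applying the rows in order gives a simple decision rule, sketched here in Matlab (dur in ms; variable names are illustrative):

if dur < 100 && meanFM < 10
    t = 1;      % non-modulated short note
elseif meanPitch > 3000
    t = 2;      % high-pitch note
elseif dur < 100
    t = 3;      % other short sound
elseif meanFM < 10
    t = 4;      % non-modulated long call
elseif meanPitch > 2000
    t = 5;      % high-pitch song syllable
else
    t = 6;      % other long sound
end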


graphic

Note: because types 4-6 are often much rarer than types 1-3 (and are often of main interest), the display shows their values multiplied by 2.

  


Software Installation

Before installing SAP2 you must have SAP installed on your machine. For a new installation, first install SAP version 1.2 (full version) as shown below. Once SAP is installed, download and extract the SAP2 installer and follow the instructions.



Important: follow installation instructions exactly as described below.

Step A: extract the contents of the zip file into a temporary folder on your C drive. 
Step B: make sure that the folder hierarchy shown below has been maintained:

graphic

Step C: run the Sound Analysis Pro Installer.exe application. The following window

should appear: graphic

1: Click Install MySQL and follow the instructions, selecting the default (typical) setting option. If you already have MySQL version 4 or higher installed, skip this step.

2: Clicking this button will first start the MySQL administrator. On first use (and only then) you will be asked to provide a user name and a password. Do this and click okay.

3: Click Install SQL control panel. Wait a few moments and you should see a traffic light with a green light appearing in your taskbar:

 
graphic

Then, the MySQL control panel installation should start automatically. Note that the control panel is not essential for using SA+, but you must have the MySQL server on (green light) when starting SA+.

3A: If you intend to use SA+ for prolonged recording sessions, we recommend changing the settings of MySQL to prevent losing the connection with the server after a few hours of operation. Follow these instructions carefully: Click on the traffic-light taskbar icon of MySQL and select ‘show me’. Select the ‘my.ini setup’ tab and add the following line:

set-variable=wait_timeout=999999

below [mysqld] statements. Here is an example of my.ini file with the new line:

[mysqld]
basedir=C:\mysql
#bind-address=127.0.0.1
datadir=C:\mysql\data
#language=C:\mysql\share\english
#slow query log#=
#tmpdir#=
#port=3306
#set-variable=key_buffer=16M
set-variable=wait_timeout=999999
[WinMySQLadmin]
Server=C:/mysql/bin/mysqld-nt.exe
user=xxxx
password=yyyy

Now click ’save modification’. Click the ‘Environment’ tab and click ‘hide me’. Next, click on the traffic light icon that reappeared in the taskbar and select WinNT -> ‘stop the service’. When the light turns red, click it again and restart the service. When the light turns green, click ‘show me’ and select the variables tab. Click ‘refresh variables’ (bottom right) and scroll down to confirm that the wait_timeout variable is now 999999. Do not click the ‘x’ – it will stop the service; instead, go to the ‘Environment’ tab and click ‘hide me’. You may wonder why MySQL developers made it so tedious to change variable settings – so do we.

4: Click Install Sound Analysis Pro. Follow installation instructions.

5: If you do not have a National Instruments card and you do not have the NIDAQ driver installed, you must remove the file ztNIDAQ_R.dll from the c:\program files\sound analysis pro\ folder. Keep this file in another folder so that if you decide to install NIDAQ (e.g., for operant training with keys), all you will need to do is move this file back to the sound analysis pro folder.
 
To start SAP2 go to the start button -> Sound Analysis Pro -> SAP2. You should see the opening screen:


graphic

sound & brain


Sound (top) and neuronal data (below) are analyzed in synch and are saved into the same raw features table.

graphic




  



Delivering sound feedback


The SAP2 Recorder is provided with a foundation that allows you to extend its functionality to include auditory feedback in response to appropriate sounds. The current version has a primitive prototype that performs the simplest possible feedback response: a sound playback in response to an amplitude trigger (with no online frequency analysis). We did not include online frequency analysis because we believe that the construction of the so-called ‘match filter’ should involve several compromises, and we feel that determining a useful set of constraints on the algorithm will only be useful if it involves laboratories with a primary interest in accurate auditory feedback.

The current version only includes the prototype methods that allow automatic response to sound based on amplitude threshold, which is the simplest (and not the most useful) kind of auditory feedback. This illustration shows how the mechanism works:

graphic

The feedback implementation is very simple: set the number of sound peaks required to trigger playbacks in the feedback window; all the rest of the settings are identical to those of operant training.
graphic




When the number of peaks to trigger is set to any positive number, SA+ will respond to peaks that cross the amplitude threshold with a sound playback.

Using a simple manipulation of the source code, you can easily extend this functionality to detect any pattern in the time series of peaks with little extra computation cost. This is very easy to do, by adding functionality to the event handler of peak detection. That is, you can add code to the method that handles the notification of peaks, which currently says only:

switch (event)                             // notification from the peak detector (handler sketch)
{
    case PeakCounterTrigger:               // X number of peaks detected
        Bird1->TriggerSound();             // play a sound to bird 1
        break;
}

For example, you can extend this handler to trigger a sound only if X peaks occurred within a certain duration (which sets a frequency filter), at little real-time cost. Feel free to manipulate this function and we will be happy to incorporate your modification into SA+.
 
Limitations: With the current settings, and with no effort made to improve performance, the SA+ typical response time is about 25 ms from detection. Frequency analysis should only add a few milliseconds. To allow high specificity of delayed feedback based on spectral shape, users should be ready to invest some resources. One of the least expensive options is using the services of David Swigger (), the programmer of this DSP engine, to determine the cost of the desired extension. In terms of computation cost, the extension is not too expensive using the existing SA+ routines. The more challenging part, however, is achieving high accuracy of feedback timing. Depending on the application, achieving sufficient accuracy might be constrained by hardware limitations and by requirements for certain manipulations of the MS Windows operating system.
  

sound input and output


The Sound input and output monitor shows the queue of sound files generated by the SAP2 recorder that are waiting to be processed. In the case shown below, the queue is empty. This could be either because all sound has already been processed, or because the output folder of the SAP2 recorder (labeled "Stage 1 sound folder") is not the same as the input folder of this module.

graphic


To check for this, go to the SAP2 recorder and click the Recording tab.

graphic


You should first validate that the input and output channels are correctly specified.

Input settings: Remember that the input of this application is the output of the SAP2 Recorder. Any sound file saved by the recorder should appear in the file-list window. You must make sure that the input folder is correctly specified. You can click the buttons to navigate to the appropriate folder, or type the folder address in the edit box. However, you can do this only when the engine is stopped or when the channel is paused.

Output settings: the output of the sound processor is divided into three categories.

Sound output: The first channel in the sound output is where sound files that match the criteria (e.g., sound files that contain song bouts) will be moved. Note that other files are permanently deleted (see batch operations for some alternative approaches of sorting rather than deleting files). Sound files are not only moved, but also renamed. By default, SAP2 ignores the original input file name and generates a file name with the following annotation format:


nancy_39365.70262828_10_10_19_31_2.wav



Note that the same date & time stamp is also used as a serial_number in the syllable database, allowing easy access to the raw sound data of each syllable.
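If you need to unpack this annotation in your own code, a minimal sketch might look like the following (the field order follows the format above; the parser itself is hypothetical, not a SAP2 utility):

#include <cstdio>
#include <string>

// Parsed fields of the default SAP2 file annotation:
// BirdName_serialNumber_month_day_hour_min_sec.wav
struct FileStamp {
    std::string bird;
    double serial;                  // days elapsed since 1900, plus fraction of day
    int month, day, hour, min, sec;
};

bool ParseAnnotation(const std::string& name, FileStamp& out) {
    char bird[64] = {0};
    int n = std::sscanf(name.c_str(), "%63[^_]_%lf_%d_%d_%d_%d_%d",
                        bird, &out.serial, &out.month, &out.day,
                        &out.hour, &out.min, &out.sec);
    out.bird = bird;
    return n == 7;                  // all seven fields parsed successfully
}

// e.g., ParseAnnotation("nancy_39365.70262828_10_10_19_31_2.wav", stamp)
//       yields bird "nancy", serial 39365.70262828, 10/10, 19:31:02.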

Should I save all the sound files?  In many cases, you will find that the most significant issue is the load of accumulating sound files on your hard disk. You may ask yourself whether saving all the raw data is really necessary. The practical answer is that with appropriate maintenance, saving the entire sound data is inexpensive and easy. The cost of a DVD/R disk is now down to $1 (when buying in bulk), and the overall cost of saving an entire vocal ontogeny is about $15. SA+ provides an easy way of saving raw sound files and keeping track of the data. Raw data are an asset that our community may use and reuse; saving them is generally a good idea.

Saving sound output through the network: You may define the output folder path as a standard network address. E.g., //Peterson_computer/c/SA+P/bird1/ will work just like a local folder address. Note, however, that the network might crash. If SA+ detects a network crash or an inaccessible computer, it will revert to saving the sounds to a default local folder, c:\rescue\. If this happens, take your time, solve the network problem, and then click 'stop' and then 'start' to resume saving to the destination you previously set in the Sound Processing Live engine.
  

Spectral Derivatives

 
What are spectral derivatives? The traditional sonogram represents the power of sound in the time-frequency plane, while the spectral derivatives represent the change of power. For each point of the two-dimensional time-frequency plane of a sonogram, one can measure the change of power from left to right (in time), from bottom to top (in frequency), or in any arbitrary direction. Spectral derivatives are thus derivatives of the spectrogram in an 'appropriate' direction in the time-frequency plane. They can be estimated using multitaper spectral methods; such estimates have the same resolution as the spectrogram and are not artificially broadened.

Sound Analysis Pro uses spectral derivatives to track frequency traces in the spectrogram as follows. As one cuts across a horizontal frequency trace, from low to high, there is a sharp increase in power, then a plateau, then a decrease in power. The derivatives along the same cuts are first positive and then negative, passing through zero at the peak power location. A useful property of these derivatives is that they show a sharp transition from positive to negative values, providing a contour that is more accurately defined than the frequency trace alone.

If the frequency trace is not horizontal, then the direction of maximum change in power is not along the frequency axis, but rather at an angle to both the time and frequency axes. To capture the direction of maximal power change in the frequency trace, it is then natural to take a directional derivative perpendicular to the direction of frequency modulation (think of detecting waves in the ocean by cutting through the surface in many arbitrary directions until you hit the wave). The directional derivative is easily computed as a linear combination of the derivatives in the time and frequency directions and may be thought of as an edge detector in the time-frequency plane.

We find the derivative spectrogram an excellent means of visualizing the spectral information in a song. The derivative at each point is calculated at an angle perpendicular to the direction of frequency modulation. As a result of this edge-detector technique, zero crossings (transitions from black to white in the middle of frequency traces) look equally sharp in the modulated and in the unmodulated portions of a note. The peak frequency contour is defined by the zero crossings of successive directional derivatives.

graphic
Fig 1: The Explore and Score Interface

graphic
Fig 2: A multitaper sonogram of a bird song segment

graphic
Fig 3: Spectral derivatives of the same sound shown in Fig 2

As shown, the frequency traces are more distinct in the derivatives. Since the derivatives are calculated in a different direction for each point, subtle modulations are also visible.


Formal definition of spectral derivatives

Estimates of the frequency and time derivatives of the spectrum may be robustly obtained using quadratic inverse techniques (Thomson, 1990, 1993). The quantity of interest is the directional derivative of the spectrogram S(t, f) in the time-frequency plane, with the direction of the cut specified by the angular parameter θ:

D_θ S(t, f) = cos(θ) · ∂S/∂t + sin(θ) · ∂S/∂f

In particular, the time and frequency derivatives of the spectrogram are obtained by setting θ = 0 and θ = π/2, respectively.
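Numerically, the directional derivative is just a weighted sum of the two derivative images. A minimal sketch (the array layout and names are illustrative, not SAP2 internals):

#include <cmath>
#include <vector>

// dSdt, dSdf: time and frequency derivatives of the spectrogram, one value
// per time-frequency bin; theta: direction of the cut. Returns the
// directional derivative image; its zero crossings trace frequency contours.
std::vector<double> DirectionalDerivative(const std::vector<double>& dSdt,
                                          const std::vector<double>& dSdf,
                                          double theta) {
    std::vector<double> d(dSdt.size());
    for (size_t i = 0; i < dSdt.size(); ++i)
        d[i] = std::cos(theta) * dSdt[i] + std::sin(theta) * dSdf[i];
    return d;
}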

state transitions

Estimating state transitions

Some sounds change slowly and some change rapidly. One standard way of estimating stationarity is to compare the sound to itself (e.g., using an autocorrelation matrix) and see how quickly the self-similarity (100% at the diagonal, by definition) decays as we move off the diagonal.

SAP2 offers a similar procedure, but based on similarity across features. For each time point in a sound, we calculate how long the self-similarity holds. For example, starting from a time point i, we move forward to time point i+1... until the similarity to i decreases below a certain threshold. Then the same procedure is repeated, but moving backward to time point i-1...

We use the MAD (median absolute deviation) as a yardstick for similarity across four features: pitch, FM, Wiener entropy, and goodness of pitch. The second parameter is how many times similarity must fall below threshold before we say that the state has changed (namely, that the sound is no longer similar to its origin).

Note: one way of running this procedure is to keep the segmentation threshold so low that the entire sound is a "single syllable". If you do this, the estimates will include silences as a "state"; namely, how long a silence lasts will be treated the same as how long a note's self-similarity lasts. If you do segment the sound (as you probably do by default), then silences will not count and will be marked as zeros (you should then treat those zeros as undefined).
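In outline, the per-time-point computation might look like the following sketch (the data layout, the use of a per-feature threshold, and all names are our assumptions, not SAP2 internals):

#include <array>
#include <cmath>
#include <vector>

// features[t] holds four MAD-scaled features (pitch, FM, entropy, goodness)
// at time point t. A frame 'mismatches' the origin frame i when any feature
// deviates from it by more than thresholdMad (in MADs).
int StateDuration(const std::vector<std::array<double, 4>>& features,
                  int i, double thresholdMad, int mismatchesToReject) {
    auto mismatch = [&](int t) {
        for (int f = 0; f < 4; ++f)
            if (std::fabs(features[t][f] - features[i][f]) > thresholdMad)
                return true;
        return false;
    };
    int duration = 1;                                  // the origin frame itself
    for (int dir : {+1, -1}) {                         // forward, then backward
        int misses = 0;
        for (int t = i + dir; t >= 0 && t < (int)features.size(); t += dir) {
            if (mismatch(t) && ++misses >= mismatchesToReject)
                break;                                 // state transition detected
            ++duration;                                // state still holds at t
        }
    }
    return duration;                                   // in frames (1 ms by default)
}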

How to:

Start SAP2 in "Explore & Score" and open a sound in Sound 1. Go to "segmented comparisons" and observe the "period of repetition" control:

graphic 


The parameter "state transition threshold" determines the difference between sound time windows in units of MADs that indicates a state transition. The "# mismatches to reject" is by default 5 mismatches (in each direction). Click Calc Durations and SAP2 will show you the results in the "note duration" window.

graphic 

Each number is the duration of a state, and the numbers are 1ms apart by default, or as determined by the "advance window" parameter of the spectral analysis.

Now select all the numbers, copy them (Ctrl-C), and paste them into Excel (Ctrl-V). Plotting the graph might look like this for a zebra finch song:
graphic


And here is a Bengalese finch song example:

graphic


  

Batch Setting step 2

Step 2: Setting the segmentation threshold

Before processing a batch you want to make sure that the sound is properly segmented into syllables. This is important even if you are not creating a syllable table, because SAP2 decides whether to accept or reject each file based on its syllable content (namely, on segmentation).

Double clicking one of the wave files invokes a 'fake trial', namely, all calculations are done without saving anything:
graphic

Move the yellow slider up or down and then double click the sound file again to update the segmentation. If the quality of segmentation is really important to you (e.g., to generate DVD maps for analysis), see the details in the segmentation part of the user manual. Otherwise, a gross segmentation might do (you can always create a syllable table from the unsegmented raw features table).
  

Batch Setting step 3

Step 3: Setting the slave channels

If you have master and slave channels (e.g., sound files plus neuronal activity channels), you have to tell SAP2 where those channels are. SAP2 expects to find wave files that match the master (sound) channel by name. The expected annotation (which is automatically generated by the SAP2 recorder) is

sound: BirdName_dateStamp.timeStamp_month_day_hour_min_sec.wave
first slave: s1~BirdName_dateStamp.timeStamp_month_day_hour_min_sec.wave
second slave: s2~BirdName_dateStamp.timeStamp_month_day_hour_min_sec.wave
third slave: s3~BirdName_dateStamp.timeStamp_month_day_hour_min_sec.wave
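
Deriving the expected slave file names from a master file name is then a matter of prefixing, e.g. (a trivial sketch of the convention above):

#include <string>

// Expected name of slave channel n for a given master (sound) file name,
// following the s<n>~ prefix convention listed above.
std::string SlaveFileName(const std::string& masterName, int n) {
    return "s" + std::to_string(n) + "~" + masterName;
}

// SlaveFileName("nancy_39365.70262828_10_10_19_31_2.wav", 1)
//   -> "s1~nancy_39365.70262828_10_10_19_31_2.wav"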


By default, the slave channels are located in subfolders named Slave1, Slave2, and so on. To set the slave channel folders manually, select
graphic

and then type the folder name. Turn the slave channel on by selecting Average (if you want to save the average value for each millisecond of data), Rectified (absolute values), or Squared:

graphic

One more decision: you have to decide whether the batch should stop (wait) if a slave channel is missing. If you want the batch to continue even if no slave channel was found, select graphic


After you finish setting the slave channels, click graphic to return to the main page.

Note: the raw features table will have a different number of fields depending on the number of slave channels (each channel adds 5 fields). Do not change the number of slave channels "on the fly". Instead, set the maximum number of slave channels and check graphic

Batch Setting step 4

Step 4: Setting the bird identity

No batch can be processed before setting the bird identity. There are two different approaches you can choose from:

The first (and recommended) approach is to handle the bird identity and information via the "animals" table, using
graphic

 The second approach is to create or retrieve the output tables (syllable table, raw features table) manually, using

graphic
 
The BIG advantage of handling the bird identity via the "animals" table is that you only have to enter the bird information once; this information is then available to all the SAP2 modules (e.g., the recorder, Explore & Score, DVD maps, and other batches).

If you already entered the bird information (e.g., in the recorder or in Explore & Score), all you need to do is click graphic and choose the bird from the list

graphic
and click OK.

Now look at the bottom of the batch window
graphic
As you can see, the name of the bird, its current age, the model song it was trained with, and the training age are now available to the batch. Just below the Change Bird button you will see graphic indicating that a syllable table was created (or retrieved) with 0 records in it.

If you did not enter the bird information yet, click the graphic button and punch in the bird information.
graphic
If you do not know some of it, you can edit it later using the Database module. For now, the minimum you want to enter is the bird's name and (for developmental recordings) the hatching date.



The second approach is to create or retrieve a table manually. This approach is straightforward, but remember that in this case SAP2 has no way of knowing the identity and information of the animal. You can use this approach to process data regardless of animal identity, adding data to the same table in processing order.

Batch Setting step 5

Step 5: What to save?

The save options allow you to choose between saving a raw features table, saving a syllable table, or saving nothing but sound (do not save). For bird R109, the default syllable table name is syll_R109.

graphic


The File Table: If you do not want to deal with it, leave graphic checked. For bird R109, the default file table name is file_table_R109.

Manual setting of the file table: The file table contains the list of processed file names, with indexes that are linked to the raw features table. It allows you to rebuild a syllable table (based on different segmentation criteria) from the raw features tables. By default, the file table is created automatically and its name is file_table_ plus the bird's name, but you can use whatever name you like, either by creating a new file table or by retrieving one created earlier. In most cases, you will want one file table per bird.
graphic

Naming Raw Feature Tables:    The simplest option is to leave the default checked

graphic

This will auto-generate a raw features table for each experimental (or developmental) day. For example, if the bird is called R109, the raw features tables for days 40, 41, and 42 will be called raw_R109_40, raw_R109_41, and raw_R109_42.

Manual setting of the raw features table: Depending on your sound data, you might want to uncheck the auto-create daily raw features table option, for example if you want to combine data from several days or across birds. After unchecking, click New Raw table or Change Raw table. Note that this raw features table will remain the same throughout the batch. It might become huge, though...

Step 6: setting playbacks


Go to the “Playbacks List” tab.
Setting passive playbacks: if you want playbacks to occur without an operant action, click Add at the top of list 1 or list 2 and use the popup menu to add sounds. When 'passive' is checked for list 1, the highlighted sound in list 1 will be played according to the 'playbacks criteria': click the 'playbacks quota' tab and check 'passive playbacks'. Then, for each channel you want to use for playbacks, check and set the odds and the playback hours as shown below.

graphic

Setting operant playbacks: If you installed a National Instruments digital IO card and appropriate detectors (e.g., keys) in the cages, you should now set the operant playbacks. For each channel you will see a playbacks setting menu, which corresponds to the graphic display of the keys (you can see them in the main tab when 'Train' is pushed down). As shown, the left circle is detector 1, a virtual display of the physical key located in the cage. Playbacks can be triggered by activation of a detector (such as a key or motion detector), or passively. The two lists can be filled with one or several wave files by clicking 'Add'. To remove a wave file from a list, click on the file and press 'Delete' (on the keyboard).
graphic
Note: if the key is blue, playbacks are not active. This could be because the playbacks list is empty or because a training session is not currently active. Click the 'playbacks quota' tab and set the training session times:

graphic

Testing the playbacks: There is a "Play" button below each channel that allows you to test whether playbacks really work. You can only click Play when the engine is started: go to the main tab and click "Start". Also, make sure that the volume (Vol) is turned to the right: graphic
  

Step 7: start recording & troubleshooting


Go to the main window and click "Start". If all is well, you should see the oscilloscope display moving for each channel, as shown below.
graphic

Let’s troubleshoot two cases:

1. Nothing happens, no oscilloscope: First, is graphic checked? Second, is there any oscilloscope display? If not, click "stop" and then click "long" graphic. This will allow you to look at the input channel:
graphic
Make sure that there are no duplicate channels and that the channel name is valid. Click "Display off", then click "scrolling" graphic, and then click "Start" again. If the problem is not solved, click stop, go to settings -> output selection, and click graphic. This will remove the output channel, which could potentially interfere with the recorder.

2. The oscilloscope is frozen: Click "Start" and look at the bottom right of the channel graphic. Are the numbers moving? If yes, move the Thr (yellow) slider to a higher value. If this does not help, it is likely that you have a hardware problem (microphone not connected?). If the numbers are not moving, the channel itself is not working - try restarting your computer.
  

Step by step clustering


Scaling syllable features


Scaling of syllables, based on 'maximum likelihood estimates', is the same as that used for similarity measurements at the level of frames and intervals (see section 8a).

Example case: bird109

Open the clustering module, open the table of bird109, and click 'Analyze'. The result should look like this:

graphic
Fig: clustering of the bird109 syllable table. Only the 10 most abundant clusters are shown; the Y axis feature is FM and the X axis feature is duration. The first and last file in the interval are indicated (file name annotation is useful!). 27% of the 3000 syllables did not pass threshold and are not clustered; in this case, all pairs that passed threshold were clustered.
Note that this representation is basically the same as a 2D DVD map, with a default of duration for the X axis and mean FM for the Y axis. Not all features are used for clustering: graphic By default, SA+ uses syllable duration and the mean values of pitch, Wiener entropy, FM, and goodness of pitch. Feature units are scaled to MADs in the display as well; as noted, proper scaling is essential for calculating meaningful Euclidean distances across features. For any one animal you might find biases, but overall, all clusters should live in the neighborhood of 4 MADs and have a mean spread of 1-2 MADs (averaged across features).

The colors identify the clusters, but the initial color identity is determined by the population of the cluster (how many members). Therefore, the color does not yet identify any cluster in the long run (only at the current moment). Shortly, we will discuss the techniques of marking (identifying) a cluster for tracing. The legend on the left allows you to pick a cluster or view some of its properties. For example, the most abundant cluster is painted red, and you can see near the red legend that this cluster has 606 member syllables. Once you have identified a cluster and clicked the appropriate legend, you should give the cluster an ID. The ID must be an integer number; type it in the edit box on top of the legend. Once you ID a cluster and start tracing it back, it will keep its original color (that is, we uncoupled abundance from color).

graphic

In addition to the threshold, SA+ presents the actual Euclidean distance cutoff of the most distant pair of syllables included in the analysis. The 'data included' results show the number of syllable pairs that passed that threshold. The upper bound for 3000 syllables is 3000 x 3000 = 9 million pairs; the threshold reduces this number to about 50,000 pairs. You can reduce it even further by moving the 'data included' slider to the left and observing the gray 'data included' display change as you go. Now click restart and observe the consequence of this action on the cutoff. This technique allows you to quickly test the consequences of changing the cutoff without re-calculating Euclidean distances (which is the time-limiting step). Note that the table of syllable pairs still has a lot of redundancy: the 50,000 pairs are extracted from no more than 3000 different syllables (and often many fewer). In fact, looking below the legend will show you that only about 2500 different syllables passed the threshold. A syllable in a 'crowded' area of feature space will participate in many pairs, whereas in a sparse area a syllable might have no neighbor close enough to join a pair. Also, remember that SA+ only analyzes the 10 largest clusters. If you want to cluster more than 10 types, you can do so exhaustively, as described later. You should be aware that filtering the table (removing clusters) is a non-linear operation with regard to clustering. That is, the results might change abruptly with filtering. In practice, this is more often a plus than a minus, since it can turn an unstable performance into a stable one.



graphicgraphic

Before we get into the tracing technique, let's explore the different displays that will help you judge how good the clustering is. Click on the 'all data' tab, then click the 'residuals' tab, and move back and forth between the cluster display and both of those displays. As you can see, most but not all of the syllables were clustered.

A careful look at the outcome raises a few questions about the clustering performance.

First, how come the yellow and green clusters were not joined into a single cluster? The answer becomes clear when looking at different projections of these clusters in feature space. Changing the Y axis to pitch shows that the two clusters overlap in their frequency modulation but not in their pitch.
graphic
Panels: Duration / FM; Duration / Pitch; Duration / Pitch residuals. Note those low-pitch residuals: cage noise!
Second, what sounds compose the 27% residuals? Looking at the residuals shows that some belong to 'sparse clouds' that have not been clustered. These 'clouds' are often long and short calls, which are less stereotyped and less abundant in our recordings (the lower abundance of calls is, in fact, an artifact of our recording method, which is intended to preserve song data and eliminate isolated calls). Other residuals belong to the category of cage noise - these are often characterized by low pitch and a broad distribution of durations, as shown in the example above. Finally, some residuals are leftovers of clusters - these can be reduced by using a more liberal threshold. Similarly, one can cluster a 'sparse cloud' of data by using a more liberal threshold.

You might ask: how can one decide what the threshold value should be? The answer is that the ideal threshold value is cluster dependent. If a cluster is very close to another cluster, too liberal a threshold will join them. The point is that you should not try to cluster all your data at once. Instead, the strategy we implemented is to cluster types one by one. This requires more work, but it gives you the freedom to set appropriate conditions that work well for each particular type of sound.


We will now start the process of tracing back syllables, but first let us illustrate some of the problems involved in trace-back. As noted earlier, the major issue is that the nature of the task changes as we step backwards in song development, since we expect clusters to eventually fall apart when we reach the beginning of song development. What we are trying to trace is, in fact, the structure of this 'falling apart' (or 'build up', when forward-tracking) process. During later stages of song development, we will typically observe movement of clusters in feature space. This process is easy to trace, since we have a complete recording of ontogeny and since most features change slowly compared to the size of the time slices we try to bridge across (typically, 3000 syllables occur on time scales of several minutes to a few hours). Even non-linear vocal changes, such as period doubling, will rarely cause problems, since other features of the syllable remain stable during this event. During early stages of song development, we often see clusters merge, since in almost every bird different syllable types emerge from a smaller number of prototype syllables in the process of 'syllable differentiation'. Detecting the point of transition is a challenging task.

Let's look at two clusters of bird 109, shown as yellow and green in the figure above. We noted that those clusters are close to each other: they have similar FM but different pitch, and there is also a slight duration difference between them. Move the 'Time control' slider 2/3 to the left and click 'Analyze'. Note that since we are not back-tracing, SA+ will make no attempt to re-identify clusters, so the colors will change arbitrarily (according to the number of members in each cluster).

graphicgraphicgraphicgraphic
Note that although we stepped back several weeks, the two images are similar: we can see that the blue cluster is still there, but stained yellow, and the red one has turned blue (and is somewhat larger). The problem is that the yellow and green clusters have merged - both are red now. The question is: is this a false merging, or something that the bird did? Looking at the raw data (right panel) shows clearly that the clusters are indeed merged. This example demonstrates some of the difficulties you might encounter while back-tracing - now let's try it.

Pull the Time control slider to the end of song development and click 'Analyze'. We will start with the easiest cluster - this one:
graphic
Now we need to tell SA+ that this is the cluster we want to trace, and we need to give it a permanent name. This name will appear in the appropriate records of the database table as we perform the procedure, unless you uncheck the 'write permit' check box (please do uncheck it!). Since this cluster appears blue, we check the blue radio button in the legend, and then on top of the legend we type the permanent cluster name. The cluster name must be an integer number, and we suggest starting with 1.


graphic
Number of members in each cluster
graphic

Now you should see that the track-back button (top) has become enabled. Click it.

Note that the 'Time control' did not take a step back yet - it only identified this cluster in the current slice and (if write is permitted) registered it in the table of bird 109, so that each occurrence of this cluster is marked as 1 (the default value for a cluster is 0). Now click track-back once more. Note that the Time control has moved a tiny bit to the left. The new image is from an earlier developmental time, e.g.,
graphic against graphic

Now click 'Repeat tracing back' and you will see that tracing back occurs automatically, step after step, until you click this button again - or until something bad happens…

Let's try to understand more formally what is happening here. SA+ did the clustering and you chose a cluster to trace; we will call it the reference cluster. When tracing back, SA+ performs a similar clustering on a slightly earlier time window. SA+ then computes the centroid of each cluster (that is, the mean duration of the syllables in the cluster, their mean mean-pitch, and so forth). Then, the centroid of each of those new clusters is compared to the centroid of the reference cluster. The cluster with the centroid most similar to that of the reference cluster is assumed to be an earlier version of that cluster - but only if it is similar enough to the reference. The default threshold for this comparison is 0.2 MADs (across all features chosen).
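In outline, the matching step might look like this sketch (the distance normalization and all names are illustrative assumptions, not SA+ internals):

#include <cmath>
#include <vector>

// A centroid holds the mean of each MAD-scaled feature used for clustering
// (duration, mean pitch, mean entropy, mean FM, mean goodness of pitch).
using Centroid = std::vector<double>;

double CentroidDistance(const Centroid& a, const Centroid& b) {
    double s = 0;
    for (size_t i = 0; i < a.size(); ++i)
        s += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(s / a.size());        // per-feature RMS distance, in MADs
}

// Returns the index of the cluster whose centroid best matches the reference,
// or -1 if even the best match exceeds the threshold (default 0.2 MADs).
int MatchReference(const Centroid& ref, const std::vector<Centroid>& clusters,
                   double threshold = 0.2) {
    int best = -1;
    double bestDist = threshold;
    for (size_t i = 0; i < clusters.size(); ++i) {
        double d = CentroidDistance(ref, clusters[i]);
        if (d < bestDist) { bestDist = d; best = (int)i; }
    }
    return best;
}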

Tracing this cluster should work very well for several weeks of song development, but eventually, it is doomed to fail.

graphicgraphic

You will need to define the cluster again, based on its location (change the radio choice in the legend to the appropriate color to re-activate the 'trace-back' button). Then keep tracing back; with some 'playing around' you should be able to trace it back until August 10 or so, which is 3 days after the onset of training.

Try to trace other clusters of bird 109. You will find some of them easy and others trickier. For example, this yellow cluster will cause frequent trouble by merging with the one below it:


graphicgraphic
To solve such problems, your first line of defense is decreasing the Euclidean distance threshold, e.g., to 0.01:
graphic

To check quickly whether a threshold reduction can solve a problem, click analyze and then click the 'data included' left arrow followed by 'restart'.
graphic


graphic
This approach also fails from time to time - but do not give up: reduce the threshold to 0.008 to regain hold, back-trace once, try 0.01 again, and then auto-trace until the next failure. By the time you approach the beginning of September, tracing this cluster becomes really difficult.

This experience might (and should) have raised some concerns about the objectivity of this clustering method. Indeed, one would like to be able to set the parameters once, rather than keep playing with them. The reality of cluster analysis, however, is that one often needs to adjust parameters. We suggest that you document your adjustments and also run the tracing more than once in such difficult cases. In this particular cluster, the problem is that the only good distinguishing feature is pitch - all other features of these two clusters overlap. Trying to distinguish between them can only work for some time; as the pitch values approach each other, the mission becomes impossible. Furthermore, you pay a toll of a high percentage of residuals.

The solution is therefore to also cluster the two clusters together, and to consider the time when they are separated as two descendant clusters of a main branch (as in a dendrogram).

To do this, move the time control to the end of song development, return the threshold to 0.015, and uncheck 'mean pitch' in the features included. Check 'write permit' but uncheck 'overwrite clusters'. Now give the joined cluster a different name (say, 2) and click 'analyze'. You will see that the two clusters immediately join: when pitch is not taken into account, the features they have in common prevail.
  

Symmetric comparisons


An example of symmetric comparison

graphicgraphic
We will now use the same example presented above to demonstrate the effect of symmetric scoring, starting with an example that demonstrates the weakness of this approach. Keeping the settings of 8E, go to the 'Similarity' page, click 'Symmetric', and click score:

graphic

As you can see, the similarity is estimated from the beginning of the two sounds. Only sections are calculated, and the comparison is confined strictly to the 45° diagonal of the matrix, except that it has some (10 ms) thickness. This thickness can be manipulated by moving the 'min dur' slider.



graphic

Note that reversing the order of Example 1 and Example 2 has only a minor effect on the similarity values (including % similarity).

To get the symmetric comparison to behave properly in this case, we need to manually segment the syllable in Example 2.

So, when should one use symmetric comparisons? These measurements are the best choice if we know in advance which sound units should be compared and if we do not categorize one sound as a model for the other. For example, comparing two calls of the same type across different members of a colony calls for symmetric scoring. Also, once we have identified a cluster of syllables, we might want to measure the similarity across sounds of this cluster; here too, a symmetric approach is most appropriate. SA+ provides a method for automatically identifying clusters across files and scoring their similarity (allowing batching of many thousands of comparisons; see batch operations for details).

  

Syn-Song DVD maps


The Syn-Song mode


graphic

This mode combines visual and auditory effects to represent the moment-to-moment syntax changes, as well as the syntax 'musical rhythm'. Make sure that your computer's speaker volume is up. Move the 'Time control' about 2/3 to the right, click 'DVD Play/Stop', then click on the 'Dynamic movies' tab and select the Syn-Song mode. In this mode you can see the short-term syntax transitions accompanied by clicks of different pitch. The pitch is determined by the Y axis location of each current syllable. After some listening, try accelerating the movie by manipulating the position of the 'speed' slider. Next, try changing the features observed on both the X and Y axes. As you can hear, the Syn-Song mode generates a sort of 'synthetic song' out of the syllable tables. Changing the rate of this 'song' allows your mind to integrate across time scales that are way beyond your integration capacity for song heard in real time, and the combination of visual and auditory stimuli helps as well. You can manipulate the 'memory' of the observed trajectory by changing the interval duration.


Also try different combinations of features. Remember that the tones change only with the Y axis, so each feature selected for the Y axis will give a different 'music'.
graphic graphic

  

Syntax DVD maps


The syntax mode


Click on the 'Dynamic movies' tab and select the syntax mode. Note that the interval and slide positions have changed, and that 'Draw lines' is now checked.

graphic
Move the time control to the later stages of song development, select duration for the X axis and pitch for the Y axis. Set the color scheme to user defined and select red. Then click 'DVD Play/Stop'. Instead of looking at the syllables as dots, we now look at the trajectories that connect them in time. That is, the order of singing the song syllables is now represented by lines that connect those syllables. When a short syllable is followed by a longer syllable, SA+ paints the line red, and when a long syllable is followed by a shorter one, the line is blue. It is immediately apparent from the movie that the song is stereotyped; however, it is also easy to see drifts in the pattern. As shown by the black arrows (these are not in the movie), the shape equivalent of this song is a triangle. Each projection (using different features) will give a different view of this syntax pattern; obviously, some projections are nicer than others.
graphic

graphic
graphic


We can now explore the syntax development of this song. Moving the slider back to the original setting, duration versus FM, shows nice transitions of syntax structure during development, as exemplified in the figure to your right. Note how the blue and red 'streams' separate during development as the third song syllable appears, turning the shape into a triangle.
graphic
Now say that you want to observe the possible effect of circadian time on song syntax: selecting 'color by time of day' will present the trajectories in circadian colors, indicating that for some features (e.g., pitch), evening (blue) and morning (red) trajectories differ during early development.

graphic
Note that although we call this 'syntax mode', what you actually see is not purely syntax, but a combined display capturing both syntax and feature (sort of phonetic) changes. We find this representation particularly useful because there are reasons to believe that changes of feature structure (within a syllable) and changes of syntax might be linked. For example, a prototype sound can give rise to two different types of sounds (see details in Tchernichovski et al 2001, sound differentiation in situ).

This is a good time to elaborate on the relations between the observed clusters and the actual syllables produced by the bird. The clusters are at best as robust as the segmentation method, and unfortunately, our current methods of segmentation, based on amplitude and Wiener entropy thresholds, are not always robust. The most common type of measurement noise is inconsistency in the segmentation of short attenuations. For example, if the 'real' pattern of syllables is ABC,ABC,ABC… and the pause between B and C is very short, SA+ might sometimes join B and C to give an additional 'combination cluster' that we shall call D. Hence, what we observed as ABC,AD,AD,ABC should actually be ABC,ABC,ABC… This problem, once detected, is not too difficult to solve. For example, we can re-segment the data (using the binary files rather than the raw sound files) using more aggressive criteria for partitioning sounds. Future versions of SA+ will include more sophisticated methods for addressing these issues. Methods of detecting 'combination clusters' are described in chapter 7.
  

The Animals Table

graphic

  

The Channels Table

The Channels Table is an automatically generated table that contains the state of each SAP2 Recorder channel. You do not need to know much about it.
  

The Control Panel

The SAP2 recorder control panel

graphic

The Control Panel allows you to start and stop the engine, to change the overall appearance of the recorder, to switch between oscilloscope display modes, and to save the overall configuration:
graphic

  

The feature_scale table

The feature_scale table is an automatically generated table that contains one or several schemas for scaling features. Features are scaled, based on their distributions, for several purposes, including similarity measurements. We are interested in two types of distributions: the distribution of raw features (pitch, FM…) and the distribution of syllable features (mean_pitch, mean_FM, var_pitch, etc.). To scale features, we need to know the central tendency and width of the distributions in a representative sample of data (e.g., the songs of 100 zebra finches). We use the median and the median absolute deviation (MAD) of each feature to characterize those distributions. SAP2 comes with zebra finch settings, and users can easily add settings for other species, song types, etc.
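
The arithmetic of MAD scaling is simple; a sketch (an illustration only, not SAP2 code):

#include <algorithm>
#include <cmath>
#include <vector>

// Median of a sample (copied, so the caller's data is left untouched).
double Median(std::vector<double> v) {
    std::sort(v.begin(), v.end());
    size_t n = v.size();
    return n % 2 ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

// Scale one feature value to MAD units relative to a representative sample:
// (value - median) / MAD, where MAD is the median absolute deviation.
double ScaleToMad(double value, const std::vector<double>& sample) {
    double med = Median(sample);
    std::vector<double> dev;
    for (double x : sample)
        dev.push_back(std::fabs(x - med));
    return (value - med) / Median(dev);
}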
  

The File Table

The File Table: A file table is generated each time a live or batch module creates syllable tables or raw features tables. The file index then appears in those feature tables as an index into the file table, which tells you more about the identity, location, and attributes of each of the sound files. As mentioned earlier, the File Table is an important utility that allows you to query arbitrary subsets of your raw data from any set of processed data. Here is the File Table structure:

graphic


Note that the file_age and the bird_age can be used to design queries for retrieving subsets of data. However, the file age is a bit less accurate than the millisecond serial-number time stamp. That is, the number of days elapsed since 1900 is accurate, but the fraction of the day is not based on the number of milliseconds elapsed since midnight; it comes from the DOS time stamp, with an accuracy of about 2 seconds - more on this in the xxxx.
  

The Key-peck table

The Key-peck table is an automatically generated table that contains information about the activation of detectors during operant song training. If you are using operant playbacks, this table is used by the recorder to save events of key activation.

graphic

The main controls

Starting the automated sound processing for the first time



When you click the 'start' button, SA+ starts monitoring the input addresses to detect new sound files. Then SA+ processes each sound file separately: it first performs spectral analysis and feature calculation, then segmentation and computation of syllable features. Based on those measures, it 'decides' whether to save or discard the data. A decision to save means that 1) the sound file will be moved to a permanent storage address, 2) the raw features file will be created and saved in a permanent storage address, and 3) the detected syllable features will be saved by adding records to the syllable table of this animal.

Vocalization or noise? The Sound Analysis Recorder has already eliminated all the 'empty' sound streams from the recording channel based on simple amplitude filtering. We now need to make an accurate determination, for each time window, of whether it contains animal vocalization or not. SA+ uses several criteria to make this determination, and here we only discuss (in a nutshell) the two central controls: the amplitude and Wiener entropy thresholds. The yellow slider at the left of each spectral display controls the amplitude threshold, whereas the red slider controls the Wiener entropy threshold. In other modules of SA+ you can change these controls 'live' while SA+ interactively updates the display and the calculation of features. However, this 'live' updating is not available in the 'live' mode, since in this mode SA+ keeps bouncing from processing one sound to the other. This means that whatever change you make to the sliders, you will only see its outcome the next time this channel processes a sound file. We therefore recommend that you first make yourself familiar with how these sliders work in the 'Explore & Score' module.

Setting the amplitude threshold:

graphic

  

the main window


The Main Processing Live window

The control panel includes a status indicator: red = not ready, green = ready. If all is okay, you should see the message "Ready" next to the green indicator.

graphic

If this did not happen, the most common reason for the delay is a large number of "waiting files" in the input folder.

The pushdown buttons on the right side of the control panel determine which features can be seen on top of the derivative images. By default, you can see Wiener entropy (red) and amplitude (yellow). Note that you can only see the features when the sound amplitude is above threshold (controlled by the yellow slider on top of each channel display).

The Processing Live module has many state parameters (e.g., slider positions). To save their values (for the next time SAP2 opens), click on the floppy disk icon.



Each channel shows the name of the bird (in the example below, "nancy"), and you should see the same name on top of the recorder channel as well. The sliders are as in the Explore & Score module, but located above the channel display, with red = entropy threshold, yellow = amplitude threshold, and black = display contrast. Note that changes to those slider positions take effect for the next sound.

graphic

The metric system


An FFT data window, or frame, is a short (~10 ms) interval of sound, which is the unit of multitaper spectral analysis. The spectral structure of each frame is summarized by measurements of five features: pitch, FM, AM, Wiener entropy, and goodness of pitch. Each of these features has different units and a different statistical distribution in the population of songs studied. To arrive at an overall score of similarity, we transform the units of each feature to units of statistical distance. One can transform the units of pitch, for example, from Hz to units of standard deviation. Instead of SD we use a similar (and sometimes better) measure of deviation called the MAD (median absolute deviation). We can then compute Euclidean distances across all features. A similar procedure can be used to compare larger units of time, which we shall call intervals. SA+ uses two methods to estimate Euclidean distances across intervals.


Euclidean distances across mean values: given two intervals, A and B, we first calculate the mean value of each feature, and then compute Euclidean distances across the mean features, just as we would for a single frame. For example, consider two intervals of 3 frames each and (for simplicity) a single feature: A = [10, 20, 30]; B = [30, 20, 10]. Averaging across frames gives mean(A) = mean(B) = 20, and obviously the Euclidean distance is 0. That is, this approach looks at the overall interval, allowing local differences to cancel each other.
 
Euclidean distances across time courses: given two intervals, A and B, we compute the differences across pairs of frames, A1 against B1, A2 against B2, and so forth, and combine them across all pairs. For the same example, A = [10, 20, 30]; B = [30, 20, 10], the Euclidean distance will be sqrt((10-30)^2 + (20-20)^2 + (30-10)^2) ≈ 28.3 MADs.
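
The two computations side by side, for a single feature and equal-length intervals (a sketch of the arithmetic above):

#include <cmath>
#include <vector>

// Distance between the interval means: local differences can cancel out.
double MeanValueDistance(const std::vector<double>& a,
                         const std::vector<double>& b) {
    double ma = 0, mb = 0;
    for (size_t i = 0; i < a.size(); ++i) { ma += a[i]; mb += b[i]; }
    return std::fabs(ma / a.size() - mb / b.size());   // 0 for the example above
}

// Distance across the time courses: every pair of frames must match
// to obtain a zero distance.
double TimeCourseDistance(const std::vector<double>& a,
                          const std::vector<double>& b) {
    double s = 0;
    for (size_t i = 0; i < a.size(); ++i)
        s += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(s);                               // ~28.3 for the example above
}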


As shown, when we compare single frames it is not unlikely to obtain small, or even zero, distances; but when comparing time courses, a distance of zero requires that all the pairwise distances be zero. Hence, when examining the cumulative distribution of Euclidean distances for the two methods in a large sample of sounds, the two methods give different results:

graphic
Left: cumulative distribution across mean values. Right: cumulative distribution across time courses.
This difference has a very practical implication when comparing songs: the time-course approach is good for detecting similarity between two sequences of features that show similar curves of feature values. Note that moving an interval even by a single frame changes the entire frame of comparison. By comparing all possible pairs of intervals between two sounds, we can detect the rare pairs of intervals where the sequential match between all (or most) frames is high. Euclidean distance across mean values achieves exactly the opposite: dependency between neighboring intervals is high, and we are looking for high similarity between distributions regardless of short-scale differences.
Note: The difference between these approaches applies to other SA+ modules as well: for example, the syllable table is based on mean and variance feature values calculated for each syllable, and hence all the table-based methods (DVD maps, cluster analysis) are based on Euclidean distances across mean values. Therefore, when we identify a nice cluster of syllables, we should not assume that similarity measurements based on Euclidean distances across time series will show high similarity across members of the cluster. In fact, current findings suggest that birds stabilize the overall (mean) values of syllable features at a time when the frame-to-frame feature values are still dissimilar across syllables.
  

The NIDAQ table

The NIDAQ table: This automatically generated table maintains the identity of the National Instruments detectors and effectors, namely, the configuration of the National Instruments ports and channels used for operant playbacks (for more information, see the installation and SAP recorder chapters).
graphic

The Raw Features Table


graphic
SAP2 performs spectral analysis and calculates acoustic (and other) features across partially overlapping spectral (FFT) windows. By default those windows are 9.27 ms long, and we advance the window 1 ms in each step. Namely, we start by calculating features in the first interval of the sound file (1-9.27 ms), then we calculate features in the interval 2-10.27 ms, 3-11.27 ms, and so forth. The raw features table presents those raw features as records. Let's make an example: open SAP2, click "Explore & Score", and then at the bottom right check "save raw features". In the popup window type "raw_features1" and click Ok. Now click "open sound" and open the sound "example1.wav". This sound is about 820 ms long. Next open the MySQL Control Center (if it is already open, right click the database SAP and click "refresh"). The new table you just created, raw_features1, should show up in the list of tables in SAP, and it should include 812 records, one for each 1 ms in the file excluding edges (note that the number of records is determined by the "advance window", not by the FFT window size!). Double click this table and you should see a display like this:

graphic
 
Scroll down a bit:

graphic

Note that the 'time' and 'file_index' fields have a little key above them. Those fields form the 'primary key' of the raw features table. Primary keys must be unique, namely, the table cannot include a duplicate of the same time within the same file_index (each file_index identifies a wave file, and indeed, the same time should not occur twice within a file). The primary key is an index used by MySQL to sort the table and to accelerate access to different locations in the table.

We will now go through the fields:

time displays the time of day in units of 0.1 milliseconds elapsed since midnight. For example, the first number, 391,740,010, can be divided by 864,000,000 (the number of 0.1 ms units in a day) to obtain the fraction of the day elapsed, about 0.453. In units of hours (24 x 0.453), the time is about 10.9, i.e., close to 10:53 AM. In other words, we are saying that this sound was recorded at about 10:53 AM - how can we tell? In this case (the default mode of the "Explore & Score" module) the time was retrieved by the Windows API utility FileAge(). FileAge() has the advantage of returning the time when the original file was created; it remains unchanged in any copy of the original file, which is nice. There are two issues you should keep in mind regarding FileAge():

1.    If your original data are not wave files (e.g., mat files), then the generated wave files will carry a meaningless time stamp. In such cases, the solution is to generate file names with the appropriate annotation (see section 6b) and then instruct SAP2 to extract the "time" from the file name. To change SAP2's method of extracting time information from wave files, go to options -> input & output settings, and check the appropriate "extract time stamp" option.

2.    The Windows time stamp is only accurate to about 2 seconds. That is, our time estimate can be very accurate within each file, but across files we have considerable time jitter. In the SAP2 recorder, we overcame this limitation by implementing an accurate millisecond counter. The accurate millisecond count is displayed in the recorder-generated file names, and the sound processing live module then uses this information to produce raw features tables with 1 ms accuracy across files. Note that raw features tables generated using the new recorder are indistinguishable in their time format from raw features tables generated using Explore & Score or Batch - it is the user's responsibility to keep track of the cross-file accuracy of the time field in any given data set. For example, all the song development data generated in our lab were, unfortunately, created prior to SAP2, and therefore our cross-file time field is of 2-second accuracy; there is nothing we can do about it.

Note that the raw features table does not contain any information about the date, only the time of day. The file_index field (see below) allows you to retrieve that information if needed; however, in most cases you will not need to: when the raw features table is generated in "live recording", SAP2 creates one raw features table per day for each bird. The (automatically generated) name of the daily table (e.g., b109_76) will tell you the identity of the bird and its age (bird 109 on day 76 post hatch).
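
Decoding the time field is a two-step division; a sketch:

#include <cstdio>

// Decode the 'time' field (0.1 ms units elapsed since midnight) into a
// clock time; 864,000,000 is the number of 0.1 ms units in one day.
void PrintClockTime(long long timeField) {
    double dayFraction = (double)timeField / 864000000.0;
    double hours = dayFraction * 24.0;
    int h = (int)hours;
    int m = (int)((hours - h) * 60.0);
    std::printf("%02d:%02d (%.3f of the day)\n", h, m, dayFraction);
}

// PrintClockTime(391740010) prints "10:52 (0.453 of the day)"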

The file_index field points to an automatically generated File Table (see below), which provides several details about the wave file, making it easy to link the features to the sound data.

The features fields: In order to minimize storage space (raw features tables are huge) and decrease computation time (for some features), we encoded some of the features in units that allow efficient storage and quick calculation. Here is a list of those encodings, together with the decoding (namely, the procedure that will transform the features back to their appropriate units):
  
Feature                      Original units      Raw features units     Decoding
Amplitude                    dB                  dB                     none
Mean frequency amplitude     dB                  dB                     none
Pitch                        Hz                  Hz                     none
Mean frequency               Hz                  Hz                     none
FM                           degrees (0-90)      degrees x 10           /10
AM                           1/t                 1/t x 100              /100
Goodness of pitch            none                none                   none
Wiener entropy               none                x 100                  /100
Principal frequency          Hz                  Hz/43 - 120            +120, then x 43
Persistent frequency         Hz                  Hz/43 - 120            +120, then x 43
Slope                        Hz/ms               Hz/ms - 120            +120
Continuity over time         milliseconds        milliseconds x 100     /100
Continuity over frequency    Hz                  Hz x 100               /100


A wave file of, say, 10 s contains 441,000 samples of sound (the sampling rate is 44,100 Hz). Each sample is a 16-bit number. When we analyze this file and save a raw features table, the number of records is only 10,000 (the number of milliseconds of sound we analyzed). However, each record contains several numbers, and therefore keeping the number of bits per record reasonably low saves much storage space. The field types we chose, together with the simple encoding of units described above, reduce the size of the raw features tables to about one third of the raw data.
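
Following the encoding table above, decoding stored values back to their original units looks like this (the field names are illustrative):

// Decode stored raw-feature fields back to their original units,
// following the encoding table above.
struct RawFeatureRecord {
    int fm;                  // stored as degrees x 10
    int am;                  // stored as (1/t) x 100
    int entropy;             // stored as Wiener entropy x 100
    int principalFrequency;  // stored as Hz/43 - 120
};

double DecodeFM(const RawFeatureRecord& r)      { return r.fm / 10.0; }
double DecodeAM(const RawFeatureRecord& r)      { return r.am / 100.0; }
double DecodeEntropy(const RawFeatureRecord& r) { return r.entropy / 100.0; }
double DecodePrincipalFrequency(const RawFeatureRecord& r) {
    return (r.principalFrequency + 120.0) * 43.0;      // (value + 120) x 43
}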

The settings table

graphic

This is an automatically generated input/output table that in most cases you will not need to access directly, but when encountering "strange behavior" of SAP2 it might be useful to know something about it. The Settings table replaces the "birds" table used in previous versions of SAP. There is only one "settings" table, and it has a major role in SAP2 "remembering" its previous state upon restart. However, not all states are captured in Settings: the states of the recording channels (of the recorder) are stored in a table called Channels, and the settings of keys and other digital devices (if present) are stored in the NIDAQ table. The Settings table has 12 records (one record per recording channel): the first record stores the settings of the single-analysis modules, such as Explore & Score and the Feature Batch (you may think of it as 'channel 0'). The other 11 records are used for the real recording and processing channels of the recorder and of the live processing modules. The Settings table communicates with all the SAP2 modules, maintaining bird IDs, ages, recording and segmentation parameters, playbacks, etc.
  

The Similarity Table


graphic

The Similarity table contains the results of similarity measurements. Its fields include the names of the files that contain the sounds compared, the starting point of each comparison, and the results of the similarity measurements using the three measures: similarity, accuracy, and sequential match. In addition, it contains fields to save the KS statistic and the period of repetition (these are not always calculated).
  

The Syllable Table

graphic



The syllable table is perhaps the most useful table that SAP2 generates (for example, see DVD maps). What makes the syllable table so useful is that it captures much of the dynamics of vocal change at the syllabic level during an entire development or an entire experiment, all in a single file. It allows you to cluster syllables into types, watch how syllable types 'move' in their feature space until a model match is achieved, and it provides an extensive summary of the features of different syllables, namely a summary of their structure. In fact, you already created a syllable table in the previous chapter and exported it to Excel. Let's now repeat this exercise with a few variations: open SAP2 Explore & Score. Under the 'data management' title click 'new table' and create the table 'syll1'. Open the sound 'example 1' and then select 'Auto segment & save all syllables'. Outline the song as shown below:
graphic

Note that the number of records in the table is now 4, and as you can see, this is not a particularly robust segmentation (see red arrow). We will discuss the issue of sound segmentation later on; for now just keep in mind that song segmentation is not a trivial task. Next open the sound 'example 2' and again outline it. You should now have 9 records in the syll1 table. Let's look at them in the MySQL Control Center. Because syllable tables contain many fields, we capture the table in 3 snapshots:

graphic

graphic

graphic

recnum is the global index (and the primary key) of the syllable table. SAP2 uses recnum to retrieve data epochs from the table, e.g., when displaying a DVD map. Also, recnum tells you in what order the syllables were originally inserted into the table (and this order should match the order of the serial number, the date & time 'age' stamp, of each syllable).

serial_number is the date & time stamp of each syllable, and it is identical to the age of the sound (wave) file that the syllable belongs to. As in the file name encoding, the integer part is the number of days elapsed since 1900. The fraction, however, is not the milliseconds elapsed since midnight but the fraction of the day elapsed (both numbers are given by the 'FileAge' Windows function).
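
For example, the serial number of the syll1 table we just created can be unpacked into days and an hour-of-day directly in SQL; the arithmetic below simply follows the encoding described above:

     select serial_number,
            floor(serial_number)                         as days_since_1900,
            (serial_number - floor(serial_number)) * 24  as hour_of_day
     from syll1
     limit 5;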
 
bird_ID is a number that identifies the bird. In Explore & Score it is zero by default, but in the Live mode it corresponds to the bird's name. In Batch mode, the user is always asked to provide a bird ID.

start_on is the time (in milliseconds) within the sound file where the syllable begins. It is an important feature that allows SAP2 (or you) to accurately identify the syllable in the raw data. For example, SAP2 can perform batch similarity measurements within and across syllable types, automatically retrieving sound files and outlining a specific syllable according to start_on and duration.
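
A sketch of such a retrieval query (file_name and duration are fields of the syllable table, as described below):

     select file_name, start_on, duration
     from syll1
     where duration > 100      -- e.g., only syllables longer than 100 ms
     order by serial_number;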

The feature statistics fields store first- and second-order statistics of each syllable: its duration, the mean of each feature, the minimum and maximum of each feature, and the variance of each feature.

Date & time fields: albeit redundant, it is often convenient to have date and time (month, day of month, hour, etc.) represented separately in fields, so as to allow simple querying of data subsets.
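
For example, pulling out only the syllables recorded during one morning becomes a one-liner. This assumes the fields are named month and hour, as in the example used in the SQL chapter:

     select * from syll1 where month = 6 and hour < 12;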

file_name is the name of the sound file that the syllable belongs to.
  


An example of table structure

Here is a summary of the features, as they appear in a raw features table, including neuronal features (the s1_ to s4_ fields):
time: time elapsed since midnight, in 0.1 ms units
file_index: an index to the wave file (linked to the bird's file table)
amplitude: the momentary sound amplitude in dB
mean_frequency_amp: the amplitude in the 500 Hz range around the mean frequency
pitch: the momentary pitch
mean_frequency: the center of gravity of the power spectrum
FM: frequency modulation (0-90 degrees)
am: amplitude modulation
goodness: the goodness of pitch (cepstrum peak value)
entropy: Wiener entropy
principal_frequency: the longest frequency contour
slope: the mean slope of contours
continuity_t: continuity of contours over time
continuity_f: continuity of contours over frequency
s1_amplitude: the squared amplitude of the neural signal
s1_entropy: the Wiener entropy of the neuronal power spectrum
s1_PeakFr: the peak frequency of the neural power spectrum
s1_PeakFr_power: the power at the peak frequency (500-8000 Hz)
s1_highpass_power: the amplitude of the high-pass filtered signal (500-8000 Hz)
…
s4_highpass_power
 
SAP2 can perform several different tasks and our database structure is shared across those tasks.
  

The Tasks table

The Tasks table stores information about playback tasks (not regular key-pecks, which are stored in the Key-peck table).
  

The Wave Files
 
All the raw data generated by SAP2 are saved as wave files with the following file-naming format:

graphic


When multiple channels are recorded (in master-slave mode), e.g., when recording electrophysiology data, the master channel (typically the sound) has the name annotation shown above, and the slave channels have the same name as the master but starting with Sn~, where n is the channel number (e.g., S3~109_39013.... is the third slave channel of this bird, in this case an electrode located in RA).

When SAP2 records & processes data from a certain animal (live recording), it identifies vocalization events (e.g., a song bout) and saves each into a wave file. Over a few days of recording, the number of these files might reach tens of thousands. Files are automatically organized into daily folders. The name of each folder is the age of the bird; e.g., a folder named 62 contains files recorded from bird 111 when it was 62 days old. The age is calculated by subtracting the hatch date of the bird from the date of the file (the hatching date is automatically retrieved from the Settings table).

Having many relatively short files could be difficult to maintain, but SAP2 provides, in addition to the folder and name annotation, an important utility called the File Table. The File table contains information about all the files recorded for a certain bird, and it can be used to query an arbitrary subset of the data - e.g., the first 100 files of the morning. This is why it is convenient to keep the SAP2 raw data in small packages.
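
A sketch of such a query; the table and field names here are assumptions, so check your bird's file table for the actual names:

     select file_name
     from file_table
     where hour < 12            -- morning files only
     order by serial_number
     limit 100;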

The Bouts table

The Bouts table contains the duration of the longest bout in each recorded file. It is used in the recorder's self-feedback training, so as to pick playbacks from a recent file that contains the longest song bout (so as to avoid playing back too much noise); see the sketch after the field list. Fields include:
recnum: the record number (auto-increment)
bird_name
bout_duration: in milliseconds
file: includes both file name and folder
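
A hedged sketch of how such a pick might look in SQL, using the fields listed above; the table name bouts is an assumption, and the self-feedback logic inside SAP2 may differ:

     -- among the 20 most recent files, pick the one with the longest bout
     select file, bout_duration
     from (select * from bouts order by recnum desc limit 20) as recent
     order by bout_duration desc
     limit 1;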

time course versus mean values


An example of time-course versus mean values comparison

Using the same example sounds (examples 1 and 2) outline sections as shown below and score asymmetrically, first using the time course method and then using the mean-values method.  The results should look like this:

graphic
Below each comparison we also show the sections (click graphic). Note that although we did not change the threshold p-value, the sections are very different in shape and size. The time-course method gives narrower sections that capture the sequential (diagonal) match, whereas the mean-values method shows big blobs of similarity.

In both cases we see that the top section is surrounded by a red rectangle, indicating that the final section is identical to the original similarity section. This is not the case with the bottom section. You can see how the original section (blue) was modified (red) so as to trim the redundancy with the top section -- which is also the superior one. It is superior in the sense that it explains more similarity even though its local score is lower: multiplying the duration of the top section by its local score gives higher similarity than that of the lower section. That is, SA+ takes into account the overall similarity explained by each section.

Note also that although the shape of the sections is very different, the oblique cuts through the sections are similar across methods, hence the overall score and the partial scores are very similar. You should examine the matrix shown in the combo button quite often, since it will help you understand what’s going on ‘behind the scenes’ of the similarity measurements. The white cross shown on top of some of the blue rectangles signifies that the section was excluded because of redundancy.



A bird view of Sound Analysis Pro 2

graphic

Sound Analysis Pro II (SAP2) is a software & hardware system designed specifically to manage the acquisition and analysis of animal vocalization. SAP2 eliminates much of the effort involved in maintaining long-term vocal learning experiments, allowing automated acquisition and analysis of large amounts of sound data, scheduled training, and on-line monitoring of the behavior of several animals simultaneously. It is open-code freeware with several options and extensions that can be implemented. The data acquisition component continuously monitors sounds and performs online sound analysis to recognize and record sounds of a specific category (e.g., birdsongs). The training component performs fully automated operant training with song playbacks and provides an on-line summary of vocal changes. By integrating four engines - recording-control, training-control, sound-analysis and database - SAP2 can record and analyze all (and little but) the relevant data during a prolonged period (e.g., throughout the vocal development of a bird).

SAP2 integrates online and offline analysis methods on large amounts of sound data, handling millions of sounds and summarizing them into simple graphs, histograms and movie clips. Sound similarity measurements are now more reliable and faster, with a few alternative methods to fit specific tasks. Finally, although we provide no formal technical support or liability, we made SAP2 an open-code, GNU General Public License freeware in order to encourage users to actively participate in this project, and to eventually develop standards that will enhance cross-lab studies. Hence, users are strongly encouraged to contact us when encountering problems (ofer@sci.ccny.cuny.edu) and we will make efforts to respond and solve the problem as much as we can.



Feature summary of SAP2:
SAP2 can be installed on any Windows XP computer and should work on most Windows 2000 computers. Other operating systems are not supported. Hardware requirements depend on the application (e.g., you will need a multi-channel sound card (or a National Instruments analog card) to perform multi-channel recording; see details in the installation section). Here and in the rest of this manual, the SAP2 features are presented in the order of data-flow, namely, from the recording and training setup, through real-time and nearly-online sound processing, to offline analysis, similarity measurements and descriptive models of the sound data. Each one of the modules described below is self-contained and can be used regardless of (and in parallel with) the other modules. Behind the scenes, however, modules interact with each other and share common resources. In most cases, running a few instances of SAP2 in parallel will not cause problems.


graphic

1. Sound Analysis Recorder   The recorder performs multi-channel (up to 10, in the current version) triggered recording, continuous recording, and master-slave synchronized recording of sound (or any other time-varying signal), monitoring of other behaviors (such as pecking on a key, or movement) and training with sound playbacks. It manages 10 input and four output audio channels simultaneously (and can be extended to handle an arbitrary number of channels*). It analyzes the sound signal of each channel in nearly real-time and performs a first-pass filter on the sound data to discard long silent intervals and some types of cage noise. It then transfers the sound data to wave files that contain sound intervals that are likely to include animal vocalization. Those files are immediately captured by the Live Sound Analysis or by the Live Sound & Brain module (see below), which performs the MT analysis and feature extraction (see below). The SA Recorder includes a fully automated operant training system that continuously interacts with the bird, monitors its behavior (e.g., when the bird pecks on a key), and responds with an appropriate playback or by activating peripheral devices when so indicated. The training regimen is fully automated and adjustable, including:
·     automated onset and termination of training on specific dates,
·    alternating song models,
·    setting daily quotas of playbacks and
·    saving the training results into the database.

There are no special hardware requirements for recording (except for a sound card with an appropriate number of channels), but for the operant-training setup you will need a low-cost (about $170) digital I/O card. The training system can be extended to include delayed auditory feedback or the sound-activation of devices such as a light bulb, fan, or any other on/off gadget*. Detailed instructions on how to build a training system from scratch are provided in chapter 2A.



graphic

2. Live Sound Analysis   This module is the analysis companion of the Sound Analysis Recorder. It processes sound data that passed the recorder thresholds and performs online multi-taper (MT) spectral analysis, followed by calculation of acoustic features and by segmentation into syllable and bout units. The graphic user interface (GUI) makes it easy for the user to design a scheme of parameter settings that captures the kinds of sounds that are of interest and to record them (e.g., to save and process zebra finch songs but not single calls or cage noise). To achieve this, we designed a 3-stage decision process:
·    Animal-voice recognition
·    Segmentation to syllables
·    Analysis of temporal structure of syllables.
All the calculations are translated into a simple graphic representation, displayed in nearly real-time, so that the user can test how any parameter setting affects the outcome of each phase of the recording session. Once SAP2 'decides' that sound data should be saved, it can save data of three types (we recommend that you save them all):
·    The raw wave file containing the sound,
·    A millisecond by millisecond acoustic features table (ms table)
·    A syllable table containing a set of features that summarizes the acoustic structure of each segmented syllable (e.g. its duration, mean pitch, mean FM, etc.). 
SAP2 keeps data organized not only in the tables, but also by using a consistent file annotation template including animal ID, serial number, and date and time of recording. Data can be saved either to local hard disks or through the network to any other network-accessible PC. To make data backup easier, files are automatically arranged into data folders of appropriate capacity, e.g., daily folders.



graphic

3. Live Sound & Brain   This is an alternative companion of the Sound Analysis Recorder, designed for experiments where brain or peripheral data are collected in synch with each other and with the vocalization recording. In most cases, the top channel of the Sound Analysis Recorder is set to be a 'master' and all other channels are slaved to it, such that the master channel (typically the one that records sound) pre-triggers synchronous recording in all channels, keeping them fully synchronized to the master channel and to each other. Then, the Live Sound & Brain module analyzes the master channel as a regular sound channel, calculating pitch, FM, Wiener entropy, etc., whereas all other channels are analyzed as specified by the user (e.g., rectified amplitude, etc.). All features, those of the master and those of the slaves, are saved as a single record into the millisecond data table, and also into the syllable table. Raw channel data are saved as wave files, separately for each channel. This architecture provides the user with maximum flexibility in further analyzing the data across channels using other software packages (e.g., using Chronux in Matlab and the mySQL Matlab interface).



graphic

4. Explore & Score   This module is used to explore the features of sounds, segment them (manually or automatically), perform a variety of measurements, explore feature space, as well as score the similarity between sounds. There are several improvements in this module. The database management and exporting of data directly to Excel or to Matlab are similar to, but more developed than, those of other modules. Similarity measurements have been improved, and we provide alternative methods for scoring similarity. All the results of the similarity measurements are saved into similarity tables. New and revised 'frequency contours' based acoustic features include contour FM, principal frequency (the longest frequency contour), continuity over time, and continuity over frequency.


graphic
5. Feature & similarity batch   The song data of even a single bird can easily accumulate to several gigabytes of sound. The SAP2 approach is to analyze these in nearly real-time. However, it is often desirable to reanalyze the data, or to analyze a large amount of existing data (e.g., data collected using software such as AviSoft or Raven). The features batch can analyze a very large amount of sound data; it can be used to:

·    Sort sound files according to content,
·    Calculate acoustic features and save them into ms data tables
·    Segment the sound into syllable units and save syllable features to a syllable table.
·    Once ms data tables have been computed, they can be used instead of the sound files to re-segment the sound based on a different set of criteria, or to perform similarity measurements.
The advantage of the ms data tables is a storage gain (by a factor of 3) and a speed gain (by a factor of up to 30). This allows the user to explore many segmentation methods and examine alternative Dynamic Vocal Development maps (see below) based on alternative segmentation methods. The similarity batch can be used to perform a large set of similarity measurements. It supports two batch modes: one for comparing ordered pairs of sounds, and the other for comparing matrices (M x N) of sounds.


graphic

6. Dynamic Vocal Development maps   As summarized above, SAP2 automatically generates and updates a syllable table for each bird, which summarizes every song syllable produced during vocal development (in a zebra finch, typically 1-2 million syllables). Obviously there is a lot of information in those syllable tables. To make this information easily accessible we developed a descriptive model called the Dynamic Vocal Development (DVD) map. DVD maps are presented as movie clips showing how syllable features change during song learning (or as a result of experimental manipulation). In the adult bird, the distribution of syllable structure is highly clustered, and the DVD maps show how these clusters (syllable types) come about. We developed several types of such maps to show different aspects of song development, including syntax, circadian factors, and vocal changes across time-scales. The different modes of DVD maps use shape, color and even sound-clicks to represent different aspects of song structure. Importantly, DVD maps can be played in nearly real-time, so that you can see a vocal change as it occurs. We believe that the DVD maps are the most important feature of SAP2.


graphic

7. Clustering   Clustering is used to detect syllable types and to automatically trace vocal changes during song development. We are still in the process of developing appropriate methods. As a temporary solution we implemented a nearest-neighbor hierarchical clustering method within an extensive graphic user interface, including a display of clusters in color code, assessment of residuals, and an account of the number of members in each syllable type. The procedure performs the cluster analysis recursively, throughout song development, and provides online visual assessment of the outcome at each stage of the analysis. The results of the clustering are automatically registered into the syllable table, so that as you do the cluster analysis you can play DVD maps and ensure, by inspecting the color code of each cluster, that the tracing procedure is indeed 'locked on the target'. The tracing of each syllable type progresses from the mature song backwards until the clustering procedure fails to trace the syllable type. As long as a cluster can be traced, it is easy to detect vocal changes that occur as the features of a cluster approach the final (target) state. Cluster analysis is therefore a formal way of analyzing (and parameterizing) the DVD map. The user can select alternative feature sets for clustering or impose different constraints on the procedure, so as to achieve stable and reproducible results even in difficult cases.


graphic

8. Database   All of the SAP2 output is managed by the mySQL database engine, which is included in this package together with the mySQL Control Center. SQL is a simple, industry-standard language for querying databases. It is used extensively behind the scenes of many SA+ functions, e.g., when you are playing a DVD map. You can type SQL commands to set criteria for selecting and manipulating data in the syllable tables and similarity tables generated by SA+. Flexibility in filtering data becomes really important in tables that include millions of records. SAP2 provides simple procedures for filtering tables and for exporting syllable tables and similarity tables to Matlab and to Excel.
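
For example, restricting an export to long, low-pitched syllables could be done with a filter of this form. This is a sketch: the table name follows the tutorial above, and mean_pitch is an assumed name for the syllable's mean-pitch field:

     select * from syll1
     where duration > 100 and mean_pitch < 2000;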


graphic

9. Online help system   SAP2 now includes many functions, and to make them easily accessible to the user we implemented a goal-oriented modulation of the graphic user interface, including a hierarchical setting of 9 modules, 40 windows and over 1200 gadgets (buttons, sliders, images) to keep each procedure simple and intuitive. In several windows we included a set of instructions that will help you perform procedures in an orderly and appropriate manner. In other cases we included question-mark '?' buttons providing specific information that might help the user solve a problem without referring to the user manual. We also included warning '!' messages near buttons that can cause trouble if not properly used.


* Extensions can be ordered directly from our programmer David Swigger (dswigger@yahoo.com). You will be asked to pay for some or all of the development cost, depending on priorities and funding availability. Alternatively, using the provided source code, you may also approach other programmers, but make sure that the programmers are aware that the source code is copyrighted and cannot be used for profit (i.e., selling the extension is forbidden). In addition, we request that you share with us any extension of the provided code.

Feature summary of the SAPII recorder

The recorder features


Start SA+ and click the Live Analysis button.

You should see a screen similar to the one below:
graphic

In this chapter we will only explain the Sound Analysis Recorder (left window) functions, whereas the Sound Processing Live is documented in the next chapter.
 
Warning: do not click “Record” before going through the setup!

First, familiarize yourself with the main window of the recorder. It gives you an online view of the recording functions of four independent channels. You can see the sound waves in the oscilloscope, the key-pecking activity of the bird in the training box representation, and you can control and test the automated recording trigger as well as the onset of training. Do not change any settings yet.
 The Main window includes a control panel and 10 channels. Each channel includes input display, input control, and playbacks control as shown below: 
  

Batch Setting step 1

Step 1: Setting the Input files:
Open Sound Analysis and click “Features batch”.                                         Next ->
 
If the files you want to process are in a single folder select the appropriate drive and click on the folder tree display (left panel) so as to navigate to the appropriate folder. Double click on the folder so as to open it, and all the wave files should appear on the right panel as shown below. The batch goes from top to bottom so clicking on any one file will make the batch skip all the files listed above it.

graphic

If the files you want to process are located in multiple folders, click
graphic

Note: you can play the selected sound by clicking graphic
which will play sounds one by one down the wave files list.

Opening the master folder (which contains the subfolders with the wave files) will show:
graphic

Now the master folder is on the left panel, the top right panel shows the subfolders list and the bottom right panel shows the wave files of the outlined folder. Clicking on any of the listed subfolders should show, in the bottom panel, a list of the wave files that are contained in that subfolder. Note that when starting the batch, it will progress from the highlighted subfolder and wave files downwards. For example, if you select the subfolder named 45 (when the bird is 45 days old) as in the example above, the batch will not process the files included in the subfolder 27.
Note: if you want to process only the selected subfolder, uncheck graphic

 Next ->

Viewing Feature Summaries of Individual Syllables

Features across an Interval

graphic


Until now we have only presented the calculation of acoustic features within ‘units’ of our frequency analysis, i.e. within 10ms time-windows. However, it is often desired to examine the distribution of features across time-windows, so as to characterize natural sound units, such as notes, syllables, motifs and bouts.
graphic


Manual segmentation: Open ‘example 2.wave’ and choose
graphic

(bottom right of the screen). Then click at the beginning of a syllable. Point to the end of the syllable and click again. The red and blue vertical lines now outline the syllable. The feature distribution of this syllable is now summarized in the ‘features across interval’ window. Now say that you want to save this information about the syllable features. SA+ saves all information into MySQL tables. To set a new table for your data click ‘new table’ and call it myData. Click on the text box below and type ‘syllable 1’ and then click ‘add record’.


Note that there is now 1 record in myData. Let’s say that you want to save the features of all the syllables of this song. Type ‘all syllables’ in the text box, and click ‘Add record’. The feature distribution summary of these syllables is now saved in our database. You will see how to access this information in a moment.

Automatic segmentation and saving of multiple syllables: Open ‘example 2.wave’ and check
graphic


Take a look at the ‘records in table’ label (under ‘data management’), and remember the number of syllables shown. Now outline the entire song by clicking once to eliminate the previous outlining, and then click in the beginning and in the end of the song. Note that there are now 5 additional records in the table although you did not click ‘add record’. This is because all 5 syllables were added at once with their appropriate feature distribution when you outlined the song.


Auto-segment a single syllable: Auto-outline is a method of automatically (and accurately) outlining the boundaries of a single syllable.

Select
graphic


and click twice in the middle of a syllable. The first click will outline the beginning and the second the end of this syllable according to its boundaries, as determined by the amplitude and entropy thresholds.


Adjusting the average pitch estimate: Calculation of mean and variance values is trivial, except for pitch, where you can choose between a simple average or (look at the ‘Settings & options’ tab) the

graphic


method. Adjusting pitch by its goodness can improve the stability of the estimated pitch average. As shown below, pitch values sometimes ‘jump’ as a consequence of the method of pitch estimation, which dynamically flips between harmonic-pitch and mean-frequency methods. The undesired effect of an unstable pitch estimate can be reduced by adjusting the pitch average by its goodness, so that pitch estimates are endowed with a higher weight when the goodness of pitch is higher. Using this approach in the example below, we obtain an average pitch estimate of 731Hz, only 44Hz higher than the real harmonic pitch, whereas the simple average increases the error seven-fold, to 315Hz.
graphic
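
One natural form for such a goodness-weighted average is sketched below; this is our reading of the description above, and the exact weighting SAP2 applies may differ:

$$ \bar{p} \;=\; \frac{\sum_t p(t)\, g(t)}{\sum_t g(t)} $$

where $p(t)$ is the momentary pitch estimate and $g(t)$ is the goodness of pitch in each analysis window.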

Viewing the features

Locate the 'view features' panel, make sure the group button (graphic) is in the up position, and click through the features one by one.

Viewing feature values
You can view the value of the features by looking at the 'features at pointer' display. Move the pointer around the spectral image and observe the changing values of pitch, FM, as well as song time and frequency at the current pointer position. Raw features are saved into binary files that can be easily opened in Matlab using a script provided in Appendix II.

Why should you learn some Standard Query Language (SQL)

Standard Query Language (SQL) will allow you to retrieve and manipulate SAP2 data easily and very quickly. MySQL is one of the most standard implementations of SQL and it is (practically) free. Since MySQL is used heavily in SAP2, it is not a bad idea to know just enough SQL to be able to manipulate the tables and export data efficiently to other applications. This is particularly important because the SAP2 tables often contain large quantities (even millions) of records. There are very few applications that can efficiently manage large amounts of automatically generated data. Therefore, you should look at SQL as an efficient data pump between the tables and other applications. For example, Matlab does not have any (respectable) database functionality, and it stores all variables in memory. It is therefore inefficient to import an entire table into Matlab, but you can use SQL to pump data epochs into Matlab, calculate something, and move to the next epoch. Furthermore, those data epochs can be very flexible in content. For example, obtaining an epoch containing the pitch of the first 100 syllables produced at around 9AM is a very simple command:

                   select pitch from my_table where hour=9 limit 100

As you can see, it is very simple and intuitive to write SQL queries.

SQL can be used not only to retrieve but also to process the data in your tables; here, however, we will only teach you how to select data from a table, how to copy data into a new table, how to change table values, and how to merge tables together (a quick taste of each is sketched below).
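
A quick, hedged taste of the four operations on a syllable table named syll1; the table and field values below are just examples:

     -- select data from a table:
     select * from syll1 where duration > 50;

     -- copy a subset of the data into a new table:
     create table syll_long select * from syll1 where duration > 100;

     -- change values in a table:
     update syll1 set bird_ID = 111 where bird_ID = 0;

     -- merge one table into another (fields must match;
     -- watch for duplicate recnum values):
     insert into syll1 select * from syll2;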

  

Why use Features?

Introduction

We now take a deeper look into the acoustic features and the measures we derive from them. The first step of the analysis is to reduce the sound spectrogram to four simple features. All the analysis from this stage on is based on those four features – the features replace the sonogram.

Many of the previous attempts to automate the analysis of sound similarity used sound-spectrographic cross-correlation as a way to measure the similarity between syllables: the correlation between the spectrograms of two notes was examined by sliding one note on top of the other and choosing the best match (the correlation peak). However, measures based on the full spectrogram suffer from a fundamental weakness: the high dimensionality of the basic features. For example, cross-correlations between songs can be useful if the song is first partitioned into its notes and if the notes compared are simple. But even in this case, a mismatch of a single feature can reduce the correlation to baseline level. For example, a moderate difference between the fundamental frequencies of two complex sounds that are otherwise very similar would prevent us from overlapping their spectrogram images (a vertical translation will not help since the harmonics won’t match).

The cross-correlation approach, as mentioned above, requires, as a first step, that a song be partitioned into its component notes or syllables.  This, in itself, can be a problem. Partitioning a song into syllables or notes is relatively straightforward in a species such as the canary in which syllables are always preceded and followed by a silent interval. Partitioning a song into syllables is more difficult in the zebra finch, whose song includes many changes in frequency modulation and in which diverse sounds often follow each other without intervening silent intervals.  Thus, the problems of partitioning sounds into their component notes and then dealing with the complex acoustic structure of these notes compound each other. The analytic approach of Sound Analysis addresses both of the above difficulties.  It achieves this by reducing complex sounds to an array of simple features and by implementing an algorithm that does not require that a song be partitioned into its component notes. 

Wiener Entropy
 
Formal definition: Wiener entropy is a pure number defined as the ratio of geometric mean to arithmetic mean of the spectrum:

$$ \mathrm{Wiener\ entropy} \;=\; \frac{\exp\left(\frac{1}{N}\sum_{f=1}^{N}\log S(f)\right)}{\frac{1}{N}\sum_{f=1}^{N} S(f)} $$

where $S(f)$ is the power at frequency bin $f$ and $N$ is the number of frequency bins.
Wiener entropy is a measure of the width and uniformity of the power spectrum. Noise is typically broadband, with sound energy smeared rather smoothly within the noise range, whereas animal sounds, even when multi-harmonic, are less uniform in their frequency structure. Wiener entropy is a pure number, that is, it does not have units. On a scale of 0-1, white noise has an entropy value of 1, and complete order (e.g., a pure tone) has an entropy value of 0.

To expand the dynamic range, the Wiener entropy is measured on a logarithmic scale, ranging from 0 to minus infinity (white noise: log1=0; complete order: log0=minus infinity). The Wiener entropy of a multi-harmonic sound depends on the distribution of the power spectrum:

graphic

a narrow power spectrum (the extreme of this is a pure tone) has a large, negative Wiener entropy value; a broad power spectrum has a Wiener entropy value that approaches zero.  The amplitude of the sound does not affect its Wiener entropy value, which remains virtually unchanged when the distance between the bird and microphone fluctuates during recording. Yet, the entropy time series (or curve) of a song motif is negatively correlated with its amplitude time series. This is because noisy sounds tend to have less energy than tonal sounds.  A similar phenomenon has also been observed in human speech, where unvoiced phonemes have low amplitude.
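
As a quick check of the extremes mentioned above: for white noise the spectrum is flat, $S(f) = c$, so the geometric and arithmetic means coincide and the log-scale entropy is

$$ \log\frac{c}{c} \;=\; \log 1 \;=\; 0, $$

matching the 'white noise: log1=0' limit stated earlier.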

Wiener entropy may also correlate with the dynamic state of the syringeal sound generator, which shifts between harmonic vibrations and chaotic states. Such transitions may be among the most primitive features of song production and maybe of song imitation.

Chapter 4: The Song Features
Chapter 5: Exploring the Song Features
Chapter 6: Your Animals
Chapter 2: Installation
Chapter 1: Overview of SAP2
Highlights
Chapter 7: Sound Analysis Data Structure
Chapter 8: MySQL in a Nutshell
Chapter 9: The SAP2 Recorder
Chapter 10A: Sound Processing Live
Chapter 10B: Sound & Brain Live
Chapter 11: Features Batch
Chapter 12: Similarities, Measurements & Batch
Chapter 13: Dynamic Vocal Development (DVD) Map
Chapter 14: Clustering Syllables