
Introduction to Song Features

We now take a deeper look at the acoustic features and the measures we derive from them. The first step of the analysis is to reduce the sound spectrogram to four simple features. All analysis from this stage on is based on those four features: the features replace the sonogram.

4a. Why features?

Many previous attempts to automate the analysis of sound similarity used sound-spectrographic cross-correlation to measure the similarity between syllables: the spectrograms of two notes were correlated by sliding one note over the other and choosing the best match (the correlation peak). However, measures based on the full spectrogram suffer from a fundamental weakness: the high dimensionality of the basic features. Cross-correlation between songs can be useful if the song is first partitioned into its notes and if the notes being compared are simple. But even in this case, a mismatch in a single feature can reduce the correlation to baseline level. For instance, a moderate difference between the fundamental frequencies of two complex sounds that are otherwise very similar would prevent us from overlapping their spectrogram images. A vertical translation will not help: because harmonics are integer multiples of the fundamental, the spacing between harmonics differs between the two sounds, so a shift that brings one pair of harmonics into register leaves most of the others misaligned.
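To make the harmonic argument concrete, here is a toy sketch in Python with NumPy. It is an illustration only, not the method of Sound Analysis or of any published tool. It models a single spectral slice of a harmonic sound as Gaussian peaks at integer multiples of a fundamental, then slides one slice vertically over the other and keeps the correlation peak, which is the cross-correlation idea restricted to the frequency axis. The helper names `harmonic_profile` and `best_shift_correlation`, and all parameter values (bin width, peak width, number of harmonics, shift range), are hypothetical choices made for this example.

```python
import numpy as np


def harmonic_profile(f0, n_bins=128, bin_hz=43.0, n_harmonics=8, width_hz=30.0):
    """Toy spectral slice: Gaussian peaks at k * f0 for k = 1..n_harmonics."""
    freqs = np.arange(n_bins) * bin_hz  # center frequency of each bin, in Hz
    profile = np.zeros(n_bins)
    for k in range(1, n_harmonics + 1):
        profile += np.exp(-0.5 * ((freqs - k * f0) / width_hz) ** 2)
    return profile


def best_shift_correlation(a, b, max_shift=12):
    """Slide b up and down over a (in whole bins) and return the peak
    Pearson correlation together with the lag, in bins, where it occurs."""
    n = len(a)
    best_r, best_lag = -np.inf, 0
    for lag in range(-max_shift, max_shift + 1):
        if lag >= 0:
            x, y = a[lag:], b[: n - lag]  # pairs a[i + lag] with b[i]
        else:
            x, y = a[:lag], b[-lag:]  # pairs a[i] with b[i - lag]
        r = np.corrcoef(x, y)[0, 1]
        if r > best_r:
            best_r, best_lag = r, lag
    return best_r, best_lag


if __name__ == "__main__":
    tone = harmonic_profile(500.0)
    r_same, lag_same = best_shift_correlation(tone, harmonic_profile(500.0))
    r_diff, lag_diff = best_shift_correlation(tone, harmonic_profile(600.0))
    print(f"same fundamental:    peak r = {r_same:.3f} at lag {lag_same}")
    print(f"fundamental +100 Hz: peak r = {r_diff:.3f} at lag {lag_diff}")
```

Comparing the 500 Hz stack with itself gives a peak correlation of 1 at zero shift. Comparing it with an otherwise identical 600 Hz stack keeps the peak far below 1 at every shift tried, because the 100 Hz offset between fundamentals grows to 200 Hz at the second harmonic, 300 Hz at the third, and so on; no single vertical shift can absorb all of these offsets at once.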

As mentioned above, the cross-correlation approach requires that a song first be partitioned into its component notes or syllables. This in itself can be a problem. Partitioning a song into syllables or notes is relatively straightforward in a species such as the canary, in which syllables are always preceded and followed by a silent interval. It is more difficult in the zebra finch, whose song includes many changes in frequency modulation and in which diverse sounds often follow each other without intervening silent intervals. Thus, the problem of partitioning sounds into their component notes and the problem of dealing with the complex acoustic structure of those notes compound each other. The analytic approach of Sound Analysis addresses both difficulties: it reduces complex sounds to an array of simple features, and it uses an algorithm that does not require a song to be partitioned into its component notes.
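The easy case, partitioning at silent intervals, can be sketched in a few lines. The following is a generic illustration of silence-based segmentation, not the algorithm used by Sound Analysis. It assumes a per-frame amplitude envelope is already available and uses a fixed threshold; the function name `segment_by_silence`, its `min_gap` parameter, and the example envelopes are all hypothetical.

```python
import numpy as np


def segment_by_silence(amplitude, threshold, min_gap=1):
    """Split a per-frame amplitude envelope into (onset, offset) frame pairs.

    A frame counts as sound when its amplitude exceeds `threshold`. Sound runs
    separated by fewer than `min_gap` silent frames are merged into one segment.
    Offsets are exclusive, like Python slice bounds.
    """
    voiced = (np.asarray(amplitude) > threshold).astype(int)
    # Pad with silence at both ends so every sound run has a rising and a falling edge.
    edges = np.diff(np.concatenate(([0], voiced, [0])))
    onsets = np.flatnonzero(edges == 1)
    offsets = np.flatnonzero(edges == -1)
    segments = []
    for on, off in zip(onsets, offsets):
        if segments and on - segments[-1][1] < min_gap:
            segments[-1] = (segments[-1][0], int(off))  # gap too short: merge
        else:
            segments.append((int(on), int(off)))
    return segments


if __name__ == "__main__":
    # Made-up envelopes: clear silent gaps (canary-like) vs. none (zebra-finch-like).
    canary_like = [0, 0, 5, 6, 5, 0, 0, 0, 4, 7, 0, 0]
    zebra_finch_like = [5, 6, 3, 2, 4, 7, 6, 2, 5]
    print(segment_by_silence(canary_like, threshold=1.0))
    print(segment_by_silence(zebra_finch_like, threshold=1.0))
```

With `threshold=1.0`, the canary-like envelope splits cleanly into two segments at the silent gap, while the zebra-finch-like envelope comes back as a single segment spanning the whole input, because its amplitude dips but never falls to silence. Any cut inside such a continuous stretch would have to come from some criterion other than silence, which is the difficulty described above.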