Scaling syllable features
Scaling of syllable features, based on 'maximum likelihood estimates', is the same as that
used for similarity measurements at the levels of frames and intervals (see section 8a).
Example case: bird109
Open the clustering module, open the table of bird109 and click 'Analyze'. The result
should look like this:
[Figure: clustering result for bird109. Callouts: only up to 10 clusters (the most
abundant) are shown; the first and last file in the interval are listed (file name
annotation is useful!); 27% of the 3000 syllables did not pass the threshold and are not
clustered; in this case, all pairs that passed the threshold were clustered.]
Note that this representation is basically the same as a 2D DVD-map, with duration as the
default X axis and mean FM as the default Y axis. Not all features are used for clustering:
by default, SA+ uses syllable duration and the mean values of pitch, Wiener entropy, FM and
goodness of pitch. Feature units are scaled to MAD in the display as well; as noted, proper
scaling is essential for calculating meaningful Euclidean distances across features. For any
one animal you might find biases, but overall, clusters should lie in the neighborhood of
4 MADs and have a mean spread of 1-2 MADs (averaged across features). The colors identify
the clusters, but the initial color is determined by the population of the cluster (how many
members it has). Therefore, a color identifies a cluster only at the current moment, not in
the long run. Shortly, we will discuss the techniques of marking (identifying) a cluster for
tracing. The legend on the left allows you to pick a cluster or to view some of its
properties. For example, the most abundant cluster is painted red, and you can see next to
the red legend entry that this cluster has 606 member syllables. Once you have identified a
cluster and clicked the appropriate legend entry, you should give the cluster an ID. The ID
must be an integer; type it in the edit box on top of the legend. Once you ID a cluster and
start tracing it back, it will keep its original color (that is, we uncoupled the color from
the abundance).
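The role of scaling can be illustrated with a minimal sketch. It assumes that MAD stands for
the median absolute deviation and that a feature is scaled as (value - median) / MAD; the
exact formula used by SA+ is not stated here, and the function names and numbers below are
hypothetical.

```python
import math
import statistics


def mad_scale(values):
    """Scale one feature to MAD units: (x - median) / MAD (assumed formula)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero: this feature has no spread")
    return [(v - med) / mad for v in values]


def euclidean(a, b):
    """Euclidean distance between two syllables given as tuples of scaled features."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical raw values for four syllables (not taken from bird109).
durations_ms = [50.0, 60.0, 70.0, 200.0]
pitches_hz = [600.0, 620.0, 640.0, 3000.0]

# After scaling, both features are in the same units and can share one distance.
scaled = list(zip(mad_scale(durations_ms), mad_scale(pitches_hz)))
print(scaled[0], scaled[1], euclidean(scaled[0], scaled[1]))
```

Without scaling, the pitch values (in Hz) would dominate the distance simply because their
numbers are larger than the durations (in ms).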


In addition to the threshold, SA+ presents the actual Euclidean distance cutoff, that is,
the distance of the most distant pair of syllables included in the analysis. The 'data
included' results show the number of syllable pairs that passed the threshold. The upper
bound for 3000 syllables is 3000x3000=9 million. The threshold reduces this number to about
50,000 syllable pairs. You can reduce it even further by moving the 'data included' slider
to the left and observing the gray display of 'data included' changing as you go. Now click
'restart' and observe the consequence of this action on the cutoff. This technique allows
you to quickly test the consequences of changing the cutoff without re-calculating Euclidean
distances (which is the time-limiting step). Note that the table of syllable pairs still has
a lot of redundancy: the 50,000 pairs are drawn from no more than 3000 different syllables
(and often, far fewer). In fact, looking below the legend will show you that only about 2500
different syllables passed the threshold. A syllable in a 'crowded area of feature space'
will participate in many pairs, whereas in a sparse area, a syllable might have no neighbor
close enough to join a pair. Also, remember that SA+ only analyzes the 10 largest clusters.
If you want to cluster more than 10 types, you can do so exhaustively as described later.
You should be aware that filtering the table (removing clusters) is a non-linear operation
with regard to clustering. That is, the results might change abruptly with filtering. In
practice, this is more often a plus than a minus, since it can turn an unstable performance
into a stable one.
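The relation between the threshold, the pair table and the residuals can be sketched as
follows. This is only an illustration under stated assumptions, not the SA+ implementation:
the function names and the data are hypothetical, and the sketch counts unordered pairs.

```python
import itertools
import math


def close_pairs(points, threshold):
    """Return (distance, i, j) for every unordered pair within the threshold."""
    pairs = []
    for (i, a), (j, b) in itertools.combinations(enumerate(points), 2):
        d = math.dist(a, b)
        if d <= threshold:
            pairs.append((d, i, j))
    pairs.sort()
    return pairs


def syllables_in_pairs(pairs, cutoff):
    """Distinct syllable indices that take part in at least one pair within the cutoff."""
    return {idx for d, i, j in pairs if d <= cutoff for idx in (i, j)}


# Hypothetical scaled feature vectors: three neighbors and one isolated syllable.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 10.0)]

pairs = close_pairs(points, threshold=1.5)        # distances are computed once
kept = syllables_in_pairs(pairs, cutoff=1.5)      # syllables 0, 1 and 2
stricter = syllables_in_pairs(pairs, cutoff=0.5)  # re-filtered, no new distances
residuals = set(range(len(points))) - kept        # syllable 3 has no close neighbor
```

Lowering the cutoff only re-filters the stored pair table, which is why it is much cheaper
than computing the distances again.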


Before we get into the tracing technique, let's explore the different displays that will help
you judge how good the clustering is. Click on the 'all data' tab, then click the 'residuals'
tab, and switch back and forth between the cluster display and each of those displays. As
you can see, most but not all of the syllables were clustered.
A careful look at the outcome raises a few questions about the clustering performance.
First, why were the yellow and green clusters not joined into a single cluster? The answer
becomes clear when looking at different projections of these clusters in feature space.
Changing the Y axis to pitch shows that the two clusters overlap in their frequency
modulation but not in their pitch.
[Figure: residuals in the duration / pitch display. Note the low-pitch residuals: cage noise!]
Second, what sounds make up the 27% residuals? Looking at the residuals shows that some
belong to 'sparse clouds' that have not been clustered. These 'clouds' are often long and
short calls, which are less stereotyped and less abundant in our recording (the lower
abundance of calls is, in fact, an artifact of our recording method, which is intended to
preserve song data and eliminate isolated calls). Other residuals belong to the category of
cage noise - these are often characterized by low pitch and a broad distribution of
durations, as shown in the example above. Finally, some residuals are left-overs of
clusters - these residuals can be reduced by using a more liberal threshold. Similarly, one
can cluster a 'sparse cloud' of data by using a more liberal threshold.
You might ask - how can one decide what the threshold value should be? The answer is that
the ideal threshold value is 'cluster dependent'. If a cluster is very close to another
cluster, too liberal a threshold will join them. The point is that you do not want to try to
cluster all your data at once. Instead, the strategy we implemented is to cluster types one
by one. This requires more work, but it gives you the freedom to set appropriate conditions
that work well for each particular type of sound.
We will now start the process of tracing back syllables, but first, let us illustrate some of
the problems involved in trace-back. As noted earlier, the major issue is that the nature of
the task changes as we step backwards in song development, since we expect clusters to
eventually fall apart when we reach the beginning of song development. What we are trying to
trace is, in fact, the structure of this 'falling apart' process (actually, a 'build up'
process when tracking forward). During later stages of song development, we will typically
observe movement of clusters in feature space. This process is easy to trace since we have a
complete recording of ontogeny, and since most features change slowly compared to the size
of the time-slices we try to bridge across (typically, 3000 syllables occur over time scales
of several minutes to a few hours). Even non-linear vocal changes, such as period-doubling,
will rarely cause problems since other features of the syllable will remain stable during
this event. During early stages of song development, we often see clusters merge, since in
almost every bird, different syllable types emerge from a smaller number of prototype
syllables in the process of 'syllable differentiation'. Detecting the point of transition is
a challenging task.
Let's look at two clusters of bird 109, which are shown as yellow and green in the figure
above. We noted that those clusters are close to each other. They have similar FM but
different pitch, and there is also a slight duration difference between them. Move the
'Time control' slider two thirds of the way to the left and click 'Analyze'. Note that since
we are not back-tracing, SA+ will make no attempt to re-identify clusters, so the colors
will change arbitrarily (according to the number of members in each cluster).
Note that although we stepped back several weeks, the two images are similar. We can see
that the blue cluster is still there, but is now colored yellow, and the red one has turned
blue (and is somewhat larger). The problem is that the yellow and green clusters have
merged - and are both red now. The question is - is this a false merging, or something that
the bird did? Looking at the raw data (right panel) shows clearly that the clusters have
indeed merged. This example demonstrates some of the difficulties you might encounter while
back-tracing - now let's try it.
Pull the Time control slider to the end of song development and click 'Analyze'. We will
start with the easiest cluster - this one:
Now we need to tell SA+ that this is the cluster we want to trace, and we need to give it a
permanent name. This name will appear in the appropriate records of the database table as we
do the procedure, unless you uncheck the 'write permit' check box (please do uncheck it!).
Since this cluster appeared blue, we check the blue radio-button in the legend, and then on
top of the legend we type the permanent cluster name. The cluster name must be an integer
number, and we suggest starting with 1.
[Figure callout: the legend shows the number of members in each cluster.]
Now you should see that the track-back button (top) has become enabled. Click it.
Note that the 'Time control' did not take a step back yet - SA+ only identified this cluster
in the current slice and (if writing is permitted) registered it in the table of bird 109,
so that each occurrence of this cluster is marked as 1 (the default value for a cluster is
0). Now click track-back once more. Note that the Time control has moved a tiny bit to the
left. The new image is from an earlier developmental time, e.g.,
Now click 'Repeat tracing back' and you will see that tracing back occurs automatically,
step after step, until you click this button again - or - until something bad happens…
Let's try to understand more formally what is happening here. SA+ did clustering and you
chose a cluster to trace; we will call it the reference cluster. When tracing back, SA+ does
a similar clustering on a slightly earlier time window. SA+ then computes the centroid of
each cluster (that is, the mean duration of syllables in the cluster, the mean mean-pitch of
syllables in the cluster, and so forth). Then, the centroids of each of those new clusters
are compared to the centroid of the reference cluster. The cluster whose centroid is most
similar to that of the reference cluster is assumed to be an earlier version of that
cluster - but only if it is similar enough to the reference. The default threshold for this
comparison is 0.2 MADs (across all features chosen).
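The matching step can be sketched as below. This is a rough illustration, not the SA+ code:
it assumes that centroid similarity is measured as a Euclidean distance in MAD units, and
the function names, cluster labels and values are hypothetical.

```python
import math


def centroid(cluster):
    """Mean of each feature over the syllables (tuples of scaled features) in a cluster."""
    n = len(cluster)
    return tuple(sum(s[k] for s in cluster) / n for k in range(len(cluster[0])))


def match_reference(reference_centroid, candidate_clusters, max_distance=0.2):
    """Pick the candidate whose centroid is nearest to the reference.

    Returns (label, distance); label is None when even the nearest candidate
    is farther away than max_distance (assumed to be in MAD units).
    """
    best_label, best_dist = None, math.inf
    for label, members in candidate_clusters.items():
        d = math.dist(reference_centroid, centroid(members))
        if d < best_dist:
            best_label, best_dist = label, d
    if best_dist <= max_distance:
        return best_label, best_dist
    return None, best_dist


# Hypothetical clusters from a slightly earlier time window (duration, mean pitch).
earlier = {
    "red": [(4.0, 2.0), (4.2, 2.0)],
    "blue": [(1.0, 1.0), (1.2, 1.0)],
}
label, dist = match_reference((4.0, 2.0), earlier)
```

Here the 'red' cluster is accepted as the earlier version of the reference. If only the
'blue' cluster were present, no match would be returned, which corresponds to the point
where tracing stops and the cluster has to be defined again.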
Tracing this cluster should work very well for several weeks of song development, but
eventually, it is doomed to fail.
You will need to define the cluster again, based on its location (change the radio-choice
in the legend to the appropriate color to re-activate the 'trace-back' button). Then keep
tracing back. With some 'playing around' you should be able to trace it back until August
10 or so, which is 3 days after the onset of training.
Try to trace other clusters of bird 109. You will find some of them easy and others more
tricky. For example, this yellow cluster will cause frequent trouble by merging with the
one below it:
To solve such problems, your first line of defense is decreasing the Euclidean distance
threshold, e.g. to 0.01.
To check quickly whether threshold reduction can solve a problem, click 'Analyze' and then
click the 'data included' left-arrow followed by 'restart'.
This approach also fails from time to time - but do not give up - reduce the threshold to
0.008 to regain hold, back-trace once and try 0.01 again, and then auto-trace until the next
failure. By the time you approach the beginning of September, tracing this cluster becomes
really difficult.
This experience might (and should) have raised some concerns about the objectivity of
this clustering method. Indeed, one would like to be able to set the parameters once,
rather than keep playing with them. The reality of cluster analysis, however, is that one
often needs to adjust parameters. We suggest that you document your adjustments and, in
such difficult cases, try the procedure more than once. In this particular cluster, the
problem is that the only feature that distinguishes it well is pitch - all other features of
these two clusters overlap. Trying to distinguish between them can only work for some time,
but as the pitch values approach each other, the task becomes impossible. Furthermore, you
pay the price of a high percentage of residuals.
The solution is therefore to also cluster the two clusters together, and to consider the
period when they are separate as two descendant clusters of a main branch (as in a
dendrogram).
To do this, move the time control to the end of song development, return the threshold to
0.015 and, in the features included, uncheck 'mean pitch'. Check 'write permit' but
uncheck 'overwrite clusters'. Now give the joined cluster a different name (say 2) and
click 'Analyze'. You will see that the two clusters immediately join, since when pitch is
not taken into account, the other features they have in common prevail.
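The effect of excluding a feature can be shown with a small sketch. The feature names and
the centroid values (in MAD units) are hypothetical and chosen only to mimic two clusters
that differ in pitch alone.

```python
import math

FEATURES = ("duration", "mean_pitch", "mean_fm")


def distance(a, b, included):
    """Euclidean distance over the included features only."""
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in included))


# Hypothetical centroids of two clusters that differ only in pitch.
yellow = {"duration": 4.0, "mean_pitch": 5.0, "mean_fm": 3.0}
green = {"duration": 4.0, "mean_pitch": 3.0, "mean_fm": 3.0}

no_pitch = [f for f in FEATURES if f != "mean_pitch"]
with_pitch = distance(yellow, green, FEATURES)     # clusters are well separated
without_pitch = distance(yellow, green, no_pitch)  # clusters coincide
```

Once pitch is excluded, nothing separates the two centroids, so any threshold will join the
clusters.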