Background & computational approach
In the previous chapter (DVD-maps) we saw an example of song development where the
distribution of syllable features is broad during early stages and then becomes clustered.
Most clusters (or types) appear to persist, and one can often keep track of each cluster
throughout the rest of the movie. This cluster analysis module performs the (difficult)
task of automatically classifying syllables into types and tracking each type over time.
We use the terms 'type' and 'cluster' interchangeably, but one should keep in mind that a
cluster is not a generic type, but a transient entity that is defined only in one animal
(during a particular time). More often, the term 'syllable type' refers to a species-specific
sound, sometimes in relation to perceptual categories or other behaviors. The narrow
definition of type has the advantages, however, of being simple, robust and 'analytic'.
At best, the clustering method can only be as good as the segmentation method. We have no
doubt that both the cluster analysis and segmentation methods presented here are
sub-optimal. Aylin Cimenser, a Physics PhD student at Columbia University, is now in the
process of optimizing both methods, and we believe that it should be possible to properly
segment and cluster vague sounds that the current technology cannot handle.
The cluster analysis methods used here were implemented by Aylin Cimenser at
Columbia University under the supervision of Partha P Mitra and Tim Halperin Hilly.
The algorithm is based on the Nearest Neighbor Hierarchical Clustering (NNC) method.
This is a non-parametric, density-based method (in contrast to Gaussian mixture models
and K-means, which are parametric). It works nicely when clusters are reasonably
segregated, regardless of their shape, and the back-tracing method is fully automated and
self-checking. We cannot claim, however, that this approach always works: in rare cases,
clusters that can be identified visually are not identified by the algorithm. Users should
keep in mind that a perfect solution to the clustering problem should not be expected. It is
no coincidence that there are many alternative cluster analysis methods on the 'market' - there
is no generic solution to this problem. Namely, one cannot cluster data without making
some strong assumptions about either the shape or the density of clusters, and those
assumptions are sometimes false. For example, NNC does not handle well cases where
dense clusters lie too close to sparse clusters.
More significantly, in song development, clustering has to be performed in recursive
steps, starting from the end of song development, where identifying clusters is very easy,
and concluding at an early stage of development, where clustering becomes increasingly
difficult and finally impossible. The difference between approaches is not whether the
procedure will fail, but when.
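The text above gives only the outline of this backward pass. The Python sketch below illustrates one way such a procedure could be organized; it takes any per-day clustering routine as a black box (for example the NNC procedure described next), and the nearest-center matching rule used to carry type identities backwards is a hypothetical illustration, not the SA+ implementation.

import numpy as np

def backtrace_types(daily_features, cluster_day):
    """Illustrative backward pass over development.

    daily_features: list of (n_syllables, n_features) arrays, ordered from
    early to late development. cluster_day: any clustering routine returning
    integer labels, with -1 marking unclustered syllables. Returns per-day
    labels expressed in terms of the types found on the last day."""
    labels = [None] * len(daily_features)
    labels[-1] = cluster_day(daily_features[-1])        # easiest: end of development
    for day in range(len(daily_features) - 2, -1, -1):  # walk backwards in time
        raw = cluster_day(daily_features[day])
        # Relabel each of today's clusters with the identity of the later-day
        # cluster whose feature-space center is closest (hypothetical rule).
        later_X, later_lab = daily_features[day + 1], labels[day + 1]
        centers = {l: later_X[later_lab == l].mean(axis=0)
                   for l in np.unique(later_lab) if l != -1}
        relabeled = np.full_like(raw, -1)
        for l in np.unique(raw):
            if l == -1:
                continue
            c = daily_features[day][raw == l].mean(axis=0)
            relabeled[raw == l] = min(centers,
                                      key=lambda k: np.linalg.norm(c - centers[k]))
        labels[day] = relabeled
    return labels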
NNC is a very simple algorithm and we encourage you to understand it (a short code sketch follows the list below):
1. Obtain a sample of a few thousand (say 3,000) syllables produced at a given time
   and compute their features.
2. For each feature (e.g., syllable duration), calculate the Euclidean distances between
   all possible (3,000 x 3,000) pairs of different syllables. Repeat for all features,
   scale the units and add. This step involves about 50 million multiplications (but
   takes only about 1 s).
3. Sort the syllable pairs by Euclidean distance in ascending order and discard
   pairs whose distance is greater than a certain threshold (say 0.01 MAD).
4. Now the best-best-friends, the syllable pair with the shortest distance (the
   nearest-neighbor pair), establish the first cluster.
5. Examine the next pair of nearest neighbors and consider the following scenarios:
   - Case A. If both syllables are new (not clustered yet), define another cluster
     with this pair.
   - Case B. If one of the syllables is already a member of a cluster (remember
     that any one syllable can appear in many pairs!) and its partner is not a member
     of any cluster, add the partner to that same cluster (that is: a friend of my friend
     is my friend!).
   - Case C. If one syllable belongs to one cluster and the second to a different
     cluster, merge the two clusters.
   - Case D. If both syllables belong to the same cluster, do nothing.
6. Repeat step 5 for all the remaining pairs (those with distances below the
   threshold), in ascending order of distance.
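A minimal Python sketch of steps 2-6, for a small sample, is given below. The MAD scaling of each feature and the 0.01 MAD threshold follow the text above; everything else (the function name, the union-find bookkeeping for cases A-D) is just one possible implementation, not the actual SA+ code.

import numpy as np

def nnc_cluster(features, threshold=0.01):
    """Nearest-neighbor ('friend of a friend') clustering as in steps 2-6.

    features: (n_syllables, n_features) array. Each feature is scaled by its
    median absolute deviation (MAD) so the threshold is in MAD units.
    Returns integer cluster labels; -1 marks residual (unclustered) syllables."""
    X = np.asarray(features, dtype=float)
    mad = np.median(np.abs(X - np.median(X, axis=0)), axis=0)
    X = X / mad                                    # step 2: scale the units

    # Step 2: Euclidean distance between every pair of different syllables.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(X), k=1)

    # Step 3: sort pairs by distance and discard those above the threshold.
    d = dist[i, j]
    order = np.argsort(d)
    keep = d[order] <= threshold
    pairs = zip(i[order][keep], j[order][keep])

    # Steps 4-6: walk through the surviving pairs in ascending order of
    # distance; a union-find structure implements cases A-D compactly.
    parent = {}
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]          # path compression
            a = parent[a]
        return a
    for a, b in pairs:
        parent.setdefault(a, a)                    # not-yet-clustered syllables
        parent.setdefault(b, b)                    # start as singletons
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra    # cases A-C: start a cluster, grow it, or merge two
        # case D: both already in the same cluster, nothing to do
    labels = np.full(len(X), -1)
    roots = {}
    for s in parent:
        labels[s] = roots.setdefault(find(s), len(roots))
    return labels

Syllables that never appear in a surviving pair keep the label -1 and count as residuals (see the discussion of threshold choice below).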
In many cases, you will find that the default threshold suffices to properly cluster the
data, but be aware that the choice of threshold is critical. Setting the threshold too
high will join all the data into a single cluster. The threshold is determined empirically,
and if it is not optimal you will experience one of two problems: if the threshold is too
conservative, you will see too many clusters and many residual (unclustered) syllables;
if the threshold is too liberal, you will see 'false joining' of distinct clusters. In some cases
(the most difficult ones) you will see both false joining and many residuals. To address
these cases, you can grant veto power to certain features. The meaning of veto is: 'do
not merge two clusters if the difference between them is too big with respect to a specific
feature', so that even if the pair of syllables (in case C) is really tight, SA+ will not join their
clusters if the centers of those clusters are too far apart with respect to that one feature. For
example, you can tell SA+ to refuse to merge two clusters if the duration difference
between them is more than 0.5 MAD (which is 36 ms in the zebra finch). As long as this
'distance gap' remains empty of other clusters, those two clusters will never merge.
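In code, the veto can be pictured as an extra check performed before the merge in case C. The function below is only an illustration of that rule, not the SA+ implementation; the feature column index and the function name are hypothetical, while the 0.5 MAD duration example comes from the text above.

import numpy as np

def veto_merge(features, labels, cluster_a, cluster_b,
               veto_feature, max_gap=0.5):
    """Return True if the merge of cluster_a and cluster_b should be refused:
    their cluster centers differ by more than max_gap (in MAD units) along the
    vetoed feature, e.g. syllable duration (0.5 MAD, i.e. 36 ms in the zebra
    finch according to the text above)."""
    col = np.asarray(features, dtype=float)[:, veto_feature]
    mad = np.median(np.abs(col - np.median(col)))
    center_a = col[labels == cluster_a].mean()
    center_b = col[labels == cluster_b].mean()
    return abs(center_a - center_b) / mad > max_gap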