Somehow, measuring a bunch of spectral/temporal parameters and then reducing their dimensionality with principal component analysis (PCA) has become the standard procedure for quantifying variation in signal structure (i.e. measuring acoustic space), particularly in behavioral ecology and comparative bioacoustics. In most cases the approach is used without any kind of ground-truthing to validate the analysis. Given the complexity of animal acoustic signals, it could miss key signal features. Here I share a quick-and-dirty comparison of this ‘standard approach’ to a potentially better-suited alternative.
But first load/install warbleR, set warbleR options and create a fancy color palette for catalogs:
As in the previous post, we will run the comparison on signals detected on a recording from a male Striped-throated Hermit (Phaethornis striigularis) from Xeno-Canto. We can download the sound file and convert it into wave format as follows:
A long spectrogram would help to get a sense of the song structure in this species:
We can also listen to it from Xeno-Canto:
The elements of this song are pure tone, highly modulated sounds that are recycled along the sequence. Overall, the structure of the element types seems to be consistent across renditions and the background noise level of the recording looks fine.
To run any analysis we first need to detect the time ‘coordinates’ of the signals in the sound file:
Let’s select the 100 signals with the highest signal-to-noise ratio, just for the sake of the example:
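The filtering step itself only needs base R. The sketch below is a minimal illustration on a simulated detection table; the data frame `detections` and its values are made up, and the `SNR` column is assumed to hold a signal-to-noise ratio already measured for each detection.

```r
set.seed(1)

# Hypothetical detection table: one row per detected signal
# (simulated values, not the actual detections from the recording)
detections <- data.frame(
  sound.files = "recording.wav",
  selec = 1:250,
  SNR = runif(250, min = 1, max = 30)
)

# Keep the 100 rows with the highest signal-to-noise ratio
top100 <- detections[order(detections$SNR, decreasing = TRUE)[1:100], ]

# Put the retained signals back in their original (chronological) order
top100 <- top100[order(top100$selec), ]
```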
… and measure the frequency range (as in the previous post):
Finally, let’s pack the acoustic data and metadata together as an ‘extended_selection_table’ (check this post to learn more about these objects):
We can take a look at the selected signals (or elements, subunits or whatever you want to call them) by creating a catalog:
Some are too noisy, but still good enough for the example.
Element classification using the ‘standard’ approach
So let’s use the spectro-temporal parameters + PCA recipe. First, acoustic parameters are measured using spec_an, and then a PCA is run over those parameters:
The first 5 components explain almost 80% of the variance.
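For readers who want to see how the explained variance is computed, here is a minimal sketch using base R's `prcomp()`. The parameter matrix is simulated (the names `params` and `param1` to `param10` are placeholders), so the numbers it produces say nothing about the hermit recording.

```r
set.seed(123)

# Simulated stand-in for a table of acoustic parameters:
# 100 signals (rows) by 10 hypothetical parameters (columns)
params <- matrix(rnorm(100 * 10), nrow = 100,
                 dimnames = list(NULL, paste0("param", 1:10)))

# Make some parameters correlated, as spectral measurements usually are
params[, 2] <- params[, 1] + rnorm(100, sd = 0.3)
params[, 4] <- params[, 3] + rnorm(100, sd = 0.3)

# PCA on centered and scaled parameters (they tend to be in different units)
pca <- prcomp(params, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component and its cumulative sum
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(prop_var)

# Scores on the first 5 components, ready for a clustering step
pc_scores <- pca$x[, 1:5]
```

`cum_var[5]` is the quantity quoted above: the share of the total variance retained by the first 5 components.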
Now let’s look at how good the classification of elements based on the first 5 PCs is. To do this we can use the Mclust function from the mclust package to choose the most likely number of clusters and assign each element to one of those clusters:
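As a rough sketch of that step, the code below runs `Mclust()` on simulated PC scores with three known groups. The objects `pcs` and `true_group` are invented for illustration, and the range `G = 1:9` is just the candidate numbers of clusters tried here.

```r
library(mclust)  # provides Mclust()

set.seed(42)

# Simulated PC scores: three well-separated groups in five dimensions
centers <- rbind(rep(-4, 5), rep(0, 5), rep(4, 5))
true_group <- rep(1:3, each = 30)
pcs <- centers[true_group, ] + matrix(rnorm(90 * 5, sd = 0.5), ncol = 5)

# Fit Gaussian mixture models with 1 to 9 clusters; the model and the
# number of clusters are chosen by BIC
fit <- Mclust(pcs, G = 1:9)

n_clusters <- fit$G            # number of clusters selected
clusters <- fit$classification # cluster assigned to each element

# Cross-tabulate against the known (simulated) groups
agreement <- table(true_group, clusters)
```

With real signals there is no `true_group` column, which is why the visual check with a catalog is needed.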
The classification can be visually assessed using a ‘group-tagged’ catalog. In the catalog, elements belonging to the same cluster are located next to each other. Elements are also labeled with the cluster number and colors highlight groups of elements from the same clusters (note that colors are recycled):
A better way to look at this is by plotting the first 2 PCs:
Most clusters include several different element types, and the same element type can be found in several clusters. In this example the performance of the ‘standard approach’ is not ideal.
When working with pure tone modulated whistles, the best approach is likely to measure dynamic time warping (DTW) distances on dominant frequency contours. We can do all that in a single step.
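To make the idea concrete, here is a small base R sketch of the classic DTW recurrence applied to three simulated frequency contours. It is not the function used in this post; `dtw_dist()` and the contours are written only for illustration.

```r
# Classic dynamic time warping distance between two numeric contours,
# using absolute difference as the local cost
dtw_dist <- function(a, b) {
  n <- length(a)
  m <- length(b)
  cost <- matrix(Inf, n + 1, m + 1)
  cost[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      d <- abs(a[i] - b[j])
      # Cheapest way to reach cell (i, j): from above, from the left,
      # or along the diagonal
      cost[i + 1, j + 1] <- d + min(cost[i, j + 1], cost[i + 1, j], cost[i, j])
    }
  }
  cost[n + 1, m + 1]
}

# Simulated dominant frequency contours (kHz) sampled at 20 time points
tm <- seq(0, 1, length.out = 20)
contours <- list(
  up1 = 3 + 2 * tm,
  up2 = 3 + 2 * tm^1.3,  # same upward sweep, slightly warped in time
  down = 5 - 2 * tm      # downward sweep
)

# Pairwise DTW distance matrix
n <- length(contours)
dtw_mat <- matrix(0, n, n, dimnames = list(names(contours), names(contours)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    dtw_mat[i, j] <- dtw_dist(contours[[i]], contours[[j]])
  }
}
```

The two upward sweeps end up closer to each other than either is to the downward sweep, even though their timing differs. That tolerance to timing differences is the property that makes DTW appealing for modulated whistles.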
To convert this distance matrix to a rectangular data frame we can use TSNE (check out this awesome post about it). The name stands for t-distributed Stochastic Neighbor Embedding, and the method is regarded as a more powerful way to find data structure than PCA (and yes, it can also be applied to non-distance matrices). It can be easily run in R using the Rtsne function from the package of the same name. The following code does the clustering and cataloging as we did above:
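The core of that step is passing a precomputed distance matrix to `Rtsne()`. Here is a minimal sketch on simulated data; `feat`, `dist_mat` and the perplexity value are assumptions chosen to suit this toy example.

```r
library(Rtsne)  # provides Rtsne()

set.seed(7)

# Simulated data: 60 elements from 3 types, summarised as a pairwise
# distance matrix (a stand-in for a DTW distance matrix)
grp <- rep(1:3, each = 20)
feat <- matrix(rnorm(60 * 4, mean = grp * 5), ncol = 4)
dist_mat <- as.matrix(dist(feat))

# t-SNE on the distance matrix; perplexity must satisfy
# 3 * perplexity < nrow(dist_mat) - 1
tsne <- Rtsne(dist_mat, dims = 2, perplexity = 10, is_distance = TRUE)

# Rectangular data frame of embedding coordinates, one row per element
emb <- data.frame(tsne$Y)
names(emb) <- c("TSNE1", "TSNE2")
```

The columns of `emb` can then be clustered in the same way as the PC scores in the ‘standard’ approach.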
We can also extract just 2 dimensions with TSNE so that the result fits in a bi-dimensional plot (grouping is likely to improve when adding more dimensions, so this plot gives a conservative estimate):
The classification seems OK. Most clusters contain a single element type, and most types are found in a single cluster. Nonetheless, the classification is not perfect. For instance, clusters 5 and 6 share some element types. However, it’s much better than the ‘standard approach’. In a more formal analysis I would make sure the frequency contours are tracking the signals (using sel_tailor()). This would likely improve the analysis.
This quick-and-dirty comparison suggests that we (behavioral ecologists) might actually be missing important signal features when using the spectral/temporal parameters + PCA recipe as the silver bullet in bioacoustic analysis. It also stresses the importance of validating our analyses in some way. Otherwise, there is no way to tell whether the results are simply an artifact of our measuring tools, particularly when no differences are found.