Extended selection tables

This post shows how to create and use the new warbleR object class extended_selection_table.

These objects are created with the selection_table() function. The function takes a data frame containing selection data (sound file name, selection, start, end …), checks whether the information is consistent (see the checksels() function for details) and saves the ‘diagnostic’ metadata as an attribute. When the argument extended = TRUE, the function generates an object of class extended_selection_table, which also contains a list of wave objects corresponding to each of the selections in the data frame. Hence, the function transforms selection tables into self-contained objects, as they no longer need the original sound files for running most acoustic analyses in warbleR. This can greatly facilitate storing and sharing (bio)acoustic data. It also speeds up processing, as sound files do not need to be read every time the data is analyzed.
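
The idea of a self-contained table can be illustrated with a minimal base R sketch: a data frame that carries its audio clips in an attribute. This is only an illustration under stated assumptions, not warbleR's implementation. The toy data, the random-noise stand-in clips and the class name toy_extended_table are hypothetical; the attribute name wave.objects follows the one used later in this post.

```r
# toy selection data (hypothetical values)
sels <- data.frame(
  sound.files = c("a.wav", "a.wav", "b.wav"),
  selec = c(1, 2, 1),
  start = c(0.1, 0.5, 0.2),
  end = c(0.3, 0.7, 0.4),
  stringsAsFactors = FALSE
)

# stand-in "clips": one numeric vector per selection (random noise, not real audio)
set.seed(1)
clips <- lapply(seq_len(nrow(sels)), function(i) rnorm(100))
names(clips) <- paste(sels$sound.files, sels$selec, sep = "_")

# attach the clips to the table and tag it with a hypothetical class
toy_est <- sels
attr(toy_est, "wave.objects") <- clips
class(toy_est) <- c("toy_extended_table", "data.frame")

# the table still behaves like a data frame, but now carries its own clips
nrow(toy_est)
names(attr(toy_est, "wave.objects"))
```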

Let’s first install and/or load the development version of warbleR (if an older warbleR version is installed, it has to be removed first):

# remove warbleR
remove.packages("warbleR")

# install devtools if not installed
if (!"devtools" %in% installed.packages()[,"Package"])  
  install.packages("devtools")

# and install warbleR from github
devtools::install_github("maRce10/warbleR")

# load warbleR
library(warbleR)

… set a temporary folder, load the example sound files and set warbleR options (see warbleR_options() documentation):

# set temporary directory
setwd(tempdir())

# load example data
data(list = c("Phae.long1", "Phae.long2", "Phae.long3", "Phae.long4",
              "selec.table"))

# save recordings as wave files
writeWave(Phae.long1,"Phae.long1.wav")
writeWave(Phae.long2,"Phae.long2.wav")
writeWave(Phae.long3,"Phae.long3.wav")
writeWave(Phae.long4,"Phae.long4.wav")

# set warbleR options
warbleR_options(wl = 300, pb = FALSE, 
          parallel = parallel::detectCores() - 1)

Now, as mentioned above, you need the selection_table() function to create an extended selection table. You also need to set the argument extended = TRUE (otherwise the class would be “selection_table”). Here the example data that comes with warbleR is used as the data frame to be converted to an object of class extended_selection_table:

selec.table

The following code converts it to an extended selection table:

# make extended selection table
ext_st <- selection_table(X = selec.table, pb = FALSE, 
          extended = TRUE, confirm.extended = FALSE)

And that’s it. Now the acoustic data and the selection data (as well as the additional metadata) are all together in a single R object.

 

Manipulating extended selection tables

Several functions can be used to deal with objects of this class. You can test whether an object belongs to the class extended_selection_table:

is_extended_selection_table(ext_st)
[1] TRUE

You can subset the selections in the same way as with any other data frame, and the object will keep its attributes:

ext_st2 <- ext_st[1:2, ]

is_extended_selection_table(ext_st2)
[1] TRUE
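
For subsetting to keep such an object consistent, the stored clips have to be subset in step with the rows. The following base R sketch shows one way a class could do this with its own `[` method. It is a toy under stated assumptions, not warbleR's actual method; the class name toy_table, the attribute name clips and the id column are hypothetical.

```r
# toy table with one stand-in clip per row (all names are hypothetical)
toy <- data.frame(id = c("a", "b", "c"), start = c(0.1, 0.2, 0.3),
                  stringsAsFactors = FALSE)
attr(toy, "clips") <- list(a = 1:3, b = 4:6, c = 7:9)
class(toy) <- c("toy_table", "data.frame")

# row-subsetting method that also subsets the clips attribute
`[.toy_table` <- function(x, i, ...) {
  clips <- attr(x, "clips")
  plain <- x
  class(plain) <- "data.frame"          # fall back to regular data frame indexing
  out <- plain[i, , drop = FALSE]
  attr(out, "clips") <- clips[out$id]   # keep only the clips of the remaining rows
  class(out) <- class(x)
  out
}

sub <- toy[2:3, ]
names(attr(sub, "clips"))
```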

There is also a print() method for this class of objects:

## print
print(ext_st)
object of class 'extended_selection_table' 
 contains a selection table data frame with 11 rows and 9 columns: 
       sound.files channel selec start     end bottom.freq top.freq sel.comment rec.comment
1 Phae.long1.wav_1       1     1   0.1 0.27303      2.2201   8.6044         c24          NA
2 Phae.long1.wav_2       1     1   0.1 0.26305      2.1694   8.8071         c25          NA
3 Phae.long1.wav_3       1     1   0.1 0.27492      2.2183   8.7566         c26          NA
4 Phae.long2.wav_1       1     1   0.1 0.23257      2.3169   8.8223         c27          NA
5 Phae.long2.wav_2       1     1   0.1 0.22615      2.2840   8.8880         c28          NA
6 Phae.long3.wav_1       1     1   0.1 0.23122      3.0068   8.8223         c29          NA
... and 5 more rows 
11 wave objects (as attributes): 
[1] "Phae.long1.wav_1" "Phae.long1.wav_2" "Phae.long1.wav_3" "Phae.long2.wav_1" "Phae.long2.wav_2"
[6] "Phae.long3.wav_1"
... and 5 more 
and a data frame (check.results) generated by checkres() (as attribute) 
the selection table was created by element (see 'class_extended_selection_table')
## which is the same as this
ext_st
object of class 'extended_selection_table' 
 contains a selection table data frame with 11 rows and 9 columns: 
       sound.files channel selec start     end bottom.freq top.freq sel.comment rec.comment
1 Phae.long1.wav_1       1     1   0.1 0.27303      2.2201   8.6044         c24          NA
2 Phae.long1.wav_2       1     1   0.1 0.26305      2.1694   8.8071         c25          NA
3 Phae.long1.wav_3       1     1   0.1 0.27492      2.2183   8.7566         c26          NA
4 Phae.long2.wav_1       1     1   0.1 0.23257      2.3169   8.8223         c27          NA
5 Phae.long2.wav_2       1     1   0.1 0.22615      2.2840   8.8880         c28          NA
6 Phae.long3.wav_1       1     1   0.1 0.23122      3.0068   8.8223         c29          NA
... and 5 more rows 
11 wave objects (as attributes): 
[1] "Phae.long1.wav_1" "Phae.long1.wav_2" "Phae.long1.wav_3" "Phae.long2.wav_1" "Phae.long2.wav_2"
[6] "Phae.long3.wav_1"
... and 5 more 
and a data frame (check.results) generated by checkres() (as attribute) 
the selection table was created by element (see 'class_extended_selection_table')

You can also row-bind them together. Here the original extended_selection_table is split in two and bound back together using rbind():

ext_st3 <- ext_st[1:5, ]

ext_st4 <- ext_st[6:11, ]

ext_st5 <- rbind(ext_st3, ext_st4)

#print
ext_st5
object of class 'extended_selection_table' 
 contains a selection table data frame with 11 rows and 9 columns: 
       sound.files channel selec start     end bottom.freq top.freq sel.comment rec.comment
1 Phae.long1.wav_1       1     1   0.1 0.27303      2.2201   8.6044         c24          NA
2 Phae.long1.wav_2       1     1   0.1 0.26305      2.1694   8.8071         c25          NA
3 Phae.long1.wav_3       1     1   0.1 0.27492      2.2183   8.7566         c26          NA
4 Phae.long2.wav_1       1     1   0.1 0.23257      2.3169   8.8223         c27          NA
5 Phae.long2.wav_2       1     1   0.1 0.22615      2.2840   8.8880         c28          NA
6 Phae.long3.wav_1       1     1   0.1 0.23122      3.0068   8.8223         c29          NA
... and 5 more rows 
11 wave objects (as attributes): 
[1] "Phae.long1.wav_1" "Phae.long1.wav_2" "Phae.long1.wav_3" "Phae.long2.wav_1" "Phae.long2.wav_2"
[6] "Phae.long3.wav_1"
... and 5 more 
and a data frame (check.results) generated by checkres() (as attribute) 
the selection table was created by element (see 'class_extended_selection_table')
# the same as the original one
all.equal(ext_st, ext_st5)
[1] TRUE

The wave objects can be individually read using read_sound_file(), a wrapper around tuneR’s readWave() function that can also take extended selection tables:

wv1 <- read_sound_file(X = ext_st, index = 3, from = 0, to = 0.37)

These are regular wave objects:

class(wv1)
[1] "Wave"
attr(,"package")
[1] "tuneR"
wv1

Wave Object
	Number of Samples:      8325
	Duration (seconds):     0.37
	Samplingrate (Hertz):   22500
	Channels (Mono/Stereo): Mono
	PCM (integer format):   TRUE
	Bit (8/16/24/32/64):    16 
spectro(wv1, wl = 150, grid = FALSE, scale = FALSE, ovlp = 90)


par(mfrow = c(3, 2), mar = rep(0, 4))

for(i in 1:6){
  
  wv <- read_sound_file(X = ext_st, index = i, from = 0.05, to = 0.32)

  spectro(wv, wl = 150, grid = FALSE, scale = FALSE, axisX = FALSE,
          axisY = FALSE, ovlp = 90)

}


The read_sound_file() function takes the table as well as the index of the selection to be read (i.e. the row number).
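
The from and to arguments are given in seconds, so the number of samples returned follows from the sampling rate. The base R sketch below works through that arithmetic on a plain numeric vector. The helper extract_window() is hypothetical and its rounding is an assumption; it is not how read_sound_file() or tuneR are implemented.

```r
# hypothetical helper: cut a time window (in seconds) out of a vector of samples
extract_window <- function(samples, sampling_rate, from, to) {
  first <- round(from * sampling_rate) + 1
  last <- round(to * sampling_rate)
  samples[first:last]
}

sampling_rate <- 22500
clip <- numeric(0.5 * sampling_rate)  # 0.5 s of silence as a stand-in clip

win <- extract_window(clip, sampling_rate, from = 0, to = 0.37)

# 0.37 s * 22500 Hz = 8325 samples, the number reported for wv1 above
length(win)
```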

Keep in mind that other functions that modify data frames are likely to remove the attributes in which wave objects and metadata are stored. For instance, merging an extended selection table will get rid of its attributes:

# create a new data frame 
Y <- data.frame(sound.files = ext_st$sound.files, site = "La Selva", lek = c(rep("SUR", 5), rep("CCL", 6)))

# merge
mrg_ext_st <- merge(ext_st, Y, by = "sound.files")

# check class
is_extended_selection_table(mrg_ext_st)
[1] FALSE

In this case we can use the fix_extended_selection_table() function to transfer the attributes from the original extended selection table:

# fix
mrg_ext_st <- fix_extended_selection_table(X = mrg_ext_st, Y = ext_st)

# check class
is_extended_selection_table(mrg_ext_st)
[1] TRUE

This works as long as some of the original sound files are kept and no other selections are added.
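
The same behaviour can be reproduced with a plain data frame in base R: merge() builds a new data frame, so custom attributes are not carried over and have to be copied back from the original. This is a toy sketch of the idea only; the attribute name clips and the id column are hypothetical, and fix_extended_selection_table() may do more than this.

```r
# toy table with a custom attribute (hypothetical names)
X <- data.frame(id = c("a", "b"), start = c(0.1, 0.2), stringsAsFactors = FALSE)
attr(X, "clips") <- list(a = 1:3, b = 4:6)

# extra metadata to merge in
Y <- data.frame(id = c("a", "b"), site = "La Selva", stringsAsFactors = FALSE)

M <- merge(X, Y, by = "id")

# the custom attribute did not survive the merge
lost <- is.null(attr(M, "clips"))
lost

# copy it back from the original, keeping only the clips of rows still present
attr(M, "clips") <- attr(X, "clips")[M$id]
names(attr(M, "clips"))
```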

 

Object size

Extended selection table size will be a function of the number of selections, sampling rate, selection duration and margin duration (the margin is how much extra time you want to keep at each side of the selection). In this example a data frame with 1000 selections is created just by repeating the example data frame several times and then converted to an extended selection table:

lng.selec.table <- do.call(rbind, replicate(100, selec.table, 
                        simplify = FALSE))[1:1000,]

lng.selec.table$selec <- 1:nrow(lng.selec.table)

nrow(lng.selec.table)

lng_ext_st <- selection_table(X = lng.selec.table, pb = FALSE, 
                        extended = TRUE, confirm.extended = FALSE)

lng_ext_st
object of class 'extended_selection_table' 
 contains a selection table data frame with 1000 rows and 9 columns: 
       sound.files channel selec start     end bottom.freq top.freq sel.comment rec.comment
1 Phae.long1.wav_1       1     1   0.1 0.27303      2.2201   8.6044         c24          NA
2 Phae.long1.wav_2       1     1   0.1 0.26305      2.1694   8.8071         c25          NA
3 Phae.long1.wav_3       1     1   0.1 0.27492      2.2183   8.7566         c26          NA
4 Phae.long2.wav_4       1     1   0.1 0.23257      2.3169   8.8223         c27          NA
5 Phae.long2.wav_5       1     1   0.1 0.22615      2.2840   8.8880         c28          NA
6 Phae.long3.wav_6       1     1   0.1 0.23122      3.0068   8.8223         c29          NA
... and 994 more rows 
1000 wave objects (as attributes): 
[1] "Phae.long1.wav_1" "Phae.long1.wav_2" "Phae.long1.wav_3" "Phae.long2.wav_4" "Phae.long2.wav_5"
[6] "Phae.long3.wav_6"
... and 994 more 
and a data frame (check.results) generated by checkres() (as attribute) 
the selection table was created by element (see 'class_extended_selection_table')
format(object.size(lng_ext_st), units = "auto")
[1] "31.3 Mb"

As you can see the object size is only ~31 MB. So, as a guide, a selection table with 1000 selections similar to those in ‘selec.table’ (mean duration ~0.15 seconds) at 22.5 kHz sampling rate and the default margin (mar = 0.1) will generate an extended selection table of ~31 MB or ~310 MB for a 10000 row selection table.
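
A back-of-the-envelope estimate can be sketched in base R from the same quantities. The helper estimate_audio_bytes() is hypothetical, and it assumes that each sample takes 4 bytes in memory (as in an R integer vector) and ignores the data frame and per-object overhead, so it should be read as a rough lower bound rather than an exact figure.

```r
# hypothetical helper: approximate bytes of audio held in an extended selection table
estimate_audio_bytes <- function(n_sel, duration, mar, sampling_rate,
                                 bytes_per_sample = 4) {
  # each clip holds the selection plus a margin on each side
  samples_per_clip <- (duration + 2 * mar) * sampling_rate
  n_sel * samples_per_clip * bytes_per_sample
}

est <- estimate_audio_bytes(n_sel = 1000, duration = 0.15, mar = 0.1,
                            sampling_rate = 22500)

# in the same units as object.size(..., units = "Mb")
est / 1024^2
```

With these inputs the estimate comes out at roughly 30 Mb, close to the size reported above.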

 

Running analysis on extended selection tables

These objects can be used as input for most warbleR functions. We need to delete the sound files in order to show the data is actually contained in the new objects:

list.files(pattern = "\\.wav$")
[1] "Phae.long1.wav" "Phae.long2.wav" "Phae.long3.wav" "Phae.long4.wav"
# delete files (be careful not to run this 
# if you have sound files in the working directory!)
unlink(list.files(pattern = "\\.wav$"))

list.files(pattern = "\\.wav$")
character(0)

Here are a few examples of warbleR functions using extended_selection_table:

Spectral parameters

# spectral parameters
sp <- specan(ext_st)

sp

Cross correlation

xc <- xcorr(ext_st, bp = c(1, 11))

xc

Signal-to-noise ratio

# signal-to-noise ratio
snr <- sig2noise(ext_st, mar = 0.05)

snr

Dynamic time warping distance

dtw.dist <- dfDTW(ext_st, img = FALSE)

dtw.dist
calculating DTW distances (step 2 of 2, no progress bar):

Performance

Using extended_selection_table objects can improve performance (in our case measured as time). Here we use the microbenchmark package to compare the performance of sig2noise() on both types of input, and ggplot2 to plot the results. We also need to save the wave files again to be able to run the analysis with regular data frames:

# save recordings as wave files
writeWave(Phae.long1,"Phae.long1.wav")
writeWave(Phae.long2,"Phae.long2.wav")
writeWave(Phae.long3,"Phae.long3.wav")
writeWave(Phae.long4,"Phae.long4.wav")

# run this one if microbenchmark is not installed
# install.packages("microbenchmark")
library(microbenchmark)

# install.packages("ggplot2")
library(ggplot2)

# use only 1 core
warbleR_options(parallel = 1, pb = FALSE)

# use the first 100 selections of the long selection tables
mbmrk.snr <- microbenchmark(extended = sig2noise(lng_ext_st[1:100, ], 
      mar = 0.05), regular = sig2noise(lng.selec.table[1:100, ], 
                    mar = 0.05), times = 50)

autoplot(mbmrk.snr) + ggtitle("sig2noise")


Distribution of sig2noise() timing on regular and extended selection tables

The function runs much faster on extended selection tables. The performance gain is likely to be larger with longer recordings and larger data sets (i.e. large enough to compensate for the computing overhead).
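
The source of the speed-up can be mimicked with a toy base R comparison: the same computation is run on data that is re-read from disk on every call and on data already held in memory. This is only an analogy under stated assumptions (an .rds file of random noise stands in for a sound file, and the function names are hypothetical); actual timings depend on the machine and are not shown.

```r
# stand-in clip: 1 s of random noise at 22.5 kHz, saved to a temporary file
set.seed(1)
clip <- rnorm(22500)
clip_file <- tempfile(fileext = ".rds")
saveRDS(clip, clip_file)

# same measurement, reading from disk each time vs. using the in-memory copy
power_from_disk <- function() mean(readRDS(clip_file)^2)
power_from_memory <- function() mean(clip^2)

# elapsed time for 200 repetitions of each
t_disk <- system.time(for (k in 1:200) power_from_disk())["elapsed"]
t_memory <- system.time(for (k in 1:200) power_from_memory())["elapsed"]

c(disk = unname(t_disk), memory = unname(t_memory))
```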

By song

The extended selection tables above were all made ‘by selection’. That is, each wave object inside the table contains a single selection (i.e. a 1:1 correspondence between selections and wave objects). Extended selection tables, however, can also be created at a higher hierarchical level with the argument by.song. In this case, ‘song’ represents a higher level that contains one or more selections and that the user may want to keep together for some particular analysis (e.g. gap duration). The argument by.song takes the name of the character or factor column with the IDs of the different “songs” within a sound file (note that the function assumes that a given song can only be found in a single sound file, so selections with the same song ID but from different sound files are taken as different ‘songs’).

For the sake of the example, let’s add an artificial song column to our example data set in which each sound file has 2 songs:

# add column
selec.table$song <- c(1, 1, 2, 1, 2, 1, 1, 2, 1, 2, 2)

The data frame looks like this:

Now we can create an extended selection table ‘by song’ using the name of the ‘song’ column (which in this silly example is also ‘song’) as the input for the by.song argument:

bs_ext_st <- selection_table(X = selec.table, extended = TRUE,
                              confirm.extended = FALSE, by.song = "song")

In this case we should only have 8 wave objects instead of 11 as when the object was created ‘by selection’:

# by element
length(attr(ext_st, "wave.objects"))
[1] 11
# by song
length(attr(bs_ext_st, "wave.objects"))
[1] 8
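
The expected number of wave objects can also be worked out from the selection data alone, as the number of unique sound file and song combinations. The base R sketch below uses a stand-in data frame rather than selec.table itself; it assumes the four example files contribute 3, 2, 3 and 3 selections, in that order.

```r
# stand-in for the sound.files and song columns (assumed file layout: 3, 2, 3 and 3 selections)
toy <- data.frame(
  sound.files = rep(paste0("Phae.long", 1:4, ".wav"), times = c(3, 2, 3, 3)),
  song = c(1, 1, 2, 1, 2, 1, 1, 2, 1, 2, 2),
  stringsAsFactors = FALSE
)

# one wave object per selection when created 'by selection'
nrow(toy)

# one wave object per unique sound file / song combination when created 'by song'
n_songs <- nrow(unique(toy[, c("sound.files", "song")]))
n_songs
```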

Again, these objects can also be used in further analyses:

# signal-to-noise ratio
bs_snr <- sig2noise(bs_ext_st, mar = 0.05)

The margin is an important parameter to take into consideration for some downstream functions, like those producing plots or using additional time segments around the selection to run the analysis (e.g. sig2noise() or xcorr()).

Sharing acoustic data

The new object class makes it possible to share complete data sets, including the acoustic data. For instance, with the following code you can download a subset of the data used in Araya-Salas et al (2017) (it can also be downloaded here):

URL <- "https://marceloarayasalas.weebly.com/uploads/2/5/5/2/25524573/extended.selection.table.araya-salas.et.al.2017.bioacoustics.100.sels.rds"

dat <- readRDS(gzcon(url(URL)))

nrow(dat)
[1] 100
format(object.size(dat), units = "auto")
[1] "10.1 Mb"

The total size of the 100 sound files from which these selections were taken adds up to 1.1 GB. The size of the extended selection table is just 10.1 MB.
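
A file like the one downloaded above can be produced with base R's saveRDS(), which stores an object together with its attributes. The sketch below checks the round trip on a toy data frame; the data are hypothetical stand-ins, and the same calls are assumed (not shown here) to work on a real extended selection table.

```r
# toy table carrying stand-in clips in an attribute (hypothetical data)
toy <- data.frame(sound.files = c("a.wav_1", "b.wav_1"), start = c(0.1, 0.1),
                  stringsAsFactors = FALSE)
attr(toy, "wave.objects") <- list(a.wav_1 = 1:5, b.wav_1 = 6:10)

# write to a single file that can be shared ...
rds_file <- tempfile(fileext = ".rds")
saveRDS(toy, rds_file)

# ... and read it back on another machine or in a later session
back <- readRDS(rds_file)

# the attribute with the clips comes back along with the table
identical(attr(back, "wave.objects"), attr(toy, "wave.objects"))
```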

This data is ready to be used:

sp <- specan(dat, bp = c(2, 10))

head(sp)

And the spectrograms can be displayed:

par(mfrow = c(3, 2), mar = rep(0, 4))

for(i in 1:6){
  
  wv <- read_sound_file(X = dat, index = i, from = 0.17, to = 0.4)

  spectro(wv, wl = 250, grid = FALSE, scale = FALSE, axisX = FALSE,
          axisY = FALSE, ovlp = 90, flim = c(0, 12), 
          palette = reverse.gray.colors.1)
}


The ability to compress large data sets and the ease of conducting analyses that require only a single R object can potentially simplify data sharing and improve the reproducibility of bioacoustic analyses.

Please report any bugs here.


Session information

R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_3.0.0        microbenchmark_1.4-4 kableExtra_0.9.0     knitr_1.20           warbleR_1.1.15      
[6] NatureSounds_1.0.0   seewave_2.1.0        tuneR_1.3.3          maps_3.3.0          

loaded via a namespace (and not attached):
 [1] rgl_0.95.1441        Rcpp_0.12.18         fftw_1.0-4           assertthat_0.2.0     rprojroot_1.3-2     
 [6] digest_0.6.16        R6_2.2.2             plyr_1.8.4           Sim.DiffProc_4.1     backports_1.1.2     
[11] signal_0.7-6         evaluate_0.11        pracma_2.1.5         httr_1.3.1           highr_0.7           
[16] pillar_1.3.0         rlang_0.2.2          lazyeval_0.2.1       curl_3.2             rstudioapi_0.7      
[21] rmarkdown_1.10       devtools_1.13.6      moments_0.14         readr_1.1.1          stringr_1.3.1       
[26] RCurl_1.95-4.11      munsell_0.5.0        proxy_0.4-22         compiler_3.4.4       Deriv_3.8.5         
[31] pkgconfig_2.0.2      htmltools_0.3.6      tidyselect_0.2.4     tibble_1.4.2         dtw_1.20-1          
[36] bioacoustics_0.1.5   viridisLite_0.3.0    crayon_1.3.4         dplyr_0.7.6          withr_2.1.2         
[41] MASS_7.3-50          bitops_1.0-6         grid_3.4.4           gtable_0.2.0         git2r_0.23.0        
[46] magrittr_1.5         scales_1.0.0         stringi_1.2.4        pbapply_1.3-4        scatterplot3d_0.3-41
[51] bindrcpp_0.2.2       xml2_1.2.0           rjson_0.2.20         iterators_1.0.10     tools_3.4.4         
[56] glue_1.3.0           purrr_0.2.5          hms_0.4.2            jpeg_0.1-8           parallel_3.4.4      
[61] yaml_2.2.0           colorspace_1.3-2     soundgen_1.3.1       rvest_0.3.2          memoise_1.1.0       
[66] bindr_0.1.1         
