A friend of mine wants to “create individual sound files for each selection” in a selection table. This is a good opportunity to show how to create a function that works iteratively on signals in a selection table (like most warbleR functions).
It takes 3 main steps:
Create a (internal) function that does what we want on a single selection (i.e. a single row of the selection table)
Add and
(X)lapply
loop to run the function from step 1 iteratively on each rowPut all inside a new function
Other things can be added to the function to check arguments and set working directory, but this is not critical.
Step 1: Create function that does what we want on a single selection
First install and/or load warbleR (which also loads tuneR, the package for cutting and saving sound files):
if(!"devtools" %in% installed.packages()[,"Package"]) install.packages("devtools")
require("devtools")
devtools::install_github("maRce10/warbleR")
require("warbleR")
Save the example sound files as .wav in a temporary working directory (to test the code):
setwd(tempdir())
data(list = c("Phae.long1", "Phae.long2", "Phae.long3",
"Phae.long4", "selec.table"))
writeWave(Phae.long1,"Phae.long1.wav")
writeWave(Phae.long2,"Phae.long2.wav")
writeWave(Phae.long3,"Phae.long3.wav")
writeWave(Phae.long4,"Phae.long4.wav")
We can check that now there are 4 sound files in the working directory:
list.files(pattern = "\\.wav$", ignore.case = TRUE)
## [1] "Phae.long1.wav" "Phae.long2.wav" "Phae.long3.wav" "Phae.long4.wav"
The selection table selec.table
contains columns for sound file name, start and end of signals (in seconds), and a unique ID for each selection:
head(selec.table)
## sound.files channel selec start end bottom.freq top.freq
## 1 Phae.long1.wav 1 1 1.1693549 1.3423884 2.220105 8.604378
## 2 Phae.long1.wav 1 2 2.1584085 2.3214565 2.169437 8.807053
## 3 Phae.long1.wav 1 3 0.3433366 0.5182553 2.218294 8.756604
## 4 Phae.long2.wav 1 1 0.1595983 0.2921692 2.316862 8.822316
## 5 Phae.long2.wav 1 2 1.4570585 1.5832087 2.284006 8.888027
## 6 Phae.long3.wav 1 1 0.6265520 0.7577715 3.006834 8.822316
## sel.comment rec.comment
## 1 c24 NA
## 2 c25 NA
## 3 c26 NA
## 4 c27 NA
## 5 c28 NA
## 6 c29 NA
Now write a code that takes the selection from a single row, extracts the .wav file segment, and save it to the working directory. To do this use the functions readWave
and writeWave
from tuneR:
#lets use X for the selection table data frame as is the convention in warbleR functions
X <- selec.table
# Cut wave for the first row
wvcut <- tuneR::readWave(as.character(X$sound.files[1]), from = X$start[1], to = X$end[1], units = "seconds")
# save cut removing file extension
tuneR::writeWave(object = wvcut, filename = paste0(gsub("\\.wav$", "",X$sound.files[1], ignore.case = TRUE), "-", X$selec[1], ".wav"))
Note that the sound file name (with the file extension removed) and selection ID were used to name the cut. This is important because it will produce a unique name for each cut, so they won’t be overwritten.
The following code shows that there is a new sound file corresponding to the new cut:
list.files(pattern = "\\.wav$", ignore.case = TRUE)
## [1] "Phae.long1-1.wav" "Phae.long1.wav" "Phae.long2.wav"
## [4] "Phae.long3.wav" "Phae.long4.wav"
Now put the code inside a function. The index number 1 used above has to be replaced by ‘i’ so it runs one each row iteratively. I also added a few more line of codes to:
- Allow adding margins at both sides of the signals
- Adjust the start and end when the margin surpasses the start (0) and end of the whole sound file
cutFUN <- function(X, i, mar){
# Read sound files, initialize frequency and time limits for spectrogram
r <- tuneR::readWave(as.character(X$sound.files[i]), header = TRUE)
f <- r$sample.rate
t <- c(X$start[i] - mar, X$end[i] + mar)
# fix margins if below 0 or length of recordings
mar1 <- mar
mar2 <- mar1 + X$end[i] - X$start[i]
if (t[1] < 0) t[1] <- 0
if(t[2] > r$samples/f) t[2] <- r$samples/f
# Cut wave
wvcut <- tuneR::readWave(as.character(X$sound.files[i]), from = t[1], to = t[2], units = "seconds")
tuneR::writeWave(object = wvcut, filename = paste0(as.character(X$sound.files[i]), "-", X$selec[i], ".wav"))
}
Step 2: Add a (X)lapply function
Several versions of the lapply
function can be used. The function pblapply
from the package pbapply provides a nice progress bar. The following code should produce cuts for each selection in selec.table
:
out <- pbapply::pblapply(1:nrow(selec.table), function(y)
cutFUN(X = selec.table, i = y, mar = 0.05))
We could also used lapply
(no progress bar) or mclapply
([parallel] package, parallel computing), or even pbmclapply
(pbmcapply package, parallel computing and progress bar, but not available for windows). In fact, all these options are included in most warbleR functions.
Step 3: Put all the code inside a new function
Just copy/paste the cutFUN
function and the (X)lapply
function inside a new function:
cut.selections <- function(X, mar){
# internal function to cut each selection
cutFUN <- function(X, i, mar){
# Read sound files, initialize frequency and time limits for spectrogram
r <- tuneR::readWave(as.character(X$sound.files[i]), header = TRUE)
f <- r$sample.rate
t <- c(X$start[i] - mar, X$end[i] + mar)
# fix margins if below 0 or length of recordings
mar1 <- mar
mar2 <- mar1 + X$end[i] - X$start[i]
if (t[1] < 0) t[1] <- 0
if(t[2] > r$samples/f) t[2] <- r$samples/f
# Cut wave
wvcut <- tuneR::readWave(as.character(X$sound.files[i]), from = t[1], to = t[2], units = "seconds")
tuneR::writeWave(object = wvcut, filename = paste0(as.character(X$sound.files[i]), "-", X$selec[i], ".wav"))
}
# lapply function to run over all selections
out <- pbapply::pblapply(1:nrow(X), function(y) cutFUN(X = selec.table, i = y, mar = 0.05))
}
Let’s see if it works. First we should remove the cuts we created previously. The following code removes only the cuts but not the original sound files (filtered out based on file size):
wvs <- list.files(pattern = "\\.wav$", ignore.case = TRUE)
sz <- file.info(list.files(pattern = "\\.wav$", ignore.case = TRUE))$size
unlink(wvs[sz < 50000])
list.files(pattern = "\\.wav$", ignore.case = TRUE)
## [1] "Phae.long1.wav" "Phae.long2.wav" "Phae.long3.wav" "Phae.long4.wav"
And run the function:
cut.selections(X = selec.table, mar = 0.05)
The cuts now are found in the working directory:
list.files(pattern = "\\.wav$", ignore.case = TRUE)
## [1] "Phae.long1.wav" "Phae.long1.wav-1.wav" "Phae.long1.wav-2.wav"
## [4] "Phae.long1.wav-3.wav" "Phae.long2.wav" "Phae.long2.wav-1.wav"
## [7] "Phae.long2.wav-2.wav" "Phae.long3.wav" "Phae.long3.wav-1.wav"
## [10] "Phae.long3.wav-2.wav" "Phae.long3.wav-3.wav" "Phae.long4.wav"
## [13] "Phae.long4.wav-1.wav" "Phae.long4.wav-2.wav" "Phae.long4.wav-3.wav"
I put together all the code in a new warbleR function called cut_sels
(available in version 1.1.9, currrently only on github). I added a few more arguments (labels, overwrite, parallel, …), some argument checks to warn users when using invalid values, and parallel and progress bar options. Here is the code in case you are curious:
cut_sels <- function(X, mar = 0.05, parallel = 1, path = NULL, dest.path = NULL, pb = TRUE,
labels = c("sound.files", "selec"), overwrite = FALSE, ...){
#check path to working directory
if(!is.null(path))
{wd <- getwd()
if(class(try(setwd(path), silent = TRUE)) == "try-error") stop("'path' provided does not exist") else
setwd(path)} #set working directory
#check path to working directory
if(!is.null(dest.path))
{if(class(try(setwd(dest.path), silent = TRUE)) == "try-error") stop("'dest.path' provided does not exist")} else dest.path <- getwd()
#set working directory
#if X is not a data frame
if(!class(X) == "data.frame") stop("X is not a data frame")
if(!all(c("sound.files", "selec",
"start", "end") %in% colnames(X)))
stop(paste(paste(c("sound.files", "selec", "start", "end")[!(c("sound.files", "selec",
"start", "end") %in% colnames(X))], collapse=", "), "column(s) not found in data frame"))
#if there are NAs in start or end stop
if(any(is.na(c(X$end, X$start)))) stop("NAs found in start and/or end")
#if end or start are not numeric stop
if(all(class(X$end) != "numeric" & class(X$start) != "numeric")) stop("'end' and 'selec' must be numeric")
#if any start higher than end stop
if(any(X$end - X$start<0)) stop(paste("The start is higher than the end in", length(which(X$end - X$start<0)), "case(s)"))
#return warning if not all sound files were found
recs.wd <- list.files(pattern = "\\.wav$", ignore.case = TRUE)
if(length(unique(X$sound.files[(X$sound.files %in% recs.wd)])) != length(unique(X$sound.files)))
(paste(length(unique(X$sound.files))-length(unique(X$sound.files[(X$sound.files %in% recs.wd)])),
".wav file(s) not found"))
#missing label columns
if(!all(labels %in% colnames(X)))
stop(paste(paste(labels[!(labels %in% colnames(X))], collapse=", "), "label column(s) not found in data frame"))
#count number of sound files in working directory and if 0 stop
d <- which(X$sound.files %in% recs.wd)
if(length(d) == 0){
stop("The .wav files are not in the working directory")
} else {
X <- X[d, ]
}
#convert factors to characters
X[,sapply(X, is.factor)] <- apply(X[,sapply(X, is.factor)], 2, as.character)
#remove .wav from sound file names
X2 <- X
X2$sound.files <- gsub("\\.wav$", "", X2$sound.files, ignore.case = TRUE)
# If parallel is not numeric
if(!is.numeric(parallel)) stop("'parallel' must be a numeric vector of length 1")
if(any(!(parallel %% 1 == 0),parallel < 1)) stop("'parallel' should be a positive integer")
#if parallel and pb in windows
if(parallel > 1 & pb & Sys.info()[1] == "Windows") {
message("parallel with progress bar is currently not available for windows OS")
message("running parallel without progress bar")
pb <- FALSE
}
#create function to run within Xapply functions downstream
cutFUN <- function(X, i, mar, labels, dest.path){
# Read sound files, initialize frequency and time limits for spectrogram
r <- tuneR::readWave(as.character(X$sound.files[i]), header = TRUE)
f <- r$sample.rate
t <- c(X$start[i] - mar, X$end[i] + mar)
# fix margins if below 0 or length of recordings
mar1 <- mar
mar2 <- mar1 + X$end[i] - X$start[i]
if (t[1] < 0) t[1] <- 0
if(t[2] > r$samples/f) t[2] <- r$samples/f
# Cut wave
wvcut <- tuneR::readWave(as.character(X$sound.files[i]), from = t[1], to = t[2], units = "seconds")
# save cut
if(overwrite) unlink(file.path(dest.path, paste0(paste(X2[i, labels], collapse = "-"), ".wav")))
tuneR::writeWave(object = wvcut, filename = file.path(dest.path, paste0(paste(X2[i, labels], collapse = "-"), ".wav")), ...)
}
# Run parallel in windows
if(parallel > 1) {
if(Sys.info()[1] == "Windows") {
i <- NULL #only to avoid non-declared objects
cl <- parallel::makeCluster(parallel)
doParallel::registerDoParallel(cl)
out <- foreach::foreach(i = 1:nrow(X)) %dopar% {
cutFUN(X = X, i = i, mar = mar, labels = labels, dest.path = dest.path)
}
parallel::stopCluster(cl)
}
if(Sys.info()[1] == "Linux") { # Run parallel in Linux
if(pb)
out <- pbmcapply::pbmclapply(1:nrow(X), mc.cores = parallel, function (i) {
cutFUN(X = X, i = i, mar = mar, labels = labels, dest.path = dest.path)
}) else
out <- parallel::mclapply(1:nrow(X), mc.cores = parallel, function (i) {
cutFUN(X = X, i = i, mar = mar, labels = labels, dest.path = dest.path)
})
}
if(!any(Sys.info()[1] == c("Linux", "Windows"))) # parallel in OSX
{
cl <- parallel::makeForkCluster(getOption("cl.cores", parallel))
doParallel::registerDoParallel(cl)
out <- foreach::foreach(i = 1:nrow(X)) %dopar% {
cutFUN(X = X, i = i, mar = mar, labels = labels, dest.path = dest.path)
}
parallel::stopCluster(cl)
}
}
else {
if(pb)
out <- pbapply::pblapply(1:nrow(X), function(i) cutFUN(X = X, i = i, mar = mar, labels = labels, dest.path = dest.path)) else
out <- lapply(1:nrow(X), function(i) cutFUN(X = X, i = i, mar = mar, labels = labels, dest.path = dest.path))
}
if(!is.null(path)) setwd(wd)
}
That’s is it. If you developed a function that you think could be useful to other people we could include it in warbleR.