Sound

Objetives

Learn the basic aspects of sound as a physical phenomenom
Get familiar with how sound is represented as a digital object in R

Sound waves are characterized by compression and expansion of the medium as sound energy moves through it. There is also back and forth motion of the particles making up the medium:

taken from https://dosits.org

The variation in pressure that is perceived at a fixed point in space can be represented by a graph of pressure (amplitude) by time:

Sounds waves are typically quantified by their frequency (number of cycles per second, Hz) and amplitude (relative intensity).

Digitizing sound

0.1 Sampling frequency

Digitizing implies discretizing, which requires some sort of regular sampling. Sampling frequency refers to how many samples of the pressure level of the environment are taken per second. A 440 Hz sine wave recorded at 44100 Hz would have around 100 samples per cycle. This plot shows 2 cycles of a 440 Hz sine wave sampled (vertical dotted lines) at 44100 Hz:

0.2 Nyquist frequency

Sampling should be good enough so the regularity of the sine wave can be reconstructed from the sampled data. Low sampling frequencies of a high frequency sine wave might not be able to provide enough information. For instance, the same 440 Hz sine wave sampled at 22050 Hz looks like this:

As you can see way less samples are taken per unit of time. The threshold at which samples cannot provide a reliable estimate of the regularity of a sine wave is called Nyquist frequency and corresponds to half of the frequency of the sine wave. This is how the 2 cycles of the 440 Hz would look like when sampled at its Nyquist frequency (sampling frequency of 880 Hz):

0.3 Quantization

Once we know at which point amplitude samples will be taken we just need to measure it. This process is called quantization. The range of amplitude values is discretized in a number of intervals equals to 2 ^ bits. Hence, it involves some rounding of the actual amplitude values and some data loss. This is the same 440 Hz sine wave recorded at 44100 kHz quantized at 2 bits (2^2 = 4 intervals):

Rounding and data loss is more obvious if we bind the sampled points with lines:

This is the same signal quantized at 3 bits (2^3 = 8 intervals):

4 bits (2^4 = 16 intervals):

.. and 8 bits (2^8 = 256 intervals):

At this point quantization involves very little information loss. 16 bits is probably the most common dynamic range used nowadays. As you can imagine, the high number of intervals (2^16 = 65536) allows for great precision in the quantization of amplitude.

1 Sound in R

Sound waves can be represented by 3 kinds of R objects:

Common classes (numerical vector, numerical matrix)
Time series classes (ts, mts)
Specific sound classes (Wave, sound and audioSample)

1.1 Non-specific classes

1.1.1 Vectors

Any numerical vector can be treated as a sound if a sampling frequency is provided. For example, a 440 Hz sinusoidal sound sampled at 8000 Hz for one second can be generated like this:

Code

library(seewave)

# create sinewave at 440 Hz
s1 <- sin(2 * pi * 440 * seq(0, 1, length.out = 8000))

is.vector(s1)

[1] TRUE

Code

mode(s1)

[1] "numeric"

These sequences of values only make sense when specifying the sampling rate at which they were created:

Code

oscillo(s1, f = 8000, from = 0, to = 0.01)

1.1.2 Matrices

You can read any single column matrix:

Code

s2 <- as.matrix(s1)

is.matrix(s2)

[1] TRUE

Code

dim(s2)

[1] 8000    1

Code

oscillo(s2, f = 8000, from = 0, to = 0.01)

If the matrix has more than one column, only the first column will be considered:

Code

x <- rnorm(8000)

s3 <- cbind(s2, x)

is.matrix(s3)

[1] TRUE

Code

dim(s3)

[1] 8000    2

Code

oscillo(s3, f = 8000, from = 0, to = 0.01)

1.1.3 Time series

The class ts and related functions ts(), as.ts(), is.ts() can also be used to generate sound objects in R. Here the command to similarly generate a series of time is shown corresponding to a 440 Hz sinusoidal sound sampled at 8000 Hz for one second:

Code

s4 <- ts(data = s1, start = 0, frequency = 8000)

str(s4)

 Time-Series [1:8000] from 0 to 1: 0 0.339 0.637 0.861 0.982 ...

To generate a random noise of 0.5 seconds:

Code

s4 <- ts(data = runif(4000, min = -1, max = 1), start = 0, end = 0.5, frequency = 8000)

str(s4)

 Time-Series [1:4001] from 0 to 0.5: 0.182 0.9231 -0.5833 0.0771 -0.0794 ...

The frequency() and deltat() functions return the sampling frequency ($f$) and the time resolution ($Delta t$) respectively:

Code

frequency(s4)

[1] 8000

Code

deltat(s4)

[1] 0.000125

As the frequency is incorporated into the ts objects, it is not necessary to specify it when used within functions dedicated to audio:

Code

oscillo(s4, from = 0, to = 0.01)

In the case of multiple time series, seewave functions will consider only the first series:

Code

s5 <- ts(data = s3, f = 8000)

class(s5)

[1] "mts"    "ts"     "matrix"

Code

oscillo(s5, from = 0, to = 0.01)

1.2 Dedicated R classes for sound

There are 3 kinds of objects corresponding to the wav binary format or themp3 compressed format:

Wave class of the package tuneR
sound class of the package phonTools
AudioSample class of the package audio

1.2.1 `Wave` class (tuneR)

The Wave class comes with the tuneR package. This S4 class includes different “slots” with the amplitude data (left or right channel), the sampling frequency (or frequency), the number of bits (8/16/24/32) and the type of sound (mono/stereo). High sampling rates (> 44100 Hz) can be read on these types of objects.

The function to import .wav files from the hard drive is readWave:

Code

# load packages
library(tuneR)

s6 <- readWave("./examples/Phae.long1.wav")

We can verify the class of the object like this:

Code

# object class
class(s6)

[1] "Wave"
attr(,"package")
[1] "tuneR"

S4 objects have a structure similar to lists but use ‘@’ to access each position (slot):

Code

# structure
str(s6)

Formal class 'Wave' [package "tuneR"] with 6 slots
  ..@ left     : int [1:56251] 162 -869 833 626 103 -2 43 19 47 227 ...
  ..@ right    : num(0) 
  ..@ stereo   : logi FALSE
  ..@ samp.rate: int 22500
  ..@ bit      : int 16
  ..@ pcm      : logi TRUE

Code

# extract 1 position
s6@samp.rate

[1] 22500

“Pulse-code modulation (PCM) is a method used to digitally represent sampled analog signals. It is the standard form of digital audio. In a PCM stream, the amplitude of the analog signal is sampled regularly at uniform intervals, and each sample is quantized to the nearest value within a range of digital steps” (Wikipedia).

The samples come in the slot ‘@left’:

Code

# samples
s6@left[1:40]

 [1]  162 -869  833  626  103   -2   43   19   47  227   -4  205  564  171  457
[16]  838 -216   60   76 -623 -213  168 -746 -248  175 -512  -58  651  -85 -213
[31]  586   40 -407  371  -51 -587  -92   94 -527   40

The number of samples is given by the duration and the sampling rate.

Exercise

How can we calculate the duration of the wave object using the information in the object?

Extract the first second of audio from the object s6 using indexing (and squared brackets)

An advantage of using readWave() is the ability to read specific segments of sound files, especially useful with long files. This is done using the from andto arguments and specifying the units of time with the units arguments. The units can be converted into “samples”, “minutes” or “hours”. For example, to read only the section that begins in 1s and ends in 2s of the file “Phae.long1.wav”:

Code

s7 <- readWave("./examples/Phae.long1.wav", from = 1, to = 2, units = "seconds")

s7


Wave Object
    Number of Samples:      22500
    Duration (seconds):     1
    Samplingrate (Hertz):   22500
    Channels (Mono/Stereo): Mono
    PCM (integer format):   TRUE
    Bit (8/16/24/32/64):    16

The .mp3 files can be imported to R although they are imported inWave format. This is done using the readMP3() function:

Code

s7 <- readMP3("./examples/Phae.long1.mp3")

s7


Wave Object
    Number of Samples:      56448
    Duration (seconds):     2.56
    Samplingrate (Hertz):   22050
    Channels (Mono/Stereo): Mono
    PCM (integer format):   TRUE
    Bit (8/16/24/32/64):    16

To obtain information about the object (sampling frequency, number of bits, mono/stereo), it is necessary to use the indexing of S4 class objects:

Code

s7@samp.rate

[1] 22050

Code

s7@bit

[1] 16

Code

s7@stereo

[1] FALSE

A property that does not appear in these calls is that readWave does not normalize the sound. The values that describe the sound will be included between $\pm2^{bit} - 1$:

Code

range(s7@left)

[1] -32768  32767

Exercise

The function Wave can be used to create wave objects.

Run the example code in the function documentation
Plot the oscillogram for the first 0.01 s of ‘Wobj’
Note that the function sine provides a shortcut that can be used to create wave object with a sine wave. Check out other similar functions described in the sine function documentation. Try 4 of these alternative functions and plot the oscillogram of the first 0.01 s for each of them.

The function read_sound_files from warbleR is a wrapper over several sound file reading functions, that can read files in ‘wav’, ‘mp3’, ‘flac’ and ‘wac’ format:

Code

library(warbleR)

# wave
rsf1 <- read_sound_file("Phaethornis-eurynome-15607.wav", path = "./examples")

class(rsf1)

[1] "Wave"
attr(,"package")
[1] "tuneR"

Code

# mp3
rsf2 <- read_sound_file("Phaethornis-striigularis-154074.mp3", path = "./examples")

class(rsf2)

[1] "Wave"
attr(,"package")
[1] "tuneR"

Code

# flac
rsf3 <- read_sound_file("Phae.long1.flac", path = "./examples")

class(rsf3)

[1] "Wave"
attr(,"package")
[1] "tuneR"

Code

# wac
rsf4 <- read_sound_file("recording_20170716_230503.wac", path = "./examples")

class(rsf4)

[1] "Wave"
attr(,"package")
[1] "tuneR"

The function can also read recordings hosted in an online repository:

Code

rsf5 <- read_sound_file(X = "https://xeno-canto.org/35340/download")

class(rsf5)

[1] "Wave"
attr(,"package")
[1] "tuneR"

Code

rsf6 <- read_sound_file(X = "https://github.com/maRce10/OTS_BIR_2023/raw/master/examples/Phae.long1.flac")

class(rsf6)

[1] "Wave"
attr(,"package")
[1] "tuneR"

1.3 Class `sound` (phonTools)

The loadsound() function of phonTools also imports ‘wave’ sound files into R, in this case as objects of class sound:

Code

library(phonTools)

s8 <- loadsound("./examples/Phae.long1.wav")

s8


      Sound Object

   Read from file:         ./examples/Phae.long1.wav
   Sampling frequency:     22500  Hz
   Duration:               2500.044  ms
   Number of Samples:      56251

Code

str(s8)

List of 5
 $ filename  : chr "./examples/Phae.long1.wav"
 $ fs        : int 22500
 $ numSamples: num 56251
 $ duration  : num 2500
 $ sound     : Time-Series [1:56251] from 0 to 2.5: 0.00494 -0.02652 0.02542 0.0191 0.00314 ...
 - attr(*, "class")= chr "sound"

This function only imports files with a dynamic range of 8 or 16 bits.

1.4 Class `audioSample` (audio)

The audio package is another option to handle .wav files. The sound can be imported using the load.wave() function. The class of the resulting object is audioSample which is essentially a numerical vector (for mono) or a numerical matrix with two rows (for stereo). The sampling frequency and resolution are saved as attributes:

Code

library(audio)

s10 <- load.wave("./examples/Phae.long1.wav")

head(s10)

sample rate: 22500Hz, mono, 16-bits
[1]  4.943848e-03 -2.652058e-02  2.542114e-02  1.910400e-02  3.143311e-03
[6] -6.103702e-05

Code

s10$rate

[1] 22500

Code

s10$bits

[1] 16

The main advantage of the audio package is that the sound can be acquired directly within an R session. This is achieved first by preparing a NAs vector and then using therecord() function. For example, to obtain a mono sound of 5 seconds sampled at 16 kHz:

Code

s11 <- rep(NA_real_, 16000 * 5)

record(s11, 16000, 1)

A recording session can be controlled by three complementary functions: pause(), rewind(), and resume().

1.5 Export sounds from R

For maximum compatibility with other sound programs, it may be useful to save a sound as a simple .txt file. The following commands will write a “tico.txt” file:

Code

data(tico)

export(tico, f = 22050)

1.6 Format ‘.wav’

tuneR and audio have a function to write .wav files: writeWave() and save.wave() respectively. Within seewave, the savewav() function, which is based on writeWave(), can be used to save data in .wav format. By default, the object name will be used for the name of the .wav file:

Code

savewav(tico)

1.7 Format ‘.flac’

Free Lossless Audio Codec (FLAC) is a file format for lossless audio data compression. FLAC reduces bandwidth and storage requirements without sacrificing the integrity of the audio source. Audio sources encoded in FLAC are generally reduced in size from 40 to 50 percent. See the flac website for more details (flac.sourceforge.net).

The .flac format cannot be used as such with R. However, the wav2flac()function allows you to call the FLAC software directly from the console. Therefore, FLAC must be installed on your operating system. If you have a .wav file that you want to compress in .flac, call:

Code

wav2flac(file = "./examples/Phae.long1.wav", overwrite = FALSE)

To compress a .wav file to a .flac format, the argument reverse = TRUE must be used:

Code

wav2flac("Phae.long1.flac", reverse = TRUE)

This table, taken from Sueur (2018), summarizes the functions available to import and export sound files in R. The table is incomplete since it does not mention the functions of the phonTools package:

tabla imp exp waves

Exercise

How does the sampling rate affect the size of an audio file? (hint: create 2 sounds files with the same data but different sampling rates; use sine())
How does the dynamic range affect the size of an audio file?
Use the system.time() function to compare the performance of the different functions to import audio files in R. For this use the file “LBH.374.SUR.wav” (Long-billed hermit songs) which lasts about 2 min

The following code creates a plot similar to oscillo but using dots instead of lines:

Code

# generate sine wave
wav <- sine(freq = 440, duration = 500, xunit = "samples", samp.rate = 44100)

# plot
plot(wav@left)

Use the function downsample() to reduce the sampling rate of ‘wav’ (below 44100) and plot the output object. Decrease the sampling rate until you cannot recognize the wave pattern from the original wave object. Try several values so you get a sense at which sampling rate this happens.

1.8 References

Sueur J, Aubin T, Simonis C. 2008. Equipment review: seewave, a free modular tool for sound analysis and synthesis. Bioacoustics 18(2):213–226.
Sueur, J. (2018). Sound Analysis and Synthesis with R.
Sueur J. (2018). I/O of sound with R. seewave package vignette. url: https://cran.r-project.org/web/packages/seewave/vignettes/seewave_IO.pdf

Session information

R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=es_ES.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=es_CR.UTF-8        LC_COLLATE=es_ES.UTF-8    
 [5] LC_MONETARY=es_CR.UTF-8    LC_MESSAGES=es_ES.UTF-8   
 [7] LC_PAPER=es_CR.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=es_CR.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] audio_0.1-10       phonTools_0.2-2.1  warbleR_1.1.28     NatureSounds_1.0.4
[5] tuneR_1.4.4        knitr_1.42         seewave_2.2.0     

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0   xfun_0.39          pbapply_1.7-0      colorspace_2.1-0  
 [5] vctrs_0.6.2        generics_0.1.3     testthat_3.1.8     htmltools_0.5.5   
 [9] viridisLite_0.4.2  yaml_2.3.7         utf8_1.2.3         rlang_1.1.1       
[13] pillar_1.9.0       glue_1.6.2         withr_2.5.0        lifecycle_1.0.3   
[17] stringr_1.5.0      munsell_0.5.0      gtable_0.3.3       moments_0.14.1    
[21] htmlwidgets_1.5.4  evaluate_0.21      fastmap_1.1.1      fftw_1.0-7        
[25] parallel_4.2.2     fansi_1.0.4        Rcpp_1.0.10        scales_1.2.1      
[29] formatR_1.12       jsonlite_1.8.4     brio_1.1.3         gridExtra_2.3     
[33] rjson_0.2.21       ggplot2_3.4.2      digest_0.6.31      stringi_1.7.12    
[37] dplyr_1.1.0        bioacoustics_0.2.8 dtw_1.23-1         grid_4.2.2        
[41] cli_3.6.1          tools_4.2.2        bitops_1.0-7       magrittr_2.0.3    
[45] RCurl_1.98-1.12    proxy_0.4-27       tibble_3.2.1       pkgconfig_2.0.3   
[49] MASS_7.3-58.2      rmarkdown_2.21     rstudioapi_0.14    viridis_0.6.3     
[53] R6_2.5.1           signal_0.7-7       compiler_4.2.2

--- title: Sound toc: true toc-depth: 2 toc-location: left number-sections: true highlight-style: pygments format: html: df-print: kable code-fold: show code-tools: true css: styles.css link-external-icon: true link-external-newwindow: true --- ::: {.alert .alert-info} ## **Objetives** {.unnumbered .unlisted} - Learn the basic aspects of sound as a physical phenomenom - Get familiar with how sound is represented as a digital object in R ::: ```{r, echo = FALSE} library(seewave) library(knitr) opts_chunk$set(tidy = TRUE, warning = FALSE, message = FALSE) ``` Sound waves are characterized by compression and expansion of the medium as sound energy moves through it. There is also back and forth motion of the particles making up the medium: <img src="images/wave_white_bg.gif" alt="wave animation" width="700"/> taken from https://dosits.org The variation in pressure that is perceived at a fixed point in space can be represented by a graph of pressure (amplitude) by time: ![wave and oscillogram](images/amplitude.gif){alt="wave and oscillogram" width="600" height="350"} Sounds waves are typically quantified by their frequency (number of cycles per second, Hz) and amplitude (relative intensity). ```{r, echo = FALSE, out.width="85%", warning=FALSE, message=FALSE} opar <- par() x <- c(0, 0.1, 0.55, 1) y <- c(0, 0.4, 0.8, 1) lwd <-4 col <- viridis::viridis(10)[3] mat <- matrix(c( c(x[1], x[2], y[1], y[4]), # 1 left amptliude panel c(x[1], x[4], y[3], y[4]), # 2 upper freq panel c(x[2], x[3], y[2], y[3]), # 3 left upper oscillo c(x[3], x[4], y[2], y[3]), # 4 right upper oscillo c(x[2], x[3], y[1], y[2]), # 5 left lower oscillo c(x[3], x[4], y[1], y[2]) # 6 left upper oscillo ), ncol = 4, byrow = TRUE ) a <- split.screen(figs = mat) screen(1) par(mar = c(0, 0, 0, 0), bg = "white", new = T) plot(1, frame.plot = FALSE, type = "n", xaxt='n', yaxt='n') text(x = 0.8, y = 0.98, "Frequency", srt = 90, cex = 2) text(x = 1.2, y = 1.2, "High", srt = 90, cex = 1.2) text(x = 1.2, y = 0.8, "Low", srt = 90, cex = 1.2) screen(2) par(mar = c(0, 0, 0, 0), bg = "white", new = T) plot(1, frame.plot = FALSE, type = "n", xaxt='n', yaxt='n') text(x = 1.04, y = 1.1, "Amplitude", cex = 2) text(x = 0.8, y = 0.8, "Low", cex = 1.2) text(x = 1.2, y = 0.8, "High", cex = 1.2) # generate sine wave high.freq <- tuneR::sine(freq = 440, duration = 500, xunit = "samples", samp.rate = 44100) low.freq <- tuneR::sine(freq = 220, duration = 500, xunit = "samples", samp.rate = 44100) screen(3) par(mar = rep(0, 4)) # plot plot(high.freq@left / 2, type = "l", xaxt='n', yaxt='n', ylim = c(-1, 1), lwd = lwd, col = col) abline(h = 0, lty = 2) screen(4) par(mar = rep(0, 4)) plot(high.freq@left, type = "l", xaxt='n', yaxt='n', ylim = c(-1, 1), lwd = lwd, col = col) abline(h = 0, lty = 2) # screen(5) par(mar = rep(0, 4)) plot(low.freq@left / 2, type = "l", xaxt='n', yaxt='n', ylim = c(-1, 1), lwd = lwd, col = col) abline(h = 0, lty = 2) screen(6) par(mar = rep(0, 4)) plot(low.freq@left, type = "l", xaxt='n', ann=FALSE, yaxt='n', ylim = c(-1, 1), lwd = lwd, col = col) abline(h = 0, lty = 2) close.screen(all.screens = TRUE) par(opar) # dev.off() ``` ::: {.alert .alert-success} Digitizing sound ## Sampling frequency Digitizing implies discretizing, which requires some sort of regular sampling. Sampling frequency refers to how many samples of the pressure level of the environment are taken per second. A 440 Hz sine wave recorded at 44100 Hz would have around 100 samples per cycle. This plot shows 2 cycles of a 440 Hz sine wave sampled (vertical dotted lines) at 44100 Hz: ```{r, echo = FALSE, out.width="100%", fig.height= 3, warning=FALSE, message=FALSE} source("~/Dropbox/R_package_testing/warbleR/R/read_sound_file.R") source("~/Dropbox/R_package_testing/warbleR/R/warbleR_options.R") opar <- par() par(mar = c(4, 4, 1, 1), bg = "#daeace") # generate sine wave sn.wv <- tuneR::sine(freq = 440, duration = 200, xunit = "samples", samp.rate = 44100) plot(x = seq(0, duration(sn.wv), length.out = length(sn.wv)), y = sn.wv@left, type = "l", yaxt='n', ylim = c(-1, 1), lwd = lwd, col = col, xlab = "Time (s)", ylab = "Amplitude", xaxs='i') rect(xleft=0, xright= duration(sn.wv)/ 2, ybottom=-2, ytop=2, col = adjustcolor("gray", alpha.f = 0.3), border = NA) abline(h = 0, lty = 2) abline(v = seq(0, duration(sn.wv), length.out = length(sn.wv))[seq(0, length(sn.wv), 10)], lwd = 0.5, lty = 2) points(y = sn.wv@left[seq(0, length(sn.wv), 10)], x = seq(0, duration(sn.wv), length.out = length(sn.wv))[seq(0, length(sn.wv), 10)], col = "orange", pch = 20, cex = 2) ``` ## Nyquist frequency Sampling should be good enough so the regularity of the sine wave can be reconstructed from the sampled data. Low sampling frequencies of a high frequency sine wave might not be able to provide enough information. For instance, the same 440 Hz sine wave sampled at 22050 Hz looks like this: ```{r, echo = FALSE, out.width="100%", fig.height= 3, warning=FALSE, message=FALSE} par(mar = c(4, 4, 1, 1), bg = "#daeace") # generate sine wave sn.wv <- tuneR::sine(freq = 440, duration = 50, xunit = "samples", samp.rate = 11025) plot(x = seq(0, duration(sn.wv), length.out = length(sn.wv)), y = sn.wv@left, type = "l", yaxt='n', ylim = c(-1, 1), lwd = lwd, col = col, xlab = "Time (s)", ylab = "Amplitude", xaxs='i') abline(h = 0, lty = 2) abline(v = seq(0, duration(sn.wv), length.out = length(sn.wv))[seq(0, length(sn.wv), 10)], lwd = 0.5, lty = 2) rect(xleft=0, xright= duration(sn.wv)/ 2, ybottom=-2, ytop=2, col = adjustcolor("gray", alpha.f = 0.3), border = NA) points(y = sn.wv@left[seq(0, length(sn.wv), 10)], x = seq(0, duration(sn.wv), length.out = length(sn.wv))[seq(0, length(sn.wv), 10)], col = "orange", pch = 20, cex = 2) ``` As you can see way less samples are taken per unit of time. The threshold at which samples cannot provide a reliable estimate of the regularity of a sine wave is called **Nyquist frequency** and corresponds to half of the frequency of the sine wave. This is how the 2 cycles of the 440 Hz would look like when sampled at its Nyquist frequency (sampling frequency of 880 Hz): ```{r, echo = FALSE, out.width="100%", fig.height= 3, warning=FALSE, message=FALSE} opar <- par() par(mar = c(4, 4, 1, 1), bg = "#daeace") # generate sine wave sn.wv <- tuneR::sine(freq = 400, duration = 200, xunit = "samples", samp.rate = 40000) sn.wv2 <- tuneR::sine(freq = 400, duration = 4, xunit = "samples", samp.rate = 800) plot(x = seq(0, duration(sn.wv), length.out = length(sn.wv)), y = sn.wv@left, type = "l", yaxt='n', ylim = c(-1, 1), lwd = lwd, col = col, xlab = "Time (s)", ylab = "Amplitude", xaxs='i') rect(xleft=0, xright= duration(sn.wv)/ 2, ybottom=-2, ytop=2, col = adjustcolor("gray", alpha.f = 0.3), border = NA) abline(h = 0, lty = 2) abline(v = seq(0, duration(sn.wv2), length.out = length(sn.wv2)), lwd = 0.5, lty = 2) sq <- seq(0, duration(sn.wv), length.out = length(sn.wv2)) sq <- sq / max(sq) points(y = sn.wv@left[round(sq * length(sn.wv@left))+ 1], x = seq(0, duration(sn.wv2), length.out = length(sn.wv2)), col = "orange", pch = 20, cex = 2) ``` ## Quantization Once we know at which point amplitude samples will be taken we just need to measure it. This process is called **quantization**. The range of amplitude values is discretized in a number of intervals equals to `2 ^ bits`. Hence, it involves some rounding of the actual amplitude values and some data loss. This is the same 440 Hz sine wave recorded at 44100 kHz quantized at 2 bits (2\^2 = 4 intervals): ```{r, echo = FALSE, out.width="100%", fig.height= 3, warning=FALSE, message=FALSE} opar <- par() intervals <- 4 par(mar = c(4, 4, 1, 1), bg = "#daeace") # generate sine wave sn.wv <- tuneR::sine(freq = 440, duration = 200, xunit = "samples", samp.rate = 44100) sn.wv@left <- sn.wv@left * (intervals - 1) / 2 plot(x = seq(0, duration(sn.wv), length.out = length(sn.wv)), y = sn.wv@left, type = "l", yaxt='n', ylim = c(-1 * (intervals ) / 2, (intervals - 1) / 2), lwd = lwd, col = col, xlab = "Time (s)", ylab = "Amplitude", xaxs='i') rect(xleft=0, xright= duration(sn.wv)/ 2, ybottom=-intervals, ytop= intervals, col = adjustcolor("gray", alpha.f = 0.3), border = NA) abline(h = 0, lty = 2) abline(v = seq(0, duration(sn.wv), length.out = length(sn.wv))[seq(0, length(sn.wv), 10)], lwd = 0.5, lty = 2) quant <- as.numeric(cut(sn.wv@left, breaks = intervals)) - (1 + (intervals / 2)) abline(h = unique(quant), lwd = 0.5, lty = 2) points(y = quant[seq(0, length(sn.wv), 10)], x = seq(0, duration(sn.wv), length.out = length(sn.wv))[seq(0, length(sn.wv), 10)], col = "orange", pch = 20, cex = 2) ``` Rounding and data loss is more obvious if we bind the sampled points with lines: ```{r, echo = FALSE, out.width="100%", fig.height= 3, warning=FALSE, message=FALSE} opar <- par() intervals <- 4 par(mar = c(4, 4, 1, 1), bg = "#daeace") # generate sine wave sn.wv <- tuneR::sine(freq = 440, duration = 200, xunit = "samples", samp.rate = 44100) sn.wv@left <- sn.wv@left * (intervals - 1) / 2 quant <- as.numeric(cut(sn.wv@left, breaks = intervals)) - (1 + (intervals / 2)) plot(x = seq(0, duration(sn.wv), length.out = length(sn.wv)), y = sn.wv@left, type = "l", yaxt='n', ylim = c(-1 * (intervals ) / 2, (intervals - 1) / 2), lwd = 2, col = col, xlab = "Time (s)", ylab = "Amplitude", xaxs='i', lty = 2) lines(x = seq(0, duration(sn.wv), length.out = length(sn.wv)), y = quant, type = "l", yaxt='n', ylim = c(-1 * (intervals - 1) / 2, (intervals - 1) / 2), lwd = lwd, col = col, xlab = "Time (s)", ylab = "Amplitude", xaxs='i') rect(xleft=0, xright= duration(sn.wv)/ 2, ybottom=-intervals, ytop= intervals, col = adjustcolor("gray", alpha.f = 0.3), border = NA) abline(h = 0, lty = 2) abline(v = seq(0, duration(sn.wv), length.out = length(sn.wv))[seq(0, length(sn.wv), 10)], lwd = 0.5, lty = 2) abline(h = unique(quant), lwd = 0.5, lty = 2) points(y = quant[seq(0, length(sn.wv), 10)], x = seq(0, duration(sn.wv), length.out = length(sn.wv))[seq(0, length(sn.wv), 10)], col = "orange", pch = 20, cex = 2) ``` This is the same signal quantized at 3 bits (2\^3 = 8 intervals): ```{r, echo = FALSE, out.width="100%", fig.height= 3, warning=FALSE, message=FALSE} opar <- par() intervals <- 8 par(mar = c(4, 4, 1, 1), bg = "#daeace") # generate sine wave sn.wv <- tuneR::sine(freq = 440, duration = 200, xunit = "samples", samp.rate = 44100) sn.wv@left <- sn.wv@left * (intervals - 1) / 2 quant <- as.numeric(cut(sn.wv@left, breaks = intervals)) - (1 + (intervals / 2)) plot(x = seq(0, duration(sn.wv), length.out = length(sn.wv)), y = sn.wv@left, type = "l", yaxt='n', ylim = c(-1 * (intervals ) / 2, (intervals - 1) / 2), lwd = 2, col = col, xlab = "Time (s)", ylab = "Amplitude", xaxs='i', lty = 2) lines(x = seq(0, duration(sn.wv), length.out = length(sn.wv)), y = quant, type = "l", yaxt='n', ylim = c(-1 * (intervals - 1) / 2, (intervals - 1) / 2), lwd = lwd, col = col, xlab = "Time (s)", ylab = "Amplitude", xaxs='i') rect(xleft=0, xright= duration(sn.wv)/ 2, ybottom=-intervals, ytop= intervals, col = adjustcolor("gray", alpha.f = 0.3), border = NA) abline(h = 0, lty = 2) abline(v = seq(0, duration(sn.wv), length.out = length(sn.wv))[seq(0, length(sn.wv), 10)], lwd = 0.5, lty = 2) abline(h = unique(quant), lwd = 0.5, lty = 2) points(y = quant[seq(0, length(sn.wv), 10)], x = seq(0, duration(sn.wv), length.out = length(sn.wv))[seq(0, length(sn.wv), 10)], col = "orange", pch = 20, cex = 2) ``` 4 bits (2\^4 = 16 intervals): ```{r, echo = FALSE, out.width="100%", fig.height= 3, warning=FALSE, message=FALSE} opar <- par() intervals <- 2 ^ 4 par(mar = c(4, 4, 1, 1), bg = "#daeace") # generate sine wave sn.wv <- tuneR::sine(freq = 440, duration = 200, xunit = "samples", samp.rate = 44100) sn.wv@left <- sn.wv@left * (intervals - 1) / 2 quant <- as.numeric(cut(sn.wv@left, breaks = intervals)) - (1 + (intervals / 2)) plot(x = seq(0, duration(sn.wv), length.out = length(sn.wv)), y = sn.wv@left, type = "l", yaxt='n', ylim = c(-1 * (intervals ) / 2, (intervals - 1) / 2), lwd = 2, col = col, xlab = "Time (s)", ylab = "Amplitude", xaxs='i', lty = 2) lines(x = seq(0, duration(sn.wv), length.out = length(sn.wv)), y = quant, type = "l", yaxt='n', ylim = c(-1 * (intervals - 1) / 2, (intervals - 1) / 2), lwd = lwd, col = col, xlab = "Time (s)", ylab = "Amplitude", xaxs='i') rect(xleft=0, xright= duration(sn.wv)/ 2, ybottom=-intervals, ytop= intervals, col = adjustcolor("gray", alpha.f = 0.3), border = NA) abline(h = 0, lty = 2) abline(v = seq(0, duration(sn.wv), length.out = length(sn.wv))[seq(0, length(sn.wv), 10)], lwd = 0.5, lty = 2) abline(h = unique(quant), lwd = 0.5, lty = 2) points(y = quant[seq(0, length(sn.wv), 10)], x = seq(0, duration(sn.wv), length.out = length(sn.wv))[seq(0, length(sn.wv), 10)], col = "orange", pch = 20, cex = 2) ``` .. and 8 bits (2\^8 = 256 intervals): ```{r, echo = FALSE, out.width="100%", fig.height= 3, warning=FALSE, message=FALSE} opar <- par() intervals <- 2 ^ 8 par(mar = c(4, 4, 1, 1), bg = "#daeace") # generate sine wave sn.wv <- tuneR::sine(freq = 440, duration = 200, xunit = "samples", samp.rate = 44100) sn.wv@left <- sn.wv@left * (intervals - 1) / 2 quant <- as.numeric(cut(sn.wv@left, breaks = intervals)) - (1 + (intervals / 2)) plot(x = seq(0, duration(sn.wv), length.out = length(sn.wv)), y = sn.wv@left, type = "l", yaxt='n', ylim = c(-1 * (intervals ) / 2, (intervals - 1) / 2), lwd = 2, col = col, xlab = "Time (s)", ylab = "Amplitude", xaxs='i', lty = 2) lines(x = seq(0, duration(sn.wv), length.out = length(sn.wv)), y = quant, type = "l", yaxt='n', ylim = c(-1 * (intervals - 1) / 2, (intervals - 1) / 2), lwd = lwd, col = col, xlab = "Time (s)", ylab = "Amplitude", xaxs='i') rect(xleft=0, xright= duration(sn.wv)/ 2, ybottom=-intervals, ytop= intervals, col = adjustcolor("gray", alpha.f = 0.3), border = NA) abline(h = 0, lty = 2) abline(v = seq(0, duration(sn.wv), length.out = length(sn.wv))[seq(0, length(sn.wv), 10)], lwd = 0.5, lty = 2) abline(h = unique(quant), lwd = 0.5, lty = 2) points(y = quant[seq(0, length(sn.wv), 10)], x = seq(0, duration(sn.wv), length.out = length(sn.wv))[seq(0, length(sn.wv), 10)], col = "orange", pch = 20, cex = 2) ``` At this point quantization involves very little information loss. 16 bits is probably the most common dynamic range used nowadays. As you can imagine, the high number of intervals (2\^16 = 65536) allows for great precision in the quantization of amplitude. ::: # Sound in R Sound waves can be represented by 3 kinds of R objects: - Common classes (numerical vector, numerical matrix) - Time series classes (ts, mts) - Specific sound classes (Wave, sound and audioSample) ## Non-specific classes ### Vectors Any numerical vector can be treated as a sound if a sampling frequency is provided. For example, a 440 Hz sinusoidal sound sampled at 8000 Hz for one second can be generated like this: ```{r, eval = TRUE} library(seewave) # create sinewave at 440 Hz s1 <- sin(2 * pi * 440 * seq(0, 1, length.out = 8000)) is.vector(s1) mode(s1) ``` These sequences of values only make sense when specifying the sampling rate at which they were created: ```{r, eval = TRUE} oscillo(s1, f = 8000, from = 0, to = 0.01) ``` ### Matrices You can read any single column matrix: ```{r, eval = TRUE} s2<-as.matrix(s1) is.matrix(s2) dim(s2) oscillo(s2, f = 8000, from = 0, to = 0.01) ``` If the matrix has more than one column, only the first column will be considered: ```{r, eval = TRUE} x<-rnorm(8000) s3<-cbind(s2,x) is.matrix(s3) dim(s3) oscillo(s3, f = 8000, from = 0, to = 0.01) ``` ### Time series The class `ts` and related functions `ts()`, `as.ts()`, `is.ts()` can also be used to generate sound objects in R. Here the command to similarly generate a series of time is shown corresponding to a 440 Hz sinusoidal sound sampled at 8000 Hz for one second: ```{r, eval = TRUE} s4 <- ts(data = s1, start = 0, frequency = 8000) str(s4) ``` To generate a random noise of 0.5 seconds: ```{r, eval = TRUE} s4 <- ts(data = runif(4000, min = -1, max = 1), start = 0, end = 0.5, frequency = 8000) str(s4) ``` The `frequency()` and `deltat()` functions return the sampling frequency ($f$) and the time resolution ($Delta t$) respectively: ```{r, eval = TRUE} frequency(s4) deltat(s4) ``` As the frequency is incorporated into the `ts` objects, it is not necessary to specify it when used within functions dedicated to audio: ```{r, eval = TRUE} oscillo(s4, from = 0, to = 0.01) ``` In the case of multiple time series, **seewave** functions will consider only the first series: ```{r, eval = TRUE} s5 <- ts(data = s3, f = 8000) class(s5) oscillo(s5, from = 0, to = 0.01) ``` ## Dedicated R classes for sound There are 3 kinds of objects corresponding to the `wav` binary format or the`mp3` compressed format: - `Wave` class of the package **tuneR** - `sound` class of the package **phonTools** - `AudioSample` class of the package **audio** ### `Wave` class (**tuneR**) The `Wave` class comes with the **tuneR** package. This S4 class includes different "slots" with the amplitude data (left or right channel), the sampling frequency (or frequency), the number of bits (8/16/24/32) and the type of sound (mono/stereo). High sampling rates (\> 44100 Hz) can be read on these types of objects. The function to import `.wav` files from the hard drive is `readWave`: ```{r, eval = TRUE} # load packages library(tuneR) s6 <- readWave("./examples/Phae.long1.wav") ``` We can verify the class of the object like this: ```{r, echo=TRUE} # object class class(s6) ``` S4 objects have a structure similar to lists but use '\@' to access each position (slot): ```{r} # structure str(s6) # extract 1 position s6@samp.rate ``` ::: {.alert .alert-warning} "Pulse-code modulation (PCM) is a method used to digitally represent sampled analog signals. It is the standard form of digital audio. In a PCM stream, the amplitude of the analog signal is sampled regularly at uniform intervals, and each sample is quantized to the nearest value within a range of digital steps" ([Wikipedia](https://en.wikipedia.org/wiki/Pulse-code_modulation)). ::: The samples come in the slot '@left': ```{r} # samples s6@left[1:40] ``` The number of samples is given by the duration and the sampling rate. ::: {.alert .alert-info} Exercise - How can we calculate the duration of the `wave` object using the information in the object? - Extract the first second of audio from the object `s6` using indexing (and squared brackets) ::: An advantage of using `readWave()` is the ability to read specific segments of sound files, especially useful with long files. This is done using the `from` and`to` arguments and specifying the units of time with the `units` arguments. The units can be converted into "samples", "minutes" or "hours". For example, to read only the section that begins in 1s and ends in 2s of the file "Phae.long1.wav": ```{r, eval = TRUE} s7 <- readWave("./examples/Phae.long1.wav", from = 1, to = 2, units = "seconds") s7 ``` The `.mp3` files can be imported to R although they are imported in`Wave` format. This is done using the `readMP3()` function: ```{r, eval = TRUE} s7 <- readMP3("./examples/Phae.long1.mp3") s7 ``` To obtain information about the object (sampling frequency, number of bits, mono/stereo), it is necessary to use the indexing of S4 class objects: ```{r, eval = TRUE} s7@samp.rate s7@bit s7@stereo ``` A property that does not appear in these calls is that `readWave` does not normalize the sound. The values that describe the sound will be included between $\pm2^{bit} - 1$: ```{r, eval = TRUE} range(s7@left) ``` ::: {.alert .alert-info} Exercise The function `Wave` can be used to create wave objects. - Run the example code in the function documentation - Plot the oscillogram for the first 0.01 s of 'Wobj' - Note that the function `sine` provides a shortcut that can be used to create wave object with a sine wave. Check out other similar functions described in the `sine` function documentation. Try 4 of these alternative functions and plot the oscillogram of the first 0.01 s for each of them. ::: The function `read_sound_files` from warbleR is a wrapper over several sound file reading functions, that can read files in 'wav', 'mp3', 'flac' and 'wac' format: ```{r, eval = TRUE} library(warbleR) # wave rsf1 <- read_sound_file("Phaethornis-eurynome-15607.wav", path = "./examples") class(rsf1) # mp3 rsf2 <- read_sound_file("Phaethornis-striigularis-154074.mp3", path = "./examples") class(rsf2) # flac rsf3 <- read_sound_file("Phae.long1.flac", path = "./examples") class(rsf3) # wac rsf4 <- read_sound_file("recording_20170716_230503.wac", path = "./examples") class(rsf4) ``` The function can also read recordings hosted in an online repository: ```{r} rsf5 <- read_sound_file(X = "https://xeno-canto.org/35340/download") class(rsf5) rsf6 <- read_sound_file(X = "https://github.com/maRce10/OTS_BIR_2023/raw/master/examples/Phae.long1.flac") class(rsf6) ``` ## Class `sound` (**phonTools**) The `loadsound()` function of *phonTools* also imports 'wave' sound files into R, in this case as objects of class `sound`: ```{r, eval = TRUE} library(phonTools) s8 <- loadsound("./examples/Phae.long1.wav") s8 str(s8) ``` This function only imports files with a dynamic range of 8 or 16 bits. ## Class `audioSample` (**audio**) The **audio** package is another option to handle `.wav` files. The sound can be imported using the `load.wave()` function. The class of the resulting object is `audioSample` which is essentially a numerical vector (for mono) or a numerical matrix with two rows (for stereo). The sampling frequency and resolution are saved as attributes: ```{r, eval = TRUE} library(audio) s10 <- load.wave("./examples/Phae.long1.wav") head(s10) s10$rate s10$bits ``` The main advantage of the **audio** package is that the sound can be acquired directly within an R session. This is achieved first by preparing a `NAs` vector and then using the`record()` function. For example, to obtain a mono sound of 5 seconds sampled at 16 kHz: ```{r, eval = FALSE} s11 <- rep(NA_real_, 16000*5) record(s11, 16000, 1) ``` A recording session can be controlled by three complementary functions: `pause()`, `rewind()`, and `resume()`. ## Export sounds from R For maximum compatibility with other sound programs, it may be useful to save a sound as a simple `.txt` file. The following commands will write a "tico.txt" file: ```{r, eval = TRUE} data(tico) export(tico, f=22050) ``` ## Format '.wav' **tuneR** and **audio** have a function to write `.wav` files: `writeWave()` and `save.wave()` respectively. Within **seewave**, the `savewav()` function, which is based on `writeWave()`, can be used to save data in `.wav` format. By default, the object name will be used for the name of the `.wav` file: ```{r, eval = FALSE} savewav(tico) ``` ## Format '.flac' Free Lossless Audio Codec (FLAC) is a file format for lossless audio data compression. FLAC reduces bandwidth and storage requirements without sacrificing the integrity of the audio source. Audio sources encoded in FLAC are generally reduced in size from 40 to 50 percent. See the flac website for more details ([flac.sourceforge.net](http://flac.sourceforge.net)). The `.flac` format cannot be used as such with R. However, the `wav2flac()`function allows you to call the FLAC software directly from the console. Therefore, FLAC must be installed on your operating system. If you have a `.wav` file that you want to compress in `.flac`, call: ```{r, eval = FALSE} wav2flac(file = "./examples/Phae.long1.wav", overwrite = FALSE) ``` To compress a `.wav` file to a `.flac` format, the argument `reverse = TRUE` must be used: ```{r, eval = FALSE} wav2flac("Phae.long1.flac", reverse = TRUE) ``` This table, taken from Sueur (2018), summarizes the functions available to import and export sound files in R. The table is incomplete since it does not mention the functions of the `phonTools` package: <img src="images/tabla-waves.png" alt="tabla imp exp waves" width="700"/> ------------------------------------------------------------------------ ::: {.alert .alert-info} Exercise - How does the sampling rate affect the size of an audio file? (hint: create 2 sounds files with the same data but different sampling rates; use `sine()`) - How does the dynamic range affect the size of an audio file? - Use the `system.time()` function to compare the performance of the different functions to import audio files in R. For this use the file "LBH.374.SUR.wav" (Long-billed hermit songs) which lasts about 2 min The following code creates a plot similar to `oscillo` but using dots instead of lines: ```{r, eval = FALSE} # generate sine wave wav <- sine(freq = 440, duration = 500, xunit = "samples", samp.rate = 44100) # plot plot(wav@left) ``` ```{r, echo = FALSE} # generate sine wave wav <- sine(freq = 440, duration = 500, xunit = "samples", samp.rate = 44100) par(bg = "#ccebf5", mar = c(5, 4, 1, 0) + 0.1) # plot plot(wav@left) ``` - Use the function `downsample()` to reduce the sampling rate of 'wav' (below 44100) and plot the output object. Decrease the sampling rate until you cannot recognize the wave pattern from the original wave object. Try several values so you get a sense at which sampling rate this happens. ```{r, eval = FALSE, echo = FALSE} down_wav <- downsample(object = wav, samp.rate = 2756) # plot plot(down_wav@left) ``` ::: ------------------------------------------------------------------------ ## References 1. Sueur J, Aubin T, Simonis C. 2008. Equipment review: seewave, a free modular tool for sound analysis and synthesis. Bioacoustics 18(2):213--226. 2. Sueur, J. (2018). Sound Analysis and Synthesis with R. 3. Sueur J. (2018). I/O of sound with R. seewave package vignette. url: https://cran.r-project.org/web/packages/seewave/vignettes/seewave_IO.pdf ------------------------------------------------------------------------ Session information ```{r session info, echo=F} sessionInfo() ```

Objetives

0.1 Sampling frequency

0.2 Nyquist frequency

0.3 Quantization

1 Sound in R

1.1 Non-specific classes

1.1.1 Vectors

1.1.2 Matrices

1.1.3 Time series

1.2 Dedicated R classes for sound

1.2.1 Wave class (tuneR)

1.3 Class sound (phonTools)

1.4 Class audioSample (audio)

1.5 Export sounds from R

1.6 Format ‘.wav’

1.7 Format ‘.flac’

1.8 References

1.2.1 `Wave` class (tuneR)

1.3 Class `sound` (phonTools)

1.4 Class `audioSample` (audio)