Data Exploration

In this notebook, we are going to take a closer look at the data. Let us begin by loading everything in.

In [1]:
import librosa
import pandas as pd
import numpy as np
from IPython.display import Audio
from matplotlib import pyplot as plt
import multiprocessing
from scipy import signal

anno = pd.read_pickle('data/annotations.dataframe.pkl.gz')
In [2]:
anno.head()
Out[2]:
channel filename call_duration offset_in_frames duration_in_frames call
0 2 190806140351.wav 0.823104 110469 19754 [9.1552734e-05, 9.1552734e-05, 9.1552734e-05, ...
1 2 190806140351.wav 1.169673 405746 28072 [-9.1552734e-05, -6.1035156e-05, -9.1552734e-0...
2 2 190806140351.wav 1.083031 588216 25992 [0.0016784668, 0.0016479492, 0.0016479492, 0.0...
3 2 190806140351.wav 1.169673 650599 28072 [-0.00024414062, -0.00021362305, -0.0002136230...
4 2 190806140351.wav 1.386280 692186 33270 [0.00061035156, 0.00061035156, 0.00064086914, ...

The annotations dataframe contains the extracted calls in the call column. This dataset does not include any other annotations. All of the calls were recorded at a sample rate of 24 kHz.

In [3]:
SAMPLE_RATE = 24000

There are a total of 17882 calls in this dataset.

In [4]:
anno.shape
Out[4]:
(17882, 6)

The calls are of varying types, and we do not have additional labels for them.

Here is what the distribution of call durations looks like:

In [5]:
call_durations = anno.call.apply(lambda x: x.shape[0] / SAMPLE_RATE)
plt.title('Call durations in seconds')
plt.xlabel('seconds')
plt.ylabel('count')
plt.hist(call_durations);

Out of the 17882 calls, 16432 (92%) are under two seconds long.

In [6]:
sum(call_durations < 2)
Out[6]:
16432
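
The corresponding fraction can be computed directly from the boolean mask:

In [ ]:
# fraction of calls under two seconds: 16432 / 17882 ≈ 0.92
(call_durations < 2).mean()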

Let us look at the distribution of these shorter calls more closely.

In [7]:
plt.title('Call durations in seconds')
plt.xlabel('seconds')
plt.ylabel('count')
plt.hist(call_durations[call_durations < 2]);

The vocalizations are extremely varied. Below is a non-exhaustive selection that gives a sense of the richness of this dataset.

The labels are not annotations, but my own qualitative descriptions of the calls.

[audio players: growl-like, bark-like, yawn-like, whistle-like, squeak-like, trumpet-like, elephant-like, parrot-like]
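
Any of these calls can be auditioned straight from the dataframe. A minimal sketch (the row index is an arbitrary choice):

In [ ]:
# play back a single extracted call; index 0 is arbitrary
Audio(anno.call.iloc[0], rate=SAMPLE_RATE)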

There are calls that don't fit clearly into any of the above categories or that are some combination of them.

Here are two examples of unusual calls.

Another challenging aspect of this dataset is that some calls are quite subtle. Below are two such examples.

To get a feel for what a conversation might sound like, below is a 20-second example. A couple of growl-like and squeak-like vocalizations can be heard in succession.
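
A sketch of how such an excerpt could be pulled from one of the source recordings; the path, offset, and channel indexing below are assumptions for illustration:

In [ ]:
# hypothetical path and offset; with mono=False librosa returns
# an array of shape (channels, samples) for multichannel files
y, sr = librosa.load('data/190806140351.wav', sr=SAMPLE_RATE,
                     mono=False, offset=30.0, duration=20.0)
channel = y[1] if y.ndim > 1 else y  # channel 2, assuming 1-indexed channels
Audio(channel, rate=sr)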

Potential issues in working with the dataset

Some calls are recorded against a mechanical background

Here is an example of such a call.

And here are 10 seconds from a recording where this issue exists.

Only a small portion of the dataset is affected but, depending on the downstream tasks, it might be necessary to single out and exclude these calls.
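
If the affected rows were identified, for instance by listening, excluding them is straightforward; the indices below are placeholders:

In [ ]:
# placeholder indices of calls recorded against mechanical background noise
noisy_idx = [0, 1]
anno_clean = anno.drop(index=noisy_idx)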

Some calls might be hard to visualize

Some calls can be clearly heard, but are not easy to visualize on a spectrogram due to being relatively faint.

In [352]:
freqs, times, Sx = signal.spectrogram(anno.call.iloc[1101], fs=SAMPLE_RATE)
f, ax = plt.subplots()
ax.pcolormesh(times, freqs / 1000, 10 * np.log10(Sx), cmap='viridis')
ax.set_ylabel('Frequency [kHz]')
ax.set_xlabel('Time [s]');

We can attempt to bring the vocalization to the foreground by adding a small value to the spectrogram before log-scaling the values.

In [354]:
freqs, times, Sx = signal.spectrogram(anno.call.iloc[1101], fs=SAMPLE_RATE)
f, ax = plt.subplots()
ax.pcolormesh(times, freqs / 1000, 10 * np.log10(Sx+1e-11), cmap='viridis')
ax.set_ylabel('Frequency [kHz]')
ax.set_xlabel('Time [s]');

Since most of the information is in the lower frequency range, log scaling the frequency axis is also worth considering.

In [478]:
freqs, times, Sx = signal.spectrogram(anno.call.iloc[1101], fs=SAMPLE_RATE)
f, ax = plt.subplots()
ax.pcolormesh(times, freqs / 1000, 10 * np.log10(Sx + 1e-11), cmap='viridis')
ax.set_yscale('symlog')
ax.set_ylabel('Frequency [kHz]')
ax.set_xlabel('Time [s]');

Another option might be to reach for the linearly reassigned spectrogram.

In [ ]:
from spectral_hyperresolution.linear_reassignment_pytorch import high_resolution_spectrogram
# this can be installed by running !pip install git+https://github.com/earthspecies/spectral_hyperresolution.git
In [468]:
%%time
q = 1
tdeci = 100
over = 20
noct = 24
minf = 4e-3 # minimum frequency as a fraction of the sample rate; 4e-3 * 24000 = 96 Hz
maxf = 1    # maximum frequency as a fraction of the sample rate

lin_spectrogram = high_resolution_spectrogram(anno.call.iloc[1101].reshape((-1, 1)), q, tdeci, over, noct, minf, maxf, 'cpu')
CPU times: user 38 s, sys: 984 ms, total: 39 s
Wall time: 19.7 s
In [469]:
lin_spectrogram = lin_spectrogram.detach().cpu().numpy().T
In [477]:
freqs_lin = np.linspace(0, 1, num=lin_spectrogram.shape[0]) # dummy values
times_lin = np.linspace(0, anno.call.iloc[1101].shape[0] / SAMPLE_RATE, num=lin_spectrogram.shape[1])

f, ax = plt.subplots()
ax.pcolormesh(times_lin, freqs_lin, 10 * np.log10(lin_spectrogram + 1e-6)[::-1, :], cmap='viridis')
ax.set_ylabel('Frequency [kHz]')
ax.set_yticklabels([])
ax.set_xlabel('Time [s]');

We see more structure, but without further experimentation it is unclear whether this exposes more of the signal or simply visualizes the noise better.

In [485]:
fig, axes = plt.subplots(1, 2, figsize=(10,4))

axes[0].pcolormesh(times, freqs / 1000, 10 * np.log10(Sx + 1e-11), cmap='viridis')
axes[0].set_yscale('symlog')
axes[0].set_ylabel('Frequency [kHz]')
axes[0].set_xlabel('Time [s]')
axes[0].set_title('log spectrogram')

axes[1].pcolormesh(times_lin, freqs_lin, 10 * np.log10(lin_spectrogram + 1e-6)[::-1, :], cmap='viridis')
axes[1].set_ylabel('Frequency [kHz]')
axes[1].set_yticklabels([])
axes[1].set_xlabel('Time [s]');
axes[1].set_title('hyperresolution spectrogram');

CPP suitability analysis

This dataset could prove challenging for cleanly separating individual calls. The contributing factors are:

  • high-intensity background noise in some portion of the calls
  • some calls being extremely faint and short in duration

A factor that could prove advantageous and help ameliorate the issues mentioned above is the size of this dataset: over 17,000 calls opens the route to further pre-processing, a first pass of which is sketched below.
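
A minimal sketch of such a pass, keeping only calls above a duration cutoff and above a low-energy floor; both thresholds are placeholder values, not tuned:

In [ ]:
# placeholder thresholds; in practice these would be tuned by inspection
MIN_DURATION_S = 0.2
rms = anno.call.apply(lambda x: np.sqrt(np.mean(x ** 2)))  # per-call RMS energy
mask = (call_durations > MIN_DURATION_S) & (rms > rms.quantile(0.1))
anno_subset = anno[mask]
anno_subset.shape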