In this notebook, we are going to take a closer look at the data. Let us begin by loading everything in.
import librosa
import numpy as np
import pandas as pd
from IPython.display import Audio
from matplotlib import pyplot as plt
from scipy import signal
anno = pd.read_pickle('data/signature_whistles.pkl')
anno.head()
Whistles are stored in the `audio` column. There are a total of 400 calls, 20 from each of the 20 individuals.
anno.groupby('identity')['audio'].count()
Nearly all of the calls have been recorded with a sampling rate of 96 kHz.
(anno.sample_rate == 96000).sum()
A few of the calls have been recorded with a sampling rate of 88200 Hz.
(anno.sample_rate == 88200).sum()
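Since two sampling rates coexist in the dataset, we may eventually want to bring every call to a common rate before comparing them. A minimal sketch of resampling 88.2 kHz audio to 96 kHz with `scipy.signal.resample_poly` (the ratio 96000/88200 reduces to 160/147); the sine wave here is just a stand-in for a real whistle:

```python
import numpy as np
from scipy import signal

sr_in, sr_out = 88200, 96000

# Placeholder one-second "call": an 8 kHz tone at the original rate.
t = np.linspace(0, 1, sr_in, endpoint=False)
call = np.sin(2 * np.pi * 8000 * t)

# 96000 / 88200 simplifies to 160 / 147, so upsample by 160 and
# downsample by 147 with a polyphase filter.
resampled = signal.resample_poly(call, 160, 147)
print(len(call), len(resampled))  # 88200 -> 96000 samples
```

In the notebook this would be applied only to the rows where `anno.sample_rate == 88200`.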
The durations of the recordings vary from 1.47 seconds to 3.02 seconds.
call_durations = anno.audio.apply(lambda x: x.shape[0]) / anno.sample_rate
min(call_durations), max(call_durations)
This is how the durations break down.
plt.title('Call durations in seconds')
plt.xlabel('seconds')
plt.ylabel('count')
plt.hist(call_durations);
Let's take a look at a single call from each of the individuals.
fig, subplots = plt.subplots(5, 4, figsize=(20, 30))
for (idx, row), ax in zip(anno.groupby('identity').sample(n=1).iterrows(), subplots.flat):
    freqs, times, Sx = signal.spectrogram(row.audio, fs=row.sample_rate)
    ax.pcolormesh(times, freqs / 1000, 10 * np.log10(Sx + 1e-10), cmap='viridis', shading='auto')
    ax.set_title(row.identity)
    ax.set_xlabel('Time [s]')
    ax.set_ylabel('Frequency [kHz]')
import soundfile as sf

def save_call(identity, audio, sample_rate):
    # Peak-normalize the call and write it straight into the assets folder.
    audio = librosa.util.normalize(audio)
    sf.write(f'assets/{identity}.wav', audio, sample_rate)

for idx, row in anno.groupby('identity').sample(n=1).iterrows():
    save_call(row.identity, row.audio, row.sample_rate)
from IPython.display import display, HTML
for identity in anno.sort_values(by='identity').identity.unique():
    display(HTML(f'''
        {identity}
        <audio style="display: block"
               controls
               src="assets/{identity}.wav">
            Your browser does not support the
            <code>audio</code> element.
        </audio>
    '''))
Thanks to the high quality of the recordings, this dataset lends itself well to creating synthetic mixtures for a CPP study. Unfortunately, the dataset contains no naturally overlapping calls from several individuals, which limits us to the synthetic data creation scenario.
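One way such synthetic mixtures could be built is by summing two calls at a random offset. A minimal sketch, using random noise as placeholders for two whistles at the same sampling rate (the offset logic is the part that carries over to real calls):

```python
import numpy as np

rng = np.random.default_rng(0)
sr = 96000

# Placeholder "calls": 2 s and 1 s of noise standing in for two whistles.
call_a = rng.standard_normal(2 * sr)
call_b = rng.standard_normal(1 * sr)

# Overlay the shorter call onto the longer one at a random position.
offset = rng.integers(0, len(call_a) - len(call_b))
mixture = call_a.copy()
mixture[offset:offset + len(call_b)] += call_b
print(mixture.shape)
```

For a real study one would likely also scale the two calls to a target signal-to-signal ratio before mixing.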