Data Exploration

In this notebook, we are going to take a closer look at the data. Let us begin by loading everything in.

import librosa
import numpy as np
import pandas as pd
from IPython.display import Audio
from matplotlib import pyplot as plt
from scipy import signal

anno = pd.read_pickle('data/signature_whistles.pkl')
anno.head()

Whistles are stored in the audio column. There are 400 calls in total, 20 from each of the 20 individuals.

anno.groupby('identity')['audio'].count()

identity
FB02     20
FB05     20
FB07     20
FB09     20
FB10     20
FB101    20
FB11     20
FB122    20
FB131    20
FB15     20
FB163    20
FB182    20
FB25     20
FB33     20
FB35     20
FB55     20
FB67     20
FB71     20
FB79     20
FB92     20
Name: audio, dtype: int64

Most of the calls (360 of 400) were recorded at a sampling rate of 96 kHz.

(anno.sample_rate == 96000).sum()

360

The remaining 40 calls were recorded at a sampling rate of 88.2 kHz (88200 Hz).

(anno.sample_rate == 88200).sum()

40
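Because the dataset mixes two sampling rates, any step that combines or compares raw waveforms needs the calls at one common rate first. The following is a minimal sketch of one way to do that with scipy.signal.resample_poly. The helper name to_common_rate and the choice of 96 kHz as the target are assumptions made for illustration and are not part of the original notebook.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

TARGET_RATE = 96000  # assumption: use the majority rate as the common rate


def to_common_rate(audio, sample_rate, target_rate=TARGET_RATE):
    """Return `audio` resampled to `target_rate` (hypothetical helper)."""
    sample_rate = int(sample_rate)
    if sample_rate == target_rate:
        return audio
    # Reduce the rate ratio to the smallest integer up/down factors,
    # e.g. 88200 -> 96000 becomes up=160, down=147.
    g = gcd(target_rate, sample_rate)
    return resample_poly(audio, target_rate // g, sample_rate // g)


# One second of silence at 88.2 kHz becomes one second at 96 kHz.
one_second = np.zeros(88200, dtype=np.float32)
resampled = to_common_rate(one_second, 88200)
```

With the annotation table loaded, this could be applied row by row, for example with anno.apply(lambda r: to_common_rate(r.audio, r.sample_rate), axis=1).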

anno.head()

The durations of the recordings range from 1.48 seconds to 3.03 seconds.

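call_durations = anno.audio.apply(lambda x: x.shape[0]) / anno.sample_rate  # samples / rate = seconds
min(call_durations), max(call_durations)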

(1.4799375, 3.0279166666666666)

This is how the durations break down.

call_durations = anno.audio.apply(lambda x: x.shape[0]) / anno.sample_rate
plt.title('Call durations in seconds')
plt.xlabel('seconds')
plt.ylabel('count')
plt.hist(call_durations);

Let's take a look at a single call from each of the individuals.

anno.head()

fig, subplots = plt.subplots(5,4, figsize=(20,30))

for (idx, row), ax in zip(anno.groupby('identity').sample(n=1).iterrows(), subplots.flat):
    freqs, times, Sx = signal.spectrogram(row.audio, fs=row.sample_rate)
    ax.pcolormesh(times, freqs / 1000, 10 * np.log10(Sx+1e-10), cmap='viridis', shading='auto')
    ax.set_ylabel('Frequency [kHz]')
    ax.set_xlabel('Time [s]');
    ax.set_title(row.identity)
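The call to signal.spectrogram above relies on SciPy's default segment length (nperseg=256), which at 96 kHz gives frequency bins 375 Hz wide. The sketch below uses a synthetic chirp rather than a call from the dataset to show how a longer window trades time resolution for finer frequency resolution. The chirp parameters and the values nperseg=1024 and noverlap=768 are illustrative assumptions, not settings taken from the notebook.

```python
import numpy as np
from scipy import signal

fs = 96000                      # sampling rate in Hz
t = np.arange(fs) / fs          # one second of sample times
# Synthetic test signal: linear sweep from 5 kHz to 20 kHz.
x = signal.chirp(t, f0=5000, t1=1.0, f1=20000)

# Longer segments give narrower frequency bins (fs / nperseg).
freqs, times, Sx = signal.spectrogram(x, fs=fs, nperseg=1024, noverlap=768)
bin_width = freqs[1] - freqs[0]

# Same dB conversion as in the plotting loop above.
Sx_db = 10 * np.log10(Sx + 1e-10)

# Frequency with the most energy in the middle time frame.
mid_frame = Sx[:, len(times) // 2]
peak_freq = freqs[np.argmax(mid_frame)]
```

Passing these keyword arguments to the plotting loop would sharpen the whistle contours along the frequency axis, at the cost of some smearing in time.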

import os
import soundfile as sf

def save_call(audio, identity, sample_rate, out_dir='assets'):
    # Peak-normalize the call and write it to <out_dir>/<identity>.wav.
    os.makedirs(out_dir, exist_ok=True)
    audio = librosa.util.normalize(audio)
    sf.write(os.path.join(out_dir, f'{identity}.wav'), audio, sample_rate)

# Save one randomly sampled call per individual for the audio players below.
for idx, row in anno.groupby('identity').sample(n=1).iterrows():
    save_call(row.audio, row.identity, row.sample_rate)

from IPython.display import display, HTML

# anno = anno.sample(frac=1)
# idx_and_vocal_type = [(idx, row.vocal_type) for (idx, row) in anno.groupby('vocal_type').sample(n=1).iterrows()]

for identity in anno.sort_values(by='identity').identity.unique():
    display(HTML(f'''
        {identity}
        <audio style="display: block"
        controls
        src="assets/{identity}.wav">
            Your browser does not support the
            <code>audio</code> element.
        </audio>
        ''')
    )

CPP suitability analysis

Because the recordings are of high quality, this dataset lends itself well to creating synthetic mixtures for a cocktail party problem (CPP) study.

Unfortunately, the dataset contains no naturally overlapping calls from several individuals, which limits us to the synthetic data creation scenario.
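As a rough illustration of what creating a synthetic mixture could look like, the sketch below overlays one waveform on another at a chosen offset. The function name mix_calls, its parameters, and the peak rescaling step are assumptions made for this example. It also assumes both calls already share the same sampling rate, which does not hold across this whole dataset without resampling.

```python
import numpy as np


def mix_calls(call_a, call_b, offset=0, gain_b=1.0):
    """Overlay call_b onto call_a, starting `offset` samples in (hypothetical helper)."""
    length = max(len(call_a), offset + len(call_b))
    mixture = np.zeros(length, dtype=np.float64)
    mixture[:len(call_a)] += call_a
    mixture[offset:offset + len(call_b)] += gain_b * np.asarray(call_b)
    # Rescale only if the sum would clip outside [-1, 1].
    peak = np.max(np.abs(mixture))
    if peak > 1.0:
        mixture /= peak
    return mixture


# Toy example with constant signals standing in for two calls.
a = np.full(4, 0.5)
b = np.full(3, 0.25)
mixed = mix_calls(a, b, offset=2)
```

Keeping the unmixed sources alongside each mixture provides the ground truth needed to evaluate a separation model.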

Output of anno.head():

	identity	audio	sample_rate
0	FB79	[0.0015869141, -0.0025634766, -0.0069885254, -...	96000
1	FB79	[-0.00024414062, 0.0005493164, 0.00079345703, ...	96000
2	FB79	[-0.00030517578, 0.0045776367, 0.0074157715, 0...	96000
3	FB79	[0.009552002, 0.010467529, 0.010437012, 0.0090...	96000
4	FB79	[0.00030517578, -0.0026245117, -0.0052490234, ...	96000