Data Exploration

In this notebook, we are going to take a closer look at the data. Let us begin by loading everything in.

In [6]:
import librosa
import pandas as pd
import numpy as np
from IPython.lib.display import Audio
from matplotlib import pyplot as plt
import multiprocessing
import scipy.signal
from scipy import signal

anno = pd.read_pickle('data/signature_whistles.pkl')
anno.head()
Out[6]:
identity audio sample_rate
0 FB79 [0.0015869141, -0.0025634766, -0.0069885254, -... 96000
1 FB79 [-0.00024414062, 0.0005493164, 0.00079345703, ... 96000
2 FB79 [-0.00030517578, 0.0045776367, 0.0074157715, 0... 96000
3 FB79 [0.009552002, 0.010467529, 0.010437012, 0.0090... 96000
4 FB79 [0.00030517578, -0.0026245117, -0.0052490234, ... 96000

Whistles are stored in the audio column. There are a total of 400 calls, 20 from each of the 20 individuals.

In [2]:
anno.groupby('identity')['audio'].count()
Out[2]:
identity
FB02     20
FB05     20
FB07     20
FB09     20
FB10     20
FB101    20
FB11     20
FB122    20
FB131    20
FB15     20
FB163    20
FB182    20
FB25     20
FB33     20
FB35     20
FB55     20
FB67     20
FB71     20
FB79     20
FB92     20
Name: audio, dtype: int64

Most of the calls (360 of 400) have been recorded with a sampling rate of 96 kHz.

In [3]:
(anno.sample_rate == 96000).sum()
Out[3]:
360

The remaining 40 calls have been recorded with a sampling rate of 88.2 kHz.

In [4]:
(anno.sample_rate == 88200).sum()
Out[4]:
40
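
Because the dataset mixes two sampling rates, any analysis that compares or combines calls sample by sample would first need them at a common rate. The following is a minimal sketch of one way to do that, not something this notebook does: it assumes 96 kHz as the target rate and uses `scipy.signal.resample_poly`. The helper name `resample_to_target` and the synthetic test tone are hypothetical and serve only as an illustration.

```python
from fractions import Fraction

import numpy as np
from scipy import signal

TARGET_RATE = 96000  # assumed common rate; chosen because most calls already use it


def resample_to_target(audio, sample_rate, target_rate=TARGET_RATE):
    """Resample `audio` to `target_rate` with a polyphase filter (hypothetical helper)."""
    sample_rate = int(sample_rate)
    if sample_rate == target_rate:
        return audio
    # Reduce the rate ratio to lowest terms, e.g. 96000 / 88200 -> 160 / 147
    ratio = Fraction(target_rate, sample_rate)
    return signal.resample_poly(audio, ratio.numerator, ratio.denominator)


# Synthetic stand-in for a call: one second of a 10 kHz tone sampled at 88200 Hz
t = np.arange(88200) / 88200
tone = np.sin(2 * np.pi * 10000 * t)

resampled = resample_to_target(tone, 88200)
print(len(tone), len(resampled))
```

Applied to the dataframe, this could look like `anno.apply(lambda row: resample_to_target(row.audio, row.sample_rate), axis=1)`, with the `sample_rate` column updated to match afterwards.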

The duration of the recordings varies from about 1.48 seconds to 3.03 seconds.

In [9]:
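# Duration in seconds = number of samples / sampling rate
call_durations = anno.audio.apply(lambda x: x.shape[0]) / anno.sample_rate
min(call_durations), max(call_durations)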
Out[9]:
(1.4799375, 3.0279166666666666)

This is how the durations break down.

In [7]:
call_durations = anno.audio.apply(lambda x: x.shape[0]) / anno.sample_rate
plt.title('Call durations in seconds')
plt.xlabel('seconds')
plt.ylabel('count')
plt.hist(call_durations);

Let's take a look at a single call from each of the individuals.

In [14]:
fig, subplots = plt.subplots(5, 4, figsize=(20, 30))

# Plot the spectrogram of one randomly chosen call per individual
for (idx, row), ax in zip(anno.groupby('identity').sample(n=1).iterrows(), subplots.flat):
    freqs, times, Sx = signal.spectrogram(row.audio, fs=row.sample_rate)
    # Convert power to dB; the small offset avoids taking log10 of zero
    ax.pcolormesh(times, freqs / 1000, 10 * np.log10(Sx + 1e-10), cmap='viridis', shading='auto')
    ax.set_ylabel('Frequency [kHz]')
    ax.set_xlabel('Time [s]')
    ax.set_title(row.identity)
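
The call to `signal.spectrogram` above relies on SciPy's default window length (`nperseg=256`), which at 96 kHz gives frequency bins 375 Hz apart. As a rough sketch of the trade-off involved, the snippet below compares two window lengths on a synthetic frequency sweep. The sweep, the chosen `nperseg` values, and the 50% overlap are assumptions made for illustration and are not settings used in this notebook.

```python
import numpy as np
from scipy import signal

fs = 96000  # sampling rate of most calls in the dataset
t = np.arange(fs) / fs
# Synthetic stand-in for a whistle: a one-second sweep from 5 kHz to 15 kHz
sweep = signal.chirp(t, f0=5000, t1=t[-1], f1=15000)

resolution = {}
for nperseg in (256, 1024):
    freqs, times, Sx = signal.spectrogram(
        sweep, fs=fs, nperseg=nperseg, noverlap=nperseg // 2
    )
    # Spacing between adjacent frequency bins (Hz) and adjacent frames (s)
    resolution[nperseg] = (freqs[1] - freqs[0], times[1] - times[0])
    print(
        f'nperseg={nperseg}: {resolution[nperseg][0]:.2f} Hz per bin, '
        f'{1000 * resolution[nperseg][1]:.2f} ms per frame'
    )
```

Longer windows separate nearby frequencies better but smear fast frequency modulation in time, so the right choice depends on how quickly the whistle contour changes.
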
In [ ]:
import os
import soundfile as sf

def save_call(audio, identity, sample_rate, out_dir='assets'):
    # Peak-normalize the call and write it to <out_dir>/<identity>.wav
    os.makedirs(out_dir, exist_ok=True)
    audio = librosa.util.normalize(audio)
    sf.write(os.path.join(out_dir, f'{identity}.wav'), audio, int(sample_rate))
In [25]:
# Save one randomly chosen call per individual for the audio players below
for idx, row in anno.groupby('identity').sample(n=1).iterrows():
    save_call(row.audio, row.identity, row.sample_rate)
In [27]:
from IPython.display import display, HTML
In [30]:
for identity in anno.sort_values(by='identity').identity.unique():
    display(HTML(f'''
        {identity}
        <audio style="display: block"
        controls
        src="assets/{identity}.wav">
            Your browser does not support the
            <code>audio</code> element.
        </audio>
        ''')
    )
FB02
FB05
FB07
FB09
FB10
FB101
FB11
FB122
FB131
FB15
FB163
FB182
FB25
FB33
FB35
FB55
FB67
FB71
FB79
FB92

CPP suitability analysis

Due to the high quality of the recordings, this dataset lends itself well to creating synthetic mixtures for a cocktail party problem (CPP) study.

Unfortunately, the dataset contains no naturally overlapping calls from multiple individuals, which limits us to the synthetic data creation scenario.
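
To make the synthetic scenario concrete, here is a minimal sketch of how two calls could be overlaid to form a mixture with known ground-truth sources. It is an illustration under stated assumptions rather than an established pipeline: the helper `mix_calls` and its `gain_b_db` parameter are hypothetical, both calls are assumed to share one sampling rate already, and the two tones stand in for real whistles.

```python
import numpy as np


def mix_calls(call_a, call_b, gain_b_db=0.0):
    """Overlay two calls; return the mixture and the two aligned sources (hypothetical helper)."""
    n = max(len(call_a), len(call_b))
    # Zero-pad the shorter call at the end so both sources have equal length
    src_a = np.pad(call_a, (0, n - len(call_a)))
    # Apply a relative gain in dB to the second call
    src_b = np.pad(call_b, (0, n - len(call_b))) * 10 ** (gain_b_db / 20)
    mixture = src_a + src_b
    # Rescale everything by the same factor if the sum would clip
    peak = np.max(np.abs(mixture))
    if peak > 1.0:
        src_a, src_b, mixture = src_a / peak, src_b / peak, mixture / peak
    return mixture, src_a, src_b


# Synthetic stand-ins for two calls of different length at 96 kHz
fs = 96000
call_a = 0.8 * np.sin(2 * np.pi * 8000 * np.arange(int(1.5 * fs)) / fs)
call_b = 0.8 * np.sin(2 * np.pi * 12000 * np.arange(int(2.0 * fs)) / fs)

mixture, src_a, src_b = mix_calls(call_a, call_b)
print(mixture.shape, float(np.max(np.abs(mixture))))
```

Returning the scaled sources alongside the mixture keeps the relation `mixture = src_a + src_b` exact, which is what a separation model would be evaluated against.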