In this notebook, we are going to take a closer look at the data.
import librosa
import pandas as pd
import numpy as np
from IPython.display import Audio
from matplotlib import pyplot as plt
import multiprocessing
from scipy import signal
from pathlib import Path
import warnings
There are three collections in this dataset.
ls data
The dataset has no labels, but the recordings are very clear and high fidelity. They are also very dense in vocalizations - moments when no animal can be heard are very rare.
The 3 collections included in this dataset are Hawaii Recordings, Salish Sea Recordings, and Tahiti Whale Song.
Let's take a look at Hawaii Recordings first.
hawaii = list(Path('data/Hawaii Recordings').iterdir())
This collection contains 42 recordings.
len(hawaii)
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    srs = []
    durations = []
    for rec in hawaii:
        x, sr = librosa.core.load(rec, sr=None, mono=False)
        srs.append(sr)
        durations.append(x.shape[1] / sr)
        assert x.shape[0] == 2
All the recordings are stereo (2-channel). They have been recorded at a sample rate of either 44.1 kHz or 48 kHz.
This collection contains over 14.5 hours of audio.
sum(durations) / 60 / 60
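Decoding every file with `librosa` just to measure its length is slow for 14.5 hours of audio. For the WAV files in the collection, the duration can be read from the header alone; a sketch using only the standard library (`wav_duration_seconds` is a hypothetical helper, not part of this notebook, and it does not cover the MP3 files):

```python
import wave

def wav_duration_seconds(path):
    """Duration of a WAV file read from its header, without decoding the audio."""
    with wave.open(str(path), 'rb') as w:
        return w.getnframes() / w.getframerate()
```

For MP3 recordings the full decode (as above) is still needed, since the `wave` module only understands RIFF/WAV containers.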
To get a better feel for the dataset, let us listen to a couple of one-minute excerpts and take a look at their spectrograms.
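The load-slice-plot pattern used for every excerpt below could be factored into a small helper along these lines (a sketch; `plot_spectrogram` is not part of the original notebook):

```python
import numpy as np
from scipy import signal
from matplotlib import pyplot as plt

def plot_spectrogram(audio, sr, channel=0, eps=1e-9):
    """Plot a log-power spectrogram of one channel of a (channels, samples) array.

    eps guards against log of zero in silent frames.
    """
    freqs, times, Sx = signal.spectrogram(audio[channel], fs=sr)
    fig, ax = plt.subplots()
    ax.pcolormesh(times, freqs / 1000, 10 * np.log10(Sx + eps),
                  cmap='viridis', shading='auto')
    ax.set_ylabel('Frequency [kHz]')
    ax.set_xlabel('Time [s]')
    return fig, ax
```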
rec = 'data/Hawaii Recordings/2016 Hawaii Feb (Molokai Princesso).MP3'
x, sr = librosa.core.load(rec, sr=None, mono=False)
audio = x[:, sr*4*60:sr*5*60]
fig, ax = plt.subplots()
freqs, times, Sx = signal.spectrogram(audio[0], fs=sr)
ax.pcolormesh(times, freqs / 1000, 10 * np.log10(Sx+1e-9), cmap='viridis', shading='auto')
ax.set_ylabel('Frequency [kHz]')
ax.set_xlabel('Time [s]');
The following recording is very interesting and extremely varied, but it is not as crystal clear as the one above; some noise can be heard.
rec = 'data/Hawaii Recordings/ZOOM0012 - solo singer Feb 19 2018.WAV'
x, sr = librosa.core.load(rec, sr=None, mono=False)
audio = x[:, 60*sr:2*60*sr]
fig, ax = plt.subplots()
freqs, times, Sx = signal.spectrogram(audio[0], fs=sr)
ax.pcolormesh(times, freqs / 1000, 10 * np.log10(Sx+1e-9), cmap='viridis', shading='auto')
ax.set_ylabel('Frequency [kHz]')
ax.set_xlabel('Time [s]');
salish_sea = list(Path('data/Salish Sea Recordings').iterdir())
This collection contains just 3 recordings.
len(salish_sea)
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    srs = []
    durations = []
    for rec in salish_sea:
        x, sr = librosa.core.load(rec, sr=None, mono=False)
        srs.append(sr)
        durations.append(x.shape[1] / sr)
        assert x.shape[0] == 2
All the recordings are stereo (2-channel) and have been recorded at a sample rate of 44.1 kHz.
set(srs)
This collection contains over 2 hours of audio.
sum(durations) / 60 / 60
Here are one-minute excerpts from each of the 3 files.
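Some of the excerpts below start at fractional minute offsets (e.g. 7.5), so the sample indices must be wrapped in `int()`. A small helper could make that uniform (`minute_slice` is a hypothetical name, not something used in this notebook):

```python
import numpy as np

def minute_slice(x, sr, start_min, end_min):
    """Slice all channels of x to the interval [start_min, end_min) in minutes.

    int() is required because fractional minutes would otherwise
    produce float sample indices.
    """
    return x[..., int(sr * start_min * 60):int(sr * end_min * 60)]
```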
rec = 'data/Salish Sea Recordings/11052016-LK-HB-start2236-1HR-AWESOME-ZOOM0001.mp3'
x, sr = librosa.core.load(rec, sr=None, mono=False)
audio = x[:, sr*10*60:sr*11*60]
fig, ax = plt.subplots()
freqs, times, Sx = signal.spectrogram(audio[0], fs=sr)
ax.pcolormesh(times, freqs / 1000, 10 * np.log10(Sx+1e-9), cmap='viridis', shading='auto')
ax.set_ylabel('Frequency [kHz]')
ax.set_xlabel('Time [s]');
rec = 'data/Salish Sea Recordings/2020 Jan 7 Lime Kiln.wav'
x, sr = librosa.core.load(rec, sr=None, mono=False)
audio = x[:, int(sr*7.5*60):int(sr*8.5*60)]
fig, ax = plt.subplots()
freqs, times, Sx = signal.spectrogram(audio[0], fs=sr)
ax.pcolormesh(times, freqs / 1000, 10 * np.log10(Sx+1e-9), cmap='viridis', shading='auto')
ax.set_ylabel('Frequency [kHz]')
ax.set_xlabel('Time [s]');
rec = 'data/Salish Sea Recordings/2019 Dec 22 Lime Kiln.wav'
x, sr = librosa.core.load(rec, sr=None, mono=False)
audio = x[:, int(sr*20.2*60):int(sr*21.2*60)]
fig, ax = plt.subplots()
freqs, times, Sx = signal.spectrogram(audio[0], fs=sr)
ax.pcolormesh(times, freqs / 1000, 10 * np.log10(Sx+1e-9), cmap='viridis', shading='auto')
ax.set_ylabel('Frequency [kHz]')
ax.set_xlabel('Time [s]');
The two recordings above seem to have been recorded at a sample rate of 44.1 kHz, with a low-pass filter applied subsequently.
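One way to check the low-pass hypothesis is to estimate the frequency above which the spectral energy collapses. A rough sketch (the `drop_db` threshold is an arbitrary choice, and `estimated_cutoff_hz` is not part of the original notebook):

```python
import numpy as np
from scipy import signal

def estimated_cutoff_hz(x, sr, drop_db=40, nperseg=4096):
    """Highest frequency whose average power is within drop_db of the spectral peak."""
    freqs, psd = signal.welch(x, fs=sr, nperseg=nperseg)
    psd_db = 10 * np.log10(psd + 1e-12)  # small floor avoids log of zero
    above = np.nonzero(psd_db > psd_db.max() - drop_db)[0]
    return freqs[above[-1]]
```

For a recording that was low-pass filtered before being saved at 44.1 kHz, this estimate should land well below the 22.05 kHz Nyquist frequency.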
tahiti = list(Path('data/Tahiti Whale Song').iterdir())
This collection contains 8 recordings.
len(tahiti)
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    srs = []
    durations = []
    for rec in tahiti:
        x, sr = librosa.core.load(rec, sr=None, mono=False)
        srs.append(sr)
        durations.append(x.shape[1] / sr)
        assert x.shape[0] == 2
All the recordings are stereo (2-channel). They have been recorded at a sample rate of either 44.1 kHz or 96 kHz.
set(srs)
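The collections mix 44.1 kHz, 48 kHz and 96 kHz material, so any joint processing would likely need a common sample rate first. A hedged sketch using polyphase resampling (`resample_to` is a hypothetical helper, not part of this notebook):

```python
from fractions import Fraction

import numpy as np
from scipy import signal

def resample_to(x, sr, target_sr=44100):
    """Resample the last axis of x from sr to target_sr using a rational ratio."""
    if sr == target_sr:
        return x
    ratio = Fraction(target_sr, sr)  # e.g. 96000 -> 44100 reduces to 147/320
    return signal.resample_poly(x, ratio.numerator, ratio.denominator, axis=-1)
```

Alternatively, passing an explicit `sr=` to `librosa.core.load` would resample on load, at the cost of decoding every file at that rate.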
The recordings contain over 2.5 hours of audio.
sum(durations) / 60 / 60
Here are a couple of one-minute excerpts along with their spectrograms.
rec = 'data/Tahiti Whale Song/Tahiti Island Point Whale Song Sept 16 2018 Hydro 1.WAV'
x, sr = librosa.core.load(rec, sr=None, mono=False)
audio = x[:, int(sr*2.8*60):int(sr*3.8*60)]
fig, ax = plt.subplots()
freqs, times, Sx = signal.spectrogram(audio[0], fs=sr)
ax.pcolormesh(times, freqs / 1000, 10 * np.log10(Sx+1e-9), cmap='viridis', shading='auto')
ax.set_ylabel('Frequency [kHz]')
ax.set_xlabel('Time [s]');
rec = 'data/Tahiti Whale Song/Tahiti Harold Whale 2 Sept 20 2018.WAV'
x, sr = librosa.core.load(rec, sr=None, mono=False)
audio = x[:, 60*sr:2*60*sr]
fig, ax = plt.subplots()
freqs, times, Sx = signal.spectrogram(audio[0], fs=sr)
ax.pcolormesh(times, freqs / 1000, 10 * np.log10(Sx+1e-9), cmap='viridis', shading='auto')
ax.set_ylabel('Frequency [kHz]')
ax.set_xlabel('Time [s]');
rec = 'data/Tahiti Whale Song/2016 Whale Song Tahiti Hydro Hemene 2.wav'
x, sr = librosa.core.load(rec, sr=None, mono=False)
audio = x[:, 4*60*sr:5*60*sr]
fig, ax = plt.subplots()
freqs, times, Sx = signal.spectrogram(audio[0], fs=sr)
ax.pcolormesh(times, freqs / 1000, 10 * np.log10(Sx+1e-9), cmap='viridis', shading='auto')
ax.set_ylabel('Frequency [kHz]')
ax.set_xlabel('Time [s]');
The audio is very clear and the density of recorded calls is very high, which suggests this could be a particularly good dataset to examine more closely in the context of the CPP.
The main issue is the lack of labels. However, the file names indicate that the researcher might know how many whales were vocalizing in some of the recordings. We are currently following up with the person who collected these recordings about what labels could be sourced. Once we understand the situation better, some of the labelling could perhaps be automated.
The clarity of these recordings is exceptional, which makes investigating the potential for labels well worth the effort.