In this notebook, we are going to take a closer look at the data.
import librosa
import pandas as pd
import numpy as np
from IPython.display import Audio
from matplotlib import pyplot as plt
import multiprocessing
from scipy import signal
from pathlib import Path
import warnings
There are three collections in this dataset.
ls data
The dataset has no labels, but the recordings are very clear and high fidelity. They are also very dense in vocalizations - moments when no animal can be heard are very rare.
The 3 collections included in this dataset are Hawaii Recordings, Salish Sea Recordings, and Tahiti Whale Song.
Let's take a look at Hawaii Recordings first.
hawaii = list(Path('data/Hawaii Recordings').iterdir())
This collection contains 42 recordings.
len(hawaii)
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    srs = []
    durations = []
    for rec in hawaii:
        x, sr = librosa.core.load(rec, sr=None, mono=False)
        srs.append(sr)
        durations.append(x.shape[1] / sr)
        assert x.shape[0] == 2
All the recordings are stereo (2-channel). They have been recorded at a sample rate of either 44.1 kHz or 48 kHz.
This collection contains over 14.5 hours of audio.
sum(durations) / 60 / 60
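Decoding every file with `librosa` just to measure its length is slow for 14.5 hours of audio. For the WAV files in the collection, the duration can be read from the header alone; a sketch using only the standard library (`wav_duration_seconds` is a hypothetical helper, not part of this notebook, and it does not cover the MP3 files):

```python
import wave

def wav_duration_seconds(path):
    """Duration of a WAV file read from its header, without decoding the audio."""
    with wave.open(str(path), 'rb') as w:
        return w.getnframes() / w.getframerate()
```

For MP3 recordings the full decode (as above) is still needed, since the `wave` module only understands RIFF/WAV containers.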
To get a better feel for the dataset, let us listen to a couple of one-minute excerpts and take a look at their spectrograms.
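The load-slice-plot pattern used for every excerpt below could be factored into a small helper along these lines (a sketch; `plot_spectrogram` is not part of the original notebook):

```python
import numpy as np
from scipy import signal
from matplotlib import pyplot as plt

def plot_spectrogram(audio, sr, channel=0, eps=1e-9):
    """Plot a log-power spectrogram of one channel of a (channels, samples) array.

    eps guards against log of zero in silent frames.
    """
    freqs, times, Sx = signal.spectrogram(audio[channel], fs=sr)
    fig, ax = plt.subplots()
    ax.pcolormesh(times, freqs / 1000, 10 * np.log10(Sx + eps),
                  cmap='viridis', shading='auto')
    ax.set_ylabel('Frequency [kHz]')
    ax.set_xlabel('Time [s]')
    return fig, ax
```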
rec = 'data/Hawaii Recordings/2016 Hawaii Feb (Molokai Princesso).MP3'
x, sr = librosa.core.load(rec, sr=None, mono=False)
audio = x[:, sr*4*60:sr*5*60]
fig, ax = plt.subplots()
freqs, times, Sx = signal.spectrogram(audio[0], fs=sr)
ax.pcolormesh(times, freqs / 1000, 10 * np.log10(Sx+1e-9), cmap='viridis', shading='auto')
ax.set_ylabel('Frequency [kHz]')
ax.set_xlabel('Time [s]');
The following recording is very interesting and extremely varied, but it is not as crystal clear as the one above; some noise can be heard.
rec = 'data/Hawaii Recordings/ZOOM0012 - solo singer Feb 19 2018.WAV'
x, sr = librosa.core.load(rec, sr=None, mono=False)
audio = x[:, 60*sr:2*60*sr]
fig, ax = plt.subplots()
freqs, times, Sx = signal.spectrogram(audio[0], fs=sr)
ax.pcolormesh(times, freqs / 1000, 10 * np.log10(Sx+1e-9), cmap='viridis', shading='auto')
ax.set_ylabel('Frequency [kHz]')
ax.set_xlabel('Time [s]');
salish_sea = list(Path('data/Salish Sea Recordings').iterdir())
This collection contains just 3 recordings.
len(salish_sea)
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    srs = []
    durations = []
    for rec in salish_sea:
        x, sr = librosa.core.load(rec, sr=None, mono=False)
        srs.append(sr)
        durations.append(x.shape[1] / sr)
        assert x.shape[0] == 2
All the recordings are stereo (2-channel) and have been recorded at a sample rate of 44.1 kHz.
set(srs)
This collection contains over 2 hours of audio.
sum(durations) / 60 / 60
Here are one-minute excerpts from each of the 3 files.
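Some of the excerpts below start at fractional minute offsets (e.g. 7.5), so the sample indices must be wrapped in `int()`. A small helper could make that uniform (`minute_slice` is a hypothetical name, not something used in this notebook):

```python
import numpy as np

def minute_slice(x, sr, start_min, end_min):
    """Slice all channels of x to the interval [start_min, end_min) in minutes.

    int() is required because fractional minutes would otherwise
    produce float sample indices.
    """
    return x[..., int(sr * start_min * 60):int(sr * end_min * 60)]
```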
rec = 'data/Salish Sea Recordings/11052016-LK-HB-start2236-1HR-AWESOME-ZOOM0001.mp3'
x, sr = librosa.core.load(rec, sr=None, mono=False)
audio = x[:, sr*10*60:sr*11*60]
fig, ax = plt.subplots()
freqs, times, Sx = signal.spectrogram(audio[0], fs=sr)
ax.pcolormesh(times, freqs / 1000, 10 * np.log10(Sx+1e-9), cmap='viridis', shading='auto')
ax.set_ylabel('Frequency [kHz]')
ax.set_xlabel('Time [s]');
rec = 'data/Salish Sea Recordings/2020 Jan 7 Lime Kiln.wav'
x, sr = librosa.core.load(rec, sr=None, mono=False)
audio = x[:, int(sr*7.5*60):int(sr*8.5*60)]
fig, ax = plt.subplots()
freqs, times, Sx = signal.spectrogram(audio[0], fs=sr)
ax.pcolormesh(times, freqs / 1000, 10 * np.log10(Sx+1e-9), cmap='viridis', shading='auto')
ax.set_ylabel('Frequency [kHz]')
ax.set_xlabel('Time [s]');
rec = 'data/Salish Sea Recordings/2019 Dec 22 Lime Kiln.wav'
x, sr = librosa.core.load(rec, sr=None, mono=False)
audio = x[:, int(sr*20.2*60):int(sr*21.2*60)]
fig, ax = plt.subplots()
freqs, times, Sx = signal.spectrogram(audio[0], fs=sr)
ax.pcolormesh(times, freqs / 1000, 10 * np.log10(Sx+1e-9), cmap='viridis', shading='auto')
ax.set_ylabel('Frequency [kHz]')
ax.set_xlabel('Time [s]');
The two recordings above seem to have been recorded at a sample rate of 44.1 kHz, with a low-pass filter applied subsequently.
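One way to check the low-pass hypothesis is to estimate the frequency above which the spectral energy collapses. A rough sketch (the `drop_db` threshold is an arbitrary choice, and `estimated_cutoff_hz` is not part of the original notebook):

```python
import numpy as np
from scipy import signal

def estimated_cutoff_hz(x, sr, drop_db=40, nperseg=4096):
    """Highest frequency whose average power is within drop_db of the spectral peak."""
    freqs, psd = signal.welch(x, fs=sr, nperseg=nperseg)
    psd_db = 10 * np.log10(psd + 1e-12)  # small floor avoids log of zero
    above = np.nonzero(psd_db > psd_db.max() - drop_db)[0]
    return freqs[above[-1]]
```

For a recording that was low-pass filtered before being saved at 44.1 kHz, this estimate should land well below the 22.05 kHz Nyquist frequency.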
tahiti = list(Path('data/Tahiti Whale Song').iterdir())
This collection contains 8 recordings.
len(tahiti)
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    srs = []
    durations = []
    for rec in tahiti:
        x, sr = librosa.core.load(rec, sr=None, mono=False)
        srs.append(sr)
        durations.append(x.shape[1] / sr)
        assert x.shape[0] == 2
All the recordings are stereo (2-channel). They have been recorded at a sample rate of either 44.1 kHz or 96 kHz.
set(srs)
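The collections mix 44.1 kHz, 48 kHz and 96 kHz material, so any joint processing would likely need a common sample rate first. A hedged sketch using polyphase resampling (`resample_to` is a hypothetical helper, not part of this notebook):

```python
from fractions import Fraction

import numpy as np
from scipy import signal

def resample_to(x, sr, target_sr=44100):
    """Resample the last axis of x from sr to target_sr using a rational ratio."""
    if sr == target_sr:
        return x
    ratio = Fraction(target_sr, sr)  # e.g. 96000 -> 44100 reduces to 147/320
    return signal.resample_poly(x, ratio.numerator, ratio.denominator, axis=-1)
```

Alternatively, passing an explicit `sr=` to `librosa.core.load` would resample on load, at the cost of decoding every file at that rate.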
The recordings contain over 2.5 hours of audio.
sum(durations) / 60 / 60
Here are a couple of one-minute excerpts along with their spectrograms.
rec = 'data/Tahiti Whale Song/Tahiti Island Point Whale Song Sept 16 2018 Hydro 1.WAV'
x, sr = librosa.core.load(rec, sr=None, mono=False)
audio = x[:, int(sr*2.8*60):int(sr*3.8*60)]
fig, ax = plt.subplots()
freqs, times, Sx = signal.spectrogram(audio[0], fs=sr)
ax.pcolormesh(times, freqs / 1000, 10 * np.log10(Sx+1e-9), cmap='viridis', shading='auto')
ax.set_ylabel('Frequency [kHz]')
ax.set_xlabel('Time [s]');
rec = 'data/Tahiti Whale Song/Tahiti Harold Whale 2 Sept 20 2018.WAV'
x, sr = librosa.core.load(rec, sr=None, mono=False)
audio = x[:, 60*sr:2*60*sr]
fig, ax = plt.subplots()
freqs, times, Sx = signal.spectrogram(audio[0], fs=sr)
ax.pcolormesh(times, freqs / 1000, 10 * np.log10(Sx+1e-9), cmap='viridis', shading='auto')
ax.set_ylabel('Frequency [kHz]')
ax.set_xlabel('Time [s]');
rec = 'data/Tahiti Whale Song/2016 Whale Song Tahiti Hydro Hemene 2.wav'
x, sr = librosa.core.load(rec, sr=None, mono=False)
audio = x[:, 4*60*sr:5*60*sr]
fig, ax = plt.subplots()
freqs, times, Sx = signal.spectrogram(audio[0], fs=sr)
ax.pcolormesh(times, freqs / 1000, 10 * np.log10(Sx+1e-9), cmap='viridis', shading='auto')
ax.set_ylabel('Frequency [kHz]')
ax.set_xlabel('Time [s]');
The audio is very clear and the density of recorded calls is very high, which suggests this could be a particularly good dataset to examine more closely in the context of the CPP.
The main issue is the lack of labels. However, the file names indicate that the researcher might know how many whales were vocalizing in some of the recordings. We are currently following up with the person who collected these recordings about what labels could be sourced. Once we understand the situation better, some of the labelling could perhaps be automated.
The clarity of these recordings is exceptional, which makes investigating the potential for labels well worth the effort.