In this notebook, we are going to take a closer look at the data. Let us begin by loading everything in.
import librosa
import pandas as pd
import numpy as np
from IPython.display import Audio
from matplotlib import pyplot as plt
import multiprocessing
from scipy import signal
anno = pd.read_pickle('data/anno.pkl')
anno.head()
All of the audio has been recorded with a sample rate of 44.1 kHz. Each example is 5 seconds long.
Annotations include the category label (there are 50 categories in total). There is also a smaller, less diverse dataset sampled from this one: ESC-10. Whether a recording made it into that subset is indicated by the esc10 column.
Additionally, the dataset comes with a suggested split into folds, stored in the fold column. The splits were designed to minimize leakage: following them, all segments taken from the same original audio file end up in the same fold.
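The fold column can be turned into leakage-free cross-validation splits directly. A minimal sketch, assuming fold holds small integer fold ids (the helper name `fold_splits` is ours, not part of the dataset):

```python
import pandas as pd

def fold_splits(df):
    """Yield (train, test) frames, holding out one fold at a time."""
    for fold in sorted(df['fold'].unique()):
        yield df[df['fold'] != fold], df[df['fold'] == fold]
```

Because each source recording sits entirely inside one fold, holding out a fold guarantees no segment of a training-set file appears in the test set.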
SAMPLE_RATE = 44100
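At this rate, each 5-second clip should contain exactly 220,500 samples. A quick sanity-check sketch (assuming, as in the spectrogram cell below, that each row stores its waveform in an `audio` column):

```python
SAMPLE_RATE = 44100
CLIP_SECONDS = 5
EXPECTED_SAMPLES = SAMPLE_RATE * CLIP_SECONDS  # 220500

# In the notebook, this should hold for every row:
# assert anno.audio.map(len).eq(EXPECTED_SAMPLES).all()
```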
There are 2000 records in this dataset.
anno.shape[0]
And the categories are balanced - 40 examples per category.
anno.category.value_counts()
To get a better feel for the dataset, let's listen to an example from each of the classes!
from IPython.display import HTML, display

for category in anno.category.unique():
    display(HTML(f'''
        {category}
        <audio style="display: block"
               controls
               src="assets/{category}.wav">
            Your browser does not support the
            <code>audio</code> element.
        </audio>
    '''))
Last but not least, let's take a look at a spectrogram for an example from each category.
fig, subplots = plt.subplots(10, 5, figsize=(40, 60))

for (idx, row), ax in zip(anno.groupby('category').sample(n=1).iterrows(), subplots.flat):
    freqs, times, Sx = signal.spectrogram(row.audio, fs=SAMPLE_RATE)
    # convert power to dB; the small epsilon avoids log(0)
    ax.pcolormesh(times, freqs / 1000, 10 * np.log10(Sx + 1e-9), cmap='viridis', shading='auto')
    ax.set_ylabel('Frequency [kHz]')
    ax.set_xlabel('Time [s]')
    ax.set_title(row.category)
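To confirm how the frequency axis returned by `signal.spectrogram` maps onto real frequencies, here is a small self-contained sketch using a synthetic 1 kHz tone (the tone and tolerance are our own test values, not part of the dataset):

```python
import numpy as np
from scipy import signal

SAMPLE_RATE = 44100
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE   # 1 second of timestamps
tone = np.sin(2 * np.pi * 1000 * t)        # pure 1 kHz sine

freqs, times, Sx = signal.spectrogram(tone, fs=SAMPLE_RATE)
# the strongest bin (averaged over time) should sit near 1 kHz;
# with the default nperseg=256 the bin width is ~172 Hz
peak_hz = freqs[Sx.mean(axis=1).argmax()]
```

The peak lands within one bin width of 1 kHz, which is the same resolution caveat that applies when reading the category spectrograms above.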