Data Exploration

In this notebook, we are going to take a closer look at the data.

There are 17 audio files, each a recording of a 24-hour period.

In [1]:
ls -l data
total 19759544
-rw-r--r-- 1 radek radek 8523476253 Apr 29 16:47 korup_soundscape_17.zip
-rw-r--r-- 1 radek radek  688833024 Apr 29 12:00 kp07_20150221_000000.wav
-rw-r--r-- 1 radek radek  688833024 Apr 29 12:00 kp07_20151019_000000.wav
-rw-r--r-- 1 radek radek  688833024 Apr 29 12:01 kp07_20161110_000000.wav
-rw-r--r-- 1 radek radek  688833024 Apr 29 12:01 kp07_20170627_000000.wav
-rw-r--r-- 1 radek radek  688833024 Apr 29 12:02 kp09_20130728_000000.wav
-rw-r--r-- 1 radek radek  688833024 Apr 29 12:00 kp09_20131220_000000.wav
-rw-r--r-- 1 radek radek  688833024 Apr 29 12:00 kp12_20150911_000000.wav
-rw-r--r-- 1 radek radek  688833024 Apr 29 12:02 kp12_20170331_000000.wav
-rw-r--r-- 1 radek radek  688833024 Apr 29 12:01 kp12_20171031_000000.wav
-rw-r--r-- 1 radek radek  688833024 Apr 29 12:01 kp13_20150514_000000.wav
-rw-r--r-- 1 radek radek  688833024 Apr 29 12:01 kp13_20160222_000000.wav
-rw-r--r-- 1 radek radek  688833024 Apr 29 12:00 kp14_20140513_000000.wav
-rw-r--r-- 1 radek radek  688833024 Apr 29 12:02 kp14_20141202_000000.wav
-rw-r--r-- 1 radek radek  688833024 Apr 29 12:00 kp14_20170829_000000.wav
-rw-r--r-- 1 radek radek  688833024 Apr 29 12:00 kp15_20160523_000000.wav
-rw-r--r-- 1 radek radek  688833024 Apr 29 12:01 kp16_20131214_000000.wav
-rw-r--r-- 1 radek radek  688833024 Apr 29 12:01 kp16_20140929_000003.wav
In [2]:
import librosa
import pandas as pd
import numpy as np
from IPython.display import Audio
from matplotlib import pyplot as plt
import multiprocessing
from scipy import signal
In [3]:
audio, sr = librosa.load('data/kp07_20150221_000000.wav', sr=None, mono=False)

sr
Out[3]:
4000

The audio was recorded at a sample rate of 4 kHz.

In [4]:
SAMPLE_RATE = 4000

The recordings are single channel (mono).

In [5]:
audio.shape
Out[5]:
(344400128,)
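
As a quick sanity check, the sample count and the sample rate together give the length of the recording. The sketch below only redoes that arithmetic with the two values printed above; it suggests the file is a few minutes short of a full 24 hours.

```python
SAMPLE_RATE = 4000        # Hz, as reported in Out[3]
n_samples = 344_400_128   # length of the audio array, from Out[5]

# Duration in seconds, then split into hours / minutes / seconds.
duration_s = n_samples / SAMPLE_RATE
hours, rem = divmod(duration_s, 3600)
minutes, seconds = divmod(rem, 60)

print(f"{duration_s:.3f} s = {int(hours)} h {int(minutes)} min {seconds:.3f} s")
```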

A 24-hour recording is too long to play back in a browser, but we can look at shorter excerpts.

Here are the first five minutes of a recording.
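
The audio player itself is not reproduced in this export. A minimal sketch of how such an excerpt could be cut out and played is shown here; the helper name `excerpt` is hypothetical (it is not part of the original notebook), and a silent stand-in array is used so the snippet runs without the 24-hour file loaded.

```python
import numpy as np

SAMPLE_RATE = 4000  # Hz, as reported by librosa above


def excerpt(audio, start_s, duration_s, sr=SAMPLE_RATE):
    """Return the samples covering [start_s, start_s + duration_s) seconds.

    Hypothetical helper, not part of the original notebook.
    """
    start = int(start_s * sr)
    stop = int((start_s + duration_s) * sr)
    return audio[start:stop]


# Stand-in signal (ten minutes of silence) so the sketch is self-contained.
demo = np.zeros(10 * 60 * SAMPLE_RATE, dtype=np.float32)
clip = excerpt(demo, start_s=0, duration_s=5 * 60)

# In the notebook, with `audio` loaded as above, playback would look like:
# from IPython.display import Audio
# Audio(excerpt(audio, 0, 5 * 60), rate=SAMPLE_RATE)
```

Passing `rate=SAMPLE_RATE` matters here, because a raw NumPy array carries no sample-rate information of its own.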

And this is what a spectrogram of this excerpt looks like.

In [6]:
freqs, times, Sx = signal.spectrogram(audio[:5*60*SAMPLE_RATE], fs=SAMPLE_RATE)
plt.pcolormesh(times, freqs / 1000, 10 * np.log10(Sx+1e-9), cmap='viridis', shading='auto')
plt.ylabel('Frequency [kHz]')
plt.xlabel('Time [s]');
plt.title('First five minutes of the recording');

To get a better feel for the data, let's listen to a five-minute excerpt recorded during the day, in which several species can be heard.

In [7]:
start = (8*60*60 + 53*60) * SAMPLE_RATE  # 8 h 53 min into the recording
stop = (8*60*60 + 58*60) * SAMPLE_RATE   # 8 h 58 min into the recording
freqs, times, Sx = signal.spectrogram(audio[start:stop], fs=SAMPLE_RATE)
plt.pcolormesh(times, freqs / 1000, 10 * np.log10(Sx+1e-9), cmap='viridis', shading='auto')
plt.ylabel('Frequency [kHz]')
plt.xlabel('Time [s]');
plt.title('Five minutes of audio recorded in day time');

And this is what the entire 24-hour period looks like. We can clearly see the difference between daytime and nighttime.

In [10]:
freqs, times, Sx = signal.spectrogram(audio, fs=SAMPLE_RATE, nperseg=60*SAMPLE_RATE)
plt.pcolormesh(times, freqs / 1000, 10 * np.log10(Sx+1e-9), cmap='viridis', shading='auto')
plt.ylabel('Frequency [kHz]')
plt.xlabel('Time [s]');
plt.title('The entire 24-hour period');
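
With one-minute windows over the whole recording, the spectrogram array becomes fairly large. The sketch below estimates its shape from the parameters used above. It assumes the `scipy.signal.spectrogram` defaults (an overlap of `nperseg // 8`, one-sided output, no padding of the final partial segment) and 4 bytes per value, which should hold for float32 input but is worth checking against `Sx.shape` and `Sx.dtype`.

```python
# Values taken from the cells above.
SAMPLE_RATE = 4000
n_samples = 344_400_128
nperseg = 60 * SAMPLE_RATE  # one-minute windows

# Assumption: scipy.signal.spectrogram defaults.
noverlap = nperseg // 8
step = nperseg - noverlap

n_freqs = nperseg // 2 + 1                    # one-sided spectrum
n_segments = (n_samples - noverlap) // step   # full windows only
n_values = n_freqs * n_segments

print(f"{n_freqs} frequency bins x {n_segments} time segments")
print(f"about {n_values * 4 / 1e6:.0f} MB at 4 bytes per value")
```

If memory or plotting time becomes a problem, a shorter window or averaging neighbouring frequency bins before calling `pcolormesh` are possible ways to shrink the array.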

CPP suitability analysis

There are probably many creative ways in which this data could be used for a CPP study.

In its most basic form, calls could be hand-labeled and used to create synthetic mixtures. We could try to mix calls recorded on different days, but unfortunately there is no guarantee that they would come from different individuals. That is, unless we attempted to separate calls from different species instead.
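
To make the idea of a synthetic mixture concrete, here is a minimal sketch that overlays two clips at a chosen signal-to-noise ratio. It is an illustration only and not the CPP pipeline's actual mixing procedure; the function name `mix_at_snr` and the sine tones standing in for labeled calls are assumptions.

```python
import numpy as np


def mix_at_snr(target, interferer, snr_db):
    """Overlay two clips so that target power / interferer power equals snr_db (in dB).

    Hypothetical helper. Clips are truncated to the shorter length, and the
    interferer is assumed to be non-silent (non-zero power).
    Returns the mixture and the two (scaled) sources it was built from.
    """
    target = np.asarray(target, dtype=np.float64)
    interferer = np.asarray(interferer, dtype=np.float64)
    n = min(len(target), len(interferer))
    target, interferer = target[:n], interferer[:n]

    p_target = np.mean(target ** 2)
    p_interferer = np.mean(interferer ** 2)
    # Gain that brings the interferer to the requested power ratio.
    gain = np.sqrt(p_target / (p_interferer * 10 ** (snr_db / 10)))
    scaled = gain * interferer
    return target + scaled, target, scaled


# Stand-in "calls": two one-second tones at the recordings' 4 kHz sample rate.
sr = 4000
t = np.arange(sr) / sr
call_a = np.sin(2 * np.pi * 300 * t)
call_b = 0.2 * np.sin(2 * np.pi * 800 * t)

mixture, src_a, src_b = mix_at_snr(call_a, call_b, snr_db=0.0)
```

Returning the scaled sources alongside the mixture keeps the ground truth available, which is what a separation model would be evaluated against.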

Another interesting aspect of this data is that longer sequences could be identified. Instead of a mixture of single calls, we could attempt to mix longer sequences and see what results we get. This might lead to interesting findings about the CPP pipeline that could translate into strengthening its performance even further.