Audio Transforms

class klio_audio.transforms.audio.LoadAudio(**librosa_kwargs)

Load audio into memory as a numpy.ndarray.

This transform wraps librosa.load(). It takes in a PCollection of KlioMessages whose payload is a file-like object or a path to a file, and returns a PCollection of KlioMessages whose payload is a numpy.ndarray.

Example:

# run.py
import apache_beam as beam
from klio.transforms import decorators
from klio_audio.transforms import audio

@decorators.handle_klio
def element_to_filename(ctx, data):
    filename = data.element.decode("utf-8")
    return f"file:///path/to/audio/{filename}.wav"

def run(in_pcol, job_config):
    return (
        in_pcol
        | beam.Map(element_to_filename)
        | audio.LoadAudio()
        # other transforms
    )
Parameters

librosa_kwargs (dict) – Instantiate the transform with keyword arguments to pass into librosa.load().

class klio_audio.transforms.audio.GetSTFT(**librosa_kwargs)

Calculate Short-time Fourier transform from a numpy.ndarray.

This transform wraps librosa.stft(). It expects a PCollection of KlioMessages where the payload is a numpy.ndarray, and outputs the same with the STFT calculation applied to the payload.

The Short-time Fourier transform (STFT) is a Fourier-related transform used to determine the sinusoidal frequency and phase content of local sections of a signal as it changes over time. STFT provides the time-localized frequency information for situations in which frequency components of a signal vary over time, whereas the standard Fourier transform provides the frequency information averaged over the entire signal time interval.
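To make the idea of time-localized frequency content concrete, here is a minimal NumPy-only sketch of an STFT. It is not how librosa.stft() is implemented: the helper name naive_stft, the frame size, the hop length, and the lack of padding are all assumptions chosen for brevity. The signal switches from 500 Hz to 2000 Hz halfway through, and the first and last frames pick up the respective tones.

```python
import numpy as np


def naive_stft(y, n_fft=256, hop_length=64):
    """Hypothetical helper: return a (1 + n_fft // 2, n_frames) complex matrix."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    # Slice the signal into overlapping frames of n_fft samples each
    frames = np.stack(
        [y[i * hop_length : i * hop_length + n_fft] for i in range(n_frames)]
    )
    # Window each frame, take its FFT, and transpose so rows are frequency bins
    return np.fft.rfft(frames * window, axis=1).T


sr = 8000  # assumed sample rate in Hz
n_fft = 256
t = np.arange(sr) / sr  # one second of audio

# 500 Hz tone for the first half second, 2000 Hz tone for the second half
y = np.where(t < 0.5, np.sin(2 * np.pi * 500 * t), np.sin(2 * np.pi * 2000 * t))

D = naive_stft(y, n_fft=n_fft)
freqs = np.arange(D.shape[0]) * sr / n_fft  # center frequency of each bin

first_peak = freqs[np.abs(D[:, 0]).argmax()]   # dominant frequency, first frame
last_peak = freqs[np.abs(D[:, -1]).argmax()]   # dominant frequency, last frame
```

A plain Fourier transform of the whole signal would show both tones at once with no indication of when each occurs; the per-frame columns of D recover that ordering.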

Example:

# run.py
import apache_beam as beam
from klio.transforms import decorators
from klio_audio.transforms import audio

@decorators.handle_klio
def element_to_filename(ctx, data):
    filename = data.element.decode("utf-8")
    return f"file:///path/to/audio/{filename}.wav"

def run(in_pcol, job_config):
    return (
        in_pcol
        | beam.Map(element_to_filename)
        | audio.LoadAudio()
        | audio.GetSTFT()
        # other transforms
    )
Parameters

librosa_kwargs (dict) – Instantiate the transform with keyword arguments to pass into librosa.stft().

class klio_audio.transforms.audio.GetSpec(**librosa_kwargs)

Generate a dB-scaled spectrogram from a numpy.ndarray.

This transform wraps librosa.amplitude_to_db() and expects a PCollection of KlioMessages where the payload is a numpy.ndarray and the output is the same with the amplitude_to_db function applied.

A spectrogram shows the intensity of frequencies over time.
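As a rough illustration of what dB scaling does to amplitudes, the sketch below reimplements the conversion with NumPy. The function name amplitude_to_db_sketch is hypothetical, and the defaults for ref, amin, and top_db are assumed to mirror those of librosa.amplitude_to_db(); check the librosa documentation for the authoritative behavior.

```python
import numpy as np


def amplitude_to_db_sketch(S, ref=1.0, amin=1e-5, top_db=80.0):
    """Hypothetical helper: convert amplitudes to decibels relative to `ref`."""
    magnitude = np.abs(S)
    # Floor tiny values at `amin` so that log10 never sees zero
    db = 20.0 * np.log10(np.maximum(amin, magnitude))
    db -= 20.0 * np.log10(max(amin, ref))
    # Limit the dynamic range to `top_db` below the loudest value
    return np.maximum(db, db.max() - top_db)


amplitudes = np.array([1.0, 0.1, 0.01, 0.0])
db = amplitude_to_db_sketch(amplitudes)
# Each factor of 10 in amplitude is a 20 dB step; silence is clipped at -80 dB
```

The logarithmic scale compresses the large dynamic range of audio so that quiet components remain visible next to loud ones when plotted.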

Example:

# run.py
import apache_beam as beam
from klio.transforms import decorators
from klio_audio.transforms import audio

@decorators.handle_klio
def element_to_filename(ctx, data):
    filename = data.element.decode("utf-8")
    return f"file:///path/to/audio/{filename}.wav"

def run(in_pcol, job_config):
    return (
        in_pcol
        | beam.Map(element_to_filename)
        | audio.LoadAudio()
        | audio.GetSpec()
        # other transforms
    )
Parameters

librosa_kwargs (dict) – Instantiate the transform with keyword arguments to pass into librosa.amplitude_to_db().

class klio_audio.transforms.audio.GetMelSpec(**librosa_kwargs)

Generate a spectrogram from a numpy.ndarray using the mel scale.

This transform wraps librosa.feature.melspectrogram() and expects a PCollection of KlioMessages where the payload is a numpy.ndarray and the output is the same with the melspectrogram function applied.

The mel scale is a non-linear transformation of the frequency scale based on the perception of pitch. The mel scale is constructed so that any two pairs of frequencies separated by the same delta in the mel scale are perceived by humans as being equidistant.
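The non-linearity can be seen with a small sketch of one common mel formula. This uses the HTK-style conversion, which is an assumption made for illustration: librosa uses a different (Slaney-style) variant by default, and the helper names here are hypothetical.

```python
import numpy as np


def hz_to_mel_htk(freq_hz):
    """Hypothetical helper: HTK-style Hz-to-mel conversion."""
    return 2595.0 * np.log10(1.0 + np.asarray(freq_hz, dtype=float) / 700.0)


def mel_to_hz_htk(mel):
    """Hypothetical helper: inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


# Six points equally spaced on the mel scale between 0 Hz and 8000 Hz
mel_points = np.linspace(hz_to_mel_htk(0.0), hz_to_mel_htk(8000.0), num=6)
hz_points = mel_to_hz_htk(mel_points)

# Equal mel steps map to progressively wider steps in Hz
hz_steps = np.diff(hz_points)
```

Because each successive step in hz_steps is wider than the last, a mel spectrogram devotes more of its bands to low frequencies, where human pitch discrimination is finer.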

Example:

# run.py
import apache_beam as beam
from klio.transforms import decorators
from klio_audio.transforms import audio

@decorators.handle_klio
def element_to_filename(ctx, data):
    filename = data.element.decode("utf-8")
    return f"file:///path/to/audio/{filename}.wav"

def run(in_pcol, job_config):
    return (
        in_pcol
        | beam.Map(element_to_filename)
        | audio.LoadAudio()
        | audio.GetMelSpec()
        # other transforms
    )
Parameters

librosa_kwargs (dict) – Instantiate the transform with keyword arguments to pass into librosa.feature.melspectrogram().

class klio_audio.transforms.audio.GetMFCC(**librosa_kwargs)

Calculate MFCCs from a numpy.ndarray.

This transform wraps librosa.power_to_db() followed by librosa.feature.mfcc() and expects a PCollection of KlioMessages where the payload is a numpy.ndarray and the output is the same with the mfcc function applied.

The mel-frequency cepstral coefficients (MFCCs) of a signal are a small set of features (usually about 10–20) which describe the overall shape of a spectral envelope. They are often used to describe timbre or to model characteristics of the human voice.
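One way to see why a handful of coefficients can summarize a spectral envelope: MFCCs are commonly computed as a discrete cosine transform (DCT-II) of a log-scaled mel spectrogram, keeping only the first few coefficients. The sketch below builds the DCT basis by hand with NumPy; the helper names are hypothetical, the input is random stand-in data rather than real audio, and this is not a copy of librosa's implementation.

```python
import numpy as np


def dct_ii_ortho(n_out, n_in):
    """Hypothetical helper: first n_out rows of an orthonormal DCT-II basis."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    basis *= np.sqrt(2.0 / n_in)
    basis[0] /= np.sqrt(2.0)  # extra scaling of the constant row
    return basis


def mfcc_sketch(log_mel_spec, n_mfcc=13):
    """Project each frame's log-mel spectrum onto the first n_mfcc DCT vectors."""
    n_mels = log_mel_spec.shape[0]
    return dct_ii_ortho(n_mfcc, n_mels) @ log_mel_spec


rng = np.random.default_rng(0)
log_mel = rng.normal(size=(40, 10))  # stand-in: 40 mel bands, 10 frames
mfccs = mfcc_sketch(log_mel)         # 13 coefficients per frame

# A perfectly flat spectrum has energy only in the first coefficient
flat = mfcc_sketch(np.ones((40, 1)))
```

Low-order coefficients capture slow variation across mel bands (the envelope), while the discarded high-order ones capture fine detail, which is why 10 to 20 coefficients are usually enough.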

Example:

# run.py
import apache_beam as beam
from klio.transforms import decorators
from klio_audio.transforms import audio

@decorators.handle_klio
def element_to_filename(ctx, data):
    filename = data.element.decode("utf-8")
    return f"file:///path/to/audio/{filename}.wav"

def run(in_pcol, job_config):
    return (
        in_pcol
        | beam.Map(element_to_filename)
        | audio.LoadAudio()
        | audio.GetMFCC()
        # other transforms
    )
Parameters

librosa_kwargs (dict) – Instantiate the transform with keyword arguments to pass into librosa.feature.mfcc().

class klio_audio.transforms.audio.SpecToPlot(title=None, **plot_args)

Generate a matplotlib figure of the spectrogram of a numpy.ndarray.

This transform wraps librosa.display.specshow() and expects a PCollection of KlioMessages where the payload is a numpy.ndarray of a spectrogram and the output is a matplotlib.figure.Figure instance.

Example:

# run.py
import apache_beam as beam
from klio.transforms import decorators
from klio_audio.transforms import audio

@decorators.handle_klio
def element_to_filename(ctx, data):
    filename = data.element.decode("utf-8")
    return f"file:///path/to/audio/{filename}.wav"

def run(in_pcol, job_config):
    return (
        in_pcol
        | beam.Map(element_to_filename)
        | audio.LoadAudio()
        | audio.GetSpec()
        | audio.SpecToPlot()
        # other transforms
    )
Parameters
  • title (str) – Title of spectrogram plot. Default: Spectrogram of {KlioMessage.data.element}.

  • plot_args (dict) – keyword arguments to pass to librosa.display.specshow().

class klio_audio.transforms.audio.MelSpecToPlot(title=None, **plot_args)

Generate a matplotlib figure of the mel spectrogram of a numpy.ndarray.

This transform wraps librosa.power_to_db() followed by librosa.display.specshow() and expects a PCollection of KlioMessages where the payload is a numpy.ndarray of a melspectrogram and the output is a matplotlib.figure.Figure instance.

Example:

# run.py
import apache_beam as beam
from klio.transforms import decorators
from klio_audio.transforms import audio

@decorators.handle_klio
def element_to_filename(ctx, data):
    filename = data.element.decode("utf-8")
    return f"file:///path/to/audio/{filename}.wav"

def run(in_pcol, job_config):
    return (
        in_pcol
        | beam.Map(element_to_filename)
        | audio.LoadAudio()
        | audio.GetMelSpec()
        | audio.MelSpecToPlot()
        # other transforms
    )
Parameters
  • title (str) – Title of spectrogram plot. Default: Mel-freqency Spectrogram of {KlioMessage.data.element}.

  • plot_args (dict) – keyword arguments to pass to librosa.display.specshow().

class klio_audio.transforms.audio.MFCCToPlot(title=None, **plot_args)

Generate a matplotlib figure of the MFCCs from a numpy.ndarray.

This transform wraps librosa.display.specshow() and expects a PCollection of KlioMessages where the payload is a numpy.ndarray of the MFCCs of an audio signal and the output is a matplotlib.figure.Figure instance.

Example:

# run.py
import apache_beam as beam
from klio.transforms import decorators
from klio_audio.transforms import audio

@decorators.handle_klio
def element_to_filename(ctx, data):
    filename = data.element.decode("utf-8")
    return f"file:///path/to/audio/{filename}.wav"

def run(in_pcol, job_config):
    return (
        in_pcol
        | beam.Map(element_to_filename)
        | audio.LoadAudio()
        | audio.GetMFCC()
        | audio.MFCCToPlot()
        # other transforms
    )
Parameters
  • title (str) – Title of MFCC plot. Default: MFCCs of {KlioMessage.data.element}.

  • plot_args (dict) – keyword arguments to pass to librosa.display.specshow().

class klio_audio.transforms.audio.WaveformToPlot(num_samples=5000, title=None, **plot_args)

Generate a matplotlib figure of the wave form of a numpy.ndarray.

This transform wraps librosa.display.waveplot() and expects a PCollection of KlioMessages where the payload is a numpy.ndarray of a loaded audio file and the output is a matplotlib.figure.Figure instance.

Example:

# run.py
import apache_beam as beam
from klio.transforms import decorators
from klio_audio.transforms import audio

@decorators.handle_klio
def element_to_filename(ctx, data):
    filename = data.element.decode("utf-8")
    return f"file:///path/to/audio/{filename}.wav"

def run(in_pcol, job_config):
    return (
        in_pcol
        | beam.Map(element_to_filename)
        | audio.LoadAudio()
        | audio.WaveformToPlot()
        # other transforms
    )
Parameters
  • num_samples (int) – Number of samples to plot. Default: 5000.

  • title (str) – Title of waveform plot. Default: Waveplot of {KlioMessage.data.element}.

  • plot_args (dict) – keyword arguments to pass to librosa.display.waveplot().