klio_audio.transforms.audio
LoadAudio
Load audio into memory as a numpy.ndarray.
This transform wraps librosa.load(). It takes a PCollection of KlioMessages whose payload is either a file-like object or a path to a file, and returns a PCollection of KlioMessages whose payload is a numpy.ndarray.
Example:
    # run.py
    import apache_beam as beam
    from klio.transforms import decorators
    from klio_audio.transforms import audio


    @decorators.handle_klio
    def element_to_filename(ctx, data):
        filename = data.element.decode("utf-8")
        return f"file:///path/to/audio/{filename}.wav"


    def run(in_pcol, job_config):
        return (
            in_pcol
            | beam.Map(element_to_filename)
            | audio.LoadAudio()
            # other transforms
        )
librosa_kwargs (dict) – Instantiate the transform with keyword arguments to pass into librosa.load().
GetSTFT
Calculate the short-time Fourier transform (STFT) of a numpy.ndarray.
This transform wraps librosa.stft(). It expects a PCollection of KlioMessages whose payload is a numpy.ndarray, and outputs a PCollection of the same form with librosa.stft() applied to the payload.
The Short-time Fourier transform (STFT) is a Fourier-related transform used to determine the sinusoidal frequency and phase content of local sections of a signal as it changes over time. STFT provides the time-localized frequency information for situations in which frequency components of a signal vary over time, whereas the standard Fourier transform provides the frequency information averaged over the entire signal time interval.
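To make the idea of time-localized frequency content concrete, here is a minimal, dependency-free sketch of an STFT. It is not how librosa.stft() is implemented: it uses a naive DFT, a fixed Hann window, and no padding or centering, and the names stft_magnitudes, frame_len, and hop are hypothetical choices for this illustration only.

```python
import cmath
import math


def stft_magnitudes(signal, frame_len=64, hop=32):
    """Return one list of magnitudes (bins 0..frame_len // 2) per frame."""
    # Hann window tapers each frame to reduce spectral leakage.
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
        for n in range(frame_len)
    ]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Naive DFT of the windowed frame (slow, but fine for a toy signal).
        spectrum = [
            abs(sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            ))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames


# Toy signal: a low tone (4 cycles per frame) followed by a higher tone
# (12 cycles per frame).
half = 256
low = [math.sin(2 * math.pi * 4 * n / 64) for n in range(half)]
high = [math.sin(2 * math.pi * 12 * n / 64) for n in range(half)]
signal = low + high

frames = stft_magnitudes(signal)
# Strongest frequency bin in each frame.
peak_bins = [max(range(len(f)), key=f.__getitem__) for f in frames]
```

In this sketch the strongest bin moves from the low tone in the early frames to the high tone in the late frames, which is the information a single Fourier transform over the whole signal would not localize in time.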
    # run.py
    import apache_beam as beam
    from klio.transforms import decorators
    from klio_audio.transforms import audio


    @decorators.handle_klio
    def element_to_filename(ctx, data):
        filename = data.element.decode("utf-8")
        return f"file:///path/to/audio/{filename}.wav"


    def run(in_pcol, job_config):
        return (
            in_pcol
            | beam.Map(element_to_filename)
            | audio.LoadAudio()
            | audio.GetSTFT()
            # other transforms
        )
librosa_kwargs (dict) – Instantiate the transform with keyword arguments to pass into librosa.stft().
GetSpec
Generate a dB-scaled spectrogram from a numpy.ndarray.
This transform wraps librosa.amplitude_to_db(). It expects a PCollection of KlioMessages whose payload is a numpy.ndarray, and outputs a PCollection of the same form with librosa.amplitude_to_db() applied to the payload.
A spectrogram shows the intensity of frequencies over time.
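As a rough illustration of what "dB-scaled" means, the sketch below converts amplitude values to decibels in plain Python. It is a simplified approximation, not librosa's implementation; the function name amplitude_to_db_sketch is hypothetical, and the parameter names and defaults (ref, amin, top_db) are assumptions chosen to resemble commonly documented ones.

```python
import math


def amplitude_to_db_sketch(magnitudes, ref=1.0, amin=1e-5, top_db=80.0):
    """Convert amplitude values to decibels relative to ``ref``."""
    # Floor tiny values at ``amin`` so log10 never sees zero.
    ref_db = 20.0 * math.log10(max(amin, abs(ref)))
    db = [
        20.0 * math.log10(max(amin, abs(m))) - ref_db
        for m in magnitudes
    ]
    # Limit the dynamic range to ``top_db`` below the loudest value.
    floor = max(db) - top_db
    return [max(value, floor) for value in db]


# Full scale, one tenth of full scale, and silence.
db_values = amplitude_to_db_sketch([1.0, 0.1, 0.0])
# Roughly [0.0, -20.0, -80.0]: each factor of 10 in amplitude is 20 dB,
# and silence is clipped to 80 dB below the peak.
```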
    # run.py
    import apache_beam as beam
    from klio.transforms import decorators
    from klio_audio.transforms import audio


    @decorators.handle_klio
    def element_to_filename(ctx, data):
        filename = data.element.decode("utf-8")
        return f"file:///path/to/audio/{filename}.wav"


    def run(in_pcol, job_config):
        return (
            in_pcol
            | beam.Map(element_to_filename)
            | audio.LoadAudio()
            | audio.GetSpec()
            # other transforms
        )
librosa_kwargs (dict) – Instantiate the transform with keyword arguments to pass into librosa.amplitude_to_db().
GetMelSpec
Generate a spectrogram from a numpy.ndarray using the mel scale.
This transform wraps librosa.feature.melspectrogram(). It expects a PCollection of KlioMessages whose payload is a numpy.ndarray, and outputs a PCollection of the same form with librosa.feature.melspectrogram() applied to the payload.
The mel scale is a non-linear transformation of frequency scale based on the perception of pitches. The mel scale is calculated so that two pairs of frequencies separated by a delta in the mel scale are perceived by humans as being equidistant.
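The non-linearity can be seen with a small sketch of one common Hz-to-mel formula (the HTK-style formula). This is an assumption made for illustration: librosa's default mel conversion uses a different (Slaney-style) formula, and the helper names hz_to_mel and mel_to_hz below are defined locally rather than imported from any library.

```python
import math


def hz_to_mel(freq_hz):
    """HTK-style Hz -> mel conversion."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# Five points spaced evenly on the mel axis between 0 Hz and 8000 Hz.
top_mel = hz_to_mel(8000.0)
mel_points = [i * top_mel / 4 for i in range(5)]
hz_points = [mel_to_hz(m) for m in mel_points]

# Equal mel steps map to progressively wider steps in Hz.
hz_steps = [b - a for a, b in zip(hz_points, hz_points[1:])]
```

Under this formula, points that are equally spaced in mel get further apart in Hz as frequency rises, reflecting that pitch differences are harder to hear at high frequencies.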
    # run.py
    import apache_beam as beam
    from klio.transforms import decorators
    from klio_audio.transforms import audio


    @decorators.handle_klio
    def element_to_filename(ctx, data):
        filename = data.element.decode("utf-8")
        return f"file:///path/to/audio/{filename}.wav"


    def run(in_pcol, job_config):
        return (
            in_pcol
            | beam.Map(element_to_filename)
            | audio.LoadAudio()
            | audio.GetMelSpec()
            # other transforms
        )
librosa_kwargs (dict) – Instantiate the transform with keyword arguments to pass into librosa.feature.melspectrogram().
GetMFCC
Calculate MFCCs from a numpy.ndarray.
This transform wraps librosa.power_to_db() followed by librosa.feature.mfcc(). It expects a PCollection of KlioMessages whose payload is a numpy.ndarray, and outputs a PCollection of the same form with librosa.feature.mfcc() applied to the payload.
The mel-frequency cepstral coefficients (MFCCs) of a signal are a small set of features (usually about 10–20) that describe the overall shape of a spectral envelope. They are often used to describe timbre or to model characteristics of the human voice.
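One way to see how MFCCs summarize spectral shape is the sketch below, which takes a single frame of mel-band power values, converts it to decibels, and applies a discrete cosine transform (DCT-II). It is a simplified illustration under stated assumptions: the DCT is unnormalized, the helpers dct_ii and mfcc_frame are hypothetical names, and the 40-band input is made-up data rather than output from librosa.

```python
import math


def dct_ii(values, n_coeffs):
    """Unnormalized DCT-II of ``values``, keeping the first ``n_coeffs``."""
    n = len(values)
    return [
        sum(
            v * math.cos(math.pi * k * (i + 0.5) / n)
            for i, v in enumerate(values)
        )
        for k in range(n_coeffs)
    ]


def mfcc_frame(mel_power, n_mfcc=13, amin=1e-10):
    """MFCC-like coefficients for one frame of mel-band power values."""
    # Power -> dB, flooring tiny values so log10 never sees zero.
    log_mel = [10.0 * math.log10(max(amin, p)) for p in mel_power]
    return dct_ii(log_mel, n_mfcc)


# A flat spectrum has no shape: only coefficient 0 (overall level) is
# non-zero, up to floating-point error.
flat = mfcc_frame([1e-3] * 40)

# A spectrum that falls by 1 dB per band has a tilt, which shows up in
# the low-order coefficients.
tilted = mfcc_frame([10.0 ** (-band / 10.0) for band in range(40)])
```

Keeping only the first few coefficients discards fine detail in the spectrum and retains its coarse envelope, which is why a small number of MFCCs is usually enough.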
    # run.py
    import apache_beam as beam
    from klio.transforms import decorators
    from klio_audio.transforms import audio


    @decorators.handle_klio
    def element_to_filename(ctx, data):
        filename = data.element.decode("utf-8")
        return f"file:///path/to/audio/{filename}.wav"


    def run(in_pcol, job_config):
        return (
            in_pcol
            | beam.Map(element_to_filename)
            | audio.LoadAudio()
            | audio.GetMFCC()
            # other transforms
        )
librosa_kwargs (dict) – Instantiate the transform with keyword arguments to pass into librosa.feature.mfcc().
SpecToPlot
Generate a matplotlib figure of the spectrogram of a numpy.ndarray.
This transform wraps librosa.display.specshow() and expects a PCollection of KlioMessages where the payload is a numpy.ndarray of a spectrogram and the output is a matplotlib.figure.Figure instance.
    # run.py
    import apache_beam as beam
    from klio.transforms import decorators
    from klio_audio.transforms import audio


    @decorators.handle_klio
    def element_to_filename(ctx, data):
        filename = data.element.decode("utf-8")
        return f"file:///path/to/audio/{filename}.wav"


    def run(in_pcol, job_config):
        return (
            in_pcol
            | beam.Map(element_to_filename)
            | audio.LoadAudio()
            | audio.GetSpec()
            | audio.SpecToPlot()
            # other transforms
        )
title (str) – Title of spectrogram plot. Default: Spectrogram of {KlioMessage.data.element}.
plot_args (dict) – keyword arguments to pass to librosa.display.specshow().
MelSpecToPlot
Generate a matplotlib figure of the mel spectrogram of a numpy.ndarray.
This transform wraps librosa.power_to_db() followed by librosa.display.specshow() and expects a PCollection of KlioMessages where the payload is a numpy.ndarray of a melspectrogram and the output is a matplotlib.figure.Figure instance.
    # run.py
    import apache_beam as beam
    from klio.transforms import decorators
    from klio_audio.transforms import audio


    @decorators.handle_klio
    def element_to_filename(ctx, data):
        filename = data.element.decode("utf-8")
        return f"file:///path/to/audio/{filename}.wav"


    def run(in_pcol, job_config):
        return (
            in_pcol
            | beam.Map(element_to_filename)
            | audio.LoadAudio()
            | audio.GetMelSpec()
            | audio.MelSpecToPlot()
            # other transforms
        )
title (str) – Title of spectrogram plot. Default: Mel-freqency Spectrogram of {KlioMessage.data.element}.
MFCCToPlot
Generate a matplotlib figure of MFCCs from a numpy.ndarray.
This transform wraps librosa.display.specshow() and expects a PCollection of KlioMessages where the payload is a numpy.ndarray of the MFCCs of an audio signal and the output is a matplotlib.figure.Figure instance.
    # run.py
    import apache_beam as beam
    from klio.transforms import decorators
    from klio_audio.transforms import audio


    @decorators.handle_klio
    def element_to_filename(ctx, data):
        filename = data.element.decode("utf-8")
        return f"file:///path/to/audio/{filename}.wav"


    def run(in_pcol, job_config):
        return (
            in_pcol
            | beam.Map(element_to_filename)
            | audio.LoadAudio()
            | audio.GetMFCC()
            | audio.MFCCToPlot()
            # other transforms
        )
title (str) – Title of the MFCC plot. Default: MFCCs of {KlioMessage.data.element}.
WaveformToPlot
Generate a matplotlib figure of the waveform of a numpy.ndarray.
This transform wraps librosa.display.waveplot() and expects a PCollection of KlioMessages where the payload is a numpy.ndarray of a loaded audio file and the output is a matplotlib.figure.Figure instance.
    # run.py
    import apache_beam as beam
    from klio.transforms import decorators
    from klio_audio.transforms import audio


    @decorators.handle_klio
    def element_to_filename(ctx, data):
        filename = data.element.decode("utf-8")
        return f"file:///path/to/audio/{filename}.wav"


    def run(in_pcol, job_config):
        return (
            in_pcol
            | beam.Map(element_to_filename)
            | audio.LoadAudio()
            | audio.WaveformToPlot()
            # other transforms
        )
num_samples (int) – Number of samples to plot. Default: 5000.
title (str) – Title of the waveform plot. Default: Waveplot of {KlioMessage.data.element}.
plot_args (dict) – keyword arguments to pass to librosa.display.waveplot().