latest release: 22.3.0 (What’s new?)
Current Status
Klio is currently under rapid development. This means that APIs and features will evolve. It is recommended that teams who adopt Klio today upgrade their installation as new releases become available, as backwards compatibility is not yet guaranteed.
The executor, which is not meant to be used directly by the user, is a CLI that launches a pipeline from within a job’s Docker container (a.k.a. an Apache Beam “driver”). Many klio-cli commands directly wrap commands in the executor: a klio-cli command sets up the Docker context needed to run the pipeline correctly via the associated klio-exec command. Setting up the Docker context includes mounting the job directory, setting environment variables, mounting credentials, etc.
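To make the wrapping concrete, here is a minimal sketch of how a wrapper could assemble a `docker run` invocation that mounts the job directory, sets environment variables, and mounts credentials before handing off to klioexec. This is not klio-cli's actual implementation: the helper name `build_docker_command`, the container paths, the image name, and the environment variable are all assumptions made for illustration.

```python
import os
import shlex


def build_docker_command(job_dir, image, exec_args, env=None, creds_dir=None):
    """Assemble a `docker run` argv that wraps a klioexec command.

    Hypothetical helper: the container paths below are assumptions,
    not the paths klio-cli actually uses.
    """
    cmd = ["docker", "run", "--rm"]
    # Mount the job directory into the container and work from there.
    cmd += ["-v", f"{os.path.abspath(job_dir)}:/usr/src/app", "-w", "/usr/src/app"]
    # Mount credentials read-only, if a directory was given.
    if creds_dir:
        cmd += ["-v", f"{os.path.abspath(creds_dir)}:/credentials:ro"]
    # Pass environment variables through to the container.
    for key, value in sorted((env or {}).items()):
        cmd += ["-e", f"{key}={value}"]
    # The image comes next, then the executor command to run inside it.
    cmd += [image, "klioexec", *exec_args]
    return cmd


command = build_docker_command(
    job_dir=".",
    image="my-klio-job:latest",  # hypothetical image name
    exec_args=["run", "--direct-runner"],
    env={"EXAMPLE_VAR": "1"},  # hypothetical variable
)
print(shlex.join(command))
```

The sketch only builds and prints the command; nothing is executed, so it can be inspected without Docker installed.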
klio-cli
klio-exec
As the klio-exec package is not meant to be installed directly, see the installation guide for setup instructions. The user guide and the API documentation provide more information.
klioexec [OPTIONS] COMMAND [ARGS]...
klioexec audit [OPTIONS]
Options
-c, --config-file <config_file>
Path to the config file. Defaults to klio-job.yaml in the current working directory.
--list
List available audit steps (does not run any audits).
Profile a job. NOTE: Requires klio-exec[debug] to be installed in the job’s Docker image.
klioexec profile [OPTIONS] COMMAND [ARGS]...
Profile overall CPU usage on an interval while running all Klio-based transforms.
klioexec profile cpu [OPTIONS] [ENTITY_IDS]...
--interval <interval>
Sampling period (in seconds).
Default: 0.1
-i, --input-file <input_file>
File of entity IDs (one per line) with which to profile a Klio job.
-o, --output-file <output_file>
Output file for results.
-g, --plot-graph
Plot memory profile using matplotlib. Saves to klio_profile_memory_<YYYYMMDDhhmmss>.png.
Default: False
--show-logs
Show a job’s logs while profiling.
Arguments
ENTITY_IDS
Optional argument(s)
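The file given to --input-file holds one entity ID per line. A minimal sketch of preparing and reading such a file follows; the helper `read_entity_ids`, the file name, and the sample IDs are hypothetical, and skipping blank lines is an assumption rather than documented klioexec behavior.

```python
import tempfile
from pathlib import Path


def read_entity_ids(path):
    """Return the non-empty, stripped lines of a newline-separated ID file."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


# Write a small sample file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    ids_file = Path(tmp) / "entity_ids.txt"  # hypothetical file name
    ids_file.write_text("id-001\nid-002\n\nid-003\n", encoding="utf-8")
    entity_ids = read_entity_ids(ids_file)

print(entity_ids)
```

The same IDs could instead be passed on the command line as the ENTITY_IDS argument(s).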
Profile overall memory usage on an interval while running all Klio-based transforms.
klioexec profile memory [OPTIONS] [ENTITY_IDS]...
--include-children
Monitor forked processes as well (sums up all process memory).
--multiprocess
Monitor forked processes creating individual plots for each child.
Profile memory per line for every Klio-based transform’s process method.
klioexec profile memory-per-line [OPTIONS] [ENTITY_IDS]...
--maximum
Print the maximum memory usage per line, aggregated across all input elements processed.
NOTE: This option is mutually exclusive with [--per-element].
--per-element
Print memory usage per line for each input element processed.
NOTE: This option is mutually exclusive with [--maximum].
Default: True
Profile wall time by line for every Klio-based transform’s process method. NOTE: this uses the line_profiler package, not Python’s timeit module.
klioexec profile timeit [OPTIONS] [ENTITY_IDS]...
-n, --iterations <iterations>
Number of times to execute each entity ID provided.
Default: 10
klioexec run [OPTIONS]
--image-tag <image_tag>
Docker image tag to use.
--direct-runner
Run the job locally via the DirectRunner.
--blocking, --no-blocking
Wait for the Dataflow job to finish before returning.
--update, --no-update
[Experimental] Update an existing streaming Cloud Dataflow job.
-O, --override <override>
Override a config value, in the form key=value.
-T, --template <template>
Set the value of a config template parameter, in the form key=value. Any instance of ${key} in klio-job.yaml will be replaced with value.
-j, --job-dir <job_dir>
Job directory where the job’s Dockerfile is located. Defaults to the current working directory.
Path to the config file. If PATH is not absolute, it is treated as relative to --job-dir. Defaults to klio-job.yaml.
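The key=value form used by --override and --template, the ${key} substitution, and the rule for relative config paths can be illustrated with a short sketch. It is not Klio's implementation: the helper names are hypothetical, Python's `string.Template` stands in for whatever substitution Klio performs (it also expands bare `$key`), and leaving unknown placeholders untouched is an assumption.

```python
import os
from string import Template


def parse_pairs(pairs):
    """Turn ["key=value", ...] into a dict, splitting on the first "="."""
    result = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        result[key] = value
    return result


def render_templates(config_text, params):
    """Replace each ${key} with its value; unknown placeholders stay as-is."""
    return Template(config_text).safe_substitute(params)


def resolve_config_path(config_file, job_dir):
    """Treat a relative config path as relative to the job directory."""
    if os.path.isabs(config_file):
        return config_file
    return os.path.join(job_dir, config_file)


raw_config = "job_name: ${name}\nregion: ${region}\n"  # hypothetical config text
params = parse_pairs(["name=my-job"])
rendered = render_templates(raw_config, params)
config_path = resolve_config_path("klio-job.yaml", "/jobs/my-job")

print(rendered)
print(config_path)
```

Splitting on the first "=" only means a value may itself contain "=", for example `-T query=a=b`.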
klioexec stop [OPTIONS]
Thin wrapper around pytest. Any arguments after -- are passed through.
klioexec test [OPTIONS] [PYTEST_ARGS]...
PYTEST_ARGS
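The pass-through follows the common command-line convention in which a `--` separator ends the wrapper's own options and everything after it is handed to the wrapped tool. A minimal sketch of that convention follows; `split_passthrough` is a hypothetical helper, not part of klioexec, and the pytest selection `test_transform` is a made-up test name.

```python
def split_passthrough(argv):
    """Split argv at the first "--" into (own_args, passthrough_args)."""
    if "--" in argv:
        index = argv.index("--")
        return argv[:index], argv[index + 1:]
    return list(argv), []


# Equivalent in spirit to: klioexec test -- -k test_transform -x
own_args, pytest_args = split_passthrough(
    ["test", "--", "-k", "test_transform", "-x"]
)
print(own_args)
print(pytest_args)
```

Here `-k` and `-x` are ordinary pytest options (select tests by expression, stop at the first failure) that would reach pytest unchanged.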