Klio Executor

Latest release: 22.3.0

Current Status

Klio is currently under rapid development. This means that APIs and features will evolve. It is recommended that teams who adopt Klio today upgrade their installation as new releases become available, as backwards compatibility is not yet guaranteed.

The executor, which is not meant to be used directly, is a CLI that launches a pipeline from within a job’s Docker container (a.k.a. Apache Beam’s “driver”). Many klio-cli commands directly wrap commands in the executor: a klio-cli command sets up the Docker context needed to run the pipeline correctly via the associated klio-exec command. Setting up the Docker context includes mounting the job directory, setting environment variables, mounting credentials, etc.
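The wrapping described above can be pictured with a minimal sketch. It is not Klio's actual implementation: the function name build_docker_command, the container mount point /usr/src/app, and the assumption that the image accepts klioexec as its command are all hypothetical, chosen only to illustrate how a wrapper might assemble a docker run invocation that mounts the job directory and sets environment variables.

```python
import os


def build_docker_command(job_dir, image, exec_args, env=None):
    """Compose a `docker run` argv that calls klioexec inside a job image.

    Illustrative only: the mount point and the overall layout are
    assumptions, not what klio-cli is documented to do.
    """
    job_dir = os.path.abspath(job_dir)
    mount_point = "/usr/src/app"  # assumed path inside the container
    cmd = [
        "docker", "run", "--rm",
        "-v", f"{job_dir}:{mount_point}",  # mount the job directory
        "-w", mount_point,                 # run from the mounted directory
    ]
    # Pass each environment variable through with -e KEY=VALUE.
    for key, value in sorted((env or {}).items()):
        cmd += ["-e", f"{key}={value}"]
    # The image name is followed by the executor command to run.
    cmd += [image, "klioexec", *exec_args]
    return cmd


if __name__ == "__main__":
    argv = build_docker_command(
        ".", "example-job-image:latest", ["run", "--direct-runner"],
        env={"EXAMPLE_VAR": "1"},  # hypothetical variable
    )
    print(" ".join(argv))
```

Credentials could be mounted the same way with an additional -v entry; which paths are involved depends on the environment and is not specified here.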

As the klio-exec package is not meant to be installed directly, check out the installation guide for how to set up an installation. See also the user guide and the API documentation for more information.

klioexec

klioexec [OPTIONS] COMMAND [ARGS]...

audit

klioexec audit [OPTIONS]

Options

-c, --config-file <config_file>

Path to config filename. Defaults to klio-job.yaml in the current working directory.

--list

List available audit steps (does not run any audits).

profile

Profile a job. NOTE: Requires klio-exec[debug] installed in the job’s Docker image.

klioexec profile [OPTIONS] COMMAND [ARGS]...

cpu

Profile overall CPU usage on an interval while running all Klio-based transforms.

klioexec profile cpu [OPTIONS] [ENTITY_IDS]...

Options

--interval <interval>

Sampling period (in seconds).

Default

0.1

-i, --input-file <input_file>

File of entity IDs (separated by a new line character) with which to profile a Klio job.

-o, --output-file <output_file>

Output file for results.

-g, --plot-graph

Plot memory profile using matplotlib. Saves to klio_profile_memory_<YYYYMMDDhhmmss>.png.

Default

False

--show-logs

Show a job’s logs while profiling.

Default

False

-c, --config-file <config_file>

Path to config filename. Defaults to klio-job.yaml in the current working directory.

Arguments

ENTITY_IDS

Optional argument(s)
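As an illustration of the options above, the following sketch writes an entity ID file in the newline-separated layout that --input-file expects and assembles a matching klioexec profile cpu argument list. The helper names write_entity_ids and profile_cpu_args, the sample IDs, and the file names are hypothetical; only the flags themselves come from this reference.

```python
import os
import tempfile


def write_entity_ids(entity_ids, path):
    """Write one entity ID per line (newline-separated)."""
    with open(path, "w", encoding="utf-8") as f:
        for entity_id in entity_ids:
            f.write(f"{entity_id}\n")
    return path


def profile_cpu_args(input_file, output_file, interval=0.1, plot_graph=False):
    """Build an argv for `klioexec profile cpu` from the documented flags."""
    args = [
        "klioexec", "profile", "cpu",
        "--interval", str(interval),   # sampling period in seconds
        "--input-file", input_file,
        "--output-file", output_file,
    ]
    if plot_graph:
        args.append("--plot-graph")
    return args


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    ids_path = write_entity_ids(
        ["entity-1", "entity-2"], os.path.join(workdir, "ids.txt")
    )
    out_path = os.path.join(workdir, "cpu-profile.txt")
    print(" ".join(profile_cpu_args(ids_path, out_path, interval=0.5)))
```

Because the profile commands require klio-exec[debug] in the job's Docker image, an argument list like this is meant to be run inside that image rather than on the host.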

memory

Profile overall memory usage on an interval while running all Klio-based transforms.

klioexec profile memory [OPTIONS] [ENTITY_IDS]...

Options

--interval <interval>

Sampling period (in seconds).

Default

0.1

--include-children

Monitor forked processes as well (sums up all process memory).

Default

False

--multiprocess

Monitor forked processes creating individual plots for each child.

Default

False

-g, --plot-graph

Plot memory profile using matplotlib. Saves to klio_profile_memory_<YYYYMMDDhhmmss>.png.

Default

False

-i, --input-file <input_file>

File of entity IDs (separated by a new line character) with which to profile a Klio job.

-o, --output-file <output_file>

Output file for results.

--show-logs

Show a job’s logs while profiling.

Default

False

-c, --config-file <config_file>

Path to config filename. Defaults to klio-job.yaml in the current working directory.

Arguments

ENTITY_IDS

Optional argument(s)

memory-per-line

Profile memory per line for every Klio-based transform’s process method.

klioexec profile memory-per-line [OPTIONS] [ENTITY_IDS]...

Options

--maximum

Print maximum memory usage per line, aggregated across all input elements processed.

NOTE: This option is mutually exclusive with [--per-element].

Default

False

--per-element

Print memory usage per line for each input element processed.

NOTE: This option is mutually exclusive with [--maximum].

Default

True

-i, --input-file <input_file>

File of entity IDs (separated by a new line character) with which to profile a Klio job.

-o, --output-file <output_file>

Output file for results.

--show-logs

Show a job’s logs while profiling.

Default

False

-c, --config-file <config_file>

Path to config filename. Defaults to klio-job.yaml in the current working directory.

Arguments

ENTITY_IDS

Optional argument(s)
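Since --maximum and --per-element are mutually exclusive, a caller that builds the command programmatically may want to reject the invalid combination before invoking the executor. The sketch below shows one way to do that; the helper name memory_per_line_args is hypothetical and the check is a client-side convenience, not a description of how klioexec itself validates its options.

```python
def memory_per_line_args(entity_ids, maximum=False, per_element=False):
    """Build an argv for `klioexec profile memory-per-line`.

    Raises ValueError if both mutually exclusive flags are requested.
    """
    if maximum and per_element:
        raise ValueError("--maximum and --per-element are mutually exclusive")
    args = ["klioexec", "profile", "memory-per-line"]
    if maximum:
        args.append("--maximum")
    elif per_element:
        args.append("--per-element")
    # With neither flag given, the documented defaults apply
    # (--per-element defaults to True, --maximum to False).
    args.extend(entity_ids)
    return args


if __name__ == "__main__":
    print(" ".join(memory_per_line_args(["entity-1"], maximum=True)))
```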

timeit

Profile wall time line by line for every Klio-based transform’s process method. NOTE: this uses the line_profiler package, not Python’s timeit module.

klioexec profile timeit [OPTIONS] [ENTITY_IDS]...

Options

-i, --input-file <input_file>

File of entity IDs (separated by a new line character) with which to profile a Klio job.

-o, --output-file <output_file>

Output file for results.

-n, --iterations <iterations>

Number of times to execute each entity ID provided.

Default

10

--show-logs

Show a job’s logs while profiling.

Default

False

-c, --config-file <config_file>

Path to config filename. Defaults to klio-job.yaml in the current working directory.

Arguments

ENTITY_IDS

Optional argument(s)

run

klioexec run [OPTIONS]

Options

--image-tag <image_tag>

Docker image tag to use.

--direct-runner

Run the job locally via the DirectRunner.

--blocking, --no-blocking

Wait for the Dataflow job to finish before returning.

--update, --no-update

[Experimental] Update an existing streaming Cloud Dataflow job.

-O, --override <override>

Override a config value, in the form key=value.

-T, --template <template>

Set the value of a config template parameter, in the form key=value. Any instance of ${key} in klio-job.yaml will be replaced with value.

-j, --job-dir <job_dir>

Job directory where the job’s Dockerfile is located. Defaults to the current working directory.

-c, --config-file <config_file>

Path to config filename. If PATH is not absolute, it will be treated relative to --job-dir. Defaults to klio-job.yaml.
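To make the key=value form of --override and --template more concrete, here is a minimal sketch of how such pairs could be parsed and how ${key} placeholders could be replaced in config text. It uses Python's string.Template as a stand-in; Klio's own substitution logic may differ (for example, string.Template also expands bare $key and treats $$ as an escape). The function names and the sample keys are hypothetical.

```python
from string import Template


def parse_pairs(pairs):
    """Parse ["key=value", ...] into a dict, splitting on the first '='."""
    result = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        result[key] = value
    return result


def render_templates(config_text, templates):
    """Replace ${key} placeholders with their values.

    safe_substitute leaves placeholders without a matching key untouched
    instead of raising an error.
    """
    return Template(config_text).safe_substitute(templates)


if __name__ == "__main__":
    # Hypothetical config snippet and template parameter.
    config_text = "job_name: ${name}-job\n"
    templates = parse_pairs(["name=demo"])
    print(render_templates(config_text, templates), end="")
```

Splitting on the first "=" only means a value may itself contain "=" characters.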

stop

klioexec stop [OPTIONS]

Options

-c, --config-file <config_file>

Path to config filename. Defaults to klio-job.yaml in the current working directory.

test

Thin wrapper around pytest. Any arguments after -- are passed through.

klioexec test [OPTIONS] [PYTEST_ARGS]...

Options

-c, --config-file <config_file>

Path to config filename. Defaults to klio-job.yaml in the current working directory.

Arguments

PYTEST_ARGS

Optional argument(s)
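The pass-through behavior follows the common command-line convention where a bare -- separates a tool's own options from arguments meant for a wrapped program. The sketch below illustrates that convention only; the function name split_passthrough is hypothetical, and the executor's CLI framework may handle the separator itself rather than with code like this.

```python
def split_passthrough(argv):
    """Split argv at the first bare "--".

    Returns (own_args, passthrough_args); the separator itself is dropped.
    """
    if "--" in argv:
        idx = argv.index("--")
        return list(argv[:idx]), list(argv[idx + 1:])
    return list(argv), []


if __name__ == "__main__":
    # Corresponds to: klioexec test -c klio-job.yaml -- -k test_transform -x
    own, pytest_args = split_passthrough(
        ["-c", "klio-job.yaml", "--", "-k", "test_transform", "-x"]
    )
    print(own)
    print(pytest_args)
```

Here -k and -x are standard pytest options (select tests by keyword expression, stop after the first failure); the test name is made up for the example.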