klio job
Create the necessary files for a new Klio job.
klio job create [OPTIONS] [ADDL_JOB_OPTS]...
Name of your new job
Name of the GCP project the job should be created in
Output directory. Defaults to current working directory.
Accept default values.
Optional argument(s)
View and edit a Klio job’s configuration.
klio job config [OPTIONS] COMMAND [ARGS]...
Get the value for a configuration property of a Klio job.
klio job config get [OPTIONS] SECTION.PROPERTY
Job directory where the job’s Dockerfile is located. Defaults to the current working directory.
Path to the config file. If PATH is not absolute, it is treated as relative to --job-dir. Defaults to klio-job.yaml.
Required argument
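The config-path rule above (an absolute path is used as given, a relative one is resolved against --job-dir, which itself defaults to the current working directory) can be sketched as follows. This is a hypothetical helper written for illustration, not Klio’s own code.

```python
import os


def resolve_config_path(job_dir=None, config_file="klio-job.yaml"):
    """Hypothetical helper: resolve a config path as the option text describes."""
    # --job-dir defaults to the current working directory.
    if job_dir is None:
        job_dir = os.getcwd()
    # Absolute paths are used unchanged.
    if os.path.isabs(config_file):
        return config_file
    # Relative paths are joined onto the job directory.
    return os.path.join(job_dir, config_file)
```

For example, `resolve_config_path("/jobs/my-job", "conf/alt.yaml")` yields `/jobs/my-job/conf/alt.yaml` on a POSIX system.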
Set a configuration value for a Klio job. Multiple pairs of SECTION.PROPERTY=VALUE are accepted.
klio job config set [OPTIONS] SECTION.PROPERTY=VALUE...
Required argument(s)
Show the complete effective configuration for a Klio job.
klio job config show [OPTIONS]
Unset a configuration value for a Klio job.
klio job config unset [OPTIONS] SECTION.PROPERTY
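To make the SECTION.PROPERTY addressing used by get, set, and unset concrete, here is a minimal sketch that applies the same dotted keys to a nested dictionary standing in for a parsed klio-job.yaml. The function names and the sample config are hypothetical; values are kept as plain strings, and how Klio itself converts value types is not shown.

```python
def parse_assignment(pair):
    """Split 'SECTION.PROPERTY=VALUE' into (['SECTION', 'PROPERTY'], 'VALUE')."""
    # Split on the first '=' only, so VALUE may itself contain '='.
    key, sep, value = pair.partition("=")
    if not sep or not key:
        raise ValueError(f"expected SECTION.PROPERTY=VALUE, got {pair!r}")
    return key.split("."), value


def config_get(config, dotted_key):
    node = config
    for part in dotted_key.split("."):
        node = node[part]  # raises KeyError if the property is missing
    return node


def config_set(config, *pairs):
    # Multiple SECTION.PROPERTY=VALUE pairs are applied in order.
    for pair in pairs:
        path, value = parse_assignment(pair)
        node = config
        for part in path[:-1]:
            node = node.setdefault(part, {})  # create missing sections
        node[path[-1]] = value


def config_unset(config, dotted_key):
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        node = node[part]
    del node[leaf]
```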
Run a Klio job.
klio job run [OPTIONS]
Docker image tag to use
[Experimental] Update an existing streaming Cloud Dataflow job.
Run the job locally via the DirectRunner.
Build Docker image even if you already have it.
Override a config value, in the form key=value.
Set the value of a config template parameter, in the form key=value. Any instance of ${key} in klio-job.yaml will be replaced with value.
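The template-parameter option replaces each `${key}` in klio-job.yaml with the supplied value. A minimal sketch of that substitution follows; the function names are hypothetical, the key pattern (letters, digits, underscores) is an assumption, and leaving unknown placeholders untouched is this sketch’s choice rather than documented Klio behavior.

```python
import re

# Matches ${key} only; a bare $key is left untouched.
_TEMPLATE_PARAM = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def parse_key_value(pairs):
    """Turn ['key=value', ...] into a dict, splitting on the first '='."""
    params = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        params[key] = value
    return params


def apply_template_params(text, params):
    """Replace each ${key} found in params; leave unknown keys as-is."""
    def _sub(match):
        return params.get(match.group(1), match.group(0))
    return _TEMPLATE_PARAM.sub(_sub, text)
```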
Deploy a job. This will first cancel any currently running job of the same name & region.
NOTE: Draining is not supported.
klio job deploy [OPTIONS]
Cancel a currently running job.
NOTE: Draining is not supported.
klio job stop [OPTIONS]
Name of job, if neither --job-dir nor --config-file is provided.
Region of job, if neither --job-dir nor --config-file is provided.
Project of job, if neither --job-dir nor --config-file is provided.
Delete GCP-related resources created by a Klio job.
klio job delete [OPTIONS]
Verify all GCP resources and dependencies used by the job, so that the Klio job as defined in klio-info.yaml can run properly in production.
klio job verify [OPTIONS]
Create missing GCP resources based on klio-info.yaml. Default: False
Run unit tests for job.
klio job test [OPTIONS] [PYTEST_ARGS]...
Profile a job.
NOTE: Requires klio-exec[debug] to be installed in the job’s Docker image.
klio job profile [OPTIONS] COMMAND [ARGS]...
Collect and view profiling output stored in GCS. Sorting and restrictions are supported as provided by Python’s pstats Stats class.
NOTE: This requires running the Klio job on Dataflow with pipeline_options.profile_location set to a GCS bucket, and either or both of pipeline_options.profile_cpu and pipeline_options.profile_memory set to True in klio-job.yaml.
klio job profile collect-profiling-data [OPTIONS] [RESTRICTIONS]...
NOTE: This option is mutually exclusive with [--input-file, --gcs-location].
GCS location of cProfile data.
NOTE: This option is mutually exclusive with [--config-file, --input-file, --job-dir].
Start time, relative or absolute (interpreted by dateparser.parse).
End time, relative or absolute (interpreted by dateparser.parse).
Print stats from previously saved output.
NOTE: This option is mutually exclusive with [--output-file, --config-file, --gcs-location, --job-dir].
Dump collected cProfile data to a desired output file.
NOTE: This option is mutually exclusive with [--input-file].
Sort the profiling statistics output by any key supported by sort_stats. --sort-stats may be given multiple times.
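The sorting and restriction behavior referred to above comes from Python’s standard `pstats` module. The sketch below profiles a toy function with `cProfile`, then loads the saved data, sorts by two keys (analogous to repeating --sort-stats), and applies two restrictions (a regex filter, then a row cap). It assumes, as the description suggests, that Klio passes these values through to `pstats.Stats.sort_stats` and `print_stats`; `busy`, `report`, and the file name are illustrative only.

```python
import cProfile
import io
import os
import pstats
import tempfile


def busy(n):
    """Toy workload standing in for a Klio transform."""
    return sum(i * i for i in range(n))


def profile_to_file(path):
    profiler = cProfile.Profile()
    profiler.enable()
    busy(100_000)
    profiler.disable()
    profiler.dump_stats(path)  # writes cProfile data, like a saved profile file


def report(paths, sort_keys, restrictions):
    buf = io.StringIO()
    stats = pstats.Stats(*paths, stream=buf)  # several files are merged
    stats.sort_stats(*sort_keys)
    stats.print_stats(*restrictions)  # regex strings, ints (row count), floats (fraction)
    return buf.getvalue()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "job.prof")
        profile_to_file(path)
        return report([path], ["cumulative", "tottime"], ["busy", 5])
```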
Profile overall CPU usage on an interval while running all Klio-based transforms.
klio job profile cpu [OPTIONS] [ENTITY_IDS]...
Sampling period (in seconds).
Plot memory profile using matplotlib. Saves to klio_profile_memory_<YYYYMMDDhhmmss>.png.
File of newline-separated entity IDs with which to profile a Klio job. If the file path is not absolute, it is treated as relative to --job-dir.
Output file for results. [default: stdout]
Show a job’s logs while profiling.
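The entity-ID file described above holds one ID per line. A minimal sketch of reading it follows, with the relative path resolved against the job directory. The function names are hypothetical, and stripping whitespace and skipping blank lines is an assumption of this sketch, not documented Klio behavior.

```python
import os


def parse_entity_ids(text):
    """One entity ID per line; surrounding whitespace and blank lines are dropped."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def entity_ids_path(path, job_dir=None):
    """Resolve a relative path against the job directory (default: cwd)."""
    if os.path.isabs(path):
        return path
    return os.path.join(job_dir or os.getcwd(), path)


def read_entity_ids(path, job_dir=None):
    with open(entity_ids_path(path, job_dir), encoding="utf-8") as f:
        return parse_entity_ids(f.read())
```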
Profile overall memory usage on an interval while running all Klio-based transforms.
klio job profile memory [OPTIONS] [ENTITY_IDS]...
Monitor forked processes as well (sums up all process memory).
Monitor forked processes creating individual plots for each child.
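Interval-based memory profiling means taking a memory reading every sampling period while the workload runs. The sketch below illustrates the idea with the standard library’s `tracemalloc`, a swapped-in technique: it only tracks Python-level allocations in the current process and does not cover forked child processes, so it is not how Klio measures memory. `sample_memory` and `workload` are hypothetical.

```python
import threading
import time
import tracemalloc


def sample_memory(workload, interval=0.01):
    """Run workload in a thread, recording traced memory every `interval` seconds."""
    samples = []
    tracemalloc.start()
    try:
        worker = threading.Thread(target=workload)
        worker.start()
        while worker.is_alive():
            current, _peak = tracemalloc.get_traced_memory()
            samples.append(current)  # bytes currently allocated by Python
            time.sleep(interval)
        worker.join()
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return samples, peak


def workload():
    """Toy workload: hold roughly 2 MiB for a short while."""
    data = [bytes(1024) for _ in range(2000)]
    time.sleep(0.05)
    del data
```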
Profile memory usage per line of the process method of every Klio-based transform.
klio job profile memory-per-line [OPTIONS] [ENTITY_IDS]...
Print the maximum memory usage per line, aggregated across all processed input elements.
NOTE: This option is mutually exclusive with [--per-element].
Print memory usage per line for each input element processed.
NOTE: This option is mutually exclusive with [--maximum].
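The difference between the two mutually exclusive report modes above can be shown on made-up per-line figures: the maximum mode keeps the largest value seen for each line across all elements, while the per-element mode reports each element separately. The data layout, units, and function names below are hypothetical, and returning the raw data when neither flag is set is this sketch’s choice, not Klio’s documented default.

```python
def max_per_line(per_element):
    """Maximum-style: largest usage per line across all input elements."""
    result = {}
    for usage in per_element.values():
        for line, mib in usage.items():
            result[line] = max(mib, result.get(line, mib))
    return result


def format_per_element(per_element):
    """Per-element style: one block of per-line usage for each entity ID."""
    out = []
    for entity_id, usage in per_element.items():
        out.append(f"{entity_id}:")
        for line, mib in sorted(usage.items()):
            out.append(f"  line {line}: {mib:.1f} MiB")
    return "\n".join(out)


def memory_report(per_element, maximum=False, by_element=False):
    if maximum and by_element:
        raise ValueError("--maximum and --per-element are mutually exclusive")
    if maximum:
        return max_per_line(per_element)
    if by_element:
        return format_per_element(per_element)
    return per_element
```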
Profile wall time per line of the process method of every Klio-based transform.
NOTE: This uses the line_profiler package, not Python’s timeit module.
klio job profile timeit [OPTIONS] [ENTITY_IDS]...
Number of times to execute each entity ID provided.
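The iteration option means each entity ID is run through the job that many times. The sketch below illustrates only that repetition: it times whole calls with `time.perf_counter` rather than per-line timings from line_profiler, and `time_entity_ids` and `process` are hypothetical stand-ins.

```python
import time


def time_entity_ids(process, entity_ids, iterations=1):
    """Call process(entity_id) `iterations` times per ID; return total wall time per ID."""
    totals = {}
    for entity_id in entity_ids:
        start = time.perf_counter()
        for _ in range(iterations):
            process(entity_id)
        totals[entity_id] = time.perf_counter() - start
    return totals
```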
Audit a job to detect common issues by running tests with additional mocking.
NOTE: Additional arguments to pytest are not supported.
klio job audit [OPTIONS]
List available audit steps (does not run any audits).