Hello Klio Batch Example

This guide shows you how to set up your development environment for Klio, create an example Klio batch job, and run it on DirectRunner. If you are interested in building a streaming Klio job, check out the Klio Streaming Quickstart Guide.

Attention

Be sure to follow the installation instructions before continuing on.

Create a New Klio Batch Job

First, initialize the klio_quickstart project directory for git:

$ git init

Caution

If the klio-cli was installed via option 2 or option 3, make sure your klio-cli virtualenv is activated.

Next, within your project directory, run the following command:

$ klio job create \
  --job-name klio-quick-start \
  --job-type batch \
  --create-resources \
  --use-defaults

After responding to the prompts, Klio will:

  1. Create a GCS bucket for output data in the GCP project you provided in the prompt: gs://$GCP_PROJECT-output.

  2. Create a Google Stackdriver dashboard in the provided GCP project for you to monitor runtime job metrics.

  3. Create required files within the current working directory.

Then, commit the created job files into git:

$ git add .
$ git commit -m "Initial commit for Klio quickstart example"

Run the New Klio Job

Caution

If the klio-cli was installed via option 2 or option 3, make sure your klio-cli virtualenv is activated.

First, add a text file of element IDs to use as input.

The default event input points to a text file, klio-quick-start_input_elements.txt, in a GCS bucket. To start testing quickly, replace this event input with a local file. You can create the local file manually in the top-level job directory for this quickstart, with one line of text per element. On a Linux machine, you can also create it by running the following command from the Klio job directory:

$ { echo hello
    echo world
  } > klio-quick-start_input_elements.txt

This will create a local file with two lines of text that will serve as the event inputs of the batch Klio job.
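
On a machine without a POSIX shell, the same two-line file could be created with a short Python script instead. This is a minimal sketch: the helper name write_input_elements is hypothetical and not part of Klio, and only the file name comes from this guide.

```python
from pathlib import Path

# File name used in this quickstart; run the script from the Klio job directory.
INPUT_FILE = "klio-quick-start_input_elements.txt"


def write_input_elements(elements, path=INPUT_FILE):
    """Write one element per line (hypothetical helper, not a Klio API)."""
    target = Path(path)
    # newline="\n" keeps line endings identical to the shell version on any OS.
    with open(target, "w", encoding="utf-8", newline="\n") as handle:
        for element in elements:
            handle.write(f"{element}\n")
    return target


if __name__ == "__main__":
    write_input_elements(["hello", "world"])
```

Running the script from the job directory should leave the same file contents as the shell command above.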

Then, run the job using DirectRunner:

$ klio job run --direct-runner

Klio will first build a Docker image of the example job with the required dependencies, then start the job locally. To confirm that it started successfully, look for a log line containing:

Running pipeline with DirectRunner

Klio will then read from the event input, in this case the text file that was created. A Klio message is created for each line in the file and passed to the HelloKlio transform. Once the messages have been successfully consumed, you should see a log line for each KlioMessage:

Received 'hello' from file './klio-quick-start_input_elements.txt'
Received 'world' from file './klio-quick-start_input_elements.txt'
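
As a rough mental model of this flow, the plain-Python sketch below turns each input line into one log line of the form shown above. It deliberately uses neither Klio nor Apache Beam: the function name received_lines is hypothetical, and in the real job this work happens inside the HelloKlio transform.

```python
def received_lines(lines, source):
    """Build one 'Received ...' log line per input line.

    Hypothetical illustration only; this is not how Klio is implemented.
    """
    messages = []
    for line in lines:
        # Drop the trailing newline so only the element text remains.
        element = line.rstrip("\n")
        messages.append(f"Received '{element}' from file '{source}'")
    return messages


if __name__ == "__main__":
    # In-memory stand-in for the two lines of the quickstart input file.
    sample = ["hello\n", "world\n"]
    for message in received_lines(sample, "./klio-quick-start_input_elements.txt"):
        print(message)
```

The point of the sketch is the one-to-one mapping: every line in the input file becomes exactly one message, so a two-line file yields two log lines.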