This guide shows you how to set up your development environment for Klio, create an example Klio batch job, and run it on DirectRunner. If you are interested in building a streaming Klio job, check out the Klio Streaming Quickstart Guide.
Attention
Be sure to follow the installation instructions before continuing.
First, initialize the klio_quickstart project directory as a git repository:
$ git init
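If the klio_quickstart directory does not exist yet, you can create it, enter it, and initialize git in one line. This is a minimal sketch assuming a POSIX shell with git on your PATH. Run it in your interactive shell rather than as a separate script, because a `cd` inside a script does not change the directory of the shell that launched it:

```shell
# Create the project directory (-p: no error if it already exists),
# move into it, and initialize a git repository there.
# Re-running git init in an existing repository is harmless.
mkdir -p klio_quickstart && cd klio_quickstart && git init
```

The remaining commands in this guide assume you are still inside this directory.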
Caution
If the klio-cli was installed via option 2 or option 3, make sure your klio-cli virtualenv is activated.
Next, within your project directory, run the following command:
$ klio job create \
    --job-name klio-quick-start \
    --job-type batch \
    --create-resources \
    --use-defaults
After responding to the prompts, Klio will:
Create a GCS bucket for output data, gs://$GCP_PROJECT-output, in the GCP project you provided in the prompt.
Create a Google Stackdriver dashboard in the provided GCP project for you to monitor runtime job metrics.
Create required files within the current working directory.
Then, commit the created job files into git:
$ git add .
$ git commit -m "Initial commit for Klio quickstart example"
First, add a text file of element IDs to use as input.
The default event input points to a text file, klio-quick-start_input_elements.txt, in a GCS bucket. To start testing quickly, replace this file event input with a local file. You can create a local file with one element per line manually in the top-level job directory for this quickstart. On a Linux machine (or any machine with a POSIX shell), you can also create it by running the following command from the Klio job directory:
$ { echo hello; echo world; } > klio-quick-start_input_elements.txt
This creates a local file with two lines of text, which serve as the event inputs for the batch Klio job.
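As an alternative, the sketch below writes the same file with `printf`, which behaves more consistently across shells than `echo`. It then counts the lines so you can confirm there is one line per element. It assumes a POSIX shell and is run from the job directory. The `input_file` variable is only an illustrative name:

```shell
#!/bin/sh
input_file=klio-quick-start_input_elements.txt

# printf reuses its format string for each argument,
# so this writes "hello" and "world" on separate lines.
printf '%s\n' hello world > "$input_file"

# Each line becomes one event input, so two lines means two elements.
# tr strips the padding some wc implementations add.
line_count=$(wc -l < "$input_file" | tr -d ' ')
echo "$input_file has $line_count lines"
# prints: klio-quick-start_input_elements.txt has 2 lines
```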
Then, run the job using DirectRunner:
$ klio job run --direct-runner
Klio first builds a Docker image of the example job with the required dependencies, then starts the job locally. To confirm that it started successfully, look for a log line containing:
Running pipeline with DirectRunner
Klio then reads from the event input, in this case the text file you created. It creates a Klio message for each line in the file and passes it to the HelloKlio transform. When a message is successfully consumed, you should see a log line for that KlioMessage:
Received 'hello' from file './klio-quick-start_input_elements.txt'
Received 'world' from file './klio-quick-start_input_elements.txt'
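To check the run without reading through the output by eye, you can save the job's output to a file and count the "Received" lines. This is a minimal sketch assuming a POSIX shell and log text that matches the lines above. The `count_received` helper and the `run.log` file name are hypothetical. Because it is not certain which stream the logs go to, the usage line captures both stdout and stderr:

```shell
#!/bin/sh
# Hypothetical helper: print how many log lines report a received element.
count_received() {
  # grep -c prints the number of matching lines; it prints 0 and exits
  # non-zero when nothing matches, so "|| true" keeps that from failing.
  grep -c "Received '" "$1" || true
}

# Usage (commented out so the helper can be tried without running the job):
#   klio job run --direct-runner > run.log 2>&1
#   count_received run.log    # expect one per input line, i.e. 2 here
```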