Batch Klio pipelines are useful for backfills or for work that runs on a cadence. Batch jobs can also simplify local testing, since no Pub/Sub resources need to be set up in order to kick off a job.
In `klio-job.yaml`, two config values need to change in order to convert a streaming job to a batch job: the `streaming` field and the `job_config.event` input and output configurations.
Set `streaming` to `False`:
```yaml
name: my-stream-job-that-i-want-to-be-batch
pipeline_options:
  streaming: False
job_config:
  <-- snip -->
```
Currently, the only supported event inputs and outputs for streaming jobs are Google Cloud Pub/Sub. In batch mode, however, there are multiple supported event configurations, the simplest of which is a text file located locally or in GCS. Similarly, event outputs can be written to a GCS file in batch mode by setting `job_config.event.outputs`.
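As a rough sketch of what such an event input might look like, the snippet below creates a newline-delimited text file; the assumption that each line corresponds to one input element (e.g. an entity ID) is based on the simplest file-input format, and the element IDs shown are illustrative:

```shell
# Create a newline-delimited event input file. Assumption: each line is
# treated as one input element for the batch job (IDs below are made up).
cat > my-input-elements.txt <<'EOF'
element-id-001
element-id-002
element-id-003
EOF

# Stage it at the GCS location referenced in klio-job.yaml
# (requires gsutil and access to the bucket):
# gsutil cp my-input-elements.txt gs://my-event-input/my-input-elements.txt
```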
An example of the changes for reading from and writing to a GCS file is shown below:
```yaml
name: my-stream-job-that-i-want-to-be-batch
pipeline_options:
  streaming: False
<-- snip -->
job_config:
  event:
    inputs:
      - type: file
        location: gs://my-event-input/my-input-elements.txt
    outputs:
      - type: file
        location: gs://my-event-output/
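Conceptually, this configuration reads one element per line from the input file, processes each element, and writes results under the output location. The plain-shell sketch below only mimics that flow with local files standing in for the `gs://` locations; a real Klio job runs your Beam transforms rather than a shell loop, and all file names here are illustrative:

```shell
# Illustrative only: local files stand in for the gs:// event locations.
printf 'track-1\ntrack-2\n' > input-elements.txt

# "Process" each element; a real Klio batch job would run your Beam
# transforms over each element instead of this loop.
while read -r element; do
  echo "processed-${element}"
done < input-elements.txt > output-elements.txt

cat output-elements.txt
# processed-track-1
# processed-track-2
```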
Note: A batch job can also be converted into a streaming job in a similar manner. However, missing resources such as Pub/Sub topics and subscriptions will need to be created with the command `klio job verify --create-resources`.