Data Configuration

Input

Google Cloud Storage

Example configuration for Google Cloud Storage:

name: my-cool-job
pipeline_options:
  streaming: True
job_config:
  data:
    inputs:
      - type: gcs
        location: gs://my-bucket/my-jobs-folder
        file_suffix: .ogg

job_config.data.inputs[].type STR

Value: gcs

Runner: Dataflow, Direct
Required

job_config.data.inputs[].location STR

The GCS location of this job’s binary data input, usually an upstream job’s output: a bucket, optionally followed by a folder path. Must be a valid Cloud Storage URL beginning with gs://.

Required for Klio’s automatic default existence checks.

Runner: Dataflow, Direct
Optional

job_config.data.inputs[].file_suffix STR

The general file suffix or extension of input files.

Required for Klio’s automatic default existence checks.

Runner: Dataflow, Direct
Optional

job_config.data.inputs[].skip_klio_existence_check BOOL

Inherited from global data input config.

job_config.data.inputs[].ping BOOL

Inherited from global data input config.
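As a rough illustration of why location and file_suffix matter for the existence checks, the sketch below builds the object path that would be looked up for a single element. It assumes the path has the form <location>/<element ID><file_suffix>. That layout, the GcsDataConfig class, and the track-123 ID are illustrative assumptions, not Klio's own API.

```python
from dataclasses import dataclass


@dataclass
class GcsDataConfig:
    """Hypothetical stand-in for one `gcs` entry under job_config.data.inputs."""

    location: str
    file_suffix: str = ""

    def __post_init__(self):
        # The location must be a Cloud Storage URL beginning with gs://.
        if not self.location.startswith("gs://"):
            raise ValueError(
                f"location must begin with gs://, got {self.location!r}"
            )

    def object_path(self, element_id: str) -> str:
        # Assumed layout: <location>/<element_id><file_suffix>.
        return f"{self.location.rstrip('/')}/{element_id}{self.file_suffix}"


config = GcsDataConfig(
    location="gs://my-bucket/my-jobs-folder", file_suffix=".ogg"
)
print(config.object_path("track-123"))
# gs://my-bucket/my-jobs-folder/track-123.ogg
```

If either field is missing, there is no complete path to look up, which is consistent with both being required for the automatic checks.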

Custom

Example configuration for a custom data input that is not supported by Klio:

name: my-cool-job
job_config:
  data:
    inputs:
      - type: custom
        some_key: some_value

job_config.data.inputs[].type STR

Value: custom

Runner: Dataflow, Direct
Required

job_config.data.inputs[].skip_klio_existence_check BOOL

Inherited from global data input config. For custom inputs, Klio sets this to True automatically.

job_config.data.inputs[].ping BOOL

Inherited from global data input config.

job_config.data.inputs[].<custom-key> ANY

Any arbitrary key-value pairs for custom data input configuration specific to a job.
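One way to picture a custom input entry is as two groups of keys: the ones documented in this section and the job-specific ones. The sketch below separates them using a plain dictionary that mirrors the YAML example. The split_custom_input helper and the DOCUMENTED_KEYS set are hypothetical names for illustration and are not part of Klio.

```python
from typing import Any, Dict, Tuple

# Keys this section documents for a custom input; all others are job-specific.
DOCUMENTED_KEYS = {"type", "skip_klio_existence_check", "ping"}


def split_custom_input(
    entry: Dict[str, Any]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate documented keys from arbitrary custom keys (hypothetical)."""
    if entry.get("type") != "custom":
        raise ValueError("expected an input entry with type: custom")
    documented = {k: v for k, v in entry.items() if k in DOCUMENTED_KEYS}
    custom = {k: v for k, v in entry.items() if k not in DOCUMENTED_KEYS}
    # Mirrors the documented behavior: the existence check is skipped
    # automatically for custom inputs.
    documented["skip_klio_existence_check"] = True
    return documented, custom


documented, custom = split_custom_input(
    {"type": "custom", "some_key": "some_value"}
)
print(documented)  # {'type': 'custom', 'skip_klio_existence_check': True}
print(custom)      # {'some_key': 'some_value'}
```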

Output

Google Cloud Storage

Example configuration for Google Cloud Storage:

name: my-cool-job
pipeline_options:
  streaming: True
job_config:
  data:
    outputs:
      - type: gcs
        location: gs://my-bucket/my-jobs-folder
        file_suffix: .wav

job_config.data.outputs[].type STR

Value: gcs

Runner: Dataflow, Direct
Required

job_config.data.outputs[].location STR

The GCS location of this job’s binary data output: a bucket, optionally followed by a folder path. Must be a valid Cloud Storage URL beginning with gs://.

Required for Klio’s automatic default existence checks.

Runner: Dataflow, Direct
Optional

job_config.data.outputs[].file_suffix STR

The general file suffix or extension of output files.

Required for Klio’s automatic default existence checks.

Runner: Dataflow, Direct
Optional

job_config.data.outputs[].skip_klio_existence_check BOOL

Inherited from global data output config.

job_config.data.outputs[].force BOOL

Inherited from global data output config.
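The sketch below shows how the output fields might interact when deciding whether an element still needs work. It rests on several assumptions that this section does not state: the output path is <location>/<element ID><file_suffix>, force means "process even if the output already exists", and skipping the existence check means the element is always processed. The should_process and output_path helpers are hypothetical, and a plain set stands in for the bucket's contents.

```python
from typing import Set


def output_path(location: str, element_id: str, file_suffix: str) -> str:
    # Assumed layout: <location>/<element_id><file_suffix>.
    return f"{location.rstrip('/')}/{element_id}{file_suffix}"


def should_process(
    element_id: str,
    existing_objects: Set[str],
    location: str,
    file_suffix: str,
    force: bool = False,
    skip_klio_existence_check: bool = False,
) -> bool:
    """Decide whether to do work for an element (illustrative logic only)."""
    if force or skip_klio_existence_check:
        # Either reprocessing is forced, or no existence check is made.
        return True
    path = output_path(location, element_id, file_suffix)
    return path not in existing_objects


# A set of object paths stands in for what already exists in the bucket.
existing = {"gs://my-bucket/my-jobs-folder/track-123.wav"}
args = dict(
    existing_objects=existing,
    location="gs://my-bucket/my-jobs-folder",
    file_suffix=".wav",
)
print(should_process("track-123", **args))              # False
print(should_process("track-123", force=True, **args))  # True
print(should_process("track-456", **args))              # True
```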

Custom

Example configuration for a custom data output that is not supported by Klio:

name: my-cool-job
job_config:
  data:
    outputs:
      - type: custom
        some_key: some_value

job_config.data.outputs[].type STR

Value: custom

Runner: Dataflow, Direct
Required

job_config.data.outputs[].skip_klio_existence_check BOOL

Inherited from global data output config. For custom outputs, Klio sets this to True automatically.

job_config.data.outputs[].force BOOL

Inherited from global data output config.

job_config.data.outputs[].<custom-key> ANY

Any arbitrary key-value pairs for custom data output configuration specific to a job.