Python API

yamlprocessor.dataprocess

Process includes and variable substitutions in a YAML file.

For each value {"INCLUDE": "filename.yaml"}, load the content of the include file and substitute it for the value.
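The include rule above can be illustrated with a minimal sketch. This is not the library's implementation: the DOCUMENTS mapping and the load_document and expand_includes names are hypothetical, and JSON text stands in for YAML files so that the example needs only the standard library.

```python
import json

# Hypothetical in-memory "files"; a real run would read YAML from disk.
DOCUMENTS = {
    "main.yaml": '{"greet": "hello", "people": {"INCLUDE": "people.yaml"}}',
    "people.yaml": '["alice", {"INCLUDE": "bob.yaml"}]',
    "bob.yaml": '{"name": "bob"}',
}


def load_document(name):
    """Load one document (JSON is used here as a stand-in for YAML)."""
    return json.loads(DOCUMENTS[name])


def expand_includes(value):
    """Recursively replace each {"INCLUDE": name} value with the named document."""
    if isinstance(value, dict):
        if set(value) == {"INCLUDE"}:
            # The whole mapping is replaced by the (expanded) included content.
            return expand_includes(load_document(value["INCLUDE"]))
        return {key: expand_includes(item) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_includes(item) for item in value]
    return value


result = expand_includes(load_document("main.yaml"))
```

Here the nested include in people.yaml is expanded as well, so result contains no remaining INCLUDE mappings.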

For each string value with $NAME or ${NAME} syntax, substitute the value of the corresponding (environment) variable.

For each string value with $YP_TIME_* or ${YP_TIME_*} syntax, substitute the corresponding date-time string.

Validate against the specified JSON schema if the root file starts with either a #!<SCHEMA-URI> line or a # yaml-language-server: $schema=<SCHEMA-URI> line.

class yamlprocessor.dataprocess.DataProcessor

Process YAML and compatible data structure.

Import sub-data-structure from include files. Process variable substitution in string values. Process date-time substitution in string values. Validate against JSON schema.

.is_process_include: bool

Turn on/off include file processing.

.is_process_variable: bool

Turn on/off variable substitution.

.include_dict: dict

Dictionary of values that can be substituted for includes.

.include_paths: list

Locations for searching include files. Default is the value of the YP_INCLUDE_PATH environment variable split into a list.
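As a rough illustration of that default, the sketch below splits the variable into a list. The include_paths_from_env name is hypothetical, and the use of os.pathsep as the separator is an assumption that this page does not confirm.

```python
import os


def include_paths_from_env(environ=os.environ):
    """Split YP_INCLUDE_PATH into a list of directories.

    Assumption: entries are separated by os.pathsep (":" on POSIX).
    """
    raw = environ.get("YP_INCLUDE_PATH", "")
    # Drop empty entries so an unset or empty variable yields an empty list.
    return [path for path in raw.split(os.pathsep) if path]
```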

.schema_prefix: str = os.getenv("YP_SCHEMA_PREFIX")

Prefix for JSON schema specified as non-existing relative paths. See also YP_SCHEMA_PREFIX.

.time_formats: dict = {'': '%FT%T%z'}

Default and named time formats. See also YP_TIME_FORMAT and YP_TIME_FORMAT_<NAME>.

.time_now: datetime.datetime

Date-time at instance initialisation.

.time_ref: datetime.datetime

Reference date-time. Default is the value of the YP_TIME_REF_VALUE environment variable as datetime.datetime, or time_now if the environment variable is not defined.

.variable_map: dict = os.environ

Mapping for variable substitutions.

.unbound_placeholder: str

Value to substitute for unbound variables.

.INCLUDE_SCHEMA: dict

(Class) The schema of the INCLUDE syntax.

get_filename(filename: str, parent_filenames: list) → str

Return absolute path of filename.

If filename is a relative path, look for the file in the directories containing the parent files, then in the current working directory, then in each path in .include_paths.

Parameters:
  • filename – File name to expand or return.

  • parent_filenames – Stack of parent file names.
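The search order described for get_filename can be sketched as follows. This is a stand-alone illustration, not the method itself: find_include_file is a hypothetical name, and both the order in which parent directories are tried and the error raised when nothing is found are assumptions.

```python
import os


def find_include_file(filename, parent_filenames, include_paths):
    """Return the absolute path of filename, searching candidate roots in order."""
    if os.path.isabs(filename):
        return filename
    roots = []
    # Assumption: the last item on the stack is the immediate parent,
    # so its directory is tried first.
    for parent in reversed(parent_filenames):
        roots.append(os.path.dirname(os.path.abspath(parent)))
    roots.append(os.getcwd())
    roots.extend(include_paths)
    for root in roots:
        candidate = os.path.join(root, filename)
        if os.path.exists(candidate):
            return os.path.abspath(candidate)
    # Assumption: a missing file is reported with FileNotFoundError.
    raise FileNotFoundError(filename)
```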

static load_file(filename: str) → object

Load content of (YAML) file into a data structure.

Parameters:

filename – name of the file to load content from.

Returns:

the loaded data structure.

static load_file_schema(filename: str) → object

Load schema location from the schema association line of file.

Parameters:

filename – name of the file to load the schema location from.

Returns:

a string containing the location of the schema or None.
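To make the two schema association forms concrete, here is a minimal sketch that extracts the schema location from the first line of a document. It takes the text rather than a file name to stay self-contained. The read_schema_location name and the regular expression are illustrative assumptions, not the library's own parsing code.

```python
import re

# Matches "#!<SCHEMA-URI>" or "# yaml-language-server: $schema=<SCHEMA-URI>".
SCHEMA_LINE = re.compile(
    r"^(?:#!|#\s*yaml-language-server:\s*\$schema=)(?P<uri>\S+)\s*$"
)


def read_schema_location(text):
    """Return the schema URI from the first line of text, or None."""
    lines = text.splitlines()
    first_line = lines[0] if lines else ""
    match = SCHEMA_LINE.match(first_line)
    return match.group("uri") if match else None
```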

load_include_file(value: object, parent_filenames: list, variable_map: dict) → tuple

Load data if value indicates an include file.

Parameters:
  • value – Value that may contain file name to load.

  • parent_filenames – Stack of parent file names.

  • variable_map – variable_map in the local scope, may have additional variables.

process_data(in_filename: str, out_filename: str) → None

Process includes in input file and dump results in output file.

Parameters:
  • in_filename – input file name.

  • out_filename – output file name.
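A hedged sketch of customising an instance before calling process_data, using only attributes documented on this page. The process_with_overrides wrapper, its extra_paths parameter, and the GREETING variable are hypothetical, and the sketch has not been checked against a particular release of the package.

```python
def process_with_overrides(in_filename, out_filename, extra_paths):
    """Run a DataProcessor with a few documented attributes customised."""
    # Imported here so this sketch can be loaded without the package installed.
    from yamlprocessor.dataprocess import DataProcessor

    processor = DataProcessor()
    # Search extra directories for include files.
    processor.include_paths.extend(extra_paths)
    # Add a variable on top of the default mapping (GREETING is illustrative).
    processor.variable_map = dict(processor.variable_map, GREETING="hello")
    # Leave unbound $NAME references unchanged instead of raising an error.
    processor.unbound_placeholder = DataProcessor.UNBOUND_ORIGINAL
    processor.process_data(in_filename, out_filename)
```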

process_variable(item: object, variable_map: dict = None) → object

Substitute (environment) variables into a string value.

Return item as-is if not .is_process_variable or if item is not a string.

For each $NAME and ${NAME} in item, substitute with the value of the environment variable NAME.

If NAME is not defined in the .variable_map and .unbound_placeholder is None, raise an UnboundVariableError.

If NAME is not defined in the .variable_map and .unbound_placeholder equals the value of DataProcessor.UNBOUND_ORIGINAL, leave the original syntax unchanged.

If NAME is not defined in the .variable_map and .unbound_placeholder is any other value that is not None, substitute NAME with the value of .unbound_placeholder.

Parameters:
  • item – Item to process. Do nothing if not a str.

  • variable_map – variable_map in the local scope, may have additional variables.

Returns:

Processed item on success.
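The three unbound-variable behaviours described for process_variable can be sketched in a few lines. This is an illustration under stated assumptions, not the library's code: substitute, UnboundVariable, and the UNBOUND_ORIGINAL sentinel below are hypothetical stand-ins, and escaping of literal dollar signs is not handled.

```python
import re

# Hypothetical sentinel standing in for DataProcessor.UNBOUND_ORIGINAL.
UNBOUND_ORIGINAL = object()
# Matches ${NAME} or $NAME.
VARIABLE = re.compile(r"\$(?:\{(?P<braced>\w+)\}|(?P<plain>\w+))")


class UnboundVariable(KeyError):
    """Hypothetical stand-in for UnboundVariableError."""


def substitute(item, variable_map, unbound_placeholder=None):
    """Substitute $NAME and ${NAME} in item; return non-strings as-is."""
    if not isinstance(item, str):
        return item

    def replace(match):
        name = match.group("braced") or match.group("plain")
        if name in variable_map:
            return str(variable_map[name])
        if unbound_placeholder is None:
            raise UnboundVariable(name)
        if unbound_placeholder is UNBOUND_ORIGINAL:
            return match.group(0)  # keep the original $NAME or ${NAME} text
        return unbound_placeholder

    return VARIABLE.sub(replace, item)
```

For example, substitute("${X}", {}, UNBOUND_ORIGINAL) leaves the text unchanged, while substitute("$X", {}) raises the stand-in error.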

validate_data(data: object, out_file_name: str, schema_location: str) → None

Attempt to find the schema and use it to validate data.

Parameters:
  • data – The data structure to be validated.

  • schema_location – File name containing a JSON Schema.

exception yamlprocessor.dataprocess.UnboundVariableError

An error raised on attempt to substitute an unbound variable.

yamlprocessor.dataprocess.configure_basic_logging()

Configure basic logging, suitable for most CLI applications.

Basic no-frills format. Stream handler prints messages on STDERR.

Normal usage:

>>> from yamlprocessor.dataprocess import DataProcessor
>>> processor = DataProcessor()
>>> # ... Customise the `DataProcessor` instance as necessary ..., then:
>>> processor.process_data(in_file_name, out_file_name)

yamlprocessor.dataprocess.construct_yaml_timestamp(constructor, node)

Return a method to add to the YAML constructor to parse datetime.

yamlprocessor.dataprocess.get_represent_datetime(time_format: str)

Return a method to add to the YAML representer to represent datetime.

yamlprocessor.dataprocess.strftime_with_colon_z(dto: datetime, time_format: str)

Wrap dto.strftime to support the %:z, %::z and %:::z format codes.

Always use Z for UTC - it is short and recognised by any parser.
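The idea of wrapping strftime can be sketched as below. This hypothetical format_with_colon_z handles only %:z (not %::z or %:::z) and assumes that a plain %z is also written as Z for UTC. It does not treat an escaped %% specially, so it is an illustration rather than a replacement for the library function.

```python
from datetime import datetime, timedelta, timezone


def format_with_colon_z(dto, time_format):
    """Format dto, expanding %:z to +HH:MM and writing UTC offsets as Z."""
    offset = dto.utcoffset()
    if offset is not None and offset == timedelta(0):
        # UTC: both %:z and %z become the literal Z.
        time_format = time_format.replace("%:z", "Z").replace("%z", "Z")
    elif offset is not None:
        plain = dto.strftime("%z")  # for example +0530
        time_format = time_format.replace("%:z", plain[:3] + ":" + plain[3:5])
    else:
        # Naive date-time: there is no offset to print.
        time_format = time_format.replace("%:z", "")
    return dto.strftime(time_format)


utc_time = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
india_time = datetime(2024, 1, 2, 3, 4, 5,
                      tzinfo=timezone(timedelta(hours=5, minutes=30)))
```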

yamlprocessor.schemaprocess

Modularise a JSON schema

Modularise a JSON schema and allow it to accept a data structure that can be composed of include files.

Two positional arguments are expected:

  1. The file name of the JSON schema file.

  2. The file name of a configuration file in JSON format.

The configuration file expects a mapping, where the keys are the file names (paths relative to the current working directory) of the output sub-schema files, and the values are sub-schema break point locations (expressed in JMESPath format) in the input JSON schema document.
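As an illustration of that mapping, the sketch below builds a small configuration and shows which part of a toy schema one break point refers to. All file names and schema contents are hypothetical, and the select helper resolves only plain dotted keys, a small subset of JMESPath, rather than using a real JMESPath implementation.

```python
import json

# Toy input schema (hypothetical).
schema = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "people": {"type": "array", "items": {"type": "string"}},
    },
}

# Keys: output sub-schema file names. Values: break point locations.
config = {
    "schema-location.json": "properties.location",
    "schema-people.json": "properties.people",
}
config_text = json.dumps(config, indent=2)  # content of the configuration file


def select(document, dotted_path):
    """Resolve a dotted path of plain keys (a small subset of JMESPath)."""
    node = document
    for key in dotted_path.split("."):
        node = node[key]
    return node


location_schema = select(schema, config["schema-location.json"])
```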

yamlprocessor.schemaprocess.schema_process(schema_filename: str, config_filename: str) → None

Process schema to handle includes according to configuration.

Parameters:
  • schema_filename – schema file name.

  • config_filename – configuration file name.