Python API

yamlprocessor.dataprocess

Process includes and variable substitutions in a YAML file.

For each value {"INCLUDE": "filename.yaml"}, load the include file and replace the value with its content.
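The include rule can be pictured with a minimal, standard-library-only sketch. It is not the library's implementation: the FILES mapping stands in for parsed YAML files, the file names are hypothetical, and the sketch assumes an include mapping has exactly one key, INCLUDE.

```python
# Hypothetical in-memory stand-ins for parsed YAML files.
FILES = {
    "main.yaml": {"hello": [{"INCLUDE": "earth.yaml"}, {"INCLUDE": "mars.yaml"}]},
    "earth.yaml": {"location": "earth", "targets": ["human", "cat"]},
    "mars.yaml": {"location": "mars", "targets": ["martian"]},
}


def resolve_includes(value, files):
    """Return a copy of value with every include mapping replaced by its content."""
    if isinstance(value, dict):
        if set(value) == {"INCLUDE"}:
            # Included content may itself contain further includes.
            return resolve_includes(files[value["INCLUDE"]], files)
        return {key: resolve_includes(item, files) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_includes(item, files) for item in value]
    # Scalars are returned unchanged.
    return value


resolved = resolve_includes(FILES["main.yaml"], FILES)
```

Here resolved holds the two included mappings in place of the two INCLUDE values.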

For each string value with $NAME or ${NAME} syntax, substitute with value of corresponding (environment) variable.

For each string value with $YP_TIME_* or ${YP_TIME_*} syntax, substitute with value of corresponding date-time string.

Validate against specified JSON schema if root file starts with either #!<SCHEMA-URI> or # yaml-language-server: $schema=<SCHEMA-URI> line.

CLI usage allows multiple positional arguments.

In usage 1, the final positional argument is the output file name, and the other arguments are input file names.

In usage 2, with --output=FILENAME (-o FILENAME) option, all positional arguments are input file names.

In either case, all input files will be concatenated together (as text), before being parsed as a combined YAML file.
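A rough sketch of the concatenation step, using only the standard library. The helper name concatenate_text and the file names are hypothetical, and whether the real tool inserts any separator between files is not stated here, so none is added.

```python
import tempfile
from pathlib import Path


def concatenate_text(in_filenames):
    """Join the text of the input files in order, before any YAML parsing."""
    if isinstance(in_filenames, str):
        # Accept a single file name as well as a list of names.
        in_filenames = [in_filenames]
    return "".join(Path(name).read_text() for name in in_filenames)


with tempfile.TemporaryDirectory() as tmpdir:
    head = Path(tmpdir, "head.yaml")
    tail = Path(tmpdir, "tail.yaml")
    head.write_text("hello:\n")
    tail.write_text("  - world\n")
    combined = concatenate_text([str(head), str(tail)])
```

Because the files are joined as text, the second file can continue a structure that the first one opened.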

class yamlprocessor.dataprocess.DataProcessor

Process YAML and compatible data structure.

Import sub-data-structure from include files. Process variable substitution in string values. Process date-time substitution in string values. Validate against JSON schema.

.is_process_include: bool: Turn on/off include file processing.

.is_process_variable: bool: Turn on/off variable substitution.

.include_dict: dict: Dictionary for values that can be substituted for include.

.include_paths: list: Locations for searching include files. Default is the value of the YP_INCLUDE_PATH environment variable split into a list.

.schema_prefix: str = os.getenv("YP_SCHEMA_PREFIX"): Prefix for JSON schema specified as non-existing relative paths. See also YP_SCHEMA_PREFIX.

.time_formats: dict = {'': '%FT%T%z'}: Default and named time formats. See also YP_TIME_FORMAT and YP_TIME_FORMAT_.

.time_now: datetime.datetime: Date-time at instance initialisation.

.time_ref: datetime.datetime: Reference date-time. Default is the value of the YP_TIME_REF_VALUE environment variable as datetime.datetime, or time_now if the environment variable is not defined.

.variable_map: dict = os.environ: Mapping for variable substitutions.

.unbound_placeholder: str: Value to substitute for unbound variables.

.INCLUDE_SCHEMA: dict: (Class) The schema of the INCLUDE syntax.

get_filename(filename: str, parent_filenames: list) → str

Return absolute path of filename.

If filename is a relative path, search for the file in the directories containing the parent files, then the current working directory, then each path in .include_paths.

Parameters:

filename – File name to expand or return.
parent_filenames – Stack of parent file names.
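The search order described above might be sketched as follows. This is an illustration, not the method itself: find_file is a hypothetical name, the order in which parent directories are tried and the FileNotFoundError on failure are assumptions.

```python
import os
import tempfile
from pathlib import Path


def find_file(filename, parent_filenames, include_paths):
    """Return the absolute path of filename, searching in the documented order."""
    if os.path.isabs(filename):
        return filename
    # 1. Directories containing the parent files (order is an assumption).
    roots = [os.path.dirname(os.path.abspath(name)) for name in parent_filenames]
    # 2. The current working directory.
    roots.append(os.getcwd())
    # 3. Each configured include path.
    roots.extend(include_paths)
    for root in roots:
        candidate = os.path.join(root, filename)
        if os.path.exists(candidate):
            return os.path.abspath(candidate)
    # Failure behaviour is an assumption of this sketch.
    raise FileNotFoundError(filename)


with tempfile.TemporaryDirectory() as tmpdir:
    parent = Path(tmpdir, "conf", "main.yaml")
    parent.parent.mkdir()
    parent.write_text("")
    shared = Path(tmpdir, "shared")
    shared.mkdir()
    Path(shared, "common.yaml").write_text("")
    found = find_file("common.yaml", [str(parent)], [str(shared)])
    expected = os.path.abspath(str(shared / "common.yaml"))
```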

static load_file(filename: str | IO) → object

Load content of (YAML) file into a data structure.

Parameters:

filename – file (name) to load content.

Returns:

the loaded data structure.

static load_file_schema(filename: str | IO) → object

Load schema location from the schema association line of file.

Parameters:

filename – name of file to load schema location.

Returns:

a string containing the location of the schema or None.
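As an illustration of the two schema association forms (#!<SCHEMA-URI> and # yaml-language-server: $schema=<SCHEMA-URI>), here is a small sketch that works on text rather than on a file. The function name read_schema_location, the regular expression, and its tolerance for extra whitespace are assumptions, not the library's code.

```python
import re

# Matches either "#!<URI>" or "# yaml-language-server: $schema=<URI>".
SCHEMA_LINE = re.compile(
    r"^(?:#!|#\s*yaml-language-server:\s*\$schema=)(?P<uri>\S+)\s*$"
)


def read_schema_location(text):
    """Return the schema URI from the first line of text, or None."""
    first_line = text.splitlines()[0] if text else ""
    match = SCHEMA_LINE.match(first_line)
    return match.group("uri") if match else None
```

Only the first line is inspected, in keeping with the requirement that the root file starts with the association line.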

load_include_file(value: object, parent_filenames: list, variable_map: dict) → tuple

Load data if value indicates an include file.

Parameters:

value – Value that may contain file name to load.
parent_filenames – Stack of parent file names.
variable_map – variable_map in the local scope, may have additional variables.

log_settings(): Log (info) current settings of the processor.

process_data(in_filenames: str | Iterable[str], out_filename: str) → None

Concatenate input files and load resulting data.

Dump results in output file.

Parameters:

in_filenames – a single input file name (str) or an iterable of input file names.
out_filename – output file name.

process_variable(item: object, variable_map: dict = None) → object

Substitute (environment) variables into a string value.

Return item as-is if not .is_process_variable or if item is not a string.

For each $NAME and ${NAME} in item, substitute with the value of NAME in the variable map (the environment by default).

If NAME is not defined in the .variable_map and .unbound_placeholder is None, raise an UnboundVariableError.

If NAME is not defined in the .variable_map and .unbound_placeholder equals the value of DataProcessor.UNBOUND_ORIGINAL, leave the original syntax unchanged.

If NAME is not defined in the .variable_map and .unbound_placeholder is any other value, substitute the reference with the value of .unbound_placeholder.

Parameters:

item – Item to process. Do nothing if not a str.
variable_map – variable_map in the local scope, may have additional variables.

Returns:

Processed item on success.
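The substitution rules above can be summarised in a standard-library sketch. It is a simplified stand-in rather than the method's implementation: the function name substitute, the value given to UNBOUND_ORIGINAL, the regular expression, and the ValueError base class are all assumptions, and escaping of literal dollar signs is not handled.

```python
import re

# Hypothetical stand-in for the value of DataProcessor.UNBOUND_ORIGINAL.
UNBOUND_ORIGINAL = "YP_ORIGINAL"


class UnboundVariableError(ValueError):
    """Local stand-in for the library's exception of the same name."""


# Matches ${NAME} or $NAME.
VARIABLE = re.compile(r"\$(?:\{(?P<braced>[A-Za-z_]\w*)\}|(?P<plain>[A-Za-z_]\w*))")


def substitute(item, variable_map, unbound_placeholder=None):
    """Substitute $NAME and ${NAME} in item using variable_map."""
    if not isinstance(item, str):
        return item  # Non-string items are returned as-is.

    def replace(match):
        name = match.group("braced") or match.group("plain")
        if name in variable_map:
            return str(variable_map[name])
        if unbound_placeholder is None:
            raise UnboundVariableError(name)
        if unbound_placeholder == UNBOUND_ORIGINAL:
            return match.group(0)  # Leave the original syntax unchanged.
        return unbound_placeholder

    return VARIABLE.sub(replace, item)
```

Passing a local mapping as variable_map mirrors the optional variable_map argument, which may carry variables beyond those in the environment.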

validate_data(data: object, out_file_name: str, schema_location: str) → None

Attempt to find the schema and use it to validate data.

Parameters:

data – The data structure to be validated.
schema_location – File name containing a JSON Schema.

exception yamlprocessor.dataprocess.UnboundVariableError: An error raised on attempt to substitute an unbound variable.

yamlprocessor.dataprocess.configure_basic_logging(level=20)

Configure basic logging, suitable for most CLI applications.

Basic no-frills format. Stream handler prints messages to STDERR.

Normal usage:

>>> from yamlprocessor.dataprocess import DataProcessor
>>> processor = DataProcessor()
>>> # ... Customise the `DataProcessor` instance as necessary ..., then:
>>> processor.process_data(in_file_name, out_file_name)

yamlprocessor.dataprocess.construct_yaml_timestamp(constructor, node): Return a method to add to the YAML constructor to parse datetime.

yamlprocessor.dataprocess.get_represent_datetime(time_format: str): Return a method to add to the YAML representer to represent datetime.

yamlprocessor.dataprocess.strftime_with_colon_z(dto: datetime, time_format: str)

Wrap dto.strftime to support %:z, %::z and %:::z format code.

Always use Z for UTC - it is short and recognised by any parser.
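A minimal sketch of the idea, covering only the %:z code (not %::z or %:::z). The helper names are hypothetical, whole-minute offsets are assumed, and a literal %%:z in the format is not treated specially.

```python
from datetime import datetime, timedelta, timezone


def format_offset_with_colon(dto):
    """Return the UTC offset of dto as Z or +hh:mm (the %:z form)."""
    offset = dto.utcoffset()
    if offset is None:
        return ""  # Naive date-time: no offset to print.
    if offset == timedelta(0):
        return "Z"  # Always use Z for UTC.
    total_minutes = int(offset.total_seconds() // 60)
    sign = "-" if total_minutes < 0 else "+"
    hours, minutes = divmod(abs(total_minutes), 60)
    return f"{sign}{hours:02d}:{minutes:02d}"


def strftime_colon_z(dto, time_format):
    """Expand %:z by hand, then delegate the rest to dto.strftime."""
    return dto.strftime(time_format.replace("%:z", format_offset_with_colon(dto)))


india = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone(timedelta(hours=5, minutes=30)))
utc = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
india_text = strftime_colon_z(india, "%Y-%m-%dT%H:%M:%S%:z")
utc_text = strftime_colon_z(utc, "%Y-%m-%dT%H:%M:%S%:z")
```

Replacing %:z before calling strftime keeps the sketch independent of whether the platform's own strftime understands that code.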

yamlprocessor.schemaprocess

Modularise a JSON schema

Modularise a JSON schema and allow it to accept a data structure that can be composed of include files.

Two positional arguments are expected:

The file name of the JSON schema file.

The file name of a configuration file in JSON format.

The configuration file expects a mapping, where the keys are the file names (relative to the current working directory) of the output sub-schema files, and the values are the sub-schema break point locations (expressed as JMESPath expressions) in the input JSON schema document.
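To make the shape of the configuration concrete, the sketch below builds a hypothetical schema and configuration and looks up the break point each entry names. The file name, the schema, and the helper find_break_point are invented for illustration; the helper follows only dot-separated keys, a small subset of JMESPath, and does not show what the tool writes to the output files.

```python
import json

# Hypothetical input schema, for illustration only.
schema = {
    "type": "object",
    "properties": {
        "hello": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
            },
        }
    },
}

# Keys: output sub-schema file names. Values: break point locations.
config = {"hello-item.schema.json": "properties.hello.items"}


def find_break_point(document, expression):
    """Follow a dot-separated path of keys, a small subset of JMESPath."""
    node = document
    for key in expression.split("."):
        node = node[key]
    return node


sub_schemas = {name: find_break_point(schema, expr) for name, expr in config.items()}
config_text = json.dumps(config, indent=2)  # Content of the configuration file.
```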

yamlprocessor.schemaprocess.schema_process(schema_filename: str, config_filename: str) → None

Process schema to handle includes according to configuration.

Parameters:

schema_filename – schema file name.
config_filename – configuration file name.