eva.data package#

Submodules#

eva.data.cubed_sphere_restart module#

class eva.data.cubed_sphere_restart.CubedSphereRestart(eva_class_name, eva_logger, timing)[source]#

Bases: EvaDatasetBase

A class for handling Cubed Sphere Restart data.

Initialize the base class for eva dataset objects.

Parameters:
  • eva_class_name (str) – Name of the eva class to instantiate.

  • eva_logger (Logger) – Logger instance for logging messages.

  • timing (Timing) – Timing instance for performance measurement.

execute(dataset_config, data_collections, timing)[source]#

Executes the processing of Cubed Sphere Restart data.

Parameters:
  • dataset_config (dict) – Configuration dictionary for the dataset.

  • data_collections (DataCollections) – Object for managing data collections.

  • timing (Timing) – Timing object for tracking execution time.

generate_default_config(filenames, collection_name)[source]#

Generates a default configuration for Cubed Sphere Restart data.

Parameters:
  • filenames (list) – List of file names.

  • collection_name (str) – Name of the data collection.

Returns:

Default configuration dictionary.

Return type:

dict

eva.data.cubed_sphere_restart.read_fms_tiles(files, variable, logger)[source]#

Given a list of FMS netCDF files and a variable name, stitches the files together into a single (N+1)-dimensional variable, where N is the number of dimensions of the variable in each file.

Parameters:
  • files (list) – List of netCDF file paths.

  • variable (str) – Name of the variable to extract.

  • logger (Logger) – Logger object for logging messages.

Returns:

Combined variable array from input files.

Return type:

np.ndarray

eva.data.data_collections module#

class eva.data.data_collections.DataCollections[source]#

Bases: object

Manage collections of xarray Datasets with variable manipulations.

Initialize the DataCollections instance.

add_variable_to_collection(collection_name, group_name, variable_name, variable)[source]#

Add a new variable to a collection.

Parameters:
  • collection_name (str) – Name of the collection to add the variable to.

  • group_name (str) – Name of the group where the variable belongs.

  • variable_name (str) – Name of the variable.

  • variable (DataArray) – The xarray DataArray to add.

Raises:

ValueError – If variable is not an xarray DataArray.

adjust_channel_dimension_name(channel_dimension_name)[source]#

Adjust the name of the channel dimension in all collections.

Parameters:

channel_dimension_name (str) – New name for the channel dimension.

adjust_location_dimension_name(location_dimension_name)[source]#

Adjust the name of the location dimension in all collections.

Parameters:

location_dimension_name (str) – New name for the location dimension.

create_or_add_to_collection(collection_name, collection, concat_dimension=None)[source]#

Create a new collection or add to an existing collection.

Parameters:
  • collection_name (str) – Name of the collection.

  • collection (Dataset) – The xarray Dataset to add or create.

  • concat_dimension (str) – Dimension along which to concatenate if adding to an existing collection.

Raises:
  • ValueError – If collection is not an xarray Dataset.

  • ValueError – If an existing empty collection with the same name is detected.

  • ValueError – If concatenation dimension is missing or invalid.

display_collections()[source]#

Display information about available collections, groups, and variables.

get_variable_data(collection_name, group_name, variable_name, channels=None)[source]#

Retrieve the data of a specific variable from a collection.

Parameters:
  • collection_name (str) – Name of the collection.

  • group_name (str) – Name of the group where the variable belongs.

  • variable_name (str) – Name of the variable.

  • channels (int or list[int]) – Indices of channels to select (optional).

Returns:

The selected variable data as a NumPy array.

Return type:

ndarray

get_variable_data_array(collection_name, group_name, variable_name, channels=None)[source]#

Retrieve a specific variable (as a DataArray) from a collection.

Parameters:
  • collection_name (str) – Name of the collection.

  • group_name (str) – Name of the group where the variable belongs.

  • variable_name (str) – Name of the variable.

  • channels (int or list[int]) – Indices of channels to select (optional).

Returns:

The selected variable as an xarray DataArray.

Return type:

DataArray

Raises:

ValueError – If channels are provided but the ‘Channel’ dimension is missing.

nan_float_values_outside_threshold(threshold, cgv_to_screen=None)[source]#

Set values outside a threshold to NaN in selected collections, groups, and variables.

Parameters:
  • threshold (float) – Threshold value for screening.

  • cgv_to_screen (str) – Collection, group, and variable to screen (optional).
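
As an illustration of the screening idea, the following minimal sketch works on a plain Python list rather than on the xarray collections that DataCollections manages. It assumes that "outside a threshold" means the absolute value exceeds the threshold; the function name nan_outside_threshold and the sample values are hypothetical and not part of eva.

```python
import math


def nan_outside_threshold(values, threshold):
    """Return a copy of values where entries with |value| > threshold become NaN.

    Assumption: 'outside' is interpreted as magnitude greater than threshold.
    """
    return [math.nan if abs(v) > threshold else v for v in values]


# Hypothetical data containing two large fill-like values
screened = nan_outside_threshold([1.0, -2.5, 1.0e30, -9.9e36], threshold=1.0e9)
```

Screening of this kind is typically used so that fill or missing values do not distort statistics and plots computed later.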

validate_names()[source]#

Validate naming conventions for collections, groups, and variables.

eva.data.data_driver module#

eva.data.data_driver.data_driver(config, data_collections, timing, logger)[source]#

Driver for executing data processing.

Parameters:
  • config (dict) – Configuration settings for data processing.

  • data_collections (DataCollections) – Instance of the DataCollections class.

  • timing (Timing) – Timing instance for performance measurement.

  • logger (Logger) – Logger instance for logging messages.

eva.data.eva_dataset_base module#

class eva.data.eva_dataset_base.EvaDatasetBase(eva_class_name, eva_logger, timing)[source]#

Bases: ABC

Abstract base class for EVA dataset objects.

Initialize the base class for eva dataset objects.

Parameters:
  • eva_class_name (str) – Name of the eva class to instantiate.

  • eva_logger (Logger) – Logger instance for logging messages.

  • timing (Timing) – Timing instance for performance measurement.

abstract execute(config, data_collections, timing)[source]#

Execute the dataset processing.

Parameters:
  • config (dict) – Configuration settings for dataset processing.

  • data_collections (DataCollections) – Instance of the DataCollections class.

  • timing (Timing) – Timing instance for performance measurement.

abstract generate_default_config(filenames, collection_name, control_file=None)[source]#

Generate the default configuration for the dataset.

Parameters:
  • filenames (list) – List of filenames associated with the dataset.

  • collection_name (str) – Name of the collection.

  • control_file (str) – Path to the control file (optional).
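
The abstract interface above can be sketched with Python's standard abc module. This is an illustrative stand-in, not the eva source: the names DatasetBaseSketch and InMemoryDataset, the configuration keys, and the use of a plain dict in place of DataCollections are all assumptions made for the example.

```python
from abc import ABC, abstractmethod


class DatasetBaseSketch(ABC):
    """Stand-in for an abstract dataset base class (hypothetical name)."""

    def __init__(self, eva_class_name, eva_logger, timing):
        self.eva_class_name = eva_class_name
        self.logger = eva_logger
        self.timing = timing

    @abstractmethod
    def execute(self, config, data_collections, timing):
        """Read data described by config into data_collections."""

    @abstractmethod
    def generate_default_config(self, filenames, collection_name, control_file=None):
        """Return a default configuration dictionary."""


class InMemoryDataset(DatasetBaseSketch):
    """Toy subclass that copies values from the config (hypothetical)."""

    def execute(self, config, data_collections, timing):
        # A plain dict stands in for the DataCollections object here
        data_collections[config["name"]] = list(config["values"])

    def generate_default_config(self, filenames, collection_name, control_file=None):
        return {"filenames": list(filenames), "name": collection_name, "values": []}


collections = {}
dataset = InMemoryDataset("InMemoryDataset", eva_logger=None, timing=None)
config = dataset.generate_default_config(["a.nc"], "demo")
config["values"] = [1, 2, 3]
dataset.execute(config, collections, timing=None)
```

Because both methods are abstract, instantiating the base class directly raises TypeError, so every dataset type must implement them.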

class eva.data.eva_dataset_base.EvaDatasetFactory[source]#

Bases: object

Factory class for creating eva data ingest objects.

create_eva_object(eva_class_name, eva_group_name, eva_logger, timing)[source]#

Create an eva dataset ingest object.

Parameters:
  • eva_class_name (str) – Name of the EVA class.

  • eva_group_name (str) – Name of the EVA group.

  • eva_logger (Logger) – Logger instance for logging messages.

  • timing (Timing) – Timing instance for performance measurement.

Returns:

An instance of the specified EVA dataset class.

Return type:

EvaDatasetBase
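
A factory of this shape can be sketched with a simple registry lookup. This is only one possible design and is not taken from the eva source, which may locate classes differently (for example by importing modules dynamically). The names DatasetFactorySketch, DemoDataset, and the registry keys are hypothetical.

```python
class DemoDataset:
    """Hypothetical dataset class used only for this example."""

    def __init__(self, eva_class_name, eva_logger, timing):
        self.eva_class_name = eva_class_name
        self.logger = eva_logger
        self.timing = timing


class DatasetFactorySketch:
    """Create dataset objects from a (group name, class name) pair."""

    def __init__(self):
        # Assumption: classes are registered by hand under (group, class) keys
        self._registry = {("data", "DemoDataset"): DemoDataset}

    def create_eva_object(self, eva_class_name, eva_group_name, eva_logger, timing):
        try:
            cls = self._registry[(eva_group_name, eva_class_name)]
        except KeyError:
            raise ValueError(
                f"Unknown class {eva_class_name!r} in group {eva_group_name!r}"
            ) from None
        return cls(eva_class_name, eva_logger, timing)


factory = DatasetFactorySketch()
obj = factory.create_eva_object("DemoDataset", "data", eva_logger=None, timing=None)
```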

eva.data.gsi_obs_space module#

class eva.data.gsi_obs_space.GsiObsSpace(eva_class_name, eva_logger, timing)[source]#

Bases: EvaDatasetBase

Eva dataset class for processing GSI observation space data.

Initialize the base class for eva dataset objects.

Parameters:
  • eva_class_name (str) – Name of the eva class to instantiate.

  • eva_logger (Logger) – Logger instance for logging messages.

  • timing (Timing) – Timing instance for performance measurement.

execute(dataset_config, data_collections, timeing)[source]#

Execute the GSI observation space data processing.

Parameters:
  • dataset_config (dict) – Configuration settings for the dataset processing.

  • data_collections (DataCollections) – Instance of the DataCollections class.

  • timing (Timing) – Timing instance for performance measurement.

generate_default_config(filenames, collection_name)[source]#

Generate the default configuration for the GSI observation space dataset.

Parameters:
  • filenames (list) – List of filenames associated with the dataset.

  • collection_name (str) – Name of the collection.

Returns:

Default configuration settings for the dataset.

Return type:

dict

eva.data.gsi_obs_space.all_equal(iterable)[source]#

Check if all elements in an iterable are equal.

Parameters:

iterable – An iterable object to check.

Returns:

True if all elements are equal, False otherwise.

Return type:

bool
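
One common way to write such a check uses itertools.groupby, which collapses consecutive equal elements into runs. The sketch below is an assumption about a reasonable implementation, not a copy of the eva source; all_equal_sketch is a hypothetical name.

```python
from itertools import groupby


def all_equal_sketch(iterable):
    """Return True if every element equals the others (also True when empty)."""
    runs = groupby(iterable)
    next(runs, None)                   # consume the first run, if any
    return next(runs, None) is None    # a second run means a differing element


same = all_equal_sketch([3, 3, 3])
different = all_equal_sketch([3, 4, 3])
empty = all_equal_sketch([])
```

This approach works on any iterable, including generators, and stops as soon as a differing element is found.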

eva.data.gsi_obs_space.satellite_dataset(ds)[source]#

Build a new dataset to reshape satellite data.

Parameters:

ds (Dataset) – The input xarray Dataset.

Returns:

Reshaped xarray Dataset.

Return type:

Dataset

eva.data.gsi_obs_space.subset_channels(ds, channels, logger, add_channels_variable=False)[source]#

Subset the dataset based on specified channels.

Parameters:
  • ds (Dataset) – The xarray Dataset to subset.

  • channels (list) – List of channel numbers to keep.

  • logger (Logger) – Logger instance for logging messages.

  • add_channels_variable (bool, optional) – Whether to add ‘channelNumber’ variable. Default is False.

eva.data.gsi_obs_space.uv(group_vars)[source]#

Add ‘uv’ prefix to specified variables if present.

Parameters:

group_vars (list) – List of variable names.

Returns:

List of variable names with ‘uv’ prefix added.

Return type:

list

eva.data.ioda_obs_space module#

class eva.data.ioda_obs_space.IodaObsSpace(eva_class_name, eva_logger, timing)[source]#

Bases: EvaDatasetBase

A class for executing data collection processing using IODA observation space.

This class inherits from EvaDatasetBase and implements the execute method, which reads the data and processes it into the eva data collection format.

Parameters:

EvaDatasetBase (class) – The base class for dataset processing.

execute(dataset_config, data_collections, timing)[source]#

Reads the data and transfers it into the data collection for IODA observation space.

generate_default_config(filenames, collection_name)[source]#

Generates a default configuration dictionary for IODA observation space, used for more easily accessing the class interactively.

Notes

  • The class inherits from EvaDatasetBase and extends its functionality.

Example

# Instantiate the class
ioda_instance = IodaObsSpace(eva_class_name, eva_logger, timing)

# Execute data collection processing using IODA observation space
ioda_instance.execute(dataset_config, data_collections, timing)

# Generate a default configuration dictionary for IODA observation space
default_config = ioda_instance.generate_default_config(filenames, collection_name)

Initialize the base class for eva dataset objects.

Parameters:
  • eva_class_name (str) – Name of the eva class to instantiate.

  • eva_logger (Logger) – Logger instance for logging messages.

  • timing (Timing) – Timing instance for performance measurement.

execute(dataset_config, data_collections, timing)[source]#

Executes data collection processing using IODA observation space.

This method reads and processes data based on the provided configuration, which specifies items such as file names and variables. It iterates over files, groups, and variables.

Parameters:
  • dataset_config (dict) – Configuration settings for the dataset.

  • data_collections (DataCollections) – The data collections in which to store the data that is read.

  • timing (Timing) – Timing information for profiling.

Returns:

None

Notes

  • This method operates on instance-specific attributes.

Example

# Instantiate the class
ioda_instance = IodaObsSpace(eva_class_name, eva_logger, timing)

# Execute data collection processing using IODA observation space
ioda_instance.execute(dataset_config, data_collections, timing)

generate_default_config(filenames, collection_name)[source]#

Generates a default configuration dictionary for IODA observation space.

The returned dictionary sets default values for file names, groups, missing value threshold, and collection name.

Parameters:
  • filenames (list) – List of filenames for the data collection.

  • collection_name (str) – Name of the data collection.

Returns:

A dictionary containing default configuration settings.

Return type:

dict

Notes

  • This method operates on instance-specific attributes.

Example

# Instantiate the class
ioda_instance = IodaObsSpace(eva_class_name, eva_logger, timing)

# Generate a default configuration dictionary for IODA observation space
default_config = ioda_instance.generate_default_config(filenames, collection_name)

eva.data.ioda_obs_space.subset_channels(ds, channels)[source]#

Subsets a dataset to include specific channels, if provided.

This function subsets a dataset based on the provided channel numbers. It can be used to retain only a subset of channels from the dataset while potentially resetting the dimension in the dataset.

Parameters:
  • ds (xarray.Dataset) – The input dataset to be subsetted.

  • channels (list-like) – List of channel numbers to retain.

Returns:

The subsetted dataset containing only the specified channels.

Return type:

xarray.Dataset

Notes

  • If the dataset contains a dimension named ‘Channel’, the function will attempt to subset based on this dimension.

  • If no ‘channels’ are provided, all channels in the dataset will be retained.

  • If the number of requested channels is less than the number of channels in the dataset, the function will perform the subset operation.

Example

# Subset the dataset 'data' to include only channels 1, 5 and 10:
subset_ds = subset_channels(data, [1, 5, 10])

eva.data.jedi_log module#

class eva.data.jedi_log.JediLog(eva_class_name, eva_logger, timing)[source]#

Bases: EvaDatasetBase

A class for handling Jedi log data.

Initialize the base class for eva dataset objects.

Parameters:
  • eva_class_name (str) – Name of the eva class to instantiate.

  • eva_logger (Logger) – Logger instance for logging messages.

  • timing (Timing) – Timing instance for performance measurement.

execute(dataset_config, data_collections, timing)[source]#

Executes the processing of Jedi log data.

Parameters:
  • dataset_config (dict) – Configuration dictionary for the dataset.

  • data_collections (DataCollections) – Object for managing data collections.

  • timing (Timing) – Timing object for tracking execution time.

generate_default_config(filenames, collection_name)[source]#

Generates a default configuration for Jedi log data ingest.

Parameters:
  • filenames (list) – List of file names.

  • collection_name (str) – Name of the data collection.

Returns:

Default configuration dictionary.

Return type:

dict

get_from_log(search_term, separator, position, custom_log=None)[source]#

Searches the Jedi log for a specified term and extracts the corresponding data.

Parameters:
  • search_term (str) – Search term to look for in the Jedi log.

  • separator (str) – Separator used to split the log line.

  • position (int) – Position of the desired data after splitting.

  • custom_log – Custom log to search in (optional).

Returns:

Extracted data value or None if not found.

Return type:

str

get_matching_chunks(search_terms)[source]#

Finds log chunks that match a list of search terms.

Parameters:

search_terms (list) – List of search terms to match in log chunks.

Returns:

List of matching log chunks.

Return type:

list

parse_convergence()[source]#

Parses convergence data from the Jedi log.

Returns:

Dataset containing the parsed convergence data.

Return type:

xr.Dataset

eva.data.jedi_log.get_data_from_line(jedi_log_line, search_term, separator, position)[source]#

Extracts data from a line in a Jedi log based on the specified search term, separator, and position.

Parameters:
  • jedi_log_line (str) – Line from the Jedi log.

  • search_term (str) – Search term to look for in the line.

  • separator (str) – Separator used to split the line.

  • position (int) – Position of the desired data after splitting.

Returns:

Extracted data value or None if not found.

Return type:

str
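
The search, split, and select steps described above can be sketched in a few lines. This is a hedged illustration rather than the eva implementation: the function name get_data_from_line_sketch, the whitespace stripping, the out-of-range handling, and the sample log line are all assumptions.

```python
def get_data_from_line_sketch(jedi_log_line, search_term, separator, position):
    """Return the piece at `position` after splitting, or None if not found."""
    if search_term not in jedi_log_line:
        return None
    parts = jedi_log_line.split(separator)
    # Assumption: an out-of-range position is treated as 'not found'
    if not -len(parts) <= position < len(parts):
        return None
    return parts[position].strip()


# Hypothetical log line, invented for this example
line = "  Quadratic cost function: J (1) = 41.6"
value = get_data_from_line_sketch(line, "Quadratic cost function", "=", 1)
missing = get_data_from_line_sketch(line, "Norm reduction", "=", 1)
```

The value comes back as a string, matching the documented return type, so callers convert it to a number themselves when needed.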

eva.data.lat_lon module#

class eva.data.lat_lon.LatLon(eva_class_name, eva_logger, timing)[source]#

Bases: EvaDatasetBase

A class for handling LatLon dataset configuration and processing.

Initialize the base class for eva dataset objects.

Parameters:
  • eva_class_name (str) – Name of the eva class to instantiate.

  • eva_logger (Logger) – Logger instance for logging messages.

  • timing (Timing) – Timing instance for performance measurement.

execute(dataset_config, data_collections, timing)[source]#

Executes the processing of LatLon dataset.

Parameters:
  • dataset_config (dict) – Configuration dictionary for the dataset.

  • data_collections (DataCollections) – Object for managing data collections.

  • timing (Timing) – Timing object for tracking execution time.

generate_default_config(filenames, collection_name)[source]#

Generates a default configuration for LatLon dataset.

Parameters:
  • filenames (list) – List of file names.

  • collection_name (str) – Name of the data collection.

Returns:

Default configuration dictionary.

Return type:

dict

eva.data.mon_data_space module#

class eva.data.mon_data_space.MonDataSpace(eva_class_name, eva_logger, timing)[source]#

Bases: EvaDatasetBase

A class for handling MonDataSpace dataset configuration and processing.

Initialize the base class for eva dataset objects.

Parameters:
  • eva_class_name (str) – Name of the eva class to instantiate.

  • eva_logger (Logger) – Logger instance for logging messages.

  • timing (Timing) – Timing instance for performance measurement.

execute(dataset_config, data_collections, timing)[source]#

Executes the processing of MonDataSpace dataset.

Parameters:
  • dataset_config (dict) – Configuration dictionary for the dataset.

  • data_collections (DataCollections) – Object for managing data collections.

  • timing (Timing) – Timing object for tracking execution time.

generate_default_config(filenames, collection_name, control_file)[source]#

Generates a default configuration for MonDataSpace dataset.

Parameters:
  • filenames (list) – List of file names.

  • collection_name (str) – Name of the data collection.

  • control_file (str) – Path to the control file.

Returns:

Default configuration dictionary.

Return type:

dict

get_ctl_dict(control_file)[source]#

Parse the control file and extract information into dictionaries.

Parameters:

control_file (str) – Path to the control file.

Returns:
  • dict – Dictionary containing various coordinates and information.

  • dict – Dictionary containing dimension sizes.

  • dict – Dictionary containing sensor and satellite attributes.

  • int – Number of variables.

  • list – List of variable names.

  • list – List of scan positions.

  • dict – Dictionary containing channel information.

  • dict – Dictionary containing level information.

Return type:

tuple

get_dim_ranges(coords, dims, channo)[source]#

Get the valid ranges for each dimension based on the specified coordinates and channel numbers.

Parameters:
  • coords (dict) – Dictionary of coordinates.

  • dims (dict) – Dictionary of dimension sizes.

  • channo (list) – List of channel numbers.

Returns:
  • numpy.ndarray or None – Valid x coordinate range or None.

  • numpy.ndarray or None – Valid y coordinate range or None.

  • numpy.ndarray or None – Valid z coordinate range or None.

Return type:

tuple

get_ndims_used(dims)[source]#

Determine the number of dimensions used based on the provided dimension sizes.

Parameters:

dims (dict) – Dictionary of dimension sizes.

Returns:
  • int – Number of dimensions used.

  • list – List of dimension names used.

Return type:

tuple
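
A minimal sketch of this idea follows. It assumes that a dimension counts as "used" when its size is greater than 1, which is an interpretation and not confirmed by the text above. The function name get_ndims_used_sketch and the dimension keys in the sample dictionary are illustrative.

```python
def get_ndims_used_sketch(dims):
    """Return (number of used dimensions, list of their names).

    Assumption: a dimension is 'used' when its size is greater than 1.
    """
    used = [name for name, size in dims.items() if size > 1]
    return len(used), used


# Hypothetical dimension sizes
ndims_used, dims_arr = get_ndims_used_sketch({"xdef": 5, "ydef": 1, "zdef": 3})
```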

loadConditionalItems(dataset, chans_dict, levs_dict, scanpo)[source]#

Add channel, level, and scan related variables to the dataset.

Parameters:
  • dataset (xarray.Dataset) – Dataset to which variables will be added.

  • chans_dict (dict) – Dictionary of channel components.

  • levs_dict (dict) – Dictionary of level components.

  • scanpo (list) – List of scan positions.

Returns:

Dataset with added scan-related variables.

Return type:

xarray.Dataset

load_dset(vars, nvars, coords, darr, dims, ndims_used, dims_arr, x_range, y_range, z_range, cyc_darr, channo)[source]#

Create a dataset from various components.

Parameters:
  • vars (list) – List of variable names.

  • nvars (int) – Number of variables.

  • coords (dict) – Dictionary of coordinates.

  • darr (numpy.ndarray) – Numpy array of data.

  • dims (dict) – Dictionary of dimension sizes.

  • ndims_used (int) – Number of dimensions used.

  • dims_arr (list) – List of dimension names used.

  • x_range (numpy.ndarray or None) – Valid x coordinate range.

  • y_range (numpy.ndarray or None) – Valid y coordinate range.

  • z_range (numpy.ndarray or None) – Valid z coordinate range.

  • cyc_darr (numpy.ndarray) – Numpy array of cycle data.

  • channo (list) – List of channel numbers.

Returns:

Created dataset.

Return type:

xarray.Dataset

read_ieee(file_name, coords, dims, ndims_used, dims_arr, nvars, vars, file_path=None)[source]#

Read data from an IEEE file and arrange it into a numpy array.

Parameters:
  • file_name (str) – Name of the IEEE file to read.

  • coords (dict) – Dictionary of coordinates.

  • dims (dict) – Dictionary of dimension sizes.

  • ndims_used (int) – Number of dimensions used.

  • dims_arr (list) – List of dimension names used.

  • nvars (int) – Number of variables.

  • vars (list) – List of variable names.

  • file_path (str, optional) – Path to the directory containing the file. Defaults to None.

Returns:
  • numpy.ndarray – Numpy array containing the read data.

  • datetime.datetime – Cycle time extracted from the filename.

Return type:

tuple

subset_coordinate(ds, coordinate, requested_subset, chans_dict)[source]#

Subset the input dataset along the specified coordinate dimension and update channel information.

Parameters:
  • ds (xarray.Dataset) – Input dataset to be subset.

  • coordinate (str) – Name of the coordinate dimension to subset.

  • requested_subset (list) – List of values to keep along the specified coordinate.

  • chans_dict (dict) – Dictionary of channel components.

Returns:
  • xarray.Dataset – Subset of the input dataset.

  • dict – Updated dictionary of channel components (chans_dict).

Return type:

tuple

var_to_np_array(dims, ndims_used, dims_arr, var)[source]#

Create a numpy array with specified dimensions and fill it with a given value.

Parameters:
  • dims (dict) – Dictionary of dimension sizes.

  • ndims_used (int) – Number of dimensions used.

  • dims_arr (list) – List of dimension names used.

  • var – Value to fill the array with.

Returns:

Numpy array with the requested dimensions and filled with the given value.

Return type:

numpy.ndarray

eva.data.soca_restart module#

class eva.data.soca_restart.SocaRestart(eva_class_name, eva_logger, timing)[source]#

Bases: EvaDatasetBase

A class for reading and processing SOCA restart data.

This class inherits from EvaDatasetBase and provides methods to read and process SOCA restart data, including orographic fields and SOCA variables. The processed data is added to the data collections.

Parameters:

EvaDatasetBase (class) – The base class for eva dataset operations.

execute(dataset_config, data_collections, timing)[source]#

Process SOCA restart data and add it to the data collections.

Parameters:
  • dataset_config (dict) – Configuration for the dataset.

  • data_collections (DataCollections) – Data collections to which the processed data will be added.

  • timing – Timing information.

generate_default_config(filenames, collection_name)[source]#

Generate the default configuration for the dataset.

Parameters:
  • filenames – Filenames.

  • collection_name – Name of the collection.

Initialize the base class for eva dataset objects.

Parameters:
  • eva_class_name (str) – Name of the eva class to instantiate.

  • eva_logger (Logger) – Logger instance for logging messages.

  • timing (Timing) – Timing instance for performance measurement.

execute(dataset_config, data_collections, timing)[source]#

Process SOCA restart data and add it to the data collections.

Parameters:
  • dataset_config (dict) – Configuration for the dataset.

  • data_collections (DataCollections) – Data collections to which the processed data will be added.

  • timing (Timing) – Timing information.

generate_default_config(filenames, collection_name)[source]#

Generate a default configuration for the dataset.

This method generates a default configuration for the dataset based on the provided filenames and collection name. It can be used as a starting point for creating a configuration for the dataset.

Parameters:
  • filenames – Filenames or file paths relevant to the dataset.

  • collection_name (str) – Name of the collection for the dataset.

Returns:

A dictionary representing the default configuration for the dataset.

Return type:

dict

eva.data.soca_restart.read_soca(file, variable, logger)[source]#

Read SOCA data from the specified file for the given variable.

Parameters:
  • file (str) – Path to the SOCA data file.

  • variable (str) – Name of the variable to read.

  • logger (Logger) – Logger for logging messages.

Returns:

A tuple containing dimensions (list) and data (numpy.ndarray) for the specified variable.

Return type:

tuple

Module contents#