NWB Helpers#

Collection of Pydantic models and helper functions for configuring dataset IO parameters for different backends.

class BackendConfiguration(/, **data: 'Any') 'None'[source]#

Bases: BaseModel

A model for matching collections of DatasetConfigurations to a specific backend.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

apply_global_compression(compression_method: str, compression_options: dict[str, Any] | None = None) None[source]#

Apply compression settings to all datasets in this backend configuration.

This method modifies the backend configuration in-place, applying the specified compression method and options to ALL datasets, regardless of their current compression settings.

Parameters:
  • compression_method (str) – The compression method to apply to all datasets (e.g., “gzip”, “Blosc”, “Zstd”).

  • compression_options (dict, optional) – Additional compression options to apply. The available options depend on the compression method chosen.

Raises:

ValueError – If the compression method is not available for this backend type.

Examples

>>> backend_config = get_default_backend_configuration(nwbfile, backend="hdf5")
>>> backend_config.apply_global_compression("Blosc", {"cname": "zstd", "clevel": 5})
build_remapped_backend(locations_to_remap: dict[str, DatasetIOConfiguration]) Self[source]#

Build a remapped backend configuration by updating mismatched object IDs.

This function takes a dictionary of new DatasetIOConfiguration objects (as returned by find_locations_requiring_remapping) and updates a copy of the current configuration with these new configurations.

Parameters:

locations_to_remap (dict) – A dictionary mapping locations in the NWBFile to their corresponding new DatasetIOConfiguration objects with updated IDs.

Returns:

A new instance of the backend configuration class with updated object IDs for the specified locations.

Return type:

Self

find_locations_requiring_remapping(nwbfile: NWBFile) dict[str, DatasetIOConfiguration][source]#

Find locations of objects with mismatched IDs in the file.

This function identifies neurodata objects in the nwbfile that have matching locations with the current configuration but different object IDs. It returns a dictionary of remapped DatasetIOConfiguration objects for these mismatched locations.

Parameters:

nwbfile (pynwb.NWBFile) – The NWBFile object to check for mismatched object IDs.

Returns:

A dictionary where:

  • Keys: locations in the NWB file of objects with mismatched IDs.

  • Values: new DatasetIOConfiguration objects corresponding to the updated object IDs.

Return type:

dict[str, DatasetIOConfiguration]

Notes

  • This function only checks for objects with the same location but different IDs.

  • It does not identify objects missing from the current configuration.

  • The returned DatasetIOConfiguration objects are copies of the original configurations with updated object_id fields.
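
A minimal sketch of how these two methods are typically combined (assuming backend_configuration was built from an earlier in-memory NWBFile and nwbfile is a re-created file with the same layout but new object IDs):

# Detect datasets whose locations match but whose object IDs have changed.
locations_to_remap = backend_configuration.find_locations_requiring_remapping(nwbfile)

# Build a copy of the configuration with object IDs updated for those locations.
if locations_to_remap:
    backend_configuration = backend_configuration.build_remapped_backend(locations_to_remap)

# The remapped configuration can now be passed to configure_backend for the new nwbfile.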

classmethod from_nwbfile(nwbfile: NWBFile) Self[source]#

Create a backend configuration from an NWBFile with default chunking and compression settings.

Deprecated since version 0.8.4: The from_nwbfile method is deprecated and will be removed on or after June 2026. Use from_nwbfile_with_defaults or from_nwbfile_with_existing instead.

classmethod from_nwbfile_with_defaults(nwbfile: NWBFile) Self[source]#

Create a backend configuration from an NWBFile with default chunking and compression settings.

Parameters:

nwbfile (pynwb.NWBFile) – The NWBFile object to extract the backend configuration from.

Returns:

The backend configuration with default chunking and compression settings for each neurodata object in the NWBFile.

Return type:

Self
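
A brief usage sketch (assuming HDF5BackendConfiguration is importable from neuroconv.tools.nwb_helpers and nwbfile is an in-memory pynwb.NWBFile):

from neuroconv.tools.nwb_helpers import HDF5BackendConfiguration

# Build a configuration with default chunking and compression for every dataset.
backend_configuration = HDF5BackendConfiguration.from_nwbfile_with_defaults(nwbfile=nwbfile)
print(backend_configuration.dataset_configurations)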

classmethod from_nwbfile_with_existing(nwbfile: NWBFile) Self[source]#

Create a backend configuration from an NWBFile using existing dataset settings.

This method extracts existing chunking and compression settings from an NWBFile that has already been written to disk.

Parameters:

nwbfile (pynwb.NWBFile) – The NWBFile object to extract the backend configuration from.

Returns:

The backend configuration with existing chunking and compression settings for each neurodata object in the NWBFile.

Return type:

Self
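
A sketch of extracting the settings of a file already on disk (assuming the neuroconv.tools.nwb_helpers import path; the file path is illustrative):

from pynwb import NWBHDF5IO

from neuroconv.tools.nwb_helpers import HDF5BackendConfiguration

with NWBHDF5IO("existing.nwb", mode="r") as io:
    nwbfile = io.read()
    # Capture the chunking and compression already present in the written file.
    backend_configuration = HDF5BackendConfiguration.from_nwbfile_with_existing(nwbfile=nwbfile)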

model_config: ClassVar[ConfigDict] = {'validate_assignment': True}#

Configuration for the model, should be a dictionary conforming to pydantic.ConfigDict.

classmethod model_json_schema(**kwargs) dict[str, Any][source]#

Generates a JSON schema for a model class.

Parameters:
  • by_alias – Whether to use attribute aliases or not.

  • ref_template – The reference template.

  • union_format – The format to use when combining schemas from unions together. Can be one of: ‘any_of’, which uses the anyOf keyword to combine schemas (the default), or ‘primitive_type_array’, which uses the type keyword (https://json-schema.org/understanding-json-schema/reference/type) as an array of strings containing each type of the combination. If any of the schemas is not a primitive type (string, boolean, null, integer or number) or contains constraints/metadata, falls back to any_of.

  • schema_generator – To override the logic used to generate the JSON schema, as a subclass of GenerateJsonSchema with your desired modifications.

  • mode – The mode in which to generate the schema.

Returns:

The JSON schema for the given model class.

classmethod schema(**kwargs) dict[str, Any][source]#
classmethod schema_json(**kwargs) dict[str, Any][source]#
backend: ClassVar[Literal['hdf5', 'zarr']]#
pretty_backend_name: ClassVar[Literal['HDF5', 'Zarr']]#
data_io_class: ClassVar[type[DataIO]]#
dataset_configurations: dict[str, DatasetIOConfiguration]#
class HDF5BackendConfiguration(/, **data: 'Any') 'None'[source]#

Bases: BackendConfiguration

A model for matching collections of DatasetConfigurations specific to the HDF5 backend.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

backend: ClassVar[Literal['hdf5']] = 'hdf5'#
data_io_class#

alias of H5DataIO

model_config: ClassVar[ConfigDict] = {'validate_assignment': True}#

Configuration for the model, should be a dictionary conforming to pydantic.ConfigDict.

pretty_backend_name: ClassVar[Literal['HDF5']] = 'HDF5'#
dataset_configurations: dict[str, HDF5DatasetIOConfiguration]#
class ZarrBackendConfiguration(/, **data: 'Any') 'None'[source]#

Bases: BackendConfiguration

A model for matching collections of DatasetConfigurations specific to the Zarr backend.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

backend: ClassVar[Literal['zarr']] = 'zarr'#
data_io_class#

alias of ZarrDataIO

model_config: ClassVar[ConfigDict] = {'validate_assignment': True}#

Configuration for the model, should be a dictionary conforming to pydantic.ConfigDict.

pretty_backend_name: ClassVar[Literal['Zarr']] = 'Zarr'#
dataset_configurations: dict[str, ZarrDatasetIOConfiguration]#
number_of_jobs: int#
class DatasetIOConfiguration(/, **data: 'Any') 'None'[source]#

Bases: BaseModel, ABC

A data model for configuring options about an object that will become an HDF5 or Zarr Dataset in the file.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod from_neurodata_object(neurodata_object: Container, dataset_name: Literal['data', 'timestamps'], builder: BaseBuilder | None = None) Self[source]#

Construct an instance of a DatasetIOConfiguration for a dataset in a neurodata object in an NWBFile.

Parameters:
  • neurodata_object (hdmf.Container) – The neurodata object containing the field that will become a dataset when written to disk.

  • dataset_name (“data” or “timestamps”) – The name of the field that will become a dataset when written to disk. Some neurodata objects can have multiple such fields, such as pynwb.TimeSeries which can have both data and timestamps, each of which can be configured separately.

  • builder (hdmf.build.builders.BaseBuilder, optional) – The builder object that would be used to construct the NWBFile object. If None, the dataset is assumed to NOT have a compound dtype.

Deprecated since version 0.8.4: The from_neurodata_object method is deprecated and will be removed on or after June 2026. Use from_neurodata_object_with_defaults or from_neurodata_object_with_existing instead.

classmethod from_neurodata_object_with_defaults(neurodata_object: Container, dataset_name: Literal['data', 'timestamps'], builder: BaseBuilder | None = None) Self[source]#

Construct an instance of a DatasetIOConfiguration with default settings for a dataset in a neurodata object in an NWBFile.

Parameters:
  • neurodata_object (hdmf.Container) – The neurodata object containing the field that will become a dataset when written to disk.

  • dataset_name (“data” or “timestamps”) – The name of the field that will become a dataset when written to disk. Some neurodata objects can have multiple such fields, such as pynwb.TimeSeries which can have both data and timestamps, each of which can be configured separately.

  • builder (hdmf.build.builders.BaseBuilder, optional) – The builder object that would be used to construct the NWBFile object. If None, the dataset is assumed to NOT have a compound dtype.

abstractmethod classmethod from_neurodata_object_with_existing(neurodata_object: Container, dataset_name: Literal['data', 'timestamps']) Self[source]#

Construct an instance of a DatasetIOConfiguration from existing dataset settings.

This method extracts compression and chunking configurations from an already-written dataset. The neurodata object must have been read from an existing NWB file.

Parameters:
  • neurodata_object (hdmf.Container) – The neurodata object containing the field that has been read from disk.

  • dataset_name (“data” or “timestamps”) – The name of the field that corresponds to the dataset on disk.

Returns:

A DatasetIOConfiguration instance with settings matching the existing dataset.

Return type:

Self

abstractmethod get_data_io_kwargs() dict[str, Any][source]#

Fetch the properly structured dictionary of input arguments.

Should be passed directly as dynamic keyword arguments (**kwargs) into an H5DataIO or ZarrDataIO.
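
For example (a sketch assuming an HDF5 backend configuration is already in hand; the location key "acquisition/raw/data" and the raw_data_array variable are placeholders):

from hdmf.backends.hdf5 import H5DataIO

# Wrap a raw array with the chunking and compression described by one dataset configuration.
dataset_configuration = backend_configuration.dataset_configurations["acquisition/raw/data"]
wrapped_data = H5DataIO(data=raw_data_array, **dataset_configuration.get_data_io_kwargs())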

static get_dataset(neurodata_object: Container, dataset_name: Literal['data', 'timestamps']) Dataset | Array[source]#
static get_kwargs_from_neurodata_object(neurodata_object: Container, dataset_name: Literal['data', 'timestamps']) dict[source]#
model_config: ClassVar[ConfigDict] = {'validate_assignment': True}#

Configuration for the model, should be a dictionary conforming to pydantic.ConfigDict.

classmethod model_json_schema(**kwargs) dict[str, Any][source]#

Generates a JSON schema for a model class.

Parameters:
  • by_alias – Whether to use attribute aliases or not.

  • ref_template – The reference template.

  • union_format – The format to use when combining schemas from unions together. Can be one of: ‘any_of’, which uses the anyOf keyword to combine schemas (the default), or ‘primitive_type_array’, which uses the type keyword (https://json-schema.org/understanding-json-schema/reference/type) as an array of strings containing each type of the combination. If any of the schemas is not a primitive type (string, boolean, null, integer or number) or contains constraints/metadata, falls back to any_of.

  • schema_generator – To override the logic used to generate the JSON schema, as a subclass of GenerateJsonSchema with your desired modifications.

  • mode – The mode in which to generate the schema.

Returns:

The JSON schema for the given model class.

classmethod schema(**kwargs) dict[str, Any][source]#
classmethod schema_json(**kwargs) dict[str, Any][source]#
classmethod validate_all_shapes(values: dict[str, Any]) dict[str, Any][source]#
object_id: str#
location_in_file: str#
dataset_name: Literal['data', 'timestamps']#
dtype: Annotated[dtype, InstanceOf()]#
full_shape: tuple[int, ...]#
chunk_shape: tuple[Annotated[int, Gt(gt=0)], ...] | None#
buffer_shape: tuple[int, ...] | None#
compression_method: str | Annotated[FilterRefBase, InstanceOf()] | Annotated[Codec, InstanceOf()] | None#
compression_options: dict[str, Any] | None#
class HDF5DatasetIOConfiguration(/, **data: 'Any') 'None'[source]#

Bases: DatasetIOConfiguration

A data model for configuring options about an object that will become an HDF5 Dataset in the file.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod from_neurodata_object_with_existing(neurodata_object: Container, dataset_name: Literal['data', 'timestamps']) Self[source]#

Construct an HDF5DatasetIOConfiguration from existing dataset settings.

Parameters:
  • neurodata_object (hdmf.Container) – The neurodata object containing the field that has been read from disk.

  • dataset_name (“data” or “timestamps”) – The name of the field that corresponds to the dataset on disk.

Returns:

An HDF5DatasetIOConfiguration instance with settings matching the existing dataset.

Return type:

Self

get_data_io_kwargs() dict[str, Any][source]#

Fetch the properly structured dictionary of input arguments.

Should be passed directly as dynamic keyword arguments (**kwargs) into an H5DataIO or ZarrDataIO.

model_config: ClassVar[ConfigDict] = {'validate_assignment': True}#

Configuration for the model, should be a dictionary conforming to pydantic.ConfigDict.

compression_method: Literal['szip', 'lzf', 'gzip', 'Bitshuffle', 'Blosc', 'Blosc2', 'BZip2', 'FciDecomp', 'LZ4', 'Sperr', 'SZ', 'SZ3', 'Zfp', 'Zstd'] | Annotated[FilterRefBase, InstanceOf()] | None#
compression_options: dict[str, Any] | None#
class ZarrDatasetIOConfiguration(/, **data: 'Any') 'None'[source]#

Bases: DatasetIOConfiguration

A data model for configuring options about an object that will become a Zarr Dataset in the file.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod from_neurodata_object_with_existing(neurodata_object: Container, dataset_name: Literal['data', 'timestamps']) Self[source]#

Construct a ZarrDatasetIOConfiguration from existing dataset settings.

Parameters:
  • neurodata_object (hdmf.Container) – The neurodata object containing the field that has been read from disk.

  • dataset_name (“data” or “timestamps”) – The name of the field that corresponds to the dataset on disk.

Returns:

A ZarrDatasetIOConfiguration instance with settings matching the existing dataset.

Return type:

Self

get_data_io_kwargs() dict[str, Any][source]#

Fetch the properly structured dictionary of input arguments.

Should be passed directly as dynamic keyword arguments (**kwargs) into an H5DataIO or ZarrDataIO.

model_config: ClassVar[ConfigDict] = {'validate_assignment': True}#

Configuration for the model, should be a dictionary conforming to pydantic.ConfigDict.

classmethod validate_filter_methods_and_options_length_match(values: dict[str, Any])[source]#
compression_method: Literal['blosc', 'shuffle', 'bz2', 'categorize', 'delta', 'zstd', 'gzip', 'fletcher32', 'jenkins_lookup3', 'packbits', 'lz4', 'lzma', 'zlib'] | Annotated[Codec, InstanceOf()] | None#
compression_options: dict[str, Any] | None#
filter_methods: list[Literal['blosc', 'shuffle', 'bz2', 'categorize', 'delta', 'zstd', 'gzip', 'fletcher32', 'jenkins_lookup3', 'packbits', 'lz4', 'lzma', 'zlib'] | Annotated[Codec, InstanceOf()]] | None#
filter_options: list[dict[str, Any]] | None#
get_default_backend_configuration(nwbfile: NWBFile, backend: Literal['hdf5', 'zarr']) HDF5BackendConfiguration | ZarrBackendConfiguration[source]#

Fill a default backend configuration to serve as a starting point for further customization.

get_default_dataset_io_configurations(nwbfile: NWBFile, backend: None | Literal['hdf5', 'zarr'] = None) Generator[DatasetIOConfiguration, None, None][source]#

Generate DatasetIOConfiguration objects for wrapping NWB file objects with a specific backend.

This function automatically detects all objects in an NWB file that can be wrapped in an hdmf.DataIO. If the NWB file is in append mode, the backend can be auto-detected; otherwise, a backend must be specified.

Parameters:
  • nwbfile (pynwb.NWBFile) – An in-memory NWBFile object, either generated from the base class or read from an existing file of any backend.

  • backend (“hdf5” or “zarr”, optional) – Which backend format type you would like to use in configuring each dataset’s compression methods and options. May be omitted when the NWB file is in append mode, in which case the backend is auto-detected.

Yields:

DatasetIOConfiguration – A summary of each detected object that can be wrapped in an hdmf.DataIO.
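
A short sketch of iterating over the generated configurations (assuming the neuroconv.tools.nwb_helpers import path and an in-memory nwbfile):

from neuroconv.tools.nwb_helpers import get_default_dataset_io_configurations

for dataset_io_configuration in get_default_dataset_io_configurations(nwbfile=nwbfile, backend="zarr"):
    # Each configuration records where the dataset lives and how it will be chunked and compressed.
    print(dataset_io_configuration.location_in_file, dataset_io_configuration.chunk_shape)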

get_existing_backend_configuration(nwbfile: NWBFile) HDF5BackendConfiguration | ZarrBackendConfiguration[source]#

Extract the existing backend configuration from an already-written NWBFile to serve as a starting point for further customization.

Parameters:

nwbfile (NWBFile) – The NWBFile object to extract the backend configuration from. The nwbfile must have been read from an io object to work properly.

Returns:

The backend configuration extracted from the nwbfile.

Return type:

HDF5BackendConfiguration | ZarrBackendConfiguration

get_existing_dataset_io_configurations(nwbfile: NWBFile) Generator[DatasetIOConfiguration, None, None][source]#

Generate DatasetIOConfiguration objects for each neurodata object in an nwbfile.

Parameters:
  • nwbfile (pynwb.NWBFile) – An NWBFile object that has been read from an existing file with an existing backend configuration.

Yields:

DatasetIOConfiguration – A configuration object for each dataset in the NWB file.

configure_backend(nwbfile: NWBFile, backend_configuration: HDF5BackendConfiguration | ZarrBackendConfiguration) None[source]#

Configure all datasets specified in the backend_configuration with their appropriate DataIO and options.

Parameters:
  • nwbfile (pynwb.NWBFile) – The in-memory pynwb.NWBFile object to configure.

  • backend_configuration (HDF5BackendConfiguration or ZarrBackendConfiguration) – The configuration model to use when configuring the datasets for this backend.
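
A minimal end-to-end sketch (assuming the neuroconv.tools.nwb_helpers import path; the TimeSeries contents and file path are illustrative only):

from datetime import datetime, timezone

import numpy as np
from pynwb import NWBFile, NWBHDF5IO, TimeSeries

from neuroconv.tools.nwb_helpers import configure_backend, get_default_backend_configuration

# Build a small in-memory file with one acquisition TimeSeries.
nwbfile = NWBFile(
    session_description="demo",
    identifier="demo-id",
    session_start_time=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
nwbfile.add_acquisition(
    TimeSeries(name="raw", data=np.random.rand(10_000, 8), unit="volts", rate=30_000.0)
)

# Start from the default configuration, optionally tweak individual dataset_configurations, then apply.
backend_configuration = get_default_backend_configuration(nwbfile=nwbfile, backend="hdf5")
configure_backend(nwbfile=nwbfile, backend_configuration=backend_configuration)

with NWBHDF5IO("example.nwb", mode="w") as io:
    io.write(nwbfile)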

add_device_from_metadata(nwbfile: NWBFile, modality: str = 'Ecephys', metadata: dict | None = None)[source]#

Add device information from metadata to NWBFile object.

Always ensures the nwbfile has at least one device; if the metadata list specifies multiple devices, all of them are created.

Parameters:
  • nwbfile (NWBFile) – NWBFile to which the new device information is to be added

  • modality (str) – Type of data recorded by the device. Options: “Ecephys” (default), “Icephys”, “Ophys”, or “Behavior”.

  • metadata (dict, optional) – Metadata info for constructing the NWBFile. Should be of the format:

    metadata[modality]['Device'] = [
        {
            'name': my_name,
            'description': my_description
        },
        ...
    ]
    

    Missing keys in an element of metadata['Ecephys']['Device'] will be auto-populated with defaults.
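
A usage sketch (assuming the neuroconv.tools.nwb_helpers import path; the device name and description are hypothetical):

from neuroconv.tools.nwb_helpers import add_device_from_metadata

metadata = {"Ecephys": {"Device": [{"name": "Probe0", "description": "32-channel silicon probe"}]}}
add_device_from_metadata(nwbfile=nwbfile, modality="Ecephys", metadata=metadata)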

configure_and_write_nwbfile(nwbfile: NWBFile, nwbfile_path: Annotated[Path, PathType(path_type=file)] | None = None, backend: Literal['hdf5', 'zarr'] | None = None, backend_configuration: BackendConfiguration | None = None) None[source]#

Write an NWB file using a specific backend or backend configuration.

A backend or a backend_configuration must be provided. To use the default backend configuration for the specified backend, provide only backend. To use a custom backend configuration, provide backend_configuration. If both are provided, backend must match backend_configuration.backend.

Parameters:
  • nwbfile (NWBFile)

  • nwbfile_path (FilePath | None, optional)

  • backend ({“hdf5”, “zarr”}, optional) – The type of backend used to create the file. This option uses the default backend_configuration for the specified backend. If no backend is specified, the backend_configuration is used.

  • backend_configuration (BackendConfiguration, optional) – Specifies the backend type and the chunking and compression parameters of each dataset. If no backend_configuration is specified, the default configuration for the specified backend is used.
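
A minimal sketch (assuming the neuroconv.tools.nwb_helpers import path and an in-memory nwbfile; the output path is illustrative):

from neuroconv.tools.nwb_helpers import configure_and_write_nwbfile

# Write using the default backend configuration for HDF5.
configure_and_write_nwbfile(nwbfile=nwbfile, nwbfile_path="example.nwb", backend="hdf5")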

get_default_nwbfile_metadata() DeepDict[source]#

Return a structure with default metadata values required for an NWBFile.

These standard defaults are:

metadata["NWBFile"]["session_description"] = "no description"
metadata["NWBFile"]["identifier"] = str(uuid.uuid4())

Proper conversions should override these fields prior to calling NWBConverter.run_conversion().

Returns:

A dictionary containing default metadata values for an NWBFile, including session description, identifier, and NeuroConv version information.

Return type:

DeepDict

get_module(nwbfile: NWBFile, name: str, description: str = None)[source]#

Check if a processing module with the given name exists in the NWBFile; if not, create it. Return the module.

Parameters:
  • nwbfile (NWBFile) – The NWB file to check or add the module to.

  • name (str) – The name of the processing module.

  • description (str, optional) – Description of the module. Only used if creating a new module.

Returns:

The existing or newly created processing module.

Return type:

ProcessingModule
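
A usage sketch (assuming the neuroconv.tools.nwb_helpers import path; the module name and description are illustrative):

from neuroconv.tools.nwb_helpers import get_module

behavior_module = get_module(nwbfile=nwbfile, name="behavior", description="Processed behavioral data.")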

make_nwbfile_from_metadata(metadata: dict) NWBFile[source]#

Make NWBFile from available metadata.

Parameters:

metadata (dict) – Dictionary containing metadata for creating the NWBFile. Must contain an ‘NWBFile’ key with required fields.

Returns:

A newly created NWBFile object initialized with the provided metadata.

Return type:

NWBFile
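
A sketch combining this with get_default_nwbfile_metadata (assuming the neuroconv.tools.nwb_helpers import path; note that NWBFile also requires a session_start_time, supplied here as an illustrative value):

from datetime import datetime, timezone

from neuroconv.tools.nwb_helpers import get_default_nwbfile_metadata, make_nwbfile_from_metadata

metadata = get_default_nwbfile_metadata()
metadata["NWBFile"]["session_description"] = "Example session"
metadata["NWBFile"]["session_start_time"] = datetime(2024, 1, 1, tzinfo=timezone.utc)
nwbfile = make_nwbfile_from_metadata(metadata=metadata)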

make_or_load_nwbfile(nwbfile_path: Annotated[Path, PathType(path_type=file)] | None = None, nwbfile: NWBFile | None = None, metadata: dict | None = None, overwrite: bool = False, backend: Literal['hdf5', 'zarr'] = 'hdf5', verbose: bool = False)[source]#

Context manager for automatically handling the decision of whether to write a new NWBFile or append to an existing one.

Parameters:
  • nwbfile_path (FilePath) – Path for where to write or load (if overwrite=False) the NWBFile. If specified, the context will always write to this location.

  • nwbfile (NWBFile, optional) – An in-memory NWBFile object to write to the location.

  • metadata (dict, optional) – Metadata dictionary with information used to create the NWBFile when one does not exist or overwrite=True.

  • overwrite (bool, default: False) – Whether to overwrite the NWBFile if one exists at the nwbfile_path. The default is False (append mode).

  • backend (“hdf5” or “zarr”, default: “hdf5”) – The type of backend used to create the file.

  • verbose (bool, default: False) – If nwbfile_path is specified, informs the user after a successful write operation.
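
A sketch of the context-manager usage (assuming the neuroconv.tools.nwb_helpers import path; metadata and the time_series object being added are placeholders):

from neuroconv.tools.nwb_helpers import make_or_load_nwbfile

with make_or_load_nwbfile(
    nwbfile_path="example.nwb", metadata=metadata, overwrite=True, backend="hdf5", verbose=True
) as nwbfile:
    nwbfile.add_acquisition(time_series)  # placeholder for whatever data is being added
# On exiting the context, the NWBFile is written to nwbfile_path.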

repack_nwbfile(*, nwbfile_path: Path, export_nwbfile_path: Path, backend: Literal['hdf5', 'zarr'] = 'hdf5', export_backend: Literal['hdf5', 'zarr', None] = None)[source]#

Repack an NWBFile with a new backend configuration.

Parameters:
  • nwbfile_path (Path) – Path to the NWB file to be repacked.

  • export_nwbfile_path (Path) – Path to export the repacked NWB file.

  • backend ({“hdf5”, “zarr”}, default: “hdf5”) – The type of backend used to read the file.

  • export_backend ({“hdf5”, “zarr”, None}, default: None) – The type of backend used to write the repacked file. If None, the same backend as the input file is used.
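
A usage sketch (assuming the neuroconv.tools.nwb_helpers import path; the file paths are illustrative):

from neuroconv.tools.nwb_helpers import repack_nwbfile

# Read an HDF5 file and export a repacked copy using the Zarr backend.
repack_nwbfile(
    nwbfile_path="original.nwb",
    export_nwbfile_path="repacked.nwb.zarr",
    backend="hdf5",
    export_backend="zarr",
)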