API Reference¶

Thyra's Python API centres on a single function: convert_msi. For most use cases, that is all you need. The remaining sections document configuration types, metadata objects, and base classes for advanced users who want to inspect results or extend Thyra with new formats.

Converting Data¶

convert_msi is the primary entry point: it detects the input format, reads metadata, and writes a SpatialData/Zarr directory.

Basic usage¶

from thyra import convert_msi

# Minimal -- auto-detects format, pixel size, resampling, and streaming
success = convert_msi("input.imzML", "output.zarr")

# With explicit parameters
success = convert_msi(
    "data/experiment.d",
    "output/experiment.zarr",
    dataset_id="hippocampus",
    pixel_size_um=10.0,
)

With resampling configuration¶

success = convert_msi(
    "input.imzML",
    "output.zarr",
    resampling_config={
        "method": "tic_preserving",
        "axis_type": "orbitrap",
        "target_bins": 50000,
    },
)

Multi-region dataset (select one region)¶

success = convert_msi(
    "data/slide.d",
    "output/tissue_only.zarr",
    region=0,  # convert only region 0
)

Force streaming for large datasets¶

success = convert_msi(
    "data/large_dataset.d",
    "output/large.zarr",
    streaming=True,
)

Full signature¶

`convert_msi(input_path: Union[str, Path], output_path: Union[str, Path], format_type: str = 'spatialdata', dataset_id: str = 'msi_dataset', pixel_size_um: Optional[float] = None, handle_3d: bool = False, resampling_config: Optional[Dict[str, Any]] = None, reader_options: Optional[Dict[str, Any]] = None, sparse_format: str = 'csc', include_optical: bool = True, streaming: Union[bool, Literal['auto']] = 'auto', region: Optional[int] = None, **kwargs: Any) -> bool` ¶

Convert MSI data to the specified format.

Provides automatic pixel size detection from metadata or accepts user-specified values.

Parameters:

Name	Type	Description	Default
`input_path`	`Union[str, Path]`	Path to input MSI data file or directory	required
`output_path`	`Union[str, Path]`	Path to the output directory (a Zarr store for SpatialData output)	required
`format_type`	`str`	Output format type	`'spatialdata'`
`dataset_id`	`str`	Identifier for the dataset	`'msi_dataset'`
`pixel_size_um`	`Optional[float]`	Pixel size in micrometers (None for auto)	`None`
`handle_3d`	`bool`	Whether to process as 3D data (default: False)	`False`
`resampling_config`	`Optional[Dict[str, Any]]`	Optional resampling configuration	`None`
`reader_options`	`Optional[Dict[str, Any]]`	Optional format-specific reader options. `intensity_threshold` (float): minimum intensity to include; default `None` (no filtering). `use_recalibrated_state` (bool): for Bruker data, use the active/recalibrated calibration; default `True`.	`None`
`sparse_format`	`str`	Sparse matrix format ('csc' or 'csr')	`'csc'`
`include_optical`	`bool`	Include optical images (default: True)	`True`
`streaming`	`Union[bool, Literal['auto']]`	Use the streaming converter for large datasets. `"auto"` (default): choose automatically, using streaming when the dataset is larger than 10 GB. `True`: force the streaming converter. `False`: force the standard converter.	`'auto'`
`region`	`Optional[int]`	For multi-region datasets (e.g. Bruker timsTOF), select a specific region number. None (default) converts all regions. Passed to the reader as reader_options["region"].	`None`
`**kwargs`	`Any`	Additional keyword arguments	`{}`
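The three `streaming` values can be read as a simple switch. The sketch below is not Thyra's implementation: `resolve_streaming` and `STREAMING_THRESHOLD_GB` are hypothetical names, and the only detail taken from this page is the 10 GB cutoff for `"auto"`.

```python
from typing import Literal, Union

# Assumption: cutoff taken from the parameter description (>10 GB).
STREAMING_THRESHOLD_GB = 10.0


def resolve_streaming(
    streaming: Union[bool, Literal["auto"]],
    estimated_size_gb: float,
) -> bool:
    """Hypothetical helper: decide whether the streaming converter is used."""
    if isinstance(streaming, bool):
        # True and False are explicit overrides, regardless of size.
        return streaming
    if streaming == "auto":
        return estimated_size_gb > STREAMING_THRESHOLD_GB
    raise ValueError(f"streaming must be True, False, or 'auto', got {streaming!r}")
```

How the real converter estimates dataset size is not documented here, so `estimated_size_gb` is left as a plain input.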

Returns:

Type	Description
`bool`	True if conversion was successful, False otherwise
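Because failure is reported through the return value rather than an exception, a batch script may want to turn `False` into an error. The following is a minimal sketch: `convert_or_raise` and `ConversionError` are hypothetical names, and the converter is passed in as an argument so the example runs without Thyra installed.

```python
from pathlib import Path
from typing import Any, Callable, Union

PathLike = Union[str, Path]


class ConversionError(RuntimeError):
    """Hypothetical exception; not part of Thyra."""


def convert_or_raise(
    convert: Callable[..., bool],
    input_path: PathLike,
    output_path: PathLike,
    **kwargs: Any,
) -> Path:
    """Call a convert_msi-style function and raise if it returns False."""
    if not convert(input_path, output_path, **kwargs):
        raise ConversionError(f"conversion of {input_path} failed")
    return Path(output_path)
```

In real use, `convert` would be `thyra.convert_msi`, for example `convert_or_raise(convert_msi, "input.imzML", "output.zarr")`.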

Resampling Configuration¶

When you pass resampling_config to convert_msi, the dictionary keys map to the fields of ResamplingConfig. You can pass a plain dict (as shown in the examples above) or construct the dataclass directly:

from thyra.resampling.types import ResamplingConfig, ResamplingMethod, AxisType

config = ResamplingConfig(
    method=ResamplingMethod.TIC_PRESERVING,
    axis_type=AxisType.ORBITRAP,
    target_bins=50000,
)

success = convert_msi("input.imzML", "output.zarr", resampling_config=config)

`ResamplingConfig(method: Optional[ResamplingMethod] = None, axis_type: Optional[AxisType] = None, target_bins: Optional[int] = None, mass_width_da: Optional[float] = None, reference_mz: float = 500.0, min_mz: Optional[float] = None, max_mz: Optional[float] = None)` `dataclass` ¶

Configuration for resampling operations.

All fields default to None (auto-detect from instrument metadata). You can override individual fields while leaving the rest automatic.

Attributes:

Name	Type	Description
`method`	`Optional[ResamplingMethod]`	Resampling algorithm. `None` auto-selects based on the instrument type.
`axis_type`	`Optional[AxisType]`	Mass axis spacing model. `None` auto-detects from the instrument metadata.
`target_bins`	`Optional[int]`	Number of bins in the resampled axis. `None` lets the resampler choose a bin count that preserves the native resolution.
`mass_width_da`	`Optional[float]`	Bin width in Daltons at `reference_mz`. Alternative to `target_bins` -- specify one or the other.
`reference_mz`	`float`	Reference m/z for `mass_width_da`. Default 500.0 Da.
`min_mz`	`Optional[float]`	Override the lower bound of the mass range.
`max_mz`	`Optional[float]`	Override the upper bound of the mass range.
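Since `target_bins` and `mass_width_da` are alternatives, it can help to see how a bin width translates into a bin count. The sketch below covers only the equidistant (`CONSTANT`) case. `bins_for_constant_axis` is a hypothetical helper, and raising when both options are given is an assumption, not documented Thyra behaviour.

```python
import math
from typing import Optional


def bins_for_constant_axis(
    min_mz: float,
    max_mz: float,
    target_bins: Optional[int] = None,
    mass_width_da: Optional[float] = None,
) -> Optional[int]:
    """Hypothetical helper: resolve the bin count for equidistant spacing.

    Returns None when neither option is given (left to auto-detection).
    """
    if target_bins is not None and mass_width_da is not None:
        raise ValueError("specify target_bins or mass_width_da, not both")
    if mass_width_da is not None:
        # With constant spacing the width is the same everywhere,
        # so the count is the mass range divided by the width.
        return math.ceil((max_mz - min_mz) / mass_width_da)
    return target_bins
```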

`ResamplingMethod` ¶

Bases: Enum

Available resampling methods.

Attributes:

Name	Type	Description
`NONE`		No resampling -- keep the original mass axis.
`NEAREST_NEIGHBOR`		Snap each peak to the nearest target bin.
`TIC_PRESERVING`		Redistribute intensity so the total ion count is preserved after rebinning (recommended for quantitative work).
`LINEAR_INTERPOLATION`		Linear interpolation between neighbouring bins.

`LINEAR_INTERPOLATION = 'linear_interpolation'` `class-attribute` `instance-attribute` ¶

`NEAREST_NEIGHBOR = 'nearest_neighbor'` `class-attribute` `instance-attribute` ¶

`NONE = 'none'` `class-attribute` `instance-attribute` ¶

`TIC_PRESERVING = 'tic_preserving'` `class-attribute` `instance-attribute` ¶
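The string values listed above are the ones used in the dict form of `resampling_config`. The sketch below mirrors the enum locally so it runs without Thyra; whether `convert_msi` performs exactly this lookup internally is an assumption.

```python
from enum import Enum


class ResamplingMethod(Enum):
    """Local mirror of the values listed above, for illustration only."""

    NONE = "none"
    NEAREST_NEIGHBOR = "nearest_neighbor"
    TIC_PRESERVING = "tic_preserving"
    LINEAR_INTERPOLATION = "linear_interpolation"


# Enum lookup by value turns a config string into a member.
method = ResamplingMethod("tic_preserving")

# A string that matches no member raises ValueError.
try:
    ResamplingMethod("cubic")
    unknown_rejected = False
except ValueError:
    unknown_rejected = True
```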

`AxisType` ¶

Bases: Enum

Mass axis spacing model, determined by the analyser physics.

The axis type controls how target bins are distributed across the mass range. When set to None in `ResamplingConfig`, the type is auto-detected from instrument metadata.

Attributes:

Name	Type	Description
`CONSTANT`		Equidistant spacing (constant Da per bin).
`LINEAR_TOF`		Linear TOF -- spacing proportional to `sqrt(m/z)`.
`REFLECTOR_TOF`		Reflector TOF -- spacing proportional to `m/z`.
`ORBITRAP`		Orbitrap -- spacing proportional to `m/z^(3/2)`.
`FTICR`		FTICR -- spacing proportional to `m/z^2`.
`UNKNOWN`		Unknown analyser; falls back to constant spacing.

`CONSTANT = 'constant'` `class-attribute` `instance-attribute` ¶

`FTICR = 'fticr'` `class-attribute` `instance-attribute` ¶

`LINEAR_TOF = 'linear_tof'` `class-attribute` `instance-attribute` ¶

`ORBITRAP = 'orbitrap'` `class-attribute` `instance-attribute` ¶

`REFLECTOR_TOF = 'reflector_tof'` `class-attribute` `instance-attribute` ¶

`UNKNOWN = 'unknown'` `class-attribute` `instance-attribute` ¶
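The proportionalities in the `AxisType` table can be turned into a small calculation: given a bin width at a reference m/z, the width elsewhere scales with a power of m/z. This is a sketch of that arithmetic only. `bin_width_at` and `SPACING_EXPONENT` are hypothetical names, and how Thyra builds its axes internally is not shown on this page.

```python
# Exponents from the AxisType table: bin width is proportional to (m/z) ** exponent.
SPACING_EXPONENT = {
    "constant": 0.0,
    "linear_tof": 0.5,
    "reflector_tof": 1.0,
    "orbitrap": 1.5,
    "fticr": 2.0,
    "unknown": 0.0,  # falls back to constant spacing
}


def bin_width_at(
    mz: float,
    axis_type: str,
    width_at_reference: float,
    reference_mz: float = 500.0,
) -> float:
    """Hypothetical helper: scale a reference bin width to another m/z."""
    exponent = SPACING_EXPONENT[axis_type]
    return width_at_reference * (mz / reference_mz) ** exponent
```

For example, an Orbitrap bin that is 0.01 Da wide at m/z 500 works out to 0.08 Da at m/z 2000, because (2000 / 500)^(3/2) = 8.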

Metadata Types¶

Readers expose metadata through two dataclasses. EssentialMetadata contains everything needed for conversion decisions (grid size, mass range, memory estimate). ComprehensiveMetadata wraps essential metadata and adds vendor-specific details for provenance and QC.

from thyra.readers.imzml import ImzMLReader

with ImzMLReader("sample.imzML") as reader:
    meta = reader.get_essential_metadata()
    print(f"Grid: {meta.dimensions}")
    print(f"m/z range: {meta.mass_range}")
    print(f"Spectra: {meta.n_spectra}")
    print(f"Est. memory: {meta.estimated_memory_gb:.1f} GB")

`EssentialMetadata(dimensions: Tuple[int, int, int], coordinate_bounds: Tuple[float, float, float, float], mass_range: Tuple[float, float], pixel_size: Optional[Tuple[float, float]], n_spectra: int, total_peaks: int, estimated_memory_gb: float, source_path: str, coordinate_offsets: Optional[Tuple[int, int, int]] = None, spectrum_type: Optional[str] = None, peak_counts_per_pixel: Optional[NDArray[np.int32]] = None)` `dataclass` ¶

Critical metadata for processing decisions and interpolation setup.

Attributes:

Name	Type	Description
`dimensions`	`Tuple[int, int, int]`	Grid dimensions as `(x, y, z)`.
`coordinate_bounds`	`Tuple[float, float, float, float]`	Spatial extent as `(min_x, max_x, min_y, max_y)`.
`mass_range`	`Tuple[float, float]`	Mass-to-charge range as `(min_mz, max_mz)`.
`pixel_size`	`Optional[Tuple[float, float]]`	Pixel dimensions as `(x_um, y_um)` in micrometres, or `None` when not detected.
`n_spectra`	`int`	Total number of spectra in the dataset.
`total_peaks`	`int`	Total number of peaks across all spectra (used for sparse matrix pre-allocation).
`estimated_memory_gb`	`float`	Estimated dense memory footprint in GB.
`source_path`	`str`	Absolute path to the source data.
`coordinate_offsets`	`Optional[Tuple[int, int, int]]`	Raw coordinate offsets `(x, y, z)` used to normalise coordinates to 0-based indexing.
`spectrum_type`	`Optional[str]`	Spectrum type string (e.g. `"centroid spectrum"`), used to guide resampling decisions.
`peak_counts_per_pixel`	`Optional[NDArray[int32]]`	Per-pixel peak counts for CSR `indptr` construction in the streaming converter. Array of size `n_pixels` where `arr[pixel_idx] = peak_count` and `pixel_idx = z * (n_x * n_y) + y * n_x + x`.
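The `peak_counts_per_pixel` description combines two pieces of arithmetic: flattening a coordinate into a pixel index, and turning per-pixel counts into a CSR row-pointer array. A minimal sketch of both follows. The function names are hypothetical, and plain lists are used in place of NumPy arrays to keep the example dependency-free.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def pixel_index(x: int, y: int, z: int, dimensions: Tuple[int, int, int]) -> int:
    """Flatten 0-based (x, y, z) using the formula from the table above."""
    n_x, n_y, _n_z = dimensions
    return z * (n_x * n_y) + y * n_x + x


def indptr_from_counts(peak_counts: Sequence[int]) -> List[int]:
    """Build a CSR row pointer: a leading 0 followed by cumulative counts."""
    return [0, *accumulate(peak_counts)]
```

With this layout, the peaks of pixel `i` occupy positions `indptr[i]` up to (but not including) `indptr[i + 1]` in the flat data arrays.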

`has_pixel_size: bool` `property` ¶

Check if pixel size information is available.

`is_3d: bool` `property` ¶

Check if dataset is 3D (z > 1).
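The two properties are thin checks over the fields above. The sketch below uses a hypothetical stand-in class, `GridInfo`, carrying only the two relevant fields. Treating "available" as "not `None`" is an assumption, while the `z > 1` rule comes from the description above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class GridInfo:
    """Hypothetical stand-in carrying two EssentialMetadata fields."""

    dimensions: Tuple[int, int, int]
    pixel_size: Optional[Tuple[float, float]] = None

    @property
    def has_pixel_size(self) -> bool:
        # Assumption: "available" means the field is not None.
        return self.pixel_size is not None

    @property
    def is_3d(self) -> bool:
        # Documented rule: the dataset is 3D when z > 1.
        return self.dimensions[2] > 1
```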

`ComprehensiveMetadata(essential: EssentialMetadata, format_specific: Dict[str, Any], acquisition_params: Dict[str, Any], instrument_info: Dict[str, Any], raw_metadata: Dict[str, Any])` `dataclass` ¶

Complete metadata including format-specific details.

Wraps `EssentialMetadata` and adds vendor-specific information that is not needed for conversion but useful for provenance and QC.

Attributes:

Name	Type	Description
`essential`	`EssentialMetadata`	Core metadata required for conversion.
`format_specific`	`Dict[str, Any]`	Vendor-specific metadata (e.g. ImzML CV params, Bruker property tables).
`acquisition_params`	`Dict[str, Any]`	Acquisition parameters such as polarity, scan range, and laser settings.
`instrument_info`	`Dict[str, Any]`	Instrument model, serial number, and software version.
`raw_metadata`	`Dict[str, Any]`	Unprocessed metadata exactly as read from the source file, preserved for round-trip fidelity.

`coordinate_bounds: Tuple[float, float, float, float]` `property` ¶

Convenience access to coordinate bounds from essential metadata.

`dimensions: Tuple[int, int, int]` `property` ¶

Convenience access to dimensions from essential metadata.

`pixel_size: Optional[Tuple[float, float]]` `property` ¶

Convenience access to pixel size from essential metadata.

Reader Base Class¶

All format readers (ImzML, Bruker, Waters) inherit from this base class. If you are writing a custom reader for a new format, subclass BaseMSIReader and implement the abstract methods below.

`BaseMSIReader(data_path: Path, intensity_threshold: Optional[float] = None, **kwargs: object)` ¶

Bases: ABC

Abstract base class for reading MSI data formats.

Initialize the reader with the path to the data.

Parameters:

Name	Type	Description	Default
`data_path`	`Path`	Path to the data file or directory	required
`intensity_threshold`	`Optional[float]`	Minimum intensity value to include. Values below this threshold are filtered out during iteration. Useful for removing detector noise in continuous mode data. Default: None (no filtering, include all values).	`None`
`**kwargs`	`object`	Additional reader-specific parameters	`{}`

`has_shared_mass_axis: bool` `property` ¶

Check if all spectra share the same m/z axis.

For continuous ImzML data, all spectra have identical m/z values, so get_common_mass_axis() only needs to read the first spectrum. For processed/centroid data, each spectrum may have different m/z values, requiring iteration through all spectra.

Returns:

Type	Description
`bool`	True if all spectra share the same m/z axis (continuous mode), False if each spectrum has different m/z values (processed mode).

`get_essential_metadata() -> EssentialMetadata` ¶

Get essential metadata for processing.

`get_comprehensive_metadata() -> ComprehensiveMetadata` ¶

Get complete metadata.

`get_common_mass_axis() -> NDArray[np.float64]` `abstractmethod` ¶

Return the common mass axis for all spectra.

This method must always return a valid array. If no common mass axis can be created, implementations should raise an exception.
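The interplay between `has_shared_mass_axis` and `get_common_mass_axis()` can be sketched as follows. This is an illustration under stated assumptions, not how any Thyra reader is implemented: `common_mass_axis` is a hypothetical function, and taking the sorted union of all m/z values is simply the most direct strategy for processed data.

```python
from typing import Iterable, List, Sequence


def common_mass_axis(spectra: Iterable[Sequence[float]], shared: bool) -> List[float]:
    """Hypothetical sketch: build one m/z axis from per-spectrum m/z arrays."""
    iterator = iter(spectra)
    if shared:
        # Continuous mode: every spectrum has the same axis, so the first suffices.
        first = next(iterator, None)
        if first is None:
            raise ValueError("no spectra available to build a mass axis")
        return list(first)
    # Processed mode: collect the distinct m/z values across all spectra.
    unique = set()
    for mzs in iterator:
        unique.update(mzs)
    if not unique:
        # Mirrors the contract above: raise rather than return an invalid axis.
        raise ValueError("no spectra available to build a mass axis")
    return sorted(unique)
```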

`get_optical_image_paths() -> List[Path]` ¶

Get paths to optical/microscopy images associated with this data.

Returns list of TIFF file paths that contain optical images of the sample. These images can be stored alongside MSI data in SpatialData output for multimodal analysis.

Default implementation returns empty list. Subclasses should override to return paths to optical images specific to their format.

Returns:

Type	Description
`List[Path]`	List of paths to TIFF files, empty if no optical images available.

`iter_spectra(batch_size: Optional[int] = None) -> Generator[Tuple[Tuple[int, int, int], NDArray[np.float64], NDArray[np.float64]], None, None]` `abstractmethod` ¶

Iterate through spectra with optional batch processing.

Parameters:

Name	Type	Description	Default
`batch_size`	`Optional[int]`	Optional batch size for spectrum iteration	`None`

Yields:

Type	Description
`Tuple[Tuple[int, int, int], NDArray[float64], NDArray[float64]]`	Tuple of three items: coordinates `(x, y, z)` using 0-based indexing, the m/z values array, and the intensity values array.

Note

Subclasses should apply intensity threshold filtering by calling _apply_intensity_filter() on the intensities before yielding.
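The signature of `_apply_intensity_filter()` is not shown on this page, so the sketch below uses a hypothetical free function to illustrate the documented behaviour: values below the threshold are dropped, and `None` disables filtering. Dropping the matching m/z value together with each intensity is an assumption made here so that the two arrays stay aligned.

```python
from typing import List, Optional, Sequence, Tuple


def apply_intensity_filter(
    mzs: Sequence[float],
    intensities: Sequence[float],
    threshold: Optional[float],
) -> Tuple[List[float], List[float]]:
    """Hypothetical stand-in for the reader's filtering step."""
    if threshold is None:
        # No threshold configured: keep every value.
        return list(mzs), list(intensities)
    # Keep only the pairs whose intensity reaches the threshold.
    kept = [(mz, i) for mz, i in zip(mzs, intensities) if i >= threshold]
    return [mz for mz, _ in kept], [i for _, i in kept]
```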

`get_region_map() -> Optional[dict]` ¶

Get per-pixel region mapping for multi-region datasets.

Returns a dictionary mapping normalized (0-based) (x, y) coordinate tuples to integer region numbers. This enables the converter to annotate each pixel with its acquisition region in obs["region_number"].

Default implementation returns None (single-region or no region info). Subclasses should override when region information is available.

Returns:

Type	Description
`Optional[dict]`	Dict mapping (x, y) tuples to region numbers, or None if region information is not available.

`get_region_info() -> Optional[list]` ¶

Get summary information about acquisition regions.

Returns a list of dictionaries, each describing one region with at minimum: {"region_number": int, "n_spectra": int}. Additional keys (e.g. "name") are format-specific and optional.

Default implementation returns None (single-region or no region info). Subclasses should override when region information is available.

Returns:

Type	Description
`Optional[list]`	List of region summary dicts, or None if region information is not available.
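The two region methods describe the same information at different granularity. As a sketch of how they relate, the hypothetical helper below derives the minimal summary (`region_number`, `n_spectra`) from a per-pixel map, assuming one spectrum per mapped pixel. Real readers may obtain these values directly from the source file instead.

```python
from collections import Counter
from typing import Dict, List, Tuple

RegionMap = Dict[Tuple[int, int], int]


def summarize_regions(region_map: RegionMap) -> List[dict]:
    """Hypothetical helper: count spectra per region from a per-pixel map."""
    counts = Counter(region_map.values())
    return [
        {"region_number": region, "n_spectra": n}
        for region, n in sorted(counts.items())
    ]
```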

`close() -> None` `abstractmethod` ¶

Close all open file handles.

Converter Base Class¶

All output converters inherit from this base class. Currently only SpatialData output is supported, but the architecture allows adding new output formats by subclassing BaseMSIConverter.

`BaseMSIConverter(reader: BaseMSIReader, output_path: Union[str, Path, PathLike[str]], dataset_id: str = 'msi_dataset', pixel_size_um: float = 1.0, pixel_size_source: PixelSizeSource = PixelSizeSource.DEFAULT, compression_level: int = 5, handle_3d: bool = False, **kwargs: Any)` ¶

Bases: ABC

Base class for MSI data converters with shared functionality.

Implements common processing steps while allowing format-specific customization.

Initialize the MSI converter.

Parameters:

Name	Type	Description	Default
`reader`	`BaseMSIReader`	MSI data reader instance	required
`output_path`	`Union[str, Path, PathLike[str]]`	Path for output file	required
`dataset_id`	`str`	Identifier for the dataset	`'msi_dataset'`
`pixel_size_um`	`float`	Size of each pixel in micrometers	`1.0`
`pixel_size_source`	`PixelSizeSource`	How pixel size was determined	`DEFAULT`
`compression_level`	`int`	Compression level for output	`5`
`handle_3d`	`bool`	Whether to process as 3D data	`False`
`**kwargs`	`Any`	Additional keyword arguments	`{}`

`pixel_size_um = pixel_size_um` `instance-attribute` ¶

`pixel_size_source = pixel_size_source` `instance-attribute` ¶

`dataset_id = dataset_id` `instance-attribute` ¶

`handle_3d = handle_3d` `instance-attribute` ¶

`convert() -> bool` ¶

Template method defining the conversion workflow.

Returns:

Type	Description
`bool`	True if conversion was successful, False otherwise.
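A template method fixes the order of the conversion steps in the base class and leaves the steps themselves to subclasses. The sketch below shows the pattern in isolation. The class and hook names (`SketchConverter`, `_prepare`, `_write`) are hypothetical and are not Thyra's, and catching exceptions to return `False` is an assumption about how the boolean result is produced.

```python
from abc import ABC, abstractmethod
from typing import List


class SketchConverter(ABC):
    """Hypothetical converter; the hook names below are not Thyra's."""

    def convert(self) -> bool:
        # The base class fixes the order of the steps; subclasses fill them in.
        try:
            data = self._prepare()
            self._write(data)
        except Exception:
            # Report failure through the return value.
            return False
        return True

    @abstractmethod
    def _prepare(self) -> dict: ...

    @abstractmethod
    def _write(self, data: dict) -> None: ...


class InMemoryConverter(SketchConverter):
    """Toy subclass that 'writes' into a list instead of a file."""

    def __init__(self) -> None:
        self.written: List[dict] = []

    def _prepare(self) -> dict:
        return {"n_spectra": 3}

    def _write(self, data: dict) -> None:
        self.written.append(data)
```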

Format Detection and Plugin Registry¶

Thyra uses a registry to map file extensions and directory structures to the correct reader and converter classes. The public functions below let you detect formats programmatically or register your own reader/converter.

Detecting a format¶

from pathlib import Path
from thyra.core.registry import detect_format

fmt = detect_format(Path("experiment.imzML"))  # "imzml"
fmt = detect_format(Path("data.d"))            # "bruker" or "rapiflex"
fmt = detect_format(Path("data.raw"))          # "waters"

Registering a custom reader¶

from thyra.core.registry import register_reader
from thyra.core.base_reader import BaseMSIReader

@register_reader("my_format")
class MyFormatReader(BaseMSIReader):
    ...

`detect_format(input_path: Path) -> str` ¶

Detect MSI format from input path.

Parameters:

Name	Type	Description	Default
`input_path`	`Path`	Path to MSI data file or directory	required

Returns:

Type	Description
`str`	Format name ('imzml', 'bruker', 'rapiflex', or 'waters')
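The detection examples on this page imply a mapping from path suffix to possible formats, with `.d` being ambiguous between two of them. The sketch below captures only that suffix step. `candidate_formats` is a hypothetical helper, and whatever `detect_format` inspects to tell `'bruker'` from `'rapiflex'` is not covered here.

```python
from pathlib import Path
from typing import List

# Suffix-to-format candidates, taken from the detection examples on this page.
CANDIDATES = {
    ".imzml": ["imzml"],
    ".d": ["bruker", "rapiflex"],
    ".raw": ["waters"],
}


def candidate_formats(input_path: Path) -> List[str]:
    """Hypothetical helper: list formats a path's suffix could correspond to."""
    # Lower-case the suffix so that "experiment.imzML" matches ".imzml".
    return CANDIDATES.get(input_path.suffix.lower(), [])
```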

`register_reader(format_name: str)` ¶

Decorator for reader registration.

`register_converter(format_name: str)` ¶

Decorator for converter registration.
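A registration decorator of this kind is typically a function that takes the format name and returns a decorator which stores the class in a lookup table. The following is a generic sketch of that pattern, not Thyra's registry: `_READERS` and `get_reader_class` are hypothetical, and the local `register_reader` is a stand-in with the same call shape as the one documented above.

```python
from typing import Callable, Dict

# Hypothetical registry: maps a format name to its reader class.
_READERS: Dict[str, type] = {}


def register_reader(format_name: str) -> Callable[[type], type]:
    """Sketch of a registering decorator; not Thyra's implementation."""

    def decorator(cls: type) -> type:
        _READERS[format_name] = cls
        return cls  # return the class unchanged so it stays usable by name

    return decorator


def get_reader_class(format_name: str) -> type:
    """Hypothetical lookup counterpart to the decorator."""
    try:
        return _READERS[format_name]
    except KeyError:
        raise ValueError(f"no reader registered for {format_name!r}") from None


@register_reader("my_format")
class MyFormatReader:
    """Stand-in; a real reader would subclass BaseMSIReader."""
```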
