API Reference

Thyra's Python API centres on a single function: convert_msi. For most use cases, that is all you need. The remaining sections document configuration types, metadata objects, and base classes for advanced users who want to inspect results or extend Thyra with new formats.


Converting Data

convert_msi is the primary entry point. It detects the input format, reads metadata, and writes a SpatialData/Zarr directory.

Basic usage

from thyra import convert_msi

# Minimal -- auto-detects format, pixel size, resampling, and streaming
success = convert_msi("input.imzML", "output.zarr")

# With explicit parameters
success = convert_msi(
    "data/experiment.d",
    "output/experiment.zarr",
    dataset_id="hippocampus",
    pixel_size_um=10.0,
)

With resampling configuration

success = convert_msi(
    "input.imzML",
    "output.zarr",
    resampling_config={
        "method": "tic_preserving",
        "axis_type": "orbitrap",
        "target_bins": 50000,
    },
)

Multi-region dataset (select one region)

success = convert_msi(
    "data/slide.d",
    "output/tissue_only.zarr",
    region=0,  # convert only region 0
)

Force streaming for large datasets

success = convert_msi(
    "data/large_dataset.d",
    "output/large.zarr",
    streaming=True,
)

Full signature

convert_msi(input_path: Union[str, Path], output_path: Union[str, Path], format_type: str = 'spatialdata', dataset_id: str = 'msi_dataset', pixel_size_um: Optional[float] = None, handle_3d: bool = False, resampling_config: Optional[Dict[str, Any]] = None, reader_options: Optional[Dict[str, Any]] = None, sparse_format: str = 'csc', include_optical: bool = True, streaming: Union[bool, Literal['auto']] = 'auto', region: Optional[int] = None, **kwargs: Any) -> bool

Convert MSI data to the specified format.

Provides automatic pixel size detection from metadata or accepts user-specified values.

Parameters:

Name Type Description Default
input_path Union[str, Path]

Path to input MSI data file or directory

required
output_path Union[str, Path]

Path for output file

required
format_type str

Output format type (default: "spatialdata")

'spatialdata'
dataset_id str

Identifier for the dataset

'msi_dataset'
pixel_size_um Optional[float]

Pixel size in micrometers (None for auto)

None
handle_3d bool

Whether to process as 3D data (default: False)

False
resampling_config Optional[Dict[str, Any]]

Optional resampling configuration

None
reader_options Optional[Dict[str, Any]]

Optional format-specific reader options:

  • intensity_threshold: float. Minimum intensity to include. Default: None (no filtering).
  • use_recalibrated_state: bool. For Bruker data, use the active/recalibrated calibration. Default: True.

None
sparse_format str

Sparse matrix format ('csc' or 'csr')

'csc'
include_optical bool

Include optical images (default: True)

True
streaming Union[bool, Literal['auto']]

Use the streaming converter for large datasets.

  • "auto": auto-detect based on dataset size, streaming when it exceeds 10 GB (default)
  • True: force the streaming converter
  • False: force the standard converter

'auto'
region Optional[int]

For multi-region datasets (e.g. Bruker timsTOF), select a specific region number. None (default) converts all regions. Passed to the reader as reader_options["region"].

None
**kwargs Any

Additional keyword arguments

{}
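The three accepted values of streaming can be read as a small decision rule. The sketch below is an illustration only: `resolve_streaming` and its `dataset_size_gb` argument are hypothetical names, and how Thyra measures dataset size is not documented here. Only the 10 GB threshold and the three accepted values come from the table above.

```python
from typing import Literal, Union

AUTO_THRESHOLD_GB = 10.0  # threshold quoted in the parameter table


def resolve_streaming(
    streaming: Union[bool, Literal["auto"]],
    dataset_size_gb: float,
) -> bool:
    """Return True when the streaming converter would be chosen.

    Hypothetical helper: it mirrors the documented semantics, not Thyra's code.
    """
    if isinstance(streaming, bool):
        return streaming  # an explicit True/False always wins
    if streaming == "auto":
        return dataset_size_gb > AUTO_THRESHOLD_GB
    raise ValueError(f"streaming must be True, False, or 'auto', got {streaming!r}")
```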

Returns:

Type Description
bool

True if conversion was successful, False otherwise
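Because success is reported through the return value, a batch script has to check each result explicitly. A minimal sketch, assuming you pass convert_msi in as the `convert` argument: `convert_many` is a hypothetical helper, not part of Thyra. Whether convert_msi can also raise is not stated here, so exceptions are treated as failed jobs as a precaution.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, Tuple

Job = Tuple[str, str]  # (input_path, output_path)


def convert_many(
    convert: Callable[..., bool],
    jobs: Iterable[Job],
    **shared_kwargs: Any,
) -> Dict[str, bool]:
    """Run `convert` on every job and record success per input path.

    Hypothetical helper; `convert` is expected to behave like convert_msi.
    """
    results: Dict[str, bool] = {}
    for input_path, output_path in jobs:
        try:
            ok = bool(convert(Path(input_path), Path(output_path), **shared_kwargs))
        except Exception:
            # Precaution: count any raised error as a failed job.
            ok = False
        results[input_path] = ok
    return results
```

With Thyra installed, a call might look like `convert_many(convert_msi, [("a.imzML", "a.zarr")], pixel_size_um=10.0)`; the keyword arguments are forwarded unchanged to every job.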


Resampling Configuration

When you pass resampling_config to convert_msi, the dictionary keys map to the fields of ResamplingConfig. You can pass a plain dict (as shown in the examples above) or construct the dataclass directly:

from thyra.resampling.types import ResamplingConfig, ResamplingMethod, AxisType

config = ResamplingConfig(
    method=ResamplingMethod.TIC_PRESERVING,
    axis_type=AxisType.ORBITRAP,
    target_bins=50000,
)

success = convert_msi("input.imzML", "output.zarr", resampling_config=config)

ResamplingConfig(method: Optional[ResamplingMethod] = None, axis_type: Optional[AxisType] = None, target_bins: Optional[int] = None, mass_width_da: Optional[float] = None, reference_mz: float = 500.0, min_mz: Optional[float] = None, max_mz: Optional[float] = None) dataclass

Configuration for resampling operations.

All fields except reference_mz default to None (auto-detect from instrument metadata). You can override individual fields while leaving the rest automatic.

Attributes:

Name Type Description
method Optional[ResamplingMethod]

Resampling algorithm. None auto-selects based on the instrument type.

axis_type Optional[AxisType]

Mass axis spacing model. None auto-detects from the instrument metadata.

target_bins Optional[int]

Number of bins in the resampled axis. None lets the resampler choose a bin count that preserves the native resolution.

mass_width_da Optional[float]

Bin width in Daltons at reference_mz. Alternative to target_bins -- specify one or the other.

reference_mz float

Reference m/z for mass_width_da. Default 500.0 Da.

min_mz Optional[float]

Override the lower bound of the mass range.

max_mz Optional[float]

Override the upper bound of the mass range.
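Since target_bins and mass_width_da are alternatives, it can help to catch a conflicting pair before starting a long conversion. The sketch below builds the plain-dict form of the configuration. `make_resampling_config` is a hypothetical helper, and it rests on two assumptions not confirmed here: that Thyra does not already perform this check, and that omitting a key from the dict is equivalent to leaving the field at None.

```python
from typing import Any, Dict, Optional


def make_resampling_config(
    method: Optional[str] = None,
    axis_type: Optional[str] = None,
    target_bins: Optional[int] = None,
    mass_width_da: Optional[float] = None,
    reference_mz: float = 500.0,
) -> Dict[str, Any]:
    """Build a resampling_config dict (hypothetical helper, not Thyra API)."""
    if target_bins is not None and mass_width_da is not None:
        raise ValueError("Specify target_bins or mass_width_da, not both")
    candidates = {
        "method": method,
        "axis_type": axis_type,
        "target_bins": target_bins,
        "mass_width_da": mass_width_da,
    }
    # Drop unset entries so those fields stay on auto-detection (assumption).
    config: Dict[str, Any] = {k: v for k, v in candidates.items() if v is not None}
    if mass_width_da is not None:
        # reference_mz only has meaning together with mass_width_da.
        config["reference_mz"] = reference_mz
    return config
```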

ResamplingMethod

Bases: Enum

Available resampling methods.

Attributes:

Name Type Description
NONE

No resampling -- keep the original mass axis.

NEAREST_NEIGHBOR

Snap each peak to the nearest target bin.

TIC_PRESERVING

Redistribute intensity so the total ion count is preserved after rebinning (recommended for quantitative work).

LINEAR_INTERPOLATION

Linear interpolation between neighbouring bins.

LINEAR_INTERPOLATION = 'linear_interpolation' class-attribute instance-attribute

NEAREST_NEIGHBOR = 'nearest_neighbor' class-attribute instance-attribute

NONE = 'none' class-attribute instance-attribute

TIC_PRESERVING = 'tic_preserving' class-attribute instance-attribute
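To make the NEAREST_NEIGHBOR description concrete, the sketch below snaps each peak to the closest bin of a sorted target axis using bisect. It illustrates the idea only and is not Thyra's implementation: `snap_to_nearest` is a hypothetical name, and clamping out-of-range peaks to the edge bins is a choice made for this sketch.

```python
from bisect import bisect_left
from typing import List, Sequence


def snap_to_nearest(
    target_axis: Sequence[float],
    mzs: Sequence[float],
    intensities: Sequence[float],
) -> List[float]:
    """Accumulate each peak's intensity into the nearest bin of a sorted axis."""
    binned = [0.0] * len(target_axis)
    for mz, intensity in zip(mzs, intensities):
        right = bisect_left(target_axis, mz)
        if right == 0:
            idx = 0  # below the axis: clamp to the first bin (sketch choice)
        elif right == len(target_axis):
            idx = right - 1  # above the axis: clamp to the last bin (sketch choice)
        else:
            left = right - 1
            # Pick the closer neighbour; ties go to the lower bin.
            closer_to_left = mz - target_axis[left] <= target_axis[right] - mz
            idx = left if closer_to_left else right
        binned[idx] += intensity
    return binned
```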

AxisType

Bases: Enum

Mass axis spacing model, determined by the analyser physics.

The axis type controls how target bins are distributed across the mass range. When set to None in ResamplingConfig, the type is auto-detected from instrument metadata.

Attributes:

Name Type Description
CONSTANT

Equidistant spacing (constant Da per bin).

LINEAR_TOF

Linear TOF -- spacing proportional to sqrt(m/z).

REFLECTOR_TOF

Reflector TOF -- spacing proportional to m/z.

ORBITRAP

Orbitrap -- spacing proportional to m/z^(3/2).

FTICR

FTICR -- spacing proportional to m/z^2.

UNKNOWN

Unknown analyser; falls back to constant spacing.

CONSTANT = 'constant' class-attribute instance-attribute

FTICR = 'fticr' class-attribute instance-attribute

LINEAR_TOF = 'linear_tof' class-attribute instance-attribute

ORBITRAP = 'orbitrap' class-attribute instance-attribute

REFLECTOR_TOF = 'reflector_tof' class-attribute instance-attribute

UNKNOWN = 'unknown' class-attribute instance-attribute
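The spacing rules listed for AxisType amount to a bin width that scales as a power of m/z. The sketch below encodes those exponents and generates an axis by stepping with the local width. It illustrates the scaling laws from the table and is not Thyra's axis generator: `bin_width`, `build_axis`, and the `width_at_ref` parameter are hypothetical names.

```python
from typing import Dict, List

# Exponent p in: width(mz) = width_at_ref * (mz / reference_mz) ** p
SPACING_EXPONENT: Dict[str, float] = {
    "constant": 0.0,
    "linear_tof": 0.5,
    "reflector_tof": 1.0,
    "orbitrap": 1.5,
    "fticr": 2.0,
    "unknown": 0.0,  # falls back to constant spacing
}


def bin_width(
    axis_type: str, mz: float, width_at_ref: float, reference_mz: float = 500.0
) -> float:
    """Local bin width in Da at `mz` for the given spacing model."""
    return width_at_ref * (mz / reference_mz) ** SPACING_EXPONENT[axis_type]


def build_axis(
    axis_type: str,
    min_mz: float,
    max_mz: float,
    width_at_ref: float,
    reference_mz: float = 500.0,
) -> List[float]:
    """Step from min_mz to max_mz, growing the step with the local bin width."""
    if width_at_ref <= 0 or min_mz <= 0:
        raise ValueError("width_at_ref and min_mz must be positive")
    axis: List[float] = []
    mz = min_mz
    while mz <= max_mz:
        axis.append(mz)
        mz += bin_width(axis_type, mz, width_at_ref, reference_mz)
    return axis
```

For the same width at the reference m/z, the higher exponents give wider bins above the reference and narrower bins below it, which is why the bin count for a given mass range depends on the axis type.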


Metadata Types

Readers expose metadata through two dataclasses. EssentialMetadata contains everything needed for conversion decisions (grid size, mass range, memory estimate). ComprehensiveMetadata wraps essential metadata and adds vendor-specific details for provenance and QC.

from thyra.readers.imzml import ImzMLReader

with ImzMLReader("sample.imzML") as reader:
    meta = reader.get_essential_metadata()
    print(f"Grid: {meta.dimensions}")
    print(f"m/z range: {meta.mass_range}")
    print(f"Spectra: {meta.n_spectra}")
    print(f"Est. memory: {meta.estimated_memory_gb:.1f} GB")

EssentialMetadata(dimensions: Tuple[int, int, int], coordinate_bounds: Tuple[float, float, float, float], mass_range: Tuple[float, float], pixel_size: Optional[Tuple[float, float]], n_spectra: int, total_peaks: int, estimated_memory_gb: float, source_path: str, coordinate_offsets: Optional[Tuple[int, int, int]] = None, spectrum_type: Optional[str] = None, peak_counts_per_pixel: Optional[NDArray[np.int32]] = None) dataclass

Critical metadata for processing decisions and interpolation setup.

Attributes:

Name Type Description
dimensions Tuple[int, int, int]

Grid dimensions as (x, y, z).

coordinate_bounds Tuple[float, float, float, float]

Spatial extent as (min_x, max_x, min_y, max_y).

mass_range Tuple[float, float]

Mass-to-charge range as (min_mz, max_mz).

pixel_size Optional[Tuple[float, float]]

Pixel dimensions as (x_um, y_um) in micrometres, or None when not detected.

n_spectra int

Total number of spectra in the dataset.

total_peaks int

Total number of peaks across all spectra (used for sparse matrix pre-allocation).

estimated_memory_gb float

Estimated dense memory footprint in GB.

source_path str

Absolute path to the source data.

coordinate_offsets Optional[Tuple[int, int, int]]

Raw coordinate offsets (x, y, z) used to normalise coordinates to 0-based indexing.

spectrum_type Optional[str]

Spectrum type string (e.g. "centroid spectrum"), used to guide resampling decisions.

peak_counts_per_pixel Optional[NDArray[int32]]

Per-pixel peak counts for CSR indptr construction in the streaming converter. Array of size n_pixels where arr[pixel_idx] = peak_count and pixel_idx = z * (n_x * n_y) + y * n_x + x.
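The pixel index formula and its role in building a CSR indptr array can be sketched in a few lines. This is an illustrative reconstruction rather than Thyra's code: `pixel_index` and `build_indptr` are hypothetical helpers, and the cumulative-sum construction is the standard CSR definition, not something taken from Thyra's source.

```python
from itertools import accumulate
from typing import List, Sequence


def pixel_index(x: int, y: int, z: int, n_x: int, n_y: int) -> int:
    """Flatten 0-based (x, y, z) coordinates using the documented formula."""
    return z * (n_x * n_y) + y * n_x + x


def build_indptr(peak_counts: Sequence[int]) -> List[int]:
    """CSR row pointer: row i occupies data[indptr[i]:indptr[i + 1]]."""
    return [0, *accumulate(peak_counts)]
```

Pixels without any peaks contribute a zero count, so two consecutive indptr entries are equal and the corresponding row is empty.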

has_pixel_size: bool property

Check if pixel size information is available.

is_3d: bool property

Check if dataset is 3D (z > 1).

ComprehensiveMetadata(essential: EssentialMetadata, format_specific: Dict[str, Any], acquisition_params: Dict[str, Any], instrument_info: Dict[str, Any], raw_metadata: Dict[str, Any]) dataclass

Complete metadata including format-specific details.

Wraps EssentialMetadata and adds vendor-specific information that is not needed for conversion but is useful for provenance and QC.

Attributes:

Name Type Description
essential EssentialMetadata

Core metadata required for conversion.

format_specific Dict[str, Any]

Vendor-specific metadata (e.g. ImzML CV params, Bruker property tables).

acquisition_params Dict[str, Any]

Acquisition parameters such as polarity, scan range, and laser settings.

instrument_info Dict[str, Any]

Instrument model, serial number, and software version.

raw_metadata Dict[str, Any]

Unprocessed metadata exactly as read from the source file, preserved for round-trip fidelity.

coordinate_bounds: Tuple[float, float, float, float] property

Convenience access to coordinate bounds from essential metadata.

dimensions: Tuple[int, int, int] property

Convenience access to dimensions from essential metadata.

pixel_size: Optional[Tuple[float, float]] property

Convenience access to pixel size from essential metadata.


Reader Base Class

All format readers (ImzML, Bruker, Waters) inherit from this base class. If you are writing a custom reader for a new format, subclass BaseMSIReader and implement the abstract methods below.

BaseMSIReader(data_path: Path, intensity_threshold: Optional[float] = None, **kwargs: object)

Bases: ABC

Abstract base class for reading MSI data formats.

Initialize the reader with the path to the data.

Parameters:

Name Type Description Default
data_path Path

Path to the data file or directory

required
intensity_threshold Optional[float]

Minimum intensity value to include. Values below this threshold are filtered out during iteration. Useful for removing detector noise in continuous mode data. Default: None (no filtering, include all values).

None
**kwargs object

Additional reader-specific parameters

{}

has_shared_mass_axis: bool property

Check if all spectra share the same m/z axis.

For continuous ImzML data, all spectra have identical m/z values, so get_common_mass_axis() only needs to read the first spectrum. For processed/centroid data, each spectrum may have different m/z values, requiring iteration through all spectra.

Returns:

Type Description
bool

True if all spectra share the same m/z axis (continuous mode), False if each spectrum has different m/z values (processed mode).
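The distinction matters for how a common mass axis is assembled. A rough sketch under stated assumptions: `spectra` is any iterable yielding (coords, mzs, intensities) tuples in the shape documented for iter_spectra, and `common_mass_axis` is a hypothetical function, not the reader method itself. Taking the sorted union of all m/z values in processed mode is one simple strategy and may differ from what a real reader does.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Spectrum = Tuple[Tuple[int, int, int], Sequence[float], Sequence[float]]


def common_mass_axis(spectra: Iterable[Spectrum], shared: bool) -> List[float]:
    """Build a common m/z axis (hypothetical helper, not Thyra's method)."""
    if shared:
        # Continuous mode: the first spectrum already carries the full axis.
        for _coords, mzs, _intensities in spectra:
            return list(mzs)
        raise ValueError("no spectra available")
    # Processed mode: collect every distinct m/z value across all spectra.
    unique: Set[float] = set()
    for _coords, mzs, _intensities in spectra:
        unique.update(mzs)
    if not unique:
        raise ValueError("no m/z values found")
    return sorted(unique)
```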

get_essential_metadata() -> EssentialMetadata

Get essential metadata for processing.

get_comprehensive_metadata() -> ComprehensiveMetadata

Get complete metadata.

get_common_mass_axis() -> NDArray[np.float64] abstractmethod

Return the common mass axis for all spectra.

This method must always return a valid array. If no common mass axis can be created, implementations should raise an exception.

get_optical_image_paths() -> List[Path]

Get paths to optical/microscopy images associated with this data.

Returns a list of TIFF file paths containing optical images of the sample. These images can be stored alongside MSI data in SpatialData output for multimodal analysis.

The default implementation returns an empty list. Subclasses should override it to return the paths to optical images specific to their format.

Returns:

Type Description
List[Path]

List of paths to TIFF files, empty if no optical images available.

iter_spectra(batch_size: Optional[int] = None) -> Generator[Tuple[Tuple[int, int, int], NDArray[np.float64], NDArray[np.float64]], None, None] abstractmethod

Iterate through spectra with optional batch processing.

Parameters:

Name Type Description Default
batch_size Optional[int]

Optional batch size for spectrum iteration

None

Yields:

Type Description
Tuple[Tuple[int, int, int], NDArray[float64], NDArray[float64]]

Tuple containing:

  • Coordinates (x, y, z) using 0-based indexing
  • m/z values array
  • Intensity values array

Note

Subclasses should apply intensity threshold filtering by calling _apply_intensity_filter() on the intensities before yielding.
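As an example of consuming the yielded tuples, the sketch below sums intensities per pixel to obtain a total ion current map. `total_ion_current` is a hypothetical helper. It accepts any iterable with the documented tuple shape, so it can be tried on in-memory data without a reader.

```python
from typing import Dict, Iterable, Sequence, Tuple

Coords = Tuple[int, int, int]
Spectrum = Tuple[Coords, Sequence[float], Sequence[float]]


def total_ion_current(spectra: Iterable[Spectrum]) -> Dict[Coords, float]:
    """Sum the intensities of each spectrum, keyed by its (x, y, z) coordinates."""
    tic: Dict[Coords, float] = {}
    for coords, _mzs, intensities in spectra:
        # Accumulate in case the same pixel is yielded more than once.
        tic[coords] = tic.get(coords, 0.0) + float(sum(intensities))
    return tic
```

With a reader open, the call would presumably be `total_ion_current(reader.iter_spectra())`.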

get_region_map() -> Optional[dict]

Get per-pixel region mapping for multi-region datasets.

Returns a dictionary mapping normalized (0-based) (x, y) coordinate tuples to integer region numbers. This enables the converter to annotate each pixel with its acquisition region in obs["region_number"].

Default implementation returns None (single-region or no region info). Subclasses should override when region information is available.

Returns:

Type Description
Optional[dict]

Dict mapping (x, y) tuples to region numbers, or None if region information is not available.

get_region_info() -> Optional[list]

Get summary information about acquisition regions.

Returns a list of dictionaries, each describing one region with at minimum: {"region_number": int, "n_spectra": int}. Additional keys (e.g. "name") are format-specific and optional.

Default implementation returns None (single-region or no region info). Subclasses should override when region information is available.

Returns:

Type Description
Optional[list]

List of region summary dicts, or None if region information is not available.
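The two region methods describe the same information at different granularity, so they can be cross-checked. A sketch using plain dicts and lists in the documented shapes: `pixels_per_region` and `mismatched_regions` are hypothetical helpers, and the comparison assumes one spectrum per pixel, which may not hold for every dataset.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

RegionMap = Dict[Tuple[int, int], int]


def pixels_per_region(region_map: Optional[RegionMap]) -> Dict[int, int]:
    """Count pixels per region; empty dict when no region info exists."""
    if region_map is None:
        return {}
    return dict(Counter(region_map.values()))


def mismatched_regions(
    region_map: Optional[RegionMap],
    region_info: Optional[List[dict]],
) -> List[int]:
    """List region numbers whose n_spectra differs from the mapped pixel count.

    Assumes one spectrum per pixel (sketch assumption).
    """
    if region_map is None or region_info is None:
        return []
    counts = pixels_per_region(region_map)
    return [
        info["region_number"]
        for info in region_info
        if counts.get(info["region_number"], 0) != info["n_spectra"]
    ]
```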

close() -> None abstractmethod

Close all open file handles.


Converter Base Class

All output converters inherit from this base class. Currently only SpatialData output is supported, but the architecture allows adding new output formats by subclassing BaseMSIConverter.

BaseMSIConverter(reader: BaseMSIReader, output_path: Union[str, Path, PathLike[str]], dataset_id: str = 'msi_dataset', pixel_size_um: float = 1.0, pixel_size_source: PixelSizeSource = PixelSizeSource.DEFAULT, compression_level: int = 5, handle_3d: bool = False, **kwargs: Any)

Bases: ABC

Base class for MSI data converters with shared functionality.

Implements common processing steps while allowing format-specific customization.

Initialize the MSI converter.

Parameters:

Name Type Description Default
reader BaseMSIReader

MSI data reader instance

required
output_path Union[str, Path, PathLike[str]]

Path for output file

required
dataset_id str

Identifier for the dataset

'msi_dataset'
pixel_size_um float

Size of each pixel in micrometers

1.0
pixel_size_source PixelSizeSource

How pixel size was determined

DEFAULT
compression_level int

Compression level for output

5
handle_3d bool

Whether to process as 3D data

False
**kwargs Any

Additional keyword arguments

{}

pixel_size_um = pixel_size_um instance-attribute

pixel_size_source = pixel_size_source instance-attribute

dataset_id = dataset_id instance-attribute

handle_3d = handle_3d instance-attribute

convert() -> bool

Template method defining the conversion workflow.

Returns:

bool: True if conversion was successful, False otherwise.


Format Detection and Plugin Registry

Thyra uses a registry to map file extensions and directory structures to the correct reader and converter classes. The public functions below let you detect formats programmatically or register your own reader/converter.

Detecting a format

from pathlib import Path
from thyra.core.registry import detect_format

fmt = detect_format(Path("experiment.imzML"))  # "imzml"
fmt = detect_format(Path("data.d"))            # "bruker" or "rapiflex"
fmt = detect_format(Path("data.raw"))          # "waters"

Registering a custom reader

from thyra.core.registry import register_reader
from thyra.core.base_reader import BaseMSIReader

@register_reader("my_format")
class MyFormatReader(BaseMSIReader):
    ...

detect_format(input_path: Path) -> str

Detect MSI format from input path.

Parameters:

Name Type Description Default
input_path Path

Path to MSI data file or directory

required

Returns:

Type Description
str

Format name ('imzml', 'bruker', 'rapiflex', or 'waters')

register_reader(format_name: str)

Decorator for reader registration.

register_converter(format_name: str)

Decorator for converter registration.
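For readers unfamiliar with decorator-based registries, the pattern can be sketched in isolation. Everything below is a stand-in written for illustration: `READERS`, `register`, and `get_reader_class` are hypothetical names, the duplicate-name check is a choice made for this sketch, and none of it reflects how Thyra's registry is implemented internally.

```python
from typing import Callable, Dict, Type, TypeVar

T = TypeVar("T")

# Hypothetical registry: format name -> reader class
READERS: Dict[str, type] = {}


def register(format_name: str) -> Callable[[Type[T]], Type[T]]:
    """Return a class decorator that records the class under `format_name`."""

    def decorator(cls: Type[T]) -> Type[T]:
        if format_name in READERS:
            raise ValueError(f"format {format_name!r} is already registered")
        READERS[format_name] = cls
        return cls  # hand the class back unchanged so it stays usable

    return decorator


def get_reader_class(format_name: str) -> type:
    """Look up a registered class, failing clearly for unknown formats."""
    try:
        return READERS[format_name]
    except KeyError:
        raise ValueError(f"no reader registered for {format_name!r}") from None


@register("my_format")
class MyFormatReader:
    """Stand-in class; a real reader would subclass BaseMSIReader."""
```

The decorator returns the class unchanged, so registration has no effect on how the class is instantiated or subclassed.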