API Reference
Thyra's Python API centres on a single function: convert_msi. For most use
cases, that is all you need. The remaining sections document configuration
types, metadata objects, and base classes for advanced users who want to inspect
results or extend Thyra with new formats.
Converting Data
convert_msi is the primary entry point. It detects the input format, reads metadata, and writes a SpatialData/Zarr directory.
Basic usage
from thyra import convert_msi
# Minimal -- auto-detects format, pixel size, resampling, and streaming
success = convert_msi("input.imzML", "output.zarr")
# With explicit parameters
success = convert_msi(
    "data/experiment.d",
    "output/experiment.zarr",
    dataset_id="hippocampus",
    pixel_size_um=10.0,
)
With resampling configuration
success = convert_msi(
    "input.imzML",
    "output.zarr",
    resampling_config={
        "method": "tic_preserving",
        "axis_type": "orbitrap",
        "target_bins": 50000,
    },
)
Multi-region dataset (select one region)
success = convert_msi(
    "data/slide.d",
    "output/tissue_only.zarr",
    region=0,  # convert only region 0
)
Force streaming for large datasets
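A minimal sketch for this case, assuming only the streaming parameter documented in the full signature: streaming=True forces the streaming converter, and "auto" switches to it for datasets larger than 10 GB. The helper names (dataset_size_gb, pick_streaming_mode, convert_large) are hypothetical, and how Thyra itself measures dataset size is an assumption.

```python
from pathlib import Path

# Documented cut-off for streaming="auto"; how Thyra measures size is an assumption.
AUTO_STREAMING_THRESHOLD_GB = 10.0


def dataset_size_gb(path):
    """Total on-disk size of a file or directory tree, in GB (1 GB = 1e9 bytes)."""
    p = Path(path)
    if p.is_file():
        return p.stat().st_size / 1e9
    return sum(f.stat().st_size for f in p.rglob("*") if f.is_file()) / 1e9


def pick_streaming_mode(size_gb, threshold_gb=AUTO_STREAMING_THRESHOLD_GB):
    """Return True to force streaming, False to force the standard converter."""
    return size_gb > threshold_gb


def convert_large(input_path, output_path):
    """Force the streaming converter regardless of dataset size."""
    # Imported lazily so the helpers above can be used without Thyra installed.
    from thyra import convert_msi

    return convert_msi(input_path, output_path, streaming=True)
```

Passing the explicit boolean from pick_streaming_mode makes the choice visible in logs and scripts; leaving streaming at its "auto" default delegates the same decision to Thyra.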
Full signature
convert_msi(input_path: Union[str, Path], output_path: Union[str, Path], format_type: str = 'spatialdata', dataset_id: str = 'msi_dataset', pixel_size_um: Optional[float] = None, handle_3d: bool = False, resampling_config: Optional[Dict[str, Any]] = None, reader_options: Optional[Dict[str, Any]] = None, sparse_format: str = 'csc', include_optical: bool = True, streaming: Union[bool, Literal['auto']] = 'auto', region: Optional[int] = None, **kwargs: Any) -> bool
Convert MSI data to the specified format.
Provides automatic pixel size detection from metadata or accepts user-specified values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_path | Union[str, Path] | Path to input MSI data file or directory | required |
| output_path | Union[str, Path] | Path for output file | required |
| format_type | str | Output format type | 'spatialdata' |
| dataset_id | str | Identifier for the dataset | 'msi_dataset' |
| pixel_size_um | Optional[float] | Pixel size in micrometers (None for auto) | None |
| handle_3d | bool | Whether to process as 3D data | False |
| resampling_config | Optional[Dict[str, Any]] | Optional resampling configuration | None |
| reader_options | Optional[Dict[str, Any]] | Optional format-specific reader options. intensity_threshold (float): minimum intensity to include; default None (no filtering). use_recalibrated_state (bool): for Bruker data, use active/recalibrated calibration; default True. | None |
| sparse_format | str | Sparse matrix format ('csc' or 'csr') | 'csc' |
| include_optical | bool | Include optical images | True |
| streaming | Union[bool, Literal['auto']] | Use streaming converter for large datasets. "auto": auto-detect based on dataset size >10GB (default). True: force streaming converter. False: force standard converter. | 'auto' |
| region | Optional[int] | For multi-region datasets (e.g. Bruker timsTOF), select a specific region number. None (default) converts all regions. Passed to the reader as reader_options["region"]. | None |
| **kwargs | Any | Additional keyword arguments | {} |
Returns:
| Type | Description |
|---|---|
| bool | True if conversion was successful, False otherwise |
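Because convert_msi signals failure through its boolean return value, callers that prefer exceptions can wrap it. The sketch below is one way to do that; convert_or_raise and ConversionFailed are hypothetical names, and the converter is passed in as a callable so the wrapper can be exercised without Thyra installed.

```python
from pathlib import Path


class ConversionFailed(RuntimeError):
    """Raised when the converter reports failure (hypothetical helper type)."""


def convert_or_raise(convert, input_path, output_path, **options):
    """Call convert(input_path, output_path, **options) and raise if it returns False.

    `convert` is expected to behave like thyra.convert_msi, i.e. return a bool.
    """
    ok = convert(Path(input_path), Path(output_path), **options)
    if not ok:
        raise ConversionFailed(f"conversion of {input_path} failed")
    return Path(output_path)
```

With Thyra available, a call would look like convert_or_raise(convert_msi, "input.imzML", "output.zarr", region=0).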
Resampling Configuration
When you pass resampling_config to convert_msi, the dictionary keys map
to the fields of ResamplingConfig. You can pass a plain dict (as shown in
the examples above) or construct the dataclass directly:
from thyra import convert_msi
from thyra.resampling.types import ResamplingConfig, ResamplingMethod, AxisType

config = ResamplingConfig(
    method=ResamplingMethod.TIC_PRESERVING,
    axis_type=AxisType.ORBITRAP,
    target_bins=50000,
)
success = convert_msi("input.imzML", "output.zarr", resampling_config=config)
ResamplingConfig(method: Optional[ResamplingMethod] = None, axis_type: Optional[AxisType] = None, target_bins: Optional[int] = None, mass_width_da: Optional[float] = None, reference_mz: float = 500.0, min_mz: Optional[float] = None, max_mz: Optional[float] = None)
dataclass
Configuration for resampling operations.
All fields except reference_mz (default 500.0) default to None (auto-detect from instrument metadata).
You can override individual fields while leaving the rest automatic.
Attributes:
| Name | Type | Description |
|---|---|---|
| method | Optional[ResamplingMethod] | Resampling algorithm. |
| axis_type | Optional[AxisType] | Mass axis spacing model. |
| target_bins | Optional[int] | Number of bins in the resampled axis. |
| mass_width_da | Optional[float] | Bin width in Daltons at |
| reference_mz | float | Reference m/z for |
| min_mz | Optional[float] | Override the lower bound of the mass range. |
| max_mz | Optional[float] | Override the upper bound of the mass range. |
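When the configuration is passed as a plain dict, it can help to build it from only the fields you want to override. The helper below is a hedged sketch: resampling_overrides is a hypothetical name, and it assumes that omitting a key has the same effect as leaving the corresponding field at None.

```python
# Field names taken from the ResamplingConfig signature above.
RESAMPLING_FIELDS = (
    "method",
    "axis_type",
    "target_bins",
    "mass_width_da",
    "reference_mz",
    "min_mz",
    "max_mz",
)


def resampling_overrides(**fields):
    """Return a dict with only the explicitly set (non-None) resampling fields.

    Raises KeyError for names that are not ResamplingConfig fields, which
    catches typos before the dict reaches the converter.
    """
    unknown = set(fields) - set(RESAMPLING_FIELDS)
    if unknown:
        raise KeyError(f"unknown resampling field(s): {sorted(unknown)}")
    return {name: value for name, value in fields.items() if value is not None}
```

The result can be passed directly as resampling_config, for example resampling_overrides(method="tic_preserving", target_bins=50000).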
ResamplingMethod
Bases: Enum
Available resampling methods.
Attributes:
| Name | Description |
|---|---|
| NONE | No resampling -- keep the original mass axis. |
| NEAREST_NEIGHBOR | Snap each peak to the nearest target bin. |
| TIC_PRESERVING | Redistribute intensity so the total ion count is preserved after rebinning (recommended for quantitative work). |
| LINEAR_INTERPOLATION | Linear interpolation between neighbouring bins. |
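To make "snap each peak to the nearest target bin" concrete, here is a dependency-free toy version of nearest-neighbour rebinning. It illustrates the idea only and is not Thyra's implementation; the function name, the choice to sum intensities that land in the same bin, and the clamping of out-of-range peaks to the edge bins are all assumptions of this sketch.

```python
from bisect import bisect_left


def snap_to_nearest_bin(mzs, intensities, target_axis):
    """Accumulate each peak's intensity into the closest bin of target_axis.

    target_axis must be sorted in ascending order. Peaks outside the axis
    range are clamped to the first or last bin (a choice made for this sketch).
    """
    binned = [0.0] * len(target_axis)
    for mz, intensity in zip(mzs, intensities):
        i = bisect_left(target_axis, mz)
        if i == 0:
            nearest = 0
        elif i == len(target_axis):
            nearest = len(target_axis) - 1
        else:
            # Pick the closer of the two neighbouring bin positions.
            right_gap = target_axis[i] - mz
            left_gap = mz - target_axis[i - 1]
            nearest = i if right_gap < left_gap else i - 1
        binned[nearest] += intensity
    return binned
```

Because this toy version sums intensities, the total across all bins equals the total of the input peaks.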
AxisType
Bases: Enum
Mass axis spacing model, determined by the analyser physics.
The axis type controls how target bins are distributed across the
mass range. When set to None in ResamplingConfig, the
type is auto-detected from instrument metadata.
Attributes:
| Name | Description |
|---|---|
| CONSTANT | Equidistant spacing (constant Da per bin). |
| LINEAR_TOF | Linear TOF -- spacing proportional to |
| REFLECTOR_TOF | Reflector TOF -- spacing proportional to |
| ORBITRAP | Orbitrap -- spacing proportional to |
| FTICR | FTICR -- spacing proportional to |
| UNKNOWN | Unknown analyser; falls back to constant spacing. |
CONSTANT = 'constant'
FTICR = 'fticr'
LINEAR_TOF = 'linear_tof'
ORBITRAP = 'orbitrap'
REFLECTOR_TOF = 'reflector_tof'
UNKNOWN = 'unknown'
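As an illustration of the simplest spacing model, CONSTANT, the sketch below builds an equidistant axis between two bounds. It is not Thyra's code: constant_axis is a hypothetical helper, and treating target_bins as the number of axis points (rather than intervals) is an assumption.

```python
def constant_axis(min_mz, max_mz, target_bins):
    """Return target_bins equally spaced m/z values from min_mz to max_mz inclusive."""
    if target_bins < 2:
        raise ValueError("target_bins must be at least 2")
    if max_mz <= min_mz:
        raise ValueError("max_mz must be greater than min_mz")
    step = (max_mz - min_mz) / (target_bins - 1)  # constant Da per bin
    return [min_mz + i * step for i in range(target_bins)]
```

The analyser-specific models distribute the same number of bins non-uniformly across the range.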
Metadata Types
Readers expose metadata through two dataclasses. EssentialMetadata contains
everything needed for conversion decisions (grid size, mass range, memory
estimate). ComprehensiveMetadata wraps essential metadata and adds
vendor-specific details for provenance and QC.
from thyra.readers.imzml import ImzMLReader
with ImzMLReader("sample.imzML") as reader:
    meta = reader.get_essential_metadata()
    print(f"Grid: {meta.dimensions}")
    print(f"m/z range: {meta.mass_range}")
    print(f"Spectra: {meta.n_spectra}")
    print(f"Est. memory: {meta.estimated_memory_gb:.1f} GB")
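The essential metadata fields can also be combined into quick sanity checks before converting. The sketch below derives a few summary numbers from any object exposing the EssentialMetadata attribute names; summarize_metadata is a hypothetical helper, and the SimpleNamespace stand-in with made-up values only exists so the example runs without a data file.

```python
from types import SimpleNamespace


def summarize_metadata(meta):
    """Derive simple summary statistics from EssentialMetadata-like attributes."""
    d0, d1, d2 = meta.dimensions
    grid_positions = d0 * d1 * d2  # product, so the axis order does not matter
    return {
        "grid_positions": grid_positions,
        # Fraction of grid positions that actually contain a spectrum.
        "fill_fraction": meta.n_spectra / grid_positions if grid_positions else 0.0,
        "mean_peaks_per_spectrum": (
            meta.total_peaks / meta.n_spectra if meta.n_spectra else 0.0
        ),
        "mz_span": max(meta.mass_range) - min(meta.mass_range),
    }


# Stand-in object with illustrative values; a real one comes from a reader.
demo = SimpleNamespace(
    dimensions=(10, 20, 1),
    n_spectra=150,
    total_peaks=3000,
    mass_range=(100.0, 1000.0),
)
summary = summarize_metadata(demo)
```

A low fill fraction suggests an irregular or multi-region acquisition, which is worth knowing before choosing a region or sparse format.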
EssentialMetadata(dimensions: Tuple[int, int, int], coordinate_bounds: Tuple[float, float, float, float], mass_range: Tuple[float, float], pixel_size: Optional[Tuple[float, float]], n_spectra: int, total_peaks: int, estimated_memory_gb: float, source_path: str, coordinate_offsets: Optional[Tuple[int, int, int]] = None, spectrum_type: Optional[str] = None, peak_counts_per_pixel: Optional[NDArray[np.int32]] = None)
dataclass
Critical metadata for processing decisions and interpolation setup.
Attributes:
| Name | Type | Description |
|---|---|---|
| dimensions | Tuple[int, int, int] | Grid dimensions as |
| coordinate_bounds | Tuple[float, float, float, float] | Spatial extent as |
| mass_range | Tuple[float, float] | Mass-to-charge range as |
| pixel_size | Optional[Tuple[float, float]] | Pixel dimensions as |
| n_spectra | int | Total number of spectra in the dataset. |
| total_peaks | int | Total number of peaks across all spectra (used for sparse matrix pre-allocation). |
| estimated_memory_gb | float | Estimated dense memory footprint in GB. |
| source_path | str | Absolute path to the source data. |
| coordinate_offsets | Optional[Tuple[int, int, int]] | Raw coordinate offsets |
| spectrum_type | Optional[str] | Spectrum type string (e.g. |
| peak_counts_per_pixel | Optional[NDArray[int32]] | Per-pixel peak counts for CSR |
ComprehensiveMetadata(essential: EssentialMetadata, format_specific: Dict[str, Any], acquisition_params: Dict[str, Any], instrument_info: Dict[str, Any], raw_metadata: Dict[str, Any])
dataclass
Complete metadata including format-specific details.
Wraps EssentialMetadata and adds vendor-specific information
that is not needed for conversion but useful for provenance and QC.
Attributes:
| Name | Type | Description |
|---|---|---|
| essential | EssentialMetadata | Core metadata required for conversion. |
| format_specific | Dict[str, Any] | Vendor-specific metadata (e.g. ImzML CV params, Bruker property tables). |
| acquisition_params | Dict[str, Any] | Acquisition parameters such as polarity, scan range, and laser settings. |
| instrument_info | Dict[str, Any] | Instrument model, serial number, and software version. |
| raw_metadata | Dict[str, Any] | Unprocessed metadata exactly as read from the source file, preserved for round-trip fidelity. |
coordinate_bounds: Tuple[float, float, float, float]
property
Convenience access to coordinate bounds from essential metadata.
dimensions: Tuple[int, int, int]
property
Convenience access to dimensions from essential metadata.
pixel_size: Optional[Tuple[float, float]]
property
Convenience access to pixel size from essential metadata.
Reader Base Class
All format readers (ImzML, Bruker, Waters) inherit from this base class. If
you are writing a custom reader for a new format, subclass BaseMSIReader
and implement the abstract methods below.
BaseMSIReader(data_path: Path, intensity_threshold: Optional[float] = None, **kwargs: object)
Bases: ABC
Abstract base class for reading MSI data formats.
Initialize the reader with the path to the data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data_path | Path | Path to the data file or directory | required |
| intensity_threshold | Optional[float] | Minimum intensity value to include. Values below this threshold are filtered out during iteration. Useful for removing detector noise in continuous mode data. Default: None (no filtering, include all values). | None |
| **kwargs | object | Additional reader-specific parameters | {} |
has_shared_mass_axis: bool
property
Check if all spectra share the same m/z axis.
For continuous ImzML data, all spectra have identical m/z values, so get_common_mass_axis() only needs to read the first spectrum. For processed/centroid data, each spectrum may have different m/z values, requiring iteration through all spectra.
Returns:
| Type | Description |
|---|---|
| bool | True if all spectra share the same m/z axis (continuous mode), False if each spectrum has different m/z values (processed mode). |
get_essential_metadata() -> EssentialMetadata
Get essential metadata for processing.
get_comprehensive_metadata() -> ComprehensiveMetadata
Get complete metadata.
get_common_mass_axis() -> NDArray[np.float64]
abstractmethod
Return the common mass axis for all spectra.
This method must always return a valid array. If no common mass axis can be created, implementations should raise an exception.
get_optical_image_paths() -> List[Path]
Get paths to optical/microscopy images associated with this data.
Returns a list of TIFF file paths that contain optical images of the sample. These images can be stored alongside MSI data in SpatialData output for multimodal analysis.
The default implementation returns an empty list. Subclasses should override it to return paths to optical images specific to their format.
Returns:
| Type | Description |
|---|---|
| List[Path] | List of paths to TIFF files, empty if no optical images are available. |
iter_spectra(batch_size: Optional[int] = None) -> Generator[Tuple[Tuple[int, int, int], NDArray[np.float64], NDArray[np.float64]], None, None]
abstractmethod
Iterate through spectra with optional batch processing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| batch_size | Optional[int] | Optional batch size for spectrum iteration | None |
Yields:
| Type | Description |
|---|---|
| Tuple[Tuple[int, int, int], NDArray[float64], NDArray[float64]] | Tuple containing: |
Note
Subclasses should apply intensity threshold filtering by calling _apply_intensity_filter() on the intensities before yielding.
get_region_map() -> Optional[dict]
Get per-pixel region mapping for multi-region datasets.
Returns a dictionary mapping normalized (0-based) (x, y) coordinate tuples to integer region numbers. This enables the converter to annotate each pixel with its acquisition region in obs["region_number"].
Default implementation returns None (single-region or no region info). Subclasses should override when region information is available.
Returns:
| Type | Description |
|---|---|
| Optional[dict] | Dict mapping (x, y) tuples to region numbers, or None if region information is not available. |
get_region_info() -> Optional[list]
Get summary information about acquisition regions.
Returns a list of dictionaries, each describing one region with at minimum: {"region_number": int, "n_spectra": int}. Additional keys (e.g. "name") are format-specific and optional.
Default implementation returns None (single-region or no region info). Subclasses should override when region information is available.
Returns:
| Type | Description |
|---|---|
| Optional[list] | List of region summary dicts, or None if region information is not available. |
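The two region methods describe the same information at different granularity, so a custom reader could derive one from the other. The sketch below builds the minimum get_region_info() structure from a get_region_map()-style dict; region_info_from_map is a hypothetical helper, and it assumes exactly one spectrum per mapped pixel.

```python
from collections import Counter


def region_info_from_map(region_map):
    """Summarise a {(x, y): region_number} mapping as a list of region dicts.

    Returns None when no region map is available, mirroring the documented
    default behaviour of get_region_info().
    """
    if region_map is None:
        return None
    counts = Counter(region_map.values())
    return [
        {"region_number": region, "n_spectra": n}  # assumes one spectrum per pixel
        for region, n in sorted(counts.items())
    ]
```

The region numbers in the result are the values that can be passed as the region argument of convert_msi.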
close() -> None
abstractmethod
Close all open file handles.
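To show how the documented methods fit together, here is a duck-typed, in-memory stand-in with the same method shapes. It deliberately does not subclass BaseMSIReader, so it runs without Thyra; a real reader would subclass it and may have to implement further abstract methods not covered in this sketch. It also uses plain lists where a real reader returns NumPy arrays, and it ignores batch_size.

```python
class InMemoryReader:
    """Toy reader over {(x, y, z): (mz_list, intensity_list)} (illustrative only)."""

    def __init__(self, spectra, intensity_threshold=None):
        self._spectra = dict(spectra)
        self.intensity_threshold = intensity_threshold

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()

    def get_common_mass_axis(self):
        """Sorted union of all m/z values; raises if no axis can be built."""
        axis = sorted({mz for mzs, _ in self._spectra.values() for mz in mzs})
        if not axis:
            raise ValueError("no m/z values available")
        return axis

    def iter_spectra(self, batch_size=None):
        """Yield (coords, mzs, intensities), dropping values below the threshold."""
        for coords, (mzs, intensities) in sorted(self._spectra.items()):
            if self.intensity_threshold is not None:
                kept = [
                    (mz, inten)
                    for mz, inten in zip(mzs, intensities)
                    if inten >= self.intensity_threshold
                ]
                mzs = [mz for mz, _ in kept]
                intensities = [inten for _, inten in kept]
            yield coords, list(mzs), list(intensities)

    def close(self):
        """Release resources; here that just means dropping the data."""
        self._spectra = {}
```

Taking the union of m/z values corresponds to the processed-mode case described under has_shared_mass_axis; for continuous data, reading the first spectrum would be enough.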
Converter Base Class
All output converters inherit from this base class. Currently only
SpatialData output is supported, but the architecture allows adding new output
formats by subclassing BaseMSIConverter.
BaseMSIConverter(reader: BaseMSIReader, output_path: Union[str, Path, PathLike[str]], dataset_id: str = 'msi_dataset', pixel_size_um: float = 1.0, pixel_size_source: PixelSizeSource = PixelSizeSource.DEFAULT, compression_level: int = 5, handle_3d: bool = False, **kwargs: Any)
Bases: ABC
Base class for MSI data converters with shared functionality.
Implements common processing steps while allowing format-specific customization.
Initialize the MSI converter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| reader | BaseMSIReader | MSI data reader instance | required |
| output_path | Union[str, Path, PathLike[str]] | Path for output file | required |
| dataset_id | str | Identifier for the dataset | 'msi_dataset' |
| pixel_size_um | float | Size of each pixel in micrometers | 1.0 |
| pixel_size_source | PixelSizeSource | How pixel size was determined | DEFAULT |
| compression_level | int | Compression level for output | 5 |
| handle_3d | bool | Whether to process as 3D data | False |
| **kwargs | Any | Additional keyword arguments | {} |
Format Detection and Plugin Registry
Thyra uses a registry to map file extensions and directory structures to the correct reader and converter classes. The public functions below let you detect formats programmatically or register your own reader/converter.
Detecting a format
from pathlib import Path
from thyra.core.registry import detect_format
fmt = detect_format(Path("experiment.imzML")) # "imzml"
fmt = detect_format(Path("data.d")) # "bruker" or "rapiflex"
fmt = detect_format(Path("data.raw")) # "waters"
Registering a custom reader
from thyra.core.registry import register_reader
from thyra.core.base_reader import BaseMSIReader

@register_reader("my_format")
class MyFormatReader(BaseMSIReader):
    ...
detect_format(input_path: Path) -> str
Detect MSI format from input path.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_path | Path | Path to MSI data file or directory | required |
Returns:
| Type | Description |
|---|---|
| str | Format name ('imzml', 'bruker', 'rapiflex', or 'waters') |
register_reader(format_name: str)
Decorator for reader registration.
register_converter(format_name: str)
Decorator for converter registration.
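For readers unfamiliar with decorator-based registries, the general pattern behind functions such as register_reader can be sketched in a few lines. This is a generic illustration, not Thyra's implementation: the names register, get_reader_class, and _READERS are hypothetical, and rejecting duplicate names is a choice made for this sketch.

```python
_READERS = {}  # format name -> reader class


def register(format_name):
    """Return a class decorator that records the class under format_name."""

    def decorator(cls):
        if format_name in _READERS:
            raise ValueError(f"format {format_name!r} is already registered")
        _READERS[format_name] = cls
        return cls  # the class itself is returned unchanged

    return decorator


def get_reader_class(format_name):
    """Look up a registered class, with a clear error for unknown formats."""
    try:
        return _READERS[format_name]
    except KeyError:
        raise ValueError(f"no reader registered for {format_name!r}") from None


@register("my_format")
class MyFormatReader:
    """Placeholder class standing in for a real reader."""
```

Because the decorator returns the class unchanged, registration is a side effect of importing the module that defines the reader.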