Skip to content

Output Format

Thyra converts MSI data into SpatialData objects stored as Zarr directories. This page describes what the output contains and how to work with it.


Loading a Dataset

import spatialdata as sd

sdata = sd.read_zarr("output.zarr")

print("Tables:", list(sdata.tables.keys()))
print("Images:", list(sdata.images.keys()))
print("Shapes:", list(sdata.shapes.keys()))

Structure Overview

A converted dataset contains the following elements:

Element Key Pattern Description
Table {dataset_id}_z{z} AnnData with intensity matrix (pixels x m/z), coordinates in .obs, m/z axis in .var
TIC Image {dataset_id}_z{z}_tic 2D total ion current image, shape (1, y, x)
Pixel Shapes {dataset_id}_z{z}_pixels GeoDataFrame with pixel box geometries
Optical Images {dataset_id}_optical_{name} Microscopy images (when available)

3D mode

When converted with --handle-3d, the _z{z} suffix is dropped and all slices are merged into a single table with x, y, z coordinates in .obs.

Default dataset ID

The default dataset_id is msi_dataset, so typical keys look like msi_dataset_z0, msi_dataset_z0_tic, etc. Change it with --dataset-id.


TIC Images

The TIC (Total Ion Current) image shows the sum of all intensities per pixel.

import numpy as np
import matplotlib.pyplot as plt

tic_key = [k for k in sdata.images if k.endswith("_tic")][0]
tic_array = np.asarray(sdata.images[tic_key])[0]  # drop channel dim -> (y, x)

plt.imshow(tic_array, cmap="viridis")
plt.colorbar(label="TIC Intensity")
plt.title("Total Ion Current")
plt.show()

Optical Images

When converted with --include-optical (the default for Bruker data), microscopy images are stored alongside the MSI data.

optical_keys = [k for k in sdata.images if "optical" in k]
print("Optical images:", optical_keys)

if optical_keys:
    opt_image = np.asarray(sdata.images[optical_keys[0]])
    # Shape is (channels, y, x) -- transpose for display
    opt_rgb = np.moveaxis(opt_image[:3], 0, -1)
    plt.imshow(opt_rgb)
    plt.title("Optical Image")
    plt.show()

TIC-to-Optical Overlay

The TIC image carries an affine transform that maps it into the optical image's coordinate space. This means overlays work automatically in tools like napari.

To inspect the transform:

from spatialdata.transformations import get_transformation

tic_element = sdata.images[tic_key]
transforms = get_transformation(tic_element, get_all=True)
affine = list(transforms.values())[0]

matrix = np.array(affine.to_affine_matrix(
    input_axes=("x", "y"), output_axes=("x", "y")
))
print(f"Scale: {matrix[0,0]:.2f}x, {matrix[1,1]:.2f}x")
print(f"Offset: ({matrix[0,2]:.0f}, {matrix[1,2]:.0f})")

How alignment works

The optical image has an Identity transform and defines the reference coordinate system. The TIC image has an Affine transform (scale + offset) that positions it in the optical coordinate space. This comes from the teaching point calibration in the .mis file (Bruker data).


Mass Spectrum Data

Average Spectrum

Each table stores a pre-computed average spectrum in uns:

table_key = list(sdata.tables.keys())[0]
msi_table = sdata.tables[table_key]

mz_values = msi_table.var["mz"].values
avg_spectrum = msi_table.uns["average_spectrum"]

plt.plot(mz_values, avg_spectrum, linewidth=0.5)
plt.xlabel("m/z")
plt.ylabel("Average Intensity")
plt.title("Average Mass Spectrum")
plt.show()

Per-Region Average Spectrum

For multi-region datasets, Thyra also stores a mean spectrum per acquisition region in uns["average_spectrum_per_region"]. Each key is the region number (as a string), and the value is a 1-D array matching the m/z axis.

if "average_spectrum_per_region" in msi_table.uns:
    per_region = msi_table.uns["average_spectrum_per_region"]
    for region_id, spectrum in per_region.items():
        plt.plot(mz_values, spectrum, label=f"Region {region_id}", linewidth=0.5)
    plt.xlabel("m/z")
    plt.ylabel("Average Intensity")
    plt.legend()
    plt.title("Average Spectrum per Region")
    plt.show()

Note

This key is only present when the dataset contains multiple acquisition regions. Single-region datasets only have the global average_spectrum.

Intensity Matrix

The intensity matrix is stored as a sparse matrix. Each row is one pixel, each column is one m/z bin:

X = msi_table.X  # sparse (pixels x m/z)
print(f"Shape: {X.shape}")
print(f"Non-zero: {X.nnz:,} ({X.nnz / (X.shape[0] * X.shape[1]) * 100:.2f}%)")

Sparse format

The default storage is CSC (Compressed Sparse Column), which is fast for extracting ion images (column = one m/z across all pixels). If you need fast per-pixel access, convert with --sparse-format csr.

Ion Images

To visualise the spatial distribution of a specific m/z value:

target_mz = 760.5
mz_idx = np.abs(mz_values - target_mz).argmin()

# Extract column from sparse matrix
ion_values = np.asarray(X[:, mz_idx].toarray()).flatten()

# Reconstruct image from pixel coordinates
x_coords = msi_table.obs["x"].values.astype(int)
y_coords = msi_table.obs["y"].values.astype(int)

ion_image = np.zeros((y_coords.max() + 1, x_coords.max() + 1))
ion_image[y_coords, x_coords] = ion_values

plt.imshow(ion_image, cmap="hot")
plt.colorbar(label="Intensity")
plt.title(f"m/z {mz_values[mz_idx]:.4f}")
plt.show()

Individual Spectra

pixel_idx = 0
spectrum = X[pixel_idx].toarray().flatten()

plt.plot(mz_values, spectrum, linewidth=0.5)
plt.xlabel("m/z")
plt.ylabel("Intensity")
plt.title(f"Pixel {pixel_idx}")
plt.show()

Pixel Coordinates

Coordinates are stored in the table's .obs DataFrame:

print("Columns:", list(msi_table.obs.columns))
Column Type Description
x, y int Raster grid coordinates (pixel indices)
spatial_x, spatial_y float Physical coordinates in micrometers
region categorical SpatialData region key
region_number int Acquisition region number

The DataFrame index is instance_id (a string pixel identifier).


Regions

Datasets acquired from multi-region slides (e.g., multiple tissue sections on one slide) store region information in two places.

Per-Pixel Region Number

The region_number column in .obs indicates which acquisition region each pixel belongs to:

print(msi_table.obs["region_number"].value_counts())

Region Summary

Region metadata -- including human-readable names from the instrument's Area definitions -- is stored as JSON in uns:

import json

regions = json.loads(msi_table.uns["regions"])
for r in regions:
    print(f"Region {r['region_number']}: {r.get('name', 'unnamed')} "
          f"({r['n_spectra']:,} spectra)")

Example output:

Region 0: E2506 (104,321 spectra)
Region 1: Matrix (1,053 spectra)

Filtering by region

To work with only one region:

mask = msi_table.obs["region_number"] == 0
tissue_table = msi_table[mask]


3D Data / Z-Slices

By default, Thyra stores each z-slice as a separate table and TIC image:

slice_tables = sorted(k for k in sdata.tables if "_z" in k)
print(f"{len(slice_tables)} z-slices: {slice_tables}")

# Access a single slice
z0_table = sdata.tables[slice_tables[0]]
print(f"Slice 0: {z0_table.shape}")

When converted with --handle-3d, all slices are combined into a single table with x, y, z coordinates in .obs.


Dataset Metadata

Global metadata

Stored in sdata.attrs:

if "pixel_size_x_um" in sdata.attrs:
    print(f"Pixel size: {sdata.attrs['pixel_size_x_um']} um")

if "msi_dataset_info" in sdata.attrs:
    info = sdata.attrs["msi_dataset_info"]
    print(f"Dimensions: {info.get('dimensions_xyz')}")
    print(f"Non-empty pixels: {info.get('non_empty_pixels'):,}")

Table-level metadata

Instrument info, acquisition parameters, and resampling config are in msi_table.uns:

print("uns keys:", list(msi_table.uns.keys()))

Recipes

Side-by-side TIC and ion image

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

ax1.imshow(tic_array, cmap="viridis")
ax1.set_title("TIC")

target_mz = 760.5
mz_idx = np.abs(mz_values - target_mz).argmin()
ion_values = np.asarray(X[:, mz_idx].toarray()).flatten()

x_coords = msi_table.obs["x"].values.astype(int)
y_coords = msi_table.obs["y"].values.astype(int)
ion_image = np.zeros((y_coords.max() + 1, x_coords.max() + 1))
ion_image[y_coords, x_coords] = ion_values

ax2.imshow(ion_image, cmap="hot")
ax2.set_title(f"m/z {mz_values[mz_idx]:.2f}")

plt.tight_layout()
plt.show()

Export ion image to TIFF

from PIL import Image

# Normalise to 0-255
ion_norm = (ion_image / ion_image.max() * 255).astype(np.uint8)
Image.fromarray(ion_norm).save("ion_image.tiff")

Top N most intense m/z values

avg = msi_table.uns["average_spectrum"]
top_n = 10
top_indices = np.argsort(avg)[-top_n:][::-1]

for idx in top_indices:
    print(f"  m/z {mz_values[idx]:.4f}  avg intensity {avg[idx]:.1f}")

Summary statistics

print(f"Dataset: {table_key}")
print(f"  Pixels: {msi_table.n_obs:,}")
print(f"  m/z bins: {msi_table.n_vars:,}")
print(f"  m/z range: {mz_values.min():.2f} -- {mz_values.max():.2f}")
print(f"  Sparsity: {(1 - X.nnz / (X.shape[0] * X.shape[1])) * 100:.1f}%")
if "pixel_size_x_um" in sdata.attrs:
    print(f"  Pixel size: {sdata.attrs['pixel_size_x_um']} um")