Skip to content

Write TIFF Outputs From Ray

Overview

Problem solved: save NumPy image arrays from a Ray dataset as TIFF files without writing your own row-by-row export loop.

Use this example when:

  • you already have image arrays in a Ray dataset,
  • you want one TIFF per row,
  • or you want to export masks, patches, or derived images for later inspection.

Why This Approach

VipsTiffDatasink wraps pyvips inside a Ray datasink so dataset rows can be written directly to TIFF. This is useful when your pipeline already lives in Ray and the natural output is a directory of images rather than another Parquet table.

Example

from ray.data.expressions import col

from ratiopath.ray import read_slides
from ratiopath.ray.datasource import VipsTiffDatasink
from ratiopath.tiling import grid_tiles, read_slide_tiles
from ratiopath.tiling.utils import row_hash


def expand_tiles(row: dict[str, object]) -> list[dict[str, object]]:
    return [
        {
            "path": row["path"],
            "slide_id": row["id"],
            "tile_x": x,
            "tile_y": y,
            "level": row["level"],
            "tile_extent_x": row["tile_extent_x"],
            "tile_extent_y": row["tile_extent_y"],
        }
        for x, y in grid_tiles(
            slide_extent=(row["extent_x"], row["extent_y"]),
            tile_extent=(row["tile_extent_x"], row["tile_extent_y"]),
            stride=(row["stride_x"], row["stride_y"]),
            last="keep",
        )
    ]


slides = read_slides("data", mpp=0.5, tile_extent=512, stride=512).map(row_hash)
tiles = slides.flat_map(expand_tiles).limit(32)

tiles_with_pixels = tiles.with_column(
    "tile",
    read_slide_tiles(
        col("path"),
        col("tile_x"),
        col("tile_y"),
        col("tile_extent_x"),
        col("tile_extent_y"),
        col("level"),
    ),
)

tiles_with_pixels.write_datasink(
    VipsTiffDatasink(
        path="tile_tiffs",
        data_column="tile",
        default_options={
            "compression": "lzw",
            "tile": True,
            "bigtiff": True,
        },
    )
)
Under the hood

VipsTiffDatasink expects a column containing NumPy arrays. For each row, it converts that array into a pyvips.Image and writes a TIFF buffer with the options you provide.

This keeps export inside the distributed pipeline instead of forcing a manual collect-and-save step on the driver.

When This Is Better Than Parquet

Use TIFF export when:

  • another tool expects image files,
  • you want to inspect outputs visually,
  • or you are exporting masks or derived rasters rather than metadata.

Use Parquet when:

  • the output is mostly scalar metadata,
  • you want efficient downstream filtering and joins,
  • or the image pixels are only an intermediate step.