ratiopath.tiling.overlays

`tile_overlay(roi, overlay_path, tile_x, tile_y, mpp_x, mpp_y)`

Read overlay tiles for a batch of tiles.

For each overlay path, the corresponding whole-slide image is opened (OpenSlide or OME-TIFF). The overlay is read at the slide level whose resolution is closest to each tile's mpp, and the tile coordinates and extents are scaled to that level before reading.
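The level selection and coordinate scaling described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper names `closest_level` and `scale_to_level` are hypothetical, and it assumes tile coordinates are expressed in pixels at the tile's own mpp. The same computation would apply independently to the X and Y axes.

```python
def closest_level(level_mpps: list[float], target_mpp: float) -> int:
    """Return the index of the level whose mpp is nearest to target_mpp."""
    return min(range(len(level_mpps)), key=lambda i: abs(level_mpps[i] - target_mpp))


def scale_to_level(value: int, tile_mpp: float, level_mpp: float) -> int:
    """Convert a pixel coordinate or extent from the tile's resolution to the level's."""
    # One pixel at tile_mpp covers tile_mpp / level_mpp pixels at the chosen level.
    return round(value * tile_mpp / level_mpp)


# Hypothetical pyramid: each level halves the resolution of the previous one.
level_mpps = [0.25, 0.5, 1.0, 2.0]
level = closest_level(level_mpps, target_mpp=0.9)
x_at_level = scale_to_level(1024, tile_mpp=0.5, level_mpp=level_mpps[level])
```

Here a tile at 0.5 µm/px is matched against an overlay whose nearest level to 0.9 µm/px is the 1.0 µm/px level, so its coordinates shrink by a factor of two.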

The region of interest (ROI) geometry is interpreted in the same image space (resolution) as the underlying slide tiles. The region can be an arbitrary polygon. The overlay is read over the bounding box of the region, and pixels that fall outside the region are then masked.
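The read-then-mask approach can be illustrated with NumPy alone. This is a sketch under stated assumptions: the ROI is given as a plain list of vertices rather than a geometry object, the helper names `points_in_polygon` and `mask_bbox_read` are hypothetical, and pixel membership is decided at pixel centres with an even-odd ray-casting test. The library itself may decide membership differently.

```python
import numpy as np


def points_in_polygon(xs: np.ndarray, ys: np.ndarray, vertices) -> np.ndarray:
    """Even-odd ray-casting test for many points against one polygon."""
    inside = np.zeros(xs.shape, dtype=bool)
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # The edge crosses the horizontal ray through a point only if its
        # endpoints lie on opposite sides of that ray.
        straddles = (y1 > ys) != (y2 > ys)
        with np.errstate(divide="ignore", invalid="ignore"):
            x_cross = x1 + (ys - y1) * (x2 - x1) / (y2 - y1)
        # Each crossing to the right of the point toggles inside/outside.
        inside ^= straddles & (xs < x_cross)
    return inside


def mask_bbox_read(pixels: np.ndarray, vertices, origin) -> np.ma.MaskedArray:
    """Mask the pixels of a bounding-box read that fall outside the polygon."""
    height, width = pixels.shape[:2]
    x0, y0 = origin
    xs, ys = np.meshgrid(x0 + np.arange(width) + 0.5, y0 + np.arange(height) + 0.5)
    return np.ma.masked_array(pixels, mask=~points_in_polygon(xs, ys, vertices))


# Triangular ROI whose bounding box is 4 px wide and 3 px tall.
triangle = [(0, 0), (4, 0), (0, 3)]
bbox_pixels = np.arange(12).reshape(3, 4)  # stand-in for the overlay read
masked = mask_bbox_read(bbox_pixels, triangle, origin=(0, 0))
```

The mask is `True` outside the ROI, matching the convention used in the return value of `tile_overlay`.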

Masked arrays cannot currently be used directly in a Ray Dataset. Instead of a numpy masked array, the data and the mask are therefore returned as two separate arrays, packed together via TensorArray.
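A consumer will typically want to rebuild the masked array from the two parts. The round trip below is a sketch that uses plain NumPy stacking as a stand-in for the TensorArray packing; the exact layout and dtype of the stored tensor are assumptions here.

```python
import numpy as np

# A small masked tile: True in the mask marks pixels outside the ROI.
tile = np.ma.masked_array(
    np.array([[1, 2], [3, 4]], dtype=np.uint8),
    mask=[[False, True], [False, False]],
)

# Pack data and mask as two planes of one array (the mask is cast to the data dtype).
packed = np.stack([tile.data, tile.mask])

# Rebuild the masked array on the consumer side.
restored = np.ma.masked_array(packed[0], mask=packed[1].astype(bool))
```

Casting the mask plane back to `bool` matters: passing the integer plane directly would still work for 0/1 values, but the explicit cast keeps the intent clear.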

Parameters:

Name	Type	Description	Default
`roi`	`BaseGeometry`	The region of interest geometry.	required
`overlay_path`	`StringArray`	A pyarrow array of whole-slide image paths for the overlays.	required
`tile_x`	`IntegerArray`	A pyarrow array of tile x-coordinates.	required
`tile_y`	`IntegerArray`	A pyarrow array of tile y-coordinates.	required
`mpp_x`	`FloatArray`	A pyarrow array of physical resolutions (µm/px) of the underlying slide in X direction.	required
`mpp_y`	`FloatArray`	A pyarrow array of physical resolutions (µm/px) of the underlying slide in Y direction.	required

Returns:

Type	Description
`Array`	A pyarrow array holding, for each tile, the read overlay tile as two separate arrays: the first element is the tile data and the second element is the mask (True for pixels outside the ROI).

Source code in ratiopath/tiling/overlays.py

@udf(return_dtype=DataType(np.ndarray))
def tile_overlay(
    roi: BaseGeometry,
    overlay_path: pa.StringArray,
    tile_x: pa.IntegerArray,
    tile_y: pa.IntegerArray,
    mpp_x: pa.FloatArray,
    mpp_y: pa.FloatArray,
) -> pa.Array:
    """Read overlay tiles for a batch of tiles.

    For each overlay path the corresponding whole-slide image is opened (OpenSlide or OME-TIFF).
    The overlay is accessed at the slide level closest to each tile's mpp and the tile
    coordinates/extents are scaled to that level before reading.

    The region of interest (ROI) geometry is interpreted in the same image space (resolution) as the underlying slide tiles.
    The region can be an arbitrary polygon. The overlay is read over the bounding box of the region,
    and pixels that fall outside the region are then masked.

    Masked arrays cannot currently be used directly in a Ray Dataset. Instead of a numpy masked array,
    the data and the mask are therefore returned as two separate arrays, packed together via TensorArray.

    Args:
        roi: The region of interest geometry.
        overlay_path: A pyarrow array of whole-slide image paths for the overlays.
        tile_x: A pyarrow array of tile x-coordinates.
        tile_y: A pyarrow array of tile y-coordinates.
        mpp_x: A pyarrow array of physical resolutions (µm/px) of the underlying slide in X direction.
        mpp_y: A pyarrow array of physical resolutions (µm/px) of the underlying slide in Y direction.

    Returns:
        A pyarrow array holding, for each tile, the read overlay tile as two separate arrays.
            - The first element is the tile data.
            - The second element is the mask (True for pixels outside the ROI).
    """
    overlays = _tile_overlay(roi, overlay_path, tile_x, tile_y, mpp_x, mpp_y)

    return pa.array(TensorArray([[overlay.data, overlay.mask] for overlay in overlays]))

`tile_overlay_overlap(roi, overlay_path, tile_x, tile_y, mpp_x, mpp_y)`

Calculate the overlap of each overlay tile with the region of interest (ROI).

For each overlay path, the corresponding whole-slide image is opened (OpenSlide or OME-TIFF). The overlay is read at the slide level whose resolution is closest to each tile's mpp, and the tile coordinates and extents are scaled to that level before reading.

The region of interest (ROI) geometry is treated in the same image space (resolution) as the underlying slide tiles. The region can be an arbitrary polygon.

The pyarrow array used inside a Ray Dataset stores dictionaries in a struct-like layout. As a result, all rows share the same set of keys, and keys missing from a row are filled with None. Furthermore, pyarrow only supports string keys in dictionaries, so the keys in the resulting dictionary are string representations of the overlay values.

Parameters:

Name	Type	Description	Default
`roi`	`BaseGeometry`	The region of interest geometry.	required
`overlay_path`	`StringArray`	A pyarrow array of whole-slide image paths for the overlays.	required
`tile_x`	`IntegerArray`	A pyarrow array of tile x-coordinates.	required
`tile_y`	`IntegerArray`	A pyarrow array of tile y-coordinates.	required
`mpp_x`	`FloatArray`	A pyarrow array of physical resolutions (µm/px) of the underlying slide in X direction.	required
`mpp_y`	`FloatArray`	A pyarrow array of physical resolutions (µm/px) of the underlying slide in Y direction.	required

Returns:

Type	Description
`MapArray`	A pyarrow array of dictionaries mapping overlay values to their overlap fraction with the ROI.
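To make the returned mapping concrete, the following sketch computes the overlap fractions for one small, hand-made masked overlay. It mirrors the counting approach of the source listing on this page, but the input array is invented for illustration.

```python
import numpy as np


def overlap_fraction(overlay: np.ma.MaskedArray) -> dict[str, float]:
    """Fraction of unmasked (inside-ROI) pixels taken by each overlay value."""
    values, counts = np.unique(overlay.compressed(), return_counts=True)
    return {
        str(value.item()): count.item() / overlay.count()
        for value, count in zip(values, counts)
    }


# Two of the six pixels lie outside the ROI (mask is True) and are ignored.
overlay = np.ma.masked_array(
    [[0, 0, 1], [1, 1, 2]],
    mask=[[False, False, False], [False, True, True]],
)
fractions = overlap_fraction(overlay)
```

The fractions are relative to the number of pixels inside the ROI, not to the full bounding box, so they sum to 1 for any tile with at least one unmasked pixel. A fully masked tile yields an empty dictionary.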

Source code in ratiopath/tiling/overlays.py

@udf(return_dtype=DataType(dict))
def tile_overlay_overlap(
    roi: BaseGeometry,
    overlay_path: pa.StringArray,
    tile_x: pa.IntegerArray,
    tile_y: pa.IntegerArray,
    mpp_x: pa.FloatArray,
    mpp_y: pa.FloatArray,
) -> pa.MapArray:
    """Calculate the overlap of each overlay tile with the region of interest (ROI).

    For each overlay path the corresponding whole-slide image is opened (OpenSlide or OME-TIFF).
    The overlay is accessed at the slide level closest to each tile's mpp and the tile
    coordinates/extents are scaled to that level before reading.

    The region of interest (ROI) geometry is treated in the same image space (resolution) as the underlying slide tiles.
    The region can be an arbitrary polygon.

    The pyarrow array used inside a Ray Dataset stores dictionaries in a struct-like layout.
    As a result, all rows share the same set of keys, and keys missing from a row are filled with None.
    Furthermore, pyarrow only supports string keys in dictionaries, so the keys in the
    resulting dictionary are string representations of the overlay values.

    Args:
        roi: The region of interest geometry.
        overlay_path: A pyarrow array of whole-slide image paths for the overlays.
        tile_x: A pyarrow array of tile x-coordinates.
        tile_y: A pyarrow array of tile y-coordinates.
        mpp_x: A pyarrow array of physical resolutions (µm/px) of the underlying slide in X direction.
        mpp_y: A pyarrow array of physical resolutions (µm/px) of the underlying slide in Y direction.

    Returns:
        A pyarrow array of dictionaries mapping overlay values to their overlap fraction with the ROI.
    """
    # The overlay is a masked array where the mask is True for pixels outside the ROI.
    overlay = _tile_overlay(roi, overlay_path, tile_x, tile_y, mpp_x, mpp_y)

    def overlap_fraction(overlay: np.ma.MaskedArray) -> dict[str, float]:
        """Calculate the overlap fraction of each unique value in the overlay."""
        return {
            # Pyarrow requires string keys in dictionaries
            str(value.item()): count.item() / overlay.count()
            for value, count in zip(
                *np.unique(overlay.compressed(), return_counts=True), strict=True
            )
        }

    overlap_vectorized = np.vectorize(overlap_fraction, otypes=[object])

    return pa.array(overlap_vectorized(overlay))