
ratiopath.tiling.overlays

tile_overlay(roi, overlay_path, tile_x, tile_y, mpp_x, mpp_y)

Read overlay tiles for a batch of tiles.

For each overlay path the corresponding whole-slide image is opened (OpenSlide or OME-TIFF). The overlay is accessed at the slide level closest to each tile's mpp and the tile coordinates/extents are scaled to that level before reading.
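
The level selection and coordinate scaling described above can be sketched roughly as follows. This is only an illustration, not ratiopath's implementation: `closest_level`, `scale_to_level`, and the `level_mpps` list are hypothetical names, and the sketch assumes that a coordinate is rescaled by the ratio of the tile's mpp to the chosen level's mpp.

```python
def closest_level(level_mpps, target_mpp):
    """Return the index of the level whose mpp is nearest to target_mpp."""
    return min(range(len(level_mpps)), key=lambda i: abs(level_mpps[i] - target_mpp))


def scale_to_level(value, tile_mpp, level_mpp):
    """Convert a coordinate or extent from tile resolution to level resolution.

    Assumption: both resolutions are given in µm/px, so the pixel count
    changes by the factor tile_mpp / level_mpp.
    """
    return round(value * tile_mpp / level_mpp)


# Hypothetical overlay pyramid: resolution of each level in µm/px.
level_mpps = [0.25, 0.5, 1.0, 2.0]

# A tile at 0.45 µm/px is closest to the 0.5 µm/px level (index 1).
level = closest_level(level_mpps, target_mpp=0.45)

# The tile's x-coordinate expressed in pixels of the chosen level.
x_at_level = scale_to_level(1024, tile_mpp=0.45, level_mpp=level_mpps[level])
```

Because the overlay rarely has a level with exactly the tile's mpp, the same rescaling would apply to the tile extents as well as to its origin.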

The region of interest (ROI) geometry is interpreted in the same image space (resolution) as the underlying slide tiles. The region can be an arbitrary polygon. The overlay is read over the bounding box of the region, and the result is then masked so that pixels outside the ROI are excluded.

Ray Dataset currently cannot hold numpy masked arrays directly, so instead of a masked array the data and the mask are provided as two separate arrays, packed via TensorArray.
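
As a hedged illustration of what receiving the data and the mask as two separate arrays means for a consumer, the sketch below packs a small masked array into two plain planes and rebuilds it. It uses numpy only; stacking the planes with `np.stack` and casting the mask to the data dtype are assumptions made for the example, not a description of how TensorArray stores them.

```python
import numpy as np

# A 2x3 tile with two pixels flagged as outside the ROI (mask == True).
tile = np.ma.MaskedArray(
    data=np.array([[1, 2, 3], [4, 5, 6]], dtype=np.uint8),
    mask=np.array([[False, False, True], [False, True, False]]),
)

# Store data and mask as two plain arrays along a new leading axis.
# Assumption: the mask is cast to the data dtype so both planes fit one array.
packed = np.stack([tile.data, tile.mask.astype(tile.dtype)])

# A consumer can rebuild the masked array from the two planes.
restored = np.ma.MaskedArray(data=packed[0], mask=packed[1].astype(bool))

# Only pixels inside the ROI remain after dropping the masked ones.
inside_roi = restored.compressed()
```

Keeping the convention "True means outside the ROI" matters here, because it is the same convention `np.ma.MaskedArray` uses for invalid entries.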

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| roi | BaseGeometry | The region of interest geometry. | required |
| overlay_path | StringArray | A pyarrow array of whole-slide image paths for the overlays. | required |
| tile_x | IntegerArray | A pyarrow array of tile x-coordinates. | required |
| tile_y | IntegerArray | A pyarrow array of tile y-coordinates. | required |
| mpp_x | FloatArray | A pyarrow array of physical resolutions (µm/px) of the underlying slide in X direction. | required |
| mpp_y | FloatArray | A pyarrow array of physical resolutions (µm/px) of the underlying slide in Y direction. | required |

Returns:

| Type | Description |
|---|---|
| Array | A pyarrow array with one entry per overlay tile, each holding two arrays: the first is the tile data, the second is the mask (True for pixels outside the ROI). |

Source code in ratiopath/tiling/overlays.py
@udf(return_dtype=DataType(np.ndarray))
def tile_overlay(
    roi: BaseGeometry,
    overlay_path: pa.StringArray,
    tile_x: pa.IntegerArray,
    tile_y: pa.IntegerArray,
    mpp_x: pa.FloatArray,
    mpp_y: pa.FloatArray,
) -> pa.Array:
    """Read overlay tiles for a batch of tiles.

    For each overlay path the corresponding whole-slide image is opened (OpenSlide or OME-TIFF).
    The overlay is accessed at the slide level closest to each tile's mpp and the tile
    coordinates/extents are scaled to that level before reading.

    The region of interest (ROI) geometry is interpreted in the same image space (resolution) as the underlying slide tiles.
    The region can be an arbitrary polygon. The overlay is read over the bounding box of the region,
    and the result is then masked so that pixels outside the ROI are excluded.

    Ray Dataset currently cannot hold numpy masked arrays directly, so instead of a masked array
    the data and the mask are provided as two separate arrays, packed via TensorArray.

    Args:
        roi: The region of interest geometry.
        overlay_path: A pyarrow array of whole-slide image paths for the overlays.
        tile_x: A pyarrow array of tile x-coordinates.
        tile_y: A pyarrow array of tile y-coordinates.
        mpp_x: A pyarrow array of physical resolutions (µm/px) of the underlying slide in X direction.
        mpp_y: A pyarrow array of physical resolutions (µm/px) of the underlying slide in Y direction.

    Returns:
        A pyarrow array with one entry per overlay tile, each holding two arrays:
            - The first element is the tile data.
            - The second element is the mask (True for pixels outside the ROI).
    """
    overlays = _tile_overlay(roi, overlay_path, tile_x, tile_y, mpp_x, mpp_y)

    return pa.array(TensorArray([[overlay.data, overlay.mask] for overlay in overlays]))

tile_overlay_overlap(roi, overlay_path, tile_x, tile_y, mpp_x, mpp_y)

Calculate the overlap of each overlay tile with the region of interest (ROI).

For each overlay path the corresponding whole-slide image is opened (OpenSlide or OME-TIFF). The overlay is accessed at the slide level closest to each tile's mpp and the tile coordinates/extents are scaled to that level before reading.

The region of interest (ROI) geometry is treated in the same image space (resolution) as the underlying slide tiles. The region can be an arbitrary polygon.

The pyarrow array used inside a Ray Dataset stores dictionaries in an array-like form in which all rows share the same set of keys; keys missing from a row are filled with None. Furthermore, pyarrow only supports string keys in dictionaries, so the keys in the resulting dictionary are string representations of the overlay values.
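
To make the consequence for downstream code concrete, here is a plain-Python sketch (no pyarrow involved) that only mimics the behaviour described above. `pad_rows` is a hypothetical helper, and replacing `None` with 0.0 is an assumption about what a consumer would typically want, not something the function does for you.

```python
def pad_rows(rows):
    """Give every row the union of all keys, filling absent keys with None."""
    keys = sorted({key for row in rows for key in row})
    return [{key: row.get(key) for key in keys} for row in rows]


# Per-tile overlap dictionaries with string keys, as returned per row.
rows = [{"0": 0.25, "1": 0.75}, {"2": 1.0}]

# What the rows look like once all of them carry the same set of keys.
padded = pad_rows(rows)

# A consumer may prefer to read "value absent from this tile" as 0.0.
filled = [
    {key: (0.0 if value is None else value) for key, value in row.items()}
    for row in padded
]
```

Since the keys are strings, a consumer that needs the numeric overlay value has to convert it back, for example with `int(key)` for integer-valued overlays.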

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| roi | BaseGeometry | The region of interest geometry. | required |
| overlay_path | StringArray | A pyarrow array of whole-slide image paths for the overlays. | required |
| tile_x | IntegerArray | A pyarrow array of tile x-coordinates. | required |
| tile_y | IntegerArray | A pyarrow array of tile y-coordinates. | required |
| mpp_x | FloatArray | A pyarrow array of physical resolutions (µm/px) of the underlying slide in X direction. | required |
| mpp_y | FloatArray | A pyarrow array of physical resolutions (µm/px) of the underlying slide in Y direction. | required |

Returns:

| Type | Description |
|---|---|
| MapArray | A pyarrow array of dictionaries mapping overlay values to their overlap fraction with the ROI. |
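
A small worked example may help to read the returned fractions. The sketch below applies the same counting logic as the source shown further down to a hand-made masked array; the array contents are invented for illustration.

```python
import numpy as np

# A 2x3 overlay tile; mask == True marks pixels outside the ROI.
overlay = np.ma.MaskedArray(
    data=np.array([[0, 1, 1], [2, 1, 0]]),
    mask=np.array([[False, False, False], [True, False, True]]),
)

# Only the four unmasked pixels (0, 1, 1, 1) take part in the counting.
values, counts = np.unique(overlay.compressed(), return_counts=True)

# Fraction of ROI pixels covered by each overlay value, keyed by string.
fractions = {
    str(value.item()): count.item() / overlay.count()
    for value, count in zip(values, counts)
}
# fractions == {"0": 0.25, "1": 0.75}
```

The value 2 does not appear in the result because its only pixel lies outside the ROI, and the fractions sum to 1 over the pixels inside the ROI.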

Source code in ratiopath/tiling/overlays.py
@udf(return_dtype=DataType(dict))
def tile_overlay_overlap(
    roi: BaseGeometry,
    overlay_path: pa.StringArray,
    tile_x: pa.IntegerArray,
    tile_y: pa.IntegerArray,
    mpp_x: pa.FloatArray,
    mpp_y: pa.FloatArray,
) -> pa.MapArray:
    """Calculate the overlap of each overlay tile with the region of interest (ROI).

    For each overlay path the corresponding whole-slide image is opened (OpenSlide or OME-TIFF).
    The overlay is accessed at the slide level closest to each tile's mpp and the tile
    coordinates/extents are scaled to that level before reading.

    The region of interest (ROI) geometry is treated in the same image space (resolution) as the underlying slide tiles.
    The region can be an arbitrary polygon.

    The pyarrow array used inside a Ray Dataset stores dictionaries in an array-like form
    in which all rows share the same set of keys; keys missing from a row are filled with None.
    Furthermore, pyarrow only supports string keys in dictionaries, so the keys in the
    resulting dictionary are string representations of the overlay values.

    Args:
        roi: The region of interest geometry.
        overlay_path: A pyarrow array of whole-slide image paths for the overlays.
        tile_x: A pyarrow array of tile x-coordinates.
        tile_y: A pyarrow array of tile y-coordinates.
        mpp_x: A pyarrow array of physical resolutions (µm/px) of the underlying slide in X direction.
        mpp_y: A pyarrow array of physical resolutions (µm/px) of the underlying slide in Y direction.

    Returns:
        A pyarrow array of dictionaries mapping overlay values to their overlap fraction with the ROI.
    """
    # The overlay is a masked array where the mask is True for pixels outside the ROI.
    overlay = _tile_overlay(roi, overlay_path, tile_x, tile_y, mpp_x, mpp_y)

    def overlap_fraction(overlay: np.ma.MaskedArray) -> dict[str, float]:
        """Calculate the overlap fraction of each unique value in the overlay."""
        return {
            # Pyarrow requires string keys in dictionaries
            str(value.item()): count.item() / overlay.count()
            for value, count in zip(
                *np.unique(overlay.compressed(), return_counts=True), strict=True
            )
        }

    overlap_vectorized = np.vectorize(overlap_fraction, otypes=[object])

    return pa.array(overlap_vectorized(overlay))