Adding Annotation Coverage to Tiles
This tutorial shows you how to enrich your tiling pipeline with annotation data, such as regions of interest, cancer boundaries, or other expert-marked areas. You’ll learn how to parse annotation files, compute coverage for each tile, and add this information to your dataset using `ratiopath`.
Step 1: Understanding Annotation Coverage
Annotation coverage quantifies how much of a tile overlaps with annotated regions (e.g., cancerous tissue). This is useful for downstream analysis, training machine learning models, or filtering tiles based on biological relevance.
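One way to write this down as a formula is shown below. It assumes the tile's region of interest $R$ (introduced in Step 3) and the annotation polygons $A_1, \dots, A_n$ are expressed in the same coordinate space:

```latex
\text{coverage}(R) = \frac{\operatorname{area}\left(R \cap \bigcup_{i=1}^{n} A_i\right)}{\operatorname{area}(R)} \in [0, 1]
```

Taking the union means a region marked by two overlapping annotations is counted only once. For example, if $R$ is an entire 512 × 512 tile and a single annotation covers its leftmost 192 pixel columns, the overlap area is 192 · 512 = 98,304, the ROI area is 262,144, and the coverage is 98,304 / 262,144 = 0.375.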
Step 2: Parsing Annotation Files
To associate tiles with annotation data, you first need to parse the annotation files. `ratiopath` provides parsers for common formats such as ASAP XML and GeoJSON.
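```python
from ratiopath.parsers import ASAPParser

# `row` is one slide record from your slides dataset (e.g., inside a per-slide function).
annotation_path = row["path"].replace(".mrxs", ".xml")
parser = ASAPParser(annotation_path)
annotations = list(parser.get_polygons(name="...", part_of_group="..."))
```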
- Replace `.mrxs` with your slide file extension as needed.
- Use the appropriate parser for your annotation format.
- Note: The `name` and `part_of_group` arguments are regular expressions used to filter annotations based on their `Name` and `PartOfGroup` attributes.
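To make the filtering semantics concrete, here is a standard-library-only sketch of what parsing an ASAP XML file with `Name`/`PartOfGroup` filters involves. It is not ratiopath's implementation: `filter_asap_polygons` and `annotation_path_for` are hypothetical helpers, the XML is a made-up minimal example of the ASAP layout, and whole-string matching with `re.fullmatch` is an assumption (`ASAPParser` may use a different matching mode). `annotation_path_for` also shows a sturdier alternative to `str.replace`, which would rewrite every `.mrxs` occurring anywhere in the path.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path

# Made-up minimal document following the ASAP XML layout (illustrative only).
ASAP_XML = """<ASAP_Annotations>
  <Annotations>
    <Annotation Name="Annotation 0" Type="Polygon" PartOfGroup="tumor">
      <Coordinates>
        <Coordinate Order="0" X="0" Y="0"/>
        <Coordinate Order="1" X="100" Y="0"/>
        <Coordinate Order="2" X="100" Y="100"/>
      </Coordinates>
    </Annotation>
    <Annotation Name="Annotation 1" Type="Polygon" PartOfGroup="stroma">
      <Coordinates>
        <Coordinate Order="1" X="20" Y="10"/>
        <Coordinate Order="0" X="10" Y="10"/>
        <Coordinate Order="2" X="20" Y="20"/>
      </Coordinates>
    </Annotation>
  </Annotations>
</ASAP_Annotations>"""


def annotation_path_for(slide_path: str, suffix: str = ".xml") -> str:
    """Hypothetical helper: swap only the final extension (.mrxs, .svs, .tiff, ...)."""
    return str(Path(slide_path).with_suffix(suffix))


def filter_asap_polygons(xml_text: str, name: str = ".*", part_of_group: str = ".*"):
    """Hypothetical stand-in for ASAPParser.get_polygons.

    Yields (Name, vertices) for every annotation whose Name and PartOfGroup
    attributes both fully match the given regular expressions.
    """
    root = ET.fromstring(xml_text)
    for annotation in root.iter("Annotation"):
        if not re.fullmatch(name, annotation.get("Name", "")):
            continue
        if not re.fullmatch(part_of_group, annotation.get("PartOfGroup", "")):
            continue
        # Vertices carry an explicit Order attribute; sort by it to be safe.
        coords = sorted(annotation.iter("Coordinate"), key=lambda c: int(c.get("Order")))
        yield annotation.get("Name"), [(float(c.get("X")), float(c.get("Y"))) for c in coords]


tumor = list(filter_asap_polygons(ASAP_XML, part_of_group="tumor"))
# [('Annotation 0', [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0)])]
```

Because the patterns are regular expressions, `name=r"Annotation \d+"` would select both annotations in this example, while `part_of_group="tumor|stroma"` selects by group instead.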
Step 3: Computing Tile Coverage
For each tile, define its region of interest (ROI) as a polygon. The coverage is then the fraction of the ROI area covered by annotation polygons. In this example the ROI spans the entire tile, but any geometry inside the tile (a subregion, a mask, or another shape of interest) can serve as the ROI. The ROI is always defined in the tile's own coordinate system, with (0, 0) at the tile origin.
```python
from shapely import Polygon

roi = Polygon([
    (0, 0),
    (row["tile_extent_x"], 0),
    (row["tile_extent_x"], row["tile_extent_y"]),
    (0, row["tile_extent_y"]),
])
```
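The coverage computation itself can be sketched without any dependencies. This version assumes a single annotation polygon, annotation and ROI coordinates in the same space (no downsample scaling), and a convex ROI whose vertices run in the order used above. It shifts the ROI to the tile's position, clips the annotation to it with the Sutherland-Hodgman algorithm, and measures areas with the shoelace formula. All function names are hypothetical; with shapely, the same number would come from `roi.intersection(annotation).area / roi.area` after moving the ROI with `shapely.affinity.translate`.

```python
def polygon_area(vertices):
    """Absolute area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def clip_polygon(subject, clip):
    """Sutherland-Hodgman: clip `subject` to a convex `clip` polygon with positive signed area."""

    def inside(p, a, b):
        # True if p lies on or to the left of the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p, q, a, b):
        # Point where segment p -> q crosses the line through a and b.
        dx, dy = q[0] - p[0], q[1] - p[1]
        ex, ey = b[0] - a[0], b[1] - a[1]
        t = (ex * (p[1] - a[1]) - ey * (p[0] - a[0])) / (ey * dx - ex * dy)
        return (p[0] + t * dx, p[1] + t * dy)

    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        points, output = output, []
        if not points:
            break
        prev = points[-1]
        for cur in points:
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(intersect(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(intersect(prev, cur, a, b))
            prev = cur
    return output


def tile_coverage(roi, tile_x, tile_y, annotation):
    """Fraction of the tile-relative `roi` covered by one annotation polygon."""
    # Move the ROI from tile coordinates into slide coordinates.
    shifted = [(x + tile_x, y + tile_y) for x, y in roi]
    overlap = clip_polygon(annotation, shifted)
    return polygon_area(overlap) / polygon_area(roi)


roi = [(0, 0), (256, 0), (256, 256), (0, 256)]  # whole 256 x 256 tile, same vertex order as above
annotation = [(0, 0), (640, 0), (640, 2000), (0, 2000)]
coverage = tile_coverage(roi, 512, 512, annotation)
```

Here the tile at (512, 512) overlaps the annotation only in its left half (x from 512 to 640), so `coverage` is 0.5 up to floating-point rounding. Several non-overlapping annotations could be handled by summing their overlap areas; overlapping ones would first need to be merged.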
Step 4: Attaching Coverage to Tile Metadata
Use the `tile_annotations` function to intersect the annotation polygons with each tile's ROI; it returns the resulting intersection polygon for every tile. The coverage stored in each tile row is that polygon's area divided by the ROI area.
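```python
from typing import Any

import numpy as np
from shapely import Polygon

from ratiopath.parsers import ASAPParser
# Assumption: grid_tiles is exported from ratiopath.tiling alongside tile_annotations.
from ratiopath.tiling import grid_tiles, tile_annotations


def tiling_with_annotations(row: dict[str, Any]) -> list[dict[str, Any]]:
    # Load this slide's annotations (see Step 2).
    annotation_path = row["path"].replace(".mrxs", ".xml")
    parser = ASAPParser(annotation_path)
    annotations = list(parser.get_polygons(name="...", part_of_group="..."))

    # ROI covering the whole tile, in tile-relative coordinates (see Step 3).
    roi = Polygon([
        (0, 0),
        (row["tile_extent_x"], 0),
        (row["tile_extent_x"], row["tile_extent_y"]),
        (0, row["tile_extent_y"]),
    ])

    # Tile origins on the slide as an array of shape (n_tiles, 2).
    coordinates = np.array(list(
        grid_tiles(
            slide_extent=(row["extent_x"], row["extent_y"]),
            tile_extent=(row["tile_extent_x"], row["tile_extent_y"]),
            stride=(row["stride_x"], row["stride_y"]),
            last="keep",
        )
    ))

    # One intersection polygon per tile, in the same order as `coordinates`
    # (the indexing below relies on this).
    intersections = tile_annotations(
        annotations,
        roi,
        coordinates[:, 0],
        coordinates[:, 1],
        row["downsample"],
    )

    return [
        {
            "tile_x": coordinates[i, 0],
            "tile_y": coordinates[i, 1],
            "path": row["path"],
            "slide_id": row["id"],
            "level": row["level"],
            "tile_extent_x": row["tile_extent_x"],
            "tile_extent_y": row["tile_extent_y"],
            # Fraction of the ROI area covered by annotations.
            "coverage": polygon.area / roi.area,
        }
        for i, polygon in enumerate(intersections)
    ]
```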
- Each output row contains the tile coordinates, slide metadata, and the coverage value.
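To check the bookkeeping of `tiling_with_annotations` without ratiopath installed, the sketch below reproduces its overall shape (one output row per tile, each carrying a coverage value) using axis-aligned rectangular annotations given as `(x0, y0, x1, y1)`. `simple_grid`, `rect_overlap`, and `tiling_with_rect_annotations` are hypothetical. `simple_grid` emits only tiles that fit entirely inside the slide, so it does not reproduce `grid_tiles(..., last="keep")`, and summing overlaps assumes the rectangles do not overlap one another.

```python
def simple_grid(extent, tile, stride):
    """Yield (x, y) origins of tiles that fit fully inside the slide (simplified stand-in)."""
    for y in range(0, extent[1] - tile[1] + 1, stride[1]):
        for x in range(0, extent[0] - tile[0] + 1, stride[0]):
            yield x, y


def rect_overlap(a, b):
    """Overlap area of two axis-aligned rectangles given as (x0, y0, x1, y1)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)


def tiling_with_rect_annotations(row, rects):
    """One output row per tile, with coverage by non-overlapping rectangular annotations."""
    tw, th = row["tile_extent_x"], row["tile_extent_y"]
    rows = []
    for x, y in simple_grid(
        (row["extent_x"], row["extent_y"]), (tw, th), (row["stride_x"], row["stride_y"])
    ):
        tile = (x, y, x + tw, y + th)
        covered = sum(rect_overlap(tile, r) for r in rects)
        rows.append({"tile_x": x, "tile_y": y, "path": row["path"], "coverage": covered / (tw * th)})
    return rows


row = {
    "path": "slide.mrxs", "extent_x": 1024, "extent_y": 512,
    "tile_extent_x": 512, "tile_extent_y": 512, "stride_x": 512, "stride_y": 512,
}
rows = tiling_with_rect_annotations(row, [(256, 0, 768, 512)])
```

The annotation straddles the boundary between the two tiles, so each of the two rows gets a coverage of 0.5.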
Step 5: Integrating with the Pipeline
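Apply the annotation coverage function to your slides dataset using Ray Data’s `flat_map`, which lets each slide row expand into many tile rows (`map_batches` also works, but the function must then be rewritten to take and return batches instead of single rows):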
```python
tiles = slides.flat_map(tiling_with_annotations)
```
You can now filter, group, or analyze tiles based on their annotation coverage.
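The one-to-many contract of `flat_map`, plus the kind of filtering and grouping mentioned above, can be illustrated in plain Python. In Ray Data, the filtering step would correspond to something like `tiles.filter(lambda row: row["coverage"] >= 0.5)` and the per-slide average to `tiles.groupby("slide_id").mean("coverage")`. The 0.5 threshold and the `fake_tiling` stand-in (which returns precomputed coverages instead of reading a slide) are illustrative assumptions.

```python
from collections import defaultdict


def flat_map(rows, fn):
    """Plain-Python analogue of flat_map: each input row may produce many output rows."""
    return [out for row in rows for out in fn(row)]


def keep_covered(tiles, threshold=0.5):
    """Keep only tiles whose annotation coverage reaches the (arbitrary) threshold."""
    return [t for t in tiles if t["coverage"] >= threshold]


def mean_coverage_by_slide(tiles):
    """Average coverage per slide_id."""
    sums, counts = defaultdict(float), defaultdict(int)
    for t in tiles:
        sums[t["slide_id"]] += t["coverage"]
        counts[t["slide_id"]] += 1
    return {sid: sums[sid] / counts[sid] for sid in sums}


def fake_tiling(slide):
    """Hypothetical per-slide function returning precomputed tile rows."""
    return [{"slide_id": slide["id"], "coverage": c} for c in slide["coverages"]]


slides = [{"id": "a", "coverages": [0.0, 0.25, 1.0]}, {"id": "b", "coverages": [0.5]}]
tiles = flat_map(slides, fake_tiling)
selected = keep_covered(tiles)
per_slide = mean_coverage_by_slide(tiles)
```

Two slide rows become four tile rows; filtering keeps the two tiles with coverage of at least 0.5.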
Notes and Next Steps
- The `ASAPParser` can be replaced with other parsers (e.g., `GeoJSONParser`) depending on your annotation format.
- You can extend this approach to compute coverage for multiple annotation classes, or other metrics (e.g., distance to nearest annotation).
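A sketch of both extensions follows, assuming you collect annotations per class (for instance, by calling `get_polygons` once per class with a different `part_of_group` pattern). To stay dependency-free it uses axis-aligned rectangles `(x0, y0, x1, y1)` instead of real polygons. `per_class_metrics` and its helpers are hypothetical. Coverage is summed per class, which assumes rectangles within one class do not overlap, and "distance" here means the distance from the tile center to the nearest rectangle of that class (0 if the center lies inside one, infinity if the class has no annotations).

```python
import math


def rect_overlap(a, b):
    """Overlap area of two axis-aligned rectangles given as (x0, y0, x1, y1)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)


def rect_distance(point, rect):
    """Euclidean distance from a point to an axis-aligned rectangle (0 if inside)."""
    x, y = point
    dx = max(rect[0] - x, 0, x - rect[2])
    dy = max(rect[1] - y, 0, y - rect[3])
    return math.hypot(dx, dy)


def per_class_metrics(tile, annotations_by_class):
    """Return coverage_<class> and distance_<class> columns for one tile (x0, y0, x1, y1)."""
    area = (tile[2] - tile[0]) * (tile[3] - tile[1])
    center = ((tile[0] + tile[2]) / 2, (tile[1] + tile[3]) / 2)
    metrics = {}
    for cls, rects in annotations_by_class.items():
        metrics[f"coverage_{cls}"] = sum(rect_overlap(tile, r) for r in rects) / area
        metrics[f"distance_{cls}"] = min((rect_distance(center, r) for r in rects), default=math.inf)
    return metrics


metrics = per_class_metrics(
    (0, 0, 100, 100),
    {
        "tumor": [(0, 0, 50, 100)],
        "stroma": [(50, 0, 100, 50)],
        "necrosis": [(300, 50, 400, 60)],
        "other": [],
    },
)
```

Each class becomes its own pair of columns, so the resulting keys can be merged directly into the tile rows built in Step 4.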
By adding annotation coverage, your pipeline produces richer tile metadata, enabling more targeted downstream workflows such as supervised learning, tissue quantification, or quality control.