Post-Segmentation
After segmentation, the raw masks are refined and converted into polygons to create labels and enable mapping of gene transcripts in newly segmented cells (see Outputs for cells and gene transcripts table). This is done in three main steps:
Masks
Filter masks using
filter_cell_by_regionprops(removes artifacts and irregular cells based on multiple morphological criteria).def filter_cell_by_regionprops( seg_masks, max_eccentricity=None, min_area='median_div2', min_absolute_area=50, max_area=None, min_solidity=None, max_solidity=None, min_extent=None, max_extent=None, min_compactness=None, max_convexity_deficit=0.20, max_perimeter_area_ratio=None, area_std_filter=None, save_debug=False, verbose=True): """Filter segmented cell masks by multiple morphological criteria. Analyzes region properties of segmented masks and filters cells based on shape metrics including area, eccentricity, solidity, extent, compactness, and convexity deficit. Provides detailed statistics and rejection reasons when verbose mode is enabled. Args: seg_masks (np.ndarray): Input segmentation masks with integer labels. max_eccentricity (float, optional): Maximum eccentricity (elongation) threshold. Range: 0-1, where 0=circle, 1=line. Typical: 0.95. min_area (str or float, optional): Minimum area filter. Options: - 'median': Use median(areas) as threshold - 'median_div2': Use median(areas)/2 as threshold (default) - 0-100: Use percentile of area distribution - Absolute value: Use as pixel count threshold min_absolute_area (int, optional): Hard minimum area to remove tiny artifacts regardless of distribution. Defaults to 50. max_area (float, optional): Maximum area threshold in pixels. min_solidity (float, optional): Minimum solidity (area/convex_area). Range: 0-1. Filters highly concave cells. Typical: 0.7-0.9. max_solidity (float, optional): Maximum solidity. Filters perfectly convex cells (unrealistic). Typical: 0.98-0.995. min_extent (float, optional): Minimum extent (area/bounding_box_area). Range: 0-1. Filters sparse cells. Typical: 0.3-0.5. max_extent (float, optional): Maximum extent. Filters perfectly rectangular cells. Typical: 0.9-0.95. min_compactness (float, optional): Minimum compactness (Polsby-Popper). Compactness = 4π * area / perimeter². Range: 0-1, where 1=perfect circle. Filters irregular shapes. Typical: 0.5-0.7 for cells. max_convexity_deficit (float, optional): Maximum convexity deficit. Deficit = 1 - solidity = (convex_area - area) / convex_area. Range: 0-1, where 0=perfectly convex. Filters concave cells like half-moons. Typical: 0.10-0.20. Defaults to 0.20. max_perimeter_area_ratio (float, optional): Maximum perimeter/sqrt(area) ratio. Normalized perimeter measure. Filters cells with excessive perimeter. Typical cells: 3.5-5.0, irregular cells: >6.0. area_std_filter (float, optional): Remove area outliers beyond N standard deviations from mean. E.g., 3.0 removes outliers beyond 3σ. save_debug (bool, optional): Whether to save debug image "cleaned_mask.png". Defaults to False. verbose (bool, optional): Whether to print filtering statistics and rejection breakdown. Defaults to True. Returns: np.ndarray: Cleaned and relabeled mask array with consecutive integer labels starting from 1. Zero indicates background. """
Shape Metrics Explained:
Solidity = area / convex_area
~1.0: Very smooth, convex (circle, ellipse) - may be unrealistic
0.7-0.9: Slightly irregular - typical realistic cells
<0.7: Highly concave/irregular - likely artifacts
Extent = area / bounding_box_area
~1.0: Fills bounding box (square, rectangle)
0.5-0.8: Typical cell shapes
<0.3: Very sparse/thin - likely artifacts
Eccentricity = elongation measure
0: Perfect circle
0.95: Very elongated
>0.95: Extremely elongated - may be artifacts
Compactness = 4π * area / perimeter² (Polsby-Popper)
1.0: Perfect circle
0.6-0.8: Typical cells
<0.5: Irregular/elongated shapes (including half-moons)
Convexity Deficit = 1 - solidity
0: Perfectly convex
0.05-0.15: Slight irregularity (normal cells)
>0.20: Significant concavity (half-moon cells, artifacts)
Perimeter/Area Ratio = perimeter / sqrt(area)
3.5-5.0: Typical cells
>6.0: Irregular perimeter (fragmented, concave cells)
Note: The new metrics (compactness, convexity deficit) are especially effective for removing half-moon/crescent-shaped cells.
Polygons
Convert masks into polygons using
masks_to_polygonsand upscale back to original resolution. Then wrap the polygons into aGeoDataFrameand aShapesModel.@staticmethod def masks_to_polygons(seg_masks, factor_rescale): """Convert segmentation masks into scaled polygons. Extracts contours from each labeled region in the segmentation mask, converts them to Shapely polygons, and rescales coordinates to match the original image resolution. Args: seg_masks (np.ndarray): Labeled segmentation masks where each unique integer represents a different cell. factor_rescale (int): Factor by which to upscale polygon coordinates. E.g., if masks are 4x smaller than original, use factor_rescale=4. Returns: list[shapely.geometry.Polygon]: List of upscaled polygon geometries, one per valid segmented region. Invalid or empty polygons are excluded. Note: - Finds the longest contour for each region - Ensures contours are closed (first point = last point) - Buffers polygons by 0 to fix any self-intersections - If factor_rescale=0, no scaling is applied """ upscale_polygons = [] for region in regionprops(seg_masks): mask = (seg_masks == region.label).astype(int) contours = find_contours(np.array(mask)) if contours: contour = max(contours, key=lambda x: len(x)) # ensure the contour is closed (first point = last point) if not ((contour[0] == contour[-1]).all()): contour = np.vstack([contour, contour[0]]) poly = Polygon(contour[:, [1, 0]]).buffer(0) # only keep polygons that are valid and not empty if poly.is_valid and not poly.is_empty: upscale_polygons.append(poly) if factor_rescale != 0: upscale_polygons = [ scale(poly, xfact=factor_rescale, yfact=factor_rescale, origin=(0, 0)) for poly in upscale_polygons ] return upscale_polygons
def process_masks_to_shapes(self): """Filter masks, convert to polygons, and create shapes model. Applies morphological filtering to segmentation masks, converts filtered masks to polygon geometries, mirrors coordinates, and creates a SpatialData-compatible shapes model with cell IDs. Returns: tuple: Contains three elements: - masks (np.ndarray): Filtered and relabeled segmentation masks. - gdf_polygons (gpd.GeoDataFrame): GeoDataFrame with cell_id and geometry columns. - shapes_model (gpd.GeoDataFrame): SpatialData shapes model with region names and geometries. Raises: RuntimeError: If no valid polygons are extracted from masks after filtering. Note: - Filters masks using filter_cell_by_regionprops() - Converts to polygons and upscales by factor_rescale - Mirrors y-coordinates and shifts to ensure all y >= 0 - Sets self.masks, self.gdf_polygons, and self.shapes_model """ # Filter masks self.masks = self.filter_cell_by_regionprops(self.masks) # Convert masks → polygons polygons = self.masks_to_polygons(self.masks, self.factor_rescale) if not polygons: raise RuntimeError( "No polygons extracted from mask — check your segmentation.") # Create shapes model shapes_df = gpd.GeoDataFrame({ "geometry": polygons, "region": [f"cell_{i+1}" for i in range(len(polygons))] }) shapes_df.set_geometry("geometry", inplace=True) self.shapes_model = ShapesModel.parse(shapes_df) # Apply coordinate transformations self.shapes_model["geometry"] = self.shapes_model["geometry"].apply( self.mirror_y0) miny = self.shapes_model.total_bounds[1] if miny < 0: self.shapes_model["geometry"] = self.shapes_model["geometry"].apply( lambda g: translate(g, xoff=0, yoff=-miny)) self.gdf_polygons = self.shapes_model[["region", "geometry"]].rename( columns={"region": "cell_id"}) return self.masks, self.gdf_polygons, self.shapes_model
Labels
Create labels by resizing masks to the full original resolution and converting to a
Labels2DModelfor downstream analysis.def process_labels(self): """Upscale masks and create Labels2DModel. Upscales the filtered segmentation masks to the original image resolution and creates a SpatialData-compatible Labels2DModel for visualization and further analysis. Returns: tuple: Contains two elements: - masks (np.ndarray): Original filtered masks (not upscaled). - labels_model (sd.models.Labels2DModel): Upscaled labels model with name set to self.label_name. Note: - Uses nearest-neighbor interpolation (order=0) to preserve integer labels - Upscales by factor_rescale to match original image dimensions - Sets self.labels_model attribute with specified label name """ rescale_size = tuple(map(lambda x: x * self.factor_rescale, self.masks.shape)) masks_upscale = resize( self.masks, rescale_size, order=0, preserve_range=True, anti_aliasing=False, ).astype(np.int32) self.labels_model = Labels2DModel.parse( data=np.squeeze(masks_upscale), dims=("y", "x"), ) self.labels_model.name = self.label_name return self.masks, self.labels_model
Coordinate Transformation
The pipeline includes a coordinate mirroring step to align the segmentation coordinates with the spatial data coordinate system:
@staticmethod
def mirror_y0(geom):
"""Mirror a polygon geometry across the x-axis (y=0).
Flips polygon coordinates vertically by negating y-values. Used to
correct coordinate system orientation differences between image and
spatial coordinate systems.
Args:
geom (shapely.geometry.Polygon or other): Input geometry to mirror.
Returns:
shapely.geometry: Mirrored geometry of the same type as input.
If input is not a Polygon, returns input unchanged.
Example:
Point (x, y) becomes (x, -y)
"""
return type(geom)([
(x, -y) for x, y in geom.exterior.coords
]) if geom.geom_type == "Polygon" else geom
Advanced Filtering Examples
Basic Filtering
# Use default settings (median_div2 area, convexity deficit < 0.20)
cleaned_masks = filter_cell_by_regionprops(masks)
Strict Filtering (Remove More Artifacts)
# Remove elongated cells, irregular shapes, and half-moons
cleaned_masks = filter_cell_by_regionprops(
masks,
max_eccentricity=0.90, # Remove very elongated cells
min_compactness=0.55, # Remove irregular shapes
max_convexity_deficit=0.15, # Remove concave/half-moon cells
max_perimeter_area_ratio=5.0, # Remove cells with irregular perimeter
verbose=True
)
Permissive Filtering (Keep More Cells)
# Keep more cells, only remove extreme artifacts
cleaned_masks = filter_cell_by_regionprops(
masks,
min_area='median_div2',
min_absolute_area=30, # Lower minimum size
max_convexity_deficit=0.30, # Allow more concavity
verbose=True
)
Custom Area Filtering
# Use percentile-based area filtering
cleaned_masks = filter_cell_by_regionprops(
masks,
min_area=10, # Keep cells above 10th percentile
max_area=2000, # Remove very large cells
area_std_filter=2.5, # Remove area outliers beyond 2.5σ
verbose=True
)
Debugging
# Save debug image and see detailed statistics
cleaned_masks = filter_cell_by_regionprops(
masks,
save_debug=True,
verbose=True
)
# Creates "cleaned_mask.png" for visual inspection
Understanding Filter Statistics
When verbose=True, the function prints detailed statistics:
Shape Statistics:
Area distribution (range, mean, median, std)
Eccentricity distribution
Compactness distribution (key for identifying irregular shapes)
Solidity and convexity deficit (key for identifying concave cells)
Perimeter/area ratio (key for identifying fragmented cells)
Extent distribution (bounding box filling)
Filtering Results:
Number and percentage of regions that passed
Statistics for valid regions only
Rejection breakdown by category (e.g., “High Convexity Deficit: 45 (12.3%)”)
This helps you understand:
What types of cells are being filtered out
Whether your thresholds are too strict or too lenient
The shape characteristics of your dataset
Performance Notes
Vectorized operations: The filter uses vectorized operations and lookup tables for efficient relabeling
Memory efficient: Processes regions without creating intermediate copies
Optimized for large datasets: Can handle thousands of cells efficiently
Debug mode: Use
save_debug=Trueonly when needed, as it creates additional file I/O