scene_service.ingest.perception_concept_graphs

ConceptGraphs perception — the real one, not just the detection frontend.

This module owns a persistent conceptgraph.slam.slam_classes.MapObjectList and runs the canonical concept-graphs merge pipeline every tick:

    YOLO-World detect (open-vocab via CLIP text encoder)
        │
        ▼
    MobileSAM (bbox-prompted masks)
        │
        ▼
    open_clip ViT-B-32 → per-detection 512-d image feature
        │
        ▼
    detections_to_obj_pcd_and_bbox (depth + masks + cam_K + trans_pose
        → per-detection o3d.PointCloud + OrientedBoundingBox in map frame)
        │
        ▼
    compute_spatial_similarities (3D pcd-overlap M×N matrix)
    compute_visual_similarities (CLIP cosine similarity M×N matrix)
    aggregate_similarities (sim_sum)
        │
        ▼
    merge_detections_to_objects (matched → merge_obj2_into_obj1
        with EMA pose/extent + pcd union; unmatched → new map object)
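The detections_to_obj_pcd_and_bbox step lifts each masked depth region into the map frame. The sketch below shows the underlying pinhole back-projection in NumPy. It is not the concept-graphs implementation: the function name backproject_mask, the 3×3 layout of cam_K, and the reading of trans_pose as a 4×4 camera-to-map transform are assumptions made for illustration.

```python
import numpy as np


def backproject_mask(depth_m, mask, cam_K, cam_to_map):
    """Lift the masked depth pixels of one detection into map-frame 3D points.

    depth_m:    (H, W) depth in metres; 0 or NaN marks invalid pixels.
    mask:       (H, W) boolean instance mask for a single detection.
    cam_K:      (3, 3) pinhole intrinsics (assumed layout).
    cam_to_map: (4, 4) homogeneous camera-to-map transform (assumed convention).
    Returns an (N, 3) array of points in the map frame.
    """
    # Keep only pixels that are inside the mask and carry a usable depth.
    valid = mask & np.isfinite(depth_m) & (depth_m > 0)
    v, u = np.nonzero(valid)  # v = pixel rows, u = pixel columns
    z = depth_m[v, u]

    fx, fy = cam_K[0, 0], cam_K[1, 1]
    cx, cy = cam_K[0, 2], cam_K[1, 2]

    # Standard pinhole back-projection into the camera frame.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy

    # Homogeneous coordinates, then a rigid transform into the map frame.
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)
    return (pts_cam @ cam_to_map.T)[:, :3]
```

The resulting array is the kind of input from which a per-detection point cloud and oriented bounding box can then be built.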

This is NOT the old “Hungarian on class+spatial” code that lived in scene/state/data_assoc.py. That path over-segmented heavily because it couldn’t see visual features and treated cabinet/shelf as separate even when CLIP would say they’re the same physical thing.
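To illustrate why the visual term matters, here is a minimal sketch of matching on the sum of a spatial and a visual similarity matrix. It assumes that sim_sum means a plain elementwise sum, and it uses a hypothetical sim_threshold of 1.2 with a per-detection argmax. The helper names are invented for this example; the real concept-graphs functions take a config object and may weight or threshold the terms differently.

```python
import numpy as np


def cosine_similarity_matrix(det_feats, obj_feats):
    """Return the M×N cosine similarity between detection and object features."""
    det = det_feats / np.linalg.norm(det_feats, axis=1, keepdims=True)
    obj = obj_feats / np.linalg.norm(obj_feats, axis=1, keepdims=True)
    return det @ obj.T


def match_detections(spatial_sim, visual_sim, sim_threshold=1.2):
    """For each detection, return the index of the best map object or None.

    spatial_sim, visual_sim: (M, N) matrices for M detections and N map objects.
    sim_threshold:           hypothetical cut-off on the summed similarity.
    """
    agg = spatial_sim + visual_sim  # assumed meaning of "sim_sum"
    matches = []
    for row in agg:
        if row.size == 0 or row.max() < sim_threshold:
            matches.append(None)  # unmatched: would become a new map object
        else:
            matches.append(int(row.argmax()))  # matched: would be merged
    return matches
```

With this scheme, two detections labelled "cabinet" and "shelf" that overlap the same map object and have near-identical image features both match that object, because class labels never enter the score.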

The persistent MapObjectList is the source of truth. Every tick we project it back into scene’s ObjectRegistry so the existing web UI / MCP API keep working without changes.

Periodic cleanup runs denoise_objects + merge_overlap_objects from concept-graphs to garbage-collect duplicates that escape the per-tick merge.

Classes

ConceptGraphsDetector(*, rgb_fetcher_msg, ...)

Per-frame detector that runs the canonical concept-graphs merge pipeline.

class scene_service.ingest.perception_concept_graphs.ConceptGraphsDetector(*, rgb_fetcher_msg: Callable[[], Any | None], depth_fetcher_msg: Callable[[], Any | None], camera_info_fetcher: Callable[[], _CamIntrinsics | None], chassis_pose_fn: Callable[[], tuple[float, float, float, float] | None], on_detections: Callable[[list[Detection]], Awaitable[None]], registry: ObjectRegistry, world_frame_fn: Callable[[], str] | None = None, period_s: float = 0.6, confidence_threshold: float = 0.3, camera_height_m: float = 1.1, max_detections: int = 30, yolo_weights_path: str | None = None, sam_weights_path: str | None = None, clip_model_name: str | None = None, clip_pretrained: str | None = None, cfg_overrides: dict | None = None, hub: Any = None, camera_frame: str = 'head_front_camera_rgb_optical_frame')

Bases: object

Per-frame detector that runs the canonical concept-graphs merge pipeline. Owns a persistent MapObjectList and projects it into scene’s ObjectRegistry once per tick so the web UI/MCP API stay consistent.

on_detections is kept for backwards compatibility but is no longer called: detections are written to the registry directly via _project_to_registry. The caller still passes registry, which serves as this side channel.

export_3d_snapshot(max_points_per_obj: int = 256) → dict

async start() → None

async stop() → None
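The behaviour of start() and stop() is not documented here. As a rough sketch of how an async start/stop pair can drive a periodic tick with a period_s setting, the hypothetical PeriodicTicker class below uses a background asyncio task. It is an illustration of the general pattern only, not the ConceptGraphsDetector implementation.

```python
import asyncio


class PeriodicTicker:
    """Hypothetical stand-in: runs an async tick callback every period_s seconds."""

    def __init__(self, tick, period_s=0.6):
        self._tick = tick
        self._period_s = period_s
        self._task = None

    async def start(self):
        # Starting twice is a no-op so that only one loop ever runs.
        if self._task is None:
            self._task = asyncio.create_task(self._run())

    async def stop(self):
        # Cancel the background loop and wait until it has actually exited.
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
            self._task = None

    async def _run(self):
        while True:
            await self._tick()
            await asyncio.sleep(self._period_s)
```

Awaiting the cancelled task inside stop() ensures that no tick is still running once stop() returns.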