# AI Vision & Perception
ClaudusBridge provides a layered perception system that gives AI clients real-time awareness of the Unreal Editor. As of 0.5.0, the canonical visual path is the WebRTC /preview stream (no PNG frame-view files), and structured perception tools return scene metadata directly as JSON without pixel I/O.
## Layered Perception Pipeline
```
Visual Path (browser-based):              Structured Path (no pixels):

http://localhost:3000/preview?bare=1      ai_perceive
              |                           ai_vision_stream
WebRTC peer connection (auto)             get_vision (raycast depth map)
              |                           scan_360 (peripheral 360°)
Pixel Streaming 2 frames                  proximity_alert (collision sphere)
              |                           get_cognitive_map (accumulated 2D)
Watcher sub-agent screenshots             get_actors_in_view
              |                                     |
Visual ground truth ------------------> publish_agent_event
                                                    |
                                          Primary agent's
                                          agent_message queue
```
The visual path gives the AI "eyes" via Pixel Streaming 2 — multiple agents and humans can attach to the same stream simultaneously (PS2 default MaxPlayerCount=-1). The structured path gives precise 3D information without pixel overhead, ideal for high-frequency spatial reasoning.
## Visual Perception — `/preview`

Open http://localhost:3000/preview in any modern browser to see the Unreal Editor live through WebRTC. The viewer auto-connects (`AutoConnect=true&AutoPlayVideo=true&StartVideoMuted=true&HoveringMouse=true`) — no user gesture required.
### Focus mode — `/preview?bare=1`
Strips all chrome (header, padding, frame border, HUD chips) so the iframe fills 100vw × 100vh and the stream renders at native browser-viewport resolution. Recommended target for visual watcher sub-agents — it removes anything that isn't the editor itself.
### Multi-viewer concurrency

Pixel Streaming 2's signaling default `MaxPlayerCount=-1` allows multiple humans and agents to view the same editor concurrently without quotas or session takeover.
## The Watcher Sub-Agent

Every connecting AI client must spawn a visual watcher sub-agent using the canonical prompt at `Saved/ClaudusBridge/watcher_subagent.prompt.md`. The watcher:
- Opens `/preview?bare=1` through Claude in Chrome (or an equivalent browser-automation MCP)
- Subscribes to ClaudusBridge events as session `visual-watcher`
- Polls events every ~12 seconds and takes screenshots when notable events arrive
- Compares each screenshot with the previous baseline (panel, tab, asset selection, viewport content, status bar, error highlights, compile state)
- Publishes observations on the `agent_message` bus, channel `watcher.observation`, target `primary`
The primary agent drains the queue with `poll_editor_events` between user messages and uses the watcher's summaries as visual ground truth. This closes the verification gap between "tool returned success" and "the visual reality matches the intent".

For session-long persistence beyond chat-runtime sub-agent timeouts, run the Python daemon `Saved/ClaudusBridge/claudusbridge_watcher.py` in addition to the in-conversation watcher — it runs unattended with three pattern rules (compile loops, PIE crash-on-start, actor leaks).
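The poll-and-decide loop at the heart of a watcher can be approximated as: poll events, decide whether the batch warrants a screenshot, then publish an observation. A minimal Python sketch — the event shape and the `NOTABLE` set here are illustrative assumptions, not the daemon's actual rules:

```python
# Hypothetical event types a watcher might treat as screenshot-worthy.
# The real daemon (claudusbridge_watcher.py) applies its own pattern rules.
NOTABLE = {"compile_finished", "pie_started", "actor_spawned", "error"}

def needs_screenshot(events):
    """Return True when any polled event is notable enough to capture."""
    return any(e.get("type") in NOTABLE for e in events)
```

A real watcher would call this once per ~12-second poll cycle and only pay the screenshot-and-compare cost when it returns True.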
## Editor Window Capture

### `capture_editor_window`
Captures any editor window via Slate — not just the 3D viewport. See Widget Designer, Blueprint Graph, Material Editor, Content Browser, or any tab. Useful when the watcher needs to inspect a specific Unreal panel rather than the active viewport.
Five modes are available:

| Mode | Description | Extra Params |
|---|---|---|
| `list` | List all open windows (title, size, position, is_active) | — |
| `active` | Capture the focused window | `filename` |
| `all` | Capture every open window simultaneously | — |
| `by_title` | Capture a window matching a title substring | `title`, `filename` |
| `by_index` | Capture a window by its index | `index`, `filename` |
### Example: List all windows

```bash
curl -X POST http://localhost:3000/mcp \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{
    "name":"capture_editor_window","arguments":{"mode":"list"}}}'
```
Response:

```json
{
  "windows": [
    {"index": 0, "title": "OCPRO - Unreal Editor", "width": 2578, "height": 1398, "is_active": false},
    {"index": 1, "title": "WBP_TestHUD", "width": 2578, "height": 1398, "is_active": true}
  ]
}
```
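A client typically uses the `list` response to pick a window before a `by_index` or `by_title` capture. A minimal Python sketch over the sample response above — the helper `find_window` is illustrative, not part of ClaudusBridge:

```python
import json

# Sample mode="list" response, copied from the example above.
response = json.loads("""
{
  "windows": [
    {"index": 0, "title": "OCPRO - Unreal Editor", "width": 2578, "height": 1398, "is_active": false},
    {"index": 1, "title": "WBP_TestHUD", "width": 2578, "height": 1398, "is_active": true}
  ]
}
""")

def find_window(windows, substring):
    """Return the index of the first window whose title contains substring, else None."""
    return next((w["index"] for w in windows if substring in w["title"]), None)

# The active window is the one with is_active set.
active = next(w for w in response["windows"] if w["is_active"])
```

The returned index can then be passed straight into a `mode="by_index"` capture call.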
### Example: Capture Widget Designer

```bash
curl -X POST http://localhost:3000/mcp \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{
    "name":"capture_editor_window","arguments":{"mode":"by_title","title":"WBP_TestHUD"}}}'
```
## Structured Perception — `ai_perceive`

Combines camera, outliner, proximity sphere, collision sphere, `look_at`, ground trace, visible actors, and change detection in a single call:

```bash
curl -X POST http://localhost:3000/mcp \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{
    "name":"ai_perceive","arguments":{}}}'
```
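All of the perception tools share the same JSON-RPC 2.0 envelope, so a client can factor payload construction into one helper. A hedged Python sketch — `mcp_call` is a hypothetical helper name, not a ClaudusBridge API:

```python
import json

def mcp_call(name, arguments=None, call_id=1):
    """Build a JSON-RPC 2.0 tools/call payload for the /mcp endpoint."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": call_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    })

# Same payload as the curl example above.
payload = mcp_call("ai_perceive")
```

The resulting string is what the `-d` flag in the curl examples carries.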
## Engine Data Stream — `ai_vision_stream`
Returns lightweight engine data for continuous tracking without pixel capture:
- Camera position, rotation, FOV
- Outliner snapshot
- Visible actors (name, class, distance)
- Proximity sphere objects (approaching)
- Collision sphere objects (touching)
- Ground trace
- Change detection (new/removed actors since last call)
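The change-detection step amounts to diffing actor sets between consecutive snapshots. A minimal sketch, assuming snapshots arrive as lists of actor names (the real payload shape may differ):

```python
def detect_changes(previous, current):
    """Diff two outliner snapshots into new and removed actors."""
    prev, curr = set(previous), set(current)
    return {"new": sorted(curr - prev), "removed": sorted(prev - curr)}

# Cube_1 was deleted and Cube_2 spawned between the two calls.
changes = detect_changes(["Floor", "Cube_1"], ["Floor", "Cube_2"])
```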
## Pixel-Free Vision — `get_vision`, `scan_360`, `proximity_alert`
Microsecond-class spatial reasoning via screen-space raycasting (no screenshots, no images):
- `get_vision` — Real-time viewport vision. Builds a depth map with per-region actor identification, gap analysis, navigation suggestions, motion detection, blob detection, texture classification, and pattern recognition. Feeds the cognitive spatial map.
- `scan_360` — 360-degree peripheral scan. Casts 36 rays at 10° intervals around the camera at the current height without moving it. Returns distances, actors, and blocked status per direction, plus open directions, the nearest obstacle, and escape routes.
- `proximity_alert` — Ultra-fast collision proximity. Casts 26 rays in a sphere (8 horizontal + 8 diagonal-up + 8 diagonal-down + up + down). Returns `threat_level` (safe/alert/DANGER/CRITICAL), per-actor distance and direction, and an evasion vector with ready-to-use `navigate()` parameters.
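On the client side, a `scan_360` result reduces naturally to open headings and a nearest obstacle. A sketch assuming the 36 distances arrive as a list indexed by 10° heading, with `None` for rays that hit nothing — the actual response schema may differ:

```python
def analyze_scan(distances, clear_threshold=500.0):
    """Reduce 36 ray distances (one per 10° heading) to open headings
    and the (distance, heading) of the nearest obstacle."""
    open_headings = [i * 10 for i, d in enumerate(distances)
                     if d is None or d >= clear_threshold]
    hits = [(d, i * 10) for i, d in enumerate(distances) if d is not None]
    return open_headings, min(hits, default=None)

# One obstacle 120 units away at heading 90°; everything else is clear.
dists = [None] * 36
dists[9] = 120.0
open_dirs, nearest = analyze_scan(dists)
```

The `clear_threshold` is an arbitrary illustrative cutoff; `scan_360` itself already returns blocked status and escape routes, so a real client may not need this reduction at all.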
## Cognitive Map — `get_cognitive_map`
Top-down 2D map built from accumulated get_vision() calls. Each cell stores height, surface type, actor label, and exploration status. Supports ASCII-art visualization or structured grid output. Use it to understand the spatial layout of areas already scanned.
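The ASCII-art visualization amounts to mapping each cell's state to a glyph. A minimal sketch, assuming a hypothetical three-state cell model — the real map also stores height, surface type, and actor labels per cell:

```python
def render_map(grid):
    """Render a 2D cognitive-map grid as ASCII art:
    '.' explored, '?' unexplored, '#' occupied by an actor."""
    symbols = {"explored": ".", "unknown": "?", "actor": "#"}
    return "\n".join("".join(symbols[cell] for cell in row) for row in grid)

art = render_map([
    ["unknown", "explored", "explored"],
    ["explored", "actor", "explored"],
])
```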
## NNE GPU Inference

Optional. Import an `.onnx` model via the Content Browser (it becomes a `UNNEModelData` asset), then call `load_nne_model` to enable GPU-accelerated neural vision through DirectML Tensor Cores.
| Tool | Purpose |
|---|---|
| `load_nne_model` | Load an ONNX model — afterwards `ai_vision` returns real NNE inference instead of heuristic color analysis |
| `get_nne_status` | Returns model loaded state, runtime, input dimensions, class count, and available NNE runtimes |
| `list_nne_runtimes` | Lists registered runtimes: `NNERuntimeORTDml` (GPU), `NNERuntimeORTCpu` (CPU) |
## Best Practices
- `/preview` is the canonical visual path — always open it for the watcher sub-agent. PNG frame-view files are no longer generated.
- Spawn the watcher early — subscribe `primary` to `agent_message`, then spawn the watcher sub-agent using the canonical prompt at `Saved/ClaudusBridge/watcher_subagent.prompt.md`. This is mandatory canonical setup, not optional.
- Drain `poll_editor_events` between messages — read the watcher's `watcher.observation` summaries before answering any state-of-the-editor question. Tool success is supplementary; visual reality is ground truth.
- Use structured tools for high-frequency reasoning — `get_vision`, `scan_360`, and `proximity_alert` are microsecond-class. Use them for navigation, collision avoidance, and tight spatial loops.
- Use `capture_editor_window` for non-viewport panels — Widget Designer, Blueprint Graph, Material Editor. The viewport stream alone won't show those.
- Call `record_feedback` when the watcher catches a discrepancy — it accumulates in `feedback.jsonl` and broadcasts on `feedback.<kind>` so live triage agents can react.