Version: 0.5.0

AI Vision & Perception

ClaudusBridge provides a layered perception system that gives AI clients real-time awareness of the Unreal Editor. As of 0.5.0, the canonical visual path is the WebRTC /preview stream (no PNG frame-view files), and structured perception tools return scene metadata directly as JSON without pixel I/O.


Layered Perception Pipeline

Visual Path (browser-based):              Structured Path (no pixels):

http://localhost:3000/preview?bare=1      ai_perceive
        |                                 ai_vision_stream
WebRTC peer connection (auto)             get_vision (raycast depth map)
        |                                 scan_360 (peripheral 360°)
Pixel Streaming 2 frames                  proximity_alert (collision sphere)
        |                                 get_cognitive_map (accumulated 2D)
Watcher sub-agent screenshots             get_actors_in_view
        |                                         |
Visual ground truth --------------------> publish_agent_event
                                                  |
                                          Primary agent's
                                          agent_message queue

The visual path gives the AI "eyes" via Pixel Streaming 2 — multiple agents and humans can attach to the same stream simultaneously (PS2 default MaxPlayerCount=-1). The structured path gives precise 3D information without pixel overhead, ideal for high-frequency spatial reasoning.


Visual Perception — /preview

Open http://localhost:3000/preview in any modern browser to see the Unreal Editor live through WebRTC. The viewer auto-connects (AutoConnect=true&AutoPlayVideo=true&StartVideoMuted=true&HoveringMouse=true) — no user gesture required.
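The auto-connect query string can also be assembled programmatically when a watcher needs to open the viewer itself. A minimal Python sketch; the host and port are assumptions matching the examples on this page:

```python
from urllib.parse import urlencode, urlunparse

def preview_url(host="localhost", port=3000, bare=False):
    """Build the /preview URL with the auto-connect flags listed above.
    host/port default to the values used in this page's examples."""
    params = {
        "AutoConnect": "true",
        "AutoPlayVideo": "true",
        "StartVideoMuted": "true",
        "HoveringMouse": "true",
    }
    if bare:
        params["bare"] = "1"  # focus mode: strips all viewer chrome
    return urlunparse(("http", f"{host}:{port}", "/preview", "", urlencode(params), ""))
```
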

Focus mode — /preview?bare=1

Strips all chrome (header, padding, frame border, HUD chips) so the iframe fills 100vw × 100vh and the stream renders at native browser-viewport resolution. Recommended target for visual watcher sub-agents — it removes anything that isn't the editor itself.

Multi-viewer concurrency

Pixel Streaming 2's signaling default MaxPlayerCount=-1 allows multiple humans and agents to view the same editor concurrently without quotas or session takeover.


The Watcher Sub-Agent

Every connecting AI client must spawn a visual watcher sub-agent using the canonical prompt at Saved/ClaudusBridge/watcher_subagent.prompt.md. The watcher:

  1. Opens /preview?bare=1 through Claude in Chrome (or equivalent browser-automation MCP)
  2. Subscribes to ClaudusBridge events as session visual-watcher
  3. Polls events every ~12 seconds, takes screenshots when notable events arrive
  4. Compares the screenshot with the previous baseline (panel, tab, asset selection, viewport content, status bar, error highlights, compile state)
  5. Publishes observations on the agent_message bus, channel watcher.observation, target primary
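The five steps above reduce to a simple cycle. A sketch of one iteration; the four callables are hypothetical stand-ins for the browser-automation and ClaudusBridge MCP tools, and the real watcher sleeps ~12 seconds between cycles:

```python
def watcher_cycle(baseline, poll_events, take_screenshot, compare, publish):
    """One watcher iteration: poll events, screenshot on activity, diff
    against the previous baseline, publish any observed discrepancy.
    Returns the screenshot to use as the next baseline."""
    events = poll_events()                 # subscribed as session visual-watcher
    if not events:
        return baseline                    # nothing notable: keep old baseline
    current = take_screenshot()            # frame of /preview?bare=1
    diff = compare(baseline, current)      # panels, tabs, selection, errors...
    if diff:
        publish(channel="watcher.observation", target="primary", body=diff)
    return current
```

A long-running watcher would call this in a loop with `time.sleep(12)` between iterations.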

The primary agent drains the queue with poll_editor_events between user messages and uses the watcher's summaries as visual ground truth. This closes the verification gap between "tool returned success" and "the visual reality matches the intent".

For session-long persistence beyond chat-runtime sub-agent timeouts, run the Python daemon Saved/ClaudusBridge/claudusbridge_watcher.py in addition to the in-conversation watcher — it runs unattended with three pattern rules (compile loops, PIE crash-on-start, actor leaks).
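One of the daemon's pattern rules, the compile-loop detector, can be illustrated as a sliding-window counter. The thresholds below are assumptions for illustration, not the daemon's actual values:

```python
from collections import deque

class CompileLoopRule:
    """Illustrative compile-loop rule: fire when too many compile events
    land inside a sliding time window. max_events/window are assumed values."""
    def __init__(self, max_events=5, window_seconds=60.0):
        self.max_events = max_events
        self.window = window_seconds
        self.timestamps = deque()

    def observe(self, event_type, timestamp):
        """Feed one event; returns True when the rule fires."""
        if event_type != "compile":
            return False
        self.timestamps.append(timestamp)
        # Drop events that fell out of the window.
        while self.timestamps and timestamp - self.timestamps[0] > self.window:
            self.timestamps.popleft()
        return len(self.timestamps) >= self.max_events
```
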


Editor Window Capture

capture_editor_window

Captures any editor window via Slate — not just the 3D viewport: the Widget Designer, Blueprint Graph, Material Editor, Content Browser, or any other tab. Useful when the watcher needs to inspect a specific Unreal panel rather than the active viewport.

5 modes available:

| Mode | Description | Extra Params |
|---|---|---|
| list | List all open windows (title, size, position, is_active) | — |
| active | Capture the focused window | filename |
| all | Capture every open window simultaneously | — |
| by_title | Capture a window matching a title substring | title, filename |
| by_index | Capture a window by its index | index, filename |

Example: List all windows

curl -X POST http://localhost:3000/mcp \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{
    "name":"capture_editor_window","arguments":{"mode":"list"}}}'

Response:

{
  "windows": [
    {"index": 0, "title": "OCPRO - Unreal Editor", "width": 2578, "height": 1398, "is_active": false},
    {"index": 1, "title": "WBP_TestHUD", "width": 2578, "height": 1398, "is_active": true}
  ]
}
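A mode="list" response like the one above can be filtered client-side to find the index or exact title for a follow-up capture. A small sketch mirroring the by_title matching:

```python
def find_window(windows, title_substring):
    """Return the first window whose title contains the substring
    (case-insensitive), or None. Operates on the "windows" array of
    a mode="list" response."""
    needle = title_substring.lower()
    for w in windows:
        if needle in w["title"].lower():
            return w
    return None
```
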

Example: Capture Widget Designer

curl -X POST http://localhost:3000/mcp \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{
    "name":"capture_editor_window","arguments":{"mode":"by_title","title":"WBP_TestHUD"}}}'
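Both curl payloads use the same JSON-RPC 2.0 tools/call envelope, so one small client helper covers every tool on this page. A hedged Python sketch; the endpoint is taken from the examples above, and call_tool assumes a running ClaudusBridge server:

```python
import json
import urllib.request

MCP_URL = "http://localhost:3000/mcp"  # endpoint used throughout this page

def build_tools_call(name, arguments, request_id=1):
    """Construct the JSON-RPC 2.0 envelope matching the curl payloads above."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

def call_tool(name, arguments):
    """POST a tools/call request; requires the ClaudusBridge server to be up."""
    req = urllib.request.Request(
        MCP_URL,
        data=json.dumps(build_tools_call(name, arguments)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```
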

Structured Perception — ai_perceive

Combines camera, outliner, proximity sphere, collision sphere, look_at, ground trace, visible actors, and change detection in a single call:

curl -X POST http://localhost:3000/mcp \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{
    "name":"ai_perceive","arguments":{}}}'

Engine Data Stream — ai_vision_stream

Returns lightweight engine data for continuous tracking without pixel capture:

  • Camera position, rotation, FOV
  • Outliner snapshot
  • Visible actors (name, class, distance)
  • Proximity sphere objects (approaching)
  • Collision sphere objects (touching)
  • Ground trace
  • Change detection (new/removed actors since last call)
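The change-detection item can be reproduced client-side by diffing two consecutive visible-actor snapshots. A simplified sketch, keyed by actor name (the tool's own change detection may use richer identity):

```python
def diff_actors(previous, current):
    """Compute new/removed actors between two ai_vision_stream snapshots.
    Each snapshot is a list of dicts with at least a "name" key."""
    prev_names = {a["name"] for a in previous}
    curr_names = {a["name"] for a in current}
    return {
        "new": sorted(curr_names - prev_names),
        "removed": sorted(prev_names - curr_names),
    }
```
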

Pixel-Free Vision — get_vision, scan_360, proximity_alert

Microsecond-class spatial reasoning via screen-space raycasting (no screenshots, no images):

  • get_vision — Real-time viewport vision. Builds a depth map with actor identification per region, gap analysis, navigation suggestion, motion detection, blob detection, texture classification, pattern recognition. Feeds the cognitive spatial map.
  • scan_360 — 360-degree peripheral scan. 36 rays at 10° intervals around the camera at the current height without moving. Returns distances, actors, blocked status per direction, plus open directions, nearest obstacle, escape routes.
  • proximity_alert — Ultra-fast collision proximity. 26 rays in a sphere (8 horizontal + 8 diagonal-up + 8 diagonal-down + up + down). Returns threat_level (safe/alert/DANGER/CRITICAL), per-actor distance and direction, and an evasion vector with ready-to-use navigate() parameters.
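The 26-ray sphere geometry of proximity_alert can be reproduced client-side for debugging or visualization. A sketch; the 45° yaw spacing and the 45° pitch of the diagonal rings are assumptions consistent with the stated ray count, not confirmed internals:

```python
import math

def proximity_ray_directions():
    """Unit vectors for the 26-ray probe described above:
    8 horizontal + 8 diagonal-up + 8 diagonal-down + straight up + down."""
    dirs = []
    for pitch_deg in (0.0, 45.0, -45.0):   # horizontal, diag-up, diag-down rings
        p = math.radians(pitch_deg)
        for i in range(8):                  # 8 rays per ring, 45° apart (assumed)
            yaw = math.radians(i * 45.0)
            dirs.append((math.cos(p) * math.cos(yaw),
                         math.cos(p) * math.sin(yaw),
                         math.sin(p)))
    dirs.append((0.0, 0.0, 1.0))    # straight up
    dirs.append((0.0, 0.0, -1.0))   # straight down
    return dirs
```
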

Cognitive Map — get_cognitive_map

Top-down 2D map built from accumulated get_vision() calls. Each cell stores height, surface type, actor label, and exploration status. Supports ASCII-art visualization or structured grid output. Use it to understand the spatial layout of areas already scanned.
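The ASCII-art output mode can be sketched as a per-cell renderer. The cell schema below (an "explored" flag plus an optional "actor" label) is a simplified assumption about the grid output:

```python
def render_ascii(grid):
    """Render a cognitive-map grid as ASCII art:
    '?' = unexplored cell, '.' = explored floor, 'A' = cell holding an actor."""
    lines = []
    for row in grid:
        chars = []
        for cell in row:
            if not cell.get("explored"):
                chars.append("?")
            elif cell.get("actor"):
                chars.append("A")
            else:
                chars.append(".")
        lines.append("".join(chars))
    return "\n".join(lines)
```
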


NNE GPU Inference

Optional. Import an .onnx model via the Content Browser (it becomes a UNNEModelData asset), then call load_nne_model to enable GPU-accelerated neural vision through DirectML Tensor Cores.

| Tool | Purpose |
|---|---|
| load_nne_model | Load an ONNX model — afterwards ai_vision returns real NNE inference instead of heuristic color analysis |
| get_nne_status | Returns model loaded state, runtime, input dimensions, class count, and available NNE runtimes |
| list_nne_runtimes | Lists registered runtimes: NNERuntimeORTDml (GPU), NNERuntimeORTCpu (CPU) |
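A load_nne_model call follows the same JSON-RPC shape as the earlier curl examples. A sketch of the payload only; the argument name ("asset_path") and the asset path value are hypothetical, so check the tool schema before relying on them:

```python
def load_model_payload(asset_path):
    """Build a tools/call payload for load_nne_model.
    NOTE: the "asset_path" argument name is an assumption for illustration."""
    return {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "load_nne_model",
            "arguments": {"asset_path": asset_path},  # hypothetical parameter
        },
    }
```
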

Best Practices

  1. /preview is the canonical visual path — Always open it for the watcher sub-agent. PNG frame-view files are no longer generated.
  2. Spawn the watcher early — Subscribe primary to agent_message, then spawn the watcher sub-agent using the canonical prompt at Saved/ClaudusBridge/watcher_subagent.prompt.md. This is mandatory canonical setup, not optional.
  3. Drain poll_editor_events between messages — Read the watcher's watcher.observation summaries before answering any state-of-the-editor question. Tool success is supplementary; visual reality is ground truth.
  4. Use structured tools for high-frequency reasoning — get_vision, scan_360, proximity_alert are microsecond-class. Use them for navigation, collision avoidance, and tight spatial loops.
  5. Use capture_editor_window for non-viewport panels — Widget Designer, Blueprint Graph, Material Editor. The viewport stream alone won't show those.
  6. Call record_feedback when the watcher catches a discrepancy — It accumulates in feedback.jsonl and broadcasts on feedback.<kind> so live triage agents can react.
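Since feedback.jsonl is a JSON-Lines file, each record_feedback entry serializes to one JSON object per line. A sketch of that shape; every field name except "kind" is an assumption for illustration:

```python
import json

def feedback_line(kind, summary, expected, observed):
    """Serialize one illustrative feedback record as a JSON-Lines row.
    The broadcast channel for this record would be f"feedback.{kind}"."""
    record = {
        "kind": kind,            # drives the feedback.<kind> channel
        "summary": summary,      # hypothetical field
        "expected": expected,    # hypothetical field
        "observed": observed,    # hypothetical field
    }
    return json.dumps(record)
```

Appending `feedback_line(...) + "\n"` to feedback.jsonl keeps the file one-record-per-line, which is what lets triage agents tail it incrementally.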