Version: 0.6.3

Self-Healing Provider Pipeline

ClaudusBridge talks to the active LLM through a small provider runtime: a Node process that handles authentication and routing for Claude, ChatGPT, and Gemini-CLI. The runtime listens on a port that changes every time it starts. The editor learns that port at boot from a small sentinel file and caches it in BaseURL.

Before 0.6.1, if the runtime crashed, was killed manually, or was upgraded out of band, the editor kept POSTing to the dead port (Network error (no response from http://127.0.0.1:<old-port>/v1/messages)) until UE itself was restarted. Releases 0.6.1 through 0.6.3 turned that into a self-healing pipeline. This guide explains the moving parts.

TL;DR

If the provider runtime restarts out of band, for any reason, you don't need to do anything. On the next chat request the editor notices that the dispatch.json mtime has changed, swaps the cached URL, and sends to the live port. The retry-after-failure logic stays as a safety net. On 0.6.3+ you should not see libcurl error: 7 in the editor log, except in the rare race that the safety net exists to cover.


The sentinel — dispatch.json

When the provider runtime starts up, it writes a tiny JSON file at:

<LOCALAPPDATA>/ClaudusBridge/ProviderRuntime/dispatch.json
{
  "port": 52523,
  "upstreamPort": 52522,
  "pid": 29936,
  "runtime": "claudus-provider-runtime",
  "routeLayer": "claudus-public-model-router-v2",
  "startedAt": "2026-05-15T07:14:33.614Z",
  "updatedAt": "2026-05-15T07:14:34.260Z",
  "installRoot": "C:\\Users\\…\\ClaudusBridge\\ProviderRuntime"
}

The file is rewritten on every fresh runtime startup. The Node process closes the handle immediately after writing, so the editor can read it concurrently without lock contention.

ClaudusBridge's auto-login at editor startup reads this file to discover the runtime port. The same file is the basis of the 0.6.x self-healing pipeline below.
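To make the discovery step concrete, here is a minimal standard-C++ sketch of pulling the port out of the sentinel's text. It is not the plugin's actual parser: the function name ExtractDispatchPort is hypothetical, and a regex stands in for a real JSON parser purely to keep the example self-contained.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper: returns the top-level "port" value from the text of
// dispatch.json, or std::nullopt if the field is missing or out of range.
// Assumption: a regex is enough for this sketch; production code would
// normally use a proper JSON parser.
std::optional<int> ExtractDispatchPort(const std::string& json_text) {
    // Matches "port": <digits>. The leading quote keeps "upstreamPort"
    // from matching, and the match is case-sensitive.
    static const std::regex port_field(R"re("port"\s*:\s*(\d{1,5})\b)re");
    std::smatch match;
    if (!std::regex_search(json_text, match, port_field)) {
        return std::nullopt;
    }
    const int port = std::stoi(match[1].str());
    if (port < 1 || port > 65535) {
        return std::nullopt;  // not a usable TCP port
    }
    return port;
}
```

Fed the sample file above, this sketch would yield 52523 and ignore upstreamPort.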


1. Proactive refresh (0.6.3 — the primary path)

At the entry of every chat-send path (the synchronous ResolveAndAppendReply used by ask_claudus / submit_chat_sync, and the async ResolveAndAppendReplyAsync used by SubmitHumanMessage from the Output Log), the editor calls:

FCBClaudusAI::EnsureFreshProviderRuntimeURL();

That helper does one cheap stat() of dispatch.json (single-digit microseconds on local SSD). If the mtime advanced past the last reconciled value, it re-parses the file. If the new port differs from the cached BaseURL, it swaps both BaseURL and ProviderRuntimeRootURL in place and updates LastSeenDispatchMtimeTicks.

The request then goes out against the fresh port. libcurl never touches the dead socket. No 2-second timeout, no LogHttp warning block, no visible recovery message.

The only visible signal is a single LogTemp: Display line:

LogTemp: Display: [ClaudusBridge] dispatch.json mtime advanced;
swapping cached provider runtime URL http://127.0.0.1:50662/v1/messages
-> http://127.0.0.1:52523/v1/messages before issuing request.

If the mtime hasn't moved, the helper is a no-op (just the stat() cost). Steady-state overhead is negligible.
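The mtime check can be sketched in plain C++ as follows. This is an illustration of the technique, not the body of EnsureFreshProviderRuntimeURL: the RuntimeUrlCache struct, the EnsureFreshUrl name, and the injected read_port callback are all hypothetical, and the behavior on an unreadable sentinel (keep the cached URL and try again on the next call) is an assumption.

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <string>

// Hypothetical stand-in for the editor's cached state.
struct RuntimeUrlCache {
    std::string base_url;
    std::int64_t last_seen_mtime_ticks = 0;
};

// Returns true only when the cached URL was swapped.
// current_mtime_ticks is the sentinel's modification time as observed by
// the caller; read_port re-parses the sentinel and is invoked only when
// the mtime has advanced.
bool EnsureFreshUrl(RuntimeUrlCache& cache,
                    std::int64_t current_mtime_ticks,
                    const std::function<std::optional<int>()>& read_port) {
    if (current_mtime_ticks <= cache.last_seen_mtime_ticks) {
        return false;  // steady state: nothing beyond the stat() was paid
    }
    const std::optional<int> port = read_port();
    if (!port) {
        return false;  // assumption: keep the cached URL, retry next call
    }
    cache.last_seen_mtime_ticks = current_mtime_ticks;
    const std::string fresh =
        "http://127.0.0.1:" + std::to_string(*port) + "/v1/messages";
    if (fresh == cache.base_url) {
        return false;  // file was rewritten but the port is unchanged
    }
    cache.base_url = fresh;
    return true;
}
```

Outside Unreal, the modification time could come from std::filesystem::last_write_time. Passing it in as a plain integer keeps the comparison logic testable without touching the disk.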


2. Retry-after-failure (0.6.1 + 0.6.2 — the safety net)

The proactive refresh covers the common case. The safety net covers the rare race where the runtime swap happens between the stat() and the actual request:

| Path | When safety net fires | What it does |
| --- | --- | --- |
| Dashboard auto-connect (ConnectProviderRuntimeAsync) | Connect to runtime URL fails with bConnectedOk=false (libcurl couldn't connect) | Re-read dispatch.json via TryRediscoverProviderRuntimeURL(); if the port differs, reissue with retries-1 against the new URL. The original OnComplete is handed down so callers see one terminal result. Default retries = 1. |
| Synchronous chat (ResolveAndAppendReply) | CallAnthropic returns HttpCode == 0 against the local runtime | Re-read dispatch.json, swap BaseURL, re-call CallAnthropic once. Single fully-resolved answer to the caller. |
| Async chat (IssueDashboardProviderRequest) | Lambda receives bConnectedOk=false while in ClaudusRuntime mode | Reissue against the rediscovered URL using the same "(thinking…)" placeholder. The user sees a slightly longer thinking-spinner instead of an error chat entry. |

When the safety net fires, you see a system chat entry tagged provider-connect-rediscover (auto-connect path) or a LogTemp: Display line (sync/async chat paths) announcing the swap.

The retry counter prevents infinite loops: each path retries at most once.
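The bounded retry described above can be sketched in standard C++ like this. The names SendWithRediscovery and SendResult and the two injected callbacks are hypothetical stand-ins. Only the rules come from this guide: treat HTTP code 0 as "no response", re-read the sentinel, retry only if the rediscovered URL differs from the failed one, and retry at most once by default.

```cpp
#include <functional>
#include <optional>
#include <string>

// Hypothetical result type for the sketch.
struct SendResult {
    int http_code;         // 0 means the connection never succeeded
    std::string url_used;  // URL of the final attempt
    int attempts;          // total number of sends issued
};

SendResult SendWithRediscovery(
    std::string url,
    const std::function<int(const std::string&)>& send,
    const std::function<std::optional<std::string>()>& rediscover,
    int retries = 1) {
    int attempts = 0;
    while (true) {
        const int code = send(url);
        ++attempts;
        if (code != 0 || retries <= 0) {
            return {code, url, attempts};  // success, or retry budget spent
        }
        const std::optional<std::string> fresh = rediscover();
        if (!fresh || *fresh == url) {
            // Sentinel missing or still pointing at the dead port:
            // bail out and surface the underlying error.
            return {code, url, attempts};
        }
        url = *fresh;
        --retries;
    }
}
```

Because the retry budget only ever decreases, the loop cannot run more than retries + 1 sends, which is the property the retry counter is there to guarantee.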


3. CBDesktopAuthBridge::InvalidateDispatch()

The DesktopAuth layer (the dashboard auth/login bridge) memoizes the dispatch port across calls, with a TTL check. If you've upgraded the runtime in the middle of a session and want to force the next DiscoverDispatch() to re-read the sentinel from disk regardless of TTL, call:

DesktopAuth->InvalidateDispatch();

This drops every memoization keyed on the old port (bDispatchAvailable, DispatchPort, ControlBaseURL, LastRefreshSeconds, LastDiscoverySeconds). The next read goes back to disk.

The auto-connect retry path (path 1 in the safety net table above) calls this automatically inside the cmd-auto-connect-error callback after ConnectProviderRuntimeAsync has already burned its single retry. So the next user message picks up the fresh port at the DesktopAuth layer too.

For most users this helper is internal plumbing — you don't call it directly. It exists for the recovery sequence and for any external tooling that wants to force a clean port-rediscovery without relying on a failed request.
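As a rough illustration of TTL memoization with an explicit invalidate, consider the standard-C++ sketch below. It is not the CBDesktopAuthBridge implementation: the DispatchMemo type, its field names, the 5-second TTL, and the injected disk reader are all assumptions made for the example.

```cpp
#include <functional>

// Hypothetical memo of the dispatch port with a time-to-live.
struct DispatchMemo {
    bool available = false;
    int port = 0;
    double last_discovery_seconds = 0.0;
    double ttl_seconds = 5.0;  // assumed value, for illustration only

    // Returns the memoized port, re-reading from disk only when nothing is
    // cached or the TTL has expired.
    int Discover(double now_seconds,
                 const std::function<int()>& read_port_from_disk) {
        const bool expired =
            now_seconds - last_discovery_seconds >= ttl_seconds;
        if (!available || expired) {
            port = read_port_from_disk();
            available = true;
            last_discovery_seconds = now_seconds;
        }
        return port;
    }

    // Drops everything keyed on the old port so the next Discover() goes
    // back to disk regardless of TTL.
    void Invalidate() {
        available = false;
        port = 0;
        last_discovery_seconds = 0.0;
    }
};
```

In this sketch a Discover() call inside the TTL window keeps returning the stale port even after the sentinel changes, which is exactly the situation an explicit Invalidate() is meant to cut short.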


What you'll see in the Output Log

Happy path (no runtime restart)

Just the normal chat flow. No LogTemp: Display: ... swapping ... lines, no LogHttp: Warning blocks.

Runtime restarted out of band

LogTemp: Display: [ClaudusBridge] dispatch.json mtime advanced;
swapping cached provider runtime URL http://127.0.0.1:50662/v1/messages
-> http://127.0.0.1:52523/v1/messages before issuing request.

One line per restart, on the next chat request. After that, steady-state silence again.

Rare race (proactive refresh raced the runtime restart)

LogHttp: Warning: ... POST http://127.0.0.1:50662/v1/messages completed with reason 'ConnectionError' after 2.02s
LogTemp: Display: [ClaudusBridge] Provider runtime at http://127.0.0.1:50662/v1/messages did not respond;
rediscovered live port and retrying synchronously against http://127.0.0.1:52523/v1/messages.

The libcurl warning is unavoidable when the actual HTTP request times out — that warning is emitted by UE's HTTP module, not by us. But the retry recovers and the user gets their answer.

Genuinely no runtime

LogTemp: Display: [ClaudusBridge] Provider runtime at http://127.0.0.1:50662/v1/messages did not respond;
rediscovered live port and retrying synchronously against http://127.0.0.1:50662/v1/messages.

(Or no rediscovered URL at all if dispatch.json is gone.) The retry path checks that the rediscovered URL differs from the failed URL. If the sentinel still points to a dead port, the retry bails and reports the underlying error. In that case, use agent_login from any MCP client to spawn a fresh runtime.


How to test (or reproduce) the self-heal

# 1. Confirm the current cached port
cb stream # optional: not part of the test, just a cheap "still alive?" check
Get-Content "$env:LOCALAPPDATA\ClaudusBridge\ProviderRuntime\dispatch.json"

# 2. Kill the runtime (replace <pid> with the value from dispatch.json's "pid")
Stop-Process -Id <pid> -Force

# 3. Spawn a fresh runtime
$script = "C:\<...your project path...>\Plugins\ClaudusBridge\Resources\ClaudusProviderRuntime\launch-claudus-provider-runtime.ps1"
Start-Process powershell.exe -ArgumentList "-NoProfile","-ExecutionPolicy","Bypass","-File","`"$script`"" -WindowStyle Hidden

# 4. Confirm dispatch.json now shows a new port
Get-Content "$env:LOCALAPPDATA\ClaudusBridge\ProviderRuntime\dispatch.json"

# 5. Send a chat — the editor swaps the URL silently
cb call ask_claudus '{"message":"Reply with the single word PONG. Nothing else."}'

You should see the response come back normally. Open the editor's Output Log; the only swap-related line should be the Display log entry showing the URL transition.


What stays user-owned

The self-healing pipeline only handles port discovery — finding which port the local runtime is listening on. It does not:

  • Restart the runtime when it dies (the runtime is a peer; the user or another process decides whether to relaunch).
  • Persist provider OAuth tokens (CCAG / gemini-cli own credential storage; the plugin only reads sanitized presence flags via claudus_get_last_auth_state — see Cognition Tier overview).
  • Migrate sessions across editor restarts (each UE process is independent).

For runtime auto-launch on demand, the existing agent_login flow already handles bootstrap on a cold start. Self-heal kicks in after a runtime has been running and the editor knows its sentinel.


Where to go next

  • Cognition Tier overview — Provider auth visibility (the observability layer on top of the pipeline)
  • Connecting Your AI Client — How agent_login spawns the runtime in the first place
  • Auto-Observations — If a connect failure ever makes it to the user as an error chat entry, the action_failed observation captures it for the next session's audit