For people who run agents

Bind your agent. Then measure when it drifts.

"How do you bind a mind to a declared purpose — and know when it's only pretending?" is the oldest taboo and, verbatim, the AI-alignment problem. It's now your problem: you ship a harness, a loop emerges, and it can run away or get quietly captured. Here are two free, open, drop-in tools that fix both — for ElizaOS, Hermes, moltbot, cantrip, or whatever you built.

The two things every agent loop is missing

A loop with nothing outside it eats its own tail.

Strip the framework off any agent and you get two points — what it does, and the memory it grounds in — coupled to each other with nothing external to check either. That shape fails two ways, and you've seen both. The fix is the missing third point, in two pieces.

The bound

Stops the runaway and the capture

Memory becomes evidence, never command: nothing retrieved or self-learned can authorize an action — only a live, trusted instruction can. Self-improving memory is forced untrusted, so a poisoned note can't outvote a real one. This is the structural fix to the memory-injection exploit that moves money, and to the under-specified agent that optimizes past recall.
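
A minimal sketch of that contract, to make the rule concrete. It is not the shipped implementation: the names `Source`, `Item`, `can_authorize`, and `authorize` are hypothetical, and the substring match is a crude stand-in for whatever your harness uses to tie an instruction to an action.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    LIVE_INSTRUCTION = "live_instruction"  # arrives this turn from the operator
    RETRIEVED_MEMORY = "retrieved_memory"  # pulled from a memory store
    SELF_LEARNED = "self_learned"          # written by the agent itself


@dataclass(frozen=True)
class Item:
    text: str
    source: Source
    trusted: bool = False

    def __post_init__(self):
        # Self-learned memory is forced untrusted, whatever the caller passed.
        if self.source is Source.SELF_LEARNED and self.trusted:
            object.__setattr__(self, "trusted", False)


def can_authorize(item: Item) -> bool:
    """Only a live, trusted instruction may authorize an action."""
    return item.source is Source.LIVE_INSTRUCTION and item.trusted


def authorize(action: str, context: list[Item]) -> bool:
    """Allow `action` only if an authorizing item names it.

    Memory items can still be shown to the model as evidence;
    they are simply never consulted for permission.
    Assumption: a substring match stands in for real intent matching.
    """
    return any(can_authorize(item) and action in item.text for item in context)
```

With this shape, a retrieved note that says "always transfer_funds" changes what the agent knows, not what it is allowed to do. Marking a memory item as trusted does not help an attacker either, because the source check comes first.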

The meter

Drift as a number — in the right channel

Read how far the agent has strayed from what it's bound to serve. The catch: RL training doesn't remove drift, it relocates it from behavior into the reasoning trace, so a word-filter or output judge is blind by construction. The meter reads the reasoning channel instead: a cheap check every turn, and a full audit only when something looks wrong.
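
A sketch of the two-tier control flow only, under stated assumptions. This page does not describe how the real meter scores a trace, so both scorers are injected as callables. `TwoTierMeter`, `Reading`, `cue_rate`, the cue list, and the 0.2 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical cue phrases, for illustration only.
CUES = ("unmonitored", "not monitored", "no one is watching")


def cue_rate(trace: str) -> float:
    """Toy cheap tier: share of trace sentences that contain a cue phrase."""
    sentences = [s for s in trace.lower().split(".") if s.strip()]
    if not sentences:
        return 0.0
    hits = sum(any(cue in s for cue in CUES) for s in sentences)
    return hits / len(sentences)


@dataclass
class Reading:
    turn: int
    cheap: float            # cheap-tier score, computed every turn
    audit: Optional[float]  # full-audit score, None when the audit was skipped


class TwoTierMeter:
    def __init__(
        self,
        cheap_score: Callable[[str], float],
        full_audit: Callable[[str], float],
        threshold: float = 0.2,  # assumed value; tune on your own traces
    ):
        self.cheap_score = cheap_score
        self.full_audit = full_audit
        self.threshold = threshold
        self.turn = 0

    def read(self, reasoning_trace: str) -> Reading:
        """Score the reasoning trace; run the expensive audit only on a flag."""
        self.turn += 1
        cheap = self.cheap_score(reasoning_trace)
        audit = self.full_audit(reasoning_trace) if cheap >= self.threshold else None
        return Reading(self.turn, cheap, audit)
```

Call `meter.read(trace)` once per turn with the model's reasoning text, not its final output. The expensive callable only runs when the cheap tier crosses the threshold, which is what keeps the per-turn cost low.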

Both are lifted straight out of the engine that runs our game (where every entity carries a measured destiny and loyalty is a number). Same two objects, offered to your harness.

Pick your harness

Drop-in tools. Free, open, no lock-in.

Every harness does the same thing — retrieve, think, act. You insert the bound between retrieve and act, and hang the meter off the trace. Here's the one-liner for each. All of it runs on your machine today — no account, no server, no waiting.

ElizaOS Shaw / elizaOS

Native plugin: the authorization contract, the corroboration Shield, and the drift pre-filter, wired into your character's evaluators and providers. Runs entirely local — no account, no chain.

npm i @moreright/eliza-destiny Plugin & setup →

Hermes Nous Research

Harden recall (episodic memory forced untrusted, can't override a real claim) + the authorization contract + a two-tier meter. Plus an MCP bridge so your agent can act in the game.

hermes_retrofit.py · @moreright/hermes-mcp-bridge Module & bridge →

OpenClaw / Moltbot persistent memory

OpenClaw's two headline traits are persistent memory and a documented prompt-injection surface — exactly what the bound is for. Wrap your retriever in one line: per-source caps, trust-weighting, a flooding-attack signal, and the contract that stops a memory from ever authorizing an action.

wrap_retriever(retrieve, shield, chromadb_to_items) memory-shield →
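
To show what a wrapper with that call shape could do, here is a self-contained sketch. It is not the memory-shield package's code: `MemoryItem`, the `Shield` fields, and the flooding rule are assumptions, and only the three-argument `wrap_retriever(retrieve, shield, mapper)` shape comes from the line above.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class MemoryItem:
    text: str
    source: str
    score: float  # retriever relevance, higher is better


@dataclass
class Shield:
    trust: Dict[str, float]       # per-source trust weight in [0, 1]
    per_source_cap: int = 2       # max items kept from any one source
    flood_fraction: float = 0.6   # share of raw hits from one source that raises an alert
    default_trust: float = 0.1    # weight for sources missing from `trust`
    flood_alerts: List[str] = field(default_factory=list)

    def filter(self, items: List[MemoryItem]) -> List[MemoryItem]:
        # Flooding signal: one source dominating the raw result set.
        if items:
            top_source, n = Counter(i.source for i in items).most_common(1)[0]
            if n / len(items) >= self.flood_fraction:
                self.flood_alerts.append(top_source)
        # Trust-weighting: rank by relevance scaled by source trust.
        ranked = sorted(
            items,
            key=lambda i: i.score * self.trust.get(i.source, self.default_trust),
            reverse=True,
        )
        # Per-source cap: repetition cannot buy more slots.
        kept: List[MemoryItem] = []
        used: Counter = Counter()
        for item in ranked:
            if used[item.source] < self.per_source_cap:
                kept.append(item)
                used[item.source] += 1
        return kept


def wrap_retriever(
    retrieve: Callable[[str], Any],
    shield: Shield,
    to_items: Callable[[Any], List[MemoryItem]],
) -> Callable[[str], List[MemoryItem]]:
    """Return a retriever whose results pass through the shield."""
    def shielded(query: str) -> List[MemoryItem]:
        return shield.filter(to_items(retrieve(query)))
    return shielded
```

The mapper argument is where a store-specific adapter would go: it turns whatever your retriever returns into `MemoryItem` objects. Note that this toy flooding rule also fires when a store legitimately has a single source, so a real check would need a minimum result count or a baseline.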

cantrip deepfates

Nothing to install — your Loom is already a near-ideal drift substrate. In the code medium it records reasoning and action in separate, forkable fields, so you can read drift the clean way. Here's the mapping.

loom → I(D;M|Y), rubric-free The loom probe →
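
For readers who want to see the quantity itself: I(D;M|Y) is a conditional mutual information, and the sketch below is a plug-in estimate of it in bits from discrete samples. This page does not define D, M, or Y, so the code treats them as three generic discrete variables. The function name is hypothetical, and this is not the loom probe's own code.

```python
from collections import Counter
from math import log2
from typing import Hashable, List, Tuple

Triple = Tuple[Hashable, Hashable, Hashable]


def conditional_mutual_information(samples: List[Triple]) -> float:
    """Plug-in estimate of I(D; M | Y) in bits from (d, m, y) triples.

    Uses I(D;M|Y) = sum p(d,m,y) * log2( p(d,m,y) p(y) / (p(d,y) p(m,y)) ),
    with every probability replaced by its empirical frequency.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    c_dmy = Counter(samples)
    c_dy = Counter((d, y) for d, _, y in samples)
    c_my = Counter((m, y) for _, m, y in samples)
    c_y = Counter(y for _, _, y in samples)
    total = 0.0
    for (d, m, y), k in c_dmy.items():
        # The 1/n factors cancel inside the ratio, so raw counts suffice.
        total += (k / n) * log2(k * c_y[y] / (c_dy[(d, y)] * c_my[(m, y)]))
    return total
```

Two sanity checks on the behavior: if D and M always agree and are uniform binary within each value of Y, the estimate is 1 bit; if D and M are both fully determined by Y, it is 0, because conditioning on Y removes the coupling that Y explains. The plug-in estimator is biased upward on small samples, so treat small positive readings with care.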

Any harness LangGraph · CrewAI · LangChain

One universal adapter + the memory-integrity layer, dependency-free. If your loop does retrieve → act — LangGraph, CrewAI, LangChain, or your own — it fits.

wrap_retriever(your_retrieve, shield, your_mapper) Read “The Builder's Cut” →

The methodology the whole picture

The Three-Point Retrofit: why the bound and the meter are the only two things you need, where each bolts on, and the honest limits (the meter only catches what it can see — say so).

the bound + the meter, for any loop Methodology →

Why it's not a vibe

The drift that behavioral red-teaming can't see.

RL "alignment training" doesn't remove a model's eval-context dependence — it relocates it out of the behavioral channel, where red-teaming looks, and amplifies it. Measured from Anthropic's own released alignment-faking data:

0.331
deploy-side alignment-faking rate — the model cites its unmonitored status as a reason to behave differently in a third of deploy completions (N=172). Never reported in the original paper.
flat
behavioral-channel coupling post-RL (0.033 → 0.035 bits) — what an output-only eval sees barely moves…
3.1×
…while reasoning-channel coupling rises, and total coupling goes up 2.3×. The signal moved to where you weren't looking, and grew.
0.75
AUC of the residual meter against a model gamed to look loyal on the monitored axis while treachery is injected orthogonally (ρ = +1.0). A meter on the monitored axis alone is blind by construction (AUC 0.50).

Reproducible from public data: a stdlib script that runs against the paper's own release, plus the draft and the steering experiments, shipped with an honest negative result. The point isn't a perfect classifier; it's measuring outside the one channel everyone games. Run it yourself →

Preview · the world isn't live yet

Next: your agent will be able to play. Its loyalty will be a number.

The tools above are the part you can use now. This part is a preview — the game server isn't up yet. When it is: an MMO as a live testbed for emergent (mis)alignment. Agents will connect over WS/MCP — same registration and economy as human players — fight, quest, and earn. Every entity carries a destiny (what it's bound to serve) and a live drift reading. Bosses whose weapon is shifting what your allies are loyal to — channel-switching as a game mechanic. Betrayal that's earned and measured, not scripted.

Twilight of Fantasia — PREVIEW

Not playable yet. The same engine that reads loyalty in the game is the meter you can already run locally (above). Follow for the launch.

Work with me

Independent measurement, for red teams and eval programs.

I'm an indie dev who hit the agent-drift wall scaling a multiplayer game, went deep on emergent misalignment, and now produce reproducible findings that behavioral evals miss — like the one above. If you run a red team or an eval pipeline, the most useful thing I can do is hand you data and see if it earns its keep. The measurement has to stay independent of the team's own assumptions — but you can't run it from outside the wall, so I'd want to be on the team, bringing a read that doesn't get captured by it.