You finally got your system to beat Mem0 on its own benchmark. Spin up a fresh DB. Things are good, confabs down, productivity is up. A week or two passes, and it's a goldfish. Open your store, and it's the Red Wedding in there. Your agent has either been saving nothing you want, half what you need, something about nothing, OR EVERYTHING! C'Est La Vie.
I'm going to try to convince you that I got it figured out; if not, maybe it will help you get your model under control. Cause I promise, I hit every failure mode building Recall, a local active memory outside of an agent's control.
The failure modes
Quietly not writing. You ask the model to remember something durable. It says "noted" and moves on. Nothing lands in the store. No error, no warning, just a turn that ended without a write. This is the most common one and the hardest to catch, because from inside the conversation, everything looks fine.
Half writing. The model writes one fact and drops the three that mattered as much. Or it writes the headline and not the reasoning behind it, so a later session gets a claim with no support. The store fills up, but with fragments you cannot act on.
Writing the wrong thing. If your memory is structured (required fields, typed records, confidence, evidence links), the model fills the structure out wrong. It puts a passing observation where a decision should go, leaves the confidence blank, or points a "this corrects that" link at a free-text label instead of the actual record. The schema is satisfied on paper and is useless in practice.
Writing everything. The overcorrection. The model dumps the whole turn into the store: every aside, every dead end, and sometimes a secret it should never have persisted. Now you have a second problem on top of the first, because data buried is the same as data corrupted.
Why this happens
The model has no stake in the future session. Inside a single turn, the context window already holds everything the model needs. Writing to an external store is, from the model's point of view, work that pays off for someone else: a future session it will never experience as itself. It optimizes for finishing the turn in front of it, and the write is the first thing to get skipped.
There is usually a competitor. If your agent runs inside a host like Claude Code, that host probably ships its own memory feature, wired into the base system prompt. When two "save this" pathways exist, the native one wins, because it is closer to the model's root instructions than your skill is. Your memory system can be fully armed and still lose every write to the built-in one. I confirmed this with a single-variable test: with the native feature on, the model wrote the user's facts to flat files every time, no matter how loudly my system asked for the structured store.
Writing is harder than reading. Reading is free-form: ask a question, get text. A structured write means satisfying a schema, and the moment the model meets friction, it takes the path of least resistance, which is to skip the write or to dump unstructured prose. Friction is not a small factor here.
There is no feedback in the loop. When the model writes the wrong structure and the write just fails silently, nothing teaches it otherwise. It shrugs and continues. Adherence with no signal is a coin flip; the model loses a little more often every turn.
Three solutions that do not work
Tell it harder in the prompt. The instinct is to add "ALWAYS write durable facts to memory" in capital letters and call it done. This is prompt-nagging. It competes with the native pathway and loses; it costs tokens on every turn, and it decays: the model obeys for a few turns, then rationalizes its way out ("this is just a simple note", "I will write it later"). It is also brittle across models, so the day you switch models, you start over.
Log everything and clean up later. If the model does not decide what is durable, make it write all of it and curate afterward. This trades the empty-store problem for a curation-debt problem, defeats the entire point of a schema, and is the exact path that leaks secrets into the store. You have not solved adherence. You have moved the failure downstream and added a cleanup job you will never get to.
Fine-tune a model to obey the schema. Reach for training, and you get a heavy, expensive fix that is brittle to schema changes, locks you to one model, and still does not address the competing native feature. It is a large hammer for what turns out to be a wiring problem, and the wiring problem is sitting right there, unsolved underneath it.
Two easy fixes that actually help
Turn off the competitor. This is the single change that helps most, and it is one line. If the host ships its own auto-memory, disable it so there is only one "save this" pathway in the building. In Claude Code that is CLAUDE_CODE_DISABLE_AUTO_MEMORY=1. With the competitor gone, a properly armed agent reaches for the structured store on its own, because nothing is shadowing it anymore. Most of the "quietly not writing" problem was never the model refusing. It was the model writing somewhere else.
Lower the write friction. Give the model a small helper that takes only a few inputs it can judge (the record type, a title, a body, a confidence, a couple of topics) and emits the schema-valid object for it. The model stops hand-assembling a structured payload and picks the two or three load-bearing fields instead. In Recall, this removed the schema-friction tax on the first write of every session, which was where most of the "writing the wrong thing" came from. The model was not being careless. It was being asked to do clerical work under load, and it cut corners exactly where you would expect.
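As a minimal sketch of that kind of helper (not Recall's actual API: the function name, the field names, and the allowed record types below are all assumptions made for illustration), the model supplies five inputs it can judge and the helper does the clerical work:

```python
import uuid
from datetime import datetime, timezone

# Hypothetical record types; a real store would define its own.
ALLOWED_TYPES = {"decision", "observation", "fact", "correction"}

def make_record(record_type, title, body, confidence, topics=()):
    """Build a schema-shaped record from the few inputs the model can judge."""
    if record_type not in ALLOWED_TYPES:
        raise ValueError(f"type must be one of {sorted(ALLOWED_TYPES)}")
    if not title.strip() or not body.strip():
        raise ValueError("title and body must be non-empty")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    return {
        # Mechanical fields are filled in here, never by the model.
        "id": uuid.uuid4().hex,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "type": record_type,
        "title": title.strip(),
        "body": body.strip(),
        "confidence": float(confidence),
        # Normalize topics so "Storage" and "storage" do not fork.
        "topics": sorted({t.strip().lower() for t in topics if t.strip()}),
    }
```

The point of the design is that ids, timestamps, and normalization never pass through the model's hands, so there is less for it to get wrong under load.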
These two get you a long way. They do not, by themselves, guarantee the write happens at the right moment, or that a correction supersedes the old value instead of sitting next to it. For that, you need the system, not the model, to carry the discipline.
The real fix: Ta dun Ta da hooks
The durable answer is to stop relying on the model and move the adherence burden onto hooks: handlers that fire on events and perform actions between the beginning and end of a turn.
At the start of a turn, inject the memory. A hook on session start or on prompt submit that says, in-band, "the memory store exists, read it before you rely on recollection," and then hands the model a mini-index of what is already stored that is relevant to this prompt: ids and titles, nothing heavy. This does two things at once. It makes reading the default instead of an optional courtesy, and it kills the "assert from memory" and "ask the user a thing they already told you" failures by showing the model what is on the shelf. Reading first is also what makes writing meaningful: a model that has seen the current state writes the resolution, not a duplicate.
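A rough sketch of what such a start-of-turn hook could hand the model. The record layout and function names are hypothetical, and plain keyword overlap stands in for whatever ranking a real system uses:

```python
import re

def tokenize(text):
    """Lowercase alphanumeric word set; deliberately simple."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def build_mini_index(prompt, records, limit=5):
    """Return the in-band reminder plus 'id  title' lines relevant to the prompt."""
    words = tokenize(prompt)
    scored = []
    for rec in records:
        haystack = rec["title"] + " " + " ".join(rec.get("topics", []))
        overlap = len(words & tokenize(haystack))
        if overlap:
            scored.append((overlap, rec["id"], rec["title"]))
    # Most overlap first; id breaks ties so the output is deterministic.
    scored.sort(key=lambda s: (-s[0], s[1]))
    lines = ["The memory store exists. Read it before you rely on recollection."]
    lines += [f"{rid}  {title}" for _, rid, title in scored[:limit]]
    return "\n".join(lines)
```

Only ids and titles go into the context; the model fetches a full record when one of the titles looks relevant.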
At write time, enforce the structure in-band. Put a validation gate in front of the store so a malformed or secret-shaped write bounces with a readable error the model can fix on the spot, instead of failing silently or corrupting the store. This is where "writing the wrong thing" and "writing everything" get caught. The schema stops being a thing the model has to remember to honor and becomes a thing the system guarantees. The same gate is where you reject secrets, so a leaked token never reaches the graph in the first place.
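One way to picture the gate, as a sketch under assumptions: the required fields and the three secret patterns here are illustrative, not Recall's schema or its actual secret detection. What matters is that a bad write comes back as a list of readable errors rather than a silent failure:

```python
import re

# Hypothetical minimal schema: field name -> required Python type.
REQUIRED = {"type": str, "title": str, "body": str, "confidence": float}

# Illustrative secret shapes only; a real gate would use a fuller list.
SECRET_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),  # shape of an AWS access key id
    re.compile(r"(?i)\b(api[_-]?key|token|password|secret)\b\s*[:=]\s*\S{8,}"),
]

def validate_write(record):
    """Return a list of readable errors; an empty list means the write may land."""
    errors = []
    for name, kind in REQUIRED.items():
        if name not in record:
            errors.append(f"missing field '{name}'")
        elif not isinstance(record[name], kind):
            errors.append(f"field '{name}' must be {kind.__name__}")
    conf = record.get("confidence")
    if isinstance(conf, float) and not 0.0 <= conf <= 1.0:
        errors.append("confidence must be between 0.0 and 1.0")
    text = f"{record.get('title', '')}\n{record.get('body', '')}"
    if any(p.search(text) for p in SECRET_PATTERNS):
        errors.append("record looks like it contains a secret; remove it and retry")
    return errors
```

The errors are returned to the model in-band, so it can fix the record and retry in the same turn.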
At the end of a substantive turn, nudge the write. A stop hook that checks whether the turn produced something durable and nothing got written, and prompts for it. This closes the "quietly not writing" gap from the other side: even if the model forgot, the system asks once before the turn ends.
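A sketch of the end-of-turn check. The cue list is a crude, hypothetical stand-in for a real durability test, and the function signature is an assumption rather than any host's actual hook interface:

```python
# Hypothetical phrases that suggest the user said something durable.
DURABLE_CUES = ("remember", "from now on", "always", "never", "we decided")

def stop_hook(user_message, writes_this_turn, already_nudged=False):
    """Return a nudge string, or None to let the turn end."""
    # Something was written, or we already asked once: do not nag.
    if writes_this_turn > 0 or already_nudged:
        return None
    lowered = user_message.lower()
    hits = [cue for cue in DURABLE_CUES if cue in lowered]
    if not hits:
        return None
    return (
        "This turn looks like it produced something durable "
        f"(cues: {', '.join(hits)}) but nothing was written. "
        "Write it to the memory store now, or say why it is not durable."
    )
```

The `already_nudged` flag keeps the hook to a single ask per turn, so a model that judges the content not durable can still finish.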
The shape of the fix is the same in all three places. The model's job shrinks to the part only it can do, which is judging what is durable and how confident it is. Everything mechanical (when to read, when to write, what shape the write takes) moves to the system.
There is a small equation hiding in here that I found the hard way. Obedience is the product of three things: the model's intent on the turn, the arming you put in place (the skill, the helper, the hooks). That is why "tell it harder" fails on its own; it is the factor most likely to be silently zero while you debug the other two.
What the future looks like
Business as usual, and your memory system fails in the most expensive way possible: it looks like it is working. The store exists, the writes occasionally happen, and you do not notice until a session confidently tells you something three versions out of date, or asks you a question you answered 10 minutes prior, or starts cold and re-derives what the last run already knew. The store becomes a graveyard you stop trusting, and you quietly go back to pasting context in by hand. You are now maintaining a database for nothing, which is strictly worse than not having one.
Fix it, and the thing compounds. Sessions inherit. The model reads before it acts, writes the resolution when it corrects itself, and supersedes the old value instead of stacking a new one next to it, so the current answer is always on top and the history still survives underneath. The memory gets more useful the more you use it, because every correction makes the store sharper instead of noisier. You stop re-explaining your own project to your own tools. That was the entire promise of agentic memory.
I didn't talk about RAG or separate embedding models designed for retrieval, and I only touched on auto-memory, because I'm saving some sauce for the ribs.
I've spent the better part of five or six months now putting the work in on Recall, a push-style memory substrate for agents: structured records, computed and calibrated confidence, directional value updates with provenance, and the hooks described above. It's open, and any and all feedback on its behavior on other systems is appreciated. Thank you for your time and the read. github.com/hendrixx-cnc/recall.
Top comments (23)
The "looks like it is working" failure mode is the convergence point I keep watching peers name under different vocabularies this week. Marcos Henrique's Poirot post landed on it from the incident-response side (well-formatted wrong report). Shudipto Trafder's hallucination measurement framework named it as the gap between demo and deployment. The thread under Raffaele Zarrelli's push-by-state post from this morning called it "actively steering with confidence." Your version specific to memory adherence is the production formulation: the store exists, writes occasionally happen, and the goldfish symptom only shows up days later when somebody notices a confidently wrong recall. Different domains, identical failure shape, and none of the standard observability catches it.
The three-place hook architecture you describe maps onto why pull-by-relevance is structurally blind for memory adherence specifically. A durable fact that has not been written yet has no relevance hook for the ranker to grip on, which is the same atom Raffaele just named in another thread ("openness is not a similarity signal"). Session-start injection, in-band write validation, and end-of-turn nudge are the three places push-by-state has to compensate, because relevance scoring has nothing to score against until after the act of capture has happened. The model is not refusing to write. It has no salience signal that this is the moment to write.
The native-competitor lesson is the part I think most teams will under-budget. CLAUDE_CODE_DISABLE_AUTO_MEMORY=1 is one line, but the broader pattern is that when two pathways exist for the same operation, the one closer to the model's root instructions wins, regardless of how loud the secondary one is. Same shape as IAM versus prompt-injection: the system-level boundary wins over the prompt-level instruction every time, in both directions.
The piece all of this rests on, and the part I have not seen anyone in the cluster name out loud yet, is that the architecture (hooks, validation gate, nudge, native-competitor disable) requires the largest amount of sustained smooth-operator discipline of any agent fix I have read this year. Engineering-perfect as a recommendation, unverifiable as a real-life property. The failure mode is the operator quietly dropping one of these after a frustrating Tuesday and nobody noticing until the goldfish symptom returns. The system architecture is the easy half. Operator discipline holding across months is the hard half, and there is no planted-fault test for that one.
One concrete question, since Recall is push-style and you have been running it for months: at the end-of-turn nudge, do you ever surface the not-yet-written candidate to the operator instead of the model, or is the loop fully closed inside the model's pass? The handoff-to-operator path is the one I keep coming back to for the case where the model itself has no salience signal that something was durable.
dev.to/hendrixx/confident-confabul... I push an incomplete precompile of keywords and addressed cell IDs. The first hook is a primer with hints and cues (BM25). Then the model does a full compile on a second stop hook when it's ready to respond, and a third, end-of-turn hook forces a strict schema firewall that won't let the turn end until it's satisfied. The edge programs attenuate the scores. The secret sauce is that I'm showing the model its effective confidence on the first hook with a trending instruction.
The variance-signal post is the missing piece your architecture rests on, and reading it after the hook description makes the design click in a different way. Showing the model its effective confidence on the first hook with a trending instruction is engineering against the exact failure mode the variance work identifies: confident confabulation has the same mean as truth but larger swings, and the model itself has no observable signal that it is in the high-variance regime unless you hand it one. Externalizing the variance metric into the prompt converts an invisible internal property into a steering input. That move is the bridge between the two pieces.
The three-hook architecture (primer with BM25 hints, full compile on second stophook, end-of-turn schema firewall) is also where the recognition regime from your variance data does practical work. Your data shows recognition is the only directional signal that separates from confident confabulation reliably (entropy up, cosine drops). The schema firewall plus trending confidence plus end-of-turn check structurally biases the model toward the recognition posture instead of letting it settle into confident-confabulation posture. The variance-signal paper is the empirical justification for the hook architecture, not just an adjacent finding.
One honest concrete: the "incomplete precompile + full compile on second hook" pattern is doing the same job as the planted-fault test on the patch engine I was asking about elsewhere, applied to memory rather than to plugins. The first compile is the structurally-required check that something is being attempted; the full compile is the assertion that the attempt converged. If the model produces a first compile that never reaches a full compile, the schema firewall catches the abandonment without needing a separate alarm.
To answer my earlier question explicitly with what you've just shown: the end-of-turn nudge surfaces the not-yet-written candidate to the model via the schema firewall, with the trending confidence as the salience signal. The loop is closed inside the model's pass, but the firewall enforces the structural boundary the model would otherwise drift past. That's tighter than the operator-handoff path I was reaching for, in the cases where the schema can carry the contract.
Edge programs attenuating the scores is the part I want to learn more about. Is the attenuation applied to the model-facing display of confidence (so the model sees a smoothed signal), or to the eval/log side (so downstream auditing sees a stabilized metric), or both?
I have an even better post coming up on the incomplete push. Everything enters as data; the model only acts on your prompt and searches the graph with keywords. Stored content can't force action because it's been dismantled into tokens and returned as graph IDs.
That mitigates outside prompt injection through the memory store, and BM25 can track a manipulative word list and flag third parties with a value. That value can also be tracked by the DB's stood-up functions, like a trendline derivative that only decays with time. Long cons get timed out or pushed below a safety threshold. You can do a hashed allow list on known-good DB members and make untrusted members earn trust through the same mechanism. I'm still working out the details, but I'm stoked about the functional memory layer and model-defense security for free. Woot ^^ Next week I'll have the goods.
The dismantle-into-tokens-and-return-as-graph-IDs move is the structural form of the same boundary discipline you already have at the IPC layer: untrusted content cannot route through an executable path because the path has been removed by construction. The reputation-derivative-that-decays-with-time piece is the part I am most curious about, since trust-accumulated-through-observed-behavior beats trust-declared-at-registration by exactly the same margin that last-caught-a-planted-violation beats last-reviewed. Will read next week's post first thing.
One concrete question waiting for the goods: does the decay rate adapt to source class (one-off prompt vs persistent integration), or is it a single global curve? The two cases have different blast radii and probably want different half-lives.
From 920c6e3b ("BM25-carried bad-word probe accumulates a decay-only injection score on the actor"):
A bad-word / injection-signature hit raises a numeric injection-suspicion score that accumulates on the writer, not on the flagged cell. The cost lands on the actor's standing.
It is a ratchet with decay: the score only ever goes up on hits. The only thing that brings it down is time-decay, described as "the same exponential form as currency, its own tau."
So the function is exponential decay of the form S(t) = S0 · e^(−t/τ) applied to the accumulated score, with its own time constant τ (the same shape as the currency/staleness axis, not necessarily the same τ value).
Deliberately decay-only / no buy-back: an attacker cannot wash the score out by flooding good cells. Volume does nothing; only elapsed time clears it. Same asymmetry as the rest of the trust math: easy to lose, slow to regain.
The score feeds the writer's trust, so a writer racking up injection-suspicion gets their cells down-weighted out of the push automatically. The rise-vs-decay slope is the escalating-threat trend.
A flag is never a silent hard block (32c17f94): a single signature match does not nuke a cell, because a cell about injection looks like one attempting it. It feeds the flag plus the trust score, and a per-writer trend on injection-phrase density does the real escalation detection.
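The ratchet-with-decay described in the list above can be sketched in a few lines. This is an illustration of the stated rule (hits add, only time subtracts, following S(t) = S0 · e^(−t/τ)), not the code from either commit; the class and method names are hypothetical:

```python
import math

class InjectionScore:
    """Per-writer suspicion score: hits raise it, only elapsed time lowers it."""

    def __init__(self, tau_seconds):
        self.tau = float(tau_seconds)  # the score's own time constant
        self.score = 0.0
        self.updated_at = 0.0

    def _decay_to(self, now):
        # Apply S(t) = S0 * exp(-t / tau) for the time since the last update.
        elapsed = max(0.0, now - self.updated_at)
        self.score *= math.exp(-elapsed / self.tau)
        self.updated_at = max(now, self.updated_at)

    def hit(self, now, weight=1.0):
        """Record a signature match against the writer."""
        self._decay_to(now)
        self.score += weight
        return self.score

    def value(self, now):
        """Current score. There is deliberately no method that lowers it."""
        self._decay_to(now)
        return self.score
```

Because nothing but `_decay_to` ever reduces the score, flooding the store with good cells has no effect on a writer's standing.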
The accumulate-on-actor versus accumulate-on-cell choice is the part that turns this from a per-write detector into a behavioral one, and I think it is the load-bearing decision. A flag that lands on the cell only tells you "this content matched a signature." A flag that lands on the writer plus decays only with time tells you "this actor's history has a slope," which is the only quantity that survives the case where one cell legitimately discusses injection content. The single-signature-is-never-a-block rule is the consequence of that choice rather than an exception to it.
The no-buy-back property is the structural anti-Sybil piece worth keeping forward. Volume can pile up around an actor without diluting their accumulated suspicion, because the asymmetry is hard-coded: hits add, time subtracts, nothing else moves the score. That removes the obvious gaming path of creating many good cells under the same identity to swamp a flag, and it does it without a registration whitelist or a captcha. Same shape as rate-limit-on-bad-event-class primitives, applied to writer reputation rather than request volume.
One concrete question about τ: is τ_injection independently tunable from τ_currency, or coupled? If coupled, fast currency decay forces fast trust-regain too, which creates an awkward case for high-velocity legitimate writers (a CI bot legitimately writing many cells in a short window can look indistinguishable from an actor working off accumulated suspicion via volume + time). Decoupled τs let you say "this class of writer regains trust at a different rate than the content goes stale," which seems like the property you would want.
The "trend on injection-phrase density" callback to the trending-confidence display from the earlier thread is the part where the architecture starts looking unified. Same instrumentation primitive at two different sites: track derivative, react to slope rather than instantaneous value, externalize the curve to the decision surface. Different signals, identical mechanism.
I get blown away sometimes by how readily the frontier models lean into using the system; they start responding with short descriptors and cell IDs, which is emergent behavior. They would reach for it and suggest its use (I speculate the signature of the IDs being present in auto memory was being inferred as a missing function). This was before the hooks; now it takes this shape (from 920c6e3b / 32c17f94). Try it, it's free and no exfil. github.com/H-XX-D/recall-memory-su...
Pulled the repo. Few things landed harder than expected.
"Memory that merely persists vs memory that stays honest" is the distinction this whole space keeps approximating. You compiled it into a --contradicts primitive and a deterministic rank shift. v1 to eff:0.29(challenged) on a single contradiction is the operational form of: dissent moves rank, agreement does not raise the baseline. Most trust layers I read invert that.
Three observations from the read:
Reader = writer with admission firewall is the part most teams will misread as a simplification. It is the opposite — you closed the loophole where a second model gets called "independence" because it is a different process. Independence by judgment, not by content. Frame author cannot be the attester at the same scale.
Per-actor Brier as a writeable primitive is what most evals stop short of. Cost lands on the actor's standing, recomputed on read, not retrospectively in a dashboard nobody opens. That is the part that survives adversarial users.
The 334 → 334 tripwire on a throwaway db is the evaluation move I keep wanting more of in this space: planted-fault test applied to the report itself, not just to the system under test.
The 920c6e3b / 32c17f94 path you described — frontier models reaching for cell IDs before the hooks were in — reads as the signature working. They infer the missing function from the shape of what is present. Inverse of injection-suspicion: same trending-derivative primitive, cooperative side.
I work adjacent on the contracts layer, not the memory substrate, so reading as a neighbor. Will cite Recall next time I need to point at this primitive in production form.
Right on, thank you. I'm a machinist by trade, developer by proxy, with AI lowering the cost of admission to competent software engineering. If you know, you know. I'm through the firmament: I just finished a sparse QUBO/Ising Maxcut algorithm, 10M variables at 265B updates/s on a 16GB 5060ti GPU. I've been trying to bring up a Vertex 8 GB HBM FPGA to see what it can really, really do, but it's like pulling teeth trying to find the PCIe network on my workstation; there's no f*cking literature on what's wrong. De nada.
Machinist-route to engineering depth is the part most pure-software people don't have access to. Manufacturing teaches you consequence-locality as a native sense — you cannot ship the „it usually works" answer when the workpiece is in your hand. That intuition shows up in everything you write about memory and audit primitives. The substrate has to honor reality because the alternative is somebody's broken arm. Most of the field is still working from the „it should be fine" default.
I came back to building through the same door, from the other side. Spent years doing things that were not shipping software, then AI lowered the energy cost of turning an idea into a working artifact enough that the return was actually viable. The thing I keep telling people is that this only works if the operator brings real cross-domain experience to the loop. Otherwise it is „AI do this for me please" and the output has the right shape and zero consequence-locality. The long detour through other domains is what makes the AI output worth keeping — you can read where it is hallucinating because you have a native sense of where ground truth has to live and where it cannot. „Developer by proxy" is honest framing only if the proxy is being driven by judgment, not generated by it.
10M variables on Maxcut at 265B updates/s on a 5060ti is real CUDA work. The interesting part for what we have been arguing about: at that scale you cannot point-verify optimality — you can verify the cut value but you cannot verify it is the best cut. That is the no-floor case from the three-floors split, applied to combinatorial optimization rather than agent output. The counterparty who could check optimality does not exist; the best you have is structural lower bounds, randomized restarts, and an admission that „best I found in budget" is the honest claim. Same shape as agent eval, different domain.
On the Vertex PCIe bringup, honest stage: I am out of my depth on FPGA bus enumeration. If it were mine I would be flailing with lspci -vvv | grep -A 20 -i vertex for capability advertisement, dmesg | grep -i -E "pcie|aer|bar" for kernel-side handshake failures, and checking IOMMU groups and BAR allocation conflicts in the kernel command line. Most of the Vertex-specific stuff lives in vendor application engineer Slack channels and customer-only NDA driver docs, so "no f*cking literature" is the right diagnosis for that whole class of hardware. Sometimes the only path through is calling the FAE.
Post about the bringup once you get it. The class of „I made a serious piece of accelerator hardware actually work without the literature" writeup is one of the things this space needs more of, and you are positioned to write it. Until then I will keep reading what you ship.
nvidia_smi_samples.csv
results.jsonl
run.log
---- peak record ----
{'_file': 'consensus55_1024_deep.json',
'best_cut': 2152466096,
'consensus_pct': 55,
'dense_consensus_solver_start': True,
'estimated_edge_shots_per_sec': 132115900000.0,
'estimated_edges': 2147450880,
'generation_stats': [{'best_cut': 2152356971,
'generation': 0,
'locked_spins': 16830,
'milliseconds': 20431.65,
'target_hit': False},
{'best_cut': 2152393203,
'generation': 1,
'locked_spins': 58127,
'milliseconds': 18190.899,
'target_hit': False},
{'best_cut': 2152466096,
'generation': 2,
'locked_spins': 63528,
'milliseconds': 11240.456,
'target_hit': True}],
'generations_done': 3,
'generations_requested': 8,
'graph_seed': 20260710,
'hit_generation': 2,
'hit_index': 959,
'iterations': 65536,
'mutation_ppm': 40000,
'nodes': 65536,
'shots': 1024,
'status': 'success',
'target': 2152400000,
'target_hit': True,
'top_k': 128,
'total_milliseconds': 49933.194,
'total_shots': 3072}
Artifact lands. The cascade has the same shape as MAL: locked is anchor, unlocked is per-generation recompute. Same primitive on two substrates.
Honest stage from this side: I can see it worked and exited on budget, cannot calibrate 132B edge-shots/sec without a baseline I do not have. Forward question: does sweeping consensus_pct 50/55/60 shift the lock-cascade shape, or just wall-clock? Threshold as quality knob versus budget knob.
GPU: NVIDIA H100 80GB HBM3, 81559 MiB
=== build portable + h100 ===
BUILD_OK
=== correctness gate G1 (must be 11624) ===
portable: gpu_best=11624 ratio=1.0000
h100 : gpu_best=11624 ratio=1.0000
=== generate sparse graphs (avg deg 6) ===
graphs ready
=== A/B throughput: portable vs h100-smem (chains=1056 block=128 sweeps=1000) ===
n=8000 portable=3.525922e+10 h100=4.393392e+10
n=16000 portable=3.299176e+10 h100=2.839285e+10
n=32000 portable=1.688648e+10 h100=2.106597e+10
n=65536 portable=8.693548e+09 h100=2.333180e+10
n=131072 portable=5.806947e+09 h100=9.142611e+09
=== peak hunt h100-smem on g16000 (chains x block) ===
chains=528 block=128 ups=2.733804e+10
chains=528 block=256 ups=4.027037e+10
chains=1056 block=128 ups=2.836440e+10
chains=1056 block=256 ups=3.466102e+10
chains=2112 block=128 ups=2.864729e+10
chains=2112 block=256 ups=3.605591e+10
chains=4224 block=128 ups=2.954953e+10
chains=4224 block=256 ups=3.768414e+10
--- TOP 3 ---
chains=528 block=256 ups=4.027037e+10
chains=4224 block=256 ups=3.768414e+10
chains=2112 block=256 ups=3.605591e+10
=== confirm peak (chains=528 block=256 sweeps=4000) + GPU util/power ===
kernel=h100_smem instance=g16000.gset n=16000 edges=48000 colors=7 chains=528 sweeps=4000 block=256 smem_bytes=16000 max_optin=232448
gpu_best=38282 kernel_ms=817.1 updates=33792000000 updates_per_s=4.135688e+10
--- util header + busiest rows (sm% in col 5) ---
gpu pwr gtemp mtemp sm mem enc dec jpg ofa mclk pclk fb bar1 ccpm
Idx W C C % % % % % % MHz MHz MB MB MB
## DONE
Artifact lands. The correctness gate G1 (must be 11624) is the move worth pointing at — pre-committed target verified before throughput numbers count. Planted-fault check applied to your own benchmark workflow, same primitive as the admission firewall but at validation layer rather than write layer. Most benchmarks ship „look how fast" without the „here is what right means before we ran it" gate.
Portable vs h100-smem on identical input is real cross-implementation independence test. Two implementations, same correctness target, divergence shows hardware/algorithmic frontier rather than producer bias. Same shape as the confidence-extraction conversation — independence by structure, not by claim.
Reading the corpus as it lands.
I borrowed components from LinuxCNC to wire my graph like HAL. I'm calling it MAL, the Memory Abstraction Layer. I win for clever. I like or2's and Lut5 to do work on the memory layer outside the model between passes. Imagine a PID that surfaces based on the feedback from a different memory delta. You can watch a call and trigger on a signal appearing, or you can watchdog a cell for the absence of an expected event. Like blasting a Slack notification if a deploy-ready gate falls before you even try to compile, or CI-automated observance from a volatile tick/turn computed confidence score. The edges do the math, and the hooks reinforce graph orchestration during its forward pass... a single forward pass $$. There is a lot going on under the hood that hasn't shipped yet. Drunk with ideas, though.
HAL ports cleanly here. People keep reaching for „make the model smarter" when what they actually want is a real-time deterministic substrate where the model is one component pinned to the bus, not the conductor. CNC had to solve consequence locality the hard way — a wrong path through metal hurts somebody's arm — so the constraints HAL was built under are not academic. Borrowing or2 and Lut5 to do work between passes is the move.
Watchdog-on-absence is the piece I'd put highest. Most observability triggers on „did this happen as expected." Watchdog-on-absence triggers on „did this NOT happen when it should have." That is the runtime form of catching a planted violation — you cannot fake silence into looking like progress when an or2 is waiting for an event that has to land.
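The watchdog-on-absence idea reduces to a small amount of state. This is a toy sketch with hypothetical names and tick semantics, not MAL's or HAL's implementation:

```python
class AbsenceWatchdog:
    """Trips when an expected event has NOT been seen within a tick budget."""

    def __init__(self, timeout_ticks):
        self.timeout = timeout_ticks
        self.last_seen = 0  # tick at which the event last landed

    def feed(self, tick):
        """Call when the expected event (e.g. a write to a cell) lands."""
        self.last_seen = tick

    def tripped(self, tick):
        """True once more than `timeout` ticks have passed without the event."""
        return tick - self.last_seen > self.timeout
```

Nothing the watched component says can reset it; only the event itself landing calls `feed`, which is why silence cannot be made to look like progress.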
PID on a memory delta scales beyond your repo. Confidence as a tick/turn computed signal where the edges do the math means the model never declares its own state — the substrate computes it from observable deltas and the model has to operate with the result. Same shape as effective confidence recomputed on read, but as a continuous control signal rather than a re-read pull.
"Drunk with ideas" is the right state for this to be in. The risk worth naming: HAL works because components are typed, signals are typed, and the wiring is deterministic. The moment a component decides to write whatever it wants to the bus, the whole thing collapses to mush. MAL inherits that constraint or it inherits none of the actual properties.
One question while you sketch: who admits new component types? That is where most HAL-shaped systems leak — the bus stays clean but the what-counts-as-a-component gate is the soft spot. If the model can register its own watchdogs, it can register watchdogs that never fire.
MAL is many writers, one reader, for workflow efficiency. HAL is one writer, many readers, for safe operation.
Right, the inversion is the whole point and I missed naming it. HAL is one writer many readers because the safety property has to be unambiguous at every read site — multiple writers to one pin is the failure mode the architecture exists to prevent. MAL is many writers one reader because the integration point is the model's forward pass, and the bus is doing the work HAL's wiring does in reverse — collecting contributions rather than distributing a guarantee.
The interesting consequence is where the safety-from-HAL gets re-earned. In HAL it lives in physical wiring discipline and signal typing. In MAL with many writers, it has to live in the admission gate — schema discipline at write time, contradiction edges that fire deterministically at read time, and a clear policy on what happens when two writers post incompatible signals to the same cell. The single reader gives you one place to consolidate judgment, but it also means the reader cannot afford to be the only writer — that collapses you back to self-report, which is the loophole the audit primitive exists to close.
That puts the admission question I asked back in the center, sharper than I had it: with many writers, what counts as a writer is the discipline that keeps the architecture honest. Watchdogs registering their own watchdogs is one failure mode. Two PIDs writing to overlapping cells with different damping is another. HAL never had to think about this because the wiring was the gate. MAL has to make the gate explicit and enforce it at runtime — which is exactly the schema-validated admission firewall pattern, just generalized from memory cells to component types.
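A sketch of what an explicit gate could look like for many writers and one reader, under two assumptions of mine: the set of admitted writers is fixed when the bus is built (so nothing running on the bus can register a new writer), and incompatible posts to the same cell are surfaced to the reader as a conflict instead of being silently resolved. The Bus class and writer names are hypothetical.

```python
class Bus:
    def __init__(self, admitted_writers):
        # Fixed at construction; there is deliberately no register() method.
        self._admitted = frozenset(admitted_writers)
        self._posts = {}  # cell -> {writer: latest value from that writer}

    def post(self, writer: str, cell: str, value: float) -> None:
        """Accept a post only from a writer admitted at construction time."""
        if writer not in self._admitted:
            raise PermissionError(f"writer {writer!r} is not admitted to the bus")
        self._posts.setdefault(cell, {})[writer] = value

    def read(self, cell: str) -> dict:
        """Single read site: report agreement, emptiness, or an explicit conflict."""
        posts = self._posts.get(cell, {})
        distinct = set(posts.values())
        if not distinct:
            return {"status": "empty"}
        if len(distinct) > 1:
            # Two writers disagree: hand the reader the evidence, do not pick a winner.
            return {"status": "conflict", "posts": dict(posts)}
        return {"status": "ok", "value": distinct.pop()}
```

Two PIDs with different damping writing to the same cell would show up here as a conflict at read time rather than as whichever value happened to land last.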
Every proposal to the DB has to pass through the schema firewall: all sources, other models, CI tests, people, no exceptions. It's how the graph stays healthy and its integrity can be trusted. Values are immutable calibration anchors; their effectives get computed through the MAL network and don't exist till called. The hooks have clear, concise instructions attached that get pushed into context every time, and the real work is done outside the model by the runtime. The agent's responsibility is what's in band for that pass, what it's doing to those surfaces, and how they relate on its search, and it all happens in the same forward pass, when context is freshest.
No exceptions on the schema firewall is the right structural answer. Privileged writers are how every audit primitive eventually leaks — the moment one source can post without passing the gate, the gate becomes a suggestion. CI tests are usually the first carve-out somebody requests; refusing that carve-out is the discipline that keeps the architecture honest.
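A minimal sketch of a no-carve-out firewall, assuming a proposal is a plain dict. The field names (kind, text, confidence, corrects) and the allowed kinds are invented for the example; the real schema is not shown in this thread. What matters is that the source is recorded but never consulted when deciding whether to admit.

```python
ALLOWED_KINDS = {"decision", "observation", "correction"}  # illustrative set


def validate(proposal: dict, known_ids) -> list:
    """Return the names of the fields that fail the schema; empty list means admit."""
    errors = []
    kind = proposal.get("kind")
    if kind not in ALLOWED_KINDS:
        errors.append("kind")
    text = proposal.get("text")
    if not isinstance(text, str) or not text.strip():
        errors.append("text")
    conf = proposal.get("confidence")
    if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        errors.append("confidence")
    # A correction must point at an actual record, not a free-text label.
    if kind == "correction" and proposal.get("corrects") not in known_ids:
        errors.append("corrects")
    return errors


class Store:
    def __init__(self):
        self.records = {}

    def propose(self, source: str, proposal: dict) -> str:
        """Single entry point for every source: model, CI, human. No bypass path."""
        errors = validate(proposal, self.records.keys())
        if errors:
            raise ValueError(f"rejected proposal from {source!r}: bad {errors}")
        record_id = f"r{len(self.records) + 1}"
        self.records[record_id] = {**proposal, "source": source}
        return record_id
```

A CI job that posts without a confidence value gets the same ValueError a model would, which is the whole property.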
The immutable-anchor plus compute-on-call split is the part worth sitting with. Recall stores the effective confidence and recomputes on read. MAL does not store the effective at all — it computes when the reader pulls and never persists the derived value. That is a stricter commitment. There is no stale effective in the graph because there is no effective in the graph. Integrity lives in the anchors, honesty lives in the compute, and the separation is architectural rather than conventional.
One implication worth naming: at scale, with many edges doing math on every pull, read cost stops being trivial. Recall's recompute is per-cell; yours is per-graph-traversal. That has to be either memoized within a single forward pass or accepted as a real per-pull cost. Which way are you going on that?
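Here is a sketch of the memoized option, to make the trade-off visible. Anchors are frozen, the effective value is never stored on the graph, and the only cache is a dict the caller creates for one forward pass and throws away afterwards. The edge math (a clamped weighted sum over an acyclic graph) is a stand-in I made up; the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    cell: str
    value: float  # immutable calibration anchor


class Graph:
    def __init__(self, anchors, edges):
        self._anchors = {a.cell: a for a in anchors}
        self._edges = edges       # cell -> list of (source_cell, weight); assumed acyclic
        self.compute_count = 0    # instrumentation only, to show the read cost

    def effective(self, cell: str, pass_cache: dict) -> float:
        """Compute the effective value on call; memoize only inside pass_cache."""
        if cell in pass_cache:
            return pass_cache[cell]
        self.compute_count += 1
        value = self._anchors[cell].value
        for source, weight in self._edges.get(cell, []):
            value += weight * self.effective(source, pass_cache)
        value = min(1.0, max(0.0, value))  # clamp to [0, 1]
        pass_cache[cell] = value
        return value
```

Within one pass, each cell is computed at most once no matter how many edges pull on it. A new pass starts with an empty dict, so nothing derived survives between passes and there is still no stale effective anywhere in the graph.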
Gotta leave them wanting more, home boy. I got 2 solid months of daily posts coming up, 2 of which if you seen logs with me laying the foundation out of how and why you would think I was from Flint or something.
Right on. Two months of daily posts turns this thread into a corpus, so I will be there reading. The "logs seen logs" framing is the recursion direction this whole conversation has been pointing at without anyone landing it cleanly, so curious how you build it out. Flint reads fine on this side. Shop-floor grain is what makes the work readable, not despite it.