// HOME EQUITY · FIELD REPORT

Anatomy of an HEI eligibility engine: four investor overlays, zero LLMs in the decision

Field report from a multi-program HEI operator. Four investor overlays plus a sale-leaseback product, all driven by one admin-editable settings table. No LLM in the eligibility decision path.

Advisory Labs 9 min read

This is a field report. Names are abstracted; the work is real.

The same multi-program home equity operator covered in the previous field report runs four HEI investor overlays plus a proprietary sale-leaseback product. The first eligibility module went up fourteen weeks ago. Last week the multi-investor admin tabs shipped, which retires the per-program calculator drift problem for good.

The interesting thing isn’t the timeline. It’s what’s not in the decision graph.

The eligibility engine has zero LLM calls inside the yes/no path. Every decision is deterministic Zod-validated logic reading from an admin-editable settings table. LLMs handle lead-data enrichment upstream and voice-call QA downstream. They never touch the decision itself.

I’m writing this because the AI-eligibility-engine market is filling up with prompt-template demos. HEI operators reading vendor pitches in 2026 are being shown LLMs sitting inside the underwriting decision. That isn’t an eligibility engine. It’s a non-deterministic chatbot wearing the costume of one.

What does the architecture look like?

The current state of the engine, in one sentence: per-program eligibility rules live in an admin-editable settings table, the calculators are deterministic Zod-validated code, and every saved value writes an audit row.

The shape of the data model matters. Properties, homeowners, deals, liens, and underwriting snapshots are first-class normalized tables, not JSONB columns inside a single deal record. Offers are versioned as native rows with explicit version numbers and a reason captured for each revision. Liens (first mortgage, second mortgage, HELOC, judgment) are separate rows on a deal, not an unstructured array. Underwriting snapshots are point-in-time immutable records: a decision made on March 12 is still queryable in December against the rules that were live on March 12.
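A minimal sketch of what "liens as rows" and "offers as versioned rows" can look like in code. Every type name, field name, and helper here is hypothetical and chosen for illustration; none of it is the operator's actual schema.

```typescript
// Hypothetical row shapes; names are illustrative, not the production schema.
type LienKind = "first_mortgage" | "second_mortgage" | "heloc" | "judgment";

interface Lien {
  id: string;
  dealId: string; // foreign key to the deal, one row per lien
  kind: LienKind;
  balanceCents: number;
}

interface OfferVersion {
  dealId: string;
  version: number; // explicit version number, increasing per deal
  proceedsCents: number;
  reason: string; // why this revision exists
}

// Append a new offer version instead of mutating the previous one.
function reviseOffer(
  history: readonly OfferVersion[],
  dealId: string,
  proceedsCents: number,
  reason: string,
): OfferVersion[] {
  const latest = history
    .filter((o) => o.dealId === dealId)
    .reduce((max, o) => Math.max(max, o.version), 0);
  return [...history, { dealId, version: latest + 1, proceedsCents, reason }];
}

// Total lien balance is computed from typed rows, not from an untyped array in a JSON blob.
function totalLienBalance(liens: readonly Lien[], dealId: string): number {
  return liens
    .filter((l) => l.dealId === dealId)
    .reduce((sum, l) => sum + l.balanceCents, 0);
}
```

Because each revision is a new row, the earlier offer and the reason it was superseded stay queryable.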

This took a deliberate choice early. The pull toward “just throw it in a JSONB column” is strong at every Seed-stage shop. The pull is also wrong. JSONB columns are where eligibility logic goes to die: schema drift across programs, no foreign keys, no indexes that help, no audit trail a compliance reader can actually follow.

About 373 database migrations have shipped in the four months since the engine started. Roughly 13 touched the settings, rules, or program layer directly. The rest are everything else: admin UI iterations, lead-card surfaces, voice-agent tool integrations, the partner-facing wizard. The point isn’t the migration count. The point is the layering: rule changes are isolated to a small number of migrations against a stable contract, not scattered across the entire schema.

How do four investor overlays run on one decision graph?

Each investor overlay evaluates independently against the same applicant snapshot. The engine returns all per-investor verdicts plus their reasons in a single response. The operator sees a card showing each outcome side by side: investor A yes, investor B no with the specific reason (ineligible ownership type), investor C yes, first-party HEI yes.
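The independent-evaluation idea can be sketched as follows. The overlay fields, reason codes, and function names are assumptions made for this example; the real overlays carry more rules than the two shown.

```typescript
// Hypothetical shapes: overlay fields and reason codes are illustrative.
interface ApplicantSnapshot {
  cltvPct: number;
  ownershipType: string;
}

interface InvestorOverlay {
  investor: string;
  maxCltvPct: number;
  ineligibleOwnershipTypes: string[];
}

interface Verdict {
  investor: string;
  eligible: boolean;
  reasons: string[]; // empty when eligible
}

// Evaluate one overlay; collect every failing reason rather than stopping at the first.
function evaluateOverlay(a: ApplicantSnapshot, o: InvestorOverlay): Verdict {
  const reasons: string[] = [];
  if (a.cltvPct > o.maxCltvPct) reasons.push("cltv_above_cap");
  if (o.ineligibleOwnershipTypes.some((t) => t === a.ownershipType)) {
    reasons.push("ineligible_ownership_type");
  }
  return { investor: o.investor, eligible: reasons.length === 0, reasons };
}

// Return all verdicts side by side; nothing is collapsed into one approved/declined flag.
function evaluateAll(a: ApplicantSnapshot, overlays: InvestorOverlay[]): Verdict[] {
  return overlays.map((o) => evaluateOverlay(a, o));
}
```

The return type is a list on purpose: the caller gets one verdict per investor and decides what to do with them.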

Routing is a downstream decision the operator makes. The engine doesn’t collapse the verdicts prematurely into one approved/declined output, because the four investors price differently, fund differently, and care about different signals. Collapsing them into a single binary throws away the information the operator actually uses to route the deal.

The same engine runs the proprietary sale-leaseback product, with one wrinkle: a state-specific LTV override. The national cap is 70%. Arizona is 75%. Both numbers live in the admin settings table, both are versioned, both are audit-logged. When an operator edits either, the change applies on the next request — not on the next deploy, not after a cache window expires, not after someone remembers to flush a denormalized view.
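A sketch of the override lookup, assuming a settings row shaped like the one below. The 70% and 75% figures are from the text; the field names, the `LoadSettings` stand-in for the settings-table read, and the "LTV at or below the cap passes" comparison are assumptions.

```typescript
// Hypothetical settings row for the sale-leaseback product.
interface LtvSettings {
  nationalCapPct: number;
  stateOverridesPct: Record<string, number>;
}

// Stand-in for a per-request read of the admin settings table.
type LoadSettings = () => LtvSettings;

// A state override wins when present; otherwise the national cap applies.
function ltvCapFor(state: string, settings: LtvSettings): number {
  const override = settings.stateOverridesPct[state];
  return typeof override === "number" ? override : settings.nationalCapPct;
}

// Settings are loaded inside the call, so an admin edit applies on the next request.
function passesLtv(state: string, ltvPct: number, load: LoadSettings): boolean {
  return ltvPct <= ltvCapFor(state, load());
}
```

With `{ nationalCapPct: 70, stateOverridesPct: { AZ: 75 } }`, a 73% LTV passes in Arizona and fails in any state without an override. Nothing is cached between calls, which is what makes "next request, not next deploy" hold in this sketch.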

Lending referrals are a first-class outcome of the engine too. A lead that fails every investor overlay and the sale-leaseback path can still route to a lending partner if the partner’s criteria are met. The “decline” output of an HEI eligibility engine should never be a dead end. It’s a routing decision, and routing decisions are decisions worth instrumenting.
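One way to model "decline is a routing decision" is a disposition type where a lending referral is its own case. This is a sketch under assumptions: the type names and the `lendingPartnerMatches` input are hypothetical, and how partner criteria are matched is outside the example.

```typescript
// Hypothetical summary of what each path returned for one lead.
interface PathResults {
  investorVerdicts: { investor: string; eligible: boolean }[];
  saleLeasebackEligible: boolean;
  lendingPartnerMatches: string[]; // partners whose criteria the lead meets
}

type Disposition =
  | { kind: "eligible"; investors: string[]; saleLeaseback: boolean }
  | { kind: "lending_referral"; partners: string[] }
  | { kind: "declined" };

// A lead that fails every overlay and the sale-leaseback path can still be referred.
function disposition(r: PathResults): Disposition {
  const investors = r.investorVerdicts
    .filter((v) => v.eligible)
    .map((v) => v.investor);
  if (investors.length > 0 || r.saleLeasebackEligible) {
    return { kind: "eligible", investors, saleLeaseback: r.saleLeasebackEligible };
  }
  if (r.lendingPartnerMatches.length > 0) {
    return { kind: "lending_referral", partners: r.lendingPartnerMatches };
  }
  return { kind: "declined" };
}
```

Making the referral a named case means it can be counted and audited like any other outcome.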

Why is there no LLM in the decision graph?

Because the eligibility decision needs three properties LLMs cannot guarantee: auditability, replayability, and version control. Every yes/no must trace to a specific settings-table row at a specific timestamp. The same inputs must produce the same verdict on Monday and Friday. A decision made March 12 must be re-runnable in December against the rules that were live then. A deterministic engine reading from a versioned table gives you all three for free; an LLM-in-the-loop gives you zero.

LLMs are great at adjacent work. They parse unstructured documents. They normalize inbound data — a lead form that arrived as a free-text email, a voice transcript that needs to become structured fields. They handle qualitative scoring of conversations after the fact. All useful. None of it belongs inside a $300,000 capital deployment decision.

To restate: the eligibility decision needs three properties that LLMs cannot guarantee. First, it needs to be auditable: every yes or no must trace to a specific row in a specific settings table at a specific timestamp, with the exact reasoning preserved. Second, it needs to be replayable: the same inputs and the same settings must produce the same verdict on Monday and on Friday. Third, it needs to be version-controllable: the rules that were live on March 12 must remain queryable in December, and a decision made then must be re-runnable now against the same rules.
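Replayability is easiest to see in code. The sketch below assumes settings edits are stored as append-only rows with an effective timestamp; the row shapes, the single LTV rule, and the dates are illustrative assumptions, not the production tables.

```typescript
// Hypothetical versioned settings rows: each admin edit inserts a new row.
interface SettingsVersion {
  program: string;
  effectiveAt: string; // ISO 8601 UTC timestamp of the edit
  maxLtvPct: number;
}

interface DecisionRecord {
  program: string;
  decidedAt: string; // ISO 8601 UTC
  ltvPct: number;
  eligible: boolean; // verdict recorded at decision time
}

// Pick the settings row that was live for a program at a given instant.
function settingsAsOf(
  rows: readonly SettingsVersion[],
  program: string,
  at: string,
): SettingsVersion {
  const live = rows
    .filter((r) => r.program === program && Date.parse(r.effectiveAt) <= Date.parse(at))
    .sort((a, b) => Date.parse(b.effectiveAt) - Date.parse(a.effectiveAt));
  const latest = live[0];
  if (latest === undefined) {
    throw new Error(`no settings live for ${program} at ${at}`);
  }
  return latest;
}

// Re-run a past decision against the rules that were live when it was made.
function replay(rows: readonly SettingsVersion[], d: DecisionRecord): boolean {
  return d.ltvPct <= settingsAsOf(rows, d.program, d.decidedAt).maxLtvPct;
}
```

If a cap is tightened in June, replaying a March decision still reads the March row, so the replayed verdict matches the recorded one.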

A deterministic engine reading from a versioned settings table gives you all three for free. An LLM-in-the-loop gives you none of them. You can prompt-engineer an LLM to be “consistent” until the model auto-updates and the consistency drifts silently. We caught this in the voice-agent QA path early in the project: after a silent provider-side update, the model started rating transcripts artificially harshly. The fix was to pin the model version (per Anthropic’s documented model-versioning policy) and refuse the auto-update channel.

Inside the decision graph, then: pure Zod-validated TypeScript. Settings rows in, structured verdict out, audit row written on every flip.

Outside the decision graph: LLMs do the work they are actually good at. An enrichment edge function uses a small Claude model to normalize inbound lead data into the structured shape eligibility expects. A transcript scorer uses an LLM to grade voice-agent conversations against the script after the call ends. Both run before and after the decision, never inside it.

What broke in production after the multi-investor refactor shipped?

Four production fixes landed in a twelve-day window after the per-investor admin tabs went live in mid-May. All four were caught by the architecture’s audit-row discipline within hours of going live. All four would have been silent failures on a prompt-template stack: no audit row, no reproducible test, no fast root-cause loop. The fix shape was the same in every case: a unit test against the failing predicate, a code change scoped to the smallest surface that owns the bug, a settings-audit row showing the corrected behavior on the next applicant evaluation.

Stale eligibility flags after verified valuation saves. When an internal operator saved a verified home value on a lead, the derived eligibility flags on the lead row weren’t recomputing. The lead card showed the previous verdict. Operators were quoting prospects from a row that didn’t reflect the most recent valuation. The fix was a scoped recompute helper inside the PATCH route, awaited, with a predicate that only fires on verified writes (owner-reported and estimated edits are deliberately excluded; those don’t move the decision). Every eligibility flag flip now writes an audit row. Twenty-two unit tests cover the predicate behavior.
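The two pieces of that fix, a predicate that only fires on verified writes and an audit row per flag flip, can be sketched like this. The source names, flag names, and row shape are assumptions; the real helper lives inside the PATCH route and is not shown here.

```typescript
// Hypothetical valuation sources; only "verified" moves the decision, per the text.
type ValuationSource = "verified" | "owner_reported" | "estimated";

interface AuditRow {
  leadId: string;
  flag: string;
  from: boolean;
  to: boolean;
  at: string; // ISO 8601 timestamp of the recompute
}

// Recompute eligibility only when a verified home value was saved.
function shouldRecompute(source: ValuationSource): boolean {
  return source === "verified";
}

// Compare old and new flags and emit one audit row per flipped flag.
// A flag missing from either side is treated as false.
function auditFlips(
  leadId: string,
  before: Record<string, boolean>,
  after: Record<string, boolean>,
  at: string,
): AuditRow[] {
  const rows: AuditRow[] = [];
  for (const flag of Object.keys(after)) {
    const from = before[flag] === true;
    const to = after[flag] === true;
    if (from !== to) rows.push({ leadId, flag, from, to, at });
  }
  return rows;
}
```

Unchanged flags produce no rows, so the audit table records flips rather than every recompute.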

Admin LTV cap wasn’t actually the source of truth. The admin Settings UI exposed an LTV cap as an editable knob. Three different code paths (a primary calculator, a shared Deno module, and an edge function) each carried their own constants and branching. Editing the admin row changed the value in one place out of four. The fix was threading the admin row through every call site and updating the rules-architecture doc so the next centralization PR can collapse the remaining duplication. About 210 lines across 8 files.

Investor proceeds cap silently bypassed in the partner wizard. The admin-editable per-investor proceeds cap wasn’t being applied in the partner-facing proceeds-range formula. Partners and homeowners were seeing proceeds numbers above the configured ceiling. The fix was applying the cap at the formula level rather than downstream. The error surface on admin-row fetch failures was also rebuilt to loud-fail to a 500 instead of falling through to a stale calculation. Silent fallthrough is the worst case in a system whose entire premise is “the admin row is the source of truth.”
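A sketch of "cap at the formula level, fail loudly on a missing row". The 10% and 30% range fractions are placeholders invented for this example, not the operator's formula, and the function and field names are hypothetical.

```typescript
// Hypothetical per-investor settings row.
interface InvestorSettings {
  proceedsCapCents: number;
}

// Placeholder formula: a proceeds range as fractions of available equity.
// The fractions are illustrative; the point is where the cap is applied.
function proceedsRange(
  equityCents: number,
  settings: InvestorSettings | null,
): { lowCents: number; highCents: number } {
  // Fail loudly when the admin row could not be fetched; never fall back to a stale constant.
  if (settings === null) {
    throw new Error("investor settings row unavailable");
  }
  const cap = settings.proceedsCapCents;
  // Clamp both ends inside the formula so no caller can display an uncapped number.
  const lowCents = Math.min(Math.round(equityCents * 0.1), cap);
  const highCents = Math.min(Math.round(equityCents * 0.3), cap);
  return { lowCents, highCents };
}
```

In a route handler, the thrown error would surface as a 500 rather than a quote computed from defaults.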

Empty admin array crashed every request. Production had one program’s ineligible_ownership_types set to an empty array, an intentional admin choice to allow LLC ownership for that program. The Zod parser in both the Next and Deno runtimes treated empty arrays as a parse failure and threw. A planned ship would have 500’d every eligibility-touching request. The fix was a spec amendment: distinguish a missing required key (still throws) from an empty array (now accepted as intentional admin config). Six tests rewritten across the two parsers.
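The amended rule, missing key throws and empty array passes, looks roughly like this. The production parsers are Zod schemas; this sketch uses hand-rolled checks so it runs without dependencies, and the function name and error messages are assumptions.

```typescript
interface ProgramRules {
  ineligible_ownership_types: string[];
}

// Hand-rolled stand-in for the schema check described above.
function parseProgramRules(raw: unknown): ProgramRules {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("program rules must be an object");
  }
  const value = (raw as Record<string, unknown>)["ineligible_ownership_types"];
  // Missing required key: still a hard failure.
  if (value === undefined) {
    throw new Error("missing required key: ineligible_ownership_types");
  }
  if (!Array.isArray(value) || !value.every((v) => typeof v === "string")) {
    throw new Error("ineligible_ownership_types must be an array of strings");
  }
  // An empty array is valid config: the admin has excluded no ownership types.
  return { ineligible_ownership_types: value as string[] };
}
```

For reference, a plain `z.array(z.string())` in Zod accepts an empty array; it is rejected only when the schema adds a constraint such as `.nonempty()` or `.min(1)`.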

Each of these traces to a specific commit, a specific test, a specific audit log entry. Each is reproducible. Each is the kind of incident a prompt-template stack would have lost in a model temperature variation or a context-window truncation, with no audit trail to find it later.

How should HEI operators evaluate a vendor’s “AI eligibility engine”?

Three diagnostic questions separate a real eligibility engine from a prompt-template chatbot wearing the costume of one. Each surfaces a property that’s invisible in a demo but load-bearing in production.

The three diagnostic questions

| Property | Question to ask | Prompt-template vendor’s answer (red flag) | Real deterministic engine’s answer |
| --- | --- | --- | --- |
| Replayable audit | “Show me the audit row for a decision made 90 days ago: the actual row, not a sample.” | “We can probably reconstruct it from the model logs.” | One SQL query against decisions_audit × settings_audit returns the exact verdict, the exact rules that were live then, the inputs hashed, the operator who edited each rule. |
| Multi-investor visibility | “When investor A says yes and investor B says no on the same applicant, what does the engine return?” | A single approved/declined verdict (information loss), or “the LLM picks one.” | A per-investor card with all verdicts side-by-side, each with the specific reason, plus the routing decision is left to the operator. |
| Admin-editable rules | “Where in the system can compliance edit a program rule without a deploy?” | A prompt template in the vendor’s CMS, or a model-fine-tuning request that takes 4 weeks. | The admin UI. Edit a row, the next applicant evaluation reads the new rules. Every edit writes an audit log. Zero deploys. |

If a vendor passes all three, you’re looking at an engine. If they fail any one, you’re looking at a chatbot — and chatbots disintegrate against real audits.

First: show me the audit row for a decision made 90 days ago. Not a sample log. The actual row for a specific applicant, with the exact settings values that were live at that timestamp. If the answer involves “we can probably reconstruct it from the model logs,” they don’t have an engine.

Second: when investor A says yes and investor B says no on the same applicant, what does the engine return? If the answer is a single approved/declined verdict, the engine is collapsing information the operator needs. If the answer is “the LLM picks one,” they have a chatbot.

Third: where in the system can the compliance team edit a program rule without redeploying code? If the answer involves a prompt template in a vendor’s CMS, the rule isn’t an admin-editable rule. It’s a prompt. The two look similar on a demo. They are not the same thing in production.

The HEI market in 2026 sits at the inflection the consumer-mortgage-tech market hit around 2018: a wave of “AI” tools that look impressive in a Streamlit demo and disintegrate against a real audit. The shops that win this cycle are the ones whose eligibility decision is deterministic, versioned, and replayable, with LLMs doing the work LLMs are actually good at — parsing, normalizing, qualitative scoring — and nowhere near the yes/no.

That’s the shape of an HEI eligibility engine that won’t embarrass you in front of an investor’s diligence team.

If you operate a multi-program HEI or shared-equity platform and want to see how this engine compresses your operations cycle without putting an LLM in the underwriting decision, see the HEI Eligibility Engine — a 10–14 week fixed-scope engagement that ships this exact architecture.


Send a brief if you’re an HEI operator weighing a vendor’s “AI eligibility engine” and want a second set of eyes before signing.

Tags
home-equity, eligibility-engine, underwriting, production-ai
// FAQ
Why is there no LLM in the HEI eligibility decision path?
The decision needs to be auditable, replayable, and version-controllable. Every yes or no must trace to a specific settings-table row at a specific timestamp. The same inputs must produce the same verdict tomorrow. A decision made in April must be re-runnable in December against the rules that were live then. LLMs cannot guarantee any of those properties. They are useful upstream for parsing inbound data and downstream for scoring voice transcripts — not inside the decision itself.
How does a multi-investor overlay engine handle conflicts between investors?
Each investor overlay evaluates independently against the same applicant snapshot. The engine returns all per-investor verdicts plus their reasons in one response. The operator sees each outcome side by side: investor A yes, investor B no with the specific reason, first-party program yes. The engine never collapses verdicts into a single approved/declined output, because the investors price and fund differently and the operator needs the full picture to route the deal.
What does an admin-editable settings table actually contain for an HEI eligibility engine?
Per-program rows covering LTV caps, CLTV caps, ineligible property types, ineligible ownership types, state allowances, age minimums, equity minimums, proceeds caps, and state-specific overrides. Every row is versioned, every change writes to a settings-audit table with timestamp and editing user. About 13 of the project's first 373 database migrations have touched this layer directly.
What's the difference between this and a prompt-template eligibility tool?
Three things. The decision is deterministic, so the same inputs always produce the same verdict and every verdict traces to a specific audit row. The rules are admin-editable through a versioned settings table with audit logging, not embedded in a prompt. LLMs sit outside the decision graph; they run upstream for normalization and downstream for QA, never inside the yes/no. A prompt-template tool fails all three. The two look similar on a demo. They diverge fast in production.
How long does an HEI eligibility engine engagement take from kickoff to first investor overlay live?
Fourteen weeks of fixed-scope build, from kickoff to the multi-investor admin tabs shipping in production. The first single-investor overlay typically goes live in week eight; the multi-investor admin layer follows in weeks twelve through fourteen. Audit-logged decisions and the versioned settings table are wired in week one — they are not a bolt-on. The four production incidents we caught in the first twelve days after go-live all traced to audit rows the architecture wrote automatically; the fix loop is short because the failure is observable.
Who maintains the admin settings table after handoff — engineering or operations?
Operations, by design. The admin UI is the deliverable that decouples rule changes from engineering deploys. Compliance teams, capital partners, and program owners edit per-investor overlays directly through the admin tabs; every edit writes to a settings-audit row with the editing user and timestamp. Engineering only touches the layer when the schema itself needs to change — a new rule type, a new program shape — which is roughly 13 of the first 373 migrations on the live project. The rest of the system stays out of operations' way.
What happens if an investor changes their criteria mid-engagement?
The whole point of the admin-editable layer is that mid-engagement criteria changes are a routine operations task, not a re-architecture. Investor A widens their ownership-type tolerance: ops edits the per-investor overlay through the admin UI, the change writes an audit row, the next applicant evaluation reads the new rules. No deploy, no engineer-on-call, no cache invalidation window. The criteria changes we have seen in production — LTV cap revisions, state whitelist expansions, new prop-type exclusions — all shipped through the same path.
How does this engine handle PII, SOC 2, and capital-partner reporting requirements?
PII is segregated at the schema level — properties, homeowners, deals, liens, and underwriting snapshots are first-class normalized tables with role-based access through Postgres RLS, not free-text JSONB columns. SOC 2 logging requirements are satisfied by the audit-row architecture: every eligibility decision and every settings change is timestamped, attributable, and queryable. Capital-partner reporting runs off the same audit log — partners can be given read-only access to a filtered view of decisions made against their investor overlay, with no extra reporting pipeline to maintain.