AAPICODE.IO
System Design | Advanced | 7 min read | Updated 2026-05

7 Principles for Reliable Loan Orchestration

Most agentic lending systems do not fail because the AI was bad. They fail when Experian times out, the page is refreshed mid-pull, or a regulatory timer slips. Here are the 7 principles — with code — that prevent that.

Tags: Orchestration, Temporal, Workflow, Saga, Idempotency, Compliance, Lending

Why orchestration is the backbone

Most agentic lending systems do not fail because the AI was bad. They fail because:

  • Experian or another bureau times out mid-pull.
  • The orchestrator loses state during a service blip.
  • The application double-pulls credit on a page refresh.
  • A Reg B / Reg Z timer quietly slips past during an incident.

The 7 principles — at a glance

  1. **Durable workflow engine**: workflow-as-code with deterministic replay.
  2. **Application as a saga**: long-lived state machine with a full event log.
  3. **Parallel fan-out, gated fan-in**: bureau / income / fraud in parallel, deterministic joins.
  4. **Idempotency is non-negotiable**: a stable, derived key on every external call.
  5. **Signals for events**: resume on doc upload, e-sign, HITL response.
  6. **Deployable versioning**: in-flight apps survive a redeploy.
  7. **Regulatory timers as first-class citizens**: Reg B / Reg Z / ECOA deadlines fire even through outages.

1. Durable workflow engine

**Failure mode it prevents:** worker crashes mid-application, state evaporates, applicant has to start over.

  • Pick an engine that treats your code as the workflow definition and persists every step.
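  • Temporal is the canonical example: every activity result, timer, and signal is recorded in the workflow's event history, so after a crash another worker replays the code against that history and picks up exactly where it left off.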
  • Airflow, Camunda, n8n, and CrewAI all have a place — but for credit decisioning, **deterministic replay is what keeps your audit story honest**.
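// Temporal TypeScript SDK workflow: reads sequentially, replays deterministically.
import { proxyActivities, defineSignal, setHandler, condition } from '@temporalio/workflow';
import type * as activities from './activities'; // activity implementations (separate module)

const { pullBureau, verifyIncome, runFraudChecks, decide, escalateToHuman,
        runComplianceChecks, issueDisclosures, fundLoan } =
  proxyActivities<typeof activities>({ startToCloseTimeout: '2 minutes' });

export const hitlResponse = defineSignal('hitl-response');
export const eSignCompleted = defineSignal('e-sign-completed');

export async function loanApplicationWorkflow(input: { applicantId: string }): Promise<void> {
  let reviewed = false;
  let signed = false;
  setHandler(hitlResponse, () => { reviewed = true; });
  setHandler(eSignCompleted, () => { signed = true; });

  // Each activity result is recorded in event history; on replay it is not re-executed.
  const bureau = await pullBureau(input.applicantId);
  const income = await verifyIncome(input.applicantId);
  const fraud  = await runFraudChecks(input.applicantId);

  const decision = await decide({ bureau, income, fraud });

  if (decision.confidence < 0.85) {
    await escalateToHuman(decision);  // activity: opens a review task
    await condition(() => reviewed);  // workflow durably pauses on the HITL signal
  }

  await runComplianceChecks(decision);
  await issueDisclosures(decision);
  await condition(() => signed);      // durably pauses until the e-sign signal arrives
  await fundLoan(decision);
}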
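To make "deterministic replay" concrete, here is a toy, in-memory model of the idea. It is not how Temporal is implemented internally; `ReplayContext` and `History` are hypothetical names. The first execution records each activity result; after a crash, a fresh run walks the same code, reads completed results back from history, and only calls out for steps that never finished.

```typescript
// Toy model of event-sourced replay (hypothetical, not Temporal internals).
type History = unknown[];

class ReplayContext {
  private cursor = 0;

  // The history array is shared across runs; it plays the role of the durable event log.
  constructor(private readonly history: History) {}

  async activity<T>(call: () => Promise<T>): Promise<T> {
    if (this.cursor < this.history.length) {
      // Replay: return the recorded result; the external service is NOT called again.
      return this.history[this.cursor++] as T;
    }
    // First execution of this step: make the real call, then record the result.
    const result = await call();
    this.history.push(result);
    this.cursor++;
    return result;
  }
}
```

Because results are matched by position, the workflow code must issue activities in the same order on every run; that is why workflow code has to be deterministic.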

2. Application as a saga

**Failure mode it prevents:** "what happened to application X?" requires trawling 10 services for partial truth.

  • Each loan is a long-lived state machine — minutes for instant approval, weeks for doc collection.
  • Every transition is an event written to a durable log.
  • **Compensations matter** — if a downstream step fails, the saga knows how to roll visible state back without leaving the customer in limbo.
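A minimal, engine-agnostic sketch of the compensation pattern follows, assuming each step can expose an optional `compensate` action. `SagaStep`, `SagaEvent`, and `runSaga` are hypothetical names, and the in-memory `log` array stands in for the durable event log. On failure, completed steps are undone in reverse order, and every transition (including each compensation) is logged.

```typescript
// Minimal saga runner: forward steps in order, compensations in reverse (LIFO) on failure.
interface SagaStep {
  name: string;
  run: () => Promise<void>;
  compensate?: () => Promise<void>; // undo visible side effects of `run`
}

interface SagaEvent {
  step: string;
  status: 'started' | 'completed' | 'failed' | 'compensated';
}

async function runSaga(steps: SagaStep[], log: SagaEvent[]): Promise<boolean> {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    log.push({ step: step.name, status: 'started' });
    try {
      await step.run();
      log.push({ step: step.name, status: 'completed' });
      completed.push(step);
    } catch {
      log.push({ step: step.name, status: 'failed' });
      // Roll back in reverse order so later steps are undone before the ones they depended on.
      for (const done of completed.reverse()) {
        if (done.compensate) {
          await done.compensate();
          log.push({ step: done.name, status: 'compensated' });
        }
      }
      return false;
    }
  }
  return true;
}
```

Answering "what happened to application X?" then means reading one log in order instead of querying every service.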

3. Parallel fan-out, gated fan-in

**Failure mode it prevents:** decisioning runs on stale or partial data because one source was slow.

  • Pull credit, verify income, and run fraud checks in parallel.
  • Do NOT advance until all required results are back (or a deterministic timeout fires).
  • A gated fan-in makes **"what we knew at decision time"** trivial to reconstruct.
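A minimal sketch of a gated fan-in in plain TypeScript, assuming each data source is wrapped as a zero-argument async function; `gatedFanIn` and `FanInSnapshot` are hypothetical names. All sources start at once, the gate opens when every result is in or the timeout fires, and the returned snapshot is frozen at that moment, so late arrivals cannot rewrite "what we knew at decision time". Inside a Temporal workflow the timeout would come from the engine's durable timer (for example `condition(fn, timeout)` or `sleep`) rather than from a plain `setTimeout` as here.

```typescript
// Gated fan-in: parallel fan-out, then wait for ALL results or a timeout, whichever comes first.
type Fetchers = Record<string, () => Promise<unknown>>;

interface FanInSnapshot {
  status: 'complete' | 'timed_out';
  results: Record<string, unknown>;
  missing: string[]; // sources that had not answered when the gate closed
}

async function gatedFanIn(fetchers: Fetchers, timeoutMs: number): Promise<FanInSnapshot> {
  const arrived: Record<string, unknown> = {};
  const names = Object.keys(fetchers);

  // Fan-out: every fetcher starts immediately and records its own result on arrival.
  const all = Promise.all(
    names.map(async (name) => { arrived[name] = await fetchers[name](); }),
  ).then(() => 'complete' as const);

  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<'timed_out'>((resolve) => {
    timer = setTimeout(() => resolve('timed_out'), timeoutMs);
  });

  try {
    // A fetcher that rejects before the timeout fails the whole fan-in.
    const status = await Promise.race([all, timeout]);
    const results = { ...arrived }; // freeze the snapshot at the gate
    const missing = names.filter((n) => !(n in results));
    return { status, results, missing };
  } finally {
    clearTimeout(timer);
  }
}
```

Recording `missing` alongside the results lets the decision step refuse to run, or route to review, on a partial picture instead of silently deciding on it.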

4. Idempotency is non-negotiable

**Failure mode it prevents:** a page refresh causes a second hard-pull, dings the customer, and creates a fair lending audit trail you do not want.

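  • Every outbound call (bureau pull, KYC, payment, e-sign) carries an idempotency key derived from application + step, never from the retry attempt number.
  • Network hiccup, restarted worker, retried activity → same key → still safe.
// Idempotency key: derived, not random.
// Same (applicationId + step + run) → same key on every retry → the bureau dedupes.
// Never include the retry attempt counter: a retry would get a fresh key and pull again.
// `run` (hypothetical) bumps only when the business deliberately re-runs a step.
function idempotencyKey(applicationId: string, step: string, run = 1): string {
  return `${applicationId}:${step}:${run}`;
}

// Hypothetical bureau client interface; the real vendor SDK will differ.
interface BureauClient {
  softPull(req: { applicantId: string; idempotencyKey: string }): Promise<unknown>;
}

async function experianSoftPull(experian: BureauClient, applicationId: string, applicantId: string) {
  return experian.softPull({
    applicantId,
    idempotencyKey: idempotencyKey(applicationId, 'experian-soft-pull'),
  });
}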
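What "the bureau dedupes" means on the receiving side can be sketched as follows. `IdempotencyStore` is a hypothetical, in-memory stand-in for a durable key store (a real one would live in a database with a retention window). The first request for a key does the work; any repeat with the same key gets the stored result instead of triggering a second pull. Storing the in-flight promise also covers two retries racing each other.

```typescript
// Receiver-side idempotency: one side effect per key, however many times the key is sent.
class IdempotencyStore<T> {
  private readonly results = new Map<string, Promise<T>>();

  execute(key: string, work: () => Promise<T>): Promise<T> {
    const existing = this.results.get(key);
    if (existing) return existing; // duplicate request: no second side effect

    const pending = work().catch((err) => {
      // A failed attempt produced no result, so allow a retry with the same key.
      this.results.delete(key);
      throw err;
    });
    this.results.set(key, pending); // stored before resolution, so concurrent duplicates share it
    return pending;
  }
}
```

The same guard is worth putting in front of any vendor that does not accept idempotency keys, so your own retries cannot double-pull.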

5. Signals for events

**Failure mode it prevents:** burning money on polling and getting "which step is this on?" wrong.

  • Workflows pause on real-world events — doc uploaded, e-sign captured, HITL response, manual override.
  • Resume happens on a **signal**, not a poll.
  • Signal-driven resumption keeps the workflow honest about which step it is on and **who took the last action**.
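The push-versus-poll difference can be shown with a toy, in-memory model. `SignalInbox` and `Signal` are hypothetical names, not Temporal's API; in Temporal itself, signals are recorded in the workflow's history and a `condition()` wait resumes once a signal handler updates workflow state. The sketch shows three properties the bullets rely on: a waiting step resumes when a signal is pushed (no polling loop), a signal that arrives before anyone waits is buffered rather than lost, and every signal lands in an ordered history that records who acted.

```typescript
// Toy model of signal-driven resumption (hypothetical, not Temporal's implementation).
interface Signal {
  name: string;      // e.g. 'doc-uploaded', 'e-sign-completed', 'hitl-response'
  actor: string;     // who took the action
  payload?: unknown;
}

class SignalInbox {
  private readonly buffered: Signal[] = [];
  private readonly waiters = new Map<string, (s: Signal) => void>(); // one waiter per name
  readonly history: Signal[] = [];                                    // ordered audit trail

  send(signal: Signal): void {
    this.history.push(signal);
    const resume = this.waiters.get(signal.name);
    if (resume) {
      this.waiters.delete(signal.name);
      resume(signal);               // push-based resume: the waiting step continues now
    } else {
      this.buffered.push(signal);   // arrived before anyone waited: keep it
    }
  }

  waitFor(name: string): Promise<Signal> {
    const i = this.buffered.findIndex((s) => s.name === name);
    if (i >= 0) return Promise.resolve(this.buffered.splice(i, 1)[0]);
    return new Promise<Signal>((resolve) => { this.waiters.set(name, resolve); });
  }

  lastActor(): string | undefined {
    return this.history.length > 0 ? this.history[this.history.length - 1].actor : undefined;
  }
}
```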

6. Deployable versioning

**Failure mode it prevents:** a redeploy silently changes the meaning of a step an application has already passed through.

  • You WILL redeploy mid-application. Plan for it.
  • Version your workflow code: old applications replay against the version that started them; new applications pick up the new version.
  • Engines like Temporal have first-class versioning APIs — use them; do not hand-roll.
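The guarantee those versioning APIs give you (in Temporal's TypeScript SDK, `patched()` / `deprecatePatch()` and worker versioning) can be pictured with a toy model of version pinning. This is only an illustration of the semantics, not a substitute for the engine's tooling; `DECIDE_BY_VERSION`, `startApplication`, and the score thresholds are all hypothetical. An application records the version that started it, and every later step dispatches on that recorded version, so a redeploy changes behaviour only for applications started afterwards.

```typescript
// Toy model of version pinning: behaviour is chosen by the version recorded at start.
type Outcome = 'approve' | 'review';

// Hypothetical decision rules; v2 ships mid-flight with a stricter threshold.
const DECIDE_BY_VERSION: Record<string, (score: number) => Outcome> = {
  v1: (score) => (score >= 680 ? 'approve' : 'review'),
  v2: (score) => (score >= 700 ? 'approve' : 'review'),
};

let currentVersion = 'v1'; // what a newly started application is pinned to

interface Application {
  id: string;
  version: string; // pinned at start, never changed afterwards
}

function startApplication(id: string): Application {
  return { id, version: currentVersion };
}

function decide(app: Application, score: number): Outcome {
  const rule = DECIDE_BY_VERSION[app.version];
  // Old code must stay deployed until every application pinned to it has finished.
  if (!rule) throw new Error(`no code deployed for ${app.version}`);
  return rule(score);
}
```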

7. Regulatory timers as first-class citizens

**Failure mode it prevents:** missing a Reg B adverse action window because your cron host was down for 20 minutes.

  • Reg B (ECOA) notification and adverse action windows and Reg Z disclosure timing: these are clocks that **must fire even mid-incident**.
  • Model them as **durable timers inside the workflow engine**, not as cron jobs in a side service.
  • If a timer comes due while workers are down or restarting, the engine has already persisted it, so the workflow runs the right activity as soon as a worker is back.
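Temporal's durable timers (`sleep` or `condition(fn, timeout)` inside a workflow) already behave this way; the toy below just makes the mechanics visible. Its key idea is storing each timer as an absolute deadline derived from the triggering event, so a sweep after an outage still finds every timer that came due while workers were down. `scheduleRegBNotice`, `collectDue`, and the 5-day `bufferDays` default are hypothetical; `windowDays = 30` reflects Reg B's general 30-day window for notifying an applicant of action taken on a completed application, and other cases carry other windows.

```typescript
// Toy durable-timer table: absolute deadlines + a catch-up sweep, so an outage delays but never drops.
type TimerKind = 'reg-b-notice' | 'reg-z-disclosure';

interface DurableTimer {
  applicationId: string;
  kind: TimerKind;
  fireAt: number; // epoch ms, computed from the triggering event, not from "now"
  fired: boolean;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Fire `bufferDays` before the legal deadline so an incident eats buffer, not compliance.
function scheduleRegBNotice(applicationId: string, completedAt: number,
                            windowDays = 30, bufferDays = 5): DurableTimer {
  return {
    applicationId,
    kind: 'reg-b-notice',
    fireAt: completedAt + (windowDays - bufferDays) * DAY_MS,
    fired: false,
  };
}

// Run on every worker start-up and on each tick: returns each due timer once.
function collectDue(timers: DurableTimer[], now: number): DurableTimer[] {
  const due = timers.filter((t) => !t.fired && t.fireAt <= now);
  for (const t of due) t.fired = true; // a real store would persist this flag atomically
  return due;
}
```

A cron job that only looks at "timers due in the next minute" loses anything that came due while it was down; a sweep over `fireAt <= now` cannot.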

Putting it together

flowchart TD
    Start["Application created"] --> Saga["Application Saga (state + log)"]
    Saga --> FanOut["Parallel: bureau / income / fraud / KYC"]
    FanOut --> Gate["Gated fan-in (all required signals or timeout)"]
    Gate --> Decision["Decisioning agent"]
    Decision -->|approve| Compliance["Compliance + disclosures"]
    Decision -->|review| HITL["HITL signal awaited"]
    HITL --> Compliance
    Compliance --> Sign["E-sign signal awaited"]
    Sign --> Fund["Funding + close"]
    Saga --- Timers["Reg B / Reg Z / ECOA durable timers"]


Modernization note

Part 3 of 3 in the LinkedIn series on Building Lending Platform Orchestration. Reformatted with TL;DR, principle-by-principle "failure mode + fix" structure, code snippets, and a recap.