Building a 51-Jurisdiction Compliance SaaS on FastAPI, Supabase, and Anthropic Citations

Beverage-alcohol compliance in the US is a combinatorial problem: 50 states plus DC, two distribution channels (direct-to-consumer and three-tier wholesale), and constant rule changes, with every cell in that matrix carrying its own requirements. The single incumbent, Sovos ShipCompliant, sells eight separate products at enterprise prices, out of reach for the 82% of US wineries that make fewer than 5,000 cases a year. The same maze hits breweries, distilleries, and cideries.

Ratify is one platform that handles the full compliance lifecycle for beverage-alcohol producers across all 51 US jurisdictions (50 states plus DC). This post is about the architectural decisions behind it: a FastAPI backend split from a Next.js frontend, two-pass Citations extraction, a strict separation between deterministic rules and AI assistive work, and audit-grade reliability.

Why Split Architecture, Not a Vercel Monolith

The default modern starting point is a Next.js monolith on Vercel. I rejected that early. Four constraints made it the wrong fit:

Function duration ceiling. Vercel functions cap at 800 seconds, a little over 13 minutes. State tax filing batch jobs and some compliance workflows need to run for tens of minutes.
No persistent workers. Regulatory monitoring needs scheduled jobs that run continuously, not invocation-by-invocation serverless functions.
Read-only filesystem. Report generation, COLA submissions, and similar workflows want a writable working directory.
No built-in API gateway. Enterprise customers expect rate limiting, API versioning, and middleware patterns. Bolting that on top of Next.js is a worse experience than getting it from FastAPI.

The split:

Railway / FastAPI (Python) carries the compliance engine, AI extraction pipeline, background workers, and integrations (Commerce7, ShipStation, FedEx). Standard Docker containers, no vendor lock-in.
Vercel / Next.js (TypeScript) carries the dashboard surface, marketing pages, and edge-cached server components.
Supabase Postgres holds tenant data behind Row-Level Security, with pgvector powering the regulatory RAG corpus.

Two deployment targets instead of one is the accepted cost of that flexibility. FastAPI gives the full Python AI tooling stack (Anthropic SDK, LiteLLM, Pydantic) without shoehorning into Node.js. Vercel stays for what Vercel is good at: fast SSR, edge caching, Server Components.

This is decision D-006 in the project's decision log.

AI Never in the Critical Path

The single most important architectural decision in this product runs against the AI-first default of 2026: AI sits next to the critical path, never inside it. Compliance checks and tax calculations are deterministic rules-engine operations. AI does what AI is good at: natural-language compliance questions, regulatory document extraction, expansion planning, and audit report synthesis. None of these block an order or a filing.

Why:

Compliance decisions must be 100% reliable, fast (under 100ms), and auditable. LLM latency (1-5 seconds typical) is incompatible with real-time order gating. LLM hallucination risk is incompatible with tax calculations. Regulatory requirements demand deterministic, reproducible results.
Reproducibility. The same order under the same rules must produce the same answer on every check. A non-deterministic system fails the most basic compliance test.
Failure isolation. An AI provider outage degrades assistive features (the natural-language assistant goes down) but never breaks the core product. Order gating and tax math keep running. Audit answers stay reproducible.
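The failure-isolation point can be sketched in a few lines. This is a minimal illustration, not Ratify's code: `AssistantUnavailable`, `ask_assistant`, `gate_order`, and the placeholder jurisdiction codes are all hypothetical names chosen for the example. The assistive call catches a provider failure and reports itself unavailable, while the gate is a plain lookup that never touches the LLM client.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class AssistantUnavailable(Exception):
    """Raised by a (hypothetical) LLM client wrapper when the provider is down."""


@dataclass(frozen=True)
class AssistantReply:
    available: bool
    text: Optional[str]


def ask_assistant(question: str, llm_call: Callable[[str], str]) -> AssistantReply:
    """Assistive path: a provider failure degrades to 'unavailable' instead of raising."""
    try:
        return AssistantReply(available=True, text=llm_call(question))
    except AssistantUnavailable:
        return AssistantReply(available=False, text=None)


def gate_order(ship_to: str, allowed: frozenset[str]) -> bool:
    """Critical path: a pure lookup with no dependency on the LLM client."""
    return ship_to in allowed


def provider_down(_question: str) -> str:
    """Stand-in for an LLM call during a simulated outage."""
    raise AssistantUnavailable("simulated outage")


reply = ask_assistant("Can I ship to XX?", provider_down)
decision = gate_order("XX", frozenset({"XX", "YY"}))
print(reply.available, decision)  # False True
```

The assistant reports itself unavailable while the order decision is unaffected, which is the degradation behaviour described above.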

Decision D-009. The concrete shape:

Real-Time Order Compliance Gate is a pure rules engine. Looks up jurisdiction rules from Postgres, returns allowed/denied with a structured reason code.
Tax Calculation is a pure rules engine. Walks tax-rule lookup tables for jurisdiction × product type × channel.
AI workloads: natural-language compliance questions, regulatory document extraction, expansion advice, audit report synthesis. None block an order or a filing.
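As a rough sketch of what "pure rules engine" means here, the example below keeps both the gate and the tax calculation as functions over lookup tables. Everything in it is assumed for illustration: the in-memory dictionaries stand in for the Postgres tables, and the jurisdiction codes, reason codes, volume limit, and tax rate are invented placeholders, not real regulatory values.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP
from enum import Enum


class ReasonCode(str, Enum):
    OK = "OK"
    NO_RULE = "NO_RULE_FOR_JURISDICTION"
    CHANNEL_PROHIBITED = "CHANNEL_PROHIBITED"
    VOLUME_LIMIT_EXCEEDED = "VOLUME_LIMIT_EXCEEDED"


@dataclass(frozen=True)
class GateResult:
    allowed: bool
    reason: ReasonCode


# Placeholder gate rules keyed by (jurisdiction, channel); values are invented.
GATE_RULES = {
    ("XX", "dtc"): {"permitted": True, "max_liters_per_order": Decimal("27")},
    ("YY", "dtc"): {"permitted": False, "max_liters_per_order": Decimal("0")},
}

# Placeholder per-liter rates keyed by jurisdiction x product type x channel.
TAX_RULES = {
    ("XX", "wine", "dtc"): Decimal("0.25"),
}


def check_order(jurisdiction: str, channel: str, liters: Decimal) -> GateResult:
    """Return allowed/denied with a structured reason; a missing rule fails closed."""
    rule = GATE_RULES.get((jurisdiction, channel))
    if rule is None:
        return GateResult(False, ReasonCode.NO_RULE)
    if not rule["permitted"]:
        return GateResult(False, ReasonCode.CHANNEL_PROHIBITED)
    if liters > rule["max_liters_per_order"]:
        return GateResult(False, ReasonCode.VOLUME_LIMIT_EXCEEDED)
    return GateResult(True, ReasonCode.OK)


def excise_tax(jurisdiction: str, product: str, channel: str, liters: Decimal) -> Decimal:
    """Decimal arithmetic with explicit rounding, so repeated runs agree to the cent."""
    rate = TAX_RULES[(jurisdiction, product, channel)]  # KeyError: no rule on file
    return (rate * liters).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(check_order("XX", "dtc", Decimal("9")).reason.value)  # OK
print(check_order("YY", "dtc", Decimal("9")).reason.value)  # CHANNEL_PROHIBITED
print(excise_tax("XX", "wine", "dtc", Decimal("9")))        # 2.25
```

Using `Decimal` rather than floats is a design choice of this sketch: it keeps the tax result independent of platform floating-point behaviour, which fits the reproducibility requirement.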

The positioning is the moat. Sovos's product line treats AI as a feature; this product treats determinism as the feature and AI as the augmentation.

flowchart TD
  accTitle: Ratify split architecture with AI beside the critical path
  accDescr: A producer dashboard on Vercel and Next.js calls a FastAPI service on Railway. The deterministic critical path runs the order compliance gate and tax calculation as pure rules engines in under 100 milliseconds against Supabase Postgres. AI workloads, two-pass Citations extraction and natural-language compliance questions, sit beside the critical path and never block an order; they route through a LiteLLM proxy with budget caps to Claude Sonnet and Haiku, backed by a pgvector regulatory corpus.
  client["Producer dashboard (Next.js on Vercel)"] --> api["FastAPI on Railway"]
  subgraph critical["Critical path: deterministic, under 100ms"]
    gate["Order compliance gate"]
    tax["Tax calculation"]
  end
  subgraph assistive["Beside the path: AI assistive, never blocking"]
    extract["Two-pass Citations extraction"]
    qa["Natural-language compliance Q and A"]
  end
  api --> gate
  api --> tax
  api --> extract
  api --> qa
  extract --> litellm["LiteLLM proxy (budget caps)"]
  qa --> litellm
  litellm --> claude["Claude Sonnet 4.6 / Haiku 4.5"]
  gate --> db[("Supabase Postgres: jurisdiction rules + RLS")]
  tax --> db
  extract --> db
  db -. "pgvector HNSW" .-> rag[("Regulatory RAG corpus")]

Two-Pass Citations Extraction

A compliance product is only useful if its rules are correct, and "correct" means traceable to a verbatim source. Naive single-pass LLM extraction produces plausible-looking JSON with no audit trail: no way to retrace which span of the source document produced which extracted value, no way to defend a conclusion in a compliance review, no way to ground a regression test.

The fix is two-pass extraction using Anthropic's Citations API plus Structured Outputs (decision D-049):

Pass 1 locates citation spans in the source document, exact verbatim quotes anchored to character offsets.
Pass 2 validates extracted values against the schema and against the cited spans returned by pass 1.
Ephemeral prompt caching with a 5-minute TTL bills cache reads at 0.10x the base input rate, which cuts input cost on the cached portion by up to ~10x for repeated source documents (state DOR pages are large and re-fetched frequently).
Model tiering: Claude Sonnet 4.6 for primary extraction quality, Claude Haiku 4.5 for cheap-path workloads where the document is small or pre-screened.
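The pass 2 check against cited spans can be sketched without any API call. This is a simplified illustration under stated assumptions: `CitedSpan` and its field names are hypothetical (loosely modelled on a quote plus character offsets), the source text is a made-up placeholder, and the "value appears in the quote" test is a deliberately naive substring match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CitedSpan:
    cited_text: str
    start_char: int
    end_char: int  # exclusive


def span_is_verbatim(source: str, span: CitedSpan) -> bool:
    """The quote must equal the source text at the stated offsets, character for character."""
    return source[span.start_char:span.end_char] == span.cited_text


def validate_extraction(
    source: str, extracted: dict[str, object], spans: dict[str, CitedSpan]
) -> list[str]:
    """Return one error per field that lacks a verbatim, value-bearing citation."""
    errors = []
    for field, value in extracted.items():
        span = spans.get(field)
        if span is None:
            errors.append(f"{field}: no citation")
        elif not span_is_verbatim(source, span):
            errors.append(f"{field}: citation does not match source at offsets")
        elif str(value) not in span.cited_text:  # naive containment check
            errors.append(f"{field}: value not found in cited span")
    return errors


# Placeholder source text, not a real statute.
source = "Placeholder statute text: a licensee may ship up to 12 cases per year."
quote = "up to 12 cases per year"
start = source.index(quote)
spans = {"annual_case_limit": CitedSpan(quote, start, start + len(quote))}

print(validate_extraction(source, {"annual_case_limit": 12}, spans))  # []
print(validate_extraction(source, {"annual_case_limit": 24}, spans))
```

The second call reports that the value is not found in the cited span, which is the kind of mismatch pass 2 exists to catch before a rule is accepted.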

The Anthropic SDK call looks roughly like:

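import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    # Cache the large, stable system prompt so repeat calls bill at the cache-read rate.
    system=[{"type": "text", "text": SYSTEM_PROMPT, "cache_control": {"type": "ephemeral", "ttl": "5m"}}],
    # Elided here: the user turn carrying the source document.
    messages=[...],
    # Project-defined tool schemas (definitions not shown).
    tools=[CITATIONS_TOOL, STRUCTURED_EXTRACT_TOOL],
)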

Outcomes:

Every extracted compliance value carries a verbatim citation back to the source document. Defensible in audit. Replayable in CI.
Golden eval fixtures freeze the cited spans, not just the output JSON, so prompt drift surfaces immediately as a citation mismatch.
Higher per-call latency and cost than single-pass extraction, offset by prompt caching and Haiku-tier routing on cheap paths.
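How much prompt caching offsets the cost depends on how often the same document is re-read inside the TTL. The arithmetic below is a back-of-envelope sketch: the 0.10x read multiplier comes from this post, the 1.25x write multiplier is Anthropic's published rate for the 5-minute cache, and the document size and call count are made-up inputs. Costs are expressed in units of one uncached input token, so no dollar price is assumed.

```python
def input_cost_units(
    doc_tokens: int,
    calls: int,
    use_cache: bool,
    write_mult: float = 1.25,
    read_mult: float = 0.10,
) -> float:
    """Input cost for `calls` requests over the same document prefix.

    Assumes the first call writes the cache and every later call is a hit,
    i.e. all calls land inside the TTL window.
    """
    if calls <= 0:
        return 0.0
    if not use_cache:
        return float(doc_tokens * calls)
    return doc_tokens * (write_mult + read_mult * (calls - 1))


without_cache = input_cost_units(50_000, 20, use_cache=False)
with_cache = input_cost_units(50_000, 20, use_cache=True)
ratio = without_cache / with_cache
print(f"{ratio:.2f}x cheaper on input tokens")  # 6.35x cheaper on input tokens
```

Under these assumptions the saving approaches 10x only as the number of cache hits grows; a document read once costs more with caching than without, because of the write surcharge.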

Wave 3 used this pattern to land judge-grade HIGH-confidence wine-DTC compliance extractors across all 51 US jurisdictions.

Multi-Tenant Isolation With pgvector and RLS

B2B SaaS requires per-tenant data isolation. pgvector HNSW indexes have one gotcha that's easy to get wrong: the index scan returns its nearest candidates before SQL WHERE filters apply. A naive query for "tax rules" could fill the candidate set with another tenant's documents before the filter drops them, and because the query then quietly returns fewer rows than requested (possibly none) rather than an error, you'd never know.
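The ordering problem is easy to reproduce in miniature. The sketch below is a toy model, not pgvector: an exact brute-force `top_k` stands in for the index's candidate list, and the one-dimensional vectors and tenant labels are invented so that another tenant's rows sit closest to the query.

```python
import math


def top_k(query: tuple[float, ...], rows: list[dict], k: int) -> list[dict]:
    """Stand-in for an index scan: the k rows nearest to the query, tenant-blind."""
    return sorted(rows, key=lambda r: math.dist(query, r["vec"]))[:k]


rows = [
    {"tenant": "b", "vec": (0.1,)},
    {"tenant": "b", "vec": (0.2,)},
    {"tenant": "b", "vec": (0.3,)},
    {"tenant": "a", "vec": (1.0,)},
    {"tenant": "a", "vec": (1.1,)},
    {"tenant": "a", "vec": (1.2,)},
]
query = (0.0,)

# Post-filter: take candidates first, then apply the tenant predicate.
post_filtered = [r for r in top_k(query, rows, 3) if r["tenant"] == "a"]

# Pre-filter: restrict to the tenant first, then rank.
pre_filtered = top_k(query, [r for r in rows if r["tenant"] == "a"], 3)

print(len(post_filtered), len(pre_filtered))  # 0 3
```

Tenant `a` has three matching rows, but the post-filtered query returns none of them because tenant `b` occupied every candidate slot.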

The fix has three layers (decision D-007):

Row-Level Security at the database layer. Every tenant-scoped table carries a tenant_id column with an RLS policy keyed off auth.uid(). Raw SQL access can't cross tenant boundaries.
Filter inside the search query, not as a post-filter. The hybrid search RPC applies the tenant filter inside the HNSW search, not after. This is the only safe pattern when the index returns candidates before WHERE.
Connection-level tenant context. set_config('app.tenant_id', ...) runs at request start, so RLS policies have the right tenant in scope for every downstream query.
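The third layer might look something like the helper below. It is a sketch under several assumptions: tenant IDs are UUIDs, the driver uses `%s` placeholders (as psycopg does), and `set_tenant_context` and `RecordingCursor` are hypothetical names. The `RecordingCursor` only records calls so the example runs without a database.

```python
import uuid
from typing import Any

# Third argument `true` makes the setting local to the current transaction,
# so it must run in the same transaction as the queries it protects.
SET_TENANT_SQL = "SELECT set_config('app.tenant_id', %s, true)"


def set_tenant_context(cur: Any, tenant_id: str) -> None:
    """Validate the tenant id, then bind it for RLS policies on this transaction."""
    tenant = str(uuid.UUID(tenant_id))  # raises ValueError on malformed ids
    cur.execute(SET_TENANT_SQL, (tenant,))


class RecordingCursor:
    """Test double that records execute() calls instead of talking to Postgres."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, tuple]] = []

    def execute(self, sql: str, params: tuple = ()) -> None:
        self.calls.append((sql, params))


cur = RecordingCursor()
set_tenant_context(cur, "123e4567-e89b-12d3-a456-426614174000")
print(cur.calls[0][1])
```

Passing the tenant id as a bound parameter rather than formatting it into the SQL string keeps the call safe from injection, and the UUID parse rejects malformed input before it reaches the database.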

Combined with jurisdiction-agnostic data modelling (D-010, jurisdiction_rules with a jurisdiction_type ENUM supporting state, county, city, country, territory), isolation holds even as the data model evolves to support new jurisdictions and beverage categories.
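On the application side, the same shape could be mirrored with a typed enum. In this sketch only `jurisdiction_type` and its five values come from the post; the other field names, the rule key, and the jurisdiction codes are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class JurisdictionType(str, Enum):
    STATE = "state"
    COUNTY = "county"
    CITY = "city"
    COUNTRY = "country"
    TERRITORY = "territory"


@dataclass(frozen=True)
class JurisdictionRule:
    jurisdiction_type: JurisdictionType
    jurisdiction_code: str  # hypothetical field
    rule_key: str           # hypothetical field
    value: str              # hypothetical field


RULES = [
    JurisdictionRule(JurisdictionType.STATE, "XX", "dtc_permitted", "true"),
    JurisdictionRule(JurisdictionType.COUNTY, "XX-001", "dtc_permitted", "false"),
]


def lookup(
    rules: list[JurisdictionRule],
    jtype: JurisdictionType,
    code: str,
    rule_key: str,
) -> Optional[str]:
    """Same lookup path for every jurisdiction level; None means no rule on file."""
    for rule in rules:
        if (rule.jurisdiction_type, rule.jurisdiction_code, rule.rule_key) == (jtype, code, rule_key):
            return rule.value
    return None


print(lookup(RULES, JurisdictionType.COUNTY, "XX-001", "dtc_permitted"))  # false
```

Adding a county-level rule here needed no new table or code path, only a new row with a different `jurisdiction_type`.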

Wave 3: 51 Jurisdictions, Tier by Tier

The biggest extraction work was Wave 3: building HIGH-confidence wine direct-to-consumer compliance extractors for every US jurisdiction, 50 states plus DC.

The work progressed in four tiers (Tier-3A through Tier-3D), each adding a batch of jurisdictions until all 51 were live. Tier-3A established the pattern and later tiers reused it. Every jurisdiction shipped with a signed evidence document, a citations snapshot, and a golden eval fixture, and the fixtures gate regressions in CI.

Production Observability

Sentry for exception tracking, with PII scrubbing in middleware before any payload leaves the FastAPI process.
Daily and global LLM budget caps, enforced per tenant and platform-wide at the LiteLLM proxy layer before any provider request goes out.
9 GitHub Actions workflows for CI, nightly smoke tests, security scans, AgentShield, Copilot rereview automation, and Dependabot auto-merge. The nightly smoke tests catch upstream regulatory-page format changes before customers do.
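For the PII scrubbing step, a recursive scrubber along these lines is one plausible shape. It is an illustration only: the key list, the redaction markers, and the email pattern are assumptions, and a production scrubber would need a broader set of patterns. A function like this could run in middleware as described above, or be wrapped as the `before_send` callback that `sentry_sdk.init` accepts.

```python
import re
from typing import Any

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Hypothetical list of keys whose values are always redacted.
SENSITIVE_KEYS = frozenset({"email", "phone", "address", "date_of_birth"})


def scrub(value: Any) -> Any:
    """Return a copy with sensitive keys redacted and emails masked in free text."""
    if isinstance(value, dict):
        return {
            k: "[redacted]"
            if isinstance(k, str) and k.lower() in SENSITIVE_KEYS
            else scrub(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [scrub(v) for v in value]
    if isinstance(value, str):
        return EMAIL_RE.sub("[redacted-email]", value)
    return value


event = {
    "message": "order failed for jane@example.com",
    "extra": {"email": "jane@example.com", "order_id": 42},
}
print(scrub(event))
```

The function builds a new structure instead of mutating its input, so the original payload stays available to the request that produced it.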

By the Numbers

285 commits across the FastAPI backend
54 Postgres migrations
1,273 tests
32 compliance rule keys
52 golden eval fixtures
9 GitHub Actions workflows
51 of 51 US jurisdictions with judge-grade HIGH-confidence wine DTC extractors
Production deployed: API on Railway, web on Vercel, Supabase us-east-1

Lessons

Determinism is the moat. Compliance products fail the moment they're non-reproducible. Putting AI next to the rules engine instead of inside it is the single most important call to get right.

Two-pass citations turn LLMs into auditable tools. Freezing the cited spans in golden fixtures, not just the output JSON, is what makes prompt drift fail loudly in CI instead of leaking a wrong rule into production. That is the difference between an AI-derived rule you can defend in an audit and one you can only hope is right.
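A fixture check that freezes the cited span alongside the value might look like the sketch below. The fixture layout (`fields`, `value`, `cited_text`), the field name, and the quotes are invented for the example and are not the project's actual fixture format.

```python
def diff_against_golden(golden: dict, actual: dict) -> list[str]:
    """Compare values and cited quotes field by field; any difference is a failure."""
    problems = []
    for field, want in golden["fields"].items():
        got = actual["fields"].get(field)
        if got is None:
            problems.append(f"{field}: missing")
            continue
        if got["value"] != want["value"]:
            problems.append(f"{field}: value drift")
        if got["cited_text"] != want["cited_text"]:
            problems.append(f"{field}: citation drift")
    return problems


golden = {
    "fields": {
        "annual_case_limit": {"value": 12, "cited_text": "up to 12 cases per year"},
    }
}
# Same extracted value, but the model now quotes a different span.
drifted = {
    "fields": {
        "annual_case_limit": {"value": 12, "cited_text": "no more than 12 cases annually"},
    }
}

print(diff_against_golden(golden, golden))   # []
print(diff_against_golden(golden, drifted))  # ['annual_case_limit: citation drift']
```

A JSON-only fixture would have passed the drifted run, since the value is unchanged; comparing the quote is what turns the drift into a CI failure.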

A jurisdiction-agnostic data model costs nothing on day one. Using jurisdiction_rules with a typed ENUM instead of state_rules adds maybe ten characters of code. Refactoring away from state_rules after every rule lookup is built against it would cost weeks.

RLS is the only place tenant safety can actually live. Application-level checks are necessary but not sufficient: one missed WHERE and you've leaked data across tenants. Pushing isolation into the database, plus filtering inside the HNSW search instead of after it, makes that class of bug impossible rather than merely unlikely.

Tech Stack

Backend: FastAPI (Python 3.12) on Railway
Frontend: Next.js 16 on Vercel
Database: PostgreSQL on Supabase with pgvector HNSW and Row-Level Security
AI: Anthropic Claude Sonnet 4.6 (primary), Claude Haiku 4.5 (cheap-path)
Citations: Anthropic Citations API with ephemeral prompt caching (5m TTL, 0.10x read rate)
Model routing: LiteLLM proxy with per-tenant and global budget caps
Observability: Sentry with PII scrubbing
CI/CD: 9 GitHub Actions workflows (CI, nightly smoke, security scans, AgentShield, Copilot rereview, Dependabot auto-merge)
Integrations: Commerce7 (DTC), ShipStation (fulfillment), FedEx (carrier compliance)