
AI Chief of Staff

A citation-gated advisory MCP toolbelt that gives an AI coding agent grounded, source-backed tools across nine startup-operator domains, with a hard architectural rule that no tool may ever call a language model.

2026

Python
FastMCP
Pydantic
pgvector
Voyage AI
Supabase
mypy
pytest
AI Chief of Staff project screenshot

System Architecture

AI Chief of Staff system architecture diagram

The Problem

Founders and solo operators field a constant stream of high-stakes, cross-domain questions, from equity-election deadlines to privacy obligations to cap-table math, where a confidently wrong answer is worse than no answer. General AI assistants answer all of them fluently and hallucinate some of them. What was needed was an operating partner that an AI agent could consult, one in which every legal or financial claim comes back with a verifiable source or is explicitly marked unknown.

Approach

The project is a FastMCP toolbelt that the reasoning agent (Claude Code) consults, built on one load-bearing rule: no tool may call a language model. Every tool is deterministic or retrieval-only and returns a cited, structured envelope, so a model can never launder an unsourced claim through a tool. The rule is enforced structurally, not by convention: an AST scan and a runtime guard both run in CI.
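The AST-scan half of that rule can be sketched with the standard library alone. This is a minimal illustration, not the project's actual check: the deny-list `BANNED_MODULES` and the function name `find_llm_imports` are hypothetical, and the runtime guard is not shown.

```python
import ast

# Hypothetical deny-list; the project's real list is not given in the source.
BANNED_MODULES = frozenset({"anthropic", "openai", "litellm"})


def find_llm_imports(source: str, filename: str = "<tool>") -> list[str]:
    """Return one violation message per import of a banned module."""
    violations: list[str] = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            if node.level > 0:
                # Relative import of a local module, not a third-party SDK.
                continue
            names = [node.module or ""]
        else:
            continue
        for name in names:
            # Compare the top-level package so "openai.types" is caught too.
            if name.split(".")[0] in BANNED_MODULES:
                violations.append(f"{filename}:{node.lineno}: imports {name}")
    return violations
```

In a test suite, a check like this would typically read every tool module from disk and assert that the combined violation list is empty, so a banned import breaks the build instead of waiting for a reviewer to notice it.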

Architecture

Claude Code is the reasoning brain; the Chief is a passive stdio MCP server exposing 27 tools across 9 domains (a core RAG spine, legal and finance, regulatory and privacy, intelligence, GTM, technical, and comms). Every tool returns a frozen Pydantic GroundedEnvelope of Claims. Each Claim carries a 4-state confidence tier, the envelope reports a worst-tier rollup, and a finalize check gates the envelope. Retrieval grounds against a pgvector corpus of primary sources; an external compliance engine is reached through a circuit-breaker-guarded MCP client. A second in-process Agent-SDK surface carries the tools that cannot run over stdio.
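The envelope contract can be sketched as follows. To stay dependency-free, the sketch uses frozen standard-library dataclasses as a stand-in for the frozen Pydantic models the project uses. The four tier names, the `source_url` field, and the exact rule applied by `finalize` are assumptions for illustration; the source names only `GroundedEnvelope`, Claims, a 4-state tier, a worst-tier rollup, and a finalize check.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    """Hypothetical 4-state tier; a higher value means weaker grounding."""
    VERIFIED = 0
    SUPPORTED = 1
    INFERRED = 2
    UNKNOWN = 3


@dataclass(frozen=True)
class Claim:
    text: str
    tier: Tier
    source_url: Optional[str] = None  # assumed citation field


@dataclass(frozen=True)
class GroundedEnvelope:
    claims: tuple[Claim, ...]

    @property
    def rollup(self) -> Tier:
        # Worst-tier rollup: the envelope is only as strong as its weakest
        # claim. An empty envelope is treated as UNKNOWN.
        return max((c.tier for c in self.claims), default=Tier.UNKNOWN)


def finalize(envelope: GroundedEnvelope) -> GroundedEnvelope:
    """Assumed gate: reject any claim that asserts grounding without a citation."""
    for claim in envelope.claims:
        if claim.tier is not Tier.UNKNOWN and not claim.source_url:
            raise ValueError(f"uncited claim: {claim.text!r}")
    return envelope
```

Ordering the tiers as integers makes the rollup a single `max`, and it means one unknown claim pulls the whole envelope down to unknown, which is the conservative behavior the design aims for.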

Results

  • 27 MCP tools across 9 startup-operator domains
  • 560 tests, mypy strict (zero untyped code), ruff clean across ~220 source files
  • Zero language-model calls inside any tool, enforced by an AST scan and a runtime guard in CI
  • Frozen Pydantic Output Contract with a 4-state confidence tier and worst-tier rollup
  • Four sanctioned grounding-metadata patterns, each CI-enforced
  • Graceful degradation throughout: a circuit breaker on the external engine, an in-memory fallback for discovery state, and an SSRF-guarded fetch
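The circuit breaker in the last item can be illustrated with a small, generic sketch. It is not the project's implementation: the class name, thresholds, and the injectable `clock` parameter are all assumptions, chosen so the behavior is easy to test.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is refused because the breaker is open."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                # Still cooling down: fail fast without touching the engine.
                raise CircuitOpenError("external engine unavailable")
            # Cool-down elapsed: allow one trial call (half-open state).
            self._opened_at = None
            self._failures = self.max_failures - 1
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()
            raise
        self._failures = 0  # any success closes the breaker fully
        return result
```

A caller would wrap each request to the external engine in `breaker.call(...)` and treat `CircuitOpenError` as a signal to return a degraded, explicitly uncertain result instead of blocking on a dependency that is known to be down.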

Lessons Learned

Putting a language model on the critical path is the easiest way to manufacture confident, unsourced answers. Moving all reasoning to the agent, keeping the tools deterministic, and enforcing that boundary with a build-breaking test is what makes a grounded advisory system trustworthy. Honest-uncertainty downgrades, where a legal tool reports that it has insufficient information instead of asserting an exemption, matter more than raw coverage.