How the AI builder actually works

People keep asking "how does it work." Short version: Claude. Long version is more interesting.

This is the build I sketched on a napkin in January 2026 and shipped in June. Here's what's inside.

The pipeline

A prompt goes through five stages. Three of them are AI-touched. Two are deterministic. Combined, they take about 90 seconds.

classify → generate (streaming) → review → create → open

Stage 1, classify, is a quick keyword + embeddings match against our template library. If your prompt is "menu for my pizza place" we know we have a restaurant menu template. We skip the AI entirely and instantiate the template. ~2 seconds.

Stage 2, generate, is where Claude does the work. A single tool-use call. We force the model to emit a structured Blueprint: sheet definitions (tabs, columns, types, sample rows), relationships (foreign-key-style links between sheets), and the full HTML for the app UI. We stream the response so you see the app materializing as it's built.

The streaming matters more than people realize. The build takes ~60 seconds. Watching it happen feels like 20. Staring at a spinner feels like 5 minutes. Same wall-clock, very different feeling.

Stage 3, review, shows you what's coming: a live preview of the app, the sheet structure underneath, a quality score, and a "Create it" button. You're not committed until you click Create. That preview pane was the single biggest UX win of the project. Users want to see what they're getting before they pay (in attention, not money — we already paid for Claude).

Stage 4, create, runs an atomic saga: create the Google Sheet in your Drive → write the project to our DB → seed sheet headers + sample rows → create an API key → render and persist the final HTML with runtime globals injected. Each step has a compensate() function. If step 7 fails, steps 1-6 get rolled back in reverse order. No half-created projects.

Stage 5, open, drops you into the hosted app at /app/your-slug. It works.

The honest hard parts

The HTML safety scanner. The model occasionally tries to inline external <script src> or use .innerHTML = variable (XSS risk). We scan every output and reject if it fails. But rejecting is bad UX — you waited 60 seconds for nothing. So we built a two-tier auto-repair:

For violations with an obvious safe fix (external scripts, javascript: hrefs, hardcoded secrets), we deterministically strip the offending nodes. Sub-millisecond.
For violations that need understanding (innerHTML, external fetches in non-trivial places), we make a second tight Claude call (~$0.02) that says "rewrite this with the violation removed." Then we rescan.

Most blocks get rescued. The ones that don't fail honestly with a friendly error and a retry button.

The truncation problem. Sometimes you ask for too much — five sheets, a calendar, a chart, three views — and the model runs out of output budget mid-HTML. The blueprint is half-written. The app doesn't render. We used to charge for this and watch users get furious. Now: if the model hits stop_reason=max_tokens before producing usable appHtml, we eat the cost, mark the job as OUTPUT_TRUNCATED, and tell the user "You weren't charged. Try splitting your request into smaller pieces."

The 6-minute refine problem. Refines used to re-emit the entire HTML document for a one-line change. Two minutes of streaming for "add a status column." We fixed this by extracting a compact app manifest at build time — a structured summary of screens, actions, sheets, and dangerous writes. Refines now send the manifest + a 15KB slice of HTML instead of the full 30-40KB document. ~80% context reduction, ~80% faster refines.

The free vs paid speed tradeoff. Sonnet streams at ~80 tok/s. With a 12K-token budget that's 2.5 minutes per build. Too slow for the wow moment. So free and starter tiers run on Haiku 4.5 (~180 tok/s). Pro and Enterprise stay on Sonnet for polish. There's an env var override if free-tier quality complaints come in: AI_BUILD_MODEL_FREE=claude-sonnet-4-6.

What we deliberately didn't build

Real-time collaboration in the generated app. That's a different product. Use Google Sheets if you need real-time editing.

Custom code blocks. The generated apps are vanilla HTML + Tailwind CDN + (optionally) React 18 UMD. No build step, no npm. The system prompt restricts external scripts to three CDNs. This is a deliberate constraint — it makes the safety scanner tractable.

A full visual editor. We have refine-via-prompt. You describe a change in English, the AI ships it. We considered shipping a visual drag-drop editor and decided against it. v0 and Bolt cover that lane. We compete on "your real data is attached from minute zero", and we don't want to dilute that.

What's next

Diff-based refines. The manifest has screens with stable IDs. We can send the AI only the relevant slice of HTML — not the elided middle, not the head, not the unrelated section. Goal: refines under 30 seconds.
Anonymous "try it" demo. Type a prompt on the homepage, no signup, see a real working app in 60 seconds. Costs us $0.30 per visitor; we'll cap and rate-limit it.
Batch & transaction API. Already shipped — POST /api/sheets/batch with up to 100 ops, all-validated-before-write. Required if you want to use this for order systems.

If you're a dev

The pipeline is documented in our internal handoff doc. The cost-guard, retry logic, and atomic-create saga are the most interesting bits. If you'd be useful on this team, my DMs are open.

— Ahmed