Insights
Prompt Design · 6 min

Prompts that survive contact with users.

The patterns we use to make agents robust to messy real-world inputs.

Madriss Seksaoui · 6 min read · Feb 18, 2026

Your prompt works because nobody real has typed into it yet

Every prompt looks brilliant in the playground. You wrote it, you know what the agent is supposed to do, and you test it with inputs that match your mental model. Then it ships, and the first real user types "hi can u tell me wht my balanc is i forgot my email it's like jess smth" and the agent confidently invents an answer about something else entirely.

The gap between a demo prompt and a production prompt is the gap between "the model can do this when asked nicely" and "the model still does this when the input is broken." Here are the patterns we lean on.

Pattern 1: Schema-shaped outputs, always

If the output of your prompt feeds another system (a tool call, a UI, another agent), it must be structured. Not "please return JSON." Use the model's native structured output mode (JSON Schema for OpenAI, tool use for Claude, response schema for Gemini). This isn't a styling preference; it's a robustness lever: downstream code gets a shape it can validate instead of free text it has to guess at.

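// Shape every classification must follow; pass it to the provider's structured output mode.
const schema = {
  type: "object",
  properties: {
    // "unknown" is the explicit exit for inputs the model cannot classify.
    intent: { type: "string", enum: ["balance", "transfer", "support", "unknown"] },
    confidence: { type: "number", minimum: 0, maximum: 1 },
    needs_clarification: { type: "boolean" },
    // Optional: only meaningful when needs_clarification is true.
    clarification_question: { type: "string" },
  },
  required: ["intent", "confidence", "needs_clarification"],
};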

The unknown enum value plus the needs_clarification boolean is the trick. You're giving the model a socially acceptable way to admit it doesn't know. If you don't give it that exit, it will invent an answer.
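The exit only helps if the calling code honors it. Here is a minimal sketch of the consuming side, assuming the model's reply arrives as a JSON string; IntentResult, parseIntentResult, nextStep, and the fallback question are hypothetical names for illustration, not part of any SDK.

```typescript
// Hypothetical helper names; the field names mirror the schema above.
type Intent = "balance" | "transfer" | "support" | "unknown";

interface IntentResult {
  intent: Intent;
  confidence: number;
  needs_clarification: boolean;
  clarification_question?: string;
}

const INTENTS: readonly string[] = ["balance", "transfer", "support", "unknown"];

// Returns a validated result, or null when the payload does not match the schema.
function parseIntentResult(raw: string): IntentResult | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;
  const d = data as Record<string, unknown>;
  if (typeof d.intent !== "string" || INTENTS.indexOf(d.intent) === -1) return null;
  if (typeof d.confidence !== "number" || d.confidence < 0 || d.confidence > 1) return null;
  if (typeof d.needs_clarification !== "boolean") return null;

  const result: IntentResult = {
    intent: d.intent as Intent,
    confidence: d.confidence,
    needs_clarification: d.needs_clarification,
  };
  if (typeof d.clarification_question === "string") {
    result.clarification_question = d.clarification_question;
  }
  return result;
}

type Step =
  | { kind: "ask"; question: string }
  | { kind: "handle"; intent: Intent };

// Assumed wording; used when the model gave no question of its own.
const FALLBACK_QUESTION = "Sorry, could you tell me a bit more about what you need?";

// Invalid payloads, "unknown", and needs_clarification all lead to a question,
// never to a guessed action.
function nextStep(result: IntentResult | null): Step {
  if (result === null || result.intent === "unknown" || result.needs_clarification) {
    const question =
      result !== null && result.clarification_question !== undefined
        ? result.clarification_question
        : FALLBACK_QUESTION;
    return { kind: "ask", question };
  }
  return { kind: "handle", intent: result.intent };
}
```

The design choice worth noting is that a malformed payload is treated the same as an honest "unknown": both end in a question to the user.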

Pattern 2: Few-shot the failure modes, not the happy path

The instinct is to fill your prompt with examples of the agent doing the right thing. The leverage is in the opposite direction — show the model what to do when the input is broken.

Two real examples from one of our deployments:

INPUT: "i need to cancl the thing from last week"
OUTPUT: {
  "intent": "support",
  "confidence": 0.4,
  "needs_clarification": true,
  "clarification_question": "Could you tell me what you'd like to cancel? Was it a subscription, an order, or something else?"
}

INPUT: "yo"
OUTPUT: {
  "intent": "unknown",
  "confidence": 0.1,
  "needs_clarification": true,
  "clarification_question": "Hi! How can I help you today?"
}

Three or four of these in a system prompt do more for production robustness than ten happy-path examples.
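Keeping these examples as data rather than pasted text makes them easier to review and extend. A rough sketch, assuming the system prompt is assembled as a plain string; FewShot, renderFewShots, and buildSystemPrompt are hypothetical names, and the two entries are the examples quoted above.

```typescript
// Hypothetical structure for storing failure-mode examples as data.
interface FewShot {
  input: string;
  output: {
    intent: "balance" | "transfer" | "support" | "unknown";
    confidence: number;
    needs_clarification: boolean;
    clarification_question?: string;
  };
}

const FAILURE_EXAMPLES: FewShot[] = [
  {
    input: "i need to cancl the thing from last week",
    output: {
      intent: "support",
      confidence: 0.4,
      needs_clarification: true,
      clarification_question:
        "Could you tell me what you'd like to cancel? Was it a subscription, an order, or something else?",
    },
  },
  {
    input: "yo",
    output: {
      intent: "unknown",
      confidence: 0.1,
      needs_clarification: true,
      clarification_question: "Hi! How can I help you today?",
    },
  },
];

// Renders each example in the INPUT / OUTPUT layout used above.
function renderFewShots(examples: FewShot[]): string {
  return examples
    .map(
      (ex) =>
        `INPUT: ${JSON.stringify(ex.input)}\nOUTPUT: ${JSON.stringify(ex.output, null, 2)}`
    )
    .join("\n\n");
}

// Appends the rendered examples to whatever instructions the prompt starts with.
function buildSystemPrompt(instructions: string, examples: FewShot[]): string {
  return `${instructions}\n\nExamples:\n\n${renderFewShots(examples)}`;
}
```

Because the examples are typed, an example that drifts from the output schema fails at compile time instead of quietly teaching the model the wrong shape.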

Pattern 3: Constrain the surface area, then expand

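The most common failure mode we see in founder prompts is too much capability surface. The prompt says "you are a helpful AI assistant that can do anything," and then the founder is surprised when the agent goes off-script. Start the opposite way: name the few things the agent may do, send everything else to a clarification or a handoff, and widen the list only once that holds up.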

This is the equivalent of writing a state machine. Boring, yes. But boring is the whole point.
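One way to read "constrain, then expand" in code is an explicit allowlist of enabled intents, with everything else handed off. This is only a sketch under that assumption, not the article's own steps; makeRouter, Action, and the v1/v2 routers are hypothetical names.

```typescript
type Intent = "balance" | "transfer" | "support" | "unknown";

type Action =
  | { kind: "run"; intent: Intent }
  | { kind: "handoff"; reason: string };

// Builds a router that only runs intents on the allowlist.
// "unknown" is never runnable, whatever the allowlist says.
function makeRouter(enabled: readonly Intent[]): (intent: Intent) => Action {
  return (intent: Intent): Action => {
    if (intent !== "unknown" && enabled.indexOf(intent) !== -1) {
      return { kind: "run", intent };
    }
    return { kind: "handoff", reason: `"${intent}" is outside the enabled surface` };
  };
}

// Ship narrow first (assumed rollout order, for illustration only)...
const v1 = makeRouter(["balance"]);
// ...then expand by adding one capability at a time.
const v2 = makeRouter(["balance", "transfer"]);
```

Expanding the surface then becomes a one-line, reviewable change rather than a rewrite of the prompt's persona.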

Pattern 4: Test with the worst input you can produce

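Your eval set should include the worst inputs you can produce: typos, missing details, and one-word openers like the messages quoted earlier.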

If your prompt only works on clean inputs, it doesn't work.
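A tiny harness is enough to keep messy inputs in the loop. The sketch below reuses inputs quoted earlier in this article; EvalCase, Classifier, and runEval are hypothetical names, the classifier is a synchronous stand-in for a real (normally async) model call, and the expected label on the first case is an assumption, since the article gives no expected output for it.

```typescript
type Intent = "balance" | "transfer" | "support" | "unknown";

interface Classification {
  intent: Intent;
  confidence: number;
  needs_clarification: boolean;
}

interface EvalCase {
  input: string;
  expectedIntent: Intent;
  mustClarify: boolean;
}

// Stand-in for the model call; swap in the real client in practice.
type Classifier = (input: string) => Classification;

const MESSY_CASES: EvalCase[] = [
  // Assumed label: the user wants a balance but cannot identify the account.
  {
    input: "hi can u tell me wht my balanc is i forgot my email it's like jess smth",
    expectedIntent: "balance",
    mustClarify: true,
  },
  // Labels taken from the few-shot examples in Pattern 2.
  { input: "i need to cancl the thing from last week", expectedIntent: "support", mustClarify: true },
  { input: "yo", expectedIntent: "unknown", mustClarify: true },
];

// Returns one human-readable line per failed check; an empty array means a pass.
function runEval(classify: Classifier, cases: EvalCase[]): string[] {
  const failures: string[] = [];
  for (const c of cases) {
    const got = classify(c.input);
    if (got.intent !== c.expectedIntent) {
      failures.push(`${JSON.stringify(c.input)}: expected ${c.expectedIntent}, got ${got.intent}`);
    }
    if (c.mustClarify && !got.needs_clarification) {
      failures.push(`${JSON.stringify(c.input)}: should have asked for clarification`);
    }
  }
  return failures;
}
```

Running this on every prompt change turns "it worked in the playground" into a check that fails loudly when the agent stops asking and starts guessing.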

Takeaway

A prompt is not a piece of natural language. It's a tiny program with a fuzzy interpreter. Schema-shape the outputs, give the model a way to say it doesn't know, few-shot the failure cases, and start narrow. Do those four things and your prompts will survive contact with users — or at least limp through it gracefully.
