AI features require PRD sections that no standard template includes, and omitting them is a leading cause of AI feature rework. The most commonly absent: model behavior boundaries (what the AI should and should not do), output quality success metrics (how you measure whether the AI is good), and failure mode policies (what happens when the AI gets it wrong). These gaps are why so many AI feature launches need a "v2 rewrite" six weeks in.
"Every AI feature we've shipped that went well had a detailed spec on model behavior boundaries and a defined quality measurement process. Every one that went poorly didn't. That correlation is not subtle."
— Arjun P., AI Product Lead at a B2B SaaS company
AI-specific sections to add to every AI feature PRD
Model behavior boundaries: What should the AI do? What should it explicitly refuse to do? What's the behavior when input is outside the expected distribution? Document this like a policy — specific and enumerable, not aspirational.
Output quality definition: How do you measure whether the AI output is good? For text generation: a rubric. For classification: precision/recall targets. For recommendations: click-through rate, acceptance rate. Define this before building; don't discover it in post-launch review.
Failure mode policy: When the AI fails (low confidence, ambiguous input, out-of-distribution request), what does the user experience? Graceful degradation? Human fallback? Error message? Specify the exact UX for each failure mode.
Human oversight: Is human review required before AI output is shown to users? For any AI feature that could cause harm, specify the moderation layer explicitly.
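One way to keep behavior boundaries "specific and enumerable, not aspirational" is to encode them as data rather than prose. A minimal Python sketch of that idea, where every name (`AIBehaviorSpec`, `decide`, the task strings) is hypothetical and would come from your own PRD:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class FailureMode:
    trigger: str          # condition that activates this mode
    user_experience: str  # the exact UX the PRD promises

@dataclass
class AIBehaviorSpec:
    allowed: set[str] = field(default_factory=set)
    refused: set[str] = field(default_factory=set)
    failure_modes: dict[str, FailureMode] = field(default_factory=dict)

    def decide(self, task: str) -> str:
        """Return the policy outcome for a requested task."""
        if task in self.refused:
            return "refuse"
        if task in self.allowed:
            return "proceed"
        # Anything not enumerated is out of scope: never silently guess.
        return self.failure_modes["out_of_scope"].user_experience

# Hypothetical spec for a support-ticket assistant
spec = AIBehaviorSpec(
    allowed={"summarize_ticket", "draft_reply"},
    refused={"give_legal_advice"},
    failure_modes={
        "out_of_scope": FailureMode(
            trigger="task not enumerated as allowed or refused",
            user_experience="show 'This assistant can't help with that yet'",
        )
    },
)
```

The point of the structure is that reviewers can audit the lists directly, and "outside the expected distribution" has a defined answer instead of falling through to whatever the model does.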
AI feature edge cases
| Edge case | Example | Expected behavior |
|---|---|---|
| Low-confidence output | Model confidence <70% | Show "We're not sure — verify this" label, offer alternatives |
| Ambiguous input | User input could mean two different things | Clarifying question or show two output options |
| Out-of-distribution input | User pastes non-English text into English-only feature | Detect language, show "Currently English-only" message |
| Hallucination risk area | Feature asks AI to generate specific data (dates, names) | Ground with retrieval, add verification step, display source |
| User disagrees with AI output | User wants to override the suggestion | Override is always available, override events are logged |
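The table's first three rows amount to a routing rule in front of the model. A hedged sketch of that rule in Python, assuming a 70% confidence threshold and an English-only feature as in the examples above (function and field names are illustrative, not a real API):

```python
def route_output(confidence: float, detected_lang: str,
                 threshold: float = 0.70) -> dict:
    """Map a model response to the UX promised in the edge-case table."""
    if detected_lang != "en":
        # Out-of-distribution input: reject with a clear scope message
        return {"action": "reject", "message": "Currently English-only"}
    if confidence < threshold:
        # Low-confidence output: show it, but labeled, with alternatives
        return {"action": "show_with_warning",
                "label": "We're not sure — verify this",
                "offer_alternatives": True}
    return {"action": "show", "label": None}
```

Writing the rule down this way forces the PRD to answer the question the table raises: every branch a user can hit must map to a specified experience.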
AI success metrics that actually measure quality
Vanity metrics for AI: requests made, outputs generated. Signal metrics: output acceptance rate (% of AI suggestions the user keeps vs. discards), edit distance (how much users modify AI output before using it), task completion rate with vs. without AI assist, and qualitative satisfaction score (CSAT on AI output quality).
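Acceptance rate and edit distance are both computable from a log of AI suggestions and what the user actually shipped. A minimal sketch, assuming each logged event records the AI output, the final text, and whether the user kept the suggestion (the event schema is hypothetical):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,        # deletion
                           cur[j - 1] + 1,     # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def ai_quality_metrics(events: list[dict]) -> dict:
    """Acceptance rate and mean normalized edit distance over logged events."""
    kept = sum(1 for e in events if e["kept"])
    distances = [
        levenshtein(e["ai_output"], e["final_text"])
        / max(len(e["ai_output"]), len(e["final_text"]), 1)
        for e in events
    ]
    return {
        "acceptance_rate": kept / len(events),
        "mean_edit_distance": sum(distances) / len(distances),
    }
```

A normalized edit distance near 0 means users ship AI output nearly verbatim; near 1 means they rewrite it, which is a quality signal no request count will surface.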