PRD-098·May 12, 2026

Armorer

Executive Brief

Problem: AI agents deployed in local environments degrade silently over time due to data drift, prompt injection, or dependency changes. Armorer users today manually inspect sessions to catch anomalies — a reactive approach that misses subtle behavioral shifts. Our Q3 2025 user survey (n=89) found 72% of teams experienced at least one critical agent failure in production, with mean detection time of 16.7 hours and $8.4K mean incident cost (source: Armorer support case analysis).

Business Case:
1,800 active agents × 0.5 undetected drifts/agent/year × $8.4K/incident = $7.56M/year recoverable loss
(Sources: agent count from Armorer telemetry (Aug 2025), drift frequency from Gartner "AI Failure Modes 2025", incident cost from internal case analysis)
If adoption reaches 40% of agents: $3.02M/year recoverable
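
The recoverable-loss math above can be reproduced directly; inputs are the sourced figures cited in the business case:

```python
# Recoverable-loss model from the business case above.
ACTIVE_AGENTS = 1_800          # Armorer telemetry, Aug 2025
DRIFTS_PER_AGENT_YEAR = 0.5    # Gartner "AI Failure Modes 2025"
COST_PER_INCIDENT = 8_400      # internal case analysis ($8.4K)
ADOPTION_RATE = 0.40           # 40% of agents enable detection

total_loss = ACTIVE_AGENTS * DRIFTS_PER_AGENT_YEAR * COST_PER_INCIDENT
recoverable = total_loss * ADOPTION_RATE

print(f"${total_loss / 1e6:.2f}M/year total")    # $7.56M/year
print(f"${recoverable / 1e6:.2f}M/year at 40%")  # $3.02M/year
```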

This feature IS automated drift detection with forensic diffing for local AI agents. It is NOT real-time model monitoring, global performance tracking, or a replacement for CI/CD pipelines.

Success Metrics

Primary Metrics:

Metric                     | Baseline     | Target (D90) | Kill Threshold | Method
---------------------------|--------------|--------------|----------------|-------------------------
Drift detection lead time  | 16.7h        | ≤2h          | >8h            | Telemetry timestamps
Undetected drift incidents | 0.7/agent/yr | ≤0.2         | >0.5           | Support case correlation
False positive rate        | N/A          | ≤5%          | >12%           | User feedback logs

Guardrail Metrics:

Guardrail             | Threshold      | Action
----------------------|----------------|---------------------------
Agent startup latency | ≤50ms increase | Roll back detection model
Local CPU overhead    | ≤3% max        | Throttle background scans

What We Are NOT Measuring:

  1. Total alerts generated (measures noise, not value)
  2. "Drift score" amplitude (actionability matters, not magnitude)
  3. Features enabled/disabled (proxy for usability, not trust)

Risk Register

TECHNICAL: Metric poisoning via adversarial inputs

  • Probability: Medium | Impact: High
  • Mitigation: Sanitize tool call inputs + anomaly detection on metric distributions (Eng: Priya, by v1.1)

ADOPTION: Alert fatigue from threshold misconfiguration

  • Probability: High | Impact: Medium
  • Mitigation: Default "quiet hours" config + adaptive sensitivity tuning (PM: Alex, launch day)

COMPETITIVE: LangChain releases open-source drift detector

  • Probability: Low | Impact: High
  • Mitigation: Deep integration with Armorer's session replay (Eng: Marco, by v1.2)

COMPLIANCE: EU AI Act "high-risk" classification for drift controls

  • Probability: Low | Impact: Critical
  • Mitigation: Legal review for local data processing exemption (Counsel: Sophia, before beta)

Open Questions

Pre-Mortem:
"It is 6 months from now and this feature has failed. The 3 most likely reasons are:"

  1. Teams ignored thresholds they didn't set, letting critical drifts bypass triage
  2. Baseline snapshots bloated local storage, causing agents to fail on resource-constrained devices
  3. Competitor X open-sourced a CLI tool that does 80% of this at zero cost

Success looks like:
Support tickets for "agent acting weird" drop by 60%. Users reference drift scores in sprint retros. The CEO cites "catching DealScout's 4am drift" as proof we prevent AI failures before customers notice.

Assumptions vs Validated:

Assumption                                      | Status
------------------------------------------------|----------------------------------------------------
Tool call sequences can be fingerprinted        | ⚠ Unvalidated — needs PoC from ML Eng by 10/15
Local storage can handle 90d baselines          | ⚠ Unvalidated — benchmark on Raspberry Pi by 10/22
EU data processing qualifies for "limited risk" | ⚠ Unvalidated — legal sign-off required by 11/30

Model Goals & KPIs

Core Objectives:

  1. Detect functional drift (tool call patterns, reasoning paths) not just statistical drift
  2. Surface actionable insights — not just alerts — with root cause hypotheses
  3. Preserve local execution privacy — no external data egress
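
As an illustration of objective 1 (and of the unvalidated "tool call sequences can be fingerprinted" assumption), functional drift could be prototyped by hashing n-grams of tool-call names and comparing fingerprints. All names, the bigram window, and the distance measure here are illustrative assumptions, not the shipped design:

```python
import hashlib
from collections import Counter

def fingerprint(tool_calls: list[str], n: int = 2) -> Counter:
    """Hash n-grams of tool-call names into a frequency fingerprint."""
    grams = zip(*(tool_calls[i:] for i in range(n)))
    return Counter(hashlib.sha1("→".join(g).encode()).hexdigest()[:8] for g in grams)

def functional_drift(baseline: Counter, current: Counter) -> float:
    """Weighted-Jaccard distance between fingerprints: 0 = identical, 1 = disjoint."""
    union = set(baseline) | set(current)
    if not union:
        return 0.0
    inter = sum(min(baseline[g], current[g]) for g in union)
    total = sum(max(baseline[g], current[g]) for g in union)
    return 1.0 - inter / total

baseline = fingerprint(["search", "fetch", "summarize", "reply"])
drifted  = fingerprint(["search", "fetch", "escalate", "reply"])
print(functional_drift(baseline, baseline))  # 0.0
print(functional_drift(baseline, drifted))   # 0.8
```

The key point is that this catches *pattern* changes (a new tool in the sequence) even when per-call statistics look normal.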

Design Decisions:

  • Decision: What constitutes "baseline" behavior?
    Choice: First 72 hours post-deployment, provided >100 sessions are recorded (avoids cold-start artifacts)
    Rationale: Rejected fixed-time windows (insufficient session diversity) and synthetic baselining (diverges from real use)
  • Decision: How to handle threshold configuration?
    Choice: Auto-set initial thresholds using interquartile ranges, allow user override
    Rationale: Rejected fully manual thresholds (causes setup friction) and immutable auto-thresholds (ignores business context)
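
The auto-threshold choice can be sketched as a Tukey fence over baseline samples. The 1.5× multiplier is the conventional default, shown here as an assumption rather than the tuned product value; users would override `lo`/`hi` per the decision above:

```python
import statistics

def auto_thresholds(samples: list[float], k: float = 1.5) -> tuple[float, float]:
    """Tukey-fence thresholds from baseline samples: alert outside [lo, hi]."""
    q1, _, q3 = statistics.quantiles(samples, n=4)  # quartiles
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical baseline session durations (seconds) from the 72h window.
baseline = [42, 45, 44, 48, 41, 46, 43, 47, 44, 45]
lo, hi = auto_thresholds(baseline)
print(f"alert if outside [{lo:.1f}, {hi:.1f}]")
```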

Evaluation Framework

Quantified Baseline (source: Armorer agent telemetry, Aug 2025):

Metric                     | Measured Baseline
---------------------------|------------------------------
Agent config changes/week  | 2.3 avg (n=1,240 agents)
Undetected drift incidents | 0.7/agent/year (n=89 teams)
Mean time to detect drift  | 16.7 hours (n=327 incidents)

Recoverable value: 1,800 agents × 0.7 incidents × $8.4K = $10.6M/year

Drift Detection Coverage:

Dimension                 | Phase 1 | Why Not
--------------------------|---------|----------------------------------------
Multi-metric correlation  | ❌      | Requires multivariate analysis (v1.2)
Tool output validation    | ❌      | Needs semantic diffing (v1.3)
Context window saturation | ❌      | LLM-specific instrumentation required

┌──────────────────────────────────────────────────────┐
│ Agent Drift Dashboard                          ⋮ ⚙ ⋯ │
├──────────────┬───────────────┬─────────┬─────────────┤
│ Agent        │ Status        │ Last    │ Drift Score │
├──────────────┼───────────────┼─────────┼─────────────┤
│ SupportBot   │ ✅ Normal     │ 2h ago  │ 12          │
│ **DealScout**│ ⚠ **Drifting**│ **15m** │ **84**      │
├──────────────┴───────────────┴─────────┴─────────────┤
│ [ DealScout: 3 metrics drifting ]                    │
│ • Approval rate: 22% (baseline: 8%)                  │
│ • Session duration: +142%                            │
│ • Token usage: +78%                                  │
│ [Suggested Action] Roll back to config v3.1          │
└──────────────────────────────────────────────────────┘
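
The per-metric deltas in the mockup can be derived with a simple relative-deviation check. The 25% tolerance and metric names below are illustrative assumptions; the sample values reproduce the DealScout panel:

```python
def metric_drift(current: dict[str, float], baseline: dict[str, float],
                 tolerance: float = 0.25) -> dict[str, float]:
    """Relative deviation per metric; keep only entries exceeding tolerance."""
    return {m: (current[m] - baseline[m]) / baseline[m]
            for m in baseline
            if abs(current[m] - baseline[m]) / baseline[m] > tolerance}

baseline = {"approval_rate": 0.08, "session_s": 120, "tokens": 1_000}
current  = {"approval_rate": 0.22, "session_s": 290, "tokens": 1_780}
drifting = metric_drift(current, baseline)
for metric, delta in drifting.items():
    print(f"{metric}: {delta:+.0%} vs baseline")  # e.g. tokens: +78% vs baseline
```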

Human-in-the-Loop Design

Mandatory Human Gates:

  • Drift severity triage: Users classify alerts as "critical", "investigate", or "ignore" before clearing
  • Baseline recertification: After 5 config changes, require manual "re-baseline" confirmation

Escalation Protocol:

  1. Auto-flag agents with >3 drift events in 7 days
  2. Require senior team member review before re-enabling
  3. Integrate with approval workflows — drifting agents pause at next manual gate
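
Step 1 of the protocol reduces to a sliding-window count; a minimal sketch, with function and variable names assumed for illustration:

```python
from datetime import datetime, timedelta

def needs_escalation(drift_events: list[datetime],
                     now: datetime,
                     max_events: int = 3,
                     window: timedelta = timedelta(days=7)) -> bool:
    """Flag an agent with more than `max_events` drift events in the window."""
    recent = [t for t in drift_events if now - t <= window]
    return len(recent) > max_events

now = datetime(2026, 5, 12)
events = [now - timedelta(days=d) for d in (1, 2, 3, 5)]
print(needs_escalation(events, now))  # True: 4 events in the last 7 days
```

A flagged agent would then block on senior review (step 2) and pause at its next manual gate (step 3).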

Trust & Guardrails

Trust Metrics:

  1. False positive rate: <5% (measured by user-marked "invalid" alerts)
  2. Mean time to diagnose: <15 minutes (stopwatch from alert to action)

Kill Criteria (90-day post-launch):

  1. >12% false positive rate sustained for 2 weeks
  2. >40% of users disable alerts without review
  3. Drift detection misses >1 critical incident with $50K+ impact