Armorer
Executive Brief
Problem: AI agents deployed in local environments degrade silently over time due to data drift, prompt injection, or dependency changes. Armorer users today manually inspect sessions to catch anomalies — a reactive approach that misses subtle behavioral shifts. Our Q3 2025 user survey (n=89) found 72% of teams experienced at least one critical agent failure in production, with mean detection time of 16.7 hours and $8.4K mean incident cost (source: Armorer support case analysis).
Business Case:
1,800 active agents × 0.5 undetected drifts/agent/year × $8.4K/incident = $7.56M/year recoverable loss
(Sources: agent count from Armorer telemetry (Aug 2025), drift frequency from Gartner "AI Failure Modes 2025", incident cost from internal case analysis)
If adoption reaches 40% of agents: $3.02M/year recoverable
This feature IS automated drift detection with forensic diffing for local AI agents. It is NOT real-time model monitoring, global performance tracking, or a replacement for CI/CD pipelines.
Success Metrics
Primary Metrics:
| Metric | Baseline | Target (D90) | Kill Threshold | Method |
|---|---|---|---|---|
| Drift detection lead time | 16.7h | ≤2h | >8h | Telemetry timestamps |
| Undetected drift incidents | 0.7/agent/yr | ≤0.2 | >0.5 | Support case correlation |
| False positive rate | N/A | ≤5% | >12% | User feedback logs |
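The lead-time metric is computed from telemetry timestamps. A minimal sketch of that computation, assuming hypothetical `onset_at` (reconstructed from the correlated support case) and `detected_at` (from the Armorer alert record) fields:

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records: drift onset reconstructed from the correlated
# support case, detection timestamp taken from Armorer alert telemetry.
incidents = [
    {"onset_at": datetime(2025, 8, 3, 4, 0),   "detected_at": datetime(2025, 8, 3, 5, 10)},
    {"onset_at": datetime(2025, 8, 9, 22, 30), "detected_at": datetime(2025, 8, 10, 1, 45)},
]

def lead_time_hours(incident):
    """Hours between drift onset and the first Armorer alert."""
    delta = incident["detected_at"] - incident["onset_at"]
    return delta.total_seconds() / 3600

mean_lead_time = mean(lead_time_hours(i) for i in incidents)
print(f"Mean drift detection lead time: {mean_lead_time:.1f}h (target ≤2h, kill >8h)")
```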
Guardrail Metrics:
| Guardrail | Threshold | Action |
|---|---|---|
| Agent startup latency | ≤50ms increase | Roll back detection model |
| Local CPU overhead | ≤3% | Throttle background scans |
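A sketch of how the CPU guardrail might be enforced, assuming the background scanner runs inside the agent process and using `psutil` to sample that process's CPU share; the back-off interval is illustrative:

```python
import time
import psutil

CPU_BUDGET_PERCENT = 3.0   # guardrail: background scans must stay under ~3% CPU
BACKOFF_SECONDS = 30       # illustrative back-off when over budget

proc = psutil.Process()    # assumes the scanner shares the agent process

def run_scan_cycle(scan_fn):
    """Run one background drift scan, throttling when we exceed the CPU budget."""
    proc.cpu_percent(interval=None)        # prime: cpu_percent measures since last call
    scan_fn()
    usage = proc.cpu_percent(interval=None)  # CPU share consumed during the scan
    if usage > CPU_BUDGET_PERCENT:
        time.sleep(BACKOFF_SECONDS)        # over budget: delay the next scan cycle
```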
What We Are NOT Measuring:
- Total alerts generated (measures noise, not value)
- "Drift score" amplitude (actionability matters, not magnitude)
- Features enabled/disabled (proxy for usability, not trust)
Risk Register
TECHNICAL: Metric poisoning via adversarial inputs
- Probability: Medium | Impact: High
- Mitigation: Sanitize tool call inputs + anomaly detection on metric distributions (Eng: Priya, by v1.1); see the sketch after this register
ADOPTION: Alert fatigue from threshold misconfiguration
- Probability: High | Impact: Medium
- Mitigation: Default "quiet hours" config + adaptive sensitivity tuning (PM: Alex, launch day)
COMPETITIVE: LangChain releases open-source drift detector
- Probability: Low | Impact: High
- Mitigation: Deep integration with Armorer's session replay (Eng: Marco, by v1.2)
COMPLIANCE: EU AI Act "high-risk" classification for drift controls
- Probability: Low | Impact: Critical
- Mitigation: Legal review for local data processing exemption (Counsel: Sophia, before beta)
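The metric-poisoning mitigation pairs input sanitization with anomaly detection on the metric distributions themselves. A minimal sketch of one possible distribution check, using a median/MAD robust z-score so a handful of adversarial sessions cannot drag the baseline toward themselves (the cutoff is illustrative, not a spec for Priya's team):

```python
from statistics import median

def robust_z(value, history):
    """Robust z-score of `value` against `history` using median and MAD,
    so a few poisoned samples can't shift the baseline toward themselves."""
    med = median(history)
    mad = median(abs(x - med) for x in history) or 1e-9  # avoid divide-by-zero
    return 0.6745 * (value - med) / mad  # 0.6745 ≈ MAD-to-sigma factor for normal data

def looks_poisoned(new_value, history, z_cutoff=6.0):
    """Flag metric samples implausibly far from the robust baseline (cutoff illustrative)."""
    return abs(robust_z(new_value, history)) > z_cutoff

# Example: per-session token counts, with one adversarial outlier
history = [310, 295, 330, 305, 298, 315, 320]
print(looks_poisoned(9_000, history))  # True: quarantine before it enters the baseline
print(looks_poisoned(340, history))    # False: ordinary variation
```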
Open Questions
Pre-Mortem:
"It is 6 months from now and this feature has failed. The 3 most likely reasons are:"
- Teams ignored thresholds they didn't set, letting critical drifts bypass triage
- Baseline snapshots bloated local storage, causing agents to fail on resource-constrained devices
- Competitor X open-sourced a CLI tool that does 80% of this at zero cost
Success looks like:
Support tickets for "agent acting weird" drop by 60%. Users reference drift scores in sprint retros. The CEO cites "catching DealScout's 4am drift" as proof we prevent AI failures before customers notice.
Assumptions vs Validated:
| Assumption | Status |
|---|---|
| Tool call sequences can be fingerprinted | ⚠ Unvalidated — needs PoC from ML Eng by 10/15 |
| Local storage can handle 90d baselines | ⚠ Unvalidated — benchmark on Raspberry Pi by 10/22 |
| EU data processing qualifies for "limited risk" | ⚠ Unvalidated — legal sign-off required by 11/30 |
Model Goals & KPIs
Core Objectives:
- Detect functional drift (tool call patterns, reasoning paths), not just statistical drift (see the fingerprinting sketch after this list)
- Surface actionable insights — not just alerts — with root cause hypotheses
- Preserve local execution privacy — no external data egress
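The first objective rests on the assumption, flagged as unvalidated above, that tool call sequences can be fingerprinted. A minimal sketch of what that PoC could look like, fingerprinting a session as the multiset of adjacent tool-call pairs and comparing live traffic to the baseline with a Jaccard-style overlap (both choices are illustrative):

```python
from collections import Counter

def fingerprint(tool_calls):
    """Fingerprint a session as the multiset of adjacent tool-call pairs (bigrams)."""
    return Counter(zip(tool_calls, tool_calls[1:]))

def overlap(baseline_fp, live_fp):
    """Jaccard-style overlap between two fingerprints; 1.0 means identical behaviour."""
    shared = sum((baseline_fp & live_fp).values())
    total = sum((baseline_fp | live_fp).values())
    return shared / total if total else 1.0

# Hypothetical sessions: baseline behaviour vs. a drifted tool-call pattern
baseline = fingerprint(["search", "fetch_page", "summarize", "send_reply"])
drifted  = fingerprint(["search", "approve_deal", "approve_deal", "send_reply"])
print(f"overlap: {overlap(baseline, drifted):.2f}")  # low overlap = candidate drift signal
```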
Design Decisions:
- Decision: What constitutes "baseline" behavior?
  Choice: First 72 hours post-deployment, provided at least 100 sessions are captured (avoids cold-start artifacts)
  Rationale: Rejected fixed-time windows (insufficient session diversity) and synthetic baselining (diverges from real use)
- Decision: How to handle threshold configuration?
  Choice: Auto-set initial thresholds using interquartile ranges, allow user override
  Rationale: Rejected fully manual thresholds (causes setup friction) and immutable auto-thresholds (ignores business context)
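A minimal sketch of both decisions together: gate the baseline on the 72-hour / 100-session rule, then auto-set alert bounds from the interquartile range with a user-overridable multiplier (the 1.5 default is the conventional Tukey fence, used here for illustration):

```python
from statistics import quantiles

def baseline_ready(sessions, hours_since_deploy):
    """Baseline = first 72h post-deployment, provided at least 100 sessions exist."""
    return hours_since_deploy >= 72 and len(sessions) >= 100

def auto_thresholds(values, k=1.5):
    """IQR-based alert bounds; `k` is user-overridable to encode business context."""
    q1, _, q3 = quantiles(values, n=4)   # quartiles of the baseline metric
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr    # (lower, upper) alert bounds

# Example: hypothetical baseline approval rates (%) for DealScout
baseline_rates = [7.5, 8.0, 8.2, 7.9, 8.4, 8.1, 7.7, 8.3]
low, high = auto_thresholds(baseline_rates)
print(f"alert outside [{low:.1f}%, {high:.1f}%]")  # a 22% approval rate would trip this
```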
Evaluation Framework
Quantified Baseline (source: Armorer agent telemetry, Aug 2025):
| Metric | Measured Baseline |
|---|---|
| Agent config changes/week | 2.3 avg (n=1,240 agents) |
| Undetected drift incidents | 0.7/agent/year (n=89 teams) |
| Mean time to detect drift | 16.7 hours (n=327 incidents) |
Recoverable value (using the internally measured 0.7 incidents/agent/year rather than the Gartner 0.5 estimate in the Executive Brief): 1,800 agents × 0.7 incidents × $8.4K ≈ $10.6M/year
Drift Detection Coverage (dimensions explicitly deferred beyond Phase 1):
| Dimension | In Phase 1? | Why Not |
|---|---|---|
| Multi-metric correlation | ❌ | Requires multivariate analysis (v1.2) |
| Tool output validation | ❌ | Needs semantic diffing (v1.3) |
| Context window saturation | ❌ | LLM-specific instrumentation required |
Dashboard concept (illustrative mock-up):
┌───────────────────────────────────────────────────────┐
│ Agent Drift Dashboard ⋮ ⚙ ⋯ │
├──────────────┬───────────────┬─────────┬─────────────┤
│ Agent │ Status │ Last │ Drift Score │
│──────────────┼───────────────┼─────────┼─────────────│
│ SupportBot │ ✅ Normal │ 2h ago │ 12 │
│ **DealScout**│ ⚠ **Drifting**│ **15m** │ **84** │
├──────────────┴───────────────┴─────────┴─────────────┤
│ [ DealScout: 3 metrics drifting ] │
│ • Approval rate: 22% (baseline: 8%) │
│ • Session duration: +142% │
│ • Token usage: +78% │
│ [Suggested Action] Roll back to config v3.1 │
└───────────────────────────────────────────────────────┘
Human-in-the-Loop Design
Mandatory Human Gates:
- Drift severity triage: Users classify alerts as "critical", "investigate", or "ignore" before clearing
- Baseline recertification: After 5 config changes, require manual "re-baseline" confirmation
Escalation Protocol:
- Auto-flag agents with >3 drift events in 7 days
- Require senior team member review before re-enabling
- Integrate with approval workflows — drifting agents pause at next manual gate
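A sketch of the auto-flag rule, assuming each drift event is recorded with a timestamp; the pause would surface at the agent's next manual gate in the existing approval workflow:

```python
from datetime import datetime, timedelta

FLAG_WINDOW = timedelta(days=7)
FLAG_THRESHOLD = 3  # >3 drift events in 7 days triggers escalation

def needs_escalation(drift_events, now=None):
    """True if more than FLAG_THRESHOLD drift events landed in the trailing 7 days.
    `drift_events` is a list of event timestamps (illustrative shape)."""
    now = now or datetime.now()
    recent = [t for t in drift_events if now - t <= FLAG_WINDOW]
    return len(recent) > FLAG_THRESHOLD

def on_drift_event(agent, drift_events):
    """Pause the agent at its next manual gate until a senior reviewer re-enables it."""
    if needs_escalation(drift_events):
        agent["paused_pending_review"] = True  # hypothetical flag consumed by the approval workflow
        agent["reviewer_required"] = "senior"
```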
Trust & Guardrails
Trust Metrics:
- False positive rate: ≤5% (measured by user-marked "invalid" alerts)
- Mean time to diagnose: <15 minutes (measured from alert delivery to the first remediation action)
Kill Criteria (90-day post-launch):
- >12% false positive rate sustained for 2 weeks
- >40% of users disable alerts without review
- Drift detection misses >1 critical incident with $50K+ impact
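The false-positive kill criterion can be computed directly from user-marked alert feedback. A minimal sketch, assuming each alert record carries a user verdict and a calendar day, and reading "sustained for 2 weeks" as every day in a trailing 14-day window exceeding 12% (one plausible interpretation, not a settled definition):

```python
from collections import defaultdict
from datetime import date, timedelta

FP_KILL_RATE = 0.12
SUSTAIN_DAYS = 14

def daily_fp_rates(alerts):
    """alerts: list of {'day': date, 'user_verdict': 'valid' | 'invalid'} (illustrative shape)."""
    per_day = defaultdict(lambda: [0, 0])            # day -> [invalid count, total count]
    for a in alerts:
        per_day[a["day"]][1] += 1
        if a["user_verdict"] == "invalid":
            per_day[a["day"]][0] += 1
    return {d: inv / tot for d, (inv, tot) in per_day.items()}

def kill_criterion_met(alerts, end_day):
    """True if the FP rate exceeded 12% on every day of the trailing 14-day window."""
    rates = daily_fp_rates(alerts)
    window = [end_day - timedelta(days=i) for i in range(SUSTAIN_DAYS)]
    return all(rates.get(d, 0.0) > FP_KILL_RATE for d in window)
```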