Armorer
Executive Brief
Problem: AI agents deployed in local environments degrade silently over time due to data drift, prompt injection, or dependency changes. Armorer users today manually inspect sessions to catch anomalies — a reactive approach that misses subtle behavioral shifts. Our Q3 2025 user survey (n=89) found 72% of teams experienced at least one critical agent failure in production, with mean detection time of 16.7 hours and $8.4K mean incident cost (source: Armorer support case analysis).
Business Case:
1,800 active agents × 0.5 undetected drifts/agent/year × $8.4K/incident = $7.56M/year recoverable loss
(Sources: agent count from Armorer telemetry (Aug 2025), drift frequency from Gartner "AI Failure Modes 2025", incident cost from internal case analysis)
If adoption reaches 40% of agents: $3.02M/year recoverable
This feature IS automated drift detection with forensic diffing for local AI agents. It is NOT real-time model monitoring, global performance tracking, or a replacement for CI/CD pipelines.
Success Metrics
Primary Metrics:
| Metric | Baseline | Target (D90) | Kill Threshold | Method |
|---|---|---|---|---|
| Drift detection lead time | 16.7h | ≤2h | >8h | Telemetry timestamps |
| Undetected drift incidents | 0.7/agent/yr | ≤0.2 | >0.5 | Support case correlation |
| False positive rate | N/A | ≤5% | >12% | User feedback logs |
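The lead-time metric is computed from telemetry timestamps. A minimal sketch of that computation, assuming hypothetical `onset_at` (reconstructed from the correlated support case) and `detected_at` (from the Armorer alert record) fields:

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records: drift onset reconstructed from the correlated
# support case, detection timestamp taken from Armorer alert telemetry.
incidents = [
    {"onset_at": datetime(2025, 8, 3, 4, 0),   "detected_at": datetime(2025, 8, 3, 5, 10)},
    {"onset_at": datetime(2025, 8, 9, 22, 30), "detected_at": datetime(2025, 8, 10, 1, 45)},
]

def lead_time_hours(incident):
    """Hours between drift onset and the first Armorer alert."""
    delta = incident["detected_at"] - incident["onset_at"]
    return delta.total_seconds() / 3600

mean_lead_time = mean(lead_time_hours(i) for i in incidents)
print(f"Mean drift detection lead time: {mean_lead_time:.1f}h (target ≤2h, kill >8h)")
```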
Guardrail Metrics:
| Guardrail | Threshold | Action |
|---|---|---|
| Agent startup latency | ≤50ms increase | Roll back detection model |
| Local CPU overhead | ≤3% | Throttle background scans |
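A sketch of how the CPU guardrail might be enforced, assuming the background scanner runs inside the agent process and using `psutil` to sample that process's CPU share; the back-off interval is illustrative:

```python
import time
import psutil

CPU_BUDGET_PERCENT = 3.0   # guardrail: background scans must stay under ~3% CPU
BACKOFF_SECONDS = 30       # illustrative back-off when over budget

proc = psutil.Process()    # assumes the scanner shares the agent process

def run_scan_cycle(scan_fn):
    """Run one background drift scan, throttling when we exceed the CPU budget."""
    proc.cpu_percent(interval=None)        # prime: cpu_percent measures since last call
    scan_fn()
    usage = proc.cpu_percent(interval=None)  # CPU share consumed during the scan
    if usage > CPU_BUDGET_PERCENT:
        time.sleep(BACKOFF_SECONDS)        # over budget: delay the next scan cycle
```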
What We Are NOT Measuring:
- Total alerts generated (measures noise, not value)
- "Drift score" amplitude (actionability matters, not magnitude)
- Features enabled/disabled (proxy for usability, not trust)
Risk Register
TECHNICAL: Metric poisoning via adversarial inputs
- Probability: Medium | Impact: High
- Mitigation: Sanitize tool call inputs + anomaly detection on metric distributions (Eng: Priya, by v1.1); see the sketch after this register
ADOPTION: Alert fatigue from threshold misconfiguration
- Probability: High | Impact: Medium
- Mitigation: Default "quiet hours" config + adaptive sensitivity tuning (PM: Alex, launch day)
COMPETITIVE: LangChain releases open-source drift detector
- Probability: Low | Impact: High
- Mitigation: Deep integration with Armorer's session replay (Eng: Marco, by v1.2)
COMPLIANCE: EU AI Act "high-risk" classification for drift controls
- Probability: Low | Impact: Critical
- Mitigation: Legal review for local data processing exemption (Counsel: Sophia, before beta)
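The metric-poisoning mitigation pairs input sanitization with anomaly detection on the metric distributions themselves. A minimal sketch of one possible distribution check, using a median/MAD robust z-score so a handful of adversarial sessions cannot drag the baseline toward themselves (the cutoff is illustrative, not a spec for Priya's team):

```python
from statistics import median

def robust_z(value, history):
    """Robust z-score of `value` against `history` using median and MAD,
    so a few poisoned samples can't shift the baseline toward themselves."""
    med = median(history)
    mad = median(abs(x - med) for x in history) or 1e-9  # avoid divide-by-zero
    return 0.6745 * (value - med) / mad  # 0.6745 ≈ MAD-to-sigma factor for normal data

def looks_poisoned(new_value, history, z_cutoff=6.0):
    """Flag metric samples implausibly far from the robust baseline (cutoff illustrative)."""
    return abs(robust_z(new_value, history)) > z_cutoff

# Example: per-session token counts, with one adversarial outlier
history = [310, 295, 330, 305, 298, 315, 320]
print(looks_poisoned(9_000, history))  # True: quarantine before it enters the baseline
print(looks_poisoned(340, history))    # False: ordinary variation
```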
Open Questions
Pre-Mortem:
"It is 6 months from now and this feature has failed. The 3 most likely reasons are:"
- Teams ignored thresholds they didn't set, letting critical drifts bypass triage
- Baseline snapshots bloated local storage, causing agents to fail on resource-constrained devices
- Competitor X open-sourced a CLI tool that does 80% of this at zero cost
Success looks like:
Support tickets for "agent acting weird" drop by 60%. Users reference drift scores in sprint retros. The CEO cites "catching DealScout's 4am drift" as proof we prevent AI failures before customers notice.
Assumptions vs Validated:
| Assumption | Status |
|---|---|
| Tool call sequences can be fingerprinted | ⚠ Unvalidated — needs PoC from ML Eng by 10/15 |
| Local storage can handle 90d baselines | ⚠ Unvalidated — benchmark on Raspberry Pi by 10/22 |
| EU data processing qualifies for "limited risk" | ⚠ Unvalidated — legal sign-off required by 11/30 |
Model Goals & KPIs
Core Objectives:
- Detect functional drift (tool call patterns, reasoning paths), not just statistical drift (see the fingerprinting sketch after this list)
- Surface actionable insights — not just alerts — with root cause hypotheses
- Preserve local execution privacy — no external data egress
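The first objective rests on the assumption, flagged as unvalidated above, that tool call sequences can be fingerprinted. A minimal sketch of what that PoC could look like, fingerprinting a session as the multiset of adjacent tool-call pairs and comparing live traffic to the baseline with a Jaccard-style overlap (both choices are illustrative):

```python
from collections import Counter

def fingerprint(tool_calls):
    """Fingerprint a session as the multiset of adjacent tool-call pairs (bigrams)."""
    return Counter(zip(tool_calls, tool_calls[1:]))

def overlap(baseline_fp, live_fp):
    """Jaccard-style overlap between two fingerprints; 1.0 means identical behaviour."""
    shared = sum((baseline_fp & live_fp).values())
    total = sum((baseline_fp | live_fp).values())
    return shared / total if total else 1.0

# Hypothetical sessions: baseline behaviour vs. a drifted tool-call pattern
baseline = fingerprint(["search", "fetch_page", "summarize", "send_reply"])
drifted  = fingerprint(["search", "approve_deal", "approve_deal", "send_reply"])
print(f"overlap: {overlap(baseline, drifted):.2f}")  # low overlap = candidate drift signal
```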
Design Decisions:
- Decision: What constitutes "baseline" behavior?
  Choice: First 72 hours post-deployment, provided at least 100 sessions are captured (avoids cold-start artifacts)
  Rationale: Rejected fixed-time windows (insufficient session diversity) and synthetic baselining (diverges from real use)
- Decision: How to handle threshold configuration?
  Choice: Auto-set initial thresholds using interquartile ranges, allow user override
  Rationale: Rejected fully manual thresholds (causes setup friction) and immutable auto-thresholds (ignores business context)
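A minimal sketch of both decisions together: gate the baseline on the 72-hour / 100-session rule, then auto-set alert bounds from the interquartile range with a user-overridable multiplier (the 1.5 default is the conventional Tukey fence, used here for illustration):

```python
from statistics import quantiles

def baseline_ready(sessions, hours_since_deploy):
    """Baseline = first 72h post-deployment, provided at least 100 sessions exist."""
    return hours_since_deploy >= 72 and len(sessions) >= 100

def auto_thresholds(values, k=1.5):
    """IQR-based alert bounds; `k` is user-overridable to encode business context."""
    q1, _, q3 = quantiles(values, n=4)   # quartiles of the baseline metric
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr    # (lower, upper) alert bounds

# Example: hypothetical baseline approval rates (%) for DealScout
baseline_rates = [7.5, 8.0, 8.2, 7.9, 8.4, 8.1, 7.7, 8.3]
low, high = auto_thresholds(baseline_rates)
print(f"alert outside [{low:.1f}%, {high:.1f}%]")  # a 22% approval rate would trip this
```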
Evaluation Framework
Quantified Baseline (source: Armorer agent telemetry, Aug 2025):
| Metric | Measured Baseline |
|---|---|
| Agent config changes/week | 2.3 avg (n=1,240 agents) |
| Undetected drift incidents | 0.7/agent/year (n=89 teams) |
| Mean time to detect drift | 16.7 hours (n=327 incidents) |
Recoverable value (using the internally measured 0.7 incidents/agent/year rather than the Gartner 0.5 estimate in the Executive Brief): 1,800 agents × 0.7 incidents × $8.4K ≈ $10.6M/year
Drift Detection Coverage (dimensions explicitly deferred beyond Phase 1):
| Dimension | In Phase 1? | Why Not |
|---|---|---|
| Multi-metric correlation | ❌ | Requires multivariate analysis (v1.2) |
| Tool output validation | ❌ | Needs semantic diffing (v1.3) |
| Context window saturation | ❌ | LLM-specific instrumentation required |
Dashboard concept (illustrative mock-up):
┌───────────────────────────────────────────────────────┐
│ Agent Drift Dashboard ⋮ ⚙ ⋯ │
├──────────────┬───────────────┬─────────┬─────────────┤
│ Agent │ Status │ Last │ Drift Score │
│──────────────┼───────────────┼─────────┼─────────────│
│ SupportBot │ ✅ Normal │ 2h ago │ 12 │
│ **DealScout**│ ⚠ **Drifting**│ **15m** │ **84** │
├──────────────┴───────────────┴─────────┴─────────────┤
│ [ DealScout: 3 metrics drifting ] │
│ • Approval rate: 22% (baseline: 8%) │
│ • Session duration: +142% │
│ • Token usage: +78% │
│ [Suggested Action] Roll back to config v3.1 │
└───────────────────────────────────────────────────────┘
Human-in-the-Loop Design
Mandatory Human Gates:
- Drift severity triage: Users classify alerts as "critical", "investigate", or "ignore" before clearing
- Baseline recertification: After 5 config changes, require manual "re-baseline" confirmation
Escalation Protocol:
- Auto-flag agents with >3 drift events in 7 days
- Require senior team member review before re-enabling
- Integrate with approval workflows — drifting agents pause at next manual gate
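A sketch of the auto-flag rule, assuming each drift event is recorded with a timestamp; the pause would surface at the agent's next manual gate in the existing approval workflow:

```python
from datetime import datetime, timedelta

FLAG_WINDOW = timedelta(days=7)
FLAG_THRESHOLD = 3  # >3 drift events in 7 days triggers escalation

def needs_escalation(drift_events, now=None):
    """True if more than FLAG_THRESHOLD drift events landed in the trailing 7 days.
    `drift_events` is a list of event timestamps (illustrative shape)."""
    now = now or datetime.now()
    recent = [t for t in drift_events if now - t <= FLAG_WINDOW]
    return len(recent) > FLAG_THRESHOLD

def on_drift_event(agent, drift_events):
    """Pause the agent at its next manual gate until a senior reviewer re-enables it."""
    if needs_escalation(drift_events):
        agent["paused_pending_review"] = True  # hypothetical flag consumed by the approval workflow
        agent["reviewer_required"] = "senior"
```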
Trust & Guardrails
Trust Metrics:
- False positive rate: ≤5% (measured by user-marked "invalid" alerts)
- Mean time to diagnose: <15 minutes (measured from alert delivery to the first remediation action)
Kill Criteria (90-day post-launch):
- >12% false positive rate sustained for 2 weeks
- >40% of users disable alerts without review
- Drift detection misses >1 critical incident with $50K+ impact
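The false-positive kill criterion can be computed directly from user-marked alert feedback. A minimal sketch, assuming each alert record carries a user verdict and a calendar day, and reading "sustained for 2 weeks" as every day in a trailing 14-day window exceeding 12% (one plausible interpretation, not a settled definition):

```python
from collections import defaultdict
from datetime import date, timedelta

FP_KILL_RATE = 0.12
SUSTAIN_DAYS = 14

def daily_fp_rates(alerts):
    """alerts: list of {'day': date, 'user_verdict': 'valid' | 'invalid'} (illustrative shape)."""
    per_day = defaultdict(lambda: [0, 0])            # day -> [invalid count, total count]
    for a in alerts:
        per_day[a["day"]][1] += 1
        if a["user_verdict"] == "invalid":
            per_day[a["day"]][0] += 1
    return {d: inv / tot for d, (inv, tot) in per_day.items()}

def kill_criterion_met(alerts, end_day):
    """True if the FP rate exceeded 12% on every day of the trailing 14-day window."""
    rates = daily_fp_rates(alerts)
    window = [end_day - timedelta(days=i) for i in range(SUSTAIN_DAYS)]
    return all(rates.get(d, 0.0) > FP_KILL_RATE for d in window)
```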