| Problem | Evidence | Cost to Business |
|---|---|---|
| BFSI teams manually author model deployment specs, requiring 8.2 hours per model (n=32 client interviews) to document data contracts, fallback logic, and compliance controls. This delays deployments by 3-5 weeks and introduces critical gaps: 28% of specs omit RBI-mandated bias tests (source: 2024 arya.ai compliance audit). | 12.7 FTEs wasted annually per $1B AUM bank on spec creation/maintenance (source: Deloitte BFSI DevOps Survey 2025). 67% of production incidents trace to undocumented edge cases (source: internal RCA database, n=189). | Direct authoring: 42 models deployed/year × 8.2 hours × $98/hr (blended eng cost) ≈ $33.8K/year. Spec-driven delays: 42 models × 3.4 weeks × $9.8K/week ≈ $1.4M/year, for ~$1.42M/year total exposure (source: Regional Cost Benchmarks). At 40% adoption: $568K/year recoverable. |
| Solution | Mechanism | Expected Impact |
|---|---|---|
| AI-generated deployment specs via guided questionnaire | Convert 6 use-case inputs into auditable specs with embedded compliance checks and monitoring thresholds | Reduce spec creation time to ≤45 min. Cut deployment delays by 2.4 weeks/model. Eliminate 92% of compliance gaps (target). |
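The questionnaire-to-spec mechanism above can be sketched as a simple mapping. This is an illustrative Python sketch, not the production schema: the field names, the six-answer shape, and the `RBI_CHECKS` lists are assumptions made for the example.

```python
# Hypothetical sketch of the guided-questionnaire flow: six use-case
# answers are mapped to a deployment-spec dict with compliance checks
# attached by use case. Check names and fields are illustrative only.
RBI_CHECKS = {
    "credit": ["gender_bias_false_negatives", "rural_credit_clause"],
    "fraud": ["false_positive_rate_by_segment"],
    "kyc": ["fatf_watchlist_refresh"],
}

def generate_spec(answers: dict) -> dict:
    """answers holds the six questionnaire inputs."""
    use_case = answers["use_case"]  # e.g. "credit" | "fraud" | "kyc"
    spec = {
        "model_name": answers["model_name"],
        "inputs": answers["input_schema"],
        "outputs": answers["output_schema"],
        "risk_class": answers["risk_class"],  # e.g. "P1"
        "fallback": answers["fallback"],
        "compliance_checks": RBI_CHECKS.get(use_case, []),
        "monitoring": {"psi_threshold": 0.25},
    }
    # P1/P2 models without a fallback must carry a justification,
    # mirroring the adversarial-validation defense described later.
    if answers["risk_class"] in ("P1", "P2") and not answers["fallback"]:
        spec["requires_justification"] = True
    return spec
```

The point of the sketch is that compliance checks are attached by default from the use-case answer, rather than left to the author to remember.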
| Risk | Probability | Kill Criteria |
|---|---|---|
| Generator omits jurisdiction-specific requirements (e.g. RBI Master Direction on AI) | Medium | If >15% of generated specs fail compliance review in D90 pilot |
Synthesis
This feature automates model deployment spec generation for BFSI workflows using structured interviews, embedding regulatory guardrails by default. It is NOT a model validator, runtime monitor, or replacement for legal review. Our downside case: $568K/year at 40% adoption still justifies build costs (est. $310K).
Primary Metrics
| Metric | Baseline | Target | Kill Threshold | Measurement |
|---|---|---|---|---|
| Spec creation time | 8.2 hrs | ≤0.75 hrs | >2.5 hrs at D90 | Time-tracking |
| Compliance gaps | 2.1/spec | 0.2/spec | >0.8/spec at D90 | Audit results |
| Deployment delay | 3.4 wks | ≤1.5 wks | >2.8 wks at D90 | Jira logs |
Guardrail Metrics
| Guardrail | Threshold | Action |
|---|---|---|
| False alert rate | ≤3% | Tune thresholds |
| User edit rate | ≤25% | Improve templates |
What We Are NOT Measuring
Strategic Decisions Log
Decision: Support non-BFSI use cases?
Choice: Phase 1: BFSI-only (credit, fraud, KYC)
Rationale: 78% of revenue from BFSI; generic solution increases compliance risk
Decision: Real-time schema validation?
Choice: Require sample data upload
Rationale: Prevents hypothetical schemas; rejected "manual entry only" as error-prone
Premortem
It is 6 months post-launch and this feature failed. Top 3 reasons:
What success looks like:
Product teams deploy models in 48 hours with zero compliance tickets. Engineering VP says: "We redirected 9 FTEs to high-impact work." Auditors cite arya.ai specs as compliance benchmarks.
Assumptions vs Validated
| Assumption | Status |
|---|---|
| RBI allows AI-generated compliance docs | ⚠ Unvalidated — Legal signoff by 10/15 |
| Fraud teams accept auto-configured thresholds | ⚠ Unvalidated — Pilot with 3 banks by 11/30 |
| Schema generator handles nested JSON | ⚠ Unvalidated — Eng spike by 9/20 |
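The nested-JSON assumption above (eng spike by 9/20) amounts to recursive type inference over a sample payload. A minimal sketch of the question being spiked, assuming a JSON-Schema-like output shape that is not the shipped generator:

```python
# Illustrative sketch: recursively infer a field-type schema from a
# sample JSON payload, including nested objects and arrays.
def infer_schema(sample):
    if isinstance(sample, dict):
        return {"type": "object",
                "fields": {k: infer_schema(v) for k, v in sample.items()}}
    if isinstance(sample, list):
        item = infer_schema(sample[0]) if sample else {"type": "unknown"}
        return {"type": "array", "items": item}
    if isinstance(sample, bool):   # check bool before int: bool subclasses int
        return {"type": "boolean"}
    if isinstance(sample, int):
        return {"type": "integer"}
    if isinstance(sample, float):
        return {"type": "number"}
    return {"type": "string"}
```

This also motivates the "require sample data upload" decision above: inference needs a concrete payload, not a hypothetical schema.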
Core Objective
Generate production-ready deployment specs in under 1 hour that:
Competitive Landscape
| Capability | Tecton | Seldon | arya.ai |
|---|---|---|---|
| Auto-generates input/output schemas | ✅ | ✅ | ✅ |
| Embeds regulatory checklists | ❌ | Partial | ✅ (RBI/FATF preloaded) |
| Fallback logic templates | ✅ | ❌ | ✅ (BFSI-specific) |
| Where we lose | Ecosystem integration | Performance at scale | — |
Our wedge is compliance-by-design, because competitors treat regulation as an afterthought.
Quantified Baseline
| Metric | Measured Baseline |
|---|---|
| Spec creation time | 8.2 hours avg (n=32 client models) |
| Compliance gaps per spec | 2.1 critical omissions (2024 audit) |
| Deployment delay due to spec issues | 3.4 weeks (Q2 2025 ops review) |
Value recovery: 42 models × 3.4 weeks saved × $9.8K/week eng cost ≈ $1.4M/year.
Before/After Narrative
Before: Priya (Lead Data Eng, NeoBank) spends 3 days manually translating a fraud model’s Python notebook into a 40-page deployment spec. She misses RBI’s new requirement to monitor gender bias in false negatives, causing a 2am incident when biased rejections spike.
After: Priya answers 6 questions about the fraud model’s inputs, outputs, and risk class. The generator produces a compliant spec with pre-configured bias monitors. She deploys in 4 hours with explicit signoff from Legal.
Adversarial Validation
Attack: User selects "fraud detection" but inputs mismatched data types
Defense: Type coercion checks with user confirmations
Limitation: Cannot resolve semantic mismatches (e.g., "income" vs. "revenue")
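The type-coercion defense above can be sketched as a pre-flight check that separates coercible mismatches (surface to the user for confirmation) from hard failures. A hedged sketch; the function name and declared-schema shape are assumptions for illustration:

```python
# Sketch of the type-coercion defense: attempt to coerce each value to
# the declared type and flag fields that need user confirmation.
# Semantic mismatches (e.g. "income" vs. "revenue") are out of scope,
# matching the stated limitation.
def check_types(declared: dict, row: dict) -> list:
    """Return fields whose values need user-confirmed coercion."""
    needs_confirmation = []
    for field, expected in declared.items():
        value = row.get(field)
        if isinstance(value, expected):
            continue
        try:
            expected(value)  # coercible, but ask the user first
            needs_confirmation.append(field)
        except (TypeError, ValueError):
            raise ValueError(
                f"{field}: cannot coerce {value!r} to {expected.__name__}")
    return needs_confirmation
```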
Attack: Malicious actor injects script tags in field descriptions
Defense: Sanitize outputs to plaintext
Limitation: Loses rich formatting
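The plaintext defense above is essentially "strip script bodies, then escape the rest". A minimal sketch using the Python standard library; the function name is illustrative:

```python
# Minimal sketch of the plaintext defense: drop script bodies from
# user-supplied field descriptions, then escape remaining markup before
# embedding in the generated spec. Rich formatting is lost by design.
import html
import re

def sanitize_description(text: str) -> str:
    text = re.sub(r"<\s*script[^>]*>.*?<\s*/\s*script\s*>", "", text,
                  flags=re.IGNORECASE | re.DOTALL)  # remove script bodies
    return html.escape(text)                        # escape remaining markup
```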
Attack: User omits high-risk dependencies (e.g., "no fallback needed")
Defense: Require justification for P1/P2 model exemptions
Limitation: Cannot force fallbacks for non-critical models
Phased Acceptance Criteria
Phase 1 — MVP (6 weeks)
US#1 — Generate input/output schemas
US#2 — Embed RBI bias checks
Out of Scope (Phase 1)
| Feature | Why Not Phase 1 |
|---|---|
| Dynamic threshold tuning | Requires live traffic patterns |
| Cross-jurisdiction compliance | Limited to RBI/FATF baseline |
| On-prem deployment | Cloud-only initial release |
Mandatory Oversight Points
Pre-deployment:
Runtime:
Override Mechanics
Risk Register
Risk: Generator omits RBI Master Direction 5.2.3(c) for rural credit models
Probability: Medium | Impact: High
Mitigation: Preload jurisdiction-specific clauses (Owner: Compliance Lead by 9/30)
Fallback: If unvalidated by deadline, restrict rural model deployment
Risk: Generated thresholds cause false alerts (e.g., a default PSI alert threshold of 0.25 proves too sensitive)
Probability: High | Impact: Medium
Mitigation: Embed threshold calculators for common metrics (Owner: ML Eng by 10/15)
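For concreteness, the PSI threshold in question is the standard Population Stability Index over matched probability bins. A sketch of the calculator the mitigation refers to, with the 0.25 default taken from the risk note; the epsilon guard is an implementation assumption:

```python
# Sketch of a PSI threshold calculator: compute the Population Stability
# Index between baseline and live bin frequencies, so the alert threshold
# (0.25 here, per the risk note) can be tuned per metric.
import math

def psi(expected: list, actual: list, eps: float = 1e-6) -> float:
    """Standard PSI: sum of (actual - expected) * ln(actual / expected)."""
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # guard against log(0)
        total += (a - e) * math.log(a / e)
    return total

def drift_alert(expected, actual, threshold: float = 0.25) -> bool:
    return psi(expected, actual) > threshold
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as significant drift, which is why a fixed 0.25 can be too sensitive for naturally volatile metrics.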
Risk: Adversarial inputs exploit schema generator (e.g., fake field names)
Probability: Low | Impact: Critical
Mitigation: Input sanitization with allowlists (Owner: Security by 9/25)
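The allowlist mitigation above can be sketched as a conservative pattern check on field names. The pattern and length cap are assumptions for illustration, not the shipped security policy:

```python
# Illustrative allowlist defense for schema-generator inputs: accept only
# field names that start with a letter, use word characters, and stay
# under a bounded length. Everything else is rejected and reported.
import re

FIELD_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9_]{0,63}$")

def validate_field_names(names: list) -> list:
    """Return the rejected names so callers can report all at once."""
    return [n for n in names if not FIELD_NAME.fullmatch(n)]
```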
Kill Criteria
>15% of D90 specs require manual remediation for compliance gaps
Embedded Controls
Credit Models:
KYC Models:
Validation Protocol