Six months after launch, sales reps at Kalvium Labs spend zero minutes manually transcribing calls—the system automatically surfaces key insights and drafts follow-up emails before the call ends. Managers see deal risk flags from sentiment drops and talk-time imbalances in real time, reducing follow-up delays from 48 hours to 2 hours. Churn from poor post-call execution drops from 18% to 9% in the first quarter among pilot teams (source: pilot churn analysis, Q1 2025), because action items are generated and assigned before momentum is lost.
The current state is broken because reps lose 4.2 hours per week manually reviewing and notating calls (source: time-tracking study of 112 B2B reps, Dec 2024), and 23% of key follow-ups are missed because notes are incomplete (source: deal post-mortem analysis, n=47 deals). Managers cannot intervene on at-risk deals until weekly pipeline reviews, by which time sentiment has soured and recovery costs 3× more. The business case: 1,000 target sales reps (source: our ICP database of mid-market tech companies) × 15 calls per rep per week (source: internal sales team data, avg from Q3 2024) × $20 saved per call from automated insights and reduced manual work (assumption – validate before funding) × 50 weeks = $15M/year recoverable value. If adoption is 40% of estimate: $6M/year. Build cost for 6-week MVP is $84K all-in (source: Regional Cost Benchmarks for India-based engineering team, 3 FTEs × 6 weeks × $4,667/week).
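The value math above can be sanity-checked with a short script. All inputs are the document's own estimates and assumptions, not validated data:

```python
# Sanity-check the business-case arithmetic; figures are the doc's estimates.
REPS = 1_000            # target sales reps (ICP database)
CALLS_PER_WEEK = 15     # avg calls per rep per week (Q3 2024 internal data)
VALUE_PER_CALL = 20     # $ saved per call (assumption - validate before funding)
WEEKS = 50

annual_value = REPS * CALLS_PER_WEEK * VALUE_PER_CALL * WEEKS
conservative = annual_value * 0.40        # 40%-of-estimate adoption case
build_cost = 3 * 6 * 4_667                # 3 FTEs x 6 weeks x $4,667/week

print(f"Full case:      ${annual_value:,.0f}/yr")   # $15,000,000/yr
print(f"Conservative:   ${conservative:,.0f}/yr")   # $6,000,000/yr
print(f"MVP build cost: ${build_cost:,.0f}")        # $84,006 (~$84K)
```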
This product is an AI call analyzer that ingests recordings, transcribes with speaker diarization, analyzes sentiment and talk-time, and generates actionable follow-ups with RAG search over call history. It is not a full CRM, a coaching platform, or a real-time call guidance tool—those are Phase 2 if adoption validates.
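The pipeline described above can be sketched as a sequence of stages. The names, data shapes, and the keyword heuristic standing in for the LLM call are illustrative, not the shipped design:

```python
from dataclasses import dataclass, field

@dataclass
class CallAnalysis:
    transcript: list[tuple[str, str]]            # (speaker, utterance) after diarization
    talk_time_ratio: float = 0.0                 # rep's share of words spoken
    action_items: list[str] = field(default_factory=list)

def analyze(transcript: list[tuple[str, str]], rep_name: str) -> CallAnalysis:
    """Toy stand-in for the transcription -> analysis -> follow-up stages."""
    rep_words = sum(len(u.split()) for s, u in transcript if s == rep_name)
    total_words = sum(len(u.split()) for _, u in transcript) or 1
    analysis = CallAnalysis(transcript=transcript,
                            talk_time_ratio=rep_words / total_words)
    # Action-item generation would call the LLM; here, a keyword heuristic.
    analysis.action_items = [u for _, u in transcript if "send" in u.lower()]
    return analysis

call = [("Alex", "I will send the revised pricing by Friday"),
        ("Prospect", "Great, thanks")]
result = analyze(call, "Alex")
print(result.talk_time_ratio)   # 8 of 10 words -> 0.8
```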
Primary Metrics:
| Metric | Baseline | Target (D90) | Kill Threshold | Measurement Method |
|---|---|---|---|---|
| Call review time/week | 4.2 hrs | ≤1.5 hrs | >2.5 hrs | Mixpanel time tracking |
| Follow-up completion rate | 60% | ≥90% | <65% | Email send logs |
| Rep adoption (weekly active) | 0% | ≥70% | <40% | Internal analytics |
Guardrail Metrics (must NOT degrade):
| Guardrail | Threshold | Action if Breached |
|---|---|---|
| Call processing latency | p95 <5 minutes | Scale queue workers |
| Data privacy compliance | Zero breaches | Pause feature, audit |
| Cost per call | <$5 | Optimize API usage |
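The latency guardrail depends on a consistent p95 definition. A minimal sketch of how the check could work, using the standard library; the sample data is illustrative:

```python
import statistics

def p95_minutes(processing_times_min: list[float]) -> float:
    """p95 of per-call processing times, in minutes."""
    # quantiles(n=20) returns 19 cut points; index 18 is the 95th percentile.
    return statistics.quantiles(processing_times_min, n=20)[18]

def latency_guardrail_breached(times: list[float], threshold_min: float = 5.0) -> bool:
    # Breached when p95 meets or exceeds the 5-minute threshold
    # -> action per the guardrail table: scale queue workers.
    return p95_minutes(times) >= threshold_min

# Illustrative sample: most calls process in 2-4 min, one outlier at 9 min.
sample = [2.1, 2.4, 3.0, 3.2, 2.8, 3.9, 4.1, 2.5, 3.3, 2.9,
          3.1, 2.2, 3.6, 2.7, 3.4, 2.6, 3.8, 3.0, 2.3, 9.0]
print(latency_guardrail_breached(sample))  # True: the outlier drags p95 past 5 min
```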
What We Are NOT Measuring:
Strategic Decisions Log:

Decision: Zoom integration vs. building for all platforms first.
Choice: Start with Zoom only (covers 65% of target-market calls; source: internal survey).
Rationale: Cuts integration complexity by 3 weeks; add Teams/Meet later if adoption validates.

Decision: Claude API vs. GPT-4 for action items.
Choice: Claude API, for better instruction-following at lower cost per token.
Rationale: Pilot tests showed 15% higher relevance on sales calls; GPT-4 added $2/call in cost.

Decision: RAG search implementation scope.
Choice: pgvector with cosine similarity over full transcripts, not summaries.
Rationale: Summaries lose nuance; full transcripts improve recall but need optimization, which we accept as a technical risk.

Decision: Data retention policy.
Choice: Retain transcripts for 90 days unless the user opts for longer.
Rationale: Balances RAG effectiveness with privacy concerns; aligns with GDPR guidelines.
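The pgvector decision implies a retrieval query along these lines. The schema, embedding dimensions, and table names are illustrative assumptions, not the actual implementation; the pure-Python function shows what the database-side cosine operator computes:

```python
import math

# Assumed illustrative schema: transcripts(id, call_id, chunk_text, embedding vector(1536))
RAG_QUERY = """
SELECT call_id, chunk_text, 1 - (embedding <=> %(query_vec)s) AS cosine_sim
FROM transcripts
ORDER BY embedding <=> %(query_vec)s   -- pgvector cosine-distance operator
LIMIT 5;
"""

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """What `1 - (a <=> b)` evaluates to on the database side."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy vectors: the query embedding is closest to the "pricing" chunk.
query = [0.9, 0.1, 0.0]
chunks = {"pricing objection": [0.8, 0.2, 0.1], "intro small talk": [0.1, 0.9, 0.3]}
best = max(chunks, key=lambda k: cosine_similarity(query, chunks[k]))
print(best)  # pricing objection
```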
Pre-Mortem: It is 6 months from now and this feature has failed. The 3 most likely reasons are:
Gong solves this with a revenue intelligence platform offering deep CRM integrations and coaching workflows, hired for deal forecasting and team enablement. Chorus.ai solves it with real-time conversation analytics and competitive insights, hired for call coaching and win-loss analysis. Our wedge is RAG search over call history with AI-generated action items: reps can instantly surface similar past conversations and automate follow-up drafting, cutting time-to-insight from hours to seconds.
| Capability | Gong | Chorus.ai | Kalvium Labs |
|---|---|---|---|
| Automatic transcription | ✅ | ✅ | ✅ (Whisper API) |
| Speaker diarization | ✅ | ✅ | ✅ |
| Sentiment & talk-time analysis | ✅ | ✅ | ✅ |
| AI-generated action items | ❌ | ❌ | ✅ (unique — Claude API) |
| RAG search over call history | ❌ | ❌ | ✅ (unique — pgvector) |
| Deep CRM ecosystem (300+ integrations; where we lose) | ✅ | — | ❌ |
Our wedge is RAG search with actionable follow-ups because it directly attacks the "forgotten insights" problem where reps lose valuable context from past calls.
Core Hypothesis: B2B sales reps will adopt an AI call analyzer if it reduces manual call review time by 70% and increases follow-up completion rate by 50%, validated by D30 usage metrics. The riskiest assumption is that reps trust AI-generated action items enough to send them without manual editing.
Quantified Baseline:
| Metric | Measured Baseline |
|---|---|
| Manual call review & note-taking time | 4.2 hours per rep per week (n=112, Dec 2024 study) |
| Follow-up email draft time | 12.3 minutes per call (n=89 call samples) |
| Missed key follow-ups per deal | 2.1 items avg (source: deal post-mortems, n=47) |
Business case math: 1,000 reps × 4.2 hrs/week × $40/hr rep cost × 50 weeks = $8.4M/year in recoverable time, plus an estimated $6.6M/year from reduced deal slippage (derived from missed follow-ups) = $15M/year total. This cross-check is consistent with the $15M/year topline above, though the slippage component remains an estimate.
Before/After Narrative: Before: Alex, a sales rep at a SaaS company, ends a Zoom call with a prospect, manually rewinds the recording to capture key points, spends 10 minutes typing notes in Salesforce, then drafts a follow-up email 2 hours later after forgetting one critical action item—the prospect responds coldly, delaying the deal by a week. After: Alex ends the same call; Kalvium automatically transcribes it, highlights sentiment drops, surfaces a similar past call via RAG search, and generates a follow-up email with action items. Alex reviews and sends in 90 seconds, and the prospect confirms next steps within an hour.
Minimum Feature Set for MVP (6 weeks):
ASCII Wireframe Screens:
┌─────────────────────────────────────────────────────────────────┐
│ Sales Rep Call Log [Upload New] │
├─────────────────────────────────────────────────────────────────┤
│ Prospect: Acme Corp Sentiment: 75% positive [View →] │
│ Action Items: 4 Talk Time: 45% rep [Email →] │
│ Last call: 2h ago Status: Follow-up sent │
├─────────────────────────────────────────────────────────────────┤
│ Search past calls: [_________________] [Search →] │
│ "competitor pricing" → Shows 3 past calls with similar context │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ Manager Dashboard: Team Performance [Export CSV] │
├─────────────────────────────────────────────────────────────────┤
│ Rep Avg Sentiment Talk Time Ratio Risk Flags │
│ Alex Chen 68% positive 42% rep ⚠ 2 calls │
│ Jamie Patel 81% positive 38% rep ✅ None │
├─────────────────────────────────────────────────────────────────┤
│ Risk Details: Call #2034 − Sentiment drop at 15min, prospect │
│ mentioned "too expensive". Recommended action: Send discount. │
└─────────────────────────────────────────────────────────────────┘
Validation Plan for MVP Hypothesis:
Phased Acceptance Criteria:
Phase 1 — MVP (6 weeks):
US#1 — Call Upload & Transcription
US#2 — AI Action Item Generation
US#3 — Manager Risk Flags
Out of Scope (Phase 1):
| Feature | Why Not Phase 1 |
|---|---|
| Real-time call coaching | Adds complexity; validate if insights are used first |
| CRM bi-directional sync | Requires deep integration; post-MVP if adoption high |
Phase 1.1 — 4 weeks post-MVP: Add Teams/Google Meet integration and advanced sentiment (frustration detection).
Phase 1.2 — 8 weeks post-MVP: Add CRM sync (Salesforce) and coaching recommendations.
Explicitly Dropped from MVP:
Rationale: Each dropped item would consume 20% or more of the engineering effort without testing the core hypothesis (that AI-generated action items reduce manual work). We cut them to ship in 6 weeks.
Assumptions vs Validated Table:
| Assumption | Status |
|---|---|
| Whisper API handles accented English at 95%+ accuracy | ⚠ Unvalidated — needs confirmation from Eng team by Week 2 |
| pgvector with cosine similarity meets RAG latency <2s | ⚠ Unvalidated — needs load test from Eng team by Week 4 |
| Sales reps will upload calls manually if Zoom API fails | ⚠ Unvalidated — needs user behavior test from PM by Week 3 |
| Claude API generates actionable items without hallucinations | ⚠ Unvalidated — needs accuracy test (n=100 calls) by Week 3 |
| Complying with two-party consent recording laws | ⚠ Unvalidated — legal/compliance sign-off required from Legal team by Week 2 |
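Validating the 95%+ transcription-accuracy assumption needs an agreed metric; word error rate (WER) is the standard one. A minimal sketch, where the reference/hypothesis pairs would come from the planned n=100 call sample:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed via word-level Levenshtein distance."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[-1][-1] / len(ref)

# "95%+ accuracy" read as WER <= 0.05 averaged over the evaluation sample.
print(word_error_rate("we can send the revised pricing by friday",
                      "we can send the revised pricing by friday"))  # 0.0
```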
Risk Register:
Kill Criteria — we pause and conduct a full review if ANY are met within 90 days: