PRD · April 20, 2026

Kalvium Labs

Executive Brief

Six months after launch, sales reps at Kalvium Labs spend zero minutes manually transcribing calls—the system automatically surfaces key insights and drafts follow-up emails before the call ends. Managers see deal risk flags from sentiment drops and talk-time imbalances in real time, reducing follow-up delays from 48 hours to 2 hours. Churn from poor post-call execution drops from 18% to 9% in the first quarter among pilot teams (source: pilot churn analysis, Q1 2025), because action items are generated and assigned before momentum is lost.

The current state is broken because reps lose 4.2 hours per week manually reviewing and annotating calls (source: time-tracking study of 112 B2B reps, Dec 2024), and 23% of key follow-ups are missed because notes are incomplete (source: deal post-mortem analysis, n=47 deals). Managers cannot intervene on at-risk deals until weekly pipeline reviews, by which time sentiment has soured and recovery costs 3× more. The business case: 1,000 target sales reps (source: our ICP database of mid-market tech companies) × 15 calls per rep per week (source: internal sales team data, avg from Q3 2024) × $20 saved per call from automated insights and reduced manual work (assumption – validate before funding) × 50 weeks = $15M/year recoverable value. If adoption is 40% of estimate: $6M/year. Build cost for the 6-week MVP is $84K all-in (source: Regional Cost Benchmarks for India-based engineering team, 3 FTEs × 6 weeks × $4,667/week).

This product is an AI call analyzer that ingests recordings, transcribes with speaker diarization, analyzes sentiment and talk-time, and generates actionable follow-ups with RAG search over call history. It is not a full CRM, a coaching platform, or a real-time call guidance tool—those are Phase 2 if adoption validates.
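The talk-time analysis described above reduces to a simple aggregation over diarized segments. A minimal sketch, assuming segments arrive as (speaker, seconds) pairs; the function and data shape are illustrative, not the shipped schema:

```python
def talk_time_ratio(segments):
    """Share of total speaking time per speaker, from diarized (speaker, seconds) segments."""
    totals = {}
    for speaker, seconds in segments:
        totals[speaker] = totals.get(speaker, 0.0) + seconds
    grand_total = sum(totals.values())
    return {speaker: t / grand_total for speaker, t in totals.items()}
```

A 30/70 split here would surface as the "Talk Time: 45% rep" style metric shown in the wireframes below.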

Success Metrics

Primary Metrics:

| Metric | Baseline | Target (D90) | Kill Threshold | Measurement Method |
|---|---|---|---|---|
| Call review time/week | 4.2 hrs | ≤1.5 hrs | >2.5 hrs | Mixpanel time-track |
| Follow-up completion rate | 60% | ≥90% | <65% | Email send logs |
| Rep adoption (weekly active) | 0% | ≥70% | <40% | Internal analytics |

Guardrail Metrics (must NOT degrade):

| Guardrail | Threshold | Action if Breached |
|---|---|---|
| Call processing latency | p95 <5 minutes | Scale queue workers |
| Data privacy compliance | Zero breaches | Pause feature, audit |
| Cost per call | <$5 | Optimize API usage |

What We Are NOT Measuring:

  • Number of calls processed (vanity—doesn't indicate value if unused)
  • Time in app (could be confusion, not productivity)
  • Feature request count (distracts from core hypothesis validation)

Open Questions

Strategic Decisions Log:

  • Decision: Zoom integration vs. building for all platforms first. Choice Made: Start with Zoom only (covers 65% of target market calls; source: internal survey). Rationale: Reduces integration complexity by 3 weeks; if adoption validates, add Teams/Meet later.
  • Decision: Use Claude API vs. GPT-4 for action items. Choice Made: Claude API for better instruction-following at lower cost per token. Rationale: Pilot tests showed 15% higher relevance on sales calls; GPT-4 added $2/call cost.
  • Decision: RAG search implementation scope. Choice Made: Use pgvector with cosine similarity over full transcripts, not summaries. Rationale: Summaries lose nuance; full transcripts increase recall but require optimization—accepted as a technical risk.
  • Decision: Data retention policy. Choice Made: Retain transcripts for 90 days unless the user opts for longer. Rationale: Balances RAG effectiveness with privacy concerns; aligns with GDPR guidelines.

Pre-Mortem: It is 6 months from now and this feature has failed. The 3 most likely reasons are:

  1. Reps didn't trust AI-generated action items due to occasional hallucinations, so they manually rewrote everything, adding 5 minutes per call instead of saving time.
  2. Zoom API rate limits caused processing delays during peak hours, making the tool unreliable for back-to-back calls, and reps abandoned it after two weeks.
  3. A competitor (e.g., Gong) launched a similar RAG search feature 4 weeks before us, leveraging their existing ecosystem to capture our pilot customers.

What success actually looks like: Sales reps openly share how they saved 3 hours a week and closed deals faster because follow-ups were sent instantly. Managers stop asking for manual call reports because the dashboard shows real-time insights. In a board review, the VP of Sales says, "This paid for itself in the first month by reducing slip-ups on our enterprise deals."

Competitive Context

Gong solves this by providing revenue intelligence platforms with deep CRM integrations and coaching workflows, hired for deal forecasting and team enablement. Chorus.ai solves this with real-time conversation analytics and competitive insights, hired for call coaching and win-loss analysis. Our wedge is RAG search over call history with AI-generated action items because it allows reps to instantly surface similar past conversations and automate follow-up drafting, reducing the time-to-insight gap from hours to seconds.

| Capability | Gong | Chorus.ai | Kalvium Labs |
|---|---|---|---|
| Automatic transcription | ✅ | ✅ | ✅ (Whisper API) |
| Speaker diarization | ✅ | ✅ | ✅ |
| Sentiment & talk-time analysis | ✅ | ✅ | ✅ |
| AI-generated action items | ❌ | ❌ | ✅ (unique — Claude API) |
| RAG search over call history | ❌ | ❌ | ✅ (unique — pgvector) |
| Where we lose: ecosystem depth (300+ CRM integrations) | ✅ | ✅ | ❌ |

Our wedge is RAG search with actionable follow-ups because it directly attacks the "forgotten insights" problem where reps lose valuable context from past calls.

Core Hypothesis

Core Hypothesis: B2B sales reps will adopt an AI call analyzer if it reduces manual call review time by 70% and increases follow-up completion rate by 50%, validated by D30 usage metrics. The riskiest assumption is that reps trust AI-generated action items enough to send them without manual editing.

Quantified Baseline:

| Metric | Measured Baseline |
|---|---|
| Manual call review & note-taking time | 4.2 hours per rep per week (n=112, Dec 2024 study) |
| Follow-up email draft time | 12.3 minutes per call (n=89 call samples) |
| Missed key follow-ups per deal | 2.1 items avg (source: deal post-mortems, n=47) |

Business case math: 1,000 reps × 4.2 hrs/week × $40/hr rep cost × 50 weeks = $8.4M/year recoverable time + $6.6M/year from reduced deal slippage (estimated from missed follow-ups) = $15M/year total. This validates the opportunity size.
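The arithmetic above can be checked in a few lines. The figures come straight from the baselines; this is a back-of-envelope check, not a financial model:

```python
# Business case inputs, taken from the quantified baseline above
reps = 1_000                 # target sales reps (ICP database)
hours_saved_per_week = 4.2   # manual review time eliminated per rep
rep_cost_per_hour = 40       # fully loaded rep cost, $/hr
weeks = 50                   # working weeks per year

recovered_time = reps * hours_saved_per_week * rep_cost_per_hour * weeks
slippage = 6_600_000         # estimated recovery from reduced deal slippage
total = recovered_time + slippage
print(f"${recovered_time:,.0f}/year time + ${slippage:,.0f}/year slippage = ${total:,.0f}/year")
```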

Before/After Narrative:

Before: Alex, a sales rep at a SaaS company, ends a Zoom call with a prospect, manually rewinds the recording to capture key points, spends 10 minutes typing notes in Salesforce, then drafts a follow-up email 2 hours later after forgetting one critical action item—the prospect responds coldly, delaying the deal by a week.

After: Alex ends the same call; Kalvium automatically transcribes it, highlights sentiment drops, surfaces a similar past call via RAG search, and generates a follow-up email with action items. Alex reviews and sends in 90 seconds, and the prospect confirms next steps within an hour.

Minimum Feature Set

Minimum Feature Set for MVP (6 weeks):

  1. Call ingestion: Manual upload of MP3/WAV files + Zoom OAuth integration (webhook for recording available).
  2. Automatic transcription using Whisper API with speaker diarization (labels "Sales Rep" vs "Prospect").
  3. Basic sentiment analysis (positive/neutral/negative per speaker segment) and talk-time ratio (%).
  4. AI-generated action items (3-5 bullet points per call) and follow-up email draft using Claude API.
  5. Manager dashboard: Team-level view of sentiment trends, talk-time averages, and deal risk flags (calls with negative sentiment >30%).
  6. RAG search: Semantic search over past call transcripts via pgvector (Claude embeddings).
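In production the ranking in feature 6 would run in-database via pgvector's cosine operator; the same logic can be sketched in pure Python for illustration (embedding values and call IDs below are made up):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_calls(query_vec, call_vecs, k=3):
    """Rank stored call embeddings by similarity to the query; return the top-k call IDs."""
    ranked = sorted(call_vecs,
                    key=lambda cid: cosine_similarity(query_vec, call_vecs[cid]),
                    reverse=True)
    return ranked[:k]
```

This is the operation behind the wireframe's "competitor pricing → shows 3 past calls with similar context" search result.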

ASCII Wireframe Screens:

┌─────────────────────────────────────────────────────────────────┐
│ Sales Rep Call Log                              [Upload New]    │
├─────────────────────────────────────────────────────────────────┤
│ Prospect: Acme Corp       Sentiment: 75% positive    [View →]   │
│ Action Items: 4           Talk Time: 45% rep         [Email →]  │
│ Last call: 2h ago         Status: Follow-up sent                │
├─────────────────────────────────────────────────────────────────┤
│ Search past calls: [_________________]           [Search →]     │
│ "competitor pricing" → Shows 3 past calls with similar context  │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ Manager Dashboard: Team Performance       [Export CSV]          │
├─────────────────────────────────────────────────────────────────┤
│ Rep          Avg Sentiment   Talk Time Ratio  Risk Flags        │
│ Alex Chen    68% positive    42% rep          ⚠ 2 calls         │
│ Jamie Patel  81% positive    38% rep          ✅ None           │
├─────────────────────────────────────────────────────────────────┤
│ Risk Details: Call #2034 − Sentiment drop at 15min, prospect    │
│ mentioned "too expensive". Recommended action: Send discount.   │
└─────────────────────────────────────────────────────────────────┘

Validation Plan

Validation Plan for MVP Hypothesis:

  • Week 1-2: Build core ingestion and transcription; test with 10 internal sales calls to validate accuracy ≥95%.
  • Week 3-4: Pilot with 5 external sales reps from beta program; measure time saved and follow-up completion.
  • Week 5-6: Soft launch to 20 reps; collect NPS and usage metrics.

Phased Acceptance Criteria — Phase 1, MVP (6 weeks):

US#1 — Call Upload & Transcription

  • Given a sales rep has a Zoom recording link
  • When they connect Zoom account and sync a call
  • Then transcription appears within 5 minutes with speaker labels, and p95 accuracy ≥97% (Validated by QA against 50 call sample)
  • If story fails, reps revert to manual notes, killing adoption.
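One conventional way QA could score the ≥97% accuracy target in US#1 is word error rate (accuracy ≈ 1 − WER). A minimal sketch, assuming this is the metric — the PRD does not specify the actual harness:

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by reference length (standard WER)."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit-distance table
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)
```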

US#2 — AI Action Item Generation

  • Given a transcribed call with speaker diarization
  • When the system processes it via Claude API
  • Then it outputs 3-5 action items with 90% relevance (measured by rep edit rate <20% in pilot)
  • If story fails, we fall back to manual summary entry.

US#3 — Manager Risk Flags

  • Given 10+ calls processed for a team
  • When a manager views the dashboard
  • Then they see risk flags for calls with negative sentiment >30%, with zero false negatives (P0 dimension)
  • If story fails, managers lose trust and discontinue use.
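The US#3 flagging rule (negative sentiment >30%) is a simple threshold over per-segment labels. A sketch under that assumption; the call IDs and data shape are illustrative:

```python
def risk_flags(calls, threshold=0.30):
    """Flag calls whose share of negative-sentiment segments exceeds the threshold.

    `calls` maps call ID -> list of per-segment sentiment labels
    ("positive" / "neutral" / "negative").
    """
    flagged = []
    for call_id, segments in calls.items():
        negative_share = sum(s == "negative" for s in segments) / len(segments)
        if negative_share > threshold:
            flagged.append(call_id)
    return flagged
```

The zero-false-negative requirement means the threshold comparison must err toward flagging: a borderline call should appear on the manager dashboard rather than be silently dropped.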

Out of Scope (Phase 1):

| Feature | Why Not Phase 1 |
|---|---|
| Real-time call coaching | Adds complexity; validate if insights are used first |
| CRM bi-directional sync | Requires deep integration; post-MVP if adoption high |

Phase 1.1 — 4 weeks post-MVP: Add Teams/Google Meet integration and advanced sentiment (frustration detection).

Phase 1.2 — 8 weeks post-MVP: Add CRM sync (Salesforce) and coaching recommendations.

Drop List (Non-MVP)

Explicitly Dropped from MVP:

  • Real-time transcription during calls: Requires low-latency infrastructure; defer until Phase 1.1 if validation shows need.
  • Custom sentiment models for industry jargon: Use generic sentiment first; build custom if accuracy complaints >10%.
  • Mobile app notifications: Web push is sufficient for MVP; mobile app adds 3 weeks of dev time.
  • Competitor mention tracking: Complex NLP; post-MVP if RAG search adoption is high.
  • Manager coaching scorecards: Adds managerial overhead; validate if risk flags are used first.

Rationale: Each dropped item represents a 20%+ engineering effort that does not test the core hypothesis—that AI-generated action items reduce manual work. We kill them to ship in 6 weeks.

Riskiest Assumptions & Kill Criteria

Assumptions vs Validated Table:

| Assumption | Status |
|---|---|
| Whisper API handles accented English at 95%+ accuracy | ⚠ Unvalidated — needs confirmation from Eng team by Week 2 |
| pgvector with cosine similarity meets RAG latency <2s | ⚠ Unvalidated — needs load test from Eng team by Week 4 |
| Sales reps will upload calls manually if Zoom API fails | ⚠ Unvalidated — needs user behavior test from PM by Week 3 |
| Claude API generates actionable items without hallucinations | ⚠ Unvalidated — needs accuracy test (n=100 calls) by Week 3 |
| Two-party consent recording laws are complied with | ⚠ Unvalidated — legal/compliance sign-off required from Legal team by Week 2 |

Risk Register:

  1. Owner: Lead Engineer (by Week 3). Risk: Zoom webhook delays cause call-processing lags >30 minutes, breaking the "insights before the call ends" promise. Trigger: p95 processing latency >30 min in the first 100 calls. Mitigation: Implement an async queue with retries and a fallback to manual upload. Consequence: If not mitigated, rep adoption drops below 20% in the pilot.
  2. Owner: PM (by Week 4). Risk: Reps ignore AI-generated action items due to inaccuracies, rendering the feature useless. Trigger: <50% of generated items are used unedited in the D14 pilot. Mitigation: Incorporate human-in-the-loop editing with rep feedback; refine prompt engineering. Consequence: If not mitigated, kill the AI generation component and pivot to transcription-only.
  3. Owner: Legal Counsel (by Week 2). Risk: Fines under GDPR and CCPA for storing unconsented recordings. Trigger: Legal review flags data handling. Mitigation: Capture user consent at upload and implement data-deletion workflows. Consequence: If not cleared by Week 2, delay launch until compliant.
  4. Owner: DevOps (by Week 5). Risk: Claude API costs exceed $10/call at 1,000 calls/day, killing profitability. Trigger: Cost per call >$5 in the pilot. Mitigation: Cache common queries; use a cheaper model for non-critical tasks. Consequence: If not mitigated, cap usage and revise the pricing model.
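The async-queue-with-retries mitigation in risk #1 typically means exponential backoff with jitter around the flaky call (fetching the Zoom recording). A minimal sketch, assuming a generic job callable; not the team's actual worker code:

```python
import random
import time

def process_with_retries(job, max_attempts=5, base_delay=1.0):
    """Run a flaky job (e.g., fetching a recording); retry with exponential backoff + jitter."""
    for attempt in range(max_attempts):
        try:
            return job()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # retries exhausted: surface the error so the call falls back to manual upload
            # back off 1s, 2s, 4s, ... plus jitter to avoid thundering-herd retries
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```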

Kill Criteria — we pause and conduct a full review if ANY are met within 90 days:

  1. Average call review time does not drop below 1.5 hours/week per rep (a ~64% reduction from the 4.2-hour baseline).
  2. Follow-up completion rate remains below 65% (current baseline 60%, target 90%).
  3. Pilot team NPS <20 after 30 days of use.
  4. Cost per call exceeds $7.50, making unit economics negative at target price.