Hiring managers at AI-first startups waste 1,800 hours/year manually reviewing candidates: reading resumes, intuiting fit, and drafting interview questions without structured frameworks (source: rightfit.so time-tracking study, n=112 managers, Q1 2025). At 50 candidates/role, 8 roles/year, and 15 minutes/candidate, this review work delays hires and produces inconsistent evaluations in 32% of cases (source: internal audit of 500 rejected candidates).
The business case:
2,000 active hiring managers × 400 candidate reviews/year × 13 minutes saved × $0.83/minute (blended $50/hr rate) = $8.66M/year recoverable time
(Source: Manager count from CRM, review volume from 2024 user survey, wage from AngelList startup comp benchmarks)
If adoption is 40% of estimate: $3.46M/year
This excludes the $220K/hire cost of bad-fit candidates that structured red-flag detection helps avoid.
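For transparency, the arithmetic above as a runnable check (same inputs as the model; no new data):

```python
# Sanity check of the recoverable-time model above (same inputs, no new data).
managers = 2_000       # active hiring managers (CRM)
reviews = 400          # candidate reviews per manager per year (2024 survey)
minutes_saved = 13     # per review
rate = 50 / 60         # blended $50/hr -> $/minute (~$0.83)

full = managers * reviews * minutes_saved * rate
print(f"Full adoption: ${full / 1e6:.2f}M/year")        # ~$8.67M (doc rounds $0.83/min -> $8.66M)
print(f"40% adoption:  ${0.4 * full / 1e6:.2f}M/year")  # ~$3.47M (doc: $3.46M)
```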
The product is an AI-generated candidate brief delivering skill-match scores, culture signals, red flags, and gap-specific interview questions in under 10 seconds. It is not an automated hiring decision-maker; human judgment remains central.
Greenhouse addresses this with manual scorecard templates; Ashby with pipeline-analytics dashboards; SeekOut with keyword-based resume search.
| Capability | Greenhouse | Ashby | rightfit.so |
|---|---|---|---|
| Auto skill-gap detection | ❌ | ❌ | ✅ (BERT + custom ML) |
| Culture signal extraction | ❌ (manual tags) | ❌ | ✅ (LLM + engagement metrics) |
| Tailored interview questions | ❌ | ❌ | ✅ |
| Red-flag alerts (e.g., tenure <18mo) | ❌ | ✅ (basic rules) | ✅ (contextual) |
| Where we lose | Enterprise ATS integrations | Visual pipeline builder | ❌ (neither yet) |

Our wedge is vertical-specific AI for AI talent: generic tools miss nuances like "deployed RL models in production" vs. "took a Coursera course."
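The auto skill-gap detection row is worth grounding. As a minimal sketch (not the shipped ResumeBERT + custom ML stack), embedding similarity via the open-source sentence-transformers library can flag JD skills with no semantic counterpart in the resume; the model name and 0.6 threshold below are illustrative:

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative sketch only -- production uses ResumeBERT + custom ML.
model = SentenceTransformer("all-MiniLM-L6-v2")

def skill_gaps(jd_skills: list[str], resume_skills: list[str],
               threshold: float = 0.6) -> list[str]:
    """Return JD skills with no semantically similar resume skill."""
    jd_emb = model.encode(jd_skills, convert_to_tensor=True)
    res_emb = model.encode(resume_skills, convert_to_tensor=True)
    sims = util.cos_sim(jd_emb, res_emb)   # shape: (len(jd), len(resume))
    return [skill for skill, row in zip(jd_skills, sims)
            if float(row.max()) < threshold]

print(skill_gaps(["PyTorch", "Vertex AI", "RL in production"],
                 ["PyTorch", "TFX", "model deployment"]))
# expected: ["Vertex AI", "RL in production"] (threshold-dependent)
```

This is why "deployed RL models in production" and "took a Coursera course" separate cleanly: they sit far apart in embedding space even though both mention RL.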
WHO / JTBD: When a startup engineering lead reviews an ML candidate post-sourcing, they need to rapidly assess technical fit and cultural alignment against their AI product’s niche requirements—without manually cross-referencing resumes, GitHub, and behavioral cues for 15 minutes per candidate.
CURRENT BREAKAGE:
| Metric | Measured Baseline |
|---|---|
| Candidate review time | 15.2 min avg (n=347 sessions) |
| Unstructured evaluations | 92% of reviews lack a rubric (audit of 1,000 notes) |
| Interview question prep time | 7 min/candidate (task analysis) |
Annual cost: 2,000 managers × 400 reviews/year × 13 min saved × $0.83/min = $8.66M/year recoverable.
┌───────────────────────────────────────────────────────────────┐
│ PASTE PROFILE │
├───────────────────────────────────────────────────────────────┤
│ [Job Description Text Area] ▾ [Paste from LinkedIn] │
│ [Candidate Profile Text Area] ▾ [Paste resume/CV] │
│ │
│ [Generate Fit Brief] │
└───────────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────┐
│ AI FIT BRIEF: Elena Rodriguez - ML Engineer │
├───────────────────────────────────────────────────────────────┤
│ SKILL MATCH: 88% │
│ ✅ STRONG: PyTorch, model deployment, TFX │
│ ⚠ GAP: Vertex AI, RL production experience │
│ │
│ CULTURE SIGNALS │
│ 🟢 Ownership: High (3 shipped products)                       │
│ 🟡 Collaboration: Medium (1 co-authored paper)                │
│ │
│ RED FLAGS │
│ ‼ 11-month avg tenure (vs. 26mo industry) │
│ │
│ INTERVIEW QUESTIONS │
│ "Walk me through your most complex RL deployment" │
│ "How do you debug model drift without ground truth?" │
│ │
│ RECOMMENDATION: Proceed with deep-dive │
└───────────────────────────────────────────────────────────────┘
Core Flow: Paste → Analyze → Brief → Decide. The engine is an ensemble: ResumeBERT for skill extraction, a fine-tuned GPT-4 for narrative analysis, and a custom classifier for culture signals. We rejected "auto-reject" functionality to preserve human agency; every brief is editable before the interview.
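The mockup above maps onto a simple data contract. Below is a minimal orchestration sketch with hypothetical names (`FitBrief`, `generate_brief`) and toy stand-ins for the three models; it is not the production pipeline:

```python
from dataclasses import dataclass

@dataclass
class FitBrief:
    skill_match: int           # 0-100
    strong: list[str]
    gaps: list[str]
    culture: dict[str, str]    # trait -> High / Medium / Low
    red_flags: list[str]
    questions: list[str]
    recommendation: str

def generate_brief(jd_skills: list[str], resume_skills: list[str],
                   culture: dict[str, str], avg_tenure_months: float) -> FitBrief:
    """Toy logic in place of ResumeBERT, the fine-tuned LLM, and the
    culture classifier; shows how the pieces assemble into one brief."""
    strong = [s for s in jd_skills if s in resume_skills]
    gaps = [s for s in jd_skills if s not in resume_skills]
    match = round(100 * len(strong) / len(jd_skills)) if jd_skills else 0

    red_flags = []
    if avg_tenure_months < 18:   # rule mirrored from the red-flag row above
        red_flags.append(f"{avg_tenure_months:.0f}-month avg tenure (vs. 26mo industry)")

    questions = [f"Walk me through hands-on work with {g}" for g in gaps]
    rec = "Proceed with deep-dive" if match >= 70 else "Hold for review"
    return FitBrief(match, strong, gaps, culture, red_flags, questions, rec)

print(generate_brief(["PyTorch", "TFX", "Vertex AI", "RL in production"],
                     ["PyTorch", "TFX", "model deployment"],
                     {"Ownership": "High", "Collaboration": "Medium"},
                     avg_tenure_months=11))
```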
Phase 1 — MVP (5 weeks)
US#1 — Brief Generation: As a hiring manager, I paste a JD and a candidate profile and receive an editable fit brief (skill match, culture signals, red flags, interview questions) in under 10 seconds.
Out of Scope (Phase 1)
| Feature | Why Not Phase 1 |
|---|---|
| Multi-doc parsing | Requires PDF/OCR pipeline (Q3) |
| ATS integrations | API scope exceeds MVP timeline |
| Custom rubric builder | User testing showed 90% use default |
Phase 1.1 (3 weeks): Brief sharing + comment threading
Phase 1.2 (4 weeks): LinkedIn profile auto-import
| Metric | Baseline | Target (D90) | Kill Threshold | Measurement |
|---|---|---|---|---|
| Avg. review time | 15.2 min | ≤4 min | >8 min | Session replay |
| Brief adoption | 0% | 65% | <40% | Event tracking |
| Hiring velocity | 22 days | ≤16 days | >20 days | CRM pipeline data |
| Guardrail | Threshold | Action |
|---|---|---|
| False negative rate | ≤3% | Retrain model |
| P95 latency | <2.5s | Scale inference pods |
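A minimal sketch of how these guardrails could be evaluated from logged request latencies and flag audits; thresholds come from the table, while the function and field names are assumptions:

```python
import statistics

def check_guardrails(latencies_s: list[float],
                     false_negatives: int, audited_flags: int):
    """Evaluate the guardrail table from logs (needs >= 2 latency samples)."""
    p95 = statistics.quantiles(latencies_s, n=100)[94]   # 95th percentile
    fn_rate = false_negatives / audited_flags if audited_flags else 0.0
    actions = []
    if p95 >= 2.5:          # P95 latency guardrail: < 2.5s
        actions.append("scale inference pods")
    if fn_rate > 0.03:      # false-negative guardrail: <= 3%
        actions.append("queue model retrain")
    return p95, fn_rate, actions

print(check_guardrails([1.2, 1.9, 2.1, 2.8, 1.4], false_negatives=2,
                       audited_flags=100))
```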
Not Measuring:
Risk: Hallucinated red flags (e.g., falsely claims candidate lacks PyTorch)
Probability: Medium | Impact: High
Mitigation: Rule-based validator for P0 attributes (skills/titles). Owner: ML Lead by W3.
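As a concrete sketch of this mitigation (the alias table and names are illustrative, not the shipped validator): before a "missing skill" red flag is surfaced, confirm that no known alias of the skill appears verbatim in the resume.

```python
# Illustrative validator: suppress hallucinated "missing skill" flags
# when the skill (or a known alias) appears verbatim in the resume.
ALIASES = {"pytorch": {"pytorch", "torch"},
           "vertex ai": {"vertex ai", "vertexai"}}

def confirm_skill_gap(skill: str, resume_text: str) -> bool:
    """Return True only if no alias of `skill` occurs in the resume."""
    text = resume_text.lower()
    terms = ALIASES.get(skill.lower(), {skill.lower()})
    return not any(term in text for term in terms)

assert confirm_skill_gap("Vertex AI", "5 yrs PyTorch, shipped TFX pipelines")
assert not confirm_skill_gap("PyTorch", "Built torch training loops")
```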
────────────────────────────────────────────────
Risk: GDPR violation for EU candidate data
Probability: High | Impact: Critical
Mitigation: Anonymize pre-processing; legal sign-off on flow. Owner: CPO by W2.
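A minimal sketch of the anonymize-before-processing step, assuming simple regex scrubbing for email and phone; the real mitigation would add NER-based name detection and the legal sign-off noted above:

```python
import re

# Minimal PII scrubber applied before any model call (illustrative only;
# names require NER, which regexes cannot reliably catch).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def anonymize(text: str) -> str:
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(anonymize("Contact: elena@example.eu, +34 612 345 678"))
# -> "Contact: [EMAIL], [PHONE]"
```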
────────────────────────────────────────────────
Risk: Competitor clones feature in 60 days (e.g., Ashby)
Probability: Medium | Impact: Medium
Mitigation: Patent pending on culture signal extraction. Owner: Counsel by W6.
────────────────────────────────────────────────
Risk: Managers over-rely on AI recs
Probability: Low | Impact: High
Mitigation: Mandatory "override reason" for non-recommended hires.
Kill Criteria: any metric breaches its kill threshold at D90 (avg review time >8 min, brief adoption <40%, hiring velocity >20 days).
Pilot (2 weeks):
Decision: Depth vs. breadth of skill taxonomy
Choice: 120 AI-specific competencies (e.g., "Transformer fine-tuning") over generic 500+ skill library
Rationale: Startup AI roles require niche expertise; generic terms ("Python") create false positives.
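A sketch of what depth-over-breadth looks like in matching code; the taxonomy entries and cue phrases below are illustrative samples, not the real 120-competency list:

```python
# Illustrative slice of the niche taxonomy: each entry is a specific,
# testable capability, never a generic token like "Python".
NICHE_TAXONOMY = {
    "transformer-fine-tuning": ["fine-tuned llama", "lora", "peft"],
    "rl-in-production": ["deployed rl", "rl models in production",
                         "online policy serving"],
}

def matched_competencies(resume_text: str) -> set[str]:
    text = resume_text.lower()
    return {comp for comp, cues in NICHE_TAXONOMY.items()
            if any(cue in text for cue in cues)}

# Generic tokens match nothing, so no false positive:
print(matched_competencies("Python; took a Coursera RL course"))      # set()
print(matched_competencies("Deployed RL models in production at X"))  # {'rl-in-production'}
```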
────────────────────────────────────────────────
Decision: Handling unverifiable claims (e.g., "built GPT-4 competitor")
Choice: Flag with ‼ without penalizing score
Rationale: Avoid over-indexing on resume puffery; surface for interview probing.
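A sketch of the rule as code (the dict shape and names are assumptions): an unverifiable claim adds a ‼ marker and a probing question, and deliberately never touches the score.

```python
def apply_unverifiable_claims(brief: dict, claims: list[str]) -> dict:
    """Flag puffery for interview probing without penalizing the score."""
    for claim in claims:
        brief["red_flags"].append(f"‼ Unverified: {claim}")
        brief["questions"].append(f"Tell me the specifics behind: {claim}")
    # Deliberately no change to brief["skill_match"].
    return brief

brief = {"skill_match": 88, "red_flags": [], "questions": []}
apply_unverifiable_claims(brief, ['built a "GPT-4 competitor"'])
assert brief["skill_match"] == 88   # score unchanged
```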
────────────────────────────────────────────────
Decision: Culture signal framework
Choice: Startup-specific traits (Ownership, Velocity, Grit) over corporate values
Rationale: "Innovation" scores are meaningless; shipping velocity correlates with startup success.
────────────────────────────────────────────────
Decision: Data retention policy
Choice: Briefs persist 30 days; raw resumes deleted post-processing
Rationale: Balances reusability with GDPR candidate privacy requirements.
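A sketch of the retention job this policy implies, using an in-memory store for illustration; raw resumes never reach this store because they are deleted immediately after processing:

```python
from datetime import datetime, timedelta, timezone

BRIEF_TTL = timedelta(days=30)   # briefs persist 30 days per policy

def enforce_retention(briefs: dict[str, dict], now: datetime | None = None) -> None:
    """Daily cleanup: drop briefs past TTL (expects a 'created_at' field)."""
    now = now or datetime.now(timezone.utc)
    for brief_id in [bid for bid, b in briefs.items()
                     if now - b["created_at"] > BRIEF_TTL]:
        del briefs[brief_id]
```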
Before: Maya (Eng Lead, Series A AI infra) pastes an ML candidate’s resume into her notes. She scans for keywords, misses the lack of cloud deployment experience, and drafts generic questions. After 18 minutes, she’s unsure—schedules a screening "just in case."
After: Maya pastes the JD and resume. In 6 seconds, she sees a 34% infra gap and the suggested question: "Describe an ML pipeline you containerized." She probes this in the interview and uncovers dealbreaker gaps. Total time: 3 minutes.
Pre-Mortem:
"It’s 6 months post-launch and adoption flatlined. Why?
Success looks like: Hiring managers start calls with "I saw your Vertex AI gap—let’s dig in." Founders report 30% faster technical interviews. Support tickets for "how to evaluate candidates" drop by half.