Side Project 01 · AI Receptionist · 2025

A bet on voice agents: a receptionist any business can plug in.

A missed call is a lost customer, and most businesses can’t staff a front desk around the clock. I bet voice agents had just gotten good enough to answer for real, so I built an AI receptionist any business can point a phone number at. It answers in their voice, remembers every caller, and replies in about 300ms, fast enough to feel like a conversation. Built end-to-end, multi-tenant from day one.

My role
End-to-end - PM + build
Read
7 min
Stack
Groq / LPU · telephony · knowledge graph

The bet
Voice agents are finally good enough to answer for real
What it does
Any business plugs in a number - it answers
The hard part
Latency - 1,050ms → 300ms, at ¼ the cost
The shape
Multi-tenant - one stack, many desks
THE BET (01)

Why I built it

Small businesses lose real money to the phone. After hours, during a rush, on a holiday, the call goes unanswered and the customer calls the next place. Hiring a 24/7 front desk isn’t realistic for most of them. For years the obvious fix, an automated one, was worse than nothing: it sounded like a machine and everyone hung up.

What changed is speed. Voice models got good, and fast inference on Groq’s LPU got the round-trip low enough that a caller doesn’t notice the gap. There’s a line where it stops sounding like software. Under it, people talk to it like a person. Over it, they give up. I bet we’d crossed it, so I built a receptionist to find out.

HOW IT WORKS (02)

The product, end to end

Point a phone number at it and it answers, then gets smarter about each caller over time. Here are the surfaces that run it, and the shape underneath.

The AI Receptionist home dashboard - calls, appointments and leads today, a call-volume chart, AI performance metrics, today's schedule, and an 'AI Active' status
01 / The dashboard

The front desk, at a glance.

The home a business logs into - calls, appointments and leads today, the call-volume trend, and the day’s schedule down the side.

The headline an owner actually checks: AI Active, and how much of the phone the agent is quietly handling.

live overview
Contacts and call log - each caller's thread of AI-summarised calls with an appointment captured inline, plus a detail panel of call history and lead info
02 / Voice + memory

It knows who’s calling.

It answers in the business’s voice, and a per-caller memory plus a knowledge graph mean it remembers who you are and what you asked last time - so repeat callers don’t start from zero.

Every caller gets a thread: each call summarised, appointments captured inline, full history at a glance.

caller memory · graph
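A minimal sketch of the per-caller thread idea, assuming memory is keyed by the caller's phone number and each call is stored as a short summary. The names `CallSummary` and `CallerMemory` are hypothetical, and the knowledge graph is not modelled here; this only shows why a repeat caller doesn't start from zero.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class CallSummary:
    """One summarised call in a caller's thread (hypothetical shape)."""
    at: datetime
    summary: str


class CallerMemory:
    """Keeps one thread of call summaries per caller number (illustrative only)."""

    def __init__(self) -> None:
        self._threads: dict[str, list[CallSummary]] = defaultdict(list)

    def record(self, caller: str, call: CallSummary) -> None:
        # Append the newest call to the end of that caller's thread.
        self._threads[caller].append(call)

    def thread(self, caller: str) -> list[CallSummary]:
        # Return a copy so callers of this method cannot mutate stored history.
        return list(self._threads.get(caller, []))

    def last_call(self, caller: str) -> Optional[CallSummary]:
        # None means a first-time caller: there is nothing to recall yet.
        calls = self._threads.get(caller)
        return calls[-1] if calls else None


memory = CallerMemory()
memory.record("+15550100002", CallSummary(datetime(2025, 1, 6, 10, 0), "Asked about prices"))
memory.record("+15550100002", CallSummary(datetime(2025, 1, 8, 15, 30), "Booked an appointment"))
```

On a repeat call, `memory.last_call("+15550100002")` gives the agent the most recent summary to open with, while an unknown number returns `None`.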
Inside one call - recording player, an AI-vs-caller speaker-activity timeline, the AI summary, and the verbatim transcript
03 / Every call, on the record

Inside every call.

The recording, an AI-vs-caller speaker timeline, an auto-summary, and the verbatim transcript - so a business can check exactly what was said, and the agent can hand off cleanly when it should.

recording · transcript
04 / Point a number at it

Routing & config.

Calls route straight to the agent. A per-business config carries the hours, the services, and when to escalate to a human, so it behaves like their front desk.

per-business config
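One way such a per-business config could look, as a sketch only. The field names (`hours`, `services`, `escalate_on`, `escalation_number`) and the phrase-matching rule are assumptions for illustration, not the product's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class BusinessConfig:
    """Hypothetical per-business config: hours, services, escalation rules."""
    name: str
    # weekday (0 = Monday) -> (opens, closes); a missing day means closed
    hours: dict[int, tuple[time, time]]
    services: tuple[str, ...]
    # phrases that should hand the call to a human (assumed rule)
    escalate_on: tuple[str, ...] = ()
    escalation_number: str = ""

    def is_open(self, at: datetime) -> bool:
        window = self.hours.get(at.weekday())
        if window is None:
            return False
        opens, closes = window
        return opens <= at.time() < closes

    def should_escalate(self, utterance: str) -> bool:
        # Case-insensitive substring match against the configured phrases.
        text = utterance.lower()
        return any(phrase in text for phrase in self.escalate_on)


salon = BusinessConfig(
    name="Example Salon",
    hours={day: (time(9), time(17)) for day in range(5)},  # Mon-Fri, 9 to 5
    services=("haircut", "colour"),
    escalate_on=("speak to a person", "complaint"),
    escalation_number="+15550100",
)
```

Keeping hours and escalation rules as data rather than prompt text means the agent's behaviour changes per business without changing the agent.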
05 / Multi-tenant

Built for one, runs for many.

It’s a multi-tenant service from the start: one system, many front desks. Onboarding a new business takes a number and a config. No new stack.

one → many
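"A number and a config" can be sketched as a lookup from the dialed number to a tenant. This assumes tenants are keyed by phone number; `Tenant`, `TenantRegistry`, and `normalize` are hypothetical names, and the digit-only normalisation is a simplification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tenant:
    """Hypothetical tenant record: one business behind one phone number."""
    tenant_id: str
    greeting: str


def normalize(number: str) -> str:
    # Keep digits only so "+1 (555) 010-0001" and "+15550100001" match.
    return "".join(ch for ch in number if ch.isdigit())


class TenantRegistry:
    """One shared registry, many front desks (illustrative only)."""

    def __init__(self) -> None:
        self._by_number: dict[str, Tenant] = {}

    def onboard(self, number: str, tenant: Tenant) -> None:
        key = normalize(number)
        if key in self._by_number:
            raise ValueError(f"number already assigned: {number}")
        self._by_number[key] = tenant

    def resolve(self, dialed_number: str) -> Optional[Tenant]:
        # None means no business owns this number.
        return self._by_number.get(normalize(dialed_number))


registry = TenantRegistry()
registry.onboard("+1 (555) 010-0001", Tenant("salon", "Thanks for calling Example Salon."))
registry.onboard("+1 (555) 010-0002", Tenant("clinic", "Example Clinic, how can I help?"))
```

Every inbound call resolves its tenant first, so the rest of the pipeline stays identical across businesses and onboarding adds a row, not a deployment.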
THE HARD PART (03)

Latency is the whole product

Everything else is downstream of one number: how long the caller waits before they hear a reply. At ~1,050ms it feels broken. People talk over it, repeat themselves, and hang up. Under ~300ms it feels alive. So the work wasn’t the conversation logic. It was tuning the whole stack until the gap disappeared: the transcriber, an open model on Groq’s LPU, a custom voice, the memory lookups. Swapping those components also cut cost to about a quarter, from ~$0.19 to ~$0.05 a minute, which is what makes running it for many businesses viable.

The other half of “feels real” is the same bar I hold on every AI product: it’s not allowed to be confidently wrong. A receptionist that invents your hours is worse than one that didn’t pick up. So it knows when to say it doesn’t know, and when to hand the call to a person.
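The "never confidently wrong" bar can be sketched as a guardrail: answer only from facts the business has supplied, and escalate otherwise. This is an assumed shape, not the shipped logic; `Reply`, `answer_from_facts`, and the topic keys are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reply:
    text: str
    escalate: bool  # True means hand the call to a person


def answer_from_facts(topic: str, facts: dict[str, str]) -> Reply:
    """Answer only from supplied facts; never guess at a missing one."""
    fact = facts.get(topic)
    if fact is None:
        # No grounded answer available: say so and hand off instead of inventing one.
        return Reply(
            "I don't have that information. Let me connect you with someone who does.",
            escalate=True,
        )
    return Reply(fact, escalate=False)


facts = {"hours": "We're open 9 to 5, Monday to Friday."}
known = answer_from_facts("hours", facts)
unknown = answer_from_facts("parking", facts)
```

Here `known.escalate` is `False` and `unknown.escalate` is `True`: the missing fact produces a handoff rather than an invented answer.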

Before · stock stack: average latency ~1,050ms, average cost ~$0.19/min. Deepgram English transcriber, GPT-4o cluster, Vapi voice.
After · tuned stack: average latency ~300ms, average cost ~$0.05/min. AssemblyAI multilingual transcriber, llama-3.1-8b-instant on Groq, custom voice.
The number: swapping the stack pulled the round-trip from ~1,050ms to ~300ms and cost from ~$0.19 to ~$0.05/min - the line between “feels like a person” and “hangs up.”
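The before/after figures work out as follows. The latency and per-minute cost values come from the comparison above; the 1,000 minutes a month is a made-up volume, used only to show what the per-minute difference means at scale.

```python
# Figures quoted in the before/after comparison (approximate averages).
BEFORE_LATENCY_MS = 1050
AFTER_LATENCY_MS = 300
BEFORE_COST_PER_MIN = 0.19
AFTER_COST_PER_MIN = 0.05

latency_ratio = AFTER_LATENCY_MS / BEFORE_LATENCY_MS
cost_ratio = AFTER_COST_PER_MIN / BEFORE_COST_PER_MIN


def monthly_cost(minutes: int, cost_per_min: float) -> float:
    """Cost of handling `minutes` of calls at a flat per-minute price."""
    return minutes * cost_per_min


print(f"latency: {latency_ratio:.0%} of the original round-trip")  # latency: 29% ...
print(f"cost: {cost_ratio:.0%} of the original per-minute price")  # cost: 26% ...

# Hypothetical volume, not a figure from the project.
ASSUMED_MINUTES = 1000
saving = monthly_cost(ASSUMED_MINUTES, BEFORE_COST_PER_MIN) - monthly_cost(
    ASSUMED_MINUTES, AFTER_COST_PER_MIN
)
print(f"saving at {ASSUMED_MINUTES} min/month: ${saving:.2f}")
```

The cost ratio comes out at roughly 26%, which is where "a quarter of the cost" comes from.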
300
ms response latency, down from ~1,050ms - at ¼ the cost
24/7
Answers every call - after hours, holidays, rush
1→∞
Multi-tenant - built for one business, runs for many
0
Confidently-wrong answers tolerated - it escalates instead
WHAT I OWN (04)

The product decisions, end to end

Building it end to end meant the product calls were mine: the script and the escalation paths (when the agent should stop talking and get a human), the “never confidently wrong” bar, and the multi-tenant model that lets one system be many businesses’ front desk. Same instinct as Otto. In anything conversational and critical, the job is owning what the model is allowed to say, and how it behaves when it isn’t sure.

It’s a bet, and it’s still early. But the question behind it, whether a business can stop missing calls without hiring for it, is one a lot of people are asking.