Contacts and call log - every caller's thread of AI-summarised calls

Incoming-call screen - Crowe Plumbing front desk, AI answering at 00:18, a returning caller with the memory note 'Reported a leak under the kitchen sink last week. Prefers afternoons.', and a 300 ms response time

The AI Receptionist home dashboard - calls, appointments and leads today, a call-volume chart, AI performance metrics, and today's schedule

Multi-tenant · Per-caller memory · Latency 300ms · Groq / LPU · Live ✓ answering

Side Project 01 · AI Receptionist · 2025

A bet on voice agents: a receptionist any business can plug in.

A missed call is a lost customer, and most businesses can’t staff a front desk around the clock. I bet voice agents had just gotten good enough to answer for real, so I built an AI receptionist any business can point a phone number at. It answers in their voice, remembers every caller, and replies in about 300ms, fast enough to feel like a conversation. Built end-to-end, multi-tenant from day one.

My role

End-to-end - PM + build

Read

7 min

Stack

Groq / LPU · telephony · knowledge graph


THE BET (01)

Why I built it

Small businesses lose real money to the phone. After hours, during a rush, on a holiday, the call goes unanswered and the customer calls the next place. Hiring a 24/7 front desk isn’t realistic for most of them. For years the obvious fix, an automated front desk, was worse than nothing: it sounded like a machine and everyone hung up.

What changed is speed. Voice models got good, and fast inference on Groq’s LPU got the round-trip low enough that a caller doesn’t notice the gap. There’s a latency threshold where it stops sounding like software. Under it, people talk to it like a person. Over it, they give up. I bet we’d crossed it, so I built a receptionist to find out.

HOW IT WORKS (02)

The product, end to end

Point a phone number at it and it answers, then gets smarter about each caller over time. Here are the surfaces that run it, and the shape underneath.

The AI Receptionist home dashboard - calls, appointments and leads today, a call-volume chart, AI performance metrics, today's schedule, and an 'AI Active' status

01 / The dashboard

The front desk, at a glance.

The home a business logs into - calls, appointments and leads today, the call-volume trend, and the day’s schedule down the side.

The headline an owner actually checks: AI Active, and how much of the phone the agent is quietly handling.

live overview

Contacts and call log - each caller's thread of AI-summarised calls with an appointment captured inline, plus a detail panel of call history and lead info

02 / Voice + memory

It knows who’s calling.

It answers in the business’s voice, and a per-caller memory plus a knowledge graph mean it remembers who you are and what you asked last time - so repeat callers don’t start from zero.

Every caller gets a thread: each call summarised, appointments captured inline, full history at a glance.

caller memory · graph
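To make the per-caller thread concrete, here is a minimal sketch in Python. It is not the product's code: the names `CallSummary`, `CallerMemory`, `remember`, and `context_for` are hypothetical, and an in-memory dict stands in for whatever store and knowledge graph the real system uses.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CallSummary:
    date: str      # ISO date of the call
    summary: str   # short AI-written summary of what was said


@dataclass
class CallerMemory:
    # Threads are keyed by (business_id, caller_number), so one business
    # never sees another business's callers. Hypothetical layout.
    threads: dict = field(default_factory=lambda: defaultdict(list))

    def remember(self, business_id: str, caller: str, call: CallSummary) -> None:
        """Append one summarised call to the caller's thread."""
        self.threads[(business_id, caller)].append(call)

    def context_for(self, business_id: str, caller: str, last_n: int = 3) -> str:
        """Return the most recent summaries, ready to prepend to a prompt."""
        history = self.threads.get((business_id, caller), [])
        if not history:
            return "New caller."
        recent = history[-last_n:]
        return "Returning caller. " + " ".join(c.summary for c in recent)
```

Keying on the business as well as the caller's number is the design choice that matters here: the same person calling two different businesses gets two separate threads.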

Inside one call - recording player, an AI-vs-caller speaker-activity timeline, the AI summary, and the verbatim transcript

03 / Every call, on the record

Inside every call.

The recording, an AI-vs-caller speaker timeline, an auto-summary, and the verbatim transcript - so a business can check exactly what was said, and the agent can hand off cleanly when it should.

recording · transcript

04 / Point a number at it

Routing & config.

Calls route straight to the agent. A per-business config carries the hours, the services, and when to escalate to a human, so it behaves like their front desk.

per-business config
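A per-business config of this kind could look roughly like the sketch below. The field names, the opening hours, the service list, and the escalation phrases are all illustrative assumptions, not the real schema, and the keyword match is a deliberately simple stand-in for real escalation logic.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class BusinessConfig:
    name: str
    opens: time
    closes: time
    services: tuple[str, ...]
    escalate_on: tuple[str, ...]   # phrases that should trigger a human handoff

    def is_open(self, now: time) -> bool:
        """True when `now` falls inside opening hours (closing time excluded)."""
        return self.opens <= now < self.closes

    def should_escalate(self, utterance: str) -> bool:
        """Case-insensitive check for any configured escalation phrase."""
        text = utterance.lower()
        return any(phrase in text for phrase in self.escalate_on)


# Hypothetical values for illustration only.
crowe = BusinessConfig(
    name="Crowe Plumbing",
    opens=time(8, 0),
    closes=time(17, 0),
    services=("leak repair", "drain cleaning"),
    escalate_on=("emergency", "flooding", "speak to a person"),
)
```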

05 / Multi-tenant

Built for one, runs for many.

It’s a multi-tenant service from the start: one system, many front desks. Onboarding a new business takes a number and a config. No new stack.

one → many
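One way to picture "a number and a config" is a lookup from the dialed number to the tenant that owns it. This is a sketch under assumptions: `TenantRegistry`, `onboard`, and `resolve` are hypothetical names, and a plain dict stands in for a real datastore.

```python
from typing import Optional


class TenantRegistry:
    """Maps a dialed phone number to the business that owns it."""

    def __init__(self) -> None:
        self._by_number = {}   # dialed number -> that business's config

    def onboard(self, number: str, config: dict) -> None:
        """Register a new business: one number, one config, no new stack."""
        if number in self._by_number:
            raise ValueError(f"{number} is already assigned")
        self._by_number[number] = config

    def resolve(self, dialed_number: str) -> Optional[dict]:
        """Return the config for the dialed number, or None if unknown."""
        return self._by_number.get(dialed_number)
```

Every incoming call would go through `resolve` first, so the agent picks up the right business's settings before it says a word.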

THE HARD PART (03)

Latency is the whole product

Everything else is downstream of one number: how long the caller waits before they hear a reply. At ~1,050ms it feels broken: people talk over it, repeat themselves, and hang up. Under ~300ms it feels alive. So the work wasn’t the conversation logic. It was tuning the whole stack until the gap disappeared: the transcriber, an open model on Groq’s LPU, a custom voice, the memory lookups. Swapping the stack also cut cost to about a quarter, from ~$0.19 to ~$0.05 a minute, which is what makes running it for many businesses viable.

The other half of “feels real” is the same bar I hold on every AI product: it’s not allowed to be confidently wrong. A receptionist that invents your hours is worse than one that didn’t pick up. So it knows when to say it doesn’t know, and when to hand the call to a person.
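The "never confidently wrong" rule can be sketched as a guard: answer only from facts the business has supplied, and hand off otherwise. The function name, the `facts` dict, and the handoff wording below are hypothetical. The real agent presumably does this inside its prompt and tooling rather than with a dictionary lookup.

```python
HANDOFF = "I'm not sure about that. Let me connect you with someone who can help."


def answer_from_facts(topic: str, facts: dict[str, str]) -> tuple[str, bool]:
    """Return (reply, escalate).

    Replies only with a fact the business configured. If the topic is
    missing, it admits that and flags the call for a human instead of guessing.
    """
    fact = facts.get(topic)
    if fact is None:
        return HANDOFF, True
    return fact, False
```

For example, with `facts = {"hours": "We're open 8am to 5pm."}`, a question about hours gets the configured answer, while a question about pricing gets the handoff line and `escalate=True`.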

Before - average latency ~1,050ms, average cost ~$0.19/min. Deepgram English transcriber, GPT-4o cluster, Vapi voice

After - average latency ~300ms, average cost ~$0.05/min. AssemblyAI multilingual transcriber, llama-3.1-8b-instant on Groq, custom voice

**The number** Swapping the stack pulled the round-trip **~1,050ms → ~300ms** and cost **~$0.19 → ~$0.05/min** - the line between “feels like a person” and “hangs up.”
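As a quick check on the before and after figures, the arithmetic works out as below. It uses only the approximate numbers quoted above, so the results are approximate too.

```python
before_ms, after_ms = 1050, 300
before_cost, after_cost = 0.19, 0.05   # USD per minute

latency_cut = 1 - after_ms / before_ms   # fraction of the wait removed
cost_ratio = after_cost / before_cost    # new cost as a share of the old

print(f"latency reduction: {latency_cut:.0%}")   # latency reduction: 71%
print(f"cost ratio: {cost_ratio:.2f}")           # cost ratio: 0.26
print(f"minutes per dollar: {1 / before_cost:.1f} -> {1 / after_cost:.1f}")
# minutes per dollar: 5.3 -> 20.0
```

A cost ratio of about 0.26 is where "a quarter" comes from: each dollar buys roughly 20 minutes of calls instead of about 5.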

WHAT I OWN (04)

The product decisions, end to end

Building it end to end meant the product calls were mine: the script and the escalation paths (when the agent should stop talking and get a human), the “never confidently wrong” bar, and the multi-tenant model that lets one system be many businesses’ front desk. Same instinct as Otto. In anything conversational and critical, the job is owning what the model is allowed to say, and how it behaves when it isn’t sure.

It’s a bet, and it’s still early. But the core question it answers, can a business never miss a call without hiring for it, is one a lot of people have.