Welcome to VOCALS
VOCALS (Voice-Orchestrated Call and Language System) is an AI-powered telephony platform that handles real-time voice conversations. It integrates with Twilio for telephony and supports swappable AI providers for speech-to-text (STT), large language models (LLM), and text-to-speech (TTS).
What can you do with VOCALS?
- Build AI phone agents — Create conversational AI agents that answer and make phone calls
- Choose your AI stack — Swap STT, LLM, and TTS providers per agent (Deepgram, OpenAI, ElevenLabs, Anthropic, and more)
- Scale call operations — Queue outbound calls with rate limiting, handle concurrent inbound calls
- Monitor everything — Real-time analytics dashboard with latency, cost, and conversation metrics
- Integrate via API — Full REST API and webhook system for building custom workflows
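To make "queue outbound calls with rate limiting" concrete, here is a minimal sketch of a sliding-window limiter in front of a call queue. It is an illustration only, not the VOCALS implementation or API: `OutboundQueue`, `max_per_window`, `window_s`, and `release` are hypothetical names, and the time is passed in explicitly so the behavior is easy to follow.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class OutboundQueue:
    """Hypothetical sketch: allow at most `max_per_window` dials per `window_s` seconds."""

    max_per_window: int
    window_s: float
    _pending: deque = field(default_factory=deque)  # numbers waiting to be dialed
    _started: deque = field(default_factory=deque)  # timestamps of recent dials

    def enqueue(self, phone_number: str) -> None:
        self._pending.append(phone_number)

    def release(self, now: float) -> list[str]:
        """Return the queued numbers that may be dialed at time `now`."""
        # Forget dials that have aged out of the sliding window.
        while self._started and now - self._started[0] >= self.window_s:
            self._started.popleft()
        ready = []
        # Hand out queued numbers until the window is full again.
        while self._pending and len(self._started) < self.max_per_window:
            ready.append(self._pending.popleft())
            self._started.append(now)
        return ready


queue = OutboundQueue(max_per_window=2, window_s=1.0)
for number in ("+15550000001", "+15550000002", "+15550000003"):
    queue.enqueue(number)

first_batch = queue.release(now=0.0)   # two calls fit in the first window
second_batch = queue.release(now=0.5)  # window is still full
third_batch = queue.release(now=1.0)   # earlier dials have aged out
```

A production queue would also need persistence, retries, and per-tenant limits; this sketch only shows the rate-limiting idea.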
How it works
Caller ──► Twilio SIP ──► VOCALS Orchestrator ──► STT (speech → text)
  │                                                │
  │                                               transcribed text
  │                                                │
  │                                               LLM (generate response)
  │                                                │
  │                                               response text
  │                                                │
  │                                               TTS (text → speech)
  │                                                │
  ◄──────────────── audio back to caller
Each call flows through a real-time pipeline: incoming audio is transcribed, the transcript is sent to an LLM to generate a response, and that response is synthesized back into speech. The full round trip completes in under 2 seconds.
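The three pipeline stages can be sketched as plain functions chained together, which also shows why providers are swappable: each stage only needs to match a simple signature. This is a simplified, blocking sketch under stated assumptions. `AgentPipeline`, `handle_turn`, and the `fake_*` providers are hypothetical names, not part of the VOCALS API, and a real-time system would presumably stream audio rather than process whole buffers.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical provider signatures: any callable with a matching shape can be plugged in.
SpeechToText = Callable[[bytes], str]
LanguageModel = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


@dataclass
class AgentPipeline:
    """One agent's provider stack: STT, then LLM, then TTS."""

    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def handle_turn(self, caller_audio: bytes) -> bytes:
        transcript = self.stt(caller_audio)  # speech -> text
        reply_text = self.llm(transcript)    # generate response
        return self.tts(reply_text)          # text -> speech


# Stub providers standing in for real vendors, so the sketch runs offline.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(transcript: str) -> str:
    return f"You said: {transcript}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = AgentPipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)
reply_audio = pipeline.handle_turn(b"hello")

# Swapping one stage leaves the other two untouched.
shouting = AgentPipeline(stt=fake_stt, llm=str.upper, tts=fake_tts)
shouted_audio = shouting.handle_turn(b"hi")
```

Keeping each stage behind a narrow signature is one way to let an agent change its STT, LLM, or TTS vendor without touching the rest of the call flow.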
Next steps
- Quickstart — Set up your first agent and make a call
- Core Concepts — Understand tenants, agents, providers, and the pipeline
- API Overview — Start building with the VOCALS API