Welcome to VOCALS

VOCALS (Voice-Orchestrated Call and Language System) is an AI-powered telephony platform that handles real-time voice conversations. It integrates with Twilio for telephony and supports swappable AI providers for speech-to-text (STT), large language models (LLM), and text-to-speech (TTS).

What can you do with VOCALS?

  • Build AI phone agents — Create conversational AI agents that answer and make phone calls
  • Choose your AI stack — Swap STT, LLM, and TTS providers per agent (Deepgram, OpenAI, ElevenLabs, Anthropic, and more)
  • Scale call operations — Queue outbound calls with rate limiting and handle concurrent inbound calls
  • Monitor everything — Real-time analytics dashboard with latency, cost, and conversation metrics
  • Integrate via API — Full REST API and webhook system for building custom workflows
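Because each agent chooses its own STT, LLM, and TTS provider, an agent's AI stack can be pictured as a small configuration record with one provider per pipeline stage. The sketch below is hypothetical: the `AgentConfig` class, its field names, and the provider catalog in `SUPPORTED` are illustrative assumptions, not the VOCALS API schema.

```python
from dataclasses import dataclass

# Hypothetical provider catalog per pipeline stage (assumed, not the real
# VOCALS list); it only illustrates that each stage is chosen independently.
SUPPORTED = {
    "stt": {"deepgram", "openai"},
    "llm": {"openai", "anthropic"},
    "tts": {"elevenlabs", "openai"},
}


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical per-agent AI stack: one provider per pipeline stage."""

    name: str
    stt: str
    llm: str
    tts: str

    def __post_init__(self) -> None:
        # Reject any stage whose provider is not in the assumed catalog.
        for stage in ("stt", "llm", "tts"):
            provider = getattr(self, stage)
            if provider not in SUPPORTED[stage]:
                raise ValueError(f"unsupported {stage} provider: {provider!r}")


# Two agents on the same platform, each with a different stack.
support_agent = AgentConfig("support", stt="deepgram", llm="anthropic", tts="elevenlabs")
sales_agent = AgentConfig("sales", stt="openai", llm="openai", tts="openai")
```

Validating at construction time means a misconfigured agent fails before it ever takes a call, rather than mid-conversation.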

How it works

```text
Caller ──► Twilio SIP ──► VOCALS Orchestrator ──► STT (speech → text)
  ▲                                                │
  │                                        transcribed text
  │                                                │
  │                                     LLM (generate response)
  │                                                │
  │                                          response text
  │                                                │
  │                                       TTS (text → speech)
  │                                                │
  └◄──────────── audio back to caller ─────────────┘
```

Each call flows through a real-time pipeline: incoming audio is transcribed, the transcript is sent to an LLM for a response, and the response is synthesized back into speech — all in under 2 seconds.
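One conversational turn can be sketched as the three stages composed in sequence, with each stage timed against the 2-second target. This is a minimal, synchronous illustration under assumed interfaces: the `SpeechToText`, `LanguageModel`, and `TextToSpeech` protocols, the `run_turn` function, and the stub providers are hypothetical. A real-time orchestrator would likely stream audio and overlap stages rather than run them strictly one after another.

```python
import time
from typing import Protocol


# Hypothetical stage interfaces; any object with the matching method fits.
class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LanguageModel(Protocol):
    def respond(self, transcript: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


LATENCY_BUDGET_S = 2.0  # the end-to-end target stated above


def run_turn(audio: bytes, stt: SpeechToText, llm: LanguageModel, tts: TextToSpeech):
    """Run one caller turn through STT -> LLM -> TTS, timing each stage."""
    timings: dict[str, float] = {}

    start = time.perf_counter()
    transcript = stt.transcribe(audio)  # speech -> text
    timings["stt"] = time.perf_counter() - start

    start = time.perf_counter()
    reply_text = llm.respond(transcript)  # generate response
    timings["llm"] = time.perf_counter() - start

    start = time.perf_counter()
    reply_audio = tts.synthesize(reply_text)  # text -> speech
    timings["tts"] = time.perf_counter() - start

    within_budget = sum(timings.values()) <= LATENCY_BUDGET_S
    return reply_audio, timings, within_budget


# Stub providers standing in for real STT/LLM/TTS services.
class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class EchoLLM:
    def respond(self, transcript: str) -> str:
        return f"You said: {transcript}"


class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


reply, timings, within_budget = run_turn(b"hello", EchoSTT(), EchoLLM(), BytesTTS())
```

Swapping a provider means passing a different object that satisfies the same protocol, which mirrors choosing STT, LLM, and TTS providers per agent.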

Next steps

  • Quickstart — Set up your first agent and make a call
  • Core Concepts — Understand tenants, agents, providers, and the pipeline
  • API Overview — Start building with the VOCALS API