# Provider Integration
VOCALS uses a provider abstraction pattern that lets you add custom STT, LLM, and TTS implementations without modifying the core pipeline. This guide walks through the base interfaces, required methods, and registration process.
## Base Interfaces
All providers inherit from BaseProvider, which handles API key storage and requires a validate() method. Each pipeline stage has its own abstract class.
### BaseProvider

```python
# backend/app/providers/base.py
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class BaseProvider(ABC):
    def __init__(self, api_key: str, config: Optional[Dict[str, Any]] = None):
        self.api_key = api_key
        self.config = config or {}

    @abstractmethod
    async def validate(self) -> bool:
        """Validate the provider configuration and API key.

        Returns True if valid, raises an exception with details otherwise.
        """
        ...

    async def list_models(self) -> List[str]:
        """Return available model IDs. Override in subclasses.

        Returns an empty list by default.
        """
        return []
```
### STTProvider

```python
class STTProvider(BaseProvider):
    @abstractmethod
    async def transcribe(self, audio_stream: AsyncIterator[bytes]) -> AsyncIterator[str]:
        """Transcribe streaming audio to text.

        Args:
            audio_stream: Async iterator of PCM audio chunks (16kHz, 16-bit mono).

        Yields:
            Transcribed text fragments (partial or final results).
        """
        ...
```
### LLMProvider

```python
class LLMProvider(BaseProvider):
    @abstractmethod
    async def generate(
        self,
        messages: list,
        system_prompt: Optional[str] = None,
    ) -> AsyncIterator[str]:
        """Generate a streaming response from the LLM.

        Args:
            messages: Conversation history as list of dicts with 'role' and 'content'.
            system_prompt: Optional system prompt to prepend.

        Yields:
            Text tokens/chunks as they are generated.
        """
        ...
```
### TTSProvider

```python
class TTSProvider(BaseProvider):
    @abstractmethod
    async def synthesize(self, text: str) -> AsyncIterator[bytes]:
        """Synthesize text to streaming audio.

        Args:
            text: Text to convert to speech.

        Yields:
            Audio data chunks (PCM 16kHz, 16-bit mono).
        """
        ...
```
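All three interfaces stream through async iterators, so the pipeline can chain them stage to stage. The following sketch shows the shape of that chaining with stub stages; the function names and the accumulate-then-respond flow are illustrative, not the orchestrator's actual implementation:

```python
import asyncio
from typing import AsyncIterator

# Stub stages standing in for real STT/LLM/TTS providers.
async def transcribe(audio: AsyncIterator[bytes]) -> AsyncIterator[str]:
    async for chunk in audio:
        yield chunk.decode()          # pretend each chunk is a recognized word

async def generate(text: str) -> AsyncIterator[str]:
    for token in f"echo: {text}".split():
        yield token                   # pretend these are LLM tokens

async def synthesize(text: str) -> AsyncIterator[bytes]:
    yield text.encode()               # pretend this is PCM audio

async def pipeline(audio: AsyncIterator[bytes]) -> list:
    # Accumulate the transcript, then stream it through the LLM and TTS stages.
    transcript = " ".join([t async for t in transcribe(audio)])
    out = []
    async for token in generate(transcript):
        async for pcm in synthesize(token):
            out.append(pcm)
    return out

async def mic() -> AsyncIterator[bytes]:
    for word in (b"hello", b"world"):
        yield word

# asyncio.run(pipeline(mic())) -> [b"echo:", b"hello", b"world"]
```

The key property is that every stage both consumes and produces a stream, so audio can start playing before the LLM has finished generating.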
## Implementing a Custom Provider
Here is a complete example of adding a custom TTS provider.
### Step 1: Create the Provider Class
Create a new file in the appropriate subdirectory:
```python
# backend/app/providers/tts/my_tts.py
from typing import Any, AsyncIterator, Dict, List, Optional

import httpx

from app.providers.base import TTSProvider


class MyTTSProvider(TTSProvider):
    """Custom TTS provider implementation."""

    # Hardcoded model list (or fetch from API in list_models)
    MODELS = ["model-standard", "model-hd"]

    def __init__(self, api_key: str, config: Optional[Dict[str, Any]] = None):
        super().__init__(api_key, config)
        # Read from self.config so a missing config dict falls back safely.
        self.model = self.config.get("model", "model-standard")
        self.voice = self.config.get("voice_id", "default")
        self.base_url = "https://api.my-tts-service.com/v1"

    async def validate(self) -> bool:
        """Test the API key by making a lightweight API call."""
        async with httpx.AsyncClient() as client:
            resp = await client.get(
                f"{self.base_url}/voices",
                headers={"Authorization": f"Bearer {self.api_key}"},
                timeout=10.0,
            )
            if resp.status_code == 401:
                raise ValueError("Invalid API key")
            resp.raise_for_status()
        return True

    async def list_models(self) -> List[str]:
        """Return available models. Can query the API or return a static list."""
        return self.MODELS

    async def synthesize(self, text: str) -> AsyncIterator[bytes]:
        """Stream synthesized audio as PCM 16kHz 16-bit mono chunks."""
        async with httpx.AsyncClient() as client:
            async with client.stream(
                "POST",
                f"{self.base_url}/synthesize",
                headers={"Authorization": f"Bearer {self.api_key}"},
                json={
                    "text": text,
                    "model": self.model,
                    "voice": self.voice,
                    "output_format": "pcm_16000",
                },
                timeout=30.0,
            ) as resp:
                resp.raise_for_status()
                async for chunk in resp.aiter_bytes(chunk_size=4096):
                    yield chunk
```
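Note that `synthesize` yields whatever chunk sizes the HTTP client happens to deliver. Downstream consumers often want fixed-size frames instead; here is a sketch of a re-chunking helper (the helper name and the tiny frame size are illustrative, and `fake_synthesize` stands in for the real provider so the example runs offline):

```python
import asyncio
from typing import AsyncIterator

async def rechunk(stream: AsyncIterator[bytes], frame_size: int) -> AsyncIterator[bytes]:
    """Re-slice variable-size byte chunks into fixed-size frames."""
    buf = b""
    async for chunk in stream:
        buf += chunk
        while len(buf) >= frame_size:
            yield buf[:frame_size]
            buf = buf[frame_size:]
    if buf:                           # flush the (possibly short) final frame
        yield buf

async def fake_synthesize(text: str) -> AsyncIterator[bytes]:
    # Stand-in for MyTTSProvider.synthesize: deliberately uneven chunk sizes.
    for part in (b"\x00" * 5, b"\x00" * 3, b"\x00" * 6):
        yield part

async def main() -> list:
    return [len(f) async for f in rechunk(fake_synthesize("hi"), 4)]

# asyncio.run(main()) -> [4, 4, 4, 2]
```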
### Step 2: Register the Provider
Add the import and registration call to the provider registry:
```python
# backend/app/providers/registry.py

def _register_all() -> None:
    # ... existing registrations ...
    from app.providers.tts.my_tts import MyTTSProvider
    register_provider("tts", "my_tts", MyTTSProvider)
```
The `register_provider` function takes three arguments:

| Argument | Type | Description |
|---|---|---|
| `provider_type` | string | `stt`, `llm`, or `tts` |
| `name` | string | Unique identifier used in the API (e.g. `my_tts`) |
| `cls` | `Type[BaseProvider]` | The provider class |
### Step 3: Use the Provider
After registration, the provider is immediately available through the API:
```bash
# Create a provider configuration
curl -X POST \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "tts",
    "name": "my_tts",
    "api_key": "your-api-key-here",
    "model_id": "model-hd",
    "extra_config": { "voice_id": "custom-voice" }
  }' \
  https://api.usevocals.com/api/v1/providers

# Test the provider
curl -X POST \
  -H "Authorization: Bearer $JWT" \
  https://api.usevocals.com/api/v1/providers/{provider_id}/test

# List available models
curl -H "Authorization: Bearer $JWT" \
  https://api.usevocals.com/api/v1/providers/{provider_id}/models
```
Then assign the provider to an agent:
```bash
curl -X PUT \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{"active_tts_provider_id": "provider-uuid-here"}' \
  https://api.usevocals.com/api/v1/agents/{agent_id}
```
## Provider Registry
The registry (backend/app/providers/registry.py) maintains three dictionaries -- one per provider type -- mapping names to classes:
```python
_stt_providers: Dict[str, Type[STTProvider]] = {}
_llm_providers: Dict[str, Type[LLMProvider]] = {}
_tts_providers: Dict[str, Type[TTSProvider]] = {}
```
Key functions:
| Function | Description |
|---|---|
| `register_provider(type, name, cls)` | Register a provider class |
| `get_provider(type, name, api_key, config)` | Instantiate a registered provider |
| `list_providers(type)` | List registered provider names for a type |
The `_register_all()` function runs at import time and registers all built-in providers.
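The functions above boil down to a small amount of dictionary bookkeeping. This is a minimal sketch of the pattern, not the actual registry module; `DummyTTS` is a stand-in class for illustration:

```python
from typing import Any, Dict, Optional

# One dict per provider type, mapping name -> class (sketch of the real registry).
_providers: Dict[str, Dict[str, type]] = {"stt": {}, "llm": {}, "tts": {}}

def register_provider(provider_type: str, name: str, cls: type) -> None:
    if provider_type not in _providers:
        raise ValueError(f"Unknown provider type: {provider_type}")
    _providers[provider_type][name] = cls

def get_provider(provider_type: str, name: str, api_key: str,
                 config: Optional[Dict[str, Any]] = None) -> Any:
    try:
        cls = _providers[provider_type][name]
    except KeyError:
        raise ValueError(f"No {provider_type} provider named {name!r}")
    return cls(api_key, config)

def list_providers(provider_type: str) -> list:
    return sorted(_providers[provider_type])

class DummyTTS:  # stand-in for a real TTSProvider subclass
    def __init__(self, api_key, config=None):
        self.api_key = api_key
        self.config = config or {}

register_provider("tts", "dummy", DummyTTS)
# list_providers("tts") -> ["dummy"]
```

Raising `ValueError` for an unknown name means a typo in an API request surfaces as a clear client error rather than a `KeyError` deep in the pipeline.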
## Currently Registered Providers

### STT Providers

| Name | Class | Description |
|---|---|---|
| `deepgram` | `DeepgramSTTProvider` | Real-time streaming via Deepgram Nova |
| `openai` | `OpenAISTTProvider` | OpenAI Whisper API |
| `whisper` | `WhisperSTTProvider` | OpenAI Whisper (local) |
| `elevenlabs` | `ElevenLabsSTTProvider` | ElevenLabs STT |
| `qwen` | `QwenSTTProvider` | Alibaba Qwen STT |
### LLM Providers

| Name | Class | Description |
|---|---|---|
| `openai` | `OpenAILLMProvider` | GPT-4o, GPT-4, GPT-3.5 |
| `claude` | `ClaudeLLMProvider` | Anthropic Claude models |
| `google` | `GoogleLLMProvider` | Google Gemini models |
| `kimi` | `KimiLLMProvider` | Moonshot Kimi models |
### TTS Providers

| Name | Class | Description |
|---|---|---|
| `deepgram` | `DeepgramTTSProvider` | Deepgram Aura TTS |
| `openai` | `OpenAITTSProvider` | OpenAI TTS |
| `elevenlabs` | `ElevenLabsTTSProvider` | ElevenLabs TTS |
| `qwen` | `QwenTTSProvider` | Alibaba Qwen TTS |
| `resemble` | `ResembleTTSProvider` | Resemble AI TTS |
## Audio Format Requirements
All STT providers receive audio as an AsyncIterator[bytes] of PCM 16kHz, 16-bit mono chunks. The orchestrator handles conversion from Twilio's mulaw/8kHz format before passing audio to the STT provider.
All TTS providers must yield audio as PCM 16kHz, 16-bit mono chunks. The orchestrator converts the output back to mulaw/8kHz for Twilio.
If your provider's API uses a different format, perform the conversion inside your provider implementation.
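For intuition, here is what that conversion involves for Twilio's direction (mulaw/8kHz in, PCM/16kHz out), sketched in pure Python. This is illustrative only: the orchestrator already does this for you, and a production implementation would use a proper resampling filter rather than linear interpolation. The decode follows the standard G.711 mu-law expansion:

```python
def ulaw_to_pcm16(data: bytes) -> list:
    """Decode G.711 mu-law bytes to 16-bit linear PCM samples."""
    out = []
    for b in data:
        b = ~b & 0xFF                 # mu-law bytes are stored complemented
        sign = b & 0x80
        exponent = (b >> 4) & 0x07
        mantissa = b & 0x0F
        # 0x84 is the standard G.711 bias added before compression.
        sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
        out.append(-sample if sign else sample)
    return out

def upsample_2x(samples: list) -> list:
    """Naive 8 kHz -> 16 kHz via linear interpolation between samples."""
    out = []
    for i, s in enumerate(samples):
        out.append(s)
        nxt = samples[i + 1] if i + 1 < len(samples) else s
        out.append((s + nxt) // 2)
    return out

# ulaw_to_pcm16(b"\xff") -> [0]   (0xFF encodes silence in mu-law)
```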
## Best Practices
- Streaming: Use streaming APIs where available. The pipeline processes audio in real time, so batch APIs add significant latency.
- Timeout handling: Set reasonable timeouts on HTTP calls. The orchestrator will retry or fall back gracefully on provider failures.
- Error messages: Raise `ValueError` from `validate()` with a clear message (e.g. "Invalid API key", "Model not found"). These messages are returned to the user through the API.
- Model listing: Implement `list_models()` to query the provider API when possible. Fall back to a hardcoded list if the API does not support model enumeration. Return an empty list only as a last resort.
- Config access: Use `self.config` to read provider-specific settings passed via `extra_config` when creating or updating a provider through the API.
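If a provider talks to a flaky upstream API, a backoff wrapper around individual calls is one common pattern. This is a sketch under the assumption that retrying is appropriate for your provider (the orchestrator's own retry behavior may already cover you); `with_retries` and `flaky` are illustrative names:

```python
import asyncio

async def with_retries(call, attempts: int = 3, base_delay: float = 0.5):
    """Retry an async call with exponential backoff; re-raise on final failure."""
    for attempt in range(attempts):
        try:
            return await call()
        except Exception:
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(base_delay * (2 ** attempt))

# Example: a call that fails twice with transient errors, then succeeds.
state = {"calls": 0}

async def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

# asyncio.run(with_retries(flaky, base_delay=0.01)) -> "ok"
```

Keep the total retry budget well under the pipeline's real-time deadline; retrying a TTS call for ten seconds is worse than failing fast and letting the orchestrator fall back.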