Custom English and Hindi AI Calling Agent with Live Streaming & Real-Time Interruption Handling

Client: AI | Published: 26.02.2026

I need a production-ready voice agent that can speak fluent, natural-sounding Hindi, handle two-way telephone calls end-to-end, and broadcast the conversation live. The goal is a single, deployable service that I can point at a SIP number (or Twilio number) and immediately start taking or placing calls while spectators watch the stream in real time.

Core requirements
• Real-time speech recognition (Hindi) and TTS with configurable personalities and speed
• Dialogue engine that lets me script branching call flows, hand off to fallback intents, or inject an operator mid-call without dropping audio
• Live streaming of the ongoing call (audio only is fine) to a major platform or a lightweight custom player; I’m open to your recommendation as long as latency stays under 3 s
• Interruption management: the agent should detect when a caller talks over it, pause gracefully, decide whether to answer automatically or prompt an operator, then resume the script when appropriate
• Simple web dashboard that shows transcript, sentiment, and a “Take Control” button for manual intervention
• Dockerised deployment, clear README, and all source code

Acceptance criteria
1. I can spin up the stack with one command, register a phone number, and demonstrate a sample conversation in Hindi.
2. Viewers can open a provided URL and hear the call with <3 s delay.
3. When I speak over the bot, it stops, acknowledges, and either answers or routes to the operator logic you deliver.
4. All conversation text and events appear in the dashboard and in a downloadable JSON log.

Tech is flexible (Dialogflow CX, Rasa, Kaldi, Vosk, Twilio Voice, Asterisk, WebRTC, FFmpeg, OBS-style RTMP pipelines): use whatever delivers the smoothest Hindi recognition and low-latency stream, but keep licensing clear for commercial use. Tell me how you would architect the speech pipeline, manage interruptions, and keep the audio stream in sync.
If you’ve built similar multilingual voice or streaming tools before, a quick demo link will help me choose fast.
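
To make the interruption requirement concrete, here is a rough Python sketch of the pause / decide / resume behaviour I have in mind. It is only an illustration under my own assumptions, not a required design: the class name `BargeInController`, the 300 ms overlap threshold, the 0.7 confidence cut-off, and the event names are all hypothetical, and the real speech-detection and intent-scoring signals would come from whatever ASR and dialogue engine you propose.

```python
import json
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    SPEAKING = auto()   # agent is playing TTS audio
    LISTENING = auto()  # agent has paused; caller has the floor
    OPERATOR = auto()   # a human operator has taken control


@dataclass
class BargeInController:
    # Hypothetical threshold: shorter overlaps are treated as noise or a
    # brief acknowledgement and do not pause the agent.
    min_overlap_ms: int = 300
    # Hypothetical cut-off: below this intent confidence, escalate.
    auto_answer_confidence: float = 0.7
    state: State = State.SPEAKING
    events: list = field(default_factory=list)

    def on_caller_speech(self, overlap_ms: int) -> State:
        """Caller audio detected while the agent is speaking."""
        if self.state is State.SPEAKING and overlap_ms >= self.min_overlap_ms:
            self.state = State.LISTENING  # pause TTS playback here
            self.events.append({"event": "barge_in", "overlap_ms": overlap_ms})
        return self.state

    def on_caller_utterance(self, intent_confidence: float) -> State:
        """Caller finished talking; answer automatically or escalate."""
        if self.state is not State.LISTENING:
            return self.state
        if intent_confidence >= self.auto_answer_confidence:
            self.state = State.SPEAKING  # answer, then continue the script
            self.events.append({"event": "auto_answer"})
        else:
            self.state = State.OPERATOR  # prompt the operator
            self.events.append({"event": "escalate"})
        return self.state

    def on_operator_release(self) -> State:
        """Operator hands the call back; the agent resumes the script."""
        if self.state is State.OPERATOR:
            self.state = State.SPEAKING
            self.events.append({"event": "resume_script"})
        return self.state

    def export_log(self) -> str:
        """Serialise events for the downloadable JSON log."""
        return json.dumps(self.events)
```

In this sketch a short overlap is ignored, a longer one pauses the agent, and the next caller utterance is either answered automatically or routed to the operator, with every transition recorded for the JSON log. Your actual approach may differ; I mainly want to see how you would handle these same transitions.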