Centralized AI Voice & Chat System Developer

Client: AI | Published: 21.02.2026

We are building a Centralized AI Voice & Chat Agent System.

Architecture Philosophy

- Machine B → Central AI Brain (existing chatbot, knowledge base, CRM, order APIs)
- Machine A → Media Processing Unit (GPU server for STT + TTS + SIP + WebRTC)

Voice and chat must share the same AI brain. We require a developer who can build a low-latency (<1 second), GPU-optimized, production-ready system. This is NOT an API wrapper project; it requires real-time streaming AI experience.

Infrastructure (Already Available)

Machine A
- RTX 5060 Ti 16GB
- Proxmox 8.4
- Docker running directly on the host (NO GPU passthrough via VM)
- NVIDIA Container Toolkit access

Machine B
- Existing chatbot backend
- Knowledge base (site-wise)
- CRM integration
- Order status APIs
- Existing React frontend (MUST NOT be modified)

Project Scope

Media Processing Layer (Machine A)

You will build:

Audio Orchestrator
- Handle SIP calls
- Handle WebRTC / WebSocket browser audio
- Route audio to STT
- Send text to Machine B
- Receive the AI response
- Route it to TTS
- Stream audio back

Must support:
- 10–15 concurrent calls
- Session management
- Site ID tagging
- Fault isolation per session

STT (Speech-to-Text)

Requirements:
- Open-source only (Faster-Whisper / NeMo / equivalent)
- GPU accelerated
- Streaming mode (NOT batch)
- Hindi + English support
- Optimized chunk processing
- Latency target: <300ms chunk processing

TTS (Text-to-Speech)
- Open-source only (Coqui XTTS / VITS / Piper / similar)
- Must be fine-tuned for:
  - Natural Indian conversational tone
  - Hinglish switching
  - Professional assistant voice
- Latency target: audio generation starts in <400ms
- Model weights must be delivered

Web Voice Backend
- WebRTC or WebSocket
- Secure connection (WSS)
- Embeddable JS mic widget

AI Brain Enhancements (Machine B)

You will:
- Modify the chatbot API to accept:
  - source: webchat | voice_call
  - site_id parameter
- Optimize response formatting for voice
- Expose Knowledge Base CRUD APIs (site-wise)
- Enable CRM & order status through the voice channel

Existing chat functionality must remain untouched.
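The per-session flow on Machine A (browser/SIP audio → STT → Machine B → TTS → audio back) can be sketched as an async pipeline. This is a minimal illustration, not a required design: the function names (`stt_chunk`, `ask_brain`, `tts_frames`) and the `Session` fields are hypothetical stand-ins for the real STT engine, the Machine B chatbot API, and the streaming TTS model.

```python
import asyncio
from dataclasses import dataclass, field
from typing import AsyncIterator, List

@dataclass
class Session:
    """Per-call state: the posting requires site ID tagging and a transcript."""
    call_id: str
    site_id: str
    transcript: List[str] = field(default_factory=list)

async def stt_chunk(chunk: bytes) -> str:
    # Hypothetical stand-in for a streaming GPU STT call (e.g. Faster-Whisper).
    return f"<{len(chunk)} bytes transcribed>"

async def ask_brain(text: str, session: Session) -> str:
    # Hypothetical stand-in for the Machine B chatbot API call,
    # which would be tagged with source="voice_call" and session.site_id.
    return f"reply:{text}"

async def tts_frames(text: str) -> AsyncIterator[bytes]:
    # Hypothetical stand-in for streaming TTS (e.g. Coqui XTTS):
    # frames are yielded as soon as generation starts, not after full synthesis.
    for part in text.split(":"):
        yield part.encode()

async def handle_chunk(session: Session, chunk: bytes) -> bytes:
    """One leg of the loop: audio in -> STT -> AI brain -> TTS -> audio out."""
    text = await stt_chunk(chunk)
    session.transcript.append(text)
    reply = await ask_brain(text, session)
    out = b""
    async for frame in tts_frames(reply):
        out += frame  # in production, stream each frame back immediately
    return out

async def main() -> None:
    # Running each call as its own task gives fault isolation per session:
    # an exception cancels only that task, not the other concurrent calls.
    session = Session(call_id="c1", site_id="site-42")
    audio = await handle_chunk(session, b"\x00" * 320)
    print(session.transcript, len(audio))

asyncio.run(main())
```

Running each session as an independent asyncio task is one way to satisfy the "fault isolation per session" and 10–15 concurrent call requirements on a single GPU host.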
Inter-Server Communication
- gRPC preferred (low latency)
- HTTPS required
- Token authentication
- Retry & timeout logic

Analytics

Log:
- Call ID
- Site ID
- Transcript
- STT latency
- AI latency
- TTS latency
- Total latency
- Call duration
- Call outcome

Data must be stored for dashboard usage.

Performance Requirements (Critical)
- End-to-end latency: <1 second
- 15 concurrent calls stable for 45 minutes
- No GPU OOM
- Natural-sounding TTS

If latency consistently exceeds 1.2 seconds, the system is not acceptable.

Deliverables
- Full source code (Git)
- Docker Compose files
- NVIDIA GPU configuration
- Fine-tuned TTS weights
- API documentation (Swagger)
- Deployment guide
- Load testing report
- Architecture diagram

Required Experience

Must have:
- Real-time streaming STT experience
- Experience deploying AI models on GPU
- Experience with Docker + the NVIDIA toolkit
- Experience handling SIP or VoIP systems
- Low-latency system design experience

Nice to have:
- WebRTC experience
- Hindi NLP experience

Do NOT Apply If
- You only have OpenAI API integration experience
- You have never deployed open-source models on a GPU
- You have never handled streaming audio
- You cannot demonstrate latency optimization work

Project Type
- Fixed price preferred
- Milestone-based
- Code ownership transferred on completion
- NDA required

Proposal Requirements

In your proposal, please answer:
1. Which STT model will you use, and why?
2. Which TTS model will you use, and how will you fine-tune it?
3. How will you achieve <1 second latency?
4. What is your experience with concurrent audio sessions?
5. Provide examples of similar projects.
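The analytics fields required above, plus the <1 second latency budget, can be captured in one per-call record. This is an illustrative sketch only; the field names and outcome values are assumptions, not a required schema:

```python
from dataclasses import dataclass, asdict

@dataclass
class CallMetrics:
    """One analytics record per call, stored for dashboard usage."""
    call_id: str
    site_id: str
    transcript: str
    stt_ms: float       # STT latency per turn
    ai_ms: float        # AI brain (Machine B) latency
    tts_ms: float       # TTS latency (time to first audio)
    duration_s: float   # total call duration
    outcome: str        # assumed values, e.g. "completed" / "dropped"

    @property
    def total_ms(self) -> float:
        # End-to-end latency as the sum of the per-stage latencies.
        return self.stt_ms + self.ai_ms + self.tts_ms

    def within_budget(self, budget_ms: float = 1000.0) -> bool:
        # The posting's hard requirement: end-to-end latency < 1 second.
        return self.total_ms < budget_ms

# Example: 280ms STT + 250ms AI + 390ms TTS = 920ms, inside the budget
# (and each stage is under its own target of 300ms / 400ms respectively).
m = CallMetrics("c1", "site-42", "order status?", 280.0, 250.0, 390.0, 35.5, "completed")
print(m.total_ms, m.within_budget(), asdict(m)["site_id"])
```

Recording the three stage latencies separately makes it easy for the dashboard to show which stage breaks the budget when total latency drifts toward the 1.2-second rejection threshold.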