AI Voice-Calling Agent Integration with Existing Chatbot

Client: AI | Published: 26.02.2026

Project Brief: AI Calling Agent (Machine B) – Centralized with Existing AI Chatbot (Machine A)

Existing System Architecture (Machine A – Already Live)

We currently have Machine A running a production AI Chatbot with the following setup:

Tech Stack
• Backend: Python
• Frontend: React
• AI Brain:
  o Custom vector database
  o Fallback to OpenAI API if data is not available in the vector DB
• Multi-tenant Architecture:
  o Each website has:
    ▪ Unique site_id
    ▪ Unique api_key
  o Data & personalization handled site-wise

Current Capabilities
• Integrated with CRM & Portal APIs
• Functions include:
  o Check Order Status
  o Update Order Status
• Site-specific data handling
• Analytics API already available for chatbot (visible on React frontend)

Important: There must be zero disturbance to the existing chatbot system (Machine A). No downtime, no architecture break, no performance degradation.
________________________________________
New Requirement: AI Calling Agent (Machine B)

We are building Machine B, dedicated to the AI Voice Calling Agent. Machine B will:
• Handle SIP trunk calls
• Handle website voice widget calls
• Share the same AI Brain + knowledgebase as Machine A
• Use centralized architecture for status fetching/updating
________________________________________
Core Architectural Goal

Centralized Intelligence Layer

We want:

Chat (Machine A)
↓
Central AI Brain + Knowledgebase + CRM/Portal APIs
↑
Call Agent (Machine B)

Both Chatbot and Calling Agent should:
• Use same vector database
• Use same CRM/Portal APIs
• Use same site_id architecture
• Use same personalization logic
• Use same knowledgebase
• Use same business logic

Only the response formatting style will differ (chat vs voice).
________________________________________
Responsibilities of Developer (Machine B)

A. SIP Call Handling

Machine B must:
• Connect via SIP trunk
• Handle 10–15 concurrent calls
• Manage:
  o Call pickup
  o Reject
  o Hangup
  o Outgoing calls for sales/lead qualification
  o Transfer to human agent
• Maintain low latency
• Proper session management per call

If AI fails:
• Transfer call to human agent
________________________________________
B. Website Voice Chat Widget

Similar to the chatbot widget, but for voice.
• Multi-tenant (site_id based)
• Each website has its own personalization
• Voice widget connects to Machine B
• Both SIP & voice widget calls land in the same processing room/session system
________________________________________
Voice Processing Flow (Critical)

Hardware: GPU provided:
• RTX 5060 Ti
• 16GB VRAM (GDDR7)

Developer will get GPU access.

Flow:

Call Received (SIP / Widget)
↓
STT (Open-source model)
↓
Text
↓
AI Brain (Centralized)
↓
Response Text
↓
TTS (Open-source model)
↓
Audio Streaming back to user

Requirements:
• Ultra-low latency
• Natural sounding response
• Avoid robotic delay
• Real-time streaming (not full-buffer response)
________________________________________
Delay Handling

If response generation is slow, the system should play a dynamic holding message like:

“Line par bane rahiye, check karne mein samay lag raha hai…” (Please stay on the line, checking is taking some time…)

This should be:
• Automatically triggered after a defined threshold
• Non-blocking
________________________________________
STT & TTS Requirements
• Open-source STT model
• Open-source TTS model
• GPU optimized
• Real-time streaming support
• Fine tuning for:
  o Natural tone
  o Human-like pacing
  o Hindi/Hinglish pronunciation
  o Punjabi (optional)
  o English
________________________________________
Response Formatting Difference (Chat vs Call)

Same AI brain, different response styles:

For Call: “Aap line par bane rahein, main information check kar raha hoon.” (Please stay on the line, I am checking the information.)

For Chat: “Please chat par bane rahein, main check kar raha hoon.” (Please stay in the chat, I am checking.)

Developer must implement a response formatting layer depending on: channel = chat | call
________________________________________
Knowledgebase Management
• Centralized knowledgebase
• Developer must:
  o Create APIs to manage knowledgebase
  o Expose endpoints for frontend management
• Frontend UI for KB management is NOT part of this developer’s scope
• Only backend APIs required
________________________________________
Analytics (New Requirement)

Chatbot already has an analytics API. Developer must build a Call Analytics API including:
• Total calls
• Successful AI handled calls
• Transferred calls
• Average response time
• Call duration
• Concurrency metrics
• Drop rate
• Language usage stats

API must be compatible with the existing React dashboard.
________________________________________
Performance Expectations
• 10–15 concurrent calls minimum
• GPU optimized STT/TTS
• No blocking architecture
• Scalable design
• Proper logging
• Error handling
• Failover to human
• Clean modular architecture
________________________________________
Multi-language Handling

Primary language:
• Hinglish (majority conversation)

Optional:
• English
• Punjabi

Language detection preferred if feasible.
________________________________________
Critical Constraint (Must Highlight)

There must be NO disturbance to the existing Chatbot system on Machine A.
• No downtime
• No breaking changes
• No latency increase
• No architectural compromise
• Backward compatibility must be maintained

All integration must be done safely.
________________________________________
Expected Deliverables
1. Fully working SIP-based AI calling agent
2. Voice widget backend support
3. STT + TTS integrated with GPU
4. Centralized integration with AI Brain
5. Knowledgebase management APIs
6. Call analytics APIs
7. Call transfer logic
8. Outbound call API
9. Low latency real-time streaming
10. Clean documentation
11. Deployment guide
12. A scalable architecture: if the calling volume grows in future, the same setup can be scaled by adding more RAM, GPU, etc.
14. Another requirement: a REST API endpoint for human call transcription. We want a transcription service for calls handled by real (human) agents. You will have to provide an API endpoint to which we will send the recording files from our CRM; the STT engine will then transcribe the call and return structured text in the response. We want this for audit purposes of our human calling agents. We will already be doing this in the CRM integration part of AI Calling; the only new requirement is transcription of human calls.
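The brief does not define what "structured text" should look like for the human-call transcription endpoint. The sketch below shows one possible response shape in Python, offered only as an illustration: the `Segment` fields, the `SttEngine` callable, the `transcribe_recording` function, the speaker labels, and the use of `site_id` on this endpoint are all assumptions, and the STT model is replaced by a stub so the example runs on its own.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, List


@dataclass
class Segment:
    start_s: float  # segment start, in seconds from the beginning of the recording
    end_s: float    # segment end, in seconds
    speaker: str    # assumed labels such as "agent" / "customer"
    text: str


# Hypothetical STT interface: raw audio bytes in, timed segments out.
SttEngine = Callable[[bytes], List[Segment]]


def transcribe_recording(
    audio: bytes, recording_id: str, site_id: str, stt: SttEngine
) -> dict:
    """Build a JSON-serializable transcript for one uploaded recording."""
    if not audio:
        return {
            "status": "error",
            "recording_id": recording_id,
            "error": "empty recording",
        }
    segments = stt(audio)
    return {
        "status": "ok",
        "recording_id": recording_id,
        "site_id": site_id,
        "duration_s": max((s.end_s for s in segments), default=0.0),
        "full_text": " ".join(s.text for s in segments),
        "segments": [asdict(s) for s in segments],
    }


def fake_stt(audio: bytes) -> List[Segment]:
    """Stub standing in for the open-source STT model; returns fixed sample segments."""
    return [
        Segment(0.0, 2.5, "agent", "Namaste, main aapki kya madad kar sakta hoon?"),
        Segment(2.5, 5.0, "customer", "Mujhe apne order ka status chahiye."),
    ]


if __name__ == "__main__":
    result = transcribe_recording(b"\x00\x01", "rec-001", "site-42", fake_stt)
    print(json.dumps(result, indent=2, ensure_ascii=False))
```

A REST handler in whichever Python web framework Machine B uses could read the uploaded file, pass its bytes to a function like this, and return the dictionary as JSON. Keeping per-segment timestamps would let auditors jump to the relevant part of a recording.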