Project Scope:
We have a high-accuracy (94%) Answering Machine Detection (AMD) system for outbound calls in France. It currently detects humans vs. machines within 2 seconds by downloading and processing audio. We now need to convert this into a real-time streaming solution that can make decisions as fast as possible (400ms–2s max).

Current System:
• 94% accuracy on human vs. machine detection.
• Processes downloaded audio segments (2s window).
• Works with multiple audio features and classification logic.

What Needs to Be Done:
• Transform the current batch (download) approach into low-latency real-time audio streaming processing.
• Maintain a maximum detection window of 2 seconds, but allow early decision (400ms, 500ms, 1.2s…) when confidence is 100%.
• Integrate additional features for:
    • Bit verification.
    • Synthetic voice detection.
    • Music/sound pattern detection.
• Ensure instant call routing:
    • If human → transfer immediately to a callbot.
    • If machine → hang up instantly.

Skills Required:
• Strong experience in real-time audio processing (WebRTC, RTP, SIP audio streams, or equivalent).
• Proficiency in speech and signal processing (e.g., VAD, MFCC, spectral analysis).
• Machine Learning/Deep Learning for audio classification.
• Experience with latency optimization in streaming systems.
• Familiarity with telephony protocols (SIP, Asterisk, FreeSWITCH, etc.) is a strong plus.
• Python/Node.js/Go/C++ (any language capable of handling low-latency audio).

Deliverables:
• Real-time streaming AMD system (replace current download method).
• Early decision logic with configurable thresholds.
• Integration of new detection features (synthetic voice, music, bit verification).
• API or direct integration with existing call system for routing.

Additional Info:
We handle high outbound call volumes (hundreds of checks per second). The system must be highly scalable and optimized for performance.
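
Illustrative Sketch (early decision logic):
To make the early-decision requirement above concrete, here is a minimal sketch of the behaviour we have in mind. It is not our existing implementation. The 20ms frame size, the `classify` callback, the default threshold, and the routing action names are all assumptions made for illustration; only the 400ms earliest decision, the 2s cap, and the human/machine routing rule come from this brief.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple

# Assumed values for illustration only; not taken from the existing system.
FRAME_MS = 20           # assumed duration of one incoming audio frame
MIN_DECISION_MS = 400   # earliest allowed decision point (from the brief)
MAX_WINDOW_MS = 2000    # hard cap on the detection window (from the brief)

# Hypothetical classifier signature: audio bytes in, (label, confidence) out.
Classifier = Callable[[bytes], Tuple[str, float]]


@dataclass
class Decision:
    label: str         # "human" or "machine"
    confidence: float  # classifier confidence for that label, 0.0 to 1.0
    elapsed_ms: int    # amount of audio consumed when the decision was made
    early: bool        # True if decided before the 2 s cap


def detect_streaming(
    frames: Iterable[bytes],
    classify: Classifier,
    threshold: float = 1.0,
) -> Optional[Decision]:
    """Consume frames as they arrive and decide as early as confidence allows."""
    buffered = bytearray()
    elapsed = 0
    for frame in frames:
        buffered.extend(frame)
        elapsed += FRAME_MS
        if elapsed < MIN_DECISION_MS:
            continue  # not enough audio yet to attempt a decision
        label, conf = classify(bytes(buffered))
        # Decide early on sufficient confidence, or forcibly at the 2 s cap.
        if conf >= threshold or elapsed >= MAX_WINDOW_MS:
            return Decision(label, conf, elapsed, elapsed < MAX_WINDOW_MS)
    if not buffered:
        return None  # the stream ended without delivering any audio
    # The stream ended before the cap: decide on whatever audio was received.
    label, conf = classify(bytes(buffered))
    return Decision(label, conf, elapsed, elapsed < MAX_WINDOW_MS)


def route(decision: Decision) -> str:
    """Map a decision to a call action (action names are hypothetical)."""
    return "transfer_to_callbot" if decision.label == "human" else "hang_up"
```

The `threshold` argument stands in for the configurable thresholds listed under Deliverables. This sketch re-classifies the whole buffer on every frame for simplicity; at hundreds of checks per second, we would expect features to be computed incrementally instead.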