WebRTC SaaS Audio Ingestion Fix

Client: AI | Published: 18.11.2025

About the Project

We are building a WebRTC SaaS platform with real-time Speech-to-Text (STT) powered by Vosk/Whisper. Our architecture includes:
- WebRTC audio/video via Mediasoup
- Node/NestJS signaling and media server logic
- Python-based STT workers (Vosk + Whisper)
- FFmpeg bridges for RTP → PCM chunking
- PlainTransport (comedia) for external RTP ingest
- Transcription per 1-second PCM chunk

The system is at an advanced stage, but the audio capture pipeline (WebRTC → Mediasoup → RTP → FFmpeg → PCM → STT) does not produce consistent audio data. FFmpeg reports timeouts or empty output, which means RTP is either not being received or not being decoded correctly. We need an expert who can diagnose and fully fix the audio ingestion pipeline end to end.

What You Will Work On

You will own the debugging and fixing of:

1. Mediasoup RTP Audio Extraction
- Fixing and validating the PlainTransport (comedia) setup
- Ensuring correct RTP flow from the producer to the PlainTransport
- Ensuring correct rtpParameters (payload type, clock rate, channels)
- Checking RTP stats (consumer.getStats, bytesReceived, packetsLost)

2. FFmpeg RTP Integration
- Ensuring FFmpeg correctly sends RTP (Opus) to mediasoup
- Ensuring FFmpeg correctly decodes to PCM (s16le, 16 kHz mono)
- Fixing SDP or codec mismatch issues
- Eliminating FFmpeg demux timeouts

3. PCM Chunking + STT Integration
- Ensuring decoded PCM reaches Node
- Validating the chunk size (1-second buffers)
- Ensuring WAV headers are correct for Vosk/Whisper
- Improving real-time STT accuracy

4. Debugging Audio Flow in Real Time
- tcpdump/Wireshark to inspect UDP packets
- FFmpeg verbose debugging
- Mediasoup logs and Worker/Router setup validation

Deliverables

You will be responsible for delivering:
- A working, stable audio capture pipeline: WebRTC audio → Mediasoup → RTP → FFmpeg → PCM → STT → transcription text
- Clean, documented code and configuration:
  - Updated Node/Nest service
  - Updated FFmpeg spawn logic
  - Correct SDP files and rtpParameters validation
- Comprehensive troubleshooting documentation, so future developers can extend the system safely

Optional: integration of a single-process FFmpeg pipeline using the tee muxer.

Nice to Have
- Experience with real-time transcription platforms
- Experience with WhisperX and Vosk tuning
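For the rtpParameters item in section 1, a minimal Python sketch (Python because the STT workers already are) of a sanity check run against a JSON dump of consumer.rtpParameters. It assumes the Node service serializes that object as-is, that the Opus codec is the first entry in `codecs`, and that the router uses the common mediasoup Opus setup of 48 kHz / 2 channels. The function name and the `expected_pt` argument (the payload type written into the FFmpeg SDP) are hypothetical.

```python
def validate_opus_rtp_parameters(rtp_parameters: dict, expected_pt: int) -> list[str]:
    """Return a list of problems found; an empty list means the check passed.

    Hypothetical helper: assumes rtp_parameters is consumer.rtpParameters
    dumped to JSON by the Node service and parsed with json.loads.
    """
    codecs = rtp_parameters.get("codecs", [])
    if not codecs:
        return ["rtpParameters.codecs is empty"]

    problems = []
    codec = codecs[0]  # assumption: the media codec comes first
    if codec.get("mimeType", "").lower() != "audio/opus":
        problems.append(f"unexpected mimeType {codec.get('mimeType')!r}")
    if codec.get("clockRate") != 48000:
        problems.append(f"clockRate {codec.get('clockRate')} != 48000")
    if codec.get("channels") != 2:
        problems.append(f"channels {codec.get('channels')} != 2")
    # A payload type mismatch makes FFmpeg silently drop every packet.
    if codec.get("payloadType") != expected_pt:
        problems.append(
            f"payloadType {codec.get('payloadType')} != SDP payload type {expected_pt}"
        )

    encodings = rtp_parameters.get("encodings", [])
    if not encodings or "ssrc" not in encodings[0]:
        problems.append("encodings[0].ssrc missing")
    return problems


if __name__ == "__main__":
    sample = {
        "codecs": [{"mimeType": "audio/opus", "payloadType": 100,
                    "clockRate": 48000, "channels": 2}],
        "encodings": [{"ssrc": 1234}],
    }
    print(validate_opus_rtp_parameters(sample, expected_pt=100))
```

Returning a list of problems instead of raising on the first one makes it easier to log every mismatch at once when comparing the mediasoup side with the SDP side.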
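For the SDP and decode items in section 2, a sketch of how an SDP file and FFmpeg argument list for receiving one Opus RTP stream and emitting 16 kHz mono s16le PCM might be generated. The address 127.0.0.1, port 5004 and payload type 100 are placeholder assumptions; they must match the ip/port the PlainTransport sends to and the payload type mediasoup actually uses. Per RFC 7587, Opus is always signaled as `opus/48000/2` in SDP.

```python
def build_opus_sdp(ip: str, port: int, payload_type: int) -> str:
    """Minimal SDP describing a single incoming Opus RTP stream (hypothetical helper)."""
    lines = [
        "v=0",
        f"o=- 0 0 IN IP4 {ip}",
        "s=mediasoup-audio",
        f"c=IN IP4 {ip}",
        "t=0 0",
        f"m=audio {port} RTP/AVP {payload_type}",
        # RFC 7587: always opus/48000/2, whatever the real channel count.
        f"a=rtpmap:{payload_type} opus/48000/2",
        "a=recvonly",
    ]
    return "\r\n".join(lines) + "\r\n"


def build_ffmpeg_args(sdp_path: str, sample_rate: int = 16000) -> list[str]:
    """FFmpeg argv: read RTP described by the SDP, write raw PCM to stdout."""
    return [
        "ffmpeg",
        "-loglevel", "debug",                   # verbose demuxer/decoder output
        "-protocol_whitelist", "file,udp,rtp",  # required to open an .sdp input
        "-i", sdp_path,
        "-vn",                                  # ignore any video
        "-ac", "1",                             # downmix to mono
        "-ar", str(sample_rate),                # resample 48 kHz -> 16 kHz
        "-f", "s16le",                          # raw signed 16-bit little-endian
        "pipe:1",                               # stdout
    ]


if __name__ == "__main__":
    print(build_opus_sdp("127.0.0.1", 5004, 100))
    print(" ".join(build_ffmpeg_args("audio.sdp")))
```

The SDP string would be written to the file passed as `sdp_path` before FFmpeg is spawned. With this layout FFmpeg expects RTCP on port + 1 unless told otherwise, which is worth checking against the PlainTransport's rtcpMux setting when chasing demux timeouts.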
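For the chunking and WAV items in section 3, the arithmetic is fixed by the output format: 16 000 samples/s × 1 channel × 2 bytes = 32 000 bytes per 1-second chunk, and a canonical PCM WAV header is 44 bytes. The sketch below, with hypothetical names, buffers arbitrary-sized reads from FFmpeg's stdout into exact 1-second chunks and prepends a header when a file-style input is needed.

```python
import struct

SAMPLE_RATE = 16000
CHANNELS = 1
BYTES_PER_SAMPLE = 2  # s16le
CHUNK_BYTES = SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE  # 32000 bytes = 1 s


class PcmChunker:
    """Accumulate arbitrary-sized PCM reads into fixed-size chunks."""

    def __init__(self, chunk_bytes: int = CHUNK_BYTES) -> None:
        self.chunk_bytes = chunk_bytes
        self._buf = bytearray()

    def feed(self, data: bytes) -> list[bytes]:
        """Add data and return every complete chunk now available."""
        self._buf.extend(data)
        chunks = []
        while len(self._buf) >= self.chunk_bytes:
            chunks.append(bytes(self._buf[: self.chunk_bytes]))
            del self._buf[: self.chunk_bytes]  # keep the remainder for next time
        return chunks


def wav_header(data_len: int, sample_rate: int = SAMPLE_RATE,
               channels: int = CHANNELS, bits: int = 16) -> bytes:
    """Canonical 44-byte RIFF/WAVE header for uncompressed PCM."""
    block_align = channels * bits // 8
    byte_rate = sample_rate * block_align
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_len, b"WAVE",
        b"fmt ", 16, 1, channels, sample_rate, byte_rate, block_align, bits,
        b"data", data_len,
    )


def to_wav(pcm: bytes) -> bytes:
    return wav_header(len(pcm)) + pcm


if __name__ == "__main__":
    chunker = PcmChunker()
    for read in (b"\x00" * 20000, b"\x00" * 20000):  # simulated stdout reads
        for chunk in chunker.feed(read):
            print(len(chunk), len(to_wav(chunk)))
```

In the Vosk Python examples, `KaldiRecognizer.AcceptWaveform` is fed raw PCM frames, so the header mainly matters when a chunk is handed to Whisper as a file. Because the chunk size is a multiple of the 2-byte sample width, chunk boundaries never split a sample.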
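For section 4, a lightweight complement to tcpdump/Wireshark: bind a UDP socket where FFmpeg would normally listen and decode the 12-byte RTP fixed header (RFC 3550) of whatever arrives. This quickly shows whether packets arrive at all and whether their payload type and SSRC match the SDP. The RTCP check uses the RFC 5761 rule (second byte in 192..223) for muxed RTCP. All names here are hypothetical, and the port must be free, so FFmpeg has to be stopped while this runs.

```python
import socket
import struct
from dataclasses import dataclass


@dataclass
class RtpHeader:
    version: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def is_rtcp(packet: bytes) -> bool:
    # RFC 5761 demultiplexing: RTCP packet types fall in 192..223.
    return len(packet) >= 2 and 192 <= packet[1] <= 223


def parse_rtp_header(packet: bytes) -> RtpHeader:
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte RTP fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


def sniff(port: int, count: int = 20, host: str = "127.0.0.1") -> None:
    """Print headers of the first `count` datagrams received on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        sock.settimeout(5.0)
        for _ in range(count):
            try:
                packet, addr = sock.recvfrom(2048)
            except socket.timeout:
                print("no datagram within 5 s: RTP is not reaching this port")
                return
            if is_rtcp(packet):
                print(addr, "RTCP packet type", packet[1])
            else:
                print(addr, parse_rtp_header(packet))
```

Calling `sniff(5004)` while the mediasoup consumer is sending should print steadily increasing sequence numbers and, for 20 ms Opus frames, timestamps advancing by 960 per packet; a timeout message instead points at the transport ip/port rather than at FFmpeg.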