Low-Latency Cross-Platform Transcription

Client: AI | Published: 27.09.2025

I’m building a live speech-to-text tool that listens to a dynamic-mic input and transcribes it in real time. The application has to run on Windows, macOS, and Linux without relying on a GPU. Staying strictly CPU-only, it should keep total RAM usage below 500 MB while still reaching close to 95–100 % word-level accuracy. Latency needs to be low enough for smooth on-screen captioning: fractions of a second, not seconds. I’m open to whichever stack makes that possible (Whisper.cpp, Vosk, Kaldi, on-device quantised models, or a custom approach) as long as it is optimised for speed and memory.

The finished result should include:

• A runnable binary or easily reproduced build for the three desktop platforms
• Source code with clear comments and a concise README covering setup, model download, and typical CPU usage numbers
• A minimal CLI (or small GUI if you prefer) that starts listening, prints live transcripts, and exposes a latency/readability toggle
• A short verification log demonstrating the memory footprint, latency, and accuracy on a sample recording from a dynamic mic

I’m not requiring any particular programming language; in fact, C++, Rust, and the like would be preferable to Python and other heavy languages. If you have prior experience squeezing ASR models into tight footprints while keeping accuracy high, let’s talk.
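To illustrate what I mean by the latency/readability toggle, here is a minimal sketch of the flush policy I have in mind. All names here (`Mode`, `Hypothesis`, `caption_update`) are hypothetical, not tied to any particular ASR library: in low-latency mode every partial hypothesis is pushed to the screen immediately, while in readability mode output is held until the recognizer marks the utterance as final, so the caption text does not flicker.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toggle: trade update speed against stable, readable captions.
enum class Mode { LowLatency, Readability };

// One hypothesis from the recognizer; is_final means an endpoint was reached.
struct Hypothesis {
    std::string text;
    bool is_final;
};

// Decide which caption lines to emit for a single incoming hypothesis.
// LowLatency: show every partial result as it arrives.
// Readability: show only finalized utterances.
std::vector<std::string> caption_update(Mode mode, const Hypothesis& h) {
    std::vector<std::string> out;
    if (mode == Mode::LowLatency || h.is_final) {
        out.push_back(h.text);
    }
    return out;
}
```

However the bid is structured, the CLI flag could simply switch between these two policies at startup.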