I need a detail-minded collaborator to convert a batch of raw audio files into clean, richly tagged text that can be dropped straight into a speech-recognition training pipeline. The job is audio-only for now, but the workflow is designed to expand to other modalities later, so versatility is welcome. Your core tasks are to transcribe each recording verbatim, mark speaker turns and background events with the tags defined in my style guide, and deliver everything as UTF-8 text files that pass my validation script. Accuracy is paramount: I’m aiming for a word error rate (WER) under 2%. Any sections that are unintelligible or privacy-sensitive must be flagged so I can remove them before model training.

Deliverables
• Time-aligned transcripts for every file
• Inline annotation of non-lexical utterances, speaker changes, and noise events
• A short log of flagged audio segments
• Final assets organised in the folder structure /corpus/{split}/{language}/ and committed to my Git repo

Acceptance criteria
• ≤2% WER on a second-pass review
• Zero validation warnings from my checker script
• Consistent file naming and folder hierarchy

We’ll work in a Linux/Git environment, and I regularly use Audacity, ELAN, Praat, and Python-based tooling; if you have similar preferences, you’ll ramp up quickly. Please tell me your typical throughput in minutes of audio per day and highlight any previous speech-recognition projects you’ve worked on. Once we agree on milestones, I’ll share the style guide and validation tools so you can start right away.
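For reference on the ≤2% WER target: this posting does not say how WER is computed or how transcripts are normalized before scoring. The sketch below therefore assumes the common definition, WER = (substitutions + deletions + insertions) / number of reference words, computed as a word-level Levenshtein distance over plain whitespace tokens with no case or punctuation normalization. `word_error_rate` is a hypothetical helper for illustration, not part of the checker script mentioned above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N using a word-level Levenshtein distance.

    Assumption: tokens are whitespace-separated and compared exactly,
    with no case folding or punctuation stripping.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (starts with i-1 = 0).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


ref = "the quick brown fox jumps over the lazy dog"
hyp = "the quick brown fox jumped over the lazy dog"
print(f"{word_error_rate(ref, hyp):.1%}")  # one substitution in 9 words -> 11.1%
```

Put concretely, a 2% ceiling allows at most 10 word errors per 500 reference words, so even a single misheard word in a short clip can exceed the threshold when scored in isolation. Scoring is usually aggregated over the whole corpus rather than per file.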