Child Voice Coqui TTS Fine-Tuning -- 2

I need a scalable fine-tuning job on Coqui TTS. Following the official guide (https://docs.coqui.ai/en/latest/finetuning.html), I want a bilingual model that speaks both English and Hindi. The target is child voices in the 5-to-7-year age range, using the children’s speech dataset available on Kaggle.

Here’s what I expect:
• Prepare and clean the Kaggle child-voice dataset (or guide me if additional samples are required).
• Fine-tune an existing Coqui TTS checkpoint so it reproduces clear, natural speech that sounds like the selected children.
• Deliver the trained model files, an inference script, and a short README showing how to install dependencies, load the model, and generate audio from text on a standard GPU.
• Provide the basic training notebook or script so I can replicate or continue training later.

Additionally, we need the pipelines we will use to connect to the model through our FastAPI servers, and we also want speaker diarization. We would also like suggestions on implementing VAD (voice activity detection) in our TTS pipeline. The timeline to finish is around 10 days.

Keep the scope lean: no elaborate UI, just a working model and straightforward instructions that let me plug in text and hear the child voice in both languages.
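To make the data-preparation deliverable concrete, here is a minimal sketch of the kind of cleaning step I have in mind. It is not a required implementation. It assumes the Kaggle clips are PCM WAV files in one folder and that transcripts are already available as a dictionary. The function names (`clip_seconds`, `build_metadata`) and the duration bounds are hypothetical. The output uses the LJSpeech-style `id|text|normalized text` layout, which I am assuming the chosen Coqui dataset formatter can read.

```python
import wave
from pathlib import Path


def clip_seconds(wav_path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(wav_path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def build_metadata(wav_dir, transcripts, out_csv, min_s=1.0, max_s=12.0):
    """Write LJSpeech-style rows for clips that pass basic checks.

    transcripts maps a clip id (file name without ".wav") to its text.
    Clips with no transcript, or with a duration outside [min_s, max_s],
    are skipped. The bounds are illustrative assumptions, not tuned values.
    Returns the list of clip ids that were kept.
    """
    kept = []
    with open(out_csv, "w", newline="", encoding="utf-8") as handle:
        for wav_path in sorted(Path(wav_dir).glob("*.wav")):
            # "|" is the field separator, so remove it from the text itself.
            text = transcripts.get(wav_path.stem, "").replace("|", " ").strip()
            if not text:
                continue
            if not (min_s <= clip_seconds(wav_path) <= max_s):
                continue
            handle.write(f"{wav_path.stem}|{text}|{text}\n")
            kept.append(wav_path.stem)
    return kept
```

Calling `build_metadata("wavs", transcripts, "metadata.csv")` would write one row per usable clip. Because the file is written as UTF-8, English and Hindi transcripts can share the same metadata file.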

Python
