I need a scalable fine-tuning job on Coqui TTS. Following the official guide (https://docs.coqui.ai/en/latest/finetuning.html), I want a bilingual model that speaks both English and Hindi. The target is child voices in the 5-to-7-year age range, using the children’s speech dataset available on Kaggle. Here’s what I expect (rough code sketches for the main pieces follow at the end of this brief):

• Prepare and clean the Kaggle child-voice dataset (or guide me if additional samples are required).
• Fine-tune an existing Coqui TTS checkpoint so it reproduces clear, natural speech that sounds like the selected children.
• Deliver the trained model files, an inference script, and a short README showing how to install dependencies, load the model, and generate audio from text on a standard GPU.
• Provide the basic training notebook or script so I can replicate or continue training later.

In addition, I need the serving pipelines we will connect to through FastAPI servers, and we also want speaker diarization. We would also like a suggestion on how to integrate VAD into our TTS pipeline. The timeline to finish is around 10 days.

I also want openWakeWord trained on the keyword "BOBOLOO", and it has to work on ESP32 and any Raspberry Pi model. I want the full system for integrating it with the hardware: once the keyword is called, the device should connect to the STT we host in the cloud, our LLM should generate the response, and the response should then be spoken back through the child-voice TTS (see the Raspberry Pi sketch below).
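
To make the fine-tuning expectation concrete, here is a minimal sketch of the kind of training script I expect, patterned on the Coqui TTS VITS recipes and the linked fine-tuning guide. The dataset path, checkpoint path, formatter, and config values are placeholders and assumptions, and exact imports and fields can differ between TTS releases, so treat this as a starting point rather than a finished script:

    # train_child_tts.py: sketch of fine-tuning a Coqui TTS (VITS) checkpoint on the child data
    from trainer import Trainer, TrainerArgs
    from TTS.tts.configs.shared_configs import BaseDatasetConfig
    from TTS.tts.configs.vits_config import VitsConfig
    from TTS.tts.datasets import load_tts_samples
    from TTS.tts.models.vits import Vits
    from TTS.tts.utils.text.tokenizer import TTSTokenizer
    from TTS.utils.audio import AudioProcessor

    OUTPUT_PATH = "runs/child_tts"            # where checkpoints and logs go (placeholder)
    PRETRAINED = "checkpoints/model.pth"      # downloaded Coqui checkpoint to restore from (placeholder)

    # Assumes the cleaned Kaggle data is exported in LJSpeech layout (wavs/ + metadata.csv);
    # a custom formatter and resampling may be needed instead.
    dataset_config = BaseDatasetConfig(
        formatter="ljspeech",
        meta_file_train="metadata.csv",
        path="data/kaggle_child_voices",
    )

    config = VitsConfig(
        run_name="vits_child_finetune",
        batch_size=16,
        eval_batch_size=8,
        epochs=200,
        text_cleaner="multilingual_cleaners",  # English + Hindi text; phonemizer choice is open
        use_phonemes=False,
        print_step=25,
        mixed_precision=True,
        output_path=OUTPUT_PATH,
        datasets=[dataset_config],
    )

    ap = AudioProcessor.init_from_config(config)
    tokenizer, config = TTSTokenizer.init_from_config(config)
    train_samples, eval_samples = load_tts_samples(dataset_config, eval_split=True)

    model = Vits(config, ap, tokenizer, speaker_manager=None)

    # restore_path loads the pretrained weights so training continues from them (fine-tuning)
    trainer = Trainer(
        TrainerArgs(restore_path=PRETRAINED),
        config,
        OUTPUT_PATH,
        model=model,
        train_samples=train_samples,
        eval_samples=eval_samples,
    )
    trainer.fit()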
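
For the inference script and README deliverable, this is the level of simplicity I am after. It assumes the delivered files are a checkpoint plus config.json and uses the TTS.api.TTS class; the paths and the sample sentence are placeholders:

    # inference.py: load the fine-tuned model and synthesize a WAV file from text
    import torch
    from TTS.api import TTS

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Placeholder paths; point these at the delivered checkpoint and config.
    tts = TTS(model_path="model/best_model.pth", config_path="model/config.json").to(device)

    tts.tts_to_file(
        text="Hello! Aaj hum ek kahani sunenge.",
        file_path="sample.wav",
    )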
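
For the FastAPI side, a thin synthesis endpoint wrapping the same model is roughly what we need. The route name, request schema, and the output_sample_rate attribute used below are assumptions, and streaming or audio-format details are open:

    # tts_server.py: minimal FastAPI wrapper around the fine-tuned model
    import io

    import soundfile as sf
    from fastapi import FastAPI, Response
    from pydantic import BaseModel
    from TTS.api import TTS

    app = FastAPI()
    tts = TTS(model_path="model/best_model.pth", config_path="model/config.json")

    class SynthesisRequest(BaseModel):
        text: str

    @app.post("/synthesize")
    def synthesize(req: SynthesisRequest):
        wav = tts.tts(text=req.text)    # list of float samples at the model's sample rate
        buf = io.BytesIO()
        sf.write(buf, wav, tts.synthesizer.output_sample_rate, format="WAV")
        return Response(content=buf.getvalue(), media_type="audio/wav")

It should be runnable with something like: uvicorn tts_server:app --host 0.0.0.0 --port 8000 (port is a placeholder).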
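
For speaker diarization, one option I would accept is pyannote.audio's pretrained pipeline, sketched below as an assumption; it needs a Hugging Face access token and acceptance of the model's terms, and the audio file name is a placeholder:

    # diarize.py: sketch using a pyannote.audio pretrained diarization pipeline (assumed choice)
    from pyannote.audio import Pipeline

    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1",
        use_auth_token="HF_TOKEN_HERE",   # placeholder token
    )

    diarization = pipeline("recording.wav")
    for turn, _, speaker in diarization.itertracks(yield_label=True):
        print(f"{speaker}: {turn.start:.2f}s -> {turn.end:.2f}s")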
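
For the VAD suggestion, Silero VAD is the kind of lightweight option I have in mind; it could gate what audio is forwarded to the cloud STT. The loading pattern below follows Silero's torch.hub interface, and the file name and sample rate are assumptions:

    # vad_check.py: Silero VAD sketch for keeping only speech segments before STT (assumed choice)
    import torch

    model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
    get_speech_timestamps, _, read_audio, *_ = utils

    wav = read_audio("recording.wav", sampling_rate=16000)
    speech = get_speech_timestamps(wav, model, sampling_rate=16000)
    print(speech)   # list of {"start": ..., "end": ...} sample offsets that contain speech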
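
For the "BOBOLOO" wake word and the hardware loop (keyword, then cloud STT, then LLM, then child-voice TTS reply), the Raspberry Pi side could look roughly like the sketch below, using openWakeWord with a custom-trained model file and plain HTTP calls to our cloud services. All URLs, the model file name, the score key, the detection threshold, and the fixed 5-second recording window are placeholders and assumptions; the ESP32 build needs its own firmware and is not shown here:

    # wake_pipeline.py: Raspberry Pi sketch that listens for "BOBOLOO", then runs STT -> LLM -> TTS
    import numpy as np
    import pyaudio
    import requests
    from openwakeword.model import Model

    STT_URL = "https://our-cloud.example.com/stt"          # placeholder
    LLM_URL = "https://our-cloud.example.com/llm"          # placeholder
    TTS_URL = "https://our-cloud.example.com/synthesize"   # placeholder (FastAPI endpoint above)

    oww = Model(wakeword_models=["boboloo.tflite"])        # custom-trained wake word model (placeholder)

    RATE = 16000
    CHUNK = 1280                                           # 80 ms frames at 16 kHz, as openWakeWord expects
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE, input=True,
                     frames_per_buffer=CHUNK)

    def record_command(seconds=5):
        """Record a short utterance after the wake word (fixed length; VAD could end it instead)."""
        frames = [stream.read(CHUNK) for _ in range(int(RATE / CHUNK * seconds))]
        return b"".join(frames)

    print("Listening for BOBOLOO ...")
    while True:
        frame = np.frombuffer(stream.read(CHUNK), dtype=np.int16)
        scores = oww.predict(frame)
        if scores.get("boboloo", 0.0) > 0.5:               # key and threshold depend on the trained model
            audio = record_command()
            text = requests.post(STT_URL, data=audio).json()["text"]
            reply = requests.post(LLM_URL, json={"prompt": text}).json()["reply"]
            wav = requests.post(TTS_URL, json={"text": reply}).content
            with open("reply.wav", "wb") as f:             # playback (e.g. via aplay) is left out here
                f.write(wav)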