I’m building a small hardware toy that wakes up to the custom word “boboloo,” understands what a child says in Hindi or English, and replies in a warm, kid-friendly synthetic voice. To make future language expansion easy, every component must be fully open-source and cleanly documented. Here’s what I need:

• A custom OpenWakeWord model trained on “boboloo,” packaged to run efficiently on an embedded Linux SBC (Raspberry Pi 4, ESP, or similar), with clear latency and false-accept metrics.

• An STT pipeline (Whisper, Vosk, or an equally permissive stack) that reliably handles Hindi and English today but is architected so I can drop in additional language models later without touching the toy’s firmware. This component’s code will run in the cloud.

• A children-tuned TTS model (Coqui TTS or comparable) producing a friendly, non-robotic voice. Please include guidance on voice cloning or fine-tuning in case I ever want alternate characters. This code will also run in the cloud.

• End-to-end integration scripts: wake word → streaming STT → my cloud LLM (REST endpoint) → TTS playback, returning only audio feedback to the child. All LLM calls should be abstracted behind a simple Python class so I can swap providers.

• Clear build instructions, Dockerfiles (if helpful), and a lightweight demo CLI so I can test each step on a laptop before flashing the toy.
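To make the request concrete, here is a minimal sketch of the wake-triggered turn I have in mind, with each stage as a swappable callable. All names here (`VoicePipeline`, `handle_utterance`, the stage signatures) are placeholders of my own, not the real OpenWakeWord, Whisper, or Coqui APIs:

```python
# Hypothetical sketch of one wake-word -> STT -> LLM -> TTS turn.
# Stage names and signatures are assumptions, not a fixed API.
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    transcribe: Callable[[bytes], str]   # STT: captured mic audio -> text
    think: Callable[[str], str]          # cloud LLM: child's text -> reply text
    speak: Callable[[str], bytes]        # TTS: reply text -> playable audio

    def handle_utterance(self, audio: bytes) -> bytes:
        """Run one wake-triggered turn; only audio goes back to the child."""
        text = self.transcribe(audio)
        reply = self.think(text)
        return self.speak(reply)


# Stubbed usage, e.g. for the laptop demo CLI before flashing the toy:
pipe = VoicePipeline(
    transcribe=lambda audio: "hello",      # stand-in for streaming STT
    think=lambda text: text.upper(),       # stand-in for the cloud LLM
    speak=lambda reply: reply.encode(),    # stand-in for TTS synthesis
)
```

The point of the dataclass of callables is that each stage can be tested on a laptop with stubs and later rebound to the real engines without touching the loop itself.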
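For the provider-swappable LLM class, something along these lines would satisfy me. It is a sketch under assumptions: the endpoint shape (`{"prompt": ...}` in, `{"reply": ...}` out) and the names `LLMClient` / `RestLLMClient` are illustrative, since my actual REST contract isn't fixed yet; only Python's standard library is used:

```python
# Hypothetical provider-agnostic LLM client; endpoint JSON shape is assumed.
import json
import urllib.request
from abc import ABC, abstractmethod


class LLMClient(ABC):
    """Anything that can turn a child's utterance into a reply string."""

    @abstractmethod
    def reply(self, text: str) -> str: ...


class RestLLMClient(LLMClient):
    """Calls a cloud LLM over a plain HTTP POST with a JSON body."""

    def __init__(self, url: str, timeout: float = 10.0):
        self.url = url
        self.timeout = timeout

    def reply(self, text: str) -> str:
        body = json.dumps({"prompt": text}).encode("utf-8")
        req = urllib.request.Request(
            self.url, data=body, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(req, timeout=self.timeout) as resp:
            return json.load(resp)["reply"]


class EchoLLMClient(LLMClient):
    """Offline stand-in so the rest of the pipeline can run without a network."""

    def reply(self, text: str) -> str:
        return f"You said: {text}"
```

Swapping providers then means writing one more `LLMClient` subclass; the wake-word, STT, and TTS code never sees which one is wired in.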
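By “drop in additional language models later,” I mean something like a registry that maps a language code to a lazily loaded transcriber. This is only a sketch of the shape I want (the names `STTRegistry`, `register`, and `Transcriber` are mine, not from Whisper or Vosk):

```python
# Hypothetical pluggable STT registry: adding a language means registering
# a loader, with no firmware changes on the toy itself.
from typing import Callable, Dict

# A transcriber turns raw audio bytes into text.
Transcriber = Callable[[bytes], str]


class STTRegistry:
    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], Transcriber]] = {}

    def register(self, lang: str, loader: Callable[[], Transcriber]) -> None:
        """Associate a language code with a lazy model loader."""
        self._loaders[lang] = loader

    def get(self, lang: str) -> Transcriber:
        """Load (on first use) and return the transcriber for a language."""
        if lang not in self._loaders:
            raise KeyError(f"no STT model registered for {lang!r}")
        return self._loaders[lang]()


# Usage: Hindi and English ship today; more languages are one register() away.
registry = STTRegistry()
registry.register("en", lambda: (lambda audio: "placeholder English text"))
registry.register("hi", lambda: (lambda audio: "placeholder Hindi text"))
```

Loaders are thunks so that heavyweight models are only pulled into memory for languages actually in use.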