I’m looking for a reliable way to turn the on-screen or scripted text in my marketing videos into natural-sounding voice-overs. The workflow should be fully automated: I drop a video (or a script file) into the pipeline and get the same clip back, fully voiced in English, Spanish, French, and German.

Here’s what I need:

• Build or configure an AI text-to-speech engine that produces engaging, human-like voices suitable for promotional content.
• Integrate the engine into a repeatable workflow (a CLI script, web dashboard, or plug-in) so I can process new videos without writing code each time.
• Preserve timing: the generated audio must stay in sync with scene changes and existing captions, with no awkward pauses or cut-off lines.
• Deliver clear setup instructions plus all source code and configuration files so I can run the system on my own machine or cloud account.

If you have experience with tools like Amazon Polly, Google Cloud TTS, ElevenLabs, or custom neural voices and can show prior work on marketing-grade audio, I’d love to see it.