Voice-to-Veena DDSP Conversion

I am working on an audio-ML experiment and need a hand finishing the pipeline that converts a raw human-voice track into the characteristic sound of a South Indian Veena using Google’s DDSP framework. The core idea is simple: feed a sung or spoken melody in, get a realistic, expressive Veena performance out.

Where the project stands
• I have explored DDSP’s pre-trained models and understand the training loop, but I have not yet collected or prepared a Veena dataset.
• I’m undecided on whether I will supply the vocal samples myself or rely on your guidance in sourcing/recording them, so I need flexibility on that point.

What I need from you
• Prepare, or help me source, a clean set of Veena stems suitable for DDSP training (ideally 44.1 kHz, mono).
• Fine-tune or build a DDSP model that maps monophonic voice to Veena timbre; Python notebooks or scripts preferred.
• Deliver inference code that accepts a WAV file and returns the converted audio (real-time or offline processing is fine, as long as latency is documented).
• Provide short demo renders that clearly showcase the model’s output quality.
• Write concise setup instructions so I can reproduce the results on my own machine (Ubuntu, CUDA available).

Acceptance criteria
1. Converted audio retains the original melody and phrasing while sounding recognisably like a Veena.
2. Artifacts such as metallic ringing or pitch drift are kept to a minimum.
3. The full training/inference workflow is reproducible via the provided notebook or script.

Tools & skills that would help: TensorFlow, DDSP library, basic DSP, dataset prep, and a good ear for Carnatic timbre.

If you’ve already done instrument style-transfer work, especially on plucked strings, please mention it and share any relevant audio links. Let’s craft a convincing Voice-to-Veena model together.
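To illustrate the requested stem format (44.1 kHz, mono), here is a minimal sketch of a preparation step, assuming the raw Veena recordings arrive as 16-bit PCM WAV files with any channel count and sample rate. It uses only Python's standard library (`wave`, `array`); all function names are hypothetical, and the linear-interpolation resampler is a crude stand-in with no anti-aliasing filter, so a real pipeline should swap in a band-limited resampler such as `librosa.resample` or `soxr`.

```python
import array
import sys
import wave

TARGET_SR = 44_100  # sample rate requested for the Veena stems


def read_pcm16(path):
    """Read a 16-bit PCM WAV file and return (list of per-channel arrays, sample rate)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        n_ch = wf.getnchannels()
        sr = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    data = array.array("h")
    data.frombytes(raw)
    if sys.byteorder == "big":
        data.byteswap()  # WAV stores samples little-endian
    # Frames are interleaved: channel c holds every n_ch-th sample starting at index c.
    return [data[c::n_ch] for c in range(n_ch)], sr


def downmix(channels):
    """Average all channels into a single mono signal (floats)."""
    n_ch = len(channels)
    return [sum(frame) / n_ch for frame in zip(*channels)]


def resample_linear(x, src_sr, dst_sr):
    """Crude linear-interpolation resampler (assumption: no anti-aliasing filter)."""
    if src_sr == dst_sr or not x:
        return list(x)
    n_out = round(len(x) * dst_sr / src_sr)
    last = len(x) - 1
    out = []
    for j in range(n_out):
        pos = j * src_sr / dst_sr  # fractional read position in the source signal
        i = int(pos)
        frac = pos - i
        a = x[min(i, last)]
        b = x[min(i + 1, last)]
        out.append(a + (b - a) * frac)
    return out


def write_pcm16_mono(path, samples, sr):
    """Clip to the 16-bit range and write a mono PCM WAV file."""
    pcm = array.array("h", (max(-32768, min(32767, round(s))) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sr)
        wf.writeframes(pcm.tobytes())


def prepare_stem(in_path, out_path, target_sr=TARGET_SR):
    """Convert one recording into a mono 16-bit training stem at target_sr."""
    channels, sr = read_pcm16(in_path)
    mono = downmix(channels)
    write_pcm16_mono(out_path, resample_linear(mono, sr, target_sr), target_sr)
```

Averaging channels is the simplest downmix; if a stereo recording has one much cleaner microphone, picking that channel instead may give better training material.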
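To make "latency is documented" concrete, here is a sketch of how the inference code could report it, assuming a hypothetical `convert(samples, sr)` function that wraps the trained DDSP model (this is not a DDSP API). Offline processing is summarised as a real-time factor (processing time divided by audio duration); for real-time block processing, each block of `block_size` samples adds at least `block_size / sr` seconds of buffering on top of compute time. The benchmarking protocol (one warm-up call, median of several runs) is an assumption.

```python
import statistics
import time
from typing import Callable, Sequence

# Hypothetical converter signature: (mono samples, sample rate) -> converted samples.
Converter = Callable[[Sequence[float], int], Sequence[float]]


def block_latency_ms(block_size: int, sr: int) -> float:
    """Buffering latency of one processing block in a streaming (real-time) setup."""
    return 1000.0 * block_size / sr


def benchmark_offline(convert: Converter, audio: Sequence[float], sr: int, runs: int = 5) -> dict:
    """Time whole-file conversion and report the real-time factor (RTF).

    RTF = processing time / audio duration; RTF < 1 means faster than real time.
    """
    if sr <= 0 or len(audio) == 0:
        raise ValueError("need a positive sample rate and non-empty audio")
    duration_s = len(audio) / sr
    # Warm-up call so one-off costs on first use (e.g. graph tracing) are excluded.
    convert(audio, sr)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        convert(audio, sr)
        timings.append(time.perf_counter() - start)
    median_s = statistics.median(timings)
    return {
        "audio_s": duration_s,
        "median_s": median_s,
        "worst_s": max(timings),
        "rtf": median_s / duration_s,
    }
```

For example, 1024-sample blocks at 44.1 kHz add about 23.2 ms of buffering before any model compute, so the documented real-time latency should state both the block size and the measured per-block compute time.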

Python