Human Voice Classification Model

Budget: $30

I have a fully-labeled collection of WAV recordings and need a compact solution that can 1) tell whether an input clip is a human voice and 2) identify which speaker from the dataset is talking. My preference is to work with MFCC features fed into a Convolutional Neural Network; if you believe another architecture will reliably hit the 90% accuracy mark, feel free to explain why.

Scope of work
• Write a clean training script that reads the WAV files, extracts MFCCs, and trains the model.
• Validate on a held-out set and document the accuracy; target ≥ 90%.
• Provide the final weights, an inference script that accepts a single audio file and prints the predicted speaker name (or “unknown” if outside the set), and a short README with setup and run commands.

Please keep the code lightweight (TensorFlow, PyTorch, or similar mainstream libraries are fine) and emphasize reproducibility so I can retrain with future data.
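To make the “unknown” requirement for the inference script concrete, here is a minimal sketch of one possible decision rule: convert the classifier's raw scores to probabilities with softmax and reject the clip when the top probability falls below a cutoff. The function names, the example speaker names, and the 0.7 threshold are all assumptions for illustration, not part of the brief; a real cutoff would need to be tuned on held-out clips, including clips from speakers outside the training set.

```python
import math
from typing import List, Sequence, Tuple

UNKNOWN_LABEL = "unknown"  # label printed when no known speaker is confident enough


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw model scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_speaker(
    logits: Sequence[float],
    speaker_names: Sequence[str],
    threshold: float = 0.7,  # assumed cutoff; tune on a validation set
) -> Tuple[str, float]:
    """Return (speaker name or "unknown", top probability)."""
    if len(logits) != len(speaker_names):
        raise ValueError("need exactly one logit per known speaker")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return UNKNOWN_LABEL, probs[best]
    return speaker_names[best], probs[best]


if __name__ == "__main__":
    names = ["alice", "bob", "carol"]  # hypothetical speaker names
    # One score clearly dominates, so the top speaker is accepted.
    print(predict_speaker([4.0, 0.5, 0.2], names))
    # Scores are nearly tied, so the clip is rejected as "unknown".
    print(predict_speaker([1.0, 0.9, 1.1], names))
```

A plain softmax threshold is the simplest option and is known to be overconfident on out-of-set inputs, so a proposal may reasonably argue for something else (for example, a distance threshold on speaker embeddings) as long as the rejection rule is documented and validated.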

Python
