HateMM Extraction & Fusion

I already have models for the text, audio and video tracks, as well as three working unimodal checkpoints. What I now need is a deep-learning practitioner who can complete the HateMM pipeline for me: running feature extraction for every modality, validating each unimodal model, and then carrying out the final fusion step. Turnaround time is my main concern, so please tell me how quickly you can complete this project.

Deliverables
• Run the official HateMM feature-extraction scripts on my model (text, audio, video).
• Evaluate the three existing unimodal models and return precision, recall, F1 and confusion matrices.
• Implement and train a fusion layer (late or intermediate, whichever performs best) and report its metrics.
• Hand over cleaned notebooks / .py files, requirements.txt and a short "how-to-run" README.

I will provide the dataset link, current checkpoints and baseline numbers as soon as we start; you supply the working code and final metrics.
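To make the requested metrics concrete, here is a minimal sketch of the kind of evaluation and fusion output I have in mind. It is not taken from the HateMM codebase: the function names `binary_metrics` and `late_fusion`, the binary 0/1 label convention and the 0.5 threshold are all assumptions for illustration, and averaging probabilities is only one simple form of late fusion, not necessarily the layer that should finally be trained.

```python
# Illustrative sketch only. Function names, the {0, 1} label convention and
# the 0.5 threshold are assumptions, not part of the official HateMM scripts.
from typing import Dict, List, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, object]:
    """Return precision, recall, F1 and a 2x2 confusion matrix for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Rows are true labels, columns are predictions: [[TN, FP], [FN, TP]].
        "confusion_matrix": [[tn, fp], [fn, tp]],
    }


def late_fusion(prob_sets: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Average the per-modality positive-class probabilities, then threshold."""
    n_models = len(prob_sets)
    fused = [sum(sample_probs) / n_models for sample_probs in zip(*prob_sets)]
    return [1 if p >= threshold else 0 for p in fused]
```

Each unimodal checkpoint would be scored with `binary_metrics` on its own predictions, and the fused predictions would be scored the same way so the numbers are directly comparable with the baselines.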

Python
