Social Media NLP Classifier

Budget: $250

I need a Natural Language Processing solution that accurately classifies social-media posts into predefined categories. The raw text will be provided in CSV format; it comes directly from public platforms and carries the usual noise (emojis, hashtags, abbreviations, and mixed languages), so an effective preprocessing pipeline is as important as the model itself.

Here is how I picture the workflow:

• Data handling: robust cleaning, tokenisation, and normalisation that respects emojis and common social-media shorthand.
• Model building: a modern text-classification architecture (transformers via HuggingFace, or a lightweight scikit-learn baseline if you can justify comparable performance).
• Training & evaluation: use train/validation/test splits and report accuracy, F1, and a confusion matrix.
• Inference script: a simple Python script or API endpoint that takes a post and returns its class.
• Documentation: a concise README explaining setup, dependencies, and how to retrain with fresh data.

Acceptance criteria

1. Minimum F1-score of 0.85 on the hold-out test set I will supply.
2. Reproducible environment (requirements.txt or environment.yml).
3. Code delivered via private Git repository or ZIP file.

Tools you might consider include Python 3.11, PyTorch or TensorFlow, HuggingFace Transformers, spaCy, and scikit-learn; feel free to propose alternatives if they achieve equal or better results.

I will review interim results as soon as you have an initial baseline; then we can iterate on hyper-parameters, class-imbalance handling, and deployment details.
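To make the data-handling expectation above more concrete, here is a rough sketch of the kind of normalisation I have in mind, using only the Python standard library. Everything in it is an assumption for illustration: the function name `normalise_post`, the placeholder tokens `<url>`, `<user>` and `<hashtag>`, and the tiny `SHORTHAND` table are hypothetical, and the real pipeline may look quite different.

```python
import re
import unicodedata

# Hypothetical shorthand table (illustration only); a real one would be
# derived from the supplied data.
SHORTHAND = {
    "u": "you",
    "ur": "your",
    "idk": "i do not know",
    "thx": "thanks",
}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
# Three or more repeats of the same letter ("sooooo"); digits are left alone.
REPEAT_RE = re.compile(r"([^\W\d_])\1{2,}")
# A placeholder, a word (optionally with an apostrophe), or any single
# non-space symbol such as an emoji or punctuation mark.
TOKEN_RE = re.compile(r"<\w+>|\w+(?:'\w+)?|[^\w\s]")


def normalise_post(text: str) -> list[str]:
    """Return a list of lower-cased tokens, keeping emojis as tokens."""
    text = unicodedata.normalize("NFKC", text)
    text = URL_RE.sub(" <url> ", text)               # hide raw links
    text = MENTION_RE.sub(" <user> ", text)          # hide user handles
    text = HASHTAG_RE.sub(r" <hashtag> \1 ", text)   # keep the hashtag word
    text = REPEAT_RE.sub(r"\1\1", text)              # "sooooo" -> "soo"
    tokens = TOKEN_RE.findall(text.lower())
    result: list[str] = []
    for token in tokens:
        # Expand shorthand; multi-word expansions become several tokens.
        result.extend(SHORTHAND.get(token, token).split())
    return result


if __name__ == "__main__":
    print(normalise_post("IDK why ur sooooo late 😂 #MondayBlues https://t.co/x @bob"))
```

Because `\w` is Unicode-aware in Python 3, the same function passes non-Latin text through unchanged, which matters for the mixed-language posts. One known limitation of this sketch: emojis built from several code points (for example skin-tone or joined sequences) are split into separate tokens, so a dedicated emoji-aware tokeniser may be a better choice.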

Python
