I’m building a service that pulls breaking stories from Times of India, NDTV, and BBC through their public APIs, runs each headline and body through a fine-tuned BERT model, and instantly tells the reader whether the article is genuine or fake. Every prediction must be accompanied by a SHAP explanation so users can see which phrases drove the decision and how confident the model is.

Here’s the flow I have in mind:

• A lightweight Python pipeline (Transformers + PyTorch) polls the three APIs, normalises the text, then feeds it to the classifier (a rough ingestion sketch is included at the end of this brief).
• The classifier is a BERT base model fine-tuned specifically on a labelled fake-news dataset; the training notebook, scripts, and final .pt/.bin weights are part of the hand-over.
• SHAP values are calculated in real time and returned as a plot or JSON payload that the front end can visualise (a rough sketch of the prediction endpoint is also included at the end).
• A simple web UI (React or plain Flask/Jinja, your choice) lets users paste a custom article or browse the live feed with prediction scores and SHAP highlights.
• Everything should deploy easily on a small VPS or Heroku-style container: one command sets up the virtualenv, launches the web server, and starts the news-ingest scheduler.

Deliverables
1. Fine-tuned BERT model files and training code
2. API-based data-ingestion module with clear config for each source
3. SHAP explainability component wired into the prediction endpoint
4. Web interface + deployment scripts (Dockerfile or Procfile)

Acceptance criteria
• End-to-end latency per article ≤ 3 s
• Minimum 90 % validation F1 on the provided benchmark set
• SHAP plot or token-level scores visible for every prediction
• README covering setup, retraining, and environment variables

If this workflow matches your skills and you’ve shipped similar NLP projects before, I’d love to see an outline of your approach and rough timeline.
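
Reference sketches (illustrative only)

To make the SHAP requirement concrete, here is a rough sketch of the kind of prediction endpoint I have in mind. It follows the pattern from SHAP’s text-classification examples (wrapping a Transformers pipeline in shap.Explainer); the model directory, port, and response field names are placeholders, not a spec.

```python
# Sketch: fine-tuned BERT + real-time SHAP explanation behind a Flask endpoint.
# MODEL_DIR, the route, and the port are placeholders for illustration.
import shap
import torch
from flask import Flask, jsonify, request
from transformers import pipeline

MODEL_DIR = "models/bert-fake-news"  # placeholder: directory with the hand-over weights

# Text-classification pipeline over the fine-tuned checkpoint.
clf = pipeline(
    "text-classification",
    model=MODEL_DIR,
    tokenizer=MODEL_DIR,
    top_k=None,          # score every class so SHAP can attribute per label
    truncation=True,
    device=0 if torch.cuda.is_available() else -1,
)

# shap.Explainer accepts a Transformers pipeline directly and produces
# token-level attributions (the pattern used in SHAP's text examples).
explainer = shap.Explainer(clf)

app = Flask(__name__)

@app.post("/predict")
def predict():
    text = request.get_json(force=True)["text"]
    scores = clf([text])[0]                       # list of {"label", "score"} for one input
    best = max(scores, key=lambda s: s["score"])
    explanation = explainer([text])               # shap.Explanation, batch of one
    return jsonify({
        "label": best["label"],
        "confidence": round(float(best["score"]), 4),
        "tokens": [str(t) for t in explanation.data[0]],
        "token_scores": explanation.values[0].tolist(),  # per-token, per-class SHAP values
    })

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```

The same endpoint would back both the live feed and the paste-an-article box in the UI, with the front end rendering the token scores as highlights or a SHAP text plot.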
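
And a matching sketch of the ingestion side referenced in the first bullet: a plain polling loop that normalises each story and posts it to the prediction endpoint. Every URL, field name, and interval here is a placeholder; the real endpoints and response shapes for Times of India, NDTV, and BBC belong in the per-source config of deliverable 2.

```python
# Sketch: poll each configured source, normalise, and send articles to the classifier.
# All URLs, field names, and the polling interval are placeholders, not the sources'
# real API contracts.
import time

import requests

SOURCES = {
    "times_of_india": {
        "url": "https://example.com/toi/top-stories",   # placeholder endpoint
        "headline_field": "title",
        "body_field": "content",
    },
    "ndtv": {
        "url": "https://example.com/ndtv/latest",       # placeholder endpoint
        "headline_field": "headline",
        "body_field": "description",
    },
    "bbc": {
        "url": "https://example.com/bbc/news",          # placeholder endpoint
        "headline_field": "title",
        "body_field": "body",
    },
}

PREDICT_URL = "http://localhost:8000/predict"  # the Flask endpoint sketched above
POLL_SECONDS = 300                             # placeholder polling interval


def normalise(headline: str, body: str) -> str:
    """Collapse whitespace and join headline + body into one classifier input."""
    return " ".join(f"{headline}. {body}".split())


def poll_once() -> None:
    for name, cfg in SOURCES.items():
        resp = requests.get(cfg["url"], timeout=10)
        resp.raise_for_status()
        for item in resp.json().get("articles", []):        # assumed response shape
            text = normalise(item.get(cfg["headline_field"], ""),
                             item.get(cfg["body_field"], ""))
            pred = requests.post(PREDICT_URL, json={"text": text}, timeout=30).json()
            print(name, pred["label"], pred["confidence"])   # stand-in for feed storage


if __name__ == "__main__":
    while True:
        poll_once()
        time.sleep(POLL_SECONDS)
```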