Political Intelligence & Narrative Analysis Platform

Client: AI | Published: 20.11.2025

I am building a political intelligence and narrative analysis platform focused on urban Indian constituencies. The system will collect public, legally accessible digital data (social media, news, civic APIs) and process it through an NLP-based intelligence engine to produce:

- Issue clusters
- Sentiment trends
- Influencer mapping
- Localized political insights
- Narrative recommendations

I need a small, highly competent technical team (or individual) to build Module 1 (Data Ingestion) and Module 2 (Intelligence Engine) for an MVP.

This is NOT a generic social media scraper. It is a structured political insights platform similar in concept to Cambridge Analytica, but built entirely with public data, ethical boundaries, and a strict legal compliance framework.

Key Requirements

1. Data Ingestion Layer (Module 1)

Build automated ingestion pipelines for:

Public Social Media
- Twitter/X API (search, timelines, keywords, geo-tagged content)
- YouTube API (video metadata + comments)
- Instagram public posts/comments where permissible

News & Media
- RSS feeds from major national & local publishers
- Optional: scraping digital news portals (must be ToS-safe)

Civic / Public Data
- Open Government Data (OGD) APIs
- Municipal complaint data (where public/available)
- Public parliamentary/assembly data (PRS, MPLADS, etc.)

Pipeline Requirements
- Incremental fetch (avoid duplicates)
- Store raw JSON in S3 (or similar)
- Normalise + enrich data (language tag, location heuristics)
- Push final structured data into a searchable index (Elasticsearch or OpenSearch)
- Modular architecture for adding more connectors later

2. Intelligence Engine (Module 2)

NLP Components
- Topic modeling (BERTopic or similar)
- Sentiment analysis tuned for Indian English/Hinglish
- Entity extraction (political figures, places, issues)
- Clustering of issues into categories (traffic, corruption, pollution, governance, etc.)
- Embeddings + vector search (Milvus/Pinecone/Weaviate)

Influence Mapping
- Build lightweight social graph (retweets, mentions)
- Identify influential accounts and communities
- Compute basic centrality scores

Analytics & Scoring
- Issue heatmaps (topic frequency + sentiment)
- Ward/area-level mapping (geo-tagged or inferred)
- Trend spikes + anomaly detection
- Daily/weekly narrative indicators

Recommendations (Rule/ML Hybrid)
- Match issue clusters + sentiment → suggested messages/themes
- Aggregate insights into a “Daily Brief” JSON object

3. MVP Dashboard (Lightweight)

(Not a full UI, just enough for internal testing)
- Issue clusters, top topics
- Sentiment graphs
- Influencer list
- Trend timeline
- Downloadable daily brief (PDF or JSON)

Tech Stack (Flexible but must be modern)
- Python (FastAPI / Flask) or Node.js for backend
- PostgreSQL + Elasticsearch/OpenSearch
- Vector DB (Weaviate/Pinecone/Milvus)
- AWS/GCP for hosting
- Twitter/X + YouTube official APIs
- NLP stack: transformers, sentence embeddings, BERTopic, spaCy/HF

Ideal Freelancer / Team
- Strong experience with data ingestion & ETL pipelines
- Experience with NLP models, embeddings, clustering
- Experience building searchable data stores (Elastic, OpenSearch)
- Comfortable with APIs, rate limits, and compliance constraints
- Able to propose improvements instead of waiting for instructions
- Must understand ethical boundaries & Indian data context (no private data, no scraping that violates ToS)

Deliverables (MVP)
- Working ingestion pipelines for 3–5 data sources
- Clean, searchable structured dataset
- NLP pipelines (topic clustering, sentiment, entity extraction)
- Influencer mapping module
- Daily brief generator
- Lightweight demo dashboard
- Documentation (architecture + setup)

Milestones
- M1 (2–3 weeks): Data ingestion (Twitter + YouTube + 3 RSS feeds) + storage + basic search.
- M2 (3–4 weeks): NLP engine: clustering, sentiment, entity extraction, vector search.
- M3 (2–3 weeks): Influencer graph + daily brief + simple dashboard.

Total MVP Timeline: 8–10 weeks.

Budget

Flexible based on competence; quality matters more than budget. Quote your best realistic timeline + cost with example projects.
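
To make the "Daily Brief" JSON deliverable more concrete for bidders, here is a rough sketch of the kind of aggregation I have in mind. It is only an illustration, not a spec: the function name `build_daily_brief` and the post fields `issue`, `sentiment` and `ward` are placeholder assumptions, and the real schema is open to your proposal.

```python
import json
from collections import defaultdict
from datetime import date
from statistics import mean


def build_daily_brief(posts, brief_date, top_n=3):
    """Aggregate already-scored posts into a per-issue summary.

    Each post is assumed (hypothetically) to be a dict with the keys
    "issue" (cluster label), "sentiment" (float in [-1, 1]) and "ward".
    """
    by_issue = defaultdict(list)
    for post in posts:
        by_issue[post["issue"]].append(post)

    issues = []
    for issue, items in by_issue.items():
        issues.append({
            "issue": issue,
            "mentions": len(items),
            "avg_sentiment": round(mean(p["sentiment"] for p in items), 2),
            "wards": sorted({p["ward"] for p in items}),
        })

    # Rank by volume first; ties go to the more negative sentiment.
    issues.sort(key=lambda i: (-i["mentions"], i["avg_sentiment"]))
    return {
        "date": brief_date.isoformat(),
        "total_posts": len(posts),
        "top_issues": issues[:top_n],
    }


# Made-up sample data, for illustration only.
sample_posts = [
    {"issue": "traffic", "sentiment": -0.6, "ward": "Ward 12"},
    {"issue": "traffic", "sentiment": -0.2, "ward": "Ward 7"},
    {"issue": "pollution", "sentiment": -0.8, "ward": "Ward 12"},
]
brief = build_daily_brief(sample_posts, date(2025, 11, 20))
print(json.dumps(brief, indent=2))
```

In the actual MVP the inputs would come from the Module 2 pipelines (issue clusters, sentiment scores, ward inference), and the brief would also carry influencer and trend sections.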