The assignment centres on taking live immigration feeds from government databases and shaping them into a continually updated knowledge graph, with a clear upgrade path toward Graph-RAG so an LLM can later query the graph directly.

Phase 1 – Knowledge graph
Data will arrive as real-time or near-real-time streams. I already have authorised access to the government endpoints; your job is to design and code the ingestion, normalisation, and storage layers. A graph database such as Neo4j, TigerGraph, or Amazon Neptune is preferred, but I am open to any engine that supports ACID guarantees and fast traversals. The graph must refresh automatically as new records appear and expose a REST/GraphQL interface for downstream services.

The entities and relationships that must be modelled are:
• Visa applications
• Border crossings
• Residency permits
• Visa change procedures
• Validity periods
• Status transitions
• Eligibility rules

Phase 2 – Graph-RAG enablement
Once the schema is stable, we will add a retrieval layer (LangChain or similar) so that a large language model can run natural-language questions against the graph. Clean embeddings, context windows, and response ranking will all be part of this stage.

Key expectations
• Clean, well-documented code (Python).
• Container-ready deployment scripts (Docker + Compose or Helm).
• Continuous ingestion tests that confirm freshness and integrity.
• A short README explaining how to spin up the stack locally and how to execute sample queries.
• For Phase 2, a demo notebook or endpoint that shows at least three successful Graph-RAG queries returning correct, reference-verified answers.

If you thrive on data engineering, graph schemas, and cutting-edge RAG workflows, this project should be a good fit.
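To make the Phase 1 ingestion/normalisation step concrete, here is a minimal sketch of turning one incoming record into an idempotent, parameterised Cypher upsert. The record fields (`application_id`, `applicant_id`, `status`, `submitted_at`) and the node labels are illustrative assumptions, not the real feed schema; using MERGE rather than CREATE is what lets re-delivered records refresh the graph without creating duplicates.

```python
from dataclasses import dataclass

# Hypothetical raw record shape -- field names are assumptions,
# not the actual government feed schema.
@dataclass
class VisaApplicationRecord:
    application_id: str
    applicant_id: str
    status: str
    submitted_at: str  # ISO-8601 timestamp

def to_cypher(record: VisaApplicationRecord) -> tuple[str, dict]:
    """Normalise one record into an idempotent, parameterised Cypher upsert.

    MERGE matches an existing node or creates it, so re-ingesting the
    same record updates properties instead of duplicating nodes.
    """
    query = (
        "MERGE (a:Applicant {id: $applicant_id}) "
        "MERGE (v:VisaApplication {id: $application_id}) "
        "SET v.status = $status, v.submitted_at = $submitted_at "
        "MERGE (a)-[:SUBMITTED]->(v)"
    )
    params = {
        "applicant_id": record.applicant_id,
        "application_id": record.application_id,
        "status": record.status,
        "submitted_at": record.submitted_at,
    }
    return query, params
```

With the official `neo4j` Python driver, the pair would then be executed as something like `session.run(query, params)` inside a managed transaction; the choice of driver and transaction style is left to the implementer.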
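The "status transitions" entity in the Phase 1 list suggests the graph will need to validate state changes before writing them. A tiny sketch of such a guard, using a hypothetical status vocabulary (the real transition rules would come from the eligibility data in the feed, not a hard-coded map):

```python
# Hypothetical status vocabulary -- the real allowed transitions would be
# derived from the government eligibility rules, not hard-coded here.
ALLOWED_TRANSITIONS = {
    "SUBMITTED": {"UNDER_REVIEW", "WITHDRAWN"},
    "UNDER_REVIEW": {"APPROVED", "REJECTED"},
    "APPROVED": {"EXPIRED", "REVOKED"},
}

def is_valid_transition(current: str, new: str) -> bool:
    """Return True only if the status change is permitted by the map."""
    return new in ALLOWED_TRANSITIONS.get(current, set())
```

Rejected transitions could be routed to a dead-letter queue rather than silently dropped, which keeps the integrity checks in the ingestion tests meaningful.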
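The "continuous ingestion tests that confirm freshness" expectation could be backed by a check like the one below. The lag threshold and the idea of comparing the newest ingested timestamp against the clock are assumptions about how freshness would be defined; the acceptable lag is a project decision.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def is_fresh(last_ingested: datetime,
             max_lag: timedelta,
             now: Optional[datetime] = None) -> bool:
    """Freshness predicate: newest ingested record is within max_lag of now.

    An ingestion test would assert this against the graph's most recent
    record timestamp; max_lag is a project-specific SLA, assumed here.
    """
    now = now or datetime.now(timezone.utc)
    return (now - last_ingested) <= max_lag
```

A pytest suite could call this on a schedule (or in CI against a staging feed) and fail the build when the pipeline falls behind.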
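For the Phase 2 response-ranking requirement, the core idea is scoring graph-derived facts against the user's question and keeping the top-k for the LLM's context window. The sketch below uses a deliberately toy bag-of-words similarity so it is self-contained; a real deployment would swap in a sentence-embedding model (and likely LangChain's retriever abstractions), which is an implementation choice, not something this brief prescribes.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" -- a real system would use a
    # sentence-embedding model here, not token counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_facts(question: str, facts: list[str], k: int = 3) -> list[str]:
    """Return the k fact strings most similar to the question.

    In Graph-RAG, `facts` would be verbalised triples or subgraph
    summaries retrieved from the graph before LLM synthesis.
    """
    q = embed(question)
    return sorted(facts, key=lambda f: cosine(q, embed(f)), reverse=True)[:k]
```

The demo notebook could show this ranking step feeding the top facts into the LLM prompt, with each answer traced back to the graph records that produced it (the "reference-verified" requirement).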