I’m rolling out a website-embedded chatbot that will serve as our first-line customer service agent. The immediate priority is simple: the bot must reliably understand and respond to visitor questions drawn from our internal knowledge base. To make that happen, I need an engineer who can design every layer of a retrieval-augmented generation (RAG) pipeline—embeddings, vector search, reasoning loop, and safety filters—then orchestrate it through the OpenAI API with LangChain (or an equivalent framework). Effective prompt engineering and tool calling are core to the role, as is a clear strategy for handling edge-cases and abusive inputs. You can build in Python or Node.js; whichever you pick, the final service should expose clean endpoints that my frontend team can drop into the site without friction. Experience evaluating or fine-tuning LLMs to tighten accuracy will come in handy as we iterate. Deliverables • Architecture diagram detailing data flow, RAG logic, and safety layers • Production-ready code for ingestion, embedding creation, vector store, and LLM orchestration • Integration hooks plus a minimal HTML/JS snippet that shows the chatbot live on the site • README with setup steps, environment variables, and tuning guidelines Acceptance criteria • ≥ 90 % accuracy on a supplied set of customer queries • ≤ 3 s median response time under light load • No successful prompt-injection attack from the provided red-team list If you’ve shipped something similar, I’d love to see it and hear which libraries or tricks you’d lean on first.