**Project Overview:** I’m developing the **Arabic Lexicon Project (Al-Qamus Al-Dhaki)**, a structured linguistic knowledge base (CSV) containing lexical cards for Arabic words and roots with definitions, explanations, and Quranic, poetic, and Hadith examples. I need **hands-on training** to build a working **RAG (Retrieval-Augmented Generation)** system using **FAISS or Chroma** and **LangChain**.

**Training Objective:**
- Step-by-step practical sessions to build a working prototype using my data.
- By the end of the sessions, I should be able to update the data, run retrievals, and generate Arabic answers independently.

**Training Scope (3–5 Sessions):**
1) Environment setup (Google Colab or VS Code) + library installation.
2) Convert CSV data into documents + create embeddings + build FAISS/Chroma vector store.
3) Test similarity search with Arabic queries.
4) Connect retrieval to an LLM using LangChain (Arabic prompt with grounded answers).
5) Deliver documentation + best practices for maintenance and scaling.

**Trainer Requirements:**
- Proven experience in Python, LangChain, FAISS or ChromaDB, and Embeddings (OpenAI or Sentence-Transformers).
- Prior experience with RAG/NLP projects (GitHub links or examples preferred).
- Clear and patient teaching style (focus on **training**, not just code delivery).
- Understanding of Arabic is a plus (reading or testing Arabic outputs).

**Deliverables:**
- Working notebook/repository + minimal documentation (English preferred).
- Initial vector index built from my sample CSV data.
- Example Arabic queries with accurate grounded answers.

**Logistics:**
- Sessions via Zoom or Google Meet (screen sharing).
- Language: English (Arabic understanding optional).
- Timezone: Asia/Riyadh (GMT+3).
- Payment type: Hourly.
- Suggested rate: 15–30 USD/hour, total 10–15 hours (flexible based on progress).

**Screening Questions:**
1) Link to a previous RAG project (GitHub/Notebook) + your role in it?
2) Do you prefer FAISS or Chroma, and why?
3) What embedding model do you recommend for Arabic data?
4) Estimated hours to build a working prototype with Arabic CSV data?
5) Can you offer a 30-minute trial session to review setup and workflow plan?

**Required Skills:** Python, LangChain, FAISS, ChromaDB, NLP, Embeddings, OpenAI API, LlamaIndex, Machine Learning, Data Engineering
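
**Illustrative Sketch (Scope Steps 2–3):** To make the expected workflow concrete, the CSV-to-documents and similarity-search steps might look roughly like the sketch below. It is only an assumption-laden outline, not the intended implementation: the column names (`word`, `root`, `definition`) and sample rows are hypothetical, and a character n-gram counter with a brute-force cosine search stands in for a real embedding model and a FAISS/Chroma index so that the example runs with the Python standard library alone.

```python
import csv
import io
import math
from collections import Counter

# Hypothetical sample rows; the real CSV's column names may differ.
SAMPLE_CSV = """word,root,definition
كتاب,كتب,ما يكتب فيه ويقرأ
مكتبة,كتب,مكان تحفظ فيه الكتب
علم,علم,إدراك الشيء على حقيقته
"""

def load_documents(csv_text):
    """Turn each CSV row into one text document plus its metadata."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        {"text": f"{r['word']} ({r['root']}): {r['definition']}", "meta": r}
        for r in rows
    ]

def embed(text, n=3):
    """Stand-in 'embedding': character n-gram counts (NOT a real embedding model)."""
    padded = f" {text} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def build_index(docs):
    """Stand-in for a FAISS/Chroma store: a list of (vector, document) pairs."""
    return [(embed(d["text"]), d) for d in docs]

def search(index, query, k=2):
    """Return the k documents most similar to the query (brute-force scan)."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[0]), reverse=True)
    return [doc for _, doc in ranked[:k]]

docs = load_documents(SAMPLE_CSV)
index = build_index(docs)
results = search(index, "مكان الكتب", k=1)
print(results[0]["text"])
```

In the actual sessions, `embed` would be replaced by a multilingual or Arabic-capable embedding model and `build_index`/`search` by the chosen vector store, with the retrieved cards then passed to the LLM prompt in step 4.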