Title: Build POC to Compare Normal RAG vs Graph RAG vs Tree RAG on an Enterprise Knowledge Base

Project Summary:
I need an experienced AI/LLM engineer or small team to build a Proof of Concept that compares 3 retrieval approaches on the same real knowledge base documents:
1. Normal RAG (vector similarity / vector DB)
2. Graph RAG (entity + relationship + graph traversal)
3. Tree RAG (page / heading / section / hierarchy-based retrieval)

The purpose of this POC is not only to make all 3 work, but to compare them fairly on the same documents and the same question set, then recommend which approach works best for which question type.

Main Goal:
Build a working POC that can:
* ingest the same source documents
* create 3 separate indexes from the same documents
* answer questions using each retrieval approach
* run a comparison on the same question set
* generate a final evaluation report with findings and a recommendation

Business Objective:
We want to understand whether our agent/orchestrator should dynamically select:
* the correct knowledge base
* the correct retrieval strategy
based on the user question.

Current Thinking / Expected Architecture:
There are 2 modes in this POC.

1. Runtime mode
For one real user question:
* user asks a question
* orchestrator classifies the question
* system selects a KB
* system selects a retrieval strategy
* the selected retriever fetches evidence
* evidence is normalized
* the same foundation model generates an answer with citations

2. POC comparison mode
For evaluation:
* the same question is intentionally run through all 3 retrieval approaches
* outputs are compared side by side
* a recommendation is created based on real results

Scope of Work:

Phase 1: Start with one KB only
For a fair comparison, begin with one knowledge base only, for example:
* Document 1

Later, the design should be extendable to:
* Document 1
* Document 2
* Document 3

Stage 0: Document Preparation and Index Building
Build 3 indexes from the same source documents.
A. Vector Index for Normal RAG
Expected:
* document parsing
* chunking with overlap
* embedding generation
* vector DB / vector index
* metadata stored for each chunk:
  * source document
  * page number
  * chunk position

B. Graph Index for Graph RAG
Expected:
* define a domain schema
* identify entity types
* identify relationship types
* entity extraction pipeline
* relationship extraction pipeline
* entity linking / canonicalization
* graph storage
* every entity and relationship must store a back reference to its source text

Important: Graph retrieval must not return only triples. It must also ground results back to the original source passages for answer generation.

C. Tree Index for Tree RAG
Expected:
* parse document structure
* detect headings / subheadings / sections / pages
* build a hierarchy like: Document → Chapter → Section → Subsection → Paragraph / Page
* store the hierarchy path and source references

Important: Before Tree RAG indexing, do a document structure audit and clearly report whether the documents are suitable for tree-based retrieval.

Stage 1: Question Analysis and Routing
Build orchestrator/routing logic with these steps in sequence:
1. classify the question type
2. select the KB/domain
3. select the retrieval strategy based on:
  * question type
  * available indexes for the selected KB

Initial routing heuristics:
* factual / semantic question → Normal RAG
* relationship / dependency / multi-hop / comparative question → Graph RAG
* section / heading / page / hierarchy question → Tree RAG
* aggregation question → Graph or Tree depending on document structure; may also need post-retrieval computation

These are only initial heuristics. The POC should validate or correct them.
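To make the expected Stage 1 behavior concrete, the initial routing heuristics could start as a simple lookup with fallbacks. This is only an illustrative sketch: the question-type labels, strategy names, and function signature are assumptions, and the real classifier would likely be an LLM call or trained model rather than a string match.

```python
# Minimal sketch of the Stage 1 routing heuristics.
# All labels and names here are illustrative assumptions.

ROUTING_HEURISTICS = {
    "factual": "normal_rag",
    "semantic": "normal_rag",
    "relationship": "graph_rag",
    "multi_hop": "graph_rag",
    "comparative": "graph_rag",
    "section_reference": "tree_rag",
    # aggregation depends on document structure (see heuristics above)
    "aggregation": "graph_or_tree",
}

def select_strategy(question_type: str, available_indexes: set[str]) -> str:
    """Pick a retrieval strategy, falling back to Normal RAG when the
    preferred index is not available for the selected KB."""
    strategy = ROUTING_HEURISTICS.get(question_type, "normal_rag")
    if strategy == "graph_or_tree":
        strategy = "tree_rag" if "tree_rag" in available_indexes else "graph_rag"
    if strategy not in available_indexes:
        strategy = "normal_rag"
    return strategy
```

The point of starting this simple is that the POC's evaluation results can then confirm or overturn each mapping with data.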
Stage 2: Retrieval Execution

Runtime mode:
* only the one selected retrieval path runs

POC comparison mode:
* all 3 retrieval paths run for the same question

Expected retrieval behavior:

Normal RAG:
* embed the user query
* run vector similarity search
* return top-K chunks with scores and metadata

Graph RAG:
* extract entities from the query
* perform canonicalization / entity linking
* traverse the graph with bounded hops
* retrieve connected nodes / relationships
* ground all results back to source passages
* optional hybrid retrieval support is a plus

Tree RAG:
* match the query against the hierarchy
* navigate headings / section titles / page references
* return section text + hierarchy path + page references

Stage 3: Evidence Normalization
Create a common evidence schema for all 3 approaches. Every retrieved item should be normalized into a structure containing:
* source document
* location in document
* retrieval method
* confidence / relevance score
* retrieved text

Reason: The generation layer and the evaluation layer must consume a common structure regardless of retrieval method.

Stage 4: Answer Generation
Use the same foundation model and the same generation policy across all 3 approaches.

Important: For a fair comparison, keep fixed:
* same FM / LLM
* same prompt template
* same temperature
* same max tokens
* same evidence injection style

The answer must include citations based only on the retrieved evidence.

Stage 5: Logging and Metadata
For every run, capture:
* KB selected
* retrieval method selected
* retrieved evidence
* retrieval latency
* generation latency
* confidence / relevance details
* citations returned

Stage 6: POC Evaluation Harness
Build an evaluation mode where the same tagged question set runs across all 3 approaches.
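The common evidence structure required by Stage 3 could be as small as a dataclass plus one adapter per retriever. The field names and the adapter below are illustrative assumptions, not a mandated schema:

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    """Common evidence record produced by all 3 retrievers (Stage 3).

    Field names are illustrative; the exact schema is up to the implementer.
    """
    source_document: str   # e.g. "document1.pdf"
    location: str          # page number, hierarchy path, or graph node reference
    retrieval_method: str  # "normal_rag" | "graph_rag" | "tree_rag"
    score: float           # confidence / relevance score
    text: str              # retrieved passage passed to the generation layer

def normalize_vector_hit(hit: dict) -> Evidence:
    """Example adapter: map a hypothetical vector-store hit into the schema."""
    return Evidence(
        source_document=hit["metadata"]["source"],
        location=f"page {hit['metadata']['page']}",
        retrieval_method="normal_rag",
        score=hit["score"],
        text=hit["text"],
    )
```

Equivalent adapters for Graph RAG (node/relationship results grounded to source passages) and Tree RAG (section text with hierarchy path) would emit the same dataclass, which is what lets the generation and evaluation layers stay retrieval-agnostic.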
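In comparison mode, the Stage 6 harness is essentially a loop that runs every tagged question through all 3 retrievers with identical generation settings, capturing the Stage 5 logging fields as it goes. A minimal sketch, in which the retriever/generator callables and the row fields are assumptions:

```python
# Sketch of the Stage 6 comparison loop. The retriever and generator
# interfaces and the result fields are illustrative assumptions.
import time

def run_comparison(questions, retrievers, generate):
    """Run every tagged question through all 3 retrieval approaches.

    questions:  list of {"id", "text", "type"} dicts (the tagged question set)
    retrievers: {"normal_rag": fn, "graph_rag": fn, "tree_rag": fn}
    generate:   fn(question_text, evidence) -> answer with citations
                (same FM, prompt template, temperature for all methods)
    """
    rows = []
    for q in questions:
        for method, retrieve in retrievers.items():
            t0 = time.perf_counter()
            evidence = retrieve(q["text"])          # normalized evidence items
            retrieval_latency = time.perf_counter() - t0
            t0 = time.perf_counter()
            answer = generate(q["text"], evidence)
            generation_latency = time.perf_counter() - t0
            rows.append({
                "question_id": q["id"],
                "question_type": q["type"],
                "method": method,
                "answer": answer,
                "retrieval_latency_s": retrieval_latency,
                "generation_latency_s": generation_latency,
            })
    return rows
```

The resulting rows feed directly into the evaluation matrix / comparison sheet, grouped by question type to support the per-type recommendation.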
Question set:
* around 30 to 50 questions
* based on real use cases
* tagged by question type:
  * factual
  * multi-hop
  * comparative
  * section-reference
  * aggregation

Evaluation metrics:
* answer accuracy
* retrieval relevance
* citation quality
* faithfulness / grounding
* completeness
* hallucination rate
* latency
* implementation effort
* maintenance complexity

Nice to have:
* recall measured on a labeled subset
* automated scoring helpers
* evaluation dashboard or comparison sheet

Final Deliverables:
1. Working POC codebase
2. Setup / run instructions
3. Ingestion pipeline for all 3 index types
4. Runtime routing flow
5. POC comparison harness
6. Sample outputs for all 3 approaches
7. Evaluation matrix / comparison sheet
8. Final recommendation report including:
  * strengths and weaknesses of each approach
  * best approach by question type
  * whether dynamic KB + RAG routing is justified
  * suggested production architecture direction

Technical Expectations:
The freelancer should have strong experience in:
* Python
* LLM / RAG systems
* vector databases
* graph databases (Neo4j or equivalent)
* document parsing / PDF processing
* evaluation of GenAI systems
* prompt design for evidence-grounded answering

Preferred experience:
* Graph RAG
* hierarchical / tree-based retrieval
* Bedrock / Azure OpenAI / OpenAI APIs
* LangChain / LlamaIndex / custom pipelines
* citation-grounded QA systems

What I Need in Your Proposal:
Please include:
1. Relevant similar work you have done
2. Your suggested technical stack
3. How you would implement all 3 approaches
4. How you would ensure a fair comparison
5. Estimated timeline
6. Estimated budget
7. Key risks / assumptions
8. Examples of deliverables you would provide

Project Success Criteria:
The project is successful if:
* all 3 retrieval approaches work on the same document set
* outputs can be compared fairly
* the evaluation clearly shows where each approach performs well or poorly
* the final recommendation is backed by data, not theory

Important Notes:
* This is a POC, not a production system
* correctness of the comparison matters more than UI polish
* clean architecture and clear evaluation matter a lot
* documentation is important