Headless AI RAG Agent Development with Image Generation and AG-UI Protocol -- 2

Project Summary

We are seeking a senior developer or team to build a backend-only AI agent (no frontend) for Q&A over a private document base. The agent must connect to document sources (initially SharePoint or Google Drive), index their content, and answer user queries.

The key technical differentiators for this project are:

ag-ui Protocol: All communication with the agent must be handled via the standard ag-ui (Agent-User Interaction Protocol).

Image Artifact Output: Because the agent is backend-only, the API response (via ag-ui) must include not only the synthesized text answer and source links but also dynamically generated PNG files. These images will be "screenshots" of the original document sections (PDF, DOCX, XLSX, etc.) with the relevant text highlighted.

Core Functional Requirements

1. Document Ingestion and Synchronization

Modular Sources: The agent must be able to ingest documents from a single source at a time, configurable via environment variables (secrets, endpoints, etc.).

Initial Connectors: The first two connectors to be developed are for Microsoft SharePoint and Google Drive.

File Format Support: The agent must be able to parse and index the following file types: .pdf, .pptx, .xlsx, .docx, .md, .html, and .txt.

Technical Suggestion (MCPs): We encourage the use of existing Model Context Protocol (MCP) servers (see mcpserver.so) to accelerate the integration with SharePoint and Google Drive. The proponent must state whether they will use these or a custom alternative.

Indexing Process:
- On startup, the agent must perform a full indexing of the document repository.
- It must expose a status endpoint (e.g., /api/v1/status) that indicates the indexing status (e.g., "Connecting", "Indexing {X/Y} docs", "Complete").
- It must have a synchronization mechanism (daily or, ideally, webhook-based) to detect new, modified, and deleted files and update its internal vector database.

2. Q&A Logic (RAG)

The agent will receive queries (prompts) via its ag-ui endpoint. It must search its vector database for the information chunks most relevant to the query. The logic must go beyond returning raw text chunks: it must synthesize a coherent answer from multiple sources.

3. Artifact Generation (The Key Requirement)

Along with the synthesized text answer, the API response must return a list of "matches." For each match, it must provide:
- The link or identifier of the source document.
- A PNG image file generated on the fly.

Image Specifications:
- The image must be a "screenshot" of the relevant section of the original document.
- It must include a visual highlight (e.g., a yellow box) over the exact text that was used.
- The image must be readable.
- If a single match spans two pages (e.g., in a PDF), a single image containing both pages should be generated (or the proponent must justify an alternative).

The proponent must explain their technical strategy for rendering non-visual formats (like .xlsx, .md, .docx) for this image generation.

Technical Architecture Requirements

API Protocol: The agent must act as an ag-ui compatible server. All interaction will be based on this protocol.

Tech Stack: The proponent is free to choose the stack (e.g., Python, LangChain/LlamaIndex, Vector DB) but must specify it clearly in their proposal.

LLM (Language Model): The agent must use OpenRouter as the router for all LLM calls. It must be tested to work with at least one open-source model (developer's choice) available on OpenRouter.

Scalability: The architecture must be designed to handle document bases of any size, from a few small files to thousands of large documents.

Execution: The agent must be executable with uvx.

Deliverables

- The complete Git repository with all the agent's source code.
- A detailed README.md file with:
  - Installation instructions for dependencies.
  - A guide for setting up all environment variables (for the data source, OpenRouter, etc.).
  - Execution instructions (the uvx command).

The scope does not include deployment to a server; the deliverable is the finalized, executable source code.

Information Required in Your Proposal

To be considered, your proposal must include:

Detailed Tech Stack: A list of the technologies (Python, libraries, vector DB) you plan to use.

(CRITICAL) Image Generation Strategy: A detailed technical plan for "Core Functional Requirement #3." How will you convert and render .xlsx, .docx, .md, etc., to generate images with highlights? What specific libraries will you use for this?

Ingestion Approach: Confirm whether you will use the suggested MCPs (from mcpserver.so) for Google Drive/SharePoint or propose a manual, custom integration.

LLM for Testing: Specify which open-source model (via OpenRouter) you will use for development and testing.

Portfolio: Examples of past work related to RAG, AI agents, or complex document processing.

Cost, Timeline, and Terms: A time estimate (in weeks) and the total project cost.

Currency: All quotes must be in Euros (EUR) or US Dollars (USD).
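
Illustrative sketch (not part of the requirements): the status reporting described under "Indexing Process" could be modeled roughly as below. Only the three status strings and the /api/v1/status example come from this brief. The names Phase, IndexingStatus, start, mark_indexed, and as_payload, and the payload keys, are hypothetical and the proponent is free to choose a different design.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    """Indexing phases, named after the example status strings in the brief."""
    CONNECTING = "Connecting"
    INDEXING = "Indexing"
    COMPLETE = "Complete"


@dataclass
class IndexingStatus:
    """Hypothetical in-memory tracker that a status endpoint could read from."""
    phase: Phase = Phase.CONNECTING
    indexed: int = 0
    total: int = 0

    def start(self, total: int) -> None:
        """Call once the source is connected and the document count is known."""
        self.indexed = 0
        self.total = total
        # An empty repository has nothing to index, so it is complete at once.
        self.phase = Phase.INDEXING if total > 0 else Phase.COMPLETE

    def mark_indexed(self) -> None:
        """Record one finished document; switch to Complete after the last one."""
        if self.phase is not Phase.INDEXING:
            raise RuntimeError("mark_indexed() called outside the indexing phase")
        self.indexed += 1
        if self.indexed >= self.total:
            self.phase = Phase.COMPLETE

    def as_payload(self) -> dict:
        """Build a JSON-serializable body (the key names are an assumption)."""
        if self.phase is Phase.INDEXING:
            message = f"Indexing {self.indexed}/{self.total} docs"
        else:
            message = self.phase.value
        return {"status": message, "indexed": self.indexed, "total": self.total}
```

A handler for an endpoint such as /api/v1/status would then return `as_payload()` serialized as JSON. The web framework is left open here because the brief lets the proponent choose the stack.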

Python
