Develop Multimodal AI Assistant

Client: AI | Published: 12.03.2026

I’m building an education-focused assistant that can chat in natural language, study a student’s handwritten notes, listen to spoken questions, and digest lengthy PDFs, then return clear, well-cited explanations. The experience must feel fluid and human-like, so the text-based conversational layer is the first priority; once that core is smooth, we will layer in vision, voice, and document intelligence.

The stack is already sketched out: Python, LangChain to orchestrate tool calling, a vector database for Retrieval-Augmented Generation (RAG), Whisper for speech-to-text, a lightweight text-to-speech module, and a Streamlit front end. Your task is to turn that outline into a working product that can be demoed by non-technical educators.

Deliverables

• Production-ready Python repo with clear separation between chains, model wrappers, and UI.
• Streamlit interface that supports text chat today and can be extended to images, voice, and PDF ingestion tomorrow.
• Automated tests proving the assistant can answer curriculum-style questions after ingesting sample lesson material.
• Concise README plus one-command deployment (Docker or Streamlit Cloud).

When you reply, include a detailed project proposal: architecture diagram, model choices, phased timeline, and any previous work that proves you can ship LLM-powered RAG systems. I’ll review proposals this week and move quickly toward kickoff.
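To make the expected RAG flow concrete, here is a minimal, dependency-free sketch of retrieve-then-generate over ingested lesson material. The helper names (`tokenize`, `relevance`, `retrieve`, `build_prompt`) are illustrative assumptions, not part of any library; a production build would replace the keyword-overlap scoring with embedding similarity served by the vector database and send the assembled prompt to an LLM through LangChain.

```python
# Illustrative sketch only: keyword-overlap retrieval standing in for
# vector-database similarity search in the real RAG pipeline.

def tokenize(text: str) -> set[str]:
    """Lowercase word set with basic punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split()}

def relevance(query: str, doc: str) -> int:
    """Crude relevance score: number of words shared with the query."""
    return len(tokenize(query) & tokenize(doc))

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k ingested snippets most relevant to the question."""
    return sorted(docs, key=lambda d: relevance(query, d), reverse=True)[:k]

def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Ground the model's answer in the retrieved lesson material."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return (
        "Answer the student's question using only the context below, "
        "and cite the snippet you used.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

The same shape carries over once real components are plugged in: only `relevance`/`retrieve` change (vector search), and `build_prompt`'s output goes to the model instead of being returned.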
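The automated-test deliverable can be prototyped the same way: a pytest-style check that the assistant, after ingesting sample lesson snippets, answers a curriculum question with the right fact. The `answer` stub below is a hypothetical placeholder for the real chain, so the test runs before any model is wired in.

```python
def answer(question: str, lessons: list[str]) -> str:
    # Stub assistant: returns the most relevant ingested snippet.
    # The real implementation would feed retrieved context to an LLM.
    def words(text: str) -> set[str]:
        return {w.strip(".,?!").lower() for w in text.split()}
    return max(lessons, key=lambda d: len(words(question) & words(d)))

def test_answers_curriculum_question():
    lessons = [
        "Mitochondria are the powerhouse of the cell.",
        "Paris is the capital of France.",
    ]
    reply = answer("What is the powerhouse of the cell?", lessons)
    assert "mitochondria" in reply.lower()
```

Swapping the stub for the production chain keeps the test unchanged, which is exactly the chains/wrappers/UI separation the repo should enforce.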