I’m building an AI-driven assistant that turns a user’s own furniture idea, whether sent as a written description or a photo, into a full do-it-yourself guide. The goal is that someone can sketch a concept or snap a picture, feed it to the system, and instantly receive:

• A materials and tool checklist
• Step-by-step build instructions in plain language
• Exploded-view images and annotated diagrams generated on the fly
• Helpful tips covering safety, finishing, and common pitfalls

Scope of work

1. Model pipeline: combine vision models (for photo input) and language models (for text input) so the assistant recognizes furniture parts, measurements, and joinery details.
2. Instruction engine: structure output into logical build phases, dynamically adjusting steps for different skill levels.
3. Image generation: create clear assembly visuals and cut lists; any proven solution (Stable Diffusion, Midjourney, or similar) works as long as results are coherent and label-ready.
4. Front-end demo: a lightweight web page or notebook showcasing the two input modes and the polished, illustrated output.
5. Documentation: a succinct setup guide plus in-code comments so I can iterate later.

Tech flexibility

I’m comfortable with Python, so staying within that ecosystem (PyTorch, TensorFlow, OpenAI APIs, etc.) keeps hand-off smooth. If another stack dramatically improves performance, let’s discuss.

Deliverables

• A clean, well-structured code repository
• A deployed or easily runnable demo
• A sample output package for at least three furniture types (e.g., chair, shelf, custom concept) proving the system can generalize
• A brief report on future enhancement options such as AR overlays or metric/imperial toggling

If you have experience fusing vision models with generative text, or have already tackled similar maker-style projects, I’d love to see what you can bring to this build-instruction AI.
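To give a concrete feel for the model pipeline item above, here is a minimal Python sketch of how the two input modes could be routed into one shared representation. The model calls are stubbed out so the routing logic runs standalone; every class and function name here (`PartSpec`, `route_input`, etc.) is my own illustration, not an existing API or a required design.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

@dataclass
class PartSpec:
    """One recognized furniture part (name, count, rough dimensions)."""
    name: str
    quantity: int
    dimensions_mm: Tuple[int, int, int]  # (length, width, thickness)

@dataclass
class FurnitureConcept:
    """Shared output of both input paths; downstream stages consume this."""
    description: str
    parts: List[PartSpec] = field(default_factory=list)

def parse_text_input(text: str) -> FurnitureConcept:
    # Stub for the language-model path; a real version would call an LLM
    # to extract parts, measurements, and joinery details from the text.
    return FurnitureConcept(description=text.strip())

def parse_photo_input(image_bytes: bytes) -> FurnitureConcept:
    # Stub for the vision-model path; a real version would run detection
    # and measurement estimation on the decoded image.
    return FurnitureConcept(description=f"photo input ({len(image_bytes)} bytes)")

def route_input(user_input: Union[str, bytes]) -> FurnitureConcept:
    """Dispatch text vs. photo input to the matching model pipeline."""
    if isinstance(user_input, bytes):
        return parse_photo_input(user_input)
    return parse_text_input(user_input)

if __name__ == "__main__":
    concept = route_input("three-shelf pine bookcase, 1800 mm tall")
    print(concept.description)
```

The point of the shared `FurnitureConcept` type is that the instruction engine and image-generation stages never need to know which input mode produced it.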
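For the instruction-engine item, a rough sketch of how steps could be tagged and filtered by skill level. The skill-tagging scheme (a `max_skill` cap per step, levels 1 through 3) is purely illustrative; the actual scheme is up for discussion.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class BuildStep:
    text: str
    max_skill: int = 3  # shown to skill levels up to this cap; 1 = beginner-only detail

@dataclass
class BuildPhase:
    name: str
    steps: List[BuildStep]

def render_phase(phase: BuildPhase, skill_level: int) -> List[str]:
    """Return the step texts appropriate for a given skill level (1-3).

    Beginners (level 1) see every step; experts skip steps whose cap
    is below their level, i.e. beginner-only hand-holding.
    """
    return [s.text for s in phase.steps if skill_level <= s.max_skill]

if __name__ == "__main__":
    assembly = BuildPhase("Assembly", [
        BuildStep("Drill a pilot hole before driving each screw.", max_skill=1),
        BuildStep("Attach the side panels with 4x40 mm screws."),
        BuildStep("Check the frame is square by comparing diagonal measurements."),
    ])
    for line in render_phase(assembly, skill_level=1):
        print(line)
```

A beginner would get all three steps here, while an expert (level 3) would see only the last two; the same phase data drives both renderings, which is what "dynamically adjusting steps" means in practice.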