I need an intermediate-level, causal multimodal agent that takes raw chest X-ray images, links them with the corresponding clinical notes, and outputs a clear diagnostic pipeline. The workflow must cover three core functions: automatic image analysis, robust disease prediction focused on thoracic findings, and generation of both textual and graphical reports (heat maps, saliency overlays, or similar visual explanations). All processing involves X-rays only; CT and MRI are outside the project scope. The system must ship with fully commented, runnable source code and deliver reliable end-to-end results on a small validation set so I can demonstrate functionality immediately.

Deliverables
• Clean, modular code (Python preferred) that loads chest X-rays, parses the paired report text, and produces causal attention/feature maps.
• A disease prediction module that outputs probability scores plus a concise textual summary.
• A dual-format report creator that writes a human-readable paragraph and embeds the supporting graphics.
• A README with setup instructions and a short demo notebook or script.

Timing & budget
The total budget is fixed at 8,000, with payment released in two milestones: 60% on a feature-complete prototype and 40% after final hand-off and a quick bug-fix round. I need everything wrapped up by the end of next week, so please confirm you can meet the timeline before we proceed.

The core element of the project is causal learning.
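To make the expected shape of the prediction and reporting modules concrete, here is a rough Python sketch of the kind of interface I have in mind. Every name in it (`MultimodalSample`, `DiagnosisReport`, `predict`, the example label set) is a hypothetical placeholder, not a required API, and the scoring logic is a stand-in for the real causal model:

```python
import math
from dataclasses import dataclass

# Illustrative placeholder only: a real system would fuse image features
# and clinical-note embeddings through a causal attention model.

DISEASES = ["cardiomegaly", "effusion", "pneumonia"]  # example label set


@dataclass
class MultimodalSample:
    image: list  # 2-D pixel grid, a stand-in for a loaded X-ray
    note: str    # paired clinical-note text


@dataclass
class DiagnosisReport:
    probabilities: dict  # disease -> probability score
    summary: str         # concise textual summary
    saliency: list       # per-pixel relevance map (graphical explanation)


def softmax(scores):
    """Normalize raw scores into probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(sample: MultimodalSample) -> DiagnosisReport:
    # Dummy per-disease scores derived from the image; placeholder only.
    scores = [sum(row) % 3 for row in sample.image][: len(DISEASES)]
    probs = dict(zip(DISEASES, softmax(scores)))
    top = max(probs, key=probs.get)
    summary = f"Most likely finding: {top} (p={probs[top]:.2f})."
    # Uniform dummy saliency map with the same shape as the input image.
    n = len(sample.image) * len(sample.image[0])
    saliency = [[1.0 / n] * len(row) for row in sample.image]
    return DiagnosisReport(probs, summary, saliency)
```

The dual-format report creator would then consume a `DiagnosisReport`, writing the `summary` paragraph and rendering the `saliency` map as an overlay graphic.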