I'm working with a collection of images paired with captions or other textual metadata, and I need a concise, well-structured Python solution that analyses both components in one pass. The image side involves standard loading and preprocessing; the text side must run sentiment analysis, text classification, and targeted data extraction on the accompanying words. Feel free to use familiar toolkits such as OpenCV or Pillow for the visuals, and spaCy, NLTK, or Hugging Face transformers for the NLP work; use whatever you are most productive with, as long as the dependencies are clearly listed in a requirements.txt.

Deliverables
• A modular Python script (or Jupyter notebook) that ingests a folder of images plus their text, cleans and prepares each modality, and produces:
  1. Structured image features or summaries that I can reuse later.
  2. Sentiment scores, classification labels, and extracted key data points for every text item.
  3. A consolidated CSV or JSON report combining the above outputs.
• Clear inline docstrings, a brief README explaining how to run everything from the command line, and a small sample dataset to prove it works.

Acceptance criteria
• I can install via `pip install -r requirements.txt` without errors.
• Running a single command processes the supplied sample and generates the final report files in an /output directory.
• Results are reproducible on Python 3.9+.

Keep the code readable and commentary light; once the prototype works I'll extend it further myself.
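To make the text-side requirements concrete, here is a minimal stdlib-only sketch of the per-caption analysis. It is an illustration, not the requested implementation: the tiny word lists and keyword rules stand in for a real sentiment model and classifier (NLTK's VADER or a Hugging Face pipeline would replace them), and the names `TextResult` and `analyze_text` are placeholders I have made up.

```python
import re
from dataclasses import dataclass

# Tiny illustrative lexicons; a real run would swap in NLTK's VADER or a
# Hugging Face sentiment pipeline instead of these hand-picked words.
POSITIVE = {"great", "sunny", "beautiful", "happy", "love"}
NEGATIVE = {"broken", "sad", "gloomy", "terrible", "hate"}

@dataclass
class TextResult:
    sentiment: str   # "positive" / "negative" / "neutral"
    label: str       # crude topic label from keyword matching
    dates: list      # extracted ISO-like dates (YYYY-MM-DD)
    numbers: list    # extracted numeric tokens (includes date components)

def analyze_text(text: str) -> TextResult:
    """Run placeholder sentiment, classification, and extraction on one caption."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Lexicon vote: positive hits minus negative hits decides the sentiment.
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    sentiment = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    # Keyword overlap stands in for a trained text classifier.
    label = "outdoor" if {"beach", "mountain", "sunny"} & set(tokens) else "other"
    dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)
    numbers = re.findall(r"\b\d+(?:\.\d+)?\b", text)
    return TextResult(sentiment, label, dates, numbers)
```

Because each caption maps to one structured record, the results drop straight into the consolidated CSV or JSON report described under Deliverables.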
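For the image side and the consolidated report, a sketch along these lines would satisfy the folder-in, report-out shape. It assumes Pillow (one of the toolkits named above); the function names `image_features` and `build_report`, the `*.png` glob, and the `report.json` filename are illustrative choices, not fixed requirements.

```python
import json
from pathlib import Path

from PIL import Image, ImageStat  # Pillow, as suggested in the brief

def image_features(path: Path) -> dict:
    """Extract small, reusable structural features from one image."""
    with Image.open(path) as im:
        rgb = im.convert("RGB")
        stat = ImageStat.Stat(rgb)  # per-channel pixel statistics
        return {
            "file": path.name,
            "width": im.width,
            "height": im.height,
            "mode": im.mode,
            "mean_rgb": [round(v, 2) for v in stat.mean],
        }

def build_report(image_dir: Path, captions: dict, out_dir: Path) -> Path:
    """Pair each image's features with its caption and write one JSON report."""
    out_dir.mkdir(parents=True, exist_ok=True)
    rows = []
    for img_path in sorted(image_dir.glob("*.png")):
        rows.append({
            "image": image_features(img_path),
            "caption": captions.get(img_path.name, ""),
        })
    report = out_dir / "report.json"
    report.write_text(json.dumps(rows, indent=2))
    return report
```

The per-caption analysis results would be merged into each row alongside `"caption"`, and a CSV variant is a straightforward swap via `csv.DictWriter` on flattened rows.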