I have a mixed dataset, numerical and categorical, whose end goal is straightforward: accurate prediction. I need you to take the raw file from first inspection all the way to a validated model I can reuse, while clearly showing me every step that gets us there.

Here’s the flow I expect (illustrative sketches of what I mean follow at the end of this brief). You will:
• Clean the data: handle missing values, encode categories and scale where needed.
• Perform exploratory analysis, visualising key relationships so I can understand what drives the target.
• Engineer features that improve predictive power, and explain why they help.
• Flag any columns that may be sensitive; I’m not yet sure whether anonymisation is required, so please advise and mask if necessary.
• Train several algorithms in Python (pandas, NumPy, scikit-learn; XGBoost or LightGBM welcome), compare them with sound validation, and select the best.
• Evaluate the chosen model on a hold-out set, reporting metrics, feature importance and any limitations.

Deliverables
• A fully runnable Jupyter notebook or .py script with tidy, commented code
• Clear markdown cells or a brief report that explains your methodology and findings
• A README with environment details (requirements.txt / conda YAML) so I can reproduce results on my machine

Acceptance is complete when the notebook runs end-to-end, the model beats a simple baseline, and the documentation lets a non-technical teammate grasp what was done and why.
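To make the cleaning and encoding step concrete, here is a minimal sketch of the shape I have in mind. The file name data.csv and the column name target are placeholders for illustration, and the specific imputation and scaling choices are yours to justify:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Placeholder file and target names; substitute the real ones.
df = pd.read_csv("data.csv")
X, y = df.drop(columns=["target"]), df["target"]

# Split columns by dtype: numeric features get imputation + scaling,
# categorical features get imputation + one-hot encoding.
num_cols = X.select_dtypes(include="number").columns
cat_cols = X.select_dtypes(exclude="number").columns

preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), num_cols),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), cat_cols),
])
```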
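For “compare them with sound validation” and the baseline requirement, something along these lines would satisfy me. This continues from the sketch above and assumes a regression target; swap in classifiers and an appropriate classification metric if the target turns out to be categorical:

```python
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline

# Hold out a final test set before any model selection happens.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The baseline just predicts the training mean; any real model must beat it.
candidates = {
    "baseline": DummyRegressor(strategy="mean"),
    "ridge": Ridge(),
    "random_forest": RandomForestRegressor(random_state=42),
}

for name, model in candidates.items():
    pipe = Pipeline([("prep", preprocess), ("model", model)])
    scores = cross_val_score(pipe, X_train, y_train, cv=5,
                             scoring="neg_root_mean_squared_error")
    print(f"{name}: RMSE {-scores.mean():.3f} ± {scores.std():.3f}")
```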
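And for the hold-out evaluation with feature importance, again continuing from the sketches above, a permutation-importance pass is one model-agnostic option; the choice of final model here is purely illustrative:

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.metrics import mean_squared_error

# Fit the selected pipeline on the full training split.
best = Pipeline([("prep", preprocess),
                 ("model", RandomForestRegressor(random_state=42))])
best.fit(X_train, y_train)

# Final hold-out metric, computed once, after model selection is done.
rmse = np.sqrt(mean_squared_error(y_test, best.predict(X_test)))
print(f"hold-out RMSE: {rmse:.3f}")

# Permute each raw input column on the hold-out set and report the
# ten features whose shuffling hurts the score most.
result = permutation_importance(best, X_test, y_test,
                                n_repeats=10, random_state=42)
ranked = sorted(zip(X_test.columns, result.importances_mean),
                key=lambda t: -t[1])
for col, imp in ranked[:10]:
    print(f"{col}: {imp:.4f}")
```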