Female Safety Data & Modelling

I need a solid foundation of publicly available, text-based tabular datasets on crimes affecting women so I can refine the analytics engine behind my safety web-app. The data should be pulled directly from GitHub, Kaggle and Mendeley Data; if an especially relevant repository appears elsewhere, flag it as an optional extra, but keep the primary emphasis on those three platforms.

Once the datasets are gathered and documented, the workflow continues into model experimentation. I want to see Decision Trees, Random Forests and Support Vector Machines trained on the cleaned data, all coded in Python (pandas, scikit-learn or equivalent). For each model run, report Accuracy, Precision, Recall and F1-Score using stratified cross-validation so we can decide which approach integrates best with the app's pipeline.

Deliverables
• Curated dataset pack (original files + cleaned versions) with a brief README per source
• Pre-processing and feature-engineering scripts/notebooks
• Training notebooks or scripts for the three models mentioned above
• Comparative metric table and a concise recommendation outlining the best-performing model and why

I will plug the chosen model straight into the web-app, so clarity, reproducibility and well-commented code are essential.
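To make the evaluation requirement concrete, here is a minimal sketch of how the three models might be compared with stratified cross-validation in scikit-learn. Everything specific in it is an assumption for illustration: the data is synthetic (generated with make_classification as a stand-in for a cleaned dataset), the function name compare_models is hypothetical, the hyperparameters are library defaults, and macro averaging is one possible choice for Precision, Recall and F1-Score on imbalanced classes.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Column label -> scikit-learn scorer name. Macro averaging is an assumption;
# weighted or per-class scores may suit the real data better.
SCORING = {
    "Accuracy": "accuracy",
    "Precision": "precision_macro",
    "Recall": "recall_macro",
    "F1-Score": "f1_macro",
}


def compare_models(X, y, n_splits=5, seed=42):
    """Return a table of mean cross-validated metrics, one row per model."""
    # Stratified folds keep the class ratio roughly equal in every split.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    models = {
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(random_state=seed),
        # SVMs are scale-sensitive, so scaling is fitted inside each fold.
        "SVM": make_pipeline(StandardScaler(), SVC()),
    }
    rows = {}
    for name, model in models.items():
        scores = cross_validate(model, X, y, cv=cv, scoring=SCORING)
        rows[name] = {m: scores[f"test_{m}"].mean() for m in SCORING}
    return pd.DataFrame.from_dict(rows, orient="index").round(3)


# Synthetic, mildly imbalanced placeholder data (NOT a real crime dataset).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=5,
    weights=[0.7, 0.3], random_state=42,
)
metric_table = compare_models(X_demo, y_demo)
print(metric_table)
```

With real data, X_demo and y_demo would be replaced by the cleaned feature matrix and target column. Wrapping the scaler and SVC in one pipeline keeps scaling statistics from leaking across folds, and the same fitted pipeline object can later be serialized for the web-app.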

Python
