Train PDF Table Classifier

Client: AI | Published: 08.12.2025

I have two PDFs that contain nothing but clean, well-structured tables. Your task is to pull those tables into a workable format, develop a small classifier around them, and then respond to a set of fifteen short questions I’ll share as images once we start. One of those questions simply asks for a screenshot of the model’s performance, so please keep your notebook or console output handy.

The target label is still open: binary or multi-class might make the most sense once you inspect the columns, so I’m happy for you to recommend the final approach after a quick exploratory look. Python with pandas and scikit-learn (or a lightweight TensorFlow/Keras setup) will be perfect, but feel free to lean on whatever tooling you’re most efficient with as long as it’s easily reproducible.

Deliverables
• Cleaned CSV (or pickled DataFrame) version of both PDF tables
• Well-commented code/notebook that trains the classifier and prints key metrics
• Answers to all 15 questions, including the requested performance screenshot

Everything should run end-to-end on a standard Python environment without hidden dependencies. Let me know if you need a sample of the PDFs ahead of time.
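
A minimal sketch of the kind of pipeline this brief describes, using pandas and scikit-learn. Everything specific here is an assumption for illustration: the helper names `pdf_tables_to_frame` and `train_and_report`, the choice of the third-party `pdfplumber` library for extraction, the random-forest model, and the toy columns (`amount`, `region`, `label`) standing in for the real tables, whose columns and target are not yet known.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


def pdf_tables_to_frame(pdf_path):
    """Stack every table found in a PDF into one DataFrame.

    Assumption: each extracted table starts with the same header row.
    Cells come back as strings, so numeric columns still need
    pd.to_numeric before training.
    """
    import pdfplumber  # third-party; imported lazily so the rest runs without it

    frames = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                header, *rows = table
                frames.append(pd.DataFrame(rows, columns=header))
    return pd.concat(frames, ignore_index=True)


def train_and_report(df, target, seed=0):
    """Fit a small classifier on df, print key metrics, return (model, accuracy)."""
    X = df.drop(columns=[target])
    y = df[target]

    # One-hot encode non-numeric columns; numeric columns pass through unchanged.
    categorical = X.select_dtypes(exclude="number").columns.tolist()
    preprocess = ColumnTransformer(
        [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical)],
        remainder="passthrough",
    )
    model = Pipeline(
        [("prep", preprocess), ("clf", RandomForestClassifier(random_state=seed))]
    )

    # Stratified hold-out split so every class appears in the test set.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)

    # This printed report is the sort of output a performance screenshot could capture.
    print(classification_report(y_test, predictions, zero_division=0))
    return model, accuracy_score(y_test, predictions)


# Made-up placeholder data, NOT taken from the PDFs in this brief.
toy = pd.DataFrame(
    {
        "amount": [10, 12, 11, 95, 90, 99] * 10,
        "region": ["north", "south", "north", "south", "north", "south"] * 10,
        "label": ["low", "low", "low", "high", "high", "high"] * 10,
    }
)
model, accuracy = train_and_report(toy, target="label")
toy.to_csv("cleaned_table.csv", index=False)  # hypothetical output filename
```

With the real files, `pdf_tables_to_frame("first.pdf")` (a placeholder path) would replace the toy frame. The same `train_and_report` call handles a binary or a multi-class target, since both the classifier and the report work with any number of classes.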