Python OCR Project – Header Order, Unknown Errors, New Features & Optimization

Budget: $30

I maintain a Python OCR tool that captures data and exports it to Excel, and I need several targeted improvements.

First, the column headers in the spreadsheet must always appear in a strict, predefined sequence, yet the current script shuffles them. That ordering logic needs to be corrected. Second, the OCR occasionally misreads a header and records it as “Unknown.” I want reliable header detection so every expected header is either correctly identified or explicitly flagged.

I’m also expanding the project to cover extra odds categories (Performance, Historical, and Future), so the data model and extraction logic should accommodate those fields cleanly. On top of that, the codebase should move to the latest stable release of PaddleOCR. Please handle any breaking changes, tidy up dependencies, and squeeze out obvious performance gains.

I have representative PDFs/images that reproduce the issues and will share them once we begin.

Deliverables:
• Refactored Python script that enforces the fixed column order.
• Enhanced header-recognition logic with accuracy notes.
• Integration (or clear extension points) for the three new odds categories.
• Updated requirements.txt locking in the upgraded PaddleOCR and related libraries.
• Well-commented code plus a concise README covering setup and testing.

Fluency with PaddleOCR, pandas, openpyxl, and clean Python structuring is essential. Looking forward to collaborating on a solid, maintainable upgrade. The work should be completed within 2–3 days.
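
To make the header requirements above more concrete, here is a minimal sketch of the behavior I have in mind. It is an illustration only, not the existing code. The header names in `EXPECTED_HEADERS`, the helper names `match_header` and `normalize_row`, and the 0.75 similarity cutoff are all assumptions. It uses the standard-library `difflib` for fuzzy matching, and the real solution may well use a different approach.

```python
from difflib import get_close_matches

# Hypothetical header list; the real fixed sequence is defined by the project.
EXPECTED_HEADERS = ["Team", "Performance", "Historical", "Future"]


def match_header(raw, expected=EXPECTED_HEADERS, cutoff=0.75):
    """Map one OCR-read header to an expected header, or None if nothing is close."""
    cleaned = raw.strip().lower()
    lookup = {h.lower(): h for h in expected}
    hits = get_close_matches(cleaned, list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


def normalize_row(raw_row, expected=EXPECTED_HEADERS):
    """Return (ordered_row, flagged).

    ordered_row holds every expected header in the fixed order (None if missing);
    flagged lists raw headers that could not be matched, instead of "Unknown".
    """
    matched = {}
    flagged = []
    for raw_header, value in raw_row.items():
        header = match_header(raw_header, expected)
        if header is None:
            flagged.append(raw_header)  # explicitly flagged for review
        else:
            matched[header] = value
    ordered = {h: matched.get(h) for h in expected}
    return ordered, flagged


# Example with made-up OCR output: misspelled headers and one unreadable header.
row, flagged = normalize_row(
    {"Futur e": "2.1", "Perfomance": "1.8", "Team": "A", "xx##": "?"}
)
```

With pandas, the same fixed order could be applied to a whole table with `df.reindex(columns=EXPECTED_HEADERS)`, which also creates any missing expected columns as empty ones.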
