Document Analysis Web App Development

Budget: $250

I need a web-based application that takes uploaded PDFs, Word files, and scanned images and automatically pulls out key metadata such as name, date, and any other tagged fields I define later. The main goal is accurate, repeatable data extraction rather than broad document classification or summarisation.

The flow I have in mind:
• User drags a document onto the page or sends it via an API endpoint.
• The app detects the file type, performs OCR when necessary, and isolates the requested fields.
• Extracted values are returned as JSON and also stored in a lightweight database for audit and search.

I am open to your choice of stack, though something modern such as Python (FastAPI, Tesseract, PyPDF2), JavaScript/TypeScript (Node, Express, pdf-lib), or similar would be ideal if it keeps deployment straightforward. Whatever you pick, please keep the code clean, modular, and container-friendly (Docker).

Acceptance criteria
1. Upload interface and REST endpoint both functional.
2. Successful extraction of at least the fields “name” and “date” from sample PDFs, DOCXs, and scanned PNG/JPEGs I will supply.
3. JSON output matches a predefined schema and reaches >95% accuracy on the sample set.
4. Clear README with setup, environment variables, and a short demo video or screenshots confirming the above tests.

Let me know which libraries or cloud services you plan to use for OCR and parsing, how long integration will take, and any previous projects that show you can handle mixed document inputs.
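To make the flow above concrete, here is a rough sketch in Python using only the standard library. It is an illustration under assumptions, not a required design: the function names, the regex patterns for "name" and "date", the `extractions` table layout, and the per-field accuracy measure are all hypothetical. OCR and text extraction from PDF/DOCX are left out; the sketch assumes the document text is already available as a string.

```python
import json
import re
import sqlite3

# Magic-byte signatures for the accepted upload types.
# DOCX files are ZIP archives, so "PK\x03\x04" also matches other ZIP-based
# formats; a real check would inspect the archive contents as well.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "docx"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
]


def detect_file_type(data: bytes) -> str:
    """Guess the file type from the leading bytes of the upload."""
    for magic, kind in SIGNATURES:
        if data.startswith(magic):
            return kind
    return "unknown"


def needs_ocr(kind: str) -> bool:
    """Images always need OCR. A scanned PDF without a text layer would
    too, but detecting that is omitted from this sketch."""
    return kind in {"png", "jpeg"}


# Hypothetical patterns: assumes a "Name: ..." line and an ISO-style date.
FIELD_PATTERNS = {
    "name": re.compile(r"^Name:\s*(.+)$", re.IGNORECASE | re.MULTILINE),
    "date": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
}


def extract_fields(text: str) -> dict:
    """Return one value per configured field, or None when not found."""
    result = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[field] = match.group(1).strip() if match else None
    return result


def store_result(conn: sqlite3.Connection, filename: str, fields: dict) -> str:
    """Persist the extracted fields as JSON for audit and return the JSON."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS extractions ("
        "id INTEGER PRIMARY KEY, filename TEXT, payload TEXT)"
    )
    payload = json.dumps(fields, sort_keys=True)
    conn.execute(
        "INSERT INTO extractions (filename, payload) VALUES (?, ?)",
        (filename, payload),
    )
    conn.commit()
    return payload


def field_accuracy(predicted: list, expected: list) -> float:
    """Share of (document, field) pairs whose value matches exactly.
    This is one possible reading of the accuracy criterion."""
    total = correct = 0
    for pred, exp in zip(predicted, expected):
        for field, value in exp.items():
            total += 1
            correct += pred.get(field) == value
    return correct / total if total else 0.0
```

In a real build, the text passed to `extract_fields` would come from a PDF/DOCX parser or an OCR engine, and `field_accuracy` could be run over the supplied sample set to check the 95% threshold.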

Python
