OCR Blood Test Extractor

I have a set of blood-test reports that arrive as PDFs, and I need an accurate, repeatable way to extract only the test result section from each file. Patient demographics and doctor’s notes can be ignored; my focus is strictly on the numerical results, reference ranges, and units.

Here’s what I’m looking for:

• A lightweight script or small desktop tool (.NET Core + Tesseract, AWS Textract, or any engine you prefer) that ingests multi-page PDF blood panels and returns structured data (CSV or JSON is fine).
• Clear mapping of the extracted fields to their respective test names as they appear in the PDF.
• Reliability across differing lab layouts; most follow similar tables, but spacing and fonts vary.

Acceptance criteria:

1. Feed in a sample batch of 10 PDFs and receive one consolidated CSV with no missing result values.
2. Numeric accuracy within ±1 of the values shown on the page when spot-checked.
3. Delivery of source code and a brief read-me so I can run it locally on Windows.

If you’ve tackled medical OCR before, even better: please mention the toolkit you plan to use and any comparable projects. I’m ready to start as soon as I find the right approach.
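To make the acceptance criteria above concrete, here is a minimal sketch of the step that runs after OCR, written in Python. It assumes the OCR engine (Tesseract, Textract, or similar) has already turned each PDF into plain text, and that every result row lands on one line in the order "test name, value, unit, low - high reference range". That layout is an assumption; real reports vary, which is exactly the reliability concern listed above. All names here (`ROW_PATTERN`, `parse_result_line`, `extract_results`, `to_csv`, `missing_tests`, `within_tolerance`) are hypothetical and only illustrate filtering result rows, consolidating them into one CSV, and checking the "no missing values" and "±1" criteria.

```python
import csv
import io
import re

# Hypothetical pattern for one OCR'd result row, e.g. "Hemoglobin 13.5 g/dL 12.0 - 16.0".
# ASSUMPTION: name, value, unit and reference range sit on a single line in that order.
# Lines containing ':' (demographics, notes) never match because ':' is not allowed in the name.
ROW_PATTERN = re.compile(
    r"^(?P<test>[A-Za-z][A-Za-z0-9 ,()/%-]*?)\s+"
    r"(?P<value>\d+(?:\.\d+)?)\s+"
    r"(?P<unit>\S+)\s+"
    r"(?P<low>\d+(?:\.\d+)?)\s*-\s*(?P<high>\d+(?:\.\d+)?)$"
)

FIELDS = ["file", "test", "value", "unit", "ref_low", "ref_high"]


def parse_result_line(line):
    """Return a dict for a result row, or None for any other line."""
    match = ROW_PATTERN.match(line.strip())
    if match is None:
        return None
    row = match.groupdict()
    return {
        "test": row["test"].strip(),
        "value": float(row["value"]),
        "unit": row["unit"],
        "ref_low": float(row["low"]),
        "ref_high": float(row["high"]),
    }


def extract_results(file_name, ocr_text):
    """Keep only result rows from one report's OCR text, tagged with the source file name."""
    rows = []
    for line in ocr_text.splitlines():
        parsed = parse_result_line(line)
        if parsed is not None:
            rows.append({"file": file_name, **parsed})
    return rows


def to_csv(rows):
    """Write rows gathered from many reports into one consolidated CSV string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def missing_tests(rows, expected_tests):
    """List expected test names that produced no extracted value."""
    found = {row["test"] for row in rows}
    return [name for name in expected_tests if name not in found]


def within_tolerance(extracted, on_page, tolerance=1.0):
    """Spot-check helper for the ±1 numeric accuracy criterion."""
    return abs(extracted - on_page) <= tolerance
```

For a batch of 10 reports, one would call `extract_results` once per OCR'd file, concatenate the returned lists, and pass them to `to_csv` once; `missing_tests` then flags any file where a known panel test failed to parse. A regex keeps the sketch dependency-free, but table-aware output (for example, Textract's table blocks) would likely cope better with the varying spacing mentioned above.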
