Datasheet Image Extraction Project

Бюджет: 70 $

I have a backlog of roughly 6,000 PDF datasheets whose visual content—photos, technical tables, block diagrams, graphs—must be extracted and saved as standalone JPEGs. Every visual element should come out as its own file, preserved at the original dimensions. Where a source image looks dull, noisy, or overly compressed, please clean it up just enough to look sharp on an e-commerce product page; otherwise leave the image untouched so we keep true-to-source resolution. I am open to whatever toolkit you find fastest and most reliable—Python (PyPDF2, pdf2image), Adobe Acrobat scripts, command-line utilities such as pdfimages—so long as the workflow scales without manual bottlenecks and the output stays consistent. Deliverables • A folder structure that mirrors the original PDF set, with each visual element saved as an individual JPEG. • File names that clearly map back to the parent datasheet (e.g., SKU_page-03_diagram-A.jpg). • Light enhancement only on low-quality originals; no upscaling for its own sake. • A brief read-me outlining the extraction pipeline so I can reproduce or extend it later. I will supply a batch of sample PDFs for you to prove the process before we roll through the full archive.

Python

Регистрация