20MB Large PDF Compression

I’m looking for a practical way to squeeze our 2 500-page sales-contract PDFs below the 20 MB ceiling that our AI compliance engine enforces. Standard “reduce file size” tricks still leave us hovering around 250 MB, so I need a smarter, repeatable approach. You can build the solution in Python—my preferred environment—using whatever blend of libraries makes sense (PyPDF2, Ghostscript, qpdf, OCR-friendly image down-sampling, etc.). The end product should: • Consistently push multi-thousand-page PDFs under 20 MB • Preserve text and key tables well enough for the AI to parse (moderately readable is fine; crisp print quality is not required) • Run from the command line or a small API wrapper so the team can drop files in and retrieve the compressed versions without tinkering each time Deliverables 1. Clean, documented Python script (or small package) with all dependency notes 2. A one-page “how-to” covering installation, usage, and typical compression ratios 3. Before-and-after sample using one of our contracts to prove the 20 MB target is hit When you reply, show a quick example of similar PDF or document-compression work you’ve done so I can gauge fit right away. If you have clever ideas—incremental compress-then-linearize passes, mixed-resolution strategies, or leveraging OCR layers—feel free to mention them; I’m open to the most reliable path that meets the size limit.

Python

Registration