Automate PDF Report Splitting

I have one large PDF of 139 report pages that I need broken into individual files. Each resulting PDF must be named exactly in this pattern: YYYY-MM-DD_author_type. The date you use comes from the report header—not the file’s metadata—so the script or workflow has to read that header text, extract the correct date, and apply it to the filename together with the author and the word “report” (or the type you find in the header). Once the 139-page test set is proven accurate, I will supply another batch of roughly 3,900 pages that needs the same treatment. These are all standard reports with the date positioned consistently in the header. Acceptance criteria • Every report page saved as its own PDF • Filenames follow the exact YYYY-MM-DD_author_type pattern • Date pulled from the header, not creation metadata • No pages missed, merged, or renamed incorrectly Feel free to use Python (PyPDF2, PDFMiner, pdfplumber), Acrobat JavaScript, or any reliable PDF-handling tool you prefer; all I need is a repeatable method plus the correctly split files. Let me know how quickly you can turn around the 139-page sample and what approach you plan to take.

Python

Реєстрація