PDF Scraping & AI Translating in WordPress/PHP+MySQL

Client: AI | Published: 03.10.2025
Budget: $750

Large-Scale PDF Scraping, AI Translation & WordPress/PHP Website Development

We’re looking for an experienced developer to build a scalable system that can automatically scrape, AI-translate, and publish a very large volume of PDF documents as SEO-friendly web pages.

1. Data Extraction & Processing
• Automatically scrape and download all PDF files from a publicly available website.
• Extract text content from each PDF.
  1a. [Pay special attention to extracting the title of each document (this will become the article title; see step 2, Content Publishing).]
  1b. [Remove any personal data (especially from the first pages of the documents) so that such information is not extracted or published.]
• Translate extracted text using AI (or another reliable translator).
________________________________________
2. Content Publishing
• For each translated file, create a new article/page on the website. Each PDF file => AI translation => one SEO-friendly page.
• Technology: WordPress or custom PHP/MySQL solution.
• Text must be stored in the database (not as iFrames) for full SEO rendering. The complete text must be visible as standard HTML text.
________________________________________
3. SEO & Indexing
• Auto-generate unique meta titles and meta descriptions for every page (fully crawlable, indexable).
• Use clean, descriptive URLs (e.g. /category/document-title-keywords).
• Each page should include: title, tags, meta description, full HTML/text content.
• Implement an XML sitemap.
________________________________________
4. Security & Reliability
• Anti-scraping & anti-DDoS protection.
• DMCA/copyright system.
________________________________________
5. Performance Targets
• Fast page load times and mobile-first responsive design.
• Page load time: under 2.5 seconds (desktop & mobile).
• Core Web Vitals score: 90+ (Google PageSpeed Insights).
• TTFB: under 500 ms.
________________________________________
6. Search & Navigation
• Search bar with filters (categories, tags, keywords).
• Fast search results with filtering options.
• Browsing by category.
• Support for multiple category levels (category, subcategory, sub-subcategory).
• All pages must be free to read and browse for all visitors.
________________________________________
7. Scalability
• Implement a scalable architecture to handle a large volume of content efficiently.
• The system/script must be capable of automatically scraping/downloading PDFs, AI-translating, and publishing the initial 220,000 PDF titles/files into indexable web pages upon launch.
________________________________________
Payment Terms
• 100% of the payment will be placed in Escrow on Freelancer.com.
• Payment will be released only after the project is fully functional on the live server and all requirements are met.
• Proof required: production-ready site, having all of the initial 220,000 documents, meeting requirements including Google PageSpeed Insights Core Web Vitals score of 90+ on the Document Page CPT (Custom Post Type).
________________________________________
Deliverable
A complete, production-ready website/system meeting all the above requirements. The system must be built in such a way that it can be easily customized to repeat the process with other websites as well.

!!!!! PLEASE READ BEFORE BIDDING !!!!!
Do not bid if you do not have the skills to complete this project. Do not bid if you have never done this before. This should be a simple project for someone who knows what they are doing.

MOST IMPORTANT: Before placing any bid, you must contact me privately to receive the link to the website from which the PDF documents will be scraped and downloaded. Do not place a bid before contacting me.

TO APPLY:
- Place your real bid amount, not a placeholder. I do not want to waste time renegotiating. Time-wasters, please do not bid. Place a real bid amount for this project, not a random sum, and do not ask for more money later. No generic bids. Bid what you actually want me to pay you. I will choose based on the content of your bid.
- Please DO NOT bid if you haven’t read the full job description. Please start your proposal with the phrase "The sun is pink" in the first line of your proposal; otherwise, it will not be considered. This is to confirm that you have read the full description. My time is just as important as yours, and I don’t want us to waste each other’s time.
- Please DO NOT send copy-paste automated messages or automated bids.

Questions & Clarifications
Ask any questions or request clarifications before placing your bid. Do NOT ask questions or request clarifications after bidding.

[[[[[Everything above is required for your bid to be considered]]]]]
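
To illustrate sections 3 and 7 at the 220,000-page scale, here is a minimal, non-binding sketch of how clean URL slugs, meta descriptions, and a split XML sitemap might be generated. It is written in Python purely for illustration (the site itself is expected to be WordPress or PHP/MySQL). The function names, the example.com URLs, the sitemap-N.xml file naming, the 8-word slug cap, and the 155-character description length are all assumptions, not requirements from this posting. The 50,000-URL cap per sitemap file comes from the sitemaps.org protocol, which is why 220,000 pages need several sitemap files plus an index.

```python
import re
import unicodedata
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemaps.org protocol
XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n'
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def slugify(title: str, max_words: int = 8) -> str:
    """Turn a document title into a lowercase, hyphen-separated ASCII slug."""
    # Decompose accented letters and drop the accents ("É" becomes "E").
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    words = re.findall(r"[a-z0-9]+", ascii_title.lower())
    return "-".join(words[:max_words])


def meta_description(text: str, limit: int = 155) -> str:
    """Collapse whitespace and cut at a word boundary, staying within `limit`."""
    flat = " ".join(text.split())
    if len(flat) <= limit:
        return flat
    # Reserve three characters for the trailing "...".
    cut = flat[: limit - 3].rsplit(" ", 1)[0]
    return cut.rstrip(",;:.") + "..."


def sitemap_files(urls: list[str], per_file: int = MAX_URLS_PER_SITEMAP) -> list[str]:
    """Split URLs into sitemap documents of at most `per_file` entries each."""
    files = []
    for start in range(0, len(urls), per_file):
        entries = "".join(
            f"  <url><loc>{escape(u)}</loc></url>\n"
            for u in urls[start : start + per_file]
        )
        files.append(
            f'{XML_HEADER}<urlset xmlns="{SITEMAP_NS}">\n{entries}</urlset>\n'
        )
    return files


def sitemap_index(base_url: str, count: int) -> str:
    """Build a sitemap index pointing at sitemap-1.xml .. sitemap-<count>.xml."""
    entries = "".join(
        f"  <sitemap><loc>{escape(base_url)}/sitemap-{i}.xml</loc></sitemap>\n"
        for i in range(1, count + 1)
    )
    return (
        f'{XML_HEADER}<sitemapindex xmlns="{SITEMAP_NS}">\n{entries}</sitemapindex>\n'
    )


if __name__ == "__main__":
    # Hypothetical URLs standing in for the 220,000 published document pages.
    urls = [f"https://example.com/category/doc-{i}" for i in range(220_000)]
    files = sitemap_files(urls)
    print(len(files))  # 5 files: four full ones and one with 20,000 URLs
    print(slugify("Évaluation of the  Annual Report (2024)!"))
```

In a WordPress build, the same ideas would more likely be handled by the post slug, an SEO plugin, or the built-in sitemap feature; the sketch only shows the logic any such solution has to cover.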