Full-Text Document Search Engine

I need a self-hosted search engine that can crawl and index a mix of PDF, Word, and plain-text files, then return accurate full-text results almost instantly. Once the index is built, users must be able to:

• Enter a query and see matches ranked by relevance by default, with an optional toggle to sort by date.
• Narrow results by file type so they can quickly focus on just PDFs, DOCX files, or TXT notes.

A lightweight web interface or a small REST API is fine; choose whichever you expect to give the fastest, most reliable response times.

I am comfortable provisioning a Linux server, so feel free to use Elasticsearch, Apache Lucene/Solr, or another open-source stack you trust. In your proposal, explain why you picked it and list any helper libraries (for example, Apache Tika for document parsing).

Deliverables

1. Source code and a setup script or container so I can deploy with a single command.
2. A clear README covering prerequisites, indexing instructions, and how to enable the sort and filter controls.
3. A small test dataset plus test cases that demonstrate search, date sorting, relevance scoring, and file-type filtering.

A document search engine is a system that indexes and searches unstructured and semi-structured documents such as PDFs, Word files, text files, and scanned documents, allowing users to quickly find relevant information. If the first pass runs smoothly, I may extend the project with OCR support and user-level permissions later on.
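As one possible illustration of the two search controls, the Python sketch below builds an Elasticsearch Query DSL request body, assuming Elasticsearch is the chosen stack. Relevance ranking is the default, date sorting is an opt-in flag, and the file-type filter sits in filter context so it narrows the results without changing their scores. The function `build_search_body` and the field names `content`, `file_type`, and `modified` are hypothetical; a real index mapping may use different names.

```python
from typing import Any, Optional, Sequence

# Hypothetical field names (assumptions); the real index mapping would define these.
CONTENT_FIELD = "content"       # extracted full text of the document
FILE_TYPE_FIELD = "file_type"   # keyword field holding "pdf", "docx" or "txt"
DATE_FIELD = "modified"         # date field, e.g. the file's last-modified time

ALLOWED_TYPES = {"pdf", "docx", "txt"}


def build_search_body(
    text: str,
    file_types: Optional[Sequence[str]] = None,
    sort_by_date: bool = False,
    size: int = 10,
) -> dict[str, Any]:
    """Build a hypothetical Elasticsearch Query DSL body for one search request."""
    bool_query: dict[str, Any] = {
        # Full-text match on the document text; this clause contributes to _score.
        "must": [{"match": {CONTENT_FIELD: text}}],
    }

    if file_types:
        types = sorted({t.lower() for t in file_types})
        unknown = set(types) - ALLOWED_TYPES
        if unknown:
            raise ValueError(f"unsupported file types: {sorted(unknown)}")
        # Filter context: restricts the hit set but does not affect relevance scores.
        bool_query["filter"] = [{"terms": {FILE_TYPE_FIELD: types}}]

    body: dict[str, Any] = {"query": {"bool": bool_query}, "size": size}

    if sort_by_date:
        # Newest first; _score breaks ties between documents with the same date.
        body["sort"] = [{DATE_FIELD: {"order": "desc"}}, "_score"]
    # With no "sort" key, Elasticsearch orders hits by _score (relevance), descending.
    return body
```

For example, `build_search_body("invoice", file_types=["pdf"], sort_by_date=True)` returns a body that could be sent to an index's `_search` endpoint with any HTTP or Elasticsearch client. Keeping the file-type restriction out of the scored `must` clause means toggling the filter removes hits but never changes how the remaining ones are scored.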

Python
