Vision-Based UI Bot for Form Processing

Customer: AI | Published: 07.04.2026
Budget: $300

Build a cross-platform desktop bot that uses computer vision (template matching) to automate web form filling and data scraping. It is controlled through a web-based dashboard and must not use Selenium or browser drivers.

We need a vision-based UI automation bot capable of navigating real browsers (Chrome, Firefox, Brave, Opera) installed on the user's machine. The bot will fill web forms and scrape report data using image-based template matching. A local web UI will serve as the control dashboard where operators input data, monitor progress, and handle errors manually when needed.

Core requirements

1. Cross-platform support
Must run natively on Windows, Linux, and macOS without platform-specific hacks.

2. Native browser control
Open and control the locally installed browser (Chrome, Firefox, Brave, or Opera) at the UI level. Selenium and WebDriver-based solutions will not be accepted.

3. Template-based visual navigation
Navigate the target websites and perform the predefined operations using template/image matching. We will provide the reference images for each action step.

4. Web-based operator dashboard
A local web UI where operators enter the data used to fill the forms. The dashboard must display a live count of inputs submitted and automations completed successfully.

5. Step-level error handling with manual override
If an error occurs during automation, the bot pauses at the failed step and sends a real-time notification in the web UI. The operator completes that step manually, then the bot continues automatically from the next step.

6. Structured CSV output
Export results as a structured CSV in which the input data occupies the first columns and the scraped output data follows in the subsequent columns.

We are looking for

✓ Hands-on experience with computer vision and UI automation libraries (e.g. OpenCV, PyAutoGUI, Playwright)
✓ Ability to build a local web dashboard to drive and monitor bot activity
✓ Solid understanding of OS-level UI automation across Windows, Linux, and macOS
✓ Experience with real-time notifications and step-level error recovery logic
✗ Do not apply if your solution relies on Selenium, WebDriver, or any headless browser API

When applying, please describe your proposed technical approach, especially how you plan to handle template matching across different screen resolutions and OS environments. Include any relevant demos or past work.
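
To make the pause-and-resume behaviour in the error-handling requirement concrete, here is a minimal Python sketch of one possible control loop. It is an illustration under assumptions, not a prescribed design: the `StepRunner` class, the `notify` callback (standing in for the web UI notification), and the `operator_done` method are all hypothetical names, and each step is assumed to be a plain callable that raises an exception on failure.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class StepRunner:
    """Runs named steps in order; pauses on a failed step until the operator confirms."""

    # Hypothetical shape: (step name, callable that raises on failure).
    steps: List[Tuple[str, Callable[[], None]]]
    # Hypothetical hook that would push a notification to the web UI.
    notify: Callable[[str, str], None]
    resume_event: threading.Event = field(default_factory=threading.Event)
    completed: List[str] = field(default_factory=list)  # steps the bot finished
    manual: List[str] = field(default_factory=list)     # steps the operator finished

    def run(self) -> None:
        for name, action in self.steps:
            try:
                action()
                self.completed.append(name)
            except Exception as exc:
                # Pause at the failed step and tell the operator what went wrong.
                self.resume_event.clear()
                self.notify(name, str(exc))
                # Block until the operator reports the step was done by hand.
                self.resume_event.wait()
                self.manual.append(name)
                # The loop then continues with the next step.

    def operator_done(self) -> None:
        """Called (e.g. from a dashboard button handler) to resume the bot."""
        self.resume_event.set()
```

In a real bot, `run` would execute in a worker thread while the dashboard calls `operator_done` from its request handler. The `completed` and `manual` lists could also feed the live counters on the dashboard.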
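
The column order asked for in the CSV output requirement (input data first, scraped output after) could look like the following Python sketch, which uses only the standard `csv` module. The function name `write_results`, the record shape with `inputs` and `outputs` keys, and the sample field names are assumptions made for illustration. Input and output field names are also assumed not to collide.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO


def write_results(
    records: Iterable[Dict[str, Dict[str, str]]],
    input_fields: List[str],
    output_fields: List[str],
    stream: TextIO,
) -> None:
    """Write one row per record: input columns first, scraped output columns after."""
    writer = csv.DictWriter(
        stream,
        fieldnames=list(input_fields) + list(output_fields),
        restval="",             # leave a cell empty if a value was not scraped
        extrasaction="ignore",  # drop keys that are not declared columns
    )
    writer.writeheader()
    for record in records:
        # Assumption: each record carries an "inputs" dict and an "outputs" dict.
        writer.writerow({**record["inputs"], **record["outputs"]})


if __name__ == "__main__":
    buffer = io.StringIO()
    write_results(
        [{"inputs": {"name": "Alice", "account_id": "A-100"},
          "outputs": {"report_total": "42.50"}}],
        input_fields=["name", "account_id"],
        output_fields=["report_total"],
        stream=buffer,
    )
    print(buffer.getvalue())
```

Declaring the column lists up front keeps the header stable even when a run fails partway and some output values are missing.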