Vision-Based Browser Automation System Development

Budget: $750

This project is a full-stack system built around a vision-based browser automation bot. The bot visits three target websites, fills in multi-step forms on each of them using data provided by an operator, scrapes the results returned after each submission, and stores everything in a database. The operator interacts with the system only through a web interface and never touches the bot or the code directly.

The system has five parts, all of which need to be built and connected. A React frontend is where the operator submits input data and monitors what the bot is doing in real time. A FastAPI backend sits behind the frontend and handles communication between the operator, the bot, and the database. Redis acts as the message layer between the backend and the bot, passing events such as run triggers, status updates, CAPTCHA alerts, and completions. PostgreSQL stores all input and output data with full traceability between them. The automation service is the bot itself, built with MSS for screen capture, OpenCV for image matching, OCR for reading text from the screen, and PyAutoGUI for simulating mouse and keyboard input.

The bot does not interact with HTML at any point. It works entirely by looking at the screen, the same way a human would. For every section on every page it visits, it must follow a strict five-step cycle. First, it takes a screenshot and verifies that it is in the correct section before doing anything else. Second, it selects that section and verifies again. Third, it fills in the required field using only mouse and keyboard input. Fourth, it takes another screenshot to confirm the field is no longer empty. Fifth, it validates one more time before moving on. If any step fails, the bot returns to a defined earlier step and retries. There are no shortcuts and no fallback methods.
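Redis is described as carrying four kinds of events between the backend and the bot. One way to keep both sides in agreement is a single JSON envelope that each side serializes and parses the same way. The sketch below assumes such an envelope; the class names, field names, and the per-run `run_id` are hypothetical and not part of the specification.

```python
from __future__ import annotations

import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import Any


class EventType(str, Enum):
    # The four event kinds named in the project description.
    RUN_TRIGGER = "run_trigger"
    STATUS_UPDATE = "status_update"
    CAPTCHA_ALERT = "captcha_alert"
    COMPLETION = "completion"


@dataclass
class BotEvent:
    type: EventType
    run_id: str                                   # hypothetical: ties events to one bot run
    payload: dict[str, Any] = field(default_factory=dict)
    event_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    created_at: float = field(default_factory=time.time)

    def to_json(self) -> str:
        """Serialize to the JSON string that would be sent over Redis."""
        data = asdict(self)
        data["type"] = self.type.value            # store the plain string value
        return json.dumps(data)

    @classmethod
    def from_json(cls, raw: str | bytes) -> BotEvent:
        """Rebuild an event from a received message (str or bytes)."""
        data = json.loads(raw)
        data["type"] = EventType(data["type"])
        return cls(**data)
```

With the redis-py client, the backend could send an event with `Redis.publish(channel, event.to_json())` and the bot could rebuild it with `BotEvent.from_json(...)` when a message arrives; the channel name is left to the implementer.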
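"Full traceability" between input and output can be read as: every scraped result row carries a foreign key to the operator input that produced it. The sketch below shows that shape using Python's built-in `sqlite3` with an in-memory database as a stand-in for PostgreSQL, so it runs without a server. The table and column names are hypothetical; in PostgreSQL you would more likely use `JSONB` columns and a driver such as psycopg.

```python
import json
import sqlite3

# Hypothetical schema: every scraped result points back to the operator input
# that produced it, so any output row can be traced to its source.
SCHEMA = """
CREATE TABLE input_record (
    id        INTEGER PRIMARY KEY,
    site      TEXT NOT NULL,
    form_data TEXT NOT NULL            -- operator input, JSON-encoded
);
CREATE TABLE scrape_result (
    id           INTEGER PRIMARY KEY,
    input_id     INTEGER NOT NULL REFERENCES input_record(id),
    scraped_data TEXT NOT NULL         -- result read back after submission
);
"""


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def save_input(conn: sqlite3.Connection, site: str, form_data: dict) -> int:
    cur = conn.execute(
        "INSERT INTO input_record (site, form_data) VALUES (?, ?)",
        (site, json.dumps(form_data)),
    )
    return cur.lastrowid


def save_result(conn: sqlite3.Connection, input_id: int, scraped: dict) -> int:
    # Fails with IntegrityError if input_id does not exist, so no orphan results.
    cur = conn.execute(
        "INSERT INTO scrape_result (input_id, scraped_data) VALUES (?, ?)",
        (input_id, json.dumps(scraped)),
    )
    return cur.lastrowid


def trace(conn: sqlite3.Connection, result_id: int) -> tuple:
    """Return (site, operator input, scraped output) for one result row."""
    row = conn.execute(
        """SELECT i.site, i.form_data, r.scraped_data
           FROM scrape_result r JOIN input_record i ON i.id = r.input_id
           WHERE r.id = ?""",
        (result_id,),
    ).fetchone()
    site, form_data, scraped = row
    return site, json.loads(form_data), json.loads(scraped)
```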
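The "verify it is in the correct section" checks rest on template matching: slide a client-provided reference image over the screenshot and score every position. The dependency-free sketch below computes the sum-of-squared-differences score (the idea behind OpenCV's `cv2.TM_SQDIFF` method) on tiny grayscale grids, purely to show the computation. In the bot itself the screenshot would come from MSS and the matching would be done by `cv2.matchTemplate`, which is far faster; the helper names and the `max_score` threshold are hypothetical and would need tuning against the client's reference images.

```python
from typing import Sequence

Image = Sequence[Sequence[int]]  # grayscale pixels, row-major: image[y][x]


def best_match(screen: Image, template: Image) -> tuple:
    """Slide template over screen; return (lowest mean squared difference, (x, y))."""
    sh, sw = len(screen), len(screen[0])
    th, tw = len(template), len(template[0])
    best = (float("inf"), (-1, -1))  # stays at inf if template is larger than screen
    for y in range(sh - th + 1):
        for x in range(sw - tw + 1):
            ssd = 0
            for ty in range(th):
                row, trow = screen[y + ty], template[ty]
                for tx in range(tw):
                    d = row[x + tx] - trow[tx]
                    ssd += d * d
            score = ssd / (th * tw)  # lower is better; 0 means an exact match
            if score < best[0]:
                best = (score, (x, y))
    return best


def in_expected_section(screen: Image, template: Image, max_score: float = 100.0) -> bool:
    """Hypothetical gate: accept the screen only if the best match is close enough."""
    score, _ = best_match(screen, template)
    return score <= max_score
```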
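The five-step cycle, with its rule of going back to a defined earlier step on failure, maps naturally onto a small state machine. In this sketch each step is a callable that returns `True` on success and names the earlier step to jump back to when it fails. The step names, the retry targets chosen in `build_five_step_cycle`, and the `max_failures` cap are all assumptions: the description does not say which step each failure returns to or whether retries are bounded. In the real bot the callables would wrap MSS screenshots, OpenCV/OCR checks, and PyAutoGUI input.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[], bool]  # performs the step and returns True if its check passed
    retry_from: int             # index of the step to go back to if this one fails


class CycleFailed(RuntimeError):
    pass


def run_section_cycle(steps: list, max_failures: int = 5) -> list:
    """Run steps in order, jumping back to step.retry_from on failure.

    Returns the names of the steps executed, in order. Raises CycleFailed once
    max_failures failures accumulate (a hypothetical cap against endless loops).
    """
    for idx, s in enumerate(steps):
        if not 0 <= s.retry_from <= idx:
            raise ValueError(f"{s.name}: retry_from must be this step or an earlier one")
    executed: list = []
    failures = 0
    i = 0
    while i < len(steps):
        step = steps[i]
        executed.append(step.name)
        if step.action():
            i += 1
            continue
        failures += 1
        if failures >= max_failures:
            raise CycleFailed(f"{step.name} failed; {failures} failures in this section")
        i = step.retry_from
    return executed


def build_five_step_cycle(verify, select, fill, confirm_filled, validate) -> list:
    # Hypothetical retry targets; the real ones come from the client's interaction flow.
    return [
        Step("verify_section", verify, retry_from=0),
        Step("select_section", select, retry_from=0),
        Step("fill_field", fill, retry_from=1),
        Step("confirm_not_empty", confirm_filled, retry_from=2),
        Step("final_validate", validate, retry_from=0),
    ]
```

Keeping the retry targets as data rather than nested if/else makes them easy to change once the client specifies the interaction flow.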
If the bot encounters a Cloudflare challenge or a bot-detection page that it cannot resolve on its own, it pauses, publishes an alert through Redis, and triggers a sound notification that repeats every five minutes until a human operator resolves the challenge manually and tells the bot to continue.

The full system must run on Windows. All output data must be downloadable as an Excel file through the frontend. Version control and code documentation are mandatory throughout the build. The reference images and interaction flow the bot uses for visual matching will be provided by the client; developers do not create or choose them.
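The CAPTCHA pause can be sketched as a loop that fires an alert, then waits up to five minutes for a resume signal, and repeats until the operator responds. Here a `threading.Event` stands in for the operator's "continue" signal, which in the real system would presumably arrive via Redis, and `alert` is a hypothetical callable; on Windows it could call the standard library's `winsound.Beep(frequency, duration)` and also publish a CAPTCHA alert event.

```python
import threading
from typing import Callable


def wait_for_operator(
    resume: threading.Event,
    alert: Callable[[], None],
    interval_s: float = 300.0,  # five minutes, per the requirement
) -> int:
    """Block until `resume` is set, calling `alert` immediately and then every
    `interval_s` seconds. Returns how many alerts were fired."""
    fired = 0
    while not resume.is_set():
        alert()
        fired += 1
        # Event.wait returns True as soon as the operator sets the event,
        # so the bot resumes without sitting out the rest of the interval.
        if resume.wait(timeout=interval_s):
            break
    return fired
```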

Python
