I need a lightweight computer-vision bot that relies on on-screen image recognition rather than Selenium or similar browser-automation libraries.

The workflow is straightforward: the bot launches a local browser window, navigates to a specific site I will provide, and, guided by visual cues, automatically clicks the required elements, fills in form fields, and finally extracts the text, images, and tabular data that the page returns. The process must loop continuously so it can watch for page changes and react in real time. A short delay between cycles is fine as long as the interaction remains smooth and does not overload the system or the target site.

Robust error handling is important: if a button is missing or the page layout shifts, the bot should retry gracefully or log the issue without crashing.

Please choose your preferred vision stack (OpenCV, PyAutoGUI, SikuliX, or an equivalent solution) and include lightweight OCR where necessary for text capture. Java, Python, or another language is acceptable as long as setup remains minimal and cross-platform.

Key deliverables:
• Ready-to-run script or executable with clear configuration for URLs, visual anchors, and form data.
• Brief README outlining dependencies, setup steps, and how to extend or retrain the visual templates.
• Logging/reporting module that shows each cycle’s success status and the data captured (CSV or JSON is fine).

I will provide the site URL, form values, and any visual references you need once we start. Looking forward to a clean, reliable solution built purely on vision-based automation techniques.
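For reference, the retry-and-report behavior requested above could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not a finished bot. It assumes the vision step is wrapped in a callable that takes an anchor name and returns screen coordinates or `None`. In a real build, that callable would wrap OpenCV template matching or PyAutoGUI, and the clicking, typing, and OCR steps would go where the comments indicate. All names here (`find_with_retry`, `run_cycle`, `main_loop`, `CycleRecord`, and the JSON Lines report layout) are hypothetical.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Callable, Optional, Tuple

# Hypothetical locator type: given an anchor name (e.g. a template image path),
# return (x, y) screen coordinates or None. A real build might wrap OpenCV or PyAutoGUI.
Locator = Callable[[str], Optional[Tuple[int, int]]]


@dataclass
class CycleRecord:
    """One row of the per-cycle report (hypothetical schema)."""
    cycle: int
    success: bool
    data: dict = field(default_factory=dict)
    error: Optional[str] = None


def find_with_retry(locate: Locator, anchor: str, retries: int = 3,
                    delay: float = 0.5) -> Optional[Tuple[int, int]]:
    """Try to find a visual anchor up to `retries` times, pausing between attempts."""
    for attempt in range(retries):
        try:
            pos = locate(anchor)
        except Exception:
            # Treat a vision-backend error (e.g. a failed screen grab) as a miss.
            pos = None
        if pos is not None:
            return pos
        if attempt < retries - 1:
            time.sleep(delay)
    return None


def run_cycle(cycle: int, locate: Locator, anchors: list,
              retries: int = 3, delay: float = 0.5) -> CycleRecord:
    """Locate each anchor in order; stop and record the first one that is missing."""
    found = {}
    for anchor in anchors:
        pos = find_with_retry(locate, anchor, retries, delay)
        if pos is None:
            return CycleRecord(cycle, False, found, f"anchor not found: {anchor}")
        found[anchor] = list(pos)
        # A real bot would click, type, or run OCR near `pos` here.
    return CycleRecord(cycle, True, found)


def append_jsonl(path: str, record: CycleRecord) -> None:
    """Append one cycle's result as a single JSON line to the report file."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(record)) + "\n")


def main_loop(locate: Locator, anchors: list, report_path: str, cycles: int,
              pause: float = 1.0, retries: int = 3, delay: float = 0.5) -> list:
    """Run a fixed number of cycles; a `while True` loop would watch continuously."""
    records = []
    for i in range(cycles):
        rec = run_cycle(i, locate, anchors, retries, delay)
        append_jsonl(report_path, rec)
        records.append(rec)
        time.sleep(pause)  # throttle so neither the machine nor the site is overloaded
    return records
```

Because `find_with_retry` treats both a miss and a backend exception as a failed attempt, a shifted layout ends the cycle with a logged `error` entry instead of an unhandled crash, and the next cycle starts fresh.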