I need a command-line tool that lets a user type a short prompt in plain English and immediately receive working Python code (a function), see any issues fixed on the fly, and watch the script run with its output printed back to the same terminal session.

The workflow I have in mind:

1. The user enters a natural-language request (e.g. “create a quicksort function and test it”).
2. The AI engine generates Python code that fulfils the request (it may use available AI code-generation CLI tools: Codex, Claude Code, etc., whatever works best).
3. Before execution, the tool checks the code for syntax or logical errors; if something is flagged, it automatically proposes and applies a correction, explaining what changed.
4. The corrected script is executed in a safe, isolated environment, and the terminal displays stdout, stderr, and the final result.

Key points I want covered

• Single-language focus: Python only.
• The generated function may take arguments and depend on other functions supplied as parameters.
• Pure CLI experience: no web or desktop layers; standard input/output is enough.
• Clean architecture: keep model prompts, error-handling logic, and the execution sandbox in clearly separated modules so I can extend them later.
• Security: the execution step must be sandboxed (a subprocess with resource limits or similar) to prevent arbitrary system access; see the sandbox sketch at the end of this brief.
• Readable output: color-coded sections for generated code, bug fixes, and final results will make the tool friendlier to use; see the output sketch at the end.

Deliverables

• Source code with clear instructions for installing dependencies and running the CLI.
• A short README that explains how to add new prompt templates or swap in a different LLM provider.
• A demo video or GIF showing the tool generating, correcting, and executing a sample script.

Acceptance criteria

– I can run `python main.py "make a fibonacci generator"` and receive working code plus the sequence output.
– Deliberately introducing a syntax error still results in an automatic fix before execution.
– No un-sandboxed shell access is possible from within generated code (prove this with a simple security test).

If you have experience combining large language models with safe code execution, I’d love to see a quick outline of your approach and timeline.
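To make the requirements above concrete, here are a few sketches of the kind of thing I mean; all names and values are illustrative, not prescriptions. For the check-and-fix step (step 3), a simple validate-and-retry loop around `ast.parse` would satisfy me (`fix_fn` is a placeholder for whichever LLM-correction call you choose):

```python
import ast

def check_and_fix(code: str, fix_fn, max_attempts: int = 3) -> str:
    """Validate generated code; ask the model for a fix when parsing fails.

    fix_fn(code, error_message) -> str is a placeholder for the chosen
    LLM-correction call.
    """
    for _ in range(max_attempts):
        try:
            ast.parse(code)  # catches syntax errors without executing anything
            return code
        except SyntaxError as err:
            print(f"Fixing: {err.msg} (line {err.lineno})")
            code = fix_fn(code, str(err))
    raise ValueError("could not produce syntactically valid code")
```

Logical errors are harder to catch statically; running the code once in the sandbox and feeding any traceback back through the same loop is an acceptable approach.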
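For the security requirement, this is roughly the level of isolation I have in mind: a minimal sketch assuming a POSIX system, with the resource caps and timeout as placeholder values:

```python
import resource
import subprocess
import sys

def _apply_limits():
    # Cap CPU time at 5 seconds and address space at 256 MB in the child process.
    resource.setrlimit(resource.RLIMIT_CPU, (5, 5))
    resource.setrlimit(resource.RLIMIT_AS, (256 * 1024 * 1024, 256 * 1024 * 1024))

def run_sandboxed(script_path: str) -> subprocess.CompletedProcess:
    """Run a generated script in a resource-limited child process."""
    return subprocess.run(
        [sys.executable, "-I", script_path],  # -I: isolated mode (ignores PYTHON* env vars and user site-packages)
        capture_output=True,
        text=True,
        timeout=10,                # wall-clock backstop; raises TimeoutExpired
        preexec_fn=_apply_limits,  # POSIX only; runs in the child before exec
    )
```

Stronger isolation (containers, seccomp filters, or similar) is welcome if you have experience with it, but resource limits plus a restricted interpreter invocation are enough for a first version.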
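And for readable output, plain ANSI escapes are fine (a library such as `colorama` or `rich` is equally acceptable), for example:

```python
GREEN, YELLOW, RESET = "\033[92m", "\033[93m", "\033[0m"

def section(title: str, body: str, color: str) -> None:
    # Print a color-coded heading followed by its body.
    print(f"{color}=== {title} ==={RESET}")
    print(body)

section("Generated code", "def fib(n): ...", GREEN)
section("Fixes applied", "line 3: added missing colon", YELLOW)
```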