I have a containerised Flask API running on a RunPod GPU instance and it just refuses to behave: endpoints hang, sometimes the whole service crashes, and the logs don’t tell me much. I need a sharp pair of eyes to jump in, trace the failure, patch whatever is broken, and leave me with a reliably responding API that survives restarts.

Context
• The service runs inside RunPod’s standard Docker image with CUDA enabled.
• The API handles model inference requests and streams results back to the caller; nothing fancy beyond typical POST/GET routes.
• Python-based stack: Flask, Gunicorn, and the usual suspects (uvicorn, requests, etc.).

What I expect from you
1. Connect via the RunPod dashboard or SSH tunnel, reproduce the fault, and pinpoint its cause (dependency clash, bad config, resource leak, whatever it turns out to be).
2. Apply the fix: code, config, or deployment tweak; I don’t mind which, so long as it’s clean and documented.
3. Add a concise README section or inline comments so I can follow what you changed and why.
4. Hand back a container that builds and starts with docker compose up and serves every endpoint without time-outs.

If you normally lean on tools like Flask-DebugToolbar, Gunicorn workers, Prometheus metrics, or nvidia-smi to spot GPU bottlenecks, feel free, as long as the final image stays lightweight.

I’m ready to spin up a fresh pod or give you direct access to the current one right away. Let me know your availability and any quick diagnostics you’d like me to run in advance.
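
To give a sense of the level of documentation I’m after (point 2 above), here is a rough sketch of a commented Gunicorn config. It is purely illustrative: the values are placeholders, not what is currently running on the pod, and whether threaded workers are the right call for the streaming endpoints is exactly the kind of decision I’d leave to you.

    # gunicorn.conf.py -- illustrative sketch only; the values below are
    # placeholders, not the settings currently deployed on the pod.

    import multiprocessing

    bind = "0.0.0.0:8000"

    # Model inference can hold a worker for a long time; Gunicorn's default
    # sync worker with its 30 s timeout is a common cause of "hanging" endpoints.
    worker_class = "gthread"   # threaded workers so one slow request doesn't block the process
    workers = 2                # kept low: each worker may load the model into GPU memory
    threads = max(4, multiprocessing.cpu_count())
    timeout = 300              # generous timeout for long inference/streaming requests
    graceful_timeout = 30

    # Recycle workers periodically to contain slow memory leaks.
    max_requests = 200
    max_requests_jitter = 50

    # Log to stdout/stderr so `docker compose logs` shows something useful.
    accesslog = "-"
    errorlog = "-"
    loglevel = "info"

It would be launched with something like gunicorn -c gunicorn.conf.py app:app from the compose command, where app:app stands in for the real module path.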