Scalable Azure RAG Backend

Client: AI | Published: 13.10.2025

Python RAG backend on Azure with autoscaling, using OpenAI (Oct 12)

I need a well-performing Python AI backend agent built with LangChain and running on Azure. There is no UI on the server: the interface must be exposed programmatically through Python FastAPI so that it can be reached over the web from any local computer. I need two implementations, one on an Azure VM and one serverless ("lambda", i.e. Azure Functions).

You provide a video in clear English with step-by-step instructions, the Python code files (both the code for Azure and the code for the local computer used for testing), and a doc file with written instructions (text and screenshots). Use only Python and LangChain, with OpenAI LLMs.

1. Base the work on a good LangChain RAG GitHub project.
2. For testing, requests are sent automatically and in parallel by Python code running locally (you provide this code). In real use, many users will connect from many local computers.
3. I do not provide an LLM API key for development, but I will test with my own LLM API key.
4. Acceptance criteria: I can reproduce the setup on my Windows computer and in my Azure account.
5. Provide autoscaling (load-balancer style) using Azure VMs. Deliver both options: with Docker and without Docker.
6. Provide a local user interface that makes detailed testing possible: add files for all users, add files for one user, add many files from a given folder.
7. Provide database data isolation between users, so that user 1 cannot chat with user 2's data.
8. I will not give you access to my Azure account; develop on your own Azure account.
9. Conversation logs (user and agent) are saved, with the user name and a date-time stamp in the file name, on an external persistent Azure disk (automatically deleting files older than 35 days) and locally.
10. Support 10 requests per user per minute for 30 users (30 threads).
11. You are responsible for delivering a correctly working project. Deadline: 5 days.
12. Explain in detail, step by step, how to deploy on Azure. Requests and conversation text are sent from locally running Python code.
13. Autoscaling: the more users, the more Azure instances are allocated; the fewer users, the fewer instances. You provide Python code that simulates high and low load with fast changes from high to low and from low to high, from 10 to 200 users within 10 seconds.
14. The code runs both locally and on Azure, for both server and client.
15. Every user gets the same single link. A separate VM is created for each user, and when the user closes the browser, that VM is destroyed.
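
To make items 6 and 7 concrete, here is a minimal sketch of what "files for all" versus "files for one user" with isolation between users could mean. Everything in it is an assumption for illustration: `ScopedStore`, `SHARED`, and the method names are hypothetical, and the keyword match only stands in for a real vector search. An actual LangChain implementation would more likely use one collection per user or a metadata filter in the vector store.

```python
from collections import defaultdict
from typing import Dict, List

SHARED = "__all__"  # hypothetical namespace for documents visible to every user


class ScopedStore:
    """Toy document store that keeps each user's documents in its own namespace."""

    def __init__(self) -> None:
        self._docs: Dict[str, List[str]] = defaultdict(list)

    def add_for_all(self, text: str) -> None:
        """Add a document that every user may query."""
        self._docs[SHARED].append(text)

    def add_for_user(self, user: str, text: str) -> None:
        """Add a document that only `user` may query."""
        if user == SHARED:
            raise ValueError("user name collides with the shared namespace")
        self._docs[user].append(text)

    def search(self, user: str, query: str) -> List[str]:
        """Return matches from the shared pool and from this user's pool only."""
        visible = list(self._docs.get(SHARED, []))
        if user != SHARED:
            visible += self._docs.get(user, [])
        words = query.lower().split()
        # Naive keyword match; a stand-in for embedding similarity search.
        return [doc for doc in visible if any(w in doc.lower() for w in words)]
```

The point of the sketch is that the user identity is applied at retrieval time, so a query from user 1 never sees user 2's documents regardless of what the prompt says.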
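
The log naming and 35-day retention in item 9 might look roughly like this. It is a sketch under stated assumptions: `log_path` and `purge_old_logs` are hypothetical helpers, the `user_YYYYMMDD_HHMMSS.log` pattern is one possible reading of "file name has user name and date time stamp", and `root` would point at wherever the persistent Azure disk (or the local log folder) is mounted.

```python
from datetime import datetime, timedelta
from pathlib import Path
from typing import List


def log_path(root: Path, user: str, when: datetime) -> Path:
    """Build a log file name that carries the user name and a timestamp."""
    return root / f"{user}_{when:%Y%m%d_%H%M%S}.log"


def purge_old_logs(root: Path, now: datetime, max_age_days: int = 35) -> List[str]:
    """Delete logs whose file-name timestamp is older than `max_age_days`."""
    cutoff = now - timedelta(days=max_age_days)
    removed: List[str] = []
    for path in sorted(root.glob("*.log")):
        # Split from the right so user names containing "_" still parse.
        parts = path.stem.rsplit("_", 2)
        if len(parts) != 3:
            continue  # not a file produced by log_path
        try:
            stamp = datetime.strptime(f"{parts[1]}_{parts[2]}", "%Y%m%d_%H%M%S")
        except ValueError:
            continue  # timestamp part is malformed; leave the file alone
        if stamp < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed
```

Reading the age from the file name rather than from the file's modification time is a design choice made here so that copying logs between disks does not reset their age.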
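
Item 10 works out to 30 users x 10 requests per minute = 300 requests per minute, or 5 requests per second on average. A local client for that steady load could be sketched as below. The names `steady_load` and `send` are hypothetical, and `send` is a placeholder for whatever HTTP call reaches the FastAPI backend; the `sleep` parameter exists only so the pacing can be replaced in a dry run.

```python
import threading
import time
from typing import Callable


def steady_load(
    send: Callable[[str, str], None],
    users: int = 30,
    per_minute: int = 10,
    minutes: int = 1,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run one thread per user; each sends `per_minute` evenly spaced requests a minute."""
    interval = 60.0 / per_minute  # 6 seconds between requests at 10 per minute

    def worker(user: str) -> None:
        for i in range(per_minute * minutes):
            send(user, f"message {i}")
            sleep(interval)

    threads = [
        threading.Thread(target=worker, args=(f"user{u}",)) for u in range(users)
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return users * per_minute * minutes  # total number of requests sent
```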
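
The fast load swings in item 13 could be driven by a client along these lines. This is a sketch, not a finished test harness: `target_users` and `run_load` are hypothetical names, the triangular shape (up over 10 seconds, then back down over 10 seconds) is an assumption since the requirement only gives the range and the time, and `send` again stands in for the real request to the backend.

```python
import threading
import time
from typing import Callable, List


def target_users(t: float, low: int = 10, high: int = 200, ramp: float = 10.0) -> int:
    """Assumed triangular profile: low -> high over `ramp` seconds, then back down."""
    period = 2 * ramp
    phase = t % period
    frac = phase / ramp if phase <= ramp else (period - phase) / ramp
    return round(low + (high - low) * frac)


def run_load(
    send: Callable[[str, str], None],
    duration: float,
    step: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Every `step` seconds, fire one request per simulated user in parallel."""
    counts: List[int] = []
    t = 0.0
    while t < duration:
        n = target_users(t)
        counts.append(n)
        threads = [
            threading.Thread(target=send, args=(f"user{i}", "hello"))
            for i in range(n)
        ]
        for th in threads:
            th.start()
        for th in threads:
            th.join()
        sleep(step)
        t += step
    return counts  # number of simulated users at each step
```

Calling `run_load(send, duration=20.0)` covers one full up-and-down cycle; the returned counts can be compared against the number of Azure instances observed at the same moments to check that scaling follows the load in both directions.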