Please review project details here: https://docs.google.com/document/d/18J_UF2inYrq3XeOUdV7rQXGM_aCYhzpym7QihlKfjfc/edit?usp=sharing

Job Title: Data Scientist / AI Engineer (Real Estate Forecasting & Flask Integration)

Company / Project
Project Title: AI-Enhanced Pre-Foreclosure Property Investment Platform
Owner: [Your Name / Company Name]
Deadline: July 1, 2025 (Phase I: AI Model v1.0)
Website (in development): homeprices.cashprohomebuyers.com

We’re building an intelligent real estate platform that empowers homeowners in pre-foreclosure and property investors with AI-driven home value predictions, interactive dashboards, and real-time property insights. The backend is built in Flask (Python), and we’re ready to integrate a robust machine learning pipeline for predictive analytics.

Objective
Develop and deploy a predictive model that forecasts future property values (6mo, 1yr, 1.5yr, 2yr) using comparative market data, loan info, and local economic trends. The model will power both the Seller and Buyer dashboards, providing dynamic visual forecasts, ROI analytics, and confidence intervals.

Core Responsibilities

1. Data Engineering & Integration
- Build pipelines to collect, clean, and normalize data from:
  - PropStream (comparables, sale history, loan balance)
  - Zillow/Redfin APIs (pricing & trends)
  - County assessor data and optional macroeconomic feeds
- Handle missing data, outliers, and inconsistent property features.
- Engineer a unified dataset combining property + market-level variables.

2. Feature Engineering
- Create new features such as:
  - Equity spread, price/sqft, DOM (days on market)
  - Rolling averages of appreciation rates
  - Local neighborhood appreciation indices
- Encode categorical variables and scale continuous features.
- Implement temporal splits for model validation.

3. Model Development
- Develop two complementary prediction models:
  - XGBoost / LightGBM for short-term (6–12 months)
  - LSTM / Transformer (TensorFlow) for long-term (12–24 months)
- Train models using time-series regression and feature importance tracking.
- Produce confidence intervals for all predictions.

4. Model Evaluation
- Validate with metrics (RMSE, MAPE, R²).
- Compare model performance by:
  - Region (ZIP/City)
  - Property type
  - Forecast horizon
- Provide feature importance visualizations and drift tracking.

5. Deployment
- Package models into a Flask-based REST API:
  - /predict endpoint accepts JSON input and returns forecasts:

    {
      "6mo": {"value": X, "ci": [A,B]},
      "1yr": {"value": X, "ci": [A,B]},
      "1.5yr": {"value": X, "ci": [A,B]},
      "2yr": {"value": X, "ci": [A,B]}
    }

- Optimize for performance and caching (Redis preferred).
- Document API schema and connect to the frontend dashboards.

6. Visualization & Dashboard Integration
- Output prediction data for display in:
  - Seller dashboard (forecast chart + equity analysis)
  - Buyer dashboard (ROI %, forecasted appreciation)
- Deliver results as JSON + Plotly-ready datasets.

7. Maintenance & Monitoring
- Implement automated retraining pipelines.
- Track model accuracy, drift, and versioning (using MLflow or similar).
- Schedule weekly or monthly retraining using updated PropStream data.

Tech Stack

Layer           Preferred Tools
Programming     Python 3.10+, Flask
Data            Pandas, NumPy, SQLAlchemy
Modeling        XGBoost, LightGBM, TensorFlow/Keras
Visualization   Plotly, Matplotlib
Storage         PostgreSQL or Firebase
Deployment      AWS Lambda / EC2 / GCP
Auth / API      JWT, Flask-RESTful
Tracking        MLflow, DVC

Deliverables
- Cleaned and merged dataset (CSV + schema)
- Feature engineering scripts
- Trained models (XGBoost & LSTM)
- Flask REST API with /predict endpoint
- Model evaluation report (RMSE, MAPE, confidence intervals)
- Integration guide for dashboards (JSON schema + example plots)
- Retraining pipeline (ETL + model update)
- Documentation (Markdown or PDF)
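
Example: /predict response shape

The response format listed under Deployment can be sketched in plain Python, independent of Flask. This is a minimal illustration, not a required implementation. The names `build_forecast_response` and `HORIZONS`, the `forecasts` input structure, and the rule that each interval must contain its point forecast are all assumptions made for this example, and the dollar figures are invented.

```python
import json
from typing import Dict, Tuple

# Horizon keys exactly as they appear in the response example above.
HORIZONS = ("6mo", "1yr", "1.5yr", "2yr")


def build_forecast_response(
    forecasts: Dict[str, Tuple[float, float, float]],
) -> Dict[str, dict]:
    """Format (value, ci_low, ci_high) tuples into the /predict response shape.

    `forecasts` is a hypothetical intermediate structure; the posting does not
    specify how the models hand their output to the API layer.
    """
    response = {}
    for horizon in HORIZONS:
        if horizon not in forecasts:
            raise KeyError(f"missing forecast for horizon {horizon!r}")
        value, low, high = forecasts[horizon]
        # Assumed sanity check: the interval should bracket the point forecast.
        if not low <= value <= high:
            raise ValueError(
                f"interval for {horizon!r} does not contain the point forecast"
            )
        response[horizon] = {
            "value": round(value, 2),
            "ci": [round(low, 2), round(high, 2)],
        }
    return response


if __name__ == "__main__":
    # Made-up numbers purely for illustration, not model output.
    example = {
        "6mo": (312000.0, 298000.0, 326000.0),
        "1yr": (325000.0, 301000.0, 349000.0),
        "1.5yr": (334000.0, 299000.0, 369000.0),
        "2yr": (346000.0, 300000.0, 392000.0),
    }
    print(json.dumps(build_forecast_response(example), indent=2))
```

In a Flask view, the returned dictionary could be passed to `jsonify` to produce the HTTP response. Keeping the formatting step separate from the web layer makes the schema easy to unit test.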