I need an end-to-end deep-learning model that can match faces of the same person across images and video in real time.

The core requirements are straightforward:

• Detect every human face in an image or live stream, draw accurate bounding boxes, then compare each face against a gallery to decide whether it matches a known identity.
• Serve three environments without extra rewrites: security-grade CCTV feeds, social-media-style mobile uploads, and large photo-management archives.
• Deliver low latency on a single modern GPU while still running acceptably on CPU-only hardware for lightweight deployments.

I'm comfortable with either PyTorch or TensorFlow/Keras; use the framework you know best. A pre-trained backbone such as ResNet, MobileNet, or a Vision Transformer is fine as long as you include the full training pipeline so I can continue to improve the model with fresh data.

Deliverables

1. Source code with clear, commented modules for detection, embedding generation, and similarity matching.
2. Pre-trained weights ready for immediate inference.
3. A small demo app or notebook that shows:
   – live webcam/CCTV inference,
   – batch search across a folder of photos, and
   – a simple REST or gRPC endpoint that returns face locations and a similarity score.
4. A written setup guide plus concise API documentation.

Acceptance criteria

• ≥ 95 % precision / ≥ 90 % recall on LFW or a comparable public dataset.
• Mean search time ≤ 100 ms per face on an NVIDIA T4 or equivalent.
• All delivered code runs out of the box on Ubuntu 20.04 with Python 3.10.

If any open-source libraries are used (e.g., MTCNN, FaceNet, dlib), list their licenses so I can keep compliance simple.
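To make the similarity-matching module concrete, here is a minimal sketch of the gallery lookup I have in mind. Everything in it is illustrative, not a spec: the embeddings are placeholder 4-D vectors (real face embeddings would be 128- or 512-D from the chosen backbone), and the 0.6 cosine threshold is an assumed value that would need tuning on real data.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def best_match(query, gallery, threshold=0.6):
    """Return (identity, score) for the closest gallery entry,
    or (None, score) if nothing clears the threshold.
    `gallery` maps identity -> embedding vector."""
    best_id, best_score = None, -1.0
    for identity, emb in gallery.items():
        score = cosine_similarity(query, emb)
        if score > best_score:
            best_id, best_score = identity, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score

# Placeholder gallery with toy embeddings.
gallery = {
    "alice": [1.0, 0.0, 0.0, 0.0],
    "bob":   [0.0, 1.0, 0.0, 0.0],
}
print(best_match([0.9, 0.1, 0.0, 0.0], gallery))
```

For production scale, the same linear scan would be replaced by an approximate-nearest-neighbour index, but the matching contract (identity plus similarity score, or no match below threshold) is what matters here.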
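For the endpoint in deliverable 3, this is the kind of JSON payload I would expect back. The field names (`box`, `identity`, `similarity`) are my own suggestion rather than a fixed contract; feel free to propose a different schema.

```python
import json

def detection_response(faces_found):
    """Build the endpoint payload from (box, identity, score) tuples.
    A box is (x, y, width, height) in pixels; identity is None when
    no gallery entry clears the match threshold."""
    faces = [
        {
            "box": {"x": x, "y": y, "w": w, "h": h},
            "identity": identity,
            "similarity": round(score, 3),
        }
        for (x, y, w, h), identity, score in faces_found
    ]
    return json.dumps({"faces": faces, "count": len(faces)})

payload = detection_response([((10, 20, 64, 64), "alice", 0.912)])
print(payload)
```

Whether this sits behind REST or gRPC is your call; the important part is that face locations and similarity scores come back in one response.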
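To keep the acceptance criteria unambiguous, precision and recall should be computed with the standard definitions below. The counts in the example are made-up numbers chosen only to show a run that clears the bar.

```python
def precision_recall(tp, fp, fn):
    """Standard definitions:
    precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical evaluation counts on a verification set.
p, r = precision_recall(tp=950, fp=40, fn=100)
print(f"precision={p:.3f} recall={r:.3f}")
assert p >= 0.95 and r >= 0.90  # the acceptance bar above
```

Please report both numbers on the same evaluation split, along with which LFW protocol (or comparable dataset) was used.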