I’m building a ROS2-based perception stack that fuses 3-D lidar data with RGB camera streams to detect, track, and identify people and objects in real time. The platform must perform reliably in both indoor corridors and outdoor open spaces, so calibration, synchronization, and robustness to lighting changes are essential.

Pipeline goals
• Fuse lidar point clouds and camera frames to create a unified scene representation.
• Run object detection and multi-object tracking on the fused data, then carry out facial recognition on suitable tracks.
• Output three simultaneous products:
  – Annotated images or video frames that overlay bounding boxes, track IDs, and distance estimates.
  – Time-stamped data logs (CSV/ROS2 bag) of detections, tracks, and recognition results.
  – Real-time ROS2 topics or HTTP/MQTT alerts to trigger downstream actions.

Technical context
The core must live entirely inside ROS2 (Humble or later) and build with colcon. You’re free to choose proven tools such as OpenCV, PCL, YOLOv8, DeepSORT, OpenVINO, or TensorRT, as long as dependencies are containerised or well-documented. Accurate extrinsic calibration between the lidar (Velodyne VLP-16 on /velodyne_points) and the camera (ZED2 on /zed/left_raw) is a key milestone before fusion.

Deliverables
1. Source code and CMake/colcon packages organised under a Git repo.
2. Launch files and parameter YAMLs for fast deployment on x86_64 and Jetson Orin.
3. Sample annotated video, log files, and screenshots proving correct operation indoors and outdoors.
4. Brief setup guide outlining hardware connections, the calibration procedure, and steps to reproduce results.

If you have prior ROS2 sensor-fusion projects or can share demos of real-time annotated feeds, please mention them in your proposal.
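To make the fusion goal concrete, here is a minimal sketch, assuming the topic names from this brief (/velodyne_points, /zed/left_raw), placeholder camera intrinsics, and an identity lidar-to-camera transform standing in for the calibration result. It time-synchronizes the two streams, projects lidar points into the left image, and republishes the overlay; it illustrates the idea, it is not the deliverable.

```python
# Minimal lidar-camera overlay sketch. Intrinsics (self.K) and the extrinsic
# (self.T_cam_lidar) are placeholders to be replaced by the calibration output.
import numpy as np
import cv2
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image, PointCloud2
from sensor_msgs_py import point_cloud2
from message_filters import ApproximateTimeSynchronizer, Subscriber
from cv_bridge import CvBridge


class LidarCameraOverlay(Node):
    def __init__(self):
        super().__init__('lidar_camera_overlay')
        self.bridge = CvBridge()
        # Placeholder pinhole intrinsics and lidar->camera extrinsic (assumed values).
        self.K = np.array([[700.0, 0.0, 640.0],
                           [0.0, 700.0, 360.0],
                           [0.0, 0.0, 1.0]])
        self.T_cam_lidar = np.eye(4)
        self.pub = self.create_publisher(Image, 'fusion/overlay', 10)
        cloud_sub = Subscriber(self, PointCloud2, '/velodyne_points')
        image_sub = Subscriber(self, Image, '/zed/left_raw')
        # Approximate sync, since lidar and camera clocks will never match exactly.
        self.sync = ApproximateTimeSynchronizer([cloud_sub, image_sub],
                                                queue_size=10, slop=0.05)
        self.sync.registerCallback(self.fused_callback)

    def fused_callback(self, cloud_msg, image_msg):
        img = self.bridge.imgmsg_to_cv2(image_msg, desired_encoding='bgr8')
        pts = np.array([[p[0], p[1], p[2]] for p in point_cloud2.read_points(
            cloud_msg, field_names=('x', 'y', 'z'), skip_nans=True)])
        if pts.size == 0:
            return
        # Transform points into the camera frame, keep those in front of the lens.
        pts_h = np.hstack([pts, np.ones((pts.shape[0], 1))])
        cam = (self.T_cam_lidar @ pts_h.T)[:3]
        cam = cam[:, cam[2] > 0.5]
        # Pinhole projection; colour pixels by range as a quick calibration check.
        uv = self.K @ cam
        uv = (uv[:2] / uv[2]).astype(int)
        h, w = img.shape[:2]
        for (u, v), depth in zip(uv.T, cam[2]):
            if 0 <= u < w and 0 <= v < h:
                g = int(255 - min(depth, 25.0) * 10)
                cv2.circle(img, (int(u), int(v)), 1, (0, g, 255), -1)
        self.pub.publish(self.bridge.cv2_to_imgmsg(img, encoding='bgr8'))


def main():
    rclpy.init()
    rclpy.spin(LidarCameraOverlay())
    rclpy.shutdown()


if __name__ == '__main__':
    main()
```

In the full pipeline this overlay would also carry YOLOv8/DeepSORT boxes and IDs; the raw point overlay shown here is mainly a sanity check that the extrinsic and the time synchronization are right.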
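Deliverable 2 asks for launch files and parameter YAMLs; the sketch below shows one plausible layout, assuming a package named perception_fusion with a config/pipeline.yaml and executables lidar_camera_overlay and detect_and_track (all placeholder names). The static transform stands in for the lidar-to-camera extrinsic produced by the calibration milestone; the numbers are dummies.

```python
# launch/fusion_pipeline.launch.py -- sketch of the requested launch layout.
import os
from ament_index_python.packages import get_package_share_directory
from launch import LaunchDescription
from launch_ros.actions import Node


def generate_launch_description():
    params = os.path.join(
        get_package_share_directory('perception_fusion'), 'config', 'pipeline.yaml')
    return LaunchDescription([
        # Static lidar->camera extrinsic from calibration (placeholder values).
        Node(package='tf2_ros', executable='static_transform_publisher',
             name='lidar_camera_extrinsic',
             arguments=['--x', '0.10', '--y', '0.0', '--z', '-0.05',
                        '--roll', '0', '--pitch', '0', '--yaw', '0',
                        '--frame-id', 'zed_left_camera_frame',
                        '--child-frame-id', 'velodyne']),
        # Fusion / overlay node (see the sketch above).
        Node(package='perception_fusion', executable='lidar_camera_overlay',
             parameters=[params], output='screen'),
        # Detection + tracking node (YOLOv8 + DeepSORT or similar).
        Node(package='perception_fusion', executable='detect_and_track',
             parameters=[params], output='screen'),
    ])
```

Once the package is built with colcon, this would be started with `ros2 launch perception_fusion fusion_pipeline.launch.py`, with separate YAML files selecting x86_64 or Jetson Orin settings.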
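The alerting output (ROS2 topics or HTTP/MQTT) could be met with a thin bridge node like the one below. Everything in it is an assumption made for illustration: an internal /perception/events topic carrying JSON in a std_msgs/String, a local MQTT broker, a perception/alerts MQTT topic, and a simple confidence threshold.

```python
# Sketch of a ROS2 -> MQTT alert bridge for downstream actions.
import json
import rclpy
from rclpy.node import Node
from std_msgs.msg import String
import paho.mqtt.client as mqtt


class AlertBridge(Node):
    def __init__(self):
        super().__init__('alert_bridge')
        # paho-mqtt 1.x style constructor; 2.x additionally takes a CallbackAPIVersion.
        self.mqtt = mqtt.Client()
        self.mqtt.connect('localhost', 1883)  # broker address is an assumption
        self.mqtt.loop_start()
        self.create_subscription(String, '/perception/events', self.on_event, 10)

    def on_event(self, msg):
        event = json.loads(msg.data)
        # Forward only high-confidence identity matches as alerts.
        if event.get('type') == 'face_match' and event.get('confidence', 0.0) > 0.8:
            self.mqtt.publish('perception/alerts', json.dumps(event), qos=1)


def main():
    rclpy.init()
    rclpy.spin(AlertBridge())
    rclpy.shutdown()


if __name__ == '__main__':
    main()
```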