iOS Engineer - Real-time Vision Pipeline & LLM API Integration

Client: AI | Published: 12.02.2026
Budget: $250

Seeking an experienced iOS Developer to build a high-performance framework for real-time environmental analysis. The core of the project involves creating a stable bridge between the device's camera system and a cloud-based Large Language Model (LLM).

Technical Requirements:
- Advanced Camera Implementation: Build a custom AVFoundation session that programmatically detects and prioritizes the Ultra-Wide (0.5x) lens across all compatible hardware (iPhone 11 through 17). (See the lens-discovery sketch below.)
- Frame Capture & Buffering: Implement a high-efficiency capture system that extracts frames at defined intervals (2-3 seconds) and transmits them to an API without blocking the main thread. (See the throttled-capture sketch below.)
- AI Integration: Connect the capture pipeline to the Google Gemini Flash API, via REST or the Swift SDK. (See the request sketch below.)
- Dynamic Instruction Architecture: Build a modular system where the system_instruction for the LLM is pulled dynamically from a local configuration file based on a user-selected profile. (See the profile-loading sketch below.)
- Speech Output: Integrate AVSpeechSynthesizer for low-latency conversion of API text responses into audio. (See the speech sketch below.)

Required Stack:
- Swift 6.0 / SwiftUI
- Frameworks: AVFoundation, Vision, Combine
- API experience: RESTful services, JSON handling, Google AI SDK
- Target: iOS 15.0+ (optimized for iPhone 11 - 17)

Screening Questions (must answer to be considered):
1. How do you propose to handle automatic lens discovery (Wide vs. Ultra-Wide) to ensure the widest field of view is always selected programmatically?
2. What is your strategy for managing memory usage and CMSampleBuffer release during continuous, long-term frame capture sessions?
3. Given Apple's privacy constraints, what is your approach to keeping the camera session active while the UI is in a low-power/dimmed state?
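
Lens discovery sketch: a minimal approach to the Ultra-Wide requirement, using AVCaptureDevice.DiscoverySession to prefer the 0.5x module and fall back to the standard Wide lens on hardware that lacks one. The function name is illustrative, not a required API.

```swift
import AVFoundation

/// Returns the back camera with the widest field of view on this device,
/// preferring the Ultra-Wide (0.5x) module and falling back to the Wide lens.
func widestBackCamera() -> AVCaptureDevice? {
    // The discovery session returns only the device types actually present
    // on the current hardware, in the order they are listed here.
    let discovery = AVCaptureDevice.DiscoverySession(
        deviceTypes: [.builtInUltraWideCamera, .builtInWideAngleCamera],
        mediaType: .video,
        position: .back
    )
    // Prefer the ultra-wide module when the hardware has one (iPhone 11 and later).
    return discovery.devices.first { $0.deviceType == .builtInUltraWideCamera }
        ?? discovery.devices.first
}
```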
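
Throttled-capture sketch: one way to satisfy the 2-3 second interval requirement without blocking the main thread, using an AVCaptureVideoDataOutput delegate on a background queue that forwards at most one frame per interval and discards late frames so CMSampleBuffers are released promptly. Class, parameter, and callback names other than the AVFoundation delegate method are illustrative.

```swift
import AVFoundation
import CoreImage

/// Receives video frames on a background queue and forwards at most one frame
/// every `captureInterval` seconds to a handler, keeping the main thread free.
final class ThrottledFrameCapture: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    private let captureInterval: TimeInterval
    private let onFrame: (CGImage) -> Void
    private var lastCaptureTime = Date.distantPast
    private let ciContext = CIContext()

    init(interval: TimeInterval = 2.5, onFrame: @escaping (CGImage) -> Void) {
        self.captureInterval = interval
        self.onFrame = onFrame
    }

    func attach(to session: AVCaptureSession, queue: DispatchQueue) {
        let output = AVCaptureVideoDataOutput()
        // Dropping late frames prevents CMSampleBuffers from piling up in memory.
        output.alwaysDiscardsLateVideoFrames = true
        output.setSampleBufferDelegate(self, queue: queue)
        if session.canAddOutput(output) { session.addOutput(output) }
    }

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        let now = Date()
        guard now.timeIntervalSince(lastCaptureTime) >= captureInterval,
              let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        lastCaptureTime = now
        // Copy the frame into a CGImage so the sample buffer can be released
        // as soon as this delegate call returns.
        let ciImage = CIImage(cvPixelBuffer: pixelBuffer)
        if let cgImage = ciContext.createCGImage(ciImage, from: ciImage.extent) {
            onFrame(cgImage)
        }
    }
}
```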
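
Request sketch: a hedged REST call that sends one JPEG frame plus a profile-specific system_instruction to the Gemini generateContent endpoint. The endpoint path, model identifier, and payload shape follow the public Gemini REST documentation at the time of writing and should be verified against the current API reference (or replaced with the Google AI Swift SDK).

```swift
import Foundation

/// Sends one JPEG frame and a system instruction to the Gemini REST API
/// and returns the model's text reply.
func describeFrame(jpegData: Data,
                   systemInstruction: String,
                   apiKey: String) async throws -> String {
    let model = "gemini-1.5-flash" // assumed model identifier; confirm before use
    let url = URL(string:
        "https://generativelanguage.googleapis.com/v1beta/models/\(model):generateContent?key=\(apiKey)")!

    // Payload shape assumed from the public REST docs: system_instruction plus
    // a single user turn containing a text prompt and an inline base64 image.
    let body: [String: Any] = [
        "system_instruction": ["parts": [["text": systemInstruction]]],
        "contents": [[
            "parts": [
                ["text": "Describe what is visible in this frame."],
                ["inline_data": ["mime_type": "image/jpeg",
                                 "data": jpegData.base64EncodedString()]]
            ]
        ]]
    ]

    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONSerialization.data(withJSONObject: body)

    let (data, _) = try await URLSession.shared.data(for: request)

    // Minimal extraction of candidates[0].content.parts[0].text from the reply.
    guard
        let json = try JSONSerialization.jsonObject(with: data) as? [String: Any],
        let candidates = json["candidates"] as? [[String: Any]],
        let content = candidates.first?["content"] as? [String: Any],
        let parts = content["parts"] as? [[String: Any]],
        let text = parts.first?["text"] as? String
    else {
        throw URLError(.cannotParseResponse)
    }
    return text
}
```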
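
Profile-loading sketch: the dynamic instruction architecture could be as simple as a bundled JSON file decoded with Codable, keyed by the user-selected profile. The file name, keys, and types here are illustrative assumptions, not part of the brief.

```swift
import Foundation

/// One entry per user-selectable profile in a bundled profiles.json, e.g.
/// [{"id": "navigation", "systemInstruction": "Describe obstacles ahead..."}]
struct InstructionProfile: Codable {
    let id: String
    let systemInstruction: String
}

/// Loads all profiles from the app bundle and returns the system_instruction
/// for the profile the user selected, or nil if it is not defined.
func systemInstruction(forProfile id: String) throws -> String? {
    guard let url = Bundle.main.url(forResource: "profiles", withExtension: "json") else {
        return nil
    }
    let data = try Data(contentsOf: url)
    let profiles = try JSONDecoder().decode([InstructionProfile].self, from: data)
    return profiles.first { $0.id == id }?.systemInstruction
}
```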
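
Speech sketch: a small wrapper around AVSpeechSynthesizer that speaks each API response as it arrives. Keeping one synthesizer instance alive and interrupting the previous utterance is one way to keep audio aligned with the most recent frame; the wrapper type is illustrative.

```swift
import AVFoundation

/// Speaks each API text response with low latency by reusing a single
/// AVSpeechSynthesizer instance for the lifetime of the session.
final class ResponseSpeaker {
    private let synthesizer = AVSpeechSynthesizer()

    func speak(_ text: String, language: String = "en-US") {
        // Interrupt any ongoing utterance so audio always reflects the latest frame.
        if synthesizer.isSpeaking {
            synthesizer.stopSpeaking(at: .immediate)
        }
        let utterance = AVSpeechUtterance(string: text)
        utterance.voice = AVSpeechSynthesisVoice(language: language)
        synthesizer.speak(utterance)
    }
}
```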