We have already developed and fully tested a Windows C# Advanced Speech Analysis (ASA) engine that includes:

- Offline speech-to-text (Whisper-based)
- Phoneme-level analysis (Wav2Vec2 CTC)
- Pronunciation scoring (CTC forced alignment)
- Accent-strength scoring (WavLM embedding model)
- A fully working evaluation pipeline

The system runs correctly on Windows using C# and the AI models above. We now need an experienced mobile developer to convert this engine into a production-ready Unity SDK for Android and iOS, including native model execution and Unity integration.

To verify your skills, the first milestone (Milestone 1, $400) is a macOS build of our TTS SDK. The Android and iOS TTS SDKs are already developed; a Mac SDK is needed as well so the final dev team can integrate it.

Scope of Work

1. Native mobile inference layer

Android:
- Build a native .so library (NDK)
- Integrate ONNX Runtime (phoneme model), whisper.cpp (STT), and the TorchScript WavLM model (accent)
- ARM64 support required

iOS:
- Build a static .a library or XCFramework
- CoreML optional (if beneficial)
- ARM64 support required

2. Model integration

We will provide:
- Whisper model (ggml format)
- Phoneme CTC model (ONNX, INT8)
- Accent model (TorchScript)
- Reference embedding bank
- Vocabulary file

The developer must:
- Load models efficiently
- Run inference fully offline
- Optimize memory usage
- Ensure stable performance

3. Unity SDK wrapper

Create a clean Unity C# interface. Example API:

    ASAResult Analyze(
        float[] audioPCM16k,
        string expectedWord,
        string expectedIPA
    );

The call returns a structured result:

    {
      "stt_text": "volleyball",
      "stt_match": true,
      "pronunciation_score": 82,
      "accent_score": 74,
      "phonemes": [
        { "ipa": "v", "score": 90 },
        { "ipa": "ɒ", "score": 65 }
      ]
    }

The Unity plugin must support:
- Android (AAR)
- iOS (Xcode project / XCFramework)
- Thread-safe inference
- Non-blocking calls

Functional Requirements

1. Offline only
- No cloud calls
- No external APIs
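To make the expected Unity integration concrete, here is a minimal sketch of how the native layer could expose a C-compatible entry point that Unity binds via P/Invoke. All names (`asa_analyze`, `ASAResultNative`) are placeholders, not the final ABI; the per-phoneme array and IPA input are omitted for brevity, and no real inference runs here:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// Hypothetical flat result struct. Fixed-size fields and int32_t (rather
// than bool) keep the struct blittable for C# marshaling.
struct ASAResultNative {
    char    stt_text[64];
    int32_t stt_match;            // 1 = recognized text matched expected word
    int32_t pronunciation_score;  // 0-100
    int32_t accent_score;         // 0-100
};

// extern "C" gives an unmangled symbol that Unity can bind with
// [DllImport] on Android (.so) and iOS (static library).
extern "C" int32_t asa_analyze(const float* pcm16k, int32_t num_samples,
                               const char* expected_word,
                               ASAResultNative* out) {
    if (!pcm16k || num_samples <= 0 || !expected_word || !out) return -1;
    // Placeholder body: the real implementation would run Whisper STT,
    // CTC forced alignment, and the WavLM accent model here.
    std::snprintf(out->stt_text, sizeof(out->stt_text), "%s", expected_word);
    out->stt_match = 1;
    out->pronunciation_score = 0;
    out->accent_score = 0;
    return 0;  // 0 = success, negative = error code
}
```

On the C# side this would pair with a `[DllImport]` declaration and a matching `[StructLayout(LayoutKind.Sequential)]` struct; keeping the surface to plain C types avoids marshaling pitfalls across both platforms.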
2. Performance targets
- Model load time: < 2 seconds
- Per utterance (1–3 s of audio): total latency < 1.5 seconds
- Memory usage within reasonable mobile limits

3. Audio input
- 16 kHz mono PCM float
- Silence trimming (VAD optional but recommended)

4. Scoring logic (already designed)

Pipeline:
1. Whisper STT → check against the expected word
2. If it matches, run phoneme forced alignment
3. Compute per-phoneme scores
4. Compute the overall pronunciation score
5. Compute the accent-similarity score
6. Return the structured result

The developer does NOT need to redesign the scoring logic.

Deliverables
- Android .so library
- iOS .a library or .xcframework
- Unity wrapper
- Example Unity demo scene
- Build documentation
- Performance test report

Required Skills
- Strong C++
- Android NDK
- iOS native development
- Unity native plugin development
- ONNX Runtime Mobile
- Experience with whisper.cpp
- Experience with TorchScript / LibTorch
- Audio DSP basics

Preferred:
- Experience with on-device ML optimization
- Experience with quantized models

Budget & Timeline
- Timeline: 3 weeks
- Fixed price or milestone-based
- Must deliver a stable SDK, not a prototype
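The gating and aggregation steps of the pipeline above can be sketched as follows. This is only an illustration of the control flow the developer will wire up; aggregation by simple mean and exact string matching are assumptions here, and the actual scoring logic (which is already designed and will be provided) takes precedence:

```cpp
#include <numeric>
#include <string>
#include <vector>

// One entry per aligned phoneme, mirroring the "phonemes" array in the
// JSON result (names are illustrative, not the provided spec).
struct PhonemeScore {
    std::string ipa;
    int score;  // 0-100
};

// Step 1 gate: only proceed to forced alignment if STT matched.
// The real pipeline may normalize case, punctuation, or whitespace first.
bool stt_matches(const std::string& stt_text, const std::string& expected) {
    return stt_text == expected;
}

// Overall pronunciation score as the mean of per-phoneme scores
// (assumed aggregation; the provided scoring design may weight differently).
int overall_pronunciation_score(const std::vector<PhonemeScore>& phonemes) {
    if (phonemes.empty()) return 0;
    int sum = std::accumulate(
        phonemes.begin(), phonemes.end(), 0,
        [](int acc, const PhonemeScore& p) { return acc + p.score; });
    return sum / static_cast<int>(phonemes.size());
}
```

For the example result above (`"v"` at 90, `"ɒ"` at 65), mean aggregation would give an overall score of 77; the posted example reports 82, which is consistent with a weighted scheme in the provided design.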