Indian Multi-Language Speech Recognition and Translation System (Open Source)

Client: AI | Published: 17.10.2025

We are developing a prototype of an Indian language speech recognition and translation system using open-source technologies only (no Google, AWS, or Azure APIs). The system should be capable of:

- Converting speech to text in multiple Indian languages
- Translating text between selected Indian languages
- Providing basic speaker identification (diarization)
- Offering a RESTful API interface for integration

This is a proof-of-concept (MVP) project focusing on 5–10 major Indian languages such as Hindi, Tamil, Telugu, Bengali, and Marathi. The goal is to build a functional base system that can later be expanded to cover 150+ languages and dialects.

---

Scope of Work

1. Audio Processing
- Handle input formats (MP3, WAV, FLAC, M4A)
- Perform noise reduction and normalization to 16 kHz
- Prepare data for speech recognition models

2. Speech Recognition (ASR)
- Implement speech-to-text using open-source models (Meta MMS, Whisper, or Coqui STT)
- Support multiple Indian languages
- Provide language detection and transcription confidence scores

3. Text Translation
- Use open-source translation models such as IndicTrans2 or MarianNMT
- Enable bidirectional text translation between selected Indian languages

4. Speaker Diarization
- Integrate speaker detection using pyannote.audio or a similar open-source tool

5. API Development
- Develop RESTful API endpoints for speech-to-text and translation
- Include basic authentication and documentation

6. Deliverables
- Complete source code (Python preferred)
- Deployment and configuration scripts
- Technical and API documentation

---

Preferred Tech Stack

- Python, FastAPI, PyTorch
- Whisper, MMS, IndicTrans2, pyannote.audio
- Hugging Face Transformers
- Docker for deployment

---

Deliverables and Timeline

- Functional MVP covering 5–10 Indian languages
- Duration: 6–8 weeks
- Budget: ₹40,000 (fixed)

---

Required Skills

- Experience with speech recognition and translation using open-source models
- Familiarity with Indian language datasets (Bhashini, AI4Bharat, Common Voice)
- Strong Python and API development experience
- Ability to deliver clean, documented, and reproducible code

---

How to Apply

Please include the following in your proposal:

1. Short summary of relevant experience with ASR or translation systems
2. Example projects or GitHub repositories
3. Proposed timeline and approach for the MVP
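
---

Illustrative Sketch: Audio Normalization

To make the audio-processing item in the scope of work more concrete, here is a minimal sketch of bringing decoded mono audio to 16 kHz and a consistent peak level before it reaches an ASR model. It is an illustration only, not a required design. The function names (`resample_linear`, `peak_normalize`, `prepare_for_asr`) and the 0.9 target peak are assumptions made for this example. It uses naive linear interpolation with no anti-aliasing filter and does not cover decoding MP3/FLAC/M4A or noise reduction; a real implementation would most likely rely on an established audio library for those steps.

```python
from typing import List


def resample_linear(samples: List[float], src_rate: int, dst_rate: int = 16000) -> List[float]:
    """Resample mono samples to dst_rate with linear interpolation.

    Assumption for illustration: no low-pass filtering is applied, so
    downsampling may alias. Fine for a sketch, not for production.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples or src_rate == dst_rate:
        return list(samples)

    # Output length that covers the same duration at the new rate.
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1

    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        # Blend the two neighbouring source samples.
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def peak_normalize(samples: List[float], target_peak: float = 0.9) -> List[float]:
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence or empty input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def prepare_for_asr(samples: List[float], src_rate: int) -> List[float]:
    """Hypothetical helper: resample to 16 kHz, then peak-normalize."""
    return peak_normalize(resample_linear(samples, src_rate, 16000))
```

For example, `prepare_for_asr(samples, 48000)` would take one second of 48 kHz mono audio (48,000 floats in the range -1.0 to 1.0) and return 16,000 samples scaled to a peak of 0.9.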