Voice of India Benchmark Reveals Global AI Models Struggle with Indian Speech Recognition

The Economic Times

A speech recognition benchmark called *Voice of India*, developed by Josh Talks and AI4Bharat at IIT Madras, has exposed significant performance failures in global AI systems when they process Indian languages and accents. The evaluation covers 15 languages and approximately 35,000 speakers, and it shows that leading international models from OpenAI and Microsoft struggle with how Indians actually speak.

India-focused Sarvam Audio consistently outperforms global competitors; OpenAI's models in particular trail by over 50 percentage points in accuracy. The benchmark also highlights disparities between language families: all models perform better on Indo-Aryan languages such as Hindi and Bengali (5-6% word error rate, or WER) than on Dravidian languages such as Tamil and Telugu (15-20% WER). For regional Hindi dialects such as Bhojpuri and Chhattisgarhi, which are spoken by tens of millions, error rates jump to 20-30%.
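
For readers unfamiliar with the metric, word error rate is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference. The sketch below illustrates that standard definition only. The article does not describe how the benchmark normalizes text or scores transcripts, so the function name `word_error_rate`, the whitespace tokenization, and the sample sentences are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance divided by reference length.

    Assumption: words are separated by whitespace and compared exactly.
    A real benchmark would likely normalize punctuation, casing, and script
    before scoring; none of that is modeled here.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis: i deletions
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref_words)


# Made-up code-switched example (not taken from the benchmark):
# one wrong word out of four reference words.
wer = word_error_rate("mera account balance batao", "mera account balance bata")
print(f"WER = {wer:.0%}")
```

Under this definition, a 5% WER means roughly one word in twenty is wrong, while 20-30% means one word in every four or five. Because insertions are counted, WER can also exceed 100% when a transcript contains many extra words.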

The evaluation uniquely incorporates code-switched speech, background noise, and geographic dialect variations that reflect real-world Indian conversations. With voice becoming the primary digital interface for banking, healthcare, and government services, these high error rates have serious implications for digital inclusion across India's diverse linguistic landscape.