Cornell University

A new study presents BhashaSetu, which introduces GETR (Graph-Enhanced Token Representation), a method for cross-lingual knowledge transfer to extremely low-resource languages that have only a few hundred labeled examples.
The research focuses on improving performance for both sentence-level and word-level NLP tasks in Indian languages. The GETR approach leverages graph neural networks to transfer linguistic knowledge from high-resource languages, outperforming existing multilingual methods.
Results show significant improvements: a gain of 13 percentage points for POS tagging in truly low-resource languages such as Mizo and Khasi, and gains of 20 and 27 percentage points in macro-F1 for sentiment classification and named entity recognition, respectively, in simulated low-resource settings (Marathi, Bangla, Malayalam).
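For readers unfamiliar with the metric: macro-F1 computes an F1 score for each class separately and then takes their unweighted mean, so rare classes (common in NER) count as much as frequent ones. The minimal Python sketch below illustrates the calculation on hypothetical toy labels; it is not code or data from the study.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 scores (macro-F1)."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        # Count true positives, false positives and false negatives for this class.
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    # Every class gets equal weight, regardless of how often it occurs.
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical sentiment labels, for illustration only.
gold = ["pos", "pos", "neg", "neu"]
pred = ["pos", "neg", "neg", "neu"]
print(round(macro_f1(gold, pred), 4))  # 0.7778
```

Here the per-class F1 scores are 2/3 (pos), 2/3 (neg) and 1 (neu), giving a macro-F1 of 7/9. A gain of 20 percentage points would mean, for example, moving from 0.50 to 0.70 on this scale.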
The study also analyzes the specific mechanisms that make cross-lingual knowledge transfer successful in this context, providing insights for future work on computational linguistics for under-represented languages.
