Cornell University

A new study presents BhashaSetu, featuring GETR (Graph-Enhanced Token Representation), a method for cross-lingual knowledge transfer to extremely low-resource languages for which only a few hundred labeled examples are available.
The research focuses on improving performance for both sentence-level and word-level NLP tasks in Indian languages. The GETR approach leverages graph neural networks to transfer linguistic knowledge from high-resource languages, outperforming existing multilingual methods.
Results show significant improvements: a gain of 13 percentage points for POS tagging in truly low-resource languages such as Mizo and Khasi, and gains of 20 and 27 percentage points in macro-F1 for sentiment classification and named entity recognition, respectively, in simulated low-resource settings (Marathi, Bangla, Malayalam).
The study also analyzes the specific mechanisms that make cross-lingual knowledge transfer successful in this context, providing insights for future work on computational linguistics for under-represented languages.
