BhashaSetu: Novel GNN Approach Boosts NLP Performance for Low-Resource Indian Languages

Cornell University

5 February 2026

A new study presents BhashaSetu, which features GETR (Graph-Enhanced Token Representation), a method for cross-lingual knowledge transfer to extremely low-resource languages that have only a few hundred labeled examples.

The research targets both sentence-level and word-level NLP tasks in Indian languages. GETR uses graph neural networks to transfer linguistic knowledge from high-resource languages, and the study reports that it outperforms existing multilingual methods.

The reported gains are 13 percentage points for POS tagging in truly low-resource languages such as Mizo and Khasi, and 20 and 27 percentage points in macro-F1 for sentiment classification and named entity recognition, respectively, in simulated low-resource languages (Marathi, Bangla, Malayalam).
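
For readers unfamiliar with the metric: macro-F1 is the unweighted mean of the per-class F1 scores, so a rare class counts as much as a frequent one. The sketch below is a minimal illustration in Python, not code from the study. The function names and the toy labels are hypothetical and were made up for this example.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct prediction for class g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was the true class but was missed
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never matches
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def gain_in_points(baseline_score, new_score):
    """Difference between two scores in [0, 1], expressed in percentage points."""
    return (new_score - baseline_score) * 100


# Toy sentiment labels (illustrative only, not data from the study).
toy_gold = ["pos", "pos", "neg", "neu"]
toy_pred = ["pos", "neg", "neg", "neu"]
print(round(macro_f1(toy_gold, toy_pred), 4))
```

A gain stated in percentage points is an absolute difference between two scores: moving from a macro-F1 of 0.50 to 0.70 is a gain of 20 points, not 20 percent.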

The study also analyzes the specific mechanisms that make cross-lingual knowledge transfer successful in this context, providing insights for future work on computational linguistics for under-represented languages.