Title | : | Advancing Multilingual NLP: Data, Modeling, and Evaluation Strategies |
Speaker | : | Sumanth Doddapaneni (IITM) |
Details | : | Tue, 25 Mar, 2025 4:00 PM @ SSB 334 |
Abstract | : | The rapid advancements in multilingual Natural Language Processing (NLP) have significantly expanded the capabilities of language models. However, challenges remain in ensuring robust multilingual generation, reliable evaluation, and high-quality language resources. In this seminar, we explore three key contributions addressing these gaps. First, we introduce IndicXTREME and IndicBERT v2, which provide the largest monolingual corpora, benchmarks, and multilingual models for Indic languages. These resources improve language coverage, enhance representation learning, and establish a standardized evaluation framework for multilingual classification models in low-resource settings. To advance multilingual long-form generation, we propose QAPGEN, a novel approach that leverages intermediate question-answer pairs as structured planning hints to improve content generation across multiple languages. While QAPGEN enhances coherence and informativeness, our results reveal two key challenges: (i) evaluating generative tasks remains highly difficult, necessitating automatic metrics that are both efficient and reliable, and (ii) despite these improvements, the translate-test approach—where input is translated into English, processed, and translated back—continues to outperform structured multilingual generation. Finally, we present FBI (Finding Blind Spots in Evaluator LLMs), a framework that critically examines the reliability of automatic evaluation methods. FBI systematically injects controlled perturbations into generated outputs and measures whether Evaluator LLMs can accurately detect errors in factuality, coherence, instruction adherence, and reasoning. Our findings reveal substantial blind spots in current evaluation strategies, underscoring the urgent need for more robust and interpretable automatic metrics for multilingual text generation. Through these discussions, we highlight key challenges in building high-quality, reliable multilingual NLP systems and propose new directions for improving both generation models and their evaluation. |
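The translate-test approach mentioned in the abstract can be sketched as a simple three-stage pipeline. The sketch below is illustrative only: the `translate` and `run_english_model` functions are hypothetical stand-ins (not the speaker's actual systems) that tag their inputs so the data flow is visible.

```python
# Minimal sketch of the translate-test pipeline: source language -> English,
# run an English-only task model, then translate the output back.
# All functions below are hypothetical placeholders for real MT/task models.

def translate(text: str, src: str, tgt: str) -> str:
    # Stand-in for a real machine-translation system; here we only tag
    # the text so each pipeline stage is visible in the output.
    return f"[{src}->{tgt}] {text}"

def run_english_model(text: str) -> str:
    # Stand-in for an English-only task model (e.g., a summarizer or QA model).
    return f"OUTPUT({text})"

def translate_test(input_text: str, lang: str) -> str:
    """Translate input to English, process it, translate the result back."""
    english_input = translate(input_text, src=lang, tgt="en")
    english_output = run_english_model(english_input)
    return translate(english_output, src="en", tgt=lang)

result = translate_test("नमस्ते दुनिया", lang="hi")
print(result)  # [en->hi] OUTPUT([hi->en] नमस्ते दुनिया)
```

The design point the abstract makes is that this pipeline, despite its extra translation steps, can still outperform direct multilingual generation when the task model is strongest in English.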