Apr 12, 2024 : CSE Bits

The details of the speakers are as follows:
1. Kaushal Bhogale (CS22D006)
Title : Improving ASR Systems for Low-Resource Languages by Curating Datasets from Public Sources
Abstract: End-to-end (E2E) models have become the default choice for state-of-the-art speech recognition systems. Such models are trained on large amounts of labelled data, which are often not available for low-resource languages. Techniques such as self-supervised learning and transfer learning hold promise, but have not yet been effective in training accurate models. On the other hand, collecting labelled datasets across a diverse set of domains and speakers is very expensive. In this work, we demonstrate an inexpensive and effective alternative to these approaches by "mining" text and audio pairs for Indian languages from public sources, specifically from the public archives of All India Radio. As a key component, we adapt the Needleman-Wunsch algorithm to align sentences with their corresponding audio segments, given a long audio recording and a PDF of its transcript, while being robust to errors due to OCR, extraneous text, and non-transcribed speech. We thus create Shrutilipi, a dataset which contains over 6,400 hours of labelled audio across 12 Indian languages, totalling 4.95M sentences. On average, Shrutilipi results in a 2.3x increase over publicly available labelled data. We establish the quality of Shrutilipi with 21 human evaluators across the 12 languages. We also establish the diversity of Shrutilipi in terms of represented regions, speakers, and mentioned named entities. Significantly, we show that adding Shrutilipi to the training set of Wav2Vec models leads to an average decrease in WER of 5.8% for 7 languages on the IndicSUPERB benchmark. For Hindi, which has the most benchmarks (7), the average WER falls from 18.8% to 13.5%. This improvement extends to efficient models: we show a 2.3% drop in WER for a Conformer model (10x smaller than Wav2Vec). Finally, we demonstrate the diversity of Shrutilipi by showing that the model trained with it is more robust to noisy input.
2. Dalwadi Aditya Yogeshbhai (CS22S005)
Title: Planar Cycle-Extendable Graphs
Abstract: A nontrivial connected graph G is said to be matching covered if each edge is part of some perfect matching. Most problems in matching theory can be reduced to matching covered graphs. Such graphs are also known as 1-extendable graphs because each edge extends to a perfect matching. In a similar fashion, we say that a matching covered graph G is cycle-extendable if, for each even cycle Q, a perfect matching of Q (either one) extends to a perfect matching of G. Equivalently, a matching covered graph is cycle-extendable if each even cycle can be expressed as a symmetric difference of two perfect matchings. Another motivation to work on cycle-extendable graphs arises from the ear decomposition theory of matching covered graphs (which is similar to the well-known ear decomposition theory of 2-connected graphs). From this viewpoint, a matching covered graph G is cycle-extendable if and only if each even cycle extends to an ear decomposition of G. As of now, there is no known NP characterization of cycle-extendable graphs in general. Recently, we obtained NP (as well as P) characterizations of planar cycle-extendable graphs. Xiaofeng Guo and Fuji Zhang (2004) obtained such a characterization for bipartite planar cycle-extendable graphs (wherein they refer to cycle-extendable graphs as 1-cycle resonant graphs). We provide a complete characterization of nonbipartite planar cycle-extendable graphs in terms of four infinite families.
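
For readers new to the definitions in the second abstract, the following is a minimal brute-force sketch in Python, not the speaker's method. It enumerates perfect matchings of a small graph, checks whether the graph is matching covered, and checks whether a given even cycle Q extends, using the fact that a perfect matching of Q extends to one of G exactly when G minus the vertices of Q has a perfect matching. All function names (perfect_matchings, is_connected, is_matching_covered, cycle_extends) are hypothetical, vertices are assumed to be integers, and the enumeration is exponential, so it is only suitable for tiny examples.

```python
from collections import deque


def perfect_matchings(vertices, edges):
    """Yield each perfect matching of the subgraph induced on `vertices`.

    Edges are frozensets of two distinct vertices (simple graphs only).
    """
    vertices = frozenset(vertices)
    if not vertices:
        yield frozenset()  # the empty graph has exactly one (empty) matching
        return
    v = min(vertices)  # v must be matched, so branch on the edge covering it
    for e in edges:
        if v in e and e <= vertices:
            for m in perfect_matchings(vertices - e, edges):
                yield m | {e}


def is_connected(vertices, edges):
    """Breadth-first search connectivity check."""
    vertices = set(vertices)
    if not vertices:
        return False
    start = next(iter(vertices))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for e in edges:
            if u in e:
                for w in e - {u}:
                    if w not in seen:
                        seen.add(w)
                        queue.append(w)
    return seen == vertices


def is_matching_covered(vertices, edges):
    """True if the graph is connected, has an edge, and every edge lies in
    some perfect matching."""
    edges = set(edges)
    if not edges or not is_connected(vertices, edges):
        return False
    covered = set()
    for m in perfect_matchings(vertices, edges):
        covered |= m
    return covered == edges


def cycle_extends(vertices, edges, cycle):
    """True if G minus the vertices of `cycle` has a perfect matching.

    Assumes `cycle` lists the vertices of an even cycle of the graph;
    this is not validated here.
    """
    rest = frozenset(vertices) - frozenset(cycle)
    return next(perfect_matchings(rest, edges), None) is not None


# Example: the 3-dimensional cube, with vertices 0..7 joined when their
# binary labels differ in exactly one bit.
cube_vertices = range(8)
cube_edges = {frozenset((u, u ^ (1 << b))) for u in range(8) for b in range(3)}

print(is_matching_covered(cube_vertices, cube_edges))
# The 6-cycle below misses only the antipodal vertices 0 and 7, which are
# not adjacent, so this cycle does not extend.
print(cycle_extends(cube_vertices, cube_edges, [1, 3, 2, 6, 4, 5]))
```

In this sketch the cube is matching covered but not cycle-extendable, since the 6-cycle shown leaves two non-adjacent vertices unmatched. The cube is bipartite, so it falls under the bipartite case mentioned in the abstract rather than the nonbipartite families.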

Date : Friday, April 12
Time : 5:00 PM
Venue : CSB-25
