research
papers, experience, and other research works
Speech foundation model ensembles for the controlled singing voice deepfake detection (CtrSVDD) challenge 2024.
This work presents an ensemble approach that combines speech foundation models with a novel Squeeze-and-Excitation Aggregation (SEA) method, achieving a 1.79% pooled equal error rate (EER) in the CtrSVDD Challenge 2024.
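For readers unfamiliar with the metric, the EER is the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch, not the challenge's official scoring code: the function name `equal_error_rate` is hypothetical, and it assumes higher scores mean "more likely bona fide". A pooled EER would be obtained by passing the scores of all trials together rather than averaging per-condition values.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    Assumption: a higher score means "more likely bona fide".
    """
    best_gap, eer = float("inf"), None
    for t in sorted(set(bonafide_scores) | set(spoof_scores)):
        # False acceptance: spoofed clips scoring at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection: bona fide clips scoring below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # Keep the threshold where the two error rates are closest.
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


# Toy scores for illustration only (not results from the paper).
bonafide = [0.9, 0.6, 0.4, 0.8]
spoof = [0.1, 0.5, 0.7, 0.2]
print(equal_error_rate(bonafide, spoof))
```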
NLPineers@ NLU of Devanagari Script Languages: Hate Speech Detection using Ensembling of BERT-based models
This paper explores hate speech detection in Devanagari-scripted languages, focusing on Hindi and Nepali, for Subtask B of the CHIPSAL@COLING 2025 Shared Task. Using a range of transformer-based models such as XLM-RoBERTa, MuRIL, and IndicBERT, we examine their effectiveness in navigating the nuanced boundary between hate speech and free expression. Our best-performing model, an ensemble of multilingual BERT models, achieves a recall of 0.7762 (rank 3/31) and an F1 score of 0.6914 (rank 17/31).
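One common way to ensemble classifiers like these is soft voting, where each model's class probabilities are averaged before picking a label. The sketch below illustrates that idea only; the summary above does not state which ensembling rule the paper uses, so soft voting, the function name `soft_vote`, and the example probabilities are all assumptions.

```python
def soft_vote(prob_lists):
    """Average per-class probabilities across models.

    prob_lists: one probability vector per model, all for the same input.
    Returns (predicted_class_index, averaged_probabilities).
    """
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    # The predicted class is the one with the highest averaged probability.
    label = max(range(n_classes), key=avg.__getitem__)
    return label, avg


# Hypothetical outputs of three models for one text: [P(non-hate), P(hate)].
label, avg = soft_vote([[0.6, 0.4], [0.3, 0.7], [0.4, 0.6]])
print(label, avg)
```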
Real-Time Scream Detection and Position Estimation for Worker Safety in Construction Sites
This work integrates Wav2Vec2 and Enhanced ConvNet models for accurate scream detection, uses the GCC-PHAT algorithm for robust time-delay estimation under reverberant conditions, and applies a gradient descent-based approach to estimate the source position precisely in noisy environments.
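The time-delay step can be illustrated with a textbook GCC-PHAT estimator for one microphone pair. This is a minimal sketch assuming NumPy is available; the function name `gcc_phat`, the 16 kHz sample rate, and the synthetic test signal are illustrative choices, not details of the project's implementation.

```python
import numpy as np


def gcc_phat(sig, ref, fs):
    """Estimate the delay of `sig` relative to `ref` in seconds (GCC-PHAT)."""
    n = len(sig) + len(ref)  # zero-pad so the correlation is not circular
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    # PHAT weighting: discard magnitude and keep only phase information.
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    # Reorder so lags run from -max_shift to +max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs


# Synthetic check: delay white noise by 5 samples and recover the lag.
fs = 16000
rng = np.random.default_rng(0)
ref = rng.standard_normal(1024)
sig = np.concatenate((np.zeros(5), ref[:-5]))
print(gcc_phat(sig, ref, fs) * fs)  # estimated lag in samples
```

Delays estimated this way for several microphone pairs are the kind of input a gradient descent-based position solver would then fit against the known microphone geometry.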