research

papers, experience, and other research works

📄

Speech foundation model ensembles for the controlled singing voice deepfake detection (CtrSVDD) challenge 2024.

Status: Published | 2024

This work presents an ensemble approach leveraging speech foundation models and a novel Squeeze-and-Excitation Aggregation (SEA) method, achieving a 1.79% pooled equal error rate (EER) in the 2024 Singing Voice Deepfake Detection Challenge.

PDF Code BibTex IEEE SLT 2024
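
For readers unfamiliar with the metric quoted above, here is a minimal sketch of how an equal error rate can be computed from detector scores. It is not the challenge's official scoring script: the function name, the score convention (higher means more likely bona fide), and the simple threshold sweep are all assumptions made for illustration. A "pooled" EER would apply the same computation to scores pooled across all attack types.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Return the EER for two score lists, where a higher score means 'more bona fide'.

    Hypothetical helper for illustration: sweeps every observed score as a
    threshold and returns the error rate where false acceptance and false
    rejection are closest.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection rate: bona fide clips scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed clips scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


# Toy scores (made up for this example, not challenge data).
bonafide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
print(f"EER: {equal_error_rate(bonafide, spoof):.2%}")
```

On the toy scores, one bona fide clip and one spoofed clip fall on the wrong side of the best threshold, so both error rates are one in four.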

📄

NLPineers@ NLU of Devanagari Script Languages: Hate Speech Detection using Ensembling of BERT-based models

Status: Accepted | 2025

This paper explores hate speech detection in Devanagari-scripted languages, focusing on Hindi and Nepali, for Subtask B of the CHiPSAL@COLING 2025 Shared Task. Using a range of transformer-based models such as XLM-RoBERTa, MuRIL, and IndicBERT, we examine their effectiveness in navigating the nuanced boundary between hate speech and free expression. Our best-performing model, an ensemble of multilingual BERT models, achieves a recall of 0.7762 (rank 3/31) and an F1 score of 0.6914 (rank 17/31).

PDF Code BibTex COLING 2025
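
The summary above does not say how the ensemble combines its members, so the sketch below assumes plain soft voting (averaging class probabilities) purely for illustration. The function names and toy probabilities are hypothetical. Recall and F1 are computed for a single positive class, and the shared task's own averaging mode may differ.

```python
def soft_vote(model_probs):
    """Average per-class probabilities across models and return argmax labels.

    model_probs[m][i][c] is the probability model m assigns to class c for
    example i. Soft voting is an assumption here, not the paper's stated method.
    """
    n_models = len(model_probs)
    labels = []
    for per_example in zip(*model_probs):  # one tuple of model outputs per example
        avg = [sum(p) / n_models for p in zip(*per_example)]
        labels.append(max(range(len(avg)), key=avg.__getitem__))
    return labels


def recall_and_f1(y_true, y_pred, positive=1):
    """Binary recall and F1 for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, f1


# Three hypothetical models, three examples, two classes (0 = not hate, 1 = hate).
probs = [
    [[0.6, 0.4], [0.3, 0.7], [0.8, 0.2]],
    [[0.4, 0.6], [0.2, 0.8], [0.7, 0.3]],
    [[0.3, 0.7], [0.6, 0.4], [0.9, 0.1]],
]
predictions = soft_vote(probs)
recall, f1 = recall_and_f1([1, 0, 0], predictions)
```

A high recall paired with a lower F1, as in the reported results, means the ensemble catches most positive examples at the cost of more false positives.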

📄

Real-Time Scream Detection and Position Estimation for Worker Safety in Construction Sites

Status: Accepted | 2024

This work integrates Wav2Vec2 and Enhanced ConvNet models for scream detection, uses the GCC-PHAT algorithm for time delay estimation under reverberant conditions, and applies a gradient descent-based approach to estimate the source position in noisy environments.

PDF Code BibTex AIRISE 2025
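
As a rough illustration of the time delay estimation step, here is a minimal GCC-PHAT sketch for two microphone signals, written with NumPy. It is a generic textbook formulation, not the paper's implementation: the function name, parameters, and synthetic test signal are assumptions.

```python
import numpy as np


def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay in seconds of `sig` relative to `ref` using GCC-PHAT.

    A positive result means `sig` arrives later than `ref`.
    """
    n = len(sig) + len(ref)  # zero-pad so the correlation does not wrap around
    sig_fft = np.fft.rfft(sig, n=n)
    ref_fft = np.fft.rfft(ref, n=n)
    cross = sig_fft * np.conj(ref_fft)
    # PHAT weighting: discard magnitude and keep only phase information.
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so negative lags come first, then lags 0..max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs


# Synthetic check: delay white noise by 5 samples and recover the delay.
fs = 16000
rng = np.random.default_rng(0)
ref = rng.standard_normal(1024)
delay = 5
sig = np.concatenate((np.zeros(delay), ref[:-delay]))
tau = gcc_phat(sig, ref, fs)
```

With several microphone pairs, delays estimated this way could serve as the measurements that a gradient descent position estimator fits against.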