Integrating Faster Whisper with Deep Learning Speaker Recognition

Journal: International Journal of Computer Science and Mobile Computing - IJCSMC (Vol.13, No. 9)

Publication Date:

Authors : ; ; ; ;

Page : 1-8

Keywords : ASR; Faster Whisper; MFCC; ResNet CNN architecture; Speaker Recognition


Abstract

Communicating and understanding speech can be challenging for people who are deaf or hard of hearing, who often have to rely on others for help in everyday situations; assistive technologies can reduce these difficulties. This paper advances one such technology by integrating Faster Whisper, a real-time Automatic Speech Recognition (ASR) model, with a deep learning-based speaker recognition system built on a ResNet Convolutional Neural Network (CNN) architecture. Noise augmentation is applied to make the model more robust in noisy environments, after which Mel-Frequency Cepstral Coefficients (MFCC) with delta and double-delta coefficients are extracted from the speech signals so that the model can learn spectral and temporal features. Owing to experiment-driven model selection and the use of skip connections, the results show significant improvements in performance metrics, with high accuracy and low latency, outperforming standalone models. Future research could explore further refinements in model integration and applications of this technology in diverse real-world scenarios, paving the way for more comprehensive solutions in assistive communication technologies.
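The feature pipeline named in the abstract (noise augmentation, then MFCC with delta and double-delta coefficients) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `add_noise`, `delta`, and `stack_features`, the white Gaussian noise model, the target SNR parameter, and the regression window width of 2 are all assumptions made for illustration. The MFCC frames themselves are taken as given input (a list of per-frame coefficient lists); in practice they would come from an audio feature library.

```python
import math
import random


def add_noise(signal, snr_db, rng=None):
    """Add white Gaussian noise to `signal` at roughly `snr_db` dB SNR.

    Assumption: additive white noise; the paper does not state its noise model.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def delta(frames, width=2):
    """Regression-based delta over time for a list of feature frames.

    d_t = sum_{n=1..width} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n} n^2),
    with edge frames repeated as padding.
    """
    n_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(n_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, n_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_features(mfcc):
    """Concatenate static, delta, and double-delta coefficients per frame."""
    d1 = delta(mfcc)   # first-order temporal change
    d2 = delta(d1)     # second-order temporal change
    return [m + a + b for m, a, b in zip(mfcc, d1, d2)]
```

With 13 MFCCs per frame, `stack_features` would yield 39-dimensional frames, which is a common input layout for a CNN-based speaker model; whether the paper uses exactly this layout is not stated in the abstract.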

Last modified: 2024-09-15 01:59:40