The speech synthesis detection algorithm based on cepstral coefficients and convolutional neural network

Journal: Scientific and Technical Journal of Information Technologies, Mechanics and Optics (Vol. 21, No. 4)

Publication Date: 2021-08-24

Authors : Murtazin R.A.; Kuznetsov A.Yu.; Fedorov E.A.; Garipov I.M.; Kholodenina A.V.; Baldanova Yu.B.; Vorobeva A.A.;

Page : 545-552

Keywords : biometric; automatic speaker verification in banking; synthetic speech; spoofing detection; cepstral analysis; convolutional neural network;


Abstract

Existing approaches to detecting synthesized speech are reviewed in light of current issues in synthesizing voice sequences. The stages of an algorithm for detecting spoofing attacks on voice biometric systems are described, and its final workflow is presented. The research focuses mainly on detecting synthesized speech, as it is the most dangerous type of attack. The authors designed a software application for an experimental study, present its structure, and propose an algorithm for detecting synthesized speech. The algorithm uses mel-frequency cepstral coefficients and constant Q cepstral coefficients to extract speech features. A Gaussian mixture model is used to construct a user model. A convolutional neural network was chosen as the classifier that determines the voice's authenticity. Two baseline methods for countering spoofing attacks, proposed by the organizers of the ASVspoof2019 competition, were selected for comparison. One of these methods uses linear frequency cepstral coefficients as speech features, while the other uses constant Q cepstral coefficients. Both baselines use Gaussian mixture models for classification. To evaluate the effectiveness of the proposed solution and compare it with the other methods, a voice database was created. The equal error rate (EER) and minimum detection cost function (minDCF) were used as evaluation metrics. The experimental results demonstrated the advantages of the proposed algorithm over the other algorithms. An advantage of the proposed solution is that the speech features it extracts also perform efficiently for user identification. This makes it possible to use the algorithm to optimize a voice biometric system with built-in protection against spoofing attacks based on speech synthesis. In addition, the proposed method can be used for voice identification with minimal modifications.

Voice biometric identification systems have strong potential in the banking sector. Such systems allow banks to simplify and accelerate financial transactions and to provide their users with advanced banking functions remotely. The adoption of voice biometric systems is hindered by their vulnerability to spoofing attacks, particularly those conducted by means of speech synthesis. The proposed solution can be integrated into voice biometric systems to improve their security.
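The abstract names EER and minDCF as evaluation metrics but does not state how they were computed. The following is a minimal sketch of both, assuming that a higher score means "more likely genuine speech" and that a trial is accepted when its score is at or above the threshold. The function names and the cost parameters `p_target`, `c_miss`, and `c_fa` are illustrative placeholders, not values taken from the paper.

```python
def error_rates(genuine, spoof):
    """Return (threshold, FAR, FRR) for every candidate threshold.

    Assumption: higher score = more likely genuine; accept if score >= threshold.
    """
    thresholds = sorted(set(genuine) | set(spoof))
    thresholds.append(thresholds[-1] + 1.0)  # extra threshold that rejects everything
    rates = []
    for t in thresholds:
        far = sum(s >= t for s in spoof) / len(spoof)      # spoofed trials accepted
        frr = sum(g < t for g in genuine) / len(genuine)   # genuine trials rejected
        rates.append((t, far, frr))
    return rates


def equal_error_rate(genuine, spoof):
    """Approximate the EER at the threshold where FAR and FRR are closest."""
    t, far, frr = min(error_rates(genuine, spoof), key=lambda r: abs(r[1] - r[2]))
    return (far + frr) / 2, t


def min_dcf(genuine, spoof, p_target=0.05, c_miss=1.0, c_fa=1.0):
    """Minimum normalized detection cost over all thresholds.

    p_target, c_miss and c_fa are illustrative placeholders (assumptions).
    """
    norm = min(c_miss * p_target, c_fa * (1.0 - p_target))
    costs = (
        c_miss * frr * p_target + c_fa * far * (1.0 - p_target)
        for _, far, frr in error_rates(genuine, spoof)
    )
    return min(costs) / norm


# Toy scores, invented purely for illustration.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
spoof_scores = [0.6, 0.3, 0.2, 0.1]

eer, eer_threshold = equal_error_rate(genuine_scores, spoof_scores)
dcf = min_dcf(genuine_scores, spoof_scores)
print(f"EER = {eer:.2f} at threshold {eer_threshold}, minDCF = {dcf:.2f}")
```

On these toy scores one genuine trial and one spoofed trial overlap, so the EER is 25%. A lower value of either metric indicates a better spoofing detector, which is the sense in which the abstract compares the proposed algorithm with the two baselines.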
