A Combination of Deep Neural Networks for Acoustic modeling of Vietnamese LVCSR
Journal: International Journal of Advances in Computer Science and Technology (IJACST) (Vol. 5, No. 1)
Publication Date: 2016-02-25
Authors: Quoc Bao Nguyen; Tat Thang Vu; Chi Mai Luong
Pages: 1-7
Keywords: Bottleneck feature; Deep neural network; Vietnamese LVCSR
Abstract
In this work, we propose a deep neural network architecture that combines two popular applications of deep neural networks for Vietnamese large vocabulary continuous speech recognition (LVCSR). First, a deep neural network is trained to extract bottleneck features from frames that combine Mel-frequency cepstral coefficients (MFCCs) and tonal features. This network is then applied as a nonlinear discriminative feature-space transformation for hybrid network training, where acoustic modeling is performed with denoising auto-encoder pre-training and back-propagation. The experiments are carried out on a dataset of speech from the Voice of Vietnam (VOV) channel. The results show that the system using the combined deep neural network architecture obtains relative improvements of 4.1% over the best hybrid HMM/DNN system and 51.4% over the baseline system. Adding tonal features to the network input yields a relative improvement of around 18% in recognition performance.
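The two-stage data flow described in the abstract can be sketched roughly as follows. This is only an illustrative forward pass, not the authors' implementation: the layer sizes, context width, number of HMM states, and all function names are assumptions made for the example, and random weights stand in for trained ones. The denoising auto-encoder pre-training and back-propagation training steps are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration (not from the paper).
N_MFCC, N_TONAL = 39, 3   # per-frame MFCC and tonal feature dimensions
CONTEXT = 11              # frames stacked around the centre frame
BN_DIM = 39               # width of the bottleneck layer
N_STATES = 500            # number of tied HMM states


def init_layers(sizes):
    """Random weights and zero biases for a stack of dense layers."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(x, layers):
    """Sigmoid hidden layers; the last layer is left linear."""
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = 1.0 / (1.0 + np.exp(-x))
    return x


def stack_context(frames, context):
    """Concatenate each frame with its neighbours (edges are padded)."""
    half = context // 2
    padded = np.pad(frames, ((half, half), (0, 0)), mode="edge")
    return np.stack([padded[t:t + context].ravel()
                     for t in range(len(frames))])


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


# Stage 1: bottleneck network, kept only up to the bottleneck layer.
bn_layers = init_layers([(N_MFCC + N_TONAL) * CONTEXT, 1024, 1024, BN_DIM])

# Stage 2: hybrid DNN mapping bottleneck features to HMM state posteriors.
hybrid_layers = init_layers([BN_DIM * CONTEXT, 1024, 1024, N_STATES])


def state_posteriors(mfcc, tonal):
    """Combine MFCC and tonal frames, extract bottleneck features,
    then score them with the hybrid network."""
    frames = np.concatenate([mfcc, tonal], axis=1)
    bn_feats = forward(stack_context(frames, CONTEXT), bn_layers)
    logits = forward(stack_context(bn_feats, CONTEXT), hybrid_layers)
    return softmax(logits)
```

Calling `state_posteriors` with an MFCC array of shape `(T, 39)` and a tonal array of shape `(T, 3)` returns one posterior distribution over the assumed HMM states per frame. In a hybrid HMM/DNN system these posteriors would typically be converted to scaled likelihoods for decoding.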
Last modified: 2016-02-25 10:10:40