ResearchBib Share Your Research, Maximize Your Social Impacts
Sign for Notice Everyday Sign up >> Login


Journal: Scientific and Technical Journal of Information Technologies, Mechanics and Optics (Vol.20, No. 4)

Publication Date:

Authors : ;

Page : 560-567

Keywords : convolutional neural networks; neural network accelerators; hardware accelerators; FPGA; CAD; high-level synthesis;

Source : Downloadexternal Find it from : Google Scholarexternal


Currently, more and more tasks on image processing and analysis are being solved using convolutional neural networks. Neural networks implemented using high-level programming languages, libraries and frameworks cannot be used in real-time systems, for example, for processing streaming video in cars, due to the low speed and energy efficiency of such implementations. The application of specialized hardware accelerators of neural networks is necessary for these tasks. The design of such accelerators is a complex iterative process requiring highly specialized knowledge and qualification. This consideration makes the creation of automation tools for high-level synthesis of such computers a relevant issue. The purpose of this research is a tool development for the automated synthesis of neural network accelerators from a high-level specification for programmable logic devices (FPGAs), which reduces the development time. A description of networks is used as a high-level specification, which can be obtained using the TensorFlow framework. The several strategies have been researched for optimizing the structure of convolutional networks, methods for organizing the computational process and formats for representing data in neural networks and their effect on the characteristics of the resulting computer. It was shown that structure optimization of neural network fully connected layers on the example of solving the handwritten digit recognition problem from the MNIST set reduces the number of network parameters by 95 % with a loss of accuracy equal to 0.43 %, pipelining of calculations speeds up the calculation by 1.7 times, and parallelization of the computing process individual parts provides the acceleration by almost 20 times, although it requires 4-6 times more FPGA resources. Applying of fixed-point numbers instead of floating-point numbers in calculations reduces the used FPGA resources by 1.7–2.8 times. The analysis of the obtained results is carried out and a model of an automated synthesis tool is proposed, which performs the indicated optimizations in automatic mode in order to meet the requirements for speed and resources used in the implementation of neural network accelerators on FPGA.

Last modified: 2020-08-26 03:20:26