Efficient implementation of artificial neural networks on FPGAs using high-level synthesis and parallelism
Journal: International Journal of Advanced Technology and Engineering Exploration (IJATEE), Vol. 11, No. 119
Publication Date: 2024-10-30
Authors: Mini K. Namboothiripad; Gayathri Vadhyan
Pages: 1497-1511
Keywords: Artificial neural network; Field programmable gate arrays; High-level synthesis; Node and weight level parallelism; Programmable logic
Abstract
Artificial neural networks (ANNs) have gained significant attention for their ability to solve complex problems in various domains. However, the efficient implementation of ANN models on hardware remains challenging, particularly for systems requiring low power and high performance. Field programmable gate arrays (FPGAs) offer a promising solution due to their reconfigurability and parallel processing capabilities. This study explores the implementation of an ANN on an FPGA using high-level synthesis (HLS), focusing on optimizing performance by leveraging weight-level and node-level parallelism. Two methodologies were proposed for efficiently implementing ANN computations on an FPGA. The focus was on mapping the computations of the ANN's first layer onto the programmable logic (PL) of a system-on-chip (SoC) FPGA, while offloading the processing of subsequent layers to a 666 MHz advanced reduced instruction set computer machine (ARM) processor. Six designs with varying levels of weight-level and node-level parallelism were implemented on a Python-based FPGA (PYNQ) board. Multiple processing elements (PEs) and sub-PEs were instantiated in the PL to extract parallelism from the ANN computations. Single-precision floating-point arithmetic was used throughout the implementations. The custom digital design, operating at 150 MHz, achieved a significant speedup, computing the entire ANN 2.5 times faster than the 666 MHz ARM processor, even with the limited resources available on the PYNQ board. Scaling up with multiple FPGAs could yield performance comparable to that of general-purpose processors. The integration of HLS and the control-block redesign capabilities of the ARM processor made the system adaptable to various applications without requiring extensive knowledge of hardware description languages (HDLs). This research shows that FPGA-based implementations of ANNs, especially using HLS, offer a viable and efficient alternative to graphics processing unit (GPU) or processor-based designs for ANN applications. The demonstrated speedup achieved through parallelism and the use of PL indicates the potential of FPGAs in creating dedicated application-specific integrated circuits (ASICs) for ANN applications, offering a competitive alternative to traditional GPU- or processor-based solutions.
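To illustrate the kind of parallelism the abstract describes (multiple PEs for node-level parallelism and sub-PEs for weight-level parallelism, expressed through HLS rather than hand-written HDL), the following is a minimal HLS C++ sketch. It is not the authors' implementation: it assumes Vitis/Vivado HLS-style pragmas, an illustrative first-layer size, and a sigmoid activation, and all names (layer1_forward, pe_dot, N_IN, N_NODES, WL_PAR) are hypothetical.

```cpp
// Minimal HLS C++ sketch (not the authors' code): one way to express
// weight-level and node-level parallelism for the first ANN layer in
// programmable logic, assuming Vitis/Vivado HLS-style pragmas.
#include <cmath>

const int N_IN    = 64;  // assumed number of first-layer inputs
const int N_NODES = 16;  // assumed number of first-layer nodes (one PE each)
const int WL_PAR  = 8;   // weight-level parallelism inside each PE (sub-PEs)

// One processing element: dot product of the input vector with one node's
// weights, computed with WL_PAR parallel partial sums (the "sub-PEs").
static float pe_dot(const float x[N_IN], const float w[N_IN]) {
    float partial[WL_PAR] = {0.0f};
#pragma HLS ARRAY_PARTITION variable=partial complete
    ACC: for (int i = 0; i < N_IN; i += WL_PAR) {
#pragma HLS PIPELINE
        for (int k = 0; k < WL_PAR; k++) {
#pragma HLS UNROLL
            partial[k] += x[i + k] * w[i + k];  // WL_PAR multiply-adds per iteration
        }
    }
    float acc = 0.0f;
    RED: for (int k = 0; k < WL_PAR; k++) {
#pragma HLS UNROLL
        acc += partial[k];  // final reduction of the partial sums
    }
    return acc;
}

// First layer only: the NODES loop is unrolled so the N_NODES PEs run
// concurrently; later layers would be computed on the ARM core, matching
// the PL/processor partitioning described in the abstract.
void layer1_forward(const float x[N_IN],
                    const float w[N_NODES][N_IN],
                    const float b[N_NODES],
                    float y[N_NODES]) {
#pragma HLS ARRAY_PARTITION variable=w dim=1 complete
    NODES: for (int n = 0; n < N_NODES; n++) {
#pragma HLS UNROLL
        float s = pe_dot(x, w[n]) + b[n];
        y[n] = 1.0f / (1.0f + std::exp(-s));  // sigmoid activation (assumed)
    }
}
```

Under these assumptions, varying WL_PAR, N_NODES, and the unroll factors is how different degrees of weight-level and node-level parallelism (such as the six designs mentioned above) could be explored without modifying any HDL.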