Vol. 3 No. 1 (2024): The QUEST: Journal of Multidisciplinary Research and Development

Circuit-level Optimization for Machine Learning: Enhancing Efficiency and Performance in Neural Network Accelerators

Genesis Tumbaga
Dr. Emilio B. Espinosa Sr. Memorial State College of Agriculture and Technology

Published 03/28/2024

Keywords

  • Binary Neural Networks
  • Current-Mode Logic
  • Neural Network Accelerators

How to Cite

Tumbaga, G. (2024). Circuit-level Optimization for Machine Learning: Enhancing Efficiency and Performance in Neural Network Accelerators. The QUEST: Journal of Multidisciplinary Research and Development, 3(1). https://doi.org/10.60008/thequest.v3i1.130

Abstract

This study highlights recent advances in neural network accelerator design aimed at improving energy efficiency and performance. Two approaches are presented: an all-digital deep learning inference accelerator for Binary Neural Networks (BNNs) that achieves high energy efficiency through Current-Mode Logic, wide inner-product computation, lightweight pipelining, and data reuse; and an approach that integrates Adaptive Linear Separability (ALS) into low-power, approximate-computing-based accelerators. The all-digital BNN accelerator reaches an energy efficiency of 617 TOPS/W, approaching that of analog binary circuits, while ALS integration proves effective for designing approximate computing components with minimal accuracy loss. Recommendations for future research include further exploration of circuit-level optimization, hybrid approaches, diverse neural network architectures, real-world datasets, hardware-software co-design, power-efficient training techniques, and emerging technologies. These directions aim to advance research toward more energy-efficient, high-performance neural network accelerators that benefit a wide range of machine learning applications.
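The wide inner-product computation mentioned above is what makes BNN accelerators so efficient: with weights and activations constrained to {-1, +1} and packed as bits, a dot product collapses to an XNOR followed by a population count. The following sketch illustrates that reduction in software; the function name, word width, and bit encoding are illustrative assumptions, not details from the accelerator described in the paper.

```python
def bnn_inner_product(a_bits: int, w_bits: int, width: int) -> int:
    """Dot product of two {-1, +1} vectors, each packed as a `width`-bit word
    (bit value 1 encodes +1, bit value 0 encodes -1)."""
    mask = (1 << width) - 1
    matches = ~(a_bits ^ w_bits) & mask   # XNOR: bit set where signs agree
    popcnt = bin(matches).count("1")      # count agreeing positions
    return 2 * popcnt - width             # map agreement count to signed sum

# Example: a = [+1, -1, +1, +1], w = [+1, +1, -1, +1], packed LSB-first:
result = bnn_inner_product(0b1101, 0b1011, 4)  # agrees in 2 of 4 positions
print(result)  # -> 0
```

In hardware, the XNOR and population count map to simple gate arrays rather than multipliers, which is the basis for the large energy-efficiency gains reported for binary accelerators.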

