Multiply–accumulate operation (MAC) is a fundamental component of machine learning tasks, where multiplication (either integer or float multiplication) compared to addition is costly in terms of hardware implementation or power consumption. In this paper, we approximate floating-point multiplication by converting it to integer addition while preserving the test accuracy of shallow and deep neural networks. We mathematically show and prove that our proposed method can be utilized with any floating-point format (e.g., FP8, FP16, FP32, etc.). It is also highly compatible with conventional hardware architectures and can be employed in CPU, GPU, or ASIC accelerators for neural network tasks with minimum hardware cost. Moreover, the proposed method can be utilized in embedded processors without a floating-point unit to perform neural network tasks. We evaluated our method on various datasets such as MNIST, FashionMNIST, SVHN, Cifar-10, and Cifar-100, with both FP16 and FP32 arithmetics. The proposed method preserves the test accuracy and, in some cases, overcomes the overfitting problem and improves the test accuracy.
Ali Nezhadi, Shaahin Angizi, Arman Roohi