Because matrix multiplication plays an essential role in many scientific applications, especially data- and compute-intensive ones, we explore the efficiency of widely used matrix multiplication algorithms. This paper proposes an HW/SW co-optimization technique, entitled EaseMiss, to reduce the cache miss ratio of large general matrix-matrix multiplications. First, we revise the algorithms by applying three software optimization techniques to improve performance, and we examine and formulate how to choose the proper algorithm for the best performance. With these optimizations, the number of cache misses decreases by a factor of 3 in a conventional data cache. To improve further, we propose SPLiTCACHE, which virtually splits the data cache according to the matrices' dimensions for better data reuse. This method can be easily embedded into conventional general-purpose processors or GPUs at the cost of negligible logic circuit overhead. With a correct and valid split, the results show that cache misses are reduced by a factor of 2 on average compared to the conventional data cache on machine learning workloads.
Ali Nezhadi, Shaahin Angizi, Arman Roohi
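The abstract does not name the three software optimizations that EaseMiss applies. As a generic illustration only (not the authors' method), the C sketch below contrasts a textbook i-j-k GEMM with a cache-blocked (tiled) version that uses i-k-j order inside each tile, two well-known software changes that typically lower data-cache misses. The function names `gemm_naive` and `gemm_blocked` and the tile-size parameter `bs` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch, not the EaseMiss implementation.
   Computes C += A * B for n x n row-major matrices in i-j-k order.
   The inner loop reads B with stride n, so for large n it tends to
   touch a different cache line on almost every iteration. */
void gemm_naive(size_t n, const double *A, const double *B, double *C) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = C[i * n + j];
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* Blocked (tiled) C += A * B. bs is a hypothetical tile size; in practice
   it would be tuned so that a tile of A, B and C fit in the data cache
   together. Inside a tile the i-k-j order makes the innermost loop walk
   rows of B and C with unit stride. */
void gemm_blocked(size_t n, size_t bs, const double *A, const double *B,
                  double *C) {
    assert(bs > 0);
    for (size_t ii = 0; ii < n; ii += bs)
        for (size_t kk = 0; kk < n; kk += bs)
            for (size_t jj = 0; jj < n; jj += bs) {
                /* Clamp tile bounds so n need not be a multiple of bs. */
                size_t i_end = ii + bs < n ? ii + bs : n;
                size_t k_end = kk + bs < n ? kk + bs : n;
                size_t j_end = jj + bs < n ? jj + bs : n;
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        double a = A[i * n + k]; /* reused across the j loop */
                        for (size_t j = jj; j < j_end; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}
```

Both functions visit every (i, j, k) triple exactly once, so they compute the same product; only the memory access order differs. The blocked version changes the summation order, so with general floating-point inputs the results may differ in the last bits.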