
CSCE 990: Hardware Acceleration for Machine Learning (Fall 2020-2021-2022)


Machine learning (ML) is now widely used in advanced artificial intelligence (AI) applications. Breakthroughs in computational capability have enabled systems to run complex ML algorithms in a relatively short time, providing real-time human-machine interaction such as face detection for video surveillance, advanced driver-assistance systems, and image recognition for early cancer detection. In all of these applications, high detection accuracy requires complicated ML computation, which comes at the cost of high computational complexity and places heavy demands on the hardware platform. Currently, most applications are implemented on general-purpose compute engines, especially graphics processing units (GPUs). However, recent work from both industry and academia shows a trend toward designing application-specific integrated circuits (ASICs) for ML, especially for deep neural networks (DNNs). This course gives an overview of hardware accelerator design, including the basics of deep learning, deep learning frameworks, hardware accelerators, co-optimization of algorithms and hardware, training and inference, and support for state-of-the-art deep learning networks. This is a seminar-style course, so students are expected to present, discuss, and engage with research papers. At the end of the semester, students will present their work based on a class research project.

CSCE 430/830: Computer Architecture (Spring 2021-2022)


The architecture of single-processor (von Neumann or SISD) computer systems. Evolution, design, implementation, and evaluation of state-of-the-art systems. Memory systems, including interleaving, hierarchies, virtual memory, and cache implementations; communications and I/O, including bus architectures, arbitration, I/O processors, and DMA channels; and central processor architectures, including RISC and stack machines, high-speed arithmetic, fetch/execute overlap, and parallelism in a single-processor system.
