Transistors have been continuously scaled to be smaller, faster, and more energy-efficient. Yet the wires connecting transistors, blocks, and chips have not scaled nearly as well. As a result, the performance and energy consumption of chips and systems have come to be dominated by data movement over those wires. To tackle this challenge, many researchers have proposed architectures and circuits that compute near data. These proposals, nonetheless, often fall short of considering the practical constraints imposed by the complex ecosystems and practices of the computing industry.
Aiming to impact the industry with innovative yet practical proposals, I have worked closely with industry partners to account for these practical constraints and have developed hardware and (system) software co-design proposals for computing near data. For this talk, I chose two proposals that two major hyperscalers have just begun evaluating for deployment in production systems. First, I present a computing-near-memory architecture with a software driver that allows any existing application to run on a server without any change to the server hardware or the application software. Second, I present a computing-in-network architecture with an enhanced distributed computing framework for accelerating distributed deep neural network (DNN) training, where a large fraction of training time is spent communicating weights and gradients over the network. To demonstrate the feasibility and effectiveness of both proposals on commercial servers, I implemented the proposed architectures on FPGA cards, one of which is a custom design that connects to a main memory channel, along with the necessary supporting software.
I am a tenured full professor at the University of Illinois, Urbana-Champaign and a fellow of both ACM and IEEE. From 2018 to 2020, I took a leave of absence and, as a Sr. Vice President at a major memory manufacturing company, led the development of next-generation DRAM products that will play a significant role in shaping the future computing landscape. Prior to joining the University of Illinois in the fall of 2015, I was an associate professor at the University of Wisconsin, Madison, where I received early tenure in 2013. My interdisciplinary research incorporates device, circuit, architecture, and software for power-efficient computing. Prior to joining the University of Wisconsin, Madison, I was a senior research scientist at Intel from 2004 to 2008, where I conducted research in power-efficient digital circuits and processor architecture. I have published more than 200 refereed articles in highly selective conferences and journals in the fields of digital circuits, processor architecture, and computer-aided design. My three most frequently cited papers have more than 4000 citations, and the total number of citations of all my papers exceeds 12000. I received the IEEE International Symposium on Microarchitecture (MICRO) Best Paper Award in 2003 and the ACM/IEEE Most Influential International Symposium on Computer Architecture (ISCA) Paper Award in 2017, along with many best paper nominations at top conferences. I am a hall of fame member of all three major computer architecture conferences: the IEEE International Symposium on High-Performance Computer Architecture (HPCA), MICRO, and ISCA.