Embedding Operations in Deep Learning Recommendation Models
At Facebook, AI is essential for providing personalized feeds and for understanding visual and linguistic content, and it must scale to billions of users. I will focus specifically on deep learning recommendation models (DLRM), which account for the bulk of our AI usage and have substantially different characteristics from computer vision and language models, yet remain relatively understudied in academia. I will dive deep into the embedding operations of deep learning recommendation models, which pose interesting challenges for memory/storage and network systems. The embedding operations require hundreds of GBs of memory while demanding HBM-level memory bandwidth. They also result in all-to-all collective communication, which is hard to scale over hierarchical network topologies, unlike the all-reduce communication commonly found in computer vision and language models. I will go over the current state of the art and invite researchers at KAIST to discuss future directions.
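To make the workload concrete, here is a minimal sketch of the pooled embedding lookup at the heart of DLRM (the pattern behind operators like SparseLengthsSum / nn.EmbeddingBag). All sizes below are illustrative assumptions; production tables are vastly larger, which is why these gathers are memory-capacity and memory-bandwidth bound rather than compute bound.

```python
import numpy as np

# Hypothetical table sizes for illustration only; real DLRM tables
# can hold billions of rows totaling hundreds of GBs.
num_rows, dim = 1000, 64
rng = np.random.default_rng(0)
table = rng.random((num_rows, dim), dtype=np.float32)

def pooled_lookup(table, indices):
    # Gather a few sparse rows and sum-pool them into one dense vector.
    # Each lookup touches scattered rows, so performance is dominated by
    # memory bandwidth, not arithmetic.
    return table[indices].sum(axis=0)

ids = np.array([3, 17, 256])  # sparse feature ids for one sample
vec = pooled_lookup(table, ids)
assert vec.shape == (dim,)
```

In distributed training, tables are sharded across devices, so each device must exchange the pooled vectors it owns with every other device, which is the source of the all-to-all communication mentioned above.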
Jongsoo Park is a Technical Lead at Facebook AI Systems Co-design. His work includes characterizing deep learning inference to guide industry and academia, and FBGEMM, a linear algebra library for low-precision machine learning. Prior to Facebook, he was at Intel Parallel Computing Labs, contributing to sparse convolutional neural networks and highly parallel sparse linear solvers. He received a Best Paper Award at the Supercomputing conference for his work on low-communication FFT.
Meeting ID: 876 4277 7975