AI in EE

AI IN DIVISIONS

AI in Computer Division

Yujeong Choi, Jiin Kim, and Minsoo Rhu, “ElasticRec: A Microservice-based Model Serving Architecture Enabling Elastic Resource Scaling for Recommendation Models,” The 51st IEEE/ACM International Symposium on Computer Architecture (ISCA-51) (Professor Minsoo Rhu's lab)

Abstract: With the increasing popularity of recommendation systems (RecSys), the demand for compute resources in data centers has surged. However, the model-wise resource allocation employed in current RecSys model serving architectures fails to utilize resources effectively, leading to a sub-optimal total cost of ownership. We propose ElasticRec, a model serving architecture for RecSys that provides resource elasticity and high memory efficiency. ElasticRec is based on a microservice-based software architecture for fine-grained resource allocation, tailored to the heterogeneous resource demands of RecSys. Additionally, ElasticRec achieves high memory efficiency via our utility-based resource allocation. Overall, ElasticRec achieves an average 3.3× reduction in memory allocation size and an 8.1× increase in memory utility, resulting in an average 1.6× reduction in deployment cost compared to a state-of-the-art RecSys inference serving system.
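The abstract does not spell out the allocation algorithm, so the following is only a hypothetical Python sketch, not ElasticRec's implementation, of why scaling individual model pieces can allocate less memory than scaling the whole model. The three-way split into a dense MLP, hot embeddings, and cold embeddings, all sizes and throughputs, and the names `Shard`, `replicas_needed`, `model_wise_memory_gb`, and `microservice_memory_gb` are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Shard:
    """One independently deployable piece of a RecSys model (hypothetical split)."""
    name: str
    size_gb: float       # memory footprint of one replica of this shard
    load_qps: float      # queries per second that actually touch this shard
    capacity_qps: float  # queries per second one replica can sustain


def replicas_needed(load_qps: float, capacity_qps: float) -> int:
    # At least one replica must exist, even for rarely accessed shards.
    return max(1, math.ceil(load_qps / capacity_qps))


def model_wise_memory_gb(shards: list[Shard], load_qps: float,
                         model_capacity_qps: float) -> float:
    # Model-wise allocation: the whole model is the unit of scaling,
    # so every replica carries every shard, hot or cold.
    n = replicas_needed(load_qps, model_capacity_qps)
    return n * sum(s.size_gb for s in shards)


def microservice_memory_gb(shards: list[Shard]) -> float:
    # Microservice-style allocation: each shard scales on its own demand.
    return sum(replicas_needed(s.load_qps, s.capacity_qps) * s.size_gb
               for s in shards)


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    shards = [
        Shard("dense_mlp", size_gb=1.0, load_qps=1000, capacity_qps=250),
        Shard("hot_embeddings", size_gb=2.0, load_qps=1000, capacity_qps=500),
        Shard("cold_embeddings", size_gb=10.0, load_qps=100, capacity_qps=500),
    ]
    print("model-wise:", model_wise_memory_gb(shards, 1000, 250), "GB")    # 52.0 GB
    print("microservice:", microservice_memory_gb(shards), "GB")          # 18.0 GB
```

With these made-up numbers, the large but rarely accessed cold-embedding shard is held once instead of being copied into all four whole-model replicas. The partitioning and scaling policy of the actual system may differ.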

Main figure:
