Shinhwan Kang, Kyuhan Lee, and Kijung Shin
ICDE 2022: IEEE International Conference on Data Engineering 2022
Abstract: Are users of an online social network interested equally in all connections in the network? If not, how can we obtain a summary of the network personalized to specific users? Can we use the summary for approximate query answering? As massive graphs (e.g., online social networks, hyperlink networks, and road networks) have become pervasive, graph compression has gained importance for the efficient processing of such graphs with limited resources. Graph summarization is an extensively-studied lossy compression method. It provides a summary graph where nodes with similar connectivity are merged into supernodes, and a variety of graph queries can be answered approximately from the summary graph. In this work, we introduce a new problem, namely personalized graph summarization, where the objective is to obtain a summary graph where more emphasis is put on connections closer to a given set of target nodes. Then, we propose PeGaSus, a linear-time algorithm for the problem. Through experiments on six real-world graphs, we demonstrate that PeGaSus is (a) Effective: node-similarity queries for target nodes can be answered significantly more accurately from personalized summary graphs than from non-personalized ones of similar size, (b) Scalable: it summarizes graphs with up to one billion edges, and (c) Applicable to distributed multi-query answering: it successfully replaces graph partitioning for communication-free multi-query processing.