In this paper, we observe that the main performance bottleneck of emerging graph neural networks (GNNs) is not the inference algorithms themselves, but their graph data preprocessing. To take such preprocessing off the critical path in GNNs, we propose PreGNN, a novel hardware automation architecture that accelerates all the tasks of GNN preprocessing from the beginning to the end. Specifically, PreGNN accelerates graph generation in parallel, samples neighbor nodes of a given graph, and prepares graph datasets through all hardware. To reduce the long latency of GNN preprocessing over hardware, we also propose simple, efficient combinational logic that can perform radix sort and arrange the data in a self-governing manner. We implement PreGNN in a customized coprocessor prototype that contains a 16nm FPGA with 64GB DRAM. The results show that PreGNN can shorten the end-to-end latency of GNN inferences by 10.7 x while consuming less energy by 3.3 x, compared to a GPU-only system.
https://ieeexplore.ieee.org/document/9837798