Accurate individual load forecasting is essential for smart grid services. When training individual forecasting models for multiple customers, differences in data distribution among customers must be taken into account. There are two simple ways to build such models: training an independent model for each customer, or training a single model that covers all customers. The independent approach achieves higher accuracy than the single-model approach, but it requires deploying many models, which makes resource use and management inefficient; the single-model approach has the opposite trade-off. Clustering-based forecasting offers a compromise between the two. However, previous studies are of limited use for individual forecasting because they focus on aggregated load and do not account for concept drift, which degrades accuracy over time. We therefore propose a distribution-aware temporal pooling framework that enhances clustering-based forecasting. For clustering, we propose Variational Recurrent Deep Embedding (VaRDE), which operates in a distribution-aware manner and is therefore suited to individual load data. VaRDE reassigns customers to clusters at each time step, so each customer's cluster membership changes dynamically as its load distribution shifts. Experiments on real-world data show that the framework outperforms previous studies while using only a few models, even on unseen data, which indicates high scalability.
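The dynamic reassignment idea can be illustrated with a minimal sketch. It is not VaRDE: as a loudly stated assumption, the learned variational recurrent embedding is replaced by a hypothetical two-number summary of each load window (mean and standard deviation), and clusters are fixed centroids in that feature space. All names (`window_features`, `nearest_cluster`, `assign_per_window`) are hypothetical. The point is only that a customer's cluster label is recomputed for every window, so a customer whose load distribution drifts moves to a different cluster (and therefore to a different shared forecasting model).

```python
import math
from typing import Dict, List, Sequence, Tuple

Feature = Tuple[float, float]


def window_features(window: Sequence[float]) -> Feature:
    """Hypothetical stand-in for a learned embedding: (mean, std) of one window."""
    n = len(window)
    mean = sum(window) / n
    var = sum((x - mean) ** 2 for x in window) / n  # population variance
    return mean, math.sqrt(var)


def nearest_cluster(feat: Feature, centroids: Sequence[Feature]) -> int:
    """Index of the centroid closest to `feat` in Euclidean distance."""
    return min(range(len(centroids)), key=lambda k: math.dist(feat, centroids[k]))


def assign_per_window(
    loads: Dict[str, List[float]],
    centroids: Sequence[Feature],
    window: int,
) -> Dict[str, List[int]]:
    """Re-assign every customer to a cluster in each non-overlapping window.

    Because the label is recomputed per window rather than fixed once,
    cluster membership follows changes in each customer's load distribution.
    """
    assignments: Dict[str, List[int]] = {}
    for cid, series in loads.items():
        labels = []
        for start in range(0, len(series) - window + 1, window):
            feat = window_features(series[start:start + window])
            labels.append(nearest_cluster(feat, centroids))
        assignments[cid] = labels
    return assignments
```

For example, with centroids `[(1.0, 0.1), (5.0, 1.0)]` (a low, flat profile and a high, variable one) and a window of 4 readings, a customer whose load jumps from `[1, 1, 1, 1]` to `[5, 6, 4, 5]` is assigned to cluster 0 and then cluster 1, while a customer with a steady high load stays in cluster 1. In a full pipeline, each cluster would own one forecasting model, so the number of deployed models equals the number of clusters rather than the number of customers.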