Machine learning (ML) and artificial intelligence (AI) technologies have revolutionized the ways in which we interact with large-scale, imperfect, real-world data. While traditional workloads, including transactional and database processing, continue to grow modestly, the computational footprint of ML and AI applications that aim to extract deep insight from vast quantities of structured and unstructured data is exploding. Traditional computing implies an exactness that is not needed when processing most of these data. Approximate computing relaxes these exactness constraints with the goal of obtaining significant gains in computational throughput and energy savings, while still maintaining an acceptable quality of results.
In this talk, we demonstrate that multiple approximation techniques can be applied to applications in these domains and can be combined to compound their benefits. In assessing the potential of approximation in these applications, we take the liberty of changing multiple layers of the system stack: architecture, programming model, and algorithms. Across a set of AI and ML applications spanning the domains of DSP, robotics, and deep learning, we show that hot loops in these applications can be perforated by an average of 50%, with a proportional reduction in execution time, while still producing results of acceptable quality. In addition, the width of the data used in the computation can be reduced from the currently common 32/64 bits to 10-16 bits, with the potential for significant performance and energy benefits. For parallel applications, we reduce execution time by 50% using relaxed synchronization mechanisms. Finally, our results demonstrate that the benefits compound when these techniques are applied concurrently.
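To make two of these techniques concrete, the sketch below shows a minimal, hypothetical illustration (not the authors' actual implementation) of loop perforation combined with reduced-precision arithmetic: a dot product that skips a fraction of its loop iterations and quantizes its operands to a 16-bit fixed-point representation, then extrapolates the partial result. The function names, the `perforation` parameter, and the fixed-point `scale` are all assumptions made for illustration.

```python
def quantize(x, bits=16, scale=1 << 8):
    # Reduce precision: round x to a fixed-point value with `bits` total
    # bits and 8 fractional bits, saturating on overflow.
    q = round(x * scale)
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, q)) / scale

def perforated_dot(a, b, perforation=0.5, bits=16):
    # Approximate dot product: perforate the hot loop by skipping a
    # fraction of iterations, and operate on quantized values.
    step = max(1, round(1 / (1 - perforation)))  # 0.5 -> every 2nd element
    acc = 0.0
    n = 0
    for i in range(0, len(a), step):             # perforated loop
        acc += quantize(a[i], bits) * quantize(b[i], bits)
        n += 1
    return acc * (len(a) / n)                    # extrapolate skipped work
```

With `perforation=0.5` the loop body executes half as often, so for uniform data the extrapolated result stays close to the exact answer while the work is halved; the quality loss depends on how much adjacent elements vary.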