Advisor: Prof. Sujit Dey
Co-supervisor: Prof. Anand Raghunathan
In this research, we propose a fundamentally different approach to making systems variation tolerant. Instead of making the chip resistant to variation, which incurs costly overheads, or accepting only those chip instances that do not deviate significantly from their expected performance, which incurs significant yield loss, we accept that the underlying hardware may contain components that display variability. We address the adverse effects of variability by designing software that dynamically adapts itself to tolerate the performance degradation caused by process variations in the underlying hardware, while ensuring satisfactory performance of the overall system. Specifically, we propose application-level adaptation techniques that can be applied to a broad class of pervasive applications, such as multimedia (audio, video, image) processing, graphics, and communications, as well as emerging application domains such as recognition and mining.
Increased transistor scaling, along with the rise in complexity and computational capability of today's semiconductors, has resulted in higher power density and hence increased chip temperature. High temperature reduces the lifetime of a chip, degrades its functional reliability and performance, and increases its cooling cost. Because conventional cooling is limited by high cost in servers and data centers, and by space and/or power constraints in portable devices such as smartphones and laptops, dynamic thermal management (DTM) has been developed as a supplement, and sometimes an alternative, to conventional cooling for the thermal management of semiconductors. While DTM techniques are effective in managing temperature, they can introduce several types of performance-related overhead, causing dynamic variation in system performance on top of the effects of process variations. This performance impact can increase the run time of tasks and applications and potentially degrade the quality of their results, particularly for real-time applications. In this research, we have developed application adaptation techniques to minimize the negative impact of DTM on the performance and quality of applications. In particular, we selected video encoding, which is one of the most compute-intensive applications and therefore strongly affects the thermal characteristics of a computing system. We study the impact of DTM on video encoding and develop a DTM-aware, dynamically adaptive video encoder that can significantly reduce that impact.
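To make the idea of reactive application adaptation concrete, the following is a minimal sketch, not the encoder developed in this research. It assumes that DTM lowers the clock frequency and that the encoder exposes a few operating points with known average cost per frame. The mode names, cycle counts, and quality scores are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderMode:
    name: str
    cycles_per_frame: float  # hypothetical average cost in CPU cycles
    quality: float           # hypothetical relative quality, higher is better


# Hypothetical operating points, listed from highest to lowest quality.
MODES = [
    EncoderMode("full_search", 60e6, 1.00),
    EncoderMode("reduced_search", 40e6, 0.95),
    EncoderMode("fast_search", 25e6, 0.88),
    EncoderMode("skip_refinement", 15e6, 0.80),
]


def select_mode(freq_hz, frame_deadline_s, modes=MODES):
    """Return the highest-quality mode whose estimated cost fits the frame deadline.

    freq_hz is the clock frequency currently allowed by DTM.
    """
    budget_cycles = freq_hz * frame_deadline_s
    feasible = [m for m in modes if m.cycles_per_frame <= budget_cycles]
    if feasible:
        return max(feasible, key=lambda m: m.quality)
    # No mode fits: fall back to the cheapest one and accept a late frame.
    return min(modes, key=lambda m: m.cycles_per_frame)
```

Under these assumed numbers, a 30 frames-per-second encoder would keep the most expensive mode at 2 GHz and step down to cheaper modes as DTM throttles the clock, trading some quality for meeting the real-time deadline.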
While the above approach to managing thermal variations is reactive, in the sense that application adaptation is applied after DTM to minimize its impact on application quality, we have also started investigating proactive approaches, which may lead to more effective thermal management with less impact on application quality. In particular, we are investigating an adaptation framework that uses workload reduction (application adaptation) and DTM techniques jointly to manage thermal variations with minimum impact on application quality and performance. We would like to develop a prototype of the joint approach and demonstrate its effectiveness using (a) the video encoding application and (b) a graphics rendering application, as these are among the most compute-intensive and most commonly used applications in computing systems.
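One way to picture a joint policy is as a search over pairs of clock frequency and application workload level, subject to a real-time deadline and a power budget that stands in for the thermal limit. The sketch below is an illustration under stated assumptions, not the framework under investigation. The frequency list, the workload levels, the constant `c`, and the cubic power model are all hypothetical.

```python
from itertools import product

# Hypothetical operating points; all numbers are illustrative, not measured.
FREQS_HZ = [1.0e9, 1.5e9, 2.0e9]
# (name, cycles per frame, relative quality)
MODES = [("full", 60e6, 1.00), ("reduced", 40e6, 0.95), ("fast", 25e6, 0.88)]


def avg_power_w(freq_hz, cycles, deadline_s, c=1.0e-27):
    """Toy model: active power c * f^3, scaled by the busy fraction of the frame period."""
    busy_fraction = cycles / (freq_hz * deadline_s)
    return c * freq_hz ** 3 * busy_fraction


def choose_joint(deadline_s, power_budget_w):
    """Pick the (frequency, mode) pair with the best quality that meets both limits.

    Ties in quality are broken in favor of lower average power.
    Returns None if no pair satisfies the deadline and the power budget.
    """
    best = None
    for f, (name, cycles, quality) in product(FREQS_HZ, MODES):
        if cycles > f * deadline_s:
            continue  # would miss the frame deadline at this frequency
        p = avg_power_w(f, cycles, deadline_s)
        if p > power_budget_w:
            continue  # would exceed the power (thermal proxy) budget
        key = (quality, -p)
        if best is None or key > best[0]:
            best = (key, f, name)
    return None if best is None else (best[1], best[2])
```

With these assumed values and a 30 frames-per-second deadline, a generous budget selects the full workload at the highest frequency, while tighter budgets move to a lower frequency paired with a reduced workload, rather than throttling first and adapting afterward.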