Keywords: runtime prediction, power modeling, energy efficiency, analytical framework, HPC-AI, DL training, GPU modeling
Abstract:
"The rapid and ongoing expansion of Artificial Intelligence, particularly large-scale Deep Learning models, has positioned computational power as a key driver of modern innovation.
However, this progress is shadowed by a serious consequence: an unsustainable trajectory of energy consumption. The electrical power required to train and operate these complex models now represents a first-order economic and environmental constraint, posing a critical challenge to the long-term viability and societal acceptance of AI. To date, efforts to improve the energy efficiency of AI systems have often been fragmented, addressing specific components or algorithmic techniques in an ad hoc manner. This approach lacks a unified and systematic engineering process integrating the full lifecycle, from initial system diagnosis to the deployment of verifiable optimizations.
This paper addresses this methodological gap. We propose and detail a comprehensive four-pillar framework, "Measure, Understand, Model, and Optimize," that structures the pursuit of energy efficiency as a virtuous iterative cycle. By synthesizing a broad review of the state of the art in energy metrology, system characterization, predictive modeling, and resource management, our framework provides a coherent and actionable workflow.
It transforms energy optimization from a specialized art into a rigorous engineering discipline. This work provides a practical roadmap for researchers, developers, and infrastructure managers to systematically analyze, predict, and improve the energy footprint of their systems. In doing so, it aims to foster the necessary engineering principles for a truly Sustainable AI, ensuring that its development and deployment are not only powerful but also responsible."
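The four-pillar cycle described above can be sketched in miniature. The following is a hypothetical illustration, not the paper's implementation: all function names, the stubbed telemetry, and the 10% power reduction per iteration are assumptions introduced purely to show how "Measure, Understand, Model, and Optimize" compose into an iterative loop.

```python
# Hypothetical sketch of the "Measure, Understand, Model, Optimize" cycle.
# All names and numbers below are illustrative assumptions, not the paper's code.

def measure(system):
    """Measure: collect energy/runtime telemetry (stubbed readings here)."""
    return {"energy_j": system["energy_j"], "runtime_s": system["runtime_s"]}

def understand(metrics):
    """Understand: characterize the workload, e.g. average power draw (W)."""
    return metrics["energy_j"] / metrics["runtime_s"]

def model(power_w, runtime_s):
    """Model: predict energy for a configuration (simple linear model)."""
    return power_w * runtime_s

def optimize(system, power_w):
    """Optimize: apply a change (here, an assumed 10% power reduction)."""
    tuned = dict(system)
    tuned["energy_j"] = 0.9 * power_w * tuned["runtime_s"]
    return tuned

def efficiency_cycle(system, iterations=3):
    """Run the four pillars iteratively; return predicted energy per pass."""
    trajectory = []
    for _ in range(iterations):
        metrics = measure(system)          # Pillar 1
        power_w = understand(metrics)      # Pillar 2
        trajectory.append(model(power_w, metrics["runtime_s"]))  # Pillar 3
        system = optimize(system, power_w)  # Pillar 4, feeds next iteration
    return trajectory

# Example: a job that initially uses 3600 J over 60 s (i.e. 60 W average).
traj = efficiency_cycle({"energy_j": 3600.0, "runtime_s": 60.0})
```

The point of the sketch is the feedback structure: each optimization changes the system that the next measurement observes, which is what makes the cycle iterative rather than a one-shot analysis.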
