Responsible Generative AI
Figure 7-1: GenAI hardware infrastructure example.
It is possible to estimate the large energy investment required for a single training run of a popular AI model; one approximate calculation is given below. Of course, many variables are involved, including performance and efficiency factors that are highly variable and moving targets, so actual systems are likely to differ considerably in their resource requirements.
Of course, practical systems would use many (perhaps hundreds) of these DGX H100 systems in parallel. An example of one such architecture is shown in Figure 7-1. To complete a model training run in a week (which is reasonable for keeping the model current, or in multi-client environments), an infrastructure of 195 DGX systems running in parallel (about 50 full racks) would be required, along with the storage, switching, power, and cooling support equipment, drawing about 2.4 MW of power continuously.
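The arithmetic behind this figure can be sketched as follows. The per-system power draw and the overhead factor for support equipment are illustrative assumptions chosen to be consistent with the ~2.4 MW total stated above, not measured values:

```python
# Back-of-envelope power and energy estimate for the 195-system
# DGX H100 training cluster described in the text.
# Assumptions (not measured values):
#   - ~10.2 kW maximum draw per DGX H100 system
#   - 20% overhead for storage, switching, and cooling

DGX_H100_POWER_KW = 10.2   # assumed per-system draw
NUM_SYSTEMS = 195          # from the text
OVERHEAD_FACTOR = 1.2      # assumed support-equipment overhead

it_power_kw = NUM_SYSTEMS * DGX_H100_POWER_KW
total_power_mw = it_power_kw * OVERHEAD_FACTOR / 1000
energy_mwh = total_power_mw * 24 * 7  # one-week training run

print(f"IT power: {it_power_kw:.0f} kW")              # ~1989 kW
print(f"Continuous draw: {total_power_mw:.1f} MW")    # ~2.4 MW
print(f"One-week run energy: {energy_mwh:.0f} MWh")   # ~401 MWh
```

Under these assumptions, a one-week run consumes roughly 400 MWh, on the order of the annual electricity use of dozens of households.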
Running Large Language Models (LLMs) in inference mode is also power- and memory-intensive. It is reported that the compute resources consumed by the inference phase of large AI models are comparable to, and in some cases exceed, the power requirements of training the model. Imagine the resource impact of multiple instances and use cases running at the same time 31.
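A simple sketch shows why cumulative inference energy can overtake a one-off training cost. All of the figures here (per-query energy, query volume, training energy) are hypothetical assumptions for illustration only:

```python
# Illustrative comparison of cumulative inference energy against a
# one-off training cost. All inputs are hypothetical assumptions.

TRAINING_ENERGY_MWH = 400      # assumed one-off training energy
ENERGY_PER_QUERY_WH = 3.0      # assumed energy per inference query
QUERIES_PER_DAY = 10_000_000   # assumed aggregate daily query volume

daily_inference_mwh = QUERIES_PER_DAY * ENERGY_PER_QUERY_WH / 1e6
days_to_match = TRAINING_ENERGY_MWH / daily_inference_mwh

print(f"Daily inference energy: {daily_inference_mwh:.0f} MWh")   # 30 MWh
print(f"Days until inference exceeds training: {days_to_match:.1f}")  # ~13.3
```

Under these assumed figures, inference energy surpasses the entire training investment in about two weeks of deployment, which is why sustained serving load dominates the lifetime footprint of a heavily used model.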
7.3 HUMAN SKILLS AND TRAINING NEEDED
There is a dearth of experienced AI scientists and engineers working in the area of GenAI, and this could hamper the development of guardrails and of tools for ethical and responsible AI frameworks, as most AI engineers may be lured away by other work that offers immediate commercial
31 https://doi.org/10.1787/7babf571-en

Journal of Innovation 33