IIC Journal of Innovation 17th Edition Applying Solutions at the Digital Edge | Page 55

Heterogeneous Computing in the Edge
Language) framework is a more standardized alternative to CUDA, and has some advantages for edge computing.
There are several commercially available GPU hardware modules optimized for edge computing applications. One example is the NVIDIA "Jetson" TX2 module, capable of 1.3 TFLOPS of computational throughput, with 256 CUDA cores, 8 GB of DDR4 memory, power dissipation under 15 Watts, and a physical size of 87 mm x 50 mm.22 Its cost is less than $500 in small quantities.
TPUs
Tensor Processing Units (TPUs) are specialized computing architectures whose data paths are optimized for Artificial Intelligence / Machine Learning / Deep Learning (AI/ML/DL) operations. They can accelerate both the training phase (where training data is digested into an AI model) and the inference phase (where AI models are executed on live data to produce insights). They offer extremely large performance/cost improvements compared to CISC/RISC CPUs, and significant improvements over GPUs for many workloads. They consist of massively parallel arrays of relatively simple processors tightly coupled to high-performance memory subsystems. At their core is a highly parallel matrix multiplier, capable of performing tens of trillions of multiplies per second,23 since matrix multiplication is an essential operation in machine learning.
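To see why matrix multiplication dominates ML workloads, note that a fully connected neural-network layer is simply a matrix-vector product plus a bias. A minimal pure-Python sketch (the layer sizes, weights, and function name here are illustrative, not from a TPU toolchain):

```python
# A dense (fully connected) layer computed as a matrix-vector product:
# y = W x + b.  A TPU's matrix unit performs exactly this kind of
# operation, on far larger matrices and in massively parallel hardware.

def dense_layer(W, x, b):
    """Multiply weight matrix W (m rows, n columns) by input vector x
    (length n), then add bias vector b (length m)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

# Tiny example: a layer mapping 3 inputs to 2 outputs.
W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
x = [1.0, 0.0, -1.0]
b = [0.5, -0.5]

y = dense_layer(W, x, b)
print(y)  # [-1.5, -2.5]
```

A deep network stacks many such layers, so inference and training reduce almost entirely to repeated matrix multiplies — the operation the TPU's systolic array accelerates.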
TPUs were pioneered by Google, and have been offered as a service on its cloud network since 2016.24 They are integrated into its cloud computing network in large arrays called "pods", and are offered as an alternative to CISC/RISC servers to accelerate AI/ML/DL workloads. NVIDIA announced its Deep Learning Accelerator (NVDLA) as an open architecture.25 Intel announced its Neural Compute Stick, which includes some TPU functions.26
As with GPUs, TPUs require specialized programming environments. Google's TPUs are programmed through its open-source TensorFlow framework, NVIDIA's hardware is targeted through deep learning frameworks such as PyTorch running on its CUDA software stack, and Intel offers an AI Analytics Toolkit. These provide the software development environments for these specialized processor architectures.
TPUs at the edge present some unique challenges. Training data and AI models can be large, which can tax the network bandwidth and storage available at edge nodes. Edge nodes can be power-, size-, and weight-constrained, making it difficult to install cloud-optimized TPU solutions.
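One common mitigation for the model-size problem is quantization: storing weights as 8-bit integers instead of 32-bit floats cuts model storage and download bandwidth by roughly 4x, at the cost of a small rounding error. A minimal sketch of symmetric linear quantization in pure Python (the scheme shown and the sample weights are illustrative, not from any particular edge TPU toolchain):

```python
# Symmetric linear quantization of 32-bit float weights to 8-bit integers.
# Each float w is stored as round(w / scale), an integer in [-127, 127];
# dequantization multiplies back by scale.  Storage drops from 4 bytes to
# 1 byte per weight, so models shipped to edge nodes are ~4x smaller.

def quantize(weights):
    """Return (int8-range values, scale) for a list of float weights.
    Assumes at least one weight is nonzero."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate float weights from quantized values."""
    return [qi * scale for qi in q]

weights = [0.82, -1.27, 0.05, 0.41]
q, scale = quantize(weights)
restored = dequantize(q, scale)

print(q)         # integers in [-127, 127]
print(restored)  # close to the original weights, within one scale step
```

The largest-magnitude weight maps to ±127, and every reconstructed weight is within one quantization step (the scale) of its original value — usually an acceptable trade for a 4x reduction in the data that must traverse a constrained edge link.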
22 Jetson Modules | NVIDIA Developer
23 First In-Depth Look at Google's TPU Architecture (nextplatform.com)
24 Cloud TPU | Google Cloud
25 NVIDIA Deep Learning Accelerator (nvdla.org)
26 Intel® Neural Compute Stick 2