IIC Journal of Innovation 17th Edition Applying Solutions at the Digital Edge | Page 55

Heterogeneous Computing in the Edge
Language) framework is a more standardized alternative to CUDA, and has some advantages for edge computing.
There are several commercially available GPU hardware modules optimized for edge computing applications. One example is the NVIDIA "Jetson" TX2 module, capable of 1.3 TFLOPS of computational throughput, with 256 CUDA cores, 8 GB of DDR4 memory, power dissipation under 15 Watts, and a physical size of 87 mm x 50 mm.22 Its cost is less than $500 in small quantities.
TPUs
Tensor Processing Units (TPUs) are specialized computing architectures whose data paths are optimized for Artificial Intelligence / Machine Learning / Deep Learning (AI/ML/DL) operations. They can accelerate both the training phase (where training data is digested into an AI model) and the inference phase (where AI models are executed on live data to produce insights). They offer extremely large performance/cost improvements compared to CISC/RISC CPUs, and significant improvements over GPUs for many workloads. They consist of massively parallel arrays of relatively simple processors tightly coupled to high-performance memory subsystems. At their core is a highly parallel matrix multiplier, capable of performing tens of trillions of multiplies per second,23 since matrix multiplication is an essential operation in machine learning.
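To see why matrix multiplication dominates ML workloads, note that a fully connected neural-network layer is simply a matrix-vector product plus a bias. A minimal pure-Python sketch (the layer sizes, weights, and function name here are illustrative, not from a TPU toolchain):

```python
# A dense (fully connected) layer computed as a matrix-vector product:
# y = W x + b.  A TPU's matrix unit performs exactly this kind of
# operation, on far larger matrices and in massively parallel hardware.

def dense_layer(W, x, b):
    """Multiply weight matrix W (m rows, n columns) by input vector x
    (length n), then add bias vector b (length m)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

# Tiny example: a layer mapping 3 inputs to 2 outputs.
W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
x = [1.0, 0.0, -1.0]
b = [0.5, -0.5]

y = dense_layer(W, x, b)
print(y)  # [-1.5, -2.5]
```

A deep network stacks many such layers, so inference and training reduce almost entirely to repeated matrix multiplies — the operation the TPU's systolic array accelerates.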
TPUs were pioneered by Google, and have been offered as a service on its cloud network since 2016.24 They are integrated into its cloud computing network in large arrays called "pods", and are offered as an alternative to CISC/RISC servers to accelerate AI/ML/DL workloads. NVIDIA announced its Deep Learning Accelerator (NVDLA) as an open architecture.25 Intel announced its Neural Compute Stick, which includes some TPU functions.26
As with GPUs, TPUs require specialized programming environments. Google's TPUs are programmed through its open-source TensorFlow framework, NVIDIA's hardware is targeted through deep learning frameworks such as PyTorch running on its CUDA software stack, and Intel offers an AI Analytics Toolkit. These provide the software development environments for these specialized processor architectures.
TPUs at the edge present some unique challenges. Training data and AI models can be large, which can tax the network bandwidth and storage available at edge nodes. Edge nodes can be power-, size-, and weight-constrained, making it difficult to install cloud-optimized TPU solutions.
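One common mitigation for the model-size problem is quantization: storing weights as 8-bit integers instead of 32-bit floats cuts model storage and download bandwidth by roughly 4x, at the cost of a small rounding error. A minimal sketch of symmetric linear quantization in pure Python (the scheme shown and the sample weights are illustrative, not from any particular edge TPU toolchain):

```python
# Symmetric linear quantization of 32-bit float weights to 8-bit integers.
# Each float w is stored as round(w / scale), an integer in [-127, 127];
# dequantization multiplies back by scale.  Storage drops from 4 bytes to
# 1 byte per weight, so models shipped to edge nodes are ~4x smaller.

def quantize(weights):
    """Return (int8-range values, scale) for a list of float weights.
    Assumes at least one weight is nonzero."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate float weights from quantized values."""
    return [qi * scale for qi in q]

weights = [0.82, -1.27, 0.05, 0.41]
q, scale = quantize(weights)
restored = dequantize(q, scale)

print(q)         # integers in [-127, 127]
print(restored)  # close to the original weights, within one scale step
```

The largest-magnitude weight maps to ±127, and every reconstructed weight is within one quantization step (the scale) of its original value — usually an acceptable trade for a 4x reduction in the data that must traverse a constrained edge link.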
22 Jetson Modules | NVIDIA Developer
23 First In-Depth Look at Google's TPU Architecture (nextplatform.com)
24 Cloud TPU | Google Cloud
25 NVIDIA Deep Learning Accelerator (nvdla.org)
26 Intel® Neural Compute Stick 2