IIC Journal of Innovation 17th Edition Applying Solutions at the Digital Edge | Page 54

Heterogeneous Computing in the Edge
computing tasks they are designed for. This is accomplished through the use of massively parallel arrays of relatively simple processors (for example, the NVIDIA "Ampere" microarchitecture used in the A40 product has 10,752 processor cores [17]). GPUs also have advanced memory subsystems capable of moving data in and out of their associated memories many times faster than CISC/RISC CPUs can.
NVIDIA and AMD are the leading suppliers of GPU processors. They have been in a competitive race for over a decade, and this has fueled rapid innovation. Intel and ARM also provide GPUs, but theirs are mostly integrated with conventional CPUs for applications such as laptop computers, and this model may not be optimal for modular edge nodes. Recently, both NVIDIA [18] and AMD [19] have introduced GPU products optimized for edge/IoT applications.
The subset of edge computing applications that can be optimized through the use of GPUs tends to follow certain design patterns, for example workloads that are "embarrassingly parallel" [20]. These sorts of problems are common in graphics, image analysis and visual computing, where a scene can be broken down into pixels or regions, each of which can be processed independently of all the others with minimal inter-thread communication. Other examples of problems amenable to parallel execution include database searches, matrix math, and some problems in artificial intelligence (such as neural network execution). All of these continue to scale nearly linearly as thousands of processors act on the same problem in parallel. These workloads are common in edge computing, and the addition of GPUs to edge nodes can greatly improve system efficiency.
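The image-processing case above can be made concrete with a minimal sketch. Each pixel is transformed using only its own value, so rows of the image can be distributed across workers with no inter-task communication; the same independence is what lets a GPU assign one thread per pixel. The image size, threshold value and the use of Python's `multiprocessing.Pool` here are illustrative choices, not taken from the article:

```python
# Sketch of an "embarrassingly parallel" workload: thresholding a synthetic
# grayscale image. Every pixel depends only on its own value, so rows can be
# farmed out to worker processes (or, on a GPU, one thread per pixel) with no
# shared state or inter-thread communication.
from multiprocessing import Pool

WIDTH, HEIGHT, THRESHOLD = 8, 4, 128  # illustrative parameters

def threshold_row(row):
    # Pure per-pixel operation: no dependence on neighboring pixels or rows.
    return [255 if px >= THRESHOLD else 0 for px in row]

if __name__ == "__main__":
    # Synthetic image: each row is a simple horizontal gradient 0..255.
    image = [[(x * 255) // (WIDTH - 1) for x in range(WIDTH)]
             for _ in range(HEIGHT)]
    with Pool() as pool:
        # Rows are processed in parallel; results come back in order.
        result = pool.map(threshold_row, image)
    print(result[0])
```

Because no task ever waits on another, throughput grows almost linearly with the number of workers, which is exactly the property that lets such workloads fill thousands of GPU cores.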
The programming environment for GPUs is somewhat different from the traditional software development models used on CISC/RISC CPUs. To partition algorithms effectively across the large number of GPU processors, special software constructs are needed so the programmer can help direct the parallelism, and special compilers are needed to help optimize the execution. NVIDIA CUDA [21] (Compute Unified Device Architecture) is one of the most capable toolkits for parallel algorithm development on GPUs, and it has a large ecosystem of developers and a thriving application marketplace. The OpenCL (Open Compute
[17] NVIDIA A40 datasheet
[18] Edge Computing Solutions For Leading Technologies | NVIDIA
[19] AMD Ryzen™ Embedded Family | AMD
[20] Embarrassingly Parallel Design Pattern (ufl.edu)
[21] CUDA Zone | NVIDIA Developer
June 2021