IIC Journal of Innovation 17th Edition Applying Solutions at the Digital Edge | Page 54

Heterogeneous Computing in the Edge
computing tasks they are designed for. This is accomplished through the use of massively parallel arrays of relatively simple processors (for example, the NVIDIA "Ampere" microarchitecture used in the A40 product has 10,752 processor cores [17]). GPUs also have advanced memory subsystems capable of moving data in and out of their associated memories many times faster than CISC/RISC CPUs can.
NVIDIA and AMD are the leading suppliers of GPU processors. They have been in a competitive race for over a decade, and this has fueled rapid innovation. Intel and ARM also provide GPUs, but theirs are mostly integrated with conventional CPUs for applications such as laptop computers, and this model may not be optimal for modular edge nodes. Recently, both NVIDIA [18] and AMD [19] have introduced GPU products optimized for edge/IoT applications.
The subset of edge computing applications that can be optimized through the use of GPUs tends to follow certain design patterns, for example workloads that are "embarrassingly parallel" [20]. These sorts of problems are common in graphics, image analysis and visual computing, where a scene can be broken down into pixels or regions, each of which can be processed independently of all the others with minimal inter-thread communication. Other examples of problems amenable to parallel execution include database searches, matrix math, and some problems in artificial intelligence (such as neural network execution). All of these continue to scale nearly linearly as thousands of processors act on the same problem in parallel. These workloads are common in edge computing, and the addition of GPUs to edge nodes can greatly improve system efficiency.
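The image-processing case above can be made concrete with a minimal sketch. Each pixel is transformed using only its own value, so rows of the image can be distributed across workers with no inter-task communication; the same independence is what lets a GPU assign one thread per pixel. The image size, threshold value and the use of Python's `multiprocessing.Pool` here are illustrative choices, not taken from the article:

```python
# Sketch of an "embarrassingly parallel" workload: thresholding a synthetic
# grayscale image. Every pixel depends only on its own value, so rows can be
# farmed out to worker processes (or, on a GPU, one thread per pixel) with no
# shared state or inter-thread communication.
from multiprocessing import Pool

WIDTH, HEIGHT, THRESHOLD = 8, 4, 128  # illustrative parameters

def threshold_row(row):
    # Pure per-pixel operation: no dependence on neighboring pixels or rows.
    return [255 if px >= THRESHOLD else 0 for px in row]

if __name__ == "__main__":
    # Synthetic image: each row is a simple horizontal gradient 0..255.
    image = [[(x * 255) // (WIDTH - 1) for x in range(WIDTH)]
             for _ in range(HEIGHT)]
    with Pool() as pool:
        # Rows are processed in parallel; results come back in order.
        result = pool.map(threshold_row, image)
    print(result[0])
```

Because no task ever waits on another, throughput grows almost linearly with the number of workers, which is exactly the property that lets such workloads fill thousands of GPU cores.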
The programming environment for GPUs is somewhat different from the traditional software development models used on CISC/RISC CPUs. To partition algorithms effectively across the large number of GPU processors, special software constructs are needed so the programmer can help direct the parallelism, and special compilers are needed to help optimize the execution. NVIDIA CUDA [21] (Compute Unified Device Architecture) is one of the most capable toolkits for parallel algorithm development on GPUs, and it has a large ecosystem of developers and a thriving application marketplace. The OpenCL (Open Compute
[17] NVIDIA A40 datasheet
[18] Edge Computing Solutions For Leading Technologies | NVIDIA
[19] AMD Ryzen™ Embedded Family | AMD
[20] Embarrassingly Parallel Design Pattern (ufl.edu)
[21] CUDA Zone | NVIDIA Developer
June 2021