Heterogeneous Computing in the Edge
on the backplane. In addition to the primary interconnect, the backplane carries an overlay network for management traffic and the distribution paths for the power subsystem.
At least one CISC/RISC CPU module will be present in most modular edge nodes. These typically include a single-socket x86 or ARM CPU, along with associated memory, storage, and I/O structures. The CISC/RISC CPU could be a server-class chip with power dissipation over 100 Watts, but more likely it will be a laptop-class chip with power ratings in the 40-Watt range. Memory is typically a few DRAM modules, configurable in the 16-256 GB range for edge applications. Storage is typically solid-state drives, with capacities in the 1-20 TB range. The module connects to the backplane over a number of PCIe or Ethernet channels, for a total usable bandwidth of tens of Gb/s.
GPU modules will often have a modest CISC/RISC CPU as a host processor and a fairly powerful GPU chip as the main processor. This chip may have thousands of GPU cores and connect to 4 GB or more of fast memory. It communicates with the backplane over many channels of high-speed interconnect, with a total bandwidth of several tens of Gb/s. As GPU chips are power hungry, this board may also feature advanced thermal management technologies such as vapor chambers, heat pipes, or local booster fans.
FPGA modules may be configured as farms of 4-16 FPGA chips managed by a small host processor. Each FPGA will likely have one or two DIMM sockets for DRAM, though some applications may not require them and leave those sockets empty. Some of the rich I/O structures on the FPGAs will interconnect with their on-board peers over a mesh network, while other I/O channels will go to the backplane connector, and still others leave the modular edge node as external I/O links.
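The on-board mesh described above can be made concrete with a small sketch. Assuming a 2D grid layout for the FPGA farm (an illustrative assumption; the text does not specify the mesh shape), each chip connects only to its north, south, east, and west peers:

```python
# Hypothetical sketch: compute the peer list for each FPGA in a 2D mesh.
# The grid layout and indexing scheme are illustrative assumptions.
def mesh_neighbors(rows: int, cols: int) -> dict[int, list[int]]:
    """Return, for each FPGA index, the indices of its on-board mesh peers."""
    neighbors: dict[int, list[int]] = {}
    for r in range(rows):
        for c in range(cols):
            idx = r * cols + c
            peers = []
            if r > 0:
                peers.append((r - 1) * cols + c)  # north peer
            if r < rows - 1:
                peers.append((r + 1) * cols + c)  # south peer
            if c > 0:
                peers.append(r * cols + (c - 1))  # west peer
            if c < cols - 1:
                peers.append(r * cols + (c + 1))  # east peer
            neighbors[idx] = peers
    return neighbors

# A 16-chip farm as a 4x4 mesh: corner chips have 2 peers, edge chips 3,
# interior chips 4, so no chip needs more than four inter-FPGA links.
farm = mesh_neighbors(4, 4)
```

This illustrates why a mesh suits an FPGA farm: the per-chip link count stays fixed as the farm grows, leaving the remaining I/O channels free for the backplane and external links.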
The architecture of TPU modules will be similar to that of GPU modules. A TPU module will typically include a small host processor and one or more large TPU chips, each with 4-16 GB of fast DRAM. Some TPU modules may have local HDDs or SSDs to store the large datasets associated with AI/ML/DL training and model data. TPU modules communicate with the backplane over tens of Gb/s of channel bandwidth.
DSPs, ASICs, ASSPs, and SoCs all have potential applicability as modules in this edge node architecture. For certain applications, they will be part of highly efficient implementations. However, for many edge applications, especially multi-tenant edge applications where computational workloads from multiple customers run concurrently on the same edge node, some mix of the more versatile and higher-performing CISC/RISC CPUs, GPUs, TPUs, and FPGAs will likely dominate, so those are emphasized in Figure 2.
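The module mix described above can be modeled as simple records attached to a shared backplane, which makes it easy to sanity-check that a configuration does not oversubscribe the backplane's usable bandwidth. All class names and figures below are illustrative assumptions, not a real product specification:

```python
from dataclasses import dataclass, field

# Hypothetical sketch: a modular edge node as a backplane plus a list of
# heterogeneous modules. Bandwidth and power figures are illustrative.
@dataclass
class Module:
    kind: str            # e.g. "cpu", "gpu", "fpga", "tpu"
    backplane_gbps: int  # usable bandwidth to the backplane, in Gb/s
    power_watts: int     # module power budget

@dataclass
class EdgeNode:
    backplane_gbps: int                          # total backplane capacity
    modules: list[Module] = field(default_factory=list)

    def add(self, m: Module) -> None:
        """Attach a module, refusing configs that oversubscribe bandwidth."""
        used = sum(x.backplane_gbps for x in self.modules)
        if used + m.backplane_gbps > self.backplane_gbps:
            raise ValueError("backplane bandwidth oversubscribed")
        self.modules.append(m)

# One plausible mix: a laptop-class CPU module plus GPU and TPU accelerators,
# each taking tens of Gb/s of the shared backplane.
node = EdgeNode(backplane_gbps=200)
node.add(Module("cpu", backplane_gbps=40, power_watts=40))
node.add(Module("gpu", backplane_gbps=60, power_watts=150))
node.add(Module("tpu", backplane_gbps=40, power_watts=75))
```

A configuration tool for such a node would perform checks like this for bandwidth, power, and slot count before a mix of modules is deployed.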
Storage is another important aspect of modular edge node design. As shown in the sidebar, storage will usually be implemented as a hierarchy involving main memory, solid-state drives, rotating drives, and network storage. Different types of modules, for example flash arrays and rotating disk arrays, can be configured on the same backplane as required by the applications.
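The hierarchy just described implies a simple placement policy: put data in the fastest tier that can hold it, falling back down the hierarchy as datasets grow. The sketch below illustrates that policy; the tier names and capacities are illustrative assumptions drawn loosely from the ranges quoted earlier:

```python
# Hypothetical sketch of tier selection in the storage hierarchy:
# fastest tier first, with network storage as the effectively
# unbounded backstop. Capacities are illustrative assumptions.
TIERS = [
    ("dram", 256),         # main memory, in GB
    ("ssd", 20_000),       # solid-state drives
    ("hdd", 100_000),      # rotating drives
    ("network", None),     # remote storage, treated as unbounded
]

def place(dataset_gb: int) -> str:
    """Return the fastest tier whose capacity can hold the dataset."""
    for name, cap_gb in TIERS:
        if cap_gb is None or dataset_gb <= cap_gb:
            return name
    return "network"
```

For example, a 64 GB working set lands in DRAM, a multi-terabyte dataset lands on SSD, and anything beyond the node's local capacity spills to network storage.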
- 56 - June 2021