NVIDIA today announced the launch of new NVIDIA HGX A100 systems with the support of global partners. These systems accelerate AI and HPC by adding NVIDIA elements such as NVIDIA NDR 400G InfiniBand networking, the NVIDIA A100 80GB PCIe GPU, and NVIDIA Magnum IO GPUDirect Storage software. The new HGX systems are being brought to market by Atos, Dell Technologies, Hewlett Packard Enterprise, Lenovo, Microsoft Azure, and NetApp.
NVIDIA has been making high-powered GPUs for years, and as the AI market has grown, its technology has found its way into more and more supercomputers, particularly in HPC. For several years the company has also built HPC workstations and servers of its own under the DGX and HGX lines, bringing its various IP together under one roof for better performance. The new HGX systems are equipped with the latest NVIDIA technology.
NVIDIA A100 80GB PCIe GPU
NVIDIA announced the A100 at GTC last year. The 7nm GPU is built on the company's Ampere architecture and contains 54 billion transistors. NVIDIA quickly upgraded the product with the A100 80GB PCIe GPU, doubling the original's memory. The A100 80GB PCIe GPU is the first component of the new HGX A100 systems. Its large memory capacity and high memory bandwidth allow larger data sets and more of a neural network to be held in GPU memory, which means less internode communication and lower energy consumption. The higher throughput of the larger memory can also deliver results faster.
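As a rough illustration of the memory headroom involved, a short CUDA query (a minimal sketch, not tied to any particular HGX configuration) reports how much device memory is available for keeping data resident on the GPU; on an A100 80GB part the total should come in close to 80GB, minus driver overhead:

```cuda
// Minimal sketch: query free/total memory on the current CUDA device.
// On an A100 80GB GPU, "total" should report roughly 80 GB.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);   // current device's free/total memory
    printf("free:  %.1f GiB\n", freeBytes  / (1024.0 * 1024.0 * 1024.0));
    printf("total: %.1f GiB\n", totalBytes / (1024.0 * 1024.0 * 1024.0));
    return 0;
}
```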
The A100 80GB PCIe GPU is powered by the company's Ampere architecture, which includes a feature called Multi-Instance GPU (MIG). MIG provides acceleration for smaller workloads such as AI inference, allowing users to partition the GPU and scale both memory and compute down to the workload while maintaining quality of service.
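As a sketch of what this looks like from a program's point of view (assuming MIG instances have already been created administratively, for example with nvidia-smi), a CUDA process can enumerate the devices it sees; under MIG, a visible device is a slice of the A100 with proportionally fewer SMs and less memory:

```cuda
// Minimal sketch: list the CUDA devices visible to this process.
// With MIG enabled, a process typically sees its assigned MIG instance as a
// device with a reduced SM count and memory size rather than the full A100.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("device %d: %s, %d SMs, %.1f GiB\n",
               i, prop.name, prop.multiProcessorCount,
               prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}
```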
Server partners for the NVIDIA A100 80GB PCIe GPU include Atos, Cisco, Fujitsu, H3C, HPE, Inspur, Lenovo, Penguin Computing, QCT, and Supermicro. A few cloud services also offer the technology, including AWS, Azure, and Oracle.
NVIDIA NDR 400G InfiniBand networking
The NVIDIA NDR 400G InfiniBand switch system is the second piece of the NVIDIA HGX A100 puzzle. Although it may sound obvious, HPC systems require high data throughput. NVIDIA acquired Mellanox a few years ago for almost $7 billion, and since then it has steadily released new products while slowly retiring the Mellanox brand in favor of NVIDIA. Last year it released NVIDIA NDR 400G InfiniBand, offering 3x the port density and 32x the AI acceleration. This is being integrated into the new HGX systems through the NVIDIA Quantum-2 fixed-configuration switch system, which delivers 64 ports of NDR 400Gb/s InfiniBand or 128 ports of NDR 200Gb/s.
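For reference, the two port configurations describe the same aggregate switch capacity, since each NDR 400Gb/s port can be broken out into two 200Gb/s ports:

$$64 \times 400\ \text{Gb/s} = 128 \times 200\ \text{Gb/s} = 25.6\ \text{Tb/s per direction}$$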
The company claims that the NVIDIA Quantum-2 modular switches offer scalable configurations of up to 2,048 ports of NDR 400Gb/s InfiniBand (or 4,096 ports of NDR 200Gb/s) with a total bi-directional throughput of 1.64 petabits per second. This is a significant improvement over the previous generation, with 6.5x higher scalability, and with a DragonFly+ network topology users can connect more than a million nodes. The company also added its third-generation NVIDIA SHARP In-Network Computing data reduction technology, which it says delivers 32x more AI acceleration than previous generations.
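That throughput figure follows directly from the port count when both directions are counted:

$$2{,}048 \times 400\ \text{Gb/s} \times 2 = 1{,}638{,}400\ \text{Gb/s} \approx 1.64\ \text{Pb/s}$$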
NVIDIA Quantum-2 switches are both forward and backward compatible. Atos, DDN, and Dell Technologies are manufacturing partners.
Magnum IO GPUDirect Storage
Magnum IO GPUDirect Storage is the final piece in the NVIDIA HGX A100 puzzle. It enables direct memory access between storage and GPU memory, which brings several benefits: lower I/O latency, full use of the bandwidth of the network adapters, and less impact on the CPU. Magnum IO GPUDirect Storage is now available from several partners, including DDN, Dell Technologies, Excelero, HPE, IBM Storage, and WekaIO.
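For a sense of how this is used in practice, below is a minimal sketch built on the cuFile API, the GPUDirect Storage interface that ships with CUDA. The file path, transfer size, and abbreviated error handling are placeholders for illustration; the file is assumed to live on a GPUDirect-Storage-capable filesystem and is opened with O_DIRECT.

```cuda
// Minimal GPUDirect Storage sketch using the cuFile API (libcufile).
// Reads data from a file directly into GPU memory, bypassing a CPU bounce buffer.
#include <cufile.h>
#include <cuda_runtime.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main() {
    const size_t size = 1 << 20;                       // 1 MiB read, for illustration
    int fd = open("/mnt/data/sample.bin", O_RDONLY | O_DIRECT);  // placeholder path
    if (fd < 0) { perror("open"); return 1; }

    cuFileDriverOpen();                                // initialize the GDS driver

    CUfileDescr_t descr = {};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t fh;
    cuFileHandleRegister(&fh, &descr);                 // register the file with cuFile

    void *devPtr = nullptr;
    cudaMalloc(&devPtr, size);                         // destination buffer in GPU memory
    cuFileBufRegister(devPtr, size, 0);                // pin the buffer for DMA

    // DMA straight from storage into GPU memory, no CPU staging copy.
    ssize_t n = cuFileRead(fh, devPtr, size, /*file_offset=*/0, /*devPtr_offset=*/0);
    printf("cuFileRead returned %zd bytes\n", n);

    cuFileBufDeregister(devPtr);
    cudaFree(devPtr);
    cuFileHandleDeregister(fh);
    cuFileDriverClose();
    close(fd);
    return 0;
}
```

The point of the sketch is the single cuFileRead call: the transfer lands in GPU memory without a host-side staging buffer, which is where the latency and CPU savings come from. It builds with something like nvcc and the cufile library linked in.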