AceleMax DGS-1216AS
Dual AMD EPYC Processors, 16x NVIDIA A100 GPU Server
- 16x NVIDIA A100 GPUs with 1,280 GB total GPU memory
- 6x NVIDIA NVSwitches
- 9.6 TB/s total aggregate bandwidth
- 2nd Generation NVIDIA NVSwitch
Purpose-Built for the Convergence of Simulation, Data Analytics, and AI
Massive datasets, exploding model sizes, and complex simulations require multiple GPUs with extremely fast interconnections. The NVIDIA HGX™ platform brings together the full power of NVIDIA GPUs, NVIDIA® NVLink®, NVIDIA Mellanox® InfiniBand® networking, and a fully optimized NVIDIA AI and HPC software stack from NGC™ to provide the highest application performance. With its end-to-end performance and flexibility, NVIDIA HGX enables researchers and scientists to combine simulation, data analytics, and AI to advance scientific progress. With a new generation of A100 80GB GPUs, a single HGX A100 now has up to 1.3 terabytes (TB) of GPU memory and, in a world first, 2 terabytes per second (TB/s) of memory bandwidth per GPU, delivering unprecedented acceleration for emerging workloads fueled by exploding model sizes and massive datasets.
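The 1.3 TB figure follows from simple multiplication. The Python sketch below assumes all 16 GPUs are the 80 GB A100 variant and uses decimal units, as spec sheets typically do; the constant and function names are illustrative only.

```python
# Aggregate GPU memory of a 16-GPU HGX A100 system.
# Assumption: every GPU is the 80 GB A100 variant described above.
GPUS = 16
MEMORY_PER_GPU_GB = 80

def total_gpu_memory_gb(gpus: int, per_gpu_gb: int) -> int:
    """Return total GPU memory in GB (decimal units, as on the spec sheet)."""
    return gpus * per_gpu_gb

total_gb = total_gpu_memory_gb(GPUS, MEMORY_PER_GPU_GB)
total_tb = total_gb / 1000  # decimal terabytes

print(f"{total_gb:,} GB = {total_tb:.2f} TB")  # 1,280 GB = 1.28 TB
```

The exact total is 1,280 GB, which the text rounds to 1.3 TB.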
Third-Generation NVIDIA NVLink Creates a Single Super GPU
Scaling applications across multiple GPUs requires extremely fast movement of data. The third generation of NVIDIA NVLink in the NVIDIA A100 Tensor Core GPU doubles the GPU-to-GPU direct bandwidth to 600 gigabytes per second (GB/s), almost 10X higher than PCIe Gen4. Third-generation NVLink is available in four-GPU and eight-GPU HGX A100 servers from leading computer makers.
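As a rough check of the "almost 10X" claim, the sketch below compares 600 GB/s with the raw bandwidth of a PCIe Gen4 x16 link. It assumes 600 GB/s is the total bidirectional NVLink bandwidth per GPU and that PCIe Gen4 signals at 16 GT/s per lane with 128b/130b encoding; it ignores packet and protocol overhead, so the PCIe figure is an upper bound.

```python
# Compare third-generation NVLink bandwidth with a PCIe Gen4 x16 link.
# Assumptions: 600 GB/s is total bidirectional NVLink bandwidth per A100;
# PCIe Gen4 runs at 16 GT/s per lane with 128b/130b encoding.
NVLINK_GB_S = 600.0

def pcie_gen4_bidirectional_gb_s(lanes: int = 16) -> float:
    """Raw PCIe Gen4 bandwidth in GB/s, both directions combined."""
    gt_per_s = 16.0        # transfers per second per lane (GT/s)
    encoding = 128 / 130   # 128b/130b line-code efficiency
    per_direction = gt_per_s * encoding * lanes / 8  # bits -> bytes
    return 2 * per_direction

pcie = pcie_gen4_bidirectional_gb_s()
ratio = NVLINK_GB_S / pcie
print(f"PCIe Gen4 x16: {pcie:.1f} GB/s, NVLink/PCIe ratio: {ratio:.1f}x")
```

Under these assumptions the PCIe link comes to about 63 GB/s, giving a ratio of roughly 9.5x.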
Multi-Instance GPU (MIG) Delivers Seven Accelerators in a Single GPU
Every AI and HPC application can benefit from acceleration, but not every application needs the performance of a full A100 Tensor Core GPU. With MIG, each A100 can be partitioned into as many as seven GPU instances, fully isolated at the hardware level with their own high-bandwidth memory, cache, and compute cores. This allows HGX A100 systems to offer up to 112 GPU instances, giving developers access to breakthrough speed for every application, big and small, with guaranteed quality of service.
With A100 80GB, seven MIG instances can be configured with 10 GB each (double the size of A100 40GB MIG instances), making it possible to run inference on batch-size-constrained models such as BERT-Large (a natural language processing model with superhuman understanding) at much higher batch sizes, delivering up to a 1.3X increase in throughput.
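The MIG figures above reduce to two multiplications. This sketch assumes at most seven instances per A100 and takes the 10 GB (80GB part) and 5 GB (40GB part) instance sizes implied by the text; it does not model other MIG profiles, and the names are illustrative.

```python
# MIG partitioning arithmetic for the 16-GPU system.
# Assumptions: each A100 exposes at most 7 MIG instances; the smallest
# instance gets 10 GB on the 80 GB part and 5 GB on the 40 GB part.
GPUS = 16
MAX_MIG_PER_GPU = 7
INSTANCE_GB = {"A100 40GB": 5, "A100 80GB": 10}

max_instances = GPUS * MAX_MIG_PER_GPU

for model, gb in INSTANCE_GB.items():
    assigned = MAX_MIG_PER_GPU * gb
    print(f"{model}: {MAX_MIG_PER_GPU} x {gb} GB = {assigned} GB per GPU in MIG instances")

print(f"System-wide MIG instances: {max_instances}")
```

Sixteen GPUs at seven instances each gives the 112 instances quoted for HGX A100, and the 80 GB part's 10 GB instances are exactly twice the 40 GB part's.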
Design Versatility to Suit Any Workload
NVIDIA HGX™ A100 delivers a best-in-class server platform through GPU baseboards and a design guide that provides different configuration options. This allows unmatched versatility, enabling server manufacturers to build a range of CPU and GPU systems or cloud instances ideal for different workloads.
Third-Generation Tensor Cores Redefine the Future of AI and HPC
First introduced in the NVIDIA Volta™ architecture, NVIDIA Tensor Core technology has brought AI training times down from weeks to hours and provided massive acceleration to inference operations. The third generation of Tensor Cores in the NVIDIA Ampere architecture builds upon these innovations by providing up to 20X more floating-point operations per second (FLOPS) for AI applications and up to 2.5X more FLOPS for FP64 HPC applications.
NVIDIA HGX A100 4-GPU delivers nearly 80 teraFLOPS of FP64 performance for the most demanding HPC workloads. NVIDIA HGX A100 8-GPU provides 5 petaFLOPS of FP16 deep learning compute, and the HGX A100 16-GPU configuration achieves a staggering 10 petaFLOPS, creating the world’s most powerful accelerated server platform for AI and HPC.
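These aggregate figures can be approximated by multiplying per-GPU peaks. The per-GPU values here are an assumption drawn from NVIDIA's published A100 specifications (19.5 TFLOPS FP64 Tensor Core; 624 TFLOPS FP16 Tensor Core with structured sparsity), and linear scaling is an idealized upper bound rather than measured throughput.

```python
# Approximate HGX A100 peak-compute figures from per-GPU peaks.
# Assumption: per-GPU peaks from NVIDIA's A100 specifications:
#   FP64 Tensor Core: 19.5 TFLOPS
#   FP16 Tensor Core: 624 TFLOPS with structured sparsity (312 dense)
FP64_TFLOPS = 19.5
FP16_SPARSE_TFLOPS = 624.0

def peak(gpus: int, per_gpu_tflops: float) -> float:
    """Aggregate peak in TFLOPS, assuming perfect linear scaling (an upper bound)."""
    return gpus * per_gpu_tflops

fp64_4gpu = peak(4, FP64_TFLOPS)
fp16_8gpu_pf = peak(8, FP16_SPARSE_TFLOPS) / 1000   # TFLOPS -> PFLOPS
fp16_16gpu_pf = peak(16, FP16_SPARSE_TFLOPS) / 1000

print(f"4-GPU FP64:  {fp64_4gpu:.0f} TFLOPS")
print(f"8-GPU FP16:  {fp16_8gpu_pf:.2f} PFLOPS")
print(f"16-GPU FP16: {fp16_16gpu_pf:.2f} PFLOPS")
```

This yields 78 TFLOPS for four GPUs and about 4.99 and 9.98 PFLOPS for eight and sixteen GPUs, consistent with the rounded figures quoted above.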
Applications:
AI, HPC, VDI, machine intelligence, deep learning, machine learning, neural networks, advanced rendering, and compute.
8U GPU Chassis (JBOG)
Graphics Processing Unit (GPU)
16x NVIDIA A100 GPUs
NVIDIA Baseboard
2x NVIDIA HGX A100 8-GPU Baseboard
Expansion Slots
16x FHHL PCIe Gen4 x16 slots
Management
1x ASPEED AST2520
Storage
8x NVMe U.2 SSDs
Headnode Connection
8x zCD connectors (each zCD connector provides x16 PCIe Gen4 lane connectivity)
Power Supply
4+4 redundant 80 PLUS Titanium-level power supplies (3,000W max. @ 180-264Vac; 1,500W max. @ 100-127Vac)
Chassis Dimension
352(H) x 447(W) x 948(D) mm
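For capacity planning, the JBOG power rating can be read as N+N redundancy. This sketch assumes that "4+4" means four supplies carry the full load while four are spares, so usable capacity is four times the per-unit rating at the given input voltage; it does not account for derating or actual system draw.

```python
# Usable PSU capacity for an N+N redundant configuration.
# Assumption: "4+4" means four active supplies carry the load and four are
# redundant, so usable capacity is 4 x the per-unit rating at a given input.
ACTIVE_PSUS = 4
RATING_W = {"180-264 Vac": 3000, "100-127 Vac": 1500}

def usable_capacity_w(active: int, per_psu_w: int) -> int:
    """Power available while tolerating the loss of all redundant supplies."""
    return active * per_psu_w

for line, watts in RATING_W.items():
    print(f"{line}: {usable_capacity_w(ACTIVE_PSUS, watts):,} W usable")
```

Under this reading, the JBOG can supply up to 12,000 W on high-line input but only 6,000 W on low-line input.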
2U Headnode (Two Systems per 8U JBOG)
Processor
2x AMD EPYC™ 7002 or 7003 series processors, 7nm, Socket SP3, up to 64 cores, 128 threads, and 256MB L3 cache per processor, up to 240W TDP
Memory
32x DIMM slots, DDR4 RDIMM (optional support for 2x 32GB or 4x 16GB NVDIMMs)
Storage
- 18x 2.5” hot-swap NVMe U.2 SSDs
- 2x SATA/NVMe M.2 (2280/22110)
- 2x 2.5” hot-swap SATA/NVMe U.2 SSDs
Management
1x ASPEED AST2500 BMC, IPMI 2.0
TPM
1x TPM 2.0 Module
Rear Panel
- 1x RJ45 dedicated BMC management port
- 1x RJ45 console port
- 1x VGA
- 1x UID LED
- 2x GbE RJ45 ports
- 2x USB 3.0
Front Panel
1x System Healthy LED (OFF/Amber)
JBOG Connection
4x zCD connectors (each zCD connector provides x16 PCIe Gen4 lane connectivity)
Expansion Slot
1x OCP3.0 PCIe Gen4 x16 NIC slot
Power Supply
2x 1,600W redundant power supplies (Platinum-level certified)
System Cooling
6x System Fans (60x56mm)
Chassis Dimension
17.5”(W) x 28.0”(D) x 3.4”(H) (446.6mm x 711.2mm x 87.0mm)
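The zCD connector counts on the JBOG and the head node fit together as a PCIe lane budget. This sketch assumes each 8U JBOG is shared by exactly two 2U head nodes, per the "Two Systems per 8U JBOG" note, and that every zCD link carries x16 PCIe Gen4 lanes; the constant names are illustrative.

```python
# PCIe lane budget between the JBOG and its head nodes.
# Assumption: each 8U JBOG is shared by exactly two 2U head nodes, and
# every zCD connector carries x16 PCIe Gen4 lanes.
LANES_PER_ZCD = 16
JBOG_ZCD = 8
HEADNODE_ZCD = 4
HEADNODES_PER_JBOG = 2

jbog_lanes = JBOG_ZCD * LANES_PER_ZCD
headnode_lanes = HEADNODE_ZCD * LANES_PER_ZCD

print(f"JBOG host-facing lanes: {jbog_lanes}")
print(f"Lanes per head node:    {headnode_lanes}")

# The two head nodes together account for every JBOG zCD connector.
assert HEADNODES_PER_JBOG * headnode_lanes == jbog_lanes
```

The JBOG exposes 128 host-facing lanes and each head node attaches with 64, so two head nodes use all eight connectors.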
Optimized for Turnkey Solutions
Enable powerful design, training, and visualization with built-in software tools including TensorFlow, Caffe, Torch, Theano, BIDMach, cuDNN, NVIDIA CUDA Toolkit, and NVIDIA DIGITS.