DGX-2 price: Sixteen Tesla V100s, 30 TB of NVMe, only $400K

NVIDIA DGX A100 supercomputer is half the cost and size with double the performance! How does it do it?

Note: This article was first published on 15 May 2020.

If the new Ampere-based A100 Tensor Core data center GPU is the component responsible for re-architecting the data center, NVIDIA’s new DGX A100 AI supercomputer is the ideal vehicle to revitalize it. With 5 petaflops of AI performance, it packs the power and capabilities of an entire data center into a single machine. And given the advancements of the new Ampere A100 GPU, this is not just a marketing statement.

NVIDIA DGX A100 is the ultimate instrument for advancing AI. NVIDIA DGX is the first AI system built for the end-to-end machine learning workflow — from data analytics to training to inference. And with the giant performance leap of the new DGX, machine learning engineers can stay ahead of the exponentially growing size of AI models and data. — Jensen Huang, founder and CEO of NVIDIA

What’s under the hood of the DGX A100

The DGX A100 is NVIDIA’s third iteration of its supercomputing unit, and it’s packed with eight of NVIDIA’s latest Ampere-based A100 GPUs that communicate via six upgraded NVSwitches (which, mind you, were already impressive in their last iteration) to match the speedier third-gen NVLinks on the new GPUs. The second-generation NVSwitch interconnect fabric now supports an inter-GPU bandwidth of 600GB/s (thanks to those faster NVLinks on the A100), bringing the total inter-GPU communication bandwidth across all the GPUs in the DGX A100 supercomputer to 4.8TB/s.

While that’s the same as the DGX-2’s, the DGX A100 does it with far fewer components. The NVSwitch interconnect fabric does, however, theoretically allow scaling further to support 16 GPUs and 16 NVSwitches, which would bring the total inter-GPU communication bandwidth to 9.6TB/s.
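As a quick sanity check, both aggregate figures follow directly from the 600GB/s of NVLink bandwidth each GPU brings to the fabric. A minimal Python sketch of that arithmetic (the names are ours, purely for illustration):

    # Aggregate inter-GPU bandwidth = per-GPU NVLink bandwidth x GPU count.
    NVLINK_BW_PER_GPU = 600  # GB/s, third-gen NVLink on each A100

    def fabric_bandwidth_tbps(num_gpus: int) -> float:
        """Total inter-GPU communication bandwidth in TB/s."""
        return num_gpus * NVLINK_BW_PER_GPU / 1000

    print(fabric_bandwidth_tbps(8))   # 4.8 TB/s, as configured in the DGX A100
    print(fabric_bandwidth_tbps(16))  # 9.6 TB/s, the theoretical 16-GPU scale-up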

The eight GPUs combined bring 320GB of total GPU memory to the system, using higher-speed HBM2 memory from Samsung. And although total GPU memory is down from the DGX-2’s 512GB, the new DGX A100 uses far faster memory, which helps close the gap with a peak aggregate memory throughput of 12.4TB/s (just a little less than the DGX-2’s 14.4TB/s).
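The memory throughput comparison is the same kind of multiplication; a quick sketch using the published per-GPU figures (roughly 1,555GB/s per A100 and 900GB/s per V100):

    # Peak aggregate HBM2 bandwidth: per-GPU bandwidth x number of GPUs.
    dgx_a100 = 8 * 1555 / 1000   # A100 40GB at ~1,555 GB/s each -> ~12.4 TB/s
    dgx_2 = 16 * 900 / 1000      # V100 32GB at 900 GB/s each -> 14.4 TB/s
    print(f"DGX A100: {dgx_a100:.1f} TB/s vs DGX-2: {dgx_2:.1f} TB/s")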

Out-of-the-box connectivity for scaling up data center capabilities with more DGX supercomputers comes courtesy of NVIDIA’s recent Mellanox acquisition: high-speed HDR 200Gbps InfiniBand interconnects, twice the throughput of the 100Gbps EDR InfiniBand/100GbE links on the DGX-2. The DGX A100 offers 8x single-port Mellanox ConnectX-6 HDR InfiniBand/200GbE adapters, and with clustering supports a total peak interconnect throughput of 200GB/s. It also has a single dual-port ConnectX-6 for data and storage networking needs.
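That 200GB/s peak is simply the eight HDR links added together and converted from gigabits to gigabytes:

    # Eight single-port HDR adapters at 200 Gb/s each.
    total_gbit = 8 * 200          # 1,600 Gb/s of aggregate interconnect
    total_gbyte = total_gbit / 8  # 8 bits per byte -> 200 GB/s peak
    print(total_gbyte)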

In a surprising move, NVIDIA’s latest supercomputer dumps Intel for AMD’s EPYC 7742, a 64-core server processor! This speaks volumes about AMD’s rapid advancement in the server and data center scene, and about NVIDIA’s confidence in AMD’s supply chain. There are two of them on the baseboard within the DGX A100, for a total of 128 CPU cores alongside 1TB of system memory. Storage is serviced by dual 1.92TB M.2 NVMe drives hosting the OS, while non-OS storage totals 15.36TB across four 3.84TB U.2 NVMe drives.
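For reference, those totals decompose as follows (a trivial sketch of the figures quoted above):

    # System totals from the per-component specs.
    print(2 * 64)    # 128 CPU cores from dual EPYC 7742s
    print(2 * 1.92)  # 3.84 TB of M.2 NVMe for the OS
    print(4 * 3.84)  # 15.36 TB of U.2 NVMe for data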

 

Here’s what makes the NVIDIA DGX A100 even more impressive

So almost all the quantity, frequency, bandwidth and throughput figures are a fair bit higher. And we stress almost, because in some ways the DGX A100 does come with less, which helps bring its sticker price way lower than you would imagine. The previous AI supercomputer, the DGX-2, cost a whopping US$399,000 and put out 2 petaflops of AI performance.

The new DGX A100 costs ‘only’ US$199,000 and churns out 5 petaflops of AI performance, the most of any single system. It is also much smaller than the 444mm-tall DGX-2: at a height of only 264mm, the DGX A100 fits within a 6U rack form factor. How can NVIDIA serve up something that’s half the cost of its predecessor, nearly half the size, and more than double the performance?

For starters, the DGX A100 uses only 8 GPUs vs. 16 on the DGX-2, which alone drives massive cost savings from a silicon consumption and complexity management perspective. Fewer GPUs mean fewer NVSwitches deployed, and their count is likewise halved in the DGX A100. The reduced component count means it needs one less plane to accommodate all its GPUs compared to the DGX-2. If you caught the DGX A100 video intro above, you would have glimpsed the back of the supercomputer and its simplified layout: the bottom plane houses all the redundant PSUs, the next layer houses all the external system connectivity options, followed by the single GPU plane at the top. All of this adds up to tremendous cost savings, while the improved GPU architecture and connectivity options ensure you get much more for less.

It doesn’t stop there. Thanks to the Multi-Instance GPU (MIG) capability of the new A100, each GPU can partition itself into up to seven discrete instances, fully isolated from each other and running separate workloads while still having their own slice of high-bandwidth memory, cache, compute cores and more. With the DGX A100’s eight GPUs, this gives administrators the ability to carve out up to 56 GPU instances. This key feature, along with the massive processing throughput boosts, makes the DGX A100 the most versatile high-density compute system on the market, consolidating training, inference and analytics needs into a unified, easy-to-deploy AI system with unprecedented levels of performance.
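For a rough idea of how this looks to an administrator, MIG devices can be enumerated programmatically through NVML. The sketch below assumes the nvidia-ml-py Python bindings and MIG mode already enabled on each GPU; creating the instances themselves is done separately (for example, with the nvidia-smi mig tooling):

    # Minimal sketch: count the MIG instances visible across all GPUs.
    import pynvml

    pynvml.nvmlInit()
    total = 0
    for i in range(pynvml.nvmlDeviceGetCount()):
        gpu = pynvml.nvmlDeviceGetHandleByIndex(i)
        # Each A100 exposes up to seven MIG devices once partitioned.
        for mig in range(pynvml.nvmlDeviceGetMaxMigDeviceCount(gpu)):
            try:
                pynvml.nvmlDeviceGetMigDeviceHandleByIndex(gpu, mig)
                total += 1
            except pynvml.NVMLError:
                break  # no more MIG devices on this GPU
    print(f"{total} MIG instances")  # up to 7 x 8 = 56 on a DGX A100
    pynvml.nvmlShutdown()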

In fact, NVIDIA painted a picture of a typical AI data center today: 50 DGX-1 systems for AI training and 600 CPU systems for AI inference, hogging up to 25 racks of space, sipping up to 630kW of power and costing about US$11 million in infrastructure alone. All of that can be consolidated into a single rack of just five DGX A100 AI supercomputers that do the same job for about a million dollars while consuming 28kW of power. That’s tremendous savings in infrastructure costs, running costs and carbon footprint.
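Restating NVIDIA’s example as ratios makes the scale of the consolidation obvious (figures from the paragraph above):

    # Legacy setup (50x DGX-1 + 600 CPU servers) vs. 5x DGX A100.
    legacy = {"racks": 25, "power_kw": 630, "cost_musd": 11}
    dgx_a100 = {"racks": 1, "power_kw": 28, "cost_musd": 1}
    for key in legacy:
        print(f"{key}: {legacy[key] / dgx_a100[key]:.0f}x reduction")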

Unlike its predecessors, the DGX A100 is no longer just another option in NVIDIA’s supercomputer lineup; it effectively retires both the DGX-1 and DGX-2. The performance-to-cost ratio is simply off the charts, comparatively speaking.

 

Availability and who’s adopting it?

Immediately available, DGX A100 systems have begun shipping worldwide, with the first order going to the U.S. Department of Energy’s (DOE) Argonne National Laboratory, which will use the cluster’s AI and computing power to better understand and fight COVID-19.

We’re using America’s most powerful supercomputers in the fight against COVID-19, running AI models and simulations on the latest technology available, like the NVIDIA DGX A100.  The compute power of the new DGX A100 systems coming to Argonne will help researchers explore treatments and vaccines and study the spread of the virus, enabling scientists to do years’ worth of AI-accelerated work in months or days. — Rick Stevens, associate laboratory director for Computing, Environment and Life Sciences at Argonne. 

Here are the other early adopters:

  • The Center for Biomedical AI — The University Medical Center Hamburg-Eppendorf, Germany, will leverage DGX A100 to advance clinical decision support and process optimization.

  • Chulalongkorn University — Thailand’s top research-intensive university will use DGX A100 to accelerate pioneering research such as Thai natural language processing, automatic speech recognition, computer vision and medical imaging.

  • German Research Center for Artificial Intelligence (DFKI) — DFKI will use DGX A100 systems to further accelerate its research on new deep learning methods and their explainability, while significantly reducing space and energy consumption.

  • Harrison.ai — The Sydney-based healthcare AI company will deploy Australia’s first DGX A100 systems to accelerate the development of its AI-as-medical-device offerings.

  • The UAE Artificial Intelligence Office — The first in the Middle East to deploy the DGX A100, the office is building a national infrastructure to accelerate AI research, development and adoption across the public and private sectors.

  • VinAI Research — Vietnam’s leading AI research lab, based in Hanoi and Ho Chi Minh City, will use DGX A100 to conduct high-impact research and accelerate the application of AI.


NVIDIA DGX-2 ‘The World’s Largest GPU’ Announced

NVIDIA has unveiled a brand-new product, the NVIDIA DGX-2 supercomputer, which is essentially 16 Volta GPUs stacked together using NVSwitch, a high-performance external interconnect. The entire package has a power consumption that is a fraction of what you would traditionally see from CPU-based clusters, but it of course comes at a premium.

Keep in mind that calling this server rack a GPU is a misnomer. While the use of NVSwitch does in fact make it much faster than a server rack with 16 Volta GPUs in a vanilla setup, it is still nowhere near as fast as a true single GPU of this calibre would be. The server packs a total of 81,920 CUDA cores with 512GB of HBM2 memory, 14.4TB/s of aggregate memory bandwidth and 300GB/s of GPU-to-GPU bandwidth. The total power consumption of the rack is 10,000 watts, and it weighs 350 pounds.
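Those headline totals are just the per-V100 numbers multiplied by sixteen, as a quick sketch shows:

    # DGX-2 totals = per-GPU Tesla V100 figures x 16.
    gpus = 16
    print(gpus * 5120)  # 81,920 CUDA cores (5,120 per V100)
    print(gpus * 32)    # 512 GB of HBM2 (32 GB per V100)
    print(gpus * 0.9)   # 14.4 TB/s aggregate memory bandwidth (900 GB/s each)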

DGX-2 provides 10x the processing power of the DGX-1 of six months prior, which was unveiled in September 2017.

The exploration space of AI, the number of layers, the training rates, sweeping through different frameworks, with bigger networks and more experimentation: DGX-2 couldn’t come at a better time.

The question is how much we should charge. It took hundreds of millions of dollars of engineering to create this.

It’s $399K for the world’s most powerful computer. It replaces $3M worth of 300 dual-CPU servers consuming 180 kilowatts: 1/8th the cost, 1/60th the space and 1/18th the power. “The more you buy, the more you save,” Jensen says, repeating a phrase he’s used at a number of these events. (via NVIDIA)

Connectivity comes from 8x EDR InfiniBand/100GbE connectors, and the GPU cluster is driven by 2x Xeon Platinum CPUs, which, if they are the 8180, pack a total of 56 cores, more than enough to handle the driver’s seat. 1.5TB of RAM is included, along with 30TB of NVMe SSDs. The interconnect fabric is seamless and works in both directions, so this is definitely not simply 16 Volta GPUs wired together. It’s more than that, but still not a single monolithic GPU. You can buy the DGX-2 for $399,000 (fun fact: the DGX-2 is 500 times faster than 2x GTX 580s but also 399 times more expensive).


DGX-2 is the first system to debut NVSwitch, which enables all 16 GPUs in the system to share a unified memory space. Developers now have the deep learning training power to tackle the largest datasets and most complex deep learning models.

Combined with a fully optimized, updated suite of NVIDIA deep learning software, DGX-2 is purpose-built for data scientists pushing the outer limits of deep learning research and computing.


DGX-2 can train FAIRSeq, a state-of-the-art neural machine translation model, in less than two days — a 10x improvement in performance from the DGX-1 with Volta, introduced in September.

DGX-2 is the latest addition to the NVIDIA DGX product portfolio, which consists of three systems designed to help data scientists quickly develop, test, deploy and scale new deep learning models and innovations.

DGX-2, with 16 GPUs, is the top of the lineup. It joins the NVIDIA DGX-1 system, which features eight Tesla V100 GPUs, and DGX Station™, the world’s first personal deep learning supercomputer, with four Tesla V100 GPUs in a compact, deskside design. These systems enable data scientists to scale their work from the complex experiments they run at their desks to the largest deep learning problems, allowing them to do their life’s work.

Further information, including detailed technical specifications and order forms, is available at https://nvda.ws/2IRilLe.


NVIDIA DGX-2

Forget the limits of AI computing with the NVIDIA DGX-2. It is the first 2 petaflops system to feature 16 fully interconnected GPUs, which provide a 10x performance boost in deep learning tasks. Powered by NVIDIA DGX software and built on the NVIDIA NVSwitch architecture, the system enables you to tackle the world’s most complex AI challenges.

Unparalleled computing power delivers unprecedented algorithm training performance

AI algorithms are becoming more complex and require unrivaled levels of processing power. The NVIDIA DGX-2 combines the power of 16 of the world’s most advanced GPUs to accelerate new types of AI models that were never possible before. The system also provides extensive scalability: you can train up to 4x larger neural network models with up to 10x performance gains compared to systems with 8 GPUs.

A revolutionary network fabric for artificial intelligence

With the advent of DGX-2, the complexity and size of neural network models are no longer limited by the capabilities of traditional architectures. Now you can enjoy the benefits of parallel model training with NVIDIA NVSwitch high-speed internal communication technology, the innovation behind the world’s very first 2 petaflops system. It delivers up to 2.4TB/s of GPU-to-GPU transfer bandwidth, up to 12 times faster than the previous-generation interconnect, and up to 5 times faster application performance.

1. NVIDIA TESLA V100 32GB, SXM3

2. 16 GPUs IN TOTAL, ON TWO BOARDS, WITH 512GB OF HBM2 MEMORY
Each board carries 8 NVIDIA Tesla V100 GPUs.

3. 12 NVSWITCH CHIPS
High-speed interconnect with a throughput of 2.4TB/s.

4. 8 EDR INFINIBAND/100GbE NETWORK ADAPTERS
Low latency and 1,600Gb/s of bi-directional bandwidth.

5. PCIe SWITCH COMPLEX

6. TWO INTEL XEON PLATINUM CPUs

7. 1.5TB OF SYSTEM MEMORY

8. DUAL 10/25GbE ETHERNET

9. 30TB OF NVMe STORAGE

Scaling artificial intelligence to a new level

Businesses need to be able to deploy artificial intelligence quickly to meet today’s challenges. The DGX-2 is a turnkey, modular solution that provides the fastest path to scaling AI. It makes scaling AI workloads easy, with flexible networking options for building the largest compute clusters and multi-tenancy features that improve the isolation of users and worker processes in shared infrastructure environments. With an architecture purpose-built for easy scaling and faster model deployment, your team can spend less time building infrastructure and more time on actual projects.

Enterprise-grade infrastructure that keeps you up and running

Artificial intelligence is a critical part of your business, so you need a platform built for high reliability, availability and serviceability (RAS). The DGX-2 is purpose-built to reduce unplanned downtime, streamline servicing and maintain business continuity. It is an enterprise-grade solution built to handle demanding 24/7 workloads for your most critical AI projects.


NVIDIA DGX-2 Deep Learning Server Powered by Tesla V100 GPUs

Graphics applications have long ceased to be the main topic for NVIDIA at GTC. Although real-time ray tracing and automotive topics occupied a significant part of the company chief’s GTC 2018 keynote, he did not skip the subject most important to NVIDIA at the show: artificial intelligence systems, and deep learning in particular. The California-based company’s technology has gone far beyond accelerating rendering and visual data processing, and is now centered on a computing platform that accelerates deep learning.

Millions of servers and computers around the world are becoming more productive, but making them smarter means learning to use every available capability: high-quality voice recognition, natural language understanding and much more (visual data processing belongs here too). Speaking to 8,500 developers, businesspeople, scientists and members of the press, Jensen Huang made a series of announcements strengthening the company’s deep learning computing platform.

NVIDIA significantly improves the capabilities and performance of its deep learning platform every year, far exceeding expectations, opening up new uses for the platform and enabling revolutionary changes in fields such as medicine, transportation and science. Even setting aside the important software announcements, including adoption across most major cloud services, one of the most interesting hardware reveals was the new Tesla V100 compute module, whose HBM2 memory has been doubled to 32GB, a relevant change for the many deep learning tasks that demand both memory capacity and memory bandwidth. Doubling the memory allows larger models to be trained and gives an advantage in tasks that were previously limited by 16GB of memory.

The new Tesla V100 32GB module is available from the day of the announcement, and manufacturers such as Cray, Hewlett Packard Enterprise, IBM, Lenovo, Supermicro and Tyan will begin shipping Tesla V100 32GB-based systems in the second quarter of this year. Oracle Cloud Infrastructure has also announced plans to offer the new Tesla V100 32GB in the cloud in the second half of the year.

Combined with NVIDIA’s all-new NVSwitch chip-to-chip interconnect technology, which links up to 16 Tesla V100 accelerators into a single device with 2.4TB/s of aggregate bandwidth, the possibilities of such systems seem limitless. NVSwitch extends the capabilities of NVLink and offers up to 5 times the bandwidth of the best PCI Express switches, allowing systems to be built with a large number of GPUs connected to one another.

Neural networks are becoming more complex, and their sizes and data sets keep growing. Some newer techniques also require more GPUs connected to one another for data exchange and synchronization. Such operations involve transferring large amounts of data at high bandwidth. The company’s new solution removes the previous restrictions on chip-to-chip transfer rates and allows larger data sets to be used with increasingly resource-intensive workloads, including parallel training of neural networks.

Each NVSwitch contains 18 NVLink ports (50GB/s per port). Six NVSwitches sit on each baseboard alongside eight Tesla V100 GPUs, and two such baseboards are combined in a single system. Each of the eight GPUs on a board connects to each of the six NVSwitches by a single NVLink channel, while eight ports of each NVSwitch chip are used to communicate with the other baseboard. Accordingly, each of the eight GPUs on a board communicates with the other processors at 300GB/s.
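The 300GB/s per-GPU figure, and the 2.4TB/s fabric total quoted earlier, fall straight out of that topology; a short sketch of the arithmetic:

    # NVSwitch topology arithmetic for the DGX-2.
    port_bw = 50       # GB/s per NVLink 2.0 port
    links_per_gpu = 6  # one link to each of the six NVSwitches on a board
    per_gpu = links_per_gpu * port_bw  # 300 GB/s per V100
    bisection = 8 * per_gpu / 1000     # eight GPUs' links crossing boards
    print(per_gpu, "GB/s per GPU;", bisection, "TB/s across the fabric")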