Deep Learning GPU Benchmarks — V100 vs 2080 Ti vs 1080 Ti vs Titan V
What’s the best GPU for Deep Learning? The 2080 Ti. We benchmark the 2080 Ti vs the Titan V, V100, and 1080 Ti.
At Lambda, we’re often asked "what’s the best GPU for deep learning?" In this post and accompanying white paper, we evaluate the NVIDIA RTX 2080 Ti, RTX 2080, GTX 1080 Ti, Titan V, and Tesla V100.
View Lambda’s GPU workstation
TL;DR
As of February 8, 2019, the NVIDIA RTX 2080 Ti is the best GPU for deep learning. For single-GPU training, the RTX 2080 Ti will be…
- 37% faster than the 1080 Ti with FP32, 62% faster with FP16, and 25% more costly.
- 35% faster than the 2080 with FP32, 47% faster with FP16, and 25% more costly.
- 96% as fast as the Titan V with FP32, 3% faster with FP16, and ~1/2 of the cost.
- 80% as fast as the Tesla V100 with FP32, 82% as fast with FP16, and ~1/5 of the cost.
All experiments utilized Tensor Cores when available and relative cost calculations can be found here. You can view the benchmark data spreadsheet here.
Hardware
A Lambda deep learning workstation was used to conduct benchmarks of the RTX 2080 Ti, RTX 2080, GTX 1080 Ti, and Titan V. Tesla V100 benchmarks were conducted on an AWS P3 instance with an E5-2686 v4 (16 core) and 244 GB DDR4 RAM.
Results in-depth
Performance of each GPU was evaluated by measuring FP32 and FP16 throughput (# of training samples processed per second) while training common models on synthetic data. We divided the GPU’s throughput on each model by the 1080 Ti’s throughput on the same model; this normalized the data and provided the GPU’s per-model speedup over the 1080 Ti. Speedup is a measure of the relative performance of two systems processing the same job.
Throughput of each GPU on various models; raw data can be found here.
We then averaged the GPU’s speedup over the 1080 Ti across all models:
FP32 and FP16 average speedup vs the 1080 Ti.
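The two steps above (per-model normalization, then averaging) are easy to sanity-check. The sketch below is illustrative Python, not the benchmark code itself; the throughput numbers are copied from the FP32 table later in this post.

```python
# Per-model speedup = GPU throughput / 1080 Ti throughput, then average.
throughput = {  # model: (2080 Ti, 1080 Ti) images/sec, from the FP32 table
    "ResNet-50":  (286.05, 203.99),
    "ResNet-152": (110.24, 82.83),
    "InceptionV3": (189.31, 130.20),
    "InceptionV4": (81.00, 56.98),
    "VGG16":      (169.28, 133.16),
    "AlexNet":    (3550.11, 2720.59),
    "SSD300":     (148.51, 107.71),
}

# Normalize: the 1080 Ti's speedup is 1.0 by construction
speedups = {m: a / b for m, (a, b) in throughput.items()}

# Average across models -> the "37% faster" headline number
avg = sum(speedups.values()) / len(speedups)
print(f"average FP32 speedup: {avg:.2f}x")  # average FP32 speedup: 1.37x
```

This is where the "37% faster than the 1080 Ti with FP32" figure in the TL;DR comes from.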
Finally, we divided each GPU’s average speedup by the total system cost to calculate our winner:
FP32 and FP16 performance per $. Units are speedup / k$.
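The winner metric can likewise be re-derived by hand. The snippet below is a sketch (the function name is ours, not from the benchmark repo); single-GPU system prices in k$ come from the price table later in this post.

```python
# Total single-GPU system cost in thousands of USD (from the price table)
system_cost_k = {"2080 Ti": 2.49, "1080 Ti": 1.99, "Titan V": 4.29, "V100": 11.09}

def perf_per_kusd(speedup, gpu):
    """Speedup over the 1080 Ti divided by system cost in k$."""
    return speedup / system_cost_k[gpu]

# Example: 2080 Ti on ResNet-50 (FP32): 286.05 vs 203.99 images/sec
speedup = 286.05 / 203.99                           # ~1.40x over the 1080 Ti
print(round(perf_per_kusd(speedup, "2080 Ti"), 2))  # 0.56, matching the table
```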
Under this evaluation metric, the RTX 2080 Ti wins our contest for best GPU for Deep Learning training.
2080 Ti vs V100 — is the 2080 Ti really that fast?
How can the 2080 Ti be 80% as fast as the Tesla V100, but only about 1/8th of the GPU price? The answer is simple: NVIDIA wants to segment the market so that buyers with a high willingness to pay (hyperscalers) only buy the Tesla line of cards, which retail for ~$9,800. The RTX and GTX series of cards still offer the best performance per dollar.
If you’re not AWS, Azure, or Google Cloud then you’re probably much better off buying the 2080 Ti. There are, however, a few key use cases where the V100s can come in handy:
- If you need FP64 compute. If you’re doing Computational Fluid Dynamics, n-body simulation, or other work that requires high numerical precision (FP64), then you’ll need to buy the Titan V or V100s. If you’re not sure if you need FP64, you don’t. You would know.
- If you absolutely need 32 GB of memory because your model size won’t fit into 11 GB of memory with a batch size of 1. If you are creating your own model architecture and it simply can’t fit even when you bring the batch size lower, the V100 could make sense. However, this is a pretty rare edge case. Fewer than 5% of our customers are using custom models. Most use something like ResNet, VGG, Inception, SSD, or Yolo.
So. You’re still wondering. Why would anybody buy the V100? It comes down to marketing.
2080 Ti is a Porsche 911, the V100 is a Bugatti Veyron
The V100 is a bit like a Bugatti Veyron. It’s one of the fastest street legal cars in the world, ridiculously expensive, and, if you have to ask how much the insurance and maintenance is, you can’t afford it. The RTX 2080 Ti, on the other hand, is like a Porsche 911. It’s very fast, handles well, expensive but not ostentatious, and with the same amount of money you’d pay for the Bugatti, you can buy the Porsche, a home, a BMW 7-series, send three kids to college, and have money left over for retirement.
And if you think I’m going overboard with the Porsche analogy, you can buy a DGX-1 8x V100 for $120,000 or a Lambda Blade 8x 2080 Ti for $28,000 and have enough left over for a real Porsche 911. Your pick.
Raw performance data
FP32 throughput
FP32 (single-precision) arithmetic is the most commonly used precision when training CNNs. FP32 data comes from code in the Lambda TensorFlow benchmarking repository.
Model / GPU | 2080 | 2080 Ti | Titan V | V100 | 1080 Ti |
---|---|---|---|---|---|
ResNet-50 | 209.89 | 286.05 | 298.28 | 368.63 | 203.99 |
ResNet-152 | 82.78 | 110.24 | 110.13 | 131.69 | 82.83 |
InceptionV3 | 141.9 | 189.31 | 204.35 | 242.7 | 130.2 |
InceptionV4 | 61.6 | 81 | 78.64 | 90.6 | 56.98 |
VGG16 | 123.01 | 169.28 | 190.38 | 233 | 133.16 |
AlexNet | 2567.38 | 3550.11 | 3729.64 | 4707.67 | 2720.59 |
SSD300 | 111.04 | 148.51 | 153.55 | 186.8 | 107.71 |
FP16 throughput (Sako)
FP16 (half-precision) arithmetic is sufficient for training many networks. We use Yusaku Sako’s benchmark scripts, which produce both FP16 and FP32 results. Here you can clearly see the 2080 Ti beating out the 1080 Ti’s FP16 performance.
Model/GPU | 2080 | 2080 Ti | Titan V | V100 | 1080 Ti |
---|---|---|---|---|---|
VGG16 | 181.2 | 238.45 | 270.27 | 333.33 | 149.39 |
ResNet-152 | 62.67 | 103.29 | 84.92 | 108.54 | 62.74 |
FP32 (Sako)
Model/GPU | 2080 | 2080 Ti | Titan V | V100 | 1080 Ti |
---|---|---|---|---|---|
VGG16 | 120.39 | 163.26 | 168.59 | 222.22 | 130.8 |
ResNet-152 | 43.43 | 75.18 | 61.82 | 80.08 | 53.45 |
FP16 Training Speedup over 1080 Ti
Model/GPU | 2080 | 2080 Ti | Titan V | V100 | 1080 Ti |
---|---|---|---|---|---|
VGG16 | 1.21 | 1.60 | 1.81 | 2.23 | 1.00 |
ResNet-152 | 1.00 | 1.65 | 1.35 | 1.73 | 1.00 |
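As a cross-check (illustrative code, not part of the Sako scripts), the speedup table above follows directly from the FP16 throughput numbers:

```python
fp16 = {  # images/sec from the FP16 (Sako) throughput table above
    "VGG16":      {"2080": 181.20, "2080 Ti": 238.45, "Titan V": 270.27,
                   "V100": 333.33, "1080 Ti": 149.39},
    "ResNet-152": {"2080": 62.67, "2080 Ti": 103.29, "Titan V": 84.92,
                   "V100": 108.54, "1080 Ti": 62.74},
}
for model, scores in fp16.items():
    base = scores["1080 Ti"]  # baseline GPU
    print(model, {gpu: round(v / base, 2) for gpu, v in scores.items()})
# e.g. VGG16 -> a 2080 Ti speedup of about 1.6x, matching the table
```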
FP32 Training Speedup
Model/GPU | 2080 | 2080 Ti | Titan V | V100 | 1080 Ti |
---|---|---|---|---|---|
VGG16 | 0.92 | 1.25 | 1.29 | 1.70 | 1.00 |
ResNet-152 | 0.81 | 1.41 | 1.16 | 1.50 | 1.00 |
Price Performance Data (Speedup / $1,000 USD) FP32
Model/GPU | 2080 | 2080 Ti | Titan V | V100 | 1080 Ti |
---|---|---|---|---|---|
Price Per GPU (k$) | 0.7 | 1.2 | 3 | 9.8 | 0.7 |
Price Per 1 GPU System (k$) | 1.99 | 2.49 | 4.29 | 11.09 | 1.99 |
AVG | 0.51 | 0.55 | 0.33 | 0.16 | 0.50 |
ResNet-50 | 0.52 | 0.56 | 0.34 | 0.16 | 0.50 |
ResNet-152 | 0.50 | 0.53 | 0.31 | 0.14 | 0.50 |
InceptionV3 | 0.55 | 0.58 | 0.37 | 0.17 | 0.50 |
InceptionV4 | 0.54 | 0.57 | 0.32 | 0.14 | 0.50 |
VGG16 | 0.46 | 0.51 | 0.33 | 0.16 | 0.50 |
AlexNet | 0.47 | 0.52 | 0.32 | 0.16 | 0.50 |
SSD300 | 0.52 | 0.55 | 0.33 | 0.16 | 0.50 |
Price Performance Data (Speedup / $1,000 USD) FP16
Model/GPU | 2080 | 2080 Ti | Titan V | V100 | 1080 Ti |
---|---|---|---|---|---|
AVG | 0.56 | 0.65 | 0.37 | 0.18 | 0.50 |
VGG16 | 0.61 | 0.64 | 0.42 | 0.20 | 0.50 |
ResNet-152 | 0.50 | 0.66 | 0.32 | 0.16 | 0.50 |
Methods
- All models were trained on a synthetic dataset. This isolates GPU performance from CPU pre-processing performance.
- For each GPU, 10 training experiments were conducted on each model. The number of images processed per second was measured and then averaged over the 10 experiments.
- The speedup is calculated by dividing each GPU’s images/sec score by the minimum images/sec score for that model. This shows each GPU’s relative improvement over the baseline (in this case the 1080 Ti).
- The 2080 Ti, 2080, Titan V, and V100 benchmarks utilized Tensor Cores.
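The first three bullets can be condensed into a small helper. This is a sketch with made-up run numbers, not the actual harness:

```python
def average_throughput(run_scores):
    """Mean images/sec over the repeated training runs."""
    return sum(run_scores) / len(run_scores)

def speedups(per_gpu_scores):
    """Each GPU's score divided by the slowest score for the model."""
    baseline = min(per_gpu_scores.values())  # the 1080 Ti in practice
    return {gpu: s / baseline for gpu, s in per_gpu_scores.items()}

# Hypothetical per-run scores for one model (illustration only):
scores = {"1080 Ti": average_throughput([203.0, 205.0]),
          "2080 Ti": average_throughput([285.0, 287.0])}
print({g: round(v, 2) for g, v in speedups(scores).items()})
# -> {'1080 Ti': 1.0, '2080 Ti': 1.4}
```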
Batch sizes used
Model | Batch Size |
---|---|
ResNet-50 | 64 |
ResNet-152 | 32 |
InceptionV3 | 64 |
InceptionV4 | 16 |
VGG16 | 64 |
AlexNet | 512 |
SSD | 32 |
Hardware
All benchmarks, except for those of the V100, were conducted using a Lambda Vector with swapped GPUs. The exact specifications are:
- RAM: 64 GB DDR4 2400 MHz
- Processor: Intel Xeon E5-1650 v4
- Motherboard: ASUS X99-E WS/USB 3.1
- GPUs: EVGA XC RTX 2080 Ti GPU TU102, ASUS 1080 Ti Turbo GP102, NVIDIA Titan V, and Gigabyte RTX 2080.
Software
All benchmarks, except for those of the V100, were conducted with:
- Ubuntu 18.04 (Bionic)
- CUDA 10.0
- TensorFlow 1.11.0-rc1
- cuDNN 7.3
The V100 benchmark was conducted with an AWS P3 instance with:
- Ubuntu 16.04 (Xenial)
- CUDA 9.0
- TensorFlow 1.12.0.dev20181004
- cuDNN 7.1
How we calculate system cost
The cost we use in our calculations is based on the estimated price of the minimal system that avoids CPU, memory, and storage bottlenecking for Deep Learning training. Note that this won’t be upgradable to anything more than 1 GPU.
- CPU: i7-8700K or equivalent (6 cores, 16 PCI-e lanes). ~$380.00 on Amazon.
- CPU Cooler: Noctua L-Type Premium. ~$50 on Amazon.
- Memory: 32 GB DDR4. ~$280.00 on Amazon.
- Motherboard: ASUS Prime B360-Plus (16x PCI-e lanes for GPU). ~$105.00 on Amazon.
- Power supply: EVGA SuperNOVA 750 G2 (750W). ~$100.00 on Amazon.
- Case: NZXT H500 ATX case. ~$70.00 on Amazon.
- Labor: About $200 in labor if you want somebody else to build it for you.
Cost (excluding GPU): $1,291.65 after 9% sales tax.
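Those line items add up exactly to the quoted figure; a quick check:

```python
# Component prices (USD) from the parts list above
parts = {
    "CPU (i7-8700K)": 380.00,
    "CPU cooler": 50.00,
    "Memory (32 GB DDR4)": 280.00,
    "Motherboard": 105.00,
    "Power supply": 100.00,
    "Case": 70.00,
    "Labor": 200.00,
}
subtotal = sum(parts.values())     # 1185.00
total = round(subtotal * 1.09, 2)  # add 9% sales tax
print(total)  # 1291.65
```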
Reproduce the benchmarks yourself
All benchmarking code is available on Lambda Labs’ GitHub repo. Share your results by emailing [email protected] or tweeting @LambdaAPI. Be sure to include the hardware specifications of the machine you used.
Step One: Clone benchmark repo
git clone https://github.com/lambdal/lambda-tensorflow-benchmark.git --recursive
Step Two: Run benchmark
- Input a proper gpu_index (default 0) and num_iterations (default 10)
cd lambda-tensorflow-benchmark
./benchmark.sh gpu_index num_iterations
Step Three: Report results
- Check the repo directory for folder <cpu>-<gpu>.logs (generated by benchmark.sh)
- Use the same num_iterations in benchmarking and reporting.
./report.sh <cpu>-<gpu>.logs num_iterations
We are now taking orders for the Lambda Blade 2080 Ti Server and the Lambda Quad 2080 Ti workstation. Email [email protected] for more info.
You can download this blog post as a whitepaper using this link: Download Full 2080 Ti Performance Whitepaper.
Compare NVIDIA GeForce RTX 2080 Ti and NVIDIA Tesla V100 PCIe 32 GB
Comparative analysis of NVIDIA GeForce RTX 2080 Ti and NVIDIA Tesla V100 PCIe 32 GB video cards by all known characteristics in the categories: General information, Specifications, Video outputs and ports, Compatibility, dimensions, requirements, API support, Memory.
Analysis of video card performance by benchmarks: PassMark — G3D Mark, PassMark — G2D Mark, Geekbench — OpenCL, CompuBench 1.5 Desktop — Face Detection (mPixels/s), CompuBench 1.5 Desktop — Ocean Surface Simulation (Frames/s), CompuBench 1.5 Desktop — T-Rex (Frames/s), CompuBench 1.5 Desktop — Video Composition (Frames/s), CompuBench 1.5 Desktop — Bitcoin Mining (mHash/s), GFXBench 4.0 — Car Chase Offscreen (Frames), GFXBench 4.0 — Manhattan (Frames), GFXBench 4.0 — T-Rex (Frames), GFXBench 4.0 — Car Chase Offscreen (Fps), GFXBench 4.0 — Manhattan (Fps), GFXBench 4.0 — T-Rex (Fps), 3DMark Fire Strike — Graphics Score.
NVIDIA GeForce RTX 2080 Ti
versus
NVIDIA Tesla V100 PCIe 32 GB
Benefits
Why Choose NVIDIA GeForce RTX 2080 Ti
- About 10% more core clock: 1350 MHz vs 1230 MHz
- 12% more core clock in Boost mode: 1545 MHz vs 1380 MHz
- 8x more memory clock(s): 14000 MHz vs 1752 MHz
- 2.4 times more performance in GFXBench 4.0 — Car Chase Offscreen (Frames) benchmark: 23659 vs 9969
- About 6% more performance in GFXBench 4.0 — Manhattan (Frames) benchmark: 3718 vs 3521
- About 6% more performance in GFXBench 4.0 — Manhattan (Fps) benchmark: 3718 vs 3521
- About 2.4 times more performance in GFXBench 4.0 — Car Chase Offscreen (Fps) benchmark: 23659 vs 9969
Characteristic | RTX 2080 Ti vs Tesla V100 |
---|---|
Issue date | 20 September 2018 vs 27 March 2018 |
Core frequency | 1350 MHz vs 1230 MHz |
Boost core clock | 1545 MHz vs 1380 MHz |
Memory frequency | 14000 MHz vs 1752 MHz |
GFXBench 4.0 — Car Chase Offscreen (Frames) | 23659 vs 9969 |
GFXBench 4.0 — Manhattan (Frames) | 3718 vs 3521 |
GFXBench 4.0 — Car Chase Offscreen (Fps) | 23659 vs 9969 |
GFXBench 4.0 — Manhattan (Fps) | 3718 vs 3521 |
Reasons to choose NVIDIA Tesla V100 PCIe 32 GB
- About 4% better performance in Geekbench — OpenCL benchmark: 144501 vs 138925
Characteristic | Tesla V100 vs RTX 2080 Ti |
---|---|
Geekbench — OpenCL | 144501 vs 138925 |
GFXBench 4.0 — T-Rex (Frames) | 3360 vs 3358 |
GFXBench 4.0 — T-Rex (Fps) | 3360 vs 3358 |
Benchmark comparison
GPU 1: NVIDIA GeForce RTX 2080 Ti
GPU 2: NVIDIA Tesla V100 PCIe 32 GB
Name | NVIDIA GeForce RTX 2080 Ti | NVIDIA Tesla V100 PCIe 32 GB |
---|---|---|
PassMark — G3D Mark | 21886 | |
PassMark — G2D Mark | 942 | |
Geekbench — OpenCL | 138925 | 144501 |
CompuBench 1.5 Desktop — Face Detection (mPixels/s) | 394.035 | |
CompuBench 1.5 Desktop — Ocean Surface Simulation (Frames/s) | 5451.484 | |
CompuBench 1.5 Desktop — T-Rex (Frames/s) | 41.133 | |
CompuBench 1.5 Desktop — Video Composition (Frames/s) | 173.697 | |
CompuBench 1.5 Desktop — Bitcoin Mining (mHash/s) | 1963.652 | |
GFXBench 4.0 — Car Chase Offscreen (Frames) | 23659 | 9969 |
GFXBench 4.0 — Manhattan (Frames) | 3718 | |
GFXBench 4.0 — T-Rex (Frames) | 3358 | 3360 |
GFXBench 4.0 — Car Chase Offscreen (Fps) | 23659 | 9969 |
GFXBench 4.0 — Manhattan (Fps) | 3521 | |
GFXBench 4.0 — T-Rex (Fps) | 3358 | 3360 |
3DMark Fire Strike — Graphics Score | 14614 |
Performance comparison
NVIDIA GeForce RTX 2080 Ti | NVIDIA Tesla V100 PCIe 32 GB | |
---|---|---|
Architecture | Turing | Volta |
Codename | TU102 | GV100 |
Issue date | 20 September 2018 | 27 March 2018 |
Price at first issue date | $999 | |
Place in the ranking | 19 | 206 |
Price now | $1,279.99 |
Type | Desktop | Workstation |
Price/performance ratio (0-100) | 17.22 | |
Boost core clock | 1545 MHz | 1380 MHz |
Core frequency | 1350 MHz | 1230 MHz |
Process | 12 nm | 12 nm |
Power consumption (TDP) | 250 Watt | 250 Watt |
Number of transistors | 18,600 million | 21,100 million |
Floating point performance | 14.131 TFLOPS |
Number of shaders | 5120 | |
Texturing speed | 441.6 GTexel/s | |
Video connectors | No outputs | |
Interface | PCIe 3.0 x16 | PCIe 3.0 x16 |
Length | 267 mm |
Additional power connectors | 2x 8-pin | 2x 8-pin |
DirectX | 12.0 (12_1) | 12.0 (12_1) |
OpenGL | 4.6 | 4.6 |
Memory frequency | 1752 MHz | |
Maximum memory size | 32 GB |
Memory bandwidth | 897.0 GB/s | |
Memory bus width | 4096 Bit | |
Memory type | HBM2 |
Future Nvidia GPUs tested | Page 2
SergeyMAGNITOGORSK
Experienced
#21
Nik65 said:
Great! But what is it for miners? Is there at least 1 Tesla owner on the forum?
For mining Ether.
fasam
Experienced
#22
These cards don’t really mine; they actually get fewer hashes than a regular card. I mean the compute cards: there are no proper miners for them. Or rather, there are, but not everyone has access to them.
Grider
Forum friend
29 Feb 2020
#23
noodl said:
AMD is still better.
You can’t argue with numbers:
Steam Hardware & Software Survey
noodl
tipo
Genius of thought
#25
mechislav said:
I’m just taking it easy, finally playing my favorite Total War to my heart’s content: no more farms, no noise, no dust, no nerves, no constantly falling rates.
And you came here because the withdrawal hit? That means you’re relapsing.
kol_660
Forum legend
#26
Grider said:
Two entries have been found in the Geekbench online database that reveal the main parameters of yet-to-be-announced Nvidia devices. The first GPU has 118 streaming multiprocessors (SMs) with a total of 7552 CUDA cores and a 24 GB video buffer. The second accelerator, with 108 compute units (6992 CUDA cores), received 48 GB of memory.
For reference, the GeForce RTX 2080 Ti graphics card has 4352 CUDA cores, and in the case of Tesla V100 adapters their number is 5376. As for the OpenCL subtest results, the first device scored 184 thousand points and the second almost 142 thousand. Quite good, given the rather low (1–1.1 GHz) core clocks.
Rumor has it that these accelerators use 7 nm (or 8 nm) Nvidia Ampere graphics cores. These GPUs are primarily aimed at the resource-intensive computing segment, where the Tesla V100, which debuted back in 2017, currently rules the roost. The flagship GA100 die will physically include 8192 CUDA cores and communicate with six HBM2 memory stacks via a 6144-bit bus. It remains to wait for details from Nvidia itself.
Source: overclockers.ua
Thanks to you I learned that there are video cards more powerful than the Titan V ))
kol_660
Forum legend
#27
fasam said:
These cards don’t really mine; they actually get fewer hashes than a regular card. I mean the compute cards: there are no proper miners for them. Or rather, there are, but not everyone has access to them.
Grider
Forum friend
29 Feb 2020
#28
X-Wiz-X
Local
#29
Alexey73 said:
When the income actually wakes up, then write here; otherwise they’ll soon be creating threads about whether or not to take a shit.
So after all, should I go or not..?
skybest
Forum friend
#30
Grider said:
Hardly anyone here, but out there are plenty of loaded freaks with farms built from the most expensive cards. I saw a farm of Titan Vs on YouTube ($3k per card).
9odubo
Forum friend
#32
A friend of mine builds computers. A client ordered a machine with a Quadro, I don’t remember the model; the card alone cost 3 grand wholesale. After the build he decided to test it on Ether to see how much this monster puts out. The result: 2 (TWO!!!) megahash!!! There was no 2000 series yet back then, and the 1080 Ti was the top card.
de0
Local
#33
Grider said:
Two entries have been found in the Geekbench online database… (full quote in post #26 above)
A whole 25% increase, and 50% faster than the 2080 Ti. So why is this miracle worth thousands of dollars to miners?
#34
de0 said:
A whole 25% increase and 50% faster than the 2080ti.