NVLink RTX 2080 Ti Benchmark: x16/x16 vs. x8 & GTX 1080 Ti SLI | GamersNexus

Test Platform – X299 / PCIe Bandwidth Limitation Testing

We are using the following components to benchmark PCIe bandwidth limitations:

               Component                                        Courtesy of
CPU            Intel i9-7980XE 4.6GHz                           Intel
GPU            This is what we’re testing!                      Often the company that makes the card, but sometimes us (see article)
Motherboard    EVGA X299 DARK                                   EVGA
RAM            GSkill Trident Z Black 32GB 3600MHz (4 sticks)   GSkill
PSU            Corsair AX1600i                                  Corsair
Cooler         NZXT Kraken X62                                  NZXT
SSD            ADATA S60, Crucial MX300 1TB                     GamersNexus

On this platform, we toggle between PCIe generations to constrain per-lane throughput, exposing any potential bottlenecks within the interface itself. This will help us determine the viability of the configurations tested later in this article.
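For reference, here’s what those toggles mean in raw numbers – a minimal Python sketch (illustrative only, not part of any bench automation) of approximate one-way PCIe bandwidth by generation and lane count. Note that PCIe 2.0 x16 lands roughly at PCIe 3.0 x8, which is how generation toggling emulates halved lane counts:

# Approximate usable one-way PCIe bandwidth per lane (GB/s), after encoding overhead
PER_LANE_GBPS = {1: 0.25, 2: 0.5, 3: 0.985}

def pcie_bandwidth(gen, lanes):
    """Theoretical one-way bandwidth for a PCIe link of a given generation."""
    return PER_LANE_GBPS[gen] * lanes

for gen in (2, 3):
    for lanes in (8, 16):
        print(f"PCIe {gen}.0 x{lanes}: ~{pcie_bandwidth(gen, lanes):.1f} GB/s")
# PCIe 3.0 x16 -> ~15.8 GB/s, the ~16GB/s ceiling referenced later in this article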

Test Methodology – Game Benchmarks

Testing methodology has completely changed from our last GPU reviews, which were probably for the GTX 1070 Ti series cards. Most notably, we have overhauled the host test bench and updated the games suite. Our games selection is a careful one: Time is finite, and having analyzed our previous testing methodologies, we identified shortcomings where we were ultimately wasting time by testing too many games that didn’t provide meaningfully different data from our other tested titles. In order to better use our available time and test “smarter” (rather than “more,” which was one of our previous goals), we have selected games based upon the following criteria:

  • Game Engine: Most games run on the same group of popular engines. By choosing one game from each major engine (e.g. Unreal Engine), we ensure that we represent a wide sweep of games that use that engine’s built-in optimizations.
  • API: We have chosen a select group of DirectX 11 and DirectX 12 API integrations, as these are the most prevalent at this time. We will include more Vulkan API testing as more games ship with Vulkan.
  • Popularity: Is it something people actually play?
  • Longevity: Regardless of popularity, how long can we reasonably expect a game to go without updates? Updating games can hurt comparative data from past tests, which impacts our ability to cross-compare new data and old, as old data may no longer be comparable post-patch.

Game graphics settings are defined in their respective charts.

We are also testing most games at all three popular resolutions – at least, we are for the high-end. This includes 4K, 1440p, and 1080p, which allows us to determine GPU scalability across multiple monitor types. More importantly, this allows us to start pinpointing the reason for performance uplift, rather than just saying there is performance uplift. If we know that performance boosts harder at 4K than 1080p, we might be able to call this indicative of a ROPs advantage, for instance. Understanding why performance behaves the way it does is critical for future expansion of our own knowledge, and thus prepares our content for smarter analysis in the future.
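As a quick illustration of why resolution isolates different bottlenecks, consider raw pixel counts alone (a trivial sketch; real fill-rate demand also depends on overdraw and effects):

RESOLUTIONS = {"1080p": (1920, 1080), "1440p": (2560, 1440), "4K": (3840, 2160)}
base_pixels = 1920 * 1080
for name, (w, h) in RESOLUTIONS.items():
    pixels = w * h
    print(f"{name}: {pixels:,} pixels ({pixels / base_pixels:.2f}x 1080p)")
# 4K pushes 4x the pixels of 1080p, which is why fill-rate-bound (ROPs) effects
# show up more clearly at 4K than at 1080p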

For the test bench proper, we are now using the following components:

GPU Test Bench (Sponsored by Corsair)

               Component                               Courtesy of
CPU            Intel i7-8086K 5.0GHz                   GamersNexus
GPU            This is what we’re testing!             Often the company that makes the card, but sometimes us (see article)
Motherboard    ASUS ROG Maximus X Hero                 ASUS
RAM            Corsair Vengeance LPX 32GB 3200MHz      Corsair
PSU            Corsair AX1600i                         Corsair
Cooler         NZXT Kraken X62                         NZXT
SSD            Plextor 256-M7VC, Crucial MX300 1TB     GamersNexus

Separately, for the initial RTX 20-series reviews, we are using 10-series board partner models instead of reference models. This is because we know for a fact that most of the market uses board partner models, and we believe this to be the most realistic and relatable comparison for our audience. We acknowledge that the differences between the RTX and GTX reference cards would be more pronounced than when comparing against partner cards, but much of that is a result of the poor reference cooling solutions of the previous generation. Comparing against those creates, in our eyes, an unrealistically strong appearance for incoming cards on dual-axial coolers, and does not help the vast majority of users who own board partner 10-series cards.

PCIe 3.0 Bandwidth Limitations: RTX 2080 Ti NVLink Benchmark

Ashes of the Singularity: Explicit Multi-GPU PCIe Bandwidth Test

Ashes of the Singularity is an incredibly interesting benchmarking tool for this scenario. Ashes uses Dx12 explicit multi-GPU, a unique feature that allows pairing GPUs of varying make, and it communicates entirely via the PCIe bus. This means that the cards can’t lean on the 100GB/s of bandwidth provided by NVLink; instead, all of that data transacts over the significantly more limited PCIe bus, capped at roughly 16GB/s in x16 mode. In our Titan V testing – and we’ll pop the old chart up on screen – we found that the PCIe bandwidth limits were finally being strained. Again, this is with no supporting bridge, and it’s the only title we know of that really makes use of multi-GPU like this.

For the 2080 Tis, we removed the NVLink bridge and tested them with explicit multi-GPU over the PCIe bus alone. This is to determine at what point we hit PCIe 3.0 limitations; with PCIe 4.0 looming, there’s been a lot of talk of EOL for 3.0. In Ashes, we found that our maximum performance was 127.2FPS AVG, averaged across 10 runs. Running in x8/x8, which would be common on Z370 platforms, we had a measurable and consistent loss that exited margin of error. The loss was about 1.7% – not a big deal. Constraining bandwidth further still, we saw a massive performance penalty: The cards were now limited to 107FPS AVG, a 16% loss.
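For transparency, the loss percentages are plain relative deltas against the x16/x16 baseline – a quick sketch using the averages quoted above:

def pct_loss(baseline_fps, limited_fps):
    """Relative performance loss versus the unconstrained x16/x16 baseline."""
    return (1 - limited_fps / baseline_fps) * 100

baseline = 127.2  # x16/x16, averaged across 10 runs
print(f"Bandwidth-starved run: {pct_loss(baseline, 107):.1f}% loss")  # ~15.9%, the "16%" above
# The ~1.7% x8/x8 loss implies roughly 125FPS AVG against the same baseline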

TimeSpy Extreme PCIe Bandwidth Limitation Benchmark

TimeSpy Extreme is a useful synthetic tool for this type of benchmark, and it also hits GPU memory hard. We ran TimeSpy Extreme 5 times on each configuration, fully automated, and found a difference of 0.19% between x8/x8 and x16/x16 for GFX test 1, which is geometrically intensive. This is well within margin of test variance for 3DMark, amounting to effectively zero performance loss between x8/x8 and x16/x16. Part of this is likely because of NVLink’s additional bandwidth, which reduces reliance on the PCIe bus.

Firestrike Ultra PCIe 3.0 x16/x16 vs. x8/x8 with 2080 Ti

For Firestrike Ultra, we observed an FPS difference of about 1% – a 0.9% difference in GFX 1 and a 1.0% difference in GFX 2. We ended up running these an additional 5 times, for a total of 10 each, and found the results repeated. Firestrike has run-to-run variance, so we cannot state with full confidence that a difference exists – but if one does, it amounts to a 1% advantage for x16/x16 over x8/x8.

A performance loss at x8/x8 is of questionable existence, but one certainly appears when traffic is forced down PCIe entirely (bypassing the NVLink bridge). We only know of one ‘game’ which presently does this, and that’s Ashes.

2x RTX 2080 Ti NVLink vs. GTX 1080 Ti SLI & Single RTX 2080 Ti

Sniper Elite 4 – NVLink Benchmark vs. RTX 2080 Ti & SLI 1080 Ti

Sniper Elite 4 produced some of the best scaling results, as it often does. This game is also the best DirectX 12 implementation we’re aware of, so its scaling will not apply to all games universally – it is an outlier, but a good one that can teach us a lot.

With our usual benchmark settings, the dual NVLinked cards push past 200FPS, hitting an average of 210FPS without overclocking. This outperforms the stock RTX 2080 Ti FE by about 94% – nearly perfect 2x scaling, which has been rare to achieve in past years. It’s always exciting when we see it, because this is what multi-GPU should be like. Versus the overclocked single 2080 Ti, the stock 2080 Tis in NVLink posted a gain of 71%. Not bad, and overclocking the two cards – although stability is annoying to find – would restore the near-2x gap. The GTX 1080 Tis in SLI

The next major consideration is frametime consistency: Multi-GPU has traditionally shown terrible frame-to-frame interval consistency, often resulting in things like micro-stutter or intolerable tearing. For this pairing, as you can see in our frametime plot, the lows scale pretty well. It’s not a near-perfect 2x scaling like the average, but it’s pretty close. As a reminder, these plots are to be read as lowest is best, but more consistent is more important than just being a low interval. 16ms is 60FPS. Very impressive performance in this game, which is more a testament to Sniper 4’s development team than anything else – they have continued to build some of the best-optimized games in the space.
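For readers unfamiliar with how the lows are derived from a frametime log, here’s a minimal sketch of the common approach – average the slowest slice of frames, then convert milliseconds to FPS. This is illustrative only, not our exact capture pipeline:

def low_metrics(frametimes_ms):
    """1% and 0.1% lows from a list of per-frame render times in milliseconds."""
    worst_first = sorted(frametimes_ms, reverse=True)
    def avg_fps_of_worst(fraction):
        n = max(1, int(len(worst_first) * fraction))
        avg_ms = sum(worst_first[:n]) / n
        return 1000.0 / avg_ms  # 16.67ms per frame == 60FPS
    return avg_fps_of_worst(0.01), avg_fps_of_worst(0.001)

# A run that holds 60FPS but stutters on 1% of frames still reads as a 30FPS low:
print(low_metrics([16.7] * 990 + [33.3] * 10))  # -> (~30.0, ~30.0)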

We also tested with Sniper 4 at Ultra settings, just to remove CPU bottleneck concerns. Here’s a quick chart with those results, although they aren’t too different.

Far Cry 5 RTX 2080 Ti x8/x8 NVLink vs. Single Card

Far Cry 5 and the Dunia engine also show some SLI or NVLink scaling. At 4K/High, Far Cry 5 puts the single RTX 2080 Ti at 74FPS AVG stock, or 83FPS AVG overclocked. Lows stick around 55-60FPS in each case. With dual cards, we manage 108FPS AVG, a gain of 46% over the single stock 2080 Ti’s 74FPS AVG. That’s not nearly as exciting as the previous result, but at least it’s still some scaling. At roughly 50% scaling, though, you can’t help but feel like you’re only getting $600 of value out of your additional $1200 purchase. For the lows, we’re looking at a 0.1% low of 60FPS, compared to a 0.1% low of 55FPS on the stock 2080 Ti – no improvement there. Let’s look at a more valuable frametime plot, as these 0.1% metrics don’t tell the whole story.

In our frametime chart, we can see the limitations of scaling. Although the NVLinked cards run a higher average, they fail to sustain similar scaling in frametime consistency. Frametimes are spikier and potentially more jarring, although the raw framerate alone makes up for much of this lost frame-to-frame interval consistency.

Back to the main chart, we also have the 1080 Ti cards in SLI to consider: In this configuration, the SLI 1080 Tis operate at 91.4FPS AVG, with spurious lows bouncing between 42FPS and 66FPS for the 0.1% metric. For averages, the overall uplift amounts to about 60% over a single 1080 Ti SC2, and outperforms a single 2080 Ti FE card. Of course, there will be games where SLI gets you nothing, but instances like this permit the older pair to outperform new hardware at the same price.

Shadow of the Tomb Raider GPU Benchmark – NVLink RTX 2080 Ti

Shadow of the Tomb Raider is still a new game and will eventually host RTX features, but it didn’t at the time of filming. The game also uses a modified Crystal engine. It has a lot of issues with NVLink and SLI, and nVidia is aware of them. As of now, we have experienced blue screens of death upon launch, crashes upon minimizing, and other seemingly random crashes. Fortunately, we were eventually able to work around these for long enough to run a benchmark – just know that the game is very unstable with multi-GPU. One of the other issues we discovered was constant blue screens with TAA enabled, which is unfortunately the setting we used for our full GPU review. For this reason, we retested the 1080 Ti with TAA off as well, just for a baseline. We did not retest all devices, only those which are marked.

At 4K, Shadow of the Tomb Raider shows a few-FPS difference between the 1080 Ti SC2 with TAA on and off. This indicates a minimal overall performance impact, but it will offset our data a bit. The single 2080 Ti FE originally averaged 67FPS, with lows tightly timed at around 56-58FPS, meaning frametimes are very consistent with a single card. Multi-GPU got us 147FPS AVG, and the dual 1080 Tis got 113FPS AVG. These two numbers are directly comparable, as they were run under fully identical conditions: For SLI, it becomes even more difficult to justify the 2080 Tis versus the 1080 Tis, not that we fully endorse SLI as a good overall option.

F1 2018 GPU Benchmark – NVLink vs. SLI

F1 2018 also showed scaling. NVLinked 2080 Tis managed 168FPS AVG here, with 1% lows at around 69FPS. The single RTX 2080 Ti averaged 99FPS, with its 1% lows at 47FPS when stock. The result is scaling of about 70% – pretty good. It’s not as impressive as Sniper, but still a better gain than we’ve come to expect from SLI configurations over the last few years. As for the 1080 Tis in SLI, we measured 88FPS AVG and 57FPS 1% lows. A single 1080 Ti SC2 ran at 81FPS, giving us a dismal scaling of 9%.
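The scaling figures throughout are computed the same way – the multi-GPU average over the single-GPU average, minus one. A quick sketch using F1 2018’s numbers from above:

def scaling_gain(multi_fps, single_fps):
    """Percent gain of a multi-GPU result over its single-GPU baseline."""
    return (multi_fps / single_fps - 1) * 100

print(f"RTX 2080 Ti NVLink: {scaling_gain(168, 99):.0f}%")  # ~70%
print(f"GTX 1080 Ti SLI:    {scaling_gain(88, 81):.0f}%")   # ~9%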

Hellblade GPU Benchmark – NVLink RTX 2080 Ti

Hellblade is up next, included as a Dx11 Unreal Engine title. It has some of the best graphics in any game right now, making it a good benchmarking option, and it represents Unreal Engine 4 well. It did not show any scaling; in fact, we technically observed negative scaling with this title – a drop of about 16% in performance, with additional jarring tearing during gameplay. While researching for this content, we learned that a custom Hellblade SLI profile exists, but it is not an official nVidia profile. Out of the box, NVLink does not appear to work with Hellblade, though it looks like it could be hacked into working with some mods. The 1080 Tis in SLI likewise saw negative scaling.

GTA V NVLink Benchmark

GTA V is next. This is another Dx11 title, but it uses the RAGE engine and has been more heavily tuned for graphics hardware over its three-year tenure. It shows some scaling in averages, though not in low-end frametime performance. We posted 132FPS AVG with the NVLinked cards, as opposed to 77FPS AVG with a single FE GPU – an approximate 71% improvement in average framerate, but with frametime lows in lock-step with the single card: There is no improvement in low-end performance. The dual GTX 1080 Ti cards in SLI managed 117FPS AVG, a gain of 83% over a single GTX 1080 Ti SC2, allowing them to outperform a 2080 Ti when overclocked, albeit with the similar frametime consistency illustrated in the lows. We are nearing CPU limitations in this game, with hard limits around 170FPS AVG on our configuration.

Conclusion: Is NVLink or SLI Worth It in 2018?

It’s certainly gotten better since we last looked at multi-GPU game support. We have never once, in the 10-year history of the site, recommended multi-GPU for AMD or nVidia when given the stronger, more cost-effective single-card alternatives.

That trend continues, but with more hesitance in the answer than ever before. Overall game support is improved, but it’s clear – if only because of SOTTR’s dismal BSOD issues at launch – that games still won’t always be supported immediately. The market share of multi-GPU users is infinitesimal and off the radar of developers, at least without direct encouragement from the graphics vendors. You could be waiting weeks (or months – or ad infinitum) for multi-GPU support to get patched into games; most of them won’t have it at launch. A stronger user community than in previous years does mean more options if nVidia fails to officially provide SLI profiles, though. NVidia’s renewed focus on selling users sets of cards, rather than single cards, may also benefit multi-GPU support. The rise of low-level, low-abstraction APIs has also aided multi-GPU scalability: It is now more common to see scaling of 70% and up.

But we still don’t wholly recommend multi-GPU configurations, particularly given our present stance on the RTX lineup. When it doesn’t work, it burns, and nVidia does not have a strong track record in recent years for supporting its own technologies. VXAO and Flow are rarely used, if ever. MFAA vanished. SLI was shunted with Pascal, forced down to two-way and then forgotten. NVidia hasn’t even updated its own list of supported SLI games – and NVLink is SLI, in this regard – to include the most recent, compatible titles. The company couldn’t give more signals that it won’t support this technology, despite scaling actually improving year-over-year.

It’s better. That much is certain. It’s just a question of whether you can trust nVidia to continue pushing for multi-GPU adoption.

Editorial, Testing: Steve Burke
Video: Andrew Coleman

RTX 2080Ti with NVLINK — TensorFlow Performance (Includes Comparison with GTX 1080Ti, RTX 2070, 2080, 2080Ti and Titan V)

Table of Contents


  • Test system
    • Hardware
    • Software
  • Functionality and Peer-to-Peer Data Transfer Performance for 2 RTX 2080 Ti GPU’s with NVLINK
    • RTX 2080 Ti NVLINK "Capability" report from nvidia-smi nvlink -c
  • RTX 2080 Ti NVLINK Peer-To-Peer Performance:
    • simpleP2P
      • Does NVLINK with two NVIDIA RTX 2080 Ti GPU’s use both Links for CUDA Memory Copy?
    • p2pBandwidthLatencyTest
  • TensorFlow performance with 2 RTX 2080 Ti GPU’s and NVLINK
  • TensorFlow CNN: ResNet-50
    • ResNet-50 – GTX 1080Ti, RTX 2070, RTX 2080, RTX 2080Ti, Titan V – TensorFlow – Training performance (Images/second)
  • TensorFlow LSTM: Big-LSTM 1 Billion Word Dataset
    • "Big LSTM" – GTX 1080Ti, RTX 2070, RTX 2080, RTX 2080Ti, Titan V – TensorFlow – Training performance (words/second)
  • Should you get an RTX 2080Ti (or two, or more) for machine learning work?

This post is a continuation of the NVIDIA RTX GPU testing I’ve done with TensorFlow in NVLINK on RTX 2080 TensorFlow and Peer-to-Peer Performance with Linux and NVIDIA RTX 2080 Ti vs 2080 vs 1080 Ti vs Titan V, TensorFlow Performance with CUDA 10.0. The same job runs as done in those two previous posts are extended here with dual RTX 2080Ti’s. I was also able to add performance numbers for a single RTX 2070.

If you have read the earlier posts then you may want to just scroll down and check out the new result tables and plots.


Test system

Hardware

  • Puget Systems Peak Single
  • Intel Xeon-W 2175 14-core
  • 128GB Memory
  • 1TB Samsung NVMe M.2
  • GPU’s:
    • GTX 1080Ti
    • RTX 2070
    • RTX 2080 (2)
    • RTX 2080Ti (2)
    • Titan V

Software

  • Ubuntu 18.04
  • NVIDIA display driver 410.66 (from the CUDA install). NOTE: The 410.48 driver that I used in previous testing was causing system restarts during the big LSTM testing with 2 RTX 2080Ti’s and NVLINK.
  • CUDA 10.0 source builds of:
    • simpleP2P
    • p2pBandwidthLatencyTest
  • TensorFlow 1.10 and 1.4
  • Docker 18.06.1-ce
  • NVIDIA-Docker 2.0.3
  • NVIDIA NGC container registry
    • Container image: nvcr.io/nvidia/tensorflow:18.08-py3 for "Big LSTM"
    • Container image: nvcr.io/nvidia/tensorflow:18.03-py2 linked with NCCL and CUDA 9.0 for multi-GPU "CNN"

Two TensorFlow builds were used, since the latest version of the TensorFlow docker image on NGC does not support multi-GPU for the CNN ResNet-50 training test job I like to use. For the "Big LSTM" billion-word model training I use the latest container with TensorFlow 1.10 linked with CUDA 10.0. Both of the test programs are from "nvidia-examples" in the container instances.

For details on how I have Docker/NVIDIA-Docker configured on my workstation, have a look at the following post, along with the links it contains to the rest of that series: How-To Setup NVIDIA Docker and NGC Registry on your Workstation – Part 5 Docker Performance and Resource Tuning.


Functionality and Peer-to-Peer Data Transfer Performance for 2 RTX 2080 Ti GPU’s with NVLINK

RTX 2080 Ti NVLINK "Capability" report from nvidia-smi nvlink -c

There are two links available:

GPU 0: GeForce RTX 2080 Ti (UUID:

  • Link 0, P2P is supported: true
  • Link 0, Access to system memory supported: true
  • Link 0, P2P atomics supported: true
  • Link 0, System memory atomics supported: true
  • Link 0, SLI is supported: true
  • Link 0, Link is supported: false
  • Link 1, P2P is supported: true
  • Link 1, Access to system memory supported: true
  • Link 1, P2P atomics supported: true
  • Link 1, System memory atomics supported: true
  • Link 1, SLI is supported: true
  • Link 1, Link is supported: false

Those two links get aggregated over the NVLINK bridge!
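The capability report above comes straight from nvidia-smi. If you want to script the same check, here’s a small sketch (assumes the driver’s nvidia-smi binary is on your PATH and Python 3.7+):

import subprocess

# Query NVLink capabilities for GPU 0 -- the same command as the report above
report = subprocess.run(
    ["nvidia-smi", "nvlink", "-i", "0", "-c"],
    capture_output=True, text=True, check=True,
).stdout

links_with_p2p = sum(
    1 for line in report.splitlines() if "P2P is supported: true" in line
)
print(f"NVLink links reporting P2P support on GPU 0: {links_with_p2p}")  # expect 2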

RTX 2080 Ti NVLINK Peer-To-Peer Performance: simpleP2P

In summary, NVLINK with two RTX 2080 Ti GPU’s provides the following features and performance:

  • Peer-to-Peer memory access: Yes
  • Unified Virtual Addressing (UVA): Yes
  • cudaMemcpyPeer / cudaMemcpy between GPU0 and GPU1: 44.87GB/s

Does NVLINK with two NVIDIA RTX 2080 Ti GPU’s use both links for CUDA memory copy? Yes! That 44.87GB/s is twice the unidirectional bandwidth of the RTX 2080, which has a single link.

p2pBandwidthLatencyTest

The terminal output below shows that two RTX 2080 Ti GPU’s with NVLINK provide:

  • Unidirectional bandwidth: 48 GB/s
  • Bidirectional bandwidth: 96 GB/s
  • Latency (Peer-To-Peer disabled), GPU-GPU: 12 microseconds
  • Latency (Peer-To-Peer enabled), GPU-GPU: 1.3 microseconds

Bidirectional bandwidth over NVLINK with 2 RTX 2080 Ti GPU’s is nearly 100 GB/sec!

P2P Connectivity Matrix
     D\D     0     1
     0	     1     1
     1	     1     1
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1
     0 528.83   5.78
     1   5.81 531.37
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
   D\D     0      1
     0 532.21  48.37
     1  48.38 532.37
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1
     0 535.76  11.31
     1  11.42 536.52
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
   D\D     0      1
     0 535.72  96.40
     1  96.40 534.63
P2P=Disabled Latency Matrix (us)
   GPU     0      1
     0   1.93  12.10
     1  12.92   1.91

   CPU     0      1
     0   3.77   8.49
     1   8.52   3.75
P2P=Enabled Latency (P2P Writes) Matrix (us)
   GPU     0      1
     0   1.93   1.34
     1   1.34   1.92

   CPU     0      1
     0   3.79   3.08
     1   3.07   3.76
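Putting those measured numbers against the 100GB/sec aggregate spec for the two links, the bridge runs at roughly 96-97% efficiency, and enabling peer-to-peer cuts GPU-GPU latency by about 9x. Quick arithmetic on the matrices above:

NVLINK_SPEC_BI = 100.0                     # GB/s, aggregate bidirectional, 2 links
measured_uni, measured_bi = 48.37, 96.40   # from the P2P=Enabled matrices above
latency_off, latency_on = 12.10, 1.34      # microseconds, GPU0 -> GPU1

print(f"Unidirectional efficiency: {measured_uni / (NVLINK_SPEC_BI / 2):.0%}")  # ~97%
print(f"Bidirectional efficiency:  {measured_bi / NVLINK_SPEC_BI:.0%}")         # ~96%
print(f"P2P latency improvement:   {latency_off / latency_on:.1f}x")            # ~9.0x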

TensorFlow performance with 2 RTX 2080 Ti GPU’s and NVLINK

First, don’t expect miracles from that 100GB/sec bidirectional bandwidth…

The convolutional neural network (CNN) and LSTM problems I’ll test will not expose much of the benefit of using NVLINK. This is because their multi-GPU algorithms achieve parallelism mostly by distributing data as independent batches of images or words across the two GPU’s. There is little GPU-to-GPU communication. Algorithms with finer-grained parallelism that need more direct data and instruction access across the GPU’s would benefit more.
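To make that concrete, here is a minimal sketch of the data-parallel "tower" pattern these training jobs use (TensorFlow 1.x style; an illustration of the idea, not the actual nvcnn.py or big_lstm code). Each GPU works on its own shard of the batch, and the only cross-GPU traffic is the gradient averaging at the end of each step, which is why the NVLINK bandwidth buys relatively little here:

import tensorflow as tf  # TensorFlow 1.x API, as in the NGC containers used above

NUM_GPUS = 2
x = tf.placeholder(tf.float32, [None, 1024])
y = tf.placeholder(tf.int64, [None])
opt = tf.train.GradientDescentOptimizer(0.01)

# Split the incoming batch into independent per-GPU shards ("towers")
x_shards, y_shards = tf.split(x, NUM_GPUS), tf.split(y, NUM_GPUS)
tower_grads = []
for i in range(NUM_GPUS):
    with tf.device('/gpu:%d' % i):
        logits = tf.layers.dense(x_shards[i], 10, name='fc', reuse=(i > 0))
        loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=y_shards[i], logits=logits))
        tower_grads.append(opt.compute_gradients(loss))

# The only GPU-to-GPU communication per step: averaging the tower gradients
avg_grads = [(tf.reduce_mean(tf.stack([g for g, _ in gv]), axis=0), gv[0][1])
             for gv in zip(*tower_grads)]
train_op = opt.apply_gradients(avg_grads)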

The TensorFlow jobs that I have run with 2 GPU’s and NVLINK are giving around a 6-8% performance boost. That is right around the percentage cost increase of adding the NVLINK bridge. It looks like you get what you pay for, which is a good thing! I haven’t tested anything yet where the (amazing) bandwidth will really help. You may have ideas where that would be a big help? I have a lot more testing to do.

I am using benchmarks that I used in the recent post "NVLINK on RTX 2080 TensorFlow and Peer-to-Peer Performance with Linux". The CNN code I am using is from an older NGC docker image with TensorFlow 1.4 linked with CUDA 9.0 and NCCL. I’m using this in order to have multi-GPU support utilizing the NCCL communication library for the CNN code; the most recent version of that code does not support this. The LSTM "Billion Word" benchmark I’m running uses the newer version with TensorFlow 1.10 linked with CUDA 10.0.

I’ll give the command-line inputs for reference.

The tables and plots are getting bigger! I’ve been adding to the testing data over the last 3 posts. There is now a comparison of the GTX 1080 Ti, RTX 2070, 2080, 2080 Ti and Titan V.

TensorFlow CNN: ResNet-50

Docker container image tensorflow:18.03-py2 from NGC,

docker run --runtime=nvidia --rm -it -v $HOME/projects:/projects nvcr.io/nvidia/tensorflow:18.03-py2

Example command line for job start,

NGC/tensorflow/nvidia-examples/cnn# python nvcnn.py --model=resnet50 --batch_size=64 --num_gpus=2 --fp16

Note: --fp16 means "use tensor-cores".

GPU                       FP32 Images/sec    FP16 (Tensor-cores) Images/sec
RTX 2070                  192                280
GTX 1080 Ti               207                N/A
RTX 2080                  207                332
RTX 2080 Ti               280                437
Titan V                   299                547
2 x RTX 2080              364                552
2 x RTX 2080+NVLINK       373                566
2 x RTX 2080 Ti           470                750
2 x RTX 2080 Ti+NVLINK    500                776


TensorFlow LSTM: Big-LSTM 1 Billion Word Dataset

Docker container image tensorflow:18.09-py3 from NGC,

docker run --runtime=nvidia --rm -it -v $HOME/projects:/projects nvcr.io/nvidia/tensorflow:18.09-py3

Example job command-line,

/NGC/tensorflow/nvidia-examples/big_lstm# python single_lm_train.py --mode=train --logdir=./logs --num_gpus=2 --datadir=./data/1-billion-word-language-modeling-benchmark-r13output/ --hpconfig run_profiler=False,max_time=90,num_steps=20,num_shards=8,num_layers=2,learning_rate=0.2,max_grad_norm=1,keep_prob=0.9,emb_size=1024,projected_size=1024,state_size=8192,num_sampled=8192,batch_size=256
GPU                       FP32 words/sec
RTX 2070 (Note:1)         4740
GTX 1080 Ti               6460
RTX 2080 (Note:1)         5071
RTX 2080 Ti               8945
Titan V (Note:2)          7066
Titan V (Note:3)          8373
2 x RTX 2080              8882
2 x RTX 2080+NVLINK       9711
2 x RTX 2080 Ti           15770
2 x RTX 2080 Ti+NVLINK    16977

 

 

  • Note:1 With only 8GB of memory on the RTX 2070 and 2080, I had to drop the batch size down to 256 to keep from getting "out of memory" errors. That typically has a big (downward) influence on performance.
  • Note:2 For whatever reason, this result for the Titan V is worse than expected. This is TensorFlow 1.10 linked with CUDA 10 running NVIDIA’s code for the LSTM model. The RTX 2080Ti performance was very good!
  • Note:3 I re-ran the "big-LSTM" job on the Titan V using TensorFlow 1.4 linked with CUDA 9.0 and got results consistent with what I have seen in the past. I have no explanation for the slowdown with the newer version of "big-LSTM".
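Pulling the NVLINK deltas out of the two tables above gives the boost mentioned earlier (quick arithmetic on the table values):

def nvlink_uplift(with_nvlink, without):
    """Percent gain from adding the NVLINK bridge to a 2-GPU configuration."""
    return (with_nvlink / without - 1) * 100

print(f"ResNet-50 FP32, 2 x 2080 Ti: {nvlink_uplift(500, 470):.1f}%")      # ~6.4%
print(f"ResNet-50 FP16, 2 x 2080 Ti: {nvlink_uplift(776, 750):.1f}%")      # ~3.5%
print(f"Big LSTM,       2 x 2080 Ti: {nvlink_uplift(16977, 15770):.1f}%")  # ~7.7%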

Should you get an RTX 2080Ti (or two, or more) for machine learning work?

I’ve said it before … I think that is an obvious yes! For ML/AI work using fp32 or fp16 (tensor-cores) precision, the new NVIDIA RTX 2080 Ti looks really good. The RTX 2080 Ti may seem expensive, but I believe you are getting what you pay for. Two RTX 2080 Ti’s with the NVLINK bridge will cost less than a single Titan V and can give double (or more) the performance in some cases. The Titan V is still the best value when you need fp64 (double precision). I would not hesitate to recommend the 2080 Ti for machine learning work.

This post includes my first testing with the RTX 2070, and I’m not yet sure whether it is a good value for ML/AI work. From the limited testing here, though, it looks like it would be a better value than the RTX 2080 if you have a tight budget.

I’m sure I will do 4 GPU testing before too long and that should be very interesting.

Happy computing! –dbk

Tags: CUDA, Machine Learning, ML/AI, NVIDIA, NVLINK, RTX 2080 Ti, TensorFlow
