NVLink RTX 2080 Ti Benchmark: x16/x16 vs. x8 & GTX 1080 Ti SLI | GamersNexus
Test Platform – X299 / PCIe Bandwidth Limitation Testing
We are using the following components to benchmark PCIe bandwidth limitations:
| Component | | Courtesy of |
|---|---|---|
| CPU | Intel i9-7980XE 4.6GHz | Intel |
| GPU | This is what we’re testing! | Often the company that makes the card, but sometimes us (see article) |
| Motherboard | EVGA X299 DARK | EVGA |
| RAM | GSkill Trident Z Black 32GB 3600MHz (4 sticks) | GSkill |
| PSU | Corsair AX1600i | Corsair |
| Cooler | NZXT Kraken X62 | NZXT |
| SSD | ADATA S60 | GamersNexus |
On this platform, we are toggling between PCIe generations to create limitations on per-lane throughput, giving us visibility into potential limitations within the interface itself. This will help us determine the viability of the testing later in the content.
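For reference, the per-lane math behind these configurations is straightforward. Below is a quick sketch using standard PCIe link-rate figures (theoretical numbers with encoding overhead, not measured throughput):

```python
# Theoretical one-direction PCIe bandwidth per generation and lane count.
# Link-rate figures with encoding overhead included, not measured throughput.
GBPS_PER_LANE = {
    1: 2.5 * (8 / 10) / 8,     # Gen1: 2.5 GT/s, 8b/10b encoding   -> 0.25 GB/s per lane
    2: 5.0 * (8 / 10) / 8,     # Gen2: 5.0 GT/s, 8b/10b encoding   -> 0.50 GB/s per lane
    3: 8.0 * (128 / 130) / 8,  # Gen3: 8.0 GT/s, 128b/130b encoding -> ~0.985 GB/s per lane
}

def pcie_bandwidth_gbs(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s for a PCIe link."""
    return GBPS_PER_LANE[gen] * lanes

for gen, lanes in [(3, 16), (3, 8), (2, 8)]:
    print(f"PCIe {gen}.0 x{lanes}: ~{pcie_bandwidth_gbs(gen, lanes):.1f} GB/s per direction")
# PCIe 3.0 x16 ~15.8, 3.0 x8 ~7.9, 2.0 x8 ~4.0 GB/s per direction
```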
Test Methodology – Game Benchmarks
Testing methodology has completely changed from our last GPU reviews, which were probably for the GTX 1070 Ti series cards. Most notably, we have overhauled the host test bench and updated our game suite. Our game selection is a careful one: Time is finite, and, having analyzed our previous testing methodologies, we identified shortcomings where we were ultimately wasting time by testing too many games that didn’t provide meaningfully different data from our other tested titles. In order to better use the time available and test “smarter” (rather than “more,” which was one of our previous goals), we have selected games based upon the following criteria:
- Game Engine: Most games run on the same group of popular engines. By choosing one game from each major engine (e.g. Unreal Engine), we can ensure that we are representing a wide sweep of games that just use the built-in engine-level optimizations.
- API: We have chosen a select group of DirectX 11 and DirectX 12 API integrations, as these are the most prevalent at this time. We will include more Vulkan API testing as more games ship with Vulkan.
- Popularity: Is it something people actually play?
- Longevity: Regardless of popularity, how long can we reasonably expect that a game will go without updates? Updating games can hurt comparative data from past tests, which impacts our ability to cross-compare new and old data, as old data may no longer be comparable post-patch.
Game graphics settings are defined in their respective charts.
We are also testing most games at all three popular resolutions – at least, we are for the high-end. This includes 4K, 1440p, and 1080p, which allows us to determine GPU scalability across multiple monitor types. More importantly, this allows us to start pinpointing the reason for performance uplift, rather than just saying there is performance uplift. If we know that performance boosts harder at 4K than 1080p, we might be able to call this indicative of a ROPs advantage, for instance. Understanding why performance behaves the way it does is critical for future expansion of our own knowledge, and thus prepares our content for smarter analysis in the future.
For the test bench proper, we are now using the following components:
GPU Test Bench (Sponsored by Corsair)
| Component | | Courtesy of |
|---|---|---|
| CPU | Intel i7-8086K 5.0GHz | GamersNexus |
| GPU | This is what we’re testing! | Often the company that makes the card, but sometimes us (see article) |
| Motherboard | ASUS ROG Maximus X Hero | ASUS |
| RAM | Corsair Vengeance LPX 32GB 3200MHz | Corsair |
| PSU | Corsair AX1600i | Corsair |
| Cooler | NZXT Kraken X62 | NZXT |
| SSD | Plextor 256-M7VC | GamersNexus |
Separately, for the initial RTX 20-series reviews, we are using 10-series board partner models instead of reference models. This is because we know that most of the market, in fact, uses board partner models, and we believe this to be the most realistically representative and relatable for our audience. We acknowledge that the differences between the RTX and GTX reference cards would be more pronounced than when comparing partner cards, but much of this results from the poor reference cooler solutions of the previous generation. It creates, in our eyes, an unrealistically strong appearance for incoming cards with dual-axial coolers, and does not help the vast majority of users who own board partner model 10-series cards.
PCIe 3.0 Bandwidth Limitations: RTX 2080 Ti NVLink Benchmark
Ashes of the Singularity: Explicit Multi-GPU PCIe Bandwidth Test
Ashes of the Singularity is an incredibly interesting benchmarking tool for this scenario. Ashes uses explicit multi-GPU, a unique feature of Dx12 that allows multiple GPUs of varying make, and it communicates entirely via the PCIe bus. This means that the cards can’t lean on the 100GB/s of bandwidth provided to them by NVLink. Instead, all of that data transacts over the significantly more limited bandwidth of PCIe, which is limited to about 16GB/s in x16 mode. In our Titan V testing – and we’ll pop the old chart up on screen – we found that the PCIe bandwidth limits were finally being strained. Again, this is with no supporting bridge, and it’s the only title we know of that really makes use of multi-GPU like this.
For the 2080 Tis, we removed the NVLink bridge and tested them via explicit multi-GPU over the PCIe bus. This is to determine at what point we hit PCIe 3.0 limitations; with PCIe 4.0 looming, there’s been a lot of talk of EOL for 3.0. In Ashes, we found that our maximum performance was 127.2FPS AVG, averaged across 10 runs. Running in x8/x8, which would be common on Z370 platforms, we had a measurable and consistent loss that exited margin of error. The loss was about 1.7%. Not a big deal. Cutting per-lane throughput further by stepping the slots down a PCIe generation, as described in our test platform notes, we saw a massive performance penalty. The cards were now limited to 107FPS AVG, resulting in a 16% loss.
TimeSpy Extreme PCIe Bandwidth Limitation Benchmark
TimeSpy Extreme is an extremely useful synthetic tool for this type of benchmark, and it also hits GPU memory hard. We ran TimeSpy Extreme 5 times each on these cards, fully automated, and found a difference of 0.19% between x8/x8 and x16/x16 for GFX test 1, which is geometrically intensive. This is well within margin of test variance for 3DMark, amounting to effectively zero loss of performance between x8/x8 and x16/x16. Part of this is likely because of NVLink’s additional bandwidth, reducing reliance on the PCIe bus.
Firestrike Ultra PCIe 3.0 x16/x16 vs. x8/x8 with 2080 Ti
For Firestrike Ultra, we observed an FPS difference of about 1% — it was a 0.9% difference in GFX 1 and a 1.0% difference in GFX 2. We ended up running these an additional 5 times, for a total of 10 each, and found the results repeated. Firestrike has variance run-to-run, so we cannot with full confidence state that a difference exists – but if one does exist here, it amounts to a 1% advantage in x16/x16 versus x8/x8.
Negative scaling is of questionable existence at x8/x8, but certainly exists when forced down PCIe entirely (bypassing the NVLink bridge). We only know of one ‘game’ which does this presently, and that’s Ashes.
2x RTX 2080 Ti NVLink vs. GTX 1080 Ti SLI & Single RTX 2080 Ti
Sniper Elite 4 – NVLink Benchmark vs. RTX 2080 Ti & SLI 1080 Ti
Sniper Elite 4 produced some of the best scaling results, as it often does. This game is also the best DirectX 12 implementation we’re aware of, so its scaling will not apply to all games universally – it is an outlier, but a good one that can teach us a lot.
With our usual benchmark settings, the dual, NVLinked cards push past 200FPS and hit an average of 210FPS under non-overclocked settings. This outperforms the stock RTX 2080 Ti FE by about 94%, which is nearly perfect 2x scaling and has been rare to achieve in recent years – but it’s always exciting when we see it, because this is what multi-GPU should be like. Versus the overclocked, single 2080 Ti, we saw a performance gain of 71% with the stock 2080 Tis in NVLink. Not bad, and overclocking the two cards, although annoying for finding stability, would extend that lead. The GTX 1080 Tis in SLI are also on the chart for comparison.
The next major consideration is frametime consistency: Multi-GPU has traditionally shown terrible frame-to-frame interval consistency, often resulting in things like micro-stutter or intolerable tearing. For this pairing, as you can see in our frametime plot, the lows scale pretty well. It’s not a near-perfect 2x scaling like the average, but it’s pretty close. As a reminder, these plots are to be read as lowest is best, but more consistent is more important than just being a low interval. 16ms is 60FPS. Very impressive performance in this game, which is more a testament to Sniper 4’s development team than anything else – they have continued to build some of the best-optimized games in the space.
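As a side note on how numbers like these are derived: the chart metrics come from per-frame frametime logs. The snippet below is a minimal, generic sketch of the usual approach (average FPS plus 1% and 0.1% lows computed from the slowest frames), not our exact analysis pipeline:

```python
import numpy as np

def frametime_metrics(frametimes_ms):
    """Average FPS and 1% / 0.1% lows from a frametime trace (milliseconds per frame).
    Lows here are the average of the slowest 1% / 0.1% of frames, expressed as FPS."""
    ft = np.sort(np.asarray(frametimes_ms, dtype=float))[::-1]   # slowest frames first
    avg_fps = 1000.0 / ft.mean()
    low_1 = 1000.0 / ft[: max(1, len(ft) // 100)].mean()
    low_01 = 1000.0 / ft[: max(1, len(ft) // 1000)].mean()
    return avg_fps, low_1, low_01

# ~16.7ms per frame is 60FPS; a handful of 33ms spikes barely move the average
# but drag the 0.1% low down hard, which is why consistency matters more than the average.
trace = [16.7] * 990 + [33.3] * 10
print(frametime_metrics(trace))
```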
We also tested with Sniper 4 at Ultra settings, just to remove CPU bottleneck concerns. Here’s a quick chart with those results, although they aren’t too different.
Far Cry 5 RTX 2080 Ti x8/x8 NVLink vs. Single Card
Far Cry 5 and the Dunia engine also show some SLI or NVLink scaling support. At 4K/High, Far Cry 5 plots the RTX 2080 Ti single card at 74FPS AVG stock, or 83FPS AVG overclocked. Lows stick around 55-60FPS in each case. With dual cards, we manage 108FPS AVG, posting a gain of 46% over the single 2080 Ti stock card’s 74FPS AVG. That’s not nearly as exciting as the previous result, but at least it’s still some scaling. At under 50% scaling, though, you can’t help but feel like you’re only getting $600 of value out of your additional $1200 purchase. For the lows, we’re looking at a 0.1% of 60FPS, compared to a 0.1% of 55FPS on the stock 2080 Ti – not much improvement there. Let’s look at a more valuable frametime plot, as these 0.1% metrics don’t tell the whole story.
In our frametime chart, we can see the limitations of scaling. Although the NVLinked cards run a higher average, they fail to sustain similar scaling in frametime consistency. Frametimes are spikier and potentially more jarring, although raw framerate alone makes up for much of this lost frame-to-frame interval consistency.
Back to the main chart now, we also have the 1080 Ti cards in SLI to consider: In this configuration, the SLI 1080 Tis operate at 91.4FPS AVG, with spurious lows bouncing around between 42FPS and 66FPS for the 0.1% metric. For averages, the overall performance uplift amounts to about 60% over a single 1080 Ti SC2, and outperforms a single 2080 Ti FE card. Of course, there’ll be games where SLI gets you nothing, but instances like this will permit out-performing new hardware at the same price.
Shadow of the Tomb Raider GPU Benchmark – NVLink RTX 2080 Ti
Shadow of the Tomb Raider is a new game still and will eventually host RTX features, but didn’t at the time of filming. The game also uses a modified Crystal engine. It’s got a lot of issues with NVLink and SLI, and nVidia is aware of them. As of now, we have experienced blue screens of death upon launch, crashes upon minimizing, and other seemingly random crashes. Fortunately, we were eventually able to figure out how to work around these for long enough to run a benchmark – just know that the game is very unstable with multi-GPU. One of the other issues we discovered was constant blue screens with TAA enabled, which is unfortunately what we used for our full GPU review. For this reason, we retested the 1080 Ti with TAA off as well, just for a baseline. We did not retest all devices, only those which are marked.
At 4K, Shadow of the Tomb Raider shows a few-FPS difference between the 1080 Ti SC2 with TAA on and TAA off. This shows that there is minimal overall performance impact, but it will offset our data a bit. The 2080 Ti FE single-card averaged 67FPS originally, with lows tightly timed at around 56-58, meaning frametimes are very consistent with a single card. Multi-GPU got us 147FPS AVG, and the dual 1080 Tis got 113FPS AVG. These two numbers are directly comparable, as they were run under fully identical conditions. For SLI, it becomes even more difficult to justify the 2080 Tis versus 1080 Tis, not that we fully endorse SLI as a good overall option.
F1 2018 GPU Benchmark – NVLink vs. SLI
F1 2018 also showed scaling results. NVLinked 2080 Tis managed 168FPS AVG here, with 1% lows at around 69FPS. The RTX 2080 Ti single-card had an average of 99, with its 1% lows at 47FPS when stock. The result is scaling of about 70% — pretty good. It’s not as impressive as Sniper, but still a better gain overall than expected for SLI configurations over the last few years. As for the 1080 Tis in SLI, we measured them at 88FPS AVG and 57FPS for 1% lows. A single 1080 Ti SC2 ran at 81FPS, giving us a dismal scaling of 9%.
Hellblade GPU Benchmark – NVLink RTX 2080 Ti
Hellblade is up next, just for a Dx11 Unreal Engine title. This game has some of the best graphics in any game right now, making it a good benchmarking option, and it represents Unreal Engine 4 well. It did not show any scaling; in fact, we technically observed negative scaling with this title, with a drop of about 16% in performance and additional jarring tearing during gameplay. Doing research for this content, we learned that there is a custom Hellblade SLI profile out there, but it is not an official nVidia profile. Out of the box, it appears that NVLink does not work with Hellblade, but it also looks like some mods could hack it into working. The 1080 Ti saw negative scaling as well.
GTA V NVLink Benchmark
GTA V is next. This is another Dx11 title, but it uses the RAGE engine and has been more heavily tuned for graphics hardware over its three-year tenure. It also shows some scaling in averages, though not necessarily in low-end frametime performance. We posted a 132FPS AVG with the NVLinked cards, as opposed to a 77FPS AVG with a single FE GPU. The difference is an approximate 71% improvement in average framerate, but frametime lows remain in lock-step with the single card: There is no improvement in low-end performance. For dual GTX 1080 Ti cards in SLI, we managed 117FPS AVG, for a gain of 83% over a single GTX 1080 Ti SC2, allowing it to outperform a 2080 Ti when overclocked, albeit with similar frametime consistency, illustrated in the lows. We are nearing CPU limitations in this game, with hard limits around 170FPS AVG on our configuration.
Conclusion: Is NVLink or SLI Worth It in 2018?
It’s certainly gotten better since we last looked at multi-GPU game support. We have never once, in the 10-year history of the site, recommended multi-GPU for AMD or nVidia when given the stronger, more cost-effective single-card alternatives.
That trend continues, but it continues with more hesitance in the answer than ever before. Overall game support is improved, but it’s clear – if only because of SOTTR’s dismal BSOD issues at launch – that games still won’t be immediately supported. Marketshare of multi-GPU users is infinitesimal and off the radar of developers, at least without direct encouragement from the graphics vendors. You could be waiting weeks (or months – or ad infinitum) for multi-GPU support to get patched into games. Most of them won’t have it at launch. A stronger user community than previous years does mean more options if nVidia fails to officially provide SLI profiles, though. NVidia’s renewed focus on selling sets of cards to users, rather than one, may also benefit multi-GPU support. The rise of low-level, low-abstraction APIs has also aided in multi-GPU scalability. It is now more common to see 70% scaling and up.
But we still don’t wholly recommend multi-GPU configurations, particularly given our present stance on the RTX lineup. When it doesn’t work, it burns, and nVidia does not have a strong track record in recent years for supporting its own technologies. VXAO and Flow are rarely used, if ever. MFAA vanished. SLI was shunted with Pascal, forced down to two-way and then forgotten. NVidia hasn’t even updated its own list of supported SLI games – and NVLink is SLI, in this regard – to include the most recent, compatible titles. The company couldn’t give more signals that it won’t support this technology, despite scaling actually improving year-over-year.
It’s better. That much is certain. It’s just a question of whether you can trust nVidia to continue pushing for multi-GPU adoption.
Editorial, Testing: Steve Burke
Video: Andrew Coleman
RTX 2080Ti with NVLINK — TensorFlow Performance (Includes Comparison with GTX 1080Ti, RTX 2070, 2080, 2080Ti and Titan V)
Table of Contents
- Test system
- Hardware
- Software
- Functionality and Peer-to-Peer Data Transfer Performance for 2 RTX 2080 GPU’s with NVLINK
- RTX 2080 Ti NVLINK «Capability» report from nvidia-smi nvlink -c
- RTX 2080 Ti NVLINK Peer-To-Peer Performance: simpleP2P
- Does NVLINK with two NVIDIA RTX 2080 Ti GPUs use both Links for CUDA Memory Copy? p2pBandwidthLatencyTest
- TensorFlow performance with 2 RTX 2080 Ti GPU’s and NVLINK
- TensorFlow CNN: ResNet-50
- ResNet-50 – GTX 1080Ti, RTX 2070, RTX 2080, RTX 2080Ti, Titan V – TensorFlow – Training performance (Images/second)
- TensorFlow LSTM: Big-LSTM 1 Billion Word Dataset
- «Big LSTM» – GTX 1080Ti, RTX 2070, RTX 2080, RTX 2080Ti, Titan V – TensorFlow – Training performance (words/second)
- Should you get an RTX 2080Ti (or two, or more) for machine learning work?
This post is a continuation of the NVIDIA RTX GPU testing I’ve done with TensorFlow in NVLINK on RTX 2080 TensorFlow and Peer-to-Peer Performance with Linux and NVIDIA RTX 2080 Ti vs 2080 vs 1080 Ti vs Titan V, TensorFlow Performance with CUDA 10.0. The same job runs as in those two previous posts are extended here with dual RTX 2080 Ti’s. I was also able to add performance numbers for a single RTX 2070.
If you have read the earlier posts then you may want to just scroll down and check out the new result tables and plots.
Test system
Hardware
- Puget Systems Peak Single
- Intel Xeon-W 2175 14-core
- 128GB Memory
- 1TB Samsung NVMe M.2
- GPU’s
  - GTX 1080Ti
  - RTX 2070
  - RTX 2080 (2)
  - RTX 2080Ti (2)
  - Titan V
Software
- Ubuntu 18.04
- NVIDIA display driver 410.66 (from CUDA install) NOTE: The 410.48 driver that I used in previous testing was causing system restarts during the big LSTM testing with 2 RTX 2080Ti’s and NVLINK.
- CUDA 10.0 source builds of
  - simpleP2P
  - p2pBandwidthLatencyTest
- TensorFlow 1.10 and 1.4
- Docker 18.06.1-ce
- NVIDIA-Docker 2.0.3
- NVIDIA NGC container registry
  - Container image: nvcr.io/nvidia/tensorflow:18.08-py3 for «Big LSTM»
  - Container image: nvcr.io/nvidia/tensorflow:18.03-py2 linked with NCCL and CUDA 9.0 for multi-GPU «CNN»
Two TensorFlow builds were used since the latest version of the TensorFlow docker image on NGC does not support multi-GPU for the CNN ResNet-50 training test job I like to use. For the «Big LSTM billion word» model training I use the latest container with TensorFlow 1.10 linked with CUDA 10.0. Both of the test programs are from «nvidia-examples» in the container instances.
For details on how I have Docker/NVIDIA-Docker configured on my workstation have a look at the following post along with the links it contains to the rest of that series of posts. How-To Setup NVIDIA Docker and NGC Registry on your Workstation – Part 5 Docker Performance and Resource Tuning
RTX 2080 Ti NVLINK «Capability» report from nvidia-smi nvlink -c
There are two links available:
GPU 0: GeForce RTX 2080 Ti (UUID:
- Link 0, P2P is supported: true
- Link 0, Access to system memory supported: true
- Link 0, P2P atomics supported: true
- Link 0, System memory atomics supported: true
- Link 0, SLI is supported: true
- Link 0, Link is supported: false
- Link 1, P2P is supported: true
- Link 1, Access to system memory supported: true
- Link 1, P2P atomics supported: true
- Link 1, System memory atomics supported: true
- Link 1, SLI is supported: true
- Link 1, Link is supported: false
Those two links get aggregated over the NVLINK bridge!
In summary, NVLINK with two RTX 2080 Ti GPU’s provides the following features and performance:
- Peer-to-Peer memory access: Yes
- Unified Virtual Addressing (UVA): Yes
- cudaMemcpyPeer / cudaMemcpy between GPU0 and GPU1: 44.87GB/s

That is twice the unidirectional bandwidth of the RTX 2080.
Does NVLINK with two NVIDIA RTX 2080 Ti GPUs use both Links for CUDA Memory Copy? p2pBandwidthLatencyTest
The terminal output below shows that two RTX 2080 Ti GPU’s with NVLINK provide:
- Unidirectional Bandwidth: 48 GB/s
- Bidirectional Bandwidth: 96 GB/s
- Latency (Peer-To-Peer Disabled), GPU-GPU: 12 microseconds
- Latency (Peer-To-Peer Enabled), GPU-GPU: 1.3 microseconds
Bidirectional bandwidth over NVLINK with 2 2080 Ti GPU’s is nearly 100 GB/sec!
```
P2P Connectivity Matrix
     D\D     0     1
     0       1     1
     1       1     1
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1
     0  528.83   5.78
     1    5.81 531.37
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
   D\D     0      1
     0  532.21  48.37
     1   48.38 532.37
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1
     0  535.76  11.31
     1   11.42 536.52
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
   D\D     0      1
     0  535.72  96.40
     1   96.40 534.63
P2P=Disabled Latency Matrix (us)
   GPU     0      1
     0    1.93  12.10
     1   12.92   1.91
   CPU     0      1
     0    3.77   8.49
     1    8.52   3.75
P2P=Enabled Latency (P2P Writes) Matrix (us)
   GPU     0      1
     0    1.93   1.34
     1    1.34   1.92
   CPU     0      1
     0    3.79   3.08
     1    3.07   3.76
```
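A quick sanity check on those numbers, assuming the commonly cited NVLink 2.0 rate of ~25 GB/s per direction per link, with TU102 (RTX 2080 Ti) exposing two links and TU104 (RTX 2080) one:

```python
# Back-of-the-envelope NVLink arithmetic (assumed ~25 GB/s per direction per link).
per_link_gbs = 25.0   # NVLink 2.0, one direction
links_2080ti = 2      # TU102 (RTX 2080 Ti) exposes two links
links_2080   = 1      # TU104 (RTX 2080) exposes one link

uni_2080ti  = links_2080ti * per_link_gbs   # ~50 GB/s theoretical vs ~48 GB/s measured above
bidi_2080ti = 2 * uni_2080ti                # ~100 GB/s theoretical vs ~96 GB/s measured above
uni_2080    = links_2080 * per_link_gbs     # ~25 GB/s, i.e. half of the 2080 Ti figure
print(uni_2080ti, bidi_2080ti, uni_2080)
```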
First, don’t expect miracles from that 100GB/sec bidirectional bandwidth, …
The convolutional neural network (CNN) and LSTM problems I’ll test will not expose much of the benefit of using NVLINK. This is because their multi-GPU algorithms achieve parallelism mostly by distributing data as independent batches of images or words across the two GPU’s.
There is little use of GPU-to-GPU communication. Algorithms with finer grained parallelism that need more direct data and instruction access across the GPU’s would benefit more.
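To make that concrete, here is a toy sketch of the data-parallel pattern these training jobs use (generic NumPy pseudocode, not the actual NGC scripts): each GPU works on its own slice of the batch, and the only step that touches GPU-to-GPU bandwidth is averaging the gradients.

```python
import numpy as np

def data_parallel_step(weights, batch_x, batch_y, grad_fn, num_gpus=2, lr=0.01):
    """One data-parallel training step: shard the batch, compute gradients per 'GPU',
    then average them. Only the gradient exchange crosses the PCIe/NVLink boundary."""
    x_shards = np.array_split(batch_x, num_gpus)
    y_shards = np.array_split(batch_y, num_gpus)
    grads = [grad_fn(weights, xs, ys) for xs, ys in zip(x_shards, y_shards)]  # independent work
    avg_grad = sum(grads) / num_gpus                                          # the only inter-GPU traffic
    return weights - lr * avg_grad

# Toy example: gradient of a linear least-squares loss.
grad_fn = lambda w, x, y: 2.0 * x.T @ (x @ w - y) / len(x)
w = np.zeros(3)
x, y = np.random.randn(256, 3), np.random.randn(256)
w = data_parallel_step(w, x, y, grad_fn)
```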
The TensorFlow jobs that I have run with 2 GPU’s and NVLINK are giving around a 6-8% performance boost. That is right around the percentage cost increase of adding the NVLINK bridge. It looks like you get what you pay for, which is a good thing! I haven’t tested anything yet where the (amazing) bandwidth will really help. You may have ideas where that would be a big help? I have a lot more testing to do.
I am using the benchmarks from the recent post «NVLINK on RTX 2080 TensorFlow and Peer-to-Peer Performance with Linux». The CNN code I am using is from an older NGC docker image with TensorFlow 1.4 linked with CUDA 9.0 and NCCL. I’m using this in order to have multi-GPU support utilizing the NCCL communication library for the CNN code; the most recent version of that code does not support this. The «Billion Word» LSTM benchmark I’m running uses the newer container with TensorFlow 1.10 linked with CUDA 10.0.
I’ll give the command-line inputs for reference.
The tables and plots are getting bigger! I’ve been adding to the testing data over the last 3 posts. There is now comparison of GTX 1080 Ti, RTX 2070, 2080, 2080 Ti and Titan V.
TensorFlow CNN: ResNet-50
Docker container image tensorflow:18.03-py2 from NGC,
docker run --runtime=nvidia --rm -it -v $HOME/projects:/projects nvcr.io/nvidia/tensorflow:18.03-py2
Example command line for job start,
NGC/tensorflow/nvidia-examples/cnn# python nvcnn.py --model=resnet50 --batch_size=64 --num_gpus=2 --fp16
Note: --fp16 means «use tensor-cores».
GPU | FP32 Images/sec | FP16 (Tensor-cores) Images/sec |
---|---|---|
RTX 2070 | 192 | 280 |
GTX 1080 Ti | 207 | N/A |
RTX 2080 | 207 | 332 |
RTX 2080 Ti | 280 | 437 |
Titan V | 299 | 547 |
2 x RTX 2080 | 364 | 552 |
2 x RTX 2080+NVLINK | 373 | 566 |
2 x RTX 2080 Ti | 470 | 750 |
2 x RTX 2080 Ti+NVLINK | 500 | 776 |
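Just to put the table in scaling terms, here is simple arithmetic on the FP32 column above (numbers taken straight from the table, nothing new measured):

```python
# Multi-GPU scaling factors from the ResNet-50 FP32 numbers in the table above.
single      = 280   # images/sec, 1 x RTX 2080 Ti
dual        = 470   # 2 x RTX 2080 Ti, no bridge
dual_nvlink = 500   # 2 x RTX 2080 Ti + NVLINK

print(f"2 GPUs vs 1:          {dual / single:.2f}x")                   # ~1.68x
print(f"2 GPUs + NVLINK vs 1: {dual_nvlink / single:.2f}x")            # ~1.79x
print(f"NVLINK uplift:        {(dual_nvlink / dual - 1) * 100:.1f}%")  # ~6.4%, the 6-8% mentioned earlier
```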
TensorFlow LSTM: Big-LSTM 1 Billion Word Dataset
Docker container image tensorflow:18.09-py3 from NGC,
docker run --runtime=nvidia --rm -it -v $HOME/projects:/projects nvcr.io/nvidia/tensorflow:18.09-py3
Example job command-line,
/NGC/tensorflow/nvidia-examples/big_lstm# python single_lm_train.py --mode=train --logdir=./logs --num_gpus=2 --datadir=./data/1-billion-word-language-modeling-benchmark-r13output/ --hpconfig run_profiler=False,max_time=90,num_steps=20,num_shards=8,num_layers=2,learning_rate=0.2,max_grad_norm=1,keep_prob=0.9,emb_size=1024,projected_size=1024,state_size=8192,num_sampled=8192,batch_size=256
GPU | FP32 words/second |
---|---|
RTX 2070 (Note:1) | 4740 |
GTX 1080 Ti | 6460 |
RTX 2080 (Note:1) | 5071 |
RTX 2080 Ti | 8945 |
Titan V (Note:2) | 7066 |
Titan V (Note:3) | 8373 |
2 x RTX 2080 | 8882 |
2 x RTX 2080+NVLINK | 9711 |
2 x RTX 2080 Ti | 15770 |
2 x RTX 2080 Ti+NVLINK | 16977 |
- Note:1 With only 8GB of memory on the RTX 2070 and 2080 I had to drop the batch size down to 256 to keep from getting «out of memory» errors. That typically has a big (downward) influence on performance.
- Note:2 For whatever reason this result for the Titan V is worse than expected. This is TensorFlow 1.10 linked with CUDA 10 running NVIDIA’s code for the LSTM model. The RTX 2080Ti performance was very good!
- Note:3 I re-ran the «big-LSTM» job on the Titan V using TensorFlow 1.4 linked with CUDA 9.0 and got results consistent with what I have seen in the past. I have no explanation for the slowdown with the newer version of «big-LSTM».
Should you get an RTX 2080Ti (or two, or more) for machine learning work?
I’ve said it before … I think that is an obvious yes! For ML/AI work using fp32 or fp16 (tensor-cores) precision, the new NVIDIA RTX 2080 Ti looks really good. The RTX 2080 Ti may seem expensive, but I believe you are getting what you pay for. Two RTX 2080 Ti’s with the NVLINK bridge will cost less than a single Titan V and can give double (or more) the performance in some cases. The Titan V is still the best value when you need fp64 (double precision). I would not hesitate to recommend the 2080 Ti for machine learning work.
I did my first testing with the RTX 2070 in this post, and I’m not sure yet whether it is a good value for ML/AI work. From the limited testing here, though, it looks like it would be a better value than the RTX 2080 if you have a tight budget.
I’m sure I will do 4 GPU testing before too long and that should be very interesting.
Happy computing! –dbk
Tags: CUDA, Machine Learning, ML/AI, NVIDIA, NVLINK, RTX 2080 Ti, TensorFlow
Help in choosing — Difficulties in choosing Palit 2080 | Page 3
Kullogr4mm
Forum friend
#41
dsg8 said:
according to rumors, the 2070 Ti Super will be presented at E3 in a couple of days, and refreshed cards across the line will be released; they promise to cut prices on the current RTX cards by 100 bucks
SLI was good with the HB bridge, but it is already outdated; it is almost never used in new games

I wonder how the type of bridge affects whether a game has SLI support or not. What kind of nonsense are you writing? At least google the technical information before posting something like that. A bridge cannot become obsolete; new bridges simply have higher bandwidth, and the FPS discrepancy between a regular bridge and the HB one is at the level of error, so there are practically no advantages other than appearance. In the RTX generation they simply changed the connection interface; this does not affect SLI support in any way. https://overclockers.ru/lab/show/83…oj-geforce-gtx-1070-kogda-svedutsya-vse-mosty
6xRTX 3080 Ti + 5950X.
MomkinGamer
Forum friend
#42
Kullogr4mm said:
The bridge cannot become obsolete

Maybe he meant the NVLink bridge?
Kullogr4mm
Forum friend
#43
MomkinGamer said:
maybe he meant NVLink Bridge?

There is no NVLink on the 10 series. But that is not the point; the point is that NVLink gives no advantage in SLI scaling in real games compared to the old bridges, even though its throughput is several times higher. Perhaps in the future, when the cards are 5x more powerful, there will be a difference. I tested the cards on PCIe 3.0 x1 through a riser in benchmarks; compared to PCIe 3.0 x16, performance drops by only 5-10%.
6xRTX 3080 Ti + 5950X.
Kullogr4mm
Forum friend
#44
Just 2x2080 is unreasonable for gaming in SLI, I think: more power draw from the wall, you have to remove the heat, it requires more space and a powerful power supply, the NVLink bridge costs 9k rubles, and the payoff is zilch. It is better to take one 2080 Ti and overclock it well. For 4K or 2K at 144Hz, though, there is nowhere else to go: you need 2x2080 Ti for comfortable fps, because there is nothing more powerful than the 2080 Ti for ordinary users. As the owner of a 9800GX2, GTX 260 SLI, a 5970 (I bought it in the summer of 2010; oh, why didn’t I mine bitcoin back then) and GTX 560 SLI, I’ll say this: oh, how I suffered with them, and cursed when the second GPU did not work; it would have been better to take a GTX 580. And at that time, SLI scalability and support were many times better than they are now.
6xRTX 3080 Ti + 5950X.
lalka
Experienced
#45
JetStream, of course; there is nothing to even think about.
skarm
#46
On the subject: personally, I am for the SJS. They run great and stay cool.
MiningFamily(ECPiCo)
Nexthell
Own person
#47
The 2080 GameRock (not the Pro) is perfect. Fans at 66% and it heated up to only 63°C.
dsg8
Experienced
#48
Kullogr4mm said:
I wonder how the type of bridge affects whether a game has SLI support or not. What kind of nonsense are you writing? At least google the technical information before posting something like that. A bridge cannot become obsolete; new bridges simply have higher bandwidth, and the FPS discrepancy between a regular bridge and the HB one is at the level of error, so there are practically no advantages other than appearance. In the RTX generation they simply changed the connection interface; this does not affect SLI support in any way. https://overclockers.ru/lab/show/83…oj-geforce-gtx-1070-kogda-svedutsya-vse-mosty

I don’t know how you read my post; the technology itself has been recognized as obsolete by nvidia. This is the 3rd generation of bridges, which appeared with Pascal. As for the nonsense I supposedly write: I tested it myself in 4K, and there are videos on YouTube; you get almost 2x from two cards if both are loaded close to 90%. I was not going to compare with previous versions of bridges, but the difference there is under 20% compared to the flex bridge.
SLI is finished; NVLink has arrived.
nikolai
Forum friend
#49
Chopper said:
Excuse me, how many 4K monitors do you have?
I think not a single one...
P.S. A tube amplifier is the real thing; no digital gear has ever come close to it. The speakers should be a match for it, too; back in the USSR there were very decent S-90s, which Panasonic quietly bought up at one time, rebadged under its own brand and sold at exorbitant prices.

Seriously? Should I also go buy an S-90 and a Brig 001 or something?
dsg8
Experienced
#50
Kullogr4mm said:
There is no NVLink on the 10 series. But that is not the point; the point is that NVLink gives no advantage in SLI scaling in real games compared to the old bridges, even though its throughput is several times higher. Perhaps in the future, when the cards are 5x more powerful, there will be a difference. I tested the cards on PCIe 3.0 x1 through a riser in benchmarks; compared to PCIe 3.0 x16, performance drops by only 5-10%.

NVLink was really invented for server GPU clusters; SLI is simply outdated.
dmitriev11
Forum friend
#51
So the payback period, as I understand it, interests nobody here? Your 2080 that cost 45 thousand rubles now brings in 2 thousand a month.
Just to mine?)
Then take the most expensive card and make the lighting look cool!
At least you won’t be ashamed in front of the boys! You’re not some pauper!
Kullogr4mm
Forum friend
#52
dsg8 said:
nvlink was invented for server gpu clusters, sli is just outdated

Don’t confuse things: SLI is a technology, NVLink is a bus. SLI hasn’t gone anywhere.
6xRTX 3080 Ti + 5950X.
Anakoly
Forum friend
#53
nikolyai said:
Seriously? Should I also go buy an S-90 and a Brig 001 or something?

Nowadays there are simpler options: the Edifier 2700/2800, or the 2730DB with BT. I myself had an S-90 plus an Amfiton 25U-202S. I took the 2800 and didn’t lose anything; in my opinion the sound is even better. You can watch videos on YouTube.
Sergios5
Forum friend
#54
A 2080 is not enough for an 8K Samsung monitor; you need at least 2x2080 Ti to drive it… So think hard about buying a 2080: it is not enough even today… I would not take it at all…
nikolai
Forum friend
#55
Anakoly said:
Nowadays there are simpler options: the Edifier 2700/2800, or the 2730DB with BT. I myself had an S-90 plus an Amfiton 25U-202S. I took the 2800 and didn’t lose anything; in my opinion the sound is even better. You can watch videos on YouTube.

Nope. I like old things; there is something about them.
Anakoly
Forum friend
#56
nikolyai said:
Nope. I like old things; there is something about them.

Well, yes, that’s a matter of taste.
Pro100Mininggg
Own person
#57
Sergios5 said:
A 2080 is not enough for an 8K Samsung monitor; you need at least 2x2080 Ti to drive it… So think hard about buying a 2080: it is not enough even today… I would not take it at all…

Don’t worry, I don’t have a 7-grand 8K monitor. By the way, that works out to about 1000 dollars per 1K, or roughly a dollar per pixel)))
As soon as I buy one, I will swap the cards for 2x2080 Ti, or whatever AMD rolls out by then, who knows. At such a price for a monitor, video cards are a consumable.
dmitriev11 said:
So the payback period, as I understand it, interests nobody here? Your 2080 that cost 45 thousand rubles now brings in 2 thousand a month.
Just to mine?) Then take the most expensive card and make the lighting look cool!
At least you won’t be ashamed in front of the boys! You’re not some pauper!

Oh! Troll detected! How would we manage without you; in a topic like this you are our dear guest! We’ve been waiting for you here! When you get back from your school leavers’ bell, come over, I’ll show you the light bulbs and LEDs in the system unit with the glass cover, and we’ll play tanks, or whatever is in fashion with the school crowd these days. It’s the holidays now, so don’t sit on the forums trolling out of boredom... On integrated Intel graphics only the browser runs properly, and even that lags and won’t give more than 60 fps. Come over, don’t be shy!
nikolyai said:
Nope. I like vintage things; there is something about them.

It’s a shame the S-90s are not shielded. Oh, how many cassettes they ruined for me back in the day. I still miss some of them, although I don’t listen to them anymore… there is nothing to play them on)
Alex1908
Experienced
#58
Satoshi for Tatoshi said:
but you can’t be the coolest gamer without 4k, that’s the question

Yes, this is a loss of losses.)))))))
Kullogr4mm said:
Lies. You just didn’t crank the graphics to max. Even for a 1070 at 1080p there are a dozen games where it will drop to 35-45 fps, and that’s with a 9900K. Run Watch Dogs 2 or the latest Assassin’s Creed and any 4-core without HT, even an 8350K@4.8, will be pinned at 100%.

Maybe. I just set the graphics to maximum, and maybe I needed to adjust something else. I don’t particularly watch the FPS counter, because I look at whether the game runs without stutter, not at the instrument readings.
Soup and porridge is our food! — said the master to the serf, eating the sturgeon with black caviar.
Pro100Mininggg
Own person
#59
And so! In the end I took the JetStream. I’ll start with the cons:
1. It sits right up against the second video card. Minimal clearance!
2. The factory overclock is lower than the GamingPro OC’s; at stock it really is weaker than the GamingPro OC, but we all know how to fix that)
Pros:
1. Much better cooling, but if you put the card above the second one, it will overheat. It simply has nowhere to draw cold air from.
2. When overclocked, it is faster than the GamingPro OC. I don’t know what that comes down to. And it holds its overclock better. Maybe that is just my sample.
3. The overclocking potential is higher: two 8-pin connectors instead of 6+8 like the GamingPro, and in general the card is the higher-tier model.
How I beat the overheating: first, I put the JetStream below the second card, so the second card can «breathe» and the JetStream does not heat up anyway.
Second, I put a big fan on top of the cards so that they get fresh air. I think that with a fan in the side panel they would not heat up even with the case closed, but my case has the side panel open.