Navi 5700x: Amazon.com: MSI Gaming Radeon Rx 5700 Xt 256-bit 8GB GDDR6 HDMI/DP Dual Fans Crossfire Freesync Navi Architecture Graphics Card (RX 5700 Xt Gaming X) : Everything Else

AMD Navi RX 5700 XT & Ryzen Specs, Overclocking, Architecture, Mem OC | GamersNexus

AMD’s technical press event bore information for both AMD Ryzen and AMD Navi, including overclocking information for Ryzen, Navi base, boost, and average clocks, architectural information and block diagrams, product-level specifications, and extreme overclocking information for Ryzen with liquid nitrogen. We understand both lines better now than before and can brief you on what AMD is working on. We’ll start with Navi specs, die size, and top-level architectural information, then move on to Ryzen. AMD also talked about ray tracing during its tech day, throwing some casual shade at NVIDIA in so doing, and we’ll also cover that here.

First, note that AMD did not give pricing to the press ahead of its livestream at E3, so this content will be live right around when the prices are announced. We’ll try to update with pricing information as soon as we see it, although we anticipate our video’s comments section will have the information immediately. UPDATE: Prices are $450 for the RX 5700 XT, $380 for the RX 5700.

AMD’s press event yielded a ton of interesting, useful information, especially on the architecture side. There was some marketing screwery in there, but a surprisingly low amount for this type of event. The biggest example was taking a thermographic image of two heatsinks to try and show comparative CPU temperature, even though the range was 23 to 27 degrees, which makes the delta look astronomically large despite being within common measurement error. Also, the heatsink actually should be hot because that means it’s working, and taking a thermographic image of a shiny metal object means you’re mostly showing reflected room temperature or encountering issues with emissivity; ultimately, they should just be showing junction temperature anyway. This was our only major gripe with the event. Otherwise, the information was technical, detailed, and generally free of marketing BS. Not completely free of it, but mostly. The biggest issue with the comparison was the 28-degree result that exited the already silly 23-27 degree range, making it look like 28 degrees was somehow massively overheating.

Let’s start with the GPU side.

GPU Specs, Frequency, Die Size

AMD has two GPUs it plans to release in July of 2019, with exact date TBD at time of filming. Ryzen 3000 mostly launches on July 7th, so the GPUs will push around the same time. The two cards will be the RX 5700 XT and RX 5700 non-XT, both moving away from GCN and toward the new RDNA architecture. We’ve known GCN to be extremely limited in cache bandwidth and in scalability past 56 CUs, so RDNA will be a refreshing change that should be interesting to study. GCN will stay alive for the high-performance deep learning and non-gaming market segments, but RDNA will be the gaming architecture going forward.

The RX 5700 XT will host 40 compute units with 64 stream processors per CU, or 2560 SPs for the 40 CU RX 5700 XT. This is the same SP per CU count as Vega 10, also at 64 stream processors per CU, but note that you can’t linearly compare the streaming processors 1:1. In an AMD performance demo, the company showed to press that its Navi 10 GPU – that’s the one used in the 5700 series – posts about 14% higher performance versus Vega 56 with about 23% lower power consumption. This was an AMD internal demo, so we obviously need to validate at homebase, but that uplift is promising for RDNA’s significantly smaller first GPU. Speeds were listed to media as 1905MHz boost, 1755MHz for gaming workloads, and 1605MHz base. The delta between boost and gaming tells us that AMD’s chosen blower cooler was once again a poor decision, to no one’s surprise, as there should be more thermal and power headroom to clock higher for gaming workloads if using a proper cooler. We’ll talk about the negatives more later, let’s get back to specs and positives first.
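As a rough sanity check on those demo figures, the performance and power numbers can be combined into a single performance-per-watt ratio. This is a minimal sketch, assuming the 14% and 23% figures apply to the same workload and that the ratio is simply relative performance divided by relative power; the function name is our own and nothing here has been validated by testing.

```python
# Figures quoted from AMD's internal demo (Navi 10 vs. Vega 56).
perf_gain = 0.14   # about 14% higher performance
power_drop = 0.23  # about 23% lower power consumption


def perf_per_watt_ratio(perf_gain: float, power_drop: float) -> float:
    """Relative performance per watt: (1 + gain) / (1 - drop)."""
    return (1.0 + perf_gain) / (1.0 - power_drop)


ratio = perf_per_watt_ratio(perf_gain, power_drop)
print(f"Performance per watt vs. Vega 56: {ratio:.2f}x")
```

Under those assumptions, the demo numbers work out to a little under 1.5x the performance per watt of Vega 56.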

As for GPU size, AMD also provided that information to press: Navi 10 measures 251mm². This is important for a lot of reasons: by significantly cutting the die size, and also by eliminating the highly expensive HBM2, AMD is able to cut its manufacturing cost significantly and finally start competing with NVIDIA directly on price. AMD claims its RX 5700 XT will be the direct RTX 2070 competitor.

Either way, we know for a fact that Navi 10 GPU and GDDR6 costs will be lower than Vega 10 and HBM2; furthermore, if AMD’s claims of increased performance per watt with increased performance per area prove true, Navi may be a re-entry into the gaming market. We still want to see a flagship AMD card: AMD hasn’t had a true head-to-head top-end competitor with NVIDIA for a while, with Radeon VII being the closest but still falling short. A flagship class card is needed for halo marketing so that sales cascade down the stack, but for now, it looks like another mid-range foray by AMD. It’s just that this one is more promising than the refreshes and power-hungry brutes we’ve seen in the past. We’d advise, as always, tempering hype until we can see third-party benchmarks and numbers.

In short, the RX 5700 XT has 40 CUs, 8GB GDDR6 at 14Gbps for 448GB/s bandwidth, 1755MHz gaming frequencies, and 1905MHz boost, likely limited by the blower cooler. We’ll have to see what better solutions can do with more serious overclocks at a later date.

The RX 5700 non-XT will cut down to 36 CUs and 2304 stream processors, with an 8GB GDDR6 framebuffer and GPU speeds of 1725MHz boost, 1625MHz “gaming,” and 1465MHz base.
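The stream processor counts follow directly from the CU counts, and the same figures give a theoretical peak FP32 number. The sketch below assumes each stream processor retires one fused multiply-add (2 FLOPs) per clock at the quoted boost clock; that factor is our assumption and not something stated in the briefing, and real gaming throughput will be lower.

```python
SP_PER_CU = 64          # stream processors per compute unit (from the spec briefing)
FLOPS_PER_SP_CLOCK = 2  # assumption: one fused multiply-add per SP per clock


def stream_processors(compute_units: int) -> int:
    """Total stream processors for a given CU count."""
    return compute_units * SP_PER_CU


def peak_fp32_tflops(compute_units: int, clock_mhz: float) -> float:
    """Theoretical peak FP32 throughput in TFLOPS at the given clock."""
    flops = stream_processors(compute_units) * FLOPS_PER_SP_CLOCK * clock_mhz * 1e6
    return flops / 1e12


# (compute units, boost clock in MHz) as listed to media
cards = {"RX 5700 XT": (40, 1905), "RX 5700": (36, 1725)}

for name, (cus, boost_mhz) in cards.items():
    sps = stream_processors(cus)
    tflops = peak_fp32_tflops(cus, boost_mhz)
    print(f"{name}: {sps} SPs, {tflops:.2f} TFLOPS peak FP32 at boost")
```

Swapping in the “gaming” clocks instead of boost gives a more conservative estimate of sustained throughput.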

AMD was forced to acknowledge real-time ray tracing this time, but doesn’t have immediate support plans. From our end, we think this is fine, and that the market demand for real-time ray tracing features is low enough to be mostly irrelevant. That didn’t stop AMD from casting stones at NVIDIA, though. AMD mostly took the easy jabs, like its stated goal to “drive ecosystem so that gamers can take full advantage of the features they pay for.” Other digs included references to delayed uptake on RTX features. AMD is looking at support in the future, but isn’t immediately concerned with real-time ray tracing support.

As a quick aside, note that AMD is planning a 7nm+ Navi GPU for late 2020 or 2021, and that it is aiming to support some form of hybrid ray-tracing in the future.

RX 5700 XT & 5700 New Overclocking Limits

Total board power should be 225W for the 5700 XT and 180W for the 5700 non-XT. We asked if VBIOS would be locked and, due to security concerns, AMD is locking down custom VBIOS modifications. We then asked if powerplay table hacks will still work this generation, but we were told that we’d have to learn new tricks to bypass the power limits. That’s a little bit of a let-down from the fringe use case of power modding, but we’re hopeful that we can find new power bypasses.

The reference card is a blower cooler with a vapor chamber, a design we’ve seen struggle in the past, and AMD claims to be listening to feedback. Unfortunately, the company still went with a blower cooler, but it has listened to some parts of criticism. One of those was noise levels, where AMD is now capping its noise level to 43dBA, although we don’t know at what distance that noise was measured. We do know previous cards stretched upwards of 59dBA with a 20” measurement.

The reference 5700 XT PCB has a 7-phase VRM. AMD couldn’t answer questions about what power stages were used, so that information doesn’t mean anything right now, but we did learn that the 5700 XT should be a 6+1 layout and that the 5700 should be 5+1. We also noticed that the cards don’t have a VBIOS switch, but we’re not sure if that’ll change for the full retail product. It probably will be single VBIOS.

Ryzen 3000 Memory OC, XOC 16-Core, PCIe Gen

Before diving deeper, let’s take a break from GPUs to talk CPUs. On the CPU side of things, we learned that AMD was hitting 5.35GHz all-core on the 16-core CPU with liquid nitrogen when using an MSI Godlike motherboard; this matches what we heard previously, and as a reminder, we also had a motherboard manufacturer at Computex tell us they were hitting 5.5GHz and trying to approach 6GHz on the 12-core CPU. Critically, overclockers will no longer have to step down to PCIe Gen1 for benching and can run full PCIe Gen3 or Gen4. Cold bugs will depend on the CPU, and AMD was running its XOC at about -150 degrees Celsius.

For memory overclocking, Ryzen 3000 is supposed to fairly easily support 3600MHz XMP without issue, and we’ve also seen clocks on MSI boards up to 5100MHz with timings at 18-21-21-56-1T. That 5100MHz overclock was done with air cooling on the CPU and is a result of better trace layout and a new controller design.
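To put that 5100MHz result in perspective, the first timing can be converted into an absolute latency. This is a minimal sketch, assuming the quoted “MHz” figure is the effective transfer rate in MT/s (two transfers per memory clock) and that the first number in the timing string is CAS latency in clock cycles; the helper name is our own.

```python
def cas_latency_ns(transfer_rate_mts: float, cas_cycles: int) -> float:
    """Convert CAS latency in cycles to nanoseconds.

    Assumes double data rate: memory clock (MHz) = transfer rate (MT/s) / 2.
    """
    clock_mhz = transfer_rate_mts / 2
    return cas_cycles / clock_mhz * 1000  # cycles / MHz = microseconds; x1000 = ns


# The MSI overclock mentioned above: 5100 "MHz" at 18-21-21-56-1T
latency = cas_latency_ns(5100, 18)
print(f"CL18 at 5100 MT/s is roughly {latency:.2f} ns")
```

The takeaway is that a high CAS cycle count at a very high transfer rate can still mean a short absolute latency.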

Ryzen 3000 Architectural & Gaming Improvements

Some of the more noteworthy architectural changes to Ryzen 3000 CPUs focus on higher IPC and higher gaming performance. Interestingly, in AMD’s engineering discussion, the senior architects and AMD fellows thought Zen2 would see frequencies regressing. Engineers originally thought the jump to 7nm would result in lower clocks, relegating Zen2 to a server part. The team was able to figure it out, though, and managed to boost frequencies overall.

Key changes included the following:

AMD doubled the size of the floating point engine to 256-bit, which means AVX-256 can now run single-cycle on Zen2, so that’s a big change

AMD has doubled its load/store bandwidth for data storage and movement

Integer execution was also improved, primarily by reducing resource contention during integer operations

AMD also doubled the L3 cache size. More data in local cache means fewer hits to memory and therefore reduced effective latency to system memory. AMD also highlighted that improved prefetching algorithms further reduce the effective memory latency. As an aside, AMD has made the unfortunate decision to rebrand L3 cache as “Game Cache,” so if you see that name in the future, that’s what it means.

AMD showed some more internal benchmark charts, and while we know there is improvement in gaming, we’d again advise to wait for our own benchmark testing. Most of the internal charts looked GPU-constrained, for instance, and so we will not be reproducing them here as we believe them to be misleading. We do anticipate performance uplift if only from the IPC and frequency jump, both of which will have direct impact. Memory is also more easily tuned to 3600MHz and beyond now, as mentioned in the OC section.

AMD also walked through architectural changes that we’ll speak to in more depth approaching launch. For some quick stats, we learned that AMD has moved its integer execution to a 92-entry integer scheduler from 84 previously, it has increased physical register file entries from 168 to 180, grown the reorder buffer from 192 to 224, which is meant to help with making-up for the latency hit by increasing L3 size, and AMD has increased the store queue from 44 to 48, with increased load/store bandwidth by 2x.
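For a sense of scale, those buffer changes can be expressed as percentage growth. The sketch below uses only the before and after entry counts quoted above; the dictionary labels and helper name are our own.

```python
# (previous entries, Zen 2 entries) as quoted in AMD's briefing
changes = {
    "integer scheduler": (84, 92),
    "physical register file": (168, 180),
    "reorder buffer": (192, 224),
    "store queue": (44, 48),
}


def percent_growth(old: int, new: int) -> float:
    """Percentage increase going from old to new."""
    return (new - old) / old * 100


growth = {name: percent_growth(old, new) for name, (old, new) in changes.items()}

for name, pct in growth.items():
    old, new = changes[name]
    print(f"{name}: {old} -> {new} entries (+{pct:.1f}%)")
```

The reorder buffer sees the largest relative increase of the four, which fits AMD’s framing of it as compensation for the larger L3.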

We have a lot more to talk about in this storyline. AMD gave us multiple pages of additional notes on architecture, product information, and software, and we’ll be digging through this incrementally over the next few days. For now, this content piece will get us started. The RX 5700 XT and RX 5700 pricing will both be added to this content as we receive them, just note there may be a delay as we’ll be on planes home around when the AMD stream goes live. Note also that we’ll be keeping an eye out for more 3950X news from AMD.

More to come. We’re flying back home, so we’ll write-out the rest on the plane.

Editorial, Host: Steve Burke
Video: Keegan Gallick

Navi and the RDNA Architecture – WikiChip Fuse

February 23, 2020; May 25, 2021

David Schor

7 nm, AMD, ISSCC, ISSCC 2020, N7P, Navi, Radeon, Radeon RX 5700

Last year AMD launched the Radeon RX 5700 Series. This GPU is part of AMD’s latest Navi microarchitecture which makes use of the new RDNA architecture. At ISSCC AMD gave a short talk about the 5700 and some of its design points. The speaker was Sal Dasgupta, a director at AMD who also served as the lead of the power and performance team on the RX 5700 project.

The reference board for the Radeon RX 5700 XT is shown in the slide below. As far as display is concerned, there is support for DisplayPort 1.4 and HDMI 2.0, which supports 4K at 60 fps and 1080p at 240 fps. From a multimedia point of view, there is decode and encode support for H.264 and H.265. Although this isn’t in the slide, there is also VP9 decode support. The board interfaces with the system over sixteen lanes of PCIe Gen4. Finally, for the DRAM interface, the chip supports a 256-bit GDDR6 interface operating at 14 GT/s.

The 5700 uses GDDR6 memory. AMD has gone back and forth between HBM and GDDR a couple of times. Compared to GDDR5, GDDR6 operates at nearly double the data rate (14 GT/s vs 8). With an interface 256-bit wide, you are looking at a peak bandwidth of 448 GB/s which is 75% higher than the RX 580 but 8% lower than the Vega 64. Sal explained that the switch back from HBM2 to GDDR6 was really due to the platform cost. Given all the changes that were made and taking into account the platform power (incl. GDDR6 memory) and the bandwidth provided, it was determined that this was suitable enough for their target performance at the target cost. It goes without saying that future models might go a different route.
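The 448 GB/s figure falls out of the data rate and bus width directly. Here is a minimal sketch of that arithmetic; the RX 580 comparison assumes that card also uses a 256-bit bus with the 8 GT/s GDDR5 mentioned above, which is our assumption rather than something stated in the talk.

```python
def peak_bandwidth_gbs(data_rate_gts: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s: transfers per second x bytes per transfer."""
    return data_rate_gts * bus_width_bits / 8


rx5700_bw = peak_bandwidth_gbs(14, 256)  # GDDR6 at 14 GT/s, 256-bit
rx580_bw = peak_bandwidth_gbs(8, 256)    # assumption: GDDR5 at 8 GT/s, 256-bit

uplift_pct = (rx5700_bw / rx580_bw - 1) * 100

print(f"RX 5700: {rx5700_bw:.0f} GB/s")
print(f"RX 580 (assumed config): {rx580_bw:.0f} GB/s")
print(f"Uplift: {uplift_pct:.0f}%")
```

With the bus width held constant, the bandwidth gain is purely the ratio of the two data rates.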

In addition to the interface itself, there are a couple of new power-related features that come with this move. Read Data Strobe (RDQS) support saves power when in idle power states. Moving from 8 GT/s to 14 GT/s was helped by a T-coil, which improved the eye height by up to 16% and the eye width by up to 26%.

AMD reported significant improvements compared to the prior generation. AMD says it was able to increase the maximum turbo frequency by 23% and the IPC by 25% while, at the same time, lowering power consumption by 23%, which in total equates to around 1.5x greater performance per watt. Some of those improvements come from the new process technology, some come from better physical design, and some are architectural.

Some of the die and package specs are listed below. The new chip comprises 10.3 billion transistors on a 251 square millimeter piece of silicon. One key feature to point out is the underlying process technology, TSMC’s 2nd-generation 7-nanometer node (N7P), which not only provides a significant density improvement over AMD’s last node, GlobalFoundries’ 14-nanometer process, but also offers slightly better performance and power than TSMC’s own first-generation 7 nm process.

At the center of the die is the graphics core. The entire north and south sides are the GDDR6 controllers and PHYs. On the east side of the die are multimedia, display, and bus interfaces. Dasgupta didn’t want to spend too much time on the floorplan, which he said involved quite a bit of careful layout, as AMD has additional papers on this coming up in future conferences. We’ll have to wait for those papers for more details!

AMD did a lot of work on the physical design side. With over ten billion transistors on the chip and a highly parallel architecture, the large wire count alone poses new challenges for power delivery. The 5700 includes buses that are up to 2048 bits wide. Although there has been a lot of physical reuse, there were still around 60 unique tile designs, each with one to two million instances. Physical design played a big role in the careful placement of logic to alleviate routing issues. The 5700 is implemented using 13 metal layers. Custom routing in the upper, low-resistance wires is used on critical paths to minimize wire delays and improve timing. It’s a nice reminder that adding more compute units isn’t as simple as just finding a spot for them; there are rather complex physical routing issues involved as well.

The large graphics core mesh actually comprises seven smaller meshes in order to better handle the clock distribution while keeping the overhead at a minimum. AMD claims that the move resulted in a reduction of global clock skew by 30%. Another optimization that was done is introducing custom-designed clock buffers. Reducing parasitic capacitance in the drivers along with metal wire optimizations contributed to another 40% reduction in the power contribution of the clock mesh, normalized by area.

Within the local clock distribution, AMD adopted a configurable structured clock tree. The structured clock tree allowed them to reduce the median clock insertion delay by up to 50%, reducing jitter and variability. For those local clock trees, AMD went with a more bottom-up approach instead of a fixed-region cloning style methodology. The change allowed them to reduce the effective switched capacitance by up to 10% iso-process.

Short Comparison

The RX 5700 XT is the first generation to implement the RDNA architecture. Compared to the prior generations, which used GDDR5 and then HBM2, this chip switches back to GDDR memory in the form of GDDR6. The 5700 is manufactured on TSMC’s 7-nanometer process while the other two were based on GlobalFoundries’ 14-nanometer process. The other main points in the table are the board power, which decreased, and the boost clock, which improved. AMD says that when normalizing for iso-power and iso-configuration, the 5700 brings around 50% higher performance.

We want to bring your attention to the bottom right side of the slide below, which reports on the contributors to this performance uplift. One of the major contributors to this 50% improvement in performance is the move to a 7-nanometer node, which accounts for 30% of it alone. On top of that, the density advantage enables the remaining changes. It’s a nice reminder that, despite what you might have been led to believe, the effects of transistor scaling and Moore’s Law remain the driving force behind new products. The remaining contributors are the higher clock and power improvement (10%) and the performance-per-clock improvement from the RDNA architecture, which accounts for the remaining 60%.
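One way to read that breakdown is to split the 50% uplift into percentage points per contributor. This is only a sketch: it assumes the three shares are additive fractions of the total uplift, which is our interpretation of the slide rather than something AMD spelled out, and the labels are our own.

```python
total_uplift_pct = 50.0  # iso-power, iso-configuration uplift claimed by AMD

# Share of the total uplift attributed to each contributor
shares = {
    "7 nm process": 0.30,
    "clock and power improvements": 0.10,
    "performance per clock (RDNA)": 0.60,
}

# Percentage points of uplift per contributor, assuming additive shares
points = {name: total_uplift_pct * share for name, share in shares.items()}

for name, pts in points.items():
    print(f"{name}: about {pts:.0f} percentage points")
```

Read this way, the architecture itself is credited with roughly twice the contribution of the process node.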

Meet Radeon RX 5700 XT and Radeon RX 5700

N21: 505mm².
N22: 340mm².
N23: 240mm².
(± 5mm²) https://t.co/8GW19Ck9sm

— 遠坂小町@Komachi (@KOMACHI_ENSAKA) April 28, 2020

By comparison, the 7nm Vega 20 die in the Radeon VII is 331mm², while the 14nm Vega 10 die, familiar from the Radeon RX Vega 56/64, is 486mm². Even the 28nm Fiji core (596mm²) used by the Radeon R9 Fury and Fury X is not far behind the Navi 21.

Presumably, Navi 21 will receive 80 Compute Units (CUs) on the new RDNA 2 architecture. Since the first-generation Graphics Core Next architecture, AMD has used a GPU building scheme in which 1 CU contains 64 stream processors and 4 texture units (TMUs). The same scheme applies to the Radeon RX 5000 series based on the RDNA architecture. That is, if AMD does not radically change the design, then Navi 21 could end up with 5120 stream processors and 320 texture units. Thus, Navi 21 should be at least twice as powerful as Navi 10, and this is not counting the improvements of the new RDNA 2 architecture.
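The projected unit counts follow from the per-CU scheme described above. A minimal sketch, assuming the rumored 80 CU figure holds and that RDNA 2 keeps 64 stream processors and 4 texture units per CU; neither is confirmed, and the function name is illustrative.

```python
SP_PER_CU = 64   # stream processors per CU (scheme used since GCN)
TMU_PER_CU = 4   # texture units per CU


def shader_resources(compute_units: int) -> dict:
    """Stream processor and texture unit totals for a given CU count."""
    return {
        "stream_processors": compute_units * SP_PER_CU,
        "tmus": compute_units * TMU_PER_CU,
    }


navi10 = shader_resources(40)  # RX 5700 XT
navi21 = shader_resources(80)  # rumored CU count, unconfirmed

scale = navi21["stream_processors"] / navi10["stream_processors"]
print(f"Navi 10: {navi10}")
print(f"Navi 21 (rumored): {navi21}")
print(f"Unit count scale factor: {scale:.1f}x")
```

The 2x factor here is a count of execution units only; actual performance also depends on clocks, memory bandwidth, and architectural changes.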

It is known that GPUs based on RDNA 2 architecture will be produced at TSMC facilities using N7+ technology based on EUV lithography.

Previously, there were rumors that AMD plans to use 24 GB or even 32 GB of GDDR6 memory on a 384-bit or 512-bit bus.

At best, the AMD RX 5950 XT on the 7nm Navi 21 GPU will be released at the end of the year.