Gm204: NVIDIA GM204 GPU Specs | TechPowerUp GPU Database

Maxwell 2 Architecture: Introducing GM204

by Ryan Smith on September 18, 2014 10:30 PM EST





Now that we’ve had a chance to recap Maxwell 1 and what went into that architecture, let’s talk about the first of the second generation Maxwell GPUs, the GM204.

GM204 may be a second generation Maxwell part, but it is without question still a Maxwell part. Maxwell has learned some new tricks that we are going to cover here, but functionally speaking you can consider GM204 to be a bigger version of GM107, taking more SMMs and more ROP/memory partitions and using them to build a bigger, more powerful GPU.

With GM107 being built from 5 SMMs, GM204 is a bit more than a triple GM107. Altogether NVIDIA is building GM204 out of 16 SMMs, this time divided up into 4 GPCs instead of GM107’s single GPC. This is bound to 64 ROPs and 4 64bit memory controllers, which is a 4x increase in the number of ROPs compared to GM107, and a 2x increase in the memory bus size.

Drilling down to the SMMs for a second, there are a couple of small changes that need to be noted. Organizationally the GM204 SMM is identical to the GM107 SMM, however GM204 gets 96KB of shared memory versus 64KB on GM107. Separate from the combined L1/texture cache, this shared memory services a pair of SMMs and their associated texture units to further reduce the need to go to L2 cache or beyond.

The Polymorph Engines have also been updated. There are not any major performance differences with the 3.0 engines, but they are responsible for implementing some of the new functionality we’ll reference later.

Other than this, GM204’s SMM is identical to the GM107 SMM. This includes the use of 4 shared texture units per pair of SMM partitions (8 per SMM), leading to a 16:1 compute-to-texture ratio, and a 512Kb register file for each SMM.

Compared to GK104 of course this is a more remarkable change. Compared to its immediate predecessor, GM204 sees significant differences in both the layout of the SMM and of the resulting chip, which means that even before accounting for feature differences we can’t just start counting functional units and directly comparing GM204 to GK104. GM204 is overall a more efficient chip, and although it possesses just 33% more CUDA cores than GK104 its performance advantage is much greater, on the order of 50% or more, highlighting the fact that NVIDIA is getting more work out of their CUDA cores than ever before. Altogether, NVIDIA tells us that on average they’re getting 40% more performance per core, which is one of the reasons why GTX 980 can beat even the full GK110 based GTX 780 Ti, with its 2880 CUDA cores.

Compute hardware aside, fleshing out GM204 is of course the ROP/memory partitions. Although the constituent hardware hasn’t changed much – we’re still looking at 7GHz GDDR5 memory controllers and the same pixel throughput per ROP – GM204 is very atypical for its configuration of these parts.

Until now, high-end NVIDIA designs have used an 8:1 ratio; 8 ROPs (or rather ROPs that process 8 pixels per clock) paired up with each 64bit memory controller. This gave GK104 32 ROPs, GK110 48 ROPs, and GM107 16 ROPs. However beginning with GM204 NVIDIA has increased the ROP-to-memory ratio and as a result has doubled their total ROP count compared to GK104. GM204 features a 16:1 ratio, giving us our first NVIDIA GPU with 64 ROPs.

Now the subject of ROPs is always a dicey one because of the nature of pixel operations. Unlike compute hardware, which can be scaled up rather effectively with more complex workloads and better caching methods, the same is not true for ROPs. ROPs are the ultimate memory bandwidth burner. They are paired with memory controllers specifically because the work they do – the Z testing, the pixel blending, the anti-aliasing – devours immense amounts of bandwidth. As a result, even if you are bottlenecked by ROP performance increasing the ROP count won’t necessarily be performance effective if those ROPs are going to be bandwidth starved.

NVIDIA ROP To Memory Controller Ratios

| GPU | ROP:MC Ratio | Total ROPs |
|---|---|---|
| Maxwell (GM204) | 16:1 | 64 |
| Maxwell (GM107) | 8:1 | 16 |
| Kepler (GK110) | 8:1 | 48 |
| Fermi (GF110) | 8:1 | 48 |
| GT200 | 4:1 | 32 |
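
The totals in this table are simply the ROP:MC ratio multiplied by the number of 64bit memory controllers on each chip. A minimal Python sketch of that arithmetic follows. The helper name `total_rops` is hypothetical, and the bus widths are assumptions brought in for illustration: GM204 and GM107 follow from the text, while the others (in particular GT200's 512-bit bus) are commonly quoted figures not stated in this section.

```python
# ROP count = (ROPs per memory controller) x (number of 64-bit controllers).
# Bus widths are illustrative assumptions, see the note above.
CONTROLLER_WIDTH_BITS = 64

# chip -> (ROP:MC ratio, memory bus width in bits)
CHIPS = {
    "GM204": (16, 256),
    "GM107": (8, 128),
    "GK110": (8, 384),
    "GK104": (8, 256),
    "GF110": (8, 384),
    "GT200": (4, 512),  # assumed 512-bit bus
}


def total_rops(ratio: int, bus_width_bits: int) -> int:
    """Return the total ROP count for a given ratio and bus width."""
    controllers = bus_width_bits // CONTROLLER_WIDTH_BITS
    return ratio * controllers


if __name__ == "__main__":
    for name, (ratio, bus) in CHIPS.items():
        print(f"{name}: {ratio}:1 on a {bus}-bit bus -> {total_rops(ratio, bus)} ROPs")
```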

The last time NVIDIA increased their ROP ratio was for Fermi, when it went from 4:1 to 8:1. This was largely fueled by the introduction of GDDR5, whose higher data rates provided the bandwidth necessary to feed the greater number of ROPs. Since then GDDR5 clockspeeds have increased a bit for NVIDIA, from 4GHz to 7GHz, but so have ROP clockspeeds as well, meaning there hasn’t been a significant change in the ability for NVIDIA’s memory controllers to feed their ROPs since Fermi.

Consequently, making the jump to a 16:1 ratio means that change would need to happen somewhere else. This has led to NVIDIA approaching the problem from the other direction: instead of increasing the available memory bandwidth, what can they do to reduce the need for it?

Color Compression

The solution, and really the key to making a 16:1 ROP ratio feasible, is the latest generation of NVIDIA’s delta color compression technology. Color compression in and of itself is not new technology, but over successive hardware generations NVIDIA has continued to iterate on it, and as such has continued to increase the amount of data they can compress.

NVIDIA first introduced color compression on the GeForce FX series, where it could compress data at up to a 4:1 ratio. The actual compressibility of any frame would in turn depend on the contents of the frame. At a most basic level NVIDIA would break down a frame into regions and then attempt to find smaller portions of redundant data to compress. Anti-aliasing was especially favorable here, as anti-aliasing samples would frequently all be of a fully covered triangle, resulting in all pixels being identical. In the case of regular color compression the key is finding whole regions of identical colors, at which point you could potentially compress them down by as much as 8:1.

More recently, in Fermi NVIDIA introduced delta color compression, which is designed to take color compression beyond simple regions containing identical pixels. Delta color compression is essentially focused on pattern compression instead of region compression, compressing based on the differences (delta) between pixels rather than how they’re identical; if you can describe how the pixels will differ from one-another, then you can save space describing the delta instead of the individual pixel. Delta color compression works off of the same blocks and essentially applies different delta patterns to them, attempting to find the best pattern for the block.

Delta compression is by its nature less efficient than whole color compression, topping out at just 2:1 compared to 8:1 for the latter. However a 2:1 ratio is still potentially a 50% reduction in data size, which is far better than letting the data go uncompressed. At 4×2 32bit pixels per region, this would mean reducing a region from 32 bytes to 16 bytes.
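
To make the 8:1 and 2:1 cases concrete, here is a toy Python sketch of the idea. It is not NVIDIA's actual algorithm, whose patterns are not public in this article: the 3-bit per-channel delta width and the `compressed_size` helper are assumptions chosen only so that a 4×2 block of 32bit pixels lands on the 32, 16 and 4 byte sizes discussed above.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int, int]  # RGBA, 8 bits per channel

BLOCK_PIXELS = 8      # a 4x2 region, as in the text
BYTES_PER_PIXEL = 4   # 32-bit color
DELTA_BITS = 3        # hypothetical per-channel delta width (toy value)


def compressed_size(block: List[Pixel]) -> int:
    """Return the size in bytes this toy scheme would store for one block."""
    if len(block) != BLOCK_PIXELS:
        raise ValueError("expected a 4x2 block of 8 pixels")
    raw = BLOCK_PIXELS * BYTES_PER_PIXEL  # 32 bytes uncompressed

    # Whole-color case: every pixel identical, store a single pixel (8:1).
    if all(p == block[0] for p in block):
        return raw // 8

    # Delta case: store the first pixel plus small per-channel differences
    # between neighbouring pixels, if they all fit in DELTA_BITS (2:1).
    lo = -(1 << (DELTA_BITS - 1))
    hi = (1 << (DELTA_BITS - 1)) - 1
    deltas_fit = all(
        lo <= c1 - c0 <= hi
        for prev, cur in zip(block, block[1:])
        for c0, c1 in zip(prev, cur)
    )
    if deltas_fit:
        return raw // 2

    # Otherwise the block is written uncompressed.
    return raw


if __name__ == "__main__":
    flat = [(10, 20, 30, 255)] * 8
    gradient = [(i, i, i, 255) for i in range(8)]
    noisy = [((i * 40) % 256, 0, 0, 255) for i in range(8)]
    for name, blk in (("flat", flat), ("gradient", gradient), ("noisy", noisy)):
        print(name, compressed_size(blk), "bytes")
```

A real implementation tries many more patterns per block; the point here is only that smooth gradients compress when flat-color detection alone would give up.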

NVIDIA’s 3rd generation of color compression then is the latest iteration on this technology. The fundamentals between the various generations of delta color compression have not changed, but with each iteration NVIDIA has gained the ability to apply more and more patterns to the blocks to find better matches. 3rd generation delta color compression offers the most patterns yet, and the most opportunity to compress pixel blocks.

The importance of color compression cannot be overstated. The impact of 3rd generation delta color compression is enough to reduce NVIDIA’s bandwidth requirements by 25% over Kepler, and again this comes just from having more delta patterns to choose from. In fact color compression is so important that NVIDIA will actually spend multiple cycles trying different compression ratios, simply because the memory bandwidth is more important than the computational time.

Getting back to our ROPs then, it’s the introduction of 3rd generation color compression, alongside the larger 2MB L2 cache, that makes a 16:1 ROP ratio on GM204 viable. Being able to feed 64 ROPs in turn helps NVIDIA’s overall performance, especially at higher resolutions. With 4K monitors taking off NVIDIA needs to be able to offer competitive performance at those resolutions, and while doubling the number of ROPs won’t double NVIDIA’s performance, it nonetheless is an essential part of being able to scale up performance for the needs of 4K. AMD for their part already went to 64 ROPs on their high-end GPU with Hawaii last year, and while the subject isn’t nearly as simple as just comparing ROP counts, it was one of the factors that resulted in the superior 4K performance scaling we saw from Hawaii cards.

Die Size & Power

Last but certainly not least, now that we’ve had a chance to discuss the architecture of GM204, let’s talk about its physical properties.

One of the problems posed by remaining on the 28nm process is that increasing CUDA core counts will result in larger GPUs. NVIDIA has actually done quite a bit of work on chip density, and as a result the increase in chip size is not going to be as great as the increase in the underlying hardware. Still, GM204 is a more powerful and more complex chip than GK104, and as a result die size and transistor count have gone up.

GM204 ends up weighing in at 5.2 billion transistors, with a die size of 398mm2. This compares to 3.54B transistors and a die size of 294mm2 for GK104, and 7.1B transistors and 551mm2 for GK110. Compared to either Kepler design the overall transistor density is improved, albeit not significantly so.
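
The density claim is easy to check from the figures just quoted. The short Python sketch below divides transistor count by die area; the helper name `density_mtr_per_mm2` is a hypothetical one chosen for this example.

```python
# Transistor counts and die sizes as quoted in the paragraph above.
CHIPS = {
    "GM204": (5.2e9, 398),
    "GK104": (3.54e9, 294),
    "GK110": (7.1e9, 551),
}


def density_mtr_per_mm2(transistors: float, die_mm2: float) -> float:
    """Return transistor density in millions of transistors per mm^2."""
    return transistors / die_mm2 / 1e6


if __name__ == "__main__":
    for name, (transistors, area) in CHIPS.items():
        print(f"{name}: {density_mtr_per_mm2(transistors, area):.2f} MTr/mm2")
```

This works out to roughly 13.1 million transistors per mm2 for GM204 against about 12.0 for GK104 and 12.9 for GK110, which matches the "improved, albeit not significantly" description.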

More important is the fact that GM204 ends up being NVIDIA’s largest xx4 class GPU. xx4 GPUs are typically NVIDIA’s midrange to high-end consumer workhorses, designed first and foremost for graphics and not packing the advanced compute features such as high speed FP64 and ECC memory support that we see in the big x00/x10 GPUs. For cost and overlap reasons NVIDIA’s sweet spot up until now has been around 300-350mm2, with GK104 coming in a hair ahead of the curve. But at just shy of 400mm2, GM204 is encroaching on newer, larger territory.

To some degree this is an inevitable result of remaining on the 28nm process. More performance requires more transistors, and as a result die size was destined to go up. None the less the fact that NVIDIA is fabricating such a large GPU as an xx4 GPU is remarkable. It provides a good example of just how much hardware (in terms of transistors) NVIDIA had to throw in to reach their performance goals. Alternatively, it’s telling that NVIDIA is now going to be able to use a 398mm2 chip as the basis of their high-end consumer video card, as opposed to having to use a 551mm2 chip in the form of GK110.

What’s particularly interesting though is that despite the big die, NVIDIA’s power consumption is exceptionally low. By historical standards GK104 was already a low power GPU for its size, this being the case particularly for GTX 680. GTX 680 was a 195W TDP part with a GPU Boost 1.0 power target of 170W. The GM204 based GTX 980 on the other hand, despite packing in roughly 1.7B more transistors for another 104mm2 of die size, actually consumes less power than said GK104 based card. At 165W TDP NVIDIA’s energy efficiency optimizations are in full effect, and it means NVIDIA consumes surprisingly little power for such a large GPU.

Impressively, all of this comes at the same time that NVIDIA is clocking the GPU at over 1.2GHz. This means we are not looking at a simple case of wide-and-slow, as is often the case for power optimized GPUs (see: SoCs). NVIDIA is clocking GM204 high and hitting it with over 1.2V, and yet it’s still able to maintain a 165W TDP in spite of its large die size. We’ll look at the competitive ramifications of this later, but to keep power consumption so low on such a large GPU really is a feather in NVIDIA’s cap.



NVIDIA Unleashes Maxwell GM204 Based GeForce GTX 980 and GeForce GTX 970 Graphics Cards

Today, NVIDIA finally unleashes its high-performance GeForce GTX 980 and GeForce GTX 970 graphics cards. The new cards feature the second generation Maxwell architecture, which NVIDIA bills as the most advanced GPU it has ever built, delivering great performance along with higher power efficiency. With it, NVIDIA aims to set a new standard for its gamers and fans.

We have waited several years for Maxwell to hit the market. NVIDIA initially released the GM107, the first core based on the first generation architecture design, while the GM204 is based on the improved second generation Maxwell core architecture, which adopts some new technologies. So before we talk about the cards, let’s recap the architectural details we have come to know about GM204.

The GM204 is the heart of the next generation GeForce GTX 980 and GeForce GTX 970 graphics cards. The chip uses the second generation Maxwell core architecture, which offers faster per-core performance than the first generation Maxwell chips (GM107) released with the GeForce GTX 750 and GeForce GTX 750 Ti, along with several new features that deliver better performance and great power efficiency, making the GeForce GTX 980 one of the most efficient flagship offerings in history. NVIDIA has changed since its Kepler generation of cards. Before Kepler, NVIDIA was known for cards that ran hot and consumed a ton of power, and the failure rates of the previous generation cards were pretty high. NVIDIA did still manage to release some great cards over that period: the G80 based GeForce 8800 GTX and the price/performance king, the GTX 460, are still considered among the greatest NVIDIA cards that came to market.

Kepler changed certain things. NVIDIA moved away from the branding scheme under which users were able to buy HPC chips rebranded for the GeForce audience. The first Kepler GPU, the GK104, was branded as a GeForce card, and while it was fast, it wasn’t as fast as another chip NVIDIA had in its hands for over a year. I am talking about the GK110, which was geared towards the professional market, such as Tesla supercomputing products. The GK110 did launch a year later, and since then NVIDIA has configured its core lineup to span two generations: first the gaming minded chip, then the full fledged HPC chip as the follow-up. There’s a reason NVIDIA advertises the GeForce GTX Titan Z as a professional and gaming card even though it clearly has a GeForce name in its branding. The Titan Z makes use of two GK110 chips, which are compute crunching beasts compared to the GK104, which focused on gaming by excluding non-essential features such as compute. So while the GM204 based GTX 980 will obviously replace the GK110 based GTX 780 Ti in branding, the real comparison should be GM204 versus GK104. Regardless, the GTX 980 is a superb card which beats GK110 on an existing process node.

This is all achieved on the 28nm process node, so one can imagine the numbers we can expect when NVIDIA hops to a smaller process in the future. The GM204 has two variants: the GM204-400, which is used on the GeForce GTX 980, and the GM204-200, which is used on the GeForce GTX 970. The fully enabled GM204 chip features 4 GPCs (Graphics Processing Clusters) with four SMM blocks each. Each SMM includes four logic units of 32 cores, so a single SMM holds 128 cores, and the 16 SMMs available on the GM204-400 chip equate to 2048 CUDA cores. The GM204-200 has three fewer SMM units, which results in a lower core count of 1664, making it around as fast as the GeForce GTX 780, while the GTX 980 will tackle the GeForce GTX 780 Ti with a good 15-20% performance lead.
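
The core counts follow directly from the SMM layout described above. As a quick Python sketch (the function name `cuda_cores` is a hypothetical helper for this example):

```python
CORES_PER_LOGIC_UNIT = 32   # cores in each of an SMM's logic units
LOGIC_UNITS_PER_SMM = 4     # logic units per SMM


def cuda_cores(smm_count: int) -> int:
    """Return the CUDA core count for a Maxwell chip with smm_count SMMs."""
    return smm_count * LOGIC_UNITS_PER_SMM * CORES_PER_LOGIC_UNIT


if __name__ == "__main__":
    print("GM204-400 (GTX 980):", cuda_cores(4 * 4))   # 4 GPCs x 4 SMMs
    print("GM204-200 (GTX 970):", cuda_cores(16 - 3))  # three SMMs disabled
```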

Among the most critical details of the chip is the transistor count. We all remember that the GK110 chip was a performance and computing beast at 7.08 billion transistors, while the GK104 included 3.54 billion transistors. The GM204 includes 5.2 billion transistors crammed inside a die that measures around 398 mm2, just 2 mm2 shy of 400 mm2. The GK104 and GK110 measure around 294 mm2 and 551 mm2 respectively. The die size has increased a lot compared to GK104, which is the generational predecessor of this chip. The GK110 will be replaced by GM200, but that is far from launch at the moment. Meanwhile NVIDIA has managed to include more on the 28nm process while keeping power consumption at just 165W on the GTX 980 and 148W on the GTX 970, which is simply mind boggling.

The GM204 GPU features 128 texture mapping units, the same amount featured on the GK104, but the raster operation units have been upped from 32 on the GTX 680 and 48 on the GTX 780 Ti to 64 on the GTX 980. That is more ROPs than GK110, although the GK110 does come with a much higher TMU count of 240. NVIDIA compensates for this by clocking the GM204 chip higher, which narrows the gap in texture fill rate. Maxwell was also meant to improve the way the GPU handles bandwidth, and NVIDIA is limiting the bandwidth dependency of its cards by adding more cache: 2 MB of L2, which is 512 KB more than GK110. The GK104 had just 512 KB of L2 cache, so this is a major update.

The theoretical single precision compute of the chip is rated at around 4.6 TFLOPs, which is really close to the GK110, which pumps out 5.1 TFLOPs. The 144 GT/s texture fill rate is a bit low, but the pixel fill rate is considerably higher at 72.1 GP/s compared to 53.3 GP/s on the GTX 780 Ti.
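
These theoretical rates can be reproduced from the unit counts and the base clock. The Python sketch below assumes the usual convention of two floating point operations (one fused multiply-add) per CUDA core per clock, and one texel per TMU and one pixel per ROP per clock; the function name `theoretical_rates` is hypothetical.

```python
def theoretical_rates(cores: int, tmus: int, rops: int, clock_mhz: float) -> dict:
    """Return peak single precision TFLOPs, GTexels/s and GPixels/s."""
    clock_ghz = clock_mhz / 1000.0
    return {
        # 2 FLOPs per core per clock (fused multiply-add), assumed convention
        "tflops": cores * 2 * clock_ghz / 1000.0,
        "gtexels_per_s": tmus * clock_ghz,
        "gpixels_per_s": rops * clock_ghz,
    }


if __name__ == "__main__":
    # GTX 980: 2048 cores, 128 TMUs, 64 ROPs at the 1126 MHz base clock
    for key, value in theoretical_rates(2048, 128, 64, 1126).items():
        print(f"{key}: {value:.1f}")
```

At the 1126 MHz base clock this gives about 4.6 TFLOPs, 144.1 GT/s and 72.1 GP/s.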

NVIDIA also has some new software side enhancements built on the hardware implemented in Maxwell. These include Dynamic Super Resolution, which is basically a new take on downsampling: it renders at a higher resolution and scales down, aiming to bring image quality on a 1080p display closer to that of 4K. There’s also Delta Color Compression, which is similar to the color compression we saw on AMD’s Tonga: it compresses color data losslessly, so overall quality is maintained while the GPU core reads and writes less data. Maxwell uses a more refined version, which keeps the compressed data in local memory for later use to increase memory efficiency.

Then there’s Multi-Pixel Programmable Sampling technology, which improves randomization of each sample and reduces quantization artifacts for better geometry processing and anti-aliasing filtering. On the display side, the GeForce GTX 980 adopts the HDMI 2.0 standard, which fits in with the new output configuration NVIDIA has set for its flagship offering: three DisplayPort 1.2, one DVI and one HDMI output.


NVIDIA GM204 Block Diagram:

NVIDIA GM204 SMM Unit Block Diagram:

The NVIDIA SMM (Streaming Multiprocessor Maxwell) units are an update over the Kepler SMX. Each SMM is split into four logic units of 32 cores, so each SMM houses 128 cores. That is lower than the 192 cores featured on Kepler’s SMX unit, but note that the second generation Maxwell cores are a good 40%+ faster than Kepler cores. The new design also simplifies the architecture and the overall scheduling, resulting in a considerable drop in power consumption and latency.

Probably one of the biggest talking points surrounding the Maxwell cards was their narrower memory bus compared to their GK110 based predecessors. NVIDIA’s slides show that, thanks to a new and improved memory architecture, bandwidth efficiency has been enhanced to the point where 7.0 Gbps DRAM can deliver an effective throughput of 9.3 Gbps in gaming. Hence even with lower raw bandwidth, the amount of bandwidth the GPU actually needs has gone down considerably, which results in better performance and utilization.
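
The 9.3 Gbps figure lines up with the roughly 25% reduction in bandwidth requirements quoted earlier for 3rd generation delta color compression: if the same work needs only 75% of the memory traffic, the memory behaves as if it were 1/0.75 times faster. A small Python sketch of that relationship, with `effective_rate` as a hypothetical helper name:

```python
def effective_rate(raw_gbps: float, traffic_reduction: float) -> float:
    """Return the equivalent data rate after cutting memory traffic.

    traffic_reduction is the fraction of traffic removed (0.25 = 25%).
    """
    if not 0.0 <= traffic_reduction < 1.0:
        raise ValueError("traffic_reduction must be in [0, 1)")
    return raw_gbps / (1.0 - traffic_reduction)


if __name__ == "__main__":
    print(f"{effective_rate(7.0, 0.25):.1f} Gbps effective")
```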

On the other hand, the performance numbers of Maxwell just keep getting better, with up to 3 times the energy efficiency of Kepler. While some will think that NVIDIA should have traded power for more performance, the fact is that the card performs well, and the lower TDP results in higher stability and better overclocking numbers from non-reference and custom variants. The Maxwell architecture will also scale from Tegra chips all the way up to the top end GM200 based HPC parts, so energy efficiency does matter.

NVIDIA is not only introducing a new core architecture but, along with it, several new technologies. There are six key updates to Maxwell that enable new algorithms and superior image quality compared to previously released cards.

The NVIDIA Maxwell core architecture adds new tiled resources and multi-projection technology for voxel grids (the basis of VXGI), which enhances global illumination. The DirectX 11.2 API makes use of 3D Tiled Resources, which allow hardware managed virtual memory for the graphics processing unit, and several Tier 2 features are supported, such as shader LOD clamp and mapped status feedback, min/max reduction filtering, and reads from non-mapped tiles returning 0.

Conservative Raster Technology:

First up, we have conservative raster technology, which improves voxelization by making the voxel coverage calculation more accurate. With conservative rasterization, a pixel counts as covered if any part of it is touched by a triangle, rather than only when its sample point is covered. This enables new rendering algorithms, and with the hardware acceleration available on Maxwell, voxelization performance improves by three times.
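
To illustrate the general idea in software, the Python sketch below compares a standard coverage test (sample at the pixel center) with a conservative one (any overlap between the triangle and the pixel square). It is an illustration of the concept only, not a description of Maxwell's hardware, and it assumes counter-clockwise triangles; all function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Triangle = List[Point]  # three vertices in counter-clockwise order


def edge(a: Point, b: Point, p: Point) -> float:
    """Positive when p lies to the left of the directed edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def covered_standard(tri: Triangle, x: int, y: int) -> bool:
    """Standard rasterization: test only the pixel center."""
    center = (x + 0.5, y + 0.5)
    return all(edge(tri[i], tri[(i + 1) % 3], center) >= 0 for i in range(3))


def covered_conservative(tri: Triangle, x: int, y: int) -> bool:
    """Conservative rasterization: true if the triangle touches the pixel."""
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    # Reject if the triangle's bounding box misses the pixel square.
    if max(xs) < x or min(xs) > x + 1 or max(ys) < y or min(ys) > y + 1:
        return False
    corners = [(x, y), (x + 1, y), (x, y + 1), (x + 1, y + 1)]
    # Reject if all four pixel corners are outside any single edge.
    for i in range(3):
        a, b = tri[i], tri[(i + 1) % 3]
        if max(edge(a, b, c) for c in corners) < 0:
            return False
    return True


if __name__ == "__main__":
    sliver = [(0.1, 0.1), (0.4, 0.1), (0.1, 0.4)]  # misses the pixel center
    print("standard:", covered_standard(sliver, 0, 0))
    print("conservative:", covered_conservative(sliver, 0, 0))
```

A small triangle that sits in the corner of a pixel is dropped by the center test but kept by the conservative test, which is exactly what voxelization needs in order not to miss thin geometry.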

MFAA or Multi-Frame Sampled Anti-Aliasing:

NVIDIA has been ahead in the anti-aliasing game for some time, releasing new algorithms with each passing generation. Its recent updates include MLAA, FXAA and TXAA, and now NVIDIA introduces MFAA (Multi-Frame Sampled Anti-Aliasing), an ultra efficient anti-aliasing technique that delivers 30% more performance at the same quality as 4xMSAA.

NVIDIA Dynamic Super Resolution — 4K Quality on a 1080P Display:

One of the new features Maxwell supports is DSR, or Dynamic Super Resolution. You can call it a new version of downsampling, which has become a trend in PC gaming. The technology is available on GeForce 900 series cards only and can be enabled through GeForce Experience (set to enabled by default). The main purpose of downsampling is to render at a higher resolution and scale the result down to a lower resolution monitor. So regardless of your monitor’s native resolution, it can display better image quality than it shows as standard.
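
As a rough illustration of downsampling, the Python sketch below averages each 2×2 block of a high resolution image into one output pixel, which is the ratio between 3840×2160 and 1920×1080. This plain box filter is an assumption made for simplicity; the article does not describe the filter DSR actually uses, and `downsample_2x` is a hypothetical name.

```python
from typing import List

Image = List[List[float]]  # rows of grayscale pixel values


def downsample_2x(image: Image) -> Image:
    """Halve width and height by averaging each 2x2 block (box filter)."""
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("image dimensions must be even")
    return [
        [
            (image[y][x] + image[y][x + 1]
             + image[y + 1][x] + image[y + 1][x + 1]) / 4.0
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


if __name__ == "__main__":
    rendered = [
        [0, 4, 8, 8],
        [4, 0, 8, 8],
        [1, 1, 2, 2],
        [1, 1, 2, 6],
    ]
    for row in downsample_2x(rendered):
        print(row)
```

Each output pixel blends four rendered samples, which is where the smoother edges come from.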

NVIDIA Flex, Gameworks, FlameWorks, HairWorks, GodRays Technologies:

NVIDIA’s Flex is the latest unified GPU PhysX system, which allows developers to use a combination of rigid body and fluid simulations. In past game development it was hard to make the two simulations work alongside each other due to their complex nature, but NVIDIA’s Flex, with the right tools, unifies this process.

Next up is the new GI Works SDK, short for Global Illumination Works, which allows real-time global illumination in any scene. Currently, developers use pre-baked global illumination effects in their scenes, placing several light sources by hand, which is a burden for developers and gives a non-dynamic presentation. Real-time global illumination solves this: it is more realistic and offers a more dynamic experience to gamers.

Last up is the Flame Works SDK, which includes a film-quality volumetric effect solution to render flame and smoke. NVIDIA is adding these features along with various other effects in a lot of upcoming titles, such as Batman: Arkham Origins, Witcher 3: The Wild Hunt, Assassins Creed IV: Black Flag and Watch Dogs. Some of the new titles, such as Project Cars and the multi-million dollar funded Star Citizen, are also offering rich NVIDIA Turbulence, NVIDIA PhysX and PhysX Particles support, plus HBAO+, TXAA, Cloth Simulation and many more.

NVIDIA also showcased several slides at GDC 2014 which confirm that its next generation FleX Unified PhysX and Turbulence particle effects are officially headed for PC and will be integrated in Unreal Engine 4 and CryEngine. The Turbulence particles will be added to Unreal Engine 3, Unreal Engine 4 and CryEngine via a patch, while FleX is headed to Unreal Engine 4. PC is the only supported platform for these new features, so titles developed exclusively for PC, or multi-platform titles which are optimized for PC, will adopt them.

The NVIDIA GeForce GTX 980 is the flagship GeForce 900 series offering and the fastest Maxwell card to launch. From top to bottom, the GeForce GTX 980 is a well built card featuring better performance, low power consumption and several new gaming and architecture side enhancements. The GeForce GTX 980 includes 2048 CUDA cores, 128 TMUs and 64 ROPs. The core is clocked at 1126 MHz base and 1216 MHz boost, while the memory runs at a 7 GHz effective clock, which results in 224 GB/s of bandwidth. The TDP of the card is set at 165W, and power is fed through dual 6-pin power connectors.

The GeForce GTX 980 makes use of an updated revision of the NVTTM cooler introduced on the GeForce GTX Titan Black, with an all black logo etched on the shroud near the I/O plate and an all black heatsink which can be seen through the window cut out in the center of the shroud. The card makes use of a vapor chamber which is cooled by a blower fan. We were unable to find the dual axial fan design which NVIDIA patented a few months back and which was rumored to be a part of the new graphics card series, but I expect the cooler as it is will do a great job, considering it can dissipate up to 275W of heat while the GeForce GTX 980 has a maximum thermal dissipation of just under 170W. That’s a ton of cooling being supplied to the core, and we can expect massive overclocking headroom for a card which already boosts to 1216 MHz.

Back to the cooler design, the NVTTM does include some minor changes along the display ports, isolating them inside the shroud entirely. One of the changes I like the most is the addition of the backplate, which is carried over from the GeForce GTX Titan Z. The card features two SLI gold fingers which allow 4-Way SLI multi-GPU functionality. The GeForce GTX 980 is fed power through dual 6-pin connectors, and while there is space for an 8-pin connector, NVIDIA features just two 6-pin connectors on the reference design, leaving its AIB partners to do the rest in the form of custom designs. Display outputs include DVI, HDMI and three DisplayPorts, which is one reason for the unusually large size of the display connector area. The bracket is also updated with a new layout, with exhaust cut outs that look similar to the ones featured on the GeForce GTX Titan Z.

The PCB has been modified to a beefier design. NVIDIA can be seen using Samsung K4G41325FC-HC28 (128M x 32) memory chips. A total of eight of these modules are featured, which equates to 4 GB of GDDR5 VRAM across a 256-bit bus. The voltage controller has been moved below the power connectors, and the power delivery includes 5 phases compared to 6 on the GeForce GTX 780 Ti. At the same time, we can see a large array of VRMs beside the chokes, which should deliver plenty of overclocking headroom even on the reference designs. The NVIDIA GeForce GTX 980 will retail at $549 US, while non-reference models will retail at around $599 US.
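
The memory figures follow from the chip organization. A 128M x 32 part stores 128 Mi words of 32 bits each and presents a 32-bit interface, so eight of them give the capacity and bus width quoted above. In Python, with the helper name `memory_config` being hypothetical:

```python
def memory_config(chips: int, depth_mi_words: int, width_bits: int):
    """Return (total capacity in GiB, total bus width in bits)."""
    bits_per_chip = depth_mi_words * 2**20 * width_bits
    total_bytes = chips * bits_per_chip // 8
    return total_bytes / 2**30, chips * width_bits


if __name__ == "__main__":
    capacity_gib, bus_bits = memory_config(chips=8, depth_mi_words=128, width_bits=32)
    print(f"{capacity_gib:g} GB across a {bus_bits}-bit bus")
```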

The NVIDIA GeForce GTX 970 is the most surprising part in the Maxwell lineup, coming in at a price of just $329 US. NVIDIA’s GeForce GTX 970 features 13 SMM units placed in 4 GPCs (Graphics Processing Clusters). Since each SMM unit has 128 CUDA cores, 32 in each logic unit (32 x 4), the total number of CUDA cores equates to 1664 on the die. From the first generation Maxwell core architecture, we learned that a Maxwell SMM (Streaming Multiprocessor Maxwell) unit has 128 cores compared to 192 on the Kepler SMX units. The specifications equate to a total of 1664 CUDA cores, 104 TMUs and 64 ROPs.

Along with that, we have 4 GB of GDDR5 memory running across a 256-bit memory interface clocked at 1753 MHz (7.00 GHz effective), which pumps out 224.4 GB/s of bandwidth. The core is clocked at 1051 MHz base and 1178 MHz boost, something I was expecting if the cards were to take on the GK110 based graphics cards. Lastly, we have the fill rate numbers, which amount to 33.6 GPixels/s pixel and 145.0 GTexels/s texture fill rates. The GeForce GTX 970 will be available in both reference and non-reference variants at launch, retailing at $329 to $349 US. Display outputs on the reference models stick with DVI, HDMI and three DisplayPorts. The card supports HDMI 2.0 and is powered by dual 6-pin connectors. AIB partners may offer different display output configurations, but the cards will be fully compatible with G-Sync monitors.
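
The bandwidth number can be derived from the memory clock and bus width. GDDR5 moves four bits per pin for every cycle of the clock quoted here, so 1753 MHz becomes roughly 7 Gbps per pin. A short Python sketch, where `gddr5_bandwidth_gbs` is a hypothetical helper name:

```python
def gddr5_bandwidth_gbs(clock_mhz: float, bus_width_bits: int,
                        transfers_per_clock: int = 4) -> float:
    """Return peak memory bandwidth in GB/s (1 GB = 10^9 bytes)."""
    data_rate_mbps = clock_mhz * transfers_per_clock   # per pin, in Mbps
    return data_rate_mbps * bus_width_bits / 8 / 1000.0


if __name__ == "__main__":
    print(f"{gddr5_bandwidth_gbs(1753, 256):.1f} GB/s")  # 1753 MHz, 256-bit
    print(f"{gddr5_bandwidth_gbs(1750, 256):.1f} GB/s")  # rounded 7.0 Gbps
```

The small gap between 224 GB/s and 224.4 GB/s quoted in different places comes down to whether the clock is taken as 1750 MHz or 1753 MHz.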

NVIDIA Maxwell GeForce GTX 980 and GeForce GTX 970 Reviews:

  • NVIDIA GeForce GTX 980 Review @ Anandtech
  • NVIDIA GeForce GTX 980 Review @ Techpowerup
  • NVIDIA GeForce GTX 980 Review @ Hardwarecanucks
  • NVIDIA GeForce GTX 980 Review @ Guru3D
  • NVIDIA GeForce GTX 980 Review @ HardOCP
  • NVIDIA GeForce GTX 980 Review @ PCPer
  • NVIDIA GeForce GTX 980 Review @ Bit-Tech
  • NVIDIA GeForce GTX 980 Review @ Overclock3d
  • NVIDIA GeForce GTX 980 Review @ TechReport
  • NVIDIA GeForce GTX 980 Review @ Hexus
  • NVIDIA GeForce GTX 980 Review @ MaximumPC
  • NVIDIA GeForce GTX 980 Review @ Techspot
  • NVIDIA GeForce GTX 980 Review @ Tweaktown
  • NVIDIA GeForce GTX 980 and GeForce GTX 970 Review @ PCPOP
  • NVIDIA GeForce GTX 970 Review @ Wccftech

NVIDIA GM204 Die Shot:

NVIDIA GeForce GTX 970 and GTX 980 Specifications:

| | GeForce GTX 570 | GeForce GTX 580 | GeForce GTX 670 | GeForce GTX 680 | GeForce GTX 770 | GeForce GTX 780 | GeForce GTX 780 Ti | GeForce GTX 970 | GeForce GTX 980 |
|---|---|---|---|---|---|---|---|---|---|
| Codename | GF110 | GF110 | GK104 | GK104 | GK104 | GK110 | GK110 | GM204 | GM204 |
| Process | 40nm | 40nm | 28nm | 28nm | 28nm | 28nm | 28nm | 28nm | 28nm |
| GPU Core | Fermi | Fermi | Kepler | Kepler | Kepler | Kepler | Kepler | Maxwell | Maxwell |
| SM Units | 15 x 32 | 16 x 32 | 7 x 192 | 8 x 192 | 8 x 192 | 12 x 192 | 15 x 192 | 13 x 128 | 16 x 128 |
| CUDA Cores | 480 | 512 | 1344 | 1536 | 1536 | 2304 | 2880 | 1664 | 2048 |
| ROPs | 40 | 48 | 32 | 32 | 32 | 48 | 48 | 64 | 64 |
| TMUs | 60 | 64 | 112 | 128 | 128 | 192 | 240 | 104 | 128 |
| Core Clock | 732 MHz | 772 MHz | 915 MHz | 1006 MHz | 1046 MHz | 863 MHz | 875 MHz | 1051 MHz | 1126 MHz |
| Boost Clock | 1464 MHz | 1544 MHz (Shader Clock) | 980 MHz | 1058 MHz | 1085 MHz | 900 MHz | 928 MHz | 1178 MHz | 1216 MHz |
| Memory | 1.2 GB GDDR5 | 1.5 GB GDDR5 | 2 GB GDDR5 | 2 GB GDDR5 | 2 GB GDDR5 | 3 GB GDDR5 | 3 GB GDDR5 | 4 GB GDDR5 | 4 GB GDDR5 |
| Memory Bus | 320-bit | 384-bit | 256-bit | 256-bit | 256-bit | 384-bit | 384-bit | 256-bit | 256-bit |
| Memory Clock | 3.80 GHz | 4.0 GHz | 6.0 GHz | 6.0 GHz | 7.0 GHz | 6.0 GHz | 7.0 GHz | 7.0 GHz | 7.0 GHz |
| Memory Bandwidth | 152.00 GB/s | 192.4 GB/s | 192.0 GB/s | 192.0 GB/s | 224.5 GB/s | 288.6 GB/s | 336.0 GB/s | 224.5 GB/s | 224.5 GB/s |
| Texture Fill Rate (GT/s) | 43.92 | 49.41 | 102.5 | 128.8 | 134 | 166 | 210 | 145.0 | TBC |
| TDP | 219W | 244W | 170W | 192W | 220W | 250W | 250W | 148W | 165W |
| Power Connectors | 6+6 Pin | 8+6 Pin | 6+6 Pin | 6+6 Pin | 8+6 Pin | 8+6 Pin | 8+6 Pin | 6+6 Pin | 6+6 Pin |
| DirectX 12 Support | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Launch | December 7th 2010 | November 9th 2010 | May 10th 2012 | March 22nd 2012 | May 30th 2013 | May 23rd 2013 | December 2013 | September 18th 2014 | September 18th 2014 |
| Price | $349 US | $499 US | $349 US | $499 US | $349 US | $499 US | $699 US | $329 Reference, $329+ Custom | $549 Reference, $549+ Custom |

 NVIDIA GeForce GTX 980 and GTX 970 Custom Models Gallery:


NVIDIA Maxwell GM204 Slides:
