GPU Memory Bandwidth

This blog breaks down one of the most overlooked GPU characteristics: memory bandwidth. We will dive into what GPU memory bandwidth is and look at why it should be taken into consideration as one of the qualities an ML expert should look for in a machine learning platform.

Understanding the memory needs for machine learning is an important component of the development process of a model. It is, nevertheless, sometimes easy to overlook.

The basic GPU anatomy

A graphics card, like a motherboard, is a printed circuit board that holds a processor, a memory, and a power management unit. It also has a BIOS chip, which retains the card’s settings and performs startup diagnostics on the memory, input, and output.

The graphics processing unit (GPU) on a graphics card is somewhat analogous to the CPU on a computer’s motherboard. A GPU, however, is designed to do the complicated mathematical and geometric calculations required for graphics rendering and for workloads such as machine learning.

Figure: Nvidia GTX 780 PCB layout.

On a graphics card, the computing unit (GPU) is connected to the memory unit (VRAM, short for Video Random Access Memory) via a bus called the memory interface.

A computer system contains numerous memory interfaces. For a GPU, the memory interface width is the physical bit-width of the memory bus. Data is sent to and from the on-card memory every clock cycle (billions of times per second), and the width of the interface, usually described as "384-bit" or something similar, is the number of bits that can travel along the bus in each of those cycles. A 384-bit memory interface therefore allows 384 bits of data to be transferred each clock cycle, which makes the interface width an important term in the calculation of a GPU's maximum memory throughput. To reach high transfer rates, NVIDIA and AMD tend to employ standardized serial point-to-point buses in their graphics cards. The POD125 standard, for example, is used by the A4000, A5000, and A6000 NVIDIA Ampere series graphics cards available to Paperspace users; it essentially describes how the GPU communicates with GDDR6 VRAM.

When it comes to memory bandwidth, latency is a second factor to consider. Originally, general-purpose buses such as the VMEbus and S-100 bus were implemented, but contemporary memory buses are designed to connect directly to VRAM chips to reduce latency.

GDDR5 and GDDR6 are among the newest GPU memory standards. Each memory chip has a 32-bit bus (in GDDR6, split into two parallel 16-bit channels), which allows multiple memory accesses at the same time. As a result, a GPU with a 256-bit memory interface will have eight GDDR6 memory chips.
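
The chip count follows from dividing the interface width by the bus width each chip contributes. A minimal sketch in Python, assuming 32-bit chips as in the paragraph above; the function name is illustrative and the sketch ignores board-level variations:

```python
def memory_chip_count(interface_width_bits: int, chip_width_bits: int = 32) -> int:
    """Number of memory chips needed to fill a memory interface."""
    if interface_width_bits % chip_width_bits != 0:
        raise ValueError("interface width must be a multiple of the chip width")
    return interface_width_bits // chip_width_bits


# A 256-bit interface built from 32-bit chips.
print(memory_chip_count(256))
```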

Another family of memory standards is HBM and HBM2 (High Bandwidth Memory v1 and v2). With these standards, each HBM interface is 1024 bits wide, which generally offers higher bandwidth than GDDR5 and GDDR6.

This internal memory interface is not to be confused with the external PCI Express connection between the motherboard and the graphics card. That bus is also characterized by its bandwidth and speed, but it is considerably slower.

What is GPU memory bandwidth?

The GPU’s memory bandwidth determines how fast it can move data between memory (VRAM) and the computation cores. It’s a more representative indicator than GPU memory speed alone. It is determined by the data transmission speed between memory and computation cores, as well as by the number of individual parallel links in the bus between these two parts.

Absolute memory bandwidths in consumer devices have increased by several orders of magnitude since the home computers of the early 1980s (~1 MB/s), but available compute resources have increased even faster. The only way to avoid constantly hitting bandwidth limits is to ensure that workloads and resources have the same order of magnitude in terms of memory size and bandwidth.

Let’s take as an example one of the state-of-the-art ML-oriented GPUs, the NVIDIA RTX A4000:

It comes with 16 GB of GDDR6 memory, a 256-bit memory interface (the number of individual links on the bus between the GPU and VRAM), and an astonishing 6144 CUDA cores. With these memory-related characteristics, the A4000 can reach a memory bandwidth of 448 GB/s.
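
The 448 GB/s figure can be reproduced from the interface width once a per-pin data rate is known. The sketch below is illustrative only: the 14 Gbit/s and 16 Gbit/s per-pin rates are assumptions inferred from the bandwidths quoted in this article, not values stated in it, and the function name is made up for the example.

```python
def peak_bandwidth_gb_s(interface_width_bits: int, data_rate_gbit_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s.

    Each of the interface's parallel links carries `data_rate_gbit_per_pin`
    gigabits per second; dividing by 8 converts bits to bytes.
    """
    return interface_width_bits / 8 * data_rate_gbit_per_pin


# Assumed per-pin rates (inferred, not from the article): 14 and 16 Gbit/s.
print(f"A4000: {peak_bandwidth_gb_s(256, 14.0):.0f} GB/s")
print(f"A5000: {peak_bandwidth_gb_s(384, 16.0):.0f} GB/s")
```

Under these assumptions the results match the 448 GB/s and 768 GB/s quoted for the A4000 and A5000.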

Other GPUs available to Gradient users also offer high-performance memory characteristics:

GPU	vRAM	Memory interface width	Memory Bandwidth
P4000	8GB GDDR5	256-bit	243 GB/s
P5000	16GB GDDR5X	256-bit	288 GB/s
P6000	24GB GDDR5X	384-bit	432 GB/s
V100	32GB HBM2	4096-bit	900 GB/s
RTX4000	8GB GDDR6	256-bit	416 GB/s
RTX5000	16GB GDDR6	256-bit	448 GB/s
A4000	16GB GDDR6	256-bit	448 GB/s
A5000	24GB GDDR6	384-bit	768 GB/s
A6000	48GB GDDR6	384-bit	768 GB/s
A100	40GB HBM2	5120-bit	1555 GB/s

Why do we need high memory bandwidth for machine learning applications?

The effect of memory bandwidth is not immediately obvious. If the memory is too slow, the system will bottleneck, meaning all those thousands of GPU compute cores will sit idle while they wait for a memory response. Also, depending on the type of application the GPU is used for, data blocks can be processed repeatedly by the GPU (call it T times); in that case the external PCIe bandwidth only needs to be 1/T of the GPU’s internal bandwidth.

The most common use of a GPU illustrates this. For example, a model training program would load training data into GDDR RAM and make several runs for a neural network layer in the compute cores, for hours at a time. So the ratio of GPU internal bandwidth to PCIe bus bandwidth can be up to 20 to one.
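
The reuse argument reduces to a single division. As a rough sketch (the function name is illustrative, and the pairing of a 448 GB/s card with a reuse factor of 20 is just an example built from numbers quoted in this article):

```python
def required_pcie_bandwidth_gb_s(internal_bandwidth_gb_s: float, reuse_count: int) -> float:
    """External bandwidth needed to keep the GPU fed when each data block
    loaded over the external bus is processed `reuse_count` times on the card."""
    if reuse_count < 1:
        raise ValueError("reuse_count must be at least 1")
    return internal_bandwidth_gb_s / reuse_count


# Example: 448 GB/s of internal bandwidth, each block reused 20 times.
print(f"{required_pcie_bandwidth_gb_s(448.0, 20):.1f} GB/s")
```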

The amount of memory bandwidth required depends entirely on the type of project you’re working on. For example, if you’re working on a deep learning project that relies on large volumes of data being fed, reprocessed, and continuously restored in memory, you’ll need higher memory bandwidth. For a video or image-based machine learning project, the requirements for memory and memory bandwidth are higher than for a natural language processing or sound processing project. For most average projects, a good ballpark figure is 300 GB/s to 500 GB/s. This isn’t always the case, but it’s usually enough memory bandwidth to accommodate a wide range of visual data machine learning applications.

Let’s look at an example of deep learning memory bandwidth requirements validation:

If we consider the 50-layer ResNet, which has over 25 million weight parameters, and store each parameter as a 32-bit floating point number, the weights take around 0.8 Gb (gigabits) of memory, or about 0.1 GB. So, during parallel computing with a mini-batch of size 32, for example, we would need to load 25.6 Gb (3.2 GB) during each model pass. With a GPU like the A100, capable of 19.5 TFLOPS, and considering that the ResNet model uses 497 GFLOPs in a single pass (for a feature size of 7 x 7 x 2048), we would be able to do around 39 full passes per second, which leads to a bandwidth need of about 998 Gb/s, or roughly 125 GB/s. So the A100, with its bandwidth of 1555 GB/s, would be able to handle this model efficiently and stay far away from bottlenecking.
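
The estimate above can be written out step by step. This is a back-of-the-envelope sketch that keeps the article's simplifying assumption that the full set of weights is loaded once per sample in the mini-batch. Note the units: 25 million 32-bit parameters occupy 0.8 gigabits, which is 0.1 gigabytes, so the result expressed in bytes is about 125 GB/s (equivalently, about 998 Gbit/s).

```python
def required_bandwidth_gb_s(
    parameter_count: int,
    bits_per_parameter: int,
    batch_size: int,
    gpu_flops_per_second: float,
    flops_per_pass: float,
) -> float:
    """Memory bandwidth (GB/s) needed to keep the compute cores busy."""
    bytes_per_pass = parameter_count * bits_per_parameter / 8 * batch_size
    passes_per_second = gpu_flops_per_second / flops_per_pass
    return bytes_per_pass * passes_per_second / 1e9


need = required_bandwidth_gb_s(
    parameter_count=25_000_000,   # ResNet-50 weights
    bits_per_parameter=32,        # 32-bit floating point
    batch_size=32,
    gpu_flops_per_second=19.5e12, # A100 figure quoted above
    flops_per_pass=497e9,         # per-pass cost quoted above
)
print(f"required: {need:.0f} GB/s, available on the A100: 1555 GB/s")
```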

How to optimize models for lower memory bandwidth usage?

Machine learning algorithms in general, and deep neural networks in computer vision in particular, have a large memory and memory bandwidth footprint. Some techniques can be used to deploy ML models in resource-constrained contexts, or even on powerful cloud ML services, to reduce cost and time. Here are some of the strategies that can be implemented:

Partial fitting: When the dataset is too large to fit in a single pass, this feature allows you to fit a model on the data in stages instead of all at once. It takes a piece of data, fits it to get a weight vector, then continues on to the next piece of data, fits it to get an updated weight vector, and so on. This lowers VRAM use at the cost of a longer training duration. The most significant drawback is that not all algorithms and implementations support partial fitting or can be technically adjusted to do so. Nonetheless, it should be considered wherever possible.
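
To make the idea concrete, here is a minimal pure-Python sketch of fitting in stages. It is not any particular library's implementation: the class, the `partial_fit` method name, the learning rate, and the toy data are all assumptions chosen for illustration, and the update rule is plain mini-batch gradient descent on a one-variable linear model.

```python
from typing import Iterator, List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y)


def iter_chunks(data: Sequence[Sample], size: int) -> Iterator[Sequence[Sample]]:
    """Yield the dataset one fixed-size piece at a time."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


class IncrementalLinearModel:
    """Fits y = w * x + b with one gradient step per chunk."""

    def __init__(self, learning_rate: float = 0.05) -> None:
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def partial_fit(self, chunk: Sequence[Sample]) -> "IncrementalLinearModel":
        # Only this chunk has to be resident in memory during the update.
        n = len(chunk)
        errors = [self.w * x + self.b - y for x, y in chunk]
        grad_w = sum(e * x for e, (x, _) in zip(errors, chunk)) / n
        grad_b = sum(errors) / n
        self.w -= self.learning_rate * grad_w
        self.b -= self.learning_rate * grad_b
        return self


# Hypothetical toy data lying exactly on the line y = 2x + 1.
data: List[Sample] = [(i / 100, 2 * (i / 100) + 1) for i in range(100)]

model = IncrementalLinearModel()
for _ in range(1000):                      # several passes over the data
    for chunk in iter_chunks(data, size=10):
        model.partial_fit(chunk)

print(f"w={model.w:.3f}, b={model.b:.3f}")  # should approach w=2, b=1
```

The weights are carried from one chunk to the next, so peak memory use is bounded by the chunk size rather than the dataset size, while the many small steps account for the longer training time.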

Dimensionality reduction: This is important not only for reducing training time but also for reducing memory consumption during runtime. Some techniques, such as Principal component analysis (PCA), Linear discriminant analysis (LDA), or Matrix Factorization, can drastically reduce dimensionality and yield a smaller set of features that retains some of the original data’s important qualities.

Sparse matrix: When dealing with a sparse matrix, storing only the non-zero entries can result in significant memory savings compared to the basic dense representation. Different data structures can be used depending on the number and distribution of non-zero items. The trade-off is that accessing individual elements becomes more difficult, and extra structures are required to recover the original matrix without ambiguity, so more compute is spent in exchange for lower memory bandwidth utilization.
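
One of the simplest such structures is the coordinate (COO) layout, which stores a row index, a column index, and a value for each non-zero entry. The following is a small pure-Python sketch of that layout, written for illustration rather than performance; the helper names are made up for the example.

```python
from typing import List, Tuple

Coo = Tuple[Tuple[int, int], List[int], List[int], List[float]]


def to_coo(dense: List[List[float]]) -> Coo:
    """Keep only the non-zero entries, plus the shape needed to rebuild the matrix."""
    rows, cols, values = [], [], []
    for i, row in enumerate(dense):
        for j, value in enumerate(row):
            if value != 0:
                rows.append(i)
                cols.append(j)
                values.append(value)
    shape = (len(dense), len(dense[0]) if dense else 0)
    return shape, rows, cols, values


def coo_get(coo: Coo, i: int, j: int) -> float:
    """Element access now requires a search instead of a direct index."""
    _, rows, cols, values = coo
    for r, c, v in zip(rows, cols, values):
        if r == i and c == j:
            return v
    return 0.0


dense = [
    [0.0, 0.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [1.5, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 2.0],
]
coo = to_coo(dense)
stored_numbers = 3 * len(coo[3])             # row, column, value per non-zero
dense_numbers = len(dense) * len(dense[0])
print(f"stored {stored_numbers} numbers instead of {dense_numbers}")
```

The saving grows with sparsity, while `coo_get` shows the cost side of the trade-off: every lookup scans the stored entries.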

Conclusion

Understanding the memory bandwidth requirements for machine learning is a crucial part of the model construction process. After reading this article, you know what memory bandwidth is, why it matters, and how memory bandwidth requirements can be assessed. We also discussed some methods for reducing bandwidth usage, which can lower costs by allowing a less powerful cloud package to be selected while still meeting timing and accuracy criteria.

Understanding video RAM memory bandwidth

The basics of memory bandwidth

One of the main things you need to consider when selecting a video card is
the memory bandwidth of the video RAM. Memory bandwidth is basically the
speed of the video RAM. It’s measured in gigabytes per second (GB/s). The
more memory bandwidth you have, the better. A video card with higher memory
bandwidth can draw faster and draw higher quality images. But there’s more
to video cards than just memory bandwidth. You also have to consider the
drawing speed of the GPU. There’s little point in getting a video card with
a very fast GPU and limited memory bandwidth because the memory will be the
bottleneck. The GPU will spend a lot of time doing nothing while waiting for
its slow video RAM. By the same token, you don’t want to get a video card
with a slow GPU and very high memory bandwidth. This page addresses only the
subject of memory bandwidth.

The memory bandwidth is determined by the memory clock, the memory type, and
the memory width. The memory clock is the clock rate of the memory chips.
Current (2006) memory chips have clock rates which range from about 167 MHz
to 1000 MHz. The most common memory type is double data rate (DDR) which
means that it transfers two memory values for each memory clock cycle. There
are also other kinds of DDR like DDR2, GDDR3, and GDDR4 and they also
transfer at twice the memory clock rate. Some very old video cards still use
single data rate (SDR) which transfers one value per clock cycle. The memory
width of the common cards ranges from 32 bits to 256 bits. The maximum
theoretical memory bandwidth is the product of the memory clock, the
transfers per clock based on the memory type, and the memory width. For
example, a video card with 200 MHz DDR video RAM which is 128 bits wide has a
bandwidth of 200 MHz times 2 times 128 bits, or 51.2 gigabits per second,
which works out to 6.4 GB/s. This table contains the video RAM bandwidth for
many video cards in its RAM speed column. If you take a look at those memory
bandwidths, you can see how much they vary between fast video cards and slow
ones.
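
The product described above is easy to turn into a small calculator. This is
just a sketch of the arithmetic; the function and table names are
illustrative, and the transfers-per-clock values are the ones given in the
paragraph above.

```python
# Transfers per memory clock cycle for each memory type, as described above.
TRANSFERS_PER_CLOCK = {"SDR": 1, "DDR": 2, "DDR2": 2, "GDDR3": 2, "GDDR4": 2}


def bandwidth_from_clock(clock_mhz: float, memory_type: str, width_bits: int) -> float:
    """Maximum theoretical memory bandwidth in GB/s."""
    transfers = TRANSFERS_PER_CLOCK[memory_type]
    bits_per_second = clock_mhz * 1e6 * transfers * width_bits
    return bits_per_second / 8 / 1e9  # bits -> bytes, then bytes -> gigabytes


# The worked example from the text: 200 MHz DDR, 128 bits wide.
print(f"{bandwidth_from_clock(200, 'DDR', 128):.1f} GB/s")
# The same memory on a 64-bit bus has half the bandwidth.
print(f"{bandwidth_from_clock(200, 'DDR', 64):.1f} GB/s")
```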

Be careful about memory width when buying low-end video cards!

If you check the video card table carefully, you’ll notice that there are
some low-end video cards which can come with either 64 bit or 128 bit memory
widths. There are also some cards which can be either 32 bits or 64 bits
wide. For example, the Radeon 9550 comes in both a 128 bit model and a 64
bit model. The companies design the GPUs to support a certain memory width.
The memory width is matched to the needs
of the GPU. Unfortunately, video card makers often make slightly cheaper
models which use cheaper video RAM which only uses half of the width
available on the GPU. That cuts the memory bandwidth in half and almost
always seriously harms the video card’s performance. Those "half-width"
models usually spend lots of time with the GPU doing nothing while waiting
for the slow video RAM to respond. The really sad part is that the
full-width cards are usually only a little more expensive than the half-width
cards. The half-width cards are usually a very bad deal if you care at all
about performance. Unfortunately, many of the websites selling these video
cards don’t tell you the memory width or give you an incorrect value. I’m not
just talking about fly-by-night websites. Some of the largest websites which
list memory widths often list the full-width value even for the half-width
versions of the video card. And if you buy your video cards from a retail
store by reading specifications on the box you’re still in trouble because
most of the half-width cards don’t list their memory widths at all.

So the question is, how do you determine if you’re buying a half-width or
full-width card? Some of the manufacturers are nice enough to provide accurate
specifications which provide the clock rates and memory widths. So the safest
way to be sure is to search for the exact model you are interested in on the
manufacturer’s website and read the technical specifications. Speaking from
experience, with the low-end cards you have about a 50/50 chance of getting
the information you need from the manufacturer’s website. And if the
information is on the manufacturer’s websites, you still can’t always trust
it. I’ve seen some cases of manufacturer’s websites which list the
full-width value for some half-width models. I doubt it’s on purpose. It
usually just looks like a mistake. You also have to be careful when
reading the specs on the video card because lots of things which sound like
the video memory width actually have nothing to do with it. None of the
following descriptions have anything to do with the memory width.

128-bit floating-point color precision allows for a greater range of colors and brightness
Highly Optimized 128-bit 2D engine with support for new WindowsXP GDI extensions
128-bit, studio-quality floating point precision through the entire graphics pipeline
Native support for 128-bit floating point, 64-bit floating point and 32-bit integer rendering modes
True 128bit studio precision color
256-bit graphics architecture
64-bit floating point texture filtering and blending
250 MHz Engine Clock
3.8 Billion texels/sec fill rate

All of the following descriptions refer to the video memory systems.

Description	Memory Type	Memory Clock	Memory Width	Bandwidth
64/128-bit advanced memory interface	?	?	64 bits or 128 bits	?
128-bit advanced memory interface	?	?	128 bits	?
16/32 MB SDRAM	SDR	?	?	?
128/256 MB DDR SDRAM	DDR	?	?	?
400 MHz Memory Clock	?	400 MHz	?	?
8.0 GB/sec memory bandwidth (128bit, 500 MHz)	DDR	250 MHz	128 bits	8.0GB/s
30.4 GB/sec memory bandwidth	?	?	?	30.4GB/s

You can, in many cases, figure out the memory width by carefully examining
the pictures of the video card which are available on many of the websites
which sell them. Newegg, for example,
usually shows pictures of both sides of the video card. You can also often
use Google to find Internet reviews of a
video card which include close-up pictures. But in order to find the memory
width from the images, you need to learn some arcane information about GPUs,
circuit boards, and RAM packaging. If you don’t want to learn this
(fascinating only to computer geeks) information, then you should just try to find a
model which has memory width information on the manufacturer’s website. But
if you have a limited selection of cards, then you may get stuck learning how
to find the memory width by looking carefully at the card. It tends to be the
low-end video cards which do not publish their true memory bandwidth. If
you’re buying a low-end card then you definitely have to be careful to avoid
the half-width models. Those cards are not that fast in the first place and the
last thing you need is to make things worse by getting a card with low memory
bandwidth.

The rest of this section is a bit technical so you should probably only
continue with this if you cannot find the information you need on the
manufacturer’s website (or you just like to be extremely careful when buying
things). The first thing you need to know is what video RAM looks like. A
video card has lots of silicon chips but only some of them are RAM chips. In
the pictures below, the RAM chips have a green "X" on them.

There are usually four or eight RAM chips on a video card but some very
low-end cards have just one or two. Sometimes the RAM
chips are all on the front of the card and other times half of the RAM chips
are on the front and half are on the back. All of the RAM chips are
identical. They’re easy to identify because they are placed very close to the
GPU. The GPU is a large chip which has a large heatsink and often has a fan.
Some high-end video cards also have heatsinks covering the RAM chips. In
cases like that, you just have to go on the manufacturer’s specifications
since you can’t see the chips in the images.

Now you need to check the RAM chip packages. The "package" refers to the
black plastic package which encloses the chip. The pictures below show the
most common RAM chip packages.

You need to check which kind of package the RAM chips use. TSOPs (thin small
outline package) have pins (the little metal wires sticking out the sides of
the black plastic part) on opposite sides on the package. The TSOP 66 has 66
pins and is a very common package. The TSOP 86 has 86 pins and is much less
common. You may have to look carefully at pictures of the video card and
count the pins to figure out which one you’re looking at. The TQFP 100 (thin
quad flat pack) package has a total of 100 pins sticking out all four sides
of its package. The BGA 144 (ball grid array) doesn’t actually have pins
which you can see. There are 144 solder balls underneath the package but it’s
not hard to identify BGA packages because they are just small packages with
no visible pins.

The reason you need to recognize the chip package is that it helps you
guess how "wide" the RAM chip is. RAM chips are a certain number of bits
wide. The most common RAM chips used right now (late 2006) are 16 bits or
32 bits wide. These are usually referred to as "x16" and "x32" which are
pronounced "by 16" and "by 32". The only way to be absolutely sure about the
width of a RAM chip is to read the manufacturer’s number off the top of the
chip and then look it up (usually pretty easy with Google). But the kinds of
pictures you find on websites are rarely sharp enough to allow you to read
the numbers so you’re stuck guessing at the RAM width by looking at the
packages. The TSOP
66 package can be a maximum of 16 bits wide. TSOP 66s can occasionally be 8
bits wide but that is very rare in any kind of video card you’re likely to
run into. If you are looking at a TSOP 66 on a video card built since about
2000, it is almost certainly a x16 RAM chip. The TSOP 66 is the "standard"
x16 RAM chip so it is very common. The TSOP 86 is much less common and is
normally a x32 chip. A TQFP 100 is almost always a x32 chip. BGA packages can
vary a bit, but BGA RAMs on video cards are almost always x32. So if you’re
looking at a TSOP 66, it’s probably a x16 chip. If you have any of the other
three packages shown above, it’s probably a x32 chip. If you have anything
else, then you just have to get by with what you can find on the
manufacturer’s website.

To figure out the total memory width of virtually all modern video cards, all
you have to do is multiply the width of each RAM by the total number of RAM
chips on the card. Unfortunately, there are some exceptions to the "multiply"
rule but they are fairly uncommon. Some very old video cards do not follow
the rule but you shouldn’t be buying those anyway. Another exception is a
card where the multiply rule gives you a result which is twice the maximum
number of bits supported by the GPU. In that case, of course, the real memory
width is the maximum bits supported by the GPU. That case comes up
occasionally when a manufacturer uses the same circuit board for two models:
one with a certain amount of RAM (like 128 MB), and another with twice that
amount of RAM (like 256 MB), but both models support the maximum memory width.

If you’re considering low-end cards then there is one very common case to
watch out for. The image above shows a GeForce FX 5700 LE with 128 megabytes
of video RAM. This particular card has two models: the 128 MB model, and a
256 MB model. It has room for eight RAM chips on the circuit board but the
128 MB model only uses four RAM chips. The 256 MB model has all eight RAM
chips. If you check the RAM width column in the video card table, you’ll see
that the width for an FX 5700 LE can be 64 bits or 128 bits. Many manufacturers
just produce a single circuit board to make both the 128 MB and 256 MB models.
Then they only include four RAM chips to make the 128 MB cards. Unfortunately,
this cuts the memory width in half in every example I’ve been able to verify.
The card shown above is a 64 bit wide card. It is extremely common to find
websites selling the 128 MB version which claim that it’s a 128 bit wide card
even when it’s actually 64 bits wide. You also find this case often with
GeForce FX 5200s, Radeon 9200s, Radeon 9250s, Radeon 9550s, and others.
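
The package rules of thumb and the multiply rule described above can be
sketched in a few lines of Python. The names are illustrative. The x16 chip
width used for the FX 5700 LE is an inference from the text (64 bits spread
over four chips), not a value read off the card, and the 128 bit GPU maximum
is taken from the two widths listed for that model.

```python
# Likely chip width in bits for each RAM package, per the rules of thumb above.
PACKAGE_WIDTH_GUESS = {"TSOP 66": 16, "TSOP 86": 32, "TQFP 100": 32, "BGA 144": 32}


def total_memory_width(chip_count: int, chip_width_bits: int, gpu_max_width_bits: int) -> int:
    """Multiply rule, capped at the maximum width the GPU supports."""
    return min(chip_count * chip_width_bits, gpu_max_width_bits)


# FX 5700 LE, assuming x16 chips and a GPU that supports at most 128 bits.
print(total_memory_width(4, 16, 128))  # 128 MB model with four chips
print(total_memory_width(8, 16, 128))  # 256 MB model with eight chips

# The exception: eight x32 chips on a 128 bit GPU still give 128 bits.
print(total_memory_width(8, PACKAGE_WIDTH_GUESS["BGA 144"], 128))
```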

Once you have the memory width, you can use it and the memory type and memory
clock to calculate the peak memory bandwidth. If you’re looking at a video
card which has two different memory widths, then it is definitely worth the
trouble to make sure you know what you’re getting. The marketing
specifications on the models with the smaller memory width seldom go out of
their way to point out the shortcomings of that model. If the marketing
information doesn’t clearly state the memory bandwidth, then you can usually
assume the worst. And if you’re looking at low-end video cards, be absolutely
sure to avoid the models with half the maximum memory width. The cards with
half the memory bandwidth are usually only a little cheaper but their
performance is much lower.

HyperMemory and TurboCache

Both ATI and NVIDIA (two large GPU makers) have designed low-end video cards
which complicate the whole memory bandwidth issue. NVIDIA’s implementation
is called TurboCache. ATI’s is called HyperMemory. Cards which implement
TurboCache are often called "TC" models and HyperMemory is often shortened
to "HM". You need to watch out for these kinds of cards because their memory
systems are very different from most video cards.

Both of these kinds of video cards borrow RAM from the motherboard to use as
video RAM. These video cards have a total usable amount of RAM which is the
sum of the video RAM actually on the video card and the RAM borrowed from
the motherboard. These cards are produced because it’s cheaper to borrow RAM
from the motherboard than it is to include "real" video RAM on the video
card. Unfortunately, it often results in a very slow video card.

Both HyperMemory and TurboCache video cards can get a bit shifty when it
comes to their specifications. They tend to emphasize the total useable video
RAM (which includes RAM borrowed from the motherboard) and deemphasize the
actual amount of video RAM on the card. For example, a common model is sold
as a "128 megabyte" video card but it actually only contains 32 megabytes
of real video RAM. The other 96 MB is borrowed from the motherboard. You’ll
sell a lot more
video cards claiming to have 128 MB than a video card claiming to have 32 MB
so you can guess which number is printed in larger letters on the box.

But this page is about memory bandwidth. Here too, these kinds of cards are
often marketed deceptively. HyperMemory and TurboCache access the motherboard
RAM through a PCI-Express x16 slot. That kind of slot has a peak read speed
of 4 GB/s and can simultaneously write at 4 GB/s. You can read more about
this kind of slot on this page. HyperMemory and TurboCache can access the
motherboard RAM simultaneously with the video RAM on the card. Unscrupulous
vendors sometimes quote their memory bandwidth as the sum of the three
bandwidths: the actual video RAM bandwidth, the PCI-Express x16 read speed,
and the PCI-Express x16 write speed. So they actually add 8 GB/s to the real
memory bandwidth for their marketing specifications. This wildly exaggerates
the real memory bandwidth of the card. While it’s true that it will give you
the theoretical peak memory bandwidth, you’ll never even get close to that
number in real life. First of all, video cards read from their RAM far more
than they write to it. Adding in both the 4 GB/s read speed and the 4 GB/s
write speed is ludicrous. Secondly, you need to remember that the video card
isn’t the only thing in your computer which needs to access your motherboard
RAM. There’s a voracious consumer of motherboard RAM bandwidth called the
CPU which is keeping it plenty busy. The video card has to share access with
the CPU so it rarely gets close to the 4 GB/s theoretical read limit.
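
To see how much the three-way sum inflates the number, here is a tiny sketch
of the marketing arithmetic. The 3.2 GB/s on-card figure is a hypothetical
value chosen only for illustration; the two 4 GB/s figures are the
PCI-Express x16 speeds quoted above.

```python
PCIE_X16_READ_GB_S = 4.0   # peak read speed quoted in the text
PCIE_X16_WRITE_GB_S = 4.0  # peak write speed quoted in the text


def marketing_bandwidth_gb_s(video_ram_gb_s: float) -> float:
    """The inflated figure: on-card bandwidth plus both PCI-Express directions."""
    return video_ram_gb_s + PCIE_X16_READ_GB_S + PCIE_X16_WRITE_GB_S


on_card = 3.2  # hypothetical on-card video RAM bandwidth in GB/s
quoted = marketing_bandwidth_gb_s(on_card)
print(f"on-card {on_card:.1f} GB/s, quoted {quoted:.1f} GB/s "
      f"({quoted / on_card:.1f}x the real video RAM bandwidth)")
```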

The actual RAM on the video card is usually a very small amount of memory
which has a very small memory width like 32 or 64 bits. This results in some very low
real video RAM bandwidth in many models. In HyperMemory and TurboCache
cards, the memory clock rate of the half-width video RAM is usually higher
than the clock rate for the same model with full-width RAM so the slow models
are usually faster than half the speed of the full-width models. But they’re
still very slow. Many of these cards only have tiny amounts of real video RAM
like 16 or 32 megabytes. So most of the video card data will end up on the
motherboard if you’re trying to run any modern games. One upside is that the
video cards are smart enough to store the most commonly used data in the
video RAM on the video card. That softens the blow of having to access data
stored in the relatively slow motherboard RAM.

Since the GPU can overlap accesses to both the real video RAM and the RAM on
the motherboard, it’s not fair to specify the memory bandwidth as just the
speed of the real video RAM. But then again it doesn’t get anywhere near to the
video RAM plus 8 GB/s number which you often see in marketing literature. The
only accurate way to check the performance is by comparing to "regular" video
cards which don’t borrow RAM from the motherboard. This page gives a pretty
good comparison of HyperMemory and TurboCache cards to regular video cards.

Is a 256-bit memory bus really necessary in middle-end video cards? Theory and practice.

The performance limitation that a "narrow" 128-bit memory bus imposes on mid-range video cards is greatly exaggerated. No need for regrets: we lose practically nothing.

This article is mostly theoretical and studies how the width of the video memory bus affects the performance of graphics accelerators. At first glance there seems to be nothing to test: surely a video card with a 256-bit memory bus will be faster than the same card with a 128-bit bus. But let’s not jump to conclusions. Maybe relatively weak GPUs don’t need a wide memory bus? When GPU developers “cut” the width of the video memory bus, are they deliberately creating a “budget” product to fit a price range, or making a sober calculation that more is simply not needed? We will try to answer these questions.

For the experiments, we chose a rather old video card, the GeForce 7600GT. When it appeared on the market, it was a typical mid-range product with all the key features of a middle-end solution: a 128-bit memory bus and a relatively weak GPU. We chose it because it can be paired with a counterpart that has practically the same GPU performance but a 256-bit memory bus. Many readers have probably already guessed which card we mean, but we will discuss it in detail a little later. In the meantime, let’s try to find out how much the 128-bit memory bus limits the performance of the GPU on the 7600GT.

Preliminary notes

We will use the number of frames per second (FPS) that the video card achieves in the Quake 4 test as the measure of performance. The screen resolution was set to 1280×1024 pixels and remained unchanged throughout all tests; this is the typical resolution of most modern monitors with a 17-19 inch diagonal. The in-game graphics mode was set to "High Quality", and the "NO AA / AF" or "4AA / 16AF" test modes were selected through the video driver. The following test bench was used as the test platform:

This test bench is no performance champion by modern standards. However, as has been shown repeatedly, CPU performance is not a limiting factor when testing middle-end video cards.

The standard frequencies for the GeForce 7600GT are 560/700 MHz for the GPU and video memory respectively. For video memory we quote the real frequency in megahertz rather than the effective one (1400 MHz DDR); this is done only to make the graphs easier to plot. For the same reason, we set the baseline GPU frequency of the 7600GT to exactly 600 MHz rather than 560 MHz.
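To make the link between real frequency, effective frequency and bandwidth concrete, here is a minimal Python sketch. The function name and its parameters are our own illustration, not part of any tool used in the tests; it assumes DDR-type memory that performs two transfers per real clock cycle.

```python
def memory_bandwidth_gb_s(bus_width_bits: int,
                          real_clock_mhz: float,
                          transfers_per_clock: int = 2) -> float:
    """Peak theoretical memory bandwidth in GB/s (1 GB = 10**9 bytes).

    transfers_per_clock=2 is an assumption that models DDR-type memory,
    where the effective frequency is twice the real one.
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = real_clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# Stock GeForce 7600GT: 128-bit bus, 700 MHz real (1400 MHz effective).
stock_7600gt = memory_bandwidth_gb_s(128, 700)
print(f"{stock_7600gt:.1f} GB/s")  # 22.4 GB/s
```

Lowering the real memory frequency to 200 MHz, as in the tests below, scales this figure down proportionally.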

Testing

How will we test? How can we even tell how much the memory bus width limits GPU performance? Consider the situation as follows. We have a video card that does a certain amount of "work", and we draw conclusions from the FPS it delivers. We have two parameters we can change: the GPU frequency and the video memory frequency. With everything else unchanged (such as the memory bus width), the video memory frequency directly determines the video memory bandwidth. To determine how much the performance of the video card is limited by the "speed" of the video memory, we built the following graph.

We lowered the video memory frequency on the 7600GT to 200 MHz (real) and then increased it in 50 MHz steps. Of course, no one would voluntarily lower the video memory frequency in practice; the point is different. If two parameters affect the final result and one of them is the "limiter", then a linear increase in that parameter should produce a linear increase in the result. To check whether this holds, let's draw two tangents to the curve on the graph above, one on the left and one on the right.

As the graph shows, on the left side FPS grows linearly with the video memory frequency; that is, video memory bandwidth is clearly insufficient there and really is the limiting factor. As the memory frequency increases, the tangent to the curve begins to "tilt" towards the X axis, so raising the memory frequency becomes a less effective way of improving performance. In theory, if we could raise the memory frequency without limit, sooner or later the curve would become parallel to the X axis, meaning that overall performance would be limited only by the power of the GPU. Theory is theory, but can we see this in practice? Why not? Since we cannot overclock the video memory very far, let's simulate the situation by reducing GPU performance while keeping the same range of video memory frequencies. In the following graph, we lowered the GPU frequency to 300 MHz.

As you can see, the theory is confirmed. Once the real video memory frequency is twice the GPU frequency, the results practically stop improving, even with a 128-bit memory bus. However, this specific conclusion applies only to the 7600GT GPU, so let's not rush to generalize. Now let's see what happens if we make the graphics mode "heavier" by turning on full-screen anti-aliasing and anisotropic filtering.

Clearly, the main load now falls on the video memory. At a GPU frequency of 600 MHz, the results grow almost linearly with the video memory frequency. At a GPU frequency of 300 MHz, the graph no longer shows the horizontal "shelf" that would indicate excess video memory bandwidth. Notably, on the left side of the graph the curves for the two GPU frequencies merge into one line. Apparently the video memory speed limits overall performance there so strongly that it makes no difference whether the GPU runs at 600 MHz or 300 MHz.

The most inquisitive readers have probably already asked themselves: what is the optimal combination of GPU and video memory frequencies? As we have seen, if the video memory frequency is too low, the GPU cannot show its full potential. But raising the memory frequency too far makes little sense either, since the results stop growing. As usual, this question has no single answer, because the optimal combination depends both on the GPU architecture and on the "severity" of the graphics mode, not to mention that it can change from one game to another.
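The shape of these curves (near-linear growth at low memory frequencies, then saturation) can be reproduced with a very simple toy model. The Python sketch below is only an illustration: it assumes that the time per frame is the sum of a GPU-bound part and a memory-bound part, and the two cost constants are made up rather than fitted to our measurements.

```python
def modeled_fps(gpu_mhz: float, mem_mhz: float,
                gpu_cost: float = 6000.0, mem_cost: float = 6000.0) -> float:
    """Toy model: frame time (ms) = GPU-bound time + memory-bound time.

    gpu_cost and mem_cost are hypothetical constants chosen only to give
    plausible-looking numbers; they are NOT derived from the test results.
    """
    frame_time_ms = gpu_cost / gpu_mhz + mem_cost / mem_mhz
    return 1000.0 / frame_time_ms


def sweep_memory(gpu_mhz: float, start: int = 200, stop: int = 1000, step: int = 50):
    """Return (memory MHz, modeled FPS) pairs for a fixed GPU frequency."""
    return [(mem, modeled_fps(gpu_mhz, mem)) for mem in range(start, stop + 1, step)]


for gpu_mhz in (600, 300):
    print(f"GPU at {gpu_mhz} MHz")
    for mem_mhz, fps in sweep_memory(gpu_mhz):
        print(f"  memory {mem_mhz:4d} MHz -> {fps:5.1f} FPS (model)")
```

In this model each additional 50 MHz of memory frequency buys less than the previous step, and the curve can never rise above the GPU-only ceiling of `1000 / (gpu_cost / gpu_mhz)` FPS. A model that takes the maximum of the two times instead of their sum would produce a sharper "shelf"; real hardware sits somewhere in between.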

For those who like to look at a problem from different angles, here are a couple more graphs. They differ from the previous ones in that this time we fixed the real video memory frequency (rather than the GPU frequency) at 300 MHz and 600 MHz. Here is what we got for the mode without AA / AF.

If we set the real video memory frequency on the GeForce 7600GT to 300 MHz, changing the GPU frequency does not affect overall performance at all, and we get a horizontal "saturation line" on the graph. If the memory frequency is 600 MHz, raising the GPU frequency gives a more noticeable gain, but again, once the GPU frequency reaches 600 MHz the gain practically stops.

If we build similar graphs for the 4AA / 16AF mode, we see two "saturation lines". This is quite a natural result, since in a heavier graphics mode the performance of the video card is bound by the speed of the video memory.

Let's sum up the intermediate results. As the graphs above show, under our test conditions the optimal ratio of real video memory frequency to GPU frequency for a 7600GT with a 128-bit memory bus is approximately 1.5-2 : 1. That is, if the GPU frequency is 600 MHz, the video memory frequency should be around 900-1200 MHz (real). The recommended frequencies for a typical 7600GT are 560/700 MHz for GPU/video memory, a memory-to-GPU ratio of 1.25 : 1, which is slightly below the "optimal" level we found.
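The arithmetic behind these figures is easy to check. In the short Python sketch below the helper names are ours, and the 1.5-2 : 1 range is the empirical result for this particular card and test, not a general rule.

```python
def memory_to_gpu_ratio(mem_mhz: float, gpu_mhz: float) -> float:
    """Ratio of real video memory frequency to GPU frequency."""
    return mem_mhz / gpu_mhz


def suggested_memory_range(gpu_mhz: float, low: float = 1.5, high: float = 2.0):
    """Real memory frequency range implied by the 1.5-2 : 1 ratio found above.

    The default bounds apply only to the 128-bit 7600GT under our test conditions.
    """
    return gpu_mhz * low, gpu_mhz * high


stock_ratio = memory_to_gpu_ratio(700, 560)   # stock 7600GT: 700 MHz memory, 560 MHz GPU
mem_range = suggested_memory_range(600)       # GPU at the 600 MHz used in our graphs

print(f"stock ratio: {stock_ratio:.2f} : 1")
print(f"suggested memory range at 600 MHz GPU: {mem_range[0]:.0f}-{mem_range[1]:.0f} MHz")
```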

As we have repeatedly emphasized, this "optimal" ratio holds for the 7600GT with a 128-bit memory bus. What happens if we double the memory bus width? In theory, the maximum video memory bandwidth also doubles, which is equivalent to doubling the video memory frequency on a 128-bit bus and therefore brings the card closer to the "optimal" ratio of GPU and video memory frequencies. Let's find out whether this holds in practice.

7600GT 256-bit memory

You will say that such video cards do not exist. Strictly speaking, that is true. But there are other video cards from which you can, if you wish, obtain an analog of the GeForce 7600GT with a 256-bit memory bus. That is exactly what we did. We took the ASUS EN7900GS TOP video card, which has nominal frequencies of 590/720 MHz and a 256-bit memory bus. Then, using the RivaTuner utility, we disabled pixel and vertex units to match the GeForce 7600GT pipeline configuration exactly: 12p, 5v. This gave us an analog of the 7600GT with the same GPU characteristics but a 256-bit memory bus instead of a 128-bit one. On the graphs, the results of this video card are shown in red.

Below is a chart similar to Chart #1 and supplemented by the results obtained on an analog GeForce 7600GT with a 256-bit memory bus width.

Clearly, widening the video memory bus significantly improves performance. At low video memory frequencies (the left side of the graph), the 256-bit 7600GT is almost twice as fast as the regular 128-bit 7600GT. As the video memory frequency increases, the relative advantage of the 256-bit version shrinks, and at the typical video memory frequency of 700 MHz it is only 26%, which is quite a natural result. As shown above, as the video memory frequency rises we sooner or later reach a horizontal line on the graph, where overall performance is no longer limited by memory and depends only on the GPU. Clearly, the 256-bit version of the 7600GT reaches this "saturation mode" sooner.

As for the practical side, the 26% advantage of the 256-bit variant over the 128-bit 7600GT is certainly significant, but it turns out that video card manufacturers are not so wrong to limit themselves to a 128-bit memory bus in middle-end products. After all, besides raw performance they also have to consider cost. Developing a GPU with a 256-bit memory controller is more difficult and therefore more expensive, not to mention the more complicated and more expensive design of the printed circuit board itself.

Simply put, it makes little sense to connect video memory over a wider bus to a GPU that is not especially powerful. Quake 4 is already quite old; new games place higher demands on GPU performance, and there is no guarantee that a mid-range GPU could cope with them even with a wide memory bus. A simple example illustrates this. Let's build another graph under the same conditions as the previous one, but with the GPU frequency halved.

And what do we see? With a weak GPU and the nominal video memory frequency, the difference between the 128-bit and 256-bit 7600GT is only 12%. So if you have a weak GPU, there is no need to complain about a lack of memory bandwidth: a weak GPU simply cannot take full advantage of it. The same fact quite possibly explains the popularity of GDDR2 memory in the low-end segment, where GPU performance is cut down so much that installing faster video memory is simply pointless.

Well, with the simple graphics mode everything is clear. But what happens if we turn on full-screen anti-aliasing and anisotropic filtering?

Clearly, as the load on the video memory increases, the performance gain from the wide memory bus becomes more noticeable; at the typical video memory frequency of 700 MHz it is 60%. Here one might well regret that middle-end products do not come with a 256-bit (or wider) memory bus. On the other hand, how many modern games do you know that can be played at 4AA/16AF settings on mid-range graphics cards? Exactly. If you manage to set high-quality graphics in a game, full-screen anti-aliasing is, as a rule, out of the question. This is quite a typical situation for mid-range video cards and modern games, and it has been repeating for many years.

And finally, here are a couple more charts. The results on the first of them were obtained under the following conditions: the GPU frequency was varied while the video memory frequency stayed fixed. For the standard 128-bit 7600GT the video memory frequency was set to 600 MHz; for the emulated 256-bit 7600GT it was set to 300 MHz. Thus, the maximum theoretical video memory bandwidth of the two cards was the same. Now let's see how efficiently the wider memory bus is used depending on the GPU frequency.
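Why exactly 600 MHz and 300 MHz? Peak bandwidth is proportional to bus width multiplied by frequency, so halving the frequency compensates for doubling the width. A minimal Python sketch (the helper name is ours):

```python
def equivalent_clock_mhz(bus_width_bits: int, clock_mhz: float,
                         other_bus_width_bits: int) -> float:
    """Clock at which a bus of a different width has the same peak bandwidth.

    Follows from bandwidth being proportional to width * clock.
    """
    return clock_mhz * bus_width_bits / other_bus_width_bits


# 128-bit bus at 600 MHz (real) versus the emulated 256-bit card.
matched_clock = equivalent_clock_mhz(128, 600, 256)
print(f"256-bit bus needs {matched_clock:.0f} MHz to match 128-bit at 600 MHz")
```

This equalizes only the theoretical peak; as the graphs that follow show, the bandwidth actually achieved can still differ.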

Starting from a GPU frequency of 300 MHz, the video card with the 128-bit memory bus shows better results. It turns out that, at equal theoretical bandwidth, the narrow memory bus is more efficient in terms of performance. Maybe this is because we used the simple graphics mode? Let's enable AA/AF and check again.

Strange as it may seem, this graph also shows the 128-bit 7600GT ahead of its 256-bit counterpart, and by an even larger margin. Apparently, a slow but wide memory bus is not used as efficiently as a narrow but fast one.

Conclusion

As this testing showed, a 128-bit memory bus definitely constrains the performance of middle-end GPUs. But the extent of this limitation should not be exaggerated. Under typical conditions for mid-range video cards (graphics settings below the maximum and no full-screen anti-aliasing), moving to a wider memory bus adds only a couple of tens of percent to overall performance, while the cost of such a "transition" can significantly affect the product price. On the whole, GPU developers are probably right not to rush a 256-bit memory bus into middle-end products, however sad that may be for us users. But we don't lose much either. The good overclocking potential that middle-end video cards usually have almost always lets you make up for those couple of tens of "missed" percent.

Some readers may be disappointed that in this test we used a very old middle-end representative, the GeForce 7600GT, and the "veteran" Quake 4 as the test application. But otherwise it would have been difficult to find a "pair" of video cards that differ only in memory bus width, and many of the nuances would not have shown up so clearly. Using a newer game could have overloaded the aging 7600GT and again erased the difference in results. Don't be upset. We will continue our research, and in upcoming articles we will study the performance of modern mid-range cards in new games.

By the way, are you curious to know how efficiently the 512-bit memory bus is used in the Radeon HD2900XT?


Permanent URL: https://3dnews.ru/266173


Memory bus width and memory capacity of modern video cards

If you are thinking about replacing your video card, you have most likely wondered about its memory bus width (often called "bit depth" or "bitness"). In this article we discuss this question and try to give simple recommendations. We have covered the choice of video cards in more detail separately, and we can say with confidence that when buying a video adapter you need to pay attention to three main characteristics: bus width, memory size, and memory type.

Memory capacity

Video card memory capacity can range from 512 MB to 4 GB. If you like to play computer games, the best choice is a video card with as much memory as possible, but keep in mind that cards with 3-4 GB of memory are very expensive and not every PC user can afford them. At the moment, video adapters with 1-2 GB of memory are the best solution.

Be aware that some graphics card manufacturers fit their products with a large amount of memory (for example, 2 GB or more) while keeping the price low. Do not fall for this trick: such video cards are very weak and will not run most modern games even at medium settings.

To sell their products, manufacturers equip some of them with far more memory than they can use. Put simply, 2 GB or more of memory really is installed on the video card, but most of it will go unused because of the card's low memory bandwidth (narrow bus). Spotted a video card in the store with a lot of gigabytes and a surprisingly low price? Do not rush to buy!

Bus width (memory bus)

It is worth mentioning memory types. There are currently two main types: GDDR-3 and GDDR-5. The first is used in budget video cards (usually not very powerful), while the second is the better option and provides high memory speed. If you are looking for a video adapter for games, look at models with GDDR-5 memory.

Another important characteristic of a video card is its memory bus width (also called its bit depth). Cards currently on the market have 64, 128, 192, 256, 448, or 512-bit buses.

To better understand bus width, imagine a bottle with a neck. The liquid in the bottle is the data, and the neck is the bus. The wider the neck, the faster the liquid pours out; for a video card, this means more data passes through per unit of time.
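The bottle analogy can be put into numbers. The Python sketch below computes peak theoretical bandwidth for each of the bus widths listed above at one fixed per-pin data rate. The 4 Gbit/s figure is purely an illustrative assumption, not the specification of any particular card; real cards differ in memory type and clock.

```python
BUS_WIDTHS_BITS = [64, 128, 192, 256, 448, 512]

# Illustrative assumption: every data pin transfers 4 Gbit/s.
ASSUMED_DATA_RATE_GBIT_S = 4.0


def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gbit_s_per_pin: float) -> float:
    """Peak theoretical bandwidth in GB/s: pins * Gbit/s per pin / 8 bits per byte."""
    return bus_width_bits * data_rate_gbit_s_per_pin / 8


table = {width: peak_bandwidth_gb_s(width, ASSUMED_DATA_RATE_GBIT_S)
         for width in BUS_WIDTHS_BITS}

for width, gb_s in table.items():
    print(f"{width:>3}-bit bus: {gb_s:6.1f} GB/s")
```

At the same memory speed, bandwidth grows in direct proportion to bus width, which is why a wide "neck" matters more the more data the GPU needs to move.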

A 64-bit bus is found on video adapters intended for office use. If you want to play games at more or less high graphics settings, these video cards are not for you: they are too weak.

Cards with a 128-bit bus are a better choice for gamers with a limited budget for a new graphics card. For the best performance, only buy a 128-bit card if it has GDDR-5 memory.