Why Are We Still Using 44.1kHz 16bit for Music?

Somewhere in digital music history, audio at 44.1kHz 16bit became the accepted standard for CD. Our stock music library uses this standard (as do most) and all consumer music still uses it in 2019. However, it is an old standard now.

It is believed that these sample and bit rates were established as a balance between two factors: the minimum sample rate needed to cover human hearing and the amount of storage available on a CD. Obviously, at the time (the 1980s) there were limits on what the technology could achieve.

The human ear can hear up to 20kHz in theory; in reality, the upper limit for most adults is probably somewhere between 15-18kHz. So, having a sample rate of more than double that was deemed sufficient for quality. This is also to do with the Nyquist theorem (or sampling theorem), which I will not cover here as it is technical and beyond the scope of this post.

At this sample rate, a CD was able to hold 74 minutes of audio.

It is said to have been devised by Sony and Philips.

But, why are we still using it 30+ years later?

It appears to have stuck for two main reasons:

  1. It is deemed good enough by most
  2. It has been so widely used on the CD format

Is it good enough though?

This is a contentious issue. There are passionate people who will tell you that higher resolution audio is much better and there are those who will tell you that the human ear can not distinguish the difference.

There are numerous cases on the web of people saying that they have tested it and can hear a difference, and just as many who say they can’t hear a difference. The problem is that it is subjective and the results rely on many variables, such as the speakers used, the quality of the Digital to Analogue conversion, how the source audio was recorded, the listening environment, the age of the person listening and the condition of their ears.

Regardless of which way you fall in this debate, or what you believe, there are higher resolution audio formats used in DVD and Blu-ray. Also, broadcasters broadcast in 48kHz.

For example:

  • CD – 44.1 kHz at 16 bit
  • Broadcast – 48 kHz at 24 bit (or 16 bit)
  • DVD – 48 kHz at 16 bit
  • Blu-ray – 96 kHz at 24 bit

It seems that regardless of common beliefs, companies are putting resources into developing higher quality audio formats. It is just that they are being used in film and TV, rather than as a music-only format.

Despite these higher formats used in other media, the standard for music is still 44.1 kHz at 16bit. This is probably due to the popularity of the CD (compact disc) format. It has now migrated into digital downloads too, albeit lower-quality compressed mp3 versions (which is another topic altogether).

Where does this leave music producers?

Often confused. To add to the mix, our DAWs (Digital Audio Workstations) typically have the capability to export audio at 44.1kHz, 48kHz, 96kHz or 192kHz with bit depths of 16, 24, 32 or 64.

The paradox is that the samples and virtual instruments are usually only recorded at 44.1kHz or 48kHz at 16 or 24 bit.

This means that any audio we export at higher resolution is upsampled.
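
As a rough illustration of that point, here is a minimal sketch, assuming Python with numpy and scipy and a synthetic tone standing in for a real sample, showing that upsampling from 44.1kHz to 96kHz adds no content above the original 22.05kHz limit:

```python
# Minimal sketch (assumes numpy and scipy): upsampling a 44.1 kHz signal to 96 kHz
# adds no new content above the original Nyquist frequency of 22.05 kHz.
import numpy as np
from scipy.signal import resample_poly

fs_in, fs_out = 44100, 96000
t = np.arange(fs_in) / fs_in                      # one second of "recorded" audio
x = np.sin(2 * np.pi * 10000 * t)                 # a 10 kHz tone captured at 44.1 kHz

y = resample_poly(x, 320, 147)                    # 96000/44100 reduces to 320/147

spectrum = np.abs(np.fft.rfft(y)) / len(y)
freqs = np.fft.rfftfreq(len(y), d=1 / fs_out)
print("energy above 22.05 kHz:", spectrum[freqs > 22050].sum())   # effectively zero
```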

My view is that 44.1kHz 16bit is okay, but I also have 48kHz 24bit versions of my music too. I think I can hear/feel a difference, but I have no proof. Furthermore, I could be tricking myself.

Conclusion

  • With music, we are stuck on “CD quality” (44.1kHz 16bit) because of the popularity of the (now declining) CD format.
  • With film and broadcast, we are using higher-resolution audio.
  • People who create music have the means to make higher resolution audio, but the samples they use are recorded in lower resolution.
  • Music is better (or not) in higher resolution depending on your point of view/listening experience.

It seems that music will be 44.1kHz at 16bit for the foreseeable future, but film audio has moved beyond that. Personally, given that music consumption is mostly streamed these days, I can’t see a reason to increase the quality for listeners as the audio is reduced in quality for streaming anyway. Furthermore, unless we are all able to have perfect listening environments and deal with massive audio file sizes it will not be beneficial.

For film, there is clearly a requirement for higher resolution audio. However, I must say we have never been asked for anything higher than 44.1kHz 16bit (which is the format we use in our royalty-free music library).

To make producing higher resolution music meaningful though, our samples really need to be higher resolution too. Otherwise, our DAWs are just upsampling.

It does seem that most of the music used in Blu-ray film production is just upsampled too, as it is customary for a composer to deliver it in 44.1 or 48kHz, 16 or 24bit. Having said that, I do know of some cases where 96kHz delivery was requested.

I am going away to unknot my brain now, but feel free to share or comment if you have a view or experience in this matter?

About Lee

I am Lee Pritchard, a composer and digital media producer. I spend most of my time composing and producing music. My music is available from BeanstalkAudio and MediaMusicNow

You can connect and learn more about me and what I do at my personal website leepritchard.com/links

High bitrate audio is overkill: CD quality is still great

Everybody wants great audio, but sometimes our quests for improvement lead us down some really dark and… dumb… corridors. As it is with many disciplines, with music a little knowledge goes a long way. You may have seen discussion online surrounding bit depth and sample rates, but what you probably don’t know is that there isn’t some magic setting that’ll make everything sound better. That’s because digital music as it is today has already left our perceptual limits in the rear-view mirror. You don’t need crazy-high quality files unless you’re creating music that needs heavy editing.

While I’m no stranger to delivering bad news, like any good journalist I show my evidence. The truth of the matter is that humans just can’t perceive the difference between files at a certain point, and you shouldn’t get sucked into the marketing hype if it’s more expensive than what you have already. While I have no doubt that formats like MQA are technologically impressive, most won’t really be able to appreciate the increased fidelity. Chances are near 100% that your current library is perfectly fine.

You only need a sample rate of 44.1kHz

If you’ve looked at your music player’s information tab, you may notice some of your songs have sample rates of 44.1kHz, or 48kHz. You may also notice that your DAC or a phone like the LG V30 supports files with sample rates up to 384kHz.

That’s overkill. Nobody on God’s green Earth is going to know or care about the difference because our ears just aren’t that sensitive. Don’t believe me? It’s time for some math. To understand what the limit of human perception is for sample rates, we need to identify three things:

  1. The limit of frequencies that you can hear
  2. What’s the minimum sample rate needed to meet that range (2 x highest audible frequency in Hz)
  3. Does the sample rate of your music files exceed that number?

Sounds simple enough, and it is. The most common range of human hearing tops out at about 20kHz, which is 20,000 periods per second. For the sake of argument, let’s expand that range to the uppermost limits of what we know is possible: 22kHz. If you want to check out the limits of your hearing, use this tool to find the upper limits of your perception. Just be sure you don’t set the volume too loud before you do it. If you’re over 20, that number should be about 16-17kHz, lower if you’re over 30, and so on.

Using the Nyquist-Shannon sampling theorem, we know that a sample rate that provides two samples per period is sufficient to reproduce a signal (in this case, your music). 2 x 22,000 = 44,000, or just under the 44,100 samples per second offered by a 44.1kHz sample rate. Anything above that number is not going to offer you much improvement because you simply can’t hear the frequencies that an increased sample rate would unlock for you.
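
If it helps, here is a minimal sketch in plain Python of that calculation, plus a small illustrative helper (not from the article) showing where a tone lands when the sampling rule is violated:

```python
# Minimal sketch (plain Python): the 2 x highest-frequency rule, plus the frequency
# a sampled sine appears to have when the rule is violated.
highest_audible_hz = 22000
minimum_sample_rate = 2 * highest_audible_hz       # 44,000 samples per second
print(minimum_sample_rate <= 44100)                # True: CD's 44.1 kHz clears the bar

def apparent_frequency(tone_hz, sample_rate_hz):
    """Illustrative helper: where a pure tone lands after sampling and reconstruction."""
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

print(apparent_frequency(20000, 44100))   # 20000 -> reproduced faithfully
print(apparent_frequency(30000, 44100))   # 14100 -> aliased down into the audible band
```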

Any sample rate that exceeds twice the highest frequency will represent it perfectly (above). It’s only when the sample rate drops below that point that problems arise (below).

Additionally, the frequencies you hear at the highest end diminish over time as you age, get ear infections, or are exposed to loud sounds. For example, I can’t hear anything above 16kHz. This is why to older ears, music has less audible distortion if you use a low-pass filter to get rid of sound that you can’t hear—it’ll make your music sound better even though it’s not technically as “high-def” as the original file. If your hearing can’t reach anything higher than 22.05kHz, then the 44.1kHz file can handily outresolve the range of frequencies you can hear.
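
If you want to try the low-pass idea yourself, here is a minimal sketch assuming Python with numpy and scipy; the 16kHz cutoff and the Butterworth design are illustrative choices, not anything prescribed above:

```python
# Minimal sketch (assumes numpy and scipy): a steep low-pass at 16 kHz, the kind of
# filter suggested above for removing content an older listener cannot hear anyway.
# The cutoff frequency and filter order are illustrative choices.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def lowpass(audio, sample_rate_hz, cutoff_hz=16000, order=8):
    sos = butter(order, cutoff_hz, btype="low", fs=sample_rate_hz, output="sos")
    return sosfiltfilt(sos, audio)                 # zero-phase, so no added smearing

fs = 44100
t = np.arange(fs) / fs
mix = np.sin(2 * np.pi * 1000 * t) + 0.1 * np.sin(2 * np.pi * 19000 * t)
filtered = lowpass(mix, fs)                        # the 19 kHz component is strongly attenuated
```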

16-bit audio is fine for everyone

The other audio quality myth is that 24-bit audio will unlock some sort of audiophile nirvana because it’s that much more data-dense, but in terms of perceptual audio any improvement will be lost on human ears. Capturing more data per sample does have benefits for dynamic range, but the benefits are pretty much exclusively in the domain of recording.

Though it’s true a 24-bit file will have much more dynamic range than a 16-bit file, 144dB of dynamic range is enough to resolve a mosquito next to a Saturn V rocket launch. While that’s all well and good, your ears can’t hear that difference in sound due to a phenomenon called auditory masking. Your physiology means quieter sounds are muted by louder ones, and the closer they are in frequency to each other, the more they’re masked out by your brain. With enhancements like dithering, 16-bit audio can “merely” resolve the aforementioned mosquito next to a 120dB jet engine takeoff. Still dramatic overkill.
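
Those dynamic-range figures come from the usual rule of thumb of roughly 6dB per bit; here is a tiny worked example in Python (the 1.76dB sine-wave term is part of the standard formula, not something from the article):

```python
# Minimal sketch: the theoretical dynamic-range numbers above follow from the
# standard ~6.02 dB-per-bit rule (the 1.76 dB term applies to a full-scale sine).
def dynamic_range_db(bits):
    return 6.02 * bits + 1.76

for bits in (16, 24):
    print(f"{bits}-bit: about {dynamic_range_db(bits):.0f} dB")
# 16-bit: about 98 dB   (close to the ~96 dB usually quoted)
# 24-bit: about 146 dB  (close to the ~144 dB usually quoted)
```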

This is what a 24-bit music file looks like before any data is removed. Frequency is the Y-axis, time is the X axis, and intensity is color.

However, it’s the quieter sounds that many audiophiles claim are the big difference, and that’s partially true. For example, a wider dynamic range allows you to raise the volume further without raising audible noise, and that’s the big sticking point here. While 24 and even 32-bit files have their place in the mixing booth, do they offer any benefit for MP3, FLAC, or OGG files?

Hey kids, try this at home!

While my colleague Rob at Android Authority already proved this with an oscilloscope and some hardcore research, we’re going to perform an experiment that you can do yourself—or just read if you don’t mind spoilers. After scouring the web, I found a couple files on Bandcamp that were actually released in 24-bit lossless files. Many of the ones I found on purported “HD Audio” sites were simply upconverted from 16-bit, meaning they were identical in every way but price. Next, I followed this procedure:

  1. Make a copy of the original 24-bit file
  2. Open the copy in your audio editing program of choice (I suggest Audacity), invert it, and save it as a 16-bit/44.1kHz WAV
  3. Open both the parent file and your newly-edited copy, and export them mixed down as one track
  4. Open the mixed-down track in any program that allows you to view what’s called a spectrogram
  5. Giggle to yourself about spending a lot of money on hi-res audio
Essentially what we just did here is take a 96kHz/24-bit file, then subtract all the data that you can hear in a CD-quality version of itself. What’s left is the difference between the two! This is the exact same principle that Active Noise Canceling is based on. This is the result I got:

While those little purple bits are visible in the spectrogram, they’re well below the threshold of audibility in the presence of music.
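
For readers who prefer code to screenshots, here is a minimal sketch of the same null-test idea, assuming Python with the numpy, scipy and soundfile packages and two hypothetical, sample-aligned file names; a real test would need careful alignment and a better resampler:

```python
# Minimal sketch (assumes numpy, scipy and the soundfile package, plus two
# hypothetical, sample-aligned files): subtract a CD-quality render from the
# hi-res original and measure what is left -- the null-test principle above.
import numpy as np
import soundfile as sf
from scipy.signal import resample

hires, fs_hi = sf.read("original_24bit_96k.wav")     # hypothetical file names
cd, fs_cd = sf.read("same_track_16bit_44k1.wav")

# Bring the CD version up to the hi-res length before subtracting; a real test
# would use a higher-quality resampler and check the alignment carefully.
cd_up = resample(cd, len(hires))

residual = hires - cd_up
peak_db = 20 * np.log10(np.max(np.abs(residual)) + 1e-12)
print(f"peak of the difference signal: {peak_db:.1f} dBFS")
```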

Okay, so there’s a bit of difference in the uppermost reaches of the file, but that’s out of the range of human hearing. In fact, you should probably just filter that out anyway. So let’s show what a human can actually hear by applying a low pass at 20kHz just to cover our bases. Et voila: a final peak of… -85dB at best. Okay, we’re kinda skirting the edges of audibility here, but here’s the problem—in order to actually hear any of this extra data, you need to:

  1. Be listening to music at a level that’s unsafe to listen to for more than 1 minute (96+dB)
  2. Have microphones for ears

While that last point may seem a bit snarky, we know that your brain filters out sounds that are close in frequency to each other (see: auditory masking, linked above). So when you’re listening to music, you’re actually not hearing all the sound at once; you’re just hearing what your brain has separated out for you. So in order to hear the difference between 24-bit/96kHz files and CD-quality audio, the individual sounds can only occupy a very narrow frequency range, must be very loud, and the other notes that occur in the same time period must be very far apart in terms of frequency.

There is no safe listening level to hear the difference between these files.

If we’ve learned anything from this Yanny/Laurel fiasco, a human voice does not fit these criteria (Editor’s note: It’s “Laurel”). So really, the most likely places you’d actually be able to hear the differences between the two are in low frequency notes with somewhat muted harmonics. But there’s a catch: Humans are really bad at hearing low-frequency sounds. In order to hear these notes at equal loudness to higher-frequency notes, you’ll need anywhere from 10 to 40dB of extra power. So those peaks at -87dB in ranges from 20-90Hz may as well be -97 to -127dB, which is outside the range of human hearing. There is no safe listening level to hear the difference between these files.

Cool, huh? It’s always good to know that anyone coming along and telling you that your music collection has to be re-bought because it’s not “high-def” enough is demonstrably wrong. If you’re a budding audiophile, the thing you need to take away from this is to relax: we’re in a golden age of audio here—CD quality is more than fine enough, just enjoy your music! While some may seek higher-quality audio, it’s not necessary if all you want to do is listen to good music.

Is «16 bit CD Quality» Good Enough?

by Pat Brown

Is “CD Quality” Good Enough? In this article, Pat gives you an opportunity to compare 16-bit and 24-bit audio.

Flashback to earlier this spring when I started a discussion thread on the SAC Forum regarding digital audio resolution, citing a study that suggested that CD quality was sufficient to fully capture the frequency response and dynamic range detectable by humans. A fire storm ensued, and I posted that we would conjure up a digital resolution demonstration for the then upcoming SynAudCon Digital seminar, to be held in North Haven, CT. The seminar has come and gone, and we did indeed conduct the demo. Following is what we did, including some resources for replicating it on your own.

I will start by saying that this is a surprisingly controversial topic. Many audio practitioners are insulted by the suggestion that “CD quality” is good enough for their golden ears. We’re pros, right? We should be able to hear the difference between Switchcraft and Neutrik connectors, and we definitely deserve better than “CD Quality.”  In reality, CD quality may not be as bad as you think. Digital technology has long had sufficient bandwidth and dynamic range to satisfy human hearing (Figure 1). Is anything gained by making it better than it needs to be?

Figure 1 – This graphic compares some common digital resolutions. These are theoretical values. The green box highlights the approximate limits of human perception. The dashed red lines indicate the practical limits of current analog performance.

CD Quality

“CD quality” audio resolution uses a 16 bit word for each sample. The sample rate is 44.1 kHz. This is often described as simply “16/44.1k.” This translates into an analog dynamic range of approximately 96 dB, and an analog bandwidth of approximately 22 kHz. Technology broke through these limits long ago, and today the most common bit depth is 24 bits, and sample rates of 192 kHz and beyond are possible (24/192k). Those who deal with professional sound systems rarely encounter 44.1 kHz as a sample rate option. It has been increased to a more logical 48 kHz rate, yielding a bit more high frequency extension. Most DSPs use a 48 kHz sample rate and 24 bit words (24/48k).

The Price of Excess Resolution

Just because something is possible doesn’t mean it is necessary. Increasing the digital audio resolution beyond what is needed can strain playback and recording systems and force compromises that include lower channel counts, more storage space, a heavier processing load for your DSP, and greater required bandwidth for streaming. Higher bandwidth systems may be able to pass the “nasties” that often exist above 20 kHz, such as artifacts from switch-mode power amplifiers and noise shaping circuits. It’s entirely possible that these artifacts will be far higher in level than the harmonic content of the program material that you are trying to reproduce.
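
To put rough numbers on that cost, here is a small back-of-the-envelope calculation in Python; it assumes uncompressed stereo PCM, so real recordings and streams will differ once compression enters the picture:

```python
# Minimal sketch: storage cost of one hour of uncompressed stereo PCM at a few
# common resolutions, to put numbers on the "more storage space" point above.
def pcm_megabytes_per_hour(sample_rate_hz, bits, channels=2):
    bytes_per_second = sample_rate_hz * (bits // 8) * channels
    return bytes_per_second * 3600 / 1e6

for rate, bits in [(44100, 16), (48000, 24), (96000, 24), (192000, 24)]:
    print(f"{bits}-bit/{rate / 1000:g} kHz: {pcm_megabytes_per_hour(rate, bits):.0f} MB per hour")
# 16-bit/44.1 kHz: about 635 MB per hour
# 24-bit/48 kHz:   about 1037 MB per hour
# 24-bit/96 kHz:   about 2074 MB per hour
# 24-bit/192 kHz:  about 4147 MB per hour
```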

We want our bandwidth to be wide enough, but not too wide. At some point, “more” is not necessarily “better,” and in some cases it may be worse.

Digital Audio Resolution Demonstration

Theories and opinions abound on the Internet. As with most things, you can forget going there to get to the truth. The truth, in this case, is what is right for you. Digital resolution is something that you can self-assess, and I have created some resources that can help.

I decided to use the SAC Digital seminar for a resolution experiment. What better scenario than a room full of audio professionals, and three days of training on the fundamentals of digital audio? The idea was to configure a high resolution (24/192k) playback system that seminar attendees could use to compare digital resolutions. It immediately became apparent that 96 kHz of analog bandwidth all the way to the listener is an impossibility. Loudspeaker technology is limited to about half that, to be charitable, and air absorption would wreak havoc on anything higher than 20 kHz, even at a few meters. I dropped my target resolution to 24/96k and proceeded.

My first stab at the playback system involved a custom two-way studio monitor with a ribbon tweeter. A 24/96k plate amplifier with on-board DSP was used to drive the monitor. I loaded the box, configured the plate amp, and measured the result. Once equalized, the monitor was flat to about 40 kHz, which is very near the high frequency limit of my very expensive reference measurement microphone. This makes one ponder the HF limits of most studio and performance mics, but that’s a different topic. The system was minimum phase with the exception of the expected shift produced by the low-order IIR crossover network. I added a subwoofer to extend the low frequency bandwidth.

Photo 1 – A high resolution playback system

 

Coming up with a signal source was not nearly so easy. While many audio interfaces support up to 24/192 resolution for recording, none of them had analog or digital outputs that exceeded 90 dB of dynamic range and 20 kHz of bandwidth, which is not even CD quality. That’s not to say that they don’t exist, but I didn’t have carte blanche on the budget and wanted to use something I already owned. Every attempt to string several components together to form a system resulted in bandwidth compromises that invalidated the demo. And therein lies a very important point – we need a system response that is better than CD quality, and I couldn’t come up with a way to get it (Photo 1).

Sennheiser to the Rescue

The digital seminar happened to be in North Haven, CT, which is very near Sennheiser’s US headquarters. They had some people registered for the class, and offered to help with demo equipment. As luck would have it, they happen to make a headphone playback system with a resolution that exceeds the playback system I was trying to assemble piecemeal. Their HD 800 headphones, driven by the HDVD 800 digital headphone amplifier, are about as good as audio can get (Photo 2). The amplifier includes a USB input with special drivers that support 24/192 playback, so I could play back directly from a PC, eliminating all of the bottlenecks I had been fighting in my previous attempts to assemble a high resolution system.

 

The bandwidth of the headphone system is:

  • 6 Hz – 51 kHz (-10 dB)
  • 14 Hz – 44.1 kHz (-3 dB)

This is probably the practical limit for any playback system, and you can knock some off of each end for even a wide bandwidth sound reinforcement system (24 Hz – 16 kHz).

So, I scrapped my 3-way studio monitor system and went with the headphone playback system. I re-learned a few lessons regarding the limitations of playback systems, mainly that adding another octave or two of high frequency extension is NOT trivial, and without it there may be no need for sample rates that exceed CD quality.

Photo 2 – The Sennheiser HD800 headphone listening system. This is the simplest possible signal chain for experiencing high resolution digital audio. If it is not audible on this system, it is not likely to be audible on any system.

Photo 3 – A SAC Digital attendee listens to the demo, as others await their turn.

The Program Material

The number of possible program sources exceeds infinity. I decided to create a “waveform olympics” track that each attendee could listen to through the system (Figure 2). The track would allow the evaluation of their high frequency hearing acuity and mid-band dynamic range. I also included some high resolution recording bites so that the demo wasn’t just test tones. I created the demo track in Adobe Audition, a professional-quality WAV editor. The resolution selected was 24/96k, because one simply cannot find or make recordings with frequency content higher than about 40 kHz, and playback is an even greater challenge. I know that marketing forces have led us to believe otherwise, but try it (measured results, please). I also theorized that if one couldn’t clearly hear the benefits of a 96 kHz sample rate, there was no point in doubling again to 192 kHz, which is an experiment we could revisit at the fall SynAudCon Digital seminar in Phoenix.

I first planned to use an Audio Precision analyzer (spectrum analyzer mode) to monitor the output of the headphone amplifier. This would prove the frequency response and dynamic range of the system. When set for the required resolution, the responsiveness of the display was less than stellar for monitoring program material. I opted instead to use the spectrum analyzer built into Adobe Audition. It tracked nearly perfectly with the program material, and allowed a linear display that makes the high frequency content clearly visible (Figure 2). The track includes the following:

  1. Linear sine sweep from 20 Hz – 48 kHz. I used a linear sweep because it dwells much longer in the high frequency octaves than a log sweep. The listener judges at what frequency the tone disappears. (A sketch of how signals of this general kind can be generated appears after this list.)
  2. Linear sine sweep from 48 kHz – 20 Hz. The listener judges at what frequency the tone reappears.
  3. Multi-tone fade out for evaluating dynamic range. I picked a series of frequencies in the upper mid-range where human hearing is most sensitive. The tones start at -6 dB full-scale, and fade incrementally into the noise floor. I ended up reducing the initial levels by 10 dB for the live demo because they were deafening and rather startling to the listeners. I left them intact in the download file, so keep that in mind. The level drops are incremental, starting at 10 dB/step at the higher levels and reducing to 5 dB/step at the lower levels. This was necessary to achieve a controlled level reduction that can be judged by looking at the level meters of your wave editor. The noise floor of the Sennheiser playback system was inaudible, so the tones did not fade into noise – rather they faded to inaudibility.
  4. Multi-tone fade in. This is the reverse of the previous track, which allows the listener to judge the lowest level at which they can hear the tones appear rather than disappear. The levels are visible on the vertical axis of the spectrum analyzer, allowing the listener to judge a “dB re. full-scale” level at which they could no longer hear the tones.
  5. High resolution music segment (24/96k) with violins with strong harmonic content above 20 kHz.
  6. The same segment, but resampled to CD quality (16/44.1k).
  7. The same segment, but with the spectral content below 20 kHz eliminated with a brick-wall filter, leaving only the spectral content above 20 kHz. When spectral content beyond 20 kHz does exist on a recording, it is usually very low in level. I normalized this to full-scale, which adds about 30 dB of gain. This greatly increases the likelihood of audibility.
  8. A recording of scissors cutting hair, with strong spectral content above 20 kHz. I processed it in the same ways as the previous track, which yielded the “just better than CD quality” version, as well as the track that only includes the above 20 kHz content, normalized to full scale.
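
As noted in item 1, here is a minimal sketch, assuming Python with numpy and scipy, of how test signals of this general kind can be generated; the frequencies, durations and level steps are illustrative, not the actual parameters of the SynAudCon track:

```python
# Minimal sketch (assumes numpy and scipy): a linear sine sweep and a stepped
# multi-tone fade of the same general kind as the demo track. All frequencies,
# durations and step sizes are illustrative, not the actual track's parameters.
import numpy as np
from scipy.signal import chirp

fs = 96000                                        # 96 kHz sample rate

# Linear sweep from 20 Hz to 48 kHz over 30 seconds.
t = np.arange(30 * fs) / fs
sweep = 0.5 * chirp(t, f0=20, f1=48000, t1=30, method="linear")

# Multi-tone burst in the upper mid-range, faded out in discrete level steps.
tones_hz = [2000, 3000, 4000]
step_levels_db = np.arange(-6, -96, -10)          # from -6 dBFS down toward the floor
burst_t = np.arange(fs) / fs                      # one second per level step
segments = []
for level_db in step_levels_db:
    gain = 10 ** (level_db / 20)
    burst = sum(np.sin(2 * np.pi * f * burst_t) for f in tones_hz) / len(tones_hz)
    segments.append(gain * burst)
fade_out = np.concatenate(segments)
```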

You can download the track for your own experimentation at the end of this article. Keep in mind that you will need a stellar playback system to get the full effect, and the louder you play it, the more dynamic range will be audible. The sample rate of your sound card should be 96 kHz. It probably isn’t now.

Figure 2- The “Waveform Olympics” audio track, with annotations describing each segment.

“Waveform Olympics” – The Movie

The following video is a screen capture of the demonstration. It adds some visual relevance to the explanation. This is Adobe Audition’s “Frequency Analysis” window and “Peak Level Meter” window, placed side-by-side. The frequency (horizontal) axis is linear. This more clearly displays the frequency range that would be missing on a CD, or alternately, be included using a 96 kHz sample rate. I have marked the “CD quality” limits for both frequency and dynamic range. Here is a link to the MOV movie file (~300 MB).

Wave_Olympics_v(1,0)

 

Figure 3 – A frame from the “Waveform Olympics” movie. The limits of CD quality are indicated. Any aspect of the waveform beyond these boundaries would not be preserved at 16/44.1k.

Conclusion

The objective of the experiment was to give each attendee the opportunity to form their own opinion as to the required digital resolution for satisfying the needs of their hearing. It was not a scientific experiment or double-blind A-B test, conducted to publish the results or prove a point. Plenty of those have been done, and the results and conclusions are always a source of contention, doubt, angry blog posts and shouting matches. What is important is for an individual to determine what they believe, by actual experience, rather than from reading the scores of opinions published on the internet. This will profoundly affect your philosophy regarding digital audio, and influence the systems that you design.

I’d be remiss if I didn’t share at least my own opinion, based on the demonstration described. It is as follows. Regarding sample rate, I don’t believe that spectral content above 20 kHz is audible to humans. Period. No one at the seminar claimed to hear the portions of the demo tracks designed to make it audible – under the most controlled conditions I could create. It is a valid argument that higher-than-48 kHz sampling may in some cases be required to produce an end result that is accurate at 20 kHz. But, this is due to poor converters, not the needs of the human auditory system.

Regarding bit depth, I believe that an honest 16 bits is sufficient to fully capture the dynamic range of the linear range of human hearing. The demo track revealed audibility to about -80 dBFS, to be charitable, which equates to about 13 bits. This is from a starting SPL that bordered on uncomfortable, and was at or very near the maximum SPL possible from the playback system. Granted, if your playback system produces 130 dB-SPL then you may indeed be able to hear content that is down 100 dB in level, assuming you are in a room with a very low noise floor. Under these conditions, there could be a benefit from increasing the bit depth beyond 16 bits.

The fact that most converters use 24 bit resolution means that we can waste some of the possible dynamic range and still produce a result that is adequate for human hearing.

So, is “CD Quality” good enough? I’ll stick my neck out and state that properly utilized, it actually IS good enough for the delivery of program material to a human listener. Not only is it good enough, it is probably all that is possible given the many potential bottlenecks in real world playback chains. The fact that we can have higher resolution means that a “CD quality” end result may be achieved using sloppier recording techniques and a non-optimal system gain structure – and that’s a good thing.

The Fine Print

Some important caveats and clarifications follow.

  • This discussion is about the analog resolution that results from 16/44.1k conversion. Cheap 16/44.1k codecs that do not realize this resolution abound. We used great gear for the demo. Yes, the room could have been quieter, but it wasn’t bad, and the use of headphones minimized the impact of the room’s noise floor.
  • I don’t dispute that there are some benefits of using higher sample rates beyond achieving higher playback resolution, such as lower latency in a DSP. It’s a valid argument that it is better to have too much resolution than not enough, just as with pixels in a digital photo. I could agree with using a 96 kHz sample rate for the recording and production processes, where the intent is to end up at CD quality after processing, etc. At least the increased bandwidth will be measurable, if not audible.
  • I don’t dispute that people hear differences between digital devices. This usually gets attributed to resolution, because that is the easiest thing to blame. The actual cause (and there can be many) is usually unknown.
  • It’s great to have a playback system that exceeds the performance level needed. The Sennheiser system we used for the demo was coveted by all, including me, and it is worth every dime of its $3500 price tag. For me it is not because it supports 24/192k resolution, but because it is robust in design and construction and has versatile I/O, including analog. If I owned it, I would run it at 24/48k.
  • In professional sound system work, the 44.1 kHz sample rate has been replaced by 48 kHz, making 24/48k digital audio a standard of sorts, and the highest resolution needed for full-bandwidth audio reproduction. From the listening demo, even a 32 kHz sample rate (16 kHz audio bandwidth) would have been sufficient for most listeners, including myself.
  • “CD quality” performance would be quite an accomplishment for a sound reinforcement system. I doubt that there is a “large room” system in existence that realizes 96 dB of dynamic range and flat frequency response to 22 kHz. An IMAX theater probably comes the closest.
  • A thought-provoking exposé on digital resolution is available here. This is what sparked the Forum discussion to start with. While some flaws have been pointed out in the author’s presentation, they are minor and don’t detract from his main points.

My opinion is just that – my opinion. But, it is based on an honest confrontation with the question, and many hours spent seeking an objective proof of the need for “more” that did not materialize in spite of my best efforts. To me, the real danger lies with those who have formed their opinions from the opinions of others, often based on unsubstantiated claims and anecdotal evidence. There are already those who would brand a DSP operating at 48 kHz as “insufficient” or “low fidelity.” This can ultimately bring marketing pressure on manufacturers to “drink the Kool-Aid” and support the higher resolutions, with the attendant performance trade-offs. Do I really want to double the amount of digital audio information (assuming 96 kHz vs. 48 kHz) to reproduce a frequency range that is not audible to humans? You should decide for yourself.  pb

 

Download the “Wave Olympics” wave file (150 MB)

 

What High-Resolution Audio Formats Hide

Among the reasons for the emergence of new formats was dissatisfaction with the sound quality of CDs. At the dawn of the compact disc era, record labels rushed to reissue their analog catalogs on digital media and cared little about quality: a sound that was even roughly pleasing and free of clicks and other roughness already seemed like a victory.

Equipment manufacturers and record labels thus took a long time to deliver on their promises of top-quality CD sound. As a result, the format suffered lasting damage in the minds of many audiophiles, and the CD turned into a kind of digital «villain». Looking ahead, though, it is worth noting that CDs can sound absolutely wonderful, provided the original recording is of the highest quality and due attention is paid to mastering and production. But first things first.

Lately we have seen a return to vinyl and also a growing interest in high resolution digital files. But are we in danger of falling into the trap of the 1970s, when all attention was focused only on technical characteristics? To answer this question, let’s talk about how and what we hear, as well as about the realities of high-res recording.

First, let’s touch on the technical side of the issue. The CD format, with 16-bit quantization and a 44.1 kHz sampling rate, allows audio to be recorded over a frequency range from 0 Hz to about 22 kHz (slightly wider than the capabilities of human hearing) with a dynamic range of about 95 dB, which is quite enough for the vast majority of musical instruments. The 24-bit 48 kHz format extends the dynamic range to roughly 144 dB and the upper frequency to 24 kHz. Beyond that, many audiophiles prefer 24-bit 96 kHz, with an upper frequency limit of 48 kHz, or 24-bit 192 kHz, with an upper limit of 96 kHz. Such high frequencies are far beyond the capabilities of human hearing, so a simple and reasonable question arises: what exactly is all this for?

Some Hi-Res proponents will say that although they cannot hear anything at these frequencies, they can still «feel» the difference, often describing it as greater «airiness» in the sound. One wonders which sense organ they feel this airiness with. We can indeed «feel» very low frequencies, provided they arrive at high amplitude and from a relatively close distance. As for Hi-Res fans, most likely they perceive the sound as smoother and more coherent because of the high sampling rate. In addition, the ADC and other components used for a 192 kHz recording are likely to be of high quality, which in itself will affect the listening experience.

To test the advantages of Hi-Res in practice, anyone can run an interesting experiment: listen, in random order, to several recordings with different sampling rates on a good digital audio player. Cover the player’s display with something handy so that you have no idea what resolution the current file is playing at. Arm yourself with a pen and notepad (or the notes app on your smartphone), listen to all the recordings and note what exactly you hear and which track sounds better. If you can reliably detect the higher sample rates, you can safely say that you have extraordinary hearing.

Most people past middle age can hear tones only up to about 15 kHz. By the age of 60, this limit can drop to around 12-13 kHz for the average man (and possibly slightly higher for women). A 96 kHz audio signal therefore means little for the perception of sound, although many listeners may well notice the extra smoothness and cohesion in the mids that Hi-Res can offer. Many other factors also affect the sound quality of recordings, and some of them play a far more important role.

For example, one might ask what equipment is used to make the recording. Ironically, many professional condenser microphones from Sennheiser, Beyerdynamic, AKG, Neumann, Shure, Rode, and Audio Technica have frequency responses that drop rapidly just past 20kHz. Some popular mics start to roll off noticeably after 18kHz: they are unlikely to pick up anything at 48kHz or 96kHz, and in most cases this is just fine, because you don’t really want to introduce high-frequency noise into the mix.

So, in order to record really «high-resolution» audio, you first need special microphones that can pick up very high frequencies without introducing too much noise of their own. Next, we need mic preamps and mixers with extended frequency response and ultra-low noise, as well as a high-end analog-to-digital converter. Suppose we have microphones with a flat frequency response from 20 Hz to 96 kHz and ultra-low noise, connected with special audio cables to an ultra-low noise preamplifier. Next, we will send this signal to the mixing section and a high-end analog-to-digital converter that transmits the high-resolution audio signal to a digital recorder or computer with similarly improved characteristics.

And in general, yes, all this is really feasible. Moreover, if you record a violin solo in 24-bit 96 kHz in this way, you will notice that at the highest notes, some harmonics reach a frequency of approximately 28 kHz. A soprano flute can also produce similar harmonics, but whether we are able to hear them is another, no less interesting question. Ultimately, almost all of the sound signal significant to our hearing in recordings of a violin solo may well be contained on a 16-bit CD with a sampling rate of 44.1 kHz.

It is doubly surprising that even a full-fledged orchestra, with its widest dynamic range, can be recorded in 16 bits, provided that the levels are initially set correctly (without resorting to compression). Of course, we should not forget that it is quite possible to generate electronic sounds that go beyond the frequency range of human hearing and a dynamic range of 100 dB. But all this remains, as a rule, at the level of theory.

In conclusion, it is worth noting that due to the higher smoothness and coherence of sound in the midrange, Hi-Res recordings are definitely worthy of the attention of listeners, but only on condition that the audio system allows you to reproduce all these nuances.

Curiously, many audiophiles’ favorite classical recordings were made in the late 50s and early 60s. After all, music is not only about technical characteristics: the determining factor is often the performance and the professionalism of the sound engineer, who can make a good recording even with a minimal set of microphones. And listening to some jazz recordings made in the early 60s, one cannot fail to notice how lively and musical they sound; maybe it is not so important that they are not in Hi-Res.

Bit versus kilohertz: which is more important?

Reflections on the success of the 12-bit drum machine E-Mu SP-1200 and the rather narrow dynamics of the pop/rock repertoire gave rise to heretical thoughts. Are the characteristics of our digital protocols optimal?

Fans of studio-master sound can be as angry as they like, but the fact remains: the Red Book format, at 35 years old (an unthinkable age for a digital technology), is still the main container for commercial recordings. Even if you are listening to a compressed MP3 or iTunes track, it is described by the same 16 bits per sample at a 44.1 kHz reference rate. Is that a lot or a little? It depends on what you measure.

A CD, or a file in a similar format, can provide 16 × 6 = 96 dB between the quietest and the loudest passage. That is a lot. A technical test signal in the lab can push a DAC to such a figure, but I know of no real musical event with such a span. Even the «1812 Overture», cannon shots and all, spans about 60 dB at its loudest peaks and a little more than 20 dB on average. In a modern pop recording, the dynamic range is usually narrower still, by about a factor of three.

According to legend, Philips initially wanted to settle for 14-bit resolution; 14 × 6 = 84 dB, which is still above the rumble level of the most expensive vinyl playback chains. The first generation of Philips TDA1540 DACs operated at 14 bits, and many vintage-gear enthusiasts remain very pleased with that chip to this day.

The first generation of CD players used a Philips TDA1540 14-bit DAC.

In general, CD quality seems to be enough for the most daring audio tasks. And yet, when you compare a Hi-Res master with the standard Red Book CD derived from it, something seems to be lost. Sometimes more, sometimes less, depending on the content. But don’t forget that resampling and bit-depth reduction are carried out by a variety of algorithms, so the final quality of the CD master is always something of a lottery.

My personal experience of fiddling with recording, editing and playing back digital audio has left me with two main points of suspicion. The first looks quite technically justified.

I categorically dislike the fact that, with a 44.1 kHz audio stream, the cutoff frequency lies so low, in the region of 20 kHz. In theory nothing much should be audible up there, but as the digital filter plots of DACs show, the devil knows what goes on in that vicinity: either a hard cut of the recorded spectrum where real life has a gentle roll-off, or the opposite, an early roll-off caused by the specifics of the filter, plus some parasitic harmonics at high frequencies. Their weight relative to the total signal is not large, but the picture is still unsightly. And all this oversampling is required only because it is impossible to build a decent analog filter at 22.05 kHz.

It would have been great if the 50 kHz sampling of the first Soundstream digital recorders had remained the standard in the early 80s, and better still if it had been around 60 kHz. We would then have had a usefully extended frequency response, with a smooth decay of all musical touches and nuances up to 30 kHz, as on a good tape recorder or SACD; there really is nothing above that. But in the end it turned out differently.

Prior to the announcement of the CD, Soundstream digital recorders recorded audio at 16-bit/50 kHz

Sony chose 44.1 kHz for compatibility with the PAL video standard. Professional Betacam and VHS VCRs could be used to record PCM audio, and three sample values fit into each of the 588 usable lines of PAL video transmitted at 25 frames per second: 3 × 588 × 25 = 44,100. That is the arithmetic.

A Sony VCR using a PCM-F1 processor could record digital audio code.

Further development of digital recording and playback technologies used multiples of the basic CD and DAT rates of 44.1 and 48 kHz: 88.2 kHz, 96 kHz and so on. This made it possible to push quantization noise further out into the ultrasonic region, but the size of the audio files also grew accordingly, with another increase of one and a half times when moving from 16 to 24 bits. And what if it is 32-bit? And whenever I try to make this huge audio array a little smaller, my second suspicion kicks in.

It would seem that a resolution of 24 bits or more implies a dynamic range far beyond the limits of human hearing. No joke: 24 × 6 = 144 dB, and neither the technology nor the recordings exist to span that range. The reason studios adopted 24 bits in the first place was to push any accumulated editing errors far out of harm’s way. But as soon as such a file is decimated, even just resampled from 192 to 96 kHz, something subtly changes: slightly different levels, a slightly flatter and duller sound, which I do not much like in comparison. So I choose an original hi-res file not for the abstract frequency figures, but for the absence of the scars a master file picks up on its way down. Let’s try to assess those injuries.

For the experiment, the boutique label 2L was chosen, which offers some of its DXD recordings for free download. I must say that the repertoire, as often happens with audiophile outfits, is rather ponderous and slow. Fortunately, Eugene Bozza’s «Children’s Overture» was found there and pressed into service. This recording rattles along vigorously enough to judge how the sound changes as the master file is transformed.

Initially, the five and a half minutes of the DXD original of the «Children’s Overture», at 24-bit/352.8 kHz, take up a whopping 437 megabytes. And that is compressed FLAC, almost the size of an entire CD! So where can we save?

In the early days of digital audio there were no effective ways of dealing with quantization errors, and processors lacked the computing power anyway. The sizzling 8-bit sound of the first computer games became a stereotype for generations to come, but you are about to see for yourself that 8-bit can sound pretty good today. The miracle cure turned out to be dithering, or, more precisely, its variant, noise shaping.

A very sensible article by iZotope developer Alexey Lukin gives a clear example of how adding a handful of noise helps an image when its resolution is reduced to 4 bits, i.e. 16 gradations of brightness. It is quite a miracle to watch the quantization errors (the so-called posterization) practically disappear. The same thing happens with sound.

Unlike dithering in general, noise shaping places the added noise not across the whole band but only in the high-frequency region, where it is less noticeable to the ear. The reasoning about audibility is similar to that of the MP3 developers, with the difference that here something is added to the spectrum rather than cut from it. Noise shaping makes it possible to extend the usable dynamic range of a recording; it is used heavily in DSD encoding, and traces of its work are also visible in the «Children’s Overture» files.
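
To make the idea concrete, here is a minimal sketch of bit-depth reduction with and without simple TPDF dither, assuming Python with numpy. This is plain dithering rather than the proprietary MBIT+ noise shaping used for the actual test files, but it shows the same trade of correlated distortion for benign noise:

```python
# Minimal sketch (assumes numpy): reducing bit depth with and without TPDF dither.
# This is plain dithering, not iZotope's proprietary MBIT+ noise shaping, but it
# shows the same trade: correlated quantization distortion exchanged for smooth noise.
import numpy as np

def quantize(x, bits, dither=False):
    """Quantize samples in [-1, 1] to the given bit depth."""
    step = 2.0 / (2 ** bits)                       # size of one quantization step
    if dither:
        # TPDF dither: the sum of two uniform noises, +/- one step peak-to-peak
        noise = (np.random.uniform(-0.5, 0.5, x.shape)
                 + np.random.uniform(-0.5, 0.5, x.shape)) * step
        x = x + noise
    return np.clip(np.round(x / step) * step, -1.0, 1.0 - step)

fs = 44100
t = np.arange(fs) / fs
quiet_tone = 0.001 * np.sin(2 * np.pi * 1000 * t)  # roughly -60 dBFS

plain = quantize(quiet_tone, 8)                    # mostly truncated away or distorted
dithered = quantize(quiet_tone, 8, dither=True)    # the tone survives inside smooth noise
```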

So, with the help of a resampler and iZotope’s proprietary MBIT+ noise shaping, a whole heap of «Children’s Overtures» was generated. The result was a stack of FLACs with bit depths of 8, 12, 16, 20 and 24 bits at sampling rates of 44.1 or 88.2 kHz, plus a couple of MP3 samples at a bitrate of 320 kbps: one encoded from the 24-bit/88.2 kHz file, the other from the 16-bit/44.1 kHz file, both of which are also on the list. You can download the archive and decide for yourself what you like.

Of course, the most complete version, 24/88, played the clearest and best of all, almost indistinguishable from the original. I had hoped that dropping to 20 bits would not affect the quality, but that was not the case. So let’s move to the other end of the list.

Sorting the folder by size showed that the smallest sample was the 8-bit/44.1 kHz one: less than 12 megabytes, down from more than 400! Despite the audible noise it sounds surprisingly punchy, and that is not an illusion: after all the mathematics, the level of the recording has risen slightly. The MP3s were next in size, as expected. I do not know about you, but of the whole set I found them the most boring to audition, even though everything in the pauses was clean and tidy. They are just not for me: a crumpled, grey sound without a spark. A noisy but lossless file at a low bit depth, reminiscent of a cassette, is more pleasant to listen to. So let’s continue with those.

About one and a half times larger than the MP3s was the pair of samples at 12-bit/44.1 kHz and 8-bit/88.2 kHz: 19.7 and 23.5 MB respectively. Compared with the basic CD resolution (28.5 MB), additional noise is noticeable only in the 8-bit track, and even then only in headphones. I could not give a clear preference to any one version.

Subjectively, a file with a higher bit depth plays faster and more assertively, especially the 24-bit/44.1 kHz one. But 8-bit and 12-bit audio at the higher sampling rate of 88.2 kHz also has its upsides: more «flexible» decays, and a deeper soundstage thanks to the absence of a digital filter in the audible range. You can also group the tracks by size and compare them yourself.

In terms of quality / size, I would single out the following three, and all of them, alas, rely on an increased sampling rate of 88.2 kHz:

• 12 bit / 88.2 kHz (13 times smaller than the original)

• 8 bit / 88.2 kHz (18.5 times smaller than the original)

• 16 bit / 88.2 kHz (10 times smaller than the original)

Summing up this review, if it were possible to restart the entire digital industry from scratch, I would prefer the following gradation of PCM protocols:

• 60 kHz sampling rate as industry standard

• 120 kHz sampling rate for demanding high-end applications

• 10-bit word length for audio streaming (10 bit / 60 kHz)

• 14-bit word length for standard music distribution (14 bit / 60 kHz)

• 22-bit word length for studio work and audiophile editions of music (22 bit / 60 kHz or 22 bit / 120 kHz)

Audio coding: secrets revealed

Audio settings for video capture and broadcasting.

As people working directly in the AV field, we are constantly talking about audio encoding and audio codecs, but what are they? An audio codec is essentially a device or algorithm capable of encoding and decoding a digital audio signal.

In practice, the audio waves that travel through the air are continuous analog signals. They are converted to digital form by a device called an analog-to-digital converter (ADC), and the reverse conversion is performed by a digital-to-analog converter (DAC). The codec sits between these two functions, and it is what lets you adjust several parameters that matter for successful capture, recording and broadcasting of an audio signal: the codec algorithm, the sample rate, the bit depth and the bit rate.

The three most popular audio codecs are Pulse-Code Modulation (PCM), MP3 and Advanced Audio Coding (AAC). The choice of codec determines the degree of compression and the quality of the recording. PCM is a codec used by computers, CDs, digital phones, and sometimes SACDs. The signal source for PCM is sampled at regular intervals, and each sample represents the amplitude of the analog signal in a digital value. PCM is the simplest option for digitizing an analog signal.

With the right parameters, this digitized signal can be fully reconstructed back to analog without any loss. But this codec, which delivers near-perfect fidelity to the original audio, is unfortunately not very economical, which translates into very large files that are not suitable for streaming. We recommend using PCM when recording source material for later use or when you are post-processing audio.

Fortunately, we always have the option of choosing another codec that compresses the digital data (relative to PCM) based on some useful observations about the behavior of sound waves. In this case you have to accept a compromise: all of the alternative algorithms are lossy, since the original signal cannot be restored exactly, but the result is nevertheless good enough that most users will not be able to tell the difference.

MP3 is an audio encoding format that uses exactly this kind of lossy compression to store the audio signal in smaller files. The MP3 codec is most commonly used to record and store music files. We recommend using MP3 for broadcasting audio content, as it requires less network bandwidth.

AAC is a newer audio coding algorithm that is the successor to MP3. AAC has become the standard for MPEG-2 and MPEG-4 formats. In fact, this is also a digital data compression codec, but with less quality loss than MP3 when encoding with the same bitrates. We recommend using this codec for live streaming.

Sampling frequency (kHz)

The sampling frequency (or sample rate) is the rate at which a signal is sampled when it is digitized, stored, processed or converted from analog to digital. Discretization in time means that the signal is represented by a series of samples taken at regular intervals.

It is measured in hertz (Hz) or kilohertz (kHz); 1 kHz equals 1,000 Hz. For example, 44,100 samples per second is 44,100 Hz, or 44.1 kHz. The chosen sampling frequency determines the maximum frequency that can be reproduced: as follows from the Kotelnikov (Nyquist-Shannon) theorem, to reconstruct the original signal completely, the sampling frequency must be at least twice the highest frequency in the signal's spectrum.

The human ear is known to be able to pick up frequencies between 20 Hz and 20 kHz. Considering these parameters and the values ​​shown in the table below, one can understand why 44.1 kHz was chosen as the sampling rate for CDs and is still considered a very good sampling rate for recording.

There are a number of reasons for choosing a higher sampling rate, even though reproducing sound outside the range of human hearing may seem like a waste of time and effort. At the same time, 44.1-48 kHz is quite enough for the average listener and for high-quality results in most applications.

Bit depth

Along with the sampling rate, there is the notion of bit depth (sometimes called sound depth). Bit depth is the number of bits of digital information used to encode each sample. Simply put, the bit depth determines the «accuracy» with which the input signal is measured: the greater the bit depth, the smaller the error of each individual conversion of the electrical signal's magnitude into a number and back. With the smallest possible bit depth there are only two values available: 0 for complete silence and 1 for full volume. If the bit depth is 8 (or 16), the input signal can be measured with 2^8 = 256 (2^16 = 65,536) different values.
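
For what it's worth, the same relationship in a couple of lines of Python, reproducing the figures quoted above:

```python
# Minimal sketch: the number of distinct values per sample at each bit depth,
# matching the 2^8 = 256 and 2^16 = 65,536 figures above.
for bits in (8, 16, 24):
    print(f"{bits}-bit: {2 ** bits:,} possible values per sample")
# 8-bit:  256
# 16-bit: 65,536
# 24-bit: 16,777,216
```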

The bit depth is fixed in the PCM codec, but for codecs that require compression (for example, MP3 and AAC), this parameter is calculated during encoding and may vary from sample to sample.

Bitrate

Bitrate is the amount of information used to encode one second of sound. The higher it is, the lower the distortion and the closer the encoded track is to the original. For linear PCM the bitrate is very easy to calculate.

bit rate = sample rate × bit depth × channels

For systems such as the Epiphan Pearl Mini that encode 16-bit linear PCM, this calculation can be used to determine how much additional bandwidth PCM audio may require. For example, for a stereo (two-channel) signal digitized at 44.1 kHz and 16 bits, the bit rate is calculated as follows:

44.1 kHz × 16 bits × 2 = 1411.2 kbps
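
The same formula wrapped in a few lines of Python, so other combinations can be plugged in (the extra examples are illustrative, not Epiphan recommendations):

```python
# Minimal sketch: the PCM bit-rate formula from the text, reproducing the
# 1,411.2 kbps figure and trying two other (illustrative) combinations.
def pcm_bitrate_kbps(sample_rate_hz, bit_depth, channels):
    return sample_rate_hz * bit_depth * channels / 1000

print(pcm_bitrate_kbps(44100, 16, 2))   # 1411.2  (CD-style stereo)
print(pcm_bitrate_kbps(48000, 24, 2))   # 2304.0  (24-bit/48 kHz stereo)
print(pcm_bitrate_kbps(96000, 24, 6))   # 13824.0 (24-bit/96 kHz, 5.1 surround)
# Compare with the 96-320 kbps range typical of MP3/AAC streams mentioned below.
```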

Meanwhile, compression algorithms such as AAC and MP3 use fewer bits to transmit the signal (which is their whole purpose), so they use lower bit rates, usually in the range from 96 kbps to 320 kbps. For these codecs, the higher the bitrate you select, the more audio data you get per second, and the higher the sound quality will be.

Sampling frequency, bit depth and bit rates in real life.

Audio CDs, one of the first popular mass-market media for storing digital audio, used a 44.1 kHz sampling rate (covering the 20 Hz - 20 kHz range of human hearing) and a 16-bit depth. These values were chosen to store as much audio as possible on the disc with good sound quality.

When video was added to audio and DVDs, and later Blu-ray discs, appeared, a new standard was created. Recordings for DVD and Blu-ray typically use linear PCM at 48 kHz (stereo) or 96 kHz (5.1 surround) and 24 bits to get the best possible quality from the additional available disc space.

Our recommendation

CDs, DVDs and Blu-ray discs have the same goal — to give the consumer a high quality playback mechanism. The goal of all developments was to provide high quality audio and video, without worrying about the size of the file (as long as it fits on the disk). Linear PCM could provide such quality.

Mobile and streaming media, by contrast, have a completely different goal: to use the lowest possible bitrate that is still sufficient to maintain acceptable quality for the listener.