In the world of hi-fi audio and audio production, data compression is one of the most important factors listeners and creators consider when judging the quality of their audio files.
How does digital audio data compression work? Audio data compression works by removing bits from a digital audio file to reduce its size. Lossy compression removes “nonessential details” irreversibly (notably frequency content). Lossless compression removes “statistical redundancies” to reduce file size reversibly without impacting the audio.
In this article, we'll discuss how audio data compression works, how it affects the resulting sound of audio files and why it's used.
Why Do We Need To Compress Digital Audio Files?
Digital audio data compression is a powerful tool for massively reducing file sizes, thereby facilitating the easy sharing of music and recordings across the web. Simply put, without digital audio compression, the modern audio streaming landscape would not exist.
I should mention that audio, whether analog or digital, uncompressed, lossy-compressed or lossless-compressed, is only a representation of sound. An audio signal must be played back through a transducer (speakers, headphones) and transmitted to our ears in order to be heard.
To learn more about the differences between audio and sound, check out my article What Is The Difference Between Sound And Audio?
Audio compression aims to reduce the number of bits involved in accurately reproducing a sound. It's important to remember that a digital file – whether it's audio, video, software, a Word document, or whatever else – is just a string of binary digits (bits). Compression aims to remove any unnecessary bits while still maintaining an accurate sound.
Uncompressed audio files are huge! For example, a CD-quality (44.1 kHz, 16-bit, stereo) WAV file has a file size of approximately 10 MB per minute. Lossy compressed audio formats like MP3 often cut this to roughly a fifth of the size.
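The 10 MB per minute figure can be checked with a little arithmetic. The following is a minimal sketch, assuming two-channel (stereo) audio and ignoring the small WAV header; the function name `pcm_bytes_per_minute` is made up for illustration.

```python
def pcm_bytes_per_minute(sample_rate_hz, bit_depth, channels):
    """Raw PCM payload for 60 seconds of audio (WAV header ignored)."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * 60


# CD quality: 44.1 kHz, 16-bit, stereo (stereo is an assumption here)
cd_stereo = pcm_bytes_per_minute(44_100, 16, 2)
print(cd_stereo)                # 10584000 bytes
print(cd_stereo / 1_000_000)    # 10.584 (MB, taking 1 MB = 1,000,000 bytes)

# The same stream expressed as a bit rate, for comparison with MP3 bit rates
print(44_100 * 16 * 2 / 1000)   # 1411.2 (kbps)
```

That 1,411.2 kbps figure is the baseline that lossy bit rates such as 128 kbps or 320 kbps are measured against.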
When sharing and streaming audio files across the web, uncompressed files take up lots of bandwidth and disk space for end-users. These technological demands, along with the technological limitations of the time, are why the compressed MP3 format became so popular for online music sharing in the early 2000s.
How Does Audio Compression Work?
An all-important question for producers and listeners is how digital audio data compression works. What does it mean when an audio file is lossless, and how does that compare to lossy compression or no compression at all?
Audio data compression works by removing the bits that we don't need from a sound file. There are two main types of audio compression: lossless and lossy.
Lossless Audio Compression
With lossless compression, no data is permanently lost through compression. Instead of removing sounds that we either can't hear or struggle to perceive, lossless compression removes any redundant data.
The most popular lossless audio format is FLAC. Other lossless audio formats include:
- ALAC (Apple Lossless)
- WMA Lossless (Windows Media Audio Lossless)
- Monkey's Audio
- SHN (Shorten)
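To see what "reversible" means in practice, here is a small sketch of a lossless round trip. It is an analogy only: it uses Python's general-purpose `zlib` compressor as a stand-in, whereas FLAC and the other formats listed above use audio-specific techniques (FLAC, for instance, uses linear prediction with Rice coding). The helper `sine_pcm16` is hypothetical and simply generates a test tone.

```python
import math
import struct
import zlib


def sine_pcm16(freq_hz=440.0, seconds=1.0, sample_rate=44_100):
    """Generate a mono 16-bit PCM sine tone as raw little-endian bytes."""
    n = int(seconds * sample_rate)
    samples = (
        int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n)
    )
    return struct.pack(f"<{n}h", *samples)


original = sine_pcm16()
packed = zlib.compress(original, 9)     # remove statistical redundancy
restored = zlib.decompress(packed)      # rebuild the exact original bytes

print(len(original), len(packed))       # compressed size is smaller
print(restored == original)             # True: nothing was lost
```

A pure tone is highly repetitive, so it compresses far better than real music would. The point is only that every bit comes back after decompression.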
Lossy Audio Compression
Far more common, however, is lossy compression. This is any compression method that irreversibly removes some information from the original representation.
The most popular lossy format is MP3. Other lossy audio formats include:
- AAC (Advanced Audio Coding)
- WMA Lossy (Windows Media Audio Lossy)
How do lossy compression methods decide which data to remove and which to keep?
Firstly, the human ear can't hear the entire sound spectrum, and lossy compression processes remove any sound outside of our hearing range. This range is generally given as 20 Hz to 20 kHz.
Though we can hear frequencies within the 20 Hz – 20,000 Hz range, we're most sensitive to a smaller range, generally given as 100 Hz to about 6 kHz. Therefore, in theory, any quiet content in the low-end and high-end can also be removed without a noticeable impact on the overall sound quality.
For example, here are the relationships between the bitrate and the high-end frequency cutoff points for MP3 files (assuming two-channel/stereo audio with a sample rate of 44.1 kHz and bit depth of 16-bit):
| Bit Rate | High-End Frequency Cutoff | File Size |
|---|---|---|
| 320 kbps | ~20.5 kHz | 2.4 MB/min |
| 256 kbps | ~20 kHz | 1.92 MB/min |
| 192 kbps | ~18 kHz | 1.44 MB/min |
| 160 kbps | ~17 kHz | 1.2 MB/min |
| 128 kbps | ~16 kHz | 960 kB/min |
| 96 kbps | ~15 kHz | 720 kB/min |
| 64 kbps | ~11 kHz | 480 kB/min |
| 32 kbps | ~5 kHz | 240 kB/min |
Note that, for technical reasons related to scalefactor band 21 of the MP3 coding format, many MP3 encoders apply a low-pass filter at around 16 kHz, particularly at lower bit rates. Therefore, even when a file could theoretically store audio information at these high frequencies, that content is often rolled off anyway.
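The file-size column in the table above follows directly from the bit rate, since a constant-bit-rate file spends the same number of bits on every second of audio. Here is a minimal sketch of that conversion, assuming constant bit rate, 1 kb = 1,000 bits and 1 MB = 1,000,000 bytes; the function names are illustrative only.

```python
CD_BITRATE_KBPS = 44_100 * 16 * 2 / 1000   # uncompressed CD-quality stereo


def mp3_mb_per_minute(bitrate_kbps):
    """Convert kilobits per second into megabytes per minute."""
    bits_per_minute = bitrate_kbps * 1000 * 60
    return bits_per_minute / 8 / 1_000_000


def ratio_vs_cd(bitrate_kbps):
    """How many times smaller the stream is than uncompressed CD audio."""
    return CD_BITRATE_KBPS / bitrate_kbps


for kbps in (320, 256, 192, 160, 128, 96, 64, 32):
    print(f"{kbps:>3} kbps: {mp3_mb_per_minute(kbps):.2f} MB/min, "
          f"about {ratio_vs_cd(kbps):.1f}x smaller than CD audio")
```

The frequency cutoffs in the table cannot be derived this way. They depend on the encoder and its settings.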
Lossy compression methods also use a neat psychoacoustic hack known as auditory masking.
Used by codecs like MP3, auditory masking (also known as sound masking) exploits the phenomenon that weaker sound signals are imperceptible in the presence of strong, loud sound signals. For example, in orchestral music, loud instruments can "mask out" softer, quieter sounds in the mix.
Therefore, it's safe to remove these sound signals as long as they're removed with respect to the right masked threshold. The MP3 format does just that, with an algorithm that finds and removes “masked” information from the digital data to free up space.
However, this threshold is non-linear and varies from person to person and with the sounds involved (notably the amplitude of the louder sound).
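The idea of a masked threshold can be illustrated with a toy model. This is not the psychoacoustic model MP3 encoders actually use: the single-masker assumption, the symmetric slope and the `offset_db` and `slope_db_per_octave` values are all made up for illustration. Real masking curves are asymmetric and depend on level, as noted above.

```python
import math


def masked_threshold_db(masker_freq, masker_db, freq,
                        offset_db=10.0, slope_db_per_octave=15.0):
    """Toy threshold: falls off linearly (in dB) with octave distance from the masker."""
    octaves = abs(math.log2(freq / masker_freq))
    return masker_db - offset_db - slope_db_per_octave * octaves


def drop_masked(components):
    """Keep the loudest component plus everything at or above its toy threshold."""
    masker = max(components, key=lambda c: c[1])
    kept = []
    for freq, level_db in components:
        threshold = masked_threshold_db(masker[0], masker[1], freq)
        if (freq, level_db) == masker or level_db >= threshold:
            kept.append((freq, level_db))
    return kept


# (frequency in Hz, level in dB): one loud tone and three quieter ones
components = [(1000, 80), (1100, 50), (4000, 55), (250, 30)]
print(drop_masked(components))   # [(1000, 80), (4000, 55)]
```

In this sketch the quiet 1,100 Hz tone sits right next to the loud 1 kHz tone and falls below the threshold, so it is discarded. The 4 kHz tone is far enough away, and loud enough, to survive.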
MP3 handles masking well. Even at a six-to-one compression ratio, expert listeners reportedly can't distinguish compressed MP3 tracks from the original audio clips (source).
In fact, audio is the area where lossy compression is most successful. Most lossy compression methods in images and video noticeably degrade the visual quality. Lossy compression is highly unsuitable for other files where all of the data needs to be preserved (like a spreadsheet or Word document).
By combining the intensive studies of psychoacoustics and computer science, engineers have produced an efficient and imperceptible lossy compression algorithm.
Should I Use Compressed Audio For My Projects?
The choice between lossy compression, lossless compression and uncompressed audio depends on what you intend to use your audio files for.
Lossy compression does have its disadvantages. As some data is destroyed, lossy compressed audio formats like MP3 aren't suitable for recording stems and takes to create a master mix. For example, if you record vocals or instruments in a lossy format, and the file is re-encoded every time a destructive edit is made, you're slowly chipping away at its sound quality.
During production, you should be using lossless audio (like FLAC or ALAC) or an uncompressed audio format. The most popular format for production is WAV – which doesn't use any compression during encoding.
However, once it's time to deliver your project to a listener, a high-quality MP3 is more than fine. You would do well to encode at 320 kbps, as low bit rates can hurt sound quality.
To learn about digital audio formats in much more detail, check out my article Complete Guide To Digital Audio Formats (MP3, WAV, & More).