Normalizing audio is a fairly simple concept, but its applications are not always fully understood. Is normalization always necessary, never necessary, or only applicable to certain situations? Let’s find out.
Should you normalize audio? Normalizing audio is an effective strategy for making samples and vocal takes more consistent in volume before and during mixing, and even as a mastering method to bring a group of final music, podcast or television mixes up to a consistent level. Normalizing should be avoided on the master track and during the pre-master or master bounce-down to avoid inter-sample peaking.
In this article, we’ll discuss what audio normalization is and the two types of normalization. We’ll consider the pros and cons as well as the typical and effective applications of this process.
What Is Normalization?
Normalization is the process of adding or subtracting a certain amount of gain or amplification to bring an audio recording to a target level, otherwise known as the “norm”. It’s a pretty simple concept to grasp. Proper normalizing does not affect the dynamic range of the audio; it simply adds or subtracts gain from the audio to make it louder or quieter, respectively.
Normalization, as an idea, can be applied to analog audio. In practice, however, it is performed on digital audio, which has easily measurable sample values and clearly defined limits.
Normalization became common practice when digital audio workstations began dominating the recording industry in the 1990s. Today, normalization is often regarded negatively in the audio world, losing ground against other, less invasive techniques.
That being said, when used wisely, it can be a great ally in audio editing, mixing, and making audio more consistent. It has applications in music, television, broadcasting, podcasting and more. Further, doing loudness normalization to dialogues and podcasts can enhance their perceived quality considerably.
Normalization is done in one of two ways:
The first method, commonly known as peak normalization, is a simple, linear process. It is achieved by finding the highest peak in the waveform and bringing it to the norm, scaling the rest of the clip proportionally. By applying the same amount of gain across the board, the dynamics are respected, and you get a waveform identical in shape to the original, only louder (or quieter).
The peak normalization process finds the highest PCM sample value of an audio file and applies gain to bring that peak up to a target level, typically 0 dBFS (decibels relative to full scale), which is the upper limit of a digital audio system. Note that this normalization can also be used to bring the audio down and doesn’t necessarily have to adjust the peak level to 0 dBFS (though this is the most common target).
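As a minimal sketch of the idea, here is peak normalization in plain Python, assuming the audio is a list of float samples in the range -1.0 to 1.0 (the function name and parameters are illustrative, not any particular DAW's API):

```python
import math

def peak_normalize(samples, target_dbfs=0.0):
    """Scale `samples` (floats in [-1.0, 1.0]) so the highest
    absolute peak lands at `target_dbfs`."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # silence: nothing to normalize
    # Current peak level in dBFS, then the gain needed to reach the target.
    peak_dbfs = 20 * math.log10(peak)
    gain = 10 ** ((target_dbfs - peak_dbfs) / 20)
    return [s * gain for s in samples]

# A quiet clip peaking at 0.25 (about -12 dBFS) brought up to 0 dBFS:
loud = peak_normalize([0.1, -0.25, 0.2], target_dbfs=0.0)
```

Because every sample is multiplied by the same gain factor, the ratio between loud and quiet moments — the dynamic range — is untouched.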
To learn more about the often confusing subject of decibels in audio, check out my article What Are Decibels? The Ultimate dB Guide For Audio & Sound.
Note that peak normalization is only concerned with detecting the peak of the audio signal and in no way accounts for the perceived loudness of the audio. This brings us to the next type of normalization.
The second method is called loudness normalization and involves much more complex processing.
The reason many people choose this second method is because of the human perception of loudness. At equal dBFS values (and ultimately sound pressure levels), sustained sounds are perceived to be louder than transient sounds.
For example, let’s consider peak normalizing a 2-second clip of a square wave and a 2-second clip of a snare drum hit to 0 dBFS. The square wave, which is sustained, will be perceived as being much louder than the snare hit, even though they’ll both be normalized to a peak value of 0 dBFS.
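You can see this numerically using RMS as a rough stand-in for signal energy (a simplification that ignores frequency weighting). A decaying exponential plays the part of the snare hit here; both signals peak at the same value:

```python
import math

def rms_dbfs(samples):
    """Crude loudness proxy: RMS level in dBFS (no K-weighting)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 20 * math.log10(math.sqrt(mean_square))

n = 1000
square = [1.0 if i % 100 < 50 else -1.0 for i in range(n)]   # sustained, peaks at 1.0
transient = [math.exp(-i / 50.0) for i in range(n)]          # decaying burst, peaks at 1.0

print(round(rms_dbfs(square), 1))     # 0.0 -- sustained signal sits at full scale
print(round(rms_dbfs(transient), 1))  # noticeably lower (around -16 dBFS here)
```

Same peak level, wildly different average energy — which is why the square wave sounds much louder than the snare-like burst.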
Loudness normalizing, on the other hand, will adjust the levels of the recording according to perceived loudness. For this, a different measurement called LUFS (Loudness Units relative to Full Scale) or LKFS (Loudness, K-weighted, relative to Full Scale) is employed. This is a more complex, advanced procedure, and the results are far more consistent to the human ear.
Note that although LKFS and LUFS have different names, they are the same. They are both standard loudness measurement units used for audio normalization in broadcast, television, music, and other recordings.
RMS values could also be used to find the “average” level of the audio, though RMS is not directly related to how we perceive sound. The audible range for human hearing is 20 Hz to 20,000 Hz, though we are more sensitive to certain frequencies (particularly in the 200 Hz to 6,000 Hz range). LUFS/LKFS takes this into account for “perceived loudness” while RMS values do not.
This process typically follows the EBU R 128 (ITU-R BS.1770) measurement standard to find the “average” perceived loudness of an audio file and to adjust the overall perceived loudness accordingly. This normalization process could be used to bring the overall level up or down, depending on the circumstance.
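A genuine R 128 / BS.1770 meter applies a K-weighting filter and gating before averaging, which is beyond a short sketch. But the core idea — measure an average level, then apply one uniform gain to hit a target — can be shown with plain RMS standing in for LUFS:

```python
import math

def rms_normalize(samples, target_dbfs=-14.0):
    """Simplified loudness normalization: apply one uniform gain so
    the plain RMS level hits `target_dbfs`. (A true LUFS measurement
    adds K-weighting and gating per EBU R 128, omitted here.)"""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return list(samples)
    # 10*log10 of mean power equals 20*log10 of the RMS amplitude.
    current_dbfs = 10 * math.log10(mean_square)
    gain = 10 ** ((target_dbfs - current_dbfs) / 20)
    return [s * gain for s in samples]
```

Note that, like peak normalization, this is still a single gain value applied across the whole file — only the measurement used to choose that gain differs.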
Normalization Vs. Dynamic Compression
It’s a common mistake to confuse normalization with dynamic range compression, yet there is a big difference between these two processes.
Dynamic range compression is the process of reducing the dynamic range of an audio signal (the difference in amplitude between its highest and lowest points). Compression does so by attenuating the signal amplitude above a set threshold point, often applying make-up gain afterward to restore the overall level.
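For contrast with normalization’s single gain value, here is a deliberately simplified (static, sample-by-sample) downward compressor in Python. Real compressors smooth the gain changes with attack and release times, which this sketch omits:

```python
import math

def compress(samples, threshold_db=-20.0, ratio=4.0, makeup_db=0.0):
    """Static downward compression: level above `threshold_db` is
    reduced by `ratio`; no attack/release smoothing."""
    threshold = 10 ** (threshold_db / 20)
    makeup = 10 ** (makeup_db / 20)
    out = []
    for s in samples:
        level = abs(s)
        if level > threshold:
            # Excess above the threshold (in dB) is divided by the ratio.
            over_db = 20 * math.log10(level / threshold)
            gain = 10 ** ((over_db / ratio - over_db) / 20)
            out.append(s * gain * makeup)
        else:
            out.append(s * makeup)
    return out
```

With a -20 dB threshold and a 4:1 ratio, a 0 dBFS sample comes out at -15 dBFS, while everything below the threshold passes unchanged — the gain now varies with the signal, which is exactly what normalization never does.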
To learn more about dynamic range compression, check out my article The Complete Guide To Audio Compression & Compressors.
With normalization, as we’ve discussed, the amount of gain applied to the entire recording is consistent, and hence, the dynamic range is preserved. This means the resulting audio is the same as the original, just louder (or quieter).
The Pros & Cons Of Audio Normalization
Now that we understand what normalization is, let’s discuss the benefits and drawbacks of audio normalization.
Audio Normalization Pros
Volume consistency: The most common use of normalization is to level out audio recorded in different conditions and places. For example, it can level the tracks on a record or the episodes of a podcast.
Avoid peaks above 0 dBFS: Peak normalizing a track to maximum loudness will keep its sample values at or below the digital maximum, thereby avoiding digital clipping/distortion (though inter-sample peaks remain possible, as we’ll see). Note that limiters offer a similar result, though, like compressors, they do so by affecting the dynamic range of the audio.
Audio Normalization Cons
Normalization is often destructive: Although DAWs offer up to 999 levels of undo in most cases, not every process can be undone. Indeed, most programs will ask you to create a new version of the file in order to normalize it. If you happen to discard or lose the original, unprocessed file, you won’t be able to undo the normalization and will be left with whatever the result was.
Inter-sample peaks: If the sample rate is too low, the 0 dBFS ceiling may actually be exceeded when normalizing during the bounce-down. As the name suggests, this clipping happens as the digital audio is reconstructed as analog audio: the resulting continuous waveform can exceed the limit between two or more digital samples that were each just below 0 dBFS.
When To Normalize Audio
So, normalizing a recording does have pros and cons, but when is it better to use audio normalization? Let’s take a look at some scenarios:
Samples: Normalizing audio samples ensures that, when mixing, you won’t have to touch levels just to achieve consistency. So, normalizing each sample before it goes into the mix is a great idea.
Album assembly normalization: When putting together a collection of recordings, loudness normalization can be used to ensure there are no volume jumps from track to track. The same process can be applied to episodes in a podcast or television program.
To even out differences in a vocal take: A simple studio trick is to normalize a vocal take, chopping it into sections and using your ears as a guide. Normalizing only the low-volume sections will bring up the volume only where you think it’s lacking and create a more consistent take. Beware of overusing this method, because you can kill the vocal dynamics, so let your ears be your guide.
When To Avoid Normalizing Audio
When bouncing your track: A common mistake is to add a normalization plug-in to the master bus when bouncing. As discussed previously, inter-sample peaking may occur, which will produce distortion and artifacts in the final track that weren’t there in the session.
A pre-master track: If you are about to send a track for mastering and its lower-volume sections genuinely need to be brought up, then normalizing can help. Otherwise, if you’ve already optimized your gain staging, normalizing will cut into the headroom the mastering engineer needs to do his or her job.
A Note On Modern Streaming Services
Some streaming services, like Spotify and YouTube, normalize audio so you don’t have to adjust the volume from one song to the next in a playlist. Knowing this, you can count on the platform’s normalization and abstain from applying it yourself during the bouncing or mastering processes.
Regarding streaming services, here is a shortlist of loudness normalization per popular streaming platform:
| Streaming Service | Loudness Normalization |
|-------------------|------------------------|
| Amazon Music      | -9 to -13 LUFS         |
| Apple Music       | -16 LUFS               |
| Soundcloud        | -8 to -13 LUFS         |
| Spotify           | -13 to -15 LUFS        |
| YouTube           | -13 to -15 LUFS        |
A Note On Normalization’s Role In The Loudness War
The loudness war is a trend of increasing perceived loudness in recorded music and audio at the expense of dynamic range and overall quality.
In the 1980s, just as compact discs (CDs) were becoming popular, it was common practice to peak normalize audio to 0 dBFS.
In the 1990s, the loudness war started as mastering engineers began optimizing perceived loudness in their digital recordings.
The loudness war became widespread in the 2000s, culminating in the infamous Death Magnetic album by Metallica in 2008.
Achieving this loudness had less to do specifically with normalization and more to do with overly aggressive dynamic range processing.
In the 2010s, the loudness war began cooling down. People were largely fed up with the loudness-over-quality mindset. Dynamic music simply sounds better.
Another large part of the cooling of the loudness war is the popularity of streaming services, which we discussed above. As streaming services began loudness normalizing the audio themselves, the additional loudness achieved by dynamic range compression was mitigated. In other words, the loudest songs of the loudness war lost their advantage of being louder than the rest. Still, they maintained the negative consequence of reduced dynamics and increased distortion and pumping.
In this regard, we can thank normalization for helping to reduce and even reverse the trend of loudness over quality in modern music recording.
What is the difference between audio compression and limiting? Dynamic range compression and limiting both work on the same principle of reducing an audio signal’s dynamic range. Limiting is compression with a very high (often infinite) ratio. Compressors reduce signal levels above a set threshold by a ratio. Limiters are designed to set a maximum output level.
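Treating a limiter as an infinite-ratio compressor, the crudest possible sketch is a hard ceiling. (Real limiters use look-ahead and release smoothing to avoid audible distortion; this brickwall clamp is only to illustrate the "maximum output level" idea.)

```python
def hard_limit(samples, ceiling=0.891):
    """Brickwall limiting as infinite-ratio compression: nothing
    passes above the ceiling (0.891 is roughly -1 dBFS). Real
    limiters smooth the gain reduction; omitted here."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]
```

Anything already under the ceiling is untouched; anything over it is pinned exactly at the ceiling.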
Further reading: What Is The Difference Between Audio Compression & Limiting?
What is upward compression? Upward compression is a type of dynamic range compression that boosts the amplitude of an audio signal below a certain threshold while maintaining the amplitude above the threshold. Upward compression is available in digital plugins and via parallel compression with hardware or software.
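As a simplified sketch of the idea (static gain, no attack/release smoothing), upward compression boosts quiet samples toward a threshold while leaving louder ones alone:

```python
import math

def upward_compress(samples, threshold_db=-30.0, ratio=2.0):
    """Static upward compression: level *below* `threshold_db` is
    boosted toward the threshold by `ratio`; level above it passes
    through unchanged."""
    threshold = 10 ** (threshold_db / 20)
    out = []
    for s in samples:
        level = abs(s)
        if 0 < level < threshold:
            # Deficit below the threshold (in dB) is divided by the ratio.
            under_db = 20 * math.log10(threshold / level)
            gain = 10 ** ((under_db - under_db / ratio) / 20)
            out.append(s * gain)
        else:
            out.append(s)
    return out
```

With a -30 dB threshold and a 2:1 ratio, a sample at -60 dBFS comes out at -45 dBFS — halfway (in dB) toward the threshold — while everything above the threshold is left intact.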
Further reading: What Is Upward Dynamic Range Compression In Audio?