What Happens to My Recording When It's Played on the Radio?
by Frank Foti, Omnia Audio and Robert Orban, CRL/Orban
Few people in the record industry really know how a radio station processes their material before it hits the FM airwaves. This article's purpose is to remove the many myths and misconceptions surrounding this arcane art.
Every radio station uses a transmission audio processor in front of its transmitter. The processor's most important function is to control the peak modulation of the transmitter to the legal requirements of the regulatory body in each station's nation. However, very few stations use a simple peak limiter for this function. Instead, they use more complex audio chains. These can accurately constrain peak modulation while significantly decreasing the peak-to-average ratio of the audio. This makes the station sound louder within the allowable peak modulation.
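To make "peak-to-average ratio" concrete, here is a minimal Python sketch. It assumes that "average" means RMS level, which is one common reading and not something the article specifies. The function name and test signals are illustrative only.

```python
import math

def peak_to_average_db(samples):
    """Peak-to-RMS ratio of a block of samples, in dB (RMS is an assumed definition of 'average')."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(peak / rms)

# One second of a 100 Hz sine sampled at 48 kHz.
sine = [math.sin(2 * math.pi * 100 * n / 48000) for n in range(48000)]

# The same sine driven 12 dB into a hard clipper at +/-1.0.
clipped = [max(-1.0, min(1.0, 4.0 * s)) for s in sine]

print(round(peak_to_average_db(sine), 2))     # about 3 dB for a pure sine
print(round(peak_to_average_db(clipped), 2))  # well under 1 dB once the tops are flattened
```

Both signals hit the same peak, but the clipped one carries far more average energy. That is why it sounds louder within the same peak modulation limit.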
Garbage In, Garbage Out
Manufacturers have tuned broadcast processors to process the clean, dynamic program material that the recording industry has typically released throughout its history. (The only significant exception that comes to mind is 45-rpm singles, which often were overtly distorted.) Because these processors have to process speech, commercials, and oldies in addition to current material, they can't be tuned exclusively for "hypercompressed," distorted CDs. Indeed, experience has shown that there's no way to tune them successfully for this degraded material.
For 20 years, broadcast processor designers have known that achieving the highest loudness consistent with maximum punch and cleanliness requires extremely clean source material. For more than 20 years, Orban has published application notes to help broadcast engineers clean up their signal paths. These notes emphasize that any clipping in the path before the processor will cause subtle degradation that the processor will often exaggerate severely. The notes promote adequate headroom and low-distortion amplification to prevent clipping even when an operator drives the meters into the red.
About three years ago, we started to notice CDs arriving at radio stations that had been pre-distorted in production or mastering to increase their loudness. For the first time, we started seeing frequently recurring flat-topping caused by brute-force clipping in the production process. Broadcast processors react to pre-distorted CDs exactly the same way as they have reacted to accidentally clipped material for more than 20 years: they exaggerate the distortion. Because of phase rotation, the source clipping never increases on-air loudness; it just adds grunge.
The authors understand the reasoning behind the CD loudness wars. Just as radio stations wish to offer the loudest signal on the dial, it is evident that recording artists, producers, and even some record labels want to have a loud product that stands out against its competition in a CD changer or a music store's listening station.
In radio broadcasting, this competition has existed for at least the last 25 years. Twenty-five years ago, radio stations used simple clipping to get louder, and this 25-year-old technique has now migrated to the music industry. The following graphic shows a section of a severely clipped waveform from a contemporary CD. The area marked between the two pointers highlights the clipped portion. This is one of the roots of the problem as described in this paper; the other is excessive digital limiting that does not necessarily cause flat-topping, but still removes transient punch and impact from the sound.
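Flat-topping of this kind can also be spotted numerically. The sketch below is one simple heuristic, not a method described by the authors. It reports the longest run of consecutive samples sitting at the block's peak level. The function name and the tolerance are assumptions chosen for illustration.

```python
import math

def longest_flat_top(samples, tolerance=1e-4):
    """Length of the longest run of consecutive samples at (or within tolerance of) the peak level."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return 0
    longest = run = 0
    for s in samples:
        if abs(s) >= peak - tolerance:
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return longest

# A clean 1 kHz sine at 48 kHz touches its peak for a single sample per half cycle.
clean = [math.sin(2 * math.pi * 1000 * n / 48000) for n in range(4800)]

# The same sine driven 12 dB into a hard clipper sits at the ceiling for many samples in a row.
clipped = [max(-1.0, min(1.0, 4.0 * s)) for s in clean]

print(longest_flat_top(clean), longest_flat_top(clipped))
```

On real program material the threshold between "natural peak" and "flat top" is a judgment call. A run of only a few samples may be harmless, while runs of dozens of samples are the kind of clipping described above.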
The problem today is that we now have sophisticated and powerful audio processing for the broadcast transmission system and this processing does not coexist well with a signal that has already been severely clipped. Unfortunately, with current pop CDs, the example shown above is more the norm than the exception.
The attack and release characteristics of broadcast multiband compression were tuned to sound natural with source material having short-term peak-to-average ratios typical of vinyl or pre-1990 CDs. Excessive digital limiting of the source material radically reduces this short-term peak-to-average ratio and presents the broadcast processor with a new, synthetic type of source that the broadcast processor handles less gracefully and naturally than it handles older material. Instead of being punchy, the on-air sound produced from these hypercompressed sources is small and flat, without the dynamic contours that give music its dramatic impact. The on-air sound resembles musical wallpaper and makes the listener want to turn down the volume control to background levels.
There is a myth that broadcast processing will affect hypercompressed material less than it will more naturally produced material. This is true in only one aspect: if there is no long-term dynamic range coming in, then the broadcast processor's AGC will not further reduce it. However, the broadcast processor will still operate on the short-term envelopes of hypercompressed material and will further reduce the peak-to-average ratio, degrading the sound even more.
Hypercompressed material does not sound louder on the air. It sounds more distorted, making the radio sound broken in extreme cases. It sounds small, busy, and flat. It does not feel good to the listener when turned up, so he or she hears it as background music. Hypercompression, when combined with "major-market" levels of broadcast processing, sucks the drama and life from music. In more extreme cases, it sounds overtly distorted and is likely to cause tune-outs by adults, particularly women.
A Typical Processing Chain: What Really Goes On When Your Recording Is Broadcast
A typical chain consists of the following elements, in the order that they appear in the chain:
1. Phase Rotator. The phase rotator is a chain of allpass filters (typically four poles, all at 200Hz) whose group delay is very non-constant as a function of frequency. Many voice waveforms (particularly male voices) exhibit as much as 6dB asymmetry. The phase rotator makes voice waveforms more symmetrical and can sometimes reduce the peak-to-average ratio of voice by 3-4dB. Because this processing is linear (it adds no new frequencies to the spectrum, so it doesn't sound raspy or fuzzy) it's the closest thing to a "free lunch" that one gets in the world of transmission processing.
There are a few prices to pay. In the good old days when source material wasn't grossly clipped, the main price was a very subtle reduction in transparency and definition in music. This was widely accepted as a valid trade-off to achieve greatly reduced speech distortion, because the phase rotator's effects on music are unlikely to be heard on typical consumer radios, like car radios, boom boxes, "Walkman"-style portables, and table radios.
However, with the rise of the clipped CD, things have changed. The phase rotator radically changes the shape of its input waveform without changing its frequency balance. A magnitude-only measurement of its frequency response would read "flat"; only a measurement that also captured phase would show that the phase response is highly non-linear with frequency. The practical effect of this non-linear phase response is that flat tops in the original signal can end up anywhere in the waveform after processing. It's common to see them go right through a zero crossing. They end up looking like little smooth sections of the waveform where all the detail is missing, a bit like a scar from a severe burn. This is an apt metaphor for their audible effect, because they no longer help reduce the peak-to-average ratio of the waveform. Instead, their only effect is to add unnecessary grungy distortion.
There has been a myth in the recording world that broadcast processing will modify these clipped, over-compressed CDs less than it will modify clean, dynamic CDs. Thanks in part to phase rotation, this myth is absolutely false. In particular, any clipping in the source material causes nothing but added distortion without increasing on-air loudness at all.
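The effect of phase rotation on a flat-topped waveform can be sketched in a few lines of Python. This is a simplified model, not the circuit of any particular processor. It assumes the "four poles, all at 200Hz" are four cascaded first-order digital allpass sections, and the sample rate, test tone, and function names are illustrative choices.

```python
import math

def allpass_coefficient(corner_hz, sample_rate_hz):
    """Coefficient of a first-order digital allpass with 90 degrees of phase shift at corner_hz."""
    t = math.tan(math.pi * corner_hz / sample_rate_hz)
    return (t - 1.0) / (t + 1.0)

def phase_rotate(samples, corner_hz=200.0, sample_rate_hz=48000.0, sections=4):
    """Run samples through a cascade of identical first-order allpass sections."""
    a = allpass_coefficient(corner_hz, sample_rate_hz)
    out = list(samples)
    for _ in range(sections):
        x_prev = y_prev = 0.0
        stage = []
        for x in out:
            y = a * x + x_prev - a * y_prev  # H(z) = (a + z^-1) / (1 + a z^-1)
            stage.append(y)
            x_prev, y_prev = x, y
        out = stage
    return out

def peak(samples):
    return max(abs(s) for s in samples)

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

FS = 48000
# A 100 Hz sine driven 20 dB into a hard clipper: a flat-topped, nearly square wave.
flat_topped = [max(-1.0, min(1.0, 10.0 * math.sin(2 * math.pi * 100 * n / FS)))
               for n in range(FS)]
rotated = phase_rotate(flat_topped)

# Compare the last 0.1 s (ten whole cycles), after the filter's start-up transient has decayed.
tail_in, tail_out = flat_topped[-4800:], rotated[-4800:]
print("RMS in/out: ", round(rms(tail_in), 3), round(rms(tail_out), 3))
print("peak in/out:", round(peak(tail_in), 3), round(peak(tail_out), 3))
```

The RMS level is essentially unchanged because an allpass filter has flat magnitude response. The peak level rises noticeably because the flat tops no longer line up with the waveform's extremes. The clipping distortion is still in the signal, but the peak control it was meant to buy is gone.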
2. AGC. The next stage is usually an average-responding AGC. By recording studio standards, this AGC is required to operate over a very wide dynamic range, typically about 25dB. Its function is to compensate for operator errors (in live production environments) and for varying average levels (in automated environments). Average levels vary mainly because the peak-to-average ratio of CDs themselves has varied so much in the last 10 years or so. Therefore, normalizing hard disk recordings (to use all available headroom) has the undesirable side effect of causing gross variations in average levels. Indeed, 1:1 transfers (which are also common) will also exhibit this variation, which can be as large as 15dB.
The price to be paid is simple: the AGC will eliminate long-term dynamics in your recording. Virtually all radio station program directors want their stations to stay loud always, eliminating the risk that someone tuning the radio to their station will either miss the station completely or will think that it's weak and can't be received satisfactorily. Radio people often call this effect "dropping off the dial."
AGCs can be either single-band or multiband. If they are multiband, it's rare to use more than two bands because AGCs operate slowly, so "spectral gain intermodulation" (such as bass pumping the midrange) is not as big a potential problem as it is for later compression stages, which operate more quickly.
AGCs are always gated in competent processors. This means that their gain essentially freezes if the input drops below a preset threshold, preventing noise suck-up despite the large amount of gain reduction.
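The gating behavior can be illustrated with a toy, block-based AGC in Python. This is a sketch under stated assumptions, not a model of any commercial processor. The target level, gate threshold, step size, and the function name are all hypothetical, and only the 25 dB gain range echoes the figure quoted above.

```python
def gated_agc(block_levels_db, target_db=-20.0, gate_db=-45.0,
              max_step_db=0.5, max_gain_db=25.0):
    """Return the AGC gain (in dB) in effect for each block of measured average level."""
    gain_db = 0.0
    gains = []
    for level_db in block_levels_db:
        if level_db > gate_db:
            # Gate open: move slowly toward the gain that would put this block on target.
            error_db = target_db - (level_db + gain_db)
            step_db = max(-max_step_db, min(max_step_db, error_db))
            gain_db = max(-max_gain_db, min(max_gain_db, gain_db + step_db))
        # Gate closed: the gain is frozen, so pauses and noise are not pulled up.
        gains.append(gain_db)
    return gains

# Quiet program, then a pause containing only low-level noise, then program again.
levels = [-30.0] * 40 + [-60.0] * 40 + [-30.0] * 5
gains = gated_agc(levels)
print(gains[39], gains[79], gains[-1])
```

During the 40 noise-only blocks the gain holds where the program left it. Without the gate, the same loop would keep raising the gain toward its 25 dB limit and drag the noise up with it.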
3. Stereo Enhancement. Not all processors implement stereo enhancement, and those that do may implement it somewhere other than after the AGC. (In fact, stand-alone stereo enhancers are often placed in the program line in front of the transmission processor.)
The common purpose of stereo enhancement is to make the signal stand out dramatically when the car radio listener punches the tuning button. It's a technique to make the sound bigger and more dramatic. Overdone, it can remix the recording. For example, if stereo reverb with considerable L-R energy was used in the original mix, stereo enhancement can change the amount of reverb applied to a center-channel vocalist. The moral? When mixing for broadcast, err on the "dry" side, because some stations' processors will bring the reverb more to the foreground.
Because each manufacturer uses a different technique for stereo enhancement, it's impossible to generalize about it. The only universal constraints are the need for strict mono compatibility (because FM radio is frequently received in mono, even on "stereo" radios, due to signal-quality-triggered mono blend circuitry), and the requirement that the stereo difference signal (L-R) not be enhanced excessively. Excessive enhancement always increases multipath distortion (because the part of the FM stereo signal that carries the L-R information is more vulnerable to multipath). Excessive enhancement will also reduce the loudness of the transmission (because of the "interleaving" properties of the FM stereo composite waveform, which we won't further discuss).
These constraints mean that recording-studio-style stereo enhancement is often incompatible with FM broadcast, particularly if it significantly increases average L-R levels. In the days of vinyl, a similar constraint existed because of the need to prevent the cutter head from lifting off the lacquer, but with CDs, this constraint no longer exists. Nevertheless, any mix intended for airplay will yield the lowest distortion and highest loudness at the receiver if its L-R/L+R ratio is low. Ironically, mono is loudest and cleanest!
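A mix's L-R/L+R ratio is easy to check before it leaves the studio. The Python sketch below computes the ratio as an energy ratio in dB over a whole block. That is one reasonable interpretation rather than a standard the article defines, and the function name is illustrative.

```python
import math

def side_to_mid_db(left, right):
    """Energy of the L-R (difference) signal relative to the L+R (sum) signal, in dB."""
    mid_energy = sum((l + r) ** 2 for l, r in zip(left, right))
    side_energy = sum((l - r) ** 2 for l, r in zip(left, right))
    if side_energy == 0.0:
        return float("-inf")  # pure mono: no difference signal at all
    if mid_energy == 0.0:
        return float("inf")   # pure out-of-phase material: no sum signal
    return 10.0 * math.log10(side_energy / mid_energy)

tone = [math.sin(2 * math.pi * 440 * n / 48000) for n in range(4800)]
silence = [0.0] * len(tone)

print(side_to_mid_db(tone, tone))     # identical channels (mono)
print(side_to_mid_db(tone, silence))  # hard-panned to one side: L-R equals L+R, so 0 dB
```

Lower (more negative) values indicate a mix closer to mono, which by the reasoning above should survive FM transmission with less multipath distortion.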
4. Equalization. Equalization may be as simple as a fixed-frequency bass boost, or as complex as a multi-stage parametric equalizer. EQ has two purposes in a broadcast processor. The first is to establish a signature for a given radio station that brands the station by creating a "house sound." The second purpose is to compensate for the frequency contouring caused by the subsequent multiband dynamics processing and high frequency limiting. These may create an overall spectral coloration that can be corrected or augmented by carefully chosen fixed EQ before the multiband dynamics stages.
5. Multiband Compression and Limiting. Depending on the manufacturer, this may occur in one or two stages. If it occurs in two stages, the multiband compressor and limiter can have different crossovers and even different numbers of bands. If it occurs in one stage, the compressor and limiter functions can "talk" to each other, optimizing their interaction. Both design approaches can yield good sound and each has its own set of tradeoffs.
Usually using anywhere between four and six bands, the multiband compressor/limiter reduces dynamic range and increases audio density to achieve competitive loudness and dial impact. It's common for each band to be gated at low levels to prevent noise rush-up, and manufacturers often have proprietary algorithms for doing this while minimizing the audible side effects of the gating.
An advanced processor may have dozens of setup controls to tune just the multiband compressor/limiter. Drive and output gain controls for the various compressors, attack and release time controls, thresholds, and sometimes crossover frequencies are adjustable, depending on the processor design. Each of these controls has its own effect on the sound, and an operator needs extensive experience if he or she is to tune a broadcast multiband compressor so that it sounds good on a wide variety of program material without constant readjustment. Unlike mastering in the record industry, in broadcast there's no mastering engineer available to optimize the processing for each new source!
6. Pre-Emphasis and HF Limiting. FM radio is pre-emphasized at 50 microseconds or 75 microseconds, depending on the country in which the transmission occurs. Pre-emphasis is a 6dB/octave high frequency boost that's 3dB up at 2.1kHz (75µs) or 3.2kHz (50µs). With 75µs pre-emphasis, 15kHz is up 17dB!
Depending on the processor's manufacturer, pre-emphasis may be applied before or after the multiband compressor/limiter. The important thing for mixers and mastering engineers to understand is that putting lots of energy above 5kHz creates significant problems for any broadcast processor because the pre-emphasis will greatly increase this energy. To prevent loudness loss, the processor applies high frequency limiting to these boosted high frequencies. HF limiting may cause the sound to become dull, distorted, or both, in various combinations. One of the most important differences between competing processors is how effectively a given processor performs HF limiting to minimize audible side effects. In state-of-the-art processors, HF limiting is usually performed partially by HF gain reduction and partially by distortion-cancelled clipping.
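The pre-emphasis figures quoted above follow from the response of a single-zero (first-order) high-frequency boost with time constant τ, whose gain is 10·log10(1 + (2πfτ)²) dB. The Python sketch below evaluates that standard formula. The function names are illustrative.

```python
import math

def preemphasis_corner_hz(tau_s):
    """Frequency at which a first-order pre-emphasis curve is 3 dB up."""
    return 1.0 / (2.0 * math.pi * tau_s)

def preemphasis_gain_db(freq_hz, tau_s):
    """Boost applied by first-order pre-emphasis at freq_hz, in dB."""
    return 10.0 * math.log10(1.0 + (2.0 * math.pi * freq_hz * tau_s) ** 2)

for tau_us in (75, 50):
    tau = tau_us * 1e-6
    print(f"{tau_us} us: corner {preemphasis_corner_hz(tau):.0f} Hz, "
          f"boost at 5 kHz {preemphasis_gain_db(5000, tau):.1f} dB, "
          f"boost at 15 kHz {preemphasis_gain_db(15000, tau):.1f} dB")
```

With 75µs pre-emphasis the boost is already about 8 dB at 5kHz, which is why program energy above that frequency costs the processor so much headroom.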
7. Clipping. In most processors, the clipping stage is the primary means of peak limiting. It's crucial to broadcast processor performance. Because of the FM pre-emphasis, simple clipping doesn't work well at all. It produces difference-frequency IM distortion, which the de-emphasis in the radio then exaggerates. (The de-emphasis is flat below 2-3kHz, but rolls off at 6dB/octave thereafter, effectively exaggerating energy below 2-3kHz.) The result is particularly offensive on cymbals and sibilance ("essses" become "efffs").
In the late seventies, one of the authors of this article (R.O.) invented distortion-cancelled clipping. This manipulates the distortion spectrum added by the clipper's action. In FM, it typically removes the clipper-induced distortion below 2kHz (the flat part of the receiver's frequency response). This typically adds about 1dB to the peak level emerging from the clipper, but, in exchange, allows the clipper to be driven much harder than would otherwise be possible.
Provided that it doesn't introduce audibly offensive distortion, distortion-cancelled clipping is a very effective means of peak limiting because it affects only the peaks that actually exceed the clipping threshold and not surrounding material. Accordingly, clipping does not cause pumping, which gain reduction can do, particularly when gain reduction operates on pre-emphasized material. Clipping also causes minimal HF loss by comparison to HF limiting that uses gain reduction. For these reasons, most FM broadcast processors use the maximum practical amount of clipping that's consistent with acceptably low audible distortion.
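The difference between clipping and gain reduction can be seen on a handful of samples. The Python sketch below uses plain hard clipping, not distortion-cancelled clipping, and a crude one-gain-per-block stand-in for a limiter. Both functions are hypothetical illustrations.

```python
def hard_clip(samples, threshold=1.0):
    """Flatten only the samples that exceed the threshold."""
    return [max(-threshold, min(threshold, s)) for s in samples]

def gain_reduce(samples, threshold=1.0):
    """Scale the whole block down just enough that its peak meets the threshold."""
    peak = max(abs(s) for s in samples)
    gain = min(1.0, threshold / peak) if peak > 0.0 else 1.0
    return [gain * s for s in samples]

block = [0.2, -0.3, 1.5, 0.1]
clipped = hard_clip(block)
reduced = gain_reduce(block)

changed_by_clip = sum(1 for a, b in zip(block, clipped) if a != b)
changed_by_gain = sum(1 for a, b in zip(block, reduced) if a != b)
print(clipped, changed_by_clip)  # only the overshooting sample is altered
print(reduced, changed_by_gain)  # every sample is turned down
```

Both outputs respect the same peak ceiling. The clipper leaves the surrounding material at full level, while gain reduction pulls everything down with the peak, which is the mechanism behind audible pumping.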
Real-world clipping systems can get very complicated because of the requirement to strictly band-limit the clipped signal to less than 19kHz despite the harmonics that clipping adds to the signal. (Bandlimiting prevents aliasing between the stereo main and subchannel, protects subcarriers located above 55kHz in the FM stereo composite baseband, and protects the stereo pilot tone at 19kHz.) Linearly filtering the clipped signal to remove energy above 15kHz causes large overshoots (up to 6dB in the worst case) because of a combination of spectral truncation and time dispersion in the filter. Even a phase-linear lowpass filter (practical only in DSP realizations) causes up to 2dB overshoot. Therefore, state-of-the-art processors use complex overshoot compensation schemes to reduce peaks without significantly adding out-of-band spectrum.
Some chains also apply composite clipping or limiting to the output of the stereo encoder. The stereo encoder is the circuit that encodes the left and right channels into the single multiplex signal that drives the transmitter, and it's actually the peak level of this signal that government broadcasting authorities regulate. Composite clipping or limiting has long been a controversial technique, but the latest generation of composite clippers or limiters has greatly reduced the interference problems characteristic of earlier technology.
Broadcast processing is complex and sophisticated, and was tuned for the recordings produced using practices typical of the recording industry during almost all of its history. In this historical context, hypercompression is a short-term anomaly and does not coexist well with the "competitive" processing that most pop-music radio stations use. We therefore recommend that record companies provide broadcasters with radio mixes. These can have all of the equalization, slow compression, and other effects that producers and mastering engineers use artistically to achieve a desired "sound." What these radio mixes should not have is fast digital limiting and clipping. Leave the short-term envelopes unsquashed. Let the broadcast processor do its work. The result will be just as loud on-air as hypercompressed material, but will have far more punch, clarity, and life.
A second recommendation to the record industry is to employ studio or mastering processing that provides the desired sonic effect, but without the undesired extreme distortion component that clipping creates. The alternative to brute-force clipping is digital look-ahead limiting, which is already widely available to the recording industry from a number of different manufacturers (including the authors' companies). This processing creates lower modulation distortion than clipping and also avoids blatant flat-topping of waveforms. Compared to clipping, it is therefore substantially more compatible with broadcast processing. Nevertheless, even digital limiting can have a deleterious effect on sound quality by reducing the peak-to-average ratio of the signal to the point that the broadcast processor responds to it in an unnatural way, so it should be used conservatively. Ultimately, the only way to tell how one's production processing will interact with a broadcast processor is to actually apply the processed signal to a real-world broadcast processor and to listen to its output, preferably through a typical consumer radio.