I have developed a preset which keeps the dynamic range (per song) as much in line with the source as reasonably possible. This keeps the "kick" on snare and bass drums intact. It also keeps the sound close to the artist's and mastering engineer's intent, while normalizing the audio between tracks.
aacPlus Processing:
When preprocessing audio for aacPlus, especially at lower bitrates, it is important to understand how the CODEC behaves under compression and limiting environments to prevent artifacting. With aacPlus (V1 or V2), there are a few types of artifacting which can occur under certain audio conditions, primarily due to the Spectral Band Replication (SBR) algorithm which reproduces high frequencies using a noise generator.
What is SBR? Simply put, SBR takes the high frequencies and translates them algorithmically onto the lower frequency range. Upon encoding, the CODEC encodes only the lower frequencies (with the higher frequency content riding "on top"). Upon decoding, the CODEC decodes the lower frequencies and simultaneously translates the high frequency content back to the high frequency range as appropriate. The high frequency content has to be reproduced with a noise generator, since the true content was never actually encoded.
So, what artifacts actually arise? First and foremost, there is a phenomenon I call percussion "slush". Percussion "slush" occurs primarily when a loud bass tone is played and its peak coincides with a loud peak of something in the SBR range. Any cymbals will then sound not like cymbals but like sheets of water being smashed with a mallet. You can avoid this by processing the sharper peaks of high frequency audio with quick and abrupt clipping. Or, you can attenuate the audio to reduce the peaks and eliminate the phenomenon overall. As you can see in my preset, I am employing both techniques. The upper three bands are the ones primarily in question: the uppermost and lowermost of those bands are attenuated, and the middle one is clipped sharply as appropriate. That middle band carries a lot of the higher frequency content; the content in that band tends to be momentary, so slushiness isn't as noticeable there. Furthermore, attenuating that band too drastically will create a "muffled" high frequency sound, and this "muffled" sound will fatigue listeners over time. Thus, I think clipping is the appropriate way to optimize that band for encoding, and it serves as an effective de-esser too.
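To make the difference between the two techniques concrete, here is a minimal Python sketch. It is not Stereo Tool's processing; the sample values, the ceiling of 0.5 and the function names are all made up for illustration. It only shows that clipping flattens the peak and leaves the quiet detail alone, while attenuation turns everything in the band down together.

```python
def hard_clip(samples, ceiling):
    """Flatten anything beyond +/- ceiling; quieter samples pass unchanged."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]


def attenuate(samples, gain):
    """Scale every sample by the same gain, loud or quiet."""
    return [s * gain for s in samples]


def peak(samples):
    """Largest absolute sample value."""
    return max(abs(s) for s in samples)


# Hypothetical high-band excerpt: quiet detail plus one sharp cymbal hit.
band = [0.10, -0.12, 0.95, -0.90, 0.11, -0.09]

# Both approaches bring the peak down to the same (assumed) ceiling of 0.5.
clipped = hard_clip(band, ceiling=0.5)
turned_down = attenuate(band, gain=0.5 / peak(band))

print("clipped:    ", [round(s, 3) for s in clipped])
print("attenuated: ", [round(s, 3) for s in turned_down])
```

In the clipped version the quiet samples keep their original level, which is why a band treated this way does not end up sounding "muffled". In the attenuated version the quiet samples drop by the same ratio as the peak.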
Secondly, there is a phenomenon I call "hi-hat inconsistency". This occurs when a hi-hat is played rapidly and repetitively; the hi-hat sounds going into the encoder all sound the same, but the hi-hat sounds coming out of the encoder each sound different. Their amplitudes and phases will differ from one another. This occurs when there is a large amount of bass being encoded while hi-hats are being encoded simultaneously. The demand on the CODEC is too great, and the percussion suffers through aacPlus's lossy algorithm. Attenuating these hi-hats as appropriate helps avoid this. Thus, pre-limiting is used to adjust the gain of the bass and treble to help unify the response between songs, and the multiband compression levels are set to coordinate with this. Centering the bass is another key aspect in alleviating this issue.
Thirdly, there is a phenomenon I call "booosh woooosh fooosh chee chee chee shhhhhhhhhhhhhh". This is a phenomenon which occurs when you aggressively compress the higher frequencies and cause a great amount of intermodulation distortion. This is another issue inherent to SBR where the pre-limiter (handling high frequency content gently before multiband compression) helps drastically. If you rapidly change the volume of an audio band, you are pushing the amplitude of the audio band up and down accordingly. This practice in and of itself can cause intermodulation, as you have now created a new audio crest or trough, which can create coding nightmares with high frequency content, as these issues will arise in frequencies slightly higher than those you are manipulating. Thus, hard clipping is limited to lower frequency content (except that solitary high frequency band mentioned above).
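The effect of rapid gain changes can be shown with a simplified model. The Python sketch below does not simulate a real compressor: it assumes a gain that wobbles sinusoidally (a hypothetical 200 Hz "pumping" rate with a depth of 0.5) applied to a steady 8 kHz tone, then measures what ends up at neighbouring frequencies. All names and numbers are illustrative.

```python
import cmath
import math

SAMPLE_RATE = 48000
NUM_SAMPLES = 4800  # 0.1 s, so every multiple of 10 Hz fits whole cycles


def tone(freq_hz):
    """Unit-amplitude sine wave."""
    return [math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(NUM_SAMPLES)]


def pump_gain(samples, rate_hz, depth):
    """Multiply by a gain that swings between 1 - depth and 1 + depth."""
    return [s * (1.0 + depth * math.cos(2 * math.pi * rate_hz * n / SAMPLE_RATE))
            for n, s in enumerate(samples)]


def amplitude_at(samples, freq_hz):
    """Amplitude of a single frequency component (one-bin DFT)."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq_hz * n / SAMPLE_RATE)
              for n, s in enumerate(samples))
    return 2 * abs(acc) / len(samples)


steady = tone(8000)
pumped = pump_gain(steady, rate_hz=200, depth=0.5)

for freq in (7800, 8000, 8200):
    print(freq, "Hz  steady:", round(amplitude_at(steady, freq), 3),
          " pumped:", round(amplitude_at(pumped, freq), 3))
```

The steady tone has energy only at 8000 Hz. Once the gain is pumped, new components appear at 7800 Hz and 8200 Hz, each at half the modulation depth, even though nothing at those frequencies existed in the source. The upper one is extra high frequency content that the encoder then has to deal with.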
Finally, specifically with aacPlus V2 and "Parametric Stereo" (PS), there are unique issues which can arise pertaining to aggressive phase manipulation. Audio nullification (audio being "cancelled out") can occur from improper phase manipulation. Phase and width settings must be utilized with care, or you could potentially cancel out voices in the encoding process (even though you will still hear them prior to encoding!). Obviously, unless you want to lose your vocal section, this is undesirable. Parametric Stereo is a "steering" algorithm; in aacPlus V2 with PS, the audio is encoded as monaural. Steering information is added to the audio stream to guide frequencies as appropriate to the left and right channels. This can as much as double the effective coding bitrate, depending on the source audio. For example, without parametric stereo, aacPlus audio at 40kbps "Stereo" would be treated as two 20kbps "monaural" streams regardless of the source content being monaural or stereo at that time. With "Parametric Stereo", however, audio encoded at 40kbps being monaural at that moment will get 40kbps of audio bandwidth. Once stereo content plays, about 3kbps of audio is utilized to steer that audio left or right, as appropriate. The monaural portion of the content still gets 37kbps of audio bandwidth, resulting in overall higher audio fidelity.
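The bitrate arithmetic and the cancellation risk can both be sketched in a few lines of Python. This is only an illustration: the 3kbps steering figure is the approximate one quoted above, the function names are made up, and the plain (L + R) / 2 downmix is an assumption standing in for whatever the real encoder does internally.

```python
def per_channel_kbps_discrete(total_kbps):
    """Two independently coded channels split the budget evenly."""
    return total_kbps / 2


def mono_core_kbps_parametric(total_kbps, is_stereo, steering_kbps=3):
    """Bitrate left for the mono signal when stereo is sent as steering data."""
    return total_kbps - (steering_kbps if is_stereo else 0)


def mono_downmix(left, right):
    """Assumed simple downmix: average of both channels."""
    return [(l + r) / 2 for l, r in zip(left, right)]


print(per_channel_kbps_discrete(40))                  # each channel, no PS
print(mono_core_kbps_parametric(40, is_stereo=False))  # PS, mono passage
print(mono_core_kbps_parametric(40, is_stereo=True))   # PS, stereo passage

# A hypothetical voice that phase processing has pushed fully out of phase:
voice = [0.5, -0.3, 0.8, -0.6]
left = voice
right = [-v for v in voice]

# Audible on both speakers before encoding, silent after the downmix.
print(mono_downmix(left, right))
```

The last line is the reason to be careful with the phase and width settings: a signal that is inverted between left and right still plays on both channels before encoding, but sums to nothing once it is folded to mono.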
Also, there is "squeakiness". This is where midrange content can squeak when the phase is misaligned, such as with many lower bitrate (~128kbps) MP3s. The azimuth settings alleviate this issue almost completely.
The overall algorithm manipulates the source audio as little as possible; the average user shouldn't notice substantial manipulation of the audio aside from normalization and an overall pleasant listening experience, especially at lower audio bitrates of 24kbps, 32kbps and 40kbps (bitrates frequently employed by streams across the internet, and also on XM satellite radio, which does a hardware version of what I am accomplishing here).
I have attached audio waveforms prior to processing, after processing but prior to encoding, and after encoding to show its effectiveness. Feel free to use these to help you with improving sound quality. I will update this preset accordingly as I discover new tricks to aacPlus processing.
The topmost waveform is the source audio. The center waveform is the audio after processing, but prior to encoding. The bottom waveform is the audio after processing and encoding; the CODEC utilized was 32kbps aacPlus V2 with Parametric Stereo enabled.
Hvz, if you like this concept and approach, feel free to implement this as a built-in preset for aacPlus streaming!
PRESET [aacPlus CODEC Optimization]:
Code:
[Common]
Pre amplifier=2.800000191
Post amplifier=0.899999976
Extra loudness=1
Hard limit output=1
Downsample very high input sample rates to near 44.1 kHz=1
Process for low latency=0
Mode=Advanced
[Noise Gate]
Enabled=1
Difference=0
Noise level=2
[Singleband Compressor]
Enabled=0
Difference=0
Maximum volume=10
Maximum value=32767
Attack speed=0.999998987
Decay speed=0.999000013
Above Top Limiter=1
[Pre Compressor]
Enabled=1
Difference=0
Delay enabled=0
Maximum volume - Band 1=20500
Maximum volume - Band 2=17500
Attack speed - Band 1=0.000002059
Attack speed - Band 2=0.000002059
Decay speed - Band 1=0.013692533
Decay speed - Band 2=0.013600473
[Multiband Compressor]
Enabled=1
Difference=0
Delay enabled=0
Very high quality enabled=1
Maximum volume - Band -1=6950
Maximum volume - Band 0=6500
Maximum volume - Band 1=4850
Maximum volume - Band 2=4600
Maximum volume - Band 3=3650
Maximum volume - Band 4=2750
Maximum volume - Band 5=2000
Maximum volume - Band 6=2900
Maximum volume - Band 7=3950
Maximum volume - Band 8=2400
Attack speeds linked=1
Attack speed - Band -1=0.00150717
Attack speed - Band 0=0.00150717
Attack speed - Band 1=0.00150717
Attack speed - Band 2=0.00150717
Attack speed - Band 3=0.00150717
Attack speed - Band 4=0.00150717
Attack speed - Band 5=0.00150717
Attack speed - Band 6=0.00150717
Attack speed - Band 7=0.00150717
Attack speed - Band 8=0.00150717
Decay speeds linked=1
Decay speed - Band -1=0.000314613
Decay speed - Band 0=0.000314613
Decay speed - Band 1=0.000314613
Decay speed - Band 2=0.000314613
Decay speed - Band 3=0.000314613
Decay speed - Band 4=0.000314613
Decay speed - Band 5=0.000314613
Decay speed - Band 6=0.000314613
Decay speed - Band 7=0.000314613
Decay speed - Band 8=0.000314613
Above Top Limiter=1
Clipping enabled=1
Postprocessing enabled=1
Relative clip position - Band -1=1.352941036
Relative clip position - Band 0=1.352941036
Relative clip position - Band 1=1.150537729
Relative clip position - Band 2=1.150537729
Relative clip position - Band 3=1.352941036
Relative clip position - Band 4=-1
Relative clip position - Band 5=-1
Relative clip position - Band 6=-1
Relative clip position - Band 7=0.600000024
Relative clip position - Band 8=-1
Final limiter value=0.556199968
Final limiter decay speed=0
Final limiter clipping=1
Equalizer enabled=1
Equalize before multiband-compression=1
Equalizer position - Band -1=3
Equalizer position - Band 0=2.333333254
Equalizer position - Band 1=1.500000238
Equalizer position - Band 2=1.500000238
Equalizer position - Band 3=1.409638405
Equalizer position - Band 4=1
Equalizer position - Band 5=0.801801801
Equalizer position - Band 6=0.801801801
Equalizer position - Band 7=0.754385948
Equalizer position - Band 8=0.754385948
[Stereo]
Enabled=1
Delay enabled=0
Difference=0
Center bass=1
AZIMUTH limit=60.979999542
AZIMUTH change speed=0.200000003
Image phase amplifier=1.549999952
Image phase amplifier maximum angle=126
Image phase amplifier maximum separation strength=60.86000061
Image width amplifier=1.299999952
Extra phase shift=0
Mono or stereo only=-0.550000012
[Channel Delay]
Enabled=0
Left Delay=0
[Output Filter]
Enabled=1
Lowpass filter=16500
[Final Pre-Limiter]
Enabled=1
Difference=0
Pre-amp=1.381183982
Response time=0.800000012
[Final Limiter]
Enabled=1
Difference=0
Pre-amp=1.000255942
Response time=0.0125
[FM Transmitter]
Enabled=0
Pre-emphasize=0
Pre-emphasis time=50
Output is pre-emphasized=0
Stereo encoder enabled=0
RDS encoder enabled=0
RDS PS text=2s:STEREO/2s:TOOL/<1=1.5s,2..-2=2t,-1=1.5s:WWW.STEREOTOOL.COM
RDS RadioText text=60s:Stereo Tool: Professional Audio Processing - http://www.stereotool.com/30s:Stereo Tool by Hans van Zutphen, 1999-2008 - http://www.stereotool.com
RDS PTY=0
RDS PI=65535
RDS Alternative frequency 1=0
RDS Alternative frequency 2=0
RDS Alternative frequency 3=0
RDS Alternative frequency 4=0
RDS Alternative frequency 5=0
RDS Alternative frequency 6=0
RDS Alternative frequency 7=0
RDS Alternative frequency 8=0
RDS Alternative frequency 9=0
RDS Alternative frequency 10=0
RDS Alternative frequency 11=0
RDS Alternative frequency 12=0
RDS Alternative frequency 13=0
RDS Alternative frequency 14=0
RDS Alternative frequency 15=0
RDS Alternative frequency 16=0
RDS Alternative frequency 17=0
RDS Alternative frequency 18=0
RDS Alternative frequency 19=0
RDS Alternative frequency 20=0
RDS Alternative frequency 21=0
RDS Alternative frequency 22=0
RDS Alternative frequency 23=0
RDS Alternative frequency 24=0
RDS Alternative frequency 25=0
RDS TP=0
RDS TA=0
RDS Music=1
RDS Artificial Head=0
RDS Compressed=1
RDS Dynamic PTY=0
RDS RadioText Enabled=1
RDS ClockTime Enabled=1
[Direct soundcard access]
Enabled=0
Device ID=
Volume=1
Buffer size=1
Send to Winamp=Nothing
ASIO Override channel 1=4
ASIO Override channel 2=5
[Low latency output]
Enabled=0
Device ID=
Volume=1
Buffer size=0.079999998
ASIO Override channel 1=2
ASIO Override channel 2=3
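If you want to inspect the preset outside of Stereo Tool, the listing is close enough to INI syntax that Python's standard configparser can read it. The sketch below works on a short excerpt copied from the preset above. Two things are assumptions here and not documented behaviour: that Stereo Tool's own file handling matches configparser's, and that a relative clip position of -1 means "no clipping on this band" (which is how the values line up with the description of the upper three bands).

```python
import configparser

# Excerpt copied from the preset above (upper three multiband bands only).
PRESET_EXCERPT = """\
[Multiband Compressor]
Clipping enabled=1
Relative clip position - Band 6=-1
Relative clip position - Band 7=0.600000024
Relative clip position - Band 8=-1
[Output Filter]
Enabled=1
Lowpass filter=16500
"""


def load_preset(text):
    """Parse preset text; only '=' separates keys from values."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep key capitalisation as written
    parser.read_string(text)
    return parser


def clipped_bands(preset):
    """Map band number -> clip position, skipping bands set to -1 (assumed off)."""
    prefix = "Relative clip position - Band "
    bands = {}
    for key, value in preset["Multiband Compressor"].items():
        if key.startswith(prefix) and float(value) >= 0:
            bands[int(key[len(prefix):])] = float(value)
    return bands


preset = load_preset(PRESET_EXCERPT)
print("bands with clipping:", clipped_bands(preset))
print("lowpass (Hz):", preset["Output Filter"].getint("Lowpass filter"))
```

Under that reading, band 7 is the only one of the upper three bands with clipping, which matches the description of the middle band being clipped while its neighbours are only attenuated.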