Windows stand alone:
http://www.stereotool.com/download/ster ... 41-034.exe
Winamp DSP:
http://www.stereotool.com/download/dsp_ ... 41-034.exe
VST32:
http://www.stereotool.com/download/vst_ ... 41-034.dll
Changes:
- Optimization: Removed a few 'division by 0' protections that appear to be unnecessary.
- Optimization: RF spectrum calculation uses lookup table for sin() calculations.
- Optimization/Bug fix: RF spectrum calculation was analysing (and then removing) 4 times more data than it should.
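For readers wondering what a sin() lookup table looks like in practice, here is a minimal sketch. It is not the actual Stereo Tool code: the table size, the nearest-entry lookup and the names `TABLE_SIZE`, `SIN_TABLE` and `fast_sin` are all assumptions made for illustration.

```python
import math

TABLE_SIZE = 1024  # assumed size; a larger table trades memory for accuracy

# One full period of sin(), sampled at TABLE_SIZE evenly spaced phases.
SIN_TABLE = [math.sin(2.0 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]


def fast_sin(phase):
    """Approximate sin(phase) by nearest-entry table lookup (phase in radians)."""
    index = int(round(phase * TABLE_SIZE / (2.0 * math.pi))) % TABLE_SIZE
    return SIN_TABLE[index]


def max_table_error(steps=10000):
    """Largest absolute difference from math.sin over one period."""
    return max(
        abs(fast_sin(2.0 * math.pi * k / steps) - math.sin(2.0 * math.pi * k / steps))
        for k in range(steps)
    )
```

With nearest-entry lookup the phase is off by at most half a table step, so the error stays below roughly pi / TABLE_SIZE. That is probably fine for a spectrum display, while interpolating between neighbouring entries would be the usual next step if more accuracy were needed.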
Older changes:
- Optimized clipper bass handling performance. CPU load should be 1-2% lower.
- Optimized RDS encoder performance. CPU load should be about 2% lower.
- Added ASIO logging. Logging is written to C:\temp\asio.log if the directory C:\temp exists.
- Optimized Natural Dynamics and compressors further.
- Fixed ASIO bug introduced in previous beta
- Improved Natural Dynamics performance
- Hm, what about the FM output latency??? -> Fixed
- Reduce stand alone version memory usage (unused low latency thread items can be removed.)
- Stand alone version: OPTIMIZE SOUND CARD HANDLING CODE (USES AN EXTREME AMOUNT OF CPU POWER). Done, CPU load is a few % lower now. More testing needed to make sure no new bugs are introduced!
- For ASIO HQ mode, add redundancy protection. This reduces the effect of hiccups if the buffer is too small. Done for Normal and Low Latency, FM still needed
- Declipper display was broken.
- Lower ASIO latencies possible. Now both for normal and low latency monitoring outputs.
- Fix ASIO buffer size calculation. Probably need an offset to allow easy upgrades from older versions.
- Bugs in betas after BETA023 should be fixed now. The extra thread is still gone though.
- FM output for input sample rates below 30 kHz is no longer possible. This makes things a lot easier (I know that I will never need more than 4 times upsampling so I don't have to change buffer sizes depending on the input sample rate).
- Get rid of chain2() thread. This should also allow reducing the ASIO latency by 1 step (usually 1.5 ms). Hm.... Or not? I'm confused
Well at least the thread is gone now.
- New ASIO behavior: Push samples, read them back directly from buffer, skip whole Chain2 stuff.
- The old Hard Limit for composite limiting was slightly tighter when the input level was very high, and it had no overshoots; the new one does.
- Lot of things from separate thread moved to main code. This includes a lot of changes that affect different sample rates and might be buggy. -> Was indeed buggy
- Memory usage reduced by 60 MB -> No, was only in my VMware environment. Removed this change. Hm now other people are confirming it. Will add this change again.
- Removed some multi-threading and replaced it by doing everything in the same thread. I would have expected a small deterioration in performance, but on my PC it's actually running faster!
- Memory usage should have been a lot lower but appears to be nearly unaffected. I don't understand why... Oddly, if I turn some compiler optimizations off, it uses about 60 MB less!
- Hard Limit for composite clipper caused very soft clicks every block!!! Also in older versions...
- Composite Limiter was running in a separate thread, which added 2 ms of extra latency. The new version no longer does that and returns a cleaner spectrum, but it requires a bit more horsepower from the PC because it doesn't run on a separate CPU anymore.
- Added HQ mode. Not available, for testing only.
- Improve Multiband3 and Singleband2 limiting and (to a lesser extent) compression for low latency settings. LQ output should sound similar to normal output! Fixing this will also improve audio at lower latency settings. Compressor is probably more or less ok, limiter is pretty horrible, also at lower latencies!
- Fixed Phase Rotation frequency effects at low latencies (need to compensate for loss at certain freqs in low latency modes)
- Fixed AZIMUTH behavior at lower latencies
- AGC behaves slightly differently for lower latencies - Kinda OK. With shorter block size the drop for short spikes is bigger, which leads to a slightly lower overall output level. But I cannot easily fix that. Other differences are fixed now.
- Clipper (probably only ABDP) does not work well for latency 128. Yup -> if I lower the top bass freq from 400 to 200 Hz it's MUCH better. Fixed.
- Something removes low bass in low latency modes. -> EQ and other things. -> Improved. Difference is still large though.
- Rewrote LQ Low Latency monitoring to use the normal processing code. Works reasonably well; the sound resembles that of the normal latency EXCEPT for the bass limiters and, to a lesser extent, the compressors in the multiband section. Memory usage for the plugin version is reduced by more than 20 MB. The stand alone version might use slightly more than before.
- Fixed FOX TV Carbon Coder R128 normalization issue. Waiting for feedback.
- Moved a lot of threads into a single thread. Might improve hiccups that some people have reported.
- Added Power Highs (it's in the same window as Power Bass).
- Moved Power Bass and Power Highs to before the wideband AGC to improve volume level consistency.
- Sudden fast rise of bass or highs is limited, new slider 'Release boost' added. I'm not really sure yet if this is ok; if there's a loud high or low sound it can push the band down a lot, and it comes down slower than before. If needed I can add something to allow it to come back faster after a short spike. Waiting for feedback first though.
- Sidechain checkbox removed (the mode without it doesn't exist anymore).
Attempt #2: Redesigned Simple Clipper. Reduced CPU load.
- Reduced the memory usage
- Fixed most of the Stereo Image artifacts!!! "Deprecated" is removed from the sliders that were marked with it. See (*) for a cool new possibility!
- Removed some more unnecessary steps (AZIMUTH 2x, Stereo Boost 2x). 10 remaining.
Fixed 'Post filter for DC offset' problem.
52. Check CPU load. Start with checking if there's anything left that uses the 'unnecessary steps'. Sevdah Web preset: Data still gets converted 58 times... I think I need to do this one first, it should have some effect on the CPU load.
28 removed - next convert the 2 IIR filters so they can be optimized and the merge/split around it can be removed. I'm not measuring any effect from this though (but it makes the code simpler which is also good)
53. Noise Gate/Stereo Boost: Pre-calculate 1-cos() and sqrt() values.
55. Check MemoryPool behavior for cache improvements -> No effect measured, and might make behavior less constant.
56. Check if we can go in opposite direction for each next step to improve cache.
57. Check if lazy reverse FFT is an option. -> No, difficult and gain does not even seem to be measurable.
58. Created a separate class that performs the processing chain. Currently the same code is repeated twice (once for normal processing, once for low latency processing) - which means that a lot of code is duplicated and it's difficult to add extra chains. Most, not all, of that code is now moved elsewhere.
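To illustrate the idea behind item 53 (pre-calculating 1-cos() and sqrt() values), here is a rough sketch. How the Noise Gate and Stereo Boost actually use these values is not stated, so this assumes they depend only on the position within a block and uses a square-root Hann window as a stand-in, since it combines both operations. The names `sqrt_hann_window` and `apply_window` are hypothetical.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def sqrt_hann_window(block_size):
    """Compute sqrt(0.5 * (1 - cos(2*pi*n/block_size))) once per block size."""
    return tuple(
        math.sqrt(0.5 * (1.0 - math.cos(2.0 * math.pi * n / block_size)))
        for n in range(block_size)
    )


def apply_window(samples):
    """Weight a block using the cached table instead of calling cos()/sqrt() per sample."""
    window = sqrt_hann_window(len(samples))
    return [s * w for s, w in zip(samples, window)]
```

The first call for a given block size pays for the cos() and sqrt() evaluations; later blocks of the same size only do one multiplication per sample.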
TO BE DONE:
- ASIO hiccups protection: FM still needed
- Make 'Hide hiccups' behavior switchable (now enabled for Normal and Low Latency output, not yet for FM).
- Composite Limiter effect no longer visible in GUI. (Is that bad?)
- Remove rest of Chain2
Busy with Calibrate
- Fix test tones in HQ mode (pc and pc2 must be linked)
- Still some weird buffer filling behavior, definitely when using normal (non-ASIO) I/O. But ASIO also has a small offset sometimes.
SOON:
- Add watchdog for stand-alone version. Both built-in and separate.
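As a rough idea of what such a watchdog could look like, here is a heartbeat-based sketch. Nothing here is decided yet: the class name `Watchdog`, the `kick()` / `is_stalled()` methods and the timeout handling are all assumptions for illustration.

```python
import time


class Watchdog:
    """Hypothetical heartbeat watchdog: flags a stall when no kick() arrives in time."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock  # injectable so the logic can be tested without waiting
        self._last_kick = clock()

    def kick(self):
        """Called by the audio processing loop each time a block completes."""
        self._last_kick = self._clock()

    def is_stalled(self):
        """True when the last heartbeat is older than the timeout."""
        return self._clock() - self._last_kick > self.timeout_seconds
```

A built-in variant could poll `is_stalled()` from a second thread inside the same process, while a separate variant would need the heartbeat to cross a process boundary (for example via a file timestamp or a socket) so it can still react when the whole process hangs.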
LATER:
- Make Low Latency monitoring latency configurable (between 128 and 512, current value is 256).
- Add some code to New/Delete in _DEBUG mode to test for never used memory (large blocks filled with 0's).
- Spread over cores is not constant, which causes differences in performance. I *think* it might be the chain2() code that causes this. Actually it might be a good idea to get rid of that completely... Check if this is still the case -> Yes, for some people it is. It does not make any sense though, if you just process without GUI there are exactly 2 threads running, so the load should always be the same (only the threads can move to another CPU core).
- Search for -- loops and try to optimize them
- Search for nested min/max calls and split them
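The 'never used memory' check from the list above could work roughly like this. This is only a sketch of the idea, written with a bytearray standing in for a block handed out by New; the helper names and the 50% threshold are made up for the example.

```python
def unused_tail_bytes(block):
    """Count trailing bytes that are still zero, i.e. probably never written."""
    used = len(block)
    while used > 0 and block[used - 1] == 0:
        used -= 1
    return len(block) - used


def report_if_mostly_unused(name, block, threshold=0.5):
    """Return a warning string when more than `threshold` of the block looks untouched."""
    unused = unused_tail_bytes(block)
    if len(block) > 0 and unused / len(block) > threshold:
        return f"{name}: {unused} of {len(block)} bytes never used"
    return None
```

Running the check at Delete time in a debug build would point at buffers that are allocated much larger than needed. One caveat: a block that legitimately holds silence (all zeros) looks the same as an unused one, so filling new blocks with a non-zero marker pattern instead might give fewer false alarms.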