Stereo Tool
https://forums.stereotool.com/

Stereo Tool 7.03 BETA
https://forums.stereotool.com/viewtopic.php?t=4448
Page 76 of 102

Author:  radiofreak [ Mon Mar 11, 2013 10:05 pm ]
Post subject:  Re: Stereo Tool 7.03 BETA

Quote:
I don't think so, I have many, many schematics where hardware RDS and stereo decoders are completely independent. Maybe in Germany it was different... I don't know... In Poland it doesn't matter whether it's 90 or 0: some stations have 0, some 90. The specification allows the RDS subcarrier to be at 0/180 or +/-90 relative to the 3rd harmonic of the pilot tone; +/-90 is better.
AFAIK there is no (longer a) rule whether 90° or 0° here in Germany. I've analysed some stations with the Pira around here. Some are 90°, most are 0° and one was even between (should I contact them? :mrgreen: ). It might be that the old ARI (http://en.wikipedia.org/wiki/Autofahrer ... ionssystem) required a special phase degree, but this system was switched off years ago and was replaced completely by the RDS-TP and -TA-Feature.
When we license our station year in and year out, the regulatory authority doesn't care what we do in and with our MPX, as long as the max. deviation is <=75 kHz and the MPX power is <=0 dBr, except if we want to use RDS-TA. Then the local police office has to be informed...
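
For reference, the phase relationship discussed above can be sketched numerically. The 57 kHz RDS subcarrier is the third harmonic of the 19 kHz pilot; the functions below produce one sample of each, with a selectable phase offset for the subcarrier. The function names and the sine convention are assumptions for illustration only, and the subcarrier is shown unmodulated (a real RDS signal suppresses it).

```cpp
#include <cmath>

// Illustrative constants: FM stereo pilot and its third harmonic (RDS subcarrier).
constexpr double kPi = 3.14159265358979323846;
constexpr double kPilotHz = 19000.0;
constexpr double kRdsHz = 3.0 * kPilotHz;  // 57 kHz

// Pilot tone sample at time t (seconds), unit amplitude.
double pilot_sample(double t) {
    return std::sin(2.0 * kPi * kPilotHz * t);
}

// Unmodulated RDS subcarrier sample at time t, offset by phase_deg degrees
// relative to the third harmonic of the pilot (0 = in phase, 90 = quadrature).
double rds_subcarrier_sample(double t, double phase_deg) {
    const double phase_rad = phase_deg * kPi / 180.0;
    return std::sin(2.0 * kPi * kRdsHz * t + phase_rad);
}
```

At t = 0 the pilot crosses zero; a 0° subcarrier crosses zero at the same instant, while a 90° subcarrier is at its peak.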

Author:  Brian [ Tue Mar 12, 2013 1:00 am ]
Post subject:  Re: Stereo Tool 7.03 BETA

Quote:
@Brian: I have checked all the vectorization reports of performance sensitive functions, and for nearly all of them I either made them vectorize or I understand why they aren't.
So, you now have the explanation for this and understand why it happens????

http://software.intel.com/en-us/forums/topic/346811
Quote:
I'm trying to squeeze the last bit of optimization out of a program using Intel C++ 10.1 (because with later versions I'm getting slower code - I'll look into that later).

When looking at the vectorization reports, I noticed 2 things I hadn't expected, and I wonder if they can be solved (without rewriting lots of code - total code base is over 2 MB and I'm working on it alone). I've tried to google them but didn't find any useful answers.

This one seems to be the most important:

fft_abs_sse2[2*cc] = max(fft_abs_sse2[2*cc], strength * m);

.\Clip1Ch.cpp(1999): (col. 13) remark: vector dependence: proven ANTI dependence between fft_abs_sse2 line 1999, and fft_abs_sse2 line 1999.
.\Clip1Ch.cpp(1999): (col. 13) remark: vector dependence: proven ANTI dependence between fft_abs_sse2 line 1999, and fft_abs_sse2 line 1999.
.\Clip1Ch.cpp(1999): (col. 13) remark: vector dependence: proven FLOW dependence between fft_abs_sse2 line 1999, and fft_abs_sse2 line 1999.
.\Clip1Ch.cpp(1999): (col. 13) remark: vector dependence: proven FLOW dependence between fft_abs_sse2 line 1999, and fft_abs_sse2 line 1999.
.\Clip1Ch.cpp(1999): (col. 13) remark: vector dependence: proven ANTI dependence between fft_abs_sse2 line 1999, and fft_abs_sse2 line 1999.
...

I know that there's an _mm_max_ SIMD instruction, though. The problem might be the definition of max; I'm using:
#define max(a,b) (((a)>(b)) ? (a) : (b))
The compiler might see this as an if instruction if it's unable to optimize everything out. Is there a better definition for max that doesn't cause the compiler to see dependencies where there are none?
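
One alternative that is often tried in this situation is to read the element into a local, use std::max instead of the macro, and store once. The sketch below is illustrative only: the helper name, the signature, and the assumption that m is a per-element array are not from the original code, and whether Intel C++ 10.1 then drops the dependence remarks is not confirmed in this thread.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: raise every even-indexed element of abs_vals to at
// least strength * m[i]. Names and signature are illustrative only.
void raise_to_floor(float* abs_vals, const float* m, float strength, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        const float current = abs_vals[2 * i];           // read once into a local
        const float floor_val = strength * m[i];
        abs_vals[2 * i] = std::max(current, floor_val);  // single store
    }
}
```

std::max on two floats is a plain value comparison, so the compiler sees one load and one store per iteration rather than the repeated array expressions that the macro expands to.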

Another situation that occurs very frequently in my code is this:

for (int c=0; c<f1; c++)
{
temp[2*c] *= one_DIV_bass_static_clip_level_dynamic;
temp[2*c+1] *= one_DIV_bass_static_clip_level_dynamic;
}

Clearly, there are no dependencies between temp[2*c] and temp[2*c+1], but the compiler thinks otherwise:

.\Clip1Ch.cpp(797): (col. 9) remark: loop was not vectorized: existence of vector dependence.
.\Clip1Ch.cpp(800): (col. 13) remark: vector dependence: proven FLOW dependence between temp line 800, and temp line 799.
.\Clip1Ch.cpp(800): (col. 13) remark: vector dependence: proven ANTI dependence between temp line 800, and temp line 799.
.\Clip1Ch.cpp(800): (col. 13) remark: vector dependence: proven OUTPUT dependence between temp line 800, and temp line 799.

I think if these two situations are solved at least 50% of the loops that currently don't get vectorized will be. Your help is greatly appreciated :)
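
For the second case, note that temp[2*c] and temp[2*c+1] for c in [0, f1) together cover the contiguous range temp[0] to temp[2*f1 - 1], so the loop can be written with a single unit-stride access. This is a minimal sketch with hypothetical names; whether Intel C++ 10.1 vectorizes this form is not confirmed in this thread.

```cpp
#include <cstddef>

// Scale the first 2 * pair_count floats of data by factor.
// Equivalent to scaling data[2*c] and data[2*c+1] for c in [0, pair_count).
void scale_pairs(float* data, std::size_t pair_count, float factor) {
    const std::size_t n = 2 * pair_count;
    for (std::size_t i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}
```

With one statement per iteration and stride 1, there is no second array reference in the loop body for the compiler to compare against.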
Quote:
The changes I made today should give a reduction of about 4% in the total CPU load (on the most active CPU core; reduction should be bigger on a single core system).
BETA052 resulted in 3-5% decrease. So, no, not any bigger than what you said for that one. You should also note that you said 4% via your writing to a file method of testing, but my 3-5% (average of 4) was from simply looking at Task Manager. IOW, it's not always inaccurate... Looking at things through ProcExp, I'm not seeing any DPC activity either. Never have. There's some interrupt stuff every now and then, but it's minimal.

I still believe that your code on K8 is cache (aka core) clock dependent for most things, with benefit to the Opteron and X2 line with whatever supports multicore processing.

:arrow: The reason why I am wanting to profile on my system is because my speculation is that there are alternative SIMD / assembly instructions that would help K8, but be performance-neutral to newer processors.

That's why I asked about Scalar vs. Packed. It's also why I've mentioned the MOVNTPS instruction. MOVNTPS helps minimize cache pollution, which would be beneficial to all systems, if you can use it, which you may or may not be able to.

Author:  hvz [ Tue Mar 12, 2013 1:20 am ]
Post subject:  Re: Stereo Tool 7.03 BETA

Quote:
So, you now have the explanation for this and understand why it happens????
No, I worked around it.
Quote:
It's also why I've mentioned the MOVNTPS instruction. MOVNTPS helps minimize cache pollution, which would be beneficial to all systems, if you can use it, which you may or may not be able to.
I almost never write data to memory that I don't need very quickly afterwards. So I don't expect much from this (I have used it at work in the past, so yes, I know what it does. But it's mainly useful if you're working on large amounts of data, not for short blocks of audio data.)
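
For readers unfamiliar with the instruction: MOVNTPS is exposed in C/C++ as the _mm_stream_ps intrinsic. The sketch below shows the typical use case, a large copy whose destination is not read again soon. It is not code from Stereo Tool; the function name is hypothetical, and it assumes an x86 target, 16-byte aligned pointers, and a count that is a multiple of 4.

```cpp
#include <xmmintrin.h>  // SSE: _mm_load_ps, _mm_stream_ps, _mm_sfence
#include <cstddef>

// Copy count floats from src to dst using non-temporal stores (MOVNTPS).
// Assumptions: both pointers are 16-byte aligned and count is a multiple of 4.
void stream_copy(float* dst, const float* src, std::size_t count) {
    for (std::size_t i = 0; i < count; i += 4) {
        const __m128 v = _mm_load_ps(src + i);
        _mm_stream_ps(dst + i, v);  // store that bypasses the cache hierarchy
    }
    _mm_sfence();  // order the streamed stores before any later stores
}
```

If dst is read again right away, as with short audio blocks, the data has to come back from memory, which is why the benefit is limited to large buffers.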


Edit: Just to make sure: This 3-5% is what you get when you divide the old CPU load by the new one? Absolute numbers don't mean much, I'm of course talking about relative changes. Task Manager is often accurate, but sometimes it's not at all, and I really don't know when to trust it and when not.

Author:  Brian [ Tue Mar 12, 2013 2:11 am ]
Post subject:  Re: Stereo Tool 7.03 BETA

Quote:
Quote:
So, you now have the explanation for this and understand why it happens????
No, I worked around it.
OK. Good to know. If I had known that, I wouldn't have been pressuring as much, but all I had to go on was your last post over there...
Quote:
I almost never write data to memory that I don't need very quickly afterwards. So I don't expect much from this (I have used it at work in the past, so yes, I know what it does. But it's mainly useful if you're working on large amounts of data, not for short blocks of audio data.)
I tended to doubt it could be used, but worth a shot. Semi-related to that is SFENCE, but again, not sure if it would help.

Quote:
Edit: Just to make sure: This 3-5% is what you get when you divide the old CPU load by the new one? Absolute numbers don't mean much, I'm of course talking about relative changes.
No, I did give you absolute. Relative would be 5-8%. Still not a huge delta, given that I'm already pushing up near 80 if quality is set to 100.
Quote:
Task Manager is often accurate, but sometimes it's not at all, and I really don't know when to trust it and when not.
I see your point, which is why I pointed you in the direction of Process Explorer and Process Monitor.

Author:  hvz [ Tue Mar 12, 2013 2:28 am ]
Post subject:  Re: Stereo Tool 7.03 BETA

Next version will be about 7-10% faster than 052, measured again with PhantomFM's 80s preset. (I'm getting some jitter in my measurements, and it's too late to repeat it a lot of times).

Those are again relative numbers. Difference on single core systems is probably again bigger.

By the way, 45% of the CPU cycles are now spent in an external Intel library that I have no influence on (except calling it less often of course), so improving things is getting increasingly harder.

Improvements in this new version are in the compressors and in the advanced clipper.

Author:  gpagliaroli [ Tue Mar 12, 2013 3:15 am ]
Post subject:  Re: Stereo Tool 7.03 BETA

All optimizations are welcome, and in beta 52 I noticed some of them; my system will thank you. ;)

I'd like to take this opportunity to comment on activating the Side Chain Compressor in both the AGC and the SingleBand.
I find the two activation options confusing: first "Use Side Chain" and then "PEQ Sidechain". I think that activating "Use Side Chain" should also activate "PEQ Sidechain", or the latter checkbox should be removed altogether, because a sidechain without the EQ is meaningless.

Another control has a display problem: the MB "Drive", where the maximum is 42 dB, but setting the control to the halfway point gives 36 dB. There seems to be a problem with the scaling or with the dB calculation. Something similar happens with "Output level".
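
One possible explanation for the 42 dB / 36 dB readings, offered as an assumption and not confirmed anywhere in this thread, is that the slider travel is linear in amplitude gain while the label is shown in dB. Halving a linear gain corresponds to 20*log10(0.5), about -6.02 dB, which would put the midpoint of a 42 dB control at roughly 36 dB. The function below is a hypothetical model of such a slider.

```cpp
#include <cmath>

// dB value shown at position (0 < position <= 1) on a slider whose travel is
// linear in amplitude gain and whose maximum is max_db. Hypothetical model.
double linear_slider_db(double position, double max_db) {
    return max_db + 20.0 * std::log10(position);
}
```

Under this model, a slider that is linear in dB instead would need the position-to-gain mapping to be exponential.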

Author:  Brian [ Tue Mar 12, 2013 7:27 am ]
Post subject:  Re: Stereo Tool 7.03 BETA

Quote:
By the way, 45% of the CPU cycles are now spent in an external Intel library that I have no influence on (except calling it less often of course), so improving things is getting increasingly harder.
Might be time to revisit the 13 vs. 10.1 compiler situation. I mean for future enhancements, not for this release. Then with that, I also don't mean 5-10 releases down the road, but stopping after this release and trying to figure out that issue.

Author:  hvz [ Tue Mar 12, 2013 9:22 am ]
Post subject:  Re: Stereo Tool 7.03 BETA

Another performance improvement (probably about 9% compared to previous beta!)
Stand alone: http://www.stereotool.com/download/ster ... 04-053.exe
Winamp DSP: http://www.stereotool.com/download/dsp_ ... 04-053.exe
VST: http://www.stereotool.com/download/vst_ ... 04-053.dll

Author:  hvz [ Tue Mar 12, 2013 9:27 am ]
Post subject:  Re: Stereo Tool 7.03 BETA

Quote:
Quote:
By the way, 45% of the CPU cycles are now spent in an external Intel library that I have no influence on (except calling it less often of course), so improving things is getting increasingly harder.
Might be time to revisit the 13 vs. 10.1 compiler situation. I mean for future enhancements, not for this release. Then with that, I also don't mean 5-10 releases down the road, but stopping after this release and trying to figure out that issue.
I could try that later. I just want to mention that for older systems it kind of makes sense that performance is getting worse: Intel is dropping support for older CPUs pretty fast. For example, the library that I use (IPP), which is now using 43% (not 45) of the total CPU load, is a version from several years ago; it's the last version with properly optimized SSE2 support. Newer versions "assume" that you have at least SSE3, and if you don't, you get some fallback SSE2 code that is not very well optimized anymore. :shock:

Author:  phantomfm [ Tue Mar 12, 2013 10:06 am ]
Post subject:  Re: Stereo Tool 7.03 BETA

Quote:
Another performance improvement (probably about 9% compared to previous beta!)
Stand alone: http://www.stereotool.com/download/ster ... 04-053.exe
Winamp DSP: http://www.stereotool.com/download/dsp_ ... 04-053.exe
VST: http://www.stereotool.com/download/vst_ ... 04-053.dll
For me, a 4% improvement, so total CPU load is now 29%! Well done!

All times are UTC+02:00