All times are UTC+02:00




Author Message
PostPosted: Mon Mar 11, 2013 10:05 pm 

Joined: Fri Jan 25, 2013 1:24 pm
Posts: 156
Location: Germany
Quote:
I don't think so. I have many, many schematics where the hardware RDS and stereo decoders are completely independent. Maybe something was different in Germany... I don't know... In Poland it doesn't matter whether it's 90 or 0; some stations have 0, some 90. The specification allows the RDS subcarrier to be at either 0/180 or +/-90 degrees relative to the 3rd harmonic of the pilot tone; +/-90 is better.
AFAIK there is no (longer a) rule about 90° or 0° here in Germany. I've analysed some stations around here with the Pira. Some are 90°, most are 0° and one was even in between (should I contact them? :mrgreen: ). It might be that the old ARI (http://en.wikipedia.org/wiki/Autofahrer ... ionssystem) required a specific phase, but that system was switched off years ago and has been replaced completely by the RDS TP and TA features.
When we license our station year in and year out, the regulation authority doesn't care what we do in and with our MPX, as long as the max. deviation is <=75 kHz and the MPX power is <=0 dBr, except if we want to use RDS-TA. Then the local police office has to be informed...


Top
   
PostPosted: Tue Mar 12, 2013 1:00 am 

Joined: Sun Dec 12, 2010 2:26 pm
Posts: 885
Quote:
@Brian: I have checked all the vectorization reports of performance sensitive functions, and for nearly all of them I either made them vectorize or I understand why they aren't.
So, you now have the explanation for this and understand why it happens????

http://software.intel.com/en-us/forums/topic/346811
Quote:
I'm trying to squeeze the last bit of optimization out of a program using Intel C++ 10.1 (because with later versions I'm getting slower code - I'll look into that later).

When looking at the vectorization reports, I noticed 2 things I hadn't expected, and I wonder if they can be solved (without rewriting lots of code - total code base is over 2 MB and I'm working on it alone). I've tried to google them but didn't find any useful answers.

This one seems to be the most important:

fft_abs_sse2[2*cc] = max(fft_abs_sse2[2*cc], strength * m);

.\Clip1Ch.cpp(1999): (col. 13) remark: vector dependence: proven ANTI dependence between fft_abs_sse2 line 1999, and fft_abs_sse2 line 1999.
.\Clip1Ch.cpp(1999): (col. 13) remark: vector dependence: proven ANTI dependence between fft_abs_sse2 line 1999, and fft_abs_sse2 line 1999.
.\Clip1Ch.cpp(1999): (col. 13) remark: vector dependence: proven FLOW dependence between fft_abs_sse2 line 1999, and fft_abs_sse2 line 1999.
.\Clip1Ch.cpp(1999): (col. 13) remark: vector dependence: proven FLOW dependence between fft_abs_sse2 line 1999, and fft_abs_sse2 line 1999.
.\Clip1Ch.cpp(1999): (col. 13) remark: vector dependence: proven ANTI dependence between fft_abs_sse2 line 1999, and fft_abs_sse2 line 1999.
...

I know that there's an _mm_max_ SIMD instruction, so the problem might be the definition of max. I'm using:
#define max(a,b) (((a)>(b)) ? (a) : (b))
The compiler might see this as an if instruction if it's unable to optimize everything out. Is there a better definition for max that doesn't cause the compiler to see dependencies where there are none?

Another situation that occurs very frequently in my code is this:

for (int c=0; c<f1; c++)
{
temp[2*c] *= one_DIV_bass_static_clip_level_dynamic;
temp[2*c+1] *= one_DIV_bass_static_clip_level_dynamic;
}

Clearly, there are no dependencies between temp[2*c] and temp[2*c+1], but the compiler thinks otherwise:

.\Clip1Ch.cpp(797): (col. 9) remark: loop was not vectorized: existence of vector dependence.
.\Clip1Ch.cpp(800): (col. 13) remark: vector dependence: proven FLOW dependence between temp line 800, and temp line 799.
.\Clip1Ch.cpp(800): (col. 13) remark: vector dependence: proven ANTI dependence between temp line 800, and temp line 799.
.\Clip1Ch.cpp(800): (col. 13) remark: vector dependence: proven OUTPUT dependence between temp line 800, and temp line 799.

I think if these two situations are solved at least 50% of the loops that currently don't get vectorized will be. Your help is greatly appreciated :)
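The two loops in the quoted Intel forum post could be rewritten along the following lines. This is only a minimal sketch of one common workaround, not what was actually done in Stereo Tool. It assumes that both slots of each interleaved pair are scaled by the same factor, so the stride-2 loop can become one flat loop over 2*f1 contiguous floats, and it replaces the max macro with an inline function working on local copies. The names max_f, scale_interleaved, update_peaks and m_values are hypothetical, and m is assumed here to be a per-bin value. Whether a particular compiler version then vectorizes the loops still has to be checked in its vectorization report.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical replacement for the max() macro: a plain inline function.
// Each argument is evaluated exactly once before the comparison.
static inline float max_f(float a, float b)
{
    return (a > b) ? a : b;
}

// Scales f1 interleaved pairs (2 * f1 floats) by the same factor k.
// Equivalent to: temp[2*c] *= k; temp[2*c+1] *= k; for c in [0, f1).
void scale_interleaved(float* temp, std::size_t f1, float k)
{
    assert(temp != nullptr || f1 == 0);
    const std::size_t n = 2 * f1;
    for (std::size_t i = 0; i < n; ++i)
    {
        temp[i] *= k;
    }
}

// Updates every even slot of fft_abs with max(old value, strength * m_values[cc]).
// The odd slots are left untouched, as in the original statement.
void update_peaks(float* fft_abs, const float* m_values,
                  std::size_t count, float strength)
{
    for (std::size_t cc = 0; cc < count; ++cc)
    {
        // Read into locals first so the load and the store are clearly separated.
        const float old_value = fft_abs[2 * cc];
        const float candidate = strength * m_values[cc];
        fft_abs[2 * cc] = max_f(old_value, candidate);
    }
}
```

The flat loop does the same arithmetic as the original pair of statements, but it gives the compiler a single unit-stride access pattern to analyse instead of two stride-2 accesses to the same array.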
Quote:
The changes I made today should give a reduction of about 4% in the total CPU load (on the most active CPU core; reduction should be bigger on a single core system).
BETA052 resulted in 3-5% decrease. So, no, not any bigger than what you said for that one. You should also note that you said 4% via your writing to a file method of testing, but my 3-5% (average of 4) was from simply looking at Task Manager. IOW, it's not always inaccurate... Looking at things through ProcExp, I'm not seeing any DPC activity either. Never have. There's some interrupt stuff every now and then, but it's minimal.

I still believe that your code on K8 is cache (aka core) clock dependent for most things, with benefit to the Opteron and X2 line with whatever supports multicore processing.

:arrow: The reason why I am wanting to profile on my system is because my speculation is that there are alternative SIMD / assembly instructions that would help K8, but be performance-neutral to newer processors.

That's why I asked about Scalar vs. Packed. It's also why I've mentioned the MOVNTPS instruction. MOVNTPS helps minimize cache pollution, which would be beneficial to all systems, if you can use it, which you may or may not be able to.
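For readers unfamiliar with MOVNTPS, here is a minimal sketch of how a non-temporal store is typically written with SSE intrinsics. _mm_stream_ps is the intrinsic that maps to MOVNTPS, and _mm_sfence maps to SFENCE. The helper name scale_copy_stream, its parameters and the HAVE_SSE_STREAM macro are hypothetical and only serve as illustration. The sketch assumes that the destination is 16-byte aligned and that the length is a multiple of 4, and it falls back to a plain loop when SSE is not available.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE_STREAM 1
#else
#define HAVE_SSE_STREAM 0
#endif

// Hypothetical helper: writes src[i] * gain to dst[i] for i in [0, n).
// Assumptions: dst is 16-byte aligned and n is a multiple of 4.
void scale_copy_stream(float* dst, const float* src, std::size_t n, float gain)
{
    assert(n % 4 == 0);
    assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);
#if HAVE_SSE_STREAM
    const __m128 g = _mm_set1_ps(gain);
    for (std::size_t i = 0; i < n; i += 4)
    {
        // Unaligned load from src, packed multiply by the gain.
        const __m128 v = _mm_mul_ps(_mm_loadu_ps(src + i), g);
        // MOVNTPS: store with a non-temporal hint to limit cache pollution.
        _mm_stream_ps(dst + i, v);
    }
    // SFENCE: orders the streamed stores before any later stores.
    _mm_sfence();
#else
    // Portable fallback without non-temporal stores.
    for (std::size_t i = 0; i < n; ++i)
    {
        dst[i] = src[i] * gain;
    }
#endif
}
```

A store like this is only likely to help when the written data is not read again soon; for data that is reused right away, an ordinary store that keeps the result in the cache is usually the better choice.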


Top
   
PostPosted: Tue Mar 12, 2013 1:20 am 
Site Admin

Joined: Mon Mar 17, 2008 1:40 am
Posts: 11425
Quote:
So, you now have the explanation for this and understand why it happens????
No, I worked around it.
Quote:
It's also why I've mentioned the MOVNTPS instruction. MOVNTPS helps minimize cache pollution, which would be beneficial to all systems, if you can use it, which you may or may not be able to.
I almost never write data to memory that I don't need very quickly afterwards. So I don't expect much from this (I have used it at work in the past, so yes, I know what it does. But it's mainly useful if you're working on large amounts of data, not for short blocks of audio data.)


Edit: Just to make sure: This 3-5% is what you get when you divide the old CPU load by the new one? Absolute numbers don't mean much, I'm of course talking about relative changes. Task Manager is often accurate, but sometimes it's not at all, and I really don't know when to trust it and when not.


Top
   
PostPosted: Tue Mar 12, 2013 2:11 am 

Joined: Sun Dec 12, 2010 2:26 pm
Posts: 885
Quote:
Quote:
So, you now have the explanation for this and understand why it happens????
No, I worked around it.
OK. Good to know. If I had known that, I wouldn't have been pressuring as much, but all I had to go on was your last post over there...
Quote:
I almost never write data to memory that I don't need very quickly afterwards. So I don't expect much from this (I have used it at work in the past, so yes, I know what it does. But it's mainly useful if you're working on large amounts of data, not for short blocks of audio data.)
I tended to doubt it could be used, but worth a shot. Semi-related to that is SFENCE, but again, not sure if it would help.

Quote:
Edit: Just to make sure: This 3-5% is what you get when you divide the old CPU load by the new one? Absolute numbers don't mean much, I'm of course talking about relative changes.
No, I did give you absolute. Relative would be 5-8%. Still not a huge delta, given that I'm already pushing up near 80 if quality is set to 100.
Quote:
Task Manager is often accurate, but sometimes it's not at all, and I really don't know when to trust it and when not.
I see your point, which is why I pointed you in the direction of Process Explorer and Process Monitor.
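The difference between the absolute and the relative numbers discussed above can be made explicit with a small calculation. This is only an illustrative sketch: the struct LoadChange and the function compare_cpu_load are hypothetical names, and the loads in the note below the code are made-up values, not measurements from this thread.

```cpp
#include <cassert>

// Result of comparing two CPU load readings (both given in percent).
struct LoadChange
{
    double absolute_points;   // old load minus new load, in percentage points
    double relative_percent;  // reduction relative to the old load, in percent
};

// Hypothetical helper: compares an old and a new CPU load reading.
LoadChange compare_cpu_load(double old_load_percent, double new_load_percent)
{
    assert(old_load_percent > 0.0);
    LoadChange result;
    result.absolute_points = old_load_percent - new_load_percent;
    result.relative_percent = 100.0 * result.absolute_points / old_load_percent;
    return result;
}
```

With a hypothetical old load of 50% and a new load of 46%, the absolute change is 4 percentage points, while the relative change is 8%. The same absolute drop at a higher starting load gives a smaller relative number, which is why the two kinds of figures cannot be compared directly.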


Top
   
PostPosted: Tue Mar 12, 2013 2:28 am 
Site Admin

Joined: Mon Mar 17, 2008 1:40 am
Posts: 11425
Next version will be about 7-10% faster than 052, measured again with PhantomFM's 80s preset. (I'm getting some jitter in my measurements, and it's too late to repeat it a lot of times).

Those are again relative numbers. Difference on single core systems is probably again bigger.

By the way, 45% of the CPU cycles are now spent in an external Intel library that I have no influence on (except calling it less often of course), so improving things is getting increasingly harder.

Improvements in this new version are in the compressors and in the advanced clipper.


Top
   
PostPosted: Tue Mar 12, 2013 3:15 am 

Joined: Wed Jun 16, 2010 4:30 pm
Posts: 600
Location: Buenos Aires, Argentina
All optimizations are welcome, and I noticed some of them in beta 52; my system thanks you. ;)

I'd like to take this opportunity to comment on how the sidechain is activated in the compressors, both in the AGC and in the SingleBand.
I find the activation confusing: first there is "Use Side Chain" and then "PEQ Sidechain". I think that activating "Use Side Chain" should also activate "PEQ Sidechain", or the latter checkbox should be removed altogether, because the sidechain is meaningless without the EQ.

Another control whose display has a problem is the "Drive" of the MB, where the maximum is 42 dB but the halfway position of the control reads 36 dB. Is there a problem with the scaling or with the dB calculation? Something similar happens with "Output level".

_________________
by GAP
"Less is More" (Bob Katz)


Top
   
PostPosted: Tue Mar 12, 2013 7:27 am 

Joined: Sun Dec 12, 2010 2:26 pm
Posts: 885
Quote:
By the way, 45% of the CPU cycles are now spent in an external Intel library that I have no influence on (except calling it less often of course), so improving things is getting increasingly harder.
Might be time to revisit the 13 vs. 10.1 compiler situation. I mean for future enhancements, not for this release. Then with that, I also don't mean 5-10 releases down the road, but stopping after this release and trying to figure out that issue.


Top
   
PostPosted: Tue Mar 12, 2013 9:22 am 
Site Admin

Joined: Mon Mar 17, 2008 1:40 am
Posts: 11425
Another performance improvement (probably about 9% compared to previous beta!)
Stand alone: http://www.stereotool.com/download/ster ... 04-053.exe
Winamp DSP: http://www.stereotool.com/download/dsp_ ... 04-053.exe
VST: http://www.stereotool.com/download/vst_ ... 04-053.dll


Top
   
PostPosted: Tue Mar 12, 2013 9:27 am 
Site Admin

Joined: Mon Mar 17, 2008 1:40 am
Posts: 11425
Quote:
Quote:
By the way, 45% of the CPU cycles are now spent in an external Intel library that I have no influence on (except calling it less often of course), so improving things is getting increasingly harder.
Might be time to revisit the 13 vs. 10.1 compiler situation. I mean for future enhancements, not for this release. Then with that, I also don't mean 5-10 releases down the road, but stopping after this release and trying to figure out that issue.
I could try that later. I just want to mention that for older systems it kind of makes sense that performance is getting worse: Intel is dropping support for older CPUs pretty fast. For example, the library that I use (IPP), which now accounts for 43% (not 45%) of the total CPU load, is a version from several years ago; it's the last version with properly optimized SSE2 support. Newer versions "assume" that you have at least SSE3, and if you don't, you get some fallback SSE2 code that is not very well optimized anymore. :shock:


Top
   
PostPosted: Tue Mar 12, 2013 10:06 am 

Joined: Fri Jan 27, 2012 10:36 am
Posts: 178
Location: den Helder / The Netherlands
Quote:
Another performance improvement (probably about 9% compared to previous beta!)
Stand alone: http://www.stereotool.com/download/ster ... 04-053.exe
Winamp DSP: http://www.stereotool.com/download/dsp_ ... 04-053.exe
VST: http://www.stereotool.com/download/vst_ ... 04-053.dll
For me, a 4% improvement, so total CPU load is now 29%! Well done!

_________________
----------------------------------------------------------------------------
AIRCHAIN-GURU professional independant airchain consultancy.
Orban/Omnia/Vorsis/DSPX/Aphex/Inovonics
----------------------------------------------------------------------------


Top
   