Quote:
But, is the SSE3 version (without IPP) faster on your system?
I'm confused here. "Without IPP"????
What I specifically asked you to try was a build using the Intel C++ Compiler, requesting SSE3, linked against IPP 7.0 or 7.1. That combination will not give you SSE3 code for any of the IPP functions, because IPP 7.x has no SSE3 support for 32-bit compilations, only for 64-bit.
To directly answer your question to me: no, what you posted was no faster. The problem is that both what I asked you to do (which I have now defined more clearly above) AND what you just said ("without IPP") will only give SSE3 code for those parts of the application that do NOT use IPP.
As I understand things, if you do an SSE3 compile under the following conditions:
- IPP version 7.0.x or 7.1.x
- Static linking WITH dispatching
...what you will get out of IPP are the "w7" optimizations. The w7 optimizations are documented as SSE2.
If you do an SSE3 compile under the following conditions:
- IPP version 6.1.x
- Static linking WITH dispatching
...what you will get out of IPP are the "t7" optimizations. The t7 optimizations are documented as SSE3.
So, even though my first response to your question is "no", the way that you did the compiling is VERY, VERY, VERY important, specifically the version of IPP that you used. If you did as I requested and used a NEWER IPP, then *ANY* code compiled for 32-bit requesting SSE3 will only get SSE2 out of IPP.
In order to get SSE3 from IPP with a 32-bit build, you MUST use IPP version 6.1.x or lower.
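To make the two cases above concrete, here is a rough sketch of the mapping as I understand it. This is NOT IPP's real dispatcher: the function name and its parameters are made up for illustration, and it only models the three code paths named in this thread (px, w7, t7) for a 32-bit, statically linked build with dispatching, on a CPU whose best feature is SSE3.

```c
#include <stdbool.h>

/* Hypothetical model of the version/path mapping described in this post.
   Not IPP code; names are invented for illustration only. */
static const char *ipp32_static_dispatch_path(int ipp_major,
                                              bool cpu_has_sse2,
                                              bool cpu_has_sse3)
{
    /* IPP 6.1.x and lower: a 32-bit SSE3 path ("t7") is available. */
    if (cpu_has_sse3 && ipp_major <= 6)
        return "t7";

    /* IPP 7.0.x / 7.1.x: no 32-bit SSE3 path, so an SSE3-capable CPU
       ends up on the SSE2 path ("w7"). */
    if (cpu_has_sse2)
        return "w7";

    /* Nothing better applies: generic path. */
    return "px";
}
```

With this model, `ipp32_static_dispatch_path(7, true, true)` yields "w7" while `ipp32_static_dispatch_path(6, true, true)` yields "t7", which is the whole point: same compiler switch, different IPP version, different code path.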
Edit:
Looking around further, I've noticed even older posts from Vladimir and Ying (Intel employees responding in the forum) that are adamant about not supporting optimizations on non-Intel systems. At the time (2004), they insisted that the "proper" behavior was to execute the "generic" code whenever the CPU was detected as non-Intel. At that time, "generic" meant code path "px", which is C-compliant IA-32. In other words, bog slow.
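For illustration only, the policy described in those 2004 posts would amount to something like the sketch below. This is my own reconstruction of the stated behavior, not anything taken from IPP; the function name and both parameters are hypothetical. The vendor strings are the standard ones returned by CPUID ("GenuineIntel", "AuthenticAMD").

```c
#include <string.h>

/* Hypothetical illustration of the 2004-era policy as described in this post:
   a non-Intel vendor string always gets the generic "px" path, no matter
   which instruction sets the CPU actually supports. */
static const char *path_under_vendor_check(const char *cpuid_vendor,
                                           const char *best_path_for_features)
{
    if (strcmp(cpuid_vendor, "GenuineIntel") != 0)
        return "px";               /* non-Intel: generic code only */

    return best_path_for_features; /* Intel: use the feature-based path */
}
```

Under that rule, an AMD chip that could run the "t7" code would still get "px", since the vendor check comes before the feature check.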
Bottom line is that I would NOT trust IPP 7.x.x to be AMD-friendly, except for the absolute newest of AMD processors that support SSE4.x. What was done in IPP 7.x.x was very likely their way to circumvent the judicial system that instructed them not to intentionally cripple non-Intel systems.