Quote:
Quote:
If changing out the linking to the IPP libraries is deemed too time-consuming and/or not worth it, could a plain x86 DSP build (FPU or whatever it's called) be made as an alternative, keeping the current IPP linking? The point of this test is to see whether AMD platforms perform worse when SSE is completely eliminated.
SSE was technically introduced by Intel to compensate for their FPU performance vs. AMD K7 and K8. The K7/K8 FPU (with three FP units) is massively better than the Intel P4's. What I noticed when profiling with CodeAnalyst was that the biggest source of stalls was the FP units being full. If performance is generally the same, then odds are that the compiler is steering non-Intel systems down the slowest path.
I tried, and even my i7 can barely handle this version. So unless you think your K8 is faster than an i7, this is useless...
After you said you'd do a generic build, I wanted to reply to clarify things a bit more, but I didn't get the chance until now.
Also, I'd like to make absolutely sure you understand that what I'm trying to test here isn't specific to the K8; it applies to all AMD processors, even a brand-new one bought today, as long as you're using the 10.1 compiler.
So, no, I don't think a K8 would be better than an i7 on straight FPU code. That said, I may have asked you to cut back further than I actually intended.
From what I understand from reading Agner Fog's blog (nice rhyme, eh?) and pages 132-137 of this document:
http://www.agner.org/optimize/optimizing_cpp.pdf
... the Intel C++ Compiler does not treat non-Intel processors fairly, while the IPP libraries (at least through 6.1) do treat non-Intel processors fairly. What I wanted was a test of the compiler alone, not of both the compiler and IPP. Yeah, I know I was just complaining about mixed-case situations, but that is exactly what I'm trying to test for: if you request an SSE2 build from both the compiler and IPP, IPP will give you SSE2, but the compiler may not.
So, based on your previous comments, I gather it is possible to set the compiler's target separately from IPP's. What I'm most curious about (and again, bear in mind that this applies to all AMD systems, not just the older K8) is whether the 10.1 compiler is treating us fairly.
Also, you should be fully aware that your i7 will indeed see a performance drop if you set the compiler to generic IA-32 but IPP to SSE2. That would be completely normal. My testing here, however, is not for Intel systems but for AMD systems. Your Intel system gets treated fairly by the 10.1 compiler. Forcing your build down to the lower level reproduces what may be happening automatically to those of us on AMD processors, and on your own machine you otherwise have no way to test for it or ever see it happen.
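To make the compiler-vs-IPP distinction concrete, here's a toy model of the two dispatch policies. It's a simplified sketch under my own assumptions, not Intel's actual dispatcher code: the function names and the two-path setup are hypothetical, and the only real-world details are the CPUID vendor strings "GenuineIntel" and "AuthenticAMD". Both CPUs in the example advertise SSE2; only the vendor string differs.

```python
from enum import Enum


class Path(Enum):
    """Code paths a dispatcher could pick (hypothetical labels)."""
    GENERIC_X87 = "generic x87"
    SSE2 = "SSE2"


def dispatch_by_feature(vendor: str, has_sse2: bool) -> Path:
    """Feature-based dispatch: look only at the SSE2 feature flag.
    Models the behaviour attributed to IPP (through 6.1); vendor is ignored."""
    return Path.SSE2 if has_sse2 else Path.GENERIC_X87


def dispatch_by_vendor(vendor: str, has_sse2: bool) -> Path:
    """Vendor-gated dispatch: take the fast path only on "GenuineIntel".
    Models the compiler behaviour described in Agner Fog's manual (assumption)."""
    if vendor != "GenuineIntel":
        return Path.GENERIC_X87
    return Path.SSE2 if has_sse2 else Path.GENERIC_X87


if __name__ == "__main__":
    # Same SSE2 support on both CPUs; only the vendor string changes.
    for vendor in ("GenuineIntel", "AuthenticAMD"):
        ipp = dispatch_by_feature(vendor, has_sse2=True)
        cc = dispatch_by_vendor(vendor, has_sse2=True)
        print(f"{vendor:12}  IPP-style: {ipp.value:11}  compiler-style: {cc.value}")
```

In this model, setting the compiler side to generic IA-32 on an Intel machine while IPP stays at SSE2 gives the same combination as the AMD row, which is the whole point of the test I asked for.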
Does that make more sense now, and do you see why this matters beyond just me and my system?