Same track (Timbaland - Carry Out)
B127 with W flag - 2m 29s
N flag test version - 2m 28s
Effectively a tie: one second apart on a roughly two-and-a-half-minute encode. Whatever I'm remembering about the N flag and SETI (or some other BOINC project), it doesn't seem to make a difference here.
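For anyone who wants the gap as a number, here is a quick sketch of the arithmetic in Python. The helper names (to_seconds, percent_gap) are just made up for illustration, and the only inputs are the two times posted above.

```python
import re

def to_seconds(text):
    """Convert a duration written like '2m 29s' to whole seconds."""
    match = re.fullmatch(r"\s*(\d+)m\s*(\d+)s\s*", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = map(int, match.groups())
    return minutes * 60 + seconds

def percent_gap(slower, faster):
    """Gap between two timings as a percentage of the slower one."""
    return (slower - faster) / slower * 100.0

w_flag = to_seconds("2m 29s")  # B127 with W flag
n_flag = to_seconds("2m 28s")  # N flag test version
gap = percent_gap(w_flag, n_flag)
print(f"{w_flag}s vs {n_flag}s: {gap:.2f}% apart")
# prints "149s vs 148s: 0.67% apart"
```

Since the times were only recorded to the nearest second, a one-second gap is right at the limit of what a single run of each build can show.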
Per an AMD document, located here:
http://support.amd.com/us/Processor_TechDocs/32035.pdf
...they suggest, for Intel 32-bit compiles on Windows:
3.11.2 Generic Performance Switches
Use of the -QxW -Qipo -O3 switches are recommended for Intel compiler version 10.0.
The -QxW switch instructs the compiler to optimize for Pentium 4 processor (including SSE2 instructions).
The -Qipo switch enables interprocedural (across multiple source files) analysis.
The -O3 optimizes for speed and includes several aggressive optimizations.
...because further on, they state that -QxN is "unsafe" and could lead to unexpected crashes.
The Intel documentation ( http://cache-www.intel.com/cd/00/00/34/ ... 347599.pdf ) has a disclaimer on O3: it may be no better than O2 (the default) in some instances, and in others it could be slower.
O3
Enables O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations. Enables optimizations for maximum speed, such as:
• Loop unrolling, including instruction scheduling
• Code replication to eliminate branches
• Padding the size of certain power-of-two arrays to allow more efficient cache use.
On Windows systems, the O3 option sets the /GF (/Qvc7 and above), /Gf (/Qvc6 and below), and /Ob2 option.
On Linux and Mac OS systems, the O3 option sets option -fomit-frame-pointer.
On systems using IA-32 and Intel® 64 architecture, when O3 is used with options -ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler performs more aggressive data dependency analysis than for O2, which may result in longer compilation times. On systems using IA-64 architecture, the O3 option enables optimizations for technical computing applications (loop-intensive code): loop optimizations and data prefetch.
The O3 optimizations may not cause higher performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to O2 optimizations. The O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets.
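Given that Intel's own text says O3 can end up slower than O2, the only way to settle it for a particular build is to time both on the same input, more than once. Below is a rough Python sketch of such a harness. The function names are hypothetical, and the command in the demo is only a stand-in; you would swap in your own O2 and O3 binaries with the same track.

```python
import statistics
import subprocess
import sys
import time

def time_runs(cmd, runs=5):
    """Run cmd `runs` times and return the wall-clock time of each run in seconds."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command fails, so a crash is not timed as a "fast" run
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return times

def summarize(times):
    """Return (median, spread), where spread is max minus min."""
    return statistics.median(times), max(times) - min(times)

if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; replace it with the
    # encoder command line for each build you want to compare.
    demo = [sys.executable, "-c", "sum(range(100000))"]
    median, spread = summarize(time_runs(demo, runs=3))
    print(f"median {median:.3f}s, spread {spread:.3f}s")
```

If the difference between the two medians is smaller than the spread within either build, calling it a tie seems fair.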
So, now we've come...to the end of the road...
