Stereo Tool
https://forums.stereotool.com/

Stereo Tool 7.03 BETA
https://forums.stereotool.com/viewtopic.php?t=4448
Page 79 of 102

Author:  Brian [ Tue Mar 12, 2013 11:15 pm ]
Post subject:  Re: Stereo Tool 7.03 BETA

Quote:
Wow, not THAT is surprising! I've tried the SSE3 and even the SSE4.1 version in the version of IPP that I currently use, and when I did I could not measure any performance difference compared to the SSE2 version, and the same was true with compiler settings (although, with all the optimizations going on now it might be interesting to retest that one).
I'm assuming "not" was a typo and you meant "Wow, now THAT is surprising!"...?

The only SSE3 instructions that later K8 processors don't have are MONITOR and MWAIT, and that's because those instructions are specific to Intel's Hyperthreading.

However, the bad news is that Windows XP may not support SSE3. I'm still trying to figure that out for sure. If that is the case, then it wouldn't help me, and you wouldn't be able to see any difference if testing using XP, but people on Vista and newer might.

Author:  hvz [ Tue Mar 12, 2013 11:22 pm ]
Post subject:  Re: Stereo Tool 7.03 BETA

I'll compile an SSE2 and SSE4 version and compare the assembly. Previously they were nearly identical (maybe 2 places in the whole thing where it was different).

Author:  Brian [ Tue Mar 12, 2013 11:31 pm ]
Post subject:  Re: Stereo Tool 7.03 BETA

Quote:
I'll compile an SSE2 and SSE4 version and compare the assembly. Previously they were nearly identical (maybe 2 places in the whole thing where it was different).
SSE4 wouldn't help K8 systems, and possibly no AMD systems at all, as I think the SSE4.x stuff is Intel only. AMD only has SSE4a, and I doubt the Intel compiler would generate that. It needs to be SSE3.

Edit: SSE4.1 and 4.2 got added on the AMD side only on Bulldozer-based processors and newer. These were released in 2011, and are the FX-xxxx line. So, only really new AMD systems would have SSE4.x support. This again confirms that the best option is to bump up to SSE3. Even if a newer i3/i5/i7 doesn't see any benefit, there could be benefits to older systems, specifically AMD systems. In other words, if it doesn't help your system but doesn't hurt it either, it may make sense to offer an SSE3 build in addition to the SSE and SSE2 versions.

As for Windows XP support, Anandtech did tests on the newer K8 cores on Windows XP, so I think OS support is a non-issue. All that needs to be there is the support on the CPU core itself.
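
For anyone who wants to see what "support on the CPU core itself" looks like in practice, here is a minimal sketch of decoding the SSE-family bits that CPUID leaf 1 reports in ECX and EDX. Python cannot execute CPUID directly, so the register values are passed in as plain integers; the function name and the example values are made up for illustration, and only the bit positions come from the vendor documentation.

```python
# CPUID leaf 1 feature bits, as documented by both Intel and AMD.
EDX_BITS = {"SSE": 25, "SSE2": 26}
ECX_BITS = {"SSE3": 0, "SSSE3": 9, "SSE4.1": 19, "SSE4.2": 20}


def decode_leaf1(ecx, edx):
    """Return the set of SSE-family flags set in CPUID.1:ECX/EDX."""
    flags = {name for name, bit in EDX_BITS.items() if (edx >> bit) & 1}
    flags |= {name for name, bit in ECX_BITS.items() if (ecx >> bit) & 1}
    return flags


# Hypothetical register values: a CPU reporting SSE, SSE2 and SSE3 only.
example_ecx = 1 << 0
example_edx = (1 << 25) | (1 << 26)
print(sorted(decode_leaf1(example_ecx, example_edx)))
```

A native program would obtain the two register values from the CPUID instruction itself (for example through a compiler intrinsic) and then apply the same bit tests.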

Edit 2: This is the output of Coreinfo on my system. Coreinfo is another Microsoft Sysinternals utility that is used to display processor CPUID information. http://technet.microsoft.com/en-us/sysi ... 35722.aspx
Code:
AMD Athlon(tm) 64 Processor 3700+
x86 Family 15 Model 39 Stepping 1, AuthenticAMD
HTT       	-	Multicore
HYPERVISOR	-	Hypervisor is present
VMX       	-	Supports Intel hardware-assisted virtualization
SVM       	-	Supports AMD hardware-assisted virtualization
EM64T     	*	Supports 64-bit mode

SMX       	-	Supports Intel trusted execution
SKINIT    	-	Supports AMD SKINIT

NX        	*	Supports no-execute page protection
SMEP      	-	Supports Supervisor Mode Execution Prevention
SMAP      	-	Supports Supervisor Mode Access Prevention
PAGE1GB   	-	Supports 1 GB large pages
PAE       	*	Supports > 32-bit physical addresses
PAT       	*	Supports Page Attribute Table
PSE       	*	Supports 4 MB pages
PSE36     	*	Supports > 32-bit address 4 MB pages
PGE       	*	Supports global bit in page tables
SS        	-	Supports bus snooping for cache operations
VME       	*	Supports Virtual-8086 mode
RDWRFSGSBASE	-	Supports direct GS/FS base access

FPU       	*	Implements i387 floating point instructions
MMX       	*	Supports MMX instruction set
MMXEXT    	*	Implements AMD MMX extensions
3DNOW     	*	Supports 3DNow! instructions
3DNOWEXT  	*	Supports 3DNow! extension instructions
SSE       	*	Supports Streaming SIMD Extensions
SSE2      	*	Supports Streaming SIMD Extensions 2
SSE3      	*	Supports Streaming SIMD Extensions 3
SSSE3     	-	Supports Supplemental SIMD Extensions 3
SSE4.1    	-	Supports Streaming SIMD Extensions 4.1
SSE4.2    	-	Supports Streaming SIMD Extensions 4.2

AES       	-	Supports AES extensions
AVX       	-	Supports AVX intruction extensions
FMA       	-	Supports FMA extensions using YMM state
MSR       	*	Implements RDMSR/WRMSR instructions
MTRR      	*	Supports Memory Type Range Registers
XSAVE     	-	Supports XSAVE/XRSTOR instructions
OSXSAVE   	-	Supports XSETBV/XGETBV instructions
RDRAND    	-	Supports RDRAND instruction
RDSEED    	-	Supports RDSEED instruction

CMOV      	*	Supports CMOVcc instruction
CLFSH     	*	Supports CLFLUSH instruction
CX8       	*	Supports compare and exchange 8-byte instructions
CX16      	-	Supports CMPXCHG16B instruction
BMI1      	-	Supports bit manipulation extensions 1
BMI2      	-	Supports bit maniuplation extensions 2
ADX       	-	Supports ADCX/ADOX instructions
DCA       	-	Supports prefetch from memory-mapped device
F16C      	-	Supports half-precision instruction
FXSR      	*	Supports FXSAVE/FXSTOR instructions
FFXSR     	*	Supports optimized FXSAVE/FSRSTOR instruction
MONITOR   	-	Supports MONITOR and MWAIT instructions
MOVBE     	-	Supports MOVBE instruction
ERMSB     	-	Supports Enhanced REP MOVSB/STOSB
PCLULDQ   	-	Supports PCLMULDQ instruction
POPCNT    	-	Supports POPCNT instruction
SEP       	*	Supports fast system call instructions
LAHF-SAHF 	*	Supports LAHF/SAHF instructions in 64-bit mode
HLE       	-	Supports Hardware Lock Elision instructions
RTM       	-	Supports Restricted Transactional Memory instructions

DE        	*	Supports I/O breakpoints including CR4.DE
DTES64    	-	Can write history of 64-bit branch addresses
DS        	-	Implements memory-resident debug buffer
DS-CPL    	-	Supports Debug Store feature with CPL
PCID      	-	Supports PCIDs and settable CR4.PCIDE
INVPCID   	-	Supports INVPCID instruction
PDCM      	-	Supports Performance Capabilities MSR
RDTSCP    	-	Supports RDTSCP instruction
TSC       	*	Supports RDTSC instruction
TSC-DEADLINE	-	Local APIC supports one-shot deadline timer
TSC-INVARIANT	-	TSC runs at constant rate
xTPR      	-	Supports disabling task priority messages

EIST      	-	Supports Enhanced Intel Speedstep
ACPI      	-	Implements MSR for power management
TM        	-	Implements thermal monitor circuitry
TM2       	-	Implements Thermal Monitor 2 control
APIC      	*	Implements software-accessible local APIC
x2APIC    	-	Supports x2APIC

CNXT-ID   	-	L1 data cache mode adaptive or BIOS

MCE       	*	Supports Machine Check, INT18 and CR4.MCE
MCA       	*	Implements Machine Check Architecture
PBE       	-	Supports use of FERR#/PBE# pin

PSN       	-	Implements 96-bit processor serial number

PREFETCHW 	*	Supports PREFETCHW instruction

Logical to Physical Processor Map:
*  Physical Processor 0

Logical Processor to Socket Map:
*  Socket 0

Logical Processor to NUMA Node Map:
*  NUMA Node 0

Logical Processor to Cache Map:
*  Data Cache          0, Level 1,   64 KB, Assoc   2, LineSize  64
*  Instruction Cache   0, Level 1,   64 KB, Assoc   2, LineSize  64
*  Unified Cache       0, Level 2,    1 MB, Assoc  16, LineSize  64
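
A dump like the one above can also be read by a script rather than by eye. The following is only a rough sketch: it assumes the three-column "NAME, * or -, description" layout shown in the output, and the function names and the build list are invented for illustration.

```python
def parse_coreinfo(text):
    """Map each feature name to True ('*') or False ('-')."""
    flags = {}
    for line in text.splitlines():
        parts = line.split(None, 2)
        # Feature rows have the marker in the second column; header and
        # topology rows (which start with '*' or free text) are skipped.
        if len(parts) >= 2 and parts[1] in ("*", "-"):
            flags[parts[0]] = parts[1] == "*"
    return flags


def best_build(flags, builds=("SSE3", "SSE2", "SSE")):
    """Return the first (most capable) build the CPU supports, else None."""
    for name in builds:
        if flags.get(name):
            return name
    return None


# Four rows copied from the dump above.
sample = """\
SSE       \t*\tSupports Streaming SIMD Extensions
SSE2      \t*\tSupports Streaming SIMD Extensions 2
SSE3      \t*\tSupports Streaming SIMD Extensions 3
SSSE3     \t-\tSupports Supplemental SIMD Extensions 3
"""
print(best_build(parse_coreinfo(sample)))
```

For this processor the sketch would pick the SSE3 build, since SSSE3 and SSE4.x are all reported as unsupported.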

Author:  hvz [ Wed Mar 13, 2013 1:11 am ]
Post subject:  Re: Stereo Tool 7.03 BETA

I've compared the performance of an SSE4.1 build against the SSE2 build. Difference is small, less than 0.5%, could even be 0. I've glanced at the generated assembly code and it looks like the only thing that changes is the order of instructions.

The difference between SSE2 and SSE3 should be even smaller....

I'll build an SSE3 Winamp plugin version just to be sure, but I don't expect any noticeable difference from it.
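
A difference this small is hard to separate from run-to-run noise. As a rough sketch of one way to check, assuming several timed runs per build are available (the timings below are hypothetical, not measurements from these builds, and the function names are made up):

```python
import statistics


def relative_difference(baseline, candidate):
    """Percent change of the candidate's median time vs. the baseline's."""
    base = statistics.median(baseline)
    return 100.0 * (statistics.median(candidate) - base) / base


def within_noise(baseline, candidate):
    """True if the median shift is no larger than the baseline's own spread."""
    spread = max(baseline) - min(baseline)
    shift = abs(statistics.median(candidate) - statistics.median(baseline))
    return shift <= spread


# Hypothetical timings in seconds for the same input on each build.
sse2_runs = [10.00, 10.04, 9.98, 10.02, 10.01]
sse41_runs = [9.97, 10.00, 9.96, 10.01, 9.98]
print(round(relative_difference(sse2_runs, sse41_runs), 2))
print(within_noise(sse2_runs, sse41_runs))
```

Comparing medians instead of single runs keeps one slow outlier from deciding the result. The min-to-max spread is a crude noise estimate, so a proper statistical test would be the next step if the numbers were close to the threshold.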

Author:  hvz [ Wed Mar 13, 2013 2:18 am ]
Post subject:  Re: Stereo Tool 7.03 BETA

SSE3 Winamp DSP version: http://www.stereotool.com/download/dsp_ ... A-SSE3.exe

I don't expect any significant difference in comparison with version 053. The next build will be SSE2 again unless there's a good reason to change! So if you DO see a difference, please let me know! (Also if you don't.)

Author:  Brian [ Wed Mar 13, 2013 4:41 am ]
Post subject:  Re: Stereo Tool 7.03 BETA

Which version of IPP did you use, and are you using Dynamic linking, Static linking (further defined as with or without dispatching), or a customized SO?

Author:  vmp94 [ Wed Mar 13, 2013 5:45 am ]
Post subject:  Re: Stereo Tool 7.03 BETA

The SSE3 version sounds very distorted. The multiband limiters are going crazy. This happens on every single preset. SSE2 works fine.

Author:  Brian [ Wed Mar 13, 2013 7:34 am ]
Post subject:  Re: Stereo Tool 7.03 BETA

Quote:
The SSE3 version sounds very distorted. The multiband limiters are going crazy. This happens on every single preset. SSE2 works fine.
Every one of your own presets, or every one of the built-in presets? If "every one" of the built-in presets, do you truly mean EVERY ONE, including my own (Cobalt) that does not use any of the new stuff added?

Either way, what specific processor do you have, and what specific Operating System?

FYI, I have no problems at all with the SSE3 version, but I have not yet tested with any of the new stuff. Also, I have some theories that depend on which version of IPP Hans used and how it was linked.

Edit: Tested R-Type. No problems. Need to know exactly what you meant by "every one" and the CPU and OS.

Author:  Brian [ Wed Mar 13, 2013 10:12 am ]
Post subject:  Re: Stereo Tool 7.03 BETA

Meh, I'll just skip waiting for Hans to answer, because I'm exhausted after wading through posts on the Intel forums that have semi-broken English... :|

From what I've been able to determine, the only way to generate SSE3 code in a 32-bit build on AMD systems is to use IPP 6.1. With IPP 7.0, Intel, in their ever-so-restrictive mindset, removed the t7 optimizations. The t7 optimizations were the SSE3 instructions. They made the executive decision that on P4 with SSE3, there was not enough difference between t7 and w7 (SSE2), so they would no longer provide the t7 optimizations on IA-32. Who cares about all the AMD systems that might've done better with t7, right? There was a modest uproar from developers who knew they had AMD end-users, but the only thing told to them was that their feedback would be forwarded and that there *might* be an inclusion of SSE3 inside the w7 optimization. I haven't been able to find any documentation that SSE3 made it into w7 though.

So, my question to Hans about dynamic vs. static linking had to do with both this and, if static, whether ippInit() is called when it should be.

As an aside, given all the difficulties with migrating from 10.1 to 13.x, and now the 6.x vs. 7.x issue, perhaps looking into the AMD performance libraries (Framewave) might be a worthwhile venture. Their claim is that their libraries do not check for processor manufacturer, only feature flags.

That said though, technically speaking, what Intel did with IPP 7.0 isn't checking manufacturer either, but when they made the decision to ditch SSE3 out of the 32-bit code paths, they knew full well that there were no AMD processors that supported SSSE3 or SSE4.x. In other words, the move can be viewed as a loophole in the AMD settlement where they said they would no longer hinder performance.
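
To illustrate the point, here is a toy model of dispatching purely on feature flags, with and without an SSE3 code path. This is not how IPP's dispatcher is actually written. The w7 (SSE2) and t7 (SSE3) names come from the discussion above, while "px", "v8", "p8", the table layout and the function name are assumptions for illustration only.

```python
# w7 (SSE2) and t7 (SSE3) are named in the post above.
# "px", "v8", "p8" and the table layout are assumptions for illustration.
PATHS_WITH_T7 = [("p8", "SSE4.1"), ("v8", "SSSE3"), ("t7", "SSE3"), ("w7", "SSE2")]
PATHS_WITHOUT_T7 = [p for p in PATHS_WITH_T7 if p[0] != "t7"]


def pick_path(cpu_flags, paths, fallback="px"):
    """Return the first code path whose required flag the CPU reports."""
    for name, required in paths:
        if required in cpu_flags:
            return name
    return fallback


k8 = {"SSE", "SSE2", "SSE3"}  # flags from the Coreinfo dump earlier in the thread
newer = {"SSE", "SSE2", "SSE3", "SSSE3", "SSE4.1"}  # hypothetical newer CPU
for label, flags in (("K8", k8), ("newer", newer)):
    print(label, pick_path(flags, PATHS_WITH_T7), pick_path(flags, PATHS_WITHOUT_T7))
```

No vendor string is consulted anywhere, yet dropping the SSE3 entry only changes the result for a CPU whose best feature is SSE3. The CPU with SSSE3 or SSE4.1 lands on the same path either way.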

Here's a quote from one of their fine forum representatives about "older" systems. Check the bolded and underlined part, and bear in mind that a brand new, top-of-the-line AMD processor at that time in 2010 had no SSE4.1 or 4.2 support.
Quote:
vladimir-dudnik (Intel)
Thu, 06/17/2010 - 09:33

Intel does not sell SSE3 processors anymore. I do not think there is any legal obligation to support end-of-lifed products for any company. Otherwise we just will not be able to deliver new technologies like Westmere or AVX processors (which is coming soon).

The functionality you are looking for still be available in IPP 6.1 product.

By the way, the performance oriented customers migrating to the newest platforms. I personally would not consider those who use old or even end of lifed platforms as performance oriented customers. If they do not care about performance why anyone else should do?

Regards,
Vladimir
Arrogant jerk... Never heard of not having the money I suppose. Also completely ignored (likely intentionally) that there were no AMD systems that could support SSE4.1 or 4.2 at the time. Again, arrogant jerk... Finally, yes, Intel was selling "SSE3" processors, as they are still selling them today. If they weren't, then your i7 processor wouldn't support SSE3, now would it? :roll:

BTW, you participated in that thread...

Author:  hvz [ Wed Mar 13, 2013 10:37 am ]
Post subject:  Re: Stereo Tool 7.03 BETA

Well, in the 7.x versions they also stopped providing a fully optimized SSE2 version. That is why I stayed with the latest 6.x version.

At least on Intel systems, there is no measurable difference in performance - in the functions that I use! - between the SSE2, SSE3 and SSE4.x versions. That's why I decided to statically link the SSE2 version to reduce executable file size (it more than doubles if I include all versions, even when I'm using only 2 functions from it).


But, is the SSE3 version (without IPP) faster on your system?

All times are UTC+02:00