Stereo Tool https://forums.stereotool.com/

Stereo Tool 7.03 BETA https://forums.stereotool.com/viewtopic.php?t=4448
Page 79 of 102 |
Author: Brian [ Tue Mar 12, 2013 11:15 pm ]
Post subject: Re: Stereo Tool 7.03 BETA
Quote: Wow, not THAT is surprising! I've tried the SSE3 and even the SSE4.1 version in the version of IPP that I currently use, and when I did I could not measure any performance difference compared to the SSE2 version, and the same was true with compiler settings (although, with all the optimizations going on now it might be interesting to retest that one).
I'm assuming "not" was a typo and you meant "Wow, now THAT is surprising!"...?
The only SSE3 instructions that later K8 processors don't have are MONITOR and MWAIT, and that's because those instructions are specific to Intel's Hyperthreading. However, the bad news is that Windows XP may not support SSE3. I'm still trying to find that out for sure. If that is the case, then it wouldn't help me, and you wouldn't see any difference when testing on XP, but people on Vista and newer might.
Author: hvz [ Tue Mar 12, 2013 11:22 pm ]
Post subject: Re: Stereo Tool 7.03 BETA
I'll compile an SSE2 and SSE4 version and compare the assembly. Previously they were nearly identical (maybe 2 places in the whole thing where it was different).
Author: Brian [ Tue Mar 12, 2013 11:31 pm ]
Post subject: Re: Stereo Tool 7.03 BETA
Quote: I'll compile an SSE2 and SSE4 version and compare the assembly. Previously they were nearly identical (maybe 2 places in the whole thing where it was different).
SSE4 wouldn't help K8 systems, and possibly wouldn't help any AMD systems at all, as I think the SSE4.x stuff is Intel only. AMD only has SSE4a, and I doubt the Intel compiler would generate that. It needs to be SSE3.
Edit: SSE4.1 and 4.2 were added on the AMD side only with Bulldozer-based processors and newer. These were released in 2011 as the FX-xxxx line, so only really new AMD systems have SSE4.x support. This again confirms that the best option is to bump up to SSE3. Even if a newer i3/i5/i7 doesn't see any benefit, there could be benefits on older systems, specifically AMD systems. In other words, if it doesn't help your system but doesn't hurt it either, it may make sense to offer an SSE3 build in addition to the SSE and SSE2 versions. As for Windows XP support, Anandtech tested the newer K8 cores on Windows XP, so I think OS support is a non-issue; all that's needed is support in the CPU core itself.
Edit 2: This is the output of Coreinfo on my system. Coreinfo is another Microsoft Sysinternals utility, used to display processor CPUID information. http://technet.microsoft.com/en-us/sysi ... 35722.aspx
Code:
AMD Athlon(tm) 64 Processor 3700+
x86 Family 15 Model 39 Stepping 1, AuthenticAMD
HTT             -       Multicore
HYPERVISOR      -       Hypervisor is present
VMX             -       Supports Intel hardware-assisted virtualization
SVM             -       Supports AMD hardware-assisted virtualization
EM64T           *       Supports 64-bit mode
SMX             -       Supports Intel trusted execution
SKINIT          -       Supports AMD SKINIT
NX              *       Supports no-execute page protection
SMEP            -       Supports Supervisor Mode Execution Prevention
SMAP            -       Supports Supervisor Mode Access Prevention
PAGE1GB         -       Supports 1 GB large pages
PAE             *       Supports > 32-bit physical addresses
PAT             *       Supports Page Attribute Table
PSE             *       Supports 4 MB pages
PSE36           *       Supports > 32-bit address 4 MB pages
PGE             *       Supports global bit in page tables
SS              -       Supports bus snooping for cache operations
VME             *       Supports Virtual-8086 mode
RDWRFSGSBASE    -       Supports direct GS/FS base access
FPU             *       Implements i387 floating point instructions
MMX             *       Supports MMX instruction set
MMXEXT          *       Implements AMD MMX extensions
3DNOW           *       Supports 3DNow! instructions
3DNOWEXT        *       Supports 3DNow! extension instructions
SSE             *       Supports Streaming SIMD Extensions
SSE2            *       Supports Streaming SIMD Extensions 2
SSE3            *       Supports Streaming SIMD Extensions 3
SSSE3           -       Supports Supplemental SIMD Extensions 3
SSE4.1          -       Supports Streaming SIMD Extensions 4.1
SSE4.2          -       Supports Streaming SIMD Extensions 4.2
AES             -       Supports AES extensions
AVX             -       Supports AVX intruction extensions
FMA             -       Supports FMA extensions using YMM state
MSR             *       Implements RDMSR/WRMSR instructions
MTRR            *       Supports Memory Type Range Registers
XSAVE           -       Supports XSAVE/XRSTOR instructions
OSXSAVE         -       Supports XSETBV/XGETBV instructions
RDRAND          -       Supports RDRAND instruction
RDSEED          -       Supports RDSEED instruction
CMOV            *       Supports CMOVcc instruction
CLFSH           *       Supports CLFLUSH instruction
CX8             *       Supports compare and exchange 8-byte instructions
CX16            -       Supports CMPXCHG16B instruction
BMI1            -       Supports bit manipulation extensions 1
BMI2            -       Supports bit maniuplation extensions 2
ADX             -       Supports ADCX/ADOX instructions
DCA             -       Supports prefetch from memory-mapped device
F16C            -       Supports half-precision instruction
FXSR            *       Supports FXSAVE/FXSTOR instructions
FFXSR           *       Supports optimized FXSAVE/FSRSTOR instruction
MONITOR         -       Supports MONITOR and MWAIT instructions
MOVBE           -       Supports MOVBE instruction
ERMSB           -       Supports Enhanced REP MOVSB/STOSB
PCLULDQ         -       Supports PCLMULDQ instruction
POPCNT          -       Supports POPCNT instruction
SEP             *       Supports fast system call instructions
LAHF-SAHF       *       Supports LAHF/SAHF instructions in 64-bit mode
HLE             -       Supports Hardware Lock Elision instructions
RTM             -       Supports Restricted Transactional Memory instructions
DE              *       Supports I/O breakpoints including CR4.DE
DTES64          -       Can write history of 64-bit branch addresses
DS              -       Implements memory-resident debug buffer
DS-CPL          -       Supports Debug Store feature with CPL
PCID            -       Supports PCIDs and settable CR4.PCIDE
INVPCID         -       Supports INVPCID instruction
PDCM            -       Supports Performance Capabilities MSR
RDTSCP          -       Supports RDTSCP instruction
TSC             *       Supports RDTSC instruction
TSC-DEADLINE    -       Local APIC supports one-shot deadline timer
TSC-INVARIANT   -       TSC runs at constant rate
xTPR            -       Supports disabling task priority messages
EIST            -       Supports Enhanced Intel Speedstep
ACPI            -       Implements MSR for power management
TM              -       Implements thermal monitor circuitry
TM2             -       Implements Thermal Monitor 2 control
APIC            *       Implements software-accessible local APIC
x2APIC          -       Supports x2APIC
CNXT-ID         -       L1 data cache mode adaptive or BIOS
MCE             *       Supports Machine Check, INT18 and CR4.MCE
MCA             *       Implements Machine Check Architecture
PBE             -       Supports use of FERR#/PBE# pin
PSN             -       Implements 96-bit processor serial number
PREFETCHW       *       Supports PREFETCHW instruction
Logical to Physical Processor Map:
*  Physical Processor 0
Logical Processor to Socket Map:
*  Socket 0
Logical Processor to NUMA Node Map:
*  NUMA Node 0
Logical Processor to Cache Map:
*  Data Cache          0, Level 1,   64 KB, Assoc   2, LineSize  64
*  Instruction Cache   0, Level 1,   64 KB, Assoc   2, LineSize  64
*  Unified Cache       0, Level 2,    1 MB, Assoc  16, LineSize  64
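In the Coreinfo listing in this post, a supported feature is marked with `*` and an unsupported one with `-`. Here is a minimal Python sketch, assuming that layout, that reads such output and picks among the SSE, SSE2 and SSE3 builds Brian proposes. The helpers `parse_coreinfo_features` and `pick_build` are hypothetical names for illustration. They are not part of Coreinfo or Stereo Tool, and the regex only covers the flag names seen in this listing.

```python
import re

# One Coreinfo feature line: a flag name, then "*" (present) or "-" (absent),
# then a description. The name pattern is an assumption that covers the names
# in this listing (e.g. "SSE4.1", "LAHF-SAHF", "3DNOWEXT").
FEATURE_LINE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9.\-]*)\s+([*-])\s+\S")

# Hypothetical build list from the thread, most demanding first,
# each with every flag it needs (an SSE3 build also relies on SSE and SSE2).
BUILD_REQUIREMENTS = [
    ("SSE3", ("SSE", "SSE2", "SSE3")),
    ("SSE2", ("SSE", "SSE2")),
    ("SSE", ("SSE",)),
]


def parse_coreinfo_features(text):
    """Return {flag_name: supported} for each feature line in Coreinfo-style text."""
    features = {}
    for line in text.splitlines():
        match = FEATURE_LINE.match(line)
        if match:
            features[match.group(1)] = match.group(2) == "*"
    return features


def pick_build(features):
    """Return the most demanding build whose required flags are all present, else None."""
    for build, required in BUILD_REQUIREMENTS:
        if all(features.get(flag, False) for flag in required):
            return build
    return None


# A subset of the Athlon 64 3700+ output quoted above.
SAMPLE = """\
AMD Athlon(tm) 64 Processor 3700+
x86 Family 15 Model 39 Stepping 1, AuthenticAMD
SSE             *       Supports Streaming SIMD Extensions
SSE2            *       Supports Streaming SIMD Extensions 2
SSE3            *       Supports Streaming SIMD Extensions 3
SSSE3           -       Supports Supplemental SIMD Extensions 3
SSE4.1          -       Supports Streaming SIMD Extensions 4.1
SSE4.2          -       Supports Streaming SIMD Extensions 4.2
MONITOR         -       Supports MONITOR and MWAIT instructions
"""

features = parse_coreinfo_features(SAMPLE)
print(pick_build(features))  # SSE3
```

On these lines the sketch selects the SSE3 build and reports SSSE3, SSE4.x and MONITOR as absent. This matches the K8 picture described in the thread. The check requires every prerequisite flag, not just the top one, so a partial or inconsistent flag report cannot select a build the CPU cannot run.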
Author: hvz [ Wed Mar 13, 2013 1:11 am ]
Post subject: Re: Stereo Tool 7.03 BETA
I've compared the performance of an SSE4.1 build against the SSE2 build. The difference is small, less than 0.5%, and could even be zero. I've glanced at the generated assembly code, and it looks like the only thing that changes is the order of instructions. The difference between SSE2 and SSE3 should be even smaller. I'll build an SSE3 Winamp plugin version just to be sure, but I don't expect any noticeable difference from it.
Author: hvz [ Wed Mar 13, 2013 2:18 am ]
Post subject: Re: Stereo Tool 7.03 BETA
SSE3 Winamp DSP version: http://www.stereotool.com/download/dsp_ ... A-SSE3.exe
I don't expect any significant difference compared with version 053; the next build will be SSE2 again unless there's a good reason to change. So if you DO see a difference, please let me know! (Also if you don't.)
Author: Brian [ Wed Mar 13, 2013 4:41 am ]
Post subject: Re: Stereo Tool 7.03 BETA
Which version of IPP did you use, and are you using dynamic linking, static linking (further split into with or without dispatching), or a customized SO?
Author: vmp94 [ Wed Mar 13, 2013 5:45 am ]
Post subject: Re: Stereo Tool 7.03 BETA
The SSE3 version sounds very distorted. The multiband limiters are going crazy. This happens on every single preset. SSE2 works fine.
Author: Brian [ Wed Mar 13, 2013 7:34 am ]
Post subject: Re: Stereo Tool 7.03 BETA
Quote: The SSE3 version sounds very distorted. The multiband limiters are going crazy. This happens on every single preset. SSE2 works fine.
Every one of your own presets, or every one of the built-in presets? If you mean every one of the built-in presets, do you truly mean EVERY ONE, including my own (Cobalt), which does not use any of the newly added features?
Either way, what specific processor and what specific operating system do you have? FYI, I have no problems at all with the SSE3 version, but I have not yet tested any of the new features. Also, I have some theories that depend on which version of IPP Hans says he used and how it was linked.
Edit: Tested R-Type. No problems. I need to know exactly what you meant by "every one", plus your CPU and OS.
Author: Brian [ Wed Mar 13, 2013 10:12 am ]
Post subject: Re: Stereo Tool 7.03 BETA
Meh, I'll just skip waiting for Hans to answer, because I'm exhausted after wading through posts on the Intel forums written in semi-broken English...
From what I've been able to determine, the only way to get SSE3-optimized IPP code in a 32-bit build on AMD systems is to use IPP 6.1. With IPP 7.0, Intel, in their ever-so-restrictive mindset, removed the t7 optimizations, which were the SSE3 code path. They made the executive decision that on a P4 with SSE3 there was not enough difference between t7 and w7 (SSE2), so they would no longer provide the t7 optimizations on IA-32. Who cares about all the AMD systems that might've done better with t7, right? There was a modest uproar from developers who knew they had AMD end-users, but the only response they got was that their feedback would be forwarded and that there *might* be an inclusion of SSE3 inside the w7 optimization. I haven't been able to find any documentation that SSE3 made it into w7, though.
So, my question to Hans about dynamic vs. static linking had to do with this, and also, if static, whether ippInit() is called when it should be.
As an aside, given all the difficulties with migrating from 10.1 to 13.x, and now the 6.x vs. 7.x issue, perhaps looking into the AMD performance libraries (Framewave) might be a worthwhile venture. They claim that their libraries check only feature flags, not the processor manufacturer. That said, technically speaking, what Intel did with IPP 7.0 isn't checking the manufacturer either, but when they decided to drop SSE3 from the 32-bit code paths, they knew full well that no AMD processors supported SSSE3 or SSE4.x. In other words, the move can be viewed as a loophole in the AMD settlement, in which they said they would no longer hinder performance. Here's a quote from one of their fine forum representatives about "older" systems. Check the bolded and underlined part, and bear in mind that at that time, in 2010, a brand new, top-of-the-line AMD processor had no SSE4.1 or 4.2 support.
Quote:
vladimir-dudnik (Intel)
Thu, 06/17/2010 - 09:33
Intel does not sell SSE3 processors anymore. I do not think there is any legal obligation to support end-of-lifed products for any company. Otherwise we just will not be able to deliver new technologies like Westmere or AVX processors (which is coming soon). The functionality you are looking for still be available in IPP 6.1 product. By the way, the performance oriented customers migrating to the newest platforms. I personally would not consider those who use old or even end of lifed platforms as performance oriented customers. If they do not care about performance why anyone else should do?
Regards, Vladimir
Arrogant jerk... Never heard of not having the money, I suppose. He also completely ignored (likely intentionally) that there were no AMD systems that could support SSE4.1 or 4.2 at the time. Again, arrogant jerk... Finally, yes, Intel was selling "SSE3" processors, as they are still selling them today. If they weren't, then your i7 processor wouldn't support SSE3, now would it?
BTW, you participated in that thread...
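Brian's point is that a dispatcher that never checks the vendor, and looks only at feature flags, can still leave a CPU on a slower path if the table has no entry for that CPU's feature level. Here is a toy Python model of that idea under stated assumptions. Only the t7 (SSE3) and w7 (SSE2) names come from the thread. `ssse3` and `generic` are hypothetical labels, and neither table reflects IPP's actual internal dispatch list. `select_path` is also hypothetical and is not `ippInit()`.

```python
# Hypothetical dispatch tables, most capable path first. Only "t7" (SSE3) and
# "w7" (SSE2) are names from the thread; "ssse3" and "generic" are made up.
TABLE_6_1_LIKE = [
    ("ssse3", {"SSE2", "SSE3", "SSSE3"}),
    ("t7", {"SSE2", "SSE3"}),
    ("w7", {"SSE2"}),
    ("generic", set()),
]
# The same table with the 32-bit SSE3 path removed, as described for IPP 7.0.
TABLE_7_0_LIKE = [entry for entry in TABLE_6_1_LIKE if entry[0] != "t7"]


def select_path(cpu_flags, table):
    """Return the first path whose required flags are a subset of cpu_flags.

    Only feature flags are consulted; the CPU vendor is never checked.
    """
    for name, required in table:
        if required <= cpu_flags:
            return name
    raise ValueError("table has no fallback path")


# Flags per the Coreinfo output earlier in the thread: SSE3 yes, SSSE3 no.
athlon64_k8 = {"SSE", "SSE2", "SSE3"}
core_i7 = {"SSE", "SSE2", "SSE3", "SSSE3", "SSE4.1", "SSE4.2"}

for label, flags in [("Athlon 64 (K8)", athlon64_k8), ("Core i7", core_i7)]:
    print(label, select_path(flags, TABLE_6_1_LIKE), select_path(flags, TABLE_7_0_LIKE))
# Athlon 64 (K8) t7 w7
# Core i7 ssse3 ssse3
```

With the SSE3 entry removed, the K8 falls from t7 to w7 while the i7 is unaffected, even though the selection never looks at the manufacturer. That is the "not a vendor check, but the same effect" situation Brian describes.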
Author: hvz [ Wed Mar 13, 2013 10:37 am ]
Post subject: Re: Stereo Tool 7.03 BETA
Well, in the 7.x versions they also stopped providing a fully optimized SSE2 version, and that is why I stayed with the latest 6.x version. At least on Intel systems, there is no measurable difference in performance (in the functions that I use!) between the SSE2, SSE3 and SSE4.x versions. That's why I decided to statically link only the SSE2 version, to reduce the executable file size (it more than doubles if I include all versions, even though I'm using only 2 functions from it).
But is the SSE3 version (without IPP) faster on your system?
All times are UTC+02:00