Woke up to a nice, lovely weekend noon!
2 new betas in repository, yaaahooo!!!
BTW, the effect of this whole AMD-Intel lawsuit can be seen in Intel C++ Studio XE 2011 where the dispatcher has been done away with.
Quote:
Phoenix
About AMD patches: I've tried what would happen if I replace the search for 'GenuineIntel' on my PC by 'GanuineIntel', so it would fail (basically I opened the executable file and changed every occurrence of 'Genu' that I could find). I cannot measure any difference in performance! Could you measure a performance difference in your case (i.e. lower CPU load or something like that)?
I'll explain the patching part, but Hans, I would also like to know from you to what extent SSE2 can improve performance on a modern Intel or AMD quad-core system with respect to Stereo Tool (which performs many floating-point operations per second). The performance gain that I saw was on a 2-core VM running on my Phenom II X4 965BE. The actual machine did not show any noticeable performance change. So someone who has a relatively old AMD dual-core (or maybe single-core) CPU is a good candidate for testing the patched version. Before patching beta31D, the buffer in the VM needed to be at 20ms for absolutely glitch-free sound. Now I can set it to 5.2ms (in fact, I disabled the Superfetch service and got it to 5.0) with ASIO4ALL at 128 samples. I have yet to test the betas after 31D, and from the reviews I gather that there are improvements. I will let you know how these perform on the aforesaid setup. Other than that, I cannot say anything conclusive about a 'Performance Increment'.
Now the patching part:
There are 2 approaches:
i. Change the vendor string to AMD's vendor string, provided the string size matches and it actually corresponds to the vendor string of the target CPU. Ideally, changing it to 'GanuineIntel' would make the dispatcher take the generic IA-32 path on any machine (which is why I raised the question about SSE2 having an effect, since you said you cannot measure any change on your quad-core Intel). So, with the first approach on AMD systems, 'GenuineIntel' becomes 'AuthenticAMD'. This also means that the truncated strings translate as 'Genu' -> 'Auth', 'ineI' -> 'enti' and 'ntel' -> 'cAMD' throughout the file. In Stereo Tool there are 6 such instances. In 'TapeRestoreLive' there were 2 (if I remember correctly).
ii. Make the compare instruction evaluate to 'True' so that the 'intel bit' gets the value 1. This is what the dispatcher looks like:
Quote:
mov eax, [ebp][-0008]
cmp eax, 0756E6547 ;"uneG" ; Checking on "Genu"
jne not_intel ; if it doesn’t equal, switching for not_intel
mov eax, [ebp][-0010]
cmp eax, 049656E69 ;"Ieni" ; Checking on "ineI"
jne not_intel ; doesn’t equal - not_intel
mov eax, [ebp][-0014]
cmp eax, 06C65746E ;"letn" ; Checking on "ntel"
jne not_intel ; doesn’t equal - not_intel
mov edx, 000000001 ; the intel byte
jmps next
not_intel:
xor edx, edx ; and here we have 0 for all non-Intel CPUs
next:
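The hex constants in the listing are simply the vendor string cut into three 4-byte pieces and read as little-endian dwords, which is why "Genu" shows up as "uneG". A small Python sketch to reproduce them (the helper name vendor_dwords is my own, purely for illustration):

```python
import struct

def vendor_dwords(vendor):
    """Split a 12-character CPUID vendor string into three little-endian dwords."""
    raw = vendor.encode("ascii")
    if len(raw) != 12:
        raise ValueError("CPUID vendor strings are exactly 12 bytes long")
    # "<3I" = three unsigned 32-bit integers, little-endian
    return list(struct.unpack("<3I", raw))

if __name__ == "__main__":
    for name in ("GenuineIntel", "AuthenticAMD"):
        print(name, ["0x%08X" % d for d in vendor_dwords(name)])
```

The same helper gives the dwords you would search for (or write) when doing the AMD string swap from approach i.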
The binary file is scanned for compare instructions like 81 fa 47 65 6e 75 (cmp edx, 0756E6547), which are replaced by instructions like test edx, 000000000 (f7 c2 00 00 00 00). Testing against zero always sets the zero flag, so the jne that follows is never taken and the check evaluates to 'True' no matter what the vendor string is. Similar checks on EAX, EBX, ECX, EBP, ESI, EDI, and [EBP] are handled the same way.
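Both approaches can be sketched in a few lines of Python working on the raw bytes of the file. This is only a rough illustration under my own assumptions, not the actual patcher: the function names are made up, only the EDX form of the compare is handled, and nothing here checks whether a match really belongs to the dispatcher.

```python
# Approach i: swap the 4-byte vendor fragments (Intel -> AMD).
FRAGMENTS = {b"Genu": b"Auth", b"ineI": b"enti", b"ntel": b"cAMD"}

# Approach ii: cmp edx, imm32 (81 fa xx xx xx xx) -> test edx, 0 (f7 c2 00 00 00 00)
CMP_EDX = b"\x81\xfa"
TEST_EDX_ZERO = b"\xf7\xc2\x00\x00\x00\x00"
VENDOR_IMMS = (b"Genu", b"ineI", b"ntel")

def patch_vendor_strings(data):
    """Replace every vendor fragment; returns (patched bytes, number of hits)."""
    count = 0
    for old, new in FRAGMENTS.items():
        count += data.count(old)
        data = data.replace(old, new)  # same length, so file offsets stay intact
    return data, count

def patch_cmp_edx(data):
    """Turn 'cmp edx, <vendor dword>' into 'test edx, 0'; returns (bytes, hits)."""
    out = bytearray(data)
    count = 0
    for imm in VENDOR_IMMS:
        pattern = CMP_EDX + imm
        pos = out.find(pattern)
        while pos != -1:
            out[pos:pos + 6] = TEST_EDX_ZERO  # both encodings are 6 bytes long
            count += 1
            pos = out.find(pattern, pos + 6)
    return bytes(out), count

if __name__ == "__main__":
    sample = b"\x90" + CMP_EDX + b"Genu" + b"\x75\x10"  # nop; cmp edx,"Genu"; jne
    patched, hits = patch_cmp_edx(sample)
    print(hits, patched.hex())
```

One caveat with a blind replace in approach i: any ordinary text containing 'ntel' (the word 'Intel', for example) gets rewritten too, so it is safer to look at each hit before patching. Always work on a copy of the executable.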
So, to be doubly sure: if you release a beta version compiled with the option /QxSSE2 only (no /arch:SSE2), it would run only on Intel CPUs supporting the SSE2 instruction set, and would throw an error or just not run on AMD. But if you patch it using either of the 2 methods, the same binary would run on AMD systems. This is a sure-shot check that the executable is leveraging the SSE2 instruction set.
Let me know your thoughts.
