Quote:
I mean I don't see any pattern, but it does use multicores *sometimes*
On the Intel machine, it's freaking persistent

Odd. I'm processing things in 2 threads, so I would assume that Windows would spread them over multiple cores. No idea why it doesn't - maybe some power management feature? If the CPU load is not that high on the one core that's in use, it does not matter much for overall performance anyway.
What you could try is letting it write to disk, so that it will load at least one core up to 100%. If you still see a difference in behavior (and performance!) when you do that, something is wrong.
BTW: I'm measuring a 40% increase, which is much more than I had expected - but remember that only the clipper and HALF of the declipper are multicore now; the rest of the processing (including half of the clipper and 3/4 of the declipper!) is still running in the same thread. So actually I would still expect a very uneven spread.
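To put that 40% in perspective, here is a rough back-of-the-envelope sketch using Amdahl's law. It assumes that "40% increase" means 1.4x throughput, that the multicore part splits evenly over 2 threads, and that there is no threading overhead. The function names are made up for illustration and have nothing to do with the actual processing code.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Overall speedup when only parallel_fraction of the work is spread over threads."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / threads)


def parallel_fraction_for(speedup: float, threads: int) -> float:
    """Invert Amdahl's law: the fraction of work that must be parallel for a given speedup."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / threads)


if __name__ == "__main__":
    # Assumption: a 40% increase corresponds to a 1.4x speedup on 2 threads.
    p = parallel_fraction_for(1.4, 2)
    print(f"fraction of work running on both threads: {p:.2f}")
    print(f"check, speedup for that fraction: {amdahl_speedup(p, 2):.2f}")
```

Under those assumptions, a bit more than half of the total work would have to be running in both threads, and the remainder would still sit on one core, so an uneven spread is what you'd expect to see.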
Oh, and are you using hyperthreading? If so, things may be radically different depending on whether the 2 logical cores that get selected are actually the same physical core or not!