All times are UTC+02:00




 Post subject: About performance
PostPosted: Fri Mar 01, 2013 10:54 am 

Joined: Sun Dec 12, 2010 2:26 pm
Posts: 885
I've decided to split off the discussion about performance into another thread so as to not continually clutter the other one.

To summarize what I've said over the past few hours:

1) VTune and Task Manager may be reporting on different values, so it may be an apples-to-oranges comparison.

2) Task Manager, as it ships, has 5 update speed settings: 4 that show updates and 1 that is paused. The issue is with the 4 that show updates. The shipping default is 1 second, but that default is NOT attainable once the update speed has been changed AT ANY TIME, unless you edit the Windows Registry. Without editing the Registry, "High" gives you 0.5 seconds, "Normal" gives you 2.0 seconds, and "Low" gives you 4.0 seconds. If a 1-second sample is taken in VTune while Task Manager is not set to "High" *AND* the update speed has been changed in the past *AND* the Registry has not been edited, then the update interval is either 2.0 or 4.0 seconds. That is longer than the VTune sample, so Task Manager may indeed show "00" because it completely missed the VTune activity.

3) To counter / mitigate the claim that Task Manager is flawed, I also use Process Explorer. Process Explorer is developed by Mark Russinovich, who is widely recognized as the foremost expert on such matters involving the Windows kernel. Process Explorer is more accurate than Task Manager. Process Explorer is reporting the same values as Task Manager, where "same" is either exactly the same or within 3-5 points. I have thus far not sent Mark an email to ask his opinion on this matter, but I will if it is needed.

**********

New discovery:

I had previously questioned Bojcha about his assertion that CPU load was "good" on his processor. His processor is based on the AMD K8 architecture, as is mine.

As I have not received a response, I decided to tinker around. I noticed that his processor was showing clocked at 3.36 GHz. That is a pretty impressive overclock. Well, that got me thinking... My processor, running at 2.75 GHz, is also overclocked. Well, what if I dropped the clock speed?

So, I dropped from 2750 to 2250. Testing the same preset at the lower clock speed gave me a 25-30 point increase in CPU load.

So, if I suppose that performance increases linearly with clock speed, then Bojcha's system likely shows at least:

3360 - 2750 = 610
2750 - 2250 = 500
610 / 500 = 1.22
25 x 1.22 = 30.5, rounding down to 30.

So, if I then take my reported load of about 65-70% without using the declipper and bring it down by 30, that's 35-40, and I too would be OK with that.
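The arithmetic above can be sketched in a few lines of Python. This is only a restatement of the linear assumption made in this post; the function name is made up for illustration, and the 25-point figure is the low end of the 25-30 point increase reported above.

```python
def linear_load_drop(f_ref_mhz, f_low_mhz, f_target_mhz, points_gained):
    """Extrapolate a load change linearly with clock speed.

    Assumption (from the post, not a measured law): every MHz of clock
    speed is worth the same number of CPU load points.
    """
    points_per_mhz = points_gained / (f_ref_mhz - f_low_mhz)
    return points_per_mhz * (f_target_mhz - f_ref_mhz)

# 2750 -> 2250 MHz cost about 25 points, so estimate 2750 -> 3360 MHz.
drop = linear_load_drop(2750, 2250, 3360, 25)
print(round(drop, 1))                    # 30.5

# Apply the rounded-down drop to the reported 65-70% load.
print(65 - int(drop), 70 - int(drop))    # 35 40
```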

So, a possible (likely) conclusion is that Bojcha's system is able to plow through the inefficiencies better due to brute force (higher clock rate). Because the L2 cache and memory controller are on-die and run at full speed, CPU-to-cache speed and bandwidth increase with the clock. Other things would still be limited to whatever speed the HyperTransport bus is running at. The default for mine is 1 GHz; it too can be overclocked, though not usually by a whole lot. I started getting errors when I went beyond a 50 MHz overclock on that bus. The default for Bojcha's might be higher, depending on his specific motherboard and which HyperTransport revision it has.

At any rate, Bojcha's system should NOT be a performance baseline for K8 platforms. It is among the highest clock speeds I've seen for K8. Usually 3.0-3.2 GHz is the most. Appropriate clock speeds would be 2.4-2.9 GHz.


 Post subject: Re: About performance
PostPosted: Fri Mar 01, 2013 12:44 pm 
Site Admin

Joined: Mon Mar 17, 2008 1:40 am
Posts: 11213
Task manager uses some polling mechanism that's probably linked to process switches which happen every 15 ms (approx.) in Windows. So if some process becomes active directly after that every time (for example because it gets a slot and finishes before the slot time, 15 ms, is finished), it will not be detected. Note: I'm just guessing what happens here... Thing is that for reliable measurements you would have to do things that use more CPU power. So it's very well possible that Process Explorer uses the same measurement - I don't know.
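A toy model of that guess, sketched in Python. It assumes a sampler that looks only at clock-tick instants (about 15.6 ms apart) and charges the whole tick to whatever is running at that moment. This is a simplification for illustration, not how Windows or Task Manager actually accounts CPU time, and all names and timings are made up.

```python
TICK_US = 15_600  # assumed clock tick of about 15.6 ms, in microseconds

def simulate(period_us, offset_us, burst_us, ticks=1000):
    """Return (actual_percent, sampled_percent) for a periodic process.

    The process wakes every period_us (first at offset_us) and runs for
    burst_us. The sampler looks only at tick instants and charges the
    whole tick to the process if it is running at that instant.
    """
    charged = 0
    for k in range(1, ticks + 1):
        # Time since the process last woke up, at the instant of tick k.
        phase = (k * TICK_US - offset_us) % period_us
        if phase < burst_us:
            charged += 1
    actual = 100.0 * burst_us / period_us
    sampled = 100.0 * charged / ticks
    return actual, sampled

# Wakes 1 ms after every tick and finishes within 5 ms: never sampled.
locked_actual, locked_sampled = simulate(TICK_US, 1_000, 5_000)
# Same burst, slightly different period: the sampler now catches it.
drift_actual, drift_sampled = simulate(15_000, 1_000, 5_000)

print(f"locked: actual {locked_actual:.1f}%, sampled {locked_sampled:.1f}%")
print(f"drift:  actual {drift_actual:.1f}%, sampled {drift_sampled:.1f}%")
```

In the first case the process uses roughly a third of the CPU yet is reported as 0%, because it is never running at a tick instant. A small change in timing is enough to make it visible again.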

What I do know is this: 0% CPU usage is just not possible. Not even on my i7 system.

What I also know: I've seen in other cases where Task Manager reported a CPU load of 0%, that there was still a considerable slowdown in other programs. In fact, as soon as I started something else, it would suddenly report a CPU load of 30% or so. Also, I have seen cases where just changing the timings of different parts of the processing, WITHOUT doing anything different, also caused jumps between 0 and 30%. Which seems to confirm what I said above.


Your Bojcha calculation is incorrect: You need to use Needed processing power / CPU speed as a measure, so CPU loads are relative to 1 / CPU speed. In your calculation if Bojcha's system was another 600 MHz faster he would get a CPU load of 0. Regardless of that, many other things also have effects (for example memory speed, but also memory layout: Place 2 memory chips in parallel and you'll get twice the speed. If you place them in the wrong banks they will still work but not at twice the speed. This is also why it's better to have 2 smaller chips than one big one.)
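A small Python sketch of the difference between the two models. It takes the 65% load at 2750 MHz and the 25 points per 500 MHz figure from the first post as inputs; the function names are hypothetical, and both models ignore the memory effects mentioned above.

```python
F_REF_MHZ = 2750   # clock at which the load was measured
LOAD_REF = 65.0    # reported load at that clock, in percent

def linear_load(f_mhz, points_per_mhz=25 / 500):
    """Linear extrapolation: a fixed number of load points per MHz."""
    return LOAD_REF - points_per_mhz * (f_mhz - F_REF_MHZ)

def inverse_load(f_mhz):
    """Load relative to 1 / CPU speed: same work, spread over more cycles."""
    return LOAD_REF * F_REF_MHZ / f_mhz

# Compare both models in 600 MHz steps above the measured clock.
for f_mhz in (2750, 3360, 3960, 4560):
    print(f_mhz, round(linear_load(f_mhz), 1), round(inverse_load(f_mhz), 1))
```

The linear column eventually drops below zero as the clock rises, which no real load can do. The inverse column keeps shrinking but always stays positive.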


So I would propose to forget about this type of "details" (you probably don't agree that they are details), the only thing that's important here is if you see the CPU load go up or down. With all the optimizations, I would expect a preset with 5 MB bands, no lookahead, no burst protection, to run roughly at the same speed as that same preset with the old multiband IN THE LAST RELEASED VERSION - so before I did all the optimizations. The new MB *does* take more than the old one, but overall many things have been improved and the total CPU load, with these settings, should at least be in the same region as that of version 7.03. But, please don't use Task Manager to check... Instead, my current best measurement method is to load the DSP plugin in Winamp, select File Writer Output Plugin (out_filewrite.dll), set the Winamp priority to REALTIME in Task Manager, then run the program for exactly 1 minute and check how much audio it processed. I usually set the screen drawing speed to slow (about 5%, you might need to set it a bit higher) and look at it during the measurement to check that there are no hiccups; if there are, I do the measurement again. Also repeat each measurement a few times and use the highest result you saw.


 Post subject: Re: About performance
PostPosted: Fri Mar 01, 2013 10:04 pm 

Joined: Sun Dec 12, 2010 2:26 pm
Posts: 885
Quote:
Task manager uses some polling mechanism that's probably linked to process switches which happen every 15 ms (approx.) in Windows. So if some process becomes active directly after that every time (for example because it gets a slot and finishes before the slot time, 15 ms, is finished), it will not be detected. Note: I'm just guessing what happens here... Thing is that for reliable measurements you would have to do things that use more CPU power. So it's very well possible that Process Explorer uses the same measurement - I don't know.
Yes, and no. ProcExp uses that *AND* other metrics that are available that Task Manager does not use.

I see I provided a link, but you didn't read it, instead "guessing"... :(

http://www.techspot.com/community/topic ... er.172232/

It talks about the 15.6 ms thing, but it also mentions:
Quote:
Finally, Task Manager does not account for CPU time spent servicing interrupts or deferred procedure calls (DPCs), incorrectly including that time with the System Idle Process.

Procexp represents CPU usage more accurately than does Task Manager. First, Procexp shows per-process CPU utilization percentages rounded to a resolution of two decimal places by default instead of to an integer. Second, Procexp tracks the time spent servicing interrupts and DPCs and displays them separately from the Idle process. Finally, Procexp uses additional system metrics so that processes consuming small amounts of CPU can be identified and, when possible, provide a more accurate account of actual CPU consumption. Different metrics are available on Windows XP, Windows Vista, and Windows 7 and their corresponding server versions. Procexp takes advantage of whatever is available to report the most accurate measures possible.
Quote:
What I do know is this: 0% CPU usage is just not possible. Not even on my i7 system.
No, it's not, but what you likely don't know is that 0.9% gets rounded *DOWN* to 00. Neither did I, until I started investigating what you're claiming. If I had "guessed" instead of investigating, I would've believed that 0.5-0.9 would've been rounded UP.
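The difference between the two display behaviors, as a short Python sketch. It only mirrors what is described in this post (a two-digit display that drops the fraction); it is not Task Manager's actual code, and the function names are invented.

```python
import math

def shown_truncated(load_percent):
    """Two-digit display that drops the fraction (assumed behavior)."""
    return f"{math.floor(load_percent):02d}"

def shown_rounded(load_percent):
    """Two-digit display that rounds half up, for comparison."""
    return f"{math.floor(load_percent + 0.5):02d}"

# Loads just under 1% vanish when truncated but not when rounded.
for load in (0.4, 0.5, 0.9, 1.2):
    print(load, shown_truncated(load), shown_rounded(load))
```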

Bottom line here is you've got some built-in paranoia about Task Manager. Some of it is justified, but most of it? I don't know. At any rate, Process Explorer tracks other things, and I have been giving you values presented in it. As I said, if you remain unconvinced, I, or you, could email Mark Russinovich. Getting in contact with the author is probably better than guessing, I guess... ;)
Quote:
You need to use Needed processing power / CPU speed as a measure, so CPU loads are relative to 1 / CPU speed. In your calculation if Bojcha's system was another 600 MHz faster he would get a CPU load of 0.
And this is different from your i7 showing 0% HOW?
Quote:
So I would propose to forget about this type of "details" (you probably don't agree that they are details)
They *ARE* details, but they seem to be details that you are downplaying.

Bojcha has your ear. That's as obvious as can be. If he tells you his system is performing just fine, well, that's good, but you need to put a weighting on that information because his system is clocked at almost the fastest the K8 microarchitecture will allow. As such, it is NOT a valid system to use as a performance baseline for this platform, period. It would be a "best case scenario", and nothing more.

Consider the exchange between Bojcha and gpagliaroli, where Bojcha is going "what problem?", and the explanation given back to Bojcha. If you don't want to believe me, have Bojcha clock back down into the 2.8-2.9 range of what his processor is supposed to be and then have him report what he sees. It will not handle things as well.

As a side note though, it does speak to what I was saying about K8 being a pretty solid architecture - vastly superior to Netburst (Pentium 4), and minimally competitive with Core on the low end.
Quote:
With all the optimizations, I would expect a preset with 5 MB bands, no lookahead, no burst protection
When I begged for the unnecessary iterations to be removed, you didn't list all these stipulations. All you told me was "new multiband with 5 bands should be the same or less". No stipulations.

As for the filewrite thing with Winamp, there's a problem with that method that I don't think you understand. When you start to play a track in Winamp, Winamp looks at the file and gathers information from the tags. Every time you start a track, there is a period of high CPU usage that takes, on my system, 1-2 seconds to go away. When you try to write out to a file, you add on the time to actually create the file, so you've got the tag stuff and the file creation all going on. Further, you have to specify the bitrate on an MP3, and I can choose anything from 18 to 320 stereo.

So, if I go through all these hoops, and it ends up showing roughly the same as Process Explorer, will we be able to move on?


 Post subject: Re: About performance
PostPosted: Fri Mar 01, 2013 11:41 pm 
Site Admin

Joined: Mon Mar 17, 2008 1:40 am
Posts: 11213
Ok, then Process Explorer should be a lot better - please remember that for example ASIO and potentially also Winamp threading can very well be DPC etc based which might explain the low values that sometimes occur.


0% is not possible because there's quite a lot of stuff going on that just takes a lot more. Like I said, I've seen jumps from 0 to 30% by changing just a single line of code or by changing some timings; a real change that large is just not possible. I don't know - based on what you write this should not be the case in Process Explorer. Good. So then Process Explorer should never display 0.

Same for what you say about Bojcha, btw, ok fine, if his CPU would be another 600 MHz faster (above the point where it reaches 0%) it would be -30%, yes a negative CPU load, in your calculation. Does that convince you that the calculation is wrong?


Then to the point: In the end only one thing matters: How much audio can your system process in a given amount of time. On a multicore system, looking at CPU load is useless for that since the processing is spread over multiple cores, so the only way to determine this is by letting it run at maximum speed and measuring how much it processes in a given amount of time. If you have a reliable means of measuring CPU usage, on a single core system WITHOUT HYPERTHREADING (!) you can do the same thing by looking at CPU load, but there's no reason why the method I'm describing wouldn't work and at least you rule out any measurement issues in Task Manager etc.


About the MP3 writing: I've tried measuring 1, 2 and 3 seconds, with only a few filters enabled. Result:
1 second: 4 seconds processed
2 seconds: 9 seconds processed
3 seconds: 14 seconds processed.
So this startup thing you mention takes - on my system anyway - at most about 0.2 seconds. On a total of one minute this is negligible. Also, I don't even know what exact format I'm writing, just the default I guess, but if I let it run without processing it encodes one minute of audio in 2-4 seconds. If you really want to you can compensate the measurements for this but again, compared to what the processing does this is negligible. Especially since I'm mainly interested in changes, not so much in absolute numbers (in your case that's a bit different since you want to see if it can run on your system).
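The 0.2 second figure follows from the three measurements above if the processing rate is assumed constant after startup. A minimal Python sketch of that estimate (the variable names are illustrative only):

```python
# (wall-clock seconds measured, seconds of audio processed), from the list above
measurements = [(1, 4), (2, 9), (3, 14)]

(t_first, a_first), (t_last, a_last) = measurements[0], measurements[-1]

# Steady-state rate: audio seconds processed per wall-clock second.
rate = (a_last - a_first) / (t_last - t_first)

# Wall-clock time lost before processing started, assuming a constant rate.
startup = t_first - a_first / rate

print(rate, round(startup, 2))             # 5.0 0.2
print(round(100 * startup / 60, 2))        # share of a 1-minute run, in percent
```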

About the specific MB settings: When I told you that, all those things I'm saying should be turned off weren't there yet.


 Post subject: Re: About performance
PostPosted: Sat Mar 02, 2013 1:08 am 

Joined: Sun Dec 12, 2010 2:26 pm
Posts: 885
Quote:
Ok, then Process Explorer should be a lot better - please remember that for example ASIO and potentially also Winamp threading can very well be DPC etc based which might explain the low values that sometimes occur.
The DPC stuff is mentioned in what I quoted before, but I have found this video presentation done at Microsoft TechEd 2010 about Process Explorer and Process Monitor.

http://channel9.msdn.com/Events/TechEd/ ... 010/WCL314

At about 16 minutes in, there is discussion about symbol files. If you watch this, and truly watch it, then it's going to start clicking with you why I'm asking for the debug files. The "aha" moment should be at about 17m 43s. Immediately after that he starts talking about interrupts and DPC calls. Not long after that he talks about something showing zero, Internet Explorer. What you have to do is look at Context Switches and the Context Switch Delta.

At any rate, Process Explorer is one tool in the entire toolbox. I had forgotten about Process Monitor. Using Process Monitor was how we were able to determine that our software was opening and closing a file 4 times per second, and thus slowing performance due to the I/O overhead. We traced what was doing it and altered the behavior to once per second.

What I believe is the final tool that will give me enough information is AMD Code Analyst, AMD's "VTune". Again, I'm hindered by the lack of symbolic debug information.
Quote:
Same for what you say about Bojcha, btw, ok fine, if his CPU would be another 600 MHz faster (above the point where it reaches 0%) it would be -30%, yes a negative CPU load, in your calculation. Does that convince you that the calculation is wrong?
No, it does not, because there are the concepts of minima and maxima, or Greatest / Least element, given that the CPU load values are in a closed (and I think bounded) interval of positive non-zero values. Zero would be the minimum, and 100 would be the maximum, and the set would be non-inclusive, if I'm remembering correctly.

The only part about my calculation that I hold in suspect is if it is linear, which is what you might be trying to say. In response to that, even if there is a curve, making it non-linear, there is absolutely no way that you could know at what point the slope decreases and by how much, without testing for it.
Quote:
If you have a reliable means of measuring CPU usage, on a single core system WITHOUT HYPERTHREADING (!) you can do the same thing by looking at CPU load, but there's no reason why the method I'm describing wouldn't work and at least you rule out any measurement issues in Task Manager etc.
BINGO! Remember, I'm on a single core system WITHOUT HYPERTHREADING. For me, all this jumping through hoops is simply to placate you and your mistrust of Task Manager, in response to which I have explained that I'm using Process Explorer. You are unfamiliar with Process Explorer, and are guessing about what it can or cannot do without using it. I've used it and many other tools that have yielded successful diagnostics of performance problems.


 Post subject: Re: About performance
PostPosted: Sat Mar 02, 2013 3:19 am 

Joined: Sun Dec 12, 2010 2:26 pm
Posts: 885
I found another clip from the same TechEd event:

http://channel9.msdn.com/Events/TechEd/ ... 010/WCL315

Mark Russinovich - "Case of the Unexplained"

In the beginning of the presentation, he explains that he is a Technical Fellow at Microsoft. He explains that a Technical Fellow is the highest technical position in Microsoft and is equivalent to a Vice-President in the company structure.

This video shows how to use Process Explorer, including looking into stacks, looking at non-obvious things, etc...

As I also said, ProcExp is not the only tool in the toolbox.

