Stereo Tool
https://forums.stereotool.com/

Unit testing: Best-case vs. worst-case
https://forums.stereotool.com/viewtopic.php?t=4747
Page 1 of 1

Author:  Brian [ Fri Mar 29, 2013 7:35 am ]
Post subject:  Unit testing: Best-case vs. worst-case

I had put off posting this, but as I see development has begun again....

This is again from Agner Fog. Agner lives in Denmark, so perhaps he could be contacted (or visited?) for comment and/or assistance with the current design philosophy.

Again, I'm only trying to encourage a re-evaluation of the design/testing philosophy. I think the product is good, but it could be made better.

From http://www.agner.org/optimize/optimizing_cpp.pdf

Pages 161 and 162:

16.1 The pitfalls of unit-testing

It is common practice in software development to test each function or class separately. This unit-testing is necessary for verifying the functionality of an optimized function, but unfortunately a unit-test does not give complete information about the function's performance in terms of speed.

Assume that you have two different versions of a critical function and you want to find out which one is fastest. The typical way to test this is to make a small test program that calls the critical function many times with a suitable set of test data and measures how long it takes. The version that performs best under this unit-test may have a larger memory footprint than the alternative version. The penalty of cache misses is not seen in the unit-test because the total amount of code and data memory used by the test program is likely to be less than the cache size.

When the critical function is inserted in the final program, it is very likely that code cache and data cache are critical resources. Modern CPUs are so fast that the clock cycles spent on executing instructions are less likely to be a bottleneck than memory access and cache size. If this is the case then the optimal version of the critical function may be the one that takes longer time in the unit-test but has a smaller memory footprint.

If, for example, you want to find out whether it is advantageous to unroll a big loop, then you cannot rely on a unit-test that does not take cache effects into account.

You can calculate how much memory a function uses by looking at a link map or an assembly listing. Use the "generate map file" option for the linker. Both code cache use and data cache use can be critical. The branch target buffer is also a cache that can be critical. Therefore, the number of jumps, calls and branches in a function should also be considered.

A realistic performance test should include not only a single function or hot spot but also the innermost loop that includes the critical functions and hot spots. The test should be performed with a realistic set of data in order to get reliable results for branch mispredictions. The performance measurement should not include any part of the program that waits for user input. The time used for file input and output should be measured separately.

The fallacy of measuring performance by unit-testing is unfortunately very common. Even some of the best optimized function libraries available use excessive loop unrolling so that the memory footprint is unreasonably large.

Author:  Brian [ Fri Mar 29, 2013 7:45 am ]
Post subject:  Re: Unit testing: Best-case vs. worst-case

16.2 Worst-case testing

Most performance tests are done under the best-case conditions. All disturbing influences are removed, all resources are sufficient, and the caching conditions are optimal. Best-case testing is useful because it gives more reliable and reproducible results. If you want to compare the performance of two different implementations of the same algorithm, then you need to remove all disturbing influences in order to make the measurements as accurate and reproducible as possible.

However, there are cases where it is more relevant to test the performance under the worst-case conditions. For example, if you want to make sure that the response time to user input never exceeds one second, then you should test the response time under worst-case conditions.

Programs that produce streaming audio or video should also be tested under worst-case conditions in order to make sure that they always keep up with the expected real-time speed. If the computer system is too slow, then there will be occasional delays or glitches in the output, which is usually unacceptable.

Each of the following methods could possibly be relevant when testing worst-case performance:
  • The first time you activate a particular part of the program, it is likely to be slower than the subsequent times because of lazy loading of the code, cache misses and branch mispredictions.
  • Test the whole software package, including all runtime libraries and frameworks, rather than isolating a single function. Switch between different parts of the software package in order to increase the likelihood that certain parts of the program code are uncached or even swapped to disk.
  • Software that relies on network resources and servers should be tested on a network with heavy traffic and a server in full use rather than a dedicated test server.
  • Use large data files and databases with lots of data.
  • Use an old computer with a slow CPU, an insufficient amount of RAM, a lot of irrelevant software installed, a lot of background processes running, and a fragmented hard disk.
  • ↑ This is why I volunteered to profile.
  • Test with different brands of CPUs, different types of graphics cards, etc.
  • Use an antivirus program that scans all files on access.
  • Run multiple processes or threads simultaneously. If the microprocessor has hyperthreading, then try to run two threads in each processor core.
  • Try to allocate more RAM than there is, in order to force the swapping of memory to disk.
  • Provoke cache misses by making the code size or data used in the innermost loop bigger than the cache size. Alternatively, you may actively invalidate the cache. The operating system may have a function for this purpose, or you may use the _mm_clflush intrinsic function.
  • Provoke branch mispredictions by making the data more random than normal.
