rate(freq) depends on the system timer (GetTickCount() on Windows) being accurate on the scale of 1.0/freq seconds. On Windows this does not hold for frequencies much over 100 Hz, because GetTickCount() typically advances only every 10 to 16 ms. We need either a more accurate timing mechanism such as RDTSC, or an accumulator that will deliver correct amortized performance.
Sam Schulenburg points out another timing mechanism on Windows:
> the windows API supports the following functions:
> 1) QueryPerformanceCounter()
> 2) QueryPerformanceFrequency()
>
> These functions can give you micro second resolution on high speed mother
> boards. You can also use these functions to determine if the performance
> counters are available on your mother board.
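
As a rough way to check what resolution a given machine actually delivers, the sketch below inspects Python's `time.perf_counter()`, which CPython implements on Windows with the QueryPerformanceCounter()/QueryPerformanceFrequency() pair. The helper name `smallest_tick` is hypothetical and is not part of rate(); it only illustrates the measurement.

```python
import time

# Ask Python which OS facility backs perf_counter and what resolution it reports.
# On Windows the implementation is the QueryPerformance* API.
info = time.get_clock_info("perf_counter")
print("implementation:", info.implementation)
print("reported resolution (s):", info.resolution)


def smallest_tick(clock, samples=1000):
    """Return the smallest nonzero step seen between successive clock reads.

    `smallest_tick` is a hypothetical helper for illustration only.
    """
    best = float("inf")
    for _ in range(samples):
        t0 = clock()
        t1 = clock()
        while t1 == t0:  # spin until the clock visibly advances
            t1 = clock()
        best = min(best, t1 - t0)
    return best


print("observed tick (s):", smallest_tick(time.perf_counter))
```

Passing a coarser clock to the same helper gives a direct comparison against the high-resolution counter.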
A better counter only improves measurement, not waiting. Unless someone also finds a microsecond-resolution sleep on Windows, there are still two approaches to implementing rate():
1. Amortization. Call sleep() only every N calls to rate(), so that both the overhead and the inaccuracy are amortized away. Pro: Excess processor power is actually yielded to other programs. Con: The delay time will be "unsteady": most calls return immediately and an occasional call absorbs the whole accumulated delay, e.g. 0, 0, 0, 0, 0, 20 ms, 0, 0, 0, 0, 0, 20 ms, ...
2. Delay loop. When the frequency is low, call sleep(); when it is high, just poll the high-frequency timer of choice in a loop. Pro: Delay time will be consistent. Con: Excess processor power is frittered away in the delay loop instead of being yielded.
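
The amortization approach (option 1) could look roughly like the following. This is a minimal sketch, not the code in rate(): the class name `AmortizedRate`, the `min_sleep` threshold of 20 ms, and the injectable `clock`/`sleep` parameters are all assumptions made for illustration.

```python
import time


class AmortizedRate:
    """Hold a loop to `freq` iterations per second on average.

    Hypothetical sketch: the limiter tracks how far ahead of schedule the
    caller is and sleeps only once that lead reaches `min_sleep` seconds.
    """

    def __init__(self, freq, min_sleep=0.020,
                 clock=time.perf_counter, sleep=time.sleep):
        self.period = 1.0 / freq
        self.min_sleep = min_sleep
        self.clock = clock
        self.sleep = sleep
        self.deadline = None  # time at which the current call "should" return

    def __call__(self):
        now = self.clock()
        if self.deadline is None:
            self.deadline = now
        self.deadline += self.period
        ahead = self.deadline - now  # positive when the caller is running fast
        if ahead >= self.min_sleep:
            # Pay off the accumulated delay in one long, relatively accurate sleep.
            self.sleep(ahead)
        elif ahead < -self.min_sleep:
            # Fell far behind (e.g. a stall); drop the backlog instead of bursting.
            self.deadline = now
```

With `freq=1000` this returns immediately about twenty times and then sleeps for about 20 ms, which is exactly the "unsteady" pattern listed as the drawback of this option.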
rate() now uses a delay loop for delays of 10 ms or less, and the Windows clock function is now based on the QueryPerformance* API when available.
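
The switch between sleeping and spinning can be sketched as below. This is an illustration under stated assumptions, not the actual rate() source: the names `delay`, `Rate`, and `BUSY_WAIT_THRESHOLD` are hypothetical, and `time.perf_counter()` stands in for the high-resolution clock.

```python
import time

BUSY_WAIT_THRESHOLD = 0.010  # seconds; mirrors the 10 ms cutoff described above


def delay(seconds, clock=time.perf_counter, sleep=time.sleep):
    """Wait for `seconds`: spin on the clock for short waits, sleep for long ones."""
    if seconds <= 0:
        return
    if seconds <= BUSY_WAIT_THRESHOLD:
        end = clock() + seconds
        while clock() < end:  # delay loop: consistent timing, but burns CPU
            pass
    else:
        sleep(seconds)  # long wait: yield the processor to other programs


class Rate:
    """Hypothetical limiter: each call returns no sooner than 1/freq seconds
    after the previous call returned."""

    def __init__(self, clock=time.perf_counter, sleep=time.sleep):
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def __call__(self, freq):
        target = self.last + 1.0 / freq
        delay(target - self.clock(), self.clock, self.sleep)
        self.last = self.clock()
```

Calling `Rate()(1000)` inside a loop spins for about 1 ms per iteration, while `Rate()(10)` falls through to `sleep()` and inherits whatever accuracy the operating system's sleep provides.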
unit_test\test_rate.py measures the accuracy of rate().
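
For readers without the source tree, an accuracy measurement of this kind might be structured as follows. This is not the content of test_rate.py; `measure_rate` and `make_spin_limiter` are hypothetical names, and the busy-wait limiter is only a stand-in for the real rate().

```python
import time


def measure_rate(limiter, freq, calls=200, clock=time.perf_counter):
    """Call limiter(freq) `calls` times; return (achieved_hz, worst_interval_s)."""
    stamps = [clock()]
    for _ in range(calls):
        limiter(freq)
        stamps.append(clock())
    intervals = [b - a for a, b in zip(stamps, stamps[1:])]
    achieved = calls / (stamps[-1] - stamps[0])
    return achieved, max(intervals)


def make_spin_limiter(clock=time.perf_counter):
    """Stand-in for rate(): busy-wait until 1/freq seconds after the last return."""
    last = clock()

    def limiter(freq):
        nonlocal last
        target = last + 1.0 / freq
        while clock() < target:
            pass
        last = clock()

    return limiter


if __name__ == "__main__":
    hz, worst = measure_rate(make_spin_limiter(), 500)
    print("requested 500 Hz, achieved %.1f Hz, worst interval %.3f ms"
          % (hz, worst * 1000.0))
```

Reporting the worst interval alongside the average matters here, because an amortizing limiter can hit the requested average while individual intervals vary widely.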