
The HPET bug: What it is and what it isn't

mat, 26.04.2018
Anandtech recently published an article pointing out problems in its CPU reviews caused by an enabled High Precision Event Timer (HPET) in Windows. Some Intel processors suffered from decreased performance in games and other benchmarks. Since then a lot of misconceptions have been going around. People are calling Intel out as cheaters when actually the opposite is true. We take this opportunity to have another look at the HPET bug and finally announce TimerBench, our Windows timer benchmark, to the public. It provides proof of these infamous HPET problems and helps you test the impact of your timer configuration on your system performance.


Download: TimerBench 1.5 (173 MB, Self-extracting EXE, CRC32: 34EC2A27)
Prerequisites: DirectX 11

Let's rip the bandaid off: Yes, there IS an HPET bug and no, Intel is NOT cheating. On the contrary, Intel suffers from this bug, and we are pretty sure that Anandtech is not the only review site that has published wrong results because of it. Evidently there are lots of people out there with low framerates on their setups and no explanation for them.

To understand this properly we need to go back several years, to when HPET was already diagnosed as hurting gaming performance in certain situations. Forum threads about the topic popped up here and there, and a lot of misinformation and make-believe went around back then. Yet the root of some of those performance problems was the same as it is today, just in a different form. HPET was always a costly way to retrieve an incrementing timestamp counter, especially when CPU cycles were on a tight budget. A lot of things that are now implemented in hardware had to be done in software in the early days. On top of that, CPUs had a single core, or games were simply not ready to use multiple threads efficiently. So using HPET took precious computing power away from games that were already CPU-bound, thereby hurting 3D performance.

Since then CPUs have become more powerful, and several dedicated CPU cores are the de facto standard nowadays. In addition, graphics APIs have finally adopted multithreading as well, all in favor of reducing the impact of CPU power in games. Today the bottleneck of 3D performance has shifted from the CPU to the GPU. That's especially true for high resolutions, AA, post-processing and the like. It is also the reason why CPU reviews are benched in FullHD without AA: otherwise you would barely see a difference.

In the course of this shift, graphics API makers, engine developers and game developers started to use lots of timestamp queries to measure performance, to provide framerate-independent functionality such as animations and other movement, and so on. And why not, as long as there is no performance impact at all?

This is where it gets tricky. When Skylake X and Kaby Lake X were released (too early) in summer 2017, we had the pleasure of reviewing the i7-7740X and i9-7900X. After a few days we came across the same thing that happened to Anandtech recently: the numbers for game benchmarks at lower resolutions didn't add up at all. It took some effort to finally trace the low framerates back to an enabled HPET. The following video shows our findings and should paint a clear picture of the impact of this issue:

The first encounter of the HPET bug came with Skylake X


We named it the "X299 HPET bug" because the anomaly only occurred on CPUs using the X299 chipset back then. Other CPUs were not affected at the time. We contacted Intel and they didn't even bother to comment. When we approached an Intel engineer at a press workshop, they even knew about our bug report but declined to let us show further proof. Anyway, soon after Coffee Lake S came along it became clear that all new Intel platforms are affected by the bug. By then we were pretty sure that this would blow up in Intel's face at some point in the future.

So what is happening here? As mentioned earlier, timers are used heavily these days. The go-to timer function for high precision in Windows is QueryPerformanceCounter() (QPC), a Win32 API function that accesses the most accurate timer available in the system. QPC is used by almost everybody, although it's known for its problems and inconsistencies (but that's another story). The big problem with QPC and its abstraction layer is that developers don't know which timer will actually be used on the customer's system. Or to put it more truthfully: they don't care. Under normal conditions QPC uses the TSC, a very fast timestamp counter inside the CPU. But when HPET is forcibly enabled by the user or an application, QPC prefers it over the TSC. Although a query for an HPET timestamp takes longer, it's more accurate as well. There is nothing wrong with that and there hasn't been for years. Then X299 showed up and a query for an HPET timestamp suddenly took 7 times longer! The number of possible HPET timer calls per second went from 1.4 million on Broadwell-E to merely 200,000 on Skylake X. Let's remember that this is a high-end platform that is also used for scientific purposes, where accuracy and performance are both very relevant.
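To get a rough feeling for how many timer calls per second a system can handle, the idea can be sketched in a few lines of Python. This is only an illustration and not how TimerBench measures it. It relies on the fact that Python's time.perf_counter() is backed by QueryPerformanceCounter() on Windows. The function name count_timer_calls and the default duration are our own choices for this sketch. Interpreter overhead means the absolute numbers will be lower than what native code achieves, but a very slow timer should still stand out.

```python
import time


def count_timer_calls(duration_s: float = 0.2) -> float:
    """Return the number of timer queries per second achieved in a tight loop.

    time.perf_counter() uses QueryPerformanceCounter() on Windows, so the
    result reflects whatever timer QPC is currently backed by.
    """
    calls = 0
    start = time.perf_counter()
    deadline = start + duration_s
    now = start
    # Query the timer back to back until the deadline has passed.
    while now < deadline:
        now = time.perf_counter()
        calls += 1
    elapsed = now - start
    return calls / elapsed


if __name__ == "__main__":
    rate = count_timer_calls(1.0)
    print(f"{rate:,.0f} timer calls per second")
```

Running it once with each timer configuration and comparing the two numbers gives a first hint, under the caveats above, of how expensive a single timer query is on the system.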

Additionally, word is going around that the slow HPET calls are a consequence of the Meltdown and Spectre fixes. This is NOT the case. We found problems with HPET latencies back in July 2017, long before these security flaws became public and any fixes were rolled out. Even though the Smeltdown fixes did not cause HPET to be slow, they introduced additional strain on the CPU that adds to an already existing CPU bottleneck.

In summary, the problem is a very slow implementation of the High Precision Event Timer on modern platforms, which developers use without care. Skylake X and Kaby Lake X are badly affected. An impact can also be shown on Threadripper and Coffee Lake, and to some degree on Ryzen as well. One could argue about whether slow functionality is a bug, but honestly, let's just call it the "HPET bug".

While the reduced theoretical number of HPET timer calls is quite self-explanatory, the impact of the slow HPET cannot be translated directly into game performance. It heavily depends on how the game/engine uses timer functions and on the combination of resolution, details and graphics card in place. So to trigger the bug you typically run your games at something like FullHD, maybe an older, less GPU-heavy game as well, and power it with an oversized graphics card. The HPET bug then shows on screen as a decreased average framerate and additional stuttering every now and then. Especially the latter is where the bug really kicks in. Due to horribly high frametimes it looks like the game freezes for a few milliseconds. With X299 this stuttering happens in the Windows UI as well. It starts in the final stages of booting with some mild flickering of the loading icon and can be seen in action when dragging windows around or when a window/control gets invalidated and refreshed. It doesn't always happen, but once you have seen it, it cannot be unseen. Bottom line: your expensive system will deliver an inadequate experience once HPET is enabled.
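A back-of-the-envelope calculation shows why the timer rate matters for frametimes. The sketch below takes the two HPET rates quoted above (1.4 million calls per second on Broadwell-E, 200,000 on Skylake X) and converts them into a cost per call and a cost per frame. The number of timer calls per frame is a purely hypothetical value chosen for illustration, and the calculation assumes the calls run one after another on the critical path, which a real engine may not do.

```python
def per_call_latency_us(calls_per_second: float) -> float:
    """Average cost of one timer call in microseconds."""
    return 1_000_000 / calls_per_second


def timer_overhead_ms(calls_per_frame: int, calls_per_second: float) -> float:
    """Time per frame spent in timer calls, in milliseconds."""
    return calls_per_frame * per_call_latency_us(calls_per_second) / 1000


# HPET rates quoted in the article.
BROADWELL_E_HPET = 1_400_000
SKYLAKE_X_HPET = 200_000

# Hypothetical workload, NOT a measured value from any game or engine.
CALLS_PER_FRAME = 500

if __name__ == "__main__":
    for name, rate in (("Broadwell-E", BROADWELL_E_HPET),
                       ("Skylake X", SKYLAKE_X_HPET)):
        latency = per_call_latency_us(rate)
        overhead = timer_overhead_ms(CALLS_PER_FRAME, rate)
        print(f"{name}: {latency:.2f} us per call, "
              f"{overhead:.2f} ms per frame for {CALLS_PER_FRAME} calls")
```

With these assumptions a single HPET query costs roughly 0.7 microseconds on Broadwell-E and 5 microseconds on Skylake X. At a target of 144 frames per second the whole frame has to finish in about 6.9 milliseconds, so a few hundred slow timer calls per frame already eat a noticeable share of that budget.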

Because the HPET bug can be difficult to spot, we have implemented a timer benchmark for Windows that sheds some light on your timer configuration and its performance. It's called TimerBench and mainly focuses on QPC because it's the de facto standard in Windows. There is a synthetic test that shows the maximum number of possible timer calls and a game test that analyzes the impact of your configured timer in 3D applications. It uses Unreal Engine 4 and DirectX 11, a popular combination for games.




We recommend the following process to figure out if your system is held back by your timer configuration:

  1. Choose the lowest resolution you are gaming with.
  2. Close other applications such as browsers, as they can have a negative impact when testing HPET.
  3. Start the benchmark.
  4. Switch the "QPC Mode" from HPET to TSC, or the other way round. A reboot will be necessary.
  5. Repeat steps 1 to 3 and compare the results. Use the "Open Result" button to load the results of previously recorded runs.
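
If you want to compare two runs by hand, the two numbers to look at are the average framerate and the worst frametime, since the bug shows up as both a lower average and occasional stutter. The sketch below assumes you have the frametimes of each run as a plain list of milliseconds. That input format, the function names and the sample values are all hypothetical and made up for illustration; they are not TimerBench's result format or real measurements.

```python
from statistics import mean


def summarize(frametimes_ms):
    """Return (average fps, worst frametime in ms) for one run."""
    avg_fps = 1000 / mean(frametimes_ms)
    worst_ms = max(frametimes_ms)
    return avg_fps, worst_ms


def relative_change_percent(baseline: float, other: float) -> float:
    """Change of `other` relative to `baseline`, in percent."""
    return (other - baseline) / baseline * 100


if __name__ == "__main__":
    # Made-up sample data, only here to show how the functions are used.
    tsc_run = [7.0, 7.2, 6.9, 7.1, 7.0]
    hpet_run = [9.5, 9.8, 30.0, 9.6, 9.7]

    tsc_fps, tsc_worst = summarize(tsc_run)
    hpet_fps, hpet_worst = summarize(hpet_run)

    print(f"TSC:  {tsc_fps:.1f} fps, worst frame {tsc_worst:.1f} ms")
    print(f"HPET: {hpet_fps:.1f} fps, worst frame {hpet_worst:.1f} ms")
    print(f"Average fps change with HPET: "
          f"{relative_change_percent(tsc_fps, hpet_fps):+.1f} %")
```

A clearly negative fps change together with a much higher worst frametime in the HPET run would match the symptoms described above.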

With that said, we leave you to benchmark your system's timer configuration, analyze the results and share them with the public. If you have any questions or problems with the benchmark, let us know in the comments or write to us at mat @ overclockers.at.