
Intel's XTU analyzed

mat, 02.01.2019
Intel's XTU benchmark is one of the most famous CPU benchmarks out there. HWBOT has registered well over 800,000 results; no other benchmark comes even close to that number. But what exactly does XTU test? Is it well implemented, safe against cheating, and reliable? To answer these questions I am going to dive deep into the reverse-engineered source code of XTU and reveal inside information, tweaks and various attack vectors.

Before we get to it I'd like to make clear that I will not provide any helper applications to cheat XTU. This article is for educational purposes only, although I will uncover a few legit tweaks and methods to experiment with that could result in an increase of XTU scores.

Just Prime95



When analyzing the security and reliability of an application you have to see things through the eyes of an attacker. For my process it always helps to get the big picture first before reverse engineering anything. Sysinternals' Process Monitor is a good way to start; set a filter for "PerfTune.exe" to tune out all other applications:


The selected row shows the creation of a file called p95-bench(32-bit).exe or p95-bench(64-bit).exe, depending on the OS and detected hardware. This file is the actual benchmark; it is called 20 times over the whole run, once for each movement of the blue progress bar. It can be found in a temporary folder called: C:\ProgramData\Intel\Intel Extreme Tuning Utility\Temp
The ProgramData folder and the temporary benchmark executables are marked as "hidden", so be sure to enable the setting to show hidden files in Windows Explorer to view them. After the benchmark run these files will be deleted again.

So "p95-bench", huh? Could this be Prime95? Let's grab the file during a benchmark run and have a look at it. The file's properties show that it was digitally signed to prevent modification. But that doesn't stop us from loading it into a reverse engineering tool (OllyDbg or IDA will do). We don't even need to reverse anything at this point; we will just have a look at the strings inside the benchmark executable:


A search for "mersenne" reveals that this is indeed the well-known Prime95 at work here, or at least a very basic command line version of it. That immediately raises the question of why this benchmark is only compatible with Intel CPUs. Prime95 in its latest versions even brings rudimentary Zen support to the table, so the benchmark itself can't be the real reason. I will dive into that later on, but let's analyze the executable first. By searching through the strings of the application we can also find that the Prime95 version used by Intel is "27.7", one of the first to make use of AVX instructions. The application's strings also reveal that most of the configuration options defined by prime.ini and local.ini are still supported. Good to know.

Knuckle deep inside the borderline



Now it's time to see some code. Since we know that Prime95 27.7 is used and that it's actually open source, this is almost too easy. By guessing the filename on mersenne.org's FTP we can download the source code to assist us with the reverse engineering process: p95v277.source.zip

Don't worry, I won't bother you with learning reverse engineering here (we will save that up for another article). But I do want to give you a quick look at what you would be able to see. As an example, these are among the last assembler instructions the benchmark executes:


Looking at the executable's instructions without debugging is called static analysis. Although it gives good insight into what is in the file, it's very hard to determine which code is actually executed. That's why it is necessary to isolate the functionality we want to reverse engineer as well as possible, so that we can debug it whenever we want. In our case we need to run the benchmark executable without pressing the benchmark button in XTU's main application. This is easy, as we have already grabbed the executable from the temporary directory and can therefore execute it from a command line prompt (cmd.exe or PowerShell). The only thing missing is a single command line parameter, without which the benchmark will not run. By the way, this is a relic from old XTU versions, where this parameter defined the location of the result file on the hard drive. Good to know that this has been improved, although not by much (more on that later). The parameter is ignored anyway, so just pass 0 or whatever. It is important to point out that XTU has to run in the background, because the benchmark executable relies on a dependency provided by XTU and its system service. Here is the output:

Code: TEXT
PS C:\ProgramData\Intel\Intel Extreme Tuning Utility\Temp> & '.\p95-bench(64-bit).exe' 0
Results file: 0
Create Window: -2
Base Title -2: Main thread
-2: [Jan 2 12:45]
-2: Mersenne number primality test program version 27.7

-2: [Jan 2 12:45]
-2: Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 256 KB, L3 cache size: 12 MB

Starting Benchmark
Create Window: 0
Base Title 0: Worker #1
-2: [Jan 2 12:45]
-2: Starting worker.

Title 0: Starting
0: [Jan 2 12:45]
0: Worker starting

Title 0: Benchmarking
0: [Jan 2 12:45]
0: Timing FFTs using 12 threads.

0: [Jan 2 12:45]
0: Timing 1875 iterations of 1024K FFT length.
0: Best time: 0.578 ms., avg time: 0.816 ms.

Benchmark completed
0: [Jan 2 12:45]
0: Benchmark complete.

Title 0: Not running

Finally we are able to step through the code line by line and with the help of the original source code of Prime95 it is a delight to label most of the general functions correctly to complete the puzzle piece by piece. So without further ado this is a simplified list of what the benchmark does each time it is called:

  • Creating an event to signal the main application that the result is up for grabs later on
  • Reading a TSC "bending" value from a memory mapped file
  • Reading configuration files; if they are not available, hardcoded or detected defaults are used instead (like CPU architecture, L2/L3 cache sizes, hyperthreading, ...)
  • Initializing Prime95's "FFT timings benchmark" (also accessible in Prime95 via the menu => Options => Benchmark btw)
  • Generating random input data for the calculation based on the current time
  • Running 1875 Lucas-Lehmer iterations (gwnum's gwsquare function) while checking on each iteration if it was the fastest one
  • Writing a memory mapped result file that includes the fastest iteration time as a double precision value in milliseconds
  • Signaling the main application that the memory mapped result file was written
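
The core of that list, timing a fixed number of iterations and remembering only the fastest one, can be sketched in a few lines of C++. This is a minimal illustration and not Intel's or Prime95's actual code: the names IterationTiming and timeIterations are made up for this example, std::chrono::steady_clock stands in for the TSC-based timer, and the workload is whatever callable you pass in instead of gwnum's gwsquare.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Hypothetical result type: fastest and average iteration time in milliseconds.
struct IterationTiming
{
    double bestMs;
    double avgMs;
};

// Runs `work` `iterations` times (iterations must be > 0) and tracks the
// fastest single run as well as the average over all runs.
template <typename Work>
IterationTiming timeIterations(Work &&work, int iterations)
{
    using Clock = std::chrono::steady_clock;

    double best = std::numeric_limits<double>::max();
    double total = 0.0;

    for (int i = 0; i < iterations; ++i)
    {
        const auto start = Clock::now();
        work();
        const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;

        best = std::min(best, elapsed.count()); // keep only the fastest iteration
        total += elapsed.count();
    }

    return {best, total / iterations};
}
```

Calling timeIterations(work, 1875) mirrors the iteration count from the output above. Note how a single lucky iteration is enough to determine bestMs, no matter how slow the rest were.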

You might ask yourself what the TSC "bending" value might be. Well, XTU uses the CPU's internal timestamp counter, the TSC (read via the RDTSC/RDTSCP instructions), to measure time. To convert the elapsed number of ticks into seconds you need to know how many ticks happen each second. Makes sense, doesn't it? Modern CPUs have an invariant (or constant) TSC that ticks at the base clock the CPU was booted with. Any bclock or CPU ratio changes won't have an effect on this number, so a benchmark can rely on the tick rate derived from CPUID information (side note: that's not true for AMD's Ryzen but that's another story). This does not apply when your CPU is too old and has no invariant TSC, or if you are using Windows 8+ in combination with an AMD CPU or an Intel CPU older than Skylake. These systems are prone to time skewing when the bclock is altered in the OS, because the number of ticks per second changes while the benchmark still divides by the assumed bootup value. To avoid such timing bugs, Intel introduced its own dynamic measurement of ticks per second. That value is passed to the benchmark executable via a memory mapped file and is used to calculate the final result in milliseconds.
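
To make the arithmetic behind this concrete, here is a small sketch of the tick-to-time conversion. The function names and the 3.7 GHz example rate are assumptions for illustration only; the real benchmark's code and the exact unit of the value in the memory mapped file are not shown here.

```cpp
#include <cstdint>

// Convert a TSC tick delta into milliseconds, given the tick rate the
// benchmark believes in (ticks per second). Names are illustrative.
double ticksToMilliseconds(std::uint64_t elapsedTicks, double assumedTicksPerSecond)
{
    return static_cast<double>(elapsedTicks) * 1000.0 / assumedTicksPerSecond;
}

// Factor by which the reported time deviates from the real time when the
// assumed tick rate differs from the real one (1.0 means no skew,
// 0.5 means the benchmark reports only half of the real time).
double timeSkewFactor(double realTicksPerSecond, double assumedTicksPerSecond)
{
    return realTicksPerSecond / assumedTicksPerSecond;
}
```

With a real tick rate of 3.7 GHz, 3,700,000 ticks are exactly one millisecond. If the benchmark divides by twice that rate instead, the same tick count is reported as 0.5 ms, which is why the tick rate has to match reality.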

Reversing the XTU main application



That's already quite a list but we are not finished yet. Let's take a closer look at the XTU main application (PerfTune.exe) again. Opening the executable with IDA brings up the following dialog, indicating that this is a .NET assembly written in C#:


IDA doesn't do well with .NET but there are other disassemblers out there to make our lives easier. For a quick static analysis of unencrypted executables I prefer JetBrains' dotPeek; for debugging or other heavy lifting I recommend the much more advanced dnSpy.


As we can see in these screenshots from dotPeek, the executable PerfTune.exe is decompiled without any trouble, so we can read every line of code Intel's engineers have written. The problem starts when certain libraries are invoked, like "BenchmarkLibrary" (IntelBenchmarkSDK.dll). Most of these DLLs are also written in C# but have been encrypted, so dotPeek cannot decompile the method bodies. Instead we get comments stating "ISSUE: unable to decompile the method.". That's where dnSpy comes into play:


Now we know that we are struggling with a .NET obfuscator, namely a custom variant of ConfuserEx. This is not a bad choice by the Intel engineers per se, as it is at least a serious attempt to keep others from reading the DLL's code. Sadly, it's still pretty easy to get around: we just need to debug the application with dnSpy and step into the DLL we want to deobfuscate (set a breakpoint at MultipleRun() and follow the call to BenchmarkLibrary::StartBenchmarkRun() into the DLL). To be executed, the module (the DLL) has to be loaded as readable, although still obfuscated, code:


Now we can save the DLL loaded in memory to a file and clean it up with a deobfuscator. I prefer a self-compiled version of de4dot for that task. Now that we have a clean version of IntelBenchmarkSDK.dll, we need to swap it in for the original version. That's where another of Intel's anti-tampering protections comes into play:


That's good news as well; it seems that Intel is checking for modifications of the application's files and their dependencies. Let's dive deeper to find out if it's worth it. After a quick debug session with a breakpoint at the message box and a look at the call stack, I can follow a path of functions that leads to the Win32 function WinVerifyTrust(), which checks the digital signature of each DLL in the application's working directory. That is pretty common, but the verification is incomplete. It only checks if the digital signature is valid, but does not care who actually signed it. Perfect for our endeavour, because I can just sign the deobfuscated IntelBenchmarkSDK.dll with my own certificate and swap it in for the original one. Mission accomplished.


Sadly, from this moment on it's very easy to debug into the HWBOT upload process and catch the AES encryption key and IV. It's also possible to change the data XML file to our liking before it gets encrypted. I won't share any details of course, but if you ever wondered what data gets uploaded to HWBOT by XTU you can have a look at this example score with an i7-8700K:

Unencrypted XTU data file
Code: XML
<?xml version="1.0" encoding="utf-16"?>
<XtuProfile
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xmlns="http://intel.com/xtu/xtuProfileSchema">
	<XtuVersion>
		<Field name="Service Version">6.4.1.25</Field>
		<Field name="Client Version">6.4.1.25</Field>
		<Field name="Profile ID"></Field>
		<Field name="Application">Intel® Extreme Tuning Utility</Field>
		<Field name="Provider">Intel</Field>
	</XtuVersion>
	<PowerCondition>
		<Field name="PowerPlanChanged">False</Field>
		<Field name="PowerSourceChanged">False</Field>
	</PowerCondition>
	<OverclockingRestricted>
		<Field name="Value">False</Field>
	</OverclockingRestricted>
	<EngineeringSample>
		<Field name="Value">False</Field>
	</EngineeringSample>
	<ProcessorFamily>
		<Field name="Value">13</Field>
	</ProcessorFamily>
	<Score>
		<Field name="Value">2347</Field>
		<Field name="Benchmark Version">1.0</Field>
	</Score>
	<MaxProcFrequency>
		<Field name="Value">4,41</Field>
	</MaxProcFrequency>
	<HighestCPUTemperature>
		<Field name="Value">60</Field>
	</HighestCPUTemperature>
	<Hardware>
		<Component name="Memory">
			<MemoryBank>
				<Field name="Bank Label">BANK 1</Field>
				<Field name="Device Locator">ChannelA-DIMM1</Field>
				<Field name="Default Speed">3200 MHz</Field>
				<Field name="Capacity">8.00 GB</Field>
				<Field name="Manufacturer">8313</Field>
			</MemoryBank>
			<MemoryBank>
				<Field name="Bank Label">BANK 3</Field>
				<Field name="Device Locator">ChannelB-DIMM1</Field>
				<Field name="Default Speed">3200 MHz</Field>
				<Field name="Capacity">8.00 GB</Field>
				<Field name="Manufacturer">8313</Field>
			</MemoryBank>
			<Field name="Total Installed Memory">16.00 GB</Field>
		</Component>
		<Component name="Graphics">
			<Field name="Name">NVIDIA GeForce GTX 1080 Ti</Field>
			<Field name="Compatibility">NVIDIA</Field>
			<Field name="RAM">4.00 GB</Field>
			<Field name="DAC Type">Integrated RAMDAC</Field>
			<Field name="Driver Version">25.21.14.1694</Field>
			<Field name="Driver Date">11/12/2018</Field>
		</Component>
		<Component name="BIOS">
			<Field name="Manufacturer">American Megatrends Inc.</Field>
			<Field name="Version">F10</Field>
			<Field name="Release Date">9/18/2018</Field>
		</Component>
		<Component name="Operating System">
			<Field name="Manufacturer">Microsoft Corporation</Field>
			<Field name="Name">Microsoft Windows 10 Pro</Field>
			<Field name="Version">10.0.17763</Field>
			<Field name="Service Pack">N/A</Field>
		</Component>
		<Component name="Motherboard">
			<Field name="Manufacturer">Gigabyte Technology Co., Ltd.</Field>
			<Field name="Model">Z370 AORUS ULTRA GAMING WIFI-CF</Field>
			<Field name="Version">x.x</Field>
		</Component>
		<Component name="Processor">
			<Field name="Brand String">Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz</Field>
			<Field name="Physical CPU Cores">6</Field>
			<Field name="Logical CPU Cores">12</Field>
			<Field name="Possible Turbo Bins">Unlimited</Field>
			<Field name="Turbo Overclockable">True</Field>
			<Field name="Intel® Turbo Boost Max">True</Field>
			<Field name="Intel® Speed Shift">True</Field>
			<Field name="Microcode Update">0x96</Field>
			<Field name="Family">CoffeeLake</Field>
		</Component>
	</Hardware>
	<Settings>
		<Section name="Cache">
			<RangeControl id="76">
				<Name>Processor Cache Ratio</Name>
				<Value>44</Value>
				<Units>x</Units>
				<Modifiable>true</Modifiable>
				<Min>8</Min>
				<Max>83</Max>
			</RangeControl>
			<RangeControl id="77">
				<Name>Cache Voltage</Name>
				<Value>Default</Value>
				<Units>V</Units>
				<Modifiable>true</Modifiable>
				<Min>Default</Min>
				<Max>2.000</Max>
			</RangeControl>
			<ToggleControl id="78">
				<Name>Cache Voltage Mode</Name>
				<Value>Adaptive</Value>
				<Modifiable>false</Modifiable>
			</ToggleControl>
			<RangeControl id="79">
				<Name>Cache Voltage Offset</Name>
				<Value>0</Value>
				<Units>mV</Units>
				<Modifiable>true</Modifiable>
				<Min>-1000</Min>
				<Max>999</Max>
			</RangeControl>
			<RangeControl id="106">
				<Name>Cache IccMax</Name>
				<Value>255.00</Value>
				<Units>A</Units>
				<Modifiable>true</Modifiable>
				<Min>1.00</Min>
				<Max>255.75</Max>
			</RangeControl>
		</Section>
		<Section name="Processor">
			<RangeControl id="2">
				<Name>Core Voltage</Name>
				<Value>Default</Value>
				<Units>V</Units>
				<Modifiable>true</Modifiable>
				<Min>Default</Min>
				<Max>2.000</Max>
			</RangeControl>
			<RangeControl id="34">
				<Name>Core Voltage Offset</Name>
				<Value>0</Value>
				<Units>mV</Units>
				<Modifiable>true</Modifiable>
				<Min>-1000</Min>
				<Max>999</Max>
			</RangeControl>
			<ToggleControl id="41">
				<Name>Enhanced Intel® SpeedStep Technology</Name>
				<Value>Enable</Value>
				<Modifiable>false</Modifiable>
			</ToggleControl>
			<ToggleControl id="88">
				<Name>Core Voltage Mode</Name>
				<Value>Adaptive</Value>
				<Modifiable>false</Modifiable>
			</ToggleControl>
			<RangeControl id="102">
				<Name>Processor Core IccMax</Name>
				<Value>255.00</Value>
				<Units>A</Units>
				<Modifiable>true</Modifiable>
				<Min>1.00</Min>
				<Max>255.75</Max>
			</RangeControl>
			<RangeControl id="114">
				<Name>AVX Ratio Offset</Name>
				<Value>0</Value>
				<Units>x</Units>
				<Modifiable>true</Modifiable>
				<Min>0</Min>
				<Max>31</Max>
			</RangeControl>
			<RangeControl id="3489660933">
				<Name>Processor Core Ratio</Name>
				<Value>43</Value>
				<Units>x</Units>
				<Modifiable>true</Modifiable>
				<Min>8</Min>
				<Max>83</Max>
			</RangeControl>
		</Section>
		<Section name="Processor.PowerCurrent">
			<RangeControl id="47">
				<Name>Turbo Boost Short Power Max</Name>
				<Value>4095.875</Value>
				<Units>W</Units>
				<Modifiable>true</Modifiable>
				<Min>1.000</Min>
				<Max>4095.875</Max>
			</RangeControl>
			<RangeControl id="48">
				<Name>Turbo Boost Power Max</Name>
				<Value>4095.875</Value>
				<Units>W</Units>
				<Modifiable>true</Modifiable>
				<Min>1.000</Min>
				<Max>4095.875</Max>
			</RangeControl>
			<ToggleControl id="49">
				<Name>Turbo Boost Short Power Max Enable</Name>
				<Value>Enable</Value>
				<Modifiable>true</Modifiable>
			</ToggleControl>
			<ToggleControl id="50">
				<Name>Package Turbo Power Lock</Name>
				<Value>Disable</Value>
				<Modifiable>false</Modifiable>
			</ToggleControl>
			<RangeControl id="66">
				<Name>Turbo Boost Power Time Window</Name>
				<Value>8.000</Value>
				<Units>Seconds</Units>
				<Modifiable>true</Modifiable>
				<Min>0.250</Min>
				<Max>96.000</Max>
			</RangeControl>
		</Section>
		<Section name="Processor.Turbo">
			<ToggleControl id="26">
				<Name>Intel® Turbo Boost Technology</Name>
				<Value>Enable</Value>
				<Modifiable>false</Modifiable>
			</ToggleControl>
			<ToggleControl id="80">
				<Name>Overclocking Lock</Name>
				<Value>Unlock</Value>
				<Modifiable>false</Modifiable>
			</ToggleControl>
		</Section>
		<Section name="Processor.Turbo.RatioLimit">
			<RangeControl id="116">
				<Name>Core 0</Name>
				<Value>83</Value>
				<Units>x</Units>
				<Modifiable>true</Modifiable>
				<Min>8</Min>
				<Max>83</Max>
			</RangeControl>
			<RangeControl id="117">
				<Name>Core 1</Name>
				<Value>83</Value>
				<Units>x</Units>
				<Modifiable>true</Modifiable>
				<Min>8</Min>
				<Max>83</Max>
			</RangeControl>
			<RangeControl id="118">
				<Name>Core 2</Name>
				<Value>83</Value>
				<Units>x</Units>
				<Modifiable>true</Modifiable>
				<Min>8</Min>
				<Max>83</Max>
			</RangeControl>
			<RangeControl id="119">
				<Name>Core 3</Name>
				<Value>83</Value>
				<Units>x</Units>
				<Modifiable>true</Modifiable>
				<Min>8</Min>
				<Max>83</Max>
			</RangeControl>
			<RangeControl id="120">
				<Name>Core 4</Name>
				<Value>83</Value>
				<Units>x</Units>
				<Modifiable>true</Modifiable>
				<Min>8</Min>
				<Max>83</Max>
			</RangeControl>
			<RangeControl id="121">
				<Name>Core 5</Name>
				<Value>83</Value>
				<Units>x</Units>
				<Modifiable>true</Modifiable>
				<Min>8</Min>
				<Max>83</Max>
			</RangeControl>
		</Section>
		<Section name="Processor.Turbo.Ratios">
			<RangeControl id="29">
				<Name>1 Active Core</Name>
				<Value>47</Value>
				<Units>x</Units>
				<Modifiable>true</Modifiable>
				<Min>8</Min>
				<Max>83</Max>
			</RangeControl>
			<RangeControl id="30">
				<Name>2 Active Cores</Name>
				<Value>46</Value>
				<Units>x</Units>
				<Modifiable>true</Modifiable>
				<Min>8</Min>
				<Max>83</Max>
			</RangeControl>
			<RangeControl id="31">
				<Name>3 Active Cores</Name>
				<Value>45</Value>
				<Units>x</Units>
				<Modifiable>true</Modifiable>
				<Min>8</Min>
				<Max>83</Max>
			</RangeControl>
			<RangeControl id="32">
				<Name>4 Active Cores</Name>
				<Value>44</Value>
				<Units>x</Units>
				<Modifiable>true</Modifiable>
				<Min>8</Min>
				<Max>83</Max>
			</RangeControl>
			<RangeControl id="42">
				<Name>5 Active Cores</Name>
				<Value>44</Value>
				<Units>x</Units>
				<Modifiable>true</Modifiable>
				<Min>8</Min>
				<Max>83</Max>
			</RangeControl>
			<RangeControl id="43">
				<Name>6 Active Cores</Name>
				<Value>43</Value>
				<Units>x</Units>
				<Modifiable>true</Modifiable>
				<Min>8</Min>
				<Max>83</Max>
			</RangeControl>
		</Section>
	</Settings>
</XtuProfile>

The Interprocess Communication (sucks)



In the previous chapter of this article I uncovered the attack vector of modifying the main application's DLLs by simply signing them again with my own certificate. The same is true for the temporarily created p95-bench(..-bit).exe files. The only thing that needs to be added to the recipe to run our own inner benchmark executable is to set the "read only" flag on these files using Windows Explorer's file properties dialog. That way XTU can't overwrite the modified versions and starts whatever executable it finds under those names (as long as they are digitally signed, of course). With that in mind, let's write our own benchmark file that integrates nicely into XTU and returns any result value we like. To do that we need to implement the same interprocess communication that the original benchmark executable provides for the XTU main application: creating the memory mapped result file and an additional event to signal that the result file was successfully written. Now the fun part begins, where we need to reverse engineer the result data that is written to the file. To show how vulnerable the communication via memory mapped files is, I chose to do this with a small command line tool that continuously reads the data in these files. This is the result data for each loop in binary (formatted as bytes):

Code: TEXT
Loop 1: 00000000 00000000 00000000 00000000 00000000 00000000 10010000 01000000 00100110 10100000 00110000 11000101 10101100 00010011 11100110 00111111 11110001 01100100 11101000 01011110 00001101 00110011 11101111 00111111
Loop 2: 00000000 00000000 00000000 00000000 00000000 00000000 10010000 01000000 11110010 01101111 00100100 11000100 10010000 01101100 11100010 00111111 01111110 00110011 00111111 11010001 11101101 01100101 11110010 00111111
Loop 3: 00000000 00000000 00000000 00000000 00000000 00000000 10010000 01000000 01011011 10000011 01001011 01000001 00101100 11000111 11100010 00111111 10001011 10010011 11100011 11000110 11111000 00111111 11110110 00111111
...

Next we need to align the data as meaningful variables. We already know that somewhere in there might be a floating point value that holds the time of the fastest iteration. Bingo, the first 192 bits are three double-precision values and already look very promising. This would be the C++ code to align our data correctly:

Code: CPP
#define FILESIZE 0x18u
#define OTHERDATASIZE (FILESIZE - (sizeof(double) * 3))

// Read result data and convert it
char *szData = <read memory mapped file data>;

struct DATARECORD
{
	double dValue1,
		   dValue2,
		   dValue3;

	char szOther[OTHERDATASIZE];
};

DATARECORD *pData = (DATARECORD *)szData;

// Output our doubles
cout << "d1: " << pData->dValue1 << ", d2: " << pData->dValue2 << ", d3: " << pData->dValue3 << "\n";

That leads to:

Code: TEXT
d1: 1024, d2: 0.68990171922360299, d3: 0.97498196160009176
d1: 1024, d2: 0.57575262364434665, d3: 1.1498850034433583
d1: 1024, d2: 0.58681309464382581, d3: 1.3906181115246194

Looking at the output of our first standalone run we can easily guess that the second double-precision value is the time of the fastest iteration, while the third seems to be the average time over all 1875 iterations. I couldn't figure out what the rest of the data (pData->szOther) represents, although another look at the benchmark executable with IDA could reveal that as well. It is not part of the XTU score, so we don't need to know. What we know for a fact now is that the result file is accessible through our own command line tool and stores unencrypted data. The interprocess communication via memory mapped files is therefore without doubt the weakest spot of XTU. If you think it can't get worse, have a look at the following screenshot showing a DLL injection into XTUService.exe that redirects calls to the Win32 functions used to access the TSC bending value's memory mapped file. I intercept the creation and opening calls and change the file path to an alternate location, where the service will continuously write the tick count into a file that will never be used. Finally I create the original TSC bend file myself and write my much bigger custom value to it: 15,000,000 instead of 3,696,000. The benchmark executable will now perceive a second as four times longer than it really is.
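
The byte alignment described earlier in this chapter can be double-checked without touching XTU at all, simply by decoding the "Loop 1" bytes from the dump by hand. The following is a self-contained sketch: ResultRecord, its field names and the helper functions are invented for this example (the field meanings are the guesses from the text), and it assumes that a double is a 64-bit IEEE 754 value.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Field names are guesses based on the observed output, not names from XTU.
struct ResultRecord
{
    double value1; // 1024 in every loop shown in the dump
    double bestMs; // presumably the fastest iteration in milliseconds
    double avgMs;  // presumably the average iteration time in milliseconds
};

// Assemble 8 little-endian bytes into a double without pointer casts,
// which avoids alignment and aliasing problems.
double decodeLittleEndianDouble(const std::uint8_t *bytes)
{
    std::uint64_t bits = 0;
    for (int i = 7; i >= 0; --i)
        bits = (bits << 8) | bytes[i];

    double value = 0.0;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}

ResultRecord decodeRecord(const std::array<std::uint8_t, 24> &raw)
{
    return {decodeLittleEndianDouble(raw.data()),
            decodeLittleEndianDouble(raw.data() + 8),
            decodeLittleEndianDouble(raw.data() + 16)};
}

// The 24 bytes of "Loop 1" from the dump, converted from binary to hex.
const std::array<std::uint8_t, 24> kLoop1Bytes = {
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x90, 0x40,
    0x26, 0xA0, 0x30, 0xC5, 0xAC, 0x13, 0xE6, 0x3F,
    0xF1, 0x64, 0xE8, 0x5E, 0x0D, 0x33, 0xEF, 0x3F};
```

Passing kLoop1Bytes to decodeRecord() yields the same three values as the first output line of the original tool: 1024, roughly 0.6899 and roughly 0.9750.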


The Formula



We have collected so many pieces of the XTU puzzle by now that we can tackle the score formula next. Of course there is always the possibility to have a peek at the C# source code, but we won't do that. Reverse engineering a formula is too much fun, no shortcuts necessary. Instead I went ahead and implemented an empty inner benchmark executable that contains all the necessary interprocess communication to work exactly like the original one, although it does nothing but return a custom result value. I started with 0.2 milliseconds (world record) and worked my way up to 10 ms to get a range of meaningful results:


As you can see, it is not a linear curve. So to solve this we need to figure out how it scales. Focus on the 1307 points for a result of 1 ms. 2 ms gets exactly half the points, while 0.5 ms is awarded twice the points. That sounds very much like 1307 / 2 and 1307 / 0.5. Dividing by a fraction is the same as multiplying by its reciprocal, so 1307 divided by 1/2 equals 1307 * 2. So the formula is nothing more than a simple division with a chosen dividend:

XTU score = 1307 / fastest iteration time in milliseconds


It's interesting to see that the difference between a world record with 18 cores (6344 points with 7980XE@5.7G) and the fastest 8 core score (4778 points with 9900K@6.8G) is merely 0.0675 ms, essentially nothing. That indicates really bad core scaling that is covered up by the non-linear formula. We will have a closer look at scaling in the second part of this article (spoiler alert: it sucks!).
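
The relation derived in this chapter is easy to play with in code. The sketch below assumes nothing more than the 1307 dividend inferred from the measurements; the function names are made up, and whatever rounding XTU applies to the displayed score is not modeled.

```cpp
// Dividend inferred from the measurements in this chapter (1 ms -> 1307 points).
constexpr double kXtuDividend = 1307.0;

// Score for a given fastest-iteration time in milliseconds.
constexpr double scoreFromBestMs(double bestMs)
{
    return kXtuDividend / bestMs;
}

// Inverse direction: the fastest-iteration time implied by a score.
constexpr double bestMsFromScore(double score)
{
    return kXtuDividend / score;
}
```

Feeding the two record scores into bestMsFromScore() gives about 0.2060 ms for 6344 points and about 0.2735 ms for 4778 points, which reproduces the difference of roughly 0.0675 ms mentioned above.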

Attack Vectors



To summarize my findings I have compiled the following sorted list of attack vectors that XTU suffers from:

  1. Reading, writing and intercepting XTU's unencrypted interprocess communication between the inner benchmark executable, the XTU service and the main application to skew time (TSC bending value) or modify the result data.
  2. Intercepting RDTSC/RDTSCP instructions via hypervisor to change how the benchmark interprets time in seconds. I notified Intel about this exploit in 2017.
  3. Modifying or exchanging DLLs and the inner benchmark executable by digitally signing them with your own valid certificate.

Tweaks



Quick performance testing

  1. Open an Explorer and go to: C:\ProgramData\Intel\Intel Extreme Tuning Utility\Temp
  2. Check if you have "Show hidden files, folders, drives" enabled in your Explorer's Folder Options
  3. Create a local.txt in this directory and paste this line into it: CpuArchitecture=5
    This line sets the CPU architecture to i3/i5/i7 (the fastest implementation) and has to remain in the config file for standalone testing. Otherwise the architecture won't be detected correctly.
  4. Open XTU and start a benchmark run
  5. Go back to the Explorer and refresh it, you will find a hidden file called p95-bench(64-bit).exe (or p95-bench(32-bit).exe for 32 bit systems). Show its file properties and check the box for "read only" while XTU is still benching. With this the standalone benchmark executable will be permanently available for testing.
  6. Open a command line window (cmd.exe or PowerShell) and navigate to the above location
  7. Execute the benchmark with an additional command line parameter: p95-bench(64-bit).exe 0

Abusing Prime95's configuration files

  1. Go through all steps above for quick performance testing
  2. You can now edit your local.txt file to your own liking. Have a look at the official documentation of Prime95 27.7 (undoc.txt).
  3. Important: To enable the configuration file inside XTU you need to copy local.txt to: C:\Windows\SysWOW64
    Depending on the Windows version and bitness the default working directory could also be C:\Windows\system32
  4. Check with some cores or AVX disabled if your config file impacts the score

Example for disabling hyperthreading and two cores on the 7980XE (CPU doesn't scale beyond 16 cores):

Code: INI
NumCPUs=16
CpuNumHyperthreads=1

Disabling AVX (never did anything good for me, but great for testing the config file's impact):

Code: INI
CpuSupportsAVX=0

Customizing the L2 cache configuration (sometimes a smaller setting is faster, see undoc.txt):

Code: INI
CpuL2CacheSize=256
CpuL2CacheLineSize=128
CpuL2SetAssociative=4

Only the best iteration wins!

A fact that seems to be publicly unknown is that although 20 loops with 1875 Lucas-Lehmer iterations each are executed, only the best iteration counts for the final score. So you don't need to run all loops at full speed. Or even better: use a configuration file that disables certain features and enable it (via batch file) after the first few loops have successfully gone through. This will downgrade your settings to 2 cores, no hyperthreading and AVX disabled:

Code: INI
CpuArchitecture=5
NumCPUs=2
CpuNumHyperthreads=1
CpuSupportsAVX=0
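
Why the slow loops don't hurt can be shown with a few lines of C++. This is a sketch under two assumptions taken from this article: the final score only depends on the smallest per-loop best time, and the score follows the 1307 / milliseconds relation from "The Formula". The function name is made up for this example.

```cpp
#include <algorithm>
#include <vector>

// Assumption: only the fastest iteration across all loops counts, and the
// score is 1307 divided by that time in milliseconds.
double finalScoreFromLoops(const std::vector<double> &loopBestMs)
{
    if (loopBestMs.empty())
        return 0.0; // no loops, no score

    const double fastest = *std::min_element(loopBestMs.begin(), loopBestMs.end());
    return 1307.0 / fastest;
}
```

Using the three best times from the interprocess communication chapter (about 0.690, 0.576 and 0.587 ms) gives a score of roughly 2270. Appending any number of much slower loops, for example 5 ms each, leaves that result unchanged.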

Conclusion



My thorough investigation shows that XTU is vulnerable to several serious attacks. I found two distinct ways to change the benchmark's perception of time and therefore the final score. Additionally, the benchmark's DLLs and executables can easily be modified by abusing the fact that the main application only relies on valid digital signatures without checking the owner of the certificate itself. Last but not least, the interprocess communication can be intercepted by anyone with basic Win32 programming skills and the courage to dump the strings of the inner benchmark executable to gather the names of the memory mapped files.

Security issues aside, the real problem lies in Intel's choice of workload for benching with XTU. Although it uses the well-known Prime95, which is a great stress test and surely can be a great benchmark as well, Intel's engineers chose the (now pretty much obsolete) FFT timings benchmark, which scales horribly with more than 8 cores. That's why Prime95 normally uses several worker threads to run as many FFTs in parallel as possible. Running only a single Lucas-Lehmer iteration, especially with the small size of 1024K, on all available cores was never going to be future proof. If anybody should have known that, it would be Intel, right?

Furthermore, XTU is a benchmark that bases its score on something that takes only a fraction of a millisecond (see "Only the best iteration wins!" above). There is also no error checking implemented, so whatever it calculates could be garbage for all we know. So it actually just tests if the OS is stable enough with the current system settings for about a second, then sleeps for about a second and repeats that process 20 times. Yeah.

This benchmark is a dead end, no doubt about it.