About two months ago our beloved Turrican passed away in a horrible car accident. He was a big part of our community and is still Austria's only overclocking legend! Out of my sadness and anger I started to work on this project as a virtual monument, something to honour him in our scene. It's an homage to SuperPI, which Turrican benched on every possible platform, and it calculates Pi completely in parallel on graphics cards and CPUs. So let's get our gear going and do exactly what our Karl showed us in all his years: Bench the hell out of GPUPI!
Downloads: GPUPI 3.2 | GPUPI 3.2 - Legacy Version (only use on old systems, supports XP, 32 bit, OpenCL 1.1 and GeForce 200 to 500 series cards)
GPUPI calculates the mathematical constant Pi in parallel by using the BBP (Bailey-Borwein-Plouffe) formula and optimizing it for OpenCL-capable devices like graphics cards and main processors. It's implemented with C++, STL and pure Win32 to avoid unnecessary dependencies. The result of the benchmark is exactly nine digits of Pi in hexadecimal. So if you're calculating Pi in 1B (1 billion), it will not output all digits like the available serial calculations of Pi, but display only the nine digits after the billionth hexadecimal digit. This limitation is due to the nature of parallel implementations and the Pi formula used.
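To illustrate how digits at a given position can be computed without the preceding ones, here is a minimal Python sketch of BBP digit extraction. It is not GPUPI's kernel code: it uses plain doubles and Python's built-in `pow` for modular exponentiation, so it is only reliable for small positions and about eight digits. The function names `bbp_series` and `pi_hex_digits` and the zero-based `position` convention are assumptions made for this example.

```python
def bbp_series(j, n):
    """Fractional part of the sum over k of 16^(n-k) / (8k + j)."""
    total = 0.0
    # Terms with k <= n: modular exponentiation keeps the numerators small.
    for k in range(n + 1):
        denom = 8 * k + j
        total = (total + pow(16, n - k, denom) / denom) % 1.0
    # Terms with k > n shrink by a factor of 16 each step; stop when negligible.
    k = n + 1
    while True:
        term = 16.0 ** (n - k) / (8 * k + j)
        if term < 1e-17:
            break
        total = (total + term) % 1.0
        k += 1
    return total


def pi_hex_digits(position, count=8):
    """Hex digits of Pi that follow the first `position` fractional hex digits."""
    x = (4 * bbp_series(1, position) - 2 * bbp_series(4, position)
         - bbp_series(5, position) - bbp_series(6, position)) % 1.0
    digits = []
    for _ in range(count):
        x *= 16
        digit = int(x)
        digits.append("0123456789ABCDEF"[digit])
        x -= digit
    return "".join(digits)


print(pi_hex_digits(0))  # Pi is 3.243F6A88... in hexadecimal
```

Each value of k in the loop is independent of the others, which is what makes the formula suitable for spreading over many compute cores.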
The calculation is split up into smaller packages called batches, which themselves consist of millions of calculations of the BBP series term on each available compute core of all selected devices. Afterwards the millions of intermediate results are accumulated inside the device's memory to get a single result per batch. This technique is very common in parallel applications and is called memory reduction. Use the Batch Size and Reduction Size in the settings dialog to fine-tune the calculations and reductions for your selected devices.
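The batch and reduction scheme can be sketched roughly as follows. This is a sequential Python illustration under assumptions, not the OpenCL implementation: `term`, `tree_reduce` and `run_in_batches` are hypothetical names, and on a real device the terms of a batch and the group sums of each reduction step would be computed in parallel.

```python
def tree_reduce(values, reduction_size):
    """Repeatedly sum groups of `reduction_size` values until one remains."""
    if reduction_size < 2:
        raise ValueError("reduction_size must be at least 2")
    if not values:
        return 0
    while len(values) > 1:
        values = [sum(values[i:i + reduction_size])
                  for i in range(0, len(values), reduction_size)]
    return values[0]


def run_in_batches(term, total_terms, batch_size, reduction_size):
    """Evaluate term(k) for k < total_terms, one batch at a time."""
    result = 0
    for start in range(0, total_terms, batch_size):
        stop = min(start + batch_size, total_terms)
        # One "kernel call" per term; on a GPU these would run concurrently.
        partial = [term(k) for k in range(start, stop)]
        # Collapse the batch to a single value before moving on.
        result += tree_reduce(partial, reduction_size)
    return result


print(run_in_batches(lambda k: k, 1000, 64, 8))
```

A larger reduction size means fewer reduction passes per batch, while a larger batch size means more values that have to be held in memory and reduced at once.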
The benchmark relies heavily on 64 bit integer performance. Additionally, each series term calculation of the BBP formula needs a division using double precision. The result is stored and finally accumulated as a double-double, two doubles combined for even higher precision. Starting with 1B digits, each kernel has to make use of custom 128 bit integer routines where necessary.
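As a rough illustration of why two doubles are combined, here is a small Python sketch of double-double accumulation based on the classic error-free `two_sum` transformation. The names `two_sum` and `dd_add` are chosen for this example, and GPUPI's actual routines may differ in detail.

```python
def two_sum(a, b):
    """Return (s, e) where s is the rounded sum and s + e equals a + b exactly."""
    s = a + b
    b_virtual = s - a
    e = (a - (s - b_virtual)) + (b - b_virtual)
    return s, e


def dd_add(hi, lo, x):
    """Add the double x to the double-double value (hi, lo)."""
    s, e = two_sum(hi, x)
    e += lo
    return two_sum(s, e)


hi, lo = 1.0, 0.0
plain = 1.0
for _ in range(10):
    hi, lo = dd_add(hi, lo, 1e-17)
    plain += 1e-17

print(plain == 1.0)  # the plain double loses every small addend
print(hi, lo)        # the low word keeps the accumulated remainder
```

The low word carries the bits that a single double would round away, which matters when millions of small partial results are summed.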
A few examples of GPUPI runs with 1B:
- GPUPI 2.3.4 (1.22 MB, CRC-32: 88BE77A2)
- GPUPI 2.3.4 - Legacy Version (only use for Windows XP/GeForce 200 series!, 626 KB, CRC-32: 5717B079)
- GPUPI 2.2 (1.08 MB, CRC-32: 054753D5)
- GPUPI 2.2 - Legacy Version (Windows XP, GeForce 200 series, 630 KB, CRC-32: 0D6E44A3)
- GPUPI 2.1.2 (1.06 MB, CRC-32: CB9069F2)
- GPUPI 2.1.2 - Legacy Version (Windows XP, GeForce 200 series, 590 KB, CRC-32: 1497D087)
- GPUPI 2.0 (1.4 MB)
What are the minimum requirements to run the benchmark?
GPUPI supports not only GPUs, but also CPU calculations. It needs at least OpenCL 1.x with double precision support. Target platform for the benchmark is Windows Vista and later. To bench on Windows XP, download the legacy version.
To check if your graphics card supports double precision, have a look at these (incomplete) lists on Wikipedia: AMD | NVIDIA
Which drivers do I need for OpenCL?
- NVIDIA GPUs: GeForce Driver
- AMD GPUs: Catalyst Driver
- Intel GPUs: OpenCL driver for Intel® Iris™ and HD Graphics for Windows
- Intel CPUs: OpenCL™ Runtime x.x for Intel® CPU and Intel® Xeon Phi™ coprocessors for Windows*
- AMD CPUs: If you can't install the Catalyst drivers, use the AMD APP SDK instead.
GPUPI says that MSVCP110.dll/MSVCP120.dll is missing?
The benchmark executable is compiled with Visual Studio 2013 and therefore needs the Visual C++ Redistributable Packages for Visual Studio 2013. Download vcredist_x86.exe to run GPUPI.exe (32 bit) or vcredist_x64.exe to run GPUPI_x64.exe (64 bit).
The legacy version of the benchmark up to GPUPI 2.2 has been built with Visual Studio 2012. So you will have to install the Visual C++ Redistributable Packages for Visual Studio 2012 instead.
Error: High Precision Event Timer for time measurement not found!
GPUPI needs a timer with a very high resolution to ensure that the time measurement for a benchmark run is precise. Therefore you need to have the High Precision Event Timer (HPET) enabled in your BIOS settings and your system settings. To check the status of the latter, open up a command prompt with administration rights and look up the value of useplatformclock in the boot configuration. It should be "Yes". If it's not, you can fix this by running:
Code:
bcdedit /set useplatformclock yes
A reboot might be necessary afterwards.
What is the Batch Size?
The Batch Size is the number of partial calculations that will be used to calculate the result. So 1M means one million simultaneously invoked kernel calls on the OpenCL device before the partial sums are reduced to only one, which in turn is used in the final equation to calculate the digits of Pi. The Batch Size is therefore important for two things: core utilization and memory allocation.
Most important, especially for GPUs, is the utilization of the cores on the device. Some GPUs handle a higher number of kernel calls at once more efficiently than others. It's a fine line and you should test multiple batch sizes on a new graphics card. But choose wisely, as a bigger batch size also increases the memory usage, and that memory consequently has to be reduced. Therefore the calculation and reduction times are shown separately after the benchmark run, so you can see the impact of your current batch size setting.
By the way, the batches that are shown while running the benchmark are not equal to the batches you can set the size for. Those are pseudo batches that contain multiple smaller batches to let you know the progress of the calculation.
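To get a feeling for the trade-off, a back-of-the-envelope calculation like the following can help. It is only a sketch under assumptions: `batch_plan` is a hypothetical helper, 1M is taken as 1,000,000, each partial result is assumed to occupy 16 bytes (two 8-byte doubles), and the total of one billion terms is purely illustrative. GPUPI's real memory layout may differ.

```python
def batch_plan(total_terms, batch_size, bytes_per_partial=16):
    """Return (number of batches, bytes needed for one batch of partial results)."""
    batches = -(-total_terms // batch_size)  # ceiling division
    buffer_bytes = batch_size * bytes_per_partial
    return batches, buffer_bytes


for size in (1_000_000, 4_000_000, 16_000_000):
    batches, buffer_bytes = batch_plan(1_000_000_000, size)
    print(size, batches, buffer_bytes / 1e6, "MB")
```

Fewer, larger batches keep more cores busy per call, but each one leaves a bigger buffer to reduce.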
How can I reset the configuration?
After the first benchmark run the application will create a configuration file in the current working directory named GPUPI.cfg. If it's deleted, the benchmark will load its default configuration. Be sure to have the benchmark closed while doing this.
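If you prefer to script the reset, something along these lines should do. It is a minimal sketch: `reset_config` is a hypothetical helper name, and it assumes the configuration file is called GPUPI.cfg and lives in the directory you pass in, as described above.

```python
from pathlib import Path


def reset_config(working_dir="."):
    """Delete GPUPI.cfg in `working_dir`; return True if a file was removed."""
    config = Path(working_dir) / "GPUPI.cfg"
    if config.is_file():
        config.unlink()
        return True
    return False
```

Call `reset_config()` from the benchmark's working directory while GPUPI is closed.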
Can GPUPI handle multiple graphics cards in SLI/Crossfire?
Yes, it can! Use GPUPI 2.0 or higher to make use of multiple graphics cards or even multiple CPUs in your system.
What's done to verify the results?
The hexadecimal output was verified against the result tables of Fabrice Bellard. For 32M I had to run y-Cruncher to be sure. These verified digits are stored in a different form in the executable and are used to verify the result after each run.
I get the error message "Display driver has stopped responding and has recovered", but I didn't overclock anything!
All graphics card drivers have a built-in watchdog that protects the hardware from infinite loops. It kicks in after about five seconds and usually causes a short black screen, but that is nothing to worry about. If you did not overclock, this indicates that your GPU cannot handle a single batch of the calculation before the watchdog believes something is wrong. Just use a smaller Batch Size in the settings dialog before each run and it should be fine. This mostly happens with cheaper graphics cards.
I get "Invalid result", but I didn't overclock anything!
These strange issues happen with old driver versions. There seem to be some precision bugs in floating point operations that alter the final result. Sometimes it's just a few digits, sometimes it's more. Be sure to keep your driver up to date before running the benchmark to avoid problems like that.
Is it possible to run GPUPI on Windows XP?
Yes, but with restrictions. Only the pure OpenCL version (GPUPI_OpenCL.exe) runs on XP, because the new CUDA toolkits no longer support it. It still might be difficult to get the OpenCL drivers working. For Radeon cards the AMD APP SDK v2.5 might help you out.
I get the error "Error: Could not create context!" and a popup stating "Could not start worker thread" with my Intel Core2Duo or Core2Quad!
This is a bug in the Intel OpenCL drivers, which no longer work with these old CPUs. Install the AMD OpenCL drivers that are included with the AMD Catalyst software instead; they will work. You will have to plug in an AMD card to be able to install them, though. If you don't have one at your disposal, you can try to install the APP SDK 2.9, but the drivers won't be as new (and maybe not as fast) as the latest Catalyst.
GPUPI crashes when launching the application.
- Using Windows 7? Try installing SP1, if you haven't yet.
- Reinstall Visual C++ Redistributable Packages for Visual Studio 2013 (or 2012 if using the legacy version), it may be faulty. For download links see the FAQ entry with missing MSVCP110.dll/MSVCP120.dll.
I can not install the latest or an older Intel OpenCL driver. The error message says that a "higher version of the OpenCL runtime for Intel Core and Xeon Processors is already installed".
Open the registry editor and delete the following two registry keys:
I have installed the latest Intel OpenCL 2.0 drivers but only OpenCL 1.2 shows up!
Your system will most likely have the old version of the Intel OpenCL driver linked. This can be easily fixed by deleting all old Intel OpenCL DLLs in the following registry key: HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\OpenCL\Vendors\
Search for "IntelOpenCL64.dll" or "Intelocl64.dll" and delete the value. Change the key name to "C:\Program Files (x86)\Common Files\Intel\OpenCL\bin\x64\Intelocl64.dll" (or wherever your Intelocl64.dll file is).
Why is the benchmark on my system so much slower than on other comparable systems?
- Your memory may be overclocked too high and is therefore unstable, causing efficiency problems. Try stock clocks and have a look at the reduction time in the statistics shown below the result.
- This might be related to driver problems as well. Uninstall the OpenCL or CUDA drivers you are currently using, install a different driver version and try again.