GPUPI: International support thread

mat 11.11.2014 280630 1

About two months ago our beloved Turrican passed away in a horrible car accident. He was a big part of our community and is still Austria's only overclocking legend! Out of my sadness and anger I started to work on this project as a virtual monument, something to honour him in our scene. It's an homage to SuperPI, which Turrican benched on every possible platform, and it calculates pi fully in parallel on graphics cards and CPUs. So let's get our gear going and do exactly what our Karl showed us in all his years: Bench the hell out of GPUPI!

DOWNLOAD: GPUPI is officially integrated into BenchMate. Download BenchMate to get the latest version of GPUPI.

German version

Legacy Version for Windows XP

Supports Windows XP SP3, 32 bit, OpenCL 1.1 and GeForce 200 to 500 series cards. Slower on modern hardware!

CUDA/OpenCL drivers for GPUs: Install the latest graphics card drivers!

OpenCL drivers for CPUs:

AMD APP SDK 3.0 (OpenCL 2.0, just install the driver, uncheck the rest)
AMD APP SDK 2.9.1 (OpenCL 1.2, just install the driver, uncheck the rest)
Intel CPU Runtime for OpenCL 3.0 (only with BenchMate 11+)
Intel CPU Runtime for OpenCL 2.1 (quick registration necessary)

Archived Versions:

GPUPI 2.3.4

Technical Details

GPUPI calculates the mathematical constant pi in parallel using the BBP (Bailey-Borwein-Plouffe) formula, optimized for OpenCL-capable devices such as graphics cards and main processors. It's implemented in C++ with the STL and pure Win32 to avoid unnecessary dependencies. The result of the benchmark is exactly nine hexadecimal digits of pi. So if you calculate pi at 1B (1 billion), it will not output all digits the way serial pi calculations do, but only the nine digits after the billionth hexadecimal digit. This limitation is due to the nature of parallel implementations and of the pi formula used.
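To make the "only a few hexadecimal digits at a given position" idea concrete, here is a minimal serial sketch of digit extraction with the Bailey-Borwein-Plouffe series in plain Python. It is not GPUPI's kernel code: the function names `bbp_series` and `pi_hex_digits` are made up for illustration, and it uses ordinary doubles, so it is only trustworthy for small positions and a handful of digits.

```python
def bbp_series(j, n):
    """Fractional part of sum over k of 16^(n-k) / (8k + j)."""
    s = 0.0
    # Terms with k <= n: modular exponentiation keeps the numerator small.
    for k in range(n + 1):
        denom = 8 * k + j
        s = (s + pow(16, n - k, denom) / denom) % 1.0
    # Terms with k > n: they shrink by a factor of 16 each step.
    k = n + 1
    while True:
        term = 16.0 ** (n - k) / (8 * k + j)
        if term < 1e-17:
            break
        s += term
        k += 1
    return s % 1.0


def pi_hex_digits(n, count=8):
    """Hex digits of pi that follow the first n digits after the point."""
    x = (4 * bbp_series(1, n) - 2 * bbp_series(4, n)
         - bbp_series(5, n) - bbp_series(6, n)) % 1.0
    digits = ""
    for _ in range(count):
        x *= 16
        d = int(x)
        digits += "0123456789ABCDEF"[d]
        x -= d
    return digits


# Pi in hexadecimal starts with 3.243F6A88...
print(pi_hex_digits(0))
```

Each term of the two loops is independent of the others, which is what makes this kind of calculation a good fit for thousands of GPU cores.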

The BBP formula used for calculating pi in this benchmark
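For readers who cannot see the image, this is the standard textbook form of the Bailey-Borwein-Plouffe series. How GPUPI arranges the terms internally is not stated in this post, so take it as a reference only:

```latex
\pi = \sum_{k=0}^{\infty} \frac{1}{16^{k}}
      \left( \frac{4}{8k+1} - \frac{2}{8k+4} - \frac{1}{8k+5} - \frac{1}{8k+6} \right)
```

The factor 1/16^k is the reason the formula yields hexadecimal digits: multiplying by 16^n shifts the n-th hex digit right behind the point.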

The calculation is split up into smaller packages called batches, each of which consists of millions of evaluations of the BBP series term spread across every available compute core of all selected devices. Afterwards the millions of intermediate results are accumulated in the device's memory to get a single result per batch. This technique is very common in parallel applications and is called memory reduction. Use the Batch Size and Reduction Size in the settings dialog to fine-tune the calculations and reductions for your selected devices.
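The batch and reduction scheme described above can be sketched in a few lines of plain Python. This is only a CPU stand-in for what the OpenCL kernels do on the device. The names `run_in_batches` and `tree_reduce` are hypothetical, and treating the Reduction Size as "how many partial results are summed per step" is an assumption made for this example.

```python
def tree_reduce(values, reduction_size):
    """Sum groups of `reduction_size` values until one value is left."""
    if reduction_size < 2:
        raise ValueError("reduction_size must be at least 2")
    while len(values) > 1:
        values = [sum(values[i:i + reduction_size])
                  for i in range(0, len(values), reduction_size)]
    return values[0]


def run_in_batches(term, total_terms, batch_size, reduction_size):
    """Evaluate term(k) for k < total_terms, one batch at a time."""
    total = 0
    for start in range(0, total_terms, batch_size):
        end = min(start + batch_size, total_terms)
        # On a GPU every k in the batch would be its own work item.
        partial = [term(k) for k in range(start, end)]
        # One reduced value per batch, as described above.
        total += tree_reduce(partial, reduction_size)
    return total


# Toy series: sum of 0..999, split into batches of 128 terms.
print(run_in_batches(lambda k: k, 1000, 128, 16))
```

A bigger batch keeps more cores busy at once but also leaves a longer list of partial results to reduce, which is the trade-off the two settings control.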

The benchmark relies heavily on 64 bit integer performance. Additionally, each series term calculation of the BBP formula needs a double precision division. The result is stored and finally accumulated as a doubledouble: two doubles combined for even higher precision. Starting with 1B digits, each kernel has to make use of custom 128 bit integer routines where necessary.
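To illustrate what "two doubles combined" buys you, here is a minimal doubledouble addition in Python, whose floats are IEEE doubles. It uses the well-known error-free two-sum trick. The helper names `two_sum` and `dd_add` are made up for this sketch, and GPUPI's actual kernel routines may look different.

```python
def two_sum(a, b):
    """Return (s, err) with s = fl(a + b) and a + b = s + err exactly."""
    s = a + b
    bb = s - a
    err = (a - (s - bb)) + (b - bb)
    return s, err


def dd_add(x, y):
    """Add two doubledouble values, each given as a (hi, lo) pair."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    # Renormalize so that hi carries the leading bits again.
    return two_sum(s, e)


# A plain double loses the small addend completely ...
print(1.0 + 1e-20 == 1.0)
# ... while the doubledouble keeps it in the low part.
print(dd_add((1.0, 0.0), (1e-20, 0.0)))
```

When millions of tiny series terms are accumulated, the low part catches the rounding errors that a single double would silently drop.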

A few examples of GPUPI runs with 1B:

AMD Radeon R9 290, NVIDIA GeForce GTX 980 and Intel Core i7-4960X@4 GHz

Archived Versions

GPUPI 3.2 | GPUPI 3.2 - Legacy Version (supports XP, 32 bit, OpenCL 1.1 and GeForce 200 to 500 series)
GPUPI 3.1.1 | GPUPI 3.1.1 - Legacy Version (supports XP, 32 bit, OpenCL 1.1 and GeForce 200 to 500 series)
GPUPI 3.0.1
GPUPI 2.3.4 (1.22 MB, CRC-32: 88BE77A2)
GPUPI 2.3.4 - Legacy Version (only use for Windows XP/GeForce 200 series!, 626 KB, CRC-32: 5717B079)
GPUPI 2.2 (1.08 MB, CRC-32: 054753D5)
GPUPI 2.2 - Legacy Version (Windows XP, GeForce 200 series, 630 KB, CRC-32: 0D6E44A3)
GPUPI 2.1.2 (1.06 MB, CRC-32: CB9069F2)
GPUPI 2.1.2 - Legacy Version (Windows XP, GeForce 200 series, 590 KB, CRC-32: 1497D087)
GPUPI 2.0 (1.4 MB)

FAQ

What are the minimum requirements to run the benchmark?

GPUPI supports not only GPUs, but also CPU calculations. It needs at least OpenCL 1.x with double precision support. Target platform for the benchmark is Windows Vista and later. To bench on Windows XP, download the legacy version.

To check if your graphics card supports double precision, have a look at these (incomplete) lists on Wikipedia: AMD | NVIDIA

Which drivers do I need for CUDA/OpenCL?

NVIDIA GPUs: GeForce Driver
AMD GPUs: Graphics Card Driver
Intel GPUs: Graphics Card Driver
AMD CPUs with OpenCL 2.0 support: AMD APP SDK 3.0 (just install the driver, uncheck the rest)
AMD CPUs with OpenCL 1.2 support: AMD APP SDK 2.9.1 (often faster than OpenCL 2.0, just install the driver, uncheck the rest)
Intel CPUs: Intel CPU Runtime for OpenCL (quick registration necessary)

GPUPI says that MSVCP110.dll, MSVCP120.dll or MSVCP140.dll is missing?

GPUPI 3.x: The benchmark executable is compiled with Visual Studio 2015 and therefore needs the Visual C++ Redistributable Packages for Visual Studio 2015. You will only need to install the 64 bit version: vcredist_64.exe

GPUPI 3.x - Legacy Version: The "Legacy Version" of GPUPI 3.x is for older platforms and includes support for 32 bit versions of Windows as well as Windows XP. It therefore had to be compiled with Visual Studio 2013, so you'll need to install the Visual C++ Redistributable Packages for Visual Studio 2013. This version is 32 bit only, so just install: vcredist_x86.exe

GPUPI 2.x: The benchmark executable is compiled with Visual Studio 2013 and therefore needs the Visual C++ Redistributable Packages for Visual Studio 2013. Download vcredist_86.exe to run GPUPI.exe (32 bit) or vcredist_64.exe to run GPUPI_x64.exe (64 bit).

GPUPI 2.x - Legacy Version: The "Legacy Version" up to GPUPI 2.2 has been built with Visual Studio 2012. So you will have to install the Visual C++ Redistributable Packages for Visual Studio 2012 instead.

Why do I need the High Precision Event Timer (HPET) to get comparable results that can be uploaded to HWBOT?

Depending on your system configuration, the default timers might not be reliable enough to measure comparable scores for world records. In that case GPUPI will need you to enable HPET (High Precision Event Timer). You can check the status of your system timer configuration by opening a command prompt with administrator rights:

Code: SHELL
bcdedit /enum

The value of useplatformclock should be "Yes". If it's not, you can fix this by running:

Code: SHELL
bcdedit /set useplatformclock yes

A reboot will be necessary to apply any changes.
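If you want to script the check, a small Python helper can scan the text printed by bcdedit /enum for the useplatformclock entry. This is only a sketch: the function name `platform_clock_enabled` is made up, the sample text is a shortened, hand-written stand-in for real output, and it assumes an English Windows where the value is printed as "Yes".

```python
def platform_clock_enabled(bcdedit_output):
    """Return True if the text contains a 'useplatformclock  Yes' entry."""
    for line in bcdedit_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].lower() == "useplatformclock":
            return parts[1].lower() == "yes"
    # No entry at all means the option has never been set.
    return False


# Shortened, hand-written sample of what the command prints.
sample = """Windows Boot Loader
-------------------
identifier              {current}
useplatformclock        Yes
"""

print(platform_clock_enabled(sample))
```

On a real system you would feed it the output of the command, for example captured with `subprocess.run(["bcdedit", "/enum"], capture_output=True, text=True).stdout` from an elevated prompt.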

What is the Batch Size?

The Batch Size is the number of partial calculations that are run at once to calculate the result. So 1M means one million simultaneously invoked kernel calls on the OpenCL device before the partial sums are reduced to a single value, which is then used in the final equation to calculate the digits of pi. The Batch Size is therefore important for two things: core utilization and memory allocation.

Most important, especially for GPUs, is the utilization of the cores on the device. Some GPUs handle a higher number of simultaneous kernel calls more efficiently than others. It's a fine line, and you should test multiple batch sizes on a new graphics card. But choose wisely, as a bigger batch size also increases the amount of memory that has to be reduced afterwards. That is why the calculation time and the reduction time are shown separately after the benchmark run, so you can see the impact of your current batch size setting.

By the way, the batches that are shown while running the benchmark are not equal to the batches you can set the size for. Those are pseudo batches that contain multiple smaller batches to let you know the progress of the calculation.

Is it possible to run GPUPI on Windows XP?

Yes, use the "Legacy Version" of GPUPI!

Why is my NVIDIA graphics card not using its highest GPU and/or memory frequency during the benchmark run?

There is a known bug in CUDA that does not apply the correct P State during a workload. Use the NVIDIA Inspector and set the option "CUDA - Force P2 State" to Off in the driver's Base Profile. See this Screenshot to find this option. (Thanks to mllrkllr88 for providing the fix)

How can I reset the configuration?

After the first benchmark run the application will create a configuration file in the current working directory named GPUPI.cfg. Delete this file to load the default configuration. Be sure to have the benchmark closed while doing this.

Can GPUPI handle multiple graphics cards in SLI/Crossfire?

Yes, it can! Use GPUPI 2.0 or higher to make use of multiple graphics cards or even multiple CPUs in your system.

What's done to verify the results?

The hexadecimal output was verified against the result tables of Fabrice Bellard. For 32B I had to run y-Cruncher to be sure. These verified digits are stored in a different form in the executable and are used to verify the result after each run.

I get the error message "Display driver has stopped responding and has recovered", but I didn't overclock anything!

All graphics card drivers have a built-in watchdog that protects the hardware from infinite loops. It triggers after about five seconds and usually causes a brief black screen, but it's nothing to worry about. If you did not overclock, this indicates that your GPU cannot finish a single batch of the calculation before the watchdog assumes something is wrong. Just use a smaller Batch Size in the settings dialog before each run and it should be fine. This mostly happens with cheaper graphics cards.

I get "Invalid result", but I didn't overclock anything!

These strange issues happen with old driver versions. There seem to be some precision bugs in floating point operations that alter the final result. Sometimes it's just a few digits, sometimes it's more. Be sure to keep your driver up to date before running the benchmark to avoid problems like that.

I get the error "Error: Could not create context!" and a popup stating "Could not start worker thread" with my Intel Core2Duo or Core2Quad!

This is a bug in the Intel OpenCL drivers, which no longer work with these old CPUs. Install the AMD OpenCL drivers that are included with the AMD Catalyst software instead; they will work. You will have to plug in an AMD card to be able to install them, though. If you don't have one at your disposal, you can try to install the APP SDK 2.9, but its drivers won't be as new (and maybe not as fast) as the latest Catalyst.

GPUPI crashes when launching the application.

Using Windows 7? Try installing SP1 if you haven't yet.
Reinstall the Visual C++ Redistributable Packages; the installation may be faulty. For download links see the FAQ entry about the missing MSVCP110.dll, MSVCP120.dll and MSVCP140.dll files.

HWiNFO can not be initialized before the benchmark run. What can I do?

If HWiNFO is already running in the background, please close it.
If you have antivirus software or a tuning tool running (AVG, Kaspersky, ...), disable it temporarily. Windows Defender should not be a problem though.
If you are on Windows 7, install the KB3033929 update.

I can not install the latest or an older Intel OpenCL driver. The error message says that a "higher version of the OpenCL runtime for Intel Core and Xeon Processors is already installed".

Open the registry editor and delete the following two registry keys:

HKEY_LOCAL_MACHINE\SOFTWARE\Intel\OpenCL\cpu_version
HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Intel\OpenCL\cpu_version

Run the installer again, it should work now.

I have installed the latest Intel OpenCL 2.0 drivers but only OpenCL 1.2 shows up!

Your system most likely still has the old version of the Intel OpenCL driver linked. This can easily be fixed by removing all old Intel OpenCL DLLs from the following registry key: HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\OpenCL\Vendors\
Search for "IntelOpenCL64.dll" or "Intelocl64.dll" and delete the value. Change the key name to "C:\Program Files (x86)\Common Files\Intel\OpenCL\bin\x64\Intelocl64.dll" (or wherever your Intelocl64.dll file is).

Why is the benchmark on my system so much slower than on other comparable systems?

Your memory may be overclocked too high, which makes it unstable and causes efficiency problems. Try stock clocks and have a look at the reduction time in the statistics shown below the result.
This might be related to driver problems as well. Uninstall the OpenCL or CUDA drivers you are currently using, install a different driver version and try again.
