After months of hard work, we proudly announce the official release of GPUPI 3. The third generation of our compute benchmark is finally feature complete. Version 3 brings integrated hardware detection via HWiNFO, a new Mixed Multi-GPU Mode to unite different GPUs like AMD and NVIDIA, a brand new Command Line Mode, a new logo and a major speedup for all devices and platforms!
New Hardware supported
- Titan V
- Vega 56 and Vega 64
- Coffee Lake
- Intel Xeon Phi (Many Integrated Core Architecture)
- Raven Ridge (Ryzen 5 2400G, Ryzen 3 2200G)
- Ryzen 2
Optimizations of the Pi Algorithm
For the first time since GPUPI 1.x, the code inside the calculation core has been improved. GPUPI had some speed increases before, but only because new hardware was officially supported or optimized, or because new CUDA/OpenCL versions introduced features that GPUPI could use to stay on the edge of what's out there. This time we were able to trim the whole core, especially the 128-bit integer calculations. The compute kernels are now cleaner than ever and need fewer code paths to stay backwards compatible. Additionally, results are now more comparable across different hardware architectures and driver platforms. Last but not least, the new core delivers a huge speedup and calculates pi faster on all compatible devices.
We know that the difference in speed raises the question of whether results from different hardware can still be compared credibly. It also forces overclockers around the world to rebench their hardware to stay on top of the ranks. We thought hard about this and decided to go for it, pushing things forward for the sake of releasing the best version of GPUPI that we can currently offer. For detailed information about these implications, please read our remarks on the HWBOT forums.
The following performance improvements have been implemented in GPUPI 3:
- Optimized 128-bit integer arithmetic
- Support for CUDA 9.x with reduced kernel call latency and Memory Reduction with Warp Shuffling (NVIDIA Pascal and above)
- Improved kernel synchronization for OpenCL to reduce overhead
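To illustrate what 128-bit integer arithmetic involves on hardware with 64-bit registers, here is a minimal sketch of a 64x64 to 128-bit multiplication built from 32-bit halves. This is not GPUPI's kernel code; the function name `mul_64x64_to_128` and the limb layout are assumptions chosen for illustration, and Python is used only because its big integers make the result easy to check.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def mul_64x64_to_128(a, b):
    """Multiply two unsigned 64-bit values and return (high, low) 64-bit words.

    Illustrative only: the split into 32-bit halves mirrors what a compute
    kernel without a native 128-bit type has to do.
    """
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    # Four 32x32 -> 64-bit partial products.
    ll = a_lo * b_lo
    lh = a_lo * b_hi
    hl = a_hi * b_lo
    hh = a_hi * b_hi

    # Sum the middle column (bits 32..63); its overflow carries into the high word.
    mid = (ll >> 32) + (lh & MASK32) + (hl & MASK32)

    low = ((mid & MASK32) << 32) | (ll & MASK32)
    high = (hh + (lh >> 32) + (hl >> 32) + (mid >> 32)) & MASK64
    return high, low
```

Every operation in the sketch stays within 64 bits, which is why this pattern maps directly onto GPU integer units; the fewer such partial products and carries a kernel needs, the faster it runs.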
To give you some numbers, we benched GPUPI 2.3.4 and GPUPI 3.3 with 2x GTX 1080 Ti on 32B. The 32 billionth digit of pi is now calculated in 4 minutes and 37 seconds!
[Figure: GPUPI 3.3 vs GPUPI 2.3.4 - guess which version wins!]
Debug Log and Tooltips
[Figure: GPUPI 3 - Tooltips and Debug Log]
Hardware Detection via HWiNFO
[Figure: GPUPI 3 - Hardware Detection]
Mixed Multi-GPU Mode
[Figure: A GTX 1080 and an R9 390 combined on a GIGABYTE Z370 AORUS ULTRA GAMING WIFI]
Painless Timers
Until now, GPUPI relied heavily on the High Precision Event Timer (HPET) for robust timing. With version 3 we finally allow time measurement via the integrated Timestamp Counter (TSC) on all platforms that are not prone to clock skewing on Windows 8 and above. Our new timer code gracefully falls back to RTC or HPET to ensure secure timing. If HPET is absolutely necessary to produce competitive benchmark results, a dialog will pop up asking to enable HPET on the system, which requires a reboot to take effect. HPET can be disabled again later via the Tools menu by selecting "Disable HPET".
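The fallback behaviour described above can be pictured as walking a preference list and taking the first timer that is both available and not prone to skew. The sketch below is only a model of that idea: the `TimerSource` fields, the `pick_timer` helper and the order of RTC before HPET are assumptions for illustration, not GPUPI's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TimerSource:
    name: str
    available: bool   # assumed flag: the timer can be used on this system
    skew_prone: bool  # assumed flag: the timer may drift, e.g. under overclocking


def pick_timer(sources: Sequence[TimerSource]) -> Optional[TimerSource]:
    """Return the first available, skew-free timer in preference order.

    Returns None when no trustworthy timer exists.
    """
    for source in sources:
        if source.available and not source.skew_prone:
            return source
    return None
```

With a list such as TSC, RTC, HPET, a platform whose TSC is flagged as skew-prone and whose RTC is unavailable would end up on HPET, which corresponds to the case where the dialog asks to enable it.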
To make things easier for casual benchers, GPUPI now allows calculating pi even without a secure timer. For the sake of comparability, any submission to HWBOT is then prohibited.
Command Line Mode
By using GPUPI-CLI.exe you can now calculate pi via the command line interface. Be sure to start PowerShell or Command Prompt with Administrator rights to prevent the benchmark from running in a separate window. GPUPI needs Administrator privileges to temporarily install the hardware detection driver.
The Command Line Mode is especially interesting for benchers and reviewers who like to automate GPUPI.
[Figure: From left to right: listing all devices, test automation and benchmark run, help with a list of all command line parameters]
- GPUPI-CLI.exe -h ... Help
- GPUPI-CLI.exe -l ... List all devices on the system
- GPUPI-CLI.exe -d 1B -g ... Calculate 1B on all graphics cards in the system and let the automatic testing find the best Compute Platform, Batch Size and Reduction Size
- GPUPI-CLI.exe -d 1B -c -a OpenCL -b 20M -r 512 ... Calculate 1B on all OpenCL compatible graphics cards in the system and use Batch Size 20M and Reduction Size 512
- GPUPI-CLI.exe -d 100M -c ... Calculate 100M on all CPUs in the system and test for the best combination of benchmark parameters (this will take a while)
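For automation, the parameters listed above can be assembled by a small wrapper script. The following Python sketch uses only the flags shown in the list (`-d`, `-g`, `-c`, `-a`, `-b`, `-r`); the helper names `build_gpupi_args` and `run_gpupi` are hypothetical, and since the output format and exit codes of GPUPI-CLI.exe are not documented here, the sketch simply returns whatever the tool prints.

```python
import subprocess


def build_gpupi_args(digits, use_gpus=False, use_cpus=False, api=None,
                     batch_size=None, reduction_size=None, exe="GPUPI-CLI.exe"):
    """Assemble a GPUPI-CLI.exe argument list from the documented flags."""
    args = [exe, "-d", str(digits)]
    if use_gpus:
        args.append("-g")          # all graphics cards
    if use_cpus:
        args.append("-c")          # flag as shown in the parameter list
    if api is not None:
        args += ["-a", str(api)]   # compute platform, e.g. "OpenCL"
    if batch_size is not None:
        args += ["-b", str(batch_size)]
    if reduction_size is not None:
        args += ["-r", str(reduction_size)]
    return args


def run_gpupi(args):
    """Run the benchmark and return its raw console output.

    Assumption: the script itself runs from an elevated prompt, as required
    for the hardware detection driver.
    """
    result = subprocess.run(args, capture_output=True, text=True)
    return result.stdout
```

Calling `build_gpupi_args("1B", use_gpus=True)` reproduces the automatic 1B run from the list, and omitting the batch and reduction sizes leaves the parameter search to GPUPI.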
Thanks to our longtime community member Jackinger we have a new logo and we love it! Yes, it's actually two logos that are used depending on the icon size we need.
[Figure: We love our new logo! Both variants!]
Further improvements
- HWBOT submission is now encrypted via HTTPS, screenshots are mandatory to help with result moderation, HWBOT data files are now properly named (e.g. 2x_GeForce_GTX_1080_Ti_GPUPI_32B_04m-37.242s.hwbot)
- New Reduction Size: 1024
- Support for OpenCL 2.0 (no speedup yet, but faster initialization of devices)
- Improved Device Detection for AMD graphics cards
- Detailed output of all used Compute Platforms and their versions
- New setting: "Run Confirmation"
- New Legacy Version that improves compatibility with old hardware
- Detailed and better formatted error messages
- New method to display the benchmark progress without flickering when benching graphics cards
- Revised the about dialog
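The new data file names make it possible to sort results in a script without opening the files. The sketch below parses the one example name given in the list above; the pattern is inferred from that single example, so the optional device-count prefix and the minutes/seconds layout are assumptions, and `parse_hwbot_name` is a hypothetical helper.

```python
import re

# Pattern inferred from "2x_GeForce_GTX_1080_Ti_GPUPI_32B_04m-37.242s.hwbot".
HWBOT_NAME = re.compile(
    r"^(?:(?P<count>\d+)x_)?"          # optional device count (assumed optional)
    r"(?P<device>.+)_GPUPI_"           # device name with underscores
    r"(?P<digits>[0-9]+[A-Za-z]+)_"    # benchmark size, e.g. 32B
    r"(?P<minutes>\d+)m-(?P<seconds>\d+\.\d+)s\.hwbot$"
)


def parse_hwbot_name(file_name):
    """Split a GPUPI data file name into its parts, or return None."""
    match = HWBOT_NAME.match(file_name)
    if match is None:
        return None
    return {
        "count": int(match.group("count") or 1),
        "device": match.group("device").replace("_", " "),
        "digits": match.group("digits"),
        "seconds": int(match.group("minutes")) * 60 + float(match.group("seconds")),
    }
```

Names that do not fit the inferred pattern return `None` rather than raising, so a batch script can skip them and report which files need a closer look.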
A big thank you goes out to everybody who helped test the new version or supported us with hardware:
- Martin Malik, developer of HWiNFO
- Jackinger for the new logo
- Intel Germany
- GIGABYTE Germany
- Blaues U-boot for sending us his GTX 280
- Garbage for his tireless testing of exotic and unreleased CPUs and APUs
- MisterHM80 for testing the new support of Intel Xeon Phis
- steponz for letting us fix GPUPI for the Titan V over remote control