Oleg Zabluda's blog
Saturday, April 01, 2017
 
99% matrix sparsity is generally not enough for things to be faster on GPU than simply doing dense multiplication.
99% matrix sparsity is generally not enough for sparse multiplication to be faster on a GPU than simply doing dense multiplication. Easy-to-understand benchmark data is hard to get. For example, NVIDIA will typically say that their cuSparse is 2x-5x faster than Intel's MKL, but nowhere will you find how many FLOP/s they actually get:
https://developer.nvidia.com/cusparse
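
To see what this looks like in practice, here is a minimal benchmark sketch (not from the original post) using CuPy, which dispatches to cuSPARSE for sparse and cuBLAS for dense products; the matrix size, density and dtype are arbitrary illustrative choices. It times SpMM on a 99%-sparse CSR matrix against a plain dense GEMM and reports effective GFLOP/s for each.

# Sketch: 99%-sparse SpMM vs. dense GEMM on a GPU via CuPy.
# CuPy calls cuSPARSE (sparse) and cuBLAS (dense) under the hood.
# n and density are illustrative; results vary by GPU and matrix structure.
import time
import cupy as cp
import cupyx.scipy.sparse as cusp

n = 4096
A_sparse = cusp.random(n, n, density=0.01, format='csr', dtype=cp.float32)  # 99% zeros
A_dense = A_sparse.toarray()
B = cp.random.rand(n, n, dtype=cp.float32)

def gpu_time(f, repeats=5):
    f()                                   # warm-up run
    cp.cuda.Device().synchronize()
    t0 = time.perf_counter()
    for _ in range(repeats):
        f()
    cp.cuda.Device().synchronize()
    return (time.perf_counter() - t0) / repeats

t_sparse = gpu_time(lambda: A_sparse @ B)   # SpMM: 2 * nnz * n useful FLOPs
t_dense  = gpu_time(lambda: A_dense @ B)    # GEMM: 2 * n^3 FLOPs

flops_sparse = 2 * A_sparse.nnz * n
flops_dense  = 2 * n**3
print(f"SpMM: {t_sparse*1e3:8.2f} ms  {flops_sparse / t_sparse / 1e9:8.1f} GFLOP/s (useful)")
print(f"GEMM: {t_dense*1e3:8.2f} ms  {flops_dense / t_dense / 1e9:8.1f} GFLOP/s")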

Here is the secret knowledge for you. On slide 29, AMD brags about how clSparse is faster than cuSparse:
http://www.iwocl.org/wp-content/uploads/iwocl-2016-clsparse-vendor-blas-library.pdf

getting a whopping 40 GFLOP/s (geometric mean across benchmarks) on a GPU (Fury X) with 8600 GFLOP/s peak (40/8600 ≈ 0.47% utilization), vs 10 GFLOP/s for cuSparse on a Titan X Maxwell with 6144 GFLOP/s peak (10/6144 ≈ 0.16% utilization).
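
Utilization here is just achieved throughput over peak throughput; a quick check of the arithmetic, using the achieved and peak GFLOP/s figures quoted above:

# Utilization = achieved GFLOP/s / peak GFLOP/s, numbers as quoted above.
peak = {"Fury X": 8600.0, "Titan X Maxwell": 6144.0}    # FP32 peak, GFLOP/s
achieved = {"Fury X": 40.0, "Titan X Maxwell": 10.0}    # geometric mean, GFLOP/s
for gpu in peak:
    print(f"{gpu}: {achieved[gpu] / peak[gpu]:.2%} utilization")
# Fury X: 0.47% utilization
# Titan X Maxwell: 0.16% utilization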
