- Foreword
- CUDA installation
- Installing CUDA environment
- Measuring GPUs performance
- Linpack benchmark for CUDA
- Tests results
- One Tesla S2050 GPU (428.9 GFlop/s)
- Two Tesla S2050 GPUs (679.0 GFlop/s)
- Four Tesla S2050 GPUs (1363 GFlop/s)
- Two Tesla K20m GPUs (1789 GFlop/s)
- CUBLAS by example
- General remarks on the examples
- CUBLAS Level-1. Scalar and vector based operations
- cublasIsamax, cublasIsamin - maximal, minimal elements
- cublasSasum - sum of absolute values
- cublasScopy - copy vector into vector
- cublasSdot - dot product
- cublasSnrm2 - Euclidean norm
- cublasSrot - apply the Givens rotation
- cublasSrotg - construct the Givens rotation matrix
- cublasSscal - scale the vector
- cublasSswap - swap two vectors
- CUBLAS Level-2. Matrix-vector operations
- cublasSger - rank one update
- cublasStbsv - solve the triangular banded linear system
- cublasStpsv - solve the packed triangular linear system
- cublasStrsv - solve the triangular linear system
- CUBLAS Level-3. Matrix-matrix operations
- cublasStrsm - solving the triangular linear system
- MAGMA by example
- General remarks on Magma
- Remarks on installation and compilation
- Remarks on hardware used in examples
- Magma BLAS
- LU decomposition and solving general linear systems
- QR decomposition and the least squares solution of general systems
- Eigenvalues and eigenvectors for general matrices
- Eigenvalues and eigenvectors for symmetric matrices
- Singular value decomposition
Chapter 3
CUBLAS by example
3.1 General remarks on the examples
CUBLAS is an abbreviation for CUDA Basic Linear Algebra Subprograms. The file /usr/local/cuda-5.5/doc/pdf/CUBLAS_Library.pdf contains a detailed description of the CUBLAS library syntax, and we shall avoid repeating the information contained there. Instead, we present a series of examples showing how to use the library.
All subprograms have four versions corresponding to four data types:
s, S - float: real single-precision,
d, D - double: real double-precision,
c, C - cuComplex: complex single-precision,
z, Z - cuDoubleComplex: complex double-precision.
For example cublasI<t>amax is a template which can represent cublasIsamax, cublasIdamax, cublasIcamax or cublasIzamax.
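The `<t>` substitution can be illustrated on the CPU with a small macro that stamps out one function per data type. This is only a sketch in plain C, not CUBLAS code: the names `DEFINE_REF_IAMAX`, `ref_isamax` and `ref_idamax` are hypothetical, and only the real `s` and `d` variants are shown. It assumes the usual BLAS convention for this routine: the result is the 1-based index of the first element of largest absolute value, and 0 is returned when `n < 1` or `incx < 1`.

```c
#include <stddef.h>

/* Hypothetical CPU reference for the I<t>amax family (not a CUBLAS call).
   NAME is the generated function name, T is the element type.
   Returns the 1-based index of the first element with the largest
   absolute value, or 0 if n < 1 or incx < 1. */
#define DEFINE_REF_IAMAX(NAME, T)                              \
    int NAME(int n, const T *x, int incx)                      \
    {                                                          \
        if (n < 1 || incx < 1) return 0;                       \
        int best = 1;                                          \
        T best_abs = x[0] < 0 ? -x[0] : x[0];                  \
        for (int i = 1; i < n; ++i) {                          \
            T v = x[(size_t)i * (size_t)incx];                 \
            T a = v < 0 ? -v : v;                              \
            if (a > best_abs) { best_abs = a; best = i + 1; }  \
        }                                                      \
        return best;                                           \
    }

DEFINE_REF_IAMAX(ref_isamax, float)   /* <t> = s: real single precision */
DEFINE_REF_IAMAX(ref_idamax, double)  /* <t> = d: real double precision */
```

For the vector (1, -7, 3, 7), `ref_isamax(4, x, 1)` gives 2, since the first element of largest magnitude is the second one. The library routines follow the same naming pattern, with the GPU call in the current API taking a handle and returning the index through a pointer, for example `cublasIsamax(handle, n, x, incx, &result)`.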
We restrict the examples in this chapter to the single-precision versions, because low-end devices have limited double-precision capabilities. The changes needed in the double-precision case are not significant. Most examples use real data, but complex cases are also considered (see the subsections with titles of the form cublasC*).
The CUBLAS Library User Guide contains an example showing how to check for errors returned by API calls. Ideally, every API call should be checked for errors. Unfortunately, such an approach doubles the length of our sample codes (which are as short as possible by design). Since our set of CUBLAS sample code (without error checking) is already 80 pages long, we have decided to omit error checking and to focus on explanations that cannot be found in the User Guide. The reader