- Foreword
- CUDA installation
- Installing CUDA environment
- Measuring GPUs performance
- Linpack benchmark for CUDA
- Tests results
- One Tesla S2050 GPU (428.9 GFlop/s)
- Two Tesla S2050 GPUs (679.0 GFlop/s)
- Four Tesla S2050 GPUs (1363 GFlop/s)
- Two Tesla K20m GPUs (1789 GFlop/s)
- CUBLAS by example
- General remarks on the examples
- CUBLAS Level-1. Scalar and vector based operations
- cublasIsamax, cublasIsamin - maximal, minimal elements
- cublasSasum - sum of absolute values
- cublasScopy - copy vector into vector
- cublasSdot - dot product
- cublasSnrm2 - Euclidean norm
- cublasSrot - apply the Givens rotation
- cublasSrotg - construct the Givens rotation matrix
- cublasSscal - scale the vector
- cublasSswap - swap two vectors
- CUBLAS Level-2. Matrix-vector operations
- cublasSger - rank one update
- cublasStbsv - solve the triangular banded linear system
- cublasStpsv - solve the packed triangular linear system
- cublasStrsv - solve the triangular linear system
- CUBLAS Level-3. Matrix-matrix operations
- cublasStrsm - solving the triangular linear system
- MAGMA by example
- General remarks on Magma
- Remarks on installation and compilation
- Remarks on hardware used in examples
- Magma BLAS
- LU decomposition and solving general linear systems
- QR decomposition and the least squares solution of general systems
- Eigenvalues and eigenvectors for general matrices
- Eigenvalues and eigenvectors for symmetric matrices
- Singular value decomposition
Chapter 4
MAGMA by example
4.1 General remarks on Magma
MAGMA is an abbreviation for Matrix Algebra for GPU and Multicore Architectures (http://icl.cs.utk.edu/magma/). It is a collection of dense linear algebra routines, a successor of Lapack and ScaLapack, specially developed for heterogeneous GPU-based architectures.
Magma is an open-source project developed by Innovative Computing Laboratory (ICL), University of Tennessee, Knoxville, USA.
It includes
LU, QR and Cholesky factorization.
Hessenberg reduction.
Linear solvers based on LU, QR and Cholesky decompositions.
Eigenvalue and singular value problem solvers.
Generalized Hermitian-definite eigenproblem solver.
Mixed-precision iterative refinement solvers based on LU, QR and Cholesky factorizations.
A more detailed (but not complete) list of the procedures contained in Magma can be found in the table of contents. Complete information can be found, for example, in the magma-X.Y.Z/src directory. Note that the source files in this directory contain a precise syntax description of the Magma functions, so we do not repeat this information in our text (the syntax is also easily available on the Internet). Instead, we present a series of examples showing how to use the library.
All subprograms have four versions corresponding to four data types:
s - float: real single-precision,
d - double: real double-precision,
c - magmaFloatComplex: complex single-precision,
z - magmaDoubleComplex: complex double-precision.
For example, magma_i<t>amax is a template which can represent magma_isamax, magma_idamax, magma_icamax or magma_izamax.
We shall restrict our examples to the most popular real, single and double precision versions. The single precision versions are important because users own millions of inexpensive GPUs with restricted double precision capabilities. Installing Magma on such devices can be a good starting point for more advanced studies. On the other hand, double precision is necessary in many applications, so we have decided to present our examples in both versions (in the Magma BLAS case only in single precision). In most examples we measure the computation times, so one can compare the performance in both precisions.
Ideally we should check for errors on every function call. Unfortunately, such an approach doubles the length of our sample codes (which are as short as possible by design). Since our set of Magma sample codes (without error checking) is almost 140 pages long, we have decided to omit error checking and to focus on the explanations which cannot be found in the syntax descriptions.
To obtain more compact explanations, in our examples we restrict the full generality of Magma to the special case where the leading dimension of matrices is equal to the number of rows and the stride between consecutive elements of vectors is equal to 1. Magma allows for a more flexible approach, giving the user access to submatrices and subvectors. The corresponding generalizations can be found in the syntax descriptions in the source files.
4.1.1 Remarks on installation and compilation
Magma can be downloaded from http://icl.cs.utk.edu/magma/software/index.html. In the Magma directory obtained after extraction of the downloaded magma-X.Y.Z.tar.gz file there is a README file which contains installation instructions. The user must provide make.inc, which specifies where CUDA, BLAS and Lapack are installed in the system. Some sample make.inc files are contained in the Magma directory. After proper modification of the make.inc file, running