CUBLAS and MAGMA by example.pdf

Chapter 2

Measuring GPUs performance

2.1 Linpack benchmark for CUDA

Registered developers can download from https://developer.nvidia.com/ a version of the Linpack benchmark prepared specially for CUDA. In August 2013 the current version for Tesla cards was hpl-2.0_FERMI_v15.tgz.

After uncompressing the archive, one obtains the directory hpl-2.0_FERMI_v15. We enter the directory

$ cd hpl-2.0_FERMI_v15

The file INSTALL contains installation instructions. The example file Make.CUDA should be edited. On our system we edited (only) the following lines:

TOPdir = $HOME/hpl-2.0_FERMI_v15
MPdir  = /usr/lib64/openmpi                 # Redhat/Centos default
MPinc  = -I/usr/include/openmpi-x86_64      # for OpenMPI
MPlib  = -L/usr/lib64/openmpi/lib
LAdir  = /opt/intel/mkl/lib/intel64         # MKL presence assumed !!!
LAinc  = -I/opt/intel/mkl/include
LAlib  = -L$(TOPdir)/src/cuda -ldgemm -L/usr/local/cuda/lib64 -lcuda -lcudart -lcublas -L$(LAdir) -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5
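Since the build depends on OpenMPI, MKL and CUDA being installed where Make.CUDA expects them, it can help to check the directories before running make. The following is a minimal sketch, not part of the HPL package: the helper name missing_dirs is hypothetical, and the directory list simply repeats the paths from the lines above, which may differ on other systems.

```python
from pathlib import Path


def missing_dirs(paths):
    """Return the entries of `paths` that are not existing directories."""
    return [p for p in paths if not Path(p).is_dir()]


if __name__ == "__main__":
    # Directories named in the edited Make.CUDA lines (system-specific).
    candidates = [
        "/usr/lib64/openmpi/lib",
        "/usr/include/openmpi-x86_64",
        "/opt/intel/mkl/lib/intel64",
        "/opt/intel/mkl/include",
        "/usr/local/cuda/lib64",
    ]
    for d in missing_dirs(candidates):
        print("missing:", d)
```

If the script prints nothing, all listed directories exist; any printed path should be corrected in Make.CUDA before compiling.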

From this directory we can start the compilation

$ make

which creates a new executable xhpl in hpl-2.0_FERMI_v15/bin/CUDA. We can enter the directory

$ cd bin/CUDA

and edit two files, run_linpack and HPL.dat. For example, in the run_linpack script we edited (only) the two lines

HPL_DIR=$HOME/hpl-2.0_FERMI_v15

CPU_CORES_PER_GPU=8

(two eight-core CPUs + two S2050 GPUs in each of two nodes). The file HPL.dat contains the description of the problem to be solved. Linpack solves dense N x N systems of linear equations in double precision. Users can specify in HPL.dat the number of problems, their sizes and some other parameters. A detailed description of this file can be found in hpl-2.0_FERMI_v15/TUNING.
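The link between the problem size N and the reported performance can be sketched as follows. The sketch assumes the nominal operation count 2/3 N^3 + 3/2 N^2 for an LU-based solve, which to our knowledge is the count HPL uses when it prints Gflops. The function names and the 1000 s run time are hypothetical and only illustrate the conversion; they are not measured results.

```python
def hpl_flop_count(n: int) -> float:
    """Nominal floating-point operations for solving a dense n x n system by LU."""
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


def gflops(n: int, seconds: float) -> float:
    """Convert the wall-clock time of one solve into Gflop/s."""
    return hpl_flop_count(n) / seconds / 1e9


if __name__ == "__main__":
    n = 100000  # the N used in the HPL.dat of this section
    print(f"nominal operations: {hpl_flop_count(n):.3e}")
    # 1000.0 s is a made-up time, used only to show the conversion.
    print(f"rate at 1000 s: {gflops(n, 1000.0):.1f} Gflop/s")
```

Because the operation count grows like N^3 while the storage grows like N^2, larger problems usually give the GPUs more work per byte transferred.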

For our benchmarks we have edited the sample HPL.dat file:

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
100000       Ns
1            # of NBs
768          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
2            Ps
2            Qs
16.0         threshold
1            # of panel fact
0 1 2        PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
2 8          NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
0 1 2        RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
0 2          BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1 0          DEPTHs (>=0)
1            SWAP (0=bin-exch,1=long,2=mix)
192          swapping threshold
1            L1 in (0=transposed,1=no-transposed) form
1            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
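To see what these settings imply, the following sketch estimates the storage for the N x N double-precision matrix and compares the process grid with the hardware described earlier. It ignores the right-hand side and HPL workspace, and it assumes one MPI process per GPU, which is our reading of the setup rather than something HPL.dat states. The function names are hypothetical.

```python
def matrix_bytes(n: int, bytes_per_element: int = 8) -> int:
    """Bytes needed to store a dense n x n matrix of doubles."""
    return n * n * bytes_per_element


def per_process_gib(n: int, p: int, q: int) -> float:
    """Approximate share of the matrix held by each process of a P x Q grid, in GiB."""
    return matrix_bytes(n) / (p * q) / 2**30


if __name__ == "__main__":
    n, nb, p, q = 100000, 768, 2, 2   # Ns, NBs, Ps, Qs from HPL.dat
    nodes, gpus_per_node = 2, 2       # hardware described in the text
    print(f"matrix size: {matrix_bytes(n) / 2**30:.1f} GiB")
    print(f"per process: {per_process_gib(n, p, q):.1f} GiB")
    print(f"processes in grid: {p * q}, GPUs available: {nodes * gpus_per_node}")
    # Ceiling division: number of column blocks of width NB.
    print(f"column blocks of width NB: {-(-n // nb)}")
```

Under these assumptions the P x Q = 2 x 2 grid gives one process for each of the four GPUs, and each process must hold its share of the matrix in host memory.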

Let us comment on the first ten lines of this file (beginning with HPL.out). The remaining lines were unchanged.
