Measuring GPU performance

2.1 Linpack benchmark for CUDA

Registered developers can download a version of the Linpack benchmark prepared specifically for CUDA from https://developer.nvidia.com/. In August 2013 the current version for Tesla cards was hpl-2.0_FERMI_v15.tgz.

Uncompressing the archive produces the directory hpl-2.0_FERMI_v15, which we enter:

$ cd hpl-2.0_FERMI_v15

The file INSTALL contains installation instructions. The example file Make.CUDA should be edited. On our system we edited only the following lines:

TOPdir = $HOME/hpl-2.0_FERMI_v15
MPdir = /usr/lib64/openmpi	# Redhat/Centos default
MPinc = -I/usr/include/openmpi-x86_64	# for OpenMPI
MPlib = -L/usr/lib64/openmpi/lib
LAdir = /opt/intel/mkl/lib/intel64	# MKL presence assumed !!!
LAinc = -I/opt/intel/mkl/include

LAlib = -L$(TOPdir)/src/cuda -ldgemm -L/usr/local/cuda/lib64 -lcuda -lcudart -lcublas -L$(LAdir) -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5
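Before compiling, it may be worth checking that the directories named in these lines actually exist on the machine. The following Python sketch is one way to do that; it is not part of the HPL distribution, and the helper names (parse_make_vars, referenced_dirs, missing_dirs) are ours. It handles only simple NAME = value lines with an optional # comment and does not expand Makefile references such as $(TOPdir), so the LAlib line is left out.

```python
import os

# Directory-valued lines copied from the edited Make.CUDA shown above.
MAKE_FRAGMENT = """\
TOPdir = $HOME/hpl-2.0_FERMI_v15
MPdir = /usr/lib64/openmpi  # Redhat/Centos default
MPinc = -I/usr/include/openmpi-x86_64  # for OpenMPI
MPlib = -L/usr/lib64/openmpi/lib
LAdir = /opt/intel/mkl/lib/intel64  # MKL presence assumed !!!
LAinc = -I/opt/intel/mkl/include
"""


def parse_make_vars(text):
    """Return {name: value} for simple 'NAME = value' lines, dropping # comments."""
    result = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        result[name.strip()] = value.strip()
    return result


def referenced_dirs(make_vars):
    """Collect directory paths: bare paths and the arguments of -I / -L flags."""
    dirs = []
    for value in make_vars.values():
        for token in value.split():
            if token.startswith(("-I", "-L")):
                token = token[2:]
            if token.startswith(("/", "$HOME")):
                # $HOME is expanded from the environment, as the shell would do.
                dirs.append(os.path.expandvars(token))
    return dirs


def missing_dirs(dirs):
    """Return the subset of dirs that does not exist on this machine."""
    return [d for d in dirs if not os.path.isdir(d)]


if __name__ == "__main__":
    for d in missing_dirs(referenced_dirs(parse_make_vars(MAKE_FRAGMENT))):
        print("missing:", d)
```

If the script prints nothing, all six directories were found; any path it reports should be corrected in Make.CUDA before running make.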

Still in the directory hpl-2.0_FERMI_v15, we can compile:

$ make

which creates a new executable xhpl in hpl-2.0_FERMI_v15/bin/CUDA. We enter that directory

$ cd bin/CUDA

and edit two files, run_linpack and HPL.dat. For example, in the run_linpack script we edited only the two lines

HPL_DIR=$HOME/hpl-2.0_FERMI_v15

CPU_CORES_PER_GPU=8

(each of our two nodes has two eight-core CPUs and two S2050 GPUs). The file HPL.dat describes the problem to be solved. Linpack solves dense N x N systems of linear equations in double precision. In HPL.dat users can specify the number of problems, their sizes and some other parameters. A detailed description of this file can be found in hpl-2.0_FERMI_v15/TUNING.
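The value CPU_CORES_PER_GPU=8 follows from the hardware layout just described. The short Python sketch below reproduces that arithmetic. The function names are ours, and the count of MPI processes assumes that each process drives exactly one GPU, which is an assumption on our part rather than something stated in the text.

```python
def cores_per_gpu(sockets_per_node, cores_per_socket, gpus_per_node):
    """CPU cores available to each GPU when a node's cores are shared evenly."""
    total_cores = sockets_per_node * cores_per_socket
    if total_cores % gpus_per_node != 0:
        raise ValueError("cores do not divide evenly among the GPUs")
    return total_cores // gpus_per_node


def mpi_processes(nodes, gpus_per_node):
    """Number of MPI processes, ASSUMING one process per GPU."""
    return nodes * gpus_per_node


# Our system: two nodes, each with two eight-core CPUs and two GPUs.
print("CPU_CORES_PER_GPU =", cores_per_gpu(2, 8, 2))
print("MPI processes     =", mpi_processes(nodes=2, gpus_per_node=2))
```

For a node with a different number of sockets, cores or GPUs, the same division gives the value to put in run_linpack.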

For our benchmarks we edited the sample HPL.dat file:

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
100000       Ns
1            # of NBs
768          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
2            Ps
2            Qs
16.0         threshold
1            # of panel fact
0 1 2        PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
2 8          NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
0 1 2        RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
0 2          BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1 0          DEPTHs (>=0)
1            SWAP (0=bin-exch,1=long,2=mix)
192          swapping threshold
1            L1 in (0=transposed,1=no-transposed) form
1            U in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
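Each line of HPL.dat starts with one or more values followed by a free-text label, and a "# of ..." line says how many values to read from the line after it. The Python sketch below reads the head of the file in that way and estimates the memory taken by the coefficient matrix. It is an illustration only: the parser (parse_head, leading_ints) is ours and not part of HPL, and the memory figure counts just the N x N double-precision matrix, ignoring the right-hand side and any workspace.

```python
# Head of the HPL.dat file shown above (two title lines plus ten parameter lines).
HPL_DAT_HEAD = """\
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
100000       Ns
1            # of NBs
768          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
2            Ps
2            Qs
"""


def leading_ints(line, count):
    """Return the first `count` whitespace-separated tokens of a line as ints."""
    return [int(token) for token in line.split()[:count]]


def parse_head(text):
    """Extract problem sizes, block sizes and process grids from the file head."""
    lines = text.splitlines()[2:]           # skip the two title lines
    n_count = leading_ints(lines[2], 1)[0]  # "# of problems sizes (N)"
    ns = leading_ints(lines[3], n_count)
    nb_count = leading_ints(lines[4], 1)[0]
    nbs = leading_ints(lines[5], nb_count)
    grid_count = leading_ints(lines[7], 1)[0]
    ps = leading_ints(lines[8], grid_count)
    qs = leading_ints(lines[9], grid_count)
    return {"Ns": ns, "NBs": nbs, "grids": list(zip(ps, qs))}


def matrix_bytes(n):
    """Bytes needed for an n x n matrix of 8-byte doubles."""
    return 8 * n * n


params = parse_head(HPL_DAT_HEAD)
n = params["Ns"][0]
p, q = params["grids"][0]
print(params)
print("matrix, total      :", matrix_bytes(n) / 1e9, "GB")
print("matrix, per process:", matrix_bytes(n) / (p * q) / 1e9, "GB (rough, even split)")
```

With N = 100000 the matrix alone takes 8 * 10^10 bytes, that is 80 GB in total or roughly 20 GB for each of the 2 x 2 processes, which is a useful check against the memory installed in each node before choosing N.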

Let us comment on the first ten lines of this file (beginning with HPL.out). The remaining lines were left unchanged.
