- Foreword
- CUDA installation
  - Installing CUDA environment
- Measuring GPUs performance
  - Linpack benchmark for CUDA
  - Tests results
    - One Tesla S2050 GPU (428.9 GFlop/s)
    - Two Tesla S2050 GPUs (679.0 GFlop/s)
    - Four Tesla S2050 GPUs (1363 GFlop/s)
    - Two Tesla K20m GPUs (1789 GFlop/s)
- CUBLAS by example
  - General remarks on the examples
  - CUBLAS Level-1. Scalar and vector based operations
    - cublasIsamax, cublasIsamin - maximal, minimal elements
    - cublasSasum - sum of absolute values
    - cublasScopy - copy vector into vector
    - cublasSdot - dot product
    - cublasSnrm2 - Euclidean norm
    - cublasSrot - apply the Givens rotation
    - cublasSrotg - construct the Givens rotation matrix
    - cublasSscal - scale the vector
    - cublasSswap - swap two vectors
  - CUBLAS Level-2. Matrix-vector operations
    - cublasSger - rank one update
    - cublasStbsv - solve the triangular banded linear system
    - cublasStpsv - solve the packed triangular linear system
    - cublasStrsv - solve the triangular linear system
  - CUBLAS Level-3. Matrix-matrix operations
    - cublasStrsm - solving the triangular linear system
- MAGMA by example
  - General remarks on Magma
  - Remarks on installation and compilation
  - Remarks on hardware used in examples
  - Magma BLAS
  - LU decomposition and solving general linear systems
  - QR decomposition and the least squares solution of general systems
  - Eigenvalues and eigenvectors for general matrices
  - Eigenvalues and eigenvectors for symmetric matrices
  - Singular value decomposition
1. HPL.out is used as the output file if the number in the next line is not equal to 6 or 7.
2. The number 6 means that the output goes to stdout. If it is replaced by 5 (for example), then the output goes to HPL.out.
3. The number 1 in the third line means that we want to solve exactly one system.
4. The number 100000 denotes the size of the system. Large systems can give better performance but need more memory.
5. The number 1 in the fifth line means that we shall try only one data block size.
6. The number 768 denotes the block size. It should be a multiple of 128 and can be selected experimentally.
7. The 0 in the next line denotes row-major process mapping (not changed in the sample HPL.dat file).
8. The next 1 denotes the number of process grids used (in our example only one). Since we have two nodes with two GPUs in each, to test four cards we choose one PxQ=2x2 grid. PxQ should be equal to the total number of tested GPUs.
9. The number 2 means that the first dimension of the grid is P=2.
10. The number 2 in the next line means that the second dimension of the grid is Q=2.
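Putting the items above together, the relevant fragment of HPL.dat looks roughly as follows (a sketch based on the stock HPL.dat layout; the trailing comments come from the standard file):

```
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
100000       Ns
1            # of NBs
768          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
2            Ps
2            Qs
```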
2.2 Tests results
At our disposal we had two nodes running Red Hat 6.3 and CUDA 5.5, with the following hardware:
- two-socket Xeon E5-2650 CPUs, 2.00 GHz,
- two Tesla S2050 GPUs,
- 256 GB RAM,
- Gigabit Ethernet.
2.2.1 One Tesla S2050 GPU (428.9 GFlop/s)
For one GPU we have used the parameters P=1, Q=1 in HPL.dat and have obtained the following results.
$ mpirun -np 1 ./run_linpack
=======================================================================
T/V N NB P Q Time Gflops
-----------------------------------------------------------------------
WR10L2L2     100000    768     1     1     1554.29    4.289e+02
-----------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=0.0039050 ...PASSED
=======================================================================
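The reported Gflops figure can be reproduced from the N and Time columns: HPL counts (2/3)N^3 + 2N^2 floating-point operations for one solve. A minimal sketch (the helper name hpl_gflops is ours, not part of HPL):

```python
# Reproduce HPL's reported performance from the problem size and wall time.
# HPL counts (2/3)*N^3 + 2*N^2 floating-point operations for one solve.
def hpl_gflops(n, seconds):
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
    return flops / seconds / 1e9  # operations per second, in GFlop/s

print(hpl_gflops(100000, 1554.29))  # about 428.9, the value reported above
```

The same formula recovers the other runs in this section, e.g. N=100000 in 981.87 s gives about 679 GFlop/s.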
2.2.2 Two Tesla S2050 GPUs (679.0 GFlop/s)
For two GPUs we have used the parameters P=1, Q=2 in HPL.dat and have obtained the following results.
$ mpirun -np 2 ./run_linpack
=======================================================================
T/V N NB P Q Time Gflops
-----------------------------------------------------------------------
WR10L2L2     100000    768     1     2      981.87    6.790e+02
-----------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=0.0035832 ...PASSED
=======================================================================
Remark. For two CPUs, using the CPU Linpack we have obtained 273.8 GFlop/s for N=100000.
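Each run ends with the scaled residual check printed above; HPL declares PASSED when this quantity stays below a threshold (16.0 in the stock HPL.dat). A tiny pure-Python sketch of the same formula on a hypothetical 2x2 system:

```python
import sys

# Scaled residual that HPL reports:
#   ||Ax - b||_oo / (eps * (||A||_oo * ||x||_oo + ||b||_oo) * N)
# illustrated on a hypothetical 2x2 system solved by Cramer's rule.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
x = [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
     (A[0][0] * b[1] - b[0] * A[1][0]) / det]

inf = lambda v: max(abs(t) for t in v)                 # vector oo-norm
r = [sum(A[i][j] * x[j] for j in range(2)) - b[i] for i in range(2)]
norm_A = max(sum(abs(a) for a in row) for row in A)    # matrix oo-norm
eps = sys.float_info.epsilon
scaled = inf(r) / (eps * (norm_A * inf(x) + inf(b)) * len(b))
print(scaled < 16.0)  # True: far below the PASS threshold
```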
2.2.3 Four Tesla S2050 GPUs (1363 GFlop/s)
For four GPUs we have used the parameters P=2, Q=2 in HPL.dat and have obtained the following results.
# For N=100000
$ mpirun -np 4 -host node1,node2 ./run_linpack
=======================================================================
T/V N NB P Q Time Gflops
-----------------------------------------------------------------------
WR10L2L2     100000    768     2     2      561.98    1.186e+03
-----------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=0.0037021 ...PASSED
=======================================================================
# For N=200000
$ mpirun -np 4 -host node1,node2 ./run_linpack
=======================================================================
T/V N NB P Q Time Gflops
-----------------------------------------------------------------------
WR10L2L2     200000   1024     2     2     3912.98    1.363e+03
-----------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=0.0038225 ...PASSED
=======================================================================
Remark. Setting the number of solved systems to 20 and their size to 200000, we have checked that the system is able to sustain about 1300 GFlop/s for over 30 hours.
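As a rough check on scaling, the figures above give a parallel efficiency of about 0.79 relative to the single-GPU run (a back-of-the-envelope sketch; note that the 4-GPU figure was obtained for N=200000, so the comparison is only indicative):

```python
# Parallel efficiency of the multi-GPU runs relative to one S2050 (428.9 GFlop/s).
single = 428.9                        # GFlop/s, one GPU, N=100000
runs = {2: 679.0, 4: 1363.0}          # number of GPUs -> measured GFlop/s
for gpus, gflops in runs.items():
    efficiency = gflops / (gpus * single)
    print(f"{gpus} GPUs: speedup {gflops / single:.2f}x, efficiency {efficiency:.2f}")
```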
2.2.4 Two Tesla K20m GPUs (1789 GFlop/s)
For two Kepler GPUs and two-socket Xeon E5-2665 CPUs we have used the parameters P=1, Q=2 in HPL.dat and have obtained the following results.
$ mpirun -np 2 ./run_linpack
=======================================================================
T/V N NB P Q Time Gflops
-----------------------------------------------------------------------
WR10L2L2     100000    768     1     2      372.74    1.789e+03
-----------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=0.0030869 ...PASSED
=======================================================================
Remark. For two E5-2665 CPUs, using the CPU Linpack we have obtained 307.16 GFlop/s for N=100000.
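For comparison with the CPU-only Linpack figures quoted in the remarks, a quick speedup calculation (labels are ours; all runs use N=100000):

```python
# GPU vs CPU-only Linpack speedups from the remarks above (N=100000 in each case).
pairs = {
    "2x S2050 vs 2x E5-2650": (679.0, 273.8),
    "2x K20m  vs 2x E5-2665": (1789.0, 307.16),
}
for label, (gpu, cpu) in pairs.items():
    print(f"{label}: {gpu / cpu:.1f}x")
```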