- Matrix computations on systems equipped with GPUs
- Introduction
- The evolution of hardware for High Performance Computing
- The programmability issue on novel graphics architectures
- About this document. Motivation and structure
- Motivation and goals
- Structure of the document
- Description of the systems used in the experimental study
- Performance metrics
- Hardware description
- Software description
- The FLAME algorithmic notation
- The architecture of modern graphics processors
- The graphics pipeline
- Programmable pipeline stages
- The Nvidia G80 as an example of the CUDA architecture
- The architecture of modern graphics processors
- General architecture overview. Nvidia Tesla
- Memory subsystem
- The GPU as a part of a hybrid system
- Arithmetic precision. Accuracy and performance
- Present and future of GPU architectures
- Conclusions and implications on GPU computing
- BLAS on single-GPU architectures
- BLAS: Basic Linear Algebra Subprograms
- BLAS levels
- Naming conventions
- Storage schemes
- BLAS on Graphics Processors: NVIDIA CUBLAS
- Evaluation of the performance of NVIDIA CUBLAS
- Improvements in the performance of Level-3 NVIDIA CUBLAS
- gemm-based programming for the Level-3 BLAS
- Systematic development and evaluation of algorithmic variants
- Experimental results
- Impact of the block size
- Performance results for rectangular matrices
- Performance results for double precision data
- Padding
- Conclusions
- LAPACK-level routines on single-GPU architectures
- LAPACK: Linear Algebra PACKage
- LAPACK and BLAS
- Naming conventions
- Storage schemes and arguments
- LAPACK routines and organization
- Cholesky factorization
- Scalar algorithm for the Cholesky factorization
- Blocked algorithm for the Cholesky factorization
- Computing the Cholesky factorization on the GPU
- Basic implementations. Unblocked and blocked versions
- Padding
- Hybrid implementation
- LU factorization
- Scalar algorithm for the LU factorization
- Blocked algorithm for the LU factorization
- LU factorization with partial pivoting
- Computing the LU factorization with partial pivoting on the GPU
- Basic implementations. Unblocked and blocked versions
- Padding and hybrid algorithm
- Reduction to tridiagonal form on the graphics processor
- The symmetric eigenvalue problem
- Reduction to tridiagonal form. The LAPACK approach
- Reduction to tridiagonal form. The SBR approach
- Experimental Results
- Conclusions
- Matrix computations on multi-GPU systems
- Linear algebra computation on multi-GPU systems
- Programming model and runtime. Performance considerations
- Programming model
- Transfer management and spatial assignation
- Experimental results
- Impact of the block size
- Number of data transfers
- Performance and scalability
- Impact of data distribution
- Conclusions
- Matrix computations on clusters of GPUs
- Parallel computing memory architectures
- Shared memory architectures
- Distributed memory and hybrid architectures
- Accelerated hybrid architectures
- Parallel programming models. Message-passing and MPI
- ScaLAPACK
- PLAPACK
- Elemental
- Description of the PLAPACK infrastructure
- Layered approach of PLAPACK
- Usage of the PLAPACK infrastructure. Practical cases
- Porting PLAPACK to clusters of GPUs
- Experimental results
- Conclusions
- Conclusions
- Conclusions and main contributions
- Contributions for systems with one GPU
- Contributions for clusters of GPUs
- Related publications
- Publications directly related with the thesis topics
- Publications indirectly related with the thesis topics
- Other publications
- Open research lines
- FLAME algorithms for the BLAS-3 routines
CHAPTER 1. MATRIX COMPUTATIONS ON SYSTEMS EQUIPPED WITH GPUS
Each chapter of this document presents the work developed for the corresponding architecture, together with the experimental results attained on it. In this sense, each part of the document is self-contained and can be read independently.
Finally, Chapter 7 presents the main conclusions of this research. In addition, it reports the main contributions of the thesis, the publications that have been generated, and the technological transfer activities derived from it. To close, a few open research lines related to the work are discussed.
1.3. Description of the systems used in the experimental study
1.3.1. Performance metrics
The fundamental metric for the performance (or efficiency) evaluation of an application is its execution time. However, for codes dominated by floating-point arithmetic operations, as is the case of linear algebra operations, other metrics are often employed to evaluate the pace at which these operations are performed. More precisely, a flop is defined as a single floating-point arithmetic operation. Thus, the execution speed of a linear algebra code is usually given in terms of MFLOPS (10^6 flops/s), GFLOPS (10^9 flops/s), or even TFLOPS (10^12 flops/s). Although the FLOPS rate is a metric derived from the execution time, the arithmetic processing speed (flops/s) presents a clear advantage for the graphical representation of performance data. Specifically, as the problem size increases, the execution time of codes for common dense linear algebra operations also grows, often at a cubic pace. The FLOPS rate, in contrast, is bounded by the configuration and speed of the hardware (cycle time, number of functional units, cache transfer rate, bus speed, etc.). Thus, charts representing the FLOPS rate exhibit an upper bound that makes them much easier to display and analyze.
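As a minimal illustration of how such a rate is derived from the execution time (the function name and the example figures are ours, not taken from the text), the GFLOPS rate of a dense matrix-matrix product, which performs roughly 2n^3 flops for square matrices of order n, could be computed as:

```python
def gflops_rate(n, seconds):
    """GFLOPS rate of an n x n dense matrix-matrix product (gemm).

    gemm performs roughly 2*n^3 floating-point operations;
    1 GFLOPS = 10^9 flops/s.
    """
    flops = 2.0 * n**3
    return flops / seconds / 1e9

# e.g., a 4096 x 4096 product completed in 2 seconds:
# gflops_rate(4096, 2.0) -> about 68.7 GFLOPS
```

Note that, as stated above, the rate is bounded by the hardware peak regardless of the problem size, which is what makes FLOPS charts easy to compare.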
Although there exist widely used metrics for parallel codes, such as speed-up or efficiency (also derived from the execution time), which apply to the GPU implementations as well, we advocate here for homogeneity in the representations and will mostly measure parallel performance in terms of FLOPS. Nevertheless, other metrics will be introduced whenever they are necessary to correctly illustrate parallel performance.
1.3.2.Hardware description
Three different systems have been used in the evaluation of the implementations presented in the following chapters. These systems are representative of the different multi-core architectures available nowadays and, simultaneously, they illustrate how multiple hardware accelerators (in this case, GPUs) can be attached to a single system or to a cluster of compute nodes to boost performance.
PECO is a cluster of four nodes interconnected via an InfiniBand QDR network. Each node contains two Intel Xeon 5520 (Nehalem) quad-core processors running at 2.27 GHz, with 24 Gbytes of DDR2 RAM. Attached to the PCI-Express 2.0 bus of each node there is an NVIDIA Tesla C1060 GPU with 4 Gbytes of GDDR3 RAM. One of the nodes of this machine will be used for the evaluation of the BLAS and LAPACK-level routines in Chapters 3 and 4.
TESLA2 is a shared-memory multiprocessor based on Intel Xeon technology. It is composed of two Intel Xeon 5440 (Harpertown) quad-core processors running at 2.83 GHz, with 16 Gbytes of DDR2 RAM. Attached to the PCI-Express 2.0 bus, there is an NVIDIA Tesla S1070 system consisting of four NVIDIA Tesla C1060 GPUs identical to those present in each