Добавил:
Upload Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:
MatrixCUDAFranDissertation.pdf
Скачиваний:
14
Добавлен:
22.03.2016
Размер:
2.18 Mб
Скачать

CHAPTER 4. LAPACK-LEVEL ROUTINES ON SINGLE-GPU ARCHITECTURES

4.4.2.Blocked algorithm for the LU factorization

Consider a partitioning of matrices A, L, and U:

A =

A21

A22

, L =

L21

L22

, U =

0

U22

,

 

 

A11

A12

 

 

L11

0

 

 

U11

U12

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

where A11, L11 and U11 are sub-matrices of size b × b (with b much smaller than the dimension of A); and A22, L22 and U22 are matrices with (n − b) × (n − b) elements. From A = LU:

A =

A21

A22

=

L21

L22

 

0

U22

=

L21U11

L21U12 + L22U22

,

 

 

A11

A12

 

 

 

L11

0

 

U11

U12

 

 

 

L11U11

L11U12

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

from which the following expressions are obtained:

L11U11 = A11,

U21 = L111A12,

L21 = A21U11−1,

A22 − L21U12 = L22U22.

Thus, the recursive blocked algorithm for the calculation of the LU factorization of A, can be formulated as follows:

1. Partition A =

A11

A12

, with A11 of size b × b.

A21

A22

2.A11 := {L\U}11 such that A11 = L11U11 (that is, A11 is overwritten with its LU factorization).

3.AT12 := U12T = L11T A12.

4.A21 := L21 = A21/U21.

5.A22 := A22 − L21U12T .

6.Continue recursively with A = A22 in step 1.

The cost of this blocked formulation is 2n3/3 flops. In this algorithm, the calculation of the panels A12 and A21, and the update of the sub-matrix A22 are cast in terms of Level-3 BLAS routines. The latter operation involves the major part of the calculations (O(n2b) operations on O(n2) data). As typically b 1, the ratio between arithmetic operations and data accesses is increased, and thus data reutilization is improved.

4.4.3.LU factorization with partial pivoting

As noted before, the solution of a linear system of the form Ax = b exists and is unique if the coe cient matrix A is invertible. However, this is not the only requisite for the existence of the LU factorization. For example, if A is invertible, but any of the elements in the diagonal is zero, the factorization is not feasible, as a division by zero appears during the algorithm (see step 4 of the scalar algorithm presented above).

94

4.4. LU FACTORIZATION

Row pivoting is a strategy that can be applied to solve this issue: if the row with the zero diagonal element is swapped with a di erent row below that in the matrix (without a zero element in the corresponding entry), the factorization can continue normally. Note that these swappings must be applied as well to vector b before solving the linear system.

Another problem that can appear during the LU factorization is caused by diagonal elements with a small magnitude compared with the rest of the elements in its column. This situation involves a growth in the magnitude of the elements of U, with the consequent rounding errors in the presence of limited precision architectures, and a catastrophic impact on the accuracy of the results [144].

A row pivoting strategy that swaps the diagonal element with that of largest magnitude from those in and below the diagonal in the current column, guarantees that every element in L is equal or smaller than 1 in magnitude, and handles properly the growth of the elements in U, limiting the impact of the round-o errors. This technique is known as partial row pivoting, and will be implemented in our GPU codes.

Scalar algorithm for the LU factorization with partial pivoting

The introduction of the partial pivoting in the recursive scalar algorithm asks for two new steps that follow the initial partitioning of the coe cient matrix. First, the element with the largest magnitude in the first column of the current sub-matrix to be factorized has to be found. Second, the first row of the sub-matrix and the row of the selected element must be swapped.

The resulting algorithm is formulated as follows:

 

 

α11

a12T

 

1.

Partition A =

 

 

 

.

 

 

a21

A22

 

2.

π := index of the element with the largest magnitude in

α11

.

a21

 

 

a12T ) with the corresponding elements of the π-th row.

3.

Swap the row ( α11

 

4.α11 := µ11 = α11.

5.aT12 := uT12 = aT12.

6.a21 := l21 = a2121.

7.A22 := A22 − l21uT12.

8.Continue recursively with A = A22 in step 1.

The same sequence of swappings (or permutations) must be applied to vector b in the solution of the linear system. This application is usually performed after the LU factorization of the coe cient matrix A is computed; thus, it is necessary to store the information that allows the application of this sequence. In practice, a vector of n elements is enough to store the permutation information and recover it afterwards.

To formally include row swapping in the LU factorization, we next introduce permutation matrices. The e ect of a permutation matrix is to rearrange the elements of vectors and entire rows or columns of matrices.

95

CHAPTER 4. LAPACK-LEVEL ROUTINES ON SINGLE-GPU ARCHITECTURES

Definition P Rn×n is a permutation matrix if, when applied to the vector x = (χ0, χ1, . . . ,χn−1)T , rearranges the order of the elements in that vector. Such a permutation can be represented by a vector of integers p = (π0, π1, . . . ,πn−1)T , where {π0, π1, . . . ,πn−1} is a permutation of {0,1, . . . ,n − 1}, and the scalars πi indicate that the permuted vector is given by P x = (χπ0 π1 , . . . ,χπn−1 ).

A permutation matrix is equivalent to the identity matrix with permuted rows.

Definition If P is the identity matrix with rows i and j swapped, then applying P to a matrix A as P A swaps rows i and j of A.

The algorithm shown above is implemented in LINPACK [56]. In this algorithm, the application of the Gauss transformations and permutations on b must be interleaved, exactly as has been shown for A to obtain U. Thus, if Pk Rn×n and Lk Rn×n are, respectively, the permutation matrix and the Gauss transformation applied in the k-th stage of the algorithm, it follows that A = (P1L0P1L1 . . . Pn−1Ln−1)U and, thus, the solution of the linear system Ax = b can be obtained from Ux = (Ln−1 1Pn−1 . . . L1 1P1L0 1P0)b. A side e ect of the interleaving of permutations and Gauss transformations is the impossibility of the derivation of an analogous blocked algorithm, as each Gauss transformation is applied in terms of a rank-1 update (GER routine in BLAS-2).

An alternative approach for row pivoting is implemented in LAPACK [9]. In the implementation proposed in this library, each interchange a ects all elements in the involved rows. Thus, in the step 3 of the scalar algorithm, the elements to the left of the diagonal entry α11, together with ( α11 aT12 ), are permuted with all the elements in the row of index π. Pivoting with complete rows is equivalent to grouping the permutations, and hence it is not necessary to apply them in an interleaved manner with the Gauss transformations. More precisely, if matrix P = Pn−1 . . . P1P0 contains the permutations applied during the factorization, then the solution of the linear system Ax = b is obtained by solving the systems Ly = P b and Ux = y.

Adapting the recursive formulation of the algorithm for the LU factorization to this new pivoting scheme is not straightforward, mainly because the pivoting a ects the sub-matrix on which the calculations are performed at each iteration (A22 in the previous iteration), as well as to the elements that have already been factored in previous iterations. In this case it is more convenient the formulation of the algorithm using the FLAME notation for both the scalar and the blocked algorithms.

Algorithms in FLAME notation for the scalar and blocked LU factorization with partial pivoting

The formal derivation process proposed by the FLAME project drives to five di erent algorithmic variants for the LU factorization. From these variants, only three can be combined with partial pivoting [138]. The scalar and blocked algorithms for these three variants are shown in Figure 4.9 (left and right, respectively). In the FLAME algorithms, n(·) represents the number of columns of a matrix, and P (π1) denotes the permutation matrix that, applied to a second matrix, swaps the first row of it with row π1. If p1 = (π1, π2, . . . , πb), then P (p1) is the permutation matrix that, applied to a second matrix, swaps its first row with row π1, the second row with row π2, and so on, until the last swapping of row b with row πb. This sequence of permutations is gathered by vector p. The procedure PivIndex returns the index of the element of largest magnitude of the vector passed as a parameter.

TRILU(·) returns the strictly lower triangular matrix stored in the matrix passed as a parameter, transforming it to a unit lower triangular matrix. Its usage implies that the operation A12 := TRILU(A11)−T A12 is the solution of a triangular system (routine TRSM in BLAS). The rest of the operations in the algorithm are cast in terms of GEMM-like operations.

96

4.4. LU FACTORIZATION

Algorithm: [A, p] := LU PIV UNB(A)

Partition

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

ABR !

, p →

 

pB !

 

 

 

 

 

 

 

 

 

A → ABL

 

 

 

 

 

 

 

 

 

 

 

 

 

 

AT L

AT R

 

 

 

 

 

 

 

 

 

pT

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

where

 

AT L is 0 × 0 and pT has 0 elements

 

 

 

 

while n(AT L) < n(A)

 

do

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Repartition

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

AT L AT R

 

 

 

 

 

 

 

 

T ,

pT

 

 

 

T

 

01

 

 

 

 

 

 

 

 

 

 

 

 

!

 

 

A00

 

a

A02

 

 

 

!

 

 

p0

 

 

 

 

 

 

 

 

 

 

 

 

 

a10

 

α11

a12

 

 

 

 

 

π1

 

 

 

 

ABL

ABR

 

 

 

 

 

pB

 

 

 

 

 

 

 

 

 

 

 

 

 

A20

 

a21

A22

 

 

 

 

p2

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

11 and

1

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

where α

 

π

 

 

are 1×1

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Variant 1:

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

α11

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

π1 := PivIndex

 

 

 

 

!

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

a21

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A20

a21

 

A22

!

:= P (π1)

 

A20

a21

A22

!

 

 

 

 

 

 

 

a10T

α11

 

a12T

 

 

 

 

 

 

 

 

 

 

 

 

 

a10T

α11

a12T

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

a21 := a2111

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A22 := A22 − a21a12T

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Variant 2:

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

α11 := P (p0) α11

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

a01

 

 

 

 

 

 

 

 

 

a01

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

a21

 

 

 

 

 

a21

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

1

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

a01 := TRILU(A00)

a01

 

 

 

 

 

 

 

 

 

 

 

 

 

α11 := α11 − a10T a01

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

a21 := a21 − A20a01

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

α11

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

π1 := PivIndex

 

 

 

!

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

a21

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A20

a21 ! := P (π1)

A20

a21

!

 

 

 

 

 

 

 

 

 

 

aT

α11

 

 

 

 

 

 

 

 

 

 

 

aT

 

α11

 

 

 

 

 

 

 

 

10

 

 

 

 

 

 

 

 

 

 

 

 

 

10

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

a21 := a2111

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Variant 3:

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

α11 := α11 − aT a01

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

10

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

a21 := a21 − A20a01

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

a12T := a12T

 

− a10T A02

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

α11

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

π1 := PivIndex

 

 

 

!

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

a21

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A20

a21

 

A22

!

:= P (π1)

 

A20

a21

A22

!

 

 

 

 

 

 

 

a10T

α11

 

a12T

 

 

 

 

 

 

 

 

 

 

 

 

 

a10T

α11

a12T

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

a21 := a2111

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Continue with

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

AT L AT R ← aT

 

α11

 

aT ,

pT

← π1

 

 

 

 

 

 

 

 

 

 

 

 

 

 

!

 

 

A00

 

a01

 

A02

 

 

 

!

 

 

p0

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

10

 

 

 

 

 

12

 

 

 

 

 

 

 

 

 

 

ABL

ABR

 

 

 

 

 

 

 

 

 

 

pB

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A20

 

a21

 

A22

 

 

 

 

p2

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

endwhile

Algorithm: [A, p] := LU PIV BLK(A)

Partition

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

!

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

!

 

 

 

 

 

 

 

 

 

 

 

 

 

A →

 

 

AT L

AT R

 

,

 

p →

 

 

 

 

pT

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

ABL

ABR

 

 

 

 

 

pB

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

where

AT L is 0 × 0 and pT has 0 elements

 

 

 

 

 

while

 

n(AT L) < n(A)

 

 

do

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Determine block size b

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Repartition

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

AT L

 

AT R

A00

 

 

 

 

A01

 

 

A02

,

 

pT

 

p0

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

!

 

 

 

 

A10

 

 

 

 

A11

 

 

A12

 

 

 

 

!

 

 

p1

 

 

 

 

 

ABL

 

ABR

 

 

 

 

 

 

 

pB

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A20

 

 

 

 

A21

 

 

A22

 

 

 

 

p2

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A11 is b

 

 

 

 

 

 

 

 

 

 

1

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

where

 

 

 

 

 

 

 

×b and p

 

 

 

has b elements

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Variant 1:

#

:= LU PIV UNB

A21

!

 

 

 

 

 

 

 

 

 

"

 

A21

! , p1

 

 

 

 

 

 

 

 

 

 

 

A11

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A11

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A22 !

 

:= P (p1)

 

 

 

A20

 

 

A22 !

 

 

 

 

 

 

 

 

 

 

A20

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A10

 

 

 

A12

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A10

 

 

A12

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A12 := TRILU(A11)−1A12

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A22 := A22 − A21A12

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Variant 2:

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A11 := P (p0) A11

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A01

 

 

 

 

 

 

 

 

 

 

 

A01

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A21

 

 

 

A21

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

1

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A01 := TRILU(A00)

A01

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A11 := A11 − A10A01

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A21 := A21 − A20A01

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

!

 

 

 

 

 

 

 

 

 

"

 

A21

! , p1

#

:= LU PIV UNB

A21

 

 

 

 

 

 

 

 

 

 

 

A11

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A11

 

 

 

 

 

 

 

 

 

 

 

 

!

 

 

 

 

 

 

 

 

A10

!

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A10

 

 

 

:= P (p1)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A20

 

 

 

A20

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Variant 3:

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A11 := A11 − A10A01

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A21 := A21 − A20A01

 

 

 

 

 

 

 

 

 

 

 

 

 

A21

!

 

 

 

 

 

 

 

 

 

"

 

A21

! , p1

#

:= LU PIV UNB

 

 

 

 

 

 

 

 

 

 

 

A11

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A11

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A22 !

 

:= P (p1)

 

 

 

 

 

 

 

A22 !

 

 

 

 

 

 

 

 

 

 

A20

 

 

 

 

 

 

 

A20

 

 

 

 

 

 

 

 

 

 

 

 

A10

 

 

 

A12

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A10

 

 

A12

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A12 := A12 − A10A02

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A12 := TRILU(A11)−1A12

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Continue with

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

AT L AT R ← A10

 

A11

 

 

 

A12

,

 

pT

← p1

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

!

 

 

 

 

A00

 

A01

 

 

 

A02

 

 

 

 

!

 

p0

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A20

 

A21

 

A22

 

 

 

p2

 

 

 

ABL

 

ABR

 

 

 

pB

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

endwhile

Figure 4.9: Algorithmic variants for the scalar (left) and blocked (right) algorithms for the LU factorization with partial pivoting.

97

Соседние файлы в предмете [НЕСОРТИРОВАННОЕ]