
Brereton Chemometrics


CHEMOMETRICS

 

 

 

 

Table 5.14 Magnitudes of first 15 PLS1 components (centred data) for pyrene.

| Component | Magnitude | Component | Magnitude |
|---|---|---|---|
| 1 | 7.944 | 9 | 0.004 |
| 2 | 1.178 | 10 | 0.007 |
| 3 | 0.484 | 11 | 0.001 |
| 4 | 0.405 | 12 | 0.002 |
| 5 | 0.048 | 13 | 0.002 |
| 6 | 0.158 | 14 | 0.003 |
| 7 | 0.066 | 15 | 0.001 |
| 8 | 0.01 | | |

 

 

 

 

 

 

 

 

 

mean centred spectra is 10.313, hence the first two components account for 100 × (7.944 + 1.178)/10.313 = 88.4 % of the overall variance, so the root mean square error after two PLS components have been calculated is √[1.191/(27 × 25)] = 0.042 (since 10.313 − 9.122 = 1.191 is the residual sum of squares) or, expressed as a percentage of the root mean square of the mean centred data, E% = 100 × 0.042/√[10.313/(27 × 25)] = 34.0 %. This could instead be expressed as a percentage of the mean of the raw data: 100 × 0.042/0.430 = 9.77 %. The latter appears much lower, a consequence of the fact that the mean of the data is considerably higher than the standard deviation of the mean centred data. It is probably best simply to determine the percentage residual sum of squares error (= 100 − 88.4 = 11.6 %) as more components are computed, but it is important to be aware that there are several approaches for the determination of errors.
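The arithmetic above can be checked with a few lines of Python. This is a minimal sketch that takes the two component magnitudes from Table 5.14, the total sum of squares of 10.313, the 27 × 25 data size and the raw-data mean of 0.430 quoted in the text; it prints percentages to two decimals, so the last digit may differ slightly from the rounded figures given above.

```python
import math

# Inputs quoted in the text (pyrene, PLS1 on mean-centred spectra)
magnitudes = [7.944, 1.178]   # first two PLS1 components, Table 5.14
total_ss = 10.313             # total sum of squares of the centred spectra
n_points = 27 * 25            # wavelengths x spectra
raw_mean = 0.430              # mean of the raw (uncentred) data

explained_ss = sum(magnitudes)
residual_ss = total_ss - explained_ss          # about 1.191

pct_explained = 100 * explained_ss / total_ss
pct_residual = 100 - pct_explained
rms_residual = math.sqrt(residual_ss / n_points)
rms_centred = math.sqrt(total_ss / n_points)   # RMS of the centred data itself

print(f"variance explained:       {pct_explained:.2f} %")
print(f"residual sum of squares:  {pct_residual:.2f} %")
print(f"RMS residual error:       {rms_residual:.3f}")
print(f"E% vs centred data:       {100 * rms_residual / rms_centred:.1f} %")
print(f"E% vs mean of raw data:   {100 * rms_residual / raw_mean:.2f} %")
```

The two E% figures differ only in the denominator: the RMS of the centred data (about 0.124) versus the mean of the raw data (0.430).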

The error in concentration predictions for pyrene using two PLS components can be computed from Table 5.13:

the sum of squares of the errors is 0.365;

dividing this by 22 and taking the square root leads to a root mean square error of 0.129 mg l−1;

the average concentration of pyrene is 0.456 mg l−1;

hence the percentage root mean square error (compared with the raw data) is 28.25 %.

Relative to the standard deviation of the centred data it is even higher. Hence the ‘x’ and ‘c’ blocks are modelled in different ways and it is important to recognise that the percentage error of prediction in concentration may diverge considerably from the percentage error of prediction of the spectra. It is sometimes possible to reconstruct spectral blocks fairly well but still not predict concentrations very effectively. It is best practice to look at errors in both blocks simultaneously to gain an understanding of the quality of predictions.
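The three steps listed above can be sketched as a small Python function. The function name is hypothetical, and the divisor is passed in explicitly because the text divides by 22 rather than by the number of samples; reading 22 as 25 samples minus two components minus one for centring is an assumption, not something the text states. The usage line uses made-up numbers, not the values of Table 5.13.

```python
import math


def rms_prediction_error(c_true, c_pred, divisor):
    """Root mean square error of concentration predictions, and that error
    as a percentage of the mean true concentration.

    divisor: the number the sum of squared errors is divided by
    (22 in the pyrene example in the text).
    """
    sum_sq = sum((t - p) ** 2 for t, p in zip(c_true, c_pred))
    rms = math.sqrt(sum_sq / divisor)
    mean_true = sum(c_true) / len(c_true)
    return rms, 100.0 * rms / mean_true


# Toy usage with made-up numbers (not Table 5.13)
rms, pct = rms_prediction_error([1.0, 2.0, 3.0], [1.1, 1.9, 3.0], divisor=2)
print(f"RMS error {rms:.3f}, {pct:.1f} % of the mean")
```

Dividing by the mean of the raw concentrations, as here, gives the "compared with the raw data" percentage; dividing by the standard deviation of the centred concentrations instead would give the larger figure mentioned above.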

The root mean square errors in both blocks of data, as successive PLS components are calculated, are illustrated in Figure 5.13 for pyrene and in Figure 5.14 for acenaphthene. Several observations can be made. First, the shapes of the residual graphs for the two blocks are often very different (see especially acenaphthene). Second, the graph of c residuals tends to change much more dramatically than that for x residuals, according to compound, as might be expected. Third, tests for the number of significant PLS components might give different answers according to which block is used for the test.

CALIBRATION


 

 

 

Figure 5.13 Root mean square errors in x and c blocks, PLS1 (centred), pyrene: (a) x block, (b) c block; RMS error (logarithmic scale) against component number (1–15).

The errors using 10 PLS components are summarised in Table 5.15, and are better than PCR in this case. It is important, however, not to get too excited about the improved quality of predictions. The c or concentration variables may in themselves contain errors, and what has been shown is that PLS forces the solution to model the apparent c block better, but it does not necessarily imply that the other methods are worse at discovering the truth. If, however, we have a lot of confidence in the experimental procedure for determining c (e.g. weighing, dilution, etc.), PLS will result in a more faithful reconstruction.

Figure 5.14 Root mean square errors in x and c blocks, PLS1 (centred), acenaphthene: (a) x block, (b) c block; RMS error (logarithmic scale) against component number (1–15).

5.5.2 PLS2

An extension to PLS1 was suggested some 15 years ago, often called PLS2. In fact there is little conceptual difference, except that the latter allows the use of a concentration matrix, C, rather than concentration vectors for each individual compound in a mixture, and the algorithm is iterative. The equations above alter slightly in that Q is now a matrix not a vector, so that

X = T .P + E

C = T .Q + F

The number of columns in C and in Q is equal to the number of compounds of interest. In PLS1 one compound is modelled at a time, whereas in PLS2 all known compounds can be included in the model simultaneously. This is illustrated in Figure 5.15.


Table 5.15 Concentration estimates of the PAHs using PLS1 and 10 components (centred). PAH concentrations in mg l−1.

| Spectrum No. | Py | Ace | Anth | Acy | Chry | Benz | Fluora | Fluore | Nap | Phen |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.462 | 0.112 | 0.170 | 0.147 | 0.341 | 1.697 | 0.130 | 0.718 | 0.110 | 0.553 |
| 2 | 0.445 | 0.065 | 0.280 | 0.175 | 0.440 | 2.758 | 0.138 | 0.408 | 0.177 | 0.772 |
| 3 | 0.147 | 0.199 | 0.285 | 0.162 | 0.562 | 1.635 | 0.111 | 0.784 | 0.159 | 0.188 |
| 4 | 0.700 | 0.174 | 0.212 | 0.199 | 0.333 | 1.097 | 0.132 | 0.812 | 0.054 | 0.785 |
| 5 | 0.791 | 0.167 | 0.285 | 0.111 | 0.223 | 2.118 | 0.171 | 0.211 | 0.176 | 0.519 |
| 6 | 0.616 | 0.226 | 0.176 | 0.040 | 0.467 | 2.172 | 0.068 | 0.752 | 0.116 | 0.928 |
| 7 | 0.767 | 0.119 | 0.108 | 0.180 | 0.452 | 0.522 | 0.153 | 0.577 | 0.177 | 0.202 |
| 8 | 0.476 | 0.085 | 0.228 | 0.157 | 0.109 | 2.155 | 0.129 | 0.967 | 0.046 | 0.184 |
| 9 | 0.317 | 0.145 | 0.232 | 0.042 | 0.440 | 1.576 | 0.171 | 0.187 | 0.009 | 0.367 |
| 10 | 0.614 | 0.178 | 0.046 | 0.154 | 0.334 | 2.702 | 0.039 | 0.174 | 0.084 | 0.219 |
| 11 | 0.625 | 0.029 | 0.237 | 0.121 | 0.574 | 0.543 | 0.042 | 0.423 | 0.039 | 0.516 |
| 12 | 0.179 | 0.161 | 0.185 | 0.175 | 0.091 | 0.560 | 0.098 | 0.363 | 0.110 | 0.709 |
| 13 | 0.579 | 0.119 | 0.262 | 0.061 | 0.118 | 1.074 | 0.012 | 0.522 | 0.149 | 0.428 |
| 14 | 0.463 | 0.198 | 0.067 | 0.054 | 0.226 | 0.561 | 0.134 | 0.788 | 0.110 | 0.330 |
| 15 | 0.752 | 0.041 | 0.062 | 0.075 | 0.113 | 1.646 | 0.193 | 0.401 | 0.072 | 0.943 |
| 16 | 0.149 | 0.017 | 0.115 | 0.037 | 0.338 | 2.186 | 0.062 | 0.474 | 0.196 | 0.349 |
| 17 | 0.148 | 0.106 | 0.050 | 0.096 | 0.453 | 1.044 | 0.112 | 0.974 | 0.092 | 0.585 |
| 18 | 0.274 | 0.075 | 0.149 | 0.119 | 0.223 | 1.098 | 0.199 | 0.280 | 0.147 | 0.256 |
| 19 | 0.151 | 0.119 | 0.213 | 0.109 | 0.236 | 2.664 | 0.075 | 0.536 | 0.050 | 0.953 |
| 20 | 0.458 | 0.140 | 0.114 | 0.095 | 0.555 | 1.067 | 0.100 | 0.220 | 0.198 | 0.944 |
| 21 | 0.615 | 0.080 | 0.120 | 0.189 | 0.226 | 1.581 | 0.040 | 1.024 | 0.198 | 0.738 |
| 22 | 0.318 | 0.091 | 0.267 | 0.097 | 0.329 | 0.523 | 0.187 | 0.942 | 0.157 | 0.967 |
| 23 | 0.295 | 0.160 | 0.124 | 0.160 | 0.122 | 2.669 | 0.182 | 0.826 | 0.171 | 0.531 |
| 24 | 0.761 | 0.072 | 0.167 | 0.047 | 0.541 | 2.687 | 0.153 | 1.049 | 0.122 | 0.378 |
| 25 | 0.296 | 0.120 | 0.047 | 0.197 | 0.555 | 2.166 | 0.170 | 0.590 | 0.082 | 0.758 |
| E% | 5.47 | 19.06 | 7.85 | 22.48 | 3.55 | 2.46 | 21.96 | 12.96 | 16.48 | 7.02 |

 

 

 

 

Figure 5.15 Principles of PLS2: X (I × J) = T (I × A) . P (A × J) + E, and C (I × N) = T (I × A) . Q (A × N) + F.


Table 5.16 Concentration estimates of the PAHs using PLS2 and 10 components (centred). PAH concentrations in mg l−1.

| Spectrum No. | Py | Ace | Anth | Acy | Chry | Benz | Fluora | Fluore | Nap | Phen |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.505 | 0.110 | 0.193 | 0.132 | 0.365 | 1.725 | 0.125 | 0.665 | 0.089 | 0.459 |
| 2 | 0.460 | 0.116 | 0.285 | 0.105 | 0.453 | 2.693 | 0.144 | 0.363 | 0.150 | 0.760 |
| 3 | 0.162 | 0.180 | 0.294 | 0.173 | 0.563 | 1.647 | 0.094 | 0.787 | 0.161 | 0.157 |
| 4 | 0.679 | 0.173 | 0.224 | 0.164 | 0.343 | 1.134 | 0.123 | 0.752 | 0.038 | 0.748 |
| 5 | 0.811 | 0.135 | 0.294 | 0.149 | 0.230 | 2.152 | 0.162 | 0.221 | 0.183 | 0.475 |
| 6 | 0.575 | 0.182 | 0.156 | 0.108 | 0.442 | 2.228 | 0.077 | 0.827 | 0.153 | 1.002 |
| 7 | 0.779 | 0.151 | 0.107 | 0.143 | 0.469 | 0.453 | 0.167 | 0.484 | 0.156 | 0.199 |
| 8 | 0.397 | 0.100 | 0.198 | 0.183 | 0.093 | 2.165 | 0.181 | 1.035 | 0.070 | 0.306 |
| 9 | 0.295 | 0.089 | 0.238 | 0.108 | 0.433 | 1.665 | 0.158 | 0.238 | 0.032 | 0.341 |
| 10 | 0.581 | 0.203 | 0.029 | 0.148 | 0.327 | 2.690 | 0.079 | 0.191 | 0.088 | 0.287 |
| 11 | 0.609 | 0.070 | 0.207 | 0.108 | 0.559 | 0.453 | 0.079 | 0.484 | 0.049 | 0.636 |
| 12 | 0.190 | 0.176 | 0.186 | 0.144 | 0.086 | 0.549 | 0.083 | 0.411 | 0.105 | 0.709 |
| 13 | 0.565 | 0.107 | 0.249 | 0.095 | 0.092 | 1.088 | 0.000 | 0.595 | 0.173 | 0.478 |
| 14 | 0.468 | 0.173 | 0.067 | 0.089 | 0.214 | 0.610 | 0.108 | 0.830 | 0.124 | 0.322 |
| 15 | 0.771 | 0.018 | 0.073 | 0.096 | 0.112 | 1.668 | 0.175 | 0.415 | 0.077 | 0.906 |
| 16 | 0.119 | 0.030 | 0.110 | 0.037 | 0.345 | 2.189 | 0.101 | 0.442 | 0.192 | 0.369 |
| 17 | 0.181 | 0.098 | 0.070 | 0.090 | 0.468 | 1.061 | 0.106 | 0.903 | 0.076 | 0.510 |
| 18 | 0.278 | 0.067 | 0.151 | 0.102 | 0.226 | 1.073 | 0.178 | 0.292 | 0.147 | 0.249 |
| 19 | 0.184 | 0.131 | 0.218 | 0.102 | 0.245 | 2.617 | 0.071 | 0.434 | 0.034 | 0.925 |
| 20 | 0.410 | 0.120 | 0.111 | 0.134 | 0.543 | 1.117 | 0.115 | 0.243 | 0.215 | 0.963 |
| 21 | 0.663 | 0.100 | 0.147 | 0.129 | 0.262 | 1.558 | 0.040 | 0.845 | 0.152 | 0.630 |
| 22 | 0.308 | 0.108 | 0.257 | 0.093 | 0.335 | 0.509 | 0.209 | 0.954 | 0.156 | 0.998 |
| 23 | 0.320 | 0.179 | 0.123 | 0.114 | 0.129 | 2.610 | 0.164 | 0.817 | 0.157 | 0.537 |
| 24 | 0.763 | 0.072 | 0.165 | 0.038 | 0.524 | 2.696 | 0.120 | 1.123 | 0.130 | 0.390 |
| 25 | 0.327 | 0.110 | 0.049 | 0.216 | 0.544 | 2.150 | 0.139 | 0.650 | 0.091 | 0.746 |
| E% | 10.25 | 34.11 | 13.66 | 44.56 | 6.99 | 4.26 | 33.41 | 18.62 | 25.83 | 14.77 |

 

 

 

 

 

 

 

 

 

 

 

It is a simple extension to predict all the concentrations simultaneously; the PLS2 predictions, together with the root mean square errors, are given in Table 5.16. Note that there is now only one set of scores and loadings for the x (spectroscopic) dataset, and one set of ga common to all 10 compounds. However, the concentration estimates differ between PLS2 and PLS1. In this way PLS differs from PCR, where it does not matter whether each variable is modelled separately or all together. The reasons are rather complex but relate to the fact that in PCR the principal components are calculated independently of the c variables, whereas the PLS components are influenced by both blocks of variables.
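The contrast between PCR and PLS can be illustrated numerically. The sketch below uses one standard NIPALS formulation of PLS (not necessarily identical to the algorithm in the book's appendix) on hypothetical random data with three c columns, fitting either all columns at once (PLS2) or one at a time (PLS1), and doing the same for PCR. The function names and the data are assumptions made for illustration only.

```python
import numpy as np


def pls_nipals(X, Y, n_comp, tol=1e-10, max_iter=500):
    """NIPALS PLS on column-centred X (I x J) and Y (I x N).

    Returns scores T (I x A), x loadings P (A x J) and c loadings Q (A x N),
    so that X ~ T.P and Y ~ T.Q. With one column in Y this is PLS1; with
    several it is PLS2 and the inner loop genuinely iterates.
    """
    X = np.array(X, dtype=float)
    Y = np.array(Y, dtype=float)
    I, J = X.shape
    N = Y.shape[1]
    T, P, Q = np.zeros((I, n_comp)), np.zeros((n_comp, J)), np.zeros((n_comp, N))
    for a in range(n_comp):
        u = Y[:, [np.argmax((Y ** 2).sum(axis=0))]]  # start from largest c column
        t_old = None
        for _ in range(max_iter):
            w = X.T @ u
            w /= np.linalg.norm(w)
            t = X @ w
            q = Y.T @ t / (t.T @ t)
            u = Y @ q / (q.T @ q)
            if t_old is not None and np.linalg.norm(t - t_old) <= tol * np.linalg.norm(t):
                break
            t_old = t
        p = X.T @ t / (t.T @ t)
        X -= t @ p.T              # deflate the x block
        Y -= t @ q.T              # deflate the c block
        T[:, a], P[a], Q[a] = t[:, 0], p[:, 0], q[:, 0]
    return T, P, Q


def pls_fit(X, Y, n_comp):
    T, _, Q = pls_nipals(X, Y, n_comp)
    return T @ Q


def pcr_fit(X, Y, n_comp):
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    T = U[:, :n_comp] * s[:n_comp]                # PCA scores: Y plays no part
    return T @ np.linalg.lstsq(T, Y, rcond=None)[0]


# Hypothetical random data: 10 samples, 6 variables, 3 compounds
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 6)); X -= X.mean(axis=0)
C = rng.normal(size=(10, 3)); C -= C.mean(axis=0)
A = 2

pls2 = pls_fit(X, C, A)
pls1 = np.column_stack([pls_fit(X, C[:, [n]], A)[:, 0] for n in range(3)])
pcr_all = pcr_fit(X, C, A)
pcr_each = np.column_stack([pcr_fit(X, C[:, [n]], A)[:, 0] for n in range(3)])

print("PCR, all at once vs one at a time equal:", np.allclose(pcr_all, pcr_each))
print("PLS2 vs PLS1 equal:", np.allclose(pls2, pls1))
```

With fewer components than the rank of X, the PCR fits agree column by column because the principal components never see C, while the PLS2 and PLS1 fits differ because each set of PLS weights is steered by the c data being modelled.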

In some cases PLS2 is helpful, especially since it is computationally easier than PLS1 when there are several c variables: instead of obtaining 10 independent models, one for each PAH in this example, we can analyse all the data in one go. However, in many situations PLS2 concentration estimates are, in fact, worse than PLS1 estimates, so a good strategy might be to perform PLS2 as a first step, which could provide further information such as which wavelengths are significant and which concentrations can be determined with a high degree of confidence, and then perform PLS1 individually for the most appropriate compounds.


 

 

5.5.3 Multiway PLS

Two-way data such as HPLC–DAD, LC–MS and LC–NMR are increasingly common in chemistry, especially with the growth of coupled chromatography. Conventionally, either a univariate parameter such as a peak area at a given wavelength (methods in Section 5.2) or a chromatographic elution profile at a single wavelength (methods in Sections 5.3 to 5.5.2) is used for calibration, allowing the use of the normal regression techniques described above. However, additional information has been recorded for each sample, often involving both an elution profile and a spectrum. A series of two-way chromatograms is then available and can be organised into a three-way array, often visualised as a box and sometimes denoted by an underlined X, the line underneath the array name indicating a third dimension. Each level of the box consists of a single chromatogram. Sometimes these three-way arrays are called 'tensors', but tensors often have special properties in physics which are unnecessarily complex and confusing to the chemometrician. We will use the notation of tensors only where it helps in understanding the existing methods.

Enhancements of the standard methods for multivariate calibration are required. Although it is possible to use methods such as three-way MLR, most chemometricians have concentrated on developing approaches based on PLS, to which we will be restricted below. Theoreticians have extended these methods to cases where there are several dimensions in both the 'x' and 'c' blocks, but the most complex practical case is where there are three dimensions in the 'x' block, as happens, for example, for a series of coupled chromatograms or in fluorescence excitation–emission spectroscopy. A simple simulated numerical example is presented in Table 5.17, in which the x block consists of four two-way chromatograms, each of dimensions 5 × 6. There are three components in the mixture, the c block consisting of a 4 × 3 matrix. We will restrict the discussion to the case where each column of c is estimated independently (analogous to PLS1) rather than all in one go. Note that although PLS is by far the most popular approach for multiway calibration, it is possible to envisage methods analogous to MLR or PCR, but they are rarely used.

5.5.3.1 Unfolding

One of the simplest methods is to create a single, long data matrix from the original three-way tensor. In the case of Table 5.17, we have four samples, which could be arranged as a 4 × 5 × 6 tensor (or 'box'). The three dimensions will be denoted I, J and K. It is possible to reshape the data so that each combination of a J index and a K index becomes a new variable, for example the intensity at j = 2 and k = 3; each sample can then be represented by 5 × 6 = 30 variables, and the resulting 4 × 30 matrix is the unfolded form of the original three-way array. This operation is illustrated in Figure 5.16.
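A minimal NumPy sketch of unfolding. The array holds placeholder integers 0 to 119 rather than the values of Table 5.17, and placing element (j, k) in column (j − 1)K + k (J blocks of K columns, NumPy's default row-major order) is an assumption; any ordering works as long as it is used consistently.

```python
import numpy as np

I, J, K = 4, 5, 6
# Hypothetical values 0..119, chosen only so the bookkeeping is easy to follow
X3 = np.arange(I * J * K).reshape(I, J, K)

# Unfold: each sample's J x K plane becomes one row of J*K = 30 variables.
# Row-major order puts element (j, k) in 0-based column j*K + k.
X_unf = X3.reshape(I, J * K)

j, k = 2, 3                          # 1-based indices, as in the text
col = (j - 1) * K + (k - 1)          # 0-based column in the unfolded matrix
assert np.array_equal(X_unf[:, col], X3[:, j - 1, k - 1])
print(X_unf.shape, col)
```

The variable at j = 2, k = 3 lands in the ninth column (index 8) of the 4 × 30 matrix.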

It is now a simple task to perform PLS (or indeed any other multivariate approach), as discussed above. The 30 variables are centred and the concentrations are predicted using increasing numbers of components (note that three is the maximum permitted for column centred data in this case, because centring four samples leaves at most three independent rows, so this example is somewhat simple). All the methods described above can be applied.
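The data of Table 5.17 can be unfolded and column-centred in a few lines of NumPy. This sketch assumes row-major unfolding (j outer, k inner) and counts singular values above a small relative threshold to show that the centred 4 × 30 matrix has rank 3, which is why at most three components can be extracted. The first centred element equals the method 3 value for sample 1 given in Table 5.18.

```python
import numpy as np

# Table 5.17(a): I = 4 samples, each a J x K = 5 x 6 matrix (rows j, columns k)
X3 = np.array([
    [[390, 421, 871, 940, 610, 525],
     [635, 357, 952, 710, 910, 380],
     [300, 334, 694, 700, 460, 390],
     [65, 125, 234, 238, 102, 134],
     [835, 308, 1003, 630, 1180, 325]],
    [[488, 433, 971, 870, 722, 479],
     [1015, 633, 1682, 928, 1382, 484],
     [564, 538, 1234, 804, 772, 434],
     [269, 317, 708, 364, 342, 194],
     [1041, 380, 1253, 734, 1460, 375]],
    [[186, 276, 540, 546, 288, 306],
     [420, 396, 930, 498, 552, 264],
     [328, 396, 860, 552, 440, 300],
     [228, 264, 594, 294, 288, 156],
     [222, 120, 330, 216, 312, 114]],
    [[205, 231, 479, 481, 314, 268],
     [400, 282, 713, 427, 548, 226],
     [240, 264, 576, 424, 336, 232],
     [120, 150, 327, 189, 156, 102],
     [385, 153, 482, 298, 542, 154]],
], dtype=float)

X_unf = X3.reshape(X3.shape[0], -1)        # 4 x 30, row-major unfolding
X_cen = X_unf - X_unf.mean(axis=0)         # centre each of the 30 variables

# Count singular values that are not numerically zero
s = np.linalg.svd(X_cen, compute_uv=False)
rank = int(np.sum(s > 1e-9 * s[0]))
print(X_cen.shape, rank, X_cen[0, 0])
```

Each centred column sums to zero, so the four rows are linearly dependent and the rank can be at most three.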

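An important aspect of three-way calibration involves scaling, which can be rather complex. There are four fundamental ways in which the data can be treated:

1. no centring;

2. centre the columns in each J × K plane and then unfold with no further centring, so, for example, x1,1,1 becomes 390 − (390 + 635 + 300 + 65 + 835)/5 = −55;

3. unfold the raw data and centre afterwards, so, for example, x1,1,1 becomes 390 − (390 + 488 + 186 + 205)/4 = 72.75;

4. combine methods 2 and 3: start with centring as in method 2, then unfold and recentre a second time.

Table 5.17 Three-way calibration dataset.

(a) X block: each of the 4 (= I) samples gives a two-way 5 × 6 (= J × K) matrix

| Sample | j | k = 1 | k = 2 | k = 3 | k = 4 | k = 5 | k = 6 |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 390 | 421 | 871 | 940 | 610 | 525 |
| 1 | 2 | 635 | 357 | 952 | 710 | 910 | 380 |
| 1 | 3 | 300 | 334 | 694 | 700 | 460 | 390 |
| 1 | 4 | 65 | 125 | 234 | 238 | 102 | 134 |
| 1 | 5 | 835 | 308 | 1003 | 630 | 1180 | 325 |
| 2 | 1 | 488 | 433 | 971 | 870 | 722 | 479 |
| 2 | 2 | 1015 | 633 | 1682 | 928 | 1382 | 484 |
| 2 | 3 | 564 | 538 | 1234 | 804 | 772 | 434 |
| 2 | 4 | 269 | 317 | 708 | 364 | 342 | 194 |
| 2 | 5 | 1041 | 380 | 1253 | 734 | 1460 | 375 |
| 3 | 1 | 186 | 276 | 540 | 546 | 288 | 306 |
| 3 | 2 | 420 | 396 | 930 | 498 | 552 | 264 |
| 3 | 3 | 328 | 396 | 860 | 552 | 440 | 300 |
| 3 | 4 | 228 | 264 | 594 | 294 | 288 | 156 |
| 3 | 5 | 222 | 120 | 330 | 216 | 312 | 114 |
| 4 | 1 | 205 | 231 | 479 | 481 | 314 | 268 |
| 4 | 2 | 400 | 282 | 713 | 427 | 548 | 226 |
| 4 | 3 | 240 | 264 | 576 | 424 | 336 | 232 |
| 4 | 4 | 120 | 150 | 327 | 189 | 156 | 102 |
| 4 | 5 | 385 | 153 | 482 | 298 | 542 | 154 |

(b) C block: concentrations of three compounds in each of the four samples

| Sample | Compound 1 | Compound 2 | Compound 3 |
|---|---|---|---|
| 1 | 1 | 9 | 10 |
| 2 | 7 | 11 | 8 |
| 3 | 6 | 2 | 6 |
| 4 | 3 | 4 | 5 |

Figure 5.16 Unfolding a data matrix: the I × J × K array X is rearranged into an I × J.K matrix consisting of J blocks of K columns.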

These four methods are illustrated in Table 5.18 for the variable xi,1,1, i.e. the top left-hand corner of each of the four two-way datasets. Note that methods 3 and 4 provide radically different answers; for example, sample 2 has the highest value (= 170.75) using method 3, but the lowest using method 4 (= −87.85).

Table 5.18 Four methods of mean centring the data in Table 5.17, illustrated by the variable xi,1,1 as discussed in Section 5.5.3.1.

| Sample | Method 1 | Method 2 | Method 3 | Method 4 |
|---|---|---|---|---|
| 1 | 390 | −55 | 72.75 | 44.55 |
| 2 | 488 | −187.4 | 170.75 | −87.85 |
| 3 | 186 | −90.8 | −131.25 | 8.75 |
| 4 | 205 | −65 | −112.25 | 34.55 |
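The four treatments are compositions of two operations: centring within each J × K plane, and centring each unfolded variable over the samples. This sketch applies them to the k = 1 slice of Table 5.17, which is all that is needed for the xi,1,1 column, and reproduces the four columns of Table 5.18. The helper names are hypothetical.

```python
import numpy as np

# x[i, j, k=1] from Table 5.17: the first column of each sample's 5 x 6 matrix.
# A trailing axis of length 1 keeps the I x J x K shape.
X3 = np.array([[390, 635, 300, 65, 835],
               [488, 1015, 564, 269, 1041],
               [186, 420, 328, 228, 222],
               [205, 400, 240, 120, 385]], dtype=float)[:, :, np.newaxis]


def unfold(X):
    """I x J x K -> I x (J*K), row-major."""
    return X.reshape(X.shape[0], -1)


def centre_within_planes(X):
    """Centre each column (over j) inside each sample's J x K plane."""
    return X - X.mean(axis=1, keepdims=True)


def centre_variables(M):
    """Centre each unfolded variable over the samples."""
    return M - M.mean(axis=0)


methods = [
    unfold(X3),                                          # 1: no centring
    unfold(centre_within_planes(X3)),                    # 2
    centre_variables(unfold(X3)),                        # 3
    centre_variables(unfold(centre_within_planes(X3))),  # 4
]
table = np.column_stack([m[:, 0] for m in methods])     # the x[i,1,1] column
print(np.round(table, 2))
```

Because method 4 recentres the method 2 values, sample 2 (the most negative after method 2) ends up lowest, whereas method 3 leaves it highest.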

Standardisation is also sometimes employed, but must be done before unfolding for meaningful results; an example might be the GC–MS of a series of samples, each mass being of different absolute intensity. A sensible strategy might be as follows:

1. standardise each mass in each individual chromatogram, to provide I standardised matrices of dimensions J × K;

2. unfold;

3. centre each of the variables.
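The three-step strategy can be sketched as one NumPy function. It assumes an I × J × K array with elution times along j and masses along k, and uses the population standard deviation (ddof = 0); both are assumptions, as are the function name and the random test data. A constant column would give a division by zero and would need special handling.

```python
import numpy as np


def standardise_unfold_centre(X3):
    """Preprocess an I x J x K array (e.g. I GC-MS runs, J scans, K masses):
    1. standardise each mass (column k) within each chromatogram,
    2. unfold to I x (J*K),
    3. centre each unfolded variable over the I samples.
    Population standard deviations (ddof=0) are an assumption here.
    """
    X3 = np.asarray(X3, dtype=float)
    Z = (X3 - X3.mean(axis=1, keepdims=True)) / X3.std(axis=1, keepdims=True)
    Z_unf = Z.reshape(Z.shape[0], -1)
    return Z_unf - Z_unf.mean(axis=0)


# Hypothetical data: 4 runs, 8 scans, 3 masses of very different intensity
rng = np.random.default_rng(1)
X3 = rng.random((4, 8, 3)) * np.array([1.0, 10.0, 1000.0])
X_pre = standardise_unfold_centre(X3)

# Rescaling one mass leaves the result unchanged, as standardisation intends
X3_scaled = X3.copy()
X3_scaled[:, :, 2] *= 50.0
print(np.allclose(X_pre, standardise_unfold_centre(X3_scaled)))
```

Standardising after unfolding would instead scale each (j, k) variable across samples, which mixes the intensity differences between masses with genuine sample-to-sample variation.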

Standardising at the wrong stage of the analysis can result in meaningless data, so it is always essential to think carefully about the physical (and numerical) consequences of any preprocessing, which for three-way data is far more complex, and has far more options, than for simple two-way data.

After this preprocessing, all the normal multivariate calibration methods can be employed.

5.5.3.2 Trilinear PLS1

Some of the most interesting theoretical developments in chemometrics over the past few years have been in so-called ‘multiway’ or ‘multimode’ data analysis. Many such methods have been available for some years, especially in the area of psychometrics, and a few do have relevance to chemistry. It is important, though, not to get too carried away with the excitement of these novel theoretical approaches. We will restrict the discussion here to trilinear PLS1, involving a three-way x block and a single c variable. If there are several known calibrants, the simplest approach is to perform trilinear PLS1 individually on each variable.

Since centring can be fairly complex for three-way data, and there is no inherent reason to do this, for simplicity it is assumed that data are not centred, so raw concentrations and chromatographic/spectroscopic measurements are employed. The data in Table 5.17 can be considered to be arranged in the form of a cube, with three dimensions, I for the number of samples and J and K for the measurements.

Trilinear PLS1 attempts to model both the ‘x’ and ‘c’ blocks simultaneously. Here we will illustrate the use with the algorithm of Appendix A.2.4, based on methods proposed by de Jong and Bro.

Superficially, trilinear PLS1 has many of the same objectives as normal PLS1, and the method as applied to the x block is often represented diagrammatically as in Figure 5.17, replacing 'squares' or matrices by 'boxes' or tensors, and replacing, where necessary, the dot product ('.') by something called a tensor product ('⊗'). The 'c' block decomposition can be represented as per PLS1 and is omitted from the diagram for brevity. In fact, as we shall see, this is an oversimplification, and is not an entirely accurate description of the method.

Figure 5.17 Representation of trilinear PLS1: the I × J × K array X is decomposed into scores T (I × A) and two sets of weights, jP (J × A) and kP (K × A), plus a residual array E.

In trilinear PLS1, for each component it is possible to determine

a scores vector (t), of length I or 4 in this example;

a weight vector, which has analogy to a loadings vector (jp), of length J or 5 in this example, referring to one of the dimensions (e.g. time), whose sum of squares equals 1;

another weight vector, which has analogy to a loadings vector (kp), of length K or 6 in this example, referring to the other dimension (e.g. wavelength), whose sum of squares also equals 1.

Superficially these vectors are related to scores and loadings in normal PLS, but in practice they are completely different, a key reason being that these vectors are not orthogonal in trilinear PLS1, which affects the additivity of successive components. Here, we keep the terms scores and loadings simply to retain the terminology familiar from two-way data analysis.

In addition, a vector q is determined after each new component, by

$$\mathbf{q} = (\mathbf{T}'.\mathbf{T})^{-1}.\mathbf{T}'.\mathbf{c}$$

so that

$$\hat{\mathbf{c}} = \mathbf{T}.\mathbf{q}$$

or

$$\mathbf{c} = \mathbf{T}.\mathbf{q} + \mathbf{f}$$

where T is the scores matrix, whose columns are the individual scores vectors for each component, of dimensions I × A or 4 × 3 in this example if three PLS components are computed, and q is a column vector of dimensions A × 1 or 3 × 1 in our example.

A key difference from bilinear PLS1 as described in Section 5.5.1 is that the elements of q have to be recalculated afresh as new components are computed, whereas for two-way PLS the first element of q, for example, is the same no matter how many components are calculated. This limitation is a consequence of the non-orthogonality of the columns of matrix T.
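The component-by-component procedure can be sketched in NumPy. This follows the usual N-PLS1 outline (weights jp and kp from the first singular vectors of Z = Σi ci Xi computed from the current residuals, scores by contracting X with both weights, q refitted on all scores so far, then deflation); it is an assumption that this matches Appendix A.2.4 in detail, and the function name and the random data are hypothetical.

```python
import numpy as np


def trilinear_pls1(X3, c, n_comp):
    """Sketch of trilinear PLS1 on an uncentred I x J x K array X3 and a
    length-I concentration vector c.

    Returns T (I x A), jP (J x A), kP (K x A) and a list holding the q vector
    obtained after each new component.
    """
    X = np.array(X3, dtype=float)        # residual x block, deflated below
    c = np.asarray(c, dtype=float)
    c_res = c.copy()
    T, jP, kP, q_history = [], [], [], []
    for a in range(n_comp):
        Z = np.einsum("i,ijk->jk", c_res, X)         # J x K
        U, s, Vt = np.linalg.svd(Z)
        jp, kp = U[:, 0], Vt[0]                      # each has unit sum of squares
        t = np.einsum("ijk,j,k->i", X, jp, kp)       # scores for this component
        T.append(t); jP.append(jp); kP.append(kp)
        Tm = np.column_stack(T)                      # I x (a + 1)
        q = np.linalg.lstsq(Tm, c, rcond=None)[0]    # q = (T'T)^-1 T'c, afresh
        q_history.append(q)
        c_res = c - Tm @ q
        X -= np.einsum("i,j,k->ijk", t, jp, kp)      # subtract t_i jp_j kp_k
    return np.column_stack(T), np.column_stack(jP), np.column_stack(kP), q_history


# Hypothetical random data (not Table 5.17)
rng = np.random.default_rng(2)
X3 = rng.random((8, 5, 6))
c = rng.random(8)
T, jP, kP, qs = trilinear_pls1(X3, c, 3)
for q in qs:
    print(np.round(q, 4))     # compare the first element from row to row
```

Because the score vectors are not orthogonal, adding a component changes the earlier elements of q as well, unlike bilinear PLS1.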

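The x block residuals after each component are often computed conventionally by

$${}^{\mathrm{resid},a}x_{ijk} = {}^{\mathrm{resid},a-1}x_{ijk} - t_{ia}\,{}^{j}p_{ja}\,{}^{k}p_{ka}$$

where ${}^{\mathrm{resid},a}x_{ijk}$ is the residual after a components are calculated, which would lead to a model

$$\hat{x}_{ijk} = \sum_{a=1}^{A} t_{ia}\,{}^{j}p_{ja}\,{}^{k}p_{ka}$$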

Sometimes these equations are written as tensor products, but there are numerous ways of multiplying tensors together, so this notation can be confusing and it is often conceptually more convenient to deal directly with vectors and matrices, just as in Section 5.5.3.1 by unfolding the data. This procedure can be called matricisation.

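In mathematical terms, we can state that

$${}^{\mathrm{unfolded}}\hat{\mathbf{X}} = \sum_{a=1}^{A} \mathbf{t}_a . {}^{\mathrm{unfolded}}\mathbf{p}_a$$

where ${}^{\mathrm{unfolded}}\mathbf{p}_a$ is simply a row vector of length J.K. Where trilinear PLS1 differs from unfolded PLS described in Section 5.5.3.1 is that a matrix $\mathbf{P}_a$ of dimensions J × K can be obtained for each PLS component, given by

$$\mathbf{P}_a = {}^{j}\mathbf{p}_a . {}^{k}\mathbf{p}_a$$

and $\mathbf{P}_a$ is unfolded to give ${}^{\mathrm{unfolded}}\mathbf{p}_a$.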
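The equivalence between the element-wise trilinear model and its matricised form can be checked directly. In this sketch the scores and weights are hypothetical random numbers, and the outer product of the two weight vectors is unfolded in row-major order, matching an unfolding of X with j as the outer index.

```python
import numpy as np

rng = np.random.default_rng(3)          # hypothetical scores and weights
I, J, K, A = 4, 5, 6, 2
T = rng.normal(size=(I, A))
Pj = rng.normal(size=(J, A)); Pj /= np.linalg.norm(Pj, axis=0)   # unit columns
Pk = rng.normal(size=(K, A)); Pk /= np.linalg.norm(Pk, axis=0)

# Element-wise trilinear model: x_ijk = sum_a t_ia * jp_ja * kp_ka
X_hat3 = np.einsum("ia,ja,ka->ijk", T, Pj, Pk)

# Matricised form: P_a = jp_a . kp_a (J x K), unfolded to a row of length J*K
P_unf = np.vstack([np.outer(Pj[:, a], Pk[:, a]).ravel() for a in range(A)])  # A x JK
X_hat_unf = T @ P_unf                                                          # I x JK

print(np.allclose(X_hat_unf, X_hat3.reshape(I, J * K)))
```

Each unfolded loadings row is a rank-one J × K matrix laid out flat, which is the constraint that distinguishes trilinear PLS1 from PLS on freely unfolded data.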

Figure 5.18 represents this procedure, avoiding tensor multiplication by using conventional matrices and vectors together with unfolding. A key problem with the common implementation of trilinear PLS1 is that, since the scores and loadings of successive components are not orthogonal, the methods for determining residual errors are only an approximation. Hence the x block residual is not modelled very well, and the error matrices (or tensors) do not have an easily understood physical meaning. It also means that there are no obvious analogues of eigenvalues, so it is not easy to determine the size of the components or the modelling power using the x scores and loadings, but, nevertheless, the main aim is to predict the concentration (or c block),
