Brereton Chemometrics
.pdf![](/html/611/48/html_Ju8uix7TUX.TRsv/htmlconvd-VXKArY291x1.jpg)
![](/html/611/48/html_Ju8uix7TUX.TRsv/htmlconvd-VXKArY292x1.jpg)
CALIBRATION |
283 |
|
|
single compound model. If this contribution were removed the error would be reduced dramatically.
An alternative, and common, method for including the intercept is to mean centre both the x and the c variables to fit the equation
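$$c - \bar{c} \approx (x - \bar{x})\,b$$
or
$${}^{cen}c \approx {}^{cen}x\,b$$
or
$$b \approx ({}^{cen}x'.{}^{cen}x)^{-1}.{}^{cen}x'.{}^{cen}c = \frac{\sum_{i=1}^{I}(x_i - \bar{x})(c_i - \bar{c})}{\sum_{i=1}^{I}(x_i - \bar{x})^2}$$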
It is easy to show algebraically that
• the value of $b$ when both variables have been centred is identical with the value of $b_1$ obtained when the data are modelled including an intercept term (= 4.227 in this example);
• the value of $b_0$ (intercept term for uncentred data) is given by $\bar{c} - b\bar{x}$ = 0.469 − 4.227 × 0.149 = −0.173, so the two methods are related.
It is common to centre both sets of variables for this reason, the calculations being mathematically simpler than including an intercept term. Note that both blocks must be centred, and the predictions are of the concentrations minus their mean, so the mean concentration must be added back to return to the original physical values.
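The equivalence between the two ways of handling the intercept can be checked numerically. The following is a minimal sketch in Python with NumPy; the absorbances and concentrations are made up for illustration (they are not the data of this chapter) and all variable names are hypothetical.

```python
import numpy as np

# Made-up absorbances (x) and concentrations (c), for illustration only.
x = np.array([0.05, 0.10, 0.15, 0.20, 0.25, 0.30])
c = np.array([0.02, 0.26, 0.45, 0.70, 0.88, 1.12])

# Model with an explicit intercept: c ≈ b0 + b1 * x
X1 = np.column_stack([np.ones_like(x), x])
b0, b1 = np.linalg.lstsq(X1, c, rcond=None)[0]

# Mean-centred model: (c - mean(c)) ≈ (x - mean(x)) * b
cen_x = x - x.mean()
cen_c = c - c.mean()
b = (cen_x @ cen_c) / (cen_x @ cen_x)

# Intercept of the uncentred model recovered from the centred slope
b0_from_centred = c.mean() - b * x.mean()

# The centred model predicts c - mean(c), so the mean is added back
c_hat = cen_x * b + c.mean()
```

Here `b` should agree with `b1`, and `b0_from_centred` with `b0`, to within floating point error.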
Figure 5.7
Predicted (vertical) versus known (horizontal) concentrations, both in mg l−1, using the methods in Section 5.2.3
![](/html/611/48/html_Ju8uix7TUX.TRsv/htmlconvd-VXKArY294x1.jpg)
Figure 5.8
Absorbances of pure Pyr, Fluor, Benz and Ace between 330 and 345 nm (vertical axis: absorbance; horizontal axis: wavelength in nm)
Table 5.5 Matrices for four components. The first four columns form X (absorbances at 330, 335, 340 and 345 nm) and the last four columns form C (concentrations).

| 330 (X) | 335 (X) | 340 (X) | 345 (X) | Py (C) | Ace (C) | Benz (C) | Fluora (C) |
|---|---|---|---|---|---|---|---|
| 0.127 | 0.165 | 0.110 | 0.075 | 0.456 | 0.120 | 1.620 | 0.120 |
| 0.150 | 0.178 | 0.140 | 0.105 | 0.456 | 0.040 | 2.700 | 0.120 |
| 0.095 | 0.102 | 0.089 | 0.068 | 0.152 | 0.200 | 1.620 | 0.080 |
| 0.134 | 0.191 | 0.107 | 0.060 | 0.760 | 0.200 | 1.080 | 0.160 |
| 0.170 | 0.239 | 0.146 | 0.094 | 0.760 | 0.160 | 2.160 | 0.160 |
| 0.135 | 0.178 | 0.115 | 0.078 | 0.608 | 0.200 | 2.160 | 0.040 |
| 0.129 | 0.193 | 0.089 | 0.041 | 0.760 | 0.120 | 0.540 | 0.160 |
| 0.127 | 0.164 | 0.113 | 0.078 | 0.456 | 0.080 | 2.160 | 0.120 |
| 0.104 | 0.129 | 0.098 | 0.074 | 0.304 | 0.160 | 1.620 | 0.200 |
| 0.157 | 0.193 | 0.134 | 0.093 | 0.608 | 0.160 | 2.700 | 0.040 |
| 0.100 | 0.154 | 0.071 | 0.030 | 0.608 | 0.040 | 0.540 | 0.040 |
| 0.056 | 0.065 | 0.053 | 0.036 | 0.152 | 0.160 | 0.540 | 0.080 |
| 0.094 | 0.144 | 0.078 | 0.043 | 0.608 | 0.120 | 1.080 | 0.040 |
| 0.079 | 0.114 | 0.064 | 0.040 | 0.456 | 0.200 | 0.540 | 0.120 |
| 0.143 | 0.211 | 0.114 | 0.067 | 0.760 | 0.040 | 1.620 | 0.160 |
| 0.081 | 0.087 | 0.081 | 0.069 | 0.152 | 0.040 | 2.160 | 0.080 |
| 0.071 | 0.077 | 0.061 | 0.045 | 0.152 | 0.080 | 1.080 | 0.080 |
| 0.081 | 0.106 | 0.072 | 0.047 | 0.304 | 0.040 | 1.080 | 0.200 |
| 0.114 | 0.119 | 0.115 | 0.096 | 0.152 | 0.120 | 2.700 | 0.080 |
| 0.098 | 0.130 | 0.080 | 0.051 | 0.456 | 0.160 | 1.080 | 0.120 |
| 0.133 | 0.182 | 0.105 | 0.059 | 0.608 | 0.080 | 1.620 | 0.040 |
| 0.070 | 0.095 | 0.064 | 0.042 | 0.304 | 0.080 | 0.540 | 0.200 |
| 0.124 | 0.138 | 0.118 | 0.093 | 0.304 | 0.200 | 2.700 | 0.200 |
| 0.163 | 0.219 | 0.145 | 0.101 | 0.760 | 0.080 | 2.700 | 0.160 |
| 0.128 | 0.147 | 0.116 | 0.086 | 0.304 | 0.120 | 2.160 | 0.200 |
Table 5.6 Matrix B for Section 5.3.2.

| Wavelength (nm) | Py | Ace | Benz | Fluor |
|---|---|---|---|---|
| 330 | −3.870 | 2.697 | 14.812 | −4.192 |
| 335 | 8.609 | −2.391 | 3.033 | 0.489 |
| 340 | −5.098 | 4.594 | −49.076 | 7.221 |
| 345 | 1.848 | −4.404 | 65.255 | −2.910 |
This equation can be solved using the regression methods in Section 5.2.2, changing vectors and scalars to matrices, so that $B = (X'.X)^{-1}.X'.C$, giving the matrix in Table 5.6.
• If desired, represent in equation form, for example, the first column of B suggests that
$$\text{estimated [pyrene]} = -3.870A_{330} + 8.609A_{335} - 5.098A_{340} + 1.848A_{345}$$
In many areas of optical spectroscopy, these types of equations are very common. Note, though, that changing the wavelengths can have a radical influence on the coefficients, and slight wavelength irreproducibility between spectrometers can lead to equations that are not easily transferred.
• Finally, estimate the concentrations by
$$\hat{C} = X.B$$
as indicated in Table 5.7.
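The steps above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the matrices are synthetic stand-ins with the same dimensions as Table 5.5 (25 samples, four wavelengths, four compounds), the spectra and noise level are invented, and the coefficients obtained will therefore not reproduce Table 5.6.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for Table 5.5 (assumed values, not the book's data)
I, J, N = 25, 4, 4
C = rng.uniform(0.1, 1.0, size=(I, N))                            # known concentrations
S = 0.4 * np.eye(N, J) + rng.uniform(0.0, 0.1, size=(N, J))       # invented pure spectra
X = C @ S + rng.normal(0.0, 0.002, size=(I, J))                   # absorbances plus small noise

# B = (X'X)^-1 X'C, obtained with a least squares solver
B, *_ = np.linalg.lstsq(X, C, rcond=None)

# The same matrix from the normal equations, written as in the text
B_normal = np.linalg.solve(X.T @ X, X.T @ C)

# Estimated concentrations, C_hat = X.B
C_hat = X @ B

# Column n of B holds the coefficients of the equation for compound n,
# one coefficient per wavelength
coefficients_first_compound = B[:, 0]
```

Solving with `lstsq` or `solve` rather than forming the inverse explicitly is a numerical convenience; algebraically both give the matrix written as $(X'.X)^{-1}.X'.C$.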
The estimates by this approach are very much better than the univariate approaches in this particular example. Figure 5.9 shows the predicted versus known concentrations for pyrene. The root mean square error of prediction is now
$$E = \sqrt{\sum_{i=1}^{I}(c_i - \hat{c}_i)^2 / 21}$$
(note that the divisor is 21 not 25 as four degrees of freedom are lost because there are four compounds in the model), equal to 0.042 or 9.13 % of the average concentration, a significant improvement. Further improvement could be obtained by including the intercept (usually performed by centring the data) and including the concentrations of more compounds. However, the number of wavelengths must be increased if more compounds are used in the model.
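The error calculation, with its reduced divisor, can be written as a small helper. The sketch below is in Python with NumPy; the function names and the toy numbers are hypothetical, chosen only to show the divisor $I - N$ (25 − 4 = 21 here) and the conversion to a percentage of the mean concentration.

```python
import numpy as np

def rmse_of_prediction(c_true, c_pred, n_compounds):
    """Root mean square error using the divisor I - N (samples minus compounds)."""
    c_true = np.asarray(c_true, dtype=float)
    c_pred = np.asarray(c_pred, dtype=float)
    dof = c_true.size - n_compounds
    if dof <= 0:
        raise ValueError("no degrees of freedom left for the error estimate")
    return np.sqrt(np.sum((c_true - c_pred) ** 2) / dof)

def percent_error(c_true, c_pred, n_compounds):
    """The same error expressed as a percentage of the mean true concentration."""
    return 100.0 * rmse_of_prediction(c_true, c_pred, n_compounds) / np.mean(c_true)

# Toy data (assumed): 25 samples, every prediction too high by 0.04
c_true = np.linspace(0.1, 0.9, 25)
c_pred = c_true + 0.04

E = rmse_of_prediction(c_true, c_pred, n_compounds=4)
E_pct = percent_error(c_true, c_pred, n_compounds=4)
```

Dividing by 21 rather than 25 makes the reported error slightly larger than the plain root mean square of the residuals, which is the intended correction for the four fitted compounds.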
It is possible also to employ classical methods. For the single detector, single wavelength model in Section 2.1.1,
$$\hat{c} = x(1/s)$$
where s is a scalar and x and c are vectors corresponding to the absorbances and concentrations, respectively, for each of the I samples. Where there are several components in the mixture, this becomes
$$\hat{C} = X.S'.(S.S')^{-1}$$
![](/html/611/48/html_Ju8uix7TUX.TRsv/htmlconvd-VXKArY296x1.jpg)
and the trick is to estimate S, which can be done in one of two ways: (1) by knowledge of the true spectra or (2) by regression since $C.S \approx X$, so $\hat{S} = (C'.C)^{-1}.C'.X$. Note that
$$B \approx S'.(S.S')^{-1}$$
However, as in univariate calibration, the coefficients obtained using both approaches may not be exactly equal, as each method makes different assumptions about error structure.
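The relationship between the classical and inverse coefficient matrices can be explored numerically. The following Python/NumPy sketch uses synthetic data (invented spectra, concentrations and noise, with hypothetical variable names) to compute both matrices and the largest difference between them; it is an illustration, not a reproduction of the case study.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic four-compound, four-wavelength example (assumed values)
I, N, J = 25, 4, 4
C = rng.uniform(0.1, 1.0, size=(I, N))
S_true = 0.4 * np.eye(N, J) + rng.uniform(0.0, 0.1, size=(N, J))
X = C @ S_true + rng.normal(0.0, 0.002, size=(I, J))

# Classical route: estimate the spectra by regression, S_hat = (C'C)^-1 C'X
S_hat = np.linalg.solve(C.T @ C, C.T @ X)

# ... then form S_hat'.(S_hat.S_hat')^-1 and predict the concentrations
B_classical = S_hat.T @ np.linalg.inv(S_hat @ S_hat.T)
C_hat_classical = X @ B_classical

# Inverse route: B = (X'X)^-1 X'C
B_inverse = np.linalg.solve(X.T @ X, X.T @ C)
C_hat_inverse = X @ B_inverse

# The two coefficient matrices are similar but in general not identical
max_coefficient_difference = np.max(np.abs(B_classical - B_inverse))
```

With noise-free data the two routes would coincide; the difference seen here comes from the noise being attributed to X in one case and to C in the other.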
Such equations make assumptions that the concentrations of the significant analytes are all known, and work well only if this is true. Application to mixtures where there are unknown interferents can result in serious estimation errors.
5.3.3 Multivariate Approaches
The methods in Section 5.3.2 could be extended to all 10 PAHs, and with appropriate choice of 10 wavelengths may give reasonable estimates of concentrations. However, all the wavelengths contain some information and there is no reason why most of the spectrum cannot be employed.
There is a fairly confusing literature on the use of multiple linear regression for calibration in chemometrics, primarily because many workers present their arguments in a very formalised manner. However, the choice and applicability of any method depends on three main factors:
1. the number of compounds in the mixture (N = 10 in this case) or responses to be estimated;
2. the number of experiments (I = 25 in this case), often spectra or chromatograms;
3. the number of variables (J = 27 wavelengths in this case).
In order to have a sensible model, the number of compounds must be less than or equal to the smaller of the number of experiments or number of variables. In certain specialised cases this limitation can be infringed if it is known that there are correlations between concentrations of different compounds. This may happen, for example, in environmental chemistry, where there could be tens or hundreds of compounds in a sample, but the presence of one (e.g. a homologous series) indicates the presence of another, so, in practice there are only a few independent factors or groups of compounds. Also, correlations can be built into the design. In most real world situations there definitely will be correlations in complex multicomponent mixtures. However, the methods described below are for the case where the number of compounds is smaller than the number of experiments or number of detectors.
The X data matrix is ideally related to the concentration and spectral matrices by
X ≈ C .S
where X is a 25 × 27 matrix, C a 25 × 10 matrix and S a 10 × 27 matrix in the example discussed here. In calibration it is assumed that a series of experiments are performed in which C is known (e.g. a set of mixtures of compounds with known concentrations are recorded spectroscopically). An estimate of S can then be obtained by
$$\hat{S} = (C'.C)^{-1}.C'.X$$
![](/html/611/48/html_Ju8uix7TUX.TRsv/htmlconvd-VXKArY298x1.jpg)
and then the concentrations can be predicted using
$$\hat{C} = X.\hat{S}'.(\hat{S}.\hat{S}')^{-1}$$
exactly as above. This can be extended to estimating the concentrations in any unknown spectrum by
$$\hat{c} = x.B = x.\hat{S}'.(\hat{S}.\hat{S}')^{-1}$$
Unless the number of experiments is exactly equal to the number of compounds, the prediction will not completely model the data. This approach works because the matrices $(C'.C)$ and $(\hat{S}.\hat{S}')$ are square matrices whose dimensions equal the number of compounds in the mixture (10 × 10) and have inverses, provided that experiments have been suitably designed and the concentrations of the compounds are not correlated. The predicted concentrations, using this approach, are given in Table 5.8, together with the percentage root mean square prediction error; note that there are only 15 degrees of freedom (= 25 experiments − 10 compounds). Had the data been centred, the number of degrees of freedom would be reduced further. The predicted concentrations are reasonably good for most compounds apart from acenaphthylene.
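The full calculation, with the dimensions of the case study (25 experiments, 10 compounds, 27 wavelengths), can be sketched as follows in Python with NumPy. The data are synthetic and the names hypothetical, so the errors obtained bear no relation to Table 5.8; the sketch only shows the sequence of matrix operations and the 15 degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data with the dimensions of the case study (assumed values)
I, N, J = 25, 10, 27
C = rng.uniform(0.1, 1.0, size=(I, N))
S_true = rng.uniform(0.0, 1.0, size=(N, J))
X = C @ S_true + rng.normal(0.0, 0.001, size=(I, J))

# C'C is only invertible if the concentration columns are not collinear
if np.linalg.matrix_rank(C) < N:
    raise ValueError("concentrations are correlated; C'C cannot be inverted")

# S_hat = (C'C)^-1 C'X, an N x J matrix of estimated spectra
S_hat = np.linalg.solve(C.T @ C, C.T @ X)

# B = S_hat'.(S_hat.S_hat')^-1, a J x N matrix
B = S_hat.T @ np.linalg.inv(S_hat @ S_hat.T)
C_hat = X @ B

# Percentage root mean square error per compound, with I - N degrees of freedom
dof = I - N
E = np.sqrt(np.sum((C - C_hat) ** 2, axis=0) / dof)
E_percent = 100.0 * E / C.mean(axis=0)

# Concentrations in a new, noise-free "unknown" spectrum
c_new = rng.uniform(0.1, 1.0, size=N)
x_new = c_new @ S_true
c_new_hat = x_new @ B
```

The rank check is one simple way of detecting a design in which $(C'.C)$ has no inverse; a condition number would give a more graded warning.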
Table 5.8 Estimated concentrations for the case study using uncentred MLR and all wavelengths. PAH concentrations are in mg l−1.

| Spectrum No. | Py | Ace | Anth | Acy | Chry | Benz | Fluora | Fluore | Nap | Phen |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.509 | 0.092 | 0.200 | 0.151 | 0.369 | 1.731 | 0.121 | 0.654 | 0.090 | 0.433 |
| 2 | 0.438 | 0.100 | 0.297 | 0.095 | 0.488 | 2.688 | 0.148 | 0.276 | 0.151 | 0.744 |
| 3 | 0.177 | 0.150 | 0.303 | 0.217 | 0.540 | 1.667 | 0.068 | 0.896 | 0.174 | 0.128 |
| 4 | 0.685 | 0.177 | 0.234 | 0.150 | 0.369 | 1.099 | 0.128 | 0.691 | 0.026 | 0.728 |
| 5 | 0.836 | 0.137 | 0.304 | 0.155 | 0.224 | 2.146 | 0.159 | 0.272 | 0.194 | 0.453 |
| 6 | 0.593 | 0.232 | 0.154 | 0.042 | 0.435 | 2.185 | 0.071 | 0.883 | 0.146 | 1.030 |
| 7 | 0.777 | 0.164 | 0.107 | 0.129 | 0.497 | 0.439 | 0.189 | 0.390 | 0.158 | 0.206 |
| 8 | 0.419 | 0.040 | 0.198 | 0.284 | 0.044 | 2.251 | 0.143 | 1.280 | 0.088 | 0.299 |
| 9 | 0.323 | 0.141 | 0.247 | 0.037 | 0.462 | 1.621 | 0.196 | 0.101 | −0.003 | 0.298 |
| 10 | 0.578 | 0.236 | 0.020 | 0.107 | 0.358 | 2.659 | 0.093 | 0.036 | 0.070 | 0.305 |
| 11 | 0.621 | 0.051 | 0.214 | 0.111 | 0.571 | 0.458 | 0.062 | 0.428 | 0.022 | 0.587 |
| 12 | 0.166 | 0.187 | 0.170 | 0.142 | 0.087 | 0.542 | 0.100 | 0.343 | 0.103 | 0.748 |
| 13 | 0.580 | 0.077 | 0.248 | 0.133 | 0.051 | 1.120 | −0.042 | 0.689 | 0.176 | 0.447 |
| 14 | 0.468 | 0.248 | 0.057 | −0.006 | 0.237 | 0.558 | 0.157 | 0.712 | 0.103 | 0.351 |
| 15 | 0.770 | 0.016 | 0.066 | 0.119 | 0.094 | 1.680 | 0.187 | 0.450 | 0.080 | 0.920 |
| 16 | 0.101 | 0.026 | 0.100 | 0.041 | 0.338 | 2.230 | 0.102 | 0.401 | 0.201 | 0.381 |
| 17 | 0.169 | 0.115 | 0.063 | 0.069 | 0.478 | 1.054 | 0.125 | 0.829 | 0.068 | 0.523 |
| 18 | 0.271 | 0.079 | 0.142 | 0.106 | 0.222 | 1.086 | 0.211 | 0.254 | 0.151 | 0.261 |
| 19 | 0.171 | 0.152 | 0.216 | 0.059 | 0.274 | 2.587 | 0.081 | 0.285 | 0.013 | 0.925 |
| 20 | 0.399 | 0.116 | 0.095 | 0.170 | 0.514 | 1.133 | 0.101 | 0.321 | 0.243 | 1.023 |
| 21 | 0.651 | 0.025 | 0.146 | 0.232 | 0.230 | 1.610 | −0.013 | 0.940 | 0.184 | 0.616 |
| 22 | 0.295 | 0.135 | 0.256 | 0.052 | 0.349 | 0.502 | 0.237 | 0.970 | 0.161 | 1.037 |
| 23 | 0.296 | 0.214 | 0.116 | 0.069 | 0.144 | 2.589 | 0.202 | 0.785 | 0.162 | 0.588 |
| 24 | 0.774 | 0.085 | 0.187 | −0.026 | 0.547 | 2.671 | 0.128 | 1.107 | 0.108 | 0.329 |
| 25 | 0.324 | 0.035 | 0.036 | 0.361 | 0.472 | 2.217 | 0.094 | 0.918 | 0.128 | 0.779 |
| E% | 9.79 | 44.87 | 15.58 | 69.43 | 13.67 | 4.71 | 40.82 | 31.38 | 29.22 | 16.26 |
![](/html/611/48/html_Ju8uix7TUX.TRsv/htmlconvd-VXKArY299x1.jpg)
Figure 5.10
Normalised spectra of the 10 PAHs estimated by MLR, pyrene in bold (horizontal axis: wavelength, 220 to 340 nm)
The predicted spectra are presented in Figure 5.10, and are not nearly as well predicted as the concentrations. In fact, it would be remarkable if, for such a complex mixture, it were possible to reconstruct 10 spectra well, given that there is a great deal of overlap. Pyrene, which is indicated in bold, exhibits most of the main peak maxima of the known pure data (compare with Figure 5.3). Often, other knowledge of the system is required to produce better reconstructions of individual spectra. The reason why concentration predictions appear to work significantly better than spectral reconstruction is that, for most compounds, there are characteristic regions of the spectrum containing prominent features. These parts of the spectra for individual compounds will be predicted well, and will disproportionately influence the effectiveness of the method for determining concentrations. However, MLR as described in this section is not an effective method for determining spectra in complex mixtures, and should be employed primarily as a way of determining concentrations.
MLR predicts concentrations well in this case because all significant compounds are included in the model, and so the data are almost completely modelled. If we knew of only a few compounds, there would be much poorer predictions. Consider the situation in which only pyrene, acenaphthene and anthracene are known. The C matrix now has only three columns, and the predicted concentrations are given in Table 5.9. The errors are, as expected, much larger than those in Table 5.8. The absorbances of the remaining seven compounds are mixed up with those of the three modelled components. This problem could be overcome if some characteristic wavelengths or regions of the spectrum at which the selected compounds absorb most strongly are identified, or if the experiments were designed so that there are correlations in the data, or even by a number of methods for weighted regression, but the need to provide information about all significant compounds is a major limitation of MLR.
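The effect of leaving compounds out of the model can be illustrated with synthetic data. In the Python/NumPy sketch below, the spectra, concentrations and noise are invented and the helper names are hypothetical; the same 10-compound mixtures are modelled once with all 10 concentration columns and once with only the first three, and the prediction errors for those three compounds are compared.

```python
import numpy as np

def classical_mlr_predict(X, C_known):
    """Estimate spectra from the known concentrations, then predict them back."""
    S_hat = np.linalg.solve(C_known.T @ C_known, C_known.T @ X)
    return X @ S_hat.T @ np.linalg.inv(S_hat @ S_hat.T)

def rms_error(c_true, c_pred, dof):
    """Root mean square error per compound for a given number of degrees of freedom."""
    return np.sqrt(np.sum((c_true - c_pred) ** 2, axis=0) / dof)

rng = np.random.default_rng(3)

# Synthetic 10-compound mixtures (assumed values, not the case study data)
I, N, J = 25, 10, 27
C = rng.uniform(0.1, 1.0, size=(I, N))
S = rng.uniform(0.0, 1.0, size=(N, J))
X = C @ S + rng.normal(0.0, 0.001, size=(I, J))

# All 10 compounds in the model
C_hat_full = classical_mlr_predict(X, C)

# Only the first three compounds in the model; the other seven are unmodelled
C_hat_partial = classical_mlr_predict(X, C[:, :3])

err_full = rms_error(C[:, :3], C_hat_full[:, :3], I - N)
err_partial = rms_error(C[:, :3], C_hat_partial, I - 3)
```

In this sketch the absorbance of the seven unmodelled compounds is absorbed into the three estimated spectra, so `err_partial` should be much larger than `err_full`, which mirrors the comparison between Tables 5.8 and 5.9.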
The approach described above is a form of classical calibration, and it is also possible to envisage an inverse calibration model since
$$\hat{C} = X.B$$