
282	CHEMOMETRICS

Exactly the same principles can be employed for calculating the coefficients as in Section 2.1.2, but in this case b is a vector rather than a scalar, and X is a matrix rather than a vector, so that

$$\mathbf{b} \approx (\mathbf{X}'.\mathbf{X})^{-1}.\mathbf{X}'.\mathbf{c}$$

giving

$$\hat{c} = -0.173 + 4.227x$$

Note that the coefﬁcients are different to those of Section 5.2.2. One reason is that there are still a number of interferents, from the other PAHs, in the spectrum at 335 nm, and these are modelled partly by the intercept term. The models of the previous sections force the best ﬁt straight line to pass through the origin. A better ﬁt can be obtained if this condition is not required. The new best ﬁt straight line is presented in Figure 5.6 and results, visually, in a much better ﬁt to the data.

The predicted concentrations are fairly easy to obtain, the easiest approach involving the use of matrix based methods, so that

$$\hat{\mathbf{c}} = \mathbf{X}.\mathbf{b}$$

the root mean square error being given by

$$E = \sqrt{0.229/23} = 0.100 \text{ mg l}^{-1}$$

representing an E% of 21.8 % relative to the mean. Note that the sum of squared residuals should be divided by 23 (the number of degrees of freedom) rather than 25, to reflect the two parameters used in the model.
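As a rough check on the numbers above, the calculation can be sketched in Python with NumPy (an assumption; the text itself gives no code). The sketch uses the 335 nm absorbances and the pyrene concentrations listed in Table 5.5, and the helper names `fit_with_intercept` and `rmse` are hypothetical.

```python
import numpy as np

# Absorbances at 335 nm and pyrene concentrations (mg/l) for the 25 spectra,
# copied from Table 5.5.
x = np.array([0.165, 0.178, 0.102, 0.191, 0.239, 0.178, 0.193, 0.164, 0.129,
              0.193, 0.154, 0.065, 0.144, 0.114, 0.211, 0.087, 0.077, 0.106,
              0.119, 0.130, 0.182, 0.095, 0.138, 0.219, 0.147])
c = np.array([0.456, 0.456, 0.152, 0.760, 0.760, 0.608, 0.760, 0.456, 0.304,
              0.608, 0.608, 0.152, 0.608, 0.456, 0.760, 0.152, 0.152, 0.304,
              0.152, 0.456, 0.608, 0.304, 0.304, 0.760, 0.304])


def fit_with_intercept(x, c):
    """Return (X, b), where X = [1, x] and b = (X'X)^-1 X'c."""
    X = np.column_stack([np.ones_like(x), x])
    b = np.linalg.solve(X.T @ X, X.T @ c)
    return X, b


def rmse(c, c_hat, n_params):
    """Root mean square error with I - n_params degrees of freedom."""
    residual_ss = np.sum((c - c_hat) ** 2)
    return np.sqrt(residual_ss / (len(c) - n_params))


X, b = fit_with_intercept(x, c)
c_hat = X @ b                          # predicted concentrations, c_hat = X.b
E = rmse(c, c_hat, n_params=2)         # divisor is 25 - 2 = 23
E_percent = 100 * E / c.mean()

print(f"b0 = {b[0]:.3f}, b1 = {b[1]:.3f}")
print(f"E = {E:.3f} mg/l, E% = {E_percent:.1f}")
```

Run as written, this should give coefficients and an error close to those quoted in the text; small differences in the last digit can arise because the tabulated data are rounded to three decimal places.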

One interesting and important consideration is that the apparent root mean square error in Sections 5.2.2 and 5.2.3 is only reduced by a small amount, yet the best fit straight line appears much worse if we neglect the intercept. The reason for this is that there is still a considerable replicate error, and this cannot readily be modelled using a single compound model. If this contribution were removed the error would be reduced dramatically.

Figure 5.6

Best fit straight line using inverse calibration: data of Figure 5.5 and an intercept term (axes: absorbance, AU, and concentration, mg l−1)

An alternative, and common, method for including the intercept is to mean centre both the x and the c variables to ﬁt the equation

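$$c - \bar{c} \approx (x - \bar{x})\,b$$

or

$${}^{cen}\mathbf{c} \approx {}^{cen}\mathbf{x}\,b$$

so that

$$b \approx ({}^{cen}\mathbf{x}'.{}^{cen}\mathbf{x})^{-1}.{}^{cen}\mathbf{x}'.{}^{cen}\mathbf{c} = \frac{\sum_{i=1}^{I}(x_i - \bar{x})(c_i - \bar{c})}{\sum_{i=1}^{I}(x_i - \bar{x})^2}$$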

It is easy to show algebraically that

• the value of b when both variables have been centred is identical with the value of b1 obtained when the data are modelled including an intercept term (= 4.227 in this example);

• the value of b0 (the intercept term for uncentred data) is given by $\bar{c} - b\bar{x}$ = 0.456 − 4.227 × 0.1488 = −0.173, so the two methods are related.

It is common to centre both sets of variables for this reason, the calculations being mathematically simpler than including an intercept term. Note that both blocks must be centred, and the predictions are of the concentrations minus their mean, so the mean concentration must be added back to return to the original physical values.
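The algebra behind the two statements above is short. A sketch, assuming ordinary least squares for the model $\hat{c}_i = b_0 + b_1 x_i$ (the symbol $R$ for the residual sum of squares is introduced here only for illustration):

```latex
\begin{align*}
R(b_0, b_1) &= \sum_{i=1}^{I} (c_i - b_0 - b_1 x_i)^2 \\
\frac{\partial R}{\partial b_0} = 0
  \;&\Rightarrow\; \sum_{i=1}^{I} (c_i - b_0 - b_1 x_i) = 0
  \;\Rightarrow\; b_0 = \bar{c} - b_1 \bar{x} \\
R(b_1) &= \sum_{i=1}^{I} \bigl[(c_i - \bar{c}) - b_1 (x_i - \bar{x})\bigr]^2 \\
\frac{\partial R}{\partial b_1} = 0
  \;&\Rightarrow\; b_1 = \frac{\sum_{i=1}^{I} (x_i - \bar{x})(c_i - \bar{c})}
                             {\sum_{i=1}^{I} (x_i - \bar{x})^2}
\end{align*}
```

The last line is exactly the estimate of b for the centred data, and the second line is the expression for the intercept.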

Figure 5.7

Predicted (vertical) versus known (horizontal) concentrations using the methods in Section 5.2.3 (both axes in mg l−1)


In calibration it is common to plot a graph of predicted versus observed concentrations as presented in Figure 5.7. This looks superﬁcially similar to that in the previous ﬁgure, but the vertical scale is different and the graph goes through the origin (providing the data have been mean centred). There is a variety of potential graphical output and it is important not to be confused, but to distinguish each type of information carefully.

It is important to realise that the predictions for the method described in this section differ from those obtained for the uncentred data. It is also useful to realise that similar methods can be applied to classical calibration, the details being omitted for brevity, as it is recommended that inverse calibration is performed in normal circumstances.

5.3 Multiple Linear Regression

5.3.1 Multidetector Advantage

Multiple linear regression (MLR) is an extension when more than one response is employed. There are two principal reasons for this. The ﬁrst is that there may be more than one component in a mixture. Under such circumstances it is usual to employ more than one response (the exception being if the concentrations of some of the components are known to be correlated): for N components, at least N wavelengths should normally be used. The second is that each detector contains extra, and often complementary, information: some individual wavelengths in a spectrum may be inﬂuenced by noise or unknown interferents. Using, for example, 100 wavelengths averages out the information, and will often provide a better result than relying on a single wavelength.

5.3.2 Multiwavelength Equations

In certain applications, equations can be developed that are used to predict the concentrations of compounds by monitoring at a finite number of wavelengths. A classical area is in pigment analysis by electronic absorption spectroscopy, for example in the area of chlorophyll chemistry. In order to determine the concentration of four pigments in a mixture, investigators recommend monitoring at four different wavelengths and using an equation that links absorbance at each wavelength to concentration of the pigments.

In the PAH case study, only certain compounds absorb above 330 nm, the main ones being pyrene, fluoranthene, acenaphthylene and benzanthracene (note that the small absorbance due to a fifth component may be regarded as an interferent, although adding this to the model will, of course, result in better predictions). It is possible to choose four wavelengths, preferably ones in which the absorbance ratios of these four compounds differ. The absorbances at wavelengths 330, 335, 340 and 345 nm are indicated in Figure 5.8. Of course, it is not necessary to select four sequential wavelengths; any four wavelengths would be sufficient, provided that the four compounds are the main ones represented by these variables. This gives an X matrix with four columns and 25 rows.

Calibration equations can be obtained, as follows, using inverse methods.

• First, select the absorbances of the 25 spectra at these four wavelengths.

• Second, obtain the corresponding C matrix consisting of the relevant concentrations. These new (reduced) matrices are presented in Table 5.5.

• The aim is to find coefficients B relating X and C by $\mathbf{C} \approx \mathbf{X}.\mathbf{B}$, where B is a 4 × 4 matrix, each column representing a compound and each row a wavelength.


Figure 5.8

Absorbances of pure Pyr, Fluor, Benz and Ace between 330 and 345 nm (vertical axis: absorbance; horizontal axis: wavelength, nm)

Table 5.5 Matrices for four components: X (absorbance at each wavelength, nm) and C (concentration of each compound, mg l−1).

X                               C
330     335     340     345     Py      Ace     Benz    Fluora

0.127   0.165   0.110   0.075   0.456   0.120   1.620   0.120
0.150   0.178   0.140   0.105   0.456   0.040   2.700   0.120
0.095   0.102   0.089   0.068   0.152   0.200   1.620   0.080
0.134   0.191   0.107   0.060   0.760   0.200   1.080   0.160
0.170   0.239   0.146   0.094   0.760   0.160   2.160   0.160
0.135   0.178   0.115   0.078   0.608   0.200   2.160   0.040
0.129   0.193   0.089   0.041   0.760   0.120   0.540   0.160
0.127   0.164   0.113   0.078   0.456   0.080   2.160   0.120
0.104   0.129   0.098   0.074   0.304   0.160   1.620   0.200
0.157   0.193   0.134   0.093   0.608   0.160   2.700   0.040
0.100   0.154   0.071   0.030   0.608   0.040   0.540   0.040
0.056   0.065   0.053   0.036   0.152   0.160   0.540   0.080
0.094   0.144   0.078   0.043   0.608   0.120   1.080   0.040
0.079   0.114   0.064   0.040   0.456   0.200   0.540   0.120
0.143   0.211   0.114   0.067   0.760   0.040   1.620   0.160
0.081   0.087   0.081   0.069   0.152   0.040   2.160   0.080
0.071   0.077   0.061   0.045   0.152   0.080   1.080   0.080
0.081   0.106   0.072   0.047   0.304   0.040   1.080   0.200
0.114   0.119   0.115   0.096   0.152   0.120   2.700   0.080
0.098   0.130   0.080   0.051   0.456   0.160   1.080   0.120
0.133   0.182   0.105   0.059   0.608   0.080   1.620   0.040
0.070   0.095   0.064   0.042   0.304   0.080   0.540   0.200
0.124   0.138   0.118   0.093   0.304   0.200   2.700   0.200
0.163   0.219   0.145   0.101   0.760   0.080   2.700   0.160
0.128   0.147   0.116   0.086   0.304   0.120   2.160   0.200


Table 5.6 Matrix B for Section 5.3.2.

Wavelength (nm)   Py        Ace       Benz      Fluor

330               −3.870    2.697     14.812    −4.192
335               8.609     −2.391    3.033     0.489
340               −5.098    4.594     −49.076   7.221
345               1.848     −4.404    65.255    −2.910

This equation can be solved using the regression methods in Section 5.2.2, changing vectors and scalars to matrices, so that $\mathbf{B} = (\mathbf{X}'.\mathbf{X})^{-1}.\mathbf{X}'.\mathbf{C}$, giving the matrix in Table 5.6.

• If desired, represent in equation form; for example, the first column of B suggests that

$$\text{estimated [pyrene]} = -3.870A_{330} + 8.609A_{335} - 5.098A_{340} + 1.848A_{345}$$

In many areas of optical spectroscopy, these types of equations are very common. Note, though, that changing the wavelengths can have a radical influence on the coefficients, and slight wavelength irreproducibility between spectrometers can lead to equations that are not easily transferred.

• Finally, estimate the concentrations by

$$\hat{\mathbf{C}} = \mathbf{X}.\mathbf{B}$$

as indicated in Table 5.7.

The estimates by this approach are very much better than the univariate approaches in this particular example. Figure 5.9 shows the predicted versus known concentrations for pyrene. The root mean square error of prediction is now

$$E = \sqrt{\sum_{i=1}^{I} (c_i - \hat{c}_i)^2 / 21}$$

(note that the divisor is 21 not 25 as four degrees of freedom are lost because there are four compounds in the model), equal to 0.042 mg l−1 or 9.13 % of the average concentration, a significant improvement. Further improvement could be obtained by including the intercept (usually performed by centring the data) and including the concentrations of more compounds. However, the number of wavelengths must be increased if more compounds are used in the model.
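The steps listed above (select X and C, solve for B, predict, compute the error) can be sketched as follows. This is an illustrative sketch in Python with NumPy, not the author's own code; the array name `table_5_5` is hypothetical and the numbers are copied from Table 5.5. Because the tabulated data are rounded, the coefficients obtained need not agree with Table 5.6 in every digit.

```python
import numpy as np

# Table 5.5: columns 0-3 are absorbances at 330, 335, 340 and 345 nm;
# columns 4-7 are the Py, Ace, Benz and Fluora concentrations (mg/l).
table_5_5 = np.array([
    [0.127, 0.165, 0.110, 0.075, 0.456, 0.120, 1.620, 0.120],
    [0.150, 0.178, 0.140, 0.105, 0.456, 0.040, 2.700, 0.120],
    [0.095, 0.102, 0.089, 0.068, 0.152, 0.200, 1.620, 0.080],
    [0.134, 0.191, 0.107, 0.060, 0.760, 0.200, 1.080, 0.160],
    [0.170, 0.239, 0.146, 0.094, 0.760, 0.160, 2.160, 0.160],
    [0.135, 0.178, 0.115, 0.078, 0.608, 0.200, 2.160, 0.040],
    [0.129, 0.193, 0.089, 0.041, 0.760, 0.120, 0.540, 0.160],
    [0.127, 0.164, 0.113, 0.078, 0.456, 0.080, 2.160, 0.120],
    [0.104, 0.129, 0.098, 0.074, 0.304, 0.160, 1.620, 0.200],
    [0.157, 0.193, 0.134, 0.093, 0.608, 0.160, 2.700, 0.040],
    [0.100, 0.154, 0.071, 0.030, 0.608, 0.040, 0.540, 0.040],
    [0.056, 0.065, 0.053, 0.036, 0.152, 0.160, 0.540, 0.080],
    [0.094, 0.144, 0.078, 0.043, 0.608, 0.120, 1.080, 0.040],
    [0.079, 0.114, 0.064, 0.040, 0.456, 0.200, 0.540, 0.120],
    [0.143, 0.211, 0.114, 0.067, 0.760, 0.040, 1.620, 0.160],
    [0.081, 0.087, 0.081, 0.069, 0.152, 0.040, 2.160, 0.080],
    [0.071, 0.077, 0.061, 0.045, 0.152, 0.080, 1.080, 0.080],
    [0.081, 0.106, 0.072, 0.047, 0.304, 0.040, 1.080, 0.200],
    [0.114, 0.119, 0.115, 0.096, 0.152, 0.120, 2.700, 0.080],
    [0.098, 0.130, 0.080, 0.051, 0.456, 0.160, 1.080, 0.120],
    [0.133, 0.182, 0.105, 0.059, 0.608, 0.080, 1.620, 0.040],
    [0.070, 0.095, 0.064, 0.042, 0.304, 0.080, 0.540, 0.200],
    [0.124, 0.138, 0.118, 0.093, 0.304, 0.200, 2.700, 0.200],
    [0.163, 0.219, 0.145, 0.101, 0.760, 0.080, 2.700, 0.160],
    [0.128, 0.147, 0.116, 0.086, 0.304, 0.120, 2.160, 0.200],
])
X, C = table_5_5[:, :4], table_5_5[:, 4:]

# Inverse calibration without an intercept: B = (X'X)^-1 X'C.
B = np.linalg.solve(X.T @ X, X.T @ C)
C_hat = X @ B

# Four coefficients are fitted per compound, so 25 - 4 = 21 degrees of freedom.
dof = X.shape[0] - X.shape[1]
E = np.sqrt(np.sum((C - C_hat) ** 2, axis=0) / dof)
E_percent = 100 * E / C.mean(axis=0)

np.set_printoptions(precision=3, suppress=True)
print("B =\n", B)
print("E (mg/l):", E)
print("E%:", E_percent)
```

Each column of `B` corresponds to one compound and each row to one wavelength, matching the layout of Table 5.6; the first entries of `E` and `E_percent` refer to pyrene.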

It is possible also to employ classical methods. For the single detector, single wavelength model in Section 2.1.1,

$$\hat{\mathbf{c}} = \mathbf{x}(1/s)$$

where s is a scalar and x and c are vectors corresponding to the absorbances and concentrations, respectively, for each of the I samples. Where there are several components in the mixture, this becomes

$$\hat{\mathbf{C}} = \mathbf{X}.\mathbf{S}'.(\mathbf{S}.\mathbf{S}')^{-1}$$

Table 5.7 Estimated concentrations (mg l−1) for four components as described in Section 5.3.2.

Py      Ace     Benz    Fluor

0.507   0.123   1.877   0.124
0.432   0.160   2.743   0.164
0.182   0.122   1.786   0.096
0.691   0.132   1.228   0.130
0.829   0.144   2.212   0.185
0.568   0.123   1.986   0.125
0.784   0.115   0.804   0.077
0.488   0.126   1.923   0.137
0.345   0.096   1.951   0.119
0.543   0.168   2.403   0.133
0.632   0.096   0.421   0.081
0.139   0.081   0.775   0.075
0.558   0.078   0.807   0.114
0.423   0.058   0.985   0.070
0.806   0.110   1.535   0.132
0.150   0.079   1.991   0.087
0.160   0.089   1.228   0.050
0.319   0.089   1.055   0.095
0.174   0.128   2.670   0.131
0.426   0.096   1.248   0.082
0.626   0.146   1.219   0.118
0.298   0.071   0.925   0.093
0.278   0.137   2.533   0.129
0.702   0.137   2.553   0.177
0.338   0.148   2.261   0.123

Figure 5.9

Predicted versus known concentration of pyrene, using a four component model and the wavelengths 330, 335, 340 and 345 nm (uncentred); both axes in mg l−1


and the trick is to estimate S, which can be done in one of two ways: (1) by knowledge of the true spectra or (2) by regression since $\mathbf{C}.\mathbf{S} \approx \mathbf{X}$, so $\hat{\mathbf{S}} = (\mathbf{C}'.\mathbf{C})^{-1}.\mathbf{C}'.\mathbf{X}$. Note that

$$\mathbf{B} \approx \mathbf{S}'.(\mathbf{S}.\mathbf{S}')^{-1}$$

However, as in univariate calibration, the coefﬁcients obtained using both approaches may not be exactly equal, as each method makes different assumptions about error structure.
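A small numerical experiment can make this point concrete. The sketch below is an illustration under stated assumptions, not part of the case study: it uses simulated (hypothetical) concentrations and spectra for four compounds at four wavelengths, and the function names `inverse_B` and `classical_B` are invented for the example. With noise free data the two routes to B coincide; once noise is added to X they no longer agree exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
I, N = 25, 4                      # experiments; compounds (= wavelengths here)

# Hypothetical data: random concentrations C and pure spectra S, so X = C.S.
C = rng.uniform(0.1, 1.0, size=(I, N))
S = rng.uniform(0.1, 1.0, size=(N, N)) + np.eye(N)


def inverse_B(X, C):
    """Inverse calibration: B = (X'X)^-1 X'C."""
    return np.linalg.solve(X.T @ X, X.T @ C)


def classical_B(X, C):
    """Classical calibration: S_hat = (C'C)^-1 C'X, B = S_hat'(S_hat S_hat')^-1."""
    S_hat = np.linalg.solve(C.T @ C, C.T @ X)
    return S_hat.T @ np.linalg.inv(S_hat @ S_hat.T)


X_clean = C @ S
X_noisy = X_clean + rng.normal(scale=0.05, size=X_clean.shape)

diff_clean = np.abs(inverse_B(X_clean, C) - classical_B(X_clean, C)).max()
diff_noisy = np.abs(inverse_B(X_noisy, C) - classical_B(X_noisy, C)).max()
print("largest coefficient difference, noise free:", diff_clean)
print("largest coefficient difference, with noise:", diff_noisy)
```

The inverse model treats all the error as lying in the concentrations, whereas the classical model places it in the spectra, which is why the two sets of coefficients separate as soon as X is noisy.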

Such equations make assumptions that the concentrations of the signiﬁcant analytes are all known, and work well only if this is true. Application to mixtures where there are unknown interferents can result in serious estimation errors.

5.3.3 Multivariate Approaches

The methods in Section 5.3.2 could be extended to all 10 PAHs, and with appropriate choice of 10 wavelengths may give reasonable estimates of concentrations. However, all the wavelengths contain some information and there is no reason why most of the spectrum cannot be employed.

There is a fairly confusing literature on the use of multiple linear regression for calibration in chemometrics, primarily because many workers present their arguments in a very formalised manner. However, the choice and applicability of any method depends on three main factors:

1. the number of compounds in the mixture (N = 10 in this case) or responses to be estimated;

2. the number of experiments (I = 25 in this case), often spectra or chromatograms;

3. the number of variables (J = 27 wavelengths in this case).

In order to have a sensible model, the number of compounds must be less than or equal to the smaller of the number of experiments or number of variables. In certain specialised cases this limitation can be infringed if it is known that there are correlations between concentrations of different compounds. This may happen, for example, in environmental chemistry, where there could be tens or hundreds of compounds in a sample, but the presence of one (e.g. a homologous series) indicates the presence of another, so, in practice there are only a few independent factors or groups of compounds. Also, correlations can be built into the design. In most real world situations there deﬁnitely will be correlations in complex multicomponent mixtures. However, the methods described below are for the case where the number of compounds is smaller than the number of experiments or number of detectors.

The X data matrix is ideally related to the concentration and spectral matrices by

X ≈ C .S

where X is a 25 × 27 matrix, C a 25 × 10 matrix and S a 10 × 27 matrix in the example discussed here. In calibration it is assumed that a series of experiments are performed in which C is known (e.g. a set of mixtures of compounds with known concentrations are recorded spectroscopically). An estimate of S can then be obtained by

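$$\hat{\mathbf{S}} = (\mathbf{C}'.\mathbf{C})^{-1}.\mathbf{C}'.\mathbf{X}$$

and then the concentrations can be predicted using

$$\hat{\mathbf{C}} = \mathbf{X}.\hat{\mathbf{S}}'.(\hat{\mathbf{S}}.\hat{\mathbf{S}}')^{-1}$$

exactly as above. This can be extended to estimating the concentrations in any unknown spectrum by

$$\hat{\mathbf{c}} = \mathbf{x}.\hat{\mathbf{S}}'.(\hat{\mathbf{S}}.\hat{\mathbf{S}}')^{-1} = \mathbf{x}.\mathbf{B}$$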

Unless the number of experiments is exactly equal to the number of compounds, the prediction will not completely model the data. This approach works because the matrices $(\mathbf{C}'.\mathbf{C})$ and $(\hat{\mathbf{S}}.\hat{\mathbf{S}}')$ are square matrices whose dimensions equal the number of compounds in the mixture (10 × 10) and have inverses, provided that experiments have been suitably designed and the concentrations of the compounds are not correlated. The predicted concentrations, using this approach, are given in Table 5.8, together with the percentage root mean square prediction error; note that there are only 15 degrees of freedom (= 25 experiments − 10 compounds). Had the data been centred, the number of degrees of freedom would be reduced further. The predicted concentrations are reasonably good for most compounds apart from acenaphthylene.
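The full-spectrum calculation can be sketched in the same way. The code below is a minimal illustration in Python with NumPy: the case-study spectra are not reproduced here, so it runs on simulated (hypothetical) data with the same dimensions, I = 25, N = 10 and J = 27, and the function names `classical_mlr` and `percent_rmse` are assumptions made for the example.

```python
import numpy as np


def classical_mlr(X, C):
    """Return S_hat = (C'C)^-1 C'X and B = S_hat'(S_hat S_hat')^-1."""
    S_hat = np.linalg.solve(C.T @ C, C.T @ X)        # N x J estimated spectra
    B = S_hat.T @ np.linalg.inv(S_hat @ S_hat.T)     # J x N
    return S_hat, B


def percent_rmse(C, C_hat):
    """E% per compound, dividing by I - N degrees of freedom (uncentred data)."""
    I, N = C.shape
    E = np.sqrt(np.sum((C - C_hat) ** 2, axis=0) / (I - N))
    return 100 * E / C.mean(axis=0)


# Hypothetical stand-in for the case study, simulated as X = C.S plus noise.
rng = np.random.default_rng(1)
I, N, J = 25, 10, 27
C = rng.uniform(0.1, 1.0, size=(I, N))
S = rng.uniform(0.0, 1.0, size=(N, J))
X = C @ S + rng.normal(scale=0.005, size=(I, J))

S_hat, B = classical_mlr(X, C)
C_hat = X @ B                          # predictions for the calibration set
x_new = np.full(N, 0.3) @ S            # spectrum of a new, noise free mixture
c_new = x_new @ B                      # its predicted concentrations
print("E% per compound:", np.round(percent_rmse(C, C_hat), 2))
print("prediction for the new spectrum:", np.round(c_new, 3))
```

Only the 10 × 10 matrices C'.C and Ŝ.Ŝ' are inverted, which is why this route remains usable even though there are more wavelengths than experiments.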

Table 5.8 Estimated concentrations for the case study using uncentred MLR and all wavelengths.

Spectrum No.				PAH concentration (mg l−1)
	Py	Ace	Anth	Acy	Chry	Benz	Fluora	Fluore	Nap	Phen

1	0.509	0.092	0.200	0.151	0.369	1.731	0.121	0.654	0.090	0.433
2	0.438	0.100	0.297	0.095	0.488	2.688	0.148	0.276	0.151	0.744
3	0.177	0.150	0.303	0.217	0.540	1.667	0.068	0.896	0.174	0.128
4	0.685	0.177	0.234	0.150	0.369	1.099	0.128	0.691	0.026	0.728
5	0.836	0.137	0.304	0.155	0.224	2.146	0.159	0.272	0.194	0.453
6	0.593	0.232	0.154	0.042	0.435	2.185	0.071	0.883	0.146	1.030
7	0.777	0.164	0.107	0.129	0.497	0.439	0.189	0.390	0.158	0.206
8	0.419	0.040	0.198	0.284	0.044	2.251	0.143	1.280	0.088	0.299
9	0.323	0.141	0.247	0.037	0.462	1.621	0.196	0.101	−0.003	0.298
10	0.578	0.236	0.020	0.107	0.358	2.659	0.093	0.036	0.070	0.305
11	0.621	0.051	0.214	0.111	0.571	0.458	0.062	0.428	0.022	0.587
12	0.166	0.187	0.170	0.142	0.087	0.542	0.100	0.343	0.103	0.748
13	0.580	0.077	0.248	0.133	0.051	1.120	−0.042	0.689	0.176	0.447
14	0.468	0.248	0.057	−0.006	0.237	0.558	0.157	0.712	0.103	0.351
15	0.770	0.016	0.066	0.119	0.094	1.680	0.187	0.450	0.080	0.920
16	0.101	0.026	0.100	0.041	0.338	2.230	0.102	0.401	0.201	0.381
17	0.169	0.115	0.063	0.069	0.478	1.054	0.125	0.829	0.068	0.523
18	0.271	0.079	0.142	0.106	0.222	1.086	0.211	0.254	0.151	0.261
19	0.171	0.152	0.216	0.059	0.274	2.587	0.081	0.285	0.013	0.925
20	0.399	0.116	0.095	0.170	0.514	1.133	0.101	0.321	0.243	1.023
21	0.651	0.025	0.146	0.232	0.230	1.610	−0.013	0.940	0.184	0.616
22	0.295	0.135	0.256	0.052	0.349	0.502	0.237	0.970	0.161	1.037
23	0.296	0.214	0.116	0.069	0.144	2.589	0.202	0.785	0.162	0.588
24	0.774	0.085	0.187	−0.026	0.547	2.671	0.128	1.107	0.108	0.329
25	0.324	0.035	0.036	0.361	0.472	2.217	0.094	0.918	0.128	0.779
E%	9.79	44.87	15.58	69.43	13.67	4.71	40.82	31.38	29.22	16.26


Figure 5.10

Normalised spectra of the 10 PAHs estimated by MLR, pyrene in bold (horizontal axis: wavelength, 220 to 340 nm)

The predicted spectra are presented in Figure 5.10, and are not nearly as well predicted as the concentrations. In fact, it would be remarkable if, for such a complex mixture, it were possible to reconstruct 10 spectra well, given that there is a great deal of overlap. Pyrene, which is indicated in bold, exhibits most of the main peak maxima of the known pure data (compare with Figure 5.3). Often, other knowledge of the system is required to produce better reconstructions of individual spectra. The reason why concentration predictions appear to work significantly better than spectral reconstruction is that, for most compounds, there are characteristic regions of the spectrum containing prominent features. These parts of the spectra for individual compounds will be predicted well, and will disproportionately influence the effectiveness of the method for determining concentrations. However, MLR as described in this section is not an effective method for determining spectra in complex mixtures, and should be employed primarily as a way of determining concentrations.

MLR predicts concentrations well in this case because all signiﬁcant compounds are included in the model, and so the data are almost completely modelled. If we knew of only a few compounds, there would be much poorer predictions. Consider the situation in which only pyrene, acenaphthene and anthracene are known. The C matrix now has only three columns, and the predicted concentrations are given in Table 5.9. The errors are, as expected, much larger than those in Table 5.8. The absorbances of the remaining seven compounds are mixed up with those of the three modelled components. This problem could be overcome if some characteristic wavelengths or regions of the spectrum at which the selected compounds absorb most strongly are identiﬁed, or if the experiments were designed so that there are correlations in the data, or even by a number of methods for weighted regression, but the need to provide information about all signiﬁcant compounds is a major limitation of MLR.
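The effect of leaving compounds out of the model can be illustrated with a simulation. This is a sketch on hypothetical random data (not the PAH dataset), written in Python with NumPy; `classical_predict` and `percent_rmse` are names invented for the example. The same spectra are modelled twice, once with all 10 compounds in C and once with only the first three.

```python
import numpy as np


def classical_predict(X, C_known):
    """Classical MLR using only the compounds (columns) supplied in C_known."""
    S_hat = np.linalg.solve(C_known.T @ C_known, C_known.T @ X)
    return X @ S_hat.T @ np.linalg.inv(S_hat @ S_hat.T)


def percent_rmse(C, C_hat):
    """E% per compound with I - N degrees of freedom."""
    I, N = C.shape
    E = np.sqrt(np.sum((C - C_hat) ** 2, axis=0) / (I - N))
    return 100 * E / C.mean(axis=0)


# Hypothetical simulated mixtures: 25 spectra, 10 compounds, 27 wavelengths.
rng = np.random.default_rng(2)
C = rng.uniform(0.1, 1.0, size=(25, 10))
S = rng.uniform(0.0, 1.0, size=(10, 27))
X = C @ S + rng.normal(scale=0.005, size=(25, 27))

# Errors for the first three compounds under the two models.
err_full = percent_rmse(C, classical_predict(X, C))[:3]
err_three = percent_rmse(C[:, :3], classical_predict(X, C[:, :3]))
print("E% with all 10 compounds modelled:", np.round(err_full, 2))
print("E% with only 3 compounds modelled:", np.round(err_three, 2))
```

In the three-compound model the absorbance of the seven unmodelled compounds has nowhere to go except into the estimated spectra of the three that are included, so their predicted concentrations degrade.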

The approach described above is a form of classical calibration, and it is also possible to envisage an inverse calibration model since

$$\hat{\mathbf{C}} = \mathbf{X}.\mathbf{B}$$


Table 5.9 Estimates for three PAHs using the full dataset and MLR but including only three compounds in the model.

Spectrum No.    PAH concentration (mg l−1)
                Py        Ace       Anth

1               0.539     0.146     0.156
2               0.403     0.173     0.345
3               0.199     0.270     0.138
4               0.749     0.015     0.231
5               0.747     0.103     0.211
6               0.489     0.165     0.282
7               0.865     0.060     −0.004
8               0.459     0.259     0.080
9               0.362     0.121     0.211
10              0.512     0.351     −0.049
11              0.742     −0.082    0.230
12              0.209     0.023     0.218
13              0.441     0.006     0.202
14              0.419     0.095     0.051
15              0.822     0.010     0.192
16              0.040     0.255     0.151
17              0.259     0.162     0.122
18              0.323     0.117     0.104
19              0.122     0.179     0.346
20              0.502     0.085     0.219
21              0.639     0.109     0.130
22              0.375     −0.062    0.412
23              0.196     0.316     0.147
24              0.638     0.218     0.179
25              0.545     0.317     0.048
E%              22.05     105.78    52.41

However, unlike in Section 2.2.2, there are now more wavelengths than samples or components in the mixture. The matrix B is given by

$$\mathbf{B} = (\mathbf{X}'.\mathbf{X})^{-1}.\mathbf{X}'.\mathbf{C}$$

as above. A problem with this approach is that the matrix (X'.X) is now a large matrix, with 27 rows and 27 columns, compared with the matrices used above which have 10 rows and 10 columns only. If there are only 10 components in a mixture, in a noise free experiment, the matrix X'.X would only have 10 degrees of freedom and no inverse. In practice, a numerical inverse can be computed but it will be largely a function of noise, and will often contain some very large (and meaningless) numbers: many of the columns of the matrix are correlated, so the determinant of X'.X is very small. This use of the inverse is only practicable if

1. the number of experiments and wavelengths is at least equal to the number of components in the mixture, and

2. the number of experiments is at least equal to the number of wavelengths.
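Both conditions can be checked numerically through the rank of X'.X. The sketch below uses simulated (hypothetical) data with 10 components and 27 wavelengths; the helper `simulate` and the tolerance of 1e-8 passed to `numpy.linalg.matrix_rank` are assumptions chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
N, J = 10, 27                              # compounds, wavelengths
S = rng.uniform(0.0, 1.0, size=(N, J))     # hypothetical pure spectra


def simulate(I, noise):
    """Return an I x J matrix X = C.S, plus Gaussian noise if noise > 0."""
    C = rng.uniform(0.1, 1.0, size=(I, N))
    X = C @ S
    if noise > 0:
        X = X + rng.normal(scale=noise, size=X.shape)
    return X


cases = {
    "25 spectra, noise free": simulate(25, 0.0),
    "25 spectra, noisy": simulate(25, 0.01),
    "40 spectra, noisy": simulate(40, 0.01),
}
ranks = {}
for label, X in cases.items():
    XtX = X.T @ X                          # always 27 x 27
    # Singular values below the (assumed) tolerance are treated as zero.
    ranks[label] = int(np.linalg.matrix_rank(XtX, tol=1e-8))
    print(f"{label}: rank of X'X = {ranks[label]} (needs {J} for an inverse)")
```

With 25 spectra the rank can never exceed 25, however noisy the data, so the 27 × 27 matrix has no true inverse; only when the number of experiments reaches the number of wavelengths can it become full rank.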

Condition 2 either requires a large number of extra experiments to be performed or a reduction to 25 wavelengths. There have been a number of algorithms developed for
