Brereton Chemometrics

EXPERIMENTAL DESIGN

[Figure: estimated response (vertical axis, 0 to 20) plotted against pH (horizontal axis, 4 to 6).]

Figure 2.13
Graph of estimated response versus pH at the central temperature of the design in Table 2.6

an apparent change in sign. Using raw data, we might conclude that the response increases with increasing x, whereas with the coded data the opposite conclusion might be drawn. Which is correct? Returning to our example: although the graph of the response depends on interaction effects, so that the relationship between y and pH is different at each temperature and concentration, at the central point of the design it is given in Figure 2.13, increasing monotonically over the experimental region. Indeed, the average value of the response when the pH equals 6 is higher than the average value when it equals 4. Hence it is correct to conclude that the response increases with pH, and the negative coefficient of Table 2.8 is misleading. Using coded data provides correct conclusions about the trends, whereas the coefficients for the raw data may lead to incorrect deductions.

Therefore, without taking great care, misleading conclusions can be obtained about the significance and influence of the different factors. It is essential that the user of simple chemometric software is fully aware of this, and always interprets numbers in terms of physical meaning.

2.2.4.2 Size of Coefficients

The simplest approach to determining significance is simply to look at the magnitude of the coefficients. Provided that the data are coded correctly, the larger the coefficient, the greater is its significance. This depends on each coded factor varying over approximately the same range (between −1 and +1 in this case). Clearly, small differences in range are not important; often the aim is to say whether a particular factor has a significant influence or not, rather than to give a detailed interpretation of the size of the coefficients. A value of 5.343 for b1 implies that, on average, the response is higher by 5.343 if the first factor (pH) is increased by one coded unit. This is easy to verify, and provides an alternative, classical, approach to the calculation of the coefficients:

1.consider the 10 experiments at which the first factor (pH) is at a coded level of either +1 or −1, namely the first 10 experiments;


2.then group these in five pairs, in each of which the levels of the other two main factors are identical; these pairs are {1, 5}, {2, 6}, {3, 7}, {4, 8} and {9, 10};

3.take the difference between the responses at the two levels within each pair and average them:

[(34.841 − 19.825) + (16.567 − 1.444) + (45.396 − 37.673) + (27.939 − 23.131) + (23.088 − 12.325)]/5

which gives an answer of 10.687, representing the average change in the value of the response when the pH is increased from a coded value of −1 to one of +1, half of which equals the coefficient 5.343.
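The three steps above can be sketched in a few lines of Python, using only the response values quoted in the text:

```python
# Classical paired-difference estimate of the pH coefficient (step 3 above).
# Each pair differs only in the coded pH level (+1 versus -1).
pairs = [
    (34.841, 19.825),
    (16.567, 1.444),
    (45.396, 37.673),
    (27.939, 23.131),
    (23.088, 12.325),
]

# Average change in response when pH goes from coded -1 to +1 ...
avg_change = sum(high - low for high, low in pairs) / len(pairs)
# ... half of which is the regression coefficient b1.
b1 = avg_change / 2

print(round(avg_change, 3))  # 10.687
print(round(b1, 3))          # 5.343
```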

It is useful to make practical deductions from the data which will guide the experimenter.

The response varies over a range of 43.953 units between the lowest and highest observation in the experimental range.

Hence the linear effect of pH, on average, is to increase the response by twice the coded coefficient or 10.687 units over this range, approximately 25 % of the variation, probably quite significant. The effect of the interaction between pH and concentration (b13), however, is only 0.702 units or a very small contribution, rather less than the replicate error, so this factor is unlikely to be useful.

The squared terms must be interpreted slightly differently. The lowest possible coded value for the squared terms is 0, not −1, so we do not double these values to obtain an indication of significance, the range of variation of the squared terms being between 0 and +1, or half that of the other terms.

It is not necessary, of course, to have replicates to perform this type of analysis. If the yield of a reaction varies between 50 and 90 % over a range of experimental conditions, then a factor that contributes, on average, only 1 % of this increase is unlikely to be very important. However, it is vital that the factors are coded for meaningful comparison. In addition, certain important properties of the design (namely orthogonality), which will be discussed in detail in later sections, are equally important.

Provided that the factors are coded correctly, it is fairly easy to make qualitative comparisons of significance simply by examining the size of the coefficients, either numerically or graphically. In some cases the range of variation of each individual factor might differ slightly (for example, the squared and linear terms above), but provided that this is not dramatic, for rough indications the sizes of the factors can be legitimately compared. In the case of two-level factorial designs (described in Sections 2.6–2.8), each factor is normally scaled between −1 and +1, so all coefficients are on the same scale.

2.2.4.3 Student’s t-Test

An alternative, statistical indicator, based on Student's t-test, can be used, provided that more experiments are performed than there are parameters in the model. Whereas this and related statistical indicators have a long and venerated history, it is always important to back up the statistics with simple graphs and consideration of the data. There are many diverse applications of a t-test, but in the context of analysing the significance of factors in designed experiments, the following main steps are used


Table 2.11 Calculation of the t-statistic.

(a) Matrix (D'.D)^-1

      b0      b1      b2      b3      b11     b22     b33     b12     b13     b23
b0    0.118   0.000   0.000   0.000  −0.045  −0.045  −0.045   0.000   0.000   0.000
b1    0.000   0.100   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
b2    0.000   0.000   0.100   0.000   0.000   0.000   0.000   0.000   0.000   0.000
b3    0.000   0.000   0.000   0.100   0.000   0.000   0.000   0.000   0.000   0.000
b11  −0.045   0.000   0.000   0.000   0.364  −0.136  −0.136   0.000   0.000   0.000
b22  −0.045   0.000   0.000   0.000  −0.136   0.364  −0.136   0.000   0.000   0.000
b33  −0.045   0.000   0.000   0.000  −0.136  −0.136   0.364   0.000   0.000   0.000
b12   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.125   0.000   0.000
b13   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.125   0.000
b23   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.125

(b) Values of t and significance

      v       √(sv)    b        t      % Probability
b0    0.118   0.307   17.208   56.01      >99.9
b1    0.100   0.283    5.343   18.91      >99.9
b2    0.100   0.283    7.849   27.77      >99.9
b3    0.100   0.283    8.651   30.61      >99.9
b11   0.364   0.539    0.598    1.11       70.7
b22   0.364   0.539    7.867   14.60      >99.9
b33   0.364   0.539    0.154    0.29       22.2
b12   0.125   0.316    2.201    6.97      >99.9
b13   0.125   0.316    0.351    1.11       70.7
b23   0.125   0.316    0.582    1.84       90.4

and are illustrated in Table 2.11 for the example described above using the coded values of Table 2.9.

1.Calculate the matrix (D'.D)^-1. This will be a square matrix with dimensions equal to the number of parameters in the model.

2.Calculate the error sum of squares between the predicted and observed data (compare the actual response in Table 2.6 with the predictions of Table 2.8):

Sresid = Σ (yi − ŷi)² = 7.987

the sum running over all I experiments.

3.Take the mean error sum of squares (dividing by the number of degrees of freedom available for testing for regression):

s = Sresid/(N − P) = 7.987/(20 − 10) = 0.799

Note that the t-test is not applicable to data where the number of experiments equals the number of parameters, such as full factorial designs discussed in Section 2.3.1, where all possible terms are included in the model.

4.For each of the P parameters (= 10 in this case), take the appropriate number from the diagonal of the matrix of Table 2.11(a) obtained in step 1 above. This is called the variance for each parameter, so that, for example, v11 = 0.364 (the variance of b11).

5.For each coefficient, b, calculate t = b/√(sv). The higher this ratio, the more significant is the coefficient. This ratio is used for the t-test.

6.The statistical significance can then be obtained from a two-tailed t-distribution (this is described in detail in Appendix A.3.4), or most packages such as Excel have simple functions for the t-test. Take the absolute value of the ratio calculated above. If you use a table, tabulated along the left-hand column of a t-distribution table are the degrees of freedom, which equal the number available to test for regression, or N − P, or 10 in this case. Along the columns, locate the percentage probability (often the higher the significance, the smaller is the tabulated percentage, so simply subtract from 1). The higher this probability, the greater is the confidence that the factor is significant. So, using Table A.4 we see that a critical value of 4.1437 indicates 99.9 % certainty that a parameter is significant for 10 degrees of freedom; hence any value above this is highly significant. 95 % significance results in a value of 1.8125, so b23 is just above this level. In fact, the numbers in Table 2.11 were calculated using the Excel function TDIST, which provides probabilities for any value of t and any number of degrees of freedom. Normally, fairly high probabilities are expected if a factor is significant, often in excess of 95 %.
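As a sketch, steps 4 to 6 can be reproduced for a few of the coefficients of Table 2.11(b), using s = 0.799 from step 3, the diagonal elements v of (D'.D)^-1, and the critical value 4.1437 quoted above:

```python
import math

# t-ratios for selected coefficients from Table 2.11.
s = 0.799
coefficients = {              # name: (b, v), values quoted from Table 2.11
    "b1":  (5.343, 0.100),
    "b11": (0.598, 0.364),
    "b23": (0.582, 0.125),
}

# Step 5: t = b / sqrt(s*v)
t_values = {name: abs(b) / math.sqrt(s * v)
            for name, (b, v) in coefficients.items()}

# Step 6: compare with the 99.9 % two-tailed critical value for 10 df
t_crit_999 = 4.1437
for name, t in t_values.items():
    verdict = "significant at >99.9 %" if t > t_crit_999 else "below the 99.9 % level"
    print(f"{name}: t = {t:.2f}, {verdict}")
```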

2.2.4.4 F-test

The F-test is another alternative. A common use of the F-test, together with ANOVA, is to ask how significant one variance (or mean sum of squares) is relative to another; typically, how significant the lack-of-fit is compared with the replicate error. Simply determine the ratio of the mean square lack-of-fit to the mean square replicate error (e.g. see Table 2.4) and check the size of this number. F-distribution tables are commonly presented at various probability levels. We use a one-tailed F-test in this case because the aim is to see whether one variance is significantly bigger than another, not whether it differs significantly; this differs from the t-test, which is two-tailed in the application described in Section 2.2.4.3. The columns correspond to the number of degrees of freedom for Slof and the rows to those for Srep (in the case discussed here). The table allows one to ask how significant the error (or variance) represented along the columns is relative to that represented along the rows. Consider the proposed models for datasets A and B, both excluding the intercept. Locate the relevant number [for a 95 % confidence that the lack-of-fit is significant, with five degrees of freedom for the lack-of-fit and four degrees of freedom for the replicate error, this number is 6.26, see Table A.3 (given by a distribution often called F(5,4)); hence an F-ratio must be greater than this value for this level of confidence]. Returning to Table 2.4, it is possible to show that the chances that the lack-of-fit to a model without an intercept is significant are not very high for the data in Figure 2.9 (ratio = 0.49), but there is some doubt about the data arising from Figure 2.10 (ratio = 1.79); using the FDIST function in Excel we can see that the probability is 70.4 %, below the 95 % confidence that the intercept is significant, but still high enough to give us some doubts. Nevertheless, the evidence is not entirely conclusive. A reason is that the intercept term (2.032) is of approximately the same order of magnitude as the replicate error (1.194), and for this level of experimental variability it will never be possible to predict and model the presence of an intercept of this size with a high degree of confidence.


Table 2.12 F-ratio for experiment with low experimental error.

Concentration   Absorbance
1                 3.500
1                 3.398
2                 6.055
3                 8.691
3                 8.721
4                11.249
4                11.389
5                13.978
6                16.431
6                16.527

          Model with intercept   Model without intercept
b0              0.854                   n/a
b1              2.611                   2.807
Sresid          0.0307                  1.4847
Srep            0.0201                  0.0201
Slof            0.0107                  1.4646
F-ratio         0.531                  58.409

The solution is to perform new experiments, perhaps on a different instrument, on which the reproducibility is much greater. Table 2.12 is an example of such a dataset, with the essential statistics indicated. Now the F-ratio for the lack-of-fit without the intercept becomes 58.409, which is significant at the >99 % level (critical value from Table A.2), whereas the lack-of-fit with the intercept included is less than the experimental error.
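The F-ratios in Table 2.12 can be reproduced from the raw concentration–absorbance data. The following sketch fits both models by least squares and then partitions the residual error into replicate and lack-of-fit contributions (pure Python, no statistics library assumed):

```python
from collections import defaultdict

# Raw data from Table 2.12
x = [1, 1, 2, 3, 3, 4, 4, 5, 6, 6]
y = [3.500, 3.398, 6.055, 8.691, 8.721, 11.249, 11.389, 13.978, 16.431, 16.527]
n = len(x)

# Least-squares fits: y = b0 + b1*x (with intercept) and y = b1*x (without)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1_with = sxy / sxx
b0_with = ybar - b1_with * xbar
b1_without = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

def residual_ss(predict):
    return sum((yi - predict(xi)) ** 2 for xi, yi in zip(x, y))

# Replicate sum of squares: spread about the mean at each replicated level
groups = defaultdict(list)
for xi, yi in zip(x, y):
    groups[xi].append(yi)
s_rep = sum(sum((yi - sum(g) / len(g)) ** 2 for yi in g) for g in groups.values())
df_rep = sum(len(g) - 1 for g in groups.values())   # 4 degrees of freedom

def f_ratio(s_resid, n_params):
    s_lof = s_resid - s_rep                          # lack-of-fit sum of squares
    df_lof = (n - n_params) - df_rep
    return (s_lof / df_lof) / (s_rep / df_rep)

f_with = f_ratio(residual_ss(lambda xi: b0_with + b1_with * xi), 2)
f_without = f_ratio(residual_ss(lambda xi: b1_without * xi), 1)
print(round(f_with, 2), round(f_without, 1))   # ~0.53 and ~58.4
```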

2.2.4.5 Normal Probability Plots

For designs where there are no replicates (essential for most uses of the F-test) and also where there are no degrees of freedom available to assess the lack-of-fit to the data (essential for a t-test), other approaches can be employed to examine the significance of coefficients.

As discussed in Section 2.3, two-level factorial designs are common, and provided that the data are appropriately coded, the size of the coefficients relates directly to their significance. Normally several coefficients are calculated, and an aim of experimentation is to determine which have significance, the next step possibly then being to perform another more detailed design for quantitative modelling of the significant effects. Often it is convenient to present the coefficients graphically, and a classical approach is to plot them on normal probability paper. Prior to the computer age, a large number of different types of statistical graph paper were available, assisting data analysis. However, in the age of computers, it is easy to obtain relevant graphs using simple computer packages.

The principle of normal probability plots is that if a series of numbers is randomly selected, they will often form a normal distribution (see Appendix A.3.2). For example, if I choose seven numbers randomly, I would expect, in the absence of systematic effects, that these numbers would be approximately normally distributed. Hence if we look at the size of seven effects, e.g. as assessed by their values of b (provided that the data are properly coded and the experiment is well designed, of course), and the effects are simply random, on average we would expect the size of each effect to occur evenly over a normal distribution curve. In Figure 2.14, seven lines are indicated on the normal distribution curve (the horizontal axis representing standard deviations from the mean) so that the areas between adjacent lines each equal one-seventh of the total area (the areas at the extremes adding up to 1/7 in total). If, however, an effect is very large, it will fall at a very high or low value, so large that it is unlikely to arise from


[Figure: normal distribution curve of probability against standard deviation (horizontal axis, −4 to +4), with seven vertical lines marked.]

Figure 2.14
Seven lines dividing the normal distribution into eight regions; the six central areas between adjacent lines are equal, and the two extreme regions have equal areas whose sum equals one of the central areas

Table 2.13 Normal probability calculation.

Effect   Coefficient   (p − 0.5)/7   Standard deviation
b1         −6.34          0.0714         −1.465
b23        −0.97          0.2143         −0.792
b13        −0.60          0.3571         −0.366
b123        1.36          0.5000          0.000
b3          2.28          0.6429          0.366
b12         5.89          0.7857          0.792
b2         13.20          0.9286          1.465

random processes, and is significant. Normal probability plots can be used to rank the coefficients in size (the most negative being the lowest, the most positive the highest), from the rank determine the likely position in the normal probability plot, and then produce a graph of the coefficients against this likely position. The insignificant effects should fall approximately on a straight line in the centre of the graph; significant effects will deviate from this line.

Table 2.13 illustrates the calculation.

1.Seven possible coefficients are to be assessed for significance. Note that the b0 coefficient cannot be analysed in this way.

2.They are ranked from 1 to 7 where p is the rank.

3.Then the values of (p − 0.5)/7 are calculated. This indicates where in the normal distribution each effect is likely to fall. For example, the value for the fourth coefficient is 0.5, meaning that the coefficient might be expected in the centre of the distribution, corresponding to a standard deviation from the mean of 0, as illustrated in Figure 2.14.


4.Then work out how many standard deviations correspond to the area under the normal curve calculated in step 3, using normal distribution tables or standard functions in most data analysis packages. For example, a probability of 0.9286 (coefficient b2) falls at 1.465 standard deviations. See Table A.1, in which 1.46 standard deviations correspond to a probability of 0.92785, or use the NORMINV function in Excel.

5.Finally, plot the size of the effects against the value obtained in step 4, to give, for the case discussed, the graph in Figure 2.15. The four central values fall roughly on a straight line, suggesting that only coefficients b1, b2 and b12, which deviate from the straight line, are significant.
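Steps 2 to 4 above can be sketched using the inverse normal cumulative distribution from the Python standard library (NormalDist.inv_cdf plays the role of the NORMINV function mentioned in the text):

```python
from statistics import NormalDist

# Expected normal-distribution positions for seven ranked effects,
# reproducing the last two columns of Table 2.13.
n_effects = 7
probabilities = [(p - 0.5) / n_effects for p in range(1, n_effects + 1)]
z_scores = [NormalDist().inv_cdf(prob) for prob in probabilities]

for prob, z in zip(probabilities, z_scores):
    print(f"{prob:.4f} -> {z:+.3f} standard deviations")
```

Plotting the ranked coefficients against these z-scores gives the normal probability plot of Figure 2.15.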

Like many classical methods of data analysis, the normal probability plot has limitations. It is only useful if there are several factors, and clearly will not be much use in the case of two or three factors. It also assumes that a large number of the factors are not significant, and will not give good results if there are too many significant effects. However, in certain cases it can provide useful preliminary graphical information, although it is probably not much used in modern computer-based chemometrics.

[Figure: plot of expected standard deviation (vertical axis, −2 to +2) against coefficient size (horizontal axis, −10 to +15).]

Figure 2.15
Normal probability plot


2.2.4.6 Dummy Factors

Another very simple approach is to include one or more dummy factors. These can be built into a design, and might, for example, be the colour of shoes worn by the experimenter, some factor that is not likely to have a real effect on the experiment; level −1 might correspond to black shoes and level +1 to brown shoes. Mathematical models can be built including this factor, and effects smaller than this factor ignored (remembering, as ever, to ensure that the scaling of the data is sensible).

2.2.4.7 Limitations of Statistical Tests

Whereas many traditionalists often enjoy the security that statistical significance tests give, it is important to recognise that these tests do depend on assumptions about the underlying data that may not be correct, and a chemist should be very wary of making decisions based only on a probability obtained from a computerised statistical software package without looking at the data, often graphically. Some typical drawbacks are as follows.

Most statistical tests assume that the underlying samples and experimental errors fall on a normal distribution. In some cases this is not so; for example, when analysing some analytical signals it is unlikely that the noise distribution will be normal: it is often determined by electronics and sometimes even data preprocessing such as the common logarithmic transform used in electronic absorption and infrared spectroscopy.

The tests assume that the measurements arise from the same underlying population. Often this is not the case, and systematic factors will come into play. A typical example involves calibration curves. It is well known that the performance of an instrument can vary from day to day. Hence an absorption coefficient measured on Monday morning is not necessarily the same as the coefficient measured on Tuesday morning, yet all the coefficients measured on Monday morning might fall into the same class. If a calibration experiment is performed over several days or even hours, the performance of the instrument may vary and the only real solution is to make a very large number of measurements over a long time-scale, which may be impractical.

The precision of an instrument must be considered. Many typical measurements, for example, in atomic spectroscopy, are recorded to only two significant figures. Consider a dataset in which about 95 % of the readings were recorded between 0.10 and 0.30 absorbance units, yet a statistically designed experiment tries to estimate 64 effects. The t-test provides information on the significance of each effect. However, statistical tests assume that the data are recorded to indefinite accuracy, and will not take this lack of numerical precision into account. For the obvious effects, chemometrics will not be necessary, but for less obvious effects, the statistical conclusions will be invalidated because of the low numerical accuracy in the raw data.

Often it is sufficient simply to look at the size of factors, the significance of the lack-of-fit statistics, perform simple ANOVA or produce a few graphs, to make valid scientific deductions. In most cases, significance testing is used primarily for a preliminary modelling of the data and detailed experimentation should be performed after eliminating those factors that are deemed unimportant. It is not necessary to have a very


detailed theoretical understanding of statistical significance tests prior to the design and analysis of chemical experiments, although a conceptual appreciation of, for example, the importance of coding is essential.

2.2.5 Leverage and Confidence in Models

An important experimental question relates to how well quantitative information can be predicted after a series of experiments has been carried out. For example, if observations have been made between 40 and 80 °C, what can we say about the experiment at 90 °C? It is traditional to cut off the model sharply outside the experimental region, so that the model is used to predict only within the experimental limits. However, this approach misses much information. The ability to make a prediction often reduces smoothly from the centre of the experiments, being best at 60 °C and worse the further away from the centre in the example above. This does not imply that it is impossible to make any statement about the response at 90 °C, simply that there is less confidence in the prediction than at 80 °C, which, in turn, is predicted less well than at 60 °C. It is important to be able to visualise how the ability to predict a response (e.g. a synthetic yield or a concentration) varies as the independent factors (e.g. pH, temperature) are changed.

When only one factor is involved in the experiment, the predictive ability is often visualised by confidence bands. The ‘size’ of these confidence bands depends on the magnitude of the experimental error. The ‘shape’, however, depends on the experimental design, and can be obtained from the design matrix (Section 2.2.3) and is influenced by the arrangement of experiments, replication procedure and mathematical model. The concept of leverage is used as a measure of such confidence. The mathematical definition is given by

H = D.(D'.D)^-1.D'

where D is the design matrix. This new matrix is sometimes called the hat matrix and is a square matrix with the number of rows and columns equal to the number of experiments. Each of n experimental points has a value of leverage hn (the diagonal element of the hat matrix) associated with it. Alternatively, the value of leverage can

be calculated as follows:

hn = dn.(D'.D)^-1.dn'

where dn is the row of the design matrix corresponding to an individual experiment. The steps in determining the values of leverage for a simple experiment are illustrated in Table 2.14.

1.Set up the design matrix.

2.Calculate (D'.D)^-1. Note that this matrix is also used in the t-test, as discussed in Section 2.2.4.3.

3.Calculate the hat matrix and determine the diagonal values.

4.These diagonal values are the values of leverage for each experiment.
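As an illustration of these steps, the following sketch computes the leverage for a hypothetical one-factor design with the model y = b0 + b1.x; for this simple model the diagonal of the hat matrix H = D.(D'.D)^-1.D' has a well-known closed form, used here instead of a general matrix inverse (the design levels are invented for illustration):

```python
# Leverage for a one-factor design with model y = b0 + b1*x.
# Closed form for the hat-matrix diagonal in this case:
#   h_n = 1/N + (x_n - xbar)^2 / sum((x_i - xbar)^2)
x = [-2, -1, 0, 1, 2]        # hypothetical coded design levels
n = len(x)
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

leverage = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
print([round(h, 3) for h in leverage])   # [0.6, 0.3, 0.2, 0.3, 0.6]

# The leverages sum to the number of parameters in the model (two here),
# and are largest at the extremes of the experimental region.
print(round(sum(leverage), 6))
```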

This numerical value of leverage has certain properties.


Table 2.14 Calculation of leverage.
