Добавил:

Upload Опубликованный материал нарушает ваши авторские права? Сообщите нам.

Вуз:

Национальный исследовательский университет «Высшая школа экономики»

Предмет:

[НЕСОРТИРОВАННОЕ]

Файл:

Robert I. Kabacoff - R in action

.pdf

Скачиваний:

Добавлен:

02.06.2015

Размер:

12.13 Mб

Скачать

☆

<<< < Предыдущая 15 16 17 18 19 20 21 22 23 24 25 2627 / 4827 28 29 30 31 32 33 34 35 36 37 38 39 > Следующая >>>

236	CHAPTER 9 Analysis of variance

Interaction Plot with 95% CIs

30
25
20
len
15
10
n=10	n=10	n=10	n=10	n=10	n=10
OJ 0.5	VC 0.5	OJ 1	VC 1	OJ 2	VC 2

Treatment and Dose Combination

Figure 9.7 Interaction between dose and delivery mechanism

on tooth growth. The mean plot with 95 percent confidence intervals was created by the plotmeans() function.

more tooth growth than Vitamin C. For 2mg of ascorbic acid, both delivery methods produced identical growth. Of the three plotting methods provided, I prefer the interaction2wt() function in the HH package. It displays both the main effects (the box plots) and the two-way interactions for designs of any complexity (two-way ANOVA, three-way ANOVA, etc.).

Although I don’t cover the tests of model assumptions and mean comparison procedures, they’re a natural extension of the methods you’ve seen so far. Additionally, the design is balanced, so you don’t have to worry about the order of effects.

dose

0.5

response.var

supp

len: main effects and 2−way interactions

len ~ supp | dose

len ~ dose | dose

len ~ supp | supp

len ~ dose | supp

0.5

supp

x.values

dose

30
25
20	len
15
10
5
30
25
20	len
15
10	Figure 9.8 Main effects

5and two-way interaction for the ToothGrowth dataset. This plot was created by the

interaction2way() function.

Repeated measures ANOVA

237

9.6Repeated measures ANOVA

In repeated measures ANOVA, subjects are measured more than once. This section focuses on a repeated measures ANOVA with one within-groups and one between-groups factor (a common design). We’ll take our example from the field of physiological ecology. Physiological ecologists study how the physiological and biochemical processes of living systems respond to variations in environmental factors (a crucial area of study given the realities of global warming). The CO2 dataset included in the base installation contains the results of a study of cold tolerance in Northern and Southern plants of the grass species Echinochloa crus-galli (Potvin, Lechowicz, & Tardif, 1990). The photosynthetic rates of chilled plants were compared with the photosynthetic rates of nonchilled plants at several ambient CO2 concentrations. Half the plants were from Quebec and half were from Mississippi.

In this example, we’ll focus on chilled plants. The dependent variable is carbon dioxide uptake (uptake) in ml/L, and the independent variables are Type (Quebec versus Mississippi) and ambient CO2 concentration (conc) with seven levels (ranging from 95 to 1000 umol/m^2 sec). Type is a between-groups factor and conc is a withingroups factor. The analysis is presented in the next listing.

Listing 9.7 Repeated measures ANOVA with one betweenand within-groups factor

>w1b1 <- subset(CO2, Treatment=='chilled')

>fit <- aov(uptake ~ conc*Type + Error(Plant/(conc), w1b1)

>summary(fit)

Error: Plant
	Df	Sum Sq Mean Sq F value			Pr(>F)
Type	1	2667.24	2667.24	60.414	0.001477	**
Residuals	4	176.60	44.15

---
Signif. codes:		0 '*' 0.001 '' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Error: Plant:conc
	Df Sum Sq		Mean Sq F value		Pr(>F)
conc	1 888.57		888.57	215.46	0.0001253	***
conc:Type	1 239.24		239.24	58.01	0.0015952	**
Residuals	4	16.50	4.12
---
Signif. codes:		0 '*' 0.001 '' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Error: Within
	Df Sum Sq		Mean Sq F value Pr(>F)
Residuals 30 869.05			28.97

>par(las=2)

>par(mar=c(10,4,4,2))

>with(w1b1, interaction.plot(conc,Type,uptake, type="b", col=c("red","blue"), pch=c(16,18),

main="Interaction Plot for Plant Type and Concentration"))

>boxplot(uptake ~ Type*conc, data=w1b1, col=(c("gold", "green")), main=”Chilled Quebec and Mississippi Plants”,

ylab=”Carbon dioxide uptake rate (umol/m^2 sec)”)

238				CHAPTER 9 Analysis of variance
		Interaction Plot for Plant Type and Concentration
	40
								Type


	35								Quebec
									Mississippi


of uptake	30
	30
	25
	25
mean	20


	15
	15
	10
	10

	95	175	250	350	500	675	1000
					conc

Figure 9.9 Interaction of ambient CO2 concentration and plant type on CO2 uptake. Graph produced by the interaction. plot() function.

The ANOVA table indicates that the Type and concentration main effects and the Type x concentration interaction are all significant at the 0.01 level. The interaction is plotted via the interaction.plot() function in figure 9.9.

In order to demonstrate a different presentation of the interaction, the boxplot() function is used to plot the same data. The results are provided in figure 9.10.

Chilled Quebec and Mississippi Plants

	sec)	40
		35
	(umol/m^2
		30
	rate	25
	uptake
		20
	dioxide
		15
	Carbon
		10

Quebec.95

Mississippi.95

Quebec.175

Mississippi.175

Quebec.250

Mississippi.250

Quebec.350

Mississippi.350

Quebec.500

Mississippi.500

Quebec.675

Mississippi.675

Quebec.1000

Mississippi.1000

Figure 9.10 Interaction of ambient CO2 concentration and plant type on CO2 uptake. Graph produced by the boxplot() function.

Multivariate analysis of variance (MANOVA)

239

From either graph, you can see that there’s a greater carbon dioxide uptake in plants from Quebec compared to Mississippi. The difference is more pronounced at higher ambient CO2 concentrations.

NOTE The datasets that you work with are typically in wide format, where columns are variables and rows are observations, and there’s a single row for each subject. The litter data frame from section 9.4 is a good example. When dealing with repeated measures designs, you typically need the data in long format before fitting your models. In long format, each measurement of the dependent variable is placed in its own row. The CO2 dataset follows this form. Luckily, the reshape package described in chapter 5 (section 5.6.3) can easily reorganize your data into the required format.

The many approaches to mixed-model designs

The CO2 example in this section was analyzed using a traditional repeated measures ANOVA. The approach assumes that the covariance matrix for any within-groups factor follows a specified form known as sphericity. Specifically, it assumes that the variances of the differences between any two levels of the within-groups factor are equal. In real-world data, it’s unlikely that this assumption will be met. This has led to a number of alternative approaches, including the following:

Using the lmer() function in the lme4 package to fit linear mixed models (Bates, 2005)

Using the Anova() function in the car package to adjust traditional test statistics to account for lack of sphericity (for example, Geisser–Greenhouse correction)

Using the gls() function in the nlme package to fit generalized least squares models with specified variance-covariance structures (UCLA, 2009)

Using multivariate analysis of variance to model repeated measured data (Hand, 1987)

Coverage of these approaches is beyond the scope of this text. If you’re interested in learning more, check out Pinheiro and Bates (2000) and Zuur et al. (2009).

Up to this point, all the methods in this chapter have assumed that there’s a single dependent variable. In the next section, we’ll briefly consider designs that include more than one outcome variable.

9.7Multivariate analysis of variance (MANOVA)

If there’s more than one dependent (outcome) variable, you can test them simultaneously using a multivariate analysis of variance (MANOVA). The following example is based on the UScereal dataset in the MASS package. The dataset comes from Venables

240	CHAPTER 9 Analysis of variance

& Ripley (1999). In this example, we’re interested in whether the calories, fat, and sugar content of US cereals vary by store shelf, where 1 is the bottom shelf, 2 is the middle shelf, and 3 is the top shelf. Calories, fat, and sugars are the dependent variables, and shelf is the independent variable with three levels (1, 2, and 3). The analysis is presented in the following listing.

Listing 9.8 One-way MANOVA

>library(MASS)

>attach(UScereal)

>y <- cbind(calories, fat, sugars)

>aggregate(y, by=list(shelf), FUN=mean)

	Group.1 calories		fat sugars
1	1	119	0.662	6.3
2	2	130	1.341	12.5
3	3	180	1.945	10.9

> cov(y)
	calories	fat sugars
calories	3895.2	60.67	180.38
fat	60.7	2.71	4.00
sugars	180.4	4.00	34.05

>fit <- manova(y ~ shelf)

>summary(fit)

Df Pillai approx F num Df den Df

Pr(>F)

shelf

1 0.1959

4.9550

61 0.00383 **

Residuals 63

---

Signif. codes:

0 '***'

0.001 '**'

0.01

'*'

0.05

'.' 0.1 ' ' 1

> summary.aov(fit)

Response calories :

univariate

Sum Sq

Mean Sq

F value

Pr(>F)

results

shelf

45313

13.995

0.0003983

***

Residuals

203982

3238

---

Signif. codes:

0 '***'

0.001 '**'

0.01

'*'

0.05

'.' 0.1 ' ' 1

Response fat :

Sum Sq

Mean Sq

F value

Pr(>F)

shelf

18.421

7.476 0.008108

Residuals

155.236

2.464

---

Signif. codes:

0 '***'

0.001 '**'

0.01

'*'

0.05

'.' 0.1 ' ' 1

Response sugars

Sum Sq

Mean Sq

F value

Pr(>F)

shelf

183.34

5.787 0.01909

Residuals

1995.87

31.68

---

Signif. codes:

0 '***'

0.001 '**'

0.01

'*'

0.05

'.' 0.1 ' ' 1

This listing uses the cbind() function to form a matrix of the three dependent variables (calories, fat, and sugars). The aggregate() function provides the shelf means, and the cov() function provides the variance and the covariances across cereals.

Multivariate analysis of variance (MANOVA)

241

The manova() function provides the multivariate test of group differences. The significant F value indicates that the three groups differ on the set of nutritional measures.

Because the multivariate test is significant, you can use the summary.aov() function to obtain the univariate one-way ANOVAs. Here, you see that the three groups differ on each nutritional measure considered separately. Finally, you can use a mean comparison procedure (such as TukeyHSD) to determine which shelves differ from each other for each of the three dependent variables (omitted here to save space).

9.7.1Assessing test assumptions

The two assumptions underlying a one-way MANOVA are multivariate normality and homogeneity of variance-covariance matrices.

The first assumption states that the vector of dependent variables jointly follows a multivariate normal distribution. You can use a Q-Q plot to assess this assumption (see the sidebar “A Theory Interlude” for a statistical explanation of how this works).

A theory interlude

If you have p x 1 multivariate normal random vector x with mean µ and covariance matrix Σ, then the squared Mahalanobis distance between x and µ is chi-square distributed with p degrees of freedom. The Q-Q plot graphs the quantiles of the chisquare distribution for the sample against the Mahalanobis D-squared values. To the degree that the points fall along a line with slope 1 and intercept 0, there’s evidence that the data is multivariate normal.

The code is provided in the following listing and the resulting graph is displayed in figure 9.11.

Listing 9.9 Assessing multivariate normality

>center <- colMeans(y)

>n <- nrow(y)

>p <- ncol(y)

>cov <- cov(y)

>d <- mahalanobis(y,center,cov)

>coord <- qqplot(qchisq(ppoints(n),df=p),

d, main="Q-Q Plot Assessing Multivariate Normality", ylab="Mahalanobis D2")

>abline(a=0,b=1)

>identify(coord$x, coord$y, labels=row.names(UScereal))

If the data follow a multivariate normal distribution, then points will fall on the line. The identify() function allows you to interactively identify points in the graph. (The identify() function is covered in chapter 16, section 16.4.) Here, the dataset appears to violate multivariate normality, primarily due to the observations for Wheaties Honey Gold and Wheaties. You may want to delete these two cases and rerun the analyses.

242	CHAPTER 9 Analysis of variance
	QQ Plot Assessing Multivariate Normality
	Wheaties Honey Gold
	40
	30
Mahalanobis D2	Wheaties
Mahalanobis D2	20
	10
	0

0	2	4	6	8	10	12	Figure 9.11 A Q-Q plot for
							Figure 9.11 A Q-Q plot for
		qchisq(ppoints(n), df = p)					assessing multivariate normality

The homogeneity of variance-covariance matrices assumption requires that the covariance matrix for each group are equal. The assumption is usually evaluated with a Box’s M test. R doesn’t include a function for Box’s M, but an internet search will provide the appropriate code. Unfortunately, the test is sensitive to violations of normality, leading to rejection in most typical cases. This means that we don’t yet have a good working method for evaluating this important assumption (but see Anderson [2006] and Silva et al. [2008] for interesting alternative approaches not yet available in R).

Finally, you can test for multivariate outliers using the aq.plot() function in the mvoutlier package. The code in this case looks like this:

library(mvoutlier) outliers <- aq.plot(y) outliers

Try it out and see what you get!

9.7.2Robust MANOVA

If the assumptions of multivariate normality or homogeneity of variance-covariance matrices are untenable, or if you’re concerned about multivariate outliers, you may want to consider using a robust or nonparametric version of the MANOVA test instead. A robust version of the one-way MANOVA is provided by the Wilks.test() function in the rrcov package. The adonis() function in the vegan package can provide the equivalent of a nonparametric MANOVA. Listing 9.10 applies Wilks.test() to our

example.

ANOVA as regression

243

Listing 9.10 Robust one-way MANOVA

library(rrcov)

> Wilks.test(y,shelf,method="mcd")

Robust One-way MANOVA (Bartlett Chi2)

data:	x
Wilks'	Lambda = 0.511, Chi2-Value = 23.71, DF = 4.85, p-value =
0.0002143
sample	estimates:
calories		fat sugars
1	120	0.701	5.66

2128 1.185 12.54

3161 1.652 10.35

From the results, you can see that using a robust test that’s insensitive to both outliers and violations of MANOVA assumptions still indicates that the cereals on the top, middle, and bottom store shelves differ in their nutritional profiles.

9.8ANOVA as regression

In section 9.2, we noted that ANOVA and regression are both special cases of the same general linear model. As such, the designs in this chapter could have been analyzed using the lm() function. However, in order to understand the output, you need to understand how R deals with categorical variables when fitting models.

Consider the one-way ANOVA problem in section 9.3, which compares the impact of five cholesterol-reducing drug regiments (trt).

>library(multcomp)

>levels(cholesterol$trt)

[1] "1time" "2times" "4times" "drugD" "drugE"

First, let’s fit the model using the aov() function:

>fit.aov <- aov(response ~ trt, data=cholesterol)

>summary(fit.aov)

	Df	Sum Sq Mean Sq F value			Pr(>F)
trt	4	1351.37	337.84	32.433	9.819e-13	***
Residuals	45	468.75	10.42

Now, let’s fit the same model using lm(). In this case you get the results shown in the next listing.

Listing 9.11 A regression approach to the ANOVA problem in section 9.3

>fit.lm <- lm(response ~ trt, data=cholesterol)

>summary(fit.lm)

Coefficients:
	Estimate Std. Error t value Pr(>\|t\|)
(Intercept)	5.782	1.021	5.665	9.78e-07	***
trt2times	3.443	1.443	2.385	0.0213	*

244		CHAPTER 9 Analysis of variance
trt4times	6.593	1.443	4.568	3.82e-05	***
trtdrugD	9.579	1.443	6.637	3.53e-08	***
trtdrugE	15.166	1.443	10.507	1.08e-13	***

Residual standard error: 3.227 on 45 degrees of freedom

Multiple R-squared: 0.7425, Adjusted R-squared: 0.7196

F-statistic: 32.43 on 4 and 45 DF, p-value: 9.819e-13

What are we looking at? Because linear models require numeric predictors, when the lm() function encounters a factor, it replaces that factor with a set of numeric variables representing contrasts among the levels. If the factor has k levels, k-1 contrast variables will be created. R provides five built-in methods for creating these contrast variables (see table 9.6). You can also create your own (we won’t cover that here). By default, treatment contrasts are used for unordered factors and orthogonal polynomials are used for ordered factors.

Table 9.6 Built-in contrasts

Contrast	Description

contr.helmert	Contrasts the second level with the first, the third level with the average of
	the first two, the four th level with the average of the first three, and so on.
contr.poly	Contrasts used for trend analysis (linear, quadratic, cubic, and so on.)
	based on or thogonal polynomials. Use for ordered factors with equally
	spaced levels.
contr.sum	Contrasts are constrained to sum to zero. Also called deviation contrasts,
	they compare the mean of each level to the overall mean across levels.
contr.treatment	Contrasts each level with the baseline level (first level by default). Also
	called dummy coding.
contr.SAS	Similar to contr.treatment but the baseline level is the last level. This
	produces coefficients similar to contrasts used in most SAS procedures.

With treatment contrasts, the first level of the factor becomes the reference group and each subsequent level is compared with it. You can see the coding scheme via the contrasts() function:

> contrasts(cholesterol$trt)
	2times 4times drugD drugE
1time	0	0	0	0
2times	1	0	0	0
4times	0	1	0	0
drugD	0	0	1	0
drugE	0	0	0	1

If a patient is in the drugD condition, then the variable drugD equals 1, and the variables 2times, 4times, and drugE will each equal zero. You don’t need a variable for the first group, because a zero on each of the four indicator variables uniquely determines that the patient is in the 1times condition.

Summary

245

In listing 9.11, the variable trt2times represents a contrast between the levels 1time and 2time. Similarly, trt4times is a contrast between 1time and 4times, and so on. You can see from the probability values in the output that each drug condition is significantly different from the first (1time).

You can change the default contrasts used in lm() by specifying a contrasts option. For example, you can specify Helmert contrasts by using

fit.lm <- lm(response ~ trt, data=cholesterol, contrasts="contr.helmert")

You can change the default contrasts used during an R session via the options() function. For example,

options(contrasts = c("contr.SAS", "contr.helmert"))

would set the default contrast for unordered factors to contr.SAS and for ordered factors to contr.helmert. Although we’ve limited our discussion to the use of contrasts in linear models, note that they’re applicable to other modeling functions in R. This includes the generalized linear models covered in chapter 13.

9.9Summary

In this chapter, we reviewed the analysis of basic experimental and quasi-experimental designs using ANOVA/ANCOVA/MANOVA methodology. We reviewed the basic terminology used, and looked at examples of between and within-groups designs, including the one-way ANOVA, one-way ANCOVA, two-way factorial ANOVA, repeated measures ANOVA, and one-way MANOVA.

In addition to the basic analyses, we reviewed methods of assessing model assumptions and applying multiple comparison procedures following significant omnibus tests. Finally, we explored a wide variety of methods for displaying the results visually. If you’re interested in learning more about the design of experiments (DOE) using R, be sure to see the CRAN View provided by Groemping (2009).

Chapters 8 and 9 have covered the statistical methods most often used by researchers in a wide variety of fields. In the next chapter, we’ll address issues of power analysis. Power analysis helps us to determine the sample sizes needed to detect an effect of a given size with a given degree of confidence, and is a crucial component of research design.

<<< < Предыдущая 15 16 17 18 19 20 21 22 23 24 25 2627 / 4827 28 29 30 31 32 33 34 35 36 37 38 39 > Следующая >>>

Соседние файлы в предмете [НЕСОРТИРОВАННОЕ]

#
26.03.20161.55 Mб15report.doc
#
04.09.2019123.9 Кб2report_praktika.doc
#
02.06.201534.78 Кб20Research_Proposal_v_3_0.docx
#
02.06.2015613.89 Кб16Rimskoe_pravo_bilety.doc
#
10.11.2019295.94 Кб8RI_lab.doc
#
02.06.201512.13 Mб89Robert I. Kabacoff - R in action.pdf
#
02.06.20152.89 Mб33Rossyskoe_zakonodatelstvo_X_XX_vekov_V_9-ti.doc
#
24.09.20195.23 Mб48RPZ.doc
#
26.03.2016112.64 Кб3Rules.doc
#
26.03.2016233.33 Кб125RUR2012.docx
#
26.03.2016355.13 Кб5Russia2013.pdf