292	CHAPTER 12 Resampling statistics and bootstrapping

But what if you aren’t willing to assume that the sampling distribution of the mean is normally distributed? You can use a bootstrapping approach instead:

1. Randomly select 10 observations from the sample, with replacement after each selection. Some observations may be selected more than once, and some may not be selected at all.

2. Calculate and record the sample mean.

3. Repeat the first two steps 1,000 times.

4. Order the 1,000 sample means from smallest to largest.

5. Find the sample means representing the 2.5th and 97.5th percentiles. In this case, it’s the 25th number from the bottom and top. These are your 95% confidence limits.
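
The five steps above can be sketched in base R. This is a minimal illustration, not code from the text: the sample x is hypothetical (10 simulated values), and the variable names are chosen here for clarity.

```r
# Hypothetical sample of 10 observations (simulated for illustration only)
set.seed(42)
x <- rnorm(10, mean = 40, sd = 5)

# Steps 1-3: resample with replacement and record the mean, 1,000 times
boot_means <- replicate(1000, mean(sample(x, size = length(x), replace = TRUE)))

# Step 4: order the 1,000 sample means from smallest to largest
sorted_means <- sort(boot_means)

# Step 5: the 25th value from the bottom and the 25th from the top
# (position 976 of 1,000) serve as the 95% confidence limits
ci <- c(lower = sorted_means[25], upper = sorted_means[976])
print(ci)
```

Counting positions in the sorted vector follows the description in the text; quantile(boot_means, c(0.025, 0.975)) would give very similar limits with a slightly different interpolation rule.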

In the present case, where the sample mean is likely to be normally distributed, you gain little from the bootstrap approach. Yet there are many cases where the bootstrap approach is advantageous. What if you wanted confidence intervals for the sample median, or the difference between two sample medians? There are no simple normal-theory formulas here, and bootstrapping is the approach of choice. If the underlying distributions are unknown, if outliers are a problem, if sample sizes are small, or if parametric approaches don’t exist, bootstrapping can often provide a useful method of generating confidence intervals and testing hypotheses.

12.6 Bootstrapping with the boot package

The boot package provides extensive facilities for bootstrapping and related resampling methods. You can bootstrap a single statistic (for example, a median) or a vector of statistics (for example, a set of regression coefficients). Be sure to download and install the boot package before first use:

install.packages("boot")

The bootstrapping process will seem complicated, but once you review the examples it should make sense.

In general, bootstrapping involves three main steps:

1. Write a function that returns the statistic or statistics of interest. If there is a single statistic (for example, a median), the function should return a number. If there is a set of statistics (for example, a set of regression coefficients), the function should return a vector.

2. Process this function through the boot() function in order to generate R bootstrap replications of the statistic(s).

3. Use the boot.ci() function to obtain confidence intervals for the statistic(s) generated in step 2.
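
As a rough preview, the three steps might look like this for a single statistic, the median. The function name med and the choice of mtcars$mpg as data are assumptions made for this sketch only.

```r
library(boot)

# Step 1: a function returning the statistic of interest (a single number).
# The second argument receives the resampled indices supplied by boot().
med <- function(x, indices) {
  median(x[indices])
}

# Step 2: generate R = 1000 bootstrap replications of the median
set.seed(1234)
b <- boot(data = mtcars$mpg, statistic = med, R = 1000)

# Step 3: obtain a 95% percentile confidence interval
ci <- boot.ci(b, conf = 0.95, type = "perc")
print(ci)
```

Setting the seed only makes the resampling reproducible; it isn't part of the three steps.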

Now to the specifics.

The main bootstrapping function is boot(). It has the format

bootobject <- boot(data=, statistic=, R=, ...)

The parameters are described in table 12.3.

Table 12.3 Parameters of the boot() function

Parameter	Description

data	A vector, matrix, or data frame.
statistic	A function that produces the k statistics to be bootstrapped (k=1 if bootstrapping a single statistic). The function should include an indices parameter that the boot() function can use to select cases for each replication (see the examples in the text).
R	Number of bootstrap replicates.
...	Additional parameters to be passed to the function that produces the statistic of interest.

The boot() function calls the statistic function R times. Each time, it generates a set of random indices, with replacement, from the integers 1:nrow(data). These indices are used in the statistic function to select a sample. The statistics are calculated on the sample, and the results are accumulated in bootobject. The bootobject structure is described in table 12.4.

Table 12.4 Elements of the object returned by the boot() function

Element	Description

t0	The observed values of k statistics applied to the original data
t	An R × k matrix, where each row is a bootstrap replicate of the k statistics

You can access these elements as bootobject$t0 and bootobject$t.
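
To make the shapes of these two elements concrete, here is a small sketch with k = 2 statistics. The coefs function and the model mpg ~ wt are illustrative choices, not taken from the text.

```r
library(boot)

# Statistic function returning k = 2 values: the intercept and slope
# of a simple regression fitted to the resampled rows
coefs <- function(data, indices) {
  d <- data[indices, ]
  coef(lm(mpg ~ wt, data = d))
}

set.seed(1234)
b <- boot(data = mtcars, statistic = coefs, R = 500)

b$t0                # the 2 statistics computed on the original data
dim(b$t)            # 500 rows (replicates) by 2 columns (statistics)
apply(b$t, 2, sd)   # bootstrap standard error of each statistic
```

Each column of b$t is the bootstrap distribution of one statistic, so column-wise summaries such as sd() or quantile() apply directly.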

Once you generate the bootstrap samples, you can use print() and plot() to examine the results. If the results look reasonable, you can use the boot.ci() function to obtain confidence intervals for the statistic(s). The format is

boot.ci(bootobject, conf=, type= )

The parameters are given in table 12.5.

Table 12.5 Parameters of the boot.ci() function

Parameter	Description

bootobject	The object returned by the boot() function.
conf	The desired confidence level (default: conf=0.95).
type	The type of confidence interval returned. Possible values are norm, basic, stud, perc, bca, and all (default: type="all").


The type parameter specifies the method for obtaining the confidence limits. The perc method (percentile) was demonstrated in the sample mean example. bca provides an interval that makes simple adjustments for bias. I find bca preferable in most circumstances. See Mooney and Duval (1993) for an introduction to these methods.
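
A short sketch of requesting several interval types at once follows. The statistic (the mean of mtcars$mpg) and the name mean_stat are assumptions for illustration. The stud type is left out because it additionally needs variance estimates for the statistic, which this simple function doesn't return.

```r
library(boot)

# Statistic function: mean of the resampled values
mean_stat <- function(x, indices) {
  mean(x[indices])
}

set.seed(1234)
b <- boot(data = mtcars$mpg, statistic = mean_stat, R = 2000)

# Request four interval types in a single call
ci <- boot.ci(b, conf = 0.95, type = c("norm", "basic", "perc", "bca"))
print(ci)

# Each requested type is stored as its own element of the result;
# the last two entries of each row are the lower and upper limits
ci$percent
ci$bca
```

For a roughly symmetric statistic such as this mean, the four intervals should be close to one another; larger disagreements tend to appear with skewed statistics.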

In the remaining sections, we’ll look at bootstrapping a single statistic and a vector of statistics.

12.6.1 Bootstrapping a single statistic

The mtcars dataset contains information on 32 automobiles reported in the 1974 Motor Trend magazine. Suppose you’re using multiple regression to predict miles per gallon from a car’s weight (lb/1,000) and engine displacement (cu. in.). In addition to the standard regression statistics, you’d like to obtain a 95% confidence interval for the R-squared value (the percent of variance in the response variable explained by the predictors). The confidence interval can be obtained using nonparametric bootstrapping.

The first task is to write a function for obtaining the R-squared value:

rsq <- function(formula, data, indices) {
  d <- data[indices, ]              # select the bootstrap sample
  fit <- lm(formula, data=d)        # fit the regression to that sample
  return(summary(fit)$r.squared)    # return its R-squared value
}

The function returns the R-squared value from a regression. The d <- data[indices,] statement is required for boot() to be able to select samples.
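
To see what the indices argument does, you can call the function by hand. This is only an illustration of the mechanism, not how boot() is implemented internally; the function is repeated so the block runs on its own, and the names full_rsq, idx, and one_rep are introduced here.

```r
rsq <- function(formula, data, indices) {
  d <- data[indices, ]
  fit <- lm(formula, data = d)
  return(summary(fit)$r.squared)
}

n <- nrow(mtcars)

# Passing 1:n selects every row once, which reproduces the
# R-squared of the original sample
full_rsq <- rsq(mpg ~ wt + disp, mtcars, seq_len(n))

# Passing indices drawn with replacement gives one bootstrap replicate
set.seed(1234)
idx <- sample.int(n, size = n, replace = TRUE)
one_rep <- rsq(mpg ~ wt + disp, mtcars, idx)

c(original = full_rsq, replicate = one_rep)
```

Conceptually, boot() repeats the second call R times with fresh indices and collects the results.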

You can then draw a large number of bootstrap replications (say, 1,000) with the following code:

library(boot)

set.seed(1234)

results <- boot(data=mtcars, statistic=rsq, R=1000, formula=mpg~wt+disp)

The boot object can be printed using

> print(results)

ORDINARY NONPARAMETRIC BOOTSTRAP

Call:

boot(data = mtcars, statistic = rsq, R = 1000, formula = mpg ~ wt + disp)

Bootstrap Statistics :
original	bias	std. error
t1* 0.7809306	0.01333670	0.05068926

and plotted using plot(results). The resulting graph is shown in figure 12.2.

[Figure: two panels. Left: "Histogram of t", with t* on the x-axis and Density on the y-axis. Right: normal Q-Q plot of t* against Quantiles of Standard Normal.]

Figure 12.2 Distribution of bootstrapped R-squared values

In figure 12.2, you can see that the bootstrapped R-squared values aren’t normally distributed. A 95% confidence interval for the R-squared values can be obtained using

> boot.ci(results, type=c("perc", "bca"))
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates

CALL :
boot.ci(boot.out = results, type = c("perc", "bca"))

Intervals :
Level     Percentile            BCa
95%   ( 0.6838,  0.8833 )   ( 0.6344,  0.8549 )
Calculations and Intervals on Original Scale
Some BCa intervals may be unstable

You can see from this example that different approaches to generating the confidence intervals can lead to different intervals. In this case, the bias-adjusted interval is moderately different from the percentile interval. In either case, the null hypothesis H0: R-squared = 0 would be rejected, because zero is outside the confidence limits.

In this section, you estimated the confidence limits of a single statistic. In the next section, you’ll estimate confidence intervals for several statistics.
