R in Action, Second Edition.pdf


The four efficiency measures described in this section can help with everyday coding problems. But they only go so far in helping you to process really large datasets (for example, datasets in the terabyte range). When you’re working with big datasets, methods like those described in appendix G are required.

Locating bottlenecks

“Why is my code taking so long?” R provides tools for profiling programs in order to identify the most time-consuming functions. Place the code to be profiled between Rprof() and Rprof(NULL). Then execute summaryRprof() to get a summary of the time spent executing each function. See ?Rprof and ?summaryRprof for details.
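The workflow above might look like the following minimal sketch. The function slow_work() and the temporary file name are hypothetical, introduced only to give the profiler something to sample, and the sketch assumes an R build with profiling support (the norm for standard binaries).

```r
# Hypothetical workload, used only so the profiler has something to sample.
slow_work <- function(reps = 200, n = 1e5) {
  total <- 0
  for (i in seq_len(reps)) {
    total <- total + sum(sort(runif(n)))
  }
  total
}

prof_file <- tempfile(fileext = ".out")  # file that receives the samples

Rprof(prof_file)          # start profiling
result <- slow_work()
Rprof(NULL)               # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.self)        # functions ranked by time spent in their own code
```

The by.self component ranks functions by the time spent in their own code, and by.total includes the time spent in the functions they call. The exact timings will differ from machine to machine.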

Efficiency is little comfort when a program won’t execute or gives nonsensical results. Methods for uncovering programming errors are considered next.

20.5 Debugging

Debugging is the process of finding and reducing the number of errors or defects in a program. It would be wonderful if programs worked the first time. It would also be wonderful if unicorns lived in my neighborhood. In all but the simplest programs, errors occur. Determining the cause of these errors and fixing them is a time-consuming process. In this section, we’ll look at common sources of error and tools that can help to uncover errors.

20.5.1 Common sources of errors

The following are some common reasons functions fail in R:

An object name is misspelled, or the object doesn’t exist.

There is a misspecification of the parameters in a function call.

The contents of an object aren’t what the user expects. In particular, errors are often caused by passing objects that are NULL or contain NaN or NA values to a function that can’t handle them.

The third reason is more common than you may think. It results from R’s terse approach to errors and warnings.

Consider the following example. For the mtcars dataset in the base installation, you want to provide the variable am (transmission type) with a more informative title and labels. Next, you want to compare the gas mileage of cars with automatic transmissions to those with manual transmissions:

> mtcars$Transmission <- factor(mtcars$am, levels=c(1,2),
                                labels=c("Automatic", "Manual"))
> aov(mpg ~ Transmission, data=mtcars)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels


CHAPTER 20 Advanced programming

Yikes! (Embarrassing, but this is actually what I said.) What happened?

You didn’t get an “Object xxx not found” error, so you probably didn’t misspell a function, data frame, or variable name. Let’s look at the data that was passed to the aov() function:

> head(mtcars[c("mpg", "Transmission")])
                   mpg Transmission
Mazda RX4         21.0    Automatic
Mazda RX4 Wag     21.0    Automatic
Datsun 710        22.8    Automatic
Hornet 4 Drive    21.4         <NA>
Hornet Sportabout 18.7         <NA>
Valiant           18.1         <NA>
> table(mtcars$Transmission)

Automatic    Manual
       13         0

There are no cars with a manual transmission. Looking back at the original dataset, the variable am is coded 0=automatic, 1=manual (not 1=automatic, 2=manual).

The factor() function happily did what you asked without warnings or errors. It set all cars with manual transmissions to automatic and all cars with automatic transmissions to missing. With only one group available, the analysis of variance failed. Confirming that each input to a function contains the expected data can save you hours of frustrating detective work.
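A minimal sketch of such a check, applied to this example, follows. It works on a copy named cars (a hypothetical name) and uses the 0/1 coding of am described above. The stopifnot() guard is one possible way to confirm the recoding, not the only one.

```r
# am is coded 0 = automatic, 1 = manual.
table(mtcars$am, useNA = "ifany")        # inspect the raw codes first

cars <- mtcars                            # work on a copy (hypothetical name)
cars$Transmission <- factor(cars$am, levels = c(0, 1),
                            labels = c("Automatic", "Manual"))

# Confirm the recoding before modeling: no new NAs, and both groups populated.
stopifnot(!anyNA(cars$Transmission),
          all(table(cars$Transmission) > 0))

table(cars$Transmission)
fit <- aov(mpg ~ Transmission, data = cars)
summary(fit)
```

Had the original levels=c(1,2) been used, the guard would have stopped execution at the recoding step, before the far less informative contrasts error.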

20.5.2 Debugging tools

Although examining object names, function parameters, and function inputs will uncover many sources of error, sometimes you have to delve into the inner workings of functions and functions that call functions. In these cases, the internal debugger that comes with R can be useful. Some helpful debugging functions are listed in table 20.1.

Table 20.1 Built-in debugging functions

Function      Action

debug()       Marks a function for debugging.

undebug()     Unmarks a function for debugging.

browser()     Allows single-stepping through the execution of a function. While you’re
              debugging, typing n or pressing <RET> (the Enter key) executes the current
              statement and moves on to the next. Typing c continues execution to the end
              of the function without single-stepping. Typing where displays the call
              stack, and Q halts execution and jumps to the top level immediately. Other
              R commands like ls(), print(), and assignment statements can also be
              submitted at the debugger prompt.

trace()       Modifies a function to allow debug code to be temporarily inserted.

untrace()     Cancels tracing and removes the temporary code.

traceback()   Prints the sequence of function calls that led to the last uncaught error.


The debug() function marks a function for debugging. When the function is executed, the browser() function is called and allows you to step through the function’s execution one line at a time. The undebug() function turns this off, allowing the function to execute normally. You can temporarily insert debugging code into a function with the trace() function. This is particularly useful when you’re debugging base functions and CRAN-contributed functions that can’t be edited directly.

If a function calls other functions, it can be hard to determine where an error has occurred. In this case, executing the traceback() function immediately after an error will list the sequence of function calls that led to the error. The last call is the one that produced the error.
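At an interactive prompt, you would simply type traceback() after the error. As a rough illustration of the same idea in a form that also runs in a script, the sketch below records the call stack with sys.calls() from inside a calling handler. This is a substitute technique, not traceback() itself, and the functions top_fn(), mid_fn(), and low_fn() are hypothetical.

```r
# Hypothetical three-level call chain; the innermost function signals the error.
low_fn <- function(x) {
  if (x < 0) stop("x must be non-negative")
  sqrt(x)
}
mid_fn <- function(x) low_fn(x)
top_fn <- function(x) mid_fn(x)

captured <- NULL
msg <- tryCatch(
  withCallingHandlers(
    top_fn(-1),
    # The calling handler runs before the stack unwinds, so sys.calls()
    # still holds every call that led to the error.
    error = function(e) {
      captured <<- sys.calls()
    }
  ),
  error = function(e) conditionMessage(e)
)

# Keep only the function names and drop the error-handling machinery.
call_names <- vapply(as.list(captured),
                     function(cl) deparse(cl[[1]])[1], character(1))
chain <- call_names[call_names %in% c("top_fn", "mid_fn", "low_fn")]
print(chain)   # outermost call first, the failing function last
```

The recorded chain is read the same way as traceback() output: the call furthest from the top level is the one in which the error was raised.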

Let’s look at an example. The mad() function calculates the median absolute deviation for a numeric vector. You’ll use debug() to explore how this function works. The debugging session is displayed in the following listing.

Listing 20.4 A sample debugging session

> args(mad)                                      # (b) Views the formal arguments
function (x, center = median(x), constant = 1.4826, na.rm = FALSE,
    low = FALSE, high = FALSE)
NULL
> debug(mad)                                     # (c) Sets the function to debug
> mad(1:10)
debugging in: mad(x)
debug: {
    if (na.rm)
        x <- x[!is.na(x)]
    n <- length(x)
    constant * if ((low || high) && n%%2 == 0) {
        if (low && high)
            stop("'low' and 'high' cannot be both TRUE")
        n2 <- n%/%2 + as.integer(high)
        sort(abs(x - center), partial = n2)[n2]
    }
    else median(abs(x - center))
}
Browse[2]> ls()                                  # (d) Lists objects
[1] "center"   "constant" "high"     "low"      "na.rm"    "x"
Browse[2]> center
[1] 5.5
Browse[2]> constant
[1] 1.4826
Browse[2]> na.rm
[1] FALSE
Browse[2]> x
 [1]  1  2  3  4  5  6  7  8  9 10
Browse[2]> n                                     # (e) Single-steps through the code
debug: if (na.rm) x <- x[!is.na(x)]
Browse[2]> n
debug: n <- length(x)
Browse[2]> n
debug: constant * if ((low || high) && n%%2 == 0) {
    if (low && high)
        stop("'low' and 'high' cannot be both TRUE")
    n2 <- n%/%2 + as.integer(high)
    sort(abs(x - center), partial = n2)[n2]
} else median(abs(x - center))
Browse[2]> print(n)
[1] 10
Browse[2]> where
where 1: mad(x)

Browse[2]> c                                     # (f) Resumes continuous execution
exiting from: mad(x)
[1] 3.7065
> undebug(mad)

 

 

First, the args() function is used to display the argument names and default values for the mad() function (b). The debug flag is then set using debug(mad) (c). Now, whenever mad() is called, the browser() function is executed, allowing you to step through the function a line at a time.

When mad() is called, the session goes into browser() mode. The code for the function is listed but not executed. Additionally, the prompt changes to Browse[n]>, where n indicates the browser level. The number increments with each recursive call.
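The sketch below is a small non-interactive companion to the session. It shows that debug() only sets a flag (checked here with isdebugged()), and it reproduces by hand the value that the stepped-through code returns for 1:10. The variable names flag_set, flag_cleared, and by_hand are introduced here for illustration.

```r
x <- 1:10

# debug() only sets a flag on the function; nothing pauses until it is called.
debug(mad)
flag_set <- isdebugged(mad)       # TRUE while the flag is on
undebug(mad)
flag_cleared <- !isdebugged(mad)  # TRUE once the flag has been removed

# With the default arguments (low = FALSE, high = FALSE), the code in the
# listing reduces to constant * median(abs(x - center)).
center   <- median(x)
constant <- 1.4826
by_hand  <- constant * median(abs(x - center))

all.equal(by_hand, mad(x))        # the hand computation matches mad()
```

Because the flag is cleared before mad(x) is called on the last line, the call runs normally instead of opening the browser.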

In browser() mode, other R commands can be executed. For example, ls() lists the objects in existence at a given point during the function’s execution (d). Typing an object’s name displays its contents. If an object is named n, c, or Q, you must use print(n), print(c), or print(Q) to view its contents. You can change the values of objects by typing assignment statements.

You step through the function and execute the statements one at a time by entering the letter n or pressing the Return or Enter key (e). The where command indicates where you are in the stack of function calls being executed. With a single function, this isn’t very interesting; but if you have functions that call other functions, it can be helpful.

Typing c moves out of single-step mode and executes the remainder of the current function (f). Typing Q exits the function and returns you to the R prompt.

The debug() function is useful when you have loops and want to see how values are changing. You can also embed the browser() function directly in code in order to help locate a problem. Let’s say that you have a variable X that should never be negative. Adding the code

if (X < 0) browser()

allows you to explore the current state of the function when the problem occurs. You can take out the extra code when the function is sufficiently debugged. (I originally wrote “fully debugged,” but this almost never happens, so I changed it to “sufficiently debugged” to reflect a programmer’s reality.)
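As a minimal sketch of this pattern inside a loop, consider the hypothetical function drain() below. The interactive() guard is an addition made here so that batch runs are not interrupted; it is not part of the one-line check shown above.

```r
# Hypothetical function: X should never become negative during the loop.
drain <- function(start, withdrawals) {
  X <- start
  for (w in withdrawals) {
    X <- X - w
    # Pause only when the invariant is violated. The interactive() guard is
    # an assumption of this sketch, so scripts run in batch mode never stop.
    if (X < 0 && interactive()) browser()
  }
  X
}

drain(10, c(3, 4))      # invariant holds; browser() is never reached
drain(10, c(3, 4, 5))   # at an interactive prompt, this pauses once X < 0
```

When the browser opens, the loop variable w and the running value X can be inspected at the exact iteration where the problem first appears.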

20.5.3 Session options that support debugging

When you have functions that call functions, two session options can help in the debugging process. Normally, when R encounters an error, it prints an error message and exits the function. Setting options(error=traceback) prints the call stack (the sequence of function calls that led to the error) as soon as an error occurs. This can help you to determine which function generated the error.

Setting options(error=recover) also prints the call stack when an error occurs. In addition, it prompts you to select one of the functions on the list and then invokes browser() in the corresponding environment. Typing c returns you to the list, and typing 0 quits back to the R prompt.

Using this recover() mode lets you explore the contents of any object in any function chosen from the sequence of functions called. By selectively viewing the contents of objects, you can frequently determine the origin of the problem. To return to R’s default state, set options(error=NULL). A toy example is given next.
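One way to switch this option on and off safely is sketched below. Saving and restoring the previous value with the list returned by options() is a convention assumed here; setting options(error=NULL) as described above has the same effect when no handler was set beforehand.

```r
# options() returns the previous values, which makes restoring them easy.
old <- options(error = recover)    # call recover() on any uncaught error

handler_set <- is.function(getOption("error"))

# ... run the code under investigation here ...

options(old)                       # put back whatever was set before
restored <- identical(getOption("error"), old$error)
```

Restoring the saved value rather than hard-coding NULL also preserves any error handler that was already in place when the session started.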

Listing 20.5 Sample debugging session with recover()

f <- function(x, y){                             # Creates functions
       z <- x + y
       g(z)
}
g <- function(x){
       z <- round(x)
       h(z)
}
h <- function(x){
       set.seed(1234)
       z <- rnorm(x)
       print(z)
}
> options(error=recover)
> f(2,3)
[1] -1.207  0.277  1.084 -2.346  0.429
> f(2, -3)                                       # Enters values that cause an error
Error in rnorm(x) : invalid arguments

Enter a frame number, or 0 to exit

1: f(2, -3)
2: #3: g(z)
3: #3: h(z)
4: #3: rnorm(x)

Selection: 4                                     # Examines rnorm()
Called from: rnorm(x)
Browse[1]> ls()
[1] "mean" "n"    "sd"
Browse[1]> mean
[1] 0
Browse[1]> print(n)
[1] -1
Browse[1]> c

Enter a frame number, or 0 to exit

1: f(2, -3)
2: #3: g(z)
3: #3: h(z)
4: #3: rnorm(x)

Selection: 3                                     # Examines h(z)
Called from: h(z)
Browse[1]> ls()
[1] "x"
Browse[1]> x
[1] -1
Browse[1]> c

Enter a frame number, or 0 to exit

1: f(2, -3)
2: #3: g(z)
3: #3: h(z)
4: #3: rnorm(x)

Selection: 2                                     # Examines g(z)
Called from: g(z)
Browse[1]> ls()
[1] "x" "z"
Browse[1]> x
[1] -1
Browse[1]> z
[1] -1
Browse[1]> c

Enter a frame number, or 0 to exit

1: f(2, -3)
2: #3: g(z)
3: #3: h(z)
4: #3: rnorm(x)

Selection: 1                                     # Examines f(2, -3)
Called from: f(2, -3)
Browse[1]> ls()
[1] "x" "y" "z"
Browse[1]> x
[1] 2
Browse[1]> y
[1] -3
Browse[1]> z
[1] -1
Browse[1]> print(f)
function(x, y){
       z <- x + y
       g(z)
}
Browse[1]> c

Enter a frame number, or 0 to exit
