Herbert Chen - Success in Academic Surgery - 2012.pdf

Chapter 5.  Analyzing Your Data


historical. Randomized clinical trials (RCTs) are considered the gold standard. Rigorous randomization and large sample sizes minimize or eliminate errors due to confounding, bias, and chance. Disadvantages of RCTs include significant time and expense, narrow cohort selection which limits generalizability, and difficulty accruing patients. In clinical medicine, it is not always possible to conduct RCTs. They require equipoise, significant resources, and reasonable expectation of patient accrual.

Inferential Statistics

The majority of research studies are based on a sample and make inferences about the truth in the overall population. A statistical hypothesis is a statement of belief about population parameters. The purpose of hypothesis testing is to permit generalizations from a sample to the population from which it came. Hypothesis testing confirms or refutes the assertion that the observed findings in a study occurred by chance alone. The null hypothesis, symbolized by H0, is a statement claiming that there is no difference between the observed findings and the population, or that the findings occurred by chance alone. The alternative hypothesis, H1, is a statement claiming that there is an association, or that the finding did not occur by chance alone.

By constructing a 2 × 2 table (Table 5.2), we can evaluate the possible outcomes of a study. The inference of a study is

TABLE 5.2  Hypothesis testing

                          True population results
Experimental results      No association           Association
No association            Correct                  β or type II error (b)
Association               α or type I error (a)    Correct

(a) P-value is equal to the probability of a type I error
(b) Power = 1 − β, where β is the probability of a type II error

66 T.S. Riall

correct if a significant association is not found when there truly is no association or vice versa. However, inferences are subject to two types of errors. Type I errors, or alpha (α) errors, occur when a significant association is found when, in truth, there is no association. The alpha level refers to the probability of a type I error. By convention, most statistical analyses set α at 0.05, which means that if we reject the null hypothesis (confirm an association), there is less than a 5% chance that the findings occurred by chance alone. The P-value, which is calculated from a statistical test, is a measure of the probability of a type I error. If the P-value is less than α, then we reject the null hypothesis and conclude that the result is statistically significant. The cutoff is arbitrary, and the P-value gives no information about the strength of the association, only that the outcome is unlikely to have occurred by chance. A P-value may be statistically significant but the observed association clinically irrelevant, which is common in studies with very large sample sizes. The use of confidence intervals instead of P-values is increasingly common, as these intervals convey information about the clinical significance, the magnitude of the differences, and the precision of the measurement. The convention is to use 95% confidence intervals. Values or estimates with nonoverlapping 95% confidence intervals are statistically different from one another. Wide confidence intervals indicate lack of precision in the measurement, possibly resulting from random variability in the data or small sample sizes.
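To make the relationship between a P-value and a confidence interval concrete, here is a minimal sketch using only the Python standard library. The counts are hypothetical (they are not data from this chapter), the function name `two_proportion_test` is invented for illustration, and the two-sided z-test with a Wald interval is just one of several reasonable choices for comparing two proportions; your statistician may prefer another.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_test(events_a, n_a, events_b, n_b, alpha=0.05):
    """Two-sided z-test and Wald confidence interval for a difference in proportions."""
    p_a, p_b = events_a / n_a, events_b / n_b
    diff = p_a - p_b

    # Under the null hypothesis both groups share one pooled proportion.
    pooled = (events_a + events_b) / (n_a + n_b)
    se_null = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_null
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # The confidence interval uses the unpooled standard error.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return diff, p_value, (diff - z_crit * se, diff + z_crit * se)


# Hypothetical counts: 40 of 200 fistulas with the standard technique,
# 20 of 200 with the new technique.
diff, p_value, ci = two_proportion_test(40, 200, 20, 200)
print(f"difference = {diff:.3f}, P = {p_value:.4f}, "
      f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The P-value only tells you whether to reject the null hypothesis at the chosen α. The interval additionally shows how large the difference plausibly is and how precisely it was measured.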

When a study demonstrates no significant association, the potential error of concern is a type II or beta (β) error. Type II errors are expressed as power. The power of a study is the probability of finding a significant association if one truly exists. Power is defined as 1 − β, where β is the probability of a type II error. Acceptable power is usually set at greater than or equal to 80%. Power is directly related to sample size and is calculated differently for different statistical methods. There are four elements in a power analysis: α, β, effect size, and sample size. The effect size is the difference that you want or expect to be able to detect between two groups, and it should be


clinically meaningful. For the previously used example of the effect of a new anastomotic technique on the development of pancreatic fistula, you need to know the expected rate of fistula formation (~20%) and the expected reduction with the new intervention. I caution you against choosing an effect size that is clinically irrelevant (e.g., a 30% reduction in fistula) in order to make the power over 80%. Power increases with increasing sample size. You should work with your statistician before you begin a study to ensure that you will realistically be able to accrue enough patients to generate sufficient power to answer your question.
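The interplay of α, β, effect size, and sample size can be sketched with the standard normal-approximation formula for comparing two proportions. This is an illustrative approximation only: the function name `sample_size_per_group` and the candidate fistula rates with the new technique (15%, 10%, 5%) are assumptions made for the example, and only the ~20% baseline rate comes from the text. A formal power analysis for a real study should be done with your statistician.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate patients needed per group to compare two proportions
    (two-sided test, normal approximation, equal group sizes)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_control - p_treatment               # absolute difference to detect
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Baseline fistula rate of about 20%; the rates with the new technique
# are hypothetical.
for p_new in (0.15, 0.10, 0.05):
    n = sample_size_per_group(0.20, p_new)
    print(f"20% -> {p_new:.0%}: about {n} patients per group")
```

The smaller the difference you hope to detect, the more patients you need, which is why an inflated effect size makes a study look adequately powered on paper.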

Types of Variables

Patient characteristics can be measured on various scales using different types of variables. The variable type determines the statistical methodology. Broadly speaking, data can be categorical (qualitative) or numerical (quantitative). Within categorical data, variables are often nominal. This is the simplest level of measurement where data values fall into mutually exclusive categories. Examples include sex, race, the presence or absence of a condition (e.g., congestive heart failure), or dichotomous outcomes (yes or no). Nominal data can have more than two different groups. Nominal data are generally described in terms of proportions or percentages and are often best summarized or displayed as bar charts or pie charts.
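As a small illustration of summarizing a nominal variable, the sketch below counts the categories and converts the counts to proportions. The data and the helper name `proportions` are hypothetical.

```python
from collections import Counter


def proportions(values):
    """Return the share of observations falling into each category."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


# Hypothetical nominal variable: sex recorded for eight patients.
sex = ["F", "M", "M", "F", "F", "M", "F", "F"]
for category, share in proportions(sex).items():
    print(f"{category}: {share:.1%}")
```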

When inherent ordering occurs among nominal categories of a variable, the variable is called ordinal. A classic example would be tumor staging. There is inherent ordering in the tumor staging scale with stage IV tumors having a worse prognosis than stage I tumors. Although inherent ordering exists, it is important to remember that the distance between two adjacent categories is not necessarily the same throughout the scale. The clinical implications of the difference between tumor stages I and II may be vastly different from those of the difference between stages III and IV. Ordinal data are also summarized using proportions and percentages.


Numerical scales are used for quantitative observations. These can be discrete or continuous. A continuous scale, such as age, duration of survival, or operative time, has numbers on a continuum. Continuous data can be reported to a high degree of precision, and the situation will dictate the precision required. For example, age can be reported to the closest integer for adults, but in infants, data to the nearest month might be required. A discrete scale consists of data that can take on integer values only. Examples are counts such as the number of hospital admissions, number of previous operations, or number of falls.

Descriptive Statistics and Comparison of Groups

Measures of Central Tendency and Spread

Numeric data can be summarized by measures of central tendency such as mean, median, and mode, and in terms of measures of spread or dispersion, such as range, standard deviation, and interquartile range. The most common measure of central tendency is the mean, or arithmetic average of the numerical observations. It is the sum of the observations divided by the number of observations. The mean is sensitive to extreme outlying values, especially when the sample size is small. The median is the middle observation, where half the observations are smaller and half are bigger. The median is calculated by arranging the observations from smallest to largest and counting to find the middle value. If there is an even number of observations, the median is the mean of the two middle values. The median is less sensitive to extreme values than the mean. We often use median values to describe survival. A median survival of 18 months after a curative-intent operation for pancreatic cancer indicates that half the people who have such an operation will survive at least that long. The mode is the value that occurs most frequently and is commonly used for large numbers of observations. If a dataset has two modes, it is called bimodal.
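The following sketch computes these summary measures with Python's standard `statistics` module and shows how a single extreme value pulls the mean but not the median. The operative times are made-up numbers chosen for illustration.

```python
from statistics import mean, median, multimode, stdev

# Hypothetical operative times in minutes; the last value is an extreme outlier.
operative_times = [180, 195, 200, 200, 210, 225, 480]

print("mean:", round(mean(operative_times), 1))
print("median:", median(operative_times))
print("mode(s):", multimode(operative_times))
print("range:", max(operative_times) - min(operative_times))
print("standard deviation:", round(stdev(operative_times), 1))

# Dropping the outlier moves the mean substantially; the median stays put.
# With six observations left, the median is the mean of the two middle values.
without_outlier = operative_times[:-1]
print("mean without outlier:", round(mean(without_outlier), 1))
print("median without outlier:", median(without_outlier))
```

`multimode` returns every value tied for most frequent, so a bimodal dataset yields a list of two values.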

FIGURE 5.2  Commonly seen distributions of observations in clinical studies. (a) Normal distribution. The mean is equal to the median. (b) Positively skewed or skewed to the right. The mean is greater than the median due to large outlying observations. (c) Negatively skewed or skewed to the left. The mean is less than the median due to small outlying observations

When determining which measure of central tendency is best, you need to consider the scale of the measurement (ordinal or numerical) and the shape of the distribution of observations (Fig. 5.2). If observations are evenly distributed around the mean, the mean is equal to the median, and the distribution is symmetric (Fig. 5.2a). If outlying observations are all large, the mean will be larger than the median, and the distribution will be skewed to the right (positively skewed, Fig. 5.2b). If they are all small, the mean will be lower than the median, and the distribution will be skewed to the left (negatively skewed, Fig. 5.2c). The mean should be used for numerical data that are not skewed. The median can be used for ordinal data or for numerical data with a skewed distribution. The mode is useful for bimodal distributions. For example, there is a bimodal distribution for
