Chapter 5. Analyzing Your Data
T.S. Riall
historical. Randomized clinical trials (RCTs) are considered the gold standard. Rigorous randomization and large sample sizes minimize or eliminate errors due to confounding, bias, and chance. Disadvantages of RCTs include significant time and expense, narrow cohort selection, which limits generalizability, and difficulty accruing patients. In clinical medicine, it is not always possible to conduct RCTs. They require equipoise, significant resources, and a reasonable expectation of patient accrual.
Inferential Statistics
The majority of research studies are based on a sample and make inferences about the truth in the overall population. A statistical hypothesis is a statement of belief about population parameters. The purpose of hypothesis testing is to permit generalizations from a sample to the population from which it came. Hypothesis testing confirms or refutes the assertion that the observed findings in a study occurred by chance alone. The null hypothesis, symbolized by H0, is a statement claiming that there is no difference between the observed findings and the population, or that the findings occurred by chance alone. The alternative hypothesis, H1, is a statement claiming that there is an association, or that the finding did not occur by chance alone.
By constructing a 2 × 2 table (Table 5.2), we can evaluate the possible outcomes of a study.

TABLE 5.2 Hypothesis testing

| Experimental results | True population results: no association | True population results: association |
|---|---|---|
| No association | Correct | β or type II error^b |
| Association | α or type I error^a | Correct |

^a The P-value is compared with α, the probability of a type I error.
^b Power = 1 − β, where β is the probability of a type II error.

The inference of a study is correct if a significant association is not found when there truly is no association, or vice versa. However, inferences are subject to two types of errors. Type I errors, or alpha (α) errors, occur when a significant association is found when, in truth, there is no association. The alpha level refers to the probability of a type I error. By convention, most statistical analyses set α at 0.05, which means that we accept a 5% chance of rejecting the null hypothesis (declaring an association) when the findings in fact occurred by chance alone. The P-value, which is calculated from a statistical test, is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true. If the P-value is less than α, then we reject the null hypothesis and conclude that the result is statistically significant. The 0.05 cutoff is arbitrary, and the P-value gives no information about the strength of the association, only that the outcome is unlikely to have occurred by chance. A P-value may be statistically significant while the observed association is clinically irrelevant, which is common in studies with very large sample sizes. The use of confidence intervals instead of P-values is increasingly common, as these intervals convey information about the clinical significance, the magnitude of the differences, and the precision of the measurement. The convention is to use 95% confidence intervals. Estimates whose 95% confidence intervals do not overlap are statistically different from one another, although overlapping intervals do not by themselves rule out a significant difference. Wide confidence intervals indicate lack of precision in the measurement, possibly resulting from random variability in the data or small sample sizes.
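To make the relationship between a P-value and a confidence interval concrete, the following is a minimal sketch of a two-sided z-test for a difference in two proportions, using only the Python standard library. The function name and the patient counts are hypothetical and chosen purely for illustration; the normal approximation is one of several valid approaches, and a statistician may prefer an exact or chi-square test for real data.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_test(events_a, n_a, events_b, n_b, alpha=0.05):
    """Two-sided z-test and confidence interval for a difference in proportions."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    diff = p_a - p_b

    # Under H0 both groups share one event rate, so the test pools them.
    pooled = (events_a + events_b) / (n_a + n_b)
    se_null = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_null
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # The confidence interval uses the unpooled standard error.
    se_diff = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se_diff, diff + z_crit * se_diff)
    return diff, p_value, ci


# Hypothetical small study: 20/100 events versus 10/100 events.
diff, p_value, ci = two_proportion_test(20, 100, 10, 100)
print(f"difference = {diff:.3f}, P = {p_value:.3f}, "
      f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")

# Hypothetical very large study: a 0.5 percentage point difference.
big_diff, big_p, big_ci = two_proportion_test(20000, 100000, 19500, 100000)
print(f"difference = {big_diff:.3f}, P = {big_p:.3f}, "
      f"95% CI = ({big_ci[0]:.3f}, {big_ci[1]:.3f})")
```

In the small study the P-value falls just below 0.05, yet the interval is wide, which signals an imprecise estimate. In the large study a very small difference is also statistically significant, and the narrow interval makes it easy to judge whether a difference of that size matters clinically.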
When a study demonstrates no significant association, the potential error of concern is a type II or beta (β) error. Type II errors are expressed in terms of power. The power of a study is the probability of finding a significant association if one truly exists. Power is defined as 1 − β, where β is the probability of a type II error. Acceptable power is usually set at greater than or equal to 80%. Power is directly related to sample size and is calculated differently for different statistical methods. There are four elements in a power analysis: α, β, effect size, and sample size. The effect size is the difference that you want or expect to be able to detect between two groups, and it should be clinically meaningful. For the previously used example of the effect of a new anastomotic technique on the development of pancreatic fistula, you need to know the expected rate of fistula formation (~20%) and the expected reduction with the new intervention. I caution you against choosing an effect size that is unrealistic or clinically irrelevant (e.g., a 30% reduction in fistula) in order to make the power over 80%. Power increases with increasing sample size. You should work with your statistician before you begin a study to ensure that you will realistically be able to accrue enough patients to generate sufficient power to answer your question.
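The interplay of α, β, effect size, and sample size can be sketched with the standard normal-approximation formula for comparing two proportions. This is an illustration only: the function name is hypothetical, the 20% baseline fistula rate comes from the example above, and the rates assumed for the new technique (10% and 15%) are invented for the sketch. Dedicated power software and your statistician may use a different formula (for example, with a continuity correction).

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate patients needed per group for a two-sided test of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_control - p_treatment               # effect size (absolute difference)
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Assumed effect sizes: fistula rate falls from 20% to 10%, or from 20% to 15%.
n_large_effect = sample_size_per_group(0.20, 0.10)
n_small_effect = sample_size_per_group(0.20, 0.15)
print(f"20% -> 10%: {n_large_effect} patients per group")
print(f"20% -> 15%: {n_small_effect} patients per group")
```

Under these assumptions, halving the fistula rate requires roughly 200 patients per group, whereas detecting the smaller reduction requires roughly 900 per group. This is why an overly optimistic effect size makes a study look feasible on paper while leaving it underpowered in practice.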
Types of Variables
Patient characteristics can be measured on various scales using different types of variables. The variable type determines the statistical methodology. Broadly speaking, data can be categorical (qualitative) or numerical (quantitative). Within categorical data, variables are often nominal. This is the simplest level of measurement, where data values fall into mutually exclusive categories. Examples include sex, race, the presence or absence of a condition (e.g., congestive heart failure), or dichotomous outcomes (yes or no). Nominal data can have more than two different groups. Nominal data are generally described in terms of proportions or percentages and are often best summarized or displayed as bar charts or pie charts.
When inherent ordering occurs among the categories of a variable, the variable is called ordinal. A classic example would be tumor staging. There is inherent ordering in the tumor staging scale, with stage IV tumors having a worse prognosis than stage I tumors. Although inherent ordering exists, it is important to remember that the distance between two adjacent categories is not necessarily the same throughout the scale. The clinical implications of the difference between tumor stages I and II may be vastly different from those of the difference between stages III and IV. Ordinal data are also summarized using proportions and percentages.
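As a small illustration of summarizing categorical data, the sketch below reports an ordinal variable (tumor stage) as percentages while preserving the order of the categories. The stage values are invented for the example, and the function name is hypothetical.

```python
from collections import Counter

# Ordinal scale: the order is meaningful, but the spacing between stages is not.
STAGE_ORDER = ["I", "II", "III", "IV"]


def stage_percentages(stages):
    """Return the percentage of patients in each stage, in stage order."""
    counts = Counter(stages)
    total = len(stages)
    return {stage: 100 * counts[stage] / total for stage in STAGE_ORDER}


# Hypothetical cohort of 10 patients.
stages = ["I", "II", "II", "III", "IV", "II", "I", "III", "II", "IV"]
summary = stage_percentages(stages)
for stage, pct in summary.items():
    print(f"Stage {stage}: {pct:.0f}%")
```

The summary deliberately avoids a mean stage: averaging the category labels would treat the steps between stages as equal, which the ordinal scale does not guarantee.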
Numerical scales are used for quantitative observations. These can be discrete or continuous. A continuous scale, such as age, duration of survival, or operative time, has numbers on a continuum. Continuous data can be reported to a high degree of precision, and the situation will dictate the precision required. For example, age can be reported to the closest integer for adults, but in infants, data to the nearest month might be required. A discrete scale consists of data that can take on integer values only. Examples are counts such as the number of hospital admissions, number of previous operations, or number of falls.
Descriptive Statistics and Comparison of Groups
Measures of Central Tendency and Spread
Numeric data can be summarized by measures of central tendency, such as mean, median, and mode, and in terms of measures of spread or dispersion, such as range, standard deviation, and interquartile range. The most common measure of central tendency is the mean, or arithmetic average of the numerical observations. It is the sum of the observations divided by the number of observations. The mean is sensitive to extreme outlying values, especially when the sample size is small. The median is the middle observation, where half the observations are smaller and half are bigger. The median is calculated by arranging the observations from smallest to largest and counting to find the middle value. If there is an even number of observations, the median is the mean of the two middle values. The median is less sensitive to extreme values than the mean. We often use median values to describe survival. A median survival of 18 months after a curative-intent operation for pancreatic cancer indicates that half the people who have such an operation will survive at least that long. The mode is the value that occurs most frequently and is commonly used for large numbers of observations. If a dataset has two modes, it is called bimodal.
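The measures described above can be computed with Python's standard `statistics` module. The sketch below uses invented operative times (in minutes) and a hypothetical helper function to show how a single extreme value moves the mean while leaving the median unchanged.

```python
import statistics


def summarize(values):
    """Return common measures of central tendency and spread."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "iqr": q3 - q1,                  # interquartile range
    }


# Hypothetical operative times in minutes.
operative_minutes = [180, 195, 200, 210, 210, 225, 240]
# The same series with one very long operation added.
with_outlier = operative_minutes + [600]

base = summarize(operative_minutes)
skewed = summarize(with_outlier)
print(f"without outlier: mean = {base['mean']:.1f}, median = {base['median']}")
print(f"with outlier:    mean = {skewed['mean']:.1f}, median = {skewed['median']}")
```

With the outlier included, the mean rises well above the median, and the range and standard deviation inflate far more than the interquartile range does. This is the pattern expected in a positively skewed distribution.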
FIGURE 5.2 Commonly seen distributions of observations in clinical studies. (a) Normal distribution. The mean is equal to the median. (b) Positively skewed or skewed to the right. The mean is greater than the median due to large outlying observations. (c) Negatively skewed or skewed to the left. The mean is less than the median due to small outlying observations.
When determining which measure of central tendency is best, you need to consider the scale of the measurement (ordinal or numerical) and the shape of the distribution of observations (Fig. 5.2). If observations are evenly distributed around the mean, the mean is equal to the median, and the distribution is symmetric (Fig. 5.2a). If outlying observations are all large, the mean will be larger than the median, and the distribution will be skewed to the right (positively skewed, Fig. 5.2b). If they are all small, the mean will be lower than the median, and the distribution will be skewed to the left (negatively skewed, Fig. 5.2c). The mean should be used for numerical data that are not skewed. The median can be used for ordinal data or for numerical data with a skewed distribution. The mode is useful for bimodal distributions. For example, there is a bimodal distribution for