Error Rates, Decisive Outcomes and Publication Bias with Several
Inferential Methods
Will G. Hopkins · Alan M. Batterham
Institute of Sport Exercise and Active Living, Victoria University, Melbourne,
Victoria, Australia; Health and Social Care Institute, Teesside University,
Middlesbrough, United Kingdom
Abstract
Background Statistical methods for inferring the true
magnitude of an effect from a sample should have acceptable error rates when the
true effect is trivial (Type-I rates) or substantial (Type-II rates).
Objectives To quantify error rates, rates of decisive
(publishable) outcomes, and publication bias of five inferential methods
commonly used in sports medicine and science. The methods were conventional
null-hypothesis significance testing (NHST; significant and non-significant
imply respectively substantial and trivial true effects); conservative NHST
(the observed magnitude is interpreted as the true magnitude only for significant
effects); non-clinical magnitude-based inference (MBI; the true magnitude is
interpreted as the magnitude range of the 90% confidence interval only for
intervals not spanning substantial values of opposite sign); clinical MBI (a
possibly beneficial effect is recommended for implementation only if it is most
unlikely harmful); and odds-ratio clinical MBI (implementation is also
recommended when odds of benefit outweigh odds of harm, with odds ratio
>66).
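The decision rules listed above can be sketched in code. The following is a minimal illustration, not the authors' implementation. The function names are hypothetical, and three numeric thresholds are assumptions not stated in this abstract: a smallest important standardized effect of 0.2, "possibly beneficial" as a chance of benefit of at least 25%, and "most unlikely harmful" as a chance of harm below 0.5%. Only the odds-ratio threshold of 66 comes from the text.

```python
import math


def classify_nonclinical(lower, upper, smallest_important=0.2):
    """Non-clinical MBI sketch: describe the magnitude range covered by a
    90% confidence interval (standardized units), or return "unclear" when
    the interval spans substantial values of opposite sign.
    The 0.2 threshold is an assumption, not taken from the abstract."""
    if lower < -smallest_important and upper > smallest_important:
        return "unclear"
    labels = []
    if lower < -smallest_important:
        labels.append("substantial negative")
    if lower <= smallest_important and upper >= -smallest_important:
        labels.append("trivial")
    if upper > smallest_important:
        labels.append("substantial positive")
    return " to ".join(labels)


def odds_ratio(p_benefit, p_harm):
    """Odds of benefit divided by odds of harm, with guards for 0 and 1."""
    if p_benefit <= 0.0:
        return 0.0
    if p_harm <= 0.0 or p_benefit >= 1.0:
        return math.inf
    return (p_benefit / (1 - p_benefit)) / (p_harm / (1 - p_harm))


def recommend_implementation(p_benefit, p_harm, use_odds_ratio=False,
                             possibly=0.25, most_unlikely=0.005,
                             min_odds_ratio=66):
    """Clinical MBI sketch: implement an at-least-possibly beneficial effect
    only if it is most unlikely harmful. With use_odds_ratio=True, also
    implement when the odds ratio of benefit to harm exceeds 66.
    The 25% and 0.5% probability thresholds are assumptions."""
    standard = p_benefit >= possibly and p_harm < most_unlikely
    return standard or (use_odds_ratio
                        and odds_ratio(p_benefit, p_harm) > min_odds_ratio)
```

For example, `classify_nonclinical(-0.1, 0.5)` describes the effect as trivial to substantial positive, whereas `classify_nonclinical(-0.5, 0.4)` is unclear. An effect with a 60% chance of benefit and a 1% chance of harm is not recommended under the standard clinical rule, but it is recommended under the odds-ratio variant.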
Methods Simulation was used to quantify
standardized mean effects in 500,000 randomized controlled trials each for true
standardized magnitudes ranging from null through marginally moderate with
three sample sizes: suboptimal (10+10), optimal for MBI (50+50), and optimal
for NHST (144+144).
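A simulation of this kind can be sketched as follows for the NHST part only. This is an illustrative reconstruction under stated assumptions, not the authors' simulation code. It assumes normally distributed outcomes with unit standard deviation in both groups and an unpaired t test with pooled variance, and it runs 20,000 trials by default rather than 500,000 to keep the run time short. The function name and parameters are hypothetical. The critical value 2.101 is the standard two-sided 5% value of t for 18 degrees of freedom, so it applies only to the 10+10 sample size; other sample sizes need their own critical value.

```python
import math
import random
import statistics

# Two-sided 5% critical t value for 18 degrees of freedom (10+10 design only).
T_CRIT_95_DF18 = 2.101


def simulate_significance_rate(true_effect, n_per_group=10, n_trials=20_000,
                               t_crit=T_CRIT_95_DF18, seed=1):
    """Proportion of simulated two-group trials with a statistically
    significant difference in means, for a given true standardized effect.
    Assumes normal outcomes with SD = 1 in both groups."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_trials):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        diff = statistics.fmean(treated) - statistics.fmean(control)
        # Pooled variance for equal group sizes is the mean of the two variances.
        pooled_var = (statistics.variance(control)
                      + statistics.variance(treated)) / 2
        se = math.sqrt(pooled_var * 2 / n_per_group)
        if abs(diff / se) > t_crit:
            significant += 1
    return significant / n_trials
```

With a true effect of zero, the returned proportion estimates the rate of falsely significant outcomes and should be close to 5%. With a non-zero true effect, it estimates power, and one minus that value is the rate of non-significant outcomes.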
Results Type-I rates for non-clinical MBI were
always lower than for NHST. When Type-I rates for clinical MBI were higher,
most errors were debatable, given the probabilistic qualification of those
inferences (unlikely or possibly beneficial). NHST often had unacceptable rates
either for Type-II errors or decisive outcomes, and it had substantial
publication bias with the smallest sample size, whereas MBI had no such
problems.
Conclusion Magnitude-based inference is a trustworthy, nuanced
alternative to null-hypothesis significance testing, which it outperforms on
sample size, error rates, decision rates, and publication bias.
Key Points
Null-hypothesis significance testing
(NHST) is increasingly criticised for its failure to deal adequately with
conclusions about the true magnitude of effects in research on samples.
A relatively new approach,
magnitude-based inference (MBI), provides up-front, comprehensible, nuanced
uncertainty in effect magnitudes.
In simulations of randomised
controlled trials, MBI outperforms NHST in respect of inferential error rates,
rates of publishable outcomes with suboptimal sample sizes, and publication
bias with such samples.