Advice on the Use of MBI: a Comment on The Vindication of Magnitude-Based Inference

Will G Hopkins, Alan M Batterham

Sportscience 22, sportsci.org/2018/CommentsOnMBI/wghamb.htm, 2018

Summary: The editor of Medicine and Science in Sports and Exercise
has ordered rejection of manuscripts containing magnitude-based inference
(MBI). We therefore advise authors submitting such manuscripts to that
journal to describe their inferences as reference
Bayesian with a dispersed uniform prior, which identifies MBI with a
formal "objective" Bayesian method. Decisions about the true
magnitude of an effect can be justified by referring to the existing MBI
threshold probabilities, which are similar to but more conservative than
those used by the Intergovernmental Panel on Climate Change. We also counter
the latest attack on MBI by a journalist and explain how a return to
null-hypothesis significance testing will reduce the generalizability of
meta-analyses and impair the career development of some young researchers.

For those who missed recent news on social
media, Bruce Gladden, the editor-in-chief of Medicine and Science in Sports and Exercise (MSSE), has
instructed his associate editors not to accept manuscripts with effects
assessed using magnitude-based inference (MBI). The news that he currently
intends to enshrine this decision in journal policy was announced in the latest ill-conceived attack
on MBI by a journalist, Christie Aschwanden. We say ill-conceived rather than ill-informed,
because Aschwanden was well informed by us prior to publication of her news.
More on that at the end of this article. First, though, we may still be able
to publish inferences about the magnitudes of effects in MSSE. Let's see why.

MSSE has published a letter ahead of print (Borg et al., 2018), in
which the authors call for the use of Bayesian inference as an answer to the
apparent problems of MBI that Kristin Sainani (2018)
identified in her critique and that spawned attacks on MBI online (see our earlier comment
for links). Others have previously recommended use of Bayesian inference in
sport science (Mengersen et al., 2016),
including Sainani herself. Presumably, therefore, the editor of MSSE will not
order "desk" rejection of manuscripts containing Bayesian
inference.

What is Bayesian inference? Very simply, you
make probabilistic statements about the magnitude of the true or population
value of the particular effect statistic that you have investigated. Sound
familiar? Yes, it's MBI. MBI is Bayesian, so what is the problem? The problem
is that Sainani and authors of a previous critique of MBI (Welsh and Knight, 2015) made
the astonishing claim that MBI is not Bayesian, and because Sainani and those
previous authors are card-carrying statisticians at top institutions
(Stanford and the Australian National University), people who have not looked
closely at the literature assume that MBI is not Bayesian. We point you now
to earlier references written by or for clinicians, and a recent comment by statistician
Roderick Little, where it is stated unequivocally that the estimates of
probabilities of the magnitude of effects provided by the same method as MBI
are valid Bayesian estimates: Burton (1994), Burton
et al. (1998), Gurrin
et al. (2000),
Shakespeare et al. (2001),
Shakespeare et al. (2008), and
Little (2018). Furthermore,
the estimates are obtained with the same straightforward calculations used
for p values and confidence limits. This quote from Gurrin et al. (2000) is a
compelling summary: "The congruence between a Bayesian analysis using a
uniform prior and a conventional analysis provides a non-threatening
introduction to Bayesian methods and means that analyses of the type we
describe can be carried out on standard software. Our approach is
straightforward to implement, offers the potential to describe the results of
conventional analyses in a manner that is more easily understood, and leads
naturally to rational decisions."
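To make that computational point concrete, here is a minimal sketch in Python (our own illustration: the function name and the example numbers are invented, not taken from any of the cited papers). It computes the chances that the true effect is substantially positive, trivial, or substantially negative from nothing more than the point estimate, its standard error, and a smallest-important threshold, using the same t-distribution that yields the p value and the confidence limits:

```python
from scipy import stats

def mbi_probabilities(estimate, se, threshold, df):
    """Chances that the true effect is substantially positive, trivial,
    or substantially negative, given the point estimate, its standard
    error, the degrees of freedom, and the smallest important change.
    These are the flat-prior posterior probabilities; the arithmetic is
    the same t-distribution calculation behind p values and CIs."""
    t = stats.t(df)
    p_positive = t.sf((threshold - estimate) / se)    # P(true effect > +threshold)
    p_negative = t.cdf((-threshold - estimate) / se)  # P(true effect < -threshold)
    return p_positive, 1 - p_positive - p_negative, p_negative

# Hypothetical numbers: a 1.5% enhancement with standard error 1.0%,
# smallest important change 0.5%, 20 degrees of freedom:
print(mbi_probabilities(1.5, 1.0, 0.5, 20))  # ≈ (0.835, 0.135, 0.030)
```

Under a dispersed uniform prior these are legitimate posterior probabilities, which is the sense in which MBI is reference Bayesian.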
There is, however, a subtle issue we need to address. Bayesian statisticians are true to the spirit of their founder,
Thomas Bayes, according to whose eponymous theorem the probability of
something (e.g., that a treatment is beneficial) can be derived from the data
of your study combined with the probability of the something before you got
your data. Modern-day Bayesians have worked out ways to express the latter
probability, known as a prior, and
to combine it with the usual data from the study of a sample to get the posterior probability. The issue here
is that it is difficult to justify the prior objectively, so Bayesians quite
rightly regard the prior as a belief, and the posterior probability gets
labelled with terms like credibility.
Fine, but that doesn't solve the problem of
quantifying the prior belief.
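To see what quantifying a prior entails, here is a minimal sketch of the standard normal-conjugate update (again our own illustration with invented numbers; a full Bayesian analysis of a study like the one described below would also need a prior on the variance). The posterior mean is a precision-weighted average of the prior mean and the sample estimate, so the inference depends directly on a prior the researcher must be willing to assert:

```python
def normal_posterior(prior_mean, prior_sd, estimate, se):
    """Normal-normal conjugate update for a mean effect: the posterior
    precision is the sum of the prior and data precisions, and the
    posterior mean weights each mean by its precision."""
    w_prior, w_data = 1 / prior_sd**2, 1 / se**2
    post_var = 1 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * estimate)
    return post_mean, post_var**0.5

# A sceptical prior centred on zero pulls an estimate of 1.5 down to 0.75,
# so the conclusion hinges on a prior_sd the researcher may have no basis for choosing.
print(normal_posterior(prior_mean=0.0, prior_sd=1.0, estimate=1.5, se=1.0))  # ≈ (0.75, 0.71)
```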
To give you an example, at the recent conference of the European College of Sport Science, one of us (WGH) presented on behalf of a
Chinese colleague, FeiFei Li, a study of the
effects of high-intensity continuous vs intermittent exercise on markers of
cardiac damage in a crossover study of marathon runners. How can FeiFei
quantify her prior belief in what the change in biomarkers will be? It turns
out she has to provide a probability distribution not only for the mean
change but also for the variance of the change. It's unrealistic for her and
any reasonably skeptical researcher to do that. For all she knows, the
periods of low-intensity exercise between the intervals might more than
compensate for the effects of the higher-intensity exercise during the
intervals, or maybe there will be the opposite effect. And that's not all.
Her analysis includes estimation of the modifying effect of the intensity of
exercise on the change in the biomarkers. What is her prior belief about this
effect? She has no idea, so she has to go into the analysis without a prior
belief. If there were any studies on this effect in print, she might be able
to do a meta-analysis of the outcomes in those studies to get an objective
prior, but the posterior would then be relevant only to her study setting,
whereas readers want an outcome generalizable to their setting and indeed to any
setting. The meta-analysis, if it hasn't already been done, therefore should
be done as a comprehensive random-effect meta-regression after her study.

For researchers who can't or won't provide
priors, perhaps owing to the concern that the prior will not be acceptable to
a skeptical scientific audience, there is a legitimate form of Bayesian
inference known as reference or calibrated Bayes with a dispersed uniform
prior. The prior here is minimally informative and is sometimes referred
to as flat and objective. As noted above, it gives the same answers as MBI (e.g., Gurrin et al., 2000; see also Roderick
Little's comment). So, if
we refer to our inferences about magnitudes in this manner, Bruce Gladden and
any other editors considering a ban on MBI should have no objection. Well,
they may still have two objections, but these can be addressed.

The first potential objection is that, to
perform Bayesian analysis with a dispersed uniform prior, we should use the
formal mathematics of the Bayesians, otherwise it is not Bayesian. This was
the basis of the assertions by Sainani and by Welsh and Knight that MBI was
not Bayesian, and we consider it to be an unjustifiable technicality. Those
previous authors who recommended straightforward MBI-type analyses referred
to their inferences as Bayesian. Were they wrong, too? The second potential objection takes a
The second potential objection takes a little longer to address. It arises from the fact that MBI provides advice
for making decisions about the true magnitude of effects. Some Bayesians
don't like making decisions; they are obviously concerned about exactitude
with estimation of probabilities, but when it comes to making decisions, they
are often silent. For example, we recently invited a noted critic of NHST,
who was one of the reviewers of our article in Sports Medicine (Hopkins and Batterham, 2016), to
support MBI. He declined, on the grounds that he prefers not to
"dichotomize outcomes". Despite repeated entreaties, he would not
be drawn on the responsibility of the risk-savvy statistician to advise clinicians and practitioners about implementation
of a treatment. As practitioners of sport and exercise science, we accept
responsibility for decisions, and we have devised decision guidelines based
on thresholds for probability labeled with terms ranging from most unlikely through various levels
of possibility to most likely. We
have discovered very recently that these thresholds are remarkably similar to
those used by the Intergovernmental Panel on Climate Change (IPCC; Mastrandrea et al., 2010), except
that ours are a little more conservative. For example, they define about as likely as not with
probabilities between 33% and 66%, whereas our possible is defined by 25% to 75%. Importantly, the IPCC qualify
their scale with this comment: "About
as likely as not should not be used to express a lack of knowledge."
We take this caveat to mean that possible
effects represent useful information.
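For reference, the scale can be written down in a few lines. This sketch assumes the full set of thresholds published in Hopkins et al. (2009); the text above quotes only some of these boundaries:

```python
def mbi_term(p):
    """Qualitative label for the probability that the true effect has a
    given magnitude, assuming the MBI thresholds of Hopkins et al. (2009)."""
    scale = [(0.005, "most unlikely"), (0.05, "very unlikely"),
             (0.25, "unlikely"), (0.75, "possibly"),
             (0.95, "likely"), (0.995, "very likely")]
    for upper, term in scale:
        if p < upper:
            return term
    return "most likely"

print(mbi_term(0.84))  # 'likely' -- cf. the IPCC's 'likely' (>66%)
```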
We have two sets of decision guidelines. The first is for clinically or practically relevant effects: those that could be
used to make a decision about implementation of a treatment that could
benefit athletes, patients or clients (improve performance or health) or harm
them (impair performance or health). If the effect is possibly beneficial
(>25% chance) and most unlikely harmful (<0.5% risk), the effect is
deemed clear and potentially
implementable; if it is unlikely beneficial (<25% chance), it is deemed clear and not recommended for
implementation; and if it is possibly beneficial but has an unacceptable risk
of harm (>0.5%), it is deemed unclear
and the researcher is advised to get more data before making a decision.
There is a less conservative approach to clinical decision-making based on
odds of benefit and harm, but you can read about that elsewhere (e.g., Hopkins et al., 2009; Hopkins and Batterham, 2016).
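The clinical rule just described reduces to a few comparisons. Here is a minimal sketch (the function name and example calls are our own; the 25% and 0.5% thresholds are those given above, expressed as proportions):

```python
def clinical_decision(p_benefit, p_harm):
    """MBI clinical decision rule as stated above (0.25 = 25% chance)."""
    if p_benefit > 0.25 and p_harm > 0.005:
        return "unclear: get more data before deciding"
    if p_benefit > 0.25:
        return "clear: potentially implementable"
    return "clear: not recommended for implementation"

print(clinical_decision(p_benefit=0.84, p_harm=0.003))  # clear: potentially implementable
print(clinical_decision(p_benefit=0.40, p_harm=0.02))   # unclear: get more data before deciding
```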
The second set of guidelines is for effects that are not implementable as treatments or strategies, such as a comparison
of males and females. Here it is simply a matter of a substantial magnitude
of the effect: a difference in endurance between females and males, for
example. For such effects, it is only when a substantial or trivial
difference is very likely (>95% chance) or very unlikely (<5% chance)
that you declare the effect to be clear. For example, if the effect is very
likely to be a substantial increase, it is clearly not trivial or a
substantial decrease. Or for another example, if the effect is very unlikely
to be a substantial decrease, it is clearly something else: some likelihood
of trivial and/or substantially positive, depending on the probabilities of
those effects. This description of clear and unclear non-clinical effects is consistent with our previous descriptions but is more concise.
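Again as a minimal sketch (our own illustration), the non-clinical rule amounts to ruling magnitudes in or out at the 95% and 5% thresholds given above:

```python
def nonclinical_decision(p_positive, p_trivial, p_negative):
    """MBI non-clinical rule as stated above: clear when some magnitude
    is very likely (>95%) or very unlikely (<5%)."""
    probs = {"substantial increase": p_positive,
             "trivial": p_trivial,
             "substantial decrease": p_negative}
    ruled_in = [m for m, p in probs.items() if p > 0.95]
    ruled_out = [m for m, p in probs.items() if p < 0.05]
    if ruled_in or ruled_out:
        return f"clear (ruled in: {ruled_in or 'none'}; ruled out: {ruled_out or 'none'})"
    return "unclear: get more data"

# With the chances from the first sketch (~0.84, ~0.13, ~0.03), a substantial
# decrease is ruled out, so the effect is clear: likely a substantial increase.
print(nonclinical_decision(0.835, 0.135, 0.030))
```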
Obviously you can make mistakes with your decisions: for example, the true effect is harmful, but you decide it is implementable; or the true effect is trivial, but you decide it is substantial. We investigated the error rates for all the clinical and non-clinical decisions, and we found them to be generally less, and often much less, than those associated with decisions about magnitude based on null-hypothesis significance testing. And the error rates are acceptable. The high error rates calculated by Sainani are based either on her own unjustifiable re-definitions of our errors or on the bizarre notion that a clear effect deemed possibly substantially positive and possibly trivial (it is clear, because it is very unlikely substantially negative) represents a Type-I error if the true effect is trivial. These and other assertions aimed at discrediting MBI have been accepted as gospel by Christie Aschwanden, by the authors of the recent letter to MSSE (Borg et al., 2018), and by Bruce Gladden. We absolutely stand by our definitions of error and by the resulting error rates that we calculated in our Sports Medicine article (Hopkins and Batterham, 2016). For a thorough account of Sainani's erroneous errors, see our vindication article (Hopkins and Batterham, 2018).

Finally, the ill-conceived attack by Christie Aschwanden. She contacted us for comment before she submitted her article to her editor. Here is what we sent her, with the plea that "it would be so cool if your editor allowed this message to be published verbatim." The text in red was omitted from her published item, presumably because it represents inconvenient truths:

What you regard as "shoddy statistics,"
and what has motivated the editor of MSSE to refuse manuscripts using
magnitude-based inference (MBI), is probably the fact that MBI allows
publication of some statistically non-significant effects. Apparently you and he have the mistaken idea
that authors and readers of the publications of such effects will consider
that the effects are "real" and that the literature
is getting corrupted with fake findings. But such effects are published
with probabilistic terms that properly reflect the uncertainty in the true
magnitude: not only the confidence intervals but also qualitative terms such
as possibly, likely, and so on. These
are proper estimates of the uncertainty, because they are legitimate
"reference" Bayesian estimates with a uniform dispersed prior, as
evidenced in our rebuttal article at sportsci.org and in the associated
post-publication comments. Kristin Sainani's assertion that MBI is not
Bayesian is absolutely wrong, along with her estimates of error rates and
other disgracefully incorrect assertions. Furthermore, effects published
according to the rules of MBI do not corrupt the literature. In fact, the
reverse is true: there is trivial publication bias with "clear" MBI
effects, whereas there is substantial publication bias with
"significant" effects. The qualitative probabilistic terms of
MBI are based on a scale that is remarkably similar to a scale used by the
Intergovernmental Panel on Climate Change to assess likelihood of
climatological effects in terms that their readers and the public can
understand. Apparently climatologists and sport scientists are so far the
only researchers concerned with making decisions about effects informed by an
understanding of probability of the true magnitudes rather than a p value
based on the null hypothesis, which, incidentally, is always false.

In her news item, Aschwanden claimed that "over the years, statisticians have identified numerous problems with MBI." Her first problem was the critique of Welsh and Knight, which we comprehensively dismissed in our Sports Medicine article. The next problem is an interesting one: a claim that MBI has not been published in a recognized statistics journal and should not be used until it is. The articles promoting the computational approach and Bayesian interpretation that MBI uses were published in Statistics in Medicine (Burton, 1994) and in other journals specializing in the clinical application of statistics: Journal of Epidemiology and Community Health (Burton et al., 1998), Journal of Evaluation in Clinical Practice (Gurrin et al., 2000), and Medical Decision Making (Shakespeare et al., 2008). Our probability decision thresholds were supported by an informal survey of researchers on a mailing list, so it should not be surprising that they agree closely with the thresholds used by the IPCC. In any case, our extensive simulations showed that they resulted in acceptable error rates and trivial publication bias. In a message to the MSSE editor, we pointed out that the conversation should now be about the values of the decision thresholds, not error rates based on the impractical and ever-false null hypothesis. He has not responded to that suggestion.

Aschwanden then claimed that "if MBI were really the revolutionary new method that its inventors claim it is, it should be taken up among many fields, but Gladden notes with concern that MBI is only used in sports and exercise science." What is at first surprising is that the articles promoting the MBI equivalent in medical journals have not been cited to anything like the extent that our MBI articles have been. There is a good reason for the disparity. Medical researchers have been encouraged to interpret the traditional confidence interval in a Bayesian fashion, but they have not been provided with the tools for calculating the probabilities and making decisions based on chances of benefit and risk of harm. Instead, they are still stuck in the perceived rut of having to get statistical significance before they are prepared or permitted to say anything more. Enlightened medical researchers might then interpret the magnitude of the lower and upper confidence limits, but the statistical packages do not automatically calculate probabilities above or below user-defined magnitude thresholds, and researchers can get their larger studies into print without the probabilities, so what's the point? MBI has become popular in exercise and sport science because we have provided the tools, and the tools provide researchers with an avenue for publishing previously unpublishable effects from small samples.

Aschwanden next went over the same old ground of Sainani's error rates, making the fatuous claim that a possibly substantial and possibly trivial effect incurs an error if the true effect is trivial. She then made the patently false claim that "in practice, MBI may deem an intervention 'likely beneficial' even if the error bars show that it could be almost as likely to be useless". It is dispiriting to note this quote from the MSSE editor near the end of Aschwanden's news: "We need better reproducibility, not less." Reproducibility is purely a question of error rates, and MBI stacks up better than the traditional method for making inferences about magnitudes that matter.
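To illustrate what such error-rate comparisons involve, here is a deliberately minimal simulation (our own sketch, not the extensive simulations of Hopkins and Batterham, 2016, and using our definition of error, not Sainani's): for a truly trivial effect in repeated small studies, count how often the non-clinical rule wrongly declares a substantial effect very likely, versus how often NHST wrongly rejects the null:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, sd, threshold = 10, 1.0, 0.2   # small study; smallest important effect 0.2
true_effect = 0.0                 # the true effect is trivial
trials = 10_000
mbi_errors = nhst_errors = 0
for _ in range(trials):
    sample = rng.normal(true_effect, sd, n)
    est = sample.mean()
    se = sample.std(ddof=1) / n**0.5
    t = stats.t(n - 1)
    p_pos = t.sf((threshold - est) / se)    # chance of substantial increase
    p_neg = t.cdf((-threshold - est) / se)  # chance of substantial decrease
    # MBI-style error: declaring a substantial effect very likely (>95%)
    mbi_errors += (p_pos > 0.95) or (p_neg > 0.95)
    # NHST error: rejecting the (true) null at p < 0.05
    nhst_errors += 2 * t.sf(abs(est) / se) < 0.05
print(mbi_errors / trials, nhst_errors / trials)  # MBI rate well below NHST's ~5%
```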
Following publication of Aschwanden's news item, we sent her the following message.

…You and Sainani are doing your best to prevent publication of
small-scale studies in sport science and any other disciplines, and that is
really, really bad. Why? For two reasons. First, MBI-published effects from small-scale studies could contribute
to a meta-analysis and thereby push the overall sample size up for that
effect to something that gives definitive outcomes. The effects from
MBI-published studies are NOT biased, so the meta-analysis will not be
biased. MBI does not result in shoddy effects, Christie. It does not
contaminate the literature. And provided the published small studies are not
substantially biased–to repeat again, they aren't, with MBI–it's actually
better to meta-analyze a large number of small studies than a few large
studies, because you get better estimates of the modifying effects of study
and subject characteristics and thereby better generalizability to more
settings. Second, and equally importantly, you are making it harder for
research students to get publications, because they will need larger sample
sizes to get significance, often impractically large, if we are talking about
studies of competitive athletes. There will be many other disciplines where
there is a similar problem with getting enough subjects. You probably have no
idea how dispiriting it is to be a young researcher and get your manuscripts
rejected… We have a climate of manuscript rejection rather than one of
manuscript acceptance, engendered mainly by journals seeking ever higher
impact factors. They take pride in a high rejection rate! MBI is a big step
in the direction of getting more of the students' stuff into print. And it
does not result in publication bias, for the last time. The incredible irony of all this is that, when they do occasionally
get statistical significance and manage to get their studies into print, the
result is publication bias! If you don't understand that, you need to, ASAP,
and then start setting the record straight by making your next quest the
rehabilitation of magnitude-based inference.

We also sent the
above message to the editor of MSSE. Neither he nor Aschwanden has replied so
far.

To end on a positive
note, we received the following message just before publishing the present
comment, echoing Martin Buchheit's
cri de coeur…

I am a sport physiologist across the pond in Canada. I wanted to
quickly reach out and thank you for all your work and resilience with MBIs to
support us scientists, practitioners and educators. You have provided a more
valid option for statistics in sport science and most importantly its
application in the real world. Having recently completed my PhD, MBIs were a
cornerstone for my work examining high performance athletes and utilizing its
findings to support coaches and the integrative support team.

Realizing the importance of MBIs is much like Neo deciding to swallow
the RED pill in the movie The Matrix. When making the decision to learn and
understand MBIs, one can never revert to "conventional" methods
unless forced to! You simply know better.

As a young and developing scientist, I believe it is our job to
support fellow researchers. Please continue to stand tall with confidence.
Time will allow the dust to settle, and your work will be left standing for
everyone to see.

Borg DN, Minett GM, Stewart IB, Drovandi CC (2018). Bayesian methods might solve the problems with magnitude-based inference. A letter in response to Dr. Sainani. Medicine and Science in Sports and Exercise (in press), https://eprints.qut.edu.au/119403/

Burton PR (1994). Helping doctors to draw appropriate inferences from the analysis of medical studies. Statistics in Medicine 13, 1699-1713

Burton PR, Gurrin LC, Campbell MJ (1998). Clinical significance not statistical significance: a simple Bayesian alternative to p values. Journal of Epidemiology and Community Health 52, 318-323

Gurrin LC, Kurinczuk JJ, Burton PR (2000). Bayesian statistics in medical research: an intuitive alternative to conventional data analysis. Journal of Evaluation in Clinical Practice 6, 193-204

Hopkins WG, Marshall SW, Batterham AM, Hanin J (2009). Progressive statistics for studies in sports medicine and exercise science. Medicine and Science in Sports and Exercise 41, 3-12

Hopkins WG, Batterham AM (2016). Error rates, decisive outcomes and publication bias with several inferential methods. Sports Medicine 46, 1563-1573

Hopkins WG, Batterham AM (2018). The vindication of magnitude-based inference. Sportscience 22, 19-27

Little R (2018). Calibrated Bayesian inference: a comment on The Vindication of Magnitude-Based Inference. Sportscience 22, sportsci.org/2018/CommentsOnMBI/rjl.htm

Mastrandrea MD, Field CB, Stocker TF, Edenhofer O, Ebi KL, Frame DJ, Held H, Kriegler E, Mach KJ, Matschoss PR, Plattner G-K, Yohe GW, Zwiers FW (2010). Guidance note for lead authors of the IPCC Fifth Assessment Report on consistent treatment of uncertainties. Intergovernmental Panel on Climate Change (IPCC): https://www.ipcc.ch/pdf/supporting-material/uncertainty-guidance-note.pdf

Mengersen KL, Drovandi CC, Robert CP, Pyne DB, Gore CJ (2016). Bayesian estimation of small effects in exercise and sports science. PLoS ONE 11, e0147311

Sainani KL (2018). The problem with "magnitude-based inference". Medicine and Science in Sports and Exercise (in press)

Shakespeare TP, Gebski VJ, Veness MJ, Simes J (2001). Improving interpretation of clinical studies by use of confidence levels, clinical significance curves, and risk-benefit contours. Lancet 357, 1349-1353

Shakespeare TP, Gebski V, Tang J, Lim K, Lu JJ, Zhang X, Jiang G (2008). Influence of the way results are presented on research interpretation and medical decision making: the PRIMER Collaboration randomized studies. Medical Decision Making 28, 127-137

Welsh AH, Knight EJ (2015). "Magnitude-based inference": a statistical review. Medicine and Science in Sports and Exercise 47, 874-884
Published 15 July 2018.