SPORTSCIENCE · sportsci.org


Calibrated Bayesian Inference: a Comment on The Vindication of Magnitude-Based Inference

Roderick J Little

Sportscience 22, sportsci.org/2018/CommentsOnMBI/rjl.htm, 2018
Department of Statistics, University of Michigan, Ann Arbor, Michigan. rlittle@umich.edu 

Summary: Direct probability statements about the sizes of effects require Bayesian methods. The Hopkins and Batterham approach appears to be a special case of “calibrated Bayes” inference, which seeks Bayesian inferences with “dispersed” priors that yield posterior credibility intervals with good frequentist properties. I think that in many settings calibrated Bayes is a good basis for inference. But we should not bury the prior distribution, which should be declared and subject to criticism, along with other aspects of the statistical model.

The ASA Statement on P-Values describes limitations of hypothesis testing that are broadly acknowledged by many statisticians (Wasserstein and Lazar, 2016). Confidence intervals are an improvement, since they focus on estimated sizes of effects with associated estimates of uncertainty. However, it is not possible to make direct statements about the "chance that the true effect is large" without being Bayesian, and therefore invoking a prior probability distribution for parameters. My impression of magnitude-based inference (MBI) is that it basically computes the posterior distribution under a "dispersed" uniform prior, and then computes posterior probabilities of attaining various sizes of effects. I think this use of “dispersed priors” that do not inject strong prior information is often a reasonable approach. However, it has a long history, and I don't think it requires a new name.
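To make that calculation concrete, here is a minimal Python sketch of the flat-prior computation described above: with a normal likelihood and a dispersed uniform prior, the posterior for the true effect is normal, centered at the observed estimate with the standard error as its spread. The effect estimate, standard error, and smallest-important-effect threshold are illustrative values, not drawn from any MBI publication.

# Posterior probabilities of effect magnitudes under a flat prior.
# All numerical settings below are illustrative assumptions.
from scipy.stats import norm

estimate = 0.45            # observed effect (e.g., a standardized difference)
se = 0.20                  # standard error of the estimate
smallest_important = 0.20  # threshold separating trivial from substantial

# Under a flat prior and normal likelihood, the posterior is normal:
posterior = norm(loc=estimate, scale=se)

p_substantial_positive = 1 - posterior.cdf(smallest_important)
p_trivial = posterior.cdf(smallest_important) - posterior.cdf(-smallest_important)
p_substantial_negative = posterior.cdf(-smallest_important)

print(f"P(effect > +{smallest_important}) = {p_substantial_positive:.3f}")
print(f"P(|effect| < {smallest_important}) = {p_trivial:.3f}")
print(f"P(effect < -{smallest_important}) = {p_substantial_negative:.3f}")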

In statistics, this approach is often called “objective Bayes”. The term is somewhat problematic, since there is no prior distribution that is completely “objective”, a criticism dating back to Fisher (1922). I prefer the term “calibrated Bayes”, an approach that seeks priors that lead to posterior distributions with good frequentist properties – for example, 95% posterior credibility intervals should be well “calibrated”, in the sense of having close to 95% confidence coverage in repeated sampling. Two excellent papers, by Box (1980) and Rubin (1984), capture the essence of this approach. A relatively non-technical discussion is Little (2006).
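As a concrete illustration of what "calibrated" means here, the following sketch simulates repeated sampling from a normal model and checks the frequentist coverage of the flat-prior 95% posterior credibility interval, which in this model approximately coincides with the usual confidence interval. The true effect, standard deviation, sample size, and replication count are assumed values chosen only for illustration.

# Checking calibration by simulation: under a normal model, the flat-prior
# 95% credibility interval is (approximately) estimate +/- 1.96 standard
# errors, the same as the usual confidence interval, so its coverage in
# repeated sampling should be close to 95%. All settings are illustrative.
import numpy as np

rng = np.random.default_rng(0)
true_effect, sigma, n, reps = 0.3, 1.0, 100, 10_000

covered = 0
for _ in range(reps):
    sample = rng.normal(true_effect, sigma, size=n)
    est = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(n)
    lo, hi = est - 1.96 * se, est + 1.96 * se  # flat-prior credibility interval
    covered += (lo <= true_effect <= hi)

print(f"coverage of the 95% credibility interval: {covered / reps:.3f}")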

Concerns about subjectivity have led people to try to finesse the need to formulate a prior distribution. A famous example is Fisher’s mysterious “fiducial” inference, which seems to work only in special cases. However, I think prior distributions play an important role in inference, as illustrated by the well-known "screening paradox" for a rare disease: even a positive result from an accurate test leaves the probability of disease low when the disease is rare, because the low prior prevalence dominates the evidence from the test. Prior distributions need to be out in the open and subject to criticism, as for other features of a statistical model.
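For readers unfamiliar with the paradox, the following sketch works it through as a direct application of Bayes' rule; the prevalence, sensitivity, and specificity are illustrative numbers, not taken from any particular screening program.

# The "screening paradox" as a Bayes'-rule calculation with assumed numbers:
# even a 99%-accurate test for a 1-in-1000 disease yields a posterior
# probability of disease of only about 9% after a positive result.
prevalence = 0.001   # prior: 1 in 1000 has the disease
sensitivity = 0.99   # P(positive | disease)
specificity = 0.99   # P(negative | no disease)

p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
p_disease_given_positive = sensitivity * prevalence / p_positive

print(f"P(disease | positive test) = {p_disease_given_positive:.3f}")  # about 0.090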

Box GEP (1980). Sampling and Bayes' inference in scientific modelling and robustness. Journal of the Royal Statistical Society Series A 143, 383-430

Fisher RA (1922). On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London Series A 222, 309-368

Little RJA (2006). Calibrated Bayes: a Bayes/frequentist roadmap. The American Statistician 60, 213-223

Rubin DB (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. Annals of Statistics 12, 1151-1172

Wasserstein RL, Lazar NA (2016). The ASA's statement on p-values: context, process, and purpose. The American Statistician 70, 129-133


First published 3 June 2018.

©2018