A Spreadsheet for Monitoring an Individual's Changes and Trend Will G Hopkins^{} Sportscience 21, 59, 2017 (sportsci.org/2017/wghtrend.htm) Update Feb 2018. Martin Buchheit, the reviewer of this article,
has published a very useful summary of methods (Buchheit, 2018) for defining smallest important changes in the new postpublication
peerreviewed journal Sport Performance & Science Reports. In response to his article, I have now
included instructions in the spreadsheet and below on monitoring for changes that are "unusual" (very
unlikely) for an individual, when you don't have a welldefined smallest
important change. I have now also provided a panel of cells to estimate the chances
of seeing the various outcomes (unclear,
possible increase, very likely increase, possible decrease, and very likely decrease) for a given true
change. (See these slides
for a partial explanation of the derivation of the chances–for experts
only.) These cells offer a more direct
way than the simulation spreadsheet to see how often you will make the wrong
conclusion about changes. See the comments associated with these cells and
with the cell for setting the meaning of very
unlikely. 

Appendix 1: Smallest Important
Changes Performance tests for athletes competing
for a best competition score Tests and performance indicators with uncertain
relationships to performance Measures unrelated to performance. IntroductionThis article and the accompanying spreadsheet represent a long overdue resource for practitioners in the disciplines of exercise and sport science who want to monitor the progress of their individual athletes, patients, clients or teams in a reasonably rigorous quantitative fashion. In a previous article/slideshow and spreadsheet I focused on quantitative assessment of change since the previous test or measurement, but there has always been a need for quantitative assessment of trend over multiple tests to determine the extent to which the individual is on track for improvement or recovery. Equally, there is a need for assessment of acute deviations from the trend to determine the likelihood of benefit or harm from shortterm changes in training or other strategies. This new spreadsheet meets all these needs. The previous spreadsheet included a panel for inserting a single measurement to estimate the chances that the individual's true value falls either side of a reference value or threshold, such as a critical blood concentration of vitamin D. The present spreadsheet cannot be used for such calculations, but I will produce a similar upgraded resource, if there is enough demand for it. The new spreadsheet can be used to track an individual team's scores, but if the data for the team come from the sum of the scores of individual athletes, you should consider analyzing the athletes' data as a time series with either the postonly crossover spreadsheet or mixed modeling. You can then perform the usual samplebased inferential analyses, which are potentially publishable, if you have got some interesting results with the team. The new spreadsheet comes as a workbook with a second spreadsheet for simulating realistic test scores and thereby investigating error rates with known true values for starting score, trend, changes on chosen time points, and typical error of measurement. Instructions on their use are included at the top of both spreadsheets and are supplemented with comments in some cells. Both spreadsheets require input of values for the smallest important change and for a target trend. Input of a value for the typical (standard) error of measurement is also desirable. There are numerous articles at this site and elsewhere about the meaning and estimation of these values, which at the request of the reviewer I have summarized in appendices. What follows is a brief description of how the inferences are derived for the three kinds of monitoring: change between measurements, deviation of one measurement or the average of several measurements from the trend, and magnitude of the trend. The operation of the spreadsheet for simulating data is also described. Change between MeasurementsThe probability that the true change is a substantial increase is calculated by combining the observed change, the smallest important change, and the shortterm typical error (with its degrees of freedom) into the relevant value of the t statistic. The probability of a substantial decrease is calculated in a similar manner. The probability of a trivial change is the difference between 100% and the sum of the percent probabilities of a substantial increase and decrease. To the extent that measurements on an individual represent a sample of the individual's true values, it seems reasonable to apply the rules of magnitudebased inference to make conclusions about the individual's true changes. I have chosen the nonclinical version of magnitudebased inference, according to which a change is unclear if the true change could be both a substantial increase and a substantial decrease. I have set the default value for could to 10%, rather than the usual 5%, which in simulations (described below) does not produce enough clear outcomes for real changes with noisy measures (typical error ³ smallest important change). Unclear changes are indicated with a "?". Clear changes are indicated with «, , and/or ¯ for trivial changes, substantial increases and/or substantial decreases respectively, and very likely changes (those with chances >90%) are indicated with an asterisk (*). Inserting a value for the shortterm typical error is optional. If you do not have a value, the spreadsheet uses the typical error of the estimate from the trend analysis, which will generally overestimate the error of measurement (if the individual made sporadic real changes off the trend) and thereby make it harder to get clear outcomes. Also, you need enough test scores (at least 10) on the individual to get even modest precision for the typical error of the estimate. Deviation from the TrendA scatterplot of the data allows for easy identification of the scores to include for calculating the trend line. The plot shows the regression line flanked by dashed lines defining a zone of trivial changes plus uncertainty in predicted scores. When a score falls outside the zone defined in this manner, the true difference from the predicted score is approximately likely (>75% chance) to be substantial. A trend line is fitted to chosen assessments using the Excel functions for simple linear regression, then the difference between the score predicted by the linear trend and the other test scores is analyzed in the same manner as for changes between measurements. (Formulae for the predicted scores and their uncertainty were copied from the spreadsheet for validity.) When measures are noisy, averaging repeated measurements is the best strategy to reducing the noise. I have therefore included an option for averaging a cluster of values to compare with the values predicted by the trend line. Magnitude of the TrendIf you insert a value for a change that the individual wants or needs to show over an extended period (which you also have to insert; e.g., 10 weeks, a season, or a year), the spreadsheet divides the change by the period to give a target trend. This trend is then treated as a smallest important trend, and the spreadsheet combines this value with the slope of the regression line and its uncertainty to give probabilities that the true trend is greater than the positive trend, less than the negative trend, and trivial (falls between ± this trend). Magnitudebased inferences for the trend are provided in the same manner as for changes. (The formula for the uncertainty in the slope was copied from the spreadsheet for validity.) Simulating Test ScoresYou will be a better practitioner if you have some idea of the errors you can make when assessing changes and trend in an individual. I have therefore gone to considerable trouble to make a version of the spreadsheet that simulates and analyses test scores similar to yours, so you can see directly how often errors will occur. To generate the score at a given time point, the spreadsheet combines a true starting score with a true trend multiplied by the elapsed time, adds any extra true change at one or more time points, then adds a value of the true typical error multiplied by NORMSINV(RAND()), which simulates the effect of normally distributed random measurement error. In this way the spreadsheet generates a sample of the kind of scores you will get with tests on your individual, if you have inserted realistic values of true typical error, true trend, true extra changes, and of course a true starting score. The spreadsheet then performs inferences on changes and trends in the same manner as for the other spreadsheet, using values for the smallest important change and target trend. In the absence of an estimate of the typical error from a reliability study, a value can come from an initial analysis of the individual with the first spreadsheet. Find the cell showing the typical error of the estimate from the fitting of the trend line (Cell E81), and use that as the typical error in the simulation sheet. You can also use a typical error obtained by combining the typical errors of the estimate from several individuals, by adding their squares and taking the square root. To use the simulation spreadsheet, start with a zero true trend and zero extra changes, so you can see how often you get Type1 (falsepositive) errors: that is, how often the inference shows a nonexistent substantial trend and nonexistent substantial changes. You should play with the number of time points included in the trend analysis, to get some idea of how many you will need with your real data. Then insert a realistic true trend to see how often you get falsenegative (Type2) errors: how often the inference fails to show a substantial trend. Similarly insert substantial changes at individual time points to see how often the inferences fail to show changes from the linear trend and changes from the previous test. The Type1 error rate for changes from the previous test depends on the relative magnitudes of typical error and smallest important change. The Type1 error rate for change from the trend depends additionally on the number of time points included in the trend, which along with the smallest important trend also contributes to the Type1 error rate for the trend itself. All these factors contribute to the Type2 error rates, along with the magnitude of the true changes and trend. For noisy measures you will have to learn to live with high error rates. It is inevitable that the two new spreadsheets have bugs, hopefully only minor formatting errors or ambiguous instructions. Please get back to me by email if you encounter problems or especially if you have any suggestions for improvement. References
Buchheit M (2018). Magnitudes
matter more than beetroot Juice. Sport Performance & Science Reports 1, https://sportperfsci.com/magnitudesmattermorethanbeetrootjuice/ Published June 2017 Appendix 1: Smallest Important ChangesFor a description of smallest important effects for all types of outcomes, see Linear Models and Effect Magnitudes (Hopkins, 2010). See also a slideshow focusing on medalwinning enhancements presented at the performance analysis conference in 2016. For the following summary I have also drawn on Martin Buchheit's recent article on monitoring athletes in the Aspetar Journal (now published here). Performance tests for athletes competing for a best competition scoreFor an athlete who is already winning medals
regularly, the smallest improvement in performance results in an extra medal
every 10 competitions. Assuming little interaction between the athletes in
the competition, the change in performance score (time, distance, weight
lifted, or judges' ratings) producing the extra medal is 0.3 of the
variability that top athletes show from competition to competition. Expressed
as a coefficient of variation, the variability ranges from ~0.5% to ~3.0% for
top athletes, depending on the sport. You have to take into account a kinetic
factor to convert 0.3 of this variability in competition time or distance
into the smallest important change in test performance, if the test measure
is power output. The test protocol also has to be taken into account, if the
test measure is time in an incremental or constantpower test to exhaustion
and/or there is a fatiguing preload. See Malcata and Hopkins (2014) for a comprehensive review. Subelite athletes generally show more
variability in competition than elites do (perhaps by a factor of 1.11.4),
so the smallest important change for winning medals in local or agegroup
competitions is correspondingly larger. But a target trend may be more
important than winning medals for these athletes. See Appendix 2. Tests and performance indicators with uncertain relationships to performanceWhen you do not know exactly how much change
in a test or performance indicator is associated with winning an extra match
or medal every 10 competitions, standardization is the only way to define
smallest important changes. You will need an estimate of the betweensubject
standard deviation for athletes on a similar level to your athlete, and in
the same playing position for teamsport athletes. (You may pool playing
positions to get enough players for a reasonably accurate estimate, if you
are satisfied that mean test scores are unlikely to differ substantially
between the playing positions.) Onefifth (0.2) of this standard deviation is
then the smallest important change. Measures unrelated to performanceA test unrelated to performance might still
be worth administering if it is related to health. In particular, the
smallest important change in a test related to risk of injury or illness can
be defined as the change or difference associated with the smallest important
reduction in risk in a relevant epidemiological study: a risk (hazard) ratio
of 0.90. In the absence of information on risk,
standardization is the only approach. Strictly speaking, the betweensubject
standard deviation for a health indicator should be "pure" or free
of shortterm typical error of measurement (in contrast to that for measures
of performance, which for teamsport athletes should be the observed
betweensubject standard deviation). The pure standard deviation is Ö(observed SD^{2} – typical error^{2}) and requires a
shortterm reliability study on subjects similar to your individual for
estimation of the SD and error. Measures where standardization is not appropriateUnfortunately, some physiological measures
have real differences between individuals that are only a fraction of the
typical error (withinsubject variation); that is, any observed differences
are due almost entirely to daytoday variation. The variation itself may
differ substantially between individuals. Standardization fails for such
measures. Martin Buchheit (2018) has suggested using a fraction or
multiple of the typical error as the
smallest important change for such measures, in the sense that any observed
change greater than this amount can be regarded as a change not due simply to
error of measurement. This approach amounts to identifying when a change is
"unusual" for the individual. With the current spreadsheet, you can
identify such changes by specifying what you mean by unusual with the cell that defines the meaning of very unlikely. The default of 10% is
probably about right for most situations; it corresponds to accepting a
single change score as a real change when the change is greater than ~2.0
typical errors (t´Ö2´e, where t is
the value of the t statistic for a cumulative probability of 0.9), as shown
in one of the cells. You then set the smallest important change to zero
(actually a very small number, such as 0.001) rather than specifying a
fraction or multiple of the typical error. All unusual changes are now marked
with either * or ¯*, and all other changes are unclear (marked with ?). Keep in mind
that, if the real change is zero, you will see unusual increases 10% of the
time and unusual decreases 10% of the time. Finally, the above advice is based on the
assumption that the test measure could have a useful relationship to
performance or health. If you have a strong suspicion that the measure has no
such relationship, stop measuring it and put your resources into measuring
something that is more likely to have a relationship. Appendix 2: A Target TrendFor an individual who wants or needs to improve over a period longer than simply the time to the next test, you can decide on a realistic change over a chosen period. The rate of change (the change divided the period) is then the target trend. For a development athlete aiming (eventually) for Olympic selection, work out the change needed over the next season or year to keep the athlete on target. For subelite teamsport athletes, the difference between the athlete's test or performance indicator and that of typical athletes in an elite team is a reasonable target; you then have to decide on a reasonable period over which you hope to gain that improvement. Similar reasoning would apply to measures of muscle mass, body fat, or health indices such as blood lipids or vitamin D. The target trend evaluated over the period between consecutive tests is another approach to estimating the smallest important change. For example, if the target improvement for the coming year is 4%, then with three months between tests, the smallest important change of 1% will keep the athlete on track for the target. Appendix 3: Typical ErrorThe optional shortterm typical or standard error of measurement represents the standard deviation of an individual's repeated measurements in the absence of any real change in the individual. As such, the error can come from a shortterm reliability study of individuals similar to your individual. Alternatively, if you have at least 10 or preferably more consecutive repeated tests on the individual, and there is no reason why there should be substantial changes between the tests other than a gradual trend, the standard error of measurement derived from fitting the trend line is a good estimate of the typical error. Hence, if you do not include a value for the typical error, the spreadsheet will use the standard error of the estimate. You may use the standard error of the estimate from another individual or individuals when you are starting tests on a new individual. You may also average standard errors of the estimate, but do it by averaging their squares and taking the square root. To be more accurate, weight the squares by their degrees of freedom (number of points in their trend line minus 2). The degrees of freedom of the mean is the sum the degrees of freedom. If the performance test and score are similar to those of the competition (e.g., a timetrial time for a track athlete, a throwing or jumping distance for a field athlete, an ergometer timetrial time for cyclist, rower or kayaker), and if you are satisfied that the athlete performs the test as reliably as a competition, then the typical error of the test is the published variability of top athletes in that sport, as stated in the Malcata and Hopkins (2014) review and in my recent slideshow. Note that in this case the smallest important change is automatically 0.3´ the typical error of the test. You should consider using the smallest important change as calculated in this manner, but make the typical error somewhat larger than the variability in competitions, by a factor of perhaps ~1.3´. Or you can use the value from a reliability study, or if that's not available, let the spreadsheet use the typical error of the estimate from the trend analysis. If the test scores are counts and proportions of counts (e.g., most performance indicators from the analysis of players' actions in games), the typical error depends on the individual's mean count or proportion, but the estimate of typical error from a reliability study will be an average across all the subjects in the study. It is possible to estimate approximate typical errors for the counts and proportions of the individual by assuming they have Poisson and binomial sampling distributions respectively, but then there is a potential problem of nonrandomness of the counts, which usually produces a larger typical error ("overdispersion"). I suggest using the estimate of typical error based on the sampling distribution^{a} until you have enough data to let the spreadsheet use the typical error of the estimate from the trend analysis. –––––– ^{a}Count:Ön, where n is the number of events. Proportion (%): (100/m)Ö[n(mn)/m], where n and m are the numbers of events and trials. 
