An unbiased method of assessing the influence of baseline on change from baseline

Consider a baseline score x and a later score, y, on the same test (e.g. after a new treatment). The correlation, r(x,x-y), between the baseline score and the difference score x-y (i.e. correlating baseline score with change from baseline) is biased to be considerably greater than zero. Tu and Gilthorpe (2007), among others, illustrate that for two independent random variables x and y with the same variance, the correlation between x and the difference x-y is not zero, as one might expect, but, for a large enough sample size, a whopping 0.71 (the reciprocal of root 2). Indeed they show that the x,x-y correlation will always fall between 0 and 0.71 if x and y have the same variance.
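This bias is easy to reproduce by simulation. The sketch below (a Python/numpy illustration of our own, with an arbitrary seed and sample sizes) draws independent pairs of equal-variance variables and shows the average x,x-y correlation settling near 1/root 2 rather than zero:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility
n, reps = 1000, 200
corrs = []
for _ in range(reps):
    x = rng.normal(0, 1, n)  # baseline
    y = rng.normal(0, 1, n)  # retest: independent of x, same variance
    corrs.append(np.corrcoef(x, x - y)[0, 1])

# despite x and y being independent, r(x, x-y) sits near 1/sqrt(2), about 0.71
print(round(float(np.mean(corrs)), 2))
```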

This positive bias between x and x-y arises because y appears in only one of the two terms being correlated, so an unwanted covariance between the baseline x and the retest y is added into the correlation, since x*(x-y) = x*x - x*y. This biasing, due to x appearing in both terms being correlated, is a problem known as mathematical coupling. It follows that the correlation between x and y-x is biased to be negative, which intuitively says that smaller baseline scores tend to show larger increases.

There is also a problem caused by measurement error, which may 'throw' scores higher or lower than their true value, meaning that by chance the next sampled scores could be correspondingly lower or higher. This process, in which scores made falsely extreme by measurement error are followed by less extreme scores, is an illustration of regression to the mean, and it can produce false differences in score variances purely through errors of measurement. In practice it is not always easy to gauge the amount, if any, of measurement error, so solutions such as those mentioned below usually assume equality of measurement error in x and y.

Most authors suggest comparing the variances of x and y, since the variances at later times would be expected to be smaller, as scores 'bunch up', if there is a relationship between x and x-y. A difference between the variances of x and y therefore indicates a relationship between baseline score and change, and the variances do not suffer from the bias problem associated with the correlation between x and x-y. Myrtek and Foerster (1986) and Maloney and Rastogi (1970) propose a t-test that compares the variances of x and y; the two differ only in that the former assumes the baseline variance is greater than the retest variance (a one-tailed test). The t-test takes measurement errors into account by assuming they are equal in x and y.

Tu and Gilthorpe (2007) show, rather surprisingly, that comparing the x and y variances is equivalent to testing the size of the Pearson correlation between x-y and x+y. They therefore recommend using r(x-y,x+y), rather than r(x,x-y), to test whether baseline influences change score. This rather curious result stems from the fact that y appears in both terms being correlated, with opposite signs, so the problematic biasing covariance term present in r(x,x-y) now cancels out: (x-y)(x+y) = x*x - x*y + x*y - y*y = x*x - y*y. The covariance of x-y with x+y thus equals the difference between the x and y variances, so the r(x-y,x+y) correlation is zero when the variances are equal and the bias present in r(x,x-y) is removed.
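The cancellation can be checked numerically. In this sketch (numpy, with made-up data and an arbitrary seed), the sample covariance of x-y with x+y comes out equal to the difference of the sample variances, which is why r(x-y,x+y) amounts to a test of the variance difference:

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
x = rng.normal(5, 2, 500)  # baseline
y = rng.normal(7, 2, 500)  # retest: independent, equal variance

# the cross-terms x*y cancel, so cov(x-y, x+y) = var(x) - var(y) exactly
lhs = np.cov(x - y, x + y)[0, 1]
rhs = np.var(x, ddof=1) - np.var(y, ddof=1)
print(bool(np.isclose(lhs, rhs)))  # True
```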

The t-test can be performed by first computing x-y and x+y and then using the correlation or regression procedures in SPSS (or other software). Alternatively you can simply enter the (baseline) x and (retest) y values (up to 200 pairs) into this spreadsheet (rxx-r.xls), which computes the correlation testing a difference in x and y variances and the p-value for the t-test of this difference. The t-statistic of Myrtek and Foerster (1986) is also given (see Jin (1992) for the correct formula for this method). Please note that prior to 8th January 2019 there was an error in this spreadsheet which gave an erroneous test of the correlation; this has now been corrected. There is also an R function, listed here, written by Leon Reteig of the Department of Psychology, University of Amsterdam, which also computes Jin's corrected correlation and its test and agrees with the spreadsheet.
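For readers without SPSS, the calculation can be sketched in a few lines of Python. This is our own minimal illustration, not the spreadsheet's code: it uses the usual t-test of a Pearson correlation with n-2 degrees of freedom, and the function name and data are invented for the example.

```python
import numpy as np

def variance_change_t(x, y):
    """Hypothetical helper: r(x-y, x+y) and its t-statistic, testing whether
    the x and y variances differ (and hence whether baseline relates to
    change). Compare t against a t-distribution with n-2 degrees of freedom."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    r = np.corrcoef(x - y, x + y)[0, 1]
    t = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
    return r, t

rng = np.random.default_rng(2)
x = rng.normal(5, 2, 200)            # baseline
y = 0.5 * x + rng.normal(0, 1, 200)  # retest with genuinely smaller variance
r, t = variance_change_t(x, y)
print(r > 0, t > 2)  # variance shrinkage is detected
```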

Jin (1992) states that the Myrtek and Foerster test should only be used when both (a) the correlation between initial and retest values and (b) the difference between retest and baseline scores are non-zero.

As an example we generated 10 cases at random, each consisting of two variables x and y independently sampled from Normal distributions N(5,7) and N(7,7) respectively. We would expect no relation between baseline and change score, since the true x and y variances are equal. The correlation between x and x-y is, however, a biased 0.60 (p=0.06), whereas the x-y,x+y correlation, which removes the bias, is only 0.16 (p=0.66), as expected.

Other authors take another tack. For example, Tu et al. (2005) proposed a test of the x,x-y correlation that takes the positive bias into account by comparing r(x,x-y) to a non-zero positive value under the null hypothesis. The problem with this method is that it requires the baseline and retest score variances to be equal, which will not be the case in general.
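A sketch of this approach (our own implementation of the idea, not Tu et al.'s code): when x and y have equal variances and baseline is unrelated to change, r(x,x-y) equals the square root of 0.5(1-r), where r is the x,y correlation, so the observed correlation can be compared with that null value after Fisher-transforming both. The function name and the seed are ours.

```python
import numpy as np

def tu_style_test(x, y):
    """Sketch of the Tu et al. (2005) idea: compare the observed r(x, x-y)
    with its null value sqrt(0.5*(1 - r(x,y))), the value r(x, x-y) takes
    when x and y have equal variances and baseline is unrelated to change.
    Both correlations are Fisher-transformed (arctanh) before comparison."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    r_obs = np.corrcoef(x, x - y)[0, 1]
    r_null = np.sqrt(0.5 * (1 - np.corrcoef(x, y)[0, 1]))
    # approximate z-statistic; Fisher's z of a correlation has SE 1/sqrt(n-3)
    z = (np.arctanh(r_obs) - np.arctanh(r_null)) * np.sqrt(n - 3)
    return r_obs, r_null, z

rng = np.random.default_rng(3)
x = rng.normal(0, 1, 500)
y = rng.normal(0, 1, 500)  # independent, equal variances: expect z near 0
r_obs, r_null, z = tu_style_test(x, y)
print(round(float(r_obs), 2), round(float(r_null), 2))
```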

References

Jin, P. (1992). Toward a reconceptualization of the law of initial value. Psychological Bulletin 111(1), 176-184. A free on-line pdf copy (not from this journal) is here (jin.pdf).

Maloney, C. J. and Rastogi, S. C. (1970). Significance test for Grubb's estimators. Biometrics 26, 671-676.

Myrtek, M. and Foerster, F. (1986). The law of initial value: a rare exception. Biological Psychology 22, 227-237. NOTE: THE FORMULA GIVEN IN THIS PAPER IS INCORRECT (see Jin, 1992, for the correct formula).

Tu, Y-K., Baelum, V. and Gilthorpe, M. S. (2005). The relationship between baseline value and its change: problems in categorisation and the proposal of a new method. European Journal of Oral Sciences 113, 279-288.

Tu, Y-K. and Gilthorpe, M. S. (2007). Revisiting the relation between change and initial value: A review and evaluation. Statistics in Medicine 26, 443-457. A pdf copy is available at http://dionysus.psych.wisc.edu/lit/Topics/Statistics/RegressionToMean/tu_RegressionToTheMean_SiM2007.pdf or as the attachment rxxmy.pdf. This article gives an overview of methods testing the x,x-y correlation.

FAQ/rxxy_correction (last edited 2019-01-07 15:41:57 by PeterWatson)