R code for Jin's test of a correlation between baseline, X, and change from baseline, Y-X
(written by Leon Reteig of the Department of Psychology, University of Amsterdam, January 2019)
corr_change_baseline <- function(X,Y) {
# Performs a (two-tailed) test for equal variances in the two input vectors.
# This is one way to do an unbiased assesment of the correlation between baseline and change from baseline (retest - baseline).
# In this case, X would be the baseline, and Y the retest.
#
# Returns a list of 4 values, following Maloney & Rastogi (1970)
# -r: correlation of X-Y vs. X+Y. Assessing significance of this correlation is equivalent to a variance test
# -t: t-value constructed from the r-value
# -df: degrees of freedom for "r" and "t"
# -p: p-value for the variance test. If significant, this means the two vectors do not have equal variances,
# suggesting presence of a correlation between baseline and change from baseline.
#
# This implementation was inspired by and checked against an Excel spreadsheet, available at:
# http://imaging.mrc-cbu.cam.ac.uk/statswiki/FAQ/rxxy_correction
# This page is a great resource for further background on the issue of variance tests / correlations
#
# The relevant methods are contained in the following papers:
#
# Jin, P. (1992). Toward a reconceptualization of the law of initial value. Psychological Bulletin 111(1) 176-184.
# Maloney, C. J. and Rastogi, S. C. (1970). Significance test for Grubb's estimators. Biometrics 26 671-676.
# Myrtek, M. and Foerster, F. (1986). The law of initial value: a rare exception. Biological Psychology 22 227-237
# Tu, Y-K. and Gilthorpe, M. S. (2007). Revisiting the relation between change and initial value: A review and evaluation.
# Statistics in Medicine 26 443-457
# This approach of testing the correlation correlation of X-Y and X+Y
# is equivalent to testing the variance of X vs. the variance of Y (Tu & Gilthorpe, 2007)
r_cor <- cor(X - Y, X + Y)
n <- length(X) # number of observations
df <- n - 2 # degrees of freedom for r and t
t_MR_1970 <- r_cor*sqrt(df) / sqrt(1 - r_cor^2) # t-value proposed by Maloney & Rastogi (1970); equation taken from Tu & Gilthorpe (2007)
p_MR_1970 <- 2*pt(-abs(t_MR_1970), df) # p-value from t-value (for a two-sided test)
return(list(r = r_cor, df = df, t = t_MR_1970, p = p_MR_1970))
# Myrtek & Foerster (1986) proposed a different test. The approach is equivalent to Maloney & Rastogi (1970) and yields the same p-value.
# It is only included here for the sake of completeness.
B_e <- ( var(Y) - var(X) + sqrt((var(Y) - var(X))^2 + 4*(cov(X,Y))^2) ) / (2*cov(X,Y)) # structural relation, from Jin (1992)
# N.B. this is a simplifaction of the equation in the paper, as "cor(X,Y) * sd(X) * sd(Y)"
# simply reduces to "cov(X,Y)". The formula as literally in Jin (1992):
# B_e_paper <- ( var(Y) - var(X) + sqrt((var(Y) - var(X))^2 + 4*(cor(X,Y) * sd(X) * sd(Y))^2) ) / (2*cor(X,Y) * sd(X) * sd(Y))
# This simplification is also used in the formula below.
# t-value proposed by Myrtek & Foerster, 1986; equation taken from from Jin (1992)
t_MF_1986 <- sqrt( df * sin(2*(atan(B_e) - atan(1)))^2 *
( (1/4) * (var(X) - var(Y))^2 + cov(X,Y)^2 ) /
( var(X) * var(Y) - cov(X,Y)^2) )
p_MF_1986 <- 2*pt(-abs(t_MF_1986), df) #p-value from t-value (for a two-sided test)
# return(list(df = df, t = t_MF_1986, p = p_MF_1986)) # the t- and p-values are identical to t_MR_1970 and p_MR_1970, so only return those
}Example using the above R function (given in spreadsheet on preceding page)
library(foreign) > dat2 <- data.frame(dat) > X <- dat2[,1] > Y <- dat2[,2] > X [1] 1.05842 2.43411 -0.51272 0.96959 4.00000 5.00000 2.00000 3.00000 4.00000 5.00000 6.00000 7.00000 3.74174 [14] -1.12423 1.67558 2.57155 2.81140 4.79701 -0.31238 3.12158 2.45195 2.07938 5.03972 1.72247 1.06231 -0.99752 [27] 2.66932 -4.12602 2.81685 2.97308 3.13575 0.46938 0.46938 > Y [1] 3.41853 0.87283 2.76419 5.36829 2.15316 3.47944 -0.66727 -0.07276 -0.06596 1.54776 2.52074 -0.52881 5.48375 [14] 0.59687 -0.56258 2.86513 2.47741 -0.01483 4.12124 1.63788 0.88078 4.90384 3.36825 2.04285 3.68688 5.04538 [27] 1.52683 -1.26871 0.83673 3.58103 1.03052 -1.08757 -1.08757
This yields the same answer as the spreadsheet
> corr_change_baseline(X,Y) $r [1] 0.1296376 $df [1] 31 $t [1] 0.7279344 $p [1] 0.4721166
