Differences between revisions 1 and 5 (spanning 4 versions)

How do I correlate change score with baseline?

To correlate change between two time points with baseline (at time 1, T1) we correlate the score at T1 with the difference in scores between time 2, T2, and T1. This is not the same as correlating T1 score with T2 score, or T1 score with the residuals from a regression of T2 score on T1 score.

To see this, consider the following example:

T1		T2
1		2
2		3
3		4

The change over time is the same (a difference of 1 unit) regardless of baseline, so it has no relationship with baseline score; strictly, the correlation between change and T1 score is undefined here because the change does not vary. The correlation between the scores at T1 and T2, on the other hand, is 1, because the score at T2 is exactly 1 higher than the score at T1 and so is perfectly predicted by T1 score.
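The three-row example can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `pearson` is introduced here for illustration and returns `None` when one variable is constant, since the correlation is then undefined.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns None if either variable has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # undefined: one of the variables never changes
    return sxy / sqrt(sxx * syy)

t1 = [1, 2, 3]
t2 = [2, 3, 4]
change = [b - a for a, b in zip(t1, t2)]  # T2 - T1 for each individual

r_t1_t2 = pearson(t1, t2)          # baseline against follow-up score
r_t1_change = pearson(t1, change)  # baseline against change score
print(r_t1_t2, r_t1_change)
```

Baseline correlates perfectly with the T2 score, while the change score is constant and so cannot correlate with anything.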

The difference score, T2 score - T1 score, is also different to the residual of a regression using T1 score to predict T2 score. To see this, let's consider the regression equation, which (with no intercept term) is

T2 score = B T1 score + random error

Here the regression coefficient, B, is approximately T2 score/T1 score, the expected ratio of an individual's score at time 2 to their score at time 1, and B T1 score is an estimate of what an individual should score at time 2 given their time 1 score. So the residual, T2 score - B T1 score, obtained from the regression of T2 score on T1 score, indicates whether an individual's T2 score is 'above' or 'below' the 'average' score expected from their T1 score, where that expectation is estimated using all (T1 score, T2 score) pairs.
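The relationship between the difference score and the residual can be written out explicitly. The following is the standard least-squares result for a regression through the origin, not something stated on this page; the subscript i (indexing individuals), the hat on B, and the symbols d and e are notation introduced here.

```latex
\hat{B} = \frac{\sum_i \mathrm{T1}_i \,\mathrm{T2}_i}{\sum_i \mathrm{T1}_i^{2}},
\qquad
e_i = \mathrm{T2}_i - \hat{B}\,\mathrm{T1}_i,
\qquad
d_i = \mathrm{T2}_i - \mathrm{T1}_i = e_i + (\hat{B} - 1)\,\mathrm{T1}_i
```

So the difference score and the residual coincide only when the estimated B equals 1; otherwise they differ by an amount proportional to the baseline score.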

The plot given [attachment:davyplot.pdf here] shows that the larger difference score, given in red (a score rising from around 11 at time 1 to 16 at time 2), has a negative residual, whereas the smaller difference score, given in blue (a score rising from around 5 at time 1 to 8 at time 2), has a positive residual. This is because, although the individual scoring 11 at time 1 increases by a larger amount (around 5 units) than the one scoring 5 at time 1 (who rises by about 3 units), the former goes up by less than would be expected assuming a constant overall increase in scores, represented by a constant T2/T1 ratio (in this case T2/T1 is close to 1.5).
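The arithmetic for the two highlighted individuals can be sketched as follows. The scores are the approximate values quoted above, and B is taken to be exactly 1.5 for illustration; a fitted slope would differ slightly.

```python
B = 1.5  # assumed slope, taken from the "T2/T1 is close to 1.5" description

# Approximate (T1, T2) scores for the two highlighted individuals
points = {"red": (11, 16), "blue": (5, 8)}

results = {}
for label, (t1, t2) in points.items():
    difference = t2 - t1     # change score
    expected_t2 = B * t1     # score predicted from baseline
    residual = t2 - expected_t2
    results[label] = (difference, residual)
    print(label, "difference:", difference, "residual:", residual)
```

The red individual has the larger difference (5 against 3) but falls below the expected score, while the blue individual lies above it.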

Aside: In fact the illustrative data in the plot were generated so that T2 = 1.5 T1 + Normal error(mean=0,variance=0.25). How closely the linear regression model fits is related to the size of the variance of the (normally distributed) random error, which is assumed constant across all scores. The larger the error variance, the worse the fit of the linear regression model will be.
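A simulation along the lines of the aside can be sketched as below. The generating model, T2 = 1.5 T1 + Normal error with variance 0.25 (standard deviation 0.5), is from the aside; the baseline scores 1 to 20, the seed, the noisier comparison case and the names `simulate` and `pearson` are assumptions made here, since the page does not say how the plotted T1 values were chosen.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def simulate(error_sd, seed=1):
    """Generate T2 = 1.5*T1 + Normal(0, error_sd^2) and fit T2 = B*T1."""
    rng = random.Random(seed)              # arbitrary seed, for reproducibility
    t1 = [float(v) for v in range(1, 21)]  # hypothetical baseline scores
    t2 = [1.5 * a + rng.gauss(0, error_sd) for a in t1]
    # Least-squares slope for the no-intercept model
    b_hat = sum(a * b for a, b in zip(t1, t2)) / sum(a * a for a in t1)
    change = [b - a for a, b in zip(t1, t2)]
    residual = [b - b_hat * a for a, b in zip(t1, t2)]
    return {
        "b_hat": b_hat,
        "r_change": pearson(t1, change),  # baseline against change score
        # By construction the residuals are orthogonal to T1 (up to rounding)
        "t1_dot_residual": sum(a * e for a, e in zip(t1, residual)),
        "rss": sum(e * e for e in residual),  # residual sum of squares
    }

small = simulate(error_sd=0.5)  # variance 0.25, as in the aside
large = simulate(error_sd=2.0)  # variance 4, a hypothetical noisier case
print(small)
print(large)
```

Under these assumptions the change score is strongly related to baseline because the slope is above 1, and the residual sum of squares grows with the error variance, which is the poorer fit described in the aside.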

Revision 1 as of 2010-05-19 10:11:04 (Size: 619, Editor: PeterWatson)
Revision 5 as of 2010-05-20 09:33:45 (Size: 2605, Editor: PeterWatson)