
Errors in variables when doing a regression

If there is measurement error in a predictor (x), the least-squares slope and intercept are biased: they do not converge to their true values as the sample size grows. In particular, if x-obs = x-true + u, where the measurement error u is independent of x-true and of the regression error, the estimated slope converges to slope × Variance(x-true) / (Variance(x-true) + Variance(u)). Because the measurement error variance is positive, this ratio is less than 1, so the slope is attenuated towards zero and its magnitude is underestimated.
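The attenuation can be checked with a small simulation. This is a sketch under assumed values that do not come from the source: x-true is normal with variance 4, the measurement error is normal with variance 1 (so the ratio above is 4/5 = 0.8), and the true slope is 0.5. The least-squares slope on the observed x should then be close to 0.5 × 0.8 = 0.4 rather than 0.5.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope of y on x: cov(x, y) / var(x)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


rng = random.Random(1)            # fixed seed so the run is reproducible
n = 100_000
true_slope, intercept = 0.5, 1.0  # assumed regression of y on x-true
var_true, var_error = 4.0, 1.0    # assumed variances of x-true and of the error

x_true = [rng.gauss(0.0, var_true ** 0.5) for _ in range(n)]
x_obs = [xt + rng.gauss(0.0, var_error ** 0.5) for xt in x_true]
y = [intercept + true_slope * xt + rng.gauss(0.0, 0.5) for xt in x_true]

ratio = var_true / (var_true + var_error)   # attenuation factor, 0.8 here
naive = ols_slope(x_obs, y)                 # slope estimated from x-obs

print(f"slope using x-true:      {ols_slope(x_true, y):.3f}")
print(f"slope using x-obs:       {naive:.3f}")
print(f"predicted limit (b*0.8): {true_slope * ratio:.3f}")
```

With a large sample the slope on x-obs sits near 0.4 while the slope on x-true sits near 0.5, which is the bias described above.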

Klauer, Draine and Greenwald (1998) present a method for estimating the slope and intercept, and their standard errors, adjusted for measurement error (Klauer KC, Draine SC and Greenwald AG (1998) An unbiased errors-in-variables approach to detecting unconscious cognition. British Journal of Mathematical and Statistical Psychology 51 253-267). The estimates can be obtained with a FORTRAN program used by the authors.

The R package leiv implements a Bayesian procedure that combines a Cauchy prior for the slope with a likelihood based on the standard deviations of the predictor x and the outcome y, and their correlation, in the data. This gives posterior distributions for the slope and intercept adjusted for measurement error, from which the procedure also produces median values and credible regions for the slope and intercept. The method is described in Leonard D (2011) Estimating a bivariate linear relationship. Bayesian Analysis 6(4) 727-754.

Special case

Goldstein (2015) gives formulae for correcting the slope and intercept in a simple regression of an outcome, y, on a single predictor, x. Let x-obs = x-true + measurement error, and let R be the reliability of x, defined as variance(x-true)/variance(x-obs). If

y = a* + b* x-obs + e* is the regression on the observed x, and

y = a + b x-true + e is the regression on the true value of x,

then, when R is known, the slope b and intercept a for the true value of x are

b = b*/R

a = ybar - b xbar

where ybar and xbar are the sample means of y and x-obs.
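A minimal sketch of the correction, assuming R is known. The helper name `correct_for_reliability` is hypothetical, and the simulated values (true intercept 1, true slope 0.5, x-true with mean 5 and variance 4, error variance 1, so R = 0.8) are illustrative assumptions rather than anything from Goldstein (2015). The corrected coefficients should land close to the true ones, while the uncorrected ones do not.

```python
import random
import statistics


def correct_for_reliability(b_obs, reliability, xbar, ybar):
    """Hypothetical helper: b = b*/R and a = ybar - b * xbar."""
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must lie in (0, 1]")
    b = b_obs / reliability
    a = ybar - b * xbar
    return a, b


rng = random.Random(2)          # fixed seed for reproducibility
n = 100_000
a_true, b_true = 1.0, 0.5       # assumed regression of y on x-true
var_true, var_error = 4.0, 1.0  # assumed variances of x-true and of the error

x_true = [rng.gauss(5.0, var_true ** 0.5) for _ in range(n)]
x_obs = [xt + rng.gauss(0.0, var_error ** 0.5) for xt in x_true]
y = [a_true + b_true * xt + rng.gauss(0.0, 0.5) for xt in x_true]

# Least-squares regression of y on the observed x gives b* and a*.
xbar, ybar = statistics.fmean(x_obs), statistics.fmean(y)
b_star = (sum((xo - xbar) * (yi - ybar) for xo, yi in zip(x_obs, y))
          / sum((xo - xbar) ** 2 for xo in x_obs))
a_star = ybar - b_star * xbar

R = var_true / (var_true + var_error)   # known reliability, 0.8 here
a_hat, b_hat = correct_for_reliability(b_star, R, xbar, ybar)

print(f"observed:  a* = {a_star:.3f}, b* = {b_star:.3f}")
print(f"corrected: a  = {a_hat:.3f}, b  = {b_hat:.3f}")
```

Because the measurement error has mean zero, the mean of x-obs estimates the mean of x-true, which is why the intercept can be recovered from ybar and xbar once the slope is corrected.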

The leiv routine mentioned above uses this formula, taking the correlation between x and y as the measure of R, the reliability of x. For example, if x has a mean of 7.35, y has a mean of 7.22, correlation(x,y) = 0.70 and the observed slope b* for x-obs is 0.60, then

b = b*/R = 0.60/0.70 = 0.857 (0.86 to two decimal places).

a = ybar - b xbar = 7.22 - 0.857 x 7.35 = 7.22 - 6.30 = 0.92.
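One consequence of taking R to be the correlation r between x and y (assuming r > 0, as in the example): the least-squares slope satisfies b* = r s_y/s_x, where s_x and s_y are the sample standard deviations of x-obs and y, so the corrected slope reduces to the ratio of the two standard deviations. In the example this ratio is 0.60/0.70, about 0.857.

```latex
b^{*} = r_{xy}\,\frac{s_y}{s_x}
\quad\Longrightarrow\quad
b = \frac{b^{*}}{R} = \frac{b^{*}}{r_{xy}} = \frac{s_y}{s_x},
\qquad
a = \bar{y} - \frac{s_y}{s_x}\,\bar{x}.
```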

Goldstein also suggests trying a range of reliabilities, namely R = 1, R = 0.75 and R = 0.65, to assess how sensitive the regression coefficients are to measurement error. He also recommends, and illustrates, adjusting for measurement error in multiple regression using the Bayesian approach of Richardson and Gilks (1993).
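A minimal sketch of that sensitivity check, reusing the summary statistics from the worked example (b* = 0.60, xbar = 7.35, ybar = 7.22) purely for illustration; the resulting values are computed here and are not taken from Goldstein (2015). R = 1 reproduces the uncorrected estimates.

```python
# Summary statistics from the worked example (illustrative only).
b_star, xbar, ybar = 0.60, 7.35, 7.22


def corrected(b_obs, reliability):
    """Return (a, b) with b = b*/R and a = ybar - b * xbar."""
    b = b_obs / reliability
    return ybar - b * xbar, b


results = {}
for R in (1.0, 0.75, 0.65):   # reliabilities suggested by Goldstein (2015)
    a, b = corrected(b_star, R)
    results[R] = (a, b)
    print(f"R = {R:.2f}: slope b = {b:.3f}, intercept a = {a:.3f}")
```

As R falls the corrected slope grows and the intercept shrinks, so a wide spread across the three rows signals that conclusions depend strongly on the assumed reliability.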

References

Goldstein H (2015) Jumping to the wrong conclusions. Significance 12(5) 18-21.

Richardson S and Gilks W (1993) Conditional independence models for epidemiological studies with covariate measurement error. Statistics in Medicine 12 1703-1722.

Revision 5 as of 2015-10-23 10:33:39 (editor: PeterWatson).

MRC CBU Wiki
