<?xml version="1.0" encoding="utf-8"?><!DOCTYPE article  PUBLIC '-//OASIS//DTD DocBook XML V4.4//EN'  'http://www.docbook.org/xml/4.4/docbookx.dtd'><article><articleinfo><title>FAQ/errsvars</title><revhistory><revision><revnumber>24</revnumber><date>2015-11-11 16:14:46</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>23</revnumber><date>2015-11-10 16:36:28</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>22</revnumber><date>2015-11-10 15:45:16</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>21</revnumber><date>2015-11-04 17:31:23</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>20</revnumber><date>2015-11-04 17:30:25</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>19</revnumber><date>2015-11-03 12:58:33</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>18</revnumber><date>2015-11-03 12:57:33</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>17</revnumber><date>2015-11-02 15:42:59</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>16</revnumber><date>2015-11-02 15:27:57</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>15</revnumber><date>2015-11-02 15:26:59</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>14</revnumber><date>2015-11-02 15:25:45</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>13</revnumber><date>2015-11-02 15:25:07</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>12</revnumber><date>2015-11-02 15:16:45</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>11</revnumber><date>2015-11-02 15:09:01</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>10</revnumber><date>2015-10-23 
11:47:58</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>9</revnumber><date>2015-10-23 11:45:41</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>8</revnumber><date>2015-10-23 11:11:47</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>7</revnumber><date>2015-10-23 10:36:51</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>6</revnumber><date>2015-10-23 10:34:08</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>5</revnumber><date>2015-10-23 10:33:39</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>4</revnumber><date>2015-10-16 14:53:10</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>3</revnumber><date>2015-10-15 14:36:53</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>2</revnumber><date>2015-10-15 14:36:02</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>1</revnumber><date>2015-10-15 14:35:06</date><authorinitials>PeterWatson</authorinitials></revision></revhistory></articleinfo><section><title>Errors in variables when doing a regression</title><para>If there is measurement error in a predictor (x), the estimated slope and intercept are biased and do not converge to their true values. For example, the estimated slope converges to slope × Variance(x) / (Variance(x) + Variance(Measurement Error of x)), where Variance(x) is the variance of the true x. This underestimates the magnitude of the slope whenever the measurement error variance is non-zero.  </para><para>Klauer KC, Draine SC and Greenwald AG (1998) An unbiased errors-in-variables approach to detecting unconscious cognition. 
<emphasis>British Journal of Mathematical and Statistical Psychology</emphasis> <emphasis role="strong">51</emphasis> 253-267 present a method for estimating the slope and intercept, and their standard errors, adjusted for measurement error. The estimates can be computed with the FORTRAN program used by the authors. Their approach estimates the slope and intercept using a truncated Normal distribution for x, which assumes that x takes no negative values. The authors claim in the discussion of their paper that this assumption is robust to differing distributions of x (exponential and uniform) which assume negative values. </para><para>R has a Bayesian procedure, leiv, which combines a Cauchy prior for the slope with a likelihood function based on the standard deviations of the predictor x and the outcome y, and their correlation, to produce posterior distributions for the slope and intercept adjusted for measurement error. The procedure can also use these posterior distributions to produce median values and credible regions for the slope and intercept. The Bayesian procedure is described in <ulink url="https://lsr-wiki-02.mrc-cbu.cam.ac.uk/statswiki/FAQ/errsvars/statswiki/FAQ/errsvars?action=AttachFile&amp;do=get&amp;target=leivpaper.pdf">Leonard D. (2011) Estimating a bivariate linear relationship</ulink> <emphasis>Bayesian Analysis</emphasis> <emphasis role="strong">6(4)</emphasis> 727-754. </para><para><emphasis role="underline">Special case (simple regression of a single predictor)</emphasis> </para><para>Goldstein (2015) gives formulae for correcting the slope and intercept in a simple regression of an outcome, y, on one predictor, x. 
In particular, if we know the reliability of x, R, equal to variance(x(true))/variance(x(obs)), where x(obs) is x(true) + measurement error, then  </para><para>if y = a* + b*x(obs) + e* for observed x and </para><para>y = a + bx(true) + e for the true value of x, then the intercept a and slope b corresponding to the true value of x are </para><screen><![CDATA[b = b*/R = b* / (b* sd(x)/sd(y) ) = sd(y) / sd(x)
]]><![CDATA[
a = ybar - b xbar]]></screen><para>This formula is used by the <emphasis>leiv</emphasis> routine mentioned above, taking the correlation between x and y as the measure of R, the reliability of x. For example, if x has mean 7.35 (sd = 5.53), y has mean 7.22 (sd = 4.70), correlation(x,y) = 0.70 and the slope b* for x(obs) is 0.60, then  </para><para>b = b*/R = 0.60/0.70 ≈ 0.85 ≈ 4.70 / 5.53 = sd(y) / sd(x) (the two ratios differ slightly, 0.857 against 0.850, because b* = 0.60 is itself rounded). </para><para>a = ybar - b xbar = 7.22 - 0.85 × 7.35 = 0.97. </para><para>Goldstein also suggests using a range of reliabilities corresponding to R=1, R=0.75 and R=0.65 to assess the sensitivity of the regression coefficients to measurement error. He also recommends and illustrates adjustment for measurement error in a multiple regression using the Bayesian approach of Richardson and Gilks (1993), fitted with the WinBUGS freeware, which may be run from R. An example of WinBUGS syntax for measurement error in a simple regression (one predictor, x) is given <ulink url="http://stats.stackexchange.com/questions/152911/bayesian-modelling-measurement-error-carrying-over-posterior-distributions">here.</ulink> To run this syntax in R, the WinBUGS14 software needs to be downloaded and placed in the Program Files directory on the C: drive of your PC.   
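The correction b = b*/R, a = ybar - b xbar and the sensitivity check over R = 1, 0.75 and 0.65 can be sketched as follows, written in Python purely for illustration rather than in R or WinBUGS. The function name correct_for_reliability is hypothetical (it is not part of leiv or of Goldstein's article), the inputs are the summary statistics of the worked example, and R is assumed known and to lie in (0, 1]. <screen><![CDATA[# Hypothetical helper (not from leiv or Goldstein): correct a simple
# regression slope and intercept for an assumed reliability R of x.
def correct_for_reliability(b_obs, reliability, x_mean, y_mean):
    """Return (slope, intercept) corrected for measurement error in x."""
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must lie in (0, 1]")
    slope = b_obs / reliability            # b = b*/R
    intercept = y_mean - slope * x_mean    # a = ybar - b xbar
    return slope, intercept


# Summary statistics taken from the worked example in the text.
x_mean, y_mean, b_obs = 7.35, 7.22, 0.60

# Sensitivity check over the reliabilities suggested by Goldstein.
for r in (1.0, 0.75, 0.65):
    slope, intercept = correct_for_reliability(b_obs, r, x_mean, y_mean)
    print(f"R = {r:.2f}: slope = {slope:.2f}, intercept = {intercept:.2f}")
]]></screen> The printed values show how far the coefficients move as the assumed reliability falls; the sketch gives point estimates only and says nothing about their standard errors.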
</para><para><emphasis role="underline">Standard errors for the slope and intercept in a simple regression</emphasis> </para><para>Given the formulae above for the slope and intercept corrected for measurement error when there is just one predictor, we can obtain standard errors for the corrected slope and intercept using the delta method (see <ulink url="https://lsr-wiki-02.mrc-cbu.cam.ac.uk/statswiki/FAQ/errsvars/statswiki/FAQ/FVars/Fab#">here).</ulink> </para><para>For Normally distributed data the standard error of a sample variance from a sample of size n equals the sample variance times sqrt(2/(n-1)) (see <ulink url="https://lsr-wiki-02.mrc-cbu.cam.ac.uk/statswiki/FAQ/errsvars/statswiki/FAQ/errsvars?action=AttachFile&amp;do=get&amp;target=sdsd.pdf">here),</ulink> so the variance of a sample standard deviation is approximately sd<superscript>2</superscript> / (2(n-1)). </para><para>Variance of slope = V( sd(y) / sd(x) ) ≈ V(sd(y)) / sd(x)<superscript>2</superscript> + sd(y)<superscript>2</superscript> V(sd(x)) / sd(x)<superscript>4</superscript> = sd(y)<superscript>2</superscript> / ( (n-1) sd(x)<superscript>2</superscript> ) (assumes the standard deviations of x and y are uncorrelated) </para><para>If we ignore the uncertainty in the correlation then variance(slope) = variance(uncorrected slope) / correlation<superscript>2</superscript> (see page 218 in the pdf chapter <ulink url="https://lsr-wiki-02.mrc-cbu.cam.ac.uk/statswiki/FAQ/errsvars/statswiki/FAQ/errsvars?action=AttachFile&amp;do=get&amp;target=bayeseiv.pdf">here.</ulink>) </para><para>Variance of intercept = V(ybar - slope xbar) ≈ V(ybar) + slope<superscript>2</superscript> V(xbar) + xbar<superscript>2</superscript> V(slope), which treats ybar, xbar and the slope as uncorrelated and assumes the slope of y on x is not related to specific values of x. </para><para>We could also construct bootstrap samples to obtain confidence intervals for the slope and intercept. </para><para><emphasis role="underline">References</emphasis> </para><para>Goldstein H (2015) Jumping to the wrong conclusions. <emphasis>Significance</emphasis> <emphasis role="strong">12(5)</emphasis> 18-21. 
See <ulink url="https://lsr-wiki-02.mrc-cbu.cam.ac.uk/statswiki/FAQ/errsvars/statswiki/FAQ/errsvars?action=AttachFile&amp;do=get&amp;target=gold.pdf">this article here.</ulink> </para><para>Lunn D, Jackson C, Best N, Thomas A and Spiegelhalter D (2012) The BUGS Book - A Practical Introduction to Bayesian Analysis. CRC Press / Chapman and Hall:London. </para><para>Richardson S and Gilks W (1993) Conditional independence models for epidemiological studies with covariate measurement error. <emphasis>Statistics in Medicine</emphasis> <emphasis role="strong">12</emphasis> 1703-1722. </para></section></article>