<?xml version="1.0" encoding="utf-8"?><!DOCTYPE article  PUBLIC '-//OASIS//DTD DocBook XML V4.4//EN'  'http://www.docbook.org/xml/4.4/docbookx.dtd'><article><articleinfo><title>FAQ/RegressionOutliers</title><revhistory><revision><revnumber>23</revnumber><date>2015-05-06 16:06:06</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>22</revnumber><date>2015-05-06 16:05:24</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>21</revnumber><date>2013-09-19 11:26:55</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>20</revnumber><date>2013-09-19 11:24:47</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>19</revnumber><date>2013-03-08 10:17:44</date><authorinitials>localhost</authorinitials><revremark>converted to 1.6 markup</revremark></revision><revision><revnumber>18</revnumber><date>2011-09-20 11:28:42</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>17</revnumber><date>2010-01-25 13:57:32</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>16</revnumber><date>2010-01-25 13:55:19</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>15</revnumber><date>2010-01-25 13:53:59</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>14</revnumber><date>2008-01-29 10:21:06</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>13</revnumber><date>2008-01-29 10:20:27</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>12</revnumber><date>2008-01-29 10:20:08</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>11</revnumber><date>2008-01-29 10:17:42</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>10</revnumber><date>2008-01-29 
10:16:51</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>9</revnumber><date>2007-10-03 16:33:09</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>8</revnumber><date>2007-10-03 16:32:36</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>7</revnumber><date>2007-10-03 16:28:27</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>6</revnumber><date>2007-10-03 16:27:31</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>5</revnumber><date>2007-10-03 16:27:00</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>4</revnumber><date>2006-07-20 14:13:14</date><authorinitials>pc0082.mrc-cbu.cam.ac.uk</authorinitials></revision><revision><revnumber>3</revnumber><date>2006-06-30 22:57:01</date><authorinitials>Scripting Subsystem</authorinitials></revision><revision><revnumber>2</revnumber><date>2006-06-30 22:55:30</date><authorinitials>Scripting Subsystem</authorinitials></revision><revision><revnumber>1</revnumber><date>2006-06-30 21:37:50</date><authorinitials>Scripting Subsystem</authorinitials></revision></revhistory></articleinfo><section><title>Checking for outliers in regression</title><para>According to Hoaglin and Welsch (1978), leverage values above 2(p+1)/N, where p is the number of predictors in the regression and N is the number of observations (items), are considered influential. If the sample size is &lt; 30, a higher cut-off such as 3(p+1)/N is suggested. 
</para><para>Leverage is also related to the i-th observation's <ulink url="https://lsr-wiki-02.mrc-cbu.cam.ac.uk/statswiki/FAQ/RegressionOutliers/statswiki/FAQ/mahal#">Mahalanobis distance</ulink>, MD(i) (here the squared distance of observation i from the centroid of the predictors), such that for sample size N </para><para>Leverage for observation i = MD(i)/(N-1) + 1/N </para><para>so, substituting the leverage cut-off 2(p+1)/N, </para><para>Critical MD(i) = (2(p+1)/N - 1/N)(N-1) = (2p+1)(N-1)/N </para><para>(See Tabachnick and Fidell) </para><para>Other outlier detection methods using boxplots are described in the Exploratory Data Analysis Graduate talk located <ulink url="https://lsr-wiki-02.mrc-cbu.cam.ac.uk/statswiki/FAQ/RegressionOutliers/statswiki/StatsCourse2009#">here</ulink>. Alternatively, z-score-based tests such as Grubbs' test can be used; further details and an on-line calculator are located <ulink url="http://www.graphpad.com/quickcalcs/Grubbs1.cfm">here.</ulink> </para><para>Hair, Anderson, Tatham and Black (1998) suggest that observations with Cook's distances greater than 1 are influential. Hair et al. mention that some people also use 4/(N-k-1), for k predictors and N points, as a threshold for Cook's distance, which usually gives a lower threshold than 1 (e.g. with 1 predictor and 27 observations this gives 4/(27-1-1) = 0.16). A third threshold of 4/N is also mentioned (Bollen and Jackman, 1990), which would give a threshold of 4/27 = 0.15 in the above example. </para><para><emphasis role="strong">References</emphasis> </para><para><emphasis role="strong">Bollen, K. A. and Jackman, R. W. (1990).</emphasis> Regression diagnostics: An expository treatment of outliers and influential cases. In Fox, J. and Long, J. S. (eds.), Modern Methods of Data Analysis (pp. 257-91). Newbury Park, CA: Sage. </para><para><emphasis role="strong">Hair, J., Anderson, R., Tatham, R. and Black, W. (1998).</emphasis> Multivariate Data Analysis (fifth edition). Englewood Cliffs, NJ: Prentice-Hall. </para><para><emphasis role="strong">Hoaglin, D. C. and Welsch, R. E. (1978).</emphasis> The hat matrix in regression and ANOVA. 
The American Statistician 32, 17-22. </para><para>These pages are maintained by <ulink url="mailto:ian.nimmo-smith@mrc-cbu.cam.ac.uk">Ian Nimmo-Smith</ulink> and <ulink url="mailto:peter.watson@mrc-cbu.cam.ac.uk">Peter Watson</ulink> </para></section></article>