<?xml version="1.0" encoding="utf-8"?><!DOCTYPE article  PUBLIC '-//OASIS//DTD DocBook XML V4.4//EN'  'http://www.docbook.org/xml/4.4/docbookx.dtd'><article><articleinfo><title>FAQ/skew</title><revhistory><revision><revnumber>11</revnumber><date>2017-04-26 09:33:20</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>10</revnumber><date>2016-08-31 14:07:04</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>9</revnumber><date>2013-03-08 10:17:09</date><authorinitials>localhost</authorinitials><revremark>converted to 1.6 markup</revremark></revision><revision><revnumber>8</revnumber><date>2012-10-24 13:42:28</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>7</revnumber><date>2007-07-20 09:20:44</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>6</revnumber><date>2007-07-20 09:19:36</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>5</revnumber><date>2007-07-20 09:12:06</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>4</revnumber><date>2007-07-20 09:11:52</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>3</revnumber><date>2007-07-20 09:11:04</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>2</revnumber><date>2007-07-20 09:09:32</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>1</revnumber><date>2007-07-20 09:05:42</date><authorinitials>PeterWatson</authorinitials></revision></revhistory></articleinfo><section><title>How should I deal with skew when doing correlations?</title><para>Skewness (where the data is bunched at one end e.g. 
ceiling or floor effects) and, in particular, outliers can give <ulink url="https://lsr-wiki-02.mrc-cbu.cam.ac.uk/statswiki/FAQ/skew/statswiki/FAQ/skew?action=AttachFile&amp;do=get&amp;target=outlier.ppt">spurious Pearson correlations.</ulink> </para><para>To analyse the effects of skew properly, look at the residuals from a regression that uses one of the variables as a predictor of the other. If the residuals are not normally distributed about zero, the Pearson correlation could be unreliable. This can be checked by plotting the residuals; see the regression talk at <ulink url="https://lsr-wiki-02.mrc-cbu.cam.ac.uk/statswiki/FAQ/skew/statswiki/StatsCourse2006#">StatsCourse2006</ulink>. </para><para>A suggested strategy is to transform one of the two variables using a power transform. If the residuals are still non-normal after that, either use a rank-based correlation (Spearman's rho or Kendall's tau-b) or compute Normal scores after separately ranking each variable in the pair to be correlated (Bishara and Hittner, 2012). </para><para>de Winter, Gosling and Potter (2016) suggest using the Pearson correlation for 'light-tailed' distributions and the Spearman correlation for heavier-tailed distributions, e.g. when outliers are present. </para><para>Outliers should not be deleted unless there is some measurement problem (Langkjaer-Bain, 2017). </para><para><emphasis role="underline">Further Discussion</emphasis> </para><para>Bishara, A. J. and Hittner, J. B. (2012) Testing the Significance of a Correlation With Nonnormal Data: Comparison of Pearson, Spearman, Transformation, and Resampling Approaches. <emphasis>Psychological Methods</emphasis> <emphasis role="strong">17(3)</emphasis> 399–417. </para><para>de Winter, J. C. F., Gosling, S. D. &amp; Potter, J. (2016) Comparing the Pearson and Spearman correlation coefficients across distributions and sample sizes: a tutorial using simulations and empirical data. 
<emphasis>Psychological Methods</emphasis> <emphasis role="strong">21(3)</emphasis> 273–290. </para><para><ulink url="http://findarticles.com/p/articles/mi_m2405/is_n4_v122/ai_17848623/pg_7">Dunlap, W. P., Burke, M. J., &amp; Greer, T. (1995). The effect of skew on the magnitude of product-moment correlations. Journal of General Psychology, 122, 365–377.</ulink> </para><para>Langkjaer-Bain, R. (2017) The murky tale of Flint's deceptive water data. <emphasis>Significance</emphasis> <emphasis role="strong">14(2)</emphasis> 17–21. </para></section></article>