<?xml version="1.0" encoding="utf-8"?><!DOCTYPE article  PUBLIC '-//OASIS//DTD DocBook XML V4.4//EN'  'http://www.docbook.org/xml/4.4/docbookx.dtd'><article><articleinfo><title>FAQ/dummyCor</title><revhistory><revision><revnumber>14</revnumber><date>2013-08-20 15:39:17</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>13</revnumber><date>2013-04-11 08:55:19</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>12</revnumber><date>2013-03-08 10:17:31</date><authorinitials>localhost</authorinitials><revremark>converted to 1.6 markup</revremark></revision><revision><revnumber>11</revnumber><date>2012-10-26 11:01:20</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>10</revnumber><date>2007-02-27 15:11:10</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>9</revnumber><date>2007-02-27 15:10:37</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>8</revnumber><date>2007-02-27 15:07:23</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>7</revnumber><date>2007-02-27 15:01:35</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>6</revnumber><date>2007-02-27 14:57:12</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>5</revnumber><date>2007-02-27 14:52:06</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>4</revnumber><date>2007-02-27 14:50:10</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>3</revnumber><date>2007-02-27 14:44:17</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>2</revnumber><date>2007-02-27 14:43:00</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>1</revnumber><date>2007-02-27 
14:42:20</date><authorinitials>PeterWatson</authorinitials></revision></revhistory></articleinfo><section><title>Regression diagnostics for categorical variables</title><para>Some people are uneasy about computing correlations between dichotomous variables and a continuous variable in a regression, for example as input for multicollinearity diagnostics. </para><para>When we have a dichotomous variable (or dummy variable) in a simple regression, its correlation with the outcome measure is termed a point-biserial correlation, r(pb). Rosenthal (1994) shows that this correlation is related both to the F and t statistics and to the difference in group means expressed in units of the pooled group standard deviation. </para><para>In particular, for the former two, with df denoting the residual degrees of freedom, </para><para>$$r(pb) = \sqrt{ \frac{t^{2}}{t^{2} + \mbox{df}} }$$ </para><para>and </para><para>$$F(1,\mbox{df}) = \frac{\mbox{df} \, r(pb)^{2}}{1 - r(pb)^{2}}$$ </para><para>For the more general case of a categorical predictor representing k groups, Rsq, the square of the semi-partial correlation of the categorical predictor with the outcome, is related to the F value by </para><para>$$F(k-1,\mbox{df}) = \frac{\mbox{df}}{k-1} \cdot \frac{\mbox{Rsq}}{1 - \mbox{Rsq}}$$ </para><para>Semi-partial R-squared for group, Rsq(group), is defined as </para><para>Rsq(group) = Rsq(all predictors) - Rsq(removing group) </para><para>The F formula above, with 1 - Rsq in the denominator, applies when the categorical predictor is the only term in the model, so that Rsq is also the model R-squared. When other predictors are present, the denominator is 1 - Rsq(all predictors). </para><para>Semi-partial R-squareds and F ratios are routinely used as indicators of predictive strength in simple and multiple regressions. Cohen and Cohen (1983), for instance, present semi-partial correlations in a four-predictor multiple regression involving sex. </para><para>As an alternative to the above, the stepAIC function in the R package MASS can be used to select the best-fitting models by comparing their Akaike Information Criteria (AICs), as described by Venables and Ripley (2002). 
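</para><para>The link between the two point-biserial formulas given earlier can be made explicit with a short worked derivation. This is a sketch rather than something taken from the cited sources: it assumes df is the residual degrees of freedom in both formulas, and the values t = 2 and df = 16 are hypothetical, chosen only to keep the arithmetic simple. Squaring the first formula and solving for the squared t statistic gives </para><para>$$r(pb)^{2} (t^{2} + \mbox{df}) = t^{2} \quad \Rightarrow \quad t^{2} = \frac{\mbox{df} \, r(pb)^{2}}{1 - r(pb)^{2}}$$ </para><para>Because the square of a t statistic on df degrees of freedom is an F statistic on (1, df) degrees of freedom, the right-hand side is F(1,df). With the hypothetical values, </para><para>$$r(pb)^{2} = \frac{4}{4 + 16} = 0.2, \quad r(pb) \approx 0.447, \quad F(1,16) = \frac{16 \times 0.2}{1 - 0.2} = 4 = t^{2}$$ 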
</para><para><emphasis role="strong">References</emphasis> </para><para>Cohen, J. and Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences. Second edition. London: Lawrence Erlbaum. </para><para>Rosenthal, R. (1994). Parametric measures of effect size. In H. Cooper and L. V. Hedges (Eds), The handbook of research synthesis. New York: Russell Sage Foundation. </para><para>Venables, W. N. and Ripley, B. D. (2002). Modern Applied Statistics with S. Fourth edition. New York: Springer. </para></section></article>