Diff for "FAQ/icc" - CBU statistics Wiki

Intraclass correlations

An alternative to the kappa statistic, which uses analysis of variance output to estimate rater reliability, is the intraclass correlation coefficient (ICC).

For a repeated measures anova involving k raters it follows, assuming both subjects and raters are fixed effects, that

$$\mbox{MS(total w)} = \frac{\mbox{SS(raters)} + \mbox{SS(subjects x raters)}}{\mbox{df(raters)} + \mbox{df(subjects x raters)}}$$

$$\mbox{ICC1} = \frac{\mbox{MS(subjects)} - \mbox{MS(total w)}}{\mbox{MS(subjects)} + (k-1)\,\mbox{MS(total w)}}$$

where MS and SS are the mean square and sum of squares from the repeated measures analysis of variance.
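As a rough illustration, the mean squares and ICC1 can be computed directly from a subjects-by-raters table of ratings. The sketch below is not taken from any of the cited sources: the function names `anova_mean_squares` and `icc1` are hypothetical, and it assumes that MS(total w) is the usual within-subjects mean square, i.e. the rater and subjects x raters sums of squares pooled over their combined degrees of freedom.

```python
def anova_mean_squares(ratings):
    """Two-way ANOVA mean squares for a table with one row per subject
    and one column per rater (one rating per cell)."""
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    subject_means = [sum(row) / k for row in ratings]
    rater_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_raters = n * sum((m - grand) ** 2 for m in rater_means)
    ss_inter = ss_total - ss_subjects - ss_raters   # subjects x raters

    df_raters = k - 1
    df_inter = (n - 1) * (k - 1)
    return {
        "n": n,
        "k": k,
        "ms_subjects": ss_subjects / (n - 1),
        "ms_raters": ss_raters / df_raters,
        "ms_inter": ss_inter / df_inter,
        # Assumption: MS(total w) pools raters and interaction.
        "ms_within": (ss_raters + ss_inter) / (df_raters + df_inter),
    }


def icc1(ms):
    """ICC1 = (MS(subjects) - MS(total w)) / (MS(subjects) + (k-1) MS(total w))."""
    k = ms["k"]
    return (ms["ms_subjects"] - ms["ms_within"]) / (
        ms["ms_subjects"] + (k - 1) * ms["ms_within"]
    )


# Two raters rating three subjects: 1,2; 2,4; 3,6
ms = anova_mean_squares([[1, 2], [2, 4], [3, 6]])
print(round(icc1(ms), 2))  # 0.32
```

For these ratings the sketch gives MS(subjects) = 4.5, MS(raters) = 6, MS(subjects x raters) = 0.5 and a pooled within-subjects mean square of 7/3.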

This corresponds to the one-way random approach in SPSS (see [http://www.uvm.edu/~dhowell/StatPages/More_Stuff/icc/icc.html here]).

It follows that the intra-class correlation (ICC), unlike the Pearson correlation, is useful for pooling paired data each having three or more observations. Einfield and Tonge (1992, p 12) prefer using the ICC to the Pearson as it is more conservative owing to the fact that it "takes account of the absolute as well as the relative difference between the scores of two raters".
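The point about absolute versus relative differences can be seen in a small sketch. It uses ratings in which one rater always gives exactly double the other's score; the helper name `pearson` and the variable names are hypothetical and only for illustration. The Pearson correlation is perfect even though the two raters never give the same rating.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


rater_a = [1, 2, 3]
rater_b = [2, 4, 6]   # always double rater_a

# Relative ordering is identical, so the correlation is perfect.
print(pearson(rater_a, rater_b))  # 1.0

# The ratings themselves still differ on every subject.
mean_abs_diff = sum(abs(a - b) for a, b in zip(rater_a, rater_b)) / len(rater_a)
print(mean_abs_diff)  # 2.0
```

An agreement measure that is sensitive to the absolute difference between raters will score these data below 1.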

Howell (1997) also recommends an alternative, the most widely used ICC, which assumes that the raters are a random sample from a larger population. It has an extra term in the denominator and is of the form

$$\mbox{ICC2} = \frac{\mbox{MS(subjects)} - \mbox{MS(subjects x raters)}}{\mbox{MS(subjects)} + (k-1)\,\mbox{MS(subjects x raters)} + k\,[\mbox{MS(raters)} - \mbox{MS(subjects x raters)}]/n}$$

where n is the number of subjects being rated.

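e.g. if two raters rate three subjects giving ratings 1,2; 2,4; 3,6 then MS(subjects) = 4.5, MS(raters) = 6 and MS(subjects x raters) = 0.5, which gives ICC2 = 0.46. The value 0.80 is obtained when MS(subjects x raters) is used in place of MS(total w) in the ICC1 formula; pooling the rater and interaction sums of squares gives MS(total w) = (6 + 1)/(1 + 2) = 2.33 and ICC1 = 0.32.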
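The ICC2 figure can be checked with a few lines of code. This is a minimal sketch with a hypothetical function name `icc2`. It takes the mean squares as inputs, and the values passed in (MS(subjects) = 4.5, MS(raters) = 6, MS(subjects x raters) = 0.5) were worked out by hand from the ratings 1,2; 2,4; 3,6.

```python
def icc2(ms_subjects, ms_raters, ms_inter, n, k):
    """ICC2 with raters treated as a random sample.

    ms_inter is MS(subjects x raters); n is the number of subjects
    and k the number of raters.
    """
    numerator = ms_subjects - ms_inter
    denominator = (
        ms_subjects
        + (k - 1) * ms_inter
        + k * (ms_raters - ms_inter) / n   # extra term for rater differences
    )
    return numerator / denominator


# Mean squares for two raters and three subjects rated 1,2; 2,4; 3,6
print(round(icc2(4.5, 6.0, 0.5, n=3, k=2), 2))  # 0.46
```

Dropping the extra term from the denominator gives 4/5 = 0.80 for the same mean squares.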

ICC may be computed in SPSS using analyze>scale>reliability analysis>statistics and choosing one of the two ICCs which allow a type of absolute agreement.

Examples of ICC computation in SPSS [http://www.nyu.edu/its/statistics/Docs/intracls.html are available here] and [attachment:ICC.doc here]. The fixed ICC correlations called sfsingle, sfrandom and sffixed in the above article are of the form

$$\frac{\mbox{true inter-rater variance}}{\mbox{true inter-rater variance + common error in rating variance}}$$

which is described as a reliability correlation in the two-rater case in, for example, [http://www-users.york.ac.uk/%7Emb55/talks/oxtalk.htm a paper by Martin Bland and Doug Altman].

An overview of approaches to inter-rater reliability, including the ICC, is given by Darroch and McCloud (1986).

  • [:FAQ/iccpr: Inferiority of using a Pearson correlation compared to an ICC]

References

Darroch JN and McCloud PI (1986) Category distinguishability and observer agreement. Australian Journal of Statistics 28 371-88.

Howell DC (1997) Statistical methods for psychology. Fourth edition. Wadsworth: Belmont, CA. (pages 490-493).

Einfield SL and Tonge BJ (1992) Manual for the Developmental Behaviour Checklist (DBC) (Primary Carer version). Melbourne: School of Psychiatry, University of New South Wales, and Centre for Developmental Psychiatry, Monash University, Clayton, Victoria.

Shrout PE and Fleiss JL (1979) Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin 86(2) 420-428. (A good primer showing how anova output can be used to compute ICCs).

None: FAQ/icc (last edited 2018-04-26 11:21:52 by PeterWatson)