Diff for "FAQ/Poisson" - CBU statistics Wiki
Differences between revisions 2 and 29 (spanning 27 versions)
Revision 2 as of 2008-07-23 11:55:06
Size: 1503
Editor: PeterWatson
Revision 29 as of 2019-11-11 12:23:04
Size: 4118
Editor: PeterWatson

How do I compare observed numbers correct to those expected by chance in a multi-choice task?

Suppose a questionnaire has N questions, each with k possible responses of which exactly one is correct. If each of the k responses to a question is equally likely by chance, the number of correct responses expected by chance equals N/k.

Suppose we observe x $$\leq$$ N correct responses for a patient. We wish to see how likely x correct responses are if the patient responds at random to the k choices on each of the N questions.

To do this we can assume the number of correct responses, x, follows a Poisson distribution, Po(m) of general form:

$$P(X=x|\mbox{random responses}) = \frac{m^{x}}{x!}e^{-m}$$

where $$m$$ is the expected total number of correctly answered questions from the N questions. Since this equals N/k we can rewrite the above as:

$$P(X=x|\mbox{random responses}) = \frac{(N/k)^{x}}{x!}e^{-(N/k)}$$

$$\mbox{one-tailed p-value} = P(X \leq x) = \sum_{i=0}^{x} P(X=i|\mbox{random responses}) \approx 0.5 \times (\mbox{two-tailed p-value})$$
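
The one-tailed p-value above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names `poisson_pmf` and `one_tailed_p` are made up for this example and are not part of the FAQ or its spreadsheet.

```python
import math

def poisson_pmf(x, m):
    """P(X = x) for a Poisson distribution with mean m."""
    return m ** x / math.factorial(x) * math.exp(-m)

def one_tailed_p(x, n_questions, k_choices):
    """P(X <= x) when the expected number correct is m = N/k."""
    m = n_questions / k_choices
    # The sum starts at 0 correct responses, not 1.
    return sum(poisson_pmf(i, m) for i in range(x + 1))

# Hypothetical usage: 5 or fewer correct out of 20 four-choice questions.
p = one_tailed_p(5, n_questions=20, k_choices=4)
print(p)
```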

For large N the Poisson distribution is approximately Normally distributed with mean and variance both equal to N/k, so we can analogously obtain a one-tailed p-value as:

$$P(X \leq x) = Probit(\frac{x-(N/k)}{\sqrt{N/k}})$$

where Probit here denotes the standard Normal cumulative distribution function.
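
As a rough sketch of the Normal approximation, the standard Normal cumulative distribution function can be written with `math.erf` from the Python standard library. The names `normal_cdf` and `one_tailed_p_normal` are illustrative assumptions, and no continuity correction is applied, matching the formula above.

```python
import math

def normal_cdf(z):
    """Standard Normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def one_tailed_p_normal(x, n_questions, k_choices):
    """Normal approximation to P(X <= x) with mean and variance N/k."""
    m = n_questions / k_choices
    z = (x - m) / math.sqrt(m)  # standardise using variance = mean
    return normal_cdf(z)
```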

Example

Suppose we have 14 questions, each with 3 possible responses of which only one is correct, and a patient gets a total score of 3 correct responses. We wish to determine how likely it is that we would observe 3 or fewer correct responses given the patient has responded at random (one-sided p-value).

The expected number of correct responses assuming the patient is responding at random is 14/3=4.67.

Using the Poisson distribution

P(X $$\leq$$ 3)

= P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)

= $$(1 + 4.67 + \frac{4.67^{2}}{2} + \frac{4.67^{3}}{6})e^{-4.67}$$

= 0.31.
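
The arithmetic in this sum can be checked term by term. The snippet below is only a sketch of that check; the variable names are chosen for illustration.

```python
import math

m = 14 / 3  # expected number correct under random responding

# The bracketed terms m^i / i! for i = 0, 1, 2, 3
terms = [m ** i / math.factorial(i) for i in range(4)]
p_lower = sum(terms) * math.exp(-m)

print([round(t, 2) for t in terms])  # [1.0, 4.67, 10.89, 16.94]
print(round(p_lower, 2))             # 0.31
```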

We can think of the two-sided p-value as the sum of the probabilities of all numbers correct that are no more likely, assuming random responses, than the observed one. It turns out that the outcomes in P(X $$\geq$$ 6) are also less likely than the observed 3 correctly answered questions, given random responses. So the exact two-sided p-value is the sum of the Poisson probabilities P(X $$\leq$$ 3) and P(X $$\geq$$ 6), which equals 0.64.
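
One way to sketch this "sum of outcomes no more likely than the observed one" rule is shown below. The function `exact_two_sided_p` and its `upper` truncation point are assumptions made for this illustration, not the method used in the spreadsheet.

```python
import math

def poisson_pmf(x, m):
    """P(X = x) for a Poisson distribution with mean m."""
    return m ** x / math.factorial(x) * math.exp(-m)

def exact_two_sided_p(observed, m, upper=100):
    """Sum P(X = i) over all i whose probability does not exceed P(X = observed).

    `upper` truncates the infinite Poisson range; it is an assumption that the
    omitted tail is negligible, which holds for small means such as 14/3.
    """
    p_observed = poisson_pmf(observed, m)
    tolerance = 1e-12  # guards against floating-point ties
    return sum(
        p
        for p in (poisson_pmf(i, m) for i in range(upper + 1))
        if p <= p_observed + tolerance
    )

print(round(exact_two_sided_p(3, 14 / 3), 2))  # 0.64
```

For the example, the outcomes 4 and 5 are each more likely than 3 and so are excluded, leaving P(X $$\leq$$ 3) + P(X $$\geq$$ 6).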

Because the Poisson distribution is roughly symmetric about its expected value, N/k, P(X $$\leq$$ 3) is approximately equal to P(X $$\geq$$ 6). The two-tailed p-value can therefore also be approximated as twice the one-tailed p-value: 0.31 x 2 = 0.62. This is close to the p-value of 0.64 obtained from the sum of the Poisson probabilities above.

Both the approximate and the exact p-values from the Poisson distribution lead to the same conclusion: there is no evidence that a score of 3 on a three-choice task of 14 questions differs from chance responding.

We can also evaluate probabilities of observing 3 correct responses due to chance using the Normal approximation to the Poisson distribution:

$$P(X \leq 3) = Probit (\frac{3-4.67}{\sqrt{4.67}}) = Probit(-0.772) = 0.22$$ with a two-sided p-value of 0.44. The two-tailed probability equals $$P(X \leq 3) + P(X \geq 6.33)$$ since the Normal distribution probabilities are, approximately like those of the Poisson distribution, symmetric about the mean of 4.67. Of course we can't observe 6.33 correct responses, but this is a continuous approximation to the discrete Poisson distribution, like joining lines between frequency bars on a histogram of the number of correct responses!
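
The Normal-approximation figures for this example can be reproduced with the short sketch below, which uses `math.erf` for the standard Normal cumulative distribution function. The variable names are illustrative only.

```python
import math

m = 14 / 3                    # mean and variance under random responding
z = (3 - m) / math.sqrt(m)    # standardised score for 3 correct

p_one_tailed = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
p_two_tailed = 2 * p_one_tailed   # valid because the Normal is symmetric
upper_cutoff = m + (m - 3)        # mirror image of 3 about the mean

print(round(z, 3))             # -0.772
print(round(p_one_tailed, 2))  # 0.22
print(round(p_two_tailed, 2))  # 0.44
print(round(upper_cutoff, 2))  # 6.33
```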

As with the Poisson distribution, we conclude there is no evidence to suggest that getting 3 questions correct on a three-choice task of 14 questions differs from chance. The exact Poisson two-sided p-value and its Normal approximation may be computed using a spreadsheet (multichoice_exact.xls).

In practice, for more than 30 questions (N) the Poisson and Normal approaches should closely agree. For fewer than 30 questions the Poisson is preferable because it is a discrete distribution, which assumes, as in this example, that only integer values can occur (i.e. that numbers of correctly answered questions are whole numbers).

FAQ/Poisson (last edited 2019-11-11 12:23:04 by PeterWatson)