Diff for "FAQ/cp2" - CBU statistics Wiki
location: Diff for "FAQ/cp2"
Differences between revisions 1 and 15 (spanning 14 versions)
Revision 1 as of 2010-04-23 12:39:15
Size: 2135
Editor: PeterWatson
Comment:
Revision 15 as of 2011-01-06 11:59:09
Size: 3288
Editor: PeterWatson
Comment:
Deletions are marked like this. Additions are marked like this.
Line 1: Line 1:

= Fitting a regression model assuming two changepoints = 
= Fitting a regression model assuming two changepoints =
Line 6: Line 5:
Y = A + B1 (time-K1) I(time<=k1) + B2 (time-K1) I(k1 <= time <= K2) + B3 (time-K2) I(time >=K2) Y = A + B1 (time-K1) I(time<=K1) + B2 (time-K1) I(K1 <= time <= K2) + B3 (time-K2) I(time >=K2)
Line 8: Line 7:
where A is the predicted average response at K1, A + B2 (K2-K1) is the predicted average response at K2, B1 is the change upto and including K1, B2 is the change between K1 and K2 and B3 is the change from K2 onwards. where A is the predicted average response at K1, A + B2 (K2-K1) is the predicted average response at K2, B1 is the change upto and including K1, B2 is the change between K1 and K2 and B3 is the change from K2 onwards and I() is the indicator function.
Line 10: Line 9:
The code below fits this model in R to a data set, dat, which appears to have two changepoints with an initial flat phase followed by a sharp decrease which culminates in a sharp increase towards the end of the series. In fact it follows the general parameterisation for a model with J-1 changepoints (separating J lines) is $$Y = A + B_text{1} (\mbox{time}-K_text{1}) I(\mbox{time} <= K_text{1}) + \sum_text{i}^text{J-1} [B_text{i} (\mbox{time}-K_text{i-1}) I(K_text{i-1} <= \mbox{time} <= K_text{i}) ] + B_text{J} (\mbox{time} - K_text{J-1}) I(\mbox{time} >= K_text{J-1}) $$

where A is the predicted average response at $$K_text{1}$$, $$A + B_text{i} (k_text{i}-k_text{i-1})$$ is the predicted response at $$K_text{i}$$ (i > 1) and $$B_text{i}$$ is the slope of the line leading up to $$K_text{i}$$.

As a first step we plot the example data (dat) using the R code below.

{{{
npts <- 11
grp <- c(1:npts)
dat <- c(10,10,11,10,11,9,4,2,3,5,9)
stripchart(grp ~ dat, method = "stack")
}}}

The data from the 11 time points appear to have two changepoints with an initial flat phase followed by a sharp decrease which culminates in a sharp increase towards the end of the series. The code below fits this two changepoint model in R to the above data set, dat.
Line 41: Line 53:
The best fitting model with two changepoints (k1 and k2) is one with changepoints at 5 and 9. The best fitting model with two changepoints (K1 and K2) is one with changepoints at 5 and 9.
Line 68: Line 80:
You can then plot the fitted line for the best fitting changepoint model and the original data points using the R code below which produces [attachment:cpt2.pdf this plot].

{{{
plot(grp, out$fitted.values,type="n")
lines(grp,out$fitted.values,lty=1)
points(grp,dat,pch=1)
}}}

Fitting a regression model assuming two changepoints

For two changepoints, K1 and K2, the regression can be parameterised as:

Y = A + B1 (time-K1) I(time<=K1) + B2 (time-K1) I(K1 <= time <= K2) + B3 (time-K2) I(time >=K2)

where A is the predicted average response at K1, A + B2 (K2-K1) is the predicted average response at K2, B1 is the change upto and including K1, B2 is the change between K1 and K2 and B3 is the change from K2 onwards and I() is the indicator function.

In fact it follows the general parameterisation for a model with J-1 changepoints (separating J lines) is $$Y = A + B_text{1} (\mbox{time}-K_text{1}) I(\mbox{time} <= K_text{1}) + \sum_text{i}^text{J-1} [B_text{i} (\mbox{time}-K_text{i-1}) I(K_text{i-1} <= \mbox{time} <= K_text{i}) ] + B_text{J} (\mbox{time} - K_text{J-1}) I(\mbox{time} >= K_text{J-1}) $$

where A is the predicted average response at $$K_text{1}$$, $$A + B_text{i} (k_text{i}-k_text{i-1})$$ is the predicted response at $$K_text{i}$$ (i > 1) and $$B_text{i}$$ is the slope of the line leading up to $$K_text{i}$$.

As a first step we plot the example data (dat) using the R code below.

npts <- 11
grp <- c(1:npts)
dat <- c(10,10,11,10,11,9,4,2,3,5,9)
stripchart(grp ~ dat, method = "stack")

The data from the 11 time points appear to have two changepoints with an initial flat phase followed by a sharp decrease which culminates in a sharp increase towards the end of the series. The code below fits this two changepoint model in R to the above data set, dat.

npts <- 11

grp <- c(1:npts)
dat <- c(10,10,11,10,11,9,4,2,3,5,9)

rss2 <- matrix(NA,npts,npts)

for (k1 in 1:npts-1) {
 for (k2 in (k1+1):npts) {
gp1 <- rep(0,npts)
gp1 <- (grp[]-k1) * (grp[]-k1 <= 0)
   
gp2 <- rep(0,npts)
gp2 <- (grp[]-k1) * (grp[]-k1 >= 0) * (grp[]-k2 <=0)

gp3 <- rep(0,npts)
gp3 <- (grp[]-k2) * (grp[]-k2 >= 0)

out <- lm(dat ~ gp1 + gp2 + gp3)
out

rss2[k1,k2] <- sum(out$residuals^2)
}
}

coor <- which(rss2 == min(rss2,na.rm=TRUE), arr.ind = TRUE)

The best fitting model with two changepoints (K1 and K2) is one with changepoints at 5 and 9.

> print(k1)
[1] 5
> print(k2)
[1] 9

You can also output further diagnostics for the best fitting model. The R code below, for example, outputs the predicted values.

k1 <- coor[1]
k2 <- coor[2]

gp1 <- rep(0,npts)
gp1 <- (grp[]-k1) * (grp[]-k1 <= 0)
   
gp2 <- rep(0,npts)
gp2 <- (grp[]-k1) * (grp[]-k1 >= 0) * (grp[]-k2 <=0)

gp3 <- rep(0,npts)
gp3 <- (grp[]-k2) * (grp[]-k2 >= 0)

out <- lm(dat ~ gp1 + gp2 + gp3)
out$fitted.values

You can then plot the fitted line for the best fitting changepoint model and the original data points using the R code below which produces [attachment:cpt2.pdf this plot].

plot(grp, out$fitted.values,type="n")
lines(grp,out$fitted.values,lty=1)
points(grp,dat,pch=1)

One should decide on the number of changepoints apriori. In this way the number of changepoints should not be compared because the more changepoints there are the better the fit to the data culminating in a model (which usually makes no psychological sense) which assumes a changepoint occurs at each data point which will trivially always perfectly fit the data.

None: FAQ/cp2 (last edited 2013-03-08 10:17:57 by localhost)