Package 'ecp' reference manual

Title:	Non-Parametric Multiple Change-Point Analysis of Multivariate Data
Description:	Implements various procedures for finding multiple change-points from Matteson D. et al (2013) <doi:10.1080/01621459.2013.849605>, Zhang W. et al (2017) <doi:10.1109/ICDMW.2017.44>, Arlot S. et al (2019). Two methods make use of dynamic programming and pruning, with no distributional assumptions other than the existence of certain absolute moments in one method. Hierarchical and exact search methods are included. All methods return the set of estimated change- points as well as other summary information.
Authors:	Nicholas A. James [aut], Wenyu Zhang [aut, cre], David S. Matteson [aut]
Maintainer:	Wenyu Zhang <[email protected]>
License:	GPL (>= 2)
Version:	3.1.6
Built:	2025-03-27 04:33:53 UTC
Source:	https://github.com/cran/ecp

Bladder Tumor Micro-Array Data

Description

Micro-array data for 43 different individuals with a bladder tumor.

Usage

data(ACGH)data(ACGH)

Format

A list with the following components.

data: The micro-array data for 43 individuals. This information is stored in a 2215 by 43 matrix.

individual: A numeric vector indicating which individuals' mico-array data are present.

Source

Bleakley K., Vert J.-P. (2011), The group fused Lasso for multiple change-point detection

N. Stransky, C. Vallot, F. Reyal, I. Bernard-Pierrot, S.G. Diez de Mediana, R. Segraves, Y. de Rycke, P. Elvin, A. Cassidy, C. Sparaggon, A. Graham, j. Southgate, B. Asselain, Y. Allory, C. C. Addou, D. G. Albertson, J.-P. Thiery, D. K. Chopin, D. Pinkel, and F. Radvanyi. Regional copy number-independent deregulation of transcription in cancer. Nat. Genet., 38(12):1386-1396, Dec 2006

References

Bleakley K., Vert J.-P. (2011), The group fused Lasso for multiple change-point detection

Nicholas A. James, David S. Matteson (2014). "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data.", "Journal of Statistical Software, 62(7), 1-25", URL "http://www.jstatsoft.org/v62/i07/"

Examples

data(ACGH, package="ecp")
data(ACGH, package="ecp")

Dow Jones Industrial Average Index

Description

The weekly log returns for the Dow Jones Industrial Average index from April 1990 to January 2012.

Usage

data(DJIA)data(DJIA)

Format

A list with the following components.

dates: A character vector of dates associated with each observation in the returns series.

index: Weekly log returns from April 1990 to January 2012 of the DOW 30 index.

market: Weekly log returns from April 1990 to January 2012, for the companies in the DOW 30 apart from Kraft.

Source

http://research.stlouisfed.org/fred2/series/DJIA/downloaddata

References

Examples

data(DJIA, package="ecp")
data(DJIA, package="ecp")

ENERGY AGGLOMERATIVE

Description

An agglomerative hierarchical estimation algorithm for multiple change point analysis.

Usage

e.agglo(X, member=1:nrow(X), alpha=1, penalty=function(cps){0})
e.agglo(X, member=1:nrow(X), alpha=1, penalty=function(cps){0})

Arguments

`X`	A T x d matrix containing the length T time series with d-dimensional observations.
`member`	Initial membership vector for the time series.
`alpha`	Moment index used for determining the distance between and within clusters.
`penalty`	Function used to penalize the obtained goodness-of-fit statistics. This function takes as its input a vector of change point locations (cps).

Details

Homogeneous clusters are created based on the initial clustering provided by the member argument. In each iteration, clusters are merged so as to maximize a goodness-of-fit statistic. The computational complexity of this method is O(T^2), where T is the number of observations.

Value

Returns a list with the following components.

`merged`	A (T-1) x 2 matrix indicating which segments were merged at each step of the agglomerative procedure.
`fit`	Vector showing the progression of the penalized goodness-of-fit statistic.
`progression`	A T x (T+1) matrix showing the progression of the set of change points.
`cluster`	The estimated cluster membership vector.
`estimates`	The location of the estimated change points.

Author(s)

Nicholas A. James

References

Matteson D.S., James N.A. (2013). A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data.

Examples

set.seed(100)
mem = rep(c(1,2,3,4),times=c(10,10,10,10))
x = as.matrix(c(rnorm(10,0,1),rnorm(20,2,1),rnorm(10,-1,1)))
y = e.agglo(X=x,member=mem,alpha=1,penalty=function(cp,Xts) 0)
y$estimates


## Not run: 
# Multivariate spatio-temporal example
# You will need the following packages:
#	mvtnorm, combinat, and MASS
library(mvtnorm); library(combinat); library(MASS)
set.seed(2013)
lambda = 1500 #overall arrival rate per unit time
muA = c(-7,-7) ; muB = c(0,0) ; muC = c(5.5,0)
covA = 25*diag(2)
covB = matrix(c(9,0,0,1),2)
covC = matrix(c(9,.9,.9,9),2)
time.interval = matrix(c(0,1,3,4.5,1,3,4.5,7),4,2)
#mixing coefficents
mixing.coef = rbind(c(1/3,1/3,1/3),c(.2,.5,.3), c(.35,.3,.35), 
	c(.2,.3,.5))
stppData = NULL
for(i in 1:4){
	count = rpois(1, lambda* diff(time.interval[i,]))
	Z = rmultz2(n = count, p = mixing.coef[i,])
	S = rbind(rmvnorm(Z[1],muA,covA), rmvnorm(Z[2],muB,covB),
		rmvnorm(Z[3],muC,covC))
	X = cbind(rep(i,count), runif(n = count, time.interval[i,1],
		time.interval[i,2]), S)
	stppData = rbind(stppData, X[order(X[,2]),])
}
member = as.numeric(cut(stppData[,2], breaks = seq(0,7,by=1/12)))
output = e.agglo(X=stppData[,3:4],member=member,alpha=1,
	penalty=function(cp,Xts) 0)

## End(Not run)
set.seed(100)
mem = rep(c(1,2,3,4),times=c(10,10,10,10))
x = as.matrix(c(rnorm(10,0,1),rnorm(20,2,1),rnorm(10,-1,1)))
y = e.agglo(X=x,member=mem,alpha=1,penalty=function(cp,Xts) 0)
y$estimates


## Not run: 
# Multivariate spatio-temporal example
# You will need the following packages:
#	mvtnorm, combinat, and MASS
library(mvtnorm); library(combinat); library(MASS)
set.seed(2013)
lambda = 1500 #overall arrival rate per unit time
muA = c(-7,-7) ; muB = c(0,0) ; muC = c(5.5,0)
covA = 25*diag(2)
covB = matrix(c(9,0,0,1),2)
covC = matrix(c(9,.9,.9,9),2)
time.interval = matrix(c(0,1,3,4.5,1,3,4.5,7),4,2)
#mixing coefficents
mixing.coef = rbind(c(1/3,1/3,1/3),c(.2,.5,.3), c(.35,.3,.35), 
	c(.2,.3,.5))
stppData = NULL
for(i in 1:4){
	count = rpois(1, lambda* diff(time.interval[i,]))
	Z = rmultz2(n = count, p = mixing.coef[i,])
	S = rbind(rmvnorm(Z[1],muA,covA), rmvnorm(Z[2],muB,covB),
		rmvnorm(Z[3],muC,covC))
	X = cbind(rep(i,count), runif(n = count, time.interval[i,1],
		time.interval[i,2]), S)
	stppData = rbind(stppData, X[order(X[,2]),])
}
member = as.numeric(cut(stppData[,2], breaks = seq(0,7,by=1/12)))
output = e.agglo(X=stppData[,3:4],member=member,alpha=1,
	penalty=function(cp,Xts) 0)

## End(Not run)

CHANGE POINTS ESTIMATION BY PRUNED OBJECTIVE (VIA E-STATISTIC)

Description

An algorithm for multiple change point analysis that uses dynamic programming and pruning. The E-statistic is used as the goodness-of-fit measure.

Usage

e.cp3o(Z, K=1, minsize=30, alpha=1, verbose=FALSE)
e.cp3o(Z, K=1, minsize=30, alpha=1, verbose=FALSE)

Arguments

`Z`	A T x d matrix containing the length T time series with d-dimensional observations.
`K`	The maximum number of change points.
`minsize`	The minimum segment size.
`alpha`	The moment index used for determining the distance between and within segments.
`verbose`	A flag indicating if status updates should be printed.

Details

Segmentations are found through the use of dynamic programming and pruning. For long time series, consider using e.cp3o_delta.

Value

The returned value is a list with the following components.

`number`	The estimated number of change points.
`estimates`	The location of the change points estimated by the procedure.
`gofM`	A vector of goodness of fit values for differing number of change points. The first entry corresponds to when there is only a single change point, the second for when there are two, and so on.
`cpLoc`	The list of locations of change points estimated by the procedure for different numbers of change points up to K.
`time`	The total amount to time take to estimate the change point locations.

Author(s)

Nicholas A. James, Wenyu Zhang

References

W. Zhang, N. A. James and D. S. Matteson, "Pruning and Nonparametric Multiple Change Point Detection," 2017 IEEE International Conference on Data Mining Workshops (ICDMW), New Orleans, LA, 2017, pp. 288-295.

Examples

set.seed(400)
x1 = matrix(c(rnorm(50),rnorm(50,3)))
y1 = e.cp3o(Z=x1, K=2, minsize=30, alpha=1, verbose=FALSE)
#View estimated change point locations
y1$estimates
set.seed(400)
x1 = matrix(c(rnorm(50),rnorm(50,3)))
y1 = e.cp3o(Z=x1, K=2, minsize=30, alpha=1, verbose=FALSE)
#View estimated change point locations
y1$estimates

CHANGE POINTS ESTIMATION BY PRUNED OBJECTIVE (VIA E-STATISTIC)

Description

An algorithm for multiple change point analysis that uses dynamic programming and pruning. The E-statistic is used as the goodness-of-fit measure.

Usage

e.cp3o_delta(Z, K=1, delta=29, alpha=1, verbose=FALSE)
e.cp3o_delta(Z, K=1, delta=29, alpha=1, verbose=FALSE)

Arguments

`Z`	A T x d matrix containing the length T time series with d-dimensional observations.
`K`	The maximum number of change points.
`delta`	The window size used to calculate the calculate the complete portion of our approximate test statistic. This also corresponds to one less than the minimum segment size.
`alpha`	The moment index used for determining the distance between and within segments.
`verbose`	A flag indicating if status updates should be printed.

Details

Segmentations are found through the use of dynamic programming and pruning. Between-segment distances are calculated only using points within a window of the segmentation point. The computational complexity of this method is O(KT^2), where K is the maximum number of change points, and T is the number of observations.

Value

The returned value is a list with the following components.

`number`	The estimated number of change points.
`estimates`	The location of the change points estimated by the procedure.
`gofM`	A vector of goodness of fit values for differing number of change points. The first entry corresponds to when there is only a single change point, the second for when there are two, and so on.
`cpLoc`	The list of locations of change points estimated by the procedure for different numbers of change points up to K.
`time`	The total amount to time take to estimate the change point locations.

Author(s)

Nicholas A. James, Wenyu Zhang

References

Examples

set.seed(400)
x1 = matrix(c(rnorm(100),rnorm(100,3),rnorm(100,0,2)))
y1 = e.cp3o_delta(Z=x1, K=7, delta=29, alpha=1, verbose=FALSE)
#View estimated change point locations
y1$estimates
set.seed(400)
x1 = matrix(c(rnorm(100),rnorm(100,3),rnorm(100,0,2)))
y1 = e.cp3o_delta(Z=x1, K=7, delta=29, alpha=1, verbose=FALSE)
#View estimated change point locations
y1$estimates

ENERGY DIVISIVE

Description

A divisive hierarchical estimation algorithm for multiple change point analysis.

Usage

e.divisive(X, sig.lvl=.05, R=199, k=NULL, min.size=30, alpha=1)
e.divisive(X, sig.lvl=.05, R=199, k=NULL, min.size=30, alpha=1)

Arguments

`X`	A T x d matrix containing the length T time series with d-dimensional observations.
`sig.lvl`	The level at which to sequentially test if a proposed change point is statistically significant.
`R`	The maximum number of random permutations to use in each iteration of the permutation test. The permutation test p-value is calculated using the method outlined in Gandy (2009).
`k`	Number of change point locations to estimate, suppressing permutation based testing. If k=NULL then only the statistically significant estimated change points are returned.
`min.size`	Minimum number of observations between change points.
`alpha`	The moment index used for determining the distance between and within segments.

Details

Segments are found through the use of a binary bisection method and a permutation test. The computational complexity of this method is O(kT^2), where k is the number of estimated change points, and T is the number of observations.

Value

The returned value is a list with the following components.

`k.hat`	The number of clusters within the data created by the change points.
`order.found`	The order in which the change points were estimated.
`estimates`	Locations of the statistically significant change points.
`considered.last`	Location of the last change point, that was not found to be statistically significant at the given significance level.
`permutations`	The number of permutations performed by each of the sequential permutation test.
`cluster`	The estimated cluster membership vector.
`p.values`	Approximate p-values estimated from each permutation test.

Author(s)

Nicholas A. James

References

Matteson D.S., James N.A. (2013). A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data.

Examples

## Not run: 
set.seed(100)
x1 = matrix(c(rnorm(100),rnorm(100,3),rnorm(100,0,2)))
y1 = e.divisive(X=x1,sig.lvl=0.05,R=199,k=NULL,min.size=30,alpha=1)
x2 = rbind(MASS::mvrnorm(100,c(0,0),diag(2)),
	MASS::mvrnorm(100,c(2,2),diag(2)))
y2 = e.divisive(X=x2,sig.lvl=0.05,R=499,k=NULL,min.size=30,alpha=1)

## End(Not run)
## Not run: 
set.seed(100)
x1 = matrix(c(rnorm(100),rnorm(100,3),rnorm(100,0,2)))
y1 = e.divisive(X=x1,sig.lvl=0.05,R=199,k=NULL,min.size=30,alpha=1)
x2 = rbind(MASS::mvrnorm(100,c(0,0),diag(2)),
	MASS::mvrnorm(100,c(2,2),diag(2)))
y2 = e.divisive(X=x2,sig.lvl=0.05,R=499,k=NULL,min.size=30,alpha=1)

## End(Not run)

Kernel Change Point Analysis

Description

An algorithm for multiple change point analysis that uses the 'kernel trick' and dynamic programming.

Usage

kcpa(X, L, C)
kcpa(X, L, C)

Arguments

`X`	A T x d matrix containing the length T time series with d-dimensional observations.
`L`	The maximum number of change points.
`C`	The constant used to penalize the inclusion of additional change points in the fitted model.

Details

Segments are found through the use of dynamic programming and the kernel trick.

Value

If the algorithm determines that the best fit is obtained through using k change points then the returned value is an array of length k, containing the change point locations.

Author(s)

Nicholas A. James

References

Arlot S., Celisse A., Harchaoui Z. (2019). A Kernel Multiple Change-point Algorithm via Model Selection. J. Mach. Learn. Res., 20, 162:1-162:56.

CHANGE POINTS ESTIMATION BY PRUNED OBJECTIVE (VIA KOLMOGOROV-SMIRNOV STATISTIC)

Description

An algorithm for multiple change point analysis that uses dynamic programming and pruning. The Kolmogorov-Smirnov statistic is used as the goodness-of-fit measure.

Usage

ks.cp3o(Z, K=1, minsize=30, verbose=FALSE)
ks.cp3o(Z, K=1, minsize=30, verbose=FALSE)

Arguments

`Z`	A T x d matrix containing the length T time series with d-dimensional observations.
`K`	The maximum number of change points.
`minsize`	The minimum segment size.
`verbose`	A flag indicating if status updates should be printed.

Details

Segmentations are found through the use of dynamic programming and pruning. For long time series, consider using ks.cp3o_delta.

Value

The returned value is a list with the following components.

`number`	The estimated number of change points.
`estimates`	The location of the change points estimated by the procedure.
`gofM`	A vector of goodness of fit values for differing number of change points. The first entry corresponds to when there is only a single change point, the second for when there are two, and so on.
`cpLoc`	The list of locations of change points estimated by the procedure for different numbers of change points up to K.
`time`	The total amount to time take to estimate the change point locations.

Author(s)

Wenyu Zhang

References

Examples

set.seed(400)
x = matrix(c(rnorm(100),rnorm(100,3),rnorm(100,0,2)))
y = ks.cp3o(Z=x, K=7, minsize=30, verbose=FALSE)
#View estimated change point locations
y$estimates
set.seed(400)
x = matrix(c(rnorm(100),rnorm(100,3),rnorm(100,0,2)))
y = ks.cp3o(Z=x, K=7, minsize=30, verbose=FALSE)
#View estimated change point locations
y$estimates

CHANGE POINTS ESTIMATION BY PRUNED OBJECTIVE (VIA KOLMOGOROV-SMIRNOV STATISTIC)

Description

An algorithm for multiple change point analysis that uses dynamic programming and pruning. The Kolmogorov-Smirnov statistic is used as the goodness-of-fit measure.

Usage

ks.cp3o_delta(Z, K=1, minsize=30, verbose=FALSE)
ks.cp3o_delta(Z, K=1, minsize=30, verbose=FALSE)

Arguments

`Z`	A T x d matrix containing the length T time series with d-dimensional observations.
`K`	The maximum number of change points.
`minsize`	The minimum segment size. This is also the window size used to calculate between-segment distances.
`verbose`	A flag indicating if status updates should be printed.

Details

Segmentations are found through the use of dynamic programming and pruning. Between-segment distances are calculated only using points within a window of the segmentation point.

Value

The returned value is a list with the following components.

`number`	The estimated number of change points.
`estimates`	The location of the change points estimated by the procedure.
`gofM`	A vector of goodness of fit values for differing number of change points. The first entry corresponds to when there is only a single change point, the second for when there are two, and so on.
`cpLoc`	The list of locations of change points estimated by the procedure for different numbers of change points up to K.
`time`	The total amount to time take to estimate the change point locations.

Author(s)

Wenyu Zhang

References

Examples

set.seed(400)
x = matrix(c(rnorm(100),rnorm(100,3),rnorm(100,0,2)))
y = ks.cp3o_delta(Z=x, K=7, minsize=30, verbose=FALSE)
#View estimated change point locations
y$estimates
set.seed(400)
x = matrix(c(rnorm(100),rnorm(100,3),rnorm(100,0,2)))
y = ks.cp3o_delta(Z=x, K=7, minsize=30, verbose=FALSE)
#View estimated change point locations
y$estimates

Package 'ecp'

Help Index

Bladder Tumor Micro-Array Data

Description

Usage

Format

Source

References

Examples

Dow Jones Industrial Average Index

Description

Usage

Format

Source

References

Examples

ENERGY AGGLOMERATIVE

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

CHANGE POINTS ESTIMATION BY PRUNED OBJECTIVE (VIA E-STATISTIC)

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

CHANGE POINTS ESTIMATION BY PRUNED OBJECTIVE (VIA E-STATISTIC)

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

ENERGY DIVISIVE

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Kernel Change Point Analysis

Description

Usage

Arguments

Details

Value

Author(s)

References

CHANGE POINTS ESTIMATION BY PRUNED OBJECTIVE (VIA KOLMOGOROV-SMIRNOV STATISTIC)

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

CHANGE POINTS ESTIMATION BY PRUNED OBJECTIVE (VIA KOLMOGOROV-SMIRNOV STATISTIC)

Description

Usage

Arguments

Details

Value