A Run Length Transformation for Discriminating Between Auto Regressive Time Series

Bagnall, Anthony; Janacek, Gareth

doi:10.1007/s00357-013-9135-6

A Run Length Transformation for Discriminating Between Auto Regressive Time Series

Published: 18 July 2013

Volume 31, pages 154–178, (2014)
Cite this article

Journal of Classification Aims and scope Submit manuscript

Anthony Bagnall¹ &
Gareth Janacek¹

565 Accesses
112 Citations
Explore all metrics

Abstract

We describe a simple time series transformation to detect differences in series that can be accurately modelled as stationary autoregressive (AR) processes. The transformation involves forming the histogram of above and below the mean run lengths. The run length (RL) transformation has the benefits of being very fast, compact and updatable for new data in constant time. Furthermore, it can be generated directly from data that has already been highly compressed. We first establish the theoretical asymptotic relationship between run length distributions and AR models through consideration of the zero crossing probability and the distribution of runs. We benchmark our transformation against two alternatives: the truncated Autocorrelation function (ACF) transform and the AR transformation, which involves the standard method of fitting the partial autocorrelation coefficients with the Durbin-Levinson recursions and using the Akaike Information Criterion stopping procedure. Whilst optimal in the idealized scenario, representing the data in these ways is time consuming and the representation cannot be updated online for new data. We show that for classification problems the accuracy obtained through using the run length distribution tends towards that obtained from using the full fitted models. We then propose three alternative distance measures for run length distributions based on Gower’s general similarity coefficient, the likelihood ratio and dynamic time warping (DTW). Through simulated classification experiments we show that a nearest neighbour distance based on DTW converges to the optimal faster than classifiers based on Euclidean distance, Gower’s coefficient and the likelihood ratio. We experiment with a variety of classifiers and demonstrate that although the RL transform requires more data than the best performing classifier to achieve the same accuracy as AR or ACF, this factor is at worst non-increasing with the series length, m, whereas the relative time taken to fit AR and ACF increases with m. We conclude that if the data is stationary and can be suitably modelled by an AR series, and if time is an important factor in reaching a discriminatory decision, then the run length distribution transform is a simple and effective transformation to use.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

References

ABRAHAM, Z., and TAN, P. (2010), “An Integrated Framework for Simultaneous Classification and Regression of Time-Series Data”, in Proceedings of the SIAM International Conference on Data Mining, SDM 2010, pp. 653–664.
AGRAWAL, R., FALOUTSOS, C., and SWAMI, A. (1993), “Efficient Similarity Search in Sequence Databases”, in Proceedings of the 4th International Conference on Foundations of Data Organization and Algorithms, Chicago, also in Lecture Notes in Computer Science 730, Springer Verlag, pp. 169–184.
BAGNALL, A., and JANACEK, G. (2004), “Clustering Time Series from ARMA Models with Clipped Data”, Machine Learning, 58(2), 151–178.
Google Scholar
BAGNALL, A., RATANAMAHATANA, C., KEOGH, E., LONARDI, S., and JANACEK, G. (2006), “A Bit Level Representation for Time Series Data Mining with Shape Based Similarity”, Data Mining and Knowledge Discovery Journal, 13(1), 11–40.
Article MathSciNet Google Scholar
BAGNALL, A., DAVIS, L., HILLS, J., and LINES, J. (2012), “Transformation Based Ensembles for Time Series Classification”, in Proceedings of the International Conference on Data Mining, SDM 2012, pp. 307–318.
BOJANCZYK, A.W., BRENT, R.P., and De HOOG, F.R., (1995), “A Weakly Stable algorithm for General Toeplitz Systems”, Numerical Algorithms, 10, 225–244.
Article MATH MathSciNet Google Scholar
BOX, G., JENKINS, G., and REINSEL, G., (2008), Time Series Analysis, Forecasting and Control (4th ed.), John Wiley and Sons.
BUZA, K., NANOPOULOS, A., and SCHMIDT-THIEME, L. (2011), “Fusion of Similarity Measures for Time Series Classification”, in Proceedings of the 6th International Conference on Hybrid Artificial Intelligent Systems, also in Lecture Notes in Computer Science 6679, Springer Verlag, pp. 253–261.
CORDUAS, M., and PICCOLO, D. (2008), “Time Series Clustering and Classification by the Autoregressive Metric”, Computational Statistics and Data Analysis, 52(4), 1860–1872.
Article MATH MathSciNet Google Scholar
COX, T., and COX, M. (2000), “A General Weighted Two-Way Dissimilarity Coefficient”, Journal of Classification, 17(1), 101–121.
Article MATH MathSciNet Google Scholar
DENG, H., RUNGER, G., TUV, E., and VLADIMIR, M. (2013), ”A Time Series Forest for Classification and Feature Extraction”, Information Sciences, 239, 142–153.
Article MathSciNet Google Scholar
DING, H., TRAJCEVSKI, G., SCHEUERMANN, P., WANG,X., and KEOGH, E. (2008), “Querying and Mining of Time Series Data: Experimental Comparison of Representations and Distance Measure”, in Proceedings of the 34th International Conference on Very Large Data Bases (VLDB), pp. 1542–1552.
DOUZAL-CHOUAKRIA, A., and AMBLARD, C. (2012), “Classification Trees for Time Series”, Pattern Recognition, 45(3), 1076–1091.
Article Google Scholar
DOUZAL-CHOUAKRIA, A., DIALLO, A., and GIROUD, F. (2010), “A Random-Periods Model for the Comparison of a Metrics Efficiency to Classify Cell-Cycle Expressed Genes”, Pattern Recognition Letters, 31(12), 1608–1617.
Google Scholar
DURBIN, J. (1960), “The Fitting of Time Series Models”, Review of the International Statistical Institute, 28(4), 233–243.
Article MATH Google Scholar
GOWER, J.C. (1971), “A General Coefficient of Similarity and Some of its Properties”, Biometrics, 27(4), 857–871.
Article Google Scholar
HE, S., and KEDEM, B. (1990), “The Zero-Crossing Rate of Autoregressive Processes and Its Link to Unit Roots”, Journal of Time Series Analysis, 11(3), 201–213.
Article MATH MathSciNet Google Scholar
JEONG, Y., JEONG, M., and OMITAOMU, O. (2011), “Weighted Dynamic Time Warping for Time Series Classification”, Pattern Recognition, 44(9), 2231–2240.
Article Google Scholar
KALPAKIS, K., GADA, D., and PUTTANGUNTA, V. (2001), “Distance Measures for Effective Clustering of ARIMA Time-Series”, in IEEE Proceedings of the International Conference on Data Mining (ICDM), San Jose CA, pp. 273–280.
KEOGH, E., and FOLIAS, T., “The UCR Time SeriesDataMining Archive”, http://www.cs.ucr.edu/~eamonn/time_series_data/.
LIAO, T.W. (2005), “Clustering of Time Series Data: A Survey”, Pattern Recognition, 38(11), 1857–1874.
Article MATH Google Scholar
LINES, J., DAVIS, L., HILLS, J., and BAGNALL, A. (2012), “A Shapelet Transform for Time Series Classification”, in Proceedings of the 18th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Beijing, China, pp. 289–297.
MAHARAJ, E.A. (2000), Clusters of Time Series, Journal of Classification, 17, 297–314.
Article MATH MathSciNet Google Scholar
MAHARAJ, E.A. (1999), “Comparison and Classification of Stationary Multivariate Time Series”, Pattern Recognition, 32(7), 1129–1138.
Article MathSciNet Google Scholar
MAHARAJ, E.A. (1996), “A Significance Test for Classifying ARMA Models”, Journal of Statistical Computation and Simulation, 54(4), 305–331.
Article MATH MathSciNet Google Scholar
PENG, C.K., MIETUS, J.E., LIU, Y., KHALSA, G., DOUGLAS, P.S., BENSON, H., and GOLDBERGER, A.L. (1999), “Exaggerated Heart Rate Oscillations During Two Meditation Techniques”, International Journal of Cardiology, 70(2), 101–107.
Article Google Scholar
PHYSIONET, “PhysioBank’s Data Archive”, http://www.physionet.org/.
PICCOLO, D. (1990), “A Distance Measure for Classifying ARIMA Models”, Journal of Time Series Analysis, 11(2), 153–164.
Article MATH Google Scholar
QAIRUNNISA, S., CHANDRASHEKAR,M., REVATHI,M., KONDAM, A.,MADHURI, B.A., and SUESH, M. (2012), “A Study on Modulation on Cardiovascular Response to Yoga Training”, International Journal of Biological and Medical Research, 3(2),1662–1666.
Google Scholar
RATANAMAHATANA, C., and KEOGH, E. (2005), “Three Myths About Dynamic Time Warping Data Mining”, in Proceedings of the SIAM International Conference on Data Mining, DSM 2005, pp. 506–510.
RODRIGUEZ, J., ALONSO, C., and MAESTRO, J. (2005), ”Support Vector Machines of Interval-Based Features for Time Series Classification”, Knowledge-Based Systems 18, pp. 171–178.
Article Google Scholar
VAN WYK, B., VAN WYK, M., and QI, G. (2009), “Difference Histograms: A New Tool for Time Series Analysis Applied to Bearing Fault Diagnosis”, Pattern Recognition Letters, 30(6), 595–599.
Article Google Scholar
WITTEN, I., FRANK, L., and HALL,M., (2011), Data Mining: Practical Machine Learning Tools and Techniques (3rd Ed.), San Francisco: Morgan Kaufmann.
WU, Y., AGRAWAL, D., and EL ABBADI, A. (2000), “A Comparison of DFT and DWT Based Similartiy Search in Time-Series Databases”, in Proceedings of the 9th International Conference on Information Knowledge Management, McLean VA, NewYork: ACM Press, pp. 488–495.
YE, L., and KEOGH, E. (2009), “Time Series Shapelets: A New Primitive for Data Mining”, in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, New York: ACM Press, pp. 947–956.

Download references

Author information

Authors and Affiliations

University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, UK
Anthony Bagnall & Gareth Janacek

Authors

Anthony Bagnall
View author publications
You can also search for this author in PubMed Google Scholar
Gareth Janacek
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Anthony Bagnall.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Bagnall, A., Janacek, G. A Run Length Transformation for Discriminating Between Auto Regressive Time Series. J Classif 31, 154–178 (2014). https://doi.org/10.1007/s00357-013-9135-6

Download citation

Published: 18 July 2013
Issue Date: July 2014
DOI: https://doi.org/10.1007/s00357-013-9135-6

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A Run Length Transformation for Discriminating Between Auto Regressive Time Series

Abstract

Access this article

Similar content being viewed by others

Time Series Segmentation Procedures to Detect, Locate and Estimate Change-Points

A Novel First-Order Autoregressive Moving Average Model to Analyze Discrete-Time Series Irregularly Observed

Comparing Multistep Ahead Forecasting Functions for Time Series Clustering

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

A Run Length Transformation for Discriminating Between Auto Regressive Time Series

Abstract

Access this article

Similar content being viewed by others

Time Series Segmentation Procedures to Detect, Locate and Estimate Change-Points

A Novel First-Order Autoregressive Moving Average Model to Analyze Discrete-Time Series Irregularly Observed

Comparing Multistep Ahead Forecasting Functions for Time Series Clustering

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation