Abridged Symbolic Representation of Time Series for Clustering

Jerzy Korzeniewski

doi:10.18778/0208-6018.341.03

Authors

Jerzy Korzeniewski, Uniwersytet Łódzki, https://orcid.org/0000-0001-6526-5921

Keywords:

clustering, time series, symbolic representation, data mining

Abstract

In recent years, several methods for the symbolic representation of time series have been introduced or developed. This work is motivated mainly by practical considerations such as memory savings and fast database searching. However, some results suggest that symbolic representation can also improve the quality of time series clustering. This article proposes a new algorithm for the abridged symbolic representation of time series, with an emphasis on efficient clustering. The proposal is based on the PAA (piecewise aggregate approximation) technique followed by segmentwise correlation analysis. The primary goal is to improve on the PAA technique with respect to the speed and quality of subsequent time series clustering. We also address two questions: Is it reasonable to cluster time series in their original form? How much memory can be saved using the new algorithm? The efficiency of the new algorithm was investigated on empirical time series data sets. The results indicate that the new proposal is quite effective and requires very little parameter tuning by the user.
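
For readers unfamiliar with PAA, the following is a minimal sketch of the standard technique only: the series is cut into contiguous frames and each frame is replaced by its mean. It is not the algorithm proposed in the article; the segmentwise correlation step is not specified in the abstract and is not reproduced here. The function name `paa`, the parameter `n_segments`, and the sample data are illustrative assumptions.

```python
from statistics import fmean


def paa(series, n_segments):
    """Standard piecewise aggregate approximation (illustrative sketch).

    Splits `series` into `n_segments` contiguous frames and returns the
    mean of each frame. The name and signature are assumptions, not taken
    from the article.
    """
    n = len(series)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(series)")
    # Frame boundaries; integer arithmetic spreads any remainder over frames.
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    return [fmean(series[lo:hi]) for lo, hi in zip(bounds, bounds[1:])]


# Hypothetical data: 8 observations reduced to 4 frame means.
series = [1.0, 2.0, 3.0, 4.0, 6.0, 8.0, 7.0, 5.0]
reduced = paa(series, 4)
print(reduced)                      # [1.5, 3.5, 7.0, 6.0]
print(len(series) / len(reduced))   # 2.0, the reduction factor in stored values
```

In this sketch the memory saving is simply the ratio of the original length to the number of frames; any further saving from symbolic encoding would depend on details of the article's method that the abstract does not give.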
