Abridged Symbolic Representation of Time Series for Clustering
DOI: https://doi.org/10.18778/0208-6018.341.03
Keywords: clustering, time series, symbolic representation, data mining
Abstract
In recent years several methods for the symbolic representation of time series have been introduced or developed. This activity is mainly justified by practical considerations such as memory savings or fast database searching. However, some results suggest that, in time series clustering, symbolic representation can even improve the clustering results. The article proposes a new algorithm for the abridged symbolic representation of time series, with the emphasis on efficient time series clustering. The proposal is based on the PAA (piecewise aggregate approximation) technique followed by segmentwise correlation analysis. The primary goal of the article is to improve the PAA technique with respect to time series clustering (its speed and quality). We also tried to answer the following questions. Is clustering time series in their original form reasonable? How much memory can we save using the new algorithm? The efficiency of the new algorithm was investigated on empirical time series data sets. The results show that the new proposal is quite effective and requires only a very limited amount of parameter setting by the user.
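For readers unfamiliar with PAA, the following is a minimal sketch of the standard technique only: a series is cut into segments of nearly equal length and each segment is replaced by its mean. It is not the algorithm proposed in the article, and the segmentwise correlation step is not shown because the abstract does not specify it. The function name `paa` and the parameter `n_segments` are illustrative assumptions.

```python
from statistics import fmean


def paa(series, n_segments):
    """Plain piecewise aggregate approximation (illustrative sketch).

    Splits the series into n_segments contiguous, nearly equal-length
    segments and returns the mean of each segment.
    """
    values = [float(v) for v in series]
    n = len(values)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(series)")
    # Segment i covers indices [i*n // n_segments, (i+1)*n // n_segments),
    # which also handles lengths not divisible by n_segments.
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    return [fmean(values[a:b]) for a, b in zip(bounds, bounds[1:])]


# Eight observations reduced to four segment means.
print(paa([1, 2, 3, 4, 5, 6, 7, 8], 4))  # [1.5, 3.5, 5.5, 7.5]
```

In this toy case the reduced series stores half as many values as the original, which is the kind of memory saving the abstract refers to; the actual saving depends on the chosen number of segments.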