Temporal Data Mining


Knowledge Discovery and Data Mining for Time Series

The field of knowledge discovery and data mining (DM for short) deals with the extraction of interesting patterns or knowledge from often huge amounts of raw data. Patterns count as interesting if they are valid, novel, useful, and understandable. Of these criteria, validity is usually seen as the most important.


From data to wisdom – the Data Mining pyramid (Embrechts et al., 2004).

Temporal data mining (TDM) addresses tasks such as segmentation, classification, clustering, forecasting, and indexing of time series, event sequences, or sections of time series or sequences. Typical applications involve financial, biomedical, meteorological, or technical time series or sequences.

Efficient Time Series Modeling Techniques

In our work in the field of TDM we focus on a fusion of probabilistic modeling techniques with extremely fast polynomial least-squares approximation techniques (with E. Fuchs, Passau). These techniques allow for

  • new time series segmentation and representation methods, e.g., a piecewise probabilistic representation, which includes a piecewise polynomial representation or shape space representation, and
  • new kinds of distance (similarity) measures for time series based on divergence measures for probability distributions, such as the Kullback-Leibler divergence.
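
To make the idea of a divergence-based distance concrete, the following minimal sketch computes the Kullback-Leibler divergence between two univariate Gaussian distributions and symmetrizes it so that it can serve as a distance-like score. The choice of univariate Gaussians, the symmetrization by summing both directions, and the function names `kl_gauss` and `symmetric_kl` are assumptions made for illustration; the probabilistic models used in the work described here may differ.

```python
import math


def kl_gauss(mu_p, sigma_p, mu_q, sigma_q):
    """KL(p || q) for univariate Gaussians p = N(mu_p, sigma_p^2), q = N(mu_q, sigma_q^2)."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)


def symmetric_kl(mu_p, sigma_p, mu_q, sigma_q):
    """Symmetrized divergence (illustrative choice): KL(p || q) + KL(q || p)."""
    return (kl_gauss(mu_p, sigma_p, mu_q, sigma_q)
            + kl_gauss(mu_q, sigma_q, mu_p, sigma_p))


# Identical distributions have zero divergence; shifting the mean increases it.
d_same = symmetric_kl(0.0, 1.0, 0.0, 1.0)
d_shift = symmetric_kl(0.0, 1.0, 1.0, 1.0)
```

The plain Kullback-Leibler divergence is not symmetric, which is why some symmetrization is commonly applied before using it as a similarity measure.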

The time complexity of these techniques is only linear with respect to the overall number of time series or – if they are applied on-line – constant for each new sample (observation). Therefore, these techniques are suitable for many real-time applications.
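
The constant cost per new sample can be illustrated with a much simpler stand-in than the approximation techniques referred to above: a least-squares straight-line fit over a growing window that only maintains a handful of running sums. The class name `OnlineLineFit`, the restriction to degree 1, and the assumption of equidistant samples at t = 0, 1, 2, ... are choices made for this sketch and are not taken from the original work.

```python
class OnlineLineFit:
    """Least-squares line y = a + b*t over a growing window, O(1) work per sample."""

    def __init__(self):
        self.n = 0       # number of samples seen so far
        self.st = 0.0    # running sum of t
        self.stt = 0.0   # running sum of t^2
        self.sy = 0.0    # running sum of y
        self.sty = 0.0   # running sum of t*y

    def update(self, y):
        # Samples are assumed equidistant: the k-th sample arrives at t = k.
        t = float(self.n)
        self.n += 1
        self.st += t
        self.stt += t * t
        self.sy += y
        self.sty += t * y

    def coefficients(self):
        """Return (intercept, slope) of the current least-squares line."""
        if self.n < 2:
            raise ValueError("need at least two samples")
        denom = self.n * self.stt - self.st ** 2
        slope = (self.n * self.sty - self.st * self.sy) / denom
        intercept = (self.sy - slope * self.st) / self.n
        return intercept, slope


fit = OnlineLineFit()
for y in [1.0, 3.0, 5.0, 7.0]:   # samples of y = 1 + 2*t
    fit.update(y)
intercept, slope = fit.coefficients()
```

Each call to `update` touches a fixed number of variables regardless of how many samples have been seen, which is the property that makes this style of fitting attractive for on-line use.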


Example for a piecewise probabilistic model of a time series.


Example: Detection of three days with unusual energy consumption in a building (blue color).

With the new segmentation, representation, and similarity measurement techniques, new and extremely fast techniques for motif detection in time series and time series clustering, classification, or prediction become possible as well. It is also possible to extract understandable temporal rules for time series classification or anomaly detection in time series.

In our work we develop a framework for TDM which we call SwiftMiner. Up to now it consists of modules such as SwiftSeg, SwiftMotif, and SwiftRule. We are interested in a broad applicability of our SwiftMiner approach, but we also investigate a few application scenarios in much more detail:

  • detection of human activities with body-worn sensors (with P. Lukowicz, Passau, and B. Schiele, Darmstadt),
  • evaluation of signals from biometric writing systems (see page on this topic).

The shape space representation of time series can also be used to generate a large number of artificial but realistic time series for benchmarking database systems (with H. Kosch, Passau).
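
One way to picture such a generator is sketched below: each segment is described by a vector of polynomial coefficients, new segments are drawn by perturbing hypothetical prototype vectors with Gaussian noise, and the segments are rendered and joined end to end. Everything here is an assumption for illustration, including the prototype values, the noise model, the joining rule, and the function names `render_segment` and `generate_series`; it is not the generator used in the cited benchmark work.

```python
import random


def render_segment(coeffs, length):
    """Evaluate the polynomial sum_k coeffs[k] * t^k at t = 0 .. length-1."""
    return [sum(c * t ** k for k, c in enumerate(coeffs)) for t in range(length)]


def generate_series(prototypes, n_segments, length, noise=0.05, seed=0):
    """Concatenate n_segments noisy copies of randomly chosen prototype shapes."""
    rng = random.Random(seed)   # seeded for reproducible output
    series = []
    offset = 0.0
    for _ in range(n_segments):
        proto = rng.choice(prototypes)
        # Perturb the shape (slope, curvature, ...) of the chosen prototype.
        coeffs = [c + rng.gauss(0.0, noise) for c in proto]
        # Start each segment at the value where the previous one ended.
        coeffs[0] = offset
        values = render_segment(coeffs, length)
        series.extend(values)
        offset = values[-1]
    return series


# Hypothetical prototypes: (offset, slope, curvature) for rising, falling, bending.
prototypes = [(0.0, 0.5, 0.0), (0.0, -0.5, 0.0), (0.0, 0.0, 0.1)]
series = generate_series(prototypes, n_segments=4, length=10)
```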



The video shows the segmentation and representation of an ECG time series with SwiftSeg (slow motion). Segmentation points are indicated by red vertical lines, segmentation polynomials in green color, and modeling polynomials in blue color.



Further Information

Funding:

  • German Research Foundation (DFG); grant SI 674/6-1.

Publications:

U. Blanke, B. Schiele, M. Kreil, P. Lukowicz, B. Sick, T. Gruber; All for one or one for all? - Combining Heterogeneous Features for Activity Spotting; 7th IEEE Workshop on Context Modeling and Reasoning (CoMoRea) at the 8th IEEE International Conference on Pervasive Computing and Communication (PerCom 2010); pp. 18-24; Mannheim, 2010


E. Fuchs, T. Gruber, H. Pree, B. Sick; Temporal Data Mining Using Shape Space Representations of Time Series; in: Neurocomputing; (accepted)


D. Fisch, T. Gruber, B. Sick; SwiftRule: Mining Comprehensible Classification Rules for Time Series Analysis; in: IEEE Transactions on Knowledge and Data Engineering; (accepted)


E. Fuchs, T. Gruber, J. Nitschke, B. Sick; Online Segmentation of Time Series Based on Polynomial Least-Squares Approximations; in: IEEE Transactions on Pattern Analysis and Machine Intelligence; vol. 32, no. 12, pp. 2232-2245; 2010


T. Rabl, A. Lang, T. Hackl, B. Sick, H. Kosch; Generating Shifting Workloads to Benchmark Adaptability in Relational Database Systems; in: R. Nambiar, M. Poess (Eds.): Performance Evaluation and Benchmarking; Lecture Notes in Computer Science 5895, Springer Verlag, Berlin, Heidelberg, New York; pp. 116-131; Proceedings of the "First TPC Technology Conference, TPCTC"; Lyon, 2009


E. Fuchs, T. Gruber, J. Nitschke, B. Sick; On-line Motif Detection in Time Series With SwiftMotif; in: Pattern Recognition; vol. 42, no. 11, pp. 3015-3031; 2009


E. Fuchs, C. Gruber, T. Reitmaier, B. Sick; Processing Short-Term and Long-Term Information With a Combination of Polynomial Approximation Techniques and Time-Delay Neural Networks; in: IEEE Transactions on Neural Networks; vol. 20, no. 9, pp. 1450-1462; 2009


and others...