Attimo: fast time series motif mining

Attimo is a software package for quickly finding motifs in time series. Its name, that is Italian means instant, is an acronym for AdapTive TImeseries MOtifs: in fact, the peculiarity of the algorithm underlying this implementation is that it is adaptive to the data distribution.

What is a motif?

A time series (i.e. a sequence of time-varying values) can be chunked in several subsequences of a given length \(w\). The similarity between two given subsequences of a time series can be measured using several distance measures. A very common choice is the Euclidean distance, along with its variant the z-normalized Euclidean distance.

Intuitively, a motif is a subsequence in a time series that has at least one similar occurrence in another location of the same time series.

More formally, for a fixed subsequence length \(w\) and for a given distance function between subsequences of length \(m\) of a time series, we can define the top motif as being the pair of subsequences at minimum distance. Since subsequences can overlap, usually overlapping subsequences are ignored¹.

How does this software work?

This software is implemented in the Rust programming language and can be used as a Rust library. For convenience, we provide a Python wrapper providing the full functionality.

In particular, you can specify an input time series and the desired window length, and the library will return a lazy iterator of the top motifs:

# Load the library
import pyattimo
# Load an example dataset
ts = pyattimo.load_dataset("ecg", prefix=100000)
# Build the iterator
motifs_iter = pyattimo.MotifsIterator(ts, w=1000, max_k=10)
# Iterate through the motifs
for rank, motif in enumerate(motifs_iter):
    print("Motif {} between {} and {} at distance {:.3}".format(
        rank,
        motif.a,
        motif.b,
        motif.distance
    ))

Motif 0 between 82088 and 89708 at distance 0.814
Motif 1 between 44164 and 95648 at distance 0.909

Motif 2 between 26581 and 93613 at distance 1.07
Motif 3 between 28272 and 30704 at distance 1.27
Motif 4 between 84732 and 97912 at distance 1.29
Motif 5 between 1328 and 17384 at distance 1.32
Motif 6 between 23572 and 49424 at distance 1.38
Motif 7 between 33016 and 34264 at distance 1.49
Motif 8 between 18499 and 96675 at distance 1.58
Motif 9 between 48287 and 77807 at distance 1.65

How does the algorithm work?

The algorithm is based on Locality Sensitive Hashing and takes advantage of the distribution of distances in the dataset.

The details are described in the paper Fast and Scalable Mining of Time Series Motifs with Probabilistic Guarantees by Matteo Ceccarello and Johann Gamper, soon to appear in PVLDB 15(13).

Footnotes

This definition of subsequences to be ignored is somewhat flexible. For instance, we can allow for overlaps of at most \(w/4\), or \(w/2\)↩︎