Efficient Time Series Clustering: A Distance-Based Feature Engineering Framework with Minimal Hyperparameter Tuning
Abstract
Time series clustering is a critical tool for extracting valuable insights from time series data. However, it is challenging because of properties unique to time series, such as noise and data shifts. One major challenge lies in selecting an appropriate distance measure for the clustering algorithm, a choice that significantly affects overall clustering performance. This research introduces an improved time series clustering approach based on a novel feature extraction technique founded on an enhanced vector-based distance measure. Our feature extraction process, named DBFE, converts time series data into distance-based feature vectors using the enhanced distance measure, which is both efficient and hyperparameter-free; it addresses these time series challenges while remaining robust to noise, outliers, and simple shifts in the data. Experimental results show that our proposed approach improves clustering performance compared with state-of-the-art methods. When tested on 22 time series datasets and compared with traditional clustering approaches, clustering over DBFE produced better results on 18 datasets and equivalent results on two, and produced worse results on only two: one of these datasets is not suitable for clustering and the other is too small for meaningful evaluation. DBFE has also been extended to multivariate data and is therefore suitable for a wider range of time series applications in domains such as medicine, finance, and marketing. By applying this enhanced clustering approach, researchers can more accurately discover patterns, detect anomalies, and recognize dynamic changes in data.
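The abstract describes DBFE only at a high level, so the following is a minimal sketch of the general idea of distance-based feature vectors, not the authors' implementation. It assumes (1) that each series is represented by its distances to a set of reference series, here the dataset itself, and (2) z-normalized Euclidean distance as a stand-in for the paper's enhanced vector-based distance measure, which the abstract does not define. The helper names `znorm`, `distance`, and `distance_features` are hypothetical. Z-normalization makes the stand-in measure insensitive to constant offsets and scaling, which loosely mirrors the robustness to simple shifts claimed for DBFE.

```python
import numpy as np


def znorm(x, eps=1e-8):
    # Z-normalize one series so constant offsets and scaling do not dominate distances.
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.std() + eps)


def distance(a, b):
    # ASSUMPTION: stand-in distance (Euclidean on z-normalized, equal-length series).
    # The paper's enhanced vector-based measure is not specified in the abstract.
    return float(np.linalg.norm(znorm(a) - znorm(b)))


def distance_features(series, references):
    # Hypothetical helper: row i holds the distances from series i to every
    # reference series, giving each series a fixed-length feature vector.
    feats = np.empty((len(series), len(references)))
    for i, s in enumerate(series):
        for j, r in enumerate(references):
            feats[i, j] = distance(s, r)
    return feats


if __name__ == "__main__":
    t = np.linspace(0, 2 * np.pi, 50)
    sine = np.sin(t)
    shifted_sine = 3 * np.sin(t) + 5          # scaled and offset copy of `sine`
    square = np.sign(np.sin(3 * t))
    shifted_square = 2 * square + 1           # scaled and offset copy of `square`
    data = [sine, shifted_sine, square, shifted_square]
    features = distance_features(data, data)  # shape (4, 4)
    print(np.round(features, 3))
```

In this toy example the shifted copies land at (near) zero distance from their originals, so rows 0 and 1 and rows 2 and 3 have nearly identical feature vectors. The rows can then be passed to any standard vector-space clustering algorithm, for example scikit-learn's `KMeans`.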