Heavy-Tailed K-Means, Linear Regression and PCA

Authors

Sayde, Mario

Abstract

Traditional machine learning algorithms are formulated under the implicit assumption that the empirical data are "well-behaved". This assumption fails for heavy-tailed data, whose underlying distribution may lack finite moments. Heavy-tailed datasets are prevalent in fields such as finance, telecommunications, and geophysics, where rare but impactful events dominate statistical behavior. In these contexts, classical methods such as linear regression, standard K-means clustering, and Principal Component Analysis (PCA) break down, since each relies on sample means or second-order statistics that may not exist for such distributions. To address this limitation, we introduce and validate heavy-tailed versions of these algorithms, designed specifically for this type of data in both scalar and multidimensional settings. The proposed approaches rely on recently introduced robust measures of location and power tailored to heavy-tailed characteristics. Extensive evaluations show that the new algorithms outperform existing specialized techniques in the current literature.
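
As a rough illustration of why moment-based estimators struggle here, the sketch below draws standard Cauchy samples (a distribution with no finite mean) and compares how far the sample mean and the sample median typically land from the true location, 0. The median is used only as a familiar stand-in for a robust location estimate; it is not the measure of location proposed in this work, and the function names, sample size, and trial count are assumptions made for the example.

```python
import math
import random
import statistics


def cauchy_sample(rng, n):
    """Draw n standard Cauchy variates via inverse-CDF sampling."""
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


def typical_error(estimator, trials=200, n=1001, seed=0):
    """Median absolute error of an estimator of the true location (0)."""
    rng = random.Random(seed)
    errors = [abs(estimator(cauchy_sample(rng, n))) for _ in range(trials)]
    return statistics.median(errors)


# The sample mean of Cauchy data does not concentrate as n grows,
# whereas the sample median does.
mean_error = typical_error(statistics.fmean)
median_error = typical_error(statistics.median)
print(f"typical error of sample mean:   {mean_error:.3f}")
print(f"typical error of sample median: {median_error:.3f}")
```

The same effect carries over to K-means centroids, least-squares fits, and sample covariance matrices, all of which average the data in some form.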
