Feature selection approaches for predictive modelling of cadmium sources and pollution levels in water springs

Publisher

Springer Science and Business Media Deutschland GmbH

Abstract

The World Health Organization lists cadmium (Cd) as one of the top ten chemicals of public health concern. Cd is toxic at relatively low exposure levels and has acute and chronic effects on both health and the environment. In this study, we investigate a suite of data-driven methods that could help decision-makers estimate Cd levels in water springs and identify pollution sources. Machine learning (ML) regression models, based on support vector machines and a variety of tree-based models including Random Forests, M5Tree, CatBoost, and gradient boosting, were used to identify sources of contamination and predict Cd levels. Feature selection analysis revealed that heavy traffic and distance to a major power plant in the sampled area play a leading role in spring Cd contamination, together with precipitation levels and the average slope of the closest waste dumps upstream of the sampled springs. Our best-performing ML model was the AdaBoost regression tree using all the features (RMSE = 19.36, R^2 = 0.64). Our findings highlight the effectiveness of predictive data-driven modeling in addressing environmental challenges, particularly in high-risk, low-resource areas. © 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
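The abstract reports model quality as RMSE and R^2 but does not state the units of RMSE, the train/test split, or which R^2 definition was used. As a minimal sketch, assuming the common definitions RMSE = sqrt(mean((y - y_hat)^2)) and R^2 = 1 - SS_res / SS_tot (the form used by, e.g., scikit-learn's `r2_score`), the two metrics can be computed as follows. The function names and the sample values are hypothetical illustrations, not data or code from the study.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error: sqrt(mean((y - y_hat)^2)), in the units of y."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(y - p) ** 2 for y, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, assuming the 1 - SS_res / SS_tot definition."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))  # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in observed)  # total sum of squares about the mean
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical measured vs. predicted Cd levels (illustrative values only).
    observed = [10.0, 20.0, 30.0, 40.0]
    predicted = [12.0, 18.0, 33.0, 37.0]
    print(rmse(observed, predicted), r_squared(observed, predicted))
```

Under this definition R^2 can be negative for a model worse than predicting the mean, which is one reason it is usually reported alongside an error metric such as RMSE.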

Keywords

Cadmium (Cd), Machine learning, Solid waste, Traffic emission, Water pollution, Cadmium, Environmental pollution, Natural springs, Water, World, Numerical model, Prediction, Spring water, Natural spring, Pollution