Abstract:
Petroleum and its products undergo large-scale production, transportation, and storage, which makes them prone to spills and leaks into the environment. Petroleum contamination of terrestrial environments, particularly soils, is common and has major consequences for crops and food, microbial communities, the atmosphere, the hydrosphere, public health and safety, and the soil itself. It therefore requires prompt detection and assessment wherever contamination may be present.
In this study, hyperspectral imaging combined with advanced machine learning and deep learning methods is used to predict petroleum hydrocarbon contamination in soil. Hyperspectral imaging combines imaging and spectroscopy and can detect petroleum hydrocarbons through the characteristic absorption features in their reflectance spectra. Laboratory-prepared samples of three different soil types are contaminated with various petroleum hydrocarbons, including crude oil, diesel, and gasoline. The resulting samples are scanned with a hyperspectral camera in a laboratory setup and then analyzed by gas chromatography to obtain reference petroleum hydrocarbon concentrations, which serve as ground truth for model training and evaluation.
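To make the idea of a characteristic absorption feature concrete, the following minimal Python sketch computes a continuum-removed band depth at a single wavelength. It assumes a straight-line continuum between two shoulder wavelengths; the function names and the 1680/1730/1780 nm wavelengths are illustrative assumptions (the region near 1730 nm is one where hydrocarbon C-H absorption is commonly reported), not the bands or processing used in this study.

```python
def interp(wavelengths, reflectance, target):
    """Linearly interpolate reflectance at a target wavelength (nm).

    Assumes `wavelengths` is sorted in ascending order.
    """
    points = list(zip(wavelengths, reflectance))
    for (w0, r0), (w1, r1) in zip(points, points[1:]):
        if w0 <= target <= w1:
            t = (target - w0) / (w1 - w0)
            return r0 + t * (r1 - r0)
    raise ValueError(f"{target} nm is outside the sampled range")


def band_depth(wavelengths, reflectance, left, center, right):
    """Continuum-removed band depth: 1 - R(center) / R_continuum(center).

    The continuum is assumed to be a straight line joining the reflectance
    at the `left` and `right` shoulder wavelengths (a hypothetical choice).
    A deeper absorption feature gives a larger band depth.
    """
    r_left = interp(wavelengths, reflectance, left)
    r_right = interp(wavelengths, reflectance, right)
    r_center = interp(wavelengths, reflectance, center)
    t = (center - left) / (right - left)
    continuum = r_left + t * (r_right - r_left)
    return 1.0 - r_center / continuum
```

For example, a spectrum sampled at 1680, 1730, and 1780 nm with reflectances 0.40, 0.30, and 0.40 gives a band depth of about 0.25 at 1730 nm, while a flat spectrum gives 0. Band depth is only one simple way to quantify a feature; the models in this study learn from the spectra directly.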
The collected data are used to train a separate data-driven model for each combination of soil type and petroleum hydrocarbon to quantitatively predict the petroleum hydrocarbon concentration in the soil samples. The models performed well, with the best results reaching an R-squared of 0.96 and an RMSE of 600 mg/kg on test data over a concentration range of 0 to 10,000 mg/kg. Performance depended strongly on both the petroleum hydrocarbon and the soil type. Gasoline models struggled to relate the input spectra to contamination levels, whereas crude oil and diesel models performed better because of their more pronounced spectral features. This study also tested using selected spectral bands, rather than the full spectra, as model inputs, which improved performance by reducing overfitting. The XGB (XGBoost) regressor achieved a good balance between training and testing performance while also being simpler to train, and is therefore recommended for most spectral applications. Future work may examine other factors that could affect model performance, such as mixed contaminants, mixed soil types, and soil moisture content.
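The two test metrics reported above, and the idea of feeding a model only selected spectral bands, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's pipeline: `select_bands` and any wavelength windows passed to it are hypothetical, and the abstract does not specify how the bands were actually chosen.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the targets (e.g. mg/kg)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def select_bands(wavelengths, windows):
    """Hypothetical helper: indices of bands inside any (low, high) window in nm.

    Restricting model inputs to these indices is one simple form of band
    selection; fewer inputs leave the model less room to overfit.
    """
    return [
        i for i, w in enumerate(wavelengths)
        if any(low <= w <= high for low, high in windows)
    ]
```

For instance, reference values of 1000 and 3000 mg/kg predicted as 1600 and 2400 mg/kg give an RMSE of 600 mg/kg and an R-squared of about 0.64. Calling `select_bands([1000, 1700, 1730, 1760, 2300, 2350], [(1700, 1760), (2290, 2320)])` returns `[1, 2, 3, 4]`, the column indices that would be kept from each spectrum before training a regressor.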