Abstract:
Pipelines are vital to the global economy, transporting large volumes of fluids across
thousands of miles. They pass mainly through rural areas and are usually buried underground
or laid on the seabed. The harsh environments in which they operate make them prone to
various types of failure, with corrosion considered one of the major causes.
This study focused on predicting the remaining service life of pipes before they fail
due to corrosion, using a large dataset published by the Pipeline and Hazardous Materials
Safety Administration (PHMSA). The PHMSA dataset includes a large number of predictive
fields, which were supplemented with weather data, such as temperature and precipitation, extracted
from the National Centers for Environmental Information. A regression model was built to
predict the remaining time until failure, and classification models were built to predict the
interval in which that remaining time falls. Top performance was achieved using the Extremely
Randomized Trees (Extra Trees) algorithm, with an R-squared score of 90.35% for regression and
an F1 score of 85% for classification. The importance of each feature used to build the model was
assessed using SHapley Additive exPlanations (SHAP) to explain the outputs of the models
and identify the factors that contribute most to accelerated pipe failure. It was
concluded that weather conditions such as temperature and precipitation play a major role in
pipe failure.
Future work may include implementing predictive maintenance based on the predicted
time left before failure, as well as applying the approach to datasets from other types of
structures.