Abstract:
The practice of medicine has evolved significantly during the past decade, with the
emergence of new diagnostic and prognostic tools allowing for a better implementation
of “precision medicine”. Among these tools is Machine Learning (ML), which offers the
opportunity for personalized, patient-tailored care. However, like humans, ML models
still struggle to classify patients in certain applications where clear-cut boundaries
between classes are hard to identify (e.g., when diagnosing patients with an
intermediate pretest probability or estimating the stress level of healthcare workers during
a pandemic). In this work, we propose an ML architecture designed to improve the model’s
sensitivity to patients in intermediate “hard-to-classify” classes and to boost overall
performance. The architecture replaces a single classifier with a cascade of specialized
classifiers, which we refer to as the Human-like Classifier, the Segregating Classifier,
and the Deep Classifiers. In this way, it first flags the points that are hard to classify
and then trains more specialized models to segregate them. To test its
effectiveness, eight machine learning algorithms were used, with both the traditional and
the proposed architectures, to predict the feeling of protection among healthcare workers
during the COVID-19 pandemic from a global online survey. For most algorithms, the
results show enhanced sensitivity for points belonging to intermediate classes, as well
as an overall improvement in accuracy. To validate the results and check
for generalizability, the new architecture was also tested on a different output of the same
public health dataset (predicting respondents’ perception of being valued by their community)
and on another public dataset (the Wine Quality Dataset). It yielded similar results, with
improved accuracy for most algorithms compared with the traditional architecture. This
architecture shows promise as a tool to assist physicians in their decision-making,
especially since it is fully automated and does not depend on the algorithm or dataset used.
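The cascading idea can be illustrated with a minimal sketch, which loosely mirrors flagging hard points and handing them to specialized models. It assumes that class labels are grouped into coarse bands and uses a nearest-centroid model as a stand-in base learner. A first stage predicts only the band. Any band that still contains several labels (here, a hypothetical “intermediate” band) is passed to a specialist trained only on that band’s samples. The names `CascadedClassifier`, `fit_centroids`, `predict_centroid`, and `groups`, the toy data, and the two-stage split are illustrative assumptions. They are not the authors’ implementation, and they are not a claimed mapping onto the Human-like, Segregating, and Deep Classifiers.

```python
from statistics import fmean


def fit_centroids(X, y):
    """Nearest-centroid model: map each label to the mean vector of its samples."""
    rows_by_label = {}
    for x, label in zip(X, y):
        rows_by_label.setdefault(label, []).append(x)
    return {
        label: [fmean(column) for column in zip(*rows)]
        for label, rows in rows_by_label.items()
    }


def predict_centroid(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
    )


class CascadedClassifier:
    """Hypothetical two-stage cascade: a coarse model, then per-group specialists."""

    def __init__(self, groups):
        # groups maps each fine label to a coarse group (an assumed input format).
        self.groups = groups

    def fit(self, X, y):
        # Stage 1: learn only the coarse groups, pooling hard-to-separate labels.
        self.coarse = fit_centroids(X, [self.groups[label] for label in y])
        # Stage 2: one specialist per group that still holds several fine labels.
        self.specialists = {}
        self.single_label = {}
        for group in set(self.groups.values()):
            members = [(x, label) for x, label in zip(X, y) if self.groups[label] == group]
            if not members:
                continue  # no training data for this group; stage 1 never predicts it
            labels = {label for _, label in members}
            if len(labels) > 1:
                Xg, yg = zip(*members)
                self.specialists[group] = fit_centroids(Xg, yg)
            else:
                self.single_label[group] = labels.pop()
        return self

    def predict(self, x):
        group = predict_centroid(self.coarse, x)
        if group in self.specialists:
            # Flagged as hard: let the specialist choose among this group's labels.
            return predict_centroid(self.specialists[group], x)
        return self.single_label[group]


# Toy 1-D data: labels 2 and 3 form a hypothetical "intermediate" group.
X = [[0.0], [0.2], [2.0], [2.2], [3.0], [3.2], [5.0], [5.2]]
y = [1, 1, 2, 2, 3, 3, 4, 4]
model = CascadedClassifier({1: "low", 2: "mid", 3: "mid", 4: "high"}).fit(X, y)
print([model.predict(p) for p in ([0.1], [2.4], [2.9], [5.0])])  # [1, 2, 3, 4]
```

On this toy data, a flat nearest-centroid model would return the same labels. The sketch only shows how flagged points are routed to a specialist. It does not reproduce the accuracy or sensitivity gains reported in the abstract.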