Abstract:
In the realm of data exploration, the persistent challenges of data disconnection
and inconsistency often hinder the efficiency of data analysts, especially in terms of
data enrichment and aggregation. This thesis addresses the following
research questions: How can we improve data integration and data reuse in a clean,
downloadable format to facilitate data analysis? Moreover, how can we
contextually expand data on the fly to increase its value and enhance data exploration?
This work proposes KGFusionX, a knowledge graph-centered framework that
addresses the time-intensive nature of data enrichment and integration. Its backend
uses knowledge graphs to connect disparate datasets.
Several datasets from Lebanon covering different domains (e.g.,
health care, economy, education, and others) were converted and published as openly
accessible knowledge graphs in a triple store repository (749,500 triples). This
conversion enables fast aggregation of data through the connections that
knowledge graphs establish. The resulting graphs are also integrated with linked open
data sources that serve as a resource to expand the data. The framework is showcased through an online
platform built with Streamlit that allows users to select, combine, and download tabular
data that can be used in other visualization and exploration tools (e.g., Power BI and
Tableau). The approach was evaluated with data analysts and through two use cases.
Users who relied on the tool to analyze the challenges facing schools and universities
in rural areas, and to explore ways of boosting tourism in Lebanon, expressed interest
in adopting the platform. The results demonstrated a significant improvement in data
exploration efficiency and better visuals, with the knowledge graph-driven approach
proving successful in overcoming the challenges posed by data disconnection,
inconsistency, and enrichment. This
research primarily contributes to streamlining data exploration by leveraging
knowledge graphs to support data aggregation, data enrichment, and visual data
analysis.