Abstract:
Since the evaluation of XPath expressions depends heavily on their size and on navigational structures such as ancestor-descendant relationships (//) and wildcard steps (/*), we introduce a novel, complementary approach to optimizing XPath queries that rewrites them to minimize such structural occurrences. This rewriting approach depends on the existence of a statistical schema, which we derive from a set of pre-processed XML documents. However, imprecision in schema extraction may reduce the accuracy of the results. Through experimentation and analysis, we validate the scalability and efficiency of our approach. © Springer-Verlag Berlin Heidelberg 2005.
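The abstract does not spell out the rewriting rules. As an illustration only, the Python sketch below shows how a schema derived from documents could, in principle, replace a descendant step (//) and a wildcard step (/*) with explicit child-only paths. The schema representation (a hypothetical map from each element name to the child names observed in the documents, with the statistical counts left out), the helper names `expand_descendant` and `expand_wildcard`, and the assumption of an acyclic schema are all inventions for this example, not the authors' method.

```python
from typing import Dict, List

# Hypothetical structural summary: each element name maps to the child
# element names observed in the pre-processed XML documents.
# (A real statistical schema would also record counts; omitted here.)
Schema = Dict[str, List[str]]


def expand_descendant(schema: Schema, start: str, target: str) -> List[str]:
    """Rewrite `start//target` as explicit child-only paths.

    Assumes the schema is acyclic, so the recursive walk terminates.
    """
    paths: List[str] = []

    def walk(node: str, path: List[str]) -> None:
        for child in schema.get(node, []):
            new_path = path + [child]
            if child == target:
                paths.append("/".join(new_path))
            # Keep descending: `target` may also occur deeper in the tree.
            walk(child, new_path)

    walk(start, [start])
    return paths


def expand_wildcard(schema: Schema, parent: str, child: str) -> List[str]:
    """Rewrite `parent/*/child` by naming each intermediate element
    that the schema allows between `parent` and `child`."""
    return [
        f"{parent}/{mid}/{child}"
        for mid in schema.get(parent, [])
        if child in schema.get(mid, [])
    ]


# Toy schema, invented for this example.
schema: Schema = {
    "library": ["book", "journal"],
    "book": ["title", "author"],
    "journal": ["title", "issue"],
    "issue": ["article"],
    "article": ["title"],
}

if __name__ == "__main__":
    print(expand_descendant(schema, "library", "title"))
    print(expand_wildcard(schema, "library", "title"))
```

With this toy schema, `library//title` becomes the three explicit paths `library/book/title`, `library/journal/title` and `library/journal/issue/article/title`, and `library/*/title` becomes the first two. The rewrite preserves the query's answers only if the schema lists every parent-child relationship that actually occurs in the data. This is one way to read the abstract's caveat that imprecise schema extraction can cost accuracy.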