Data trees

Mikołaj Bojańczyk

papers

July 14, 2015

Mikołaj Bojańczyk, Anca Muscholl, Thomas Schwentick, Luc Segoufin
Two-variable logic on data trees and XML reasoning. J. ACM, 2009 PDF
Henrik Björklund, Mikołaj Bojańczyk
Bounded Depth Data Trees. ICALP, 2007 PDF
Mikołaj Bojańczyk, Slawomir Lasota 0001
An extension of data automata that captures XPath Log. Methods Comput. Sci., 2012 PDF
Vince Bárány, Mikołaj Bojańczyk, Diego Figueira, Pawel Parys
Decidable classes of documents for XPath. FSTTCS, 2012 PDF
Mikołaj Bojańczyk, Filip Murlak, Adam Witkowski
Containment of Monadic Datalog Programs via Bounded Clique-Width. ICALP (2), 2015 PDF

These are papers about data trees, which are the tree version of data words, i.e. these are trees where each node has a label from a finite alphabet and a data value from an infinite alphabet, with the provision that data values can only be compared for equality. The motivation behind data trees is that they are supposed to be like XML documents. This is a sequence of papers which tries to tackle the satisfiability problem for various logics over data trees. All of the papers use automata that run over data trees.

Paper [1] is about two-variable logic over data trees. The logic has access to child and next-sibling predicates, as well as the equal data value predicate. It does not have access to the descendant predicate. Using a heroic amount of cutting and pasting, we prove that satisfiability for the logic is decidable. The paper is a journal version of

Mikołaj Bojańczyk, Claire David, Anca Muscholl, Thomas Schwentick, Luc Segoufin
Two-variable logic on data trees and XML reasoning. PODS, 2006 PDF

which had the best paper award at PODS 2006, and the test of time award at PODS 2016. Here is a picture of my lovely co-authors and me in 2016:

Paper [2] is about bounded depth data trees, bounded by the typically small depth of tree-shaped databases. At the cost of bounding the depth, we can extend the logic from [1] with descendant order (not very interesting for bounded depth) and document order (more interesting), and still have decidable satisfiability. The underlying decision problem is priority vector addition systems, which are an extension of vector addition systems that can do tests for zero in a certain nested way, with the nesting essentially corresponding to the tree structure of bounded depth data trees.

Paper [3] is a journal version of

Mikołaj Bojańczyk, Slawomir Lasota 0001
An Extension of Data Automata that Captures XPath. LICS, 2010 PDF

Like for all papers in this group, the motivation behind papers [3,4] is to do static analysis of XPath, e.g. decide if a formula of XPath is true in some data word or tree. The problem is known to be undecidable. One workaround is to consider fragments of XPath, which is the approach of the previous papers [1,2]. Another approach is to find a restriction of data words or trees, such that most real-life documents satisfy this restriction (of course, this requires an understanding of real life that is missing in the analysis), and then prove that XPath satisfiability is decidable over documents that satisfy this restriction. One example of such a restriction is bounded depth from [2], but here we try to go for something more ambitious. In order to meaningfully analyze XPath, it is useful to have an automaton model, and such a model is introduced in paper [3], which is a journal version of

Mikołaj Bojańczyk, Slawomir Lasota 0001
An Extension of Data Automata that Captures XPath. LICS, 2010 PDF

Then, using this automaton model, we prove in [4] that, given a certain structural restriction on data trees, emptiness is decidable for the model from [3]. The restriction is difficult to explain, but the idea is to have something that is not a simple window dressing on established notions of well-behaved structures like tree width or clique width.

Speaking of clique width, paper [5] uses clique width to explain some existing decidability results on Datalog containment.

papers

Data trees

Leave a Reply