# FMT 7: Conjunctive Queries

This post discusses some basic facts about conjunctive queries, the containment problem, and minimization.

conjunctive query (database terminology), or primitive-positive formula (logical terminology), or pp-formula for short, is a formula of the form

$$\phi=\exists x_1\exists x_2\ldots\exists x_n \alpha_1\land \alpha_2\land \alpha_k,$$

where each $\alpha_i$ is an atom, i.e. a formula of the form $R(v_1,v_2,\ldots, v_r)$ where $R$ is a symbol in the signature and $v_1,\ldots,v_r$ are variables. Note that the formula $\phi$ might have free variables.

It is convenient to also allow atoms to be of the form $v_i=v_j$. This does not change much, since a formula using equality can be easily converted into an equivalent formula which does not use equalities (for example, $\exists x.(x=y)\land R(x,z)$ is equivalent to $R(y,z)$).

Example. The fact that two nodes of a graph are at distance two is expressible by a pp-formula:

$$\phi(x,y)=\exists z. E(x,z)\land E(z,y).$$

In database theory, a formula with free variables $x_1,\ldots,x_n$ is viewed as a query, which can be applied to a finite structure $\str A$ giving as result the set

$$\phi(\str A)\stackrel{\textrm{def}}{=}\set{(a_1,\ldots,a_k):\quad \str A,a_1,\ldots,a_k\models \phi(x_1,\ldots, x_k)}.$$

# The canonical structure

Let $\phi$ be a conjunctive query with free variables $x_1,\ldots,x_k$. The canonical structure $\str A_\phi$ associated with $\phi$ is defined as follows. The elements of $\str A_\phi$ are the variables of $\phi$. If $R(v_1,\ldots,v_r)$ is an atom appearing in $\phi$, then we say that $R(v_1,\ldots,v_r)$ holds in $\str A_\phi$. Moreover, the structure $\str A_\phi$ has constants $c_1,\ldots,c_k$ interpreted as the elements $x_1,\ldots,x_k$, respectively.

If $\str A$ is a relational structure and $a_1,\ldots,a_n$ are its elements, then by $(\str A,a_1,\ldots, a_n)$ we denote the logical structure obtained from $\str A$ by adding constants $c_1,\ldots,c_n$, where the interpretation of the constant $c_i$ is the element $a_i$ in $\str A$.

If $\str A,\str B$ are two logical structures, then we say that $\str A$ maps homomorphically to $\str B$, if there is a homomorphism from $\str A$ to $\str B$. In this case, we also write $\str A\rightarrow \str B$.

Lemma 1. The following conditions are equivalent for a structure $\str B$ with distinguished elements $b_1,\ldots,b_k$:

1. $\str B,b_1,\ldots,b_k\models \phi(x_1,\ldots,x_k)$
2. There is a homomorphism from $\str A_\phi$ to $(\str B,b_1,\ldots,b_k)$.

Conversely, for any structure $\str A$ with constants $c_1,\ldots,c_k$ there exists a conjunctive formula $\phi$ with free variables $x_1,\ldots,x_k$ such that the two conditions above hold, with $\str A_\phi$ replaced by $\str A$. Therefore, conjunctive queries can be conveniently represented by structures with distinguished elements.

# Query containment

The query containment problem is the following decision problem:

Input: two queries $\phi,\psi$, both with free variables $x_1,\ldots,x_k$

Output: YES if for every finite structure $\str A$, $\phi(\str A)\subset \psi(\str A)$, NO otherwise.

In the special, but sufficiently interesting case when $k=0$, i.e., $\phi$ and $\psi$ are sentences, the question is then whether $\phi$ implies $\psi$ in every finite structure $\str A$.

If $\phi,\psi$ are allowed to be first-order formulas, even without free variables (i.e., sentences), then the problem is undecidable by Trakhtenbrot’s theorem, since setting $\psi=\bot$ gives a reduction from the unsatisfiability problem for first-order logic.

However, the query containment problem for conjunctive queries is decidable, due to the following.

Proposition. The following conditions are equivalent for two conjunctive queries $\phi,\psi$ with free variables $x_1,\ldots,x_k$:

1. For every structure $\str A$, $\phi(\str A)\subset \psi(\str A)$
2. For every structure $\str A$ with distinguished elements $a_1,\ldots,a_n$, if $(\str A_\phi,x_1,\ldots,x_n)$ maps homomorphically to $(\str A,a_1,\ldots,a_n)$, then $(\str A_\psi,x_1,\ldots,x_n)$ maps homomorphically to $\str A$.

Proof: Follows immediately from Lemma 1.$\square$

Therefore, the query containment problem reduces to a certain problem about logical structures. This problem, in turn, reduces to a simpler problem, as implied by the following.

Proposition. Let $\str A$ and $\str B$ be two logical structures. Then the following conditions are equivalent:

1. For every structure $\str C$, if $\str A$ maps homomorphically to $\str C$, then $\str B$ maps homomorphically to $\str C$
2. $\str B$ maps homomorphically to $\str A$.

Proof. (1→2) Assume that the first condition holds. Since $\str A$ maps to $\str A$, it follows that $\str B$ maps homomorphically to $\str A$.

(2→1) Suppose that $\str B\rightarrow\str A$. Let $\str C$ be a such that $\str A\rightarrow \str C$. Then $\str B\rightarrow \str C$, by composing the two arrows.$\square$

Corollary. The query containment problem for conjunctive queries is decidable.

Indeed, it follows from the above, that $\phi$ is contained in $\psi$ if and only if $\str A_\psi\rightarrow \str A_\phi$. The latter problem is decidable (in NP), as one can guess a mapping and test that it is a homomorphism.

# Query minimization and core structures

Two queries $\phi$ and $\psi$ are equivalent if in every finite structure $\str A$, $\phi(\str A)=\psi(\str A)$. In other words, $\phi$ is contained in $\psi$ and vice-versa.

In terms of canonical structures, this corresponds to the following notion. We say that two structures $\str A,\str B$ are homomorphically equivalent if $\str A$ maps homomorphically to $\str B$ and vice-versa.

It is natural to try to “optimize” a query $\phi$ obtaining an equivalent query which is smaller in some sense. We describe one way of doing this. Instead of speaking of the query, let us consider the associated canonical structure, say $\str A$. The question therefore becomes the following: can we choose some “best” representative of the class of homomorphically equivalent structures?

Example. All bipartite (undirected) graphs with at least one edge are homomorphically equivalent. Indeed, any bipartite graph maps homomorphically to the two two clique (by mapping one part to one vertex and the other part to the other vertex). Conversely, the two-clique is a subgraph of any bipartite graph $G$ with at least one edge, so maps homomorphically to $G$. Moreover, the graphs which are homomorphically equivalent to the two clique are precisely the bipartite graphs with at least one edge. The two clique is the smallest graph representative of this class.

We will prove the following.

Proposition. Let $\str B$ be a finite relational structure and $\str C$ be the smallest (in terms of cardinality) structure which is homomorphically equivalent to $\str B$. The structure $\str C$ is unique up to isomorphism, and is isomorphic to a induced substructure of $\str B$.

The structure $\str C$ described in the proposition is called the core of $\str B$.

Before we prove the above proposition, we introduce the following notion. A relational structure $\str C$ is a core if every endomorphism of $\str C$ (i.e., homomorphism from $\str C$ to itself) is bijective.

Example. Consider an undirected clique $K_n$ with $n$ elements. Then $K_n$ is a core. On the other hand, the undirected graph $G$ with three nodes and two edges is not a core, since it has an endomorphism which maps the two nodes of degree one to one of them.

We prove the proposition by proving three facts.

Fact 1. Let $\str A$ be a structure and let $\str C$ be the smallest (in terms of cardinality) structure which is homomorphically equivalent to $\str A$. Then $\str C$ is a core.

Proof. Suppose that $f$ is an endomorphism of $\str C$. Then the image $f(\str C)$ of $f$ is homomorphically equivalent to $\str C$, since $\str C$ maps to it via $f$, and $f(\str C)$ is a substructure of $\str C$. By minimality of $\str C$, $f$ must be bijective.$\square$

Fact 2. Any two homomorphically equivalent cores are isomorphic.

Proof. Consider two core structures $\str C,\str C’$ which are homomorphically equivalent, i.e. $\str C\rightarrow \str C’$ and $\str C’\rightarrow \str C$. By composing the two homomorphisms, we get an endomorphism of $\str C$. Since $\str C$ is a core, this endomorphism is bijective. It follows from Fact 3 below that this composition is an automorphism. In particular, both homomorphisms where isomorphisms. $\square$

Fact 3. If $\str C$ is a finite relational structure and $f$ is a bijective endomorphism of $\str C$, then $f$ is an automorphism of $\str C$. $\square$

(Note that this fact does not hold for infinite structures. Also, a bijective homomorphism does not need to be an isomorphisms in the category of finite relational structures, unlike the situation in the the category of algebraic structures.)

We now prove the existence of the structure $\str C$ as described in the statement of the proposition.

Proof (of Proposition). Consider the smallest (in terms of cardinality) structure $\str C$ which is homomorphically equivalent to $\str B$, i.e. there are homomorphisms $f:\str B\to \str C$ and $g:\str C\to \str B$. It is a core, by Fact 1. By Fact 2, it is unique, up to isomorphism. Consider the composition $fg:\str C\to \str C$. This is an endomorphism of $\str C$, so it is an automorphism of $\str C$, by Fact 3. It follows that $f$ is an isomorphism onto its image. Therefore, $\str B$ contains an induced subgraph isomorphic to $\str C$.$\square$

Corollary. Finding a core of a given structure $\str A$ can be done in NP: guess a subset $X$ of $\str A$ and a homomorphism of $\str A$ to the substructure induced by $X$.

# Exercises

Exercise 1. Construct an infinite sequence $\str C_1,\str C_2,\ldots$ of cores which:

1. form an infinite strictly decreasing chain, i.e., $\str C_i$ maps to $\str C_{i+1}$ (but not vice-versa) for $i=1,2,\ldots$
2. form an infinite strictly increasing chain, i.e., $\str C_{i+1}$ maps to $\str C_{i}$ (but not vice-versa) for $i=1,2,\ldots$ Hint: sǝlɔʎɔ pǝʇɔǝɹıp ɹǝpısuoɔ
3. form an infinite antichain i.e., $\str C_n$ does not map homomorphically to $\str C_m$ for $m\neq n$.

Exercise 2. Show that $\str C$ is the core of $\str B$ if and only if $\str B$ and $\str C$ are homomorphically equivalent and $\str C$ is a core.

Exercise 3. Given two structures $\str C,\str B$, we say that $\str C$ is a retract of $\str B$ if there exist mappings $f:\str B\to\str C$, $g:\str C\to \str B$ such that $gf$ is the identity on $\str B$. This defines a transitive, reflexive relation on finite relational structures, i.e. a preorder. Show that the core of $\str B$ is the minimal (with respect to this preorder) retract of $\str B$.

Exercise 4. This exercise shows that the core $\str C$ not only has the smallest number of elements, but also the related conjunctive query is the smallest possible. Let $\str B$ be a finite structure. Show that the following conditions are equivalent for a structure $\str C$

1. $\str C$ is a core
2. $\str C$ is the structure with the least number of atoms (i.e., tuples $(R,(x,\ldots,z))$ such that $R(x,\ldots,z)$ holds in $\str C$) which is homomorphically equivalent to $\str B$.

Exercise 5. The aim of this exercise is to provide a constructive description of the core of $\str B$. Show that if $\str B$ is not a core then there exists a retraction $g$ of $\str B$ which is not injective (a retraction of $\str B$ is a mapping $g:\str B\to\str B$ such that $g^2=g$. Equivalently, $g$ is the identity when restricted to its image). Prove that the structure $\str B’$ induced by the image of $g$ is homomorphically equivalent to $\str B$. Repeat the argument to construct a core of $\str B$.

Exercise 6. Show that query containment is decidable also for unions of conjunctive queries, i.e. formulas which are disjunctions of pp-formulas.