# FMT 1a: Introduction

These are lecture notes for a course on Finite Model Theory, which takes place at the University of Warsaw, in 2014. The lecturers are Filip Murlak and Szymon Toruńczyk. Subsequent notes will appear in this thread.
For a vast literature on the subject, see this post.

Basic notions.
A signature $\Sigma$ is a family of symbols, each with a prescribed type – either  ‘function’ or ‘relation’ – and arity (a natural number). A structure $\str A$over the signature $\Sigma$ is a set – the domain of $\str A$ – with an assignment which associates to each symbol of $\Sigma$ a function or relation of appropriate type and arity. Structures are also called models.

Examples. The graph signature consists of one binary relation symbol $E$. The partial order signature also consists of one relation symbol $\le$. A directed $G$ is a structure over the graph signature. A simple graph $G$ is a structure over the graph signature such that the graph relation is symmetric and antireflexive.

Logic. We consider various logics, such as first order logic. For each signature $\Sigma$ there is a set of admissible first order formulas over the signature $\Sigma$. Formally, we fix a countable set of variables $x,y,z,\ldots$ and define the syntax of first order formulas by induction:

• If $\sigma$ is a relation symbol of arity $n$ in $\Sigma$, and $x_1,\ldots,x_n$ are variables, then $\sigma(x_1,\ldots,x_n)$ is a formula;
• More generally, above $x_1,\ldots,x_n$ could be any terms built of variables, using function symbols;
• Above, we also allow the special symbol $=$ as a special binary relation symbol denoting equality;
• If $\phi$ is a formula and $x$ is a variable then $\forall x \phi$ and $\exists x \phi$ are formulas;
• If $\phi,\psi$ are formulas, then $\neg \phi, \phi\lor\psi,\phi\land\psi,\phi\implies\psi$ are formulas.

A variable $x$ is bound in $\phi$ if each occurrence of $x$ in the syntax tree of $\phi$ appears below a node bounding $x$, i.e., below $\forall x$ or $\exists x$. A variable $x$ is free in $\phi$ if it is not bound. For instance the formula $\phi=(x<y)$ has free variables $x,y$. A sentence is a formula without free variables. Sometimes we write $\phi(x,\ldots,z)$ to indicate that the free variables of $\phi$ are contained in $x,\ldots,z$; however, this is only an optional convention. (This is different than in the case of defining functions, where we are obliged to provide the arguments of the function, whereas the symbols which are not variables are assumed to be parameters. In the case of formulas, everything is a free variable, unless it is bound).

The semantics is defined inductively: for a formula $\phi$ with free variables $V$ and a structure $\str A$, we define a set $[\phi]_{\str A}\subset \str A^V$ of valuations $\nu:V\to\str A$ which satisfy $\phi$. The inductive step is either a projection or a boolean combination.

If $\nu\in [\phi]_{\str A}$ then we write $\str A,\nu\models\phi$. If $\phi$ has no free variables, then we write $\str A\models\phi$ if $\str A,\eps\models \phi$, where $\eps$ is the empty valuation.

A glimpse of model theory. Model theory studies structures (models) and their theories. A theory is a set of sentences. We say that a structure is a model of a theory if it is a model of each of its sentences. For example, a simple graph is a model of the (first order) theory consisting of two formulas: $\forall x\forall y E(x,y)\implies E(y,x)$ and $\forall x\neg E(x,x)$. The first order theory of a structure is the set of all first order sentences which it satisfies. For example, the first order theory of $(\str Q,\le)$ contains the axioms of a dense linear order without endpoints:

• $\forall x\forall y\forall z.(x\le y\land y\le z)\implies (x\le z)$,
• $\forall x\forall y. (x\le y\land y\le x) \iff (x=y)$,
• $\forall x\forall y.(x\le y)\lor (y\le x)$,
• $\forall x\forall z. (x\le z)\implies \exists y. x\le y\le z$,
• $\forall x\exists y\exists z. (y\neq x)\land (y\neq z)\land (z\le x\le y)$

but also many (but only countably many) other formulas, such as $\forall x\forall y(x\le y)\implies(\exists t\exists z. x\le t\le z\le y)$.

The first order theory of $(\str Q,\le )$ and of $(\str R,\le)$ are the same. This can be proved using Ehrenfeucht-Fraisse games. However, the theory of $(\str Q,\le)$ has exactly one countable model (it is said that $(\str Q,\le)$ is $\aleph_0$-categorical).

Two fundamental theorems of model theory are the completeness and compactness theorems.

Completeness (of first order logic and a proof system). A first order sentence $\phi$ is valid (i.e., holds in all structures) if it is provable (in the proof system).

Compactness (for first order logic). If $T$ is a first order theory such that every finite subset of $T$ has a model, then $T$ has a (countable) model.

Corollary. The set of valid sentences is recursively enumerable.

Proof. By completeness, a Turing machine can produce all valid sentences by producing all correct proofs in the proof system.

Here is a simple application of compactness.

Fact. There is no first order formula which holds in a graph $G$ if and only if $G$ has a cycle, i.e., a finite sequence $x_1,x_2,\ldots,x_n$ of vertices such that $E(x_i,x_{i+1})$ holds for $i=1,\ldots,n-1$ and $E(x_n,x_1)$.

Proof. Suppose that $\phi$ is such a formula. For $n=1,2,\ldots$, let $\phi_n$ be a first order formula which says “there is no cycle of length $n$”. This can be expressed by a first order formula.

Let $T=\set{\phi}\cup\set{\phi_n:n\in\NN}$. Then every finite subset of $T$ has a model: if $S\subset T$ contains no formula $\phi_n$ with $n>n_0$, then a cycle of length $n_0+1$ is a model of $S$. Therefore, by compactness, $T$ has a model. This model would have a cycle (thanks to $\phi$) but no cycle of finite length (thanks to all the $\phi_n$), a contradiction.

Finite model theory studies the expressive power of logics over finite structures. The main differences with model theory are:

• The fundamental theorems – compactness and completeness – fail in FMT. For example, $T=\set{\psi_n: n\in \NN}$, where $\psi_n$ says “there exist at least $n$ distinct elements”, is a theory such that each of its finite subsets has a finite model, but there is no finite model of $T$. Because of this, the methods used in FMT are radically different than the methods of model theory. Note that there is no formula which holds precisely in those finite graphs which have a cycle, but a different proof than in the Fact above is needed! [Exercise: find a proof using compactness.] Completeness also fails (for any reasonable proof system). This is due to Trakhtenbrot’s theorem which says that the set of sentences which are satisfiable in some finite model is not decidable (for any signature $\Sigma$ with at least one binary relation symbol). Note that the set of sentences which are satisfiable in some finite model is recursively enumerable (a machine can test all finite models of larger and larger size, and check if any of them satisfies a given formula; if the formula is satisfiable by some finite model, the machine will terminate). Therefore, Trakhtenbrot’s theorem is equivalent to saying that the set of sentences which are valid in all finite models is not recursively enumerable — an opposite statement to the Corollary above, in the finite world.
• Typically, logics more powerful than first order logic are studied in finite model theory.
• Complexity classes, such as PTime or PSpace, can be considered as (or compared to) logics; there is a close connection of logics over finite structures to complexity theory (studied by descriptive complexity theory).

Example: Database theory. In 1969 Codd introduced the very successful relational model of databases. A relational database is nothing else than a relational structure, and relational algebra is the same thing as first order logic. SQL is an extension of relational algebra by a counting mechanism. Using methods of FMT one can study the expressive power of different query languages. For example, can an SQL query detect that there is a cycle in a graph?