Fliess' theorem for minimising weighted automata

We state and prove Fliess' theorem, which says that the minimal weighted automaton recognising a given formal series has size equal to the rank of its Hankel matrix.

A formal series (here over the reals) is a function f : \Sigma^{\star} \to \mathbb{R}, where \Sigma is a finite alphabet and \Sigma^{\star} the set of finite words over \Sigma. The Hankel matrix of f is a bi-infinite matrix H_f \in \mathbb{R}^{\Sigma^{\star} \times \Sigma^{\star}} defined by

H_f(u,v) = f(uv)
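
For example, if f(w) = |w|_a counts the occurrences of the letter a in w, then H_f(u,v) = |u|_a + |v|_a. Every row is a linear combination of the two rows indexed by \varepsilon and a, namely H_f(u, \cdot) = (1 - |u|_a) \cdot H_f(\varepsilon, \cdot) + |u|_a \cdot H_f(a, \cdot), so \textrm{rank}(H_f) = 2. We will use this series as a running example.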

For recognising formal series we use weighted automata: \mathcal{A} = (Q, \alpha \in \mathbb{R}^Q, (\Delta(a) \in \mathbb{R}^{Q \times Q})_{a \in \Sigma}, \eta \in \mathbb{R}^Q), where \alpha is the initial (row) vector, \Delta(a) the transition matrix of the letter a, and \eta the final (column) vector.

Such an automaton recognises the function

\mathcal{A}(a_1 \cdots a_n) = \alpha \cdot \underbrace{\Delta(a_1) \cdots \Delta(a_n)}_{\Delta(a_1 \cdots a_n)} \cdot \eta
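
To make the definitions concrete, here is a small numpy sketch (the names alpha, Delta, eta are ours, not from the post): a two-state weighted automaton recognising the running example f(w) = |w|_a, together with a finite truncation of its Hankel matrix, which already has rank 2.

```python
import numpy as np
from itertools import product

# A two-state weighted automaton over {a, b} recognising f(w) = |w|_a.
alpha = np.array([1.0, 0.0])                    # initial (row) vector
Delta = {"a": np.array([[1.0, 1.0],
                        [0.0, 1.0]]),
         "b": np.eye(2)}                        # one transition matrix per letter
eta = np.array([0.0, 1.0])                      # final (column) vector

def evaluate(word: str) -> float:
    """A(a_1 ... a_n) = alpha . Delta(a_1) ... Delta(a_n) . eta."""
    v = alpha
    for letter in word:
        v = v @ Delta[letter]
    return float(v @ eta)

def words_up_to(n: int, sigma: str = "ab") -> list[str]:
    """All words of length at most n over sigma."""
    return ["".join(w) for k in range(n + 1) for w in product(sigma, repeat=k)]

# A finite truncation of the Hankel matrix: H[u, v] = f(uv).
W = words_up_to(3)
H = np.array([[evaluate(u + v) for v in W] for u in W])
print(np.linalg.matrix_rank(H))                 # prints 2
```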

Theorem: (Fliess ’74)
* Any automaton recognising f has at least \textrm{rank}(H_f) many states,
* There exists an automaton recognising f with \textrm{rank}(H_f) many states.

First implication: the backward-forward decomposition

Let \mathcal{A} be an automaton recognising f.

Define P_\mathcal{A}(\cdot,q) \in \mathbb{R}^{\Sigma^{\star}} to be the formal series recognised by the weighted automaton \mathcal{A} with e_q, the unit vector of the state q, as final vector, and S_\mathcal{A}(\cdot,q) \in \mathbb{R}^{\Sigma^{\star}} to be the formal series recognised by \mathcal{A} with e_q as initial vector. Concretely, P_\mathcal{A}(u,q) = \alpha \cdot \Delta(u) \cdot e_q and S_\mathcal{A}(v,q) = e_q \cdot \Delta(v) \cdot \eta.

We have H_f(u,v) = \sum_{q \in Q} P_\mathcal{A}(u,q) \cdot S_\mathcal{A}(v,q), since \alpha \cdot \Delta(uv) \cdot \eta = \alpha \cdot \Delta(u) \cdot \Delta(v) \cdot \eta. We wrote H_f as a sum of |Q| terms, each of which is the product of a column vector by a row vector, hence has rank at most 1. We conclude by noting that the rank of a sum of matrices is at most the sum of the ranks, so any automaton recognising f has at least \textrm{rank}(H_f) states.
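
Continuing the numpy sketch above, we can check this decomposition numerically: P stacks the row vectors \alpha \cdot \Delta(u) and S stacks the vectors \Delta(v) \cdot \eta, and the truncated Hankel matrix factors as P \cdot S^T, a sum of |Q| rank-1 outer products.

```python
def delta_word(word: str) -> np.ndarray:
    """Delta(a_1 ... a_n) = Delta(a_1) ... Delta(a_n)."""
    M = np.eye(len(alpha))
    for letter in word:
        M = M @ Delta[letter]
    return M

# P[u, q] = alpha . Delta(u) . e_q  and  S[v, q] = e_q . Delta(v) . eta
P = np.array([alpha @ delta_word(u) for u in W])
S = np.array([delta_word(v) @ eta for v in W])
assert np.allclose(H, P @ S.T)   # H_f(u, v) = sum_q P(u, q) * S(v, q)
```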

Converse implication: construction of the minimal automaton

We introduce some notation. For X,Y two subsets of \Sigma^{\star}, we let H_f(X,Y) denote the submatrix of H_f restricted to rows in X and columns in Y, where Xa = \{ xa \mid x \in X \} and aY = \{ ay \mid y \in Y \}. Observe the following equality: H_f(Xa,Y) = H_f(X,aY), since both matrices have f(xay) as their (x,y) entry.

The backward automaton

Without loss of generality let us assume that f \neq 0. It follows that the row H_f( {\varepsilon}, \Sigma^{\star} ), which is f itself, is non-zero. We consider the family of vectors given by the rows of H_f. Let X index a basis of this family; thanks to the previous observation we can assume \varepsilon \in X. Note that |X| = \textrm{rank}(H_f), so the automaton below has the required number of states.

We construct an automaton \mathcal{A} as follows.
* The set of states is X.
* The initial vector is \alpha = e_\varepsilon.
* The final vector is \eta = H_f( X, {\varepsilon} ).
* The transition matrices are defined to satisfy
H_f(X a, \Sigma^{\star}) = \Delta(a) \cdot H_f(X, \Sigma^{\star}).

To see that there exists \Delta(a) such that H_f(X a, \Sigma^{\star}) = \Delta(a) \cdot H_f(X, \Sigma^{\star}), let x \in X. We decompose H_f(xa,\Sigma^{\star}) on the basis \left\{H_f(x’,\Sigma^{\star}) \mid x’ \in X \right\}: there exist coefficients \Delta(a)(x,x’) \in \mathbb{R} such that

H_f(xa,\Sigma^{\star}) = \sum_{x’ \in X} \Delta(a)(x,x’) \cdot H_f(x’,\Sigma^{\star}).
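
In finite terms, and continuing the Python sketch above: given f as a black box and a set X of prefixes indexing a basis of rows, each \Delta(a) can be recovered by solving this linear system restricted to finitely many columns (here the words of length at most 3, which happen to suffice for the running example; this is our illustration, not part of the proof).

```python
def f(word: str) -> float:
    return float(word.count("a"))      # the running example f(w) = |w|_a

X = ["", "a"]                          # prefixes indexing a basis of rows
cols = words_up_to(3)                  # finitely many columns suffice here

def rows(prefixes: list[str]) -> np.ndarray:
    """The submatrix H_f(prefixes, cols)."""
    return np.array([[f(x + v) for v in cols] for x in prefixes])

M = rows(X)                            # H_f(X, cols)
# Solve Delta(a) . H_f(X, cols) = H_f(Xa, cols) for each letter a.
Delta_min = {a: np.linalg.lstsq(M.T, rows([x + a for x in X]).T, rcond=None)[0].T
             for a in "ab"}
alpha_min = np.array([1.0, 0.0])       # e_epsilon, as X[0] is the empty word
eta_min = np.array([f(x) for x in X])  # H_f(X, {epsilon})

def evaluate_min(word: str) -> float:
    v = alpha_min
    for letter in word:
        v = v @ Delta_min[letter]
    return float(v @ eta_min)

assert all(abs(evaluate_min(w) - f(w)) < 1e-9 for w in words_up_to(4))
```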

We now argue that \mathcal{A} recognises f: we show by induction on the length of w that \Delta(w) \cdot \eta = H_f(Xw, {\varepsilon}). The base case w = \varepsilon is the definition of \eta; for the inductive step:

\Delta(aw) \cdot \eta = \Delta(a) \cdot \Delta(w) \cdot \eta
= \Delta(a) \cdot H_f(Xw, {\varepsilon})
= \Delta(a) \cdot H_f(X, {w})
= H_f(Xa, {w})
= H_f(Xaw, {\varepsilon})

Note that we used the equality H_f(X a, w) = \Delta(a) \cdot H_f(X, w) for all words w \in \Sigma^{\star}, which is the projection on the column w of the equality defining \Delta(a).

Projecting on the coordinate \varepsilon (recall that \alpha = e_\varepsilon), we get \mathcal{A}(w) = \alpha \cdot \Delta(w) \cdot \eta = H_f({\varepsilon} w, {\varepsilon}) = f(w). This concludes the proof.
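
For concreteness, let us run the construction on the running example f(w) = |w|_a over \Sigma = \{a,b\}, taking X = \{\varepsilon, a\}. Decomposing rows gives H_f(\varepsilon a, \cdot) = H_f(a, \cdot) and H_f(aa, \cdot) = - H_f(\varepsilon, \cdot) + 2 \cdot H_f(a, \cdot), while the letter b fixes every row, so

\Delta(a) = \begin{pmatrix} 0 & 1 \\ -1 & 2 \end{pmatrix}, \quad \Delta(b) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \alpha = (1 \; 0), \quad \eta = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.

As a sanity check, \mathcal{A}(aa) = \alpha \cdot \Delta(a)^2 \cdot \eta = 2 = f(aa). These matrices also match what the least-squares sketch above computes.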

For the sake of curiosity, we present the dual construction, which considers columns instead of rows.

The forward automaton

Since f \neq 0, the column H_f( \Sigma^{\star}, {\varepsilon} ), which is again f, is non-zero. We consider the family of vectors given by the columns of H_f. Let Y index a basis of this family; thanks to the previous observation we can assume \varepsilon \in Y. Again |Y| = \textrm{rank}(H_f).

We construct an automaton \mathcal{A} as follows.
* The set of states is Y.
* The initial vector is \alpha = H_f( {\varepsilon}, Y ).
* The final vector is \eta = e_\varepsilon.
* The transition matrices are defined to satisfy
H_f(\Sigma^{\star}, a Y) = H_f(\Sigma^{\star}, Y) \cdot \Delta(a).

To see that there exists \Delta(a) such that H_f(\Sigma^{\star}, a Y) = H_f(\Sigma^{\star}, Y) \cdot \Delta(a), let y \in Y. We decompose H_f(\Sigma^{\star}, ay) on the basis \left\{H_f(\Sigma^{\star},y’) \mid y’ \in Y\right\}: there exist coefficients \Delta(a)(y’,y) \in \mathbb{R} such that

H_f(\Sigma^{\star}, ay) = \sum_{y’ \in Y} H_f(\Sigma^{\star},y’) \cdot \Delta(a)(y’,y).

We now argue that \mathcal{A} recognises f: we show by induction on the length of w that \alpha \cdot \Delta(w) = H_f( {\varepsilon}, wY). The base case w = \varepsilon is the definition of \alpha; for the inductive step:

\alpha \cdot \Delta(wa) = \alpha \cdot \Delta(w) \cdot \Delta(a)
= H_f({\varepsilon}, wY) \cdot \Delta(a)
= H_f({w}, Y) \cdot \Delta(a)
= H_f({w}, aY)
= H_f({wa}, Y)
= H_f({\varepsilon}, waY)

Note that we used the equality H_f(w, a Y) = H_f(w, Y) \cdot \Delta(a) for all words w \in \Sigma^{\star}, which is the projection on the row w of the equality defining \Delta(a). Projecting the invariant on the coordinate \varepsilon (recall that \eta = e_\varepsilon), we get \mathcal{A}(w) = \alpha \cdot \Delta(w) \cdot \eta = H_f({\varepsilon}, w{\varepsilon}) = f(w), which concludes the proof.
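
On the running example f(w) = |w|_a, the Hankel matrix is symmetric (since f(uv) = f(vu)), so taking Y = \{\varepsilon, a\} the forward construction yields exactly the transposes of the matrices of the backward automaton above, with initial vector H_f({\varepsilon}, Y) = (0 \; 1) and final vector e_\varepsilon.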

We finish this post with an alternative abstract point of view.

An abstract construction without choosing a basis

(Thanks to Pierre Ohlmann for working this out with me!)

Instead of constructing a weighted automaton, we build a representation, which is given by a finite-dimensional vector space V (over the reals), an initial vector v_0 \in V, a morphism \mu : \Sigma^{\star} \to \mathrm{L}(V) into the linear maps on V (composed in the order the letters are read), and a linear form \lambda : V \to \mathbb{R}. A representation computes a function f : \Sigma^{\star} \to \mathbb{R} by f(w) = \lambda(\mu(w)(v_0)).

One can show that, through the choice of a basis of V, a representation induces a weighted automaton with \dim(V) many states, and conversely.

We consider the vector space of functions \Sigma^{\star} \to \mathbb{R}, which has infinite dimension. For a word w, we write H_w for the row of H_f indexed by w, seen as a function: H_w(u) = f(wu). We let V denote the vector space generated by \left\{ H_w \mid w \in \Sigma^{\star} \right\}; its dimension is exactly \textrm{rank}(H_f), which is finite.
Define

\mu(a)(H_w) = H_{wa}.

We first need to show that this is well defined: if H_w = H_{w’}, then H_{wa} = H_{w’a}. This is easily checked:

H_{wa}(u) = H_w(au) = H_{w’}(au) = H_{w’a}(u).

The same computation shows that \mu(a) is compatible with linear combinations, hence defines a linear map on V. We then extend \mu into a morphism \mu : \Sigma^{\star} \to \mathrm{L}(V), let v_0 = H_\varepsilon, and let \lambda be the evaluation at \varepsilon, so that \lambda(H_w) = H_w(\varepsilon) = f(w). The invariant \mu(w)(v_0) = H_w follows by induction (indeed \mu(wa)(v_0) = \mu(a)(\mu(w)(v_0)) = \mu(a)(H_w) = H_{wa}), so the representation (V, v_0, \mu, \lambda) computes the function f.
