# Fliess' theorem for minimising weighted automata

We state and prove Fliess' theorem, which says that the minimal weighted automaton recognising a given formal series has size equal to the rank of its Hankel matrix.

A formal series (here over the reals) is a function f : \Sigma^{\star} \to \mathbb{R}. The Hankel matrix of f is a bi-infinite matrix H_f \in \mathbb{R}^{\Sigma^{\star} \times \Sigma^{\star}} defined by

H_f(u,v) = f(uv)

For recognising formal series we use weighted automata: \mathcal{A} = (Q, \alpha \in \mathbb{R}^Q, (\Delta(a) \in \mathbb{R}^{Q \times Q})_{a \in \Sigma}, \eta \in \mathbb{R}^Q), where \alpha is the initial vector, \Delta(a) the transition matrix of the letter a, and \eta the final vector.

Such an automaton recognises the function

\mathcal{A}(a_1 \cdots a_n) = \alpha \cdot \underbrace{\Delta(a_1) \cdots \Delta(a_n)}_{\Delta(a_1 \cdots a_n)} \cdot \eta
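To make these definitions concrete, here is a minimal sketch in Python (using numpy). The 2-state automaton below, computing the number of occurrences of the letter a in a word, is an illustrative choice of ours, not part of the statement above:

```python
import numpy as np

# A small weighted automaton over the alphabet {a, b} computing
# f(w) = number of occurrences of the letter 'a' in w.
# (This example automaton is our own illustration.)
alpha = np.array([1.0, 0.0])                      # initial vector
Delta = {"a": np.array([[1.0, 1.0],
                        [0.0, 1.0]]),
         "b": np.eye(2)}                          # transition matrices
eta = np.array([0.0, 1.0])                        # final vector

def recognise(word):
    """Compute alpha · Delta(a_1) · ... · Delta(a_n) · eta."""
    v = alpha
    for letter in word:
        v = v @ Delta[letter]
    return float(v @ eta)
```

Multiplying the row vector through the word letter by letter avoids building the full matrix product.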

**Theorem** (Fliess ’74):

* Any automaton recognising f has at least \textrm{rank}(H_f) many states,

* There exists an automaton recognising f with \textrm{rank}(H_f) many states.

## First implication: the backward-forward decomposition

Let \mathcal{A} be an automaton recognising f.

For each q \in Q, define P_\mathcal{A}(\cdot,q) \in \mathbb{R}^{\Sigma^{\star}} to be the formal series recognised by the weighted automaton \mathcal{A} with e_q as final vector, so P_\mathcal{A}(u,q) = (\alpha \cdot \Delta(u))_q, and S_\mathcal{A}(\cdot,q) \in \mathbb{R}^{\Sigma^{\star}} to be the formal series recognised by \mathcal{A} with e_q as initial vector, so S_\mathcal{A}(v,q) = (\Delta(v) \cdot \eta)_q.

We have H_f(u,v) = \mathcal{A}(uv) = \sum_{q \in Q} P_\mathcal{A}(u,q) \cdot S_\mathcal{A}(v,q). This writes H_f as a sum of |Q| terms, each of which is the outer product of a column vector with a row vector, hence has rank at most 1. We conclude by noting that the rank of a sum of matrices is at most the sum of the ranks.
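This decomposition can be checked numerically by restricting both indices of H_f to a finite set of words. A sketch, assuming our illustrative 2-state automaton for f(w) = |w|_a (not part of the proof above) and words of length at most 3:

```python
import itertools
import numpy as np

# Illustrative 2-state automaton computing f(w) = number of 'a's in w.
alpha = np.array([1.0, 0.0])
Delta = {"a": np.array([[1.0, 1.0], [0.0, 1.0]]), "b": np.eye(2)}
eta = np.array([0.0, 1.0])

def mat(word):                      # Delta(a_1 ... a_n)
    m = np.eye(2)
    for letter in word:
        m = m @ Delta[letter]
    return m

words = ["".join(t) for n in range(4)
         for t in itertools.product("ab", repeat=n)]

# P(u, q) = (alpha · Delta(u))_q and S(v, q) = (Delta(v) · eta)_q,
# so the backward-forward decomposition reads H_f = P · S^T.
P = np.array([alpha @ mat(u) for u in words])
S = np.array([mat(v) @ eta for v in words])
H = np.array([[(u + v).count("a") for v in words] for u in words])
```

Here P @ S.T reproduces the finite block of H_f, and its rank is bounded by the number of states.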

## Converse implication: construction of the minimal automaton

We introduce some notation. For X,Y two subsets of \Sigma^{\star}, we let H_f(X,Y) denote the submatrix of H_f restricted to rows in X and columns in Y. Observe the following equality: H_f(Xa,Y) = H_f(X,aY); indeed both matrices have (x,y) entry f(xay).

### The backward automaton

Without loss of generality let us assume that f \neq 0. It follows that the row H_f( {\varepsilon}, \Sigma^{\star} ) is nonzero. We consider the family of vectors given by the rows of H_f. Let X \subseteq \Sigma^{\star} index a basis of this family, so |X| = \textrm{rank}(H_f); thanks to the previous assumption we can choose X with \varepsilon \in X.

We construct an automaton \mathcal{A} as follows.

* The set of states is X.

* The initial vector is e_\varepsilon (recall that \varepsilon \in X).

* The final vector is H_f( X, {\varepsilon} ).

* The transition matrices are defined to satisfy

H_f(X a, \Sigma^{\star}) = \Delta(a) \cdot H_f(X, \Sigma^{\star}).

To see that there exists \Delta(a) such that H_f(X a, \Sigma^{\star}) = \Delta(a) \cdot H_f(X, \Sigma^{\star}), let x \in X. We decompose H_f(xa,\Sigma^{\star}) on the basis \left\{H_f(x',\Sigma^{\star}) \mid x' \in X \right\}: there exist coefficients \Delta(a)(x,x') \in \mathbb{R} such that

H_f(xa,\Sigma^{\star}) = \sum_{x' \in X} \Delta(a)(x,x') \cdot H_f(x',\Sigma^{\star}).
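Concretely, each \Delta(a) can be obtained by solving this linear system. A sketch under two assumptions of ours: f(w) = |w|_a as running example, and columns truncated to words of length at most 3 (enough to determine the coefficients here), with X = \{\varepsilon, a\} indexing a basis of the row space:

```python
import itertools
import numpy as np

def f(w):                            # f(w) = number of 'a's in w
    return w.count("a")

# Truncate the columns Sigma^* to words of length <= 3 (an assumption
# that is harmless here: these columns already determine the rows).
cols = ["".join(t) for n in range(4)
        for t in itertools.product("ab", repeat=n)]

def row(u):                          # the row H_f(u, ·), restricted to cols
    return np.array([f(u + v) for v in cols], dtype=float)

X = ["", "a"]                        # words indexing a basis of the row space
B = np.array([row(x) for x in X])    # H_f(X, ·)

def delta(a):
    # Solve H_f(Xa, ·) = Delta(a) · H_f(X, ·); least squares is exact
    # here since the rows of B are linearly independent.
    target = np.array([row(x + a) for x in X])
    return np.linalg.lstsq(B.T, target.T, rcond=None)[0].T
```

For this example, row(xa) decomposes exactly on the basis rows, so the least-squares solution is the unique matrix of coefficients.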

We now argue that \mathcal{A} recognises f: we show by induction on w that \Delta(w) \cdot \eta = H_f(Xw, {\varepsilon}). The base case w = \varepsilon holds by definition of \eta. For the inductive step:

\Delta(aw) \cdot \eta = \Delta(a) \cdot \Delta(w) \cdot \eta

= \Delta(a) \cdot H_f(Xw, {\varepsilon})

= \Delta(a) \cdot H_f(X, {w})

= H_f(Xa, {w})

= H_f(Xaw, {\varepsilon})

Note that we used the equality H_f(X a, w) = \Delta(a) \cdot H_f(X, w) for all words w \in \Sigma^{\star}, which is the projection on the column w of the equality defining \Delta(a).

It follows that \mathcal{A}(w) = \alpha \cdot \Delta(w) \cdot \eta = e_\varepsilon \cdot H_f(Xw, {\varepsilon}) = f(w), so \mathcal{A} recognises f with |X| = \textrm{rank}(H_f) states. This concludes the proof.
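The backward construction can be sketched end to end. Still assuming our illustrative example f(w) = |w|_a, with basis words X = \{\varepsilon, a\} and columns truncated to length at most 3, we build \alpha = e_\varepsilon, \eta = H_f(X, \{\varepsilon\}) and the matrices \Delta(a), then check that the resulting automaton computes f:

```python
import itertools
import numpy as np

def f(w):                                  # f(w) = number of 'a's in w
    return w.count("a")

cols = ["".join(t) for n in range(4)
        for t in itertools.product("ab", repeat=n)]

def row(u):                                # H_f(u, ·) restricted to cols
    return np.array([f(u + v) for v in cols], dtype=float)

X = ["", "a"]                              # basis words, epsilon first
B = np.array([row(x) for x in X])

alpha = np.array([1.0, 0.0])               # e_epsilon
eta = np.array([float(f(x)) for x in X])   # H_f(X, {epsilon})
Delta = {}
for a in "ab":                             # H_f(Xa, ·) = Delta(a) · H_f(X, ·)
    target = np.array([row(x + a) for x in X])
    Delta[a] = np.linalg.lstsq(B.T, target.T, rcond=None)[0].T

def recognise(word):
    v = alpha
    for letter in word:
        v = v @ Delta[letter]
    return float(v @ eta)
```

The resulting 2-state automaton agrees with f on all words, matching the rank of the Hankel matrix in this example.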

Out of curiosity, we also present a dual construction, which considers columns instead of rows.

### The forward automaton

Since f \neq 0, the column H_f( \Sigma^{\star}, {\varepsilon}) is nonzero. We consider the family of vectors given by the columns of H_f. Let Y \subseteq \Sigma^{\star} index a basis of this family; thanks to the previous assumption we can choose Y with \varepsilon \in Y.

We construct an automaton \mathcal{A} as follows.

* The set of states is Y.

* The initial vector is H_f( {\varepsilon}, Y ).

* The final vector is e_\varepsilon (recall that \varepsilon \in Y).

* The transition matrices are defined to satisfy

H_f(\Sigma^{\star}, a Y) = H_f(\Sigma^{\star}, Y) \cdot \Delta(a).

To see that there exists \Delta(a) such that H_f(\Sigma^{\star}, a Y) = H_f(\Sigma^{\star}, Y) \cdot \Delta(a), let y \in Y. We decompose H_f(\Sigma^{\star}, ay) on the basis \left\{H_f(\Sigma^{\star},y') \mid y' \in Y\right\}: there exist coefficients \Delta(a)(y',y) \in \mathbb{R} such that

H_f(\Sigma^{\star}, ay) = \sum_{y' \in Y} H_f(\Sigma^{\star},y') \cdot \Delta(a)(y',y).

We now argue that \mathcal{A} recognises f: we show by induction on w that \alpha \cdot \Delta(w) = H_f( {\varepsilon}, wY). The base case w = \varepsilon holds by definition of \alpha. For the inductive step:

\alpha \cdot \Delta(wa) = \alpha \cdot \Delta(w) \cdot \Delta(a)

= H_f({\varepsilon}, wY) \cdot \Delta(a)

= H_f({w}, Y) \cdot \Delta(a)

= H_f({wa}, Y)

= H_f({\varepsilon}, waY)

Note that we used the equality H_f(w, a Y) = H_f(w, Y) \cdot \Delta(a) for all words w \in \Sigma^{\star}, which is the projection on the row w of the equality defining \Delta(a). It follows that \mathcal{A}(w) = \alpha \cdot \Delta(w) \cdot \eta = H_f({\varepsilon}, wY) \cdot e_\varepsilon = f(w), so \mathcal{A} recognises f.
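The forward construction admits the same kind of sketch. Assuming again our illustrative example f(w) = |w|_a, with Y = \{\varepsilon, a\} indexing a basis of the column space and rows truncated to words of length at most 3, we solve H_f(\Sigma^{\star}, aY) = H_f(\Sigma^{\star}, Y) \cdot \Delta(a) for each letter and check the automaton:

```python
import itertools
import numpy as np

def f(w):                                   # f(w) = number of 'a's in w
    return w.count("a")

# Truncate the rows Sigma^* to words of length <= 3 (an assumption
# of ours, sufficient to determine the coefficients here).
rows = ["".join(t) for n in range(4)
        for t in itertools.product("ab", repeat=n)]

Y = ["", "a"]                               # basis words, epsilon first
C = np.array([[f(u + y) for y in Y] for u in rows])   # H_f(rows, Y)

alpha = np.array([float(f(y)) for y in Y])  # H_f({epsilon}, Y)
eta = np.array([1.0, 0.0])                  # e_epsilon
Delta = {}
for a in "ab":                              # H_f(·, aY) = H_f(·, Y) · Delta(a)
    target = np.array([[f(u + a + y) for y in Y] for u in rows])
    Delta[a] = np.linalg.lstsq(C, target, rcond=None)[0]

def recognise(word):
    v = alpha
    for letter in word:
        v = v @ Delta[letter]
    return float(v @ eta)
```

Note the symmetry with the backward construction: the roles of \alpha and \eta are exchanged, and \Delta(a) now acts on columns.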

We finish this post with an alternative abstract point of view.

## An abstract construction without choosing a basis

(Thanks to Pierre Ohlmann for working this out with me!)

Instead of constructing a weighted automaton, we build a representation, which is given by a vector space V of finite dimension (over the reals), an initial vector v_0 \in V, a monoid morphism \mu : \Sigma^{\star} \to L(V) into the linear maps on V, and a linear form \lambda : V \to \mathbb{R}. A representation computes the function f : \Sigma^{\star} \to \mathbb{R} defined by f(w) = \lambda(\mu(w)(v_0)).

One can show that through the choice of a basis for V a representation induces a weighted automaton, and conversely.

We consider the vector space of functions \Sigma^{\star} \to \mathbb{R}, which has infinite dimension. For w \in \Sigma^{\star}, we write H_w for the row H_f(w,\cdot) viewed as such a function, so H_w(u) = f(wu). We let V denote the vector space generated by \left\{ H_w \mid w \in \Sigma^{\star} \right\}; its dimension is \textrm{rank}(H_f), which we assume to be finite.

For each letter a \in \Sigma, define a linear map \mu(a) : V \to V on the generators by

\mu(a)(H_w) = H_{wa}.

We first need to show that this is well defined: every linear relation among the generators must be preserved, that is, if \sum_i c_i H_{w_i} = 0 then \sum_i c_i H_{w_i a} = 0. This is easily checked: for every u \in \Sigma^{\star},

\left(\sum_i c_i H_{w_i a}\right)(u) = \sum_i c_i H_{w_i}(au) = 0.

Then we extend \mu into a monoid morphism \mu : \Sigma^{\star} \to L(V), set v_0 = H_\varepsilon, and let \lambda be the evaluation at \varepsilon, which is linear and satisfies \lambda(H_w) = H_w(\varepsilon) = f(w). A straightforward induction on w proves the invariant \mu(w)(v_0) = H_w, so the representation (V, v_0, \mu, \lambda) computes f: indeed \lambda(\mu(w)(v_0)) = \lambda(H_w) = f(w).