Fliess' theorem for minimising weighted automata
We state and prove Fliess' theorem, which says that the minimal automaton recognising a given formal series has size equal to the rank of its Hankel matrix.
A formal series (here over the reals) is a function f : \Sigma^{\star} \to \mathbb{R}, where \Sigma is a fixed finite alphabet. The Hankel matrix of f is a bi-infinite matrix H_f \in \mathbb{R}^{\Sigma^{\star} \times \Sigma^{\star}} defined by
H_f(u,v) = f(uv)
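As a quick illustration (a sketch, not part of the development), one can materialise a finite block of the Hankel matrix in Python with numpy; the series f and the truncation depth 3 below are arbitrary choices, and a finite block only ever gives a lower bound on \textrm{rank}(H_f):

```python
import itertools
import numpy as np

# Illustrative series (our choice): f(w) = number of 'a's in w.
def f(w: str) -> int:
    return w.count("a")

def words_up_to(alphabet: str, n: int):
    """All words over `alphabet` of length at most n, including the empty word."""
    for length in range(n + 1):
        for letters in itertools.product(alphabet, repeat=length):
            yield "".join(letters)

def hankel_block(f, alphabet: str, n: int) -> np.ndarray:
    """Finite block of the Hankel matrix: entry (u, v) is f(uv) for |u|, |v| <= n."""
    ws = list(words_up_to(alphabet, n))
    return np.array([[f(u + v) for v in ws] for u in ws])

print(np.linalg.matrix_rank(hankel_block(f, "ab", 3)))  # 2 for this series
```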
For recognising formal series we use weighted automata \mathcal{A} = (Q, \alpha, (\Delta(a))_{a \in \Sigma}, \eta), where Q is a finite set of states, \alpha \in \mathbb{R}^Q is the initial vector, \Delta(a) \in \mathbb{R}^{Q \times Q} is the transition matrix of the letter a \in \Sigma, and \eta \in \mathbb{R}^Q is the final vector.
Such an automaton recognises the function
\mathcal{A}(a_1 \cdots a_n) = \alpha \cdot \underbrace{\Delta(a_1) \cdots \Delta(a_n)}_{\Delta(a_1 \cdots a_n)} \cdot \eta
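In code, \mathcal{A}(w) is just a product of matrices sandwiched between the initial and final vectors. A minimal sketch in Python, assuming numpy; the 2-state automaton below is our illustrative choice, computing the number of occurrences of a:

```python
import numpy as np

# A 2-state weighted automaton over {a, b} computing f(w) = number of 'a's in w.
alpha = np.array([1.0, 0.0])                        # initial vector
Delta = {"a": np.array([[1.0, 1.0], [0.0, 1.0]]),   # transition matrices
         "b": np.array([[1.0, 0.0], [0.0, 1.0]])}
eta = np.array([0.0, 1.0])                          # final vector

def evaluate(word: str) -> float:
    """A(a_1 ... a_n) = alpha . Delta(a_1) ... Delta(a_n) . eta."""
    v = alpha
    for letter in word:          # accumulate the product left to right
        v = v @ Delta[letter]
    return float(v @ eta)

print(evaluate("abaab"))  # 3.0: the word contains three 'a's
```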
Theorem (Fliess ’74):
* Any automaton recognising f has at least \textrm{rank}(H_f) many states,
* There exists an automaton recognising f with \textrm{rank}(H_f) many states.
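Before moving to the proof, here is a concrete instance (our running illustration, not part of the statement). Over \Sigma = \{a,b\}, let f(w) = \#_a(w) count the occurrences of the letter a. Then
H_f(u,v) = \#_a(uv) = \#_a(u) \cdot 1 + 1 \cdot \#_a(v),
which writes H_f as a sum of two rank-1 matrices, so \textrm{rank}(H_f) \le 2. Moreover the submatrix
H_f(\{\varepsilon, a\}, \{\varepsilon, a\}) = \begin{pmatrix} 0 & 1 \\ 1 & 2 \end{pmatrix}
is invertible, so \textrm{rank}(H_f) = 2: the theorem says that f is recognised by an automaton with two states and by none smaller, matching the 2-state automaton sketched above.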
First implication: the backward-forward decomposition
Let \mathcal{A} be an automaton recognising f.
For q \in Q, let e_q \in \mathbb{R}^Q denote the unit vector supported on q. Define P_\mathcal{A}(\cdot,q) \in \mathbb{R}^{\Sigma^{\star}} as the formal series recognised by the weighted automaton \mathcal{A} with e_q as final vector, and S_\mathcal{A}(\cdot,q) \in \mathbb{R}^{\Sigma^{\star}} as the formal series recognised by \mathcal{A} with e_q as initial vector.
We have H_f(u,v) = \sum_{q \in Q} P_\mathcal{A}(u,q) \cdot S_\mathcal{A}(v,q), since both sides equal \alpha \cdot \Delta(u) \cdot \Delta(v) \cdot \eta = f(uv). This writes H_f as a sum of |Q| matrices, each of which is the outer product of a column vector (indexed by u) with a row vector (indexed by v), hence has rank at most 1. We conclude by noting that the rank of a sum of matrices is at most the sum of the ranks, so any automaton recognising f has at least \textrm{rank}(H_f) states.
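This decomposition is easy to check numerically on the running example: the matrices P and S below stack the vectors \alpha \cdot \Delta(u) and \Delta(v) \cdot \eta over finitely many words, and the truncated Hankel block factors exactly as P \cdot S^T (a sketch assuming numpy, with the truncation depth chosen arbitrarily):

```python
import itertools
import numpy as np

# The 2-state example automaton for f(w) = number of 'a's in w.
alpha = np.array([1.0, 0.0])
Delta = {"a": np.array([[1.0, 1.0], [0.0, 1.0]]), "b": np.eye(2)}
eta = np.array([0.0, 1.0])

def mat(word: str) -> np.ndarray:
    """Delta(a_1 ... a_n), the product of the transition matrices."""
    m = np.eye(2)
    for letter in word:
        m = m @ Delta[letter]
    return m

words = ["".join(p) for n in range(4) for p in itertools.product("ab", repeat=n)]

P = np.array([alpha @ mat(u) for u in words])   # P_A(u, q) = (alpha . Delta(u))(q)
S = np.array([mat(v) @ eta for v in words])     # S_A(v, q) = (Delta(v) . eta)(q)
H = np.array([[alpha @ mat(u + v) @ eta for v in words] for u in words])

assert np.allclose(H, P @ S.T)     # H_f = sum over q of outer products P(.,q) S(.,q)^T
print(np.linalg.matrix_rank(H))    # hence the rank is at most |Q| = 2
```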
Converse implication: construction of the minimal automaton
We introduce some notation. For X, Y two subsets of \Sigma^{\star}, we let H_f(X,Y) denote the submatrix of H_f restricted to rows in X and columns in Y, and we write Xa = \{xa \mid x \in X\} and aY = \{ay \mid y \in Y\}. Observe the following equality: H_f(Xa,Y) = H_f(X,aY), since both matrices have (x,y) entry f(xay).
The backward automaton
Without loss of generality let us assume that f \neq 0. It follows that H_f(\{\varepsilon\}, \Sigma^{\star}) \neq 0, since this row consists of the values f(v). We consider the family of row vectors of H_f. Let X \subseteq \Sigma^{\star} index a basis of the space spanned by this family, so that |X| = \textrm{rank}(H_f); thanks to the previous remark we can assume \varepsilon \in X.
We construct an automaton \mathcal{A} as follows.
* The set of states is X.
* The initial vector is \alpha = e_\varepsilon (recall that \varepsilon \in X).
* The final vector is \eta = H_f(X, \{\varepsilon\}), that is, the vector (f(x))_{x \in X}.
* The transition matrices are defined to satisfy
H_f(X a, \Sigma^{\star}) = \Delta(a) \cdot H_f(X, \Sigma^{\star}).
To see that there exists \Delta(a) such that H_f(X a, \Sigma^{\star}) = \Delta(a) \cdot H_f(X, \Sigma^{\star}), let x \in X. We decompose H_f(xa,\Sigma^{\star}) on the basis \left\{H_f(x',\Sigma^{\star}) \mid x' \in X \right\}: there exist coefficients \Delta(a)(x,x') \in \mathbb{R} such that
H_f(xa,\Sigma^{\star}) = \sum_{x' \in X} \Delta(a)(x,x') \cdot H_f(x',\Sigma^{\star}).
We now argue that \mathcal{A} recognises f: we show by induction that \Delta(w) \cdot \eta = H_f(Xw, \{\varepsilon\}). The base case w = \varepsilon holds by the definition of the final vector. For the inductive step:
\Delta(aw) \cdot \eta = \Delta(a) \cdot \Delta(w) \cdot \eta
= \Delta(a) \cdot H_f(Xw, \{\varepsilon\})
= \Delta(a) \cdot H_f(X, \{w\})
= H_f(Xa, \{w\})
= H_f(Xaw, \{\varepsilon\})
Note that we used the equality H_f(X a, \{w\}) = \Delta(a) \cdot H_f(X, \{w\}) for all words w \in \Sigma^{\star}, which is the projection on the column w of the equality defining \Delta(a).
Projecting the invariant on the coordinate \varepsilon \in X yields \mathcal{A}(w) = e_\varepsilon \cdot \Delta(w) \cdot \eta = H_f(\{w\}, \{\varepsilon\}) = f(w), so \mathcal{A} recognises f with |X| = \textrm{rank}(H_f) states. This concludes the proof.
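The whole backward construction can be run on the finite truncation of the running example. A sketch of ours, assuming numpy; the basis X = \{\varepsilon, a\} and the truncation depth are choices that happen to work here (in general the truncation must be deep enough that the chosen rows remain independent):

```python
import itertools
import numpy as np

def f(w: str) -> int:
    return w.count("a")   # the running example: f(w) = number of 'a's

cols = ["".join(p) for n in range(4) for p in itertools.product("ab", repeat=n)]

def row(u: str) -> np.ndarray:
    """The row H_f(u, .) restricted to the finitely many columns above."""
    return np.array([f(u + v) for v in cols])

X = ["", "a"]    # words indexing a basis of the rows, with the empty word first

# Solve H_f(Xa, .) = Delta(a) . H_f(X, .) by least squares, one letter at a time.
HX = np.array([row(x) for x in X])
Delta = {}
for a in "ab":
    HXa = np.array([row(x + a) for x in X])
    Delta[a] = np.linalg.lstsq(HX.T, HXa.T, rcond=None)[0].T

alpha = np.eye(len(X))[0]               # e_epsilon, since X[0] is the empty word
eta = np.array([f(x) for x in X])       # the final vector H_f(X, {epsilon})

def evaluate(word: str) -> float:
    v = alpha
    for letter in word:
        v = v @ Delta[letter]
    return float(v @ eta)

print(evaluate("abaab"))  # 3.0, so the reconstructed automaton agrees with f
```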
Out of curiosity, we show that there is a dual construction, which considers columns instead of rows.
The forward automaton
Since f \neq 0, the column H_f(\Sigma^{\star}, \{\varepsilon\}), whose entries are the values f(u), is non-zero. We consider the family of column vectors of H_f. Let Y \subseteq \Sigma^{\star} index a basis of the space spanned by this family, so that |Y| = \textrm{rank}(H_f); thanks to the previous remark we can assume \varepsilon \in Y.
We construct an automaton \mathcal{A} as follows.
* The set of states is Y.
* The initial vector is \alpha = H_f(\{\varepsilon\}, Y).
* The final vector is \eta = e_\varepsilon (recall that \varepsilon \in Y).
* The transition matrices are defined to satisfy
H_f(\Sigma^{\star}, a Y) = H_f(\Sigma^{\star}, Y) \cdot \Delta(a).
To see that there exists \Delta(a) such that H_f(\Sigma^{\star}, a Y) = H_f(\Sigma^{\star}, Y) \cdot \Delta(a), let y \in Y. We decompose H_f(\Sigma^{\star}, ay) on the basis \left\{H_f(\Sigma^{\star},y') \mid y' \in Y\right\}: there exist coefficients \Delta(a)(y',y) \in \mathbb{R} such that
H_f(\Sigma^{\star}, ay) = \sum_{y' \in Y} H_f(\Sigma^{\star},y') \cdot \Delta(a)(y',y).
We now argue that \mathcal{A} recognises f: we show by induction that \alpha \cdot \Delta(w) = H_f(\{\varepsilon\}, wY). The base case w = \varepsilon holds by the definition of the initial vector. For the inductive step:
\alpha \cdot \Delta(wa) = \alpha \cdot \Delta(w) \cdot \Delta(a)
= H_f(\{\varepsilon\}, wY) \cdot \Delta(a)
= H_f(\{w\}, Y) \cdot \Delta(a)
= H_f(\{w\}, aY)
= H_f(\{wa\}, Y)
= H_f(\{\varepsilon\}, waY)
Note that we used the equality H_f(\{w\}, a Y) = H_f(\{w\}, Y) \cdot \Delta(a) for all words w \in \Sigma^{\star}, which is the projection on the row w of the equality defining \Delta(a). Projecting the invariant on the coordinate \varepsilon \in Y yields \mathcal{A}(w) = \alpha \cdot \Delta(w) \cdot e_\varepsilon = H_f(\{\varepsilon\}, \{w\}) = f(w), so the forward automaton also recognises f with \textrm{rank}(H_f) states.
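The same sketch, dualised: the forward construction solves H_f(\cdot, aY) = H_f(\cdot, Y) \cdot \Delta(a) on columns instead of rows (same assumptions as before: finite truncation, basis Y = \{\varepsilon, a\} chosen by hand):

```python
import itertools
import numpy as np

def f(w: str) -> int:
    return w.count("a")   # still the running example

rows = ["".join(p) for n in range(4) for p in itertools.product("ab", repeat=n)]

def col(v: str) -> np.ndarray:
    """The column H_f(., v) restricted to the finitely many rows above."""
    return np.array([f(u + v) for u in rows])

Y = ["", "a"]    # words indexing a basis of the columns, with the empty word first

# Solve H_f(., aY) = H_f(., Y) . Delta(a) by least squares, one letter at a time.
HY = np.stack([col(y) for y in Y], axis=1)
Delta = {a: np.linalg.lstsq(HY, np.stack([col(a + y) for y in Y], axis=1), rcond=None)[0]
         for a in "ab"}

alpha = np.array([f(y) for y in Y])     # the initial vector H_f({epsilon}, Y)
eta = np.eye(len(Y))[0]                 # e_epsilon, since Y[0] is the empty word

def evaluate(word: str) -> float:
    v = alpha
    for letter in word:
        v = v @ Delta[letter]
    return float(v @ eta)

print(evaluate("abaab"))  # 3.0 again: the forward automaton also recognises f
```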
We finish this post with an alternative abstract point of view.
An abstract construction without choosing a basis
(Thanks to Pierre Ohlmann for working this out with me!)
Instead of constructing a weighted automaton, we build a representation, which is given by a finite-dimensional vector space V (over the reals), an initial vector v_0 \in V, a monoid morphism \mu : \Sigma^{\star} \to \mathcal{L}(V,V) into the linear maps on V, and a linear form \lambda : V \to \mathbb{R}. A representation computes the function f : \Sigma^{\star} \to \mathbb{R} defined by f(w) = \lambda(\mu(w)(v_0)).
One can show that through the choice of a basis for V a representation induces a weighted automaton, and conversely.
We consider the vector space of functions \Sigma^{\star} \to \mathbb{R}, which has infinite dimension. For w \in \Sigma^{\star} we write H_w for the row of H_f indexed by w, seen as the function defined by H_w(u) = f(wu). We let V denote the subspace generated by \left\{ H_w \mid w \in \Sigma^{\star} \right\}; it has finite dimension since \textrm{rank}(H_f) is finite, and indeed \dim(V) = \textrm{rank}(H_f).
Define, for each letter a \in \Sigma, a map \mu(a) : V \to V on the generators by
\mu(a)(H_w) = H_{wa}.
We first need to show that this is well defined: every linear relation among the generators must be preserved, that is, if \sum_i c_i H_{w_i} = 0 then \sum_i c_i H_{w_i a} = 0. This is easily checked pointwise:
\left( \sum_i c_i H_{w_i a} \right)(u) = \sum_i c_i H_{w_i}(au) = 0.
Then we extend \mu to all words, setting \mu(\varepsilon) = \mathrm{id} and \mu(wa) = \mu(a) \circ \mu(w), take v_0 = H_\varepsilon as initial vector, and let \lambda(H_w) = f(w); this is well defined and linear for the same reason, since \lambda(H_w) = H_w(\varepsilon). Proving by induction the invariant \mu(w)(H_\varepsilon) = H_w, we obtain that the representation (V, v_0, \mu, \lambda) computes the function f.
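To make the abstract construction concrete, consider again the running example f(w) = \#_a(w) (our illustration, not part of the construction). Every row satisfies H_w(u) = \#_a(w) + \#_a(u), hence
H_w = (1 - \#_a(w)) \cdot H_\varepsilon + \#_a(w) \cdot H_a,
so V = \textrm{span}\{H_\varepsilon, H_a\} has dimension 2 = \textrm{rank}(H_f). The shift maps act by
\mu(a)(H_\varepsilon) = H_a, \qquad \mu(a)(H_a) = H_{aa} = 2 H_a - H_\varepsilon, \qquad \mu(b) = \mathrm{id},
and \lambda(H_\varepsilon) = f(\varepsilon) = 0, \lambda(H_a) = f(a) = 1. In coordinates over the basis (H_\varepsilon, H_a), the map \mu(a) acts by right multiplication with exactly the matrix \Delta(a) produced by the backward construction: choosing a basis of V is what turns the representation back into a weighted automaton.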