Angluin-style learning for weighted automata
We show that weighted automata over the reals can be learned efficiently in Angluin's supervised scenario, also called the exact learning framework.
This post uses all the notations of the previous post, and the result presented here was proved in this paper.
The scenario is Angluin-style learning, also called exact learning: a master knows a function f : \Sigma^{\star} \to \mathbb{R} and can answer two types of queries.
* membership query: when asked about a word w \in \Sigma^{\star}, the master replies with f(w),
* equivalence query: when asked whether f = \mathcal{A} for a given weighted automaton \mathcal{A}, the master replies YES or gives a counter-example, i.e. a word z such that \mathcal{A}(z) \neq f(z).
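To fix ideas, here is a toy sketch of the master in Python; the names Master, membership_query and equivalence_query are ours, and searching a finite stock of test words stands in for a genuine equivalence check.

```python
from typing import Callable, Optional

class Master:
    """A toy master for a target function f from words to reals."""
    def __init__(self, f: Callable[[str], float], test_words: list[str]):
        self.f = f
        self.test_words = test_words  # stand-in: a real master decides equivalence exactly

    def membership_query(self, w: str) -> float:
        return self.f(w)

    def equivalence_query(self, hypothesis: Callable[[str], float]) -> Optional[str]:
        # returns None for YES, or a counter-example z with hypothesis(z) != f(z)
        for z in self.test_words:
            if abs(hypothesis(z) - self.f(z)) > 1e-9:
                return z
        return None
```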
Theorem: (Beimel, Bergadano, Bshouty, Kushilevitz, Varricchio, 2000)
Weighted automata are efficiently learnable.
The algorithm maintains two sets X, Y of finite words of the same size, such that the matrix H_f(X,Y) has full rank, and either X is empty or \varepsilon \in X.
The first question is how to initialise the data, and then how to maintain it.
For initialisation, we set X = Y = \emptyset.
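Concretely, the maintained data is the matrix H_f(X,Y) of values f(xy) for x \in X and y \in Y, where each entry costs one membership query. The sketches below use a small hypothetical helper for such blocks:

```python
import numpy as np

def hankel_block(f, rows, cols):
    # H_f(rows, cols)[i, j] = f(rows[i] . cols[j]): one membership query per entry
    return np.array([[f(x + y) for y in cols] for x in rows])
```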
We now explain how to extend the data. This goes in two steps:
* constructing an automaton and submitting it to the master,
* expanding the matrix using the counter-example given by the master.
We construct a hypothesis automaton. The first step is to fill in H_f(Xa,Y) for each letter a and H_f(X, \left\{\varepsilon \right\}), using O(n^2) membership queries, where n = |X|.
The automaton \mathcal{A} is as follows.
* The set of states is X.
* The initial vector is e_1, the unit vector of the coordinate of \varepsilon in X.
* The final vector is H_f(X, \left\{\varepsilon \right\}).
* The transition is defined to satisfy
H_f(Xa,Y) = \Delta(a) \cdot H_f(X,Y).
Equivalently \Delta(a) = H_f(Xa,Y) \cdot H_f(X,Y)^{-1}.
Note that we do not claim that this automaton satisfies \mathcal{A}(xy) = f(xy) for x \in X, y \in Y.
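Here is a sketch of this construction, assuming the hankel_block helper above, that \varepsilon is the first element of X, and that H_f(X,Y) is invertible, which the invariant guarantees; build_hypothesis and run are our names.

```python
def build_hypothesis(f, X, Y, alphabet):
    n = len(X)
    H_inv = np.linalg.inv(hankel_block(f, X, Y))   # H_f(X, Y)^{-1}, full rank by invariant
    alpha = np.zeros(n); alpha[0] = 1.0            # initial vector e_1, the row of epsilon
    eta = hankel_block(f, X, [''])[:, 0]           # final vector H_f(X, {epsilon})
    # transitions: Delta(a) = H_f(Xa, Y) . H_f(X, Y)^{-1}
    Delta = {a: hankel_block(f, [x + a for x in X], Y) @ H_inv for a in alphabet}

    def run(w):
        # A(w) = alpha . Delta(w) . eta, with Delta(w) = Delta(w_1) ... Delta(w_k)
        v = eta
        for a in reversed(w):
            v = Delta[a] @ v
        return float(alpha @ v)

    return alpha, Delta, eta, run
```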
We submit \mathcal{A} as an equivalence query to the master, and assume that we get in return a word z such that \mathcal{A}(z) \neq f(z).
We say that a word w is correct if \Delta(w) \cdot \eta = H_f(Xw, \left\{\varepsilon \right\}).
We observe that \varepsilon is correct, since \eta = H_f(X, \{\varepsilon\}) by definition, and that z is not correct: if it were, instantiating at the coordinate of \varepsilon would give \mathcal{A}(z) = \alpha \cdot \Delta(z) \cdot \eta = f(z).
Let w be a suffix of z which is correct and such that aw, the next longer suffix of z, is not correct. Thanks to the two observations above, such a word exists and can be found by checking the suffixes of z in order of increasing length.
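A sketch of this search, in the same conventions as above; checking whether a suffix is correct costs |X| membership queries.

```python
def is_correct(f, X, Delta, eta, w):
    v = eta
    for a in reversed(w):
        v = Delta[a] @ v                            # Delta(w) . eta
    return np.allclose(v, np.array([f(x + w) for x in X]))  # vs H_f(Xw, {epsilon})

def find_breakpoint(f, X, Delta, eta, z):
    # epsilon is correct and z is not, so scanning the suffixes of z by
    # increasing length must cross from correct to incorrect at some point
    for k in range(1, len(z) + 1):
        w, aw = z[len(z) - k + 1:], z[len(z) - k:]
        if is_correct(f, X, Delta, eta, w) and not is_correct(f, X, Delta, eta, aw):
            return aw[0], w                         # the letter a and the word w
    raise AssertionError("z was supposed to be a counter-example")
```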
Since w is correct, by definition:
\Delta(w) \cdot \eta = H_f(Xw, \left\{\varepsilon \right\}).
And aw is not correct, so there exists x \in X such that
(\Delta(aw) \cdot \eta)(x) \neq f(xaw).
Lemma:
H_f(X \cup \{xa\}, Y \cup \{w\}) has full rank.
Since H_f(X,Y) has full rank, it is enough to show that the last column is linearly independent from the others. Assume towards contradiction that this is not the case: then there exists \lambda \in \mathbb{R}^Y such that H_f(X \cup \{xa\}, \{w\}) = H_f(X \cup \{xa\}, Y) \cdot \lambda.
This implies
\begin{array}{ccll}
\Delta(aw) \cdot \eta & = & \Delta(a) \cdot \Delta(w) \cdot \eta & \\
& = & \Delta(a) \cdot H_f(Xw, \{\varepsilon\}) & \text{since } w \text{ is correct} \\
& = & \Delta(a) \cdot H_f(X, \{w\}) & \text{both have entries } f(xw) \text{ for } x \in X \\
& = & \Delta(a) \cdot H_f(X, Y) \cdot \lambda & \text{restricting the assumption to the rows in } X \\
& = & H_f(Xa, Y) \cdot \lambda & \text{by definition of } \Delta(a)
\end{array}
Instantiating at x, we get (\Delta(aw) \cdot \eta)(x) = H_f(\{xa\}, Y) \cdot \lambda = f(xaw), where the last equality is the assumption restricted to the row xa. This contradicts the choice of x, i.e. the fact that aw is not correct.
Note that the lemma implies that xa \notin X and w \notin Y, so we do maintain that X and Y have the same size.
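In code, the extension step locates the offending row x and grows both sets; by the lemma, the enlarged block H_f(X,Y) keeps full rank, so the invariant is preserved. A sketch (extend is our name):

```python
def extend(f, X, Y, Delta, eta, a, w):
    v = eta
    for b in reversed(a + w):
        v = Delta[b] @ v                            # Delta(aw) . eta
    for i, x in enumerate(X):
        if not np.isclose(v[i], f(x + a + w)):      # the row where aw fails
            X.append(x + a)
            Y.append(w)
            return
    raise AssertionError("aw was supposed to be incorrect")
```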
How long does the algorithm run?
Let n = \textrm{rank}(H_f); we claim that the automaton constructed when X and Y have size n computes f. Indeed, if it did not, we could extend X and Y as the algorithm does, obtaining a submatrix of H_f of rank n+1, which is impossible.
It follows that the total number of queries is:
* at most n equivalence queries,
* O(n^3) membership queries.
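Putting everything together, here is a sketch of the whole learner under the assumptions of the previous snippets. The bootstrap when X is empty is one natural reading of the initialisation: the first counter-example z satisfies f(z) \neq 0, so taking X = \{\varepsilon\} and Y = \{z\} yields a non-zero 1 \times 1 matrix, which has full rank.

```python
def learn(master, alphabet):
    X, Y = [], []
    while True:
        if X:
            _, Delta, eta, run = build_hypothesis(master.membership_query, X, Y, alphabet)
        else:
            run = lambda w: 0.0                     # the empty hypothesis
        z = master.equivalence_query(run)
        if z is None:
            return run                              # the hypothesis computes f
        if not X:
            X, Y = [''], [z]                        # bootstrap: H_f(X, Y) = [f(z)] is non-zero
        else:
            a, w = find_breakpoint(master.membership_query, X, Delta, eta, z)
            extend(master.membership_query, X, Y, Delta, eta, a, w)
```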
Note that the algorithm we presented here is based on the construction of the backward automaton (see the previous post). A dual algorithm can be constructed using the forward automaton, which is actually what is done in the original paper.