Skip to main content
Nathanaël Fijalkow

Angluin's style learning for weighted automata

We show that weighted automata over the reals can be learned efficiently in Angluin's supervised scenario, also called the exact learning framework.

This post uses all the notations of the previous post, and the result presented here was proved in this paper.

The scenario is Angluin’s style learning, also called exact learning: a master knows a function f : \Sigma^{\star} \to \mathbb{R}, and can answer two types of queries.
* membership query: when asked about w \in \Sigma^{\star}, the master replies with f(w)
* equivalence query: when asked whether f = \mathcal{A} for a given weighted automaton, the master replies YES or gives a counter-example.

Theorem: (Beimel, Bergadano, Bshouty, Kushilevitz, Varricchio, 2000)
Weighted automata are efficiently learnable.

The algorithm maintains two sets X,Y of finite words of the same size such that the matrix H\_f(X,Y) is guaranteed to have full rank, and either X is empty or \varepsilon \in X

The first question is how to initialise the data, and then how to maintain it.
For initialisation, we set X = Y = \emptyset.

We now explain how to extend the data. This goes in two steps:
* constructing an automaton and submitting it to the master,
* expanding the matrix using the counter-example given by the master.

We construct a hypothesis automaton. The first step is to fill in H\_f(Xa,Y) and H\_f(X, \left\\{\varepsilon \right\\}) using at most O(n^2) membership queries.
The automaton \mathcal{A} is as follows.
* The set of states is X.
* The initial vector is e_1.
* The final vector is H\_f(X, \left\\{\varepsilon \right\\}).
* The transition is defined to satisfy
H\_f(Xa,Y) = \Delta(a) \cdot H\_f(X,Y).

Equivalently \Delta(a) = H\_f(Xa,y) \cdot H\_f(X,Y)^{-1}.

Note that we do not claim that this automaton satisfies \mathcal{A}(xy) = f(xy) for x \in X, y \in Y.

We submit \mathcal{A} as an equivalence query to the master, and assume that we get in return a word z such that \mathcal{A}(z) \neq f(z).

We say that a word w is correct if \Delta(w) \cdot \eta = H\_f(Xw, \left\\{\varepsilon \right\\}).

We observe that \varepsilon is correct and that z is not correct, since this would imply \mathcal{A}(z) = \alpha \cdot \Delta(z) \cdot \eta = f(z).

Let w such that w is correct and aw is not correct. Such a word can be detected by querying the suffixes of z of increasing length, thanks to the two observations above.

By definition w is correct:

\Delta(w) \cdot \eta = H\_f(Xw, \left\\{\varepsilon \right\\}).

And aw is not correct, so there exists x \in X such that

\Delta(aw) \cdot \eta(x) \neq f(xaw).

Lemma:
H\_f(X \cup \{ xa \}, Y \cup \{ w \}) has full rank.

Since H\_f(X,Y) has full rank, it is enough to show that the last column is linearly independent from the others. Assume towards contradiction that this is not the case, there exists \lambda \in \mathbb{R}^Y such that H\_f(X \cup \{ xa \},\{ w \}) = H\_f(X \cup \{ xa \}, Y) \cdot \lambda.

This implies

\begin{array}{ccc}\Delta(aw) \cdot \eta & = & \Delta(a) \cdot \Delta(w) \cdot \eta \\ & = & \Delta(a) \cdot H\_f(Xw, \{ \varepsilon \}) \\ & = & \Delta(a) \cdot H\_f(X, \{ w \}) \\ & = & \Delta(a) \cdot H\_f(X, Y) \cdot \lambda \\ & = & H\_f(Xa, Y) \cdot \lambda \end{array}

Instantiating in x we get \Delta(aw) \cdot \eta = f(xaw), a contradiction with aw
being not correct.

Note that the lemma implies that xa \notin X and w \notin Y, so we do maintain that the size of X and Y are equal.

How long does the algorithm run?

Let n = \textrm{rank}(H\_f), we claim that the automaton computed when X and Y have size n does compute f. Indeed, if it did not, then we could extend X and Y as the algorithm does, obtaining a submatrix of H\_f of rank n+1, which is impossible.
It follows that the total number of queries is:
* at most n equivalence queries,
* O(n^3) membership queries.

Note that the algorithm we presented here is based on the construction of the backward automaton (see the previous post). A dual algorithm can be constructed using the forward automaton, which is actually what is done in the original paper.

Learning theory, Weighted automata

Weighted automata and matrix factorisations

Read post
Learning theory

Fliess' theorem for minimising weighted automata

Read post
Weighted automata

A polynomial time algorithm for the equivalence problem of weighted automata over a field

Read post
See all posts
See all posts in category

Nathanaël Fijalkow

351, cours de la Libération F-33405 Talence cedex, France

nathanael.fijalkow@gmail.com
How to find my office
made by superskrypt
  • Home
  • Curriculum Vitae
  • Publications
  • Research blog

Nathanaël Fijalkow

nathanael.fijalkow@gmail.com
made by superskrypt