2013 Academic Year Seminars
Speaker(s): Marco Turchi
This work describes the design of an autonomous agent that can teach itself how to translate from a foreign language, by first assembling its own training set, then using it to improve its vocabulary and language model. The key idea is that a Statistical Machine Translation package can be used for the cross-language retrieval task of assembling a training set from a vast amount of available text (e.g. a large multilingual corpus, or the Web) and can then be trained on that data, with the process repeated several times. The stability issues related to such a feedback loop are addressed by a mathematical model connecting statistical and control-theoretic aspects of the system. The agent has been tested on real-world tasks, showing that it can indeed improve its translation performance autonomously and in a stable fashion when seeded with a very small initial training set. The modelling approach that has been developed for this agent is general, and the authors believe it will be useful for an entire class of self-learning autonomous agents working on the Web.
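The retrieve-then-train loop described in the abstract can be illustrated with a toy sketch. This is not the speaker's system: the word-for-word lexicon standing in for a Statistical Machine Translation package, the word-overlap retrieval score, and every name below (`translate`, `retrieve`, `train`, `self_train`, `min_overlap`) are assumptions made purely for illustration.

```python
def translate(sentence, lexicon):
    """Word-by-word translation; unknown words pass through unchanged."""
    return [lexicon.get(word, word) for word in sentence.split()]


def retrieve(foreign, candidates, lexicon, min_overlap):
    """Pick the candidate sharing the most words with the draft translation.

    Toy assumption: translations are word-for-word, so only candidates
    with the same number of words are considered.
    """
    draft = set(translate(foreign, lexicon))
    same_length = [c for c in candidates
                   if len(c.split()) == len(foreign.split())]
    if not same_length:
        return None
    best = max(same_length, key=lambda c: len(draft & set(c.split())))
    return best if len(draft & set(best.split())) >= min_overlap else None


def train(pairs, lexicon):
    """Toy 'training': read word pairs off the retrieved sentence pairs."""
    updated = dict(lexicon)
    for foreign, english in pairs:
        updated.update(zip(foreign.split(), english.split()))
    return updated


def self_train(foreign_pool, english_pool, seed_lexicon,
               rounds=2, min_overlap=2):
    """Alternate between assembling a training set and training on it."""
    lexicon = dict(seed_lexicon)
    for _ in range(rounds):
        remaining = list(english_pool)  # each English sentence is used once
        pairs = []
        for foreign in foreign_pool:
            match = retrieve(foreign, remaining, lexicon, min_overlap)
            if match is not None:
                pairs.append((foreign, match))
                remaining.remove(match)
        lexicon = train(pairs, lexicon)
    return lexicon


# Hypothetical miniature corpus and a two-word seed lexicon.
foreign_pool = ["il gatto dorme", "il gatto mangia pesce", "il cane mangia pesce"]
english_pool = ["the cat sleeps", "the cat eats fish", "the dog eats fish"]
seed = {"il": "the", "gatto": "cat"}

learned = self_train(foreign_pool, english_pool, seed, rounds=2)
```

With this made-up data, one round is not enough to learn a translation for "cane", because the third sentence does not yet overlap sufficiently with any candidate; the second round retrieves it using the vocabulary learned in the first. The `min_overlap` threshold is the toy counterpart of the stability concern raised in the abstract: set too low, wrong pairs enter the training set and are fed back into later rounds.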