Towards a UNified Algorithm for the Generation of Referring Expressions

 

TUNA was a research project funded by the UK's Engineering and Physical Sciences Research Council (EPSRC). It involves a collaboration between the Department of Computing Science, University of Aberdeen, the Open University, and the University of Tilburg. The project started in October 2003, and ended in Feburary 2007.

Natural Language Generation programs generate text from an underlying Knowledge Base. It can be difficult to find a mapping from the information in the Knowledge Base to the words in a sentence. Difficulties arise, for example, when the Knowledge Base uses `names' (ie, databases keys) that a hearer/reader does not understand. This can happen, for instance, if the Knowledge Base contains an artificial name like `#Jones083', because `Jones' alone is not uniquely distinguishing; it is also true if the Knowledge Base deals with entities for which no names at all are in common usage (eg, a specific tree or a chair). In all such cases, the program has to "invent" a description that enables the reader to identify the referent. In the case of Mr. Jones, for example, the program could give his name and address; in the case of a tree, some longer description may be necessary (eg, `the green oak on the corner of ... and ...'. The technical term for this set of problems is Generation of Referring Expressions (GRE). GRE is a key aspect of almost any Natural Language Generation system. 

Existing GRE algorithms tend to focus on one particular class of referring expressions, for example conjunctions of atomic or relational properties (eg, `the black dog', `the book on the table'). Our research is aimed at designing and implementing a new algorithm for the generation of referring expressions that generates appropriate descriptions in a far greater variety of situations than any of its predecessors. The algorithm will be more complete than its predecessors because it is able to construct a greater variety of descriptions (involving negations, disjunctions, relations, vagueness, etc.). The descriptions generated should also be more appropriate (ie, more natural in the eyes of a human hearer/reader), because the algorithm will be based on empirical studies involving corpora and controlled experiments. Among other things, these empirical studies will address the question under what circumstances the descriptions should be logically under- or over specific; they will also allow us to prune the search space (ie, the space of all descriptions) which would otherwise threaten to make the problem intractable. The project combines (psycho) linguistic, computational and logical challenges and should be of interest to people whose intellectual home is in either of these areas.

 

Project Members

  • Kees van Deemter (PI, University of Aberdeen)
  • Richard Power (Co-Investigator, Open University)
  • Emiel Krahmer (Visiting Fellow, University of Tilburg)
  • Ielka van der Sluis (Post-Doctoral Research Fellow)
  • Albert Gatt (Research student)
  • Sebastian Varges (Post-Doctoral Research Fellow, 2003-2005)

 

Background Reading

Papers that describe some of the technical background to the project (in pdf format):

 

TUNA Publications

Reports

Journal papers

Book chapters

  • van Deemter, K., and Krahmer, E. (2007). Graphs and Booleans. H. Bunt and R. Muskens (eds.), Computing Meaning III. Dordrecht: Kluwer Academic Publishers.

Conference papers

2007

2006

2003-2005

Workshop papers

2007

2003-2006