TUNA corpus

About the corpus

The TUNA Reference Corpus is a semantically and pragmatically transparent corpus of identifying references to objects in visual domains. It was constructed via an online experiment, and has since been used in a number of evaluation studies on Referring Expressions Generation, as well as in two Shared Tasks: the Attribute Selection for Referring Expressions Generation task (2007), and the Referring Expression Generation task (2008).


Obtaining the TUNA Corpus

A version of the corpus was released for public distribution in October 2009. It forms part of the ELRA Language Resources Catalogue, and can be obtained by contacting ELRA directly. Alternatively, you can download the latest distribution from here.


Annotation and documentation

The following documents describe the annotation procedure and XML format of the corpus:

  1. van der Sluis, I., Gatt, A., and van Deemter, K. (2006). Manual for the TUNA Corpus: Referring expressions in two domains. Technical Report AUCS/TR0705, University of Aberdeen.
  2. Gatt, A., van der Sluis, I., and van Deemter, K. (2008). XML Format Guidelines for the TUNA Corpus. Technical Report, University of Aberdeen.


Publications related to the corpus

These papers describe evaluation studies involving the TUNA Corpus and give further details on the design of the experiment and the annotation.

  1. van Deemter, K., van der Sluis, I. & Gatt, A. (2006). Building a semantically transparent corpus for the generation of referring expressions. Proceedings of the 4th International Conference on Natural Language Generation (Special Session on Data Sharing and Evaluation), INLG-06.
  2. Gatt, A., van der Sluis, I. & van Deemter, K. (2007). Assessing algorithms for the generation of referring expressions, using a semantically and pragmatically transparent corpus.
  3. van der Sluis, I., Gatt, A. & van Deemter, K. (2007). Evaluating algorithms for the generation of referring expressions: Going beyond toy domains.
  4. Gatt, A. & van Deemter, K. (2007). Incremental generation of plural descriptions: Similarity and partitioning.
  5. Gatt, A., van der Sluis, I. & van Deemter, K. (2007). Corpus-based evaluation of referring expressions generation. Workshop on Shared Tasks and Evaluation in NLG, Arlington, Virginia.