One of the goals of the CAST project was to develop an annotated corpus for automatic summarisation. The corpus is described in:

  • Laura Hasler, Constantin Orasan and Ruslan Mitkov (2003): Building better corpora for summarisation. In Proceedings of Corpus Linguistics 2003, Lancaster, UK, March, pp. 309-319
If you want to find out more about the corpus you can:
  • read a description of the corpus
  • browse and compare the annotated texts. This page also lets you run term-based summarisation methods on the annotated texts and measure their accuracy using precision and recall.
  • download the corpus
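
As a rough illustration of the precision and recall measures mentioned above, the following Python sketch scores an automatically produced extract against an annotated reference. It assumes that sentences are identified by integer IDs and that the annotation is available as a set of IDs marked as important; the function name and this representation are hypothetical and are not taken from the CAST tools.

```python
def precision_recall(selected, annotated):
    """Return (precision, recall) for an extract.

    selected  -- sentence IDs chosen by the summarisation method (assumed format)
    annotated -- sentence IDs marked as important by annotators (assumed format)
    """
    selected, annotated = set(selected), set(annotated)
    overlap = len(selected & annotated)
    # Precision: share of selected sentences that the annotators also marked.
    precision = overlap / len(selected) if selected else 0.0
    # Recall: share of annotated sentences that the method managed to select.
    recall = overlap / len(annotated) if annotated else 0.0
    return precision, recall


if __name__ == "__main__":
    p, r = precision_recall([1, 2, 3, 4], [2, 4, 5])
    print(f"precision={p:.2f} recall={r:.2f}")
```

Here the method selects four sentences, two of which are in the annotation, so precision is 2/4 and recall is 2/3.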

If you use our corpus, we would like to hear from you.

