"#input", "Format of the key"=> "#key", "Format of the output" => "#out"); $navLinks = array("Home" => $rootPath, "Events" => $rootPath . "/events/index.php", "Anaphora Resolution Evaluation" => "/events/ARE/index.php", "Data for task 3" => ""); generateTopDocument("Data for ARE: Task 3"); generateMenu($sideLinks, $navLinks, 0); ?>
Data for task 3

The training data for task 3 can be downloaded from here.

For each text there are two files. The first ends in -input.xml and is the input text for your program; in the testing stage of ARE, the input files will be in this format. The second ends in -key.xml and is the gold standard, which can be used to measure the accuracy of your programs.
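
Because each text comes as two files, a program will usually need to match every -input.xml file with its -key.xml counterpart. Below is a minimal Python sketch. It assumes that both files share the same prefix (for a hypothetical text NAME, the files are NAME-input.xml and NAME-key.xml). The function name pair_files is illustrative only:

```python
def pair_files(filenames):
    """Return (input_file, key_file) tuples for every text in `filenames`.

    Assumes a text NAME is stored as NAME-input.xml and NAME-key.xml;
    input files without a matching key file are skipped.
    """
    names = set(filenames)
    pairs = []
    for name in sorted(names):
        if name.endswith("-input.xml"):
            # Swap the suffix to get the expected gold-standard file name.
            key = name[: -len("-input.xml")] + "-key.xml"
            if key in names:
                pairs.append((name, key))
    return pairs
```

For example, pair_files(os.listdir(data_dir)) would list the pairs in a directory, where data_dir stands for wherever the training data has been unpacked.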

Format of the input

For this task, the entities that need to be resolved are not marked. Participants therefore have to identify the referential pronouns first and then resolve them. To facilitate evaluation, the input for this task is not plain text: a <node> tag with a numeric id has been inserted before each word and punctuation mark, so that snippets of text can be identified easily. An example of a text is:

      <p>
        <node id="26"/>
        Japan
        <node id="27"/>
        and
        <node id="28"/>
        Peru
        <node id="29"/>
        on
        <node id="30"/>
        Saturday
        <node id="31"/>
        took
        <node id="32"/>
        a
        <node id="33"/>
        tough
        <node id="34"/>
        stand
        <node id="35"/>
        on
        <node id="36"/>
        rebel
        <node id="37"/>
        demands
        <node id="38"/>
        in
        <node id="39"/>
        the
        <node id="40"/>
        Lima
        <node id="41"/>
        hostage
        <node id="42"/>
        crisis
        <node id="43"/>
        ,
        <node id="44"/>
        but
        <node id="45"/>
        their
        <node id="46"/>
        accord
        <node id="47"/>
        was
        <node id="48"/>
        swiftly
        <node id="49"/>
        rejected
        <node id="50"/>
        by
        <node id="51"/>
        the
        <node id="52"/>
        guerrillas
        <node id="53"/>
        holding
        <node id="54"/>
        72
        <node id="55"/>
        captives
        <node id="56"/>
        in
        <node id="57"/>
        the
        <node id="58"/>
        Japanese
        <node id="59"/>
        ambassador ' s
        <node id="60"/>
        residence
        <node id="61"/>
        .
      </p>
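
Every token follows a <node> tag, so the input can be read with any XML parser by taking the text that comes after each tag. The following minimal sketch uses Python's standard xml.etree.ElementTree. It assumes that the whole -input.xml file is well-formed XML and that each token is the text (the ElementTree tail) after a <node/> tag. The function name read_tokens is hypothetical:

```python
import xml.etree.ElementTree as ET


def read_tokens(xml_text):
    """Map each <node> id to the token that immediately follows it."""
    root = ET.fromstring(xml_text)
    tokens = {}
    for node in root.iter("node"):
        # The token sits in the node's tail, surrounded by layout whitespace.
        tokens[int(node.get("id"))] = (node.tail or "").strip()
    return tokens
```

To read a file from disk directly, the call to ET.fromstring can be replaced by ET.parse(path).getroot().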

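In this example, the NP "the Japanese ambassador's residence" is identified by start position 57 and end position 60. Please note that the end position is 60, not 61: each position is the id of the <node> tag that precedes a token, so the span ends at the node before its last word, "residence". In the same way, the pronoun "their" has start position 45 and end position 45.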
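
In other words, a span covers every token from the node with id start up to and including the node with id end. The short Python sketch below applies this rule. It assumes that the tokens have already been collected into a dictionary keyed by node id and that the ids are consecutive integers, as in the example. The function name span_text is illustrative:

```python
def span_text(tokens, start, end):
    """Return the text of a (start, end) span; both ends are inclusive."""
    return " ".join(tokens[i] for i in range(start, end + 1))


# Tokens copied from the example text above, keyed by <node> id.
example = {45: "their", 57: "the", 58: "Japanese",
           59: "ambassador ' s", 60: "residence", 61: "."}

print(span_text(example, 57, 60))  # the Japanese ambassador ' s residence
print(span_text(example, 45, 45))  # their
```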

Format of the key

The key file used in this task is very similar to the one in task 2. The difference is that the referring expressions and their antecedents are no longer identified by IDs, but by their start and end positions. The key file contains pairs, each made up of a pronoun and a list of antecedents. The antecedents are the entities from the same coreference chain as the pronoun that occur in the text before it. A pronoun is considered correctly resolved if the entity a program indicates as its referent is any of the antecedents in the list.
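
The scoring rule above can be sketched as follows. This is not the official ARE scorer. It assumes that a prediction counts only if both its start and its end exactly match one antecedent in the key; partial overlaps earn no credit here. The function names are hypothetical:

```python
def is_correct(key_antecedents, predicted):
    """True if the predicted (start, end) span is one of the key's antecedents."""
    return predicted in set(key_antecedents)


def count_correct(key, response):
    """Count pronouns whose predicted antecedent appears in the key.

    `key` maps a pronoun span (start, end) to its list of antecedent spans;
    `response` maps a pronoun span to the single antecedent span predicted
    by a program. Pronouns absent from the key are not counted.
    """
    return sum(
        1
        for pronoun, predicted in response.items()
        if pronoun in key and is_correct(key[pronoun], predicted)
    )
```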

An example of a pair is:
      <pair id="p1">
        <pronoun value=" He" start="676" end="676"/>
        <antecedents>
          <antecedent value=" Peru ' s President Alberto Fujimori" start="117" end="120"/>
          <antecedent value=" President Fujimori" start="154" end="155"/>
          <antecedent value=" his" start="157" end="157"/>
          <antecedent value=" Fujimori ' s" start="190" end="190"/>
          <antecedent value=" Fujimori ' s" start="216" end="216"/>
          <antecedent value=" Fujimori" start="305" end="305"/>
          <antecedent value=" Fujimori ' s" start="511" end="511"/>
          <antecedent value=" Fujimori" start="524" end="524"/>
          <antecedent value=" Fujimori ' s" start="634" end="634"/>
          <antecedent value=" Fujimori" start="640" end="640"/>
          <antecedent value=" Fujimori" start="673" end="673"/>
        </antecedents>
      </pair>
    
The <pair> tag contains a pronoun and its list of antecedents. The pronoun is identified by the start and end attributes of the <pronoun> tag, whose values correspond to the IDs of the <node> tags in the input text. The <antecedents> tag contains the list of antecedents. Each antecedent is marked by an <antecedent> tag whose start and end attributes identify it in the input text. The value attribute of the <antecedent> tag only makes the XML more legible and will not be used in the evaluation.

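
Reading the key into a lookup table makes it easy to check a program's answers. The minimal Python sketch below uses xml.etree.ElementTree. It assumes that the key file is well-formed XML containing <pair> elements structured as in the example. The value attributes are skipped because they are not used in the evaluation. The function name read_key is hypothetical:

```python
import xml.etree.ElementTree as ET


def read_key(xml_text):
    """Map each pronoun span (start, end) to its list of antecedent spans."""
    root = ET.fromstring(xml_text)
    key = {}
    # iter() also matches the root itself if the root is a <pair>.
    for pair in root.iter("pair"):
        pron = pair.find("pronoun")
        span = (int(pron.get("start")), int(pron.get("end")))
        key[span] = [
            (int(a.get("start")), int(a.get("end")))
            for a in pair.iter("antecedent")
        ]
    return key
```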
Format of the output

The format of the output is very similar to the format of the key. For each anaphoric pronoun, the output should contain a pair that identifies the pronoun and one of its antecedents. For example:

      <pair id="p6">
        <pronoun value=" He" start="676" end="676"/>
        <antecedent value=" Fujimori" start="640" end="640"/>
      </pair>
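
Output in this format can be produced with Python's xml.etree.ElementTree. This sketch makes some assumptions. The <pairs> root element and the p1, p2, ... numbering of the id attribute are hypothetical choices, because this page does not specify the root element the output file requires. The function name write_output is also illustrative:

```python
import xml.etree.ElementTree as ET


def write_output(resolutions):
    """Serialise resolved pronouns as <pair> elements like the example above.

    `resolutions` is a list of
    (pronoun_span, pronoun_text, antecedent_span, antecedent_text) tuples,
    where each span is a (start, end) pair of <node> ids.
    """
    root = ET.Element("pairs")  # hypothetical wrapper element
    for n, (p_span, p_text, a_span, a_text) in enumerate(resolutions, start=1):
        pair = ET.SubElement(root, "pair", id=f"p{n}")
        ET.SubElement(pair, "pronoun", value=p_text,
                      start=str(p_span[0]), end=str(p_span[1]))
        ET.SubElement(pair, "antecedent", value=a_text,
                      start=str(a_span[0]), end=str(a_span[1]))
    return ET.tostring(root, encoding="unicode")
```

For example, write_output([((676, 676), "He", (640, 640), "Fujimori")]) produces a single pair equivalent to the one shown above, apart from its id.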