"#input", "Format of the key"=> "#key", "Format of the output" => "#out"); $navLinks = array("Home" => $rootPath, "Events" => $rootPath . "/events/index.php", "Anaphora Resolution Evaluation" => "/events/ARE/index.php", "Data for task 1" => ""); generateTopDocument("Data for ARE: Task 1"); generateMenu($sideLinks, $navLinks, 0); ?>
Data for task 1

The training data for task 1 can be downloaded from here.

For each text there are two files. The first one ends in -input.xml and is the input text for your program. In the testing stage the input files will be in this format. The second file ends in -key.xml and contains the gold standard. It can be used to measure the accuracy of your programs.

Format of the input

The text in which the anaphoric references need to be resolved is contained in the <text> tag. The pronouns which need to be resolved are in the <anaphoric_pronouns> tag.

The input files have all the entities which can be the antecedent of a pronoun marked using the <entity> tag. Each tag contains a unique ID. This is a snippet from an input text:

      <p>
        <entity id="2">Israeli-PLO relations</entity>
          have hit
        <entity id="3">a new low</entity>
          with
        <entity id="4">the Palestinian Authority</entity>
          saying
        <entity id="5">Israel</entity>
          is wrong to think
        <entity id="6">it</entity>
          can treat
        <entity id="7">the Authority</entity>
          like
        <entity id="8">a client militia</entity>
          .
      </p>
      
The <anaphoric_pronouns> section lists the pronouns which need to be resolved. Each pronoun is indicated by a <pronoun> tag whose id attribute matches the ID of the corresponding entity in the text. The <pronoun> tag also has a value attribute, which is there only to improve legibility and can be ignored.

An example of a list of pronouns to be resolved is:
      <anaphoric_pronouns>
        <pronoun id="6" value=" it"/>
        <pronoun id="91" value=" they"/>
        <pronoun id="83" value=" He"/>
        <pronoun id="159" value=" he"/>
        <pronoun id="167" value=" his"/>
      </anaphoric_pronouns>
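
As an illustration of how an input file might be read, here is a minimal sketch in Python using the standard xml.etree.ElementTree module. The <doc> root element in the sample, and the function names read_input and candidates_before, are assumptions made for this example; only the <text>, <entity>, <anaphoric_pronouns> and <pronoun> tags come from the format described above.

```python
import xml.etree.ElementTree as ET

# Shortened sample; the <doc> wrapper is hypothetical, the real root may differ.
SAMPLE = """<doc>
  <text>
    <p>
      <entity id="5">Israel</entity> is wrong to think
      <entity id="6">it</entity> can treat
      <entity id="7">the Authority</entity> like
      <entity id="8">a client militia</entity>.
    </p>
  </text>
  <anaphoric_pronouns>
    <pronoun id="6" value=" it"/>
  </anaphoric_pronouns>
</doc>"""


def read_input(xml_string):
    """Return (entities, pronoun_ids); entities are (id, text) in document order."""
    root = ET.fromstring(xml_string)
    entities = [
        (e.get("id"), " ".join("".join(e.itertext()).split()))
        for e in root.iter("entity")
    ]
    # The value attribute is ignored, as the format description allows.
    pronoun_ids = [p.get("id") for p in root.iter("pronoun")]
    return entities, pronoun_ids


def candidates_before(entities, pronoun_id):
    """Entities that occur in the text before the given pronoun."""
    ids = [entity_id for entity_id, _ in entities]
    return entities[: ids.index(pronoun_id)]


entities, pronoun_ids = read_input(SAMPLE)
for pid in pronoun_ids:
    print(pid, candidates_before(entities, pid))
```

Restricting the candidates to entities that precede the pronoun mirrors the way the key lists only antecedents occurring before it; how a candidate is then chosen is up to your program.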
    
Format of the key

The key file contains pairs of pronouns and lists of antecedents. The antecedents are entities from the same chain as the pronoun which occur in the text before the pronoun. The pronoun is considered correctly resolved if any of the antecedents from the list is indicated as the entity referred to by the pronoun.

An example of a pair is:
      <pair id="p6">
        <pronoun id="62" value=" it"/>

        <antecedents>
          <antecedent id="4" value=" the Palestinian Authority"/>
          <antecedent id="7" value=" the Authority"/>
          <antecedent id="20" value=" the Authority"/>
          <antecedent id="26" value=" The Authority"/>
          <antecedent id="34" value=" the Authority"/>
          <antecedent id="59" value=" the Authority"/>
        </antecedents>
      </pair>
    
The <pair> tag contains a pronoun and its list of antecedents. The pronoun is indicated by the id attribute in the <pronoun> tag. This id corresponds to an entity in the input text. The <antecedents> tag contains the list of antecedents. Each antecedent is indicated by an <antecedent> tag whose id corresponds to an entity in the input text. The value attribute of the <antecedent> tag is only there to increase the legibility of the XML and will not be used in the evaluation.

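
The rule that a pronoun counts as correctly resolved when the chosen entity is any of the listed antecedents can be sketched as follows. This is not the official scorer: the <key> root element, the function names read_key and accuracy, and the definition of accuracy as the fraction of key pronouns resolved correctly are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Shortened key; the <key> wrapper is hypothetical, the real root may differ.
KEY = """<key>
  <pair id="p6">
    <pronoun id="62" value=" it"/>
    <antecedents>
      <antecedent id="4" value=" the Palestinian Authority"/>
      <antecedent id="7" value=" the Authority"/>
    </antecedents>
  </pair>
</key>"""


def read_key(xml_string):
    """Map each pronoun id to the set of acceptable antecedent ids."""
    gold = {}
    for pair in ET.fromstring(xml_string).iter("pair"):
        pronoun_id = pair.find("pronoun").get("id")
        gold[pronoun_id] = {a.get("id") for a in pair.iter("antecedent")}
    return gold


def accuracy(gold, predictions):
    """Fraction of key pronouns whose predicted antecedent is in the gold list.

    predictions maps pronoun id -> chosen antecedent id; pronouns with no
    prediction count as wrong (an assumption, not a documented rule).
    """
    if not gold:
        return 0.0
    correct = sum(
        1 for pid, antecedents in gold.items()
        if predictions.get(pid) in antecedents
    )
    return correct / len(gold)


gold = read_key(KEY)
print(accuracy(gold, {"62": "7"}))
```

Only the id attributes are compared; the value attributes are never read, in line with the description above.
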
Format of the output

The format of the output is very similar to the format of the key. For each anaphoric pronoun, the output should contain a pair which identifies the pronoun and any one of its antecedents. For example:

      <pair id="p6">
        <pronoun id="62" value=" it"/>
        <antecedent id="4" value=" the Palestinian Authority"/>
      </pair>
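
A program could produce pairs of this shape along the following lines. This is only a sketch: the enclosing <pairs> root element, the sequential p1, p2, ... numbering of pair ids, and the function name build_output are assumptions made for this example and are not specified on this page.

```python
import xml.etree.ElementTree as ET


def build_output(resolutions, root_tag="pairs"):
    """Serialise resolutions as <pair> elements.

    resolutions is a list of tuples
    (pronoun_id, pronoun_text, antecedent_id, antecedent_text).
    root_tag and the p1, p2, ... pair ids are assumptions for illustration.
    """
    root = ET.Element(root_tag)
    for n, (pro_id, pro_text, ant_id, ant_text) in enumerate(resolutions, start=1):
        pair = ET.SubElement(root, "pair", {"id": f"p{n}"})
        ET.SubElement(pair, "pronoun", {"id": pro_id, "value": pro_text})
        ET.SubElement(pair, "antecedent", {"id": ant_id, "value": ant_text})
    return ET.tostring(root, encoding="unicode")


print(build_output([("62", " it", "4", " the Palestinian Authority")]))
```

Each pair carries exactly one <antecedent>, as in the example above; the value attributes are written only to keep the file readable.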