Data for task 4
The training data for task 4 can be downloaded from here.

For each text there are two files. The first one ends in -input.xml and is the input text for your program. In the testing stage of ARE the input files will be in this format. The second file ends in -key.xml and contains the gold standard. It can be used to measure the accuracy of your programs.

Format of the input

For this task the entities which need to be resolved are not indicated. Participants therefore have to identify the referential expressions first and then build the coreferential chains. To facilitate evaluation, the input for this task is not plain text. Every space and punctuation mark in the text is preceded by a <node> tag with a numeric id, so that snippets of text can be easily identified. An example of a text is:

      <p>
        <node id="26"/>
        Japan
        <node id="27"/>
        and
        <node id="28"/>
        Peru
        <node id="29"/>
        on
        <node id="30"/>
        Saturday
        <node id="31"/>
        took
        <node id="32"/>
        a
        <node id="33"/>
        tough
        <node id="34"/>
        stand
        <node id="35"/>
        on
        <node id="36"/>
        rebel
        <node id="37"/>
        demands
        <node id="38"/>
        in
        <node id="39"/>
        the
        <node id="40"/>
        Lima
        <node id="41"/>
        hostage
        <node id="42"/>
        crisis
        <node id="43"/>
        ,
        <node id="44"/>
        but
        <node id="45"/>
        their
        <node id="46"/>
        accord
        <node id="47"/>
        was
        <node id="48"/>
        swiftly
        <node id="49"/>
        rejected
        <node id="50"/>
        by
        <node id="51"/>
        the
        <node id="52"/>
        guerrillas
        <node id="53"/>
        holding
        <node id="54"/>
        72
        <node id="55"/>
        captives
        <node id="56"/>
        in
        <node id="57"/>
        the
        <node id="58"/>
        Japanese
        <node id="59"/>
        ambassador ' s
        <node id="60"/>
        residence
        <node id="61"/>
        .
      </p>

In this example the NP the Japanese ambassador's residence is identified by start position 57 and end position 60 (please note that the end position is not 61!): the end position is the id of the <node> that precedes the last token of the expression. In the same way, the pronoun their has start position 45 and end position 45.
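The position convention described above can be illustrated with a minimal Python sketch. It assumes that node ids are consecutive integers and that the text of each token is the text immediately following its <node> tag. The names SAMPLE, tokens_by_node and span_text are hypothetical helpers, not part of the ARE distribution.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the example paragraph (only nodes 57 to 61).
SAMPLE = """<p>
  <node id="57"/> the
  <node id="58"/> Japanese
  <node id="59"/> ambassador ' s
  <node id="60"/> residence
  <node id="61"/> .
</p>"""


def tokens_by_node(xml_text):
    """Map each <node> id to the text that follows the tag."""
    root = ET.fromstring(xml_text)
    # The token is stored as the "tail" of the empty <node/> element.
    return {int(n.get("id")): (n.tail or "").strip() for n in root.iter("node")}


def span_text(tokens, start, end):
    """Join the tokens from node `start` through node `end`, inclusive."""
    # Assumption: ids are consecutive integers, so range() visits every token.
    return " ".join(tokens[i] for i in range(start, end + 1))


tokens = tokens_by_node(SAMPLE)
print(span_text(tokens, 57, 60))  # the Japanese ambassador ' s residence
```

Because the end position is inclusive, a one-token expression such as a pronoun has the same start and end value.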

Format of the key

The key file used in this task is very similar to the one used in task one. The difference is that referential expressions are no longer identified by IDs but by their start and end positions. The values of the start and end attributes correspond to the IDs of the <node> tags in the input text. The <antecedents> tag contains the list of antecedents. The value attribute is there only to improve the legibility of the XML and is not used in the evaluation process.

      <chain id="c0">
        <node value=" Peru ' s President Alberto Fujimori" start="117" end="120"/>
        <node value=" President Fujimori" start="154" end="155"/>
        <node value=" his" start="157" end="157"/>
        <node value=" Fujimori ' s" start="190" end="190"/>
        <node value=" Fujimori ' s" start="216" end="216"/>
        <node value=" Fujimori" start="305" end="305"/>
        <node value=" Fujimori ' s" start="511" end="511"/>
        <node value=" Fujimori" start="524" end="524"/>
        <node value=" Fujimori ' s" start="634" end="634"/>
        <node value=" Fujimori" start="640" end="640"/>
        <node value=" Fujimori" start="673" end="673"/>
        <node value=" He" start="676" end="676"/>
        <node value=" Fujimori , the son of Japanese immigrants" start="747" end="753"/>
        <node value=" his" start="766" end="766"/>
        <node value=" He" start="777" end="777"/>
      </chain>
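
As a rough illustration of how such a chain might be read, the following Python sketch collects the start and end positions of every expression in each <chain>. It is written under the assumption that chains look exactly like the example above. The root element of a real key file is not shown here, so the code searches for <chain> elements at any depth. KEY_SAMPLE and read_chains are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Two expressions taken from the example chain above.
KEY_SAMPLE = """<chain id="c0">
  <node value=" President Fujimori" start="154" end="155"/>
  <node value=" his" start="157" end="157"/>
</chain>"""


def read_chains(xml_text):
    """Return {chain id: [(start, end), ...]} for every <chain> element."""
    root = ET.fromstring(xml_text)
    chains = {}
    # iter() also yields the root itself when the root is a <chain>.
    for chain in root.iter("chain"):
        chains[chain.get("id")] = [
            (int(n.get("start")), int(n.get("end"))) for n in chain.iter("node")
        ]
    return chains


chains = read_chains(KEY_SAMPLE)
print(chains)  # {'c0': [(154, 155), (157, 157)]}
```

The value attribute is ignored on purpose, since only the positions are used in the evaluation.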
   
Format of the output

The format of the output is the same as the format of the key file.
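
A system could produce one chain in this format along the following lines. This is only a sketch: the function name chain_to_xml and the tuple layout (start, end, text) are assumptions made for illustration, and the element that encloses the chains in a complete file is not covered.

```python
import xml.etree.ElementTree as ET


def chain_to_xml(chain_id, mentions):
    """Serialize one coreference chain as a <chain> element.

    `mentions` is a list of (start, end, text) tuples (a hypothetical layout);
    `text` goes into the value attribute, which is only there for legibility.
    """
    chain = ET.Element("chain", {"id": chain_id})
    for start, end, text in mentions:
        ET.SubElement(
            chain, "node", {"value": text, "start": str(start), "end": str(end)}
        )
    return ET.tostring(chain, encoding="unicode")


output = chain_to_xml("c0", [(154, 155, " President Fujimori"), (157, 157, " his")])
print(output)
```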