Abstract
The occurrence of syntactic phenomena such as coordination and subordination is characteristic of long, complex sentences. Text simplification systems need to detect and categorise constituents in order to generate simpler sentences. These constituents are typically bounded or linked by signs of syntactic complexity, which include conjunctions, complementisers, wh-words, and punctuation marks. This paper proposes a supervised tagging approach to classify these signs in accordance with their linking and bounding functions. The performance of the approach is evaluated both intrinsically, using an annotated corpus covering three different genres, and extrinsically, by evaluating the impact of classification errors on an automatic text simplification system. The results are encouraging.
Electronic version
http://clg.wlv.ac.uk/papers/dornescu-RANLP-2013.pdf
BibTeX reference
@inproceedings{dornescu:RANLP:2013,
  author    = {Dornescu, Iustin and Evans, Richard and Orasan, Constantin},
  title     = {{A Tagging Approach to Identify Complex Constituents for Text Simplification}},
  booktitle = {Proceedings of Recent Advances in Natural Language Processing},
  pages     = {221--229},
  address   = {Hissar, Bulgaria},
  year      = {2013}
}