RGCL Anniversary Highlights, Day 7
Published on Aug 16, 2022 by RGCL.
The BiRD Project
Today’s highlight goes all the way back to the early 2000s, when RGCL secured its first ESRC grant for the newsworthy BiRD project (“An automatic system to Build Resource Databases for researchers”). The aim of BiRD was to process archives of mailing lists and use information extraction to populate a database with information about Natural Language Processing resources. The database was made available to researchers, students, lecturers, project managers, and even industrial companies and funding agencies. The project landed RGCL in the news:
The BiRD project was picked up by the press. Pictured are RGCL members Constantin Orasan, Le An Ha, Catalina Barbu, Ruslan Mitkov, and Richard Evans.
Ruslan Mitkov and Constantin Orasan were instrumental in the bid-writing process, while Richard Evans and Viktor Pekar (pictured below) undertook the research on the project.
A young Richard Evans and Viktor Pekar (Richard arguably looks the same now)
The BiRD project was crucial in raising the visibility and the profile of the Group in its early days, and several of the key members are still part of the Research Group today.
Papers produced by this project include:
- Pekar, V. and Evans, R. (2007). Discovery of Language Resources on the Web: Information Extraction from Heterogeneous Documents. In Journal of Literary and Linguistic Computing, ISSN: 0268-1145, Oxford University Press. DOI: 10.1093/llc/fqm010. (abs issue)
- Pekar, V. (2005). Information Extraction from Email Announcements. In Proceedings of the 10th International Conference on Applications of Natural Language to Information Systems (NLDB-05). Alicante, Spain. (pdf)
- Pekar, V. (2005). Information Extraction from Heterogeneous Documents. In Proceedings of the 2nd Seminar of Languages, Cognition and Information Processing. University of Chongqing, China. (pdf)
- Pekar, V. and Evans, R. (2005). Optimizing the Subtasks in the Double Classification Approach to Information Extraction. In Proceedings of RANLP 2005, Borovetz, Bulgaria, September 2005. (pdf)
- Pekar, V. and Evans, R. (2005). Automatic Discovery of NLP Resources on the Web. In Proceedings of ALLC/ACH 2005, Victoria, Canada, June 2005. (abs)
- Pekar, V., Evans, R. and Mitkov, R. (2004). Categorizing Web Pages as a Preprocessing Step for Information Extraction. In Proceedings of LREC 2004, Lisbon, Portugal. (pdf)
- Evans, R. (2004). Building the Corpus Used in the BiRD Project. Technical Report, March 2004. (pdf)
Additional information about the project can be accessed via the Wayback Machine:
BiRD - BuIlding Research Databases for researchers (archive.org)
BiRD - An automatic system to Build Resource Databases for researchers (archive.org)
The BiRD project logo, designed by Richard Evans