Multilingual knowledge acquisition from large-scale resources is a new area in Language Technology. Our group has been working along this line since the European Kyoto project started, and lately it has also been fully engaged in it through the Paths project. Nicolai Erbs, a PhD student visiting us from the University of Darmstadt in Germany (from February to June), will talk to us about this topic: how to automatically extract knowledge from large-scale resources.
Topic: Multilingual acquisition of large scale knowledge resources.
(Basque title: Erauzketa eleanitza tamaina handiko ezagutza-baliabidetan).
Place: Room 3.2, Faculty of Informatics (Informatika Fakultatea)
Speaker: Nicolai Erbs.
Technical University of Darmstadt (Germany)
Date: February 24
Users produce a vast amount of content every day, but because it lacks structure, much of it is overlooked by other users. This talk presents approaches such as keyphrase extraction and link discovery that automatically generate structure for texts, thus making them more readable.
However, these approaches do not tackle the major challenge of word sense disambiguation. Solving this challenge could improve them significantly. For the task of link discovery in particular, named entity disambiguation is a fundamental issue.
The talk introduces Wikipedia as a valuable knowledge repository because it is full of named entities. Nearly all famous (and not quite so famous) people have their own Wikipedia article, and these articles are heavily interconnected (e.g. two actors who appeared in the same movie). These interconnections are represented in Wikipedia articles as links and can be used as input for graph-based named entity disambiguation systems.
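To make the idea of using article links as input for graph-based disambiguation more concrete, here is a minimal sketch in Python. It is not the speaker's system: the link graph, the article titles, the candidate lists and the scoring rule (direct links plus shared link targets) are all hypothetical, chosen only to illustrate how link structure can favour one candidate entity over another.

```python
from typing import Dict, List, Set

# Hypothetical toy link graph: article title -> titles it links to.
# A real system would build this from Wikipedia's link structure.
LINKS: Dict[str, Set[str]] = {
    "Jane Doe (actor)": {"Movie X"},
    "Jane Doe (politician)": {"City Council"},
    "John Roe (actor)": {"Movie X"},
    "Movie X": {"Jane Doe (actor)", "John Roe (actor)"},
    "City Council": {"Jane Doe (politician)"},
}

# Hypothetical candidate articles for each ambiguous mention in a text.
CANDIDATES: Dict[str, List[str]] = {
    "Jane Doe": ["Jane Doe (actor)", "Jane Doe (politician)"],
    "John Roe": ["John Roe (actor)"],
}


def relatedness(a: str, b: str) -> int:
    """Count direct links between two articles plus shared link targets."""
    out_a = LINKS.get(a, set())
    out_b = LINKS.get(b, set())
    direct = int(b in out_a) + int(a in out_b)
    shared = len(out_a & out_b)
    return direct + shared


def disambiguate(candidates: Dict[str, List[str]]) -> Dict[str, str]:
    """Pick, for each mention, the candidate best connected to the others."""
    result: Dict[str, str] = {}
    for mention, cands in candidates.items():

        def score(cand: str) -> int:
            # Sum, over every other mention, the best relatedness
            # to any of that mention's candidates.
            return sum(
                max((relatedness(cand, other) for other in others), default=0)
                for m, others in candidates.items()
                if m != mention
            )

        # On ties, max() keeps the first candidate in the list.
        result[mention] = max(cands, key=score)
    return result


if __name__ == "__main__":
    print(disambiguate(CANDIDATES))
```

In this toy data, the actor reading of "Jane Doe" wins because its article shares a link target ("Movie X") with the only candidate for "John Roe", which mirrors the example of two actors appearing in the same movie.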