Mark Stevenson teaches at the University of Sheffield, where he is a member of the Natural Language Processing group. Next week he will be in Donostia visiting Ixa Taldea, and on Friday he will give a talk that brings together two topics recently covered on this blog:
Talk: Disambiguation of Biomedical Text
Where: Room 3.17, Informatika Fakultatea (3rd floor)
Like text in other domains, biomedical documents contain a range of terms with more than one possible meaning. These ambiguities form a significant obstacle to the automatic processing of these texts. Previous approaches to resolving this problem have made use of a variety of knowledge sources, including the context in which the ambiguous term is used and domain-specific resources (such as UMLS). We compare a range of knowledge sources which have been previously used and introduce a novel one: MeSH terms. The best performance is obtained using linguistic features in combination with MeSH terms. Performance exceeds previously reported results on a standard test set.
Our approach is supervised and therefore relies on annotated training examples. A novel approach to automatically acquiring additional training data, based on the relevance feedback technique from Information Retrieval, is presented. Applying this method to generate additional training examples is shown to lead to a further increase in performance.
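For readers unfamiliar with relevance feedback, here is a rough sketch of how it could be used to pick up extra training examples for one sense of an ambiguous term. The abstract does not say which formulation the talk uses, so this assumes a Rocchio-style query built from already labelled examples. The function names, the weights `beta` and `gamma`, and the toy sentences are all hypothetical and chosen only for illustration.

```python
from collections import Counter
import math


def bow(text):
    """Lowercased bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def centroid(vectors):
    """Mean of a list of sparse vectors (term -> weight)."""
    total = Counter()
    for v in vectors:
        total.update(v)  # adds counts term by term
    return {t: w / len(vectors) for t, w in total.items()}


def rocchio(relevant, non_relevant, beta=0.75, gamma=0.15):
    """Rocchio-style query with no initial query vector (an assumption):
    reward terms from relevant examples, penalise terms from the others,
    and keep only positively weighted terms."""
    pos = centroid(relevant)
    neg = centroid(non_relevant)
    query = {t: beta * pos.get(t, 0.0) - gamma * neg.get(t, 0.0)
             for t in set(pos) | set(neg)}
    return {t: w for t, w in query.items() if w > 0}


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def acquire_examples(labelled, sense, unlabelled, k=1):
    """Return the k unlabelled texts most similar to the query for `sense`."""
    relevant = [bow(t) for t, s in labelled if s == sense]
    others = [bow(t) for t, s in labelled if s != sense]
    query = rocchio(relevant, others)
    ranked = sorted(unlabelled, key=lambda t: cosine(query, bow(t)),
                    reverse=True)
    return ranked[:k]


# Toy data for the ambiguous term "cold" (hypothetical sentences).
labelled = [
    ("patient presented with a cold and cough", "illness"),
    ("the cold virus causes a runny nose and cough", "illness"),
    ("samples were stored in cold temperature conditions", "temperature"),
    ("exposure to cold water lowered body temperature", "temperature"),
]
unlabelled = [
    "a cold with cough and sore throat",
    "cold storage temperature of the samples",
    "gene expression in mice",
]

if __name__ == "__main__":
    for sense in ("illness", "temperature"):
        print(sense, acquire_examples(labelled, sense, unlabelled, k=1))
```

The texts returned for each sense could then be added to the training set with that sense as their label. A real system would presumably use a proper tokeniser, term weighting, and a much larger pool of unlabelled documents.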