Pages: 38 (9438 words) Published: December 12, 2010
Hierarchical, Perceptron-like Learning for Ontology-Based Information Extraction
Yaoyong Li
University of Sheffield 211 Portobello Street Sheffield, S1 4DP, UK

Kalina Bontcheva
University of Sheffield 211 Portobello Street Sheffield, S1 4DP, UK

yaoyong@dcs.shef.ac.uk

ABSTRACT
Recent work on ontology-based Information Extraction (IE) has tried to use knowledge from the target ontology to improve semantic annotation results. However, very few approaches exploit the ontology structure itself, and those that do have some limitations. This paper introduces a hierarchical learning approach for IE, which uses the target ontology as an essential part of the extraction process by taking into account the relations between concepts. The approach is evaluated on the largest available semantically annotated corpus. The results clearly demonstrate the benefits of using knowledge from the ontology as input to the information extraction process. We also demonstrate the advantages of our approach over other state-of-the-art learning systems on a commonly used benchmark dataset.

Categories and Subject Descriptors: I.2.7 [Natural Language Processing]: Text analysis; I.5.2 [Pattern Recognition]: Classifier design and evaluation.

General Terms: Algorithms; Performance.

Keywords: Ontology-based Information Extraction, semantic annotation, hierarchical learning.

kalina@dcs.shef.ac.uk
Information Extraction (IE), a form of natural language analysis, is becoming a central technology for bridging the gap between unstructured text and formal knowledge expressed in ontologies. Ontology-Based IE (OBIE) is IE adapted specifically for the semantic annotation task. One of the important differences between traditional IE and OBIE is the use of a formal ontology as one of the system's inputs and as the target output.

The main contribution of this paper is in investigating the application of hierarchical classification learning to semantic annotation, as part of an ontology-based IE system. Hierarchical classification takes into account the relations between concepts, thus benefiting directly from the ontology. In particular, this paper studies the large margin hierarchical classification learning algorithm Hieron proposed in [8], because it is very efficient during both training and classification. Computational efficiency is of major importance for OBIE because, depending on the size of the ontology, the system may need to train hundreds of classifiers.

However, it should be noted that the work presented in this paper is not a simple application of the Hieron algorithm as proposed in [8]. In fact, OBIE specifics lead to two important modifications: the introduction of multi-loop learning and a parameter which makes the algorithm applicable to any IE corpus. Both of these resulted in a quantifiable improvement in performance (see Table 6 in Section 5). In addition, semantic annotation is very different from document classification, which is what Hieron was originally developed for. Consequently, another contribution of this work is in showing how OBIE can be decomposed into several hierarchical classification tasks, which can then be approached with an algorithm such as Hieron.

Another important contribution of this work is in the use of the Sekt (footnote 2) ontology-annotated news corpus for evaluation.
To the best of our knowledge, it is the only corpus suitable for evaluating OBIE that has a non-trivial number of classes (146 classes) from an independently created ontology. The problem with using only the Sekt corpus is that no other systems have been evaluated on it, apart from the SVM and Perceptron ones reported here. Unfortunately, other recent corpora for IE evaluation, e.g., the Pascal challenge (footnote 3) and CONLL'03 (footnote 4), are either not fully available or use a
2 For further information on the Sekt project, see http://www.sekt-project.com. The corpus itself is available on request from the second author.
3 http://nlp.shef.ac.uk/pascal/Corpus.html
4...
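
To illustrate how hierarchical classification can take the relations between concepts into account, the following is a minimal sketch and not the Hieron algorithm of [8] itself. It assumes a hypothetical five-concept ontology tree (`PARENT`), scores a concept by summing weight vectors along its path to the root, and applies a Perceptron-like update with a fixed step size (`step`, an assumed parameter) only to the concepts on which the gold and predicted paths differ. Passing over the training data several times (`loops`) is one possible reading of multi-loop learning, not a confirmed detail of the modifications described above.

```python
# Hypothetical toy ontology as child -> parent; "Entity" is the root.
PARENT = {
    "Entity": None,
    "Person": "Entity",
    "Location": "Entity",
    "City": "Location",
    "Country": "Location",
}


def path_to_root(concept):
    """Return the set of concepts on the path from `concept` up to the root."""
    path = set()
    while concept is not None:
        path.add(concept)
        concept = PARENT[concept]
    return path


def tree_distance(a, b):
    """Number of edges between two concepts in the ontology tree."""
    return len(path_to_root(a) ^ path_to_root(b))


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


class HierarchicalPerceptron:
    """Simplified hierarchical Perceptron: one weight vector per concept."""

    def __init__(self, n_features, step=1.0):
        self.w = {c: [0.0] * n_features for c in PARENT}
        self.step = step  # assumed fixed step size, not Hieron's update rule

    def score(self, concept, x):
        # A concept's score sums the weights of all its ancestors and itself.
        return sum(dot(self.w[c], x) for c in path_to_root(concept))

    def predict(self, x):
        # Ties are broken alphabetically so that runs are reproducible.
        return max(sorted(PARENT), key=lambda c: self.score(c, x))

    def update(self, x, gold):
        """Update on one example; return the tree distance of the mistake."""
        pred = self.predict(x)
        if pred != gold:
            gold_path, pred_path = path_to_root(gold), path_to_root(pred)
            # Only concepts where the two paths differ are changed.
            for c in gold_path ^ pred_path:
                sign = 1.0 if c in gold_path else -1.0
                self.w[c] = [wi + sign * self.step * xi
                             for wi, xi in zip(self.w[c], x)]
        return tree_distance(pred, gold)


def train(model, examples, loops=3):
    """Run several passes over the data; return the total loss per pass."""
    return [sum(model.update(x, y) for x, y in examples) for _ in range(loops)]
```

In this sketch, confusing `City` with `Country` costs a tree distance of 2, whereas confusing `City` with `Person` costs 3, so mistakes between closely related concepts touch fewer weight vectors than mistakes between distant ones. A flat classifier would treat both errors identically.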