Presenta
‘Parsing’ is the term used to describe the process of automatically building syntactic analyses of a sentence in terms of a given grammar and lexicon. Theresulting syntactic analyses may be used as input to a process of semantic interpretation, (or perhaps phonological interpretation, where aspects of this, like prosody, are sensitive to syntactic structure). Occasionally, ‘parsing’ is also used to include both syntactic and semantic analysis. We use it in the more conservative sense here, however. In most contemporary grammatical formalisms, theoutput of parsing is something logically equivalent to a tree, displaying dominance and precedence relations between constituents of a sentence, perhaps with further annotations in the form of attribute-value equations (‘features’) capturing other aspects of linguistic description. However, there are many different possible linguistic formalisms, and many ways of representing each of them, andhence many different ways of representing the results of parsing. We shall assume here a simple tree representation, and an underlying context-free grammatical (CFG) formalism. However, all of the algorithms decribed here can usually be used for more powerful unification based formalisms, provided these retain a context-free ‘backbone’, although in these cases their complexity and termination propertiesmay be different. Parsing algorithms are usually designed for classes of grammar rather than tailored towards individual grammars. There are several important properties that a parsing algorithm should have if it is to be practically useful. It should be ‘sound’ with respect to a given grammar and lexicon; that is, it should not assign to an input sentence analyses which cannot arise from thegrammar in question. It should also be ‘complete’; that it, it should assign to an input sentence all the analyses it can have with respect to the current grammar and lexicon. Ideally, the algorithm should also be ‘efficient’, entailing the minimum of computational work consistent with fulfilling the first two requirements, and ‘robust’: behaving in a reasonably sensible way when presented with a sentencethat it is unable to fully analyse successfully. In this discussion we generaly ignore the latter two requirements: to some extent, they are covered by the companion section on ‘chart parsing’. 1
There are several other dimensions on which is useful to characterise the behaviour of parsing algorithms. One can characterise their search strategy in terms of the characteristic alternatives ofdepth first or breadth first (q.v.). Orthogonally, one can characterise them in terms of the direction in which a structure is built: from the words upwards (‘bottom up’), or from the root node downwards (‘top down’). A third dimension is in terms of the sequential processing of input words: usually this is left-to-right, but right-to-left or ‘middle-out’ strategies are also feasible and may bepreferable in some applications (e.g. parsing the output of a speech recogniser).
Top down parsing
We can illustrate a simple top down algorithm for a CFG as follows. Let the grammar contain the following rules (which for simplicity also introduce lexical entries instead of presupposing a separate component for this): S NP VP VP VP Aux Vi Vt --> --> --> --> --> --> --> --> NP VP they | fish Aux VPVi Vt NP can fish can
This will assign to the sentence ‘they can fish’ two distinct analyses corresponding to two interpretations ‘they are able/permitted to fish’ and ‘they put fish in cans’. (It will also generate several other good sentences and lots of odd ones). The top down algorithm is very simple. We begin with the start symbol, in this case, S, and see what rules it figures in as the...
Regístrate para leer el documento completo.