
Finite State Morphology, by Kenneth R. Beesley and Lauri Karttunen (CSLI Publications), is a reference guide to the finite-state computational tools developed by Xerox Corporation in the past decades. Morphological analysers are important NLP tools.


Documentation tools

We publish our documentation with Forrest.

Morphological analysis

The project uses a set of morphological compilers that exist in two versions, the Xerox and the HFST tools. Instead of cascaded rules with intermediate stages and the computational problems they seemed to lead to, rules could be thought of as statements that directly constrain the surface realization of lexical strings. We have made a short introduction in English and a longer document in Norwegian on this topic.

Development tools

They have a generative orientation, viewing surface forms as a realization of the corresponding lexical forms, not the other way around. The existing stemmers have ignored the handling of multi-word expressions and identification of Arabic names.

The first two-level rule compiler was written in InterLisp by Koskenniemi and Karttunen, using Kaplan's implementation of the finite-state calculus [Koskenniemi; Karttunen et al.]. The original implementation was primarily intended for analysis, but the model was in principle bidirectional and could be used for generation. Two-level rules may refer to both sides of the context at the same time. However, the problem is easy to manage in a system that has only two levels.

The four K's discovered that all of them were interested in, and had been working on, the problem of morphological analysis. This was the situation in the spring when Kimmo Koskenniemi came to a conference on parsing that Lauri Karttunen had organized at the University of Texas at Austin.

Linguistic Issues

Although the two-level approach to morphological analysis was quickly accepted as a useful practical method, the linguistic insight behind it was not picked up by mainstream linguists.

A Short History of Two-Level Morphology

But in order to look them up in the lexicon, the system must first complete the analysis. When it first appeared in print [Karttunen et al.


The idea of rules as parallel constraints between a lexical symbol and its surface counterpart was not taken seriously at the time outside the circle of computational linguists. This asymmetry is an inherent property of the generative approach to phonological description.

Two-level rules enable the linguist to refer to the input and the output context in the same constraint.

But none of these systems had a finite-state rule compiler.

Developing a complete finite-state calculus was a challenge in itself on the computers that were available at the time. This is one of the many types of conflicts that the Xerox compiler detects and resolves without difficulty. For example, in Finnish consonant gradation, an intervocalic k generally disappears in the weak grade. Many arguments had been advanced in the literature to show that phonological alternations could not be described or explained adequately without sequential rewrite rules.

One reason for the slow progress may have been that there were persistent doubts about the practicality of the approach for morphological analysis. The constraints can refer to the lexical context, to the surface context, or to both contexts at the same time. Although two-level rules are formally quite different from the rewrite rules studied by Kaplan and Kay, the basic finite-state methods that had been developed for compiling rewrite rules were applicable to two-level rules as well.

It went largely unnoticed that two-level rules could have the same effect as ordered rewrite rules because two-level rules allow the realization of a lexical symbol to be constrained either by the lexical side or by the surface side.
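To make the idea of constraining a realization from either side concrete, here is a minimal sketch in Python. It is not the Xerox compiler's method: it checks hand-written constraints directly on an aligned string of lexical:surface pairs. The two rules (a lexical N surfaces as m before a lexical p, and a lexical p surfaces as m after a surface m), the set of feasible pairs, and the example word are assumptions chosen for illustration.

```python
# Illustrative sketch: two-level rules as parallel constraints over an
# aligned sequence of lexical:surface pairs. Rules, alphabet and example
# words are assumptions, not taken from any particular grammar.

# Feasible pairs: identity pairs plus the three alternations we allow.
FEASIBLE = {(c, c) for c in "kaptmn"} | {("N", "m"), ("N", "n"), ("p", "m")}


def n_rule(pairs, i):
    """N:m <=> _ p:   N is realized as m exactly when a LEXICAL p follows."""
    lex, surf = pairs[i]
    if lex != "N":
        return True
    before_lexical_p = i + 1 < len(pairs) and pairs[i + 1][0] == "p"
    return (surf == "m") == before_lexical_p


def p_rule(pairs, i):
    """p:m <=> :m _   p is realized as m exactly when a SURFACE m precedes."""
    lex, surf = pairs[i]
    if lex != "p":
        return True
    after_surface_m = i > 0 and pairs[i - 1][1] == "m"
    return (surf == "m") == after_surface_m


RULES = (n_rule, p_rule)


def accepts(lexical, surface):
    """True if the aligned pair of strings satisfies every rule at every position."""
    if len(lexical) != len(surface):
        return False
    pairs = list(zip(lexical, surface))
    if any(pair not in FEASIBLE for pair in pairs):
        return False
    # All rules are checked against the same two strings; no rule sees
    # an intermediate form produced by another rule.
    return all(rule(pairs, i) for rule in RULES for i in range(len(pairs)))


if __name__ == "__main__":
    print(accepts("kaNpat", "kammat"))  # True
    print(accepts("kaNpat", "kampat"))  # False: p must be m after surface m
```

The first rule looks at the lexical side of its context and the second at the surface side, yet both are evaluated simultaneously on the same pair of strings.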

Applying the rules in parallel does not in itself solve the overanalysis problem discussed in the previous section.

Koskenniemi and other early practitioners of two-level morphology had to compile their rules by hand into finite-state transducers. It is interesting to note how linguistic fashions have changed. There are of course many other differences. [Figure: A Path in the Lexicon.] In the two-level formalism, the left-arrow part of a rule such as N: [...] The results obtained show that the average accuracy of the enhanced stemmer on the corpus is [...]. These theoretical insights did not immediately lead to practical results.


From a formal point of view there is no substantive difference; a cascade of rewrite rules and a set of parallel two-level constraints are just two different ways to decompose a complex regular relation into a set of simpler relations that are easier to understand and manipulate. It was necessary to make the compiler check for, and automatically eliminate, most common types of conflicts.
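The cascade side of this decomposition can be sketched in a few lines of Python. This is only an illustration under assumed rules: each function stands in for one rewrite rule, the rules are applied in a fixed order, and the list of stages makes the intermediate forms visible. The rules (N becomes m before p, p becomes m after m, remaining N becomes n) and the example word are hypothetical.

```python
import re

# Illustrative sketch of a cascade of ordered rewrite rules. Each rule
# takes the output of the previous one as its input.


def n_to_m_before_p(s):
    # N -> m / _ p
    return re.sub(r"N(?=p)", "m", s)


def p_to_m_after_m(s):
    # p -> m / m _
    return re.sub(r"(?<=m)p", "m", s)


def default_n(s):
    # N -> n elsewhere
    return s.replace("N", "n")


CASCADE = (n_to_m_before_p, p_to_m_after_m, default_n)


def cascade(lexical):
    """Apply the rules in order and return every stage, lexical form first."""
    stages = [lexical]
    for rule in CASCADE:
        stages.append(rule(stages[-1]))
    return stages


if __name__ == "__main__":
    print(cascade("kaNpat"))  # ['kaNpat', 'kampat', 'kammat', 'kammat']
```

The intermediate form `kampat` exists only in the cascade. A set of parallel constraints describing the same lexical-to-surface relation would relate `kaNpat` and `kammat` directly, which is the sense in which the two are different decompositions of one relation.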

In Optimality Theory, cases of this sort are handled by constraint ranking. We used an enhanced stemmer, based on light stemming and a dictionary-based stemming approach, to extract the stems of Arabic words. The Xerox tools are the original ones: they are robust and well documented, and they are freely available for research, but they are not open source. The HFST tools are open source with no restrictions, but they are still quite new, with version numbers like 0.

But the world has changed. In a two-level framework, there is seemingly a problem. Although transducers cannot in general be intersected, Koskenniemi’s constraint transducers can be intersected.
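One way to see why such constraint transducers can be intersected: if every lexical:surface pair is treated as a single symbol, each constraint behaves like an ordinary finite automaton over an alphabet of pairs, and ordinary automata are closed under intersection. The sketch below uses the standard product construction. The class name `PairAutomaton`, the two hand-built automata, and the toy pair alphabet are assumptions made for illustration, not part of any Xerox or HFST API.

```python
# Illustrative sketch: intersecting two constraint automata whose symbols
# are lexical:surface pairs, using the standard product construction.


class PairAutomaton:
    def __init__(self, start, finals, delta):
        self.start = start
        self.finals = set(finals)
        self.delta = dict(delta)  # (state, (lex, surf)) -> next state

    def accepts(self, pairs):
        state = self.start
        for pair in pairs:
            state = self.delta.get((state, pair))
            if state is None:
                return False
        return state in self.finals


def intersect(a, b):
    """Product automaton accepting exactly the pair strings both accept."""
    start = (a.start, b.start)
    delta, finals = {}, set()
    seen, todo = {start}, [start]
    while todo:
        state = todo.pop()
        sa, sb = state
        if sa in a.finals and sb in b.finals:
            finals.add(state)
        for (src, pair), dst_a in a.delta.items():
            if src != sa:
                continue
            dst_b = b.delta.get((sb, pair))
            if dst_b is None:
                continue
            nxt = (dst_a, dst_b)
            delta[(state, pair)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return PairAutomaton(start, finals, delta)


AA, PP, PM, NM, NN = ("a", "a"), ("p", "p"), ("p", "m"), ("N", "m"), ("N", "n")

# Hypothetical constraint A: N is realized as m exactly before a lexical p.
# State 1 = just read N:m (a lexical p must follow), state 2 = just read N:n.
A = PairAutomaton(0, {0, 2}, {
    (0, AA): 0, (0, PP): 0, (0, PM): 0, (0, NM): 1, (0, NN): 2,
    (1, PP): 0, (1, PM): 0,
    (2, AA): 0, (2, NM): 1, (2, NN): 2,
})

# Hypothetical constraint B: p is realized as m exactly after a surface m.
# State 1 = the previous surface symbol was m.
B = PairAutomaton(0, {0, 1}, {
    (0, AA): 0, (0, PP): 0, (0, NN): 0, (0, NM): 1,
    (1, AA): 0, (1, NN): 0, (1, NM): 1, (1, PM): 1,
})

BOTH = intersect(A, B)

if __name__ == "__main__":
    print(BOTH.accepts([AA, NM, PM, AA]))  # True
    print(BOTH.accepts([AA, NM, PP, AA]))  # False: rejected by B
```

The construction relies on both automata reading the same pair symbol at each step, which is why it does not carry over to arbitrary transducers whose two sides can differ in length.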

They are documented in the book referred to on that page (Beesley and Karttunen). We strongly recommend that anyone working on morphological transducers, with either the Xerox or the HFST tools, buy the book. The project manipulates text in many ways, organized in lexicons. But a surface form can typically be generated in more than one way, and the number of possible analyses grows with the number of rules that are involved.

Finite-State Morphology

The semantics of two-level rules were well-defined but there was no rule compiler available at the time. We used an Arabic corpus consisting of ten documents to evaluate the enhanced stemmer. Koskenniemi was not convinced that efficient morphological analysis would ever be practical with generative rules, even if they were compiled into finite-state transducers.