Book Reviews

The Oxford Handbook of Computational Linguistics
Ruslan Mitkov (editor) (University of Wolverhampton)
Oxford: Oxford University Press
The Oxford Handbook of Computational Linguistics features thirty-eight articles commissioned from experts all over the world.
Taken together, they provide both a comprehensive introduction to the field and a useful reference volume. In addition to the usual author and subject matter indices, there is a substantial glossary that students will find invaluable.
Each chapter ends with a bibliography, together with tips for further reading and mention of other resources, such as conferences, workshops, and URLs. Part I covers the full spectrum of linguistic levels of analysis from a largely theoretical point of view, including phonology, morphology, lexicography, syntax, semantics, discourse, and dialogue.
The result is a layered approach to the subject matter that allows each new level to take the previous level for granted. However, the authors do not typically restrict themselves to linguistic theory. The phonology and morphology chapters provide fine introductions to these topics, which tend to receive short shrift in many NLP and AI texts.
Part I ends with two chapters, one on formal grammars and one on complexity, which round out the computational aspect. Part II is more task based, with a focus on such activities as text segmentation, part-of-speech tagging, parsing, word sense disambiguation, anaphora resolution, speech recognition, and text generation.
Some of these chapters make the obvious connections to topics in Part I, but others could have done more in this regard.
However, there are many forward references to applications in Part III to which these techniques are pertinent. The levels of treatment accorded to the topics in Part II are perhaps a little mixed, with some being less introductory than others. This is probably fine for a handbook of this kind, although it might limit the usefulness of these chapters for some students.
Each of these chapters provides enough material to get a student started on the conception and planning phases of a segmentation or tagging project.
Middle chapters in Part II concentrate upon technologies to solve somewhat higher-level problems, such as natural language generation, speech recognition, and text-to-speech synthesis.
For example, the cepstral transformation and the Mel scale could be better motivated; neither is formally defined or linked to a glossary entry.
Many students of NLP will not be familiar with these concepts and will not understand their importance in linear prediction and filterbank analyses using hidden Markov models. These fairly specialized topics are then followed by useful chapters on subjects of interest to most computational linguists.
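For readers unfamiliar with the Mel scale mentioned above, the following is a minimal sketch of one widely used conversion between frequency in hertz and mels. The constants 2595 and 700 are a common convention and are an assumption here; they are not taken from the handbook, and other formulations exist. The function names are hypothetical.

```python
import math


def hz_to_mel(frequency_hz: float) -> float:
    """Convert a frequency in Hz to mels using one common convention.

    Assumption: the 2595 * log10(1 + f / 700) formulation; other
    variants of the Mel scale are also in use.
    """
    return 2595.0 * math.log10(1.0 + frequency_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel under the same convention."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


if __name__ == "__main__":
    # Equal steps in Hz correspond to shrinking steps in mels,
    # reflecting coarser pitch resolution at higher frequencies.
    for f in (250.0, 500.0, 1000.0, 2000.0, 4000.0):
        print(f"{f:7.1f} Hz -> {hz_to_mel(f):8.2f} mel")
```

Under this convention the scale is roughly linear at low frequencies and logarithmic at high ones, which is why filterbanks spaced evenly in mels have progressively wider bands in hertz.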
Mooney concentrates on the induction of symbolic representations of knowledge, such as rules and decision trees, in his chapter on machine learning. This focus avoids overlap with more statistical learning methods, such as naive Bayes, and allows room for covering case-based methods, such as nearest-neighbor algorithms.
Measures such as precision and recall are useful yardsticks, but the real issue is, what value does the system deliver to an end user? More specifically, what does the system enable a knowledge worker to do that he or she could not do before? Academic researchers are typically not well placed to either pose or answer such questions, but any purveyor of natural language software must somehow address them.
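As a reminder of what these yardsticks measure, here is a minimal sketch of precision and recall computed over sets of items. The function name and the example data are hypothetical and purely illustrative; nothing here is drawn from the handbook.

```python
def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    """Return (precision, recall) for a set of predicted items
    against a set of gold-standard items.

    Precision: fraction of predicted items that are correct.
    Recall: fraction of gold items that were predicted.
    Empty denominators are reported as 0.0 (a convention chosen here).
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical output of an extraction system versus a gold standard.
    system_output = {"a", "b", "c", "d"}
    gold_standard = {"a", "b", "e"}
    p, r = precision_recall(system_output, gold_standard)
    print(f"precision={p:.2f} recall={r:.2f}")
```

Neither number says anything about whether the two correct items were the ones a user actually needed, which is the gap the paragraph above points to.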
The section on evaluation of mature output components is the most relevant here. McEnery provides an able introduction to corpus linguistics, albeit with a primary focus upon English, and briefly summarizes some of the advances that annotated corpora have enabled.
However, it is clear that the value of ontological approaches has yet to be fully demonstrated and that many of the tools are still in their infancy. Part II ends with a compact and readable overview of lexicalized tree-adjoining grammars by Joshi, which both motivates the formalism and illustrates its power. Part III provides overviews of important areas such as machine translation, information retrieval, information extraction, question answering, and summarization.
These chapters will be particularly attractive to practitioners in these fields, as they provide succinct and realistic overviews of what can and cannot be achieved by current technology.
I confess to having read these chapters first. In fact, it might not be a bad strategy for some readers to dive straight into an application area in which they are particularly interested, and then read other chapters as needed, using the cross-references as a guide.
Machine translation is accorded two chapters, one that discusses the earlier, rule-based approaches and one that deals with more recent, empirical approaches based on parallel corpora. Both chapters give the general reader a good feel for the issues, the strengths and limitations of the various methods, and the kinds of tools that are currently available to assist translators.
In the information retrieval chapter, Tzoukermann, Klavans, and Strzalkowski provide a frank assessment of how little impact natural language processing has had upon current search engine technology, beyond the application of tokenization and stemming rules.
Whether attempting to apply WordNet to query expansion or seeking to disambiguate query terms, researchers have typically either failed to deliver improvements or failed to scale complex solutions to applications of commercial value. They conclude that NLP techniques to date have either been too weak to have a measurable impact or too expensive in terms of effort or computation to be cost-effective. Grishman provides an overview of the work done under the auspices of the Message Understanding Conferences in these areas, as well as an update on machine-learning approaches to the problem of building extraction patterns.
In looking for gaps in the book as a whole, one cannot help noticing that the chapters on ontologies, word senses, and lexical knowledge acquisition by Matsumoto are among the few to touch upon semantic information processing. This is in marked contrast to many AI and NLP collections from the s, in which articles on knowledge representation languages and text interpretation schemes abounded.
Also absent are connectionist models of speech and language, which were perhaps more popular in the s than they are today. These omissions may reflect a new realism in the field, in which the emphasis is now upon methods that are scalable, less knowledge intensive, and more amenable to empirical evaluation. Overall, this is an impressive volume that demonstrates just how far the field has progressed in the last decade.
When one combines the newer corpus-based approaches with continued advances in algorithms and representations in other areas, and then factors in annual increases in computing power and storage capability, one sees a recipe for further successes on hard problems like speech recognition, machine translation, and broad-coverage parsing.
Over the last 20 years, he has published books and papers on expert systems, theorem proving, information extraction, and text categorization. Paul, MN ; e-mail: Jackson Thomson. Com; URL:

Anyone with an interest in the history of computational linguistics will find much to relish and learn from in this weighty collection of articles past.
Lest we forget, MT was one of the first nonnumerical applications proposed for the digital computer following the Second World War, and its often tumultuous year history has had a significant impact on the entire field of computational linguistics. Indeed, this very journal can trace its lineage back to the journal whose original title was Mechanical Translation.
Though not a proper history of MT, Readings in Machine Translation is certainly a historical collection. For this alone, Nirenburg, Somers, and Wilks deserve our gratitude.
The volume begins with the famous memorandum that Warren Weaver sent out to some professional acquaintances in , which is generally taken to mark the genesis of machine translation; and the most recent paper included dates back to the fourth MT Summit in The editors cite three: Well, as criteria go, that certainly sets a high standard!
And yet many of these articles seem to meet it with ease. One reads these papers today, decades after they were written, and one still cannot help but be impressed. Needless to say, not all the articles included in Readings in Machine Translation come up to this high standard; that would be too much to expect. In other cases, one wishes the editors had made more liberal use of their prerogative to abridge.
Another reason for the excessive length of Readings in Machine Translation is that the book is divided into three distinct sections, each under the responsibility of one of the editors.
There are obvious overlaps between these divisions, in the sense that articles included in one section could just as well fit into another. The editors acknowledge this, and in itself it is not very serious.
In his introduction, for example, Nirenburg cites numerous, often lengthy passages from the articles by the early MT pioneers that purportedly support his preferred approach to meaning-based MT. A more serious criticism of Readings in Machine Translation is that the book is somewhat dated.
This is a rather paradoxical charge for a collection of historical articles; what I mean by it is this: In fact, I was sent a preliminary version by the publisher in In the last few years, for example, there has been an impressive resurgence of activity in machine translation, particularly in the United States, where statistical methods drawn from speech recognition and various techniques borrowed from machine learning have proven remarkably successful.
Had the editors been more aware of the profound impact of these new influences on the field, they would perhaps have modified their selection of articles. As it is, only two of the thirty-six papers in the collection explicitly address data-driven or statistical methods in MT: Which brings me to my final criticism of this otherwise wonderful volume. Watson Research Center in the late s that eventually produced the Mark I system, later installed at the U.
And where did the article included in this collection first appear? It would have been so much easier and more helpful to display this information on the first page of each contribution!
Indeed, one wishes the editors had seen fit to include a short introductory note to each article, providing a few words of historical background on the author, or at least his or her affiliation at the time the paper was published. But these are more or less minor quibbles, and they do not significantly detract from the value of this generous volume: