Please use this identifier to cite or link to this item: http://hdl.handle.net/11707/127
Title: Computational Linguistic Text Processing – Lexicon, Grammar, Parsing and Anaphora Resolution
Authors: Delmonte, Rodolfo
Keywords: Computational Linguistic Text Processing – Lexicon, Grammar, Parsing and Anaphora Resolution
Issue Date: Dec-2008
Publisher: Nova Science Publishers - New York
Abstract: This book is the second in a series of books organized as an experimental exercise: they contain both the theoretical background and the output of the system, GETARUNS, that enacts and applies the theory. The architecture of the system is strictly related to the structure of the books. Thus, we can think of the books as being organized around two scientifically distinct but in fact strictly interrelated fields of research:
• sentence level linguistic phenomena
• text or discourse level linguistic phenomena
The former are to be described by means of grammatical theories; the latter require the intervention of extralinguistic knowledge, i.e. knowledge of the world. Book 1, the current book, addresses sentence grammar, or what is usually referred to as such by theoretical linguists. It does so by dividing up, somewhat ideally and sometimes arbitrarily, what must be computed at sentence level from what need not or cannot be computed at that level and consequently belongs to discourse grammar. In that sense, the subdivision is not a totally arbitrary one, even though overlaps are normal and will be discussed where needed. The book also indirectly makes another (un)intended subdivision: the one between syntax and semantics. Again, it would be impossible not to deal with semantically related issues when talking about syntax or the lexicon. However, Semantics with an uppercase S is only treated in Book 2, already published, where discourse and text level grammar is tackled. In the end, this book deals with lexicon, morphology, tagging, treebanks, parsing, quantifiers and anaphoric or pronominal binding: in other words, all that concerns the level of sentence grammar in a computational environment, i.e. sentence level parsing. Sentence level grammar, as has been claimed in linguistic theories, takes care of all grammatical and linguistic relations that belong to that level.
Knowledge of the world and semantic disambiguation do not interfere with the rules of sentence grammar, and can be thought of as a separate level of computation, provided that the lexicon is structured in such a way as to allow such a subdivision of tasks. For that reason, the first chapter is devoted to the Lexicon and to a linguistically based way to derive principled rules that create lexical entries for Out of Vocabulary words when needed, on the basis of a fixed number of syntactic, semantic and conceptual lexical types. The second chapter presents our work on a treebank of Italian and a conversion algorithm that produces Dependency Structures (almost) effortlessly. The following three chapters (Chapters 3-5), the central part of this book, are concerned with parsing. In these chapters we present first a deep parser, very domain and text limited, and then a less constrained, more versatile scaled version derived from it. The deep parser works only top-down and has the goal of identifying ungrammatical sentences. The other parser works in both directions and produces a semantically complete and consistent DAG-like (directed acyclic graph) representation. Both parsers use the same lexical resources and rule modules. They also both take advantage of a "shallow" parser where tagging and chunking take place. The output of the shallow parser is used to help lookahead mechanisms work appropriately at critical constituency boundaries, that is, to detect where structural ambiguity may arise. Its output can also be used as a final backoff strategy to recover from complete failure. In these three chapters we also develop a comparison with Dependency parsers and discuss their advantages and deficiencies. Then, in Chapter 4, we discuss at length why we regard statistically based parsing as ill-founded and argue in favour of linguistically based symbolic parsing, in which statistics can nevertheless play a minor role.
Finally, in Chapter 5, we present our deep parser and the rule modules. The following chapters deal respectively with quantification (Chapter 6), shallow semantic interpretation (Chapter 7), and pronominally based discourse anaphora (Chapter 8). Chapter 9 shows the system used fruitfully in a shallow version for summarization. We decided to include the system in its various implementations on a CD-ROM attached to the book. GETARUN comes in three versions:
- Version 1. Complete Getarun: performs a complete analysis of a text from tokenization to discourse structure. This version also supports the fully top-down parser for grammaticality checking. It also implements question answering with a generator.
- Version 2. Partial Getarun: also performs a complete analysis but does it in a fully bottom-up version, only checking for broad semantic constraints. It has no temporal reasoning, no logical form and no semantic discourse model. It builds a fully indexed augmented dependency structure which is then used to produce a level of informational structure. This is used to produce discourse relations and discourse structures. It is also used to evaluate entailment relations.
- Version 3. Shallow Getarun: can be used to do sentence extraction on the basis only of tagging and local discourse perusal based on discourse markers. It can also be used to do the same thing on a shallow version of the Partial Getarun, which we called Deep Summarization. It also implements a version of Question Answering in which extracted sentences serve as the best candidate answers.
URI: http://hdl.handle.net/11707/127
ISBN: 978-1-60456-749-6
Appears in Collections: Articles, book chapters by CLS members

Files in This Item:
File           Description          Size     Format
CLTP1tocS.pdf  Book1 in pdf format  4.29 MB  Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.