Title: Semi-automatic techniques for extending the FrameNet lexical database to new languages
Authors: Tonelli, Sara
Keywords: Natural language processing
Discourse analysis -- Data processing
Computational Linguistics
Issue Date: 2010
Abstract: The topic of this work is the semi-automatic development of FrameNet-like resources for new languages, with a focus on Italian. Our approach aims to exploit the theoretical backbone of the English FrameNet as much as possible, and to find ways to automatically populate the language-dependent part of the database with Italian lexical units and example sentences. The first part of this thesis is devoted to an analysis of FrameNet's theoretical background and to a discussion of ongoing projects for the development of new FrameNets. We also introduce the main natural language processing tasks that can benefit from the integration of frame information. The second part of the thesis is more task-oriented and presents three strategies for the semi-automatic annotation of Italian data with frame information. We start from the fundamental assumption that frames as defined in the English FrameNet can be reused for the semantic analysis of Italian, but we also account for some exceptions to this assumption, which arise from different types of cross-linguistic divergences. Although we focus on Italian, the presented framework can be easily applied to any new language, in part because our experiments were carried out using publicly available multilingual resources such as the Europarl corpus (available in 11 languages), MultiWordNet (5 languages) and Wikipedia (264 languages).
Appears in Collections: Ca' Foscari PhD and MA Theses

Files in This Item:
File: TESI_TONELLI.pdf
Size: 2.83 MB
Format: Adobe PDF
