Research Area - Natural Language Processing

In recent years, ILSP has devoted substantial effort to developing basic NLP tools. We have built a battery of reusable language components and resources for text processing and linguistic analysis, including lingware such as tokenizers, stemmers, POS taggers, lemmatizers, named entity recognizers, term extractors, surface syntactic analysers, parsers and computational lexica. Our Science & Technology work was rooted in a highly promising paradigm: data-driven, probabilistic models. These components were integrated into a consistent platform based on a Service-Oriented Architecture and open standards. The platform exposes resources and lingware as services and addresses the needs of a variety of NLP-based applications.

The present decade has seen a blossoming of open NLP tools and research projects. Recent advances in theoretical foundations and language representations are expected to drive progress towards language understanding and to further industrialize the language technology landscape. The convergence of fields ranging from machine learning and cognitive psychology to computational linguistics has pushed for a shift in established approaches. In this context, ILSP has revised its plans and vision to address new challenges and reflect the evolution of technology.

Accomplishments so far:

  • The Greek Dependency Treebank (an annotated resource of 100K words)
  • The Greek Event Annotation Corpus (with annotations for events and time expressions according to the TimeML schema)
  • A data-driven dependency parser for Modern Greek
  • A Timex Recogniser implementing the Greek TimeML codebook
  • Text classifiers based on machine learning techniques
  • Computational semantic resources for semantic role labelling (SRL) based on cognitive semantics
  • Document summarization tools

Current R&D focus:

  • expansion of our existing infrastructure of resources for Greek
  • enhancement of our text processing chain with lingware focusing on coreference resolution, semantic role labelling and spatial expression recognition
  • topic modelling of web data
  • event/fact recognition and spatiotemporal anchoring of events with novel text mining techniques
  • opinion mining and sentiment analysis; opinion summarization, prediction, topic linking and visualization plug-ins