Welsh Natural Language Toolkit Phase 2 (WNLT2)

Background

WNLT2 is a follow-on project (£40k) to the Welsh Government-funded Welsh Natural Language Toolkit (WNLT) project (£39k) under the Welsh-language technology and digital media grant. WNLT2 ran from July 2016 through March 2017, with PI Daniel Cunliffe, co-I Douglas Tudhope and RA Daniel Williams.

The aim of WNLT2 is to extend the accessibility of the WNLT toolkit to a wide range of developers and applications.

WNLT2 further develops the suite of WNLT open source software modules that enable Welsh-language computational linguistic and text mining applications via a set of core Natural Language Processing (NLP) tools.

Objectives

The particular objectives of WNLT2 are to:

  • Develop a standalone version of WNLT with its own GUI, an API and a command line interface
  • Develop a tweet analysis module so that Welsh-language tweets can be analysed
  • Expand the Named Entity Recognition (NER) module to cover a wider range of entities
  • Demonstrate an example NER application over the Papurau Bro community newspapers

Final workshop

A workshop will be held at USW (Trefforest) on 25 May 2017 to discuss the outcomes. The presentations will be available:

Outputs

The modules are written in Java and ‘wrapped’ for execution under the General Architecture for Text Engineering (GATE) framework. They are disseminated under the GNU Lesser General Public License (LGPL).

The WNLT2 modules can be downloaded from SourceForge at
https://sourceforge.net/projects/wnlt-project/

The User Guide can be downloaded at xxxx