www-announce mailing list archives

From Joern Kottmann <jo...@apache.org>
Subject [ANNOUNCE] Apache OpenNLP 1.8.0 Release
Date Fri, 19 May 2017 11:29:00 GMT
The Apache OpenNLP team is pleased to announce the release of version 1.8.0
of Apache OpenNLP.

The Apache OpenNLP library is a machine learning based toolkit for the
processing of natural language text.

It supports the most common NLP tasks, such as tokenization, sentence
segmentation, part-of-speech tagging, named entity extraction, chunking,
parsing, and coreference resolution.

The OpenNLP 1.8.0 binary and source distributions are available for
download from our download page:

The OpenNLP library is distributed by Maven Central as well. See the Maven
Dependency page for more details:

Java 1.8 is required to run OpenNLP, and Maven 3.3.9 is required to build it.

Building from the Source Distribution

To build everything, execute the following command in the root folder:

    mvn clean install

The results of the build will be placed in:
opennlp-distr/target/apache-opennlp-1.8.0-bin.tar.gz (or .zip)

What's new in Apache OpenNLP 1.8.0

This release introduces many new features, improvements and bug fixes. The
API has been improved for better consistency, and many deprecated methods
were removed. Java 1.8 is required.

Additionally the release contains the following noteworthy changes:

- POS Tagger context generator now supports feature generation XML
- Add a Name Finder feature generator that adds POS Tag features
- Add CoNLL-U format support
- Improve default Name Finder settings
- TokenNameFinderEvaluator CLI now supports a nameTypes argument
- Stupid backoff is now the default in NGramLanguageModel
- Language codes now are ISO 639-3 compliant
- Add many unit tests
- Distribution package now includes example parameters file
- Prefix and suffix feature generators are now configurable
- Remove API in Document Categorizer for user specified tokenizer
- Learnable lemmatizer now returns all possible lemmas for a given word and
POS tag
- Lemmatizer API backward compatibility break: lemmas no longer need to be
encoded/decoded; the LemmatizerME lemmatize method now returns the actual lemma
- Add stemmer, detokenizer and sentence detection abbreviations for Irish
- Chunker SequenceValidator signature changed to allow access to both token
and POS tag
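
For readers unfamiliar with the "stupid backoff" scheme mentioned in the list
above, the idea can be sketched in a few lines of plain Java. This is only an
illustration and not the code in NGramLanguageModel: the class name
StupidBackoffSketch, its methods, and the back-off factor of 0.4 (the value
commonly used with this scheme) are assumptions made for the example. A seen
n-gram is scored by its relative frequency; an unseen one falls back to the
shorter n-gram, multiplied by the fixed factor.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; NOT OpenNLP's NGramLanguageModel implementation.
class StupidBackoffSketch {
    // Back-off factor; 0.4 is an assumed, commonly used value.
    static final double ALPHA = 0.4;

    private final Map<List<String>, Integer> counts = new HashMap<>();
    private final int maxOrder;
    private int totalTokens = 0;

    StupidBackoffSketch(int maxOrder) {
        this.maxOrder = maxOrder;
    }

    // Count every n-gram of order 1..maxOrder in one tokenized sentence.
    void add(String... tokens) {
        totalTokens += tokens.length;
        for (int start = 0; start < tokens.length; start++) {
            for (int n = 1; n <= maxOrder && start + n <= tokens.length; n++) {
                List<String> ngram =
                    Arrays.asList(Arrays.copyOfRange(tokens, start, start + n));
                counts.merge(ngram, 1, Integer::sum);
            }
        }
    }

    // Score of the last token of the n-gram given the tokens before it.
    // The result is a relative score, not a normalized probability.
    double score(List<String> ngram) {
        if (ngram.isEmpty() || totalTokens == 0) {
            return 0.0;
        }
        if (ngram.size() == 1) {
            // Base case: unigram relative frequency.
            return counts.getOrDefault(ngram, 0) / (double) totalTokens;
        }
        int full = counts.getOrDefault(ngram, 0);
        if (full > 0) {
            // Seen n-gram: count(n-gram) / count(context).
            int context = counts.get(ngram.subList(0, ngram.size() - 1));
            return full / (double) context;
        }
        // Unseen n-gram: drop the first token and discount by ALPHA.
        return ALPHA * score(ngram.subList(1, ngram.size()));
    }
}
```

With the two sentences "the cat sat" and "the dog sat" added to a model of
order 2, the seen bigram ("the", "cat") scores 1/2, while the unseen bigram
("cat", "dog") backs off to 0.4 times the unigram score of "dog", that is
0.4 * 1/6.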

A detailed list of the issues related to this release can be found in the

Thanks again to all contributors and committers for their help.
