Apache OpenNLP 3.0.0-M3

This release focuses on security hardening, new NLP capabilities, and dependency maintenance.

Security Fixes

Three security issues are addressed in this release (also backported to 2.5.9).

XXE in `DictionaryEntryPersistor` (OPENNLP-1819)

The DictionaryEntryPersistor previously used a SAXParserFactory that did not enable secure processing or disable DTD handling, leaving external entity resolution active. A malicious dictionary file could exploit this for local file disclosure or SSRF before any dictionary entry was processed.

The parsing path is now aligned with the project's existing XmlUtil helper, which properly sets FEATURE_SECURE_PROCESSING and disallow-doctype-decl.
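As a rough illustration of the two settings named above, the following sketch configures a JAXP `SAXParserFactory` with secure processing and with DOCTYPE declarations disallowed. It is not the code of `XmlUtil` or of the actual fix; the class `HardenedSaxSketch` and its method names are hypothetical, and the feature URI is the Xerces-style one supported by the JDK's built-in parser.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only; not part of OpenNLP.
final class HardenedSaxSketch {

    // Feature URI understood by the JDK's built-in (Xerces-derived) parser.
    private static final String DISALLOW_DOCTYPE =
            "http://apache.org/xml/features/disallow-doctype-decl";

    static SAXParserFactory newHardenedFactory()
            throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        // Ask the parser to apply its built-in processing limits.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Reject any document that carries a DOCTYPE, which rules out
        // external entity declarations altogether.
        factory.setFeature(DISALLOW_DOCTYPE, true);
        return factory;
    }

    // Returns true if the document parses, false if the parser rejects it.
    static boolean parses(String xml) {
        try {
            SAXParser parser = newHardenedFactory().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXParseException rejected) {
            return false;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Parser setup or I/O failed", e);
        }
    }
}
```

With this configuration, a document that begins with a `<!DOCTYPE ...>` declaration is rejected with a parse error before any entity is resolved, while a plain dictionary-style document without a DOCTYPE parses normally.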

Arbitrary Class Instantiation in `ExtensionLoader` (OPENNLP-1820)

ExtensionLoader.instantiateExtension() performed its isAssignableFrom type check only after Class.forName() had already executed the target class's static initializer. A crafted model archive could therefore trigger the static initializer of any class on the classpath.

The fix introduces a package-prefix allowlist consulted before Class.forName() is invoked:

Classes under opennlp.* remain permitted by default.
Other packages must be opted in via ExtensionLoader.registerAllowedPackage(String) or the OPENNLP_EXT_ALLOWED_PACKAGES system property (comma-separated list).
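The ordering described above, consulting an allowlist before any class is loaded, can be sketched as follows. This is a minimal stand-alone illustration and not the `ExtensionLoader` implementation: the class `AllowlistSketch` and its methods `allowPackage`, `isAllowed`, and `instantiate` are hypothetical, and only the default `opennlp.` prefix and the `OPENNLP_EXT_ALLOWED_PACKAGES` property name are taken from the notes. Loading with `initialize=false` is an extra precaution in this sketch that the release notes do not mention.

```java
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

// Hypothetical stand-in for illustration only; not OpenNLP's ExtensionLoader.
final class AllowlistSketch {

    private static final Set<String> ALLOWED_PREFIXES = new CopyOnWriteArraySet<>();

    static {
        ALLOWED_PREFIXES.add("opennlp.");
        // Comma-separated opt-in list, using the property name from the notes.
        String extra = System.getProperty("OPENNLP_EXT_ALLOWED_PACKAGES", "");
        for (String pkg : extra.split(",")) {
            if (!pkg.trim().isEmpty()) {
                allowPackage(pkg.trim());
            }
        }
    }

    // Store prefixes with a trailing dot so "opennlp" cannot match "opennlpx.Evil".
    static void allowPackage(String pkg) {
        ALLOWED_PREFIXES.add(pkg.endsWith(".") ? pkg : pkg + ".");
    }

    static boolean isAllowed(String className) {
        for (String prefix : ALLOWED_PREFIXES) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    static <T> T instantiate(Class<T> type, String className)
            throws ReflectiveOperationException {
        // 1. Consult the allowlist before touching the class loader.
        if (!isAllowed(className)) {
            throw new SecurityException("Package not allowlisted: " + className);
        }
        // 2. Load without running static initializers (assumption of this sketch).
        Class<?> candidate =
                Class.forName(className, false, AllowlistSketch.class.getClassLoader());
        // 3. Type check, then instantiate.
        if (!type.isAssignableFrom(candidate)) {
            throw new IllegalArgumentException(className + " is not a " + type.getName());
        }
        return type.cast(candidate.getDeclaredConstructor().newInstance());
    }
}
```

In this sketch, a class name outside the allowed prefixes is rejected with a `SecurityException` before `Class.forName()` is reached, so its static initializer never runs.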

OOM via Unbounded Array Allocation in `AbstractModelReader` (OPENNLP-1821)

getOutcomes(), getOutcomePatterns(), and getPredicates() read attacker-controlled 32-bit count fields from binary model streams and passed them directly to array allocations. A crafted .bin file could trigger an immediate OutOfMemoryError and crash the JVM.

Each count is now bounded (default 10,000,000, configurable via -DOPENNLP_MAX_ENTRIES=<n>), with negative or oversized values failing fast via IllegalArgumentException.
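The bounding step can be sketched as below: validate the count read from the stream before using it as an array length. This is an illustration under stated assumptions, not the `AbstractModelReader` code. The class `BoundedCountSketch`, its methods, and the simple "count followed by UTF strings" layout are hypothetical; the default of 10,000,000, the `OPENNLP_MAX_ENTRIES` property name, and the use of `IllegalArgumentException` come from the notes.

```java
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical reader for illustration only; not OpenNLP's AbstractModelReader.
final class BoundedCountSketch {

    static final int DEFAULT_MAX_ENTRIES = 10_000_000;

    // Reads -DOPENNLP_MAX_ENTRIES=<n> if set, otherwise uses the default.
    static int maxEntries() {
        return Integer.getInteger("OPENNLP_MAX_ENTRIES", DEFAULT_MAX_ENTRIES);
    }

    // Reads a 32-bit count and rejects it before any allocation happens.
    static int readBoundedCount(DataInputStream in) throws IOException {
        int count = in.readInt();
        int max = maxEntries();
        if (count < 0 || count > max) {
            throw new IllegalArgumentException(
                    "Entry count " + count + " is outside the range 0.." + max);
        }
        return count;
    }

    // Assumed layout for this sketch: an int count followed by that many UTF strings.
    static String[] readLabels(DataInputStream in) throws IOException {
        String[] labels = new String[readBoundedCount(in)];
        for (int i = 0; i < labels.length; i++) {
            labels[i] = in.readUTF();
        }
        return labels;
    }
}
```

A stream whose count field is `Integer.MAX_VALUE` or negative fails in `readBoundedCount` with an `IllegalArgumentException`, so the oversized array is never requested.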

⚠️ For all three issues, users who cannot upgrade immediately should restrict input (dictionary and model files) to trusted sources only.

New Features & Improvements

RoBERTa-based model support via ONNX — OPENNLP-1518 (#998)
Byte Pair Encoding (BPE) tokenization — OPENNLP-1220 (#1011)
Parse.createFromTokens() convenience method for tokenized input — OPENNLP-53 (#1012)
Thread-safe ME classes by eliminating shared mutable instance state — OPENNLP-1816 (#1003)

What's Changed

Apache OpenNLP 3.0.0-M2 by @mawiesne in #996
OPENNLP-1518: Roberta-based Models - Add support for utilization via ONNX by @rzo1 in #998
OPENNLP-1817: Update log4j2 to 2.25.4 by @dependabot[bot] in #1001
Regenerated NOTICE file after dependency changes by @github-actions[bot] in #1009
OPENNLP-1818: Update zlibsvm-core to 3.0.0 by @dependabot[bot] in #1000
Regenerated NOTICE file after dependency changes by @github-actions[bot] in #1010
OPENNLP-53: Add Parse.createFromTokens() for convenient tokenized input by @mawiesne in #1012
OPENNLP-1220: Add support for Byte Pair Encoding (BPE) by @mawiesne in #1011
Bump com.ruleoftech:markdown-page-generator-plugin from 2.4.2 to 2.4.3 by @dependabot[bot] in #1016
Bump peter-evans/create-pull-request from 8.1.0 to 8.1.1 by @dependabot[bot] in #1014
Bump actions/cache from 5.0.4 to 5.0.5 by @dependabot[bot] in #1017
OPENNLP-1819: Align DictionaryEntryPersistor XML parsing with XmlUtil by @rzo1 in #1019
OPENNLP-1816: Make ME classes thread-safe by eliminating shared mutable instance state by @krickert in #1003
OPENNLP-1822: Update ONNX runtime to 1.25.0 by @dependabot[bot] in #1024
Regenerated NOTICE file after dependency changes by @github-actions[bot] in #1025
OPENNLP-1821: Prevent OutOfMemory Due To Huge Array Allocation by @subbudvk in #1022
OPENNLP-1820: Restrict ExtensionLoader to allowlisted package prefixes by @subbudvk in #1021

New Contributors

@krickert made their first contribution in #1003
@subbudvk made their first contribution in #1022

Full Changelog: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12311215&version=12356813
