Introducing prose v2.0.0: Bringing NLP to Go
A guide to using Go for natural language processing (NLP).
We’re pleased to announce the v2.0.0 release of prose, a natural language processing (NLP) library for Go.
v2.0.0 represents a major shift in the project’s focus: instead of simply offering an assortment of prose-related utilities, we’re focusing on bringing a more refined NLP experience to Go. This means that the development of v1.0.0’s higher-level features (e.g., the title-case converter) will be moved to other repositories going forward.
To avoid breaking code that already imports prose, v2.0.0 will be exposed via gopkg.in/jdkato/prose.v2, allowing github.com/jdkato/prose to still point to v1.0.0.
Among the new features of v2.0.0 is a new, more cohesive API built around Documents.
The document-creation process consists of four steps — tokenization, segmentation, POS tagging, and named-entity extraction — which are discussed in more detail below.
Tokenization
Given a piece of text, tokenization is the task of breaking it up into units referred to as tokens. For example,