Introducing prose v2.0.0: Bringing NLP to Go

Joseph Kato
Jul 16, 2018

A guide to using Go for natural language processing (NLP).

We’re pleased to announce the v2.0.0 release of prose, a natural language processing (NLP) library for Go.

v2.0.0 represents a major shift in the project’s focus: instead of simply offering an assortment of prose-related utilities, we’re focusing on bringing a more refined NLP experience to Go. This means that the development of v1.0.0’s higher-level features (e.g., the title-case converter) will be moved to other repositories going forward.

To avoid breaking code that already imports prose, v2.0.0 will be exposed via gopkg.in/jdkato/prose.v2, which allows github.com/jdkato/prose to keep pointing to v1.0.0.

One of v2.0.0's new features is a more cohesive API built around Documents.

The document-creation process consists of four steps — tokenization, segmentation, POS tagging, and named-entity extraction — which are discussed in more detail below.

Tokenization

Given a piece of text, tokenization is the task of breaking it up into units referred to as tokens. For example,


Written by Joseph Kato

An open-source software developer with interests in natural language processing, data science, and collaborative writing. More @ https://github.com/jdkato.
