Prodigy + Prose: Radically efficient machine teaching in Go

A quick tutorial on training NER models for the prose library

In this post, we’ll learn how to teach the prose library to recognize a completely new entity label called PRODUCT. This label will represent various brand names such as “Windows 10”.

To do this, we’ll be using an annotated data set produced by Prodigy, which is “an annotation tool powered by active learning.”

You can read more about the data set we’re using here or you can download it directly here.

Getting Started

The only keys we’re interested in are text and spans, which we need to populate the data structures required to train our model. More specifically, we need to turn our JSON Lines file into a slice of EntityContext structures.

Since Prodigy’s output and our expected input are so similar, this is fairly straightforward:
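As a rough illustration of that conversion, here is a minimal sketch that uses only the standard library. The EntityContext and LabeledEntity types below are local stand-ins defined for this example, not imports from prose, and the start, end, and label field names inside each span are assumptions about the shape of the annotated records. The parseJSONLines helper is likewise a hypothetical name.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// LabeledEntity is a local stand-in for one annotated span.
// The JSON field names (start, end, label) are assumptions.
type LabeledEntity struct {
	Start int    `json:"start"`
	End   int    `json:"end"`
	Label string `json:"label"`
}

// EntityContext is a local stand-in that keeps only the two keys
// the post cares about: text and spans.
type EntityContext struct {
	Text  string          `json:"text"`
	Spans []LabeledEntity `json:"spans"`
}

// parseJSONLines (hypothetical helper) decodes a JSON Lines payload,
// one JSON object per line, into a slice of EntityContext values.
func parseJSONLines(data []byte) ([]EntityContext, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	var out []EntityContext
	for {
		var rec EntityContext
		err := dec.Decode(&rec)
		if err == io.EOF {
			break // no more records
		}
		if err != nil {
			return nil, fmt.Errorf("decoding record %d: %w", len(out)+1, err)
		}
		out = append(out, rec)
	}
	return out, nil
}

func main() {
	// Two made-up records in the assumed shape.
	sample := []byte(`{"text":"Windows 10 is an operating system.","spans":[{"start":0,"end":10,"label":"PRODUCT"}]}
{"text":"No products are mentioned here.","spans":[]}`)

	records, err := parseJSONLines(sample)
	if err != nil {
		panic(err)
	}
	for _, rec := range records {
		for _, span := range rec.Spans {
			// Offsets index into the record's own text.
			fmt.Printf("%q -> %s\n", rec.Text[span.Start:span.End], span.Label)
		}
	}
}
```

In a real script the byte slice would come from the downloaded file (for example via os.ReadFile), and the decoded records would be copied into the library's own structures before training.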

Training the Model

Here’s the result of running the full script (which can be downloaded here):

$ time go run model.go
Correct (%): 0.822222
75.24s user 0.54s system 58.845 total
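Note that the value printed next to “Correct (%)” is a fraction, so 0.822222 corresponds to roughly 82%. The evaluation code itself isn’t reproduced in this post; the sketch below shows one plausible way such a figure could be computed, assuming each held-out example is counted as correct only when the entities returned by the model exactly match the annotated ones. The example type and both helper functions are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// example is a hypothetical held-out record: the entity strings the
// annotator marked, and the entity strings the model returned.
type example struct {
	Expected  []string
	Predicted []string
}

// sameEntities reports whether two entity lists match exactly,
// element by element and in the same order.
func sameEntities(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// fractionCorrect returns the share of examples (between 0 and 1)
// whose predicted entities match the expected ones.
func fractionCorrect(examples []example) float64 {
	if len(examples) == 0 {
		return 0
	}
	correct := 0
	for _, ex := range examples {
		if sameEntities(ex.Expected, ex.Predicted) {
			correct++
		}
	}
	return float64(correct) / float64(len(examples))
}

func main() {
	// Made-up data: three of the four predictions match.
	heldOut := []example{
		{Expected: []string{"Windows 10"}, Predicted: []string{"Windows 10"}},
		{Expected: nil, Predicted: nil},
		{Expected: []string{"Windows 10"}, Predicted: nil},
		{Expected: []string{"Windows 10"}, Predicted: []string{"Windows 10"}},
	}
	fmt.Printf("Correct (%%): %f\n", fractionCorrect(heldOut))
}
```

Exact matching is a strict criterion: a prediction that finds only part of an annotated span is counted as wrong.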


We can now use this model by loading it from disk:

As you can see, prose correctly labeled Windows 10 with the newly trained label PRODUCT.

While this is an exciting step for the library, we see it as merely the beginning of the kind of NLP functionality we’d like to bring to Go. If you’d like to get involved, head over to the GitHub repository (stars are also highly appreciated!).

