Prodigy + Prose: Radically efficient machine teaching in Go

Joseph Kato
Jul 16, 2018

A quick tutorial on training NER models for the prose library

In this post, we’ll learn how to teach the prose library to recognize a completely new entity label called PRODUCT. This label will represent various brand names such as “Windows 10”.

To do this, we’ll be using an annotated data set produced by Prodigy, “an annotation tool powered by active learning.”

You can read more about the data set we’re using here or you can download it directly here.

Getting Started

The first step is to convert Prodigy’s output into a format that prose can understand. After an annotation session, Prodigy produces a JSON Lines file with one annotation per line (our data set contains a total of 1,800), each in the following format:
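As an illustration, here is a hypothetical record (written for this example, not taken from the actual data set) together with a Go struct that decodes it. The key names (text, spans with start/end/label, and answer) reflect our understanding of Prodigy’s NER output, and we assume end offsets are exclusive. encoding/json silently ignores any other keys a line may carry.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// prodigySpan is one labeled character span in a Prodigy annotation.
// Start and End are character offsets into Text (End assumed exclusive).
type prodigySpan struct {
	Start int    `json:"start"`
	End   int    `json:"end"`
	Label string `json:"label"`
}

// prodigyRecord holds the subset of a Prodigy JSON Lines record we care about.
// Any other keys in the line are ignored by encoding/json.
type prodigyRecord struct {
	Text   string        `json:"text"`
	Spans  []prodigySpan `json:"spans"`
	Answer string        `json:"answer"`
}

// sampleLine is a hypothetical record written for illustration only.
const sampleLine = `{"text": "Is Windows 10 worth the upgrade?", "spans": [{"start": 3, "end": 13, "label": "PRODUCT"}], "answer": "accept"}`

// parseRecord decodes a single JSON Lines entry.
func parseRecord(line string) (prodigyRecord, error) {
	var rec prodigyRecord
	err := json.Unmarshal([]byte(line), &rec)
	return rec, err
}

func main() {
	rec, err := parseRecord(sampleLine)
	if err != nil {
		panic(err)
	}
	// Print each labeled substring alongside its label.
	for _, s := range rec.Spans {
		fmt.Printf("%q -> %s\n", rec.Text[s.Start:s.End], s.Label)
	}
}
```

Running it prints `"Windows 10" -> PRODUCT`, since characters 3 through 12 of the sample text spell out the product name.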

The only keys we’re interested in are text and spans, which we need to populate the data structures required to train our model. More specifically, we need to turn our JSON Lines file into a slice of EntityContext structures.

Since Prodigy’s output and our expected input are so similar, this is fairly straightforward:

Joseph Kato

An open-source software developer with interests in natural language processing, data science, and collaborative writing. More @ https://github.com/jdkato.