Rule Authoring 101

A guide to creating regex-based rules for Vale and Vale Server.

An example rule from Vale’s style for the Microsoft Writing Style Guide.

Overview

In this post, we’ll take a look at the process of writing new rules for Vale (the command-line tool) and Vale Server (the desktop application). The goal of this post is to supplement the existing documentation by covering possible areas of confusion and potential “gotchas.”

But first, we need to define a few terms that you’ll encounter throughout this post and other documentation:

  • Vale: An open-source, command-line tool that brings code-like linting to prose.
  • Check: Vale’s functionality is exposed through extensible “checks” that perform abstract tasks such as checking the length of certain text segments (such as sentences and paragraphs) or searching for a particular token in a text document.
  • Rule: A “rule” is a check that has been given a concrete task — for example, ensuring that sentences contain no more than 25 words.
  • Style: A “style” consists of multiple rules that come together to enforce the guidelines of a certain organization, guide, or application. Within a given style, rules are assigned an ID of the form <style name>.<rule name>.
  • StylesPath: A directory containing all of a user’s styles.
  • Vale Server: A commercial desktop application built on top of Vale that brings a refined experience for individual writers, including a fully managed StylesPath and multiple third-party app plugins.

Topic 1: Overcoming the lack of look-around assertions

Vale is written in Go and uses the standard regexp package to evaluate all of its regular expressions. In general, this should be inconsequential to rule authors: regexp supports the same syntax as Google’s RE2 library, which is similar to the regular-expression syntax of most scripting languages.

The one exception is that regexp, like the aforementioned RE2 library, is non-backtracking:

As a matter of principle, RE2 does not support constructs for which only backtracking solutions are known to exist. Thus, backreferences and look-around assertions are not supported.

Source: the RE2 wiki page “WhyRE2”
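You can see this limitation directly in Go. The sketch below asks the regexp package to compile a few Perl-style patterns; the look-around and backreference patterns are rejected with a parse error, while the plain pattern compiles. The helper name compiles is our own, and the patterns are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's RE2-based regexp package accepts pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	patterns := []string{
		`dialog(?= box)`,  // positive look-ahead
		`dialog(?! box)`,  // negative look-ahead
		`(?<=the )dialog`, // positive look-behind
		`(\w+) \1`,        // backreference
		`dialog box`,      // plain pattern
	}
	for _, p := range patterns {
		if _, err := regexp.Compile(p); err != nil {
			fmt.Printf("%-18s rejected: %v\n", p, err)
		} else {
			fmt.Printf("%-18s ok\n", p)
		}
	}
}
```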

While this may seem like a significant loss (especially if we ignore the associated performance guarantee: matching time that grows linearly with the size of the input), there are a number of tricks to overcome the limitation.

Positive look-arounds

The first kind of look-around is the positive look-ahead and look-behind assertion:

  • look-ahead (x(?=y)): “I want to match x only if it precedes y without matching y.”
  • look-behind ((?<=y)x): “I want to match x only if it is preceded by y without matching y.”

For Vale’s purposes, it’s rare to encounter a case that requires matching x without y, but it’s possible by defining a message without any format specifiers (%s):

This is definitely a contrived example, but it illustrates the point: we’re matching x = dialog only if it precedes y = box without including y in our message to the user.
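Outside of Vale’s YAML rules, the general regex technique is to match x and y together and wrap x in a capturing group, then report only the group’s offsets. The Go sketch below applies that idea to both positive look-ahead (x = dialog, y = box) and positive look-behind (y = the, x = dialog). It illustrates the technique with Go’s regexp package; it is not a description of how Vale itself reports or highlights matches, and the helper groupSpans and the sample sentence are our own.

```go
package main

import (
	"fmt"
	"regexp"
)

// Emulates dialog(?= box): match "dialog box", but capture only "dialog".
var lookAhead = regexp.MustCompile(`\b(dialog) box\b`)

// Emulates (?<=the )dialog: match "the dialog", but capture only "dialog".
var lookBehind = regexp.MustCompile(`\bthe (dialog)\b`)

// groupSpans returns the byte offsets of capture group 1 for every match.
func groupSpans(re *regexp.Regexp, text string) [][2]int {
	var spans [][2]int
	for _, m := range re.FindAllStringSubmatchIndex(text, -1) {
		// m[0]:m[1] is the whole match (x plus y); m[2]:m[3] is group 1 (x only).
		spans = append(spans, [2]int{m[2], m[3]})
	}
	return spans
}

func main() {
	text := "Close the dialog box, then reopen the dialog."
	fmt.Println("dialog(?= box):  ", groupSpans(lookAhead, text))
	fmt.Println("(?<=the )dialog: ", groupSpans(lookBehind, text))
}
```

Only the first “dialog” satisfies the look-ahead, while both satisfy the look-behind; in each case the reported span covers x alone.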

Negative look-arounds

The second kind of look-around is the negative look-ahead and look-behind assertion:

  • look-ahead (x(?!y)): “I want to match x only if it does not precede y.”
  • look-behind ((?<!y)x): “I want to match x only if it is not preceded by y.”

For Vale’s purposes, these cases can be emulated using the substitution check:

A rule emulating negative look-ahead functionality (running in Vale Server’s Studio page).

Using our example from above, we’re now matching x = dialog only when it does not precede y = box.
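The underlying idea can be sketched in plain Go: make y an optional part of the pattern so that the engine prefers the longer form when it is present, then discard every match that turned out to include y. The sketch below emulates dialog(?! box) this way. The helper bareDialogs and the sample text are our own, and this is an illustration of the filtering idea rather than Vale’s implementation of the substitution check.

```go
package main

import (
	"fmt"
	"regexp"
)

// Emulates \bdialog(?! box): the optional group is greedy, so when " box"
// follows "dialog", the match includes it and we can filter it out.
var dialogMaybeBox = regexp.MustCompile(`\bdialog(?: box)?`)

// bareDialogs returns the byte offsets of every "dialog" not followed by " box".
func bareDialogs(text string) [][2]int {
	var out [][2]int
	for _, m := range dialogMaybeBox.FindAllStringIndex(text, -1) {
		if text[m[0]:m[1]] == "dialog box" {
			continue // y was present, so the "negative look-ahead" fails here
		}
		out = append(out, [2]int{m[0], m[1]})
	}
	return out
}

func main() {
	text := "Open the dialog box, not the dialog."
	for _, span := range bareDialogs(text) {
		fmt.Printf("flagged %q at %v\n", text[span[0]:span[1]], span)
	}
}
```

Leaving out a trailing \b is deliberate: it keeps “dialog boxes” unflagged, just as dialog(?! box) would.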

Topic 2: Offering corrections with “actions” [Vale Server only]

While Vale is designed only to notify writers about style violations (typically in a CI or command-line environment), Vale Server can also offer potential solutions.

Vale Server offering solutions through its VS Code extension.

It does this by offering a set of generic, built-in “actions” that can be extended in a rule’s YAML file, much like how a rule extends an existing check such as substitution. There are currently five available actions, which can do anything from suggesting grammatical changes (powered by LanguageTool) and offering spelling corrections to performing arbitrary in-place edits to a token.

Let’s look at a real use case:

In the above rule (taken from the Microsoft style), we define an action with the name “edit” and the parameter “.?!”. This allows the text editor clients (Atom, Sublime Text, and VS Code) to offer an in-editor solution for removing punctuation from headings.
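To illustrate the effect of such an edit, the Go sketch below strips any trailing “.”, “?”, or “!” characters from a heading. The function removeChars is hypothetical: we don’t know how Vale Server applies an edit action internally, so this only models the resulting text change, using strings.TrimRight with the action’s parameter as the set of characters to remove.

```go
package main

import (
	"fmt"
	"strings"
)

// removeChars is a hypothetical stand-in for an "edit" action: it removes
// any trailing characters that appear in chars (here, ".?!").
func removeChars(token, chars string) string {
	return strings.TrimRight(token, chars)
}

func main() {
	for _, heading := range []string{"Getting started.", "What is Vale?", "Install Vale"} {
		fmt.Printf("%q -> %q\n", heading, removeChars(heading, ".?!"))
	}
}
```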

An open-source software developer with interests in natural language processing, data science, and collaborative writing. More @ https://github.com/jdkato.
