Example: Writing a Language Package

Language support in CodeMirror takes the form of specific packages (with names like @codemirror/lang-python or codemirror-lang-elixir) that implement the support features for working with that language. Such a package may contain:

A parser for the language.
Additional syntax-related metadata, such as highlighting, indentation, and folding information.
Optionally various language-specific extensions and commands, such as autocompletion support or language-specific keybindings.

In this example, we'll go through implementing a language package for a very minimal Lisp-like language. A similar project, with build tool configuration and such set up for you, is available as an example Git repository at codemirror/lang-example. It may be useful to start from that when building your own package.

Parsing

The first thing we'll need is a parser, which is used for highlighting but also provides the structure for things like syntax-aware selection, auto-indentation, and code folding. There are several ways to implement a parser for CodeMirror.

Using a Lezer grammar. This is a parser generator system that converts a declarative description of a grammar into an efficient parser. It's what we'll be using in this example.
Using a CodeMirror 5-style stream parser, which is mostly just a tokenizer. This can be easier for very basic highlighting, but doesn't produce a structured syntax tree, and quickly breaks down when you need more than tokenizing, for example to distinguish type names from variable names.
Writing a completely custom parser. This can be the only recourse for some awkward languages like Markdown, but tends to be quite a lot of work.

Generally, it won't be feasible to use existing parsers, written for a different purpose, to parse editor content. The way the editor parses code needs to be incremental, so that it can quickly update its parse when the document changes, without re-parsing the entire text. It also needs to be error-tolerant, so that highlighting doesn't break when you have a syntax error somewhere in your file. Finally, the parser should produce a syntax tree in a format that the highlighter can consume. Very few existing parsers can easily be integrated in such a context.

If your language defines a formal context-free grammar, you may be able to base a Lezer grammar on that with relative ease, depending on how many dodgy tricks the language uses. Almost all languages do some things that don't fit the context-free formalism, but Lezer has some mechanisms to deal with that.

The Lezer guide provides a more complete explanation of how to write a grammar. The basic example walks through the grammar used in this example, producing a small grammar file.

Your grammar should be put in its own file, typically with a .grammar extension, and run through lezer-generator to create a JavaScript file.

If your grammar lives in example.grammar, you can run lezer-generator example.grammar to create a JavaScript module holding the parse tables. Or, as the example repository does, include the Rollup plugin provided by that tool in your build process, so that you can directly import the parser from the grammar file.
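A Rollup configuration using that plugin might look roughly like the sketch below. The plugin is imported from @lezer/generator/rollup. The entry point, output paths, and the rule for external modules are hypothetical and would need to match your own project layout.

```javascript
// rollup.config.js
import {lezer} from "@lezer/generator/rollup"

export default {
  // Hypothetical entry point of the language package
  input: "src/index.js",
  // Leave CodeMirror and Lezer packages as external dependencies
  external: id => /^@(codemirror|lezer)\//.test(id),
  output: [
    {file: "dist/index.cjs", format: "cjs"},
    {file: "dist/index.js", format: "es"}
  ],
  // Compiles imported .grammar files into parser modules during the build
  plugins: [lezer()]
}
```

With the plugin in place, source files can import the parser with something like import {parser} from "./example.grammar".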

CodeMirror integration

Lezer is a generic parser tool, and our grammar so far doesn't know anything about highlighting or other editor-related functionality.

A Lezer parser comes with a number of node types, each of which can have props with extra metadata added to them. We'll create an extended copy of the parser to include node-specific information for highlighting, indentation, and folding.

import {parser} from "./parser.js"
import {foldNodeProp, foldInside, indentNodeProp} from "@codemirror/language"
import {styleTags, tags as t} from "@lezer/highlight"

let parserWithMetadata = parser.configure({
  props: [
    styleTags({
      Identifier: t.variableName,
      Boolean: t.bool,
      String: t.string,
      LineComment: t.lineComment,
      "( )": t.paren
    }),
    indentNodeProp.add({
      Application: context => context.column(context.node.from) + context.unit
    }),
    foldNodeProp.add({
      Application: foldInside
    })
  ]
})

styleTags is a helper that attaches highlighting information. We give it an object mapping node names (or space-separated lists of node names) to highlighting tags. These tags describe the syntactic role of the elements, and are used by highlighters to style the text.

The information added by the grammar's @detectDelim directive would already allow the automatic indentation to do a reasonable job, but because Lisps tend to indent continued lists one unit beyond the start of the list, and the default behavior is similar to how you'd indent parenthesized things in C or JavaScript, we'll have to override it.

The indentNodeProp prop associates functions that compute an indentation with node types. The function is passed a context object holding the relevant values and some indentation-related helper methods. In this case, the function computes the column position at the start of the application node and adds one indent unit to that. The language package exports a number of helpers to easily implement common indentation styles.
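The arithmetic of that function can be checked in isolation. The sketch below calls it with a hand-written stand-in for the context object. mockContext is hypothetical and far simpler than the real context: it only works for positions on the first line of a document without tabs, where the column equals the character offset.

```javascript
// The same function that is registered for Application nodes.
const indentApplication = context =>
  context.column(context.node.from) + context.unit

// Hypothetical stand-in for the context the editor passes in. On the
// first line of a tab-free document, column and offset coincide.
function mockContext(nodeFrom, unit) {
  return {
    node: {from: nodeFrom},
    unit,
    column: pos => pos
  }
}

const doc = "  (defun square (x)"
const outerList = mockContext(doc.indexOf("("), 2)     // "(" at offset 2
const innerList = mockContext(doc.lastIndexOf("("), 2) // "(" at offset 16

console.log(indentApplication(outerList)) // 4
console.log(indentApplication(innerList)) // 18
```

So a line continuing the outer list is indented to column 4, one unit past the opening parenthesis at column 2.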

Finally, foldNodeProp associates folding information with node types. We allow application nodes to be folded by hiding everything but their delimiters.
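As a rough illustration of what "everything but their delimiters" means, the sketch below computes such a range for a hand-written node. foldInsideSketch and the node objects are hypothetical simplifications. The real foldInside works on syntax nodes and also takes error nodes into account.

```javascript
// Simplified approximation: fold from the end of the first child (the
// opening delimiter) to the start of the last child (the closing one).
function foldInsideSketch(node) {
  const first = node.firstChild, last = node.lastChild
  return first && last && first.to < last.from
    ? {from: first.to, to: last.from}
    : null
}

// Hypothetical node for the text "(foo bar)" starting at offset 0.
const application = {
  firstChild: {from: 0, to: 1}, // "("
  lastChild: {from: 8, to: 9}   // ")"
}
// Hypothetical node for the text "()", which has nothing to hide.
const emptyList = {
  firstChild: {from: 0, to: 1},
  lastChild: {from: 1, to: 2}
}

console.log(foldInsideSketch(application)) // { from: 1, to: 8 }
console.log(foldInsideSketch(emptyList))   // null
```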

That gives us a parser with enough editor-specific information encoded in its output to use it for editing. Next we wrap it in a Language instance, which pairs the parser with a language-specific facet that external code can use to register language-specific metadata.

import {LRLanguage} from "@codemirror/language"

export const exampleLanguage = LRLanguage.define({
  parser: parserWithMetadata,
  languageData: {
    commentTokens: {line: ";"}
  }
})

That code provides one piece of metadata (line comment syntax) right away, and allows us to do something like this to add additional information, such as an autocompletion source for this language.

import {completeFromList} from "@codemirror/autocomplete"

export const exampleCompletion = exampleLanguage.data.of({
  autocomplete: completeFromList([
    {label: "defun", type: "keyword"},
    {label: "defvar", type: "keyword"},
    {label: "let", type: "keyword"},
    {label: "cons", type: "function"},
    {label: "car", type: "function"},
    {label: "cdr", type: "function"}
  ])
})

Finally, it is convention for language packages to export a main function (named after the language, so it's called css in @codemirror/lang-css for example) that takes a configuration object (if the language has anything to configure) and returns a LanguageSupport object, which bundles a Language instance with any additional supporting extensions that one might want to enable for the language.

import {LanguageSupport} from "@codemirror/language"

export function example() {
  return new LanguageSupport(exampleLanguage, [exampleCompletion])
}
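Using the package in an editor could then look something like the following sketch. EditorView and basicSetup come from the codemirror package. The module path "./index.js", the sample document, and the choice of document.body as parent are assumptions made for illustration, and the code needs to run in a browser.

```javascript
import {EditorView, basicSetup} from "codemirror"
// Hypothetical path to the module that exports the example() function
import {example} from "./index.js"

const view = new EditorView({
  doc: '(defun greet (name)\n  (cons "hello" name)) ; a comment\n',
  // basicSetup includes a default highlight style, autocompletion, and
  // folding, so the metadata defined above takes effect right away.
  extensions: [basicSetup, example()],
  parent: document.body
})
```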

The result looks like this: