Example: Writing a Language Package
Language support in CodeMirror takes the form of specific packages
(with names like @codemirror/lang-python or codemirror-lang-elixir)
that implement the support features for working with that language.
Such a package may provide:
- A parser for the language.
- Additional syntax-related metadata, such as highlighting,
  indentation, and folding information.
- Optionally, various language-specific extensions and commands, such
  as autocompletion support or language-specific keybindings.
In this example, we'll go through implementing a language package for
a very minimal Lisp-like language. A similar project, with build tool
configuration and such set up for you, is available as an example Git
repository at codemirror/lang-example. It may be useful to start from
that when building your own package.
Parsing
The first thing we'll need is a parser, which is used for
highlighting but also provides the structure for things like
syntax-aware selection, auto-indentation, and code folding. There are
several ways to implement a parser for CodeMirror.
- Using a Lezer grammar. This is a parser generator system that
  converts a declarative description of a grammar into an efficient
  parser. It's what we'll be using in this example.
- Using a CodeMirror 5-style stream parser, which is mostly just a
  tokenizer. This can be easier for very basic highlighting, but
  doesn't produce a structured syntax tree, and quickly breaks down
  when you need more than tokenizing, for example to distinguish type
  names from variable names.
- Writing a completely custom parser. This can be the only recourse
  for some awkward languages like Markdown, but tends to be quite a
  lot of work.
Generally, it won't be feasible to use existing parsers, written for a
different purpose, to parse editor content. The way the editor parses
code needs to be incremental, so that it can quickly update its parse
when the document changes, without re-parsing the entire text. It also
needs to be error-tolerant, so that highlighting doesn't break when
you have a syntax error somewhere in your file. And finally, it should
produce a syntax tree in a format that the highlighter can consume.
Very few existing parsers can easily be integrated in such a context.
If your language defines a formal context-free grammar, you may be
able to base a Lezer grammar on that with relative ease, depending on
how many dodgy tricks the language uses. Almost all languages do some
things that don't fit the context-free formalism, but Lezer has some
mechanisms to deal with that.
The Lezer guide provides a more complete explanation of how to write a
grammar. Lezer's basic example walks through the grammar used in this
example, producing a small grammar file.
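For orientation, a grammar for a language like this might look roughly
like the text below. This is a sketch modeled on Lezer's basic example,
not a verbatim copy of it: the node names (Identifier, Boolean, String,
LineComment, Application) match the ones used in the code later in this
example, but the exact token definitions are assumptions, and the
constant name exampleGrammar is hypothetical. The grammar is held in a
JavaScript string only so that it can be shown in the same language as
the rest of the code here.

```javascript
// Hypothetical constant holding the grammar text. String.raw keeps the
// backslashes exactly as written, which the grammar notation needs.
const exampleGrammar = String.raw`
@top Program { expression* }

@skip { space | LineComment }

expression {
  Identifier |
  String |
  Boolean |
  Application { "(" expression* ")" }
}

@tokens {
  Identifier { $[a-zA-Z_\-0-9]+ }

  String { '"' (!["\\] | "\\" _)* '"' }

  Boolean { "#t" | "#f" }

  LineComment { ";" ![\n]* }

  space { $[ \t\n\r]+ }

  "(" ")"
}

@detectDelim
`
```

In a real package this text would live in its own grammar file rather
than in a string. Capitalized rule and token names become node types in
the syntax tree, which is why those are the names that the editor
metadata refers to.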
Your grammar should be put in its own file, typically with a .grammar
extension, and run through lezer-generator to create a JavaScript
file. If your grammar lives in example.grammar, you can run
lezer-generator example.grammar to create a JavaScript module holding
the parse tables. Or, as the example repository does, include the
Rollup plugin provided by that tool in your build process, so that you
can directly import the parser from the grammar file.
CodeMirror integration
Lezer is a generic parser tool, and our grammar so far doesn't know
anything about highlighting or other editor-related functionality.
A Lezer parser comes with a number of node types, each of which can
have props with extra metadata added to them. We'll create an extended
copy of the parser to include node-specific information for
highlighting, indentation, and folding.
import {parser} from "./parser.js"
import {foldNodeProp, foldInside, indentNodeProp} from "@codemirror/language"
import {styleTags, tags as t} from "@lezer/highlight"

// Extended copy of the generated parser that carries editor metadata.
let parserWithMetadata = parser.configure({
  props: [
    // Map node names to highlighting tags.
    styleTags({
      Identifier: t.variableName,
      Boolean: t.bool,
      String: t.string,
      LineComment: t.lineComment,
      "( )": t.paren
    }),
    // Indent the content of a list one unit past the start of the list.
    indentNodeProp.add({
      Application: context => context.column(context.node.from) + context.unit
    }),
    // Allow lists to be folded between their delimiters.
    foldNodeProp.add({
      Application: foldInside
    })
  ]
})
styleTags is a helper that attaches highlighting information. We give
it an object mapping node names (or space-separated lists of node
names) to highlighting tags. These tags describe the syntactic role of
the elements, and are used by highlighters to style the text.
The information added by the @detectDelim directive in the grammar
would already allow the automatic indentation to do a reasonable job,
but because Lisps tend to indent continued lists one unit beyond the
start of the list, and the default behavior is similar to how you'd
indent parenthesized things in C or JavaScript, we'll have to override
it.
The indentNodeProp prop associates functions that compute an
indentation with node types. The function is passed a context object
holding the relevant values and some indentation-related helper
methods. In this case, the function computes the column position at
the start of the application node and adds one indent unit to that.
The @codemirror/language package exports a number of helpers to easily
implement common indentation styles.
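To make that calculation concrete, here is a small sketch that runs the
same rule against a stand-in context object. The makeContext helper is
hypothetical and models only the three members the rule touches
(column, node.from, and unit); the real context object is created by
CodeMirror and does considerably more, including tab handling, which
this sketch ignores.

```javascript
// Hypothetical stand-in for the context object that CodeMirror passes
// to indentation functions. Only the members used below are modeled.
function makeContext(doc, nodeFrom, unit) {
  return {
    unit,
    node: {from: nodeFrom},
    // Column of `pos`: the number of characters since the previous line
    // break. Tabs are not expanded in this sketch.
    column(pos) {
      return pos - (doc.lastIndexOf("\n", pos - 1) + 1)
    }
  }
}

// The same rule as the Application entry in the parser configuration.
const indentApplication = context =>
  context.column(context.node.from) + context.unit

const doc = "(defun f\n  (cons a\n"
const outerStart = 0                    // the "(" of (defun ...
const innerStart = doc.indexOf("(cons") // the "(" of (cons ...

// With a two-column indent unit:
const outerIndent = indentApplication(makeContext(doc, outerStart, 2)) // 2
const innerIndent = indentApplication(makeContext(doc, innerStart, 2)) // 4
console.log(outerIndent, innerIndent)
```

The nested list starts at column 2, so a line continuing it is indented
to column 4, one unit past the opening parenthesis rather than past the
start of the line.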
Finally, foldNodeProp associates folding information with node types.
We allow application nodes to be folded by hiding everything but their
delimiters.
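As a rough illustration of what "everything but their delimiters"
means, the sketch below computes a fold range from a node's first and
last child. It is an approximation of the idea behind foldInside, not
the library's implementation, and the node objects are plain
hypothetical stand-ins for syntax tree nodes.

```javascript
// Approximation of a "fold inside" rule: the folded range runs from the
// end of the first child (the opening delimiter) to the start of the
// last child (the closing delimiter). Returns null when there is
// nothing between them to hide.
function foldInsideSketch(node) {
  const first = node.firstChild, last = node.lastChild
  if (!first || !last || first.to >= last.from) return null
  return {from: first.to, to: last.from}
}

// Stand-in for the Application node covering "(cons a b)" at 0..10.
const application = {
  from: 0, to: 10,
  firstChild: {from: 0, to: 1},  // "("
  lastChild: {from: 9, to: 10}   // ")"
}

// Stand-in for an empty list "()" at 0..2.
const emptyList = {
  from: 0, to: 2,
  firstChild: {from: 0, to: 1},
  lastChild: {from: 1, to: 2}
}

const foldRange = foldInsideSketch(application) // {from: 1, to: 9}
const emptyRange = foldInsideSketch(emptyList)  // null
console.log(foldRange, emptyRange)
```

Folding that range leaves only the two parentheses visible, with a
placeholder standing in for the hidden content.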
That gives us a parser with enough editor-specific information encoded
in its output to use it for editing. Next we wrap that in a Language
instance, which wraps a parser and adds a language-specific facet that
can be used by external code to register language-specific metadata.
import {LRLanguage} from "@codemirror/language"

export const exampleLanguage = LRLanguage.define({
  parser: parserWithMetadata,
  languageData: {
    commentTokens: {line: ";"}
  }
})
That code provides one piece of metadata (line comment syntax) right
away, and allows us to do something like this to add additional
information, such as an autocompletion source for this language.
import {completeFromList} from "@codemirror/autocomplete"

export const exampleCompletion = exampleLanguage.data.of({
  autocomplete: completeFromList([
    {label: "defun", type: "keyword"},
    {label: "defvar", type: "keyword"},
    {label: "let", type: "keyword"},
    {label: "cons", type: "function"},
    {label: "car", type: "function"},
    {label: "cdr", type: "function"}
  ])
})
Finally, it is convention for language packages to export a main
function (named after the language, so it's called css in
@codemirror/lang-css, for example) that takes a configuration object
(if the language has anything to configure) and returns a
LanguageSupport object, which bundles a Language instance with any
additional supporting extensions that one might want to enable for the
language.
import {LanguageSupport} from "@codemirror/language"

export function example() {
  return new LanguageSupport(exampleLanguage, [exampleCompletion])
}