Example: Collaborative Editing
Real-time collaborative editing is a technique where multiple people
on different machines can edit the same document at the same time.
Changes are propagated to other participants over the network and show
up in their views of the document as soon as they arrive.
Here's a toy collaborative editing setup that lives within this page:
The main difficulty with this style of editing is handling of
conflicting edits—since network communication isn't instantaneous, it
is possible for people to make changes at the same time, which have to
be reconciled in some way when synchronizing everybody up again.
CodeMirror comes with utilities for collaborative editing based on
operational
transformation
with a central authority (server) that assigns a definite order to the
changes. This example describes the practical information you need to
set up such a system. For more theoretical information, see this blog
post.
(It is also possible to wire up different collaborative editing
algorithms to CodeMirror. See for example
Yjs.)
Principles
Collaborative systems implemented with the
@codemirror/collab package work like this:
- There is a central system (authority) that builds up a history of
  changes.
- Individual editors (peers) track which version of the authority's
  history they have synchronized with, and which local (unconfirmed)
  changes they have made on top of that.
- All peers set up some way to receive new changes from the
  authority. When changes come in...
  - If some of those changes are the peer's own changes, those
    changes are removed from the list of unconfirmed changes.
  - Remote changes are applied to the local editor state.
  - If there are unconfirmed changes present, operational
    transformation is used to transpose the remote changes across
    the unconfirmed local ones, and vice versa, so that the remote
    changes can be applied to the peer's current document, and the
    updated local changes can be submitted to the server as if they
    came after the remote changes.
  - The peer's document version is moved forward.
- Whenever there are unconfirmed local changes, the peer should try
  to send them to the authority, along with its current synchronized
  version.
  - If that version matches the server's version, the server
    accepts the changes as-is and adds them to its history.
  - Otherwise, the server can, depending on how complex it is and on
    whether it has access to all the changes that happened since the
    client's version, either reject the updates, or rebase and
    accept them.
The trickier logic that a peer must apply is implemented in the
@codemirror/collab package, but to set up a collaborative system you
must implement the authority (which can be very simple) and wire up
the communication between the peers and the authority (which can get a
bit more subtle due to the nature of networked/asynchronous systems).
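The unit of data passed around in this protocol is an update. The
Update objects defined by @codemirror/collab, which the code below
passes between peers and authority, roughly have this shape (the
interface here is only a summary for reference; the real type comes
from the package):

import {ChangeSet, StateEffect} from "@codemirror/state"

// Rough shape of @codemirror/collab's Update objects (summary only)
interface Update {
  changes: ChangeSet                     // the document changes made
  clientID: string                       // identifies the peer that made them
  effects?: readonly StateEffect<any>[]  // optional shared effects (see the end of this page)
}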
The Authority
In this example, the authority is a web
worker. That
helps simulate the asynchronous nature of communication, and the need
to serialize data that goes over the communication channel, while
still allowing everything to run inside the browser. In the real
world, it'll typically be a server program communicating with peers
using HTTP requests or
websockets.
Error handling will be omitted throughout the example to keep things
concise.
The state kept by the authority is just an array of updates (holding a
change set and a client ID), and a current
document.
import {ChangeSet, Text} from "@codemirror/state"
import {Update, rebaseUpdates} from "@codemirror/collab"
// The updates received so far (updates.length gives the current version)
let updates: Update[] = []
// The current document
let doc = Text.of(["Start document"])
The document is used by new peers to be able to join the session—they
must ask the authority for a starting document and version before they
are able to participate.
This code implements the three message types that the worker handles.
- pullUpdates is used to ask the authority if any new updates have
  come in since a given document version. It “blocks” until new
  changes come in when asked for the current version (this is what
  the pending variable is used for).
- pushUpdates is used to send an array of updates. The server stores
  the updates, rolls its document forward, and notifies any waiting
  pullUpdates requests.
- getDocument is used by new peers to retrieve a starting state.
We'll use a crude mechanism based on
postMessage
and message
channels
to communicate between the main page and the worker. Feel free to
ignore the message-channel-related code in resp, since that's not
terribly relevant here.
// Waiting pullUpdates requests that should be notified of new updates
let pending: ((value: any) => void)[] = []

self.onmessage = event => {
  function resp(value: any) {
    event.ports[0].postMessage(JSON.stringify(value))
  }
  let data = JSON.parse(event.data)
  if (data.type == "pullUpdates") {
    if (data.version < updates.length)
      resp(updates.slice(data.version))
    else
      pending.push(resp)
  } else if (data.type == "pushUpdates") {
    // Convert the JSON representation of the changes back into
    // ChangeSet instances
    let received = data.updates.map(json => ({
      clientID: json.clientID,
      changes: ChangeSet.fromJSON(json.changes)
    }))
    // If the client's version is out of date, move its updates over
    // the ones that came in since then
    if (data.version != updates.length)
      received = rebaseUpdates(received, updates.slice(data.version))
    for (let update of received) {
      updates.push(update)
      doc = update.changes.apply(doc)
    }
    resp(true)
    if (received.length) {
      // Notify waiting pullUpdates requests
      let json = received.map(update => ({
        clientID: update.clientID,
        changes: update.changes.toJSON()
      }))
      while (pending.length) pending.pop()!(json)
    }
  } else if (data.type == "getDocument") {
    resp({version: updates.length, doc: doc.toString()})
  }
}
In older versions of @codemirror/collab, rebaseUpdates didn't exist,
and the way to handle updates where the client's version didn't match
the server's document version was to simply reject them. The client
would then have to fetch the peer updates it was missing and try again.
This is still a reasonable way to implement the protocol, but it can
in some circumstances lead to starvation—if a client has a higher
latency than its peers, and those peers keep submitting changes, it
may be unable to get its updates through for a long time, as they keep
getting rejected. Thus, it is recommended to rebase updates whenever
practical. If you don't store the full history of updates, but only
recent ones, it is still reasonable to reject requests with a version
that you don't have the update objects for anymore.
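For comparison, a reject-based push handler could look something like
the sketch below (handlePushRejecting is a hypothetical helper, not
part of this example; updates and doc are the authority state shown
earlier). The peer would then have to pull the missing updates, rebase
its own changes locally, and push again whenever it gets false back.

// Sketch of the older, reject-only approach: accept pushes only when
// the client's version matches the authority's current version
function handlePushRejecting(version: number, received: Update[],
                             resp: (value: any) => void) {
  if (version != updates.length) {
    // Client is out of date, make it pull and retry
    resp(false)
    return
  }
  for (let update of received) {
    updates.push(update)
    doc = update.changes.apply(doc)
  }
  resp(true)
}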
The Peer
On the other side of the connection, I'm using some messy magic code
to introduce fake latency and broken connections (the scissor controls
in the demo above). This isn't very interesting, so I'm hiding it in a
Connection
class which is omitted from the code below (you can find
the full code on
GitHub).
These wrappers interact with the worker process through messages,
returning a promise that eventually resolves with some result (when
the connection is cut, the promises will just hang until it is
reestablished).
function pushUpdates(
  connection: Connection,
  version: number,
  fullUpdates: readonly Update[]
): Promise<boolean> {
  // Send only the serializable parts of each update (client ID and changes)
  let updates = fullUpdates.map(u => ({
    clientID: u.clientID,
    changes: u.changes.toJSON()
  }))
  return connection.request({type: "pushUpdates", version, updates})
}

function pullUpdates(
  connection: Connection,
  version: number
): Promise<readonly Update[]> {
  // Deserialize the change sets in the response
  return connection.request({type: "pullUpdates", version})
    .then(updates => updates.map(u => ({
      changes: ChangeSet.fromJSON(u.changes),
      clientID: u.clientID
    })))
}

function getDocument(
  connection: Connection
): Promise<{version: number, doc: Text}> {
  // Fetch the authority's current version and document text
  return connection.request({type: "getDocument"}).then(data => ({
    version: data.version,
    doc: Text.of(data.doc.split("\n"))
  }))
}
To manage the communication with the authority, we use a view
plugin (which is almost always the right place
for asynchronous logic in CodeMirror). This plugin will constantly (in
an async loop) try to pull in new updates and, if it gets them, apply
them to the editor using the
receiveUpdates
function.
When the content of the editor changes, the plugin starts trying to
push its local updates. It keeps a field to make sure it only has one
running push request, and crudely sets a timeout to retry pushing when
there are still unconfirmed changes after the request. This can happen
when the push failed or new changes were introduced while it was in
progress.
(The request scheduling is something you'll definitely want to do in
a more elaborate way in a real setup. It can help to include both
pushing and pulling in a single state machine, where the peer only
does one of the two at a time.)
The peerExtension
function returns such a plugin plus a collab
extension configured with the appropriate start version.
import {ViewPlugin, ViewUpdate, EditorView} from "@codemirror/view"
import {collab, getSyncedVersion, receiveUpdates, sendableUpdates} from "@codemirror/collab"

function peerExtension(startVersion: number, connection: Connection) {
  let plugin = ViewPlugin.fromClass(class {
    private pushing = false
    private done = false

    constructor(private view: EditorView) { this.pull() }

    update(update: ViewUpdate) {
      if (update.docChanged) this.push()
    }

    async push() {
      let updates = sendableUpdates(this.view.state)
      if (this.pushing || !updates.length) return
      this.pushing = true
      let version = getSyncedVersion(this.view.state)
      await pushUpdates(connection, version, updates)
      this.pushing = false
      // Whether the push failed or new changes came in while it was
      // running, try again if there are still unconfirmed updates
      if (sendableUpdates(this.view.state).length)
        setTimeout(() => this.push(), 100)
    }

    async pull() {
      while (!this.done) {
        let version = getSyncedVersion(this.view.state)
        let updates = await pullUpdates(connection, version)
        this.view.dispatch(receiveUpdates(this.view.state, updates))
      }
    }

    destroy() { this.done = true }
  })
  return [collab({startVersion}), plugin]
}
Now you can create a wired-up editor view with code like this.
async function createPeer(connection: Connection) {
  let {version, doc} = await getDocument(connection)
  let state = EditorState.create({
    doc,
    extensions: [basicSetup, peerExtension(version, connection)]
  })
  return new EditorView({state})
}
Dropping Old Updates
This implementation endlessly accumulates updates for every single
change made in the editors.
This can be okay, or even useful, since it provides a detailed history
of how the document was written (especially if you add additional data
like timestamps and user identifiers).
But when you don't want or need it, it is possible to, at some point,
start dropping old updates. The downside of this is that it will no
longer be possible for peers that have been offline since those
updates were made to resynchronize with the authority. They will ask
for all updates since a given version, and that data will no longer be
available. Depending on the use case and the time period involved,
this may be acceptable.
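If you do want to drop old updates, a minimal sketch of what the
authority could do looks like this (hypothetical helpers; the idea is
to track how many updates have already been discarded, so that version
numbers stay stable):

// Number of old updates dropped from the front of the array
let discarded = 0

// The current version is the discarded count plus the stored updates
function currentVersion() { return discarded + updates.length }

function dropUpdatesBefore(version: number) {
  let n = version - discarded
  if (n > 0) {
    updates = updates.slice(n)
    discarded += n
  }
}

function updatesSince(version: number) {
  // Peers that fell behind the discarded range can no longer catch up
  // incrementally and would have to re-fetch the document
  if (version < discarded) throw new RangeError("Version no longer available")
  return updates.slice(version - discarded)
}

The message handlers would then use currentVersion() and
updatesSince() instead of updates.length and updates.slice().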
It is also possible to “compress” changes using
ChangeSet.compose
to store a less
finely grained history. This will still prevent peers from
synchronizing (when they ask for a version that has been compressed
away), but when done on old data it can greatly reduce the size of the
history data.
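As a sketch, the authority could periodically collapse a run of
consecutive old updates into a single one by composing their change
sets (the "compressed" client ID is just a placeholder):

// Hypothetical helper: merge consecutive updates into one composed update
function compressUpdates(run: readonly Update[]): Update {
  let changes = run[0].changes
  for (let i = 1; i < run.length; i++)
    changes = changes.compose(run[i].changes)
  return {changes, clientID: "compressed"}
}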
Shared Effects
By default, the only thing that is shared through such a
collaborative-editing channel is document changes (the changes
field
in the update objects). Sometimes it is useful to also use this
mechanism to share other information that should be distributed
between clients.
To do that, when calling the collab
extension
constructor, you can pass a
sharedEffects
function which
produces an array of “shared effects” from a transaction. Shared
effects are instances of StateEffect
that
should be applied in other peers as well. In the simplest case,
sharedEffects
could just filter the transaction's effects, picking
out specific types of effects.
Say, for example, we have a plugin tracking marked regions in the
document. Editors keep a state field with a collection of marked
regions, and have a markRegion
effect that they use to add regions
to this.
import {StateEffect} from "@codemirror/state"

const markRegion = StateEffect.define<{from: number, to: number}>({
  map({from, to}, changes) {
    from = changes.mapPos(from, 1)
    to = changes.mapPos(to, -1)
    return from < to ? {from, to} : undefined
  }
})
Since the effect refers to positions in the document, it needs a
map
function to map it
through document changes. This will also be used by the collab package
when reconciling local and remote changes.
Now with a function like this as sharedEffects
source, we'd get
these effects in our Update
objects:
import {collab} from "@codemirror/collab"

let markCollab = collab({
  sharedEffects: tr => tr.effects.filter(e => e.is(markRegion))
})
The library doesn't provide any utilities for serializing effects, so
in order to send them around as JSON you need your own custom
serialization code for the effects
field in updates.
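For the markRegion effect defined above, such serialization could be
as simple as this sketch (the type tag and helper names are made up
for illustration):

// Hypothetical (de)serialization for shared markRegion effects
function effectsToJSON(effects: readonly StateEffect<any>[]) {
  return effects.map(e => {
    if (e.is(markRegion)) return {type: "markRegion", value: e.value}
    throw new RangeError("Unsupported shared effect")
  })
}

function effectsFromJSON(json: readonly any[]) {
  return json.map(data => {
    if (data.type == "markRegion") return markRegion.of(data.value)
    throw new RangeError("Unsupported shared effect")
  })
}

These would be used next to changes.toJSON() and ChangeSet.fromJSON()
in the pushUpdates and pullUpdates wrappers.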
But once you set that up, assuming that all peers have some extension
that handles these effects installed, applying them across peers
should just work. The effects get applied in transactions created by
receiveUpdates
, and the state field that
manages such marks will pick them up from there.
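Such a field only has to map its regions through document changes and
watch for markRegion effects; it doesn't need to care whether a
transaction was created locally or by receiveUpdates. A rough sketch
(the shape of the stored data is just an illustration):

import {StateField} from "@codemirror/state"

// Hypothetical state field holding the currently marked regions
const markedRegions = StateField.define<readonly {from: number, to: number}[]>({
  create() { return [] },
  update(regions, tr) {
    // Move existing regions along with the document, dropping any
    // that have become empty
    let result: {from: number, to: number}[] = []
    for (let {from, to} of regions) {
      let newFrom = tr.changes.mapPos(from, 1), newTo = tr.changes.mapPos(to, -1)
      if (newFrom < newTo) result.push({from: newFrom, to: newTo})
    }
    // Add regions introduced by markRegion effects, local or remote
    for (let e of tr.effects) if (e.is(markRegion)) result.push(e.value)
    return result
  }
})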
There is one thing to keep in mind though. As described in more detail
in the collaborative-editing blog
post,
the kind of position mapping done in the effect's map
function is
not guaranteed to converge to the same positions when applied in
different order by different peers. For some use cases (such as
showing other people's cursors), this may be harmless. For others, you
might need to set up a separate mechanism to periodically synchronize
the positions.