I've spent a lot of time studying CRDTs & order theory in the last year & will publish an article too. Local-first apps are not easy to build, complex data structures (say, calendar repetitions with exceptions) become harder to model. Everything must converge automatically while clients can branch off.
In general, you don't really get to compact tombstones meaningfully without consensus so you really are pushing at least remnants of the entire log around to each client indefinitely. You also can't easily upgrade your db you're stuck looking after legacy data structures indefinitely - or you have to impose arbitrary cut off points.
List CRDTs - which text CRDTs are built from are probably unavoidable except for really really simple applications. Over the last 15 years they have evolved, roughly: WOOT (paper) -> RGA (paper) -> YATA (paper) / YJS (js + later rust port) -> Peritext (paper) / Automerge (rust/js/swift) -> Loro (Rust/js). Cola (rust) is another recent one. The big libs (yjs, automerge, loro) offer full doc models.
Mostly the later ones improve on space, time & intent capture (not interleaving concurrent requests).
The same few guys (Martin Kleppman, Kevin Jahns, Joseph Gentle, probably others) pop up all over the more recent optimisations.
> In general, you don't really get to compact tombstones meaningfully without consensus so you really are pushing at least remnants of the entire log around to each client indefinitely.
This is often much less of a problem in practice than people think. Git repositories also grow without bound, but nobody seems to really notice or care. For diamond types (my library), I lz4 compress the text content itself within the .dt files. The metadata is so small that in many cases the space savings from lz4 compression makes the resulting .dt file - including full change history - smaller than the document on disk.
If anyone really cares about this problem, there are a couple other approaches:
1. In our Eg-walker algorithm, we don't use stable guids to identify insert positions. Thats where other algorithms go wrong, because they need to store these guids indefinitely in case someone references them. For eg-walker, we just store relative positions. This makes the algorithm work more like git, where you can do shallow clones. And everything works, until you want to merge a branch which was split off earlier than your clone point. Then you should be able to download the earlier edits back to the common branch point and merge. You also only need the metadata of earlier edits, not the inserted content. And thats usually tiny.
2. Mike Toomim's Antimatter algorithm is a distributed consensus protocol which can detect when its safe to purge the metadata for old changes. It works even in the presence of partitions in the network. (Its conservative by default - if a partition happens, the network will keep metadata around until that peer comes back, or until some timeout.)
> The same few guys (Martin Kleppman, Kevin Jahns, Joseph Gentle, probably others) pop up all over the more recent optimisations.
Alex Good is another. He's behind a lot of the huge improvements to automerge over the last few years. He's wicked smart.
I love diamond types personally, which I think Loro uses in certain cases. I find not only is DT fast under normal usage, allows for easy archiving and retrieval of past data but it also quickly outputs a per user casuality rendering that LLMs can pretty easily rectify in most cases. Compact and archival to flat files works very well.
Last-Write-Win CRDTs are nice, but I wish the article talked about where CRDT really shine, which is when the state truly converge in a non-destructive way, for example:
1) Counters
While not really useful, they demonstrate this well:
- mutations are +n and -n
- their order do not matter
- converging the state is a matter of applying the operations of remote peers locally
2) Append-only data structures
Useful for accounting, or replication of time-series/logs with no master/slave relationship between nodes (where writes would be accepted only on a "master" node).
- the only mutation is "append"
- converging the state is applying the peers operations then sorting by timestamp
EDIT: add more
3) Multi Value registers (and maps)
Similar to Last-Write-Win registers (and maps), but all writes are kept, the value becomes a set of concurrent values.
4) Many more...
Each is useful for specific use cases. And since not everybody is making collaborative tools, but many are working on distributed systems, I think it's worth it to mention this.
On another note, the article talks about state based CRDTs, where you need to share the whole state. In the examples I gave above, they are operation based CRDTs, where you need to share the operations done on the state and recompute it when needed.
Delta-CRDTs are an optimization over state based CRDTs where you share state diffs instead of the whole state (described in this paper: https://arxiv.org/pdf/1603.01529 ).
I haven't dug into this deeply, but to me CRDTs look like a P2P data structure abstracted to the programming language variable level. The article says they shine when you don't want a central server. But most communication libraries already handle authentication and multiple peers well — and if you designate one peer as canonical (via leader election), conflict resolution is solved. I'm curious what use cases make avoiding a central server worth the paradigm shift. That said, it's a choice of approach — some may prefer the CRDT paradig
Enjoyed the article! As someone who's worked on bits of collaborative software (and is currently helping build one at work), I'd caution people from using CRDTs in general. A central server is the right tool for the job in most cases; there are certain things that are difficult to handle with CRDTs, like permissions and acquiring locks on objects.
Edit: I had an excerpt here which I completely misread. Sorry.
Here's a 2019 Figma article on this, not sure if its representative of their current system
> OTs power most collaborative text-based apps such as Google Docs. They’re the most well-known technique but in researching them, we quickly realized they were overkill for what we wanted to achieve ...
> Figma's tech is instead inspired by something called CRDTs, which stands for conflict-free replicated data types. ... Figma isn't using true CRDTs though. CRDTs are designed for decentralized systems where there is no single central authority to decide what the final state should be
Heh. Find how to grant permissions/ acquire lock in git. You can not. That is fundamental to distributed systems.
Downside: harder to think about it all.
Upside: a rocket may hit the datacenter.
From what I remember about Figma, it can be proclaimed CRDT. Google Docs got their sync algorithm before CRDT was even known (yep, I remember those days!).
Early versions of google docs didn't even implement OT correctly. There were various cases where if two people bolded and unbolded text in a certain order, while connecting and disconnecting their wifi, the documents would go out of sync. You'd end up looking at different documents, even after everything reconnected.
I (obviously) care a lot about fixing these sort of bugs. But its important to remember that in many applications, it matters a lot less than people think.
Of course. But typical, live collaborative software doesn't need to be (and shouldn't be) decentralized. In such software it's annoying to not be able to (even speculatively) acquire unique access to an object. I'd be very surprised if Google Docs used CRDTs now.
In general, you don't really get to compact tombstones meaningfully without consensus so you really are pushing at least remnants of the entire log around to each client indefinitely. You also can't easily upgrade your db you're stuck looking after legacy data structures indefinitely - or you have to impose arbitrary cut off points.
List CRDTs - which text CRDTs are built from are probably unavoidable except for really really simple applications. Over the last 15 years they have evolved, roughly: WOOT (paper) -> RGA (paper) -> YATA (paper) / YJS (js + later rust port) -> Peritext (paper) / Automerge (rust/js/swift) -> Loro (Rust/js). Cola (rust) is another recent one. The big libs (yjs, automerge, loro) offer full doc models.
Mostly the later ones improve on space, time & intent capture (not interleaving concurrent requests).
The same few guys (Martin Kleppman, Kevin Jahns, Joseph Gentle, probably others) pop up all over the more recent optimisations.
This is often much less of a problem in practice than people think. Git repositories also grow without bound, but nobody seems to really notice or care. For diamond types (my library), I lz4 compress the text content itself within the .dt files. The metadata is so small that in many cases the space savings from lz4 compression makes the resulting .dt file - including full change history - smaller than the document on disk.
If anyone really cares about this problem, there are a couple other approaches:
1. In our Eg-walker algorithm, we don't use stable guids to identify insert positions. Thats where other algorithms go wrong, because they need to store these guids indefinitely in case someone references them. For eg-walker, we just store relative positions. This makes the algorithm work more like git, where you can do shallow clones. And everything works, until you want to merge a branch which was split off earlier than your clone point. Then you should be able to download the earlier edits back to the common branch point and merge. You also only need the metadata of earlier edits, not the inserted content. And thats usually tiny.
2. Mike Toomim's Antimatter algorithm is a distributed consensus protocol which can detect when its safe to purge the metadata for old changes. It works even in the presence of partitions in the network. (Its conservative by default - if a partition happens, the network will keep metadata around until that peer comes back, or until some timeout.)
> The same few guys (Martin Kleppman, Kevin Jahns, Joseph Gentle, probably others) pop up all over the more recent optimisations.
Alex Good is another. He's behind a lot of the huge improvements to automerge over the last few years. He's wicked smart.
It would be interesting to try again now.
1) Counters
While not really useful, they demonstrate this well:
2) Append-only data structuresUseful for accounting, or replication of time-series/logs with no master/slave relationship between nodes (where writes would be accepted only on a "master" node).
EDIT: add more3) Multi Value registers (and maps)
Similar to Last-Write-Win registers (and maps), but all writes are kept, the value becomes a set of concurrent values.
4) Many more...
Each is useful for specific use cases. And since not everybody is making collaborative tools, but many are working on distributed systems, I think it's worth it to mention this.
On another note, the article talks about state based CRDTs, where you need to share the whole state. In the examples I gave above, they are operation based CRDTs, where you need to share the operations done on the state and recompute it when needed.
For example, in the Elixir ecosystem, we have Horde ( https://hexdocs.pm/horde/readme.html ) which allows distributing a worker pool over multiple nodes, it's backed by DeltaCrdt ( https://hexdocs.pm/delta_crdt/DeltaCrdt.html ).
Delta-CRDTs are an optimization over state based CRDTs where you share state diffs instead of the whole state (described in this paper: https://arxiv.org/pdf/1603.01529 ).
Also a really well written piece.
An interactive intro to CRDTs - https://news.ycombinator.com/item?id=37764581 - Oct 2023 (130 comments)
Edit: I had an excerpt here which I completely misread. Sorry.
> OTs power most collaborative text-based apps such as Google Docs. They’re the most well-known technique but in researching them, we quickly realized they were overkill for what we wanted to achieve ...
> Figma's tech is instead inspired by something called CRDTs, which stands for conflict-free replicated data types. ... Figma isn't using true CRDTs though. CRDTs are designed for decentralized systems where there is no single central authority to decide what the final state should be
https://www.figma.com/blog/how-figmas-multiplayer-technology...
Downside: harder to think about it all.
Upside: a rocket may hit the datacenter.
From what I remember about Figma, it can be proclaimed CRDT. Google Docs got their sync algorithm before CRDT was even known (yep, I remember those days!).
I (obviously) care a lot about fixing these sort of bugs. But its important to remember that in many applications, it matters a lot less than people think.