Irmin: Git-like distributed DB

amirmc · on Nov 4, 2015

There was some discussion about Irmin just over a year ago [1] and it's been steadily improving ever since. The Readme on the repo [2] lists the bleeding-edge users too, so you can see examples of how it's been applied (note to self: we should probably write an update about the progress to date).

[1] https://news.ycombinator.com/item?id=8053687

[2] https://github.com/mirage/irmin/blob/master/README.md#use-ca...

avsm · on Nov 4, 2015

If anyone wants to see a talk about this with a few demos of various use-cases (such as the JavaScript transpilation backend, and log servers), I gave a talk at QCon NYC that just went live on InfoQ yesterday: http://www.infoq.com/presentations/irmin

zmanian · on Nov 4, 2015

I'd like to see more information about transactions. Is it possible to update several keys atomically?

eriangazag · on Nov 4, 2015

Yes, you can easily add things to a "staging area" of your database (which is held in memory), and then "commit" your changes to update multiple keys in one go. You can also use this mechanism to keep track of reads (which Git doesn't do) to detect read/write conflicts.
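
To illustrate the idea, here is a minimal sketch in Python of a staging area with read tracking. It is not Irmin's API: `Store`, `Staging` and `Conflict` are hypothetical names, and the per-key version counter is an assumption made for this toy model.

```python
class Conflict(Exception):
    """Raised when a key this transaction read was changed by another commit."""


class Store:
    """Toy committed state (hypothetical; not how Irmin stores data)."""

    def __init__(self):
        self.data = {}      # committed key -> value
        self.versions = {}  # committed key -> number of commits that wrote it

    def begin(self):
        return Staging(self)


class Staging:
    """In-memory staging area: writes are buffered, reads are recorded."""

    def __init__(self, store):
        self.store = store
        self.writes = {}  # staged writes, invisible until commit
        self.seen = {}    # key -> version observed at first read

    def get(self, key):
        if key in self.writes:
            return self.writes[key]
        # Remember which version we saw, so commit can detect later changes.
        self.seen.setdefault(key, self.store.versions.get(key, 0))
        return self.store.data.get(key)

    def set(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Validate every read before touching the store, so a failed
        # commit never leaves a partial update behind.
        for key, version in self.seen.items():
            if self.store.versions.get(key, 0) != version:
                raise Conflict(key)
        # Publish all staged writes together.
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.versions[key] = self.store.versions.get(key, 0) + 1
        self.writes = {}
        self.seen = {}
```

Staging `a` and `b` and then calling `commit()` publishes both keys together; if another commit rewrites a key you read in the meantime, your `commit()` raises `Conflict` and none of your writes are applied.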

avsm · on Nov 4, 2015

See also an early paper on "mergeable persistent data structures" that shows how to do this for Irmin-based Rope and Queue data structures. http://anil.recoil.org/papers/2015-jfla-irmin.pdf

bratsche · on Nov 4, 2015

The Github repo[1] says "This repository is currently offline". I've never seen this message on github before, I'm not really sure what it means. Github then refers me to the "working when Github goes down" article, but Github hasn't really gone down.. I can access every other repo I try to access. Just not this one.

[1] https://github.com/mirage/irmin

rcy · on Nov 4, 2015

https://status.github.com/

20:34 MST We are continuing to investigate a fileserver outage. Some repositories may be temporarily unavailable.

mwilcox · on Nov 4, 2015

Github is having issues at the moment https://status.github.com/

drewm1980 · on Nov 4, 2015

If github went down I would seek the nearest bomb shelter.

rplnt · on Nov 4, 2015

It's down pretty often.

marknadal · on Nov 4, 2015

From your description I'm not quite sure where in the stack Irmin belongs. Is this meant to be used by web application developers? I assume not, as it looks like it is targeting more OS-level development work?

Pretty cool stuff. I am also working on a distributed database, https://github.com/amark/gun , that operates at the high level (web/javascript) rather than the low level. Although it looks like Irmin can be used in the browser? https://github.com/talex5/irmin-js ? Would love to hear some clarification.

eriangazag · on Nov 4, 2015

As with the rest of MirageOS, Irmin is a "library" database, which means that you have a bunch of components that you can re-use in different contexts. Two interesting contexts are:

- the browser, where some components of Irmin are transpiled to JavaScript using js_of_ocaml (http://ocsigen.org/js_of_ocaml/). Cuekeeper (http://roscidus.com/blog/blog/2015/04/28/cuekeeper-gitting-t...) is an interesting use-case for that.

- the kernel, where some components of Irmin can be compiled into a unikernel and be run on top of Xen/baremetal, bypassing the OS completely. Irmin-ARP (http://somerandomidiot.com/blog/2015/04/24/what-a-distribute...) is an interesting step in the direction of exposing kernel data and doing interesting stuff with it.

amirmc · on Nov 4, 2015

[ To avoid any confusion, the submitter is not the author of the post :) ]

I don't really understand the question about the stack. Where it belongs should only be limited by its features.

It certainly can be used by people developing browser-based apps. For example, Cuekeeper [1] is a version-controlled TODO manager which uses Irmin. It's also been used with Xenstore [2], which is a different use-case (there are other examples in the readme of the Irmin repo).

[1] https://github.com/talex5/cuekeeper

[2] https://github.com/djs55/ocaml-xenstore/tree/irminsule

talex5 · on Nov 4, 2015

Like most OCaml libraries, Irmin can be used in the browser by compiling your program with js_of_ocaml.

My `irmin-js` experiment provides a Javascript API to Irmin, so you can write your application in Javascript too, rather than OCaml. My JS isn't very good though; here's what my example code ended up looking like:

https://github.com/talex5/irmin-js/blob/master/examples/test...

csense · on Nov 4, 2015

How does this compare / contrast with IPFS?

Natanael_L · on Nov 4, 2015

IPFS isn't designed as a database, more as a lookup layer for files

david_ar · on Nov 4, 2015

You'll soon be able to store arbitrary data structures on IPFS [1], and we're discussing how to perform merging in a decentralised manner [2]. I'm actually quite interested in having a persistent data storage system (like Irmin) backed by IPFS in the future.

[1] https://github.com/ipfs/go-ipld [2] https://github.com/ipfs/notes/issues/40

amelius · on Nov 4, 2015

Nice, but there is a fundamental problem with the three-way merge: the guarantees about the result are very weak, and it may require special attention to resolve merge conflicts.

avsm · on Nov 4, 2015

That's sort of the entire point. The idea is that you build datastructures using this library that operate over the three-way merge, and provide their own guarantees about merge conflicts.

For instance, a weakly consistent data structure could promise never to raise a merge conflict, and therefore be safe to compose. A stronger one could raise more precise merge exceptions depending on the exact error, which ripple up to the application.
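
As a rough sketch of that distinction (in Python rather than OCaml, and not Irmin's merge API; all names here are hypothetical), each data type supplies its own three-way merge over the common ancestor `base` and the two branches `ours` and `theirs`:

```python
class MergeConflict(Exception):
    """Raised by merge functions that cannot reconcile both sides."""


def merge_counter(base, ours, theirs):
    """Never conflicts: apply both sides' deltas to the common ancestor."""
    return base + (ours - base) + (theirs - base)


def merge_set(base, ours, theirs):
    """Never conflicts: keep additions and honour removals from either side."""
    added = (ours - base) | (theirs - base)
    removed = (base - ours) | (base - theirs)
    return (base - removed) | added


def merge_register(base, ours, theirs):
    """Single value: conflicts only when both sides changed it differently."""
    if ours == theirs:
        return ours
    if ours == base:
        return theirs
    if theirs == base:
        return ours
    raise MergeConflict(f"both sides changed {base!r}: {ours!r} vs {theirs!r}")
```

The counter and the set never raise, so they are safe to compose; the register pushes the decision up to the application by raising `MergeConflict`.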

An example of an application handling this sort of merge error is the Cuekeeper TODO manager (which is a pure JavaScript Irmin app that uses HTML5 Localstorage/IndexedDB). Try opening http://test.roscidus.com/CueKeeper/ in two tabs and creating conflicting changes, and see the Irmin merge error ripple up to the UI.

talex5 · on Nov 4, 2015

Note: CueKeeper is a bit unusual here. Merges always succeed, but it adds a note to the item saying what it did to resolve the conflict.

e.g. if you rename an action "orig" to "a" and to "b" and then merge, you'll end up with an action called "a" with a note saying the change to "b" was discarded.

BTW: it's easier to generate merge conflicts using this web interface, which lets you run two instances in one window:

http://roscidus.com/blog/blog/2015/04/28/cuekeeper-gitting-t...
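
A sketch of the rename example above, in Python. This is not CueKeeper's actual code: `merge_name` is a hypothetical function, and the choice to keep our side when both sides rename is an assumption made for illustration.

```python
def merge_name(base, ours, theirs):
    """Three-way merge of a name that always succeeds.

    Returns (merged_name, note). The note is None unless one side's
    change had to be discarded to resolve the conflict.
    """
    if ours == theirs or theirs == base:
        return ours, None
    if ours == base:
        return theirs, None
    # Both sides renamed differently: keep ours (arbitrary choice in this
    # sketch) and record what was dropped instead of raising an error.
    return ours, f'Conflicting rename to "{theirs}" was discarded.'
```

Calling `merge_name("orig", "a", "b")` yields the name `"a"` together with a note that the rename to `"b"` was dropped.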

mbrock · on Nov 4, 2015

Can you expand on the weak guarantees?

Sure, conflicts need to be handled. That's a fundamental problem with reality!

amelius · on Nov 4, 2015

Well, the problem with a merge is that you lose information about the "intention" of the operations that produced the specific versions being merged. All you've got is the end result of those operations. This becomes especially troublesome if there are invariants that need to hold over multiple data items that have changed in a 3-way merge.

mbrock · on Nov 4, 2015

I would probably understand better with a concrete example. It's not clear what "intention" means, and why it can't be encoded as part of whatever objects are being merged.
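
One possible reading of the invariant concern, as a small Python sketch. The keys, the budget of 10 and the function names are all hypothetical, and the merge assumes all three versions have the same keys.

```python
def merge_value(base, ours, theirs):
    """Per-key three-way merge: conflict only if both sides changed the key."""
    if ours == theirs or theirs == base:
        return ours
    if ours == base:
        return theirs
    raise ValueError("conflict")


def merge_map(base, ours, theirs):
    """Merge each key independently (assumes identical key sets)."""
    return {k: merge_value(base[k], ours[k], theirs[k]) for k in base}


def invariant(state):
    """Cross-key rule: the two allocations must fit a shared budget of 10."""
    return state["x"] + state["y"] <= 10


base = {"x": 4, "y": 4}
ours = {"x": 6, "y": 4}    # spends the spare budget on x
theirs = {"x": 4, "y": 6}  # spends the same spare budget on y

# No key was changed on both sides, so the merge reports no conflict.
merged = merge_map(base, ours, theirs)
```

Both branches satisfy `invariant`, and the per-key merge succeeds without a conflict, yet `invariant(merged)` is false. The merge only saw end states; it had no record that each side intended to "use the remaining budget". Whether that intent can be encoded in the merged objects themselves is exactly the open question here.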

shahbazac · on Nov 4, 2015

Heads up to the Irmin folks, looks like your website is down.

amirmc · on Nov 4, 2015

Seems to be fine now.