Here's a devil-in-the-details question that you might consider adding to your ex...

mononcqc · on Feb 8, 2016

Author here. This is a challenging one, because it is intimately related to what is acceptable or not to your users.

By default you could say that if the storage mechanism must be up and available and it isn't, then the front-end shouldn't be responsive and it should crash.

You could also say that you want the front-end app to be available if the storage layer is offline. This has two possible consequences:

a) you disconnect the front-end and the back-end so that they do not depend on each other. This can be done either through application strategies (you can start the storage app as 'temporary' so it can fail without shutting down the system) or by putting the front-end on a different Erlang node.

The latter means that your dependency on the storage back-end is not as direct as it seems.

b) this is my preferred solution, and it requires you to rework what you think of as 'depends on'. If you expect the storage layer to fail and you must be able to serve the front-end anyway, then the architecture demoed in the presentation needs an asterisk.

The reason for this is that the dependency as described crashes if the database is not available, because the storage subtree acts as a proxy for 'the database'. The OTP structure encodes 'my database is available'.

I can rework that requirement to mean 'the storage layer is up and ready to talk to a database'. This is a huge change because it no longer promises the DB is available; it promises that something whose job it is to talk to the DB is available.

You can then change your interface accordingly. I go into some more detail about this in "It's about the guarantees" http://ferd.ca/it-s-about-the-guarantees.html

In a nutshell, the difference between the two initialization and supervision approaches is that in the one described in b), the client's callers make the decision about how much failure they can tolerate, not the client itself. The client making that decision is what is described in the presentation.
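
A rough sketch of that difference, written in Python rather than Erlang/OTP purely as an illustration. Every name here (`EagerStorage`, `LazyStorage`, `front_end`, `DatabaseDown`, the `connect` argument) is hypothetical and not taken from the presentation; start-up failure stands in for what a supervisor would see as a crash during init.

```python
class DatabaseDown(Exception):
    """Raised by the hypothetical connect function when the database is unreachable."""


class EagerStorage:
    """Presentation-style client: being started means 'my database is available'."""

    def __init__(self, connect):
        # Connecting at start-up: if the database is down, construction fails,
        # and so does everything that depends on this client being started.
        self.conn = connect()

    def fetch(self, key):
        return ("ok", self.conn[key])


class LazyStorage:
    """Option b): being started only means 'something is ready to talk to a database'."""

    def __init__(self, connect):
        self.connect = connect
        self.conn = None  # no connection attempt at start-up

    def fetch(self, key):
        if self.conn is None:
            try:
                self.conn = self.connect()
            except DatabaseDown:
                # Report the failure instead of crashing; the caller decides what to do.
                return ("error", "disconnected")
        return ("ok", self.conn[key])


def front_end(storage, key):
    """Hypothetical caller that chooses to tolerate a missing database."""
    status, value = storage.fetch(key)
    if status == "ok":
        return value
    return "degraded: storage unavailable"
```

With `LazyStorage`, a different caller could just as well decide that `("error", "disconnected")` is intolerable and fail on it; the point is only that the choice sits with the caller rather than with the client's start-up code.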

Sadly I could not fit all of that and the compromise of supervision structures within the hour I had allocated for my presentation, so this comment and the side-blog post ought to do (I've also put that material in Erlang in Anger, if you happen to grab that free ebook).

davidw · on Feb 8, 2016

I wish more people would talk about this kind of thing in the Erlang world. Supervision trees are nice, but there are real-world examples like the above where it's not quite so cut-and-dried, and some additional design is required. Each of your proposed solutions involves compromises, costs, and benefits of their own that may not be obvious to someone new to Erlang.

The insight of people such as yourself who have already run into these problems is very valuable to those of us with less experience.

Thanks!

mononcqc · on Feb 8, 2016

I think a lot of these things are experience-related, or usually cemented within a specific implementation. A lot of people may apply these principles correctly because that's what they find works best, without necessarily bringing it to a conscious level, or to a level of explicitness that makes it easy to teach or use.

Garrett Smith is starting to hit on that with http://www.erlangpatterns.org/ and trying to broadcast that kind of information to the rest of the community, but I'm guessing participation hasn't been strong enough to help (I know I haven't contributed enough to that website personally).

agumonkey · on Feb 8, 2016

Having an independent failure-handling system act as the interface?

hprx · on Feb 9, 2016

http://tjheeta.github.io/2014/12/24/elixir-external-process-...

Someone has also written a library for this, but I've forgotten where it is.