Rage Against the Finite-State Machines

gosubpl · on April 3, 2016

The biggest problem with FSMs (or generally actors, or more generally distributed systems, microservices, etc.) is that you get emergent complex global behaviours from seemingly simple actor rules. There are two excellent talks on that by Torben Hoffman from Erlang solutions: http://www.infoq.com/presentations/erlang-async-protocols and http://www.slideshare.net/torbenhoffmann90/2014-12ndcthinkin... . Also, worth noting, Paxos pseudo-code is only 25 lines of code for all participating actors: http://nil.csail.mit.edu/6.824/2015/notes/paxos-code.html but it took quite a while for the community to verify that this code is actually correct for all cases.

lostcolony · on April 3, 2016

I've seen this in practice, but it begs the question (and it might be answered in the talks): what is the alternative? That is, I have this corpus of rules and behaviors that I can express as an FSM...is there a better way of expressing it, given that I actually need something that behaves according to them?

galangalalgol · on April 3, 2016

As another comment below points out, any non trivial software contains FSMs but they can express it in any number of ways. Haskell trying to avoid mutable state would probably pass around a constant data structure recursively through a series of pure functions. I find explicit FSMs difficult to reason about and find pure functions operating on complicated data structures much easier to reason about. It is irrational to conclude there is one clearest representation for everyone even within a single problem space. Experiment, find the one that suits you and show understanding to those who think and learn in other ways.

ArkyBeagle · on April 3, 2016

The mechanism for learning to think in state machines is message sequence charts. For each state there is usually a short list of Things That Can Go Wrong ( timeout, race condition, garble, inappropriate message ).

Ya gotta think like a clockmaker.

im_down_w_otp · on April 3, 2016

Erlang is doing exactly what you're saying here.

mannykannot · on April 3, 2016

You are right - while there may be a simpler FSM that is equivalent to the one you have, the irreducible complexity that can arise from apparently simple rules is a feature of the problem domain. FSMs are part of the solution domain, and the best you can hope for in the solution domain is to avoid making things more complicated than they have to be. That does not rule out there being alternatives to FSMs that are easier to reason about.

marcosdumay · on April 3, 2016

If you have more than one layer with implicit mutable state, you are up for emergent behavior. It is not limited to FSM.

But FSM are just too easy, so people often use it to make implicit some state that would be explicit otherwise. In this sense, yes, they do cause emergent behavior.

The alternatives are joining your state at a single layer, making sure your states are really orthogonal to each other (very hard if both do IO), or making your state explicit so that you can at least verify once you find a problem.

ArkyBeagle · on April 3, 2016

So don't make state implicit. This is critical. Orthogonality is just something you do.

I've found it offends some people, but it works and generally, their way doesn't.

But the founding principle is "trust no one" in distributed systems.

marcosdumay · on April 3, 2016

I wouldn't put it that black on white. Orthogonalising implicit state is just great for managing complex behavior. It's a very powerful tool, and it can be made foolproof on lots of cases.

But if two layers do IO, you'll almost certainly want the state to be explicit.

ArkyBeagle · on April 3, 2016

I don't think foolproof costs all that much, especially if it's habit. YMMV.

I have a BluRay player with Java(tm). If it sits more than a day or twenty, I have to unplug it, count to ten and plug it back in. I have a TV that has seizures - loud, painful seizures ( that always test the ability of an amp connected to it to defend itself ) . Scares the crap out of the dogs.

This is not necessary. This is not good.

readdcft · on April 4, 2016

it begs the question: what is the alternative?

Here is one: https://news.ycombinator.com/item?id=11418855 (intended as a reply to your question but accidentally posted to the thread)

ArkyBeagle · on April 3, 2016

But this is nothing more than a variation on the Two Generals Problem for distributed systems.

Given an FSM and a set of messages, and a model of timing, it may be possible to fully test out the most interesting subset of transactions and test out many of the pathological cases. But instrumentation in anticipation of finding unanticipated pathological cases is pretty important.

This is bounded by cost, just as all due diligence processes are. Well-designed instrumentation is the key.

FSMs are not the problem here - with infinite bandwidth channels and perfect transmission, you wouldn't even need all that.

mwilcox · on April 3, 2016

I'd say those are features of agent-based systems as opposed to problems. But it depends on the application

gosubpl · on April 3, 2016

Features, yes, provided one has time to actually think of the protocol and express it in terms of some formalism (CSP, TLA+) to at least informally sketch a proof of major global properties. Or do some extensive testing (https://aphyr.com/posts/291-jepsen-zookeeper https://aphyr.com/posts/316-jepsen-etcd-and-consul ).

api · on April 3, 2016

We are possibly one such emergent behavior in an agent based system, as are bees and fish and trees and whales. So yes. :)

petra · on April 3, 2016

On the other hand, hierarchical-state-machine (like FSM, with hierarchy), are often used in embedded systems(including safety-critical systems) and are considered among the more reliable and bug-free methodologies.

So maybe we shouldn't rule-out FSM's but be subtle in their use ?

AstralStorm · on April 4, 2016

Better still, why not use maths and convert FSM into mathematical proofs? It is automatic if the FSM is fully specified and the semi-automatic part could catch insufficient specification. (to borrow words from reply above: certain kinds of implicit state and assumptions on the platform)

Something like Isabelle/HOL etc.

jonahx · on April 3, 2016

> The biggest problem with FSMs (or generally actors, or more generally distributed systems, microservices, etc.)

I'm intrigued by this sentence. Can you explain how actors and microservices are a generalization of FSMs?

theseoafs · on April 3, 2016

I think you misread OP's sentence. Actors and micro services are not really related to FSM's.

You can make services and actors which can be modeled as FSM's but they don't have to be.

skewart · on April 3, 2016

I'd say it really comes down to whether your actor or microservice has any statefulness or if it's idempotent.

If you have a database behind your microservice, or some kind of in-memory cache, where you ever modify values instead of just writing and reading them then there's a good chance it's an FSM. On the other hand if it's purely functional - say, it just returns a thumbnail image for any photo you give it - then it's not really an FSM.

From what I've seen an awful lot of microservices out there have some amount of state (not sure about actors). I think that's why the author made the generalization.

arielb1 · on April 3, 2016

The complicated part of Paxos is not these 25 lines of code, but all the extensions people add over them.

darksaints · on April 3, 2016

I'm not exactly very well acquainted with the theoretical underpinnings of Computer Science (except where they intersect with other fields that are more my specialty like Operations Research and Economics), but Finite State Machines have always struck me as incredibly intuitive modeling structures for thousands of real life phenomenon. And I'm continuously surprised that professional SDEs with academic training in CS are so reluctant to use them to model their domains. I realize that avoiding state has its benefits in a lot of areas of software design, but the world is stateful and if you are avoiding state just to make your job easier, you are doing it wrong.

One area that absolutely baffles me is UI Framework design. Every single UI in the world can be modeled as an FSM because they are FSMs. And yet, when you go to any common framework for UI design, you are forced to use concepts that only tangentially touch the idea of modeling a UI as an FSM. And while the new breed of Reactive Functional Programming approaches have incredibly useful concepts (like reducers and signals), they still require incredibly imperative approaches to state transitions.

And unfortunately the worst part of web design is the language. Javascript is terrible for modeling FSMs: recursion is an afterthought, state can be so shapeless and dynamic to the point where you need to query your state to be able to identify where you are in it, and listening to events is a chore of epic proportions. The closest that I've found to actual FSM design on the web has been React.js, and even then it only seems like they accidentally stumbled onto the idea of using FSMs to model UIs. The state of the art in modeling state in React (Redux) eschews component local state religiously, forcing you towards modeling every possible DOM event through a gigantic global structure. It ends up being a gigantic pain in the ass.

I really wish I could find a cross platform UI framework with a true FSM approach to modeling UIs. It doesn't need to be complicated: All you need is a component model and you program a component by describing each component's state transition table (state0 * event=> action * state1), and the mapping of the state to its immutable and pure renderer. There literally is nothing more to it than that. Some day I'm just gonna lose my shit enough to compel me to write my own framework and model it correctly. Until then, I'm gonna bitch about it on the internet.

mixmastamyk · on April 3, 2016

SDEs?

Yes, I've encountered many instances of developers reinventing FSMs poorly because they never studied them, over the course of my career. I try to counter that by giving a short presentation on them at each place I work.

Your FSM UI sounds interesting. It would be a lot of work to accomplish fully, but perhaps instead a blog post on what it would/should look like would be enough to get people interested in the idea.

wlievens · on April 3, 2016

Software Development Engineer I assumed. Maybe a slash in there.

voidfunc · on April 3, 2016

I find that title is often used by bigger traditional Fortune 500 type companies.

logical42 · on April 4, 2016

I'm pretty sure that ember had an internal object called Ember.State at one point and another called Ember.StateManager to (naturally) manage the relationships between states. This was a few years back so I can't say what's happened internally since then, but as far as I remember the router was intended to basically encapsulate the logic of an FSM.

szines · on April 6, 2016

Yep, you are right. There was Ember.State and Ember.StateManager, but it was extracted from the final v1.0.0. Instead of having one universal State Manager, router, views, etc got they own.

However, this universal State Manager is still exist and you can add to your project. https://github.com/emberjs/ember-states

After spending 15 minutes in the source code, it is ready to use... for example, it is brilliant to manage states in a component tree, where the original router would overkill.

nilkn · on April 4, 2016

Please correct me if I'm wrong, but I think the approach of your final paragraph could be very straightforward to implement in Elm (a purely functional language that also has one of the best implementations of FRP that you can find anywhere, in any language). See this article on "the Elm architecture", which I think could be very naturally extended in the way you propose:

https://github.com/evancz/elm-architecture-tutorial

Elm's approach involves having completely declarative, purely functional UI code (with efficient update computations resolved via a virtual DOM in the background), where individual components are highly composable and referentially transparent.

darksaints · on April 4, 2016

In terms of just web programming, Elm is amazingly close to my dream. I have some beefs with the language (Ports are unintuitive, generics higher kinded polymorphism are non-existent, haskell syntax), but they are pretty small beans. The big part that I find missing is the cross platform abilities (I'm talking web, mobile, and desktop platforms without the slow/hacky do-everything-in-html5 methods).

And to be honest, if I were to start from scratch, I'd probably be re-implementing 90% of Elm anyway. Which maybe says to me that instead of starting from scratch, I should just join the Elm community and start hacking on the compiler backends (for compiling to platform-native code) and front end libraries (to target native UI framework code), and hope the language grows closer to what I would like.

eximius · on April 3, 2016

Wouldn't a FSM UI have an absurd number of states? (from my limited understanding of them)

sesteel · on April 3, 2016

The states exist whether you model them or not, but yes using one FSM for entire UI would potentially grow too large and cumbersome to be practical. They do, however, make more sense targeted at particular levels of abstraction: system, component, etc.

monocasa · on April 3, 2016

Sure, if you had one giant FSM.

vbhavsar · on April 3, 2016

We make extensive use of Akka FSMs [1]. I find their implementation to be a lot more intuitive and less verbose than the Erlang implementation. Of course, that is in large part due to Scala's succinctness and fantastic DSL support.

FSMs make it so easy to read and debug code. It helps that Akka uses an actor system (based on Erlang's actor system), which simplifies dealing with concurrency.

> However, if the dog is sitting and you pet it, we have no idea what might happen. In the Erlang world, the dog could crash (and eventually be restarted by its supervisor).

Akka FSMs have a whenUnhandled state which can handle any event that your FSMs don't explicitly handle. Also, not handling an event doesn't cause the actor to crash. The event ends up being dead lettered.

[1] http://doc.akka.io/docs/akka/snapshot/scala/fsm.html

lostcolony · on April 3, 2016

So, just to point out, the quote you're referencing is about the diagram, not the code. That is, the expected use case of the dog is undefined in that case; what the code does is thus undefined, and up to the implementor. It COULD crash. Or it could be handled. Or the message could just sit in the mailbox unanswered (messages that match no case clauses sit in the mailbox because the code may at a later date be able to handle such a message; see below). It's up to you.

We even see an example of that in the code. All those pattern matches with _ in them are the 'default' handling, the "it matched nothing else so match here and handle it". If it matched part of the receive clause, but that was not coded to handle that kind of message (your guard was insufficient), then it could crash (eg: receive {event, Pid} -> Pid ! blah, but you send the message {event, "stuff"}, would crash, since you can't send an event to a string).

That's actually a pattern I've noticed when it comes to Erlang vs Scala/Akka. A lot of the things that have special syntax in Scala/Akka, don't in Erlang, because they just flow naturally out of the language primitives. Another example is Akka's "become". In Erlang, you just call another function, that has a different receive block. And that's also why messages just sit in the mailbox if they match nothing; because the actor may enter a different receive block at a later time, that -can- respond to that kind of message.

rdtsc · on April 3, 2016

> Akka FSMs have a whenUnhandled state which can handle any event that your FSMs don't explicitly handle.

Erlang's gen_fsm has that too:

http://erlang.org/doc/man/gen_fsm.html#Module:handle_event-3

However that is a dangerous anti-pattern usually. More often than not, I want the actor to crash and have a proper trace and notification of it most of the time than have it go to some unspecified state and hang around forever in that broken state.

moron4hire · on April 3, 2016

Every non-trivial program has an implicit FSM in it somewhere. Yes, the quality of the program is only as good as your analysis of the states and transitions between them, but it turns out that is true for all programs, regardless of whether or not you implement them as an explicit FSM. Explicitly specifying the FSM means you have a better target for analysis when you define missing/bad transitions.

rdtsc · on April 3, 2016

There is a successor to gen_fsm in Erlang being developed -- gen_statem.

Here is the work in progress docs:

http://erlang.org/~raimo/doc-7.2/lib/stdlib-2.7/doc/html/gen...

And the community discussion:

https://github.com/erlang/otp/pull/960

_ezrr · on April 3, 2016

I've always liked fsm in Erlang and used it for a few things. Can anyone tell me why Elixir didn't include it in the core?

phieromnimon · on April 3, 2016

They decided that it was too confusing and suggested that developers seek other solutions.

I wrote an Elixir wrapper around it 2 days ago, and pushed it up to hex with the name gen_fsm. Here's the github page: https://github.com/pavlos/gen_fsm if you want to check it out.

julie1 · on April 3, 2016

Well any asynchronous programming requires FSM. Thus don't do distributed system or asynch if you don't like FSM. It may not be self taught.

The explosion of transitions with the number of states is unavoidable by nature. It requires thinking and design on paper before implementing.

Here is example of hpib that includes the FSM for the communication. These people did a great job and yes it is hard

http://www.hp9845.net/9845/tutorials/hpib/

readdcft · on April 4, 2016

Experience with Rules-Based Programming for Distributed, Concurrent, Fault-Tolerant Code

The paper’s third contribution is to demonstrate the value of the rules-based approach. Rules allowed us to solve a wide range of problems in RAMCloud using a small amount of code (only 30-300 lines of rules-based code for each DCFT module). Rules are also efficient: when used in the critical path of RAMCloud’s write operations, rules overheads account for only about 200-300 ns out of the total write time of 13.5 μs. The rules-based approach is a specialized form of an event- driven state machine, but it results in cleaner factoring and simpler code than the traditional approach to state machines. We reimplemented the scheduler for Hadoop MapReduce (which uses the traditional approach) using rules; our rules-based implementation replaced 163 state transitions with only 19 rules.

https://ramcloud.atlassian.net/wiki/download/attachments/684...

EDIT: the code behind that paper appears to be here:

https://github.com/PlatformLab/RAMCloud/blob/master/src/Inde...

AKrumbach · on April 3, 2016

While the meat of the article may be (somewhat) good, I take umbrage with the very first sentence: "A finite-state machine (FSM) is not really a machine, but it does have a finite number of states."

A finite state machine is a machine in the same sense we can claim this of a wheelchair ramp: both can be thought of as a "simple machine" (level, pulley, wheel, incline, etc) from which we build "real machines" [in the ordinary, conversational sense] like engines and dishwashers.

logicrook · on April 3, 2016

A Turing machine is not really a machine. A TM can simulate the abstraction of a machine, a machine can simulate a TM. You can put bread in your toaster, you can't put the bread in a Turing machine (unless it's a toaster-enhanced TM with a special bread input tape and a temperature oracle).

dahart · on April 3, 2016

Ah, The Treachery of Turing Machines, Magritte would be proud. Yes, a FSM diagram cannot toast bread, just like a John Deere Schematic cannot plow fields. But if we want to play this game, unfortunately we'll have to get a little pedantic and reveal whether we're talking about design or implementation.

The Turing Machine is an abstract mathematical model for designing real machines. A Turing Machine is a design for a specific real machine. The computer or software of your choice is a real machine that is also a Turing Machine.

My toaster's special bread input tape is implemented using a mechanical lever. It's temperature oracle is implemented using a temperature sensor and a digital timer. I believe I have a toaster-enhanced TM.

The Turing Machine is really a machine, in every practical sense.

logicrook · on April 3, 2016

My comment's intention was to illustrate that the derided sentence simply intended to say "it's abstract". So now, let's see how "The Turing Machine is really a machine, in every practical sense." is false, starting with the Oxford's dictionary definition of machine:

An apparatus using mechanical power and having several parts, each with a definite function and together performing a particular task.

So a mathematical notion definitely does not satisfy this definition. An abstract machine is definitely different from a machine. The map is not the territory (or was Korzybski wrong all along?).

dahart · on April 3, 2016

Clearly you understand! An abstract machine is different from a machine, that nicely summarizes what I said. Good to know we can use the word "machine" for multiple different kinds of machines. Glad to see you agree that a Finite State Machine is a machine of the abstract variety. I'm happy that we've moved past the semantics into the practical realm of allowing communication to proceed by acknowledging that words like "machine" don't always conform to a single literal interpretation. I'm sure then that you agree with @AKrumbach and me that the opening sentence of the article was indeed not very informative, even if it was a little controversial and provided a fun thought experiment.

jwdunne · on April 3, 2016

A turing machine is an abstract machine. That fits with your sense of the definition. Sorta like category and group in maths are a bit different from what we consider a category and group in the real world.

Retra · on April 3, 2016

You can put bread in your Turing machine, you just need to convert it to some language first.

dpeck · on April 3, 2016

I've found gen_fsm to be a bit of a beast to reason about, and I'm not alone the core team announced at Erlang Factory a few weeks ago that there will be a new FSM implementation in an upcoming release https://youtu.be/YlNrWxH56_E?t=985

arthurcolle · on April 3, 2016

https://news.ycombinator.com/item?id=11416119

avip · on April 3, 2016

But how did he come up with those lovely diagrams? If any Gliffy-like tool is capable of that, I'd like to know.

seivan · on April 3, 2016

I've stopped using FSM for Behaviour Trees instead. Though not related to the problems here, it just makes it easier to design and reason about. Not to mention more decoupled and thus reusable.

known · on April 3, 2016

A quick web server that was written to showcase the sigio/siginfo event model. consider this code highly experimental and yourself highly mental if you try and use it in a production environment http://www.kegel.com/c10k.html#examples.nb.sigio

PaulHoule · on April 3, 2016

Cats rule, dogs drool.