Fallacies of Distributed Computing (wikipedia.org)
139 points by mindcrime on April 21, 2011 | hide | past | favorite | 48 comments


I'm not sure whether the OP is implying that Amazon is guilty of these fallacies or whether AWS's users are.

Having worked at Amazon, I can attest that Amazon does take these issues into account, and many more. That said, the purpose of the services that failed isn't to guard against network partitioning at the data center cluster level. The services that were designed with cluster network partitions in mind are still up (S3, EBS, SQS, etc).

If you are renting instances in a specific data center, and you wish to have cross-cluster redundancy, it is your responsibility to build that into your distributed system. This is ultimately a question of whether you need to commit to an SLA higher than that of the service on which you depend. (It's more complicated than that, but that should suffice for simplicity's sake).
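That SLA dependency math is easy to sketch. Below is a toy model (my own illustration, not anything from AWS's docs) that ignores correlated failures but shows why hard serial dependencies lower your effective availability while independent redundancy raises it:

```python
def serial_availability(deps):
    """Availability of a system with hard serial dependencies:
    every dependency must be up, so availabilities multiply."""
    result = 1.0
    for a in deps:
        result *= a
    return result

def redundant_availability(a, n):
    """Availability of n independent replicas: the system is down
    only when all replicas are down at the same time."""
    return 1.0 - (1.0 - a) ** n
```

Two 99.9% dependencies in series give roughly 99.8%, while two 99% replicas in parallel give 99.99% -- which is why cross-cluster redundancy can push your effective SLA above that of any single zone.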

There are companies that built services on top of EC2 to mitigate these risks (e.g. Elastra), and there are competing solutions that provide application-level hosting that should guard against such partitions (e.g. Windows Azure).


I'm not sure whether the OP is implying that Amazon is guilty of these fallacies or whether AWS's users are.

I had AWS customers in mind, more than Amazon.com themselves.

If you are renting instances in a specific data center, and you wish to have cross-cluster redundancy, it is your responsibility to build that into your distributed system.

Exactly. It's easy, though, to build something on top of AWS (or Rackspace or whatever), stand back and marvel at its magnificence, and forget that an outage - in a network segment or service that you don't control - can totally wreck your masterpiece.


Fair enough. As I mentioned in my earlier post, there are available solutions that mitigate the risks of these fallacies for AWS's customers. When you build on top of a networked service, you are taking a hard dependency on its SLA, and if you need something with greater reliability, it is your responsibility to provide the necessary redundancy.


We use AWS for the following reasons:

* Flexible pricing, with fair costs

* Instant, flexible provisioning

* Awesome APIs and third party libraries (e.g. boto)

* Extensive documentation/support in the wild (i.e. you can get just about any question answered via Google or Server Fault)

* Solid uptime track record with SLA to back it up

Did I expect that all server outages would magically disappear because we were now in the "cloud"? Not in the least bit.

Did others sign up for AWS thinking that? Sounds incredibly foolish to me if they did. But this sounds suspiciously like a straw man to me...


Did others sign up for AWS thinking that? Sounds incredibly foolish to me if they did. But this sounds suspiciously like a straw man to me...

There's no straw man, because nothing is being debated. That said, I do have a hunch that some non-trivial number of AWS customers do sign up, build significant parts of their business on AWS, and never fully consider the implications of that configuration in terms of reliability, uptime, etc.

Building in the kind of redundancy and failover mechanisms to handle outages like this isn't easy, but it is important; and awareness of the kinds of things pointed out by the "Eight Fallacies" is part of that.


We probably agree but are perhaps looking at it from two different angles.

I guess I took the implication of your sharing this list of fallacies to be, "A buncha suckers got duped by all the cloud services hype."

As an actual AWS customer, I'm just sharing my perspective -- namely, that I never expected to be immune to unpredictable service outages like these, and that this is not at all why I signed up for AWS. Maybe others did, but I am not personally acquainted with any of them if so.

I do slightly disagree with your assertion that building in redundancy and failover mechanisms is by definition "important." Important is a relative term and is probably more meaningfully looked at from a business or cost/benefit perspective. For a company like Netflix, it's clearly massively important to have a fully functioning failover plan in place. For a startup like Quora, it probably suffices to have a simple service outage message.

Now if you'll excuse me, I have to go back to hitting refresh over and over on the AWS Status page.


I guess I took the implication of your sharing this list of fallacies to be, "A buncha suckers got duped by all the cloud services hype."

No, not at all. I'm a big advocate for cloud services, including AWS.

I do slightly disagree with your assertion that building in redundancy and failover mechanism is by definition "important." Important is a relative term and is probably more meaningfully looked at from a business or cost/benefit perspective.

True... I was generalizing a bit, due to laziness. That, and it didn't seem important - at the time - to get into all that. But yes, I agree... the importance of failover / redundancy / etc. is clearly a business decision which depends on a number of factors.


That it is not being debated is the reason it's a strawman. The parent is suggesting that the list of fallacies is a response to an argument that nobody is making. I'm not agreeing or disagreeing, but that seems to be what he meant.

If I came up to you and said, "You know, it's a fallacy for you to have assumed that your car wouldn't require any maintenance," you'd probably respond with something like, "Uh, yeah."


That it is not being debated is the reason it's a strawman.

That's not how I've understood the use of the term "strawman," FWIW. I've always felt that a strawman can only exist inside the context of an argument... that is, that a strawman is a technique used in rhetoric/debate, where you set up a false argument for your opponent and then tear it down. That seems to jibe with what Wikipedia says:

"A straw man is a component of an argument and is an informal fallacy based on misrepresentation of an opponent's position.[1] To "attack a straw man" is to create the illusion of having refuted a proposition by substituting it with a superficially similar yet unequivalent proposition (the "straw man"), and refuting it, without ever having actually refuted the original position."

In this case, there was no debate, no opponent, etc., ergo, no strawman.

Maybe there's another usage of the term, that I'm not acquainted with?


Observe the example given by Wikipedia:

1. Person A has position X.

2. Person B disregards certain key points of X and [...]

3. Person B attacks position Y, concluding that X is false/incorrect/flawed.

Person A doesn't need to be present or even know that his position is being spun for Person B to establish and attack a strawman. The strawman is presented to force a debate over something that Person A never asserted. That is, the debate comes after and is a result of the strawman. There may have existed a debate previously, but it is not a requirement for one that is provoked by the introduction of a strawman.

You might have heard someone respond to a strawman with, "Do you still beat your wife?," which is an obvious example of the tactic that is thrown in the face of the offender as a way of demonstrating its unfairness and ridiculousness.

And imagine the possible responses to the question and why it might leave someone helplessly searching for a response that doesn't damn him:

1. Deny having ever beaten his wife, thus lending credence to the implication by acknowledging it.

2. Ignore the question, thus lending credence to the implication by not denying it.

So the submission could be seen as an implication that people who don't succumb to these fallacies actually do, leaving them with the choice of ignoring the assertion or leaving a comment to deny ever having been naive about any of those points.

I honestly have never heard that there has to exist an argument or a specific opponent for a strawman to be established. Someone please correct me if I am off.


> 1. Person A has position X.

> 2. Person B disregards certain key points of X and [...]

> 3. Person B attacks position Y, concluding that X is false/incorrect/flawed.

> Person A doesn't need to be present or even know that his position is being spun for Person B to establish and attack a strawman.

In the degenerate case, person B only identifies person A by the position he claims that person A holds. It should be obvious that there is no strawman here, since A does not exist unless A holds the position B claims.

It could be argued that, as B gives A more and more distinguishing characteristics, he slides closer to strawman territory. But I don't think the OP was close enough to be in danger.


I think I understand your reasons for sharing this link per our conversation earlier, but to clarify what I meant by "straw man":

I was referring to a lot of discussions here today that take the form of "All those people who thought moving to the cloud was going to magically fix their problems and solve all reliability/availability issues once and for all sure do have egg on their face today."

My point was that I don't myself know a single person or company who is deploying with Amazon or other cloud services who thinks that way. It is this fictional person who is the straw man I was referring to.


There are more general issues at AWS this morning, but the continuing issues all center around EBS.

I think this is just reinforcing to me to never use a networked filesystem. It is the wrong abstraction for storing data across multiple physical units. Too often you end up hanging the whole box waiting for an IO operation to finish.

NFS fell out of favor for most Linux things a long time ago for these reasons.

IBM's GPFS... I've seen it fail spectacularly. But their marketing materials said it is awesome!

And now Amazon's EBS, essentially a networked block store, is failing.

Don't trust a vendor about networked storage. It's a lie, just like the cake.


That's an odd thing to say.

Internal networks can easily be more reliable than the applications that run over them: higher-end equipment has far more engineering money poured into it, and anything above the bottom end of the market typically offers multiple layers of redundancy. Whether it's VRRP, stacking, bonding, IPMP, iSCSI multipath, etc.

EBS is unreliable, granted, but blaming the network seems faulty.

For one, it's nothing like NFS. For two, NFS can be adapted to highly-available systems fairly easily. These days it's a pretty simple system. Even with just a single array, put two FreeBSD "heads" on it, MPIO connectivity, and CARP between the boxes. Monitor the CARP interface to import/export the file-systems and there you go. If you can tolerate a minute or two of downtime when a host fails, you have reliable, automatic redundancy.

EBS, FCoE, iSCSI, et al. are entirely different beasts. But again, you can make the hosts redundant pretty easily. With iSCSI you can give the clients multiple paths to the devices, increasing reliability and available bandwidth without LACP. Consolidate storage needs to save money. It's win/win/win.

Networked storage, whether it's block devices or shared file-systems, whatever the technology, isn't going away anytime soon. Matter of fact, it's becoming more critical to operations day by day. And these days, it's one of the easiest things to set up and ensure availability on.

That EBS is the weak-link at Amazon is a bit counter-intuitive based on my own experience. I wonder what the root problems actually are...


Let me summarize the argument against using network filesystems. Elsewhere on the HN homepage as I write this [1] there is a link to the Wikipedia article on the Fallacies of Distributed Computing [2], which are:

    1. The network is reliable.
    2. Latency is zero.
    3. Bandwidth is infinite.
    4. The network is secure.
    5. Topology doesn't change.
    6. There is one administrator.
    7. Transport cost is zero.
    8. The network is homogeneous.
Technically, every one of them is a myth of local disk storage as well, inasmuch as nothing is ever 100% reliable in the mathematical sense. In practice, though, there's a difference. In network programming, the moment you step out into the big world you are slammed in the face by the mythical nature of these claims; with local disk, they are true enough that you can write significant programs as if they are true, excepting perhaps latency (a busy disk may mean you get scheduled much later). The basic argument is that the practical differences are so significant that the two demand different interfaces. The file interface is most effective when it just pretends most of the myths are true, because adding the complexity of handling violations isn't worth it (and most programmers using local disk would get it wrong even if you offered it; how many programs gracefully handle running out of disk space, let alone more bizarre failure cases?). The network, by contrast, must give you the tools to handle those issues, and correspondingly must have a more complicated API.

In particular, every time you so much as touch the network, you need a code path to handle the possibility that the network communication utterly failed, just disappeared into oblivion. You just don't code that way when you deal with files; the file may be gone, the disk may run out of space, but you don't sit there in the middle of a fast read loop and handle failure on every read, such as a topology change, i.e., "the disk drive is no longer present". That's not because it's not possible, because you can certainly yank out a USB key mid-read, but you tend not to handle it in most userland code. In fact the OS won't even let you handle many failures, as it will either abstract away the retries or simply lock your userspace hard.

So the fundamental idea is that a filesystem abstraction built for relatively reliable communication simply isn't correct for network communication.

Incidentally, the same argument can pretty much be made against remote RPC by simply replacing "disk access" with "function call". You do not have to guard every function call you make with a check for absolute failure. (Not the function returning failure or the function throwing an exception, but the function that may have been there two minutes ago unexpectedly being entirely gone.) RPC may look cute in the source code but it's the wrong layer of abstraction.
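Concretely, the extra code path looks something like this. A minimal sketch (real network clients also need timeouts and idempotency guarantees, which are omitted here):

```python
import time

def call_with_retries(op, attempts=3, base_delay=0.01):
    """Wrap a network operation with the failure path that local-file
    code typically omits: the call may vanish into oblivion, so retry
    with exponential backoff and eventually surface the last error."""
    last_exc = None
    for i in range(attempts):
        try:
            return op()
        except OSError as exc:  # connection refused, reset, timed out...
            last_exc = exc
            if i < attempts - 1:
                time.sleep(base_delay * (2 ** i))
    raise last_exc
```

A fast local read loop has no analogue of this wrapper; that asymmetry is the whole argument.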

[1]: http://news.ycombinator.com/item?id=2470865

[2]: http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Comput...


That's actually this post (the fallacies link). :-)

And you make a good argument about handling read errors.

In practice that's not what high availability is about though, and it's certainly not what we're talking about with AWS, iSCSI or even an HA-NFS solution.

Availability in those terms is properly designed, redundant infrastructure. There are many cases I'm sure (say a Stock-Exchange) where HA-NFS would not be appropriate. In your average web-app however, request failures while transitioning to failover are (in my experience) just part of the bargain.

In that context, waiting 60 seconds for a stale NFS handle to pick a new connection up on the VRRPed failover node is acceptable, and like you said, you make assumptions and don't worry about the edge-case here.

There are clearly situations where NFS wouldn't work well, but those are definitely the exception to the rule. There are not a lot of reasons/scenarios where iSCSI couldn't or shouldn't replace local storage for virtualization needs however. In my opinion.

iSCSI is simple and extremely reliable. Perhaps even more reliable than local disks, especially if the primary concern is availability.

Frankly, I don't know that these rules really apply at the level of a polished iSCSI or NFS stack. They certainly apply to the internals of those systems. While you or I aren't going to worry about whether a buffered file reader can fetch the next 8K block, that thought and engineering effort has definitely been put into these protocols and implementations, at a lower level than you or I will typically work with.


EBS isn't a shared filesystem; it's a block device mountable by only one host (instance) at a time. Much more analogous to Fibre Channel or iSCSI than to NFS.


From somebody who only tends to do single-system administration...what do you use in place of NFS? Local storage and sync it? Something else?


Not sure I agree. Networked filesystems require a strong consistency model, which forces you to give up some availability, as we can plainly see today. That's not to say strong consistency models are bad. They're easier and cheaper to work with.

If your system is not so mission-critical or self-sufficient that spending the time, effort and money to develop a fault-tolerant, eventually consistent system (e.g., see Netflix) is worth doing, then relying on a network filesystem isn't such a bad option. Of course, when you do, you should realize that the reliability issues tend to be a lot more fundamental than which provider you use, the number of customers they have, or what buzzword people associate it with them.


NFS on a well designed and maintained LAN is ok, but on any network less reliable I agree it's the wrong abstraction. You can only get away with hiding the network in those rare situations where the network can be made that robust.


That is the point, though - fallacy #1 of distributed computing: The network is reliable.

There is no such thing as a well designed and maintained LAN. All networks are unreliable.


I worked for the last ten years at a major animation studio where we pushed huge I/O over NFS every day. This was a lot easier to do with NFS semantics than it would have been with a less "fallacious" model, but we only got away with it because we had complete control over the whole network.


Why do people think being in the "cloud" for hosting changes any of these rules? EC2 is a rentable compute utility; it doesn't provide automatic failover or redundancy. Also, people keep talking about an EC2 outage, but it is really one region and, from the status updates, maybe only a single Availability Zone that is affected.

That said, maybe I'm just less upset because my site is in a US-East Zone that doesn't appear to be affected.


"EC2 is a rentable compute utility; it doesn't provide automatic failover or redundancy."

Nobody wants to admit that "cloud" became a great way to market (and mark up) traditional web hosting services that we've had since the early 90s.

It still boggles my mind that Microsoft's Azure, which overall I'm a fan of, makes me pick an instance size (and corresponding price) for my app; the original premise of the cloud was to take these kinds of decisions out of the mix. What happens when my micro-instance pegs out? It won't automatically bump up to the next higher level or spin up a new instance for me. This isn't "cloud," this is marketing.


The difference is in the word "utility".

In the 90s, hosting was a fixed quantity for a fixed price.

In the 00/10s, hosting is whatever you need whenever you need it and pay for whatever you use.

It's like going from chopped wood to an electric grid, a radical transformation of technology and business models.


A radical transformation, really? To me it seems more like a small step, just changing the payment model a little bit and making it a bit more convenient to provision more instances.


Actually, EC2 and AWS in general give a lot of options for automatic failover and redundancy.

I have a highly available application running in EC2 and I have multiple instances ready for failover and scripts triggered by monitors to move into another zone if the whole zone goes bonkers. This would be much more expensive and complicated if I needed to have dedicated redundant hardware in two geographically separated datacenters.


You set those scripts up yourself, though, so AWS is not providing it automatically. It does make it easy and cheap.


A lot of it is actually just configuration of their auto-scaling mechanism. If a web front end stops working it starts new instances according to the pool minimums and CPU load. There are some custom scripts though, mostly for migrating data to another zone.


I'd be interested to see some details on your setup.

How would it be more complicated with redundant hardware? Also, it might not be more expensive if you use something like Linode. Linode is not supposed to be "cloud" AFAIK -- those VMs sit on a single "statically allocated" box. AWS is supposed to be "cloud" in the sense that VMs can migrate between physical machines automatically.

I am searching for "aws automatic failover", but some sites seem to be down, probably due to the AWS outage :)

http://www.quora.com/What-is-the-best-automated-single-serve...


Automatic failover is not directly in the AWS tools, but it's fairly simple with some bash scripts and their auto-scaling features.

My setup just involves an autoscaling pool of web front ends connected to a mysql instance in the back and two micro instances running gluster.

Migrating to another zone involves promoting a mysql slave in another zone, creating a new web front end pool and spinning up the gluster instances. This is all in scripts I wrote. But having those redundancies available on demand at no cost when they are not used is very handy.
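The control flow of a setup like that can be sketched in a few lines. A rough illustration only: the zone names are examples, and `is_healthy` / `fail_over` are hypothetical stand-ins for the real monitors and migration scripts, not actual AWS or boto APIs:

```python
def pick_active_zone(zones, is_healthy, fail_over):
    """One pass of a monitor loop: keep the primary zone while it is
    healthy; otherwise fail over to the first healthy standby."""
    active = zones[0]
    if is_healthy(active):
        return active
    for standby in zones[1:]:
        if is_healthy(standby):
            # promote the MySQL slave, create the new web front end
            # pool, spin up gluster -- whatever fail_over wraps
            fail_over(active, standby)
            return standby
    raise RuntimeError("no healthy zone available")
```

The value of having this scripted up front is exactly the point made above: the standby resources cost nothing until the monitor actually triggers them.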

I do have a Linode for some other projects and I might rather have this site on Linode. I am not sure about having Linodes spinning up and down on demand when traffic spikes. Do they support hourly charges for short-term use like this?

My primary problem at Linode is getting lots of storage space. On one of my Linode sites I have an S3 bucket mounted over FUSE in write-only mode and redirect requests for those files to the S3 URL. This works well in limited circumstances.

Real redundant hardware may not be more complicated, just more expensive. I do that too.


I think you're right in the sense that they make it easy to have compute units in different geographic/availability zones, which they make much easier/cheaper than doing yourself.

But it isn't something built-in with regard to fail-over handling, etc.


As popular as "cloud" solutions are becoming, I think it's more important than ever to consider these fallacies, especially fallacy number 1:

The network is reliable.


Or the even older "Don't put all of your eggs in one basket"


This compounds with other fallacies in computing. Like picking designs that behave very well on average but show very nasty behavior once in a while. In particular, beware of people talking about big O notation without mentioning worst case behavior.


Wait, what? Big O notation is all about asymptotic worst-case behavior. Anybody using it in another way is abusing the notation, although it may sometimes be useful to talk that way.

For example, a naive quicksort takes O(n^2) time. On random data, its expected runtime is O(n lg n). With ideal partitioning, it can be brought down to guaranteed O(n lg n) on arbitrary data, though this will usually make it slower in practice. But someone who says that quicksort is "usually O(n lg n)" is misusing the notation, and will hopefully make that clear by mentioning the worst-case time.
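The gap is easy to demonstrate with a comparison counter (a toy illustration):

```python
def quicksort(xs, stats):
    """Naive quicksort: first element as pivot, so already-sorted
    input degenerates to O(n^2) comparisons."""
    if len(xs) <= 1:
        return xs
    pivot, rest = xs[0], xs[1:]
    stats["cmp"] += len(rest)  # every element is compared to the pivot
    lo = [x for x in rest if x < pivot]
    hi = [x for x in rest if x >= pivot]
    return quicksort(lo, stats) + [pivot] + quicksort(hi, stats)
```

On already-sorted input of length 100 this performs 99 + 98 + ... + 1 = 4950 comparisons; on random input it averages around n lg n, roughly 600-700.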

Is this roughly what you had in mind?


  > Wait, what? Big O notation is all about asymptotic worst-case behavior.
  > Anybody using it in another way is abusing the notation, although it may
  > sometimes be useful to talk that way.
Exactly.

  > For example, a naive quicksort takes O(n^2) time. On random data,
  > its expected runtime is O(n lg n).
Great example. People jump from that to "Quicksort is O(n log n)," qualified with "mostly," "on average," or something similar.

Computers today are extremely complex and it's getting worse. Most big-O analysis in textbooks take very misleading assumptions and only apply in a theoretical reduced framework.

For example, continuing the quicksort case: the algorithm has somewhat unpredictable memory access and comparison results (branch prediction). Merge sort pays a penalty in extra memory (n/2, though that can be mitigated), but it shows very predictable sequential memory access, and in many common cases the comparisons are predictable (runs of numbers lower in one sequence). It's easy to apply non-temporal writes, for example. And it's possible to vectorize (though it's non-trivial).

Empirical research is the only way.


Big O can be used to represent worst-case behavior, but it can be, and is used also for average-case. And no, it is not a misuse of the notation - it is legitimately used for both (also for best-case, but I have never heard of anybody actually bothering with that). Just be sure to understand which is being talked about.


Okay, this is a subtle point, but you're right. Big-O notation describes an asymptotic upper bound on any function. If that function is average run time, then that's perfectly kosher. Actually establishing what this means, though, is a bit trickier, and something people often mess up.


I don't think it's a subtle point at all - Big-O is not inherently anything to do with worst case time complexity of algorithms, it is a general relationship between functions.
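Formally (the standard textbook definition, stated here for reference), big-O relates two functions and says nothing about worst or average case:

```latex
f(n) \in O(g(n)) \iff \exists\, c > 0,\ \exists\, n_0,\ \forall n \ge n_0 :\ |f(n)| \le c \cdot g(n)
```

Worst case versus average case enters only through your choice of which function f you bound: the maximum running time over inputs of size n, or the expected running time.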


Also, beware if pre-asymptotic behavior is significant.


Open question: does Google App Engine have automatic redundancy across multiple datacenters?



I don't believe that AppEngine offers the choice of what region/zone your app runs in, so they would have to.


I think the way the "cloud" was explained for years before it became mainstream was misleading. People would explain that if there were any failures within the cloud, data would magically shift to another part of the cloud with zero downtime. Obviously this isn't true, even Google has suffered outages in their cloud.

That's not to say the solution is to use managed data centers, but we will get better and better at improving the reliability of the cloud.

The cloud is not fail-proof.


Can you make the "network is homogeneous" assumption given that all servers are in the same datacenter? It seems to me that it's more or less true if you measure latency at millisecond resolution.


If A, B, C, and D are all on gigabit on the same switch, if A and B are using their full gigabit to talk to each other, C could get significantly different bandwidths if it tries to talk to A vs. D. Is that homogeneous? Depends on how you define it but "no" is probably the more useful answer.

(There's a couple of other possible outcomes, like the switch prioritizes C, but then the A<->B link sees variable bandwidth over time, another type of inhomogeneity. One way or another something is going to give.)


Nope. Even if it was, consider how much traffic the instances around yours are producing and how that is affecting you.



