Don't Settle for Eventual Consistency (2014) (acm.org)
91 points by alrex021 on July 7, 2017 | 9 comments



Alternatively, don't require strong consistency everywhere. Instead, make sure to have it in the places where it makes sense, and always reason about it.

I view this as a generalization of optimistic locking in source control. I don't envy my past self for having to check out code with a lock just to change it.


Are you saying that optimistic locking isn't strongly consistent? If so, that's not quite correct. When the locking is optimistic, a serialisable transaction will simply abort if it is found to conflict with another transaction.
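
For what it's worth, here is a minimal sketch of the idea (my own toy code, not from the article; the names are made up): read a record's version, do the work, and only commit if the version hasn't changed in the meantime -- otherwise the optimistic "transaction" aborts and the caller retries.

    import threading

    class VersionedStore:
        """Toy in-memory store: each key maps to (value, version)."""
        def __init__(self):
            self._lock = threading.Lock()
            self._data = {}  # key -> (value, version)

        def read(self, key):
            with self._lock:
                return self._data.get(key, (None, 0))

        def compare_and_set(self, key, new_value, expected_version):
            # Commit only if the version we read is still current;
            # otherwise report a conflict so the caller can retry.
            with self._lock:
                _, current_version = self._data.get(key, (None, 0))
                if current_version != expected_version:
                    return False
                self._data[key] = (new_value, current_version + 1)
                return True

    def optimistic_increment(store, key, retries=5):
        for _ in range(retries):
            value, version = store.read(key)
            if store.compare_and_set(key, (value or 0) + 1, version):
                return True
        return False  # too much contention; give up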


I'm saying that requiring strong consistency is like requiring pessimistic locking in an org. In some ways, it sounds easier and seems to have better guarantees. In practice, it is more work and slows you down.


Strong consistency has strong guarantees, not just 'seems' to have -- hence the name.

In practice, strong consistency is less work for the client in many (most?) cases. If clients can update the same key concurrently in several places, then the onus is on the client to somehow merge the differing versions at some point. There are few good options for doing so correctly and efficiently. Let's consider the options.

a) Timestamps and last writer wins. Every version is time-stamped, and while merging, the latest timestamped version wins out. This is error-prone; a misconfigured clock can cause havoc.

b) Version vectors: these allow you to track the causal history of different versions, but you still have the problem that if two versions are genuinely concurrent, the client has to have code to merge them automatically -- experience with the Bayou system showed that this isn't easy. Your source-control example is in this category; recall how Git cannot always automatically merge conflicting changes ... the developer has to step in and merge manually.

c) Lattices. If you have the good fortune to have your data exhibit lattice properties (like CRDTs), then automatic merging is easy and elegant. However, you still don't get compositionality.
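
To make the contrast concrete, here is a rough sketch of (a) and (c) -- my own toy code, not taken from any of the systems mentioned: a last-writer-wins merge that silently drops a write when clocks disagree, versus a grow-only counter whose merge is a lattice join and therefore always converges.

    # (a) Last-writer-wins: merging picks the larger timestamp.
    # A skewed clock can silently discard the "real" latest write.
    def lww_merge(a, b):
        """a and b are (timestamp, value) pairs."""
        return a if a[0] >= b[0] else b

    # (c) Grow-only counter CRDT: state is a per-replica count and
    # merge is an element-wise max (a lattice join), so replicas
    # converge regardless of the order in which they exchange state.
    class GCounter:
        def __init__(self, replica_id):
            self.replica_id = replica_id
            self.counts = {replica_id: 0}

        def increment(self):
            self.counts[self.replica_id] += 1

        def value(self):
            return sum(self.counts.values())

        def merge(self, other):
            for rid, n in other.counts.items():
                self.counts[rid] = max(self.counts.get(rid, 0), n)

    # (b) Version vectors track causality, but when two versions are
    # truly concurrent the application still has to resolve them itself.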

For the vast majority of the examples in recent history that have made a case for eventual consistency, subsequent work has shown how to achieve strong consistency while still maintaining respectable performance, as the subject of this thread indicates. Google Spanner and CockroachDB are linearizable, the strongest possible consistency, and are geo-replicated and unburdened by traditional locking or 2PC issues.


Yeah, I'm not saying to completely abandon it. However, folks are well served by learning that idempotency is your friend. It's not always achievable, to be sure, but even EC2 lets you launch instances with an idempotency key.
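
As an aside, here is roughly what that looks like with boto3 (the AMI ID is a placeholder and credentials are assumed to be configured): retrying the same RunInstances call with the same client token should not launch a second instance.

    import uuid
    import boto3

    ec2 = boto3.client("ec2")
    token = str(uuid.uuid4())  # stable idempotency key for this logical launch

    # If the request times out and we retry with the same ClientToken,
    # EC2 treats it as the same launch rather than starting a new instance.
    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # placeholder AMI
        InstanceType="t3.micro",
        MinCount=1,
        MaxCount=1,
        ClientToken=token,
    )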

So, where I said it "seems" to have better guarantees: take that more as "these seem to be required guarantees." In practice, they are not required, and you need to have a plan for them failing in any case. Rather than build up a system in which fluke failures are painful, build one that is constantly checking for failures and doing incremental work.

But, above all, my point was to reason about it, not to take anything as written in stone.


CockroachDB’s external consistency guarantee is by default only serializability. We do provide a limited form of linearizability however, providing ordering for any observer or chain of observers[1].

[1]: https://github.com/cockroachdb/cockroach/blob/master/docs/de...



I would love to see a Jepsen evaluation of this system. I am not an expert in this area, but it still strikes me as eventually consistent. Discussions around data stores need to be bounded by what the data store can guarantee, not by what optimistic conditions allow for. The paper is pretty light on details about how the dependency graph works. I think that in theory there is an interesting idea here that basically packages a dependency graph along with every message. But consider this failure case:

t0: alice: "I lost my wedding ring."
t1: alice: "Nevermind, I found it."

Because of the peculiar circumstances around traffic at this moment, the nodes closest to Alice don't get that second message. Instead, the nodes closest to bob receive the message, and bob sees the message before alice does. Bob replies, and because everything is okay with the nodes close to him, the message gets passed back to alice. What happens? The node alice is reading from is having issues, so things are re-routed to bob's cluster. Bob's message has a timestamp that legitimately puts it in between the two messages from alice because the server clocks are off a little.

Now, whatever was going wrong with the cluster close to alice has been repaired, and it receives bob's message with a node in the graph that alice's cluster is completely unaware of, because it hasn't caught up.

Bob's message is telling the cluster, "You can't display this message (c) unless you have already acknowledged prior messages (a and b)." Alice's cluster says, "I have message a, but b doesn't exist."

According to the paper, bob's message will never be delivered to alice until the cluster closest to alice syncs up with some other remote cluster and her second message is restored. Which, of course, we know can fail to happen with Cassandra, and the projects mentioned are forks of Cassandra.
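
Just to spell out the mechanism I'm describing (my own toy sketch, not the paper's actual protocol): each message carries the ids of the messages it depends on, and a replica buffers a message until every dependency has already been delivered locally.

    class CausalReplica:
        """Deliver a message only once all of its dependencies have been
        delivered here; otherwise hold it in a pending buffer."""
        def __init__(self):
            self.delivered = set()  # ids of messages already visible locally
            self.pending = []       # (msg_id, deps, body) waiting on deps

        def receive(self, msg_id, deps, body):
            self.pending.append((msg_id, set(deps), body))
            self._drain()

        def _drain(self):
            progress = True
            while progress:
                progress = False
                for item in list(self.pending):
                    msg_id, deps, body = item
                    if deps <= self.delivered:
                        self.delivered.add(msg_id)
                        self.pending.remove(item)
                        print("deliver:", body)
                        progress = True

    # In the scenario above: alice's cluster has (a) but never saw (b),
    # so bob's reply (c), which depends on {a, b}, sits in pending
    # until (b) eventually arrives from another data center.
    replica = CausalReplica()
    replica.receive("a", [], "alice: I lost my wedding ring.")
    replica.receive("c", ["a", "b"], "bob: (reply)")  # buffered: b is missing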

I can't prove it formally, but this feels like the problem you run into with messaging queues: you can guarantee at-least-once delivery or at-most-once delivery, but not both at the same time. Here it's a different set of tradeoffs: under this theoretical model, you can deliver in order or not at all.

Obviously, this is a use case Facebook would care about. But I don't know that it's generally applicable to very many use cases.

Unless I'm missing something important, transmitting the graph doesn't really do anything to change the fact that this is eventual consistency, and it brings all of the problems that come with that. I guess the difference between eventual consistency and this causal consistency is that one provides out-of-date/out-of-order data in an error condition, and the other presumably provides nothing but an error. YMMV.

But to a larger point, calling a combined system of client libraries with a forked version of Cassandra some other-category-of-consistency datastore seems to me to be disingenuous. You're no longer talking about a data store. You're talking about an ecosystem of libraries and developer practices and a datastore that work together to provide some workarounds for the challenge that CAP theorem presents. This is, unless I'm badly misreading the article, granting a lot of leeway.

I also find it unfortunate that the optimized version of this only compared throughput, without discussing failure cases at all. The CAP theorem is all and only about what can be guaranteed in failure cases. We need to agree from now on that if we're going to talk about the CAP theorem, we are talking about guarantees in failure modes, not optimistic modes. Otherwise, there is no value in invoking the theorem.

And this strikes me as a particularly bold assumption given almost anyone's real-world experience: "It is partition-tolerant because all reads are in the local data center (partitions are assumed to occur only in the wide area, not in the local data center)."

No, kind sir. It is not partition tolerant if that is your assumption. It has already failed the P part of CAP if that's an assumption you are relying on.

I'm not trying to shoot the article or the authors down here. Now that CAP theorem has been proved to be a conceptual "pick 2" situation, of course we have to do everything we can to make up for that and do our best, and this seems like a great effort at that.

But I think that the authors fundamentally misunderstand the point of writing a paper about a data store and its relationship to CAP theorem. It's not about optimistic cases, software development patterns, or libraries that sit on top of the data store.

It's about guarantees of what happens in failure modes. From reading this paper, all I can tell with certainty is that it's not consistent, and it's not partition-tolerant. It's highly available. To me that seems like it's a step backwards from platforms that are, or at least aim to be, CA or AP.


That is a good point. So far I see one potential explanation, but it presumes that clients are on an equal footing with the other peers in the system: whenever you send a write and the destination discovers that it doesn't have all of the operations this write depends on, you send those as well. On the other hand, if clients track only dependencies, without the actual data behind them, then I don't see how to ensure availability - maybe with a fixed entry point into the system (one for each client), but then it is not exactly available.
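
To spell out what I mean (a sketch of my own, under the assumption that the client keeps the bodies of the operations it has seen, not just their ids): the destination reports which dependency ids it is missing, and the client pushes those operations before the write itself.

    class Destination:
        def __init__(self):
            self.applied = {}  # op_id -> body

        def missing(self, deps):
            return [d for d in deps if d not in self.applied]

        def apply(self, op_id, body):
            self.applied[op_id] = body

    def send_write(client_log, dest, write_id, deps, body):
        # Push any dependency the destination hasn't seen, then the write itself.
        for dep_id in dest.missing(deps):
            dest.apply(dep_id, client_log[dep_id])
        dest.apply(write_id, body)

    # client_log holds full operations, not just ids -- that's the
    # "equal footing with other peers" assumption.
    client_log = {"a": "op a", "b": "op b"}
    dest = Destination()
    send_write(client_log, dest, "c", ["a", "b"], "op c")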

EDIT:

I did look at the actual paper [0]. They provide strong consistency within the local data center, and causal consistency with convergent conflict handling across data centers. They presume that each client resides within a single data center, so it is essentially a variant of the fixed-entry-point solution I mentioned earlier.

Additionally, and quite importantly, you lose causal consistency if a client connects to different data centers.

[0] Don't Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS



