I would love to do a Kafka analysis. :-)

diggan · 2024-11-12T17:36:41 1731433001

Not that I'm a Kafka user, but I greatly appreciate your posts, so thank you :)

Maybe Kafka users should do a crowdfund for it if the companies aren't willing. Realistically, what would the goal of the crowdfund have to be for you to consider it?

jwr · 2024-11-12T17:33:43 1731432823

I'm still hoping Apple (or Snowflake) will pay you to do an analysis of FoundationDB…

tptacek · 2024-11-12T20:53:22 1731444802

I do too, but doesn't FDB already do a lot of the same kind of testing?

kasey_junk · 2024-11-12T21:03:17 1731445397

They are famous for doing simulation testing. https://antithesis.com/ has recently brought a simulation testing product to market.

SahAssar · 2024-11-12T21:11:24 1731445884

I think they do similar testing, and therefore it might be even more interesting to read what Kyle thinks of their different approaches to it.

jwr · 2024-11-12T22:25:10 1731450310

Yes. But going through Jepsen and surviving is different. Gives an entirely new reputation to a database.

kasey_junk · 2024-11-12T22:37:56 1731451076

I don’t think aphyr would disagree with me when I say that FDB’s testing regime is the gold standard and Jepsen is trying to get there, not the other way around.

aphyr · 2024-11-12T22:55:53 1731452153

I'm not sure. I've worked on a few projects now which employed simulation testing and passed, only to discover serious bugs using Jepsen. State space exploration and oracle design are hard problems, and I'm not convinced there's a single, ideal path for DB testing that subsumes all others. I prefer more of a "complete breakfast" approach.

On another axis: Jepsen isn't "trying to get there [to FDB's testing]" because Jepsen and FDB's tests are solving different problems. Jepsen exists to test arbitrary, third-party databases without their cooperation, or even access to the source. FoundationDB's test suite is designed to test FoundationDB, and they have political and engineering buy-in to design the database from the ground up to cooperate with a deterministic (and, I suspect, protocol-aware) simulation framework.

To some extent Antithesis may be able to bridge the gap by rendering arbitrary distributed binaries deterministic. Something I'd like to explore!

tptacek · 2024-11-13T04:11:07 1731471067

This is a super interesting distinction and I'm glad I wrote my superficial drive-by comment about FDB's testing to prompt you to make it. :)

kasey_junk · 2024-11-12T22:58:46 1731452326

Has your opinion changed on that in the last few years? I could have sworn you were on record as saying this about FoundationDB in the past, but I couldn’t find it in my links.

aphyr · 2024-11-12T23:09:43 1731452983

I don't think so, but I've said a lot about databases in the last fifteen years haha.

Sometimes I look at what people say about FDB and it feels like... folks are putting words in my mouth that I don't recognize. I was very impressed by a short phone conversation with their engineers ~12 years ago. That's good, but that's not, like, a substantive experimental evaluation. That's "I focus my unpaid efforts on databases which seem more likely to yield fun, interesting results".

jwr · 2024-11-13T13:55:08 1731506108

So, will we get an evaluation of FDB one day? Pretty please? :-)

aphyr · 2024-11-13T14:27:33 1731508053

Apple is positively swimming in money! They could pay me! (Hi, Apple ;-))

kasey_junk · 2024-11-12T23:13:13 1731453193

Fair enough.

DylanSp · 2024-11-13T00:29:30 1731457770

Looks like it was an offhand tweet from 2013: https://web.archive.org/web/20220805112242/https://twitter.c.... I got that from a comment on the first Antithesis post on HN, https://news.ycombinator.com/item?id=39376195.

EdwardDiego · 2024-11-13T02:49:21 1731466161

Hey mate, think we interacted briefly on the Confluent Slack while you were working on this, something about outstanding TXes potentially interfering with consumption in the same process IIRC?

This isn't the first time you've discussed how parlous the Kafka tx spec is - not that that's even really a spec as such. I think this came up in your Redpanda analysis.

(And totally agree with you btw, some of the worst ever customer Kafka issues I dealt with at RH involved transactions.)

So was wondering what your ideal spec would look like, because I'd be interested in trying to capture the tx semantics in something like TLA+ as a learning experience - and because it would only help FOSS Kafka and FOSS clients improve, especially now that Confluent has withdrawn so much from Apache Kafka development.

aphyr · 2024-11-13T05:01:30 1731474090

I'm not really sure how to answer this question, but even a few chapters' worth of clear prose would go a long way. We lay out a bunch of questions in the discussion section that would be really helpful in firming up intended txn semantics.

EdwardDiego · 2024-11-14T02:19:50 1731550790

Cheers, good place for me to start digging :)

monksy · 2024-11-12T19:04:46 1731438286

I would love to read your Kafka analysis