
> if you don't explicitly commit, how does Kafka know when you've processed the messages it gave you?

I did expect auto-commit to still involve an explicit commit. I assumed it meant the consumer side would commit _after_ processing a message/batch, _if_ at least autocommit_interval had elapsed since the last commit. In other words, I took it for functionality baked into the Kafka client library (which does know when the application has processed a message). I don't know whether that really makes sense; I never thought hard about it before!
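To make that mental model concrete, here is a minimal sketch of the behaviour I had in mind. It is a toy simulation, not any real Kafka client: `IntervalCommitter`, `consume`, and the fake clock are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional, Tuple

Record = Tuple[int, str]  # (offset, value)


@dataclass
class IntervalCommitter:
    """Hypothetical helper: commits only after processing, at most once per interval."""
    interval_s: float
    last_commit_at: float = 0.0
    committed_offset: Optional[int] = None
    commit_log: List[int] = field(default_factory=list)

    def after_processing(self, next_offset: int, now: float) -> bool:
        # Commit only if enough time has passed since the previous commit.
        if now - self.last_commit_at >= self.interval_s:
            self.committed_offset = next_offset
            self.last_commit_at = now
            self.commit_log.append(next_offset)
            return True
        return False


def consume(
    batches: Iterable[List[Record]],
    process: Callable[[str], None],
    committer: IntervalCommitter,
    clock: Callable[[], float],
) -> None:
    for batch in batches:
        for _offset, value in batch:
            process(value)  # if this raises, nothing below runs for this batch
        # Only reached once the whole batch has been processed.
        committer.after_processing(batch[-1][0] + 1, clock())
```

Under this model a commit can never run ahead of processing, because the commit call sits after the processing loop. Whether real clients behave this way is exactly the question.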

I'm still a bit skeptical... I'm pretty sure (although not positive) that I've seen consumers with autocommit enabled get stuck because of timeouts much longer than the autocommit interval, and yet keep retrying the same message in a loop.




Here's a good article from New Relic on the problem, if you'd like more detail: https://newrelic.com/blog/best-practices/kafka-consumer-conf...

Or here, you can reproduce it yourself using the Bufstream or Redpanda/Kafka test suite. Here's a real quick run I just dashed off. You can watch it skip over writes: https://gist.github.com/aphyr/1af2c4eef9aacde7f08f1582304908...

lein run test --enable-auto-commit --bin bufstream-0.1.3-rc.12 --time-limit 30 --txn --final-time-limit 1/10000
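For intuition only, here is a toy model of the general failure mode: if an offset gets committed before the application has finished with the message, a crash at the wrong moment skips it on restart. This is not the Kafka client and not the test suite above; `run_with_crash` and its parameters are hypothetical, and the commit timing is an assumption chosen to show the effect.

```python
from typing import List, Optional


def run_with_crash(
    messages: List[str],
    commit_before_processing: bool,
    crash_at: Optional[int],
) -> List[str]:
    """Toy model: return everything processed across one crash and one restart."""
    committed = 0  # next offset to read after a restart
    processed: List[str] = []

    # First run: starts from the committed offset and may crash partway.
    position = committed
    while position < len(messages):
        if commit_before_processing:
            committed = position + 1  # commit is not tied to processing
        if position == crash_at:
            break  # crash before this message is processed
        processed.append(messages[position])
        if not commit_before_processing:
            committed = position + 1  # commit only after processing
        position += 1

    # Restart: resume from whatever offset was last committed.
    processed.extend(messages[committed:])
    return processed


msgs = ["a", "b", "c", "d"]
early = run_with_crash(msgs, commit_before_processing=True, crash_at=2)
late = run_with_crash(msgs, commit_before_processing=False, crash_at=2)
print(early)  # "c" is missing
print(late)   # every message appears
```

In the early-commit run the message at the crash offset is never processed, while the commit-after-processing run resumes from it.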



