Probably a FAQ, but the example about network partitions leaves me wondering: if two clients each talk to a different leader in a different subnetwork, the algorithm guarantees that eventually one of the leaders will step down, and both clients will eventually see the same operation log.
Does it mean that the client should implicitly wait a bit before "trusting" their server, to be sure? What happens if you make a wrong decision based on an outdated log that is eventually rolled back?
(Or is it simply the same thing as with any eventually consistent system: you should _not_ have irreversible side effects that depend on the value of a log in the system, at all?)
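A committed entry is never rolled back: an entry only counts as committed once a majority of the cluster has stored it. That's the definition of committed. So you won't get a committed write or read from a partitioned
"zombie" leader.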
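To make the majority argument concrete, here is a minimal sketch of the quorum check, not taken from any real Raft library. The function name `is_committed` and the 5-node, 2/3 split are assumptions for illustration, and the check leaves out the extra Raft rule that a leader only commits entries from its own term by counting replicas.

```python
def is_committed(ack_count: int, cluster_size: int) -> bool:
    """Hypothetical helper: an entry counts as committed once a strict
    majority of the full cluster (not just the reachable part) stores it."""
    return ack_count >= cluster_size // 2 + 1


# Assumed scenario: a 5-node cluster partitioned into 2 nodes and 3 nodes.
CLUSTER_SIZE = 5
zombie_acks = 2    # old leader plus the one follower it can still reach
majority_acks = 3  # newly elected leader plus two followers

zombie_can_commit = is_committed(zombie_acks, CLUSTER_SIZE)
majority_can_commit = is_committed(majority_acks, CLUSTER_SIZE)
```

The old leader can keep appending entries to its own log, but none of them reach a majority, so it never acknowledges them to the client as committed.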
In good (non-buggy) Raft implementations, things
are set up so there can
only ever be at most one leader at a time.
That is the point of the election,
the pre-vote, the leader-leases, the
sticky-leader, and the leader-must-step-down
rules (from chapter 6 of the dissertation).
Yes, those optimizations are
described as optional in the Raft dissertation.
But really they are not.
You have to read the whole dissertation
to get the details and the full picture.
You should have all of them as a part of a complete and
sane implementation.
Pre-vote and sticky-leader are needed for
liveness.
Leader leases are possible because you
have pre-vote and sticky leader on; and they
give you fast local-only leader reads.
The leader step-down rule prevents client
liveness problems and multiple leaders at once.
Even if you do have multiple leaders at once,
only one will actually be able to commit, so
it's not a problem with safety or consistency,
just with liveness/not wasting the client's time
talking to "a dead man walking".
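
A rough sketch of the leader step-down rule mentioned above, assuming it means "step down if an election timeout passes without heartbeat acknowledgements from a majority". The class `LeaderState`, its fields, and the peer names are hypothetical and exist only for this example. Time is passed in explicitly so the sketch stays deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class LeaderState:
    """Hypothetical leader bookkeeping, not a real library type."""
    cluster_size: int
    election_timeout: float  # seconds
    last_ack: dict = field(default_factory=dict)  # peer id -> time of last heartbeat ack
    is_leader: bool = True

    def record_ack(self, peer: str, now: float) -> None:
        # Remember when this peer last acknowledged a heartbeat.
        self.last_ack[peer] = now

    def tick(self, now: float) -> None:
        # Count peers heard from within the last election timeout.
        recent = sum(1 for t in self.last_ack.values()
                     if now - t < self.election_timeout)
        # The leader counts itself; step down if that is not a majority.
        if recent + 1 < self.cluster_size // 2 + 1:
            self.is_leader = False


leader = LeaderState(cluster_size=5, election_timeout=1.0)
leader.record_ack("b", now=0.0)
leader.record_ack("c", now=0.0)

leader.tick(now=0.5)   # two fresh acks plus self: 3 of 5, still leader
leader_at_half_second = leader.is_leader

leader.tick(now=2.0)   # partitioned since t=0, acks are stale: steps down
leader_at_two_seconds = leader.is_leader
```

With a rule like this, a partitioned leader stops answering as leader after roughly one election timeout, so a client talking to it is told to look elsewhere instead of waiting on writes that can never commit.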