I'd love more info on how Netflix propagates schema changes to downstream stores. How do you apply migrations to heterogeneous databases? Applying binlog messages only works if downstream stores are the same flavor of database as the source. And common message formats like Avro don't have a guaranteed migration strategy like protobuf.
I suspect it's more of a process solution than a technological solution. Are non-backwards-compatible migrations scheduled in advance, and broadcast to dependent teams? Are downstream consumers expected to have a replay/dead-letter queue?
This was nice to read. In your opinion, what would be the minimal change to push to PostgreSQL and MySQL in order to reduce complexity and better support tools like DBLog?
I added a Kafka layer to make this a real-time capture. I guess a lot of people are trying to do this. What is the key point I am missing here? Anyone?
DBLog has a very simple Output interface which allows plugging in a writer for whatever output is desired: a stream (like Kafka), a datastore, a service, ...
For example one can use MySQL as a source and have ElasticSearch as a direct output, without needing to go through an intermediate stream like Kafka.
The described properties of DBLog (see blog post) hold true regardless of the output, including capturing changes in real-time and writing them to a desired output.
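To illustrate the idea of a pluggable output, here is a minimal sketch in Python. It is not DBLog's actual interface: the names ChangeEvent, Output, InMemoryStore, ListStream and forward are all hypothetical, and the two writers are in-memory stand-ins for a datastore and a stream.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Optional, Protocol, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical row-level change event (not DBLog's real type)."""
    op: str                        # "create", "update" or "delete"
    table: str
    key: Any                       # primary key of the affected row
    row: Optional[Dict[str, Any]]  # new row image, None for deletes


class Output(Protocol):
    """Hypothetical output contract: anything with a write() method."""
    def write(self, event: ChangeEvent) -> None: ...


class InMemoryStore:
    """Stand-in for a datastore output: keeps the latest image per row."""
    def __init__(self) -> None:
        self.rows: Dict[Tuple[str, Any], Dict[str, Any]] = {}

    def write(self, event: ChangeEvent) -> None:
        k = (event.table, event.key)
        if event.op == "delete":
            self.rows.pop(k, None)
        else:
            self.rows[k] = dict(event.row or {})


class ListStream:
    """Stand-in for a stream output: appends every event in order."""
    def __init__(self) -> None:
        self.events: List[ChangeEvent] = []

    def write(self, event: ChangeEvent) -> None:
        self.events.append(event)


def forward(events: Iterable[ChangeEvent], output: Output) -> int:
    """Write each captured event to the chosen output; return the count."""
    count = 0
    for event in events:
        output.write(event)
        count += 1
    return count
```

With this shape, switching from a stream to a direct datastore output only means passing a different object to forward; the capture side does not change.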
Correct. This way we can capture create, update and delete events of individual rows. binlog_format must be set to ROW in order to make this work in MySQL. For Postgres we are using replication slots, which provide row-based events.
We use MySQL RDS and it has "mixed" as the default binlog_format. Mixed uses statement-based logging for some event types (see the MySQL documentation for details). Hence statement-based replication is part of the mix unless one explicitly switches to ROW-based replication (which is required for DBLog).
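A small sketch of a startup check for this requirement, in Python. It assumes the current value has already been read from the server (for example with SELECT @@GLOBAL.binlog_format); the function names are hypothetical and this is not part of DBLog.

```python
def is_row_based(binlog_format: str) -> bool:
    """True only when the server logs row images for every change."""
    return binlog_format.strip().upper() == "ROW"


def require_row_format(binlog_format: str) -> None:
    """Fail fast when the source would emit statement-based events.

    MIXED is rejected as well, because it falls back to statement-based
    logging for some event types.
    """
    if not is_row_based(binlog_format):
        raise ValueError(
            f"binlog_format is {binlog_format!r}; row-level change capture "
            "needs ROW"
        )
```

On RDS the value is changed through the DB parameter group rather than in a config file on the host.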