Sure. I'm just talking about binary deploy. Of course you're gonna use flags to flight things in an industry context, depending on the company and its CI/CD practices.
Let's say the binary deploy (before flighting) is fucked somehow. Then what do you do?
Well, if rollbacks (which shouldn’t require touching the source tree) are available, then that. Otherwise you’re right: the code needs to be reverted, and you hope that works.
But in the second case, I’d make sure to raise the priority of having known-good rollback versions available (and rollbacks actually performable), and also carefully consider what CI/CD could be added to catch more broken binaries (e.g. a canary or staging environment, if it’s important enough) and what code review practices could have prevented it.
Ok I agree, you roll back to the known working version. The easiest way to do that is to revert the whole PR (or the data deploy, in the case of flags, ofc). My point is not "flags vs. no flags". My point is "each PR should generate one commit, because one commit is easy to revert".
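To make the "one PR = one commit = one revert" point concrete, here's a minimal sketch in a throwaway repo (all names, messages, and the PR number are illustrative, not from any real project). A squash-merged PR lands as a single ordinary commit on the mainline, so rolling it back is a single `git revert`:

```shell
#!/bin/sh
# Demo: with squash merges, reverting a whole PR is one `git revert`.
# Runs entirely in a temp repo; nothing real is touched.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name Dev

echo base > app.txt
git add app.txt && git commit -qm "base"

# Simulate a feature branch with several messy WIP commits
git checkout -qb feature
echo change1 >> app.txt && git commit -qam "wip"
echo change2 >> app.txt && git commit -qam "fix typo"

# Squash-merge the "PR": the mainline gets exactly one commit for it
git checkout -q -
git merge --squash -q feature
git commit -qm "PR #42: feature (squashed)"

# Rolling the PR back is a single revert of a single commit
git revert --no-edit HEAD
cat app.txt   # file is back to just "base"
```

Compare with a merge commit, where you'd need `git revert -m 1 <sha>` and would have to pick the right parent; with a squash there's nothing to pick.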
Git's commit DAG is a cool feature, but it shouldn't show up in `main`. It's so much easier to work with a linear history, one where each commit contains all the context required to answer "could this have broken something?".
I don’t think reverting the broken code should be on the critical mitigation path at all.
Imagine a scenario where a bug didn’t immediately cause an issue, or where your release contains more than one new PR. If you suspect the latest version of the binary is broken, your first instinct should be to use a version that isn’t. Identifying the offending change and reverting it should come after the rollback, when you have more time to think.
Deciding whether to revert a change is a tactical question. Often the issue will be that you tickled an unknown bug in a different part of the code. In that case, it’s a lot easier to fix forward than to revert the code that tickled the bug and then go through the multiple steps of fixing the bug and re-landing the change.
Linear history is good, and having multiple commits in a PR doesn’t prevent it. The only change is landing n (ideally well-crafted) consecutive commits rather than 1.
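A sketch of how a multi-commit PR can still land as linear history: rebase the branch onto the mainline, then fast-forward merge, so the n commits land consecutively with no merge commit (throwaway repo; branch and file names are illustrative):

```shell
#!/bin/sh
# Demo: rebase + --ff-only keeps main a straight line even with
# multi-commit PRs. Runs in a temp repo; nothing real is touched.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main   # -b needs git >= 2.28
git config user.email dev@example.com
git config user.name Dev

echo base > app.txt
git add app.txt && git commit -qm "base"

# A PR with two crafted commits
git checkout -qb feature
echo a >> app.txt && git commit -qam "feature: step 1"
echo b >> app.txt && git commit -qam "feature: step 2"

# Meanwhile, main has moved on
git checkout -q main
echo other > other.txt
git add other.txt && git commit -qm "unrelated change"

# Rebase the PR onto main, then fast-forward: no merge commit possible
git checkout -q feature
git rebase -q main
git checkout -q main
git merge --ff-only -q feature

git log --oneline   # a straight line: no merge commits
```

`--ff-only` is the enforcement half: it refuses any merge that would create a merge commit, so the linear-history invariant can't be violated by accident.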
The problem is who is doing the crafting. (And the approving.) At least the squash-based approach limits the number of commits an untrustworthy "crafter" can occupy to one.
I think most code review already requires good faith on behalf of the reviewer.
I do see what you mean about the untrustworthy crafter. If we want to preserve master history, then the damage from a bad commit chain is worse than that from bad code (which can be fixed or undone).
However, I think the truly adversarial case is rare (and an exception could be made, with master history rewritten, in that case). In most cases, hopefully, your coworkers are not deliberately trying to sabotage the codebase :). And I don’t think the commit chain needs to be a work of art or anything, just mainly avoiding typo commits and similar, so it shouldn’t be difficult to do when approached in good faith.
The problem is not adversarial, nor due to malice. The problem is ignorance and expediency, driven by a desire to push code and little incentive to clean up your git history.
The easy fix is to squash PRs.
The hard fix is to enforce that devs become "crafters" and to define what is and isn't "good faith".