Sometimes I feel like it's a minority position, but I think it strange all the efforts people go to in order to essentially make the git DAG look like a (lie of a) straight-line CVS or SVN commit list. Seeing how the sausage was actually made (no rebases, no squashes, sometimes not even fast-forwards) isn't pretty, but it is meaningful and will tell you a great deal about a project and its developers... I trust that. It's real and visceral and how software is actually made and you can learn from that or find things to explore in that jungle. Projects with multiple developers that yet have straight line commit histories and super tidy commits are aberrations and full of little lies...
Kudos to GitHub for providing this feature that a lot of people have asked for. I obviously don't plan to use it, but I appreciate that it's an option for those people that like their small, harmless lies. ;)
I actually disagree. A large team having a linear commit history doesn't mean it's a lie. It means that the code review process is more important than the code writing process.
For example: I check out a repository, and create a local feature branch. I create a commit containing the tests for the new feature, then one for the first draft of the new feature, then two or three for bugfixes. Each commit is small, and self-contained, but importantly isn't standalone. If someone checked out the repository in the middle of my chain of commits, they wouldn't have a working product. Then I upload my change for code review. There's no point in reviewing each of my ~5 commits individually: they only make sense to the reviewer as a combined unit. And there's no point in landing them individually: they only make sense for the overall project history as a combined unit.
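For concreteness, that workflow might look roughly like this (toy repo; file and branch names are hypothetical):

```shell
# Set up a throwaway repo standing in for the checkout.
tmp=$(mktemp -d) && cd "$tmp" && git init -q -b master .
git config user.email you@example.com && git config user.name "You"
echo "base" > app.py && git add app.py && git commit -qm "base"

# Local feature branch: tests first, then the feature, then fixes.
git checkout -qb my-feature
echo "test_feature()" > test_app.py && git add test_app.py \
  && git commit -qm "Add tests for the new feature"
echo "def feature(): pass" >> app.py && git commit -qam "First draft of the new feature"
echo "# fix edge case" >> app.py && git commit -qam "Fix edge case found by tests"

# The whole chain goes up for review as one unit and lands as one commit:
git checkout -q master
git merge --squash -q my-feature
git commit -qm "Add the new feature (one reviewed unit)"
```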
In a project with many developers (e.g. 1,000 like the Chromium project), every developer has different local practices. Some keep their work based on HEAD of master via rebase, others via merges. Some do test-driven development, some don't. Making the code review the atomic unit of work, rather than the messy string of local commits, helps the project enforce common etiquette, commit formatting, and readable history.
caveat: I was responsible for code review for 2000+ developers.
We only allowed squash commits on master because of what you're describing. That is the level where history "made sense". However, for code review, we wanted to support both styles, because there is an advantage sometimes to seeing the sausage being made. For instance someone will refactor something -- maybe change a method name. Then they apply that refactoring at all the call sites. Very conscientious developers would break this into two commits. We didn't want the first commit on master, but it made sense to review this way, because it was easier on the reviewers: change, effect of change on everything else.
I call this "telling a story" with your commits. There's a lot of value in that style if you have the time to do it.
The other style of commit-by-commit reviewing, where I see all of the work in progress commits, I don't find valuable at all and I _definitely_ don't want to see on master.
While this is "telling a story", the first commit will break your code for no good reason (i.e. if you rename a method but not its usages, all hell breaks loose). This makes you lose one of git's most useful features: the ability to binary-search for the place where a bug was introduced - git bisect.
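For reference, a self-contained bisect session on a toy repo (file names and commit messages are made up; `git bisect run` automates the good/bad marking using the exit status of a test command):

```shell
# Toy repo: four commits, the third one introduces a "bug".
tmp=$(mktemp -d) && cd "$tmp" && git init -q -b master .
git config user.email you@example.com && git config user.name "You"
echo ok   > f && git add f && git commit -qm "good 1"
echo ok2  > f && git commit -qam "good 2"
echo bug  > f && git commit -qam "bad 1: bug introduced here"
echo bug2 > f && git commit -qam "bad 2"

# Bisect between HEAD (known bad) and the first commit (known good);
# exit 0 = good, non-zero = bad.
git bisect start HEAD HEAD~3
git bisect run sh -c '! grep -q bug f'

first_bad=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad commit: $first_bad"   # prints: first bad commit: bad 1: bug introduced here
git bisect reset
```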
But so does squashing, which is one of a handful of reasons I hate most uses of squashing.
Only squash when it removes bug that only ever existed on your machine. Everything else should be recorded in the history. Forensics are important to the long term health of your project and you impoverish yourselves by scrubbing the crime scene.
> Everything else should be recorded in the history. Forensics are important to the long term health of your project and you impoverish yourselves by scrubbing the crime scene.
OK. So would you support an IDE that generated one individual commit per keypress? When I type "hello", that's five individual commits each changing a single letter.
That is the true history of what happened, and it's typically recorded in your IDE's undo log in memory, such that you can step back and forward through it. Are you OK with pushing this commit history to your project? That's the true history of what happened, after all, and it could be forensically useful.
When we commit code to share it with other people, we recognize that only a certain level of detail is relevant to them. The physical sequence of letters that I pressed isn't typically relevant to someone else. Rather, the logical set of changes is what affects them.
If you agree that a single commit per letter of keypress is undesirable then you agree with me in principle. There is a finite level of detail that makes sense to share, practically, with current source control systems, and what we're arguing about is how much to share.
I feel like this is important to point out because people frequently make this argument about the "true history" of source, and all that, while neglecting the fact that the commits they choose to ship are already an arbitrary simplification of the "true history".
So would you support an IDE that generated one individual commit per keypress?
That would actually be awesome, if the tools supported it. Imagine how easy it would be to find bugs with a bisect if it can drill down to the actual keystroke that introduced a bug.
Realistically, you'd want to back out a few notches though. Say, every time the dev hit Shift+Ctrl+B (or tabbed over to the browser or whatever signifies "build the project" in the environment in question), so that you get an indication that the current state of things was meant to at least compile and run.
But yeah, that's the value of source control. Being able to dig back in history to the exact spot where a bug was born. I can't understand at all why so many people here would want to scrub that away.
I am told that Google's filesystem for their dev workspace is somewhat similar to capturing every single keypress as an immutable layer forever.
Apparently introducing interesting problems when people create a log file one byte at a time, up to several hundred megs. (just 100000000 files for one little project, oops)
> OK. So would you support an IDE that generated one individual commit per keypress? When I type "hello", that's five individual commits each changing a single letter.
Not OP but - conceptually? Yes. I've written file formats which preserve undo/redo history, for example. The caveats:
1) I don't want to inadvertently leak my password. This is an issue with things such as local bash command history buffers as well, and unreviewed autocommits in general. You could say it's too useful - to the wrong people!
2) My tooling needs to be built around a different level of granularity as the default. I'm not OK with single letter commits cluttering up my git log, for example. Having them around to drill down into if I need them? Sure.
Per-letter detail is so granular that even undo/redo systems will often squash history states together. Observing your exact typing cadence / the extra evidence of initial authorship is niche enough that I'm quite willing to sacrifice that level of detail for the sake of performance, maintainability, or basically any and every other excuse you can think of.
> When we commit code to share it with other people, we recognize that only a certain level of detail is relevant to them. The physical sequence of letters that I pressed isn't typically relevant to someone else. Rather, the logical set of changes is what affects them.
I have never seen a codebase with perfect commits. The ones that always give me trouble wrapping my head around are the squashed "logical set of changes", where the set size is way too damn big. Reverse engineering a saner overview from a series of tiny commits is way easier.
And the physical way something was done does matter at times. I'm going to pay way more attention to "whitespace cleanup" commit done by a human than "whitespace cleanup" commit done by vetted tools, for example. In whitespace significant languages, the former may trigger a full code review. Similar concerns with a lot of refactorings, actually.
> If you agree that a single commit per letter of keypress is undesirable then you agree with me in principle.
Per the above, I can only agree with you in practice :)
> There is a finite level of detail that makes sense to share, practically, with current source control systems, and what we're arguing about is how much to share.
Agreed. But I haven't found a single, solitary codebase, where I'd ever argue "less". I can't even recall a single, solitary commit where I'd have ever argued "less". Commit directly to mainline to fix a single character typo? I'll be annoyed if your changelist description was too terse! I want to see:
I've seen a codebase where a majority (read literally: >50%) of the commits were "good enough". It was beautiful.
A coworker of mine shared he'd collected stats on who made the most commits/day to test a hypothesis. Due to some outliers, he'd decided against it, but thought I'd find the stats amusing. Out of ~50 people (~15 programmers), I (the most recently hired programmer) topped the chart. Second place? The build server account (thanks to nightly build scripts.) I was indeed amused.
(Don't look at the OpenOffice.org years, they literally took a whole bunch of development work from SVN branches and then merged them in as a single commit and put in single line descriptions with internal tracking numbers and odd project management codes... utter disaster! And of course the branches are now all lost...)
How about I just look at the latest ~33h directly on that page ;)
I like the scope of a lot of those commits, although a few are still chunkier looking than I'd like - take that with a grain of salt, though, as I don't have a good enough feel for the codebase to reasonably estimate how much more they could be chunked up. Pretty much everything has a review link, which is nice. I'd expect more back and forth in the comments, but perhaps that's handled out-of-band.
I have lots of nitpicks with the actual changelist descriptions where I'd want things to improve. "Clean up" could mean just about anything - I must go to diff (and expand the context) to understand e.g. https://cgit.freedesktop.org/libreoffice/core/commit/?id=945... properly. I'd be inclined to instead write:
officeipcthread: Cleanup RequestHandler::Enable: Early bail, remove aDummy (just use aUserInstallPath directly), move declarations.
Now I know scope, and the types of changes (refactoring worth reviewing if looking for breakage, not just ignorable whitespace / comment changes.)
I'd lean towards linking a screenshot of at least the new version of the UI when the file being modified "isn't human readable" (read: is modified with something other than a text editor, even if I can totally read it), which would apply to e.g. https://cgit.freedesktop.org/libreoffice/core/commit/?id=72c... .
Ah... sberg's one commit message that could be more descriptive :-) That NSS commit has two very experienced reviewers, but fair point! wrt screenshot linking, I guess a link to the LibreOffice bugzilla would be nice.
Thanks for the info, that's really quite insightful. Appreciate you taking the time to give it!
> My tooling needs to be built around a different level of granularity as the default. I'm not OK with single letter commits cluttering up my git log, for example. Having them around to drill down into if I need them? Sure.
And that would be nice to have, a vcs that allows you to fold and unfold commits to different levels of granularity.
Wanna have only a straight line history with all feature branches squashed? There you have it. Just need some of them like that? Sure enough. What if you want every time the file was saved? Not a problem, just configure it to commit automatically.
> Only squash when it removes bug that only ever existed on your machine. Everything else should be recorded in the history.
You can squash into master without losing the history of the code review with git. This gives you the best of both worlds, a more accurate history than the one you propose, and a master that isn't broken.
I've often wondered if it would be possible to design a source control system that could track this lineage. Kind of like, "inside" of a commit would be a hidden history that you could unpack if you wanted to. "This single commit is actually the squash of these other commits made by this guy". "This commit is actually the rebase onto mainline of this commit a guy made when his machine was behind". The commit would look like normal to the current source control CLI, by default, but would reveal this hidden level of detail if you inquired "How was this commit made?" - then you'd get the reflog.
I've wondered if support for a feature like this could finally resolve the "debate" between merging and rebase. The people who like having the full history (whatever that means) would have it, and the people who care about logical changes would see it by default. The cost would be that the repo history would indeed have the full level of detail, but you could imagine supporting a cleanup operation that erases this "origin" information for commits that are sufficiently old, or else simply doesn't pull the info by default and fetches it only when necessary.
(You can kind of simulate this using merge commits, by only looking at one parent, but it doesn't really work out well in practice.)
FWIW, the OpenOffice.org guys had this insane workflow where they squashed all commits into one commit and then put a summary of all the changes into the commit message. Then they scrubbed the private branches.
That methodology is an extreme example of where squashing does NOT make sense.
Have you used Phabricator? This sounds an awful lot like a Diff in its terms. "master" is a linear sequence of Diffs being landed, with the Diff number in the commit message. You can paste that diff number in the web UI and see the individual commits that were part of it.
> You can squash into master without losing the history
This is otherwise known as a merge. A single atomic change to master, referencing only the final files, with the first parent referencing the previous master that was also "not" broken.
It has only one change: A second parent, referencing the more accurate history.
If we start with master=R2 and temporary-feature-branch=R5 (M being the new commit that merging would create), I expect the following outcomes:
"squash" will leave us with master=M, and the graph R1 -> R2 -> M. Once temporary-feature-branch is deleted, R3-R5 are forever lost.
"merge" will leave us with master=M, and the graph R1 -> R2 -> M, + R2 -> R3 -> R4 -> R5 -> M
"rebase" will leave us with master=R5, and the graph R1 -> R2 -> R3 -> R4 -> R5. Once temporary-feature-branch is deleted,
The first case loses the history of the code, and thus can't be what we're talking about.
If you call the second case "squash" I find that very confusing. It's also not what github is referring to when talking about "squash your commits" in the parent article, they're talking about the first case. Now, git confuses the issue a bit too - perhaps I should be labeling the second case a "non-fast-forward merge", but I'd still find that clearer than "squash without losing the history."
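In git commands, under the same assumptions (master at R2, temporary-feature-branch at R5), the three cases look roughly like this; the block builds the starting graph and then performs the non-fast-forward merge, with the other two cases sketched in comments:

```shell
# Build the assumed starting point: R1, R2 on master; R3..R5 on a branch.
tmp=$(mktemp -d) && cd "$tmp" && git init -q -b master .
git config user.email you@example.com && git config user.name "You"
for r in R1 R2; do echo "$r" >> f && git add f && git commit -qm "$r"; done
git checkout -qb temporary-feature-branch
for r in R3 R4 R5; do echo "$r" >> f && git commit -qam "$r"; done
git checkout -q master

# Case 1, "squash": one new commit M on master; R3..R5 become
# unreachable once the branch is deleted.
#   git merge --squash temporary-feature-branch && git commit -m "M"

# Case 2, non-fast-forward "merge": M gets two parents, R2 and R5,
# so the branch history stays reachable even after branch deletion.
git merge --no-ff -q temporary-feature-branch -m "M"
git log --oneline --graph        # shows both the mainline and the branch

# Case 3, "rebase": replay R3..R5 directly onto master; no M at all.
#   git checkout temporary-feature-branch && git rebase master
#   git checkout master && git merge --ff-only temporary-feature-branch
```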
That's why you can "skip" a commit during git-bisect[0].
I had to find when a bug was introduced, and guess what? It was inside a single massive commit touching hundreds of files. I prefer atomic commits with good messages, thanks.
You can upload multiple different versions of a single code review, and reviewers can diff both against the base and against previous versions of the review. This is helpful for showing "stories", responses to comments, and for "my original commit got reverted, so here I've reuploaded it, and then also uploaded the fix, so you can clearly see what's different this time".
> The other style of commit-by-commit reviewing, where I see all of the work in progress commits, I don't find valuable at all and I _definitely_ don't want to see on master.
This is the style I use for code reviews.
There's not a lot of tooling to support it, at least with GitLab, but git-playback [1] is kind of interesting.
I try to shoot for meaningful individual commits, squashing to get the intent of each change (more or less) self contained. I've found it really helpful if I ever need to `git bisect` to find what introduced a bug, I get a small(ish) commit that only affects a single module. Very helpful when debugging code I wrote last year.
In that sense it would be cool if git supported something more like "branch collapsing," which could be used instead of squashing a branch and rebasing that on master. It should be possible to "view" a branch as if it were a single commit on master, with its own comments etc (from the merge commit perhaps) and then expand it. Perhaps this is just a UI problem..
I do think this is mostly a UI problem. As someone else suggested in this thread sites like GitHub which show only a linear list of commits should probably default to something more like `git log --first-parent`.
It also sounds like there's a need for `git bisect` to be smarter about DAG traversal, in that it seems to treat the history as a flattened linear list rather than taking advantage of the merge structure of a branch. This also sounds like something that should be relatively easy to make smarter.
How did you - on that 2k+ developers project - deal with the fact that sometimes someone is working on a change in a feature branch (a) that another developer also needs for her feature branch (b)?
Would you move the commits that need to be shared away from the features onto a separate branch (c) and then .. ?
I am really curious how you handled these sort of situations at that scale!
> It means that the code review process is more important that the code writing process.
If that were true, the optimal solution is PRs with individual commits that all pass testing. I find it much easier to review a series of small changes for logical correctness than mashing them together into a single PR. GitHub recently added this as a feature, so I'm not in a completely invisible minority there.
And then, when the review is over, having discrete commits makes git bisecting down to the commit that broke the system more granular.
Absolutely. If someone formulates their PR such that every commit in the chain is small, easily reviewable, and passes all tests, that's fantastic! That makes reviewing code, searching history, and bisecting all easier.
Unfortunately, that's not the 90% case that I see. Most of the time a multi-commit PR contains N-1 commits of incremental development and one final one that fixes all the tests and typos and removes debugging print statements. Neither the project nor the author benefit from having those intermediate commits integrated verbatim.
If people generate N-1 commits with a final one to clean things up, maybe people should learn about git stash and making some WIP branches, then squashing commits themselves or better yet, keeping their own history clean, instead of submitting PRs full of crap.
I know, it might be too much to ask of people... oh well.
Just because you smash away at code for hours doesn't mean that's how your commit history should look. You can revise history in ways that are beneficial instead of destructive.
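One way to do that revision, sketched on a toy repo; the GIT_SEQUENCE_EDITOR trick just automates what you'd do by hand in the interactive rebase todo list (mark the follow-up commits as fixup):

```shell
# Toy repo with one real commit and three WIP commits.
tmp=$(mktemp -d) && cd "$tmp" && git init -q -b master .
git config user.email you@example.com && git config user.name "You"
echo base > f && git add f && git commit -qm "base"
echo a >> f && git commit -qam "wip"
echo b >> f && git commit -qam "wip: fix typo"
echo c >> f && git commit -qam "wip: tests pass"

# Squash the three WIP commits into one, non-interactively:
# sed rewrites the todo list that `rebase -i` would open, turning
# every "pick" after the first into "fixup".
GIT_SEQUENCE_EDITOR="sed -i -e '2,\$s/^pick/fixup/'" git rebase -i HEAD~3
git commit --amend -qm "Add feature X"

git log --oneline   # now just two commits: base, Add feature X
```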
Yeah, and that's why you squash all the intermediate commits first. If all your commits on master are useful and meaningful individually, bisect works great.
> Neither the project nor the author benefit from having those intermediate commits integrated verbatim.
Sure they benefit. You can see the thought process that went into the commit. If there's something that seems weird or out of place you can see how it evolved into existence. That can be exceptionally useful.
>If that were true, the optimal solution is PRs with individual commits that all pass testing. I find it much easier to review a series of small changes for logical correctness than mashing them together into a single PR.
This is too myopic and explains how you can end up with good code, but bad architecture. A good review ensures both.
I like my PRs to be about high level goals, and I want them made of lots of commits I can review. The commits themselves can be the result of re-basing (which is fine within feature branches) and maybe not truly chronological.
I agree with what you're saying; good code review and good code writing aren't mutually exclusive. However, the truth of the matter is that code review is a far more clearly defined moment in a workflow than writing good code (which is more of a habit than a discrete action). As such, it is far more efficient for a team to accept reality and squash after PRs (which GitHub now makes possible) than to rely on all your coworkers to properly squash commits before pushing and to keep correcting them when they don't.
«Making the code review the atomic unit of work, rather than the messy string of local commits, helps the project enforce common etiquette, commit formatting, and readable history.»
I agree that code reviews should be important, and key to understanding the history of a project at a more useful timescale. I just disagree that they should be "atomic" and that a reviewer (even, or especially, a later code archeologist) may not have reason to inspect or dive into smaller units within code reviews.
Where I think that we may agree is that I feel that even if they shouldn't necessarily be "atomic", I agree that code reviews should probably be first-class objects when talking about and dealing with source control. In git, you can use --no-ff merges today as a useful approximation of code review boundaries (especially with PRs and GitHub's default --no-ff and including linking PR #s). It might be nice to see code reviews or other aggregates of commits/commit graphs be truly first-class citizens of git in some manner.
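A small demonstration of that approximation (toy repo; the --no-ff merge commit marks the review boundary, and --first-parent then shows history at review granularity):

```shell
tmp=$(mktemp -d) && cd "$tmp" && git init -q -b master .
git config user.email you@example.com && git config user.name "You"
echo base > f && git add f && git commit -qm "base"

git checkout -qb feature
echo a >> f && git commit -qam "wip 1"
echo b >> f && git commit -qam "wip 2"
git checkout -q master

# --no-ff forces a merge commit even though a fast-forward is possible:
git merge --no-ff -q feature -m "Merge feature (code review boundary)"

git log --oneline --first-parent   # review-level view: merge commit + base
git log --oneline                  # full view: wip 1 and wip 2 still there
```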
I agree with you, and I prefer the --no-ff merge option for submitting reviews, but I think the argument for it would be a lot stronger if 'git bisect' supported a --first-parent flag the same way that 'git log' does.
Yep, the key to understanding git (why there's mutable history, etc.) is to understand that because of its history (designed by Linus) it's not designed to make day-to-day developers' lives easier, but to serve FOSS project maintainers and release engineers.
People who review code for inclusion in a project, want to track meta-progress on issues, want to pin versions for release, etc., mutable history means they can squash fixups or fix your whitespace for you, rebase changesets onto other changesets, have history that reflects the project management strategy, etc.
Also it is very likely that my feature branch isn't really complete during all the little commits I make. The commits are useful to me, but don't often indicate intent too much. The whole thing put together may be easier to diff when a few years in the future someone tries to figure out what happened there.
Regarding your example - why would you commit any of the 5 little commits if they didn't complete the task and also broke the product? Wouldn't it be better to commit only when you've completed the task you were working on? (I mean, if the task is made of sub-tasks, then I would also commit each subtask or milestone that I feel is important, but that's not the case here.)
Sometimes, having a snapshot is useful even if it's not a snapshot of a completely functional system. "Hmm, this doesn't seem to be a workable approach here; let's commit, checkout an earlier version and branch off that." It's not necessary to push all these experiments (or even keep them in the end), but it might be useful to have them, at least for the moment.
>Seeing how the sausage was actually made, ... it is meaningful and will tell you a great deal about a project and its developers... I trust that.
...tidy commits are aberrations and full of little lies...
...small, harmless lies.
Interesting choice of words.
Here's another way to think about squashing private commits for public consumption: programmers do not install keyloggers and upload their entire keystroke history including every Backspace and Ctrl+Z used in their text editor to the repositories. And, most of us wouldn't care about seeing it.
Whether John typed "x = 218^H^H73" or "x = 273" is a meaningless distinction and irrelevant noise. Those spurious ^H Backspaces are equivalent to the twitchy multiple commits in private branches. We really don't want to see them. Think of private noisy commits as an extended workspace of a text editor. If squashing those commits is a lie, the Backspace key without an audited keystroke log is also a lie.
Side note: The other comment downthread about keeping all private commit history for "git bisect" is a red herring. Sometimes a commit will deliberately have broken syntax -- e.g. make a quick commit before getting up to grab a soda -- it won't be CI test worthy. Besides, an automated CI server's cpu cycles can point to an upstream integration/test/qa branch instead of a programmer's private branch.
«If squashing those commits is a lie, the Backspace key without an audited keystroke log is also a lie.»
In a world with infinite storage space and a good UX on top of it, I could absolutely see a case where it might be amazing to have a source control integration with the full undo stack of my editors. VCR roll through someone's efforts Twitch style and grab a box of popcorn as you drinking game your way through their typos...
That said, I definitely will rebase/squash local WIP stuff on local-only branches on my own machine, as I see fit. Yeah, I see those as harmless lies because I really didn't build it that way, but sometimes that's what makes me feel better about publishing that work.
I appreciate you trying to push this conversation towards its extreme, absurd ends, but I also realize that there are a lot of aesthetic judgments here. I, for one, lean towards keeping more of the little pieces and the interesting digressions, like here's where I totally "brb grabbing a soda"'d the whole branch. Sometimes you taking that break means easier commits for me to review when I'm reviewing your code (whether a code review in a PR immediately, or a research effort down the line), as maybe I need a review break there too. I appreciate not everyone feels the same on this topic.
>you trying to push this conversation towards its extreme, absurd ends,
I didn't think of my example as absurd hyperbole. People actually do use "git commit" on their local unpublished branch as another form of Backspace/Ctrl+Z/Ctrl+S. And just like every text-editor Ctrl+S keystroke is not meaningful, every "git commit" is not meaningful either. A lot of commits are just the programmer's personal unhygienic work-in-progress scratchpad stuff. It's not CI or "git bisect" worthy.
>I for one lean towards keeping more of the little pieces and the interesting digressions like here's where I totally "brb grabbing a soda" the whole branch
I won't dispute that you like whatever you like (VCR-playback of every keystroke mistake) because we're all different. However, I'm pretty sure most busy programmers reviewing pull requests will not appreciate having to wade through all the commits about "cleaned up whitespace" and "savepoint before soda run."
I think it is hyperbole given the surrounding context of this article/thread is mostly speaking to squashing commits after a review has happened in a PR. I say that not as a judgment, I like hyperbole and admit my top-most comment was intentionally hyperbolic too. Hyperbole is a good conversation to have sometimes.
A commit to git is a named snapshot of a file tree. That's it. All the other "worthiness" we ascribe to a commit is communal. A person's unhygienic work-in-progress can give you insight into their thought process or their work ethic. Why did this person give me this particular named snapshot of their file tree? I expect they are telling me (or future them) something, and I can respect that artifact and preserve it, just as easily as you might angrily suffer from it and destroy it.
CI or "git bisect" "worthy" are orthogonal considerations and also vary between communities and needs.
If there is something we can agree on here: yes, all commits are not created equal. Even all commits aren't worth preserving equally. That said, I think there are tooling solutions here far beyond "destroy that what you dislike".
Just to be clear, your premise of "give me" is flawed because not every "commit" was meant as a capital "C" Commit-a-Logical-Unit-of-Work. Instead, many commits in private local history are a more mundane commit-as-a-backup-checkpoint-savepoint. The "savepoints" are like "^H Backspace". Asking "why did the programmer give you those particular savepoints" is like asking "why did he give you those ^H Backspace keystrokes". It's a nonsensical question.
The confusion is that the same "git commit" command is used for 2 very different semantic purposes:
(1) git commit -m "fixed bug #23984" --> as Logical-Unit-Work
(2) git commit -m "wip" --> as meaningless backup/savepoint like Ctrl+S save
The type (2) was for the programmer's internal purposes of safety backups, cleaning up whitespace, typos in comments, reflexive muscle memory of saving often, etc. They have no semantic meaning to "give you". Type (2) commits can have deliberate broken syntax and they're not meant to be built or be bisected.
Type (2) commits should never be discouraged, because saving work often (including broken midstream work) is a good habit, but from an outsider's perspective (the reviewers upstream), they are way too noisy. The spurious commits could be less than 30 seconds apart with no compile/build step in between.
Anything not stored on the system of record cannot be counted on to exist at any point in the future.
From a strategic standpoint, thinking of that data as anything other than 'destroyed' would be a grave error of judgement. From a retention standpoint the difference between 'could' and 'did' is profound.
> Anything not stored on the system of record cannot be counted on to exist at any point in the future.
That's an issue with GitHub itself, not an issue with the method of squashing commits before merging into master. See Gerrit as a prime example of separating reviewing code and your master branch.
If you have an issue with people deleting branches that were intentionally preserved to save history that was then squashed onto master, that's not the fault of anyone who squashed their commits but left them preserved on a branch.
As a code reviewer looking at a pull request, why aren't you just looking at the entire diff instead of individual commits? How is squashing to a single commit and looking at that different than just viewing the entire diff for lots of little commits?
> you trying to push this conversation towards its extreme, absurd ends
I would remind you that you are the one introducing morally charged terms like "lying" when describing rebasing. The fact that you cannot concede even that those who disagree with you are at least morally good actors is the source of the extremism in this conversation.
I would encourage you to seek hard to understand how it could be that morally good people still conclude that rewriting project commit history is a desirable thing, rather than concluding that they're engaged in self/external deception. At the very least, your conversations about the topic will become more productive.
It's inherently deceptive. Instead of thinking through problems to produce their solutions in an ordered, clear fashion, they make a big spaghetti mess, and then hide the sausage-making in a squashed commit that throws away work that may in fact, be very useful later on, all to present the illusion that they arrived at the visible conclusion without misstep.
That's deceptive, wasteful, and has more to do with ego or shallow ideas about "cleanliness" than it does with usefully recording our work processes.
The kind of work you're thinking of is generally what you put in comments.
Someone reading the code will not be reading the commit log. If there's gotchas in the code, they should be expressed in comments, not in the commit log.
The commit log, on the other hand, should always be readable for devs, especially newcomers to the project, to get an idea of how to develop small and concise features and how to contribute. Smaller, cleaner and always-buildable commits also make for very easy git bisects.
Git isn't here to record your personal history with the code, it's here to provide developers with an understanding of how the project evolved. It's not deceptive to arrange your commits to make them clean, it's part of creating a clear, understandable, easily-reviewed unit of change.
What you're missing is that "gotchas in the code" are not always known when the code is being written.
Very often, you have to piece together what happened and why from the record of what people actually did.
There's no such thing as "personal history" with the code in the context of SCM commits. It's all project history.
This new idea is both deceptive, and honestly, shocking. No SCM system previously has ever been used to actively encourage discarding history, and decades of using these SCM systems ought to have adequately demonstrated why that history matters.
>There's no such thing as "personal history" with the code in the context of SCM commits. It's all project history.
Yes, there absolutely is "personal history" with DVCS.
Your rigid mental model for "SCM commits" is relevant for older tools like cvs/svn with lock-checkout-checkin-unlock procedures. The "commits" in those centralized SCMs are a Really Big Deal.
DVCS is different. When a programmer decides to "fork" a public repository, he copies it to a local hard drive and it becomes a personal repository. From that fork, the programmer can also create an unpublished branch, and that branch is another level of personal scratch space. The commits in this type of environment are not a big deal. Git/DVCS shifts the Important Action from the "commit" to the "merge/fast-forward" step. That's why many programmers rebase to optimize how that step looks.
The concept of "personal history" is built-in into the ethos of how Git and DVCS works and it's been a separate concept from "public project history" since day one. The programmer can certainly choose to share every untouched line of personal history but he doesn't have to. Same concept as not sharing a recording of memory buffers and a log of every keystroke from the text editor. Most of us don't care to see any of that.
>our collaborative processes just because changes are staged locally first.
The keystrokes of ^H Backspace are not a "collaborative process". The idiosyncratic persistence of RAM buffers of a text editor is also not a "collaborative process". Why do you think "git commit" on a local unpublished repo is collaborative? I can only guess that it's because the word "commit" is in "git commit"? Well, syntax is not semantics.
>I'm very aware of what DVCS is.
I think many people with your position know the command syntax of DVCS but don't actually grok the "draft work" philosophy of DVCS. Therefore you assign the label of "collaborative process" to unrelated concepts such as the typing habits the programmer happens to execute on his laptop's hard drive. Unfortunately, the programmer persisting his editor's RAM buffers to disk happens to use the command "git commit", which distorts the thinking and causes people to label that action as "collaborative"!
The "pull request" and "merge" is collaborative. The raw "git commit" is not.
Except it's not lying. If the unit of work is good, then watching the developer go backwards and forwards and making mistakes that stop the software from even compiling isn't even helpful - in fact, you can sometimes get multiple of these errors stacked up in a row. Trying to bisect, even with bisect skip, is painful in these cases, and it doesn't even help with code reading.
The general rules of Linus Torvalds are the only real sane ones in this situation, at least IMO:
1. For a clean commit, make sure you rebase only on your private history. If the branch is public and in any way used by anyone else, then unfortunately it's not something that can be considered "private" any more (at least not unless you give strict warnings that no serious work should be done on the code in the branch until it is finalized).
2. Once your code is committed into a public branch or into the master branch, then you can't change it. No rebasing on this code!
3. Do NOT rebase other people's code, even if it is very messy. If you pull in messy code, then complain about their messy code or try to avoid depending on it. Which basically means that developers that collaborate need to practice pushing and merging clean code.
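Rule 1 can be sketched in a throwaway repo (all names and messages here are made up for illustration): squash freely on a branch nobody else has pulled, then publish the cleaned-up result.

```shell
# Sketch of rule 1: rewrite only history nobody else has seen.
set -e
cd "$(mktemp -d)"
git init -q repo && cd repo
git config user.email dev@example.com && git config user.name Dev
echo base > file && git add file && git commit -qm "initial commit"

# Private topic branch: squashing here is safe, nobody has pulled it.
git checkout -qb topic
echo wip1 >> file && git commit -qam "WIP: first attempt"
echo wip2 >> file && git commit -qam "WIP: fix typo"

# Fold the two WIP commits into one clean commit before publishing.
git reset -q --soft HEAD~2
git commit -qm "Add feature X as one reviewable unit"

git log --oneline   # initial commit + one clean feature commit
```

Once `topic` is pushed anywhere public, rules 2 and 3 kick in and this kind of rewriting is off the table.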
I said elsewhere that the only use of squash that I strongly agree with is to remove bugs and other dumb mistakes.
What I disagree with strenuously, to the point of anger, is having a codebase where every line of blame tells me only who made a change and what feature it was for. And I know too many people who think of every feature or pull request as a single commit.
THAT is lying, and I mean that in the judgement-passing definition of the word. You're fucking up the version history intentionally, and you should be stopped.
I'm not in the habit of reading code for fun. I have a job to do, and I want to do it. If I'm looking at the version history it's pretty much only for two reasons. Maybe I know for a fact there was a clever bit of code in this file and it's gone now, so I need to find it. Was it deleted, or did you split it off into another file? I like those cases, but they don't happen to me very often.
More often I'm reading blame because of a wtf moment. I can't think of a logical reason why the code looks the way it does, but I need to make a change, and so I need to know if the person who wrote it was trying to accomplish something or just confused.
Commit a8bcdef by dingus Implement feature #532
That tells me what changed, who changed it, how it changed and when, but I've lost the why. I know, you probably think "feature 532 is why, duh" but it doesn't tell the story of the intermediate states and why for instance they changed 'users > 1' to 'users >= 1'.
And if there is one class of commit that should absolutely and under no circumstances ever, ever be squashed, it's any commit where you reformatted code. If a line changes and it's part of a commit for a feature, I'm going to assume that method body was all you, and stop looking. If you 'wrote' it but the commit message is "pretty print", I know I have to go pull a blame on the parent version to keep digging.
Ah, yeah - I agree with that last bit about formatting changes - even if I have probably unintentionally violated that in the past.
However, if you have changed users > 1 to users >= 1 and there is some sort of intrinsic logic to it, surely a. you put that reasoning into the commit message, and b. you add a comment into the code briefly explaining the significance?
I don't know about you, but I'd prefer to know about why this is significant within the code, rather than have to dig through the blame logs of a version control system!
Early versions of Microsoft Word kept the undo history in the .doc file. I remember reading about a few cases where someone wrote things like "my boss is an ass", erases it and then sends the report to the boss, boss accidentally clicks undo too much and guess what he sees.
> It's real and visceral and how software is actually made...
It's also extraordinarily cluttered, and it really gets in the way when someone later wants to do a 'git bisect' to track down when a bug was introduced.
When I'm working I do frequent little commits just to capture and back up my broken stream-of-thought experiments. None of those are going to be relevant to people who work with this code in the future; how does it benefit them to impose my haphazard process on them?
Of course if there's a way to break up my final commit into more meaningful smaller commits, I do that rather than one monolithic commit.
For example if I clean up some whitespace issues, add some new comments to old code, and implement a new feature, I'll put those in three separate commits even if I originally did the entire change at once. In this case you'll see more commits in the public history than I originally had.
Or if I check in a new version of some external library, add an API call that uses it along with its tests, and add UI code that calls the API, those may be separate commits in that order.
My goal is to make the public history useful to future developers.
I'm curious about these workflows where history matters so much, because people using them must be doing very different things than I do. I use 'git blame' every now and then, 'git show' more frequently, 'git bisect' practically never. I think I would spend at least two orders of magnitude more time squashing commits than I ever have to spend dealing with the consequences of a non-squashed history.
I did a rebase once because there was a big mess I was trying to clean up in order to make a merge work. I would be surprised if anyone has ever cared about the details.
I don't think anyone does `git bisect` all that often, but... when you need to do it, you hope the history is clean. Otherwise, figuring out what went wrong and where can be a nightmare.
But squashing makes git bisect even harder, since the chunks are bigger. If you have a very granular history then you can see exactly which 15-line change caused the breakage.
(though the counterpoint is that maybe there are parts of the granular history that are just busted for other reasons)
Agreed, I certainly don't advocate squashing more commits than necessary! The more granular, the better. As I mentioned, I'll often tease apart separate commits out of something that I originally stuffed into a single commit: whitespace changes, add some comments, fix a bug, all separate even if they started out as one commit.
I just don't find it useful to see every commit when someone uses Git like I do, committing all sorts of intermediate broken work while experimenting. Especially not commits that are going to be broken regardless.
You make a good point about bisecting. (Although I guess if you're too overzealous with rebasing you'll arrive at the commit you're looking for and it will be huge. Balance in everything yadda yadda yadda.)
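A concrete sketch of why granular history helps here: `git bisect run` can mechanically pin the breakage down to a single small commit (toy repo below; the "bug" is just a marker string, and all names are made up).

```shell
# Toy demo of git bisect run finding the first bad commit mechanically.
set -e
cd "$(mktemp -d)"
git init -q && git config user.email dev@example.com && git config user.name Dev

# Five small commits; commit 4 quietly introduces the "bug".
for i in 1 2 3 4 5; do
  if [ "$i" = 4 ]; then echo BUG >> app.txt; fi
  echo "change $i" >> app.txt
  git add app.txt && git commit -qm "commit $i"
done

# bad = HEAD, known-good = four commits back; the test script defines "broken".
git bisect start HEAD HEAD~4 >/dev/null
git bisect run sh -c '! grep -q BUG app.txt' >/dev/null 2>&1
echo "first bad: $(git log -1 --format=%s refs/bisect/bad)"   # → first bad: commit 4
```

The smaller each commit, the smaller the diff that `refs/bisect/bad` hands you to stare at.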
To me this still sounds like a display problem more than a data problem. The viewers could implicitly hide non-tagged commits and then you can expand them out to see the details if you should need.
What frustrates me is that there are mechanisms to deal with this problem that maintain the DAG and history (cleaned or not), but people jump to the solution of destroying the process behind the code rather than use them.
In particular, I would kill for support for basically the `git log --first-parent` option in viewing the history of a branch on tools like github and gitlab. Rather than squashing your branch, you make your merge commit have a meaningful commit message (which you do anyways for a squashed commit) so that you don't always have to be viewing the tangled web underneath.
There should be no practical difference between a squashed commit and a merge commit from the perspective of the branch that commit is on (they represent the exact same change from parent to child), but the tooling insists on giving you the most complicated possible view all the time so there is a tangible difference.
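As a sketch of that point: `--no-ff` merges with a meaningful message plus `git log --first-parent` already give a squash-like mainline view without destroying the underlying commits (toy repo, made-up names).

```shell
# First-parent view: squash-like tidiness without discarding the branch detail.
set -e
cd "$(mktemp -d)"
git init -q && git config user.email dev@example.com && git config user.name Dev
echo base > f && git add f && git commit -qm "initial"
main=$(git symbolic-ref --short HEAD)   # works whether the default is master or main

# A messy feature branch with WIP commits.
git checkout -qb feature
echo a >> f && git commit -qam "WIP"
echo b >> f && git commit -qam "WIP again"

# Merge with --no-ff and a meaningful message instead of squashing.
git checkout -q "$main"
git merge -q --no-ff feature -m "Add feature Y: the whole story goes here"

git log --first-parent --oneline   # tidy mainline: the merge + initial
git log --oneline                  # the full detail is still reachable
```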
Couldn't disagree more. Having worked extensively on teams on both sides of this issue, I can experientially state that a well-done git rebase and commit strategy is much more useful and helpful.
In terms of feature branches:
The individual engineer is free to do individual commits in their branch as they need to in order to keep track of their work. Before they submit a pull request, they should rebase and squash all of their commits into a single one that thoroughly describes everything in the feature that is being committed. When used in conjunction with tools like Phabricator, Arcanist, and commit templates, the workflow is very smooth.
When another team member goes to code review their pull request, rather than having to examine multiple individual commits there is only a single one to examine and comment on.
Master history:
Rather than cluttering up the mainline history with 'Did this', 'Did that', 'Merged: Did this', 'Merged: Did that', 'Reverted: Merged: Did this' etc, you get a series of commits that articulately describe what each commit was for. In the event you need to revert a feature because it breaks something, it's much easier to revert that single commit than trying to hunt through all of the individual commits from an engineer's feature branch. And in that case, if you revert one of the commits from the feature branch it could break something else.
I agree that a well-done git rebase and commit strategy is much more useful. However, squashing everything will (most likely) lead to gigantic commits that are hard to reason about.
A better approach would be to create multiple small commits that work and are self contained. It's ok for commit N to depend on the preceding commit, but each N should be able to stand on its own.
If devs rebase everything before pushing and push often (therefore also rebasing often), conflicts will happen a lot less often. Devs can also use their private branches for temporarily saving all WIP, squashing/rewording only what makes sense before submitting the PR or pushing to master.
I'm only arguing that what you see as clutter, I see as potentially interesting history, and a mess of merges and reverts are more interesting than people give them credit. It can tell you quite a lot about a project. (Did you learn from reverting 'Merged: Did This' anything about why it shouldn't have been merged? Did you miss something in 'Merged: Did that' that didn't quite merge easily? That's easier to find/diagnose with smaller more often merges than big evil merges...)
Anyway, to each their own, and I appreciate your preferences differ from mine.
That's more about how squashing was applied in that situation... If they ended up with huge commits, that's a problem with how the people did their work, not with squashing itself.
Myself I like to do PR and merges with code that can be logically comprehended quickly and doesnt break the build.
A lot of that is due to being empathetic to the people doing the reviewing. I know they don't know the context, since they aren't in the code, so I'm showing them a snippet of it - say, a new method in a single commit. Then another commit for another method. Finally a third one that actually does some new functionality by combining the two.
I figure that's good enough for people to see that unit tests passed on the two methods and that the system started getting errors after the last commit so to look there for faulty logic.
isn't pretty, but it is meaningful and will tell you a great deal about a project and its developers... I trust that.
Here is a (made up), but generally realistic git log
git log | grep -i WIP
mon 5pm - WIP, going to work on this from home
tue 4:45pm - WIP, going to work on this from home
wed 2:30pm - WIP, meeting
wed 5pm - WIP
thu Noon - WIP, working from the cafe on my laptop
fri 5pm - WIP, working from home
sat 3pm - WIP, heading home sick for the day
Does it really matter to anyone, or count as anything but noise, to know that I committed my work into the repository just so that I could work on it from a different computer? I can't imagine how low the signal-to-noise ratio would be if every person on the team did this.
It does tell me a great deal about your development habits, yes.
Maybe not information that I care to do much more than skim, of course, unless I'm your manager looking for reasons why you might be working from home too much. :) (That said, there's probably some cool deep learning applications here...)
Some of the information, for instance, is that maybe you are working on pieces too large at a time and should find more ways to break them into smaller units of work that you can commit in logical pieces rather than as "snapshot dumps" between computers.
Like I said, from a hyperbolic standpoint: how the sausage is made isn't pretty and is sometimes full of garbage, but it is informative.
I'm a manager, and I'd be happy to see commit messages about working from home. My team would never write commit messages like this, they are of negative value to the team. I would ask them to squash. If I saw these commit messages pushed, the employee would get a warning.
I can relate to that. I don't like those WIP commits on a feature branch (when the program is crippled or even doesn't compile or doesn't pass the tests, when the code is filled with temporary "printf" debug messages, etc.) to be in the history of the master branch.
Ideally, every commit that I make should preferably be small, but logically complete and working. The problem is that sometimes I want to work on another branch and I have to commit in the middle of the work so I can check out the other branch, in which case (without a squashing merge) the "offending" WIP commit would end up in the master branch's history.
I can't trust git stash anymore. I mistyped something and it blew away a lot lot lot of work. So now I just commit stuff. If I want to get rid of the commit, I can follow it up with a reset.
Not true. From the git stash man page: "If you mistakenly drop or clear stashes, they cannot be recovered through the normal safety mechanisms."
Git stash is more dangerous than other git commands, and I've personally witnessed more people losing work with stash accidents than with commit accidents. There may be plumbing commands that can get you there like fsck but the fact of the matter is you are better off committing or branching, from a safety standpoint.
That said, I've been happy to debate the larger implications that there are UI/UX tweaks/story-telling that could make even the work-in-progress non yet published stuff more appealing to publish without needed to mutate it from its as-developed states... But yes, I was mostly speaking to published commits.
I think I would go insane with commit messages like that. Sorry, but they have very little value - I'm afraid that it's not important to know that you've gone out for a meeting, a coffee or sick. Commit messages in a mainline tree should be reserved to explain what you've done, and nothing more.
I would hate to have done a bisect to land on your commit "WIP, heading home sick for the day" as the one that caused the bug. By all means, create a WIP branch and if you can then push this WIP branch to the main repo, but please use commit squashing into logical units of change when merging into the feature branch or main trunk!
Git was designed for those type of low-friction wip commits in private/local/unpublished branches. Many programmers have a concept of "savepoints". Think of "git commit" as an "implementation detail" of that savepoint. Those savepoints were intended to extend the programmer's mental scratch space and never meant for public consumption. Therefore, squash/rebase is a logical followup step to consolidate meaningless savepoints into a meaningful commit.
A private "commit" in git does not have the same semantics as the lock & checkout type of commits in svn/Perforce. Programmers should not be discouraged from making cheap commits as often as they want even with nondescriptive titles because those commits weren't meant for public dissemination anyway. Being judgmental about those wip commits makes no sense.
What annoys me is that so many people take the position that there's no other way to improve the experience except for building a hazardous and error-prone system into the core workflow of a tool that should ideally deal primarily with immutable history, rather than building better tooling to manage that complexity and present history in a useful way other than a raw list of commits. I mean, who would build such tools, and isn't managing complexity hard? Oh wait, it turns out we're developers, and managing complexity is the core reason software engineering exists. It's really rather silly.
But it's a lot easier to convince devs that using tools with bad UI elements is hardcore and makes them look smart instead of demanding improvements.
Yeah, I definitely agree. Especially coming from a source control system before git where mutability was extremely hard (and likely to cause deep, terrible problems); there, a lot of great work had been put into making it less likely you would even need to mutate things down the road, such as good interactive defaults to help lead you through exactly what was going into a patch...
Certainly there are a lot of people that seem to prefer imperative mutability, and more power to them, but maybe we can learn from all of this and build better tools too.
What do you propose? If you want to err on the side of commits that show a clear unit of change, with a clear commit message and you make a syntax error by accident, do you seriously think it's a good idea to force a dodgy "oops, syntax error" commit into the mainline code branch?
I agree that code once committed to master should be immutable, but your own private commits should be malleable. It's not a matter of feeling superior or trying to look smarter; it's just common sense that you should try to submit easy-to-understand code changes and remove as much unnecessary extraneous rubbish as possible.
I absolutely agree that private commits should be malleable and sometimes you do want to just pretend that you wrote things smartly the first time...
That said, there are certainly ways to better handle the "Oops, syntax error" style of commits than blowing them out after the fact. Some of that is better acknowledging the existing DAG structure of git and realizing that you don't need to 'depth-first search' the commit log every time you look at the commit log. Tools could default to views more like `git log --first-parent` and then let you expand out from there, for instance.
There's also the idea of better drawing lines between the code work and "story telling work" of a commit. Certainly today you could do all your work in progress in a branch and then `git merge --no-ff` it into another branch and there tell the completed story of that commit set after all the "dust has settled" on the stuff that was done in the heat of progress.
It's as much a matter of tooling support and community support (we all make mistakes so we aren't going to dunce cap anyone for making mistakes) as it is a technical problem.
I think we can see that "not all commits are equal" and maybe we want a system for "color these commits as work in progress" followed by "here's the final story for these commits, color it as presentable and ready for code review". I think it's an interesting discussion to continue to have and an opportunity to really think what we want in our source control tools and how they can better help us tell the stories of the code we are working on, for our reviewers and even maybe for our code archeologists.
They're not full of little lies. Projects insist on squash-before-merge to help developers keep their sanity.
Imagine working on a fast-changing block of code with 20 or so other people concurrently. If all 20 of those people have patches submitted to master for review, and all of those patches have multiple commits each, then every time one gets merged, the others will have to rebase onto its changes and fix conflicts for _every single commit_ while the fast-forward plays out. It's a horrible experience. It prevents desirable code from getting in as contributors drop out due to the browbeating.
Squashing commits can be the difference between tediously fixing something once vs. tediously fixing it 20 times. No one needs to know that you changed your mind about calling that struct "ConfigOpts" before it was ever introduced upstream.
«the others will have to rebase to its changes and fix conflicts for _every single commit_ while the fast forward plays out»
I'm mostly sort of advocating a "rebase none of the things" approach. Fix conflicts only when they happen in a branch (the GitHub PR system very nicely doesn't let you merge branches that conflict with your target branch and with CI information even better it won't let you merge branches that don't build). It's really not a bad experience.
Sometimes the scaffolding should be left in place to help show others how to build things; other times, it was put up and taken down so many times that the learning from the first few attempts isn't worth it.
I would probably be much happier not squashing stuff if you were able to bisect cleanly to points between branch merges. I don't think that's even theoretically possible, which means that when you're bisecting it's quite possible to pick up half-baked mid-branch points that you have to recognise for the broken rubbish they are - occasional false positives that make bisect a lot less useful. On a repo with nice squashed commits, you tend to be able to narrow down to the feature very quickly - of course those commits then tend to be bigger etc., but I find that less of an issue.
Maybe bisect needs a flag like --start-with-merges to focus its efforts on --no-ff merges?
Also, what if there was a tool on top of bisect that could better utilize GitHub PR JSON to target the search pattern? That could even save you some time in the case where you already have CI information attached to your PRs...
There was a patch submitted in 2011[1] to do roughly that, but it got diverted into a different purpose and then seems to have disappeared. It certainly seems like it must be possible to have a git-bisect that can deal with branches.
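Worth hedging against your git version, but newer git (2.29+) did eventually grow something very close to this: `git bisect start --first-parent` only tests mainline commits, so it lands on the offending merge rather than a half-baked mid-branch state. A toy sketch with made-up branch names:

```shell
# bisect --first-parent: blame the merge, not the mid-branch WIP.
set -e
cd "$(mktemp -d)"
git init -q && git config user.email dev@example.com && git config user.name Dev
echo base > f && git add f && git commit -qm "initial"
main=$(git symbolic-ref --short HEAD)

# A good feature branch, merged with --no-ff.
git checkout -qb good-topic
echo ok >> f && git commit -qam "good work"
git checkout -q "$main" && git merge -q --no-ff good-topic -m "Merge good-topic"

# A bad feature branch: the bug arrives mid-branch.
git checkout -qb bad-topic
echo BUG > bug.txt && git add bug.txt && git commit -qm "mid-branch WIP"
echo more >> f && git commit -qam "finish topic"
git checkout -q "$main" && git merge -q --no-ff bad-topic -m "Merge bad-topic"

# Only first-parent (mainline) commits are bisect candidates.
git bisect start --first-parent HEAD HEAD~2 >/dev/null
git bisect run sh -c '! test -f bug.txt' >/dev/null 2>&1
git log -1 --format=%s refs/bisect/bad   # → Merge bad-topic
```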
«Have you ever tried following a change in a repo that came from an unsquashed PR?»
I have a self-congratulating black belt in source code archeology. With the right tools, most of which are on GitHub even, such as good commit range diffing, smart uses of tags and branches, and knowing how to navigate the DAG from merge commits (more reason to use --no-ff), you have a lot of power in your hands.
«What's truly meaningful IMO is a git log that reads like a product change log»
I appreciate that point of view, but I don't share it. A product change log, I feel, is a bit of marketing/PR that needs some time, love, and editing; I find a git log is for catching snapshots of raw progress and more often useful in seeing what your co-developers are up to, as they are working.
Absolutely, not every team follows the same preferences, so with any style/aesthetic choices, be sure to know what the team zeitgeist is. (I'm not a monster, I play well on teams, I promise.)
As someone who has spent a lot of time tracking down the origin of source code changes -- and the reasons for them, and the implications of the change -- by reviewing commit logs, I can think of little worse, short of no commit history at all, than trying to derive anything remotely useful from a commit history that has been condensed down to a product change log.
Even commits that lack good commit messages provide valuable information in the form of insight into the author's cumulative thinking/process.
Ugh, the thought of scouring through commits with messages like "fixed", "oops", "done" is like fingernails on a chalkboard to me.
What I read is a LACK of thought process when you can't squash and justify your changes succinctly.
Do you see the commit history as a piece of living art, a representation of the community and culture that has produced it, or do you see it as a tool?
I'm an engineer, and I see the commit history as a tool. When I want to know what a block of code is for and why it was written the way it was, the commit history (if it is clean and granular) will tell me a lot about that, and will point me to authors, issues, features, and requirements where I can learn more. I don't care about the process of producing the code, I care about the end result. I get enough exposure to the process when I'm writing my own code.
I suppose I see commit history more interesting as works of art and archeology. Code history will be read more often than it will likely ever be interacted with. Like the scuffs in the marble, the fingerprints and brush strokes and little hairs in watercolor and oil painting, there can be beauty in the little flaws.
I've had it put to me that an Architect deals solely with the art of a project and a Scientist deals solely with the science and theory; it's the work of an Engineer to deal in the practical middle where art meets science (meets the real world).
Sometimes it is easy to overlook (or to want to overlook) the little bits of humanity in the machine; the various sorts of creative chaos in the vast ordered systems; the parts of the code that are art.
There's no easy answers to much of this thread, because it is art, it is aesthetics. There's no "right" answer, just "this looks good and pleasing to me and my team" and working to find that practical Engineering border space between the unwavering art of the Architect and the similarly unwavering logic and discipline of the Scientist.
There's definitely some merit in keeping the original intent of commit messages even when "committing early and often", but at the end of the day a pull request is supposed to tell some kind of story. When you `git blame` or `git log` a particular file, aren't you more interested in the higher-level changes than a single developer's own miniature storyline of how it got that way?
I'll give you another analogy...If I'm composing a song, and you're not particularly trying to learn how to write songs, what is more relevant to you, the end result or the various drafts and early versions that made it to that end result? For the majority of people who aren't interested in learning the mechanics of songwriting, the "journey" is definitely not as interesting as the destination. For nerds like you and I, that's part of the fun! So I try to keep the original intent of my commits, and preserve their messages in a bullet-pointed list format, to show the individual changes that were made in addition to a higher-level overview of the overall change to the project.
TLDR: Project-level changes are not the same as individual changes, and while both should be represented in commit messages, the project-level changes are overwhelmingly more useful in the future. Git is not about code storage, it's about code communication. It's about developers on the same team communicating with both prose and code in tandem.
If you rebase your unpublished work on top of some recent changes, that is in no way a "lie". Doing it via a merge is completely uninformative, in fact. Oh, George was working, by complete coincidence, at the same time, on something completely unrelated and that had to be merged against my change. Whoa, inform me more softly there, I can't take the raw cognitive overload!
Multi-parent commits complicate the use of git. When a commit has a single parent, we can pretend that it's a delta: a patch. (Like it fscking should be in a decent version control system based on some sort of patch theory!) When we "show -p" that commit, we get a diff, which is against its one and only parent. Multiple parents also complicate certain situations. They rear their ugly heads and create an ambiguity. For instance, consider git cherry-pick. If a given commit has just one parent, we can pretend that it's a delta and "git cherry-pick" it. If it has multiple parents, the ugly truth is revealed: a git version isn't actually a delta. If you want that change, you need to specify the parent!
The parent of a commit should be the thing that the work was actually based on: the work that the developer took and massaged to create the new baseline representing the commit. When you have multiple parents, only one of the parents actually meets this definition. The others are arbitrary nodes in the system which are just installed as the parents.
Speaking of trust, I have a really hard time taking seriously the argument that squashing or rebasing is "lying". If you'd stuck to meaningful, I'd consider it seriously. But "lying"? This may be git's biggest flame war, this is an old and tired debate, and calling it "lying" is an over-the-top ham-fisted value judgement of a technical workflow that is a personal choice with legitimate reasons to go either way.
Also, Linus endorses cleaning up your WIP commits before pushing.
I don't find "seeing how the sausage was made" merely interesting. It's often very instructive.
Often there will be counter-intuitive bits of code that make sense in "git blame" if you see the original, small, commit that they were created as part of. If they're part of a 1000+ line feature bomb, you lose that important context.
I only changed this recently, but for a while I tried to keep my commits in LibreOffice as small as possible so that I didn't break the build. Unfortunately this caused major issues when backporting fixes, so I changed my practice.
Whilst my small commits were eventually squashed, the experience shows that even well-formed smaller commits that are part of a larger change can often be problematic.
Keeping your commit history clean is important. When I'm bisecting, I don't want to see coding errors like typos and syntax errors; they literally get in the way of the bisect. And when I'm reading through a source file, I'd like each commit to be significant, or at least entirely relevant to a change. Minor syntax-error fixes, whilst they can still sneak into the master repository, should be few and far between.
Basically, it also encourages unit testing, rechecking your code, continuous integration, and a raft of other good and best practices around coding. And your colleagues will thank you.
Merge squashing is actually a pretty decent way around this, and I'm going to use it as my workflow now. I'll make frequent code commits on a separate branch, then squash down into another branch, then push that.
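For what it's worth, that squash-down step is plain git, no GitHub button needed (branch names here are just examples):

```shell
# Hack away with as many messy commits as you like:
git checkout -b wip
# ...edit, commit, edit, commit...

# Collapse the whole branch into one staged change:
git checkout main
git merge --squash wip
git commit -m "Add the feature"   # single commit, no merge parent
```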
I'm with you. At least for most professional environments.
It's weird. Git was made for something very specific, Linux kernel development. It made a lot of decisions to support that environment. However most of us don't work in that type of environment.
If you have a private repo for your job you're in a very different environment. At my job mutating history is the opposite of what I want. I don't want people mutating history. Ever!
Part of the problem, in my opinion, is that Git encourages tiny commits. I might take it a step further and say that Git mandates tiny commits. Too tiny in my opinion. When that's forced upon you a clean mechanism is required. But I'm not sure it isn't sweeping another problem under the rug.
But I'm weird. I'm a game developer. We all use Perforce. It just works. You can't fuck it up. You can't ruin history. You can't get stuck. Artists and designers can be trained to use it from scratch in 5 minutes. It's so easy to use there aren't tens of thousands of blog posts desperately trying to explain how easy it is to use.
I'm from a Git background but we use Perforce at work. I wish there were blog posts desperately explaining how simple P4 is because I can't understand how to use it for software development.
To me it seems like a big dump of files like a network mount with locking and some kind of history. But how the hell is one supposed to write software with it?
It has complicated tools for sharing incomplete work. I don't know how you do code reviews but we have a Perl script for that(!).
In other words: if you think p4 is simple and git is not, it's because of your background.
I can teach an artist or designer everything they need to know about Perforce in 5 minutes. I can spend a year helping and they still won't be comfortable with Git. Every studio needs a Git expert to help people when they have a Gitastrophe.
Perhaps your issue here is that P4 is too simple. You check a file in and it's there forever. Sync latest, change things, check in, voila! Once in P4 there is literally nothing you can do to permanently screw things up.
Binary files are locked so only one person can edit at a time. Text files can be merged. New users may need help resolving conflicts. I recommend Araxis Merge.
Code review tools always exist on top of source control. There are different tools that integrate with Git, Mercurial, SVN, Perforce, etc. What did you use at your old job that you were happy with?
I do have at least some good news for you. If your company is running a semi-recent version of the Perforce server you can use Git with it. You can work 100% in Git if that's what floats your boat. https://www.perforce.com/gitswarm
The point about artists I totally get. Git isn't designed for that. But similarly I'll argue that Perforce isn't designed for programmers, or at least a group of programmers spread far and wide (geographically and/or otherwise). I'm sure even p4 works great if you're quite close to the people you share the depot with.
From my point of view, Perforce is designed from completely the wrong standpoint. The "big dump of files" paradigm makes everything strange in software development. In particular, I find it needlessly difficult to share incomplete work that isn't ready to be merged (In Git, you just commit and push, in P4, you use something else, like the shelf for that).
In contrast with the "big dump of files" paradigm, Git is "create patches and share them". The beauty of this is that "sharing" part works without some dedicated, blessed server and changesets are easily shared over e-mail or any other medium.
I must agree with you that the Git UI is quite bad, inconsistent and not easily discoverable (ie. not something for artists). But the concepts of Git are well-defined and elegant and are thoroughly explained in the README in the first commit[0] of Git(!). To mitigate the issues with the bad UI, I've always found the Git manpages to be fairly clear and useful (but not everyone agrees).
As for using a Git client with a Perforce server, I'd use it if our monorepo workflow weren't completely incompatible with that.
For me, software development has always been about creating and sharing patches, and that's why Git makes sense for me.
That's a fair assessment. Git and Perforce are designed for very different things. For video games, and especially non-coders, Perforce is clearly better imo. Git is made for very wide, open source projects. Private software projects fall somewhere in between. Some projects may skew one way or the other.
I'm fascinated by your use case because for my professional work I've never created or shared a patch. I just submit the new code and that's it. Possibly to some branch. More than likely straight to main. Merging heavy branches sucks. Even in Git.
But I definitely understand where you're coming from here. There's a heavy learning curve moving from SVN to Git. It's just a radically different way of thinking. You have to break your brain and reshape it. Moving from Git to non-Git is much the same.
When I worked on an Unreal Engine 3 game we often talked about doing things the Unreal way. The Unreal way sucked. It was stupid. Objectively wrong even. However it was important to do many things the Unreal way. Because if you did it your way that would only cause pain and suffering down the road. Because if you change a few bits to be your way then eventually they'll try to interact with bits that do things the Unreal way and it's bad times. So you grit your teeth and do it the Unreal way. If you use Git you gotta do it the Git way. If you use Perforce you gotta use it the Perforce way. As painful as it may be.
> I'm fascinated by your use case because for my professional work I've never created or shared a patch.
I do driver development, and often times we're debugging an issue where toggling some hardware (or software) "mode" can make an issue go away or appear. These are the patches we end up making and sharing with coworkers but never merging. We need to be able to do quick and dirty hacks but we can't be putting that kind of stuff anywhere near master branch. I'd say 4 out of 5 patches I write never get merged (and they're usually really short patches, the code itself wasn't of any measurable effort).
Moving from SVN to Git (or darcs/hg in my case) was certainly a big move. But I guess if you were coming from diff/patch/email/shell script background (like kernel before bitkeeper), it was probably very intuitive :)
You can shelve a changelist in P4. And any other user can unshelve it. I'm not sure how different that is from creating/sending patches? Especially if done from the cmdline.
This has been a fun, educational HN thread. Thanks for sharing!
I agree with you. In fact I think the reason we have to do squashes and lose fidelity is because the git model is broken.
There is no technical reason a source control tool could not offer multiple 'views' into the commit history: a high-level linear view which you can zoom into to see the underlying commits and merges. Why do I have to lose the latter to see the former?
Bzr+launchpad does this nicely, I think. The underlying history of the merged-in branch is there, and you can look at it, but by default it looks like a single atomic change.
I'm not a big fan of clean history for the reasons you state. However, there is at least one big benefit of a clean history in the master branch: any commit can be checked out and assumed to have working code. This means you can use git bisect. Git bisect allows you to programmatically search through your commit history to identify when a certain behavioral change happened. If you have commits like "wip, not sure why the app wont start yet" in your master's history, you cannot leverage tooling like bisect. Let the sausage be created at the branch level, and keep master clean </0.02>.
(A) how important the commit is and (B) how many lines of code are changed.
Squash usually makes sense when you have a branch with multiple small commits that affect the same thing. It doesn't make sense to squash 2 commits affecting large parts of the application just because your commit has to be squashed before merging.
I understand the point of view that it's useful to see what "actually" happened. But the other way to think about it is like this: at some point, when a developer in the future wants to understand how #master evolved over time, should the burden of linearizing the history rest with that developer (making sense of a complex graph), or with people who make changes to #master when they make them?
Projects that use a rich code review tool like Gerrit, such as Android, OpenStack and Wikipedia, keep the truth in the code review tool, but the actual git tree is kept minimal and clean.
If you want to dig into the actual commits and the dirty truth under the surface, use Gerrit; you can even use git itself to pull down the truth.
I'm expecting this github feature to be the same, the pull request probably keeps the truth somehow.
I see this as a tooling problem. Why use two tools when you can use one? Why is "code review" not a first class citizen in your source control world?
I don't have all the answers of what the tooling should be, I just think this is as a good an opportunity to discuss it as any.
GitHub's long-standing --no-ff merges at least are one way of preserving the code review and its internal changes directly in the git DAG. This mostly works, except for tools like git bisect that treat the DAG as if it were a straight line, rather than making use of the fact that the system already supports complicated graphs.
Furthermore, along the questions of why use two tools to navigate the code repository: I sort of wish that things like GitHub PR comments and code annotations made their way somehow into nodes in the git DAG.
To me, the advantage of squashing is that it makes your changes atomic, like database transactions. If you need to back or forward port your functionality to another branch, it's easy. You just cherry-pick a single commit and you're done. If you don't squash, then it's not nearly as easy to automatically move features from one branch to another.
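A sketch of the difference (hashes and branch names are placeholders):

```shell
# Squashed: the feature is one commit, so a backport is one command.
git checkout release-1.4
git cherry-pick abc1234

# Unsquashed: you must identify and replay the whole range,
# and every commit in it has to apply cleanly, in order.
git cherry-pick abc1234^..def5678
```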
This discussion needs to happen because git is fundamentally flawed. There should be no way to change the history but at the same time you should be able to hide the immaterial commits. (Yes, bzr had this. I still mourn over the demise of bzr.)
+1 for the last paragraph. As more people become developers, this convenient feature will save a lot of headaches. At the very least, it saves time for those who don't want to get deeper into GitHub.
Squash isn't for linear history. That's the rebase vs merge-master-downstream debate. Squash is for getting your wip commits out of history, that kind of thing.
Many of us believe there's no such thing as a throw-away "WIP" commit, and if there is, the work should be better broken up and managed by the developer so that they're committing fully considered incremental progress.
The easiest time to catch a bug is when it's hiding in a 10 line diff, especially before you commit it.
It gets progressively more difficult as the scope of the changeset grows.
> to essentially make the git DAG look like a (lie of a) straight-line CVS or SVN commit list.
Err, it doesn't make it look like the "lie of a" straight line, it makes it into a straight line. Whatever the other developers do, I move the project forward one ball of functionality at a time when their changes are useful to mainline.
When you use software, why do you run a "release" instead of whatever happens to be in the dev's directory when they leave for lunch? Don't you feel dishonest getting the version without the bugs?
> Seeing how the sausage was actually made (no rebases, no squashes, sometimes not even fast-forwards) isn't pretty, but it is meaningful
Perhaps if I was going to hire you it'd be interesting to glance at how you work with nobody looking. Do you keep your desk tidy or not?
But it's absolutely irrelevant to the final project and as such, it shouldn't be stored.
> I trust that. It's real and visceral and how software is actually made
You should read Tracy Kidder's _The Soul of a New Machine_, it's a good read about sausage.
But it's not how you should work because you have choice now.
Yes, it tells the story. Does it help to understand the history of the code? Not necessarily. I care most about what came from where and when, not some fiction around that.
I appreciate you using the word "fiction", but I think you have it backwards. The cleaned up post-facto linearization and "cleanup" is writing a fiction, telling a story about the changes rather than actually being the changes. This is great and I realize that that has its uses to build good stories about what our code is.
I'd like to think, however, and I think that this is my larger point in this thread, that maybe we could build better storytelling tools that don't delete/mangle/mutate the actual history so that we can see in the same repository both the story and the raw facts.
This is a bad idea masquerading as a good idea. Before making a pull request (or doing any sort of merge), you should rebase against upstream master (or whatever you're going to push to). However, keeping distinct atomic commits that change one and only one small thing, when possible, is much preferable if bisect or blame is used. If you have broken or poorly written commits, use fixup, reword, squash, etc. in rebase -i.
Using fast-forward (and possibly only allowing fast-forward) is a good idea. Squashing entire pull requests that may change multiple things into a single commit is a very bad idea.
If someone prepares a pull request with a well-structured series of commits, making a logical series of changes, where the project builds and passes tests after each commit, then those commits shouldn't get squashed.
However, I frequently see people adding more commits on top of a pull request to fix typos, or do incremental development, where only the final result builds and passes, but not the intermediate stages, and where the changes are scattered among the commits with no logical grouping. In that case, I'd rather see them squashed and merged than merged in their existing form, and having a button to do that makes it more likely to happen.
> It's also the case that you lose the code review if you force push to a PR's branch after adding in a typo fix and squashing locally, right?
Not as far as I can tell; I've force-pushed pull request branches many times, and the code reviews seem to stick around. (Perhaps they wouldn't if the code changed more drastically, like files disappearing; I haven't tried that.)
I used to feel the same way (and still do to some degree). However I think the issue is more nuanced. I agree that rebasing beforehand is a good idea. But I can see the value in keeping commits on the master branch corresponding to specific features or bug fixes (which presumably map to PRs).
I think the argument can be made that if you don't feel comfortable performing a squashed merge of a PR, then that PR contains too much work and should be split up. However, I don't think there's an easy rule to decide in either case.
Small PRs are an issue because merging them depends on other people, and a PR can't be made dependent on a prior PR.
Let's say we're adding an interface/typeclass/protocol and a concrete implementation. I'd say these should be two separate commits, as they're adding two different things. An interface doesn't require a provided implementation to work. But, if we were to create those as two separate pull requests, it would be more work for the project maintainers, and the initiator wouldn't be able to create the PR for the concrete implementation until the interface PR was merged - the concrete PR can't be added as a dependent PR of the interface one, or something to that effect.
Since you can "compare" almost anything on Github, small commits aren't really an issue, just view a larger-scope comparison to get an idea of the whole PR.
Another way to put this might be that commits are for individual code changes that build up to a pull request, which is a conceptual change?
This pinpoints the major problem exactly. Without dependencies between PRs there's really no sane way (with this feature enabled) to submit a series of commits while expecting those commits to remain separate.
Oh, and I object to the general sentiment in the responses to your post that seem to value drive-by/inexperienced contributors over the "experts". Yes, we definitely should make things as easy/simple as possible for new contributors, but NOT at the expense of adding a gotcha for expert contributors. The experts are what keep a project going over many years instead of just releasing version upon version of trivial spelling fixes.
(And, btw, the default "merge" option for GitHub PRs also sucks. It should be possible to simply disallow non-FF merges and to force all merges to be FF. EDIT: Interestingly, this seems to be about the only workflow explicitly forbidden by the new rules... unless, of course, you're willing to merge everything manually using your local copy of the repo and pushing from that to GH.)
Yeah, that's also the obvious thing missing from the review stage of a pull request: viewing all the commits and their diffs separately but on one page, in the serial order they will actually show up in git log.
This is something that Gerrit supports natively: you can have a Gerrit CL that depends on another CL. It's unfortunate that Github doesn't support any equivalent.
> Before making a pull request (or doing any sort of merge), you should rebase against upstream master (or whatever you're going to push to)
See, and maybe this is because I'm just dumb or something, but I have never gotten rebasing to work for me. Ever. Every single time I do it I read at least 3 articles about it so I don't screw something up, I attempt to do it, and ultimately I lose a bunch of work.
I just don't get it. I can write web, mobile and desktop apps and I like to think I'm pretty good at it. But I'm one of those people who constantly have commits of merges in their code because for whatever reason I just can't get my head around making rebasing work correctly.
Am I the only one? Sorry for the derail but it's bothering me that I've never gotten this to work correctly and I feel otherwise normally smart. ¯\_(ツ)_/¯
1. Always use the "upstream" branch as your rebase target: "git rebase -i master", or "git rebase -i origin/master". This is almost always what you want, and picking the wrong base is the most common error I've seen when teaching people rebase -i.
2. Use autosquash! https://robots.thoughtbot.com/autosquashing-git-commits. If you have trouble with the text-editor interface you get when you run rebase -i, this will both handle its usage, and in the long run give you some visual examples of how the interface is supposed to be used. If you're really into this, set the config option "rebase.autoSquash true" to avoid the extra command-line flag.
3. If you mess up and realize in the middle, git rebase --abort.
4. Use the reflog after the fact for both finding and undoing mistakes: git diff branchname branchname@{1} to check for unintended code differences, and git reset --hard branchname@{1} to undo the rebase.
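Points 2 through 4 in command form (the hashes and branch names are placeholders):

```shell
# Commit a fix destined for an earlier commit:
git commit --fixup abc1234        # abc1234 = the commit being fixed

# Later, let rebase -i fold it in automatically:
git rebase -i --autosquash origin/master

# Mid-rebase panic button:
git rebase --abort

# Post-rebase regret: compare against the reflog, then roll back.
git diff mybranch@{1} mybranch
git reset --hard mybranch@{1}
```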
Thanks! I get the feeling I should give up on using a GUI for most of my git usage as doing many of these seems awkward or impossible with the GUI. That's probably part of my problem.
For what it's worth, I pretty much exclusively use git through Eclipse's EGit UI. I do a lot of rebasing, as we use Atlassian's "centralized" workflow. [0] Its interactive rebase interface is pretty good. If you haven't used it, maybe try loading a repo into it and see how you do.
Yeah. I'd recommend that. Personally, I found that I didn't really understand git until I did it all from the command line. YMMV, but that's what made it all click for me.
I think the advice to rebase runs up against the business pattern of pushing your branch as soon as you create it (git-flow and a lot of jira/stash integrations work like this). Also some teams want to see evidence of your commits as you make them, which means pushing as you commit.
If you have a branch and it's already pushed, rebasing just feels kind of funny and can sometimes cause a lot of problems if anyone else has checked it out.
If you have a branch and it's local only, then merging from mainline into your branch and selecting rebase instead of merge is relatively painless.
One trick that's worked ok for me in a private repo is, before starting to edit the fix-spline-reticulation branch (which has a handful of separate logical changes, fixes discovered midway through a later change that really belong in an earlier change, and temporary debug code that was never meant to go into the product) for publication, to do
git branch fix-spline-reticulation.0
(or .the-next-sequential-number). Then no matter how badly the "rebase -i master" goes, there's a branch tag pointing at the original state, and
git reset --hard fix-spline-reticulation.0
will destroy the failed attempt and restore the branch to its earlier state. (Note that if you decide in the middle of the rebase that you're losing, "git rebase --abort" will undo anything you've done so far; you need the backup only if you regret the rebase after you're finished). It also makes it easy to "git diff my-feature.0..my-feature" and confirm that all the changes in the edited history add up to the same as the real history.
Sometimes I do this in the middle of development to move all the changes intended for the product ahead of the temp debug stuff in case I suspect the debug code is causing problems. Keeping the debug code in the dev branch even after the cleanup rebase makes the diff to check the rebase easier (then, of course, the merge should take the commit just before the debug).
Best never to do let anything but the cleaned-up branch hit a shared repo.
> See, and maybe this is because I'm just dumb or something, but I have never gotten rebasing to work for me. Ever. Every single time I do it I read at east 3 articles about it so I don't screw something up, I attempt to do it and ultimately I lose a bunch of work.
Rebase takes a little bit of practice, but everyone who's using git owes it to themselves to learn it by heart. It's almost like having superpowers compared to any VCS which doesn't have rebase.
My advice[1] would be to simply create some dummy repository (perhaps just copy an existing repository with some real code) and going through various scenarios described in the git-rebase man page (using some trivial changes). If something blows up, don't worry, you can always just start from scratch.
The key to making rebase work for you is: 1) understanding the underlying model of git[2], and 2) practice, practice, practice. With enough practice you'll get a good feeling for which "type" of rebase works best in a given situation.
[1] In addition to the excellent advice given by others in this thread.
[2] It may look like it's really all based on snapshots of files, but the workflows are definitely mostly centered around patch-based thinking.
I know I'm probably swearing in church, since git is the current de facto standard for version control, but this shows that git's usability is way too low. Why do I have to invest so much time in understanding the inner workings of a tool that should just help me collaborate with my coworkers?
I've given up on understanding git and use gitflow and the built-in tools in IntelliJ IDEA for all my branching/merging/committing needs.
Because this is a quite advanced feature, and creating an interface for it is inherently hard. IIRC, the IDEA interface for interactive rebase ends up looking exactly the same as the text-mode one.
Version control is hard, simple as that. Git actually does a great job of keeping simple stuff simple, but if you need some of the more complex stuff... well, I guess you could always go back to pen, paper and a secretary (aka: make someone else do it for you).
This is completely unnecessary. Everything git does, git can undo as long as your working tree is clean when you start the rebase. git reflog and git reset are your friends, if you want to "get back to where I was before I started this awful merge".
In tricky situations, I always commit work done. Then I attempt to do potentially harmful work. Note that afaik you can't lose commits in your history (they may be hidden, but reflog to the rescue). If I am very unsure whether something will work as intended, I place a dummy branch (a tag will do as well) onto that safety commit which will make it easier to find it back (in that case you don't need to resort to reflog). I never lost work once committed even when I painted myself into a corner. Note as well that rebase -i will always create a new commit rebased onto the entry commit. Going back to where you started is always possible.
I think that GitHub's pull-request-based model is fundamentally broken. Gerrit's model, where every commit is quasi-independent (and hence must pass tests) and you can easily edit without force-pushing anything or losing review history, is superior (though not perfect) in almost all cases. (Exception: merging a long-running feature branch where all the commits in the branch have already been reviewed.)
This is GitHub's attempt to solve the problem without really changing anything. It won't really change anything. Since pull requests routinely contain a mixture of both changes that should be squashed (fixups) and changes that should not be squashed (independent changes), this just means that you get to pick your poison.
I used to strongly believe what you do until my company started using Phabricator, which forces the squash workflow on you. It makes your history more useful, not less. The pull request is the appropriate unit of change for software. Make small commits as you develop, then squash them down into a single meaningful change to the behavior of your software.
As a git novice I wonder, doesn't a proper workflow do the same thing? When I submit a feature branch it might have a lot of ugly commits. However, once I merge it to an integration branch there is one nice commit explaining what I did.
When coworkers create pull requests I don't go through all of their commits and changes along the way. I just look at the diff, so I don't see the need for them to squash it first.
Sometimes yes, sometimes no. Merge commits are nasty imo and I'm glad we can now forbid them outright, but a full squash isn't always the solution either.
Lots of back & forth. All the commits are related, and the PR is there to land all of those commits at once. I could land some of them right now (as they're safe to land), but keeping them in the PR keeps everything related in the same place (and none of them are required until that last commit lands).
A PR mirrors a "patchset" on mailing lists. You don't always want to squash all of it.
What you do want to avoid is a situation like this:
Where the original author creates their original commit and doesn't know how to --amend + push --force to the PR, and you end up with a ton of commits which you don't want to land all at once.
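For reference, the incantation those authors are missing is short (the branch name is a placeholder); --force-with-lease is the safer form of --force:

```shell
# Fold review fixes into the previous commit:
git add -u
git commit --amend --no-edit

# Rewrite the PR branch on the remote (refuses if someone
# else pushed in the meantime, unlike plain --force):
git push --force-with-lease origin my-pr-branch
```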
To clarify, pull requests shouldn't be squashed before review. They should be squashed when they get merged in, like this GitHub feature does. The granular commit history is useful during code review. It is not useful as a future developer trying to evaluate how the behavior of the program changed over time.
Right, sometimes. Pull requests are not necessarily a "unit of change" like you mentioned, though. For example, the first link I gave should not be squashed. But I don't want it to create a merge commit either.
I'm a little underwhelmed with the feature, it looks like it's either "squash everything" or "make a merge commit". There's no option to rebase & merge and/or selectively squash.
Exactly. In order for always-squash to work, every PR must be the smallest possible atomic change. But oftentimes certain features don't decompose into nice small atomic changes. It's sometimes useful to see that as a development-process smell and consider using feature flags and other things to incrementalize the development, but sometimes the cleanest thing to do is just have a series of commits. I don't see a good reason to take this option off the table, since source code history management is truly a craft in its own right.
FWIW, I prefer a carefully curated rebase and then merge --no-ff so that you can still see the branch and get at the branch diff trivially, but the history is still linear for practical purposes so bisecting is clean, etc.
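That workflow in command form (branch names are placeholders):

```shell
git checkout feature
git rebase master            # curate: history is now linear on top of master

git checkout master
git merge --no-ff feature    # force a merge commit even though FF is possible
# "git log --first-parent" now reads as one entry per feature,
# and "git diff HEAD^1 HEAD" is the whole branch diff.
```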
Interesting. It's the first time I hear a fairly solid argument in favour of --no-ff. My main argument against it is keeping a clean commit log on github itself (and the default git log), as it's an entry point for new contributors.
I think your first link should be squashed once the code has been reviewed. When looking back on history, commits are most useful when they're a list of behavior changes in the software. That pull request "exports meshes as OBJ." That's what's useful for future developers, not "add Transform fields." Leaving those in your history makes it harder to work with, not easier. If someone cares about the back and forth that happened within that behavior change, the pull request is always available.
"Add transform fields" may not be a very descriptive commit message, but those fields are fully independent from the OBJ exporter. They are the implementation of unity's Transform class. This implementation was stubbed before, it happens to be needed for an OBJ exporter to go through, but has nothing to do with the exporter itself.
It seems like it'd be nice to have two levels of granularity exposed in views of a source control system's history, basically corresponding to pull requests and commits. So you could drill-down to individual commits as needed, but would normally be able to work at the PR level.
I suppose it does, though at least the tool my org uses doesn't write good descriptions on the merge commits that are created upon merging a PR — they're just like "Merge pull request 190 from …" when I'd prefer it to be named for the changes that are actually in that PR. Good to know that exists, though; it could be useful. Thanks.
But the PR isn't retained (afaik) in the repo's history as a distinct entity (I'm talking about git proper here, not extra tools like GitHub, etc). In the end, git just has commits. [Edit: as prodigal_erik points out, perhaps merge commits are really what I need to be looking at.]
Sure, it's a terrible feature to always use. And it's likely to be of little use to contributors who know how to use Git well. But in large open-source projects, often new contributors make a small change that needs a few minor corrections. Eliminating that final back-and-forth ("squash please") is a huge win for maintainers.
There's no guarantee that every individual commit of a feature branch is meaningful, or even builds. It also makes the history of the master branch a lot harder to read when it has tons of commits representing the minutiae of the feature's development.
It really depends on each individual's workflow. I tend to use lots of "in progress" commits (each time things are "green"), and as I go, I regularly squash the commits, so the final pull request typically has several commits (if I weren't squashing, it would have a dozen). If I do a feature and a refactor, they are always separate commits; it's easier to review them and to bisect if something turns out to have gone wrong.
Some people might do similar things, but they might not ensure each commit is green, and they never squash anything (so you end up with non-meaningful commits).
As @3JPLW said, I can see how it's useful for open-source maintainers to have the option to squash someone's commits when the change is small but there are many commits (due to review ping-pong, etc.).
1. branch off master
2. work, commit, push, test (on CI server)
3. decide it's time to ship
4. rebase -i, push, test (again)
5. git checkout master && git merge --no-ff feature_branch
(make the merge commit message a summary of the feature)
master ends up being a list of feature branch commits, bookended by the merge commit which introduced the feature. Getting the squash commit diff is as easy as 'git diff feature_branch_merge^..feature_branch_merge'.
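A minimal, self-contained sketch of that workflow, run in a throwaway repo (file and branch names are illustrative, and the push/CI steps are omitted since there's no server here):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
git checkout -q -b master
echo "base" > file.txt
git add file.txt
git commit -qm "initial"

# 1. branch off master
git checkout -q -b feature_branch
# 2. work, commit (pushing and CI testing omitted in this sketch)
echo "change1" >> file.txt
git commit -qam "feature: part 1"
echo "change2" >> file.txt
git commit -qam "feature: part 2"
# 4. here you'd run 'git rebase -i master' to tidy the commits
# 5. merge back with an explicit merge commit summarizing the feature
git checkout -q master
git merge -q --no-ff -m "Add the feature (summary)" feature_branch

# The squash-equivalent diff is just the merge commit against its
# first parent:
git diff HEAD^..HEAD
```

The final diff shows the whole feature in one view, while the individual commits stay reachable through the merge commit's second parent.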
Squashing entire pull requests that change multiple things into a single commit is a bad idea, yes. But uploading and asking for review on such wide-reaching pull requests is a bad idea in the first place.
Using fast-forward without squash is also a bad idea in many cases: the string of commits may contain multiple points that don't actually build or pass tests, even if the final commit in the chain fixes all that. There's no point in landing those broken commits, and doing so will confuse bisection tools.
Fast-forward with squash, and enforcing reasonably sized code reviews as a matter of culture, is the best of all worlds in my opinion.
Rebasing seems to clutter the Github PR's commit history and diff with all the commits to master that were made between the time the branch was cut and the time the rebase happens. But it doesn't do that if you merge in master. I never understood this.
It's a bad idea because it's a bad implementation. If it allowed you to select what to squash, defaulting to the behaviour of git rebase -i --autosquash master then it would be a clearly good feature.
This is a presentation issue masquerading as a data issue. If somebody suggested deleting data because a report was ugly, they'd be laughed out of the room.
Give us tools to mark commits as unimportant or group them together as a meta-commit object for history purposes.
> If somebody suggested deleting data because a report was ugly, they'd be laughed out of the room.
If somebody suggested keeping data that showed every step of a person writing a report, they'd be laughed out of the room.
It's gray area that varies significantly by feature complexity and team dynamic. I find it frustrating when someone merges a branch with 1 real commit and 3 more one-line commits or trivial fixes that should have been in the first.
So that's essentially what mercurial's evolve extension does. Which is pretty useful in some circumstances, e.g. if you have a review tool where you want to support rewriting history in a nice way (the typical approach to this in git tools is either a) don't meaningfully support history rewriting, b) require manual ceremony for each history rewrite, or c) add unique ids to each immutable-across-rewrites commit).
Whilst it's clear that retaining the pushed history can be useful in some cases, I don't understand the notion that disallowing history rewrites helps retain useful data in general. Developers can make commits for all kinds of arbitrary reasons, e.g. they reached the end of the day, or had to switch branches to work on a different patch. That doesn't seem like a particularly useful thing to record.

To take it to an extreme, I wonder if any of the people who think that the precise commit history gives useful information about how a feature was developed have configured their editor to commit on every keystroke, since that seems like the logical conclusion of that position (and is effectively what tools like Etherpad do). I suspect not, because being selective about the information that you keep is actually rather helpful.

During rebase we see the developer as curator, selecting the most useful representation of a set of changes for the benefit of future readers. Or, to use your analogy, if someone suggested removing meaningless data from a report to focus attention on the most important points, they would certainly not be laughed at, but praised.
Of course GitHub's implementation is too blunt a tool to be really useful, but hopefully we will eventually get something better.
I hate squashing branches. There's a lot of value in commit messages; they're educational and are a form of documentation. It's on the developer to squash the "oops" commits, during a rebase, rewriting the commit message so it has some value when going back in time to look at changes.
I'd love to see a commit linter that points at commits with text like "oops" and "fix my derp" to suggest possible commits to squash.
Git history shouldn't resemble a hot mess, but the evolution of code should be pretty granular. I'd take the hot mess over full squashing, though.
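A very rough version of that linter could be a grep over the log; a sketch in a throwaway repo (the patterns and commit messages are made up):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
git checkout -q -b master
git commit -q --allow-empty -m "Add OBJ exporter"
git commit -q --allow-empty -m "oops, forgot the header"
git commit -q --allow-empty -m "fix typo in comment"

# The "linter": list commits whose messages suggest they are squash
# candidates rather than standalone changes.
git log --format='%h %s' \
  | grep -iE 'oops|typo|derp|wip|fixup' \
  || echo "no squash candidates"
```

A real tool would probably also want to scope the search to the branch being reviewed (e.g. `master..HEAD`) rather than the whole log.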
If I feel like the history of a branch is important, I add a git tag to the branch head before rebasing. I have a naming convention for these tags.
Then I usually do a rebase -i on to the main branch.
Perhaps I'm greedy, but I'd also like an additional option to rebase without squashing ...
Also, it's not clear if it is possible to disable the merge button completely. I prefer to use the command line to rebase and fix the details in the commits, but the big green "merge" button is always too tempting and it's easy to press it by mistake.
This is a huge issue for me both with GitHub and Atlassian Stash.
On both, you cannot even see when the branch was started, and you can merge fine as long as there were no merge issues.
Then you run gitk and it looks like a spaghetti-horror, with trivial branches being started 200 commits ago.
I don't necessarily need a "rebase & merge" button, but at least some info about the shared ancestor with master.
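For what it's worth, that shared ancestor is exactly what `git merge-base` reports; a sketch in a throwaway repo (branch names are made up):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
git checkout -q -b master
git commit -q --allow-empty -m "initial"
git checkout -q -b trivial_branch
git commit -q --allow-empty -m "branch work"
git checkout -q master
git commit -q --allow-empty -m "master moves on"
git commit -q --allow-empty -m "master moves on again"

# The commit the branch was started from (shared ancestor with master):
git merge-base master trivial_branch
# How many commits master has gained since the branch was cut:
git rev-list --count "$(git merge-base master trivial_branch)..master"
```

That second number is the "branch started 200 commits ago" figure the web UIs don't surface.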
This is a feature I've wanted for such a long time. While it's perfectly feasible to do through the command line, I've always found myself having to force push to update the history on github's side of things.
I typically did it through git rebase -i HEAD~N, so maybe someone here on HN knows of a better way to squash commits when updating remote history. Admittedly, it seems that rewriting remote history with a squashed commit isn't entirely attractive behavior, and that's why I was forced to force push.
Two things you can do differently, if you feel this would be appropriate.
Instead of git rebase -i HEAD~N, I typically use git reset --soft HEAD~N followed by git commit.
Instead of git push --force, use git push --force-with-lease. This only updates the remote if the remote's current state is what you expect. For example, it will fail if someone else has pushed to the same branch in the meantime.
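Putting those two suggestions together, a sketch with a throwaway local "remote" standing in for GitHub (names are illustrative):

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/origin.git"
git clone -q "$work/origin.git" "$work/repo" 2>/dev/null
cd "$work/repo"
git config user.email demo@example.com
git config user.name demo
git checkout -q -b master
git commit -q --allow-empty -m "initial"
git push -q origin master

# Three messy WIP commits, already pushed:
for i in 1 2 3; do
  echo "step $i" >> feature.txt
  git add feature.txt
  git commit -qm "wip $i"
done
git push -q origin master

# Soft reset keeps all the changes staged while dropping the commits...
git reset -q --soft HEAD~3
git commit -qm "Add the feature as one clean commit"

# ...then rewrite the remote branch, but only if nobody else pushed
# to it in the meantime:
git push -q --force-with-lease origin master
```

If someone had pushed to the branch between your last fetch and the final push, `--force-with-lease` would refuse instead of silently clobbering their work.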
Usually I'll do git rebase -i origin/master (or upstream/master, or wherever I branched off from in the first place). It doesn't require me to count up N commits, and it also does an actual rebase.
But yeah, force push is kind of inherent to the process - you're rewriting history, no two ways about it. Usually for my own forks of projects, though, I'll go into .git/config and add a 'push' option with a + in front of the refspec - this enables force pushing always. This really only works and is safe for workflows where there's a distinction between your own personal GitHub account and the upstream that has the authoritative copy - you really don't want to rewrite history by accident on the latter.
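For reference, the kind of `.git/config` entry described above might look like this (the remote name and URL are made up); the leading `+` on the push refspec is what permits non-fast-forward updates without typing `--force`:

```ini
[remote "myfork"]
    url = git@github.com:example/project.git
    fetch = +refs/heads/*:refs/remotes/myfork/*
    # The "+" below allows history rewrites on every push to this
    # remote -- only safe for a personal fork, never the upstream.
    push = +refs/heads/*:refs/heads/*
```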
Oh god no. I made the mistake of moving a team to squashed commits once. The lack of individual commits poses large problems down the line. 2 nonstarters come to mind:
1: It completely ruins your ability to git bisect any bug introduced on your branch. Instead of getting a 10-line commit, bisect will point you to hundreds or thousands of lines instead.
2: All code will blame to a single person. Code with 6 people on a large branch? Want to git blame the code to see who wrote the function that is weird looking so you can ask questions? Too bad.
Do not squash branches on teams. One of the biggest mistakes of my professional career.
This is for a "pull request", namely an individual logical change. Your two points don't apply. If you're dealing with whole branches with multiple developers, stick with merging, but this isn't that.
point 1: merge smaller features more often. Multi-thousand line PRs suck no matter if they're squashed or not.
point 2: As people mentioned, this is for PRs done by individual people, usually squashing their local history. If multiple people were working on the branch, they should have been PR-ing against that branch (squashing the commits), then you merge that branch to master (you may rebase it, but not do significantly destructive stuff).
Even then, these are features to use with critical thinking. If your PR is really large, massage the history to be relevant and meaningful for git bisect purpose. Use full squashing when it was a local work full of "My hands are typing words" commits of no relevance.
I swear, we're in an industry full of people making 6 figure salaries, who treat their job as if it was a call center script to follow. If you're an engineer, use your head and engineer solutions around the tools you have.
Your experience sounds horrible, but you were clearly doing something very stupid.
Squashing is a tool to use with discretion. You squash two or three 5 line wip commits together. And I'd never squash changes from two authors (except perhaps whitespace or comment fixes).
A feature branch needs cleaning up before publishing. These bug and typo fixes need to be squashed in the branch, but the history should preserve a sequence of commits corresponding to atomic changes, because it tells a story.
A commit should be small enough to make it easy to check that the change is correct. It should be self-contained so that it can be moved around, cherry-picked, etc.
This is why the local commit history should be considered as a draft of the story, and the one published should be the official one aimed to be easily readable, verifiable and manipulable.
Any plans to allow for squashing but with a merge instead of a fast-forward or fast-forwards without squashing?
CI will still run against the hypothetical merge commit, no? I wonder if there are edge cases where merge vs squash+fast-forward would result in different conflict resolutions and different trees, so master could end up with a tree that didn't have tests run against it.
> master could end up with a tree that didn't have tests run against it
Isn't that already the norm? I.e., most projects run CI on an unmerged PR, then merge to master, then run CI on master to see if the merged tree is actually good. Are there projects which test a PR's post-merge tree before updating master? That would be nice, but I don't know of any tools which support that workflow, and it wouldn't scale forever since it would involve a serial merge queue.
Travis CI, for example (and I'd imagine most CI servers), runs tests on the `MERGE_HEAD` of the PR, which, like lotyrin said, is the hypothetical merge commit. If a PR is opened from latest master, CI runs on `MERGE_HEAD`, which is the same as the merge commit after the PR is merged. But if something was pushed to master after the feature branch was branched off, then it's not the same commit. That's why it's recommended to run CI again after merging. But by default, tests are not run on the PR HEAD, which is the tip of the feature branch.
Call me naive, but wouldn't this problem be best solved by requiring that commits represent meaningful and working increments of work?
I use 'git add -p' judiciously and only commit when having reached a point where something can be usefully said to be in some way "done". Sure, it's not perfect, and occasionally I end up having to do some cleanup of miscellaneous printf statements, debug values or typos in subsequent commits, but this is something that should really be avoided if possible.
Waiting until you have something useful has other drawbacks. For example, it might take many hours or days to get to a 'useful' and 'done' state - I don't want to go hours or days without saving my work in a manner that is easy to retrieve if something goes wrong.
Besides the obvious 'hard drive failure' or 'laptop stolen' situations, there are also more frequent situations where 'oh shit, I went down a totally wrong path there - let me back up a bit and try that again.' Git commits are little save points that let me do small experimentation and go back if I need to.
Very good points. I guess I've just been too lazy to learn advanced rebasing / commit aggregation. Maybe I should, so I can more effectively take advantage of the upside while not polluting my commit history with intermediate crap.
I'm waiting for the code review tool that takes a massive squashed PR and cuts it up into a series of clean, atomic commits that can be reviewed individually.
I believe Phabricator supports this workflow. Create a revision for a code change - as more commits are made, update the revision with those changes. Users can see what changes were made between commits and hold discussions on them. When it's ready, landing can (optionally) squash the diffs into a single commit upstream. The main history will be linear and more-complete changes, while detail about the development of the change is retained in Phabricator revision.
Yeah, this is dumb. Being able to effectively bisect large projects depends on having smaller commits. If your commits are all huge, your bisect will hit some 500 line patch that does 5 different things -- it'll easily quadruple the debugging time. Not to mention that having small self-contained commits is a good thing.
This, like rebase, is a great idea for people who would rather use SVN instead, but have to use git for some reason or another.
I mean, if you want your history simplified into a linear squashed series of commits, why the hell do you use a VCS which models history as a directed acyclic graph in the first place?
Maybe I'm missing something, but I've never understood the point of cleaning up commit history. The only time I ever look at commit history past a few days is if I want to know when something broke (in which case I WANT every part of the history, even the messy bits), or if I'm trying to refresh my memory of what I've done for the year (for performance reviews or whatever) in which case I guess it's marginally useful, but it's pretty easy to skim over "fix build" and such.
As this thread approaches 300 posts I'm wondering when we're going to get out of this Git Tarpit we've somehow got ourselves into?
To be charitable: Git seems to be a good tool designed for Problem A, being widely used for Problems B,C,D for which it is a fairly poor choice.
I _think_ we got here through some mix of "but it's really fast!"; "I can do _anything_!!"; "But Linus says its great!"; "I don't need to pay for a beefy server any more!"; "New _must_ be good, right?".
In any event, how do we get out the ditch and back to work making software vs. trying to reason about the unreasonable? I personally, for the kind of projects I'm involved with (small teams, all paid by the same piper, with aligned clear goals and competent coders), had perfectly satisfactory revision control systems since around 2000 (except when required by employer to use Clearcase..). It would be nice to get back to that future.
Annoying: you can't turn off both. If your project has a workflow where the webui merge should not be used (e.g. using signed merges) there is still no way to achieve that.
I contributed a little to JUnit and they ask you to squash your commits before making a pull request. It took some time to do; it's so confusing/weird using regular Git.
Awesome! Phabricator got me addicted to the clean history of squashing commits and the logical changes staying together in source control (as opposed to grouping in a PR).
It seems to me that the scope of this article is pretty narrow (in a good let's avoid a flame war way). And it describes one of the few scenarios where I think squashes are beneficial.
But then the conclusion doesn't quite add up. If I want to remove all the merge mess from a PR, don't I actually want to rebase the PR, not merely squash the history? Or did I miss the point he was making?
This is such a nice feature. Thank you for working on it. Keeping history clean is important, especially in enterprise solutions where you have to support multiple releases. You want to select certain commits/features for one version. With merge/squash, you get a cleaner history, and it is easy to pick the commits you want.
I believe there should be more open communication and a record the public can fall back on while these developers make their improvements. Something in their face. Too many times did I track back and see unprofessional comments being made and things being done that seemed downright suspicious.
If it were up to me, it would be a lot harder to get a developer's license; you would have to meet regulatory standards and have degrees to uphold your professionalism while acting as your own developer. It seems to me that a lot of developers have taken things into their own hands and are trying to make a quick buck any way they can. Don't be surprised if you tell your mother or father or sibling to look at what licenses they have agreed to on their phones and they find a lot of outdated, unassigned licenses backing up their privacy. It seems to be an epidemic, and how are we going to stop it?
This obsession with "clean" history seems to me nothing short of insanity. Source control has one job: to record the history of code in order to be aid in figuring out what happened when things go wrong. If everything always went well, we'd just have great merge tools and throw away the history.
I can understand a desire to filter out stages of a project by different levels of review (I just ran the build and tests passed so I committed vs a bunch of people reviewed it so I merged), but that people solve that by deleting or rewriting history to be something different from what actually happened is just nuts.
Is it just that git makes it easier to change history than to add metadata for filtering? Why have we not seen presentation tools to solve this problem rather than what seems like a ubiquitous readiness to alter and throw away the messy but accurate facts about what happened?
Because the lack of squashing actually makes finding bugs harder. Which checkin with the status "change i to l" or "whoops, typo" was the bug in?
It's a lot easier when an entire changeset has a single checkin into master, because then when you're doing your bisection there are far fewer changes to bisect.
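Either way, this is where `git bisect run` earns its keep: given a check command, it finds the culprit automatically. A sketch in a throwaway repo, where a deliberate "bug" lands in the middle of the history (file names and the check are made up):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
git checkout -q -b master
for i in 1 2 3 4 5; do
  echo "ok $i" > state.txt
  if [ "$i" -ge 3 ]; then
    echo "bug" >> state.txt    # the "bug" first appears in commit 3
  fi
  git add state.txt
  git commit -qm "commit $i"
done

# Bad = current tip, good = the first commit. The check command exits
# 0 only when state.txt is free of the bug.
git bisect start HEAD HEAD~4 >/dev/null
first_bad=$(git bisect run sh -c '! grep -q bug state.txt' \
  | grep "is the first bad commit" | cut -d' ' -f1)
git bisect reset >/dev/null
echo "first bad commit: $first_bad"
```

The smaller the commits, the smaller the diff this hands you at the end, which is the parent's point.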
The key principle is that the software should always correctly work at any point in the chain of commits, so you must squash commits that are "oops, fix X in previous commit".
Once that is satisfied, commits should be as small as possible, so that information about the grouping of changes is preserved.
After completing a feature (or part of one) I often "squash" manually by performing a mixed reset from the tip of the in-development branch back to a good parent commit.
This leaves all of the feature's changes in your working copy, which you can then stage by line/hunk and individually commit in clean, atomic pieces with the benefit of already having written the final code.
This removes the in-progress commits like a Squash, but pieces of code can still be brought in as individual easy-to-review commits. And it's potentially easier to understand / perform manually than rewriting history via a git command that would do the same thing.
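A sketch of that manual squash-and-recommit, using whole files instead of `git add -p` hunks so it runs non-interactively (the repo and names are made up):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
git checkout -q -b master
git commit -q --allow-empty -m "good parent"

# Messy in-progress history touching two logical areas:
echo "parser" > parser.txt
git add parser.txt && git commit -qm "wip"
echo "printer" > printer.txt
git add printer.txt && git commit -qm "more wip"
echo "parser fix" >> parser.txt
git commit -qam "oops"

# Mixed reset back to the good parent: all changes stay in the
# working tree, nothing is staged, the wip commits are gone.
git reset -q --mixed HEAD~3

# Re-commit in clean, atomic pieces (git add -p would let you do this
# per hunk; per file is enough for the sketch):
git add parser.txt && git commit -qm "Add parser"
git add printer.txt && git commit -qm "Add printer"
git log --oneline
```

The end state is the same code, but as two reviewable commits instead of three "wip" ones.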
This is great! But what about fixups? I often prefer a fixup to a squash when the commit in question was a WIP or something, and I don't have any desire to preserve the message in the final commit.
This is wrong at so many levels. Of course we don't care about "progress" or "woops I fucked up" commits, but people also make commits for things like coding style, and these commits should be kept separate from the others, and DO NOT warrant a separate PR. If it is about changing all tabs to spaces in a file, for instance, different PRs will be hard to work with, because of conflicts. Also, people who write WIP commits should learn how to squash them themselves, and give them a meaningful commit message.
Usually when squashing branches locally, all the changes are set to myself as the author, effectively losing history of who did what in the feature branch. How does Github handle this?
Additionally, the lack of the commit history for that branch often caused merged conflicts when merging the same branch again (if it had additional new fixes, for example). That's why I switched to using git merge --no-ff.
If it were very easy to do, I would definitely do a git squash, as it keeps history very clean. I just don't want to have the problems listed above.
I think squashing commits makes sense if you're working in a big team and/or on a complex/large project - The main advantage of it is that it speeds up the QA process because it cleans up all the back-and-forth (exploratory) changes that tend to happen during development.
If you squash properly, each commit will represent a small standalone feature.
Squashing is not an alternative to merge workflow. It's what you do to clean up before you integrate your work (whether by rebase or merge). You've just made seven changes, which should just be three: git rebase -i HEAD~7, squash away. Okay, now you have three. rebase them on top of new work in the upstream branch, or merge? Separate question.
This is great news for us as we prefer this flow. However, we actually mostly merge via our custom tool which uses GitHub's API. I can't find anything in the documentation about whether these options apply to API merges or if there's any query params that can achieve that behavior from the API. Anyone know anything about this?
I personally like this and it fits well with my team's workflow (though I am a bit concerned it will prevent engineers from learning to do these things with git). There are pros and cons for sure but I think if you are losing a lot of resolution by squashing then the scope of your pull requests might be too big in the first place.
The option to avoid merge commits when merging pull requests misses the point, but apparently many users demanded it for some reason, so Github implemented it.
Here's how merge commits, rebasing and isolated small commits work:
You branch your topic off master and make many commits while working on it. This is all local.
Once you're ready to publish for review/integration, you squash fixup and backup commits into a coherent patch set, where each and every revision builds and works. This is where you can rebase and squash for good reason. Now you can push to Github.
If you, like Phabricator, create a single big commit with all changes lumped together, then it's impossible to bisect and follow the thought process behind changes. Try to git-bisect linux.git vs a repo that's been managed with Phabricator and its mega commits.
With small commits that each make one coherent change, you can easily include relevant explanations in the commit message, which is much harder to achieve with a single mega commit. Further, it's very simple to follow along the development process of changes with separate commits. If you have one big diff, it's hard to understand the changes of a branch, whereas reviewing small commits with an explanation in the message and the overall reduction in diff size makes it much, much easier to understand for reviewers.
With separate small commits you review each step and finally arrive at the complete feature implementation at the end. For someone who has to review code they didn't write this saves a huge amount of valuable reviewer time for actual reviewing than trying to reverse engineer the steps taken in a big diff.
Moreover, with multiple commits, you can easily approve of some of the commits, while requesting improvements for others.
Gerrit implements this well and the process is what linux and git and other projects use when reviewing big patch sets. Set is the important word.
Finally, why do you want merge commits? Unless you always make a single mega commit ala Phabricator or the new Github feature, having merge commits provides a very practical way to see that a set of commits landed via foo-branch-X. If you've ever viewed a git log graph, that's the interesting integration points, which you will lose if you omit merge commits. In a merge commit you can also include extra stuff as part of the merge commit itself, so it's not just useless metadata.
Do any VCSs have a notion of a commit of commits? If you could group a series of sequential commits into one commit on trunk it seems like you could have the best of both worlds: an overarching commit for your change and a series of how the sausage got made.
Rarely do conversations around these parts get as heated as they do when git process comes up. As I read through these comments, only one thing surfaces: everybody organizes their shit differently. If you're here trying to sell your process, why?
A bit OT but a feature I would love to see in git is to be able to see which branch a commit came from. Especially for shortlived branches which we prune periodically on the server (git branch -r --contains won't work as a result)
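If merges aren't fast-forwarded, one known (if somewhat fragile) trick gets close: walk the ancestry path from the commit to the tip and take the last merge listed, which is the merge that brought the commit in. A sketch in a throwaway repo (branch names are made up):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
git checkout -q -b master
git commit -q --allow-empty -m "initial"
git checkout -q -b short-lived
git commit -q --allow-empty -m "the commit to look up"
target=$(git rev-parse HEAD)
git checkout -q master
git merge -q --no-ff -m "Merge branch 'short-lived'" short-lived
git branch -q -d short-lived    # branch pruned, as in the parent comment

# The last merge on the ancestry path from the commit to master is
# the merge that introduced it:
git log --merges --oneline --ancestry-path "$target..master" | tail -1
```

The branch name only survives because the merge commit message recorded it, which is one more argument for --no-ff merges with decent messages.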
No reason why developers should not be liable for their work. If you ask me, there should be a process for handing out open licenses, and from what I have seen, Apache gives anyone the right to do what they want. I would love to see stricter laws when it comes to third parties and open-source licensing, including the chatter back and forth.
Actually Bitbucket Server (née Stash) already supports squashing and a variety of other pull request merge strategies. They're just configured[0] per repository, project or globally rather than an explicit option at merge time.
If everyone on your team actually knows how to use git, much better to let them rebase their commits and mark out a series of clean, atomic commits which introduce the feature you're reviewing. If you have people who are incompetent at using git on your team, this feature will help protect your history from them.
This unfairly places the blame for Git's utterly shitty UX on the part of the users. When you have thousands of users who struggle to use a tool correctly, it's the tool's fault, not theirs.
I've been using Git for years, work professionally full time on an open source project that lives on GitHub, maintain several open source projects with a number of committers and generally live and breathe Git and GitHub all day.
I still fucking hate rebasing and get tripped up by it on the few times I end up having to deal with it.
> I still fucking hate rebasing and get tripped up by it on the few times I end up having to deal with it.
When I had no understanding of what was going on, I didn't like it either. Now that I use it frequently, I understand it better, so I don't hate it anymore.
I like running `git rebase <main-branch>`, where <main-branch> is typically master, in my-new-thing branch because it lets me deal with any conflicts from upstream one by one.
I also like running `git rebase -i` in my-new-thing when I have a bunch of commits with redundant messages that I want squashed into a single commit before I push the changes. Basically anything that requires messing around with a range of commits is a good use case for `git rebase -i`.
Why do you hate it so much? There's really not much going on that you should have to hate. To me it's like a bunch of small, compartmentalized merges.
Genuinely curious, why the love for the `git rebase -i`? I find it much easier and more intuitive to squash a set of linear commits on a branch using `git reset --soft HEAD~n` where n is the number of commits to go back and "undo". Then make one more commit and you're off to the races. Note: this is not useful if force push is disabled and you've already pushed some of the commits to the remote (feature) branch.
In addition to bpicolo's point below, things quickly break down if your commits have file removal or renames. I've gotten bitten more than once and now always resort to git rebase [-i] to avoid this.
The problem is that the tool is both great and terrible. It's basically like a Formula-1 car: ultimate performance, but with huge potential for disaster in the hands of someone who doesn't know what they're doing. It doesn't help that the tool has been growing organically over time, leading to a really confusing UI. Also, it's extremely versatile, but that's both a blessing and a curse, as people can develop totally different and incompatible workflows.
Maybe people need to create some command-line front-ends to git which force users into particular workflows and subsets of the commands, and which clean up the UI, much like how almost no one uses TeX, but instead they use front-ends like LaTeX.
Git is a great data structure with a pretty bad tool to manipulate it. That said, what MBlume said is that one should learn to use the tools one uses (and if you're using Github, you're using git), which is not a defense of git in itself.
Git's usability is a mess. I wouldn't judge anybody on their ability to use Git.
Being subjected to Git at work makes me long for the days when I used Lotus Notes for email. Sure Lotus Notes is a usability disaster, but at least you get the sense they were trying to make things work. And, to be fair, it got slightly better each version. Using Git makes you feel like it was developed by people who simply hate you and want to see you fail.
Confirmed as not a joke. (The change is subtle as you have to click the green Merge button before being presented with the ability to choose a merge type via a combo button/dropdown component.)
This comment breaks the HN guidelines by calling names, and unfortunately a number of your other comments have done this as well. Please post civilly and substantively or not at all.
Everyone is a bit of a stretch -- the OP might have been a bit insulting, but he has a point. Hacker News can be something of an echo chamber, and I think people assume Git is (or should be) a lot more widespread than it actually is. A lot of large companies still use Perforce for large-repository support, and a lot of people still use Subversion because it's simpler and they don't need a DVCS. Having been the "git guy" at a lot of places, I can definitely tell you that you can't just hand-wave away its usability problems; it has a real cost on people's time that other systems don't. (I rarely have to answer other people's questions about P4, for instance.)
Well, having managed 1000+ clients a few jobs ago, I can tell you the majority of companies these days use Git. With the proper hooks and gitflow, I never found I had to tell users much once the workflow was understood.
At my last job we had about 40 devs in India spread across different cities. We re-wrote much of Android, and our Git repos were massive. This meant devs needed a local repo and couldn't be expected to interact with a remote for much beyond pushing commits. Only once, with all these scattered developers, did history become a mess and force us to rebase and clean things up. I think a lot of people fail to understand how to implement Git properly and make it overly complicated for daily use.
> I can tell you the majority of companies these days use Git
The majority of companies you've interacted with use Git.
I'm guessing you're in the web development sphere. If you were in the game development sphere or the large-company sphere, things might look very different to you.
Look, I like git, I use it for my personal projects. But I'm also watching the company I work for transition to it, and I've watched people use it in past jobs, and I can tell you that I've seen a lot of smart engineers become entirely frustrated with that system.
At the moment we're on perforce, and while I think git is objectively better than perforce, we have a lot of engineers who are completely happy with perforce and are annoyed with the transition. And you know what? I can't blame them. Git does a lot of things better, but not necessarily things that make these people's day-to-day lives easier, and they've (correctly) concluded that this transition is going to require them to learn something difficult that won't really give them any direct benefit.
The idea that git is universal I think comes from the success of github. And more power to them, it's great. But while git is fantastic for open source projects, it's not universally perfect in every context. It's really really hard to use, and if you don't need any of the benefits that a DVCS provides it's not irrational to conclude it may not be for you.
The problem is, are we developers or source code librarians?
My git workflow is pull, commit, push. I don't rebase or branch or anything else because it completely distracts me from what I'm doing as a developer. YMMV.