The Myers diff algorithm that is used in Git (2017)

ufo · on May 31, 2020

By the way, git also comes with other diff algorithms which may give cleaner diffs than the default one

https://gist.github.com/roryokane/6f9061d3a60c1ba41237

Those alternative algorithms can be requested with command line flags like "git diff --histogram" or "git diff --patience". The default algorithm can also be changed via the git config file.

GolDDranks · on May 31, 2020

Patience in this example looks a lot better for what I consider a good diff to do a code review with. Is there any settings you could toggle in GitHub Enterprise you could enable this diffing in the UI?

JoshTriplett · on May 31, 2020

Note that the same user also has a gist showing a case where myers gives a better result: https://gist.github.com/roryokane/79b8ebcb3813ebd934c4

ufo · on May 31, 2020

Patience diff cares about lines that only appear once in each file. The idea behind that heuristic is that lines that only appear once are more likely to represent the parts of the document that don't belong to the diff. Conversely, if a line appears more than once in the file then it is more likely to be something like a blank line or a " }", where a match in the two documents might be just a coincidence. That particular example is one where the heuristic doesn't work well. The big block of abcs doesn't have any unique lines so patience diff doesn't see much cost in including it as part of the diff.

OP's blog has another series of posts about patience diff. They are also an interesting read: https://blog.jcoglan.com/2017/09/19/the-patience-diff-algori...

kohtatsu · on May 31, 2020

Feels like there ought to be a cute scoring algorithm to determine which to use. It is a pretty important UX to get right IMO.

I reckon it'd be nice to include which to use in the commit too.

swinglock · on May 31, 2020

It'd be even nicer if git could figure it out.

From those two examples only, looks like fewer hunks are better, but so are smaller hunks, at least when the number of hunks are equal.

GolDDranks · on May 31, 2020

I hazard a guess that there doesn't exist a simple heuristic (such as "number of chunks") that always does the right thing. I think that to achieve better diffs, the diff tool should have some understanding over the structural properties of the language – but that would, of course, mean that the diff tool should be different for each language. (Or more like, there should be a parsing frontend that parses a "labeled tree" structure, and then a generic diffing algorithm over those kinds of trees.)

yencabulator · on June 2, 2020

git's diff already has language-specific configurability, for example xfuncname sets a "function declaration" or similar header (e.g. markdown heading, org-mode section) that is shown in a diff.

Similarly, one could imagine a language-specific regexp that gives "anchor points" such as declaration start, where one would prefer that a single diff hunk doesn't mutate things across that anchor, if it doesn't have to.

jdsully · on May 31, 2020

The one thing I really miss from perforce/source depot was seeing the original code block when performing a merge. Its so much more difficult to merge just seeing both new sections.

matvore · on May 31, 2020

For diffs that include the common ancestor, add this to .gitconfig:

  [merge]
  conflictStyle = diff3

yencabulator · on June 2, 2020

Git can call a whole bunch of merge visualization and conflict resolution tools. See `git mergetool --tool-help`.

lawn · on May 31, 2020

I'm currently going through his book, Building Git, and doing it with Rust. It's a lot of effort but it's very good.

mitchtbaum · on May 31, 2020

for diff algorithims in Rust, see https://nest.pijul.com/pijul_org/pijul:master/6afda19ba69307...

for a Rust diff library crates, see https://github.com/changeutils

dominotw · on May 31, 2020

hey I got this 'edit distance' question in my FB phone screen last month

https://leetcode.com/problems/edit-distance/

graph search + dynamic programming to pass the phone screen.

Ofcourse the consideration here just the smallest number of edits.

DEADBEEFC0FFEE · on May 31, 2020

I always wonder how it was done.