The "brief explanation" is what the commit message's subject line is for :) Just turn on "show `git blame` beside code" in your IDE, and you'll get all the brief explanations you could ever want!
...but no, to be serious for a moment: this isn't really a workable idea as-is, but it could be. It needs discoverability — mostly in not just throwing noise at you, so that you'll actually pay attention to the signal. If there was some way for your text editor or IDE to not show you all the `git blame` subject lines, but just the ones that were somehow "marked as non-obvious" at commit time, then we could really have something here.
Personally, I think commit messages aren't structured enough. Github had the great idea of enabling PR code-reviewers to select line ranges and annotate them to point out problems. But there's no equivalent mechanism (in Github, or in git, or in any IDE I know of) for annotating the code "at commit time" to explain what you did out-of-band of the code itself, in a way that ends up in the commit message.
In my imagination, text-editors and IDEs would work together with SCMs to establish a standard for "embeddable commit-message code annotations." Rather than the commit being one long block of text, `git commit -p` (or equivalent IDE porcelain) would take you through your staged hunks like `git reset -p` does; but for each hunk, it would ask you to populate a few form fields. You'd give the hunk a (log-scaled) rating of non-obviousness, an arbitrary set of /[A-Z]+/ tags (think "TODO" or "XXX"), an eye-catching one-liner start to the explanation, and then as much additional free-text explanation as you like. All the per-hunk annotations would then get baked into a markdown-like microformat that embeds into the commit message, that text-editors/IDEs could recognize and pull back out of the commit message for display.
And then, your text-editor or IDE would:
1. embed each hunk's annotation-block above the code it references (as long as the code still exists to be referenced — think of it as vertical, hunk-wise "show `git blame` beside code");
2. calculate a visibility score for each annotation-block based on a project-config-file-based, but user-overridable arbitrary formula involving the non-obviousness value, the tags, the file path, and the lexical identifier path from the syntax highlighter (the same lexical-identifier-path modern `git diff` gives as a context line for each diff-hunk);
3a. if the visibility score is > 1, then show the full annotation-block for the hunk by default;
3b. else, if the visibility score is > 0, then show the annotation-block folded to just its first line;
3c. else, hide the annotation-block (but you can still reveal the matching annotation with some hotkey when the annotated lines are focused.)
Of course, because this is just sourced from (let's-pretend-it's-immutable) git history, these annotation-block lines would be "virtual" — i.e. they'd be read-only, and wouldn't have line-numbers in the editor. If the text-editor wants to be fancy, they could even be rendered in a little sticky-note callout box, and could default to rendering in a proportional-width font with soft wrapping. Think of some hybrid of regular doc-comments, and the editing annotations in Office/Google Docs.
---
...though, that's still not going as far as I'd really like. My real wish (that I don't expect to ever really happen) is for us to all be writing code as a series of literate codebase migrations — where your editor shows you the migration you're editing on the left, and the gestalt codebase that's generated as a result of running all migrations up to that one on the right, with the changes from the migration highlighted. You never directly edit the gestalt codebase; you only edit the migrations. And the migrations are what get committed to source-control — meaning that any code comments are there to be the literate documentation for the changes themselves; while the commit messages exist only to document the meta-level "editing process" that justifies the inclusion of the change.
Why? Because the goal is to structure the codebase for reading. Such a codebase would have one definitive way to learn it: just read the migrations like a book, front to back; watching the corresponding generated code evolve with each migration. If you're only interested in one component, then filter for just the migrations that make changes to it (`git log -S` style) and then read those. And if you, during development, realize you've just made a simple typo, or that you wrote some confusing code and later came up with a simpler way to express the same semantics — i.e. if you come up with code that "you wish you could have written from the start" — then you should go back and modify the earlier migration so that it's introduced there, so that new dev reading the code never has to see you introduce the bad version and then correct it, but instead just sees the good version from the start.
In other words: don't think of it as "meticulously grooming a commit history"; instead, think of it as your actual job not being "developer", but rather, as you (and all your coworkers) being the writers and editors of a programming textbook about the process of building program X... which happens to compile to program X.
I don't understand the focus on commit messages. I never read the git log. You can't assume anyone reading the code has access to the commit history or the time to read it. The codebase itself should contain any important documentation.
Well, no, because 1. it's not useful, because 2. most people never write anything useful there (which is a two-part vicious cycle), and 3. editors don't usefully surface it.
If we fix #3; and then individual projects fix #2 for themselves with contribution policies that enforce writing good commit messages; then #1 will no longer be true.
> You can't assume anyone reading the code has access to the commit history or the time to read it.
You can if you're the project's core maintainer/architect/whoever decides how people contribute to your software (in a private IT company, the CTO, I suppose.) You get to decide how to onboard people onto your project. And you get to decide what balance there will be between the amount of time new devs waste learning an impenetrable codebase, vs. the amount of time existing devs "waste" making the codebase more lucid by explaining their changes.
> The codebase itself should contain any important documentation.
My entire point is that commit messages are part of "the codebase" — that "the codebase" is the SCM repo, not some particular denuded snapshot of an SCM checkout. And that both humans and software should take more advantage of — even rely upon! — that fact.
I've been in enough projects that changed version control systems, had to restart the version control from a snapshot for whatever reason (data loss, performance issues with tools due to the commit history, etc) that I wouldn't want to take this approach.
> amount of time new devs waste learning an impenetrable codebase, vs. the amount of time existing devs "waste" making the codebase more lucid by explaining their changes.
That's a false dichotomy. The codebase won't be impenetrable if there are appropriate comments in it. In my experience time would be better spent making the codebase more lucid in the source code than an external commit history. The commit messages should be good too but I only rely on them when something is impossible to understand without digging through and finding the associated ticket/motivation, which is a bad state to be in, so at that point a comment is added. Of course good commit messages are fine too, none of this precludes them.
Agreed on most points, but a good SCM provides also the ability to bisect bugs and to show context that is hard to capture by explicit comments. E.g. what changed at the same time as some other piece of code. What was changed by the same person a few days before and after some line got introduced.
Regarding your first point:
> I've been in enough projects that changed version control systems
I have the impression that with the introduction of git, it became suddenly en-vogue to have tools to migrate history from one SCM to another. Therefore, I wouldn't settle on restarting from a snapshot anymore.
With git you can cut off history that is too old but weighs down the tools. You can simplify older history for example, while keeping newer history as it is. That is of course not easy but it can be done with git and some scripting.
> I've been in enough projects that changed version control systems, had to restart the version control from a snapshot for whatever reason (data loss, performance issues with tools due to the commit history, etc) that I wouldn't want to take this approach.
When was that? I've never seen that in 15-20 years of software development; I've seen plenty of projects change VCS but they always had the option of preserving the history.
Sure it's there, but having to wade through a large history of experiments tried and failed when trying to answer why this thing is here right now just feels subpar. Definitely sometimes it's helpful to read the commits which introduced a behavior, but that feels like a fallback when reading the code as it exists now. It works, but is much slower.
> Sure it's there, but having to wade through a large history of experiments tried and failed when trying to answer why this thing is here right now just feels subpar
That is one the downsides of trunk-based developments. One keeps a history of all failed experiments, the usefulness of the commit history deteriorates. That is for reading commit messages as well as for bisecting bugs.
> In other words: don't think of it as "meticulously grooming a commit history"; instead, think of it as your actual job not being "developer", but rather, as you (and all your coworkers) being the writers and editors of a programming textbook about the process of building program X... which happens to compile to program X.
If you have to "wade through experiments" to read the commit history, that means that the commit history hasn't had a structural editing pass applied to it.
Again: think of your job as writing and editing a textbook on the process of writing your program. As such, the commit history is an entirely mutable object — and, in fact, the product.
Your job as editor of the commit-history is, like the job of an editor of a book, to rearrange the work (through rebasing) into a "narrative" that presents each new feature or aspect of the codebase as a single, well-documented, cohesive commit or sequence of commits.
(If you've ever read a programming book that presents itself as a Socratic dialogue — e.g. Uncle Bob's The Clean Coder — each feature should get its own chapter, and each commit its own discussion and reflected code change.)
Experiments? If they don't contribute to the "narrative" of the evolution of the codebase — helping you to understand what comes later — then get rid of them. If they do contribute, then keep them: you'll want to have read about them.
Features gradually introduced over hundreds of commits? Move the commits together so that the feature happens all at once; squash commits that can't be understood without one-another into single commits; break commits that can be understood as separate "steps" into separate commits.
After factoring hunks that should have been independent out into their own commits, squashing commits with their revert-commits, etc., your commit history, concatenated into a file, should literally be a readable literate-programming metaprogram that you read as a textbook, that when executed, generates your codebase. While also still serving as a commit history!
(Keeping in mind that you still have all the other things living immutably in your SCM — dead experiments in feature branches; a develop branch that immutably reflects the order things were merged in; etc. It's only the main branch that is groomed in this fashion. But this groomed main branch is also the base for new development branches. Which works because nobody is `git merge`ing to main. Like LKML, the output-artifact of a development branch should be a hand-groomed patchset.)
And, like I said, this is all strictly inferior to an approach that actually involves literate programming of a metaprogram of codebase migrations — because, by using git commit-history in this way, you're gaining a narrative view of your codebase, but you're losing the ability to use git commits to track the "process of editing the history of the process of developing the program." Whereas, if you are actually committing the narrative as the content of the commits, then the "process of editing the history" is tracked in the regular git commits of the repo — which themselves require no grooming for presentation.
But "literate programming of a metaprogram that generates the final codebase" can only work if you have editor support for live-generating+viewing the final codebase side-by-side with your edits to the metaprogram. Otherwise it's an impenetrably-thick layer of indirection — the same reason Aspect-Oriented Programming never took off as a paradigm. Whereas "grooming your commit history into a textbook" doesn't require any tooling that doesn't already exist, and can be done today, by any project willing to adopt contribution policies to make it tenable.
---
Or, to put this all another way:
Imagine there is an existing codebase in an SCM, and you're a technical writer trying to tell the story of the development of that codebase in textbook form.
Being technically-minded, you'd create a new git repo for the source code of your textbook — and then begin wading through the messy, un-groomed commit history of the original codebase, to refactor that "narrative" into one that can be clearly presented in textbook form. Your work on each chapter would become commits into your book's repo. When you find a new part of the story you want to tell, across several chapters, you'd make a feature branch in your book's repo to experiment with modifying the chapters to weave in mentions of this side-story. Etc.
Presuming you finish writing this textbook, and publish it, anyone being onboarded to the codebase itself would then be well-advised to first read your textbook, rather than trying to first read the codebase itself. (They wouldn't need to ever see the git history of your textbook, though; that's inside-baseball to them, only relevant to you and any co-editors.)
Now imagine that "writing the textbook that should be read to understand the code in place of reading the code itself" is part of the job of developing the program; that the same SCM repo is used to store both the codebase and this textbook; and that, in fact, the same textual content has to be developed by the same people under the constraints of solving both problems in a DRY fashion. How would you do it?
Imho, you are missing out on a great source of insight. When I want to understand some piece of code, I usually start navigating the git log from git blame.
Even just one-line commit messages that refer to a ticket can help understanding tremendously.
Even the output of git blame itself is helpful. You can see which lines changed together in which order. You see, which colleague to ask questions.
One could accomplish a part of your vision, today, by splitting commits into smaller commits. Then you have the relation between hunks of changes and text in the commit message on a smaller level. Then you can use branches with a non-trivial commit message in the merge commit to document the whole set of commits.
As far as I know changes to the Linux kernel are usally submitted as a series of patches, i.e. a sequence of commits. I.e. a branch, although it is usually not represented as git branch while submitting.
...but no, to be serious for a moment: this isn't really a workable idea as-is, but it could be. It needs discoverability — mostly in not just throwing noise at you, so that you'll actually pay attention to the signal. If there was some way for your text editor or IDE to not show you all the `git blame` subject lines, but just the ones that were somehow "marked as non-obvious" at commit time, then we could really have something here.
Personally, I think commit messages aren't structured enough. Github had the great idea of enabling PR code-reviewers to select line ranges and annotate them to point out problems. But there's no equivalent mechanism (in Github, or in git, or in any IDE I know of) for annotating the code "at commit time" to explain what you did out-of-band of the code itself, in a way that ends up in the commit message.
In my imagination, text-editors and IDEs would work together with SCMs to establish a standard for "embeddable commit-message code annotations." Rather than the commit being one long block of text, `git commit -p` (or equivalent IDE porcelain) would take you through your staged hunks like `git reset -p` does; but for each hunk, it would ask you to populate a few form fields. You'd give the hunk a (log-scaled) rating of non-obviousness, an arbitrary set of /[A-Z]+/ tags (think "TODO" or "XXX"), an eye-catching one-liner start to the explanation, and then as much additional free-text explanation as you like. All the per-hunk annotations would then get baked into a markdown-like microformat that embeds into the commit message, that text-editors/IDEs could recognize and pull back out of the commit message for display.
And then, your text-editor or IDE would:
1. embed each hunk's annotation-block above the code it references (as long as the code still exists to be referenced — think of it as vertical, hunk-wise "show `git blame` beside code");
2. calculate a visibility score for each annotation-block based on a project-config-file-based, but user-overridable arbitrary formula involving the non-obviousness value, the tags, the file path, and the lexical identifier path from the syntax highlighter (the same lexical-identifier-path modern `git diff` gives as a context line for each diff-hunk);
3a. if the visibility score is > 1, then show the full annotation-block for the hunk by default;
3b. else, if the visibility score is > 0, then show the annotation-block folded to just its first line;
3c. else, hide the annotation-block (but you can still reveal the matching annotation with some hotkey when the annotated lines are focused.)
Of course, because this is just sourced from (let's-pretend-it's-immutable) git history, these annotation-block lines would be "virtual" — i.e. they'd be read-only, and wouldn't have line-numbers in the editor. If the text-editor wants to be fancy, they could even be rendered in a little sticky-note callout box, and could default to rendering in a proportional-width font with soft wrapping. Think of some hybrid of regular doc-comments, and the editing annotations in Office/Google Docs.
---
...though, that's still not going as far as I'd really like. My real wish (that I don't expect to ever really happen) is for us to all be writing code as a series of literate codebase migrations — where your editor shows you the migration you're editing on the left, and the gestalt codebase that's generated as a result of running all migrations up to that one on the right, with the changes from the migration highlighted. You never directly edit the gestalt codebase; you only edit the migrations. And the migrations are what get committed to source-control — meaning that any code comments are there to be the literate documentation for the changes themselves; while the commit messages exist only to document the meta-level "editing process" that justifies the inclusion of the change.
Why? Because the goal is to structure the codebase for reading. Such a codebase would have one definitive way to learn it: just read the migrations like a book, front to back; watching the corresponding generated code evolve with each migration. If you're only interested in one component, then filter for just the migrations that make changes to it (`git log -S` style) and then read those. And if you, during development, realize you've just made a simple typo, or that you wrote some confusing code and later came up with a simpler way to express the same semantics — i.e. if you come up with code that "you wish you could have written from the start" — then you should go back and modify the earlier migration so that it's introduced there, so that new dev reading the code never has to see you introduce the bad version and then correct it, but instead just sees the good version from the start.
In other words: don't think of it as "meticulously grooming a commit history"; instead, think of it as your actual job not being "developer", but rather, as you (and all your coworkers) being the writers and editors of a programming textbook about the process of building program X... which happens to compile to program X.