> Heck, most of that could even be automated if you wanted.

I've had the idea of building something like this. The concept is that you would select an article and a time interval, and be shown the "best/most stable" revision of the article within the given window. The tool could use any number of metrics to determine which revision is best; the most reasonable one I've managed to come up with is "highest number of views during a state where the article was not locked/was available to edit".
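A minimal sketch of that metric, assuming you've already fetched revision metadata and attributed pageviews to whichever revision was live when they happened (the Revision fields below are hypothetical, not a real MediaWiki API shape):

    from dataclasses import dataclass

    @dataclass
    class Revision:
        rev_id: int
        views: int         # pageviews accumulated while this revision was live
        page_locked: bool  # was the page protected/locked while it was live?

    def best_revision(revisions: list[Revision]) -> Revision:
        """Most-viewed revision from periods when the article was open to edits."""
        editable = [r for r in revisions if not r.page_locked]
        if not editable:
            raise ValueError("no unlocked revisions in the window")
        return max(editable, key=lambda r: r.views)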




Yeah, the idea had never occurred to me until now; I'm glad to hear someone else has already thought of this.

I wasn't thinking so much of identifying any single best revision as of using a diff-like tool to identify the text/changes that remained most stable over time -- where a brand-new edit doesn't count for much, but the longer it survives as other edits are made, the more trustworthy it presumably is.
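You could approximate that with a per-line "age" counter built on Python's difflib -- a rough sketch, assuming each revision is plain wikitext and that line-level matching is good enough (it isn't once things get rearranged, as noted below):

    import difflib

    def line_ages(revisions):
        """Track how many consecutive revisions each line has survived.

        revisions: list of article texts, oldest first.
        Returns (age, line) pairs for the newest revision; higher age =
        more stable = presumably more trustworthy.
        """
        prev_lines = revisions[0].splitlines()
        ages = [1] * len(prev_lines)
        for text in revisions[1:]:
            cur_lines = text.splitlines()
            cur_ages = [1] * len(cur_lines)  # brand-new lines start at age 1
            sm = difflib.SequenceMatcher(a=prev_lines, b=cur_lines, autojunk=False)
            for block in sm.get_matching_blocks():
                for k in range(block.size):
                    # a line that survived this revision gets one revision older
                    cur_ages[block.b + k] = ages[block.a + k] + 1
            prev_lines, ages = cur_lines, cur_ages
        return list(zip(ages, prev_lines))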

I think the biggest problem comes as articles get rearranged and expanded -- a section gets split into two or three, something gets moved from one section to a more appropriate one, and so forth. Or heck, sometimes entire articles get split into multiple ones, or vice versa. I'm not aware of any diff-like tool/algorithm that handles these situations well enough to accurately track how the same information moves around when it's not just a simple case of insertion.


Huh, so a bit like git blame? And then you would merge together the chunks/edits which are most stable? That sounds awesome!!

I suppose you can't really count on the same text/markup staying intact as articles get split and modified in the ways you've described. And as far as I know there's no such thing as a cross-article edit in MediaWiki terms, so splits and merges wouldn't even show up as a single operation. Use vector embeddings? Throw an LLM at the problem? Rate editors on their familiarity with a given topic area (and track how that evolves over time)?
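A sketch of the embedding route: match paragraphs across two revisions by cosine similarity, so a section that got moved or split can still be mapped back to its ancestor. The embed function here is a deliberately dumb stand-in, an assumption for the sketch rather than a recommendation:

    import numpy as np

    def embed(paragraph: str, dim: int = 256) -> np.ndarray:
        """Toy stand-in for a real sentence-embedding model: a hashed
        bag of words, L2-normalized. Swap in sentence-transformers or
        similar for real use."""
        v = np.zeros(dim)
        for word in paragraph.lower().split():
            v[hash(word) % dim] += 1.0
        norm = np.linalg.norm(v)
        return v / norm if norm > 0 else v

    def match_paragraphs(old_paras, new_paras, threshold=0.8):
        """Pair each new paragraph with its most similar old paragraph,
        so moved or re-sectioned text can still be traced across a reorg."""
        old_vecs = [embed(p) for p in old_paras]
        pairs = []
        for j, p in enumerate(new_paras):
            v = embed(p)
            sims = [float(v @ u) for u in old_vecs]  # cosine: vectors are unit-norm
            i = int(np.argmax(sims))
            if sims[i] >= threshold:
                pairs.append((i, j, sims[i]))  # new_paras[j] descends from old_paras[i]
        return pairs

Pairs that clear the threshold inherit the old paragraph's stability score; everything else is treated as genuinely new text.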

Using the edit history in addition to the raw text written by editors seems like a way to extract additional bits of information from the human interactions.

I might have read an idea in an HN comment about training AI not just on code, but on how that code gets edited over time in a git repo -- or maybe I'm just imagining it.



