> Heck, most of that could even be automated if you wanted.

I've had the idea of building something like this. The concept is that you would select an article and a time interval, and be shown the "best/most stable" revision of the article within the given window. The tool could use any number of metrics to determine which revision is best; the most reasonable one I've managed to come up with is "highest number of views during a state where the article was not locked/was available to edit".
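A minimal sketch of that metric, assuming you've already fetched revision metadata and attributed pageviews to whichever revision was live when they happened (the Revision fields below are hypothetical, not a real MediaWiki API shape):

    from dataclasses import dataclass

    @dataclass
    class Revision:
        rev_id: int
        views: int         # pageviews accumulated while this revision was live
        page_locked: bool  # was the page protected/locked while it was live?

    def best_revision(revisions: list[Revision]) -> Revision:
        """Most-viewed revision from periods when the article was open to edits."""
        editable = [r for r in revisions if not r.page_locked]
        if not editable:
            raise ValueError("no unlocked revisions in the window")
        return max(editable, key=lambda r: r.views)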




Yeah, the idea had never occurred to me until now; I'm glad to hear someone else has already thought of this.

I wasn't thinking so much of identifying any single best revision as of using a diff-like tool to identify the text/changes that remained most stable over time -- where a brand-new edit doesn't count for much, but the longer it survives as other edits are made, the more trustworthy it presumably is.
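You could approximate that with a per-line "age" counter built on Python's difflib -- a rough sketch, assuming each revision is plain wikitext and that line-level matching is good enough (it isn't once things get rearranged, as noted below):

    import difflib

    def line_ages(revisions):
        """Track how many consecutive revisions each line has survived.

        revisions: list of article texts, oldest first.
        Returns (age, line) pairs for the newest revision; higher age =
        more stable = presumably more trustworthy.
        """
        prev_lines = revisions[0].splitlines()
        ages = [1] * len(prev_lines)
        for text in revisions[1:]:
            cur_lines = text.splitlines()
            cur_ages = [1] * len(cur_lines)  # brand-new lines start at age 1
            sm = difflib.SequenceMatcher(a=prev_lines, b=cur_lines, autojunk=False)
            for block in sm.get_matching_blocks():
                for k in range(block.size):
                    # a line that survived this revision gets one revision older
                    cur_ages[block.b + k] = ages[block.a + k] + 1
            prev_lines, ages = cur_lines, cur_ages
        return list(zip(ages, prev_lines))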

I think the biggest problem comes as articles get rearranged and expanded -- a section gets split into two or three, something gets moved from one section to a more appropriate one, and so forth. Or heck, sometimes entire articles get split into multiple ones, or vice versa. I'm not aware of any diff-like tool/algorithm that handles these situations well enough to accurately track how the same information moves around when it's not just a simple case of insertion.


Huh, so a bit like git blame? And then you would merge together the chunks/edits which are most stable? That sounds awesome!!

I suppose you can't really count on the same text/markup staying intact as articles get split and modified in the ways you've described. And as far as I know there's no such thing as a cross-article edit in MediaWiki terms, so splits and merges wouldn't even show up as a single operation. Use vector embeddings? Throw an LLM at the problem? Rate editors on their familiarity with a given topic area (and track how that evolves over time)?
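A sketch of the embedding route: match paragraphs across two revisions by cosine similarity, so a section that got moved or split can still be mapped back to its ancestor. The embed function here is a deliberately dumb stand-in, an assumption for the sketch rather than a recommendation:

    import numpy as np

    def embed(paragraph: str, dim: int = 256) -> np.ndarray:
        """Toy stand-in for a real sentence-embedding model: a hashed
        bag of words, L2-normalized. Swap in sentence-transformers or
        similar for real use."""
        v = np.zeros(dim)
        for word in paragraph.lower().split():
            v[hash(word) % dim] += 1.0
        norm = np.linalg.norm(v)
        return v / norm if norm > 0 else v

    def match_paragraphs(old_paras, new_paras, threshold=0.8):
        """Pair each new paragraph with its most similar old paragraph,
        so moved or re-sectioned text can still be traced across a reorg."""
        old_vecs = [embed(p) for p in old_paras]
        pairs = []
        for j, p in enumerate(new_paras):
            v = embed(p)
            sims = [float(v @ u) for u in old_vecs]  # cosine: vectors are unit-norm
            i = int(np.argmax(sims))
            if sims[i] >= threshold:
                pairs.append((i, j, sims[i]))  # new_paras[j] descends from old_paras[i]
        return pairs

Pairs that clear the threshold inherit the old paragraph's stability score; everything else is treated as genuinely new text.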

Using the edit history in addition to the raw text written by editors seems like a way to extract additional bits of information from the human interactions.

I might have read an idea in an HN comment about training AI not just on code, but on how that code gets edited over time in a git repo -- or maybe I'm just imagining it.



