The beauty of Encarta and its printed predecessors is that they are unalterable snapshots of the knowledge and views of their time.
I inherited an encyclopedia from the 1930s and it is a constant source of surprise and wonder to me - both in what is different from today and in what I expect to be different but is not.
I think Encarta is the last of this generation.
Sure, I can download a snapshot from Wikipedia, and I did, and others did and still do as well. It's still not the same, because all our snapshots are different. Anything surprising in there could just be a fluke, an editing error, or the temporary state of an edit war.
A Wikipedia snapshot proves nothing. My 1930s encyclopedia, on the other hand, is a stable reference. If I doubted anything in my copy, I could easily find another antiquated copy and compare. The same goes for Encarta: there are so many copies out there that this particular snapshot of human knowledge will never die.
Funny, everything you describe as a positive I see as a negative.
The 11th, 12th, and 13th editions of the Encyclopedia Britannica were from 1911, 1922, and 1926.
So if, as a researcher, you were interested in the state of knowledge in 1916, you're out of luck (if you're limiting yourself to a particular encyclopedia, for example).
I don't know what value you see in "stability" when the world's knowledge changes on a second-to-second basis. It's equally arbitrary whether you freeze knowledge at 1911 or at any given point in time since Wikipedia began.
When you say "A Wikipedia snapshot proves nothing", I have no idea what that means. If you're worried about vandalism or quick edits, it's trivial enough to compare with earlier and later versions of the page to ensure you're looking at relatively stable text. Heck, most of that could even be automated if you wanted.
And while Wikipedia articles are full of errors, so too were the articles in every edition of Britannica. The difference is that Wikipedia errors tend to get fixed a lot quicker, while the Britannica errors just remained on the paper they were printed on.
I genuinely don't see what difference there is between the "stability" of the Britannica 1911 edition, and the "stability" of Wikipedia at some arbitrary timestamp. Both capture a similarly arbitrary moment in time -- Wikipedia just gives you so many more to choose from.
You should especially consider the role of a historian, or of someone who sets out to correct facts and figures that are in error on Wikipedia. Stable, traceable snapshots are precisely what they need: the ability to trace the lineage of an error and correct the associated facts and figures, the way you would chase dependencies in an Excel sheet. It's especially important now that LLMs can't cite their sources and Google has gone to shit at finding them; errors are quickly propagated across the web like coins through some kind of crypto mixer, and they become difficult to verify.
> Heck, most of that could even be automated if you wanted.
I've had the idea of building something like this. The concept being that you would select an article and a time interval, and be shown the "best/most stable" revision of the article within the given window. The tool could use any number of metrics for determining which revision is best, the most reasonable one I've managed to come up with is "highest number of views during a state where the article was not locked/available to edit".
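Here's a minimal sketch of what I mean, in Python -- entirely my own toy illustration, not a real tool. The MediaWiki revisions API and the Wikimedia Pageviews API endpoints are real, but the scoring heuristic (credit each day's views to whichever revision was live at the start of that day) is just one assumption, and I've skipped the protection-log check, pagination, and error handling:

    import requests

    WIKI_API = "https://en.wikipedia.org/w/api.php"
    PV_API = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/"
              "per-article/en.wikipedia/all-access/user/{title}/daily/{start}/{end}")
    HEADERS = {"User-Agent": "stable-revision-sketch/0.1"}  # WMF APIs want a real UA

    def revisions_in_window(title, start, end):
        # All revisions between two ISO 8601 timestamps, oldest first.
        # Single request only; a real tool would paginate past 500 revisions.
        params = {"action": "query", "format": "json", "prop": "revisions",
                  "titles": title, "rvprop": "ids|timestamp",
                  "rvstart": start, "rvend": end, "rvdir": "newer", "rvlimit": "max"}
        pages = requests.get(WIKI_API, params=params, headers=HEADERS).json()["query"]["pages"]
        return next(iter(pages.values())).get("revisions", [])

    def daily_views(title, start_day, end_day):
        # Map of YYYYMMDD -> view count; the Pageviews API has data from 2015 on.
        url = PV_API.format(title=title.replace(" ", "_"), start=start_day, end=end_day)
        items = requests.get(url, headers=HEADERS).json().get("items", [])
        return {it["timestamp"][:8]: it["views"] for it in items}

    def most_viewed_revision(title, start, end):
        revs = revisions_in_window(title, start, end)
        if not revs:
            return None
        views = daily_views(title, start[:10].replace("-", ""), end[:10].replace("-", ""))
        score = {r["revid"]: 0 for r in revs}
        for day, count in views.items():
            day_ts = f"{day[:4]}-{day[4:6]}-{day[6:8]}T00:00:00Z"
            # Credit the day's views to the newest revision already live at 00:00.
            # Days before the first in-window revision are simply dropped.
            live = None
            for r in revs:  # oldest first; ISO timestamps compare lexicographically
                if r["timestamp"] <= day_ts:
                    live = r
            if live:
                score[live["revid"]] += count
        return max(revs, key=lambda r: score[r["revid"]])

    best = most_viewed_revision("Encarta", "2020-01-01T00:00:00Z", "2020-12-31T00:00:00Z")
    if best:
        print(f"https://en.wikipedia.org/w/index.php?oldid={best['revid']}")

The oldid permalink at the end is the closest thing Wikipedia already has to a citable frozen edition; everything about the scoring is just one way to do it.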
Yeah, the idea had never occurred to me until now, I'm glad to hear someone else has already thought of this.
I wasn't thinking so much of identifying any single best revision, but of using more of a diff-like tool to identify the text/changes that remained most stable over time -- where a brand-new edit doesn't count for much, but the longer it stays around as other edits are made, the more trustworthy it presumably is.
I think the biggest problem comes as articles get rearranged and expanded -- a section gets split into two or three, something gets moved from one section to a more appropriate one, and so forth. Or heck, sometimes entire articles get split into multiple ones, or vice-versa. I'm not aware of any diff-like tool/algorithm that handles these situations well, to accurately track how the same information gets moved when it's not just a simple case of insertion.
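That said, the simple insertion case is easy enough to prototype. Here's a toy sketch of the survival-weighting (my own illustration, word-level only, and it breaks down in exactly the split/move cases above):

    import difflib

    def token_ages(revisions):
        # revisions: plain-text revisions of one article, oldest first.
        # Returns (tokens, ages) for the newest revision, where ages[i] is
        # how many consecutive revisions token i has survived unchanged.
        tokens = revisions[0].split()
        ages = [0] * len(tokens)
        for text in revisions[1:]:
            new_tokens = text.split()
            new_ages = [0] * len(new_tokens)
            sm = difflib.SequenceMatcher(a=tokens, b=new_tokens, autojunk=False)
            for tag, i1, i2, j1, j2 in sm.get_opcodes():
                if tag == "equal":  # unchanged runs inherit and increment their age
                    for i, j in zip(range(i1, i2), range(j1, j2)):
                        new_ages[j] = ages[i] + 1
            tokens, ages = new_tokens, new_ages
        return tokens, ages

    revs = ["the cat sat", "the cat sat down", "a cat sat down quietly"]
    for tok, age in zip(*token_ages(revs)):
        print(tok, age)  # "cat" and "sat" score 2; the brand-new "quietly" scores 0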
Huh, so a bit like git blame? And then you would merge together the chunks/edits which are most stable? That sounds awesome!!
I suppose you can't really count on tracking the same text/markup as it gets shifted around when articles are split and modified in the ways you've described. Also, there's no such thing as a cross-article edit in MediaWiki terms, IIRC. Use vector embeddings? Throw an LLM at the problem? Rate editors on their familiarity with a given topic area (and track how that evolves over time)?
The idea of using edit information in addition to the raw text written by editors seems like it's extracting additional bits of information from human interactions.
I might have read some idea in a HN comment of training AI not just on code, but on how that code is edited in a git repo, or maybe I am just imagining it.
This is a bit off-topic, but how do you actually browse a Wikipedia snapshot? I tried turning a dump [1] into a browseable website hosted locally a few years back, and it seemed anything but trivial. Has any progress been made here?
Could you speak to the practical values of a stabilized reference encyclopedia?
If the study of history is a long-term, iterative process, where is this stability important (beyond the filtering of short-term noise, vandalism, political influence, etc)?
I can see several advantages:
- Having a "frozen" dataset, meaning that I can work on the same dataset as my colleague.
- Having all the topics frozen at the same date, representing global knowledge at one point: they knew X and Y, but Z was not advanced yet, so they could not do W. It is hard to realize the concomitance of knowledge bumps on the timescale in a continuously changing encyclopedia.
I think this is mistaking the costs of publishing for the stability of knowledge.
The knowledge was never stable, even back then. The encyclopedias could only afford a team of some fixed size to publish an edition every few years.
Wikipedia just scales that up to many editors doing real time edits. Arguably it's a more reflective representation of how organic knowledge transfer actually happens.
Citing any encyclopedia or other general reference text as a source is a warning sign, because it means that you haven't done any in-depth research, and/or you're trying to cite sources for information which is considered general knowledge in the field. Wikipedia isn't special in this regard; you're going to get marked down just as much if you cite the Encyclopedia Britannica or a dictionary.
It's a secondary or tertiary source that does a great job of collecting and summarizing primary literature (which can be cited if more rigor is preferred, at the cost of context). My college professors encouraged the use of both, maybe with the caveat of citing a particular Wikipedia revision rather than the article directly.
But seriously, these were the discussions we had two decades ago. If a professor today still has such an issue, they're very old-fashioned, and I'd probably walk out of that class, because who knows what else that professor is out of date on.
So many fields today are evolving so rapidly, I wouldn't trust any single expert on a topic. Better to have a living crowdsourced reference that collects many sources.
My parents bought me multiple editions of Encarta in my childhood and I absolutely loved it as we didn't have internet at that time. It was much more than an encyclopaedia; you could explore ancient sites and play educational games. It was great fun.
As great a free resource as Wikipedia is, each article's quality relies on a knowledgeable contributor to really make it worthwhile, and those are few and far between. Wikipedia is dry and lacks what Encarta had.
Should you want to dig deeper and see the view from the other side of the battle. (Posted with a distant taste of sour grapes in my mouth; aka, I played for the EB side.)
Interesting comment about how pre-Wikipedia encyclopedias were more engaging.
I felt this with Encarta, and with old printed volumes. They're written by someone, or told in a distinctive way, while Wikipedia feels more like a textbook reference.
The Stanford Encyclopedia of Philosophy[1], while restricted in scope, takes the approach of having experts write the articles, which are then revised and evolve with time. So a different approach is possible, but I think it would be hard for an encyclopedia with the breadth of Wikipedia to do it. Maybe there is space for more specialised encyclopedias that take the approach of articles having authors and revisions instead of wikis. But all in all, I think both approaches complement each other.
Wikipedia is amazing, all things considered. But it's true that Encarta was a more integrated experience, with plenty of licensed images and video.
Wiki is open source, so it's limited to freely licensed assets (Creative Commons and the like) only. It's also firmly an "Internet-first" encyclopedia, so its beauty comes from its SEO-friendly information architecture and internal link structure.
I'd bet it's easier for someone to find specific info on Wiki quickly, whereas Encarta would be better as a slow browsing experience.
The impact that Encarta had on Microsoft's dominance in the home computer market is understated. Encarta came about at a time when internet access was not yet commonplace and "multimedia" PCs were the next big thing. Amiga, Atari, and Apple had CD-ROM drives, but only the PC had Encarta. It was revolutionary for 90s teens with homework assignments. These days some might use the term "killer app" to describe it. Obviously there were many other factors that led to Wintel dominance, but Encarta certainly played a part.
Encarta was a big inspiration for my Conzept project (https://conze.pt) - a topic exploration system based on Wikipedia, Wikidata and other (pluggable) datasources.
My favourite trivia for Encarta is that it was built using a rich text view + a custom hypertext system, similar to the old Windows Help format before it all became HTML and IE views.
Not mentioned: Microsoft Instruments, a wonderful way to experience music snippets from instruments around the world, which was killed via absorption into Encarta.
(I never owned Windows, so I never played with Encarta; it's possible the same experience was achievable, but I like dedicated software to play with when there's no real advantage to integrating it into a larger ecosystem.)
We sometimes forget how fortunate we are to have so much information at our fingertips. Even the richest person in the world would have a hard time amassing enough books to compare to, say, Wikipedia.
As neither this article nor the Wikipedia one provide a table with version history, I did some digging and found this table[0].
It's interesting. E.g., until "Encarta 96", the application was "16-bit". I wonder if this means that 95 and earlier ran on a plain 8088-based 5150. This 95 article[1] claims otherwise (386SX).
To be fair, I would love a curated version of Wikipedia periodically exported as a self-contained reference application. I thought this would be a good use of the Internet Archive's capabilities too.
At Blekko we, of course, crawled all of Wikipedia. One of the more interesting aspects was how many dead links it had to references (which presumably at one time were not dead). Sometimes you could find those references in the wayback machine and sometimes they were just "poof" gone. This is the nature of the web, information isn't persistent. Hence I think periodically pulling it into static storage would be a good thing for longevity.
> One of the more interesting aspects was how many dead links it had to references (which presumably at one time were not dead). Sometimes you could find those references in the wayback machine and sometimes they were just "poof" gone.
This is largely a solved problem. There's a number of bots on Wikipedia and other WMF wikis which periodically trigger Internet Archive dumps for all external links which are used as references, and which can replace those references with links to the archive if a site goes offline.
Are any proprietary digital encyclopedias available nowadays? I love Wikipedia and am glad we have it for free, but I also know non-free encyclopedias can be much better in some specific ways.
I wonder if Wikipedia articles ever cite paid encyclopedias as sources? It's kind of a conundrum: many people who don't trust wikipedia would regard paid products as more "legitimate", but Britannica would presumably not want to be cited because that implicitly helps a free competitor.
Wikipedia started out as a copy of a public domain edition of EB.
Since then, lots of folks have edited and contributed content.
It was a pretty open secret that when Funk & Wagnalls (where Encarta got its content) was first starting, they would pay college students to write articles; the college students would crib from EB or some other similar source and then submit that.
We had Encarta in the library at my school when I was 12. I had previously only used a C64 and an Amiga, so it was my first stint with PCs. If you needed to research something for homework, say the solar system, you would kindly ask the librarian for an Encarta CD-ROM and she would allocate a 45-minute slot at the library's PC.
I remember the period when Encarta allowed user contributions on their website, though more moderated than Wikipedia. It's a shame that Microsoft gave up on Encarta too soon; it could have been a perfect integration with Bing AI, and not subject to the vandalism and notability wars that Wikipedia suffers from.
I interned on Encarta one summer shortly before the fall, and 11 years ago I wrote a Quora answer about why I think it failed, which amusingly still gets upvotes: https://qr.ae/pK2pGA
I came here to pour scorn on the claims of an attempted multimedia CD-ROM in 1985, but it turns out the first one was released in 1987! Red Book was mid-'80s. I could've sworn they were a mid-90s thing, but this really could have happened in 1985.
Is there any Wikipedia equivalent for kids, targeted at 10-year-olds? Wikipedia is too dense and drab for someone just excited or curious about a topic.
Indeed, as the article mentions - but the UI for that was pretty bad IMO...
The Software Toolworks Multimedia Encyclopedia was better IMO (at least in terms of UI and multimedia), and then from 1995 onwards, Encarta was generally better (again IMO)...
People want me to believe that Microsoft is "new," when people like Phil Spencer have been there since the start, and worked up to things like being the head of Xbox.