Early on, the Web was imagined to have a plethora of "agents" (bots) which would act on behalf of their users: extracting information from pages, following links, aggregating the relevant data, etc.
This worked really well for search engines, but certain economic incentives crept in which pretty much killed off this idea. In particular:
- Many sites gained revenue by advertising
- Advertising only really works if a human is viewing it (it's possible to integrate advertising into the data, with product placement, "native advertising", etc., but that's more expensive than slapping some JavaScript in some iframes)
- Exposing data publicly in easily parsed formats mostly stops humans from viewing the page itself, so ad impressions disappear. Hence so much data is now in silos.
- Exposing services, like search, to bots can get expensive. Hence the rise of API keys.
There are other reasons too: scraping is hard and fragile because styling has too much influence over markup (CSS helped a little; a more radical attempt was to serve raw XML and have the browser build the page via XSLT). The rise of JavaScript-only sites, and the difficulty of implementing all of the many JS APIs a site might try to use, has also hindered alternative user agents.
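For anyone who never saw that XML/XSLT approach in the wild, here's a minimal sketch (the filenames and element names are made up for illustration): the server sends plain data as XML with a stylesheet reference, and the browser applies the XSLT locally to produce HTML.

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="page.xsl"?>
<!-- articles.xml (illustrative): pure data, easy for any agent to parse -->
<articles>
  <article>
    <title>Hello</title>
    <body>No presentation markup here at all.</body>
  </article>
</articles>
```

```xml
<?xml version="1.0"?>
<!-- page.xsl (illustrative): the browser turns the XML above into HTML -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/articles">
    <html>
      <body>
        <xsl:for-each select="article">
          <h1><xsl:value-of select="title"/></h1>
          <p><xsl:value-of select="body"/></p>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

The appeal was that the same XML could be consumed directly by a bot while humans still got a styled page, with presentation kept entirely out of the data.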
The end result is the Web being dominated by a handful of user agents: a few search crawlers and a few user-facing browsers :(