There are claims that Google indexes 150B pages today. Even if you settle for a third of that and take into account that the average page size for HTML alone is 200KB, it becomes evident that you need tens of millions of dollars for hardware. Remember that fetching pages, storing them, processing them and so on all cost a lot of money, and building even a semi-scalable index with in-memory and SSD-based layers is super expensive. And we haven't even gotten to the major cost factor, i.e. human employees.
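To make the storage side of that estimate concrete, here is a rough back-of-envelope sketch in Python. The page count and page size are the figures quoted above; the replication factor is purely my own illustrative assumption, and the function names are made up for this example.

```python
# Back-of-envelope: raw HTML storage for one third of the claimed index.
CLAIMED_PAGES = 150e9      # "150B pages" claim from the comment above
AVG_HTML_BYTES = 200e3     # 200 KB of HTML per page, as quoted above
REPLICATION = 3            # ASSUMPTION: illustrative only, not from the comment


def raw_html_petabytes(pages: float, avg_bytes: float) -> float:
    """Raw bytes needed to hold one copy of every page, in decimal petabytes."""
    return pages * avg_bytes / 1e15


def replicated_petabytes(raw_pb: float, replication: int) -> float:
    """Storage once every page is kept `replication` times for durability."""
    return raw_pb * replication


pages = CLAIMED_PAGES / 3
raw_pb = raw_html_petabytes(pages, AVG_HTML_BYTES)
print(f"pages to index: {pages:.2e}")
print(f"raw HTML:       {raw_pb:.1f} PB")
print(f"replicated x{REPLICATION}:  {replicated_petabytes(raw_pb, REPLICATION):.1f} PB")
```

This only covers storing the fetched HTML once or a few times. It says nothing about crawl bandwidth, the index itself, or the in-memory and SSD serving layers, which is where the comment argues the real expense lies.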
Today's search engines are not just indexes of web pages. They give direct answers to queries like "when did Lincoln die", they show detailed street and satellite maps with Street View in search results, they act as business directories and people finders, they have elaborate image search (again, super expensive to do), they recognize objects in images, they have fresh news search across thousands of sources, they can do video search, they have scans of hundreds of thousands of books, they catalog millions of products, and so on. Even getting some of this high-quality data, such as satellite maps, books, business listings and product catalogues, would cost ~100 million dollars in licensing deals.
Even if you managed to hire the world's most productive programmers, you would need at least 300 people by my most minimal estimate, and at least 2-4 years, before you could build anything with a non-negligible chance of competing. That's about $75M per year right there at the ultra-minimal end. Of course, this assumes you already have a bigger breakthrough than PageRank and that you can beat state-of-the-art machine learning and natural language processing techniques. Hopefully you can now see that, for all intents and purposes, the search business is closed to attack by startups. I can imagine Facebook and Apple getting into this business sooner or later, but it would be an uphill battle for them, primarily because they lack the talent that the search engine business requires and, more importantly, the data that only Google has from billions of queries and the users making them. You can build Airbnb with smart college hires, but building a search engine truly needs the cream of the crop: people who are a rare blend of computer scientist and part-time mathematician while also being exceptionally productive applied programmers. Google has been working for years to sweep up pretty much all the talent in this area, and it would take most competitors significant effort to build this kind of army.
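The staffing numbers above imply a per-person cost that is worth spelling out. A quick sketch, using only the 300 people, $75M per year and 2-4 year figures from the paragraph above (the function names are hypothetical, and "fully loaded cost per person" is my reading of what the $75M covers):

```python
# Back-of-envelope: what the quoted staffing figures imply.
HEADCOUNT = 300            # minimum team size quoted above
ANNUAL_COST_USD = 75e6     # "$75M of cost per year" quoted above


def cost_per_person(annual_cost: float, headcount: int) -> float:
    """Implied yearly cost per person (salary plus overhead, by assumption)."""
    return annual_cost / headcount


def cumulative_cost(annual_cost: float, years: int) -> float:
    """Total people cost before the product is competitive, at a flat burn rate."""
    return annual_cost * years


print(f"implied cost per person: ${cost_per_person(ANNUAL_COST_USD, HEADCOUNT):,.0f}/year")
for years in (2, 4):
    total = cumulative_cost(ANNUAL_COST_USD, years)
    print(f"{years} years of runway:       ${total / 1e6:,.0f}M")
```

So the "ultra-minimal" people budget alone lands somewhere between $150M and $300M before launch, on top of the hardware and licensing costs mentioned earlier.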
PS: DuckDuckGo is not a "real" search engine. It has very little of its own index, and for most queries it just rearranges results fetched mostly from Bing while inserting links from its own small index. If Bing shut it out for leeching off them, it would be toast.
A serious contender to Google isn't going to be a better Google (or even a comparable one), exactly for the reasons you mention. It's going to be something else altogether.
I can't tell you what the answer is, but I am pretty sure that it's outside the box you've built yourself into.
Great write-up and breakdown of why search is hard and how unique a position Google is in. That said, searching the entire web isn't a feasible business for most companies now, so most of the other giants competing with Google instead take the asymmetric approach of bringing content into their own garden and building a specialized search engine around it. Since the data in their own garden is structured, the amount of work needed to build a search engine of acceptable quality is much smaller. See Apple's App Store, Facebook search, Twitter's search, Amazon's A9, etc.