
An aggressive stemmer might stem both "generic" and "general" to "gener".

Then if your query is "what documents contain 'generic'?", you look in the index for "gener", open each of those documents, and check whether it actually contains "generic" using a stricter stemmer (one that accepts generic{,s}, genericness{,es}, genericit{y,ies}, generically ... this is a bit of a bad example since they all share the prefix directly). The cost is acceptable as long as both words have about the same frequency: the false positives then add only a constant factor of extra checks, so the big O doesn't change.
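To make the two-stage lookup concrete, here is a rough sketch. Both stemmers are toy stand-ins I made up for illustration (the aggressive one just keeps the first five letters, the strict one strips a short hard-coded suffix list), and the names `aggressive_stem`, `strict_stem`, `build_index` and `search` are hypothetical, not from any real library.

```python
import re
from collections import defaultdict

# Hypothetical suffix list for the strict stemmer; longest variants first.
STRICT_SUFFIXES = ("nesses", "ness", "ally", "ities", "ity", "s")

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def aggressive_stem(word):
    # Toy assumption: keep the first 5 letters, so "generic" and
    # "general" both collapse to "gener".
    return word.lower()[:5]

def strict_stem(word):
    # Toy assumption: strip one known suffix, keeping at least 3 letters.
    word = word.lower()
    for suffix in STRICT_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_index(docs):
    # Index time: map each aggressive stem to the documents containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[aggressive_stem(token)].add(doc_id)
    return index

def search(query_word, index, docs):
    # Query time: fetch candidates by aggressive stem, then "open" each
    # candidate and keep it only if the strict stemmer confirms a match.
    target = strict_stem(query_word)
    candidates = index.get(aggressive_stem(query_word), set())
    return sorted(
        doc_id
        for doc_id in candidates
        if any(strict_stem(tok) == target for tok in tokenize(docs[doc_id]))
    )

DOCS = {
    "a": "A generic solution",
    "b": "The general case",
    "c": "Written generically",
}
INDEX = build_index(DOCS)
```

With these toy stemmers all three documents land under "gener" in the index, and the strict pass at query time is what separates the "generic" documents from the "general" one.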

Of course, if you have any decent amount of compute, you can hard-code exceptions before building the index to do less work at query time. That does mean you have to rebuild the index if your exception list changes, or at least the part of the index for the specific part of the trie whose exception lists changed (you don't do a global lookup!). But regardless, you still have to do some of this at query time to handle nasty cases like lie/lay/laid (searching for "lie" should not return "laid" or vice versa, but "lay" should match both of the others) or do/does/doe (a more obviously unrelated example).
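One way to picture the lie/lay/laid case is as a hand-written exception table where an ambiguous form belongs to more than one base word, and two words match only if their sets overlap. This is just a sketch under that assumption; the table contents and the names `EXCEPTIONS`, `base_forms` and `matches` are made up for illustration.

```python
# Hypothetical exception table: each surface form maps to every base
# word it can be an inflection of.
EXCEPTIONS = {
    "lie": {"lie"},
    "lay": {"lie", "lay"},   # past tense of "lie", and a verb of its own
    "laid": {"lay"},
    "do": {"do"},
    "does": {"do", "doe"},   # form of "do", and plural of "doe"
    "doe": {"doe"},
}

def base_forms(word):
    # Words without an exception entry just stand for themselves here;
    # a real system would fall back to its regular stemmer instead.
    word = word.lower()
    return EXCEPTIONS.get(word, {word})

def matches(query, token):
    # Two words match when they share at least one base word.
    return not base_forms(query).isdisjoint(base_forms(token))
```

Here `matches("lay", "lie")` and `matches("lay", "laid")` both hold, while `matches("lie", "laid")` does not, which is the asymmetry described above.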




> and then open each of those documents

That alone ruled out doing anything like this on the device I'm talking about. The goal, which we reached, was to be able to search 1,000 documents in 5 seconds. Opening a document took nearly a second given the speed of our storage (a few KB/s). The search itself took about a second, and then we'd open up just enough of the documents to construct search result snippets as you paged through them.


Gosh this story makes me lament the state of our field.

If the current gen of devs were to build this, it would all be done "on the cloud" where they can just throw compute at the problem, and as long as the cost was less than $5 per month they wouldn't care. That's the problem of the product managers, marketing execs and VCs.


I know exactly what you're talking about. The product manager on the project described above added little value. Luckily, they were so ineffective that they didn't get in the way often. I've had others who were "excellent" at getting in the way.

That said, three of the most impressive people I've ever known are a former marketing exec and two former product managers, all of whom now work in VC. In their former roles, each helped me be the best engineer I could be. The people in their current VC portfolios are lucky to have them as advisors. What makes them so good is that they bring expertise worth listening to, and they clearly value the technical expertise brought by engineers. The result is fantastic collaboration.

They are far from typical, but there are truly great ones out there. Losing hope of that might make it harder to recognize the good fortune of working with one, and to make the most of the experience. My hope is that every engineer gets at least one such experience in their career. I was lucky enough to experience it repeatedly, working with at least one great one for about half of my 30-year career.


This lament is about as interesting as complaining about kids not knowing how to use rotary phones.



