scroyston's comments

I'm all for data over code, but I'd use a Rete network so my program didn't run in O(DISTRIBUTIONS * RULES) time. Yes, data design is important, but you need to be algorithm-savvy in order to know the best way to design your data structures.


I think that in this case you get a non-trivial performance gain with some basic implementation improvements. You can replace the row-by-row round trip with a single SQL UPDATE with a LIMIT (or a LIMITed subquery if the server requires it).
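A sketch of that batching, using SQLite and a made-up `items` table. Stock SQLite builds don't accept UPDATE ... LIMIT directly, so this uses the portable LIMITed-subquery form:

```python
import sqlite3

# Hypothetical schema: rows flagged as pending an update.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO items (status) VALUES (?)", [("pending",)] * 10)

# One statement updates a bounded batch, replacing a fetch-then-update
# round trip per row.
conn.execute("""
    UPDATE items
       SET status = 'done'
     WHERE id IN (SELECT id FROM items WHERE status = 'pending' LIMIT 3)
""")
conn.commit()

done = conn.execute(
    "SELECT COUNT(*) FROM items WHERE status = 'done'").fetchone()[0]
print(done)  # 3
```

On servers that support it (e.g. MySQL), `UPDATE items SET status = 'done' WHERE status = 'pending' LIMIT 3` does the same thing in one clause.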

Also, the performance of the algorithm in the essay is linear in the number of rows requiring an update. Can you elaborate on whether the Rete algorithm can do any better in this case?

And finally, thanks for the note on Rete; I'll have to investigate it.


Comment space is a bit limited for an adequate explanation. For rules that have straightforward (but possibly compound) predicates, Rete will give you O(1) lookup. From Wikipedia: "In most cases, the speed increase over naïve implementations is several orders of magnitude (because Rete performance is theoretically independent of the number of rules in the system)."
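The core idea behind that O(1) lookup can be sketched with a hash-indexed alpha memory (a heavily simplified, hypothetical fragment of Rete; a real engine also builds a beta network for joins across conditions):

```python
from collections import defaultdict

# Hypothetical single-condition rules, each keyed by one (attribute, value) test.
rules = [
    {"if": ("color", "red"),     "then": "stop"},
    {"if": ("color", "green"),   "then": "go"},
    {"if": ("shape", "octagon"), "then": "stop"},
]

# Compile step: hash each rule under its condition, done once up front.
alpha = defaultdict(list)
for rule in rules:
    alpha[rule["if"]].append(rule["then"])

def match(fact):
    # One dict lookup per fact, independent of len(rules),
    # versus scanning every rule for every fact.
    return alpha.get(fact, [])

print(match(("color", "red")))  # ['stop']
```

Matching a fact costs the same whether there are three rules or three million, which is where the "independent of the number of rules" claim comes from.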

http://en.wikipedia.org/wiki/Rete_algorithm

My favorite reference on the subject is at the bottom of the wikipedia page: "Production Matching for Large Learning Systems - R Doorenbos"

Sadly (and strangely), many of the "Rete engines" today do it wrong and will not give you efficient lookups.


It scales fairly well, but my target market is all the Excel analytics users who end up emailing their views around (and end up with version issues, data issues, etc.). I have a good deal of experience with large enterprise DWs (SAP BW, Cognos, etc.). It's amazing how often people end up extracting the subset of information they need and then doing their work in Excel (usually because of performance and IT bottlenecks).

Thanks for the feedback.


Yes, I've seen Tableau; I think they're awesome. However, their client application is desktop-based (though they have recently started doing stuff online, see Tableau does Web 0.2: http://www.intelligententerprise.com/blog/archives/2007/11/t...). I'm primarily targeting this at Excel pivot table/analytics users, who usually end up emailing 40MB files around to distribute the views they've built.

The idea is with Eureka, you can do it online, and easily distribute what you've built via a simple URL.

Thanks for the feedback.



