Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Damn Cool Algorithms, Part 1: BK-Trees (notdot.net)
177 points by llambda on Dec 5, 2011 | hide | past | favorite | 13 comments



I couldn't get the link to the Python version to work, but I wrote my own a while back.

https://github.com/sholiday/pyBKTree



Here's where the python one above (as well as a different haskell version) lives now: https://github.com/ahupp/bktree


IMHO the explanation on the Lisp implementation page is much clearer then the article.


If you hang out in the lisp community you will find this to be a recurring theme - the quality of the community and the textbooks makes lisp a great learning language even if you never write a line of lisp code in production. fwiw as a current outsider the haskell community gives me the same feeling that the discourse in the community is of pretty high quality.


What an odd thing to say! The Lisp implementation page has no explanation that I could find of either how to build the tree or how to search it. Or did you mean you found the source code to be a clearer explanation?

Edit: The lisp page does have a nice picture of a tree. Did you mean that looking at that picture you could figure out both the algorithm for building the tree and for querying it and the proof that these are correct faster than by reading Nick's blog post?


I meant that the picture explains the tree's structure much clearer than the blog post, which is probably not surprising. Didn't really look at how to build or query the tree.


This seems similar to the VP-Tree, which there was a post about a few days ago, except that VP-Trees use binary partitioning because they're measuring similarity in a continuous space, while these ones use discrete buckets because similarity is measured discretely.


The VP article linked to this one as background material. That is where OP found this article to repost for karma.


i especially like the part where it says. "The only copy of this online seems to be in the ACM archive, which is subscription only."


To clarify, Nick Johnson is referring to the original paper on the topic ("Some approaches to best match file searching" by Burkhard and Keller in 1973). It's a common frustration that ACM archives are not freely available to the public.

In the comments, "Sarah" notes that the paper can be found here:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.160...

BTW, Nick's Blog often has gems and is well worth a regular read.


The SISAP metric spaces library comes with an implementation: http://sisap.org/Metric_Space_Library.html You can also compare BK-trees with other methods and you may encounter that their performance is not so great, specially for high dimensional spaces.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: