Show HN: SpaCy – Build tomorrow's language technology products with Python (spacy.io)
93 points by syllogism on Aug 20, 2015 | 10 comments



If anyone is interested in NLP and hasn't had the chance to use spaCy yet, give it a go now - it's just two commands: "pip install spacy" and "python -m spacy.en.download" =]

spaCy is perfect for web-scale NLP (a term I don't use lightly, considering I crawl billions of pages a month) and is AGPLv3 for hobbyists / academics / open source developers. It comes with an existing language model for English, a concise API, and many other goodies baked in by default (e.g. word vector representations). Even though it's in Python and is near state-of-the-art in terms of accuracy, it's far faster than just about any other parser out there. Finally, the demos and documentation are consistently strong and continuously improving.
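Word vectors are typically compared with cosine similarity: two words count as "similar" when their vectors point in roughly the same direction. Here's a minimal pure-Python sketch of that calculation. The helper `cosine_similarity` and the three-dimensional `toy_vectors` are made up for illustration; they are not spaCy's API or its actual model data.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Treat an all-zero vector (e.g. an out-of-vocabulary word) as dissimilar.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional "embeddings"; real models use hundreds of dimensions.
toy_vectors = {
    "king": [0.8, 0.65, 0.1],
    "queen": [0.75, 0.7, 0.15],
    "banana": [0.05, 0.1, 0.9],
}

for other in ("queen", "banana"):
    sim = cosine_similarity(toy_vectors["king"], toy_vectors[other])
    print(f"king ~ {other}: {sim:.3f}")
```

Cosine similarity ignores vector length and looks only at direction, which is why it's the usual choice for comparing embeddings.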

I'd highly recommend checking out the "marking adverbs" tutorial, which shows how to develop features you might use in a proofreading tool in a few dozen lines of Python.

http://spacy.io/tutorials/mark-adverbs/
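For a feel of the idea without installing anything, here's a simplified sketch of adverb marking on already-tagged text. It assumes the input is a list of hypothetical `(word, tag)` pairs standing in for a tagger's output, with adverbs tagged `ADV`. The function `mark_adverbs` is illustrative only; it is not spaCy's API and not necessarily the rule the tutorial uses.

```python
# Hypothetical input format: (word, coarse POS tag) pairs, standing in for the
# output of a part-of-speech tagger. Adverbs are assumed to carry the tag "ADV".
TaggedToken = tuple[str, str]


def mark_adverbs(tokens: list[TaggedToken], marker: str = "**") -> str:
    """Rebuild the sentence, wrapping every token tagged ADV in `marker`."""
    pieces = []
    for word, tag in tokens:
        pieces.append(f"{marker}{word}{marker}" if tag == "ADV" else word)
    return " ".join(pieces)


sentence = [
    ("She", "PRON"), ("quickly", "ADV"), ("and", "CCONJ"),
    ("quietly", "ADV"), ("closed", "VERB"), ("the", "DET"), ("door", "NOUN"),
]
print(mark_adverbs(sentence))
# → She **quickly** and **quietly** closed the door
```

Joining with single spaces is a shortcut that mangles punctuation spacing; a real proofreading tool would keep each token's original whitespace instead.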

n.b. I was in the same NLP research group with Matthew Honnibal whilst he was obtaining his PhD at the University of Sydney, so I'm biased, but only as I've seen his work in person!


Are languages other than English supported? I suppose so, but then how do I add another language? How will it perform?


It only supports English at the moment[0].

[0]http://spacy.io/#comparisons


So I saw it was English-only, hence my question. How should I go about building support for another language? I skimmed the docs but didn't see anything related to that. Also, since it is AGPL, I wonder whether you can use TreeBank for training.


This is great! The web demo parses ordinary sentences with pretty good accuracy. I had to resort to twisters to confuse it:

The old man the boat. (The elderly staff positions on the boat.)

Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo. (See https://simple.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_bu... - it's not a fair test, but it illuminates how the underlying engine works.)
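What makes "The old man the boat" tricky is that "man" is the main verb and "old" is its subject. Below is a hand-written sketch of that intended reading as a dependency parse: each token stores the index of its head (`None` for the root) plus a label. The labels are assumed ClearNLP-style names, and this is the correct reading, not spaCy's actual output.

```python
from typing import Iterator, Optional

words = ["The", "old", "man", "the", "boat", "."]
# Index of each token's syntactic head; None marks the root of the sentence.
heads: list[Optional[int]] = [1, 2, None, 4, 2, 2]
labels = ["det", "nsubj", "ROOT", "det", "dobj", "punct"]


def arcs(
    words: list[str], heads: list[Optional[int]], labels: list[str]
) -> Iterator[tuple[str, str, str, str]]:
    """Yield (head_word, label, dependent_word, direction) for each non-root token."""
    for i, (head, label) in enumerate(zip(heads, labels)):
        if head is None:
            continue
        # The arrow runs from head to dependent, so a head to the right points left.
        direction = "left" if head > i else "right"
        yield words[head], label, words[i], direction


for head_word, label, dep_word, direction in arcs(words, heads, labels):
    print(f"{head_word} --{label}--> {dep_word} (points {direction})")
```

A parser that tags "man" as a noun can't produce this tree, because the sentence is then left without a verb to attach everything to.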

I'm going to point this project out to a teacher I had and see how it does with other corpora.


It's interesting, for sure, but it doesn't take trick sentences to confuse it. This somewhat opaque but totally realistic sentence:

http://spacy.io/displacy/?full=super%20llamas%20find%20a%20w...

caused it to get all the parts of speech correct but miss pretty badly on how the words attach to one another (i.e., the arrows point to the wrong words, or in some cases even in the wrong direction).
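Getting every part-of-speech tag right while still attaching words wrongly is common, which is why parsers are usually scored on attachment rather than tagging. A minimal sketch of the standard metrics follows. Unlabeled attachment score (UAS) is the share of tokens with the correct head. Labeled attachment score (LAS) also requires the correct label. The sentence and the "predicted" parse are hypothetical; they are not spaCy's output on the sentence linked above.

```python
from typing import Optional


def attachment_scores(
    gold_heads: list[Optional[int]],
    gold_labels: list[str],
    pred_heads: list[Optional[int]],
    pred_labels: list[str],
) -> tuple[float, float]:
    """Return (UAS, LAS) for one sentence; None marks the root's head."""
    if not (len(gold_heads) == len(gold_labels) == len(pred_heads) == len(pred_labels)):
        raise ValueError("all sequences must cover the same tokens")
    n = len(gold_heads)
    if n == 0:
        return 0.0, 0.0
    head_ok = [g == p for g, p in zip(gold_heads, pred_heads)]
    uas = sum(head_ok) / n
    las = sum(ok and gl == pl for ok, gl, pl in zip(head_ok, gold_labels, pred_labels)) / n
    return uas, las


# Hypothetical example: "She ate pizza with a fork"
labels = ["nsubj", "ROOT", "dobj", "prep", "det", "pobj"]
gold_heads = [1, None, 1, 1, 5, 3]  # "with" attaches to the verb "ate"
pred_heads = [1, None, 1, 2, 5, 3]  # a plausible error: "with" attached to "pizza"
uas, las = attachment_scores(gold_heads, labels, pred_heads, labels)
print(f"UAS={uas:.3f} LAS={las:.3f}")
```

In this toy case every tag could be correct, yet one misattached preposition already costs a sixth of the attachment score.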


The "Under Construction" theme of the dependency parse tree visualization tool[0] is amazing.

[0] http://spacy.io/displacy/



Is there a glossary of annotations? For example, what is "RELCL?"


I can't be certain without the sentence that generated the tag, but coming from a linguistics background, I'd say RELCL is probably 'relative clause'.
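Until there's an official glossary, a small lookup table works. The sketch below hand-picks a few labels that I'm assuming follow the ClearNLP-style scheme spaCy's English parser uses; the descriptions are paraphrased and the list is far from complete. `DEP_GLOSSARY` and `explain_label` are hypothetical names, not part of spaCy. (Later spaCy releases ship a `spacy.explain()` helper for this kind of lookup.)

```python
# Hand-picked, non-exhaustive subset of ClearNLP-style dependency labels
# (assumed to match spaCy's English scheme); descriptions are paraphrased.
DEP_GLOSSARY = {
    "nsubj": "nominal subject",
    "dobj": "direct object",
    "det": "determiner",
    "amod": "adjectival modifier",
    "advmod": "adverbial modifier",
    "prep": "prepositional modifier",
    "pobj": "object of preposition",
    "relcl": "relative clause modifier",
    "ROOT": "root of the sentence",
}


def explain_label(label: str) -> str:
    """Case-insensitive lookup, since the visualizer shows labels upper-cased (e.g. 'RELCL')."""
    for key, description in DEP_GLOSSARY.items():
        if key.lower() == label.lower():
            return description
    return "unknown label"


print(explain_label("RELCL"))  # → relative clause modifier
```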



