
Nice to see an alternative, or rather a complementary library, to the well-worn NLTK.

Hopefully Stanza allows supplemental training of its models.



NLTK, for my uses, has been dethroned by spaCy for years now. I'm very curious to see how Stanza compares. It looks like it's built on PyTorch, so I'm very interested to check it out.


For a lot of things, yes, but NLTK still has a much bigger and better variety of tokenizers.


Interestingly, I had trouble using spaCy, as it needed an internet connection to AWS to download its models before it would work. This was blocked by deep packet inspection in my use case.

I looked a bit for a workaround but finally decided to just do the work in NLTK.

That's one disadvantage of spaCy.


You can absolutely download the models locally:

  # download best-matching version of specific model for your spaCy installation
  python -m spacy download en_core_web_sm
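
Once the model is downloaded, it is installed as a regular Python package, so loading it should not need the network. A minimal sketch of a pre-flight check, assuming that packaging behaviour; the helper names `model_is_installed` and `load_offline` are made up for illustration and are not part of spaCy:

```python
import importlib.util

# Model name taken from the download command above.
MODEL_NAME = "en_core_web_sm"


def model_is_installed(name: str) -> bool:
    """Return True if a top-level package called `name` is importable locally."""
    return importlib.util.find_spec(name) is not None


def load_offline(name: str = MODEL_NAME):
    """Load a spaCy pipeline from the local install only; never try to download."""
    if not model_is_installed("spacy"):
        raise RuntimeError("spaCy is not installed")
    if not model_is_installed(name):
        raise RuntimeError(
            f"{name} is not installed; run: python -m spacy download {name}"
        )
    import spacy  # imported lazily so the check above works without spaCy

    return spacy.load(name)
```

On a locked-down machine you could run the download step wherever network access is allowed, install the resulting package on the target, and then call `load_offline()` there.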


If I recall correctly, you can pre-download the models you need, which might be suitable for your use case.


Gensim and spaCy have been in that space for a while now as well.



