Show HN: Pydb – a lightweight database with Python syntax queries, using ZeroMQ (github.com/asrp)
130 points by asrp on Jan 28, 2017 | 31 comments

There's already an (albeit deprecated) debugger package called pydb (http://bashdb.sourceforge.net/pydb/). It would be good to choose a different name for this, most importantly because `pip install pydb` is already taken.

Reminds me of the time I lost a solid hour because the pip repository name was different than the module name.

Thank you! I looked up databases to see if there were obvious name clashes but never thought of debuggers despite using `pdb` a lot.

I'm open to suggestions for names if anyone has one. (Although I guess it'd be nice if the top discussion thread wasn't about naming.)

EDIT: I just saw you opened an issue on github so I'm happy to take name suggestions there.

I've just changed the project's name to pyzdb. Thanks again!

It might not hold up to heavy use cases, but I can already imagine a bunch of ways this would make my life easier (I use Python scripts to handle a bunch of social media stuff and basic analytics, for instance, where I use either text files or some other proxy to handle state.)

The only request I'd have is a syntactically saccharine way of spinning up the server within the client itself, which is in general an awful idea but would make my life easier for toy use cases.

Thanks! I'm glad to see this looks potentially useful to someone other than myself.

If the client could start servers (other than by importing from server.py), what would be the intended outcome if two clients try to spin up servers at the same time? Otherwise would a file lock do? In that case, just file locking combined with pickle could possibly be enough for your needs.
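
For the "file locking combined with pickle" route, a minimal sketch might look like the following. It assumes a POSIX system (`fcntl` is not available on Windows), and `update_state`, `add_visit` and the `.lock` suffix are names made up for illustration, not part of this project.

```python
import fcntl
import os
import pickle
import tempfile


def update_state(path, update):
    """Load pickled state from path, apply update(state), and save it back,
    holding an exclusive lock for the whole read-modify-write cycle."""
    lock_path = path + ".lock"  # hypothetical naming convention
    with open(lock_path, "w") as lock_file:
        # Blocks until any other process holding the lock releases it.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            try:
                with open(path, "rb") as f:
                    state = pickle.load(f)
            except FileNotFoundError:
                state = {}  # first run: start from an empty dict
            state = update(state)
            # Write to a temp file, then rename, so a crash mid-write
            # cannot leave a truncated pickle behind.
            fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
            with os.fdopen(fd, "wb") as f:
                pickle.dump(state, f)
            os.replace(tmp_path, path)
            return state
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def add_visit(state):
    # Example update function: bump a counter stored in the state dict.
    state["visits"] = state.get("visits", 0) + 1
    return state
```

Each script would then call something like `update_state("state.pickle", add_visit)`. Two scripts doing so at the same time simply take turns, so no server is needed.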

+1 (I find it useful too). I actually started writing my own implementation of something similar (named "PersistendDict" - feel free to steal the name ;) ) but never finished it so I will definitely check out your library. Thanks!

As for starting / stopping the server... I personally don't care whether that functionality is there or not, as I can always start the server manually on a dev machine, and it should be started differently on production machines anyway. But that's just my opinion.

I wrote a similar library -BlitzDB- a while ago:


It's a pure Python database engine with a MongoDB-like query engine and support for three different backends: File (native), SQL (via SQLAlchemy) and MongoDB.

The library transparently translates a large number of MongoDB queries into SQL or its own native storage backend, and when using the SQL backend it can do things that MongoDB can't, like queries spanning multiple relationships.

The latest version is not fully documented yet but I'm using it on several production projects myself. I'm looking for a maintainer and contributors btw, so if you're interested feel free to get in touch with me!

Interesting! I will have to take a deeper look at BlitzDB. I don't know much about MongoDB (so probably wouldn't be a good maintainer) but the sample code looks good. I can't tell how it handles nesting though.

What's the purpose/gain of layering ZMQ into this? I read the architecture bit but I'm still unclear as to what benefit this brings. I guess it allows for multiple clients to use the database at the same time? I can see how the queueing thing is useful for writes if you don't want to have to handle more than one at the same time for the sake of complexity, but wouldn't doing this for reading slow things down unnecessarily?

As you correctly observed, I mainly used ZeroMQ for the fan-in, so I need only consider one request at a time without worrying about chunking, disconnection or other lower (socket) level issues.
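
To make the fan-in idea concrete without ZeroMQ, here is a rough standard-library analogue using `queue.Queue` and threads. This is not how the project is implemented. The names `server`, `call`, `increment` and `client` are made up for the sketch. It only shows why one consumer on one queue means the handler never deals with two requests at once.

```python
import queue
import threading

requests = queue.Queue()  # the single fan-in point for all clients
data = {}                 # state touched only by the server thread


def server():
    # Handles exactly one request at a time, in arrival order.
    while True:
        item = requests.get()
        if item is None:  # shutdown signal
            break
        func, args, reply = item
        reply.put(func(data, *args))


def call(func, *args):
    # Client side: enqueue a request and block until its reply arrives.
    reply = queue.Queue(maxsize=1)
    requests.put((func, args, reply))
    return reply.get()


def increment(db, key):
    db[key] = db.get(key, 0) + 1
    return db[key]


def client():
    for _ in range(100):
        call(increment, "hits")


server_thread = threading.Thread(target=server)
server_thread.start()

clients = [threading.Thread(target=client) for _ in range(4)]
for c in clients:
    c.start()
for c in clients:
    c.join()

total = call(lambda db: db["hits"])  # all 4 * 100 increments were applied
requests.put(None)
server_thread.join()
```

No lock is needed around `data` because only the server thread ever touches it. The cost is that reads wait in the same line as writes.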

For speed, the idea is that you could potentially have multiple read-only servers answering queries simultaneously (all taking from the dealer). This isn't fleshed out yet. It possibly involves splitting requests into two queues for read and write requests (instead of only "run").

I'd be interested in hearing any info about potential slowdowns if you have them.

Thanks for the reply, that clears things up!

> I'd be interested in hearing any info about potential slowdowns if you have them.

I figured if you have a central queue that everything needs to go through then you'd also be limited to a single read at a time. But if you can have multiple read-only secondaries then that's unlikely to be an issue.

You could use `-e git://github.com/asrp/undoable` in requirements.txt to save some steps.

Oh, thanks, I wasn't aware of that flag. Though for the moment

    pip install -e git://github.com/asrp/undoable#egg=undoable
tells me `setup.py` doesn't exist (because it doesn't). I haven't gotten around to packaging yet, not knowing (before today) if anyone's interested.

I'll probably look into that soon.

I decided to push a quick patch to github so installation with just `pip install -r requirements.txt` is possible now. I'll look into proper packaging later on. Thanks again!

Will this code not cause issues? I know that you aren't modifying args or kwargs in the _run method, but it seems like a potential point of failure or a Python anti-pattern:

    def _run(self, func=None, args=(), kwargs={}):

Yes, indeed! Thanks for pointing that out. I actually saw that when I was cleaning this up a bit for release and couldn't make up my mind.

I mean I'm not modifying args or kwargs now but if I did later, I could shoot myself in the foot in a not so obvious way. But on the other hand, I don't know a succinct way to express these default values. I'd probably go with `args=None, kwargs=None` and then `args = args if args else ()`.
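
For anyone who hasn't run into this before, here's a small illustration of the foot-gun. The functions `count_calls_shared` and `count_calls_fresh` are hypothetical and not taken from the project. Note that `args=()` is harmless because tuples are immutable; it's only the `{}` default that can bite.

```python
def count_calls_shared(kwargs={}):
    # The default dict is created once, at function definition time,
    # so every call that omits kwargs mutates the same object.
    kwargs["calls"] = kwargs.get("calls", 0) + 1
    return kwargs["calls"]


def count_calls_fresh(kwargs=None):
    # A new dict is created on each call that omits kwargs.
    if kwargs is None:
        kwargs = {}
    kwargs["calls"] = kwargs.get("calls", 0) + 1
    return kwargs["calls"]


shared_results = [count_calls_shared(), count_calls_shared()]
fresh_results = [count_calls_fresh(), count_calls_fresh()]
print(shared_results)  # the second call sees state left by the first
print(fresh_results)   # each call starts from an empty dict
```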

I typically do this, but I don't know if it is the most "pythonic":

   def func(list_arg=None, dict_arg=None):
       list_arg = list_arg or []
       dict_arg = dict_arg or {}

It's not ideal. For instance if I as the caller wished to provide an empty dict-like object (e.g. dict_arg=collections.OrderedDict()), then your code would silently ignore it, and use a new dict.

Instead of checking for any object that evaluates to False, you should explicitly check for None, e.g.

   def func(list_arg=None, dict_arg=None):
       if list_arg is None:
           list_arg = []
       if dict_arg is None:
           dict_arg = {}
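
A quick way to see the difference, using two hypothetical helpers (`with_or` and `with_none_check`) that just return whatever dict they end up with:

```python
from collections import OrderedDict


def with_or(dict_arg=None):
    # Any falsy argument, including an empty container, is replaced.
    dict_arg = dict_arg or {}
    return dict_arg


def with_none_check(dict_arg=None):
    # Only a missing (None) argument is replaced.
    if dict_arg is None:
        dict_arg = {}
    return dict_arg


mine = OrderedDict()  # empty, therefore falsy
or_kept_mine = with_or(mine) is mine
none_check_kept_mine = with_none_check(mine) is mine
print(or_kept_mine, none_check_kept_mine)
```

With the `or` version, anything the function later stores in `dict_arg` never reaches the caller's `OrderedDict`.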

This is the pythonic way

  def func(*args, defaulted='default', **kwargs):
      # args is a tuple, kwargs is a dict
      arg0 = args[0] if args else None
      another_kwarg = kwargs.get('another_kwarg', 'default')
      print(arg0, defaulted, another_kwarg)

  >>> func('one', 'two', defaulted='myval')
  one myval default
  >>> func(another_kwarg='myval')    
  None default myval
  >>> args = ('one', 'two')
  >>> kwargs = {'defaulted': 'myval', 'another_kwarg': 'other'}
  >>> func(*args, **kwargs)    
  one myval other

In the cases where I need to do this, I use a singleton:

  DEFAULT=object() # used for no other purpose

  def fun(arg=DEFAULT):
      arg_val = {} if arg is DEFAULT else arg
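
The point of the sentinel over `None` is that `None` stays available as a real value the caller can pass. A small sketch of that, where returning `arg_val` is added here purely so the behaviour can be observed:

```python
DEFAULT = object()  # unique sentinel; nothing else can be identical to it


def fun(arg=DEFAULT):
    # Only a truly omitted argument gets the default dict.
    arg_val = {} if arg is DEFAULT else arg
    return arg_val


omitted = fun()           # no argument: falls back to a new dict
explicit_none = fun(None) # None is passed through untouched
empty_list = fun([])      # falsy values are passed through too
print(omitted, explicit_none, empty_list)
```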

Why would I want to use this over SQLite?

Seems like this could be a good idea for small personal or temporary dashboards, especially those with viz powered by packages that work natively with Python data structures, like bokeh or plotly.

vanilla zeromq is a pretty bad choice for any database. zmq explicitly makes no guarantees about reliable delivery, so losing random inserts or queries here or there would be considered acceptable.

subscribers also lose the first few messages the publisher sends, unless you make sure you start the subscriber first. The publisher will make no indication of which messages are lost and which ones have actually been sent to someone:


I would suggest building something on top of request-reply instead: it's actually possible to build reliable delivery on that.


Rather reminds me of ZODB[0].

[0] http://www.zodb.org

The interface reminds me of tied hashes and lists in Perl, or something like this[1] for Python.


It's superficially similar, but also a very different animal. This here is more of a toy database, or perhaps a better way to navigate huge JSON documents, while ZODB is a/the "real deal" object database.

Reminds me of TinyDB https://tinydb.readthedocs.io

Hmm this is pretty interesting. Going to check this out.

