There's already an (albeit deprecated) debugger package called pydb (http://bashdb.sourceforge.net/pydb/). It would be good to choose a different name for this, most importantly because `pip install pydb` is already taken.
It might not hold up to heavy use cases, but I can already imagine a bunch of ways this would make my life easier (I use Python scripts to handle a bunch of social media stuff and basic analytics, for instance, where I use either text files or some other proxy to handle state.)
The only request I'd have is a syntactically saccharine way of spinning up the server within the client itself, which is in general an awful idea but would make my life easier for toy use cases.
Thanks! I'm glad to see this looks potentially useful to someone other than myself.
If the client could start servers (other than by importing from server.py), what would be the intended outcome if two clients try to spin up servers at the same time? Otherwise would a file lock do? In that case, just file locking combined with pickle could possibly be enough for your needs.
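For what it's worth, the "file lock plus pickle" idea could look roughly like the sketch below. It is not part of the library; `update_state` and the one-pickle-per-file layout are made up for illustration, and `fcntl` is Unix-only, so this assumes Linux or macOS.

```python
import fcntl
import os
import pickle
import tempfile


def update_state(path, mutate):
    """Load a pickled dict, apply mutate(state), write it back under an exclusive lock."""
    # Hypothetical layout: the whole state lives in one pickle file.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    with os.fdopen(fd, "r+b") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks while another process holds the lock
        try:
            data = f.read()
            state = pickle.loads(data) if data else {}
            mutate(state)
            # Rewrite the file from the start with the new state.
            f.seek(0)
            f.truncate()
            pickle.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
    return state


# Usage: two successive updates against a throwaway file.
path = os.path.join(tempfile.mkdtemp(), "state.pkl")
update_state(path, lambda s: s.update(posts=1))
final = update_state(path, lambda s: s.update(posts=s["posts"] + 1))
print(final)
```

Every script that touches the file has to go through the same function for the lock to mean anything, since `flock` is advisory.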
+1 (I find it useful too). I actually started writing my own implementation of something similar (named "PersistendDict" - feel free to steal the name ;) ) but never finished it so I will definitely check out your library. Thanks!
As for starting/stopping the server... I personally don't care whether that functionality is there or not, as I can always start the server manually on a dev machine, and it should be started differently on production machines anyway. But that's just my opinion.
It's a pure Python database engine with a MongoDB-like query engine and support for three different backends: File (native), SQL (via SQLAlchemy) and MongoDB.
The library transparently translates a large number of MongoDB queries into SQL or its own native storage backend, and when using the SQL backend it can do things that MongoDB can't, like queries spanning multiple relationships.
The latest version is not fully documented yet but I'm using it on several production projects myself. I'm looking for a maintainer and contributors btw, so if you're interested feel free to get in touch with me!
Interesting! I will have to take a deeper look at BlitzDB. I don't know much about MongoDB (so probably wouldn't be a good maintainer) but the sample code looks good. I can't tell how it handles nesting though.
What's the purpose/gain of layering ZMQ into this? I read the architecture bit but I'm still unclear as to what benefit this brings. I guess it allows for multiple clients to use the database at the same time? I can see how the queueing thing is useful for writes if you don't want to have to handle more than one at the same time for the sake of complexity, but wouldn't doing this for reading slow things down unnecessarily?
As you correctly observed, I mainly used ZeroMQ for the fan-in, so I need only consider one request at a time without worrying about chunking, disconnection or other lower (socket) level issues.
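To illustrate just the fan-in idea (many clients, one request handled at a time), here is a toy sketch that swaps ZeroMQ for the standard library's `queue.Queue` and threads. It is not the library's code; `server`, `client` and the request tuple format are invented for the example.

```python
import queue
import threading

# Single fan-in point, standing in for the ZeroMQ socket the server reads from.
requests = queue.Queue()


def server(db):
    """Handle exactly one request at a time, in arrival order."""
    while True:
        item = requests.get()
        if item is None:  # sentinel: shut down
            break
        key, value, reply = item
        db[key] = db.get(key, 0) + value
        reply.put(db[key])


def client(key, value):
    """Send one request and block until the server replies."""
    reply = queue.Queue(maxsize=1)
    requests.put((key, value, reply))
    return reply.get()


db = {}
worker = threading.Thread(target=server, args=(db,))
worker.start()

# Fifty concurrent clients, all funnelled through the one queue.
clients = [threading.Thread(target=client, args=("hits", 1)) for _ in range(50)]
for t in clients:
    t.start()
for t in clients:
    t.join()

requests.put(None)
worker.join()
print(db["hits"])
```

Because only the server thread ever touches `db`, no locking is needed around the update, which is the simplification the fan-in buys.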
For speed, the idea is that you could potentially have multiple read-only servers answering queries simultaneously (all taking from the dealer). This isn't fleshed out yet. It possibly involves splitting requests into two queues for read and write requests (instead of only "run").
I'd be interested in hearing any info about potential slowdowns if you have them.
> I'd be interested in hearing any info about potential slowdowns if you have them.
I figured if you have a central queue that everything needs to go through then you'd also be limited to a single read at a time. But if you can have multiple read-only secondaries then that's unlikely to be an issue.
I decided to push a quick patch to GitHub, so installation with just `pip install -r requirements.txt` is possible now. I'll look into proper packaging later on. Thanks again!
Will this code not cause issues? I know that you aren't modifying args or kwargs in the `_run` method, but it just seems like a potential point of failure or a Python anti-pattern.
Yes, indeed! Thanks for pointing that out. I actually saw that when I was cleaning this up a bit for release and couldn't make up my mind.
I mean I'm not modifying args or kwargs now but if I did later, I could shoot myself in the foot in a not so obvious way. But on the other hand, I don't know a succinct way to express these default values. I'd probably go with `args=None, kwargs=None` and then `args = args if args else ()`.
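For anyone following along, the foot-gun in question can be reproduced with a small stand-alone example. `enqueue` is a made-up function, not the library's `_run`; it only assumes the defaults are mutable objects that the body modifies.

```python
def enqueue(command, args=[], kwargs={}):
    """Hypothetical function that (unwisely) mutates its default arguments."""
    # The default list and dict are created once, when the function is defined,
    # so every call that omits them shares the same two objects.
    args.append(command)
    kwargs[command] = len(args)
    return args, kwargs


first_args, first_kwargs = enqueue("get")
second_args, second_kwargs = enqueue("set")

# Both calls returned the very same list and dict objects.
print(first_args is second_args)
print(first_args, first_kwargs)
```

The second call sees the leftovers of the first, which is harmless only for as long as nobody mutates the defaults.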
It's not ideal. For instance, if I as the caller wished to provide an empty dict-like object (e.g. `dict_arg=collections.OrderedDict()`), your code would silently ignore it and use a new dict.
Instead of checking for any object that evaluates to False, you should explicitly check for None, e.g.
def func(list_arg=None, dict_arg=None):
    if list_arg is None:
        list_arg = []
    ...
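The difference between the two checks is easy to see side by side. The function names below are hypothetical and exist only for this comparison.

```python
from collections import OrderedDict


def falsy_check(dict_arg=None):
    # An empty container is falsy, so it gets replaced just like None would.
    dict_arg = dict_arg if dict_arg else {}
    dict_arg["seen"] = True
    return dict_arg


def none_check(dict_arg=None):
    # Only a missing argument triggers the default.
    if dict_arg is None:
        dict_arg = {}
    dict_arg["seen"] = True
    return dict_arg


a = OrderedDict()
r1 = falsy_check(a)  # a is ignored; r1 is a fresh plain dict

b = OrderedDict()
r2 = none_check(b)  # b itself is used and updated

print(type(r1).__name__, len(a))
print(r2 is b, dict(b))
```

With the falsy check the caller's empty `OrderedDict` never receives the update, while the `is None` version writes into the object that was passed in.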
Seems like this could be a good idea for small personal or temporary dashboards, especially those with viz powered by packages that work natively with Python data structures, like Bokeh or Plotly.
Vanilla ZeroMQ is a pretty bad choice for any database. zmq explicitly makes no guarantees about reliable delivery, so losing random inserts or queries here or there would be considered acceptable.
subscribers also lose the first few messages the publisher sends, unless you make sure you start the subscriber first. The publisher will make no indication of which messages are lost and which ones have actually been sent to someone:
It's superficially similar, but also a very different animal. This here is more of a toy database, or perhaps a better way to navigate huge JSON documents, while ZODB is a/the "real deal" object database.