If you've got something simple like a binary file full of uints (e.g. a raw image), then numpy provides fromfile, which is incredibly simple to use. If you have tabular data in a known format, you can also pass fairly complicated datatypes.
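For the tabular case, a structured dtype can describe a whole record at once. The field layout below is invented for illustration, and np.frombuffer stands in for np.fromfile so the snippet doesn't need a file on disk (fromfile accepts the same dtype argument):

```python
import numpy as np

# Invented record layout: (id: u32, x: f32, y: f32), little-endian.
record = np.dtype([("id", "<u4"), ("x", "<f4"), ("y", "<f4")])

# Build some raw bytes to stand in for file contents.
raw = np.array([(1, 0.5, 2.0), (2, 1.5, 3.0)], dtype=record).tobytes()

# One call parses the whole table; fields are accessed by name.
data = np.frombuffer(raw, dtype=record)
# data["id"] -> array([1, 2], dtype=uint32)
```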
I'm trying various not-regex syntaxes. My motivation: we all agree regex is bad, but we keep using it because it's available everywhere and hence the least common denominator (a Schelling point). What would it take to get beyond regex as the go-to for pattern matching/parsing? At least for casual use.
While I understand the ease of use pure Python provides, I think when manipulating real binary or in-memory data, a little bit of C would really simplify and speed things up.
Writing raw C extensions can be a pain, but with Cython or CFFI, using C from Python has become extremely easy.
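Even without Cython or CFFI, the stdlib's ctypes can map a C struct layout directly onto a byte buffer, which covers a lot of the "little bit of C" use case. The header fields below are invented for illustration:

```python
import ctypes

# Invented 8-byte header: magic (u32), width (u16), height (u16),
# little-endian, no padding between fields.
class Header(ctypes.LittleEndianStructure):
    _pack_ = 1
    _fields_ = [
        ("magic",  ctypes.c_uint32),
        ("width",  ctypes.c_uint16),
        ("height", ctypes.c_uint16),
    ]

raw = bytes([0x89, 0x50, 0x4E, 0x47, 0x40, 0x01, 0xF0, 0x00])
hdr = Header.from_buffer_copy(raw)
# hdr.width -> 320, hdr.height -> 240
```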
> when manipulating real binary or in memory data, a little bit of C would really simplify
In what sense? Accessing an octet in a buffer and performing basic bitwise operations on it looks the same in both languages:
# C _and_ Python
some_buffer[idx] & 0x80
Reading a buffer in Python is slightly less error-prone: errors are automatically converted to exceptions, which can be caught at a more centralized location in the code. In C, one must check for (and manually propagate) errors at every read. This applies a bit less in C++, though even there the stdlib leaves some things to be desired (generally speaking, those gaps can be wrapped away).
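A sketch of that centralization, with an invented tag/length record format: a truncated buffer raises struct.error at the read site, and a single handler upstream catches it, instead of a bounds check before every access.

```python
import struct

def parse_record(buf, offset=0):
    # A short buffer raises struct.error here; no manual bounds
    # check is needed at each read (tag/length layout is invented).
    tag, length = struct.unpack_from("<BB", buf, offset)
    return tag, bytes(buf[offset + 2 : offset + 2 + length])

try:
    parse_record(b"\x01")  # truncated record
except struct.error:
    handled = True  # one centralized handler, not per-read checks
```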
Text encoding/decoding, struct packing, and base64 aren't in C or its standard library.
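Each of those is a one-liner in the Python stdlib (the values below are arbitrary):

```python
import base64
import struct

packed = struct.pack("<HI", 7, 0xDEADBEEF)  # struct packing
encoded = "héllo".encode("utf-8")           # text encoding
b64 = base64.b64encode(packed)              # base64
# struct.unpack("<HI", packed) -> (7, 3735928559)
```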
Now, in terms of speed, C wins hands down. (I wrote a Minecraft file parser in both C++ and Python, and the difference was several orders of magnitude.)
Erlang's binary matching is beyond awesome, and OCaml has a pretty good implementation of it as well (Bitstring) [1].
The cool thing about Elixir is, of course, that Erlang's awesome binary handling is available and very easy to use.
Lots of people have been lobbying for something like this in Rust, given its low-level orientation, but there's been resistance to the idea from the core devs, who want it done as a syntax extension.
Agreed. I remember unpacking binary data in Python 2.7 during my late undergrad and finding it much more cumbersome. I've bookmarked the article for the next time I'm doing something like that.
All Python 3 does is add an onion layer around Unicode. Fortunately, it looks like these techniques will work on Python 2.7 or 3.x, with 2.7 obviously being the preference for pretty much everyone who isn't a Python 3 developer.
The cases in which one would work with binary data are few and far between. And for the purpose of preserving sanity across architectures, I often find a cross-platform file format like HDF5 helpful.
I disagree. The cases where I would do so voluntarily are limited, but I've been forced to deal with binary data on several occasions. HDF5 is also my go-to when I'm in control, but sometimes you're given data that exists only in some obscure binary format.
Unless you were implying that such work is best done in another language. I can buy that, but I found Python to be a convenient choice for it.
np.fromfile('data.dat', dtype='uint8')
https://docs.scipy.org/doc/numpy/reference/generated/numpy.f...
EDIT: Obviously you have the added benefit that your data is returned as a numpy array, for easy reshaping etc.