Q: A faster re-implementation of jq written in Reason Native/OCaml (github.com/davesnx)
250 points by davesnx on Sept 14, 2020 | 192 comments



For everyone pining for a jq with a different syntax: I have collected a bunch of links to alternatives; you might want to try some of them (some may be for things other than JSON):

https://github.com/fiatjaf/awesome-jq

https://github.com/TomConlin/json2xpath

https://github.com/antonmedv/fx

https://github.com/fiatjaf/jiq

https://github.com/simeji/jid

https://github.com/jmespath/jp

https://github.com/cube2222/jql

https://jsonnet.org

https://github.com/borkdude/jet

https://github.com/jzelinskie/faq

https://github.com/dflemstr/rq

Personally I think that next time I might just fire up Hy and use its functional capabilities.


My personal favorite solves the same problem but attacks it differently.

> Make JSON greppable!

> gron[1] transforms JSON into discrete assignments to make it easier to grep for what you want and see the absolute 'path' to it. It eases the exploration of APIs that return large blobs of JSON but have terrible documentation.

  ▶ gron "https://api.github.com/repos/tomnomnom/gron/commits?per_page=1" | fgrep "commit.author"
  json[0].commit.author = {};
  json[0].commit.author.date = "2016-07-02T10:51:21Z";
  json[0].commit.author.email = "mail@tomnomnom.com";
  json[0].commit.author.name = "Tom Hudson";
[1] https://github.com/tomnomnom/gron


Gron and jq are like peanut butter and jelly. Also gron has `gron -u` (ungron) to turn the flattened output back into JSON.
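
A minimal round-trip sketch using the same API call as above (the exact formatting `gron -u` emits may differ from what the API originally returned):

  ▶ gron "https://api.github.com/repos/tomnomnom/gron/commits?per_page=1" | fgrep "commit.author" | gron -u

which reconstructs just the grepped commit.author subtree as JSON.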


Don't forget PowerShell's ConvertFrom-Json :)

https://docs.microsoft.com/en-us/powershell/module/microsoft...


That is so not jq! I've really been pining for jq on my current Windows project :-(


What do you feel is missing?


Does it still convert only two levels by default?


Babashka is another Hy-like alternative, but based on Clojure, and recently discussed on HN:

https://github.com/borkdude/babashka

https://news.ycombinator.com/item?id=24353476

Aside: another nice tool I recently discovered for working with JSON and YML, doing conversion and diffs (especially helpful for generated files):

https://github.com/homeport/dyff


Late to the party but Benthos has its own language aimed at larger mappings: https://www.benthos.dev/docs/guides/bloblang/about


My go-to for simple queries is https://github.com/tidwall/jj

It is not nearly as expressive as jq, but it is faster for my use cases (written in golang).


Sorry, what is "Hy"?



A lightweight Lisp on top of Python.

Personally I'd prefer Fennel, which is built on Lua and thus a whole lot faster, especially with regard to startup time—but as I noted in a thread on Fennel, Lua's omission of a proper ‘null’ makes it awkward to handle exchange and transformation of data from third parties. And, since I'm likely to fiddle with the queries for some time, startup delay is less important here.




Thanks, next time if you paste the list somewhere else you can add query-json :P


1) refuses to operate on stdin; requires a filename argument, which is so irritating.

2) doesn't accept values that jq accepts

  % time jq -r '[expression]' < parcels | wc       
      365    1454    7978
  jq -r  < parcels  1.39s user 0.00s system 99% cpu 1.390 total
  wc  0.00s user 0.00s system 0% cpu 1.390 total

  % time ~/.yarn/bin/q  '[expression]' parcels | wc
  q: internal error, uncaught exception:
     Yojson.Json_error("Line 56, bytes -1-32:\nJunk after end of JSON value: '{\n  \"OBJECTID\": 155303,\n  \"BOOK\"'")


1 is easy to work around (handy tip incoming for any tools that _seem_ to not support stdin but actually do, as stdin is also available as a file in unix):

    echo '{"foo": "bar"}' | query-json ".foo" /dev/stdin


Tools that accept filenames often expect you to give them a real file, as they’ll do things on it that may not be supported by the various “it’s a file descriptor pretending to be something on disk” solutions.


Hm, do you have any examples handy? It's not that I don't believe you, it's just that in all the years I've been using this, it has always worked. Granted, I'm only using it for reading data, not for saving stuff to /dev/stdin, which would obviously fail.


Anything that seeks, which you can't do on a pipe.


atop parsing its logfiles.

  # Cuts off partway through:
  zcat atop.log.gz | atop -r /dev/stdin | less
  # Works fine:
  zcat atop.log.gz > atop && atop -r atop | less
That said, I agree with your experience that /dev/stdin usually works for programs that read a file straight in.


`file` on BSD/OSX has this notice for the option '-s':

> Normally, file only attempts to read and determine the type of argument files which stat(2) reports are ordinary files. This prevents problems, because reading special files may have peculiar consequences.

One example that comes to mind is /dev/urandom, which sucks random values out of the entropy pool (at least in Linux)—and the pool can be exhausted, or at least it could back in the day, not sure about now. Other possible cases are things in /proc (though unlikely), and particularly stuff like serial ports—where presumably reading could gobble data intended for some drivers or client software.


Not the best example here, but try to open a file descriptor in VSCode. I had a scenario where I wanted to diff a file on two different servers with a user-specified diff tool, and certain ones won't even operate in a read-only manner.


Seeking doesn’t work on stdin for example. Or mmapping, I imagine


Untested, but I expect unzip will fail because it keeps metadata after the files, so needs to seek back. (Unless they detect the pipe and buffer everything instead)


fseek(3) etc., and using stat(2) to determine allocation sizes, spring to mind as things you might want to do on a normal file that will not work as expected on a character device.


I try to avoid the dummy files /dev/stdin, /dev/stdout and /dev/stderr, since I've been bitten when they're not available, or when I hit permission denied errors.

Two examples I can remember off the top of my head:

- Nix build scripts

- OpenMoko


alternatively in bash:

   query-json ".foo" <(echo '{"foo": "bar"}')


but indeed, it's a nice workaround!


query-json --kind=inline would support reading from stdin; I just haven't spent enough time on Cmdliner yet!


That's super weird, I think most people use jq for bash pipelines.


Yes, I don't understand how people end up asserting that the filename is a required argument. At least we've got /dev/stdin or /proc/self/fd/0 as workarounds.


Much Unix software today is written by people who really don't appreciate or understand Unix. It leads to things like Homebrew (early on, at least) completely taking over and breaking /usr/local; to command-line utilities using a single dash (-) preceding both short and long option names (granted, --long-opts is a GNUism, but it's a well-established standard); commands that output color by default even when the output isn't a tty; etc.

It's not hard to fix things like this, but it exemplifies a lack of familiarity with the Unix command line. There are an enormous number of tools out there that only exist because people don't know how to chain together basic 1970s Unix text-processing tools in a pipeline.


Anybody around long enough to remember Unix from the beginning, or even just the last 25 years... which is a tiny percentage of this site... should know that a unifying Unix or Unix "tradition" as noted in a follow-up comment is pretty much a myth. The tradition is whatever system you grew up on and the tribal biases you subscribe to, and the only true Unix traditions are mostly trivialities like core shell syntax and a handful of commands, plus a woefully underpowered API for modern purposes. And long option names are definitely not part of any tradition.

Myths like "everything is a file" or "everything is a file descriptor" are complete bollocks, mostly retconned recently with Linuxisms. Other than pipes, IPC on Unix systems did not involve files or file descriptors. The socket API dates to the early 80s and even it couldn't follow along, with its weird ioctls. Why are things put in /usr/local anyway? Why is /usr even a thing? There's a history there, but these days I don't see much of anything go into /usr/local on most Linux distributions.

It's also ironic to drag OS X into a discussion of Unix, because if there was one system to break with Unix tradition (for the best in some ways) -- no X11, launchd, a multifork FS, weird semantics to implement time machine, a completely non-POSIX low-level API, etc, that would be it.

All this shit has been reinvented multiple times, the user-mode API on Linux has had more churn than Windows -- which never subscribed to a tradition. There's no issue of lack of familiarity here, the original Unix system meant to run on a PDP-11 minicomputer only meets modern needs in an idealized fantasy-land. Meanwhile, worse is better has been chugging along for 50 years while people try to meet their needs.


> more churn than Windows -- which never subscribed to a tradition.

My understanding is that Windows has always had a very strong tradition of backwards compatibility. Even to the point of making prior bugs that vendors rely on still function the same way for them (i.e. detect if it's e.g. Photoshop requesting the buggy API, serve them the buggy code path and everyone else the fixed one).

That's just as much a tradition as "we should implement this with file semantics because that's traditionally how our OS has exposed functionality".


> no X11

XQuartz if you want it

> completely non-POSIX low-level API

macOS has a POSIX layer.


> XQuartz if you want it

There are X server implementations for Windows, Android, AmigaOS, Windows CE!!, etc... I don't think this is relevant.

> macOS has a POSIX layer.

So do many systems, again including Windows in varying forms through the years. I think the salient issue is that BSD UNIX and "tradition" are conflicting. The point of the original CMU Mach project was to replace the BSD monolithic kernel.


Tangent: Homebrew itself doesn’t really choose to take over /usr/local; rather, it just accepts that there exists POSIX software that is way too hard for most machines to compile, and so must be distributed precompiled; and yet where that precompilation implies a burning-in of an installation prefix at build time, which therefore cannot be customized at install time. And so that software must be compiled to assume some installation prefix; and so Homebrew may as well also assume that installation prefix, so as to keep all the installed symlinks and their referents in the same place (i.e. on the same mountable volume.)

You have always been able to customize Homebrew to install at a custom prefix, e.g. ~/brew. It’s just that, when you do that, and then install one of the casks or bottles for “heavy” POSIX software like Calibre or TeX, that cask/bottle is going to pollute /usr/local with files anyway, but those files will be symlinks from /usr/local to the Homebrew cellar sitting in your home directory, which is ridiculous both in the multiuser usability sense, and in the traditional UNIX “what if a boot script you installed, relies on its daemon being available in /usr/local, which is symlinked to /home, but /home isn’t mounted yet, because it’s an NFS automount?” sense. (Which still applies/works in macOS, even if the Server.app interface for setting it up is gone!)

The real ridiculous thing, IMHO, is that Homebrew doesn’t install stuff into /usr, like a regular package manager. But due to macOS considering /usr part of its secure/immutable OS base-image, /usr is immutable when not in recovery mode.

I guess Homebrew could come up with its own cute little appellation — /usr/pkg or somesuch — but then you run into that other lovely little POSIXism where every application has its own way of calculating a PATH, such that you’d need to add that /usr/pkg directory to an unbounded number of little scripts here and there to make things truly work.


Or use `/opt` which is a POSIX-standard location. Every managed MacOSX laptop I've gotten from "Big Corp" has had `/usr/local` owned & managed by IT with permissions set to root meaning you're fighting Chef (or whatever your IT department prefers) if you use the default homebrew location.


But again, you'd have that problem whether you used Homebrew or not, as soon as you tried to (even manually!) install the official macOS binary distribution of TeX, or XQuartz, or PostGIS, or...

Homebrew just acknowledges that these external third-party binary distributions (casks) are going to make a mess of your /usr/local — because that's the prefix they've all settled on burning in at compile-time — and so Homebrew tries to at least make that mess into a managed mess.

And, if some other system is already managing /usr/local, but isn't expecting the results of these programs unpacking into there, it's going to be very upset and confused — again, regardless of whether or not you use Homebrew. So it'd be better for those other systems to just... not do that.

/usr/local isn't supposed to be managed. It's supposed to be the install prefix that's controlled by the local machine admin, rather than by the domain admin. Homebrew just happens to be a tool for automating local-admin installs of stuff.


Where are domain admins supposed to put installations?


The REAL ridiculous thing is that Homebrew was needed in the first place.

Mac OS X had some of the sexiest ways to install and uninstall application software that we'd ever seen on any other platform at that time.

But Apple's stubborn refusal to include a useful package management system was one of the most horrible oversights in computing history.


> I guess Homebrew could come up with its own cute little appellation — /usr/pkg or somesuch —

/opt/homebrew would be a somewhat traditional place to put it.

> but then you run into that other lovely little POSIXism where every application has its own way of calculating a PATH, such that you’d need to add that /usr/pkg directory to an unbounded number of little scripts here and there to make things truly work.

What? You should be able to add it to the system PATH that's set for sessions and call it a day on a POSIX system. PATH is an environment variable and inherited. If macOS is in the habit of overriding PATH in system scripts, I have to imagine that's because they completely screwed it up at some point in the past. Generally, you just add it to your user session variables in whatever way your system supports (.profile, etc.) if you want it for your user, or at a system level if you want it system-wide (I could see maybe Apple making this hard).

The only times in over 20 years I've ever had to deal with PATH problems are when I ran stuff through cron, because it specifically clears the PATH. More recent systems just specify a default PATH in /etc/crontab for the traditional / and /usr bin and sbin dirs.

Maybe you're thinking of the shared library path loading? That should also be easily fixed.


> and yet where that precompilation implies a burning-in of an installation prefix at build time

Not necessarily. Plenty of software uses relative paths that work regardless of prefix. Off the top of my head, Node.js is distributed in this way.

> you’d need to add that /usr/pkg directory to an unbounded number of little scripts here and there to make things truly work.

How so? Are there that many scripts that entirely replace the PATH environment variable? In Linux, I just include my system wide path additions in /etc/profile which will be set for every login. For things like cron jobs or service scripts, which don't inherit the environment of a login shell, you will need to source the profile or use absolute paths, but that's about the only caveat I can think of.


> You have always been able to customize Homebrew to install at a custom prefix, e.g. ~/brew. It’s just that, when you do that...

...and then try to build something entirely sensible like Postgres, but hours of fiddling with different XCode versions and compiler flags still lead to a dead end of errors, you're stuck because you're running an unsupported configuration.

I still don't understand how the PG bottles for Mojave can be built.


The `go` program uses -longopts and I think it would be hard to argue that Rob Pike lacks an appreciation of Unix traditions.


I would argue that Go's design as a whole is characterized by an attitude of ignoring established ideas for no other reason than that they think they know better.


Something being established is not a grand argument for its usage. The reasons it got established are relevant, and if you feel the end result of said establishment is obtuse or inane, why would you use it?

That's not to say Go's decisions to toss some established practices are "wise" or "sagely", just that broad acceptance is not a criterion they seemed concerned with. Which is fine.

>they think they know better.

It's safe to say Rob Pike is not clueless or without experience in Unix tooling. You should listen to some of his experiences and thoughts from designing Go [0]. I don't always agree with him, [but it's very baseless to suggest he makes decisions on the grounds that they were his, not that they have merit.]

Edit to clarify: [He makes decisions on merit over authority]

[0]: https://changelog.com/gotime/100


Sure, there's nothing that says established practice is better. That is not, in my opinion, a good defense of Go which makes many baffling design decisions. Besides, an appeal to the authority of Rob Pike is surely not a valid defense if mine is not a valid criticism.

I'm (perhaps unfairly) uninterested in writing out all the details, but “they think they know better” is because I see Go as someone's attempt to update C to the modern world without considering the lessons of any of the languages developed in the meantime. And because of the weird dogmatic wars about generics, modules, and error handling.


Rob Pike explained it thusly in a 2013 Google Groups reply:

> Rob 'Commander' Pike

> Apr 2, 2013, 6:50:36 AM

> to rog, John Jeffery, golan...@googlegroups.com

> As the author of the flag package, I can explain. It's loosely based on Google's flag package, although greatly simplified (and I mean greatly). I wanted a single, straightforward syntax for flags, nothing more, nothing less.

> -rob


Rob Pike lacks an appreciation of GNU traditions.

I'm actually surprised by this; I would have expected Pike to go with single-letter options only.


This is a fair point, although ironically it's probably because Pike predates GNU and still has a problem with all the conventions those young upstarts eschewed. Conventions change, usually for the better. I think this is one the Go team got wrong, regardless of the reason.


"There are an enormous number of tools out there that only exist because people don't know how to chain together basic 1970s Unix text-processing tools in a pipeline."

Arguably that is why the original implementation of Perl was written. If I remember the story correctly, we can never know for sure whether, e.g., AWK would have sufficed, because the particular job the author wrote Perl for as a contractor was confidential.

Are people using jq most concerned about speed, or are they more concerned about syntax?

JSON suffers a problem from which line-oriented utilities are generally immune: a large enough and deeply nested JSON structure will choke or crash a program that tries to read all the data into memory at once, or even in large chunks. The process becomes resource-constrained as the size of the data increases. There are no limits placed on the size or depth of JSON files.

I use sed and tr for most simple JSON files. It is possible to overflow the sed buffer but it rarely ever happens. sed is found everywhere and it's resource-friendly. Others might choose a program for speed or syntax but the issue of reliability is even more important to me. jq alone is not a reliable solution for any and all JSON. It can be overkill for simple JSON and resource-constrained for large, complex JSON.
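
For instance, the kind of deliberately fragile one-liner I mean, which only handles flat objects with unescaped string values (field name and sample data made up):

  printf '{"id":1,"name":"Ada"}\n{"id":2,"name":"Grace"}\n' | sed -n 's/.*"name":"\([^"]*\)".*/\1/p'
  Ada
  Grace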

https://stackoverflow.com/questions/59806699/json-to-csv-usi...

netstrings (https://cr.yp.to/proto/netstrings.txt) do not suffer from the same problem as JSON.


> It leads to things like Homebrew (early on, at least) completely taking over and breaking /usr/local

Fully agree with you, but oh well, most if not everything is available on Macports anyway.

> There are an enormous number of tools out there that only exist because people don't know how to chain together basic 1970s Unix text-processing tools in a pipeline.

Speed. A specialized tool you need often beats manually wrangling the dozen or so Unix tools you need to replace it, plus many Good Options are only available on the GNU/Linux coreutils and don't work on Macs (sed -i, my most common annoyance) or busybox.


People often complain about Homebrew's use of /usr/local without articulating what is lost.


files might be faster, because you can mmap them?


Most people work on compressed JSON lines files. Sometimes they are stored on s3. Files do not give flexibility.

When using jq, I can do a lot of things:

    aws s3 cp s3://bucket/file.json.gz - | zcat | head | jq .field | sort


Another common thing you can do is accept a generic stream as input, but have some code that penetrates the abstraction a bit to see what kind of stream it is, and if it is a file or something, do something special with it to go even faster. This way, you start with something maximally useful up front, and easy to use, but you can optimize things based on details as you go.

That's how Go's static file web server works. It serves streams, but if you happen to io.Copy to that stream with something that is also an *os.File on Linux, it can use the sendfile call in the kernel instead. (A downside of making it so transparent is that if you wrap that stream with something, you may not realize that you've wrecked the optimization because it no longer unwraps to an *os.File but to whatever your wrapper is, but, well, nothing's perfect.)


Which is sorta-kinda the same thing as

  jq .field <(aws s3 cp s3://bucket/file.json.gz - | zcat | head) | sort
which is more annoying to type but works.


Redirection is not a parameter, meaning that would still not work with this Q tool.


This is not a simple redirection. cmd <(subcmd) is a bashism that presents the output of command subcmd as something that looks like a regular file to command cmd. Command cmd receives a path at the place the <(subcmd) syntax is used. This is different from cmd < f, which redirects the contents of file f to cmd's input.

So, this should work :-)


Works in other shells, not just bash

  % cat /proc/self/cmdline <(echo $SHELL) | tr '\0' ' '
  cat /proc/self/cmdline /proc/self/fd/11 /bin/zsh


Yep! Zsh supports a lot of bashisms :-)

It won't work in dash though, and you should not use this in a shell that targets POSIX.


My most common jq usage is to copy and paste some json into quotes to make it easier to read. My second most common action is to chain curl and jq together. A replacement for jq that doesn’t use stdin is literally useless to me.
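
Something like this, to be concrete (endpoint and fields purely illustrative):

  curl -s "https://api.github.com/repos/stedolan/jq/issues?per_page=3" | jq '.[] | {title, state}'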


I think at least two people in the thread pointed out that you can use /dev/stdin as a file.


That’s a cumbersome extra step for unclear benefit.

Yes, q is supposedly faster than jq. But it is exceedingly rare for me to ever have any performance problems with jq, especially since it’s essentially a one off utility I use occasionally, not as part of the hot loop of any workflow where performance matters.


In that case I suppose you wouldn't use it.


It's more than a little strange that this isn't default behavior out of the box when no input is specified. How do the authors think jq is used?


I tend to use jq with a file all the time; I prefer to curl first and operate on top of that later. /shrug


I'm pretty sure the author has also replied elsewhere saying he will add it since it was brought up as a concern.


Probably true but I use it regularly in the automation app huginn.

https://github.com/huginn/huginn


Nothing that can be fixed later?


Thanks for this. I've been planning a similar work for years and haven't gotten off my ass (too many other projects lol).

I definitely agree that reading from stdin is critical if I'll be able to use it. Don't take the criticism too hard though (especially the "author doesn't appreciate unix" stuff. Sometimes we can be such assholes to each other).

Nice work!


I'm happy with the criticism.

Judging is free, and I didn't consider stdin as something to spend time on yet. Will do, now that some people have raised it.

Thanks :D


Probably. You tell us :=)

The incompatibility is apparently due to the fact that jq is happy with a concatenation of JSON objects and q is not. For example {'foo':1}{'foo':2} as opposed to [{'foo':1},{'foo':2}]
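
The difference is easy to see with jq itself, which (at least as I understand it) treats its input as a stream of JSON values and runs the filter on each one:

  echo '{"foo":1}{"foo":2}' | jq '.foo'

prints 1 and then 2.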


Sure, it's a missing feature; "," creates two copies of the filter.

I certainly didn't use it, but I see where it's useful; will implement it soon.


To be fair that is a JSON parse error.


Yes, but the inputs to jq are not JSON, they are "a sequence of whitespace-separated JSON values which are passed through the provided filter one at a time" which is the relevant thing if you are going to try to replace jq.


Yep, kind of.

The comma operation means that the filters are duplicated, so instead of one JSON state you would pass two if there's one comma; and of course any number of commas are allowed.
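
A small illustration with standard jq (assuming I understand its comma semantics correctly):

  echo '{"a": 1, "b": 2}' | jq '.a, .b'

which runs the input through both filters and prints 1 and then 2.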


Are we sure it should get a single-letter 'q' binary name though? The docs seem to indicate that it's short for 'query-json'. Why not call it 'query-json' and let the user define a shell alias or whatever? Even the ubiquitous 'ls' and 'cd' are two characters.



Which is relatively established and widely used (although mainly in finance). It was the first thing I thought about.


And there's also this 'q':

http://harelba.github.io/q/

Query CSV with SQL


Seems in the past hour he has renamed it to query-json and suggests that you set your own alias if you want something shorter.


Yeah. Can you imagine trying to do a web search for ‘q’?


I can and it is lightly disturbing. https://imgur.com/a/yAOF31g ^_^


Gqd, look how that looks like 666, edit distance is 3. Turn around the letters. Here have a timeline: https://imgur.com/a/aQYnKLm


"Gqd" = graphene quantum dot ?


I think it is a pun on God ^_^


Appropriating Shepard Fairey's graphical style into a QAnon T-shirt and then wearing it to a Trump rally is an arresting masterwork of postmodernism. One wonders: is the wearer trolling the rally, or is he trolling himself? Perhaps we are all trolls. Perhaps we have all been trolled. HAND.


I was hoping for some star trek shenanigans... I am disappointed.


or at least qj


I don't want to fight over the name; q was just a shortening.

Happy to rename it to qj instead, but renaming the binary yourself is a good workaround as well.


Given that these are two of the least-used characters in English and they basically never appear in this sequence, it would certainly help for all the reasons people are mentioning. It'd at least follow suit with 'rg' for 'ripgrep'.


I made the same argument to him on reddit. q even exists already.

He replied and thought about qj


Renamed :P


Thanks! qj is fine


Renamed already :D


This looks interesting, but could be confusing given the programming language of the same name (https://code.kx.com/q/)


Ah yes, the old ".j.k raze read0`" as a separate app


I should definitely check how that compares on some big files here.


I've longed for such a tool with a more comprehensible query language.


If so, and for anybody else having this wish, check out jql[0], I've created it exactly for this reason, to have the most common jq operations available in a more uniform and easier to use interface.

[0]: https://github.com/cube2222/jql


Nice!

I will try to bring it into the benchmark, thanks for sharing.


Give jet a try! It uses a lightweight query language over EDN. If you're familiar with Clojure, it'll be very natural to use, and if you're not familiar with Clojure, the query language used is very easy to pick up :) https://github.com/borkdude/jet/blob/master/doc/query.md


There is XPath 3.1: https://www.w3.org/TR/xpath-31/#id-lookup

It is more verbose, like you get the size of something with array:size or map:size functions, so it is more readable

I am implementing it in Xidel 0.9.9+: http://www.videlibri.de/xidel.html


XPath is really useful and `xmlstarlet` is/was jq before jq existed. These days it's relatively rare to get data in XML instead of JSON, though.


This is about JSON

xmlstarlet supports XPath 1, but the W3C did not stop there. They made XPath 2 featuring variables, lists and regular expressions, XPath 3 featuring higher-order functions, and finally XPath 3.1 featuring JSON support.

For example,

     echo '[{"a": 1}, {"a": 2, "b": 3}, {"c": 4}]' | xidel - -e '?*?a!(. * 100)'
will print 100 and 200.

Or the same with a verbose syntax:

    echo '[{"a": 1}, {"a": 2, "b": 3}, {"c": 4}]' | xidel - -e 'for $obj in array:flatten(.) return map:get($obj, "a")  * 100'


That's news to me. Very cool actually. It's not far from jq on speed, either. Looks like you are the maintainer of this tool, so thanks!

  ~/xidel % time ./xidel - -e '?SitusAddress!(.)' < ~/parcels | wc
  **** Processing: stdin:/// ****
    29066  149317  903704
  ./xidel - -e '?SitusAddress!(.)' < ~/parcels  2.55s user 0.18s system 99% cpu 2.733 total
  wc  0.01s user 0.00s system 0% cpu 2.733 total
  ~/xidel % time jq -r '.SitusAddress' < ~/parcels | wc
    29066  149317  903704
  jq -r '.SitusAddress' < ~/parcels  0.95s user 0.00s system 99% cpu 0.958 total
  wc  0.00s user 0.01s system 1% cpu 0.957 total


The oj command in https://github.com/ohler55/ojg uses JSONPath as the query and filter language. Maybe it is more in line with what you are looking for.


I've collected a bunch of alternatives to look at, see here: https://news.ycombinator.com/item?id=24470715


For anything moderately complex I iterate using https://jqplay.org/. Life is much better since I started doing that.


Hint: you can do live jq query preview for any jq-like command using fzf. It looks like this for jql, an alternative I've created (you can find it in a neighboring comment):

  echo '' | fzf --print-query --preview-window wrap --preview 'cat test.json | jql {q}'


> Aside from that, q doesn't have feature parity with jq, which is OK at this point, but jq contains a ton of functionality that query-json misses, and some of the jq operations aren't native but built into the runtime. To do a proper comparison, all of the above would need to be taken into consideration.

> The report shows that q is between 2x and 5x faster than jq in all operations tested and same speed (~1.1x) with huge files (> 100M).

While faster for some things... that's a pretty large set of caveats!


Adding most of the jq operations shouldn't affect performance at all; in fact, if I end up implementing streaming it could be even faster.

I have an issue open to improve performance where I can push this forward: https://github.com/davesnx/query-json/issues/7

But sure, those are caveats!


Would be good if someone added an explanation of why this new approach is better: is it that OCaml is faster, that more efficient algorithms were used, etc.?


I tried to explain it in the Performance section and in the report:

https://github.com/davesnx/query-json#performance https://github.com/davesnx/query-json/blob/master/benchmarks...

But the explanations aren't backed by any evidence, just assumptions.


Do we need to make jq faster? Does anyone have issues with the current speed? Is there any specific reason other than "because we can"?


I can't answer for the OP, but "because we can" is a valid enough reason (pun unintended) for me.

IMO, an individual dev making a fast useful tool should always be welcomed as a feat of worthy hacking.


According to the "Purpose" section of the readme, it doesn't look like beating jq's speed was ever a goal. It was meant to be a learning exercise.

But if I had done something like that, and then serendipitously discovered that I was exceeding the original's performance, I certainly wouldn't be shy about it.

Also, this comes across as armchair criticism purely for the sake of armchair criticism. My own experience has been that, when I'm doing ETL that involves wrangling JSON, the "wrangling JSON" bit of it is almost always the bottleneck. So any improvement is more than welcome and deserves to be cheered. Even if it's an improvement on something that's already the current fastest way to do it.


I'm not sure which "we" you are referring to.

Reimplementing a 12-year-old piece of software while mimicking its UX and improving performance and error messages is more than welcome, in my opinion. My purpose was to learn the OCaml stack for writing compilers, so I personally found that I "needed" a language that already existed.

Thanks for raising those concerns


If we were at a board meeting deciding how to spend my time, no. If it is done? Why not


Batting practice.

We'd all be better off if plebes grew their skills by reimplementing common tools.


Hm, I thought q was synonymous with querying CSV files: https://harelba.github.io/q/


Same. When I saw the name "q" I thought of this same tool.


Right, I found q cute... but I'm thinking of releasing a new version with the name query-json or just changing the name altogether. Any suggestion? ^^


No, q is an array language.


As an outsider I get very confused by the Reason / Reason Native / OCaml / Bucklescript / Rescript?! ecosystem. What does it mean for it to be written in Reason Native/OCaml?


That means it produces a native binary (for example, a .exe file on windows platforms), so ultimately you're aiming to run the program in a terminal. This is the normal way for OCaml to operate.

In this case the author is using Reason as an alternative syntax to OCaml. Reason resembles javascript a little more, and some people find that nicer to work with. So the idea is that you write Reason code, then translate it into OCaml code using the Reason tools, and then ultimately you compile it down to a native binary.

If instead you want to write a web-app which runs in a web browser or node.js, then you'd need to compile it to Javascript, which is what bucklescript helps you do.

Where does Rescript come in? As explained above, Reason can be used for writing either native apps or javascript apps. However, it's hard to evolve the syntax of Reason in a way which satisfies both aims. So they've now split the work -- going forward, Reason will specialize on native, and Rescript will specialize on javascript apps. Their syntax is expected to diverge from each other, in order to support those aims as best as they can.


Thank you for the detailed answer! I check in on the status of the related projects from time to time and was often confused by the relationship between the components.


Right, the explanation of Reason - BuckleScript - OCaml is always nebulous.

I used Reason to compile to native, so I'm using OCaml's stdlib and OCaml's dependencies and compiling with OCaml, but my source code is written in Reason syntax.


This looks nice, but I was a bit dismayed at "friends don't let friends curl | bash, to install this run curl | bash".


I remember one of the first times I tried installing Linux software in the wild. The bash script asked for your password, sent it to their server using curl, then returned you the script with the password hard-coded into it and ran itself with sudo, all over unencrypted HTTP. I was 17, but even then I stopped to think whether this was a good idea.

It wasn't.


That is pretty amusing. I’ve seen some bootstrap scripts that pipe the curled output to the terminal for approval before executing it. That seems like an ergonomic alternative to curl | bash. It would be at least as useful as the terms of service warnings before you install something, anyway.


There are 4 ways to install query-json.

Before doing any curl | bash, check what's in the install command; that's the entire point of it.


I used to be a regular user of jq, but I was never parsing very large JSON. I now do what I used to do with jq in my browser's developer tools console. Map and filter are far more familiar than jq's syntax where I found myself referring to the documentation most of the time.

I'm sure other people have use cases where the browser wouldn't meet their needs, but for me, I find jq unnecessary.


Writing a script? I'm not going to have my script open a web browser so I can attempt to interact with a web console.


When it got to the point when I needed a script, I just preferred Python. I can understand how some might prefer jq and a shell script, I just realized it wasn't worth it for my particular needs.


Curious, any description as to why it's faster? Something intrinsic to Reason Native/OCaml? Architectural changes? Reduced feature set?


jq appears to have its own hand-written JSON parser and requires flex/bison. I suspect something about the hand-written parser is slow for large data sets.

I was somewhat surprised it didn't use an existing JSON parser library.


I'm doubly surprised that such a popular utility uses bison; generated parsers tend to be slower than handwritten parsers, and JSON isn't exactly the world's hardest language to parse.


Well, it is missing a ton of jq functionality; it's possible that something in that list is what causes the performance degradation.


jq’s in C, so probably not anything intrinsic. Not to mention I don’t think the OCaml compiler is an optimisation beast.


Is jq slow? I have only worked with datasets up to 1mb but I’ve never had a performance issue that wasn’t attributed to my error.


Yes

I often have to pluck out attributes from streams of json records (1 json object per line) - often millions/billions.

jq is almost always the bottleneck in the pipeline at 100% CPU - so much so that we often add an fgrep to the left side of the pipeline to minimize the input to jq as much as possible.


It's very slow. It immediately stood out in our automated tests when we added a JSON protocol to our system and used jq to test some assertions.


jq is pretty fast in my experience. But there have been cases where I've wanted it to be faster (dealing with a 90GB JSON file).

The main weakness seems to be streaming use cases (not having the whole file in memory at once). These are supported, but the syntax is quite awkward.
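
For the curious, the streaming idiom is something like the following (reproduced from memory from the jq manual, so double-check it; huge.json is a stand-in). It emits each element of a giant top-level array one at a time instead of loading the whole document:

  jq -cn --stream 'fromstream(1 | truncate_stream(inputs))' huge.json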


Right, q doesn't support streaming, so it won't manage a 90GB JSON file.

I should specify that in the performance section. Thanks!


Out of interest, what created a 90GB JSON file?


Not OP, but I routinely call a specific HTTP API for millions of entities or pull down entire Kafka topics - all in JSON format. For various reasons those are the canonical sources of data and/or the most performant, so I end up ripping through GBs and GBs of JSON when troubleshooting/reporting on things.


I don't think it's quite 90GB, but I've processed Wikidata dumps in the same order of magnitude before (which are one JSON object per line) with jq, and it could've certainly been faster.


I was wondering about that - whether they are one single JSON array/object or one per line.


A Firebase Realtime Database backup file


Yes. I was looking to embed it in a tool, but decided against it after looking at its implementation. It parses the expression with a stack and executes it directly, and its JSON parsing is much the same. I doubt the parsing would be close to competitive with RapidJSON, let alone simd-json. The conditionals and pointer chasing of such an implementation are stumbling blocks to performance.

The C code is clean enough as C code goes, but fairly monolithic. And it’s C, so it’s not noticeably slow until you start processing GB. But it would probably take a rewrite to improve its performance significantly.


Nobody said that jq is slow.


This is funny because Stephen Dolan, the original jq author, works on OCaml itself.


Exactly! I wanted to contact him


Speed is not a concern for me. I am wondering if there is something better than `jq` in terms of syntax. Whenever I want to do something more than just prettify JSON output in the console or simply get a value by a specific field name, I have a problem: for me it is just difficult to remember jq syntax without looking into my shell history. I also keep links in my notes to examples like this one:

https://mosermichael.github.io/jq-illustrated/dir/content.ht...


For my part, I've always wanted a tool that just replicates PostgreSQL's JSON syntax. That way I can have only one syntax to remember.


One of the main ideas of query-json is to provide excellent errors. So it would teach you as you use the tool.

And there are a few techniques to "discover" the schema of the JSON file; I tend to read with '.' or 'keys' and keep going from there.
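
For example, a sketch of that incremental exploration (the sample document is made up):

  echo '{"user": {"name": "Ada", "langs": ["ocaml", "reason"]}}' | jq 'keys'
  echo '{"user": {"name": "Ada", "langs": ["ocaml", "reason"]}}' | jq '.user | keys'

The first prints ["user"], the second ["langs", "name"], and from there you keep drilling down.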

I'm planning to implement a flag where each operation prints the internal state of the JSON, so you would see what the "pipes" are.

I will pick a few things from your cheatsheet to implement next in q. Thanks!



Definitely will take a look; I've never heard of `jql` before, thanks.


Might want to take a look at some of these alternatives: https://news.ycombinator.com/item?id=24470715



Umm... There's already a language called Q for array processing.


Will rename it to query-json. Thaaanks!


In case anyone is interested in yet another alternative, I have this old, unpolished project: https://github.com/bauerca/jv

It is a JSON parser in C without heap allocations. The query language is piddly, but the tool can be useful for grabbing a single value from a very large JSON file. I don't have time for it, but someone could fork and make it a real deal.


If you're into Clojure, check out https://github.com/borkdude/jet


I use jet all the time when I need to quickly examine a JSON snippet in Emacs. I would use <C-u M-|> (shell-command-on-region with a prefix) and execute jet to convert the selected JSON part to EDN. That cuts out all the visual noise. EDN is much more concise, cleaner and easier to read. I'd use it even if I didn't write Clojure.


Upcoming: q-rs, a rewrite of q in Rust :p


I hope so!


Slightly out of context here, but I find the entire stack of bsb, bsb-native, OCaml and esy pretty cool. However, I just don't find enough resources, good tutorials, etc. via Google search. Is there a good set of beginner tutorials anyone can point to? Thanks in advance.


The documentation is a problem in the OCaml world, and a problem with Reason Native as well. I found myself pretty lost at times; esy.sh should be the initial point of contact for most Reason-related stuff.

Menhir/sedlex and the others have a pretty high accessibility barrier for newcomers.

One of the nice things about all of it is the Discord; it's friendly and always helpful.

Hope it helps, just let me know if there's anything specific!


Just ditch Reason and use OCaml. There's a lot more documentation and the syntax is better.


This is cool, but I’m not sure it’s fair to claim it’s “faster” yet when it doesn’t do 95% of what jq does—particularly the command line options. If it’s still faster when you can match 80% of the functionality, then it might be a claim worth making.


Exactly, I didn't claim it's faster in all cases, since there's no feature parity and I won't make it that way.

For the set of operations that I implemented, it's faster; that's true.


Great! Now improve the syntax!


How, though? I agree that jq's syntax isn't exactly the most straightforward, and it gets raised as a point of criticism anytime jq is mentioned, but its scripting language seems like a pretty good compromise between compactness and rich features.

Replacing that with, say, traditional command line flags would make it a lot less useful for me; I'd probably have to build much longer pipe-chains to do things that are relatively simple and readable as jq snippets (if one knows the syntax).
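
For example, something like this (a made-up issues.json with the usual state/assignee fields) stays a one-liner in jq but would take several chained commands otherwise:

  jq '[.[] | select(.state == "open")] | group_by(.assignee) | map({assignee: .[0].assignee, open: length})' issues.json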

Using an established scripting language in its place would make it pretty much just python -c/ruby -e or whatever with some pre-loaded functions, but what's the point? You can always just write a quick python/ruby/whatever script, jq to me is an alternative for cases where a script feels unnecessary. It would also mean everything gets more verbose, so less of my jq transformations can be inlined without loss of readability.

Aligning it to more established languages would probably cause confusion as well in those cases where it doesn't match the reference language 1:1. Looks like javascript, writes like javascript, but only for a tiny subset of the language, etc.

Doing this only for a few function names or syntax constructs still results in a pretty unique and unusual language that will require people to reference the docs a lot, just now lots of existing scripts break.


Just because jq is very well established doesn't mean its APIs are well designed, or that we shouldn't improve them because doing so would break existing scripts.

There are a lot of quirks in its usage, and people struggle to learn such a great tool, so in that area query-json will try to make a better interface for users.


Perhaps one of these might work for you: https://news.ycombinator.com/item?id=24470715


I think the issue is not the syntax, it's the barebones docs with absolutely trivial examples.


I'd love to hear some speculation - from the author or otherwise - as to why a fresh OCaml implementation would so dramatically outperform a mature C implementation


There are a few good assumptions about why it's faster, but they are just speculation since I didn't profile jq or query-json.

The feature that I think penalizes jq a lot is "def functions", the capacity to define any function and have it available during run-time.
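
For reference, this is the kind of runtime definition I mean (plain jq, not query-json):

  jq -n 'def double: . * 2; [1, 2, 3] | map(double)'

It prints [2, 4, 6]; jq has to parse, compile and link that def (plus all of its builtin defs) before the filter can run, and that's the layer I suspect costs time.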

This creates a few layers; one of the differences is the interpreter and the linker, which are responsible for gathering all the builtin functions, compiling them, and having them ready to use at runtime.

The other pain point is the architecture of the operations in jq, since it's stack based. In query-json it's piped recursive operations.

Aside from the code, in the OCaml stack Menhir has proved to be really fast for building these kinds of compilers.

I will dig more into performance and try to profile both tools in order to improve mine.

Thanks


JMESPath is the only viable alternative, which probably has a wider footprint than even jq as it's part of AWS CLI.


It's definitely popular but “only viable alternative” is a bit strong: that's only if you need compatibility with particular tools which support only one of the two formats. There's no reason why anyone who doesn't like those tools couldn't create a different syntax to scratch whatever particular itch they have.


It's embeddable and available as a library for all languages [0]. Everything else is pretty much nothing but a CLI tool, which further limits its adoption.

[0]: https://github.com/jmespath


Well, there is XPath 3.1 if you want standards[1] but my point was simply that it depends on whether your question is “I need compatibility with existing jq scripts”, “I need an embeddable library I can integrate in other programs”, or “I want to process JSON for my own usage”.

For example, someone who works with a lot of Python might prefer something like https://github.com/kellyjonbrazil/jello to write comprehensions using the full capabilities of Python, especially since that would provide a direct path to using the final expressions in a Python program or even embedded in one of the environments where Python is used as a scripting language. Is that a viable alternative? The answer depends entirely on who's asking.

1. https://www.w3.org/TR/xpath-31/#id-introduction


Isn't JQ written in C? I doubt LISP is going to be faster.


Yes, jq is written in C. Where does LISP come from?


I’m with Q!


+1




Search: