Interesting, so it's reading text from STDIN into an Array, then executing its arguments as Ruby code in the context of that Array (or if you pass -l, in the context of each line individually). So all the standard methods on the [Array][1] (or [String][2]) class become accessible as top-level methods.
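The core mechanism is small enough to sketch. Something like this (illustrative, not the author's actual script):

    # rb-alike: evaluate the command-line arguments as Ruby,
    # with stdin as the receiver (whole array, or per line with -l)
    line_mode = ARGV.delete('-l')
    code = ARGV.join(' ')
    if line_mode
      STDIN.each_line { |line| puts line.chomp.instance_eval(code) }
    else
      puts STDIN.each_line.to_a.instance_eval(code)
    end

Because the code runs via instance_eval, bare method names like drop or split resolve against the array (or string) itself.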
In case it's helpful feedback: as someone who doesn't know Ruby, I find it hard to understand. Not too readable. split.last.ljust I just barely understood (left-justify the last field), but I have no idea what the next part is even trying to accomplish.
That's the author's oneliner. I don't know what that code is for either, which is part of why I rewrote it in the second set of examples.
It's normal for a oneliner to be somewhat inscrutable though, since they're often extremely dependent on their inputs. Like if it's important to fetch the 3rd, 5th, and 7th column--why? You can't tell without the input.
Someone did a PR[1] with something similar. He also found a way to speed things up by creating a Proc first and feeding it to the eval instead of the string.
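If I'm reading that right, the trick is something like this (illustrative, not the PR's actual code):

    code = 'split[1]'

    # string eval: Ruby re-parses `code` for every line
    # STDIN.each_line { |l| puts l.chomp.instance_eval(code) }

    # Proc eval: parse the string once up front, then just invoke
    # the compiled block against each line
    block = eval("proc { #{code} }")
    STDIN.each_line { |l| puts l.chomp.instance_exec(&block) }

Parsing is the expensive part, so doing it once instead of once per line adds up on large inputs.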
It makes me a little uncomfortable that they're using curl|bash for something as simple as "put this 10-line script somewhere in your $PATH," especially when the script involves sudo (to move it into /usr/local/bin). Sure, it's easy to inspect the script and see that it's not doing anything malicious, but it makes install processes like this, where it'd be incredibly easy to slip something malicious in, seem normal.
I didn't even see that. I followed the directions in the top section ("Clone this repo and copy the rb file to somewhere in your path (or just copy and paste the above).") (I did have to chmod +x)
That said,
brew install foo
is a normal part of many development workflows, which essentially just curls a file from someone else's git repo.
I mean, if you think about it, any time you run any installer, whether via brew install, apt-get install, or an .exe or .msi, you're effectively running someone else's unknown code on your system, often as a superuser (e.g. with sudo apt-get install). Is there a significant difference here? At least in this case you could potentially download the shell file and read it before you run it, unlike with a binary executable.
The second I start wanting a bash pipeline, probably around the third pipe, I scrap it immediately and move to using a text editor to write a script. Because if I'm wanting a pipeline, I'm also going to want to store it, and transmit it over the network, and manage it, and put it down and come back to it later.
All things perfectly manageable inside a PORO. Bundler even has an inline mode, so you can put everything in one file, close to your data, Bundler just handles dep management under the hood like a boss. Check out bundler-inline.
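For anyone who hasn't seen it, it looks like this (the gem here is just an example):

    require 'bundler/inline'

    gemfile do
      source 'https://rubygems.org'
      gem 'nokogiri'   # dependencies declared right next to the code that uses them
    end

    puts Nokogiri::VERSION

Run the file and Bundler resolves and installs the declared gems before the rest of the script executes.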
Sure, you can do all that with bash. But you can also make system() calls with Ruby, with full string interpolation power tools. If you're a Rubyist, and you want to do data analysis, this is the workflow you want.
I hadn't seen bundler-inline before, thanks for posting this. That'll be nice for small scripts that don't warrant a whole directory themselves, or useful for a sort of "tools" / "utils" repo with small utility scripts.
> Because if I'm wanting a pipeline, I'm also going to want to store it, and transmit it over the network, and manage it, and put it down and come back to it later.
Not my experience. More often than not, if I'm writing a pipeline (in a shell or in a REPL), it's because someone handed me a file they made in a one-off effort that has nonsensical nonstandard formatting, and I'm trying to normalize it.
If they made another file, it'd have different nonsensical nonstandard formatting, so there's no reuse here.
Meanwhile, writing a pipeline at a REPL (or shell) enables me to just work on it expression by expression, keeping a variable-assignment at the beginning of the expression so that I can then work on the intermediate created data to figure out how to munge it just a little bit more, and then add that back to the pipeline once it's right.
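Concretely, a session might look like this (file name and delimiter invented for illustration):

    rows = File.readlines('export.txt')            # look at the raw lines first
    rows = rows.map { |l| l.strip.split(/;\s*/) }  # split on their odd delimiter
    rows = rows.reject(&:empty?)                   # drop the blank rows
    # inspect `rows` after each step, tweak the last expression, re-run it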
If there was a way to edit a Ruby script such that, each time I save it, my existing REPL session re-runs the script and loads the bindings from it into my existing session—without destroying any other bindings I've made—then I'd be happy to use that. But that's less like "using files", and much closer to using something like a Jupyter notebook.
sort(1) is a merge sort (specifically, an "external R-way merge")—and, as a merge sort, it actually isn't constrained by memory. It spills its sorted intermediates to disk as temp files! (And, in fact, it even compresses each intermediate, for the same reason a column store like Cassandra does.) It's pretty optimal!
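You can even steer the spilling from the command line; these are stock GNU sort flags:

    # 1 GiB in-memory buffer, an explicit spill directory, and gzip
    # compression for the spilled intermediate files
    sort -S 1G -T /var/tmp --compress-program=gzip huge.txt > sorted.txt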
If you're wondering how this is at all fast (and it is), in Linux at least /tmp is a tmpfs, meaning that these tempfiles are actually ending up as memory after all. They get "spilled to disk" in the sense that the memory associated with the files spills to swap; but, importantly, tmpfs pages know that they're from files, so they have much better swapping semantics than regular memory for the case where you're writing the file as a stream, and then reading it as a stream. (Basically, juggling streams under a tmpfs is equivalent in overhead to managing memory when you've got it, and managing files when you don't, but without any of the dev-time overhead of having to write code for both cases.)
Yeah I suppose my attitude will change should I ever start needing to do heavy lifting. But I'll definitely explore instrumenting bash through ruby first, just to see where that line lies.
There's also the option of pushing the workload off onto AWS, which I may look into.
Don't you lose the easy threading that's inherent in a pipeline of processes, though?
I love Ruby, but for throwaway jobs I'd be more likely to do it on the command line than in a Ruby script, simply because it would be quicker for me to write and run in a shell. Unless I'm wrong, at the very least you wouldn't have to deal with the GIL.
Yeah, just like eating dirt solved the problem of world hunger.
Look again at your code. I'm not 100% sure what the Ruby version does either, but once I do figure out the exact semantics of drop and split, it's going to be waaay easier to remember, understand, and modify the Ruby version than the Perl one.
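For reference, the semantics in question are about as simple as Ruby gets:

    [1, 2, 3].drop(1)   # => [2, 3]             (skip the first n elements)
    'a b  c'.split      # => ["a", "b", "c"]    (no argument: split on whitespace)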
Your post comes off as something from reddit's /r/nottheonion, you're just making the case against Perl for Perl-haters :)
Personally, when I write shell commands, they tend to be write-only code, because the shell isn't really suitable for anything more complex. So it's easier to think in code than it is to add the extra step of translating to English if it's something nobody's gonna see again anyway.
My counter to this would be - if I'm doing a thing, and I need to look back through my shell history a few days or weeks later to figure out how I did it (especially if something went wrong), seeing a more readable version is going to help me figure it out faster.
Granted, anything really nontrivial I'll usually just write a Python script for, but the point stands.
It is if you know ruby. Me, I know perl and honestly I do not grok most of the ruby examples given here. I am quite sure that I could learn them about as fast as somebody not knowing perl and knowing ruby could learn perl.
Why the hell would anybody want to replace AWK? It’s super fast, extremely light on resources, doesn’t have any dependencies, does auto memory management and type inference, and is extremely powerful for large dataset processing, not to mention easy to learn and program.
I think that's relative and it certainly hasn't been my experience. If I program in Ruby, Node, Python, etc for 8 hours of every day, it makes sense that I would reach for that over a command line tool with a syntax that looks a bit arcane. The best tool for the job sometimes is the tool you know best.
It’s arcane only if you don’t know C. If you’re on a UNIX-like or a true UNIX system, not knowing C will come to haunt you with a vengeance sooner or later.
> The best tool for the job sometimes is the tool you know best.
That's like hammering in screws because you know your hammer and screwdrivers seem arcane to you.
For ad-hoc stuff `ruby -e` is more than enough, and if you want to write a program once and use it a lot, it's worth investing the time in doing it the way it works best, which, in this case, would be a (fast) language like awk that makes it easier to process line-by-line instead of reading until EOF and making the rest of the pipeline wait.
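Worth noting that ruby can stream too: `-n` wraps the code in a `while gets` loop and `-a` autosplits each line into `$F`, which gets you most of the way to awk:

    docker ps | ruby -nae 'puts $F[1]'   # print the second field of each line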
I want to say that Perl, and Ruby, and even Python are superior tools ... except that awk is also my default go-to for munging piped output data, so yes, it's clearly still hitting the sweet spot for many tasks.
Well, the thing is, matz did it a decade ago, when he carefully copied all these features that Perl came up with or had itself carefully copied from awk and sed.
It's feeding stdin to ruby code you specify on the command line, either as a single array of lines or, with -l, as one string per line. So in:
    docker ps | rb drop 1 | rb -l split[1]
The output of docker ps is passed as an array of lines to ruby and the first line is dropped; that output is then fed to ruby again, line by line, so each line is split (split defaults to splitting on whitespace) and its second element is returned. With parentheses for the function calls, it would be something like:
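    docker ps | rb 'drop(1)' | rb -l 'split()[1]'

(The quotes also keep bash from treating the brackets as a glob pattern.)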
First I thought: This is stupid.
But then I watched the examples on github and here in the comments, and I changed my mind: It's pretty neat :) I like it.
Not the parent, but speaking for myself, I feel the need to pick one or the other.
I'm not saying I hate Ruby. I just haven't taken the time (nor had a reason to, frankly) to become a fan the way I have with Python. Being similar languages, in that both are dynamic, malleable, mostly single-threaded, and REPL-able, they seem to occupy the same spot in my toolbox.
Not to knock Ruby-the-language, but the one hitch is that Python's ecosystem reaches quite a bit further, into fields I don't interact with (biology, physics, etc.) but also many that I do (numpy/sympy, CLI utilities and scripting, OpenCV, curl bindings). Where I'm at, it's most everyone's secondary or tertiary language.
Nothing against Ruby, it's just circumstance. If anything I might pick it up soon if only to access JRuby/TruffleRuby worlds which are much ahead of Jython/GraalPython.
Oh, except for function calls without (). And I'm not huge on DSLs. They seem icky but I grew to love significant whitespace in Python so maybe I'll grow to love those too.
As someone who prefers Python to Ruby, Ruby is better for this job because there are so many more methods on Array/Enumerable than on Python list (or Python's global functions.) And them all being methods makes life simpler too.
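For example, a few things that are a single chained call in Ruby but need a detour through collections/itertools in Python (rough Python equivalents noted in the comments):

    words = %w[foo bar foo baz bar foo]
    words.tally                # => {"foo"=>3, "bar"=>2, "baz"=>1}   (collections.Counter)
    words.each_cons(2).to_a    # sliding window of pairs              (itertools.pairwise)
    words.group_by(&:size)     # => {3=>["foo", "bar", "foo", ...]}   (itertools.groupby, after sorting)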
- Reading all the lines into an array is a big error.
- It's a bit unintuitive because you don't really know which strings to escape on the bash command line. In the example, some are escaped and others aren't.
- In the end I think this could be achieved with a simple bash function:
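    # one possible shape (my sketch of what's meant, not the commenter's actual code):
    rb() { ruby -e "puts STDIN.each_line.to_a.instance_eval { $1 }"; }

    # usage: docker ps | rb 'drop(1)'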
A few examples: capitalizing a line, printing only unique lines, and computing the sum of numbers in a table; see the sketches below. Personally I think it's a really cool idea.
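Presumably something along these lines (my guesses, going by how rb works):

    $ printf 'hello\nworld\n' | rb -l capitalize
    Hello
    World

    $ printf 'a\nb\na\n' | rb uniq
    a
    b

    $ printf '1 2\n3 4\n' | rb 'map { |l| l.split.map(&:to_i).sum }.sum'
    10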
[1]: https://ruby-doc.org/core/Array.html
[2]: https://ruby-doc.org/core/String.html