Hacker News
Show HN: Rb – Turns Ruby into a command line utility (github.com/thisredone)
232 points by redka on Aug 14, 2018 | 73 comments



Interesting, so it's reading text from STDIN into an Array, then executing its arguments as Ruby code in the context of that Array (or if you pass -l, in the context of each line individually). So all the standard methods on the [Array][1] (or [String][2]) class become accessible as top-level methods.
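The core of the trick can be sketched in a few lines. This is a hypothetical `rb_eval` helper for illustration, not the actual rb source:

```ruby
# Hypothetical helper mimicking rb's core trick: evaluate the given
# Ruby snippet in the context of the input (whole array, or each line).
def rb_eval(input, code, line_mode: false)
  lines = input.lines.map(&:chomp)
  if line_mode
    # -l: run the snippet against each line (a String) individually
    lines.map { |l| l.instance_eval(code) }
  else
    # default: run the snippet against the whole Array of lines
    lines.instance_eval(code)
  end
end

rb_eval("hello world\n", "capitalize", line_mode: true)  # => ["Hello world"]
rb_eval("hello\nworld\nhello\n", "uniq")                 # => ["hello", "world"]
rb_eval("1\n2\n3\n", "map(&:to_i).sum")                  # => 6
```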

A few examples:

Capitalizing a line:

    echo hello world | rb -l capitalize

Printing only unique lines:

    printf 'hello\nworld\nhello\n' | rb uniq

Computing the sum of numbers in a table:

    printf '1\n2\n3\n' | rb 'map(&:to_i).sum'
Personally I think it's a really cool idea.

[1]: https://ruby-doc.org/core/Array.html

[2]: https://ruby-doc.org/core/String.html


    echo hello world | ruby -pe '$_.capitalize!'

    printf 'hello\nworld\nhello\n' | ruby -le 'puts STDIN.to_a.uniq'

    ps | ruby -lane 'BEGIN { b = 0 }; b += $F[0].to_i; END { print "sum of PIDs: #{b}" }'

Select whatever column you want to sum up.

Really what Ruby needs is a flag that adds some capabilities to NilClass, so that these BEGIN blocks aren't needed.
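For illustration, the behavior they're asking for could be approximated with a monkey-patch on NilClass. A sketch only; patching a core class globally is exactly why Ruby doesn't ship this by default:

```ruby
# Sketch: let nil absorb +, so `b += n` works before b is initialized.
class NilClass
  def +(other)
    other
  end
end

b = nil
[3, 4, 5].each { |n| b += n }
b  # => 12
```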


If this was an attempt at showing this tool is unnecessary, you just did the opposite.

Each of your examples use different flags, inputs and print methods. The last is particularly good at proving the point!


Each of those examples is doing different things. This rb script locks you into two of the options.


> Really what Ruby needs is a flag that adds some capabilities to NilClass, so that these BEGIN blocks aren't needed.

Does it, though?

  ps | ruby -le 'b = $<.reduce(0) {|s, l| s + l.split[0].to_i}; print "sum of PIDs: #{b}"'
or:

  ps | ruby -lane 'b = (b || 0) + $F[0].to_i; END { print "sum of PIDs: #{b}" }'


Command line tools in 10 lines of ruby (using this script):

    docker ps | rb drop 1 | rb -l split[1]
    docker ps -a | rb grep /Exited/ | rb -l 'split.last.ljust(20) + " => " + split(/ {2,}/)[-2]'
    df -h | rb 'drop(1).sort_by { |l| l.split[-2].to_f }'
Command line tools in zero lines of ruby (using ruby):

    docker ps | ruby -lane 'next if $. == 1; print $F[1]'
    docker ps -a | ruby -lne 'print $1 if /(Exited .*? ago)/'
    df | ruby -lane 'BEGIN { lines = [] }; lines.push [$F[4].to_i, $_] if $. > 1; END { lines.sort { |a, b| b[0] <=> a[0] }.each { |k| print k[1] } }'
Bit of a pain as ++ is lacking, as is autovivification. A /bin/sort at the end usually beats `<=>` for terseness.


In case it's helpful, as someone who doesn't know Ruby, it's hard to understand and not too readable. `split.last.ljust` I just barely understood (left-justify the last field), but I have no idea what the next part is even trying to accomplish.

It's a cool script though.


That's the author's oneliner. I don't know what that code is for either, which is part of why I rewrote it in the second set of examples.

It's normal for a oneliner to be somewhat inscrutable though, since they're often extremely dependent on their inputs. Like if it's important to fetch the 3rd, 5th, and 7th column--why? You can't tell without the input.


For the 3rd example, not sure why you are manually building the array:

    df -h | ruby -e 'puts readlines.drop(1).sort_by { |l| l.split[-2].to_f }'
similarly, the `uniq` example from your other comment could be shortened to

    printf 'hello\nworld\nhello\n' | ruby -e 'puts readlines.uniq'


The command should process stdin in streaming fashion rather than slurping it all at once:

  code = ARGV[0]
  STDIN.each_line do |l|
    puts l.chomp.instance_eval(code) # chomp so puts doesn't print a double newline
    STDOUT.flush
  end


Someone did a PR[1] with something similar. He also found a way to speed things up by creating a Proc first and feeding it to the eval instead of the string.

[1] https://github.com/thisredone/rb/pull/2
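My understanding of the speedup: compile the snippet into a Proc once, then instance_eval the pre-built block per line instead of re-parsing the code string every time. A rough sketch, not the PR's exact code:

```ruby
code = "upcase"  # stand-in for the snippet passed in ARGV[0]

# Parse the snippet once into a Proc...
block = eval("Proc.new { #{code} }")

# ...then reuse the compiled block for every line; no per-line parsing.
%w[hello world].map { |l| l.instance_eval(&block) }
# => ["HELLO", "WORLD"]
```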


OP apparently wants to process an array of lines, not just each line separately.


We need to be using lazy collections.


So:

  execute(STDIN.each_line, code)
which gives you a lazy enumerator, in place of:

  execute(STDIN.readlines, code)
which gives you an Array.
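For reference, `.lazy` makes each stage pull values on demand instead of materializing intermediate arrays, which is what lets it work on unbounded input:

```ruby
# Without .lazy this chain would never terminate on an infinite range;
# with it, each element flows through the whole pipeline one at a time.
(1..Float::INFINITY).lazy.map { |n| n * n }.select(&:even?).first(3)
# => [4, 16, 36]
```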


It makes me a little uncomfortable that they're using curl|bash for something as simple as "put this 10-line script somewhere in your $PATH," especially when the script involves sudo (to move it into /usr/local/bin). Sure, it's easy to inspect the script and see that it's not doing anything malicious, but it makes install processes like this, where it'd be incredibly easy to hide something malicious, seem normal.


Yeah, and it's not even getting the code from an official source; it's coming from a random bit.ly link [created by a contributor][1].

I've [submitted a PR][2] to inline the install script in the README.

[1]: https://github.com/thisredone/rb/pull/5

[2]: https://github.com/thisredone/rb/pull/8


Thanks, merged


I didn't even see that. I followed the directions in the top section ("Clone this repo and copy the rb file to somewhere in your path (or just copy and paste the above).") (I did have to chmod +x)

That said, brew install foo is a normal part of many development workflows, which essentially just curls a file from someone else's git repo.


It's such a common pattern too. I can't believe people are still doing this and it's generally acceptable.


I mean, if you think about it, any time you run any installer, whether via brew install, apt-get install, or an .exe or .msi, you're effectively running someone else's unknown code on your system, often as a superuser (e.g. with sudo apt-get install). Is there a significant difference here? At least in this case you could potentially download the shell file and read it before you run it, unlike with a binary executable.

Am I way off base here?


Some context on why curl pipe bash is a bad idea: https://www.idontplaydarts.com/2016/04/detecting-curl-pipe-b...

Most recent discussion: https://news.ycombinator.com/item?id=17636032


The second I start wanting a bash pipeline, probably around the third pipe, I scrap it immediately and move to using a text editor to write a script. Because if I'm wanting a pipeline, I'm also going to want to store it, and transmit it over the network, and manage it, and put it down and come back to it later.

All things perfectly manageable inside a PORO. Bundler even has an inline mode, so you can put everything in one file, close to your data, Bundler just handles dep management under the hood like a boss. Check out bundler-inline.

Sure, you can do all that with bash. But you can also make system() calls with Ruby, with full string interpolation power tools. If you're a Rubyist, and you want to do data analysis, this is the workflow you want.
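For anyone who hasn't seen bundler-inline, the shape of such a script is roughly this (the gem here is just a placeholder, and the first run needs network access to install anything missing):

```ruby
require 'bundler/inline'

gemfile do
  source 'https://rubygems.org'
  gem 'nokogiri'  # placeholder dependency; use whatever the script needs
end

# From here on, the gems above are installed (if missing) and required,
# with no separate Gemfile sitting next to the script.
```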


I hadn't seen bundler-inline before, thanks for posting this. That'll be nice for small scripts that don't warrant a whole directory themselves, or useful for a sort of "tools" / "utils" repo with small utility scripts.


> Because if I'm wanting a pipeline, I'm also going to want to store it, and transmit it over the network, and manage it, and put it down and come back to it later.

Not my experience. More often than not, if I'm writing a pipeline (in a shell or in a REPL), it's because someone handed me a file they made in a one-off effort that has nonsensical nonstandard formatting, and I'm trying to normalize it.

If they made another file, it'd have different nonsensical nonstandard formatting, so there's no reuse here.

Meanwhile, writing a pipeline at a REPL (or shell) enables me to just work on it expression by expression, keeping a variable-assignment at the beginning of the expression so that I can then work on the intermediate created data to figure out how to munge it just a little bit more, and then add that back to the pipeline once it's right.

If there was a way to edit a Ruby script such that, each time I save it, my existing REPL session re-runs the script and loads the bindings from it into my existing session—without destroying any other bindings I've made—then I'd be happy to use that. But that's less like "using files", and much closer to using something like a Jupyter notebook.


xargs -P and other ad-hoc parallelism tools usually stop me from leaving the command line too quickly (e.g. <(), fork / join).

The other thing that keeps me back is scale. If everything fits in memory, cool, but sometimes I need to sort or uniq something that won't.


sort(1) is a merge sort (specifically, an "external R-way merge")—and, as a merge sort, it actually isn't constrained by memory. It spills its sorted intermediates to disk as temp files! (And, in fact, it even compresses each intermediate, for the same reason a column store like Cassandra does.) It's pretty optimal!

If you're wondering how this is at-all fast (and it is), in Linux at least /tmp is a tmpfs, meaning that these tempfiles are actually ending up as memory after all. They get "spilled to disk" in the sense that the memory associated with the files spills to swap; but, importantly, tmpfs pages know that they're from files, so they have much better swapping semantics than regular memory for the case where you're writing the file as a stream, and then reading it as a stream. (Basically, juggling streams under a tmpfs is equivalent in overhead to managing memory when you've got it, and managing files when you don't, but without any of the dev-time overhead of having to write code for both cases.)


Yeah I suppose my attitude will change should I ever start needing to do heavy lifting. But I'll definitely explore instrumenting bash through ruby first, just to see where that line lies.

There's also the option of pushing the workload off onto AWS that I may look into.


I agree. I love Ruby, but I'll pipe 'httpie', 'jq' and 'xargs' for quite a while before deciding to write a Ruby script to do something similar.


Don't you lose the easy threading that's inherent in a pipeline of processes, though?

I love Ruby, but for throwaway jobs I'd be more likely to do it on the command line than in a Ruby script, simply because it would be quicker for me to write and run in a shell. Unless I'm wrong, at the very least you wouldn't have to deal with the GIL.


This is a good point. Using rb with subshells and pipelines would be the best of both worlds?


  $ docker ps | rb drop 1 | rb -l split[1]
  $ docker ps | perl -anE 'say $F[1] if $.>1'

perl solved this problem a long time ago, people


for your given sample, you could use awk[0] as well

    awk 'NR>1{print $2}'
or ruby[1]

    ruby -ane 'puts $F[1] if $.>1'
Which one to use depends on lots of factors: speed, features, availability, etc., as well as whether the user already knows perl[2]/ruby/etc.

Further reading(disclosure: I wrote these)

[0]: https://github.com/learnbyexample/Command-line-text-processi...

[1]: https://github.com/learnbyexample/Command-line-text-processi...

[2]: https://github.com/learnbyexample/Command-line-text-processi...


Yeah, just like eating dirt solved the problem of world hunger.

Look again at your code. I'm not 100% sure what the Ruby version does, but once I do figure out the exact semantics of drop and split, it's going to be waaaaaaaaaaaaaaaaay easier to remember, understand and modify the Ruby version than the Perl one.

Your post comes off as something from reddit's /r/nottheonion, you're just making the case against Perl for Perl-haters :)


The perl here is not really that crazy to remember.

-a autosplits each line by whitespace and puts each element into the array F (this was inspired by AWK).

-n loops through each line of the file, and -E executes the perl.

$. (NR in AWK) is the line number.

As others have noted, you can write the same thing in ruby on the command line already with `ruby -ane 'puts $F[1] if $.>1'`

(Notice the similarities?)

If you write one liners in AWK, Perl, or Ruby often the "odd" variables look more like useful shortcuts.

*edit You could also write the perl without any of the special variables, but it would be much more verbose, hence the special characters and flags.


Except that in your example, the first line is coherent English, and the second line is just... well... code.


Personally, when I write shell commands, they tend to be write-only code, because the shell isn't really suitable for anything more complex. So it's easier to think in code than it is to add the extra step of translating to English if it's something nobody's gonna see again anyway.


My counter to this would be - if I'm doing a thing, and I need to look back through my shell history a few days or weeks later to figure out how I did it (especially if something went wrong), seeing a more readable version is going to help me figure it out faster.

Granted, anything really nontrivial I'll usually just write a Python script for, but the point stands.


but the ruby code is much more intuitively readable


It is if you know ruby. Me, I know perl and honestly I do not grok most of the ruby examples given here. I am quite sure that I could learn them about as fast as somebody not knowing perl and knowing ruby could learn perl.


I know both languages; the ruby version is better IMO. The verbiage is English... the syntax is a bit odd but so is the perl version's


Sorry, it was solved too long ago, I can't read what it says.


What does this have over the default ruby?

    cat your-file.txt | ruby -ne '$_.your_ruby_stuff'

Just syntactic sugar? (it does look cleaner!)


I've always found it quite painful to do string escaping when using `ruby -e`


Exactly! It's nothing fancy but this small sugar makes Ruby more handy in those situations


"pyped" does the same for python.

I thought it was great, knowing Python better than bash, plus it was portable to Windows.

But eventually I always end up using the built-in GNU tools.

Don't know why. It just rolls better.


“With 10 lines of Ruby replace most of the command line tools that you use to process text inside of the terminal.”

Well, it’s these 10 lines, plus an unlimited amount of Ruby that you have to compose on the fly for each operation.


I have no idea what this is doing, but it sure looks interesting.


It's using Ruby to replace Awk, the way Perl once tried to.


Why the hell would anybody want to replace AWK? It’s super fast, extremely light on resources, doesn’t have any dependencies, does auto memory management and type inference, and is extremely powerful for large dataset processing, not to mention easy to learn and program.


> easy to learn and program

I think that's relative and it certainly hasn't been my experience. If I program in Ruby, Node, Python, etc for 8 hours of every day, it makes sense that I would reach for that over a command line tool with a syntax that looks a bit arcane. The best tool for the job sometimes is the tool you know best.


It’s arcane only if you don’t know C. If you’re on a UNIX-like or a true UNIX system, not knowing C will come to haunt you with a vengeance sooner or later.


> The best tool for the job sometimes is the tool you know best.

That's like hammering in screws because you know your hammer and screwdrivers seem arcane to you.

For ad-hoc stuff `ruby -e` is more than enough, and if you want to write a program once and use it a lot, it's worth investing the time in doing it the way it works best, which in this case, would be a (fast) language like awk that makes it easier to process line-by-line instead of reading until EOF and making the rest of the pipeline wait.


I want to say that Perl, and Ruby, and even Python are superior tools... except that awk is also my default go-to to pipe into for munging output data, so yes, it's clearly still hitting the sweet spot for many tasks.


Technically Ruby tried to replace Perl, so I guess the shoe fits. It sure has taken a while for someone to do this!


Well the thing is, matz did it a decade ago when he carefully copied all these features that Perl came up with or itself carefully copied from awk and sed.


It's feeding stdin to ruby code you specify on the command line, either as an array of lines, or repeated strings. So in:

  docker ps | rb drop 1 | rb -l split[1]
The output of docker ps is passed as an array of lines to ruby, and the first line is dropped, and then that output is fed to ruby again, but line by line, so each line is split (which defaults to split on whitespace), and returns the second element. With parentheses for function calls it would be:

  docker ps | rb drop(1) | rb -l split()[1]


So drop is just a Ruby function? This is a great idea, seems like a good application of Ruby's syntax.


That's similar to the ruby-each-line gem I created: https://github.com/Dorian/ruby-each-line


First I thought: this is stupid. But then I looked at the examples on GitHub and here in the comments, and I changed my mind: it's pretty neat :) I like it.


A bashtardization:

    rb () {
      [[ $1 == -l ]] && shift
      case $? in
        0 ) ruby -e "STDIN.each_line { |l| puts l.chomp.instance_eval(&eval('Proc.new { $* }')) }";;
        * ) ruby -e "puts STDIN.each_line.instance_eval(&eval('Proc.new { $* }'))";;
      esac
    }


I'd love this for python as I'm not a big fan of Ruby. Does such a thing exist?


Not well tested... just whipped this up.

The example to sort df -h by the percent:

  df -h | p 'sorted({}[1:], key=lambda x: int(x.split()[-2][:-1]))'
I saved this as ~/bin/p

  #!/usr/bin/env python3

  import sys

  single_line = sys.argv[1] == '-l'
  code = ' '.join(sys.argv[2 if single_line else 1:])
  if single_line:
      for line in sys.stdin:
          print(eval(code.format('line.strip()')))
  else:
      print('\n'.join(eval(code.format('sys.stdin.read().splitlines()'))))


Several people have tried to tackle this in Python. My attempt was python-oneliner[1]. There's also a list of similar projects in the readme.

[1] https://github.com/gvalkov/python-oneliner


I made myself a little utility that's kinda like that: https://gist.github.com/dschep/4358be665537463b9271f782e77ff...


Out of interest, why are you not a fan of Ruby?


Not the parent but speaking myself, I feel the need to pick one or the other.

I'm not saying I hate Ruby. I just haven't taken the time (nor had a reason to, frankly) to become a fan the way I have with Python. Being similar languages, in terms of both being dynamic, malleable, mostly single-threaded and REPL-able, they both seem to occupy the same spot in my toolbox.

Not to knock Ruby-the-language, but the one hitch is that Python's ecosystem is quite a bit vaster into fields I don't interact with (biology, physics, etc.) but also with many that I do (numpy/sympy, cli utilities and scripting, opencv, curl bindings). Where I'm at, it's most everyone's secondary or tertiary language.

Nothing against Ruby, it's just circumstance. If anything I might pick it up soon if only to access JRuby/TruffleRuby worlds which are much ahead of Jython/GraalPython.

Oh, except for function calls without (). And I'm not huge on DSLs. They seem icky but I grew to love significant whitespace in Python so maybe I'll grow to love those too.


Right tool for the job; don't be a fan boy.


As someone who prefers Python to Ruby, Ruby is better for this job because there are so many more methods on Array/Enumerable than on Python list (or Python's global functions.) And them all being methods makes life simpler too.
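Concretely, these are one-call Enumerable methods in Ruby that take a loop or an itertools recipe in Python:

```ruby
# Bucket words by first letter: a dict-building loop in Python, one call here.
%w[apple banana avocado].group_by { |w| w[0] }
# => {"a" => ["apple", "avocado"], "b" => ["banana"]}

# Sliding window of consecutive pairs.
[1, 2, 3, 4].each_cons(2).to_a
# => [[1, 2], [2, 3], [3, 4]]

# Split a collection by a predicate in one pass.
(1..10).partition(&:odd?)
# => [[1, 3, 5, 7, 9], [2, 4, 6, 8, 10]]
```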


I found the last example hard to read, so I made this fork: https://github.com/kimat/rb


Cool.


Nice


It's a pretty nifty idea.


- Reading all the lines into an array is a big error.

- It's a bit unintuitive because you don't really know which strings to escape on the bash command line. In the example, some are escaped and others aren't.

- In the end I think this could be achieved with a simple bash function:

    $ rb() { ruby -lane "print \$_.instance_eval(\"$@\")"; }
    $ echo hello | rb capitalize
    Hello



