RobotJS – Node.js Desktop Automation (github.com/octalmage)
198 points by octalmage on July 30, 2015 | 59 comments



Cool project! Seems like its capabilities are somewhat similar to Hammerspoon, an OS X project where you can script almost anything in the OS with Lua. I believe it was forked from Mjolnir.

I use it for window management (which it is awesome at, especially considering I have a very complex arrangement with a 4K monitor + laptop screen), automatic mute when not home, remapping shortcuts, and more.

The config syntax is pretty simple and works great. There are some really cool configs all over the internet. Here's mine: https://github.com/STRML/init/blob/master/hammerspoon/init.l...

Hammerspoon Docs: http://www.hammerspoon.org/docs/


Came to post the same thing. Here's mine: https://github.com/heptal/dotfiles/blob/master/roles/hammers...


That's a nice one, especially the paste block workaround. That is my #1 least favorite security fad/flaw.


Despite its many syntax quirks, AutoHotkey is an amazingly powerful tool for automating Windows keyboard and mouse input - I use it daily at work.

Having an equivalent tool on the Mac would be awesome, as I don't think anything in AutoHotkey's class exists right now...


Out of curiosity, have you looked at Sikuli [1]?

[1]: http://www.sikuli.org/


I hadn't heard of it before, and it does look potentially very powerful.

My only initial concern from reading the quick start (http://www.sikulix.com/quickstart.html) is the apparent requirement to use the "SikuliX IDE" for scripting...


The magic of Sikuli is its use of the OpenCV library under the hood (and Tesseract for OCR). You could skip the Sikuli part and just use OpenCV and Tesseract directly. (Not easy, but theoretically possible.)


I once used Sikuli to extract a table from a PDF. Pretty funny way of hacking together a fast solution to a problem. It's a pretty powerful tool but not very stable. The IDE is also not so exciting but gets the job done. Pretty excited to see how far this project will go.


You actually don't need to use their IDE, as the scripts it generates are just Python scripts.

I've used it a few times in the past and was able to use an external screenshot program (Apple-Shift-4 on Mac) and a text editor to write the scripts.


Check out Morgan Dixon's awesome research project Prefab: The Pixel-Based Reverse Engineering Toolkit, which would be great to integrate into RobotJS.

http://homes.cs.washington.edu/~mdixon/research/prefab/


Sikuli is awesome, I've used it a bunch. But developing and distributing cross platform commercial applications with it is a mess. Also interacting with other languages, to make a GUI for example, isn't fun.


AutoHotkey is a crazy dense macro tool. I originally thought it just moved the mouse and typed, but it can do so, so much more: interface with spreadsheets and web browsers, and heck, there's even a tiling window manager out there that's programmed in AHK script (bug.n).


Yeah AutoHotkey is amazing. I've written many successful apps in AHK. Like this one:

https://github.com/octalmage/mDesktop

With RobotJS and nw.js I've been able to port many of my apps to JavaScript, and ultimately make them cross platform.


As another example, I wrote some scripts (well, hacked together other people's work) to map MIDI CC values to a virtual joystick's Y axis, so I could use my MIDI expression pedal as an accelerator pedal in Euro Truck Simulator 2.


I mostly used AHK for keyboard stuff, so TextExpander would be my Mac equivalent, and within its domain it's a lot nicer and maybe even more powerful than AHK. Never done much mouse-scripting on any platform, though...



I love the features! But the language, omg. Every time I want to modify my script it feels like such a chore. Would love some JS or Lisp that compiles to AHK.


AutoIt MVP member here (James on the forums).

I'm really glad to see a replacement for AutoIt and AHK that is potentially, if not already, cross-platform too.

I can't speak for AHK but AutoIt has made huge progress over the last couple of years. Jon has been working on new features, especially improved COM support.

Maybe RobotJS will have a community too? It'd be great to see UDF's and a thriving ecosystem.


Awesome! AutoIt is great. AutoHotkey wouldn't exist if it wasn't for AutoIt.

AutoHotkey has also improved a bunch recently. Lexikos picked up development and he's done a killer job. But unfortunately it (and AutoIt) will never be cross platform. That's why I made RobotJS.

I honestly never thought about a community but that's an amazing idea. A classic forum would be great, I've spent so much time on the AutoHotkey/AutoIt forums. I'd love for this to happen.


Why won't it ever be cross-platform? Would the effort just be too gargantuan?


It's too directly tied to the Windows API. The vast majority of non-language code would have to be rewritten to run correctly, unless they emulate the Windows API instead.


There have been many attempts over the past 5 years and IronAHK got the closest, but yeah, both languages rely 100% on Windows-specific APIs. I hope later on to write an AHK-to-JavaScript compiler/interpreter. This would make the language more portable.


I've been looking for something like this forever! I loved AutoIt v3 on Windows, and osascript always felt super crippled in comparison (as well as impossible to look up documentation for). I've long since forgotten what I wanted to use this for, but I'll be sure to remember this for later!


> AutoIt v3

AutoHotkey is available and open source but the custom scripting language is off-putting, I would much rather have something like standard JavaScript for it. If this project moves forward in that direction it could be great.

http://ahkscript.org/


I grew up on AutoHotkey but yeah, the syntax is very strange. It's actually based on AutoIt v2. The closest language (syntax wise) is Assembly, and that's silly.

AutoHotkey is available, but only on Windows. I don't think I would have made this if AutoHotkey was cross platform.


That's exactly why I made it! I'm a huge AutoHotkey and AutoIt fan, I'm hoping I can recreate that for Node.js.

I hope you get inspired!


I know! Was thinking about this sort of thing and then ran into the project about a week later.


    // Load the module (added here so the snippet runs on its own).
    var robot = require("robotjs");

    //Type "Hello World".
    robot.typeString("Hello World");

    //Press enter.
    robot.keyTap("enter");

Whoever designed this interface didn't think much about consistency.


It's very temporary! There's discussion about it here:

https://github.com/octalmage/robotjs/issues/4


Ah, thanks for the pointer!

    keyboard.type('foo');
    keyboard.press('fn');

... makes much more sense.


I'd go for as short (but still readable) as possible.

  > robot.keys("Type a string")

  > robot.keys(ENTER)   // ENTER is an integer key value

(I've been thinking about this while trying to make an idiomatic Node client for Selenium WebDriver... https://github.com/hugs/34#api)


Importing enum values into global/function/module local namespace has always been a PITA for javascript environments. Any suggestion on how to do this cleanly?


I'm sure it's possible, but one way to avoid it -- turn the constant into a method:

  > robot.keys.ENTER()

(Not great for key combos, though...)
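
For comparison, a minimal sketch of the constant-based alternative: keep the key values on one frozen object and pull the needed names into local scope with destructuring. The `Keys` object and the `keyName` helper are made up for this example, and the numbers are borrowed from DOM `keyCode` values purely as placeholders; none of this is RobotJS API.

```javascript
// Hypothetical key table. The values are illustrative placeholders
// (they happen to match DOM keyCode numbers), not RobotJS constants.
const Keys = Object.freeze({ ENTER: 13, TAB: 9, ESCAPE: 27 });

// Import only the names a module needs into its local scope.
const { ENTER, TAB } = Keys;

// Reverse lookup, handy for logging which key a numeric code stands for.
function keyName(code) {
  const name = Object.keys(Keys).find((k) => Keys[k] === code);
  return name || "UNKNOWN";
}

console.log(ENTER, keyName(TAB));
```

Freezing the table keeps the constants from being reassigned by accident, and nothing leaks into the global namespace.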


I'd then go with

  robot.keys.enter()

(since enter is a function, not a constant) and make it chainable by returning `robot.keys`.

  robot.keys.ctrl().enter()
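
A rough sketch of how that chaining could work, assuming a hypothetical `createKeys` factory and a `send` callback that stands in for whatever actually presses the key. This is not how RobotJS is implemented, just one way to get the `ctrl().enter()` shape.

```javascript
// Build an object whose key methods all return the object itself,
// which is what makes calls chainable. `send` is a stand-in backend.
function createKeys(send) {
  const keys = {};
  for (const name of ["ctrl", "shift", "alt", "enter", "tab"]) {
    keys[name] = () => {
      send(name);
      return keys; // returning the same object enables chaining
    };
  }
  return keys;
}

// Record the calls instead of pressing real keys.
const pressed = [];
const keys = createKeys((name) => pressed.push(name));

keys.ctrl().enter();
console.log(pressed); // [ 'ctrl', 'enter' ]
```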


You'd probably want something slightly different than keys - at least some way to differentiate between pressing keys one after another and holding keys down at the same time.
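
To make the distinction concrete, here is a small sketch that expands both cases into down/up events. The function names and the event shape are invented for illustration; a real backend would replay these events through whatever key-down and key-up primitives it has.

```javascript
// Keys held together: modifiers go down first, the final key is tapped,
// then the modifiers are released in reverse order.
function chordEvents(modifiers, key) {
  const down = modifiers.map((m) => ({ key: m, action: "down" }));
  const up = modifiers
    .slice()
    .reverse()
    .map((m) => ({ key: m, action: "up" }));
  return [...down, { key, action: "down" }, { key, action: "up" }, ...up];
}

// Keys pressed one after another: each key is released before the next.
function tapEvents(keyList) {
  return keyList.flatMap((k) => [
    { key: k, action: "down" },
    { key: k, action: "up" },
  ]);
}

const show = (events) => events.map((e) => e.key + ":" + e.action).join(" ");
console.log(show(chordEvents(["ctrl"], "enter")));
console.log(show(tapEvents(["ctrl", "enter"])));
```

The same two key names produce different event orders, which is the difference a chainable API would need some way to express.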


See also: http://docs.oracle.com/javase/7/docs/api/java/awt/Robot.html

Java had this, like, forever. Long live "Write Once Run Anywhere"!


Nice stuff. Java provides so much out of the box that it becomes confusing.


Love how simple and solid this project is. You can use it to do a ton of new things in Node.


Thanks! Node.js can do anything!


RobotJS + chrome drivers (and therefore CSS selectors for mouse/keyboard actions) would be the holy grail for integrations testing. webdriver.io kind of does that now, but it's a bit finicky... Any idea how I'd set that up?
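
One possible starting point, as a sketch only: get the element's rectangle in the page (for example from `getBoundingClientRect()` via the driver), add the on-screen position of the viewport, and hand the result to the mouse. The `viewportOrigin` value is an assumption here; working it out (window position plus browser chrome, and any display scaling) is the finicky part and is not covered.

```javascript
// Convert an element rectangle (viewport coordinates) into the screen
// coordinates of its center. `viewportOrigin` is the assumed on-screen
// position of the viewport's top-left corner.
function elementCenterOnScreen(rect, viewportOrigin) {
  return {
    x: Math.round(viewportOrigin.x + rect.left + rect.width / 2),
    y: Math.round(viewportOrigin.y + rect.top + rect.height / 2),
  };
}

const target = elementCenterOnScreen(
  { left: 100, top: 50, width: 40, height: 20 },
  { x: 10, y: 80 }
);
console.log(target); // { x: 130, y: 140 }

// With RobotJS loaded, the point could then be used like this:
//   robot.moveMouse(target.x, target.y);
//   robot.mouseClick();
```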


I think it would be awesome if we could somehow hook this into some API that provides:

- the process list
- the list of window positions
- the text under the cursor
- an interface to create specialized keyboard/mouse actions for specific apps (because why stop at Chrome?)

I'm sure this project is going to be quite popular :)


Looks damn neat! I've not used AutoIt (and the like) previously so I can't think of compelling use cases yet. Can someone suggest some possible ideas on what to automate on my desktop with this Node library? Thanks!


Is there anything like this for browsers? I know there are things like CasperJS, but that seems more "browser test" specific, not "browser automation".


Mechanize might be something that fits that: https://github.com/sparklemotion/mechanize


Selenium can be used for browser automation.


Yup, Selenium has been around since 2004, and is now the basis of the W3C standard [1] for browser automation. Most browser vendors now officially support the spec. (Disclosure: I started the Selenium project. AMA.)

[1]: https://w3c.github.io/webdriver/webdriver-spec.html


I ran into a problem with the automation project I was working on when I tried using Selenium. I was trying to print (shipping labels from Amazon) and Selenium (at least Chrome webdriver) can't see the print dialog :( Any ideas?


I've used Selenium a bunch the last couple months to automate daily/hourly jobs to pull data from 3rd party UIs that don't offer an API. I couldn't imagine not having a tool like Selenium at my disposal!


Pro tip: create a chrome extension with permissions on all http:// and https:// sites, or run it through Node-Webkit/nw.io, then you can use the generic DOMParser and querySelectorAll with a pretty fluent interface on any site you can imagine.

example here: https://github.com/SchizoDuckie/DuckieTV/blob/angular/js/uti...


> or run it through Node-Webkit/nw.io

That's quite interesting! I thought node-webkit wasn't suitable (yet) for such a purpose. Could you go into more detail on how to do parsing/automation of external sites with it?


It's very suitable! (I'm using it in DuckieTV in production, works like a charm!)

Basically, you can use xmlhttp to fetch any webpage because of the relaxed restrictions. Then use DOMParser (a built-in browser component, that you can even shim) to create a virtual DOM of that xmlhttp result, and execute regular querySelector and querySelectorAll queries on that :)
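
A minimal sketch of that flow, assuming a browser-like context (extension page or NW.js window) where `XMLHttpRequest` and `DOMParser` exist. The function names are made up for this example, and the optional `parser` argument is only there so a shim can be passed in.

```javascript
// Parse an HTML string into a detached document and collect link targets.
// `parser` defaults to the built-in DOMParser; a shim can be supplied.
function extractLinks(html, parser = new DOMParser()) {
  const doc = parser.parseFromString(html, "text/html");
  return Array.from(doc.querySelectorAll("a[href]"), (a) =>
    a.getAttribute("href")
  );
}

// Fetch a page and pass the extracted links to callback(error, links).
// Cross-origin requests only work where the host relaxes restrictions.
function fetchLinks(url, callback) {
  const xhr = new XMLHttpRequest();
  xhr.open("GET", url);
  xhr.onload = () => callback(null, extractLinks(xhr.responseText));
  xhr.onerror = () => callback(new Error("request failed: " + url));
  xhr.send();
}
```

Usage would look something like `fetchLinks(someUrl, (err, links) => console.log(err || links))`, with any selector swapped in for `a[href]`.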


robot.js, what will happen when we want to publish a js library for actual robots in something like Tessel? (;


This is great!

I wrote up some ideas about "aQuery -- Like jQuery for Accessibility", which RobotJS would be very useful for implementing. It refers to the Mac accessibility API but it could work with any platform, and even abstract the differences between platforms just like jQuery does.

http://www.donhopkins.com/mediawiki/index.php/AQuery

Also, Morgan Dixon did some wonderful stuff with Prefab: The Pixel-Based Reverse Engineering Toolkit, which would be great to integrate into RobotJS.

http://homes.cs.washington.edu/~mdixon/research/prefab/

aQuery -- like jQuery, but for selecting, querying and manipulating Mac app user interfaces via the Accessibility framework and protocols.

So you can write jQuery-like selectors that search for and select Accessibility objects, and then it provides a convenient high level API for doing all kinds of stuff with them. So you can write higher level plugin widgets with aQuery that use HTML with jQuery, or even other types of user interfaces like voice recognition/synthesis, video tracking, augmented reality, web services, etc!

For example, I want to click on a window and it will dynamically configure jQuery Pie Menus with the commands in the menu of a live Mac app. Or make a hypercard-like user interface builder that lets people drag buttons or commands out of Mac apps into their own stacks, and make special purpose simplified guis for controlling and integrating Mac apps.

[...]

aQuery could apply the DOM tree searching and traversal and data association stuff to the Accessibility Tree, which is similar in a lot of ways to a DOM tree, and describes all the widgets and user accessible affordances and commands in an app, as well as non-tree-like relationships between them (this label describes that widget, this tab represents that panel, this icon represents that view, this editor manipulates that object, etc).
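
As a toy illustration of the tree-searching part, here is a depth-first search over a plain-object tree. The node shape (`role`, `title`, `children`) and the `findAll` name are assumptions for this sketch; a real accessibility tree would come from the platform API and carry far more information.

```javascript
// Collect every node in the tree for which `predicate` returns true.
function findAll(node, predicate, found = []) {
  if (predicate(node)) found.push(node);
  for (const child of node.children || []) {
    findAll(child, predicate, found);
  }
  return found;
}

// Made-up stand-in for an app's accessibility tree.
const app = {
  role: "window",
  title: "Editor",
  children: [
    {
      role: "toolbar",
      title: "Main",
      children: [
        { role: "button", title: "Save" },
        { role: "button", title: "Open" },
      ],
    },
    { role: "textarea", title: "Document" },
  ],
};

const buttons = findAll(app, (n) => n.role === "button").map((n) => n.title);
console.log(buttons); // [ 'Save', 'Open' ]
```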

[...]

aQuery should provide ways of registering patterns and calling handlers when user interface items that match them are created and destroyed. jQuery doesn't directly provide a way to do that (handling page onload events and XHR request responses is usually sufficient), but of course there is a jQuery plug-in that does it: https://code.google.com/p/mutation-summary/ .

So when some user interface objects you're interested in controlling come into existence, you can wrap them with your own "widget" to glue them into whatever other user interface you want to provide. (pie menus, hyperlook, ar, speech recognition, etc).

[...]

I think aQuery should be independent of jQuery, but I like to use jQuery as a metaphor for how it works, even though that might suggest that it's tied to jQuery, or even HTML, which it shouldn't be.


Thank you OP!

> 95.1% C

Any intention on making this available for other languages?


I think you are misunderstanding; robotjs is written in C, but it is a module for Node.js, which means you use it from JavaScript.

Node.js modules can be written in C or JavaScript, but implementing new features like this requires you to use C, so there is no "making this available for other languages".


Well, you could take the API it presents and fit it to another language's FFI. Maybe someone who wants it for their language will do that.


I stand corrected. I looked closer.

I never knew node modules could be written in C.


I was searching around for a js wrapper, but found that even the JS API was implemented in C [1].

I would only implement the low level "hardware" primitives in C, then implement the high level API in JS like Chromium's Blink-in-JS initiative [2]. Once they start expanding the high level functionality, they will lose potential contributors by sticking with pure C.

[1] https://github.com/octalmage/robotjs/blob/master/src/robotjs...

[2] http://www.chromium.org/blink/blink-in-js
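
A sketch of the split I mean, with a fake `native` object standing in for the compiled addon. The `createRobot` wrapper and the idea that the addon exposes only a single-key primitive are assumptions for illustration, not the project's actual layout.

```javascript
// `native` stands in for a compiled addon exposing low-level primitives.
function createRobot(native) {
  return {
    // Thin pass-through to the primitive.
    keyTap: (key) => native.keyTap(key),
    // High-level helper written in JavaScript on top of the primitive.
    typeString(text) {
      for (const ch of text) native.keyTap(ch);
    },
  };
}

// Fake primitive layer that records calls instead of pressing keys.
const log = [];
const robot = createRobot({ keyTap: (key) => log.push(key) });

robot.typeString("Hi!");
robot.keyTap("enter");
console.log(log); // [ 'H', 'i', '!', 'enter' ]
```

Anything that can be expressed as a loop over primitives stays in JavaScript, where more people can contribute to it.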


You'd be surprised by how many C/C++ programmers there are out there! I've already been surprised by the number of contributions. But yeah, using C wasn't a choice, it was the only option. Luckily we already have all planned features implemented in C.



