To clarify how this is not related to WebAssembly: this is for code written in JavaScript, while WASM is for code written in other languages.
It's a fairly simple optimization - it's still JavaScript, just compressed and somewhat pre-parsed.
WASM doesn't currently have built-in garbage collection, so to use it to compress/speed up/whatever JavaScript, you would have to compile an entire JavaScript Virtual Machine into WASM, which is almost certainly going to be slower than just running regular JavaScript in the browser's built-in JS engine.
(This is true for the time being, anyway. WASM should eventually support GC at which point it might make sense to compile JS to WASM in some cases.)
This is important, as there seems to be a lot of misunderstanding in this thread.
What's proposed is structural compression of JS with JS-specific bits to speed things up even more. What's proposed is not compiled JS, in that the original JS is not meaningfully transformed at all. There is a very explicit design goal to retain the original syntactic structure.
OTOH WebAssembly is like a new low-level ISA accessible from web browsers. To use wasm, one does have to compile to it. As the parent here says, compiling JS -> wasm currently requires at least a GC but also much more. To engineer a performant VM and runtime for a dynamic programming language like JS is a time-consuming thing. It is curious to see so many folks think that JS can straightforwardly be compiled to wasm. Currently wasm and JS serve very different needs; and I also don't think JS will be going anywhere for many, many years.
This is a very good point. I would also add that there are a lot of languages compiling to JavaScript that would similarly not benefit from wasm. Right now, pretty much all GC-ed languages compiling to JS (such as ClojureScript, Elm, Scala.js, BuckleScript, PureScript, etc.) are in that category. Even if/when wasm supports GC in the future, I expect that dynamically typed languages compiling to JS will still be a long way from benefiting from wasm.
However, all these languages can benefit from the Binary AST right away. Just like any codebase directly written in JavaScript. If the Binary AST has some decent story about position mapping (see [1] which I just filed), they might even get somewhat better tooling/debugging support going through the Binary AST than going through .js source files, out of the box.
For context: position mapping is somewhere on our radar, but we haven't reached a stage at which it would make sense to start working on it yet.
If you have ideas and/or spare cycles, of course, they are welcome :)
As a side-note: I believe that we could build upon the (very early) proposed mechanism for comments to also store positions. Size-optimized BinAST files would drop both comments and positions, while debugging-optimized BinAST files would keep both at the end of the file so as to not slow down parsing until they are needed.
That seems awesome. I'm glad that it's on your radar.
If you can point me to the best place to suggest ideas or spend my spare cycles, I would gladly do so. At the very least, I can comment on how we serialize positions in the (tree-based) Scala.js IR, which is size-optimized.
Off-topic: I'm sorry to go a little squishy, but I thought I should say that I appreciate the work both of you do very much... though sjrd's work is a little bit more "direct-impact-for-me" at the moment, I must admit. :p
Of course, as you just both said, your work is kind of complementary... which is always nice. :)
Anyway, thanks for the long-term thinking to the both of you.
A lot of the languages you mentioned have memory allocation characteristics different enough from JavaScript's, due to immutability and functional style, that they would probably benefit from having a garbage collector tuned to their purposes in webassembly. There's a reason we don't have one common garbage collector for all the managed languages.
I do recognize that this is a side point, but I think it's worth mentioning.
A lot of the mentioned languages also allow deep interoperability between their heap and the JavaScript heap, e.g., circular references between objects of the "two heaps", and free access to fields and methods of objects of the other language.
That's very hard (if not impossible) to achieve without leak and performance degradation if the two languages have their own GC, with their own heaps.
Compiling a language to JS is not about making it work. That's easy (it becomes hard to cite a language that does not do it). It's about designing the language to interoperate with JS, and making that work. That is the real challenge.
> Compiling a language to JS is not about making it work. That's easy (it becomes hard to cite a language that does not do it). It's about designing the language to interoperate with JS, and making that work. That is the real challenge.
It's very interesting that Scala and Scala.js have such a relatively painless interaction, but in general I'd say interoperation is "technically" simple by just employing an FFI?
Obviously, words like "seamless" and "effortless" start to enter the vocabulary here, but I'm not entirely sure these targets are worth it. Are they, do you think?
(I mean, obviously, Scala.js must have seamless 'interop' with Scala, but is 'seamless' interop with JS worth it, or should you require explicit FFI? I'm not sure, but I think you ultimately chose FFI-via-annotations, but there's a lot of fuzziness wrt. js.Dynamic.)
Don't you think that someone is eventually going to compile a JVM to wasm which would allow languages that compile to JVM bytecode to run directly as standard JVM bytecode in the browser? Wouldn't that allow to have as good performance as JS compilation? (I am asking you as it looks like you might have some expertise on languages that compile to both JVM and JS ;))
It might allow you to have as good performance as JS compilation, but definitely not as good interoperability with JS. Some of those languages, like ClojureScript, Scala.js and BuckleScript, have complete 2-way interop between them and JS, including for mutable objects, their properties and their methods.
"Just" compiling Scala to JVM-on-wasm does not give you the real power of Scala.js, which is its interoperability with JavaScript libraries. Similarly, just compiling Clojure to JVM-on-wasm does not give you the real power of ClojureScript.
People often forget about the interop with JS part--which is immensely more important than raw performance--if they don't actually work with a language that offers it. :-(
Thank you for clarifying. Plain text source code is just not a good way to ship code. Browsers have a warm, shared copy of a highly tuned VM. However, to make use of it, sites have to talk to it via plain text source code, which heavily penalizes startup.
A JS binary AST has the potential to transparently benefit all websites without any effort on their part if it's handled by a web server module, like GZIP, or by a build step. This can remove redundant work that's done for billions of page loads across all web sites.
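As a rough sketch of what that "transparent, like GZIP" idea could look like server-side (everything here is hypothetical: the `application/javascript-binast` media type, the Accept-header negotiation, and the pre-built `app.js.binast` file are just stand-ins, since the actual delivery mechanism hasn't been specified):

    // Hypothetical negotiation in Node + Express. A build step is assumed to
    // have produced dist/app.js.binast alongside dist/app.js.
    const express = require('express');
    const fs = require('fs');
    const path = require('path');
    const app = express();

    app.get('/app.js', (req, res) => {
      const accepts = req.headers['accept'] || '';
      const binast = path.join(__dirname, 'dist', 'app.js.binast');
      // If the client advertises support (hypothetical media type), ship the
      // pre-parsed form; otherwise fall back to plain JavaScript as today.
      if (accepts.includes('application/javascript-binast') && fs.existsSync(binast)) {
        res.type('application/javascript-binast');
        res.sendFile(binast);
      } else {
        res.type('application/javascript');
        res.sendFile(path.join(__dirname, 'dist', 'app.js'));
      }
    });

    app.listen(8080);

Older browsers would simply keep getting the plain .js file, so nothing breaks for them.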
> It's a fairly simple optimization - it's still JavaScript, just compressed and somewhat pre-parsed.
The author of Smalltalk/X proposed this for Smalltalk at one of the Camp Smalltalk meetups about 17 years ago. Doing this for Javascript would also facilitate the running of other languages which use Javascript as a compiler target, providing 1st class debugging support for other languages as a side effect.
No link. We were all stuffed into a La Quinta Inn in San Diego, running ethernet between the balconies. There was just a bunch of us crowded into a motel room, and he just started talking about it.
Yes, and for anything that already compiles to JS, adding this as an additional target shouldn't be too much work.
However, for anything that doesn't already compile to JS, this is essentially the same amount of work as targeting regular JavaScript. In that case, the work might be better spent making it compile to WASM. But, that's not a hard and fast rule, it really depends on the situation. And, of course, it could always target both.
Sure, why not? The difference is that WASM is aiming for near native performance of code, while the binary JS would still be limited to JS performance.
But that's what languages that compile to JavaScript support already today. The binary JS shouldn't prevent this behaviour if it is just a binary representation of a text version.
I don't know of anyone working on it, exactly because it is so difficult. So, while it might happen eventually, I don't think that anyone can provide an ETA at this stage.
So, compiled Javascript then? "We meet again, at last. The circle is now complete."
The more I see interpreted languages being compiled for speed purposes, and compiled languages being interpreted for ease-of-use purposes, desktop applications becoming subscription web applications (remember mainframe programs?), and then web applications becoming desktop applications (Electron), the more I realize that computing is closer to clothing fads than anything else. Can't wait to pick up some bellbottoms at my local Target.
You're not the only one to observe that computing tends to be fad-driven. I enjoy Alan Kay's take on it:
In the last 25 years or so, we actually got something like a pop culture, similar to what happened when television came on the scene and some of its inventors thought it would be a way of getting Shakespeare to the masses. But they forgot that you have to be more sophisticated and have more perspective to understand Shakespeare. What television was able to do was to capture people as they were.
So I think the lack of a real computer science today, and the lack of real software engineering today, is partly due to this pop culture.
...
I don’t spend time complaining about this stuff, because what happened in the last 20 years is quite normal, even though it was unfortunate. Once you have something that grows faster than education grows, you’re always going to get a pop culture.
...
But pop culture holds a disdain for history. Pop culture is all about identity and feeling like you're participating. It has nothing to do with cooperation, the past or the future — it's living in the present. I think the same is true of most people who write code for money. They have no idea where [their culture came from] — and the Internet was done so well that most people think of it as a natural resource like the Pacific Ocean, rather than something that was man-made. When was the last time a technology with a scale like that was so error-free? The Web, in comparison, is a joke. The Web was done by amateurs.
I wish I'd heard/seen some of the Alan Kay talks/articles earlier in my career. The more I work in IT, the more I see the wisdom in how he sees the industry. (And I don't just mean these pop culture comments.)
I feel fortunate to have seen his talks and interviews early in my career.
The downside is that exposure to both of those led me to write some software in both Smalltalk and Common Lisp...which is a downside because having worked with those makes some currently popular tools seem like dirty hacks.
Although I use and enjoy things like React, and Webpack, and npm, and I'm very happy with the things I'm able to create with them, when I stop and think about it, the tooling and building/debugging workflow feels pretty disjointed compared to what was available in Smalltalk and CL environments 30 years ago.
I understand why those tools didn't win over the hearts and minds of developers at the time. They were expensive and not that easily accessible to the average developer. I just wish that as an industry, we'd at least have taken the time to understand what they did well and incorporate more of those lessons into our modern tools.
You seem to assume that there is One True Way of doing things and we're just circling round trying to find it. In fact, as you point out, there are benefits and tradeoffs to every approach. People compile interpreted languages to make them run faster, but are using them in the first place because they're easier to use. People who have come to compiled languages for speed are now naturally looking to make them easier to develop with as well.
Compiled C is not going to go away any sooner than interpreted Javascript - but a widening of the options available is great as it allows us to focus on developing things quickly and correctly, rather than making decisions based on which particular annoyance you're happy to put up with.
People try to move CS both towards and away from engineering: We want the dependability engineering seems to provide in, say, Civil Engineering, but we don't want to acknowledge that engineering is about trade-offs, which are compromises, and that the nature of those trade-offs changes as the world surrounding a specific tool or program changes.
Maybe they think there is a One True Way. Maybe they think every building needs to be reinforced concrete and structural steel, now and forever, in every possible context.
This isn't driven by a "fad", rather by the simple fact that any trace of JS engine performance will show a sizable chunk at the beginning dedicated to just parsing scripts. Heck, we even have "lazy parsing", where we put off doing as much parsing work as possible until a piece of code is needed. Replacing that with a quick binary deserialization pass is a straightforward win.
I wouldn't call this "compiled JavaScript". In broad strokes what's happening is you take the result of parsing JavaScript source to an AST and serialize that; then next time you deserialize the AST and can skip the parsing stage.
(Source: I spent a summer working on JS engine and JIT performance at Mozilla.)
I don't think the point was that this project is a fad.
Instead, I think the point was that Javascript is a fad (we'll see whether this is true by watching how popular compile-to-WASM languages become compared to JS, once WASM becomes widespread and stable).
Alternatively, we might say that JS was created (in 10 days, yadda yadda) at a time when the fads were dynamic typing over static typing; interpreters over compilers; GC over manual or static memory management; OOP over, say, module systems; imperative over logic/relational/rewriting/etc.; and so on. JS's role as the Web language has tied developers' hands w.r.t. these tradeoffs for 20 years; long enough that a significant chunk of the developer population hasn't experienced anything else, and may not really be aware of some of these tradeoffs; some devs may have more varied experience, but have developed Stockholm Syndrome to combat their nostalgia ;)
As an example, one of the benefits of an interpreter is that it can run human-readable code; this is a sensible choice for JS since it fits nicely with the "View Source" nature of the Web, but it comes at a cost of code size and startup latency. The invention of JS obfuscation and minifiers shows us that many devs would prefer to pick a different balance between readability and code size than Eich made in the 90s. This project brings the same option w.r.t. startup latency. WASM opens up a whole lot more of these tradeoffs to developers.
Actually I think it was created at a time when it was used very sparingly and had very limited scope. The joke used to be that the average javascript program was one line. I don't know if that was ever exactly true, but a lot of early javascript lived in inline event attributes (think "onclick" and the like).
For that use a forgiving scripting language is a good fit. What changed is what javascript, and the web, is used for. Embedded Java and Flash showed that there was an appetite for the web to do more and the security problems involved with those technologies showed they weren't a good fit.
Javascript was adapted to fill that void by default as it was the only language that every browser could agree on using.
Parsing is unavoidable, regardless of whether the code is interpreted or compiled.
This proposal seems to further compress the AST somehow.
Basically the idea is to already preprocess the text source, so that the client can skip two steps on execution: lexical analysis and the parsing of the text source resulting in the AST. The post explains why the authors believe that this is a good idea.
Software development is fad driven, I think a lot or most people would agree with this. I myself find this to be lamentable. But over the years I've asked myself, why is this the case? I mean it has been faddish for years, this isn't new in the current era.
I think there are two main drivers for fad-ism in software. One is that we as developers are required to chase the popular languages and frameworks to stay current, marketable, and employable. So there is always a sense of "what do I have to learn next, so I can switch jobs successfully when I need to". Another driver is that developers seek to gain experience by re-doing things from the past. For example, creating a new language, or a compiler for an existing language, provides inexperienced developers with fantastic new experiences and knowledge. This is desirable.
So yeah, we got fads, and us old timers, or not so old timers with a historical perspective, watch our lawns, but everyone has to learn somehow. Ideally a lot of this would come from an apprenticeship/journeyman system, or a University degree of some sort. But for now it's all we have.
To me, software development seems as tribal as politics. People tend to like and agree with those in their own tribe, and tend to see criticism of their tribe's preferred ideology as an attack on their identity.
There's a lot of pressure to be part of a popular tribe, and so you see situations where even a mild criticism of, for example, a piece of the JavaScript tribe's technology (like, say, npm or Webpack) tends to elicit vitriolic responses and counterarguments.
You can sometimes get away with this kind of criticism if you choose your words carefully and make it very clear (or at least pretend to) that you're actually a member of the tribe you're criticizing, and that your criticism is meant to be constructive. So you say something like 'I use npm, and I like it, but here's where I struggle with it and here's how I'd improve it'. But if you just write about why you dislike it, you're likely to be flamed and criticized, even if everything you wrote is verifiably correct.
So I find that watching programmers argue in blogs and on HN and Reddit feels a lot like reading political arguments on Reddit, or Facebook. And maybe that's why programming tends to be faddish. Each tribe is mostly only aware of its own history; there's no real shared sense of culture, or much awareness of the rich shared history that got us to where we are today. And so you see lots of wheel reinvention, as people poorly re-solve problems that were already solved long ago by other programming tribes.
These are just my personal observations, though. I could very well be completely wrong. :)
Webassembly is pretty much a greenfield approach: you need to learn C, redo your whole codebase and still write JS code to load your wasm, interact with DOM and browser APIs, and leverage existing good enough JS libraries. This is the state of things as of 2017. It took 4 years to reach that point. If you want to bet everything on the state of wasm in 2-3 years from now, please be my guest. Meanwhile, I have a business to run.
The binary AST is a quick win for brownfield technology, with a working version already in an experimental Firefox build. It can be rolled out quite fast. It improves right now the experience of many users on existing codebases, the same way minification did. And it improves the experience of current WebAssembly, as it is still largely dependent on the speed of JS. I'll take this feature any day.
You are obviously very passionate and I believe that nothing I can write will convince you, so I will not pursue this conversation with you after this post.
For the same reason, my answers here are for people who have not followed the original HN thread in which we have already had most of this conversation.
I also believe that the best way for you to demonstrate that another approach is better than the JavaScript Binary AST would be for you to work on demonstrating your approach, rather than criticizing this ongoing work.
> - WebAssembly does not support JS hence we need binary AST
> WebAssembly is on track to support dynamic and GC-ed languages
If/when WebAssembly gains the ability to interact with non-trivial JS objects, JS libraries, etc. we will be able to compare the WebAssembly approach with the Binary AST approach. GC support and dynamic dispatch are prerequisites but are not nearly sufficient to allow interaction with non-trivial JS objects and libraries.
So let's meet again and rediscuss this if/when this happens.
> - WebAssembly will be slower than binary AST
> You've provided zero support for this statement. Meanwhile experiments with real apps (like Figma) show significant improvements with WebAssembly
If you recall, we are only talking about loading speed. I believe that there is no disagreement on execution speed: once WebAssembly implementations are sufficiently optimized, it is very likely that well-tuned WebAssembly code will almost always beat well-tuned JS code on execution.
The argument exposed on the blog is that:
- attempting to compile JavaScript to existing WebAssembly is very hard (hard enough that nobody does it to the best of my knowledge);
- the specifications of JavaScript are complex enough that every single object access, every single array access, every single operator, etc. is actually a very complex operation, which often translates to hundreds of low-level opcodes (see the short illustration after this list). Consequently, should such a compilation take place, I suspect that this would considerably increase the size of the file and the parsing duration;
- by contrast, we have actual (preliminary) numbers showing that compressing to the JavaScript Binary AST improves both file size and parse time.
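To make that concrete, here is a toy illustration (my own example, not from the blog post): the same `obj.x` at a single call site can be a plain data load, a prototype-chain lookup, a getter running arbitrary code, or a proxy trap, and a compiler that cannot prove which case applies has to emit code, or call into a runtime, for all of them.

    const plain = { x: 1 };                        // a plain data property

    const inherited = Object.create({ x: 2 });     // found via the prototype chain

    const computed = {
      get x() { return Date.now(); }               // a getter: arbitrary code runs
    };

    const trapped = new Proxy({}, {
      get(target, prop) { return prop === 'x' ? 42 : undefined; }  // a proxy trap
    });

    // All four are spelled identically at the access site:
    for (const obj of [plain, inherited, computed, trapped]) {
      console.log(obj.x);
    }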
While I may of course be wrong, the only way to be sure would be to actually develop such a compiler. I have no intention of doing so, as I am already working on what I feel is a more realistic solution, but if you wish to do so, or if you wish to point me to a project already doing so, I would be interested.
You refer to the experiments by Figma. While Figma has very encouraging numbers, I seem to recall that Figma measured the speedup of switching from asm.js to WebAssembly. In other words, they were measuring speedups for starting native code, rather than JS code. Also a valid experiment, but definitely not the same target.
> - No one wants to ship bytecode
> False
Are we talking about the same blog entry?
What I claimed is that if we ended up with one-bytecode-per-browser or worse, one-bytecode-per-browser-version, the web would be much worse than it is. I stand by that claim.
> - it's hard to align browsers on bytecode
> See WebAssembly
I wrote the following things:
- as far as I know, browser vendors are not working together on standardizing a bytecode for the JavaScript language;
- coming up with a bytecode for a language that keeps changing is really hard;
- keeping the language-as-bytecode and the language-as-interpreted in sync is really hard;
- because of the last two points, I do not think that anybody will start working on standardizing a bytecode for the JavaScript language.
In case of ambiguity, let me mention that I am part of these "browser vendors". I can, of course, be wrong, but once again, let's reconvene if/when this happens.
> - you can't change bytecode once it's shipped
> You can, eventually. Same argument can be applied to binary AST
I claimed that no browser vendor seriously contemplates exposing their internal bytecode format because this would make the maintenance of their VM a disaster.
Indeed, there is an order of magnitude of difference between the difficulty of maintaining several bytecode-level interpreters that need to interact together in the same VM (hard) and maintaining several parsers for different syntaxes of the same language (much easier). If you know of examples of the former, I'd be interested in hearing about them. The latter, on the other hand, is pretty common.
Additionally, the JavaScript Binary AST is designed in such a manner that evolutions of the JavaScript language should not break existing parsers. I will post more about this in a future entry, so please bear with me until I have time to write it down.
> - WebAssembly not supported by tools
> Neither is binary AST. Meanwhile one of the goals of WebAssembly is tool support, readability etc.
I'm pretty sure I never said that.
> So. What are you going to do with JS AST when WebAssembly gets support for JS?
If/when that day arrives, I'll compare both approaches.
I may be mistaken, but I believe Adobe had to ship an Actionscript 2 runtime forever because they completely replaced it when they introduced AS3. They needed to keep the flash player backwards compatible, so the player had to have both runtimes. While it worked, it was a kludge.
> in which we have already had most of this conversation.
Interestingly enough, nowhere do you even mention these conversations in your attempts to push binary AST as hard as possible.
> Consequently, should such a compilation take place, I *suspect* that this would considerably increase the size of the file and the parsing duration;
Emphasis above is mine. However, it's presented (or was presented) by you as a fact.
> What I claimed is that if we ended up with one-bytecode-per-browser or worse, one-bytecode-per-browser-version, the web would be much worse than it is. I stand by that claim.
We are ending up with one wasm bytecode for every browser, aren't we?
> as far as I know, browser vendors are not working together on standardizing a bytecode for the JavaScript language;
Because they are working together on standardizing bytecode for the web in general, aren't they?
> coming up with a bytecode for a language that keeps changing is really hard
So you're trying to come up with a binary AST for a language that keeps changing :-\
> keeping the language-as-bytecode and the language-as-interpreted in sync is really hard
What's the point of WebAssembly then?
> I do not think that anybody will start working on standardizing a bytecode for the JavaScript language.
Because that's not really required, is it? This is a non-goal, and never was the goal. The goal is to create an interoperable standardized bytecode (akin to JVM's bytecode or .Net's MSIL), not a "standardized bytecode for Javascript". For some reason you don't even want to mention this.
> I claimed that no browser vendor seriously contemplates exposing their internal bytecode format
They don't need to.
> Additionally, the JavaScript Binary AST is designed in such a manner that evolutions of the JavaScript language should not break existing parsers.
I shudder to think how rapidly changing languages like Scala, ClojureScript etc. can ever survive on the JVM. The horror!
> If/when that day arrives, I'll compare both approaches.
Basically you will end up with two representations of JS: a binary AST and a version that compiles to WASM. Oh joy. Wasn't this something you wanted to avoid?
Regardless of all your points, the fact is that WASM isn't ready for this now, and doesn't appear that it will be for some time.
Combined with the fact, as mentioned above, that DOM access is significantly slower means that WASM isn't a suitable candidate. This is something you forgot to mention or take into account in your comment, for some reason.
Yes, this does seem to overlap a bit with wasm, as you have noted, but saying "well, we could get this optimization, but we need to wait several years until this highly complex other spec is completely finished" doesn't seem as good.
Why not do this, and use wasm when it's available? Why can't you have both?
> Regardless of all your points, the fact is that WASM isn't ready for this now
As opposed to Binary AST, which is available now? :)
> Combined with the fact, as mentioned above, that DOM access is significantly slower
Given that DOM access for WASM currently happens via weird interop through JavaScript (if I'm not mistaken) how is this a fact?
> "well we could get this optimization but we need to wait several years until this highy complex other spec is completely finished" doesn't seem as good.
No. My point is: "Let's for once make a thing that is properly implemented, and not do a right here right now short-sighted short-range solution"
That's basically how TC39 committee operates these days: if something is too difficult to spec/design properly, all work is stopped and half-baked solutions are thrown in "because we need something right now".
> Why not do this, and use wasm when it's available? Why can't you have both?
Because this means:
- spreading the resources too thin. There is a limited amount of people who can do this work
- doing much of the work twice
- JS VM implementors will have to support text-based parsing (for older browsers), new binary AST and WASM
... because DOM access is significantly (10x) slower? That alone rules out your approach right now, regardless of the other points raised.
> Because this means: ...
When you understand how weak these arguments are you will understand the negative reaction to your comments on this issue.
You're clearly very passionate about this issue but you don't seem to be assessing the trade offs. Having something right now and iterating on it is better than waiting an indeterminate amount of time for a possible future solution involving an overlapping but otherwise unrelated specification that has not been fully implemented by anyone to a satisfactory point, and one with very complex technical issues blocking its suitability.
Sure, it would be nice to use WASM for this, but it is in no way needed at all. Given the status of WASM and the technical issues present with using it in this way it is odd to champion it to such a degree.
It seems your entire arguments boil down to "WASM may be OK to use at some point in the future, stop working on this and wait!". I, and I'm assuming others, don't see this as a very convincing point.
If I may, I'd offer some advice: stop taking this issue to heart. Time will tell if you're right, and making borderline insulting comments to the technical lead of the project in an attempt to push your position doesn't help anyone.
The world is heating up and species are dying, there are much better causes to take to heart.
How can you say it's "compressed" over "compiled" when you are actually parsing it into an AST and then (iiuc) converting that to binary? That's exactly what compilers do. You are in fact going to a new source format (whatever syntax/semantics your binary AST is encoded with) so you really are compiling.
To be fair, these two concepts are similar and I may be totally misunderstanding what this project is about. In the spirit of fairness, let me test my understanding. You are saying wasm bytecode is one step too early and a true "machine code" format would be better able to improve performance (especially startup time). I'm not following wasm development, but from comments here I am gathering that wasm is too low-level and you want something that works on V8. Is that what this project is about?
On a side note, it's truly a testament to human nature that the minute we get close to standardizing on something (wasm), someone's gotta step up with another approach.
> How can you say it's "compressed" over "compiled" when you are actually parsing it into an AST and then (iiuc) converting that to binary? That's exactly what compilers do. You are in fact going to a new source format (whatever syntax/semantics your binary AST is encoded with) so you really are compiling.
I am not sure but there may be a misunderstanding on the word "binary". While the word "binary" is often used to mean "native", this is not the case here. Here, "binary" simply means "not text", just as for instance images or zipped files are binary.
A compiler typically goes from a high-level language to a lower-level language, losing data. I prefer calling this a compression mechanism, insofar as you can decompress without loss (well, minus layout and possibly comments). Think of it as the PNG of JS: yes, you need to read the source code/image before you can compress it, but the output is still the same source code/image, just in a different format.
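As a rough analogy using off-the-shelf tools (esprima/escodegen merely stand in for a real BinAST encoder/decoder, and JSON is of course not the proposed binary format; assumes `npm install esprima escodegen`):

    const esprima = require('esprima');
    const escodegen = require('escodegen');

    const original = 'function add(a, b) { /* a comment */ return a + b; }';

    // "Compress": parse to an AST and serialize it (the real proposal uses a
    // compact binary encoding rather than JSON).
    const encoded = JSON.stringify(esprima.parseScript(original));

    // "Decompress": deserialize and regenerate source. Layout and the comment
    // are gone, but it is the same program.
    console.log(escodegen.generate(JSON.parse(encoded)));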
> You are saying wasm bytecode is one step too early and a true "machine code" format would be better able to improve performance (especially startup time). I'm not following wasm development, but from comments here I am gathering that wasm is too low-level and you want something that works on V8. Is that what this project is about?
No native code involved in this proposal. Wasm is about native code. JS BinAST is about compressing your everyday JS code. As someone pointed out in a comment, this could happen transparently, as a module of your HTTP server.
> On a side note, it's truly a testament to human nature that the minute we get close to standardizing on something (wasm), someone's gotta step up with another approach.
Well, we're trying to solve a different problem :)
Posting this in the hope that it might help some people grok what they are actually doing:
When I first discovered what Yoric and syg were doing, the first thing that I thought of was old-school Visual Basic. IIRC when you saved your source code from the VB IDE, the saved file was not text: it was a binary AST.
When you reopened the file in the VB6 IDE, the code was restored to text exactly the way that you had originally written it.
The parent might have been thinking of QuickBasic, which saved programs as byte-code, along with formatting information to turn it back into text: http://www.qb64.net/wiki/index.php/Tokenized_Code
BBC Basic did this too -- keywords were stored as single bytes (using the 128-255 values unused by ASCII). Apart from that the program code wasn't compiled, it just was interpreted directly. Very smart design when everything had to fit in 32KB of RAM.
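A toy sketch of that kind of keyword tokenization (the byte values below are invented for illustration, not the actual BBC Basic tokens):

    // Keywords become single bytes in the 128-255 range, everything else
    // stays as plain character codes -- and the source can be regenerated.
    const TOKENS = { PRINT: 0x80, FOR: 0x81, NEXT: 0x82, IF: 0x83, THEN: 0x84 };
    const NAMES = Object.fromEntries(Object.entries(TOKENS).map(([k, v]) => [v, k]));

    function tokenize(line) {
      return line.split(/(\s+)/).flatMap(word =>
        word in TOKENS ? [TOKENS[word]] : [...word].map(c => c.charCodeAt(0)));
    }

    function detokenize(bytes) {
      return bytes.map(b => NAMES[b] || String.fromCharCode(b)).join('');
    }

    const stored = tokenize('PRINT 42');   // [0x80, 32, 52, 50] -- 4 bytes instead of 8
    console.log(detokenize(stored));       // "PRINT 42" -- the readable source comes back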
> Here, "binary" simply means "not text", just as for instance images or zipped files are binary.
If it's not text, then what is it? I'm not sure "not text" is a good definition of the word "binary".
> A compiler typically goes from a high-level language to a lower-level language, losing data.
I don't agree, I don't think there is any loss in data, the compiled-to representation should cover everything you wanted to do (I suppose not counting tree-shaking or comment removal).
> I prefer calling this a compression mechanism, insofar as you can decompress without loss (well, minus layout and possibly comments).
Ahh, so you mean without losing the original textual representation of the source file.
> Wasm is about native code
Here you are making claims about their project that are just not the whole picture. Here's the one-line vision from their homepage[1]:
> WebAssembly or wasm is a new portable, size- and load-time-efficient format suitable for compilation to the web.
With that description in mind, how do you see BinAST as different?
> Well, we're trying to solve a different problem :)
I think you might be misunderstanding what wasm is intended for. Here's a blurb from the wasm docs that is pertinent:
> A JavaScript API is provided which allows JavaScript to compile WebAssembly modules, perform limited reflection on compiled modules, store and retrieve compiled modules from offline storage, instantiate compiled modules with JavaScript imports, call the exported functions of instantiated modules, alias the exported memory of instantiated modules, etc.
The main difference I can gather is that you are intending BinAST to allow better reflection on compiled modules than wasm intends to support.
Here's another excerpt from their docs (and others have mentioned this elsewhere):
> Once GC is supported, WebAssembly code would be able to reference and access JavaScript, DOM, and general WebIDL-defined objects.
Text means a sequence of characters conforming to some character encoding. Yoric's binary AST is not a sequence of characters conforming to a character encoding.
Compilation maps a program in a higher level language to a program in a lower level language. The map is not required to be one-to-one: the colloquial term for this is "lossy."
> If you prefer, this Binary AST representation is a form of source compression, designed specifically for JavaScript, and optimized to improve parsing speed.
Fads, or steadfast pursuit of balancing tradeoffs and finding clever ways to improve tools?
Compilation and interpretation both have distinct advantages. It's hard to do both (read: it took a while) and each has tradeoffs.
I'm impressed and grateful that I can get compile time errors for interpreted languages, and I'd feel the same about being able to use a compiled lang in a repl.
Regarding the actual topic at hand, this isn't compiled JavaScript, just parsed JavaScript. They're shipping an AST, not bytecode/machine code.
Haha, I did enjoy this comment. It's kinda fair, but also kinda not.
At each step developers are trying to create the best thing they can with the tools at their disposal. There will always be a tradeoff between flexibility and ease of use. There's a reason Electron exists — the browser has limitations that are easier to work around in Electron. There's a reason we moved to putting stuff online, we could write it once and distribute it everywhere.
> At each step developers are trying to create the best thing they can with the tools at their disposal.
Well, yeah. The point is that we don't have a consistent definition of "best" that's constant over time. Only by changing your direction all the time can you end up walking in circles.
I think everything will eventually settle in the middle where Java and C# currently live.
They have an "intermediate code" that they somewhat compile down to. But this is really a more compact version of the source than a binary though. You can see this easily by running a decompiler. Often times the decompiled IL is almost identical to the source. And before the JIT kicks in the IL is typically interpreted.
You get the best of both worlds. Easy to compile, easy to decompile, platform agnostic, compact code, and fast if it needs to be
The Design of Everyday Things is instructive here. You don't design for the middle or the average. You design for the extreme edges. If you have a programming language that can be used both by absolute beginners who just want to learn enough programming to script their Excel spreadsheet AND HFT traders who need to wring out every cycle of performance they can on every core they have at their disposal, then the middle will take care of itself.
It's not optimal for things like embedded, device drivers, high performance computing, etc. So I doubt everything will settle there. There will always be situations where C/Rust/Fortran are better choices.
Part of this is fad driven, but part of this is also driven by other human concerns. For example, had the early web been binary (like we're pushing for now) instead of plain text it would have died in its crib. Executing binary code sent blindly from a remote server without a sandbox is a security nightmare. Now that we have robust sandboxes, remote binary execution becomes a viable option again. But, it took a decade's worth of JVM R&D to get to this point.
There is a serious problem when every new generation of a technology fixates on some problem of its choice and ignores all the others. The issue of binary blobs didn't go away. What changed is that a lot of developers today don't care about the open nature of the web and are perfectly fine with sacrificing it for faster load times of their JS-saturated websites.
I think a much better approach to this problem would be a compression format that's designed specifically for JavaScript. It could still be negotiable, like GZIP, so anyone wanting to see the source would be able to see it. It could also be designed in a way that allows direct parsing without (full?) decompression.
> What changed is that a lot of developers today don't care about the open nature of the web and are perfectly fine with sacrificing it for faster load times of their JS-saturated websites.
I'm one of those developers who couldn't care less about the 'open nature of the web'. I understood the necessity of it to the early, emerging internet, but times have changed. Now that we have secure sandboxes and the web is choking on its own bloat, it's time to shift back to binary blobs.
How does adding more code to the browser stack prevent the web from "choking on its own bloat"? You need to take things away to reduce bloat, not add them.
And they all had crippling security problems[1]. I only mentioned the JVM, but you're right, all of those technologies contributed to the work of secure sandboxes.
[1]: Except maybe Silverlight. From what I remember, it didn't see enough success to be a tempting target for hackers.
That is what, in my ideal world, ChromeOS should have been, but with Dart instead; unfortunately the ChromeOS team had other plans in mind and made it into a Chrome juggler OS.
I was using Smalltalk on some university projects, before Sun decided to rename Oak and announce Java to the world.
The development experience was quite good.
Similarly with Native Oberon, which captured many of the Mesa/Cedar workflows.
> That is what, in my ideal world, ChromeOS should have been, but with Dart instead,
That would have been very interesting. I liked Dart from the brief time I looked at it. The web still feels like a somewhat crippled platform to develop for.
From an alternate "not the web" viewpoint, I am interested in this because we have a desktop application that bootstraps a lot of JS for each view inside the application. There is a non-insignificant chunk of this time spent in parsing and the existing methods that engines expose (V8 in this case) for snapshotting / caching are not ideal. Given the initial reported gains, this could significantly ratchet down the parsing portion of perceived load time and provide a nice boost for such desktop apps. When presented at TC39, many wanted to see a bit more robust / scientific benchmarks to show that the gains were really there.
Yoric is working on a "proper" implementation and I'll be assisting in the design work, and if needs be some implementation. I think the next milestone here is a working demo in Firefox with Facebook page load.
At a personal level, I feel pretty confident that BinaryJS will help your case. The theory behind the gains was pretty solid before we designed the prototype. The prototype, for me, basically proved the theory.
My personal hope is that by the time we're done squeezing all we can out of this - which includes zero-costing lazy function parsing, and on-line bytecode generation, we can cut the "prepare for execution" time by 80%.
Do you have any details on what could be improved wrt V8's snapshotting? We have recently extended the feature set a lot in order to be able to snapshot full Blink contexts, and there are some efforts on-going to implement that in Node.js as well. Soon crutches like electron-link won't be necessary anymore.
Here's some perspective for where this project is coming from:
> So, a joint team from Mozilla and Facebook decided to get started working on a novel mechanism that we believe can dramatically improve the speed at which an application can start executing its JavaScript: the Binary AST.
I really like the organization of the present article, the author really answered all the questions I had, in an orderly manner. I'll use this format as a template for my own writing. Thanks!
Personally, I don't see the appeal of such a thing, and it seems unlikely all browsers would implement it. It will be interesting to see how it works out.
> Personally, I don't see the appeal of such a thing, and it seems unlikely all browsers would implement it.
Facebook can just run a feature detection JS snippet to see if your browser supports the Binary AST, and select the URLs of the other JS assets accordingly. If Binary AST is a thing, I can easily see web frameworks implementing such a format selection transparently.
And hey, if it contributes to people perceiving Firefox as faster than Chrome, I'm all for it. :)
It's going through TC39 (the committee that decides how the JavaScript language should evolve) and recently reached stage 1 (proposals advance from stage 0 to stage 4). I believe that if it reaches stage 4, then that means that all browser vendors have agreed to implement it, so it certainly seems to be their goal.
Oh, I think I misunderstood you. You mean the AST is going to go the way of Native Client because Mozilla doesn't have the muscle it used to? Or do you think Google's going to sandbag it as revenge for Native Client? :)
This is reminiscent of the technique used by some versions of ETH Oberon to generate native code on module loading from a compressed encoding of the parse tree. Michael Franz described the technique as "Semantic-Dictionary Encoding":
«SDE is a dense representation. It encodes syntactically correct source program by a succession of indices into a semantic dictionary, which in turn contains the information necessary for generating native code. The dictionary itself is not part of the SDE representation, but is constructed dynamically during the translation of a source program to SDE form, and reconstructed before (or during) the decoding process. This method bears some resemblance to commonly used data compression schemes.»
This same technique also was used by JUICE, a short-lived browser plugin for running software written in Oberon in a browser. It was presented as an alternative to Java byte code that was both more compact and easier to generate reasonable native code for.
I seem to recall that the particular implementation was quite tied to the intermediate representation of the OP2 family of Oberon compilers, making backward compatibility in the face of changes to the compiler challenging. I also recall a conversation with someone hacking on Oberon who indicated that he'd chosen to address (trans)portable code by the simple expedient of just compressing the source and shipping that across the wire, as the Oberon compiler was very fast even when just compiling from source.
I'm guessing the hard parts are:
(0) Support in enough browsers to make it worth using this format.
(1) Coming up with a binary format that's actually significantly faster to parse than plain text. (SDE managed this.)
(2) Designing the format to not be brittle in the face of change.
Indeed, one of the challenges was designing a format that will nicely support changes to the language. I believe that we have mostly succeeded there, and I'll blog about it once I find a little time. Now, of course, the next challenge is making sure that the file is still small enough even though we are using this future-proof format. We haven't measured this yet, but I expect that this will need additional work.
This is a really interesting project from a browser technology point of view. It makes me wonder how much code you'd need to be deploying for this to be useful in a production environment. Admittedly I don't make particularly big applications, but I've yet to see parsing the JS code as a problem, even when there's 20MB of libraries included.
This is what BASIC interpreters on 8-bit systems did from the very beginning. Some BASIC interpreters did not even allow you to type the keywords. Storing a trivially serialized binary form of the source code is a painfully obvious way to reduce RAM usage and improve execution speed. You can also trivially produce the human-readable source back.
It's of course not compilation (though parsing is the first thing a compiler would do, too). It's not generation of machine code, or VM bytecode. It's mere compression.
This is great news because you get to see the source if you want, likely nicely formatted. You can also get rid of minifiers, and thus likely see reasonable variable names in the debugger.
This is some amazing progress, but reading this and hearing how difficult JavaScript is as a language to design around makes me wonder how many hours we've spent optimizing a language designed in 2 weeks, and living with those consequences. I wish we could version our JavaScript within a tag somehow so we could slowly deprecate code. I guess that would mean browsers would have to support two languages, which would suck... this really is, unfortunately, the path of least resistance.
(I understand I could use Elm, cjs, Emscripten, or any other transpiler, but I was thinking of hours spent improving the JS VM.)
3. Deployed Scripting Media Types and Compatibility
Various unregistered media types have been used in an ad-hoc fashion
to label and exchange programs written in ECMAScript and JavaScript.
These include:
+---------------------------+---------------------------+
| text/javascript           | text/ecmascript           |
| text/javascript1.0        | text/javascript1.1        |
| text/javascript1.2        | text/javascript1.3        |
| text/javascript1.4        | text/javascript1.5        |
| text/jscript              | text/livescript           |
| text/x-javascript         | text/x-ecmascript         |
| application/x-javascript  | application/x-ecmascript  |
| application/javascript    | application/ecmascript    |
+---------------------------+---------------------------+
And along these lines: javascript modules automatically turn strict mode on, so most people will just be on the "new" (strict) version of javascript once modules are popular.
Brendan Eich was on a podcast in 2016 talking about the origins and evolution of JS. At one point he and the ECMA team wanted to make == strict equality, but then that would have required specifying the JS version, and Microsoft didn't like that, so they decided to go with === and leave == non-strict.
It's one of several cases where backward compatibility on the web trumped cleaning up the language.
This article says "Wouldn’t it be nice if we could just make the parser faster? Unfortunately, while JS parsers have improved considerably, we are long past the point of diminishing returns."
I'm gobsmacked that parsing is such a major part of the JS startup time, compared to compiling and optimizing the code. Parsing isn't slow! Or at least it shouldn't be. How many MBs of Javascript is Facebook shipping?
Does anyone have a link to some measurements? Time spent parsing versus compilation?
I think the main issue with parsing is that you probably need to parse all JavaScript before you can start executing any of it. That might lead to a high delay before you can start running scripts.
Compiling and optimizing code can be slow, too, but JIT compilers don't optimize all code that's on a page. At least at first, the code gets interpreted, and only hot code paths are JIT compiled, probably in a background thread. That means that compiling/optimizing doesn't really add to the page load latency.
But I agree with you that this is a strange suggestion. If parsing is so slow, maybe browsers should be caching the parsed representation of javascript sources to speed up page loading, or even better: the bytecode/JIT-generated code.
> If parsing is so slow, maybe browsers should be caching the parsed representation of javascript sources to speed up page loading, or even better: the bytecode/JIT-generated code.
Lua has had something very similar (bytecode vs. AST) via luac for a long while now. We've used it to speed up parse times in the past and it helps a ton in that area.
There's a problem: you can break out of Lua with malformed bytecode [1] and the Lua team doesn't want to spend the time trying to validate Lua bytecode [2]. That's why the latest Lua version has an option to ignore precompiled Lua scripts.
Sure, if you're loading scripts from an untrusted source then don't use bytecode. They're pretty clear about that in the docs. However probably about 90% of Lua's use cases are embedded and so in that case it works just fine.
I'm very skeptical about the benefits of a binary JavaScript AST. The claim is that a binary AST would save on JS parsing costs. However, JS parse time is not just tokenization. For many large apps, the bottleneck in parsing is instead in actually validating that the JS code is well-formed and does not contain early errors. The binary AST format proposes to skip this step [0], which is equivalent to wrapping function bodies with eval… This would be a major semantic change to the language that should be decoupled from anything related to a binary format. So IMO the proposal conflates tokenization with changing early error semantics. I'm skeptical the former has any benefits, and the latter should be considered on its own terms.
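To make "early errors" concrete, here are a few illustrative examples (mine, not from the proposal) of things a conforming parser must reject before running any code, which is part of why it has to look at every byte up front:

    const earlyErrors = [
      "function dup(a, a) { 'use strict'; }",  // duplicate parameter in strict code
      "let x = 1; let x = 2;",                 // redeclaration of a lexical binding
      "for (;;) { break missingLabel; }",      // break to an undeclared label
      "'use strict'; with ({}) {}",            // 'with' in strict code
    ];

    for (const src of earlyErrors) {
      try {
        new Function(src);                     // forces a full parse, runs nothing
      } catch (e) {
        console.log(e.constructor.name + ':', src);   // SyntaxError: ...
      }
    }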
Also, there’s immense value in text formats over binary formats in general, especially for open, extendable web standards. Text formats are more easily extendable as the language evolves because they typically have some amount of redundancy built in. The W3C outlines the value here (https://www.w3.org/People/Bos/DesignGuide/implementability.h...). JS text format in general also means engines/interpreters/browsers are simpler to implement and therefore that JS code has better longevity.
Finally, although WebAssembly is a different beast and a different language, it provides an escape hatch for large apps (e.g. Facebook) to go to extreme lengths in the name of speed.
We don’t need complicate JavaScript with such a powerful mechanism already tuned to perfectly complement it.
Early benchmarks seem to support the claim that we can save a lot on JS parsing costs.
We are currently working on a more advanced prototype on which we will be able to accurately measure the performance impact, so we should have more hard data soon.
It seems like one big benefit of the binary format will be the ability to skip sections until they're needed, so the compilation can be done lazily.
But isn't it possible to get most of that benefit from the text format already? Is it really very expensive to scan through 10-20MB of text looking for block delimiters? You have to check for string escapes and the like, but it still doesn't seem very complicated.
Well, for one thing, a binary format’s inherent “obfuscatedness” actually works in its favor here. If Binary AST is adopted, I’d expect that in practice, essentially all files in that format will be generated by a tool specifically designed to work with Binary AST, that will never output an invalid file unless there’s a bug in the tool. From there, the file may still be vulnerable to random corruption at various points in the transit process, but a simple checksum in the header should catch almost all corruption. Thus, most developers should never have to worry about encountering lazy errors.
By contrast, JS source files are frequently manipulated by hand, or with generic text processing tools that don’t understand JS syntax. In most respects, the ability to do that is a benefit of text formats - but it means that syntax errors can show up in browsers in practice, so the unpredictability and mysteriousness of lazy errors might be a bigger issue.
I suppose there could just be a little declaration at the beginning of the source file that means “I was made by a compiler/minifier, I promise I don’t have any syntax errors”…
In any case, parsing binary will still be faster, even if you add laziness to text parsing.
> a simple checksum in the header should catch almost all corruption
For JavaScript, you have to assume the script may be malicious, so it always has to be fully checked anyway.
It's true that the binary format could be more compact and a bit faster to parse. I just feel that the size difference isn't going to be that big of a deal after gzipping, and the parse time shouldn't be such a big deal. (Although JS engine creators say parse time is a problem, so it must be harder than I realise!)
> For JavaScript, you have to assume the script may be malicious, so it always has to be fully checked anyway.
The point I was trying to make isn't that a binary format wouldn't have to be validated, but that the unpredictability of lazy validation wouldn't harm developer UX. It's not a problem if malicious people get bad UX :)
Anyway, I think you're underestimating the complexity of identifying block delimiters while tolerating comments, string literals, regex literals, etc. I'm not sure it's all that much easier than doing a full parse, especially given the need to differentiate between regex literals and division...
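A few illustrative lines (my own) where naive delimiter counting goes wrong without a real parser:

    const s = "a stray } inside a string literal";
    const r = /[}{]/;                        // braces inside a regex literal
    const width = 10;
    const half = width / 2;                  // this '/' is division...
    const hit = "}}".match(/[}]+/);          // ...while this '/' starts a regex
    // ...and an unbalanced { can hide in a comment
    console.log(s.length, r.source, half, hit[0]);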
I was figuring you could just parse string escapes and match brackets to identify all the block scopes very cheaply.
Regex literals seem like the main tricky bit. You're right, you definitely need a real expression parser to distinguish between "a / b" and "/regex/". That still doesn't seem very expensive though (as long as you're not actually building an AST structure, just scanning through the tokens).
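To make that concrete, here's a contrived example (mine, not from any real codebase) of why a bracket-matcher alone can't classify a slash:

    // The same "/" is division on one line and the start of a regex literal on
    // the other; only the preceding token tells you which.
    var a = 8, b = 2, g = 1;

    var x = a / b / g;                // two divisions: (8 / 2) / 1 === 4
    var y = "abc".replace(/b/g, "");  // /b/g is a regex literal: y === "ac"

    // A scanner that only tracks brackets and string escapes cannot tell these
    // apart without knowing whether the previous token ends an expression.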
Automatic semicolon insertion also looks fiddly, but I don't think it affects bracket nesting at all (unlike regexes where you could have an orphaned bracket inside the string).
Overall, digging into this, it definitely strikes me that JS's syntax is just as awkward and fiddly as its semantics. Not really surprising I guess!
Early error behavior is proposed to be deferred (i.e. made lazy), not skipped. Additionally, it is one of many things that require frontends to look at every character of the source.
I contend that the text format for JS is in no way easy to implement or extend, though I can only offer my personal experience as an engine hacker.
Indeed it's a semantic change. Are you saying you'd like that change to be proposed separately? That can't be done for the text format for the obvious compat reasons. It also has very little value on its own, as it is only one of many things that prevents actually skipping inner functions during parsing.
Our goal is not to complicate Javascript, but to improve parse times. Fundamentally that boils down to one issue: engines spend too much time chewing on every byte they load. The proposal then is to design a syntax that allows two things:
1. Allow the parser to skip looking at parts of code entirely.
2. Speed up parsing of the bits that DO need to be parsed and executed.
We want to turn "syntax parsing" into a no-op, and make "full parsing" faster than syntax parsing currently is - and our prototype has basically accomplished both on limited examples.
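As a rough illustration of what "skipping" means here (my own sketch, not the actual proposed encoding): once a function body carries a byte-length prefix, a lazy parser can jump over it with pointer arithmetic instead of scanning characters.

    // Illustration only, not the real BinAST layout. Assume each deferred
    // function body is stored as [uint32 byteLength][payload...] in a DataView.
    function skipOrParseBody(view, offset, needed) {
      var byteLength = view.getUint32(offset, true);
      if (!needed) {
        // "Syntax parsing" becomes a no-op: jump past the body without
        // touching a single byte of it.
        return offset + 4 + byteLength;
      }
      decodeBody(view, offset + 4, byteLength); // full parse, only when needed
      return offset + 4 + byteLength;
    }

    function decodeBody(view, start, length) {
      // Placeholder for actually decoding the pre-parsed body.
    }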
> JS text format in general also means engines/interpreters/browsers are simpler to implement and therefore that JS code has better longevity.
As an implementor, I have to strongly disagree with this claim. The JS grammar is quite complex compared to an encoded pre-order tree traversal. It's littered with tons of productions and ambiguities. It's also impossible to do one-pass code generation with the current syntax.
An encoding of a pre-order tree traversal doesn't even need general context-free parsing (it can be recognized by a deterministic PDA). It literally falls into a simpler class of parsing problems.
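A toy example of what that means in practice (my own illustration, not the proposed format): decoding a pre-order-serialized tree needs nothing but each node's kind and child count - no grammar, no lookahead, no ambiguity.

    // Toy pre-order encoding of `blah + 45`: each node is [kind, childCount].
    var encoded = [
      ["BinaryExpr", 2], ["Identifier", 0], ["Literal", 0]
    ];

    function decode(stream, pos) {
      pos = pos || { i: 0 };
      var entry = stream[pos.i++];
      var children = [];
      for (var c = 0; c < entry[1]; c++) {
        children.push(decode(stream, pos));
      }
      return { kind: entry[0], children: children };
    }

    console.log(JSON.stringify(decode(encoded)));
    // {"kind":"BinaryExpr","children":[{"kind":"Identifier","children":[]},
    //  {"kind":"Literal","children":[]}]}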
> The binary AST format proposes to skip this step [0] which is equivalent to wrapping function bodies with eval…
This really overstates the issue. One can equally rephrase that statement as: if you are shipping JS files without syntax errors, then the behaviour is exactly identical.
That serves to bring into focus the real user impact of this: developers who are shipping syntactically incorrect JavaScript to their users will see their pages fail slightly differently than they fail currently.
Furthermore, the toolchain will simply prevent JS with syntax errors from being converted to BinaryJS, because the syntactic conversion is only specified for correct syntax - not incorrect syntax.
The only way you get a "syntax" error in BinaryJS is if your file gets corrupted after generation by the toolchain. But that failure scenario exists just the same for plaintext JS: a post-build corruption can silently change a variable name and raise a runtime exception.
So when you trace the failure paths, you realize that there's really no new failure surface area being introduced. BinaryJS can get corrupted in exactly the same way, with the same outcomes, as plaintext JS can get corrupted right now.
Nothing to worry about.
> We don’t need to complicate JavaScript when such a powerful mechanism, already tuned to perfectly complement it, exists.
We need to keep speeding up Javascript, and parsing is one of the longest-standing problems; it's time to fix it so we can be fast at it.
Wasm is not going to make regular JS go away. Codebases in JS are also going to grow. As they grow, the parsing and load-time problem will become more severe. The onus is on us to address it for our users.
I am puzzled by how a binary AST makes the code significantly smaller than a minified+gzipped version.
A JavaScript expression such as:
var mystuff = blah + 45
Gets minified as
var a=b+45
And then what is costly in there is the "var " and character overhead which you'd hope would be much reduced by compression.
The AST would replace the keywords by binary tokens, but then would still contain function names and so on.
I mean I appreciate the effort that shipping an AST will cut an awful lot of parsing, but I don't understand why it would make such a difference in size.
Afaik, improving parse time is the big goal here. Parsing is a significant stage of the compile pipeline, perf-wise. Some engines (Chakra at least) defer part of the parsing of a function's body until the function itself is called, just to save precious milliseconds and not block the first render.
It's actually hard to get smaller than compressed minified JS. Source code is an excellent high-level, compact representation of a program.
The space optimizations in BinaryJS are to get us back down to the point where we are as good or better than minified compressed JS.
The main goal is to allow for much faster parse times, but to do that without compromising other things like compressed source size.
The linked article somehow avoids ever stating the meaning of the acronym, and I had to Google it myself, so I imagine some other people might not know: AST stands for "abstract syntax tree".
It seems this would break a number of libraries (I can't name one, but I'm certain I've seen it), although per the article it would be a simple affair for the browser to reify the encoded AST into a compatible representation, especially if comments are also eventually preserved (per the article, it's on the roadmap).
I remember reading about some framework which was doing something nutty like embedding actual data in the comments on a function, and then parsing out those comments at run-time. For what it's worth, I believe it was on HN and in relation to the caveats for older V8 optimization w.r.t. function length (https://top.fse.guru/nodejs-a-quick-optimization-advice-7353...), but it was years ago so, as you say, hopefully they moved to something less insane in the intervening time.
Aside since you're here. The "Would it be possible to write a tool to convert serialized AST back to JS?" portion of the FAQ (https://github.com/syg/ecmascript-binary-ast#faq) says that it would be possible to generate source which would be "semantically equivalent" -- you might want to call out the Function.prototype.toString exception explicitly there, though admittedly that level of pedantry might be more obscuring than enlightening.
However this technology pans out, thanks for a really well-written post. It is a model of clarity.
(And yet many people seem to have misunderstood: perhaps an example or a caricature of the binary representation might have helped make it concrete, though then there is the danger that people will start commenting about the quality of the example.)
To be honest, I (as the author of Sciter [1]) do not expect too much gain from this.
Sciter contains a source-to-bytecode compiler. Those bytecodes can be stored to files and loaded later, bypassing the compilation phase. There is not much gain, as a JS-like grammar is pretty simple.
In principle, the original ECMA-262 grammar was so simple that you could parse it without needing an AST - a direct parser with one-symbol lookahead that produces bytecodes is quite adequate.
JavaScript use cases require fast compilation anyway, both for source files and for eval() and similar cases like onclick="..." in markup.
[1] https://sciter.com
And JS parsers used to be damn fast indeed, until the introduction of arrow functions. Their syntax is what requires an AST.
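The usual illustration (my own example): the start of an arrow function's parameter list looks exactly like a parenthesized expression, and the parser only finds out which one it has when it does or doesn't reach the `=>`.

    var a = 1, b = 2;

    var x = (a, b);        // comma expression: x === 2
    var f = (a, b) => a;   // arrow function:   f(3, 4) === 3

    // Until the "=>" (or its absence), the prefix "(a, b)" is ambiguous, which
    // forces either backtracking or building enough structure to reinterpret
    // what was already read.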
It's not clear how hoisting is related to parsing, really...
The compiler builds a table of variables (a register map) for the function in order to generate the proper bytecodes for register access.
At that point hoisting has some effect, but it has nothing to do with the parse/AST phase.
What you're describing is a recognizer, not a parser. A parser by definition produces some output other than a boolean 'yes, this is a syntactically correct HTML/JS input' (which is still more useful than what a tokenizer/scanner gives you, which is 'yes, this thing contains only HTML/JS tokens and here they are...splat')
I'd like to see some real-world performance numbers when compared with gzip. The article is a little overzealous in its claims that simply don't add up.
My suspicion is it's going to be marginal and not worth the added complexity for what is essentially a compression technique.
This project is a prime example of incorrect optimization. Developers should be focused on loading the correct amount of JavaScript that's needed by their application, not on trying to optimize their fat JavaScript bundles. It's just lazy engineering.
I'm guilty here: I described this format as "compression technique" because early feedback indicated that many people assumed this was a new bytecode. However, the main objective is indeed to speed up parsing. Compressing the file is a secondary goal.
> My suspicion is it's going to be marginal and not worth the added complexity for what is essentially a compression technique.
In terms of raw file size and according to early benchmarks (which may, of course, be proved wrong as we progress), Binary AST + gzip affords us compression that is a little bit better than minification + gzip. Unlike minification, Binary AST does not obfuscate the code.
The real gain is in terms of parsing speed, where we get considerable speedups. I do not want to advertise detailed numbers yet because people might believe them, and we are so early in the development process that they are bound to change dozens of times.
> This project is a prime example of incorrect optimization. Developers should be focused on loading the correct amount of JavaScript that's needed by their application, not on trying to optimize their fat JavaScript bundles. It's just lazy engineering.
Well, you are comparing optimizing the language vs. optimizing the code written in that language. These two approaches are and always will be complementary.
The gzip point aside (which is not an apples-to-apples comparison as gzipping a big source does not diminish its parse time), I see the response of "JS devs need to stop shipping so much JS" often. My issue with this response is that multiple parties are all trying to work towards making JS apps load faster and run faster. It is easy enough to say "developers should do better", but that can be an umbrella response to any source of performance issues.
The browser and platform vendors do not have the luxury of fiat: they cannot will away the size of modern JS apps simply because they are causing slowdown. There can be engineering advocacy, to be sure, but that certainly shouldn't preclude those vendors from attempting technical solutions.
Sure, a new file format always introduces a new risk. This has never prevented browsers from adding support for new image formats or compression schemes, though.
The specifications are not nearly stable enough to be publicized yet. In particular, there are several possibilities for file layout, compression, etc. and we have barely started toying with some of them, so any kind of spec we publish at this stage would be deprecated within a few weeks.
If you wish to follow the development of the reference implementation, you can find it here: https://github.com/Yoric/binjs-ref/ . It's very early, and the format will change often as we iterate.
I wish for something like evalUrl() to run code that has already been parsed "in the background" so a module loader can be implemented in userland. It would be great if scripts that are prefetched or http2 pushed could be parsed in parallel and not have to be reparsed when running eval.
It faithfully models the original code. The idea is that if you want to transform `!0` to `true`, or if you want to obfuscate, etc., you can always plug in an additional tool.
For the moment, it's JS + Rust, because Rust is better than JS when you keep refactoring Big Hairy Data Structures. However, once the data structures stabilize we're planning to shift it entirely to JS.
Mathematica has several representations for code. The main one is InputForm, the thing that users type in; there is also FullForm, which is the full AST; and there is a way to manipulate it to create new functions on the fly (or really any other objects, like graphics or sound, since everything in Mathematica is represented in a uniform way).
For instance, if in InputForm you have {1 + 1, 2 * 3}, in FullForm this becomes List[Plus[1, 1], Times[2, 3]], which is a readable version of Lisp, and it is internally represented the same way: [List [Plus 1 1] [Times 2 3]].
FullForm is accessible from the language and can be manipulated using standard language features, kind of like eval but better.
It would be really cool to have something like this in JavaScript, but unfortunately it looks like JavaScript tools tend to create non-uniform ASTs that are hard to traverse and manipulate.
For the moment, we're not trying to make the AST visible to the language although this would definitely be interesting.
The "recommended" manner to manipulate the AST these days is to use e.g. Babel/Babylon or Esprima. I realize that it's not as integrated as what you have in mind, of course, but who knows, maybe a further proposal one of these days?
One of my main concerns with this proposal, is the increasing complexity of what was once a very accessible web platform. You have this ever increasing tooling knowledge you need to develop, and with something like this it would certainly increase as "fast JS" would require you to know what a compiler is. Sure, a good counterpoint is that it may be incremental knowledge you can pick up, but I still think a no-work make everything faster solution would be better.
I believe there exists such a no-work alternative to the first-run problem, which I attempted to explain on Twitter, but it's not really the greatest platform to do so, so I'll attempt to do so again here. Basically, given a script tag like <script src="https://abc.com/script.js" integrity="sha256-123">:
A browser, such as Chrome, would kick off two requests: one to abc.com/script.js, and another to cdn.chrome.com/sha256-123/abc.com/script.js. The second request is for a pre-compiled and cached version of the script (the binary AST). If it doesn't exist yet, the CDN itself will download it, compile it, and cache it. For everyone except the first person to ever load this script, the second request returns before the time it takes for the first to finish + parse.
Basically, the FIRST person to ever see this script online takes the hit for everyone, since it alerts the "compile server" of its existence; afterwards it's cached forever and fast for every other visitor on the web (that uses Chrome). (I have later expanded on this to have interesting security additions as well -- there's a way this can be done such that the browser does the first compile and saves an encrypted version on the Chrome CDN, such that Google never sees the initial script and only people with access to the initial script can decrypt it.)
To clarify, this solution addresses the exact same concerns as the binary AST proposal. The pros to this approach, in my opinion, are:
1. No extra work on the side of the developer. All the benefits described in the above article are just free without any new tooling.
2. It might actually be FASTER than the above example, since cdn.chrome.com may be way faster than wherever the user is hosting their binary AST.
3. The cdn can initially use the same sort of binary AST as the "compile result", but this gives the browser flexibility to do a full compile to JIT code instead, allowing different browsers to test different levels of compiles to cache globally.
4. This would be an excellent way to generate lots of data before deciding to create another public facing technology people have to learn - real world results have proven to be hard to predict in JS performance.
5. Much less complex to do things like dynamically assembling scripts (like for dynamic loading of SPA pages) - since the user doesn't also have to put a binary ast compiler in their pipeline: you get binary-ification for free.
The main con is that it makes browser development even harder to break into, since if this is done right it would be a large competitive advantage, and it essentially requires a browser vendor to also host a CDN. I don't think this is that big a deal given how hard it already is to get a new browser out there, and the advantages of getting browsers to compete on compile targets make up for it in my opinion.
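For concreteness, a rough sketch of the double request described above (all of this is hypothetical - cdn.chrome.com and the helper functions below do not exist):

    // Hypothetical: race the origin script against a shared pre-compiled
    // cache keyed by the script's hash.
    async function loadScript(src, hash) {
      var origin = fetch(src);
      var u = new URL(src);
      var cached = fetch("https://cdn.chrome.com/" + hash + "/" + u.host + u.pathname);
      try {
        var hit = await cached;
        if (hit.ok) {
          return runPrecompiled(await hit.arrayBuffer()); // fast path
        }
      } catch (e) {
        // Cache miss or network error: fall back to the plain source below.
      }
      var res = await origin;
      return evalSource(await res.text()); // slow path: parse from scratch
    }

    function runPrecompiled(buffer) { /* hypothetical: run pre-parsed code */ }
    function evalSource(text) { /* hypothetical: parse and run plain JS */ }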
I don't think the binary AST proposal changes the accessibility status quo. In my mind, the best analogy is to gzip, Brotli, etc.
If you had to have a complicated toolchain to produce gzipped output to get the performance boost, that would create a performance gap between beginners and more experienced developers.
But today, almost every CDN worth its salt will automatically gzip your content because it's a stateless, static transformation that can be done on-demand and is easily cached. I don't see how going from JavaScript -> binary AST is any different.
I actually think gzip serves as a good example of this issue: this comment alone is daunting to a beginner programmer, and it really shouldn't be. This Chrome/CDN thing could ALSO be auto-gzipping for you, so that a beginner throwing files on a random server wouldn't need to know whether it supports gzip or not. I think we really take for granted the amount of stuff completely unrelated to programming that we've now had to learn. If our goal is to make the web fast by default, I think we should aim for solutions that work by default.
It's definitely the case that once a technology (such as gzip) gets popular enough, it can reach "by default"-feeling status: Express can auto-gzip, and you can imagine Express auto-binary-AST-ing. It's slightly more complicated, because you still need to rely on a convention for where the binary AST lives if you want to get around the dual script tag issue for older browsers that don't support Binary AST yet (or, I suppose, a header that specifies the browser supports Binary AST results for JS files?). Similarly, at some point CDNs may also do this for you, but this assumes you know what a CDN is and can afford one.
The goal I'm after is that it would be nice to have improvements that work by default on day 1, not after they've disseminated enough. Additionally, I think it's really dangerous to create performance-targeted standards this high in the stack (gzip pretty much makes everything faster; Binary AST speeds up one kind of file and introduces a "third" script target for the browser). The Chrome/CDN solution means that a Firefox/CDN might try caching at a different level of compilation, meaning we get actual real-world comparisons for a year before settling on a standard (if necessary at all).
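For what it's worth, the gzip half of that is already a one-liner in Express (the compression package is real; the Binary AST middleware below is purely imaginary):

    var express = require("express");       // npm install express compression
    var compression = require("compression");

    var app = express();
    app.use(compression());            // real: negotiates gzip/deflate per request
    // app.use(binastMiddleware());    // imaginary: would serve pre-encoded ASTs
                                       // to browsers that advertise support
    app.use(express.static("public"));
    app.listen(3000);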
Edit: another thing to take into account is that it now becomes very difficult to add new syntax features to JavaScript, if it's no longer just the browser that needs to support them, but also the version of the Binary AST compiler that your CDN is using.
The process of getting content onto the web has historically been pretty daunting, and it is IMO much easier now than in the bad old days when a .com domain cost $99/year and hosting files involved figuring out how to use an FTP client.
Services like Now from Zeit, Netlify, Surge, heck, even RunKit, make this stuff so much easier now. As long as the performance optimizations are something that can happen automatically with tools like these, and are reasonable to use yourself even if you want to configure your own server, I think that's a net win.
I do agree with you though that we ought to fight tooth and nail to keep the web as approachable a platform for new developers as it was when we were new to it.
On balance, I'm more comfortable with services abstracting this stuff, since new developers are likely to use those services anyway. That's particularly true if the alternative is giving Google even more centralized power, and worse, access to more information that proxying all of those AST files would allow them to snoop on.
This suggestion has a problem similar to the reason that browsers don't globally cache scripts based on integrity values. With your suggestion, if a domain temporarily hosts a .js file with a CSP-bypass vulnerability (i.e. `eval(document.querySelector('.inline-javascript').textContent)` is a simple example; many popular JavaScript frameworks do the equivalent of this), and then later removes it and starts using CSP, an attacker who knows an XSS vulnerability (which would otherwise be useless because of CSP) could inject a script tag with the integrity set equal to the CSP-vulnerable script that used to be hosted at the domain, and Chrome would find the script in the cdn.chrome.com cache.
(You might be thinking someone could set CSP to disable eval on their pages, but eval is reasonably safe even in the presence of otherwise-CSP-protected XSS attacks as long as you aren't using code that goes out of its way to eval things from the DOM, and it's more than annoying that the only way to protect yourself from this cache issue would be to disable eval. ... Also, there are some libraries that interpret script commands from the DOM without using eval, so disabling eval doesn't even protect you if you previously hosted one of those javascript files.)
You could have the cdn.chrome.com cache aggressively drop entries older than a certain amount of time, like a day. But then there's a question of whether the requests to the cache are all just wasted bandwidth for the many user requests to scripts that haven't been loaded in a day. And the whole system means that website security can be dependent on cdn.chrome.com in some cases. I'd rather just build and host the processed binary AST myself. I already use a minifier; for most people, the binary AST tool would just replace their minifier.
Interesting idea, which could be built on top of the Binary AST.
I would personally prefer it being handled transparently by the middleware or the cdn, which would neatly separate the responsibilities between the browser, the cache+compressor and the http server.
Anyway, one of the reasons this project is only about a file format is so that we can have this kind of conversation about what browsers and other tools can do once we have a compressed JS file.
One of my fears about doing this at the CDN level is that introducing a new syntax feature now means you need the browser to support it AND the version of the Binary AST compiler on your CDN. Imagine using a new JS keyword and all of a sudden all your code gets slower because it's a syntax error at the CDN level. It would just slow the rate of introduction of new syntax features, I think, by needing a lot more coordination: it's already a bit messy with different browsers having different support; now caniuse.com may need to include CDNs too.
Indeed. That's not one of the main goals of the project, but I hope that standardizing the AST will help a lot with JS tooling, including code visualisation.
I want to see a code editor that represents common programming language concepts, such as classes and functions (OO), in a graphical way and displays the rest as text.
Yes, dmitriid, I believe it's clear by now that you don't like this project.
As I mentioned in another conversation, if you feel you can contribute to Wasm and make it solve the problems we are solving here, this is great, by all means, please do so.
In the meantime, we are going to continue trying to solve some of the performance problems of the web using the techniques discussed in the blog entry because we are convinced that they are going to work better.
> So, your article is FUD in it's purest undistilled form.
The article is just reiterating an argument that Mozilla and Facebook made. If you're saying Mozilla and Facebook are spreading Fear, Uncertainty, and Doubt with their proposal for a binary JS AST, you may say so, but please realize that they probably had good reasons to start such a project.
EDIT: Even though I disagree with the parent, it is making some good points (specifically, that WebASM is going to add a GC), so flagging it to death doesn’t seem to be the right thing to do.