Computer program fixes old code faster than expert engineers (newsoffice.mit.edu)
132 points by Libertatea on July 10, 2015 | hide | past | favorite | 62 comments


I think what the paper does is a variant of profile-guided optimisation (PGO) [1], an old approach to optimising compilers, where the binary is annotated and run to produce information about those runs that is then fed back to the compiler so as to allow better optimisation the next time the compiler is run. The most well-known off-shoot of PGO is tracing JIT compilers. The first (well-known) tracing JIT compiler was Dynamo [2], and indeed, the system under discussion here uses DynamoRIO [3], a descendant of Dynamo, to instrument a compiled binary. The instrumented binary is then executed multiple times to generate execution traces that allow the optimiser to find hot code, which is then analysed to adapt the code to the new architecture, e.g. by changing buffer sizes.
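To make the "find hot code" step concrete, here is a toy Python sketch of picking hot basic blocks out of an execution trace. The block names, trace, and threshold are all made up for illustration; the real system works on instruction-level traces produced by DynamoRIO, not on anything this simple.

```python
# Toy illustration (not the paper's implementation): identify "hot"
# basic blocks from an execution trace, the core idea behind tracing
# JITs in the Dynamo lineage.
from collections import Counter

def find_hot_blocks(trace, threshold=0.5):
    """Return block ids that together account for at least `threshold`
    of all executed blocks, most frequent first."""
    counts = Counter(trace)
    total = len(trace)
    hot = []
    covered = 0
    for block, n in counts.most_common():
        hot.append(block)
        covered += n
        if covered / total >= threshold:
            break
    return hot

# A fake trace: block "B2" is an inner loop body, executed far more often.
trace = ["B0"] + ["B1", "B2", "B2", "B2"] * 100 + ["B3"]
print(find_hot_blocks(trace))  # ['B2']
```

A tracing JIT would then specialise or recompile just the blocks this returns, on the assumption that the rest of the program barely matters for performance.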

[1] https://en.wikipedia.org/wiki/Profile-guided_optimization

[2] http://www.cs.virginia.edu/kim/courses/cs771/papers/bala00dy...

[3] http://www.dynamorio.org/


Interesting. So what would you say is novel/noteworthy about this research?


In typical PGO you have access to the source code, the compiler instruments it for you automatically, and you run the program with what you consider good input. During that time the instrumentation records a lot of interesting information. Then the compiler recompiles your program with that extra knowledge, making much better decisions [1]. For instance the compiler might realize: "Oh, this function is called much more often than I thought, I will now inline it."
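That feedback loop can be mimicked in a few lines of Python. This is entirely a toy (no relation to any real compiler's PGO machinery): a first pass counts calls, and a second pass "decides to inline" anything called more often than some guess.

```python
# Hypothetical sketch of the PGO feedback loop: profile a run, then use
# the recorded call counts to drive an optimisation decision next time.
from collections import Counter

profile = Counter()

def instrumented(fn):
    """First pass: count how often each function is called."""
    def wrapper(*args):
        profile[fn.__name__] += 1
        return fn(*args)
    return wrapper

@instrumented
def square(x):
    return x * x

# Run with representative training input, as in a real PGO build.
for i in range(1000):
    square(i)

# Second pass: the "compiler" inlines anything called more than it guessed.
INLINE_THRESHOLD = 100
should_inline = {name for name, n in profile.items() if n > INLINE_THRESHOLD}
print(should_inline)  # {'square'}
```

The quality of the result depends entirely on how representative the training input was, which is exactly why the choice of profiling input matters so much in real PGO.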

Here (link to the paper: [2]) it is different: the authors do NOT have access to the source code, they only have stripped binaries. From these unreadable binary instructions, they are able to identify interesting patterns: they extract the algorithm from the assembly. For this to work, they seem to need run-time information. This means, as in conventional PGO, they need a wide variety of representative input. It apparently cannot be done at compile time: "Current state-of-the-art techniques are not capable of extracting the simple algorithms from these highly optimized program."

Once the algorithm is figured out, the Helium framework generates domain-specific code in "Halide". The Halide compiler knows how to optimize these stencil computations better than old hand-written code, which gives them these impressive improvements.
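For a feel of what a stencil compiler buys you: a stencil like a 3x3 box blur can be computed naively, or restructured into two separable 1D passes, and a compiler like Halide explores that kind of reordering (plus tiling and vectorisation) automatically. Below is a pure-Python sketch of the two schedules, with nothing Halide-specific about it.

```python
# Toy stencil example: a 3x3 box blur computed naively, then as two
# separable 1D passes -- the kind of restructuring a stencil compiler
# performs automatically. Edge pixels are left at 0 in both versions.

def blur_naive(img):
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(img[y + dy][x + dx]
                            for dy in (-1, 0, 1)
                            for dx in (-1, 0, 1)) / 9.0
    return out

def blur_separable(img):
    h, w = len(img), len(img[0])
    tmp = [[0.0] * w for _ in range(h)]
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):                      # horizontal pass
        for x in range(1, w - 1):
            tmp[y][x] = (img[y][x - 1] + img[y][x] + img[y][x + 1]) / 3.0
    for y in range(1, h - 1):               # vertical pass
        for x in range(1, w - 1):
            out[y][x] = (tmp[y - 1][x] + tmp[y][x] + tmp[y + 1][x]) / 3.0
    return out

img = [[float((x * y) % 7) for x in range(6)] for y in range(6)]
assert all(abs(a - b) < 1e-9
           for ra, rb in zip(blur_naive(img), blur_separable(img))
           for a, b in zip(ra, rb))
```

The separable version does 6 adds per pixel instead of 8 and has much better locality; the point of Halide is that you state the stencil once and let the compiler search for schedules like this.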

[1] https://msdn.microsoft.com/en-us/library/e7k32f4k.aspx

[2] http://groups.csail.mit.edu/commit/papers/2015/mendis-pldi15...


Also seems like 'highly optimized' is another way of saying 'obfuscated' in this case.


I have not had time to study this paper in detail, so please take what I say here with a grain of salt.

Dynamic instrumentation is powerful, I think Google is using DynamoRIO heavily. Optimising compilers for DSLs is a trendy topic because

1. We are beginning to understand well how to canonically embed DSLs in general purpose languages, and how to build compilers for such languages.

2. DSLs typically work on much more restricted domains than general purpose languages, so optimisation is much easier, and more powerful.

The Pochoir compiler [1] for stencil computations is an example of this. There are many open questions about the compilation of DSLs other than optimisation, for example the automatic generation of operational and axiomatic semantics of DSLs, and associated tooling. Really interesting research field.

[1] https://people.csail.mit.edu/yuantang/


Question for clarification: the title uses the word "fix", implying that this new program repairs code that was broken. But in reality, it optimizes repetitive algorithms. Is that the correct takeaway?


I think it "fixes" code in that it makes it work on newer hardware, but the title is still misleading; it suggests that the software is performing bugfixes and correcting code that has failing unit tests, which is totally unrelated.


Fix in the sense that it takes a thing that used to work well and no longer does (due to changes in the underlying computing platform) and programmatically changes the methods used to be more efficient on modern platforms, while producing the same output. Fix in the sense of "improve" rather than "make not broken".


I believe you are correct.


Title correction: "Computer program optimizes the speed of image filters".


Well, the technique could be applied in other areas, but optimizing is still not the same as fixing. The headline implies a solution to a much larger and more difficult problem than the researchers were actually trying to address, so it's effectively clickbait. I for one would not have clicked through if the headline had been accurate.


    the technique could be applied in other areas
The optimisations used in the paper are tightly coupled to a certain approach to image-processing.


I'd have to agree with this. Low-level "close to the metal" programming is something I have very little experience with, but based on my understanding, this research has extremely limited scope. The technique relied on what is essentially prior knowledge of the implementation. This doesn't mean it isn't very useful for companies like Adobe...it is. However, it does not look like anyone intended for this to be broadly applicable.


Question: What's the most important algorithmic technique?

Answer: Understand the problem domain better.


The optimizations themselves are pretty domain-specific, but the techniques used to instrument and modify the target program are - as you yourself point out in another comment - pretty widely applicable. That's where the real innovation lies.


I agree that the framework and the general idea of program rejuvenation is applicable in general.


Exactly...


And general programs like MS Word...


Where do you see that? I saw mention of using it on one other program - an image viewer, Microsoft Windows IrfanView.


The confusion arises from the apostrophe that implied that IrfanView somehow belongs to Microsoft Windows (it doesn't). I almost made the same mistake while reading quickly.


I believe MS Word (or some components in office) already has some sort of PGO applied to it:

http://llvm.1065342.n5.nabble.com/Capabilities-of-Clang-s-PG...


Title is misleading; can a computer program step in and fix it, please?


The MIT PR machine has perfected a press cycle that eclipses the good research.

http://www.phdcomics.com/comics.php?n=1174


The amount of meaningless verbiage in this press release is indeed disturbing. What does it mean for the program to become "less effective"? A program is either effective or it isn't. And a "billion dollar problem"? Adobe doesn't even spend a billion dollars on R&D in total.


Less effective meaning that the program does its job more slowly. The way I read it, the original programmers used optimizations that worked well on older processors but not on newer ones. For instance, you might optimize one way for a Pentium 4 but differently for an i7. It's not that the P4 code won't run on the i7; it's just optimized for the older processor. The i7 being much improved over the P4 mitigates this, but you can get greater improvements by actually modifying the program, as the software here does.

As for the billion-dollar problem, I would guess bit rot costs that much or more across the entire software industry.


The headline made me think: Imagine the irony if, after years of various human jobs being replaced by software, someone one day completes the cycle by inventing an AI that can develop applications for humans. Far off, but funny to think about.


Software development is mostly not about syntax and programming languages, but about domain knowledge and understanding humans. If a non-programmer could really describe what he wants, then you could write a compiler for that text.

But it's not possible yet, as the AI would need to have a human-like intelligence, and that's very far away.


Some amazing progress is being made in the field of program synthesis. In particular, there have been a lot of successes in generating programs from input/output examples, provided the domain is sufficiently restrictive and/or the programmer provides a "sketch" or partial program to start the synthesizer off with. One good paper to start with is "Oracle-Guided Component-Based Program Synthesis", Jha et al., ICSE '10. This algorithm has made its way into Microsoft Excel: http://research.microsoft.com/en-us/news/features/flashfill-....
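To get a feel for "synthesis from input/output examples", here is a deliberately tiny enumerative synthesizer over a made-up four-primitive string DSL. It is far simpler than the oracle-guided algorithm in the paper, but it has the same shape: search the program space for something consistent with all the examples.

```python
# A tiny enumerative synthesizer over a toy DSL (illustration only).
import itertools

# DSL: a program is a chain of primitive string operations.
PRIMS = {
    "upper":   str.upper,
    "lower":   str.lower,
    "reverse": lambda s: s[::-1],
    "strip":   str.strip,
}

def synthesize(examples, max_len=3):
    """Return the shortest chain of primitives consistent with all
    (input, output) examples, or None if no chain up to max_len works."""
    for n in range(1, max_len + 1):
        for chain in itertools.product(PRIMS, repeat=n):
            def run(s, chain=chain):
                for op in chain:
                    s = PRIMS[op](s)
                return s
            if all(run(i) == o for i, o in examples):
                return chain
    return None

print(synthesize([("  abc ", "CBA"), (" x", "X")]))
# → ('upper', 'reverse', 'strip')
```

Real systems replace the brute-force loop with constraint solving and an oracle that supplies distinguishing inputs, but the "search until consistent with the examples" core is the same.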

You might argue this isn't "true" program synthesis, but in some cases it can outdo human programs, because the synthesizer can be tied into a verifier to produce only "correct" programs, where correctness is defined by a few simple and declarative properties. One example of this sort of thing is from Udupa and colleagues in PLDI'13: http://dl.acm.org/citation.cfm?doid=2462156.2462174. They're able to generate correct cache coherence protocols from a partial specification and input/output examples. This is a big improvement over the current state of the art, where a lot of very smart people think really hard and come up with a coherence protocol, convince themselves it's correct, and then some other very smart people spend a few months playing with automated verifiers trying to prove the protocol correct, and finally someone actually implements the protocol in hardware, and you might still run into bugs at this stage.


It's closer than you think.

You are right that humans supply the domain knowledge, but you don't need human-level AI to have computers writing software.

Maybe future programmers will be more like spec writers than actual coders.


I'm pretty sure there's either a Paul Graham or Joel Spolsky essay on the subject, but basically: What's the difference between a sufficiently complete and formalised specification, and a high-level programming language?


Domain specific knowledge?

An AI would know how to fill in the gaps when you said "Make me a blog page" or "Make this server scalable."

This may sound outrageous, but some very large percentage of coding practice is wheel reinvention. Developers the world over solve the same problems over and over again.

One of the first applications for useful AI will be to collect and automate that repetition.

There's already a "make me a blog" project, and Bootstrap is so cliched now I'm surprised there aren't more site builders being developed. So this isn't entirely impractical.

The hard part is the visual design. It's much harder to automate that in a way that includes some genuine original creativity.

Of course web apps != general scientific or business coding. So AI isn't a complete solution. But it doesn't need to be to be useful. (Domain specific, after all.)


Download the Tokeneer project's Z specifications and Ada source code to see the difference. I'm sure it will nicely illustrate it.


I think what we're looking at in about a decade is interactive natural-language development, in which one 'architect' will replace an entire team of programmers, designers, testers, etc.

"When I tap on this thing" (puts a finger on the control) "go ahead and get the latest articles from Hacker News."

The System contacts "Hacker News" and queries for the "latest articles" API. The systems negotiate an API key, account, etc., and everything is good to go. The result is a list of articles with titles and numbers of comments.

"Now use a nice table." The System formats the output. "Show me other styles." It lists table styles. "This one. Now, if the number of comments is more than 100, show them in red." And so on.

In the end, the System can generate source code in a multitude of languages (why?), some kind of pseudo code and of course an interactive, editable video of the "programming" session which others can watch and fix if necessary.

So basically, one person can design the whole "application" in a couple of hours.

We're not that far from that. And as we get closer, it will become better and better and it will be a lot easier to extend and modify this "System".

Imagine developing the System using the System.

Programming as we know it will become a thing of the past, as is the case with all things that evolve over time.


Now this sounds familiar.

I still remember books about software engineering written in the 80s (mainly intended for manager types) which, in their opening chapters after an explanation of how the discipline of "software engineering" evolved to the present day, painted a rosy-ass picture of the future by describing the year 2010 in which software architects lounged about in easy chairs and gave plain-English directions to HAL 9000 workstations -- replete with line drawings of retrofuturistic offices straight out of The Jetsons, The Incredibles, or Logan's Run.

Needless to say, 2010 came and went and the state of the craft had evolved microscopically compared to anticipated advances despite having plenty more CPU cycles to burn on ever more complicated tooling. In some ways it has regressed. The know-how which produced Unix and ARPANET on limited hardware with limited resources, inside academic, industrial, and government departments which got little recognition or respect, is in short enough supply that we may not be able to repeat the feat if we had to do it all over again.

Alan Kay maintains that programming is pop culture because we don't remember our history but it's worse than that. We're like a primitive Mesopotamian tribe from antiquity who had a history but forgot it except for bits and pieces which got written down and became Religion. Things like "GO TO considered harmful", "object-oriented programming is good", "DRY", "refactoring", "design patterns", etc.

We're saddled with object-oriented programming -- and its attendant complexity -- for life. But most of us don't understand how OO came to be, why it might be good, or what the tradeoffs are vs. other programming methods because we don't ask ourselves these questions.

Design patterns should be treated like TV Tropes: a great wiki describing them all, open to new additions as well as variations and inversions. Instead we treat them like nam-shubs, strict instructions handed down from the gods on what to do under these circumstances to achieve this result.

And in this culture we await the day when, magically, the Machines will spontaneously evolve the intelligence to relieve us of the burden of software design specifics!


Maybe you're missing the fact that in 2015 we have supercomputers (by 80s standards) in our pockets, always connected to the Internet, to which we actually issue verbal commands.

In fact, most of the tasks described in those futuristic books didn't involve programming, I suspect they were examples of how you could ask your HAL 9000 to tell you how much is 9999 * 9999 and it would give you the answer in human voice. A lot of the tasks which would have required programming in the 80s are now solved by one of the millions of apps, so that's another angle.

Maybe we're not exactly where sci-fi predicted we'd be, but there are a lot of things that sci-fi didn't even dream of that we take for granted.

Even the dev environments of today: you'd think they only include the text editor/IDE, but you've also got the browser, Google and Stack Overflow, GitHub and a myriad of tools and libraries to help us build more and more complex apps.

The fact is, today's programmer-for-hire is an interface between the client (app inventor/architect) and the machine, just like secretaries were interfaces between their boss' spoken messages and the typewritten letters.

And if something can be optimized away by technology someday it will. Maybe it won't involve spoken language (although, why not, if it's done right?), but my bet is that programming will be less of a profession and more of a skill that you learn by using powerful tools.


That is highly optimistic and I do not buy into this. This would require an understanding of the human language and here we are FAR off, considering that it is currently even hard for engineers to grasp a user's requirements.


RNNs can already write what looks like viable code without any human intervention. Of course, the code doesn't do anything useful and is just a reflection of its training data, but all we have to do is figure out how to guide that.

We are doing OK at human-language recognition, as well as at understanding within simple dialogue frames. The technology is also moving awfully fast at the moment. You are thinking in terms of human-level intelligence, but it really doesn't have to be that good. It only has to provide enough random-but-feedback-guided choices until the user finds what they are really looking for.

Put it this way: if the user could get what they wanted from the computer directly just by "searching", the process would be more efficient since the most inefficient part about programming is human-to-human communication and coordination.


RNNs are good at learning the structure of their training input—even character-level networks can output vaguely plausible English. Of course, what you end up with often looks like a computational model of a thought disorder. They have seemingly lucid moments, but so do Markov chain generators.
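For anyone who hasn't played with one, that Markov-chain baseline fits in a few lines. This is a character-level, order-3 model; the one-sentence corpus is just a stand-in for real training text.

```python
# A character-level Markov chain text generator -- the "vaguely plausible
# but not lucid" baseline the parent comment compares RNNs to.
import random
from collections import defaultdict

def train(text, order=3):
    """Map each `order`-character context to the characters that follow it."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        model[text[i:i + order]].append(text[i + order])
    return model

def generate(model, seed, length=80):
    """Extend `seed` one character at a time by sampling continuations."""
    order = len(seed)
    out = seed
    for _ in range(length):
        choices = model.get(out[-order:])
        if not choices:
            break
        out += random.choice(choices)
    return out

corpus = "the quick brown fox jumps over the lazy dog. " * 20
model = train(corpus)
print(generate(model, "the"))
```

With a big enough corpus and a higher order, the output contains real words and locally sensible phrases, but nothing resembling global structure, which is exactly the comparison being made above.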

As far as code goes, we've already solved the problem of modeling the structure of any programming language with an implementation. That's not the hard part.


> considering that it is currently even hard for engineers to grasp a user's requirements.

That's because users don't know exactly what they want. They know it when they see it, but they don't know how to explain it. They expect the engineers to guide them through to their idea.

But as per example above, that could be done by an app - guiding a user from nothing to whatever he wants by interactively experimenting with features.

Think of a REPL with access to a huge number of libraries, which is searchable by voice input.


Semantics may not be a solvable problem, but the system doesn't exactly have to answer existential identity questions to render a list. A restricted subset of English would probably be necessary to deal with abstract algorithms.


We could have AI-assisted tools; e.g. very intelligent code completion systems that are trained on your code and everyone else's. Sometimes programmers don't know exactly what they want, but when given a few plausible choices based on some initial code, they can generally pick out what they really wanted.

Beyond enhanced code completion, AI can become effective in coding even before it is human-level intelligent, simply by creating a conversational feedback loop with a human; e.g. more to the right, more to the left, redder, yes! One could argue that the human is still programming, but they might not need any specialized programming training to use such a system. In that case, we really aren't as far away as many think.


First they came for the factory workers, and I did not speak out - Because I was not a factory worker.

Then they came for the vehicle drivers, and I did not speak out - Because I was not a vehicle driver.

Then they came for the accountants, and I did not speak out - Because I was not an accountant.

Then they came for me - and there was no one left to speak for me.


Let's not conflate this with the Holocaust even as a joke. Automation like this is a step forward even if it only inspires a dreary debate about economic theory.


Automation is not a step forward. It increases the resource burn rate.


Sure, now tell me about your hand-woven clothes collection


What is the connection between "First they came for..." and anything Jewish? The original does not mention Jews:

http://www.martin-niemoeller-stiftung.de/4/daszitat/a31

Let the guy (or gal) make a joke, we don't need to be reminded that we only care about the Jewish holocaust and not the Ukrainian holocaust 10 years prior by the same mustache shape on a different tyrant.

EDIT: Not to mention the person who said that actually hated the Jews: https://en.wikipedia.org/wiki/Talk:First_they_came_...#Poem_...



Thanks! and the project source: http://projects.csail.mit.edu/helium/


Oddly enough, there's another group at MIT working on rewriting binaries that was recently discussed on HN: https://news.ycombinator.com/item?id=9804036



I suspect that a valuable application of deep learning will be learning how legacy UIs operate by watching real users, and implement an easier to learn / nice UI / mobile interface on top of the legacy app. Probably with a human UI designer stitching it all together. It might even learn a voice interface by observing call center operators.

Could be easier, less risky, and more urgent than learning to recode the whole system.


Yeah, the paper in itself is interesting, not sure why they had to give it that completely misleading title.


Computer program generates machine code faster than expert engineers. It's called a compiler.


Did they lose original source code?


No, I don't think they did. I think they're just taking their old awful unmaintainable source, adding more to the hot heap of slag, and then using some clever compiler optimizations to make it run better. No love for the actual programmers. You get to keep working on the nightmare code (3 months to make changes...).

Now, color me impressed if the thing output the same language as the original source, all spruced up.

As it is, they'll just have a code base that will eventually transform into the anti-christ, because management will always be like, "Hey, don't ever fix the code, just run that see-saw whatever thing on it afterwards."


Black swan event in 2020: AI replaces 80% of software engineers.

</fiction></joke>


I know you're joking, but the reality is that software has been replacing huge chunks of developers' work for years, whether through optimising compilers, common libraries, CI/CD tools, web APIs, IDEs or whatever.

I've been involved in development for over a quarter of a century. Back in the late 80s, we had a dev team dedicated for years to building some mapping software. Their entire product could probably be put together in a couple of hours using Google Maps and a bit of JavaScript.

All this automation has happened, yet I see no slowdown in the amount of development that still happens. They can just concentrate on more bespoke, more value-adding work these days.


    more bespoke, more value-adding work these days

A software version of the Jevons paradox: making software cheaper increases the demand for software.


For the love of the human race and all of us still using Windows, I really hope a faster version of Irfanview gets released as a result of this.


When people on HN don't bother to read the article, clickbait titles flood the frontpage.


The article is interesting because it is about the potential disruption of an industry and is based on work done at an institution with a reputation for research that disrupts technical industries, e.g. Chomsky's formal grammars.


Clickbait title



