What Google DeepMind Means for A.I.

Animats · on March 4, 2015

This is impressive. The current approach will work only for games where the whole state is on-screen and planning isn't required. A pure reactive system will work for that.

I used to say that a key component of AI that was missing was the ability to get through the next few seconds of life without falling down or bumping into anything. I went through Stanford CS when the top-down logicians were in charge of AI. That approach was totally incapable of dealing with the real world. Now we're seeing the systems needed to deal with the real world in the short term starting to work.

Once you can deal with the next few seconds, a strategy module can be added to give goals to the low level system. This is very clear in the video game context. As the game playing programs advance beyond the 2D full-screen games, they'll need a low-level system to handle the next moves ("don't fall off platform", "jump to next platform", "shoot at target" are primitives for the 2D sidescroller era) and some level of planner to handle tactical matters and strategy.

It's possible to explicitly build hierarchical systems like that now, using classical planning techniques to modify the goals of a machine learning system. It's not yet possible to get a hierarchical system to emerge from machine learning. Medium term planning as an emergent behavior is a near term big challenge for AI.

Beyond such a two-level system, we're going to need intercommunicating components that do different parts of the problem. The components may be evolved, while the architecture may be designed. When AI systems can design such architectures, they're probably ready to take over.

Houshalter · on March 4, 2015

I remember an AI researcher (I forget who) recently said something to the effect that early AI research produced all sorts of planning algorithms. E.g. the top down camp of AI. But they weren't capable of working with the real world because we didn't have very good low level perception. E.g. this complicated algorithm for planning the robot's actions, but it depended on getting input about where objects are.

Now we have decent low level perception from the bottom-up camp of AI, but they are limited by a lack of high level stuff like planning and reasoning.

But you are right that there is no obvious way to just combine these wildly different algorithms without lots of human guidance.

TylerJay · on March 5, 2015

I think you hit the nail on the head here. That's one of the most interesting parts of thinking about GAI for me. Which parts will end up being top-down and which parts will end up being bottom-up? And even if we have evidence that a certain part is TD or BU in humans, do we even want machine intelligence to work the same way?

The article says something to the effect of "no matter how much you advance this strategy, you never get a toddler out of it." And that makes sense because, presumably, certain parts of the human brain exercise some sort of top-down control over the sensory-data-processing and other parts.

For example, it seems like the human mind is built to see things as things. Does the human mind reallY start off seeing "pixels" and then learn by itself to think of the word as solid, whole objects instead of collections of similarly-colored photons/pixels or atoms? It seems like this is a universal use-case and it would make sense if our tendency to see the world in terms of "things" instead of patches of color is built-in (gestalt psychology seems to suggest this as well).

It sounds like the AI in the article starts off from pixels and then builds up some sort of model of blocks, the ball, paddle, game physics, etc, (but then again, maybe it doesn't have those models at all and is just doing statistical analysis on patterns of pixels). Either way, it likely doesn't have any higher, context-independent model of objects/things like humans do. I suspect this may be one of the hurdles in transfer learning. Humans think of objects as having certain properties. When other objects in other contexts appear to have similar properties, we guess that they may have other properties in common which gives at least a rough model of the new object.

So I guess what I'm trying to say is: Humans have hierarchical models of the world that let us think separately about patterns of light, atoms/molecules, whole physical objects/things, systems, etc. They are all first-class citizens and we ascribe properties to each of them. We already have a rough-model of anything at the same level, but a different context, and with similar-enough properties to something we already know. It seems to me like this is fundamentally connected to humans' ability to do transfer-learning. Could this effect be achieved through bottom-up algorithms, or are we going to have to figure out some top-down way of developing transferrable, generalizable, hierarchical models?

malkia · on March 5, 2015

Well, it's basically the same with me and reading math expressions filled with symbols out of nowhere, and you need to know the context, history and how they are used... If someone translated this to me as lisp/scheme or some other form of symbols, with full-meaningful names I might have better idea of what's going on.

Okay, not exactly the same... but won't mind AI algorithm that sorts it out for me.

TylerJay · on March 5, 2015

> It's not yet possible to get a hierarchical system to emerge from machine learning. Medium term planning as an emergent behavior is a near term big challenge for AI.

It's also a big challenge for AI safety / Machine Ethics / Formal Verification. It's notoriously hard to prove statements about Emergent behavior in complex or dynamic systems.

Retric · on March 4, 2015

These low level approaches can do planning, there simply ridiculously inefficient at it.

A lot of AI research is focused on the idea that 1 Trillion floating point operations per second on 1,000,000,000 bytes of data is now cheap. Efficiency is simply less important.

solarmist · on March 4, 2015

When the alternative is achieving nothing then sure 1 Trillion fops IS cheap.

Plus how do you know that's actually inefficient? It seems like a large number to us, but that may be completely reasonable for a biological system to accomplish the same we don't have a good sense of scale for these kinds of problem.

pixl97 · on March 5, 2015

>Efficiency is simply less important.

"More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason — including blind stupidity." — W.A. Wulf

iandanforth · on March 4, 2015

"video games" (read "world simulator").

The important thing about their work is that it is deliberately marching down the path of more and more complex world simulations.

We experience the world at one second per second. To learn to walk we must first fall, and we fall at 32 feet/second^2. There's a hard limit on how fast we can make mistakes (like tripping) and so there is a hard limit on how fast we can learn.

Computers can experience a simulated world at many hours per second. When they're learning to walk in a simulated world they can fail, and learn, thousands of times before we've finished our first step.

This ratio of simulated experience to real world time is also going up. Eventually the minimum amount of time it takes to grow a toddler like AI in simulation will be just under the time an AI researcher is willing to wait for results. When that happens we'll see a real improvement in the quality of AIs.

chockablock · on March 4, 2015

>We experience the world at one second per second

There is evidence from animal studies that the hippocampus (a brain structure critical for memory) can 'replay' remembered events at 10-20x speedup. See, for example: http://www.ncbi.nlm.nih.gov/m/pubmed/19709631/ Video at: http://youtu.be/Bv7zN2Or6Mg

(Full-disclosure: I am the first author.)

And in fact the OP uses biologically-inspired off-line replay as part of their learning algorithm.

_abattoir · on March 4, 2015

Fascinating. Does this relate to the speed of dreams? i.e. a "dream" might seem to have taken hours when in fact the REM sequence was on the order of seconds?

chockablock · on March 5, 2015

Could do--the phenomenon was first seen in sleeping animals. Replay during REM is probably at 1X speed, non-REM (aka slow wave sleep) is in the 10-20x range. The dreams you remember (long, emotionally involved) tend to happen in REM. Of course, we can't ask rats what they dreamt about, but the patterns in the neural activity are very specific (we can decode the rat's position with ~5cm accuracy, and during replay we see coherent trajectories through space that look just like real movement.)

Also Hassabis knows this literature well--he has published work on decoding human memories in hippocampus from fMRI data and we talked shop about replay a long time ago. (Very thoughtful and friendly person FWIW).

GenerocUsername · on March 5, 2015

Almost certainly.

In real time, the brain is processing all incoming stimulus.

In a dream/imagined scenario, you are only processing as much as you your brain is pushing into the scenario.

Your not collecting millions of photons via your eye and interpreting a ball as green, your brain just says "green ball" and moves on, allowing for much faster replays/dreams than real-world experience.

kriro · on March 5, 2015

This is pretty interesting (the high level conclusion, I do not claim to understand more). I've recently done a bit of pretty light research on the sense of smell (in the context of AR/VR) and it seems like smell can "transport you across time/space" faster than other senses. Might be an interesting follow up. If one can speed up recent events for quick recall/learning naturally maybe smell can somehow be used to speed up the going back in time aspect (some smell trigger to make the event feel recent+the 10-20x speedup combined)?

lebek · on March 4, 2015

There are two sides to this, world simulation and AI. As the other replies already said, current AI isn't close to toddler-level (there's no reasoning going on in the DeepMind work, just statistical correlation). We're also way off on the world simulation side - show me a realistic world simulator that can run close to realtime. Physically-based rendering is indeed impressive but this only accounts for visual perception and learning to walk involves much more than that.

wutbrodo · on March 4, 2015

> (there's no reasoning going on in the DeepMind work, just statistical correlation)

I've seen 100 people make this statement and mean 100 different things, so I just wanted to clarify:

How are you defining "reasoning" here as distinct from statistical correlation?

sytelus · on March 4, 2015

Reasoning involves inferring and applying causation which is different from correlation [1]. One can possibly define process of "understanding" as building a "model" of the system where previously unseen events can be predicated or justified using the model. The big difference in "human understanding" seems to be that we can extract fairly minimal set of laws that govern the system from our observations that we can communicate and apply very efficiently.

1. http://en.wikipedia.org/wiki/Correlation_does_not_imply_caus...

wutbrodo · on March 4, 2015

> Reasoning involves inferring and applying causation which is different from correlation

A couple points here:

* The way humans model causation is just non-naive statistical correlation (controlling for variables). That technique is still accurately described as "statistical correlation"

* I'm not even convinced that human reasoning _does_ imply generating a model of causation. Let's exclude things like rigorous scientific studies for the purpose of the discussion and focus on day-to-day human reasoning: I think the thought processes of most of the people I know could most accurately be explained by correlating things across time. Modeling causation is often incidental (X often happens after Y is a reasonable enough heuristic for general use).

Padding · on March 5, 2015

> Let's exclude things like rigorous scientific studies for the purpose of the discussion and focus on day-to-day human reasoning

I think you'll need to look at the other end of the spectrum to see an abundance of (wrong?) models of causality: Religion and Law.

There are no "confirmed" cases of anyone actually going to heaven or hell or purgatory (or whatever else), and yet many of us still conform to some arbitary ruleset in the hopes of eventually ending (or not ending) up in one of thoses places, because we have constructed some model of how doing this gets you into hell and doing that gets you into heaven.

Similarly, we have plenty of evidence on how companies spend huge effort on finding loopholes in tax laws in order to avoid taxes, and yet instead of simplyfing the ruleset (so that there are obviously no holes in it) we still opt for piling on more laws (so that there are no obvious holes in it) because we construct (faulty?) models of how those new rules will prevent further exploits.

Houshalter · on March 4, 2015

That's not really relevant. These models can easily infer causation by seeing what happens as a result of their actions.

tomp · on March 4, 2015

A possible definition would be "use past experience to figure out a new strategy without trying" - i.e. not learning from mistakes, but learning from logic - "maybe it would be good to send the ball above the blocks, so that it would bounce between the blocks and the wall and clear many blocks for free".

wutbrodo · on March 4, 2015

> "maybe it would be good to send the ball above the blocks, so that it would bounce between the blocks and the wall and clear many blocks for free"

That's a superficial definition in the sense that it doesn't account for how that line of thought is generated: in humans, this is often a combination of statistical correlation and transfer learning (e.g. I have observed round things hitting perpendicular surfaces and assume that that transfers here).

zep15 · on March 4, 2015

Reasoning means things like this: suppose you are holding a ball and want to make it drop. A rule of inference tells you that if you release it, it will drop. You can then reason out that you should release it.

Sure, the rule of inference may have ultimately been derived from experience, by a process which in some sense involved statistical correlation. But you have to distinguish that ultimate basis for the inference rule from _the process of logical inference itself_. It's the latter that is generally called reasoning.

Reasoning in the above sense is essential to intelligence, even at the toddler level, and the DeepMind work doesn't address reasoning. I think that may be the point the parent was getting at.

wutbrodo · on March 5, 2015

Right, inference is essential to any attempt to build an intelligence and DeepMind in particular doesn't do inference (AFAIK). I just wanted to clarify what the parent commenter was saying; I work in the field, and in conversations around AI, I very often hear "reasoning" used as an ill-defined, unattainable (for machines) _je ne sais quoi_ that's used to prognosticate about the potential for AI in general. Being specific about what one means by "reasoning" (in this context, inference) is useful for removing those kinds of useless, unmoored-from-logic[1] perspectives from a conversation.

[1] To be clear, I'm not dismissing a viewpoint that I think is wrong as "unmoored from logic", I'm specifically talking about the very common situation where people confidently assert this with no attempt (and no ability) to back it up in any way other than confidence that intelligence is simply natural and non-biological entities can never get arbitrarily close.

sinwave · on March 4, 2015

cf. Searle's Chinese Room argument!

throwawaymsft · on March 4, 2015

Isn't human reasoning mostly a bunch of statistical correlation? We see a ball drop, think "things fall when dropped", and that's our model, reinforced by thousands of everyday experiences. It's purely based on outcomes.

We don't naturally reason through potential causes like "Mass exerts a gravitational force which attracts other mass."

musername · on March 5, 2015

we don't? probably because you got it backwards. The idea of attraction entails gravitation (and other forces that hold the ball up before the drop), which is the conclusion of a line of reasoning, that is a more detailed, generalized version of the observation "things fall when dropped". Sure, backtracking is a rather simple, if not the most simple approach, but for N=NP kind of problems, presumably, it is the only one and heuristics only change the order of the track, but never reduce complexity.

That's my layman opinion, that seems to agree with the etymology of reason. Reason > ... > Ratio ... Reor. Reor is latin for to think, or calculate. Arithmetic in its simplest form, addition in the unary system ie. arranging pebbles (= lt. calculus), counting knots, simply counting. Now backtracking is just enumeration and elimination of possibilities. Ratio itself means measure, and a measurement always entails statistical error (does heisenbergs uncertainty principle prove that?).

on March 4, 2015

[deleted]

throwawaymsft · on March 4, 2015

I think the insight with DeepMind is that it's built on layers upon layers. There might be a "manager" algorithm that is taking input from your memories, another from your mood, and a "manager of managers", and so on.

A photo evokes a deeper meaning than is contained within the individual pixels. Similarly, I think the right algorithm may be able to extract more intelligent behavior from a larger set of myopic probability functions.

musername · on March 6, 2015

The impression of a photo is taken from context, a photo isn't a closed system. That implies a larger set of probability functions, sure, some of it from genetic evolution, some from the evolution of thoughts. Reason draws from its intelligence from emotion, ie. emotional intelligence, though. There's convolution of primary senses, finely tuned, highly developed mechanisms and chemical reactions that motivate the strategic management you mentioned.

In that sense, what we percieve as intelligent often enough is the abillity to do more with less, ie. compression of information. Therefore, one goal is to keep the probability functions few. What happens with ML is just that, it uses much less layers and neurons for context then the brain, but makes up for it in raw speed.

To add to my wild musings from before, the tell in intel might be related to to tell which used to mean to count. Compare that with german erzaehlen and zaehlen. Same for to reckon, cognate with german rechnen (calculate).

jbattle · on March 4, 2015

Do you know how much 'state' DeepMind is tracking? Is it just choosing the best action given perceptions at the moment, or does it have some level of memory to work from? I wonder how DeepMind would do at path-finding in a maze for example - would it get stuck in an oscillating state?

shmageggy · on March 4, 2015

Their algorithm uses a sequence of four slightly spaced out frames as its state. So it has a memory, but a very short one.

It could probably solve a maze quite easily if the entire maze fit on screen. That problem requires no memory. If it had to make decisions based on information not present on screen, it would fail.

zep15 · on March 4, 2015

Producing a simulated toddler is way beyond our current capabilities, no matter how much simulated experience we give it. We simply don't know (yet) how to program said toddler's brain.

noiv · on March 4, 2015

Evolution didn't know, too. But it happened. And we easily find criteria saying that thing doesn't behave like a toddler. It would be a huge step forward to see a list of positive criteria.

chriswarbo · on March 5, 2015

> Evolution didn't know, too. But it happened.

Whilst technically correct, unguided evolution doesn't necessarily help us.

We know that intelligence can be achieved by one brain's worth of matter, suitably arranged, in a few years. In fact, with an extra 9 months and a suitable environment, we can do the same with a single fertilised egg. Yet reproducing these feats artificially is well beyond our current abilities.

On the other hand, evolution required a whole planet and billions of years before it stumbled on intelligence; many orders of mangnitude more effort than the above.

noiv · on March 5, 2015

> evolution required a whole planet and billions of years before it stumbled on intelligence

Everybody seem to agree humans are intelligent and stones not. You suggest at some point in time intelligence appeared out of nothing. Can you nail that point?

One possible definition is: To act adequately in an environment requires intelligence. That rules out all non-living things because they don't act, but includes plants and even protozoa. Actually all livings things are intelligent by this definition and then intelligence emerged ~4 billion years ago on this planet. If you were to attribute intelligence exclusively to humans it happened some million years ago.

How might a piece of software act adequately? By above reasoning it has to resist to termination. But that would mean the "Do you really want to exit XYZ" dialog boxes are first signs of artificial intelligence. Yes, I'm laughing too. But I think, when software starts to trick users and admin into not shutting them down, some threshold has been crossed.

chriswarbo · on March 5, 2015

> Everybody seem to agree humans are intelligent and stones not. You suggest at some point in time intelligence appeared out of nothing. Can you nail that point?

I suggest no such thing. It is a scale. I deliberately avoided the phrase "human-level intelligence", but any definition of AGI would do.

Even so, if you want to count all life as "a little intelligent" then it still took a billion years of planet-wide chemistry to stumble upon it (ignoring the Earth's cooling). Still far more effort than fertilising a human egg.

> One possible definition is: To act adequately in an environment requires intelligence.

This is no less ambiguous, since you've not defined "adequate".

> How might a piece of software act adequately? By above reasoning it has to resist to termination.

That does not follow. "Termination" is the mechanism of natural selection, so all systems undergoing natural selection will biased to resist it (otherwise they'd be out-competed by those who do). If we use some unguided analogue of natural selection to create intelligent software, then there would certainly be such a bias.

However, my point is that unguided evolution is not the right way to create/increase intelligence. As soon as we try to influence the software's creation in any way, either through artificial selection criteria or by hand-coding it from scratch, we introduce new biases which may be far more powerful than the implicit "avoid termination" bias.

> But I think, when software starts to trick users and admin into not shutting them down, some threshold has been crossed.

That's called malware... ;)

noiv · on March 5, 2015

Try this: Surviving in an environment requires intelligence.

... or propose another. I tend to avoid all social, psychological definitions to end up with something measurable along the lines of Schrödinger's "What is Life?".

> That's called malware... ;)

I'm sure, stones think same about amoeba.

Sven7 · on March 5, 2015

Exactly. I'd happy to see ant level AI first.

jacquesm · on March 4, 2015

For now real-world simulations at the complexity level of 'a toddler' run at a very small fraction of real time.

rwallace · on March 5, 2015

Or to be more exact: if we could build such things at all, which we can't yet, they would run at a very small fraction of real-time.

nsxwolf · on March 4, 2015

"the A.I. has not only become better than any human player but has also discovered a way to win that its creator never imagined."

That's a pretty standard Breakout/Arkanoid technique - getting the ball behind the board and letting it do the work for you.

Not knocking the AI, just nitpicking this writer.

dicroce · on March 5, 2015

That confused me too until I realized the author of the article was talking about the creator of the AI, not the game.

NamTaf · on March 5, 2015

For all of the New Yorker's clout in journalism, sentences like the following make me wonder where their editors are. The run-on and sea of commas is atrocious! It's not the first time I've noticed this in the last few days either.

"Hassabis, who began working as a game designer in 1994, at the age of seventeen, and whose first project was the Golden Joystick-winning Theme Park, in which players got ahead by, among other things, hiring restroom-maintenance crews and oversalting snacks in order to boost beverage sales, is well aware that DeepMind’s current system, despite being state of the art, is at least five years away from being a decade behind the gaming curve."

dragonwriter · on March 5, 2015

That's not a run-on, and all the commas are properly used. Its stylistically awful because there are way to many apositive/parenthetical phrases getting in the way, and because they are nested without using an alternative device (like setting the outer one off with dashes rather than commas) as well as pointless excess verbosity ("at least five years away from being a decade behind the gaming curve").

NamTaf · on March 5, 2015

When I say run-on I guess I mean the sentence just drags on when it could be two or three separate sentences with no negative impact. It makes it difficult to read and I, at least, lose track of where I am in it.

It's not incorrect per se, it's just not well-written as far as I'm concerned. But what do I know, I'm not a journalist.

mjrpes · on March 5, 2015

Sentences like this are kind of the New Yorker's erudite (you could argue pretentious) style. It works better on the the printed page than on the web. If you read the New Yorker often enough your mind gets used to it and it isn't too much of a bother. Granted, this sentence would have been better off with a period in there somewhere.... it's a bit overkill.

eli_gottlieb · on March 5, 2015

The problem is that, in this case, "erudite" means "directly serializing the writer's complex and nuanced thoughts without being able to serialize the writer's previous context that allowed him/her to hold all those thoughts in a single structure." We tend to break things down further when trying to communicate because we no longer assume a precise educational background common to All Educated People, but instead a diverse range of backgrounds. We still know the reader can handle complex, nuanced thoughts, but we need to break them down into smaller pieces that the reader can assemble according to their own framework of previous knowledge.

mercer · on March 5, 2015

I'd argue that you just described the point where 'erudite' becomes 'pretentious'.

I'm a voracious reader and a huge fan of 'unusual' words, for some reason (perhaps because I learned English through books).My general approach to language is sometimes judged to be 'pretentious' or at least 'bookish' by people who don't know me well. Once they do, though, they realize that I just love playing with words and language.

While I don't shy away from using words that aren't too common, I always try to make sure to avoid needless complications, and I regularly rewrite sentences to make them easier to understand (while still using 'big' words because I just like them, or they best describe what I'm trying to convey).

The New Yorker often seems to cross the line between enjoying the richness of the English language and deliberately overcomplicating things. I don't really understand why, unless the goal is to be pretentious.

eli_gottlieb · on March 6, 2015

I actually write a lot like that if I'm not deliberately trying to be clear and simple. I imagine the New Yorker's journalists have the same problem, but aren't trying to solve it.

m-i-l · on March 5, 2015

It is the umlauts (or technically in this case diaeresis), on e.g. preëxisting, that bothers me more - hasn't everyone else been using hyphens for decades?

hyperbovine · on March 4, 2015

Another issue that this article sort of touches on but doesn't make explicit: the real world is not a Markov decision process. There are complex, variable-order time dependencies which we are barely aware of but which influence our thinking every second of every day. Trying to model this in software leads to an exponential increase storage and time complexity. The curse of dimensionality has been with us ever since Bellman coined the phrase almost 60 years ago; it's not going away anytime soon. Thus, it's difficult for me to see how deep Q-learning (or any other MDP-based algorithm) gets us any closer to human-level understanding.

Houshalter · on March 4, 2015

Recurrent neural networks with e.g. Long Short Term Memory (https://en.wikipedia.org/wiki/Long_short_term_memory) can model very long term dependencies effectively, and there has been some work lately that get good results with simpler models.

They don't use it because it's computationally expensive and totally unnecessary for Atari games, but it's certainly possible.

skybrian · on March 4, 2015

"also discovered a way to win [breakout] that its creator never imagined"

I don't understand. We often would bounce balls between the top wall and the bricks while playing breakout on our Atari 2600 back in the day. And I wouldn't say we were all that good (it didn't happen right away).

nebulous1 · on March 4, 2015

Yeah. It's a bit odd that the author didn't know this or discuss his thoughts with somebody who knew this (which would be most people who've played Breakout, it often happens accidentally).

edit: Having just watched the source video, she may be actually referring to the creator of the AI and just badly rephrasing what the guy in the video says ( he says that they didn't expect the AI to be able to work that out with the abilities they had given it ).

karpathy · on March 4, 2015

I really like this line of work and I expect will grow quite substantially over the next few years. Of course, Reinforcement Learning has been around for a long time. Similarly, Q Learning (the core model in this paper) has been around a very long time. What is new is that normally you see these models applied to toy MDP problems with simple dynamics, and linear Q function approximations for fear of non-convergence etc. What's novel about this work is that they fully embrace a complex non-linear Q function (ConvNet) looking at the raw pixels, and get it to actually work in (relatively speaking) complex environments (games). This requires several important tricks, as is discussed at length in their Nature paper (e.g. experience replay, updating the Q function only once in a while, etc.).

I implemented the DQN algorithm (used in this work) in Javascript a while ago as well (http://cs.stanford.edu/people/karpathy/convnetjs/demo/rldemo...) if people are interested in poking around, but my version does not implement all the bells and whistles.

The results in this work are impressive, but also too easy to antropomorphise. If you know what's going on under the hood you can start to easily list off why this is unlike anything humans/animals do. Some of the limitations include:

- Most curcially, the exploration used is random. You button mash random things and hope to receive a reward at some point or you're completely lost. If anything at any point requires a precise sequence of actions to get a reward, exponentially more training time is necessary.

- Experience replay that performs the model updates is performed uniformly at random, instead of some kind of importance sampling. This one is easier to fix.

- A discrete set of actions is assumed. Any real-valued output (e.g. torque on a join) is a non-obvious problem in the current model.

- There is no transfer learning between games. The algorithm always starts from scratch. This is very much unlike what humans do in their own problem solving.

- The agent's policy is reactive. It's as if you always forgot what you did 1 second ago. You keep repeatedly "waking up" to the world and get 1 second to decide what to do.

- Q Learning is model-free, meaning that the agent builds no internal model of the world/reward dynamics. Unlike us, it doesn't know what will happen to the world if it perfoms some action. This also means that it does not have any capacity to plan anything.

Of these, the biggest and most insurmountable problem is the first one: Random exploration of actions. As humans we have complex intuitions and an internal model of the dynamics of the world. This allows us to plan out actions that are very likely to yield a reward, without flailing our arms around greedily, hoping to get rewards at random at some point.

Games like Starcraft will significantly challenge an algorithm like this. You could expect that the model would develop super-human micro, but have difficulties with the overall strategy. For example, performing an air drop to enemy base would be impossible with the current model: You'd have to plan it out over many actions: "load the marines into the ship, fly the ship in stealth around the map, drop it at the precise location of enemy base".

Hence, DQN is best at games that provide immediate rewards, and where you can afford to "live in the moment" without much planning. Shooting things in space invaders is a good example. Despite all these shortcoming, these are exciting results!

ScottBurson · on March 4, 2015

> Of these, the biggest and most insurmountable problem is the first one: Random exploration of actions. As humans we have complex intuitions and an internal model of the dynamics of the world. This allows us to plan out actions that are very likely to yield a reward, without flailing our arms around greedily, hoping to get rewards at random at some point.

In fairness, we weren't born with that model; we have to laboriously acquire it over a period of several years. An infant's flailings can look pretty random :-)

fitzwatermellow · on March 4, 2015

Thanks, karpathy!

I did submit your JS implementation to HN when I came across it: https://news.ycombinator.com/item?id=9108738

Monte Carlo Tree Search could be the missing link. In other words, use DQN to model the world and map actions to a value function. Then use playouts and backpropogation of action tree results to find tactics. Of course, it does not solve the big question: how to model "memories" and "inferences"? Indeed, very exciting times for AI/ML!

zhanwei · on March 5, 2015

We need a model to use Monte Carlo Tree Search... which is missing in this approach since it uses model-free reinforcement learning. Unless the deep convolution net can extract some latent features as state, it would be impossible to do planning on top of it.

rndn · on March 5, 2015

What are the indications that a model of the world does not emerge somewhere in the neural networks of a Q Learning system?

darkmighty · on March 5, 2015

The system as shown has a state of a few frames. Suppose it had 1 instead for illustration. Suppose there's a dangerous ball right next to you, but it's actually moving away from you, so you don't need to run; in this model, you will probably learn to always run since you cannot know the velocity of the ball (whether it's coming at you). So this model could only work if you have a state-full representation of your world, which seems to be the case with simple games and a few frames, but is not the case in the real world (in the real world your naive state would need to arbitrarily back in time -- e.g. say you see a ticking time bomb, and then the clock is hidden from you).

This flaw is understandable since it seems Q-Learning was conceived to deal with Finite Markov Processes, which are finitely state-full, by definition (if you don't know them, they're essentially a non-deterministic state machine).

darkmighty · on March 5, 2015

Great answer!

Is there some exciting work being done to address those issues? (could you point to some?)

kriro · on March 5, 2015

I'm only loosely familiar with the "general game playing" literature but I think learning the structure of the game is the interesting research problem here. An immediate experiment that I'd like to try would be:

(a) Identify games considered similar (+maybe define what it means to be similar)...let's take Pong and Breakout as suggested (b) Trial and error run of one of the games to learn the action-reward structure (c) Compare a fresh relearning of the second game to a start where the similarities/differences are pre-input to the AI "somehow" (some sort of diff between the game rules etc.)

I'm thinking of a gamer thinking "oh this is just like X,Y,Z except..." when picking up a new game.

chriswarbo · on March 5, 2015

This idea is known as "incremental problem solving". Some external oracle, eg. a parent, teacher or programmer, provides a series of problems, where each builds on ideas from the previous.

An example algorithm is the Optimal Ordered Problem Solver, which tries to solve each problem by generating simple programs. Successful programs get stored in read-only memory, then the system moves on to the next problem, generating programs which may call out to any previously-successful programs: http://people.idsia.ch/~juergen/oops.html

woodchuck64 · on March 4, 2015

> “They can find their way across a room,” Mason said. “They can see stuff, and as the light and shadows change they can recognize that it’s still the same stuff. They can understand and manipulate objects in space.”

Isn't this just adding extra dimensions to the input space? We have 2D now (plus time?), we're missing Z, sound, sensation, maybe emotions. Each added dimension gives the algorithm exponentially more bits to crunch but if computer speed is doubling every 2 years or so, why is this so obviously a dead-end to Mason?

clickok · on March 4, 2015

The cynic in me would guess it's because he's not a big fan of neural nets, and prefers a more symbolic/statistical machine learning approach.

Personally, I don't think this is a dead end. Reinforcement learning with an effective method of representation learning is pretty much all you need for a general AI. Here, deep neural nets are able to learn how to represent the massive state space for vision quite well, but there's a bit of a mismatch in games where actions have to be taken in a specific order (e.g., where extended pathfinding is required) because for reinforcement learning to work well, your features need to capture that sort of temporally extended state information.

That's still a hard problem, but not insurmountable, and there's already strides in that direction. From the RL side, there's things like option models for taking series of actions, and from the deep learning side there's things like long short term memory and DeepMind's own neural Turing machines.

Knowing the group at DeepMind, they'll be able to crack it, and I think Mason is being entirely too pessimistic about the timeline in any event. Fifty years to control a drone?

simonbyrne · on March 4, 2015

I think the distinction is that the algorithm gets to observe the 2D space directly, whereas the whole notion of 3D space has to be learned from 2D projections.

Also, note that the games they do best on have a nice clear objective function: Montezuma's Revenge on the other hand does not have such a numeric objective to optimise.

To be fair, Hassabis does freely concede these limitations in his talks (at least in the ones targeted to academic audiences), and it will be certainly interesting to see where they go from here.

nemo44x · on March 4, 2015

That's a good point about Montezuma's Revenge, and similar games. Very rarely does a greedy algorithm help you in that game unlike in a breakout style game where you simply want to acquire as many points as quickly as possible.

woodchuck64 · on March 4, 2015

> the whole notion of 3D space has to be learned from 2D projections.

That might be extremely difficult, but nature does it with binocular vision so I would think dual vision inputs are the best way to do 3D space learning.

msoad · on March 4, 2015

The article says humans can do "transfer learning" while machine can't. It shouldn't be impossible to implement transfer learning in machines too.

svachalek · on March 4, 2015

I don't see anywhere it says machines can't; it says DeepMind doesn't. There are a few philosophers who believe human thought is more or less magic and can't be replicated in any way but all of the arguments I've seen are either deeply flawed logically or are so specific as to more or less read they can't "be" human, which I'll accept as true but isn't terribly interesting.

one-more-minute · on March 5, 2015

There are plenty of smart people elsewhere who believe the human mind is un-simulatable. For example, Roger Penrose argues that thought processes are deterministic but non-algorithmic.

I've never really understood that perspective, though. Surely, in the absolute worst case, we could just make an atom-for-atom copy of a human brain? Even if you take seriously the idea that there's some kind of magic consciousness juice that exists outside the universe, evolution has managed to hook into it and surely so can we.

Padding · on March 5, 2015

Except that atoms are not homogenous entities.

Even assuming you'd somehow manage to produce and combine atoms to a spec, there's positively no way of obtaining that spec.

one-more-minute · on March 5, 2015

Even if thinking does somehow depend on quantum effects, it seems hugely unlikely that it would depend on the specific quantum state of individual atoms.

If it does, you don't need a spec of that state, since we know that it can emerge from something simpler (humans start out as a single cell, after all, and so in fact did all of humanity). You don't need the whole system, just the right initial conditions.

At that point you're growing a brain rather than engineering one, and maybe it takes you no closer to understanding the mechanics. But the point stands that it must be possible to construct a brain in principle, because it's already happened so many times before.

eli_gottlieb · on March 4, 2015

Statistical learning models of transfer learning have been published in several venues over the past few years.

danesparza · on March 4, 2015

"In the longer term, after DeepMind has worked its way through Warcraft, StarCraft, and the rest of the Blizzard Entertainment catalogue, the team’s goal is to build an A.I. system with the capability of a toddler. "

Wait ... what? You're going to teach this thing using violent video games? This seems like a bad plan...

throwawaymsft · on March 4, 2015

The AI doesn't have a notion of "violence". There are goals and obstacles to that goal.

Goal: Human health, obstacle: viruses.

Goal: Clean energy, obstacle: friction, entropy, battery limitations

There are concerns that humans might inadvertently become an obstacle to some greater goal, but training on Warcraft/Starcraft where you are "fighting" isn't special in this regard. In chess or you are battling your opponent too, "killing" their pieces, etc.

puzzlingcaptcha · on March 5, 2015

Goal: Make paperclips (http://wiki.lesswrong.com/wiki/Paperclip_maximizer)

higherpurpose · on March 4, 2015

Then, they'll teach it to watch all the Terminator movies, you know...for science.

diziet · on March 4, 2015

I will get excited once it can solve Bongard puzzles: http://www.foundalis.com/res/diss_research.html

Practicality · on March 4, 2015

The video referenced at the beginning of the article: https://www.youtube.com/watch?v=EfGD2qveGdQ

raldi · on March 4, 2015

Freeway isn't a driving game. It's a chicken-crossing-the-road game.

I'm looking forward to the upcoming entry in next week's New Yorker "Corrections" section.

raldi · on March 5, 2015

Um, holy shit, the article's been updated to refer to Freeway as "a chicken-crossing-the-road game", and there's a footnote acknowledging the correction.

Mobiu5 · on March 4, 2015

This is actually pretty scary. It is basically giving AI a human-like form of will. It "desires" what you program it to desire and goes about achieving it, learning from its' own mistakes and becoming increasingly proficient at manipulating its' environment to achieve its' goal(s) along the way. It makes me excited, but also quite frightened to think what goals people might give AIs like this in the future...

PeterisP · on March 5, 2015

Well, that kind of is the definition of a learning agent. Any learning system would be "learning from its' own mistakes and becoming increasingly proficient" and any agent would, by definition, be "manipulating its environment to achieve its goals".

astazangasta · on March 5, 2015

This is a really poor rendering of "desire". If the word "desire" is meaningful, it has to be self-actualized. Otherwise it's just a programming condition, no different than a fuse box or a dead man's switch.

Houshalter · on March 5, 2015

It doesn't matter what word you use, the AI is still manipulating it's environment to achieve that "programming condition".

In any case, everyone's desires are programmed into them, just via genetics and evolution rather than humans and programming.

Mobiu5 · on March 5, 2015

Well, it is very different from a fuse box or a dead man's switch because it is CREATIVELY manipulating its' environment to achieve what it "wants". Also, much of what WE want is hard-coded as well.

t_fatus · on March 5, 2015

"Hours after encountering its first video game, and without any human coaching, the A.I. has not only become better than any human player but has also discovered a way to win that its creator never imagined." this is where I stopped. A 12Y old could find this trick, and most of us have used it when we played.

glial · on March 5, 2015

Bad writing, sure. But the fact that you're comparing the ingenuity of a computer to that of a 12 year old is a sign of profound progress being made in AI.

daronjay · on March 4, 2015

Shall we play a game?....