> AlphaGo can play very well on a 19x19 board but actually has to be retrained to play on a rectangular board.
This right here is the soft underbelly of the entire “machine learning as step towards AGI” hype machine, fueled in no small part by DeepMind and its flashy but misleading demos.
Once a human learns chess, you can give her a 10x10 board and she will perform at nearly the same skill level with zero retraining.
Give the same challenge to DeepMind’s “superhuman” game-playing machine and it will be an absolute patzer.
This is an obvious indicator that the state of the art in so-called “machine learning” doesn’t involve any actual learning in the sense the word is normally applied to intelligent systems like humans or animals.
I am continually amazed by the failure of otherwise exceedingly intelligent tech people to grasp this problem.
Try learning to ride a bike with inverted steering, try to navigate the world with your vision flipped over, or use your non-dominant hand to do things that you normally do. Or, if you are a QWERTY native, try to write on an AZERTY keyboard (really, fk AZERTY :P).
Humans are also not a general intelligence.
In a certain sense, Deep Reinforcement Learning is actually more general than human intelligence. For example, when playing games you can remove certain visual cues. That makes the game almost impossible for humans to play, while Deep RL scores will not even budge. It means that Deep RL is more general, because it does not rely on certain priors, but it also makes it more stupid in the narrow domain of human expertise. Try this game to see for yourself: https://high-level-4.herokuapp.com/experiment
Human brains are amazing, but they also require a certain amount of time to retrain when inputs/outputs are fundamentally changed.
PS. I haven't heard of anyone testing different board sizes with AlphaZero-esque computer players. But I have seen Leela Zero beating very strong humans when the rules of the game were modified so that the human player could play 2 additional moves: https://www.youtube.com/watch?v=UFOyzU506pY
This is true too: we are adapted to our environment, in particular in the things that we do automatically (System 1).
Playing chess well is a combination of both conscious and unconscious skills. However when deep learning systems play, it is all the unconscious, automatic application of statistical rules. They are playing a very different game from the human chess game.
Because there is no abstract reasoning involved here, these systems cannot apply the lessons learned from chess to another board game, or to something completely different in life, which humans can and do. So even though they are much stronger than human players, they aren't strong in the same way.
>Once a human learns chess, you can give her a 10x10 board and she will perform at nearly the same skill level with zero retraining.
Interesting. Has this actually been shown? I would assume a lot of the strategies a human is familiar with would fall apart as well. I'm no chess or go player but I would have to learn new strategies in a tic-tac-toe game scaled to 10x10. I would certainly not be as proficient although I would still consider myself to have intelligence.
Almost all of the human strategies and concepts would still apply: center control, square control, development, initiative, king safety, the opposition, etc. The only exceptions would be fringe concepts like opening theory (already moot in Chess960) and endgame edge cases.
If you’re still not convinced, I’ll prove that skills transfer by playing bullet against anyone who can make a 10x10 variant playable online.
I'm pretty sure this isn't how chess masters actually play chess. That'd be too slow and error prone. They pattern match very heavily in the beginning and towards the end. And all those patterns would be wrong on a larger board with more chess pieces. At least for chess.
I'm not a chess master myself, my knowledge of this topic is from what I've read of Kasparov. The guy is all about patterns. He even advises to play using a physical board as much as possible, to improve _visual_ recognition of patterns. Thing is, you might not even recognize this as "pattern recognition" per se. Call it "intuition", call it "experience", or whatever you like, but combinatorially I'm pretty sure you're not searching the entire tree of possible positions several moves ahead - that's literally impossible for a human to do in the finite time allotted to a game. You're relying on patterns to constrain the search, much like a modern neural algorithm would constrain its search using a cost function. That's what's meant by "pattern recognition" here, not rigid recognition of fixed positions. That is also combinatorially impossible for a human to precisely memorize.
"Center control", "square control", etc. all sound to me like things discovered by pattern matching. Yes, the patterns are large and somewhat abstract, but they're still patterns.
How? Remember, there'd have to be more pieces, and the new pieces could have different moves. The game would also be dramatically more difficult, combinatorially.
Chess is already very difficult, combinatorially. Chess players learn patterns with the current pieces. Those patterns include patterns that work both locally and globally. When learning a particular pattern, chess players typically can generalize that pattern to other cases. (This is what tactics training is all about. You're almost never going to find the exact same pattern that you learn in tactics training in the real world but tactics training will help you recognize similar patterns.) New pieces are going to give you new patterns you have to learn. But ranks, files, and diagonals are not going away. New pieces will likely be a bishop-knight combo piece and a rook-knight combo piece as in Seirawan chess. Therefore, they will still have common patterns you can recognize. The larger board is not a non-issue but it's not anywhere near as drastic a change as you're making it out to be. Most tactics in chess don't make use of the fact that the board is 8x8 rather than 10x10. They'll work in both boards.
I'm not saying I'm definitely going to do this, but is there a rulebook somewhere for 10x10 chess? (What are the initial piece positions, and how would castling work?)
I’m not aware of 10x10 chess ever being attempted, let alone codified. Here’s my suggestion: add another set of bishops (or knights, or one of each) in between the rooks and the king/queen. Castling works the same; the king goes around the rook.
Former world chess champion Capablanca suggested a 10x8 board with two additional pieces in the 1920s, but there have been many variants proposed earlier and later [0]. Grand Chess [1] is the best-known 10x10 variant, also with 2 additional pieces and a different starting position; castling is not allowed. See Wikipedia for links to programs implementing these rules.
Interesting. Note that both of those variants involve introducing new pieces. I would argue that this changes the fundamental nature of the game in a way that increasing board size alone doesn’t. The reason is that I (as a human chess master) would need to retrain myself to learn the new piece movements.
What I really want is a 10x10 or even 8x10 board using the original set of pieces. This would be sufficient to prove that human chess masters can adapt in a way that machine-learning based algorithms cannot.
Wouldn't adding more of the existing pieces in the back row change the game a bit too? Which piece(s) would you suggest having more of to accommodate the extra squares?
It would change the game but not enough to reduce the performance gap between an expert and a novice. See my grandparent comment for a specific suggestion on which pieces to use.
Generalized NxN chess is a thing I've seen talked about in complexity theory. I'm not sure if anyone has actually made a proper set of rules for it though. It looks like they often just don't care about such trivialities as starting positions (and probably castling). E.g. http://www.ms.mff.cuni.cz/~truno7am/slozitostHer/chessExptim...
Why stick to chess? Magic: the Gathering is a game that is played with cards with printed rules text that describes how a card should be played. The set of cards is constantly updated with a few hundred new cards introduced at least twice a year (although games are often played with only a subset of all cards).
Despite the constant change of the card pool, and also the wording of the rules text on the cards, and the rules themselves, human players are perfectly capable of "picking up a card they've never seen before and playing it" correctly.
Perhaps a better example than 10x10 chess would be bughouse chess [1]. That's a chess variant played between two teams of two players using two sets and two clocks. It's a common break activity between rounds at amateur chess tournaments. Human chess players of all levels pick it up pretty fast after they play a handful of games.
I think a lot of AI research now is very narrow, but this ignores that there's also a lot of research in RL/etc that's working to solve the problem of generalization.
Meta learning is for solving similar problems from a distribution (like different sized boards in your chess example) and has taken off recently (only baby steps so far though). Modular learning is also becoming big, where concepts that are repeatedly used are stored/generalized.
Of course if you mess with a function's inputs in ways it's never seen it's going to "not understand" what's going on. This is an agent which only knows 8x8 space.
Train it on variable spaces, and you'll get an agent that can play on variable spaces. In fact, you can probably speed things up drastically by using transfer learning from a model which already learned 8x8 space and modifying the inputs and outputs to match the new state and action space.
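To make that concrete, here's a minimal sketch (PyTorch; the layer sizes, file name, and number of input planes are all made up for illustration) of the kind of transfer learning being described: keep a fully convolutional trunk trained on 8x8 positions, bolt on a freshly initialized head sized for a 10x10 board, and fine-tune.

```python
# Hypothetical sketch: reuse an 8x8-trained convolutional trunk for a 10x10 board.
import torch
import torch.nn as nn

class BoardNet(nn.Module):
    def __init__(self, board_size: int, channels: int = 64):
        super().__init__()
        # The trunk is fully convolutional, so its weights are board-size agnostic.
        self.trunk = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        # The policy head is tied to a specific board size.
        self.policy = nn.Linear(channels * board_size * board_size,
                                board_size * board_size)

    def forward(self, x):
        h = self.trunk(x)
        return self.policy(h.flatten(1))

# Pretrained 8x8 model (weights and path are hypothetical).
old = BoardNet(board_size=8)
# old.load_state_dict(torch.load("net_8x8.pt"))

# New 10x10 model: copy the trunk, leave the head freshly initialized.
new = BoardNet(board_size=10)
new.trunk.load_state_dict(old.trunk.state_dict())

# Fine-tune: optionally freeze the trunk and train only the new head at first.
for p in new.trunk.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(
    filter(lambda p: p.requires_grad, new.parameters()), lr=1e-3)

# Dummy forward pass on a 10x10 position (batch of 1, 3 input planes).
logits = new(torch.zeros(1, 3, 10, 10))
print(logits.shape)  # torch.Size([1, 100])
```

The trunk carries over because convolutions don't care about board size; only the head, which is tied to the number of squares, has to be relearned.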
What part of this do you think "exceedingly intelligent tech people" aren't grasping? Something qualitative? Do you think people in machine learning think of "learning" as literally meaning the same thing as the colloquial usage? What, precisely, are you attacking here? All the harsh anti-machine-learning viewpoints with no clarity are becoming exhausting.
The GP described the "hype machine" and the implication that deep learning is a step towards AGI. As far as I can tell, the "hype machine" is real in the sense that popular articles describe current methods as steps towards our broad concept of intelligence.
Certainly, someone close enough to the technical process of deep learning will admit that it is essentially an extension of logistic regression without any "larger" implications - at least some deep learning researchers are always clear to distinguish the activity from "human intelligence" (and even if a given researcher never parrots the hype train's mantra, they know it's there and inherently play some part).
But a more minimal assertion about deep learning is that it "generalizes well". And what does "well" mean in this context? In the few situations where data can be generated by the process, like AlphaGo, it can make a good average approximation of a function, but in most situations of deep learning it means "generalizes like a human" - especially in image recognition.
This comes together in the process of training AIs. Researchers take data that they hope represents a pattern of inputs and outputs in a human decision-making process and assume they can construct a good approximation of a function that underlies this data. A variety of things can go wrong: the input data can be selective in ways the researchers don't understand (there was a discussion about a large database of images from the net being biased just by the tendency of photographers to center their main subject), there can be no unambiguous "function" (a loan/parole AI that's inherently biased because it associates data that isn't a legitimate, objective criterion for the decision sought), and so forth. Some tech people are aware of the problems here too, but this stuff is going out the door and being used in decisions affecting people's lives. Merely noting possible problems isn't enough here. These "exceedingly smart people" are still handing off their creations to other people who take them as something akin to miraculous decision makers.
The question we should be asking is how much retraining had to occur to accomplish the new task. If it's significantly less than what it took to accomplish the original task, the algorithm has transferred its latent knowledge from the original task to the new task, which is significant.
Humans have orders of magnitude more neurons, more complicated neurons, more intricate neural structures, and their training data is larger and more varied.
Well if arguments by reduction are on the table then "learning" is just a fancy way to say "complex chemical process in the hippocampus that we don't understand that well".
Agreed, except on the training data being larger. Training data is often far smaller for humans. You probably saw a few cats before generalizing and understanding what a cat looks like. A neural net might require hundreds of thousands of samples, if not more, to be a robust classifier for cats. AlphaGo et al. look at tens of millions of games; humans look at a small fraction.
This doesn't have much to do with the algorithms, and is more to do with the engineering decisions that went into AlphaGo and AlphaZero. They are designed to play one combinatorial game really well. With a bit of additional effort and a lot of additional compute, you could expand the model to account for multiple rule / scale variations, maybe even different combinatorial games.
I think it's quite important to look at the distinction between the actual agent in play and the learning algorithm used.
The learning algorithm AlphaGo uses is somewhat general, and can handle different games (e.g. you can put chess or Go through the algorithm and it functions well for either).
The output of this algorithm, however, is a specialised agent. The agent is not general. If I create a chess agent and give it Go or chess with different rules, it will perform very poorly.
Creating general learning algorithms is arguably a somewhat easier task than creating a general agent, since learning algorithms are typically run for a long time while an agent often has to make time constrained decisions.
The holy grail of AGI is to make the learning algorithm and the agent the same thing, and have them be general. Then you have an agent which can rapidly adapt to its environment and self-modify as needed. In terms of current research, we are still a long way from a system that can do this.
The distinction you’re making between agent and algorithm is meaningless for the point I was trying to make, which is that the only thing this DeepMind research (agent, algorithm, whatever) and AGI have in common is the word “general”.
Their “general learning” tech doesn’t even generalize to barely modified variants of the original games it has claimed to master. I call bullshit.
> Their “general learning” tech doesn’t even generalize to barely modified variants of the original games it has claimed to master. I call bullshit.
But the point I was making is precisely that the "general learning" tech is in fact somewhat general. AlphaGo and certainly AlphaZero's learning tech generalises to Go, chess, and a few other games. That's relatively general in the domain of board games, in my humble opinion.
The reason this isn't close to AGI is because it's not the agent doing the learning, and so while a relatively general learning algorithm produces the agent, the agent itself is not general even in the field of board games.
You appear to be completely missing the point of my root comment, which is that AlphaGo’s tech isn’t nearly as general as it’s made out to be, even if you stick to Go.
> AlphaGo can play very well on a 19x19 board but actually has to be retrained to play on a rectangular board.
It doesn’t even generalize to the same game with a different board shape. Whereas a human Go master could easily do so.
DeepMind is essentially hacking the common usage of the word “general” so that they can make claims about “general” intelligence. And it’s working!
But the training process does generalise. The same training process produces an agent that works on a 19x19 board, or a standard Go board, or even a game of chess.
How is that not general? Sure it doesn't work for all problems but in the domain of board games it definitely feels very general.
The agent the training algorithm produces may not be general, but out of what I've read I've only ever seen DeepMind claim generality of the learning algorithm, not the agent.
I think the GP was noting the problem that AI can easily encounter situations beyond what it was designed for and simply fail, while human intelligence involves a more robust combination of behaviors and thus humans can generalize in a much wider variety of situations.
If the system designer has to know the parameters of the challenge the system is up against, it should be obvious you can always add another parameter that the designer didn't know about and get a situation where the system will fail. This is much more of a problem in "real world situations", which no designer can fully describe.
I'm coming to suspect that even our data isn't enough for useful AI. Imagine you had a truly general sci-fi AI at your office. It still couldn't just look at your database and answer a simple question like "What was the difference in client churn rates between mobile and desktop last month?" or "What was the effect of experiment 1234 on per-client revenue?" Hell, a human couldn't do it. As far as the human or AI would know, you just presented it with a bunch of random tables. This matters because it's incredibly helpful to know which pieces are randomized. Which rows are repeated measurements as opposed to independent measurements. Which pieces are upstream of which others. There's so much domain knowledge baked into data, while we just expect an algorithm to learn from a simple table of floats.
The human state of the art solution seems to be going on slack and asking questions about the data provenance, which will decidedly not work for an automated approach.
A primary reason I can do a better job than a generic algorithm is because you told me where the data came from (or I designed the schema and ETL myself), while the algo can't make any useful assumptions because all that info is hidden.
I'm beginning to suspect that it could be relevant soon. If you wait for such advanced AI that can understand your documentation, you're making it harder than it has to be, kinda like Marcus is saying in OP. You could solve this with just tons of raw data, but that seems unnecessarily hard. For a firm with the usual small dataset, maybe even unrealistically hard.
Anyways, with some naive googling I found these references which seem interesting with regards to lineage and causality (for the query "lineage causal database schema"):
[0] Causality and Explanations in Databases. The "Related topics in Databases" section seems interesting.
[1] Duke's 'Understanding Data: Theory and Applications, Lecture 16: Causality in Databases' (by one of [0]'s authors).
[2] Quantifying Causal Effects on Query Answering in Databases. This has some interesting definitions.
[3] Causality in Databases. Seems like a more in depth version of [0].
[4] A whole course on "Provenance and lineage"
[5] Causality and the Semantics of Provenance. Defines "provenance graphs" and some properties.
If you don't know the sampling regime that generated the data, you might as well give up. The only solutions to this seem to be collecting and providing all the data to ML analysis, or using provenance to inform the training procedure, which requires human expertise.
However, with a more general AI, we would be able to tell it "this is where and how the data were collected" and it could make the necessary inferences. Fully general AI would also be able to ask the right questions and make reasonable guesses on its own. Everything you do now, and nothing like anything that's been developed.
To me the weak point of this article isn't in the thesis (of course everyone can agree that general intelligence is more useful than narrow, all else being equal) but that there was nothing said about how to get there. The only reason ML is currently resurgent is that we've figured out how to do something that works, while general intelligence has proven beyond our reach for 60+ years.
Edit: actually it seems to be "classical AI" and hybrid approaches and I guess for more details one would need to read the book.
I guess articles like this are worthwhile to temper expectations of what's possible with the current crop of technologies for those not in the field, which could help prevent another winter due to overinflated expectations.
I totally agree with your statement about AGI, but wouldn't be as pessimistic about neural networks in general. Of course our data is enough for useful deep neural models! Many problems can be solved without them, but in areas of computer vision and speech recognition they seem to be the best (currently known) choice.
Your point about AGI which needs to ask questions about data provenance is super interesting. Are you aware of the line of inquiry into active learning? It's fascinating and has a long history:
https://papers.nips.cc/paper/1011-active-learning-with-stati...
The point is that data provenance isn't encoded in the data. Like the difference between a schema where column X has type `int` versus X having type `do(int)` or something cleverer. If the way you get your causal model is to ask the person who ran the experiment, then it's very much an uphill battle for an algorithm to get a causal model. We want to enable automated causal inference, so we should better record our causal models (data lineage).
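As a toy illustration of that idea (plain Python; the column names and the `provenance` tag are hypothetical, not any existing standard), the schema could carry a provenance annotation per column so an automated analysis can at least tell which comparisons are backed by randomization:

```python
# Hypothetical sketch: annotate each column with its provenance (randomized vs.
# merely observed) so a tool can tell which comparisons are causal.
from dataclasses import dataclass

@dataclass
class Column:
    name: str
    dtype: str
    provenance: str  # "randomized" (set by an experiment) or "observed"

schema = [
    Column("experiment_1234_arm", "int", "randomized"),  # assigned by an A/B test
    Column("platform", "str", "observed"),                # self-selected: mobile vs desktop
    Column("revenue", "float", "observed"),
]

def safe_to_compare(treatment: str, outcome: str, schema: list) -> bool:
    """A naive difference in means for `outcome` across `treatment` is only an
    unbiased causal estimate if the treatment column was randomized."""
    cols = {c.name: c for c in schema}
    return cols[treatment].provenance == "randomized"

print(safe_to_compare("experiment_1234_arm", "revenue", schema))  # True
print(safe_to_compare("platform", "revenue", schema))             # False: confounded
```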
I've been waiting for the Symbolic/NN pendulum to start swinging back the other way and settle in the center. NN/DL is great for the interface between the outer world and the inner world of the mind (pattern recognition and re-construction), while symbolic AI more straightforwardly represents "language of the mind" tasks, and easily handles issues like explanation and other meta-behaviors that are difficult with DL due to its black-box nature. DL's reliance on extension/training vs. intention/rules lets it develop ad-hoc emergent theories, which is its strength but also its weakness, as these theories may not be correct or complete. Each approach can be brittle in its own way - so it'll be interesting to see more cross-pollination.
Lenat's Cyc? I'm quite familiar. I hadn't looked at it in about 25 years until I just happened to come across the ATT-CYC docs a few days ago (I seem to be missing Part 2) and a printout of a PPT that he gave group of us at Microsoft in 1994/5 or so.
SAT solvers are really fast now. Some sort of "neural SAT problem definition" followed by solving it seems to be an interesting direction, but I'm relatively naive on it all. Not sure how training would work since there's no backprop through Boolean logic.
I'm not an expert but it seems like whatever symbolic reasoning humans have is pretty rudimentary anyways compared to what we're doing on computers already, so I could see the union being very powerful.
I started reading Rebooting AI last night. I think that Marcus and Davis (so far in the book) take a reasonable approach by wanting to design robust AI. Robust AI requires general real world intelligence that is not provided by deep learning.
I have earned over 90% of my income over the last five or six years as a deep learning practitioner. I am a fan of DL based on great results for perception tasks as well as solid NLP results like using BERT like models for things like anaphora resolution.
But, I am in agreement with Marcus and Davis that our long term research priorities are wrong.
As much as I'm hoping there'll be a breakthrough in AGI, maybe the right approach is the one AlphaGo was using: DL not as the top-level decision-making, but plugged into a traditional decision-making algorithm in specific places.
Yes. Though arguably this is just the same thing as the traditional approach to deep learning, which is to select features to input into a model. You don't always have to train on raw information: pre-processing the input data, calculating specific features that we know, as humans, will be important for the final result, and feeding those in addition to the raw data is a common approach. I don't see much difference between this and taking the output of a network and running it through a few decision trees. Most publishable projects applying AI typically have this type of human involvement on both sides of the model.
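A rough sketch of that hybrid pattern (scikit-learn on synthetic data; the `nn_score` column is just a stand-in for a network's output and the target is made up): concatenate hand-crafted features with the model's score and let a shallow decision tree make the final call.

```python
# Hypothetical sketch: combine a learned score with engineered features,
# then let a simple decision tree make the final decision.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 500

# Stand-in for a neural network's output (e.g. a predicted probability per example).
nn_score = rng.random((n, 1))
# Hand-engineered features we believe matter for the final decision.
handcrafted = rng.normal(size=(n, 3))

X = np.hstack([nn_score, handcrafted])                              # combine both signals
y = (nn_score[:, 0] + 0.5 * handcrafted[:, 0] > 0.7).astype(int)    # toy target

clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(clf.score(X, y))
```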
> As much as I'm hoping there'll be a breakthrough in AGI
I think it probably won't be one breakthrough, but several, over decades. Personally, I'm pretty happy that AGI is taking a long time to materialize. We likely won't see a "fast takeoff scenario" (the computer is learning at a geometric rate !!1). It will likely happen gradually over years (progressively more intelligent, more aware computer systems), and we may have a chance to adapt in response.
A business professor told me that cars were entirely incremental innovation all the way from the model T. Just little improvements, one at a time. I don't know if that's true, but I wonder if it will be an apt analogy for AGI - one feature at a time, and older attempts at it just look outdated.
Perhaps, although the automobile has always had a clear explicit purpose: convert potential energy into rotational energy in a controlled manner. It’s hard to identify such a singular purpose for AGI.
As someone who once got stuck in an intersection after flooding a '79 Monte Carlo's carburetor while deciding to go on red, I'm with you on fuel injection. But I could see the counter-argument that something that makes an experience nicer is not radical innovation. That old junker got me where I needed to go for a while.
I think that’s fair; deep learning today has an issue with learning guide rails, and obviously it is only as good as the data you feed it. I think it’s fair that our models need more
Making 90% of your income off of this tech over the last n years is different than that tech being successful. I work at a very large company that is trying to use ML and AI in all kinds of places. The trend I am seeing is that most of that effort is falling flat, really flat, in fact. They have success in places where regular algorithms would also succeed, but just having regular developers design matching systems and do pretty basic statistics isn't sexy in terms of marketing, so they hush it all up and pretend that ML and neural nets and things are the only way forwards. I don't think it is. Our problem is that we have too many ETL robots who aren't very intelligent people and not very forward-thinking themselves!
The cutting-edge NLP stuff just showcased at my company was pretty lame, too. I barely saw any statistically significant results at all, and yet they rather unscientifically proclaim success because they got any effect at all. Some of what we do in our field doesn't matter because it comes down to whether a customer got a 2nd call back and got converted to some minor sale or added to a program for them. It's throwaway work that creates goodwill at conferences and talks. We make a big deal out of it.
We are spending hundreds of millions on projects, trying to save money on generating leads, reducing interactions with customers and vendors through staffed phone banks, and so on. My company has hired all kinds of academics and research type people and has given them titles of "Distinguished this" and "Principal that" and honestly there's not that much to show for it, maybe zero direct outcomes so far. What galls me the most is that in all the conferences and demos they are showing off things like high-school robotics vehicles and AI parlor tricks, and astonishingly little has translated into the business we do. Meanwhile, there are people in the company who do know how to reduce costs and get more done and have outstanding outcomes, but their techniques are not sexy and thus unimportant to the PT Barnum MBAs running our company. I'm sure that's true most everywhere, of course.
These Principals and Distinguisheds all keep proclaiming success while cashing fat paychecks. Meanwhile this year, our stock has had a tough go of it, so I'm curious whether these attempts will continue. The market takes no prisoners. Sure we get a lot of mileage out of looking cool for the recent grad crowd purposes of recruiting--kids want sexy, cool tech projects to work on and words like "insurance" turn them off, so there's that, I guess.
My take on that is that it won't be long before all those new recruits will figure out they got bait and switched pretty bad and that they aren't going to get to work on any of this sexy ML and AI stuff anymore than I am in my role. I got lured in by Data Science (because PhD), which just shows how gullible I am, but at least some of that traditional statistical modeling is having an impact here and there. The problem again is that even that is overblown by a couple of orders of magnitude! In my project, we're simply trying to get more real-time data out to people who need it without having to call in to get it and that is ridiculously difficult because of all the systems we try to knit together and how overall terrible our data quality is. And now my boss wants to build out an "analytics engine" to capture some of this sexy ML and AI stuff. It leads me to believe that the people involved are most interested in getting promoted and not much more.
Anyways, it is cool tech, but American taxpayers and people who are forced to buy our products are paying for it and I rather think they would prefer to spend their money in some better fashion.
Thanks for your response. Sure, the level of hype is rather high for DL.
That said, I also lived through and worked through the level of hype around expert systems. I think the high level of hype around expert systems in the 1980s was much more extreme and unwarranted than the DL hype levels. I base this on selling expert system tools for both Xerox Lisp Machines and for the Macintosh when it was released in 1984. Some of my customers did cool and useful things, but nothing earth-shaking.
At least DL provides very strong engineering results for some types of problems.
There are some surprisingly weak arguments in the text. It's correct to not treat computational resources as constant, but to treat them as unimportant or negligible is awful as well.
Already computational resources are becoming prohibitive with only a few institutions producing state of the art models at high financial cost. If the goal is AGI this might get exponentially worse. Intelligence needs to take resource consumption into account. The models we produce aren't even close to high-level reasoning and we're already consuming significantly more energy than humans or animals; something is wrong.
The scale argument isn't great either, because deep learning is running into the inverse issue of classical AI. Now, instead of having to program all logic explicitly, we have to formulate every individual problem as training data. This doesn't scale either. If an AI gets attacked by a wild animal, the solution can't be to first produce 10k pictures of mauled victims; intelligence includes reasoning about things in the absence of data. We can't have autonomous cars constantly running into things until we provide huge amounts of data for every problem; this does not scale either.
> Already computational resources are becoming prohibitive with only a few institutions producing state of the art models at high financial cost.
That is something of an illusion.
Obviously there will be some sort of uneven distribution of computing power; some institutions will have more, some less. The institutions with more power will create models at the limit of what they can do, because that is the best use of their power.
So if the thesis of more power = more results holds, then truly cutting-edge results will always come from people with resources that are practically unattainable by everyone else. Google's AlphaGo wasn't a particularly clever model, for example. It just had a lot of horsepower behind it for training and for the various ranging-shot attempts DeepMind would have gone through. Someone else would have figured it out, albeit more slowly, in a few years as computing power became available.
Computational power is still getting exponentially more affordable [0]. Costs aren't really rising, so much as the people who have spent more money get a few years ahead of everyone else and can preview what is about to become cheap.
That article says that doubling flops/$ used to take 1 year and now takes 3. It's an open question whether that gap will continue to grow. The recent phase shift was due to approaching an asymptote in single-core performance and switching to optimizing multicore overhead. The low-hanging fruit there will be consumed as well.
This argument completely ignores anything statistical, which is another limiting factor. It reminds me of the difference in bandit research and full RL research. RL researchers are fine throwing a thousand years of experience at their algorithm, because it can be simulated. Meanwhile people using bandits in the real world care about statistical efficiency (learning lots with little data), and it's reflected in the research. Most decisions aren't made with a huge abundance of data (most of us aren't google or facebook).
> General AI also ought to be able to work just as comfortably reasoning about politics as reasoning about medicine. It’s the analogue of what people have; any reasonably bright person can do many, many different things.
The average human has extreme difficulty reasoning about politics, while usually being reasonable on medicine (anti-vax being one of many exceptions). And it seems strange to expect a skilled pianist to also be a skilled neuroscientist or a skilled construction worker. On the other hand these people all use similar neural architectures (brains). So he seems pretty off-track when he criticizes "narrow AI" in favor of "general AI", as if there's some magic AI that will do everything perfectly, and even more off track when he criticizes researchers for using "one-size-fits-all" technologies, when indeed that is exactly what humans have been doing for millennia for their cognitive needs.
And sure, ML models in publications so far are typically one-off things that react poorly to modified inputs or unexpected situations. But it's not clear this has any relevance to commercial use. Tesla is still selling self-driving cars despite the accidents.
Total straw man. He actually uses an intern as an example in the very next sentence after what you quoted, as you would expect them to be able to read and get up to speed on a new area regardless of what it was. Meanwhile SOTA in NLP is a system that can be built to answer a single kind of question but can't explain why it did so or do anything useful if given an explanation of why its answer was wrong.
There are deep models like BERT that do pre-training and then need minimal training to do multiple tasks such as question answering, entailment, sentiment analysis, etc. I don't know about "explaining" an answer but there are debuggers that find errors in data sets: https://arxiv.org/pdf/1603.07292.pdf.
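As a small illustration of that pre-train-once, fine-tune-cheaply pattern, here's a sketch assuming the Hugging Face transformers library (my choice of toolkit, not something the papers mandate). The pre-trained encoder is shared; each task just gets its own lightweight classification head, which still needs a modest amount of task-specific fine-tuning before the logits mean anything.

```python
# Hypothetical sketch: one pre-trained BERT encoder, different lightweight heads
# per task (e.g. 2 labels for sentiment, 3 for entailment). Requires `torch` and
# `transformers` to be installed.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

sentiment_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)   # head randomly initialized, to be fine-tuned
entailment_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)   # same encoder weights, different head

inputs = tokenizer("The service was surprisingly good.", return_tensors="pt")
with torch.no_grad():
    logits = sentiment_model(**inputs).logits
print(logits.shape)  # torch.Size([1, 2]); only meaningful after fine-tuning
```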
But as I said, I don't see why an artist would suddenly get up to speed as a construction worker. He seems to overestimate the capacity of interns as well.
Deeply familiar with BERT. It lacks the very ability he is describing, to adapt itself, get up to speed, and collect relevant information in a new field, because that's simply not how it works. It can't possibly explain itself because it lacks any mechanism of introspection that could possibly give it that ability. It's an expensive way of gaining a very accurate language model that can be tweaked and get good results on a lot of tasks, but it doesn't understand what it's doing. It can't argue for its position or explain why it thinks whatever it thinks. It's not operating on that level of reasoning, at all.
An artist understands the goals of construction work, and can pick up the skills necessary along the way, because we can understand a goal and have a wide variety of cognitive tools to let us know how we are doing. If you've worked closely with BERT you already know that interns have nothing to worry about, not just from the current crop of tools that includes BERT, but from the entire line of deep learning research, short of a sudden and dramatic shift in direction.
> In cognitive science we talk about having cognitive models of things. So I’m sitting in a hotel room, and I understand that there’s a closet, there’s a bed, there’s the television that’s mounted in an unusual way. I know that there are all these things here, and I don’t just identify them. I also understand how they relate to one another. I have these ideas about how the outside world works. They’re not perfect. They’re fallible, but they’re pretty good. And I make a lot of inferences around them to guide my everyday actions.
> The opposite extreme is something like the Atari game system that DeepMind made, where it memorized what it needed to do as it saw pixels in particular places on the screen. If you get enough data, it can look like you’ve got understanding, but it’s actually a very shallow understanding. The proof is if you shift things by three pixels, it plays much more poorly. It breaks with the change. That’s the opposite of deep understanding.
Of course. There are infinitely many ways to interpret perceptions, and only a finite subset of possible valid ones.
It's among those possible interpretations that the AI will end up being a concrete implementation of an ideology.
This article doesn’t have any substance. It’s full of anecdata like shifting by 3 pixels to mess up a video game AI or some vague nonsense about “a model of this chair or this tv mounted to the wall.” It’s all casual hypotheticals.
There’s plenty of research on Bayesian neural networks for causal inference. But even more, a lot of causal inference problems are “small data” problems where choosing a strongly informative prior to pair with simple models is needed to prevent overfitting and poor generalization and to account for domain expertise.
Deep learning practitioners generally know plenty about this stuff and fully understand that deep neural networks are just one tool in the tool box, not applicable to all problems and certainly not approaching any kind of general AI solution that supersedes causal inference, feature engineering, etc.
This article is just a sensationalist hit job trying to capitalize on public anxieties about AI to raise the profile of this academic and try to sell more copies of his book.
I’d say, let’s not waste time on this crap. There are engineering problems that deep learning allows us to safely & reliably solve where other methods never could. We absolutely can trust these models for specific use cases. Let’s just get on with doing the work.
> Understanding a sentence is fundamentally different from recognizing an object. But people are trying to use deep learning to do both.
I agree with most of the article but I think this^^ skips over the different types of networks used to solve perception and language problems. A CNN is very different from say, word2vec, which isn't a very deep network at all.
I’d go further and say that deep networks are excellent for sentence understanding, and various types of RNN or 1D convolutional layers are very good at this in specialized domains just as CNNs and ResNets are good in specialized vision applications.
It absolutely makes sense to use deep learning for both of these tasks.
In fact, one very effective thing to do is to use a Siamese network to learn joint representational spaces of text and imagery in the same network.
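For anyone curious what that looks like in practice, here's a toy two-tower (Siamese-style) sketch in PyTorch, untrained and with made-up sizes: both encoders project into one shared embedding space, and a contrastive loss pulls each image toward its own caption and away from the others.

```python
# Hypothetical sketch: joint image/text embedding space with a contrastive loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB = 128

class ImageTower(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, EMB),
        )
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

class TextTower(nn.Module):
    def __init__(self, vocab=10_000):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab, 256)  # averages token embeddings
        self.proj = nn.Linear(256, EMB)
    def forward(self, tokens):
        return F.normalize(self.proj(self.embed(tokens)), dim=-1)

img_enc, txt_enc = ImageTower(), TextTower()
images = torch.randn(8, 3, 64, 64)            # a batch of images
captions = torch.randint(0, 10_000, (8, 12))  # matching token ids (made up)

img_vecs, txt_vecs = img_enc(images), txt_enc(captions)
# Contrastive objective: each image should be most similar to its own caption.
logits = img_vecs @ txt_vecs.t()
loss = F.cross_entropy(logits, torch.arange(8))
print(loss.item())
```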
It’s really specious and disingenuous to say “boy, vision and language sure seem different but can you believe these DL researchers are using the same tools for both!?”
Perhaps the difference is in the nature of the information that is being probed and its larger context? Visual imagery often provides almost all of its own context, but the “meaning” of a sentence can be radically different depending upon its source. Humans produce words, so you almost need a working theory of mind to fully understand them. None of that context will ever make it into word2vec.
Why not? It's true that the word "space" has different meanings when it appears in a math book, a CS book or an astronomy book. But we just have 3 different word2vec models. When I read something about math, I pick the math word2vec model, and there "space" appears close to the words "Hilbert" and "separable", while in the CS model, the same word is next to "complexity" and "memory". As I read more, I improve my word2vec models, but never mix them together. Now what happens if I'm reading something and don't understand the context? No, I don't switch to some general word2vec model. I rather try to guess which model to use and then reread the same text using that model.
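A rough sketch of that per-domain setup, using gensim's Word2Vec (my choice of library; the gensim 4.x API is assumed, and the toy corpora are obviously far too small to yield meaningful vectors, they just show the mechanics of keeping separate models and picking one by context):

```python
# Hypothetical sketch: one word2vec model per domain, chosen by context.
from gensim.models import Word2Vec

math_corpus = [
    ["hilbert", "space", "separable", "basis"],
    ["banach", "space", "norm", "complete"],
]
cs_corpus = [
    ["space", "complexity", "memory", "algorithm"],
    ["time", "complexity", "space", "tradeoff"],
]

math_model = Word2Vec(math_corpus, vector_size=16, min_count=1, window=2, epochs=50)
cs_model = Word2Vec(cs_corpus, vector_size=16, min_count=1, window=2, epochs=50)

# Pick the model by context, then ask what "space" is near in that domain.
print(math_model.wv.most_similar("space", topn=2))
print(cs_model.wv.most_similar("space", topn=2))
```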
So, I’ve been reading articles on this and I think I have a fuzzy idea of what some of these solutions would entail. But what I’m hung up on is this: if deep learning is about coming up with solutions to problems that are too hard for humans, how do we hope to understand the rationale behind whatever solutions the machine comes up with?
I've asked myself the same. I think of it this way: we can imagine that, in the set of all possible solutions to a particular problem, a subset of those are too hard for humans because they are solved with systems too large for a human to have designed in a reasonable amount of time, or the systems themselves contain pieces that are composed in some previously unrealized way. Of that subset, there may be solutions whose pieces and compositions are well-understood. In these cases (perhaps not exclusively), it might be possible for us to begin understanding any rationale.
edit: after noticing this other hacker news article (https://news.ycombinator.com/item?id=21107706), I wanted to add that this line of thinking is applicable to understanding programs and proofs written by humans as well. Programs and proofs can be well-understood when their pieces, and the way those pieces compose, are well-understood. When the pieces, e.g. lemmata in a proof, are large or hard to decompose, the proof (i.e. the solution to a problem) is harder to verify and understand.
Sometimes it’s easier to understand a solution than it is to produce the solution in the first place. I don’t need to be Newton or Leibniz to understand calculus.
That said, in general I don’t expect that we could understand any particular solution produced by an AI, be it deep or otherwise, but I do expect it to be possible quite often.
A lot of DL is about teaching computers to solve problems that are easy for humans (like driving and recognizing your grandmother) but for which humans have a tough time explaining how they do it.
The holy grail of neural nets has always been to build a simulation of the brain, figure out how it works, and apply that knowledge to how the human brain might work.
We're not there yet but progress has been made. Eventually we'll understand NNs well enough to explain not only themselves but also human brains. In any case we have no choice because we cannot deploy NNs in life critical situations until we understand how they work, because that's the only way to understand how they fail.
> The holy grail of neural nets has always been to build a simulation of the brain, figure out how it works, and apply that knowledge to how the human brain might work.
I'd say that's a goal for some people -- for those whose goal is to figure out how the brain works, rather than constructing a more ideal and powerful GI. Remember the brain is great at some things, but laughable at others -- such as a "7 +/- 2" items in short term memory, inability to immediately retain rote knowledge after one instance and in great numbers, etc. It's the merging of the fuzzy, goal-directed behavior of the mind, in conjunction with its ability to effect the "real world", and the super-human memory and computational capabilities of computers that makes possible future GAIs that are so powerful and possibly scary.
One key question is whether symbolic AI is the right model of the world. It underperforms vector-based AI on many specific tasks. But human experts heavily rely on it to communicate with each other. If symbolic AI is not the right model, the P vs NP problem might just be irrelevant. Human philosophy is full of crap. We will lose a lot of beliefs.
Elon will be right: we will abandon human languages and connect through a cable in the brain. Everyone will relearn everything from the NN.
If symbolic AI is the right model but difficult to build algorithmically, then vector-based models just help to make it faster and better. Then we humans are fine. We simply proxy the lower-level optimization to the AI. Our functionalities will shift, just like what happened when the engine was invented hundreds of years ago.
Of course symbolic AI is the way to go. We communicate with text messages that consist of words that we internally convert to word2vec style vectors to detect similar words. One more thing we do in our heads: we build a graph of those word2vec symbols. When I read in a book "a cat is sleeping on a tree" I instantly build a small graph where nodes Cat and Tree are connected with an edge labeled Sleeps. I may visualize this as a picture, but that's not necessary for AI. In fact, some people can't visualize anything, but they can definitely think. How? They can still build this knowledge graph. I believe that this graph representation is the limit of our intelligence: there are many facts out there that we can't possibly think about because they don't fit this graph model. It's like the set of real numbers can't be squeezed into the set of rational numbers: most of the numbers are irrational.
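A tiny sketch of that reading of "a cat is sleeping on a tree", using networkx purely as a convenient graph container (my choice, for illustration only):

```python
# Hypothetical sketch: the sentence as a small labeled knowledge graph.
import networkx as nx

kg = nx.DiGraph()
kg.add_edge("Cat", "Tree", relation="sleeps_on")
kg.add_edge("Cat", "Animal", relation="is_a")  # extra background fact

# Querying the graph: what do we know about the cat?
for _, obj, data in kg.out_edges("Cat", data=True):
    print(f"Cat --{data['relation']}--> {obj}")
```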
We perceive ourselves building a knowledge graph but at the physical level how does that happen? Does this graph materialize physically as neuron connections? Or is it just an abstraction the brain makes, a part of the subjective experience of thinking?
It's certainly true that we think in symbols but they exist somewhere in the mushy goo of neurons, could symbolic thinking emerge from large ANNs in the same way?
This graph is a high level abstraction, of course. How exactly neurons store information is interesting, but hardly relevant here. My guess is that one symbol is stored in a very sparse subset of neurons and each neuron acts a bit like a node in a DHT. All together these neurons implement a fast DHT where a word2vec graph node acts as a key. On top of that, this "wet DHT" can quickly find keys nearby, i.e. it can instantly return all neighbors of word2vec("apple").
I think ANNs implement only the word2vec function that translates images or sounds into symbols and vice versa.
Does one’s knowledge of how to ride a bike have anything to do with such graph structures? Or is intelligence unrelated to such skills? It seems that the intelligence involved in having a basic conversation would engage a lot of such skills, even just social tact.
I think intelligence isn't involved in riding a bike. Understanding the theory about riding a bike is a different story, and that understanding is a knowledge graph.
Gary Marcus literally gave a talk about this last week in my department's ML seminar. I asked him how sure he was that humanity would eventually achieve AGI and he said 100%. When I asked him when that would be he replied 30-100 years. Interesting perspective.
I haven't read the book, but the viewpoints he expresses in the interview are spot-on. DL can be a great alert/suggestion mechanism in narrow domains, but it should never be trusted to make critical decisions. I believe that general intelligence will only be achieved through major advancements in general symbolic reasoning. Something like DL might play a small role in this breakthrough, but it will not be a core part of the solution.
Yes. Or rules, logic, or other symbolic system. AI has always been divided into two camps: Connectionist and not. Current "AI" is all connectionist. What we're now calling "Classical AI" is the non-connectionist kind that was prominent in the 60s-80s but fell out of favor in the AI Winter.
Has work been done to formally prove general AI can't arise from deep learning? I can't help but feel it's an assumption being made by those who prefer classical research.
It is trivially proved that deep learning can represent any computable function so the proof you're asking for is not going to be possible. However, it's also completely obvious to anyone familiar with deep learning as it exists today that it is not moving in the direction of AGI and none of the research is ever going to lead to AGI short of some kind of miracle. The burden of proof here would be on those who think this is possible to give some plausible mechanism (or research agenda) for getting there.
I'm no expert but it doesn't seem very far fetched to me. If we can use deep learning to create a digital assistant, computer vision, and navigation/traversal then we're not very far off from something surpassing a dog's level of intelligence even with what exists today. If that's possible then it seems plausible it could continue onward.
I'm not saying it's obviously possible, but it doesn't seem obviously impossible, and it's a mistake to assume it is.
Those systems seem close to a dog's intelligence only because you haven't worked on them and don't know how they work. Nothing about this field is obvious. If you are not an expert, just look at the history. People thought at the dawn of this field over 60 years ago (yes, really) that machine intelligence was right around the corner. The optimists have been wrong for more than six decades and they are wrong now. If you are an expert you can look at speech recognition, computer vision, and all these things you mention, and compare them to the intelligence and awareness of a dog, and realize that they are not close, not in the ballpark, not in the same league, not even the same sport. The ANN/DL research is not moving in the direction of AGI and nobody in industry and almost nobody in academia is bothered by that because ML is getting results and careers are being made, while chasing AGI means spending your career on something that almost certainly will not show any results. If anything, the current success of deep learning means we are further away from AGI than we would have been (which is probably a good thing) because there are fewer people working on it.
So I take it the answer is no. It seems like a lot of the discussion around what deep learning is not makes a lot of assumptions about what "actual" learning is but doesn't seem to back that up in any formal way.