Known Unknowns (harpers.org)
66 points by pron on July 1, 2018 | 23 comments



This is not a problem inherent to machines. If I teach a baby the word "tank" by pointing to a picture of a sunny tank and the word "forest" by pointing to a picture of a night forest, the baby could be correct in deducing "tank" meant day and "forest" meant night.

This is not even specific to deep neural nets (DNNs). Even something like line fitting (OLS) will have problems with it.

Classical statistics has a great way to deal with this: your training data for classifying a label A against B should cover the entire range of A and B that you want to use this machine on.

This means that if you want to identify pictures of tanks in a situation x, you had better have a training set of tanks that "overlay a neighborhood" of x.
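To make the confound concrete, here is a minimal sketch (synthetic data, scikit-learn's LogisticRegression assumed; the feature names are made up for illustration). A classifier trained only on sunny tanks and night forests gets a perfect training score by latching onto brightness, then calls a night-time tank a forest:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 200
    # Training set with the confound baked in: tanks only by day, forests only at night.
    brightness = np.concatenate([rng.normal(0.8, 0.05, n),   # sunny tanks
                                 rng.normal(0.2, 0.05, n)])  # night forests
    texture    = np.concatenate([rng.normal(0.5, 0.2, n),    # texture barely differs
                                 rng.normal(0.5, 0.2, n)])
    X_train = np.column_stack([brightness, texture])
    y_train = np.array([1] * n + [0] * n)                    # 1 = tank, 0 = forest

    clf = LogisticRegression().fit(X_train, y_train)
    print(clf.score(X_train, y_train))                       # ~1.0 on the training set

    # A tank photographed at night: brightness alone says "forest".
    print(clf.predict(np.array([[0.2, 0.5]])))               # [0]

Adding night tanks and daytime forests to the training set (the "overlay a neighborhood" remedy above) takes the brightness shortcut away and forces the model to use features that actually distinguish the classes.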

But maybe the article isn't talking about the ability to train MLs, or about how humans will be better than MLs, but about how scary it is to have machines that can decide things without our knowing how they did it (or, more precisely, the meaning of how they did it).

That's reasonable, but I don't think it's operationally any different than if you ask a human who misclassified something and she just says "well, uh, it just kinda looked like B from this angle / from a glance..."


I don't think many babies need to see thousands of examples of tanks before they understand what a tank is, or, even then, mysteriously classify horses as tanks when one pixel changes.


You've nailed it. Language is a great example where our intuition about how one learns is wildly incorrect.

Babies are prewired to rapidly learn language, merely by hearing it around them. They do this regardless of the language, with teachers who have no idea what they're doing. It's an incredible process but because it happens so easily it leads us to think teaching a computer these things must not be all that hard.

But we're not generic computing devices, and our linguistic developmental skills are innate and incredibly well adapted. We just can't see it so we take it for granted.


They understand something simpler earlier: a name/word for something [a noun] is in the set ("manmade object", "animal", "neither of these").

Classifiers built on classifiers referring to classifiers that make them laugh or cry; new knowledge is not built on random initialization weights. It is based on physiological imperatives that start before any of us were born.

More data, more generations, more generalization.


A baby has 100 billion neurons and 100 trillion to 1 quadrillion synapses, so they have a rather large advantage (as of yet).

That said, I'm not sure babies are as clever as you suggest. By the time they can verbalize "tank" they've had a vast amount of neural training. Try running this experiment with a 6-month-old; I'll bet on the program.


This is a good point.

Two operational differences I can think of between humans and ML/CV systems:

1) Humans can explain themselves. Many ML systems are not designed to provide any explanation, even poor ones.

2) Humans are much more limited in the number of decisions they can make per second. An ML system could make millions of mistakes per second.

If a system is capable of making many orders of magnitude more decisions than a similarly tasked human, then I think it's fair to push for a higher standard of accuracy and error logging than the human would be capable of.

The point made, that both biological and artificial systems are susceptible to some of the same errors and bound by some of the same constraints, is well taken.


> Humans can explain themselves. Many ML systems are not designed to provide any explanation, even poor ones.

Humans are good at generating sentences that appear to explain their decisions, but whether those are their actual reasons is unverifiable. Unless the decision is based on some predecided verbal rule, most such "explanations" are probably pretty useless.

While I agree that most ML systems are not set up to explain themselves any better, the ability to get an explanation is something that can be bolted on after the fact. By providing the model with slightly modified inputs, it's possible to determine which kinds of input would have changed the decision if they had been different, i.e. the value of those inputs being as observed is the reason for the output. One possible implementation of that general idea is "Anchors: High-Precision Model-Agnostic Explanations" https://homes.cs.washington.edu/%7Emarcotcr/aaai18.pdf
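For intuition, here is a rough sketch of the general perturbation idea (not the Anchors algorithm itself, and the function name is made up): for a trained model and a single input, swap each feature's value for values seen elsewhere in the data and count how often the prediction flips; features with a high flip rate are the ones whose observed values the decision hinges on.

    import numpy as np

    def perturbation_importance(model, x, X_background, n_draws=100, seed=0):
        # For each feature, replace its value with values observed elsewhere
        # in the data and measure how often the model's prediction flips.
        # A high flip rate means the observed value is driving the output.
        rng = np.random.default_rng(seed)
        base = model.predict(x.reshape(1, -1))[0]
        flip_rate = np.zeros(len(x))
        for j in range(len(x)):
            x_mod = np.tile(x, (n_draws, 1))
            x_mod[:, j] = rng.choice(X_background[:, j], size=n_draws)
            flip_rate[j] = np.mean(model.predict(x_mod) != base)
        return flip_rate

Applied to the tank/forest classifier upthread, brightness would presumably come back with a high flip rate and texture with a near-zero one, which is already a more checkable "explanation" than anything a human eyewitness offers.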


> Humans can explain themselves. Many ML systems are not designed to provide any explanation, even poor ones.

In my opinion, the vast majority of self-reported reasons given by humans to explain their behavior are extremely poor, inconsistent and low-fidelity.


> If I teach a baby the word "tank" by pointing to a picture of a sunny tank and the word "forest" by pointing to a picture of a night forest, the baby could be correct in deducing "tank" meant day and "forest" meant night.

I am not at all sure this is the case. It is possible that humans do primitive statistical classification on sensory signals only at very shallow, subconscious levels of cognition, but not at the levels they normally use words for.


When you say you're not sure, are you saying you're not sure children could be confused like that? I've witnessed it before. Especially for very young children, it can take several repetitions of mapping a word to different depictions of an object before they realize what you're talking about.


They can be confused, but not quite in the same manner as a statistical clustering algorithm. Children (or adults) don't seem to learn by statistical clustering. It's often said that NNs require many samples while humans normally just need one, but a more precise description is that humans need zero. I can teach you to recognize a Chinese character without showing you even a single picture. It is true that this requires some background knowledge that small children lack, but there is little reason to believe that their learning is in any way similar to statistical clustering.


How would you teach me to recognise a Chinese character without showing me a picture?


A square, divided into four equal squares, floating symmetrically inside a larger rectangular U whose two vertical sides slightly protrude lower than the horizontal bar, and above the square, a horizontal line, slightly longer than the side of the square and equal in length and alignment to the horizontal line below the square, the base of the U.


I don't mean to be obtuse but I think there are two problems here...

First, I personally found that really difficult to follow - to the point that practically using that method seems ridiculous to me. Why would you ever do this instead of just showing a picture? I can only imagine how laborious it would be to explain an entire alphabet through tedious verbal description.

Second, and maybe more importantly, what is a "square" and what is a "line"? Sooner or later you need to anchor this in something visual or you'll have built up a brittle tower of abstraction. This seems like moving the goalposts.

I very strongly disagree that this example is illustrative of how children - or humans in general - could learn something with 0 samples. This strikes me as a contrived and inorganic way for humans to learn things which are fundamentally visual or auditory in nature, not semantic.


That it's hard to follow is beside the point (and it's hard because this particular character is rather complex, but I wanted to pick the character for "picture" as a pun; some characters are much simpler and could be very easily described, while others are far more complex). People are capable of doing that, and do it all the time (maybe not with characters, but certainly with other things). Sure, it requires prior knowledge, but the fact that we can do it, and do it often, suggests that perhaps most forms of high-level learning internally build this "tower of abstraction". One could claim that the internal layers of an NN do something similar, but that's not quite accurate, because the backpropagation process is global rather than modular.


The way I actually used that description is to mentally construct the example from it, to then recognize "画". Verbal descriptions are just a different way to input examples, they don't help you learn without any examples at all.


But that's not the way NNs work. NNs learn classification on a particular representation only. From the NN perspective, this kind of description is exactly zero samples (of the required representation).


> NNs learn classification on a particular representation only

To perform the task based on the description, you need to have learned to associate descriptions of images with the image itself. To train that ability you'll need lots of paired examples in both representations.

That the description is not in the same representation as the image you want to recognize doesn't change the fact that both need to use a representation that has been trained on.

Cross-representation learning is still useful because it increases the range of usable data (e.g. it can improve game-playing agents by telling them how to play https://arxiv.org/abs/1704.05539), but it doesn't magically enable learning from zero samples (just more kinds of samples).


> To train that ability you'll need lots of paired examples in both representations.

People rarely ever need lots of paired examples, except possibly once, when they learn the concept of "paired representation," and even then probably a few examples suffice to learn an extremely abstract concept that is then used in all learning. People simply don't learn high-level concepts through statistical clustering.

> that both need to use a representation that has been trained on.

But this training is of a very different nature. As far as I know, not a single person claims that NNs learn in a way remotely similar to how people learn (nor are they meant to), certainly not when it comes to high-level concepts.


An interesting article, but some of the conclusions at the end of descriptive paragraphs seem to come from nowhere and are completely unsubstantiated: “But machines don’t correct our flaws—they replicate them.”

“We face a world, not in the future but today, where we do not understand our own creations. The result of such opacity is always and inevitably violence.”


Is anyone aware of a particularly good explanation of the general confounding issue (not the tank problem specifically) discussed at the start of the article?

In a talk I'll give in about a month I will be briefly mentioning how confounding between two variables makes much previous research in my field wrong. Most people in my field will never have heard of confounding and I don't have much time to dedicate to this in the talk. Right now I am planning to say something like "If you make two changes at once, you can't know which caused the observed change in the output or the relative contributions of each input change."
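If a short numerical illustration helps for the talk, here is a minimal sketch (synthetic data, ordinary least squares assumed): when two inputs are always changed together, the data only pin down the sum of their effects, and any split of that sum fits equally well.

    import numpy as np

    rng = np.random.default_rng(0)
    applied = rng.integers(0, 2, 50).astype(float)
    x1, x2 = applied, applied.copy()                    # the two changes always happen together
    y = 3.0 * x1 + 2.0 * x2 + rng.normal(0, 0.1, 50)    # true effects: 3 and 2

    X = np.column_stack([x1, x2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(coef)   # some split summing to ~5; (5, 0), (0, 5) and (3, 2) all fit equally well

The tank story is the same ambiguity hidden in millions of pixels: day/night and tank/forest were changed together, so the model was free to attribute the whole effect to lighting.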


> "If you make two changes at once, you can't know which caused the observed change in the output or the relative contributions of each input change."

And there are millions of changes in the pixel data. I would appreciate "a particularly good explanation" myself.


When talking about chess: "But even the most powerful program can be defeated by a skilled human player with access to a computer—even a computer less powerful than the opponent."

Is that true? Can a state-of-the-art chess engine plus a grandmaster really outperform just the state-of-the-art chess engine?





