More data can make you more wrong (spectator.co.uk)
80 points by da02 on April 2, 2017 | 24 comments



The University of Chicago study is interesting and covers nuances not really dealt with in the article (naturally): http://www.pnas.org/content/113/33/9250.full

with a summary here: https://newschicagobooth.uchicago.edu/newsroom/problem-slow-...

The key point neglected in the original article is that the jurors in the slow-mo study were being asked about intent and premeditation. My knee-jerk reaction on first reading was to assume that slow-mo might simply have helped jurors overcome reasonable doubt, but the questions seemed well designed to isolate the fact that slow-mo creates a false narrative (even when viewers can see the real-time clock ticking over at a very slow rate).

It would be interesting to see whether viewers who got to see both clips would still have the same perception, and whether ordering matters.


The article states that jurors who saw both clips were still slightly more likely to convict than those who saw only the real-time footage. That is, the effect of the slow-mo was reduced, but not eliminated, when they also saw it in real time.

It doesn't address the ordering question.

Thanks for the links. A lot of data there. Hope I don't reach the wrong conclusion. ;)


The essential argument here also applies to any automated decision-making process, and that is one of the big risks we face in adopting ever more technology to help run our businesses and governments.

If we're not careful, we'll take all the old problems caused by unjustified discrimination based on factors like gender or skin colour, which today we (at least most of us) would consider irrelevant to making most decisions, and multiply them up many times over to apply to each input into an automated decision-making process. How do machine learning and statistical analysis tools distinguish between a causal relationship and mere correlation? And how often will they actually demonstrate the latter, yet be treated as if they had found the former?


>"How do machine learning and statistical analysis tools distinguish between a causal relationship and mere correlation?"

They have no reason to make such a distinction.

If you're looking for subjects with a particular property X, but X is hidden and you know that property Y correlates with X, the correct action is to choose subjects with property Y. The causative relationship (or lack thereof) between X and Y is irrelevant.

As a simple example, if you're hiring people to carry 80-pound bags of grain between trucks, you want the property of physical strength. But, if you only have resumes to go on, you can't observe physical strength directly. But, gender correlates with physical strength, and names correlate with gender, so you'd be rational to choose the resumes with recognizably male names.
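
A minimal sketch of that logic (Python, with invented numbers, since the comment gives none): selecting on a correlated but entirely non-causal proxy still raises the expected value of the hidden property.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000

    # Hidden property X: physical strength (arbitrary units).
    # Observable proxy Y: a binary trait that merely correlates with X;
    # nothing here says Y causes X.
    proxy = rng.random(n) < 0.5
    strength = rng.normal(loc=np.where(proxy, 60.0, 45.0), scale=15.0)

    # Selecting on the proxy alone raises expected strength.
    print(strength[proxy].mean())    # ~60
    print(strength[~proxy].mean())   # ~45

The selection is "rational" in exactly the commenter's sense: it works whether or not there is any causal link between the proxy and the hidden property.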

The reason we avoid discriminating between various protected groups isn't because groups don't have characteristics. If that was the case, such discrimination would be pointless and people would not bother doing it because it would have a cost but no benefit. Discrimination based on hair color is like this. Nobody does it and we don't need campaigns to stop it.

In reality, groups of people do have characteristics - physical, psychological, emotional, intellectual, cultural. However, we avoid discriminating between protected groups because we believe it's wrong to subject an individual to that kind of discrimination, even if such discrimination would be rational for the discriminator. Such discrimination violates our western ideals of individual rights and equal individual opportunity.


>"However, we avoid discriminating between protected groups because we believe it's wrong to subject an individual to that kind of discrimination, even if such discrimination would be rational for the discriminator."

I don't think most of us would have a problem with discriminating on an inherent property of a group that is relevant to the decision being made. Indeed, often neither do anti-discrimination laws. Such discrimination can be objectively justified.

For example, consider a job where a certain level of fitness is a functional requirement, say a firefighter whose role includes being able to carry someone out of a burning building. Setting a relatively high bar for physical strength is going to discriminate against female job applicants and those with various physical disabilities as groups, but it's not discriminating against someone because they're female or in a wheelchair; it's discriminating against them because they can't carry someone out of a burning building, and so that person is going to die. I don't think most of us would have a problem with this kind of rule. The only thing unfair here is life not making us all equal, and there isn't much we can do about that.

The problems usually start when either you discriminate against a whole group based on some property that is somewhat correlated with membership of the group but not inherent, or you discriminate against a whole group based on a property that is inherent but isn't actually relevant to the decision being made.

For example, if a job involves sitting at a desk and using a computer all day, a woman or someone in a wheelchair can presumably do that just as well as our hypothetical strong male. In this case, discrimination on the basis of gender or physical strength simply isn't relevant to the decision being made but sadly does sometimes happen because of individual prejudice, and thus we have laws to protect those in more vulnerable situations.

Unfortunately, the kinds of tools and mathematical analyses we're talking about here don't necessarily see those situations any differently. They may learn the wrong lesson if there are coincidental, misleading correlations in whatever data is used to train them, which brings us back to where we came in.
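
As a toy illustration of that failure mode (hypothetical Python, not from the article or the study): train a bog-standard classifier on historical decisions that happened to correlate with an irrelevant attribute, and it will dutifully put weight on that attribute.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    n = 5_000

    skill = rng.normal(size=n)                   # the job-relevant feature
    group = (rng.random(n) < 0.5).astype(float)  # an irrelevant attribute

    # Biased training data: past hiring decisions favoured group == 1,
    # so the label correlates with the irrelevant attribute by construction.
    hired = skill + 1.5 * group + rng.normal(scale=0.5, size=n) > 1.0

    model = LogisticRegression().fit(np.column_stack([skill, group]), hired)
    print(model.coef_)  # a large positive weight lands on `group`

Nothing in the fitting procedure can tell that the second coefficient reflects historical prejudice rather than a real requirement of the job.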


>"How do machine learning and statistical analysis tools distinguish between a causal relationship and mere correlation?"

They don't.

>"And how often will they actually demonstrate the latter, yet be treated as if they had found the former?"

Every time that happens, the conclusion is incorrect.


The two issues brought up (obstruction in cricket and the murder trial) say more about rules than they do about evidence.

In the cricket case, the problem seems to be that small differences in a player's reaction time can make the difference between acceptable and unacceptable actions. Why should it matter if the player intentionally blocked the ball? Justice is much more difficult when rules depend on precise subjective judgement.

The same is true of the murder case. Clearly the robbers committed a second degree murder. In the hypothetical cases, was the prosecution pushing for first degree murder instead? Why rely on precise subjective judgement?

This is like trying to decide from a grainy photo whether a character is an 'A' or a '4'. It's an unavoidably difficult problem, but in the two cases above, we can change the rules to avoid it.


Intentionality plays a role in most criminal definitions. The exceptions are the strict liability cases like statutory rape or DUI (just to name a couple). I think the law looks to intentions because it's very human to base our judgments on our sense of a person's intent. It's beyond me why it developed evolutionarily (or whether it's just a very recent phenomenon), but we have a soft spot for good intentions.

As to the case at hand, I have similar questions. I would think in the U.S. this would have just been a felony murder case where intentions are irrelevant. Agree that a first-degree murder charge in a robbery case would be odd (unless the robbery was a cover story for the planned murder??).

As for sports, the pendulum seems to have shifted in the other direction, with most rules now of the 'strict' variety. And I would agree with you that any rule that can be formulated as a strict rule should be.


In a world filled with randomness, intent is the only thing we can control. I think the trend towards strict liability crimes comes from the feeling that we've tamed much of the world, and therefore whatever occurs is most likely someone's fault rather than an act of god.


Very astute observation. Allowing for bad acts or bad outcomes without a scapegoat (because accident) makes us feel less in control of our world.


Most of this story is about slow-motion in videos, not about the amount of data.

To add to it, data-gathering by the NSA can indeed lead to wrong conclusions. So can statistics in scientific experiments that go wrong because only part of the data set was selected.


There's a fair bit of discussion of crime scene data gathering beyond video: CSI-type stuff. Makes the point that when you have a big budget to gather a bunch of data, you have a lot more leeway to paint a narrative supporting your theory.

I would have liked a more concrete example of this, because I don't have a strong enough imagination to really picture that scenario. Finding the suspect's fingerprints in more places around the crime scene? I don't know; I guess that could help convict, but I could see it just as easily distracting the jury and muddling the case.


Really interesting, it's like overfitting applied to human decision making!


This doesn't strike me as having anything to do with data volume. This is much more about exploiting cognitive biases. This is not a new thing. If anything, the subject is a classic one.


Slow motion video isn't more data, it's the same data just presented in a different way.


There's a limit to what we can absorb in real time, so slow motion also tends to increase how much data we ingest. But most of the effect should indeed come from our "intent-sensing neural nets" being trained on real-time speed and going awry when fed slow-motion replays. A big part of figuring out whether something is intentional in something as fast-paced as sports will always be "was there enough time for a human to react?".


The title is clickbait. The subtitle makes sense though.


Clickbait doesn't mean any remotely catchy title. It means sensational and misleading titles. More data can make you more wrong, and that's exactly what the article is about.


Agreed. A clickbait title for this article would be something like "The Shocking Reason that Big Data is Failing" or "Think data leads to good decisions? Think again.".


or "Slow-Mo Killed the Video Star"



Be careful with this source; as articulate and urbane as much of the writing is, it's the UK's Breitbart.


Bullshit. I'm often in disagreement with The Spectator but that is a ridiculous slur.


I was a print subscriber for two years, read every article in every issue, and that was my conclusion.



