The Red Beads Experiment (2019) (medium.com/make-work-better)
52 points by kiyanwang on Dec 17, 2024 | 25 comments


This experiment ought to be a great reflection on the importance of picking the right metrics to manage, rather than just focusing on something that's easy to measure -- even if it might be directly related to the outcome / bottom line!

Both input and output metrics ostensibly play an important role when attempting to manage a system (especially when leading people), but the most crucial aspect is to start with a good causal model or put yourself in the right frame to discover one with very few iterations. Otherwise you will most likely steer the system into a crash and move on to the next one and repeat the same travesty.


It reminds me of https://existentialcomics.com/comic/58 which has a bunch of philosophers playing Candyland:

> The contrast between the meaningless fixed nature of the game, and the narratives we tell ourselves as we are playing causes the feeling of the absurd. "We are doing poorly," "We are happy to be ahead," "We are nervous about being passed." None of these stories have any meaning, since the game merely exists, and the outcome is simply a part of that existence.

The workers do a thing, and management creates a narrative based on the outcomes, even though the outcomes themselves are purely a product of the system, and nothing about the workers. But they expect a narrative, so they'll create one out of whole cloth if necessary.


I was at a conference where this was presented by John https://www.amazon.com/Journey-Profound-Knowledge-Altered-In...

It’s a fun little eye-opener that starts conversations. I wish more of those conversations ended up moving decision makers.


Cedric Chin has a great series on this and has done a deep dive into Deming:

https://commoncog.com/becoming-data-driven-first-principles/


I have a hard time identifying with this. The premise seems to be that the worker is just a passive participant in the system, literally a ping pong ball on a Galton board. If the end state isn't what is desired, the problem is with the board, not the balls. This may in fact be true in the large, but that is the same as saying that management systems and processes are for people who need management systems and processes, reliably producing mediocre results. There is definite tension in accepting that this mediocrity is practically inevitable if anything is going to be done at scale.


If you treat people as overly simple machines you might as well mechanise that part of the process.

Human workers have hand skill, learning, intelligence, wisdom, introspection, communication, colour and pattern recognition all of which are not being utilised making the job repetitive and unfulfilling.

Then the threat of firing based on a random metric - this would create a culture of fear and one would expect a high turnover of any ‘good’ employees.

I agree this would be a horrible place to work.

Which is the genius of the allegory - how the wrong metric can make efficiency and progress impossible.

Of course even a good metric becomes corrupted according to Goodhart's law:

“When a metric becomes a target it ceases to be a good metric because people will seek to game it.”

‘The Wire’ remains a salutary lesson on how all stats when tied to pay and promotion will get ‘juked’.

A metric can measure nothing but random chance (there is always some randomness that is rarely accounted for) - this makes management impossible and workers disempowered.

Even with good stats there are too few Bayesians vs Frequentists in corporate numerical analysis.

In terms of the company in this analogy a useful exercise is to imagine how to improve things.

An immediate solution could be to allow the workers to make multiple sequential draws and to discuss what percentage of red beads constitutes a good draw and discard and resample if more red beads than that.

Let workers share their best practices and it may be that there is a subtle skill that can be learnt about how one wields the paddle in the box of beads to steer it away from red clusters.

Or divide and conquer and take fewer beads and allow the workers to discard all red beads, then combine the pure samples and batch up to correct size.
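A rough sketch of the "discard and resample" idea above, to see what it does to the reported number. The bead counts (800 red, 3200 white), the 50-hole paddle, the threshold of 8 and the limit of 3 attempts are all assumptions for illustration, not taken from the article.

```python
import random

def draw_with_redraws(rng, beads, max_red, max_attempts, paddle_size=50):
    """Scoop up to `max_attempts` times; keep the first scoop with at most
    `max_red` red beads, otherwise report the final scoop."""
    for _ in range(max_attempts):
        reds = sum(rng.sample(beads, paddle_size))  # 1 = red, 0 = white
        if reds <= max_red:
            break
    return reds

def average_reds(max_attempts, trials=2000, seed=0):
    """Mean reported red count over many simulated work days."""
    rng = random.Random(seed)
    beads = [1] * 800 + [0] * 3200  # assumed mix: 20% red
    total = sum(
        draw_with_redraws(rng, beads, max_red=8, max_attempts=max_attempts)
        for _ in range(trials)
    )
    return total / trials

single = average_reds(max_attempts=1)
redraw = average_reds(max_attempts=3)
print(f"one scoop: {single:.2f} reds, up to three scoops: {redraw:.2f} reds")
```

The reported count goes down even though the box still holds the same share of red beads, so whether this counts as an improvement or as gaming the metric depends on what the count is supposed to stand for.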


The delivery is a bit too weak for me. It would be way more relatable if the workers stopped delivering red beads (as a stand-in for bad news) on day two but missed their quota, except for one employee, who delivered red beads and was fired as a result.

In this modified scenario, the manager would initially know that there is a problem, then come to think that he has fixed it and that his team is merely underperforming. In reality the underlying problem is still there; it just no longer turns up in the metrics.



I think the only conclusion one can draw from this is that high school statistics is actually useful in real life.

If the company is crazy enough to put people on PIP based on 2 low-precision data points, and fire them based on 4 data points... it's not going to be a good place to work at, and nothing will get done. If you are looking only at the numbers, then at least use proper statistical methods. Or stop treating people like numbers and treat them as humans, this'll give you much more data to work with.


> I think the only conclusion one can draw from this is that high school statistics is actually useful in real-life.

That isn't the conclusion. You can have a management team that all technically understand statistics and they still make the same mistakes. It is depressing to watch it play out in real time. The problem is when people don't identify that workers are just doing what they are told and personal effort isn't the major factor determining system outcomes.

Going through the article, the important part about the narrative is the misconception that "performance outcomes are being determined by the worker, so managing the workers will get better performance!". The management is being done by statistics throughout even though the quality of that management is bad.


The management in this game is being done by a deep misunderstanding of statistics. They’re using random noise as a signal and taking action based on it.

Actually using statistics would tell you that your numbers are random noise and can’t be used to make sensible decisions. It would tell you that performance outcomes aren’t being determined by the worker. There might be better ways to figure that out (in this case, taking five seconds to look at the process) but statistics will get that job done.
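To make the "random noise" point concrete, here is a small simulation sketch. The numbers are assumptions, not from the article: 800 red and 3200 white beads, a 50-hole paddle, 6 workers and 4 days. Every worker draws from the same box, so any ranking that comes out is luck.

```python
import random

def draw_red_count(rng, paddle_size=50, red=800, white=3200):
    """Scoop `paddle_size` beads without replacement and count the red ones."""
    beads = [1] * red + [0] * white  # 1 = red, 0 = white
    return sum(rng.sample(beads, paddle_size))

def simulate(workers=6, days=4, seed=0):
    """Return results[worker][day] = red beads drawn that day."""
    rng = random.Random(seed)
    return [[draw_red_count(rng) for _ in range(days)] for _ in range(workers)]

def daily_worst(results):
    """Index of the worker with the most red beads on each day
    (ties go to the lowest index)."""
    days = len(results[0])
    return [
        max(range(len(results)), key=lambda w: results[w][d])
        for d in range(days)
    ]

results = simulate()
for worker, counts in enumerate(results):
    print(f"worker {worker}: {counts}")
print("worst performer per day:", daily_worst(results))
```

Rerunning with different seeds shuffles who looks "best" and "worst", which is the whole problem with acting on these numbers.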


Well you aren't wrong, but that is a question of identifying what model to fit. If the question is "are the workers in control of the outcome" then sure, they weren't using statistics properly. But if management realises that is the question to ask they also probably won't need statistics because the answer is obviously no in almost all contexts.

Management was asking the question "given that the workers control the process and we know that there are differences between people, which workers are performing the best?". And they were doing the best they could under pressure with the incomplete information they had. It was a stupid model, but that is why they are going to an Ed Deming talk - the median manager defaults to stupid and needs a bit of help.

Training people in high school stats doesn't necessarily get better results at model selection, some extremely educated and capable people are still stupid about that part.


They were certainly not doing their best to answer that question! Stats will answer that question correctly, answering “they’re all performing the same, within the margin of error.”

Yes, actually understanding what the hell is going on is really helpful, but as a backstop, it’s good to have the basic stats knowledge that you can’t come to any conclusion from a handful of samples.


No, you're not seeing it from the management perspective. They've already selected their model and assumed that performance depends on the worker involved. There isn't any "they’re all performing the same within the margin of error" because that would violate a model assumption. It has a probability of ~0%, so there isn't any point checking for it.

You're working from a model where the performance could be the same and then doing the obvious statistical checks. Typically in the wild management have sufficient knowledge to do those checks if their model allowed for performance to be equal. But they don't have a model where that is a reasonable possibility so there is nothing to check. They're doing the correct statistics for the wrong model. And the model happens to be really stupid but that is managers for you.

You can teach them all that theory and they still won't apply it. The problem isn't usually a gap in statistical knowledge. Take another look at how Deming framed his experiment - if you look closely he is imparting statistical knowledge but that isn't where the real focus is. He's actually attacking the idea that the workers have any influence on outcomes.


I see the management perspective. I’m just saying that perspective is obviously wrong if you have any stats knowledge.

If your point is just that management could have that knowledge and fail to apply it, I won’t argue with that. But I definitely wouldn’t call it “doing their best” when that happens.


The paddle is a random-number generator. Stats are irrelevant, except to measure luck.


I think the lesson was designed so that this was just obvious enough for people to realize it themselves as the demonstration progressed.

Also, when done well, the teacher acts the part of the manager who keeps pushing people even as this realization spreads throughout the group.

The meta-lesson is something that Deming emphasized over and over: Don't rate your workers based on the outcome of a random number generator.


Exactly, but that's out-of-story knowledge. Presumably in-story, the "red beads" means something else which we don't have intuition about - downtime incidents, number of broken widgets produced, part yield...

If those (in-story) managers had a basic statistics knowledge, they would have said: "Wait! The variance is too high, you cannot make any decisions based on 2-4 points". Then they'd do some statistical tests (or at least plot points with error bars and eyeball it), and realize that the numbers given are random noise and are not actually related to individual performance.

But those managers apparently flunked their stats class, or they forgot it all, so innocent people got fired and the project got cancelled.

A sad story about why math is important even if you are not in STEM directly.
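As a back-of-the-envelope version of the "plot points with error bars and eyeball it" check above, here is a sketch using a normal-approximation 95% interval on each worker's red-bead rate. The tallies are made up for illustration (two days of 50 beads each), and overlapping intervals are only a rough sanity check, not a formal test.

```python
import math

def red_rate_interval(red_total, beads_total, z=1.96):
    """Approximate 95% interval for a worker's red-bead rate
    (normal approximation to the binomial)."""
    p = red_total / beads_total
    half_width = z * math.sqrt(p * (1 - p) / beads_total)
    return max(0.0, p - half_width), min(1.0, p + half_width)

def intervals_overlap(a, b):
    """True if two (low, high) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical tallies after two days: the "best" worker drew 15 reds
# out of 100 beads, the "worst" drew 25 out of 100.
best = red_rate_interval(red_total=15, beads_total=100)
worst = red_rate_interval(red_total=25, beads_total=100)
print("best: ", best)
print("worst:", worst)
print("intervals overlap:", intervals_overlap(best, worst))
```

With only 100 beads per worker the error bars are roughly 7 to 8 percentage points either side, so a gap that looks large on a scoreboard is still within what chance alone would produce.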


The workers control the paddle and the scooping of the beads.

A smart worker will eyeball their draw for red and redraw a few times until they get a good draw.

There may be other ways to flick off red beads by tilting and tapping.

None of this is excluded behaviour at least in this article.


When there’s an obvious workaround to the allegory, you can take it as implied that it’s not allowed.


Figuring out that the numbers are pure luck is the only important thing here.


Yeah, but anybody with statistical knowledge might have been able to guess that this is a random process and that not everything is down to individual performance.

The point of the thought experiment is to make us think about what the system contributes to team/individual performance. In reality every process might have a random component, but it might also have a feedback component: e.g. you give a project with many red beads to a team that did well before, only to have them (to us: predictably) underperform, then you give it to another team, etc.

The lesson here is to understand this underperformance doesn't necessarily mean they got worse, it just means they got the harder project to pull off and other teams might have underperformed even more.

If you as a manager focus on the totally fictional underperformance of one team vs the other you optimize things that had no bearing on the outcome. In fact you will make things worse as you make yourself believe the (in reality worse) team that only handled red-bead-free projects before will perform better.


A grounding in statistics shouldn't be needed to see that taking a scoop of mixed beads is likely to be mixed beads.

A variant of Gell-Mann amnesia arises when real managers see this demonstration, go back to their real jobs, and think "of course, my team's processes aren't as stochastic as that."

But then again, managers are also beads.


Here is a video of the experiment: https://www.youtube.com/watch?v=ckBfbvOXDvU




