Hacker News

Wouldn't it be nice if policy changes were accompanied by an A/B testing plan to evaluate their impact? I have always thought so. I have also seen a major pitfall of A/B testing that real humans can hand-pick and slice data to make it sound as positive or negative as wanted. Nonetheless, the more data the better.





We already had A/B testing of congestion pricing. The A test was NYC without congestion pricing, and it ran for decades.

An important part of testing is establishing assessment criteria and collecting data.

I wish more laws would pre-state what their intended outcome and success would look like.


you haven't described the observations or the sample

TFA describes this extensively. The observations are traffic speed, bus timeliness, and over a dozen other metrics. The samples are sub-areas of NYC.

Anything negative?

That's not an A/B test because it has no way of controlling for broader economic trends over time. How do you figure out if what you're seeing is because of that one thing that changed, or the enormous list of other things that also changed around the same time?

A more valid design would be randomly assigning some cities to institute congestion pricing, and other cities to not have it. Obviously not feasible in practice, but that's at least the kind of thing to strive toward when designing these kinds of studies.


That would be a bad design for an A/B study (and NYC congestion pricing is not a “study” anyway), because cities are few and not alike and have an enormous list of other things that are different. What NYC equivalent would you pick?

In any case, not every policy change needs to be an academic exercise.


Yup, that is indeed a part of the problem. You'll notice I did say, "Obviously not feasible in practice."

I've got a textbook on field experiments that refers to these kinds of questions as FUQ - acronym for "Fundamentally Unanswerable Questions". You can collect suggestive evidence, but firmly establishing cause and effect is something you've just got to let go of.


Everyone knows how you can conduct good experiments in a land of frictionless spherical cows.

> randomly assigning some cities to institute congestion pricing, and other cities to not have it

Cities are stupidly heterogeneous. These data wouldn't be more meaningful than comparing cities with congestion pricing to those without. (And comparing them to their own pre-congestion-pricing eras.)


The real world isn't A/B tests. No government is going to spend millions on equipment and infrastructure for a congestion zone because some engineers are like "Let's just test this out. I have done zero research on what could possibly happen, but it would be fun to see what the results are."

When you write it out like that, it seems to make total sense! But then you read grant proposals that get funded - in things like the social sciences and humanities, and even conventional science and health - millions of dollars essentially just throwing darts to see what sticks.

Surely you see the difference between working in a development environment and working in production?

The comment to which I replied was referring to the cost, not to the implementation

> Wouldn't it be nice if policy changes were accompanied by an A/B testing plan to evaluate their impact? I have always thought so. I have also seen a major pitfall of A/B testing that real humans can hand-pick and slice data to make it sound as positive or negative as wanted. Nonetheless, the more data the better.

Policies have different effects depending on how likely people judge them to be long-term changes. Construction along a route will cause people to temporarily use alternative forms of transportation, but not e.g. sell their car or buy a long-term bus pass.

Yes, the inability to know counterfactuals will make judging policies more subjective than we might like. The closest we get to A/B testing is when different jurisdictions adopt substantially similar policies at different times. For example, this was done to judge improvements from phasing out leaded gasoline, since it was done at different times and rates in different areas.
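The staggered-adoption idea above is basically a difference-in-differences comparison. Here's a minimal sketch with entirely made-up numbers (not real NYC or leaded-gasoline data): subtract the never-adopting jurisdiction's change over time from the adopting one's, so shared time trends cancel out.

```python
# Difference-in-differences sketch with made-up numbers (not real data).
# "treated" adopts the policy between the two periods; "control" never does.
# Subtracting the control group's change nets out trends shared by both.

treated = {"before": 10.0, "after": 6.0}   # hypothetical metric, e.g. avg delay (min)
control = {"before": 11.0, "after": 9.5}

naive = treated["after"] - treated["before"]                      # -4.0
did = naive - (control["after"] - control["before"])              # -4.0 - (-1.5) = -2.5

print(f"naive before/after change: {naive:+.1f}")
print(f"diff-in-diff estimate:     {did:+.1f}")
```

The naive before/after number (-4.0) overstates the policy effect here, because part of the drop (-1.5) happened in the control jurisdiction too.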


please don't quote the entire comment you're replying to

sometimes people edit their post after the fact, so it can be important to quote it to ensure the context is preserved

Yeah, let's do that for everything: safety belts, safety on gun triggers, melamine in milk, etc...

Do you A/B test your comments too?


unfortunately, building a second NYC for the purposes of A/B testing isn't feasible.

but we have before and after data to compare - that's what this article is about. and the congestion pricing plan included requirements to publish data specifically for the purposes of comparison between last year and this year.


Unfortunately, the possibility exists that the moment of introducing the A/B test requirement will be strategically chosen to freeze the status quo in the way the chooser prefers.

What a good idea. Simply build another Manhattan for the purpose.

test A - before

test B - after

what are you talking about?


“A/B in time” suffers from inability to control for other factors that might vary over time. In this case, that could be the economy or other transit policies.

But sometimes it’s the only possible approach.


"before" and "after" introduces a large axis of noise

The problem is that for A/B testing to really work you need independent group outcomes. As soon as there is any bias in group selection, or any cross-group effect, it's very hard to unpick.


Generally, that's considered to introduce confounding factors on the time axis ("did we see improvement because we changed something or because flu season hit and people stayed home") that you'd prefer to mitigate by running your A and B simultaneously.

But in the absence of the ability to run them simultaneously, "A is before and B is after" can be a fine proxy. Of course, if B is worse, it'd be nice if you could only subject, say, 5% of your population to it before you just slam the slider to 100% and hit everyone with it.
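The confound-vs-simultaneity point above can be seen in a toy simulation (all numbers invented): when an unrelated time trend moves the metric between periods, a before/after comparison absorbs the trend, while a simultaneous random split in the same period does not.

```python
import random

random.seed(0)

# Toy simulation, all numbers invented. The true policy effect is -2.0,
# but the metric also drifts by -1.5 between periods for unrelated reasons
# (the economy, weather, other policies).
TRUE_EFFECT = -2.0
TREND = -1.5

def metric(period, treated):
    base = 10.0 + TREND * period + (TRUE_EFFECT if treated else 0.0)
    return base + random.gauss(0, 0.1)  # small measurement noise

# "A is before, B is after": the two arms live in different periods.
before = [metric(0, treated=False) for _ in range(1000)]
after  = [metric(1, treated=True)  for _ in range(1000)]

# Simultaneous A/B: both arms measured in period 1.
arm_a = [metric(1, treated=False) for _ in range(1000)]
arm_b = [metric(1, treated=True)  for _ in range(1000)]

mean = lambda xs: sum(xs) / len(xs)
print(f"before/after estimate: {mean(after) - mean(before):+.2f}")  # ~ -3.5 (trend + effect)
print(f"simultaneous A/B:      {mean(arm_b) - mean(arm_a):+.2f}")   # ~ -2.0 (effect only)
```

The before/after estimate conflates the trend with the policy effect; the simultaneous split recovers the effect alone, which is exactly why it's preferred when it's feasible.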


yes, but how does he propose to A/B test a whole-Manhattan policy? Build another Manhattan just for the test? Makes no sense. The whole of Manhattan is what matters, not 5% of it, so no 5% rollout. An A/B test only works for things that affect people individually, like a GUI: a big group under test, but the effect lands on each individual separately.

At that scale, an A/B test is a tool to deceive, not to reach the right conclusion.


It is, indeed, much easier to do A/B testing online in environments you control than IRL.

(Purely hypothetically: one could identify 10% of the island as operating under the new rules and compare outcomes. This is politically fraught on multiple levels and also gives messy spatial results.)



