Experiments at Airbnb (airbnb.com)
164 points by lennysan on May 27, 2014 | 43 comments



Airbnb could likely get a lot more bang for their buck by letting hosts run experiments on pricing than by testing button colors and whatnot.

I ran an online marketplace at a previous gig. Our service providers always complained that they didn't know what to charge to maximize their business. They couldn't see the forest as a tree. Because we had the data for all providers, we started letting them know if they were under- or over-priced, and we saw more conversions and revenue.

Dynamic pricing (like Uber does on holidays) alone could be hugely valuable.


> testing button colors and whatnot.

Isn't that exactly what they're not doing?

As a conversion optimization guy, I see my fair share of "button color" stories--which are as much of a mockery as "growth hacking". So I was glad to see they're doing this right: tracking conversions and paths by cohorts; cautious of false positives; testing big changes instead of, e.g., button colors; combining test results with qualitative data (usability tests) when making decisions; ...


I agree. I've had to spend some time lately on the host pricing page for an AirBnB rental and I would love some more advanced features.

I've previously thought about writing a little program that keeps an eye on the major hotels in the city, so I can work out when rooms are more in demand and amend my pricing.

Even a "this is the average cost of similar rentals within an X radius of you" would be helpful.

It would be beneficial to AirBnB too. They'd get more money in fees and money flowing through their system. It also works both ways - if I reduce pricing in low demand times, it's possible I'd get bookings I wouldn't have before - again increasing the amount for AirBnB.
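
For what it's worth, the "average cost of similar rentals within a radius" idea is not much code. A rough Python sketch, assuming a hypothetical list of `(lat, lon, nightly_price)` tuples (Airbnb exposes no such feed as far as this thread says) and ignoring what "similar" should mean:

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def average_nearby_price(lat, lon, listings, radius_km):
    """Mean nightly price of listings within radius_km, or None if there are none.

    `listings` is a hypothetical list of (lat, lon, nightly_price) tuples.
    """
    prices = [price for l_lat, l_lon, price in listings
              if haversine_km(lat, lon, l_lat, l_lon) <= radius_km]
    return sum(prices) / len(prices) if prices else None
```

A real version would also want to filter by room type, capacity and dates before averaging.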


Classic 'push this button to make more money' app, with a nice path to acquisition baked in.

Someone with an afternoon free should really step on this. Just make sure it doesn't end up starting price wars with itself!

Call it Autopilot, or something.


> Because we had the data for all providers, we started letting them know if they were under- or over-priced, and we saw more conversions and revenue.

Did your TOS allow for this? At one of the early SaaS firms we really wanted to publish metrics (well, sell..) about how each firm was doing relative to their peers.

They all wanted this info, but at the same time were strongly opposed to our using their data in the studies.


This talk may interest you (includes speakers from Uber and Airbnb.)

http://nerds.airbnb.com/openair-algorithmic-pricing/


The phrase "forest as a tree" tripped me up. Did you mean "forest for the trees"?


It's presumably based on the idiom "forest for the trees", with effectively the same meaning, but it also captures the fact that they are part of what makes up the forest.


A simple hack is to run an A-A-B-B test instead of an A-B test. Rather than splitting 50-50, use 25-25-25-25 splits. When A1==A2 and B1==B2, then you know that you have statistically relevant data and you can compare A to B. Depending on the dataset, this could happen in minutes or weeks.


To explain this in a different way, let's use a simplified example:

Suppose I have a website with a "Click Me" button that's green in color. I want to increase clicks and think to myself, "perhaps if it was a red button instead of a green button, more people would click!" To test this, I would run an A-B test along the lines of:

if random(2) == 0 then color='red' else color='green';

In theory, I just push this code and track the number of clicks on the red button versus the green button and then pick the best. But in practice, when I push the code, there might be 5 clicks on green and none on red in the first hour. Maybe green is better? Maybe I didn't wait long enough? Okay, let's wait longer. A few hours later, there's now 10 clicks on red and only 6 clicks on green. Okay, so red is better? Let's wait even longer. A week later, there's 5000 clicks on red and 4500 clicks on green. That seems like enough data that I can make a conclusion about red vs. green. But is there a better way?

This is where A-A-B-B testing can help. Let's start by looking at just the A-A part of the test. If I split my audience into two groups (green1 and green2) and show them both green buttons, the results should be identical because both buttons are green. If I check back in an hour and the "green1" and the "green2" groups are off by 20%, then I have a large margin of error and need to wait longer. If I check back in 6 hours and they're off by 10%, then I need to wait longer. If I check back in a day and green1 and green2 are only off by 1% then that means we've probably waited long enough and my margin of error is around 1%. I can now add green1+green2 and compare it to red1+red2 groups and see if there's a clear winner (e.g. red is 5% better). And this only took a day instead of a week!
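
The check described above can be sketched in a few lines of Python. The function names and the 1% default tolerance are illustrative assumptions, and agreement between two identical buckets is only a rough heuristic, not a formal significance test:

```python
def relative_gap(x, y):
    """Relative difference between two counts, as a fraction of the larger one."""
    larger = max(x, y)
    return 0.0 if larger == 0 else abs(x - y) / larger


def aabb_summary(green1, green2, red1, red2, tolerance=0.01):
    """Apply the A-A-B-B heuristic to click counts from four equal-sized buckets.

    Returns (ready, lift). `ready` is True once both same-color pairs agree to
    within `tolerance`. `lift` is red's relative change over green when ready
    (and green has any clicks), otherwise None.
    """
    ready = (relative_gap(green1, green2) <= tolerance
             and relative_gap(red1, red2) <= tolerance)
    green, red = green1 + green2, red1 + red2
    if not ready or green == 0:
        return ready, None
    return True, (red - green) / green
```

Early on, something like `aabb_summary(5, 4, 0, 0)` comes back not ready because the two green buckets are 20% apart; once the paired buckets converge, the pooled green and red totals are compared.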


Using four buckets instead of two like that will improve your confidence in the results, but will also double the required sample / testing duration. You could just as easily use two buckets and wait twice as long to achieve the same effect.


A/A testing (Null testing) or A/A/B testing gives a different effect than A/B testing.

Microsoft Research suggested (http://ai.stanford.edu/~ronnyk/2009controlledExperimentsOnTh...) that you continuously run A/A tests alongside your experiments. An A/A test can:

- Collect data and assess its variability for power calculations

- Test the experimentation system (the null hypothesis should be rejected about 5% of the time when a 95% confidence level is used)

- Tell if users are split according to the planned percentages
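
The "rejected about 5% of the time" point is easy to check by simulation. A minimal Python sketch, assuming a pooled two-proportion z-test and made-up traffic numbers (500 users per arm, 10% conversion); none of this comes from the paper itself:

```python
import math
import random


def two_sided_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (conv_a / n_a - conv_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def aa_false_positive_rate(trials, users_per_arm, rate, alpha, rng):
    """Fraction of simulated A/A tests that (wrongly) reject the null."""
    rejections = 0
    for _ in range(trials):
        a = sum(rng.random() < rate for _ in range(users_per_arm))
        b = sum(rng.random() < rate for _ in range(users_per_arm))
        if two_sided_p_value(a, users_per_arm, b, users_per_arm) < alpha:
            rejections += 1
    return rejections / trials


fp_rate = aa_false_positive_rate(trials=1000, users_per_arm=500, rate=0.10,
                                 alpha=0.05, rng=random.Random(0))
print(f"A/A tests rejecting the null: {fp_rate:.1%}")
```

If the printed rate sits far from 5%, that points at the assignment or logging pipeline rather than at the variants.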


Can you explain why? I'm struggling with the math behind the whole thing as it is, but intuitively this sounds like a very clever hack. I wonder why it would double the experiment time if effectively people are seeing either A or B variants.


That comment is brilliant, thanks for contributing it.

You'll probably have to ensure it applies sequentially too, at least to be sure As and Bs are stable in their matching, but it seems to me an elegant solution for the problem (not that I'm a statistician, though).


This is better than stopping when you get a statistically significant finding, which is nearly always the wrong thing to do. Do you have any math behind this?
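
On why stopping at the first significant result goes wrong: a small Python simulation, assuming an A/A setup (no real difference), a pooled z-test, and arbitrary traffic numbers. It counts how often a test looks significant at any of 10 looks versus only at the final one:

```python
import math
import random


def p_value(conv_a, conv_b, n):
    """Two-sided p-value for two equal-sized arms (pooled z-test)."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:
        return 1.0
    z = abs(conv_a - conv_b) / n / se
    return math.erfc(z / math.sqrt(2))


def simulate_peeking(trials, looks, users_per_look, rate, alpha, rng):
    """Run A/A tests and return (peeking_rate, fixed_horizon_rate).

    peeking_rate: share of tests significant at any look.
    fixed_horizon_rate: share significant at the final look only.
    """
    peeking_hits = fixed_hits = 0
    for _ in range(trials):
        conv_a = conv_b = n = 0
        significant_at_any_look = False
        significant_now = False
        for _look in range(looks):
            conv_a += sum(rng.random() < rate for _ in range(users_per_look))
            conv_b += sum(rng.random() < rate for _ in range(users_per_look))
            n += users_per_look
            significant_now = p_value(conv_a, conv_b, n) < alpha
            significant_at_any_look = significant_at_any_look or significant_now
        peeking_hits += significant_at_any_look
        fixed_hits += significant_now
    return peeking_hits / trials, fixed_hits / trials


peek_rate, fixed_rate = simulate_peeking(trials=500, looks=10, users_per_look=100,
                                         rate=0.10, alpha=0.05, rng=random.Random(1))
print(f"significant at any look: {peek_rate:.1%}, at final look only: {fixed_rate:.1%}")
```

Since the final look is one of the looks, the peeking rate can never be lower than the fixed-horizon one, and in practice it comes out several times higher.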


I'm not sure I understand - isn't that essentially an A/B test because 25 + 25 = 50?


I believe it lets you compensate for the possibility that, say, all of your conversions might be coming from the bottom 1% of your users. Segmenting A into A1/A2 therefore insulates your interpretation of the results for A from being as heavily skewed.


Yes but in your A/B test you shouldn't be picking first half vs second half. Each visitor should be randomly assigned, so it should mitigate the problem you mentioned.
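
One common way to get that per-visitor assignment is to hash a stable visitor ID instead of drawing a random number, so a returning visitor keeps the same variant. This is a swapped-in technique, not something the article describes; the function name and the `experiment` salt are assumptions for illustration:

```python
import hashlib


def assign_bucket(user_id, experiment, buckets=("A", "B")):
    """Deterministically map a visitor to a bucket, roughly uniformly."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(buckets)
    return buckets[index]


# The same call also covers the 25-25-25-25 split discussed above.
print(assign_bucket("visitor-42", "button-color", buckets=("A1", "A2", "B1", "B2")))
```

Salting with the experiment name keeps a visitor's bucket in one test from lining up with their bucket in another.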


Statisticians have spent time thinking about the right way to deal with these sorts of problems for a long time: https://en.wikipedia.org/wiki/Sequential_analysis.

Funnily enough, the page they reference for calculating the right sample size actually talks about sequential analysis, but AirBnB doesn't mention this in describing their solution...


HN user btilly has a really helpful essay on the math behind stopping tests earlier than your predetermined sample size. It calls for setting a maximum duration, and provides stopping points along the way. It works similarly to the method AirBnB describes.

http://elem.com/~btilly/ab-testing-multiple-looks/part2-limi...


As a developer working for a major competitor to airbnb on a shopping page, and having implemented hundreds of experiments on my page, I can say that these guys are way too obsessed with statistical certainty.

Rate of deployment of experiments is a better focus; since all your opponents are bound to copy your winners anyways, you have to rely on the few months' edge you've earned before they do so, and constantly maintain that lead.


Unless random high bias means your "edge" is exactly the opposite.


This is about proportionality. Finite window trading strategies need to take into account the link between implementation overhead (time) and overall profitability (linked to time). Just the same way they need to link pricing and profits (linked to pricing). That seems to be the crux of the issue.


Yes, I'll agree with that. Just warning about throwing the baby out with the bath water. Applying quantitative decision making tools is worthwhile; being picky about it can protect us from our emotions and preconceptions.


This article contains some serious p-value abuse. The p-value threshold should be adjusted to account for multiple testing. You do this to minimise the chance that a hypothesis would be accepted purely due to random chance.

Try setting your p-value threshold to your Type 1 error rate divided by the number of tests you perform. It will be much smaller, and this is a good thing. Significance should really test for significance, not random chance.
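
Dividing the error rate by the number of tests is the Bonferroni correction. A tiny Python sketch with made-up p-values (the function name is just for illustration):

```python
def significant_after_correction(p_values, alpha=0.05):
    """Flag which tests stay significant once alpha is divided by the test count."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


# Five hypothetical tests: three pass at 0.05, but the corrected threshold is 0.01.
flags = significant_after_correction([0.04, 0.001, 0.03, 0.2, 0.012])
print(flags)
```

Bonferroni is conservative, so with many tests it will also hide some real effects.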


I wish AirBnB would make the cost scale logarithmic, to match the fact that this is roughly how the prices will be distributed too. I'm usually only using the left-most 5% of that slider.
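
A logarithmic slider is just a change of mapping from slider position to price. A minimal Python sketch, assuming a slider position normalized to [0, 1] and a price range that starts above zero (the $10 to $1000 bounds are made up):

```python
import math


def slider_to_price(position, min_price, max_price):
    """Map a slider position in [0, 1] to a price on a logarithmic scale."""
    log_min, log_max = math.log(min_price), math.log(max_price)
    return math.exp(log_min + position * (log_max - log_min))


# On a $10 to $1000 range the midpoint lands near $100 rather than $505.
for position in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(position, round(slider_to_price(position, 10, 1000)))
```

That spends half the slider's travel on the cheap end, where most listings are.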


Isn't there any better alternative to sliders, though? I never use them, as I usually know exactly how much I'm willing to spend, and will probably not feel comfortable going out of my bounds anyway.


Can anyone point out a good introduction to some of the methods used in the article? Terms such as the p-value, treatment effect etc.


The cult of statistical significance is alive and well. A 0.05 p-value implies a 1:20 chance of "alternative" performing worse upon final installation. That's rather risk averse. It also implies that "alternative" is worse from the get-go. When is that the case? Type 1 and Type 2 errors are much more balanced in web apps. Anyone care to show me why that's a bad mentality?


No, it doesn't. It means that there is a 1 in 20 chance that you would have seen results as good or better if your change had no effect (assuming a standard one-sided hypothesis test). Thus if the effect appears to be good, you should take the test as some evidence that it is worth implementing.
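
To make that definition concrete, here is a Python sketch of a one-sided p-value for a conversion test, assuming a pooled two-proportion z-test with a normal approximation. The counts are hypothetical:

```python
import math


def one_sided_p_value(control_conv, control_n, treat_conv, treat_n):
    """P(seeing a lift at least this large | the change has no effect).

    Uses a pooled two-proportion z-test with a normal approximation.
    """
    pooled = (control_conv + treat_conv) / (control_n + treat_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / treat_n))
    if se == 0:
        return 1.0  # no variation at all, so no evidence either way
    z = (treat_conv / treat_n - control_conv / control_n) / se
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical: 100/1000 conversions on control vs 130/1000 on the change.
p = one_sided_p_value(100, 1000, 130, 1000)
print(f"one-sided p-value: {p:.3f}")
```

Note what the number conditions on: it assumes no effect and asks how surprising the data are, which is different from the chance that the change is actually worse.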


Right, that's a specific use, but I'm speaking of a two-sided test where you're indifferent between alternatives.


Ok, I'll be "that" guy who heckles every AirBnB post, even if this one did have some nice graphs (and ideas).

When is AirBnB going to experiment with helping their hosts follow the law? I bet I can predict that graph. Why, look at all those illegal rentals in SF right there in the sample screenshots--oh the irony.

Remember, DON'T FUCK UP THE CULTURE! But it's OK to fuck up your host city for a buck or 2 billion.


Following the law would mean not renting out their properties in SF. People clearly find value in doing so. If AirBNB goes away, the same owners will just switch to Craigslist or the next alternative.

Learn to deal with change instead of bitching about everyone who bucks the trend. I assume you feel the same way about Uber and Google Fiber displacing all those taxi commissions and telecom monopolies.


You rant about "follow the law" but who exactly is being hurt? The person making rent off his/her extra room? Please give examples of real harm rather than "follow the law" statements. I suppose you're the kind of person that'd turn in Anne Frank. That'd be following the law.


Well, if I lived in an apartment building, I'd like to have real neighbors, people that speak my language, enjoy a sense of community, and friends in the building. It would get quite annoying if I come home every evening to a new set of people on vacation or backpacking, and checking in and out of the surrounding apartments. The building shouldn't be a hotel, it's my home.

Now, I've used Airbnb a dozen times, but I can see the concern for people in the above situation. I've had people ask me when I'm renting certain Airbnb apartments if I live in the building, and they seemed a little upset with the idea of me just passing through for a week.



We have a Godwinner!!!

In case you haven't heard: http://en.wikipedia.org/wiki/Tragedy_of_the_commons

Also: there is a local law on the books, duly enacted by a democratic process, but you're arguing nobody needs to follow that law because you think nobody is being impacted? Is that your position? I have to assume you're an AirBnB host, so I wonder if you've contacted every one of your neighbors to see if they're cool with your gig.


Would you mind explaining how Airbnb is related to the Tragedy of the Commons? I was with it until the part about the actions being contrary to the common good and depleting a resource. What long-term good is Airbnb affecting? Clearly there's a sustainable market and model for their business or it wouldn't have been so successful, not to mention that hotels and motels had long since proven the market for people needing temporary places to stay.


In my view, there are two commons being affected by AirBnB: first the community and sense of place where you live (building, neighborhood, town, city, etc.), and second a stable and accessible housing market.

Community can have many facets: security, friendship, socialization, child-raising, and even intangibles such as feeling integrated and belonging (not for everybody, but important to many).

Market stability is important for financial security, financial planning, schools for children, sense of home, reduced stress (not having to move often, or even needing to think about it).

Now, neither of these things is a resource per se, but I believe both are sought-after and generally preferred when people are given a choice. There are many laws and regulations made to help preserve these commons and prevent their erosion. For example, Prop. 13 in CA helps with financial stability in rising markets. Further, people actively work (invest effort) into maintaining these two commons and most people recognize that localities where either or both of these exist are more desirable.

Of course, that makes them also more desirable for investment property, and AirBnB facilitates the monetization of such investments, often to the point of enabling the investment altogether (the people who buy extra properties for the sole purpose of short-term rental). And I think it's pretty clear that those cases are profiting from the commons and not contributing anything back--net withdrawals that undermine the commons.

The "change is inevitable" argument gets tossed around for a lot of things, but often by the people who are profiting the most by change, and the faster things change, the more money they make. I think that in the domain of housing and community, fast change is disruptive to people. A landlord with 4 investment properties may think nothing of flipping them or kicking out tenants to remodel and rent on AirBnB, but the people involved have several months of stress and uncertainty in their lives, not to mention changing schools and sometimes jobs as a result.

I think city governments need to regulate AirBnB and the like so that the investment potential they create does not overwhelm the community and stability of residents. This applies anywhere the housing market is saturated or nearly so and visitors are plentiful: SF, Seattle, Portland, San Diego, Honolulu, and much of the east coast probably as well. I advocate for an annual limit of 30-40 day short-term rental for any given residential property. That covers the situation where you go on vacation and rent out your house/flat, and where there's a convention/concert in town and all regular hotels are booked (and you have a spare bedroom to rent). The booking site should enforce this limit and collect all hotel and sales taxes on the transaction.


I appreciate the thought-out reply. I disagree, though, on the idea that a certain property being an AirBnB property lowers the value of the nearby properties, and think you're overstating the cascading effects by having it spill to schools and increasing stress.

My take on neighbors has always been that while it's good to know and like them, it's not required. A property being a discreet rental (as AirBnBs tend to be) would have neutral impact I'd argue. Other than seeing different people come and go, I would hardly notice that the property is anything but occupied. There is always the chance that someone unsavory would rent the property, but there's an equal likelihood (from my point of view) that a property would be purchased or long-term rented by someone of the same nature. I guess, frankly, I don't really care who lives next to me, since I do what I must to feel secure in my place.

I also think there's a bit of irony in your statement that change is championed by those who are profiting, when you then lobby for regulation in the form of taxation. Instead of the purely capitalistic model of rewarding innovation and ingenuity, we give money back to the government to regulate something that may or may not need regulating in the first place. But I hesitate to go further, lest this turn into a political debate.

I just want to make sure you understand that I do get your point of affecting the feel of a community and the things that come with it, but I think that is highly subjective, and I'd even say that folks who look for that kind of experience in a living situation wouldn't be the kind to list their home on AirBnB anyway, and wouldn't sell to a flipper or someone who would.


(1) Tragedy of the Commons? Are you saying an extra room in your own home is "The Commons"? (2) "... nobody needs to follow that law because you think nobody is being impacted? Is that your position?" Yes. That is my position. At one point in the US it was illegal to be gay. It was illegal for blacks to drink from a white person's fountain. Those were laws "duly enacted by a democratic process". I'm saying that before you comply with a law or suggest others should comply with a law, you ask yourself, "is this law just?" If you think so, great. Back up the virtue of the law itself. My point remains: a law that is unjust doesn't deserve to be followed regardless of the process that enacted it.


For my definition of the commons being eroded by AirBnB, see my reply to eddieroger above.

In a sense you are right that "democratic" doesn't necessarily mean just or morally good, but I don't think you can compare what are essentially civil rights with property rights. I agree that laws should be backed with arguments about why they are needed and beneficial--though you realize it's not always possible to go into such details when I'm already so far off topic :-)

So here is my quick defense of SF's banning of short-term rentals (essentially making all AirBnB rentals in SF illegal): Regulating property is about zoning and controlling the market so that financial forces don't overwhelm the people involved. It is the city's responsibility to keep the city livable for its residents. This regulation defends local communities and avoids the instability of speculation properties, evictions for AirBnB conversions, etc.

Note that I think the total ban is both slightly too strict and probably expensive to enforce. That's why I advocate for a new policy: allow any owner (and renter) to do short-term sublets up to a limit of 30-40 days per year. For renters, they are limited to collecting the full amount of their rent in any calendar month, and any additional money collected belongs to the landlord. And finally, any 3rd party booking service must enforce these limits, collect hotel and sales tax, and turn records over to the city. That way people (even renters) can rent out when they go on vacation or make a little extra money, the city gets an elastic supply of rooms for big conventions, concerts, and sports events, but residential stays residential the other 330 days of the year and housing doesn't get bought up by speculators, nor does it have a bubble due to the value of AirBnB conversion.


I understand what you're aiming for but you risk insulting very just causes by trying to equate them to AirBnb even if it's obviously unintended.



