Ask News.YC: Any studies on whether social filtering works?
11 points by aston on May 25, 2008 | 6 comments
Web 2.0's clearly about people, and the more people you get, and the more you know about them, the more your company's worth, right? The social graph, while not yet clearly monetizable, is universally considered valuable for the predictive ability we think it has or will soon have.

Which leads to my question: Does data about my friends actually help at all when it comes to guessing things for me? I know Amazon, Netflix and others have demonstrated the predictive power of aggregated data, but my gut says restricting that data only to people I know will make it worse, not better. And if indeed the friend graph decreases predictive ability, what's the value of the social graph beyond the network effect/lock-in to a service?

Pure speculation is cool, though if anyone has hard data on this, that'd be even awesomer.



I would assume it's better to base predictions about me on somebody unknown to me but with strikingly similar purchasing habits than to compare me only to known associates.

Part of the problem with the social graph, particularly as it exists on broad networking sites like Facebook et al., is the lack of data about the nature of those relationships (i.e., "the more you know about them"). I've got however many dozen friends, but the social graph there doesn't distinguish between the guys I went to primary school with and the guys I see every weekend.

If I could better characterize my relationships, more accurate data could be created. E.g., if I declare Dave is my 'drinking buddy' and Dan is my 'film friend', the system could make reasonable predictions about what I might like to drink based on Dave and what films I might watch based on Dan, while ignoring the reverse pairings.

In other words, the value of the social graph requires more meaningful information about relationships, not just more people. Without that, aggregated data wins out.
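
A minimal sketch of the labeled-relationship idea above, in Python. Everything here is hypothetical and for illustration only: the `friend_roles` and `friend_likes` data, the domain names, and the `recommend` function are assumptions, not any real site's API. It only counts items liked by friends whose label matches the domain being asked about.

```python
from collections import Counter

# Hypothetical data: each friendship carries the domains it is relevant to.
friend_roles = {
    "Dave": {"drinks"},  # 'drinking buddy'
    "Dan": {"films"},    # 'film friend'
}

# Hypothetical likes per friend, grouped by domain.
friend_likes = {
    "Dave": {"drinks": ["stout", "porter"], "films": ["Alien"]},
    "Dan": {"drinks": ["lager"], "films": ["Brazil", "Alien"]},
}


def recommend(domain, roles=friend_roles, likes=friend_likes):
    """Return items liked by friends whose relationship label matches `domain`,
    most frequently liked first."""
    votes = Counter()
    for friend, domains in roles.items():
        if domain in domains:
            votes.update(likes.get(friend, {}).get(domain, []))
    return [item for item, _ in votes.most_common()]
```

With this data, `recommend("drinks")` draws only on Dave's drinks and `recommend("films")` only on Dan's films; Dan's lager and Dave's film picks are ignored.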

You're right - pure speculation is cool!


Turns out that "it depends"; a recent paper by Jon Kleinberg, Sid Suri and others suggests that for Wikipedia edits, social networks are more important predictors, whereas for LiveJournal, "similarity networks" are better.

For hard data, google the paper (it's a preprint so I'm not gonna link to it): Sid Suri site:cs.cornell.edu, and check out the Papers link.


Made me take the long way to the paper, but from the abstract it's exactly what I was looking for. Thanks.


Social filtering absolutely works in the small: consider how often your friends and co-workers can recommend things that are of interest and of use to you. How you construct an application that facilitates or automates this is a different question.


It all depends on how the data is aggregated and how a specialized service like yours uses that data.

For example, LinkedIn collects data from the end user whenever there is a communication event:

1) adding a profile: name/school/college/specialization/job industry/company

2) adding a new job - job industry/company

3) adding a contact - type of relationship

4) asking a question - tagged to an industry vertical/set of people

So the granular details you want (the ones that would help your service serve users better) define what data you need to collect from the start; then you can put that data to use.

Pure speculation is cooler with the above set of data.


From a theoretical perspective, what you're assuming above is that you apply a social filter first and then use collaborative filtering (i.e. the method used by Amazon, Netflix and similar). In fact the two supplement each other: run collaborative filtering over the entire customer database, then do a weighted merge of that with an algorithm that ranks things based on social graph traversal.
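
A rough Python sketch of that weighted merge, under stated assumptions: both rankers are taken to have already produced per-item scores in [0, 1], and the function name `blend_scores` and the `social_weight` parameter are made up for illustration. How the collaborative-filtering and graph-traversal scores get computed is left out entirely.

```python
def blend_scores(cf_scores, social_scores, social_weight=0.3):
    """Merge two {item: score} dicts with a linear weight and return
    the items ranked from highest to lowest blended score.

    cf_scores:     scores from collaborative filtering over all customers
    social_scores: scores from a social-graph-based ranker
    social_weight: assumed tuning knob in [0, 1]; 0 ignores the social graph
    """
    items = set(cf_scores) | set(social_scores)
    blended = {
        item: (1 - social_weight) * cf_scores.get(item, 0.0)
        + social_weight * social_scores.get(item, 0.0)
        for item in items
    }
    return sorted(blended, key=blended.get, reverse=True)
```

An item missing from one ranker simply scores 0 there, so the social graph can promote things friends like without hiding what the full customer base suggests. The right weight would have to come from measurement, which is the original question.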



