Mining Product Hunt, Part 1: Detecting Vote-Rings (algorithmia.com)
130 points by ANaimi on July 22, 2015 | 40 comments



The issue with Product Hunt is that they don't care about voting rings. Just look at Twitter for "product hunt upvote": https://twitter.com/search?q=product%20hunt%20upvote&src=typ...

I've criticized PH for allowing voting rings and being a "black box" in general with respect to voting (see interview in Recode which had a comment from me: http://recode.net/2015/06/18/product-hunt-the-startup-kingma... ). The response to that interview is essentially "haters gonna hate," which means all attempts at offering suggestions to improve Product Hunt will be futile.

Yes, it is still an echo chamber, which I believe actually hurts the startup community because it incentivizes them to work on products which have zero value outside the echo chamber. It also incentivizes this bad behavior, and it has been spreading outside of Product Hunt, which is why I'm upset by it.

If Product Hunt has enough community awareness such that they can tweet a GIF to every user who gets more than 100 points (seriously?!), then they have enough manpower to effectively punish and discourage vote manipulators.


"Yes, it still an echo chamber which I believe actually hurts the startup community because it incentives them to work on products which have zero value outside the echo chamber."

This. I hate this. I love the idea behind Product Hunt and I was on there early on. I've discovered some great products from there but I find it increasingly becoming centered around Product Hunt itself and startups in general.

I suppose that's only natural as the community posts for the community, but I feel like there's so much more potential with a site like this.

I'm not in the valley. I'm in Chicago. And Product Hunt increasingly feels like some huddle of startup guys in the valley smiling at each other over the latest do-hickey they made.


Maybe an example of Goodhart's Law in effect?

https://en.wikipedia.org/wiki/Goodhart%27s_law

(Cheers, from a fellow Chicagoan.)


We do have voting ring detection in place and often inform makers of our policy on Twitter when we see them asking for upvotes (often times they simply don't know it's not OK) in addition to highlighting this policy in the email that people receive when they're tagged as a maker.


Hi Ryan,

While I have to take your word that you tell people not to ask for votes, that's not sufficient to create a fair system without manipulation. The Product Hunt demographic is "growth hackers" and scrappy entrepreneurs who want to do anything to get their startup to succeed. These are the same people who want to "move fast and break things." The flip side of that is that they know that what they do is wrong, but they can get away with it without much consequence.

Case in point, there's a lot of meta discussion on "how to tell people to vote for your PH submission without getting flagged by the voting ring detector" (http://www.reddit.com/r/startups/comments/36hsfc/product_hun... ) and "how to get upvotes on PH because you know an influential investor." (https://medium.com/ferris-life/zero-to-featured-how-ferris-c...).

Voting manipulation, including but not limited to voting rings, completely compromises Product Hunt's mission to "find the best products." A one-liner in the FAQ will not resolve it, and I don't believe PH is doing enough to raise awareness and enforce that voting manipulation, in any form, is very bad.


(cto here)

We work similarly to HN: we have spam ratings on votes, and we devalue and derank posts based on those.

Essentially, if you ask for upvotes there is an (almost too high) chance that our system will notice and punish your post for this. Sadly, a lot of really good products drop because of this.

Regarding those blog posts: we don't really say whether they are right or wrong (for obvious reasons), but those kinds of articles exist for any system/website/mechanic/process where users believe they can get a benefit if they manipulate it.


Wait, seriously? I had no idea it wasn't OK to ask for up-votes. I guess things have changed: https://twitter.com/rrhoover/status/453903533083856897


Product Hunt has always felt 'off' to me. It has always had a sense of a closed club open for public viewership. Even though they have opened it up a bit, it still feels the same.


Yeah honestly the community doesn't come off as sincere or genuine, so I've never felt compelled to participate or even visit the site more than a handful of times.


Man it feels good to hear somebody else say this. I feel like I'm shooting myself in the foot sometimes business-wise by not participating regularly in Product Hunt but I find that it's just not inviting. I don't feel a part of the club. I browse, find cool products, click through, and that's about it.

But I rarely comment as it just feels forced, like I'm standing next to a circle of people having a conversation at a networking event and I'm a few inches outside the circle, and it's clear I'm not a part of that circle.


Well, our method showed less than 1% of posts having a collusion ratio higher than 0.5. I personally think it's very reasonable for such a large community.


I think my perception came from the side of submitting products. For the longest time they were hand picked and you had to 'know someone' to get on the site. Which is fine, it's not a public good, the PH team is free to use it as they wish. Nowadays I see so many 'guides' on how to get on PH and how to maximize potential when you do get on PH.


It's just another marketing platform, right? Nothing really "crowdsourced" about it. We dealt with similar voting ring issues when I worked on another industry forum, but people were really not subtle about it, even when it was explicitly discouraged. Sort of mindblowing, actually.


And by unsubtle, I mean all the accounts had the same prefix and similar profile pictures.


This is a very clever article. I suggest further research into whether vote-rings are actually indicative of product teams and founders with strong networks who they are able to motivate to support them. It may be that up votes from voting rings are just as useful (or perhaps more) in determining the likelihood of success of a product.


Is it still a vote ring if it's "unconscious" ?


This is/was a problem on Stack Overflow (retired mod here). Work colleagues in the same office or company upvoting each other's questions or answers. You could often tell it was fairly innocent from the activity on their accounts (active, positive participation, asking and answering with reasonably good posts) and from the spread of votes (lots of upvotes given to/from unrelated users outweighing their votering count).

But sometimes the votering detection would ring bells and when contacted these users had genuinely not considered what they were doing was creating a votering, yet were willing to understand the problem at hand and back off from each other a bit.


I'm the admin for one of the rare instances of Stack behind a corporate firewall, which doesn't have votering detection. I've noticed this happening among a few co-located sprint teams. I created a d3.js Sankey diagram to show the volume of people voting for other people and posted it on the site to let the community discuss it, and it died down.
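
A minimal sketch of the aggregation step behind a diagram like that, assuming the vote log is available as `(voter, author)` pairs. The function name `vote_flows` and the `source`/`target`/`value` link shape are illustrative assumptions, not the code actually used on that site.

```python
from collections import Counter


def vote_flows(votes, min_count=1):
    """Count votes per (voter, author) pair, skipping self-votes.

    `votes` is an iterable of (voter, author) tuples (assumed input shape).
    Returns a list of link dicts ordered by descending vote volume.
    """
    counts = Counter((voter, author) for voter, author in votes if voter != author)
    return [
        {"source": voter, "target": author, "value": n}
        for (voter, author), n in counts.most_common()
        if n >= min_count
    ]
```

Raising `min_count` hides one-off votes so that only repeated voter-to-author flows remain visible.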


Hm...probably not.

https://news.ycombinator.com/item?id=1647826

Voting rings undermine the exposure of truly successful products, while giving preference to a friend's possibly inferior product.


To valee's point though. I also believe collusion ratings could be found very high in cases where this is not a "voter ring" issue, but more an issue of who has a better network and/or is better at marketing.

Take Amy Hoy for instance (at least some people here on Hacker News will know who she is). She has more than 16,000 followers on Twitter right now. I have 79 followers. She is good at marketing and it must have taken time for her to build up all of those followers. If she were to put several products on Product Hunt, and I were to put up competing products to each of those, and we both tweeted about our Product Hunt postings, naturally she would get more upvotes. Some of these upvotes would likely come from the same people, some of her most fanatical loyal customers.

To the algorithm this might look like collusion/voting ring, but it is just some basic and completely ethical marketing.


After several rounds of the same people always being in the first ten people to vote, how doesn't that fit the definition of a voting ring?


I'm not saying the voting ring problem doesn't exist, but that the question of "is this ethical" is one that is more difficult to answer than this algorithm. I think the developers who ran this collusion algorithm must agree with this to some extent, otherwise there would be no reason to pseudo-anonymize the results.

To give an example. If I were to create fake accounts and then do some type of automated up-voting system for my own posts; in my mind that is clearly unethical behavior.

If I announce to my existing customers that I have posted something and they go of their own free accord and up-vote, this is clearly ethical behavior.

If I ask my existing customers/friends/family to go up-vote me, I believe this falls into a grayer area, but I would personally say this too is unethical.


Ok but why is ethics the most important consideration here? The consideration for the community is that the quality of the links is good and whether ethical or not, posting everything publicly and having THE SAME people be in the first 10 upvotes means there is a system that takes quality of the article into account less than the identity of the submitter, and it's been doing this over and over.

So ethics may not be the important question here, for the community.


So I think there are (at least) a couple of separate issues here.

1) Is something an ethical marketing tactic?

This is mostly what I was trying to address. If something is the result of ethical marketing then I'm of the opinion, "We'll allow it!" The reason for this being that "the better product" will fail if marketing is poor; and so people are typically better off going with an inferior product that survives.

2) Is Product Hunt allowing/perpetuating voter rings?

I think this goes to your issue of "THE SAME people be in the first 10 upvotes". Honestly I'm not sure, but I do think Product Hunt's model is flawed. I think this message on Product Hunt alone gives some indication of the problem:

"Product Hunt is a community of product enthusiasts. Submissions are accepted by our most active members, specifically those that have been invited by others in the community."

It is an invite only, non open community. So in a certain sense the entire site is one giant voter ring; we all can vote, but we can't submit. So all voters are limited to voting on products from those deemed worthy by Product Hunt of posting in the first place.

If we were to contrast this with Eduhunt.co (same as Product Hunt, but targets only educational products) it is an open community that allows anyone that registers to post.


Agreed, this type of analysis only opens a direction to look further into and does not provide any absolutes (like most data analysis).


I would enjoy seeing the same with Hacker News. It would be interesting to see how affiliation with Y Combinator influences the vote.


This isn't possible with Hacker News since the voters are not public.

I did a brief analysis that shows that the (YC X) submissions do receive more upvotes on average, which is hard to attribute to a voting ring specifically.


I often feel more inclined to upvote them because, in the end, YC created this community and I feel their incubees deserve a bit of that reflected glory.

It's my way of saying thanks, and I suspect other people's too.


https://news.ycombinator.com/item?id=5006037

There's some information here. PG was late to the conversation, but said there isn't anything, while others said the exact opposite before he posted.

Of course a lot could have happened in the 929 days since that was posted.


YC founders still see each others' names in orange, but that's it.


It would also be interesting to see how the big developer conferences influenced the vote on HN.


This could also be used as another approach to a recommendation engine. The typical recommendation engine predicts that users who buy butter also buy eggs, but doesn't make direct connections between the individual users. The process described in this article instead identifies individual users who act alike, and that can also be used to predict that if user A buys an ostrich egg for no logical reason, then their "collusion" peers are also highly likely to buy an ostrich egg ... you can assume that if they're not colluding directly then they at least think alike if they have a very high collusion ratio.
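
A rough sketch of that idea, assuming each user's history is a set of item names and using plain Jaccard similarity as a stand-in for the article's collusion ratio. The names `jaccard`, `recommend` and `min_similarity` are hypothetical; this is not the Part II engine.

```python
def jaccard(a, b):
    """Overlap between two sets: |a & b| / |a | b| (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, items_by_user, min_similarity=0.5):
    """Suggest items that the target's closest "peers" have but the target lacks.

    A peer is any other user whose Jaccard similarity to the target is at
    least `min_similarity` (an assumed cut-off). Items are ranked by the
    summed similarity of the peers who have them, then alphabetically.
    """
    mine = items_by_user[target]
    scores = {}
    for other, theirs in items_by_user.items():
        if other == target:
            continue
        sim = jaccard(mine, theirs)
        if sim < min_similarity:
            continue
        for item in theirs - mine:
            scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores, key=lambda item: (-scores[item], item))
```

With users A = {butter, eggs, ostrich egg} and B = {butter, eggs}, B's only peer above the cut-off is A, so B is offered the ostrich egg.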


Part II is a recommendation engine we built for PH.


I think there is a lot of frustration from some people with regards to PH and it’s understandable. But I don’t know if PH can be blamed for this. There has to be some kind of curation for something like this, and whether it’s a journalist (ex: TechCrunch), a community of people (PH), an editorial team (AppStore), as with all curation systems, there will always be people who feel it’s “unfair” (typically when their thing does not get selected).

I think what this fails to capture is that actually, it seems to me PH is a great proxy to the real world of startups. Yes it helps to be connected to influential people to be featured on PH. But the exact same thing is true for your startup in general. If you don’t know anybody, you’ll have a hard time getting noticed and finding investors. Your network and ability to connect to influential people can make or break your venture (I used to NOT think it was the case… I changed my mind based on my personal experience :-) ). I think that’s one of the very reasons Silicon Valley works much, much better than other places in the world: London, Paris, etc. They are a tightly connected community of makers, investors, journalists, influencers etc. It’s the whole echo-chamber thing and it’s absolutely fundamental.


Calling it a "vote-ring" is a bit of a logical leap; that would imply that it's individuals purposefully upvoting one another's products.

Yet with such a low number of posts meeting this test's standards, and the fact that Product Hunt was specifically designed to focus on influencers, it doesn't seem unlikely that a small group of users would vote in similar patterns on either quality products or ones shared by influencers they follow.

And when you take into account that they share posts from influencers you follow via email, it just adds to this behavior. The last few times I've been on Product Hunt were because I got an email that Hiten Shah had shared something. I follow him because we have similar interests. I opened the PH link, agreed it was a great find and upvoted it. It doesn't make me a vote-ring.


Is it possible to run this algorithm efficiently for a large number of users? My intuition is telling me that the naive implementation is factorial time due to the problem of selecting combinations of users and posts to test.


The formula/collusion ratio is exponential, but not factorial. The implementation however is very efficient: instead of applying the algorithm on the complete dataset of users, we apply it to each group of users within a post. This drastically reduces the running time.

The implementation goes over every post and computes the ratio for voters within that post. It then removes one user from that group and recalculates the ratio. If the ratio drops, it brings that user back in. If it increases, it keeps them out.

You can check the implementation here (click on Edit Algorithm): https://algorithmia.com/algorithms/ANaimi/SimpleVoteRingDete...

Running SimpleVoteRingDetection on the complete Product Hunt dataset (16k+ posts, 52k+ users) takes a few seconds. If you have a dataset for any other website/application, you can easily feed it into the algorithm and experiment with that.
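
The per-post greedy pass described above can be sketched roughly as follows. This is not the SimpleVoteRingDetection source: the collusion ratio here is an assumed stand-in (mean pairwise Jaccard similarity of the voters' vote sets), and `collusion_ratio`, `refine_group`, `detect_rings` and the 0.5 threshold are illustrative names and values.

```python
from itertools import combinations


def collusion_ratio(group, votes_by_user):
    """Assumed metric: mean pairwise Jaccard similarity of members' vote sets."""
    pairs = list(combinations(sorted(group), 2))
    if not pairs:
        return 0.0  # a single user cannot collude with anyone
    total = 0.0
    for a, b in pairs:
        va, vb = votes_by_user[a], votes_by_user[b]
        total += len(va & vb) / len(va | vb)
    return total / len(pairs)


def refine_group(voters, votes_by_user):
    """One greedy pass: drop a voter only if doing so raises the ratio."""
    group = set(voters)
    best = collusion_ratio(group, votes_by_user)
    for user in sorted(voters):
        if len(group) <= 2:
            break  # keep at least a pair
        candidate = group - {user}
        ratio = collusion_ratio(candidate, votes_by_user)
        if ratio > best:
            group, best = candidate, ratio  # ratio increased: keep them out
        # otherwise the ratio dropped or stayed flat: the user stays in
    return group, best


def detect_rings(votes_by_post, threshold=0.5):
    """Run the greedy pass per post and keep groups above the threshold."""
    votes_by_user = {}
    for post, voters in votes_by_post.items():
        for user in voters:
            votes_by_user.setdefault(user, set()).add(post)
    rings = {}
    for post, voters in votes_by_post.items():
        group, ratio = refine_group(voters, votes_by_user)
        if len(group) >= 2 and ratio > threshold:
            rings[post] = (group, ratio)
    return rings
```

Because each pass only looks at the voters of a single post, the cost depends on how many people voted on that post rather than on the total number of users.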


Cool solution. I like it.

Is it helpful to first look at the names and sign-up times of a particular set of users, and then search for votes on common posts? This would result in a slightly different ratio:

SUM Votes(U1, P) / Votes(Un, P)

where U1 is a particular user, P is the post voted on by that user, and Un is the rest of the users up to n total users.

The reason this occurs to me is because you can still make this run more efficiently by limiting the number of users you examine (as opposed to running across only certain posts - should be the same number of queries for a particular number of either users or posts), and it would allow you to start the top of the detection funnel on heuristics around obviously fake IDs or correlated sign-up times.

This might help get around vote bots that set up fake accounts and all vote for the same posts, but also vote randomly at a certain frequency for other posts (which would not be differentiated in the first algorithm from a true vote ring versus voters with similar trends in taste, such as the effect observed on Pinterest).

Anyway, just thinking out loud. Or whatever the typing equivalent of that is.
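
One possible reading of that user-first funnel, as a hedged sketch: cluster accounts by sign-up time first, then measure how much of one seed user's voting the rest of the cluster shares. The ten-minute window, the function names, and the interpretation of the ratio as "fraction of the seed's posts also voted on by each other user" are all assumptions.

```python
from datetime import datetime, timedelta


def signup_clusters(signups, window=timedelta(minutes=10)):
    """Group users whose consecutive sign-up times are within `window`.

    `signups` maps user -> datetime. Clusters of a single user are dropped.
    """
    ordered = sorted(signups.items(), key=lambda kv: kv[1])
    clusters, current = [], []
    for user, when in ordered:
        if current and when - current[-1][1] > window:
            clusters.append([u for u, _ in current])
            current = []
        current.append((user, when))
    if current:
        clusters.append([u for u, _ in current])
    return [c for c in clusters if len(c) > 1]


def shared_vote_fraction(seed, others, votes_by_user):
    """For each other user, the fraction of the seed's posts they also voted on."""
    seed_posts = votes_by_user.get(seed, set())
    if not seed_posts:
        return {}
    return {
        user: len(seed_posts & votes_by_user.get(user, set())) / len(seed_posts)
        for user in others
    }
```

Only the clusters that survive the sign-up filter need the vote comparison, which is where the savings over scanning every post would come from.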


"We removed users’ names to protect the innocent. Instead, we’re showing random fruit names prepended to the real users’ ids."

Can't someone just go look up the IDs to get real names?


Yeah... but the purpose of the post is to explain the methodology not to point fingers. We kept the IDs because they are useful to demonstrate whether users created those accounts consecutively. Product Hunt's public API can be used to find the real names and even expand this method further.
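
As an illustration of why the IDs are useful, here is a small sketch that finds runs of consecutive numeric IDs among a post's voters. It assumes IDs are integers assigned in sign-up order; `consecutive_runs`, `max_gap` and `min_length` are hypothetical names, not part of the published algorithm.

```python
def consecutive_runs(user_ids, max_gap=1, min_length=3):
    """Return runs of IDs where neighbours differ by at most `max_gap`.

    Runs shorter than `min_length` are ignored, since two adjacent IDs
    among a post's voters can easily be a coincidence.
    """
    runs, current = [], []
    for uid in sorted(set(user_ids)):
        if current and uid - current[-1] > max_gap:
            if len(current) >= min_length:
                runs.append(current)
            current = []
        current.append(uid)
    if len(current) >= min_length:
        runs.append(current)
    return runs
```

A long run among the voters of a single post suggests accounts created back to back, which is a hint worth checking rather than proof of a ring.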



