Show HN: See what words are trending on HN

hpvic03 · on Nov 28, 2013

This is a little something I did for fun. The trending hashtags on Twitter are an interesting measure of what people are talking about, so I thought it would be cool to make something like that for Hacker News. But we don't have hashtags, so the app just uses words.

Note that you can click on a word to get some of the posts it's mentioned in.

Edit: Also, a heads up -- it's running on a Heroku free dyno, and it's already feeling a little slow.

jamra · on Nov 28, 2013

Can you try stemming the words (e.g. bitcoin, bitcoins) and combining synonyms (e.g. go, golang)?
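A minimal Python sketch of both suggestions (the app itself is Rails, so this is illustrative only). The `ALIASES` table and the suffix rule in `naive_stem` are hypothetical; a real stemmer such as Porter handles far more cases, and this crude rule would wrongly turn "news" into "new".

```python
from collections import Counter

# Hypothetical synonym table: maps alternate spellings to one canonical term.
ALIASES = {"golang": "go"}


def naive_stem(word: str) -> str:
    """Crude plural stripping: drop a trailing 's' (but not 'ss')."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def normalize(word: str) -> str:
    """Lowercase, apply the alias table, then stem."""
    w = word.lower()
    w = ALIASES.get(w, w)
    return naive_stem(w)


def count_terms(words):
    """Count words after normalization so variants share one tally."""
    return Counter(normalize(w) for w in words)


print(count_terms(["Bitcoin", "bitcoins", "golang", "Go"]))
# Counter({'bitcoin': 2, 'go': 2})
```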

kyyd · on Nov 28, 2013

Good stuff! This would be awesome as a word cloud.

hpvic03 · on Nov 28, 2013

That's a good idea. I could do that.
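One way a word cloud could size its words, sketched in Python under the assumption that it reuses the mention counts the app already shows. `font_sizes` and its pixel range are hypothetical; the log scale keeps one dominant word like bitcoin from shrinking everything else to the minimum size.

```python
import math


def font_sizes(counts: dict[str, int], min_px: int = 12, max_px: int = 48) -> dict[str, int]:
    """Scale mention counts (all >= 1) to font sizes on a log scale."""
    lo = math.log(min(counts.values()))
    hi = math.log(max(counts.values()))
    span = (hi - lo) or 1.0  # all counts equal: avoid dividing by zero
    return {
        word: round(min_px + (math.log(c) - lo) / span * (max_px - min_px))
        for word, c in counts.items()
    }


print(font_sizes({"bitcoin": 100, "bank": 10, "razor": 1}))
# {'bitcoin': 48, 'bank': 30, 'razor': 12}
```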

tomasien · on Nov 28, 2013

I posted this on Twitter this morning, but watch http://www.google.com/trends/explore?q=bitcoin#q=bitcoin&cmp... and the chart in this post carefully if you bought Bitcoin speculatively (i.e., not because you actually care about Bitcoin or want to be long on it). When the Google chart drops and this goes off the top 5 on HN but the price hasn't plummeted, you're at a mid-term peak and you should sell.

The price of Bitcoin is being driven by media and hype right now - deserved or undeserved, in the near term there WILL be a dip in that and the price will drop for no "good" reason. Keep an eye on that if you're looking for a peak at which to sell, or (maybe more importantly) a bottom at which to buy.

gabriel34 · on Nov 28, 2013

Couple of suggestions:

Exclude commonly trending words such as Google. It isn't trending if it's already huge.

I'm seeing '[' and ']' trending for the last week; exclude these too.

EDIT: As a matter of fact, something seems to be wrong if Google is classified as trending. There hasn't been a big surge in posts about Google recently. Either something is wrong, I am wrong, or the algorithm is still training (e.g., learning the normal rate at which certain words appear).

hpvic03 · on Nov 28, 2013

I've added [ and ] to the filter list, though I'll have to remove some stuff from the db to get rid of the past results.

When I was developing it I saw Google trending all the time. I think Google just comes up in conversation on HN very frequently.

I suppose I could remove Google if it doesn't have meaning. But maybe it does. This is interesting: if Google stops trending, perhaps that's a canary signal that it's not relevant anymore.

alanctgardner2 · on Nov 28, 2013

I think what the parent means is that the derivative of the appearances (possibly measured as a percentage of all appearances) is a lot more interesting than the raw number of appearances. Google moving from 100 to 105 mentions isn't very interesting, but a new language going from 0 to 10 mentions might be very significant. (Edit: and Google owning x% and staying there isn't very interesting either.)
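A sketch of that idea in Python: divide each word's share of all mentions in the current window by its share in the previous window. The function name, the two-window setup, and the add-one style smoothing (so a word going from 0 mentions doesn't divide by zero) are assumptions, not how the app works.

```python
from collections import Counter


def trend_scores(current: Counter, previous: Counter, smoothing: float = 1.0) -> dict:
    """Ratio of each word's (smoothed) share of mentions now vs. the previous window."""
    vocab = set(current) | set(previous)
    cur_total = sum(current.values()) + smoothing * len(vocab)
    prev_total = sum(previous.values()) + smoothing * len(vocab)
    return {
        w: ((current[w] + smoothing) / cur_total) / ((previous[w] + smoothing) / prev_total)
        for w in vocab
    }


previous = Counter(google=100, other=1000)
current = Counter(google=105, newlang=10, other=1000)
scores = trend_scores(current, previous)
print(sorted(scores, key=scores.get, reverse=True))
# ['newlang', 'google', 'other']
```

With these made-up counts, the new word scores about 10.9 while Google stays at about 1.04, which matches the intuition in the comment.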

In other words, frequency of occurrence is interesting, but statistically unlikely occurrences (more or less frequent than expected) are even more interesting.
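A rough Python way to score "statistically unlikely" counts in both directions, assuming each word's count per time window is approximately Poisson, so its variance is close to its mean. The `surprise` function and the 0.5 floor for never-seen words are assumptions for illustration.

```python
import math


def surprise(observed: int, baseline_counts: list[int]) -> float:
    """Poisson-style z-score of this window's count against past windows.

    Positive means more mentions than expected, negative means fewer.
    Assumes counts are roughly Poisson, so the variance is about the mean.
    """
    expected = sum(baseline_counts) / len(baseline_counts)
    expected = max(expected, 0.5)  # floor so a never-seen word doesn't divide by zero
    return (observed - expected) / math.sqrt(expected)


print(surprise(105, [100, 98, 102, 100]))  # 0.5  (Google, business as usual)
print(surprise(60, [100, 98, 102, 100]))   # -4.0 (Google suddenly much quieter)
print(surprise(10, [0, 0, 0, 0]))          # ~13.4 (a brand-new word)
```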

dmunoz · on Nov 28, 2013

Fun to browse, and I could certainly see this enabling me to notice some interesting things I had missed on HN.

Two minor things I noticed:

Currently, 7 is listed as trending now with seven mentions. Not that trending numbers could never be interesting (e.g. 600 trending because of the discussion about the lowering of the prime gap [0]), but 7 seems to be trending only because of submission titles.

Both [ and ] are trending in the past week. This seems to be due to submission titles tagged with e.g. [video], [pdf], [<year>].

[0] https://news.ycombinator.com/item?id=6784383
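A Python sketch of stripping bracketed title tags such as [video], [pdf] and [2013] before counting, so neither the brackets nor the tag text get tallied. The regexes are assumptions about how titles might be tokenized, not the app's actual code; plain numbers like 7 still pass through and would need a separate rule.

```python
import re

# Assumed title conventions: tags like [video], [pdf], [2013] in square brackets.
BRACKET_TAG = re.compile(r"\[[^\]]*\]")
WORD = re.compile(r"[a-z0-9]+")


def title_words(title: str) -> list[str]:
    """Lowercase a title, drop bracketed tags, and return the remaining words."""
    cleaned = BRACKET_TAG.sub(" ", title.lower())
    return WORD.findall(cleaned)


print(title_words("[video] Go at Google"))      # ['go', 'at', 'google']
print(title_words("Bitcoin hits $1000 [pdf]"))  # ['bitcoin', 'hits', '1000']
```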

hpvic03 · on Nov 28, 2013

I've added [ and ] to the filter list, so they won't be included in the future.

It's true that there is a bit of noise. I think it's pretty good overall though. There is a filter list of 866 words that greatly improved the results after I implemented it.
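For illustration, a filter list boils down to a set lookup applied after tokenizing. The words in `STOPWORDS` below are a tiny hypothetical stand-in; the app's actual 866-word list isn't shown in the thread.

```python
# Tiny hypothetical stand-in for the app's 866-word filter list.
STOPWORDS = frozenset({"the", "a", "and", "of", "to", "is", "it", "[", "]"})


def filtered(words):
    """Drop any word (case-insensitively) that appears in the filter list."""
    return [w for w in words if w.lower() not in STOPWORDS]


print(filtered(["The", "price", "of", "bitcoin", "[", "video", "]"]))
# ['price', 'bitcoin', 'video']
```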

nmc · on Nov 28, 2013

Wow! The top 5 has "bitcoin", "income", "bitcoins", and "bank".

Is HN all about the money?

EDIT: also, very neat work! However, you should consider excluding "]" and "[" from the list.

hpvic03 · on Nov 28, 2013

Thanks. Haha, it seems so, at least this week.

I just added [ and ] to the filter list.

nakovet · on Nov 28, 2013

Filter out words that are less than 3 characters maybe?
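A Python sketch of that length filter. It would remove 7, [ and ], but it has a cost: short but meaningful tokens like go and c disappear too, which clashes with the go/golang suggestion earlier in the thread.

```python
def long_enough(words, min_len: int = 3):
    """Keep only words with at least `min_len` characters."""
    return [w for w in words if len(w) >= min_len]


print(long_enough(["7", "[", "]", "go", "c", "bitcoin", "razor"]))
# ['bitcoin', 'razor']
```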

Dru89 · on Nov 28, 2013

Right now "ockhams" is trending with 17 mentions and "razor" is trending with 15 mentions.

Does that mean that someone just said "ockhams" twice?

route66 · on Nov 28, 2013

You just did. And it shows in the stats I happen to see right now.

hpvic03 · on Nov 28, 2013

If there's not a bug in the app, then yes.

_quasimodo · on Nov 28, 2013

Or someone misspelled it twice.

sylvainkalache · on Nov 28, 2013

Hnwatcher.com provides a graph that shows the number of times a specific word was mentioned on HN over time.

antonius · on Nov 28, 2013

Nice work. Before clicking, I knew bitcoin would hold the top spot. Kind of hope this craze dies down a bit so more non-bitcoin stories can appear.

pluralVision · on Nov 28, 2013

Are you using Lucene or Elasticsearch for indexing words?

It would be great to read more about your project.

hpvic03 · on Nov 28, 2013

No, it's nothing fancy. It's just a standard Rails app. It saves a record for each word found, along with time data, then runs queries that group by word and count occurrences, filtering between now and whatever time window you choose. I'm sure that's a naive implementation and the app could certainly be optimized.
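A minimal sketch of the design described above, using Python's built-in `sqlite3` instead of Rails/ActiveRecord: one row per word occurrence with a timestamp, then a GROUP BY over a time window. The table and column names (`word_mentions`, `seen_at`), the index, and the sample data are assumptions for illustration.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def _ts(dt: datetime) -> str:
    # Fixed-format UTC ISO strings sort the same way as the times they encode.
    return dt.astimezone(timezone.utc).isoformat(timespec="seconds")


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE word_mentions (word TEXT NOT NULL, seen_at TEXT NOT NULL)")
    db.execute("CREATE INDEX idx_word_mentions_seen_at ON word_mentions (seen_at)")
    return db


def record(db: sqlite3.Connection, words, seen_at: datetime) -> None:
    """Store one row per word occurrence, stamped with when it was seen."""
    db.executemany(
        "INSERT INTO word_mentions (word, seen_at) VALUES (?, ?)",
        [(w, _ts(seen_at)) for w in words],
    )


def top_words(db: sqlite3.Connection, since: datetime, until: datetime, limit: int = 5):
    """Group by word and count mentions inside the [since, until] window."""
    return db.execute(
        """
        SELECT word, COUNT(*) AS mentions
        FROM word_mentions
        WHERE seen_at BETWEEN ? AND ?
        GROUP BY word
        ORDER BY mentions DESC, word
        LIMIT ?
        """,
        (_ts(since), _ts(until), limit),
    ).fetchall()


now = datetime(2013, 11, 28, 12, 0, tzinfo=timezone.utc)
db = make_db()
record(db, ["bitcoin", "bank", "bitcoin"], now - timedelta(hours=2))
record(db, ["bitcoin", "income"], now - timedelta(days=3))
record(db, ["google"], now - timedelta(days=10))
print(top_words(db, now - timedelta(days=1), now))   # [('bitcoin', 2), ('bank', 1)]
print(top_words(db, now - timedelta(weeks=1), now))  # [('bitcoin', 3), ('bank', 1), ('income', 1)]
```

Storing one row per mention keeps writes trivial but grows the table with every comment; pre-aggregating counts per word per hour is one common way such a design could be optimized.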

I could open source it if you really want to take a look.