
Couple of suggestions:

Exclude commonly trending words such as Google. It isn't trending if it's already huge.

I've been getting '[' and ']' as trending for the last week; exclude these too.

EDIT: In fact, something seems to be wrong if Google is classified as trending. There hasn't been a big surge in posts about Google recently. Either something is wrong, I am wrong, or the algorithm is still training (e.g., learning the normal rate at which certain words appear).




I've added [ and ] to the filter list, though I'll have to remove some stuff from the db to get rid of the past results.
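For illustration, a filter list like this can be applied at counting time so excluded tokens never reach the trend calculation. This is a minimal sketch, assuming whitespace tokenization of post titles; `EXCLUDED_TOKENS` and `count_terms` are hypothetical names, not the site's actual code, and the thread only confirms that "[" and "]" were added.

```python
from collections import Counter

# Hypothetical filter list; only "[" and "]" are confirmed by the thread.
EXCLUDED_TOKENS = {"[", "]"}

def count_terms(titles):
    """Count lower-cased, whitespace-separated tokens, skipping filtered ones."""
    counts = Counter()
    for title in titles:
        for token in title.lower().split():
            if token not in EXCLUDED_TOKENS:
                counts[token] += 1
    return counts
```

Filtering before counting keeps new results clean, but, as noted above, counts already stored in the database would still need to be purged separately.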

When I was developing it I saw Google trending all the time. I think Google just comes up in conversation on HN very frequently.

I suppose I could remove Google if it doesn't carry meaning. But maybe it does. Here's an interesting thought: if Google stops trending, perhaps that's a canary signal that it's no longer relevant.


I think what the parent means is that the rate of change (the derivative) of the number of appearances is a lot more interesting than the raw number of appearances, possibly measured as a percentage of all appearances. Google moving from 100 to 105 mentions isn't very interesting, but a new language going from 0 to 10 mentions might be very significant. (edit: and Google holding x% of mentions and staying there isn't very interesting either)
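One way to make that concrete is to compare a term's share of all mentions across two time windows, a discrete stand-in for the derivative. This is a minimal sketch; the window contents (including the catch-all "other" bucket and the "newlang" term) are made-up numbers for illustration, and `share`/`share_change` are hypothetical helpers.

```python
from collections import Counter

def share(counts: Counter, term: str) -> float:
    """Fraction of all mentions in a window that belong to `term`."""
    total = sum(counts.values())
    return counts[term] / total if total else 0.0

def share_change(previous: Counter, current: Counter, term: str) -> float:
    """Change in a term's share of mentions between two consecutive windows."""
    return share(current, term) - share(previous, term)

# Illustrative numbers only: Google goes 100 -> 105, a new language 0 -> 10.
prev = Counter({"google": 100, "other": 900})
cur = Counter({"google": 105, "newlang": 10, "other": 900})
```

With these numbers Google's share rises by about 0.003 (from 0.100 to roughly 0.103), while the new language's share rises by about 0.010, so the newcomer ranks higher even though Google has ten times as many mentions.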

In other words, frequency of occurrence is interesting, but statistically unlikely occurrences (more or less frequent than expected) are even more interesting.
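A simple way to score "more or less frequent than expected" is a z-score against a learned baseline. The sketch below assumes mention counts per window are roughly Poisson, so the expected standard deviation is the square root of the historical mean; the `floor` parameter is an assumption added so never-before-seen terms don't divide by zero. `surprise` is a hypothetical helper, not the site's algorithm.

```python
import math
from statistics import mean

def surprise(history: list[int], observed: int, floor: float = 1.0) -> float:
    """Poisson-style z-score of `observed` against the historical mean count.

    Positive values mean more mentions than expected, negative values fewer.
    The standard deviation is sqrt(expected), floored at sqrt(floor) so that
    terms with no history still get a finite score.
    """
    expected = mean(history) if history else 0.0
    std_dev = math.sqrt(max(expected, floor))
    return (observed - expected) / std_dev
```

For example, Google at 105 against a steady history of 100 per window scores 0.5, while a new term jumping from 0 to 10 scores 10.0, and a term dropping well below its usual rate gets a negative score.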



