Yes, generally n-gram-based analyses are a huge minefield. Computational linguists do use them, with a lot of caveats and careful analysis of confounding factors.
One simple one that comes to mind here is that you need to work out how much of any change over the data set's period reflects underlying societal change versus change in the NYT itself; the end result is a mixture of the two, and the components may magnify or offset each other. The 1980 NYT and the 2013 NYT are not the same newspaper: not edited by the same people, not sold to the same readership demographics, and not soliciting the same advertisers. So it's somewhat questionable to treat the paper as a stable proxy for a single social group.
Another common pitfall is language change screwing up all kinds of measures (since n-gram models just work on word counts). For example, if two words are used roughly interchangeably in 1980, but by 1990 one has fallen out of usage and been wholly replaced by the other, a search for just the surviving word will show an apparent upward trend, yet it would be misleading to infer an increase in the underlying concept over the period. You can account for this by merging words into equivalence classes (most analyses do basic stemming and merging of alternate spellings), but you have to be very careful to catch all the classes, and "all the classes" isn't even a well-defined notion. A plain list of the top words in a year will tend to be some mixture of 1) genuinely top concepts; and 2) concepts expressed with only a small number of wording variations, so their counts don't get diluted.
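To make the equivalence-class point concrete, here's a toy sketch. The corpus, the word pair ("automobile"/"car"), and the counts are all invented for illustration, not real NYT data:

```python
from collections import Counter

# Hypothetical toy corpus: tokens per year. "automobile" falls out of use
# and is wholly replaced by "car" over the period.
corpus = {
    1980: ["automobile"] * 60 + ["car"] * 40,
    1990: ["automobile"] * 10 + ["car"] * 95,
}

# Equivalence classes: map each surface form to a canonical label
# before counting.
EQUIV = {"automobile": "car", "car": "car"}

def raw_count(year, word):
    """Naive n-gram count of one surface form."""
    return Counter(corpus[year])[word]

def merged_count(year, label):
    """Count after collapsing surface forms into their equivalence class."""
    return Counter(EQUIV.get(w, w) for w in corpus[year])[label]

# Raw counts make "car" look like it more than doubled (40 -> 95)...
print(raw_count(1980, "car"), raw_count(1990, "car"))
# ...but the merged class barely moved (100 -> 105): the wording changed,
# not the prevalence of the concept.
print(merged_count(1980, "car"), merged_count(1990, "car"))
```

The hard part in practice isn't this bookkeeping, it's building `EQUIV` exhaustively; miss one pair of near-synonyms and you manufacture a spurious trend.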
Let me fix that for you:
"This makes it possible to put numbers on our preconceived notions and play around with them."
It may be entertaining, but rigorous? I don't think so.