Hack idea: text classification for political propaganda
15 points by dfranke on Feb 13, 2008 | 18 comments
Reading the campaign material of political candidates in order to figure out what they stand for is a rather annoying exercise. Invariably, they're 95% fluff. Look at

http://www.johnmccain.com/Informing/Issues/65bd0fbe-737b-4851-a7e7-d9a37cb278db.htm .

It isn't my intent to pick on McCain, but this makes a good example. These five paragraphs are almost but not quite semantically null. I can glean from the first paragraph that he believes in man-made global warming, and from the fourth paragraph that he supports nuclear energy. This is useful information, but having to parse five paragraphs to discover it seems inefficient.

I think text classifiers might be able to improve this process. "Nuclear energy" seems like it would be a pretty strong ham token, while "for our children" and "addressing the challenges" seem pretty spammy. Trying this out is not quite as simple as just feeding things into CRM114, because we're trying to classify parts of messages rather than complete messages. It ought to be possible to work around that, though: perhaps score each clause, each sentence, and each paragraph, and then somehow derive an overall score from these inputs.
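To make that concrete, here's a rough sketch in Python (rather than CRM114) of the sentence/paragraph scoring idea, with clause-level splitting left out. The "substance" and "fluff" phrase lists and the 50/50 weighting are invented purely for illustration; a real version would learn its token odds from a hand-labeled corpus.

    # Toy sketch: score text at sentence and paragraph level using
    # naive-Bayes-style log odds. The "substance" and "fluff" phrases
    # below are invented purely for illustration.
    import math
    import re
    from collections import Counter

    SUBSTANCE = [
        "supports nuclear energy",
        "cap and trade emissions targets",
        "raise the federal gas tax",
    ]
    FLUFF = [
        "for our children and grandchildren",
        "addressing the challenges we face together",
        "a brighter future for all americans",
    ]

    def tokens(text):
        return re.findall(r"[a-z']+", text.lower())

    def counts(docs):
        c = Counter()
        for d in docs:
            c.update(tokens(d))
        return c

    SUB, FLF = counts(SUBSTANCE), counts(FLUFF)

    def token_score(tok):
        # log odds of "substance" vs "fluff", with add-one smoothing
        p_sub = (SUB[tok] + 1) / (sum(SUB.values()) + len(SUB) + 1)
        p_flf = (FLF[tok] + 1) / (sum(FLF.values()) + len(FLF) + 1)
        return math.log(p_sub / p_flf)

    def score(text):
        toks = tokens(text)
        return sum(token_score(t) for t in toks) / max(len(toks), 1)

    def paragraph_score(paragraph):
        # score each sentence, then blend with the whole-paragraph score;
        # the 50/50 weighting is arbitrary
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]
        sent = sum(score(s) for s in sentences) / max(len(sentences), 1)
        return 0.5 * sent + 0.5 * score(paragraph)

    print(paragraph_score("We are addressing the challenges we face "
                          "together for our children."))    # negative: fluffy
    print(paragraph_score("He supports nuclear energy and cap and trade "
                          "emissions targets."))             # positive: substance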

Anyone think this could work?




This kind of verbose obfuscation also applies to economists. Apparently, Alan Greenspan wasn't happy with a speech until he had read the newspaper coverage the following day: if one newspaper reported the opposite of another, he was satisfied. You don't want to spook the market.


Even very simple text classification can get you somewhere. Last year a couple of economists, Matthew Gentzkow and Jesse Shapiro, published a wonderful paper in which they did something similar.

Take a look at it: http://home.uchicago.edu/~jmshapir/biasmeas052507_formatted....

"To measure news slant, we examine the set of all phrases used by members of Congress in the 2005 Congressional Record, and identify those that are used much more frequently by one party than by another. We then index newspapers by the extent to which the use of politically charged phrases in their news coverage resembles the use of the same phrases in the speech of a congressional Democrat or Republican."

They then go on to compare their index of politically slanted language in newspapers to the politics of the newspapers' readers, and conclude that newspaper bias is driven more by readers wanting their own prejudices confirmed than by the politics of the newspapers' owners. It's a really great piece of work: worth reading in full if you have any interest in text classification or politics.
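To make the phrase-matching idea concrete, here is a crude sketch in Python. It is not the paper's actual estimator: the phrases and per-party counts below are invented, and the real index is built from phrase frequencies in the 2005 Congressional Record.

    # Crude sketch of the phrase-frequency slant idea (not the paper's
    # actual estimator). The phrases and per-party counts are invented.
    import math
    import re

    # phrase -> (Democratic count, Republican count), all numbers made up
    PHRASE_COUNTS = {
        "death tax": (10, 120),
        "estate tax": (130, 15),
        "war on terror": (20, 90),
        "civil liberties": (80, 25),
    }

    def slant(article_text):
        text = article_text.lower()
        llr = 0.0
        for phrase, (dem, rep) in PHRASE_COUNTS.items():
            hits = len(re.findall(re.escape(phrase), text))
            # positive -> usage resembles Democrats, negative -> Republicans
            llr += hits * math.log((dem + 1) / (rep + 1))
        return llr

    print(slant("Repealing the death tax is the centerpiece of the agenda."))
    print(slant("Advocates of civil liberties want to keep the estate tax."))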


I like the idea, though if it came into widespread use, writers would do as spammers do--write to get past your filter.

One concern is that it'd be labor-intensive to build up a large enough corpus, with some metric to compare against, so you know how well you're doing. Marking it by hand takes a lot of effort, and if you spread the work across a lot of humans, the labels might be a little inconsistent.


I don't think they'd do that. The purpose of the fluff is not (usually) to deliberately conceal their beliefs. It's there because it's what some people want to read. They want a candidate who "cares". I don't think the people who write the fluff would have any motivation to try to beat fluff filters.


Come to think of it, instead of trying to beat the filters, what if they specifically cooperated with them so that they'd filter out nothing at all? Think of what a great boast it would make: "I'll take you straight to the point, and these folks over here will prove it mathematically". I could picture Ralph Nader trying this.


I think they already do that. The phrases on politicians' posters remind me a lot of the way I write my CV, inserting all the keywords that recruiters look for.

In fact, it would probably be easy to identify the best political keywords with Google AdWords: just see which ones get clicked the most.


Semi-related:

You might want to take a look at this blog entry:

http://billburnham.blogs.com/burnhamsbeat/2008/02/skygrid-an...

It's about a startup that has accomplished some cool things in the area of recognizing meaning from text. They've developed some "sentiment"-measuring algorithms that allow them to classify articles as "good" or "bad" for the purpose of deciding whether to invest in a stock. Pretty cool stuff! It might give you some ideas on similar approaches you could apply to your problem.


A professor at the university where I did my undergrad has done research on this topic:

http://www.cs.queensu.ca/home/skill/papers.html

For example, he took the Enron email data set and applied machine learning techniques to try to distinguish "dishonest" emails, where something was being concealed, from normal ones. A student of his has applied similar techniques to speeches given by MPs in Parliament (the Canadian equivalent of Congress).


So, a program that loads some text, recognizes its primary points, and prints a filtered version?

I'd think that there would be much better uses for it than as a BS-O-Meter for political literature.

Think about a program that could generate the summary of an essay or book, or assist novice writers in writing concisely. Maybe use it as a service that takes news items and abbreviates them for busy people who want to stay on top of things. You could call it snapnews.com or something.


I think you'd find that usefulness diminishes pretty rapidly as the problem domain expands. The reason spam filters work as well as they do is that most spam pretty much looks alike. Likewise, there are only so many ways you can phrase soothing platitudes that will get soccer moms to vote for you.


I'm working on something different, but related. I'm hoping to open-source part of it this week, so stay tuned. :)


Shoot me an e-mail when you do, please?

fedorov@rutgers.edu


A bit off topic, but that reminds me of Isaac Asimov's Foundation. At one point, a politician comes, makes a big speech with lots of promises, and everybody is happy, until a day-long sophisticated semantic analysis of the speech reveals that it was actually devoid of any meaningful content.


Good idea, I think it could be done. Perhaps you could look at congressional speeches and match them to how each speaker voted?
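A minimal sketch of that, assuming scikit-learn is available and using invented speeches and votes as stand-ins for real Congressional Record data:

    # Toy sketch: treat each (speech, vote) pair as a labeled example and
    # train a bag-of-words classifier. Speeches and votes are invented.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    speeches = [
        "we must cut taxes and shrink government",
        "we need to invest in health care for working families",
        "strong national defense and lower spending",
        "expand access to education and protect the environment",
    ]
    votes = ["nay", "yea", "nay", "yea"]   # votes on some hypothetical bill

    model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
    model.fit(speeches, votes)
    print(model.predict(["lower taxes and a strong defense"]))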



Where do you get good and bad seed phrases?


Read transcripts from every Speaker of the House you've never heard of. Being powerful but creating nothing memorable is an indicator.


Another possibility: watch a few speeches on YouTube, and write down the applause lines.



