While the FRED data might be ~50% of the datasets, most of those tables are 200-4000 rows, so it is nowhere near 50% of the rows of data.
The remaining 50% of datasets have a lot of gems.
Quality in the breadth of data is important.
State-wide liquor licenses, corporate registrations. OSHA is great data. AMS shipping records. FDA adverse events. Oil and gas well locations/production. Consolidated weather reports since 1800... I'm almost certainly forgetting some.
I bet you'd have better luck just removing all the spaces and then running a text segmentation algorithm over the result. Peter Norvig has some great papers on how to do this effectively. I've wrapped his code in a Tornado web service: https://github.com/adkatrit/text-segmentation-server
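Here's a minimal sketch of the idea, assuming a plain unigram model in the spirit of Norvig's approach. It is not the code from the repo above: the word counts in `COUNTS` are made up for illustration, and `segment`, `log_prob` and `MAX_WORD_LEN` are just names I picked. A real setup would load counts from a large corpus.

```python
import math

# Toy unigram counts, invented for illustration only.
# A real system would load these from a large corpus.
COUNTS = {
    "the": 500, "quick": 20, "brown": 15, "fox": 10,
    "a": 300, "is": 200, "this": 150, "test": 30,
}
TOTAL = sum(COUNTS.values())
MAX_WORD_LEN = 20  # assumed cap on candidate word length


def log_prob(word):
    """Log probability of a single word under the toy unigram model."""
    if word in COUNTS:
        return math.log(COUNTS[word] / TOTAL)
    # Unknown words get a penalty that grows with their length,
    # so long runs of junk are never preferred over known words.
    return math.log(10.0 / (TOTAL * 10 ** len(word)))


def segment(text):
    """Strip all whitespace, then return the most probable word split."""
    text = "".join(text.lower().split())
    n = len(text)
    # best[i] = (best log score for text[:i], start index of its last word)
    best = [(0.0, 0)] + [(-math.inf, 0)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            score = best[start][0] + log_prob(text[start:end])
            if score > best[end][0]:
                best[end] = (score, start)
    # Walk the back-pointers from the end to recover the words.
    words = []
    i = n
    while i > 0:
        start = best[i][1]
        words.append(text[start:i])
        i = start
    return words[::-1]
```

With these toy counts, `segment("th equi ckbr ownfox")` gives `['the', 'quick', 'brown', 'fox']` no matter where the original spaces were, which is the whole point of throwing the spaces away first.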
I have received friend requests from bogus FB accounts about 10 times in the 8 years I've been on Facebook. Some even go as far as friending people I am friends with, liking things that I like, or claiming to be from my home town. Sometimes it's hard to tell the difference between a fake account and just a really bizarre person. Usually you can spot a fake from the lack of interaction with other people on its timeline. That is not to say that elaborate fakes haven't been building bogus friend networks. It would be an interesting project to collect the fb_ids of all these potentially bogus accounts, though the most useful signals, like login times/locations, are surely private. I'm sure the quality assurance people at Facebook are all over this.