Good News: Craigslist drops exclusive license to your posts (eff.org)
132 points by dredmorbius on Aug 9, 2012 | 49 comments



This is the first time I have seen EFF tap-dancing around issues that are important to its mission and I can't help but think that this arises from its relationship to Mr. Newmark.

Craigslist has recently engaged in a lawyer-driven campaign to ensure that no third party can use any of the data posted to its site:

1. Quite apart from the "exclusive license" language that it recently added (and has now dropped), it had recently amended its terms of use to make absolutely clear that no one has the right to use any data posted to its site, whether directly or indirectly, under any circumstances.

2. It sent cease-and-desist demands to various parties, including PadMapper, warning them not to use any such data but, as much as anything, making clear that it was revoking any implied license that the law might find to exist from CL's having allowed parties to scrape its data in the past (i.e., it was saying, in effect, no more loose community norms will apply but only the heavy hand of the law).

3. It also sued 3taps for having set up a business that took CL data from the Google cache and offered it to third parties, giving them access to CL data without having to get it from the site directly. 3taps contended that it could lawfully use the cached data without violating CL's terms of use or any copyright held by CL in any such data. CL strongly disagreed and set out its claims of infringement, etc. in a complaint filed in federal court just recently. The 3taps business model may or may not have withstood this legal challenge, but the salient fact here is that CL faced a lot of uncertainty on the legal issues involved. It might ultimately have lost on its claims, and in the meantime 3taps's business model would serve as a ready-made way for third parties to gain unfettered access to the CL data, at least until CL was able to obtain a preliminary injunction in the lawsuit or an ultimate victory on its claims. To get either, however, it would have had to run the risk of losing on its claims and thereby helping to create some pretty bad legal precedents for its business model.

4. That, to my thinking, is the only reason why CL would have taken the otherwise insane-looking step of changing its terms of use to demand that posters give it an exclusive license to all items posted to the CL site. That sort of license represents a bonehead decision from almost any angle one looks at it except one, and that is the legal angle of giving CL a strong position to claim that 3taps can no longer use any of the data taken from the Google cache, whether or not 3taps was deemed to have been bound by the CL terms of use. If CL had exclusive rights to enforce claims relating to the CL postings, then it could try to shut down 3taps from the copyright angle, whether or not 3taps was technically bound by the CL terms of use. In other words, the whole point of having tried to claim exclusive rights to the posts was to have a weapon against 3taps, to block it from opening up CL posts for use by potentially innumerable third parties - that is, to plug the final loophole in its legal system aimed at strict enforcement of the terms of use.

5. Any agreement, however, that amounts to a bare assignment of the right to sue on a copyright (with the assignor retaining effective ownership rights to everything else besides the right to sue) is of dubious enforceability, as copyright troll Righthaven found out to its dismay when it was upended by the combined efforts of Democratic Underground, Fenwick & West, and EFF - who convinced a judge to toss all its cases on the ground that such an assignment was ineffective and could not be used as a basis upon which to sue alleged infringers. Now the legal doctrines involved here can get complex and I have not had a chance to assess them carefully in the 3taps/CL context. But I would wager that CL had lawyers telling it that, if the new terms amounted in practice only to a bare assignment of the right to sue on copyright (and what other reason is there for this?), there was still some potentially serious doubt about whether it would be able to prevail against 3taps even though CL could now claim that it had the exclusive right to sue to enforce copyrights relating to CL posts.

6. This left CL with only one practical choice, which it took: remove its data from the Google cache or at least set it up so that third parties could not use such data. And that is precisely what it did, just a couple of days ago (see http://news.ycombinator.com/item?id=4351207 for an HN discussion on this event, with link to relevant article).

With the final step above, I assume that CL has effectively crippled the 3taps business model. Therefore, it no longer needs to impose insane-looking licensing terms on persons wanting to post to its site and it has dropped those terms. It has simply found a more practical way to defeat the 3taps model, rendering the oppressive licensing terms unnecessary.

The bottom line to what has happened here, in my view, is that CL is engaged in a heavy-handed, lawyer-driven campaign to guard its turf at all costs but doesn't really want to say this publicly for its own prudential reasons. This too is perhaps why EFF is being so diffident in this post, trying to celebrate the removal of obnoxious language from the CL terms of use without really giving convincing reasons for why that language found its way into the terms of use in the first place or why (short of CL being on a drunken toot) such language underwent a here-today-gone-tomorrow transformation for no apparent good reason over a very short period of time. I am only speculating here but it looks like EFF pulled a punch here.


grellas, I've read your thoughtful and thorough legal analysis on HN for years and wanted to say yet again, thank you.


grellas: your analysis of CL's behavior makes a lot of sense to me. However, I don't understand why you say the EFF was "tap-dancing around issues that are important to its mission."

While I agree the EFF should be concerned with companies asking for exclusive licensing to user posts, because its mission is to protect individuals' digital rights, I don't see why it should concern itself with whether companies make aggregate data cacheable, scrapable, or API-accessible to other businesses.

CL makes money by aggregating data: it attracts buyers by aggregating all local listings in one place, and it attracts sellers by aggregating customer searches for local listings in one place. IMO the EFF has no business asking CL to give its product -- aggregate data -- away for free.

Please explain, how exactly is the EFF "tap-dancing" here?


Thanks for your thoughtful comment.

I agree that EFF has no business asking CL to give away its product for free, nor do I think CL should be forced to do so for any reason.

By tap-dancing, I meant that EFF failed to discuss the legal issues in a way that it might, for example, have discussed the actions of a patent or copyright troll in making an outrageous attempt to disrupt internet norms through some indefensible legal artifice. In such a case, EFF would be the go-to site for an astute legal analysis delving into: (a) what was done, (b) why it is legally indefensible (if it arguably is), (c) the horrible policy implications of what was done, and (d) what the motives of the actor were in doing this and whether those motives could be justified for any reason.

As I read this post, I saw none of the normally sharp and even hard-hitting analysis that EFF would offer in such a case. An action that normally would call for outrage has instead received a very diplomatically worded critique from the EFF. That is all I meant by "tap-dancing." It is almost as much about tone as about substance but clearly this to me is not one of EFF's normal posts.


Ah, I see your point, although I still don't agree with it.

The simplest explanation for the EFF's behavior is that in this case a confrontational stance was not necessary -- and in fact would have been counter-productive. CL has not only supported EFF values for a long time, it also reversed the new user licensing terms very quickly. In other words, CL is very different from, and deserves different treatment than, a copyright or patent troll.

Had CL persisted with those insane end-user terms, I have no doubt the EFF would have taken an increasingly confrontational stance.

Thanks for the prompt response.


Asking Google to stop caching CL pages is probably not a real solution. There are plenty of other parties caching them.

From Squid/Varnish-style proxies, to paid CDNs, to free CDNs. If you upload content to the web, and you do not put it behind access controls like a password, it gets copied.

It's the nature of the web.

Many sites have built businesses around caching. Not taking content from search engine caches, but caching content that users request through their proxy. The users have no idea they are using a proxy. They think they are making a request to "A" site, when really their request is being passed to "B" site and "A" site is caching the result. Then other users come along and request the same content from "A" site and they think, "Wow, this is fast. And 'A' site sure has lots of content."

What's strange is that guys like Craig Newmark have made a fortune running "B" site, yet they get extremely protective of their turf. They think they are _entitled_ to something. Not really. They are just lucky. Because anyone could provide the same "service". Users can get the same content (which, incidentally, neither "B" nor "A" site owns) from a myriad of points on the network. The web is a vast copy machine.

There is no entitlement. These guys who have made a fortune should be thankful.

Just because you can afford lawyers does not mean you have to start suing people because you are annoyed by competitors and the lawyers advise you that indeed you can sue. They are rarely going to tell you you cannot sue. They can find something.

Copyright trolls like Righthaven are one example of egregiousness that comes to mind. But how about the recent OpenDNS/Paxfire lawsuit? Junk patent nonsense. Fighting over who gets to hijack NXDOMAINs. The suit was dropped of course. Meritless.

There are companies out there whose business consists of hijacking and manipulating traffic, whether DNS or HTTP, to show ads. And these guys think they are _entitled_ to something. Please.

They should be happy with what they get away with.

We need more Alsups and Posners. We need more adults in the room.


The EFF just lost so much of my respect for this. They're supposed to be protecting the rights of people on the internet, not playing stupid political games one would expect from Congress.

I'd have expelled Craig for that bit of buffoonery, rather than walk on eggshells and tapdance around the fact that he's being a two-faced ass.

You bet your ass I'm mad. This kind of idiocy sickens me, especially coming from a group as supposedly altruistic as the EFF.


The disclosure at the bottom is rather informative:

"Disclosure: Craig Newmark, founder of craigslist, is a member of EFF’s Advisory Board, and craigslist has donated to EFF."


I find this deeply ironic.

I'm intensely curious how much responsibility for the current debacle can be attributed to Craig and how much can be attributed to Jim Buckmaster.


The official line at Craigslist and from Craig is that Craig just works there (and owns roughly half the company). Jim's the boss.

Given his ownership share (slightly less than half IIRC given the split between him and Jim and shares held by eBay), that's a slightly disingenuous party line.


This is strangely reminiscent of money and politics (corporations giving money to congressmen, etc.).


EFF just lost all my respect. Craigslist should clearly disclaim ownership of content like before http://web.archive.org/web/20110511105832/http:/www.craigsli...


Two things: CL didn't pull back as far as it could have:

> This new language was a marked difference from prior Terms of Use (2008/2011), which clearly stated “craigslist does not claim ownership of content that its users post.” While this clear language has not returned to the TOU, the same result comes from the non-exclusive license currently in the TOU.

And why can't the EFF just quote the current passage in its changed form? The post itself is still unclear. Is this the affected passage?

> You automatically grant and assign to CL, and you represent and warrant that you have the right to grant and assign to CL, a perpetual, irrevocable, unlimited, fully paid, fully sub-licensable (through multiple tiers), worldwide license to copy, perform, display, distribute, prepare derivative works from (including, without limitation, incorporating into other works) and otherwise use any content that you post. You also expressly grant and assign to CL all rights and causes of action to prohibit and enforce against any unauthorized copying, performance, display, distribution, use or exploitation of, or creation of derivative works from, any content that you post (including but not limited to any unauthorized downloading, extraction, harvesting, collection or aggregation of content that you post).

http://www.craigslist.org/about/terms.of.use


> You also expressly grant and assign to CL all rights and causes of action to prohibit and enforce against any unauthorized...

How is this in effect any different from a non-exclusive license? Are they going to ask each person if they gave permission for something that was cross-posted in multiple places? At least now people can post an ad to multiple places, but CL wants to make sure that they aren't crawled?

If I were CL, I'd be more worried about losing Safe Harbor protection than whether or not I owned user submissions.


"We understand that craigslist faces real challenges in trying to preserve its character and does not want third parties to simply reuse its content in ways that are out of line with its user community’s expectations and could be harmful to its users."

You don't want your users unexpectedly harmed by saving them hours of their precious lives with a better UI. That would be a tragedy.


If Craigslist claimed ownership of the content, wouldn't that make them responsible for the content and open them up to the lawsuits surrounding criminal activities initiated through a Craigslist posting?



So what does this mean for services such as PadMapper?


People do read, and some likely thought twice about clicking "Continue", and did not post because of the ridiculous license. If it happened too frequently, CL might have noticed a dip in posts.


I highly, highly doubt this. How many users on a given day are brand new users? How many are users who have used CL occasionally or regularly? And how many of those are likely to read the text of the license on their nth post?

Very, very few, I bet, unless CL highlighted the change in red and made them click through a few Confirm dialogs.


i second that :)


I think this has less to do with a drop in user postings and more to do with the obvious irony of an EFF Advisory Board member extending their rights over user content so broadly.


maybe it has been discussed before (i didn't follow the craigslist discussions) but it's funny that craigslist does not have any robots.txt on its city subdomains

i.e.: http://newyork.craigslist.org/mnh/sub/3195940666.html is hosted on http://newyork.craigslist.org/

but http://newyork.craigslist.org/robots.txt makes an HTTP 302 redirect to http://www.craigslist.org/robots.txt, and http://www.craigslist.org/robots.txt is only a valid robots.txt for http://www.craigslist.org, not for http://newyork.craigslist.org/, as the host, protocol and port of the robots.txt file are an essential part of its validity range.

so basically newyork.craigslist.org does not have any robots.txt and can be crawled as you like.

if you want more about it, see https://developers.google.com/webmasters/control-crawl-index...

  >The directives listed in the robots.txt file apply only to 
  >the host, protocol and port number where the file is hosted.
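
For what it's worth, the scope rule quoted above can be sketched in a few lines of Python. This is only an illustration of the "host, protocol and port" comparison; the helper names `robots_scope` and `robots_url_for` are made up for this example, and it says nothing about how a crawler treats the 302 itself.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def robots_scope(url):
    """Return the (protocol, host, port) triple a robots.txt at `url` applies to."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def robots_url_for(page_url):
    """Return the robots.txt location a crawler would look up for `page_url`."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


page = "http://newyork.craigslist.org/mnh/sub/3195940666.html"
wanted = robots_url_for(page)
served_from = "http://www.craigslist.org/robots.txt"

print(wanted)
# Same scope only if protocol, host and port all match.
print(robots_scope(wanted) == robots_scope(served_from))
```

By this comparison alone the two files have different scopes, because the hosts differ.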


Saying that "newyork.craigslist.org does not have any robots.txt and can be crawled as you like" is false. Search engines follow redirects until valid robots.txt files are found. From that same document you linked:

3xx (redirection)

Redirects will generally be followed until a valid result can be found (or a loop is recognized). We will follow a limited number of redirect hops (RFC 1945 for HTTP/1.0 allows up to 5 hops) and then stop and treat it as a 404. Handling of robots.txt redirects to disallowed URLs is undefined and discouraged. Handling of logical redirects for the robots.txt file based on HTML content that returns 2xx (frames, JavaScript, or meta refresh-type redirects) is undefined and discouraged.
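
A rough sketch of the redirect handling described in that passage, assuming a hop limit of 5 and treating loops or too many hops as a 404. The `fetch` callable and the canned responses are hypothetical stand-ins so the example runs without touching the network; this is not how any particular search engine implements it.

```python
def resolve_robots(url, fetch, max_hops=5):
    """Follow 3xx redirects for a robots.txt URL.

    `fetch(url)` is a hypothetical callable returning (status, payload),
    where payload is the redirect target for 3xx and the body otherwise.
    Returns the robots.txt body, or None when the result is treated as a 404.
    """
    seen = set()
    for _ in range(max_hops + 1):
        if url in seen:  # redirect loop recognized
            return None
        seen.add(url)
        status, payload = fetch(url)
        if 300 <= status < 400:
            url = payload  # follow the hop
            continue
        return payload if 200 <= status < 300 else None
    return None  # ran out of hops


# Canned responses modelled on the 302 described in this thread.
FAKE_RESPONSES = {
    "http://newyork.craigslist.org/robots.txt": (302, "http://www.craigslist.org/robots.txt"),
    "http://www.craigslist.org/robots.txt": (200, "User-agent: *\nDisallow: /search\n"),
}


def fake_fetch(url):
    return FAKE_RESPONSES.get(url, (404, ""))


body = resolve_robots("http://newyork.craigslist.org/robots.txt", fake_fetch)
print(body)
```

Under these assumptions the city subdomain ends up with the rules served from www, which is the point being made above.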


> Moreover, removing the exclusive license provision retains craigslist’s compatibility with common licensing schemes, like the Creative Commons ShareAlike license or the GNU Free Documentation License.

Hold on a sec here. Are they saying that Craigslist postings are now licensed under CC ShareAlike (http://creativecommons.org/licenses/by-sa/3.0/)? If so that's probably the most substantial point in this post.


No, they seem to be saying that a Craigslist poster may, if they choose, license their own post under CC-BY-SA. Why anyone would bother to do that is beyond me.

And I'm not convinced it's true given that the terms of use, last updated in February, still say "You also expressly grant and assign to CL all rights and causes of action to prohibit and enforce against any unauthorized copying, performance, display, distribution, use or exploitation of, or creation of derivative works from, any content that you post (including but not limited to any unauthorized downloading, extraction, harvesting, collection or aggregation of content that you post)." http://www.craigslist.org/about/terms.of.use

Very disappointing post from EFF here. The meaningless Creative Commons bit you quoted is a fine illustration of what toothless nonsense the whole thing is. It seems they have a major conflict of interest with the legal bully's founder (Craig Newmark) sitting on their Advisory Board.


If this is indeed the case then that's absolutely ridiculous. I expected more from the EFF than chummy back-slapping and congratulatory praise for repealing a legally baffling policy.


They no longer need the exclusive license, since PadMapper's workaround has been effectively killed by the changes Craigslist made to its robots.txt. Google is no longer allowed to index apartments, so PadMapper cannot scrape the Google cache.


craigslist didn't change robots.txt:

$ wget -q -O- --save-headers http://www.craigslist.org/robots.txt | fgrep Last-Modified

Last-Modified: Fri, 04 Nov 2011 18:13:24 GMT
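
If you'd rather not eyeball the header, here is a small sketch that parses the value shown above and compares it with this thread's date (Aug 9, 2012, taken from the submission line). It only interprets what the server reported; a `Last-Modified` header is not proof of anything beyond that.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Header line copied from the wget output above.
header = "Last-Modified: Fri, 04 Nov 2011 18:13:24 GMT"

name, _, value = header.partition(": ")
last_modified = parsedate_to_datetime(value)  # timezone-aware (GMT)

# Assumption: use the thread's date as the reference point.
thread_date = datetime(2012, 8, 9, tzinfo=timezone.utc)
age_days = (thread_date - last_modified).days

print(name, last_modified.isoformat())
print("days since reported modification:", age_days)
```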


Does this mean that Craigslist will no longer be taking legal action against PadMapper or will that still be going ahead? This is great news though, it's amazing what a little (well, a lot) of backlash can accomplish.


Does this effectively mean PadMapper can use CL data once again?


CL has also removed itself from search engines, so PadMapper can no longer use the Google cache workaround.

I don't think they can go back to scraping either, but I'm not certain of that aspect of it.

This may be the real reason behind the change. Perhaps CL just doesn't need it any more.


That sucks, searching is how I double-checked dubious craigslist posts. Select a couple phrases and search google. If identical posts pop up under various other cities on craigslist, then I knew for sure it was spam/fraud.


Just one of the many ways third parties fix broken things on Craigslist. Things Craigslist refuses to recognize.


I think this is correct, though the EFF post is too vague, and doesn't quote the passage in its current state.

Also, CL has not removed itself from search engines. They're using NOARCHIVE.
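
NOARCHIVE is a robots meta directive asking search engines not to offer a cached copy, while still allowing the page to be indexed. A minimal sketch of checking a page for it with Python's standard `html.parser`; the sample markup is a made-up example, not copied from a CL page, and `robots_directives` is just a name chosen for this illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page markup for illustration only.
sample = '<html><head><meta name="ROBOTS" content="NOARCHIVE"></head><body>listing</body></html>'
found = robots_directives(sample)
print("noarchive" in found, "noindex" in found)
```

A page like the sample stays searchable but has no cached copy to pull from, which is the distinction being drawn here.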


Last I checked, robots.txt excludes all crawlers from at least several major classifications, including housing listings (/hhh):

    User-agent: *
    Disallow: /cgi-bin
    Disallow: /cgi-secure
    Disallow: /forums
    Disallow: /search
    Disallow: /res/
    Disallow: /post
    Disallow: /email.friend
    Disallow: /eaf
    Disallow: /reply
    Disallow: /?flagCode
    Disallow: /ccc
    Disallow: /hhh
    Disallow: /sss
    Disallow: /bbb
    Disallow: /ggg
    Disallow: /jjj
    Disallow: /*rss$
http://www.craigslist.org/robots.txt

'jjj' is jobs, 'ggg' is gigs, 'bbb' is services, 'sss' is for sale, 'ccc' is community, 'res' is resumes.
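
To check a given path against those rules, Python's standard `urllib.robotparser` can be fed the listing directly. A quick sketch under two caveats: the stdlib parser only does plain prefix matching, so it does not interpret the `/*rss$` wildcard line the way Google's crawler would, and the test paths below are just examples.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the listing above.
RULES = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /cgi-secure
Disallow: /forums
Disallow: /search
Disallow: /res/
Disallow: /post
Disallow: /email.friend
Disallow: /eaf
Disallow: /reply
Disallow: /?flagCode
Disallow: /ccc
Disallow: /hhh
Disallow: /sss
Disallow: /bbb
Disallow: /ggg
Disallow: /jjj
Disallow: /*rss$
"""

rp = RobotFileParser()
rp.parse(RULES.splitlines())

base = "http://www.craigslist.org"
for path in ("/hhh", "/hhh/index.html", "/search", "/aaa"):
    # True means a crawler matching "*" may fetch the path.
    print(path, rp.can_fetch("*", base + path))
```

Paths under the top-level category codes are blocked, while anything not starting with one of the listed prefixes is allowed by these rules.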



Point. I mistakenly assumed /aaa covered all of housing.

Looks like the same principle applies to a number of other subcategories. So while overview indices aren't spiderable, subindices are.


Google still provides a screenshot of CL pages - I doubt it would take much work for an enterprising hacker or 3taps to hook an OCR library up to that.


But wouldn't that be insane on bandwidth?


Inbound bandwidth is free.


I'm talking about for downloading all those pictures.


That's the point he was making - inbound traffic to AWS is $0/GB.


The exclusive license effectively meant that you couldn't cross-post to both Craigslist and PadMapper because you had granted exclusivity to Craigslist.


Ok makes sense, so that legitimizes something like PadLister. Does this however enable one to scrape CL for data?


...this does not legitimize anything in particular. It says it's OK for users to post ads on multiple sites, not that other sites can scrape CL's user content.

This is akin to freelance contract work: if you're a news photographer, there's usually a clause saying that while you can own your photos, you may not sell them to your client's competitors (at least right away).

So CL is saying to its users: Go ahead and copy and paste your stuff to other sites. So, not a huge difference by any means from what most users assumed.


PadMapper was doing the cross-posting to PadLister to slowly build its user base. I however did not realize that Craigslist frowned upon that until the change in the agreement.


PadMapper doesn't take CL listings and put them into PadLister. What's listed on PadLister was listed by users.

PadMapper puts CL listings and PadLister listings on the same map.


My understanding is that this is a separate legal issue. CL's complaint against PadMapper is about scraping, not about users reposting on PadLister.



