Don't be evil, Craigslist

Yesterday (Nov 30th 2009), at approximately 2pm PST, Craigslist effectively killed all Yahoo Pipes projects that use Craigslist as a data source. For those unfamiliar with Yahoo Pipes, it's a nifty tool that allows you to take data from anywhere on the net, mash it up, parse it, and produce interesting results. One of its benefits is that it caches the data, so the original source does not get pummeled on every request.

Our project, known as Flippity, used Pipes as its primary data source and was in its alpha stages of development. We were one of 2,111 pipes that used Craigslist on Yahoo Pipes, all of which, to my knowledge, were non-commercial, created solely as fun projects to provide a new perspective on data already available.

What's most interesting is perhaps the timing of all this. You see, about 4 days ago I posted a thread on Hacker News asking for feedback on what my friend Dan and I had come up with so far -- a mashup that simply plots Craigslist listings on a map and lets you do a radius search around any location. It was, in essence, HousingMaps.com generalized across all listings.

Having exchanged emails with the Craigslist founder several weeks ago (see previous blog post) in which he expressed interest in seeing more, I decided to follow up and give him the closed alpha link.

Fri, November 27, 2009 3:14:06 PM

Hi Craig,

You asked to hear more about this, so here you go!

http://www.flippity.com/alpha

This is currently closed to the public, but I've received significant positive feedback from friends.

Would like to hear what you think,

Romy

As you can see, this was sent 4 days ago. I received no response, which was unusual, as Craig had responded within minutes to my first several emails. Oh well, I thought to myself, we'll just keep working on it and see what happens.

I couldn't believe my eyes when I saw what Craigslist had done. They literally added a check for "pipes.yahoo.com" in the referrer header of every HTTP request and redirected any match to the home page. In essence, they blocked Pipes. Really, Craig? This is your response? Allow me to quote the first email you ever sent me:

... thanks! and as a rule of thumb, okay to use RSS feeds for noncommercial purposes.

Well, we are using RSS feeds for noncommercial purposes. So were 2,110 other people. And you just shafted all of them, not to mention Yahoo itself. May I ask why?

I mean really, these pipes aren't all that popular, so if you told me your servers were getting hammered, I would say that's unlikely. So what is it then? What did these people do that was so wrong that it merited such a response?

Anyway, it's a sad day for me. I'm not too upset about my own project, as Flippity was already removing Craigslist as a data source. With the likes of eBay and Oodle not only providing open APIs but encouraging and rewarding developers, spending my time wrestling with Craigslist is just plain stupid and exhausting. I'm sure I'm not the only person to have come to that conclusion, and I wish it were different.

By the way, it's not too hard to defeat any technical measure Craigslist can put up. We could, for instance, build a peer to peer network that obtains bits and pieces of data from Craigslist via hundreds of IPs with randomized time intervals. Or build a Java applet that we distribute to our users, having them exchange data with us a la BitTorrent. There's very little Craigslist would be able to do to counter. However, it's just not worth my time. If Craigslist wants to keep its doors shut to the world, so be it.

[ If you're interested in startups, or just the future of Flippity, I will be documenting our journey every step of the way. Subscribing to the blog will show me someone cares. ]


Comments (33)

Dec 01, 2009
pjdonnelly said...
Or create a simple proxy that calls craigslist.

For example:

http://www.scriptlets.org/run/oo3ohh?url=http://sfbay.craigslist.org/rea/index.rss

view the source here: http://www.scriptlets.org/view/oo3ohh

Creating a proxy is pretty easy these days. Host it either on your own server or use some service out there. Then plug that url into the Fetch Feed module.

Hope this helps.
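
For readers wondering what a proxy like the one described above might look like, here is a minimal sketch in Python. It is not the source of either proxy linked in these comments. The names `extract_target` and `FeedProxy`, the port, and the `?url=` parameter handling are all assumptions made for illustration. The idea is only that the proxy fetches the feed itself, so the upstream site sees a request from the proxy's server with no Referer header.

```python
# Minimal feed proxy sketch (hypothetical; not the source of any proxy linked above).
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs
from urllib.request import Request, urlopen


def extract_target(path):
    """Return the feed URL from '/?url=...', or None if missing or not http(s)."""
    query = parse_qs(urlparse(path).query)
    target = query.get("url", [None])[0]
    if target is None or urlparse(target).scheme not in ("http", "https"):
        return None
    return target


class FeedProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        target = extract_target(self.path)
        if target is None:
            self.send_error(400, "expected ?url=<http or https feed URL>")
            return
        try:
            # urllib sends no Referer header unless one is added explicitly.
            with urlopen(Request(target), timeout=10) as upstream:
                body = upstream.read()
                content_type = upstream.headers.get(
                    "Content-Type", "application/rss+xml"
                )
        except OSError:
            # URLError and HTTPError are both subclasses of OSError.
            self.send_error(502, "upstream fetch failed")
            return
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Port 8000 is an arbitrary choice for this sketch.
    HTTPServer(("", 8000), FeedProxy).serve_forever()
```

A feed URL that carries its own query string would need to be percent-encoded before being passed as `url=`. A proxy like this also does no caching, so every request it receives turns into a request upstream.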

Dec 01, 2009
 said...
Oodle has a free API for access to tens of millions of classifieds listings.
http://www.oodle.com/info/api
http://developer.oodle.com/

We even have a mappable tag so you can query for listings that are mappable. Note that this tag doesn't appear to be documented yet; I'll get that updated.

Dec 01, 2009
mashaper said...
Hi Romy,
I came across this post through Hacker News, and by chance I'm building a product that aims to solve third-party API problems and, in some way, your problem too.

We will launch a private version in January. I will let you know...

Cheers!

Dec 01, 2009
pjdonnelly said...
http://pipesproxy.jgate.de/?url=http://sfbay.craigslist.org/rea/index.rss

also works..

source: http://apps.jgate.de/platform/source?pipesproxy

Dec 01, 2009
sh1mmer said...
YQL still works, so you could probably use the YQL module in pipes to query craigslist.
Dec 01, 2009
Kim Landwehr said...
Unfortunately, this is not the first time Craigslist has used this kind of tactic and probably will not be the last.
Dec 01, 2009
mcurry said...
I'll defend craigslist. I run a site that has around 4500 custom RSS feeds. According to the yahoo pipes search there are 2 pipes that hit my site using a total of 5 feeds. Yet looking at my server logs I see an abnormal amount of pipes requests. I don't know if the pipes search is off or pipes just hits the server an abnormal amount, but I finally just blocked pipes.
Dec 01, 2009
pbausch said...
Yeah I'm with mcurry, the Yahoo Pipes bot is extremely aggressive and there's no way to set a crawl-delay or control how often they access feeds. It's all or nothing with them, so I think some of the blame for this situation should rest with Yahoo. Maybe this will nudge the Pipes folks toward tuning their bot or giving site owners some control beyond blocking their IPs.
Dec 01, 2009
altrenda said...
I think your last sentence says it all. "If Craigslist wants to keep its doors shut to the world, so be it." Why force it, just ignore them.
Dec 02, 2009
xbuzz said...
Hopefully the decision to block Pipes will be reversed. There are so many excellent non-commercial YP projects on the net I have found very useful.
Dec 02, 2009
snay2 said...
What I don't understand is why Craigslist seems to want to quash the use of their data through services. I've heard of more than one instance of something like this happening. I mean, okay if their servers are getting hosed by an inefficient Yahoo Pipes implementation. But it seems like they're against a lot of these mash-up projects that would ultimately lead to more business for them.
Dec 02, 2009
Terry McDonald said...
Quite a response from CL! But I side with the Pipes folks, one of Yahoo's most useful features. Indeed, if you listen to the Yahoo engineer who created it, it was to parse CL data for what he was looking for.
Dec 02, 2009
coolass said...
If pipes is too aggressive this is a Yahoo issue, if not, it seems odd that they would block it.
Dec 02, 2009
 said...
i had a little mashup a while ago called craigslittlebuddy.com that allowed folks to search multiple CL cities' classifieds at the same time... my app was "banned" from CL HTTP access too.

I tried yahoo pipes once i was banned but eventually just shut the whole thing down cuz i figured fighting CL wasn't worth the hassle, especially for a free (non-commercial) web app.

I feel your pain dude.

Arin

Dec 02, 2009
jonathantrevor said...
There's a little bit of misunderstanding here in the comments about Pipes.

First, there are numerous ways of controlling what and how Pipes can access your content. Most are clearly documented here: http://pipes.yahoo.com/pipes/docs?doc=troubleshooting We even have a new mechanism: if you look at the HTTP referer you can see WHICH pipe is actually accessing your content (and either block that specifically or let us know and we'll generally remove it). We need to update the docs with that new way.

Secondly, Pipes is not a "crawler". Pipes fetches data that is used in a Pipe "on demand", very much like a smart proxy with some code in it that people create. It goes through a few very aggressive caches, but ultimately if you have content that Pipes is accessing then it's because someone is running a Pipe a lot (and changing the parameters in it to avoid caching). Again, you can stop that one at your server or let us know and we'll remove it. There's an abuse link on every Pipe page. In essence, if someone is being too abusive with your content, every Pipe request has detailed information on how to stop that particular person or Pipe rather than Pipes itself.
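
As a rough illustration of the per-pipe approach described above, a site owner's check might look something like the following Python sketch. The exact format of the Referer that Pipes sends is not given in this thread, so the `_id` query parameter, the example URL shape, and the names `BLOCKED_PIPE_IDS`, `pipe_id_from_referer` and `should_block` are all assumptions; check your own server logs for the real format.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical IDs of individual pipes a site owner has decided to block.
BLOCKED_PIPE_IDS = {"abc123"}


def pipe_id_from_referer(referer):
    """Return the pipe ID carried in a Pipes Referer header, or None.

    Assumes the ID arrives as an `_id` query parameter on a
    pipes.yahoo.com URL, which this thread does not confirm.
    """
    if not referer:
        return None
    parsed = urlparse(referer)
    if parsed.hostname != "pipes.yahoo.com":
        return None
    return parse_qs(parsed.query).get("_id", [None])[0]


def should_block(referer):
    """Block only requests whose Referer names a listed pipe."""
    return pipe_id_from_referer(referer) in BLOCKED_PIPE_IDS
```

Compared with a blanket test for "pipes.yahoo.com" anywhere in the Referer, this turns away only the listed pipes and lets every other pipe through.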

Dec 02, 2009
That's a shame. I found my current apartment by piping rental listing addresses onto a map, which took all of 15 minutes with Yahoo pipes. Hopefully CL will realize that this is blocking a lot of legitimate users, and that their minimal UI really needs an API or at least open access to their RSS'd data.
Dec 03, 2009
sebbo said...
I'll also defend craigslist. I run a similar marketplace website outside the US with more than 60 million pageviews and 3 million visits. Some thoughts:

1. An email from Craig is a personal message from him to you. It does NOT belong in public or on any blog. You can tell users what he wrote in general, but quoting whole messages is a very bad habit that email and the internet brought along. (This from someone who still remembers writing snail mail.)

2. It's also a bad habit to complain in blogs about other people or websites.

3. Think about it from craigslist's point of view: they are the market leader by far, and they get 200 emails a day from people like you asking for cooperation, feeds, data, approval, anything. Craig tries to be nice by answering you. He is 100% free to do what he wants and to block whichever services he likes. He also has no obligation to answer all your emails. He probably has more important things to do.

4. Why is he blocking Yahoo! Pipes? Because it causes traffic and requests, and moves traffic away from craigslist to other pages. If all visitors access craigslist data from somewhere else, craigslist will be dead.

Our service has been thinking about an API for years. We have not had one so far. People all tell us: hey, you are Web 1.0, you're going to lose. But I think it's not that simple. Giving things away for free without any revenue impact does not make sense. Maybe it does if you have a business model based on transactions, but that's something Craigslist does not offer.

So I do totally understand Craig!

Cheers

Dec 03, 2009
Romy Maxwell said...
Thanks for the comment sebbo.

I'll agree on #1, could have gone without it. However, that's where our agreements end.

#2 - it's called freedom of speech
#3 - actually, he doesn't have anything more important to do. he demoted himself to customer service so that he can spend his days answering emails. i'll agree it was somewhat stupid of me to email him because he doesn't really care too much about all this.
#4 - what? if everyone accessed craigslist from somewhere else, craigslist would have the same amount of traffic and revenue OR MORE. The only way i can think of that craigslist can lose users is if someone starts mixing craigslist results together with their own results, and eventually stops showing craigslist results without the user caring or noticing. And today this is pretty much impossible.

Nobody is giving things away without impact. An API cannot decrease the amount of traffic they get, it can only increase it. More traffic means more views. More views means more incentive for users to buy paid listings. Plus more users who can buy those paid listings. All that translates to equal or more revenue, not less. And if you're going to use bandwidth as an argument, guess what, they're already growing by like a million users per month. How's that different?

And just because you chose not to have an API doesn't mean you made the optimal decision. You might be 2-3x bigger if you did, but how would you know?

Dec 03, 2009
snay2 said...
sebbo, about that #4. The way I see it, APIs are designed to get your data in as many relevant places as possible so that users will eventually come to your site themselves because they like your services. For example, if your site never gets listed on Google, most people will never get to your site. Not because it's important that Google lists you, but because Google acts as an intermediary between the customer and you to provide a relevant context for your services.

The people that want to use Craigslist APIs are just the same: intermediaries who want to direct more traffic to Craigslist, not take it away. The API makes it easier to do that in a relevant, context-based way that the user will understand better.

Dec 03, 2009
Bryan Bortz said...
This is horrible, it prolly broke so many sites.
Dec 04, 2009
joedevon said...
Here's the craigslist official note on the topic via @jzawodn http://blog.craigslist.org/2009/12/pipes-faucets/
Dec 06, 2009
bobsaintclare said...
We had a similar issue about a year ago: contacted Craig about our project (tinkomatic.com a classifieds and auctions search and monitoring service - currently with direct support for Oodle, Kijiji, and eBay) and got an initial response from Craig stating concerns about the possible strain on their RSS servers.

So we've put together a number of enhancements in order to minimize impact (no retry; no more than 200 classifieds retrieved).

After additional follow-ups with craigslist's tech team, and a couple of days before our initial private launch date, we got a response from Jim B. (craigslist's CEO) stating that "RSS feeds are for personal use only, not for use in 3rd party services".

Bummer.

So at this point we treat craigslist just like any RSS feed reader would - and take our users through the somewhat convoluted process of adding the RSS feed manually.

But that's not enough. It seems any service which makes multiple calls to craigslist will get their IP blocked instantly.
The way we circumvent the issue is by proxying through an Internet behemoth (Google Apps anyone?).

All in all - the technical issue of bypassing the blocking is fairly trivial - the question has more to do with sticking to the letter and the spirit of craigslist's Terms of Use.

Dec 06, 2009
Romy Maxwell said...
Thanks for sharing. I've actually run across your site when we were developing Flippity and liked it.

Seems you got farther along in terms of communicating with them, though the end result is basically the same.

And yeah, bypassing the block is indeed trivial, but as long as CL treats developers like we're pissing in their pool, I see no reason to get involved.

How are eBay & Oodle working out? Shoot me an IM @slay2k
Dec 06, 2009
bobsaintclare said...
Oodle was a snap to add - the most developer-friendly API we've seen so far. eBay is a little bit more involved; but they do have good support.
Feel free to ping me if you need more details - easiest way is through email - saintclare dot bob at gmail.
Good luck with everything
Dec 07, 2009
sbaker said...
@BobSaintclare and @Romy: Steve Baker here from Oodle. I'm the lead engineer on the Oodle API and I just wanted to drop in and give you a shout out to say we got your back.

Unfortunately, your story of getting blocked is an old one. Examples abound, but one that sticks out in my mind is from 2007. Developer Ryan Sit developed a cool thumbnail gallery view of CL listings called ListPic. People loved it. But you know what happens next ...

"Craigslist cuts off Listpic, cites bandwidth issues, TOS violations"
http://news.cnet.com/8301-17939_109-9727521-2.html

In response, Ryan applied for an Oodle API key and was up and going on our API in a week or so. The site is still live at listpic.com.

So yeah, we know how it goes. In any case, we're happy to help - it's awesome to see the cool things that developers cook up on the API. If you have any questions, hit me up at api@oodle.com. Good luck with everything!

-Steve

Dec 16, 2009
 said...
Looks like we're back in business. My Pipes that use Craigslist feeds are working again!!
Dec 16, 2009
Romy Maxwell said...
We've been back in business way before that ;)
