Don't be evil, Craigslist
Yesterday (Nov 30th 2009), at approximately 2pm PST, Craigslist effectively killed all Yahoo Pipes projects that use Craigslist as a data source. For those unfamiliar with Yahoo Pipes, it's a nifty tool that allows you to take data from anywhere on the net, mash it up, parse it, and produce interesting results. One of its benefits is that it caches the data, so the original source does not get pummeled on every request.
Our project, known as Flippity, used pipes as the primary data source and was in its alpha stages of development. We were one of 2,111 pipes that used Craigslist on Yahoo Pipes, all of which to my knowledge were non-commercial, created solely as fun projects to provide a new perspective on data already available.
What's most interesting is perhaps the timing of all this. You see, about 4 days ago I posted a thread on Hacker News asking for feedback on what my friend Dan and I have come up with so far -- a mashup that simply plots Craigslist listings on a map, and allows you to do a radius search around any location. It was, in essence, HousingMaps.com generalized across all listings.
Having exchanged emails with the Craigslist founder several weeks ago (see previous blog post) in which he expressed interest in seeing more, I decided to follow up and give him the closed alpha link.
Fri, November 27, 2009 3:14:06 PM
Hi Craig,
You asked to hear more about this, so here you go!
http://www.flippity.com/alpha
This is currently closed to the public, but I've received significant positive feedback from friends.
Would like to hear what you think,
Romy
As you can see, this was sent 4 days ago. I received no response, which is unusual, as Craig responded within minutes to my first several emails. Oh well, I thought to myself, we'll just keep working on it and see what happens.
I couldn't believe my eyes when I saw what Craigslist had done. They literally added a check for "pipes.yahoo.com" in the referrer header of every HTTP request and redirected any match to the home page. In essence, they blocked Pipes. Really, Craig? This is your response? Allow me to quote the first email you ever sent me:
... thanks! and as a rule of thumb, okay to use RSS feeds for noncommercial purposes.
Well, we are using RSS feeds for noncommercial purposes. So were 2,110 other people. And you just shafted all of them, not to mention Yahoo itself. May I ask why?
I mean really, these pipes aren't all that popular, so if you told me your servers were getting hammered, I would say that's unlikely. So what is it then? What did these people do that was so wrong that it merited such a response?
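For the curious, the kind of referrer check described above can be sketched in a few lines. This is only a guess at the logic, not Craigslist's actual code: the function name `redirect_target` and the `/` redirect path are made up for illustration, and the only detail taken from what I observed is the "pipes.yahoo.com" string.

```python
from typing import Mapping, Optional

# The string the block appeared to look for, per the post.
BLOCKED_REFERER_FRAGMENT = "pipes.yahoo.com"
# Assumed redirect target standing in for "the home page".
HOME_PAGE = "/"


def redirect_target(headers: Mapping[str, str]) -> Optional[str]:
    """Return the path to redirect to, or None to serve the request normally."""
    # HTTP header names are case-insensitive, so normalise them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    # The real header is spelled "Referer" (one "r" in the middle).
    referer = lowered.get("referer", "")
    if BLOCKED_REFERER_FRAGMENT in referer.lower():
        return HOME_PAGE
    return None
```

A request carrying a Pipes referrer gets `/` back; anything else gets `None` and is served as usual.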
Anyway, it's a sad day for me. I'm not too upset about my own project, as Flippity was already removing Craigslist as a data source. With the likes of eBay and Oodle not only providing open APIs but encouraging and rewarding developers, spending my time wrestling with Craigslist is just plain stupid and exhausting. I'm sure I'm not the only person to have come to that conclusion, and I wish it were different.
By the way, it's not too hard to defeat any technical measure Craigslist can put up. We could, for instance, build a peer-to-peer network that obtains bits and pieces of data from Craigslist via hundreds of IPs at randomized time intervals. Or build a Java applet that we distribute to our users, having them exchange data with us a la BitTorrent. There's very little Craigslist would be able to do to counter that. However, it's just not worth my time. If Craigslist wants to keep its doors shut to the world, so be it.
[ If you're interested in startups, or just the future of flippity, I will be documenting our journey every step of the way. Subscribing to the blog will show me someone cares. ]
For example:
http://www.scriptlets.org/run/oo3ohh?url=http://sfbay.craigslist.org/rea/index.rss
view the source here: http://www.scriptlets.org/view/oo3ohh
Creating a proxy is pretty easy these days. Host it on your own server or use some service out there. Then plug that URL into the Fetch Feed module.
Hope this helps.
http://www.oodle.com/info/api
http://developer.oodle.com/
We even have a mappable tag so you can query for listings that are mappable. Note: it doesn't look like this tag is documented yet; I'll get that updated.
I came across this post through Hacker News, and by chance I'm creating a product that basically addresses third-party API problems, and your problem too, in some way.
We will launch a private version in January. I will let you know...
Cheers!
also works..
source: http://apps.jgate.de/platform/source?pipesproxy
I tried Yahoo Pipes once I was banned but eventually just shut the whole thing down cuz I figured fighting CL wasn't worth the hassle, especially for a free (non-commercial) web app.
I feel your pain dude.
Arin
First, there are numerous ways of controlling what and how Pipes can access your content. Most are clearly documented here: http://pipes.yahoo.com/pipes/docs?doc=troubleshooting We even have a new mechanism: if you look at the HTTP referer, you can see WHICH pipe is actually accessing your content (and either block that one specifically or let us know and we'll generally remove it). We need to update the docs with that new way.
Secondly, Pipes is not a "crawler". Pipes fetches data that is used in a Pipe "on demand", very much like a smart proxy with some code in it that people create. Requests go through a few very aggressive caches, but ultimately, if you have content that Pipes is accessing, then it's because someone is running a Pipe a lot (and changing the parameters in it to avoid caching). Again, you can stop that one at your server or let us know and we'll remove it. There's an abuse link on every Pipe page. In essence, if someone is being too abusive toward your content, every Pipe request has detailed information on how to stop that particular person or Pipe rather than Pipes itself.
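As a rough illustration of blocking one Pipe rather than Pipes as a whole, a site could inspect the referer on its own server along these lines. Treat this as a sketch under assumptions: it supposes the referer looks like `http://pipes.yahoo.com/pipes/pipe.info?_id=<pipe id>`, which should be checked against the troubleshooting docs, and the function names and the blocklist are hypothetical.

```python
from typing import Optional, Set
from urllib.parse import parse_qs, urlparse


def pipe_id_from_referer(referer: str) -> Optional[str]:
    """Extract the pipe id from a Pipes referer, or None if it is not one.

    Assumes the id travels in an `_id` query parameter (unverified format).
    """
    parsed = urlparse(referer)
    if parsed.hostname != "pipes.yahoo.com":
        return None
    ids = parse_qs(parsed.query).get("_id")
    return ids[0] if ids else None


def should_block(referer: str, blocked_pipe_ids: Set[str]) -> bool:
    """Block only requests that come from a pipe on the site's own blocklist."""
    pipe_id = pipe_id_from_referer(referer)
    return pipe_id is not None and pipe_id in blocked_pipe_ids
```

With a blocklist of `{"abusive123"}`, only requests from that one pipe are refused, and every other pipe keeps working.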
1. An email from Craig is a personal message from him to you. It does NOT belong in public or on any blog. You can maybe tell users what he wrote in general, but quoting whole messages is a very bad habit that email and the internet brought along (said by someone who still remembers writing snail mail).
2. It's also a bad habit to complain in blogs about other people or websites.
3. Think about Craigslist's view: they are the market leader by far, and they get 200 mails a day from people like you, asking for cooperation, feeds, data, approval, anything. Craig tries to be nice by answering you. He is 100% free in what he does and in which services he blocks or not. He also has no obligation to answer all your mails. He probably has more important things to do.
4. Why is he blocking Yahoo! Pipes? Because it's causing traffic and requests, and moving traffic away from Craigslist to other pages. If all visitors access Craigslist data from somewhere else, Craigslist will be dead.
Our service has been thinking about an API for years. We have not had one so far. People all tell us: hey, you are Web 1.0, you're going to lose. But I think it's not that simple. Giving things away for free without any revenue impact does not make any sense. Maybe it does if you have a business model that is based on transactions. But that's something Craigslist does not offer.
So I do totally understand Craig!
Cheers
I'll agree on #1, could have gone without it. However, that's where our agreements end.
#2 - it's called freedom of speech
#3 - actually, he doesn't have anything more important to do. He demoted himself to customer service so that he can spend his days answering emails. I'll agree it was somewhat stupid of me to email him because he doesn't really care too much about all this.
#4 - what? If everyone accessed Craigslist from somewhere else, Craigslist would have the same amount of traffic and revenue OR MORE. The only way I can think of that Craigslist can lose users is if someone starts mixing Craigslist results together with their own results, and eventually stops showing Craigslist results without the user caring or noticing. And today this is pretty much impossible.
Nobody is giving things away without impact. An API cannot decrease the amount of traffic they get, it can only increase it. More traffic means more views. More views means more incentive for users to buy paid listings. Plus more users who can buy those paid listings. All that translates to equal or more revenue, not less. And if you're going to use bandwidth as an argument, guess what, they're already growing by like a million users per month. How's that different?
And just because you chose not to have an API doesn't mean you made the optimal decision. You might be 2-3x bigger if you did, but how would you know?
The people that want to use Craigslist APIs are just the same: intermediaries who want to direct more traffic to Craigslist, not take it away. The API makes it easier to do that in a relevant, context-based way that the user will understand better.
So we've put together a number of enhancements in order to minimize impact (no retry; no more than 200 classifieds retrieved).
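To make those two limits concrete, here is a minimal sketch of the idea, not our production code. The `fetch` callable is a stand-in for whatever actually downloads and parses a feed, and `fetch_once` is a name invented for this example; only the "no retry" rule and the 200 cap come from the description above.

```python
from typing import Callable, List, Sequence

# Cap taken from the comment above: never keep more than 200 classifieds.
MAX_CLASSIFIEDS = 200


def fetch_once(fetch: Callable[[], Sequence[str]],
               limit: int = MAX_CLASSIFIEDS) -> List[str]:
    """Call `fetch` exactly once and return at most `limit` items.

    `fetch` is a hypothetical callable that returns the parsed feed entries.
    """
    try:
        items = fetch()
    except Exception:
        # No retry: a failed request is dropped instead of hitting the
        # source site a second time.
        return []
    return list(items[:limit])
```

Passing the downloader in as a callable keeps the limits separate from the networking code, so they can be checked without touching the network.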
After additional follow-ups with craigslist's tech team, and a couple of days before our initial private launch date, we got a response from Jim B. (craigslist's CEO) stating that "RSS feeds are for personal use only, not for use in 3rd party services".
Bummer.
So at this point we treat craigslist just like any RSS feed reader would - and take our users through the somewhat convoluted process of adding the RSS feed manually.
But that's not enough. It seems any service which makes multiple calls to craigslist will get their IP blocked instantly.
The way we circumvent the issue is by proxying through an Internet behemoth (Google Apps anyone?).
All in all - the technical issue of bypassing the blocking is fairly trivial - the question has more to do with sticking to the letter and the spirit of craigslist's Terms of Use.
developing Flippity and liked it.
Seems you got farther along in terms of communicating with them, though the end result is basically the same.
And yeah, bypassing the block is indeed trivial, but as long as CL treats developers like we're pissing in their pool, I see no reason to get involved.
How's eBay & Oodle working out? Shoot me an IM @slay2k
Feel free to ping me if you need more details - easiest way is through email - saintclare dot bob at gmail.
Good luck with everything
Unfortunately, your story of getting blocked is an old one. Examples abound, but one that sticks out in my mind is from 2007. Developer Ryan Sit developed a cool thumbnail gallery view of CL listings called ListPic. People loved it. But you know what happens next ...
"Craigslist cuts off Listpic, cites bandwidth issues, TOS violations"
http://news.cnet.com/8301-17939_109-9727521-2.html
In response, Ryan applied for an Oodle API key and was up and going on our API in a week or so. The site is still live at listpic.com.
So yeah, we know how it goes. In any case, we're happy to help - it's awesome to see the cool things that developers cook up on the API. If you have any questions, hit me up at api@oodle.com. Good luck with everything!
-Steve