RSS Scaling Issues

Chad Dickerson has a really interesting article up at Infoworld in which he talks about the problem with desktop RSS aggregators:

    Fast forwarding to the present, InfoWorld.com now sees a massive surge of RSS newsreader activity at the top of every hour, presumably because most people configure their newsreaders to wake up at that time to pull their feeds. If I didn't know how RSS worked, I would think we were being slammed by a bunch of zombies sitting on compromised home PCs. Our hourly RSS surge has all the characteristics of a distributed DoS attack, and although the requests are legitimate and small, the sheer number of requests in that short time period creates some aggravating scaling issues.

This is the scaling problem that I’ve been talking about since we launched Bloglines a year ago. It’s a serious concern. Centralized services like Bloglines avoid this problem because we fetch each feed only once, regardless of how many subscribers it has. Desktop aggregators can’t do that, of course, and end up generating huge amounts of traffic to sites like Infoworld. There are various things that a desktop aggregator can do to mitigate the load, like using the HTTP Last-Modified header and supporting gzip compression. But the aggregator still has to query the server, so there will always be a load issue.

Because Bloglines has a vested interest in increasing RSS (in the generic sense) adoption, we’re looking at ways we can help. We are working on a couple of projects right now, and we’re of course open to suggestions.
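
For anyone writing a desktop aggregator, here is a rough sketch of the two mitigations mentioned above: sending the stored Last-Modified value back as If-Modified-Since, and asking for a gzip-compressed response. It uses only the Python standard library. The function names (`build_request`, `decode_body`, `fetch_feed`) and the 30-second timeout are my own choices for illustration, not taken from any particular aggregator.

```python
import gzip
import urllib.error
import urllib.request


def build_request(url, last_modified=None):
    """Build a GET that accepts gzip and, when we have a stored
    Last-Modified value, asks only for changes since then."""
    headers = {"Accept-Encoding": "gzip"}
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return urllib.request.Request(url, headers=headers)


def decode_body(raw, content_encoding):
    """Decompress the body if the server actually sent it gzip-encoded."""
    if content_encoding == "gzip":
        return gzip.decompress(raw)
    return raw


def fetch_feed(url, last_modified=None):
    """Return (body, last_modified). body is None when the server
    answers 304 Not Modified, meaning the feed has not changed."""
    request = build_request(url, last_modified)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            encoding = response.headers.get("Content-Encoding")
            body = decode_body(response.read(), encoding)
            # Remember the server's timestamp for the next poll.
            return body, response.headers.get("Last-Modified", last_modified)
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib reports 304 as an HTTPError
            return None, last_modified
        raise
```

The caller would keep the returned `last_modified` string and pass it back in on the next poll. Note that an unchanged feed still costs the server one request and one 304 response, which is exactly why these measures reduce bandwidth but don't make the load issue go away.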