Not a Shining Day in Bloglines History

Bloglines has been sluggish at times over the past couple of days. Our user numbers and web traffic keep increasing at a good clip, and we reached a point where the existing hardware couldn’t handle it. So we added more hardware yesterday and moved one of the databases around. That helped with the load, but it uncovered another bottleneck, which we were dealing with today. And then this evening, our main web server died. I haven’t had a chance to go to the co-lo, so I don’t know what’s wrong with it other than it’s down and isn’t coming back up. We have backup machines and were able to switch over to one of them (we’ll be running on multiple web machines when we switch co-los in another week, but not at the moment), but the site was off-line for about half an hour. In addition, the favicons (the little icons next to the subscriptions) are off-line for the moment.
We’ve spent a lot of time designing the Bloglines system to be scalable, and it is. But it’s sometimes difficult to simulate actual loads, so bottlenecks can slip through, only to be uncovered on the running system. Scalability is both a science and an art. Scaling a system is a good problem to have; it means that people are using the service. And they are. But it can take some tweaking to get right. You spend time before launch to design a good system, then you run like hell after launch to keep up with the growth.
So, thanks to everyone who uses Bloglines, and I apologize for the system issues. Hopefully they aren’t too noticeable, and they should be cleared up soon.
