What I Learned As A Sysadmin

My first job out of college was working for a defense contractor, splitting time between programming and system administration. It was a small company with a network of SGI machines. I had been working as an intern at the San Diego Supercomputer Center, working on a distributed volume visualization system. I knew that SGI machines were fun to work on and I liked graphics. I’d be able to do both at this new job, and I’d get some sysadmin experience. It sounded great.
Jim Patterson was the main system administrator and one of my bosses. Unfortunately, it didn’t take me long to realize that the job wasn’t for me. Specifically, the world of defense wasn’t for me; I just wasn’t interested in what everybody was working on. But even though I wasn’t terribly happy with the work, I’m very glad I took the job. Jim was a great mentor, and I learned a lot about system administration from him. I think every programmer should have some sysadmin experience; I’m convinced it makes you a better programmer.
The IRIX version of Unix that SGI shipped with their boxes was considered state of the art for its time. Very stable (at least the 4.0 version that I initially used), and easy to program to. Even so, it taught me to avoid some things. Specifically, NFS and NIS never seemed to work correctly. NFS, in particular, would cause machines to freeze up randomly. It was such a pain that I vowed to never use NFS voluntarily again. And I haven’t used it since.
How did this experience affect the architecture of ONElist and, now, Bloglines? NFS is used to give multiple machines access to a shared partition (i.e., data). Instead of using NFS, I take one of two approaches. For read-only information, I copy files between machines and have the programs just read them off the local disk. It’s simple and scales very well, as long as you’re not copying huge amounts of data every 10 seconds or so.
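The post doesn’t say how the copies were made or how readers avoided seeing a half-copied file, so the following is only a sketch under assumptions. It assumes the data is JSON, that some external tool (rsync or scp, say) gets the bytes to each machine, and it uses the hypothetical helper names `publish_local_copy` and `read_local_copy`. The idea is to write each new copy to a temporary file in the same directory and rename it into place, so a program reading the local disk sees either the old version or the new one, never a partial write.

```python
import json
import os
import tempfile


def publish_local_copy(data, path):
    """Hypothetical helper: atomically replace `path` with a JSON dump of `data`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes are on disk before the rename
        os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; make it readable like a normal copy
        os.replace(tmp_path, path)  # atomic on POSIX: readers see old or new, never partial
    except BaseException:
        os.unlink(tmp_path)
        raise


def read_local_copy(path):
    """Hypothetical helper: programs simply read the data off the local disk."""
    with open(path) as f:
        return json.load(f)
```

Because each reader only ever opens a local file, a slow or dead peer can’t hang it the way an NFS mount can; the cost is that data is only as fresh as the last copy.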
In the other case, where I’m dealing with either large quantities of information or re-writable data, I create simple client/server applications. Yes, this can be a bit of work the first time you do it, but after that it’s easy enough to build a library that makes each new client/server pair simple to create. Writing your own servers also lets you instrument them, so you can collect all sorts of statistics that you’d be hard pressed to get out of NFS. For each server, I also have a ‘ping’ application that queries the server for various statistics. The servers get pinged once a minute, and the pings are hooked up to a monitoring/paging system, so we know quickly if there’s a problem.
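The actual ONElist/Bloglines servers and their protocol aren’t described, so here is a minimal Python sketch of the pattern, not the author’s code. It assumes a hypothetical line-based key/value protocol (`GET key`, `SET key value`, `STATS`), a server that counts every request it handles, a one-call client helper, and a `ping` that fetches those counters. The `monitor` loop and its `alert` callback stand in for the once-a-minute monitoring/paging hookup; all of these names are illustrative.

```python
import json
import socket
import socketserver
import threading
import time


class KVHandler(socketserver.StreamRequestHandler):
    """Hypothetical line protocol: GET key | SET key value | STATS."""

    def handle(self):
        srv = self.server
        for raw in self.rfile:  # one command per line until the client disconnects
            parts = raw.decode().strip().split(" ", 2)
            cmd = parts[0].upper()
            with srv.lock:  # handlers run in threads; guard shared data and counters
                srv.stats["requests"] += 1
                if cmd == "GET" and len(parts) == 2:
                    srv.stats["gets"] += 1
                    value = srv.data.get(parts[1])
                    reply = "NOTFOUND" if value is None else "VALUE " + value
                elif cmd == "SET" and len(parts) == 3:
                    srv.stats["sets"] += 1
                    srv.data[parts[1]] = parts[2]
                    reply = "OK"
                elif cmd == "STATS":
                    # Instrumentation: counters plus a couple of derived figures.
                    reply = json.dumps(dict(srv.stats, keys=len(srv.data),
                                            uptime=round(time.time() - srv.started, 1)))
                else:
                    srv.stats["errors"] += 1
                    reply = "ERROR"
            self.wfile.write((reply + "\n").encode())


class KVServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, addr):
        super().__init__(addr, KVHandler)
        self.lock = threading.Lock()
        self.data = {}
        self.stats = {"requests": 0, "gets": 0, "sets": 0, "errors": 0}
        self.started = time.time()


def request(host, port, line, timeout=5.0):
    """Tiny client library: send one command, return the one-line reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall((line + "\n").encode())
        with sock.makefile("r") as f:
            return f.readline().strip()


def ping(host, port):
    """The 'ping' tool: fetch the server's statistics as a dict."""
    return json.loads(request(host, port, "STATS"))


def monitor(host, port, alert, interval=60.0):
    """Ping forever; call the hypothetical alert(msg) hook (e.g. a pager) on failure."""
    while True:
        try:
            ping(host, port)
        except (OSError, ValueError) as exc:  # connection trouble or a garbled reply
            alert(f"{host}:{port} failed ping: {exc}")
        time.sleep(interval)
```

To try it, construct `KVServer(("127.0.0.1", 0))` (port 0 lets the OS pick a free port), run its `serve_forever` in a background thread, then call `request` and `ping` against `server.server_address`. Keeping the counters inside the server is what makes the ping cheap: the statistics are already there, and the monitor only has to ask.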
There are many other design decisions that must be made when building a scalable on-line service, like how you will partition and distribute the various data, but I’ve procrastinated enough for today. Time to get back to work.