What Runs Groups.io

I always appreciate when people talk about how they’ve built a particular piece of software or a web service, so I thought I’d talk about some of the architecture choices I made when building Groups.io, my recently launched email groups service. This will be a multi-part series.

Go

One of the goals I had when I first started working on Groups.io was to use it as an opportunity to learn the new language Go. Groups.io is written completely in Go and is my first project in the language. As a diehard C programmer (ONElist was written in C, and Bloglines was written in C++), I found it took very little time to get up to speed on Go, and I now consider myself a huge fan of the language. There are many reasons why I like to code in Go. It’s compiled, so it’s fast and you get all the compile-time checks you miss in interpreted languages. It generates standalone binaries, which is great for distributing to production machines. It’s got a great standard library. It’s easy to write concurrent code (using goroutines, Go’s lightweight threads). The documentation system is good. But besides all that, the philosophy behind Go just fits my mental model better than any other language I’ve worked in. It all combines to make programming in Go the most fun I’ve had coding in a very long time.
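To give a flavor of how little ceremony goroutines need, here is a minimal sketch (an illustration only, not code from Groups.io; the function name squareAll is made up). It starts one goroutine per input and waits for all of them with a sync.WaitGroup.

```go
package main

import (
	"fmt"
	"sync"
)

// squareAll squares every number in its own goroutine and returns the
// results in the same order as the input.
func squareAll(nums []int) []int {
	out := make([]int, len(nums))
	var wg sync.WaitGroup
	for i, n := range nums {
		wg.Add(1)
		go func(i, n int) {
			defer wg.Done()
			// Each goroutine writes only to its own slot, so no lock is needed.
			out[i] = n * n
		}(i, n)
	}
	wg.Wait() // block until every goroutine has called Done
	return out
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3, 4}))
}
```

Passing i and n as arguments gives each goroutine its own copy of the loop variables, which keeps the example correct on older Go versions as well.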

Components

Groups.io consists of several components that interact with each other. All interactions are done using JSON over HTTP.
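As a rough sketch of what JSON over HTTP between two processes can look like in Go, the example below defines a handler that decodes a JSON request and encodes a JSON reply. Everything here is hypothetical: the payload types, the field names, the /post path, and the validation rule are assumptions made for illustration, not the actual Groups.io API. The call goes through net/http/httptest so the example runs in-process without opening a port.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// postRequest and postResponse are hypothetical payloads, invented for
// this sketch.
type postRequest struct {
	Group  string `json:"group"`
	Sender string `json:"sender"`
	Body   string `json:"body"`
}

type postResponse struct {
	Accepted bool   `json:"accepted"`
	Reason   string `json:"reason,omitempty"`
}

// handlePost decodes a JSON request and answers with a JSON response.
func handlePost(w http.ResponseWriter, r *http.Request) {
	var req postRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "bad JSON", http.StatusBadRequest)
		return
	}
	resp := postResponse{Accepted: req.Group != "" && req.Sender != ""}
	if !resp.Accepted {
		resp.Reason = "group and sender are required"
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(resp)
}

// send delivers a JSON payload to handlePost in-process and decodes the reply.
func send(payload string) (postResponse, error) {
	req := httptest.NewRequest(http.MethodPost, "/post", strings.NewReader(payload))
	rec := httptest.NewRecorder()
	handlePost(rec, req)
	var resp postResponse
	err := json.NewDecoder(rec.Body).Decode(&resp)
	return resp, err
}

func main() {
	resp, err := send(`{"group":"hikers","sender":"a@example.com","body":"hi"}`)
	fmt.Println(resp.Accepted, resp.Reason, err)
}
```

In a real deployment the handler would be registered with http.HandleFunc and the caller would use http.Post; the shape of the exchange stays the same.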

Web

The web server handles all web traffic, naturally. It is proxied behind nginx, because I believe that makes for a more flexible and slightly more secure system. Nginx terminates the encrypted HTTPS traffic and passes the unencrypted traffic to the web process. We use the standard Go HTML template system for our web templates, and we use several parts of the Gorilla web toolkit. We use Bootstrap for our HTML framework.

Smtpd

The smtpd daemon handles incoming SMTP traffic for the groups.io domain. It is also proxied behind nginx. The email it handles consists mainly of group messages, although there are some other messages as well, including bounce messages. It sends group and bounce messages to the messageserver for processing. Other messages are forwarded, using a set of rules, to other email addresses. We based smtpd heavily on Go-Guerrilla’s SMTPd.

Messageserver

The messageserver daemon processes group messages, bounce messages and email commands. For group messages, it verifies that the poster is subscribed and has permission to post to the group, archives the message, and sends it out to the group subscribers, using Karl to send the messages. It also sends the messages to our Elasticsearch cluster. Bounce and email command messages are processed as well. All group messages are processed through the messageserver, whether they arrive through the smtpd or are posted through the web site.

Karl

Karl, named after Karl ‘The Mailman’ Malone, is our email sending process. It is responsible for all emails originating from the groups.io domain. It is passed an email message, a footer template, a sender, and a set of data about each receiver the message should be sent to. For each receiver, it evaluates the template, inserting subscriber specific information, and then merges it with the email message before sending it out. It also handles DKIM signing of emails. It stores all emails using Google’s leveldb database until they are successfully sent.
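The per-receiver merge step can be sketched with Go’s text/template package. This is a simplified illustration under stated assumptions, not Karl’s actual code: the receiver fields, the footer text, the "-- " separator, and the helper names parseFooter and renderFor are all hypothetical, and DKIM signing and the leveldb queue are left out.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// receiver holds hypothetical per-subscriber data for the footer.
type receiver struct {
	Email          string
	UnsubscribeURL string
}

// parseFooter compiles the footer template once, before the send loop.
func parseFooter(text string) (*template.Template, error) {
	return template.New("footer").Parse(text)
}

// renderFor evaluates the footer for one receiver and appends it to the
// shared message body.
func renderFor(body string, footer *template.Template, r receiver) (string, error) {
	var sb strings.Builder
	sb.WriteString(body)
	sb.WriteString("\n-- \n") // assumed separator between body and footer
	if err := footer.Execute(&sb, r); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	footer, err := parseFooter("You are subscribed as {{.Email}}. Unsubscribe: {{.UnsubscribeURL}}")
	if err != nil {
		fmt.Println("template error:", err)
		return
	}
	receivers := []receiver{
		{Email: "a@example.com", UnsubscribeURL: "https://example.com/u/1"},
		{Email: "b@example.com", UnsubscribeURL: "https://example.com/u/2"},
	}
	for _, r := range receivers {
		msg, err := renderFor("Hello, group!", footer, r)
		if err != nil {
			fmt.Println("render error:", err)
			continue
		}
		fmt.Println(msg)
	}
}
```

Parsing the template once and executing it per receiver keeps the per-message cost low, which matters when one post fans out to many subscribers.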

A reasonable question to ask is why didn’t I outsource the email delivery part of the service. There are several companies that provide email delivery outsourcing. In general, outsourcing is a way to save development time. But when I thought about it, I did not think I’d be able to save much time by outsourcing; I’d still have to connect our data with whatever templating system the email delivery service used. And Karl did not take very long to write. But more importantly, email delivery is a core competency of our service and I believe we have to own that.

Errord

Errord is a simple logging process, used to log error messages and stack traces from any core dumps in any of the other processes. I can look at the errord log and instantly see if anything in the system has crashed and where it crashed.
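One way a Go process can capture a crash for a logger like this is to recover from the panic and grab the stack with runtime/debug. The sketch below is an assumption about the general technique, not a description of how errord or its clients are actually written; the crashReport type and the guard function are hypothetical names.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// crashReport is a hypothetical record that a process could send to a
// central error logger.
type crashReport struct {
	Process string
	Message string
	Stack   string
}

// guard runs fn. If fn panics, guard recovers and returns a report with
// the panic value and a stack trace; otherwise it returns nil.
func guard(process string, fn func()) (report *crashReport) {
	defer func() {
		if v := recover(); v != nil {
			report = &crashReport{
				Process: process,
				Message: fmt.Sprint(v),
				Stack:   string(debug.Stack()),
			}
		}
	}()
	fn()
	return nil
}

func main() {
	r := guard("messageserver", func() {
		var counts map[string]int
		counts["posts"]++ // writing to a nil map panics
	})
	if r != nil {
		fmt.Printf("%s crashed: %s\n%s", r.Process, r.Message, r.Stack)
	}
}
```

In this sketch the report is only printed; a real setup would forward it to the logging process, for example as a JSON body over HTTP.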

Rsscrawler, Instagramcrawler

Rsscrawler and instagramcrawler are cronjobs that deal with the Feed and Instagram integrations, respectively. Rsscrawler looks for updates in feeds that are integrated with our groups, and instagramcrawler does the same for Instagram accounts. They’re currently run twice an hour. If they find an update, they generate a group message and pass it along to the messageserver.

Bouncer

Bouncer is a cronjob that is run once a day to manage bouncing users.

Expirethreads

Expirethreads is a cronjob that’s run twice an hour to expire threads that are tagged with hashtags that have an expiration.
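The core check in a job like this can be sketched in a few lines. The data model below is an assumption made for illustration: the hashtag and thread types, the idea that an expiration is a duration counted from when the thread started, and the sample data are all hypothetical rather than taken from the Groups.io schema.

```go
package main

import (
	"fmt"
	"time"
)

// hashtag and thread are hypothetical types for illustration only.
type hashtag struct {
	Name        string
	ExpireAfter time.Duration // zero means the hashtag never expires threads
}

type thread struct {
	Subject string
	Started time.Time
	Tags    []hashtag
}

// expiredSubjects returns the subjects of threads that carry at least one
// hashtag whose expiration has elapsed as of now.
func expiredSubjects(threads []thread, now time.Time) []string {
	var out []string
	for _, t := range threads {
		for _, h := range t.Tags {
			if h.ExpireAfter > 0 && now.Sub(t.Started) >= h.ExpireAfter {
				out = append(out, t.Subject)
				break // one expired hashtag is enough
			}
		}
	}
	return out
}

// sampleThreads builds a fixed data set so the example is deterministic.
func sampleThreads() ([]thread, time.Time) {
	now := time.Date(2014, time.October, 1, 12, 0, 0, 0, time.UTC)
	forsale := hashtag{Name: "forsale", ExpireAfter: 30 * 24 * time.Hour}
	general := hashtag{Name: "general"}
	return []thread{
		{Subject: "Old bike", Started: now.Add(-45 * 24 * time.Hour), Tags: []hashtag{forsale}},
		{Subject: "New tent", Started: now.Add(-2 * 24 * time.Hour), Tags: []hashtag{forsale}},
		{Subject: "Welcome", Started: now.Add(-400 * 24 * time.Hour), Tags: []hashtag{general}},
	}, now
}

func main() {
	threads, now := sampleThreads()
	fmt.Println(expiredSubjects(threads, now))
}
```

Taking the current time as a parameter, rather than calling time.Now inside the function, makes the check easy to test with fixed dates.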

Senddigests

Senddigests is a cronjob that’s run once a night, to generate digest emails for users with digest subscriptions.

Next Time

In future articles, I’ll talk about the machine cluster running Groups.io, the database design behind the service, and some aspects of the code itself. Are there any specific topics you’d like me to address? Please let me know.

Are you unhappy with Yahoo Groups or Google Groups? Or are you looking for an email groups service for your company? Please try Groups.io.

Introducing Groups.io

I’m not one to live in the past (well, except maybe for A-Team re-runs), but for many years now, I’ve felt like I’ve had unfinished business. I started the service ONElist in 1998. ONElist made it easy for people to create, manage, run and find email groups. As it grew over the next two and a half years, we expanded, changed our name to eGroups, and, in the summer of 2000, were acquired by Yahoo. The service was renamed Yahoo Groups, and I left the company to pursue other startups.

But really this story starts even further back, in the winter of 1989, when in college I was introduced to mailing lists. I was instantly hooked. It was obvious that a mailing list was a great way to communicate with a group of people about a common interest. I started subscribing to lists dedicated to my favorite bands (’80s hair metal, anyone?). I joined a list for a local running club. And, at every company I’ve worked at since graduating, there have been invaluable internal company mailing lists.

But that doesn’t mean that mailing lists can’t improve. And this is where we get back to the unfinished business. Because email groups (the modern version of mailing lists) have stagnated over the past decade. Yahoo Groups and Google Groups both exude the dank air of benign neglect. Google Groups hasn’t been updated in years, and some of Yahoo’s recent changes have actually made Yahoo Groups worse! And yet, millions of people put up with this uncertainty and neglect, because email groups are still one of the best ways to communicate with groups of people. And I have a plan to make them even better.

So today I’m launching Groups.io in beta, to bring email groups into the 21st Century. At launch, we have many features that those other services don’t have, including:

  • Integration with other services, including: Github, Google Hangouts, Dropbox, Instagram, Facebook Pages, and the ability to import Feeds into your groups.
  • Businesses and organizations can have their own private groups on their own subdomain.
  • Better archive organization, using hashtags.
  • Many more email delivery options.
  • The ability to mute threads or hashtags.
  • Fully searchable archives, including searching within attachments.

We’re just starting out; following the tradition of new startups everywhere, we’re in Beta. We’re working hard to squash the inevitable bugs and to make the system even better (based on your feedback!).

I’m passionate about email groups. They are one of the very best things about the Internet and, with Groups.io, I’ve set out to make them even better. As John ‘Hannibal’ Smith, leader of the A-Team, liked to say, “I love it when a plan comes together.”

Turning A Web Site Into A Mac App

For some web sites, I have multiple accounts, and need to be able to switch between those accounts easily. I created a set of site-specific browsers for each web site and account using Fluid. A site-specific browser looks like a normal app, but is actually a self-contained browser set to open a specific web page. These site-specific browsers don’t share resources, so you can set multiple ones up targeting the same web page, but using different logins. The problem with Fluid, however, is that it doesn’t seem to work with 1Password, the app I use to manage all my passwords. This meant that each time I launched a Fluid app, I’d have to also launch 1Password, look up the appropriate password, and then copy and paste it into the Fluid app to log in. Not ideal. Fortunately, I’ve come across a better solution, using Chrome. It allows me to create site-specific browsers using Chrome and it also integrates with 1Password. And it’s free. It involves just a couple of steps.

First, you must download this shell script. Each time you run it, it will create a new site-specific browser app. It requires 3 bits of information: the name you want to call the app, the web page it should open up, and an icon to use for the app. For icons, I used Google Image Search.

Once you run the script, it creates the new app in your /Applications directory. Clicking on this app will launch a Chrome process, separate from your normal Chrome browser, pointed at the page you specified. So far, we’ve duplicated Fluid. Now, we need to install the 1Password extension. Hit Command-T to open a new tab in the app, and go to the web page: https://agilebits.com/browsers/index.html. Then click on the green button to install the 1Password extension. Now, the site-specific browser you’ve created has 1Password installed. Quit out of it and restart it. You can now right-click to bring up 1Password and fill in any login form you have.

Backup and Dropbox Strategy

The hard drive on my iMac is making a funny noise, which is a good excuse to review my current backup strategy. It’s fairly simple these days. All of my data is on a 3TB iMac fusion drive. That gets backed up 3 ways. It’s backed up to an Apple Time Capsule, which also serves as my main WiFi router. I also use CrashPlan to back up to their cloud service, CrashPlan Central. And as a third backup, I use CrashPlan to back up to a USB drive every couple of weeks. The USB drive is otherwise stored in a (hopefully) fireproof safe. This gives me the safety of an off-site backup as well as the speediness of two different on-site backups. I’ve had occasion to use the Time Capsule to recover some files, but I have not yet had to rely on any of the CrashPlan backups.

CrashPlan provides the software to do local backups for free; they make their money on CrashPlan Central. I signed up for the 4 year CrashPlan+ Family unlimited plan, because we have several computers to back up. When I first set up the CrashPlan cloud backup, I used their ‘seed service’, in which a hard drive is mailed to you, you use it to do a local backup, and then you mail the drive back to them. This is much faster than doing the initial backup entirely over the net.

These days, a discussion of backups wouldn’t be complete without talking about Dropbox. My wife and I share a Dropbox account. It’s a great way to share family-related information, and it’s automatically backed up. But we wanted some extra security on the files we share, so we encrypt our Dropbox account using BoxCryptor. BoxCryptor creates an encrypted volume on top of a Dropbox directory. This shows up on Windows as another drive; on Mac it shows up as a volume. It encrypts files on a per-file basis, instead of creating a monolithic encrypted filesystem. This allows Dropbox to continue to sync on a file-by-file basis when something changes. BoxCryptor is free for individuals; the paid version has extra features.

One feature to consider is encryption of file names. It provides an extra level of security; if someone were able to look at your Dropbox account, they’d only see a bunch of files with gibberish for file names. In the end, I decided against enabling that feature. We’re giving up a little bit of security (the file data itself is still encrypted). The advantage of not encrypting file names comes up in the case of recovering deleted files, should you have to do that. You’ll be able to locate the files to recover because the file names are still readable. That wouldn’t be possible with encrypted file names.

Yahoo Groups

I read with interest Marissa Mayer’s comments today at the Goldman Sachs Technology conference, specifically her mention of Yahoo Groups:

One of our strongholds has been Yahoo Groups, as it moves to the phone it opens up all kinds of possibilities. The phone is a much better place to do group communication.

My first startup was ONElist, which was renamed Yahoo Groups after we were acquired in August 2000. Over the past 12 plus years, I’ve watched as Yahoo did basically nothing with Groups. It’s still almost the same as when it was acquired. Yahoo has devoted only enough resources to keep it going all these years. In fact, if you try to use the site now, it often times out and is generally extremely sluggish. I don’t have current numbers, but I’ve been told that even with all the neglect, Groups still has over 100 million users. The group archives make up many petabytes of data. It is not a small service.

Email groups are great ways to communicate. As numerous people have told me over the years, Yahoo Groups have affected people’s lives in significant and profound ways. As my friends will attest, I’m at least as cynical as the next software engineer. But I think group communication is one of the most important aspects of the Internet and I truly believe that it has made, and continues to make, the world a better, safer, more inclusive place. But Y! Groups has stagnated for 12 years.

Several months ago, I got fed up with the state of (neglect of) Groups and decided to start working on a next generation Groups service. It’s not ready yet, but it’s not too far out.

With all that, ever since Mayer took over as CEO, I’ve been watching for signs that she’d devote resources to Groups, and this is the first sign I’ve seen that they may be working on an update. They have a lot of challenges in doing so. With a service that hasn’t changed in 12 years, people have become accustomed to the interface and I believe there will be a lot of resistance from long time Groups users (which is the subject of an essay for another day). But I know that Groups can be so much more than what Y! Groups are right now. It’s only a matter of time. Whether Yahoo, or I, or someone else launches the next generation of groups, it will happen, and people will be better for it.

Book Review: The Centre Cannot Hold


I’ve been on safari in Africa twice; the first time was our honeymoon in 2011. We had such an amazing time that we decided our next trip would be another safari, and so we went again this past June. And we knew, even before we returned, that it wouldn’t be our last safari. I’ve posted some of the photos I took on those trips here on the blog and several adorn our house.

For Christmas, my in-laws got me The Centre Cannot Hold, a book by David Gulden, of black and white images taken in Africa of (mostly) animals. David’s goal was to take photos that no one else had been able to capture. This entailed using such devices as an infrared-triggered camera, and going to such lengths as using a cross-bow to mount a camera near an eagle’s nest. His effort was worth it. The images are stunning, and not just because of his MacGyver ways. The man clearly has a talent. The result is a great coffee-table book. And for this newbie photographer, the photos are an inspiration.

ONElist Office

Sam Rushing recently came across some old photos he took, including this one, which is a panorama of the old ONElist building in Redwood City. It was taken in February, 2000, which was after we merged with eGroups and before we were acquired by Yahoo (and became Yahoo Groups). The office was a converted warehouse and had about 50 people in it. This photo doesn’t show all the cubicles behind the photographer, nor does it show the offices underneath. During the whirlwind that was ONElist, to my lasting regret, I never took any photos, so I especially appreciate Sam’s rediscovery.

The cardboard cutout, btw, is Sarah Michelle Gellar, during her Buffy the Vampire Slayer days. I never knew the story behind why that cutout was in the office.


Tanzania Safari Report – Final Day

Lee suggested that we’d have time for a brief drive before our flight this morning. We jumped at the opportunity to do so, calling it our bonus drive, and we were immediately rewarded. Lee mentioned that he had heard some lion calls near our lodge in the night, so we went looking for them, and 10 minutes later came across a pair of lions mating. Out of respect for the delicate readers of this blog, this first photo is post-coitus. The male lion is walking away to find something to read while the female lion rolls on her back, pawing the air and says “Wait, can’t we talk about feelings now?”

The mating process takes around 3 days, during which they will have sex approximately every 20 minutes. I got tired just typing that sentence. Each, umm, event, only takes about 10 seconds. Feel free to insert your own jokes here. At the point of climax, both lions roar. It’s quite impressive. Especially when you’re just 20 feet away.

At the end of our bonus drive we came across a pair of jackals. They’re small, canine scavengers. These guys were definitely less shy than the other jackals we saw on the trip.

Thus ends our safari adventure. I hope that both of you reading this enjoyed it!