Streaming Postgres Changes

Our service is based on a series of Postgres databases, which has worked very well for us. There are some scenarios, however, where it would be advantageous to have our data in a streaming, log-based system like Apache Kafka. One specific scenario involves Elasticsearch, which we use to provide full-text search over group archives. Right now, when a message is added or updated, we send the update to Postgres, and then we send the update to the Elasticsearch cluster. If we want to re-index our archives, we have to prevent new messages from coming in, as well as changes to existing messages, while we do a table scan of the archives into a new ES cluster. This is far from optimal.

A better pattern would be as follows:

  • Additions/Changes/Deletions from Postgres get streamed into a log-based system.
  • A reader is constantly consuming those changes and updating the ES cluster.
  • When we want to re-index the site, we start a new reader which consumes the log from the beginning, creating a new ES index.
  • When the new ES index is up to date, simply switch the site over to it, stop the old reader and delete the old ES index.

There are other scenarios where having a log-based representation of the data would be useful as well. With this in mind, I’ve been researching ways to stream Postgres changes. These are my notes about what I’ve learned so far. They may be incomplete and contain errors. Corrections are appreciated.

Postgres introduced logical decoding in version 9.4. With the addition of an output plugin, it is now possible to stream changes from a Postgres database in whatever format you prefer. There are a couple of projects that use this to stream Postgres into Kafka, like Bottled Water (no longer maintained) and Debezium. I could not get the Debezium Postgres plugin to compile on CentOS 7. In addition, there’s a competitor to Kafka, called NATS, which, while not as mature as Kafka, has the advantage (to me) of being written in Go. There appears to be a connector between Postgres and NATS, but I haven’t explored it. Another related project is pg_warp, which allows you to stream Postgres changes to another Postgres database.

I wanted to explore exactly how a Postgres streaming system would work. While everything is documented, it was not clear to me how the process worked, should you want to implement your own logical replication system. The Postgres docs helped my understanding, along with this presentation, and this blog post from Simple. Also, playing with the wal2json plugin and following its README helped, and Debezium’s docs go into some detail as well. But there is more to it. This is what I’ve found out.

There are a couple of parts to a Postgres streaming system. You first want to get a complete snapshot of the existing database, and then you want to get all changes to the database going forward, without missing any changes should there be a hiccup (i.e. a crash).

Here are the steps required (note: this may be updated as I gain more experience with this):

  • Set up Postgres for logical replication, and decide which plugin to use. All the plugin does is determine the format of the data you will receive.
  • Connect to the database using the streaming protocol. This means appending “replication=database” to the URL. The streaming protocol is not the same as the normal Postgres protocol, although you can use psql to send some commands. If you are programming in Go, the only Postgres driver I found that supports the replication protocol is pgx. Unfortunately, one of the commands needed from it, CreateReplicationSlot(), does not return the name of the snapshot created, which you need. I’ve submitted a pull request with the change to return this information.
  • At the same time, connect to the database the normal way.
  • On the streaming connection, issue this command (using wal2json as the plugin for this example):
    CREATE_REPLICATION_SLOT test_slot LOGICAL wal2json;
  • This creates the replication slot, and it also generates a snapshot. The response includes the name of the snapshot and the consistent_point, the WAL position at which the slot became consistent.
  • On the normal connection, you can now take a snapshot of the existing database using the snapshot name returned above. Use these commands to initiate the transaction:
    BEGIN;
    SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    SET TRANSACTION SNAPSHOT 'snapshot_name';
    SELECT * from ….;
  • Once the snapshot has been completed, on the streaming connection, issue:
    START_REPLICATION SLOT test_slot LOGICAL consistent_point;
  • That starts the streaming. You will receive data in the format the plugin outputs. One piece of data also returned is the WAL position. You can use this to resume streaming, should you have to restart your streaming system.
  • When you are done streaming, you must issue a DROP_REPLICATION_SLOT command on the streaming connection, otherwise Postgres will not be able to remove old WAL files and you will eventually run out of disk space.
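Once streaming starts, you receive whatever the plugin emits; for wal2json, that is JSON describing each change. Here’s a minimal sketch in Go of decoding an abbreviated wal2json-style payload. The struct and field set are trimmed for illustration; the real payload has more fields (column types, old keys, etc.), depending on the plugin version and options.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Change matches an abbreviated form of a single wal2json change record.
type Change struct {
	Kind         string        `json:"kind"` // "insert", "update", or "delete"
	Schema       string        `json:"schema"`
	Table        string        `json:"table"`
	ColumnNames  []string      `json:"columnnames"`
	ColumnValues []interface{} `json:"columnvalues"`
}

// WalMessage is one decoded message from the plugin.
type WalMessage struct {
	Change []Change `json:"change"`
}

// decodeChanges unmarshals one wal2json payload into change records.
func decodeChanges(payload []byte) ([]Change, error) {
	var m WalMessage
	if err := json.Unmarshal(payload, &m); err != nil {
		return nil, err
	}
	return m.Change, nil
}

func main() {
	payload := []byte(`{"change":[{"kind":"insert","schema":"public","table":"messages","columnnames":["id","subject"],"columnvalues":[1,"hello"]}]}`)
	changes, err := decodeChanges(payload)
	if err != nil {
		panic(err)
	}
	for _, c := range changes {
		fmt.Printf("%s on %s.%s: %v\n", c.Kind, c.Schema, c.Table, c.ColumnValues)
	}
}
```

A consumer feeding Elasticsearch would switch on Kind to decide whether to index, update or delete a document.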

I am not completely sure that I’ve outlined the process correctly for when you have to restart streaming. I am also unclear about replication origins, but I think that’s only applicable if you’re replicating into another Postgres database. But I’m not sure.
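One detail that tripped me up: the WAL position comes back as text like 0/16B2D50, which is two hex numbers (the high and low 32 bits of a 64-bit position) separated by a slash. A small sketch of converting between the textual form and the integer form you’d store for resuming:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLSN converts a textual WAL position like "0/16B2D50" into its
// 64-bit integer form: high 32 bits before the slash, low 32 bits after.
func parseLSN(s string) (uint64, error) {
	parts := strings.Split(s, "/")
	if len(parts) != 2 {
		return 0, fmt.Errorf("invalid LSN %q", s)
	}
	hi, err := strconv.ParseUint(parts[0], 16, 32)
	if err != nil {
		return 0, err
	}
	lo, err := strconv.ParseUint(parts[1], 16, 32)
	if err != nil {
		return 0, err
	}
	return hi<<32 | lo, nil
}

// formatLSN is the inverse, producing the textual form again.
func formatLSN(lsn uint64) string {
	return fmt.Sprintf("%X/%X", uint32(lsn>>32), uint32(lsn))
}

func main() {
	lsn, err := parseLSN("0/16B2D50")
	if err != nil {
		panic(err)
	}
	fmt.Println(lsn, formatLSN(lsn))
}
```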


Slack is wonderful and awesome and horrible and bad. A rant.

Slack is the hot new thing. It’s killing email! It’s the best way to collaborate! <Insert awesome emoji here, also highlighting how great Slack is and how cool I am for knowing emoji.>

It’s so easy to create a new Slack team. That is, unless you want people from outside your domain to join. That’s cool, here’s this 20-step process to create a sign-up form on some other service so that people can join your group. My kids’ pre-school can master a sign-up form for snack day without having to download a gist from GitHub. No pressure.

Finding Slack teams to join is awesome and gives such a sense of accomplishment. It’s like Indiana Jones discovering the lost Ark. Except he at least had part of a map showing where it was, which is a lot more than you have for discovering Slack rooms. Here’s a tip: every time you stumble across a Slack room to join, buy a lotto ticket, because clearly luck is with you.

Slack is real-time, which is great. But wait. All of a sudden you’re wading through the 50 reaction gifs that just blew up your phone notifications from the 1,000 person web design Slack room you just joined. Slack is like what Ugthor, cave dwelling Neanderthal futurist, imagined the world of tomorrow would be.

So you turn off notifications and then promptly forget to work through your 11 different logins to your 500 chat rooms[1].

But no, you get back up on that pony. John starts talking about one thing, but then Bill starts talking about something else at the same time. Before you know it, you’ve given yourself an aneurysm trying to mentally thread and follow the 15 different conversations going on at any one time in each room. Who doesn’t love the smell of hematoma in the morning; except, whoops, you’ve lost your sense of smell from the brain damage.

Slack is the best and one true way to collaborate. Also, interestingly, it’s your new employer. Who knew when you started using Slack that you were getting a new job: keeping up with Slack. The hours are long and the boss is a taskmaster who communicates only through emoji, but at least the pay is, well, umm, nevermind. Hope those lotto tickets pay out.

Look, I have three rules which I live by: Never play cards with a guy who has the same first name as a city, don’t look directly into the light, and never go near a lady with a tattoo of a dagger on her hand.[2]

But that’s beside the point. Here’s the thing. I like chat as much as the next guy. Slack takes it to the next level (shoutout to Hipchat as well). And Stewart and his team are amazing people doing objectively awesome things. They deserve ALL the success. <Insert that 100% emoji here. See, I’m cool.>



You’re wrong if you think Slack replaces email. Email’s been killed more times than Freddy Krueger (kids, ask your parents). There’s a reason it’s not going anywhere. It works. Threads are an absolute necessity when multiple conversations are happening. You’re always going to check your email. And you don’t have to reply immediately.

You’re wrong if you think Slack’s the one true way to collaborate and will kill all others. Especially for larger teams. There is no one tool that fits all. Forums are awesome. So too are my personal favorite, email groups.[3]

You’re wrong if you think … hell, I don’t have a third point. I’m not a good writer. But you knew that.

In conclusion: <Insert shrugging guy emoji here. See, I’m cool!>

Also, if you have a moment, please check out my startup. We even integrate with Slack.

(Also published on Medium).

[2] Bastardized from:
[3] My startup,

Update

It’s been almost six months since launch and four months since I last talked about it here on the blog, so I figured it was time for an update. I’ve been heads-down working on new features and bug fixes. Here’s a short list of the major features added during that time:

Slack Member Sync

Mailing lists and chat, like peanut butter and chocolate, go great together. Do you have a Slack team? You can now link it with your group. Our new Slack Member Sync feature lets you synchronize your Slack and group member lists. When someone joins your group, they will automatically get an invite to join your Slack team. And when someone joins your Slack team, they’ll automatically get added to your group. You can configure the sync to be automatic, or you can sync members by hand. Access the new member sync area from the Settings page for your group.

As an aside, another potentially great combination, bacon and chocolate, do not go great together. Trust us, we’ve tried.

Google Log-in

You can now log in using Google. For new users, this allows them to skip the confirmation email step, making it quicker and easier to join your groups.

Markdown and Syntax Highlighting Support

You can now post messages using Markdown and emoji characters. And we support syntax highlighting of code snippets.

Archive Management Tools

The heart of a group is the message archive. And nobody likes an unorganized archive. We’ve added the ability to split and merge threads. Has a thread changed topics halfway through? Split it into two threads. Or if two threads are talking about the same thing, you can merge them. You can also delete individual messages, and change the subject of threads.

Subgroups

We now support subgroups. A subgroup is a group within another group. When viewing your group on the website, you can create a subgroup by clicking the ‘Subgroup’ tab on the left side. The email address of a subgroup is of the form

Subgroups have all the functionality of normal groups, with one exception. To be a member of a subgroup, you must be a member of the parent group. A subgroup can be open to all members of the parent group, or it can be restricted. Archives can be viewable by members of the parent group, or they can be private to the members of the subgroup. Subgroups are listed on the group home page, or they can be completely hidden.

Calendar, Files and Wiki

Every group now has a dedicated full-featured Calendar, Files section, and Wiki.

In other news, we also started an Easy Group Transfer program, for people who wish to move their groups over from Yahoo or Google.

Email groups are all about community, and I’m pleased that the Beta group has developed into a valuable community, helping define new features and scope out bugs. I’m working to be as transparent as possible about development through that group, and through a dedicated Trello board which catalogs requested features and bug reports. If you’re interested, please join and help shape the future of the service!

Database Design

Continuing to talk about the design of the service, today I’ll cover our database design.

Database Design

Our service is built on top of PostgreSQL. We use GORP to handle marshaling our database objects. We split our data over several separate databases. The databases are all currently running in one PostgreSQL instance, but this will allow us to easily split the data over several physical databases as we scale up. A downside to this is that we end up having to manage more database connections now, and the code is more complicated, but we won’t have to change any code in the future when we split the databases over multiple machines (sharding is a whole other thing).

There are no joins in the system and there are no foreign key constraints. We enforce constraints in an application layer. We did this for future scalability. It did require more work in the beginning and it remains to be seen if we engaged in an act of premature optimization. Every record in every table has a 64-bit integer primary key.
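Since the database enforces nothing, the application layer has to do the checking itself before every write. As a rough illustration (the types and the in-memory store here are hypothetical stand-ins, not our actual code), this is the shape of an application-layer foreign-key check:

```go
package main

import (
	"errors"
	"fmt"
)

// Store stands in for lookups that would really hit the userdb.
type Store struct {
	Users  map[int64]bool
	Groups map[int64]bool
}

// Subscription uses 64-bit integer keys, like every record in the system.
type Subscription struct {
	ID      int64
	UserID  int64
	GroupID int64
}

var ErrConstraint = errors.New("constraint violation")

// AddSubscription enforces, in application code, what a foreign key
// constraint would normally enforce in the database.
func (s *Store) AddSubscription(sub Subscription) error {
	if !s.Users[sub.UserID] {
		return fmt.Errorf("%w: no such user %d", ErrConstraint, sub.UserID)
	}
	if !s.Groups[sub.GroupID] {
		return fmt.Errorf("%w: no such group %d", ErrConstraint, sub.GroupID)
	}
	// Insert the row into the subscription table here.
	return nil
}

func main() {
	s := &Store{Users: map[int64]bool{1: true}, Groups: map[int64]bool{7: true}}
	fmt.Println(s.AddSubscription(Subscription{ID: 100, UserID: 1, GroupID: 7}))
	fmt.Println(s.AddSubscription(Subscription{ID: 101, UserID: 2, GroupID: 7}))
}
```

The trade-off is exactly as described above: more code up front, but no cross-database constraints to untangle when the databases move to separate machines.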

We have 3 database machines. DB01 is our main database machine. DB02 is a warm standby, and DB03 is a hot standby. We use WAL-E to back up DB01’s database to S3. DB02 uses WAL-E to pull its data from S3 to stay warm. All three machines also run Elasticsearch as part of a cluster. We run statistics on DB03.

Our data is segmented into the following main databases: userdb, archivedb, activitydb, deliverydb, integrationdb.


Userdb

The userdb contains user, group and subscription records. Subscriptions provide a mapping from users to groups, and we copy down several bits of information from users and groups into the subscription records, to make some processing easier. Here are some of the copied down columns:

GroupName string // Group.Name
Email string // User.Email
UserName string // User.UserName
FullName string // User.FullName
UserStatus uint8 // User.Status
Privacy uint8 // Group.Privacy

We maintain these columns in an application layer above the database. By duplicating this information in the subscription record, we greatly reduce the number of user and group record fetches we need to do throughout the system. These fields rarely change, so there’s not a large write penalty. There is definitely a memory penalty, with the expanded subscription record. But I figured that was a good trade-off.
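The copy-down can be sketched roughly like this; the struct and field names are trimmed-down illustrations of the records described above, not the actual schema, and the real code would also write the updated rows back to the userdb:

```go
package main

import "fmt"

// User is an illustrative, trimmed-down user record.
type User struct {
	ID       int64
	Email    string
	UserName string
	FullName string
	Status   uint8
}

// Subscription duplicates several User columns to avoid extra fetches.
type Subscription struct {
	ID         int64
	UserID     int64
	GroupID    int64
	Email      string // copied from User.Email
	UserName   string // copied from User.UserName
	FullName   string // copied from User.FullName
	UserStatus uint8  // copied from User.Status
}

// syncSubscriptions re-copies the duplicated columns after a User changes.
func syncSubscriptions(u User, subs []*Subscription) {
	for _, s := range subs {
		if s.UserID != u.ID {
			continue
		}
		s.Email = u.Email
		s.UserName = u.UserName
		s.FullName = u.FullName
		s.UserStatus = u.Status
	}
}

func main() {
	subs := []*Subscription{{ID: 10, UserID: 1, GroupID: 7, Email: "old@example.com"}}
	syncSubscriptions(User{ID: 1, Email: "new@example.com", UserName: "mark"}, subs)
	fmt.Println(subs[0].Email, subs[0].UserName)
}
```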


Archivedb

The archivedb stores everything related to message archives. The main tables are the thread table and the message table. We store every message in the message table as raw compressed text, but before we insert each message, we strip out any attachments and instead store them in Amazon’s S3. This reduces the average size of emails to a much more manageable level.


Activitydb

The activitydb stores activity logging records for each group.


Deliverydb

The deliverydb stores bounce information for users.


Integrationdb

The integrationdb stores information relating to the various integrations we offer.


Elasticsearch

We use Elasticsearch for our search, and our indexes mirror the PostgreSQL tables. We have a Group index, a Thread index and a Message index. I tried a couple of Go Elasticsearch libraries and didn’t like any of them, so I wrote my own simple library to talk to our cluster.

Next Time

In future articles, I’ll talk about some aspects of the code itself. Are there any specific topics you’d like me to address? Please let me know.

Are you unhappy with Yahoo Groups or Google Groups? Or are you looking for an email groups service for your company? Please give us a try.

What Runs

I always appreciate when people talk about how they’ve built a particular piece of software or a web service, so I thought I’d talk about some of the architecture choices I made when building my recently launched email groups service. This will be a multi-part series.


Go

One of the goals I had when I first started was to use the project as an opportunity to learn the new language Go. The service is written completely in Go and is my first project in the language. As a diehard C programmer (ONElist was written in C, and Bloglines was written in C++), it took very little time to get up to speed on Go, and I now consider myself a huge fan of the language. There are many reasons why I like to code in Go. It’s compiled, so it’s fast, and you get all the code checks you miss with interpreted languages. It generates standalone binaries, which is great for distributing to production machines. It’s got a great standard library. It’s easy to write multithreaded code (threads are called goroutines). The documentation system is good. But besides all that, the philosophy behind Go just fits my mental model better than any other language I’ve worked in. It all combines to make programming in Go the most fun I’ve had coding in a very long time.

Components

The service consists of several components that interact with each other. All interactions are done using JSON over HTTP.


Web

The web server handles all web traffic, naturally. It is proxied behind nginx, because I believe that makes for a more flexible and slightly more secure system. Nginx terminates the encrypted HTTPS traffic and passes the unencrypted traffic to the web process. We use the standard Go HTML template system for our web templates, and we use several parts of the Gorilla web toolkit. We use Bootstrap for our HTML framework.


Smtpd

The smtpd daemon handles incoming SMTP traffic for the domain. It is also proxied behind nginx. The email it handles consists mainly of group messages, although there are some other messages as well, including bounce messages. It sends group and bounce messages to the messageserver for processing. Other messages are forwarded, using a set of rules, to other email addresses. We based smtpd heavily on Go-Guerrilla’s SMTPd.


Messageserver

The messageserver daemon processes group messages, bounce messages and email commands. For group messages, it verifies that the poster is subscribed and has permission to post to the group; it then archives the message and sends it out to the group subscribers, using Karl to send the messages. It also sends the messages to our Elasticsearch cluster. Bounce and email command messages are processed as well. All group messages are processed through the messageserver, whether they arrive through the smtpd or were posted through the web site.


Karl

Karl, named after Karl ‘The Mailman’ Malone, is our email sending process. It is responsible for all emails originating from the domain. It is passed an email message, a footer template, a sender, and a set of data about each receiver the message should be sent to. For each receiver, it evaluates the template, inserting subscriber-specific information, and then merges it with the email message before sending it out. It also handles DKIM signing of emails. It stores all emails using Google’s LevelDB database until they are successfully sent.

A reasonable question to ask is why I didn’t outsource the email delivery part of the service. There are several companies that provide email delivery outsourcing. In general, outsourcing is a way to save development time. But when I thought about it, I did not think I’d be able to save much time by outsourcing; I’d still have to connect our data with whatever templating system the email delivery service used. And Karl did not take very long to write. But more importantly, email delivery is a core competency of our service, and I believe we have to own that.


Errord

Errord is a simple logging process, used to log error messages and stack traces from any core dumps in any of the other processes. I can look at the errord log and instantly see if anything in the system has crashed and where it crashed.

Rsscrawler, Instagramcrawler

Rsscrawler and instagramcrawler are cronjobs that deal with the Feed and Instagram integrations, respectively. Rsscrawler looks for updates in feeds that are integrated with our groups, and instagramcrawler does the same for Instagram accounts. They’re currently run twice an hour. If they find an update, they generate a group message and pass it along to the messageserver.


Bouncer

Bouncer is a cronjob that is run once a day to manage bouncing users.


Expirethreads

Expirethreads is a cronjob that’s run twice an hour to expire threads that are tagged with hashtags that have an expiration.


Senddigests

Senddigests is a cronjob that’s run once a night to generate digest emails for users with digest subscriptions.

Next Time

In future articles, I’ll talk about the machine cluster running, the database design behind the service, and some aspects of the code itself. Are there any specific topics you’d like me to address? Please let me know.

Are you unhappy with Yahoo Groups or Google Groups? Or are you looking for an email groups service for your company? Please give us a try.


I’m not one to live in the past (well, except maybe for A-Team re-runs), but for many years now, I’ve felt like I’ve had unfinished business. I started the service ONElist in 1998. ONElist made it easy for people to create, manage, run and find email groups. As it grew over the next two and a half years, we expanded, changed our name to eGroups, and, in the summer of 2000, were acquired by Yahoo. The service was renamed Yahoo Groups, and I left the company to pursue other startups.

But really this story starts even further back, in the winter of 1989, when in college I was introduced to mailing lists. I was instantly hooked. It was obvious that a mailing list was a great way to communicate with a group of people about a common interest. I started subscribing to lists dedicated to my favorite bands (’80s Hair Metal, anyone?). I joined a list for a local running club. And, at every company I’ve worked at since graduating, there have been invaluable internal company mailing lists.

But that doesn’t mean that mailing lists can’t improve. And this is where we get back to the unfinished business. Because email groups (the modern version of mailing lists) have stagnated over the past decade. Yahoo Groups and Google Groups both exude the dank air of benign neglect. Google Groups hasn’t been updated in years, and some of Yahoo’s recent changes have actually made Yahoo Groups worse! And yet, millions of people put up with this uncertainty and neglect, because email groups are still one of the best ways to communicate with groups of people. And I have a plan to make them even better.

So today I’m launching the service in beta, to bring email groups into the 21st century. At launch, we have many features that those other services don’t have, including:

  • Integration with other services, including: GitHub, Google Hangouts, Dropbox, Instagram, Facebook Pages, and the ability to import Feeds into your groups.
  • Businesses and organizations can have their own private groups on their own subdomain.
  • Better archive organization, using hashtags.
  • Many more email delivery options.
  • The ability to mute threads or hashtags.
  • Fully searchable archives, including searching within attachments.

We’re just starting out; following the tradition of new startups everywhere, we’re in Beta. We’re working hard to squash the inevitable bugs and work to make the system even better (based on your feedback!).

I’m passionate about email groups. They are one of the very best things about the Internet, and I’ve set out to make them even better. As John ‘Hannibal’ Smith, leader of the A-Team, liked to say, “I love it when a plan comes together.”