The Aggregated Me

The concept of aggregation is increasingly important on the Internet, as the sheer number of information resources increases. The average user wants to track more and more things on the Internet; an aggregator quickly becomes necessary as one’s bookmark list grows to infinity. The first aggregators, what I call ‘general purpose’ aggregators, like Bloglines, Google Reader, and Newsgator, are focused on tracking blogs and news feeds, making it easy to subscribe to whatever blogs the user comes across.
The new service FriendFeed has been getting a lot of attention the past couple of weeks. It’s the latest in the line of what I call ‘individual aggregators,’ services that aggregate all the distributed parts of a person’s on-line presence in one place. A person may have a blog, a Twitter account, and a Flickr photostream; these services combine all of those items in one place. This trend started with Facebook’s newsfeed, continued with Plaxo’s Pulse, and now several other services, including Tumblr, can do most of what the individual aggregators do. These services are different from the general purpose aggregators in that they’re focused on tracking individuals, not feeds. But the general purpose aggregators can do what the individual aggregators can do, because the underlying technology, RSS, is the same. It’s really just a matter of user interfaces and a key bit of information.

The Problem

The individual aggregators collect a list of all of the distributed parts of a person’s on-line presence. They ask each user to list their Twitter account, their Flickr account, their YouTube account, their blog. This list doesn’t exist anywhere in a way that’s machine readable. Each of the individual aggregators has to deduce this information and then maintain it. Or more specifically, each user has to maintain this information on each of the individual aggregators. Wouldn’t it be better if this list existed somewhere under direct control of the user in a way where it wasn’t siloed in a centralized, proprietary service? That way, every aggregator could take advantage of it and users would only have to update the list in one place.

A Modest Proposal

This problem is actually a general purpose version of a problem already solved by something called RSS Autodiscovery. In order to make it easier for general purpose aggregators to find RSS feeds to subscribe to, many publishers included a special line of text in the headers of their HTML. I have one on my blog:

<link rel="alternate" type="application/rss+xml" title="RSS" href="http://www.wingedpig.com/index.rdf" />

Aggregators know to look for this line, which tells them where the RSS feed for that blog exists. Can’t we just extend this to include a list of all the other aspects of a person’s identity? Have one line for each service the person uses, and change the title accordingly. So, I could include:

<link rel="alternate" type="application/rss+xml" title="Flickr Feed" href="http://api.flickr.com/services/feeds/photos_public.gne?id=35034347955@N01&lang=en-us&format=rss_200" />

for my Flickr feed. This doesn’t have to only apply to services that publish RSS feeds. I could even do something like:

<link rel="alternate" type="application/twitter" title="Twitter" href="wingedpig" />

to indicate my Twitter account.
By doing this, the list of all the parts of a person’s on-line presence is kept under the control of the person, associated with their blog. It’s distributed, open, and easy to implement.
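On the aggregator side, discovering these links is straightforward. Here is a minimal sketch of a consumer that pulls `rel="alternate"` link elements out of a page’s head, using only the Python standard library (the sample HTML reuses the feed URLs above; the class and variable names are my own, not part of any spec):

```python
# Sketch of aggregator-side link discovery. The LinkDiscoverer class and
# the sample page are hypothetical; only the <link rel="alternate"> pattern
# comes from the autodiscovery convention described in the post.
from html.parser import HTMLParser

class LinkDiscoverer(HTMLParser):
    """Collect <link rel="alternate"> elements from a page."""
    def __init__(self):
        super().__init__()
        self.feeds = []  # list of (title, type, href) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        if a.get("rel") == "alternate":
            self.feeds.append((a.get("title"), a.get("type"), a.get("href")))

sample = """
<html><head>
<link rel="alternate" type="application/rss+xml" title="RSS"
      href="http://www.wingedpig.com/index.rdf" />
<link rel="alternate" type="application/rss+xml" title="Flickr Feed"
      href="http://api.flickr.com/services/feeds/photos_public.gne?id=35034347955@N01" />
</head><body></body></html>
"""

parser = LinkDiscoverer()
parser.feed(sample)
for title, mime, href in parser.feeds:
    print(title, mime, href)
```

An aggregator would fetch the user’s blog URL, run something like this over the response, and offer each discovered feed as a subscription option.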

How To Make It Work

For this to work, a couple of things need to happen. Blog publishing software has to be modified to ask for this information and then insert it into the headers of a person’s blog. Then, aggregators need to be modified to look for this information and to periodically recheck it. The general purpose aggregators need to augment their interfaces to allow people to subscribe to these new feeds. But none of these things are terribly difficult to do.
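The periodic-recheck step could be as simple as diffing the links found on the latest poll against what the aggregator stored last time. A hypothetical sketch (the function and variable names are mine; the URLs are the ones from the post):

```python
# Hypothetical recheck step: compare the service links advertised on a
# user's blog now against the list stored at the previous poll, and
# report what was added or removed.
def diff_services(stored, discovered):
    """Return (added, removed) service hrefs between two polls."""
    stored_set, found_set = set(stored), set(discovered)
    return sorted(found_set - stored_set), sorted(stored_set - found_set)

stored = ["http://www.wingedpig.com/index.rdf"]
discovered = [
    "http://www.wingedpig.com/index.rdf",
    "http://api.flickr.com/services/feeds/photos_public.gne?id=35034347955@N01",
]
added, removed = diff_services(stored, discovered)
print(added, removed)
```

New entries would be offered to subscribers as fresh feeds; removed entries would be unsubscribed or flagged.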

Comments

  1. What you’re looking for already exists. It is called Microformats / XFN. With this, I can define on my blog which other, external links are “mine” by using rel="me".
    NoseRub (http://noserub.com) does exactly this, and you can install it on your own server if you want. By doing so, you have a unique URL (see my URL here; this is a free NoseRub service running on identoo.com) which holds all the information about my web accounts.
    Other instances of NoseRub out there are able to parse these pages, but also profile pages from e.g. FriendFeed, and aggregate the content of all RSS feeds on those pages.
    Technically, it’s all there – we just shouldn’t wait for proprietary players like FriendFeed and SocialThing to guide the way for us. They have other interests besides the good of all users…

  2. This is definitely an important problem (we call it “online identity consolidation” at Plaxo). Linking to your other sites with rel="me" solves a lot of the pain, esp. since you can use Google’s Social Graph API to pull down the transitive closure of all rel=me links given a single input URL. For instance, when you come to Plaxo Pulse and tell us your home page (or even just hook up a feed like twitter), we use the Social Graph API to find all your other me-links and auto-suggest to you to add the ones that match sites we support. You don’t really need “application/twitter” since you have a URL like twitter.com/jsmarr that is “self-describing”. And if you set up a public profile on Plaxo (like http://joseph.myplaxo.com), you can choose to share some or all of your sites in your “On the web” section, which we tag with rel=me so other sites can consume the info. This way mainstream users can participate in the ecosystem without having to code up their own list of links. But there’s still a massive deployment issue to get users to actively hook up and share what sites they’re using, so the pain will continue to be felt for a while. Maybe a browser plug-in like sxipper (that can watch what sites you visit) would be in a unique position to help accelerate/ease the gathering and sharing of this data?

  3. I agree with the general sentiment, but I worry that representing your Twitter user ID in a way different than your Flickr photo feed (as an example) means that the browsers or extensions that support these meta tags have to be constantly updated to understand every kind of credential or identification. Keeping them as feeds, with additional metadata that can be utilized if the extension understands how to use it, seems better.
    More importantly, I would offload the management task of keeping up with your ‘me-feeds’ to another service, rather than piggyback it on top of your blogging tool. You could do this by having a single META tag that points to an OPML file that consists of all your personal feeds. The OPML file could in turn either be a hand-edited list, or a file hosted by another provider. As a (truly hypothetical) example, FriendFeed could supply an OPML file of the feeds you liked to your account, and you could include that in your blog’s headers. Then when you open a StumbleUpon account you just need to add it to FriendFeed and it would be available as metadata to the people reading your blog, because it would be automatically added to ‘your’ OPML file that you linked to from your blog.
    This would have the added benefit of keeping the page size small. If you have 20 accounts you’d still just need one OPML file that would only be accessed once per session, rather than a set of META tags that would be transmitted on every pageview.
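For illustration, a hand-edited OPML file of the kind this comment describes might look something like the following (the feed URLs are taken from the post; the file structure is a sketch, not anyone’s actual export format):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<opml version="2.0">
  <head>
    <title>My feeds</title>
  </head>
  <body>
    <!-- One outline element per service the person uses -->
    <outline text="Blog" type="rss"
             xmlUrl="http://www.wingedpig.com/index.rdf" />
    <outline text="Flickr" type="rss"
             xmlUrl="http://api.flickr.com/services/feeds/photos_public.gne?id=35034347955@N01" />
  </body>
</opml>
```

The blog’s header would then need only a single `<link>` element pointing at this file, and adding a new service means appending one `<outline>` entry rather than touching the blog template.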

  4. @kevin: having a hosted OPML file for your “me feeds” in plaxo/friendfeed/etc would be useful for batch-importing into a feed reader, but does it provide any additional value beyond just rel=me linking to the other sites you use, each of which can in turn link to their RSS feed via the current auto-discovery mechanism? The advantage to rel=me links is that they’re visible both to users and computers, whereas OPML files are separate/invisible metadata that can easily become out-of-sight-out-of-mind. And since the Social Graph API means you don’t have to crawl this stuff in real-time, there shouldn’t be a speed difference in terms of storing all the data in a single OPML file vs. linking between the sites you use (the latter architecture also gives each site an additional level of indirection if they need to change their feed urls, etc.). And the mechanisms around publishing and consuming rel=me links are already established. Thoughts?