From a Happier Time

When I brought Scott Shambarger in as the first person to help run ONElist back in 1998, we made a bet. If we ever sold the company for a decent amount of money, we’d shave our heads. A little more than two years after that, the company threw a party on a boat on San Francisco Bay to celebrate the closing of the Yahoo acquisition of (the by then renamed) eGroups. Michael Klein was instrumental in making that happen.
I tried to find a funny picture of Mike, and came upon this one, taken by Joe Gross, of Mike assisting in the shaving of my head at that party.


Reports That Michael Klein Was in a Plane Crash in Panama

The title says it all. Reports are that Mike Klein, who I hired as CEO of ONElist and who later oversaw the sale of (the renamed) eGroups to Yahoo, is missing after a plane crash on Sunday in Panama. Mike owns a resort off the coast of Panama, and the reports say he was flying with his daughter, a friend, and the pilot. Their destination was the Chiriqui volcano, but they never made it. Searchers have been hampered by bad weather.
Mike currently runs Pacificor, a hedge fund in Santa Barbara. They just issued a press release. Another report with more details is here.
This is terrible news. I will post when more is known.
Update: Report from CNN with a little more information.
Update: Unfortunately, the crash site was found Christmas day, and Mike, his daughter, and the pilot did not survive the crash. Mike was one of the smartest people I knew and this is a great loss.

Seasons Greetings

One of my favorite memories of this past year was of an afternoon spent selecting LOLcat pictures for use in a presentation I would later give in Edinburgh on blogging (trust me, it was more fun than it sounds). Anyways, here is my contribution to the LOLcat canon. Happy Holidays.

23andMe Call Rates and Exporting Data

Not to make this an all-23andMe, all-the-time kind of blog, but I wanted to publish some answers I recently received from the company in response to a couple of questions I raised in my last post on the subject. I asked if it was possible to export the genome data. Their response was:

At this time 23andMe does not have a way to give customers their genetic data on a CD, flash drive or other downloadable or stored format. But we are working to make that possible, and hope to be able to distribute raw data to our customers in the near future.

We’re not talking about a lot of data, especially compressed (at least in this day and age). A conservative back-of-the-envelope calculation, using 10 characters per SNP (for the rsid and the actual allele) and 600,000 total SNPs, yields about 6 megabytes of raw data. And there shouldn’t be any technical reason why they couldn’t export the raw data. I hope they decide to allow this soon.
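For anyone who wants to check the arithmetic, here is a minimal sketch of that estimate. The 10 characters per SNP and the 600,000 SNP count come from the paragraph above; treating each character as one byte is my own assumption, and the variable names are just for illustration.

```python
# Back-of-the-envelope size estimate for the raw SNP data.
# Assumption: one byte per character (plain ASCII text, no compression).
BYTES_PER_SNP = 10       # rsid plus the allele, per the estimate above
TOTAL_SNPS = 600_000     # approximate number of SNPs on the chip

raw_bytes = BYTES_PER_SNP * TOTAL_SNPS
raw_megabytes = raw_bytes / 1_000_000      # decimal megabytes
raw_mebibytes = raw_bytes / (1024 * 1024)  # binary megabytes (MiB)

print(f"{raw_bytes:,} bytes")
print(f"{raw_megabytes:.1f} MB, or {raw_mebibytes:.2f} MiB")
```

Either way you count it, the whole data set is smaller than a handful of digital photos, and that is before any compression.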
The second question I asked was about call rates. The website talks about a 99% call rate (the rate at which a given SNP can be reliably decoded). But I was seeing a bunch of No Calls when looking at my data. Their response was:

If you look through the Genome Explorer, however, you will find many instances where the call rate falls well below 99%. In the APOE gene, for example, many customers will find that their data includes five or more ‘no calls’ out of a total 19 SNPs.
The high number of ‘no calls’ in such situations is a result of the way 23andMe has customized the Illumina Hap550+ chip that is used to analyze our customers’ DNA. Our laboratory easily achieves a 99% call rate at the 550,000 (actually closer to 561,000) SNPs that are on the standard chip.
But 23andMe has added an additional 30,000 custom SNPs in locations that we expect to be of particular interest to our customers. These additional SNPs that we have selected based on research interest are sometimes not genotypable at the same call rate levels. We did however want to include them so we can review further and possibly improve the information down the road.

The APOE gene mentioned in their response is thought to contain many SNPs relevant to Alzheimer’s Disease. Hopefully in the near future, the technology will be refined enough to yield more accurate data.
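To put the numbers from their response side by side, here is a rough sketch. The 561,000 standard SNPs, the 99% call rate, and the 19 APOE SNPs are taken from the response; using exactly five no calls for APOE is my assumption, since they said "five or more", so the APOE figure below is a best case.

```python
# Compare the advertised call rate with the APOE example from the response.
STANDARD_SNPS = 561_000  # SNPs on the standard chip, per the response
CALL_RATE = 0.99         # advertised call rate for the standard chip

# No calls you would expect across the standard SNPs at exactly 99%.
expected_no_calls = round(STANDARD_SNPS * (1 - CALL_RATE))

APOE_SNPS = 19
APOE_NO_CALLS = 5        # assumption: the lower bound of "five or more"
apoe_call_rate = (APOE_SNPS - APOE_NO_CALLS) / APOE_SNPS

print(f"Expected no calls on the standard chip: {expected_no_calls:,}")
print(f"Best-case APOE call rate: {apoe_call_rate:.1%}")
```

So even at the advertised rate there is room for several thousand no calls across the standard chip, while the APOE region comes in far below 99%.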
I appreciate the speed and candor in 23andMe’s response to my questions.

Database Developments (new post on Startupping)

I just wrote a new post over on Startupping about two items related to databases and Internet services. I talk about SSDs and the launch of Amazon’s new SimpleDB, which I think is a very big deal.

23andMe and SNPedia

In my last post about 23andMe, I mentioned that there didn’t seem to be a central place where one could look up diseases and the SNPs associated with them. In the comments, I was pointed to SNPedia, a wiki that aims to do exactly that. It’s fairly basic at this point, but it does have a decent amount of information on 69 different diseases and traits.
I’m interested in Alzheimer’s Disease (AD), and SNPedia has a lot of information on the SNPs that have been associated with that disease. For example, Rs4420638 (SNPs are referred to by these RS numbers) has been associated with an increased risk of late-onset AD. Looking at the SNPedia page for this SNP, you can see that the increased risk for AD applies specifically when this SNP is GG. This is exactly the information I should be able to use with the data that 23andMe has on my genotype. So I took the Rs4420638 number and plugged it into the Genome Explorer of 23andMe. But here is one of the two problems I’ve run into with 23andMe. There’s no data on this SNP because this is not one of the 600,000 SNPs that 23andMe has sequenced. The technology doesn’t yet exist to sequence an entire genome cheaply, so 23andMe has chosen to do a subset, as represented by 600,000 SNPs. To be fair to 23andMe, they have always been up front about this.
In addition to an incomplete list of SNPs, some of the SNPs that 23andMe does try to sequence don’t result in valid data. This manifests as a ‘No Call’ indicator when looking up an affected SNP. According to the web site, the chip that 23andMe uses results in a 99% call rate, meaning that 99% of the SNPs sequenced should yield valid results. I haven’t been able to verify this with my data, as there doesn’t appear to be a way to export my entire data set yet. But I did run into the ‘No Call’ indicator when looking up another SNP related to AD, Rs7412. Interestingly, I talked with another person who also went through the 23andMe process, and their Rs7412 SNP was also marked as ‘No Call’.
These two issues do limit the value of the results. I look forward to the day when full genome sequencing is cheap and reliable. I’m going to continue to play with my results and I’ll post more if I find out more.

23andMe Results

Unexpectedly, I received an email this evening saying that the results from my 23andMe DNA sequencing were ready. As I just blogged, I sent in my saliva sample just 2 weeks ago, and I didn’t expect to hear anything for another couple weeks.
There are three main parts to the 23andMe web site: the Gene Journal, the Ancestry section, and the Genome Labs section. I first went to the Ancestry section, and selected the Maternal Ancestry sub section. Up came a ‘heat map’ of the world, showing areas where my ancestors were from. Turns out I’m mainly from the Near East, Europe, Central Asia and Northern Africa. I’m part of the Ashkenazi, Druze and Kurds population. No surprises here.
More interesting was the Gene Journal section. This part of the web site details 14 different traits/predispositions. I learned that I am more likely to be able to taste certain bitter flavors, which explains my hatred of brussels sprouts and other things (see Mom, it’s not my fault I was a picky eater!). I have a slightly lower chance of getting Type 1 Diabetes, but a higher chance of getting Type 2 Diabetes. And I have a slightly higher chance of suffering from something called ‘Restless Legs Syndrome’.
But what I hoped would be the most interesting part of the 23andMe experience was the Genome Explorer section of the Genome Labs part of the site. This is the part where you can view the actual SNPs of your genome (or at least the ones that were mapped as part of the process). In the press, this has been referred to as ‘Googling your DNA’. You can look up SNPs by gene, or you can go to a specific SNP. This is great and all, but pretty meaningless unless you can correlate a gene/SNP to a specific disease or trait. For example, I’m interested in Alzheimer’s disease. Recent research suggests that there may be a genetic link to at least some forms of the disease. I wanted to see if I was affected. Googling around, I found that the APOE gene on chromosome 19 is of particular interest, specifically APOE e2, e3 and e4. In the Genome Explorer, I can type in APOE, and it takes me to a listing of 19 SNPs on the APOE gene. Ok, great. But I have no idea which one(s) of those SNPs are the ones we’re talking about and what the mutations are. Without this last bit, the Genome Explorer is basically meaningless.
I’ve sent an email to customer support asking about this. It’s entirely likely that I’m just missing some key piece of information. I’ll post again when I get a response.

23andMe Unboxing

When the website 23andMe launched their genome sequencing service before Thanksgiving, I immediately signed up. Their service, as I understand it, looks at approximately 600,000 SNPs. In English, it looks for mutations over 600,000 points on a person’s DNA. It’s not sequencing an entire genome, but it provides a good ‘sampling’. This information can be used to determine such things as ancestry and whether a person is predisposed to some genetic-based diseases. The service costs $1000 and only requires a sample of saliva.

I ordered the kit on a Sunday, and it arrived on Wednesday, November 21st.

Opening it up, you see this: IMG_0113
The left side in more detail: IMG_0114
The right side in more detail: IMG_0115

You spit in the tube. The process took me about 5 minutes (it’s a lot of saliva!). Then you screw on the big cap, which releases a fluid into the tube. Then you unscrew the big cap and screw on the small cap, which seals the tube. Shake the tube up a bit and then put the tube into the enclosed FedEx envelope and send it off. 23andMe says the sequencing process takes 4 to 6 weeks, which means for me that I should have my results back somewhere between December 19th and the end of the year.

It will be interesting to see how useful this is. I’m fascinated with genetics, and when whole genome sequencing becomes affordable, I’ll definitely do that. The technology is moving quickly.

Update: Looks like Mike over at Techcrunch is doing the same thing.

Me on Twitter

I’ve been playing around with Twitter for a while. To follow along, see my Twitter page.

The Nerd Handbook

“A nerd needs a project because a nerd builds stuff. All the time. Those lulls in the conversation over dinner? That’s the nerd working on his project in his head.” A great post explaining us nerds.