23andMe Export

I was pleasantly surprised today when I logged into my 23andMe account to check something and found a new option to export all my data. The file is about 5 megs compressed and is in a tab-delimited format. Here are the first 30 or so lines of my data file:

# This data file generated by 23andMe at: Wed Jan 23 09:48:03 2008
# Below is a text version of your data. Fields are TAB-separated
# Each line corresponds to a single SNP.  For each SNP, we provide its identifier
# (an rsid or an internal id), its location on the reference human genome, and the
# genotype call oriented with respect to the plus strand on the human reference
# sequence.  We are using reference human assembly build 36.  Note that it is possible
# that data downloaded at different times may be different due to ongoing improvements
# in our ability to call genotypes.
# More information on reference human assembly build 36:
# http://www.ncbi.nlm.nih.gov/projects/mapview/map_search.cgi?taxid=9606&build=36
# rsid	chromosome	position	genotype
rs3094315	1	742429	AG
rs12562034	1	758311	GG
rs3934834	1	995669	CC
rs9442372	1	1008567	AG
rs3737728	1	1011278	AG
rs11260588	1	1011521	GG
rs6687776	1	1020428	CC
rs9651273	1	1021403	AA
rs4970405	1	1038818	AA
rs12726255	1	1039813	AA
rs11807848	1	1051029	CC
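
For anyone who wants to poke at their own export, here is a minimal sketch of reading the format shown above. It assumes only what the file header states: comment lines start with `#` and data lines carry four tab-separated fields. The `parse_export` function and the `SAMPLE` string are names made up for this illustration, not anything 23andMe provides.

```python
def parse_export(text):
    """Yield (rsid, chromosome, position, genotype) from export text.

    Assumes the layout described in the file header: '#' comment lines,
    then four TAB-separated fields per SNP.
    """
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        rsid, chromosome, position, genotype = line.split("\t")
        # Chromosome stays a string, since it may not always be numeric.
        yield rsid, chromosome, int(position), genotype


# A few rows copied from the excerpt above, used as sample input.
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs3094315\t1\t742429\tAG\n"
    "rs12562034\t1\t758311\tGG\n"
    "rs3934834\t1\t995669\tCC\n"
)

records = list(parse_export(SAMPLE))
by_rsid = {record[0]: record for record in records}
print(by_rsid["rs12562034"])
```

Indexing the records by rsid makes it easy to look up a single SNP from a site like SNPedia without going through the Genome Explorer.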

I also saw that 23andMe has launched a blog called the spittoon™.

23andMe Updates

From the comments on my 23andMe posts:
Ann Turner asks:

Mark, if you or any of your readers are homozygous for the mutation conferring lactase persistence (both alleles are A), I’d like to compare notes on how far the homozygosity extends along the chromosome.

If this applies to you, leave a comment or send me an email and I’ll make sure Ann gets it.
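
One rough way to measure "how far the homozygosity extends" is to walk outward from the SNP of interest until the first heterozygous call. The sketch below assumes a list of `(position, genotype)` pairs for a single chromosome, sorted by position. The function name `homozygous_run` and the example rows are hypothetical, and treating anything other than two identical A/C/G/T letters as the end of the run is my own simplification.

```python
def is_homozygous(genotype):
    """True when both alleles are the same base (e.g. 'AA')."""
    return (
        len(genotype) == 2
        and genotype[0] == genotype[1]
        and genotype[0] in "ACGT"
    )


def homozygous_run(records, focal_index):
    """Return (start_position, end_position) of the unbroken homozygous
    stretch containing records[focal_index], or None if that SNP is not
    homozygous. `records` must be sorted by position on one chromosome.
    """
    if not is_homozygous(records[focal_index][1]):
        return None
    low = high = focal_index
    while low > 0 and is_homozygous(records[low - 1][1]):
        low -= 1
    while high < len(records) - 1 and is_homozygous(records[high + 1][1]):
        high += 1
    return records[low][0], records[high][0]


# Made-up positions and genotypes, purely for illustration.
example = [(100, "AG"), (200, "CC"), (300, "AA"), (400, "GG"), (500, "CT")]
start, end = homozygous_run(example, 2)
print(start, end, end - start)
```

Two people could then compare the start and end positions of their runs around the same SNP.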
Andrew Scheidecker posts:

If you can’t wait for 23andme to add support for raw data export, I’ve released the source code for a program that can extract the raw data from their website:


It can store the data in its own encrypted file format or export it to CSV.

I haven’t had the chance to try this program yet, but look forward to doing so soon.
Finally, Gary Wolf has an excellent post up on his experience with 23andMe so far. He ends his post:

But for now, I’m looking for ways to make my 23andme results more relevant. I welcome suggestions.

I’m in the same boat.

23andMe Call Rates and Exporting Data

Not to make this an all-23andMe-all-the-time kind of blog, but I wanted to publish some answers I recently received from the company in response to a couple of questions I raised in my last post on the subject. I asked whether it was possible to export the genome data. Their response was:

At this time 23andMe does not have a way to give customers their genetic data on a CD, flash drive or other downloadable or stored format. But we are working to make that possible, and hope to be able to distribute raw data to our customers in the near future.

We’re not talking about a lot of data, especially compressed (at least in this day and age). A conservative back-of-the-envelope calculation, using 10 characters per SNP (for the rsid and the actual allele) and 600,000 total SNPs, yields about 6 million characters, or a bit under 6 megabytes of raw data. And there shouldn’t be any technical reason why they couldn’t export the raw data. I hope they decide to allow this soon.
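
The estimate above, spelled out as a quick sketch. The two inputs are the figures from the paragraph. Counting one byte per character and reporting binary megabytes are my assumptions.

```python
SNP_COUNT = 600_000   # total SNPs, from the estimate above
CHARS_PER_SNP = 10    # rsid plus allele, from the estimate above

# Assumption: one byte per character (plain ASCII text).
raw_bytes = SNP_COUNT * CHARS_PER_SNP
raw_megabytes = raw_bytes / (1024 * 1024)

print(f"{raw_bytes:,} bytes = {raw_megabytes:.2f} MiB")
# prints: 6,000,000 bytes = 5.72 MiB
```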
The second question I asked was about call rates. The website talks about a 99% call rate (the rate at which a given SNP can be reliably decoded). But I was seeing a bunch of No Calls when looking at my data. Their response was:

If you look through the Genome Explorer, however, you will find many instances where the call rate falls well below 99%. In the APOE gene, for example, many customers will find that their data includes five or more ‘no calls’ out of a total 19 SNPs.
The high number of ‘no calls’ in such situations is a result of the way 23andMe has customized the Illumina Hap550+ chip that is used to analyze our customers’ DNA. Our laboratory easily achieves a 99% call rate at the 550,000 (actually closer to 561,000) SNPs that are on the standard chip.
But 23andMe has added an additional 30,000 custom SNPs in locations that we expect to be of particular interest to our customers. These additional SNPs that we have selected based on research interest are sometimes not genotypable at the same call rate levels. We did however want to include them so we can review further and possibly improve the information down the road.

The APOE gene mentioned in their response is thought to contain many SNPs relevant to Alzheimer’s disease. Hopefully in the near future, the technology will be refined enough to yield more accurate data.
I appreciate the speed and candor in 23andMe’s response to my questions.

23andMe and SNPedia

In my last post about 23andMe, I mentioned that there didn’t seem to be a central place where one could look up diseases and the SNPs associated with them. In the comments, I was pointed to SNPedia.com, a wiki that aims to do exactly that. It’s fairly basic at this point, but it does have a decent amount of information on 69 different diseases and traits.
I’m interested in Alzheimer’s disease (AD), and SNPedia has a lot of information on the SNPs that have been associated with it. For example, Rs4420638 (SNPs are referred to by these RS numbers) has been associated with an increased risk of late-onset AD. The SNPedia page for this SNP shows that the increased risk applies specifically when the genotype is GG. This is exactly the information I should be able to use with the data that 23andMe has on my genotype. So I took the Rs4420638 number and plugged it into the Genome Explorer of 23andMe. But here is the first of the two problems I’ve run into with 23andMe: there’s no data on this SNP, because it is not one of the 600,000 SNPs that 23andMe has sequenced. The technology doesn’t yet exist to sequence an entire genome cheaply, so 23andMe has chosen to do a subset, as represented by 600,000 SNPs. To be fair to 23andMe, they have always been up front about this.
In addition to an incomplete list of SNPs, some of the SNPs that 23andMe does try to sequence don’t result in valid data. This manifests as a ‘No Call’ indicator when looking up an affected SNP. According to the web site, the chip that 23andMe uses results in a 99% call rate, meaning that 99% of the SNPs sequenced should yield valid results. I haven’t been able to verify this with my data, as there doesn’t appear to be a way to export my entire data set yet. But I did run into the ‘No Call’ indicator when looking up another SNP related to AD, Rs7412. Interestingly, I talked with another person who also went through the 23andMe process, and their Rs7412 SNP was also marked as ‘No Call’.
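
Once a full export is available, checking the call rate comes down to counting. The sketch below assumes a tab-separated text file with rsid, chromosome, position and genotype columns and `#` comment lines. It also assumes a no-call is written as `--`, which is a guess at the marker rather than something confirmed here. The names `load_genotypes` and `call_rate` are made up for the example, and the Rs7412 row uses a placeholder position.

```python
NO_CALL = "--"  # assumption: the marker used for a 'No Call' genotype


def load_genotypes(text):
    """Map rsid -> genotype, skipping '#' comments and blank lines."""
    genotypes = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        rsid, _chromosome, _position, genotype = line.split("\t")
        genotypes[rsid] = genotype
    return genotypes


def call_rate(genotypes):
    """Fraction of SNPs that have an actual genotype call."""
    if not genotypes:
        raise ValueError("no SNPs to count")
    called = sum(1 for g in genotypes.values() if g != NO_CALL)
    return called / len(genotypes)


# Illustrative input: the rs7412 position is a placeholder, not real data.
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs3094315\t1\t742429\tAG\n"
    "rs12562034\t1\t758311\tGG\n"
    "rs3934834\t1\t995669\tCC\n"
    "rs7412\t19\t0\t--\n"
)

genotypes = load_genotypes(SAMPLE)
print(genotypes["rs7412"] == NO_CALL)
print(f"call rate: {call_rate(genotypes):.1%}")
```

Run against a complete data set, the same count would show whether the overall rate really is around 99%, and listing the rsids whose genotype equals `NO_CALL` would show where the gaps cluster.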
These two issues do limit the value of the results. I look forward to the day when full genome sequencing is cheap and reliable. I’m going to continue to play with my results and I’ll post more if I find out more.

23andMe Results

Unexpectedly, I received an email this evening saying that the results from my 23andMe DNA sequencing were ready. As I just blogged, I sent in my saliva sample only 2 weeks ago, and I didn’t expect to hear anything for another couple of weeks.
There are three main parts to the 23andMe web site: the Gene Journal, the Ancestry section, and the Genome Labs section. I first went to the Ancestry section and selected the Maternal Ancestry subsection. Up came a ‘heat map’ of the world, showing areas where my ancestors were from. Turns out I’m mainly from the Near East, Europe, Central Asia and Northern Africa. I’m part of the Ashkenazi, Druze and Kurdish populations. No surprises here.
More interesting was the Gene Journal section. This part of the web site details 14 different traits/predispositions. I learned that I am more likely to be able to taste certain bitter flavors, which explains my hatred of brussels sprouts and other things (see Mom, it’s not my fault I was a picky eater!). I have a slightly lower chance of getting Type 1 Diabetes, but a higher chance of getting Type 2 Diabetes. And I have a slightly higher chance of suffering from something called ‘Restless Legs Syndrome’.
But what I hoped would be the most interesting part of the 23andMe experience was the Genome Explorer section of the Genome Labs part of the site. This is the part where you can view the actual SNPs of your genome (or at least the ones that were mapped as part of the process). In the press, this has been referred to as ‘Googling your DNA’. You can look up SNPs by gene, or you can go to a specific SNP. This is great and all, but pretty meaningless unless you can correlate a gene/SNP to a specific disease or trait. For example, I’m interested in Alzheimer’s disease. Recent research suggests that there may be a genetic link to at least some forms of the disease. I wanted to see if I was affected. Googling around, I found that the APOE gene on chromosome 19 is of particular interest, specifically APOE e2, e3 and e4. In the Genome Explorer, I can type in APOE, and it takes me to a listing of 19 SNPs on the APOE gene. Ok, great. But I have no idea which one(s) of those SNPs are the ones we’re talking about and what the mutations are. Without this last bit, the Genome Explorer is basically meaningless.
I’ve sent an email to customer support asking about this. It’s entirely likely that I’m just missing some key piece of information. I’ll post again when I get a response.

23andMe Unboxing

When the website 23andMe launched their genome sequencing service before Thanksgiving, I immediately signed up. Their service, as I understand it, looks at approximately 600,000 SNPs. In English, it looks for mutations over 600,000 points on a person’s DNA. It’s not sequencing an entire genome, but it provides a good ‘sampling’. This information can be used to determine such things as ancestry and whether a person is predisposed to some genetic-based diseases. The service costs $1000 and only requires a sample of saliva.

I ordered the kit on a Sunday, and it arrived on Wednesday, November 21st.

Opening it up, you see this: IMG_0113
The left side in more detail: IMG_0114
The right side in more detail: IMG_0115

You spit in the tube. The process took me about 5 minutes (it’s a lot of saliva!). Then you screw on the big cap, which releases a fluid into the tube. Then you unscrew the big cap and screw on the small cap, which seals the tube. Shake the tube up a bit and then put the tube into the enclosed FedEx envelope and send it off. 23andMe says the sequencing process takes 4 to 6 weeks, which means for me that I should have my results back somewhere between December 19th and the end of the year.

It will be interesting to see how useful this is. I’m fascinated with genetics, and when whole genome sequencing becomes affordable, I’ll definitely do that. The technology is moving quickly.

Update: Looks like Mike over at TechCrunch is doing the same thing.
