23andMe Call Rates and Exporting Data

Not to turn this into an all-23andMe, all-the-time kind of blog, but I wanted to publish some answers I recently received from the company in response to a couple of questions I raised in my last post on the subject. First, I asked whether it was possible to export the genome data. Their response was:

At this time 23andMe does not have a way to give customers their genetic data on a CD, flash drive or other downloadable or stored format. But we are working to make that possible, and hope to be able to distribute raw data to our customers in the near future.

We’re not talking about a lot of data, especially once compressed (at least in this day and age). A conservative back-of-the-envelope calculation, using 10 characters per SNP (for the rsid and the actual alleles) and 600,000 total SNPs, yields about 6 megabytes of raw data. There shouldn’t be any technical reason why they couldn’t export the raw data. I hope they decide to allow this soon.
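
For anyone who wants to check the arithmetic, here is a minimal sketch of that estimate in Python. The two constants are just the rough figures from the paragraph above (10 characters per SNP, 600,000 SNPs), treated as one byte per character; the function name is my own and nothing here reflects how 23andMe actually stores the data.

```python
# Back-of-the-envelope size estimate for an uncompressed raw data export.
# Both constants are the rough figures from the post, not measured values.
BYTES_PER_SNP = 10     # rsid plus the called alleles, one byte per character (assumption)
TOTAL_SNPS = 600_000   # approximate number of SNPs on the chip


def raw_export_size_bytes(snps: int = TOTAL_SNPS, bytes_per_snp: int = BYTES_PER_SNP) -> int:
    """Return the estimated uncompressed size in bytes."""
    return snps * bytes_per_snp


size = raw_export_size_bytes()
print(f"{size:,} bytes")
print(f"{size / 10**6:.2f} MB (decimal)")
print(f"{size / 2**20:.2f} MiB (binary)")
```

That works out to 6,000,000 bytes, or about 5.7 MiB if you count in powers of two, and that is before any compression.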

The second question I asked was about call rates. The website talks about a 99% call rate (the rate at which a given SNP can be reliably decoded). But I was seeing a bunch of No Calls when looking at my data. Their response was:

If you look through the Genome Explorer, however, you will find many instances where the call rate falls well below 99%. In the APOE gene, for example, many customers will find that their data includes five or more ‘no calls’ out of a total 19 SNPs.
The high number of ‘no calls’ in such situations is a result of the way 23andMe has customized the Illumina Hap550+ chip that is used to analyze our customers’ DNA. Our laboratory easily achieves a 99% call rate at the 550,000 (actually closer to 561,000) SNPs that are on the standard chip.
But 23andMe has added an additional 30,000 custom SNPs in locations that we expect to be of particular interest to our customers. These additional SNPs that we have selected based on research interest are sometimes not genotypable at the same call rate levels. We did however want to include them so we can review further and possibly improve the information down the road.

The APOE gene mentioned in their response is thought to contain several SNPs relevant to Alzheimer’s disease. Hopefully in the near future, the technology will be refined enough to yield fewer no calls in regions like this one.
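
To put the numbers from that response side by side, here is a small Python sketch. It assumes "call rate" simply means the fraction of SNPs that received a call, and it uses only figures quoted above (5 no calls out of 19 APOE SNPs, and a 99% rate across roughly 561,000 standard SNPs). The function names are hypothetical, not anything from 23andMe or Illumina.

```python
def call_rate(no_calls: int, total_snps: int) -> float:
    """Fraction of SNPs that were successfully called."""
    if total_snps <= 0:
        raise ValueError("total_snps must be positive")
    return (total_snps - no_calls) / total_snps


def expected_no_calls(total_snps: int, rate: float) -> int:
    """Number of no calls implied by a given call rate, rounded to a whole SNP."""
    return round(total_snps * (1 - rate))


# APOE example from the response: 5 no calls out of 19 SNPs.
apoe_rate = call_rate(no_calls=5, total_snps=19)
print(f"APOE call rate: {apoe_rate:.1%}")

# Standard chip: about 561,000 SNPs at a 99% call rate.
standard_no_calls = expected_no_calls(561_000, 0.99)
print(f"No calls implied on the standard chip: {standard_no_calls:,}")
```

So even at exactly 99%, the standard chip alone would leave something like 5,600 SNPs uncalled, while five no calls among 19 APOE SNPs is a call rate of only about 74%.
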
I appreciate the speed and candor in 23andMe’s response to my questions.

Comments

  1. It seems deCODE does provide the raw data.
    http://www.decodeme.com/index/faq#samples_data11

  2. If you can’t wait for 23andMe to add support for raw data export, I’ve released the source code for a program that can extract the raw data from their website:
    http://www.scheidecker.net/personal-genome-explorer/
    It can store the data in its own encrypted file format or export it to CSV.