Converting a 23andMe Raw Genetic File to VCF
23andMe customers can download their raw genetic data by following these instructions.
However, this data comes in a proprietary 23andMe format.
The 23andMe file is called something like genome_John_Doe_v3_Full_20190101201010.zip (the exact file name format may vary).
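For orientation, the downloaded .zip typically wraps a single tab-separated text file: a block of '#' comment lines, then one row per site with the columns rsid, chromosome, position and genotype. The Python sketch below reads such an archive. read_23andme is a hypothetical helper, not part of Genozip, and it assumes the data file is the archive's first member.

```python
import io
import zipfile
from typing import Iterator, Tuple


def read_23andme(source) -> Iterator[Tuple[str, str, int, str]]:
    """Yield (rsid, chromosome, position, genotype) rows from a 23andMe .zip.

    `source` may be a path or a binary file object. Assumption: the first
    member of the archive is the tab-separated raw-data text file.
    """
    with zipfile.ZipFile(source) as archive:
        member = archive.namelist()[0]  # assumed: the raw-data .txt file
        with archive.open(member) as raw:
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                if line.startswith("#") or not line.strip():
                    continue  # skip header comments and blank lines
                rsid, chrom, pos, genotype = line.rstrip("\r\n").split("\t")
                yield rsid, chrom, int(pos), genotype
```

For example, `list(read_23andme("genome_John_Doe_v3_Full_20190101201010.zip"))` would return every row, including no-calls such as `--`.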
Here, we explain how to convert the file to the standard VCF format.
Step 1: Download a reference file. Any version of hg19 or GRCh37 will do, for example this one: hs37d5.fa.gz. This file is quite large: approximately 900MB.
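The reference is needed because every VCF record must state the reference base (the REF column) at its position, while the 23andMe file lists only the genotype. The sketch below shows that lookup in Python; parse_fasta, load_fasta_gz and ref_base are hypothetical helpers, not Genozip functions. hs37d5 names its contigs 1 to 22, X, Y and MT, the same chromosome names the 23andMe file uses. The sketch assumes 23andMe positions are 1-based, as VCF positions are.

```python
import gzip
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Collect FASTA records into {contig name: sequence}."""
    chunks: Dict[str, List[str]] = {}
    name: Optional[str] = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]  # contig name = first word of the header
            chunks[name] = []
        elif name is not None and line:
            chunks[name].append(line)
    return {contig: "".join(parts) for contig, parts in chunks.items()}


def load_fasta_gz(path: str) -> Dict[str, str]:
    """Load a gzip-compressed FASTA (e.g. hs37d5.fa.gz) fully into memory.

    This holds roughly 3 GB of sequence for a whole genome; a real tool
    would use an indexed reference instead.
    """
    with gzip.open(path, "rt") as fasta:
        return parse_fasta(fasta)


def ref_base(genome: Dict[str, str], chrom: str, pos: int) -> str:
    """Return the reference base at a 1-based position, in upper case."""
    return genome[chrom][pos - 1].upper()
```

Upper-casing matters because FASTA files may use lower case for soft-masked regions.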
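Step 2: Compress your 23andMe file with Genozip. This produces the .genozip file used in Step 3 (genome_John_Doe_v3_Full_20190101201010.genozip in the example below).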
Step 3: Convert the file to VCF:
genocat -e hs37d5.fa.gz --vcf genome_John_Doe_v3_Full_20190101201010.genozip --output mydata.vcf.gz
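Conceptually, each called site becomes one VCF data line: REF comes from the reference, and each genotype letter is encoded as 0 when it matches REF or as a 1-based index into the ALT list otherwise. The Python sketch below illustrates only that mapping. vcf_line is a hypothetical helper, header lines (##fileformat, #CHROM, and so on) are omitted, and genocat's actual output may differ in details such as the INFO or FORMAT fields it writes.

```python
from typing import List


def vcf_line(chrom: str, pos: int, rsid: str, ref: str, genotype: str) -> str:
    """Build one VCF data line for a 23andMe SNP genotype such as "AG" or "A".

    Alleles equal to REF get index 0; other bases are added to ALT in order
    of first appearance and numbered from 1. A one-letter genotype (e.g. a
    haploid call) yields a one-value GT.
    """
    alts: List[str] = []
    indices: List[str] = []
    for allele in genotype:
        if allele == ref:
            indices.append("0")
        else:
            if allele not in alts:
                alts.append(allele)
            indices.append(str(alts.index(allele) + 1))
    alt = ",".join(alts) if alts else "."  # "." means no alternate allele
    gt = "/".join(indices)  # "/" = unphased; 23andMe genotypes carry no phase
    return "\t".join([chrom, str(pos), rsid, ref, alt, ".", ".", ".", "GT", gt])
```

With REF A, the genotype AG gives ALT G and GT 0/1, while CG gives ALT C,G and GT 1/2.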
Note: the output file (mydata.vcf.gz in the example above) is gzip-compressed because its name ends with .gz; a name without the .gz suffix yields an uncompressed VCF.
Limitations: indel variants (‘DD’, ‘DI’, ‘II’) as well as uncalled sites (‘--’) are discarded.
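The limitation above amounts to keeping only genotypes spelled with the bases A, C, G and T. The small Python predicate below sketches that rule, for instance to estimate in advance how many rows of a file will be dropped. is_convertible is a hypothetical helper, and it is an assumption that genocat applies exactly this test.

```python
VALID_BASES = frozenset("ACGT")


def is_convertible(genotype: str) -> bool:
    """True for one- or two-letter genotypes made only of A/C/G/T.

    Rejects no-calls ("--") and 23andMe indel codes built from D (deletion)
    and I (insertion), such as "DD", "DI" and "II".
    """
    return 0 < len(genotype) <= 2 and set(genotype) <= VALID_BASES
```

For example, `sum(not is_convertible(g) for g in ["AG", "--", "DI", "T"])` counts two rows that would be discarded.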